python get unicode string size

I have a binary file that contains a UTF-8 string, and the string is guaranteed to be a single word. In Python, how can I get the number of letters in this string?

Let's say I open the file and read its bytes:

bytes = open("1.dat", "rb").read()

What do I have to do next to find the length (in letters, not bytes) of the UTF-8 string?


Decode the bytes to a unicode string, then take the length of the result:

unicode_string = bytes.decode("utf-8")
print(len(unicode_string))
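For anyone who wants to try the decode-then-len approach without having 1.dat at hand, here is a minimal self-contained sketch. It assumes Python 3; the helper name count_letters, the temporary file and the sample word are made up for illustration and are not part of the original question.

```python
import os
import tempfile


def count_letters(path, encoding="utf-8"):
    """Return the number of Unicode code points in the text stored at path."""
    with open(path, "rb") as f:
        raw = f.read()            # raw bytes, as read in the question
    text = raw.decode(encoding)   # bytes -> text string
    return len(text)              # len() of text counts code points, not bytes


# Hypothetical sample data: a single word containing one non-ASCII letter.
word = "na\u00efve"  # "naive" with a diaeresis on the i

# Write the word to a throwaway file so the example does not need 1.dat.
fd, demo_path = tempfile.mkstemp(suffix=".dat")
with os.fdopen(fd, "wb") as f:
    f.write(word.encode("utf-8"))

byte_count = os.path.getsize(demo_path)    # size on disk in bytes
letter_count = count_letters(demo_path)    # length after decoding
os.remove(demo_path)

print(byte_count, letter_count)  # 6 5
```

Note that len() on the decoded string counts code points. If the file stores an accented letter as a base letter plus a combining mark, that letter counts as two, so the result may be larger than the number of letters you see on screen.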

Category:python Time:2011-11-08 Views:0
