I am attempting to work with a very large dataset that has some non-standard characters in it. I need to use unicode, as per the job specs, but I am baffled.

I open the CSV using:

    ncesReader = csv.reader(open('geocoded_output.csv', 'rb'), delimiter='\t', quotechar='"')

Then I attempt to encode it with:

    name=school_name.encode('utf-8'), street=row.encode('utf-8'), city=row.encode('utf-8'), state=row.encode('utf-8'), zip5=row, zip4=row, county=row.encode('utf-8'), lat=row, lng=row)

I'm encoding everything except the lat and lng because those need to be sent out to an API. When I run the program to parse the dataset into what I can use, I get the following traceback:

    Traceback (most recent call last):
      File "push_into_db.py", line 32, in buildDistrictSchoolMap
        county=row.encode('utf-8'), lat=row, lng=row)
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xd1 in position 2: ordinal not in range(128)

I think I should tell you that I'm using Python 2.7.2, and this is part of an app built on Django 1.4. I've read several posts on this topic, but none of them seem to directly apply. You might also want to know that some of the non-standard characters causing the issue are Ñ and possibly É.

Huh, that is pretty odd. Do you have (or are you trying to use) any internationalized domain names (domains that contain non-ASCII characters, such as accented characters)?

Check which locale you're using with the locale command. If it's not en_US.UTF-8, change it like this:

    sudo apt install locales

If you don't have permission to do that, you can run all your Python code like this:

    PYTHONIOENCODING="UTF-8" python3

Or run this command before running your Python code:

    export PYTHONIOENCODING="UTF-8"

Note that PYTHONIOENCODING only sets the encoding of stdin, stdout, and stderr; it does not change the default encoding that open() uses for files.

In my case, running locale showed that I was using POSIX, the default Ubuntu locale, instead of en_US.UTF-8, which caused Python to open files as ASCII instead of UTF-8. You can check which locale Python is using like this:

    >>> import locale
    >>> locale.getpreferredencoding(False)

locale.getpreferredencoding(False) is the function called by open() when you don't provide an encoding. The output should be 'UTF-8', but in my case it was 'ANSI_X3.4-1968', which is another name for ASCII.

Just for anyone with a similar problem, the easiest solution is:

    LC_ALL=C.UTF-8 update-command-not-found

This works for a lot of similar UTF-8 issues related to any kind of install.
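Since the locale only matters when open() is called without an encoding, one way to sidestep it is to pass the encoding explicitly. The following is a minimal Python 3 sketch, not the original poster's Python 2 code: the helper name read_tsv, the temporary file, and the sample row are all made up for illustration, and it assumes the file really is UTF-8.

```python
import csv
import locale
import os
import tempfile


def read_tsv(path, encoding="utf-8"):
    """Read a tab-delimited file with an explicit encoding (hypothetical helper)."""
    # newline="" is what the csv module documentation recommends for csv.reader input.
    with open(path, encoding=encoding, newline="") as handle:
        return list(csv.reader(handle, delimiter="\t", quotechar='"'))


# The locale-derived default encoding that the answer above describes.
print("Locale default:", locale.getpreferredencoding(False))

# Hypothetical sample data, written to a temporary file just for this demonstration.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".csv", delete=False, newline=""
) as tmp:
    tmp.write("ESPA\u00d1OLA\tNM\n")
    sample_path = tmp.name

rows = read_tsv(sample_path)
os.remove(sample_path)
print(rows)
```

Because the encoding is spelled out in the open() call, the result should not depend on whether the machine's locale is POSIX or en_US.UTF-8.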
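As for the traceback itself: in Python 2, calling .encode('utf-8') on a byte string first decodes it implicitly with the ASCII codec, which is where the 'ascii' error comes from. The byte 0xd1 is Ñ in Latin-1, so the file may be Latin-1 rather than UTF-8, though that is only a guess from the error message. The sketch below uses Python 3 bytes to reproduce the same message and shows a decode-then-encode round trip; the sample value is hypothetical and the real file's encoding would need to be confirmed.

```python
# Hypothetical raw field as read from a binary file; 0xd1 is "Ñ" in Latin-1.
raw = b"PE\xd1A"

error_text = None
try:
    # Roughly what Python 2 does implicitly before str.encode('utf-8').
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    error_text = str(exc)
    print(error_text)

# Assumption: the data is Latin-1. Decode to text first, then encode as UTF-8.
text = raw.decode("latin-1")
utf8_bytes = text.encode("utf-8")
print(text, utf8_bytes)
```

The point of the design is to decode once, at the boundary where the bytes are read, using the file's actual encoding, and to encode only when writing the data back out.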