
Luke Murphey, 10/29/2015 06:05 PM

h1. Notes 
h2. "lxml refused to parse content that is Unicode if the content is XML with an encoding declaration": um, what does this mean?
lxml refuses to parse a Unicode string containing XML that declares its encoding (for example, @<?xml version="1.0" encoding="UTF-8"?>@), even when the declared encoding matches the one actually used. Instead, it raises a @ValueError@. The app handles this by parsing the content a second time if the Unicode attempt fails. The retry is necessary because I cannot let lxml discover the encoding on its own: lxml does not see the HTTP headers and cannot sniff the encoding as well as the input does (the input uses several methods to determine the encoding). See #987.
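A minimal sketch of the parse-then-retry idea, assuming the content has already been decoded to a @str@ by the input. The note does not say how the app's second attempt differs from the first, so this sketch *assumes* it strips the leading XML declaration and parses the already-decoded text again; @parse_xml_text@ and @XML_DECLARATION@ are hypothetical names, not the app's actual code. Removing the declaration, rather than handing lxml bytes, keeps the encoding decision with the input.

```python
import re

from lxml import etree

# Hypothetical pattern: a leading XML declaration such as
# <?xml version="1.0" encoding="ISO-8859-1"?> (plus any whitespace before it).
XML_DECLARATION = re.compile(r'^\s*<\?xml[^>]*\?>')


def parse_xml_text(text):
    """Parse XML that the caller has already decoded to a str.

    Hypothetical helper: assumes the retry strips the encoding declaration
    so that lxml never re-guesses the encoding itself.
    """
    try:
        return etree.fromstring(text)
    except ValueError:
        # lxml rejects str input that carries an encoding declaration,
        # even when the declaration matches the real encoding.
        # Drop the declaration and parse the already-decoded text again.
        stripped = XML_DECLARATION.sub('', text, count=1)
        return etree.fromstring(stripped)


if __name__ == '__main__':
    doc = '<?xml version="1.0" encoding="ISO-8859-1"?>\n<root>caf\u00e9</root>'
    root = parse_xml_text(doc)
    print(root.text)  # café
```

Content without an encoding declaration succeeds on the first attempt, so the retry only runs in the case this note describes.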