Notes

"lxml refused to parse content that is Unicode if the content is xml with an encoding declaration", um, what does this mean?

lxml refuses to parse a Unicode string containing XML that declares an encoding, even if the declared encoding matches the one the content was actually in; it raises a ValueError instead. The app handles this by attempting to parse the content a second time if the first attempt, using Unicode, fails. This is necessary because I cannot let lxml discover the encoding itself: it doesn't know what the HTTP headers are, and it cannot sniff the encoding as well as the app's input handling does (which uses several methods to determine the encoding). See #987.
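To make the behaviour concrete, here is a minimal sketch of a parse-then-retry pattern. It is not the app's actual code: the function name parse_decoded_xml is hypothetical, and the retry strategy (re-encoding the already decoded text as UTF-8 and telling the parser to use that encoding) is an assumption about one reasonable way to do the second attempt.

```python
from lxml import etree


def parse_decoded_xml(text):
    """Parse XML that the application has already decoded to a str.

    Hypothetical helper, for illustration only.
    First attempt: hand the str to lxml directly.
    Second attempt (assumed strategy): re-encode the text as UTF-8 bytes
    and tell the parser explicitly which encoding to use.
    """
    try:
        return etree.fromstring(text)
    except ValueError:
        # lxml rejects a str that carries an XML encoding declaration.
        # The app already knows the text is correctly decoded, so it
        # picks the encoding itself instead of letting lxml guess.
        parser = etree.XMLParser(encoding="utf-8")
        return etree.fromstring(text.encode("utf-8"), parser)


# A decoded document that still carries its encoding declaration.
doc = '<?xml version="1.0" encoding="utf-8"?><feed><title>caf\u00e9</title></feed>'
root = parse_decoded_xml(doc)
print(root.tag, root.findtext("title"))
```

The point of the design is that the encoding decision stays with the application, which has seen the HTTP headers; lxml is only ever given either decoded text or bytes in an encoding the application chose.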