Bug #766
Page encoding is not determined correctly
Start date: 07/13/2014
Due date:
% Done: 100%
History
#1 Updated by Luke Murphey over 10 years ago
- Target version set to 0.8
#3 Updated by Luke Murphey over 10 years ago
This is how html5lib does it:
"If no encoding is specified, the parser will attempt to detect the encoding from a <meta> element in the first 512 bytes of the document (this is only a partial implementation of the current HTML 5 specification).
If no encoding can be found and the chardet library is available, an attempt will be made to sniff the encoding from the byte pattern.
If all else fails, the default encoding will be used. This is usually Windows-1252, which is a common fallback used by Web browsers."
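The fallback order described in the quote above can be sketched roughly as follows. This is only an illustration of the three steps (a <meta> check in the first 512 bytes, then chardet if it is installed, then a default), not html5lib's actual code and not necessarily the fix that was applied for this bug. The names detect_encoding, is_known_encoding and META_CHARSET are hypothetical, and the regular expression is a simplification of the real HTML5 sniffing rules.

```python
import codecs
import re

try:
    import chardet  # optional third-party dependency
except ImportError:
    chardet = None

# Simplified pattern (an assumption, not the HTML5 algorithm): finds
# charset=... inside a <meta> tag, covering both <meta charset="utf-8"> and
# <meta http-equiv="Content-Type" content="text/html; charset=utf-8">.
META_CHARSET = re.compile(
    rb"<meta[^>]*?charset\s*=\s*[\"']?\s*([A-Za-z0-9_.:\-]+)",
    re.IGNORECASE,
)


def is_known_encoding(name):
    """Return True if Python has a codec registered under this name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


def detect_encoding(data, default="windows-1252"):
    """Guess the encoding of raw page bytes using the three-step fallback."""
    # Step 1: look for a <meta> charset declaration in the first 512 bytes.
    match = META_CHARSET.search(data[:512])
    if match:
        name = match.group(1).decode("ascii")
        if is_known_encoding(name):
            return name

    # Step 2: if chardet is installed, sniff the encoding from the bytes.
    if chardet is not None:
        guess = chardet.detect(data).get("encoding")
        if guess and is_known_encoding(guess):
            return guess

    # Step 3: fall back to the default encoding.
    return default
```

A caller would then decode the page with something like data.decode(detect_encoding(data), errors="replace"), so that a wrong guess degrades to replacement characters instead of raising an exception.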
#4 Updated by Luke Murphey over 10 years ago
- Status changed from New to In Progress
#5 Updated by Luke Murphey over 10 years ago
- Status changed from In Progress to Closed
- % Done changed from 0 to 100