Bug #766

Page encoding is not determined correctly

Added by Luke Murphey almost 10 years ago. Updated almost 10 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Target version:
Start date: 07/13/2014
Due date:
% Done: 100%

History

#1 Updated by Luke Murphey almost 10 years ago

  • Target version set to 0.8

#3 Updated by Luke Murphey almost 10 years ago

This is how html5lib does it:

"If no encoding is specified, the parser will attempt to detect the encoding from a <meta> element in the first 512 bytes of the document (this is only a partial implementation of the current HTML 5 specification).

If no encoding can be found and the chardet library is available, an attempt will be made to sniff the encoding from the byte pattern.

If all else fails, the default encoding will be used. This is usually Windows-1252, which is a common fallback used by Web browsers."
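
The three-step fallback quoted above can be sketched roughly as follows. This is a simplified illustration, not html5lib's actual code: the function name `sniff_encoding`, the `use_chardet` flag, and the regular expression are assumptions made for this example, and the real HTML5 prescan algorithm handles many more cases (comments, attribute parsing, byte-order marks).

```python
import codecs
import re

# Hypothetical, simplified pattern: finds charset=... inside a <meta> tag.
META_CHARSET_RE = re.compile(
    rb"""<meta[^>]*?charset\s*=\s*["']?\s*([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)


def sniff_encoding(data: bytes, default: str = "windows-1252",
                   use_chardet: bool = True) -> str:
    """Guess the encoding of raw HTML bytes with a three-step fallback."""
    # Step 1: look for a <meta> charset declaration in the first 512 bytes.
    match = META_CHARSET_RE.search(data[:512])
    if match:
        name = match.group(1).decode("ascii")
        try:
            codecs.lookup(name)  # skip labels Python does not recognise
            return name.lower()
        except LookupError:
            pass

    # Step 2: sniff the byte pattern, but only if chardet is installed.
    if use_chardet:
        try:
            import chardet
        except ImportError:
            chardet = None
        if chardet is not None:
            guess = chardet.detect(data).get("encoding")
            if guess:
                return guess.lower()

    # Step 3: nothing found, so use the default.
    return default
```

For example, `sniff_encoding(b'<meta charset="UTF-8">')` returns `"utf-8"`, while a page with no declaration and `use_chardet=False` falls through to `"windows-1252"`. Keeping the chardet import inside the function means the sketch still works when the library is missing, which mirrors the "if the chardet library is available" condition in the quote.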

#4 Updated by Luke Murphey almost 10 years ago

  • Status changed from New to In Progress

#5 Updated by Luke Murphey almost 10 years ago

  • Status changed from In Progress to Closed
  • % Done changed from 0 to 100
