Bug #434

Updated by Luke Murphey over 11 years ago

The Perseus importer fails on some documents because of an encoding problem. For example, importing plut.cat.ma_gk.xml produces the following error:

<pre>
2012-11-09 01:21:42,533 [ERROR] reader.importer.PerseusBatchImporter: Exception generated when attempting to process file="plut.cat.ma_gk.xml"
Traceback (most recent call last):
  File "C:\Users\Luke\Workspace\TextCritical.com\src\reader\importer\PerseusBatchImporter.py", line 269, in process_directory
    if self.__process_file__( os.path.join( root, f) ):
  File "C:\Users\Luke\Workspace\TextCritical.com\src\reader\importer\PerseusBatchImporter.py", line 223, in __process_file__
    return self.process_file(file_path, document_xml, title, author, language)
  File "C:\Users\Luke\Workspace\TextCritical.com\src\reader\importer\PerseusBatchImporter.py", line 396, in process_file
    perseus_importer.import_file(file_path)
  File "C:\Users\Luke\Workspace\TextCritical.com\src\reader\importer\Perseus.py", line 121, in import_file
    return self.import_xml_document(doc)
  File "C:\Program Files\Python27\lib\site-packages\django\db\transaction.py", line 209, in inner
    return func(*args, **kwargs)
  File "C:\Users\Luke\Workspace\TextCritical.com\src\reader\importer\Perseus.py", line 557, in import_xml_document
    verses_created = self.make_verses(divisions, current_state_set)
  File "C:\Users\Luke\Workspace\TextCritical.com\src\reader\importer\Perseus.py", line 575, in make_verses
    verses_created = verses_created + self.make_verses_for_division(division, state_set)
  File "C:\Users\Luke\Workspace\TextCritical.com\src\reader\importer\Perseus.py", line 591, in make_verses_for_division
    division_doc = parseString(division.original_content)
  File "C:\Program Files\Python27\lib\xml\dom\minidom.py", line 1924, in parseString
    return expatbuilder.parseString(string)
  File "C:\Program Files\Python27\lib\xml\dom\expatbuilder.py", line 940, in parseString
    return builder.parseString(string)
  File "C:\Program Files\Python27\lib\xml\dom\expatbuilder.py", line 223, in parseString
    parser.Parse(string, True)
</pre>
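
The traceback stops inside `parser.Parse`, so the exact exception is not shown. One plausible cause, stated here as an assumption and not a confirmed diagnosis, is that `division.original_content` is a unicode string containing Greek characters, which Python 2.7's `minidom.parseString` tries to encode as ASCII. A minimal sketch of a workaround under that assumption is to encode the content to UTF-8 bytes before parsing. The helper name `parse_division_content` is hypothetical and not part of the importer:

```python
from xml.dom.minidom import parseString


def parse_division_content(content):
    """Parse an XML fragment, encoding text to UTF-8 bytes first.

    Hypothetical helper: assumes the failure comes from passing a
    unicode string with non-ASCII characters to parseString().
    """
    # On Python 2, `bytes` is an alias for `str`, so this check
    # leaves already-encoded byte strings untouched.
    if not isinstance(content, bytes):
        content = content.encode("utf-8")
    return parseString(content)


# Greek sample text standing in for a division's original content.
doc = parse_division_content(u"<p>\u03bb\u03cc\u03b3\u03bf\u03c2</p>")
text = doc.documentElement.firstChild.data
```

If the assumption holds, the call in `make_verses_for_division` could go through a helper like this in place of calling `parseString(division.original_content)` directly.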
