Bug #434

Importer error when attempting to make verse

Added by Luke Murphey about 12 years ago. Updated about 12 years ago.

Status: Closed
Priority: Normal
Assignee:
Target version:
Start date:
Due date:
% Done: 100%


Description

The Perseus importer generates errors on some documents due to an encoding problem. See below for an example of the error:

2012-11-09 01:21:42,533 [ERROR] reader.importer.PerseusBatchImporter: Exception generated when attempting to process file="plut.cat.ma_gk.xml"
Traceback (most recent call last):
  File "C:\Users\Luke\Workspace\TextCritical.com\src\reader\importer\PerseusBatchImporter.py", line 269, in process_directory
    if self.__process_file__( os.path.join( root, f) ):
  File "C:\Users\Luke\Workspace\TextCritical.com\src\reader\importer\PerseusBatchImporter.py", line 223, in __process_file__
    return self.process_file(file_path, document_xml, title, author, language)
  File "C:\Users\Luke\Workspace\TextCritical.com\src\reader\importer\PerseusBatchImporter.py", line 396, in process_file
    perseus_importer.import_file(file_path)
  File "C:\Users\Luke\Workspace\TextCritical.com\src\reader\importer\Perseus.py", line 121, in import_file
    return self.import_xml_document(doc)
  File "C:\Program Files\Python27\lib\site-packages\django\db\transaction.py", line 209, in inner
    return func(*args, **kwargs)
  File "C:\Users\Luke\Workspace\TextCritical.com\src\reader\importer\Perseus.py", line 557, in import_xml_document
    verses_created = self.make_verses(divisions, current_state_set)
  File "C:\Users\Luke\Workspace\TextCritical.com\src\reader\importer\Perseus.py", line 575, in make_verses
    verses_created = verses_created + self.make_verses_for_division(division, state_set)
  File "C:\Users\Luke\Workspace\TextCritical.com\src\reader\importer\Perseus.py", line 591, in make_verses_for_division
    division_doc = parseString(division.original_content)
  File "C:\Program Files\Python27\lib\xml\dom\minidom.py", line 1924, in parseString
    return expatbuilder.parseString(string)
  File "C:\Program Files\Python27\lib\xml\dom\expatbuilder.py", line 940, in parseString
    return builder.parseString(string)
  File "C:\Program Files\Python27\lib\xml\dom\expatbuilder.py", line 223, in parseString
    parser.Parse(string, True)
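
The ticket does not record the fix, and the traceback above stops before the exception line. One plausible cause, given that the search in note #2 filters on UnicodeEncodeError: on Python 2.7, parseString() implicitly encodes a unicode argument as ASCII, which fails on non-ASCII characters such as Greek text. Below is a minimal sketch of a workaround, assuming division.original_content is a unicode string. The helper name parse_division_content is hypothetical and is not taken from the importer.

```python
# -*- coding: utf-8 -*-
from xml.dom.minidom import parseString


def parse_division_content(original_content):
    """Hypothetical helper: parse XML text without an implicit ASCII encode."""
    # Assumption: the content is text (unicode on Python 2, str on Python 3).
    # Encoding it to UTF-8 bytes explicitly means the parser never has to
    # guess an encoding; expat treats undeclared byte input as UTF-8.
    if not isinstance(original_content, bytes):
        original_content = original_content.encode("utf-8")
    return parseString(original_content)


# Illustrative input only: a short Greek word, not text from the failing file.
doc = parse_division_content(u"<p>\u03bc\u1fc6\u03bd\u03b9\u03bd</p>")
text = doc.documentElement.firstChild.data
```

If this is the cause, the call at Perseus.py line 591 could go through such a helper instead of calling parseString() directly. That is a suggestion, not the change that actually closed this ticket.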

History

#1 Updated by Luke Murphey about 12 years ago

  • Description updated

#2 Updated by Luke Murphey about 12 years ago

The import began at 11/9/12 1:21:41.348 AM.

[INFO] reader.importer.PerseusBatchImporter: Importing a Perseus XML file, file_path="C:\Users\Luke\Workspace\TextCritical.com\var\\resources\\Perseus\\Classics\Plutarch\opensource\plut.cat.ma_gk.xml" 

It looks like it failed on verse 5 of division 12 of Marcus Cato.

2012-11-09 01:21:42,517 [DEBUG] reader.importer.Perseus: Making verse 5 in division 12 of Marcus Cato 

This issue currently appears to affect 56 works. They can be identified by running the following search:

sourcetype="django" ERROR UnicodeEncodeError | table file
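
Where the log search tool is not available, roughly the same list can be pulled from the raw log file. This is a sketch under assumptions: each log event starts with a timestamp in the format shown in this ticket, and the UnicodeEncodeError text appears somewhere in the same multi-line event as the file="..." field. The function name and the sample log are illustrative only.

```python
import re

# An event begins at a line that starts with a timestamp such as
# "2012-11-09 01:21:42,533 "; continuation lines (tracebacks) do not.
EVENT_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} ", re.MULTILINE)
FILE_FIELD = re.compile(r'file="([^"]+)"')


def files_with_unicode_errors(log_text):
    """Return file names from ERROR events that mention UnicodeEncodeError."""
    starts = [m.start() for m in EVENT_START.finditer(log_text)]
    starts.append(len(log_text))
    files = []
    for begin, end in zip(starts, starts[1:]):
        event = log_text[begin:end]
        if "[ERROR]" in event and "UnicodeEncodeError" in event:
            match = FILE_FIELD.search(event)
            # Keep first-seen order and skip duplicates.
            if match and match.group(1) not in files:
                files.append(match.group(1))
    return files


# Made-up sample: the last line stands in for the exception line, which the
# ticket's traceback does not show.
sample = (
    "2012-11-09 01:21:42,517 [DEBUG] reader.importer.Perseus: Making verse 5\n"
    "2012-11-09 01:21:42,533 [ERROR] reader.importer.PerseusBatchImporter: "
    'Exception generated when attempting to process file="plut.cat.ma_gk.xml"\n'
    "Traceback (most recent call last):\n"
    "UnicodeEncodeError: (message elided)\n"
)
affected = files_with_unicode_errors(sample)
```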

#3 Updated by Luke Murphey about 12 years ago

  • Status changed from New to In Progress

#4 Updated by Luke Murphey about 12 years ago

  • Status changed from In Progress to Closed
  • % Done changed from 0 to 100
