Bug #987

Failure to parse content due to encoding problem

Added by Luke Murphey almost 10 years ago. Updated over 9 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Target version:
Start date: 03/24/2015
Due date:
% Done: 100%


Description

See http://answers.splunk.com/answers/223557/how-to-troubleshoot-why-the-website-input-app-is-n.html

The app generates the following exceptions:

2015-03-23 21:40:06,607 ERROR A general exception was thrown when executing a web request
Traceback (most recent call last):
  File "/Library/Splunk/splunk_sp/etc/apps/website_input/bin/web_input.py", line 314, in scrape_page
    tree = lxml.html.fromstring(content)
  File "/Library/Splunk/splunk_sp/lib/python2.7/site-packages/lxml/html/__init__.py", line 706, in fromstring
    doc = document_fromstring(html, parser=parser, base_url=base_url, **kw)
  File "/Library/Splunk/splunk_sp/lib/python2.7/site-packages/lxml/html/__init__.py", line 600, in document_fromstring
    value = etree.fromstring(html, parser, **kw)
  File "lxml.etree.pyx", line 3032, in lxml.etree.fromstring (src/lxml/lxml.etree.c:68121)
  File "parser.pxi", line 1781, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:102435)
ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
2015-03-23 21:40:06,607 ERROR An exception occurred when attempting to retrieve information from the web-page
Traceback (most recent call last):
  File "/Library/Splunk/splunk_sp/etc/apps/website_input/bin/web_input.py", line 470, in run
    result = WebInput.scrape_page(url, selector, username, password, timeout, name_attributes, proxy_type=proxy_type, proxy_server=proxy_server, proxy_port=proxy_port, proxy_user=proxy_user, proxy_password=proxy_password)
  File "/Library/Splunk/splunk_sp/etc/apps/website_input/bin/web_input.py", line 314, in scrape_page
    tree = lxml.html.fromstring(content)
  File "/Library/Splunk/splunk_sp/lib/python2.7/site-packages/lxml/html/__init__.py", line 706, in fromstring
    doc = document_fromstring(html, parser=parser, base_url=base_url, **kw)
  File "/Library/Splunk/splunk_sp/lib/python2.7/site-packages/lxml/html/__init__.py", line 600, in document_fromstring
    value = etree.fromstring(html, parser, **kw)
  File "lxml.etree.pyx", line 3032, in lxml.etree.fromstring (src/lxml/lxml.etree.c:68121)
  File "parser.pxi", line 1781, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:102435)
ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
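
The error message points at two ways out: hand lxml bytes, or hand it text that has no XML declaration. The following is a minimal sketch of the second option, not the fix that was actually applied to web_input.py. The helper name `strip_xml_declaration` and the sample page are hypothetical, and the sketch is written for Python 3 even though the traceback comes from Python 2.7.

```python
import re

# Matches an XML declaration at the very start of the text, for example
# <?xml version="1.0" encoding="UTF-8"?>, plus any whitespace around it.
XML_DECLARATION = re.compile(r'\A\s*<\?xml\b[^>]*\?>\s*', re.IGNORECASE)


def strip_xml_declaration(content):
    """Remove a leading XML declaration from already-decoded text.

    Hypothetical helper for illustration only. Byte strings are returned
    unchanged, because the error message itself says bytes input is accepted.
    """
    if isinstance(content, bytes):
        return content
    return XML_DECLARATION.sub('', content, count=1)


# Sample page (made up): decoded text that still carries a declaration.
page = u'<?xml version="1.0" encoding="UTF-8"?>\n<html><body><p>hello</p></body></html>'
cleaned = strip_xml_declaration(page)
print(cleaned)
```

The cleaned text could then be passed to `lxml.html.fromstring`, the call that fails at line 314 of web_input.py in the traceback. Passing the undecoded response bytes instead would be the other route the error message suggests.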

History

#2 Updated by Luke Murphey over 9 years ago

  • Status changed from New to In Progress

#3 Updated by Luke Murphey over 9 years ago

I have a unit test reproducing this.

#5 Updated by Luke Murphey over 9 years ago

  • Status changed from In Progress to Closed
  • % Done changed from 0 to 100
