Bug #987
Failure to parse content due to encoding problem
Start date: 03/24/2015
Due date:
% Done: 100%
Description
See http://answers.splunk.com/answers/223557/how-to-troubleshoot-why-the-website-input-app-is-n.html
The app generates the following exceptions:
2015-03-23 21:40:06,607 ERROR A general exception was thrown when executing a web request
Traceback (most recent call last):
  File "/Library/Splunk/splunk_sp/etc/apps/website_input/bin/web_input.py", line 314, in scrape_page
    tree = lxml.html.fromstring(content)
  File "/Library/Splunk/splunk_sp/lib/python2.7/site-packages/lxml/html/__init__.py", line 706, in fromstring
    doc = document_fromstring(html, parser=parser, base_url=base_url, **kw)
  File "/Library/Splunk/splunk_sp/lib/python2.7/site-packages/lxml/html/__init__.py", line 600, in document_fromstring
    value = etree.fromstring(html, parser, **kw)
  File "lxml.etree.pyx", line 3032, in lxml.etree.fromstring (src/lxml/lxml.etree.c:68121)
  File "parser.pxi", line 1781, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:102435)
ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
2015-03-23 21:40:06,607 ERROR An exception occurred when attempting to retrieve information from the web-page
Traceback (most recent call last):
  File "/Library/Splunk/splunk_sp/etc/apps/website_input/bin/web_input.py", line 470, in run
    result = WebInput.scrape_page(url, selector, username, password, timeout, name_attributes, proxy_type=proxy_type, proxy_server=proxy_server, proxy_port=proxy_port, proxy_user=proxy_user, proxy_password=proxy_password)
  File "/Library/Splunk/splunk_sp/etc/apps/website_input/bin/web_input.py", line 314, in scrape_page
    tree = lxml.html.fromstring(content)
  File "/Library/Splunk/splunk_sp/lib/python2.7/site-packages/lxml/html/__init__.py", line 706, in fromstring
    doc = document_fromstring(html, parser=parser, base_url=base_url, **kw)
  File "/Library/Splunk/splunk_sp/lib/python2.7/site-packages/lxml/html/__init__.py", line 600, in document_fromstring
    value = etree.fromstring(html, parser, **kw)
  File "lxml.etree.pyx", line 3032, in lxml.etree.fromstring (src/lxml/lxml.etree.c:68121)
  File "parser.pxi", line 1781, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:102435)
ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
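The ValueError above is raised when the parser receives an already-decoded Unicode string that still begins with an XML declaration naming an encoding. The ticket does not record how this was fixed in web_input.py, so the following is only a sketch of one common workaround that follows the hint in the error message ("Please use bytes input"). The helper name to_parser_input and the UTF-8 fallback are assumptions made for illustration, not code from the app.

```python
import re

# Matches a leading XML declaration that names an encoding, for example:
#   <?xml version="1.0" encoding="ISO-8859-1"?>
_XML_DECL_ENCODING = re.compile(
    r'^\s*<\?xml[^>]*\bencoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']',
    re.IGNORECASE,
)


def to_parser_input(content, fallback_encoding="utf-8"):
    """Return bytes that can be handed to a parser that rejects Unicode
    strings carrying an encoding declaration.

    Hypothetical helper: the name and the fallback encoding are
    assumptions for this sketch.
    """
    # Bytes are already in the form the error message asks for.
    if isinstance(content, bytes):
        return content

    # Encode text with the encoding its own declaration names, so the
    # declaration and the resulting bytes agree; otherwise use the fallback.
    match = _XML_DECL_ENCODING.match(content)
    encoding = match.group(1) if match else fallback_encoding
    return content.encode(encoding)


page = u'<?xml version="1.0" encoding="ISO-8859-1"?><html><body>caf\xe9</body></html>'
raw = to_parser_input(page)
```

With a helper like this, the failing call in scrape_page could be written as lxml.html.fromstring(to_parser_input(content)). If the HTTP client exposes the undecoded response body, passing those raw bytes directly avoids the re-encoding step altogether.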
History
#2 Updated by Luke Murphey over 9 years ago
- Status changed from New to In Progress
#3 Updated by Luke Murphey over 9 years ago
I have a unit test reproducing this.
#4 Updated by Luke Murphey over 9 years ago
#5 Updated by Luke Murphey over 9 years ago
- Status changed from In Progress to Closed
- % Done changed from 0 to 100