Project

General

Profile

Bug #1597

Parse failures not handled well

Added by Luke Murphey over 7 years ago. Updated over 7 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
Input: Web Spider
Target version:
Start date:
11/17/2016
Due date:
% Done:

100%


Description

Getting an exception like:

2016-11-17 01:04:33,894 ERROR An exception occurred when attempting to retrieve information from the web-page, stanza=web_input://Auth Fail Test
Traceback (most recent call last):
  File "/Users/lmurphey/Splunk/sp/etc/apps/website_input/bin/web_input.py", line 1035, in run
    result = WebInput.scrape_page(url, selector, username, password, timeout, name_attributes, proxy_type=proxy_type, proxy_server=proxy_server, proxy_port=proxy_port, proxy_user=proxy_user, proxy_password=proxy_password, user_agent=user_agent, use_element_name=use_element_name, page_limit=page_limit, depth_limit=depth_limit, url_filter=url_filter, include_raw_content=raw_content, text_separator=text_separator, browser=browser, output_matches_as_mv=output_matches_as_mv, output_matches_as_separate_fields=output_matches_as_separate_fields)
  File "/Users/lmurphey/Splunk/sp/etc/apps/website_input/bin/web_input.py", line 919, in scrape_page
    result = cls.get_result_single(http, urlparse(url), selector, headers, name_attributes, output_matches_as_mv, output_matches_as_separate_fields, charset_detect_meta_enabled, charset_detect_content_type_header_enabled, charset_detect_sniff_enabled, include_empty_matches, use_element_name, **kw)
  File "/Users/lmurphey/Splunk/sp/etc/apps/website_input/bin/web_input.py", line 622, in get_result_single
    tree = lxml.html.fromstring(content_decoded)
  File "/Users/lmurphey/Splunk/sp/lib/python2.7/site-packages/lxml/html/__init__.py", line 706, in fromstring
    doc = document_fromstring(html, parser=parser, base_url=base_url, **kw)
  File "/Users/lmurphey/Splunk/sp/lib/python2.7/site-packages/lxml/html/__init__.py", line 600, in document_fromstring
    value = etree.fromstring(html, parser, **kw)
  File "lxml.etree.pyx", line 3032, in lxml.etree.fromstring (src/lxml/lxml.etree.c:68121)
  File "parser.pxi", line 1786, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:102470)
  File "parser.pxi", line 1667, in lxml.etree._parseDoc (src/lxml/lxml.etree.c:101229)
  File "parser.pxi", line 1035, in lxml.etree._BaseParser._parseUnicodeDoc (src/lxml/lxml.etree.c:96139)
  File "parser.pxi", line 582, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:91290)
  File "parser.pxi", line 683, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:92476)
  File "parser.pxi", line 631, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:91904)
XMLSyntaxError: line 135: Element script embeds close tag

History

#1 Updated by Luke Murphey over 7 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

Also available in: Atom PDF