Bug #1144
lxml not finding elements
Start date: 12/27/2015
Due date:
% Done: 100%
Description
https://answers.splunk.com/answers/337109/extracting-http-served-xml-datascraping-http-provi.html
Sample code:
import lxml.html
from lxml.cssselect import CSSSelector

content = """<nutcallstatus>
<!--this is similar to status.xml, but with more values-->
<!--all temperatures are displayed in tenths F, regardless of setting of unit-->
<!--all temperatures sent by browser to unit should be in F. you can send tenths F with a decimal place, ex: 123.5-->
<COOK>
<COOK_NAME>Cook</COOK_NAME>
<COOK_TEMP>695</COOK_TEMP>
<COOK_SET>1000</COOK_SET>
<COOK_STATUS>0</COOK_STATUS>
</COOK>"""

# Parse with the HTML parser, then look up the element by its original (uppercase) name
tree = lxml.html.fromstring(content)
selector = CSSSelector("COOK_TEMP")
matches = selector(tree)
print(matches)
History
#1 Updated by Luke Murphey about 9 years ago
- Target version set to 1.2.0
#2 Updated by Luke Murphey about 9 years ago
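It turns out that lxml's HTML parser converts element names to lowercase, so a selector written with the original uppercase name (COOK_TEMP) no longer matches anything. This can be confirmed by parsing the sample with lxml.html.fromstring and then calling this: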
lxml.etree.tostring(tree)
The output is:
<nutcallstatus>\n <!--this is similar to status.xml, but with more values-->\n <!--all temperatures are displayed in tenths F, regardless of setting of unit-->\n <!--all temperatures sent by browser to unit should be in F. you can send tenths F with a decimal place, ex: 123.5--> \n <cook>\n <cook_name>Cook</cook_name>\n <cook_temp>695</cook_temp>\n <cook_set>1000</cook_set>\n <cook_status>0</cook_status>\n </cook></nutcallstatus>
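The lowercasing can also be checked directly by comparing the two lxml parsers. The following is only a minimal sketch, not the fix that went into 1.2.0: it uses a shortened, well-formed copy of the sample document (an assumption for illustration) and plain findall() lookups instead of CSSSelector, so that it runs without the separate cssselect package. It suggests two possible workarounds: lowercase the selector when the HTML parser is used, or parse the content with the XML parser, which preserves case.

```python
import lxml.etree
import lxml.html

# Shortened, well-formed copy of the sample document (assumption for illustration).
content = """<nutcallstatus>
<COOK>
<COOK_NAME>Cook</COOK_NAME>
<COOK_TEMP>695</COOK_TEMP>
</COOK>
</nutcallstatus>"""

# The HTML parser lowercases element names.
html_tree = lxml.html.fromstring(content)
html_tags = [el.tag for el in html_tree.iter() if isinstance(el.tag, str)]

# The XML parser keeps the names exactly as written.
xml_tree = lxml.etree.fromstring(content)
xml_tags = [el.tag for el in xml_tree.iter() if isinstance(el.tag, str)]

# Name lookups are case-sensitive, so they behave differently on the two trees.
html_upper = html_tree.findall(".//COOK_TEMP")  # original name, HTML tree
html_lower = html_tree.findall(".//cook_temp")  # lowercased name, HTML tree
xml_upper = xml_tree.findall(".//COOK_TEMP")    # original name, XML tree

print(html_tags)
print(xml_tags)
print(len(html_upper), len(html_lower), len(xml_upper))
```

Note that the XML parser is strict: it would reject the original sample as posted, because that snippet is missing the closing </nutcallstatus> tag. Lowercasing the selector is the more forgiving option when the served content may be malformed.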
#3 Updated by Luke Murphey about 9 years ago
- Status changed from New to Closed
- % Done changed from 0 to 100