Bug #1144
lxml not finding elements
Start date: 12/27/2015
Due date:
% Done: 100%
Description
https://answers.splunk.com/answers/337109/extracting-http-served-xml-datascraping-http-provi.html
Sample code:
import lxml.html
from lxml.cssselect import CSSSelector  # CSSSelector lives in lxml.cssselect and needs the cssselect package

content = """<nutcallstatus>
 <!--this is similar to status.xml, but with more values-->
 <!--all temperatures are displayed in tenths F, regardless of setting of unit-->
 <!--all temperatures sent by browser to unit should be in F. you can send tenths F with a decimal place, ex: 123.5-->
 <COOK>
 <COOK_NAME>Cook</COOK_NAME>
 <COOK_TEMP>695</COOK_TEMP>
 <COOK_SET>1000</COOK_SET>
 <COOK_STATUS>0</COOK_STATUS>
 </COOK>"""

# Parse with lxml's HTML parser, then select the temperature element with a CSS selector
tree = lxml.html.fromstring(content)
selector = CSSSelector("COOK_TEMP")
matches = selector(tree)
print(matches)  # prints [] : the element is not found
History
#1
Updated by Luke Murphey almost 10 years ago
- Target version set to 1.2.0
#2
Updated by Luke Murphey almost 10 years ago
It turns out that lxml's HTML parser converts tag names to lowercase, so a selector for COOK_TEMP never matches. This can be confirmed by parsing the XML above with lxml.html.fromstring() and then calling:
lxml.etree.tostring(tree)
The output is:
<nutcallstatus>
 <!--this is similar to status.xml, but with more values-->
 <!--all temperatures are displayed in tenths F, regardless of setting of unit-->
 <!--all temperatures sent by browser to unit should be in F. you can send tenths F with a decimal place, ex: 123.5-->
 <cook>
 <cook_name>Cook</cook_name>
 <cook_temp>695</cook_temp>
 <cook_set>1000</cook_set>
 <cook_status>0</cook_status>
 </cook></nutcallstatus>
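The lowercasing can be reproduced, and worked around, with a minimal sketch like the one below. This is not necessarily the change that closed this issue (that isn't recorded here). It assumes lxml is installed and uses XPath instead of CSSSelector so it doesn't depend on the separate cssselect package. The names XML, temps_via_html and temps_via_xml are hypothetical, and XML is a trimmed copy of the status document with a closing </nutcallstatus> added, because lxml.etree's XML parser rejects the unclosed sample above.

```python
import lxml.etree
import lxml.html

# Hypothetical trimmed, well-formed copy of the status document from this report.
XML = """<nutcallstatus>
 <COOK>
 <COOK_NAME>Cook</COOK_NAME>
 <COOK_TEMP>695</COOK_TEMP>
 <COOK_SET>1000</COOK_SET>
 <COOK_STATUS>0</COOK_STATUS>
 </COOK>
</nutcallstatus>"""


def temps_via_html(content):
    """Hypothetical helper: parse with the HTML parser, which lowercases tag names."""
    tree = lxml.html.fromstring(content)
    upper = tree.xpath("//COOK_TEMP")  # no match: the tree only contains <cook_temp>
    lower = tree.xpath("//cook_temp")  # matches the lowercased element
    return [e.text for e in upper], [e.text for e in lower]


def temps_via_xml(content):
    """Hypothetical helper: parse with the XML parser, which keeps the original case."""
    root = lxml.etree.fromstring(content)
    upper = root.xpath("//COOK_TEMP")  # matches: case is preserved
    lower = root.xpath("//cook_temp")  # no match: XML names are case-sensitive
    return [e.text for e in upper], [e.text for e in lower]


if __name__ == "__main__":
    print(temps_via_html(XML))  # -> ([], ['695'])
    print(temps_via_xml(XML))   # -> (['695'], [])
```

So either parse the response as XML and keep the original names, or keep lxml.html and write the selector in lowercase (e.g. CSSSelector("cook_temp")). I believe CSSSelector also accepts translator="html", which lowercases element names in the selector itself, but check the cssselect documentation before relying on it.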
#3
Updated by Luke Murphey almost 10 years ago
- Status changed from New to Closed
- % Done changed from 0 to 100