Luke Murphey, 09/17/2015 04:53 PM

h1. Notes

h2. lxml refuses to parse Unicode strings containing XML with an encoding declaration

lxml refuses to parse a Unicode string containing XML that declares its encoding, even when the declaration matches the encoding actually used; it raises a @ValueError@ instead. The app handles this by parsing the content a second time when parsing the Unicode string fails. This is necessary because I cannot let lxml discover the encoding itself: it doesn't know what the HTTP headers are, and it cannot sniff the encoding as reliably as the input does (the input uses several methods to determine the encoding). See #987.
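One way to do the second attempt is sketched below. This is a minimal sketch, not the app's actual code: it assumes the content has already been decoded to a @str@ and that the encoding the input determined is available as a Python codec name. @parse_xml@ is a hypothetical helper. The fallback re-encodes the text and passes that same encoding to @etree.XMLParser(encoding=...)@, so the encoding chosen by the input still wins over anything lxml would infer from the declaration.

```python
from lxml import etree


def parse_xml(text, encoding):
    """Parse already-decoded XML text, retrying as bytes if lxml rejects the str.

    `encoding` is whatever the caller determined (HTTP headers, sniffing, ...).
    It is handed to the parser so lxml does not choose the encoding itself.
    """
    try:
        # str input is accepted only when there is no <?xml ... encoding="..."?> declaration.
        return etree.fromstring(text)
    except ValueError:
        # lxml raises ValueError for str input that carries an encoding declaration.
        # Re-encode with the known encoding and force the parser to use it.
        parser = etree.XMLParser(encoding=encoding)
        return etree.fromstring(text.encode(encoding), parser=parser)


# Hypothetical document, for illustration only.
doc = '<?xml version="1.0" encoding="UTF-8"?><title>café</title>'
root = parse_xml(doc, "utf-8")
print(root.text)  # café
```

Malformed XML still raises lxml's @XMLSyntaxError@, which is not a @ValueError@, so only the encoding-declaration case triggers the retry.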

h2. I changed the sourcetype and now the match field is no longer a multi-value field; what do I do?

You can use @rex@ to extract the matches into a field of your choice. In the example below, @rex@ pulls up to 50 matches from each event into a multi-value field called "file", and @mvexpand@ then splits those values into separate results.
<pre>
sourcetype="downloads" | rex field=_raw "match=(?<file>[.a-zA-Z0-9_]+)" max_match=50 | mvexpand file
</pre>
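If you want to check the @rex@ pattern against sample events before running the search, the extraction can be approximated in Python. This is a rough sketch, not how Splunk implements @rex@: @extract_files@ and the sample event are hypothetical, Python spells named groups @(?P<name>...)@ where Splunk's syntax above uses @(?<name>...)@, and the cap only mimics @max_match=50@.

```python
import re
from itertools import islice

# Same character class as the rex example above, with Python's named-group syntax.
MATCH_PATTERN = re.compile(r"match=(?P<file>[.a-zA-Z0-9_]+)")


def extract_files(raw_event, max_match=50):
    """Return up to max_match values of the "file" group, like a multi-value field.

    Unlike rex, this sketch does not treat max_match=0 as "unlimited".
    """
    matches = MATCH_PATTERN.finditer(raw_event)
    return [m.group("file") for m in islice(matches, max_match)]


# Hypothetical raw event text, for illustration only.
sample = "match=a.txt match=b_2.log other=1 match=c"
print(extract_files(sample))  # ['a.txt', 'b_2.log', 'c']
```

Each element of the returned list corresponds to one value of the multi-value "file" field; @mvexpand file@ would then turn each of those values into its own result.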