Feature #1168: Output raw data - Website Input - LukeMurphey.net

Feature #1168

Output raw data

Added by Luke Murphey over 8 years ago. Updated almost 8 years ago.

Status: Closed
Priority: Normal
Assignee: Luke Murphey
Category:
Target version: 2.1
Start date: 05/18/2016
Due date:
% Done: 100%

Description

Add the ability to output the raw content of the matched page without parsing it. The user can then parse the content themselves in SPL (for example, with rex). This would let the user import things such as:

JSON files
HTML that is not valid or is rendered by JavaScript (especially document.write() calls)
Raw content that needs no parsing (like the Internet Storm Center status)
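
As a rough illustration of the idea (a sketch only, not the app's actual implementation), the input would pass the page body through untouched and leave field extraction to search time. The helper names build_raw_event and rex_like, the example URL, and the sample content are all made up for this example; rex_like only approximates what rex does with named capture groups.

```python
import re


def build_raw_event(url, body):
    """Hypothetical helper: keep the page body as-is instead of extracting matches from it."""
    return {"url": url, "raw": body}


def rex_like(raw, pattern):
    """Rough Python analogue of SPL's rex: named groups become fields."""
    match = re.search(pattern, raw)
    return match.groupdict() if match else {}


# Made-up sample content standing in for a page rendered via document.write().
event = build_raw_event(
    "http://example.com/status.js",
    'document.write("Status: green");',
)

# The user, not the input, decides how to pull fields out of the raw text.
fields = rex_like(event["raw"], r"Status: (?P<status>\w+)")
print(fields)
```

The point of the split is that the input no longer needs to understand the content; any format the user can describe with a regular expression at search time becomes importable.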

History

#1 Updated by Luke Murphey over 8 years ago

Description updated

#2 Updated by Luke Murphey over 8 years ago

https://answers.splunk.com/answers/341483/website-input-another-device-to-scrape-information.html

#3 Updated by Luke Murphey almost 8 years ago

Here are some issues that would need to be addressed to get this to work:

How to handle source-typing:
1. For XML, the input would need to output raw XML so that xpath (http://docs.splunk.com/Documentation/Splunk/6.0.7/SearchReference/Xpath) and KV_MODE=xml could be used.
2. For JSON, the sourcetype would need to be set up so that Splunk treats the content as JSON (using INDEXED_EXTRACTIONS = json).
How to handle the extra fields that are usually included as key-value pairs.
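
One possible way to approach both questions is sketched below; this is an assumption-laden illustration, not a decision recorded in this ticket. The sourcetype names (web_input_json, web_input_xml, web_input_raw), the function names, and the reserved "_input" key are all hypothetical. The sketch picks a sourcetype from the response's Content-Type header and, for JSON object bodies only, nests the extra fields under a reserved key so the event is still valid JSON.

```python
import json

# Hypothetical sourcetype names; the ticket does not name any.
JSON_SOURCETYPE = "web_input_json"
XML_SOURCETYPE = "web_input_xml"
RAW_SOURCETYPE = "web_input_raw"


def choose_sourcetype(content_type):
    """Pick a sourcetype from a Content-Type header value."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json" or media_type.endswith("+json"):
        return JSON_SOURCETYPE
    if media_type in ("application/xml", "text/xml") or media_type.endswith("+xml"):
        return XML_SOURCETYPE
    return RAW_SOURCETYPE


def attach_extra_fields(body, extra_fields):
    """Nest extra fields inside a JSON object body; leave anything else unchanged."""
    try:
        parsed = json.loads(body)
    except ValueError:
        return body  # Not JSON: return the raw content untouched.
    if not isinstance(parsed, dict):
        return body  # JSON arrays and scalars have no place for extra keys.
    parsed["_input"] = extra_fields  # "_input" is an assumed reserved key.
    return json.dumps(parsed, sort_keys=True)
```

For XML and plain raw content this sketch leaves the extra fields out of the body entirely, so they would still need another home (for example, a separate event); that part of the question stays open.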