Feature #1168

Output raw data

Added by Luke Murphey almost 9 years ago. Updated over 8 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Target version:
Start date: 05/18/2016
Due date:
% Done: 100%


Description

Add the ability to output the raw content of the matched page. Users can then parse the content themselves in SPL (for example, with rex). This would let users import things such as:

  • JSON files
  • HTML that is invalid or is rendered by JavaScript (especially document.write() calls)
  • Raw content that needs no parsing (like the Internet Storm Center status)
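
A minimal sketch of what "raw output" could look like on the input side, assuming the event is built as a dictionary of fields. The helper name build_event, the include_raw flag, and the field names url, match, and content are all hypothetical and are not taken from the app's code:

```python
import json


def build_event(url, body, include_raw=True, matches=None):
    """Build a key-value event for one fetched page (hypothetical helper)."""
    event = {"url": url}

    # Parsed selector matches, if any, are kept as a list of values.
    if matches:
        event["match"] = list(matches)

    # With raw output enabled, the unparsed page body is included as-is
    # so that it can be parsed later at search time.
    if include_raw:
        event["content"] = body

    return event


# Example: a JSON document fetched with no selector at all.
event = build_event("https://example.com/status.json", '{"status": "green"}')
print(json.dumps(event))
```

With something like this, a page that needs no selector still produces an event, and the content field can be handed to rex or another search-time command.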

Subtasks

Task #1343: Allow empty selector on input page if raw content is included (Closed, Luke Murphey)


Related issues

Related to Website Input - Feature #1748: Add parsing of JSON fields (New, 02/16/2017)

History

#1 Updated by Luke Murphey almost 9 years ago

  • Description updated (diff)

#3 Updated by Luke Murphey over 8 years ago

Here are some issues that would need to be addressed to get this to work:
  1. How to handle sourcetyping.
    1. For XML, the input would need to output raw XML so that xpath (http://docs.splunk.com/Documentation/Splunk/6.0.7/SearchReference/Xpath) and KV_MODE=xml could be used.
    2. For JSON, the sourcetype would need to be one that makes Splunk treat the content as JSON (using INDEXED_EXTRACTIONS = json).
  2. How to handle the extra fields that are usually included as key-value pairs.
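
One way to approach the sourcetyping question is to classify the raw content before it is output, so that a matching sourcetype can be chosen. This is only a sketch under that assumption: the function name guess_format and the labels it returns are hypothetical, and mapping a label to a sourcetype with KV_MODE=xml or INDEXED_EXTRACTIONS = json would still have to be configured separately.

```python
import json
import xml.etree.ElementTree as ET


def guess_format(content):
    """Return "json", "xml", or "raw" for a page body (hypothetical helper)."""
    text = content.strip()

    # Treat the content as JSON only if it parses to an object or an array;
    # bare scalars such as "123" are left as raw text.
    try:
        if isinstance(json.loads(text), (dict, list)):
            return "json"
    except ValueError:
        pass

    # Treat the content as XML only if it is well-formed; typical
    # invalid HTML fails here and falls through to "raw".
    try:
        ET.fromstring(text)
        return "xml"
    except ET.ParseError:
        pass

    return "raw"


print(guess_format('{"a": 1}'))
print(guess_format("<a><b>1</b></a>"))
print(guess_format("status: green"))
```

Full parsing is used here instead of checking the first character because invalid HTML also starts with "<" but should not be handed to xpath.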

#5 Updated by Luke Murphey over 8 years ago

Not sure how to output raw data and also define the sourcetype and index.

#6 Updated by Luke Murphey over 8 years ago

I currently have to strip endlines. To include endlines, I would need to configure better line breaking (http://docs.splunk.com/Documentation/Splunk/6.2.2/Data/Indexmulti-lineevents).
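
For illustration, stripping endlines could be done along these lines. This is a sketch, not the app's actual code, and the function name strip_endlines is hypothetical:

```python
def strip_endlines(content):
    """Collapse multi-line content into a single line (hypothetical helper)."""
    # str.splitlines() splits on \n, \r\n, and \r (plus a few other line
    # boundary characters). Blank lines are dropped and the remaining
    # lines are trimmed and joined with single spaces.
    lines = (line.strip() for line in content.splitlines())
    return " ".join(line for line in lines if line)


flattened = strip_endlines("line one\r\nline two\n\nline three\n")
print(flattened)  # line one line two line three
```

The trade-off is that the original line structure is lost, which is why keeping endlines would need line-breaking configuration instead.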

#7 Updated by Luke Murphey over 8 years ago

  • Status changed from New to Closed

#8 Updated by Luke Murphey almost 8 years ago
