
Feature #1882

Restrict inputs to HTTPS sites if on cloud

Added by Luke Murphey over 7 years ago. Updated over 7 years ago.

Status: Closed
Priority: Normal
Assignee: Luke Murphey
Category: Input: Web Spider
Target version:
Start date: 05/25/2017
Due date:
% Done: 100%

Associated revisions

Revision 444 (diff)
Added by lukemurphey over 7 years ago

Restricting access on Splunk Cloud to HTTPS

Reference #1882

Revision 446 (diff)
Added by luke.murphey over 7 years ago

Changes examples to use HTTPS

Reference #1882

Revision 447 (diff)
Added by luke.murphey over 7 years ago

Making icon call use HTTPS

Reference #1882

Revision 448 (diff)
Added by lukemurphey over 7 years ago

Making the web_scrape command ensure that connections use HTTPS on Cloud

Reference #1882

Revision 450 (diff)
Added by lukemurphey over 7 years ago

Making sure link extraction requires HTTPS on Cloud

Reference #1882

History

#1 Updated by Luke Murphey over 7 years ago

  • Status changed from New to In Progress
  • Assignee set to Luke Murphey

#2 Updated by Luke Murphey over 7 years ago

To update:
  • [done] Mod input editor
  • [done] Modular input code
  • [done] Wizard view
  • [done] Preview controller
  • [done] Search command
  • [done] Spider link extraction

#3 Updated by Luke Murphey over 7 years ago

For some reason the endpoint isn't showing up in SplunkWeb.

Observations:

#5 Updated by Luke Murphey over 7 years ago

  • % Done changed from 0 to 80

#6 Updated by Luke Murphey over 7 years ago

Weird, the non-__raw endpoint works on Mac and 6.6.0.

#7 Updated by Luke Murphey over 7 years ago

I set the URL form field with the following to force the call to attempt to scrape the page:

document.getElementById('inputURL').value = "http://textcritical.net" 

#8 Updated by Luke Murphey over 7 years ago

I have to update several calls to pass the https_only parameter to:

  • scrape_page
  • get_result_single
  • extract_links
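A minimal sketch of threading the https_only flag through the calls listed above. The function names match the list, but the signatures and the link_extractor callback are assumptions for illustration, not the app's actual API:

```python
# Hypothetical sketch: passing https_only through the call chain so that
# non-HTTPS links are dropped when running on Splunk Cloud.

def extract_links(links, https_only=False):
    """On Splunk Cloud (https_only=True), keep only HTTPS links."""
    if https_only:
        return [link for link in links if link.lower().startswith("https://")]
    return list(links)

def scrape_page(url, link_extractor, https_only=False):
    """Fetch a page and return its links, filtered when https_only is set."""
    # link_extractor stands in for the real fetch-and-parse step
    return extract_links(link_extractor(url), https_only=https_only)
```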

#9 Updated by Luke Murphey over 7 years ago

Tested with:

| webscrape selector="h3" url="https://www.reddit.com/r/popular/" page_limit=50 url_filter="*" depth_limit=25 empty_matches=0

#10 Updated by Luke Murphey over 7 years ago

I want to make tests for this.

To do this, I need:

  • decorator for run_only_on_cloud
  • decorator for run_only_on_enterprise
  • Tests for Cloud:
    • Scrape page doesn't extract non-HTTPS links
    • Controller doesn't extract non-HTTPS links
    • Scrape page won't scan non-HTTPS links
    • Controller won't scan non-HTTPS links
    • Wizard: rejects non-HTTPS
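The two decorators above could be built on unittest's skip machinery. A sketch, assuming the test harness signals the environment through an environment variable (the RUN_ENV name is an assumption):

```python
import os
import unittest

def running_on_cloud():
    # Assumption: the test harness exports RUN_ENV=cloud when testing
    # against Splunk Cloud.
    return os.environ.get("RUN_ENV") == "cloud"

def run_only_on_cloud(func):
    """Skip the test unless running against Splunk Cloud."""
    return unittest.skipUnless(running_on_cloud(), "Cloud-only test")(func)

def run_only_on_enterprise(func):
    """Skip the test unless running against Splunk Enterprise."""
    return unittest.skipIf(running_on_cloud(), "Enterprise-only test")(func)
```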

#11 Updated by Luke Murphey over 7 years ago

I wonder if I should change scrape_page to throw an exception if the URL provided is not HTTPS. Currently, https_only applies only to link extraction.
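The alternative being considered could look like the following sketch; the exception class and helper name are assumptions, not the app's actual code:

```python
class HTTPSRequiredError(ValueError):
    """Raised when a non-HTTPS URL is used while HTTPS is required."""

def verify_https(url, https_only):
    """Raise if https_only is set and the URL is not HTTPS; else return it."""
    if https_only and not url.lower().startswith("https://"):
        raise HTTPSRequiredError("HTTPS is required on Splunk Cloud: %s" % url)
    return url
```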

#12 Updated by Luke Murphey over 7 years ago

I recall now why I didn't add this into scrape_page: it doesn't have the session key to look up whether the host is in the Cloud.
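The session key matters because detecting Cloud typically means querying splunkd's server/info REST endpoint, which requires an authenticated call; Splunk Cloud instances report instance_type=cloud there. A sketch of such a check (the splunkd URI default and helper names are assumptions):

```python
import json
import urllib.request

def parse_instance_type(server_info_body):
    """Pull instance_type out of a server/info JSON response body."""
    entry = json.loads(server_info_body)["entry"][0]
    return entry["content"].get("instance_type")

def is_on_cloud(session_key, splunkd_uri="https://127.0.0.1:8089"):
    """Return True if splunkd reports instance_type=cloud."""
    request = urllib.request.Request(
        splunkd_uri + "/services/server/info?output_mode=json",
        headers={"Authorization": "Splunk " + session_key},
    )
    with urllib.request.urlopen(request) as response:
        return parse_instance_type(response.read()) == "cloud"
```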

#13 Updated by Luke Murphey over 7 years ago

  • Status changed from In Progress to Closed
  • % Done changed from 80 to 100
