Bug #1739

Figure why selector doesn't match

Added by Luke Murphey almost 8 years ago. Updated over 7 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: Input: Web Spider
Target version:
Start date: 02/11/2017
Due date:
% Done: 100%


Description

Try a selector of "#siteTable .title" on https://www.reddit.com/r/Splunk/new/. The selector shows matches when tested, and it also matches when tried manually in a browser. However, the search produces no results.

Associated revisions

Revision 357
Added by lukemurphey over 7 years ago

Fixing no matches on some IDs

Fixing no matches on selectors using non-lowercase IDs

Reference #1739

History

#1 Updated by Luke Murphey almost 8 years ago

Repro:

| webscrape selector="#siteTable .title" url="https://www.reddit.com/r/Splunk/new/" 

This doesn't work either:

| webscrape selector="#siteTable .title" url="https://www.reddit.com/r/Splunk/new/" browser="firefox" 

#2 Updated by Luke Murphey over 7 years ago

To check:

  1. Is the selector being passed through?
  2. Can this be reproduced with a simple Python script?
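Item 2 can be approximated without Splunk or the network. The sketch below is only a stand-in: it uses the standard library's `xml.etree.ElementTree` instead of lxml/cssselect, and a small hypothetical page (`PAGE`) instead of the live subreddit. It relies on the fact that cssselect turns an ID selector such as `#siteTable` into an exact, case-sensitive XPath comparison on `@id`. `count_matches` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the subreddit page: a container with a
# mixed-case ID holding elements with the "title" class.
PAGE = """
<html><body>
  <div id="siteTable">
    <p class="title">First post</p>
    <p class="title">Second post</p>
  </div>
</body></html>
""".strip()


def count_matches(root: ET.Element, element_id: str, class_name: str) -> int:
    """Count elements whose class attribute is exactly `class_name` beneath
    the element whose id is `element_id`. The [@id='...'] predicate is an
    exact, case-sensitive comparison, like the XPath cssselect generates
    for an ID selector."""
    container = root.find(f".//*[@id='{element_id}']")
    if container is None:
        return 0
    return len(container.findall(f".//*[@class='{class_name}']"))


root = ET.fromstring(PAGE)
selector_id = "siteTable"
print(count_matches(root, selector_id, "title"))          # 2
print(count_matches(root, selector_id.lower(), "title"))  # 0
```

If the ID as written finds the elements and its lowercased form finds none, case folding of the ID is enough to produce empty results.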

#3 Updated by Luke Murphey over 7 years ago

I cannot match on #siteTable.

#4 Updated by Luke Murphey over 7 years ago

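The issue is that I am making the selector lowercase in the SelectorField class. I did this because CSS selectors ought to match without respect to case ("DIV" and "div" should match identically). However, only element names are case-insensitive in HTML; ID and class values are not. Lowercasing the whole selector turns "#siteTable" into "#sitetable", which no longer matches the element's id attribute.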
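HTML element names are case-insensitive, but ID and class values are compared case-sensitively, so one possible approach is to lowercase only the element (type) names. This is a sketch under that assumption, not necessarily what revision 357 does: `lowercase_type_selectors` is a hypothetical helper, and its regex tokenizer is a simplification that does not handle escaped characters.

```python
import re

# Tokens that must keep their case: quoted strings, attribute selectors in
# brackets, and ID/class tokens. Any other identifier (element names,
# pseudo-class names) is lowercased.
_TOKEN = re.compile(
    r"""(?P<keep>"[^"]*"|'[^']*'|\[[^\]]*\]|[#.][\w-]+)"""
    r"|(?P<name>[A-Za-z][\w-]*)"
)


def lowercase_type_selectors(selector: str) -> str:
    """Hypothetical helper: lowercase element names in a CSS selector while
    leaving IDs, classes and attribute selectors exactly as written.
    Simplification: escaped characters are not handled."""

    def repl(match):
        if match.group("keep") is not None:
            return match.group("keep")
        return match.group("name").lower()

    return _TOKEN.sub(repl, selector)


print(lowercase_type_selectors("#siteTable .title"))      # #siteTable .title
print(lowercase_type_selectors("DIV#siteTable P.Title"))  # div#siteTable p.Title
```

With this approach "#siteTable .title" reaches the matcher unchanged, while "DIV" still becomes "div". It may also be worth checking whether cssselect's `HTMLTranslator`, which lxml.html uses for `cssselect()`, already lowercases element names; if so, no manual lowercasing would be needed.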

See http://stackoverflow.com/questions/1734125/is-it-possible-for-lxml-to-work-in-a-case-insensitive-manner

#5 Updated by Luke Murphey over 7 years ago

Some options:

  1. See if a newer version of cssselect addresses this
  2. Use BeautifulSoup
  3. Change the selector so that it matches both the given case and the lowercase form
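Option 3 relies on CSS selector groups: a comma-separated list matches any element that matches at least one of its selectors. A minimal sketch, assuming the combined string is then passed to the same matcher as before; `match_given_and_lowercase` is a hypothetical helper.

```python
def match_given_and_lowercase(selector: str) -> str:
    """Hypothetical helper for option 3: join the selector as given and its
    lowercase form into one comma-separated selector group, so an element
    matched by either form is selected."""
    lowered = selector.lower()
    if lowered == selector:
        # Already lowercase: nothing to add.
        return selector
    return f"{selector}, {lowered}"


print(match_given_and_lowercase("#siteTable .title"))
# #siteTable .title, #sitetable .title
```

One drawback: the group still contains the fully lowercased form, so it can also pick up elements whose IDs differ from the intended one only by case.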

#7 Updated by Luke Murphey over 7 years ago

  • % Done changed from 0 to 30

#8 Updated by Luke Murphey over 7 years ago

cssselect 1.0.1 does not change the behavior.

#9 Updated by Luke Murphey over 7 years ago

  • Status changed from New to Closed
  • % Done changed from 30 to 100
