Feature #539

Find similar forms when expanding word into related forms

Added by Anonymous about 11 years ago. Updated about 11 years ago.

Status: Closed
Priority: Normal
Assignee:
Target version:
Start date: 02/24/2013
Due date:
% Done: 100%


Description

The search ought to find variations that ignore diacritical marks when a form cannot be found. This is useful because some searches fail to match even though they seem like they ought to.

For example, Josephi Vita 22:110 includes the word πιστὸς yet a search for "work:josephi-vita pisto)s" fails. A search for "work:josephi-vita no_diacritics:pistos" succeeds. Note that a search for "work:josephi-vita pisto\s" also fails.
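
A minimal sketch of the fallback described above. It assumes the index has a `no_diacritics` field (as in the query that succeeds) and that search terms are typed in Beta Code. The helper names are hypothetical. The marks stripped are the standard Beta Code ones for breathings, accents, diaeresis and iota subscript.

```python
import unicodedata

# Standard Beta Code marks: ) ( breathings, / \ = accents, + diaeresis, | iota subscript.
BETA_CODE_MARKS = set(")(/\\=+|")


def strip_beta_code(term: str) -> str:
    """Remove Beta Code diacritic marks, e.g. 'pisto)s' -> 'pistos'."""
    return "".join(ch for ch in term if ch not in BETA_CODE_MARKS)


def strip_greek_diacritics(word: str) -> str:
    """Remove combining marks from Unicode Greek, e.g. 'πιστὸς' -> 'πιστος'."""
    decomposed = unicodedata.normalize("NFD", word)  # split letters from accents
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fallback_query(work: str, term: str) -> str:
    """Hypothetical helper: the diacritic-insensitive query to try when the exact form fails."""
    return f"work:{work} no_diacritics:{strip_beta_code(term)}"
```

For example, `fallback_query("josephi-vita", "pisto)s")` yields `work:josephi-vita no_diacritics:pistos`, the query reported above as succeeding.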

History

#1 Updated by Luke Murphey about 11 years ago

It looks like the SimpleAnalyzer is not allowing the parentheses through.

#2 Updated by Luke Murphey about 11 years ago

It looks like the following are not considered part of the word (per the SimpleAnalyzer): ( ) \ + &

The following are considered part of the word: / = |
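
Whoosh's `SimpleAnalyzer` is a regular-expression tokenizer followed by a lowercase filter, and it accepts an `expression` argument, so the token pattern decides which characters count as part of a word. The sketch below illustrates this with plain `re`. `OBSERVED` is a hypothetical pattern chosen to reproduce the behaviour noted above (it is not Whoosh's default), and `BETA_CODE` is one possible pattern that keeps every Beta Code mark inside a token.

```python
import re

# Hypothetical: reproduces the observed split. / = | stay in the token; ( ) \ + & separate words.
OBSERVED = re.compile(r"[\w/=|]+")

# Hypothetical alternative: treat all Beta Code marks (and * for capitals) as word characters.
BETA_CODE = re.compile(r"[\w()/\\=+|*]+")


def tokenize(text: str, pattern: re.Pattern = OBSERVED) -> list[str]:
    """Emulate a regex tokenizer followed by a lowercase filter."""
    return [match.group(0).lower() for match in pattern.finditer(text)]
```

With `OBSERVED`, `pisto)s` is split into `pisto` and `s`, which would explain why the indexed form never matches; with `BETA_CODE` it stays a single token.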

#3 Updated by Luke Murphey about 11 years ago

For some reason, a request for variations is repeated 48 times. Not sure why this is happening.

#4 Updated by Luke Murphey about 11 years ago

It looks like multiple calls are being made to GreekVariations._words; the constructor of GreekVariations is called twice. I would rather figure out why the variations class is being instantiated more than once, but from what I can tell this happens inside the Whoosh classes, which I cannot change. As a workaround I added a cache for the results, and this works well.
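
A minimal sketch of such a cache, assuming the expensive step depends only on the word. `greek_variations` and the `FORMS` table are hypothetical stand-ins for `GreekVariations._words` and its database lookup. Keeping the cache at module level, rather than on the instance, means it still helps when the constructor runs more than once.

```python
from functools import lru_cache

# Hypothetical lookup table standing in for the morphology database (Beta Code forms).
FORMS = {"pistos": ("pisto/s", "pistou=", "pistw=|", "pisto/n")}

# Counts how often the expensive lookup really runs, to make the cache's effect visible.
CALLS = {"count": 0}


@lru_cache(maxsize=1024)
def greek_variations(word: str) -> tuple[str, ...]:
    """Hypothetical stand-in for GreekVariations._words; results are memoized per word."""
    CALLS["count"] += 1
    # Return an immutable tuple so the cached value cannot be modified by callers.
    return FORMS.get(word, (word,))
```

Calling `greek_variations("pistos")` twice performs the lookup only once; the second call is served from the cache.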

#5 Updated by Luke Murphey about 11 years ago

I fixed the performance problem caused by multiple calls in #540.

#6 Updated by Luke Murphey about 11 years ago

  • % Done changed from 0 to 20

#7 Updated by Luke Murphey about 11 years ago

I may be able to reduce the number of database calls by not loading the verse information for search results that are not on the page being shown. From what I can tell, all verses are being loaded even though only a subset is shown on the results page.
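
Whoosh's `Searcher.search_page()` can return a single page of hits. The sketch below shows the same idea in plain Python: slice the hits to the requested page first, then do the per-hit database work only for that slice. `load_verse` is a hypothetical callback standing in for the verse lookup.

```python
from typing import Callable, List, Sequence, TypeVar

Hit = TypeVar("Hit")
Verse = TypeVar("Verse")


def page_slice(hits: Sequence[Hit], page: int, page_len: int) -> Sequence[Hit]:
    """Return the hits for a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    start = (page - 1) * page_len
    return hits[start:start + page_len]


def verses_for_page(hits: Sequence[Hit], page: int, page_len: int,
                    load_verse: Callable[[Hit], Verse]) -> List[Verse]:
    """Call load_verse (one database call per hit) only for hits on the requested page."""
    return [load_verse(hit) for hit in page_slice(hits, page, page_len)]
```

With 25 hits and 10 per page, showing page 2 triggers 10 verse lookups instead of 25.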

#8 Updated by Luke Murphey about 11 years ago

  • Assignee changed from Anonymous to Luke Murphey

#9 Updated by Luke Murphey about 11 years ago

  • Status changed from New to In Progress

#10 Updated by Luke Murphey about 11 years ago

Rebuilding the indexes with SimpleAnalyzer allowed slashes to work provided they were escaped.

#11 Updated by Luke Murphey about 11 years ago

All of the characters seem to come through now with the exception of parentheses, asterisks, and ampersands. I believe these are being filtered because Whoosh uses them as special characters in its query syntax.
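
A sketch of escaping a term before it is placed into a query string. It assumes the query parser accepts a backslash escape, as it appears to for slashes. The `RESERVED` set is a hypothetical list built from the characters mentioned in this issue, not Whoosh's documented syntax, and should be adjusted to the parser actually in use.

```python
# Hypothetical set of characters the query parser treats specially
# (grouping, wildcard, operator, and the escape character itself).
RESERVED = set("()*&/\\")


def escape_term(term: str) -> str:
    """Prefix each reserved character with a backslash so the parser reads it literally."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in term)
```

For example, `escape_term("pisto)s")` produces `pisto\)s`, while a term with no reserved characters is returned unchanged.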

#12 Updated by Luke Murphey about 11 years ago

  • % Done changed from 20 to 50

#13 Updated by Luke Murphey about 11 years ago

  • Status changed from In Progress to Closed
  • % Done changed from 50 to 100
