switch to another FTS provider #20

Closed
opened 9 months ago by René Wagner · 1 comments
Owner

The current whoosh-based FTS implementation has several issues, some caused by whoosh itself and some by our design (#17, #13, #8), and they cause serious trouble.

We need to evaluate if we can switch to another FTS implementation.

The following goals should be achieved:

  • robust FTS with search results comparable to whoosh
  • no separate index step: crawled content should be immediately available for searches
  • reduced maintenance overhead (currently mostly caused by whoosh)

SQLite

Pro

  • embedded, no additional software that needs to be maintained/backed up
  • no separate index step necessary

Con

  • no concurrent writes -> no parallelisation of crawling
  • first tests show inferior performance and search results compared to whoosh
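
As a rough illustration of the "no separate index step" point, the sketch below keeps an FTS5 index in sync with the crawl table through triggers, so a row is searchable as soon as it is inserted. It assumes an SQLite build with FTS5 enabled; the `page` table is reduced to three columns, and the trigger names and the `search` helper are hypothetical, not taken from the project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (
    id INTEGER PRIMARY KEY,
    url TEXT,
    content TEXT
);

-- External-content FTS5 table: the index points at rows of page
-- instead of storing a second copy of the text.
CREATE VIRTUAL TABLE fts_data USING fts5(
    url, content,
    content='page', content_rowid='id',
    tokenize='porter unicode61'
);

-- Triggers (hypothetical names) keep the index current on every write.
CREATE TRIGGER page_ai AFTER INSERT ON page BEGIN
    INSERT INTO fts_data(rowid, url, content)
    VALUES (new.id, new.url, new.content);
END;

CREATE TRIGGER page_ad AFTER DELETE ON page BEGIN
    INSERT INTO fts_data(fts_data, rowid, url, content)
    VALUES ('delete', old.id, old.url, old.content);
END;

CREATE TRIGGER page_au AFTER UPDATE ON page BEGIN
    INSERT INTO fts_data(fts_data, rowid, url, content)
    VALUES ('delete', old.id, old.url, old.content);
    INSERT INTO fts_data(rowid, url, content)
    VALUES (new.id, new.url, new.content);
END;
""")


def search(conn, term):
    """Return the ids of matching pages, best match first."""
    rows = conn.execute(
        "SELECT rowid FROM fts_data WHERE fts_data MATCH ? ORDER BY rank",
        (term,),
    )
    return [row[0] for row in rows]


# A crawled page is searchable right after the insert, with no index run.
conn.execute(
    "INSERT INTO page (url, content) VALUES (?, ?)",
    ("gemini://example.org/", "a capsule about diohsc"),
)
hits = search(conn, "diohsc")
```

This does not address the concurrent-write limitation listed above; all writes still go through a single connection at a time.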

PostgreSQL/MariaDB

Pro

  • concurrent writes, which would fix #19
  • built-in FTS

Con

  • additional software to administer, plus a separate backup step

ElasticSearch

Pro

  • ???

Con

  • ???
René Wagner added the help wanted label 9 months ago
Poster
Owner

SQLite

Some more tests with the SQLite FTS:

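-- Create the full-text index; 'porter unicode61' applies Porter stemming on top of the unicode61 tokenizer.
CREATE VIRTUAL TABLE fts_data
USING fts5(id, url, content_type, charset, content, tokenize = 'porter unicode61');

-- Copy the already crawled pages into the index.
INSERT INTO fts_data SELECT id, url, content_type, charset, content FROM page;

-- snippet() excerpts column 4 (content); bm25() takes one weight per column, and lower scores rank better.
SELECT snippet(fts_data, 4, '', '', '...', 100), bm25(fts_data, 0.0, 2.0, 0.5, 0.5, 1.0),
page.* FROM fts_data
LEFT JOIN page ON page.id = fts_data.id
WHERE fts_data MATCH 'diohsc' ORDER BY 2
LIMIT 10, 20;  -- skip the first 10 rows, return the next 20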

Performance doesn't seem to be much of an issue, but the search results are: the query part needs to be tuned.
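
One possible starting point for tuning the query part is sketched below. It quotes user input so that FTS5 operators and stray quotes cannot break the MATCH expression, and it exposes the bm25 column weights in one place. The two-column table, the weights, and the helper names `fts5_quote` and `search` are assumptions for illustration only; it also assumes an SQLite build with FTS5 enabled.

```python
import sqlite3


def fts5_quote(user_input: str) -> str:
    """Wrap each whitespace-separated term in an FTS5 string literal.

    Embedded double quotes are escaped by doubling them. Terms placed
    next to each other are combined with an implicit AND by FTS5.
    """
    terms = user_input.split()
    return " ".join('"' + term.replace('"', '""') + '"' for term in terms)


def search(conn, user_input, limit=20, offset=0):
    """Return (url, excerpt, score) rows, best match first."""
    query = fts5_quote(user_input)
    if not query:
        return []
    sql = """
        SELECT url,
               snippet(fts_data, 1, '[', ']', '...', 16) AS excerpt,
               bm25(fts_data, 2.0, 1.0) AS score  -- assumed weights: url, content
        FROM fts_data
        WHERE fts_data MATCH ?
        ORDER BY score  -- bm25() is lower for better matches
        LIMIT ? OFFSET ?
    """
    return conn.execute(sql, (query, limit, offset)).fetchall()


# Small in-memory index, reduced to two columns for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE fts_data "
    "USING fts5(url, content, tokenize = 'porter unicode61')"
)
conn.executemany(
    "INSERT INTO fts_data (url, content) VALUES (?, ?)",
    [
        ("gemini://example.org/clients", "notes about diohsc, a gemini client"),
        ("gemini://example.org/other", "an unrelated page"),
    ],
)
results = search(conn, "diohsc")
```

Quoting every term disables prefix queries and the boolean operators, so whether this trade-off is acceptable depends on which query syntax the search frontend is meant to expose.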

René Wagner added this to the (deleted) milestone 7 months ago
René Wagner closed this issue 7 months ago