302 Commits (e0fba80405ff7ea29d1680098031eaee3e165628)

Author SHA1 Message Date
Natalie Pendragon e61d608c8e [crawl] Stop storing responses in GeminiResource objects 1 year ago
Natalie Pendragon 0a9ac040af Bump version of gusmobile dependency 1 year ago
Natalie Pendragon 009873a26d [crawl] Handle url fragments 1 year ago
Natalie Pendragon 59db145095 [crawl] Fix handling of robots.txt 1 year ago
Natalie Pendragon 72feabcfe8 [crawl] Exclude "rss.xml" paths 1 year ago
Natalie Pendragon a4864548ca [crawl] Optimize the index after crawls 1 year ago
Natalie Pendragon 96731d16d3 [serve] Update highlight scoring and rendering 1 year ago
Natalie Pendragon a7ea734248 [crawl] pickle and unpickle the robot_file_map 1 year ago
Natalie Pendragon cde47da62c Improve handling of unquoting URLs 1 year ago
Natalie Pendragon e3f879df84 [serve] Update documentation on filters 1 year ago
Natalie Pendragon 8219abb97c Update locked version of Gusmobile 1 year ago
Natalie Pendragon d96abf7055 [crawl] Add domain field to index 1 year ago
Natalie Pendragon 941b086b7d Remove outdated TODO 1 year ago
Natalie Pendragon 7d609838de [serve] Update formatting of statistics page 1 year ago
Natalie Pendragon d07bb33e67 [serve] Fix bug with first/next/previous page link formatting 1 year ago
Natalie Pendragon 25713d69d8 [serve] Only highlight nice content types in search results 1 year ago
Natalie Pendragon e4c042c330 [crawl] Make path exclusions more robust 1 year ago
Natalie Pendragon 1fedfc3bc5 [serve] Remove broken URL count from stats page 1 year ago
Natalie Pendragon ab9d86ca3d Add houston to seeds, but ignore its search results 1 year ago
Natalie Pendragon fbc302284a [crawl] [serve] Add search highlights 1 year ago
Natalie Pendragon dd1c2ffdef [crawl] Index massaged URLs 1 year ago
Natalie Pendragon 78ca450d9f [crawl] Handle trailing slash redirects better 1 year ago
Natalie Pendragon 161252e750 [serve] Update the loading of statistics 1 year ago
Natalie Pendragon 8994b21fea [crawl] Fix lots of bugs 1 year ago
Natalie Pendragon 97b15eaa87 [crawl] Crawl the seed requests after the main crawl 1 year ago
Natalie Pendragon 22d4dcaa8c [crawl] Fix bug in relative URL parsing 1 year ago
Natalie Pendragon f10f1fc9a0 [crawl] Fix bug with computing full_qualified_urls 1 year ago
Natalie Pendragon 484ef90979 [crawl] Use standardized print_index_statistics 1 year ago
Natalie Pendragon c1c29b4a74 [no-op] Clean up comments in whoosh_extensions 1 year ago
Natalie Pendragon 9ffc427a6c [serve] Crawl and index seed requests immediately 1 year ago
Natalie Pendragon 8bcf71965e Update README TODOs 1 year ago
Natalie Pendragon 53ce6aa505 [crawl] Implement GeminiResource 1 year ago
Natalie Pendragon 4b123933cf [crawl] Exclude GUS search result pages from crawl 1 year ago
Natalie Pendragon 7d7422b975 [crawl] Add seeds 1 year ago
Natalie Pendragon b0e990ca13 [crawl] Add jan.bio to seeds 1 year ago
Natalie Pendragon 4c6e886c7f Add index.bak to gitignore 1 year ago
Natalie Pendragon 7ce414234e [crawl] Create non-destructive crawl option 1 year ago
Natalie Pendragon c2dd86ae92 [serve] Improve documentation on content type queries 1 year ago
Natalie Pendragon e49e877eb7 [serve] Add verbose mode 1 year ago
Natalie Pendragon b2026faac6 [serve] Update how num_results is displayed 1 year ago
Natalie Pendragon 4136079b4e [serve] Improve search result data type 1 year ago
Natalie Pendragon 20a5cb896d [crawl] [serve] Add more statistics 1 year ago
Natalie Pendragon d16c11de01 [crawl] Update seeds 1 year ago
Natalie Pendragon ec6401a523 [crawl] Update seeds 1 year ago
Natalie Pendragon 5ff76ac64e Update and reorder TODOs 1 year ago
Natalie Pendragon 32d12c4c5e [crawl] [no-op] Add a line after backup operation 1 year ago
Natalie Pendragon 2c002c6d76 Update statistics TODOs 1 year ago
Natalie Pendragon c7905a645a [crawl] Add new seed 1 year ago
Natalie Pendragon a884649816 [serve] Update statistics copy slightly 1 year ago
Natalie Pendragon 087e227c67 [serve] Implement paging 1 year ago