source of - the search provider for gemini space
Natalie Pendragon 5d7627a3f2 [serve] Upgrade to Jetforce v0.6.0 1 year ago

Gemini Universal Search (GUS)

Roadmap / TODOs

  • log output of crawl: I see some errors fly by, and it would be nice to be able to review them later and investigate.
  • get crawl to run on a schedule with systemd
  • add more statistics: these could go on the index statistics page and, besides using the index itself, could also draw on the Jetforce logs.
    • server uptime (from indexes)
    • num new servers per week/month (from indexes)
    • num GUS queries per day (from server logs)
    • most common queries (not sure about this one) (from server logs)
    • num cross-domain redirects
    • num domains with robots
  • add tests: there aren't any yet!
  • add functionality to create a mock index: this would be useful for local hacking, so one does not need to perform a real scrape of Geminispace first.
  • exclude raw-text links: the Gemini spec now defines preformatted ("raw-text") blocks, toggled by lines that begin with three backticks, so the extract_gemini_links function should be refactored to exclude any links found within such a block.
  • track number of inbound links
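
As a rough illustration of the "exclude raw-text links" item above, here is a minimal sketch of link extraction that skips preformatted blocks. It is not the existing extract_gemini_links implementation: the function name extract_links_outside_preformatted and the regular expression are hypothetical, and the sketch assumes only what the Gemini spec says about text/gemini, namely that link lines start with "=>" and that a line starting with three backticks toggles preformatted mode.

```python
import re

# Assumption: a text/gemini link line is "=>", optional whitespace, then a URL.
LINK_LINE = re.compile(r"^=>[ \t]*(\S+)")

# A line starting with three backticks toggles preformatted mode on or off.
PREFORMAT_TOGGLE = "`" * 3


def extract_links_outside_preformatted(text):
    """Return link targets found in text/gemini content, ignoring any
    link-like lines that sit inside a preformatted block.

    Hypothetical helper for illustration; not part of the GUS codebase.
    """
    links = []
    in_preformatted = False
    for line in text.splitlines():
        if line.startswith(PREFORMAT_TOGGLE):
            # Toggle lines themselves never carry links.
            in_preformatted = not in_preformatted
            continue
        if in_preformatted:
            # Everything inside the block is literal text, not markup.
            continue
        match = LINK_LINE.match(line)
        if match:
            links.append(match.group(1))
    return links
```

Tracking the preformatted state line by line keeps the change small: the existing link-matching logic could stay as it is and simply be skipped while the flag is set.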