Gemini Universal Search (GUS)
Roadmap / TODOs
- log output of crawl: I see some errors fly by during a crawl, and it would be nice to be able to review and investigate them later.
- get crawl to run on a schedule with systemd
- add more statistics: this could go in the index statistics page, and, in addition to using the index itself, could also pull information from the jetforce logs.
  - server uptime (from indexes)
  - num new servers per week/month (from indexes)
  - num GUS queries per day (from server logs)
  - most common queries (not sure about this one) (from server logs)
  - num cross-domain redirects
  - num domains with robots
- add tests: there aren't any yet!
- add functionality to create a mock index: this would be useful for local hacking on serve.py, so one does not need to perform a real scrape of Geminispace to do said hacking.
- exclude raw-text links: the Gemini spec defines preformatted text blocks (toggled on and off by lines that start with three backticks), so the extract_gemini_links function should be refactored to exclude any links found within such a block.
- track number of inbound links
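
As a starting point for the "exclude raw-text links" item above, here is a minimal sketch of link extraction that skips preformatted blocks. It is not the existing extract_gemini_links function: the name `extract_links_skipping_preformatted`, its signature, and the sample document are all hypothetical, and it assumes the page body is already available as a decoded string. It relies only on two line types from the text/gemini format: link lines start with `=>`, and a line starting with three backticks toggles preformatted mode.

```python
# Hypothetical sketch, not GUS's actual extract_gemini_links implementation.
FENCE = "`" * 3  # three backticks: the text/gemini preformatted toggle marker


def extract_links_skipping_preformatted(text):
    """Return link targets from text/gemini content, ignoring preformatted blocks."""
    links = []
    in_preformatted = False
    for line in text.splitlines():
        # A line starting with three backticks switches preformatted mode on or off.
        if line.startswith(FENCE):
            in_preformatted = not in_preformatted
            continue
        # Anything inside a preformatted block is literal text, never a link.
        if in_preformatted:
            continue
        if line.startswith("=>"):
            # The URL is the first whitespace-separated token after "=>";
            # whatever follows it is the optional human-readable label.
            parts = line[2:].split(maxsplit=1)
            if parts:
                links.append(parts[0])
    return links


sample = "\n".join([
    "# Example capsule",
    "=> gemini://example.org/ Home",
    FENCE,
    "=> gemini://example.org/ignored looks like a link, but is preformatted",
    FENCE,
    "=>gemini://example.org/docs",
])

print(extract_links_skipping_preformatted(sample))
# prints ['gemini://example.org/', 'gemini://example.org/docs']
```

The sketch returns link targets exactly as written, so relative URLs would still need to be resolved against the page URL by the caller.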