Note that running GUS locally currently requires a full crawl of Geminispace. With little content and few people hacking on this, that's probably fine, but we should keep tabs on it to make sure we're kind and respectful to content and server owners (I think the solution is to add a way to create a mock index sooner rather than later).
- Get Python and Poetry.
- Generate a local Geminispace index with `poetry run crawl --destructive`.
- Serve GUS locally with `poetry run serve`.
At this point you should be able to interact with a running local version of GUS, modulo perhaps some mucking about with SSL (left as an exercise to the reader, because I am not at all an expert in that stuff :).
Please send patches to ~email@example.com.
Roadmap / TODOs
- log crawl output: I see some errors fly by, and it would be nice to be able to review them later and investigate.
- get the crawl to run on a schedule with systemd
- add more statistics: these could go on the index statistics page and, in addition to using the index itself, could also pull information from the jetforce logs.
  - server uptime (from indexes)
  - number of new servers per week/month (from indexes)
  - number of GUS queries per day (from server logs)
  - most common queries (not sure about this one) (from server logs)
  - number of cross-domain redirects
  - number of domains with a robots.txt
- add tests: there aren't any yet!
- add functionality to create a mock index: this would be useful for local hacking on serve.py, so one does not need to perform a real scrape of Geminispace to do said hacking.
- exclude links in preformatted (raw-text) blocks: the Gemini spec has a preformatted-text block construct, so the `extract_gemini_links` function should be refactored to ignore any links found within such a block.
- track number of inbound links
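For the crawl-logging item, Python's standard `logging` module can write to a file as well as the console, so errors that fly by can be reviewed after a run. This is a minimal sketch assuming the crawler logs through a single logger. The logger name `gus.crawl`, the default file name `crawl.log`, and the `crawl_page` helper are hypothetical, not part of GUS.

```python
import logging


def configure_crawl_logging(log_path: str = "crawl.log") -> logging.Logger:
    """Send crawl log records to both the console and a file for later review."""
    logger = logging.getLogger("gus.crawl")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    if logger.handlers:
        # Already configured; avoid attaching duplicate handlers.
        return logger

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    for handler in (logging.FileHandler(log_path, encoding="utf-8"),
                    logging.StreamHandler()):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


def crawl_page(logger: logging.Logger, url: str) -> None:
    """Hypothetical stand-in for fetching one page; always fails here."""
    try:
        raise ConnectionError("connection refused")  # simulated fetch error
    except ConnectionError:
        # logger.exception logs at ERROR level and appends the traceback.
        logger.exception("failed to crawl %s", url)
```

Calling `configure_crawl_logging()` once at the start of a crawl would leave a log file with timestamps and full tracebacks to investigate afterwards.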
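For the "new servers per week/month" statistic, one possible computation groups hosts by the ISO week in which they were first seen. This sketch assumes the indexes can provide a mapping from host to first-seen date; the `first_seen` mapping and the `new_servers_per_week` function are hypothetical, and GUS may not store this data today.

```python
from collections import Counter
from datetime import date


def new_servers_per_week(first_seen: dict[str, date]) -> Counter:
    """Count hosts by the ISO (year, week) in which they were first seen."""
    # isocalendar() yields (year, week, weekday); keep only (year, week).
    return Counter(tuple(d.isocalendar())[:2] for d in first_seen.values())


# Hypothetical first-seen dates, e.g. derived from successive crawl indexes.
first_seen = {
    "a.example": date(2020, 6, 1),
    "b.example": date(2020, 6, 3),
    "c.example": date(2020, 6, 10),
}
print(new_servers_per_week(first_seen))
# → Counter({(2020, 23): 2, (2020, 24): 1})
```

A per-month variant would only need to key on `(d.year, d.month)` instead.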
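For the preformatted-block item, a link extractor can track whether it is inside a preformatted block while scanning lines. This minimal sketch follows the gemtext rules that a line starting with three backticks toggles preformatted mode and that a link line starts with `=>`, followed by optional whitespace, the URL, and an optional label. The function name `extract_links_outside_preformatted` is hypothetical; it is not GUS's actual `extract_gemini_links`, and it returns raw link targets without resolving relative URLs.

```python
PREFORMAT_TOGGLE = "`" * 3  # three backticks toggle preformatted mode


def extract_links_outside_preformatted(gemtext: str) -> list[str]:
    """Return raw link targets from gemtext, ignoring preformatted blocks."""
    links = []
    preformatted = False
    for line in gemtext.splitlines():
        if line.startswith(PREFORMAT_TOGGLE):
            preformatted = not preformatted
            continue
        if preformatted:
            # Inside a preformatted block, "=>" is just literal text.
            continue
        if line.startswith("=>"):
            # Link line: "=>", optional whitespace, URL, optional label.
            parts = line[2:].strip().split(maxsplit=1)
            if parts:
                links.append(parts[0])
    return links


page = "\n".join([
    "# Example",
    "=> gemini://example.org/ Example",
    PREFORMAT_TOGGLE,
    "=> gemini://not-a-link.example/",
    PREFORMAT_TOGGLE,
    "=>gemini://example.org/docs",
])
print(extract_links_outside_preformatted(page))
# → ['gemini://example.org/', 'gemini://example.org/docs']
```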
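For tracking inbound links, one simple approach is to count, for each target URL, the distinct pages that link to it, ignoring self-links. This sketch assumes the crawl can yield `(source_url, target_url)` pairs; that pair format and the `count_inbound_links` function are hypothetical.

```python
from collections import Counter
from collections.abc import Iterable


def count_inbound_links(links: Iterable[tuple[str, str]]) -> Counter:
    """Count distinct source pages linking to each target URL."""
    # Deduplicate so a page linking to the same target twice counts once,
    # and drop pages that link to themselves.
    unique_pairs = {(src, dst) for src, dst in links if src != dst}
    return Counter(dst for _src, dst in unique_pairs)


crawled = [
    ("gemini://a.example/", "gemini://b.example/"),
    ("gemini://a.example/", "gemini://b.example/"),  # duplicate on same page
    ("gemini://c.example/", "gemini://b.example/"),
    ("gemini://b.example/", "gemini://b.example/"),  # self-link, ignored
    ("gemini://b.example/", "gemini://a.example/"),
]
counts = count_inbound_links(crawled)
print(counts["gemini://b.example/"])  # → 2
print(counts["gemini://a.example/"])  # → 1
```

Whether links from the same host should count as "inbound" is a design choice; comparing hostnames instead of full URLs would be a small change.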