
Gemini Universal Search (GUS)

Installation

  1. Install Python and Poetry
  2. Run: "poetry install"

Making an initial index

Make sure you have some Gemini URLs for testing that are nicely sandboxed, so that you avoid indexing huge parts of Gemini space.

  1. Create a "seed-requests.txt" file with your test Gemini URLs
  2. Run: "poetry run crawl -d"
  3. Run: "poetry run build_index -d"

Now you will have created a new directory; rename it to "index".
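Step 1 of the initial index can be scripted. A minimal sketch, assuming the crawler reads one gemini:// URL per line from seed-requests.txt (check the crawl source before relying on that format); the helper names and the example URLs are hypothetical.

```python
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical sandboxed test URLs: replace with capsules you control.
TEST_URLS = [
    "gemini://capsule.example/",
    "gemini://capsule.example/test/",
]


def seed_file_text(urls):
    """Validate URLs and join them one per line (assumed seed file format)."""
    for url in urls:
        parsed = urlparse(url)
        if parsed.scheme != "gemini" or not parsed.netloc:
            raise ValueError(f"not a gemini:// URL: {url!r}")
    return "\n".join(urls) + "\n"


def write_seed_file(urls, path="seed-requests.txt"):
    """Write the seed file to `path` and return it as a Path."""
    target = Path(path)
    target.write_text(seed_file_text(urls), encoding="utf-8")
    return target


if __name__ == "__main__":
    print(f"wrote {write_seed_file(TEST_URLS)}")
```

Validating the scheme up front keeps a stray https:// or typo'd URL from silently widening or breaking the test crawl.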

Running the frontend

  1. Run: "poetry run serve"
  2. Navigate your Gemini client to: "gemini://localhost/"
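To check that the frontend answers without installing a full client, a bare request is enough. In the Gemini protocol a client opens a TLS connection (port 1965 by default), sends the absolute URL followed by CRLF, and reads a `<STATUS> <META>` header line before the body. This sketch skips certificate verification because Gemini servers commonly use self-signed certificates; that is only acceptable against your own local server. The function names are hypothetical.

```python
import socket
import ssl
from typing import Tuple
from urllib.parse import urlparse

DEFAULT_PORT = 1965  # standard Gemini port


def parse_header(line: bytes) -> Tuple[int, str]:
    """Split a Gemini response header '<STATUS> <META>' into (status, meta)."""
    text = line.decode("utf-8").rstrip("\r\n")
    status, _, meta = text.partition(" ")
    if len(status) != 2 or not status.isdigit():
        raise ValueError(f"malformed header: {text!r}")
    return int(status), meta


def fetch(url: str, timeout: float = 10.0) -> Tuple[int, str, bytes]:
    """Request `url` and return (status, meta, body)."""
    parsed = urlparse(url)
    host, port = parsed.hostname, parsed.port or DEFAULT_PORT
    # Local testing only: no certificate checks (real clients pin via TOFU).
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            tls.sendall(f"{url}\r\n".encode("utf-8"))
            response = b""
            while chunk := tls.recv(4096):  # read until the server closes
                response += chunk
    header, _, body = response.partition(b"\r\n")
    status, meta = parse_header(header)
    return status, meta, body
```

With the frontend running, `status, meta, body = fetch("gemini://localhost/")` should return a 2x status on success.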

Updating the index

  1. Run: "poetry run crawl"
  2. Run: "poetry run build_index"
  3. Restart the frontend
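The update steps can be chained in a small script (this also touches the automation TODO below). A minimal sketch, assuming one subprocess call per step and stopping at the first failure so a failed crawl never feeds build_index. How the frontend is restarted depends on how serve is deployed, so it is left as a hypothetical hook; `update_index` and `restart_frontend` are illustrative names.

```python
import subprocess
from typing import Callable, Sequence

# Update steps from this README, in order.
UPDATE_COMMANDS = [
    ["poetry", "run", "crawl"],
    ["poetry", "run", "build_index"],
]


def update_index(
    run: Callable[..., object] = subprocess.run,
    restart_frontend: Callable[[], None] = lambda: None,
    commands: Sequence[Sequence[str]] = UPDATE_COMMANDS,
) -> None:
    """Run crawl then build_index, then call the (hypothetical) restart hook."""
    for cmd in commands:
        # check=True makes subprocess.run raise CalledProcessError on a
        # non-zero exit, which aborts the remaining steps and the restart.
        run(list(cmd), check=True)
    restart_frontend()


if __name__ == "__main__":
    update_index()
```

Passing `run` in as a parameter keeps the sequencing testable without invoking Poetry.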

Running test suite

Run: "poetry run python -m pytest"

Roadmap / TODOs

  • TODO: improve crawl and build_index automation
  • TODO: get crawl to run on a schedule with systemd
  • TODO: add functionality to create a mock index
  • TODO: exclude raw-text blocks from indexed content
  • TODO: strip control characters from logged output like URLs
  • TODO: fix bug in calculation of backlinks (IIRC the bug is visible on
  • TODO: refactor manual exclusion logic to be regex-based instead of prefix-based. We could get more nuanced with exclusion logic this way
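For the raw-text TODO: in gemtext, any line that starts with three backticks toggles preformatted mode, and the lines between toggles are raw text. A minimal sketch of dropping those blocks before indexing, assuming the indexer has the page as a gemtext string; `strip_preformatted` is a hypothetical name, not an existing GUS function.

```python
PREFORMAT_TOGGLE = "`" * 3  # gemtext lines starting with this toggle preformatted mode


def strip_preformatted(gemtext: str) -> str:
    """Return gemtext with preformatted (raw-text) blocks removed.

    Toggle lines and every line between them are dropped. An unclosed
    block runs to the end of the document, as in gemtext itself.
    """
    kept = []
    preformatted = False
    for line in gemtext.splitlines():
        if line.startswith(PREFORMAT_TOGGLE):
            preformatted = not preformatted
            continue
        if not preformatted:
            kept.append(line)
    return "\n".join(kept)
```

Dropping the toggle lines too means alt text on them is not indexed either; keeping it would be a separate choice.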
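For the control-character TODO: a crawled URL can contain newlines or terminal escape sequences that forge or garble log lines. A minimal sketch that escapes every character in Unicode category C (control, format, surrogate, private use, unassigned) rather than dropping it, so the log still shows what was there; `sanitize_for_log` is a hypothetical helper.

```python
import unicodedata


def _escape_char(ch: str) -> str:
    """Return ch unchanged, or a backslash escape if it is in category C*."""
    if not unicodedata.category(ch).startswith("C"):
        return ch
    code = ord(ch)
    if code < 0x100:
        return f"\\x{code:02x}"
    if code < 0x10000:
        return f"\\u{code:04x}"
    return f"\\U{code:08x}"


def sanitize_for_log(text: str) -> str:
    """Escape control and other category-C characters before logging."""
    return "".join(_escape_char(ch) for ch in text)
```

A call site would then look like `logger.info("Fetching %s", sanitize_for_log(url))`.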
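For the exclusion TODO: prefix rules only match URLs that begin with an exact string, while regular expressions can match any host, any depth, or a query string. A sketch comparing the two; every prefix, pattern, and URL here is hypothetical and not taken from GUS's actual exclusion list.

```python
import re

# Prefix-based exclusion: only URLs beginning with an exact string match.
EXCLUDED_PREFIXES = [
    "gemini://example.org/cgi-bin/",
]

# Regex-based exclusion: hypothetical rules a prefix cannot express.
EXCLUDED_PATTERNS = [
    re.compile(r"^gemini://[^/]+/cgi-bin/"),  # cgi-bin on any host
    re.compile(r"\?"),                        # any URL with a query string
    re.compile(r"/archive/\d{4}/"),           # dated archive pages at any depth
]


def excluded_by_prefix(url: str) -> bool:
    return any(url.startswith(prefix) for prefix in EXCLUDED_PREFIXES)


def excluded_by_regex(url: str) -> bool:
    return any(pattern.search(url) for pattern in EXCLUDED_PATTERNS)
```

Existing prefixes could be carried over unchanged as `re.compile("^" + re.escape(prefix))`, so a migration need not change which URLs are currently excluded.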