
update 2021-05-12

master
René Wagner 5 months ago
parent
commit
06c0258323
  1. 5
      gus/crawl.py
  2. 4
      serve/templates/news.gmi

5
gus/crawl.py

@@ -156,7 +156,6 @@ EXCLUDED_URL_PREFIXES = [
# mozz mailing list linkscraper
"gemini://mozz.us/files/gemini-links.gmi",
# gemini.techrights.org
"gemini://gemini.techrights.org/",
@@ -167,11 +166,13 @@ EXCLUDED_URL_PREFIXES = [
# news mirrors - not our business
"gemini://guardian.shit.cx/",
"gemini://simplynews.metalune.xyz",
"gemini://illegaldrugs.net/cgi-bin/news.php?",
# wikipedia proxy
"gemini://wp.pitr.ca/",
"gemini://wp.glv.one/",
"gemini://wikipedia.geminet.org/",
# client torture test
"gemini://egsam.pitr.ca/",
"gemini://egsam.glv.one/",
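
For readers unfamiliar with how a prefix exclude list like this is typically applied, here is a minimal sketch. It is not taken from gus/crawl.py: only the name `EXCLUDED_URL_PREFIXES` and the URLs come from the diff above, while `is_excluded` and `filter_crawlable` are hypothetical helpers, and plain string-prefix matching is an assumption about how the crawler uses the list.

```python
# Hypothetical sketch: only EXCLUDED_URL_PREFIXES and its entries come from
# the diff; the helper functions below are illustrative assumptions.
EXCLUDED_URL_PREFIXES = [
    # news mirrors
    "gemini://guardian.shit.cx/",
    "gemini://simplynews.metalune.xyz",
    "gemini://illegaldrugs.net/cgi-bin/news.php?",
    # wikipedia proxy
    "gemini://wp.pitr.ca/",
    "gemini://wp.glv.one/",
    "gemini://wikipedia.geminet.org/",
]


def is_excluded(url, prefixes=EXCLUDED_URL_PREFIXES):
    """Return True if the URL starts with any excluded prefix."""
    # str.startswith accepts a tuple and checks each candidate prefix.
    return url.startswith(tuple(prefixes))


def filter_crawlable(urls, prefixes=EXCLUDED_URL_PREFIXES):
    """Keep only the URLs that no excluded prefix matches."""
    return [u for u in urls if not is_excluded(u, prefixes)]


if __name__ == "__main__":
    candidates = [
        "gemini://wp.pitr.ca/en/Gemini",
        "gemini://example.org/index.gmi",
    ]
    print(filter_crawlable(candidates))
```

Under this assumption, an entry without a trailing slash, such as "gemini://simplynews.metalune.xyz", would also match any host name that merely begins with that string, so entries ending in "/" are the stricter form.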

4
serve/templates/news.gmi

@ -2,6 +2,10 @@
## News
### 2021-05-12
We are back on track with crawl and index, everything is up-to-date again.
I had to add another news mirror and a Wikipedia mirror to the exclude list. The current implementation can't handle such a huge amount of information well.
### 2021-05-08
Obviously this didn't work as expected. For whatever reason, indexing fails repeatedly on one page or another with a mysterious SQLite error. It may take a few days until I find enough time to search for the cause of this error.
If you are familiar with peewee and sqlite or have come across this issue earlier, let me know:
