crawler: fix redirect issue #5

Closed
opened 10 months ago by René Wagner · 4 comments
Owner

There's one capsule that kills the crawler due to a redirect overflow. I think the best approach is to limit redirects somehow.

René Wagner added the bug label 10 months ago
René Wagner self-assigned this 10 months ago

Tentative fix: set `max_crawl_depth = 50` in `run_crawl()` in `crawl.py`.
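
For illustration, here is a minimal sketch of how a hop limit keeps a redirect chain from running forever. It is not the actual GUS code: `resolve_redirects`, the `fetch` callable, and the loop check are hypothetical, and it assumes redirect targets arrive as absolute URLs. Only the `max_crawl_depth` name and the value come from this issue.

```python
from typing import Callable, Optional, Tuple

# Hypothetical fetcher type: takes a URL, returns (status, meta).
# In Gemini, a 3x status means "redirect" and meta holds the target URL.
Fetcher = Callable[[str], Tuple[int, str]]


def resolve_redirects(url: str, fetch: Fetcher, max_crawl_depth: int = 50) -> Optional[str]:
    """Follow redirects starting at url.

    Returns the first URL that does not redirect, or None if the chain
    loops back on itself or needs more than max_crawl_depth hops.
    """
    seen = set()
    for _ in range(max_crawl_depth + 1):
        if url in seen:
            return None  # redirect loop, give up
        seen.add(url)
        status, meta = fetch(url)
        if not 30 <= status <= 39:
            return url  # not a redirect, we are done
        url = meta  # assumption: meta is an absolute URL
    return None  # too many hops, give up instead of crashing
```

With a cap like this, a misbehaving capsule costs at most `max_crawl_depth + 1` requests and the rest of the crawl carries on.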

Increased `max_crawl_depth` to 100; today's crawl did not fail. This is included in 02d00f10ed, but unfortunately that commit is a bit of a mess: it also includes changes to `EXCLUDED_URL_PREFIXES` and won't apply upstream.

In the long run I think `EXCLUDED_URL_PREFIXES` and similar configs should move to constants or a separate file, much like "seed-requests.txt".
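
A rough sketch of what the separate-file variant could look like. The helper names, the one-prefix-per-line layout, and the `#` comment handling are all assumptions for illustration, not something that exists in GUS today.

```python
from pathlib import Path
from typing import List


def load_prefixes(path: str) -> List[str]:
    """Read one URL prefix per line, skipping blank lines and # comments."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    stripped = (line.strip() for line in lines)
    return [s for s in stripped if s and not s.startswith("#")]


def is_excluded(url: str, prefixes: List[str]) -> bool:
    """True if url starts with any of the configured prefixes."""
    return url.startswith(tuple(prefixes))
```

That way instance-specific exclusions live in a data file and no longer get mixed into commits that are meant to go upstream.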
Patch sent upstream: https://lists.sr.ht/~natpen/gus/patches/20181
René Wagner closed this issue 10 months ago