crawl: abort download of media mimetypes #11

Open
opened 10 months ago by René Wagner · 3 comments
Owner

Due to the increasing number of large files, which take a long time to download and drastically prolong the crawl:

```
Feb 08 14:38:05 geminispace-info poetry[1088576]: 2021-02-08 14:38:05,857 crawl    INFO     Fetching resource: gemini://kamalatta.ddnss.de/elektro/aaa.mp4
Feb 08 14:57:59 geminispace-info poetry[1088576]: 2021-02-08 14:57:59,646 crawl    WARNING  Failed to fetch: gemini://kamalatta.ddnss.de/elektro/aaa.mp4
```

Do we need to download the complete files at all?
Maybe we could skip the download if the meta (the MIME type) is not text, and only index the filename and its existence?
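A minimal sketch of the skip idea, assuming the crawler can inspect the response header before reading the body. In Gemini, a response starts with a `<STATUS> <META>` line; for 2x (success) statuses META is the MIME type, and an empty META means `text/gemini; charset=utf-8`. The helper names `parse_header`, `should_fetch_body` and `index_without_body` are hypothetical and not part of the existing crawler.

```python
from __future__ import annotations

# Hypothetical helpers (not existing crawler code): decide from the Gemini
# response header alone whether the body is worth downloading.


def parse_header(header_line: str) -> tuple[int, str]:
    """Split a Gemini response header '<STATUS> <META>' into its parts."""
    status_str, _, meta = header_line.rstrip("\r\n").partition(" ")
    return int(status_str), meta.strip()


def should_fetch_body(header_line: str) -> bool:
    """Return True only for success (2x) responses whose MIME type is text/*."""
    status, meta = parse_header(header_line)
    if status // 10 != 2:
        return False  # not a success response, so there is no body to fetch
    # For 2x responses META is a MIME type, possibly with parameters.
    # An empty META defaults to text/gemini per the Gemini spec.
    mime = meta.split(";", 1)[0].strip().lower() or "text/gemini"
    return mime.startswith("text/")


def index_without_body(url: str, header_line: str) -> dict:
    """Record only the URL and its declared MIME type when the body is skipped."""
    _, meta = parse_header(header_line)
    return {"url": url, "content_type": meta, "body": None}


if __name__ == "__main__":
    for header in ("20 text/gemini; charset=utf-8", "20 video/mp4", "51 Not found"):
        print(header, "->", should_fetch_body(header))
```

With this check, the `.mp4` resource from the log would be indexed by URL and MIME type only, and its connection could be closed right after the header.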

Poster
Owner

Or maybe use a general timeout: drop the connection if the file has not finished loading within 30 seconds?
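A sketch of the overall-timeout idea. It tracks total elapsed time rather than relying only on a per-read socket timeout, because a server that keeps sending small pieces would never trip a per-read timeout. `read_body_with_deadline`, `FetchTimeout` and the chunk size are hypothetical; the 30-second default is the value suggested in the comment. A single blocking `read()` can still overrun the deadline, so in practice this would be paired with `socket.settimeout()` on the underlying connection.

```python
import io
import time
from typing import BinaryIO, Callable

# Hypothetical names; 30 s is the value proposed in the issue comment.
FETCH_DEADLINE_SECONDS = 30
CHUNK_SIZE = 64 * 1024


class FetchTimeout(Exception):
    """Raised when the whole body is not received before the deadline."""


def read_body_with_deadline(
    stream: BinaryIO,
    deadline_seconds: float = FETCH_DEADLINE_SECONDS,
    clock: Callable[[], float] = time.monotonic,
) -> bytes:
    """Read the response body, giving up once total elapsed time exceeds the deadline."""
    start = clock()
    chunks = []
    while True:
        # Check the overall budget before every read, not just per read.
        if clock() - start > deadline_seconds:
            raise FetchTimeout(f"body not received within {deadline_seconds} s")
        chunk = stream.read(CHUNK_SIZE)
        if not chunk:  # end of stream: the server closed the connection
            return b"".join(chunks)
        chunks.append(chunk)


if __name__ == "__main__":
    # io.BytesIO stands in for the TLS connection's file object here.
    print(len(read_body_with_deadline(io.BytesIO(b"x" * 1000))))
```

Injecting `clock` keeps the deadline logic testable without waiting 30 real seconds.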

Collaborator

> Maybe skip if meta is not text and only index the filename and its existence?

That's a nice idea, I like it!

René Wagner added the enhancement label 10 months ago
Poster
Owner

> That's a nice idea, I like it!

Unfortunately, we will lose the file size if we don't complete the download... not sure how big this loss will be.
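On the file-size question: a Gemini response header carries only a status and META, with no length field, so the exact size is only known by reading to the end. One possible compromise, not something decided in this issue, is to count bytes without storing them, up to a cap, and record the count as a lower bound when the cap is hit. `measure_size`, `SizeRecord` and `max_bytes` are hypothetical names.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO


@dataclass
class SizeRecord:
    """Hypothetical index record for a body that may not have been read fully."""
    url: str
    bytes_seen: int  # bytes actually received
    complete: bool   # False means bytes_seen is only a lower bound


def measure_size(url: str, stream: BinaryIO, max_bytes: int,
                 chunk_size: int = 64 * 1024) -> SizeRecord:
    """Count body bytes without keeping them, stopping after max_bytes."""
    seen = 0
    while seen < max_bytes:
        chunk = stream.read(min(chunk_size, max_bytes - seen))
        if not chunk:  # stream ended before the cap: size is exact
            return SizeRecord(url, seen, complete=True)
        seen += len(chunk)
    # Cap reached: peek one more byte to tell "exactly max_bytes" from "larger".
    extra = stream.read(1)
    return SizeRecord(url, seen + len(extra), complete=not extra)


if __name__ == "__main__":
    print(measure_size("gemini://example.org/big.mp4", io.BytesIO(b"a" * 100), max_bytes=10))
```

A search result could then show something like "more than 10 MB" for truncated records instead of losing the size entirely.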

René Wagner changed title from abort download of media mimetypes to crawl: abort download of media mimetypes 9 months ago
No Milestone
No Assignees
2 Participants
Due Date

No due date set.

Dependencies

This issue currently doesn't have any dependencies.
