browsertrix/backend
Ilya Kreymer b4fd5e6e94
Crawl Timeout via elapsed time (#1338)
Fixes #1337 

The crawl timeout is tracked via the `elapsedCrawlTime` field on the crawl
status, which is similar to the regular crawl execution time but counts
only one pod if scale > 1. If scale == 1, the two times are equivalent.

The crawl is gracefully stopped when the elapsed execution time exceeds the
timeout. For more responsiveness, the crawl time accrued since the last
update interval is also added.

Details:
- handle crawl timeout via elapsed crawl time (the longest running time of a
single pod) instead of expire time.
- include the current running time since the last update for best precision
- more accurately count the elapsed time the crawl is actually running
- store elapsedCrawlTime in addition to crawlExecTime, storing the
longest pod duration since the last test interval
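
A minimal sketch of the accounting described above, not the operator's actual code. `CrawlStatus`, `update_times`, and `is_timed_out` are hypothetical names, and treating a non-positive timeout as "no timeout" is an assumption:

```python
from dataclasses import dataclass


@dataclass
class CrawlStatus:
    """Hypothetical stand-in for the crawl status; field names follow the commit text."""

    crawlExecTime: float = 0.0     # seconds, summed over all pods
    elapsedCrawlTime: float = 0.0  # seconds, longest single pod per interval


def update_times(status: CrawlStatus, pod_durations: list[float]) -> None:
    """Fold one update interval's per-pod running durations into the status."""
    if not pod_durations:
        return
    # Execution time counts every pod; elapsed time counts only the longest one.
    status.crawlExecTime += sum(pod_durations)
    status.elapsedCrawlTime += max(pod_durations)


def is_timed_out(
    status: CrawlStatus, timeout: float, running_since_update: float = 0.0
) -> bool:
    """Check stored elapsed time plus time run since the last update against the timeout."""
    if timeout <= 0:
        return False  # assumption: non-positive timeout disables the check
    return status.elapsedCrawlTime + running_since_update > timeout
```

With scale == 1 there is one duration per interval, so both fields grow by the same amount; with scale > 1 only `crawlExecTime` grows with the number of pods.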

---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2023-11-06 16:32:58 -08:00
..
btrixcloud Crawl Timeout via elapsed time (#1338) 2023-11-06 16:32:58 -08:00
test users: add case-insensitive index to maintain backwards compatibility with fastapi-users (#1319) 2023-10-27 14:31:29 -07:00
test_nightly Crawl Timeout via elapsed time (#1338) 2023-11-06 16:32:58 -08:00
.pylintrc
Dockerfile
mypy.ini
requirements.txt Storage Refactor: Replication + Custom Storage Support (#1296) 2023-10-26 21:44:09 -07:00
test-requirements.txt Add slugs to org backend (#1250) 2023-10-10 18:30:09 -07:00