browsertrix/backend
Tessa Walsh df4c4e6c5a
Optimize workflow statistics updates (#892)
* optimizations:
- rename update_crawl_config_stats to stats_recompute_all, only used in migration to fetch all crawls
and do a full recompute of all file sizes
- add stats_recompute_last to only get last crawl by size, increment total size by specified amount, and incr/decr number of crawls
- Update migration 0007 to use stats_recompute_all
- Add isCrawlRunning, lastCrawlStopping, and lastRun to
stats_recompute_last
- Increment crawlSuccessfulCount in stats_recompute_last
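
The difference between the two recompute paths can be sketched as follows. This is a minimal in-memory illustration, not the code from this commit: `WorkflowStats`, `stats_recompute_all_sketch`, `stats_recompute_last_sketch`, and the crawl dict layout are all hypothetical names chosen for the example, and the real functions presumably work against the database rather than plain Python objects.

```python
from dataclasses import dataclass


@dataclass
class WorkflowStats:
    """Hypothetical container for per-workflow statistics."""
    crawl_count: int = 0
    total_size: int = 0
    crawl_successful_count: int = 0


def stats_recompute_all_sketch(crawls):
    """Full recompute: visit every crawl and sum all file sizes.

    Cost grows with the number of crawls, so this suits a one-off migration.
    """
    stats = WorkflowStats()
    for crawl in crawls:
        stats.crawl_count += 1
        stats.total_size += sum(crawl["file_sizes"])
        if crawl["state"] == "complete":
            stats.crawl_successful_count += 1
    return stats


def stats_recompute_last_sketch(stats, size_delta, inc_crawls=1, successful=False):
    """Incremental update: adjust the stored totals by a delta.

    Only the latest crawl's size and a crawl-count increment (or decrement,
    via a negative inc_crawls) are needed, so the cost is constant.
    """
    stats.total_size += size_delta
    stats.crawl_count += inc_crawls
    if successful:
        stats.crawl_successful_count += 1
    return stats
```

Applying the incremental update once per finished crawl should give the same totals as a full recompute over the same crawls, which is the property the new test in this commit is described as covering.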

* operator/crawls:
- operator: keep track of filesAddedSize in redis as well
- rename update_crawl to update_crawl_state_if_changed() and only update
if state is different, otherwise return false
- ensure mark_finished() operations only occur if crawl state has changed
- don't clear 'stopping' flag, so it can be used to track whether the crawl was stopped
- state always starts with "starting", don't reset to starting
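
The "only update if state is different" pattern described above can be sketched like this. It is a simplified, synchronous illustration under stated assumptions: the dict-based `store`, the `finished_counts` counter, and both function names ending in `_sketch` are hypothetical and stand in for whatever storage and bookkeeping the operator actually uses.

```python
def update_crawl_state_if_changed_sketch(store, crawl_id, new_state):
    """Write new_state only when it differs from the stored state.

    Returns True if the state was changed, False otherwise.
    """
    if store.get(crawl_id) == new_state:
        return False
    store[crawl_id] = new_state
    return True


def mark_finished_sketch(store, finished_counts, crawl_id, state):
    """Run the one-time finish bookkeeping only on an actual state change.

    If the operator reconciles the same crawl repeatedly, the counter is
    incremented once rather than once per call.
    """
    if not update_crawl_state_if_changed_sketch(store, crawl_id, state):
        return False
    finished_counts[state] = finished_counts.get(state, 0) + 1
    return True
```

Gating the side effects on the boolean return value makes repeated calls with the same state harmless, which matters when the same finish event can be observed more than once.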

* tests:
- Add test for incremental workflow stats updating
- don't clear stopping==true, indicates crawl was manually stopped

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2023-05-26 22:57:08 -07:00
btrixcloud Optimize workflow statistics updates (#892) 2023-05-26 22:57:08 -07:00
test Optimize workflow statistics updates (#892) 2023-05-26 22:57:08 -07:00
test_nightly tests: fixes for crawl cancel + crawl stopped (#864) 2023-05-22 20:17:29 -07:00
.pylintrc quickfix: pydantic / lint fix (#452) 2023-01-10 18:54:11 -08:00
Dockerfile Remove Code and Configs for Swarm/podman support (#407) 2022-12-08 18:19:58 -08:00
requirements.txt Refactor to use new operator on backend (#789) 2023-04-24 18:30:52 -07:00