browsertrix/backend/btrixcloud
Tessa Walsh df4c4e6c5a
Optimize workflow statistics updates (#892)
* optimizations:
- rename update_crawl_config_stats to stats_recompute_all, now only used in the migration to fetch all crawls and do a full recompute of all file sizes
- add stats_recompute_last to fetch only the last crawl by size, increment the total size by a specified amount, and incr/decr the number of crawls (see the sketch after this list)
- update migration 0007 to use stats_recompute_all
- add isCrawlRunning, lastCrawlStopping, and lastRun to stats_recompute_last
- increment crawlSuccessfulCount in stats_recompute_last
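
A minimal sketch of the incremental approach (Python/Motor, not the actual btrixcloud code): fetch only the newest finished crawl for the workflow and apply deltas with $inc instead of re-reading every crawl. The collection and field names (crawls, crawl_configs, totalSize, crawlCount, crawlSuccessfulCount, lastRun) follow the list above; the exact signature and filters are assumptions.

```python
from motor.motor_asyncio import AsyncIOMotorDatabase


async def stats_recompute_last(
    db: AsyncIOMotorDatabase, cid, size_delta: int, inc_crawls: int = 1
) -> bool:
    """Adjust workflow stats by deltas, reading only the last finished crawl."""
    # newest finished crawl for this workflow (assumed field names)
    last_crawl = await db.crawls.find_one(
        {"cid": cid, "finished": {"$ne": None}}, sort=[("finished", -1)]
    )

    update = {
        # incremental deltas instead of a full recompute
        "$inc": {"totalSize": size_delta, "crawlCount": inc_crawls},
        "$set": {"isCrawlRunning": False},
    }

    if last_crawl:
        update["$set"].update(
            {
                "lastCrawlId": str(last_crawl["_id"]),
                "lastCrawlState": last_crawl.get("state"),
                "lastCrawlStopping": last_crawl.get("stopping", False),
                "lastRun": last_crawl["finished"],
            }
        )
        if last_crawl.get("state") == "complete":
            update["$inc"]["crawlSuccessfulCount"] = inc_crawls

    result = await db.crawl_configs.find_one_and_update({"_id": cid}, update)
    return result is not None
```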

* operator/crawls:
- operator: keep track of filesAddedSize in redis as well
- rename update_crawl to update_crawl_state_if_changed() and only update if the state is different, otherwise return false (see the sketch after this list)
- ensure mark_finished() operations only occur if the crawl state has changed
- don't clear the 'stopping' flag, so it can be used to track whether a crawl was stopped
- state always starts as "starting", don't reset it to "starting"
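
A hedged sketch of the conditional state update: the filter excludes documents already in the target state, so the write is a no-op when nothing changed and the caller (e.g. mark_finished()) can skip duplicate work. The helper name comes from the list above; the body and field names are assumptions.

```python
async def update_crawl_state_if_changed(db, crawl_id: str, state: str, **kwargs) -> bool:
    """Set the crawl state only if it differs; return True if a document was updated."""
    result = await db.crawls.find_one_and_update(
        # matching on state != new state makes repeated calls idempotent
        {"_id": crawl_id, "state": {"$ne": state}},
        {"$set": {"state": state, **kwargs}},
    )
    return result is not None
```

The filesAddedSize counter kept in redis by the operator presumably becomes the size delta handed to stats_recompute_last once the crawl finishes.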

* tests:
- add test for incremental workflow stats updating (see the sketch after this list)
- don't clear stopping==true, as it indicates the crawl was manually stopped
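
A pytest-style sketch of what such a test could look like; the fixtures (api_prefix, admin_auth_headers, default_org_id, workflow_id) and the run_crawl_and_wait helper are assumptions standing in for the real test harness, not the actual browsertrix test.

```python
import requests


def test_incremental_workflow_stats(
    api_prefix, admin_auth_headers, default_org_id, workflow_id, run_crawl_and_wait
):
    def get_stats():
        resp = requests.get(
            f"{api_prefix}/orgs/{default_org_id}/crawlconfigs/{workflow_id}",
            headers=admin_auth_headers,
        )
        data = resp.json()
        return data["crawlCount"], data["crawlSuccessfulCount"], data["totalSize"]

    count_before, success_before, size_before = get_stats()

    # run one crawl to completion; the (assumed) helper returns the bytes it added
    added_size = run_crawl_and_wait(workflow_id)

    count_after, success_after, size_after = get_stats()

    # stats should move by exactly one crawl's worth, not be fully recomputed
    assert count_after == count_before + 1
    assert success_after == success_before + 1
    assert size_after == size_before + added_size
```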

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2023-05-26 22:57:08 -07:00
migrations Optimize workflow statistics updates (#892) 2023-05-26 22:57:08 -07:00
templates
__init__.py
colls.py Rework collections to track collections in Crawl (#878) 2023-05-25 15:41:50 -04:00
crawlconfigs.py Optimize workflow statistics updates (#892) 2023-05-26 22:57:08 -07:00
crawlmanager.py
crawls.py Optimize workflow statistics updates (#892) 2023-05-26 22:57:08 -07:00
db.py Rework collections to track collections in Crawl (#878) 2023-05-25 15:41:50 -04:00
emailsender.py
invites.py
k8sapi.py
main_op.py
main_scheduled_job.py
main.py
operator.py Optimize workflow statistics updates (#892) 2023-05-26 22:57:08 -07:00
orgs.py
pagination.py
profiles.py
storages.py
users.py
utils.py
version.py
zip.py