browsertrix/backend/btrixcloud
Tessa Walsh df4c4e6c5a
Optimize workflow statistics updates (#892)
* optimizations:
- rename update_crawl_config_stats to stats_recompute_all, only used in migration to fetch all crawls
and do a full recompute of all file sizes
- add stats_recompute_last to only get last crawl by size, increment total size by specified amount, and incr/decr number of crawls
- Update migration 0007 to use stats_recompute_all
- Add isCrawlRunning, lastCrawlStopping, and lastRun to
stats_recompute_last
- Increment crawlSuccessfulCount in stats_recompute_last
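
The difference between the full recompute and the incremental update described above can be sketched in memory as follows. This is only an illustration under stated assumptions: `WorkflowStats`, `recompute_all`, and `recompute_last` are hypothetical names and signatures, not the actual `stats_recompute_all` / `stats_recompute_last` code, which operates on the database.

```python
from dataclasses import dataclass


@dataclass
class WorkflowStats:
    # Hypothetical container for per-workflow counters
    crawl_count: int = 0
    total_size: int = 0
    crawl_successful_count: int = 0


def recompute_all(crawls):
    """Full recompute: walk every crawl and rebuild all counters."""
    stats = WorkflowStats()
    for crawl in crawls:
        stats.crawl_count += 1
        stats.total_size += crawl["size"]
        if crawl["successful"]:
            stats.crawl_successful_count += 1
    return stats


def recompute_last(stats, size_delta, inc_crawls=0, successful=False):
    """Incremental update: adjust counters by deltas, no full scan."""
    stats.total_size += size_delta
    stats.crawl_count += inc_crawls  # may be negative when a crawl is removed
    if successful:
        stats.crawl_successful_count += 1
    return stats
```

Applying `recompute_last` after each finished crawl should yield the same counters as rerunning `recompute_all` over the full list, without the cost of touching every crawl.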

* operator/crawls:
- operator: keep track of filesAddedSize in redis as well
- rename update_crawl to update_crawl_state_if_changed() and only update
if state is different, otherwise return false
- ensure mark_finished() operations only occur if crawl state has changed
- don't clear 'stopping' flag, can track if crawl was stopped
- state always starts with "starting", don't reset to starting
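
The "only update if state is different" guard can be sketched as below. This is a minimal in-memory illustration, not the operator's code: the function names come from the commit message, but the signatures, the `crawls` dict, and the `side_effects` list are assumptions made for the example.

```python
def update_crawl_state_if_changed(crawls, crawl_id, new_state):
    """Set the state only if it differs; return True when a change was made."""
    crawl = crawls.get(crawl_id)
    if crawl is None or crawl["state"] == new_state:
        return False
    crawl["state"] = new_state
    return True


def mark_finished(crawls, crawl_id, state, side_effects):
    """Run finish-time operations only on an actual state transition."""
    if not update_crawl_state_if_changed(crawls, crawl_id, state):
        return False
    # Hypothetical stand-in for stats updates and other one-time work
    side_effects.append((crawl_id, state))
    return True
```

Because the second call with the same state returns `False`, repeated reconcile passes would not rerun the finish-time work.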

* tests:
- Add test for incremental workflow stats updating
- don't clear stopping==true, indicates crawl was manually stopped

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2023-05-26 22:57:08 -07:00
..
migrations Optimize workflow statistics updates (#892) 2023-05-26 22:57:08 -07:00
templates crawlconfig: fix default filename template, make configurable (#835) 2023-05-08 14:03:27 -07:00
__init__.py
colls.py Rework collections to track collections in Crawl (#878) 2023-05-25 15:41:50 -04:00
crawlconfigs.py Optimize workflow statistics updates (#892) 2023-05-26 22:57:08 -07:00
crawlmanager.py crawlconfig: fix default filename template, make configurable (#835) 2023-05-08 14:03:27 -07:00
crawls.py Optimize workflow statistics updates (#892) 2023-05-26 22:57:08 -07:00
db.py Rework collections to track collections in Crawl (#878) 2023-05-25 15:41:50 -04:00
emailsender.py
invites.py
k8sapi.py Refactor to use new operator on backend (#789) 2023-04-24 18:30:52 -07:00
main_op.py startup fixes: (#793) 2023-04-24 18:32:52 -07:00
main_scheduled_job.py Improve sorting workflows by lastUpdated (#826) 2023-05-22 18:42:30 -04:00
main.py Wait for DB init for healthcheck + settings (#885) 2023-05-25 09:58:30 -07:00
operator.py Optimize workflow statistics updates (#892) 2023-05-26 22:57:08 -07:00
orgs.py Refactor to use new operator on backend (#789) 2023-04-24 18:30:52 -07:00
pagination.py
profiles.py
storages.py
users.py startup fixes: (#793) 2023-04-24 18:32:52 -07:00
utils.py stopping fix: backend fixes for #836 + prep for additional status fields (#837) 2023-05-08 14:02:20 -07:00
version.py version: bump to 1.6.0-beta.0 2023-05-19 11:29:31 -07:00
zip.py