browsertrix

Ilya Kreymer ad9bca2e92 Operator refactor to control pods + pvcs directly instead of statefulsets (#1149)	2023-09-11 10:38:04 -07:00
- Ability for a pod to be Completed, unlike in a StatefulSet: in a StatefulSet, if 3 pods are running and the first one finishes, all 3 must be running until all 3 are done. With this setup, the first finished pod can remain in Completed state.
- Fixed shutdown order: crawler pods now correctly shut down before redis pods, by switching to background deletion.
- Pod priority decreases with scale: the 1st instance of a new crawl can preempt the 3rd or 2nd instance of another crawl.
- Create priority classes up to 'max_crawl_scale', configured in values.yaml.
- Improved scale change reconciliation: if increasing scale, immediately scale up. If decreasing scale, gracefully stop the scaled-down instances via the redis 'stopone' key and wait until they exit with Completed state before adjusting status.scale / removing scaled-down pods. Ensures unaccepted interrupts don't cause scaled-down data to be deleted.
- Redis pod remains inactive until the crawler is first active, or after no crawl pods are active for 60 seconds.
- Configurable Redis storage with 'redis_storage' value, set to 3Gi by default.
- CrawlJob deletion starts as soon as post-finish crawl operations are run.
- Post-crawl operations get their own redis instance, since the one used during the response is being cleaned up in the finalizer.
- Finalizer ignores requests with incorrect state (returns 400 if reported as not finished while crawl is finished).
- Current resource usage added to status.
- Profile browser: also manage a single pod directly without a statefulset, for consistency.
- Restart pods via restartTime value: if spec.restartTime != status.restartTime, clear out pods and update status.restartTime (using OnDelete policy to avoid recreate loops in edge cases).
- Update to latest metacontroller (v4.11.0).
- Add --restartOnError flag for crawler (for browsertrix-crawler 0.11.0).
- Failed crawl logging: add 'fail_crawl()' to be used for failing a crawl, which prints logs for the default container (if enabled) as well as pod status.
- tests: check other finished states to avoid getting stuck in an infinite loop if a crawl fails.
- tests: disable disk utilization check, which adds unpredictability to crawl testing!
fixes #1147
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
data	Uploads API: BaseCrawl refactor + Initial support for /uploads endpoint (#937)	2023-07-07 09:13:26 -07:00
__init__.py
conftest.py	Operator refactor to control pods + pvcs directly instead of statefulsets (#1149)	2023-09-11 10:38:04 -07:00
test_collections.py	feat: implement 'collections' array with {name, id} for archived item details (#1098)	2023-08-25 00:26:46 -07:00
test_crawl_config_search_values.py	Backend API consistency pass (#921)	2023-06-16 18:52:46 -07:00
test_crawl_config_tags.py	Backend API consistency pass (#921)	2023-06-16 18:52:46 -07:00
test_crawlconfigs.py	Add max crawl size option to backend and frontend (#1045)	2023-08-26 22:00:37 -07:00
test_filter_sort_results.py	feat: implement 'collections' array with {name, id} for archived item details (#1098)	2023-08-25 00:26:46 -07:00
test_invites.py	Paginate API list endpoints (#659)	2023-03-06 14:41:25 -05:00
test_login.py
test_org.py	Add event webhook notifications system to backend (#1061)	2023-08-31 19:52:37 -07:00
test_permissions.py	Paginate API list endpoints (#659)	2023-03-06 14:41:25 -05:00
test_run_crawl.py	Add and enforce org storage quota (#1106)	2023-09-07 12:45:43 -04:00
test_settings.py	tests: fixes for crawl cancel + crawl stopped (#864)	2023-05-22 20:17:29 -07:00
test_stop_cancel_crawl.py	Operator refactor to control pods + pvcs directly instead of statefulsets (#1149)	2023-09-11 10:38:04 -07:00
test_uploads.py	Add and enforce org storage quota (#1106)	2023-09-07 12:45:43 -04:00
test_users.py
test_webhooks.py	Add event webhook notifications system to backend (#1061)	2023-08-31 19:52:37 -07:00
test_workflow_auto_add_to_collection.py	feat: implement 'collections' array with {name, id} for archived item details (#1098)	2023-08-25 00:26:46 -07:00
utils.py	Uploads API: BaseCrawl refactor + Initial support for /uploads endpoint (#937)	2023-07-07 09:13:26 -07:00