browsertrix

Author	SHA1	Message	Date
Ilya Kreymer	94e985ae13	optimize org quota lookups (#1973 ) - instead of looking up storage and exec min quotas from oid, and loading an org each time, load org once and then check quotas on the org object - many times the org was already available, and was looked up again - storage and exec quota checks become sync - rename can_run_crawl() to more generic can_write_data(), optionally also checks exec minutes - typing: get_org_by_id() always returns org, or throws, adjust methods accordingly (don't check for none, catch exception) - typing: fix typo in BaseOperator, catch type errors in operator 'org_ops' - operator quota check: use up-to-date 'status.size' for current job, ignore current job in all jobs list to avoid double-counting - follow up to #1969	2024-07-25 14:00:16 -07:00
Ilya Kreymer	01ddf95a56	allow disabling of auto-resize of crawler pods (#1964 ) - only enable if 'enable_auto_resize' is true, default to false - if true, set memory limit to 1.2 of memory requests, resize when hitting 'soft oom' of initial request, adjust by 1.2 (current behavior) up to max_crawler_memory - if false, set memory limit to max_crawler_memory and never adjust memory requests or memory limits - part of #1959	2024-07-23 21:00:40 -04:00
Ilya Kreymer	335700e683	Additional typing cleanup (#1938 ) Misc typing fixes, including in profiles and time functions --------- Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2024-07-17 10:49:22 -07:00
Ilya Kreymer	61239a40ed	include workflow config in QA runs + different browser instances for QA (#1829 ) Currently, the workflow crawl settings were not being included at all in QA runs. This mounts the crawl workflow config, as well as QA configmap, into QA run crawls, allowing for page limits from crawl workflow to be applied to QA runs. It also allows a different number of browser instances to be used for QA runs, as QA runs might work better with less browsers, (eg. 2 instead of 4). This can be set with `qa_browser_instances` in helm chart. Default qa browser workers to 1 if unset (for now, for best results) Fixes #1828	2024-05-29 13:32:25 -07:00
Ilya Kreymer	f6c0791dc1	fix missing settings / typos: (#1748 ) - ensure max_crawler_memory_size is inited before it is set! - pass profile_browser_memory / profile_browser_cpu from chart values - map volume to /tmp/home to avoid persisting /tmp for profiles	2024-04-25 09:00:17 +02:00
Ilya Kreymer	f286f04130	Typo Fix: fix typos in setting max crawler memory (#1747 ) Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2024-04-24 10:29:09 -04:00
Ilya Kreymer	ec74eb4242	operator: add 'max_crawler_memory' to limit autosizing of crawler pods (#1746 ) Adds a `max_crawler_memory` chart setting, which, if set, will defines the upper crawler memory limit that crawler pods can be resized up to. If not set, auto resizing is disabled and pods are always set to 'crawler_memory' memory	2024-04-24 15:16:32 +02:00
Ilya Kreymer	3438133fcb	Crawler pod memory padding + auto scaling (#1631 ) - set memory limit to 1.2x memory request to provide extra padding and avoid OOM - attempt to resize crawler pods by 1.2x when exceeding 90% of available memory - do a 'soft OOM' (send extra SIGTERM) to pod when reaching 100% of requested memory, resulting in faster graceful restart, but avoiding a system-instant OOM Kill - Fixes #1632 --------- Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2024-03-28 16:39:00 -07:00
Ilya Kreymer	e7af081af1	profile browser fixes: better resource usage + load retry (main) (#1604 ) - Backend: Use separate resource constraints for profiles: default profile browser resources to either 'profile_browser_cpu' / 'profile_browser_memory' or single browser 'crawler_memory_base' / 'crawler_cpu_base', instead of scaled to the number of browser workers - Frontend: check that profile html page is loading, keep retrying if still getting nginx error instead of loading an iframe with the error. Fixes #1598 (Copy of #1599 from 1.9.4)	2024-03-16 15:07:04 -07:00
Ilya Kreymer	2ac6584942	Refactor operator class into module (#1564 ) The operator class has gotten fairly large, this is a first pass in refactoring operator.py into a submodule instead, with multiple operator instances which handle different types of objects. - The main k8s interface has been split into K8sOpApi which extends K8sApi and is shared across all operators. - Each operator extends BaseOperator which also has an instance of K8sOpApi - The CrawlOperator is still the bulk of the functionality, but will likely be further refactored to support QA jobs --------- Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2024-02-29 14:40:12 -08:00

10 Commits