browsertrix/chart/templates
Ilya Kreymer fa86555eed
Track pod resource usage, detect OOM crashes, handle auto-scaling (#1235)
* keep track of per pod status on crawljob:
- crash time and reason
- 'used' vs 'allocated' resources
- 'percent' used / allocated

* crawl log errors: log an error when the crawler crashes due to OOM, either to the redis error log
or to the console

* add initial autoscaling support!
- detect if metrics server is available via K8SApi.is_pod_metrics_available()
- if available, use metrics for 'used' fields
- if no metrics, set memory used for redis only (using redis apis)
- allow overriding memory and cpu via newMemory and newCpu settings on pod status
- scale memory / cpu based on newMemory and newCpu setting
- templates: update jinja templates to allow restarting crawler and redis with new resources
- ci: enable metrics-server on k3d, microk8s and nightly k3d ci runs
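
As a rough illustration of the 'used' / 'allocated' / 'percent' bookkeeping and the newMemory override described above, here is a minimal Python sketch. The `PodResources` class, its field names, and both helper functions are hypothetical and are not taken from the operator code; only the idea that an override replaces the allocated value comes from the commit message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PodResources:
    """Hypothetical per-pod record; field names are assumptions."""

    allocated_memory: int  # bytes currently allocated to the pod
    used_memory: int  # bytes reported as used (metrics server or redis)
    new_memory: Optional[int] = None  # optional override, like newMemory


def percent_used(used: int, allocated: int) -> int:
    """Return used / allocated as a whole percentage (0 if nothing allocated)."""
    if allocated <= 0:
        return 0
    return round(100 * used / allocated)


def effective_memory(res: PodResources) -> int:
    """Memory to request on the next pod restart: the override wins if set."""
    if res.new_memory is not None:
        return res.new_memory
    return res.allocated_memory


if __name__ == "__main__":
    pod = PodResources(allocated_memory=1024, used_memory=512)
    print(percent_used(pod.used_memory, pod.allocated_memory))
    pod.new_memory = 2048
    print(effective_memory(pod))
```

The same pattern would apply to cpu with a newCpu-style override; how the operator actually decides when to set an override is not shown here.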

* roles: cleanup unused roles, add permissions for listing metrics

* stats for running crawls:
- update in db via operator
- avoids losing stats if redis pod happens to be done
- tradeoff is more db access in the operator, but fewer extra connections to redis, and the backend
is already loading from the db
- size stat: ensure size of previous files is added to the stats

* crawler deployment tweaks:
- adjust cpu/mem per browser
- add --headless flag to configmap to use new headless mode by default!
2023-10-05 20:41:18 -07:00
backend.yaml Resource Constraints Cleanup: (fixes #895) (#1019) 2023-08-01 00:11:16 -07:00
configmap.yaml Track pod resource usage, detect OOM crashes, handle auto-scaling (#1235) 2023-10-05 20:41:18 -07:00
frontend.yaml supports overriding the replayweb.page version without having to be r… (#1122) 2023-09-05 20:10:21 -04:00
ingress.yaml ingress: simplify ingress config: (fixes #1135) (#1146) 2023-09-07 09:51:48 -07:00
minio.yaml chart: move minio credentials to separate secret, part of #490 (#1143) 2023-09-06 17:35:30 -07:00
mongo.yaml Resource Constraints Cleanup: (fixes #895) (#1019) 2023-08-01 00:11:16 -07:00
namespaces.yaml
operators.yaml Scheduled Crawl Refactor: Handle via Operator + Add Skipped Crawls on Quota Reached (#1162) 2023-09-12 13:05:43 -07:00
priorities.yaml Operator refactor to control pods + pvcs directly instead of statefulsets (#1149) 2023-09-11 10:38:04 -07:00
role.yaml Track pod resource usage, detect OOM crashes, handle auto-scaling (#1235) 2023-10-05 20:41:18 -07:00
secrets.yaml feat: add SMTP {port, use_tls} config (#1142) 2023-09-08 08:18:36 -07:00
service.yaml Use Shared Services for Crawling, Redis, Profile Browsers (#1088) 2023-08-24 20:08:53 -07:00
signer.yaml Resource Constraints Cleanup: (fixes #895) (#1019) 2023-08-01 00:11:16 -07:00