browsertrix/.github/workflows
Ilya Kreymer fa86555eed
Track pod resource usage, detect OOM crashes, handle auto-scaling (#1235)
* keep track of per-pod status on crawljob:
- crash time and reason
- 'used' vs 'allocated' resources
- 'percent' used / allocated
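
As a rough illustration of the per-pod status described above, the sketch below models the 'used', 'allocated' and 'percent' fields. The class and field names (`PodResources`, `PodStatus`, `crash_time`, `reason`) and the units are hypothetical, chosen for illustration rather than taken from the Browsertrix source.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PodResources:
    """Resource amounts; bytes for memory and millicores for cpu are assumed units."""

    memory: int = 0
    cpu: int = 0


@dataclass
class PodStatus:
    """Hypothetical per-pod status record kept on the crawl job."""

    allocated: PodResources = field(default_factory=PodResources)
    used: PodResources = field(default_factory=PodResources)
    crash_time: Optional[str] = None  # when the pod last crashed, if ever
    reason: Optional[str] = None  # e.g. "oom" for an out-of-memory kill

    def percent(self) -> Dict[str, float]:
        """Return used / allocated as a fraction for each resource."""

        def ratio(used: int, allocated: int) -> float:
            # Guard against pods with no allocation recorded yet.
            return round(used / allocated, 4) if allocated else 0.0

        return {
            "memory": ratio(self.used.memory, self.allocated.memory),
            "cpu": ratio(self.used.cpu, self.allocated.cpu),
        }
```

Storing the ratio alongside the raw values makes it easy to compare pods with different allocations against a single threshold.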

* crawl log errors: log an error when the crawler crashes due to OOM, either to the redis error log
or to the console

* add initial autoscaling support!
- detect if metrics server is available via K8SApi.is_pod_metrics_available()
- if available, use metrics for 'used' fields
- if no metrics are available, set memory used for redis only (using redis APIs)
- allow overriding memory and cpu via newMemory and newCpu settings on pod status
- scale memory / cpu based on newMemory and newCpu setting
- templates: update jinja templates to allow restarting crawler and redis with new resources
- ci: enable metrics-server on k3d, microk8s and nightly k3d ci runs
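
The resource-selection logic in the list above might look roughly like the following sketch. It is not the operator's actual code: the helper names (`used_memory`, `effective_resource`, `needs_restart`), the pod names, and the convention of passing `None` when no metrics server is present are all assumptions made for illustration.

```python
from typing import Dict, Optional


def used_memory(
    pod_name: str,
    metrics: Optional[Dict[str, int]],
    redis_memory: Optional[int],
) -> Optional[int]:
    """Pick the 'used' memory value for a pod.

    `metrics` maps pod name to bytes used, or is None when no metrics
    server is available (assumed convention for this sketch).
    """
    if metrics is not None:
        return metrics.get(pod_name)
    # Without metrics, only redis can report its own memory usage.
    if pod_name == "redis":
        return redis_memory
    return None


def effective_resource(default: int, override: Optional[int]) -> int:
    """Use the per-pod override (e.g. newMemory or newCpu) when one is set."""
    return override if override else default


def needs_restart(running_with: int, default: int, override: Optional[int]) -> bool:
    """A pod must be restarted when its current allocation differs from the target."""
    return running_with != effective_resource(default, override)
```

For example, `needs_restart(1024, 1024, 2048)` is true because an override of 2048 differs from the 1024 the pod is currently running with.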

* roles: clean up unused roles, add permissions for listing metrics

* stats for running crawls:
- update in db via operator
- avoids losing stats if redis pod happens to be done
- tradeoff is more db access in operator, but fewer extra connections to redis, and the backend
already loads from db
- size stat: ensure size of previous files is added to the stats

* crawler deployment tweaks:
- adjust cpu/mem per browser
- add --headless flag to configmap to use new headless mode by default!
2023-10-05 20:41:18 -07:00
ansible-lint.yaml Move DO ansible playbook to new format (#1159) 2023-09-27 22:36:34 -07:00
deploy-dev.yaml Optimize Frontend Image Build on CI (#1057) 2023-08-09 12:06:20 -07:00
docs-publish.yaml
frontend-build-check.yaml Optimize Frontend Image Build on CI (#1057) 2023-08-09 12:06:20 -07:00
k3d-ci.yaml Track pod resource usage, detect OOM crashes, handle auto-scaling (#1235) 2023-10-05 20:41:18 -07:00
k3d-log-ci.yaml
k3d-nightly-ci.yaml Track pod resource usage, detect OOM crashes, handle auto-scaling (#1235) 2023-10-05 20:41:18 -07:00
lint.yaml Improved type checking for backend with mypy (#1174) 2023-09-13 19:40:26 -07:00
microk8s-ci.yaml Add event webhook tests (#1155) 2023-09-12 22:08:40 -07:00
password-check.yaml feat: add pre-commit to check we don't have real passwords in yml files (#990) 2023-07-26 13:29:37 -07:00
project-assign-issue.yml
publish-helm-chart.yaml quick fix: fix typo in publish-helm-chart specifying version 2023-09-05 15:51:10 -04:00
release.yaml Optimize Frontend Image Build on CI (#1057) 2023-08-09 12:06:20 -07:00
ui-tests-playwright.yml ci: make playwright integration tests run only on PRs involving frontend 2023-04-05 09:57:34 -07:00