browsertrix/backend/btrixcloud
Ilya Kreymer fa86555eed
Track pod resource usage, detect OOM crashes, handle auto-scaling (#1235)
* keep track of per-pod status on the crawljob:
- crash time and reason
- 'used' vs 'allocated' resources
- 'percent' used / allocated
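
As an illustration of the 'used' / 'allocated' / 'percent' relationship above, here is a minimal sketch. The class name `ResourceUsage` and its field layout are hypothetical and are not taken from the actual models in this commit; only the three quantities themselves come from the commit message.

```python
from dataclasses import dataclass


@dataclass
class ResourceUsage:
    """Hypothetical per-pod resource record (illustrative names only)."""

    used: int = 0        # amount currently in use, e.g. bytes of memory
    allocated: int = 0   # amount requested for the pod, in the same unit

    @property
    def percent(self) -> float:
        """Fraction of the allocation in use; 0.0 when nothing is allocated."""
        if self.allocated <= 0:
            return 0.0
        return self.used / self.allocated


usage = ResourceUsage(used=512, allocated=1024)
print(usage.percent)
```

Guarding against a zero allocation avoids a division error for pods whose resources have not been set yet.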

* crawl log errors: log an error when the crawler crashes due to OOM, either to the redis error log
or to the console

* add initial autoscaling support!
- detect if metrics server is available via K8SApi.is_pod_metrics_available()
- if available, use metrics for 'used' fields
- if no metrics, set memory used for redis only (using redis apis)
- allow overriding memory and cpu via newMemory and newCpu settings on pod status
- scale memory / cpu based on newMemory and newCpu setting
- templates: update jinja templates to allow restarting crawler and redis with new resources
- ci: enable metrics-server on k3d, microk8s and nightly k3d ci runs
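
The scaling step above can be sketched roughly as follows. This is an assumption-laden illustration, not the operator's actual logic: the function name `next_memory`, the 90% threshold, the 20% growth step, and the optional cap are all hypothetical, since the commit message only says that a `newMemory` / `newCpu` setting on the pod status drives the restart with new resources.

```python
from typing import Optional


def next_memory(
    used: int,
    allocated: int,
    threshold_percent: int = 90,   # hypothetical: scale up at 90% usage
    growth_percent: int = 20,      # hypothetical: grow the request by 20%
    max_memory: Optional[int] = None,
) -> Optional[int]:
    """Return a larger memory request in bytes, or None if no change is needed."""
    if allocated <= 0:
        return None
    # Integer comparison avoids floating point rounding at the threshold.
    if used * 100 < allocated * threshold_percent:
        return None
    new_memory = allocated + allocated * growth_percent // 100
    if max_memory is not None:
        new_memory = min(new_memory, max_memory)
    # Only report a value when it actually increases the allocation.
    return new_memory if new_memory > allocated else None


print(next_memory(used=950, allocated=1000))
print(next_memory(used=500, allocated=1000))
```

A returned value would play the role of the `newMemory` override; `None` means the pod keeps its current allocation.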

* roles: clean up unused roles, add permissions for listing metrics

* stats for running crawls:
- update in db via operator
- avoids losing stats if redis pod happens to be done
- tradeoff: more db access in the operator, but fewer extra connections to redis, and the
backend already loads from the db
- size stat: ensure size of previous files is added to the stats
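
The size-stat point above amounts to adding the bytes of files already written to the size reported for the running crawl. A minimal sketch, assuming a stats dictionary with a `size` key; the function name `merge_size_stat` and the other keys shown are hypothetical:

```python
from typing import Dict, Iterable


def merge_size_stat(stats: Dict[str, int], previous_file_sizes: Iterable[int]) -> Dict[str, int]:
    """Return a copy of stats whose 'size' also counts previously written files."""
    merged = dict(stats)  # copy so the caller's dict is not mutated
    merged["size"] = merged.get("size", 0) + sum(previous_file_sizes)
    return merged


running = {"found": 10, "done": 4, "size": 300}
print(merge_size_stat(running, [1000, 250]))
```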

* crawler deployment tweaks:
- adjust cpu/mem per browser
- add --headless flag to configmap to use new headless mode by default!
2023-10-05 20:41:18 -07:00
migrations optimization: convert all uses of 'async for' to use iterator directly (#1229) 2023-09-28 12:31:08 -07:00
templates Track pod resource usage, detect OOM crashes, handle auto-scaling (#1235) 2023-10-05 20:41:18 -07:00
__init__.py
basecrawls.py Track pod resource usage, detect OOM crashes, handle auto-scaling (#1235) 2023-10-05 20:41:18 -07:00
colls.py Fix: Make Collections Public on Creation (#1213) 2023-09-29 12:08:10 -07:00
crawlconfigs.py Separate seeds into a new endpoints (#1217) 2023-10-02 10:56:12 -07:00
crawlmanager.py feat: use is_bool to check EMAIL_SMTP_USE_TLS (#1231) 2023-10-02 21:29:36 -07:00
crawls.py Track pod resource usage, detect OOM crashes, handle auto-scaling (#1235) 2023-10-05 20:41:18 -07:00
db.py migration improvements: (#1228) 2023-09-28 12:04:19 -07:00
emailsender.py feat: use is_bool to check EMAIL_SMTP_USE_TLS (#1231) 2023-10-02 21:29:36 -07:00
invites.py feat: use is_bool to check EMAIL_SMTP_USE_TLS (#1231) 2023-10-02 21:29:36 -07:00
k8sapi.py Track pod resource usage, detect OOM crashes, handle auto-scaling (#1235) 2023-10-05 20:41:18 -07:00
main_op.py Track pod resource usage, detect OOM crashes, handle auto-scaling (#1235) 2023-10-05 20:41:18 -07:00
main.py feat: use is_bool to check EMAIL_SMTP_USE_TLS (#1231) 2023-10-02 21:29:36 -07:00
models.py Add --failOnFailedSeed checkbox to URL list workflows (#1236) 2023-10-03 18:46:09 -07:00
operator.py Track pod resource usage, detect OOM crashes, handle auto-scaling (#1235) 2023-10-05 20:41:18 -07:00
orgs.py optimization: convert all uses of 'async for' to use iterator directly (#1229) 2023-09-28 12:31:08 -07:00
pagination.py Move pydantic models to separate module + refactor crawl response endpoints to be consistent (#983) 2023-07-20 13:05:33 +02:00
profiles.py Track bytes stored per file type and include in org metrics (#1207) 2023-09-22 12:55:21 -04:00
storages.py Fix: Stream log downloading from WACZ (#1225) 2023-09-28 18:54:52 -07:00
uploads.py API delete endpoint improvements (#1232) 2023-10-03 13:05:00 -07:00
users.py Require that all passwords are between 8 and 64 characters (#1239) 2023-10-03 18:57:46 -07:00
utils.py Track pod resource usage, detect OOM crashes, handle auto-scaling (#1235) 2023-10-05 20:41:18 -07:00
version.py version: bump to 1.7.0-beta.2 2023-10-05 20:33:38 -07:00
webhooks.py Improved type checking for backend with mypy (#1174) 2023-09-13 19:40:26 -07:00
zip.py Fix: Stream log downloading from WACZ (#1225) 2023-09-28 18:54:52 -07:00