browsertrix/.github/workflows/k3d-ci.yaml
Ilya Kreymer fa86555eed
Track pod resource usage, detect OOM crashes, handle auto-scaling (#1235)
* keep track of per pod status on crawljob:
- crash time and reason
- 'used' vs 'allocated' resources 
- 'percent' used / allocated

* crawl log errors: log error when crawler crashes via OOM, either via redis error log
or to console

* add initial autoscaling support!
- detect if metrics server is available via K8SApi.is_pod_metrics_available()
- if available, use metrics for 'used' fields
- if no metrics, set memory used for redis only (using redis apis)
- allow overriding memory and cpu via newMemory and newCpu settings on pod status
- scale memory / cpu based on newMemory and newCpu setting
- templates: update jinja templates to allow restarting crawler and redis with new resources
- ci: enable metrics-server on k3d, microk8s and nightly k3d ci runs

* roles: cleanup unused roles, add permissions for listing metrics

* stats for running crawls:
- update in db via operator
- avoids losing stats if redis pod happens to be done
- tradeoff is more db access in operator, but fewer extra connections to redis, and the
backend already loads from db
- size stat: ensure size of previous files is added to the stats

* crawler deployment tweaks:
- adjust cpu/mem per browser
- add --headless flag to configmap to use new headless mode by default!
2023-10-05 20:41:18 -07:00


name: Cluster Run (K3d)

on:
  push:
    paths:
      - 'backend/**'
      - 'chart/**'
  pull_request:
    paths:
      - 'backend/**'
      - 'chart/**'

env:
  ECHO_SERVER_HOST_URL: http://host.k3d.internal:18080

jobs:
  btrix-k3d-test:
    runs-on: ubuntu-latest
    steps:
      - name: Create k3d Cluster
        uses: AbsaOSS/k3d-action@v2
        with:
          cluster-name: btrix-1
          args: >-
            -p "30870:30870@agent:0:direct"
            --agents 1
            --no-lb
            --k3s-arg "--disable=traefik,servicelb@server:*"

      - name: Checkout
        uses: actions/checkout@v3

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
        with:
          driver-opts: network=host

      - name: Build Backend
        uses: docker/build-push-action@v3
        with:
          context: backend
          load: true
          #outputs: type=tar,dest=backend.tar
          tags: webrecorder/browsertrix-backend:latest
          cache-from: type=gha,scope=backend
          cache-to: type=gha,scope=backend,mode=max

      - name: Build Frontend
        uses: docker/build-push-action@v3
        with:
          context: frontend
          load: true
          #outputs: type=tar,dest=frontend.tar
          tags: webrecorder/browsertrix-frontend:latest
          cache-from: type=gha,scope=frontend
          cache-to: type=gha,scope=frontend,mode=max

      - name: 'Import Images'
        run: |
          k3d image import webrecorder/browsertrix-backend:latest -m direct -c btrix-1 --verbose
          k3d image import webrecorder/browsertrix-frontend:latest -m direct -c btrix-1 --verbose

      - name: Install Kubectl
        uses: azure/setup-kubectl@v3

      - name: Install Helm
        uses: azure/setup-helm@v3
        with:
          version: 3.10.2

      - name: Start Cluster with Helm
        run: |
          helm upgrade --install -f ./chart/values.yaml -f ./chart/test/test.yaml btrix ./chart/

      - name: Install Python
        uses: actions/setup-python@v3
        with:
          python-version: '3.9'

      - name: Install Python Libs
        run: pip install pytest requests

      - name: Wait for all pods to be ready
        run: kubectl wait --for=condition=ready pod --all --timeout=240s

      - name: Run Tests
        run: pytest -s -vv ./backend/test/test_*.py

      - name: Print Backend Logs (API)
        if: ${{ failure() }}
        run: kubectl logs svc/browsertrix-cloud-backend -c api

      - name: Print Backend Logs (Operator)
        if: ${{ failure() }}
        run: kubectl logs svc/browsertrix-cloud-backend -c op