browsertrix/backend/btrixcloud/templates/crawler.yaml
Ilya Kreymer ad9bca2e92
Operator refactor to control pods + pvcs directly instead of statefulsets (#1149)
- Ability for pod to be Completed, unlike in Statefulset - eg. if 3 pods are running and first one finishes, all 3 must be running until all 3 are done. With this setup, the first finished pod can remain in Completed state.
- Fixed shutdown order - crawler pods now correctly shutdown first before redis pods, by switching to background deletion.
- Pod priority decreases with scale: 1st instance of a new crawl can preempt 3rd or 2nd instance of another crawl
- Create priority classes upto 'max_crawl_scale, configured in values.yaml
- Improved scale change reconciliation: if increasing scale, immediately scale up. If decreasing scale,
graceful stop scaled-down instance to complete via redis 'stopone' key, wait until they exit with Completed state
before adjust status.scale / removing scaled down pods. Ensures unaccepted interrupts don't cause scaled down data to be deleted.
- Redis pod remains inactive until crawler is first active, or after no crawl pods are active for 60 seconds
- Configurable Redis storage with 'redis_storage' value, set to 3Gi by default
- CrawlJob deletion starts as soon as post-finish crawl operations are run
- Post-crawl operations get their own redis instance, since one during response is being cleaned up in finalizer
- Finalizer ignores request with incorrect state (returns 400 if reported as not finished while crawl is finished)
- Current resource usage added to status
- Profile browser: also manage single pod directly without statefulset for consistency.
- Restart pods via restartTime value: if spec.restartTime != status.restartTime, clear out pods and update status.restartTime (using OnDelete policy to avoid recreate loops in edge cases).
- Update to latest metacontroller (v4.11.0)
- Add --restartOnError flag for crawler (for browsertrix-crawler 0.11.0)
- Failed crawl logging: dd 'fail_crawl()' to be used for failing a crawl, which prints logs for default container (if enabled) as well as pod status
- tests: check other finished states to avoid stuck in infinite loop if crawl fails
- tests: disable disk utilization check, which adds unpredictability to crawl testing!
fixes #1147 

---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2023-09-11 10:38:04 -07:00

172 lines
3.6 KiB
YAML

# -------
# PVC
# -------
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: {{ name }}
namespace: {{ namespace }}
labels:
crawl: {{ id }}
role: crawler
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: {{ crawler_storage }}
{% if volume_storage_class %}
storageClassName: {{ volume_storage_class }}
{% endif %}
# -------
# CRAWLER
# -------
{% if not force_restart %}
---
apiVersion: v1
kind: Pod
metadata:
name: {{ name }}
namespace: {{ namespace }}
labels:
crawl: {{ id }}
role: crawler
spec:
hostname: {{ name }}
subdomain: crawler
{% if priorityClassName %}
priorityClassName: {{ priorityClassName }}
{% endif %}
restartPolicy: OnFailure
terminationGracePeriodSeconds: {{ termination_grace_secs }}
volumes:
- name: crawl-config
configMap:
name: crawl-config-{{ cid }}
- name: crawl-data
persistentVolumeClaim:
claimName: {{ name }}
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 1
preference:
matchExpressions:
- key: nodeType
operator: In
values:
- "{{ crawler_node_type }}"
podAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 2
podAffinityTerm:
topologyKey: "failure-domain.beta.kubernetes.io/zone"
labelSelector:
matchLabels:
crawl: {{ id }}
tolerations:
- key: nodeType
operator: Equal
value: crawling
effect: NoSchedule
- key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
effect: NoExecute
- key: node.kubernetes.io/unreachable
operator: Exists
effect: NoExecute
tolerationSeconds: 300
containers:
- name: crawler
image: {{ crawler_image }}
imagePullPolicy: {{ crawler_image_pull_policy }}
command:
- crawl
- --config
- /tmp/crawl-config.json
- --redisStoreUrl
- {{ redis_url }}
{%- if profile_filename %}
- --profile
- "@profiles/{{ profile_filename }}"
{%- endif %}
volumeMounts:
- name: crawl-config
mountPath: /tmp/crawl-config.json
subPath: crawl-config.json
readOnly: True
- name: crawl-data
mountPath: /crawls
envFrom:
- configMapRef:
name: shared-crawler-config
- secretRef:
name: storage-{{ storage_name }}
env:
- name: CRAWL_ID
value: "{{ id }}"
- name: WEBHOOK_URL
value: "{{ redis_url }}/crawls-done"
- name: STORE_PATH
value: "{{ store_path }}"
- name: STORE_FILENAME
value: "{{ store_filename }}"
- name: STORE_USER
value: "{{ userid }}"
{% if crawler_socks_proxy_host %}
- name: SOCKS_HOST
value: "{{ crawler_socks_proxy_host }}"
{% if crawler_socks_proxy_port %}
- name: SOCKS_PORT
value: "{{ crawler_socks_proxy_port }}"
{% endif %}
{% endif %}
resources:
limits:
memory: "{{ crawler_memory }}"
requests:
cpu: "{{ crawler_cpu }}"
memory: "{{ crawler_memory }}"
{% if crawler_liveness_port and crawler_liveness_port != '0' %}
livenessProbe:
httpGet:
path: /healthz
port: {{ crawler_liveness_port }}
initialDelaySeconds: 15
periodSeconds: 120
failureThreshold: 3
{% endif %}
{% endif %}