Backend:
- add 'maxCrawlSize' to models and crawljob spec
- add 'MAX_CRAWL_SIZE' to configmap
- add maxCrawlSize to new crawlconfig + update APIs
- operator: gracefully stop crawl if current size (from stats) exceeds maxCrawlSize
- tests: add max crawl size tests

Frontend:
- Add Max Crawl Size text box to Limits tab
- Users enter max crawl size in GB, converted to bytes
- Add BYTES_PER_GB as a constant for converting to bytes
- docs: add Crawl Size Limit to user guide workflow setup section

Operator Refactor:
- use 'status.stopping' instead of 'crawl.stopping' to indicate the crawl is being stopped, as changing the latter has no effect in the operator
- add is_crawl_stopping() to return whether the crawl is being stopped, based on crawl.stopping or the size or time limit being reached (see the sketch after this list)
- crawlerjob status: store raw byte size under 'size' and human-readable size under 'sizeHuman' for clarity
- size stat always exists (defaults to 0), so remove the unneeded conditional

Charts:
- subchart: update crawlerjob crd in btrix-crds to show status.stopping instead of spec.stopping
- subchart: show 'sizeHuman' property instead of 'size'
- bump subchart version to 0.1.1

---------

Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
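A minimal sketch of the new stop check, assuming a crawl spec object with `stopping`, `max_crawl_size`, and `expire_time` attributes (the attribute names and exact comparisons are assumptions; only the three stop conditions come from the changes above):

```python
from datetime import datetime, timezone

def is_crawl_stopping(crawl, size: int) -> bool:
    """Return True if the crawl should be gracefully stopped: an explicit
    stop was requested, the crawl size (in bytes, from the crawler stats)
    reached maxCrawlSize, or the crawl time limit expired."""
    # explicit stop requested via the API
    if crawl.stopping:
        return True

    # size limit reached (maxCrawlSize is in bytes; falsy means no limit)
    if crawl.max_crawl_size and size >= crawl.max_crawl_size:
        return True

    # time limit reached (expire_time is set when a crawl timeout is configured)
    if crawl.expire_time and datetime.now(timezone.utc) > crawl.expire_time:
        return True

    return False
```

The result is recorded in `status.stopping` on the CrawlJob, which the operator and the btrix-crds subchart now read instead of `spec.stopping`. The updated crawl job template, with the new `maxCrawlSize` field passed through from the config: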
```yaml
apiVersion: btrix.cloud/v1
kind: CrawlJob
metadata:
  name: crawljob-{{ id }}
  labels:
    crawl: "{{ id }}"
    role: "job"
    oid: "{{ oid }}"
    userid: "{{ userid }}"

spec:
  selector:
    matchLabels:
      crawl: "{{ id }}"

  id: "{{ id }}"
  userid: "{{ userid }}"
  cid: "{{ cid }}"
  oid: "{{ oid }}"
  scale: {{ scale }}
  maxCrawlSize: {{ max_crawl_size }}
  ttlSecondsAfterFinished: 30

  {% if expire_time %}
  expireTime: "{{ expire_time }}"
  {% endif %}
```
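For reference, a hypothetical rendering of this template with Jinja2. The values, the `crawl_job.yaml` filename, and the GB-to-bytes arithmetic are illustrative; per the changes above, the frontend converts the user's GB input to bytes via its `BYTES_PER_GB` constant before the value reaches the backend:

```python
from jinja2 import Template

# Illustrative only: render the CrawlJob template with a size limit set.
with open("crawl_job.yaml") as fh:
    template = Template(fh.read())

print(template.render(
    id="a1b2c3",
    cid="cfg-1",
    oid="org-1",
    userid="user-1",
    scale=1,
    max_crawl_size=10 * 10**9,  # 10 GB in bytes (assuming BYTES_PER_GB = 10**9)
    expire_time=None,           # falsy, so the expireTime block is omitted
))
```

A falsy `max_crawl_size` leaves the limit disabled, matching the behavior of the stop-check sketch above.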