- add 'pause' crawl state (fixes #2567)
- gracefully shut down crawler pods, and then redis pod when paused
- crawler uploads WACZ before shutting down (dependent on webrecorder/browsertrix-crawler#824, supported in 1.6.1+)
- add 'paused_at' on crawl spec to indicate when crawl is paused
- support max pause time limit, after which crawl becomes automatically stopped
- add 'stopped_pause_expired' when pause automatically expires and crawl is stopped
- /crawl/<id>/{pause,resume} apis to toggle 'paused' on crawl spec
- ui: add pause/resume button, paused state (partially addresses #2568)
- ui: add pausing/resuming derivative states when crawl is running and pausing, or paused and not pausing (partially addresses #2569)
- Designed to work with crawler 1.6.1+, which supports pausing + uploading on pause

Work on #2566, Fixes #2576

---------

Co-authored-by: sua yoo <sua@webrecorder.org>
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
Co-authored-by: sua yoo <sua@suayoo.com>
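The max pause time limit described in the commit message can be illustrated with a small sketch. This is not the operator's actual code: the function name `state_after_pause_check`, the `MAX_PAUSE_SECONDS` constant, and its default value are hypothetical. Only the state names `running`, `paused`, and `stopped_pause_expired` come from the commit message.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical limit: the real setting name and default are not given in the source.
MAX_PAUSE_SECONDS = 7 * 24 * 3600


def state_after_pause_check(
    paused_at: Optional[datetime],
    now: datetime,
    max_pause_seconds: int = MAX_PAUSE_SECONDS,
) -> str:
    """Return the crawl state implied by the pause timestamp.

    paused_at is None when the crawl is not paused; otherwise it is the
    time at which the pause was requested.
    """
    if paused_at is None:
        return "running"
    # Once the pause has lasted at least the limit, the crawl is stopped.
    if now - paused_at >= timedelta(seconds=max_pause_seconds):
        return "stopped_pause_expired"
    return "paused"


if __name__ == "__main__":
    paused = datetime(2025, 1, 1, tzinfo=timezone.utc)
    later = paused + timedelta(hours=1)
    print(state_after_pause_check(paused, later))
```

Passing `now` explicitly rather than reading the clock inside the function keeps the check deterministic and easy to test.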
41 lines
915 B
YAML
apiVersion: btrix.cloud/v1
kind: CrawlJob
metadata:
  name: crawljob-{{ id }}
  labels:
    crawl: "{{ id }}"
    role: {{ "qa-job" if qa_source else "job" }}
    btrix.org: "{{ oid }}"
    btrix.user: "{{ userid }}"
    btrix.storage: "{{ storage_name }}"

spec:
  selector:
    matchLabels:
      crawl: "{{ id }}"

  id: "{{ id }}"
  userid: "{{ userid }}"
  cid: "{{ cid }}"
  oid: "{{ oid }}"
  scale: {{ scale }}

  profile_filename: "{{ profile_filename }}"
  storage_filename: "{{ storage_filename }}"

  maxCrawlSize: {{ max_crawl_size if not qa_source else 0 }}
  timeout: {{ timeout if not qa_source else 0 }}
  qaSourceCrawlId: "{{ qa_source }}"

  manual: {{ manual }}
  crawlerChannel: "{{ crawler_channel }}"
  ttlSecondsAfterFinished: {{ 30 if not qa_source else 0 }}
  warcPrefix: "{{ warc_prefix }}"

  storageName: "{{ storage_name }}"

  proxyId: "{{ proxy_id }}"

  pausedAt: "{{ pausedAt }}"
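The inline conditionals in the template behave like Python conditional expressions, so their effect on QA jobs versus regular crawl jobs can be sketched in plain Python. The helper `crawl_job_values` is hypothetical and exists only for illustration. It mirrors the four conditional fields above and does not render the template itself.

```python
def crawl_job_values(qa_source: str, max_crawl_size: int, timeout: int) -> dict:
    """Mirror the template's conditional fields for a given qa_source.

    An empty qa_source means a regular crawl job; a non-empty one means
    a QA job that analyzes the crawl with that id.
    """
    return {
        # QA jobs get a distinct role label.
        "role": "qa-job" if qa_source else "job",
        # Size and time limits are zeroed out for QA jobs.
        "maxCrawlSize": max_crawl_size if not qa_source else 0,
        "timeout": timeout if not qa_source else 0,
        # Regular jobs carry a 30-second TTL after finishing; QA jobs use 0.
        "ttlSecondsAfterFinished": 30 if not qa_source else 0,
    }


if __name__ == "__main__":
    print(crawl_job_values("", 1_000_000, 3600))
    print(crawl_job_values("some-crawl-id", 1_000_000, 3600))
```

For a regular job the configured `max_crawl_size` and `timeout` pass through unchanged, while any non-empty `qa_source` sets both to 0 along with `ttlSecondsAfterFinished`.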