browsertrix/chart/app-templates/crawl_job.yaml
Tessa Walsh dc41468daf
Allow users to run crawls with 1 or 2 browser windows (#2627)
Fixes #2425 

## Changed

- Switch the backend to primarily use the number of browser windows rather
than the scale multiplier (including a migration to calculate `browserWindows`
from `scale` for existing workflows and crawls)
- Still support `scale` in addition to `browserWindows` in input models
for creating and updating workflows and re-adjusting live crawl scale
for backwards compatibility
- Add a new `max_browser_windows` value to the Helm chart, but calculate the
value from `max_crawl_scale` as a fallback for users who already have that value
set in local charts
- Rework frontend to allow users to select multiples of
`crawler_browser_instances` or any value below
`crawler_browser_instances` for browser windows. For instance, with
`crawler_browser_instances=4` and `max_browser_windows=8`, the user
would be presented with the following options: 1, 2, 3, 4, 8
- Set the maximum width of the screencast to the image width returned by `message`

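A minimal Python sketch of the window-option rule described above, reproducing the `crawler_browser_instances=4`, `max_browser_windows=8` example. The function names are hypothetical, and two formulas are assumptions rather than code taken from Browsertrix: that one unit of the legacy `scale` equals `crawler_browser_instances` windows, and that the `max_browser_windows` fallback is `max_crawl_scale * crawler_browser_instances`.

```python
"""Hypothetical sketch of the browser-window rules from this change.

Assumptions (not taken from the Browsertrix source):
- one unit of legacy ``scale`` equals ``crawler_browser_instances`` windows;
- ``max_browser_windows`` falls back to ``max_crawl_scale * crawler_browser_instances``.
"""

from typing import List, Optional


def scale_to_browser_windows(scale: int, crawler_browser_instances: int) -> int:
    """Convert a legacy scale multiplier into a window count (assumed formula)."""
    return scale * crawler_browser_instances


def resolve_max_browser_windows(
    max_browser_windows: Optional[int],
    max_crawl_scale: int,
    crawler_browser_instances: int,
) -> int:
    """Prefer the new chart value; otherwise derive it from max_crawl_scale (assumed)."""
    if max_browser_windows is not None:
        return max_browser_windows
    return scale_to_browser_windows(max_crawl_scale, crawler_browser_instances)


def browser_window_options(crawler_browser_instances: int, max_browser_windows: int) -> List[int]:
    """Every value below crawler_browser_instances, then its multiples up to the max."""
    # Values 1 .. instances-1, but never above the configured maximum.
    below = list(range(1, min(crawler_browser_instances, max_browser_windows + 1)))
    # Whole multiples of the per-pod instance count, capped at the maximum.
    multiples = list(
        range(crawler_browser_instances, max_browser_windows + 1, crawler_browser_instances)
    )
    return below + multiples


print(browser_window_options(4, 8))  # [1, 2, 3, 4, 8]
```

Splitting the options into "below one pod" and "whole multiples of a pod" keeps each offered value mappable to either a single partially used crawler pod or an integer number of full pods.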
---------

Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: sua yoo <sua@suayoo.com>
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
2025-06-03 13:37:30 -07:00

apiVersion: btrix.cloud/v1
kind: CrawlJob
metadata:
  name: crawljob-{{ id }}
  labels:
    crawl: "{{ id }}"
    role: {{ "qa-job" if qa_source else "job" }}
    btrix.org: "{{ oid }}"
    btrix.user: "{{ userid }}"
    btrix.storage: "{{ storage_name }}"
spec:
  selector:
    matchLabels:
      crawl: "{{ id }}"
  id: "{{ id }}"
  userid: "{{ userid }}"
  cid: "{{ cid }}"
  oid: "{{ oid }}"
  scale: {{ scale }}
  browserWindows: {{ browser_windows }}
  profile_filename: "{{ profile_filename }}"
  storage_filename: "{{ storage_filename }}"
  maxCrawlSize: {{ max_crawl_size if not qa_source else 0 }}
  timeout: {{ timeout if not qa_source else 0 }}
  qaSourceCrawlId: "{{ qa_source }}"
  manual: {{ manual }}
  crawlerChannel: "{{ crawler_channel }}"
  ttlSecondsAfterFinished: {{ 30 if not qa_source else 0 }}
  warcPrefix: "{{ warc_prefix }}"
  storageName: "{{ storage_name }}"
  proxyId: "{{ proxy_id }}"
  pausedAt: "{{ pausedAt }}"
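The template switches several values on `qa_source`: QA runs get the `qa-job` role label and have `maxCrawlSize`, `timeout` and `ttlSecondsAfterFinished` forced to 0. The following is a hypothetical Python helper (`qa_dependent_fields` is not part of Browsertrix) that mirrors those Jinja conditionals. It relies on Jinja's `if` using Python truthiness, so an empty string or `None` counts as "not a QA run".

```python
"""Hypothetical mirror of the qa_source conditionals in the CrawlJob template."""

from typing import Any, Dict, Optional


def qa_dependent_fields(
    qa_source: Optional[str], max_crawl_size: int, timeout: int
) -> Dict[str, Any]:
    """Return the label/spec values that the template changes for QA runs."""
    is_qa = bool(qa_source)  # same truthiness Jinja applies to `if qa_source`
    return {
        "role": "qa-job" if is_qa else "job",             # metadata.labels.role
        "maxCrawlSize": 0 if is_qa else max_crawl_size,   # size limit disabled for QA
        "timeout": 0 if is_qa else timeout,               # time limit disabled for QA
        "ttlSecondsAfterFinished": 0 if is_qa else 30,    # cleanup delay after finishing
    }


print(qa_dependent_fields(None, 1000, 3600)["role"])  # job
print(qa_dependent_fields("some-crawl-id", 1000, 3600)["timeout"])  # 0
```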