browsertrix/backend/btrixcloud/templates/crawl_job.yaml
Ilya Kreymer 6dc452ebad
Storage Refactor: Replication + Custom Storage Support (#1296)
- Refactors storage to support replicas + custom storages on the Org.
- There is a default primary + replica storage, while an Org can also have
primary and replica storages.
- StorageRef object is used to store references to default and custom
storage.

- CrawlFile has been updated to contain a StorageRef instead of a
def_storage_name, which references
either a default storage (in StorageOps) or custom storage (in
Organization)
- There is also a 'replicas' Optional[List[StorageRef]] which contains
replicas, if any.
- CrawlFileOut contain a numReplicas for how many replicas exist for
a given file.
- Migration: migration 0020 added to migrate existing Orgs, CrawlFile and ProfileFile objects to new storage system (CrawlFile and ProfileFile now extend BaseFile)


Part of #1262

---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2023-10-26 21:44:09 -07:00

31 lines
585 B
YAML

apiVersion: btrix.cloud/v1
kind: CrawlJob
metadata:
name: crawljob-{{ id }}
labels:
crawl: "{{ id }}"
role: "job"
btrix.org: "{{ oid }}"
btrix.user: "{{ userid }}"
btrix.storage: "{{ storage_name }}"
spec:
selector:
matchLabels:
crawl: "{{ id }}"
id: "{{ id }}"
userid: "{{ userid }}"
cid: "{{ cid }}"
oid: "{{ oid }}"
scale: {{ scale }}
maxCrawlSize: {{ max_crawl_size }}
manual: {{ manual }}
ttlSecondsAfterFinished: 30
storageName: "{{ storage_name }}"
{% if expire_time %}
expireTime: "{{ expire_time }}"
{% endif %}