- Refactors storage to support replicas + custom storages on the Org.
- There is a default primary + replica storage, while an Org can also have custom primary and replica storages.
- StorageRef object is used to store references to default and custom storage.
- CrawlFile has been updated to contain a StorageRef instead of a def_storage_name, which references either a default storage (in StorageOps) or a custom storage (in Organization).
- There is also a 'replicas' Optional[List[StorageRef]] which contains replicas, if any.
- CrawlFileOut contains a numReplicas for how many replicas exist for a given file.
- Migration: migration 0020 added to migrate existing Orgs, CrawlFile and ProfileFile objects to new storage system (CrawlFile and ProfileFile now extend BaseFile)

Part of #1262

---------

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
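The data model described in the commit message above might be sketched roughly as follows. This is a minimal illustration using plain dataclasses, not the project's actual models: the field names `name`, `custom`, `filename`, and `storage`, and the helper `num_replicas`, are assumptions made for the example. Only `StorageRef`, `BaseFile`, `CrawlFile`, `replicas`, and `numReplicas` come from the commit message.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StorageRef:
    """Reference to a storage, either a default one or a custom one on the Org."""
    name: str             # assumed field: name of the referenced storage
    custom: bool = False  # assumed field: True if the storage is defined on the Org


@dataclass
class BaseFile:
    """Common base for CrawlFile and ProfileFile, per the commit message."""
    filename: str                                # assumed field
    storage: StorageRef                          # replaces the old def_storage_name
    replicas: Optional[List[StorageRef]] = None  # replica locations, if any


@dataclass
class CrawlFile(BaseFile):
    """Crawl output file; fields beyond BaseFile are omitted in this sketch."""


def num_replicas(file: BaseFile) -> int:
    """Hypothetical helper computing the value reported as numReplicas."""
    return len(file.replicas or [])
```

For example, a file stored in a default storage with one replica would be built as `CrawlFile("crawl.wacz", StorageRef("default"), [StorageRef("replica-0")])`, for which `num_replicas` returns 1. The storage names here are placeholders.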
31 lines
585 B
YAML
apiVersion: btrix.cloud/v1
kind: CrawlJob
metadata:
  name: crawljob-{{ id }}
  labels:
    crawl: "{{ id }}"
    role: "job"
    btrix.org: "{{ oid }}"
    btrix.user: "{{ userid }}"
    btrix.storage: "{{ storage_name }}"

spec:
  selector:
    matchLabels:
      crawl: "{{ id }}"

  id: "{{ id }}"
  userid: "{{ userid }}"
  cid: "{{ cid }}"
  oid: "{{ oid }}"
  scale: {{ scale }}
  maxCrawlSize: {{ max_crawl_size }}
  manual: {{ manual }}
  ttlSecondsAfterFinished: 30

  storageName: "{{ storage_name }}"

  {% if expire_time %}
  expireTime: "{{ expire_time }}"
  {% endif %}