Supports running QA runs via the QA API!

Builds on top of the `issue-1498-crawl-qa-backend-support` branch; fixes #1498.

Also requires the latest Browsertrix Crawler 1.1.0+ (from the webrecorder/browsertrix-crawler#469 branch).

Notable changes:
- `QARun` objects contain info about QA runs, which are crawls performed on data loaded from existing crawls.
- Various crawl db operations can be performed on either the crawl or the `qa.` object, and core crawl fields have been moved to `CoreCrawlable`.
- While running, `QARun` data is stored in a single `qa` object, while finished QA runs are added to the `qaFinished` dictionary on the Crawl. The QA list API returns data from the finished list, sorted most recent first.
- Includes additional type fixes / type safety, especially around BaseCrawl / Crawl / UploadedCrawl functionality, and adds specific `get_upload()`, `get_basecrawl()`, and `get_crawl()` getters for internal use and `get_crawl_out()` for the API.
- Supports filtering and sorting pages via `qaFilterBy` (`screenshotMatch`, `textMatch`) along with `gt`, `lt`, `gte`, `lte` params to return pages based on QA results.

---------

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
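
The `qaFilterBy` plus `gt` / `lt` / `gte` / `lte` filtering described above could be sketched roughly as follows. This is only an illustration, not the backend's actual code: the function name `build_qa_page_filter`, the `qa.<run id>.<field>` storage path, and the use of MongoDB-style comparison operators are all assumptions made for the example.

```python
from typing import Any, Dict, Optional

# Fields named in the commit message as valid qaFilterBy values.
ALLOWED_QA_FIELDS = {"screenshotMatch", "textMatch"}


def build_qa_page_filter(
    qa_run_id: str,
    qa_filter_by: str,
    gt: Optional[float] = None,
    gte: Optional[float] = None,
    lt: Optional[float] = None,
    lte: Optional[float] = None,
) -> Dict[str, Any]:
    """Build a MongoDB-style query fragment for pages of one QA run.

    Hypothetical helper: the field path below is an assumption.
    """
    if qa_filter_by not in ALLOWED_QA_FIELDS:
        raise ValueError(f"unsupported qaFilterBy value: {qa_filter_by}")

    # Collect only the bounds the caller actually supplied.
    bounds: Dict[str, float] = {}
    for operator, value in (("$gt", gt), ("$gte", gte), ("$lt", lt), ("$lte", lte)):
        if value is not None:
            bounds[operator] = value

    if not bounds:
        # No range given: nothing to filter on.
        return {}

    # Assumed storage layout: per-run QA results nested under "qa".
    return {f"qa.{qa_run_id}.{qa_filter_by}": bounds}
```

For example, `build_qa_page_filter("run1", "screenshotMatch", gte=0.5, lt=0.9)` would yield a single-key query restricting `screenshotMatch` to the half-open range from 0.5 up to 0.9 for that run.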
		
			
				
	
	
		
34 lines, 765 B, YAML
apiVersion: btrix.cloud/v1
kind: CrawlJob
metadata:
  name: crawljob-{{ id }}
  labels:
    crawl: "{{ id }}"
    role: {{ "qa-job" if qa_source else "job" }}
    btrix.org: "{{ oid }}"
    btrix.user: "{{ userid }}"
    btrix.storage: "{{ storage_name }}"

spec:
  selector:
    matchLabels:
      crawl: "{{ id }}"

  id: "{{ id }}"
  userid: "{{ userid }}"
  cid: "{{ cid }}"
  oid: "{{ oid }}"
  scale: {{ scale }}

  maxCrawlSize: {{ max_crawl_size if not qa_source else 0 }}
  timeout: {{ timeout if not qa_source else 0 }}
  qaSourceCrawlId: "{{ qa_source }}"

  manual: {{ manual }}
  crawlerChannel: "{{ crawler_channel }}"
  ttlSecondsAfterFinished: {{ 30 if not qa_source else 0 }}
  warcPrefix: "{{ warc_prefix }}"

  storageName: "{{ storage_name }}"
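
The conditional expressions in the template above switch several fields depending on whether `qa_source` is set. A minimal plain-Python sketch of the same branching is shown below. It mirrors only the inline `if` expressions and does not render the template; the function name `crawljob_conditional_fields` and its default argument values are hypothetical.

```python
from typing import Any, Dict


def crawljob_conditional_fields(
    crawl_id: str,
    qa_source: str = "",
    max_crawl_size: int = 0,
    timeout: int = 0,
) -> Dict[str, Any]:
    """Mirror the template's qa_source-dependent values (illustrative only)."""
    # A non-empty qa_source marks the job as a QA run over an existing crawl.
    is_qa = bool(qa_source)
    return {
        "name": f"crawljob-{crawl_id}",
        # role: {{ "qa-job" if qa_source else "job" }}
        "role": "qa-job" if is_qa else "job",
        # maxCrawlSize: {{ max_crawl_size if not qa_source else 0 }}
        "maxCrawlSize": 0 if is_qa else max_crawl_size,
        # timeout: {{ timeout if not qa_source else 0 }}
        "timeout": 0 if is_qa else timeout,
        # ttlSecondsAfterFinished: {{ 30 if not qa_source else 0 }}
        "ttlSecondsAfterFinished": 0 if is_qa else 30,
        "qaSourceCrawlId": qa_source,
    }
```

With these inputs, a regular crawl keeps its configured size and time limits and a 30-second TTL, while a QA job gets the `qa-job` role with all three values set to 0.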