Supports running QA Runs via the QA API! Builds on top of the `issue-1498-crawl-qa-backend-support` branch, fixes #1498. Also requires Browsertrix Crawler 1.1.0+ (from the webrecorder/browsertrix-crawler#469 branch).

Notable changes:

- `QARun` objects contain info about QA runs, which are crawls performed on data loaded from existing crawls.
- Various crawl db operations can be performed on either the crawl or the `qa.` object, and core crawl fields have been moved to `CoreCrawlable`.
- While running, `QARun` data is stored in a single `qa` object, while finished QA runs are added to the `qaFinished` dictionary on the Crawl. The QA list API returns data from the finished list, sorted by most recent first.
- Includes additional type fixes / type safety, especially around `BaseCrawl` / `Crawl` / `UploadedCrawl` functionality, also adding specific `get_upload()`, `get_basecrawl()`, and `get_crawl()` getters for internal use and `get_crawl_out()` for the API.
- Supports filtering and sorting pages via `qaFilterBy` (`screenshotMatch`, `textMatch`) along with `gt`, `lt`, `gte`, `lte` params to return pages based on QA results (see the sketch after this list).

---------

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
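As a rough illustration of the filtering bullet above, here is a minimal sketch of how `qaFilterBy` plus the `gt` / `gte` / `lt` / `lte` params could translate into a MongoDB-style range query over per-page QA results for a given QA run. The helper name, the `qa.<run id>.<field>` key layout, and the example values are assumptions for illustration, not the actual backend code.

```python
# Hypothetical sketch: build a MongoDB-style filter for pages whose QA score
# for a given QA run falls inside the requested range.
from typing import Optional


def build_qa_page_filter(
    qa_run_id: str,
    qa_filter_by: str,  # "screenshotMatch" or "textMatch"
    gt: Optional[float] = None,
    gte: Optional[float] = None,
    lt: Optional[float] = None,
    lte: Optional[float] = None,
) -> dict:
    """Return a query matching pages whose QA score is in the given range."""
    range_query: dict = {}
    if gt is not None:
        range_query["$gt"] = gt
    if gte is not None:
        range_query["$gte"] = gte
    if lt is not None:
        range_query["$lt"] = lt
    if lte is not None:
        range_query["$lte"] = lte

    # Assumed document layout: per-page QA results keyed by QA run id,
    # e.g. {"qa.<qa_run_id>.screenshotMatch": {"$gte": 0.9}}
    return {f"qa.{qa_run_id}.{qa_filter_by}": range_query}


# Example: pages whose screenshot similarity is at least 0.9
print(build_qa_page_filter("qa-run-123", "screenshotMatch", gte=0.9))
```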
# -------
# CONFIGMAP
# -------
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ name }}
  namespace: {{ namespace }}
  labels:
    crawl: {{ id }}
    role: crawler

data:
  crawl-config.json: {{ qa_source_replay_json | tojson }}
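The ConfigMap above is a Jinja-style template: the placeholders are filled in per QA run, and the `tojson` filter serializes the QA replay source config into the `crawl-config.json` key, which the crawler loads as its crawl config. The sketch below shows roughly how such a template could be rendered; the standalone use of `jinja2.Environment`, the shape of the replay JSON, and all concrete values are illustrative assumptions, not the backend's actual rendering code.

```python
# Minimal rendering sketch for the ConfigMap template above (assumed setup).
import json
from jinja2 import Environment

CONFIGMAP_TEMPLATE = """\
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ name }}
  namespace: {{ namespace }}
  labels:
    crawl: {{ id }}
    role: crawler

data:
  crawl-config.json: {{ qa_source_replay_json | tojson }}
"""

# Hypothetical replay source pointing at the WACZ data of an existing crawl.
replay_json = json.dumps(
    {"source": ["https://example.com/crawls/crawl-abc/data.wacz"]}
)

rendered = Environment().from_string(CONFIGMAP_TEMPLATE).render(
    name="qa-crawl-abc-config",   # made-up values for illustration
    namespace="crawlers",
    id="qa-run-123",
    qa_source_replay_json=replay_json,
)
print(rendered)
```

Because `qa_source_replay_json` is passed as a JSON string, `tojson` emits it as a quoted, escaped scalar, so the rendered document stays valid YAML.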