browsertrix

Ilya Kreymer 4f676e4e82 QA Runs Initial Backend Implementation (#1586)

Supports running QA Runs via the QA API!

Builds on top of the `issue-1498-crawl-qa-backend-support` branch, fixes #1498.

Also requires the latest Browsertrix Crawler 1.1.0+ (from webrecorder/browsertrix-crawler#469 branch).

Notable changes:
- QARun objects contain info about QA runs, which are crawls performed on data loaded from existing crawls.
- Various crawl db operations can be performed on either the crawl or `qa.` object, and core crawl fields have been moved to CoreCrawlable.
- While running, `QARun` data is stored in a single `qa` object, while finished QA runs are added to the `qaFinished` dictionary on the Crawl. The QA list API returns data from the finished list, sorted by most recent first.
- Includes additional type fixes / type safety, especially around BaseCrawl / Crawl / UploadedCrawl functionality, also creating specific get_upload(), get_basecrawl(), get_crawl() getters for internal use and get_crawl_out() for the API.
- Supports filtering and sorting pages via `qaFilterBy` (screenshotMatch, textMatch) along with `gt`, `lt`, `gte`, `lte` params to return pages based on QA results.

---------

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>

2024-03-20 22:42:16 -07:00
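The `qaFilterBy` filtering mentioned in the commit message can be illustrated with a small standalone sketch. This is not the code from `btrixcloud`: the function name `filter_pages_by_qa`, the in-memory page dictionaries, and the `qa` key holding per-page scores are all assumptions made for illustration. Only the field names (`screenshotMatch`, `textMatch`) and the bound names (`gt`, `lt`, `gte`, `lte`) come from the commit message.

```python
from typing import Any, Dict, List, Optional

# Score fields named in the commit message.
ALLOWED_FIELDS = {"screenshotMatch", "textMatch"}


def filter_pages_by_qa(
    pages: List[Dict[str, Any]],
    qa_filter_by: str,
    gt: Optional[float] = None,
    gte: Optional[float] = None,
    lt: Optional[float] = None,
    lte: Optional[float] = None,
) -> List[Dict[str, Any]]:
    """Hypothetical helper: keep pages whose QA score satisfies every given bound."""
    if qa_filter_by not in ALLOWED_FIELDS:
        raise ValueError(f"unsupported qaFilterBy value: {qa_filter_by}")

    matched = []
    for page in pages:
        # Assumed layout: each page carries a "qa" dict of score name -> float.
        score = page.get("qa", {}).get(qa_filter_by)
        if score is None:
            continue  # page has no QA result for this field
        if gt is not None and not score > gt:
            continue
        if gte is not None and not score >= gte:
            continue
        if lt is not None and not score < lt:
            continue
        if lte is not None and not score <= lte:
            continue
        matched.append(page)
    return matched


# Made-up sample data, for illustration only.
pages = [
    {"url": "https://example.com/a", "qa": {"screenshotMatch": 0.95, "textMatch": 0.60}},
    {"url": "https://example.com/b", "qa": {"screenshotMatch": 0.40, "textMatch": 0.90}},
    {"url": "https://example.com/c", "qa": {}},
]

low_screenshot = filter_pages_by_qa(pages, "screenshotMatch", lt=0.5)
```

In this sketch, bounds that are left as `None` are ignored, so a caller can combine, say, `gte` and `lte` to select a score range. The real backend presumably applies the equivalent comparison in its database query rather than in Python.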
..
btrixcloud	QA Runs Initial Backend Implementation (#1586)	2024-03-20 22:42:16 -07:00
test	QA Runs Initial Backend Implementation (#1586)	2024-03-20 22:42:16 -07:00
test_nightly	Add extra and gifted execution minutes (#1361)	2023-12-07 14:34:37 -05:00
.pylintrc	quickfix: pydantic / lint fix (#452)	2023-01-10 18:54:11 -08:00
Dockerfile	Backend mem usage fix - use fixed MOTOR_MAX_WORKERS + switch to gunicorn (#1468)	2024-01-16 15:32:42 -08:00
mypy.ini	Support multiple crawler versions (#1420)	2024-01-16 15:32:12 -08:00
requirements.txt	Add endpoints to read pages from older crawl WACZs into database (#1562)	2024-03-19 14:14:21 -07:00
test-requirements.txt	Add slugs to org backend (#1250)	2023-10-10 18:30:09 -07:00