browsertrix

Author	SHA1	Message	Date
Ilya Kreymer	5c08c9679c	fix issue with incorrect number of total pages if any of the seeds is a redirect (#1649 ) Following changes in webrecorder/browsertrix-crawler#475, webrecorder/browsertrix-crawler#509, the crawler adds a redirected seed to the seen list. To account for this, it needs to be subtracted to get the total page count.	2024-04-04 15:55:44 -07:00
sua yoo	83c9203a11	Initial QA Review UI! (#1624 ) QA Details page: - Enables QA tab with ability to start automated analysis QA Run + view a and manual review status - Pages listed with review status + overall crawl review status shown on QA details (relates to #1508) - Initial placeholder for QA run analytics (part of #1589) - Addresses a good deal of #1477 Automated Analysis QA in Review Mode: - Ability to select from multiple analysis QA runs / view QA runs in QA details - Shows analysis screenshot, text and resources compare and replay tabs (fixes #1496) - Sorting by worst screenshot / worst text score for each QA run - Includes pages sidebar with screenshot/text/resource compare results (fixes #1497) Manual Review QA in Review Mode: - Per-page replay available as separate tab (fixes #1499) - Supports thumbs up, thumbs down, notes for each page - Supports entering review status approval (good/acceptable/bad can be entered when finishing review --------- Co-authored-by: Emma Segal-Grossman <hi@emma.cafe> Co-authored-by: Ilya Kreymer <ikreymer@gmail.com> Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>	2024-04-04 15:09:52 -07:00
Ilya Kreymer	ffc4b5b58f	operator state fixes (follow up fomr #1639 ) (#1640 ) - increase time for going to waiting_capacity from starting to 150 seconds - relax requirement for state transitions, allow complete from waiting - additional type safety for different states, ensure mark_finished() only called with non-running states, add `Literal` types for all the state types.	2024-03-29 15:12:16 -07:00
Ilya Kreymer	3438133fcb	Crawler pod memory padding + auto scaling (#1631 ) - set memory limit to 1.2x memory request to provide extra padding and avoid OOM - attempt to resize crawler pods by 1.2x when exceeding 90% of available memory - do a 'soft OOM' (send extra SIGTERM) to pod when reaching 100% of requested memory, resulting in faster graceful restart, but avoiding a system-instant OOM Kill - Fixes #1632 --------- Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2024-03-28 16:39:00 -07:00
Ilya Kreymer	4f676e4e82	QA Runs Initial Backend Implementation (#1586 ) Supports running QA Runs via the QA API! Builds on top of the `issue-1498-crawl-qa-backend-support` branch, fixes #1498 Also requires the latest Browsertrix Crawler 1.1.0+ (from webrecorder/browsertrix-crawler#469 branch) Notable changes: - QARun objects contain info about QA runs, which are crawls performed on data loaded from existing crawls. - Various crawl db operations can be performed on either the crawl or `qa.` object, and core crawl fields have been moved to CoreCrawlable. - While running,`QARun` data stored in a single `qa` object, while finished qa runs are added to `qaFinished` dictionary on the Crawl. The QA list API returns data from the finished list, sorted by most recent first. - Includes additional type fixes / type safety, especially around BaseCrawl / Crawl / UploadedCrawl functionality, also creating specific get_upload(), get_basecrawl(), get_crawl() getters for internal use and get_crawl_out() for API - Support filtering and sorting pages via `qaFilterBy` (screenshotMatch, textMatch) along with `gt`, `lt`, `gte`, `lte` params to return pages based on QA results. --------- Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2024-03-20 22:42:16 -07:00
Ilya Kreymer	e7af081af1	profile browser fixes: better resource usage + load retry (main) (#1604 ) - Backend: Use separate resource constraints for profiles: default profile browser resources to either 'profile_browser_cpu' / 'profile_browser_memory' or single browser 'crawler_memory_base' / 'crawler_cpu_base', instead of scaled to the number of browser workers - Frontend: check that profile html page is loading, keep retrying if still getting nginx error instead of loading an iframe with the error. Fixes #1598 (Copy of #1599 from 1.9.4)	2024-03-16 15:07:04 -07:00
Ilya Kreymer	ea494fa6e6	Merge V1.9.3 changes into main (#1583 ) - Fix execution time checking by keeping lastUpdatedTime in db by @ikreymer in https://github.com/webrecorder/browsertrix-cloud/pull/1573 - disable postcss-lit for var css - Prevent closing tooltips from closing collection share dialog by @SuaYoo in https://github.com/webrecorder/browsertrix-cloud/pull/1579 - Fix pending exclusion pagination by @SuaYoo in https://github.com/webrecorder/browsertrix-cloud/pull/1578 - Fix regex escape in exclusion editor text match by @SuaYoo in https://github.com/webrecorder/browsertrix-cloud/pull/1577 --------- Co-authored-by: emma <hi@emma.cafe> Co-authored-by: sua yoo <sua@webrecorder.org>	2024-03-06 15:38:22 -08:00
Ilya Kreymer	2ac6584942	Refactor operator class into module (#1564 ) The operator class has gotten fairly large, this is a first pass in refactoring operator.py into a submodule instead, with multiple operator instances which handle different types of objects. - The main k8s interface has been split into K8sOpApi which extends K8sApi and is shared across all operators. - Each operator extends BaseOperator which also has an instance of K8sOpApi - The CrawlOperator is still the bulk of the functionality, but will likely be further refactored to support QA jobs --------- Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2024-02-29 14:40:12 -08:00

8 Commits