Commit Graph

21 Commits

Author SHA1 Message Date
Ilya Kreymer
335700e683
Additional typing cleanup (#1938)
Misc typing fixes, including in profiles and time functions

---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2024-07-17 10:49:22 -07:00
Ilya Kreymer
3cd52342a7
Remove Crawl Workflow Configmaps (#1894)
Fixes #1893 

- Removes crawl workflow-scoped configmaps, and replaces with operator-controlled
per-crawl configmaps that only contain the json config passed to Browsertrix
Crawler (as a volume).
- Other configmap settings are replaced by custom CrawlJob options
(mostly already were, just added profile_filename and storage_filename)
- Cron jobs also updated to create CrawlJob without relying on configmaps,
querying the db for additional settings.
- The `userid` associated with cron jobs is set to the user that last modified
 the schedule of the crawl, rather than whoever last modified the workflow
- Various functions that deal with updating configmaps have been removed,
including in migrations.
- New migration 0029 added to remove all crawl workflow configmaps
2024-06-28 15:25:23 -07:00
Ilya Kreymer
6df10d5fb0
Improved Scale Handling (#1889)
Fixes #1888 

Refactors scale handling:
- Ensures number of scaled instances does not exceed number of pages,
but is also at minimum 1
- Checks for finish condition to be numFailed + numDone >= desired scale
- If at least one instance succeeds, the crawl is considered successful / done.
- If all instances fail, the crawl is considered failed
- Ensures that pod done count >= redis done count
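The scale rules above can be sketched roughly as follows. This is an illustrative sketch only, not the operator's actual code: the function names `effective_scale` and `crawl_outcome` and the string results are assumptions made for the example.

```python
def effective_scale(desired_scale: int, num_pages: int) -> int:
    """Clamp the scale so it never exceeds the page count, but is at least 1."""
    return max(1, min(desired_scale, num_pages))


def crawl_outcome(num_done: int, num_failed: int, scale: int) -> str:
    """Return 'running', 'complete', or 'failed' (hypothetical labels).

    The crawl is finished once numFailed + numDone >= desired scale.
    One successful instance is enough for the crawl to count as done.
    """
    if num_done + num_failed < scale:
        return "running"
    return "complete" if num_done >= 1 else "failed"
```

For example, a desired scale of 3 on a 2-page crawl is clamped to 2, and a crawl where one instance succeeds and two fail is still treated as complete.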

---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2024-06-26 10:24:45 -07:00
Tessa Walsh
9140dd75bc
Add and enforce readOnly field in Organization (#1886)
Fixes https://github.com/webrecorder/browsertrix/issues/1883
Backend work for https://github.com/webrecorder/browsertrix/issues/1876

- If readOnly is set true, disallow crawls and QA analysis runs
- If readOnly is set to true, skip scheduled crawls
- Add endpoint to set `readOnly` with optional `readOnlyReason` (which
is automatically set back to an empty string when `readOnly` is being
set to false), which can be displayed in banner
- Operator: ensures cronjobs that are skipped due to internal logic (eg. readonly mode) simply succeed right away and do not leave a k8s job dangling.
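A minimal sketch of the `readOnly` behavior described above, assuming a simplified in-memory organization object. The `Org` dataclass and the helper names are hypothetical stand-ins, not the backend's actual models or endpoints.

```python
from dataclasses import dataclass


@dataclass
class Org:
    # Hypothetical stand-in for the Organization model
    readOnly: bool = False
    readOnlyReason: str = ""


def set_read_only(org: Org, read_only: bool, reason: str = "") -> Org:
    """Set readOnly; the reason is reset to an empty string when readOnly is false."""
    org.readOnly = read_only
    org.readOnlyReason = reason if read_only else ""
    return org


def can_run_crawl(org: Org) -> bool:
    """Crawls, QA analysis runs, and scheduled crawls are skipped for read-only orgs."""
    return not org.readOnly
```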

---------
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2024-06-25 19:30:53 -07:00
Ilya Kreymer
fa6627ce70
ensure QA configmap is updated for long running QA runs: (#1865)
- add an 'expire_at_duration_seconds' which is 75% of the actual presign
duration time, i.e. the point where 25% remains until the presigned URL actually
expires, to ensure presigned URLs are updated earlier than when they actually expire
- set cached expireAt time to the renew at time for more frequent
updates
- update QA configmap in place with updated presigned URLs when expireAt
time is reached
- mount qa config volume under /tmp/qa/ without subPath to get automatic
updates, which crawler will handle
- tests: fix qa test typo (from main)
- fixes #1864
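The early-renewal timing above could be computed along these lines. This is a sketch under assumptions: only the 75% figure and the `expire_at_duration_seconds` name come from the commit message, while `renew_at` and `needs_renewal` are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone

RENEW_FRACTION = 0.75  # renew once 75% of the presign duration has elapsed


def expire_at_duration_seconds(presign_duration_seconds: int) -> int:
    """Seconds after signing at which the presigned URLs should be refreshed."""
    return int(presign_duration_seconds * RENEW_FRACTION)


def renew_at(signed_at: datetime, presign_duration_seconds: int) -> datetime:
    """The cached expireAt time: the renew-at time, not the real expiry."""
    return signed_at + timedelta(
        seconds=expire_at_duration_seconds(presign_duration_seconds)
    )


def needs_renewal(now: datetime, cached_expire_at: datetime) -> bool:
    """True once the cached expireAt time has been reached."""
    return now >= cached_expire_at
```

With a one-hour presign duration, the URLs would be refreshed after 45 minutes, leaving 15 minutes of margin before the real expiry.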
2024-06-12 10:51:35 -07:00
Ilya Kreymer
d42de92d75
QA analysis scale configurable in helm chart (#1843)
- allow configuring QA run scale via 'qa_scale' setting in helm values
(overriding any setting on the qa crawljob)
- adds additional comments to browser instances helm values settings for clarity
- fixes #1842
2024-05-30 12:59:21 -07:00
Ilya Kreymer
61239a40ed
include workflow config in QA runs + different browser instances for QA (#1829)
Currently, the workflow crawl settings were not being included at all in
QA runs.
This mounts the crawl workflow config, as well as QA configmap, into QA
run crawls, allowing for page limits from crawl workflow to be applied
to QA runs.

It also allows a different number of browser instances to be used for QA
runs, as QA runs might work better with fewer browsers (e.g. 2 instead of
4). This can be set with `qa_browser_instances` in helm chart.

Default qa browser workers to 1 if unset (for now, for best results)

Fixes #1828
2024-05-29 13:32:25 -07:00
Ilya Kreymer
375057a819
check that status.lastUpdatedTime is set before attempting to subtract! (#1754)
Don't subtract none value!
2024-04-30 20:33:46 +02:00
Ilya Kreymer
f6c0791dc1
fix missing settings / typos: (#1748)
- ensure max_crawler_memory_size is inited before it is set!
- pass profile_browser_memory / profile_browser_cpu from chart values
- map volume to /tmp/home to avoid persisting /tmp for profiles
2024-04-25 09:00:17 +02:00
Ilya Kreymer
f286f04130
Typo Fix: fix typos in setting max crawler memory (#1747)
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2024-04-24 10:29:09 -04:00
Ilya Kreymer
ec74eb4242
operator: add 'max_crawler_memory' to limit autosizing of crawler pods (#1746)
Adds a `max_crawler_memory` chart setting, which, if set,
defines the upper crawler memory limit that crawler pods can be resized up to.
If not set, auto resizing is disabled and pods are always set to 'crawler_memory' memory
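One way to picture the cap, as a sketch rather than the operator's real logic: the function name and the byte-based units are assumptions, and the 1.2x growth factor is borrowed from the memory auto scaling commit (#1631) elsewhere in this log.

```python
from typing import Optional


def next_memory_request(
    current: int,
    crawler_memory: int,
    max_crawler_memory: Optional[int],
    scale_factor: float = 1.2,  # assumed growth step, taken from #1631
) -> int:
    """Return the memory a crawler pod would be resized to.

    If max_crawler_memory is unset, auto resizing is disabled and the pod
    always gets crawler_memory. Otherwise growth is capped at the maximum.
    """
    if not max_crawler_memory:
        return crawler_memory
    return min(int(round(current * scale_factor)), max_crawler_memory)
```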
2024-04-24 15:16:32 +02:00
Ilya Kreymer
95f5605af7
renumber crawl priority classes: (#1673)
- priority classes <-10 are ignored by cluster-autoscaler so QA jobs
with too low priorities never run
- start crawl priorities at 0 going down (same as before)
- start qa run priorities at -2 going down (instead of -100)
- this means a crawl with a scale of 3 can be preempted by the 1st qa pod,
but otherwise crawls have higher priority
- rename priority classes as they are otherwise immutable and error on
helm upgrade

This allows for more room in lower pri classes for other types of
objects, while keeping in mind the -10 and below threshold: (see:
https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md)
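The renumbering can be illustrated with a small sketch. It assumes priorities step down by 1 per instance index and treats only values strictly below -10 as ignored by cluster-autoscaler, following the first bullet; the helper names are hypothetical, and the linked FAQ is the authority on the exact cutoff.

```python
AUTOSCALER_CUTOFF = -10  # assumed: priorities below this are ignored


def crawl_priority(index: int) -> int:
    """Crawl instance priorities start at 0 and go down."""
    return 0 - index


def qa_priority(index: int) -> int:
    """QA run priorities start at -2 and go down (previously -100)."""
    return -2 - index


def seen_by_autoscaler(priority: int) -> bool:
    """False for priorities low enough that cluster-autoscaler ignores the pod."""
    return priority >= AUTOSCALER_CUTOFF
```

Under this numbering the third instance of a crawl (index 2) and the first QA pod both land on -2, which is where the two ranges meet, while the old QA starting value of -100 falls below the cutoff.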
2024-04-13 12:24:43 -07:00
Ilya Kreymer
f243d34395
Remove pages from QA Configmap (#1671)
Fixes #1670 

No longer need to pass pages to the ConfigMap. The ConfigMap has a size
limit and will fail if there are too many pages.

With this change, the page list for QA will be read directly from the
WACZ files pages.jsonl / extraPages.jsonl entries.
2024-04-12 16:04:33 -07:00
Ilya Kreymer
5c08c9679c
fix issue with incorrect number of total pages if any of the seeds is a redirect (#1649)
Following changes in webrecorder/browsertrix-crawler#475,
webrecorder/browsertrix-crawler#509, the crawler adds a redirected seed
to the seen list. To account for this, it needs to be subtracted to get
the total page count.
2024-04-04 15:55:44 -07:00
sua yoo
83c9203a11
Initial QA Review UI! (#1624)
QA Details page:
- Enables QA tab with ability to start automated analysis QA Run + view a and manual review status
- Pages listed with review status + overall crawl review status shown on QA details (relates to #1508)
- Initial placeholder for QA run analytics (part of #1589)
- Addresses a good deal of #1477

Automated Analysis QA in Review Mode:
- Ability to select from multiple analysis QA runs / view QA runs in QA details
- Shows analysis screenshot, text and resources compare and replay tabs (fixes #1496)
- Sorting by worst screenshot / worst text score for each QA run
- Includes pages sidebar with screenshot/text/resource compare results (fixes #1497)

Manual Review QA in Review Mode:
- Per-page replay available as separate tab (fixes #1499)
- Supports thumbs up, thumbs down, notes for each page
- Supports entering review status approval (good/acceptable/bad can be entered when finishing review)

---------
Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2024-04-04 15:09:52 -07:00
Ilya Kreymer
ffc4b5b58f
operator state fixes (follow up from #1639) (#1640)
- increase time for going to waiting_capacity from starting to 150
seconds
- relax requirement for state transitions, allow complete from waiting
- additional type safety for different states, ensure mark_finished()
only called with non-running states, add `Literal` types for all the
state types.
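The `Literal` typing idea might look like the sketch below. The state names listed are only an illustrative subset chosen for the example, and this `mark_finished()` is a simplified stand-in for the real method, which has a different signature.

```python
from typing import Literal, get_args

# Illustrative subsets only; the real state lists are longer
RunningState = Literal["starting", "waiting_capacity", "running"]
FinishedState = Literal["complete", "failed", "canceled"]

RUNNING_STATES = get_args(RunningState)
FINISHED_STATES = get_args(FinishedState)


def mark_finished(state: FinishedState) -> str:
    """Accept only non-running states.

    A type checker rejects running states statically; the runtime check
    guards callers that bypass type checking.
    """
    if state not in FINISHED_STATES:
        raise ValueError(f"mark_finished() called with non-finished state: {state}")
    return state
```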
2024-03-29 15:12:16 -07:00
Ilya Kreymer
3438133fcb
Crawler pod memory padding + auto scaling (#1631)
- set memory limit to 1.2x memory request to provide extra padding and
avoid OOM
- attempt to resize crawler pods by 1.2x when exceeding 90% of available
memory
- do a 'soft OOM' (send extra SIGTERM) to pod when reaching 100% of
requested memory, resulting in faster graceful restart, but avoiding a
system-instant OOM Kill
- Fixes #1632
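The thresholds above can be summarized in a short sketch. The constant and function names and the returned action labels are assumptions for illustration; only the 1.2x, 90%, and 100% figures come from the commit message.

```python
MEM_LIMIT_PADDING = 1.2        # limit = 1.2x request
MEM_RESIZE_THRESHOLD = 0.9     # resize when usage exceeds 90% of the request
MEM_SOFT_OOM_THRESHOLD = 1.0   # soft OOM at 100% of the request


def memory_limit(request_bytes: int) -> int:
    """Memory limit with extra padding over the request to avoid a hard OOM kill."""
    return int(round(request_bytes * MEM_LIMIT_PADDING))


def memory_action(used_bytes: int, requested_bytes: int) -> str:
    """Return 'ok', 'resize', or 'soft_oom' (hypothetical labels)."""
    ratio = used_bytes / requested_bytes
    if ratio >= MEM_SOFT_OOM_THRESHOLD:
        return "soft_oom"  # send an extra SIGTERM for a graceful restart
    if ratio > MEM_RESIZE_THRESHOLD:
        return "resize"    # attempt to grow the pod by 1.2x
    return "ok"
```

Because the limit sits 20% above the request, a pod that reaches 100% of its request still has headroom to shut down gracefully before the system kills it.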

---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2024-03-28 16:39:00 -07:00
Ilya Kreymer
4f676e4e82
QA Runs Initial Backend Implementation (#1586)
Supports running QA Runs via the QA API!

Builds on top of the `issue-1498-crawl-qa-backend-support` branch, fixes
#1498

Also requires the latest Browsertrix Crawler 1.1.0+ (from
webrecorder/browsertrix-crawler#469 branch)

Notable changes:
- QARun objects contain info about QA runs, which are crawls
performed on data loaded from existing crawls.

- Various crawl db operations can be performed on either the crawl or
`qa.` object, and core crawl fields have been moved to CoreCrawlable.

- While running, `QARun` data is stored in a single `qa` object, while
finished qa runs are added to `qaFinished` dictionary on the Crawl. The
QA list API returns data from the finished list, sorted by most recent
first.

- Includes additional type fixes / type safety, especially around
BaseCrawl / Crawl / UploadedCrawl functionality, also creating specific
get_upload(), get_basecrawl(), get_crawl() getters for internal use and
get_crawl_out() for API

- Support filtering and sorting pages via `qaFilterBy` (screenshotMatch, textMatch) 
along with `gt`, `lt`, `gte`, `lte` params to return pages based on QA results.
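As a rough illustration of the `qaFilterBy` filtering, the sketch below applies the comparison params to an in-memory list of page dicts. This is an assumption made for the example: the actual backend queries the database, and `filter_pages` and the page shape are hypothetical.

```python
import operator
from typing import Any, Dict, List, Optional

COMPARATORS = {
    "gt": operator.gt,
    "lt": operator.lt,
    "gte": operator.ge,
    "lte": operator.le,
}


def filter_pages(
    pages: List[Dict[str, Any]],
    qa_filter_by: str,
    **bounds: Optional[float],
) -> List[Dict[str, Any]]:
    """Keep pages whose QA score satisfies every given bound, sorted ascending.

    qa_filter_by is e.g. 'screenshotMatch' or 'textMatch'; bounds may be any
    of gt, lt, gte, lte. Pages without a score for that field are skipped.
    """
    active = {name: value for name, value in bounds.items() if value is not None}
    matched = [
        page
        for page in pages
        if page.get(qa_filter_by) is not None
        and all(COMPARATORS[name](page[qa_filter_by], value)
                for name, value in active.items())
    ]
    return sorted(matched, key=lambda page: page[qa_filter_by])
```

Calling `filter_pages(pages, "screenshotMatch", lt=0.5)` would return only the pages with a screenshot match score below 0.5, worst first.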

---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2024-03-20 22:42:16 -07:00
Ilya Kreymer
e7af081af1
profile browser fixes: better resource usage + load retry (main) (#1604)
- Backend: Use separate resource constraints for profiles: default
profile browser resources to either 'profile_browser_cpu' /
'profile_browser_memory' or single browser 'crawler_memory_base' /
'crawler_cpu_base', instead of scaled to the number of browser workers

- Frontend: check that profile html page is loading, keep retrying if
still getting nginx error instead of loading an iframe with the error.

Fixes #1598 (Copy of #1599 from 1.9.4)
2024-03-16 15:07:04 -07:00
Ilya Kreymer
ea494fa6e6
Merge V1.9.3 changes into main (#1583)
- Fix execution time checking by keeping lastUpdatedTime in db by
@ikreymer in https://github.com/webrecorder/browsertrix-cloud/pull/1573
- disable postcss-lit for var css
- Prevent closing tooltips from closing collection share dialog by
@SuaYoo in https://github.com/webrecorder/browsertrix-cloud/pull/1579
- Fix pending exclusion pagination by @SuaYoo in
https://github.com/webrecorder/browsertrix-cloud/pull/1578
- Fix regex escape in exclusion editor text match by @SuaYoo in
https://github.com/webrecorder/browsertrix-cloud/pull/1577

---------
Co-authored-by: emma <hi@emma.cafe>
Co-authored-by: sua yoo <sua@webrecorder.org>
2024-03-06 15:38:22 -08:00
Ilya Kreymer
2ac6584942
Refactor operator class into module (#1564)
The operator class has gotten fairly large, this is a first pass in
refactoring operator.py into a submodule instead, with multiple operator
instances which handle different types of objects.

- The main k8s interface has been split into K8sOpApi which extends K8sApi
and is shared across all operators.
- Each operator extends BaseOperator which also has an instance of K8sOpApi
- The CrawlOperator is still the bulk of the functionality, but will likely be further refactored
to support QA jobs
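The class layout described above could be sketched as a skeleton like this. Only the names `K8sApi`, `K8sOpApi`, `BaseOperator`, and `CrawlOperator` come from the commit message; the constructor arguments, `ProfileOperator`, and `init_operators()` are hypothetical and stand in for whatever the real module does.

```python
class K8sApi:
    """Stand-in for the base k8s client wrapper (constructor is assumed)."""

    def __init__(self, namespace: str = "default") -> None:
        self.namespace = namespace


class K8sOpApi(K8sApi):
    """Main k8s interface for operators, shared across all of them."""


class BaseOperator:
    """Each operator holds a reference to the shared K8sOpApi instance."""

    def __init__(self, k8s: K8sOpApi) -> None:
        self.k8s = k8s


class CrawlOperator(BaseOperator):
    """Handles crawl objects; still the bulk of the functionality."""


class ProfileOperator(BaseOperator):
    """Hypothetical second operator for another object type."""


def init_operators() -> list:
    """Create one shared K8sOpApi and pass it to every operator."""
    k8s = K8sOpApi()
    return [CrawlOperator(k8s), ProfileOperator(k8s)]
```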

---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2024-02-29 14:40:12 -08:00