Closes#1655
### Changes
Removes the separate element inside the `<sl-select>` and instead
doesn't show localized language unless it's different from the language
in the current locale
As additional support for #1683, include the active QA stats in the
crawl response, along with active QA state.
This will allow showing progress of QA run in the archived items list.
Closes#424
### Changes
- Adds Browsertrix logo to the nav bar!!
- Will license the typeface (Konsole) as soon as this gets merged.
- Changes the theme colour to Webrecorder's turquoise brand colour
- Adjusts the usage of `primary` vs `blue` for elements where it
shouldn't be the same
- Changes the _Cancel & Discard_ button of the cancel crawl dialogue to
`danger` button variant
- Replaces the Replay icon with the new ReplayWebpage icon!
- Fixes the favicon SVG linking (wrong name, oops!)
- Fills the microscope lens in the microscope icon
- Removes the old btrix-cloud.svg logo
Co-authored-by: sua yoo <sua@webrecorder.org>
### Changes
- improves overflow issues at smaller screen sizes
- adds icons to buttons
- updates text & layout to match mocks
- changes primary button & button options depending on if there's a
qa run available
- adds a loading state for qa run status & buttons
- updates `<btrix-crawl-status>` with a `type` param allowing for
crawls, uploads, and QA runs
- Updates `<btrix-alert>` to match `<sl-tag>` styling
- Improves overflow issues at smaller viewport sizes by making tab
lists overflow when necessary
### Features
- Ability to start/stop/cancel QA runs
https://github.com/webrecorder/browsertrix-cloud/pull/1666 @SuaYoo
- Ability to see progress of current QA run @emma-sg
- Ability to delete QA runs @emma-sg
- Ability to download QA run files
https://github.com/webrecorder/browsertrix-cloud/pull/1666 @SuaYoo
- Only able to start review if a QA Run is finished (for now,
initial pass). @SuaYoo
- Only most recent running or successful QA run is displayed in
header
---------
Co-authored-by: sua yoo <sua@webrecorder.org>
Co-authored-by: sua yoo <sua@suayoo.com>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Fixes#1659
Takes an arbitrary set of thresholds for text and screenshot matches as
a comma-separated list of floats.
Returns a list of groupings for each that include the lower boundary and
count for all thresholds passed in.
- fixes#1684
- can be used to optionally restrict QA to only some crawls (eg. with
browsertrix-crawler>=1.0.0)
- enforce error on backend (return 400) and handle special error on the
frontend
Backend work for #1672
Adds new sort options to /crawls and /all-crawls GET list endpoints:
- `reviewStatus`
- `qaRunCount`: number of completed QA runs for crawl (also added to
CrawlOut)
- `qaState` (sorts by `activeQAState` first, then `lastQAState`, both of
which are added to CrawlOut)
- Remove globals from profile, uploads, and qa test modules in favor of fixtures
- Add retries to fix intermittent test failures due to timing
---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
- priority classes <-10 are ignored by cluster-autoscaler so QA jobs
with too low priorities never run
- start crawl priorities at 0 going down (same as before)
- start qa run priorities at -2 going down (instead of -100)
- this means a crawl of with scale of 3 can be preempted by 1st qa pod,
but otherwise crawls have higher priority
- rename priority classes as they are otherwise immutable and error on
helm upgrade
This allows for more room in lower pri classes for other type of
objects, while keeping in mind the -10 and below threshold: (see:
https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md)
Fixes#1670
No longer need to pass pages to the ConfigMap. The ConfigMap has a size
limit and will fail if there are too many pages.
With this change, the page list for QA will be read directly from the
WACZ files pages.jsonl / extraPages.jsonl entries.
- Includes eslint-related files in Dockerfile build step
- Switches to using a cache mount in Dockerfile build step for yarn
cache, rather than deleting yarn cache, which should speed up most
builds
Following [discussion on
Discord](https://discord.com/channels/895426029194207262/910966759165657161/1227299497546223707),
this enables ESLint errors showing up during dev within Webpack, and
also enforces that all issues (both warnings and errors) will cause
builds to fail during CI.
Other changes:
- Updates some dependencies (primarily to versions that are
type-checked)
- modify invite email template to answer common questions
- email templates: make each email template overridable with --set-file
- docs: update customization doc to document how to customize email
templates
---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
Fixes#1648
- Tracks failed QA runs in database, not only successful ones
- Includes failed QA runs in list endpoint by default
- Adds `skipFailed` param to list endpoint to return only successful
runs
---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
QA Details page:
- Enables QA tab with ability to start automated analysis QA Run + view a and manual review status
- Pages listed with review status + overall crawl review status shown on QA details (relates to #1508)
- Initial placeholder for QA run analytics (part of #1589)
- Addresses a good deal of #1477
Automated Analysis QA in Review Mode:
- Ability to select from multiple analysis QA runs / view QA runs in QA details
- Shows analysis screenshot, text and resources compare and replay tabs (fixes#1496)
- Sorting by worst screenshot / worst text score for each QA run
- Includes pages sidebar with screenshot/text/resource compare results (fixes#1497)
Manual Review QA in Review Mode:
- Per-page replay available as separate tab (fixes#1499)
- Supports thumbs up, thumbs down, notes for each page
- Supports entering review status approval (good/acceptable/bad can be entered when finishing review
---------
Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Closes#1642
### Changes
- Adds section to the collections page on downloading collections
- Changes the Files section on the archived items page to be more
explicit about downloading files because that's the only action you can
do there!
---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
- increase time for going to waiting_capacity from starting to 150
seconds
- relax requirement for state transitions, allow complete from waiting
- additional type safety for different states, ensure mark_finished()
only called with non-running states, add `Literal` types for all the
state types.
Supports horizontal pod autoscaling (hpa) for backend and frontend pods:
- use cpu and memory averages
- adjust base memory + cpu for backend
- threshold set to 80% cpu and 95% memory utilization by default
(configurable in values.yaml)
- instead of backend and frontend replicas, set max replicas in
values.yaml
- only enable hpa if backend_max_replicas or frontend_max_replicas is
>1, default to 1 for now
- set memory limit to 1.2x memory request to provide extra padding and
avoid OOM
- attempt to resize crawler pods by 1.2x when exceeding 90% of available
memory
- do a 'soft OOM' (send extra SIGTERM) to pod when reaching 100% of
requested memory, resulting in faster graceful restart, but avoiding a
system-instant OOM Kill
- Fixes#1632
---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
Closes#1591
### Changes
- Converts one instance of a button with an icon in it to an `icon-button`
- Makes all the trashcan icon buttons have a red hover state
- Adds localization function & placeholder to upload dialog "Name" field
- Adds localization functions to some missing icon-button label
instances
- Adds a few missing icon button labels
Co-authored-by: sua yoo <sua@suayoo.com>
Co-authored-by: sua yoo <sua@webrecorder.org>
Fixes#1617
Filters added:
- reviewed: filter by page has approval or at least one note (true) or
neither (false)
- approved: filter by approval value (accepts list of strings,
comma-separated, each of which are coerced into True, False, or None, or
ignored if they are invalid values)
- hasNotes: filter by has at least one note (true) or not (false)
Tests have also been added to ensure that results are as expected.
Fixes#1620
This increases the total timeout from 60 seconds to 120 seconds for
crawl to complete, which should be sufficient given how intermittently
the failure has been happening. Can increase it further if needed.
Supports running QA Runs via the QA API!
Builds on top of the `issue-1498-crawl-qa-backend-support` branch, fixes
#1498
Also requires the latest Browsertrix Crawler 1.1.0+ (from
webrecorder/browsertrix-crawler#469 branch)
Notable changes:
- QARun objects contain info about QA runs, which are crawls
performed on data loaded from existing crawls.
- Various crawl db operations can be performed on either the crawl or
`qa.` object, and core crawl fields have been moved to CoreCrawlable.
- While running,`QARun` data stored in a single `qa` object, while
finished qa runs are added to `qaFinished` dictionary on the Crawl. The
QA list API returns data from the finished list, sorted by most recent
first.
- Includes additional type fixes / type safety, especially around
BaseCrawl / Crawl / UploadedCrawl functionality, also creating specific
get_upload(), get_basecrawl(), get_crawl() getters for internal use and
get_crawl_out() for API
- Support filtering and sorting pages via `qaFilterBy` (screenshotMatch, textMatch)
along with `gt`, `lt`, `gte`, `lte` params to return pages based on QA results.
---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
Adds a `browserlist` field to `package.json`, which Webpack picks up so
it doesn't convert nullish coalescing operators into obfuscated messes
like `key !== null && key !== void 0 ? key : null`.
This improves output size a little & improves the debugging experience
as well.
Tested in Chrome, FF, & Safari locally and didn't encounter any issues.
See title.
The only place this changes behaviour is in the placeholder page list,
which will be replaced by the real one shortly, so I'm going to just
merge this.
Fixes#1597
New endpoints (replacing old migration) to re-add crawl pages to db from
WACZs.
After a few implementation attempts, we settled on using
[remotezip](https://github.com/gtsystem/python-remotezip) to handle
parsing of the zip files and streaming their contents line-by-line for
pages. I've also modified the sync log streaming to use remotezip as
well, which allows us to remove our own zip module and let remotezip
handle the complexity of parsing zip files.
Database inserts for pages from WACZs are batched 100 at a time to help
speed up the endpoint, and the task is kicked off using
asyncio.create_task so as not to block before giving a response.
StorageOps now contains a method for streaming the bytes of any file in
a remote WACZ, requiring only the presigned URL for the WACZ and the
name of the file to stream.
Addresses failing test in
https://github.com/webrecorder/browsertrix-cloud/pull/1592 by fixing
asset imports in unit tests. Unit tests now import an empty string for
all assets--note: if we want to test actual asset content, will need to
update this config.
Follow-up to #1608 — quick fix for an issue I encountered after merging
main into #1497
Just going to directly merge once this completes (cc @SuaYoo for
visibility)
- instead of overriding the content-type header globally, pass
'application/merge-patch+json' to
self.custom_api.patch_namespaced_custom_object() directly
- bump kubernetes-asyncio to 29.0.0
- fixes potential issues with global override of the header in
kubernetes-asyncio
- copy of #1602 for main