Fix a few edge-case situations:
- Restart evicted pods that have reached the terminal `Failed` state
with reason `Evicted` by recreating them. These pods will not be
retried automatically, so they need to be recreated (eviction usually
happens due to memory pressure on the node)
- Don't treat containers in `ContainerCreating` as running: even though
this state is usually brief, it's possible for containers to get stuck
there, and excluding it improves the accuracy of exec seconds tracking.
- Consolidate state transitions for running states: the state is set
either to running or to pending-wait/generate-wacz/upload-wacz, and
transitions are allowed between any of these states or from
waiting_capacity (see the sketch below).
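A minimal sketch of the consolidated transition rule (state names from
the list above; the helper is illustrative, not the actual operator
code):

```python
# "Running-ish" states: transitions are allowed between any of these,
# or into them from waiting_capacity
RUNNING_STATES = {"running", "pending-wait", "generate-wacz", "upload-wacz"}


def can_set_running_state(current: str, new: str) -> bool:
    """Return True if the operator may transition current -> new"""
    if new not in RUNNING_STATES:
        return False
    return current in RUNNING_STATES or current == "waiting_capacity"
```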
Also improves the slug editing experience by partially slugifying the
value as it's entered.
Previously, submitting an org slug value of ".." or similar would cause
the frontend to redirect to a "page not found" page, with all accessible
links leading only to `/account/settings`. It also caused the backend
to reset the org slug to one generated from the org name on reload.
---------
Co-authored-by: sua yoo <sua@webrecorder.org>
* Fixes issue in FailedLogin model:
- fix data model to remove the nested 'attempted.attempted' field
- migrate existing data to remove the nested field
* Also, avoid calling dt_now() as a default in the model, as that
results in a fixed date shared by all objects:
- update FailedLogin to update the 'attempted' date on every attempt
- also update the PageNote object to set the date in its constructor
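The pitfall in miniature (illustrative models, not the actual
Browsertrix ones): a default computed at class-definition time is
evaluated once, so every object shares the same timestamp.

```python
from datetime import datetime, timezone

from pydantic import BaseModel, Field


def dt_now() -> datetime:
    return datetime.now(timezone.utc)


class FailedLoginBefore(BaseModel):
    # dt_now() runs once at import time: a fixed date for all objects
    attempted: datetime = dt_now()


class FailedLoginAfter(BaseModel):
    # default_factory runs per instantiation: a fresh date for each object
    attempted: datetime = Field(default_factory=dt_now)
```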
* Update text for too many logins to make it clear it is set only if
it's a valid email
* Fixes #2001
- fix validation error if user doesn't exist
- for security reasons, always return success even if the user doesn't
exist (see the sketch below)
- add test for the forgot-password endpoint
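A hedged sketch of the enumeration-safe pattern (route path and helper
names are hypothetical, not the actual Browsertrix handlers):

```python
from fastapi import APIRouter
from pydantic import BaseModel, EmailStr

router = APIRouter()


class ForgotPasswordRequest(BaseModel):
    email: EmailStr


@router.post("/auth/forgot-password")
async def forgot_password(req: ForgotPasswordRequest):
    user = await lookup_user_by_email(req.email)  # hypothetical helper
    if user:
        await send_reset_email(user)  # hypothetical helper
    # identical response whether or not the user exists, so the endpoint
    # can't be used to probe for registered email addresses
    return {"success": True}
```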
Following https://github.com/webrecorder/browsertrix/pull/1995, we want
to keep the usage history table on the dashboard for now for video
demos.
### Changes
- Adds usage history table back to org dashboard
- Makes "No usage history" message more apparent
Tweaks to how execution time is tracked, for more accuracy and to
exclude waiting states:
- don't update if crawl state is in a 'waiting state' (waiting for
capacity or waiting for org limit)
- rename start states -> waiting states for clarity
- reset lastUpdatedTime only after two consecutive updates report a
non-running state, to ensure non-running states don't count while still
accounting for occasional hiccups; if only one update detects a
non-running state, don't reset (see the sketch after this list)
- webhooks: move the start webhook to when the crawl actually starts
for the first time (db lastUpdatedTime is not yet set and the crawl is
running)
- don't set lastUpdatedTime until pods are actually running
- set crawljob update interval to every 10 seconds for more accurate
execution time tracking
- frontend: show seconds in 'Execution Time' display
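A minimal sketch of the two-strike reset described above (illustrative,
not the actual operator code):

```python
from datetime import datetime


class ExecTimeTracker:
    """Accumulate exec seconds only while the crawl is actually running"""

    def __init__(self) -> None:
        self.last_updated_time: datetime | None = None
        self.non_running_count = 0
        self.exec_seconds = 0.0

    def on_update(self, is_running: bool, now: datetime) -> None:
        if is_running:
            self.non_running_count = 0
            if self.last_updated_time:
                delta = now - self.last_updated_time
                self.exec_seconds += delta.total_seconds()
            self.last_updated_time = now
            return
        # tolerate a single hiccup; reset only after two consecutive
        # non-running updates so those intervals aren't counted
        self.non_running_count += 1
        if self.non_running_count >= 2:
            self.last_updated_time = None
```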
- Follow-up to #1914, allows SubscriptionUpdate event to also update
quotas.
- Passes current usage info + the current billing page URL to the
portalUrl request so the external app can respond with the best
portalUrl (see the sketch below)
- get_origin() moved to utils to be available more generally.
- Updates billing tab to show current plans; switches the order of
quotas to list execution time first, then storage
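A hedged sketch of the request payload (field names and the
get_origin() signature are assumptions; get_origin() is the helper
moved to utils):

```python
def build_portal_url_request(org, headers, subscription_id: str) -> dict:
    return {
        "subId": subscription_id,
        "bytesStored": org.bytesStored,  # current storage usage
        "execSeconds": org.monthlyExecSeconds,  # current monthly usage
        # billing page URL built from the request origin, so the external
        # app can send the user back to the right place
        "returnUrl": f"{get_origin(headers)}/orgs/{org.slug}/settings/billing",
    }
```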
Removes the crc-32 value from WACZ files:
- no longer used with the latest stream-zip
- was not computed correctly in the crawler
- counterpart to webrecorder/browsertrix-crawler#657
---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
Handle the case where an org is made read-only while crawls are running:
- treat it like other stopped_* states and do a graceful stop
- update UI to display "Stopped: Crawling Disabled" for this status
- don't add a corresponding skipped status; just skip running crawls if
the org is read-only
If a cronjob is disabled, the operator should quickly return a success
value so that the job can be terminated.
It was previously returning an incorrect response, causing disabled
cronjobs to not be cleaned up. Adds proper typing to always return the
correct response (see the hedged sketch below).
Fixes #1957
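A hedged sketch of the idea (the response shape follows metacontroller's
decorator-controller contract; names and fields here are assumptions,
not the actual operator code):

```python
from typing import Any

from pydantic import BaseModel, Field


class MCDecoratorSyncResponse(BaseModel):
    # returning a typed model guarantees the JSON response always has
    # the shape metacontroller expects
    attachments: list[Any] = Field(default_factory=list)
    status: dict[str, Any] = Field(default_factory=dict)


async def sync_cronjob(parent: dict) -> MCDecoratorSyncResponse:
    if cronjob_disabled(parent):  # hypothetical check
        # succeed immediately with no attachments so the job terminates
        # and can be cleaned up
        return MCDecoratorSyncResponse(status={"success": True})
    ...  # normal sync path
```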
Adds three new webhook events related to QA: analysis started, analysis
ended, and crawl reviewed.
Tests have been updated accordingly.
---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Resolves https://github.com/webrecorder/browsertrix/issues/1951
### Changes
- Shows date org was created in superadmin org list
- Visually differentiates unnamed org ID
- Adds "Admin" badge to app header to make current login more apparent
- Fixes logic to show "create org" dialog if there are no orgs in an
instance
- Refactors `btrix-home` to remove unused references to non-superadmin
org list
---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
- instead of looking up storage and exec minute quotas by oid and
loading the org each time, load the org once and then check quotas on
the org object - many times the org was already available and was being
looked up again
- storage and exec quota checks become sync (see the sketch below)
- rename can_run_crawl() to the more generic can_write_data(), which
optionally also checks exec minutes
- typing: get_org_by_id() always returns an org or throws; adjust
methods accordingly (don't check for None, catch the exception)
- typing: fix typo in BaseOperator, catch type errors in operator
'org_ops'
- operator quota check: use the up-to-date 'status.size' for the
current job, and ignore the current job in the all-jobs list to avoid
double-counting
- follow-up to #1969
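A sketch of the now-sync checks against an already-loaded org (method
names from this list; quota field names are assumptions):

```python
def storage_quota_reached(org) -> bool:
    quota = org.quotas.storageQuota
    return bool(quota) and org.bytesStored >= quota


def exec_mins_quota_reached(org) -> bool:
    quota = org.quotas.maxExecMinutesPerMonth
    return bool(quota) and org.monthlyExecSeconds >= quota * 60


def can_write_data(org, include_time: bool = True) -> bool:
    # sync: no db lookup needed, the org is already loaded
    if storage_quota_reached(org):
        return False
    if include_time and exec_mins_quota_reached(org):
        return False
    return True
```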
Resolves https://github.com/webrecorder/browsertrix/issues/1912
### Changes
- Show support email, if available, in invalid invite error message
- Separate error message for invite email that doesn't match current
user's
Fixes #1968
Changes:
- `stopped_quota_reached` and `skipped_quota_reached` are migrated to
new values that indicate which quota was reached
- Before crawls are run, the operator checks if the storage or exec
minutes quota is reached and, if so, fails the crawl with the
appropriate state of `skipped_storage_quota_reached` or
`skipped_time_quota_reached`
- While crawls are running, the operator checks if the exec minutes
quota is reached, or if the size of all running crawls means the
storage quota will be reached once uploaded; if so, the crawl is
stopped gracefully and given the `stopped_storage_quota_reached` or
`stopped_time_quota_reached` state as appropriate (see the sketch
below)
- Adds new nightly tests for enforcing storage quota
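A sketch of where the new states are assigned (state names from this
list; the quota helpers are the ones sketched earlier and remain
assumptions):

```python
def state_before_run(org) -> str | None:
    """Return a skipped_* state if a quota is already reached, else None"""
    if storage_quota_reached(org):
        return "skipped_storage_quota_reached"
    if exec_mins_quota_reached(org):
        return "skipped_time_quota_reached"
    return None


def state_while_running(org, running_crawls_size: int) -> str | None:
    """Return a stopped_* state if the crawl should be gracefully stopped"""
    if exec_mins_quota_reached(org):
        return "stopped_time_quota_reached"
    # project usage as if all running crawls were already uploaded
    quota = org.quotas.storageQuota
    if quota and org.bytesStored + running_crawls_size >= quota:
        return "stopped_storage_quota_reached"
    return None
```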
Fixes #1412
## Changes
### Backend
- Adds `all-crawls`, `crawls`, and `uploads` API endpoints to download
an archived item as a multi-WACZ
- Download QA runs as multi-WACZ
- Adds backend tests for new endpoints
- Updates to a new version of the stream-zip library which does not
require crc-32 to be present for ZIP members and instead computes it
after streaming, fixing invalid crc-32 issues, since crc-32 values
previously computed by the crawler could be invalid.
### Frontend
Adds ability to download archived item from:
- Button in archived item detail Files tab
- Archived item details actions menu
- Archived items list menu
---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: sua yoo <sua@webrecorder.org>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
The monthly execution minutes meter was using the wrong metric,
`crawlExecSeconds` instead of `monthlyExecSeconds`.
Since the meter is showing minutes used out of the monthly quota, it
should be using `monthlyExecSeconds`.
This is a quick fix; however, this system may need another pass at some
point.
---------
Co-authored-by: sua yoo <sua@suayoo.com>
Fixes https://github.com/webrecorder/browsertrix/issues/1954
### Changes
- Refactors app state to include org data
- Fixes banner not showing if the storage or execution minutes quota is
exceeded after page load
- Disables closing banners
- Refreshes org when tab changes
---------
Co-authored-by: emma <hi@emma.cafe>
- make crawl args a reusable template
- adds QA_ARGS to the configmap, set to the same value as CRAWL_ARGS but
with --behaviors= prepended to disable behaviors for QA, improving the
performance of QA runs.
fixes #1962
- only enabled if 'enable_auto_resize' is true; defaults to false
- if true, set the memory limit to 1.2x the memory request, resize when
hitting a 'soft OOM' of the initial request, and adjust by 1.2x
(current behavior) up to max_crawler_memory (see the sketch below)
- if false, set the memory limit to max_crawler_memory and never adjust
memory requests or memory limits
- part of #1959
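A sketch of the two modes (setting names from this list; the function
is illustrative, not the actual operator code):

```python
def crawler_memory(
    memory_request: int, enable_auto_resize: bool, max_crawler_memory: int
) -> tuple[int, int]:
    """Return (memory_request, memory_limit) for the crawler pod"""
    if enable_auto_resize:
        # soft-OOM resizing: the limit tracks 1.2x the request; on a soft
        # OOM the request itself grows by 1.2x, capped at the max
        return memory_request, min(int(memory_request * 1.2), max_crawler_memory)
    # fixed mode: request is never adjusted, limit pinned to the max
    return memory_request, max_crawler_memory
```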
Directs a user that doesn't belong to any orgs to the account settings
page, with a banner.
Also contains some minor out-of-scope changes:
- Refactors the `isAdmin` key to `isSuperAdmin` to make it more legible
whether the current user is a superadmin or a regular user without orgs
- Adds "cancel" button to change password form
Fixes #1955
Orgs list endpoint sorting now works as follows:
- Default org is always sorted first
- Name sorting now works on a lowercased version of the org names to
ensure lexical sorting (see the sketch below)
The lodash `sortBy` resorting of orgs in the "All Organizations"
dropdown list in the nav bar has also been removed so that the backend
sorting is applied instead.
Tests have been updated accordingly.
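The resulting order is equivalent to this sketch (Browsertrix sorts on
the backend; field names are assumptions):

```python
# (False, ...) sorts before (True, ...), so the default org comes first;
# lowercasing the name gives case-insensitive lexical order for the rest
orgs.sort(key=lambda org: (not org.default, org.name.lower()))
```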
* updates pydantic to 2.x
* also update to python 3.12
* additional type fixes:
- all Optional[] types must have a default value (see the example below)
- update to constrained types
- URL types converted from str
- test updates
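An example of the Optional[] change (illustrative model): pydantic 2.x
no longer treats Optional as implicitly defaulting to None, so the
default must be explicit.

```python
from typing import Optional

from pydantic import BaseModel, HttpUrl


class ProfileExample(BaseModel):
    name: str
    description: Optional[str] = None  # explicit default now required
    origin: HttpUrl  # URL types converted from plain str
```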
Fixes #1940
Follow-up to regressions from #1928, this PR:
- Fixes response models for queue endpoints, which had an incorrect
model
- Adds tests for queue get, queue match, and exclusion add/remove to
ensure regressions like this can be caught via tests. This involves
starting a new crawl in test_run_crawls() instead of relying on
implicit running via fixtures, making it easier to test a crawl while
it's running.
- Adds additional typing for the crawls APIs, including giving
delete_crawls() correct typing with a consistent derived-class override
- Adds a check to ensure queue and exclusion operations cannot be
called when the crawl is not running
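A sketch of that guard (helper and state names are assumptions):

```python
from fastapi import HTTPException

RUNNING_STATES = {"running", "pending-wait", "generate-wacz", "upload-wacz"}


def assert_crawl_running(crawl) -> None:
    # queue and exclusion edits only make sense against a live crawler
    if crawl.state not in RUNNING_STATES:
        raise HTTPException(status_code=400, detail="crawl_not_running")
```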
Fixes #1927
Also adds tests to ensure the index is working as expected, and a
migration to rename orgs whose names or slugs are identical to those of
other orgs except for case, run before the new case-insensitive index
is built.
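Case-insensitive uniqueness in MongoDB comes from a collation with
strength 2; a sketch with motor/pymongo (collection name is an
assumption):

```python
from pymongo.collation import Collation, CollationStrength

# strength=SECONDARY (2) compares base characters and accents but not
# case, so "MyOrg" and "myorg" collide under the unique constraint
case_insensitive = Collation(locale="en", strength=CollationStrength.SECONDARY)

await orgs.create_index("name", unique=True, collation=case_insensitive)
await orgs.create_index("slug", unique=True, collation=case_insensitive)
```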
### Changes
- Updates customer portal link label
- Opens billing portal in the same tab
- Shows separate cancel date
- Makes payment failed appear as error
- Fixes crawl time quota
- ensure crawlFilenameTemplate is part of the CrawlConfig model
- change CrawlConfig init to use type-safe construction (see the sketch
below)
- add a run_now_internal() that is shared for starting a crawl, either
on demand or from a new config
- add OrgOps.can_run_crawls() to check against org quotas for crawling
- clean up profile updates: remove _lookup_profile, only check for
EmptyStr in update
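An illustrative reading of "type-safe construction" (not the actual
code; field names beyond crawlFilenameTemplate are assumptions): build
the model with keyword arguments that pydantic validates, rather than
assembling an unvalidated dict.

```python
from uuid import uuid4

# pydantic validates every field at construction time, and
# crawlFilenameTemplate is now part of the model, so it passes through
crawlconfig = CrawlConfig(
    id=uuid4(),
    oid=org.id,
    name=config_in.name,
    crawlFilenameTemplate=config_in.crawlFilenameTemplate,
    config=config_in.config,
)
```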
---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>