Commit Graph

80 Commits

Author SHA1 Message Date
Tessa Walsh
cbcf087a48
Add last crawl and subscription status indicators to org list (#2273)
Fixes #2260 

- Adds `lastCrawlFinished` to Organization model, updated after crawls
are added/deleted and with an idempotent migration to backfill existing
orgs
- Adds Last Crawl column to end of admin orgs list table
- Adds subscription icon next to existing status icon in orgs list
- Adds "lastCrawlFinished", "subscriptionStatus", and "subscriptionPlan"
sort options to orgs list backend endpoint in anticipation of future
sorting/filtering of orgs list

---------

Co-authored-by: emma <hi@emma.cafe>
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2025-01-14 10:57:06 -05:00
Tessa Walsh
190bdeb868
Add public API endpoint for public collections (#2174)
Fixes #1051 

If org with provided slug doesn't exist or no public collections exist
for that org, return same 404 response with a detail of
"public_profile_not_found" to prevent people from using public endpoint
to determine whether an org exists.

Endpoint is `GET /api/public-collections/<org-slug>` (no auth needed) to
avoid collisions with existing org and collection endpoints.
2025-01-13 15:15:48 -08:00
Tessa Walsh
42ebfd303d
Make changes to collections to support publicly listed collections (#2164)
Fixes #2158 

- Adds `Organization.listPublicCollections` field and API endpoint to
update it
- Replaces `Collection.isPublic` boolean with `Collection.access`
(values: `private`, `unlisted`, `public`) and add database migration
- Update frontend to use `Collection.access` instead of `isPublic`,
otherwise not changing current behavior

---------

Co-authored-by: sua yoo <sua@suayoo.com>
2025-01-13 15:15:47 -08:00
Ilya Kreymer
db39333ef4
Send subscription cancelation email (#2234)
Adds sending a cancellation email when a subscription is cancelled.
- The email may also include an option survey optional survey URL, if
configured in helm chart `survey_url` setting.
- Cancellation e-mail configured in `sub_cancel` e-mail template
- E-mails are sent to all org admins.
- Also adds `trialing_canceled` subscription state to differentiate from
a default `trialing` which will automatically rollover into `active`.
- The email is sent when: a new cancellation date is added for an
`active` subscription, or a `trialing` subscription is changed to to
`trialing_canceled`. (A subscription can be canceled/uncanceled several
times before actual date, and e-mail is sent every time it is canceled.)
- The 'You have X days left of your trial' is also always displayed when
state is in trialing_canceled.

Fixes #2229
---------

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2024-12-12 11:52:38 -08:00
Ilya Kreymer
50dac7dc50
1.12.2 release -> main (#2181)
Merge 1.12.2 release changes into main, includes:
- Collection replay full refresh on metadata / archived items (#2176)
- Fix for self-registration default org (#2178)
- Prepend missing https in start URL (#2177)
- Updated billing to support free trial messaging (#2179)

---------

Co-authored-by: sua yoo <sua@webrecorder.org>
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: sua yoo <sua@suayoo.com>
Co-authored-by: SuaYoo <SuaYoo@users.noreply.github.com>
2024-11-26 11:17:07 -08:00
Tessa Walsh
ba5ca3fdd9
Move org storage recalculation into background job (#2138)
Fixes #2112 

- Moves org storage recalculation to background job, modify endpoint to
return job id as part of response
- Updates crawl + QA backend tests that broke due to
https://webrecorder.net website changes

---------

Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
2024-11-19 17:32:57 -05:00
Tessa Walsh
1b1819ba5a
Move org deletion to background job with access to backend ops classes (#2098)
This PR introduces background jobs that have full access to the backend
ops classes and moves the delete org job to a background job.
2024-10-10 14:41:05 -04:00
Vinzenz Sinapius
bb6e703f6a
Configure browsertrix proxies (#1847)
Resolves #1354

Supports crawling through pre-configured proxy servers, allowing users to select which proxy servers to use (requires browsertrix crawler 1.3+)

Config:
- proxies defined in btrix-proxies subchart
- can be configured via btrix-proxies key or separate proxies.yaml file via separate subchart
- proxies list refreshed automatically if crawler_proxies.json changes if subchart is deployed
- support for ssh and socks5 proxies
- proxy keys added to secrets in subchart
- support for default proxy to be always used if no other proxy configured, prevent starting cluster if default proxy not available
- prevent starting manual crawl if previously configured proxy is no longer available, return error
- force 'btrix' username and group name on browsertrix-crawler non-root user to support ssh

Operator:
- support crawling through proxies, pass proxyId in CrawlJob
- support running profile browsers which designated proxy, pass proxyId to ProfileJob
- prevent starting scheduled crawl if previously configured proxy is no longer available

API / Access:
- /api/orgs/all/crawlconfigs/crawler-proxies - get all proxies (superadmin only)
- /api/orgs/{oid}/crawlconfigs/crawler-proxies - get proxies available to particular org
- /api/orgs/{oid}/proxies - update allowed proxies for particular org (superadmin only)
- superadmin can configure which orgs can use which proxies, stored on the org
- superadmin can also allow an org to access all 'shared' proxies, to avoid having to allow a shared proxy on each org.

UI:
- Superadmin has 'Edit Proxies' dialog to configure for each org if it has: dedicated proxies, has access to shared proxies.
- User can select a proxy in Crawl Workflow browser settings
- Users can choose to launch a browser profile with a particular proxy
- Display which proxy is used to create profile in profile selector
- Users can choose with default proxy to use for new workflows in Crawling Defaults

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2024-10-02 18:35:45 -07:00
Ilya Kreymer
62da0fbd6c
security: tweak get /invite endpoints / InviteOut to: (#2087)
don't set inviterEmail / inviterName if the inviter is the superuser:
- return fromSuperuser true/false
- if fromSuperuser, don't set inviterEmail / inviterName
- tests: add tests for non-superuser admin invites
2024-09-20 11:52:56 -07:00
Ilya Kreymer
feb6b1f26c
Ensure email comparisons are case-insensitive, emails stored as lowercase (#2084) (#2086) (fixes from 1.11.7)
- Add a custom EmailStr type which lowercases the full e-mail, not just
the domain.
- Ensure EmailStr is used throughout wherever e-mails are used, both for
invites and user models
- Tests: update to check for lowercase email responses, e-mails returned
from APIs are always lowercase
- Tests: remove tests where '@' was ur-lencoded, should not be possible
since POSTing JSON and no url-decoding is done/expected. E-mails should
have '@' present.
- Fixes #2083 where invites were rejected due to case differences
- CI: pin pymongo dependency due to latest releases update, update python used for CI
2024-09-19 12:20:34 -07:00
Ilya Kreymer
04c8b50423
add a crawling defaults on the Org to allow setting certain crawl workflow fields as defaults: (#2031)
- add POST /orgs/<id>/defaults/crawling API to update all defaults
(defaults unset are cleared)
- defaults returned as 'crawlingDefaults' object on Org, if set
- fixes #2016

---------

Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
2024-08-22 10:36:04 -07:00
Ilya Kreymer
8c9a14b6a2
Ensure Subscription Update doesn't update the gifted quotas (#2012)
- add a separate OrgQuotasIn where all quota updates are optional
- ensure gifted quotas are never updated as part of org update
- update tests
2024-08-20 13:15:03 -07:00
Ilya Kreymer
12f994b864
QA: Count QA execution minutes separately for now (#2011)
For now, keep QA exec time separate, as it may be scaled differently and currently still in beta.
2024-08-09 13:13:21 -07:00
Ilya Kreymer
7fa2b61b29
Execution time tracking tweaks (#1994)
Tweaks to how execution time is tracked for more accuracy + excluding
waiting states:
- don't update if crawl state is in a 'waiting state' (waiting for
capacity or waiting for org limit)
- rename start states -> waiting states for clarity
- reset lastUpdatedTime if two consecutive updates of non-running state,
to ensure non-running states don't count, but also account for
occasional hiccups -- if only one update detects non-running state,
don't reset
- webhooks: move start webhook to when crawl actually starts for first
time (db lastUpdatedTime is not yet + crawl is running)
- don't set lastUpdatedTime until pods actually running
- set crawljob update interval to every 10 seconds for more accurate
execution time tracking
- frontend: show seconds in 'Execution Time' display
2024-08-06 09:44:44 -07:00
Ilya Kreymer
1c153dfd3c
Subscription Update Quotas (#1988)
- Follow-up to #1914, allows SubscriptionUpdate event to also update
quotas.
- Passes current usage info + current billing page URL to portalUrl
request for external app to be able to respond with best portalUrl
- get_origin() moved to utils to be available more generally.
- Updates billing tab to show current plans, switches order of quotas to
list execution time, storage first
2024-08-05 15:59:47 -07:00
Ilya Kreymer
94e985ae13
optimize org quota lookups (#1973)
- instead of looking up storage and exec min quotas from oid, and
loading an org each time, load org once and then check quotas on the org
object - many times the org was already available, and was looked up
again
- storage and exec quota checks become sync
- rename can_run_crawl() to more generic can_write_data(), optionally
also checks exec minutes
- typing: get_org_by_id() always returns org, or throws, adjust methods
accordingly (don't check for none, catch exception)
- typing: fix typo in BaseOperator, catch type errors in operator
'org_ops'
- operator quota check: use up-to-date 'status.size' for current job,
ignore current job in all jobs list to avoid double-counting
- follow up to #1969
2024-07-25 14:00:16 -07:00
Tessa Walsh
d38abbca7f
Standardize handling of storage and execution time quotas (#1969)
Fixes #1968 

Changes:
- `stopped_quota_reached` and `skipped_quota_reached` migrated to new
values that indicate which quota was reached
- Before crawls are run, the operator checks if storage or exec mins
quotas are reached and if so fails the crawl with the appropriate state
of `skipped_storage_quota_reached` or `skipped_time_quota_reached`
- While crawls are running, the operator checks if the exec mins quota
is reached or if the size of all running crawls will mean the storage
quota is reached once uploaded; if so, the crawl is stopped gracefully
and given `stopped_storage_quota_needed` or `stopped_time_quota_reached`
state as appropriate
- Adds new nightly tests for enforcing storage quota
2024-07-25 12:49:11 -07:00
Tessa Walsh
a02f7a6826
Ensure lexical sort for org names (#1958)
Fixes #1955 

Orgs list endpoint sorting now works as follows:
- Default org is always sorted first
- Name sorting now works on a lowercased version of the org names to
ensure lexical sorting

The lodash `sortBy` resorting of orgs in the "All Organizations"
dropdown list in the nav bar has also been removed so that the backend
sorting is applied instead.

Tests have been updated accordingly.
2024-07-23 13:13:04 -07:00
Ilya Kreymer
8c0321bdea
Pydantic 2.x update + type fixes + python 3.12 (#1947)
* updates pydantic to 2.x
* also update to python 3.12
* additional type fixes:
- all Optional[] types must have a default value
- update to constrained types
- URL types converted from str
- test updates

Fixes #1940
2024-07-22 17:23:03 -07:00
Tessa Walsh
2237120cd5
Add API endpoint to recalculate org storage (#1943)
Fixes #1942 

This process might be a bit slow for large orgs, may consider moving it to background job in #1898.
2024-07-19 18:29:20 -07:00
Tessa Walsh
6ccaad26d8
Ensure org name and slug uniqueness is case-insensitive (#1929)
Fixes #1927 

Also adds tests to ensure index is working as expected, and migration to
rename orgs that have names or slugs identical to other orgs except for
case before the new case-insensitive index is built.
2024-07-18 15:30:12 -07:00
Ilya Kreymer
4db3053a9f
fix crawlFilenameTemplate + add_crawl_config cleanup (fixes #1932) (#1935)
- ensure crawlFilenameTemplate is part of the CrawlConfig model
- change CrawlConfig init to use type-safe construction
- add a run_now_internal() that is shared for starting crawl, either on
demand or from new config
- add OrgOps.can_run_crawls() to check against org quotas for crawling
- cleanup profile updates, remove _lookup_profile, only check for
EmptyStr in update

---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2024-07-17 10:48:25 -07:00
Tessa Walsh
60afb19472
Add API endpoint to import subscription for existing org (#1930)
Fixes #1926 

- adds /subscriptions/import endpoint for importing an existing subscription to an existing org
- add SubscriptionImport object and log as 'import' event in subscription events collection

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2024-07-16 16:17:02 -07:00
Tessa Walsh
d41647e6c2
Document all API endpoints with response models (#1928)
Fixes #1920 

Adds response models to all API endpoints that were missing them,
documenting current behavior without making any changes at this stage to
standardize responses.

Follow-up work will involve adding generics to some of the response models
2024-07-16 12:48:38 -07:00
Tessa Walsh
aaf18e70a0
Add created date to Organization and fix datetimes across backend (#1921)
Fixes #1916

- Add `created` field to Organization and OrgOut, set on org creation
- Add migration to backfill `created` dates from first workflow
`created`
- Replace `datetime.now()` and `datetime.utcnow()` across app with
consistent timezone-aware `utils.dt_now` helper function, which now uses
`datetime.now(timezone.utc)`. This is in part to ensure consistency in
how we handle datetimes, and also to get ahead of timezone naive
datetime creation methods like `datetime.utcnow()` being deprecated in
Python 3.12. For more, see:
https://blog.miguelgrinberg.com/post/it-s-time-for-a-change-datetime-utcnow-is-now-deprecated
2024-07-15 19:46:32 -07:00
Tessa Walsh
a546fb6fe0
Improve handling of duplicate org name/slug (#1917)
Initial implementation of #1892 

- Modifies the backend to return `duplicate_org_name` or
`duplicate_org_slug` as appropriate on a pymongo `DuplicateKeyError`
- Updates frontend to handle `duplicate_org_name`, `duplicate_org_slug`,
and `invalid_slug` error details
- Update errors to be more consistent, also return `duplicate_org_subscription.subId` for duplicate subscription instead of the more generic `already_exists`
---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2024-07-10 19:24:50 -07:00
Ilya Kreymer
9a67e28f13
Adds Subscription API (#1914)
Fixes https://github.com/webrecorder/browsertrix/issues/1905

- adds a new top-level `/api/subscriptions` endpoint and SubOps handler on
the backend.
- enable subscriptions API endpoints available only if `billing_enabled` is
set in helm chart
- new POST /subscriptions/create, /subscriptions/update,
/subscriptions/cancel API endpoints
- Subscriptions mongo collection storing timestamped /subscription
API events
- GET /subscriptions/events API to get subscription events, support for filtering and sorting
- Subscription data model 
- Support for setting and handling readOnlyOnCancel on org
- /orgs/<id>/billing-portal to lookup portalUrl using external API
- subscription in org getter and list views
- mark org as readOnly for subscription status `paused_payment_failed`, clears it on status `active`

---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2024-07-10 17:41:16 -07:00
sua yoo
c97900ec2b
Merge branch 'main' into frontend-org-manage-readonly 2024-07-08 11:20:30 -07:00
Tessa Walsh
192737ea99
Add API endpoint to delete org (#1448)
Fixes #903 

Adds superuser-only API endpoint to delete an org and all of its data

---------

Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2024-07-03 16:00:11 -04:00
Tessa Walsh
bca05ac185 Fix typing 2024-07-03 11:25:01 -04:00
Tessa Walsh
497cfdc561
Merge branch 'main' into frontend-org-manage-readonly 2024-07-03 11:15:47 -04:00
Tessa Walsh
787ebc8738 Add one more pylint disable comment 2024-07-03 11:14:46 -04:00
Tessa Walsh
5a563d20d9 Fix linting issues 2024-07-03 11:10:10 -04:00
Tessa Walsh
d3fb33a78a Add and apply backend sorting for org list
The default org will always be sorted first, regardless of sort options.
Orgs after the first will be sorted by name ascending by default.
Sorting currently supported on name, slug, and readOnly.
2024-07-03 11:01:01 -04:00
Ilya Kreymer
1c42e21b8a
Refactor Invites and Registration, Flatten Per-User Invites (#1902)
Fixes #1432

Refactors the invite + registration system to be simpler and more consistent
with regards to existing user invites. Previously, per-user invites are
stored in the user.invites dict instead of in the invites collection,
which creates a few issues:
- Existing user do not show up in Org Invites list: #1432 
- Existing user invites also do not expire, unlike new user invites,
creating potential security issue.

Instead, existing user invites should be treated like new user invites.
This PR moves them into the same collection,
adding a `userid` field to InvitePending to match with an existing user.

If a user already exists, it will be matched by userid, instead of by
email. This allows for user to update their email while still being
invited. Note that the email of the invited existing user will not
change in the invite email. This is also by design: an admin of one org
should not be given any hint that an invited user already has an
account, such as by having their email automatically update. For an org
admin, the invite to a new or existing user should be indistinguishable.

The sha256 of invite token is stored instead of actual token for better
security.

The registration system has also been refactored with the following
changes:
- Auto-creation of new orgs for new users has been removed
- User.create_user() replaces the old User._create() and just creates the user with
additional complex logic around org auto-add
- Users are added to org in org add_user_to_org()
- Users are added to org through invites with add_user_with_invite()

Tests:
- Additional tests include verifying that existing and new pending
invites appear in the pending invites list
- Tests for `/users/invite/<token>?email=` and
`/users/me/invite/<token>` endpoints
- Deleting pending invites
- Additional tests added for user self-registration, including existing
user self-registration to default org of existing user (in nightly
tests)
2024-07-02 15:13:27 -07:00
Tessa Walsh
f076e7d9e3
Add superuser API endpoints to export and import org data (#1394)
Fixes #890 

This PR introduces new streaming superuser-only API endpoints to export
and import database information for an organization. New Adminstrator
deployment documentation on how to manage the process and copy files
between S3 buckets as needed is also included.

---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2024-07-02 17:14:34 -04:00
Ilya Kreymer
e1ef894275
Extends Org Create endpont + shared secret auth (#1897)
Updates the /api/orgs/create endpoint to:
- not have name / slug be required, will be renamed on first user via
#1870
- support optional quotas
- support optional first admin user email, who will receive an invite to
join the org.

Also supports a new shared secret mechanism, to allow an external
automation to access the /api/orgs/create endpoint (and only that
endpoint thus far) via a shared secret instead of normal login.
2024-07-01 09:37:02 -07:00
Tessa Walsh
8a904c9031
feat: Rename org when accepting org invite for first admin (#1870)
Resolves https://github.com/webrecorder/browsertrix/issues/1874

Support for new two-part sign up flow if first admin user is added to org
- If new user, user registers first, then is able to change the org name / slug on following screen
- If existing user, user accepts invite, then is able to change the org name / slug on following screen
- After confirming org slug name, user is taken to dashboard, or error is shown if org name or slug already taken.
- If org name == org id, org name and slug is automatically set to `{Your Name}'s Archive` when first user is registered / accepts invite
- Email templates updated to better reflect new / existing users and not show org name if it is 'unset' (org name == org id internally)
- tests: frontend unit testing for accept + invite screens.

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: sua yoo <sua@suayoo.com>
Co-authored-by: sua yoo <sua@webrecorder.org>
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
2024-06-27 16:08:31 -07:00
Tessa Walsh
b7631d1b91
Add slug validation and test (#1891)
Fixes #1890 

Adds validation for org slugs, ensuring that they contain only ASCII
alphanumeric characters and dashes (`-`). If an invalid slug is
provided, an HTTPException is returned with status code 400 and detail
`invalid_slug`.
2024-06-26 15:04:54 -04:00
Tessa Walsh
9140dd75bc
Add and enforce readOnly field in Organization (#1886)
Fixes https://github.com/webrecorder/browsertrix/issues/1883
Backend work for https://github.com/webrecorder/browsertrix/issues/1876

- If readOnly is set true, disallow crawls and QA analysis runs
- If readOnly is set to true, skip scheduled crawls
- Add endpoint to set `readOnly` with optional `readOnlyReason` (which
is automatically set back to an empty string when `readOnly` is being
set to false), which can be displayed in banner
- Operator: ensures cronjobs that are skipped due to internal logic (eg. readonly mode) simply succeed right away and do not leave a k8s job dangling.

---------
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2024-06-25 19:30:53 -07:00
Ilya Kreymer
85cd214101
Fix regression to changing user roles via PATCH /user-role API (#1824)
clean up adding user vs changing role logic:
- when adding user, ensure user doesn't exist
- when changing roles, ensure user does exist

add test for changing roles of existing user

Fixes #1821
2024-05-24 10:41:05 -07:00
Ilya Kreymer
b061a39d5f
Handle registration when user already exists (#1744)
Fixes https://github.com/webrecorder/browsertrix/issues/1743

On the backend, support adding user to new org and improved error
messaging:
- if user exists is not part of the org to be added to, add user to the
registration org, return 201
- if user is already part of the org, return 400 with
'user_already_is_org_member' error
- if user is not being added to a new org but already exists, return
'user_already_exists'

frontend:
- if user user same password, they will just be logged in and added to
registration org (same as before)
- if user uses wrong password, show alert message
- handle both user_already_is_org_member and user_already_registered
with alert message and link to log-in page.

note: this means existing user is added to the registration org even if
they provide wrong password, but they won't be able
to login until they use correct password/reset/etc...
2024-04-24 16:40:25 +02:00
Ilya Kreymer
b94070160b
allow configuring designated registration org to which new users can register (#1735)
if 'registration_enabled' is set, check 'registration_org_id' for org id
of an existing org that new users should be added to when they register.
if omitted, default to the default org

Fixes #1729
2024-04-23 17:11:37 -04:00
Ilya Kreymer
4f676e4e82
QA Runs Initial Backend Implementation (#1586)
Supports running QA Runs via the QA API!

Builds on top of the `issue-1498-crawl-qa-backend-support` branch, fixes
#1498

Also requires the latest Browsertrix Crawler 1.1.0+ (from
webrecorder/browsertrix-crawler#469 branch)

Notable changes:
- QARun objects contain info about QA runs, which are crawls
performed on data loaded from existing crawls.

- Various crawl db operations can be performed on either the crawl or
`qa.` object, and core crawl fields have been moved to CoreCrawlable.

- While running,`QARun` data stored in a single `qa` object, while
finished qa runs are added to `qaFinished` dictionary on the Crawl. The
QA list API returns data from the finished list, sorted by most recent
first.

- Includes additional type fixes / type safety, especially around
BaseCrawl / Crawl / UploadedCrawl functionality, also creating specific
get_upload(), get_basecrawl(), get_crawl() getters for internal use and
get_crawl_out() for API

- Support filtering and sorting pages via `qaFilterBy` (screenshotMatch, textMatch) 
along with `gt`, `lt`, `gte`, `lte` params to return pages based on QA results.

---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2024-03-20 22:42:16 -07:00
Tessa Walsh
a898c2b456
Format backend with Black 24 (#1507)
Fixes #1506
2024-02-07 11:35:34 -08:00
Tessa Walsh
38a01860b8
Add API endpoints for crawl statistics (#1461)
Fixes #1158 

Introduces two new API endpoints that stream crawling statistics CSVs
(with a suggested attachment filename header):

- `GET /api/orgs/all/crawls/stats` - crawls from all orgs (superuser
only)
- `GET /api/orgs/{oid}/crawls/stats` - crawls from just one org
(available to org crawler/admin users as well as superusers)

Also includes tests for both endpoints.
2024-01-10 13:30:47 -08:00
Tessa Walsh
be41c48c27
Add extra and gifted execution minutes (#1361)
Fixes #1358 

- Adds `extraExecMinutes` and `giftedExecMinutes` org quotas, which are
not reset monthly but are updateable amounts that carry across months
- Adds `quotaUpdate` field to `Organization` to track when quotas were
updated with timestamp
- Adds `extraExecMinutesAvailable` and `giftedExecMinutesAvailable`
fields to `Organization` to help with tracking available time left
(includes tested migration to initialize these to 0)
- Modifies org backend to track time across multiple categories, using
monthlyExecSeconds, then giftedExecSeconds, then extraExecSeconds.
All time is also written into crawlExecSeconds, which is now the monthly
total and also contains any overage time above the quotas
- Updates Dashboard crawling meter to include all types of execution
time if `extraExecMinutes` and/or `giftedExecMinutes` are set above 0
- Updates Dashboard Usage History table to include all types of
execution time (only displaying columns that have data)
- Adds backend nightly test to check handling of quotas and execution
time
- Includes migration to add new fields and copy crawlExecSeconds to
monthlyExecSeconds for previous months

Co-authored-by: emma <hi@emma.cafe>
2023-12-07 14:34:37 -05:00
Ilya Kreymer
6384d8b5f1
Additional Type Hints / Type Fix Pass (#1320)
This PR adds more type safety to the backend codebase:
- All ops classes calls should be type checked
- Avoiding circular references with TYPE_CHECKING conditional
- Consistent UUID usage: uuid.UUID / UUID4 with just UUID
- Crawl states moved to models, made into lists
- Additional typing added as needed, fixed a few type related errors
- CrawlOps / UploadOps / BaseCrawlOps now all have same param init order
to simplify changes
2023-10-30 12:59:24 -04:00
Ilya Kreymer
6dc452ebad
Storage Refactor: Replication + Custom Storage Support (#1296)
- Refactors storage to support replicas + custom storages on the Org.
- There is a default primary + replica storage, while an Org can also have
primary and replica storages.
- StorageRef object is used to store references to default and custom
storage.

- CrawlFile has been updated to contain a StorageRef instead of a
def_storage_name, which references
either a default storage (in StorageOps) or custom storage (in
Organization)
- There is also a 'replicas' Optional[List[StorageRef]] which contains
replicas, if any.
- CrawlFileOut contain a numReplicas for how many replicas exist for
a given file.
- Migration: migration 0020 added to migrate existing Orgs, CrawlFile and ProfileFile objects to new storage system (CrawlFile and ProfileFile now extend BaseFile)


Part of #1262

---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2023-10-26 21:44:09 -07:00
Tessa Walsh
38f32f11ea
Enforce quota and hard cap for monthly execution minutes (#1284)
Fixes #1261 Closes #1092

The quota for monthly execution minutes is treated as a hard cap. Once
it is exceeded, an alert indicating that an org has exceeded its monthly
execution minutes will display and the user will be unable to start new
crawls. Any running crawls will be stopped once the quota is exceeded.

An execution minutes meter bar is also added in the Org Dashboard and
displayed if a quota is set. More detail in #1305 which was
merged into this branch.

## Changes

- Enable setting 'maxExecMinutesPerMonth' in orgs list quotas by superadmin
- Enforce quota by stopping crawls in operator once quota is reached
- Show alert banner once execution time quota is hit:
- Once quota is hit, disable Run Crawl buttons in frontend, return 403
message with `exec_minutes_quota_reached` detail in backend from
crawl config `/run` endpoint, and don't run new workflows on creation
(similar to storage quota)
- Display execution time for crawls in the crawl details overview,
immediately below
- Show execution minutes meter on dashboard (from #1305)

---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: sua yoo <sua@webrecorder.org>
2023-10-26 15:38:51 -07:00