Commit Graph

202 Commits

Author SHA1 Message Date
Tessa Walsh
cd7b695520
Add backend support for custom behaviors + validation endpoint (#2505)
Backend support for #2151 

Adds support for specifying custom behaviors via a list of strings.

When workflows are added or modified, minimal backend validation is done
to ensure that all custom behavior URLs are valid URLs (after removing
the git prefix and custom query arguments).

A separate `POST /crawlconfigs/validate/custom-behavior` endpoint is
also added, which can be used to validate a custom behavior URL. It
performs the same syntax check as above and then:
- For URL directly to behavior file, ensures URL resolves and returns a
2xx/3xx status code
- For Git repositories, uses `git ls-remote` to ensure they exist (and
that branch exists if specified)

---------

Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
2025-04-02 16:20:51 -07:00
Ilya Kreymer
21a372057b
Fix user emails use userout (#2511)
Follow-up to #2495, actually ensure org subscription data is in included
in admin email response

---------

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2025-03-24 12:04:39 -07:00
Ilya Kreymer
4c0ddd0fe3
crawl replay: remove isSeed=true from initialPages query (#2509)
- matches initial query for collections
- fixes 'Show Non-Seed Pages' not appearing for crawl replay
2025-03-20 15:03:41 -07:00
Ilya Kreymer
6be1f6674c
fixes token lifetime bug / improve security (#2490)
- fix jwt_token_lifetime being in hours, not minutes, remove extra * 60
- don't return userids in user list for org admins, instead just key
users by email, which is already unique
2025-03-19 10:07:09 -07:00
Ilya Kreymer
afa892000b
replay api: add downloadUrl to replay endpoints to be used by RWP (#2456)
RWP (2.3.3+) can determine if the 'Download Archive' menu item should be
showed based on the value of downloadUrl.
If set to 'null', will hide the menu item:
- set downloadUrl to public collection download for public collections
replay
- set downloadUrl to null for private collection and crawl replay to
hide the download menu item in RWP (otherwise have to add the
auth_header query with bearer token and should assess security before
doing that..)

---------

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2025-03-03 14:11:28 -08:00
Ilya Kreymer
702c9ab3b7
Better cacheing of presigned URLs + support for thumbnails (#2446)
Overhauls URL presigning by:
- cache the presigned urls in a flat, separate mongodb collection which
has an expiring index
- update presigned urls if not found / expired automatically in index
- remove logic on storing presignedUrl in files
- support cacheing presigned URL for thumbnails.
- add endpoints to clear presigned urls for org or for all files in all
orgs (superadmin only)
- supersedes #2438, fix for #2437
- removes previous presignedUrl and expireAt data from crawls and QA
runs

---------

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2025-03-03 12:05:23 -08:00
Ilya Kreymer
2263745df3
Fix replay.json 400 response for empty collection (#2445)
- fix #2443 
- don't throw error in list_pages() if no crawls provided, just return
empty list
- ensure an empty collection returns 200 on replay.json, add tests
2025-03-03 09:38:19 -08:00
Tessa Walsh
45aa0a32b6
Calculate total for crawl QA page endpoint (#2435)
Fixes #2434 

Patch fix for a regression in Browsertrix 1.4.0-1.4.1 where total was
not being calculated for QA page list endpoint but still being included
in response, which led to total always being 0 and pages not loading in
the frontend review screen as a result.
2025-02-27 11:46:35 -08:00
Ilya Kreymer
8a507f0473
Consolidate list page endpoints + better QA sorting + optimize pages fix (#2417)
- consolidate list_pages() and list_replay_query_pages() into
list_pages()
- to keep backwards compatibility, add <crawl>/pagesSearch that does not
include page totals, keep <crawl>/pages with page total (slower)
- qa frontend: add default 'Crawl Order' sort order, to better show
pages in QA view
- bgjob: account for parallelism in bgjobs, add logging if succeeded
mismatches parallelism
- QA sorting: default to 'crawl order' by default to get better results.
- Optimize pages job: also cover crawls that may not have any pages but have pages listed in done stats
- Bgjobs: give custom op jobs more memory
2025-02-21 13:47:20 -08:00
Tessa Walsh
f8fb2d2c8d
Rework crawl page migration + MongoDB Query Optimizations (#2412)
Fixes #2406 

Converts migration 0042 to launch a background job (parallelized across
several pods) to migrate all crawls by optimizing their pages and
setting `version: 2` on the crawl when complete.

Also Optimizes MongoDB queries for better performance.

Migration Improvements:

- Add `isMigrating` and `version` fields to `BaseCrawl`
- Add new background job type to use in migration with accompanying
`migration_job.yaml` template that allows for parallelization
- Add new API endpoint to launch this crawl migration job, and ensure
that we have list and retry endpoints for superusers that work with
background jobs that aren't tied to a specific org
- Rework background job models and methods now that not all background
jobs are tied to a single org
- Ensure new crawls and uploads have `version` set to `2`
- Modify crawl and collection replay.json endpoints to only include
fields for replay optimization (`initialPages`, `pageQueryUrl`,
`preloadResources`) if all relevant crawls/uploads have `version` set to
`2`
- Remove `distinct` calls from migration pathways
- Consolidate collection recompute stats

Query Optimizations:
- Remove all uses of $group and $facet
- Optimize /replay.json endpoints to precompute preload_resources, avoid
fetching crawl list twice
- Optimize /collections endpoint by not fetching resources 
- Rename /urls -> /pageUrlCounts and avoid $group, instead sort with
index, either by seed + ts or by url to get top matches.
- Use $gte instead of $regex to get prefix matches on URL
- Use $text instead of $regex to get text search on title
- Remove total from /pages and /pageUrlCounts queries by not using
$facet
- frontend: only call /pageUrlCounts when dialog is opened.


---------

Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
2025-02-20 15:26:11 -08:00
Ilya Kreymer
e112f96614
Upload Fixes: (#2397)
- ensure upload pages are always added with a new uuid, to avoid any
duplicates with existing uploads, even if upload wacz is actually a
crawl from different browsertrix instance, etc..
- cleanup upload names with slugify, which also replaces spaces, fixes
uploading wacz filenames with spaces in them
- part of fix for #2396

---------

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2025-02-17 13:05:33 -08:00
Tessa Walsh
39d99e7c5d
Add support for custom link selectors to backend (#2346)
Related to #2152 

This PR adds backend support for custom link selectors via `selectLinks`
on the crawl workflow config. Tests have been updated as well.

It also adds `selectLinks` to the frontend in a minimal and for now
hardcoded way that we can use as a basis for proper frontend support
moving forward.

---------

Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
2025-02-13 22:22:27 -08:00
Tessa Walsh
7f1af9bb31
Mark all pages from pages.jsonl as seeds (#2390)
Fixes #2389 

All pages from `pages/pages.jsonl` files now have `isSeed: True` in the
database, in addition to any pages that explicitly have `seed` set to
true in the actual JSONL.

Tests have been added to ensure that all pages from our fixture uploads
have `isSeed: True`.

---------

Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2025-02-13 16:54:30 -08:00
Ilya Kreymer
7b2932c582
Add initial pages + pagesQuery endpoint to /replay.json APIs (#2380)
Fixes #2360 

- Adds `initialPages` to /replay.json response for collections, returning
up-to 25 pages (seed pages first, then sorted by capture time).
- Adds `pagesQueryUrl` to /replay.json
- Adds a public pages search endpoint to support public collections.
- Adds `preloadResources`, including list of WACZ files that should
always be loaded, to /replay.json

---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2025-02-13 16:53:47 -08:00
Emma Segal-Grossman
f8a44258d8
Merge pull request #2332 from webrecorder/frontend-collection-editing-dialog
Collection editing and sharing revamp
2025-02-11 18:27:35 -05:00
Tessa Walsh
98a45b0d85
Add collection page list/search endpoint (#2354)
Fixes #2353

Adds a new endpoint to list pages in a collection, with filtering
available on `url` (exact match), `ts`, `urlPrefix`, `isSeed`, and
`depth`, as well as accompanying tests. Additional sort options have
been added as well.

These same filters and sort options have also been added to the crawl
pages endpoint.

Also fixes an issue where `isSeed` wasn't being set in the database when
false but only added on serialization, which was preventing filtering
from working as expected.
2025-02-10 16:44:37 -08:00
Tessa Walsh
0e9e70f3a3
Add WACZ filename, depth, favIconUrl, isSeed to pages (#2352)
Adds `filename` to pages, pointed to the WACZ file those files come
from, as well as depth, favIconUrl, and isSeed. Also adds an idempotent
migration to backfill this information for existing pages, and increases
the backend container's startupProbe time to 24 hours to give it sufficient
time to finish the migration.
---------

Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2025-02-05 15:50:04 -05:00
Tessa Walsh
0a8df62ab4
Ensure collection stats are updated when WACZ is added on upload (#2351)
Fixes #2350 

Collection earliest/latest dates and the collection modified date are
also now updated when crawls or uploads are added to a collection via
the collection auto-add feature.
2025-01-30 13:05:56 -08:00
Tessa Walsh
9363095d62
Validate exclusion regexes on backend (#2316) 2025-01-23 13:32:54 -05:00
Tessa Walsh
763c654484
feat: Update collection sorting, metadata, stats (#2327)
- Refactors dashboard and org profile preview to use private API
endpoint, to fix public collections not showing when the org
visibility is hidden
- Adds additional sorting options for collections
- Adds unique page url counts for archived items, collections, and
organizations to backend and exposes this in collections
- Shows collection period (i.e. `dateEarliest` to `dateLatest`) in
collections list
- Shows same collection metadata in private and public views, updates
private view info bar
- Fixes "Update Org Profile" action item showing for crawler roles

---------

Co-authored-by: sua yoo <sua@webrecorder.org>
Co-authored-by: sua yoo <sua@suayoo.com>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2025-01-23 13:32:23 -05:00
Tessa Walsh
6797b41de0
Add pageCount to crawls and uploads and use in frontend for page counts (#2315)
Fixes #2257 

This is a follow-up to the public collections work, which adds pages to
the database for uploads. All crawls and uploads now have a `pageCount`
field which is populated when the item is successfully added. A new
migration is also added to populate the field for existing archived
items that don't have it set yet.

OrgMetrics have also been modified to include `crawlPageCount` and
`uploadPageCount`, and to include the total of both in `pageCount`, and
all three included in the frontend org dashboard.

The frontend has been updated to use `pageCount` rather than
`stats.done` wherever appropriate, meaning that in archived item lists
and details we now have a consistent page count for both crawls and
uploads.

### New functionality

- Deploy this branch
- Create new crawls and uploads and verify that page count appears
correctly throughout the frontend for all new crawls and uploads

### Migration

- Deploy from latest main
- Create some crawls and uploads
- Change to this branch and re-deploy
- Verify migration ran without errors in backend logs
- Verify that page count has been populated successfully by checking
archived items lists, crawl and upload detail pages, and dashboard to
ensure there are no longer any missing page counts.

---------

Co-authored-by: emma <hi@emma.cafe>
2025-01-16 14:41:14 -08:00
Tessa Walsh
4583babecb
feat: Add slug to collections and use it in public collection URLs (#2301)
Resolves https://github.com/webrecorder/browsertrix/issues/2298

## Changes

- Slugs added to collections, can be specified separately when creating
or updating collections or else is based off of supplied collection name
- Migration added to backfill slugs for existing collections
- Redirect collection to newest slug if changed
- Adds option to copy public profile link to "Public Collections" action
menu
- Show "Back to <Org>" link instead of breadcrumbs

---------
Co-authored-by: sua yoo <sua@suayoo.com>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2025-01-15 22:44:32 -08:00
sua yoo
4347fcdba5
feat: Show collection created date (#2302)
- Shows collection created date in detail view (if present)
- Adds `black` formatter to vscode extension recommendations

---------

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2025-01-14 11:22:00 -05:00
Tessa Walsh
cbcf087a48
Add last crawl and subscription status indicators to org list (#2273)
Fixes #2260 

- Adds `lastCrawlFinished` to Organization model, updated after crawls
are added/deleted and with an idempotent migration to backfill existing
orgs
- Adds Last Crawl column to end of admin orgs list table
- Adds subscription icon next to existing status icon in orgs list
- Adds "lastCrawlFinished", "subscriptionStatus", and "subscriptionPlan"
sort options to orgs list backend endpoint in anticipation of future
sorting/filtering of orgs list

---------

Co-authored-by: emma <hi@emma.cafe>
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2025-01-14 10:57:06 -05:00
Tessa Walsh
a031fab313
Backend work for public collections (#2198)
Fixes #2182 

This rather large PR adds the rest of what should be needed for public
collections work in the frontend.

New API endpoints include:

- Public collections endpoints: GET, streaming download
- Paginated list of URLs in collection with snapshot (page) info for
each
- Collection endpoint to set home URL
- Collection endpoint to upload thumbnail as stream
- DELETE endpoint to remove collection thumbnail

Changes to existing API endpoints include:

- Paginating public collection list results
- Several `pages` endpoints that previously only supported `/crawls/` in
their path, e.g. `/orgs/{oid}/crawls/all/pages/reAdd`, now support
`/uploads/` and `/all-crawls/` namespaces as well. This is necessitated
by adding pages for uploads to the database (see below). For
`/orgs/{oid}/namespace/all/pages/reAdd`, `crawls` or `uploads` will
serve as a filter to only affect crawls of that given type. Other
endpoints are more liberal at this point, and will perform the same
action regardless of the namespace used in the route (we'll likely want
to change this in a follow-up to be more consistent).
- `/orgs/{oid}/namespace/all/pages/reAdd` now kicks off a background job
rather than doing all of the computation in an asyncio task in the
backend container. The background job additionally updates collection
date ranges, page/size counts, and tags for each collection in the org
after pages have been (re)added.

Other big changes:

- New uploads will now have their pages read into the database!
Collection page counts now also include uploads
- A migration was added to start a background job for each org that will
add the pages for previously-uploaded WACZ files to the database and
update collections accordingly
- Adds a new `ImageFile` subclass of `BaseFile` for thumbnails that we
can use for other user-uploaded image files moving forward, with
separate output models for authenticated and public endpoints
2025-01-13 15:15:48 -08:00
Tessa Walsh
190bdeb868
Add public API endpoint for public collections (#2174)
Fixes #1051 

If org with provided slug doesn't exist or no public collections exist
for that org, return same 404 response with a detail of
"public_profile_not_found" to prevent people from using public endpoint
to determine whether an org exists.

Endpoint is `GET /api/public-collections/<org-slug>` (no auth needed) to
avoid collisions with existing org and collection endpoints.
2025-01-13 15:15:48 -08:00
Tessa Walsh
42ebfd303d
Make changes to collections to support publicly listed collections (#2164)
Fixes #2158 

- Adds `Organization.listPublicCollections` field and API endpoint to
update it
- Replaces `Collection.isPublic` boolean with `Collection.access`
(values: `private`, `unlisted`, `public`) and add database migration
- Update frontend to use `Collection.access` instead of `isPublic`,
otherwise not changing current behavior

---------

Co-authored-by: sua yoo <sua@suayoo.com>
2025-01-13 15:15:47 -08:00
Ilya Kreymer
c27758a0f6
quickfix: update test_api.py to match all locales enabled by default (#2241) 2024-12-13 20:30:06 -08:00
Emma Segal-Grossman
b650762a45
Allow configuring available languages from helm chart (#2230)
Closes #2223 

- [x] Adds `localesAvailable` to `/api/settings` endpoint, and uses that
list if available, rather than the full list of translated locales, to
determine which options to display to users
- [x] ~~Uses the user's browser locales, filtered to the current
language setting, for formatting numbers, dates, and durations~~
- [x] Adds & persists checkbox for "use same language for formatting
dates and numbers" in user settings
- [x] Replaces uses of `sl-format-bytes` with `localize.bytes(...)`, and
`sl-format-date` with replacement `btrix-format-date` that properly
handles fallback locales
- [x] Caches all number/duration/datetime formatters by a combined key
consisting of app language, browser language, browser setting, and
formatter options so that all formatters can be reused if needed
(previously any formatter with non-default options would be recreated
every render)
- [x] Splits out ordinal formatting from number formatter, as it didn't
make much sense in some non-English locales
- [x] Adds a little demo of date/time/duration/number formatting so you
can see what effect your language settings have
  


https://github.com/user-attachments/assets/724858cb-b140-4d72-a38d-83f602c71bc7

---------

Signed-off-by: emma <hi@emma.cafe>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
2024-12-13 22:31:26 -05:00
Tessa Walsh
b7604ee61d
Add superuser endpoint to get user emails with org info (#2211)
Fixes #2203

---------

Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2024-12-09 16:38:01 -08:00
Tessa Walsh
ba5ca3fdd9
Move org storage recalculation into background job (#2138)
Fixes #2112 

- Moves org storage recalculation to background job, modify endpoint to
return job id as part of response
- Updates crawl + QA backend tests that broke due to
https://webrecorder.net website changes

---------

Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
2024-11-19 17:32:57 -05:00
Tessa Walsh
1b1819ba5a
Move org deletion to background job with access to backend ops classes (#2098)
This PR introduces background jobs that have full access to the backend
ops classes and moves the delete org job to a background job.
2024-10-10 14:41:05 -04:00
Ilya Kreymer
6032e28231
fix: firstOrgAdmin being set to true even if invite was not for an admin (#2110)
Non-admin users should not be given option to rename org when invited to
a new org:
- set firstOrgAdmin to true only when invite is for an admin
- default to false instead of null
- update tests to check
2024-10-08 16:42:30 -07:00
Ilya Kreymer
104ea097c4
switch to simpler streaming download + multiwacz metadata improvements: (#1982)
- download via presigned URLs via requests instead of boto APIs, remove boto
- follow-up to #1933 for streaming download improvements
- fixes datapackage.json in multi-wacz to contain the same resources
objects with: `name`, `path`, `hash`, `bytes` to match single WACZ.
- Add additional metadata to multi-wacz datapackage.json, including `type`
(`crawl`, `upload`, `collection`, `qaRun`), `id` (unique id for the
object), `title` / `description` if available (for
crawl/upload/collection), and `crawlId` for `qaRun`
2024-10-03 16:13:31 -07:00
Ilya Kreymer
62da0fbd6c
security: tweak get /invite endpoints / InviteOut to: (#2087)
don't set inviterEmail / inviterName if the inviter is the superuser:
- return fromSuperuser true/false
- if fromSuperuser, don't set inviterEmail / inviterName
- tests: add tests for non-superuser admin invites
2024-09-20 11:52:56 -07:00
Ilya Kreymer
feb6b1f26c
Ensure email comparisons are case-insensitive, emails stored as lowercase (#2084) (#2086) (fixes from 1.11.7)
- Add a custom EmailStr type which lowercases the full e-mail, not just
the domain.
- Ensure EmailStr is used throughout wherever e-mails are used, both for
invites and user models
- Tests: update to check for lowercase email responses, e-mails returned
from APIs are always lowercase
- Tests: remove tests where '@' was ur-lencoded, should not be possible
since POSTing JSON and no url-decoding is done/expected. E-mails should
have '@' present.
- Fixes #2083 where invites were rejected due to case differences
- CI: pin pymongo dependency due to latest releases update, update python used for CI
2024-09-19 12:20:34 -07:00
Tessa Walsh
123705c53f
Serialize datetimes with Z suffix (#2058)
Use timezone aware datetimes instead of timezone naive datetimes:
- Update mongodb client to use tz-aware conversion
- Convert dt_now() to return timezone aware UTC date
- Rename to_k8s_date -> date_to_str, just returns ISO UTC date with 'Z'
(instead of '+00:00' suffix)
- Rename from_k8s_date -> str_to_date, returns timezone aware date from
str
- Standardize all string<->date conversion to use either date_to_str or
str_to_date
- Update frontend to assume iso date, not append 'Z' directly
- Update tests to check for 'Z' suffix on some dates

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2024-09-12 16:16:13 -07:00
sua yoo
4c36c80351
feat: Display scale as number of browser windows (#2057)
Resolves https://github.com/webrecorder/browsertrix/issues/2048

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2024-09-05 17:32:40 -07:00
sua yoo
337454f8c9
feat: Add link to hosted sign-up page (#2045)
Resolves https://github.com/webrecorder/browsertrix/issues/2043

<!-- Fixes #issue_number -->

### Changes

- Shows link to sign up in UI if `sign_up_url` is configured.
- Expires settings in session storage (for now)
2024-08-26 17:26:25 -07:00
Ilya Kreymer
04c8b50423
add a crawling defaults on the Org to allow setting certain crawl workflow fields as defaults: (#2031)
- add POST /orgs/<id>/defaults/crawling API to update all defaults
(defaults unset are cleared)
- defaults returned as 'crawlingDefaults' object on Org, if set
- fixes #2016

---------

Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
2024-08-22 10:36:04 -07:00
Ilya Kreymer
8c9a14b6a2
Ensure Subscription Update doesn't update the gifted quotas (#2012)
- add a separate OrgQuotasIn where all quota updates are optional
- ensure gifted quotas are never updated as part of org update
- update tests
2024-08-20 13:15:03 -07:00
Tessa Walsh
916813af2d
Include user and user org info in login response (#2014)
Fixes #2013 

Adds the `/users/me` response data to the API login endpoint response
under the key `user_info` and adds a test.
2024-08-12 18:51:42 -07:00
Ilya Kreymer
5f53db75ee
fix resetting of invalid logins: (#2002)
* Fixes issue in FailedLogin model:
- fix data-model to remove nested 'attempted.attempted'
- migrate existing data to remove nested field

* Also, avoid setting dt_now() in model as that results in fixed date for
all objects:
- update FailedLogin to update 'attempted' date on every attempt
- also update PageNote object to set date in constructor

* Update text for too many logins to make it clear it is set only if its a
valid email

* fixes #2001
2024-08-07 12:36:06 -07:00
Ilya Kreymer
41d43ae249
Fix forgot password for invalid user (#1999)
- fix validation error if user doesn'r exist
- always return success even if user doesn't exist for security reasons
- add test for forgot password endpoint
2024-08-07 11:02:40 -07:00
Ilya Kreymer
1c153dfd3c
Subscription Update Quotas (#1988)
- Follow-up to #1914, allows SubscriptionUpdate event to also update
quotas.
- Passes current usage info + current billing page URL to portalUrl
request for external app to be able to respond with best portalUrl
- get_origin() moved to utils to be available more generally.
- Updates billing tab to show current plans, switches order of quotas to
list execution time, storage first
2024-08-05 15:59:47 -07:00
Tessa Walsh
551660bb62
Add webhooks for qaAnalysisStarted, qaAnalysisFinished, and crawlReviewed (#1974)
Fixes #1957 

Adds three new webhook events related to QA: analysis started, analysis
ended, and crawl reviewed.

Tests have been updated accordingly.

---------

Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2024-07-25 16:53:49 -07:00
Tessa Walsh
27ee16d308
Implement downloading archived item + QA runs as multi-WACZ (#1933)
Fixes #1412 

## Changes

### Backend
- Adds `all-crawls`, `crawls`, and `uploads` API endpoints to download
archived item as multi-WACZ
- Download QA runs as multi-WACZ
- Adds backend tests for new endpoints
- Update to new version of stream-zip library which does not require crc-32 to be present for ZIP members,
computes after streaming, fixing invalid crc-32 issues as previously computed crc-32s from crawler may be invalid.

### Frontend

Adds ability to download archived item from:

- Button in archived item detail Files tab
- Archived item details actions menu
- Archived items list menu



---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: sua yoo <sua@webrecorder.org>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2024-07-25 10:28:57 -07:00
Ilya Kreymer
a8c5f07b7c
Add support e-mail to settings (#1960)
Adds support email to /api/settings
Also adds a response model for this endpoint and consolidates api tests

Addresses request in #1912
2024-07-23 20:58:12 -04:00
Tessa Walsh
a02f7a6826
Ensure lexical sort for org names (#1958)
Fixes #1955 

Orgs list endpoint sorting now works as follows:
- Default org is always sorted first
- Name sorting now works on a lowercased version of the org names to
ensure lexical sorting

The lodash `sortBy` resorting of orgs in the "All Organizations"
dropdown list in the nav bar has also been removed so that the backend
sorting is applied instead.

Tests have been updated accordingly.
2024-07-23 13:13:04 -07:00
Ilya Kreymer
8c0321bdea
Pydantic 2.x update + type fixes + python 3.12 (#1947)
* updates pydantic to 2.x
* also update to python 3.12
* additional type fixes:
- all Optional[] types must have a default value
- update to constrained types
- URL types converted from str
- test updates

Fixes #1940
2024-07-22 17:23:03 -07:00