browsertrix

Author	SHA1	Message	Date
Tessa Walsh	a898c2b456	Format backend with Black 24 (#1507 ) Fixes #1506	2024-02-07 11:35:34 -08:00
Tessa Walsh	4014d98243	Move pydantic models to separate module + refactor crawl response endpoints to be consistent (#983 ) * Move all pydantic models to models.py to avoid circular dependencies * Include automated crawl details in all-crawls GET endpoints - ensure /all-crawls endpoint resolves names / firstSeed data same as /crawls endpoint for crawls to ensure consistent frontend display. fields added in get and list all-crawl endpoints for automated crawls only: - cid - name - description - firstSeed - seedCount - profileName * Add automated crawl fields to list all-crawls test * Uncomment mongo readinessProbe * cleanup CrawlOutWithResources: - remove 'files' from output model, only resources should be returned - add _files_to_resources() to simplify computing presigned 'resources' from raw 'files' - update upload tests to be more consistent, 'files' never present, 'errors' always none --------- Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>	2023-07-20 13:05:33 +02:00
Ilya Kreymer	00eb62214d	Uploads API: BaseCrawl refactor + Initial support for /uploads endpoint (#937 ) * basecrawl refactor: make crawls db more generic, supporting different types of 'base crawls': crawls, uploads, manual archives - move shared functionality to basecrawl.py - create a base BaseCrawl object, which contains start / finish time, metadata and files array - create BaseCrawlOps, base class for CrawlOps, which supports base crawl deletion, querying and collection add/remove * uploads api: (part of #929) - new UploadCrawl object which extends BaseCrawl, has name and description - support multipart form data data upload to /uploads/formdata - support streaming upload of a single file via /uploads/stream, using botocore multipart upload to upload to s3-endpoint in parts - require 'filename' param to set upload filename for streaming uploads (otherwise use form data names) - sanitize filename, place uploads in /uploads/<uuid>/<sanitized-filename>-<random>.wacz - uploads have internal id 'upload-<uuid>' - create UploadedCrawl object with CrawlFiles pointing to the newly uploaded files, set state to 'complete' - handle upload failures, abort multipart upload - ensure uploads added within org bucket path - return id / added when adding new UploadedCrawl - support listing, deleting, and patch /uploads - support upload details via /replay.json to support for replay - add support for 'replaceId=<id>', which would remove all previous files in upload after new upload succeeds. if replaceId doesn't exist, create new upload. (only for stream endpoint so far). - support patching upload metadata: notes, tags and name on uploads (UpdateUpload extends UpdateCrawl and adds 'name') * base crawls api: Add /all-crawls list and delete endpoints for all crawl types (without resources) - support all-crawls/<id>/replay.json with resources - Use ListCrawlOut model for /all-crawls list endpoint - Extend BaseCrawlOut from ListCrawlOut, add type - use 'type: crawl' for crawls and 'type: upload' for uploads - migration: ensure all previous crawl objects / missing type are set to 'type: crawl' - indexes: add db indices on 'type' field and with 'type' field and oid, cid, finished, state * tests: add test for multipart and streaming upload, listing uploads, deleting upload - add sample WACZ for upload testing: 'example.wacz' and 'example-2.wacz' * collections: support adding and remove both crawls and uploads via base crawl - include collection_ids in /all-crawls list - collections replay.json can include both crawls and uploads bump version to 1.6.0-beta.2 --------- Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2023-07-07 09:13:26 -07:00
Tessa Walsh	e9b61c632d	Add pageSize to pagination format (#736 )	2023-04-03 15:57:47 -04:00
Tessa Walsh	4724754efc	Filter and sort crawl and workflow list API endpoints in backend (#724 ) * Re-implement pagination and paginate crawlconfig revs First step toward simplifying pagination to set us up for sorting and filtering of list endpoints. This commit removes fastapi-pagination as a dependency. * Migrate all HttpUrl seeds to Seeds This commit also updates the frontend to always use Seeds and to fix display issues resulting from the change. * Filter and sort crawls and workflows Crawls: - Filter by createdBy (via userid param) - Filter by state (comma-separated string for multiple values) - Filter by first_seed, name, description - Sort by started, finished, fileSize, firstSeed - Sort descending by default to match frontend Workflows: - Filter by createdBy (formerly userid) and modifiedBy - Filter by first_seed, name, description - Sort by created, modified, firstSeed, lastCrawlTime * Add crawlconfigs search-values API endpoint and test	2023-03-28 17:55:40 -04:00
Tessa Walsh	e98c7172a9	Paginate API list endpoints (#659 ) * Paginate API list endpoints fastapi-pagination is pinned to 0.9.3, the latest release that plays nicely with pinned versions of fastapi and fastapi-users. * Increase page size via overriden Params and Page classes * update api resource list keys --------- Co-authored-by: sua yoo <sua@suayoo.com>	2023-03-06 14:41:25 -05:00

6 Commits