Commit Graph

16 Commits

Author SHA1 Message Date
Tessa Walsh
123705c53f
Serialize datetimes with Z suffix (#2058)
Use timezone aware datetimes instead of timezone naive datetimes:
- Update mongodb client to use tz-aware conversion
- Convert dt_now() to return timezone aware UTC date
- Rename to_k8s_date -> date_to_str, just returns ISO UTC date with 'Z'
(instead of '+00:00' suffix)
- Rename from_k8s_date -> str_to_date, returns timezone aware date from
str
- Standardize all string<->date conversion to use either date_to_str or
str_to_date
- Update frontend to assume iso date, not append 'Z' directly
- Update tests to check for 'Z' suffix on some dates

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2024-09-12 16:16:13 -07:00
Anish Lakhwara
1bf531e1ec
Fix: Make Collections Public on Creation (#1213)
- Add isPublic to Add Collection endpoint, send isPublic from frontend
- Fixes #1212
2023-09-29 12:08:10 -07:00
Anish Lakhwara
8b16124675
feat: implement 'collections' array with {name, id} for archived item details (#1098)
- rename 'collections' -> 'collectionIds', adding migration 0014
- only populate 'collections' array with {name, id} pair for get_crawl() / single archived item
path, but not for aggregate/list methods
- remove Crawl.get_crawl(), redundant with BaseCrawl.get_crawl() version
- ensure _files_to_resources returns an empty [] instead of none if empty (matching BaseCrawl.get_crawl() behavior to Crawl.get_crawl())
- tests: update tests to use collectionIds for id list, add 'collections' for {name, id} test
- frontend: change Crawl object to have collectionIds instead of collections

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2023-08-25 00:26:46 -07:00
Ilya Kreymer
8d0a4f2ca9
fix public collections endpoint returning 404 when not public (#1052)
tests: add tests for public collections endpoint when collection is public and when not
2023-08-04 13:29:13 -04:00
Ilya Kreymer
6506965d98
Streaming Download for Collections (#1012)
* support streaming download of collections (part of #927)
- WACZ zip created on the fly using stream-zip
- add 'Download Collection' option to collection detail and list
- after editing collection, return to collection view
- tests: add test for streaming download, ensure WACZ files + datapackage present, STORE compression used

---------

Co-authored-by: sua yoo <sua@suayoo.com>
2023-07-26 15:42:17 -07:00
Tessa Walsh
fcd48b1831
Add totalSize to collections and make it sortable in list endpoint (#1001)
* Precompute collection.totalSize and make sortable

* Add migration to recompute collection data with totalSize
2023-07-24 13:12:23 -04:00
Ilya Kreymer
00eb62214d
Uploads API: BaseCrawl refactor + Initial support for /uploads endpoint (#937)
* basecrawl refactor: make crawls db more generic, supporting different types of 'base crawls': crawls, uploads, manual archives
- move shared functionality to basecrawl.py
- create a base BaseCrawl object, which contains start / finish time, metadata and files array
- create BaseCrawlOps, base class for CrawlOps, which supports base crawl deletion, querying and collection add/remove

* uploads api: (part of #929)
- new UploadCrawl object which extends BaseCrawl, has name and description
- support multipart form data data upload to /uploads/formdata
- support streaming upload of a single file via /uploads/stream, using botocore multipart upload to upload to s3-endpoint in parts
- require 'filename' param to set upload filename for streaming uploads (otherwise use form data names)
- sanitize filename, place uploads in /uploads/<uuid>/<sanitized-filename>-<random>.wacz
- uploads have internal id 'upload-<uuid>'
- create UploadedCrawl object with CrawlFiles pointing to the newly uploaded files, set state to 'complete'
- handle upload failures, abort multipart upload
- ensure uploads added within org bucket path
- return id / added when adding new UploadedCrawl
- support listing, deleting, and patch /uploads
- support upload details via /replay.json to support for replay
- add support for 'replaceId=<id>', which would remove all previous files in upload after new upload succeeds. if replaceId doesn't exist, create new upload. (only for stream endpoint so far).
- support patching upload metadata: notes, tags and name on uploads (UpdateUpload extends UpdateCrawl and adds 'name')

* base crawls api: Add /all-crawls list and delete endpoints for all crawl types (without resources)
- support all-crawls/<id>/replay.json with resources
- Use ListCrawlOut model for /all-crawls list endpoint
- Extend BaseCrawlOut from ListCrawlOut, add type
- use 'type: crawl' for crawls and 'type: upload' for uploads
- migration: ensure all previous crawl objects / missing type are set to 'type: crawl'
- indexes: add db indices on 'type' field and with 'type' field and oid, cid, finished, state

* tests: add test for multipart and streaming upload, listing uploads, deleting upload
- add sample WACZ for upload testing: 'example.wacz' and 'example-2.wacz'

* collections: support adding and remove both crawls and uploads via base crawl
- include collection_ids in /all-crawls list
- collections replay.json can include both crawls and uploads

bump version to 1.6.0-beta.2
---------

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2023-07-07 09:13:26 -07:00
Tessa Walsh
c7051d5fbf
Backend API consistency pass (#921)
* Make API add and update method returns consistent

- Updates return {"updated": True}
- Adds return {"added": True}
- Both can additionally have other fields as needed, e.g. id or name

- remove Profile response model, as returning added / id only
- reformat

---------

Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2023-06-16 18:52:46 -07:00
Tessa Walsh
bd6dc79449
Add frontend support for auto-adding collections to workflows (#916)
- Adds collections search and list to workflow editor
- Adds collections to workflow details component
- Adds namePrefix filter to backend GET /orgs/{oid}/collections endpoint to support case-insensitive searching of collections
- Adds documentation for new setting

---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2023-06-12 18:18:05 -07:00
Tessa Walsh
e10b7093c7
Fix bug preventing deleting collections with no crawls (#912) 2023-06-08 11:28:30 -07:00
sua yoo
6208ead040
Sort collection by last updated (modified) (#897)
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2023-05-30 14:09:10 -04:00
Ilya Kreymer
4d30a64bc9
collection delete: (#896)
set delete endpoint to use DELETE verb, fix for #869
2023-05-29 18:19:04 -07:00
Tessa Walsh
9c7a312a4c
Rework collections to track collections in Crawl (#878)
* Track collections in Crawl rather than crawls in Collection
* Add delete collection API endpoint and tests
* Precompute collection crawlCount, pageCount, and tags and add them to
GET collection responses
* Add modified field to Collection
* Update collection replay.json method
* Make add and remove crawls accept list of crawl ids
* Auto-add new workflow crawls to collections when they successfully
complete via CrawlConfig.autoAddCollections field
* Move long-running post-crawl operator tasks into asyncio task
* Make CrawlConfig.autoAddCollections updatable via /update API endpoint
2023-05-25 15:41:50 -04:00
Tessa Walsh
60fac2b677
Add collection sorting and filtering (#863)
* Sort by name and description (ascending by default)
* Filter by name
* Add endpoint to fetch collection names for search
* Add collation so that utf-8 chars sort as expected
2023-05-22 16:53:49 -04:00
Tessa Walsh
f482831d53
Use collection uuid as id (instead of name) (#855)
Also ensure name is not empty by adding minimum length of 1

Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2023-05-19 09:03:48 -04:00
Tessa Walsh
59e49eacd5
Update collections backend API (#759)
* Re-implement collections, storing crawlIds in collection

* Return collections for crawl endpoints and filter on coll name

* Remove crawl from all collections when deleted

* Revert get_collection_crawls to flat array of resources

* Fix tests
2023-04-14 12:17:18 -04:00