- Upgrades webpack and webpack-dev-server for bugfixes and performance updates
- Removes unnecessary file watching
- Enables persistent build cache in dev
- Switches to faster dev source map
* Always run yarn only on build platform with --platform=$BUILDPLATFORM
* Remove optional dependencies (playwright + chromium) from build with --ignore-optional and move some devDependencies to be optional
* Disable husky pre-commit hook checks on frontend
Co-authored-by: sua yoo <sua@suayoo.com>
- Always shows primary "Share" action button in Collection detail page.
- Enables toggling shareable status and share info from dialog. Difference from mockups: I made the "Done" button neutral do differentiate from our submit action buttons in the dialog, since toggling will apply changes immediately.
- Menu item: "Go to Public View"/"Go to Shareable View" -> "Visit Shareable URL".
- Toggle label: "Make Collection Shareable" -> "Collection is Shareable".
- Additional dialog copy: adds "This collection can be viewed by anyone with the link." under "Link to Share" and "Share this collection by embedding it into an existing webpage." under "Embed Collection".
- Moves share status icon to its own column in list view.
- Adds new syntax-highlighted code component that supports js and html.
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
- Adds top-level "Archived Data" view, replacing "Finished Crawls" and moving it as "Crawls" into view
- Adds list for viewing all artifacts/data
- Adds list for viewing all uploaded crawls
- Updates crawl detail view to show upload details
- Edit upload metadata, including 'name'
- Delete uploads
---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
* basecrawl refactor: make crawls db more generic, supporting different types of 'base crawls': crawls, uploads, manual archives
- move shared functionality to basecrawl.py
- create a base BaseCrawl object, which contains start / finish time, metadata and files array
- create BaseCrawlOps, base class for CrawlOps, which supports base crawl deletion, querying and collection add/remove
* uploads api: (part of #929)
- new UploadCrawl object which extends BaseCrawl, has name and description
- support multipart form data data upload to /uploads/formdata
- support streaming upload of a single file via /uploads/stream, using botocore multipart upload to upload to s3-endpoint in parts
- require 'filename' param to set upload filename for streaming uploads (otherwise use form data names)
- sanitize filename, place uploads in /uploads/<uuid>/<sanitized-filename>-<random>.wacz
- uploads have internal id 'upload-<uuid>'
- create UploadedCrawl object with CrawlFiles pointing to the newly uploaded files, set state to 'complete'
- handle upload failures, abort multipart upload
- ensure uploads added within org bucket path
- return id / added when adding new UploadedCrawl
- support listing, deleting, and patch /uploads
- support upload details via /replay.json to support for replay
- add support for 'replaceId=<id>', which would remove all previous files in upload after new upload succeeds. if replaceId doesn't exist, create new upload. (only for stream endpoint so far).
- support patching upload metadata: notes, tags and name on uploads (UpdateUpload extends UpdateCrawl and adds 'name')
* base crawls api: Add /all-crawls list and delete endpoints for all crawl types (without resources)
- support all-crawls/<id>/replay.json with resources
- Use ListCrawlOut model for /all-crawls list endpoint
- Extend BaseCrawlOut from ListCrawlOut, add type
- use 'type: crawl' for crawls and 'type: upload' for uploads
- migration: ensure all previous crawl objects / missing type are set to 'type: crawl'
- indexes: add db indices on 'type' field and with 'type' field and oid, cid, finished, state
* tests: add test for multipart and streaming upload, listing uploads, deleting upload
- add sample WACZ for upload testing: 'example.wacz' and 'example-2.wacz'
* collections: support adding and remove both crawls and uploads via base crawl
- include collection_ids in /all-crawls list
- collections replay.json can include both crawls and uploads
bump version to 1.6.0-beta.2
---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
- Support for creating new collections and editing existing collections
- Can select crawling workflows which adds entire workflow, and then deselect individual crawls
- Can edit existing collections and add more crawls
- Can view, create and delete collections via new Collections top-level nav entry
fixes from 1.4.1:
* Upgrade to mongo 6 and use for workflow crawls
* update readiness probe with timeouts doubled, and failure threshold increased for slower 'mongosh' readiness check
update versions to 1.5.0-beta.0 in backend and frontend
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
* misc frontend build fixes:
- fix playwright version to be consistent to fix playwright test
- chunking: set max number of chunks generated
* lock playwright version
* remove intl polyfill
---------
Co-authored-by: sua yoo <sua@suayoo.com>
* Rename archives to orgs and aid to oid on backend
* Rename archive to org and aid to oid in frontend
* Remove translation artifact
* Rename team -> organization
* Add database migrations and run once on startup
* This commit also applies the new by_one_worker decorator to other
asyncio tasks to prevent heavy tasks from being run in each worker.
* Run black, pylint, and husky via pre-commit
* Set db version and use in migrations
* Update and prepare database in single task
* Migrate k8s configmaps
* profile browser vnc support + fixes:
- switch profile browser rendering to use VNC
- frontend: add @novnc/novnc as dependency, create separate bundle novnc.js to load into vnc browser (to avoid loading from each container)
- frontend: update proxy paths to proxy websocket, index page to crawler
- frontend: allow browser profiles in all browsers, remove browser compatibility check
- frontend: update webpack dev config, apply prettier
- frontend: node version fix
- backend: get vncpassword, build new URL for proxying to crawler iframe
- backend: fix profile / crawl job pull policy from 'Always' -> 'Never', should use existing image for job
- backend: fix kill signal to use bash -c to work with latest backend image
- backend/chart: add 'profile_browser_timeout_seconds' to chart values to control how long profile browser to remain when idle (default to 60)
- backend: remove utils.py, now using secret.token_hex() for random suffix
Co-authored-by: sua yoo <sua@suayoo.com>
- fix typos in docs
- update prod deployment info
- update minikube info
- add info on how to run with local images
- bump version to 1.1.0-beta.3 for testing multiarch build