* helm chart tweaks:
- lower mem requirements for backend and crawler
- disable cors in ingress to pass through cors headers from backend
- crawler statefulset: use ordered instead of parallel scaling policy to avoid single crawl taking up all crawling capacity quickly
- Upgrades webpack and webpack-dev-server for bugfixes and performance updates
- Removes unnecessary file watching
- Enables persistent build cache in dev
- Switches to faster dev source map
* terminology tweaks in frontend: (part of #922)
- use 'crawl workflow' instead of 'workflow' where possible
- use 'replay' instead of 'replay crawl'
- localization: rerun string extraction / processing
- "Review Config" → "Review Settings"
- "Workflow" → "Crawl Workflow" in error message
---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Frontend:
- Renames list view to "All Archived Items"
- Refactors fetches to use single all-crawls endpoints
- Removes search by config ID for more search parity with uploads
- Adds sort by size
- Refactors property and method names to replace crawl*
- Replaces remaining references to "crawl" in copy with "item"'
- Rename Upload Archive button to Upload WACZ
- Fix focusout in item menu so menus close
Backend:
- Filter search values by type as well
- Only get list of cids for crawls in search values
- Don't list crawl/workflow ids in search values
---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
* Always run yarn only on build platform with --platform=$BUILDPLATFORM
* Remove optional dependencies (playwright + chromium) from build with --ignore-optional and move some devDependencies to be optional
* Disable husky pre-commit hook checks on frontend
Co-authored-by: sua yoo <sua@suayoo.com>
- Always shows primary "Share" action button in Collection detail page.
- Enables toggling shareable status and share info from dialog. Difference from mockups: I made the "Done" button neutral do differentiate from our submit action buttons in the dialog, since toggling will apply changes immediately.
- Menu item: "Go to Public View"/"Go to Shareable View" -> "Visit Shareable URL".
- Toggle label: "Make Collection Shareable" -> "Collection is Shareable".
- Additional dialog copy: adds "This collection can be viewed by anyone with the link." under "Link to Share" and "Share this collection by embedding it into an existing webpage." under "Embed Collection".
- Moves share status icon to its own column in list view.
- Adds new syntax-highlighted code component that supports js and html.
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
- all-crawls list endpoint filters now conform to 'Standardize list controls for archived items #1025' and URL decode values before passing them in
- Uploads list endpoint now includes all all-crawls filters relevant to uploads
- An all-crawls/search-values endpoint is added to support searching across all archived item types
- Crawl configuration names are now copied to the crawl when the crawl is created, and crawl names and descriptions are now editable via the backend API (note: this will require frontend changes as well to make them editable via the UI)
- Migration added to copy existing config names for active configs into their associated crawls. This migration has been tested in a local deployment
- New statuses generate-wacz, uploading-wacz, and pending-wait are added when relevant to tests to ensure that they pass
- Tests coverage added for all new all-crawls endpoints, filters, and sort values
* collections: support toggling collections public/private, viewable via RWP
- backend: add 'public' to collection model, support patching to update
- backend: add .../collections/<id>/public/replay.json for public access
- backend: add CORS handling for public endpoint
- frontend: support 'make shareable / make private' dropdown actions on collection detail + collection list views
- frontend: show shareable / private icons by collection name on detail + list views
- frontend: link to replayweb.page for standalone browsing
- frontend: add embed code popup when a collection is shareable
- refer to public collections as 'shareable' for now
---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
- Adds Collection info bar to detail view
- Update "Web Captures" -> "Archived Items"
- Updates Collection list columns to match
- Refactors `btrix-desc-list` and usage in `workflow-details` to reuse horizontal info bar component
- Adds tab for "Web Captures" in Collection detail view
- Move Collection description under Replay section
- Fixes app reloading when clicking into a Collection
- Standardizes Web Capture list headers from "Finished -> "Created Date"
operator: ensure transitions from each of these states is supported, including to 'waiting_capacity'
add extra check on stopping to avoid transitioning back to a running state after crawl is finished
ui: add states to UI display, localization, add as active states
fixes#263
* resource constraints: (fixes#895)
- for cpu, only set cpu requests
- for memory, set mem requests == mem limits
- add missing resource constraints for minio and scheduled job
- for crawler, set mem and cpu constraints per browser, scale based on browser instances per crawler
- add comments in values.yaml for crawler values being multiplied
- default values: bump crawler to 650 millicpu per browser instance just in case
cleanup: remove unused entries from main backend configmap
* support streaming download of collections (part of #927)
- WACZ zip created on the fly using stream-zip
- add 'Download Collection' option to collection detail and list
- after editing collection, return to collection view
- tests: add test for streaming download, ensure WACZ files + datapackage present, STORE compression used
---------
Co-authored-by: sua yoo <sua@suayoo.com>
* feat: ansible DO teardown
* fix(DO): idempotency issues in ansible teardown
* chore(DO): remove unused code
* docs(ansible): mention teardown in the docs
* fix: pass ansible-lint
* fix: point database backup upload to the correct location in DO space
* feat: use existing pre-commit framework
* feat(ci): add github action for password_check
* feat: add some simple tests to password_check.py
* fix: set `backend_password_secret` in default values.yaml to an allowed password
Operator: Modified init behavior to only load redis when at least one crawler pod available:
- waits for at least one crawler pod to be available before starting redis pod, to avoid situation where many crawler pods are in pending mode, but redis pods are still running.
- redis statefulset starts at scale of 0
- once crawler pod becomes available, redis sts is scaled to 1 (via `initRedis==true` status)
- crawl remains in 'starting' or 'waiting_capacity' state until pod becomes available without redis pod running
- set to 'running' state only after redis and at least one crawler pod is available
- if no crawler pods available after running, or, if stuck in starting for >60 seconds, switch to 'waiting_capacity' state
- when switching to 'waiting_capacity', also scale down redis to 0, wait for crawler pod to become available, only then scale up redis to 1, and get back to 'running'
other tweaks:
- add new status field 'initRedis', default to false, not displayed
- crawler pod: consider 'ContainerCreating' state as available, as container will not be blocked by resource limits
- add a resync after 3 seconds when waiting for crawler pod or redis pod to become available, configurable via 'operator_fast_resync_secs'
- set_state: if not updating state, ensure state reflects actual value in db
- Users can now upload .WACZ archives from the "Archived Data" page.
- Can specify name, description, tags and collection(s) to add upload to
- Show progress of upload
- Support canceling upload
* Move all pydantic models to models.py to avoid circular dependencies
* Include automated crawl details in all-crawls GET endpoints
- ensure /all-crawls endpoint resolves names / firstSeed data same as /crawls endpoint for crawls to ensure consistent frontend display. fields added in get and list all-crawl endpoints for automated
crawls only:
- cid
- name
- description
- firstSeed
- seedCount
- profileName
* Add automated crawl fields to list all-crawls test
* Uncomment mongo readinessProbe
* cleanup CrawlOutWithResources:
- remove 'files' from output model, only resources should be returned
- add _files_to_resources() to simplify computing presigned 'resources' from raw 'files'
- update upload tests to be more consistent, 'files' never present, 'errors' always none
---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>