- all-crawls list endpoint filters now conform to 'Standardize list controls for archived items #1025' and URL-decode values before passing them in (see the sketch after this list)
- Uploads list endpoint now includes all all-crawls filters relevant to uploads
- An all-crawls/search-values endpoint is added to support searching across all archived item types
- Crawl configuration names are now copied to the crawl when the crawl is created, and crawl names and descriptions are now editable via the backend API (note: this will require frontend changes as well to make them editable via the UI)
- Migration added to copy existing config names for active configs into their associated crawls. This migration has been tested in a local deployment
- New statuses generate-wacz, uploading-wacz, and pending-wait are added to tests where relevant to ensure they pass
- Test coverage added for all new all-crawls endpoints, filters, and sort values
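A minimal sketch of the URL-decoding step, assuming a FastAPI-style endpoint; the parameter set, query shape, and `list_items` helper are illustrative, not the exact Browsertrix handler:

```python
from typing import Optional
from urllib.parse import unquote

from fastapi import APIRouter

router = APIRouter()

@router.get("/orgs/{oid}/all-crawls")
async def list_all_crawls(
    oid: str,
    name: Optional[str] = None,
    firstSeed: Optional[str] = None,
):
    query: dict = {"oid": oid}
    # URL-decode filter values (e.g. "My%20Crawl" -> "My Crawl")
    # before using them in the mongo query
    if name:
        query["name"] = unquote(name)
    if firstSeed:
        query["firstSeed"] = unquote(firstSeed)
    # hypothetical helper standing in for the actual list query
    return await list_items(query)
```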
* collections: support toggling collections public/private, viewable via RWP
- backend: add 'public' to collection model, support patching to update (see the sketch below)
- backend: add .../collections/<id>/public/replay.json for public access
- backend: add CORS handling for public endpoint
- frontend: support 'make shareable / make private' dropdown actions on collection detail + collection list views
- frontend: show shareable / private icons by collection name on detail + list views
- frontend: link to replayweb.page for standalone browsing
- frontend: add embed code popup when a collection is shareable
- refer to public collections as 'shareable' for now
---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
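A minimal sketch of the backend side, assuming pydantic models and FastAPI; the helper functions, route shapes, and persistence call are assumptions:

```python
from typing import Optional
from uuid import UUID

from fastapi import APIRouter, HTTPException, Response
from pydantic import BaseModel

router = APIRouter()

class UpdateColl(BaseModel):
    # PATCH body: only fields that are set get updated
    public: Optional[bool] = None

@router.patch("/orgs/{oid}/collections/{coll_id}")
async def update_collection(oid: UUID, coll_id: UUID, update: UpdateColl):
    # hypothetical: persist only the fields that were actually set
    await update_collection_db(coll_id, update.dict(exclude_unset=True))
    return {"updated": True}

@router.get("/orgs/{oid}/collections/{coll_id}/public/replay.json")
async def get_public_replay_json(oid: UUID, coll_id: UUID, response: Response):
    coll = await get_collection(coll_id)  # hypothetical lookup helper
    if not coll.public:
        # don't reveal that a private collection exists
        raise HTTPException(status_code=404, detail="collection_not_found")
    # CORS: allow replayweb.page (or an embed) to load this cross-origin
    response.headers["Access-Control-Allow-Origin"] = "*"
    return await get_replay_json(coll)  # hypothetical resource resolver
```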
- Adds Collection info bar to detail view
- Update "Web Captures" -> "Archived Items"
- Updates Collection list columns to match
- Refactors `btrix-desc-list` and usage in `workflow-details` to reuse horizontal info bar component
- Adds tab for "Web Captures" in Collection detail view
- Moves Collection description under Replay section
- Fixes app reloading when clicking into a Collection
- Standardizes Web Capture list headers from "Finished" -> "Created Date"
operator: ensure transitions from each of these states are supported, including to 'waiting_capacity' (see the sketch below)
add extra check on stopping to avoid transitioning back to a running state after crawl is finished
ui: add new states to UI display and localization, include them as active states
fixes #263
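A minimal sketch of the transition guard, assuming string state values; only the state names come from this change, the allowed-transition table itself is illustrative:

```python
RUNNING_STATES = {"running", "generate-wacz", "uploading-wacz", "pending-wait"}
FINISHED_STATES = {"complete", "failed", "canceled"}

# illustrative transition table: each new state must support moving
# to 'waiting_capacity' as well as to its successor
ALLOWED = {
    "running": {"generate-wacz", "waiting_capacity"},
    "generate-wacz": {"uploading-wacz", "waiting_capacity"},
    "uploading-wacz": {"pending-wait", "waiting_capacity"},
    "pending-wait": {"complete", "waiting_capacity"},
}

def can_transition(current: str, new: str) -> bool:
    # extra check on stopping: never move back to a running state
    # once the crawl is finished
    if current in FINISHED_STATES and new in RUNNING_STATES:
        return False
    return new in ALLOWED.get(current, set())
```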
* resource constraints: (fixes #895)
- for cpu, only set cpu requests
- for memory, set mem requests == mem limits
- add missing resource constraints for minio and scheduled job
- for crawler, set mem and cpu constraints per browser, scaled by the number of browser instances per crawler (see the sketch below)
- add comments in values.yaml for crawler values being multiplied
- default values: bump crawler to 650 millicpu per browser instance just in case
cleanup: remove unused entries from main backend configmap
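A minimal sketch of the per-browser scaling; only the 650 millicpu default comes from this change, the per-browser memory value is a placeholder:

```python
def crawler_pod_resources(browsers_per_crawler: int) -> dict:
    cpu_per_browser_m = 650    # millicpu default from this change
    mem_per_browser_mi = 1024  # illustrative per-browser memory value

    cpu = f"{cpu_per_browser_m * browsers_per_crawler}m"
    mem = f"{mem_per_browser_mi * browsers_per_crawler}Mi"
    return {
        # cpu: requests only, no cpu limit set
        "requests": {"cpu": cpu, "memory": mem},
        # memory: limit == request
        "limits": {"memory": mem},
    }

# e.g. 2 browsers per crawler -> 1300m cpu request, 2048Mi mem request+limit
```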
* support streaming download of collections (part of #927)
- WACZ zip created on the fly using stream-zip (see the sketch below)
- add 'Download Collection' option to collection detail and list
- after editing collection, return to collection view
- tests: add test for streaming download, ensure WACZ files + datapackage present, STORE compression used
---------
Co-authored-by: sua yoo <sua@suayoo.com>
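A minimal sketch using the stream-zip library's tuple-based member interface; the member/chunk shapes and the `get_collection_wacz_streams` helper are placeholders, and `NO_COMPRESSION_64` produces STORE (uncompressed) entries, matching what the test asserts:

```python
from datetime import datetime

from fastapi.responses import StreamingResponse
from stream_zip import NO_COMPRESSION_64, stream_zip

def zip_members(wacz_files):
    # wacz_files: iterable of (filename, byte-chunk iterator) pairs
    for name, chunks in wacz_files:
        # WACZ files are already zip-compressed internally, so STORE them
        yield name, datetime.now(), 0o644, NO_COMPRESSION_64, chunks

async def download_collection(coll_id):
    wacz_files = await get_collection_wacz_streams(coll_id)  # hypothetical
    return StreamingResponse(
        stream_zip(zip_members(wacz_files)),
        media_type="application/zip",
    )
```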
* feat: ansible DO teardown
* fix(DO): idempotency issues in ansible teardown
* chore(DO): remove unused code
* docs(ansible): mention teardown in the docs
* fix: pass ansible-lint
* fix: point database backup upload to the correct location in DO space
* feat: use existing pre-commit framework
* feat(ci): add github action for password_check
* feat: add some simple tests to password_check.py
* fix: set `backend_password_secret` in default values.yaml to an allowed password
Operator: modified init behavior to load redis only when at least one crawler pod is available (see the sketch after these notes):
- waits for at least one crawler pod to be available before starting the redis pod, to avoid the situation where many crawler pods are pending but redis is still running
- redis statefulset starts at scale of 0
- once crawler pod becomes available, redis sts is scaled to 1 (via `initRedis==true` status)
- crawl remains in 'starting' or 'waiting_capacity' state, with no redis pod running, until a crawler pod becomes available
- set to 'running' state only after redis and at least one crawler pod is available
- if no crawler pods are available while running, or if stuck in 'starting' for >60 seconds, switch to 'waiting_capacity' state
- when switching to 'waiting_capacity', also scale redis down to 0; wait for a crawler pod to become available, and only then scale redis back up to 1 and return to 'running'
other tweaks:
- add new status field 'initRedis', default to false, not displayed
- crawler pod: consider 'ContainerCreating' state as available, as container will not be blocked by resource limits
- add a resync after 3 seconds when waiting for crawler pod or redis pod to become available, configurable via 'operator_fast_resync_secs'
- set_state: if not updating state, ensure state reflects actual value in db
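A minimal sketch of the scaling decision described above; the status fields mirror these notes, but the pod shape and function signature are assumptions:

```python
def sync_redis_scale(status: dict, crawler_pods: list, secs_in_starting: int) -> int:
    # 'ContainerCreating' counts as available: the pod has been scheduled
    # and will not be blocked by resource limits
    available = any(
        pod["phase"] in ("Running", "ContainerCreating") for pod in crawler_pods
    )

    if available:
        # first crawler pod is up (or about to be): bring redis up
        status["initRedis"] = True
    elif status["state"] == "running" or secs_in_starting > 60:
        # no capacity: park the crawl and release redis
        status["state"] = "waiting_capacity"
        status["initRedis"] = False

    # redis statefulset scale follows initRedis (0 -> 1 and back)
    return 1 if status["initRedis"] else 0
```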
- Users can now upload .WACZ archives from the "Archived Data" page.
- Can specify name, description, tags, and collection(s) to add the upload to
- Show progress of upload
- Support canceling upload
* Move all pydantic models to models.py to avoid circular dependencies
* Include automated crawl details in all-crawls GET endpoints
- ensure /all-crawls endpoint resolves name / firstSeed data the same as the /crawls endpoint, for consistent frontend display. Fields added to get and list all-crawls endpoints for automated crawls only (see the sketch after this list):
- cid
- name
- description
- firstSeed
- seedCount
- profileName
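A minimal sketch of the field resolution, assuming a mongo-backed config lookup; the helper names and item shape are illustrative:

```python
async def add_automated_crawl_fields(items: list[dict]) -> list[dict]:
    for item in items:
        if item.get("type") != "crawl":
            continue  # uploads have no workflow, nothing to resolve
        config = await get_crawl_config(item["cid"])  # hypothetical helper
        item["name"] = item.get("name") or config.name
        item["description"] = item.get("description") or config.description
        item["firstSeed"] = config.seeds[0].url if config.seeds else None
        item["seedCount"] = len(config.seeds)
        item["profileName"] = config.profileName
    return items
```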
* Add automated crawl fields to list all-crawls test
* Uncomment mongo readinessProbe
* cleanup CrawlOutWithResources:
- remove 'files' from output model, only resources should be returned
- add _files_to_resources() to simplify computing presigned 'resources' from raw 'files' (see the sketch below)
- update upload tests to be more consistent: 'files' never present, 'errors' always none
---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
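A minimal sketch of the helper; presign() and the file/resource field names are assumptions:

```python
async def _files_to_resources(files: list[dict], org) -> list[dict]:
    # compute presigned 'resources' from raw 'files'; the raw 'files'
    # themselves are never returned in the output model
    resources = []
    for file_ in files:
        resources.append(
            {
                "name": file_["filename"],
                "path": await presign(org, file_["filename"]),  # signed URL
                "hash": file_["hash"],
                "size": file_["size"],
            }
        )
    return resources
```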
* frontend:
- follow-up to #969: fixes crawl workflows by using the crawl-specific endpoint and merging results
* get crawls and uploads concurrently
---------
Co-authored-by: sua yoo <sua@suayoo.com>