- Always shows primary "Share" action button in Collection detail page.
- Enables toggling shareable status and share info from dialog. Difference from mockups: I made the "Done" button neutral to differentiate it from our submit action buttons in the dialog, since toggling applies changes immediately.
- Menu item: "Go to Public View"/"Go to Shareable View" -> "Visit Shareable URL".
- Toggle label: "Make Collection Shareable" -> "Collection is Shareable".
- Additional dialog copy: adds "This collection can be viewed by anyone with the link." under "Link to Share" and "Share this collection by embedding it into an existing webpage." under "Embed Collection".
- Moves share status icon to its own column in list view.
- Adds new syntax-highlighted code component that supports js and html.
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
- Adds Collection info bar to detail view
- Update "Web Captures" -> "Archived Items"
- Updates Collection list columns to match
- Refactors `btrix-desc-list` and usage in `workflow-details` to reuse horizontal info bar component
- Adds tab for "Web Captures" in Collection detail view
- Move Collection description under Replay section
- Fixes app reloading when clicking into a Collection
- Standardizes Web Capture list headers from "Finished" -> "Created Date"
operator: ensure transitions from each of these states are supported, including to 'waiting_capacity'
add extra check on stopping to avoid transitioning back to a running state after crawl is finished
ui: add states to UI display, localization, add as active states
fixes #263
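The guard against moving back to a running state can be sketched as a transition check that names the states a change may start from. This is a minimal in-memory sketch, not the operator's code: the `crawl` dict and the `set_state` signature shown here are assumptions for illustration, and a real implementation would presumably apply the same condition as part of the database update.

```python
from typing import Dict, Iterable, Optional


def set_state(
    crawl: Dict[str, str],
    new_state: str,
    allowed_from: Optional[Iterable[str]] = None,
) -> bool:
    """Move `crawl` to `new_state` only if its current state is allowed.

    `allowed_from=None` means the transition may happen from any state
    (hypothetical convention used for this sketch).
    """
    if allowed_from is not None and crawl["state"] not in allowed_from:
        # Transition refused: leave the stored state untouched.
        return False
    crawl["state"] = new_state
    return True


# A crawl that has already ended must not go back to 'running'.
ended = {"state": "canceled"}
set_state(ended, "running", allowed_from=["starting", "waiting_capacity"])
```

Returning a boolean lets the caller re-sync its view of the crawl when a transition is refused.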
- Users can now upload .WACZ archives from the "Archived Data" page.
- Can specify name, description, tags and collection(s) to add upload to
- Show progress of upload
- Support canceling upload
* additional fixes for #935:
- don't use artifactType for detail pages, ensure correct artifact selected based on path
* naming tweaks:
- from uploads detail, return to 'All Uploads' with filter
- from crawls detail, return to 'All Crawls' with filter
- rename general to 'All Archived Data'
- Adds top-level "Archived Data" view, replacing "Finished Crawls" and moving it as "Crawls" into view
- Adds list for viewing all artifacts/data
- Adds list for viewing all uploaded crawls
- Updates crawl detail view to show upload details
- Edit upload metadata, including 'name'
- Delete uploads
---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
* Make API add and update method returns consistent
- Updates return {"updated": True}
- Adds return {"added": True}
- Both can additionally have other fields as needed, e.g. id or name
- remove Profile response model, as returning added / id only
- reformat
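The return convention above can be sketched with two small helpers. The helper names are hypothetical and not taken from the backend; only the `{"added": True}` and `{"updated": True}` shapes, plus optional extra fields such as `id` or `name`, come from the notes above.

```python
from typing import Any, Dict


def added_response(**extra: Any) -> Dict[str, Any]:
    """Body for an 'add' method: {"added": True} plus any extra fields."""
    return {"added": True, **extra}


def updated_response(**extra: Any) -> Dict[str, Any]:
    """Body for an 'update' method: {"updated": True} plus any extra fields."""
    return {"updated": True, **extra}


# Example: an add method that also returns the new id (placeholder value).
print(added_response(id="example-id"))
print(updated_response())
```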
---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
- Adds collections search and list to workflow editor
- Adds collections to workflow details component
- Adds namePrefix filter to backend GET /orgs/{oid}/collections endpoint to support case-insensitive searching of collections
- Adds documentation for new setting
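A case-insensitive prefix search like the `namePrefix` filter above could be expressed as an anchored, escaped regular expression. This is only a sketch assuming a MongoDB-style query document. The function name and the `oid` and `name` field names are assumptions, not the actual backend code.

```python
import re
from typing import Any, Dict, Optional


def build_collections_query(oid: str, name_prefix: Optional[str] = None) -> Dict[str, Any]:
    """Build a MongoDB-style filter for one org's collections."""
    query: Dict[str, Any] = {"oid": oid}
    if name_prefix:
        # Escape user input so characters like '.' are matched literally,
        # anchor at the start of the name, and ignore case.
        query["name"] = {"$regex": "^" + re.escape(name_prefix), "$options": "i"}
    return query


query = build_collections_query("example-org", "web")
# The same pattern, checked locally with Python's re module:
pattern = re.compile(query["name"]["$regex"], re.IGNORECASE)
print(bool(pattern.match("Web Archive 2023")), bool(pattern.match("My web archive")))
```

Escaping the prefix matters because the value comes straight from a search box.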
---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
- Unifies trash icons on all pages to use trash3 (there were a few stragglers!)
- Brings styling of org quotas dialogue in-line with the rest of our dialogues
- Adds missing localization strings
- Swaps button with icon button to match table row action styling elsewhere
- Adds two new properties, name to pick the icon's name and content to pick a custom tooltip message. These are in-line with what Shoelace uses but are perhaps not the best descriptors...
- Swaps the existing anchor links on the Workflow Details' Settings tab for these and relocates them to after the heading. (Navigation to the links is broken right now... but the copying part works nicely!)
- Updates btrix-section-heading to better handle multiple elements with flexbox and an 8px gap between elements
- Support for creating new collections and editing existing collections
- Can select crawling workflows which adds entire workflow, and then deselect individual crawls
- Can edit existing collections and add more crawls
- Can view, create and delete collections via new Collections top-level nav entry
concurrent crawl limits: (addresses #866)
- support limits on concurrent crawls that can be run within a single org
- change 'waiting' state to 'waiting_org_limit' for concurrent crawl limit and 'waiting_capacity' for capacity-based limits
orgs:
- add 'maxConcurrentCrawl' to new 'quotas' object on orgs
- add /quotas endpoint for updating quotas object
operator:
- add all crawljobs as related, appear to be returned in creation order
- operator: if concurrent crawl limit set, ensures current job is in the first N set of crawljobs (as provided via 'related' list of crawljob objects) before it can proceed to 'starting', otherwise set to 'waiting_org_limit'
- api: add org /quotas endpoint for configuring quotas
- remove 'new' state, always start with 'starting'
- crawljob: add 'oid' to crawljob spec and label for easier querying
- more stringent state transitions: add allowed_from to set_state()
- ensure state transitions only happen from allowed states, while failed/canceled can happen from any state
- ensure finished and state are synced from db if transition is not allowed
- add crawl indices by oid and cid
frontend:
- show different waiting states on frontend: 'Waiting (Crawl Limit)' and 'Waiting (At Capacity)'
- add gear icon on orgs admin page
- and initial popup for setting org quotas, showing all properties from org 'quotas' object
tests:
- add concurrent crawl limit nightly tests
- fix state waiting -> waiting_capacity
- ci: add logging of operator output on test failure
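The operator's "first N" check described above can be sketched as follows. This is a simplified illustration, not the operator code: the function name is hypothetical, crawl ids stand in for the related crawljob objects, and treating a missing or zero limit as "no limit" is an assumption.

```python
from typing import List, Optional


def initial_state(
    crawl_id: str,
    related_crawl_ids: List[str],
    max_concurrent_crawls: Optional[int],
) -> str:
    """Decide whether a crawl job may proceed to 'starting'.

    `related_crawl_ids` is assumed to list the org's crawl jobs in
    creation order, as the 'related' list appears to be returned.
    """
    if not max_concurrent_crawls:
        # No limit configured (assumption: None or 0 disables the check).
        return "starting"
    first_n = related_crawl_ids[:max_concurrent_crawls]
    return "starting" if crawl_id in first_n else "waiting_org_limit"


jobs = ["crawl-a", "crawl-b", "crawl-c"]
print(initial_state("crawl-b", jobs, 2), initial_state("crawl-c", jobs, 2))
```

Because the list is in creation order, a waiting job moves into the first N on its own once earlier jobs finish and drop out of the list.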
* Precompute config crawl stats
Includes a database migration to move previously dynamically computed
crawl stats for workflows into the CrawlConfig model.
* Add lastRun sorting option and enable it by default
* Add modified as final sort key to order non-run workflows
* Remove currCrawl* fields and update frontend accordingly
* Add isCrawlRunning field to backend and use in frontend
- Updates superadmin "Running Crawls" to show active crawls (starting, waiting, running, stopping) and sort by start by default
- Navigates to crawl workflow watch view on clicking crawl item
- Adds "Copy Crawl ID" to crawl actions for easy paste into "Jump to crawl"
- Navigates to crawl workflow watch when jumping to crawl
* frontend crawl stopping improvements (#836)
- support new backend 'stopping' property
- for now, keep 'stopping' indicator state when crawl is running but stopping set to true
* operator: add waiting state
- add pods as related objects
- inspect pod status, set crawl status to 'waiting' if no pods are running
frontend:
- frontend support for 'waiting' state
- show waiting icon from mocks
---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
- Adds an input to the Workflow creation and edit form for specifying crawl depth. This input is conditionally shown for seeded crawls when the scope is set to "Pages on this domain", "Pages on this domain & subdomains" or "Custom page prefix". The "any" scope is also supported for backwards compatibility but is not shown by default or in new configs.
- API implementation: The depth value is set in the primary seed config, i.e. the first seed in seeds: [], not in the outer .config.depth property.
Resolves #817
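To illustrate where the depth value lives, here is a small sketch that sets it on the first seed rather than on the outer config. The helper name and the seed keys other than `depth` (`url`, `scopeType`) are assumptions made for the example.

```python
import copy
from typing import Any, Dict


def set_primary_seed_depth(config: Dict[str, Any], depth: int) -> Dict[str, Any]:
    """Return a copy of `config` with `depth` set on the primary (first) seed."""
    seeds = config.get("seeds") or []
    if not seeds:
        raise ValueError("config has no seeds to attach a depth to")
    updated = copy.deepcopy(config)
    # The depth belongs to the primary seed, not to the outer config object.
    updated["seeds"][0]["depth"] = depth
    return updated


config = {
    "seeds": [
        {"url": "https://example.com/", "scopeType": "prefix"},
        {"url": "https://example.org/"},
    ]
}
updated = set_primary_seed_depth(config, 2)
print(updated["seeds"][0])
```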
- Adds relevant action buttons to each Workflow detail tab header
- Adds "Delete" action menu item to crawls in Crawls tab
- Prevent automatically switching to "Watch" tab after running crawl from detail page
- Removes "Stop" confirmation prompt and only shows "Cancel" confirmation prompt if there are one or more pages crawled
- Replaces "Cancel" confirmation prompt with web component dialog (partially addresses #619, "Switch to in-page dialogue boxes")
- Fixes hash routing to fix going back with browser back button
- Adds some labels to missing icon buttons
- Fixes metadata `aria-label` usage → `label` so it actually gets added to the rendered `button`
- Changes the "More" label to a (hopefully) more descriptive "Actions" label for dropdown actions menus
* misc frontend build fixes:
- fix playwright version to be consistent to fix playwright test
- chunking: set max number of chunks generated
* lock playwright version
* remove intl polyfill
---------
Co-authored-by: sua yoo <sua@suayoo.com>
* Re-implement pagination and paginate crawlconfig revs
First step toward simplifying pagination to set us up for sorting
and filtering of list endpoints. This commit removes fastapi-pagination
as a dependency.
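For readers unfamiliar with hand-rolled pagination, a minimal sketch of the skip/limit arithmetic follows. The parameter names and response keys here are hypothetical and do not describe the actual API; a database-backed version would pass the same skip and limit values to the query instead of slicing a list.

```python
from typing import Any, Dict, List


def paginate(items: List[Any], page: int = 1, page_size: int = 25) -> Dict[str, Any]:
    """Return one page of `items` plus the total count (illustrative keys)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    skip = (page - 1) * page_size
    return {
        "items": items[skip:skip + page_size],
        "total": len(items),
        "page": page,
        "pageSize": page_size,
    }


result = paginate(list(range(95)), page=4, page_size=30)
print(result["items"], result["total"])
```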
* Migrate all HttpUrl seeds to Seeds
This commit also updates the frontend to always use Seeds and to
fix display issues resulting from the change.
* Filter and sort crawls and workflows
Crawls:
- Filter by createdBy (via userid param)
- Filter by state (comma-separated string for multiple values)
- Filter by first_seed, name, description
- Sort by started, finished, fileSize, firstSeed
- Sort descending by default to match frontend
Workflows:
- Filter by createdBy (formerly userid) and modifiedBy
- Filter by first_seed, name, description
- Sort by created, modified, firstSeed, lastCrawlTime
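The comma-separated state filter and descending-by-default sort for crawls could look roughly like this. It is a sketch under assumptions: the function names are hypothetical, the default sort field is a guess, and the `(field, direction)` tuple follows the convention MongoDB drivers use for sort specs.

```python
from typing import List, Optional, Tuple

# Sort fields listed for crawls in the notes above.
CRAWL_SORT_FIELDS = ("started", "finished", "fileSize", "firstSeed")


def parse_state_filter(state: Optional[str]) -> List[str]:
    """Split a comma-separated `state` query value into a list of states."""
    if not state:
        return []
    return [part.strip() for part in state.split(",") if part.strip()]


def build_crawl_sort(sort_by: str = "started", sort_direction: int = -1) -> List[Tuple[str, int]]:
    """Validate sort options; -1 is descending (the default), 1 is ascending."""
    if sort_by not in CRAWL_SORT_FIELDS:
        raise ValueError(f"invalid sort field: {sort_by}")
    if sort_direction not in (1, -1):
        raise ValueError("sort direction must be 1 or -1")
    return [(sort_by, sort_direction)]


print(parse_state_filter("running, stopping"))
print(build_crawl_sort("fileSize"))
```

Rejecting unknown sort fields keeps arbitrary field names from being passed through to the database.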
* Add crawlconfigs search-values API endpoint and test
* backend: make crawlconfigs mutable! (#656)
- crawlconfig PATCH /{id} can now receive a new JSON config to replace the old one (in addition to scale, schedule, tags)
- exclusions: add / remove APIs mutate the current crawlconfig, do not result in a new crawlconfig created
- exclusions: ensure crawl job 'config' is updated when exclusions are added/removed, unify add/remove exclusions on crawl
- k8s: crawlconfig json is updated along with scale
- k8s: stateful set is restarted by updating annotation, instead of changing template
- crawl object: now has 'config', as well as 'profileid', 'schedule', 'crawlTimeout', 'jobType' properties to ensure anything that is changeable is stored on the crawl
- crawlconfigcore: store shared properties between crawl and crawlconfig in new crawlconfigcore (includes 'schedule', 'jobType', 'config', 'profileid', 'crawlTimeout', 'tags', 'oid')
- crawlconfig object: remove 'oldId', 'newId', disallow deactivating/deleting while crawl is running
- rename 'userid' -> 'createdBy'
- remove unused 'completions' field
- add missing return to fix /run response
- crawlout: ensure 'profileName' is resolved on CrawlOut from profileid
- crawlout: return 'name' instead of 'configName' for consistent response
- update: 'modified', 'modifiedBy' fields to set modification date and user modifying config
- update: ensure PROFILE_FILENAME is updated in configmap if profileid provided, clear if profileid==""
- update: return 'settings_changed' and 'metadata_changed' if either crawl settings or metadata changed
- tests: update tests to check settings_changed/metadata_changed return values
add revision tracking to crawlconfig:
- store each revision in a separate mongo db collection
- revisions accessible via /crawlconfigs/{cid}/revs
- store 'rev' int in crawlconfig and in crawljob
- only add revision history if crawl config changed
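The interplay between the `settings_changed` / `metadata_changed` flags and revision tracking can be sketched in memory as below. This is not the backend implementation: which fields count as settings versus metadata is an assumption, the plain list stands in for the separate revisions collection, and the function name is hypothetical.

```python
import copy
from typing import Any, Dict, List, Tuple

# Assumed split of fields; the real grouping may differ.
SETTINGS_FIELDS: Tuple[str, ...] = ("config", "schedule", "scale", "profileid", "crawlTimeout")
METADATA_FIELDS: Tuple[str, ...] = ("name", "description", "tags")


def _changed(current: Dict[str, Any], update: Dict[str, Any], fields: Tuple[str, ...]) -> bool:
    return any(f in update and update[f] != current.get(f) for f in fields)


def apply_update(
    crawlconfig: Dict[str, Any],
    update: Dict[str, Any],
    revisions: List[Dict[str, Any]],
) -> Dict[str, bool]:
    """Apply `update` in place; record a revision only if settings changed."""
    settings_changed = _changed(crawlconfig, update, SETTINGS_FIELDS)
    metadata_changed = _changed(crawlconfig, update, METADATA_FIELDS)
    if settings_changed:
        # Snapshot the outgoing settings under the current rev, then bump rev.
        previous = {f: copy.deepcopy(crawlconfig.get(f)) for f in SETTINGS_FIELDS}
        previous["rev"] = crawlconfig.get("rev", 0)
        revisions.append(previous)
        crawlconfig["rev"] = previous["rev"] + 1
    for field in SETTINGS_FIELDS + METADATA_FIELDS:
        if field in update:
            crawlconfig[field] = update[field]
    return {"settings_changed": settings_changed, "metadata_changed": metadata_changed}


workflow = {"name": "Example", "config": {"seeds": []}, "rev": 0}
history: List[Dict[str, Any]] = []
print(apply_update(workflow, {"name": "Renamed"}, history), len(history))
```

A metadata-only edit such as a rename leaves `rev` and the history untouched, which matches "only add revision history if crawl config changed".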
migration:
- update to db v3
- copy fields from crawlconfig -> crawl
- rename userid -> createdBy
- copy userid -> modifiedBy, created -> modified
- skip invalid crawls (missing config), make createdBy optional (just in case)
frontend: Update crawl config keys with new API (#681), update frontend to use new PATCH endpoint, load config from crawl object in details view
---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
Co-authored-by: sua yoo <sua@webrecorder.org>
Co-authored-by: sua yoo <sua@suayoo.com>
* Paginate API list endpoints
fastapi-pagination is pinned to 0.9.3, the latest release that plays
nicely with pinned versions of fastapi and fastapi-users.
* Increase page size via overridden Params and Page classes
* update api resource list keys
---------
Co-authored-by: sua yoo <sua@suayoo.com>
Ideally this would be `break-word` for all the non-URL fields in the form, but I don't think it will affect anything other than URLs in practice.
* Rename archives to orgs and aid to oid on backend
* Rename archive to org and aid to oid in frontend
* Remove translation artifact
* Rename team -> organization
* Add database migrations and run once on startup
* This commit also applies the new by_one_worker decorator to other
asyncio tasks to prevent heavy tasks from being run in each worker.
* Run black, pylint, and husky via pre-commit
* Set db version and use in migrations
* Update and prepare database in single task
* Migrate k8s configmaps
* profile browser vnc support + fixes:
- switch profile browser rendering to use VNC
- frontend: add @novnc/novnc as dependency, create separate bundle novnc.js to load into vnc browser (to avoid loading from each container)
- frontend: update proxy paths to proxy websocket, index page to crawler
- frontend: allow browser profiles in all browsers, remove browser compatibility check
- frontend: update webpack dev config, apply prettier
- frontend: node version fix
- backend: get vncpassword, build new URL for proxying to crawler iframe
- backend: fix profile / crawl job pull policy from 'Always' -> 'Never', should use existing image for job
- backend: fix kill signal to use bash -c to work with latest backend image
- backend/chart: add 'profile_browser_timeout_seconds' to chart values to control how long the profile browser remains when idle (defaults to 60)
- backend: remove utils.py, now using secrets.token_hex() for random suffix
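The random suffix mentioned in the last item can be produced with the `token_hex` function from Python's standard `secrets` module. In this sketch the helper name and the `profile-browser-` prefix are made up for illustration.

```python
import secrets


def random_suffix(nbytes: int = 4) -> str:
    """Return a random hex string; token_hex yields two hex digits per byte."""
    return secrets.token_hex(nbytes)


# Hypothetical resource name built from the suffix.
job_name = f"profile-browser-{random_suffix()}"
print(job_name)
```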
Co-authored-by: sua yoo <sua@suayoo.com>
- including pagination of queue results (30 results per page currently)
- show numbering on paginated results
- allow user navigation to each result page
Smoother elapsed crawl timer:
- Crawls list: show seconds increment up to 2 minutes, then show minutes only
- Crawls detail: show seconds increment up to one day
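The two timer behaviors differ only in the cutoff after which seconds are hidden, so they can share one formatter. The sketch below is in Python for illustration, although the UI itself is not written in Python. The function name and the `1h 2m 3s` output format are assumptions, and what the detail view shows after one day is not stated above, so this sketch simply drops the seconds.

```python
def format_elapsed(seconds: int, show_seconds_until: int = 120) -> str:
    """Format an elapsed duration, hiding seconds once the cutoff is reached.

    show_seconds_until=120 mirrors the list view (2 minutes);
    show_seconds_until=86400 mirrors the detail view (one day).
    """
    if seconds < 0:
        raise ValueError("elapsed time cannot be negative")
    minutes, secs = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    parts = []
    if hours:
        parts.append(f"{hours}h")
    if hours or minutes:
        parts.append(f"{minutes}m")
    if seconds < show_seconds_until:
        parts.append(f"{secs}s")
    return " ".join(parts) or "0m"


print(format_elapsed(119))                            # list view, under 2 minutes
print(format_elapsed(120))                            # list view, minutes only
print(format_elapsed(3700, show_seconds_until=86400)) # detail view
```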
* animate starting state
* consistent fixed-size slots for each browser (url + screencast)
* add tooltip for expected number of browsers (workers x scale)
* ensure editing other config options does not lose profile
* support adding/editing/removing profile of existing config
* when duplicating config, ensure profile setting is also copied in the duplicate
* backend api
- superadmin has admin access to all archives
- new superadmin endpoints: /archives/all/crawls and /archives/all/crawls/<crawl_id>.json for listing all running crawls and loading crawl data by id
- frontend superadmin view (fixes #201)
* show all archives on superadmin home page
* show jump to crawl for super admin (#200)
* navbar links for: all archives, all running crawls and jump to crawl
Co-authored-by: sua yoo <sua@suayoo.com>
- Cancel and stop crawl
- Sorts crawls by start time, status and crawl template ID
- Filters crawls by crawl template ID
- Adds shortcut to copy template ID
- Leverage webpack chunk splitting to create more, smaller JS files rather than one large main file (import(file) syntax)
- Enable long-term caching by adding content hash to output file names
- Copy entire /dist folder contents in Dockerfile
- Changed yarn start-dev -> yarn start since there is no prod server
- Reenable locale picker
- Show toast alert when user is verified
- Redirect to correct page on verified
- Update already-logged in user info on verify
- Adds new toast component
closes #39