545 Commits

Author SHA1 Message Date
Vinzenz Sinapius
9b125bc2c6
Passthrough X-Forwarded-Proto header in frontend nginx (#1226)
If X-Forwarded-Proto header is already set, pass that through instead of setting to current scheme.
2023-09-28 10:58:57 -07:00
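The passthrough rule described in this commit can be modeled as a small function. This is a sketch of the decision only, not the nginx configuration from the commit; the function name and the dict-based header representation are assumptions made for illustration.

```python
def effective_forwarded_proto(headers: dict[str, str], scheme: str) -> str:
    """Return the value to send upstream as X-Forwarded-Proto.

    Hypothetical helper: models the rule only, not the real nginx config.
    """
    # Header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    existing = normalized.get("x-forwarded-proto", "").strip()
    # Pass through a value set by an outer proxy; otherwise use our own scheme.
    return existing or scheme
```

With an outer TLS-terminating proxy that already sets the header to `https`, the sketch keeps `https` even though the inner connection uses `http`.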
sua yoo
e5cc70754e
Show org storage quotas in dashboard (#1210)
- Displays storage quota in subdivided meter
- Updates icon colors
- Adds new <btrix-meter> component

---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2023-09-27 10:38:59 -07:00
Tessa Walsh
304ea6d52f
Always display Download Logs button in Error Logs tab (#1209) 2023-09-22 12:08:03 -04:00
sua yoo
730a160f75
New org home page dashboard (#1201) 2023-09-21 19:20:08 -07:00
sua yoo
d05a27e8a4
Separate "run now" switch from scheduling options (#1175) 2023-09-21 19:18:57 -07:00
sua yoo
f4d9c0e3d5
build: fix webpack dev server recompiling without changes
See https://stackoverflow.com/questions/70990356/ionic-serve-keeps-recompiling-without-changes#comment133341886_70990356
2023-09-19 12:15:58 -07:00
Tessa Walsh
9224f52f51
Remove config from list endpoints to speed up responses (#1193)
* Remove config from list endpoints

- Remove config field from workflow and crawl list endpoints
- Add seedCount to CrawlConfigOut on backend and Workflow on frontend
- Refactor CrawlConfig and CrawlConfigOut to extend CrawlConfigCore + CrawlConfigAdditional
- Refactor workflow list in frontend to use firstSeed and seedCount
- Frontend uses ListWorkflow type which is Omit<Workflow, "config">
2023-09-19 11:05:48 -05:00
sua yoo
58ff64dfbb
build: disable webpack polling for hot reload
potential fix for dev server recompiling--currently not using hot reload anyway
2023-09-18 15:14:34 -07:00
Ilya Kreymer
65b7c10ba1 bump version to 1.7.0-beta.1 2023-09-18 14:33:03 -07:00
sua yoo
6ddba105f4
Enable saving individual collection form sections (#1166)
- Moves metadata tab to first position
- Adds save button to each section, stays in edit view on saving
- Validates name exists before moving to next section or saving
- Changes save button text to "Create Collection without Items" if crawl/uploads aren't selected in new collection
- Fix server error not showing in UI
2023-09-14 15:21:01 -07:00
sua yoo
6234346d84
Fix crawl scope help text (#1169)
* update text

* remove trailing slash removal

* make scope help text responsive as user types

---------

Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2023-09-13 11:46:58 -07:00
Ilya Kreymer
9159c7c914
ensure max crawl size and max crawl timeout values are set to 0 when unused, instead of null (#1167)
- convert None->0 when creating CrawlJob
- ensure frontend sends 0 not null
- make input model require 'int = 0' instead of 'Optional[int] = 0'
2023-09-13 09:51:26 -07:00
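A minimal sketch of the None-to-0 normalization described above, using a plain dataclass rather than the project's actual models. The class, field, and function names are assumptions; only the rule (unused limits are 0, never null) comes from the commit message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrawlLimits:
    # Plain ints defaulting to 0 (not Optional[int]), so "unused" is always 0.
    max_crawl_size: int = 0
    crawl_timeout: int = 0


def limits_from_request(
    max_crawl_size: Optional[int], crawl_timeout: Optional[int]
) -> CrawlLimits:
    """Convert possibly-missing request values into the 0-means-unused form."""
    return CrawlLimits(
        max_crawl_size=max_crawl_size or 0,
        crawl_timeout=crawl_timeout or 0,
    )
```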
Ilya Kreymer
c9c39d47b7
Scheduled Crawl Refactor: Handle via Operator + Add Skipped Crawls on Quota Reached (#1162)
* use metacontroller's decoratorcontroller to create CrawlJob from Job
* scheduled job work:
- use existing job name for scheduled crawljob
- use suspended job, set startTime, completionTime and succeeded status on job when crawljob is done
- simplify cronjob template: remove job_image, cron_namespace, using same namespace as crawls,
placeholder job image for cronjobs

* move storage quota check to crawljob handler:
- add 'skipped_quota_reached' as new failed status type
- check for storage quota before checking if crawljob can be started, fail if not (check before any pods/pvcs created)

* frontend:
- show all crawls in crawl workflow, no need to filter by status
- add 'skipped_quota_reached' status, show as 'Skipped (Quota Reached)', render same as failed

* migration: make release namespace available as DEFAULT_NAMESPACE, delete old cronjobs in DEFAULT_NAMESPACE and recreate in crawlers namespace with new template
2023-09-12 13:05:43 -07:00
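The ordering of checks in the crawljob handler (storage quota first, then capacity) might look roughly like the sketch below. The state strings 'skipped_quota_reached', 'starting' and 'waiting_capacity' and the label 'Skipped (Quota Reached)' appear in this log; the function names and boolean inputs are hypothetical.

```python
def initial_crawl_state(storage_quota_reached: bool, has_capacity: bool) -> str:
    """Pick the first state for a new crawl job (illustrative only)."""
    # The quota check runs first, before any pods or PVCs would be created.
    if storage_quota_reached:
        return "skipped_quota_reached"
    return "starting" if has_capacity else "waiting_capacity"


def state_label(state: str) -> str:
    """Frontend-style label; unknown states fall back to the raw value."""
    labels = {"skipped_quota_reached": "Skipped (Quota Reached)"}
    return labels.get(state, state)
```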
Tessa Walsh
9377a6f456
Issue all non-upload storage-quota-update events from LiteElement (#1151)
- More specific toast notification error messages to the action being attempted
- Single dismissable global banner shown when org storage is reached
- Removed check for storage quota reached in `runNow`, since buttons are disabled in UI, and errors handled if request fails.
- Allow creating new workflow when storage quota reached
- More responsive storage quota updates: add storageQuotaReached to archived item replay.json, updates w/o reload when crawl pushes quota over limit
- Modify LiteElement to check for storageQuotaReached on GET requests

---------
Co-authored-by: sua yoo <sua@suayoo.com>
2023-09-11 18:17:48 -07:00
Ilya Kreymer
ad9bca2e92
Operator refactor to control pods + pvcs directly instead of statefulsets (#1149)
- Ability for a pod to be Completed, unlike in a StatefulSet: e.g. with a StatefulSet, if 3 pods are running and the first one finishes, all 3 must keep running until all 3 are done. With this setup, the first finished pod can remain in Completed state.
- Fixed shutdown order - crawler pods now correctly shutdown first before redis pods, by switching to background deletion.
- Pod priority decreases with scale: 1st instance of a new crawl can preempt 3rd or 2nd instance of another crawl
- Create priority classes up to 'max_crawl_scale', configured in values.yaml
- Improved scale change reconciliation: if increasing scale, immediately scale up. If decreasing scale,
gracefully stop the scaled-down instances via the redis 'stopone' key, wait until they exit with Completed state
before adjusting status.scale / removing scaled-down pods. Ensures unaccepted interrupts don't cause scaled-down data to be deleted.
- Redis pod remains inactive until crawler is first active, or after no crawl pods are active for 60 seconds
- Configurable Redis storage with 'redis_storage' value, set to 3Gi by default
- CrawlJob deletion starts as soon as post-finish crawl operations are run
- Post-crawl operations get their own redis instance, since one during response is being cleaned up in finalizer
- Finalizer ignores request with incorrect state (returns 400 if reported as not finished while crawl is finished)
- Current resource usage added to status
- Profile browser: also manage single pod directly without statefulset for consistency.
- Restart pods via restartTime value: if spec.restartTime != status.restartTime, clear out pods and update status.restartTime (using OnDelete policy to avoid recreate loops in edge cases).
- Update to latest metacontroller (v4.11.0)
- Add --restartOnError flag for crawler (for browsertrix-crawler 0.11.0)
- Failed crawl logging: add 'fail_crawl()' to be used for failing a crawl, which prints logs for default container (if enabled) as well as pod status
- tests: check other finished states to avoid getting stuck in an infinite loop if crawl fails
- tests: disable disk utilization check, which adds unpredictability to crawl testing!
fixes #1147 

---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2023-09-11 10:38:04 -07:00
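The "pod priority decreases with scale" rule can be illustrated with one priority value per instance index, up to max_crawl_scale. This is a sketch under assumptions: the class-name pattern and numeric values are hypothetical and are not taken from the chart or operator.

```python
def priority_classes(max_crawl_scale: int, top_priority: int = 0) -> dict[str, int]:
    """Map one hypothetical class name per instance index to a priority value."""
    # Instance 0 gets the highest priority; each further instance gets one less.
    return {
        f"crawl-instance-{index}": top_priority - index
        for index in range(max_crawl_scale)
    }


def can_preempt(new_instance: int, running_instance: int, classes: dict[str, int]) -> bool:
    """A pod may preempt another only if its priority is strictly higher."""
    new_priority = classes[f"crawl-instance-{new_instance}"]
    running_priority = classes[f"crawl-instance-{running_instance}"]
    return new_priority > running_priority
```

Under this scheme the first instance of a new crawl outranks the second and third instances of any other crawl, which matches the behavior described in the commit message.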
Tessa Walsh
d2ededc895
Add and enforce org storage quota (#1106)
* Implement in backend

- Track bytesStored in org
- Add migration to pre-calculate based on size of crawlfiles and profilefiles
- Add methods to increase or decrease org storage when crawl or profile files
are added or deleted
- Include storageQuotaReached boolean in API responses that alter storage
- Don't start new crawls and fail uploads if storage quota reached

* Implement in frontend

- Add to orgs-list quotas
- Update org's storageQuotaReached based on backend endpoint responses
- Disable buttons when storage quota is met
- Show toast notification when attempting to run a crawl when org
storage quota is met
2023-09-07 12:45:43 -04:00
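A rough sketch of the storage accounting described above. Only 'bytesStored' and 'storageQuotaReached' are names from the commit message; the Org class, the storage_quota field, the helper names, the "0 means no quota" convention, and treating "reached" as greater-than-or-equal are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Org:
    bytes_stored: int = 0
    storage_quota: int = 0  # assumption: 0 means no quota is set


def storage_quota_reached(org: Org) -> bool:
    """True once stored bytes meet or exceed a non-zero quota."""
    return org.storage_quota > 0 and org.bytes_stored >= org.storage_quota


def change_org_bytes_stored(org: Org, delta: int) -> dict:
    """Apply a size change (positive on add, negative on delete) and report status."""
    org.bytes_stored = max(0, org.bytes_stored + delta)
    # Responses that alter storage carry the flag so the UI can update immediately.
    return {"storageQuotaReached": storage_quota_reached(org)}
```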
Henry Wilkinson
8850e35f7a
Changes "Crawls" → "Items" (#1145) 2023-09-05 23:58:12 -04:00
sua yoo
0cad649ab9
fix too many errors in chrome (#1130) 2023-09-05 21:36:40 -04:00
Tessa Walsh
93573d0bfe
Use base10 for sizes in frontend (#1133)
* Use base10 for sizes in frontend

* Simplify renderSize
2023-09-05 21:35:20 -04:00
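Base-10 size rendering divides by 1000 per unit step rather than 1024. The sketch below shows the idea in Python; the actual frontend renderSize is not reproduced here, and the unit list and one-decimal formatting are assumptions.

```python
def render_size(num_bytes: int) -> str:
    """Format a byte count using base-10 (SI) units."""
    units = ["B", "KB", "MB", "GB", "TB"]
    size = float(num_bytes)
    index = 0
    # Step up one unit for every factor of 1000, stopping at the largest unit.
    while size >= 1000 and index < len(units) - 1:
        size /= 1000
        index += 1
    if index == 0:
        return f"{num_bytes} B"
    return f"{size:.1f} {units[index]}"
```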
Ilya Kreymer
6dca2f1c03
supports overriding the replayweb.page version without having to be r… (#1122)
* supports overriding the replayweb.page version without having to rebuild the frontend image:
- ensures 'rwp_base_url' from helm chart is passed to nginx
- ensures both ui.js and sw.js are loaded based on nginx environment variable, not hard-coded
- ui.js loaded via redirect from new /replay/ui.js path
- pin RWP to known working release in default values.yaml
- remove RWP_BASE_URL from Dockerfile, no longer needed, set via chart env var
- set default RWP_BASE_URL for devserver to use CDN
- set RWP version to 1.8.11
2023-09-05 20:10:21 -04:00
sua yoo
ff6650d481
Manage collection from archived item details (#1085)
- Lists collections that an archived item belongs to in item detail view
- Improves performance of collection add component
---------

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2023-09-05 17:52:17 -04:00
Henry Wilkinson
1af796bd0e
fix: Terminology unification "crawls" & "archive data" → "items" (#1127)
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
2023-09-01 11:09:06 -04:00
Tessa Walsh
e667fe2e97
Add max crawl size option to backend and frontend (#1045)
Backend:
- add 'maxCrawlSize' to models and crawljob spec
- add 'MAX_CRAWL_SIZE' to configmap
- add maxCrawlSize to new crawlconfig + update APIs
- operator: gracefully stop crawl if current size (from stats) exceeds maxCrawlSize
- tests: add max crawl size tests

Frontend:
- Add Max Crawl Size text box to Limits tab
- Users enter max crawl size in GB, convert to bytes
- Add BYTES_PER_GB as constant for converting to bytes
- docs: Crawl Size Limit to user guide workflow setup section

Operator Refactor:
- use 'status.stopping' instead of 'crawl.stopping' to indicate crawl is being stopped, as changing later has no effect in operator
- add is_crawl_stopping() to return if crawl is being stopped, based on crawl.stopping or size or time limit being reached
- crawlerjob status: store byte size under 'size', human readable size under 'sizeHuman' for clarity
- size stat always exists so remove unneeded conditional (defaults to 0)
- store raw byte size in 'size', human readable size in 'sizeHuman'

Charts:
- subchart: update crawlerjob crd in btrix-crds to show status.stopping instead of spec.stopping
- subchart: show 'sizeHuman' property instead of 'size'
- bump subchart version to 0.1.1

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2023-08-26 22:00:37 -07:00
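The GB-to-bytes conversion and the combined "is the crawl stopping" check could be sketched as follows. BYTES_PER_GB and is_crawl_stopping() are names from the commit message, but the constant's value (10**9 here), the function signature, and the "0 means no limit" convention are assumptions for illustration.

```python
BYTES_PER_GB = 10**9  # assumption: the real constant's value is not given in this log


def gb_to_bytes(gigabytes: float) -> int:
    """Convert a user-entered size in GB to bytes."""
    return round(gigabytes * BYTES_PER_GB)


def is_crawl_stopping(
    stop_requested: bool,
    size: int,
    max_crawl_size: int,
    elapsed_seconds: int,
    timeout_seconds: int,
) -> bool:
    """True if a stop was requested or a size/time limit has been exceeded."""
    if stop_requested:
        return True
    # A limit of 0 is treated as "unused".
    if max_crawl_size and size > max_crawl_size:
        return True
    if timeout_seconds and elapsed_seconds > timeout_seconds:
        return True
    return False
```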
Ilya Kreymer
2da6c1c905
1.6.3 Fixes - Fix workflow sort order for Latest Crawl + 'Remove From Collection' action menu on archived items in collections (#1113)
* fix latest crawl (lastRun) sort:
- don't cast 'started' value to string when setting as starting crawl time (regression from #937)
- caused incorrect sorting as finished crawl time was a datetime, while starting crawl time was a string
- move updated config crawl info in one place, simplify to avoid returning started time altogether, just set directly
- pass mdb crawlconfigs and crawls collections directly to add_new_crawl() function
- fixes #1108

* Add dropdown menu containing 'Remove from Collection' to archived items in collection view (#1110)
- Enables users to remove an item from a collection from the collection detail view - menu was previously missing
- Fixes: #1102 (missing dropdown menu) by making use of the inactive menu trigger button.
- Updates collection items page size to match "Archived Items" page size (20 items per page)

---------
Co-authored-by: sua yoo <sua@webrecorder.org>
2023-08-25 21:08:47 -07:00
Anish Lakhwara
8b16124675
feat: implement 'collections' array with {name, id} for archived item details (#1098)
- rename 'collections' -> 'collectionIds', adding migration 0014
- only populate 'collections' array with {name, id} pair for get_crawl() / single archived item
path, but not for aggregate/list methods
- remove Crawl.get_crawl(), redundant with BaseCrawl.get_crawl() version
- ensure _files_to_resources returns an empty [] instead of none if empty (matching BaseCrawl.get_crawl() behavior to Crawl.get_crawl())
- tests: update tests to use collectionIds for id list, add 'collections' for {name, id} test
- frontend: change Crawl object to have collectionIds instead of collections

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2023-08-25 00:26:46 -07:00
Ilya Kreymer
989ed2a8da
Use Shared Services for Crawling, Redis, Profile Browsers (#1088)
* refactor to use shared role-based service shared across pods:
- 'crawler' service for all crawler screencasting, scales 0 .. N with crawler-<ID>-N.crawl
- 'redis' service for all redis access, redis-<ID>-0.redis
- 'browser' service for all browser access (profile browsers), browser-<ID>-0.browser
- don't create a new service per crawl/profile at all
- enable 'publishNotReadyAddresses' for potentially faster resolving, esp for redis
- remove service as type managed by operator as no longer creating services dynamically
- remove frontend var CRAWLER_SVC_SUFFIX, suffix always '.crawler' to match crawler service name
2023-08-24 20:08:53 -07:00
Ilya Kreymer
e7f2d93f80 bump version to 1.7.0-beta.0 2023-08-23 12:03:45 -07:00
Tessa Walsh
ce5b52f8af
Add and enforce org maxPagesPerCrawl quota (#1044) 2023-08-23 10:38:36 -04:00
sua yoo
54cf4f23e4
Paginate Workflows and refactor to use server-side queries (#1078)
- Paginates Crawl Workflows when there are more than 10 workflows
- Refactors workflow search and crawl search to use the same component
- Adds sort by first seed, workflow creation date, and workflow modified date
- Separates "last run" date from "modified" date
- Update column layout into Name & Schedule (or Manual Run), Latest Crawl (<finish time> in <duration>), total size, and last modified (modified by and modified time)
2023-08-22 16:29:17 -07:00
Ilya Kreymer
223571b18b
exclusion regex: show unmodified regex string, avoid dropping the '\' when displaying escaped regexes (#1094) 2023-08-22 10:16:23 -07:00
Ilya Kreymer
422452b5c1 bump to 1.6.2 2023-08-18 18:27:37 -07:00
sua yoo
6044486190
Add button to download error logs (#1080)
* add button to download logs

* render if logs are present

* add icon
2023-08-15 21:14:32 -07:00
sua yoo
270e134359
Show details in crawl error log (#1079)
Shows crawl error log details in a dialog. Since the detail object does not always follow a specific format, this iteration uses the detail key in uppercase as the label.
2023-08-15 21:14:08 -07:00
Ilya Kreymer
768d1181f8
frontend: fixes for queue / exclusions: (#1076)
- fix 'Edit Crawler Instances' not showing up when crawl running
- urlencode regex params to properly encode '+'
- catch server-side regex error, display 'Invalid Regex'
2023-08-15 13:15:43 -07:00
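Why '+' needs explicit encoding: in a query string an unencoded '+' is decoded as a space, which silently changes the regex. A small Python illustration using the standard library follows; the parameter name `regex` and the helper name are hypothetical, and the frontend fix itself is not shown here.

```python
from urllib.parse import parse_qs, quote


def regex_query(regex: str) -> str:
    """Build a query string with the regex fully percent-encoded."""
    # safe="" ensures characters such as '+', '/', and '?' are all encoded.
    return "regex=" + quote(regex, safe="")
```

Decoding `regex=a+b` yields `a b`, while decoding the encoded form `regex=a%2Bb` yields the intended `a+b`.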
sua yoo
4c74fadf91
Update frontend local dev guide (#1073)
- Clarifies use case for frontend development server
- Fixes incorrect sample API URLs
- Adds additional detail around requirements and quickstart
- Links back to docs from frontend README
---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
2023-08-15 12:03:39 -07:00
sua yoo
89983542f9
Update archived item URLs (#1064)
- Changes to URLs in "Crawling", "All Archived Items", and "Collections":
- Rename Artifacts -> Items
- Unifies the crawl detail view as loaded from All Archived Items and from Workflows
- Includes redirect for /artifacts/uploads -> /items/uploads to support archiveweb.page usage
2023-08-14 18:28:37 -07:00
sua yoo
ffd0e525d9
Webpack config improvements (#1063)
- Upgrades webpack and webpack-dev-server for bugfixes and performance updates
- Removes unnecessary file watching
- Enables persistent build cache in dev
- Switches to faster dev source map
2023-08-11 13:16:24 -07:00
Ilya Kreymer
d93ddaf620 bump version to 1.6.1 2023-08-11 12:50:41 -07:00
Ilya Kreymer
35ab6d6df6 bump to 1.6.0! 2023-08-09 15:40:27 -07:00
Ilya Kreymer
8ea3dd5dae
terminology tweaks in frontend: (part of #922) (#1062)
* terminology tweaks in frontend: (part of #922)
- use 'crawl workflow' instead of 'workflow' where possible
- use 'replay' instead of 'replay crawl'
- localization: rerun string extraction / processing
- "Review Config" → "Review Settings"
- "Workflow" → "Crawl Workflow" in error message

---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2023-08-09 15:38:58 -07:00
sua yoo
37733483d5
Standardize archived item filtering, sorting and labels (#1054)
Frontend:
- Renames list view to "All Archived Items"
- Refactors fetches to use single all-crawls endpoints
- Removes search by config ID for more search parity with uploads
- Adds sort by size
- Refactors property and method names to replace crawl*
- Replaces remaining references to "crawl" in copy with "item"
- Rename Upload Archive button to Upload WACZ
- Fix focusout in item menu so menus close

Backend:
- Filter search values by type as well
- Only get list of cids for crawls in search values
- Don't list crawl/workflow ids in search values

---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2023-08-09 12:13:55 -07:00
Ilya Kreymer
7a8f370bc2 bump version to 1.6.0-beta.4 for testing 2023-08-09 12:09:37 -07:00
Ilya Kreymer
38f67a6cc0
Optimize Frontend Image Build on CI (#1057)
* Always run yarn only on build platform with --platform=$BUILDPLATFORM
* Remove optional dependencies (playwright + chromium) from build with --ignore-optional and move some devDependencies to be optional
* Disable husky pre-commit hook checks on frontend

Co-authored-by: sua yoo <sua@suayoo.com>
2023-08-09 12:06:20 -07:00
sua yoo
b494070e43
Collection share dialog + copy updates (#1056)
- Always shows primary "Share" action button in Collection detail page.
- Enables toggling shareable status and share info from dialog. Difference from mockups: I made the "Done" button neutral to differentiate from our submit action buttons in the dialog, since toggling will apply changes immediately.
- Menu item: "Go to Public View"/"Go to Shareable View" -> "Visit Shareable URL". 
- Toggle label: "Make Collection Shareable" -> "Collection is Shareable".
- Additional dialog copy: adds "This collection can be viewed by anyone with the link." under "Link to Share" and "Share this collection by embedding it into an existing webpage." under "Embed Collection".
- Moves share status icon to its own column in list view.
- Adds new syntax-highlighted code component that supports js and html.

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2023-08-09 10:12:46 -07:00
Anish Lakhwara
9236a07800
fix: run yarn format in frontend dir (#1043) 2023-08-03 19:12:48 -07:00
Ilya Kreymer
362afa47bd
Support for Public / Shareable Collections (#1038)
* collections: support toggling collections public/private, viewable via RWP
- backend: add 'public' to collection model, support patching to update
- backend: add .../collections/<id>/public/replay.json for public access
- backend: add CORS handling for public endpoint
- frontend: support 'make shareable / make private' dropdown actions on collection detail + collection list views
- frontend: show shareable / private icons by collection name on detail + list views
- frontend: link to replayweb.page for standalone browsing
- frontend: add embed code popup when a collection is shareable
- refer to public collections as 'shareable' for now

---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2023-08-03 19:11:01 -07:00
sua yoo
62d3399223
Add info bar to Collection detail view (#1036)
- Adds Collection info bar to detail view
- Update "Web Captures" -> "Archived Items"
- Updates Collection list columns to match
- Refactors `btrix-desc-list` and usage in `workflow-details` to reuse horizontal info bar component
2023-08-03 16:58:56 -07:00
Anish Lakhwara
af09d56ef6
Merge pull request #1035 from webrecorder/backend-init
feat: Display waiting message while backend is initializing
2023-08-02 17:39:47 -07:00
Anish Lakhwara
fa58e77167 fix: remove strange character? 2023-08-02 17:34:09 -07:00
Anish Lakhwara
5ed2faaecc fix: need to use window.setTimeout to get a timerId back 2023-08-02 17:31:01 -07:00
Anish Lakhwara
6ecfd8ec24 fix: timerId not timeoutId 2023-08-02 17:28:07 -07:00
Anish Lakhwara
3985cf014e fix: clear timeout on disconnect callback 2023-08-02 17:26:26 -07:00
Anish Lakhwara
196b26c60e fix: center text 2023-08-02 17:21:36 -07:00
Anish Lakhwara
f1d91e3bf9 fix: add styling 2023-08-02 17:18:40 -07:00
Anish Lakhwara
a8bedeffb5 fix: take Sua's suggestions, less code needed 2023-08-02 17:10:45 -07:00
Anish Lakhwara
2f26fcefce fix: make pretty & work correctly 2023-08-02 16:36:28 -07:00
Anish Lakhwara
06918c967b feat: use html dialog instead 2023-08-02 11:37:55 -07:00
Anish Lakhwara
84a60b54e4 feat: Display waiting message while backend is initializing 2023-08-01 17:18:05 -07:00
Ilya Kreymer
45eaa0b3a3 version: bump to 1.6.0-beta.3 2023-08-01 09:48:17 -07:00
sua yoo
cc52dfd940
Sort Collections by size (#1026)
- Adds "Size" column to Collections list view
- Adds "Size" option to sort dropdown
2023-08-01 09:47:47 -07:00
sua yoo
54e2b2c703
List web captures in Collection (#1024)
- Adds tab for "Web Captures" in Collection detail view
- Move Collection description under Replay section
- Fixes app reloading when clicking into a Collection
- Standardizes Web Capture list headers from "Finished" -> "Created Date"
2023-08-01 09:14:27 -07:00
Ilya Kreymer
06cf9c7cc3
add crawl ending states: 'generate-wacz', 'uploading-wacz', 'pending-wait' that occur after a crawl is finished or is being stopped (#1022)
operator: ensure transitions from each of these states is supported, including to 'waiting_capacity'
add extra check on stopping to avoid transitioning back to a running state after crawl is finished
ui: add states to UI display, localization, add as active states
fixes #263
2023-08-01 00:15:59 -07:00
Anish Lakhwara
d8502da885
fix(build): use /usr/bin/env bash instead of /bin/bash (#1020)
* fix: add to various other shell scripts
2023-07-28 21:50:04 -07:00
sua yoo
7069b33646
Show only running crawls in superadmin view (#1015)
- Show separate crawls list for admin view, fixes #1010
2023-07-26 15:48:20 -07:00
Ilya Kreymer
6506965d98
Streaming Download for Collections (#1012)
* support streaming download of collections (part of #927)
- WACZ zip created on the fly using stream-zip
- add 'Download Collection' option to collection detail and list
- after editing collection, return to collection view
- tests: add test for streaming download, ensure WACZ files + datapackage present, STORE compression used

---------

Co-authored-by: sua yoo <sua@suayoo.com>
2023-07-26 15:42:17 -07:00
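The test in this commit checks that STORE (no compression) is used for the combined archive, which makes sense because WACZ files are already ZIP archives. The sketch below shows only that STORE aspect using Python's standard zipfile module and an in-memory buffer; it is not the stream-zip based streaming implementation, and the function name is hypothetical.

```python
import io
import zipfile


def build_stored_zip(files: dict[str, bytes]) -> bytes:
    """Pack files into a ZIP using STORE, i.e. without recompressing them."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return buffer.getvalue()
```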
Tessa Walsh
c21153255a
Rename notes to description in frontend and backend (#1011)
- Rename crawl notes to description
- Add migration renaming notes -> description
- Stop inheriting workflow description in crawl
- Update frontend to replace crawl/upload notes with description
- Remove setting of config description from crawl list
- Adjust tests for changes
2023-07-26 13:00:04 -07:00
sua yoo
75b011f951
Upload WACZ via UI (#992)
- Users can now upload .WACZ archives from the "Archived Data" page.
- Can specify name, description, tags and collection(s) to add upload to
- Show progress of upload
- Support canceling upload
2023-07-21 16:45:52 +02:00
sua yoo
85913112a2
Upgrade lit + shoelace to reduce build size (#938)
* upgrade lit

* upgrade shoelace

* upgrade testing libraries

* add webpack bundle analyzer

* revert shoelace changes

* remove bundle analyzer

* remove console log
2023-07-20 11:50:05 +02:00
Tessa Walsh
d5c3a8519f
Add crawler Use Sitemap option to Browsertrix Cloud (#978)
* Add user-guide docs for Use Sitemap option
---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2023-07-19 13:57:52 -04:00
sua yoo
c5b3be0680
Fix frontend formatting pre-commit (#991)
* update lint staged config

* remove prettier defaults
2023-07-18 17:51:13 +02:00
Ilya Kreymer
2372f43c2c
frontend: fix to collection editor with crawls and uploads (#971)
* frontend:
- follow up to #969, fixes crawl workflows by using crawl-specific endpoint and merging results

* get crawls and uploads concurrently

---------

Co-authored-by: sua yoo <sua@suayoo.com>
2023-07-10 19:29:19 +02:00
sua yoo
f3660839bf
Allow users to add uploads to collections (#968)
* show uploads in 'Select Uploads' section
2023-07-09 22:21:50 -07:00
Henry Wilkinson
d9e73fcbc3
Reorder Limits section (#966)
* Reorder Limits section

- Minor text change to section names
  - "Limit Per Page" → "Per-Page Limits"
  - "Limit Per Crawl" → "Per-Crawl Limits"

* Reorder limits section in documentation
2023-07-08 08:54:30 -07:00
Ilya Kreymer
8eeb66e11f
Frontend more upload path fixes (#961)
* additional fixes for #935:
- don't use artifactType for detail pages, ensure correct artifact selected based on path

* naming tweaks:
- from uploads detail, return to 'All Uploads' with filter
- from crawls detail, return to 'All Crawls' with filter
- rename general to 'All Archived Data'
2023-07-07 15:41:03 -07:00
Ilya Kreymer
d3a757e20b
partial fix for: #935: (#960)
- add route for /artifacts/upload/<id> to be used for uploads
- link uploads to /artifacts/upload/<id> instead of /artifacts/crawl/<id>
2023-07-07 14:23:26 -07:00
sua yoo
de4b18aa67
List crawls, uploads, and all objects in UI (#941)
- Adds top-level "Archived Data" view, replacing "Finished Crawls" and moving it as "Crawls" into view
- Adds list for viewing all artifacts/data
- Adds list for viewing all uploaded crawls
- Updates crawl detail view to show upload details
- Edit upload metadata, including 'name'
- Delete uploads
---------

Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2023-07-07 13:20:28 -07:00
Ilya Kreymer
00eb62214d
Uploads API: BaseCrawl refactor + Initial support for /uploads endpoint (#937)
* basecrawl refactor: make crawls db more generic, supporting different types of 'base crawls': crawls, uploads, manual archives
- move shared functionality to basecrawl.py
- create a base BaseCrawl object, which contains start / finish time, metadata and files array
- create BaseCrawlOps, base class for CrawlOps, which supports base crawl deletion, querying and collection add/remove

* uploads api: (part of #929)
- new UploadCrawl object which extends BaseCrawl, has name and description
- support multipart form data upload to /uploads/formdata
- support streaming upload of a single file via /uploads/stream, using botocore multipart upload to upload to s3-endpoint in parts
- require 'filename' param to set upload filename for streaming uploads (otherwise use form data names)
- sanitize filename, place uploads in /uploads/<uuid>/<sanitized-filename>-<random>.wacz
- uploads have internal id 'upload-<uuid>'
- create UploadedCrawl object with CrawlFiles pointing to the newly uploaded files, set state to 'complete'
- handle upload failures, abort multipart upload
- ensure uploads added within org bucket path
- return id / added when adding new UploadedCrawl
- support listing, deleting, and patch /uploads
- support upload details via /replay.json for replay
- add support for 'replaceId=<id>', which would remove all previous files in upload after new upload succeeds. if replaceId doesn't exist, create new upload. (only for stream endpoint so far).
- support patching upload metadata: notes, tags and name on uploads (UpdateUpload extends UpdateCrawl and adds 'name')

* base crawls api: Add /all-crawls list and delete endpoints for all crawl types (without resources)
- support all-crawls/<id>/replay.json with resources
- Use ListCrawlOut model for /all-crawls list endpoint
- Extend BaseCrawlOut from ListCrawlOut, add type
- use 'type: crawl' for crawls and 'type: upload' for uploads
- migration: ensure all previous crawl objects / missing type are set to 'type: crawl'
- indexes: add db indices on 'type' field and with 'type' field and oid, cid, finished, state

* tests: add test for multipart and streaming upload, listing uploads, deleting upload
- add sample WACZ for upload testing: 'example.wacz' and 'example-2.wacz'

* collections: support adding and remove both crawls and uploads via base crawl
- include collection_ids in /all-crawls list
- collections replay.json can include both crawls and uploads

bump version to 1.6.0-beta.2
---------

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2023-07-07 09:13:26 -07:00
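The upload path pattern /uploads/<uuid>/<sanitized-filename>-<random>.wacz could be produced along these lines. The sanitization rules, the length of the random suffix, and both function names are assumptions; only the overall path shape comes from the commit message.

```python
import re
import secrets
import uuid
from pathlib import PurePosixPath
from typing import Optional


def sanitize_filename(filename: str) -> str:
    """Reduce a client-supplied name to a safe stem (illustrative rules only)."""
    # Keep only the final path component so "../" segments are discarded.
    stem = PurePosixPath(filename.replace("\\", "/")).name
    if stem.lower().endswith(".wacz"):
        stem = stem[: -len(".wacz")]
    # Replace anything outside a conservative character set with '-'.
    stem = re.sub(r"[^A-Za-z0-9_-]+", "-", stem).strip("-")
    return stem or "upload"


def upload_path(filename: str, upload_id: Optional[str] = None) -> str:
    """Build /uploads/<uuid>/<sanitized-filename>-<random>.wacz."""
    upload_id = upload_id or str(uuid.uuid4())
    suffix = secrets.token_hex(4)  # random part keeps repeated uploads distinct
    return f"/uploads/{upload_id}/{sanitize_filename(filename)}-{suffix}.wacz"
```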
Tessa Walsh
29a6f0f6bc
Fix links in watch crawl after workflow crawl completes (#943) 2023-07-06 15:04:26 -07:00
Henry Wilkinson
8a240ad044
Fixes z-index (#939) 2023-07-04 23:05:09 -04:00
Ilya Kreymer
e37f220d6c version: bump to 1.6.0-beta.1 2023-06-16 18:53:32 -07:00
Tessa Walsh
c7051d5fbf
Backend API consistency pass (#921)
* Make API add and update method returns consistent

- Updates return {"updated": True}
- Adds return {"added": True}
- Both can additionally have other fields as needed, e.g. id or name

- remove Profile response model, as returning added / id only
- reformat

---------

Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2023-06-16 18:52:46 -07:00
Ilya Kreymer
d9ad8c11d2 frontend: fix RWP_BASE_URL not being set correctly for nginx image 2023-06-13 00:04:46 -07:00
Tessa Walsh
bd6dc79449
Add frontend support for auto-adding collections to workflows (#916)
- Adds collections search and list to workflow editor
- Adds collections to workflow details component
- Adds namePrefix filter to backend GET /orgs/{oid}/collections endpoint to support case-insensitive searching of collections
- Adds documentation for new setting

---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2023-06-12 18:18:05 -07:00
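Case-insensitive prefix matching, as used by the namePrefix filter, can be illustrated in memory as below. The real endpoint presumably filters in the database; this helper and its name are hypothetical.

```python
def filter_by_name_prefix(names: list[str], name_prefix: str) -> list[str]:
    """Return names starting with the prefix, ignoring case."""
    prefix = name_prefix.casefold()
    return [name for name in names if name.casefold().startswith(prefix)]
```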
Henry Wilkinson
71e9984e65
Adds documentation link and version copy button to footer (#920)
* Updates footer

- Adds documentation link
- Adds label to GitHub link, moves outside of the version code
- Adds copy button to version code for quick access when filing bug reports :)

* Comments out invisible div

* Improves responsiveness on mobile
2023-06-12 17:51:21 -07:00
Ilya Kreymer
ec3404c798
Fix Extra URLs in Scope (#913)
* scope fix: when using 'Custom Page Prefix' scope (fixes #873)
- don't include primary seed URL in include list
- don't always add trailing slash to extra in scope URLs
- set seed scope to 'prefix' (supported via webrecorder/browsertrix-crawler#318) instead of re-including seed URL
- add comments on using 'custom' to indicate 'Custom Prefix Scope' semantics on frontend, setting actual scope to 'prefix' on backend
- remove unneeded conditional for additional urls, main scopeType overridden per seed anyway
2023-06-12 17:29:41 -07:00
Henry Wilkinson
2364433932
Admin Panel Minor Frontend Style Updates (#915)
- Unifies trash icons on all pages to use trash3 (there were a few stragglers!)
- Brings styling of org quotas dialogue in-line with the rest of our dialogues
- Adds missing localization strings
- Swaps button with icon button to match table row action styling elsewhere
2023-06-10 19:21:34 -07:00
Ilya Kreymer
9707fb55e4
fix finished workflows incorrectly being displayed as running (#909) 2023-06-08 11:26:42 -07:00
Ilya Kreymer
4428184aea
frontend: configure running with a fixed 'replay.json', auth headers passed via separate config (#899)
wabac.js will reload the replay.json on 403 with new token (will be in next version of wabac.js)
presign urls: make presign timeout configurable (in minutes), defaults to 60 mins
dockerfile: fix configuring RWP_BASE_URL
2023-06-08 11:26:26 -07:00
Henry Wilkinson
a718043fa8
Adds icon name and tooltip content fields to btrix-copy-button (#879)
- Adds two new properties, name to pick the icon's name and content to pick a custom tooltip message. These are in-line with what Shoelace uses but are perhaps not the best descriptors...
- Swaps the existing anchor links on the Workflow Details' Settings tab for these and relocates them to after the heading. (Navigation to the links is broken right now... but the copying part works nicely!)
- Updates btrix-section-heading to better handle multiple elements with flexbox and an 8px gap between elements
2023-06-06 17:54:17 -07:00
sua yoo
66b3befef9
Frontend collections beta UI (#886)
- Support for creating new collections and editing existing collections
- Can select crawling workflows which adds entire workflow, and then deselect individual crawls
- Can edit existing collections and add more crawls
- Can view, create and delete collections via new Collections top-level nav entry
2023-06-06 17:52:01 -07:00
Ilya Kreymer
00fb8ac048
Concurrent Crawl Limit (#874)
concurrent crawl limits: (addresses #866)
- support limits on concurrent crawls that can be run within a single org
- change 'waiting' state to 'waiting_org_limit' for concurrent crawl limit and 'waiting_capacity' for capacity-based limits

orgs:
- add 'maxConcurrentCrawl' to new 'quotas' object on orgs
- add /quotas endpoint for updating quotas object

operator:
- add all crawljobs as related, appear to be returned in creation order
- operator: if concurrent crawl limit set, ensures current job is in the first N set of crawljobs (as provided via 'related' list of crawljob objects) before it can proceed to 'starting', otherwise set to 'waiting_org_limit'
- api: add org /quotas endpoint for configuring quotas
- remove 'new' state, always start with 'starting'
- crawljob: add 'oid' to crawljob spec and label for easier querying
- more stringent state transitions: add allowed_from to set_state()
- ensure state transitions only happen from allowed states, while failed/canceled can happen from any state
- ensure finished and state synched from db if transition not allowed
- add crawl indices by oid and cid

frontend: 
- show different waiting states on frontend: 'Waiting (Crawl Limit)' and 'Waiting (At Capacity)'
- add gear icon on orgs admin page
- and initial popup for setting org quotas, showing all properties from org 'quotas' object

tests:
- add concurrent crawl limit nightly tests
- fix state waiting -> waiting_capacity
- ci: add logging of operator output on test failure
2023-05-30 15:38:03 -07:00
sua yoo
ab518f51fb
Fix ResizeObserver loop error (#902) 2023-05-30 14:59:34 -07:00
sua yoo
4852532866
Show org creation form if there are no orgs (#883) 2023-05-24 13:10:12 -07:00
Henry Wilkinson
f788934ef5
Fix copy tags button disabling when no tags on Crawl Details page (#877) 2023-05-24 12:30:31 -04:00
Tessa Walsh
bd8b306fbd
Improve sorting workflows by lastUpdated (#826)
* Precompute config crawl stats

Includes a database migration to move previously dynamically computed
crawl stats for workflows into the CrawlConfig model.

* Add lastRun sorting option and enable it by default

* Add modified as final sort key to order non-run workflows

* Remove currCrawl* fields and update frontend accordingly

* Add isCrawlRunning field to backend and use in frontend
2023-05-22 18:42:30 -04:00
sua yoo
821fbc12d8
Upgrade Shoelace to stable version (v2) (#856) 2023-05-22 10:01:48 -07:00
Ilya Kreymer
826c2e8298 version: bump to 1.6.0-beta.0 2023-05-19 11:29:31 -07:00
Ilya Kreymer
d07204e59d version: bump to 1.5.1 2023-05-18 17:28:42 -07:00
sua yoo
b5781c8869
Fix workflow edit back button (#857) 2023-05-17 12:07:12 -07:00
Henry Wilkinson
da33231be9
Removes webkit <summary> element triangle (#852) 2023-05-16 18:13:59 -04:00
Ilya Kreymer
a1ef93a46a version: bump to 1.5.0 for release! 2023-05-16 17:36:58 +02:00
Ilya Kreymer
ebee5e1788 version: bump to 1.5.0-beta.4 2023-05-12 07:34:50 +02:00
sua yoo
f250293794
Fix workflow edit page not loading (#848)
* fix workflow not loading

* don't add hash if editing

* remove controller
2023-05-12 07:33:35 +02:00
sua yoo
98d82184e6
Fix superadmin running crawls views (#846)
- Updates superadmin "Running Crawls" to show active crawls (starting, waiting, running, stopping) and sort by start by default
- Navigates to crawl workflow watch view on clicking crawl item
- Adds "Copy Crawl ID" to crawl actions for easy paste into "Jump to crawl"
- Navigates to crawl workflow watch when jumping to crawl
2023-05-11 08:15:52 +02:00
Ilya Kreymer
d8b36c0ae2 version: bump to 1.5.0-beta.3 2023-05-11 03:05:46 +02:00
sua yoo
a6435ae3d0
Improve Workflow Detail tab and button UX (#840)
- Adds primary action button next to "Actions" dropdown
- Switches "Edit Workflow Settings" button to icon button
- Redirects user to "Watch Crawl" tab when starting crawl
  - Now uses crawl ID from `data.started` in API `/run` response for more responsive UI
- Keeps "Watch Crawl" tab navigation button in list but disables it when crawl is not running
  - Also handles watch view when workflow is not running to cover navigational edge cases
- Adds banner in "Crawls" list to direct users to the Watch Crawl when workflow is running
- Shows notification when crawl is done to make redirect to Crawls tab smoother
- Uses workflow scale when updating crawl scale
- Removes "All" from "View: All Finished Crawls" on Finished Crawl page for wording consistency
2023-05-11 02:57:38 +02:00
Ilya Kreymer
d1e5b0a021 version: bump to 1.5.0-beta.2 2023-05-10 14:55:35 +02:00
sua yoo
42794cad46
Add stop crawl confirmation dialog (#841)
* switch dialog control

* wait for workflow update to complete before showing dialog

* add stop dialog

* close scale after save

* update crawl text
2023-05-10 07:21:16 +02:00
Ilya Kreymer
82b21b6813
frontend crawl stopping improvements (#836) (#838)
* frontend crawl stopping improvements (#836)
- support new backend 'stopping' property
- for now, keep 'stopping' indicator state when crawl is running but stopping set to true
2023-05-08 23:52:49 -07:00
Ilya Kreymer
2cae065c46
Add Waiting state on the backend and frontend (#839)
* operator: add waiting state
- add pods as related objects
- inspect pod status, set crawl status to 'waiting' if no pods are running

frontend:
- frontend support for 'waiting' state
- show waiting icon from mocks

---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2023-05-08 17:05:01 -07:00
Ilya Kreymer
f992704491 version: bump version to 1.5.0-beta.1 2023-05-06 00:31:03 -07:00
sua yoo
9fcbc3f87e
Allow users to set max depth/hop out within scope (#816)
- Adds an input to the Workflow creation and edit form for specifying crawl depth. This input is conditionally shown for seeded crawls when the scope is set to "Pages on this domain", "Pages on this domain & subdomains" or "Custom page prefix". The "any" scope is also supported for backwards compatibility but is not shown by default or in new configs.
- API implementation: The depth value is set in the primary seed config, i.e. the first seed in seeds: [], not in the outer .config.depth property.
2023-05-05 14:26:48 -07:00
Henry Wilkinson
7409e0637e
Improves crawl detail files list truncation (#830) 2023-05-05 14:25:29 -07:00
sua yoo
0d23b45dac
Crawl workflow detail page improvements (#823)
Resolves #817
- Adds relevant action buttons to each Workflow detail tab header
- Adds "Delete" action menu item to crawls in Crawls tab
- Prevent automatically switching to "Watch" tab after running crawl from detail page
- Removes "Stop" confirmation prompt and only shows "Cancel" confirmation prompt if there are one or more pages crawled
- Replaces "Cancel" confirmation prompt with web component dialog (partially addresses Switch to in-page dialogue boxes #619)
- Fixes hash routing to fix going back with browser back button
2023-05-05 13:50:45 -07:00
sua yoo
85c96de883
Show critical errors in Crawl detail logs (#811) 2023-05-05 11:30:38 -07:00
Henry Wilkinson
7978cb4d85
Crawl detail page update (#808)
- Removes the info bar rendering and moves relevant information to the Overview section
- Adds total crawl size to the overview section
2023-05-03 15:50:15 -04:00
Henry Wilkinson
76c9185d69
Improve Recursive font declaration (#791) 2023-05-03 14:19:21 -04:00
sua yoo
60581411eb
Refactor screencast IDs (#800)
Fixes #713, mapping watch windows to exact column/row by id
2023-05-03 10:33:04 -07:00
sua yoo
9a1c2ba871
Fix workflow limit empty values being set to 0 (#795)
* default to null

* pass undefined for removing values

* handle 0 default
2023-05-03 09:25:22 -07:00
Henry Wilkinson
9500fd97fa
Merge pull request #802 from webrecorder/frontend-workflow-controls-update 2023-05-03 00:21:20 -04:00
Henry Wilkinson
a13964c4c4
Merge pull request #809 from webrecorder/frontend-icon-button-aria-label-fixes 2023-05-01 15:38:49 -04:00
Henry Wilkinson
ee92eb6646
Merge pull request #810 from webrecorder/frontend-minor-visual-updates 2023-05-01 15:38:37 -04:00
Henry Wilkinson
624e7083cf
Merge pull request #806 from webrecorder/frontend-update-copy-button 2023-05-01 15:38:22 -04:00
Henry Wilkinson
bddbe35315 Runs yarn format 2023-05-01 15:33:17 -04:00
Henry Wilkinson
088d6d306a Adds hoist to browser profile list actions dropdown
- Should fix bug where you can see the icon buttons through the dropdown
2023-05-01 03:27:59 -04:00
Henry Wilkinson
6e921cc065 Add margin to crawls list
Mirrors workflow list
2023-05-01 03:26:57 -04:00
Henry Wilkinson
23e398d327 Icon updates
- Changes `trash` for `trash3` which I believe wasn't originally available in the version of bootstrap-icons we were using but now it is and I like the tapered edges better :P
- Makes browser profiles action button small to fit with the rest of the dropdown components used elsewhere
- Changes previous file-earmark delete icon to trash icons used everywhere else for delete actions
2023-05-01 03:26:34 -04:00
Henry Wilkinson
e04a6a7825 Improves icon button aria labels
- Adds some labels to missing icon buttons
- Fixes metadata `aria-label` usage → `label` so it actually gets added to the rendered `button`
- Changes the "More" label to a (hopefully) more descriptive "Actions" label for dropdown actions menus
2023-05-01 02:57:32 -04:00
Henry Wilkinson
1d7518af07 Ensure that button returns to its default state
uses the .blur() method to set the icon button back to its unfocused state after the set time
2023-04-29 17:17:49 -04:00
Henry Wilkinson
228e2187e3 Copy button text → icon
- Converts to icon button
- Adds accessibility label field
2023-04-28 14:53:11 -04:00
Henry Wilkinson
45826f8d70 Show only mine unification
Same styling as the finished crawls page
2023-04-28 13:38:23 -04:00
Henry Wilkinson
577942805b Moves dropdown beside search bar
- Improves responsiveness for top two items
2023-04-27 02:03:09 -04:00
Henry Wilkinson
81aeba6e92 Changes logout icon
- It's a door now instead of the box arrow
2023-04-27 00:46:42 -04:00
Henry Wilkinson
c359589024 Adds additional styling to the file picker
- Half aligned with current mockup!
2023-04-27 00:12:06 -04:00
Henry Wilkinson
c03bb1923b Removes extra padding around replay window
- Adds a check before the block of HTML that adds 16px of padding and tells it not to add that if it's the replay page.
2023-04-26 23:50:02 -04:00
sua yoo
e6e46b522a
hotfix: prevent polling during workflow edit 2023-04-26 13:41:41 -07:00
sua yoo
937ad4fe08
fix: navigate to watch on new crawl work
follows #720
2023-04-25 14:30:41 -07:00
sua yoo
7888c4fde3
Frontend crawl workflows rework (#775) 2023-04-25 14:16:07 -07:00
sua yoo
1458e2cdd9
hotfix: delete crawl workflow without crawls 2023-04-24 15:18:20 -07:00
Ilya Kreymer
85b6a05419
Upgrade to mongo 6 and use sortArray for workflow crawls (#764) (#765)
fixes from 1.4.1:
* Upgrade to mongo 6 and use sortArray for workflow crawls

* update readiness probe with timeouts doubled, and failure threshold increased for slower 'mongosh' readiness check

update versions to 1.5.0-beta.0 in backend and frontend

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2023-04-11 18:22:07 -07:00
Ilya Kreymer
631c84e488 version: bump to 1.4.0! 2023-04-06 10:12:43 -07:00
Henry Wilkinson
ba3daf326d
Adds inputmode attributes to workflow config fields (#755)
- Now the appropriate virtual keyboards are shown! :)
- Also adjusts type weight for workflow config headers to match mockups
2023-04-06 09:16:48 -07:00
Henry Wilkinson
c6aec84af4
Changes the autoscroll setting to true by default (#756)
As per my note on #745, currently all our other check boxes turn features on when enabled.  For consistency I have reversed the states of the autoscroll checkbox so the page autoscrolls when it is checked and does not run the behavior when it is unchecked.  Checked is also now the default state.

- Updates help text accordingly
- Renames `disableAutoscrollBehavior` → `autoscrollBehavior`
2023-04-06 09:06:55 -07:00
Ilya Kreymer
3ab62547a9 version: bump to 1.4.0-beta.2 2023-04-06 02:45:20 -07:00
sua yoo
80bc4a3eb9
Fix additional URLs (#752) 2023-04-05 20:11:09 -07:00
sua yoo
91c2c1ad62
Allow users to set additional page time limits (#744) 2023-04-05 20:06:46 -07:00
sua yoo
72967a0381
Frontend Docker build improvements (#749) 2023-04-05 20:05:45 -07:00
sua yoo
c60dc5d086
Crawls list backend pagination (#735) 2023-04-05 10:55:42 -07:00
Ilya Kreymer
88497d2a64
text: rename workflowuration -> workflow (#741) 2023-04-04 08:48:06 -07:00
sua yoo
370b8cbd4d
Set max pages to API default (#739) 2023-04-04 08:47:37 -07:00