Commit Graph

1645 Commits

Author SHA1 Message Date
Tessa Walsh
dc41468daf
Allow users to run crawls with 1 or 2 browser windows (#2627)
Fixes #2425 

## Changed

- Switch backend to primarily using number of browser windows rather
than scale multiplier (including migration to calculate `browserWindows`
from `scale` for existing workflows and crawls)
- Still support `scale` in addition to `browserWindows` in input models
for creating and updating workflows and re-adjusting live crawl scale
for backwards compatibility
- Adds new `max_browser_windows` value to Helm chart, but calculates the
value from `max_crawl_scale` as fallback for users with that value
already set in local charts
- Rework frontend to allow users to select multiples of
`crawler_browser_instances` or any value below
`crawler_browser_instances` for browser windows. For instance, with
`crawler_browser_instances=4` and `max_browser_windows=8`, the user
would be presented with the following options: 1, 2, 3, 4, 8
- Sets maximum width of screencast to image width returned by `message`
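
The browser window options described above can be sketched as follows. This is a minimal illustration, not the actual frontend code: the helper name `browser_window_options` is hypothetical, and it assumes the choices are every value up to `crawler_browser_instances`, followed by whole multiples of it up to `max_browser_windows`.

```python
def browser_window_options(
    crawler_browser_instances: int, max_browser_windows: int
) -> list[int]:
    """Return selectable browser window counts (hypothetical helper)."""
    # Any value up to one full crawler instance is allowed...
    below = range(1, min(crawler_browser_instances, max_browser_windows) + 1)
    # ...followed by whole multiples of the per-instance window count.
    multiples = range(
        crawler_browser_instances * 2,
        max_browser_windows + 1,
        crawler_browser_instances,
    )
    return [*below, *multiples]


print(browser_window_options(4, 8))  # [1, 2, 3, 4, 8]
```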

---------

Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: sua yoo <sua@suayoo.com>
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
2025-06-03 13:37:30 -07:00
Ilya Kreymer
f5c120b529
Don't clobber existing helm chart in release! (#2643)
Switch to different github release action:
- avoids clobbering existing release if already published, updates
existing draft only with latest Helm chart
- also sets name to `Browsertrix <version>`, fills in changelist.
- fixes #2642 

Tested:
- New draft release created (since branch ends in `-release`)
- Running multiple times to ensure chart is updated in draft
- Switching to older release to ensure chart is *NOT* clobbered
2025-06-03 09:28:34 -07:00
Ilya Kreymer
0e06ccd746 version: bump to 1.17.0-beta.0 2025-06-02 14:46:32 -07:00
Emma Segal-Grossman
4ed1a37f9d
Popover styling fixes (#2637) 2025-06-02 13:51:24 -04:00
Pierre
8b54444b7e
docs: update remote deployment docs with working nginx-install example (#2625)
- Update the docs on k3s deployment for installing `ingress-nginx`, fixes
#2619.
- Also fix the indentation on the code blocks so markdown carries on list
numbering. At the moment the numbering confusingly resets after point 3.
- Update indentation on all code blocks so they show up as part of list +
wrap long commands.
---------

Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2025-05-28 20:07:02 -07:00
sua yoo
2aad7b8dc0
feat: Make saving simple workflow more efficient (#2626)
- Sticks workflow form save/run buttons to the viewport if all the
required fields are filled
- Adds keyboard shortcuts to save (cmd/ctrl + S to save, cmd/ctrl +
Enter to save and run)
- Adds "Cancel" button to new workflow
2025-05-28 20:04:07 -07:00
sua yoo
858ae15ce6
feat: Handle paused state + workflow performance improvements (#2610)
- Handles `paused` workflow state.
- Adds "Copy Crawl ID" and "View Archived Item" buttons to workflow
detail
- Fixes file size not updating in workflow crawls list
- Fixes superadmin banner showing over workflow tabs
- Refactors workflow detail API calls to use `Task` to improve poll
performance.
- Fixes execution time rendering when less than a minute

---------

Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2025-05-28 19:26:38 -07:00
sua yoo
9f17264aa9
devex: Create btrix-popover component (#2632)
Adds and documents new `btrix-popover` component.
2025-05-28 18:29:30 -07:00
sua yoo
7e3e8a594f
gh: Update issue templates (#2621)
- Adds issue type to each template
- Differentiates user-submitted "Change Request" from internal "Planned
Feature". This allows us to separate user-submitted ideas from work
we've planned through the new feature workflow, and automatically set
the github project.
- Adds template for docs change
- Makes additional context section optional; I noticed many issues put
"n/a" or similar in this section anyway.
- Disables blank issues and adds generic "Task" issue template
2025-05-27 18:11:55 -07:00
sua yoo
7c32e27f94
fix: Show 404 page for nonexistent org (#2620)
Renders 404 page if org in URL doesn't exist.
2025-05-27 18:10:49 -07:00
Ilya Kreymer
5b0f851857
Fix securityContext for pod (#2623)
Some of the `securityContext` settings need to be on the container, not
on the pod, including the read-only file system, which was not previously enabled.
This now enables the read-only file system.
Also map the crawler /tmp directory to use the same volume as crawls (as
crawler currently uses /tmp dir) as /tmp becomes read-only otherwise.
2025-05-27 10:59:50 -07:00
sua yoo
7674672027
feat: Update superadmin active crawls view (#2618)
- Renames "Running Crawls" -> "Active Crawls" in superadmin app bar
- Shows number of active crawls next to link
- Refreshes active crawl list every 30 seconds
- Standardizes browser title
2025-05-26 12:22:38 -07:00
Ilya Kreymer
cb50c7c2c2
Pause / Resume Crawls Initial Implementation. (#2572)
- add 'pause' crawl state (fixes #2567)
- gracefully shut down crawler pods, and then redis pod when paused
- crawler uploads WACZ before shutting down (dependent on
webrecorder/browsertrix-crawler#824, supported in 1.6.1+)
- add 'paused_at' on crawl spec to indicate when crawl is paused
- support max pause time limit, after which crawl becomes automatically
stopped.
- add 'stopped_pause_expired' when pause automatically expires and crawl
is stopped
- /crawl/<id>/{pause,resume} apis to toggle 'paused' on crawl spec
- ui: add pause/resume button, paused state (partially addresses #2568)
- ui: add pausing/resuming derivative states when crawl is running and
pausing, or paused and not pausing (partially addresses #2569)
- Designed to work with crawler 1.6.1+, which supports pausing + uploading on pause
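
The pause time limit described above could be checked roughly like this. It is a sketch under stated assumptions, not the operator's actual logic: the function name, the `max_pause` parameter, and the `"running"` return value are hypothetical; only `paused_at` and `stopped_pause_expired` come from the list above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def state_after_pause_check(
    paused_at: Optional[datetime],
    max_pause: timedelta,
    now: datetime,
) -> str:
    """Return the crawl state implied by the pause time limit (illustrative)."""
    if paused_at is None:
        # No 'paused_at' on the crawl spec: the crawl is not paused.
        return "running"
    if now - paused_at >= max_pause:
        # The pause has outlived the limit, so the crawl is stopped.
        return "stopped_pause_expired"
    return "paused"


paused_at = datetime(2025, 5, 21, 12, 0, tzinfo=timezone.utc)
print(state_after_pause_check(paused_at, timedelta(hours=1), paused_at + timedelta(hours=2)))
```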

Work on #2566, Fixes #2576 

---------
Co-authored-by: sua yoo <sua@webrecorder.org>
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
Co-authored-by: sua yoo <sua@suayoo.com>
2025-05-21 14:05:16 -07:00
Ilya Kreymer
e995811dd4 version: bump to 1.16.2 2025-05-20 18:43:22 -07:00
Ilya Kreymer
8a713155ef
remove deleted collections from crawlconfigs (#2615)
simplified version of #2608, add a remove_collection_from_all_configs() in CrawlConfigs, also check org.
update tests to ensure removal
2025-05-20 18:38:40 -07:00
Ilya Kreymer
86e35e358d
Add Org Check for Collection access (#2616)
Ensure collection access checks org membership
2025-05-20 15:30:22 -07:00
Ilya Kreymer
e29db33629
tests: fix nightly test config after #2611 (#2614)
remove namespace from minio config to match settings
2025-05-20 12:25:15 -07:00
sua yoo
ef93c5ad90
docs: Document latest crawl (#2613)
Follows https://github.com/webrecorder/browsertrix/issues/2603

## Changes

- Updates documentation on "Latest Crawl" tab
- Fixes extra fetch in workflow detail page
- Reverts workflow detail labels from "Duration" back to "Run Duration"
and "Pages" back to "Pages Crawled"
2025-05-20 12:19:09 -07:00
Ilya Kreymer
c134b576ae
Optimize presigning for replay.json (#2516)
Fixes #2515.

This PR introduces a significantly optimized logic for presigning URLs
for crawls and collections.
- For collections, the files needed from all crawls are looked up, and
then the 'presign_urls' table is merged in one pass, resulting in a
unified iterator containing files and presign urls for those files.
- For crawls, the presign URLs are also looked up once, and the same
iterator is used for a single crawl with passed in list of CrawlFiles
- URLs that are already signed are added to the return list.
- For any remaining URLs to be signed, a bulk presigning function is
added, which shares an HTTP connection and signs 8 files in parallel
(customizable via helm chart, though may not be needed). This function
is used to call the presigning API in parallel.
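
The approach above (reuse URLs that are already signed, then sign the rest with bounded parallelism) can be sketched as below. This is an illustration only: `presign_one` is a hypothetical stand-in for the real presigning call, the example URLs are placeholders, and a semaphore is just one way to cap concurrency at 8.

```python
import asyncio


async def presign_one(filename: str) -> str:
    """Stand-in for a real presigning call (hypothetical)."""
    await asyncio.sleep(0)  # simulate I/O
    return f"https://storage.example.com/{filename}?signature=placeholder"


async def bulk_presign(
    filenames: list[str],
    already_signed: dict[str, str],
    parallelism: int = 8,
) -> dict[str, str]:
    """Reuse existing URLs and sign the rest, at most `parallelism` at a time."""
    # URLs that are already signed go straight into the result.
    urls = {name: already_signed[name] for name in filenames if name in already_signed}
    remaining = [name for name in filenames if name not in already_signed]
    semaphore = asyncio.Semaphore(parallelism)

    async def sign(name: str) -> tuple[str, str]:
        async with semaphore:  # never more than `parallelism` calls in flight
            return name, await presign_one(name)

    urls.update(await asyncio.gather(*(sign(name) for name in remaining)))
    return urls


signed = asyncio.run(
    bulk_presign(
        ["a.wacz", "b.wacz", "c.wacz"],
        {"a.wacz": "https://signed.example.com/a.wacz"},
    )
)
print(sorted(signed))
```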
2025-05-20 12:09:35 -07:00
Ilya Kreymer
f1fd11c031
storage: use s3v4 signature for presigning urls (#2611)
Use V4 ('s3v4') signature version for all presigning URLs to support
backblaze, fixes #2472
- add 'access_addressing_style' to be able to choose virtual/path
addressing for access endpoint (default to 'virtual' as before)
- fix minio presigning with v4 by using 'path' addressing style for
minio
- if path matches '/data/' for internal minio bucket, then always use
'path'
- also make minio access path '/data/' configurable

also simplify running in any namespace with default settings:
- don't hardcode 'local-minio.default'
- in crawlers namespace, add a 'local-minio' externalName service which
maps to the main namespace service.
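
The addressing-style rule above can be pictured with a small helper. This is a sketch, not the storage code itself: the function and parameter names are hypothetical, and treating "matches" as an exact comparison against the configurable minio access path is an assumption.

```python
def pick_addressing_style(
    access_endpoint_path: str,
    configured_style: str = "virtual",
    minio_access_path: str = "/data/",
) -> str:
    """Choose 'path' or 'virtual' addressing for presigned URLs (illustrative)."""
    # The internal minio bucket is served under a fixed path, so it always
    # uses path-style addressing regardless of the configured style.
    if access_endpoint_path == minio_access_path:
        return "path"
    # Otherwise honor 'access_addressing_style' (default 'virtual' as before).
    return configured_style


print(pick_addressing_style("/data/"))  # path
print(pick_addressing_style("/"))       # virtual
```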
2025-05-19 15:44:36 -07:00
sua yoo
4b1e416eb6
feat: Workflow "latest crawl" tab (#2605)
- Combines "Watch" and "Logs" into single "Latest Crawl" tab
- Updates workflow routes and adds redirects
- Enables replaying and downloading latest crawl from the workflow
detail view
- Tweaks crawl list table header labels and archived item download
button labels for consistency
- Fixes crawl queue showing error when stopping crawl
2025-05-14 10:23:36 -07:00
sua yoo
7c9627f4bb
chore: Clean up data grid component (#2604)
- Moves data grid styles to separate stylesheet.
- Adds `rowsSelectable` option, renames `rows-` properties to match.
- Adds WIP `rowsExpandable` option.
- Fixes showing tooltip on focus.
- Cleans up rows controller typing.

---------

Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
2025-05-14 09:44:07 -07:00
Tessa Walsh
c73512dbd4
Bump version to 1.16.1 (#2606) 2025-05-13 17:29:49 -04:00
Tessa Walsh
1492397656
Add ISO-639-1 language code validation to backend (#2602)
- Add backend validation for language codes
- Add migration to look for invalid ISO-639-1 language codes in
workflows, crawls, and org crawling defaults, and fix any found
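
A rough idea of what such a validation check looks like is shown below. It is a sketch only: the set is a small illustrative subset of ISO-639-1 rather than the full list, and the function name is hypothetical.

```python
# Illustrative subset only; the full ISO-639-1 list has many more entries.
ISO_639_1_SUBSET = frozenset({"de", "en", "es", "fr", "ja", "ko", "zh"})


def validate_language_code(code: str) -> str:
    """Return the code unchanged if it is known, else raise ValueError."""
    if code not in ISO_639_1_SUBSET:
        raise ValueError(f"Invalid ISO-639-1 language code: {code!r}")
    return code


print(validate_language_code("en"))
```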
2025-05-13 16:54:33 -04:00
Emma Segal-Grossman
e17772145e
Add minimized superadmin banner (#2598) 2025-05-13 16:32:35 -04:00
Tessa Walsh
6f81d588a9
Ensure crawl page counts are correct when re-adding pages (#2601)
Fixes #2600 

This PR fixes the issue by ensuring that crawl page counts (total,
unique, files, errors) are reset to 0 when crawl pages are deleted, such
as right before being re-added.

It also adds a migration that recalculates file and error page counts
for each crawl without re-adding pages from the WACZ files.
2025-05-13 14:05:41 -04:00
sua yoo
594f5bc171
devex: Data grid component (#2561)
- Adds new `<btrix-data-grid>` component
- Refactors `<btrix-usage-history-table>` to data grid
- Refactors `<btrix-syntax-input>` and
`<btrix-link-selector-table>` to be form-associated controls.

---------

Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
2025-05-12 10:36:14 -07:00
sua yoo
6b510fe89c
fix: Sync user guide to correct workflow section (#2592)
Resolves https://github.com/webrecorder/browsertrix/issues/2560

## Changes

- Syncs workflow current form section with user guide section.
- Stickies "User Guide" button to top of viewport so that user guide can
be opened.
- Makes content behind user guide clickable (fixes issues with stickied
elements shifting when user guide is not contained to the parent
element.)
- Decreases size of user guide text when embedded in an iframe.
- Refactors overflow scrim to reuse CSS variables.

---------
Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
2025-05-08 14:41:35 -07:00
Ilya Kreymer
652e8a6085 version: bump to 1.16.0 2025-05-08 14:30:00 -07:00
Ilya Kreymer
1570011ec7
compute top page origins for each collection (#2483)
A quick PR to fix #2482:
- compute topPageHosts as part of existing collection stats compute
- store top 10 results in collection for now.
- display in collection About sidebar
- fixes #2482 
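
Computing the top hosts from a collection's page URLs could look roughly like this. It is a sketch under assumptions: the function name and the `{"host", "count"}` output shape are hypothetical, and the real stats are computed as part of the existing collection stats rather than from an in-memory list.

```python
from collections import Counter
from urllib.parse import urlsplit


def top_page_hosts(page_urls: list[str], limit: int = 10) -> list[dict]:
    """Count pages per host and keep the `limit` most common (illustrative)."""
    counts = Counter(urlsplit(url).netloc for url in page_urls)
    return [
        {"host": host, "count": count}
        for host, count in counts.most_common(limit)
    ]


pages = [
    "https://example.com/a",
    "https://example.com/b",
    "https://webrecorder.net/",
]
print(top_page_hosts(pages))
```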

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2025-05-08 14:22:40 -07:00
Emma Segal-Grossman
0691f43be6
Sort running crawls first by default (#2587) 2025-05-08 17:21:17 -04:00
Emma Segal-Grossman
5915c24c18
Add "cancellation scheduled" state to superadmin org list (#2594)
Fixes https://github.com/webrecorder/browsertrix/issues/2595

## Changes

Adds "Subscription Cancellation Scheduled" state/icon/tooltip to
superadmin org list, with future cancellation duration/date.

Adds more subscription-related info and features to the action menu in
the same org list
- "Open in Stripe" action is visible if subscription id is a Stripe
object id
- "Plan ID" and "Action on Cancel" correspond to `planId` and
`readOnlyOnCancel` properties on `subscription` object
- There's also some additional highlighting for possible errors
(hopefully only visible on dev) — see the last screenshot for an example

Adds first pass at filters for superadmin org list
- The filters' counts update when searching
- I took an initial pass at figuring out which filters would be most
useful — we can always go back and tweak them later
2025-05-06 18:59:29 -07:00
Tessa Walsh
3e169ebc15
Add API endpoint to check if subscription is activated (#2582)
Subscription Management: check used to ensure subscription can be auto-canceled if
not activated.

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2025-05-06 17:36:58 -07:00
sua yoo
cb6e279a3c
fix: Hide incorrect menu item for running workflow crawl (#2591)
- Hides the "Delete" menu item for a running crawl in the workflows
crawls list.
- Slightly grays out row for running crawl to indicate that it's not
clickable.
2025-05-06 15:19:33 -07:00
sua yoo
0ec94098a5
fix: Show correct button for workflow without crawls (#2590)
Shows "Run Now" button instead of "QA Latest Crawl" in workflow "Watch"
tab when there aren't any crawls.
2025-05-06 14:31:26 -07:00
sua yoo
62a53d01d6
fix: Correct post load delay label (#2593) 2025-05-06 10:09:51 -07:00
Emma Segal-Grossman
8b6e1ca9af
Add overflow scroll component with scroll scrim/shadow (#2578) 2025-05-05 20:24:47 -04:00
Emma Segal-Grossman
4ce769ecab
Ensure primary button in button group has its border appear (#2583) 2025-05-05 20:24:34 -04:00
Emma Segal-Grossman
8a707e3b3a
Fix table grid column CSS variable, superadmin list menus being hidden/inoperable, and various other table tweaks (#2573)
Closes #2574
cc @SuaYoo 

## Changes

This adds an internal `--btrix-table-grid-template-columns--internal`
css property to `btrix-table` to set table grid cols, which uses the
`--btrix-table-grid-template-columns` value if defined and otherwise
defaults to the number of header cols **from within the css
declaration**, rather than using JS. In Chrome at least,
`this.style.getPropertyValue` wasn't picking up on css variables defined
outside of the custom component boundary, so this gets around that.

Other changes:
- Adds an additional column to the superadmin org list, as it was
missing one
- Fixes `overflow-dropdown` unintentionally setting its internal
button's size to `undefined` if `size` wasn't set on it
- Swaps the remaining tables to use
`--btrix-table-grid-template-columns` instead of directly setting
`grid-template-columns`
- Adds a min-width of `min-content` to the table container, because
doing so is necessary for left/right scrolling, and this is a common
enough pattern it seems that upstreaming this into the table itself
makes sense — it shouldn't cause breakages, this already generally is
the expected behaviour
- Allows tables to scroll left/right when necessary
- Fix padding/margin for a few left/right scrolling tables
- Allows primary column of collections list to shrink to a smaller min
width

## Testing

Test that none of the other tables are broken. I couldn't find any!
2025-04-29 21:00:16 -04:00
sua yoo
1fa43335c0
feat: Apply saved workflow settings to current crawl (#2514)
Resolves https://github.com/webrecorder/browsertrix/issues/2366

## Changes

Allows users to update current crawl with newly saved workflow settings.

## Manual testing

1. Log in as crawler
2. Start a crawl
3. Go to edit workflow. Verify "Update Crawl" button is shown
4. Click "Update Crawl". Verify crawl is updated with new settings

---------

Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2025-04-29 11:43:14 -07:00
Tessa Walsh
c4a7ebce29
Update button text from "Setup Guide" to "User Guide" for consistency (#2565)
Fixes #2564
2025-04-24 10:58:26 -04:00
sua yoo
573d8ca316
devex: Document workflow table components (#2558)
- Documents the following components in Storybook:
  - `btrix-data-table`
  - `btrix-table`
  - `btrix-crawl-log-table`
  - `btrix-custom-behaviors-table`
  - `btrix-link-selector-table`
  - `btrix-queue-exclusion-table`
  - `btrix-queue-exclusion-form`
- Refactors `btrix-table` and subcomponents to simplify CSS properties
- Fixes crawl exclusion table delete button not rendering
- Fixes Shoelace assets not loading in Storybook
2025-04-23 19:31:34 -07:00
Tessa Walsh
f34b42cb59
Add custom behavior docs to user guide (#2559) 2025-04-23 14:27:39 -04:00
Emma Segal-Grossman
76ab3e7eaa
Add grid view to collection list (#2403)
Closes #2498 

Yay for consistency!

## Changes

Adds a grid view to the collections list, alongside the default list
view.

- Refactors edit dialog into `collections-grid-with-edit-dialog`
component for dashboard — collections list already has its own edit
dialog, so no need for this to be duplicated in the grid component
- Adds getter/setter for `page` property of pagination component, which
fixes the dashboard not switching back to page 1 when switching between
"Public" and "All" collection views

## Manual testing

1. On the collections list page, click between "View as Grid" and "View
as List" in the toolbar
2. Verify that pagination, the collection editing dialog, and the action
menu work in grid view
3. On the dashboard in an org with multiple pages of collections, switch
to the second page of "All" collections, then switch back to "Public"
collections. Verify that the page search param disappears when switching
between views.

## Screenshots

| Page | Screenshot |
|--------|--------|
| Collection list | <img width="1282" alt="Screenshot 2025-04-17 at 3 46 55 PM" src="https://github.com/user-attachments/assets/f6dff74f-d56e-48f6-8d44-11b84bacbafb" /> |
| Collection list (detail) | <img width="165" alt="Screenshot 2025-04-17 at 3 46 29 PM" src="https://github.com/user-attachments/assets/3442c5e4-a67f-46a2-b475-ee4d3d1e0259" /> |

---



Remaining things to do:
- [x] Add full actions menu from list view to grid view, instead of just
having pencil icon
- [x] Reuse collection editing dialog from existing list view, instead
of the grid view having its own separate dialog instance
2025-04-23 14:08:50 -04:00
sua yoo
78e2dadf0a
devex: Add Storybook for component development (#2556)
Adds Storybook in preparation for UI component refactoring.
2025-04-21 13:06:31 -07:00
sua yoo
c2a11ccf10
deps: Upgrade main frontend dependencies (#2551)
- Upgrades typescript-eslint to a more performant version and related
dependencies. Note that these dependencies were not upgraded to the
latest version to avoid upgrading to eslint 9 at this time.
- Upgrades Lit one minor version
2025-04-15 13:31:50 -07:00
Emma Segal-Grossman
224f34e288
Update to docker/setup-buildx-action@v3 (#2553)
## Changes
- Updates `docker/setup-buildx-action` from v2 to v3
- This should fix the intermittent build error seen here:
https://github.com/webrecorder/browsertrix/actions/runs/14362787038/job/40268326912?pr=2538
- The breaking change from v2 to v3 is the minimum node version is
updated to 20, which is already the minimum version we use
- Relevant buildx issue here:
https://github.com/docker/buildx/issues/681
2025-04-09 21:42:45 +02:00
sua yoo
f2e6892729
fix: Update custom behavior file placeholder text (#2552)
Follows https://github.com/webrecorder/browsertrix/issues/2151

## Changes

Updates placeholder text for custom behavior files, since we now accept
JSON.
2025-04-09 21:41:53 +02:00
Emma Segal-Grossman
eeda4cd9ff
Persist pagination state in url (#2538)
Closes #1944 

## Changes
- Pagination stores page number in url search params, rather than
internal state, allowing going back to a specific page in a list
- Pagination navigation pushes to history stack, and listens to history
changes to be able to respond to browser history navigation
(back/forward)
- Search parameter reactive controller powers pagination component
- Pagination component allows for multiple simultaneous paginations via
custom `name` property

## Manual testing

1. Log in as any role
2. Go to one of the list views on an org with enough items in the list
to span more than one page
3. Click on one of the pages, and navigate back in your browser. The
selected page should respect this navigation and return to the initial
numbered page.
4. Navigate forward in your browser. The selected page should respect
this navigation and switch to the numbered page from the previous step.
5. Click on a non-default page, and then click on one of the items in
the list to go to its detail page. Then, using your browser's back
button, return to the list page. You should be on the same numbered page
as before.

---------

Co-authored-by: sua yoo <sua@suayoo.com>
2025-04-09 15:40:30 -04:00
sua yoo
b0d1a35563
fix: Handle no crawling defaults (#2549)
Fixes regression introduced by
7c6bae8d61

## Changes

Handles orgs without any crawl defaults correctly. Areas that use
crawling defaults are also more strongly typed now to prevent similar
issues.
2025-04-09 12:48:12 -04:00