Follows https://github.com/webrecorder/browsertrix/issues/2603
## Changes
- Updates documentation on "Latest Crawl" tab
- Fixes extra fetch in workflow detail page
- Reverts workflow detail labels from "Duration" back to "Run Duration"
and "Pages" back to "Pages Crawled"
Fixes#2515.
This PR introduces a significantly optimized logic for presigning URLs
for crawls and collections.
- For collections, the files needed from all crawls are looked up, and
then the 'presign_urls' table is merged in one pass, resulting in a
unified iterator containing files and presign urls for those files.
- For crawls, the presign URLs are also looked up once, and the same
iterator is used for a single crawl with passed in list of CrawlFiles
- URLs that are already signed are added to the return list.
- For any remaining URLs to be signed, a bulk presigning function is
added, which shares an HTTP connection and signing 8 files in parallels
(customizable via helm chart, though may not be needed). This function
is used to call the presigning API in parallel.
Use V4 ('s3v4') signature version for for all presigning URLs to support
backblaze, fixes#2472
- add 'access_addressing_style' to be able to choose virtual/path
addressing for access endpoint (default to 'virtual' as before)
- fix minio presigning with v4 by using 'path' addressing style for
minio
- if path matches '/data/' for internal minio bucket, then always use
'path'
- also make minio access path '/data/' configurable
also simplify running in any namespace with default settings:
- don't hardcode 'local-minio.default'
- in crawlers namespace, add a 'local-minio' externalName service which
maps to the main namespace service.
- Combines "Watch" and "Logs" into single "Latest Crawl" tab
- Updates workflow routes and adds redirects
- Enables replaying and downloading latest crawl from the workflow
detail view
- Tweaks crawl list table header labels and and archived item download
button labels for consistency
- Fixes crawl queue showing error when stopping crawl
- Add backend validation for language codes
- Add migration to look for invalid ISO-639-1 language codes in
workflows, crawls, and org crawling defaults, and fix any found
Fixes#2600
This PR fixes the issue by ensuring that crawl page counts (total,
unique, files, errors) are reset to 0 when crawl pages are deleted, such
as right before being re-added.
It also adds a migration will recalculates file and error page counts
for each crawl without re-adding pages from the WACZ files.
- Adds new `<btrix-data-grid>` component
- Refactors `<btrix-usage-history-table>` to data grid
- Refactors Refactors `<btrix-syntax-input>` and
`<btrix-link-selector-table>` to be form-associated controls.
---------
Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
Resolves https://github.com/webrecorder/browsertrix/issues/2560
## Changes
- Syncs workflow current form section with user guide section.
- Stickies "User Guide" button to top of viewport so that user guide can
be opened.
- Makes content behind user guide clickable (fixes issues with stickied
elements shifting when user guide is not contained to the parent
element.)
- Decreases size of user guide text when embedded in an iframe.
- Refactors overflow scrim to reuse CSS variables.
---------
Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
A quick PR to fix#2482:
- compute topPageHosts as part of existing collection stats compute
- store top 10 results in collection for now.
- display in collection About sidebar
- fixes#2482
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
Fixes https://github.com/webrecorder/browsertrix/issues/2595
## Changes
Adds "Subscription Cancellation Scheduled" state/icon/tooltip to
superadmin org list, with future cancellation duration/date.
Adds more subscription-related info and features to the action menu in
the same org list
- "Open in Stripe" action is visible if subscription id is a Stripe
object id
- "Plan ID" and "Action on Cancel" correspond to `planId` and
`readOnlyOnCancel` properties on `subscription` object
- There's also some additional highlighting for possible errors
(hopefully only visible on dev) — see the last screenshot for an example
Adds first pass at filters for superadmin org list
- The filters' counts update when searching
- I took an initial pass at figuring out which filters would be most
useful — we can always go back and tweak them later
Subscription Management: used check to ensure subscription can be auto-canceled if
not activated.
---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
- Hides the "Delete" menu item for a running crawl in the workflows
crawls list.
- Slightly grays out row for running crawl to indicate that it's not
clickable.
Closes#2574
cc @SuaYoo
## Changes
This adds an internal `--btrix-table-grid-template-columns--internal`
css property to `btrix-table` to set table grid cols, which uses the
`--btrix-table-grid-template-columns` value if defined and otherwise
defaults to the number of header cols **from within the css
declaration**, rather than using JS. In Chrome at least,
`this.style.getPropertyValue` wasn't picking up on css variables defined
outside of the custom component boundary, so this gets around that.
Other changes:
- Adds an additional column to the superadmin org list, as it was
missing one
- Fixes `overflow-dropdown` unintentionally setting its internal
button's size to `undefined` if `size` wasn't set on it
- Swaps the remaining tables to use
`--btrix-table-grid-template-columns` instead of directly setting
`grid-template-columns`
- Adds a min-width of `min-content` to the table container, because
doing so is necessary for left/right scrolling, and this is a common
enough pattern it seems that upstreaming this into the table itself
makes sense — it shouldn't cause breakages, this already generally is
the expected behaviour
- Allows tables to scroll left/right when necessary
- Fix padding/margin for a few left/right scrolling tables
- Allows primary column of collections list to shrink to a smaller min
width
## Testing
Test that none of the other tables are broken. I couldn't find any!
Resolves https://github.com/webrecorder/browsertrix/issues/2366
## Changes
Allows users to update current crawl with newly saved workflow settings.
## Manual testing
1. Log in as crawler
2. Start a crawl
3. Go to edit workflow. Verify "Update Crawl" button is shown
4. Click "Update Crawl". Verify crawl is updated with new settings
---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
Closes#2498
Yay for consistency!
## Changes
Adds a grid view to the collections list, alongside the default list
view.
- Refactors edit dialog into `collections-grid-with-edit-dialog`
component for dashboard — collections list already has its own edit
dialog, so no need for this to be duplicated in the grid component
- Adds getter/setter for `page` property of pagination component, which
fixes the dashboard not switching back to page 1 when switching between
"Public" and "All" collection views
## Manual testing
1. On the collections list page, click between "View as Grid" and "View
as List" in the toolbar
2. Verify that pagination, the collection editing dialog, and the action
menu works in grid view
3. On the dashboard in an org with multiple pages of collections, switch
to the second page of "All" collections, then switch back to "Public"
collections. Verify that the page search param disappears when switching
between views.
## Screenshots
| Page | Screenshot |
|--------|--------|
| Collection list | <img width="1282" alt="Screenshot 2025-04-17 at 3 46
55 PM"
src="https://github.com/user-attachments/assets/f6dff74f-d56e-48f6-8d44-11b84bacbafb"
/> |
| Collection list (detail) | <img width="165" alt="Screenshot 2025-04-17
at 3 46 29 PM"
src="https://github.com/user-attachments/assets/3442c5e4-a67f-46a2-b475-ee4d3d1e0259"
/> |
---
Remaining things to do:
- [x] Add full actions menu from list view to grid view, instead of just
having pencil icon
- [x] Reuse collection editing dialog from existing list view, instead
of the grid view having its own separate dialog instance
- Upgrades typescript-eslint to a more performant version and related
dependencies. Note that these dependencies were not upgraded to the
latest version to avoid upgrading to eslint 9 at this time.
- Upgrades Lit one minor version
Closes#1944
## Changes
- Pagination stores page number in url search params, rather than
internal state, allowing going back to a specific page in a list
- Pagination navigation pushes to history stack, and listens to history
changes to be able to respond to browser history navigation
(back/forward)
- Search parameter reactive controller powers pagination component
- Pagination component allows for multiple simultaneous paginations via
custom `name` property
## Manual testing
1. Log in as any role
2. Go to one of the list views on an org with enough items in the list
to span more than one page
3. Click on one of the pages, and navigate back in your browser. The
selected page should respect this navigation and return to the initial
numbered page.
4. Navigate forward in your browser. The selected page should respect
this navigation and switch to the numbered page from the previous step.
5. Click on a non-default page, and then click on one of the items in
the list to go to its detail page. Then, using your browser's back
button, return to the list page. You should be on the same numbered page
as before.
---------
Co-authored-by: sua yoo <sua@suayoo.com>
Fixes regression introduced by
7c6bae8d61
## Changes
Handles orgs without any crawl defaults correctly. Areas that use
crawling defaults are also more strongly typed now to prevent similar
issues.
Resolves https://github.com/webrecorder/browsertrix/issues/2513
## Changes
- Allows org admins to set custom behaviors as crawling defaults
- Shows warning text if both autoscroll/autoclick and custom behaviors
are enabled
- Refactors `infoTextStrings` -> `infoTextFor` to match other
label/string matchers
---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
- Displays behavior logs wherever error logs are shown
- Makes page URL in detail dialog clickable rather than in row column to
prevent accidental navigation
- Rename "Download Logs" -> "Download All Logs" and add tooltip with
additional context
---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Resolves#2504
## Changes
- Allows users to customize autoclick selector in workflows
- Refactors `btrix-syntax-input` to support rendering label and help
text `sl-input`
- Show autoclick selector in workflow / crawl settings
- Adds 'clickSelector' with default of 'a' to backend crawl config.
---------
Co-authored-by: sua yoo <sua@suayoo.com>
Co-authored-by: sua yoo <sua@webrecorder.org>
Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
Allows users to save a workflow with an empty "Link Selectors" table,
using the default value. This is aligned with how we use default values
for other empty inputs, and prevents a case where a user may
inadvertently removed a row and now cannot save a workflow with the
default link selector.
Also updates the info text to show the default value.
Backend work for #2524
This PR adds a second dedicated endpoint similar to `/errors`, as a
combined log endpoint would give a false impression of being the
complete crawl logs (which is far from what we're serving in Browsertrix
at this point).
Eventually when we have support for streaming live crawl logs in
`crawls/<id>/logs` I'd ideally like to deprecate these two dedicated
endpoints in favor of using that, but for now this seems like the best
solution.
---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Follow-up to #2152
Related to https://github.com/webrecorder/browsertrix/pull/2487
This PR provides very basic validation of the `config.selectLinks`
argument on workflow creation and update. Namely, it checks that:
- `config.selectLinks` is not an empty array
- Each entry consists of two non-empty text sequences separated by `->`
At this point we're not validating the actual CSS selector on the
backend, though we could add that down the road.
Tests have been added accordingly.
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Backend support for #2151
Adds support for specifying custom behaviors via a list of strings.
When workflows are added or modified, minimal backend validation is done
to ensure that all custom behavior URLs are valid URLs (after removing
the git prefix and custom query arguments).
A separate `POST /crawlconfigs/validate/custom-behavior` endpoint is
also added, which can be used to validate a custom behavior URL. It
performs the same syntax check as above and then:
- For URL directly to behavior file, ensures URL resolves and returns a
2xx/3xx status code
- For Git repositories, uses `git ls-remote` to ensure they exist (and
that branch exists if specified)
---------
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>