Closes#2498
Yay for consistency!
## Changes
Adds a grid view to the collections list, alongside the default list
view.
- Refactors edit dialog into `collections-grid-with-edit-dialog`
component for dashboard — collections list already has its own edit
dialog, so no need for this to be duplicated in the grid component
- Adds getter/setter for `page` property of pagination component, which
fixes the dashboard not switching back to page 1 when switching between
"Public" and "All" collection views
## Manual testing
1. On the collections list page, click between "View as Grid" and "View
as List" in the toolbar
2. Verify that pagination, the collection editing dialog, and the action
menu works in grid view
3. On the dashboard in an org with multiple pages of collections, switch
to the second page of "All" collections, then switch back to "Public"
collections. Verify that the page search param disappears when switching
between views.
## Screenshots
| Page | Screenshot |
|--------|--------|
| Collection list | <img width="1282" alt="Screenshot 2025-04-17 at 3 46
55 PM"
src="https://github.com/user-attachments/assets/f6dff74f-d56e-48f6-8d44-11b84bacbafb"
/> |
| Collection list (detail) | <img width="165" alt="Screenshot 2025-04-17
at 3 46 29 PM"
src="https://github.com/user-attachments/assets/3442c5e4-a67f-46a2-b475-ee4d3d1e0259"
/> |
---
Remaining things to do:
- [x] Add full actions menu from list view to grid view, instead of just
having pencil icon
- [x] Reuse collection editing dialog from existing list view, instead
of the grid view having its own separate dialog instance
- Upgrades typescript-eslint to a more performant version and related
dependencies. Note that these dependencies were not upgraded to the
latest version to avoid upgrading to eslint 9 at this time.
- Upgrades Lit one minor version
Closes#1944
## Changes
- Pagination stores page number in url search params, rather than
internal state, allowing going back to a specific page in a list
- Pagination navigation pushes to history stack, and listens to history
changes to be able to respond to browser history navigation
(back/forward)
- Search parameter reactive controller powers pagination component
- Pagination component allows for multiple simultaneous paginations via
custom `name` property
## Manual testing
1. Log in as any role
2. Go to one of the list views on an org with enough items in the list
to span more than one page
3. Click on one of the pages, and navigate back in your browser. The
selected page should respect this navigation and return to the initial
numbered page.
4. Navigate forward in your browser. The selected page should respect
this navigation and switch to the numbered page from the previous step.
5. Click on a non-default page, and then click on one of the items in
the list to go to its detail page. Then, using your browser's back
button, return to the list page. You should be on the same numbered page
as before.
---------
Co-authored-by: sua yoo <sua@suayoo.com>
Fixes regression introduced by
7c6bae8d61
## Changes
Handles orgs without any crawl defaults correctly. Areas that use
crawling defaults are also more strongly typed now to prevent similar
issues.
Resolves https://github.com/webrecorder/browsertrix/issues/2513
## Changes
- Allows org admins to set custom behaviors as crawling defaults
- Shows warning text if both autoscroll/autoclick and custom behaviors
are enabled
- Refactors `infoTextStrings` -> `infoTextFor` to match other
label/string matchers
---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
- Displays behavior logs wherever error logs are shown
- Makes page URL in detail dialog clickable rather than in row column to
prevent accidental navigation
- Rename "Download Logs" -> "Download All Logs" and add tooltip with
additional context
---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Resolves#2504
## Changes
- Allows users to customize autoclick selector in workflows
- Refactors `btrix-syntax-input` to support rendering label and help
text `sl-input`
- Show autoclick selector in workflow / crawl settings
- Adds 'clickSelector' with default of 'a' to backend crawl config.
---------
Co-authored-by: sua yoo <sua@suayoo.com>
Co-authored-by: sua yoo <sua@webrecorder.org>
Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
Allows users to save a workflow with an empty "Link Selectors" table,
using the default value. This is aligned with how we use default values
for other empty inputs, and prevents a case where a user may
inadvertently removed a row and now cannot save a workflow with the
default link selector.
Also updates the info text to show the default value.
Follow-up to #2152
Related to https://github.com/webrecorder/browsertrix/pull/2487
This PR provides very basic validation of the `config.selectLinks`
argument on workflow creation and update. Namely, it checks that:
- `config.selectLinks` is not an empty array
- Each entry consists of two non-empty text sequences separated by `->`
At this point we're not validating the actual CSS selector on the
backend, though we could add that down the road.
Tests have been added accordingly.
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
- add 'imagePullPolicy' field to each crawler channel declaration
- if unset, defaults to the setting in the existing
'crawler_image_pull_policy' field.
fixes#2522
---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
- Displays built-in behaviors as single field in workflow settings
- Standardizes how "None" is displayed in workflow settings
- Refactors behavior names into enum
if logged in user is not part of any orgs, still allow logging in,
instead of throwing an exception due to accessing non-existent org
---------
Co-authored-by: sua yoo <sua@suayoo.com>
Reported by @tw4l
Quick fix for the bug I introduced in 1bc3c35 in #2481. I didn't
properly test on the workflow editor in a "new workflow" state, and
didn't realize that the component that fetches the workflow state for an
existing workflow wouldn't be rendered for a new workflow, so the update
to the loading state never occurred for new workflows. This fix
explicitly sets `isCrawlRunning` to `false` instead of `null` for new
workflows, so that the loading state isn't displayed.
Tested locally with both new and existing workflows (in both non-running
and running states).
Part of https://github.com/webrecorder/browsertrix/issues/2366
## Changes
- Displays latest running crawl status when editing workflow
- Disables "Run Now" button if crawl is currently running
Currently, clicking "Run Now" will result in a preventable server error
if the crawl is already running. The change in this PR is in preparation
for being able to update a currently running crawl and doesn't require
any backend changes.
## Manual testing
1. Log in as crawler
2. Go to edit crawl workflow
3. Open same workflow in another tab
4. Run the workflow
5. Go back to edit tab. Verify "Starting" status is shown next to "Save"
button and "Run Crawl" button is disabled
## Screenshots
| Page | Image/video |
| ---- | ----------- |
| Edit Workflow | <img width="354" alt="Screenshot 2025-03-11 at 1 34
07 PM"
src="https://github.com/user-attachments/assets/02f7fb4a-219d-43a4-bb1f-1f2b40ac1480"
/> |
<!-- ## Follow-ups -->
---------
Co-authored-by: emma <hi@emma.cafe>
Hides "Back to [org name]" breadcrumb when viewing a public/unlisted
collection when the public gallery isn't enabled for the org (except
when logged into that org).
- Moves "Per-Page Limits" fields to new "Page Behavior" section
- Fixes workflow settings closing tags with refactor to how sections are
rendered
- Updates user guide with behaviors documentation
---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
- Updated how collections gallery and presentation and sharing pages
- Collections gallery page content extracted from blog post, linked from blog post
- Each page has one video covering the gallery setting and individual collection presentation
- Cleaned up text on both to avoid duplicated content (thanks @DaleLore)
---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: DaleLore <DaleLoreNY@gmail.com>
Resolves https://github.com/webrecorder/browsertrix/issues/2452
## Changes
- Displays page count and collection size in listing grid
- Displays month if collection period is in the same year
- Displays collection size in About > Details section
- Minor refactor: move byte formatting into `localize.ts` utility file,
move slash (`/`) separator into own utility file
- remove query for /collections endpoint just to get the org name
- add orgName to single /collection endpoint, where it is already
available on the backend
- don't wait for languages to be ready to render UI, as this can result
in empty page if backend can not be reached.
- catch if /api/settings returns an invalid response to show 'backend
initializing' message
- will support initContainers where backend may return 5xx error while
backend is initializing, via #2449
Note: this results in locale picker showing all available locales if
backend is not available, not just filtered ones, but I think that's a
reasonable trade-off.
### Changes
- Public collections can now be deeplinked
### Caveats
- When users click the _About this Collection_ tab and then return to
the _Browse Collection_ tab, the deeplink is gone until they visit
another page.
- consolidate list_pages() and list_replay_query_pages() into
list_pages()
- to keep backwards compatibility, add <crawl>/pagesSearch that does not
include page totals, keep <crawl>/pages with page total (slower)
- qa frontend: add default 'Crawl Order' sort order, to better show
pages in QA view
- bgjob: account for parallelism in bgjobs, add logging if succeeded
mismatches parallelism
- QA sorting: default to 'crawl order' by default to get better results.
- Optimize pages job: also cover crawls that may not have any pages but have pages listed in done stats
- Bgjobs: give custom op jobs more memory
Resolves https://github.com/webrecorder/browsertrix/issues/2359
## Changes
- Track when a workflow form section is opened
- Hide workflow form section navigation on small screens
---------
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Fixes#2406
Converts migration 0042 to launch a background job (parallelized across
several pods) to migrate all crawls by optimizing their pages and
setting `version: 2` on the crawl when complete.
Also Optimizes MongoDB queries for better performance.
Migration Improvements:
- Add `isMigrating` and `version` fields to `BaseCrawl`
- Add new background job type to use in migration with accompanying
`migration_job.yaml` template that allows for parallelization
- Add new API endpoint to launch this crawl migration job, and ensure
that we have list and retry endpoints for superusers that work with
background jobs that aren't tied to a specific org
- Rework background job models and methods now that not all background
jobs are tied to a single org
- Ensure new crawls and uploads have `version` set to `2`
- Modify crawl and collection replay.json endpoints to only include
fields for replay optimization (`initialPages`, `pageQueryUrl`,
`preloadResources`) if all relevant crawls/uploads have `version` set to
`2`
- Remove `distinct` calls from migration pathways
- Consolidate collection recompute stats
Query Optimizations:
- Remove all uses of $group and $facet
- Optimize /replay.json endpoints to precompute preload_resources, avoid
fetching crawl list twice
- Optimize /collections endpoint by not fetching resources
- Rename /urls -> /pageUrlCounts and avoid $group, instead sort with
index, either by seed + ts or by url to get top matches.
- Use $gte instead of $regex to get prefix matches on URL
- Use $text instead of $regex to get text search on title
- Remove total from /pages and /pageUrlCounts queries by not using
$facet
- frontend: only call /pageUrlCounts when dialog is opened.
---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
While ideally we don't need to use superadmin for many things, there are
still a lot of places where it's necessary, especially around customer
service. This makes it a little more visible when that's the case, just
as a reminder. I could see this coming in handy especially for newer
people who might not have the experience to know to look for the "admin"
and "running crawls" buttons.
<img width="1088" alt="Screenshot 2025-02-13 at 1 12 58 PM"
src="https://github.com/user-attachments/assets/70b975e1-af6b-4e8c-9e49-52c4c66e9721"
/>
Related to #2152
This PR adds backend support for custom link selectors via `selectLinks`
on the crawl workflow config. Tests have been updated as well.
It also adds `selectLinks` to the frontend in a minimal and for now
hardcoded way that we can use as a basis for proper frontend support
moving forward.
---------
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
Adds a switch to switch between viewing public collections only
(default) and all collections on org dashboard.
Also updates the `house-fill` icon to `house` in a couple places
(@Shrinks99)
---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
- Merges existing collection content into one page
- Updates ArchiveWeb.page link
- Adds redirect from /collections → /collection
- Moves content relevant to presentation & sharing out of the intro
- Adds new content about sharing collections!
---------
Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
Co-authored-by: sua yoo <sua@webrecorder.org>
Fixes https://github.com/webrecorder/browsertrix/issues/2383
- Fixes unpredictable sort order when typing in collection page URL
- Fixes page URL results flickering in and out while typing
---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
- Updates links to running crawls to redirect to workflow "Watch" tab
- Removes unused "Jump to crawl" superadmin widgets
- Refactors archived item component to remove references to active
crawls
- Moves page count out from under "Size" label in archived item detail
- Renames "Pages Crawled" to "Pages" in archived item leading heading
and detail overview
- Renames "Crawl ID" to "Archived Item ID"
---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
- add ensure_page_limit_quotas() which sets the config limit to the max
pages quota, if any
- set the page limit on the config when: creating new crawl, creating
configmap
- don't set the quota page limit on new or existing crawl workflows
(remove setting it on new workflows) to allow updated quotas to take
affect for next crawl
- frontend: correctly display page limit on workflow settings page from
org quotas, if any.
- operator: get org on each sync in one place
- fixes#2369
---------
Co-authored-by: sua yoo <sua@webrecorder.org>
- Displays workflow form as collapsible sections
- Combines run now toggle into submit
- Fixes exclusion field errors not preventing form submission
- Refactors `<btrix-observable>` into new `Observable` controller
---------
Co-authored-by: emma <hi@emma.cafe>
- Refactors dashboard and org profile preview to use private API
endpoint, to fix public collections not showing when the org
visibility is hidden
- Adds additional sorting options for collections
- Adds unique page url counts for archived items, collections, and
organizations to backend and exposes this in collections
- Shows collection period (i.e. `dateEarliest` to `dateLatest`) in
collections list
- Shows same collection metadata in private and public views, updates
private view info bar
- Fixes "Update Org Profile" action item showing for crawler roles
---------
Co-authored-by: sua yoo <sua@webrecorder.org>
Co-authored-by: sua yoo <sua@suayoo.com>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Fixes issue where collection thumbnail is always the screenshot, even if
a Browsertrix provided default thumbnail is selected after choosing the
screenshot.
Fixes#2257
This is a follow-up to the public collections work, which adds pages to
the database for uploads. All crawls and uploads now have a `pageCount`
field which is populated when the item is successfully added. A new
migration is also added to populate the field for existing archived
items that don't have it set yet.
OrgMetrics have also been modified to include `crawlPageCount` and
`uploadPageCount`, and to include the total of both in `pageCount`, and
all three included in the frontend org dashboard.
The frontend has been updated to use `pageCount` rather than
`stats.done` wherever appropriate, meaning that in archived item lists
and details we now have a consistent page count for both crawls and
uploads.
### New functionality
- Deploy this branch
- Create new crawls and uploads and verify that page count appears
correctly throughout the frontend for all new crawls and uploads
### Migration
- Deploy from latest main
- Create some crawls and uploads
- Change to this branch and re-deploy
- Verify migration ran without errors in backend logs
- Verify that page count has been populated successfully by checking
archived items lists, crawl and upload detail pages, and dashboard to
ensure there are no longer any missing page counts.
---------
Co-authored-by: emma <hi@emma.cafe>
Fixes#2259
This PR brings backend and frontend support for the new autoclick
behavior in Browsertrix, introduces in Browsertrix 1.5.0+
On the backend, we introduce `min_autoclick_crawler_image` to
`values.yaml`, with a default value of
`"docker.io/webrecorder/browsertrix-crawler:1.5.0"`. If this is set and
the crawler version for a new crawl is less than this value, the
autoclick behavior is removed from the behaviors list in the configmap
created for the crawl.
The one caveat for this is that a crawler image tag like "latest" will
always be parsed as greater than `min_autoclick_crawler_image`, so there
is the potential for the crawler to run into issues if using a
non-numeric image tag with an older version of the crawler. For
production we use hardcoded specific versions of the crawler except for
the dev channel, which from here on out will including autoclick
support, so I think this should be okay (and is also true of the
existing implementation for checking `min_qa_crawler_image`).
On the frontend, I've added a checkbox (unchecked by default) in the
"Limits" section just below the current checkbox for autoscroll. We
might want to move these to a different section eventually - I'm not
sure Limits is the right place for them - but I wanted to be consistent
with things as they are.
---------
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
Fixes https://github.com/webrecorder/browsertrix/issues/2306
## Changes
Refactors collection view configuration to wait for thumbnail preview
image (using `URL.createObjectURL`, like in QA screenshots) to be fully
loaded from `replay-web-page` before saving.
Resolves https://github.com/webrecorder/browsertrix/issues/2298
## Changes
- Slugs added to collections, can be specified separately when creating
or updating collections or else is based off of supplied collection name
- Migration added to backfill slugs for existing collections
- Redirect collection to newest slug if changed
- Adds option to copy public profile link to "Public Collections" action
menu
- Show "Back to <Org>" link instead of breadcrumbs
---------
Co-authored-by: sua yoo <sua@suayoo.com>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
- Upgrades webpack and webpack tool versions
- Updates dev source map to webpack recommendation
- Implements `webpack.DllPlugin` in dev for faster rebuilds
- Implements `thread-loader` to run `ts-loader` in a worker pool
- Disables saving collection start page if valid snapshot is not
selected
- Shows full URL in page URL status check mark
- Shows error in page URL status exclamation mark
- Fixes pasting in URL
- Refactors instances of `btrix-tab-list` except in workflow editor in
preparation for https://github.com/webrecorder/browsertrix/issues/2169
- Removes the visual space above navigation item since many tab headings
describe the first section in the tab, rather than the entire tab itself
Fixes#2260
- Adds `lastCrawlFinished` to Organization model, updated after crawls
are added/deleted and with an idempotent migration to backfill existing
orgs
- Adds Last Crawl column to end of admin orgs list table
- Adds subscription icon next to existing status icon in orgs list
- Adds "lastCrawlFinished", "subscriptionStatus", and "subscriptionPlan"
sort options to orgs list backend endpoint in anticipation of future
sorting/filtering of orgs list
---------
Co-authored-by: emma <hi@emma.cafe>
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>