This PR adds a new checkbox to both page and seed crawl workflow types,
which will fail the crawl if behaviors detect the browser is not logged
in for supported sites.
Changes include:
- Backend support for the new crawler flag
- A new `failed_not_logged_in` crawl state
- Checkbox workflow editor and config details in the frontend (currently
in the Scope section - I think it makes sense to have this option up
front, but worth considering)
- User Guide documentation of new option
- A new nightly test for the new workflow option and
`failed_not_logged_in` state
---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: sua yoo <sua@webrecorder.org>
Resolves https://github.com/webrecorder/browsertrix/issues/2764
## Changes
Uses crawl first seed as starting URL instead of workflow first seed to
fix replay after saving workflow without running.
## Manual testing
1. Log in as crawler
2. Create a new workflow and crawl a single page, for example,
https://example.com/
3. Edit the workflow to change starting URL to https://example.org/
4. Save without running
5. Go to latest crawl tab, verify replay loads https://example.com/
Resolves#2646
Depends on #2710
## Changes
(Copied from #2689)
- Allows users to specify URL list as file.
- Allow uploading a text file of URLs
- Allow specifying >100 URLs into URL list, where they will turn into an uploaded list automatically.
---------
Co-authored-by: sua yoo <sua@suayoo.com>
- Fix race condition related to browser commit time
- The profile commit request waits for browser to actual finish, and
profile saved. This can cause request to time out, resulting in a retry,
in which the browser has already been closed.
- With these changes, the commit is now 'idempotent' and returns a
waiting_for_browser until the profile is actually committed.
- On frontend, keep pinging commit endpoint with a timeout while 'waiting_for_browser' is returned, actual committed when endpoint returns profile id.
---------
Co-authored-by: sua yoo <sua@suayoo.com>
Resolves#2718
## Changes
- Enables manual QA review for successfully finished crawls.
- Individual pages and full crawl can be reviewed without assistive QA running
- Show replay, screenshot and text without comparison if no assistive QA yet.
- follow-up to: #2736: remove '^' custom prefix URLs to avoid accumulating '^' via utility function
- Show URL prefix list in settings for custom prefix scope.
- Update user guide with correct custom prefix field.
---------
Co-authored-by: sua yoo <sua@webrecorder.org>
- Automatically compute prefix from starting URL, if no other prefix is
set in custom prefix mode.
- Ensure each prefix is actually a prefix: add '^' to each custom prefix
URL, as include URL path is a regex
- rename 'Extra URL Prefixes' to just 'URL Prefixes' and adjust help
text to indicate that the prefix list is the list that is in scope
- fixes#2735, follow up to #2722
---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
Co-authored-by: sua yoo <sua@webrecorder.org>
## Changes
- Deletes and rewrites arrays in URL search params in workflow list when
editing array filters (i.e. tags & profiles)
- Removes a missed `console.log`
- bump to 1.17.3
cc @SuaYoo
---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Fixes#2721
This PR removes frontend logic that set the seed-level scopeType for
custom page prefix workflows to `prefix`, which was causing the scope to
balloon larger than what users intended for some workflows.
Resolves https://github.com/webrecorder/browsertrix/issues/2660
## Changes
- Enables filtering workflow list by tag
- Displays tags near workflow name in detail view
- Adds `<btrix-filter-chip>` component
- Migrates "schedule state", "only running", and "only mine" filters
- Adds basic documentation to Storybook
---------
Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
Connected to #2661
- Removes crawl workflows from being returned as part of the profile
response.
- Frontend: removes display of workflows in profile details.
- Adds 'inUse' flag to all profile responses to indicate profile is in
use by at least one workflow
- Adds 'profileid' as possible filter for workflows search in
preparation for filtering by profile id (#2708)
- Make 'profile_in_use' a proper error (returning 400) on profile
delete.
---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
No issue created, quick fix for edge case
## Changes
Adds ID to accept page toast messages so that "Please log in ..."
message closes once the invite is accepted or declined.
## Manual testing
1. Log in as org admin
2. Invite user (one you have access to) to an org
3. Log out and log in as invited user
4. Click invite link in inbox
5. Click accept quickly. Verify "Please log in ..." message is replaced
with "You've joined ..."
## Changes
Following
bbd5fb81c4,
since the banner is shown throughout the duration of the trial, it should be
made informational at the beginning of the trial so that it's not as obtrusive.
- only set state to 'paused' if shoudPause is true and crawl is still
running (using FAILED_STATES list)
- treat failed/canceled crawl as inactive, don't show replay (using
RUNNING_STATES list)
---------
Co-authored-by: sua yoo <sua@webrecorder.org>
Resolves https://github.com/webrecorder/browsertrix/issues/2650
## Changes
Differentials between `trialing` and `trialing_canceled` when displaying
messages:
- No changes to messages if `trialing_canceled`.
- If `trialing`, show messaging that subscription will automatically
continue.
---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
- Updates status icons & colors in several places in the app
- Moves "Action Menus" and updated "Status Indicators" design docs from
public docs to Storybook
- [Storybook] Adds `remark-gfm` to enable tables in MDX
- [Storybook] Adds a custom `ColorSwatch` block
- [Browsertrix Docs] Swaps out custom colors and fonts included with
docs for color variables from Hickory and Webrecorder CDN's hosted font
files, respectively
---------
Co-authored-by: sua yoo <sua@suayoo.com>
Fixes#2636
## Changes
- Displays trials scheduled for cancellation alongside non-trials
scheduled for cancellation
- Adds filter for "bad states" — active orgs that have a cancelled
subscription, orgs with a cancellation date in the past, and empty
subscription ids currently, but could be extended as necessary
- Displays scheduled-for-cancellation trials in the "trialing" filter as
well
- Improves display of future cancellation durations for both active
subscriptions and trials
- Surfaces issues where a trial cancellation was scheduled for the past
but the org is still active
- Swaps out `sl-tooltip`s for `btrix-popover`s in popovers with longer
details
- Adds correct heading levels, `tabindex`, and orientation for popovers
in use here
## Follow-ups
Once #2637 is merged we can ~~swap out the `sl-tooltip`s for
`btrix-popover`s here~~ _done!_ & in the superadmin stats card
Resolves https://github.com/webrecorder/browsertrix/issues/2629
## Changes
- Fixes user guide not opening to the correct page when not using the
workflow editor
- Fixes out of date instructions in "starting a crawl" user guide
- Updates user guide so that the content makes more sense for both
logged in and non-logged in users, including moving the introduction
section so that the user guide navigation categories are all displayed
(see screenshot)
## Screenshots
| Page | Image/video |
| ---- | ----------- |
| Dashboard | <img width="517" alt="Screenshot 2025-05-27 at 5 09 07 PM"
src="https://github.com/user-attachments/assets/481ac817-d591-4ca9-a4be-532fad586fcf"
/> |
<!-- ## Follow-ups -->
---------
Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
Fixes#2425
## Changed
- Switch backend to primarily using number of browser windows rather
than scale multiplier (including migration to calculate `browserWindows`
from `scale` for existing workflows and crawls)
- Still support `scale` in addition to `browserWindows` in input models
for creating and updating workflows and re-adjusting live crawl scale
for backwards compatibility
- Adds new `max_browser_windows` value to Helm chart, but calculates the
value from `max_crawl_scale` as fallback for users with that value
already set in local charts
- Rework frontend to allow users to select multiples of
`crawler_browser_instances` or any value below
`crawler_browser_instances` for browser windows. For instance, with
`crawler_browser_instances=4` and `max_browser_windows=8`, the user
would be presented with the following options: 1, 2, 3, 4, 8
- Sets maximum width of screencast to image width returned by `message`
---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: sua yoo <sua@suayoo.com>
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
- Sticks workflow form save/run buttons to the viewport if all the
required fields are filled
- Adds keyboard shortcuts to save (cmd/ctrl + S to save, cmd/ctrl +
Enter to save and run)
- Adds "Cancel" button to new workflow
- Handles `paused` workflow state.
- Adds "Copy Crawl ID" and "View Archived Item" buttons to workflow
detail
- Fixes file size not updating in workflow crawls list
- Fixes superadmin banner showing over workflow tabs
- Refactors workflow detail API calls to use `Task` to improve poll
performance.
- Fixes execution time rendering when less than a minute
---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
- Renames "Running Crawls" -> "Active Crawls" in superadmin app bar
- Shows number of active crawls next to link
- Refreshes active crawl list every 30 seconds
- Standardizes browser title
- add 'pause' crawl state (fixes#2567)
- gracefully shut down crawler pods, and then redis pod when paused
- crawler uploads WACZ before shutting down (dependent on
webrecorder/browsertrix-crawler#824, supported in 1.6.1+)
- add 'paused_at' on crawl spec to indicate when crawl is paused
- support max pause time limit, after which crawl becomes automatically
stopped.
- add 'stopped_pause_expired' when pause automatically expires and crawl
is stopped
- /crawl/<id>/{pause,resume} apis to toggle 'paused' on crawl spec
- ui: add pause/resume button, paused state (partially addresses #2568)
- ui: add pausing/resuming derivative states when crawl is running and
pausing, or paused and not pausing (partially addresses #2569)
- Designed to work with crawler 1.6.1+ which support pausing + uploading on pause
Work on #2566, Fixes#2576
---------
Co-authored-by: sua yoo <sua@webrecorder.org>
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
Co-authored-by: sua yoo <sua@suayoo.com>
Follows https://github.com/webrecorder/browsertrix/issues/2603
## Changes
- Updates documentation on "Latest Crawl" tab
- Fixes extra fetch in workflow detail page
- Reverts workflow detail labels from "Duration" back to "Run Duration"
and "Pages" back to "Pages Crawled"