This PR adds a new checkbox to both page and seed crawl workflow types,
which will fail the crawl if behaviors detect the browser is not logged
in for supported sites.
Changes include:
- Backend support for the new crawler flag
- A new `failed_not_logged_in` crawl state (sketched below)
- Checkbox workflow editor and config details in the frontend (currently
in the Scope section - I think it makes sense to have this option up
front, but worth considering)
- User Guide documentation of new option
- A new nightly test for the new workflow option and
`failed_not_logged_in` state
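A minimal sketch of how the new state might fit into the backend's crawl states; the enum members shown and the helper name are assumptions, not the actual Browsertrix code:

```python
from enum import Enum

class CrawlState(str, Enum):
    # Illustrative subset of existing terminal states
    COMPLETE = "complete"
    FAILED = "failed"
    CANCELED = "canceled"
    # New state: behaviors reported the browser was not logged in
    FAILED_NOT_LOGGED_IN = "failed_not_logged_in"

FAILED_STATES = {CrawlState.FAILED, CrawlState.FAILED_NOT_LOGGED_IN}

def is_failed(state: CrawlState) -> bool:
    """Group the new state with other failures for filtering and UI."""
    return state in FAILED_STATES
```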
---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: sua yoo <sua@webrecorder.org>
Resolves #2646
Depends on #2710
## Changes
(Copied from #2689)
- Allows users to specify the URL list as a file.
- Allows uploading a text file of URLs.
- Allows specifying more than 100 URLs in the URL list, which are automatically converted into an uploaded list.
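A sketch of the auto-conversion rule; the threshold constant and function name below are assumptions based on the description above:

```python
URL_LIST_MAX_INLINE = 100  # assumed cutoff from the PR description

def prepare_url_list(urls: list[str]) -> dict:
    """Keep short lists inline; turn longer ones into an uploaded file."""
    if len(urls) > URL_LIST_MAX_INLINE:
        content = "\n".join(urls).encode("utf-8")
        return {"kind": "uploaded_file", "content": content}
    return {"kind": "inline", "seeds": urls}
```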
---------
Co-authored-by: sua yoo <sua@suayoo.com>
- Follow-up to #2736: removes leading '^' from custom prefix URLs via a utility function, to avoid accumulating '^' characters (see the sketch below).
- Shows URL prefix list in settings for custom prefix scope.
- Updates user guide with correct custom prefix field.
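The utility in question might look like this sketch (name hypothetical):

```python
def strip_caret_prefix(url: str) -> str:
    """Remove any leading '^' characters so repeated edits in the
    workflow editor do not accumulate '^' prefixes."""
    return url.lstrip("^")
```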
---------
Co-authored-by: sua yoo <sua@webrecorder.org>
Resolves https://github.com/webrecorder/browsertrix/issues/2629
## Changes
- Fixes user guide not opening to the correct page when not using the
workflow editor
- Fixes out of date instructions in "starting a crawl" user guide
- Updates user guide so that the content makes more sense for both
logged-in and non-logged-in users, including moving the introduction
section so that the user guide navigation categories are all displayed
(see screenshot)
## Screenshots
| Page | Image/video |
| ---- | ----------- |
| Dashboard | <img width="517" alt="Screenshot 2025-05-27 at 5 09 07 PM"
src="https://github.com/user-attachments/assets/481ac817-d591-4ca9-a4be-532fad586fcf"
/> |
---------
Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
Fixes #2425
## Changed
- Switches backend to primarily using number of browser windows rather
than scale multiplier (including a migration to calculate
`browserWindows` from `scale` for existing workflows and crawls)
- Still supports `scale` in addition to `browserWindows` in input models
for creating and updating workflows and re-adjusting live crawl scale,
for backwards compatibility
- Adds new `max_browser_windows` value to Helm chart, but calculates the
value from `max_crawl_scale` as a fallback for users with that value
already set in local charts
- Reworks frontend to allow users to select multiples of
`crawler_browser_instances`, or any value below
`crawler_browser_instances`, for browser windows. For instance, with
`crawler_browser_instances=4` and `max_browser_windows=8`, the user
is presented with the options 1, 2, 3, 4, and 8 (see the sketch after
this list)
- Sets maximum width of screencast to the image width returned by
`message`
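A minimal Python sketch of the assumed option-building rule (every value up to `crawler_browser_instances`, then its multiples) and the assumed migration rule; all names here are hypothetical:

```python
def browser_window_options(instances: int, max_windows: int) -> list[int]:
    """Every value up to `instances`, then multiples of `instances`
    up to `max_windows`.
    E.g. instances=4, max_windows=8 -> [1, 2, 3, 4, 8]."""
    options = list(range(1, min(instances, max_windows) + 1))
    multiple = instances * 2
    while multiple <= max_windows:
        options.append(multiple)
        multiple += instances
    return options

def browser_windows_from_scale(scale: int, instances: int) -> int:
    """Assumed migration rule: `scale` was a multiplier on the number
    of browser instances per crawler pod."""
    return scale * instances
```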
---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: sua yoo <sua@suayoo.com>
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
Resolves https://github.com/webrecorder/browsertrix/issues/2560
## Changes
- Syncs current workflow form section with user guide section.
- Stickies "User Guide" button to top of viewport so that user guide can
be opened.
- Makes content behind user guide clickable (fixes issues with stickied
elements shifting when the user guide is not contained within the parent
element)
- Decreases size of user guide text when embedded in an iframe.
- Refactors overflow scrim to reuse CSS variables.
---------
Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
Resolves https://github.com/webrecorder/browsertrix/issues/2366
## Changes
Allows users to update current crawl with newly saved workflow settings.
## Manual testing
1. Log in as crawler
2. Start a crawl
3. Go to edit workflow. Verify "Update Crawl" button is shown
4. Click "Update Crawl". Verify crawl is updated with new settings
---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
- Upgrades typescript-eslint to a more performant version and related
dependencies. Note that these dependencies were not upgraded to the
latest version to avoid upgrading to eslint 9 at this time.
- Upgrades Lit by one minor version
Resolves https://github.com/webrecorder/browsertrix/issues/2513
## Changes
- Allows org admins to set custom behaviors as crawling defaults
- Shows warning text if both autoscroll/autoclick and custom behaviors
are enabled
- Refactors `infoTextStrings` -> `infoTextFor` to match other
label/string matchers
---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Resolves #2504
## Changes
- Allows users to customize autoclick selector in workflows
- Refactors `btrix-syntax-input` to support rendering label and help
text like `sl-input`
- Shows autoclick selector in workflow / crawl settings
- Adds `clickSelector` with default of `a` to backend crawl config.
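As a rough sketch, the backend addition might look like the following (the model name is hypothetical; only the `clickSelector` field and its `a` default come from this PR):

```python
from pydantic import BaseModel

class CrawlConfigBehaviors(BaseModel):
    # CSS selector that the autoclick behavior targets; defaults to
    # plain anchor elements
    clickSelector: str = "a"
```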
---------
Co-authored-by: sua yoo <sua@suayoo.com>
Co-authored-by: sua yoo <sua@webrecorder.org>
Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
- Displays built-in behaviors as single field in workflow settings
- Standardizes how "None" is displayed in workflow settings
- Refactors behavior names into enum
- Moves "Per-Page Limits" fields to new "Page Behavior" section
- Fixes workflow settings closing tags with refactor to how sections are
rendered
- Updates user guide with behaviors documentation
---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
- Displays workflow form as collapsible sections
- Combines run now toggle into submit
- Fixes exclusion field errors not preventing form submission
- Refactors `<btrix-observable>` into new `Observable` controller
---------
Co-authored-by: emma <hi@emma.cafe>
Fixes #2259
This PR brings backend and frontend support for the new autoclick
behavior in Browsertrix, introduced in Browsertrix Crawler 1.5.0+.
On the backend, we introduce `min_autoclick_crawler_image` to
`values.yaml`, with a default value of
`"docker.io/webrecorder/browsertrix-crawler:1.5.0"`. If this is set and
the crawler version for a new crawl is less than this value, the
autoclick behavior is removed from the behaviors list in the configmap
created for the crawl.
The one caveat for this is that a crawler image tag like "latest" will
always be parsed as greater than `min_autoclick_crawler_image`, so there
is the potential for the crawler to run into issues if using a
non-numeric image tag with an older version of the crawler. For
production we use hardcoded specific versions of the crawler except for
the dev channel, which from here on out will include autoclick
support, so I think this should be okay (and is also true of the
existing implementation for checking `min_qa_crawler_image`).
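For illustration, a sketch of that version gate using the `packaging` library; the function and parameter names are assumptions, not the actual backend code:

```python
from packaging.version import InvalidVersion, Version

def autoclick_supported(crawler_image: str, min_image: str) -> bool:
    """Compare image tags; non-numeric tags like 'latest' are treated
    as newest, matching the caveat described above."""
    def parse_tag(image: str) -> Version | None:
        tag = image.rsplit(":", 1)[-1]
        try:
            return Version(tag)
        except InvalidVersion:
            return None  # e.g. "latest"

    crawler_version = parse_tag(crawler_image)
    min_version = parse_tag(min_image)
    if crawler_version is None or min_version is None:
        return True
    return crawler_version >= min_version
```

Under this sketch, with the default `min_autoclick_crawler_image`, an image tagged `1.4.2` would have autoclick removed from the configmap's behaviors list, while `latest` would keep it.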
On the frontend, I've added a checkbox (unchecked by default) in the
"Limits" section just below the current checkbox for autoscroll. We
might want to move these to a different section eventually - I'm not
sure Limits is the right place for them - but I wanted to be consistent
with things as they are.
---------
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
Closes #2223
- [x] Adds `localesAvailable` to `/api/settings` endpoint, and uses that
list if available, rather than the full list of translated locales, to
determine which options to display to users
- [x] ~~Uses the user's browser locales, filtered to the current
language setting, for formatting numbers, dates, and durations~~
- [x] Adds & persists checkbox for "use same language for formatting
dates and numbers" in user settings
- [x] Replaces uses of `sl-format-bytes` with `localize.bytes(...)`, and
`sl-format-date` with replacement `btrix-format-date` that properly
handles fallback locales
- [x] Caches all number/duration/datetime formatters by a combined key
consisting of app language, browser language, browser setting, and
formatter options so that all formatters can be reused if needed
(previously any formatter with non-default options would be recreated
every render; see the sketch below)
- [x] Splits out ordinal formatting from number formatter, as it didn't
make much sense in some non-English locales
- [x] Adds a little demo of date/time/duration/number formatting so you
can see what effect your language settings have
https://github.com/user-attachments/assets/724858cb-b140-4d72-a38d-83f602c71bc7
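The combined-key caching idea from the checklist above, sketched in Python for brevity (the real code caches `Intl` formatters in TypeScript; all names here are illustrative):

```python
_formatter_cache: dict[tuple, object] = {}

def cache_key(app_lang: str, browser_lang: str, use_app_lang: bool,
              options: dict) -> tuple:
    """One key per distinct combination of language settings and options."""
    return (app_lang, browser_lang, use_app_lang,
            tuple(sorted(options.items())))

def get_formatter(app_lang: str, browser_lang: str, use_app_lang: bool,
                  **options) -> object:
    key = cache_key(app_lang, browser_lang, use_app_lang, options)
    if key not in _formatter_cache:
        # Placeholder for building an Intl-style formatter; constructing
        # these objects is the expensive step the cache avoids per render.
        _formatter_cache[key] = ("formatter", key)
    return _formatter_cache[key]
```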
---------
Signed-off-by: emma <hi@emma.cafe>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
- Defaults to user's browser language preference when displaying dates,
bytes, and numbers (with caveats). This fixes an issue where numbers
were always formatted in English.
- Shows user's browser language preferences in language dropdown
- Fixes timezone not displayed in archived item detail start and finish
times
- Standardizes number formatting
- Sets timezone that unit tests run in
---------
Co-authored-by: SuaYoo <SuaYoo@users.noreply.github.com>
Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
Co-authored-by: emma-sg <emma-sg@users.noreply.github.com>
Resolves#1354
Supports crawling through pre-configured proxy servers, allowing users to select which proxy servers to use (requires Browsertrix Crawler 1.3+).
Config:
- proxies defined in btrix-proxies subchart
- can be configured via btrix-proxies key or a separate proxies.yaml file via the separate subchart
- proxies list is refreshed automatically when crawler_proxies.json changes, if the subchart is deployed
- support for ssh and socks5 proxies
- proxy keys added to secrets in subchart
- support for a default proxy that is always used if no other proxy is configured; prevents starting the cluster if the default proxy is not available
- prevents starting a manual crawl if a previously configured proxy is no longer available, returning an error (see the sketch after these lists)
- force 'btrix' username and group name on browsertrix-crawler non-root user to support ssh
Operator:
- support crawling through proxies, pass proxyId in CrawlJob
- support running profile browsers with a designated proxy, pass proxyId to ProfileJob
- prevent starting scheduled crawl if previously configured proxy is no longer available
API / Access:
- /api/orgs/all/crawlconfigs/crawler-proxies - get all proxies (superadmin only)
- /api/orgs/{oid}/crawlconfigs/crawler-proxies - get proxies available to particular org
- /api/orgs/{oid}/proxies - update allowed proxies for particular org (superadmin only)
- superadmin can configure which orgs can use which proxies, stored on the org
- superadmin can also allow an org to access all 'shared' proxies, to avoid having to allow a shared proxy on each org.
UI:
- Superadmin has 'Edit Proxies' dialog to configure, for each org, whether it has dedicated proxies and/or access to shared proxies.
- User can select a proxy in Crawl Workflow browser settings
- Users can choose to launch a browser profile with a particular proxy
- Display which proxy is used to create profile in profile selector
- Users can choose which default proxy to use for new workflows in Crawling Defaults
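A sketch of the pre-start guard mentioned under Config and Operator; the function signature is an assumption:

```python
def check_proxy_available(proxy_id: str | None,
                          available_proxies: set[str]) -> None:
    """Refuse to start a crawl whose previously configured proxy has
    been removed from the proxies available to the org."""
    if proxy_id and proxy_id not in available_proxies:
        raise ValueError(
            f"proxy {proxy_id!r} is no longer available; "
            "update the workflow before starting the crawl"
        )
```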
---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
Resolves https://github.com/webrecorder/browsertrix/issues/2073
### Changes
- Removes "URL List" and "Seeded Crawl" job type distinction and adds as
additional crawl scope types instead.
- 'New Workflow' button defaults to Single Page
- 'New Workflow' dropdown includes Page Crawl (Single Page, Page List, In-Page Links) and Site Crawl (Pages in Same Directory, Pages on Same Domain, + Subdomains, and Custom Page Prefix)
- Enables specifying `DOCS_URL` in `.env`
- Additional follow-ups in #2090, #2091
- Adds new org settings tab for updating crawl details
- Refactors `workflow-editor` to move out utility functions
- Updates user guide on org settings
---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>