- follow-up to: #2736: remove '^' custom prefix URLs to avoid accumulating '^' via utility function
- Show URL prefix list in settings for custom prefix scope.
- Update user guide with correct custom prefix field.
---------
Co-authored-by: sua yoo <sua@webrecorder.org>
- Automatically compute prefix from starting URL, if no other prefix is
set in custom prefix mode.
- Ensure each prefix is actually a prefix: add '^' to each custom prefix
URL, as include URL path is a regex
- rename 'Extra URL Prefixes' to just 'URL Prefixes' and adjust help
text to indicate that the prefix list is the list that is in scope
- fixes#2735, follow up to #2722
---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
Co-authored-by: sua yoo <sua@webrecorder.org>
Add docs about path / virtual 'access_addressing_style' that is
available for each storage option.
---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
- Rename 'Modifying Running Crawls' to 'Running Crawls'
- Add section about pausing/resuming crawls, and that paused crawls will eventually become stopped if not resumed.
- Add new crawl pausing, paused, resuming statuses and icons.
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
- Adds `crawler_network_policy_additional_egress` setting, to add egress
rules to the existing crawler network policy. Useful for when you want
to allow-list a single IPs without replacing the whole network policy.
- Adds docs about `crawler_network_policy_additional_egress` to the customization page.
- Resolves#2121
---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
- Updates status icons & colors in several places in the app
- Moves "Action Menus" and updated "Status Indicators" design docs from
public docs to Storybook
- [Storybook] Adds `remark-gfm` to enable tables in MDX
- [Storybook] Adds a custom `ColorSwatch` block
- [Browsertrix Docs] Swaps out custom colors and fonts included with
docs for color variables from Hickory and Webrecorder CDN's hosted font
files, respectively
---------
Co-authored-by: sua yoo <sua@suayoo.com>
Resolves https://github.com/webrecorder/browsertrix/issues/2629
## Changes
- Fixes user guide not opening to the correct page when not using the
workflow editor
- Fixes out of date instructions in "starting a crawl" user guide
- Updates user guide so that the content makes more sense for both
logged in and non-logged in users, including moving the introduction
section so that the user guide navigation categories are all displayed
(see screenshot)
## Screenshots
| Page | Image/video |
| ---- | ----------- |
| Dashboard | <img width="517" alt="Screenshot 2025-05-27 at 5 09 07 PM"
src="https://github.com/user-attachments/assets/481ac817-d591-4ca9-a4be-532fad586fcf"
/> |
<!-- ## Follow-ups -->
---------
Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
- Update the docs on k3s deployment for installing `ingress-nginx`, fixes
#2619.
- Also fix the indentation on the code blocks so markdown carries on list
numbering. At the moment the numbering confusingly resets after point 3.
- Update indentation on all code blocks so they show up as part of list +
wrap long commands.
---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
- Handles `paused` workflow state.
- Adds "Copy Crawl ID" and "View Archived Item" buttons to workflow
detail
- Fixes file size not updating in workflow crawls list
- Fixes superadmin banner showing over workflow tabs
- Refactors workflow detail API calls to use `Task` to improve poll
performance.
- Fixes execution time rendering when less than a minute
---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Follows https://github.com/webrecorder/browsertrix/issues/2603
## Changes
- Updates documentation on "Latest Crawl" tab
- Fixes extra fetch in workflow detail page
- Reverts workflow detail labels from "Duration" back to "Run Duration"
and "Pages" back to "Pages Crawled"
Resolves https://github.com/webrecorder/browsertrix/issues/2560
## Changes
- Syncs workflow current form section with user guide section.
- Stickies "User Guide" button to top of viewport so that user guide can
be opened.
- Makes content behind user guide clickable (fixes issues with stickied
elements shifting when user guide is not contained to the parent
element.)
- Decreases size of user guide text when embedded in an iframe.
- Refactors overflow scrim to reuse CSS variables.
---------
Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
- Displays behavior logs wherever error logs are shown
- Makes page URL in detail dialog clickable rather than in row column to
prevent accidental navigation
- Rename "Download Logs" -> "Download All Logs" and add tooltip with
additional context
---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Resolves#2504
## Changes
- Allows users to customize autoclick selector in workflows
- Refactors `btrix-syntax-input` to support rendering label and help
text `sl-input`
- Show autoclick selector in workflow / crawl settings
- Adds 'clickSelector' with default of 'a' to backend crawl config.
---------
Co-authored-by: sua yoo <sua@suayoo.com>
Co-authored-by: sua yoo <sua@webrecorder.org>
Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
- add 'imagePullPolicy' field to each crawler channel declaration
- if unset, defaults to the setting in the existing
'crawler_image_pull_policy' field.
fixes#2522
---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
- Moves "Per-Page Limits" fields to new "Page Behavior" section
- Fixes workflow settings closing tags with refactor to how sections are
rendered
- Updates user guide with behaviors documentation
---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
- Updated how collections gallery and presentation and sharing pages
- Collections gallery page content extracted from blog post, linked from blog post
- Each page has one video covering the gallery setting and individual collection presentation
- Cleaned up text on both to avoid duplicated content (thanks @DaleLore)
---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: DaleLore <DaleLoreNY@gmail.com>
Fixes#2406
Converts migration 0042 to launch a background job (parallelized across
several pods) to migrate all crawls by optimizing their pages and
setting `version: 2` on the crawl when complete.
Also Optimizes MongoDB queries for better performance.
Migration Improvements:
- Add `isMigrating` and `version` fields to `BaseCrawl`
- Add new background job type to use in migration with accompanying
`migration_job.yaml` template that allows for parallelization
- Add new API endpoint to launch this crawl migration job, and ensure
that we have list and retry endpoints for superusers that work with
background jobs that aren't tied to a specific org
- Rework background job models and methods now that not all background
jobs are tied to a single org
- Ensure new crawls and uploads have `version` set to `2`
- Modify crawl and collection replay.json endpoints to only include
fields for replay optimization (`initialPages`, `pageQueryUrl`,
`preloadResources`) if all relevant crawls/uploads have `version` set to
`2`
- Remove `distinct` calls from migration pathways
- Consolidate collection recompute stats
Query Optimizations:
- Remove all uses of $group and $facet
- Optimize /replay.json endpoints to precompute preload_resources, avoid
fetching crawl list twice
- Optimize /collections endpoint by not fetching resources
- Rename /urls -> /pageUrlCounts and avoid $group, instead sort with
index, either by seed + ts or by url to get top matches.
- Use $gte instead of $regex to get prefix matches on URL
- Use $text instead of $regex to get text search on title
- Remove total from /pages and /pageUrlCounts queries by not using
$facet
- frontend: only call /pageUrlCounts when dialog is opened.
---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
- Merges existing collection content into one page
- Updates ArchiveWeb.page link
- Adds redirect from /collections → /collection
- Moves content relevant to presentation & sharing out of the intro
- Adds new content about sharing collections!
---------
Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
Co-authored-by: sua yoo <sua@webrecorder.org>
Resolves https://github.com/webrecorder/browsertrix/issues/2298
## Changes
- Slugs added to collections, can be specified separately when creating
or updating collections or else is based off of supplied collection name
- Migration added to backfill slugs for existing collections
- Redirect collection to newest slug if changed
- Adds option to copy public profile link to "Public Collections" action
menu
- Show "Back to <Org>" link instead of breadcrumbs
---------
Co-authored-by: sua yoo <sua@suayoo.com>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
- Renames `inject_analytics` to `inject_extra` and updates docs
- Manually tracks page views to enable passing custom props
- Tracks copying collection share link and downloading a public
collection
---------
Co-authored-by: emma <hi@emma.cafe>
- Enables creating a public org profile page with description and
website at `/profile/<org slug>`
- Updates current "Overview" page to be "Dashboard", found under
`/dashboard`
- Organizes org "General" settings tab by "General", "Profile", and
"Developer Tools"
- Adds sign up banner to log in page for consistent CTA banners
- Updates copy and docs to support changes
- Allows user to set collection to private, public, or unlisted
- Adds route for public collection page with basic page layout
- Refactors copy button to abstract clipboard functionality
---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: emma <hi@emma.cafe>
Fixes#2170
The number of days to delay file replication deletion by is configurable
in the Helm chart with `replica_deletion_delay_days` (set by default to
7 days in `values.yaml` to encourage good practice, though we could
change this).
When `replica_deletion_delay_days` is set to an int above 0, when a
delete replica job would otherwise be started as a Kubernetes Job,
a CronJob is created instead with a cron schedule set to run yearly,
starting x days from the current moment. This cronjob is then deleted by
the operator after the job successfully completes. If a failed
background job is retried, it is re-run immediately as a Job rather
than being scheduled out into the future again.
---------
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
Closes#2222
Adds a runtime script that gets set to either inject the plausible
script tags, or do nothing, that runs at initialization of the frontend
container.
This PR syncs the weblate branch back to main, and adds:
- New Spanish translations contributed in weblate!
- New GH action which reformats weblate commits with `localize:extract`,
ensuring strings are not changed, and also runs `localize:build` to
extract templates (as tested in PR: #2207)
- Reformats existing XLIFF files to match `localize:extract`, including
adding self-closing tags.
- Update on generated templates
- Prettier: disable alpha sorting XML attributes to hopefully minimize
conflcits
- Strings: change some strings that use `&` to be wrapped in html`` to
ensure proper unencoding, or just switch to 'and' where that's not
possible.
---------
Co-authored-by: Weblate (bot) <hosted@weblate.org>
Co-authored-by: Webrecorder Dev <dev@webrecorder.org>
Co-authored-by: Kamborio <Kamborio15@users.noreply.hosted.weblate.org>
Co-authored-by: Clara Itzel <missclaraitzel@gmail.com>
Co-authored-by: weblate <weblate@users.noreply.github.com>
Co-authored-by: emma <hi@emma.cafe>
Co-authored-by: sua yoo <sua@webrecorder.org>
can either link to redoc hosted elsewhere or make a local copy:
- for local frontend build, just redirect to
http://localhost:30870/api/redoc
- for deployment, make local copy: run copy-api-docs.sh, copy locally
from prod and serve at /api/
- copy-api-docs.sh copies openapi.json, redoc and logo to /api/ dir
- if analytics enabled, also injects analytics scripts
- for local testing, run copy-api-docs.sh and then run mkdocs serve
- ci: copy from prod server
- fixes#1582
- Adds a missing `/docs` to the docs edit urls
- Adds instructions for installing `mkdocs-material` with pipx & uvx,
and describes correct place to run `mkdocs serve` from
- Fixes edit url, opens in a new tab
- Adds Plausible
---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
languages specified in `translatedLocales`
- Updates user language preference beta badge text
- Adds link to translation contribution docs
- Refactors component name
---------
Co-authored-by: SuaYoo <SuaYoo@users.noreply.github.com>
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: emma <hi@emma.cafe>
- Shows language selector on log in page
- Adds tabs to account settings
- Adds language preference selector on account settings
- Adds issue template for new language
- Updates dev docs for frontend localization
---------
Co-authored-by: SuaYoo <SuaYoo@users.noreply.github.com>
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
Fixes#2106
Docs are now hosted as part of the frontend at `/docs` by default.
- If `docs_url` is set in the helm chart, the `/docs` endpoint will
redirect to that endpoint instead
- Use multi-stage python image to build mkdocs as part of frontend, then
copy static output
- Dir layout: mkdocs.yml and docs into frontend/docs
- CI: Update docs build GH action to use new path
- Update all frontend paths to use `/docs/` instead of
`https://docs.browsertrix.com/`
---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>