- Displays behavior logs wherever error logs are shown
- Makes page URL in detail dialog clickable rather than in row column to
prevent accidental navigation
- Rename "Download Logs" -> "Download All Logs" and add tooltip with
additional context
---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Resolves#2504
## Changes
- Allows users to customize autoclick selector in workflows
- Refactors `btrix-syntax-input` to support rendering label and help
text `sl-input`
- Show autoclick selector in workflow / crawl settings
- Adds 'clickSelector' with default of 'a' to backend crawl config.
---------
Co-authored-by: sua yoo <sua@suayoo.com>
Co-authored-by: sua yoo <sua@webrecorder.org>
Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
Allows users to save a workflow with an empty "Link Selectors" table,
using the default value. This is aligned with how we use default values
for other empty inputs, and prevents a case where a user may
inadvertently removed a row and now cannot save a workflow with the
default link selector.
Also updates the info text to show the default value.
Backend work for #2524
This PR adds a second dedicated endpoint similar to `/errors`, as a
combined log endpoint would give a false impression of being the
complete crawl logs (which is far from what we're serving in Browsertrix
at this point).
Eventually when we have support for streaming live crawl logs in
`crawls/<id>/logs` I'd ideally like to deprecate these two dedicated
endpoints in favor of using that, but for now this seems like the best
solution.
---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Follow-up to #2152
Related to https://github.com/webrecorder/browsertrix/pull/2487
This PR provides very basic validation of the `config.selectLinks`
argument on workflow creation and update. Namely, it checks that:
- `config.selectLinks` is not an empty array
- Each entry consists of two non-empty text sequences separated by `->`
At this point we're not validating the actual CSS selector on the
backend, though we could add that down the road.
Tests have been added accordingly.
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Backend support for #2151
Adds support for specifying custom behaviors via a list of strings.
When workflows are added or modified, minimal backend validation is done
to ensure that all custom behavior URLs are valid URLs (after removing
the git prefix and custom query arguments).
A separate `POST /crawlconfigs/validate/custom-behavior` endpoint is
also added, which can be used to validate a custom behavior URL. It
performs the same syntax check as above and then:
- For URL directly to behavior file, ensures URL resolves and returns a
2xx/3xx status code
- For Git repositories, uses `git ls-remote` to ensure they exist (and
that branch exists if specified)
---------
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
- add 'imagePullPolicy' field to each crawler channel declaration
- if unset, defaults to the setting in the existing
'crawler_image_pull_policy' field.
fixes#2522
---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
- Displays built-in behaviors as single field in workflow settings
- Standardizes how "None" is displayed in workflow settings
- Refactors behavior names into enum
Ensure we're on the latest versions CI actions + python (except lint check, due to issue)
Also allow running the Microk8s tests on demand with workflow dispatch
Follow-up to #2495, actually ensure org subscription data is in included
in admin email response
---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
- Removes chart values that are unused
- Also change `local-mongo.default` -> `local-mongo`,
`local-minio.default` -> `local-minio` as some users have reported
issues with `.default` and it will certainly break if not deploying
Browsertrix in the `default `namespace.
if logged in user is not part of any orgs, still allow logging in,
instead of throwing an exception due to accessing non-existent org
---------
Co-authored-by: sua yoo <sua@suayoo.com>
Reported by @tw4l
Quick fix for the bug I introduced in 1bc3c35 in #2481. I didn't
properly test on the workflow editor in a "new workflow" state, and
didn't realize that the component that fetches the workflow state for an
existing workflow wouldn't be rendered for a new workflow, so the update
to the loading state never occurred for new workflows. This fix
explicitly sets `isCrawlRunning` to `false` instead of `null` for new
workflows, so that the loading state isn't displayed.
Tested locally with both new and existing workflows (in both non-running
and running states).
- fix jwt_token_lifetime being in hours, not minutes, remove extra * 60
- don't return userids in user list for org admins, instead just key
users by email, which is already unique
Part of https://github.com/webrecorder/browsertrix/issues/2366
## Changes
- Displays latest running crawl status when editing workflow
- Disables "Run Now" button if crawl is currently running
Currently, clicking "Run Now" will result in a preventable server error
if the crawl is already running. The change in this PR is in preparation
for being able to update a currently running crawl and doesn't require
any backend changes.
## Manual testing
1. Log in as crawler
2. Go to edit crawl workflow
3. Open same workflow in another tab
4. Run the workflow
5. Go back to edit tab. Verify "Starting" status is shown next to "Save"
button and "Run Crawl" button is disabled
## Screenshots
| Page | Image/video |
| ---- | ----------- |
| Edit Workflow | <img width="354" alt="Screenshot 2025-03-11 at 1 34
07 PM"
src="https://github.com/user-attachments/assets/02f7fb4a-219d-43a4-bb1f-1f2b40ac1480"
/> |
<!-- ## Follow-ups -->
---------
Co-authored-by: emma <hi@emma.cafe>
Hides "Back to [org name]" breadcrumb when viewing a public/unlisted
collection when the public gallery isn't enabled for the org (except
when logged into that org).
- Moves "Per-Page Limits" fields to new "Page Behavior" section
- Fixes workflow settings closing tags with refactor to how sections are
rendered
- Updates user guide with behaviors documentation
---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
- Updated how collections gallery and presentation and sharing pages
- Collections gallery page content extracted from blog post, linked from blog post
- Each page has one video covering the gallery setting and individual collection presentation
- Cleaned up text on both to avoid duplicated content (thanks @DaleLore)
---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: DaleLore <DaleLoreNY@gmail.com>
- Add /thumbnail collections endpoint to serve the thumbnail as an image for public
collections.
- Also fix uploading thumbnail images to use correct mime, if available.
Fixes#2459
- Set `/data/` as primary storage `access_endpoint_url` in nightly test
chart
- Modify nightly test GH Actions workflow to spawn a separate job per
nightly test module using dynamic matrix
- Set configuration not to fail other jobs if one job fails
- Modify failing tests:
- Add fixture to background job nightly test module so it can run alone
- Add retry loop to crawlconfig stats nightly test so it's less
dependent on timing
GitHub limits each workflow to 256 jobs, so this should continue to be
able to scale up for us without issue.
---------
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
RWP (2.3.3+) can determine if the 'Download Archive' menu item should be
showed based on the value of downloadUrl.
If set to 'null', will hide the menu item:
- set downloadUrl to public collection download for public collections
replay
- set downloadUrl to null for private collection and crawl replay to
hide the download menu item in RWP (otherwise have to add the
auth_header query with bearer token and should assess security before
doing that..)
---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
Resolves https://github.com/webrecorder/browsertrix/issues/2452
## Changes
- Displays page count and collection size in listing grid
- Displays month if collection period is in the same year
- Displays collection size in About > Details section
- Minor refactor: move byte formatting into `localize.ts` utility file,
move slash (`/`) separator into own utility file
- should avoid gunicorn worker timeouts for long running migrations,
also fixes#2439
- add main_migrations as entrypoint to just run db migrations, using
existing init_ops() call
- first run 'migrations' container with same resources as 'app' and 'op'
- additional typing for initializing db
- cleanup unused code related to running only once, waiting for db to be ready
- fixes#2447