Commit Graph

1581 Commits

Author SHA1 Message Date
Ilya Kreymer
b5b4c4da15 version: update to 1.14.8 2025-03-31 14:17:53 -07:00
Ilya Kreymer
62e47a8817
support overriding crawler image pull policy per channel (#2523)
- add 'imagePullPolicy' field to each crawler channel declaration
- if unset, defaults to the setting in the existing
'crawler_image_pull_policy' field.

fixes #2522

---------

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2025-03-31 14:11:41 -07:00
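The per-channel fallback described in 62e47a8817 can be sketched as follows. This is a minimal illustration, assuming channels are plain dicts; `resolve_image_pull_policy`, the sample channel ids, and the image names are hypothetical and not taken from the Browsertrix codebase.

```python
from typing import Mapping


def resolve_image_pull_policy(channel: Mapping[str, str], default_policy: str) -> str:
    """Return the channel's own imagePullPolicy, else the global default."""
    # An unset or empty per-channel value falls back to the value of the
    # existing 'crawler_image_pull_policy' setting (passed in here).
    return channel.get("imagePullPolicy") or default_policy


# Hypothetical channel declarations, for illustration only.
channels = [
    {"id": "default", "image": "example/crawler:latest"},
    {"id": "dev", "image": "example/crawler:dev", "imagePullPolicy": "Always"},
]
policies = {c["id"]: resolve_image_pull_policy(c, "IfNotPresent") for c in channels}
```

Only the `dev` channel overrides the policy here; the `default` channel inherits the global value.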
sua yoo
df8c80f3cc
task: Display built-in behaviors as list (#2518)
- Displays built-in behaviors as single field in workflow settings
- Standardizes how "None" is displayed in workflow settings
- Refactors behavior names into enum
2025-03-26 17:09:02 -07:00
Ilya Kreymer
61809ab3c5 ci: typo fix, move 'workflow_dispatch' to correct place 2025-03-26 13:02:38 -07:00
Ilya Kreymer
0925da6768
CI: Update python version + script (#2521)
Ensure we're on the latest versions of CI actions + Python (except lint check, due to issue)
Also allow running the Microk8s tests on demand with workflow dispatch
2025-03-26 12:53:18 -07:00
Ilya Kreymer
b3950dd03f version: update to 1.14.7 2025-03-25 17:25:24 -07:00
Ilya Kreymer
9250befea4
ingress: remove X-Forward-Proto snippet, no longer needed (and now possibly considered unsafe) (#2519)
X-Forward-Proto is now already provided by the standard ingress-nginx config
2025-03-25 17:24:55 -07:00
Ilya Kreymer
21a372057b
Fix user emails use userout (#2511)
Follow-up to #2495, actually ensure org subscription data is included
in admin email response

---------

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2025-03-24 12:04:39 -07:00
Ilya Kreymer
46be6a0cf6 version: bump to 1.14.6 2025-03-20 16:52:20 -07:00
Henry Wilkinson
c797e8446d
docs: Add UI documentation page on status icons (#2506)
### Changes
- Adds status icons page
- Moves action menus page to the UI development docs folder
- Fixes sentence fragment
2025-03-20 16:51:20 -07:00
Henry Wilkinson
c770b9ec22
frontend: move name field to the top of the signup form (#2508)
Fixes #2507

Does what it says on the tin!
2025-03-20 16:50:43 -07:00
Ilya Kreymer
4c0ddd0fe3
crawl replay: remove isSeed=true from initialPages query (#2509)
- matches initial query for collections
- fixes 'Show Non-Seed Pages' not appearing for crawl replay
2025-03-20 15:03:41 -07:00
Ilya Kreymer
cb14ac3a00
add org subs info to /api/users/emails endpoint (#2495)
Include additional info in this superadmin-only endpoint.
2025-03-20 08:31:23 -07:00
Ilya Kreymer
b63caf74ad
cleanup unused chart values + change mongo default (#2484)
- Removes chart values that are unused
- Also change `local-mongo.default` -> `local-mongo`,
`local-minio.default` -> `local-minio` as some users have reported
issues with `.default` and it will certainly break if not deploying
Browsertrix in the `default` namespace.
2025-03-20 08:30:45 -07:00
Henry Wilkinson
cf6690e74a
docs: add development section on action menus (#2429)
Closes #2428
2025-03-19 18:46:09 -04:00
Ilya Kreymer
c9c32d86e2
login: don't set default slug if user not part of any orgs #2491 (#2492)
if logged in user is not part of any orgs, still allow logging in,
instead of throwing an exception due to accessing non-existent org

---------

Co-authored-by: sua yoo <sua@suayoo.com>
2025-03-19 15:23:16 -07:00
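The guard described in c9c32d86e2 amounts to not indexing into an empty org list. A minimal sketch, assuming the default is simply the first org; `default_org_slug` is a hypothetical helper, and the real selection rule may differ.

```python
from typing import Optional, Sequence


def default_org_slug(org_slugs: Sequence[str]) -> Optional[str]:
    """Pick a default org slug, or None when the user belongs to no orgs."""
    # Indexing org_slugs[0] unconditionally would raise IndexError for a
    # user with no orgs; returning None lets login proceed instead.
    return org_slugs[0] if org_slugs else None
```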
sua yoo
0bc210d905
devex: Add frontend code snippet & update dev docs (#2494)
- Adds VSCode file template for component unit testing.
- Updates development docs with details on UI dev
2025-03-19 14:22:20 -07:00
Emma Segal-Grossman
b471192cbc
Workflow editor footer button: ensure isCrawlRunning is false if editing a new workflow (#2496)
Reported by @tw4l 

Quick fix for the bug I introduced in 1bc3c35 in #2481. I didn't
properly test on the workflow editor in a "new workflow" state, and
didn't realize that the component that fetches the workflow state for an
existing workflow wouldn't be rendered for a new workflow, so the update
to the loading state never occurred for new workflows. This fix
explicitly sets `isCrawlRunning` to `false` instead of `null` for new
workflows, so that the loading state isn't displayed.

Tested locally with both new and existing workflows (in both non-running
and running states).
2025-03-19 15:44:16 -04:00
Ilya Kreymer
6be1f6674c
fixes token lifetime bug / improve security (#2490)
- fix jwt_token_lifetime being in hours, not minutes, remove extra * 60
- don't return userids in user list for org admins, instead just key
users by email, which is already unique
2025-03-19 10:07:09 -07:00
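The unit mix-up fixed in 6be1f6674c can be illustrated as below. This is a sketch of the bug pattern only, assuming the setting is meant to be read as minutes; both function names are hypothetical and the actual backend code may be structured differently.

```python
from datetime import timedelta


def token_lifetime(jwt_token_lifetime_minutes: int) -> timedelta:
    """Interpret the configured lifetime as minutes."""
    return timedelta(minutes=jwt_token_lifetime_minutes)


def token_lifetime_with_extra_factor(jwt_token_lifetime_minutes: int) -> timedelta:
    """The bug pattern: an extra * 60 silently turns minutes into hours."""
    return timedelta(minutes=jwt_token_lifetime_minutes * 60)
```

With a configured value of 60, the first function yields a one-hour token and the second a sixty-hour token.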
Ilya Kreymer
eb300815a7
Fixes #2488 (#2493)
- Fixes #2488 
- Adds a k8s api call to set `suspend=false` on Job when associated
CrawlJob is finished.
- bump version - released as 1.14.5
2025-03-19 10:06:25 -07:00
sua yoo
d2601a037e
feat: Show running crawl when editing workflow (#2481)
Part of https://github.com/webrecorder/browsertrix/issues/2366

## Changes

- Displays latest running crawl status when editing workflow
- Disables "Run Now" button if crawl is currently running

Currently, clicking "Run Now" will result in a preventable server error
if the crawl is already running. The change in this PR is in preparation
for being able to update a currently running crawl and doesn't require
any backend changes.

## Manual testing

1. Log in as crawler
2. Go to edit crawl workflow
3. Open same workflow in another tab
4. Run the workflow
5. Go back to edit tab. Verify "Starting" status is shown next to "Save"
button and "Run Crawl" button is disabled

## Screenshots

| Page | Image/video |
| ---- | ----------- |
| Edit Workflow | <img width="354" alt="Screenshot 2025-03-11 at 1 34 07 PM" src="https://github.com/user-attachments/assets/02f7fb4a-219d-43a4-bb1f-1f2b40ac1480" /> |


---------

Co-authored-by: emma <hi@emma.cafe>
2025-03-18 18:54:04 -04:00
Emma Segal-Grossman
89a6e84377
Fix broken thumbnail images not taking up appropriate size on Firefox (#2486)
Closes #2485 

Also adds alt text to collection thumbnail images.
2025-03-18 18:53:10 -04:00
sua yoo
bcb73932d4
docs: Organize readme and fix doc links (#2479)
Resolves https://github.com/webrecorder/browsertrix/issues/2478

## Changes

- Organizes README
- Fixes relative links in mkdocs

---------

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2025-03-11 18:37:20 -07:00
Emma Segal-Grossman
b2c5b9bc59
Hide breadcrumbs for private orgs (#2477)
Hides "Back to [org name]" breadcrumb when viewing a public/unlisted
collection when the public gallery isn't enabled for the org (except
when logged into that org).
2025-03-11 15:05:35 -04:00
sua yoo
ac1236f15b
feat: Add behaviors section to workflow form (#2464)
- Moves "Per-Page Limits" fields to new "Page Behavior" section
- Fixes workflow settings closing tags with refactor to how sections are
rendered
- Updates user guide with behaviors documentation

---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2025-03-11 11:40:20 -07:00
emma
a42d83c9f6
add content-length and etag headers to thumbnail endpoint 2025-03-10 13:58:41 -04:00
Ilya Kreymer
d8365c734f version: bump to 1.14.4 2025-03-08 15:58:18 -08:00
Ilya Kreymer
00a42515c8
docs: add public collections gallery howto (#2462)
- Updated the collections gallery and the presentation and sharing pages
- Collections gallery page content extracted from blog post, linked from blog post
- Each page has one video covering the gallery setting and individual collection presentation
- Cleaned up text on both to avoid duplicated content (thanks @DaleLore)

---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: DaleLore <DaleLoreNY@gmail.com>
2025-03-08 15:57:13 -08:00
Ilya Kreymer
75eb04c37b
Translations update from Hosted Weblate (#2467) (#2471)
Translations update from [Hosted Weblate](https://hosted.weblate.org)
for [Browsertrix/Browsertrix](https://hosted.weblate.org/projects/browsertrix/browsertrix/).

Current translation status:

![Weblate translation status](https://hosted.weblate.org/widget/browsertrix/browsertrix/horizontal-auto.svg)

---------

Co-authored-by: Weblate (bot) <hosted@weblate.org>
Co-authored-by: Anne Paz <anelisespaz@gmail.com>
Co-authored-by: weblate <1607653+weblate@users.noreply.github.com>
2025-03-07 12:40:43 -08:00
Emma Segal-Grossman
8078f3866b
Add missing "payment never made" subscription status to superadmin org list (#2457) 2025-03-07 12:38:09 -08:00
sua yoo
fa05d68292
fix: Open and highlight correct workflow form section on tab click (#2463)
Fixes https://github.com/webrecorder/browsertrix/issues/2461

## Changes

Opens workflow form section when clicking on section navigation link,
fixing issue with scroll position impacting unopened panels.
2025-03-07 12:35:24 -08:00
Ilya Kreymer
03fa00df45
set default crawler channel if not set, possible fix for #2458 (#2469)
update default RWP version
2025-03-07 12:32:19 -08:00
Ilya Kreymer
6c192df49d
Add thumbnail endpoint (#2468)
- Add /thumbnail collections endpoint to serve the thumbnail as an image for public
collections.
- Also fix uploading thumbnail images to use correct mime, if available.
2025-03-07 12:29:36 -08:00
Tessa Walsh
13bf818914
Fix nightly tests (#2460)
Fixes #2459 

- Set `/data/` as primary storage `access_endpoint_url` in nightly test
chart
- Modify nightly test GH Actions workflow to spawn a separate job per
nightly test module using dynamic matrix
- Set configuration not to fail other jobs if one job fails
- Modify failing tests:
  - Add fixture to background job nightly test module so it can run alone
  - Add retry loop to crawlconfig stats nightly test so it's less
    dependent on timing

GitHub limits each workflow to 256 jobs, so this should continue to be
able to scale up for us without issue.

---------

Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
2025-03-06 16:23:30 -08:00
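The retry loop mentioned in 13bf818914 follows a common polling pattern for timing-sensitive tests. A generic sketch, not the actual test code; `wait_until` and its parameters are hypothetical.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll check() until it returns True or the attempts run out."""
    for _ in range(attempts):
        if check():
            return True
        # Give the backend time to update its stats before re-checking.
        time.sleep(delay)
    return False
```

A test would then assert on `wait_until(...)` rather than on a single immediate read, so a slow stats update delays the test without failing it.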
Ilya Kreymer
9466e83d18 version: bump to 1.14.3 2025-03-03 15:20:40 -08:00
Ilya Kreymer
afa892000b
replay api: add downloadUrl to replay endpoints to be used by RWP (#2456)
RWP (2.3.3+) can determine if the 'Download Archive' menu item should be
shown based on the value of downloadUrl.
If set to 'null', will hide the menu item:
- set downloadUrl to public collection download for public collections
replay
- set downloadUrl to null for private collection and crawl replay to
hide the download menu item in RWP (otherwise have to add the
auth_header query with bearer token and should assess security before
doing that..)

---------

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2025-03-03 14:11:28 -08:00
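The downloadUrl rule from afa892000b can be sketched as a single decision. This is an illustration under assumptions: `replay_download_url` and the example URL are hypothetical, and the real endpoint builds its response differently.

```python
import json
from typing import Optional


def replay_download_url(is_public_collection: bool, public_download_url: str) -> Optional[str]:
    """Return the download link for public collection replay, else None."""
    # Private collection and crawl replay return None, which serializes
    # to JSON null and signals that the download menu item is hidden.
    return public_download_url if is_public_collection else None


# Hypothetical URL, for illustration only.
private_json = json.dumps(
    {"downloadUrl": replay_download_url(False, "https://example.com/download")}
)
```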
sua yoo
65a40c4816
feat: Show additional collection details (#2455)
Resolves https://github.com/webrecorder/browsertrix/issues/2452

## Changes

- Displays page count and collection size in listing grid
- Displays month if collection period is in the same year
- Displays collection size in About > Details section
- Minor refactor: move byte formatting into `localize.ts` utility file,
move slash (`/`) separator into own utility file
2025-03-03 13:15:27 -08:00
Ilya Kreymer
e13c3bfb48
move db migrations to initContainers: (#2449)
- should avoid gunicorn worker timeouts for long running migrations,
also fixes #2439
- add main_migrations as entrypoint to just run db migrations, using
existing init_ops() call
- first run 'migrations' container with same resources as 'app' and 'op'
- additional typing for initializing db
- cleanup unused code related to running only once, waiting for db to be ready
- fixes #2447
2025-03-03 13:13:15 -08:00
Ilya Kreymer
702c9ab3b7
Better caching of presigned URLs + support for thumbnails (#2446)
Overhauls URL presigning by:
- cache the presigned urls in a flat, separate mongodb collection which
has an expiring index
- update presigned urls if not found / expired automatically in index
- remove logic on storing presignedUrl in files
- support caching presigned URL for thumbnails.
- add endpoints to clear presigned urls for org or for all files in all
orgs (superadmin only)
- supersedes #2438, fix for #2437
- removes previous presignedUrl and expireAt data from crawls and QA
runs

---------

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2025-03-03 12:05:23 -08:00
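The caching behavior described in 702c9ab3b7 (look up, re-sign when missing or expired, allow clearing) can be sketched with an in-memory stand-in. Note the swap: the commit uses a separate MongoDB collection with an expiring index, while this sketch uses a plain dict with explicit expiry times. `PresignedUrlCache` and the `presign` callback are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class PresignedUrlCache:
    """In-memory stand-in for a flat cache of presigned URLs with expiry."""

    def __init__(self, presign: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._presign = presign
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get(self, file_id: str) -> str:
        now = self._clock()
        entry = self._entries.get(file_id)
        # Reuse the cached URL only while it has not expired.
        if entry is not None and entry[1] > now:
            return entry[0]
        # Not found or expired: sign again and store the new expiry.
        url = self._presign(file_id)
        self._entries[file_id] = (url, now + self._ttl)
        return url

    def clear(self) -> None:
        """Drop every cached URL, forcing re-signing on next access."""
        self._entries.clear()
```

Keeping the cache separate from the file records means a file document never carries a stale URL; expiry is handled in one place.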
Ilya Kreymer
631b019baf
optimize public collection loading: (#2444)
- remove query for /collections endpoint just to get the org name
- add orgName to single /collection endpoint, where it is already
available on the backend
2025-03-03 10:13:30 -08:00
Ilya Kreymer
2263745df3
Fix replay.json 400 response for empty collection (#2445)
- fix #2443 
- don't throw error in list_pages() if no crawls provided, just return
empty list
- ensure an empty collection returns 200 on replay.json, add tests
2025-03-03 09:38:19 -08:00
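The empty-input handling from 2263745df3 can be sketched as an early return. The signature here is an assumption made for illustration; the real `list_pages()` takes different arguments, and `fetch` is a hypothetical stand-in for the database query.

```python
from typing import Callable, List, Sequence


def list_pages(crawl_ids: Sequence[str],
               fetch: Callable[[Sequence[str]], List[dict]]) -> List[dict]:
    """Return pages for the given crawls, or an empty list if there are none."""
    # No crawls means there is nothing to query: return an empty list
    # rather than raising, so callers can still answer with a 200.
    if not crawl_ids:
        return []
    return fetch(crawl_ids)
```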
Ilya Kreymer
2e86ee3fcc
Weblate (#2450)
Translations update from [Hosted Weblate](https://hosted.weblate.org)
for
[Browsertrix/Browsertrix](https://hosted.weblate.org/projects/browsertrix/browsertrix/).

Current translation status:

![Weblate translation
status](https://hosted.weblate.org/widget/browsertrix/browsertrix/horizontal-auto.svg)

Co-authored-by: Weblate (bot) <hosted@weblate.org>
Co-authored-by: Anne Paz <anelisespaz@gmail.com>
Co-authored-by: weblate <1607653+weblate@users.noreply.github.com>
2025-03-02 19:46:00 -08:00
Ilya Kreymer
64621ba6c0
frontend: fix rendering when backend not available yet (#2448)
- don't wait for languages to be ready to render UI, as this can result
in an empty page if the backend cannot be reached.
- catch if /api/settings returns an invalid response to show 'backend
initializing' message
- will support initContainers where backend may return 5xx error while
backend is initializing, via #2449

Note: this results in locale picker showing all available locales if
backend is not available, not just filtered ones, but I think that's a
reasonable trade-off.
2025-03-01 14:02:37 -08:00
Emma Segal-Grossman
53b531ce3e
Show download button on public collection pages regardless of collection access (#2442)
Reported here
https://discord.com/channels/895426029194207262/1011678975636013066/1345095899008860224

Public-facing collections (whether public or unlisted) should have the
download button visible if "show download button" is enabled.
2025-02-28 22:07:38 -08:00
Ilya Kreymer
cb52da66dc version: bump to 1.14.2 2025-02-27 14:13:03 -08:00
Tessa Walsh
45aa0a32b6
Calculate total for crawl QA page endpoint (#2435)
Fixes #2434 

Patch fix for a regression in Browsertrix 1.14.0-1.14.1 where total was
not being calculated for QA page list endpoint but still being included
in response, which led to total always being 0 and pages not loading in
the frontend review screen as a result.
2025-02-27 11:46:35 -08:00
Ilya Kreymer
376c9981dc version: bump to 1.14.1 2025-02-26 23:15:01 -08:00
Tessa Walsh
3dc8c825c6
Add superadmin endpoint to re-add scheduled workflow cronjobs (#2430)
Adds new superadmin-only `POST /orgs/all/crawlconfigs/reAddCronjobs`
endpoint to update/recreate scheduled workflow cronjobs across all orgs.
2025-02-26 23:13:53 -08:00
Tessa Walsh
da77b066a4
Prevent btrix helper from doing anything to k8s contexts other than docker-desktop (#2431)
The `./btrix` development helper shouldn't be used for anything other
than local dev, which this commit helps to enforce.

When running any command, if the k8s context is anything other than
`docker-desktop` the script will now shut down immediately without doing
anything and print the message: "Attempting to modify context other than
docker-desktop not supported. Quitting."
2025-02-26 23:13:25 -08:00
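The context check from da77b066a4 can be sketched as below. The real `./btrix` helper is a shell script, so this is a Python rendering of the same guard rather than its actual code; `ensure_local_context` is hypothetical, and how the current context is obtained (for example via `kubectl config current-context`) is an assumption.

```python
ALLOWED_CONTEXT = "docker-desktop"
MESSAGE = (
    "Attempting to modify context other than docker-desktop not supported. Quitting."
)


def ensure_local_context(current_context: str) -> None:
    """Exit immediately unless the k8s context is the local dev one."""
    if current_context != ALLOWED_CONTEXT:
        # Stop before any command touches a non-local cluster.
        raise SystemExit(MESSAGE)
```

Running the check before every command, rather than per command, keeps a new subcommand from accidentally skipping it.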
Ilya Kreymer
67668438c0
ingress: only set ssl-redirect if using tls (#2432)
otherwise, http path should be accessible. Can be used when TLS
termination handled outside of ingress.
2025-02-26 23:12:07 -08:00