browsertrix

Author	SHA1	Message	Date
Tessa Walsh	cd7b695520	Add backend support for custom behaviors + validation endpoint (#2505 ) Backend support for #2151 Adds support for specifying custom behaviors via a list of strings. When workflows are added or modified, minimal backend validation is done to ensure that all custom behavior URLs are valid URLs (after removing the git prefix and custom query arguments). A separate `POST /crawlconfigs/validate/custom-behavior` endpoint is also added, which can be used to validate a custom behavior URL. It performs the same syntax check as above and then: - For URL directly to behavior file, ensures URL resolves and returns a 2xx/3xx status code - For Git repositories, uses `git ls-remote` to ensure they exist (and that branch exists if specified) --------- Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>	2025-04-02 16:20:51 -07:00
Ilya Kreymer	c067a0fe7c	fix qa page sorting: (#2530 ) was sorting on qa.{qa_run_id} after the value was already replaced with 'qa', thus was sorting on non-existent value fixes #2529	2025-04-02 09:25:38 -07:00
sua yoo	f6481272f4	feat: Specify custom link selectors (#2487 ) - Allows users to specify page link selectors in workflow "Scope" section - Adds new `<btrix-syntax-input>` component for syntax-highlighted inputs - Refactors highlight.js implementation to prevent unnecessary language loading - Updates exclusion table header styles --------- Co-authored-by: Tessa Walsh <tessa@bitarchivist.net> Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>	2025-04-02 00:32:34 -07:00
Ilya Kreymer	b5b4c4da15	version: update to 1.14.8	2025-03-31 14:17:53 -07:00
Ilya Kreymer	62e47a8817	support overriding crawler image pull policy per channel (#2523 ) - add 'imagePullPolicy' field to each crawler channel declaration - if unset, defaults to the setting in the existing 'crawler_image_pull_policy' field. fixes #2522 --------- Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2025-03-31 14:11:41 -07:00
sua yoo	df8c80f3cc	task: Display built-in behaviors as list (#2518 ) - Displays built-in behaviors as single field in workflow settings - Standardizes how "None" is displayed in workflow settings - Refactors behavior names into enum	2025-03-26 17:09:02 -07:00
Ilya Kreymer	61809ab3c5	ci: typo fix, move 'workflow_dispatch' to correct place	2025-03-26 13:02:38 -07:00
Ilya Kreymer	0925da6768	CI: Update python version + script (#2521 ) Ensure we're on the latest versions of CI actions + python (except lint check, due to issue) Also allow running the Microk8s tests on demand with workflow dispatch	2025-03-26 12:53:18 -07:00
Ilya Kreymer	b3950dd03f	version: update to 1.14.7	2025-03-25 17:25:24 -07:00
Ilya Kreymer	9250befea4	ingress: remove X-Forward-Proto snippet, no longer needed (and now possibly considered unsafe) (#2519 ) X-Forward-Proto is now already provided by the standard ingress-nginx config	2025-03-25 17:24:55 -07:00
Ilya Kreymer	21a372057b	Fix user emails use userout (#2511 ) Follow-up to #2495, actually ensure org subscription data is included in admin email response --------- Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2025-03-24 12:04:39 -07:00
Ilya Kreymer	46be6a0cf6	version: bump to 1.14.6	2025-03-20 16:52:20 -07:00
Henry Wilkinson	c797e8446d	docs: Add UI documentation page on status icons (#2506 ) ### Changes - Adds status icons page - Moves action menus page to the UI development docs folder - Fixes sentence fragment	2025-03-20 16:51:20 -07:00
Henry Wilkinson	c770b9ec22	frontend: move name field to the top of the signup form (#2508 ) Fixes #2507 Does what it says on the tin!	2025-03-20 16:50:43 -07:00
Ilya Kreymer	4c0ddd0fe3	crawl replay: remove isSeed=true from initialPages query (#2509 ) - matches initial query for collections - fixes 'Show Non-Seed Pages' not appearing for crawl replay	2025-03-20 15:03:41 -07:00
Ilya Kreymer	cb14ac3a00	add org subs info to /api/users/emails endpoint (#2495 ) Include additional info in this superadmin-only endpoint.	2025-03-20 08:31:23 -07:00
Ilya Kreymer	b63caf74ad	cleanup unused chart values + change mongo default (#2484 ) - Removes chart values that are unused - Also change `local-mongo.default` -> `local-mongo`, `local-minio.default` -> `local-minio` as some users have reported issues with `.default` and it will certainly break if not deploying Browsertrix in the `default` namespace.	2025-03-20 08:30:45 -07:00
Henry Wilkinson	cf6690e74a	docs: add development section on action menus (#2429 ) Closes #2428	2025-03-19 18:46:09 -04:00
Ilya Kreymer	c9c32d86e2	login: don't set default slug if user not part of any orgs #2491 (#2492 ) if logged in user is not part of any orgs, still allow logging in, instead of throwing an exception due to accessing non-existent org --------- Co-authored-by: sua yoo <sua@suayoo.com>	2025-03-19 15:23:16 -07:00
sua yoo	0bc210d905	devex: Add frontend code snippet & update dev docs (#2494 ) - Adds VSCode file template for component unit testing. - Updates development docs with details on UI dev	2025-03-19 14:22:20 -07:00
Emma Segal-Grossman	b471192cbc	Workflow editor footer button: ensure `isCrawlRunning` is `false` if editing a new workflow (#2496 ) Reported by @tw4l Quick fix for the bug I introduced in 1bc3c35 in #2481. I didn't properly test on the workflow editor in a "new workflow" state, and didn't realize that the component that fetches the workflow state for an existing workflow wouldn't be rendered for a new workflow, so the update to the loading state never occurred for new workflows. This fix explicitly sets `isCrawlRunning` to `false` instead of `null` for new workflows, so that the loading state isn't displayed. Tested locally with both new and existing workflows (in both non-running and running states).	2025-03-19 15:44:16 -04:00
Ilya Kreymer	6be1f6674c	fixes token lifetime bug / improve security (#2490 ) - fix jwt_token_lifetime being in hours, not minutes, remove extra * 60 - don't return userids in user list for org admins, instead just key users by email, which is already unique	2025-03-19 10:07:09 -07:00
Ilya Kreymer	eb300815a7	Fixes #2488 (#2493 ) - Fixes #2488 - Adds a k8s api call to set `suspend=false` on Job when associated CrawlJob is finished. - bump version - released as 1.14.5	2025-03-19 10:06:25 -07:00
sua yoo	d2601a037e	feat: Show running crawl when editing workflow (#2481 ) Part of https://github.com/webrecorder/browsertrix/issues/2366 ## Changes - Displays latest running crawl status when editing workflow - Disables "Run Now" button if crawl is currently running Currently, clicking "Run Now" will result in a preventable server error if the crawl is already running. The change in this PR is in preparation for being able to update a currently running crawl and doesn't require any backend changes. ## Manual testing 1. Log in as crawler 2. Go to edit crawl workflow 3. Open same workflow in another tab 4. Run the workflow 5. Go back to edit tab. Verify "Starting" status is shown next to "Save" button and "Run Crawl" button is disabled --------- Co-authored-by: emma <hi@emma.cafe>	2025-03-18 18:54:04 -04:00
Emma Segal-Grossman	89a6e84377	Fix broken thumbnail images not taking up appropriate size on ff (#2486 ) Closes #2485 Also adds alt text to collection thumbnail images.	2025-03-18 18:53:10 -04:00
sua yoo	bcb73932d4	docs: Organize readme and fix doc links (#2479 ) Resolves https://github.com/webrecorder/browsertrix/issues/2478 ## Changes - Organizes README - Fixes relative links in mkdocs --------- Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2025-03-11 18:37:20 -07:00
Emma Segal-Grossman	b2c5b9bc59	Hide breadcrumbs for private orgs (#2477 ) Hides "Back to [org name]" breadcrumb when viewing a public/unlisted collection when the public gallery isn't enabled for the org (except when logged into that org).	2025-03-11 15:05:35 -04:00
sua yoo	ac1236f15b	feat: Add behaviors section to workflow form (#2464 ) - Moves "Per-Page Limits" fields to new "Page Behavior" section - Fixes workflow settings closing tags with refactor to how sections are rendered - Updates user guide with behaviors documentation --------- Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>	2025-03-11 11:40:20 -07:00
emma	a42d83c9f6	add content-length and etag headers to thumbnail endpoint	2025-03-10 13:58:41 -04:00
Ilya Kreymer	d8365c734f	version: bump to 1.14.4	2025-03-08 15:58:18 -08:00
Ilya Kreymer	00a42515c8	docs: add public collections gallery howto (#2462 ) - Updated how collections gallery and presentation and sharing pages - Collections gallery page content extracted from blog post, linked from blog post - Each page has one video covering the gallery setting and individual collection presentation - Cleaned up text on both to avoid duplicated content (thanks @DaleLore) --------- Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics> Co-authored-by: DaleLore <DaleLoreNY@gmail.com>	2025-03-08 15:57:13 -08:00
Ilya Kreymer	75eb04c37b	Translations update from Hosted Weblate (#2467 ) (#2471 ) Translations update from [Hosted Weblate](https://hosted.weblate.org) for [Browsertrix/Browsertrix](https://hosted.weblate.org/projects/browsertrix/browsertrix/). Current translation status: ![Weblate translation status](https://hosted.weblate.org/widget/browsertrix/browsertrix/horizontal-auto.svg) --------- Co-authored-by: Weblate (bot) <hosted@weblate.org> Co-authored-by: Anne Paz <anelisespaz@gmail.com> Co-authored-by: weblate <1607653+weblate@users.noreply.github.com>	2025-03-07 12:40:43 -08:00
Emma Segal-Grossman	8078f3866b	Add missing "payment never made" subscription status to superadmin org list (#2457 )	2025-03-07 12:38:09 -08:00
sua yoo	fa05d68292	fix: Open and highlight correct workflow form section on tab click (#2463 ) Fixes https://github.com/webrecorder/browsertrix/issues/2461 ## Changes Opens workflow form section when clicking on section navigation link, fixing issue with scroll position impacting unopened panels.	2025-03-07 12:35:24 -08:00
Ilya Kreymer	03fa00df45	set default crawler channel if not set, possible fix for #2458 (#2469 ) update default RWP version	2025-03-07 12:32:19 -08:00
Ilya Kreymer	6c192df49d	Add thumbnail endpoint (#2468 ) - Add /thumbnail collections endpoint to serve the thumbnail as an image for public collections. - Also fix uploading thumbnail images to use correct mime, if available.	2025-03-07 12:29:36 -08:00
Tessa Walsh	13bf818914	Fix nightly tests (#2460 ) Fixes #2459 - Set `/data/` as primary storage `access_endpoint_url` in nightly test chart - Modify nightly test GH Actions workflow to spawn a separate job per nightly test module using dynamic matrix - Set configuration not to fail other jobs if one job fails - Modify failing tests: - Add fixture to background job nightly test module so it can run alone - Add retry loop to crawlconfig stats nightly test so it's less dependent on timing GitHub limits each workflow to 256 jobs, so this should continue to be able to scale up for us without issue. --------- Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>	2025-03-06 16:23:30 -08:00
Ilya Kreymer	9466e83d18	version: bump to 1.14.3	2025-03-03 15:20:40 -08:00
Ilya Kreymer	afa892000b	replay api: add downloadUrl to replay endpoints to be used by RWP (#2456 ) RWP (2.3.3+) can determine if the 'Download Archive' menu item should be shown based on the value of downloadUrl. If set to 'null', will hide the menu item: - set downloadUrl to public collection download for public collections replay - set downloadUrl to null for private collection and crawl replay to hide the download menu item in RWP (otherwise have to add the auth_header query with bearer token and should assess security before doing that..) --------- Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2025-03-03 14:11:28 -08:00
sua yoo	65a40c4816	feat: Show additional collection details (#2455 ) Resolves https://github.com/webrecorder/browsertrix/issues/2452 ## Changes - Displays page count and collection size in listing grid - Displays month if collection period is in the same year - Displays collection size in About > Details section - Minor refactor: move byte formatting into `localize.ts` utility file, move slash (`/`) separator into own utility file	2025-03-03 13:15:27 -08:00
Ilya Kreymer	e13c3bfb48	move db migrations to initContainers: (#2449 ) - should avoid gunicorn worker timeouts for long running migrations, also fixes #2439 - add main_migrations as entrypoint to just run db migrations, using existing init_ops() call - first run 'migrations' container with same resources as 'app' and 'op' - additional typing for initializing db - cleanup unused code related to running only once, waiting for db to be ready - fixes #2447	2025-03-03 13:13:15 -08:00
Ilya Kreymer	702c9ab3b7	Better caching of presigned URLs + support for thumbnails (#2446 ) Overhauls URL presigning by: - cache the presigned urls in a flat, separate mongodb collection which has an expiring index - update presigned urls if not found / expired automatically in index - remove logic on storing presignedUrl in files - support caching presigned URL for thumbnails. - add endpoints to clear presigned urls for org or for all files in all orgs (superadmin only) - supersedes #2438, fix for #2437 - removes previous presignedUrl and expireAt data from crawls and QA runs --------- Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2025-03-03 12:05:23 -08:00
Ilya Kreymer	631b019baf	optimize public collection loading: (#2444 ) - remove query for /collections endpoint just to get the org name - add orgName to single /collection endpoint, where it is already available on the backend	2025-03-03 10:13:30 -08:00
Ilya Kreymer	2263745df3	Fix replay.json 400 response for empty collection (#2445 ) - fix #2443 - don't throw error in list_pages() if no crawls provided, just return empty list - ensure an empty collection returns 200 on replay.json, add tests	2025-03-03 09:38:19 -08:00
Ilya Kreymer	2e86ee3fcc	Weblate (#2450 ) Translations update from [Hosted Weblate](https://hosted.weblate.org) for [Browsertrix/Browsertrix](https://hosted.weblate.org/projects/browsertrix/browsertrix/). Current translation status: ![Weblate translation status](https://hosted.weblate.org/widget/browsertrix/browsertrix/horizontal-auto.svg) Co-authored-by: Weblate (bot) <hosted@weblate.org> Co-authored-by: Anne Paz <anelisespaz@gmail.com> Co-authored-by: weblate <1607653+weblate@users.noreply.github.com>	2025-03-02 19:46:00 -08:00
Ilya Kreymer	64621ba6c0	frontend: fix rendering when backend not available yet (#2448 ) - don't wait for languages to be ready to render UI, as this can result in empty page if backend can not be reached. - catch if /api/settings returns an invalid response to show 'backend initializing' message - will support initContainers where backend may return 5xx error while backend is initializing, via #2449 Note: this results in locale picker showing all available locales if backend is not available, not just filtered ones, but I think that's a reasonable trade-off.	2025-03-01 14:02:37 -08:00
Emma Segal-Grossman	53b531ce3e	Show download button on public collection pages regardless of collection access (#2442 ) Reported here https://discord.com/channels/895426029194207262/1011678975636013066/1345095899008860224 Public-facing collections (whether public or unlisted) should have the download button visible if "show download button" is enabled.	2025-02-28 22:07:38 -08:00
Ilya Kreymer	cb52da66dc	version: bump to 1.14.2	2025-02-27 14:13:03 -08:00
Tessa Walsh	45aa0a32b6	Calculate total for crawl QA page endpoint (#2435 ) Fixes #2434 Patch fix for a regression in Browsertrix 1.14.0-1.14.1 where total was not being calculated for QA page list endpoint but still being included in response, which led to total always being 0 and pages not loading in the frontend review screen as a result.	2025-02-27 11:46:35 -08:00
Ilya Kreymer	376c9981dc	version: bump to 1.14.1	2025-02-26 23:15:01 -08:00

1584 Commits