browsertrix

Author	SHA1	Message	Date
Ilya Kreymer	631b019baf	optimize public collection loading: (#2444 ) - remove query for /collections endpoint just to get the org name - add orgName to single /collection endpoint, where it is already available on the backend	2025-03-03 10:13:30 -08:00
Ilya Kreymer	2263745df3	Fix replay.json 400 response for empty collection (#2445 ) - fix #2443 - don't throw error in list_pages() if no crawls provided, just return empty list - ensure an empty collection returns 200 on replay.json, add tests	2025-03-03 09:38:19 -08:00
Ilya Kreymer	2e86ee3fcc	Weblate (#2450 ) Translations update from [Hosted Weblate](https://hosted.weblate.org) for [Browsertrix/Browsertrix](https://hosted.weblate.org/projects/browsertrix/browsertrix/). Co-authored-by: Weblate (bot) <hosted@weblate.org> Co-authored-by: Anne Paz <anelisespaz@gmail.com> Co-authored-by: weblate <1607653+weblate@users.noreply.github.com>	2025-03-02 19:46:00 -08:00
Ilya Kreymer	64621ba6c0	frontend: fix rendering when backend not available yet (#2448 ) - don't wait for languages to be ready to render UI, as this can result in empty page if backend can not be reached. - catch if /api/settings returns an invalid response to show 'backend initializing' message - will support initContainers where backend may return 5xx error while backend is initializing, via #2449 Note: this results in locale picker showing all available locales if backend is not available, not just filtered ones, but I think that's a reasonable trade-off.	2025-03-01 14:02:37 -08:00
Emma Segal-Grossman	53b531ce3e	Show download button on public collection pages regardless of collection access (#2442 ) Reported here https://discord.com/channels/895426029194207262/1011678975636013066/1345095899008860224 Public-facing collections (whether public or unlisted) should have the download button visible if "show download button" is enabled.	2025-02-28 22:07:38 -08:00
Ilya Kreymer	cb52da66dc	version: bump to 1.14.2	2025-02-27 14:13:03 -08:00
Tessa Walsh	45aa0a32b6	Calculate total for crawl QA page endpoint (#2435 ) Fixes #2434 Patch fix for a regression in Browsertrix 1.14.0-1.14.1 where total was not being calculated for the QA page list endpoint but was still included in the response, which led to total always being 0 and pages not loading in the frontend review screen as a result.	2025-02-27 11:46:35 -08:00
Ilya Kreymer	376c9981dc	version: bump to 1.14.1	2025-02-26 23:15:01 -08:00
Tessa Walsh	3dc8c825c6	Add superadmin endpoint to readd scheduled workflow cronjobs (#2430 ) Adds new superadmin-only `POST /orgs/all/crawlconfigs/reAddCronjobs` endpoint to update/recreate scheduled workflow cronjobs across all orgs.	2025-02-26 23:13:53 -08:00
Tessa Walsh	da77b066a4	Prevent btrix helper from doing anything to k8s contexts other than docker-desktop (#2431 ) The `./btrix` development helper shouldn't be used for anything other than local dev, which this commit helps to enforce. When running any command, if the k8s context is anything other than `docker-desktop` the script will now shut down immediately without doing anything and print the message: "Attempting to modify context other than docker-desktop not supported. Quitting."	2025-02-26 23:13:25 -08:00
Ilya Kreymer	67668438c0	ingress: only set ssl-redirect if using tls (#2432 ) otherwise, http path should be accessible. Can be used when TLS termination handled outside of ingress.	2025-02-26 23:12:07 -08:00
Emma Segal-Grossman	00e85c3e94	Add "Copy <item type> ID" to a bunch of menus (#2426 ) Addresses feedback from here https://discord.com/channels/895426029194207262/910966759165657161/1344367205004873819 by @tw4l. Add "Copy <item type> ID" to a bunch of menus, including all list and detail pages, as well as all other item/crawl/page lists. --------- Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>	2025-02-26 16:58:00 -05:00
Ilya Kreymer	e67708bd4f	version: update to 1.14.0	2025-02-24 14:49:46 -08:00
Henry Wilkinson	c56481fc66	Add `deepLink` attribute to public collection replay embed (#2420 ) ### Changes - Public collections can now be deeplinked ### Caveats - When users click the _About this Collection_ tab and then return to the _Browse Collection_ tab, the deeplink is gone until they visit another page.	2025-02-24 14:33:39 -08:00
Ilya Kreymer	83180efac9	remove dropping page index on migrations (#2418 ) Don't need it for now, and this will now be slow due to the number of pages. Can readd in future migrations if we need it. --------- Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2025-02-24 12:29:02 -08:00
Ilya Kreymer	8a507f0473	Consolidate list page endpoints + better QA sorting + optimize pages fix (#2417 ) - consolidate list_pages() and list_replay_query_pages() into list_pages() - to keep backwards compatibility, add <crawl>/pagesSearch that does not include page totals, keep <crawl>/pages with page total (slower) - qa frontend: add default 'Crawl Order' sort order, to better show pages in QA view - bgjob: account for parallelism in bgjobs, add logging if succeeded mismatches parallelism - QA sorting: default to 'crawl order' to get better results. - Optimize pages job: also cover crawls that may not have any pages but have pages listed in done stats - Bgjobs: give custom op jobs more memory	2025-02-21 13:47:20 -08:00
sua yoo	06f6d9d4f2	feat: Move admin route to own namespace (#2405 ) Resolves https://github.com/webrecorder/browsertrix/issues/2382 ## Changes - Moves superadmin to `/admin` URL namespace - Removes superadmin views from main webpack chunks	2025-02-20 18:43:31 -08:00
sua yoo	8db80f5570	feat: Workflow form collapsible section enhancements (#2381 ) Resolves https://github.com/webrecorder/browsertrix/issues/2359 ## Changes - Track when a workflow form section is opened - Hide workflow form section navigation on small screens --------- Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com> Co-authored-by: Emma Segal-Grossman <hi@emma.cafe> Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>	2025-02-20 18:42:00 -08:00
Ilya Kreymer	3ca68bf1d2	version: 1.14.0-beta.6	2025-02-20 15:37:33 -08:00
Tessa Walsh	f8fb2d2c8d	Rework crawl page migration + MongoDB Query Optimizations (#2412 ) Fixes #2406 Converts migration 0042 to launch a background job (parallelized across several pods) to migrate all crawls by optimizing their pages and setting `version: 2` on the crawl when complete. Also optimizes MongoDB queries for better performance. Migration Improvements: - Add `isMigrating` and `version` fields to `BaseCrawl` - Add new background job type to use in migration with accompanying `migration_job.yaml` template that allows for parallelization - Add new API endpoint to launch this crawl migration job, and ensure that we have list and retry endpoints for superusers that work with background jobs that aren't tied to a specific org - Rework background job models and methods now that not all background jobs are tied to a single org - Ensure new crawls and uploads have `version` set to `2` - Modify crawl and collection replay.json endpoints to only include fields for replay optimization (`initialPages`, `pageQueryUrl`, `preloadResources`) if all relevant crawls/uploads have `version` set to `2` - Remove `distinct` calls from migration pathways - Consolidate collection recompute stats Query Optimizations: - Remove all uses of $group and $facet - Optimize /replay.json endpoints to precompute preload_resources, avoid fetching crawl list twice - Optimize /collections endpoint by not fetching resources - Rename /urls -> /pageUrlCounts and avoid $group, instead sort with index, either by seed + ts or by url to get top matches. - Use $gte instead of $regex to get prefix matches on URL - Use $text instead of $regex to get text search on title - Remove total from /pages and /pageUrlCounts queries by not using $facet - frontend: only call /pageUrlCounts when dialog is opened. --------- Co-authored-by: Ilya Kreymer <ikreymer@gmail.com> Co-authored-by: Emma Segal-Grossman <hi@emma.cafe> Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>	2025-02-20 15:26:11 -08:00
Ilya Kreymer	f7cd476b1a	Additional French Translations from Weblate (#2410 ) Co-authored-by: Weblate (bot) <hosted@weblate.org> Co-authored-by: Bricaud Frédéric <frederic.bricaud@banq.qc.ca> Co-authored-by: Webrecorder Dev <dev@webrecorder.org> Co-authored-by: Carole Gagné <carole.gagne@banq.qc.ca> Co-authored-by: weblate <1607653+weblate@users.noreply.github.com>	2025-02-20 11:04:34 -08:00
Ilya Kreymer	36e723cc51	Adjust crawler pvc on exit code 3 (out of storage) (#2375 ) crawler 1.5.0 now has an exit code 3 for when crawler is actually out of disk space. The operator should handle this by immediately adjusting the PVC size. Ideally, crawler will be improved to avoid this, but since this can still happen, operator should be able to respond and fix the issue.	2025-02-20 11:03:28 -08:00
Ilya Kreymer	88a9f3baf7	ensure running crawl configmap is updated when exclusions are added/removed (#2409 ) exclusions are already updated dynamically if crawler pod is running, but when crawler pod is restarted, this ensures new exclusions are also picked up: - mount configmap in separate path, avoiding subPath, to allow dynamic updates of mounted volume - adds a lastConfigUpdate timestamp to CrawlJob - if lastConfigUpdate in spec is different from current, the configmap is recreated by operator - operator: also update image from channel to avoid any issues with updating crawler in channel - only updates for exclusion add/remove so far, can later be expanded to other crawler settings (see: #2355 for broader running crawl config updates) - fixes #2408	2025-02-19 11:42:19 -08:00
Emma Segal-Grossman	905fe059a4	Add superadmin instance stats card (#2404 ) Closes #2401 https://github.com/user-attachments/assets/cbd288d7-8e9c-4e86-ae87-6a308f6bdd58	2025-02-18 17:29:26 -05:00
Emma Segal-Grossman	f1dc790ab4	Org dashboard: update collection grid empty text state when view is set to "all" (#2402 ) Tested locally. cc @SuaYoo	2025-02-17 21:05:48 -05:00
Ilya Kreymer	d23bca1f73	style change: remove spaces from python version docstring	2025-02-17 16:52:49 -08:00
Ilya Kreymer	a7c8ca4028	version: bump to 1.14.0-beta.1	2025-02-17 16:48:27 -08:00
Tessa Walsh	6c2d8c88c8	Modify page upload migration (#2400 ) Related to #2396 Changes to migration 0037: - Re-adds pages in migration rather than in background job to avoid race condition with later migrations - Re-adds pages for all uploads in all orgs Fix for readd pages for org: - Ensure org filter is applied! - Fix wrong type - Remove distinct, use iterator to iterate over crawls faster. --------- Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>	2025-02-17 16:47:58 -08:00
Emma Segal-Grossman	629cf7c404	Add a small sticky banner when logged in as superadmin (#2393 ) While ideally we don't need to use superadmin for many things, there are still a lot of places where it's necessary, especially around customer service. This makes it a little more visible when that's the case, just as a reminder. I could see this coming in handy especially for newer people who might not have the experience to know to look for the "admin" and "running crawls" buttons.	2025-02-17 17:42:36 -05:00
Ilya Kreymer	5bebb6161a	Issue 2396 readd pages fixes (#2398 ) readd pages fixes: - add additional mem to background job - copy page qa data to separate temp coll when re-adding pages, then merge back in	2025-02-17 13:52:11 -08:00
Ilya Kreymer	e112f96614	Upload Fixes: (#2397 ) - ensure upload pages are always added with a new uuid, to avoid any duplicates with existing uploads, even if upload wacz is actually a crawl from different browsertrix instance, etc. - cleanup upload names with slugify, which also replaces spaces, fixes uploading wacz filenames with spaces in them - part of fix for #2396 --------- Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2025-02-17 13:05:33 -08:00
Emma Segal-Grossman	44ca293999	Replace 2-digit years with numerical years everywhere in the frontend (#2394 ) Closes #2365 --------- Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>	2025-02-13 22:23:13 -08:00
Tessa Walsh	39d99e7c5d	Add support for custom link selectors to backend (#2346 ) Related to #2152 This PR adds backend support for custom link selectors via `selectLinks` on the crawl workflow config. Tests have been updated as well. It also adds `selectLinks` to the frontend in a minimal and for now hardcoded way that we can use as a basis for proper frontend support moving forward. --------- Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>	2025-02-13 22:22:27 -08:00
Emma Segal-Grossman	659e124168	Disable "Update collection thumbnail" checkbox on initial page selection dialog until thumbnail is loaded (#2392 ) Closes #2391	2025-02-13 22:03:13 -08:00
Emma Segal-Grossman	0f2da4f785	Allow showing all collections as well as just public ones in org dashboard (#2379 ) Adds a toggle to switch between viewing public collections only (default) and all collections on org dashboard. Also updates the `house-fill` icon to `house` in a couple places (@Shrinks99) --------- Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>	2025-02-13 21:59:29 -08:00
Ilya Kreymer	4516268a70	misc fixes: cors + disable buffering for uploads (#2395 ) - ensure pages endpoint supports CORS for local dev - disable proxy request buffering to support large uploads	2025-02-13 19:38:20 -08:00
Tessa Walsh	7f1af9bb31	Mark all pages from pages.jsonl as seeds (#2390 ) Fixes #2389 All pages from `pages/pages.jsonl` files now have `isSeed: True` in the database, in addition to any pages that explicitly have `seed` set to true in the actual JSONL. Tests have been added to ensure that all pages from our fixture uploads have `isSeed: True`. --------- Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>	2025-02-13 16:54:30 -08:00
Ilya Kreymer	7b2932c582	Add initial pages + pagesQuery endpoint to /replay.json APIs (#2380 ) Fixes #2360 - Adds `initialPages` to /replay.json response for collections, returning up to 25 pages (seed pages first, then sorted by capture time). - Adds `pagesQueryUrl` to /replay.json - Adds a public pages search endpoint to support public collections. - Adds `preloadResources`, including list of WACZ files that should always be loaded, to /replay.json --------- Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2025-02-13 16:53:47 -08:00
sua yoo	73f9f949af	chore: Add pylint to vscode extensions (#2387 ) No issue created for this, small devex improvement after I noticed linting errors weren't surfaced in vscode.	2025-02-12 19:40:27 -08:00
Ilya Kreymer	b121076e63	quickfix: add missing dependency for docs (#2388 ) follow-up to #2368: - add mkdocs-redirect to frontend Docker, docs build ci - build frontend when changing mkdocs	2025-02-12 16:39:06 -05:00
Henry Wilkinson	edf1edbbd1	docs: Add Documentation for Sharing Collections (#2368 ) - Merges existing collection content into one page - Updates ArchiveWeb.page link - Adds redirect from /collections → /collection - Moves content relevant to presentation & sharing out of the intro - Adds new content about sharing collections! --------- Co-authored-by: Emma Segal-Grossman <hi@emma.cafe> Co-authored-by: sua yoo <sua@webrecorder.org>	2025-02-12 14:05:52 -05:00
sua yoo	f7b9b73a68	fix: Sort filtered collection page URLs (#2384 ) Fixes https://github.com/webrecorder/browsertrix/issues/2383 - Fixes unpredictable sort order when typing in collection page URL - Fixes page URL results flickering in and out while typing --------- Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2025-02-12 11:59:20 -05:00
Ilya Kreymer	5b02d81991	ensure collection is fully reloaded after an archived item is added or removed (#2386 ) follow up to #2332 Testing: 1. Add or remove an archived item. 2. Switch to Replay view. Collection should reload and update the page list.	2025-02-11 23:12:47 -08:00
Henry Wilkinson	3586412da1	docs: Adds section for autoclick behavior addition from 1.13.3 (#2385 ) - Adds section for the autoclick behavior - Removes sections that were removed with the new workflow form... and in some cases much earlier! 😅	2025-02-12 00:22:05 -05:00
sua yoo	7ce115588e	fix: Update links to running crawls (#2378 ) - Updates links to running crawls to redirect to workflow "Watch" tab - Removes unused "Jump to crawl" superadmin widgets - Refactors archived item component to remove references to active crawls	2025-02-11 17:08:27 -08:00
sua yoo	0e04fd98b1	fix: More accurate archived item details (#2364 ) - Moves page count out from under "Size" label in archived item detail - Renames "Pages Crawled" to "Pages" in archived item leading heading and detail overview - Renames "Crawl ID" to "Archived Item ID" --------- Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>	2025-02-11 16:46:13 -08:00
Emma Segal-Grossman	f8a44258d8	Merge pull request #2332 from webrecorder/frontend-collection-editing-dialog Collection editing and sharing revamp	2025-02-11 18:27:35 -05:00
Tessa Walsh	d4032d4ea2	Add autoclick to workflow and crawl settings display (#2374 ) Also rename Auto-Scroll in UI to Autoscroll for consistency	2025-02-11 10:28:30 -05:00
Tessa Walsh	98a45b0d85	Add collection page list/search endpoint (#2354 ) Fixes #2353 Adds a new endpoint to list pages in a collection, with filtering available on `url` (exact match), `ts`, `urlPrefix`, `isSeed`, and `depth`, as well as accompanying tests. Additional sort options have been added as well. These same filters and sort options have also been added to the crawl pages endpoint. Also fixes an issue where `isSeed` wasn't being set in the database when false but only added on serialization, which was preventing filtering from working as expected.	2025-02-10 16:44:37 -08:00
Ilya Kreymer	001839a521	Fix max pages quota setting and display (#2370 ) - add ensure_page_limit_quotas() which sets the config limit to the max pages quota, if any - set the page limit on the config when: creating new crawl, creating configmap - don't set the quota page limit on new or existing crawl workflows (remove setting it on new workflows) to allow updated quotas to take effect for next crawl - frontend: correctly display page limit on workflow settings page from org quotas, if any. - operator: get org on each sync in one place - fixes #2369 --------- Co-authored-by: sua yoo <sua@webrecorder.org>	2025-02-10 16:15:21 -08:00

1542 Commits