browsertrix

Author	SHA1	Message	Date
Vinzenz Sinapius	bb6e703f6a	Configure browsertrix proxies (#1847 ) Resolves #1354 Supports crawling through pre-configured proxy servers, allowing users to select which proxy servers to use (requires browsertrix crawler 1.3+) Config: - proxies defined in btrix-proxies subchart - can be configured via btrix-proxies key or separate proxies.yaml file via separate subchart - proxies list refreshed automatically if crawler_proxies.json changes if subchart is deployed - support for ssh and socks5 proxies - proxy keys added to secrets in subchart - support for default proxy to be always used if no other proxy configured, prevent starting cluster if default proxy not available - prevent starting manual crawl if previously configured proxy is no longer available, return error - force 'btrix' username and group name on browsertrix-crawler non-root user to support ssh Operator: - support crawling through proxies, pass proxyId in CrawlJob - support running profile browsers which designated proxy, pass proxyId to ProfileJob - prevent starting scheduled crawl if previously configured proxy is no longer available API / Access: - /api/orgs/all/crawlconfigs/crawler-proxies - get all proxies (superadmin only) - /api/orgs/{oid}/crawlconfigs/crawler-proxies - get proxies available to particular org - /api/orgs/{oid}/proxies - update allowed proxies for particular org (superadmin only) - superadmin can configure which orgs can use which proxies, stored on the org - superadmin can also allow an org to access all 'shared' proxies, to avoid having to allow a shared proxy on each org. UI: - Superadmin has 'Edit Proxies' dialog to configure for each org if it has: dedicated proxies, has access to shared proxies. - User can select a proxy in Crawl Workflow browser settings - Users can choose to launch a browser profile with a particular proxy - Display which proxy is used to create profile in profile selector - Users can choose with default proxy to use for new workflows in Crawling Defaults --------- Co-authored-by: Ilya Kreymer <ikreymer@gmail.com> Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2024-10-02 18:35:45 -07:00
sua yoo	08aa2f86f3	chore: Auto-commit extracted localization strings (#2089 ) Removes `localize:extract` from pre-commit hook and commits changes from `localize:extract` in frontend PR build check.	2024-09-30 10:48:13 -07:00
sua yoo	612bbb6f42	feat: Merge workflow job types (#2068 ) Resolves https://github.com/webrecorder/browsertrix/issues/2073 ### Changes - Removes "URL List" and "Seeded Crawl" job type distinction and adds as additional crawl scope types instead. - 'New Workflow' button defaults to Single Page - 'New Workflow' dropdown includes Page Crawl (Single Page, Page List, In-Page Links) and Site Crawl (Page in Same Directory, Page on Same Domain, + Subdomains and Custom Page Prefix) - Enables specifying `DOCS_URL` in `.env` - Additional follow-ups in #2090, #2091	2024-09-25 10:37:18 -04:00
Ilya Kreymer	62da0fbd6c	security: tweak get /invite endpoints / InviteOut to: (#2087 ) don't set inviterEmail / inviterName if the inviter is the superuser: - return fromSuperuser true/false - if fromSuperuser, don't set inviterEmail / inviterName - tests: add tests for non-superuser admin invites	2024-09-20 11:52:56 -07:00
Vinzenz Sinapius	a674689354	Update ansible pipfile (#2088 ) Fixes some dependabot alerts	2024-09-20 11:41:21 -07:00
Ilya Kreymer	feb6b1f26c	Ensure email comparisons are case-insensitive, emails stored as lowercase (#2084 ) (#2086 ) (fixes from 1.11.7) - Add a custom EmailStr type which lowercases the full e-mail, not just the domain. - Ensure EmailStr is used throughout wherever e-mails are used, both for invites and user models - Tests: update to check for lowercase email responses, e-mails returned from APIs are always lowercase - Tests: remove tests where '@' was ur-lencoded, should not be possible since POSTing JSON and no url-decoding is done/expected. E-mails should have '@' present. - Fixes #2083 where invites were rejected due to case differences - CI: pin pymongo dependency due to latest releases update, update python used for CI	2024-09-19 12:20:34 -07:00
sua yoo	a8f4f8cfc3	docs: Clarify hosted vs. self-deployment requirements (#2082 ) Updates docs to clarify difference between self-hosting and hosted subscription. --------- Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics> Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2024-09-18 13:43:09 -07:00
Emma Segal-Grossman	9a799cc8ab	Ensure that CI fails if extracted strings don't match (#2078 ) - Ensures extracted strings get formatted before checking against index - Fixes index check by switching from `git diff-index` to `git diff`, and ensures the proper `--exit-code` flag is present (implicitly turned on by `--quiet`) - Adds actionable error message when the check fails - Updates Github's actions versions from v3 to v4 (major version bump is primarily just for default node version updates, but this way we'll get future updates) - Adds formatting step to npm script for extracting messages - Runs a string extraction & format against current main	2024-09-16 16:48:33 -04:00
Tessa Walsh	123705c53f	Serialize datetimes with Z suffix (#2058 ) Use timezone aware datetimes instead of timezone naive datetimes: - Update mongodb client to use tz-aware conversion - Convert dt_now() to return timezone aware UTC date - Rename to_k8s_date -> date_to_str, just returns ISO UTC date with 'Z' (instead of '+00:00' suffix) - Rename from_k8s_date -> str_to_date, returns timezone aware date from str - Standardize all string<->date conversion to use either date_to_str or str_to_date - Update frontend to assume iso date, not append 'Z' directly - Update tests to check for 'Z' suffix on some dates --------- Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>	2024-09-12 16:16:13 -07:00
Ilya Kreymer	c242bb96d2	version: bump to 1.12.0-beta.0	2024-09-12 14:30:15 -07:00
Ilya Kreymer	1f919de294	Allow custom auto-resize crawler volume ratio adjustable (#2076 ) Make the avail / used storage ratio (for crawler volumes) adjustable. Disable auto-resize if set to 0. Follow-up to #2023	2024-09-12 09:28:19 -07:00
sua yoo	49ce894353	Merge pull request #2075 from weblate/weblate-browsertrix-browsertrix-ui Translations update from Hosted Weblate	2024-09-11 13:25:04 -07:00
Webrecorder Dev	2afd2b992d	Translated using Weblate (Spanish) Currently translated at 1.1% (14 of 1213 strings) Translation: Browsertrix/Browsertrix UI Translate-URL: https://hosted.weblate.org/projects/browsertrix/browsertrix-ui/es/	2024-09-11 20:06:49 +02:00
sua yoo	f91e32f866	chore: Format XLIFF files (#2074 ) Formats XLIFF file to match Weblate per https://github.com/webrecorder/browsertrix/issues/1416#issuecomment-2344227966	2024-09-11 10:46:39 -07:00
sua yoo	d3ed78575d	chore: Revert frontend build check trigger (#2071 ) Reverts `push` trigger added to frontend build check in `99ed08656a` now that we require PRs to merge.	2024-09-10 16:44:36 -07:00
sua yoo	55f3b8abb1	fix: Watch tab crawl state consistency (#2060 ) - Fixes empty watch tab during some active states - Disables exclusion and browser window editing while a crawl is stopping - Refactors frontend crawl states to match backend	2024-09-10 14:47:29 -07:00
sua yoo	b58d367da7	fix: Archived item navigation improvements (#2062 ) - Replaces QA Review breadcrumbs with "Back" button that takes you back to the archived item - Crawl breadcrumbs always point to workflow - Shows archived item icon next to item name to differentiate from workflow - Adds "Edit Workflow" button to "Crawl Settings" --------- Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics> Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2024-09-10 14:26:29 -07:00
sua yoo	99ed08656a	feat: Localization workflow improvements (#2069 ) - Extracts translatable text strings in pre-commit hook - Updates ternary pluralization to use `pluralOf` instead - Generates XLIFF for Spanish	2024-09-10 14:15:26 -07:00
sua yoo	8ccd36b0a7	fix: Unstyled content visibility (#2070 ) Fixes flash of unstyled content due to dynamic imports.	2024-09-10 13:32:15 -07:00
sua yoo	c01e3dd88b	feat: Improve UX of choosing new workflow crawl type (#2067 ) Resolves https://github.com/webrecorder/browsertrix/issues/2066 ### Changes - Allows directly choosing new "Page List" or "Site Crawl from workflow list - Reverts terminology introduced in https://github.com/webrecorder/browsertrix/pull/2032	2024-09-09 16:42:47 -07:00
sua yoo	b4e34d1c3c	fix: Fix collection description (#2065 ) Fixes https://github.com/webrecorder/browsertrix/issues/2064 ### Changes - Switches MDE library to one that supports shadow DOM - Refactors collection components to btrix components - Fixes collection detail not expanding and contracting correctly	2024-09-05 22:10:14 -07:00
sua yoo	4c36c80351	feat: Display scale as number of browser windows (#2057 ) Resolves https://github.com/webrecorder/browsertrix/issues/2048 --------- Co-authored-by: Ilya Kreymer <ikreymer@gmail.com> Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>	2024-09-05 17:32:40 -07:00
Ilya Kreymer	b3c1195878	version: bump to 1.11.6	2024-09-05 17:31:10 -07:00
sua yoo	880e27370d	feat: Breadcrumb navigation (#2053 ) - Adds breadcrumb navigation to all detail views, returning to correct origin for workflow and collection items - Refactors list page headers into layout utility - Refactors crawl tab labels and renames "Files" tab to "WACZ Files"	2024-08-30 09:08:24 -07:00
sua yoo	e4107d0e76	fix: correct link to crawlilng defaults	2024-08-29 17:00:29 -07:00
sua yoo	eb2dab8ae0	fix: Update browser title (#2054 ) Updates browser title when visiting the following pages: - Superadmin dashboard - Org top-level pages - Account settings	2024-08-29 16:50:14 -07:00
sua yoo	988d9c9e2b	feat: Allow org admins to set default workflow configs (#2020 ) - Adds new org settings tab for updating crawl details - Refactors `workflow-editor` to move out utility functions - Updates user guide on org settings --------- Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics> Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>	2024-08-29 16:49:22 -07:00
sua yoo	a44e9207ca	docs: Publish only on release or manual run (#2055 )	2024-08-28 15:28:27 -07:00
sua yoo	ecac4f6939	docs: Reorganize user guide (#2050 ) Reorganizes user guide to be more solutions based --------- Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics> Co-authored-by: Emma Segal-Grossman <hi@emma.cafe> Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2024-08-28 09:50:42 -07:00
Ilya Kreymer	ea252e8da9	version: bump to 1.11.5	2024-08-27 10:00:53 -07:00
sua yoo	337454f8c9	feat: Add link to hosted sign-up page (#2045 ) Resolves https://github.com/webrecorder/browsertrix/issues/2043 <!-- Fixes #issue_number --> ### Changes - Shows link to sign up in UI if `sign_up_url` is configured. - Expires settings in session storage (for now)	2024-08-26 17:26:25 -07:00
sua yoo	c0725599b2	fix: Help forum link not showing on mobile (#2049 ) Fixes the "Help Forum" link text not showing on smaller screens.	2024-08-26 15:53:08 -07:00
sua yoo	d119e8fd77	chore: Lock yarn version to classic (#2047 ) Enables installing the app with yarn 2+.	2024-08-26 15:30:59 -07:00
Ilya Kreymer	95969ec747	Attempt to auto-adjust storage if usage is running out while crawl is running (#2023 ) Attempt to auto-adjust PVC storage if: - used storage (as reported in redis by the crawler) * 2.5 > total_storage - will cause PVC to resize, if possible (not supported by all drivers) - uses multiples of 1Gi, rounding up to next GB - AVAIL_STORAGE_RATIO hard-coded to 2.5 for now, to account for 2x space for WACZ plus change for fast updating crawls Some caveats: - only works if the storageClass used for PVCs has `allowVolumeExpansion: true`, if not, it will have no effect - designed as a last resort option: the `crawl_storage` in values and `--sizeLimit` and `--diskUtilization` should generally result in this not being needed. - can be useful in cases where a crawl is rapidly capturing a lot of content in one page, and there's no time to interrupt / restart, since the other limits apply only at page end. - May want to have crawler update the disk usage more frequently, not just at page end to make this more effective.	2024-08-26 14:19:20 -07:00
Ilya Kreymer	a1df689729	stats recompute fixes: (#2022 ) - fix stats_recompute_last() and stats_recompute_all() to not update the lastCrawl* properties of a crawl workflow if a crawl is running, as those stats now point to the running crawl - refactor _add_running_curr_crawl_stats() to make it clear stats only updated if crawl is running - stats_recompute_all() change order to ascending to actually get last crawl, not first!	2024-08-26 14:18:59 -07:00
Ilya Kreymer	135c97419d	version: update to 1.11.4	2024-08-26 12:31:56 -07:00
sua yoo	2a057eddd6	chore: Improve time to load org UI (#2044 ) Improves time to first load an org with the following: - Users user info from login response to set org slug and route user on log in - Stores user info in session storage so that it's available on reload - Stores app settings in local storage until user logs out - Loads critical org components synchronously	2024-08-26 10:45:10 -07:00
Ilya Kreymer	96e393e80d	update crawler channel fix: add crawlerChannel to update check (#2046 ) Add missing check for crawlerChannel update	2024-08-26 10:41:54 -04:00
sua yoo	acd3e1252d	feat: Add help shortcuts to app header & footer (#2040 ) WIP for https://github.com/webrecorder/browsertrix/issues/2041 <!-- Fixes #issue_number --> ### Changes - Adds button to open embedded support guide - Adds link to help forum - Refactors app bar to look nicer on smaller screens	2024-08-23 18:11:29 -07:00
Ilya Kreymer	04c8b50423	add a crawling defaults on the Org to allow setting certain crawl workflow fields as defaults: (#2031 ) - add POST /orgs/<id>/defaults/crawling API to update all defaults (defaults unset are cleared) - defaults returned as 'crawlingDefaults' object on Org, if set - fixes #2016 --------- Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>	2024-08-22 10:36:04 -07:00
sua yoo	0e16d526c0	fix: Hide login link on login page (#2039 ) - Removes log in link when on log in page - Fix e2e test, which wasn't actually logging the test user in before	2024-08-21 15:33:24 -07:00
sua yoo	25b1928d44	feat: Enable deleting workflow from list (#2042 ) Adds back workflow list menu item to delete workflow if it's never been run.	2024-08-21 15:33:00 -07:00
sua yoo	2ca9632057	feat: Add additional context around workflow job type options (#2032 ) - Updates workflow job type copy and adds additional clarifying text - Changes "List of URLs" label to "Crawl URL(s)" - Refactors `NewWorkflowDialog` into tailwind element	2024-08-21 14:03:43 -07:00
sua yoo	3605d07547	fix: Make footer translatable (#2038 ) - Wraps footer strings to prepare for localization - Removes extraneous class names - Updates copy button tooltip to match bug report field	2024-08-21 14:01:52 -07:00
Ilya Kreymer	86c9e538c1	quickfix: webhooks: ensure the 'crawl_reviewed' webhook is sent async, doesn't delay submitting a review (#2033 ) make the call to `create_crawl_reviewed_notification` be called with create_task (similar to other user-initiated webhook events), to avoid extra wait for webhook to complete	2024-08-20 17:50:18 -07:00
sua yoo	7208888a1c	chore: remove console log	2024-08-20 17:34:47 -07:00
Emma Segal-Grossman	10640feeef	Add detailed permissions & permission summaries to user invite popup (#2003 )	2024-08-20 20:34:29 -04:00
Emma Segal-Grossman	570dc10f2a	Properly pluralize "Pages" in QA, and display skeletons instead of incorrect fallback values (#2026 )	2024-08-20 20:33:52 -04:00
Ilya Kreymer	8c9a14b6a2	Ensure Subscription Update doesn't update the gifted quotas (#2012 ) - add a separate OrgQuotasIn where all quota updates are optional - ensure gifted quotas are never updated as part of org update - update tests	2024-08-20 13:15:03 -07:00
sua yoo	351e92ae2f	fix: Prevent browser profile selection overflow (#2029 ) - Truncates selected browser profile description and refreshes style - Order browser profiles by modified date	2024-08-20 12:43:51 -07:00

1 2 3 4 5 ...

1361 Commits