1361 Commits

Author SHA1 Message Date
Vinzenz Sinapius
bb6e703f6a
Configure browsertrix proxies (#1847)
Resolves #1354

Supports crawling through pre-configured proxy servers, allowing users to select which proxy servers to use (requires browsertrix crawler 1.3+)

Config:
- proxies defined in btrix-proxies subchart
- can be configured via btrix-proxies key or separate proxies.yaml file via separate subchart
- proxies list refreshed automatically when crawler_proxies.json changes, if subchart is deployed
- support for ssh and socks5 proxies
- proxy keys added to secrets in subchart
- support for default proxy to be always used if no other proxy configured, prevent starting cluster if default proxy not available
- prevent starting manual crawl if previously configured proxy is no longer available, return error
- force 'btrix' username and group name on browsertrix-crawler non-root user to support ssh

Operator:
- support crawling through proxies, pass proxyId in CrawlJob
- support running profile browsers with designated proxy, pass proxyId to ProfileJob
- prevent starting scheduled crawl if previously configured proxy is no longer available

API / Access:
- /api/orgs/all/crawlconfigs/crawler-proxies - get all proxies (superadmin only)
- /api/orgs/{oid}/crawlconfigs/crawler-proxies - get proxies available to particular org
- /api/orgs/{oid}/proxies - update allowed proxies for particular org (superadmin only)
- superadmin can configure which orgs can use which proxies, stored on the org
- superadmin can also allow an org to access all 'shared' proxies, to avoid having to allow a shared proxy on each org.

UI:
- Superadmin has 'Edit Proxies' dialog to configure, for each org, its dedicated proxies and whether it has access to shared proxies.
- User can select a proxy in Crawl Workflow browser settings
- Users can choose to launch a browser profile with a particular proxy
- Display which proxy is used to create profile in profile selector
- Users can choose which default proxy to use for new workflows in Crawling Defaults

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2024-10-02 18:35:45 -07:00
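The per-org proxy access rules in the commit above (dedicated proxies, optional access to shared proxies, and rejecting a crawl whose proxy has disappeared) can be sketched roughly as follows. This is a minimal illustration, not the Browsertrix implementation: the `Proxy` class and both function names are hypothetical, and only the idea of a `proxyId` comes from the commit message.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Proxy:
    """Hypothetical proxy record: an id plus a 'shared' flag."""
    id: str
    shared: bool = False


def proxies_for_org(
    all_proxies: Iterable[Proxy],
    allowed_ids: Iterable[str],
    allow_shared: bool,
) -> List[Proxy]:
    """Return proxies an org may use: its dedicated ones, plus shared ones if allowed."""
    allowed = set(allowed_ids)
    return [p for p in all_proxies if p.id in allowed or (allow_shared and p.shared)]


def check_proxy_available(proxy_id: Optional[str], available: Iterable[Proxy]) -> None:
    """Raise if a configured proxy is no longer available; no proxy set is always fine."""
    if proxy_id and proxy_id not in {p.id for p in available}:
        raise ValueError(f"proxy '{proxy_id}' is no longer available")
```

Checking availability at crawl start, rather than only when the workflow is saved, matches the commit's intent of returning an error when a previously configured proxy has been removed.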
sua yoo
08aa2f86f3
chore: Auto-commit extracted localization strings (#2089)
Removes `localize:extract` from pre-commit hook and commits changes from
`localize:extract` in frontend PR build check.
2024-09-30 10:48:13 -07:00
sua yoo
612bbb6f42
feat: Merge workflow job types (#2068)
Resolves https://github.com/webrecorder/browsertrix/issues/2073

### Changes

- Removes "URL List" and "Seeded Crawl" job type distinction and adds them
as additional crawl scope types instead.
- 'New Workflow' button defaults to Single Page
- 'New Workflow' dropdown includes Page Crawl (Single Page, Page List, In-Page Links) and Site Crawl (Page in Same Directory, Page on Same Domain, + Subdomains and Custom Page Prefix)
- Enables specifying `DOCS_URL` in `.env`
- Additional follow-ups in #2090, #2091
2024-09-25 10:37:18 -04:00
Ilya Kreymer
62da0fbd6c
security: tweak get /invite endpoints / InviteOut to: (#2087)
don't set inviterEmail / inviterName if the inviter is the superuser:
- return fromSuperuser true/false
- if fromSuperuser, don't set inviterEmail / inviterName
- tests: add tests for non-superuser admin invites
2024-09-20 11:52:56 -07:00
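The rule in the commit above (hide the inviter's identity when the invite comes from the superuser) can be sketched as below. The field names `fromSuperuser`, `inviterEmail` and `inviterName` come from the commit message; the dataclass shape and the `build_invite_out` helper are assumptions for illustration, and the real `InviteOut` model likely carries more fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InviteOut:
    """Simplified stand-in for the response model named in the commit."""
    fromSuperuser: bool
    inviterEmail: Optional[str] = None
    inviterName: Optional[str] = None


def build_invite_out(
    inviter_email: str, inviter_name: str, inviter_is_superuser: bool
) -> InviteOut:
    """Hypothetical helper: omit inviter details for superuser invites."""
    if inviter_is_superuser:
        return InviteOut(fromSuperuser=True)
    return InviteOut(
        fromSuperuser=False,
        inviterEmail=inviter_email,
        inviterName=inviter_name,
    )
```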
Vinzenz Sinapius
a674689354
Update ansible pipfile (#2088)
Fixes some dependabot alerts
2024-09-20 11:41:21 -07:00
Ilya Kreymer
feb6b1f26c
Ensure email comparisons are case-insensitive, emails stored as lowercase (#2084) (#2086) (fixes from 1.11.7)
- Add a custom EmailStr type which lowercases the full e-mail, not just
the domain.
- Ensure EmailStr is used throughout wherever e-mails are used, both for
invites and user models
- Tests: update to check for lowercase email responses, e-mails returned
from APIs are always lowercase
- Tests: remove tests where '@' was url-encoded, should not be possible
since POSTing JSON and no url-decoding is done/expected. E-mails should
have '@' present.
- Fixes #2083 where invites were rejected due to case differences
- CI: pin pymongo dependency due to latest releases update, update python used for CI
2024-09-19 12:20:34 -07:00
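The normalization described in the commit above lowercases the full address, local part included, so that comparisons become case-insensitive. A minimal sketch using plain functions follows; the commit implements this as a custom `EmailStr` type, so the helper names `normalize_email` and `same_email` here are hypothetical.

```python
def normalize_email(value: str) -> str:
    """Lowercase the whole e-mail address, not just the domain."""
    value = value.strip()
    if "@" not in value:
        # The commit notes that e-mails should always have '@' present.
        raise ValueError("e-mail must contain '@'")
    return value.lower()


def same_email(a: str, b: str) -> bool:
    """Case-insensitive comparison via normalization on both sides."""
    return normalize_email(a) == normalize_email(b)
```

Normalizing once at the model boundary means stored values, lookups and API responses all agree, which is the failure mode behind the rejected invites in #2083.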
sua yoo
a8f4f8cfc3
docs: Clarify hosted vs. self-deployment requirements (#2082)
Updates docs to clarify difference between self-hosting and hosted
subscription.

---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2024-09-18 13:43:09 -07:00
Emma Segal-Grossman
9a799cc8ab
Ensure that CI fails if extracted strings don't match (#2078)
- Ensures extracted strings get formatted before checking against index
- Fixes index check by switching from `git diff-index` to `git diff`,
and ensures the proper `--exit-code` flag is present (implicitly turned
on by `--quiet`)
- Adds actionable error message when the check fails
- Updates GitHub Actions versions from v3 to v4 (major version bump is
primarily just for default node version updates, but this way we'll get
future updates)
- Adds formatting step to npm script for extracting messages
- Runs a string extraction & format against current main
2024-09-16 16:48:33 -04:00
Tessa Walsh
123705c53f
Serialize datetimes with Z suffix (#2058)
Use timezone aware datetimes instead of timezone naive datetimes:
- Update mongodb client to use tz-aware conversion
- Convert dt_now() to return timezone aware UTC date
- Rename to_k8s_date -> date_to_str, just returns ISO UTC date with 'Z'
(instead of '+00:00' suffix)
- Rename from_k8s_date -> str_to_date, returns timezone aware date from
str
- Standardize all string<->date conversion to use either date_to_str or
str_to_date
- Update frontend to assume iso date, not append 'Z' directly
- Update tests to check for 'Z' suffix on some dates

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2024-09-12 16:16:13 -07:00
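The conversion helpers named in the commit above can be sketched with the standard library alone. The function names `dt_now`, `date_to_str` and `str_to_date` are taken from the commit message, but the bodies below are assumptions about their behavior, including the choice to treat a naive datetime as UTC.

```python
from datetime import datetime, timezone


def dt_now() -> datetime:
    """Current time as a timezone-aware UTC datetime."""
    return datetime.now(timezone.utc)


def date_to_str(dt: datetime) -> str:
    """ISO 8601 UTC string with a 'Z' suffix instead of '+00:00'."""
    if dt.tzinfo is None:
        # Assumption: naive values are already UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    text = dt.astimezone(timezone.utc).isoformat()
    return text[: -len("+00:00")] + "Z"


def str_to_date(value: str) -> datetime:
    """Parse an ISO 8601 string, accepting a trailing 'Z', into an aware datetime."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)
```

Rewriting the suffix by hand keeps the sketch working on Python versions where `datetime.fromisoformat` does not accept a trailing `Z`.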
Ilya Kreymer
c242bb96d2 version: bump to 1.12.0-beta.0 2024-09-12 14:30:15 -07:00
Ilya Kreymer
1f919de294
Make auto-resize crawler volume ratio adjustable (#2076)
Make the avail / used storage ratio (for crawler volumes) adjustable.
Disable auto-resize if set to 0.
Follow-up to #2023
2024-09-12 09:28:19 -07:00
sua yoo
49ce894353
Merge pull request #2075 from weblate/weblate-browsertrix-browsertrix-ui
Translations update from Hosted Weblate
2024-09-11 13:25:04 -07:00
Webrecorder Dev
2afd2b992d
Translated using Weblate (Spanish)
Currently translated at 1.1% (14 of 1213 strings)

Translation: Browsertrix/Browsertrix UI
Translate-URL: https://hosted.weblate.org/projects/browsertrix/browsertrix-ui/es/
2024-09-11 20:06:49 +02:00
sua yoo
f91e32f866
chore: Format XLIFF files (#2074)
Formats XLIFF file to match Weblate per
https://github.com/webrecorder/browsertrix/issues/1416#issuecomment-2344227966
2024-09-11 10:46:39 -07:00
sua yoo
d3ed78575d
chore: Revert frontend build check trigger (#2071)
Reverts `push` trigger added to frontend build check in
99ed08656a
now that we require PRs to merge.
2024-09-10 16:44:36 -07:00
sua yoo
55f3b8abb1
fix: Watch tab crawl state consistency (#2060)
- Fixes empty watch tab during some active states
- Disables exclusion and browser window editing while a crawl is
stopping
- Refactors frontend crawl states to match backend
2024-09-10 14:47:29 -07:00
sua yoo
b58d367da7
fix: Archived item navigation improvements (#2062)
- Replaces QA Review breadcrumbs with "Back" button that takes you back
to the archived item
- Crawl breadcrumbs always point to workflow
- Shows archived item icon next to item name to differentiate from
workflow
- Adds "Edit Workflow" button to "Crawl Settings"

---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2024-09-10 14:26:29 -07:00
sua yoo
99ed08656a
feat: Localization workflow improvements (#2069)
- Extracts translatable text strings in pre-commit hook
- Updates ternary pluralization to use `pluralOf` instead
- Generates XLIFF for Spanish
2024-09-10 14:15:26 -07:00
sua yoo
8ccd36b0a7
fix: Unstyled content visibility (#2070)
Fixes flash of unstyled content due to dynamic imports.
2024-09-10 13:32:15 -07:00
sua yoo
c01e3dd88b
feat: Improve UX of choosing new workflow crawl type (#2067)
Resolves https://github.com/webrecorder/browsertrix/issues/2066

### Changes
- Allows directly choosing new "Page List" or "Site Crawl" from
workflow list
- Reverts terminology introduced in
https://github.com/webrecorder/browsertrix/pull/2032
2024-09-09 16:42:47 -07:00
sua yoo
b4e34d1c3c
fix: Fix collection description (#2065)
Fixes https://github.com/webrecorder/browsertrix/issues/2064

### Changes

- Switches MDE library to one that supports shadow DOM
- Refactors collection components to btrix components
- Fixes collection detail not expanding and contracting correctly
2024-09-05 22:10:14 -07:00
sua yoo
4c36c80351
feat: Display scale as number of browser windows (#2057)
Resolves https://github.com/webrecorder/browsertrix/issues/2048

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2024-09-05 17:32:40 -07:00
Ilya Kreymer
b3c1195878 version: bump to 1.11.6 2024-09-05 17:31:10 -07:00
sua yoo
880e27370d
feat: Breadcrumb navigation (#2053)
- Adds breadcrumb navigation to all detail views, returning to correct
origin for workflow and collection items
- Refactors list page headers into layout utility
- Refactors crawl tab labels and renames "Files" tab to "WACZ Files"
2024-08-30 09:08:24 -07:00
sua yoo
e4107d0e76
fix: correct link to crawling defaults 2024-08-29 17:00:29 -07:00
sua yoo
eb2dab8ae0
fix: Update browser title (#2054)
Updates browser title when visiting the following pages:

- Superadmin dashboard
- Org top-level pages
- Account settings
2024-08-29 16:50:14 -07:00
sua yoo
988d9c9e2b
feat: Allow org admins to set default workflow configs (#2020)
- Adds new org settings tab for updating crawl details
- Refactors `workflow-editor` to move out utility functions
- Updates user guide on org settings

---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2024-08-29 16:49:22 -07:00
sua yoo
a44e9207ca
docs: Publish only on release or manual run (#2055) 2024-08-28 15:28:27 -07:00
sua yoo
ecac4f6939
docs: Reorganize user guide (#2050)
Reorganizes user guide to be more solutions-based

---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2024-08-28 09:50:42 -07:00
Ilya Kreymer
ea252e8da9 version: bump to 1.11.5 2024-08-27 10:00:53 -07:00
sua yoo
337454f8c9
feat: Add link to hosted sign-up page (#2045)
Resolves https://github.com/webrecorder/browsertrix/issues/2043

### Changes

- Shows link to sign up in UI if `sign_up_url` is configured.
- Expires settings in session storage (for now)
2024-08-26 17:26:25 -07:00
sua yoo
c0725599b2
fix: Help forum link not showing on mobile (#2049)
Fixes the "Help Forum" link text not showing on smaller screens.
2024-08-26 15:53:08 -07:00
sua yoo
d119e8fd77
chore: Lock yarn version to classic (#2047)
Enables installing the app with yarn 2+.
2024-08-26 15:30:59 -07:00
Ilya Kreymer
95969ec747
Attempt to auto-adjust storage if usage is running out while crawl is running (#2023)
Attempt to auto-adjust PVC storage if:
- used storage (as reported in redis by the crawler) * 2.5 >
total_storage
- will cause PVC to resize, if possible (not supported by all drivers)
- uses multiples of 1Gi, rounding up to next GB
- AVAIL_STORAGE_RATIO hard-coded to 2.5 for now, to account for 2x space
for WACZ plus change for fast updating crawls

Some caveats:
- only works if the storageClass used for PVCs has
`allowVolumeExpansion: true`, if not, it will have no effect
- designed as a last resort option: the `crawl_storage` in values and
`--sizeLimit` and `--diskUtilization` should generally result in this
not being needed.
- can be useful in cases where a crawl is rapidly capturing a lot of
content in one page, and there's no time to interrupt / restart, since
the other limits apply only at page end.
- May want to have crawler update the disk usage more frequently, not
just at page end to make this more effective.
2024-08-26 14:19:20 -07:00
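The resize rule in the commit above can be written as a small calculation. This is a sketch under stated assumptions: the commit says a resize is attempted when used storage times 2.5 exceeds total storage and that sizes are multiples of 1Gi rounded up, but it does not spell out the target size, so the sketch assumes the target is `used * ratio` rounded up to the next GiB. The function name `next_storage_size` is hypothetical, and treating a ratio of 0 as "disabled" follows the later follow-up commit (#2076).

```python
import math
from typing import Optional

GIB = 1024 ** 3  # one Gi in bytes


def next_storage_size(
    used_bytes: int, total_bytes: int, ratio: float = 2.5
) -> Optional[int]:
    """Return the new PVC size in bytes, or None if no resize is needed."""
    if ratio <= 0:
        # Auto-resize disabled.
        return None
    needed = used_bytes * ratio
    if needed <= total_bytes:
        return None
    # Assumption: grow to the needed size, rounded up to a whole GiB.
    return math.ceil(needed / GIB) * GIB
```

For example, with 13Gi used on a 30Gi volume, 13 * 2.5 = 32.5Gi exceeds 30Gi, so the sketch requests 33Gi; with 10Gi used, 25Gi fits and nothing changes.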
Ilya Kreymer
a1df689729
stats recompute fixes: (#2022)
- fix stats_recompute_last() and stats_recompute_all() to not update the
lastCrawl* properties of a crawl workflow if a crawl is running, as
those stats now point to the running crawl
- refactor _add_running_curr_crawl_stats() to make it clear stats only
updated if crawl is running
- stats_recompute_all() change order to ascending to actually get last
crawl, not first!
2024-08-26 14:18:59 -07:00
Ilya Kreymer
135c97419d version: update to 1.11.4 2024-08-26 12:31:56 -07:00
sua yoo
2a057eddd6
chore: Improve time to load org UI (#2044)
Improves time to first load an org with the following:
- Uses user info from login response to set org slug and route user on
log in
- Stores user info in session storage so that it's available on reload
- Stores app settings in local storage until user logs out
- Loads critical org components synchronously
2024-08-26 10:45:10 -07:00
Ilya Kreymer
96e393e80d
update crawler channel fix: add crawlerChannel to update check (#2046)
Add missing check for crawlerChannel update
2024-08-26 10:41:54 -04:00
sua yoo
acd3e1252d
feat: Add help shortcuts to app header & footer (#2040)
WIP for https://github.com/webrecorder/browsertrix/issues/2041

### Changes

- Adds button to open embedded support guide
- Adds link to help forum
- Refactors app bar to look nicer on smaller screens
2024-08-23 18:11:29 -07:00
Ilya Kreymer
04c8b50423
add crawling defaults on the Org to allow setting certain crawl workflow fields as defaults: (#2031)
- add POST /orgs/<id>/defaults/crawling API to update all defaults
(defaults unset are cleared)
- defaults returned as 'crawlingDefaults' object on Org, if set
- fixes #2016

---------

Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
2024-08-22 10:36:04 -07:00
sua yoo
0e16d526c0
fix: Hide login link on login page (#2039)
- Removes log in link when on log in page
- Fix e2e test, which wasn't actually logging the test user in before
2024-08-21 15:33:24 -07:00
sua yoo
25b1928d44
feat: Enable deleting workflow from list (#2042)
Adds back workflow list menu item to delete workflow if it's never been
run.
2024-08-21 15:33:00 -07:00
sua yoo
2ca9632057
feat: Add additional context around workflow job type options (#2032)
- Updates workflow job type copy and adds additional clarifying text
- Changes "List of URLs" label to "Crawl URL(s)"
- Refactors `NewWorkflowDialog` into tailwind element
2024-08-21 14:03:43 -07:00
sua yoo
3605d07547
fix: Make footer translatable (#2038)
- Wraps footer strings to prepare for localization
- Removes extraneous class names
- Updates copy button tooltip to match bug report field
2024-08-21 14:01:52 -07:00
Ilya Kreymer
86c9e538c1
quickfix: webhooks: ensure the 'crawl_reviewed' webhook is sent async, doesn't delay submitting a review (#2033)
call `create_crawl_reviewed_notification` with
create_task (similar to other user-initiated webhook events), to avoid
extra wait for webhook to complete
2024-08-20 17:50:18 -07:00
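The pattern in the commit above, scheduling the webhook with `create_task` instead of awaiting it inline, can be illustrated with plain `asyncio`. The functions `send_webhook` and `submit_review` are hypothetical stand-ins, and the short sleep simulates a slow HTTP call; only the use of `create_task` reflects the commit.

```python
import asyncio
from typing import List


async def send_webhook(event: str, log: List[str]) -> None:
    """Stand-in for a slow webhook delivery."""
    await asyncio.sleep(0.05)
    log.append(f"sent:{event}")


async def submit_review(log: List[str]) -> "asyncio.Task[None]":
    """Save the review and return without waiting for the webhook."""
    # Schedule the webhook in the background instead of awaiting it.
    task = asyncio.create_task(send_webhook("crawl_reviewed", log))
    log.append("review saved")
    # Keep a reference to the task so it is not garbage-collected early.
    return task


async def main() -> List[str]:
    log: List[str] = []
    task = await submit_review(log)
    await task  # only so the demo can observe the final ordering
    return log
```

Running `asyncio.run(main())` records the review as saved before the webhook is sent, which is the latency improvement the commit is after.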
sua yoo
7208888a1c
chore: remove console log 2024-08-20 17:34:47 -07:00
Emma Segal-Grossman
10640feeef
Add detailed permissions & permission summaries to user invite popup (#2003) 2024-08-20 20:34:29 -04:00
Emma Segal-Grossman
570dc10f2a
Properly pluralize "Pages" in QA, and display skeletons instead of incorrect fallback values (#2026) 2024-08-20 20:33:52 -04:00
Ilya Kreymer
8c9a14b6a2
Ensure Subscription Update doesn't update the gifted quotas (#2012)
- add a separate OrgQuotasIn where all quota updates are optional
- ensure gifted quotas are never updated as part of org update
- update tests
2024-08-20 13:15:03 -07:00
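The idea in the commit above, an input model where every quota is optional and gifted quotas are never overwritten by a subscription update, can be sketched with dictionaries. The quota names used in the example and the convention that gifted quotas share a `gifted` prefix are assumptions for illustration; `apply_quota_update` is a hypothetical helper, not the actual `OrgQuotasIn` handling.

```python
from typing import Dict, Optional

GIFTED_PREFIX = "gifted"  # assumption: gifted quota keys share this prefix


def apply_quota_update(
    current: Dict[str, int], update: Dict[str, Optional[int]]
) -> Dict[str, int]:
    """Merge a partial quota update, leaving unset and gifted quotas untouched."""
    merged = dict(current)
    for name, value in update.items():
        if value is None:
            # Optional field not provided: keep the current value.
            continue
        if name.startswith(GIFTED_PREFIX):
            # Gifted quotas are never changed by this path.
            continue
        merged[name] = value
    return merged
```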
sua yoo
351e92ae2f
fix: Prevent browser profile selection overflow (#2029)
- Truncates selected browser profile description and refreshes style
- Order browser profiles by modified date
2024-08-20 12:43:51 -07:00