Commit Graph

1575 Commits

Author SHA1 Message Date
Tessa Walsh
1b1819ba5a
Move org deletion to background job with access to backend ops classes (#2098)
This PR introduces background jobs that have full access to the backend
ops classes and moves the delete org job to a background job.
2024-10-10 14:41:05 -04:00
Ilya Kreymer
84a74c43a4 version: bump to 1.13.0-beta.0 2024-10-10 11:38:13 -07:00
Ilya Kreymer
6032e28231
fix: firstOrgAdmin being set to true even if invite was not for an admin (#2110)
Non-admin users should not be given option to rename org when invited to
a new org:
- set firstOrgAdmin to true only when invite is for an admin
- default to false instead of null
- update tests to check
2024-10-08 16:42:30 -07:00
Ilya Kreymer
c33f749515
Frontend hosted-docs (#2107)
Fixes #2106 

Docs are now hosted as part of the frontend at `/docs` by default.

- If `docs_url` is set in the helm chart, the `/docs` endpoint will
redirect to that endpoint instead
- Use multi-stage python image to build mkdocs as part of frontend, then
copy static output
- Dir layout: mkdocs.yml and docs into frontend/docs
- CI: Update docs build GH action to use new path
- Update all frontend paths to use `/docs/` instead of
`https://docs.browsertrix.com/`

---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2024-10-08 14:56:34 -07:00
Ilya Kreymer
d12ce9a591
sidebar setup guide fixes: (#2105)
- ensure user-guide iframe is always visible if not fading in when
'Setup Guide' button is clicked
- Clicking 'Setup Guide' scrolls to relevant section of guide (scope,
limits, browser settings, scheduling, metadata) based on current
workflow editor tab via hashtag

---------

Co-authored-by: sua yoo <sua@webrecorder.org>
2024-10-07 22:14:46 -07:00
Ilya Kreymer
dde5056242
quickfix: add a flag to avoid clearing primarySeedUrl when switching scope types (#2104)
otherwise, triggers a partial state update, which causes a form to be
reset.
2024-10-07 13:05:23 -07:00
Ilya Kreymer
406879f2f4
ui: fix missing click handler on QA review 'cancel' button (#2103)
quick fix: add a handler to close the dialog as expected
2024-10-04 17:35:26 -07:00
Ilya Kreymer
8192e5bed6 version: bump to 1.12.0 2024-10-03 16:45:54 -07:00
Ilya Kreymer
104ea097c4
switch to simpler streaming download + multiwacz metadata improvements: (#1982)
- download via presigned URLs via requests instead of boto APIs, remove boto
- follow-up to #1933 for streaming download improvements
- fixes datapackage.json in multi-wacz to contain the same resources
objects with: `name`, `path`, `hash`, `bytes` to match single WACZ.
- Add additional metadata to multi-wacz datapackage.json, including `type`
(`crawl`, `upload`, `collection`, `qaRun`), `id` (unique id for the
object), `title` / `description` if available (for
crawl/upload/collection), and `crawlId` for `qaRun`
2024-10-03 16:13:31 -07:00
Henry Wilkinson
2429bb620c
Docs: Proxy docs copy editing (#2102)
### Changes
- Minor copy edits and sentence restructures
- Markdown formatting

---------

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2024-10-03 12:18:05 -04:00
Ilya Kreymer
00efb3615b
Stopped crawls should not show stop/cancel buttons (#2101)
- crawler isActive() should only look at running states, not 'stopping',
since stopping may not be reset to false after crawl is stopped
(probably should reset, but also a record that the crawl ended as being
stopped). crawl is guaranteed to remain in one of the active states
while stopping on the backend
- fixes #2100
2024-10-02 23:28:04 -07:00
Emma Segal-Grossman
87cc38b737
[Chore] Webpack: Migrate from onBeforeSetupMiddleware to setupMiddlewares (#2080)
Resolves the big red
`[DEP_WEBPACK_DEV_SERVER_ON_BEFORE_SETUP_MIDDLEWARE] DeprecationWarning:
'onBeforeSetupMiddleware' option is deprecated. Please use the
'setupMiddlewares' option.` warnings when starting Webpack.

---------

Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2024-10-02 19:35:32 -07:00
Emma Segal-Grossman
adec41bd70
[Chore] Update browserlist db (#2081)
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2024-10-02 19:28:37 -07:00
sua yoo
22435ddaf9
Save new workflow scope type preference (#2099)
Resolves https://github.com/webrecorder/browsertrix/issues/2091

<!-- Fixes #issue_number -->

### Changes

Saves scope type chosen from "+ New Workflow" dropdown to local storage, as well as from within workflow editor when creating a new workflow (but not editing an existing one).
2024-10-02 19:08:20 -07:00
Vinzenz Sinapius
bb6e703f6a
Configure browsertrix proxies (#1847)
Resolves #1354

Supports crawling through pre-configured proxy servers, allowing users to select which proxy servers to use (requires browsertrix crawler 1.3+)

Config:
- proxies defined in btrix-proxies subchart
- can be configured via btrix-proxies key or separate proxies.yaml file via separate subchart
- proxies list refreshed automatically if crawler_proxies.json changes if subchart is deployed
- support for ssh and socks5 proxies
- proxy keys added to secrets in subchart
- support for default proxy to be always used if no other proxy configured, prevent starting cluster if default proxy not available
- prevent starting manual crawl if previously configured proxy is no longer available, return error
- force 'btrix' username and group name on browsertrix-crawler non-root user to support ssh

Operator:
- support crawling through proxies, pass proxyId in CrawlJob
- support running profile browsers which designated proxy, pass proxyId to ProfileJob
- prevent starting scheduled crawl if previously configured proxy is no longer available

API / Access:
- /api/orgs/all/crawlconfigs/crawler-proxies - get all proxies (superadmin only)
- /api/orgs/{oid}/crawlconfigs/crawler-proxies - get proxies available to particular org
- /api/orgs/{oid}/proxies - update allowed proxies for particular org (superadmin only)
- superadmin can configure which orgs can use which proxies, stored on the org
- superadmin can also allow an org to access all 'shared' proxies, to avoid having to allow a shared proxy on each org.

UI:
- Superadmin has 'Edit Proxies' dialog to configure for each org if it has: dedicated proxies, has access to shared proxies.
- User can select a proxy in Crawl Workflow browser settings
- Users can choose to launch a browser profile with a particular proxy
- Display which proxy is used to create profile in profile selector
- Users can choose with default proxy to use for new workflows in Crawling Defaults

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2024-10-02 18:35:45 -07:00
sua yoo
08aa2f86f3
chore: Auto-commit extracted localization strings (#2089)
Removes `localize:extract` from pre-commit hook and commits changes from
`localize:extract` in frontend PR build check.
2024-09-30 10:48:13 -07:00
sua yoo
612bbb6f42
feat: Merge workflow job types (#2068)
Resolves https://github.com/webrecorder/browsertrix/issues/2073

### Changes

- Removes "URL List" and "Seeded Crawl" job type distinction and adds as
additional crawl scope types instead.
- 'New Workflow' button defaults to Single Page
- 'New Workflow' dropdown includes Page Crawl (Single Page, Page List, In-Page Links) and Site Crawl (Page in Same Directory, Page on Same Domain, + Subdomains and Custom Page Prefix)
- Enables specifying `DOCS_URL` in `.env`
- Additional follow-ups in #2090, #2091
2024-09-25 10:37:18 -04:00
Ilya Kreymer
62da0fbd6c
security: tweak get /invite endpoints / InviteOut to: (#2087)
don't set inviterEmail / inviterName if the inviter is the superuser:
- return fromSuperuser true/false
- if fromSuperuser, don't set inviterEmail / inviterName
- tests: add tests for non-superuser admin invites
2024-09-20 11:52:56 -07:00
Vinzenz Sinapius
a674689354
Update ansible pipfile (#2088)
Fixes some dependabot alerts
2024-09-20 11:41:21 -07:00
Ilya Kreymer
feb6b1f26c
Ensure email comparisons are case-insensitive, emails stored as lowercase (#2084) (#2086) (fixes from 1.11.7)
- Add a custom EmailStr type which lowercases the full e-mail, not just
the domain.
- Ensure EmailStr is used throughout wherever e-mails are used, both for
invites and user models
- Tests: update to check for lowercase email responses, e-mails returned
from APIs are always lowercase
- Tests: remove tests where '@' was ur-lencoded, should not be possible
since POSTing JSON and no url-decoding is done/expected. E-mails should
have '@' present.
- Fixes #2083 where invites were rejected due to case differences
- CI: pin pymongo dependency due to latest releases update, update python used for CI
2024-09-19 12:20:34 -07:00
sua yoo
a8f4f8cfc3
docs: Clarify hosted vs. self-deployment requirements (#2082)
Updates docs to clarify difference between self-hosting and hosted
subscription.

---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2024-09-18 13:43:09 -07:00
Emma Segal-Grossman
9a799cc8ab
Ensure that CI fails if extracted strings don't match (#2078)
- Ensures extracted strings get formatted before checking against index
- Fixes index check by switching from `git diff-index` to `git diff`,
and ensures the proper `--exit-code` flag is present (implicitly turned
on by `--quiet`)
- Adds actionable error message when the check fails
- Updates Github's actions versions from v3 to v4 (major version bump is
primarily just for default node version updates, but this way we'll get
future updates)
- Adds formatting step to npm script for extracting messages
- Runs a string extraction & format against current main
2024-09-16 16:48:33 -04:00
Tessa Walsh
123705c53f
Serialize datetimes with Z suffix (#2058)
Use timezone aware datetimes instead of timezone naive datetimes:
- Update mongodb client to use tz-aware conversion
- Convert dt_now() to return timezone aware UTC date
- Rename to_k8s_date -> date_to_str, just returns ISO UTC date with 'Z'
(instead of '+00:00' suffix)
- Rename from_k8s_date -> str_to_date, returns timezone aware date from
str
- Standardize all string<->date conversion to use either date_to_str or
str_to_date
- Update frontend to assume iso date, not append 'Z' directly
- Update tests to check for 'Z' suffix on some dates

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2024-09-12 16:16:13 -07:00
Ilya Kreymer
c242bb96d2 version: bump to 1.12.0-beta.0 2024-09-12 14:30:15 -07:00
Ilya Kreymer
1f919de294
Allow custom auto-resize crawler volume ratio adjustable (#2076)
Make the avail / used storage ratio (for crawler volumes) adjustable.
Disable auto-resize if set to 0.
Follow-up to #2023
2024-09-12 09:28:19 -07:00
sua yoo
49ce894353
Merge pull request #2075 from weblate/weblate-browsertrix-browsertrix-ui
Translations update from Hosted Weblate
2024-09-11 13:25:04 -07:00
Webrecorder Dev
2afd2b992d
Translated using Weblate (Spanish)
Currently translated at 1.1% (14 of 1213 strings)

Translation: Browsertrix/Browsertrix UI
Translate-URL: https://hosted.weblate.org/projects/browsertrix/browsertrix-ui/es/
2024-09-11 20:06:49 +02:00
sua yoo
f91e32f866
chore: Format XLIFF files (#2074)
Formats XLIFF file to match Weblate per
https://github.com/webrecorder/browsertrix/issues/1416#issuecomment-2344227966
2024-09-11 10:46:39 -07:00
sua yoo
d3ed78575d
chore: Revert frontend build check trigger (#2071)
Reverts `push` trigger added to frontend build check in
99ed08656a
now that we require PRs to merge.
2024-09-10 16:44:36 -07:00
sua yoo
55f3b8abb1
fix: Watch tab crawl state consistency (#2060)
- Fixes empty watch tab during some active states
- Disables exclusion and browser window editing while a crawl is
stopping
- Refactors frontend crawl states to match backend
2024-09-10 14:47:29 -07:00
sua yoo
b58d367da7
fix: Archived item navigation improvements (#2062)
- Replaces QA Review breadcrumbs with "Back" button that takes you back
to the archived item
- Crawl breadcrumbs always point to workflow
- Shows archived item icon next to item name to differentiate from
workflow
- Adds "Edit Workflow" button to "Crawl Settings"

---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2024-09-10 14:26:29 -07:00
sua yoo
99ed08656a
feat: Localization workflow improvements (#2069)
- Extracts translatable text strings in pre-commit hook
- Updates ternary pluralization to use `pluralOf` instead
- Generates XLIFF for Spanish
2024-09-10 14:15:26 -07:00
sua yoo
8ccd36b0a7
fix: Unstyled content visibility (#2070)
Fixes flash of unstyled content due to dynamic imports.
2024-09-10 13:32:15 -07:00
sua yoo
c01e3dd88b
feat: Improve UX of choosing new workflow crawl type (#2067)
Resolves https://github.com/webrecorder/browsertrix/issues/2066

### Changes
- Allows directly choosing new "Page List" or "Site Crawl from
workflow list
- Reverts terminology introduced in
https://github.com/webrecorder/browsertrix/pull/2032
2024-09-09 16:42:47 -07:00
sua yoo
b4e34d1c3c
fix: Fix collection description (#2065)
Fixes https://github.com/webrecorder/browsertrix/issues/2064

### Changes

- Switches MDE library to one that supports shadow DOM
- Refactors collection components to btrix components
- Fixes collection detail not expanding and contracting correctly
2024-09-05 22:10:14 -07:00
sua yoo
4c36c80351
feat: Display scale as number of browser windows (#2057)
Resolves https://github.com/webrecorder/browsertrix/issues/2048

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2024-09-05 17:32:40 -07:00
Ilya Kreymer
b3c1195878 version: bump to 1.11.6 2024-09-05 17:31:10 -07:00
sua yoo
880e27370d
feat: Breadcrumb navigation (#2053)
- Adds breadcrumb navigation to all detail views, returning to correct
origin for workflow and collection items
- Refactors list page headers into layout utility
- Refactors crawl tab labels and renames "Files" tab to "WACZ Files"
2024-08-30 09:08:24 -07:00
sua yoo
e4107d0e76
fix: correct link to crawlilng defaults 2024-08-29 17:00:29 -07:00
sua yoo
eb2dab8ae0
fix: Update browser title (#2054)
Updates browser title when visiting the following pages:

- Superadmin dashboard
- Org top-level pages
- Account settings
2024-08-29 16:50:14 -07:00
sua yoo
988d9c9e2b
feat: Allow org admins to set default workflow configs (#2020)
- Adds new org settings tab for updating crawl details
- Refactors `workflow-editor` to move out utility functions
- Updates user guide on org settings

---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2024-08-29 16:49:22 -07:00
sua yoo
a44e9207ca
docs: Publish only on release or manual run (#2055) 2024-08-28 15:28:27 -07:00
sua yoo
ecac4f6939
docs: Reorganize user guide (#2050)
Reorganizes user guide to be more solutions based

---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2024-08-28 09:50:42 -07:00
Ilya Kreymer
ea252e8da9 version: bump to 1.11.5 2024-08-27 10:00:53 -07:00
sua yoo
337454f8c9
feat: Add link to hosted sign-up page (#2045)
Resolves https://github.com/webrecorder/browsertrix/issues/2043

<!-- Fixes #issue_number -->

### Changes

- Shows link to sign up in UI if `sign_up_url` is configured.
- Expires settings in session storage (for now)
2024-08-26 17:26:25 -07:00
sua yoo
c0725599b2
fix: Help forum link not showing on mobile (#2049)
Fixes the "Help Forum" link text not showing on smaller screens.
2024-08-26 15:53:08 -07:00
sua yoo
d119e8fd77
chore: Lock yarn version to classic (#2047)
Enables installing the app with yarn 2+.
2024-08-26 15:30:59 -07:00
Ilya Kreymer
95969ec747
Attempt to auto-adjust storage if usage is running out while crawl is running (#2023)
Attempt to auto-adjust PVC storage if:
- used storage (as reported in redis by the crawler) * 2.5 >
total_storage
- will cause PVC to resize, if possible (not supported by all drivers)
- uses multiples of 1Gi, rounding up to next GB
- AVAIL_STORAGE_RATIO hard-coded to 2.5 for now, to account for 2x space
for WACZ plus change for fast updating crawls

Some caveats:
- only works if the storageClass used for PVCs has
`allowVolumeExpansion: true`, if not, it will have no effect
- designed as a last resort option: the `crawl_storage` in values and
`--sizeLimit` and `--diskUtilization` should generally result in this
not being needed.
- can be useful in cases where a crawl is rapidly capturing a lot of
content in one page, and there's no time to interrupt / restart, since
the other limits apply only at page end.
- May want to have crawler update the disk usage more frequently, not
just at page end to make this more effective.
2024-08-26 14:19:20 -07:00
Ilya Kreymer
a1df689729
stats recompute fixes: (#2022)
- fix stats_recompute_last() and stats_recompute_all() to not update the
lastCrawl* properties of a crawl workflow if a crawl is running, as
those stats now point to the running crawl
- refactor _add_running_curr_crawl_stats() to make it clear stats only
updated if crawl is running
- stats_recompute_all() change order to ascending to actually get last
crawl, not first!
2024-08-26 14:18:59 -07:00
Ilya Kreymer
135c97419d version: update to 1.11.4 2024-08-26 12:31:56 -07:00