Commit Graph

1632 Commits

Author SHA1 Message Date
sua yoo
59dff89614
chore: Update PR template headings (#2132) 2024-11-05 19:11:58 -08:00
Henry Wilkinson
43ebc1c94c
Update Readme to add Weblate information (#2109)
- Updates README file with a link to our Weblate project
- Fixes minor out of date text bits

Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
2024-10-31 16:36:12 -04:00
Henry Wilkinson
74161f8477
Update Webrecorder.net links (#2120)
- Updates documentation links to point to new Browsertrix landing page
- Updates redoc links
2024-10-31 16:33:54 -04:00
Tessa Walsh
55a758f342
Consolidate ops class initialization (#2117)
Fixes #2111

The background job and operator entrypoints now use a shared function
that initalizes and returns the ops classes. This is not applied in the
main entrypoint as that also initializes the backend API, which we don't
want in the other entrypoints.
2024-10-30 15:33:22 -04:00
Tessa Walsh
0dc025e9fd
Update nightly org deletion tests to account for bg job (#2118)
Follow-up to https://github.com/webrecorder/browsertrix/pull/2098

Updates I missed to nightly org deletion tests following the shift to
deleting orgs in a background job. I think this should be the last thing
to get nightly tests passing consistently again.
2024-10-30 15:31:33 -04:00
Tessa Walsh
3ea20e538d
Fix nightly tests: Add boto3 as test requirement (#2116) 2024-10-23 13:41:22 -07:00
Tessa Walsh
f7426cc46a
Fix nightly tests: modify kubectl exec syntax for creating new minio bucket (#2097)
Fixes #2096

For example failing test run, see:
https://github.com/webrecorder/browsertrix/actions/runs/11121185534/job/30899729448

---------
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
2024-10-21 17:41:19 -07:00
Tessa Walsh
1b1819ba5a
Move org deletion to background job with access to backend ops classes (#2098)
This PR introduces background jobs that have full access to the backend
ops classes and moves the delete org job to a background job.
2024-10-10 14:41:05 -04:00
Ilya Kreymer
84a74c43a4 version: bump to 1.13.0-beta.0 2024-10-10 11:38:13 -07:00
Ilya Kreymer
6032e28231
fix: firstOrgAdmin being set to true even if invite was not for an admin (#2110)
Non-admin users should not be given option to rename org when invited to
a new org:
- set firstOrgAdmin to true only when invite is for an admin
- default to false instead of null
- update tests to check
2024-10-08 16:42:30 -07:00
Ilya Kreymer
c33f749515
Frontend hosted-docs (#2107)
Fixes #2106 

Docs are now hosted as part of the frontend at `/docs` by default.

- If `docs_url` is set in the helm chart, the `/docs` endpoint will
redirect to that endpoint instead
- Use multi-stage python image to build mkdocs as part of frontend, then
copy static output
- Dir layout: mkdocs.yml and docs into frontend/docs
- CI: Update docs build GH action to use new path
- Update all frontend paths to use `/docs/` instead of
`https://docs.browsertrix.com/`

---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2024-10-08 14:56:34 -07:00
Ilya Kreymer
d12ce9a591
sidebar setup guide fixes: (#2105)
- ensure user-guide iframe is always visible if not fading in when
'Setup Guide' button is clicked
- Clicking 'Setup Guide' scrolls to relevant section of guide (scope,
limits, browser settings, scheduling, metadata) based on current
workflow editor tab via hashtag

---------

Co-authored-by: sua yoo <sua@webrecorder.org>
2024-10-07 22:14:46 -07:00
Ilya Kreymer
dde5056242
quickfix: add a flag to avoid clearing primarySeedUrl when switching scope types (#2104)
otherwise, triggers a partial state update, which causes a form to be
reset.
2024-10-07 13:05:23 -07:00
Ilya Kreymer
406879f2f4
ui: fix missing click handler on QA review 'cancel' button (#2103)
quick fix: add a handler to close the dialog as expected
2024-10-04 17:35:26 -07:00
Ilya Kreymer
8192e5bed6 version: bump to 1.12.0 2024-10-03 16:45:54 -07:00
Ilya Kreymer
104ea097c4
switch to simpler streaming download + multiwacz metadata improvements: (#1982)
- download via presigned URLs via requests instead of boto APIs, remove boto
- follow-up to #1933 for streaming download improvements
- fixes datapackage.json in multi-wacz to contain the same resources
objects with: `name`, `path`, `hash`, `bytes` to match single WACZ.
- Add additional metadata to multi-wacz datapackage.json, including `type`
(`crawl`, `upload`, `collection`, `qaRun`), `id` (unique id for the
object), `title` / `description` if available (for
crawl/upload/collection), and `crawlId` for `qaRun`
2024-10-03 16:13:31 -07:00
Henry Wilkinson
2429bb620c
Docs: Proxy docs copy editing (#2102)
### Changes
- Minor copy edits and sentence restructures
- Markdown formatting

---------

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2024-10-03 12:18:05 -04:00
Ilya Kreymer
00efb3615b
Stopped crawls should not show stop/cancel buttons (#2101)
- crawler isActive() should only look at running states, not 'stopping',
since stopping may not be reset to false after crawl is stopped
(probably should reset, but also a record that the crawl ended as being
stopped). crawl is guaranteed to remain in one of the active states
while stopping on the backend
- fixes #2100
2024-10-02 23:28:04 -07:00
Emma Segal-Grossman
87cc38b737
[Chore] Webpack: Migrate from onBeforeSetupMiddleware to setupMiddlewares (#2080)
Resolves the big red
`[DEP_WEBPACK_DEV_SERVER_ON_BEFORE_SETUP_MIDDLEWARE] DeprecationWarning:
'onBeforeSetupMiddleware' option is deprecated. Please use the
'setupMiddlewares' option.` warnings when starting Webpack.

---------

Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2024-10-02 19:35:32 -07:00
Emma Segal-Grossman
adec41bd70
[Chore] Update browserlist db (#2081)
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2024-10-02 19:28:37 -07:00
sua yoo
22435ddaf9
Save new workflow scope type preference (#2099)
Resolves https://github.com/webrecorder/browsertrix/issues/2091

<!-- Fixes #issue_number -->

### Changes

Saves scope type chosen from "+ New Workflow" dropdown to local storage, as well as from within workflow editor when creating a new workflow (but not editing an existing one).
2024-10-02 19:08:20 -07:00
Vinzenz Sinapius
bb6e703f6a
Configure browsertrix proxies (#1847)
Resolves #1354

Supports crawling through pre-configured proxy servers, allowing users to select which proxy servers to use (requires browsertrix crawler 1.3+)

Config:
- proxies defined in btrix-proxies subchart
- can be configured via btrix-proxies key or separate proxies.yaml file via separate subchart
- proxies list refreshed automatically if crawler_proxies.json changes if subchart is deployed
- support for ssh and socks5 proxies
- proxy keys added to secrets in subchart
- support for default proxy to be always used if no other proxy configured, prevent starting cluster if default proxy not available
- prevent starting manual crawl if previously configured proxy is no longer available, return error
- force 'btrix' username and group name on browsertrix-crawler non-root user to support ssh

Operator:
- support crawling through proxies, pass proxyId in CrawlJob
- support running profile browsers which designated proxy, pass proxyId to ProfileJob
- prevent starting scheduled crawl if previously configured proxy is no longer available

API / Access:
- /api/orgs/all/crawlconfigs/crawler-proxies - get all proxies (superadmin only)
- /api/orgs/{oid}/crawlconfigs/crawler-proxies - get proxies available to particular org
- /api/orgs/{oid}/proxies - update allowed proxies for particular org (superadmin only)
- superadmin can configure which orgs can use which proxies, stored on the org
- superadmin can also allow an org to access all 'shared' proxies, to avoid having to allow a shared proxy on each org.

UI:
- Superadmin has 'Edit Proxies' dialog to configure for each org if it has: dedicated proxies, has access to shared proxies.
- User can select a proxy in Crawl Workflow browser settings
- Users can choose to launch a browser profile with a particular proxy
- Display which proxy is used to create profile in profile selector
- Users can choose with default proxy to use for new workflows in Crawling Defaults

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2024-10-02 18:35:45 -07:00
sua yoo
08aa2f86f3
chore: Auto-commit extracted localization strings (#2089)
Removes `localize:extract` from pre-commit hook and commits changes from
`localize:extract` in frontend PR build check.
2024-09-30 10:48:13 -07:00
sua yoo
612bbb6f42
feat: Merge workflow job types (#2068)
Resolves https://github.com/webrecorder/browsertrix/issues/2073

### Changes

- Removes "URL List" and "Seeded Crawl" job type distinction and adds as
additional crawl scope types instead.
- 'New Workflow' button defaults to Single Page
- 'New Workflow' dropdown includes Page Crawl (Single Page, Page List, In-Page Links) and Site Crawl (Page in Same Directory, Page on Same Domain, + Subdomains and Custom Page Prefix)
- Enables specifying `DOCS_URL` in `.env`
- Additional follow-ups in #2090, #2091
2024-09-25 10:37:18 -04:00
Ilya Kreymer
62da0fbd6c
security: tweak get /invite endpoints / InviteOut to: (#2087)
don't set inviterEmail / inviterName if the inviter is the superuser:
- return fromSuperuser true/false
- if fromSuperuser, don't set inviterEmail / inviterName
- tests: add tests for non-superuser admin invites
2024-09-20 11:52:56 -07:00
Vinzenz Sinapius
a674689354
Update ansible pipfile (#2088)
Fixes some dependabot alerts
2024-09-20 11:41:21 -07:00
Ilya Kreymer
feb6b1f26c
Ensure email comparisons are case-insensitive, emails stored as lowercase (#2084) (#2086) (fixes from 1.11.7)
- Add a custom EmailStr type which lowercases the full e-mail, not just
the domain.
- Ensure EmailStr is used throughout wherever e-mails are used, both for
invites and user models
- Tests: update to check for lowercase email responses, e-mails returned
from APIs are always lowercase
- Tests: remove tests where '@' was ur-lencoded, should not be possible
since POSTing JSON and no url-decoding is done/expected. E-mails should
have '@' present.
- Fixes #2083 where invites were rejected due to case differences
- CI: pin pymongo dependency due to latest releases update, update python used for CI
2024-09-19 12:20:34 -07:00
sua yoo
a8f4f8cfc3
docs: Clarify hosted vs. self-deployment requirements (#2082)
Updates docs to clarify difference between self-hosting and hosted
subscription.

---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2024-09-18 13:43:09 -07:00
Emma Segal-Grossman
9a799cc8ab
Ensure that CI fails if extracted strings don't match (#2078)
- Ensures extracted strings get formatted before checking against index
- Fixes index check by switching from `git diff-index` to `git diff`,
and ensures the proper `--exit-code` flag is present (implicitly turned
on by `--quiet`)
- Adds actionable error message when the check fails
- Updates Github's actions versions from v3 to v4 (major version bump is
primarily just for default node version updates, but this way we'll get
future updates)
- Adds formatting step to npm script for extracting messages
- Runs a string extraction & format against current main
2024-09-16 16:48:33 -04:00
Tessa Walsh
123705c53f
Serialize datetimes with Z suffix (#2058)
Use timezone aware datetimes instead of timezone naive datetimes:
- Update mongodb client to use tz-aware conversion
- Convert dt_now() to return timezone aware UTC date
- Rename to_k8s_date -> date_to_str, just returns ISO UTC date with 'Z'
(instead of '+00:00' suffix)
- Rename from_k8s_date -> str_to_date, returns timezone aware date from
str
- Standardize all string<->date conversion to use either date_to_str or
str_to_date
- Update frontend to assume iso date, not append 'Z' directly
- Update tests to check for 'Z' suffix on some dates

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2024-09-12 16:16:13 -07:00
Ilya Kreymer
c242bb96d2 version: bump to 1.12.0-beta.0 2024-09-12 14:30:15 -07:00
Ilya Kreymer
1f919de294
Allow custom auto-resize crawler volume ratio adjustable (#2076)
Make the avail / used storage ratio (for crawler volumes) adjustable.
Disable auto-resize if set to 0.
Follow-up to #2023
2024-09-12 09:28:19 -07:00
sua yoo
49ce894353
Merge pull request #2075 from weblate/weblate-browsertrix-browsertrix-ui
Translations update from Hosted Weblate
2024-09-11 13:25:04 -07:00
Webrecorder Dev
2afd2b992d
Translated using Weblate (Spanish)
Currently translated at 1.1% (14 of 1213 strings)

Translation: Browsertrix/Browsertrix UI
Translate-URL: https://hosted.weblate.org/projects/browsertrix/browsertrix-ui/es/
2024-09-11 20:06:49 +02:00
sua yoo
f91e32f866
chore: Format XLIFF files (#2074)
Formats XLIFF file to match Weblate per
https://github.com/webrecorder/browsertrix/issues/1416#issuecomment-2344227966
2024-09-11 10:46:39 -07:00
sua yoo
d3ed78575d
chore: Revert frontend build check trigger (#2071)
Reverts `push` trigger added to frontend build check in
99ed08656a
now that we require PRs to merge.
2024-09-10 16:44:36 -07:00
sua yoo
55f3b8abb1
fix: Watch tab crawl state consistency (#2060)
- Fixes empty watch tab during some active states
- Disables exclusion and browser window editing while a crawl is
stopping
- Refactors frontend crawl states to match backend
2024-09-10 14:47:29 -07:00
sua yoo
b58d367da7
fix: Archived item navigation improvements (#2062)
- Replaces QA Review breadcrumbs with "Back" button that takes you back
to the archived item
- Crawl breadcrumbs always point to workflow
- Shows archived item icon next to item name to differentiate from
workflow
- Adds "Edit Workflow" button to "Crawl Settings"

---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2024-09-10 14:26:29 -07:00
sua yoo
99ed08656a
feat: Localization workflow improvements (#2069)
- Extracts translatable text strings in pre-commit hook
- Updates ternary pluralization to use `pluralOf` instead
- Generates XLIFF for Spanish
2024-09-10 14:15:26 -07:00
sua yoo
8ccd36b0a7
fix: Unstyled content visibility (#2070)
Fixes flash of unstyled content due to dynamic imports.
2024-09-10 13:32:15 -07:00
sua yoo
c01e3dd88b
feat: Improve UX of choosing new workflow crawl type (#2067)
Resolves https://github.com/webrecorder/browsertrix/issues/2066

### Changes
- Allows directly choosing new "Page List" or "Site Crawl from
workflow list
- Reverts terminology introduced in
https://github.com/webrecorder/browsertrix/pull/2032
2024-09-09 16:42:47 -07:00
sua yoo
b4e34d1c3c
fix: Fix collection description (#2065)
Fixes https://github.com/webrecorder/browsertrix/issues/2064

### Changes

- Switches MDE library to one that supports shadow DOM
- Refactors collection components to btrix components
- Fixes collection detail not expanding and contracting correctly
2024-09-05 22:10:14 -07:00
sua yoo
4c36c80351
feat: Display scale as number of browser windows (#2057)
Resolves https://github.com/webrecorder/browsertrix/issues/2048

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2024-09-05 17:32:40 -07:00
Ilya Kreymer
b3c1195878 version: bump to 1.11.6 2024-09-05 17:31:10 -07:00
sua yoo
880e27370d
feat: Breadcrumb navigation (#2053)
- Adds breadcrumb navigation to all detail views, returning to correct
origin for workflow and collection items
- Refactors list page headers into layout utility
- Refactors crawl tab labels and renames "Files" tab to "WACZ Files"
2024-08-30 09:08:24 -07:00
sua yoo
e4107d0e76
fix: correct link to crawlilng defaults 2024-08-29 17:00:29 -07:00
sua yoo
eb2dab8ae0
fix: Update browser title (#2054)
Updates browser title when visiting the following pages:

- Superadmin dashboard
- Org top-level pages
- Account settings
2024-08-29 16:50:14 -07:00
sua yoo
988d9c9e2b
feat: Allow org admins to set default workflow configs (#2020)
- Adds new org settings tab for updating crawl details
- Refactors `workflow-editor` to move out utility functions
- Updates user guide on org settings

---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2024-08-29 16:49:22 -07:00
sua yoo
a44e9207ca
docs: Publish only on release or manual run (#2055) 2024-08-28 15:28:27 -07:00
sua yoo
ecac4f6939
docs: Reorganize user guide (#2050)
Reorganizes user guide to be more solutions based

---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2024-08-28 09:50:42 -07:00