Commit Graph

629 Commits

Author SHA1 Message Date
Ilya Kreymer
dde5056242
quickfix: add a flag to avoid clearing primarySeedUrl when switching scope types (#2104)
otherwise, triggers a partial state update, which causes a form to be
reset.
2024-10-07 13:05:23 -07:00
Ilya Kreymer
406879f2f4
ui: fix missing click handler on QA review 'cancel' button (#2103)
quick fix: add a handler to close the dialog as expected
2024-10-04 17:35:26 -07:00
Ilya Kreymer
00efb3615b
Stopped crawls should not show stop/cancel buttons (#2101)
- crawler isActive() should only look at running states, not 'stopping',
since stopping may not be reset to false after crawl is stopped
(probably should reset, but also a record that the crawl ended as being
stopped). crawl is guaranteed to remain in one of the active states
while stopping on the backend
- fixes #2100
2024-10-02 23:28:04 -07:00
sua yoo
22435ddaf9
Save new workflow scope type preference (#2099)
Resolves https://github.com/webrecorder/browsertrix/issues/2091

<!-- Fixes #issue_number -->

### Changes

Saves scope type chosen from "+ New Workflow" dropdown to local storage, as well as from within workflow editor when creating a new workflow (but not editing an existing one).
2024-10-02 19:08:20 -07:00
Vinzenz Sinapius
bb6e703f6a
Configure browsertrix proxies (#1847)
Resolves #1354

Supports crawling through pre-configured proxy servers, allowing users to select which proxy servers to use (requires browsertrix crawler 1.3+)

Config:
- proxies defined in btrix-proxies subchart
- can be configured via btrix-proxies key or separate proxies.yaml file via separate subchart
- proxies list refreshed automatically if crawler_proxies.json changes if subchart is deployed
- support for ssh and socks5 proxies
- proxy keys added to secrets in subchart
- support for default proxy to be always used if no other proxy configured, prevent starting cluster if default proxy not available
- prevent starting manual crawl if previously configured proxy is no longer available, return error
- force 'btrix' username and group name on browsertrix-crawler non-root user to support ssh

Operator:
- support crawling through proxies, pass proxyId in CrawlJob
- support running profile browsers which designated proxy, pass proxyId to ProfileJob
- prevent starting scheduled crawl if previously configured proxy is no longer available

API / Access:
- /api/orgs/all/crawlconfigs/crawler-proxies - get all proxies (superadmin only)
- /api/orgs/{oid}/crawlconfigs/crawler-proxies - get proxies available to particular org
- /api/orgs/{oid}/proxies - update allowed proxies for particular org (superadmin only)
- superadmin can configure which orgs can use which proxies, stored on the org
- superadmin can also allow an org to access all 'shared' proxies, to avoid having to allow a shared proxy on each org.

UI:
- Superadmin has 'Edit Proxies' dialog to configure for each org if it has: dedicated proxies, has access to shared proxies.
- User can select a proxy in Crawl Workflow browser settings
- Users can choose to launch a browser profile with a particular proxy
- Display which proxy is used to create profile in profile selector
- Users can choose with default proxy to use for new workflows in Crawling Defaults

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2024-10-02 18:35:45 -07:00
sua yoo
612bbb6f42
feat: Merge workflow job types (#2068)
Resolves https://github.com/webrecorder/browsertrix/issues/2073

### Changes

- Removes "URL List" and "Seeded Crawl" job type distinction and adds as
additional crawl scope types instead.
- 'New Workflow' button defaults to Single Page
- 'New Workflow' dropdown includes Page Crawl (Single Page, Page List, In-Page Links) and Site Crawl (Page in Same Directory, Page on Same Domain, + Subdomains and Custom Page Prefix)
- Enables specifying `DOCS_URL` in `.env`
- Additional follow-ups in #2090, #2091
2024-09-25 10:37:18 -04:00
Tessa Walsh
123705c53f
Serialize datetimes with Z suffix (#2058)
Use timezone aware datetimes instead of timezone naive datetimes:
- Update mongodb client to use tz-aware conversion
- Convert dt_now() to return timezone aware UTC date
- Rename to_k8s_date -> date_to_str, just returns ISO UTC date with 'Z'
(instead of '+00:00' suffix)
- Rename from_k8s_date -> str_to_date, returns timezone aware date from
str
- Standardize all string<->date conversion to use either date_to_str or
str_to_date
- Update frontend to assume iso date, not append 'Z' directly
- Update tests to check for 'Z' suffix on some dates

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2024-09-12 16:16:13 -07:00
sua yoo
55f3b8abb1
fix: Watch tab crawl state consistency (#2060)
- Fixes empty watch tab during some active states
- Disables exclusion and browser window editing while a crawl is
stopping
- Refactors frontend crawl states to match backend
2024-09-10 14:47:29 -07:00
sua yoo
b58d367da7
fix: Archived item navigation improvements (#2062)
- Replaces QA Review breadcrumbs with "Back" button that takes you back
to the archived item
- Crawl breadcrumbs always point to workflow
- Shows archived item icon next to item name to differentiate from
workflow
- Adds "Edit Workflow" button to "Crawl Settings"

---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2024-09-10 14:26:29 -07:00
sua yoo
99ed08656a
feat: Localization workflow improvements (#2069)
- Extracts translatable text strings in pre-commit hook
- Updates ternary pluralization to use `pluralOf` instead
- Generates XLIFF for Spanish
2024-09-10 14:15:26 -07:00
sua yoo
8ccd36b0a7
fix: Unstyled content visibility (#2070)
Fixes flash of unstyled content due to dynamic imports.
2024-09-10 13:32:15 -07:00
sua yoo
c01e3dd88b
feat: Improve UX of choosing new workflow crawl type (#2067)
Resolves https://github.com/webrecorder/browsertrix/issues/2066

### Changes
- Allows directly choosing new "Page List" or "Site Crawl from
workflow list
- Reverts terminology introduced in
https://github.com/webrecorder/browsertrix/pull/2032
2024-09-09 16:42:47 -07:00
sua yoo
b4e34d1c3c
fix: Fix collection description (#2065)
Fixes https://github.com/webrecorder/browsertrix/issues/2064

### Changes

- Switches MDE library to one that supports shadow DOM
- Refactors collection components to btrix components
- Fixes collection detail not expanding and contracting correctly
2024-09-05 22:10:14 -07:00
sua yoo
4c36c80351
feat: Display scale as number of browser windows (#2057)
Resolves https://github.com/webrecorder/browsertrix/issues/2048

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2024-09-05 17:32:40 -07:00
sua yoo
880e27370d
feat: Breadcrumb navigation (#2053)
- Adds breadcrumb navigation to all detail views, returning to correct
origin for workflow and collection items
- Refactors list page headers into layout utility
- Refactors crawl tab labels and renames "Files" tab to "WACZ Files"
2024-08-30 09:08:24 -07:00
sua yoo
e4107d0e76
fix: correct link to crawlilng defaults 2024-08-29 17:00:29 -07:00
sua yoo
eb2dab8ae0
fix: Update browser title (#2054)
Updates browser title when visiting the following pages:

- Superadmin dashboard
- Org top-level pages
- Account settings
2024-08-29 16:50:14 -07:00
sua yoo
988d9c9e2b
feat: Allow org admins to set default workflow configs (#2020)
- Adds new org settings tab for updating crawl details
- Refactors `workflow-editor` to move out utility functions
- Updates user guide on org settings

---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2024-08-29 16:49:22 -07:00
sua yoo
337454f8c9
feat: Add link to hosted sign-up page (#2045)
Resolves https://github.com/webrecorder/browsertrix/issues/2043

<!-- Fixes #issue_number -->

### Changes

- Shows link to sign up in UI if `sign_up_url` is configured.
- Expires settings in session storage (for now)
2024-08-26 17:26:25 -07:00
sua yoo
c0725599b2
fix: Help forum link not showing on mobile (#2049)
Fixes the "Help Forum" link text not showing on smaller screens.
2024-08-26 15:53:08 -07:00
sua yoo
2a057eddd6
chore: Improve time to load org UI (#2044)
Improves time to first load an org with the following:
- Users user info from login response to set org slug and route user on
log in
- Stores user info in session storage so that it's available on reload
- Stores app settings in local storage until user logs out
- Loads critical org components synchronously
2024-08-26 10:45:10 -07:00
sua yoo
acd3e1252d
feat: Add help shortcuts to app header & footer (#2040)
WIP for https://github.com/webrecorder/browsertrix/issues/2041

<!-- Fixes #issue_number -->

### Changes

- Adds button to open embedded support guide
- Adds link to help forum
- Refactors app bar to look nicer on smaller screens
2024-08-23 18:11:29 -07:00
sua yoo
0e16d526c0
fix: Hide login link on login page (#2039)
- Removes log in link when on log in page
- Fix e2e test, which wasn't actually logging the test user in before
2024-08-21 15:33:24 -07:00
sua yoo
25b1928d44
feat: Enable deleting workflow from list (#2042)
Adds back workflow list menu item to delete workflow if it's never been
run.
2024-08-21 15:33:00 -07:00
sua yoo
2ca9632057
feat: Add additional context around workflow job type options (#2032)
- Updates workflow job type copy and adds additional clarifying text
- Changes "List of URLs" label to "Crawl URL(s)"
- Refactors `NewWorkflowDialog` into tailwind element
2024-08-21 14:03:43 -07:00
sua yoo
3605d07547
fix: Make footer translatable (#2038)
- Wraps footer strings to prepare for localization
- Removes extraneous class names
- Updates copy button tooltip to match bug report field
2024-08-21 14:01:52 -07:00
sua yoo
7208888a1c
chore: remove console log 2024-08-20 17:34:47 -07:00
Emma Segal-Grossman
10640feeef
Add detailed permissions & permission summaries to user invite popup (#2003) 2024-08-20 20:34:29 -04:00
Emma Segal-Grossman
570dc10f2a
Properly pluralize "Pages" in QA, and display skeletons instead of incorrect fallback values (#2026) 2024-08-20 20:33:52 -04:00
sua yoo
351e92ae2f
fix: Prevent browser profile selection overflow (#2029)
- Truncates selected browser profile description and refreshes style
- Order browser profiles by modified date
2024-08-20 12:43:51 -07:00
sua yoo
6ce565b5f7
fix: Use correct job type in crawl settings (#2028)
Fixes https://github.com/webrecorder/browsertrix/issues/2027

### Changes

Fixes crawl settings for archived item not showing the correct fields
based on job type.
2024-08-19 18:10:52 -07:00
sua yoo
4c7f1aa3ca
feat: Clean up settings UI (#2018)
- Renames "Org Settings" -> "Settings"
- Reduces gap between settings panel heading and panel
- Always show "Pending Invites" section and update heading styles to
match panel heading
- Update "Current Plan" and "Usage History" sections to be on the same
hierarchical level under "Billing"
- Refactors `<btrix-org>` to move `isAdmin` and `isCrawler` helpers to
app state
2024-08-19 13:37:41 -07:00
sua yoo
9a7033875b
chore: Refactor home component (#2000)
Resolves https://github.com/webrecorder/browsertrix/issues/1972

---------

Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
2024-08-15 17:16:14 -04:00
sua yoo
1a6892572d
chore: Refactor frontend shared state (#1997)
Refactors custom components to enable shared state accessors
(like `authState`) and helpers (like `api.fetch`.) Schemas are now
defined with [zod](https://zod.dev/?id=basic-usage) which enables
runtime schema validation.

See subtasks for full description of change:

- https://github.com/webrecorder/browsertrix/pull/1979
- https://github.com/webrecorder/browsertrix/pull/1981
- https://github.com/webrecorder/browsertrix/pull/1985
- https://github.com/webrecorder/browsertrix/pull/1986

---------

Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
2024-08-12 17:57:31 -07:00
Ilya Kreymer
3923996aaf
add missing 'Z' for iso-dates until ISO-formatting is fixed (#2008)
Adds missing 'Z' that is added explicitly now, until #1922 is fixed. It
was missing from workflow last modified and billing cancelation dates
2024-08-08 16:19:32 -07:00
sua yoo
97eac2b0e2
fix: Redirect /orgs to default path (#2006)
Fixes https://github.com/webrecorder/browsertrix/issues/2005

<!-- Fixes #issue_number -->

### Changes

Redirects `/orgs` to user's default home page.
2024-08-08 15:15:11 -07:00
Emma Segal-Grossman
2b5f964c24
Prevent invalid slugs from causing redirects in org settings (#2004)
Also improves the slug editing experience by partially-slugifying the
value as it's entered.

Previously, submitting a org slug value of ".." or similar would cause
the frontend to redirect to a "page not found" page, with all accessible
links leading to only `/account/settings`. This also causes the backend
to reset the org slug to one generated from the org name on a reload.

---------

Co-authored-by: sua yoo <sua@webrecorder.org>
2024-08-08 14:41:18 -07:00
Ilya Kreymer
5f53db75ee
fix resetting of invalid logins: (#2002)
* Fixes issue in FailedLogin model:
- fix data-model to remove nested 'attempted.attempted'
- migrate existing data to remove nested field

* Also, avoid setting dt_now() in model as that results in fixed date for
all objects:
- update FailedLogin to update 'attempted' date on every attempt
- also update PageNote object to set date in constructor

* Update text for too many logins to make it clear it is set only if its a
valid email

* fixes #2001
2024-08-07 12:36:06 -07:00
sua yoo
0b14be896b
feat: Show usage history in dashboard (#1998)
Following https://github.com/webrecorder/browsertrix/pull/1995, we want
to keep the usage history table on the dashboard for now for video
demos.

### Changes

- Adds usage history table back to org dashboard
- Makes "No usage history" message more apparent
2024-08-06 19:09:34 -07:00
sua yoo
ba1e2ab602
feat: App bar enhancements (#1996)
- Always shows current org name
- Moves org dropdown next to logo
- Reduces logo size when logged in at smaller screen sizes
2024-08-06 17:54:05 -07:00
sua yoo
96e48b001b
feat: Update billing tab with usage & portal URL check (#1995)
- Hides usage table from dashboard if billing is enabled
- Shows usage table in billing settings
- Updates usage table column headings
- Fixes `portalUrl` task running unnecessarily
2024-08-06 16:31:57 -07:00
Ilya Kreymer
7fa2b61b29
Execution time tracking tweaks (#1994)
Tweaks to how execution time is tracked for more accuracy + excluding
waiting states:
- don't update if crawl state is in a 'waiting state' (waiting for
capacity or waiting for org limit)
- rename start states -> waiting states for clarity
- reset lastUpdatedTime if two consecutive updates of non-running state,
to ensure non-running states don't count, but also account for
occasional hiccups -- if only one update detects non-running state,
don't reset
- webhooks: move start webhook to when crawl actually starts for first
time (db lastUpdatedTime is not yet + crawl is running)
- don't set lastUpdatedTime until pods actually running
- set crawljob update interval to every 10 seconds for more accurate
execution time tracking
- frontend: show seconds in 'Execution Time' display
2024-08-06 09:44:44 -07:00
Ilya Kreymer
ec29928b28
Fix QA run downloads as a single WACZ (#1993)
Follow up to #1412, fix QA run downloads as a single (multi) WACZ,
containing other WACZ files from all workers.
2024-08-06 09:44:17 -07:00
Ilya Kreymer
1c153dfd3c
Subscription Update Quotas (#1988)
- Follow-up to #1914, allows SubscriptionUpdate event to also update
quotas.
- Passes current usage info + current billing page URL to portalUrl
request for external app to be able to respond with best portalUrl
- get_origin() moved to utils to be available more generally.
- Updates billing tab to show current plans, switches order of quotas to
list execution time, storage first
2024-08-05 15:59:47 -07:00
Ilya Kreymer
e9aeff1836
add a 'stopped_org_readonly' state for crawls that are running while org is made read-only (#1977)
an org is made read-only while crawls are running:
- treat similar to other stopped_* states, do a graceful stop
- update UI to display "Stopped: Crawling Disabled" for this status
- don't add corresponding skipped status - just skip running crawls if org is read-only
2024-07-29 12:24:40 -07:00
sua yoo
daeb7448f5
feat: Minor improvements to superadmin view (#1971)
Resolves https://github.com/webrecorder/browsertrix/issues/1951

### Changes

- Shows date org was created in superadmin org list
- Visually differentiates unnamed org ID
- Adds "Admin" badge to app header to make current login more apparent
- Fixes logic to show "create org" dialog if there are no orgs in an
instance
- Refactors `btrix-home` to remove unused references to non-superadmin
org list


---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2024-07-25 15:47:40 -07:00
sua yoo
dd6c33a59d
feat: Show details of invalid invite (#1970)
Resolves https://github.com/webrecorder/browsertrix/issues/1912

### Changes

- Show support email, if available, in invalid invite error message
- Separate error message for invite email that doesn't match current
user's
2024-07-25 13:57:02 -07:00
Tessa Walsh
d38abbca7f
Standardize handling of storage and execution time quotas (#1969)
Fixes #1968 

Changes:
- `stopped_quota_reached` and `skipped_quota_reached` migrated to new
values that indicate which quota was reached
- Before crawls are run, the operator checks if storage or exec mins
quotas are reached and if so fails the crawl with the appropriate state
of `skipped_storage_quota_reached` or `skipped_time_quota_reached`
- While crawls are running, the operator checks if the exec mins quota
is reached or if the size of all running crawls will mean the storage
quota is reached once uploaded; if so, the crawl is stopped gracefully
and given `stopped_storage_quota_needed` or `stopped_time_quota_reached`
state as appropriate
- Adds new nightly tests for enforcing storage quota
2024-07-25 12:49:11 -07:00
sua yoo
2c89edcc36
feat: Disable archiving for read-only orgs (#1965)
Resolves https://github.com/webrecorder/browsertrix/issues/1915

### Changes

- Disables buttons to create resources, duplicate resources, run crawls,
and configure browser profiles.
- Updates copy from "read-only" -> "disable archiving"
2024-07-25 12:42:04 -07:00
Tessa Walsh
27ee16d308
Implement downloading archived item + QA runs as multi-WACZ (#1933)
Fixes #1412 

## Changes

### Backend
- Adds `all-crawls`, `crawls`, and `uploads` API endpoints to download
archived item as multi-WACZ
- Download QA runs as multi-WACZ
- Adds backend tests for new endpoints
- Update to new version of stream-zip library which does not require crc-32 to be present for ZIP members,
computes after streaming, fixing invalid crc-32 issues as previously computed crc-32s from crawler may be invalid.

### Frontend

Adds ability to download archived item from:

- Button in archived item detail Files tab
- Archived item details actions menu
- Archived items list menu



---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: sua yoo <sua@webrecorder.org>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2024-07-25 10:28:57 -07:00