Commit Graph

35 Commits

Author SHA1 Message Date
Ilya Kreymer
c33f749515
Frontend hosted-docs (#2107)
Fixes #2106 

Docs are now hosted as part of the frontend at `/docs` by default.

- If `docs_url` is set in the helm chart, the `/docs` endpoint will
redirect to that endpoint instead
- Use multi-stage python image to build mkdocs as part of frontend, then
copy static output
- Dir layout: mkdocs.yml and docs moved into frontend/docs
- CI: Update docs build GH action to use new path
- Update all frontend paths to use `/docs/` instead of
`https://docs.browsertrix.com/`

---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2024-10-08 14:56:34 -07:00
Ilya Kreymer
d12ce9a591
sidebar setup guide fixes: (#2105)
- ensure user-guide iframe is always visible, even if it is not fading in, when
'Setup Guide' button is clicked
- Clicking 'Setup Guide' scrolls to the relevant section of the guide (scope,
limits, browser settings, scheduling, metadata) based on the current
workflow editor tab, via the URL hash (sketched below)

---------

Co-authored-by: sua yoo <sua@webrecorder.org>
2024-10-07 22:14:46 -07:00
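
A minimal sketch of the hash-based scrolling described above; the tab names, section anchors, and iframe ID are illustrative assumptions, not the actual component internals:

```ts
// Map workflow editor tabs to user guide section anchors
// (tab names and element ID are assumptions for illustration)
const tabToHash: Record<string, string> = {
  scope: "#scope",
  limits: "#limits",
  browserSettings: "#browser-settings",
  scheduling: "#scheduling",
  metadata: "#metadata",
};

function scrollGuideToTab(tab: string): void {
  const iframe = document.querySelector<HTMLIFrameElement>("#user-guide");
  if (!iframe) return;
  const url = new URL(iframe.src);
  url.hash = tabToHash[tab] ?? "";
  iframe.src = url.toString(); // updating the hash scrolls the embedded guide
}
```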
sua yoo
22435ddaf9
Save new workflow scope type preference (#2099)
Resolves https://github.com/webrecorder/browsertrix/issues/2091

### Changes

Saves scope type chosen from "+ New Workflow" dropdown to local storage, as well as from within workflow editor when creating a new workflow (but not editing an existing one).
2024-10-02 19:08:20 -07:00
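
A sketch of the local storage persistence described above; the storage key is an assumption:

```ts
// Persist the last-chosen scope type across sessions
// (the storage key is an assumption for illustration)
const SCOPE_TYPE_KEY = "btrix.newWorkflowScopeType";

function saveScopeTypePreference(scopeType: string): void {
  window.localStorage.setItem(SCOPE_TYPE_KEY, scopeType);
}

// Returns null on first use, letting the caller fall back to the default
function loadScopeTypePreference(): string | null {
  return window.localStorage.getItem(SCOPE_TYPE_KEY);
}
```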
Vinzenz Sinapius
bb6e703f6a
Configure browsertrix proxies (#1847)
Resolves #1354

Supports crawling through pre-configured proxy servers, allowing users to select which proxy servers to use (requires browsertrix crawler 1.3+)

Config:
- proxies defined in btrix-proxies subchart
- can be configured via the btrix-proxies key or a separate proxies.yaml file in the subchart
- proxies list refreshed automatically when crawler_proxies.json changes, if the subchart is deployed
- support for ssh and socks5 proxies
- proxy keys added to secrets in subchart
- support for default proxy to be always used if no other proxy configured, prevent starting cluster if default proxy not available
- prevent starting manual crawl if previously configured proxy is no longer available, return error
- force 'btrix' username and group name on browsertrix-crawler non-root user to support ssh

Operator:
- support crawling through proxies, pass proxyId in CrawlJob
- support running profile browsers with a designated proxy, pass proxyId to ProfileJob
- prevent starting scheduled crawl if previously configured proxy is no longer available

API / Access:
- /api/orgs/all/crawlconfigs/crawler-proxies - get all proxies (superadmin only)
- /api/orgs/{oid}/crawlconfigs/crawler-proxies - get proxies available to particular org
- /api/orgs/{oid}/proxies - update allowed proxies for particular org (superadmin only)
- superadmin can configure which orgs can use which proxies, stored on the org
- superadmin can also allow an org to access all 'shared' proxies, to avoid having to allow a shared proxy on each org.

UI:
- Superadmin has an 'Edit Proxies' dialog to configure, for each org, its dedicated proxies and whether it has access to shared proxies.
- User can select a proxy in Crawl Workflow browser settings
- Users can choose to launch a browser profile with a particular proxy
- Display which proxy is used to create profile in profile selector
- Users can choose which default proxy to use for new workflows in Crawling Defaults

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2024-10-02 18:35:45 -07:00
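
A hedged TypeScript sketch of calling the org proxies endpoint listed above; the path comes from the commit, while the response shape and field names are assumptions:

```ts
// Fetch proxies available to a particular org
// (response shape is an assumption; only the path comes from the commit)
interface ProxyServer {
  id: string;
  label: string;
  shared: boolean;
}

async function getOrgProxies(
  oid: string,
  token: string,
): Promise<ProxyServer[]> {
  const resp = await fetch(`/api/orgs/${oid}/crawlconfigs/crawler-proxies`, {
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!resp.ok) throw new Error(`Failed to fetch proxies: ${resp.status}`);
  const data = (await resp.json()) as { servers: ProxyServer[] };
  return data.servers;
}
```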
sua yoo
612bbb6f42
feat: Merge workflow job types (#2068)
Resolves https://github.com/webrecorder/browsertrix/issues/2073

### Changes

- Removes "URL List" and "Seeded Crawl" job type distinction and adds as
additional crawl scope types instead.
- 'New Workflow' button defaults to Single Page
- 'New Workflow' dropdown includes Page Crawl (Single Page, Page List, In-Page Links) and Site Crawl (Pages in Same Directory, Pages on Same Domain, Pages on Same Domain + Subdomains, and Custom Page Prefix)
- Enables specifying `DOCS_URL` in `.env`
- Additional follow-ups in #2090, #2091
2024-09-25 10:37:18 -04:00
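
The merged scope types could be modeled as a single union; the identifiers below are illustrative, derived from the dropdown labels above, not the actual frontend values:

```ts
// Page Crawl and Site Crawl scopes collapsed into one scope-type union
// (string values are assumptions based on the labels above)
type PageCrawlScope = "single-page" | "page-list" | "in-page-links";
type SiteCrawlScope =
  | "same-directory"
  | "same-domain"
  | "same-domain-and-subdomains"
  | "custom-page-prefix";
type ScopeType = PageCrawlScope | SiteCrawlScope;

// 'New Workflow' button default
const DEFAULT_SCOPE: ScopeType = "single-page";
```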
sua yoo
b58d367da7
fix: Archived item navigation improvements (#2062)
- Replaces QA Review breadcrumbs with "Back" button that takes you back
to the archived item
- Crawl breadcrumbs always point to workflow
- Shows archived item icon next to item name to differentiate from
workflow
- Adds "Edit Workflow" button to "Crawl Settings"

---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2024-09-10 14:26:29 -07:00
sua yoo
c01e3dd88b
feat: Improve UX of choosing new workflow crawl type (#2067)
Resolves https://github.com/webrecorder/browsertrix/issues/2066

### Changes
- Allows directly choosing new "Page List" or "Site Crawl" from the
workflow list
- Reverts terminology introduced in
https://github.com/webrecorder/browsertrix/pull/2032
2024-09-09 16:42:47 -07:00
sua yoo
880e27370d
feat: Breadcrumb navigation (#2053)
- Adds breadcrumb navigation to all detail views, returning to correct
origin for workflow and collection items
- Refactors list page headers into layout utility
- Refactors crawl tab labels and renames "Files" tab to "WACZ Files"
2024-08-30 09:08:24 -07:00
sua yoo
988d9c9e2b
feat: Allow org admins to set default workflow configs (#2020)
- Adds new org settings tab for updating crawl details
- Refactors `workflow-editor` to move out utility functions
- Updates user guide on org settings

---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2024-08-29 16:49:22 -07:00
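
A speculative sketch of the new org-level defaults as a typed payload; every field name and the endpoint path here are assumptions, not the actual API:

```ts
// Org-wide crawl workflow defaults (all names are assumptions)
interface CrawlingDefaults {
  crawlTimeoutMinutes?: number;
  maxCrawlSizeGB?: number;
  pageLoadTimeoutSeconds?: number;
  exclusions?: string[];
}

async function updateCrawlingDefaults(
  oid: string,
  defaults: CrawlingDefaults,
): Promise<Response> {
  // endpoint path is an assumption for illustration
  return fetch(`/api/orgs/${oid}/defaults/crawling`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(defaults),
  });
}
```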
sua yoo
1a6892572d
chore: Refactor frontend shared state (#1997)
Refactors custom components to enable shared state accessors
(like `authState`) and helpers (like `api.fetch`). Schemas are now
defined with [zod](https://zod.dev/?id=basic-usage) which enables
runtime schema validation.

See subtasks for full description of change:

- https://github.com/webrecorder/browsertrix/pull/1979
- https://github.com/webrecorder/browsertrix/pull/1981
- https://github.com/webrecorder/browsertrix/pull/1985
- https://github.com/webrecorder/browsertrix/pull/1986

---------

Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
2024-08-12 17:57:31 -07:00
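
For example, a zod schema yields both a static type and a runtime validator, so malformed API responses fail loudly instead of propagating; the schema below is illustrative:

```ts
import { z } from "zod";

// One schema definition yields both the validator and the static type
const userOrgSchema = z.object({
  id: z.string(),
  name: z.string(),
  slug: z.string(),
});

type UserOrg = z.infer<typeof userOrgSchema>;

// parse() throws ZodError if the data doesn't match the schema,
// catching bad API responses at runtime rather than silently
function toUserOrg(data: unknown): UserOrg {
  return userOrgSchema.parse(data);
}
```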
sua yoo
2c89edcc36
feat: Disable archiving for read-only orgs (#1965)
Resolves https://github.com/webrecorder/browsertrix/issues/1915

### Changes

- Disables buttons to create resources, duplicate resources, run crawls,
and configure browser profiles.
- Updates copy from "read-only" -> "disable archiving"
2024-07-25 12:42:04 -07:00
Tessa Walsh
80008a2853
Add post load delay to Browsertrix (#1700)
Fixes #1699 

Adds post load delay to:
- Backend `RawCrawlConfig` model
- Frontend (workflow editor and config details component)
- Workflow setup docs
2024-04-18 20:03:47 -07:00
Emma Segal-Grossman
b1e2f1b325
Add ESLint rules for import ordering (#1608)
Follow-up from
https://github.com/webrecorder/browsertrix-cloud/pull/1546#discussion_r1529001599
(cc @SuaYoo)

- Adds `eslint-plugin-import-x` and
`@ianvs/prettier-plugin-sort-imports` and configures rules for them both
so imports get sorted on format & on lint.
- Runs both on everything!
2024-03-18 21:50:02 -04:00
Emma Segal-Grossman
f853fcdd81
Upgrade Prettier to 3 (#1513)
Updates Prettier to major version 3, and also updates a couple of other
prettier-related things.

Prelude to #1511 so that that PR doesn't include a bunch of unrelated
changes
2024-01-31 20:56:17 -05:00
Emma Segal-Grossman
3968928ac2
ESLint improvements & Typescript upgrade (#1501)
## Overview

Adds a bunch of ESLint rules, mostly from `typescript-eslint`, and fixes
the issues turning on these rules raises.

Also updates Typescript & typescript-eslint.

## Rationale

Most of these new rules are auto-fixable, so I've tackled a bunch of the
little fixes that do need manual intervention now with the intention
that this shouldn't add much of any additional friction in future
development work, and also give us a good bump in overall code quality.
A lot of the rules here are also great for catching potential bugs!

## Changes

- Adds `void` to most un-awaited and unhandled promises (i.e. places
where async functions are called but nothing is done with the promise)
- Converts properties that are only ever read to `readonly`
- Adds a new `isApiError` function that informs Typescript of when an
error is an `APIError`
- Adds types to a bunch of places that were previously untyped
- Changes instances of `Map<string, any>` in lit property update methods
to `PropertyValues<this>`, or sometimes `PropertyValues<this> &
Map<string, unknown>` where private or protected members are used
(`keyof` doesn't include private and protected members, unfortunately)
  - Adds types to a bunch of custom events
- Cleans up a regex by removing unnecessary escape characters
- Makes a number of implied type conversions explicit (by wrapping with
`Boolean(...)` or calling `.toString()`)
- More consistently applies type coercions when necessary, and removes
them when unnecessary
- Converts a couple const strings to an enum
- Removes the need to type debounced functions as `any` by coercing to the
underlying function type where the method is bound to the event in the
`html` block
2024-01-31 14:42:06 -05:00
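
For illustration, here is what the `PropertyValues<this>` change looks like in a Lit component (the component itself is a made-up example):

```ts
import { LitElement, html, type PropertyValues } from "lit";
import { customElement, property } from "lit/decorators.js";

@customElement("example-badge")
export class ExampleBadge extends LitElement {
  @property({ type: String }) label = "";

  // Typed lifecycle arg: changed.has()/get() are checked against this
  // class's own properties instead of an untyped Map<string, any>
  protected updated(changed: PropertyValues<this>) {
    if (changed.has("label")) {
      console.log("label changed from", changed.get("label"));
    }
  }

  render() {
    return html`<span>${this.label}</span>`;
  }
}
```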
Tessa Walsh
07fa46d9aa
Add custom user agent to workflows (#1465)
Fixes #1341

Adds "User Agent" field to workflow editor under the Browser Settings
tab. If not set, the crawler will use the browser's default user agent.

Also added to docs and to the workflow details page (if set).

---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
2024-01-17 17:33:50 -05:00
Tessa Walsh
032859f361
Support multiple crawler versions (#1420)
Fixes #1385 

## Changes
Supports multiple crawler 'channels' which can be configured to
different browsertrix-crawler versions
- Replaces `crawler_image` in helm chart with `crawler_channels` array
similar to how storages are handled
- The `default` crawler channel must always be provided and specifies
the default crawler image
- Adds backend `/orgs/{oid}/crawlconfigs/crawler-channels` API endpoint
to fetch information about available crawler versions (name, image, and
label) and test
- Adds crawler channel select to workflow creation/edit screens and
profile creation dialog, and updates related API endpoints and
configmaps accordingly. The select dropdown is shown only if more than
one channel is configured.
- Adds `crawlerChannel` to workflow and crawl details.
- Add `image` to crawl details, used to display the actual crawler image used
as part of the crawl.
- Modifies `crawler_crawl_id` backend test fixture to use `test` crawler
version to ensure crawler versions other than latest work
- Adds migration to add `crawlerChannel` set to `default` to existing
workflow and profile objects and workflow configmaps

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2024-01-16 15:32:12 -08:00
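
A sketch of querying the crawler-channels endpoint above; the path and field names come from the commit, while the response wrapper is an assumption:

```ts
// Available crawler channels (fields per the commit: name, image, label;
// the response wrapper key is an assumption)
interface CrawlerChannel {
  name: string;
  image: string;
  label: string;
}

async function getCrawlerChannels(oid: string): Promise<CrawlerChannel[]> {
  const resp = await fetch(`/orgs/${oid}/crawlconfigs/crawler-channels`);
  const data = (await resp.json()) as { channels: CrawlerChannel[] };
  return data.channels;
}
```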
Emma Segal-Grossman
106fe5dd61
Organize components into folders by function (#1411) 2023-11-29 14:12:29 -05:00
Emma Segal-Grossman
b15c5ccddd
ESLint & Typescript fixes (#1407)
Closes #1405

- Properly uses `typescript-eslint`: we were missing the preset from it,
so some of the default `eslint` rules (that don't properly work with
typescript) were being applied and causing false positives
- I also moved the `eslint` config into its own file, and enabled
`typescript-eslint`'s type-awareness, so that we can enable more
type-aware rules in the future if we like
- Adds `ts-lit-plugin` to the typescript config, which _hopefully_ will
allow us to catch issues during build (in CI)
- It looks like `ts-lit-plugin` is sort of abandonware at the moment,
and unfortunately _doesn't_ actually work for this purpose right now,
but the lit team is working on a replacement here:
https://www.npmjs.com/package/@lit-labs/analyzer
- Adds `fork-ts-checker-webpack-plugin`, which allows the typescript
checking process to be run on a separate forked thread in Webpack, which
can help speed up builds & checking
- Enables incremental type checking for better speed
- Fixes a whole bunch of `eslint`-auto-fixable issues (unused imports
and variables, some type issues, etc)
- Fixes a bunch of `lit-analyzer` issues (mostly attribute naming, some
type issues as well)
- Fixes various other type issues:
- Improves type safety in a bunch of places, notably anywhere `apiFetch`
and `APIPaginatedList` are used
  - Removes some `any`s
2023-11-24 12:32:53 -05:00
emma
bc6f362861 update pages as well 2023-11-15 18:24:50 -05:00
Tessa Walsh
38f32f11ea
Enforce quota and hard cap for monthly execution minutes (#1284)
Fixes #1261 Closes #1092

The quota for monthly execution minutes is treated as a hard cap. Once
it is exceeded, an alert indicating that an org has exceeded its monthly
execution minutes will display and the user will be unable to start new
crawls. Any running crawls will be stopped once the quota is exceeded.

An execution minutes meter bar is also added in the Org Dashboard and
displayed if a quota is set. More detail in #1305 which was
merged into this branch.

## Changes

- Enable setting 'maxExecMinutesPerMonth' in orgs list quotas by superadmin
- Enforce quota by stopping crawls in operator once quota is reached
- Show alert banner once execution time quota is hit:
- Once quota is hit, disable Run Crawl buttons in frontend, return 403
message with `exec_minutes_quota_reached` detail in backend from
crawl config `/run` endpoint, and don't run new workflows on creation
(similar to storage quota)
- Display execution time for crawls in the crawl details overview, immediately
below the crawl duration
- Show execution minutes meter on dashboard (from #1305)

---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: sua yoo <sua@webrecorder.org>
2023-10-26 15:38:51 -07:00
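
A sketch of how a client might handle the quota error; the `exec_minutes_quota_reached` detail comes from the commit, while the exact `/run` path and response shape are assumptions:

```ts
// Try to start a crawl; returns false if the monthly execution
// minutes quota has been reached (path and body shape are assumptions)
async function tryRunCrawl(oid: string, cid: string): Promise<boolean> {
  const resp = await fetch(`/api/orgs/${oid}/crawlconfigs/${cid}/run`, {
    method: "POST",
  });
  if (resp.status === 403) {
    const body = (await resp.json()) as { detail?: string };
    if (body.detail === "exec_minutes_quota_reached") {
      // e.g. show the alert banner and disable Run Crawl buttons
      return false;
    }
  }
  return resp.ok;
}
```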
sua yoo
4610d95cd7
Use org slug in place of UUIDs in app URLs (#1277)
- Replaces org UUID in URL/browser location bar with org slug.
- Refactor: Adds shared app state utility using https://sijakret.github.io/lit-shared-state/ to
access org data from deep descendants.
- Backwards compatible: org UUID URLs should auto-redirect to org slug URLs.
- Show the org UUID in org settings general tab for use with APIs
(Resolves #1258, Follows #1279)
2023-10-18 09:28:30 -07:00
Tessa Walsh
b1ead614ee
Add --failOnFailedSeed checkbox to URL list workflows (#1236)
- If set, and any of the seeds fails, the entire crawl is marked as a failure.
- Add checkbox which adds the `--failOnFailedSeed` flag to URL list workflows
- Add 'Fail Crawl On Failed URL' to crawl workflow setup docs
2023-10-03 18:46:09 -07:00
sua yoo
941a75ef12
Separate seeds into new endpoints (#1217)
- Remove config.seeds from workflow and crawl detail endpoints
- Add new paginated GET /crawls/{crawl_id}/seeds and /crawlconfigs/{cid}/seeds endpoints to retrieve seeds for a crawl or workflow
- Include firstSeed in GET /crawlconfigs/{cid} endpoint (was missing before)
- Modify frontend to fetch seeds from new /seeds endpoints with loading indicator

---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2023-10-02 10:56:12 -07:00
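
A sketch of paging through the new workflow seeds endpoint; the pagination parameters and response shape are assumptions following common REST conventions:

```ts
// Fetch one page of seeds for a workflow
// (page/pageSize params and response shape are assumptions)
interface Seed {
  url: string;
}

interface PaginatedSeeds {
  items: Seed[];
  total: number;
}

async function getWorkflowSeeds(
  cid: string,
  page = 1,
  pageSize = 50,
): Promise<PaginatedSeeds> {
  const resp = await fetch(
    `/crawlconfigs/${cid}/seeds?page=${page}&pageSize=${pageSize}`,
  );
  return (await resp.json()) as PaginatedSeeds;
}
```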
sua yoo
90e3a300cc
"Add new" dialog for all resources (#1202)
- Replaces individual "New" buttons in home page with dropdown button in header (includes Crawl Workflow, Upload Collection, Browser Profile)
- Refactors required step of new workflow and new collection into dialog
2023-09-29 09:11:24 -07:00
Tessa Walsh
e667fe2e97
Add max crawl size option to backend and frontend (#1045)
Backend:
- add 'maxCrawlSize' to models and crawljob spec
- add 'MAX_CRAWL_SIZE' to configmap
- add maxCrawlSize to new crawlconfig + update APIs
- operator: gracefully stop crawl if current size (from stats) exceeds maxCrawlSize
- tests: add max crawl size tests

Frontend:
- Add Max Crawl Size text box to Limits tab
- Users enter max crawl size in GB, convert to bytes
- Add BYTES_PER_GB as constant for converting to bytes
- docs: Crawl Size Limit to user guide workflow setup section

Operator Refactor:
- use 'status.stopping' instead of 'crawl.stopping' to indicate crawl is being
stopped, as changing the latter has no effect in the operator
- add is_crawl_stopping() to return whether crawl is being stopped, based on
crawl.stopping or the size or time limit being reached
- crawlerjob status: store raw byte size under 'size', human readable size
under 'sizeHuman' for clarity
- size stat always exists so remove unneeded conditional (defaults to 0)

Charts:
- subchart: update crawlerjob crd in btrix-crds to show status.stopping instead of spec.stopping
- subchart: show 'sizeHuman' property instead of 'size'
- bump subchart version to 0.1.1

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2023-08-26 22:00:37 -07:00
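
The frontend GB-to-bytes conversion is simple; whether `BYTES_PER_GB` is decimal (1e9) or binary is not stated in the commit, so the value below is an assumption:

```ts
// Users enter max crawl size in GB; the API stores bytes
// (decimal gigabytes assumed here)
const BYTES_PER_GB = 1e9;

function gbToBytes(gb: number): number {
  return Math.round(gb * BYTES_PER_GB);
}
```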
Anish Lakhwara
9236a07800
fix: run yarn format in frontend dir (#1043) 2023-08-03 19:12:48 -07:00
Tessa Walsh
d5c3a8519f
Add crawler Use Sitemap option to Browsertrix Cloud (#978)
- Add user-guide docs for Use Sitemap option
---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2023-07-19 13:57:52 -04:00
Tessa Walsh
bd6dc79449
Add frontend support for auto-adding collections to workflows (#916)
- Adds collections search and list to workflow editor
- Adds collections to workflow details component
- Adds namePrefix filter to backend GET /orgs/{oid}/collections endpoint to support case-insensitive searching of collections
- Adds documentation for new setting

---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2023-06-12 18:18:05 -07:00
sua yoo
7888c4fde3
Frontend crawl workflows rework (#775) 2023-04-25 14:16:07 -07:00
Henry Wilkinson
ba3daf326d
Adds inputmode attributes to workflow config fields (#755)
- Now the appropriate virtual keyboards are shown! :)
- Also adjusts type weight for workflow config headers to match mockups
2023-04-06 09:16:48 -07:00
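
For example, numeric workflow fields can hint the right virtual keyboard with `inputmode` (the field below is illustrative):

```ts
import { html } from "lit";

// inputmode="numeric" brings up the number pad on mobile keyboards
export const maxPagesField = html`
  <label>
    Max Pages
    <input type="text" inputmode="numeric" pattern="[0-9]*" />
  </label>
`;
```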
sua yoo
91c2c1ad62
Allow users to set additional page time limits (#744) 2023-04-05 20:06:46 -07:00
sua yoo
5f5bb5ea6e
Allow users to set workflow description (#708) 2023-03-21 13:40:23 -07:00
Ilya Kreymer
de9212eec7
exclusions editor fix: (#692)
- backend: fix updating model after exclusions change
- frontend: don't check for new_cid, just success
- fixes #691
2023-03-10 22:36:10 -08:00
sua yoo
8ca4276c57
Migrate crawl config frontend -> workflow (#686) 2023-03-10 11:39:42 -08:00