Commit Graph

58 Commits

Author SHA1 Message Date
Tessa Walsh
fa35d8994f Disable useSitemap by default in new workflows (#1541) 2024-02-22 23:54:23 -08:00
Ilya Kreymer
ba18abc063 Fix URL List showing scope accidentally (#1536)
fix call from when(...) to call function directly, avoid implicit true,
which results in page scope being shown for url list.
fixes #1535

To test:
1) Create new workflow of type URL List
2) Ensure the Crawl Scope drop down is not shown.

---------

Co-authored-by: sua yoo <sua@suayoo.com>
2024-02-22 23:54:23 -08:00
Ilya Kreymer
4cbe134a0e
one hop out: remove errant '|| true' from condition (#1532)
Fixes #1531, one hop out checkbox always checked.
2024-02-16 09:43:14 -08:00
Emma Segal-Grossman
d1156b0145
enable a few more useful eslint suggestions & correct some more types (#1517)
## Changes

Implements suggestions from
https://typescript-eslint.io/blog/consistent-type-imports-and-exports-why-and-how/
and
https://www.totaltypescript.com/method-shorthand-syntax-considered-harmful,
along with a couple more auto-fixable consistency rules.

Of note:
- Functions that return a promise are marked as async
- Suggestions now appear for where to simplify boolean checks,
non-nullish assertions, and optional chaining
2024-02-09 16:14:08 -08:00
Henry Wilkinson
9c2228aa52
Updates browser profile selector help text (#1510) 2024-02-07 18:05:28 -05:00
Emma Segal-Grossman
f853fcdd81
Upgrade Prettier to 3 (#1513)
Updates Prettier to major version 3, and also updates a couple
prettier-related other things.

Prelude to #1511 so that that PR doesn't include a bunch of unrelated
changes
2024-01-31 20:56:17 -05:00
Emma Segal-Grossman
3968928ac2
ESLint improvements & Typescript upgrade (#1501)
## Overview

Adds a bunch of ESLint rules, mostly from `typescript-eslint`, and fixes
the issues turning on these rules raises.

Also updates Typescript & typescript-eslint.

## Rationale

Most of these new rules are auto-fixable, so I've tackled a bunch of the
little fixes that do need manual intervention now with the intention
that this shouldn't add much of any additional friction in future
development work, and also give us a good bump in overall code quality.
A lot of the rules here are also great for catching potential bugs!

## Changes

- Adds `void` to most un-awaited and unhandled promises (i.e. places
where async functions are called but nothing is done with the promise)
- Converts properties that are only ever read to `readonly`
- Adds a new `isApiError` function that informs Typescript of when an
error is an `APIError`
- Adds types to a bunch of places that were previously untyped
- Changes instances of `Map<string, any>` in lit property update methods
to `PropertyValues<this>`, or sometimes `PropertyValues<this> &
Map<string, unknown>` where private or protected members are used
(`keyof` doesn't include private and protected members, unfortunately)
  - Adds types to a bunch of custom events
- Cleans up a regex by removing unnecessary escape characters
- Makes a number of implied type conversions explicit (by wrapping with
`Boolean(...)` or calling `.toString()`)
- More consistently applies type coercions when necessary, and removes
them when unnecessary
- Converts a couple const strings to an enum
- Removes the need to type debounced functions as `any` by doing type
coercions to the underlying function type at where the method is bound
to the event in the `html` block
2024-01-31 14:42:06 -05:00
sua yoo
896c3cc91c
Fix scheduler date input and display (#1472)
Fixes #1255

### Changes

- Fixes incorrect time zone conversion when generating UTC schedule in
workflow.
- Fixes minute input display not prefixing single digits with `0`

Co-authored-by: emma <hi@emma.cafe>
2024-01-18 23:55:55 -08:00
Tessa Walsh
07fa46d9aa
Add custom user agent to workflows (#1465)
Fixes #1341

Adds "User Agent" field to workflow editor under the Browser Settings
tab. If not set, the crawler will use the browser's default user agent.

Also added to docs and to the workflow details page (if set).

---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
2024-01-17 17:33:50 -05:00
Tessa Walsh
032859f361
Support multiple crawler versions (#1420)
Fixes #1385 

## Changes
Supports multiple crawler 'channels' which can be configured to
different browsertrix-crawler versions
- Replaces `crawler_image` in helm chart with `crawler_channels` array
similar to how storages are handled
- The `default` crawler channel must always be provided and specifies
the default crawler image
- Adds backend `/orgs/{oid}/crawlconfigs/crawler-channels` API endpoint
to fetch information about available crawler versions (name, image, and
label) and test
- Adds crawler channel select to workflow creation/edit screens and
profile creation dialog, and updates related API endpoints and
configmaps accordingly. The select dropdown is shown only if more than
one channel is configured.
- Adds `crawlerChannel` to workflow and crawl details.
- Add `image` to crawler image, used to display actual image used as
part of the crawl.
- Modifies `crawler_crawl_id` backend test fixture to use `test` crawler
version to ensure crawler versions other than latest work
- Adds migration to add `crawlerChannel` set to `default` to existing
workflow and profile objects and workflow configmaps

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2024-01-16 15:32:12 -08:00
Tessa Walsh
4fe014067e
Set runNow to false when editing existing workflows (#1458)
Fixes #1339
2023-12-18 14:23:04 -05:00
Emma Segal-Grossman
512698d747
Fix attribute casing & lit-analyzer issues (#1429)
## Changes
- Reverts changes introduced in #1407 that incorrectly changed attribute
casing
- Patches `@shoelace-style/shoelace` using
[`patch-package`](https://www.npmjs.com/package/patch-package) to add
JSDoc comments to component typedefs so that `lit-analyzer` can properly
pick up attributes
- Adds component typedef for `<replay-web-page>` component

## Testing
Tested by hand, it looks like missing help text/date formatting
changes/etc are back!

Before | After
-|-
![dev browsertrix
cloud_orgs_default-org_browser-profiles_profile_dea43f41-8777-4a42-b2ad-b8d43f6599b8](https://github.com/webrecorder/browsertrix-cloud/assets/5727389/1c6be749-ee8f-4b07-84c7-b05c5df376a7)
|
![localhost_9870_orgs_default-org_browser-profiles_profile_dea43f41-8777-4a42-b2ad-b8d43f6599b8](https://github.com/webrecorder/browsertrix-cloud/assets/5727389/4a305d3f-7947-4e13-b379-a82dc01620ea)
![dev browsertrix
cloud_orgs_default-org_browser-profiles_profile_dea43f41-8777-4a42-b2ad-b8d43f6599b8
(2)](https://github.com/webrecorder/browsertrix-cloud/assets/5727389/a5e6bba6-ce03-4622-8f39-194ce08481b7)
|
![localhost_9870_orgs_default-org_browser-profiles_profile_dea43f41-8777-4a42-b2ad-b8d43f6599b8
(2)](https://github.com/webrecorder/browsertrix-cloud/assets/5727389/33f076d8-aa20-4d25-9d1f-e6927d32819d)
![dev browsertrix
cloud_orgs_default-org_browser-profiles_profile_dea43f41-8777-4a42-b2ad-b8d43f6599b8
(1)](https://github.com/webrecorder/browsertrix-cloud/assets/5727389/34761f6b-32a9-4eb5-a129-0df67bb90f65)
|
![localhost_9870_orgs_default-org_browser-profiles_profile_dea43f41-8777-4a42-b2ad-b8d43f6599b8
(1)](https://github.com/webrecorder/browsertrix-cloud/assets/5727389/d8144b10-fc9b-49a4-9641-604ad8fa4e5a)

---------

Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
2023-12-11 12:34:03 -05:00
Emma Segal-Grossman
106fe5dd61
Organize components into folders by function (#1411) 2023-11-29 14:12:29 -05:00
Emma Segal-Grossman
b15c5ccddd
ESLint & Typescript fixes (#1407)
Closes #1405

- Properly uses `typescript-eslint`: we were missing the preset from it,
so some of the default `eslint` rules (that don't properly work with
typescript) were being applied and causing false positives
- I also moved the `eslint` config into its own file, and enabled
`typescript-eslint`'s type-awareness, so that we can enable more
type-aware rules in the future if we like
- Adds `ts-lit-plugin` to the typescript config, which _hopefully_ will
allow us to catch issues during build (in CI)
- It looks like `ts-lit-plugin` is sort of abandonware at the moment,
and unfortunately _doesn't_ actually work for this purpose right now,
but the lit team is working on a replacement here:
https://www.npmjs.com/package/@lit-labs/analyzer
- Adds `fork-ts-checker-webpack-plugin`, which allows the typescript
checking process to be run on a separate forked thread in Webpack, which
can help speed up builds & checking
- Enables incremental type checking for better speed
- Fixes a whole bunch of `eslint`-auto-fixable issues (unused imports
and variables, some type issues, etc)
- Fixes a bunch of `lit-analyzer` issues (mostly attribute naming, some
type issues as well)
- Fixes various other type issues:
- Improves type safety in a bunch of places, notably anywhere `apiFetch`
and `APIPaginatedList` are used
  - Removes some `any`s
2023-11-24 12:32:53 -05:00
emma
bc6f362861 update pages as well 2023-11-15 18:24:50 -05:00
Henry Wilkinson
b9eab4c20f
Minor capitalization fixes for workflow options (#1374)
- Updates checkbox casing
2023-11-13 19:22:50 -05:00
Henry Wilkinson
3c884f94c9
Fix z-index footer issue in crawl workflow form (#1313)
Closes #1312

- Adds z-index to footer element.
2023-10-26 21:44:50 -07:00
Tessa Walsh
38f32f11ea
Enforce quota and hard cap for monthly execution minutes (#1284)
Fixes #1261 Closes #1092

The quota for monthly execution minutes is treated as a hard cap. Once
it is exceeded, an alert indicating that an org has exceeded its monthly
execution minutes will display and the user will be unable to start new
crawls. Any running crawls will be stopped once the quota is exceeded.

An execution minutes meter bar is also added in the Org Dashboard and
displayed if a quota is set. More detail in #1305 which was
merged into this branch.

## Changes

- Enable setting 'maxExecMinutesPerMonth' in orgs list quotas by superadmin
- Enforce quota by stopping crawls in operator once quota is reached
- Show alert banner once execution time quota is hit:
- Once quota is hit, disable Run Crawl buttons in frontend, return 403
message with `exec_minutes_quota_reached` detail in backend from
crawl config `/run` endpoint, and don't run new workflows on creation
(similar to storage quota)
- Display execution time for crawls in the crawl details overview,
immediately below
- Show execution minutes meter on dashboard (from #1305)

---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: sua yoo <sua@webrecorder.org>
2023-10-26 15:38:51 -07:00
sua yoo
4610d95cd7
Use org slug in place of UUIDs in app URLs (#1277)
- Replaces org UUID in URL/browser location bar with org slug.
- Refactor: Adds shared app state utility using https://sijakret.github.io/lit-shared-state/ to
access org data from deep descendants.
- Backwards compatible: org UUID URLs should auto-redirect to org slug URLs.
- Show the org UUID in org settings general tab for use with APIs
(Resolves #1258, Follows #1279)
2023-10-18 09:28:30 -07:00
Henry Wilkinson
0bd8748e68
Minor Workflow Creator UX Changes (#1267)
- Adds `position: sticky` to the workflow creator / editor controls to
affix them to the bottom of the screen, they are now always visible!
- Renames "Extra URLs in Scope" to "Extra URL Prefixes in Scope"
- Updates documentation accordingly
- Adjusts casing for checkboxes
- Adds the multiplication sign to the crawler instances settings to
better communicate that they are increases in scale and not arbitrary
numbers.
2023-10-13 16:55:54 -07:00
sua yoo
38efeccc25
Limit URL list entry to maximum URLs (#1242)
- Limits URL list entry to 1,000 URLs
- Limits additional URL list entry to 100 URLs
- Shows first invalid URL in list in error message
- Quick and dirty fix for long URLs wrapping: Show URLs in list on one line, with entire container scrolling
---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2023-10-03 21:02:32 -07:00
Tessa Walsh
b1ead614ee
Add --failOnFailedSeed checkbox to URL list workflows (#1236)
- If set, and any of the seeds fails, the entire crawl is marked as a failure.
- Add checkbox which adds --failOnFailedSeed checkbox to URL list workflows
- Add 'Fail Crawl On Failed URL' to crawl workflow setup docs
2023-10-03 18:46:09 -07:00
sua yoo
941a75ef12
Separate seeds into a new endpoints (#1217)
- Remove config.seeds from workflow and crawl detail endpoints
- Add new paginated GET /crawls/{crawl_id}/seeds and /crawlconfigs/{cid}/seeds endpoints to retrieve seeds for a crawl or workflow
- Include firstSeed in GET /crawlconfigs/{cid} endpoint (was missing before)
- Modify frontend to fetch seeds from new /seeds endpoints with loading indicator

---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2023-10-02 10:56:12 -07:00
sua yoo
90e3a300cc
"Add new" dialog for all resources (#1202)
- Replaces individual "New" buttons in home page with dropdown button in header (includes Crawl Workflow, Upload Collection, Browser Profile)
- Refactors required step of new workflow and new collection into dialog
2023-09-29 09:11:24 -07:00
sua yoo
d05a27e8a4
Separate "run now" switch from scheduling options (#1175) 2023-09-21 19:18:57 -07:00
sua yoo
6234346d84
Fix crawl scope help text (#1169)
* update text

* remove trailing slash removal

* make scope help text responsive as user types

---------

Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2023-09-13 11:46:58 -07:00
Ilya Kreymer
9159c7c914
ensure max crawl size and max crawl timeout values are set to 0 when unused, instead of null (#1167)
- convert None->0 when creating CrawlJob
- ensure frontend sends 0 not null
- make input model require 'int = 0' instead of 'Optional[int] = 0'
2023-09-13 09:51:26 -07:00
Tessa Walsh
9377a6f456
Issue all non-upload storage-quota-update events from LiteElement (#1151)
- More specific toast notification error messages to the action being attempted
- Single dismissable global banner shown when org storage is reached
- Removed check for storage quota reached in `runNow`, since buttons are disabled in UI, and errors handled if request fails.
- Allow creating new workflow when storage quota reached
- More responsive storage quota updates: add storageQuotaReached to archived item replay.json, updates w/o reload when crawl pushes quota over limit
- Modify LiteElement to check for storageQuotaReached on GET requests

---------
Co-authored-by: sua yoo <sua@suayoo.com>
2023-09-11 18:17:48 -07:00
Tessa Walsh
d2ededc895
Add and enforce org storage quota (#1106)
* Implement in backend

- Track bytesStored in org
- Add migration to pre-calculate based on size of crawlfiles and profilefiles
- Add methods to increase or decrease org storage when crawl or profile files
are added or deleted
- Include storageQuotaReached boolean in API responses that alter storage
- Don't start new crawls and fail uploads if storage quota reached

* Implement in frontend

- Add to orgs-list quotas
- Update org's storageQuotaReached based on backend endpoint responses
- Disable buttons when storage quota is met
- Show toast notification when attempting to run a crawl when org
storage quota is met
2023-09-07 12:45:43 -04:00
Tessa Walsh
93573d0bfe
Use base10 for sizes in frontend (#1133)
* Use base10 for sizes in frontend

* Simplify renderSize
2023-09-05 21:35:20 -04:00
Tessa Walsh
e667fe2e97
Add max crawl size option to backend and frontend (#1045)
Backend:
- add 'maxCrawlSize' to models and crawljob spec
- add 'MAX_CRAWL_SIZE' to configmap
- add maxCrawlSize to new crawlconfig + update APIs
- operator: gracefully stop crawl if current size (from stats) exceeds maxCrawlSize
- tests: add max crawl size tests

Frontend:
- Add Max Crawl Size text box Limits tab
- Users enter max crawl size in GB, convert to bytes
- Add BYTES_PER_GB as constant for converting to bytes
- docs: Crawl Size Limit to user guide workflow setup section

Operator Refactor:
- use 'status.stopping' instead of 'crawl.stopping' to indicate crawl is being stopped, as changing later has no effect in operator
- add is_crawl_stopping() to return if crawl is being stopped, based on crawl.stopping or size or time limit being reached
- crawlerjob status: store byte size under 'size', human readable size under 'sizeHuman' for clarity
- size stat always exists so remove unneeded conditional (defaults to 0)
- store raw byte size in 'size', human readable size in 'sizeHuman'

Charts:
- subchart: update crawlerjob crd in btrix-crds to show status.stopping instead of spec.stopping
- subchart: show 'sizeHuman' property instead of 'size'
- bump subchart version to 0.1.1

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2023-08-26 22:00:37 -07:00
Tessa Walsh
ce5b52f8af
Add and enforce org maxPagesPerCrawl quota (#1044) 2023-08-23 10:38:36 -04:00
Ilya Kreymer
8ea3dd5dae
terminology tweaks in frontend: (part of #922) (#1062)
* terminology tweaks in frontend: (part of #922)
- use 'crawl workflow' instead of 'workflow' where possible
- use 'replay' instead of 'replay crawl'
- localization: rerun string extraction / processing
- "Review Config" → "Review Settings"
- "Workflow" → "Crawl Workflow" in error message

---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2023-08-09 15:38:58 -07:00
sua yoo
37733483d5
Standardize archived item filtering, sorting and labels (#1054)
Frontend:
- Renames list view to "All Archived Items"
- Refactors fetches to use single all-crawls endpoints
- Removes search by config ID for more search parity with uploads
- Adds sort by size
- Refactors property and method names to replace crawl*
- Replaces remaining references to "crawl" in copy with "item"'
- Rename Upload Archive button to Upload WACZ
- Fix focusout in item menu so menus close

Backend:
- Filter search values by type as well
- Only get list of cids for crawls in search values
- Don't list crawl/workflow ids in search values

---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2023-08-09 12:13:55 -07:00
Anish Lakhwara
9236a07800
fix: run yarn format in frontend dir (#1043) 2023-08-03 19:12:48 -07:00
sua yoo
75b011f951
Upload WACZ via UI (#992)
- Users can now upload .WACZ archives from the "Archived Data" page.
- Can specify name, description, tags and collection(s) to add upload to
- Show progress of upload
- Support canceling upload
2023-07-21 16:45:52 +02:00
Tessa Walsh
d5c3a8519f
Add crawler Use Sitemap option to Browsertrix Cloud (#978)
* Add user-guide docs for Use Sitemap option
---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2023-07-19 13:57:52 -04:00
Henry Wilkinson
d9e73fcbc3
Reorder Limits section (#966)
* Reorder Limits section

- Minor text change to section names
  - "Limit Per Page" → "Per-Page Limits"
  - "Limit Per Crawl" → "Per-Crawl Limits"

* Reorder limits section in documentation
2023-07-08 08:54:30 -07:00
Tessa Walsh
c7051d5fbf
Backend API consistency pass (#921)
* Make API add and update method returns consistent

- Updates return {"updated": True}
- Adds return {"added": True}
- Both can additionally have other fields as needed, e.g. id or name

- remove Profile response model, as returning added / id only
- reformat

---------

Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2023-06-16 18:52:46 -07:00
Tessa Walsh
bd6dc79449
Add frontend support for auto-adding collections to workflows (#916)
- Adds collections search and list to workflow editor
- Adds collections to workflow details component
- Adds namePrefix filter to backend GET /orgs/{oid}/collections endpoint to support case-insensitive searching of collections
- Adds documentation for new setting

---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2023-06-12 18:18:05 -07:00
Ilya Kreymer
ec3404c798
Fix Extra URLs in Scope (#913)
* scope fix: when using 'Custom Page Prefix scope (fixes #873)
- don't include primary seed URL in include list
- don't always add trailing slash to extra in scope URLs
- set seed scope to 'prefix' (supported via webrecorder/browsertrix-crawler#318) instead of re-including seed URL
- add comments on using 'custom' to indicate 'Custom Prefix Scope' semantics on frontend, setting actual scope to 'prefix' on backend
- remove unneeded conditional for additional urls, main scopeType overridden per seed anyway
2023-06-12 17:29:41 -07:00
sua yoo
821fbc12d8
Upgrade Shoelace to stable version (v2) (#856) 2023-05-22 10:01:48 -07:00
sua yoo
9fcbc3f87e
Allow users to set max depth/hop out within scope (#816)
- Adds an input to the Workflow creation and edit form for specifying crawl depth. This input is conditionally shown for seeded crawls when the scope is set to "Pages on this domain", "Pages on this domain & subdomains" or "Custom page prefix". The "any" scope is also supported for backwards compatibility but is not shown by default or in new configs.
- API implementation: The depth value is set in the primary seed config, i.e. the first seed in seeds: [], not in the outer .config.depth property.
2023-05-05 14:26:48 -07:00
sua yoo
9a1c2ba871
Fix workflow limit empty values being set to 0 (#795)
* default to null

* pass undefined for removing values

* handle 0 default
2023-05-03 09:25:22 -07:00
sua yoo
937ad4fe08
fix: navigate to watch on new crawl work
follows #720
2023-04-25 14:30:41 -07:00
sua yoo
7888c4fde3
Frontend crawl workflows rework (#775) 2023-04-25 14:16:07 -07:00
Henry Wilkinson
ba3daf326d
Adds inputmode attributes to workflow config fields (#755)
- Now the appropriate virtual keyboards are shown! :)
- Also adjusts type weight for workflow config headers to match mockups
2023-04-06 09:16:48 -07:00
Henry Wilkinson
c6aec84af4
Changes the autoscroll setting to true by default (#756)
As per my note on #745, currently all our other check boxes turn features on when enabled.  For consistency I have reversed the states of the autoscroll checkbox so the page autoscrolls when it is checked and does not run the behavior when it is unchecked.  Checked is also now the default state.

- Updates help text accordingly
- Renames `disableAutoscrollBehavior` → autoscrollBehavior
2023-04-06 09:06:55 -07:00
sua yoo
80bc4a3eb9
Fix additional URLs (#752) 2023-04-05 20:11:09 -07:00
sua yoo
91c2c1ad62
Allow users to set additional page time limits (#744) 2023-04-05 20:06:46 -07:00