1259 Commits

Author SHA1 Message Date
Ilya Kreymer
4bc8152640 version: bump to 1.9.0-beta.4 2024-02-09 16:17:13 -08:00
Henry Wilkinson
652856e74c
docs: Adds more details about browser profile capabilities (#1523)
Fixes #1522

## Changes

- Adds further security recommendations to change the password to
accounts you care about after crawling

Adds more details about the capabilities afforded with browser profiles.
This is now split into the following sections:
- Logging into Websites
- Accepting Popups
- Changing Browser Settings
- More in the future???  Extensions???

---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2024-02-09 16:16:47 -08:00
Ilya Kreymer
0653657637
better handling of failed redis connection + exec time updates (#1520)
This PR addresses a possible failure when Redis pod was inaccessible
from Crawler pod.
- Ensure crawl is set to 'waiting_for_capacity' if either no crawler
pods or no redis pod is available. Previously, a missing/inaccessible
redis would not result in 'waiting_for_capacity' if crawler pods were
available
- Rework logic: if no crawler and redis after >60 seconds, shut down
redis. If crawler and no redis, init (or reinit) redis
- Track 'lastUpdatedTime' in db when incrementing exec time to avoid
double counting if lastUpdatedTime has not changed, e.g. if operator sync
fails.
- Add redis timeout of 20 seconds to avoid timing out operator responses:
if the redis connection takes too long, assume it is unavailable
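
A minimal sketch of the decision rules listed above, assuming pod availability is already known at sync time. The function name `plan_sync`, the action strings, and the reading of the second rule (redis still running while no crawler pod has been available for more than 60 seconds) are illustrative assumptions, not the operator's actual code.

```python
from typing import Optional, Tuple

# Threshold taken from the commit message (">60 seconds").
REDIS_IDLE_SHUTDOWN_SECS = 60


def plan_sync(
    crawler_up: bool, redis_up: bool, secs_without_crawler: float
) -> Tuple[Optional[str], Optional[str]]:
    """Return (crawl status override, redis action) for one sync pass.

    Both values are None when nothing needs to change.
    """
    status = None
    action = None

    # Either pod missing means the crawl cannot make progress yet.
    if not crawler_up or not redis_up:
        status = "waiting_for_capacity"

    if not crawler_up and redis_up and secs_without_crawler > REDIS_IDLE_SHUTDOWN_SECS:
        # Assumed reading: redis has been idle without a crawler for too long.
        action = "shutdown_redis"
    elif crawler_up and not redis_up:
        # A crawler is present but has no redis to talk to.
        action = "init_redis"

    return status, action
```

For example, `plan_sync(True, False, 0)` returns `("waiting_for_capacity", "init_redis")`, which is the case the commit says was previously missed.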
2024-02-09 16:14:29 -08:00
Emma Segal-Grossman
d1156b0145
enable a few more useful eslint suggestions & correct some more types (#1517)
## Changes

Implements suggestions from
https://typescript-eslint.io/blog/consistent-type-imports-and-exports-why-and-how/
and
https://www.totaltypescript.com/method-shorthand-syntax-considered-harmful,
along with a couple more auto-fixable consistency rules.

Of note:
- Functions that return a promise are marked as async
- Suggestions now appear for where to simplify boolean checks,
non-nullish assertions, and optional chaining
2024-02-09 16:14:08 -08:00
Emma Segal-Grossman
07edf697f0
Hotfix: Crawls page table click targets not applied to the right elements (#1524)
Fixes #1525

### Changes

- Changes one of the table cell component usages in the crawl list page
to correctly use the `rowClickTarget` prop, rather than setting the
class to `rowClickTarget`.
- Updates the `rowClickTarget` styling to only apply _within_ a
`<btrix-table-cell>`
2024-02-08 14:41:33 -08:00
Ilya Kreymer
65fec64197
storages: use asynccontextmanager instead of sync to close client (#1521)
Follow-up to #1481: use `asynccontextmanager` with `async with`, since the
client is only used in async functions (which call run_in_executor)
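
A minimal sketch of the pattern described above, assuming a blocking client with a `close()` method. `FakeSyncClient`, `get_sync_client`, and `list_keys` are hypothetical stand-ins for illustration; the real code wraps a boto3 client.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeSyncClient:
    """Hypothetical stand-in for a blocking storage client."""

    def __init__(self):
        self.closed = False

    def list_keys(self):
        return ["a.wacz", "b.wacz"]

    def close(self):
        self.closed = True


@asynccontextmanager
async def get_sync_client():
    """Yield a sync client and always close it on exit."""
    client = FakeSyncClient()
    try:
        yield client
    finally:
        client.close()


async def list_keys():
    loop = asyncio.get_running_loop()
    async with get_sync_client() as client:
        # The blocking call runs in the default thread pool executor.
        keys = await loop.run_in_executor(None, client.list_keys)
    # The client has been closed once the `async with` block exits.
    return keys, client.closed
```

Because the callers are already coroutines, `async with` keeps the open/close pairing in one place even though the client itself is synchronous.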
2024-02-08 08:28:53 -08:00
Ilya Kreymer
b2a5dbf2cd
enable screenshots by default + fix py version formatting (#1518)
configmap: add --screenshot thumbnail,view as default screenshots 
version: update update-version.sh to add newline in version.py to match
new black formatting (from changes in #1507)
Fixes #1519
2024-02-07 17:07:28 -08:00
Ilya Kreymer
7aebce66f6 version: bump to 1.9.0-beta.3 2024-02-07 15:21:10 -08:00
Henry Wilkinson
3982064636
Fixes workflow selector keyboard navigation (#1514)
Fixes #1387

### Context

While checking some other keyboard navigation issues, I found that I was
unable to create a crawl workflow using only keyboard navigation. This
PR fixes that!

### Changes
- Changes from `<div>`s to `<button>`s so that these can be selected
with tab and enter.
- Adds tabindex for correct selection of items
- Removes the H3 & combines with window title
- Adds width and height to image and width to its container, should make
for a more stable layout while loading (#1387)
2024-02-07 15:11:20 -08:00
Henry Wilkinson
9c2228aa52
Updates browser profile selector help text (#1510) 2024-02-07 18:05:28 -05:00
Tessa Walsh
a898c2b456
Format backend with Black 24 (#1507)
Fixes #1506
2024-02-07 11:35:34 -08:00
Henry Wilkinson
45c9a91c9e
Docs: Improve relative links (#1476)
### Changes

- Fixes one broken link (["Ansible Playbooks"
here](https://docs.browsertrix.cloud/deploy/remote/))
- Formats relative links better to conform with [mkdocs 1.5 link
validation
improvements](https://www.mkdocs.org/about/release-notes/#expanded-validation-of-links)
2024-02-07 11:33:57 -08:00
Emma Segal-Grossman
f853fcdd81
Upgrade Prettier to 3 (#1513)
Updates Prettier to major version 3, and also updates a couple of other
Prettier-related things.

Prelude to #1511 so that that PR doesn't include a bunch of unrelated
changes
2024-01-31 20:56:17 -05:00
Emma Segal-Grossman
b5fe5551c5
Ensure linting & formatting runs in CI (#1512)
Makes sure code quality stays high by checking that code is linted &
formatted in CI.

### Reason

Frustration — so that [things like
this](https://github.com/webrecorder/browsertrix-cloud/pull/1500#issuecomment-1920087667)
don't happen in the future. I tried to merge `main` into a branch to get
it up to date with main, and main isn't totally formatted or linted
properly, and then formatting the codebase introduced a whole bunch of
unrelated changes. Running a formatter or linter shouldn't cause
unrelated code changes, and `main` should always be in a correct state
in terms of linting and formatting.

### Testing

- [x] Test run with failing lint checks errors:
https://github.com/webrecorder/browsertrix-cloud/actions/runs/7733354321/job/21085236200
- [x] Test run with failing formatting check errors:
https://github.com/webrecorder/browsertrix-cloud/actions/runs/7733501666/job/21085717519
- [x] Test run with both passing lint & formatting checks passes:
https://github.com/webrecorder/browsertrix-cloud/actions/runs/7733529142/job/21085796727
2024-01-31 18:25:44 -05:00
Henry Wilkinson
b2d526f09a
docs: Explains execution time (#1475)
Fixes #1463 

### Changes
- Explains execution time
- Adds style guide section about adding a badge for paid features
- Updates config for mkdocs-material 9.5, materialx emoji support is
being removed.
- Adds better tooltips, a cool feature that also got released with
mkdocs-material 9.5
- Adds search suggestions

### Caveats
- [mkdocs 1.5 has improved the way they handle link
validation](https://www.mkdocs.org/about/release-notes/#expanded-validation-of-links).
Looks like the way I've gone about linking things could be improved, and it
will give a bunch of warnings as a result. The site still builds fine,
but I'm going to fix this in a different PR so this one doesn't take as
much effort to review :)

EDIT: Here's that PR
https://github.com/webrecorder/browsertrix-cloud/pull/1476

### Testing
- Make sure you are up to date with `pip install --upgrade
mkdocs-material`

### Screenshot

**Badge!**
<img width="884" alt="Screenshot 2024-01-17 at 11 59 00 PM"
src="https://github.com/webrecorder/browsertrix-cloud/assets/5672810/62a51cf6-24bd-49f1-a6d0-d335f730bfbe">


### Future
- Should mkdocs-material be versioned in our deployment script? We risk
things breaking if I don't get to them fast enough! 🙃

---------

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2024-01-31 15:12:39 -05:00
Emma Segal-Grossman
3968928ac2
ESLint improvements & Typescript upgrade (#1501)
## Overview

Adds a bunch of ESLint rules, mostly from `typescript-eslint`, and fixes
the issues turning on these rules raises.

Also updates Typescript & typescript-eslint.

## Rationale

Most of these new rules are auto-fixable, so I've tackled a bunch of the
little fixes that do need manual intervention now with the intention
that this shouldn't add much of any additional friction in future
development work, and also give us a good bump in overall code quality.
A lot of the rules here are also great for catching potential bugs!

## Changes

- Adds `void` to most un-awaited and unhandled promises (i.e. places
where async functions are called but nothing is done with the promise)
- Converts properties that are only ever read to `readonly`
- Adds a new `isApiError` function that informs Typescript of when an
error is an `APIError`
- Adds types to a bunch of places that were previously untyped
- Changes instances of `Map<string, any>` in lit property update methods
to `PropertyValues<this>`, or sometimes `PropertyValues<this> &
Map<string, unknown>` where private or protected members are used
(`keyof` doesn't include private and protected members, unfortunately)
  - Adds types to a bunch of custom events
- Cleans up a regex by removing unnecessary escape characters
- Makes a number of implied type conversions explicit (by wrapping with
`Boolean(...)` or calling `.toString()`)
- More consistently applies type coercions when necessary, and removes
them when unnecessary
- Converts a couple const strings to an enum
- Removes the need to type debounced functions as `any` by doing type
coercions to the underlying function type where the method is bound
to the event in the `html` block
2024-01-31 14:42:06 -05:00
sua yoo
79645b64fe
Refactor collections and browser profile data-tables (#1505)
- Updates browser profile list styles to match other data table styles
- Makes entire collection item clickable
- Refactors row click area to fix text overflow
2024-01-30 19:46:42 -08:00
sua yoo
15e410daa1
Unify crawl and archived item list components (#1485) 2024-01-30 19:08:43 -05:00
sua yoo
ce37c7d02f
Upgrade to lit 3 (#1482)
- Upgrades to lit 3 to access new features
- Reduces number of installed lit versions
2024-01-28 21:48:40 -08:00
sua yoo
894fc63835
Refactor data table to use btrix-table component (#1474)
- Refactors `btrix-data-table` to use `btrix-table`
- Prevent tables from breaking layout at smaller screen size
2024-01-28 21:17:47 -08:00
Tessa Walsh
b252931c71
Add scale to CrawlOut (#1487)
Fixes #1486 

`scale` is already saved in the crawl but needed to be added to
`CrawlOut`.
2024-01-23 14:10:37 -08:00
sua yoo
73ea9815c4
Fix archived item crawl settings (#1473)
Fixes https://github.com/webrecorder/browsertrix-cloud/issues/1418

### Changes
- Fixes crawl detail always showing URL list seed settings
- Removes metadata section from crawl detail settings tab
2024-01-23 14:09:49 -08:00
sua yoo
534f5ff2c7
Increase app max width (#1484)
Increases max width of entire app

---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2024-01-23 10:41:25 -08:00
sua yoo
1f55edbe68
Update collection archived item lists (#1457)
New features & enhancements:
- New UI for collection item selection dialog
- Consistent data table styles for collection list and collection item
list

Refactors:
- Adds `btrix-table` as low-level table component
- Adds `btrix-archived-item-list`, removes `checkbox-list` and
deprecates `crawl-list`
- Upgrades Shoelace for `sl-tree` fixes
- Fixes `ArchivedItem` typing
2024-01-22 17:14:53 -08:00
sua yoo
896c3cc91c
Fix scheduler date input and display (#1472)
Fixes #1255

### Changes

- Fixes incorrect time zone conversion when generating UTC schedule in
workflow.
- Fixes minute input display not prefixing single digits with `0`
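
The two fixes above are in the frontend, but the underlying ideas can be illustrated in a few lines of Python. This is a sketch under assumptions: `to_utc_cron` and `pad_minute` are hypothetical names, and the daily cron format is chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone


def to_utc_cron(local_dt: datetime) -> str:
    """Build a daily cron expression in UTC from a timezone-aware local time."""
    utc = local_dt.astimezone(timezone.utc)
    return f"{utc.minute} {utc.hour} * * *"


def pad_minute(minute: int) -> str:
    """Display a minute value with a leading zero for single digits."""
    return f"{minute:02d}"


# A fixed UTC-8 offset stands in for the user's local time zone.
pst = timezone(timedelta(hours=-8))
schedule = to_utc_cron(datetime(2024, 1, 18, 23, 5, tzinfo=pst))
```

Here `schedule` is `"5 7 * * *"` (23:05 at UTC-8 is 07:05 UTC on the following day), and `pad_minute(5)` is `"05"`.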

Co-authored-by: emma <hi@emma.cafe>
2024-01-18 23:55:55 -08:00
Ilya Kreymer
bf38063e0a
Close sync S3 client (#1481)
Cleanup of boto3 sync client, ensure that it is used as a context manager like
async client.
2024-01-18 18:18:41 -05:00
Tessa Walsh
950844dc92
Add migration to fix issues with previous migrations (#1480)
Fixes #1479 

- Update null crawlTimeouts in db to 0
- Update crawlerChannel in configmaps
2024-01-18 16:59:40 -05:00
Ilya Kreymer
ad19941318
operator: use 'default' CRAWLER_CHANNEL if none is set (#1478)
Use default channel if CRAWLER_CHANNEL not set in crawlconfig configmap,
consistent with how other configmap settings for cronjobs are used.
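
A minimal sketch of the fallback described above, assuming the configmap data is available as a plain dict. The function name `crawler_channel` is hypothetical.

```python
def crawler_channel(configmap_data: dict) -> str:
    """Return the configured crawler channel, or 'default' if unset or empty."""
    return configmap_data.get("CRAWLER_CHANNEL") or "default"
```

Using `or` rather than a `.get()` default also covers the case where the key exists but holds an empty string.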
2024-01-18 11:13:03 -08:00
Ilya Kreymer
e43feedc43 version: bump to 1.9.0-beta.2 2024-01-18 10:01:38 -08:00
Ilya Kreymer
370590b14f version: bump to 1.9.0-beta.1 2024-01-17 14:58:25 -08:00
Tessa Walsh
07fa46d9aa
Add custom user agent to workflows (#1465)
Fixes #1341

Adds "User Agent" field to workflow editor under the Browser Settings
tab. If not set, the crawler will use the browser's default user agent.

Also added to docs and to the workflow details page (if set).

---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
2024-01-17 17:33:50 -05:00
Emma Segal-Grossman
7282274502
Hotfix: ignore everything in ./tests and playwright.config.ts when checking types during webpack build (#1470)
[Frontend Build
Check](https://github.com/webrecorder/browsertrix-cloud/actions/workflows/frontend-build-check.yaml)
was failing on main bc Webpack was type-checking a number of files that
require various `devDependencies`, which are purposefully not installed
at this point to mirror `frontend/Dockerfile` behaviour.
2024-01-16 18:01:01 -08:00
Ilya Kreymer
90197b2a85
Backend mem usage fix - use fixed MOTOR_MAX_WORKERS + switch to gunicorn (#1468)
Refactors backend deployment to:
- Use MOTOR_MAX_WORKERS (defaulting to 1) to reduce threads used by
mongodb connections
- Also sets backend workers to 1 by default to reduce default memory
usage
- Switches to gunicorn with uvloop worker for production use instead of
uvicorn (as recommended by uvicorn)

Lower thread count should address the memory leak/increased usage: the previous
defaults resulted in 5 threads x cpus x workers, e.g. potentially 20 or 40 threads
just for mongodb connections. A lower default number of workers should
make it easier to scale the backend with HPA if additional capacity is needed.
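
The thread arithmetic above can be checked with a short calculation. This is a sketch that follows the formula given in this commit message; `mongo_threads` is a hypothetical helper, not part of the backend.

```python
from typing import Optional


def mongo_threads(cpus: int, workers: int, motor_max_workers: Optional[int] = None) -> int:
    """Estimate threads used for mongodb connections across all backend workers.

    Without a fixed limit, each worker gets 5 threads per CPU (per the commit
    message); with MOTOR_MAX_WORKERS set, each worker gets exactly that many.
    """
    per_worker = motor_max_workers if motor_max_workers is not None else 5 * cpus
    return per_worker * workers
```

With 2 CPUs and 2 workers this gives 20 threads, and with 4 CPUs and 2 workers it gives 40. With `MOTOR_MAX_WORKERS=1` and a single worker, the total drops to 1.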

Fixes #1467
2024-01-16 15:32:42 -08:00
Tessa Walsh
032859f361
Support multiple crawler versions (#1420)
Fixes #1385 

## Changes
Supports multiple crawler 'channels' which can be configured to
different browsertrix-crawler versions
- Replaces `crawler_image` in helm chart with `crawler_channels` array
similar to how storages are handled
- The `default` crawler channel must always be provided and specifies
the default crawler image
- Adds backend `/orgs/{oid}/crawlconfigs/crawler-channels` API endpoint
to fetch information about available crawler versions (name, image, and
label) and test
- Adds crawler channel select to workflow creation/edit screens and
profile creation dialog, and updates related API endpoints and
configmaps accordingly. The select dropdown is shown only if more than
one channel is configured.
- Adds `crawlerChannel` to workflow and crawl details.
- Add `image` to crawler image, used to display actual image used as
part of the crawl.
- Modifies `crawler_crawl_id` backend test fixture to use `test` crawler
version to ensure crawler versions other than latest work
- Adds migration to add `crawlerChannel` set to `default` to existing
workflow and profile objects and workflow configmaps
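
A minimal sketch of two rules from the list above: the `default` channel must always be configured, and the channel select is shown only when more than one channel exists. The dict keys, function names, and image names are assumptions for illustration, not the actual chart schema or backend code.

```python
from typing import Dict, List


def channel_images(channels: List[Dict[str, str]]) -> Dict[str, str]:
    """Map channel name to crawler image, requiring a 'default' channel."""
    images = {c["name"]: c["image"] for c in channels}
    if "default" not in images:
        raise ValueError("a 'default' crawler channel must be configured")
    return images


def show_channel_select(channels: List[Dict[str, str]]) -> bool:
    """The dropdown is only useful when there is a choice to make."""
    return len(channels) > 1


# Placeholder image names, not real registry paths.
channels = [
    {"name": "default", "image": "example/crawler:latest"},
    {"name": "test", "image": "example/crawler:test"},
]
images = channel_images(channels)
```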

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2024-01-16 15:32:12 -08:00
Henry Wilkinson
05c5e09d25
Adds status information to user documentation (#1459)
Closes #1434 

### Changes
#### Developer
- Adds the K3S playbook guide to the navigation
- Adds note about restarting MKDocs when adding new icons
- Adds note about concise language to the styleguide ([see previous
discussion](https://github.com/webrecorder/browsertrix-cloud/pull/1394#discussion_r1402666872))
- Adds a note about noun usage to the styleguide
#### User guide
- Adds tables for archived item and workflow statuses
- Adds custom styles for displaying statuses with their icons like we do
in the app
- Fixes capitalization issues
---------

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
Co-authored-by: sua yoo <sua@webrecorder.org>
2024-01-14 16:44:51 -08:00
Tessa Walsh
9f73bafd37
Fix browser profile name in crawl endpoints (#1464)
Fixes #1388 

Fixes browser profile name lookup by ensuring profileid is in CrawlOut
model.
2024-01-14 16:30:27 -08:00
Tessa Walsh
138e2da8b3
Add setup command to btrix helper to copy local config (#1462)
Fixes #1157 

Adds `./btrix setup` command to `btrix` helper which copies the example
local config to `./chart/local.yaml` where `btrix` expects it.

If another command is run when the local config file doesn't yet exist,
the helper will stop and suggest to the user to run `./btrix setup` and
edit the resulting file.
2024-01-10 19:32:39 -08:00
Tessa Walsh
38a01860b8
Add API endpoints for crawl statistics (#1461)
Fixes #1158 

Introduces two new API endpoints that stream crawling statistics CSVs
(with a suggested attachment filename header):

- `GET /api/orgs/all/crawls/stats` - crawls from all orgs (superuser
only)
- `GET /api/orgs/{oid}/crawls/stats` - crawls from just one org
(available to org crawler/admin users as well as superusers)

Also includes tests for both endpoints.
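
A minimal, framework-independent sketch of streaming a CSV with a suggested attachment filename, using only the standard library. The column names, the filename, and the helper names are hypothetical; the actual endpoints define their own fields.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List


def stream_csv(rows: Iterable[Dict[str, object]], fieldnames: List[str]) -> Iterator[str]:
    """Yield CSV text one line at a time instead of building the whole file."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)

    writer.writeheader()
    yield buf.getvalue()

    for row in rows:
        # Reset the buffer so each chunk contains only the current row.
        buf.seek(0)
        buf.truncate(0)
        writer.writerow(row)
        yield buf.getvalue()


def attachment_headers(filename: str) -> Dict[str, str]:
    """Header that suggests a download filename to the client."""
    return {"Content-Disposition": f'attachment; filename="{filename}"'}


chunks = list(stream_csv([{"id": "crawl-1", "pages": 3}], ["id", "pages"]))
headers = attachment_headers("crawling-stats.csv")
```

A generator like this can be handed to a streaming response in most Python web frameworks, so large result sets are not held in memory.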
2024-01-10 13:30:47 -08:00
Emma Segal-Grossman
99dd9b4acb
Remove non-prod & optional dependencies when building frontend in ci (#1455)
Fixes #1454 

## Motivation

We've had a number of cases recently where a build dependency is added
to `devDependencies`, the PR passes the frontend build check
(`frontend-build-check.yaml`) in the branch, and then fails the cluster
run (`k3d-ci.yaml`) in `main` because the frontend build check installs
all dependencies, whereas the cluster run uses the frontend Dockerfile,
which skips everything but prod dependencies.

## Changes

This runs an additional step in the frontend build check, after running
unit tests and the l10n build but before doing the build, that re-runs
`yarn` with the same arguments as are in the frontend Dockerfile,
installing just prod dependencies.

This results in slightly longer frontend build check runtimes, but
should save us some wasted time fixing broken `main`.
2024-01-10 11:46:17 -08:00
Ilya Kreymer
a6936299d3 version: bump to 1.9.0-beta.0 2023-12-20 00:08:16 -08:00
sua yoo
dbd48cf8e3
Improvements to collection creation and editing flow (#1424)
Resolves https://github.com/webrecorder/browsertrix-cloud/issues/1333

- Moves "Select Crawls" / "Select Uploads" steps into a single "Select
Archived Items" dialog
- Refactors new collection metadata dialog to accept editing existing
collection
- Prevents RWP component from rendering if there are no archived items
(@Shrinks99 made a comment about this in figma, but this prevents
unnecessary requests when there isn't an archive to replay)
- Shows collection description at bottom of detail page at all times
(@Shrinks99 seems useful to see even on archived items view?)
- Switches collection detail primary action to "Add Archived Items" if
none are included (cc @Shrinks99)
- Displays friendlier "name taken" error
- Removes unused Collection edit route
- Upgrades markdown dependencies for fixes/improvements to description
editing

---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2023-12-19 18:12:43 -08:00
Tessa Walsh
4fe014067e
Set runNow to false when editing existing workflows (#1458)
Fixes #1339
2023-12-18 14:23:04 -05:00
Emma Segal-Grossman
73e20269ef
Org settings layout fix + misc styling & consistency improvements (#1427)
## General changes

- Added `postcss-lit`, which allows us to use tailwind in lit elements
with shadow DOMs
- Added `// postcss-lit-disable-next-line` comments to most `` css`...`
`` tagged templates so as not to change existing CSS in components
- Added `TailwindElement`, which uses a single shared `CSSStyleSheet`
across all instances to be able to access Tailwind without requiring a
full copy of (compiled) Tailwind for every instance of a component that
extends it
- Added a new `<btrix-copy-field>` element, replacing the existing copy
elements

## Org settings page

- Stopped content from overflowing at medium widths
- Made spacing consistent at both smaller and wider widths
- Used readonly/monospace styling for copyable org id field
- Updated tab shadows to be slightly blue, consistent with the tab
background (also did this in other places tabs show up)

Before | After
-|-
![dev browsertrix cloud_orgs_default-org_settings](https://github.com/webrecorder/browsertrix-cloud/assets/5727389/9bcacdcc-259b-4a01-bac5-8913518776f0) | ![localhost_9870_orgs_default-org_workflows_crawls](https://github.com/webrecorder/browsertrix-cloud/assets/5727389/53936d4d-e5cd-4f37-ad06-b3b5041381df)
![dev browsertrix cloud_orgs_default-org_settings (3)](https://github.com/webrecorder/browsertrix-cloud/assets/5727389/602dd8d6-3012-4a0e-a638-a5192c9601ec) | ![localhost_9870_orgs_default-org_workflows_crawls (3)](https://github.com/webrecorder/browsertrix-cloud/assets/5727389/74c93312-ad26-48d8-a87e-3da9a851693b)

## Misc fixes

- Used consistent single-line readonly/monospace styling for copyable
url field

Before | After
-|-
![dev browsertrix cloud_orgs_default-org_settings (1)](https://github.com/webrecorder/browsertrix-cloud/assets/5727389/e361feeb-3ea0-4f56-9e38-12ef6a644d58) | ![localhost_9870_orgs_default-org_workflows_crawls (1)](https://github.com/webrecorder/browsertrix-cloud/assets/5727389/0145b1ad-8f45-4486-893e-8f638ac9add6)

- Removed inconsistent angled bottom borders from crawl workflow list
header

Before | After
-|-
![dev browsertrix cloud_orgs_default-org_settings (2)](https://github.com/webrecorder/browsertrix-cloud/assets/5727389/4aa20359-3ecf-4441-83c0-ed36a951ed3b) | ![localhost_9870_orgs_default-org_workflows_crawls (2)](https://github.com/webrecorder/browsertrix-cloud/assets/5727389/8c771464-3a70-47e7-8475-fa82d4d030a9)

- Changes _all_ list page primary action buttons to use
`variant="primary"`

<img width="190" alt="Screenshot 2023-12-08 at 11 23 49 AM"
src="https://github.com/webrecorder/browsertrix-cloud/assets/5672810/2b007f5e-e675-40b2-86a7-f0bf8ef83b81">
<img width="240" alt="Screenshot 2023-12-08 at 11 23 43 AM"
src="https://github.com/webrecorder/browsertrix-cloud/assets/5672810/621b340e-2051-4ab0-8f42-8f0a51d8d3a5">

---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: sua yoo <sua@webrecorder.org>
Co-authored-by: sua yoo <sua@suayoo.com>
2023-12-13 17:29:35 -05:00
sua yoo
2bb21c615d
Improve frontend event system (#1450)
- Adds notify, navigate, and log in events to global event map, handle
in `btrix-app`
- Adds console debugs, which are stripped in prod
- Replaces TODO redundant `navTo`s with controller implementation
- Refactors rest of `LitElement` helpers into arrow functions
2023-12-13 14:11:15 -08:00
Emma Segal-Grossman
647562be73
Use execution duration formatter in table view (#1449)
More-or-less cherry-picked from #1433 

## Changes
- Updated the data table to use `formatExecutionSeconds` rather than
`formatSeconds`
- Fixed an issue in `formatExecutionSeconds` where the time in minutes
would sometimes be displayed twice when `options.displaySeconds` was
false or unset

## Testing
Tested locally with orgs with and without execution limits of various
kinds set
2023-12-13 15:43:28 -05:00
sua yoo
603ace0740
Fix redirect to login page (#1445)
Fixes https://github.com/webrecorder/browsertrix-cloud/issues/1436, regression introduced in
https://github.com/webrecorder/browsertrix-cloud/pull/1381
2023-12-13 09:53:29 -08:00
Ilya Kreymer
d74d9ac09d
Recreate configmaps if missing (#1444)
If configmap is missing (eg. was accidentally deleted from k8s) recreate
the configmap when updating the crawl workflow or running a crawl.
Previously, this would result in an error, but now the configmap should
be correctly recreated.
2023-12-12 17:48:27 -05:00
sua yoo
3251b06e06
Fix fetch helper (#1442)
Fixes https://github.com/webrecorder/browsertrix-cloud/issues/1441,
regression introduced in
https://github.com/webrecorder/browsertrix-cloud/pull/1423

### Manual testing

1. Log in and go to "Archived Items"
2. Click a crawl. Verify that "Sorry, couldn't retrieve crawl logs"
notification doesn't show and logs fetch as expected.

### Follow-ups

Consistency pass on rest of `LitElement` helpers here:
https://github.com/webrecorder/browsertrix-cloud/pull/1443
2023-12-12 15:11:29 -05:00
Emma Segal-Grossman
a5dd35bd6e
Only load webpack-bundle-analyzer if BUNDLE_ANALYZER env var is present (#1446)
Fixes build failing in main

Tested with a local build (`./scripts/build-frontend.sh`)
2023-12-12 14:37:50 -05:00
sua yoo
7e4650ed61
Fix runtime error on log out (#1439)
Fixes https://github.com/webrecorder/browsertrix-cloud/issues/1438
2023-12-11 18:37:50 -08:00