Commit Graph

1245 Commits

Author SHA1 Message Date
wvengen
6278157f40
Make storage deletion work on more S3 providers, don't use access URL for deletion (#1600)
I came across [this
problem](https://forum.webrecorder.net/t/deleting-crawl-failure/512) and
noticed that the access URL is used when deleting files, causing my file
deletions to fail on OpenStack SWIFT S3 (relates to #1090). This trivial
change makes it work there.
2024-03-16 04:17:23 -04:00
sua yoo
eb7036bf87
Add QA tab to archived item detail (#1590)
Adds tab with placeholders as a starting point to work off of. The badge and button is not currently linked up to any data or actions.
2024-03-12 14:05:16 -07:00
Henry Wilkinson
16e8b761c0
Frontend: Various icon updates (#1569)
Closes #1568 

## Changes
- Status icons are now filled!
- Uses Bootstrap Icons' new `copy` icon for all actions involving
copying to clipboard!
  - Finally! A real copy icon! 🎉 
  - Removes `copy-code.svg` as it is no longer used
- Actions involving duplicating objects still use `files`... Which is
good! Now they have distinct symbols!
- Adds orange to the tailwind colour palette

---------

Co-authored-by: sua yoo <sua@webrecorder.org>
2024-03-12 15:18:10 -04:00
sua yoo
9f312c075e
Manually approve pages in QA review (#1576)
- Automatically update view to first page if page ID isn't specified
- Show current page URL in location bar (resolves
https://github.com/webrecorder/browsertrix-cloud/issues/1495)
- Approve, reject, or leave notes on a page
- Display temporary list of links to pages in the sidebar
2024-03-12 10:08:51 -07:00
Henry Wilkinson
8ba29ca776
Browsertrix Cloud → Browsertrix text rename (#1466)
Part of #1241

### Changes
- Renames all instances of "Browsertrix Cloud" to "Browsertrix" on the
front end, emails, and documentation

---------

Co-authored-by: emma <hi@emma.cafe>
2024-03-12 11:30:05 -04:00
Ilya Kreymer
08f6847194
Configurable Max Scale for frontend (#1557)
Allow maximum scale option to be fully configurable via
`max_crawl_scale`. Already configurable on the backend, and now exposed
to the frontend via API `/api/settings` `maxCrawlScale` value.

The workflow editor and workflow details are updated to allow selecting
the scale up to the maxCrawlScale setting (which defaults to 3 if not
set).
2024-03-11 16:21:20 -07:00
Emma Segal-Grossman
8462c08206
Fix a couple linting issues (#1565) 2024-03-11 16:20:37 -07:00
sua yoo
548261e663
Fix shoelace icon loading (#1587)
Loads `sl-icon` synchronously to get correct base path when running
webpack-dev-server.
2024-03-11 13:38:58 -07:00
dependabot[bot]
a5521c6866
Bump cryptography from 41.0.1 to 42.0.4 in /ansible (#1574)
Bumps [cryptography](https://github.com/pyca/cryptography) from 41.0.1
to 42.0.4.

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-03-06 16:24:36 -08:00
Ilya Kreymer
ea494fa6e6
Merge V1.9.3 changes into main (#1583)
- Fix execution time checking by keeping lastUpdatedTime in db by
@ikreymer in https://github.com/webrecorder/browsertrix-cloud/pull/1573
- disable postcss-lit for var css
- Prevent closing tooltips from closing collection share dialog by
@SuaYoo in https://github.com/webrecorder/browsertrix-cloud/pull/1579
- Fix pending exclusion pagination by @SuaYoo in
https://github.com/webrecorder/browsertrix-cloud/pull/1578
- Fix regex escape in exclusion editor text match by @SuaYoo in
https://github.com/webrecorder/browsertrix-cloud/pull/1577

---------
Co-authored-by: emma <hi@emma.cafe>
Co-authored-by: sua yoo <sua@webrecorder.org>
2024-03-06 15:38:22 -08:00
Tessa Walsh
c20e754269
Add updatable QA reviewStatus field to crawls (#1575)
Fixes #1539 

Adds `reviewStatus` field to `BaseCrawl` model, updatable via the crawl
update API endpoint. Acceptable values are "good", "acceptable" or
"failure", enforced by an Enum.

Added to `BaseCrawl` so that we can extend support to uploads more
easily later on, but for now we'll only display this for crawls in the
frontend.
2024-03-05 16:49:23 -08:00
Emma Segal-Grossman
780dd09321
Create ArchivedItemPage and ArchivedItemPageComment types (#1567)
Based on #1534

Figured this should be in place so we can work on other front-end things
with these, rather than dealing with refactoring later

<!-- Fixes #issue_number -->

### Changes

- Adds `ArchivedItemPage` and `ArchivedItemPageComment` types from #1534
(thank you @SuaYoo!)
- Adds typedefs for match and resource count properties
- sets properties optional in the db schema to optional in the type as
well

### Manual testing

1.

### Screenshots

| Page | Image/video |
| ---- | ----------- |
|      |             |

<!-- ### Follow-ups -->
2024-03-04 18:52:09 -05:00
Tessa Walsh
ec0db1c323
Temporarily remove pages migration (#1572)
Removing until we have a better tested solution, including to avoid testing of QA runs for new crawls in beta.
2024-03-04 10:30:04 -08:00
Tessa Walsh
144000c7a3
Add guide for customizing Helm chart values (#1556)
Fixes #1555 

This is a first pass at some of the configuration options within the
Helm chart that might be most applicable to users. Emphasis is placed on
configuration that's particular to our application, such as storage and
crawler channels.

---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2024-03-04 12:03:11 -05:00
Ilya Kreymer
09a0d51843
pages: set page status to 200 if unset and loadState != 0 (#1563)
Follow up to #1516, ensure page status is set to 200 if no status is
provided, if loadState is not 0
2024-02-29 15:15:17 -08:00
Ilya Kreymer
2ac6584942
Refactor operator class into module (#1564)
The operator class has gotten fairly large, this is a first pass in
refactoring operator.py into a submodule instead, with multiple operator
instances which handle different types of objects.

- The main k8s interface has been split into K8sOpApi which extends K8sApi
and is shared across all operators.
- Each operator extends BaseOperator which also has an instance of K8sOpApi
- The CrawlOperator is still the bulk of the functionality, but will likely be further refactored
to support QA jobs

---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2024-02-29 14:40:12 -08:00
Tessa Walsh
da19691184
Add crawl errors incrementally during crawl (#1561)
Fixes #1558 

- Adds crawl errors to database incrementally during crawl rather than
after crawl completes
- Simplifies crawl /errors API endpoint to always return errors from
database
2024-02-29 09:16:34 -08:00
Ilya Kreymer
804f755787
Increase startup probe time to account for long-running migrations (#1560)
- increases the failureThreshold for startupProbe for the api backend
container to account for long running migrations, upto 300 seconds
- add `/healthzStartup` which checks if db is ready
- bump 
- keeps `/healthz` to always return 200 when running
- increases livenessProbe failureThreshold to be higher than readiness
probe, following recommended best practice of liveness probe > readiness
probe
- fixes #1559
2024-02-28 14:22:33 -08:00
Tessa Walsh
14189b7cfb
Add crawl pages and related API endpoints (#1516)
Fixes #1502 

- Adds pages to database as they get added to Redis during crawl
- Adds migration to add pages to database for older crawls from
pages.jsonl and extraPages.jsonl files in WACZ
- Adds GET, list GET, and PATCH update endpoints for pages
- Adds POST (add), PATCH, and POST (delete) endpoints for page notes,
each with their own id, timestamp, and user info in addition to text
- Adds page_ops methods for 1. adding resources/urls to page, and 2.
adding automated heuristics and supplemental info (mime, type, etc.) to
page (for use in crawl QA job)
- Modifies `Migration` class to accept kwargs so that we can pass in ops
classes as needed for migrations
- Deletes WACZ files and pages from database for failed crawls during
crawl_finished process
- Deletes crawl pages when a crawl is deleted

Note: Requires a crawler version 1.0.0 beta3 or later, with support for
`--writePagesToRedis` to populate pages at crawl completion. Beta 4 is
configured in the test chart, which should be upgraded to stable 1.0.0
when it's released.

Connected to https://github.com/webrecorder/browsertrix-crawler/pull/464

---------

Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2024-02-28 12:11:35 -05:00
sua yoo
974b919eef
docs: remove reference to prod 2024-02-26 13:26:47 -08:00
sua yoo
86a816662e
add api reference section 2024-02-26 12:58:21 -08:00
Emma Segal-Grossman
f6e82d9335
Archived item nav button quickfix (#1543)
Navigation buttons weren't being laid out properly and were overflowing
in unintentional ways, this fixes that, and then also updates navigation
buttons & puts them into use everywhere elements service the purpose of
navigation buttons were used instead!


<img width="452" alt="Screenshot 2024-02-24 at 10 37 41 PM"
src="https://github.com/webrecorder/browsertrix-cloud/assets/5727389/a77ed1be-3f95-4e03-a4d8-e3740229621e">
<img width="519" alt="Screenshot 2024-02-24 at 10 38 06 PM"
src="https://github.com/webrecorder/browsertrix-cloud/assets/5727389/684bc9a4-bec2-4258-b264-662dc441e75f">
<img width="273" alt="Screenshot 2024-02-24 at 10 38 20 PM"
src="https://github.com/webrecorder/browsertrix-cloud/assets/5727389/863d9d9a-121e-4682-8c12-eaf94ae69c7c">
<img width="410" alt="Screenshot 2024-02-24 at 10 38 25 PM"
src="https://github.com/webrecorder/browsertrix-cloud/assets/5727389/b321375c-d063-4c00-b876-36a592c85a35">
<img width="200" alt="Screenshot 2024-02-24 at 10 38 37 PM"
src="https://github.com/webrecorder/browsertrix-cloud/assets/5727389/62bbb5d1-d4f3-4ba3-8cd5-035242424f3a">
2024-02-25 02:04:53 -05:00
Ilya Kreymer
ae59617e02 ci fix: deploy-dev.yaml fix, install poetry earlier, add decrypt values to sparse checkout 2024-02-23 18:40:36 -08:00
Ilya Kreymer
5e003f36a0 ci: also publish helm chart for *-release branches 2024-02-22 23:54:23 -08:00
Tessa Walsh
fa35d8994f Disable useSitemap by default in new workflows (#1541) 2024-02-22 23:54:23 -08:00
Ilya Kreymer
8ae032ff88 More friendly WARC prefix inside WACZ based on Org slug + Crawl Name / First Seed URL. (#1537)
Supports setting WARC prefix for WARCs inside WACZ to `<org slug>-<slug
[crawl name | first seed host]>`.
- Prefix set via WARC_PREFIX env var, supported in browsertrix-crawler
1.0.0-beta.4 or higher
If crawl name is provided, uses crawl name, other hostname of first
seed. The name is 'sluggified', using lowercase alphanum characters
separated by dashes.

Ex: in an organization called `Default Org`, a crawl of
`https://specs.webrecorder.net/` and no name will have WARCs named:
`default-org-specs-webrecorder-net-....warc.gz`
If the crawl is given the name `SPECS`, the WARCs will be named
`default-org-specs-manual-....warc.gz`

Fixes #412 in a default way.
2024-02-22 23:54:23 -08:00
Ilya Kreymer
ba18abc063 Fix URL List showing scope accidentally (#1536)
fix call from when(...) to call function directly, avoid implicit true,
which results in page scope being shown for url list.
fixes #1535

To test:
1) Create new workflow of type URL List
2) Ensure the Crawl Scope drop down is not shown.

---------

Co-authored-by: sua yoo <sua@suayoo.com>
2024-02-22 23:54:23 -08:00
Vinzenz Sinapius
be1dc80e4a
Deploy dev cluster with values from ops repo (#1530) 2024-02-21 17:08:00 -08:00
sua yoo
91ff95c8e9
Add new WIP QA Review page (#1500)
Resolves https://github.com/webrecorder/browsertrix-cloud/issues/1493

<!-- Fixes #issue_number -->

### Changes

Adds WIP QA page with basic grid layout sections and navigation.

### Manual testing

Page can be access by adding `/review/screenshots` or `/review/replay`
to a crawl detail page URL. For example:
```
/orgs/suas-dev-sandbox-2/items/crawl/manual-20240124023524-422e41d6-97d/review/screenshots
```

---------
Co-authored-by: emma <hi@emma.cafe>
2024-02-20 00:26:38 -08:00
Ilya Kreymer
a8e3ff1141 version: bump to 1.10.0-beta.0 2024-02-20 00:22:29 -08:00
Ilya Kreymer
c1cffe9ecd version: bump to 1.9.1 2024-02-16 09:44:18 -08:00
Ilya Kreymer
4cbe134a0e
one hop out: remove errant '|| true' from condition (#1532)
Fixes #1531, one hop out checkbox always checked.
2024-02-16 09:43:14 -08:00
Ilya Kreymer
64bf21311d version: bump to 1.9.0! 2024-02-14 13:30:46 -08:00
Ilya Kreymer
1d266e3cea bump to 1.9.0.beta.5 2024-02-12 18:29:39 -08:00
Emma Segal-Grossman
3ed10ad893
Add comments I meant to add in #1528 (#1529)
Tessa merged #1528 before I got to add these comments lol (it's my
fault, should have left the PR as draft until I was actually ready)
2024-02-12 19:33:16 -05:00
Emma Segal-Grossman
d88a6eb07f
Include leading zero in months when accessing usage and quota data (#1528)
Closes #1527 

Improves front-end types & ensures the data being accessed matches the
data sent by the back-end.

Tested by hand by using the returned data from the `/orgs/${orgId}`
endpoint in prod where this is happening in dev
2024-02-12 19:27:42 -05:00
Ilya Kreymer
4bc8152640 version: bump to 1.9.0-beta.4 2024-02-09 16:17:13 -08:00
Henry Wilkinson
652856e74c
docs: Adds more details about browser profile capabilities (#1523)
Fixes #1522

## Changes

- Adds further security recommendations to change the password to
accounts you care about after crawling

Adds more details about the capabilities afforded with browser profiles.
This is now split into the following sections:
- Logging into Websites
- Accepting Popups
- Changing Browser Settings
- More in the future???  Extensions???

---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2024-02-09 16:16:47 -08:00
Ilya Kreymer
0653657637
better handling of failed redis connection + exec time updates (#1520)
This PR addresses a possible failure when Redis pod was inaccessible
from Crawler pod.
- Ensure crawl is set to 'waiting_for_capacity' if either no crawler
pods are available or no redis pod. previously, missing/inaccessible
redis would not result in 'waiting_for_capacity' if crawler pods are
available
- Rework logic: if no crawler and redis after >60 seconds, shutdown
redis. if crawler and no redis, init (or reinit) redis
- track 'lastUpdatedTime' in db when incrementing exec time to avoid
double counting if lastUpdatedTime has not changed, eg. if operator sync
fails.
- add redis timeout of 20 seconds to avoid timing out operator responses
if redis conn takes too long, assume unavailable
2024-02-09 16:14:29 -08:00
Emma Segal-Grossman
d1156b0145
enable a few more useful eslint suggestions & correct some more types (#1517)
## Changes

Implements suggestions from
https://typescript-eslint.io/blog/consistent-type-imports-and-exports-why-and-how/
and
https://www.totaltypescript.com/method-shorthand-syntax-considered-harmful,
along with a couple more auto-fixable consistency rules.

Of note:
- Functions that return a promise are marked as async
- Suggestions now appear for where to simplify boolean checks,
non-nullish assertions, and optional chaining
2024-02-09 16:14:08 -08:00
Emma Segal-Grossman
07edf697f0
Hotfix: Crawls page table click targets not applied to the right elements (#1524)
Fixes #1525

### Changes

- Changes one of the table cell component usages in the crawl list page
to correctly use the `rowClickTarget` prop, rather than setting the
class to `rowClickTarget`.
- Updates the `rowClickTarget` styling to only apply _within_ a
`<btrix-table-cell>`
2024-02-08 14:41:33 -08:00
Ilya Kreymer
65fec64197
storages: use asynccontextmanager instead of sync to close client (#1521)
Follow-up to #1481, use the asyncontextmanager with `async with` as only
used in async functions (which call run_in_executor)
2024-02-08 08:28:53 -08:00
Ilya Kreymer
b2a5dbf2cd
enable screenshots by default + fix py version formatting (#1518)
configmap: add --screenshot thumbnail,view as default screenshots 
version: update update-version.sh to add newline in version.py to match
new black formatting (from changes in #1507)
Fixes #1519
2024-02-07 17:07:28 -08:00
Ilya Kreymer
7aebce66f6 version: bump to 1.9.0-beta.3 2024-02-07 15:21:10 -08:00
Henry Wilkinson
3982064636
Fixes workflow selector keyboard navigation (#1514)
Fixes #1387

### Context

While checking some other keyboard navigation issues, I found that I was
unable to create a crawl workflow using only keyboard navigation. This
PR fixes that!

### Changes
- Changes from `<div>`s to `<button>`s so that these can be selected
with tab and enter.
- Adds tabindex for correct selection of items
- Removes the H3 & combines with window title
- Adds width and height to image and width to its container, should make
for a more stable layout while loading (#1387)
2024-02-07 15:11:20 -08:00
Henry Wilkinson
9c2228aa52
Updates browser profile selector help text (#1510) 2024-02-07 18:05:28 -05:00
Tessa Walsh
a898c2b456
Format backend with Black 24 (#1507)
Fixes #1506
2024-02-07 11:35:34 -08:00
Henry Wilkinson
45c9a91c9e
Docs: Improve relative links (#1476)
### Changes

- Fixes one broken link (["Ansible Playbooks"
here](https://docs.browsertrix.cloud/deploy/remote/))
- Formats relative links better to conform with [mkdocs 1.5 link
validation
improvements](https://www.mkdocs.org/about/release-notes/#expanded-validation-of-links)
2024-02-07 11:33:57 -08:00
Emma Segal-Grossman
f853fcdd81
Upgrade Prettier to 3 (#1513)
Updates Prettier to major version 3, and also updates a couple
prettier-related other things.

Prelude to #1511 so that that PR doesn't include a bunch of unrelated
changes
2024-01-31 20:56:17 -05:00
Emma Segal-Grossman
b5fe5551c5
Ensure linting & formatting runs in CI (#1512)
Makes sure code quality stays high by checking that code is linted &
formatted in CI.

### Reason

Frustration — so that [things like
this](https://github.com/webrecorder/browsertrix-cloud/pull/1500#issuecomment-1920087667)
don't happen in the future. I tried to merge `main` into a branch to get
it up to date with main, and main isn't totally formatted or linted
properly, and then formatting the codebase introduced a whole bunch of
unrelated changes. Running a formatter or linter shouldn't cause
unrelated code changes, and `main` should always be in a correct state
in terms of linting and formatting.

### Testing

- [x] Test run with failing lint checks errors:
https://github.com/webrecorder/browsertrix-cloud/actions/runs/7733354321/job/21085236200
- [x] Test run with failing formatting check errors:
https://github.com/webrecorder/browsertrix-cloud/actions/runs/7733501666/job/21085717519
- [x] Test run with both passing lint & formatting checks passes:
https://github.com/webrecorder/browsertrix-cloud/actions/runs/7733529142/job/21085796727
2024-01-31 18:25:44 -05:00