Commit Graph

1516 Commits

Author SHA1 Message Date
Ilya Kreymer
a7c8ca4028 version: bump to 1.14.0-beta.1 2025-02-17 16:48:27 -08:00
Tessa Walsh
6c2d8c88c8
Modify page upload migration (#2400)
Related to #2396 

Changes to migration 0037:
- Re-adds pages in migration rather than in background job to avoid race
condition with later migrations
- Re-adds pages for all uploads in all orgs

Fix for readd pages for org:
- Ensure org filter is applied!
- Fix wrong type
- Remove distinct, use iterator to iterate over crawls faster.

---------

Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2025-02-17 16:47:58 -08:00
Emma Segal-Grossman
629cf7c404
Add a small sticky banner when logged in as superadmin (#2393)
While ideally we don't need to use superadmin for many things, there are
still a lot of places where it's necessary, especially around customer
service. This makes it a little more visible when that's the case, just
as a reminder. I could see this coming in handy especially for newer
people who might not have the experience to know to look for the "admin"
and "running crawls" buttons.

<img width="1088" alt="Screenshot 2025-02-13 at 1 12 58 PM"
src="https://github.com/user-attachments/assets/70b975e1-af6b-4e8c-9e49-52c4c66e9721"
/>
2025-02-17 17:42:36 -05:00
Ilya Kreymer
5bebb6161a
Issue 2396 readd pages fixes (#2398)
readd pages fixes:
    - add additional mem to background job
- copy page qa data to separate temp coll when re-adding pages, then
merge back in
2025-02-17 13:52:11 -08:00
Ilya Kreymer
e112f96614
Upload Fixes: (#2397)
- ensure upload pages are always added with a new uuid, to avoid any
duplicates with existing uploads, even if upload wacz is actually a
crawl from different browsertrix instance, etc..
- cleanup upload names with slugify, which also replaces spaces, fixes
uploading wacz filenames with spaces in them
- part of fix for #2396

---------

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2025-02-17 13:05:33 -08:00
Emma Segal-Grossman
44ca293999
Replace 2-digit years with numerical years everywhere in the frontend (#2394)
Closes #2365

---------

Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2025-02-13 22:23:13 -08:00
Tessa Walsh
39d99e7c5d
Add support for custom link selectors to backend (#2346)
Related to #2152 

This PR adds backend support for custom link selectors via `selectLinks`
on the crawl workflow config. Tests have been updated as well.

It also adds `selectLinks` to the frontend in a minimal and for now
hardcoded way that we can use as a basis for proper frontend support
moving forward.

---------

Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
2025-02-13 22:22:27 -08:00
Emma Segal-Grossman
659e124168
Disable "Update collection thumbnail" checkbox on initial page selection dialog until thumbnail is loaded (#2392)
Closes #2391
2025-02-13 22:03:13 -08:00
Emma Segal-Grossman
0f2da4f785
Allow showing all collections as well as just public ones in org dashboard (#2379)
Adds a switch to switch between viewing public collections only
(default) and all collections on org dashboard.

Also updates the `house-fill` icon to `house` in a couple places
(@Shrinks99)

---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2025-02-13 21:59:29 -08:00
Ilya Kreymer
4516268a70
misc fixes: cors + disable buffering for uploads (#2395)
- ensure pages endpoint support CORS for local dev
- disable proxy request buffering to support large uploads
2025-02-13 19:38:20 -08:00
Tessa Walsh
7f1af9bb31
Mark all pages from pages.jsonl as seeds (#2390)
Fixes #2389 

All pages from `pages/pages.jsonl` files now have `isSeed: True` in the
database, in addition to any pages that explicitly have `seed` set to
true in the actual JSONL.

Tests have been added to ensure that all pages from our fixture uploads
have `isSeed: True`.

---------

Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2025-02-13 16:54:30 -08:00
Ilya Kreymer
7b2932c582
Add initial pages + pagesQuery endpoint to /replay.json APIs (#2380)
Fixes #2360 

- Adds `initialPages` to /replay.json response for collections, returning
up-to 25 pages (seed pages first, then sorted by capture time).
- Adds `pagesQueryUrl` to /replay.json
- Adds a public pages search endpoint to support public collections.
- Adds `preloadResources`, including list of WACZ files that should
always be loaded, to /replay.json

---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2025-02-13 16:53:47 -08:00
sua yoo
73f9f949af
chore: Add pylint to vscode extensions (#2387)
No issue created for this, small devex improvement after I noticed
linting errors weren't surfaced in vscode.
2025-02-12 19:40:27 -08:00
Ilya Kreymer
b121076e63
quickfix: add missing dependency for docs (#2388)
follow-up to #2368:
- add mkdocs-redirect to frontend Docker, docs build ci
- build frontend when changing mkdocs
2025-02-12 16:39:06 -05:00
Henry Wilkinson
edf1edbbd1
docs: Add Documentation for Sharing Collections (#2368)
- Merges existing collection content into one page
- Updates ArchiveWeb.page link
- Adds redirect from /collections → /collection
- Moves content relevant to presentation & sharing out of the intro
- Adds new content about sharing collections!

---------

Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
Co-authored-by: sua yoo <sua@webrecorder.org>
2025-02-12 14:05:52 -05:00
sua yoo
f7b9b73a68
fix: Sort filtered collection page URLs (#2384)
Fixes https://github.com/webrecorder/browsertrix/issues/2383

- Fixes unpredictable sort order when typing in collection page URL
- Fixes page URL results flickering in and out while typing

---------

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2025-02-12 11:59:20 -05:00
Ilya Kreymer
5b02d81991
ensure collection is fully reloaded after an archived item is added o… (#2386)
…r removed

follow up to #2332

Testing:
1. Add or remove an archived item.
2. Switch to Replay view. Collection should reload and update the page
list.
2025-02-11 23:12:47 -08:00
Henry Wilkinson
3586412da1
docs: Adds section for autoclick behavior addition from 1.13.3 (#2385)
- Adds section for the autoclick behavior 
- Removes sections that were removed with the new workflow form... and
in some cases much earlier! 😅
2025-02-12 00:22:05 -05:00
sua yoo
7ce115588e
fix: Update links to running crawls (#2378)
- Updates links to running crawls to redirect to workflow "Watch" tab
- Removes unused "Jump to crawl" superadmin widgets
- Refactors archived item component to remove references to active
crawls
2025-02-11 17:08:27 -08:00
sua yoo
0e04fd98b1
fix: More accurate archived item details (#2364)
- Moves page count out from under "Size" label in archived item detail
- Renames "Pages Crawled" to "Pages" in archived item leading heading
and detail overview
- Renames "Crawl ID" to "Archived Item ID"

---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2025-02-11 16:46:13 -08:00
Emma Segal-Grossman
f8a44258d8
Merge pull request #2332 from webrecorder/frontend-collection-editing-dialog
Collection editing and sharing revamp
2025-02-11 18:27:35 -05:00
Tessa Walsh
d4032d4ea2
Add autoclick to workflow and crawl settings display (#2374)
Also rename Auto-Scroll in UI to Autoscroll for consistency
2025-02-11 10:28:30 -05:00
Tessa Walsh
98a45b0d85
Add collection page list/search endpoint (#2354)
Fixes #2353

Adds a new endpoint to list pages in a collection, with filtering
available on `url` (exact match), `ts`, `urlPrefix`, `isSeed`, and
`depth`, as well as accompanying tests. Additional sort options have
been added as well.

These same filters and sort options have also been added to the crawl
pages endpoint.

Also fixes an issue where `isSeed` wasn't being set in the database when
false but only added on serialization, which was preventing filtering
from working as expected.
2025-02-10 16:44:37 -08:00
Ilya Kreymer
001839a521
Fix max pages quota setting and display (#2370)
- add ensure_page_limit_quotas() which sets the config limit to the max
pages quota, if any
- set the page limit on the config when: creating new crawl, creating
configmap
- don't set the quota page limit on new or existing crawl workflows
(remove setting it on new workflows) to allow updated quotas to take
affect for next crawl
- frontend: correctly display page limit on workflow settings page from
org quotas, if any.
- operator: get org on each sync in one place
- fixes #2369

---------

Co-authored-by: sua yoo <sua@webrecorder.org>
2025-02-10 16:15:21 -08:00
Henry Wilkinson
aae1c02b3a
fix: create new profile link in the workflow form (#2373)
Closes #2372

[Original bug report on the
forum](https://forum.webrecorder.net/t/new-browser-profile-button-is-disabled/776)

### Changes
- Fixes broken link, `?new` → `?new=browser-profile`
2025-02-10 17:33:50 -05:00
sua yoo
a04a2280c4
Merge org public gallery settings (#2356)
- Merges public gallery settings into general org settings
- Adds help text to "Org URL" to highlight impact of changing slug
2025-02-10 10:46:20 -08:00
Tessa Walsh
0e9e70f3a3
Add WACZ filename, depth, favIconUrl, isSeed to pages (#2352)
Adds `filename` to pages, pointed to the WACZ file those files come
from, as well as depth, favIconUrl, and isSeed. Also adds an idempotent
migration to backfill this information for existing pages, and increases
the backend container's startupProbe time to 24 hours to give it sufficient
time to finish the migration.
---------

Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2025-02-05 15:50:04 -05:00
sua yoo
8cfa28733a
fix: More accurate workflow and archived item search (#2363)
Sorts workflow and items search results by match score.
2025-02-05 09:24:53 -08:00
sua yoo
18e72262dd
feat: Enable viewing all workflow form sections at once (#2310)
- Displays workflow form as collapsible sections
- Combines run now toggle into submit
- Fixes exclusion field errors not preventing form submission
- Refactors `<btrix-observable>` into new `Observable` controller

---------

Co-authored-by: emma <hi@emma.cafe>
2025-02-04 12:56:36 -08:00
sua yoo
83211b2f19
fix: Re-enable workflow setup guide button (#2358)
Fixes workflow setup guide not showing when button is clicked
2025-02-03 21:10:30 -08:00
Ilya Kreymer
ea3b5e7322 quickfix: fix typo (missing self) that did not make it into #2351 2025-01-30 13:11:42 -08:00
Tessa Walsh
0a8df62ab4
Ensure collection stats are updated when WACZ is added on upload (#2351)
Fixes #2350 

Collection earliest/latest dates and the collection modified date are
also now updated when crawls or uploads are added to a collection via
the collection auto-add feature.
2025-01-30 13:05:56 -08:00
Tessa Walsh
b0aebb599a
Reformat with Black for 2025 ruleset (#2349) 2025-01-29 16:57:06 -05:00
Ilya Kreymer
514811701f
Translations update from Hosted Weblate (#2317) (#2343)
Translations update from [Hosted Weblate](https://hosted.weblate.org)
for

[Browsertrix/Browsertrix](https://hosted.weblate.org/projects/browsertrix/browsertrix/).

Current translation status:

![Weblate translation

status](https://hosted.weblate.org/widget/browsertrix/browsertrix/horizontal-auto.svg)

---------

Co-authored-by: Weblate (bot) <hosted@weblate.org>
Co-authored-by: Bricaud Frédéric <frederic.bricaud@banq.qc.ca>
Co-authored-by: Webrecorder Dev <dev@webrecorder.org>
2025-01-27 20:43:42 -08:00
Ilya Kreymer
4fa3bc492f
cleanup of loc messages that resulted in errors in some translations (#2340)
- remove str`` where it is not needed
- resolve templates to use simple variable in str``
- combine into single str``
2025-01-27 20:10:47 -08:00
sua yoo
3c860775b9
feat: Update references to org public profile -> gallery (#2330)
- Renames public URL prefix to `explore`
- Updates org settings sections
- Removes or renames references to "org profile"

---------

Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2025-01-27 13:48:38 -08:00
sua yoo
84ae73df18
feat: UX improvements to collections with single URL (#2325)
Resolves https://github.com/webrecorder/browsertrix/issues/2322

## Changes

- Sets default start page if collection only contains one page
- Removes status code from snapshot options
2025-01-25 17:18:22 -08:00
Tessa Walsh
9363095d62
Validate exclusion regexes on backend (#2316) 2025-01-23 13:32:54 -05:00
Tessa Walsh
763c654484
feat: Update collection sorting, metadata, stats (#2327)
- Refactors dashboard and org profile preview to use private API
endpoint, to fix public collections not showing when the org
visibility is hidden
- Adds additional sorting options for collections
- Adds unique page url counts for archived items, collections, and
organizations to backend and exposes this in collections
- Shows collection period (i.e. `dateEarliest` to `dateLatest`) in
collections list
- Shows same collection metadata in private and public views, updates
private view info bar
- Fixes "Update Org Profile" action item showing for crawler roles

---------

Co-authored-by: sua yoo <sua@webrecorder.org>
Co-authored-by: sua yoo <sua@suayoo.com>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2025-01-23 13:32:23 -05:00
sua yoo
f8976e688a
fix: Use default collection thumbnail if selected (#2331)
Fixes issue where collection thumbnail is always the screenshot, even if
a Browsertrix provided default thumbnail is selected after choosing the
screenshot.
2025-01-22 14:02:56 -08:00
Ilya Kreymer
28d39d8c4d
Fix migration to avoid duplicate collection slugs and names (#2318)
Follow-up to #2301 

Updates the 0039 migration to ensure collection slugs and names are
unique by:
- Removing all indexes
- Setting `slug` to random value
- Adding unique index to `slug` field.
- Attempting to set slug from name using `slug_from_name()`
- If rejected due to duplicate, append `-<counter>` at end of slug. Also
update name with ` <counter>`.
- Now that names should also be unique, add unique index on name field.

---------

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2025-01-21 14:23:32 -08:00
Tessa Walsh
6797b41de0
Add pageCount to crawls and uploads and use in frontend for page counts (#2315)
Fixes #2257 

This is a follow-up to the public collections work, which adds pages to
the database for uploads. All crawls and uploads now have a `pageCount`
field which is populated when the item is successfully added. A new
migration is also added to populate the field for existing archived
items that don't have it set yet.

OrgMetrics have also been modified to include `crawlPageCount` and
`uploadPageCount`, and to include the total of both in `pageCount`, and
all three included in the frontend org dashboard.

The frontend has been updated to use `pageCount` rather than
`stats.done` wherever appropriate, meaning that in archived item lists
and details we now have a consistent page count for both crawls and
uploads.

### New functionality

- Deploy this branch
- Create new crawls and uploads and verify that page count appears
correctly throughout the frontend for all new crawls and uploads

### Migration

- Deploy from latest main
- Create some crawls and uploads
- Change to this branch and re-deploy
- Verify migration ran without errors in backend logs
- Verify that page count has been populated successfully by checking
archived items lists, crawl and upload detail pages, and dashboard to
ensure there are no longer any missing page counts.

---------

Co-authored-by: emma <hi@emma.cafe>
2025-01-16 14:41:14 -08:00
Tessa Walsh
5684e896af
Add support for autoclick (#2313)
Fixes #2259 

This PR brings backend and frontend support for the new autoclick
behavior in Browsertrix, introduces in Browsertrix 1.5.0+

On the backend, we introduce `min_autoclick_crawler_image` to
`values.yaml`, with a default value of
`"docker.io/webrecorder/browsertrix-crawler:1.5.0"`. If this is set and
the crawler version for a new crawl is less than this value, the
autoclick behavior is removed from the behaviors list in the configmap
created for the crawl.

The one caveat for this is that a crawler image tag like "latest" will
always be parsed as greater than `min_autoclick_crawler_image`, so there
is the potential for the crawler to run into issues if using a
non-numeric image tag with an older version of the crawler. For
production we use hardcoded specific versions of the crawler except for
the dev channel, which from here on out will including autoclick
support, so I think this should be okay (and is also true of the
existing implementation for checking `min_qa_crawler_image`).

On the frontend, I've added a checkbox (unchecked by default) in the
"Limits" section just below the current checkbox for autoscroll. We
might want to move these to a different section eventually - I'm not
sure Limits is the right place for them - but I wanted to be consistent
with things as they are.

---------

Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
2025-01-16 12:44:00 -08:00
Dmitriy Pertsev
246bcc73c5
Use new ingressClassName only by default (#2268)
- By default, use only `ingressClassName` for ingress class name and
corresponding field in cert-manager
- Only use old 'kubernetes.io/ingress.class' if
ingress.useOldClassAnnotation is set
- Allow for using old annotation only for backwards compatibility, eg.
for GCP
- Closes #2267 and #1570

---------

Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2025-01-15 23:23:50 -08:00
Ilya Kreymer
bce75b35fa
Translations update from Hosted Weblate (#2296) (#2314)
Translations update from [Hosted Weblate](https://hosted.weblate.org)
for

[Browsertrix/Browsertrix](https://hosted.weblate.org/projects/browsertrix/browsertrix/).



Current translation status:

![Weblate translation

status](https://hosted.weblate.org/widget/browsertrix/browsertrix/horizontal-auto.svg)

---------

Co-authored-by: Weblate (bot) <hosted@weblate.org>
Co-authored-by: Bricaud Frédéric <frederic.bricaud@banq.qc.ca>
Co-authored-by: Carole Gagné <carole.gagne@banq.qc.ca>
Co-authored-by: Webrecorder Dev <dev@webrecorder.org>
Co-authored-by: weblate <1607653+weblate@users.noreply.github.com>
2025-01-15 23:19:02 -08:00
sua yoo
a64f3a6c4c
fix: Fully load thumbnail before save (#2307)
Fixes https://github.com/webrecorder/browsertrix/issues/2306

## Changes

Refactors collection view configuration to wait for thumbnail preview
image (using `URL.createObjectURL`, like in QA screenshots) to be fully
loaded from `replay-web-page` before saving.
2025-01-15 22:58:32 -08:00
Tessa Walsh
4583babecb
feat: Add slug to collections and use it in public collection URLs (#2301)
Resolves https://github.com/webrecorder/browsertrix/issues/2298

## Changes

- Slugs added to collections, can be specified separately when creating
or updating collections or else is based off of supplied collection name
- Migration added to backfill slugs for existing collections
- Redirect collection to newest slug if changed
- Adds option to copy public profile link to "Public Collections" action
menu
- Show "Back to <Org>" link instead of breadcrumbs

---------
Co-authored-by: sua yoo <sua@suayoo.com>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2025-01-15 22:44:32 -08:00
sua yoo
21db8e1b83
fix: Fix workflow crawl list layout (#2309)
- Fixes workflow detail page crawls tab issue when the crawls list is
long
- Removes extraneous and incorrectly placed spinner
2025-01-15 09:23:18 -08:00
Henry Wilkinson
06eea7979a
ui: Replaces boring thumbnail gradients with fun squiggles! (#2305)
- Updates thumbnails
- Bonus ~30% size reduction per image due to better dialed in
compression settings!
2025-01-14 16:27:13 -05:00
sua yoo
dd22fd11ee
deps: Improve Webpack build performance (#2288)
- Upgrades webpack and webpack tool versions
- Updates dev source map to webpack recommendation
- Implements `webpack.DllPlugin` in dev for faster rebuilds
- Implements `thread-loader` to run `ts-loader` in a worker pool
2025-01-14 12:55:12 -08:00