Commit Graph

874 Commits

Author SHA1 Message Date
Ilya Kreymer
9a2787f9c4
User refactor + remove fastapi_users dependency + update fastapi (#1290)
Fixes #1050 

Major refactor of the user/auth system to remove fastapi_users
dependency. Refactors users.py to be standalone
and adds new auth.py module for handling auth. UserManager now works
similar to other ops classes.

The auth should be fully backwards compatible with fastapi_users auth,
including accepting previous JWT tokens w/o having to re-login. The User
data model in mongodb is also unchanged.

Additional fixes:
- allows updating fastapi to latest
- add webhook docs to openapi (follow up to #1041)

API changes:
- Removing the`GET, PATCH, DELETE /users/<id>` endpoints, which were not
in used before, as users are scoped to orgs. For deletion, probably
auto-delete when user is removed from last org (to be implemented).
- Rename `/users/me-with-orgs` is renamed to just `/users/me/`
- New `PUT /users/me/change-password` endpoint with password required to update password, fixes  #1269, supersedes #1272 

Frontend changes:
- Fixes from #1272 to support new change password endpoint.

---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
Co-authored-by: sua yoo <sua@suayoo.com>
2023-10-18 10:49:23 -07:00
sua yoo
4610d95cd7
Use org slug in place of UUIDs in app URLs (#1277)
- Replaces org UUID in URL/browser location bar with org slug.
- Refactor: Adds shared app state utility using https://sijakret.github.io/lit-shared-state/ to
access org data from deep descendants.
- Backwards compatible: org UUID URLs should auto-redirect to org slug URLs.
- Show the org UUID in org settings general tab for use with APIs
(Resolves #1258, Follows #1279)
2023-10-18 09:28:30 -07:00
Ilya Kreymer
36bd228115 version: update to 1.8.0-beta.0 2023-10-17 18:06:55 -07:00
sua yoo
6b897e281c
hotfix: display workflow list date as utc 2023-10-17 15:51:24 -07:00
Henry Wilkinson
adf71f132e
Adds missing user documentation for launch! (#1286)
Closes #1215

- Adds account settings page
- Adds overview page
- Adds archived items page
- Adds note about browser profile metadata editing
- Adds note on editing the crawler instances scale while crawling
- Adds details on permission levels for the org settings
- Removes note about not being able to change your display name (follows
#1265)
2023-10-16 19:16:38 -07:00
Ilya Kreymer
a0def4f2b3
ansible microk8s additional cleanup (#1295)
follow-up to #1264: 
- microk8s: move default inventory vars role defaults
- microk8s: improve debugging of template output
- do: move teardown tasks to new role
2023-10-16 18:55:35 -07:00
Ilya Kreymer
b3f530f8e6 version: bump to 1.7.0 2023-10-16 18:39:20 -07:00
sua yoo
ab8e82cd28
Update org custom URL label (#1292)
Fast follower https://github.com/webrecorder/browsertrix-cloud/pull/1276

Updates label, info text, and preview text for org slug field to be more user-friendly
use 'Custom URL Identifier' and 'Custom your organization's web address for accessing Browsertrix Cloud'
---------
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2023-10-16 15:08:43 -07:00
Ilya Kreymer
ddc4e03422
operator status typo fix: (#1293)
- don't log normal exists as crashes!
- set pod_status.exitCode to the exitCode
- count exit code 13 as not-a-crash also (force interrupt)
2023-10-16 15:01:46 -07:00
Ilya Kreymer
1bc4697995
optimization: avoid updating whole org when only need to set one field (#1288)
- add update_users and update_slug_and_name
- rename update to update_full
2023-10-16 10:54:04 -07:00
Ilya Kreymer
dc8d510b11
webhook tweak: pass oid to crawl finished and upload finished webhooks (#1287)
Optimizes webhooks by passing oid directly to webhooks:
- avoids extra crawl lookup
- possible for crawl to be deleted before webhook is processed via
operator (resulting in crawl lookup to fail)
- add more typing to operator and webhooks
2023-10-16 10:51:36 -07:00
Henry Wilkinson
6d6fa03ade
Disable collection share button actions for viewer users (#1282)
Closes #1273 
- Viewers can see the share button and the dialogue's sharing info if the collection is sharable
- Viewers can't see or change the share toggle
- Viewers can't see the share button if the collection is not sharable
2023-10-16 10:50:33 -07:00
Ilya Kreymer
a295f5d05d version: bump to 1.7.0-beta.3 2023-10-15 18:31:03 -07:00
Tessa Walsh
2383b0d616
Set log download attachment name to crawl_id.log (#1280)
Fixes #1271
Using .log for now due to broader support for opening with default viewers

---------
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
2023-10-13 20:00:37 -07:00
Tessa Walsh
aa3f1ebf5f
Add down command to uninstall and delete data (#1285)
Small improvement to `btrix` helper. Adds `./btrix down` command to
uninstall and delete data without resetting the dev environment.
2023-10-13 17:16:12 -07:00
Tessa Walsh
c5ca250f37
Add id-slug lookup and restrict slugs endpoints to superadmins (#1279)
Fixes #1278 
- Adds `GET /orgs/slug-lookup` endpoint returning `{id: slug}` for all
orgs
- Restricts new endpoint and existing `GET /orgs/slugs` to superadmins
2023-10-13 17:02:19 -07:00
sua yoo
8466caf1d9
Allow org admins to update slug (#1276)
- Allows editing of org slugs (actual URL updates will be handled in
https://github.com/webrecorder/browsertrix-cloud/issues/1258.)
- Converts user input to slug using slugify
- Adds help text to org name and slug
- Renames tab from "information" to "general" settings
2023-10-13 17:00:43 -07:00
Henry Wilkinson
0bd8748e68
Minor Workflow Creator UX Changes (#1267)
- Adds `position: sticky` to the workflow creator / editor controls to
affix them to the bottom of the screen, they are now always visible!
- Renames "Extra URLs in Scope" to "Extra URL Prefixes in Scope"
- Updates documentation accordingly
- Adjusts casing for checkboxes
- Adds the multiplication sign to the crawler instances settings to
better communicate that they are increases in scale and not arbitrary
numbers.
2023-10-13 16:55:54 -07:00
sua yoo
22fbf92ed6
Show storage values for each item type when no quota (#1260)
Hides chart and shows size values for each Storage line when org has no
quota. No changes to orgs with quota. (Follow-up to #1188)
2023-10-13 14:31:33 -07:00
sua yoo
630c00c5b0
Enforce strong passwords in UI (#1266) 2023-10-12 19:36:59 -07:00
Anish Lakhwara
834fa72baf
Refactor microk8s playbook to follow "new" structure (#1264)
* Refactor microk8s playbook to follow structure with shared roles

- Integrates with btrix/deploy role for deploying
- Seperated RedHat and Debian into seperate roles
- Created Common role

- allow running remotely by default
- use 'browsertrix_cloud_home' for charts path
- add additional customizable options to btrix_values.j2 (todo: unify all the templates)
- docs: update to new playbook path

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2023-10-11 19:33:30 -07:00
Ilya Kreymer
41c054d209
Storage ops followup type checking (#1274)
* storage ops: follow up to #1257:
- fix refactor typo
- add type hints for all storageops apis (add mypy_boto3_s3 and types_aiobotocore_s3 for type hints)
2023-10-11 14:03:00 -07:00
sua yoo
f1dcc7e48a
Allow users to change display name and email (#1265) 2023-10-11 13:42:41 -07:00
Ilya Kreymer
c591a5755d test quickfix: microk8s crawls were not running due to exceeding CI resource capacity
to fix:
- disable metrics-server
- lower per-browser mem/cpu requirements
2023-10-10 23:29:10 -07:00
Tessa Walsh
266afdf8d9
Add slugs to org backend (#1250)
- Add slug field with uniqueness constraint to Organization
- Use python-slugify to generate slug from name and import that in migration
- Require name in all /rename and org creation requests
- Auto-generate slug for new org with no slug or when /rename is called w/o a slug
- Auto-generate slug for 'default-org' based on name

- Add /api/orgs/slugs GET endpoint to return all slugs in use

- tests: extend backend test-requirements.txt from requirements to allow testing slugify
- tests: move get_redis_crawl_stats() to avoid extra dependency in utils
2023-10-10 18:30:09 -07:00
Ilya Kreymer
16e7a1d0a2
Storage Ops Refactor (#1257)
* storage ops refactor:
- create StorageOps class similar to other ops classes
- init storages list in StorageOps, no longer require lookup up default storages via CrawlManager
- convert all storage functions to members, add storageops to operator
- remove unused params, ensure crawl exists for rollover restart
- add env var to determine if using local minio to use correct endpoint URL

* crawls /seeds endpoint: just return empty list if not a crawl (eg. upload)

* crawlmanager: remove unused code, rename check_storage -> has_storage
2023-10-10 15:04:23 -07:00
Ilya Kreymer
5cad9acee9
Compute crawl execution time in operator (#1256)
* store execution time in operator:
- rename isNewCrash -> isNewExit, crashTime -> exitTime
- keep track of exitCode
- add execTime counter, increment when state has a 'finishedAt' and 'startedAt' state
- ensure pods are complete before deleting
- store 'crawlExecSeconds' on crawl and org levels, add to Crawl, CrawlOut, Organization models

* support for fast cancel:
- set redis ':canceled' key to immediately cancel crawl
- delete crawl pods to ensure pod exits immediately
- in finalizer, don't wait for pods to complete when canceling (but still check if terminated)
- add currentTime in pod.status.running.startedAt times for all existing pods
- logging: log exec time, missing finishedAt
- logging: don't log exit code 11 (interrupt due to time/size limits) as a crash

* don't wait for pods completed on failed with existing browsertrix-crawler image

---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2023-10-09 17:45:00 -07:00
Tessa Walsh
748c86700d
fix: lookup user object operator to pass to CrawlConfig.add_new_crawl (#1254)
fixes #1253 
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2023-10-05 21:30:10 -07:00
Ilya Kreymer
fa86555eed
Track pod resource usage, detect OOM crashes, handle auto-scaling (#1235)
* keep track of per pod status on crawljob:
- crashes time, and reason
- 'used' vs 'allocated' resources 
- 'percent' used / allocated

* crawl log errors: log error when crawler crashes via OOM, either via redis error log
or to console

* add initial autoscaling support!
- detect if metrics server is available via K8SApi.is_pod_metrics_available()
- if available, use metrics for 'used' fields
- if no metrics, set memory used for redis only (using redis apis)
- allow overriding memory and cpu via newMemory and newCpu settings on pod status
- scale memory / cpu based on newMemory and newCpu setting
- templates: update jinja templates to allow restarting crawler and redis with new resources
- ci: enable metrics-server on k3d, microk8s and nightly k3d ci runs

* roles: cleanup unused roles, add permissions for listing metrics

* stats for running crawls:
- update in db via operator
- avoids losing stats if redis pod happens to be done
- tradeoff is more db access in operator, but less extra connections to redis + already
loading from db in backend
- size stat: ensure size of previous files is added to the stats

* crawler deployment tweaks:
- adjust cpu/mem per browser
- add --headless flag to configmap to use new headless mode by default!
2023-10-05 20:41:18 -07:00
Ilya Kreymer
20560abb81 version: bump to 1.7.0-beta.2 2023-10-05 20:33:38 -07:00
sua yoo
f2261bcb34
Fix frontend not redirecting on 401 (#1244)
- Ensures need-login event bubbles until handled
- Redirects on 401 from /refresh endpoint
- Go to previous URL upon login, rather than always to home page
- Shows accurate login notification (rather than less precise "couldn't retrieve org" or similar message)
2023-10-04 00:17:22 -07:00
sua yoo
38efeccc25
Limit URL list entry to maximum URLs (#1242)
- Limits URL list entry to 1,000 URLs
- Limits additional URL list entry to 100 URLs
- Shows first invalid URL in list in error message
- Quick and dirty fix for long URLs wrapping: Show URLs in list on one line, with entire container scrolling
---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2023-10-03 21:02:32 -07:00
Henry Wilkinson
99ccdf2de8
Browser Profile Warning & Dialog Style Updates (#1243)
* Give protocol selection box smaller max-width

* Add warning and docs link to browser profile creation

- Updates dialog styling to btrix dialog
- Updates button sizes
- Updates button placement in dialog
- Updates button labels for consistency with other buttons in app
- Updates docs page with new button labels

* Update browser profile edit metadata dialog. Matches updated dialog shown on profile creation

* Open docs page in new tab
2023-10-03 18:59:19 -07:00
Tessa Walsh
bbdb7f8ce5
Require that all passwords are between 8 and 64 characters (#1239)
- Require that all passwords are between 8 and 64 characters
- Fixes account settings password reset form to only trigger
logged-in event after successful password change.
- Password validation can be extended within the UserManager's
validate_password method to add or modify requirements.
- Add tests for password validation
2023-10-03 18:57:46 -07:00
Tessa Walsh
b1ead614ee
Add --failOnFailedSeed checkbox to URL list workflows (#1236)
- If set, and any of the seeds fails, the entire crawl is marked as a failure.
- Add checkbox which adds --failOnFailedSeed checkbox to URL list workflows
- Add 'Fail Crawl On Failed URL' to crawl workflow setup docs
2023-10-03 18:46:09 -07:00
sua yoo
4f36a94bc6
Update local dev docs (#1246)
Suggest uncommenting backend_image and frontend_image to use local images
2023-10-03 17:05:21 -04:00
Tessa Walsh
e9bac4c088
API delete endpoint improvements (#1232)
- Applies user permissions check before deleting anything in all /delete endpoints
- Shuts down running crawls before deleting anything in /all-crawls/delete as well as /crawls/delete
- Splits delete_list.crawl_ids into crawls and upload lists at same time as checks in /all-crawls/delete
- Updates frontend notification message to Only org owners can delete other users' archived items. when a crawler user attempts to delete another users' archived items
2023-10-03 13:05:00 -07:00
sua yoo
df190e12b9
Show running workflow error logs (#1224)
- Adds "Logs" tab to workflow detail
- Shows error logs in expandable section in "Watch" tab
- Show corresponding message (no logs yet or logs temporarily unavailable) when `/errors` returns 503 based on crawl state
- text tweaks: use error logs instead of logs, change 'crawl start' -> 'crawl continue' in log message

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2023-10-03 00:03:21 -07:00
Anish Lakhwara
a2dbad35c3
feat: use is_bool to check EMAIL_SMTP_USE_TLS (#1231)
- use is_bool to check EMAIL_SMTP_USE_TLS
- use is_bool for yaml values that are boolean
2023-10-02 21:29:36 -07:00
sua yoo
3fea4cabe2
Show storage meter even with no quota (#1240)
- Displays how much storage items and browser profiles take up even when quota is not specified
2023-10-02 20:01:39 -07:00
sua yoo
941a75ef12
Separate seeds into a new endpoints (#1217)
- Remove config.seeds from workflow and crawl detail endpoints
- Add new paginated GET /crawls/{crawl_id}/seeds and /crawlconfigs/{cid}/seeds endpoints to retrieve seeds for a crawl or workflow
- Include firstSeed in GET /crawlconfigs/{cid} endpoint (was missing before)
- Modify frontend to fetch seeds from new /seeds endpoints with loading indicator

---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2023-10-02 10:56:12 -07:00
Anish Lakhwara
1bf531e1ec
Fix: Make Collections Public on Creation (#1213)
- Add isPublic to Add Collection endpoint, send isPublic from frontend
- Fixes #1212
2023-09-29 12:08:10 -07:00
sua yoo
90e3a300cc
"Add new" dialog for all resources (#1202)
- Replaces individual "New" buttons in home page with dropdown button in header (includes Crawl Workflow, Upload Collection, Browser Profile)
- Refactors required step of new workflow and new collection into dialog
2023-09-29 09:11:24 -07:00
Anish Lakhwara
037396f3d9
Fix: Stream log downloading from WACZ (#1225)
* Fix(backend): Stream logs without causing OOM

Also be smarter about when to use `heapq.merge` and when to use
`itertools.chain`: If all the logs are coming from the same instance we
`chain` them, otherwise we'll `merge` them

iterator fixes:
- group wacz files by instance by suffix, eg. -0.wacz, -1.wacz, -2.wacz
- sort wacz files, and all logs within each wacz file
- chain log iterators for all log files within wacz group
- merge log iterators across wacz files in different groups
- add type hints to help keep track of iterator helper functions
- add iter_lines() from botocore, use that for line parsing for simplicity

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2023-09-28 18:54:52 -07:00
Ilya Kreymer
d6bc467c54
improvements to redis pod: (#1219)
- add liveness check/fix readiness check - ensure 'redis-cli ping' actually returns 'PONG', as exit code is 0 even if errors
will detect situations where redis is not available, such as due to to max clients being reached
- bump redis memory/cpu for now (until autoscaling/automatic adjustment is available)
2023-09-28 13:00:31 -07:00
Ilya Kreymer
7eac0fdf95
optimization: convert all uses of 'async for' to use iterator directly (#1229)
- optimization: convert all uses of 'async for' to use iterator directly instead of converting to list to avoid
unbounded size lists
- additional cursor.to_list() to async for conversions for stats computation, simply crawlconfigs stats computation

---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2023-09-28 12:31:08 -07:00
Vinzenz Sinapius
cabf4ccc21
Disable smtp_use_tls with false instead of empty string (#1184)
`smtp_use_tls = bool(os.environ.get("EMAIL_SMTP_USE_TLS", True))` would only disable tls when `EMAIL_SMTP_USE_TLS` is set to an empty string which is not intuitive
2023-09-28 12:10:20 -07:00
Henry Wilkinson
e93f195d59
fix: Right Align Copy Buttons & <btrix-desc-list> vertical width: 100% (#1177)
* Reorders actions, adds tooltip

- All copy buttons on the collection share dialog are now on the right side
- Adds a tooltip to tell the user the button opens the link in a new tab

* Make vertical `dec-list` items fill 100% width of their parent container

- Allows for better placement of items within the container
- Adds horizontal padding to info bars

* Right align copy button in item details page
2023-09-28 12:08:27 -07:00
Ilya Kreymer
86a424af93
migration improvements: (#1228)
* migration improvements + rerunning migrations: (fixes #1227)
- avoid starting some workers while migration is still running
- ensure workers that aren't performing migration await for migration to complete
- backend will not be valid until migration is run
* allow rerunning migration from specified version via --set rerun_from_migration=<VERSION> (replaces rerun_last_migration)
2023-09-28 12:04:19 -07:00
Tessa Walsh
1f74f03447
Recalculate Organization.storedBytes in migration 0017 (#1220) 2023-09-28 11:22:10 -07:00