Commit Graph

776 Commits

Author SHA1 Message Date
Ilya Kreymer
6dca2f1c03
supports overriding the replayweb.page version without having to be r… (#1122)
* supports overriding the replayweb.page version without having to rebuild the frontend image:
- ensures 'rwp_base_url' from helm chart is passed to nginx
- ensures both ui.js and sw.js are loaded based on nginx environment variable, not hard-coded
- ui.js loaded via redirect from new /replay/ui.js path
- pin RWP to known working release in default values.yaml
- remove RWP_BASE_URL from Dockerfile, no longer needed, set via chart env var
- set default RWP_BASE_URL for devserver to use CDN
- set RWP version to 1.8.11
2023-09-05 20:10:21 -04:00
sua yoo
ff6650d481
Manage collection from archived item details (#1085)
- Lists collections that an archived item belongs to in item detail view
- Improves performance of collection add component
---------

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2023-09-05 17:52:17 -04:00
Anish Lakhwara
00eddd548d
feat: k3s ansible playbook (#1071)
It changes the directory layout of the ansible playbook to a
more "best practices" friendly approach using ansible roles and
a real inventory file

Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
2023-09-05 17:50:18 -04:00
Ilya Kreymer
7d0cfa93e2 quick fix: fix typo in publish-helm-chart specifying version 2023-09-05 15:51:10 -04:00
Anish Lakhwara
3bfa69b98a
fix: add "v" to helm chart release filename (#1141)
* fix: add "v" to helm chart release filename, fixes #1134 

* add 'v' to helm chart version and update-version.sh

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2023-09-05 15:47:39 -04:00
Henry Wilkinson
1af796bd0e
fix: Terminology unification "crawls" & "archive data" → "items" (#1127)
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
2023-09-01 11:09:06 -04:00
Tessa Walsh
147bfd9d44
Add event webhook notifications system to backend (#1061)
Initial set of backend API for event webhook notifications for the following events:
* Crawl started (including boolean indicating if crawl was scheduled)
* Crawl finished
* Upload finished
* Archived item added to collection
* Archived item removed from collection

Configuration of URLs is done via /api/orgs/<oid>/event-webhook-urls. If a URL is configured for a given event, a webhook notification is added to the database and a send is then attempted (up to 5 tries in total per overall attempt, with an increasing backoff between tries, implemented with the backoff library, which supports async).

Webhook status is available via /api/orgs/<oid>/webhooks

(Additional testing + potential fastapi integration left in separate follow-ups)
Fixes #1041
2023-08-31 19:52:37 -07:00
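The retry behaviour described in #1061 above (up to 5 tries with an increasing delay between them) can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the project's code: the commit says the real implementation uses the backoff library, whereas here the loop is written by hand. The names `send_with_backoff`, `base_delay`, and the doubling delay schedule are assumptions made for the example.

```python
import asyncio

MAX_TRIES = 5  # "up to a total of 5 tries per overall attempt" (from the commit message)


async def send_with_backoff(send, max_tries=MAX_TRIES, base_delay=1.0, sleep=asyncio.sleep):
    """Await the hypothetical async callable `send` until it succeeds.

    Returns the number of tries used. If every try fails, the last
    exception is re-raised so the caller can record the failure.
    """
    for attempt in range(1, max_tries + 1):
        try:
            await send()
            return attempt
        except Exception:  # sketch only: real code would catch narrower errors
            if attempt == max_tries:
                raise
            # Assumed schedule: the delay doubles after each failed try (1s, 2s, 4s, ...).
            await sleep(base_delay * 2 ** (attempt - 1))
```

A caller would wrap the actual HTTP POST in `send`; the `sleep` parameter exists only so the delay can be replaced in tests.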
Tessa Walsh
1aa951132c
Fix unsetting all collections via PATCH update (#1126) 2023-08-30 18:16:21 -04:00
Ilya Kreymer
a9ab17fc61
publish helm chart on release (fixes #1114) (#1117) (#1123)
- no longer using :latest by default in values.yaml, instead updating version with each release
- set chart version to match app version in Chart.yaml
- update version in helm chart and values.yaml as part of update-version.sh script
- update test.yaml and local-config.yaml to enable using :latest tag images
- ci: add ci script for packaging current helm chart
- docs: updates docs to indicate deploying directly from GitHub release
- docs: add script to fill in latest version for 'VERSION' using custom script
- chart: set local_service_port to 30870 by default, but use only if no ingress.
- default values.yaml set up for local deployment, local-config.yaml contains additional commented out examples
- ci draft: add deployment info to draft with helm install command for current version
- test: fix password check test
2023-08-30 12:02:02 -07:00
Tessa Walsh
f6369ee01e
Add support for collectionIds to archived item PATCH endpoints (#1121)
* Add support for collectionIds to patch endpoints

* Make update available via all-crawls/ and add test

* Fix tests

* Always remove collectionIds from update

* Remove unnecessary fallback

* One more pass on expected values before update
2023-08-30 10:41:30 -04:00
Henry Wilkinson
ceaaf630f2
dev: GitHub Issue form update: updates "user story" title (#1112)
- Updates user story title
- User story title should be friendlier to those who don't know what a "user story" is!
- Clarifies sections that shouldn't be edited by users in the preview text
- Adds note about reporting security vulnerabilities
---------
Co-authored-by: sua yoo <sua@webrecorder.org>
2023-08-27 16:34:04 -07:00
Tessa Walsh
e667fe2e97
Add max crawl size option to backend and frontend (#1045)
Backend:
- add 'maxCrawlSize' to models and crawljob spec
- add 'MAX_CRAWL_SIZE' to configmap
- add maxCrawlSize to new crawlconfig + update APIs
- operator: gracefully stop crawl if current size (from stats) exceeds maxCrawlSize
- tests: add max crawl size tests

Frontend:
- Add Max Crawl Size text box to Limits tab
- Users enter max crawl size in GB, convert to bytes
- Add BYTES_PER_GB as constant for converting to bytes
- docs: Crawl Size Limit to user guide workflow setup section

Operator Refactor:
- use 'status.stopping' instead of 'crawl.stopping' to indicate crawl is being stopped, as changing later has no effect in operator
- add is_crawl_stopping() to return if crawl is being stopped, based on crawl.stopping or size or time limit being reached
- crawlerjob status: store byte size under 'size', human readable size under 'sizeHuman' for clarity
- size stat always exists so remove unneeded conditional (defaults to 0)
- store raw byte size in 'size', human readable size in 'sizeHuman'

Charts:
- subchart: update crawlerjob crd in btrix-crds to show status.stopping instead of spec.stopping
- subchart: show 'sizeHuman' property instead of 'size'
- bump subchart version to 0.1.1

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2023-08-26 22:00:37 -07:00
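As a rough illustration of the size limit described in #1045 above, the sketch below converts a value entered in GB to bytes and checks the three stop conditions the commit lists for is_crawl_stopping() (an explicit stop, the size limit, or the time limit). It is not the project's code. The `CrawlState` class, its field names, the value of `BYTES_PER_GB` (decimal gigabytes are assumed here), and treating 0 as "no limit" are all assumptions; the commit does not state them.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Assumption: decimal gigabytes. The commit names the constant but not its value.
BYTES_PER_GB = 1_000_000_000


def gb_to_bytes(size_gb: float) -> int:
    """Convert a size entered in GB into a whole number of bytes."""
    return round(size_gb * BYTES_PER_GB)


@dataclass
class CrawlState:
    """Hypothetical stand-in for the crawl spec/status fields the operator reads."""
    stopping: bool = False                  # user asked for the crawl to stop
    size: int = 0                           # current crawl size in bytes (from stats)
    max_crawl_size: int = 0                 # limit in bytes; 0 is assumed to mean "no limit"
    expire_time: Optional[datetime] = None  # time limit, if any


def is_crawl_stopping(state: CrawlState, now: Optional[datetime] = None) -> bool:
    """Return True if the crawl should be stopped gracefully."""
    if state.stopping:
        return True
    if state.max_crawl_size and state.size > state.max_crawl_size:
        return True
    now = now or datetime.now(timezone.utc)
    if state.expire_time is not None and now >= state.expire_time:
        return True
    return False
```

For example, a crawl whose `size` is one byte over `gb_to_bytes(1)` with `max_crawl_size=gb_to_bytes(1)` would be reported as stopping, while the same crawl with no limit set would not.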
Ilya Kreymer
2da6c1c905
1.6.3 Fixes - Fix workflow sort order for Latest Crawl + 'Remove From Collection' action menu on archived items in collections (#1113)
* fix latest crawl (lastRun) sort:
- don't cast 'started' value to string when setting as starting crawl time (regression from #937)
- caused incorrect sorting as finished crawl time was a datetime, while starting crawl time was a string
- move updated config crawl info into one place, simplify to avoid returning started time altogether, just set directly
- pass mdb crawlconfigs and crawls collections directly to add_new_crawl() function
- fixes #1108

* Add dropdown menu containing 'Remove from Collection' to archived items in collection view (#1110)
- Enables users to remove an item from a collection from the collection detail view - menu was previously missing
- Fixes: #1102 (missing dropdown menu) by making use of the inactive menu trigger button.
- Updates collection items page size to match "Archived Items" page size (20 items per page)

---------
Co-authored-by: sua yoo <sua@webrecorder.org>
2023-08-25 21:08:47 -07:00
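The lastRun sorting regression fixed in #1113 above comes from mixing a string and a datetime in the same sorted field. The sketch below imitates that effect in plain Python. It assumes the database orders mixed-type values by type first, with strings before dates, as MongoDB's documented BSON comparison order does; the `mixed_type_key` helper and the sample workflow dictionaries are invented for illustration and are not taken from the project.

```python
from datetime import datetime


def mixed_type_key(value):
    """Sort key that ranks by type first (strings before datetimes), then by value."""
    rank = 0 if isinstance(value, str) else 1
    return (rank, value)


def newest_first(workflows):
    """Return workflow names ordered by lastRun, newest first."""
    ordered = sorted(workflows, key=lambda w: mixed_type_key(w["lastRun"]), reverse=True)
    return [w["name"] for w in ordered]


# A crawl that finished on Aug 20 (stored as a datetime) ...
finished = {"name": "finished", "lastRun": datetime(2023, 8, 20, 12, 0)}
# ... and a newer crawl that just started, with its start time cast to a string (the bug).
running_as_str = {"name": "running", "lastRun": "2023-08-25 09:00:00"}
# The same crawl with the start time kept as a datetime (the fix).
running_as_dt = {"name": "running", "lastRun": datetime(2023, 8, 25, 9, 0)}
```

With the string value, the newer running crawl sorts after the older finished one; once both values are datetimes, the order is chronological again.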
Anish Lakhwara
8b16124675
feat: implement 'collections' array with {name, id} for archived item details (#1098)
- rename 'collections' -> 'collectionIds', adding migration 0014
- only populate 'collections' array with {name, id} pair for get_crawl() / single archived item
path, but not for aggregate/list methods
- remove Crawl.get_crawl(), redundant with BaseCrawl.get_crawl() version
- ensure _files_to_resources returns an empty [] instead of none if empty (matching BaseCrawl.get_crawl() behavior to Crawl.get_crawl())
- tests: update tests to use collectionIds for id list, add 'collections' for {name, id} test
- frontend: change Crawl object to have collectionIds instead of collections

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2023-08-25 00:26:46 -07:00
Ilya Kreymer
989ed2a8da
Use Shared Services for Crawling, Redis, Profile Browsers (#1088)
* refactor to use shared role-based service shared across pods:
- 'crawler' service for all crawler screencasting, scales 0 .. N with crawler-<ID>-N.crawl
- 'redis' service for all redis access, redis-<ID>-0.redis
- 'browser' service for all browser access (profile browsers), browser-<ID>-0.browser
- don't create a new service per crawl/profile at all
- enable 'publishNotReadyAddresses' for potentially faster resolving, esp for redis
- remove service as type managed by operator as no longer creating services dynamically
- remove frontend var CRAWLER_SVC_SUFFIX, suffix always '.crawler' to match crawler service name
2023-08-24 20:08:53 -07:00
Ilya Kreymer
e7f2d93f80 bump version to 1.7.0-beta.0 2023-08-23 12:03:45 -07:00
Ilya Kreymer
63b776bce8
ingress: minor tweaks to ingress to update to latest spec: (#1096)
- use pathType ImplementationSpecific for regexes
- use ingressClassName instead of annotation
2023-08-23 11:36:52 -07:00
Tessa Walsh
ce5b52f8af
Add and enforce org maxPagesPerCrawl quota (#1044) 2023-08-23 10:38:36 -04:00
sua yoo
54cf4f23e4
Paginate Workflows and refactor to use server-side queries (#1078)
- Paginates Crawl Workflows when there are more than 10 workflows
- Refactors workflow search and crawl search to use the same component
- Adds sort by first seed, workflow creation date, and workflow modified date
- Separates "last run" date from "modified" date
- Update column layout into Name & Schedule (or Manual Run), Latest Crawl (<finish time> in <duration>), total size, and last modified (modified by and modified time)
2023-08-22 16:29:17 -07:00
Ilya Kreymer
223571b18b
exclusion regex: show unmodified regex string, avoid dropping the '\' when displaying escaped regexes (#1094) 2023-08-22 10:16:23 -07:00
Henry Wilkinson
4948e53cdb
dev: Adds GitHub feature issue template (#1087)
* Create feature-change.yml

* Enables docs referral in issue template
2023-08-21 15:27:45 -07:00
Henry Wilkinson
2952988864
docs: formatting fixes & minor content updates (#1091)
Additional tweaks on Browser Profiles pages + general consistency pass
2023-08-21 13:26:43 -07:00
Henry Wilkinson
02a01e7abb
docs: Adds information about 1.6 features to documentation (#1086)
* 1.6 docs update

### Changes

- Adds note in style guide about referencing actions in the app
- Adds page for Browser Profiles
  - Adds callout for uploads in the context of combining items from multiple sources
- Adds page for Collections
- Adds page for Crawl Workflows
- Updates index to link to new dedicated Crawl Workflow page in addition to the Crawl Workflow Setup page
- Updates Org Settings page action styling in accordance with new rules
- Updates Crawl Workflow Setup page with links to the new pages and a hierarchy fix for the first item
- Updates user guide navigation with a new section for crawling related items
---------

Co-authored-by: sua yoo <sua@webrecorder.org>
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
2023-08-18 21:55:20 -07:00
Henry Wilkinson
726a070ca9
Adds guidelines for using admonitions (#1084)
- Adds section about the admonitions we use and their meanings when writing documentation
- Heading hierarchy changes (fixed my past blunders!)
- Removes section about GitHub Flavored Markdown — it's not really relevant here anymore considering how much custom stuff we have.
2023-08-18 18:28:36 -07:00
Ilya Kreymer
422452b5c1 bump to 1.6.2 2023-08-18 18:27:37 -07:00
Ilya Kreymer
8e43940196
chart resources: adjust backend memory to 350Mi, as 200Mi was too low (#1082) 2023-08-15 21:59:57 -07:00
sua yoo
6044486190
Add button to download error logs (#1080)
* add button to download logs

* render if logs are present

* add icon
2023-08-15 21:14:32 -07:00
sua yoo
270e134359
Show details in crawl error log (#1079)
Shows crawl error log details in a dialog. Since the detail object does not always follow a specific format, this iteration uses the detail key in uppercase as the label.
2023-08-15 21:14:08 -07:00
Ilya Kreymer
90b2f94aef
follow-up to #1066: update redis to 5.0.0 which includes full fix for connection leak in from_url(), (#1081)
simplifies previous workaround addressed in 5.0.0
2023-08-15 20:34:47 -07:00
Ilya Kreymer
768d1181f8
frontend: fixes for queue / exclusions: (#1076)
- fix 'Edit Crawler Instances' not showing up when crawl running
- urlencode regex params to properly encode '+'
- catch server-side regex error, display 'Invalid Regex'
2023-08-15 13:15:43 -07:00
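Two of the fixes in #1076 above lend themselves to a short illustration: a '+' in a regex must be percent-encoded before it goes into a query string (otherwise it is decoded as a space), and an invalid regex should be caught rather than allowed to cause a server error. The frontend itself is not written in Python, so this is only a sketch of the idea using the standard library; the function names and the `regex` parameter name are hypothetical.

```python
import re
from urllib.parse import parse_qs, quote


def encode_regex_param(regex: str) -> str:
    """Percent-encode a regex so it survives a round trip through a query string."""
    return quote(regex, safe="")


def decode_regex_from_query(query: str) -> str:
    """Read the hypothetical 'regex' parameter back out of a query string."""
    return parse_qs(query)["regex"][0]


def is_valid_regex(regex: str) -> bool:
    """Return False instead of raising when the pattern does not compile."""
    try:
        re.compile(regex)
    except re.error:
        return False
    return True
```

Sending `"regex=" + encode_regex_param(r"\d+")` preserves the pattern, whereas sending the raw string `regex=\d+` would arrive as `\d ` with a trailing space. A caller could use `is_valid_regex` to show an 'Invalid Regex' message.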
Henry Wilkinson
5edb4ebabf
Add MkDocs YAML schema to vscode settings.json (#1075)
* Add mkdocs YAML schema to vscode settings.json

* Fixes wacky indenting

* Fixes config error
2023-08-15 12:06:05 -07:00
sua yoo
4c74fadf91
Update frontend local dev guide (#1073)
- Clarifies use case for frontend development server
- Fixes incorrect sample API URLs
- Adds additional detail around requirements and quickstart
- Links back to docs from frontend README
---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
2023-08-15 12:03:39 -07:00
Anish Lakhwara
04c2f050df
fix: password check constructor error (#1077) 2023-08-15 12:00:13 -07:00
Ilya Kreymer
2e73148bea
fix redis connection leaks + exclusions error: (fixes #1065) (#1066)
* fix redis connection leaks + exclusions error: (fixes #1065)
- use contextmanager for accessing redis to ensure redis.close() is always called
- add get_redis_client() to k8sapi to ensure unified place to get redis client
- use connectionpool.from_url() until redis 5.0.0 is released to ensure auto close and single client settings are applied
- also: catch invalid regex passed to re.compile() in queue regex check, return 400 instead of 500 for invalid regex
- redis requirements: bump to 5.0.0rc2
2023-08-14 18:29:28 -07:00
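The connection-leak fix in #1066 above relies on a context manager so that close() is always called, even when the code using the client raises. A minimal sketch of that pattern is shown below. It uses a stand-in `FakeRedis` class rather than the real redis client so that it runs without a server, and the `open_redis` name is hypothetical (the commit's own helper is get_redis_client(), whose signature is not shown in the log).

```python
import asyncio
from contextlib import asynccontextmanager


class FakeRedis:
    """Stand-in for a redis client; only tracks whether close() was called."""

    def __init__(self):
        self.closed = False

    async def ping(self):
        return True

    async def close(self):
        self.closed = True


@asynccontextmanager
async def open_redis(factory=FakeRedis):
    """Yield a client and guarantee it is closed afterwards."""
    client = factory()
    try:
        yield client
    finally:
        # Runs on normal exit and on exceptions, so the connection never leaks.
        await client.close()


async def check_connection():
    """Example use: the client is closed as soon as the block exits."""
    async with open_redis() as client:
        return await client.ping()
```

Code that needs redis would then use `async with open_redis() as client:` instead of creating and closing clients by hand.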
sua yoo
89983542f9
Update archived item URLs (#1064)
- Changes to URLs in "Crawling", "All Archived Items", and "Collections":
- Rename Artifacts -> Items
- Unifies the crawl view as loaded from All Archived Items and from Workflows
- Includes redirect for /artifacts/uploads -> /items/uploads to support archiveweb.page usage
2023-08-14 18:28:37 -07:00
Ilya Kreymer
9553115bbe
helm chart tweaks: (#1067)
* helm chart tweaks:
- lower mem requirements for backend and crawler
- disable cors in ingress to pass through cors headers from backend
- crawler statefulset: use ordered instead of parallel scaling policy to avoid single crawl taking up all crawling capacity quickly
2023-08-14 16:43:12 -07:00
sua yoo
ffd0e525d9
Webpack config improvements (#1063)
- Upgrades webpack and webpack-dev-server for bugfixes and performance updates
- Removes unnecessary file watching
- Enables persistent build cache in dev
- Switches to faster dev source map
2023-08-11 13:16:24 -07:00
Ilya Kreymer
d93ddaf620 bump version to 1.6.1 2023-08-11 12:50:41 -07:00
Ilya Kreymer
35ab6d6df6 bump to 1.6.0! 2023-08-09 15:40:27 -07:00
Ilya Kreymer
8ea3dd5dae
terminology tweaks in frontend: (part of #922) (#1062)
* terminology tweaks in frontend: (part of #922)
- use 'crawl workflow' instead of 'workflow' where possible
- use 'replay' instead of 'replay crawl'
- localization: rerun string extraction / processing
- "Review Config" → "Review Settings"
- "Workflow" → "Crawl Workflow" in error message

---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2023-08-09 15:38:58 -07:00
sua yoo
37733483d5
Standardize archived item filtering, sorting and labels (#1054)
Frontend:
- Renames list view to "All Archived Items"
- Refactors fetches to use single all-crawls endpoints
- Removes search by config ID for more search parity with uploads
- Adds sort by size
- Refactors property and method names to replace crawl*
- Replaces remaining references to "crawl" in copy with "item"
- Rename Upload Archive button to Upload WACZ
- Fix focusout in item menu so menus close

Backend:
- Filter search values by type as well
- Only get list of cids for crawls in search values
- Don't list crawl/workflow ids in search values

---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2023-08-09 12:13:55 -07:00
Ilya Kreymer
7a8f370bc2 bump version to 1.6.0-beta.4 for testing 2023-08-09 12:09:37 -07:00
Ilya Kreymer
38f67a6cc0
Optimize Frontend Image Build on CI (#1057)
* Always run yarn only on build platform with --platform=$BUILDPLATFORM
* Remove optional dependencies (playwright + chromium) from build with --ignore-optional and move some devDependencies to be optional
* Disable husky pre-commit hook checks on frontend

Co-authored-by: sua yoo <sua@suayoo.com>
2023-08-09 12:06:20 -07:00
sua yoo
b494070e43
Collection share dialog + copy updates (#1056)
- Always shows primary "Share" action button in Collection detail page.
- Enables toggling shareable status and share info from dialog. Difference from mockups: I made the "Done" button neutral to differentiate from our submit action buttons in the dialog, since toggling will apply changes immediately.
- Menu item: "Go to Public View"/"Go to Shareable View" -> "Visit Shareable URL". 
- Toggle label: "Make Collection Shareable" -> "Collection is Shareable".
- Additional dialog copy: adds "This collection can be viewed by anyone with the link." under "Link to Share" and "Share this collection by embedding it into an existing webpage." under "Embed Collection".
- Moves share status icon to its own column in list view.
- Adds new syntax-highlighted code component that supports js and html.

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2023-08-09 10:12:46 -07:00
Ilya Kreymer
de3e5907a7
backend: crawlout: include raw crawlconfig in api details, fixes #1030 (#1055) 2023-08-09 08:46:42 -07:00
Ilya Kreymer
8d0a4f2ca9
fix public collections endpoint returning 404 when not public (#1052)
tests: add tests for public collections endpoint when collection is public and when not
2023-08-04 13:29:13 -04:00
Tessa Walsh
7ff57ce6b5
Backend: standardize search values, filters, and sorting for archived items (#1039)
- all-crawls list endpoint filters now conform to 'Standardize list controls for archived items #1025' and URL decode values before passing them in
- Uploads list endpoint now includes all all-crawls filters relevant to uploads
- An all-crawls/search-values endpoint is added to support searching across all archived item types
- Crawl configuration names are now copied to the crawl when the crawl is created, and crawl names and descriptions are now editable via the backend API (note: this will require frontend changes as well to make them editable via the UI)
- Migration added to copy existing config names for active configs into their associated crawls. This migration has been tested in a local deployment
- New statuses generate-wacz, uploading-wacz, and pending-wait are added when relevant to tests to ensure that they pass
- Tests coverage added for all new all-crawls endpoints, filters, and sort values
2023-08-04 09:56:52 -07:00
Anish Lakhwara
9236a07800
fix: run yarn format in frontend dir (#1043) 2023-08-03 19:12:48 -07:00
Ilya Kreymer
362afa47bd
Support for Public / Shareable Collections (#1038)
* collections: support toggling collections public/private, viewable via RWP
- backend: add 'public' to collection model, support patching to update
- backend: add .../collections/<id>/public/replay.json for public access
- backend: add CORS handling for public endpoint
- frontend: support 'make shareable / make private' dropdown actions on collection detail + collection list views
- frontend: show shareable / private icons by collection name on detail + list views
- frontend: link to replayweb.page for standalone browsing
- frontend: add embed code popup when a collection is shareable
- refer to public collections as 'shareable' for now

---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2023-08-03 19:11:01 -07:00
sua yoo
62d3399223
Add info bar to Collection detail view (#1036)
- Adds Collection info bar to detail view
- Update "Web Captures" -> "Archived Items"
- Updates Collection list columns to match
- Refactors `btrix-desc-list` and usage in `workflow-details` to reuse horizontal info bar component
2023-08-03 16:58:56 -07:00