Fixes #890
This PR introduces new streaming superuser-only API endpoints to export
and import database information for an organization. New Administrator
deployment documentation on how to manage the process and copy files
between S3 buckets as needed is also included.
---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
- Provides a link to Mozilla's page explaining what user agents are
(good for folks new to the concept)
- Provides a link to useragents.me, the same site we link to in the app
- Provides two examples of situations where they may be helpful to get
around content restrictions
- Shows browser profile last modified or created by name, if available
- Moves backed-up status to browser profile subsection header
- Moves "Last Updated" column to last and displays user name on hover,
to match archived items list view
- Updates browser profile docs
Fixes #1695
### Changes
- Adds Crawl Review user docs
- Adds Quality Assurance section to the Archived Items page
- Adds note in the user roles list on crawl review not being available for viewers
Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
Co-authored-by: sua yoo <sua@webrecorder.org>
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
Fixes #1782
- Dash icons are now used to convey status exclusively
- Slash icons are now used to convey no data states
- Updates status icons to filled in the docs (also required for QA
docs!)
Repository Index: Generate an index.yaml in ./docx/helm-repo/index.yaml
to allow for browsertrix to be a helm repository.
docs: rename docs.browsertrix.cloud -> docs.browsertrix.com
docs: update deployment doc to mention helm repo as preferred way to
install
docs build action: generate repository index in GH action
publish action: update auto-generated message to mention installing from
the repo.
---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
- modify invite email template to answer common questions
- email templates: make each email template overridable with --set-file
- docs: update customization doc to document how to customize email
templates
---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
Closes #1642
### Changes
- Adds section to the collections page on downloading collections
- Changes the Files section on the archived items page to be more
explicit about downloading files because that's the only action you can
do there!
---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
Supports horizontal pod autoscaling (hpa) for backend and frontend pods:
- use cpu and memory averages
- adjust base memory + cpu for backend
- threshold set to 80% cpu and 95% memory utilization by default
(configurable in values.yaml)
- instead of backend and frontend replicas, set max replicas in
values.yaml
- only enable hpa if backend_max_replicas or frontend_max_replicas is
>1, default to 1 for now
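A rough sketch of the enablement rule and thresholds described above. The `HpaSettings` class and both function names are hypothetical and not taken from the chart; the replica formula is a simplified form of the standard Kubernetes HPA calculation (no tolerance or stabilization window), included only to show how the 80% CPU and 95% memory defaults would interact.

```python
import math
from dataclasses import dataclass


@dataclass
class HpaSettings:
    """Hypothetical stand-in for the values.yaml settings named above."""

    max_replicas: int = 1            # e.g. backend_max_replicas / frontend_max_replicas
    cpu_threshold_pct: int = 80      # default average CPU utilization target
    memory_threshold_pct: int = 95   # default average memory utilization target


def hpa_enabled(settings: HpaSettings) -> bool:
    # HPA is only enabled when more than one replica is allowed.
    return settings.max_replicas > 1


def desired_replicas(
    settings: HpaSettings,
    current_replicas: int,
    avg_cpu_pct: float,
    avg_memory_pct: float,
) -> int:
    """Simplified HPA formula: ceil(current * metric / target), max over metrics."""
    if not hpa_enabled(settings):
        return current_replicas
    by_cpu = math.ceil(current_replicas * avg_cpu_pct / settings.cpu_threshold_pct)
    by_memory = math.ceil(
        current_replicas * avg_memory_pct / settings.memory_threshold_pct
    )
    # Clamp between one replica and the configured maximum.
    return max(1, min(settings.max_replicas, max(by_cpu, by_memory)))
```

With the default `max_replicas` of 1 the function leaves the replica count untouched, which mirrors the "default to 1 for now" behaviour.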
Partially addresses #1241
### Changes
- Adds Browsertrix logo to readme
- It detects if you're in light or dark mode and adjusts the text color
accordingly! _The future is now!_
- Minor readme updates
- Updates icon and adds favicon SVGs to the docs
- This does not yet use Konsole for the docs site title. Will have to
sort this out later along with private hosting for that font.
- Updates docs theme to use new brand colours — picked the green for
this one, will probably be consistent across all of Webrecorder's MKDocs
sites.
Part of #1241
### Changes
- Renames all instances of "Browsertrix Cloud" to "Browsertrix" on the
front end, emails, and documentation
---------
Co-authored-by: emma <hi@emma.cafe>
Fixes #1555
This is a first pass at some of the configuration options within the
Helm chart that might be most applicable to users. Emphasis is placed on
configuration that's particular to our application, such as storage and
crawler channels.
---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Fixes #1522
## Changes
- Adds further security recommendations to change the password to
accounts you care about after crawling
Adds more details about the capabilities afforded with browser profiles.
This is now split into the following sections:
- Logging into Websites
- Accepting Popups
- Changing Browser Settings
- More in the future??? Extensions???
---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
Fixes #1463
### Changes
- Explains execution time
- Adds style guide section about adding a badge for paid features
- Updates config for mkdocs-material 9.5, since materialx emoji support
is being removed.
- Adds better tooltips, a cool feature that also got released with
mkdocs-material 9.5
- Adds search suggestions
### Caveats
- [mkdocs 1.5 has improved the way they handle link
validation](https://www.mkdocs.org/about/release-notes/#expanded-validation-of-links).
Looks like the way I've gone about linking things could be improved, and
it will give a bunch of warnings as a result. The site still builds fine,
but I'm going to fix this in a different PR so this one doesn't take as
much effort to review :)
EDIT: Here's that PR
https://github.com/webrecorder/browsertrix-cloud/pull/1476
### Testing
- Make sure you are up to date with `pip install --upgrade
mkdocs-material`
### Screenshot
**Badge!**
<img width="884" alt="Screenshot 2024-01-17 at 11 59 00 PM"
src="https://github.com/webrecorder/browsertrix-cloud/assets/5672810/62a51cf6-24bd-49f1-a6d0-d335f730bfbe">
### Future
- Should mkdocs-material be versioned in our deployment script? We risk
things breaking if I don't get to them fast enough! 🙃
---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
Fixes #1341
Adds "User Agent" field to workflow editor under the Browser Settings
tab. If not set, the crawler will use the browser's default user agent.
Also added to docs and to the workflow details page (if set).
---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
Fixes #1385
## Changes
Supports multiple crawler 'channels' which can be configured to
different browsertrix-crawler versions
- Replaces `crawler_image` in helm chart with `crawler_channels` array
similar to how storages are handled
- The `default` crawler channel must always be provided and specifies
the default crawler image
- Adds backend `/orgs/{oid}/crawlconfigs/crawler-channels` API endpoint
to fetch information about available crawler versions (name, image, and
label), along with a test
- Adds crawler channel select to workflow creation/edit screens and
profile creation dialog, and updates related API endpoints and
configmaps accordingly. The select dropdown is shown only if more than
one channel is configured.
- Adds `crawlerChannel` to workflow and crawl details.
- Add `image` to crawler image, used to display actual image used as
part of the crawl.
- Modifies `crawler_crawl_id` backend test fixture to use `test` crawler
version to ensure crawler versions other than latest work
- Adds migration to add `crawlerChannel` set to `default` to existing
workflow and profile objects and workflow configmaps
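A minimal sketch of the channel rules listed above: a `default` channel is required, an unset channel falls back to `default`, and the select is only shown when more than one channel exists. The function names, the dictionary keys (`name`, `image`), and the image strings in the usage note are illustrative assumptions, not the backend's actual code or the chart's exact schema.

```python
from typing import Optional


def validate_crawler_channels(channels: list[dict]) -> dict[str, str]:
    """Build a name -> image map and require that a 'default' channel exists."""
    by_name = {channel["name"]: channel["image"] for channel in channels}
    if "default" not in by_name:
        raise ValueError("a 'default' crawler channel must always be provided")
    return by_name


def resolve_image(by_name: dict[str, str], crawler_channel: Optional[str] = None) -> str:
    """Look up the crawler image, falling back to the 'default' channel when unset."""
    return by_name[crawler_channel or "default"]


def show_channel_select(by_name: dict[str, str]) -> bool:
    """The select dropdown is shown only if more than one channel is configured."""
    return len(by_name) > 1


def migrate_crawler_channel(document: dict) -> dict:
    """Mimic the migration: set crawlerChannel to 'default' where it is missing."""
    migrated = dict(document)
    migrated.setdefault("crawlerChannel", "default")
    return migrated
```

For example, `validate_crawler_channels([{"name": "default", "image": "example/crawler:latest"}])` yields a single-entry map, so `show_channel_select` returns `False` and the dropdown stays hidden.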
---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Closes #1434
### Changes
#### Developer
- Adds the K3S playbook guide to the navigation
- Adds note about restarting MKDocs when adding new icons
- Adds note about concise language to the styleguide ([see previous
discussion](https://github.com/webrecorder/browsertrix-cloud/pull/1394#discussion_r1402666872))
- Adds a note about noun usage to the styleguide
#### User guide
- Adds tables for archived item and workflow statuses
- Adds custom styles for displaying statuses with their icons like we do
in the app
- Fixes capitalization issues
---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
Co-authored-by: sua yoo <sua@webrecorder.org>
Resolves https://github.com/webrecorder/browsertrix-cloud/issues/1333
- Moves "Select Crawls" / "Select Uploads" steps into a single "Select
Archived Items" dialog
- Refactors new collection metadata dialog to accept editing existing
collection
- Prevents RWP component from rendering if there are no archived items
(@Shrinks99 made a comment about this figma, but this prevents
unnecessary requests when there isn't an archive to replay)
- Shows collection description at bottom of detail page at all times
(@Shrinks99 seems useful to see even on archived items view?)
- Switches collection detail primary action to "Add Archived Items" if
none are included (cc @Shrinks99)
- Displays friendlier "name taken" error
- Removes unused Collection edit route
- Upgrades markdown dependencies for fixes/improvements to description
editing
---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Closes #1369
### Changes
- Adds improved getting started steps and intro contact information to
the User Guide homepage
- Adds a small section about the execution minutes graph for orgs with a
quota set
- Moves existing signup content to a dedicated signup page
- Changes admonitions from using em dashes to using colons.
- Em dashes are great and I love em.... But sometimes I love them a
little _too_ much and they were a bad fit here.
- Fixes user guide homepage link
- Fixes `ReplayWeb.page` and `ArchiveWeb.page` names
- Fixes broken links (would be good to have a CI system for this I
think)
---------
Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Fixes #1261
Closes #1092
The quota for monthly execution minutes is treated as a hard cap. Once
it is exceeded, an alert indicating that an org has exceeded its monthly
execution minutes will display and the user will be unable to start new
crawls. Any running crawls will be stopped once the quota is exceeded.
An execution minutes meter bar is also added in the Org Dashboard and
displayed if a quota is set. More detail in #1305 which was
merged into this branch.
## Changes
- Enable setting 'maxExecMinutesPerMonth' in orgs list quotas by superadmin
- Enforce quota by stopping crawls in operator once quota is reached
- Show alert banner once execution time quota is hit
- Once quota is hit, disable Run Crawl buttons in frontend, return 403
message with `exec_minutes_quota_reached` detail in backend from
crawl config `/run` endpoint, and don't run new workflows on creation
(similar to storage quota)
- Display execution time for crawls in the crawl details overview,
immediately below
- Show execution minutes meter on dashboard (from #1305)
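The hard-cap behaviour above can be sketched roughly as follows. This is not the backend's implementation: the function names are hypothetical, the `(status, detail)` tuple stands in for a real HTTP response, and treating a quota of `0` as "no quota set" and "reached" as `>=` are both assumptions. Only the `maxExecMinutesPerMonth` setting, the 403 status, and the `exec_minutes_quota_reached` detail come from the description.

```python
def exec_minutes_quota_reached(used_minutes: float, max_exec_minutes_per_month: int) -> bool:
    """Return True once the monthly execution-minutes quota has been reached."""
    # Assumption: 0 (or unset) means the org has no execution-minutes quota.
    if not max_exec_minutes_per_month:
        return False
    return used_minutes >= max_exec_minutes_per_month


def run_crawl_response(used_minutes: float, max_exec_minutes_per_month: int) -> tuple[int, str]:
    """Sketch of the `/run` endpoint outcome as a (status code, detail) pair."""
    if exec_minutes_quota_reached(used_minutes, max_exec_minutes_per_month):
        return 403, "exec_minutes_quota_reached"
    return 200, "started"


def should_stop_running_crawls(used_minutes: float, max_exec_minutes_per_month: int) -> bool:
    """The operator stops running crawls under the same condition."""
    return exec_minutes_quota_reached(used_minutes, max_exec_minutes_per_month)
```

Sharing one predicate between the run check and the operator check keeps the frontend's disabled Run Crawl buttons, the 403 response, and the operator's stop decision consistent with each other.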
---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: sua yoo <sua@webrecorder.org>
Closes #1215
- Adds account settings page
- Adds overview page
- Adds archived items page
- Adds note about browser profile metadata editing
- Adds note on editing the crawler instances scale while crawling
- Adds details on permission levels for the org settings
- Removes note about not being able to change your display name (follows
#1265)
- Adds `position: sticky` to the workflow creator / editor controls to
affix them to the bottom of the screen, so they are now always visible!
- Renames "Extra URLs in Scope" to "Extra URL Prefixes in Scope"
- Updates documentation accordingly
- Adjusts casing for checkboxes
- Adds the multiplication sign to the crawler instances settings to
better communicate that they are increases in scale and not arbitrary
numbers.
* Refactor microk8s playbook to follow structure with shared roles
- Integrates with btrix/deploy role for deploying
- Separated RedHat and Debian into separate roles
- Created Common role
- allow running remotely by default
- use 'browsertrix_cloud_home' for charts path
- add additional customizable options to btrix_values.j2 (todo: unify all the templates)
- docs: update to new playbook path
---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
* Give protocol selection box smaller max-width
* Add warning and docs link to browser profile creation
- Updates dialog styling to btrix dialog
- Updates button sizes
- Updates button placement in dialog
- Updates button labels for consistency with other buttons in app
- Updates docs page with new button labels
* Update browser profile edit metadata dialog. Matches updated dialog shown on profile creation
* Open docs page in new tab
- If set and any of the seeds fail, the entire crawl is marked as a failure.
- Add checkbox to URL list workflows which adds the --failOnFailedSeed flag
- Add 'Fail Crawl On Failed URL' to crawl workflow setup docs
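As an illustration of the option described above, the sketch below shows how a checkbox value might map to the `--failOnFailedSeed` flag and how the rule "any failed seed fails the crawl" reads in code. Both function names and the seed-result dictionary are hypothetical; what happens to a crawl when the option is off is not described here, so the sketch only answers whether this option forces a failure.

```python
def crawler_args(fail_on_failed_seed: bool) -> list[str]:
    """Extra crawler arguments contributed by the checkbox (hypothetical helper)."""
    return ["--failOnFailedSeed"] if fail_on_failed_seed else []


def option_fails_crawl(seed_results: dict[str, bool], fail_on_failed_seed: bool) -> bool:
    """True when the option is set and at least one seed did not load.

    seed_results maps each seed URL to whether it loaded successfully.
    """
    any_seed_failed = not all(seed_results.values())
    return fail_on_failed_seed and any_seed_failed
```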
* feat: move do_setup to new unified format at root of ansible/ dir to allow sharing roles, inventory with playbooks for other deployment types
* fix: pass ansible lint
* update do settings to current deployment:
- bump main node params
- add additional settings to helm values template
---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
It changes the directory layout of the ansible playbook to a
more "best practices" friendly approach using ansible roles and
a real inventory file
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
- no longer using :latest by default in values.yaml, instead updating version with each release
- set chart version to match app version in Chart.yaml
- update version in helm chart and values.yaml as part of update-version.sh script
- update test.yaml and local-config.yaml to enable using :latest tag images
- ci: add ci script for packaging current helm chart
- docs: updates docs to indicate deploying directly from GitHub release
- docs: add script to fill in latest version for 'VERSION' using custom script
- chart: set local_service_port to 30870 by default, but use only if no ingress.
- default values.yaml set up for local deployment, local-config.yaml contains additional commented out examples
- ci draft: add deployment info to draft with helm install command for current version
- test: fix password check test
Backend:
- add 'maxCrawlSize' to models and crawljob spec
- add 'MAX_CRAWL_SIZE' to configmap
- add maxCrawlSize to new crawlconfig + update APIs
- operator: gracefully stop crawl if current size (from stats) exceeds maxCrawlSize
- tests: add max crawl size tests
Frontend:
- Add Max Crawl Size text box to Limits tab
- Users enter max crawl size in GB, convert to bytes
- Add BYTES_PER_GB as constant for converting to bytes
- docs: Add Crawl Size Limit to user guide workflow setup section
Operator Refactor:
- use 'status.stopping' instead of 'crawl.stopping' to indicate crawl is being stopped, as changing later has no effect in operator
- add is_crawl_stopping() to return if crawl is being stopped, based on crawl.stopping or size or time limit being reached
- crawlerjob status: store byte size under 'size', human readable size under 'sizeHuman' for clarity
- size stat always exists so remove unneeded conditional (defaults to 0)
- store raw byte size in 'size', human readable size in 'sizeHuman'
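A small sketch of the size-limit flow described above: GB entered in the frontend is converted to bytes, and the operator treats a crawl as stopping when a stop was requested or a limit is exceeded. Only the names `BYTES_PER_GB`, `maxCrawlSize`, and `is_crawl_stopping()` appear in the description; the value of `BYTES_PER_GB` (decimal gigabytes), the function signature, and treating `0` as "no limit" are assumptions made for illustration.

```python
# Assumption: decimal gigabytes; the actual constant's value is not stated above.
BYTES_PER_GB = 1_000_000_000


def gb_to_bytes(size_gb: float) -> int:
    """Convert a size entered in GB to bytes, as the frontend does before saving."""
    return round(size_gb * BYTES_PER_GB)


def is_crawl_stopping(
    stop_requested: bool,
    size: int,
    max_crawl_size: int = 0,
    elapsed_seconds: float = 0,
    timeout_seconds: float = 0,
) -> bool:
    """Return True if the crawl is being stopped: on request, size, or time limit."""
    if stop_requested:
        return True
    # 'size' always exists and defaults to 0, so no missing-key check is needed.
    if max_crawl_size and size > max_crawl_size:
        return True
    if timeout_seconds and elapsed_seconds > timeout_seconds:
        return True
    return False
```

For instance, a 2 GB limit becomes `gb_to_bytes(2)` bytes, and a crawl whose `size` stat goes one byte past that would be reported as stopping.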
Charts:
- subchart: update crawlerjob crd in btrix-crds to show status.stopping instead of spec.stopping
- subchart: show 'sizeHuman' property instead of 'size'
- bump subchart version to 0.1.1
---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
* 1.6 docs update
### Changes
- Adds note in style guide about referencing actions in the app
- Adds page for Browser Profiles
- Adds callout for uploads in the context of combining items from multiple sources
- Adds page for Collections
- Adds page for Crawl Workflows
- Updates index to link to new dedicated Crawl Workflow page in addition to the Crawl Workflow Setup page
- Updates Org Settings page action styling in accordance with new rules
- Updates Crawl Workflow Setup page with links to the new pages and a hierarchy fix for the first item
- Updates user guide navigation with a new section for crawling related items
---------
Co-authored-by: sua yoo <sua@webrecorder.org>
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
- Adds section about the admonitions we use and their meanings when writing documentation
- Heading hierarchy changes (fixed my past blunders!)
- Removes section about GitHub Flavored Markdown — it's not really relevant here anymore considering how much custom stuff we have.
- Clarifies use case for frontend development server
- Fixes incorrect sample API URLs
- Adds additional detail around requirements and quickstart
- Links back to docs from frontend README
---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
* feat: ansible DO teardown
* fix(DO): idempotency issues in ansible teardown
* chore(DO): remove unused code
* docs(ansible): mention teardown in the docs
* fix: pass ansible-lint
* fix: point database backup upload to the correct location in DO space
- Adds collections search and list to workflow editor
- Adds collections to workflow details component
- Adds namePrefix filter to backend GET /orgs/{oid}/collections endpoint to support case-insensitive searching of collections
- Adds documentation for new setting
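The case-insensitive prefix matching behind the `namePrefix` filter could look roughly like this. It is an in-memory illustration only: the helper names and the list-of-dicts shape are hypothetical, and the real endpoint presumably applies an equivalent condition in its database query rather than in Python.

```python
import re
from typing import Optional


def name_prefix_pattern(name_prefix: str) -> re.Pattern:
    """Compile an anchored, case-insensitive pattern; the prefix is escaped
    so characters such as '.' or '(' are matched literally."""
    return re.compile("^" + re.escape(name_prefix), re.IGNORECASE)


def filter_collections(collections: list[dict], name_prefix: Optional[str] = None) -> list[dict]:
    """Return collections whose name starts with name_prefix, ignoring case."""
    if not name_prefix:
        return list(collections)
    pattern = name_prefix_pattern(name_prefix)
    return [c for c in collections if pattern.match(c["name"])]
```

Escaping the prefix matters for a search box, since users may type punctuation that would otherwise be interpreted as regular-expression syntax.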
---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>