docs: formatting fixes & minor content updates (#1091)

Additional tweaks on Browser Profiles pages + general consistency pass
Henry Wilkinson 2023-08-21 16:26:43 -04:00 committed by GitHub
parent 02a01e7abb
commit 2952988864
12 changed files with 42 additions and 53 deletions

@@ -2,18 +2,18 @@
*Playbook Path: [ansible/playbooks/do_setup.yml](https://github.com/webrecorder/browsertrix-cloud/blob/main/ansible/playbooks/do_setup.yml)*
This playbook provides an easy way to install BrowserTrix Cloud on DigitalOcean. It automatically sets up Browsertrix with, LetsEncrypt certificates.
This playbook provides an easy way to install Browsertrix Cloud on DigitalOcean. It automatically sets up Browsertrix with Let's Encrypt certificates.
### Requirements
To run this ansible playbook, you need to:
* Have a [DigitalOcean Account](https://m.do.co/c/e0db3814e33e) where this will run.
* Create a [DigitalOcean API Key](https://cloud.digitalocean.com/account/api) which will need to be set in your terminal sessions environment variables `export DO_API_TOKEN`
* `doctl` command line client configured (run `doctl auth init`)
* Create a [DigitalOcean Spaces](https://docs.digitalocean.com/reference/api/spaces-api/) API Key which will also need to be set in your terminal sessions environment variables, which should be set as `DO_AWS_ACCESS_KEY` and `DO_AWS_SECRET_KEY`
* Configure a DNS A Record and CNAME record.
* Have a working python and pip configuration through your OS Package Manager
- Have a [DigitalOcean Account](https://m.do.co/c/e0db3814e33e) where this will run.
- Create a [DigitalOcean API Key](https://cloud.digitalocean.com/account/api), which must be set in your terminal session's environment variables: `export DO_API_TOKEN`
- Have the `doctl` command line client configured (run `doctl auth init`)
- Create a [DigitalOcean Spaces](https://docs.digitalocean.com/reference/api/spaces-api/) API Key, which must also be set in your terminal session's environment variables as `DO_AWS_ACCESS_KEY` and `DO_AWS_SECRET_KEY`
- Configure a DNS A Record and CNAME record.
- Have a working Python and pip installation through your OS package manager
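The credential setup from the list above can be sketched as a short shell snippet. All token values below are placeholders, not real credentials, and the `doctl` step is skipped when the client is not installed:

```shell
# Placeholder values -- substitute your real DigitalOcean credentials.
export DO_API_TOKEN="dop_v1_replace_me"
export DO_AWS_ACCESS_KEY="replace_me_access"
export DO_AWS_SECRET_KEY="replace_me_secret"

# Configure doctl with the same token, if doctl is available.
if command -v doctl >/dev/null 2>&1; then
  doctl auth init --access-token "$DO_API_TOKEN"
fi
```

Putting the exports in your shell profile (or a local, uncommitted env file) keeps them available across terminal sessions.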
#### Install

@@ -2,17 +2,16 @@
*Playbook Path: [ansible/playbooks/install_microk8s.yml](https://github.com/webrecorder/browsertrix-cloud/blob/main/ansible/playbooks/install_microk8s.yml)*
This playbook provides an easy way to install Browsertrix Cloud on an Ubuntu (tested on Jammy Jellyfish) and a RedHat 9 (tested on Rocky Linux 9).
It automatically sets up Browsertrix with, Letsencrypt certificates.
This playbook provides an easy way to install Browsertrix Cloud on Ubuntu (tested on Jammy Jellyfish) and Red Hat 9 (tested on Rocky Linux 9). It automatically sets up Browsertrix with Let's Encrypt certificates.
### Requirements
To run this ansible playbook, you need to:
* Have a server / VPS where browsertrix will run.
* Configure a DNS A Record to point at your server's IP address.
* Make sure you can ssh to it, with a sudo user: ssh <your-user>@<your-domain>
* Install Ansible on your local machine (the control machine).
- Have a server / VPS where Browsertrix Cloud will run.
- Configure a DNS A Record to point at your server's IP address.
- Make sure you can SSH to it with a sudo user: `ssh <your-user>@<your-domain>`
- Install Ansible on your local machine (the control machine).
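The pre-flight checks above can be sketched as follows. The host and user names are hypothetical stand-ins, and the SSH probe fails fast rather than prompting:

```shell
# Hypothetical server and user -- substitute your own.
BTRIX_HOST="crawl.example.org"
BTRIX_USER="deploy"

# Verify non-interactive SSH access with a sudo-capable user.
if command -v ssh >/dev/null 2>&1; then
  ssh -o BatchMode=yes -o ConnectTimeout=5 "${BTRIX_USER}@${BTRIX_HOST}" 'sudo -n true' \
    || echo "SSH check failed -- verify your keys and sudo rights"
fi

# Remind yourself to install Ansible on the control machine if it is missing.
command -v ansible >/dev/null 2>&1 || echo "Install Ansible, e.g. via: python3 -m pip install --user ansible"
```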
#### Install

@@ -10,6 +10,4 @@ The main requirements for Browsertrix Cloud are:
- [Helm 3](https://helm.sh/) (package manager for Kubernetes)
We have prepared a [Local Deployment Guide](./local) which covers several options for testing Browsertrix Cloud locally on a single machine,
as well as a [Production (Self-Hosted and Cloud) Deployment](./production) guides to help with
setting up Browsertrix Cloud for different production scenarios.
We have prepared a [Local Deployment Guide](./local) which covers several options for testing Browsertrix Cloud locally on a single machine, as well as a [Production (Self-Hosted and Cloud) Deployment](./production) guide to help with setting up Browsertrix Cloud for different production scenarios.

@@ -8,13 +8,13 @@ Before running Browsertrix Cloud, you'll need to set up a running [Kubernetes](h
Today, there are numerous ways to deploy Kubernetes fairly easily, and we recommend trying one of the single-node options, which include Docker Desktop, microk8s, minikube and k3s.
The instructions below assume you have cloned the [https://github.com/webrecorder/browsertrix-cloud](https://github.com/webrecorder/browsertrix-cloud) repository locally, and have local package managers for your platform (eg. `brew` for Mac, `choco` for Windows, etc...) already installed.
The instructions below assume you have cloned the [https://github.com/webrecorder/browsertrix-cloud](https://github.com/webrecorder/browsertrix-cloud) repository locally, and have local package managers for your platform (e.g. `brew` for macOS, `choco` for Windows) already installed.
Here are some environment specific instructions for setting up a local cluster from different Kubernetes vendors:
??? info "Docker Desktop (recommended for Mac and Windows)"
??? info "Docker Desktop (recommended for macOS and Windows)"
For Mac and Windows, we recommend testing out Browsertrix Cloud using Kubernetes support in Docker Desktop as that will be one of the simplest options.
For macOS and Windows, we recommend testing out Browsertrix Cloud using Kubernetes support in Docker Desktop as that will be one of the simplest options.
1. [Install Docker Desktop](https://www.docker.com/products/docker-desktop/) if not already installed.
@@ -22,7 +22,7 @@ Here are some environment specific instructions for setting up a local cluster f
3. Restart Docker Desktop if asked, and wait for it to fully restart.
4. Install [Helm](https://helm.sh/), which can be installed with `brew install helm` (Mac) or `choco install kubernetes-helm` (Windows) or following some of the [other install options](https://helm.sh/docs/intro/install/)
4. Install [Helm](https://helm.sh/), which can be installed with `brew install helm` (macOS) or `choco install kubernetes-helm` (Windows) or following some of the [other install options](https://helm.sh/docs/intro/install/)
??? info "MicroK8S (recommended for Ubuntu)"
@@ -36,19 +36,19 @@ Here are some environment specific instructions for setting up a local cluster f
Note: microk8s comes with its own version of helm, so you don't need to install it separately. Replace `helm` with `microk8s helm3` in the subsequent instructions below.
??? info "Minikube (Windows, Mac or Linux)"
??? info "Minikube (Windows, macOS, or Linux)"
1. Install Minikube [following installation instructions](https://minikube.sigs.k8s.io/docs/start/), e.g. `brew install minikube`.
Note that Minikube also requires Docker or another container management system to be installed as well.
2. Install [Helm](https://helm.sh/), which can be installed with `brew install helm` (Mac) or `choco install kubernetes-helm` (Windows) or following some of the [other install options](https://helm.sh/docs/intro/install/)
2. Install [Helm](https://helm.sh/), which can be installed with `brew install helm` (macOS) or `choco install kubernetes-helm` (Windows) or following some of the [other install options](https://helm.sh/docs/intro/install/)
??? info "K3S (recommended for non-Ubuntu Linux)"
1. Install K3s [as per the instructions](https://docs.k3s.io/quick-start)
2. Install [Helm](https://helm.sh/), which can be installed with `brew install helm` (Mac) or `choco install kubernetes-helm` (Windows) or following some of the [other install options](https://helm.sh/docs/intro/install/)
2. Install [Helm](https://helm.sh/), which can be installed with `brew install helm` (macOS) or `choco install kubernetes-helm` (Windows) or following some of the [other install options](https://helm.sh/docs/intro/install/)
3. Set `KUBECONFIG` to point to the config for K3S: `export KUBECONFIG=/etc/rancher/k3s/k3s.yaml` to ensure Helm will use the correct version.
@@ -105,9 +105,9 @@ The command will exit when all pods have been loaded, or if there is an error an
If the command succeeds, you should be able to access Browsertrix Cloud by loading: **[http://localhost:30870/](http://localhost:30870/)** in your browser.
??? info "Minikube (on Mac)"
??? info "Minikube (on macOS)"
When using Minikube on a Mac, the port will not be 30870. Instead, Minikube opens a tunnel to a random port,
When using Minikube on macOS, the port will not be 30870. Instead, Minikube opens a tunnel to a random port,
obtained by running `minikube service browsertrix-cloud-frontend --url` in a separate terminal.
Use the provided URL (in the format `http://127.0.0.1:<TUNNEL_PORT>`) instead.
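The tunnel-port lookup described above can be scripted; this sketch falls back to the default NodePort URL when Minikube is not present:

```shell
# Ask Minikube for the tunnel URL of the frontend service.
if command -v minikube >/dev/null 2>&1; then
  FRONTEND_URL="$(minikube service browsertrix-cloud-frontend --url)"
else
  FRONTEND_URL="http://localhost:30870"   # default NodePort on other setups
fi
echo "Browsertrix Cloud frontend: ${FRONTEND_URL}"
```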
@@ -140,8 +140,7 @@ To uninstall, run `helm uninstall btrix`.
By default, the database + storage volumes are not automatically deleted, so you can run `helm upgrade ...` again to restart the cluster in its current state.
If you are upgrading from a previous version, and run into issues with `helm upgrade ...`, we recommend
uninstalling and then re-running upgrade.
If you are upgrading from a previous version and run into issues with `helm upgrade ...`, we recommend uninstalling and then re-running the upgrade.
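One possible recovery sequence, using the `btrix` release name from these docs; the chart path is an assumption, so adjust it to your checkout:

```shell
RELEASE="btrix"        # release name used elsewhere in these docs
CHART="./chart"        # assumed chart location -- adjust to your checkout

# If `helm upgrade` fails after a version jump, uninstall and reinstall.
# Database and storage volumes survive an uninstall by default.
if command -v helm >/dev/null 2>&1; then
  helm uninstall "$RELEASE" || true
  helm upgrade --install "$RELEASE" "$CHART" -f ./chart/local.yaml
fi
```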
## Deleting all Data
@@ -149,6 +148,4 @@ To fully delete all persistent data (db + archives) created in the cluster, also
## Deploying for Local Development
These instructions are intended for deploying the cluster from the latest release.
See [setting up cluster for local development](../develop/local-dev-setup.md) for additional customizations related to
developing Browsertrix Cloud and deploying from local images.
These instructions are intended for deploying the cluster from the latest release. See [setting up cluster for local development](../develop/local-dev-setup.md) for additional customizations related to developing Browsertrix Cloud and deploying from local images.

@@ -1,7 +1,6 @@
# Production: Self-Hosted and Cloud
For production and hosted deployments (both on a single machine or in the cloud), the only requirement is to have a designed domain
and (strongly recommended, but not required) second domain for signing web archives.
For production and hosted deployments (both on a single machine or in the cloud), the only requirement is to have a designated domain and a (strongly recommended, but not required) second domain for signing web archives.
We are also experimenting with [Ansible playbooks](../deploy/ansible) for cloud deployment setups.

@@ -110,7 +110,7 @@ There are a lot of different options provided by Material for MkDocs — So many
???+ Note
The default call-out, used to highlight something if there isn't a more relevant one — should generally be expanded by default but can be collapsible by the user if the note is long.
!!! Tip
!!! Tip "Tip — May have a title stating the tip or best practice"
Used to highlight a point that is useful for everyone to understand about the documented subject — should be expanded and kept brief.
???+ Info "Info — Must have a title describing the context under which this information is useful"

@@ -72,9 +72,9 @@ If connecting to a local deployment cluster, set `API_BASE_URL` to:
API_BASE_URL=http://localhost:30870
```
??? info "Port when using Minikube (on Mac)"
??? info "Port when using Minikube (on macOS)"
When using Minikube on a Mac, the port will not be 30870. Instead, Minikube opens a tunnel to a random port,
When using Minikube on macOS, the port will not be 30870. Instead, Minikube opens a tunnel to a random port,
obtained by running `minikube service browsertrix-cloud-frontend --url` in a separate terminal.
Set `API_BASE_URL` to the provided URL instead, e.g. `API_BASE_URL=http://127.0.0.1:<TUNNEL_PORT>`
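Both cases can be handled in one step; this is a sketch that queries Minikube when available and otherwise assumes the direct NodePort:

```shell
# Use the Minikube tunnel URL when available, else the direct NodePort.
if command -v minikube >/dev/null 2>&1; then
  export API_BASE_URL="$(minikube service browsertrix-cloud-frontend --url)"
else
  export API_BASE_URL="http://localhost:30870"
fi
echo "API_BASE_URL=${API_BASE_URL}"
```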

@@ -13,8 +13,7 @@ The deployment can then be [further customized for local development](./local-de
### Backend
The backend is an API-only system, using the FastAPI framework. The latest API reference is available
under ./api of a running cluster.
The backend is an API-only system, using the FastAPI framework. The latest API reference is available under `./api` of a running cluster.
At this time, the backend must be deployed in the Kubernetes cluster.

@@ -125,12 +125,10 @@ Refer back to the [Local Development guide](../deploy/local.md#waiting-for-clust
## Update the Images
After making any changes to backend code (in `./backend`) or frontend code (in `./frontend`),
you'll need to rebuild the images as specified above, before running `helm upgrade ...` to re-deploy.
After making any changes to backend code (in `./backend`) or frontend code (in `./frontend`), you'll need to rebuild the images as specified above before running `helm upgrade ...` to re-deploy.
Changes to settings in `./chart/local.yaml` can be deployed with `helm upgrade ...` directly.
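For settings-only changes, the redeploy is a single helm call; the release name and chart path below are assumptions based on these docs:

```shell
RELEASE="btrix"
CHART="./chart"            # assumed chart path in the repo checkout

# Settings-only changes to ./chart/local.yaml: redeploy without rebuilding images.
if command -v helm >/dev/null 2>&1; then
  helm upgrade --wait "$RELEASE" "$CHART" -f ./chart/local.yaml
fi
```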
## Deploying Frontend Only
If you are just making changes to the frontend, you can also [deploy the frontend separately](frontend-dev.md)
using a dev server for quicker iteration.
If you are just making changes to the frontend, you can also [deploy the frontend separately](frontend-dev.md) using a dev server for quicker iteration.

@@ -1,21 +1,20 @@
# Browser Profiles
Browser Profiles are saved instances of a web browsing session that can be reused to crawl websites as they were configued, with any cookies or saved login sessions. They are specifically useful for crawling websites as a logged in user or accepting cookie consent popups.
Browser profiles are saved instances of a web browsing session that can be reused to crawl websites as they were configured, with any cookies or saved login sessions. Using a pre-configured profile also means that content that can only be viewed by logged in users can be archived, without archiving the actual login credentials.
Using a pre-created profile means that paywalled content can be archived, without archiving the actual login credentials.
!!! tip "Best practice: Create and use web archiving-specific accounts for crawling with browser profiles"
??? info "Best practice: Create and use web archiving-specific accounts"
For the following reasons, we recommend creating dedicated accounts for archiving anything that is locked behind login credentials but otherwise public, especially on social media platforms.
Some websites may rate limit or lock your account if they deem crawling-related activity to be suspicious, such as logging in from a new location.
- While usernames and passwords are not, the access tokens for logged-in websites used in the browser profile creation process _are stored_ by the server.
While your login information (username, password) is not archived, *other* data such as cookies, location, etc.. may be part of a logged in content (after all, personalized content is often the goal of paywalls).
- Some websites may rate limit or lock accounts for reasons they deem to be suspicious, such as logging in from a new location or any crawling-related activity.
Due to nature of social media especially, existing accounts may have personally identifiable information, even when accessing otherwise public content.
- While login information (username, password) is not archived, *other* data such as cookies, location, etc. may be included in the resulting crawl (after all, personalized content is often the goal of sites that require credentials to view content).
For these reasons, we recommend creating dedicated accounts for archiving anything that is paywalled but otherwise public, especially on social media platforms.
Of course, there are exceptions -- such as when the goal is to archive personalized or private content accessible only from designated accounts.
- Due to the nature of social media specifically, existing accounts may have personally identifiable information, even when accessing otherwise public content.
Of course, there are exceptions — such as when the goal is to archive personalized or private content accessible only from designated accounts.
## Creating New Browser Profiles
@@ -28,4 +27,3 @@ Press the _Next_ button to save the browser profile with a _Name_ and _Descripti
Sometimes websites will log users out or expire cookies after a period of time. In these cases, when crawling, the browser profile can still be loaded but may not behave as it did when it was initially set up.
To update the profile, go to the profile's details page and press the _Edit Browser Profile_ button to load and interact with the sites that need to be re-configured. When finished, press the _Save Browser Profile_ button to return to the profile's details page.

@@ -10,7 +10,8 @@ If you have been sent an [invite](org-settings#members), enter a password and na
If the server has enabled signups and you have been given a registration link, enter your email address, password, and name to create a new account. Your account will be added to the server's default organization.
!!! info "At this time, the name field is not yet editable."
!!! note
Names chosen on signup cannot be changed later.
---

@@ -26,7 +26,7 @@ It is also available under the _Additional URLs_ section for Seeded Crawls where
When enabled, the crawler will visit all the links it finds within each page defined in the _List of URLs_ field.
??? tip "Crawling tags & search queries with URL List crawls"
??? example "Crawling tags & search queries with URL List crawls"
This setting can be useful for crawling the content of specific tags or search queries. Specify the tag or search query URL(s) in the _List of URLs_ field, e.g. `https://example.com/search?q=tag`, and enable _Include Any Linked Page_ to crawl all the content present on that search query page.
### Crawl Start URL