docs: formatting fixes & minor content updates (#1091)
Additional tweaks on Browser Profiles pages + general consistency pass
parent 02a01e7abb
commit 2952988864
@@ -2,18 +2,18 @@

 *Playbook Path: [ansible/playbooks/do_setup.yml](https://github.com/webrecorder/browsertrix-cloud/blob/main/ansible/playbooks/do_setup.yml)*

-This playbook provides an easy way to install BrowserTrix Cloud on DigitalOcean. It automatically sets up Browsertrix with, LetsEncrypt certificates.
+This playbook provides an easy way to install Browsertrix Cloud on DigitalOcean. It automatically sets up Browsertrix with Let's Encrypt certificates.

 ### Requirements

 To run this ansible playbook, you need to:

-* Have a [DigitalOcean Account](https://m.do.co/c/e0db3814e33e) where this will run.
-* Create a [DigitalOcean API Key](https://cloud.digitalocean.com/account/api) which will need to be set in your terminal sessions environment variables `export DO_API_TOKEN`
-* `doctl` command line client configured (run `doctl auth init`)
-* Create a [DigitalOcean Spaces](https://docs.digitalocean.com/reference/api/spaces-api/) API Key which will also need to be set in your terminal sessions environment variables, which should be set as `DO_AWS_ACCESS_KEY` and `DO_AWS_SECRET_KEY`
-* Configure a DNS A Record and CNAME record.
-* Have a working python and pip configuration through your OS Package Manager
+- Have a [DigitalOcean Account](https://m.do.co/c/e0db3814e33e) where this will run.
+- Create a [DigitalOcean API Key](https://cloud.digitalocean.com/account/api), which will need to be set as an environment variable in your terminal session (`export DO_API_TOKEN`).
+- Configure the `doctl` command line client (run `doctl auth init`).
+- Create a [DigitalOcean Spaces](https://docs.digitalocean.com/reference/api/spaces-api/) API key, which will also need to be set as environment variables in your terminal session: `DO_AWS_ACCESS_KEY` and `DO_AWS_SECRET_KEY`.
+- Configure a DNS A record and CNAME record.
+- Have a working Python and pip configuration installed through your OS package manager.

 #### Install

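Before running the DigitalOcean playbook, it can help to confirm that the three variables named in the requirements are actually set in the current shell. The sketch below is a hypothetical helper, not something shipped in the repository; `check_do_env` is an illustrative name, and it only checks that each variable is non-empty, not that the credentials are valid.

```shell
# Hypothetical helper, not part of the browsertrix-cloud repository.
# Checks that the variables listed in the DigitalOcean requirements are
# non-empty in the current shell; it does not validate the credentials.
check_do_env() {
  missing=0
  for var in DO_API_TOKEN DO_AWS_ACCESS_KEY DO_AWS_SECRET_KEY; do
    # Indirect lookup: expands the variable whose name is stored in $var.
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "missing: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Run `check_do_env` after exporting the variables; a non-zero exit status, with each missing name printed to stderr, means at least one of them is still empty.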
@@ -2,17 +2,16 @@

 *Playbook Path: [ansible/playbooks/install_microk8s.yml](https://github.com/webrecorder/browsertrix-cloud/blob/main/ansible/playbooks/install_microk8s.yml)*

-This playbook provides an easy way to install Browsertrix Cloud on an Ubuntu (tested on Jammy Jellyfish) and a RedHat 9 (tested on Rocky Linux 9).
-It automatically sets up Browsertrix with, Letsencrypt certificates.
+This playbook provides an easy way to install Browsertrix Cloud on Ubuntu (tested on Jammy Jellyfish) and RedHat 9 (tested on Rocky Linux 9). It automatically sets up Browsertrix with Let's Encrypt certificates.

 ### Requirements

 To run this ansible playbook, you need to:

-* Have a server / VPS where browsertrix will run.
-* Configure a DNS A Record to point at your server's IP address.
-* Make sure you can ssh to it, with a sudo user: ssh <your-user>@<your-domain>
-* Install Ansible on your local machine (the control machine).
+- Have a server / VPS where Browsertrix will run.
+- Configure a DNS A record to point at your server's IP address.
+- Make sure you can SSH to it with a sudo user: `ssh <your-user>@<your-domain>`
+- Install Ansible on your local machine (the control machine).

 #### Install

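The control-machine requirements can be checked in a similar way. This is a hypothetical sketch (`require_cmds` is an illustrative name) that only confirms the named programs are on `PATH`; it does not test the SSH login or sudo rights on the server.

```shell
# Hypothetical helper, not part of the browsertrix-cloud repository.
# Prints each command that is not found on PATH and returns non-zero
# if any are missing.
require_cmds() {
  status=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "not found on PATH: $cmd" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `require_cmds ansible-playbook ssh` on the control machine covers the Ansible and SSH requirements; the same helper could be pointed at `doctl` for the DigitalOcean playbook.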
@@ -10,6 +10,4 @@ The main requirements for Browsertrix Cloud are:
 - [Helm 3](https://helm.sh/) (package manager for Kubernetes)


-We have prepared a [Local Deployment Guide](./local) which covers several options for testing Browsertrix Cloud locally on a single machine,
-as well as a [Production (Self-Hosted and Cloud) Deployment](./production) guides to help with
-setting up Browsertrix Cloud for different production scenarios.
+We have prepared a [Local Deployment Guide](./local), which covers several options for testing Browsertrix Cloud locally on a single machine, as well as a [Production (Self-Hosted and Cloud) Deployment](./production) guide to help with setting up Browsertrix Cloud for different production scenarios.
@@ -8,13 +8,13 @@ Before running Browsertrix Cloud, you'll need to set up a running [Kubernetes](h

 Today, there are numerous ways to deploy Kubernetes fairly easily, and we recommend trying one of the single-node options, which include Docker Desktop, microk8s, minikube and k3s.

-The instructions below assume you have cloned the [https://github.com/webrecorder/browsertrix-cloud](https://github.com/webrecorder/browsertrix-cloud) repository locally, and have local package managers for your platform (eg. `brew` for Mac, `choco` for Windows, etc...) already installed.
+The instructions below assume you have cloned the [https://github.com/webrecorder/browsertrix-cloud](https://github.com/webrecorder/browsertrix-cloud) repository locally, and have local package managers for your platform (e.g. `brew` for macOS, `choco` for Windows, etc.) already installed.

 Here are some environment-specific instructions for setting up a local cluster from different Kubernetes vendors:

-??? info "Docker Desktop (recommended for Mac and Windows)"
+??? info "Docker Desktop (recommended for macOS and Windows)"

-For Mac and Windows, we recommend testing out Browsertrix Cloud using Kubernetes support in Docker Desktop as that will be one of the simplest options.
+For macOS and Windows, we recommend testing out Browsertrix Cloud using Kubernetes support in Docker Desktop, as that will be one of the simplest options.

 1. [Install Docker Desktop](https://www.docker.com/products/docker-desktop/) if not already installed.

@@ -22,7 +22,7 @@ Here are some environment specific instructions for setting up a local cluster f

 3. Restart Docker Desktop if asked, and wait for it to fully restart.

-4. Install [Helm](https://helm.sh/), which can be installed with `brew install helm` (Mac) or `choco install kubernetes-helm` (Windows) or following some of the [other install options](https://helm.sh/docs/intro/install/)
+4. Install [Helm](https://helm.sh/), which can be installed with `brew install helm` (macOS) or `choco install kubernetes-helm` (Windows) or by following one of the [other install options](https://helm.sh/docs/intro/install/).

 ??? info "MicroK8S (recommended for Ubuntu)"

@@ -36,19 +36,19 @@

 Note: microk8s comes with its own version of Helm, so you don't need to install it separately. Replace `helm` with `microk8s helm3` in the instructions below.

-??? info "Minikube (Windows, Mac or Linux)"
+??? info "Minikube (Windows, macOS, or Linux)"

 1. Install Minikube [following installation instructions](https://minikube.sigs.k8s.io/docs/start/), e.g. `brew install minikube`.
 Note that Minikube also requires Docker or another container management system to be installed.

-2. Install [Helm](https://helm.sh/), which can be installed with `brew install helm` (Mac) or `choco install kubernetes-helm` (Windows) or following some of the [other install options](https://helm.sh/docs/intro/install/)
+2. Install [Helm](https://helm.sh/), which can be installed with `brew install helm` (macOS) or `choco install kubernetes-helm` (Windows) or by following one of the [other install options](https://helm.sh/docs/intro/install/).


 ??? info "K3S (recommended for non-Ubuntu Linux)"

 1. Install K3s [as per the instructions](https://docs.k3s.io/quick-start).

-2. Install [Helm](https://helm.sh/), which can be installed with `brew install helm` (Mac) or `choco install kubernetes-helm` (Windows) or following some of the [other install options](https://helm.sh/docs/intro/install/)
+2. Install [Helm](https://helm.sh/), which can be installed with `brew install helm` (macOS) or `choco install kubernetes-helm` (Windows) or by following one of the [other install options](https://helm.sh/docs/intro/install/).

 3. Set `KUBECONFIG` to point to the config for K3S: `export KUBECONFIG=/etc/rancher/k3s/k3s.yaml` to ensure Helm will use the correct cluster configuration.

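Since the Helm invocation differs between these setups (MicroK8S uses `microk8s helm3`, and K3S needs `KUBECONFIG` pointed at `/etc/rancher/k3s/k3s.yaml`), one option is to choose it once in a small wrapper. The sketch below is hypothetical: `helm_cmd` and the `BTRIX_K8S` variable are illustrative names, not part of the project.

```shell
# Hypothetical helper, not part of the browsertrix-cloud repository.
# Prints the Helm command for the local cluster flavour named in BTRIX_K8S
# (an assumed variable); defaults to plain `helm`.
helm_cmd() {
  case "${BTRIX_K8S:-}" in
    microk8s) echo "microk8s helm3" ;;  # MicroK8S bundles its own Helm
    k3s) echo "env KUBECONFIG=/etc/rancher/k3s/k3s.yaml helm" ;;  # K3S kubeconfig
    *) echo "helm" ;;  # Docker Desktop, Minikube, or a standalone Helm
  esac
}
```

For example, setting `BTRIX_K8S=microk8s` and then running `$(helm_cmd) uninstall btrix` runs `microk8s helm3 uninstall btrix`. The command substitution is left unquoted on purpose so the printed words split into a command and its arguments; that is only safe because none of the printed words contain spaces or glob characters.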
@@ -105,9 +105,9 @@ The command will exit when all pods have been loaded, or if there is an error an

 If the command succeeds, you should be able to access Browsertrix Cloud by loading **[http://localhost:30870/](http://localhost:30870/)** in your browser.

-??? info "Minikube (on Mac)"
+??? info "Minikube (on macOS)"

-When using Minikube on a Mac, the port will not be 30870. Instead, Minikube opens a tunnel to a random port,
+When using Minikube on macOS, the port will not be 30870. Instead, Minikube opens a tunnel to a random port,
 obtained by running `minikube service browsertrix-cloud-frontend --url` in a separate terminal.
 Use the provided URL (in the format `http://127.0.0.1:<TUNNEL_PORT>`) instead.

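When copying the tunnel URL, a quick shape check can catch a missing or mistyped port. This is a hypothetical helper (`normalize_tunnel_url` is an illustrative name) that accepts only the `http://127.0.0.1:<TUNNEL_PORT>` form described above and prints the URL without a trailing slash.

```shell
# Hypothetical helper, not part of the browsertrix-cloud repository.
# Accepts only http://127.0.0.1:<TUNNEL_PORT> (an assumption based on the
# format shown for `minikube service ... --url`) and prints it normalized.
normalize_tunnel_url() {
  url="${1%/}"  # drop one trailing slash, if present
  case "$url" in
    http://127.0.0.1:*) ;;
    *) echo "unexpected URL: $1" >&2; return 1 ;;
  esac
  port="${url#http://127.0.0.1:}"  # everything after the host
  case "$port" in
    ''|*[!0-9]*) echo "missing or non-numeric port: $1" >&2; return 1 ;;
  esac
  echo "$url"
}
```

Paste the URL printed in the separate terminal as the argument. For example, `normalize_tunnel_url http://127.0.0.1:54321/` prints `http://127.0.0.1:54321` (54321 is only an example port).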
@@ -140,8 +140,7 @@ To uninstall, run `helm uninstall btrix`.

 By default, the database + storage volumes are not automatically deleted, so you can run `helm upgrade ...` again to restart the cluster in its current state.

-If you are upgrading from a previous version, and run into issues with `helm upgrade ...`, we recommend
-uninstalling and then re-running upgrade.
+If you are upgrading from a previous version and run into issues with `helm upgrade ...`, we recommend uninstalling and then re-running the upgrade.

 ## Deleting all Data

@@ -149,6 +148,4 @@ To fully delete all persistent data (db + archives) created in the cluster, also

 ## Deploying for Local Development

-These instructions are intended for deploying the cluster from the latest release.
-See [setting up cluster for local development](../develop/local-dev-setup.md) for additional customizations related to
-developing Browsertrix Cloud and deploying from local images.
+These instructions are intended for deploying the cluster from the latest release. See [setting up cluster for local development](../develop/local-dev-setup.md) for additional customizations related to developing Browsertrix Cloud and deploying from local images.
@@ -1,7 +1,6 @@
 # Production: Self-Hosted and Cloud

-For production and hosted deployments (both on a single machine or in the cloud), the only requirement is to have a designed domain
-and (strongly recommended, but not required) second domain for signing web archives.
+For production and hosted deployments (whether on a single machine or in the cloud), the only requirement is to have a designated domain and (strongly recommended, but not required) a second domain for signing web archives.

 We are also experimenting with [Ansible playbooks](../deploy/ansible) for cloud deployment setups.

@@ -110,7 +110,7 @@ There are a lot of different options provided by Material for MkDocs — So many
 ???+ Note
 The default call-out, used to highlight something if there isn't a more relevant one — should generally be expanded by default but can be collapsible by the user if the note is long.

-!!! Tip
+!!! Tip "Tip — May have a title stating the tip or best practice"
 Used to highlight a point that is useful for everyone to understand about the documented subject — should be expanded and kept brief.

 ???+ Info "Info — Must have a title describing the context under which this information is useful"
@@ -72,9 +72,9 @@ If connecting to a local deployment cluster, set `API_BASE_URL` to:
 API_BASE_URL=http://localhost:30870
 ```

-??? info "Port when using Minikube (on Mac)"
+??? info "Port when using Minikube (on macOS)"

-When using Minikube on a Mac, the port will not be 30870. Instead, Minikube opens a tunnel to a random port,
+When using Minikube on macOS, the port will not be 30870. Instead, Minikube opens a tunnel to a random port,
 obtained by running `minikube service browsertrix-cloud-frontend --url` in a separate terminal.

 Set `API_BASE_URL` to the provided URL instead, e.g. `API_BASE_URL=http://127.0.0.1:<TUNNEL_PORT>`
@@ -13,8 +13,7 @@ The deployment can then be [further customized for local development](./local-de

 ### Backend

-The backend is an API-only system, using the FastAPI framework. The latest API reference is available
-under ./api of a running cluster.
+The backend is an API-only system, using the FastAPI framework. The latest API reference is available under `./api` of a running cluster.

 At this time, the backend must be deployed in the Kubernetes cluster.

@@ -125,12 +125,10 @@ Refer back to the [Local Development guide](../deploy/local.md#waiting-for-clust

 ## Update the Images

-After making any changes to backend code (in `./backend`) or frontend code (in `./frontend`),
-you'll need to rebuild the images as specified above, before running `helm upgrade ...` to re-deploy.
+After making any changes to backend code (in `./backend`) or frontend code (in `./frontend`), you'll need to rebuild the images as specified above before running `helm upgrade ...` to re-deploy.

 Changes to settings in `./chart/local.yaml` can be deployed with `helm upgrade ...` directly.

 ## Deploying Frontend Only

-If you are just making changes to the frontend, you can also [deploy the frontend separately](frontend-dev.md)
-using a dev server for quicker iteration.
+If you are just making changes to the frontend, you can also [deploy the frontend separately](frontend-dev.md) using a dev server for quicker iteration.
@@ -1,21 +1,20 @@
 # Browser Profiles

-Browser Profiles are saved instances of a web browsing session that can be reused to crawl websites as they were configued, with any cookies or saved login sessions. They are specifically useful for crawling websites as a logged in user or accepting cookie consent popups.
+Browser profiles are saved instances of a web browsing session that can be reused to crawl websites as they were configured, with any cookies or saved login sessions. Using a pre-configured profile also means that content that can only be viewed by logged-in users can be archived, without archiving the actual login credentials.

-Using a pre-created profile means that paywalled content can be archived, without archiving the actual login credentials.
+!!! tip "Best practice: Create and use web archiving-specific accounts for crawling with browser profiles"

-??? info "Best practice: Create and use web archiving-specific accounts"
+For the following reasons, we recommend creating dedicated accounts for archiving anything that is locked behind login credentials but otherwise public, especially on social media platforms.

-Some websites may rate limit or lock your account if they deem crawling-related activity to be suspicious, such as logging in from a new location.
+- While usernames and passwords are not stored, the access tokens for logged-in websites used in the browser profile creation process _are stored_ by the server.

-While your login information (username, password) is not archived, *other* data such as cookies, location, etc.. may be part of a logged in content (after all, personalized content is often the goal of paywalls).
+- Some websites may rate limit or lock accounts for reasons they deem to be suspicious, such as logging in from a new location or any crawling-related activity.

-Due to nature of social media especially, existing accounts may have personally identifiable information, even when accessing otherwise public content.
+- While login information (username, password) is not archived, *other* data such as cookies, location, etc. may be included in the resulting crawl (after all, personalized content is often the goal of sites that require credentials to view content).

-For these reasons, we recommend creating dedicated accounts for archiving anything that is paywalled but otherwise public, especially on social media platforms.
-
-Of course, there are exceptions -- such as when the goal is to archive personalized or private content accessible only from designated accounts.
+- Due to the nature of social media specifically, existing accounts may have personally identifiable information, even when accessing otherwise public content.
+
+Of course, there are exceptions — such as when the goal is to archive personalized or private content accessible only from designated accounts.

 ## Creating New Browser Profiles

@@ -28,4 +27,3 @@ Press the _Next_ button to save the browser profile with a _Name_ and _Descripti
 Sometimes websites will log users out or expire cookies after a period of time. In these cases, when crawling, the browser profile can still be loaded but may not behave as it did when it was initially set up.

 To update the profile, go to the profile's details page and press the _Edit Browser Profile_ button to load and interact with the sites that need to be re-configured. When finished, press the _Save Browser Profile_ button to return to the profile's details page.
-
@@ -10,7 +10,8 @@ If you have been sent an [invite](org-settings#members), enter a password and na

 If the server has enabled signups and you have been given a registration link, enter your email address, password, and name to create a new account. Your account will be added to the server's default organization.

-!!! info "At this time, the name field is not yet editable."
+!!! note
+Names chosen on signup cannot be changed later.

 ---

@@ -26,7 +26,7 @@ It is also available under the _Additional URLs_ section for Seeded Crawls where

 When enabled, the crawler will visit all the links it finds within each page defined in the _List of URLs_ field.

-??? tip "Crawling tags & search queries with URL List crawls"
+??? example "Crawling tags & search queries with URL List crawls"
 This setting can be useful for crawling the content of specific tags or search queries. Specify the tag or search query URL(s) in the _List of URLs_ field, e.g. `https://example.com/search?q=tag`, and enable _Include Any Linked Page_ to crawl all the content present on that search query page.

 ### Crawl Start URL