Browsertrix Cloud

Browsertrix Cloud is a cloud-native, multi-user, multi-archive crawling system that can run in the cloud via Kubernetes or locally via Docker.

The system currently includes support for the following:

  • Fully API-driven, with an OpenAPI specification for all APIs.
  • Multiple users, registered via email and/or invited to join Archives.
  • Crawling centered around Archives, each associated with an S3-compatible storage bucket.
  • Users may belong to multiple Archives and hold different roles in each.
  • Archives contain crawl configs, which are passed to the crawler (sketched below).
  • Crawls can be launched on a crontab-based schedule or manually on demand.
  • Crawls are performed using Browsertrix Crawler.
  • Crawl configs include an optional timeout, after which the crawl is stopped gracefully.
  • Crawl status is tracked in the DB (possible crawl states include: Completed, Partially Complete (due to timeout or cancellation), Canceled, and Failed).
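
As a rough sketch of how this archive-centric model fits together, creating a crawl config through the API might look like the following. The endpoint path, JSON field names, and token handling here are hypothetical and for illustration only; the real routes are listed in the OpenAPI docs served by the API.

# Hypothetical sketch -- consult the OpenAPI docs for the actual endpoints.
# "schedule" is a crontab expression; "crawlTimeout" is the optional timeout
# after which the crawl is stopped gracefully.
curl -X POST http://localhost:8000/archives/$ARCHIVE_ID/crawlconfigs \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"schedule": "0 0 * * *", "crawlTimeout": 3600}'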

Deploying to Docker

To deploy to a local Docker instance, copy config.sample.env to config.env.

Docker Compose is required.

Then, run docker-compose build; docker-compose up -d to launch.

To update/relaunch, use ./docker-restart.sh.

The API should be available at: http://localhost:8000/docs
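
Putting the steps together, a first launch looks like this (the final curl simply confirms the API docs page is being served):

cp config.sample.env config.env
docker-compose build
docker-compose up -d
curl -s http://localhost:8000/docs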

Note: When deployed with local Docker, failed crawls are currently not retried. Scheduling is handled by a subprocess, which stores the active schedule in the DB.

Deploying to Kubernetes

To deploy to K8s, helm is required. Browsertrix Cloud comes with a helm chart, which can be installed as follows:

helm install -f ./chart/values.yaml btrix ./chart/

This will create a browsertrix-cloud service in the default namespace.
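
To confirm the release came up (assuming kubectl is configured for the same cluster):

kubectl get svc browsertrix-cloud
kubectl get pods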

For a quick update, the following is recommended:

helm upgrade -f ./chart/values.yaml btrix ./chart/
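
Helm merges multiple values files left to right, with later files taking precedence, so site-specific overrides can be kept outside the chart itself; my-overrides.yaml below is only a placeholder name:

helm upgrade -f ./chart/values.yaml -f ./my-overrides.yaml btrix ./chart/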

Note: When deployed in Kubernetes, failed crawls are automatically retried. Scheduling is handled via Kubernetes CronJobs, and crawl jobs run in the crawlers namespace.
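
To watch scheduling and crawl activity (CronJobs are listed across all namespaces here, since their namespace may depend on your configuration):

kubectl get cronjobs --all-namespaces
kubectl get jobs -n crawlers
kubectl get pods -n crawlers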

Browsertrix Cloud is currently in a pre-alpha stage and is not ready for production use.