Browsertrix Cloud
Browsertrix Cloud is a cloud-native, multi-user, multi-archive crawling system that runs natively in the cloud via Kubernetes or locally via Docker.
The system currently includes support for the following:
- Fully API-driven, with OpenAPI specification for all APIs.
- Multiple users, registered via email and/or invited to join Archives.
- Crawling centered around Archives which are associated with an S3-compatible storage bucket.
- Users may be part of multiple archives and have different roles in different archives.
- Archives contain crawler configs, which are passed to the crawler.
- Crawls can be launched on a crontab-based schedule or manually on demand.
- Crawls are performed using Browsertrix Crawler.
- Crawl configs include an optional timeout, after which the crawl is stopped gracefully (see the API sketch after this list).
- Crawl status is tracked in the DB (possible crawl states include: Completed, Partially Complete due to timeout or cancelation, Canceled, and Failed).
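As a rough illustration of the API-driven flow, a crawl config with a schedule and timeout might be created like this. The endpoint path, JSON field names, and token are hypothetical placeholders for this sketch, not confirmed details; the actual endpoints are listed in the OpenAPI docs.

```sh
# Hypothetical sketch only: the path, JSON fields, and token placeholder are
# illustrative assumptions; see the OpenAPI docs (/docs) for the real API.
curl -X POST "http://localhost:8000/archives/<archive-id>/crawlconfigs" \
  -H "Authorization: Bearer <access-token>" \
  -H "Content-Type: application/json" \
  -d '{
        "name": "example-crawl",
        "schedule": "0 2 * * *",
        "crawlTimeout": 3600,
        "config": { "seeds": ["https://example.com/"] }
      }'
```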
Deploying to Docker
To deploy via a local Docker instance, copy `config.sample.env` to `config.env`. Docker Compose is required.
Then, run `docker-compose build; docker-compose up -d` to launch.
To update and relaunch, use `./docker-restart.sh`.
The API should be available at: http://localhost:8000/docs
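Putting the Docker steps together, a minimal launch from the repository root looks like:

```sh
# From the repository root: configure, build, and launch.
cp config.sample.env config.env     # edit config.env for your environment
docker-compose build
docker-compose up -d

# Later, to update and relaunch:
./docker-restart.sh

# The interactive API docs should now be at http://localhost:8000/docs
```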
Note: When deployed in local Docker, failed crawls are currently not retried. Scheduling is handled by a subprocess, which stores the active schedule in the DB.
Deploying to Kubernetes
To deploy to Kubernetes, Helm is required. Browsertrix Cloud comes with a Helm chart, which can be installed as follows:
`helm install -f ./chart/values.yaml btrix ./chart/`
This will create a `browsertrix-cloud` service in the default namespace.
For a quick update, the following is recommended:
`helm upgrade -f ./chart/values.yaml btrix ./chart/`
Note: When deployed in Kubernetes, failed crawls are automatically retried. Scheduling is handled via Kubernetes CronJobs, and crawl jobs are run in the `crawlers` namespace.
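After installing or upgrading, a few standard kubectl commands can confirm the deployment. The resource names below follow from this README (the `browsertrix-cloud` service and the `crawlers` namespace) rather than from the chart itself, so treat them as assumptions:

```sh
# Sanity checks after `helm install`; resource names assumed from this README.
kubectl get pods                    # application pods in the default namespace
kubectl get svc browsertrix-cloud   # service created by the chart
kubectl get cronjobs -n crawlers    # scheduled crawls assumed to appear as CronJobs here
```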
Browsertrix Cloud is currently in pre-alpha and is not ready for production use.