browsertrix

Author	SHA1	Message	Date
Francis Kayiwa	6833c9d676	DigitalOcean setup (#314) - Ansible playbook for deploying on DigitalOcean, configuring space, k8s cluster, mongodb, domain / subdomain, signing subdomain, container registry, and cors - Generates helm chart in ./deploys/ directory for future use with helm directly - Initial support for deletion of created resources as well - add documentation on how to use playbook - default helm values: update to latest authsign, set default timeout to 120 seconds	2022-11-15 13:44:24 -08:00
Ilya Kreymer	447b0bf9b9	k8s chart + values tweak: (#317) - mongo chart to avoid requiring username/password if passing db_url - tweaks to default values (set registration enabled by default, longer) add missing options	2022-09-21 12:45:08 -07:00
Ilya Kreymer	1216f6cb66	K8s: update chart for local minio + mongo default (#301) * k8s chart fixes: mongo: pin to 5.0.11 version for now minio: create empty dir for local storage for now instead of using mc, use 'btrix-data' as bucket name	2022-09-02 13:07:47 -07:00
Ilya Kreymer	f0c079dc1b	k8s: update default images to dev images in values.yaml	2022-09-01 16:18:15 -07:00
Ilya Kreymer	5b6aa3bc95	Affinity + Tolerations + Cleanup Crawl Job (#256) * k8s: add tolerations for 'nodeType=crawling:NoSchedule' to allow scheduling crawling on designated nodes for crawler and profiles jobs and statefulsets * add affinity for 'nodeType=crawling' on crawling and profile browser statefulsets * refactor crawljob: combine crawl_updater logic into base crawl_job * increment new 'crawlAttemptCount' counter in crawlconfig when crawl is started, not necessarily finished, to avoid deleting configs that had attempted but not finished crawls. * better external mongodb support: use MONGO_DB_URL to set custom url directly, otherwise build from username, password and mongo host	2022-06-10 19:21:37 -07:00
Ilya Kreymer	bf79959a5a	refactoring to use statefulsets + job (#245) - use statefulsets instead of deployments for mongo, redis, signer - use k8s job + statefulset for running crawls - use separate statefulset for crawl (scaled) and single-replica redis stateful set - move crawl job update logic to crawl_updater - remove shared redis chart package refactor: - move shared code to 'btrixcloud' - move k8s to 'btrixcloud.k8s' - move docker to 'btrixcloud.docker'	2022-06-05 10:37:17 -07:00
Ilya Kreymer	3df310ee4f	Backend: Crawls with Multiple WACZ files + Profile + Misc Fixes (#232) * backend: k8s: - support crawls with multiple wacz files, don't assume crawl complete after first wacz uploaded - if crawl is running and has wacz file, still show as running - k8s: allow configuring node selector for main pods (eg. nodeType=main) and for crawlers (eg. nodeType=crawling) - profiles: support uploading to alternate storage specified via 'shared_profile_storage', if that value is set - misc fixes for profiles * backend: ensure docker run_profile api matches k8s k8s chart: don't delete pvc and pv in helm chart * dependency: bump authsign to 0.4.0 docker: disable public redis port * profiles: fix path, profile browser return value * fix typo in presigned url caching	2022-05-19 18:40:41 -07:00
Ilya Kreymer	4b2f89db91	k8s: support for using a pre-made persistent volume/claim for crawling, configurable via CRAWLER_PV_CLAIM, otherwise using emptyDir k8s: ability to set deployment scale for frontend as well	2022-03-15 11:18:23 -07:00
Ilya Kreymer	8ce7a9802b	backend quick fix: chart/config: use screencastPort, fixed collection name k8s: set pod to never restart to see logs	2022-03-14 11:42:53 -07:00
Ilya Kreymer	51a573ef1f	backend prod settings: - set WEB_CONCURRENCY env var to configure number of backend api workers for both docker and k8s - set via 'backend_workers' in values.yaml - also add 'rwp_base_url' to values.yaml - update containers to use public webrecorder/browsertrix-backend and webrecorder/browsertrix-frontend containers - make liveness, readiness and startup health checks more tolerant	2022-02-28 18:09:13 -08:00
Ilya Kreymer	9bd402fa17	New WS Endpoint for Watching Crawl (#152) * backend support for new watch system (#134): - support for watch via redis pubsub and websocket connection to backend - can support watch from any number of crawler instances to support scaled crawls - use /archives/{aid}/crawls/{crawl_id}/watch/ws websocket endpoint - ws: ignore graceful connectionclosedok exception, log other exceptions - set logging to info instead of debug for now (debug logs all ws traffic) - remove old watch apis in backend - remove old websocket routing to crawler instance for old watch system - oauth bearer check: support websockets, use websocket object if no request object - crawler args: replace --screencastPort with --screencastRedis	2022-02-22 10:33:10 -08:00
Ilya Kreymer	57e5b9fceb	k8s charts: update default resource usage in values.yaml add liveness probe for backend pod	2022-02-14 18:49:56 -08:00
Ilya Kreymer	2e2b8b329d	Add signing server via authsign (k8s only) (#107) - add k8s deployment of signing server, if 'signer.enabled' chart value is set - update ingress to provide access for 'signer.host' if signing server enabled to verify domain, run signing server itself on different port (also turn off ssl redirects to support signing server) - set WACZ_SIGN_URL and WACZ_SIGN_TOKEN (supported in browsertrix-crawler 0.5.0) - authsign deployment uses a volume to store current certs - add sample signer block, with signing disabled by default	2022-01-26 23:27:13 -08:00
Ilya Kreymer	b3ca501a19	helm chart: support cloud-based persistent volumes if value 'volume_storage_class' is specified. use PersistentVolumeClaim to create a persistent volume for each local service (mongo, minio, redis) when running in a cloud setup if cloud-specified volume storage class not specified, create default hostPath volume (eg. for minikube) lint: add default icon for chart	2022-01-15 19:34:31 -08:00
Ilya Kreymer	53beb84c01	Config superuser (#59) * backend: automatically create super user, fixes #57 - if SUPERUSER_EMAIL is set, superuser is created with `is_superuser` and `is_verified` settings, if user doesn't already exist. - if SUPERUSER_PASSWORD is set, the password for superuser is set, otherwise a random password is generated update sample SUPERUSER_EMAIL and SUPERUSER_PASSWORD in config file and chart. - ensure verification email is not sent if user already verified	2021-12-05 14:12:42 -08:00
Ilya Kreymer	11b797d535	Add global settings endpoint (#52) * backend: - add /api/settings endpoint for misc system-wide settings - setting 'registrationEnabled' if open registration should be enabled, set via REGISTRATION_ENABLED=1 env var - setting 'jwtTokenLifetimeMinutes' returns the jwt token expiry in seconds, configured in minutes via JWT_TOKEN_LIFETIME_MINUTES env var (default: 60)	2021-12-03 10:56:57 -08:00
Ilya Kreymer	d0b54dd752	Enable sending emails in K8S, trigger verification e-mail on registration. (#38) * k8s: support email configuration support sending reset password email fix for #32 * fastapi users: update to latest (8.1.2) send verification email upon registration * update to latest fastapi-users(8.1.2), refactor to use UserManager class ensure verification e-mail sent upon registration, w/o requiring separate API call fixes #32 * add email options to default chart/values.yaml * separate usermanager init from fastapi users init, fix for sending invite emails	2021-11-30 23:50:38 -08:00
Ilya Kreymer	3d4d7049a2	Misc backend fixes for cloud deployment (#26) * misc backend fixes: - fix running w/o local minio - ensure crawler image pull policy is configurable, loaded via chart value - use digitalocean repo for main backend image (for now) - add bucket_name to config only if using default bucket * enable all behaviors, support 'access_endpoint_url' for default storages * debugging: add 'no_delete_jobs' setting for k8s and docker to disable deletion of completed jobs	2021-11-25 11:58:26 -08:00
Ilya Kreymer	57a4b6b46f	add collections api: - collections defined by name per archive - can update collections with additional metadata (currently just description) - crawl config api accepts a list of collections by name, resolved to collection uids and stored in config - finished crawls also associated with collection list - /archives/{aid}/collections/{name} can list all crawl artifacts (wacz files) from a named collection (in frictionless data package-ish format) - /archives/{aid}/collections/$all lists all crawled artifacts for the archive readiness check: add /healthz endpoints for app and nginx ingress: add /data/ route to local bucket storage improvements: - for default storages, store path only, and prepend default storage access endpoint - collections api returns the paths using the storage access endpoint - define default storages as secrets in k8s (can support multiple), hard-coded in docker (only one for now)	2021-10-27 09:39:14 -07:00
Ilya Kreymer	c38e0b7bf7	use redis based queue instead of url for crawl done webhook update docker setup to support redis webhook, add consistent CRAWL_ARGS, additional fixes	2021-10-10 12:18:28 -07:00
Ilya Kreymer	4ae4005d74	add ingress + nginx container for better routing support screencasting to dynamically created service via nginx (k8s only thus far) add crawl /watch endpoint to enable watching, creates service if doesn't exist add crawl /running endpoint to check if crawl is running nginx auth check in place, but not yet enabled add k8s nginx.conf add missing chart files file reorg: move docker config to configs/ k8s: add readiness check for nginx and api containers for smoother reloading ensure service deleted along with job todo: update dockerman with screencast support	2021-10-09 23:47:29 -07:00
Ilya Kreymer	19879fe349	Storage + Data Model Refactor (fixes #3): - Add default vs custom (s3) storage - K8S: All storages correspond to secrets - K8S: Default storages inited via helm - K8S: Custom storage results in custom secret (per archive) - K8S: Don't add secret per crawl config - API for changing storage per archive - Docker: default storage just hard-coded from env vars (only one for now) - Validate custom storage via aiobotocore before confirming - Data Model: remove usage from users - Data Model: support adding multiple files per crawl for parallel crawls - Data Model: track completions for parallel crawls - Data Model: initial support for tags per crawl, add collection as 'coll' tag README fixes	2021-10-09 18:58:40 -07:00
Ilya Kreymer	b6d1e492d7	add redis for storing crawl state data! - supported in both docker and k8s - additional pods with same job id automatically use same crawl state in redis - support dynamic scaling (#2) via /scale endpoint - k8s job parallelism adjusted dynamically for running job (only supported in k8s so far)	2021-09-17 15:02:11 -07:00
Ilya Kreymer	20b19f932f	make crawlTimeout a per-crawlconfig property allow crawl complete/partial complete to update existing crawl state, eg. timeout enable handling backofflimitexceeded / deadlineexceeded failure, with possible success able to override the failure state filter out only active jobs in running crawls listing	2021-08-24 11:29:15 -07:00
Ilya Kreymer	ed27f3e3ee	job handling: - job watch: add watch loop for job failure (backofflimitexceeded) - set job retries + job timeout via chart values - sigterm starts graceful shutdown by default, including for timeout - use sigusr1 to switch to instant shutdown - update stop_crawl() to use new semantics	2021-08-23 21:22:01 -07:00
Ilya Kreymer	66c4e618eb	crawls work (#1), support for: - canceling a crawl (via sigterm) - stopping a crawl gracefully (via custom exec sigint)	2021-08-23 12:25:04 -07:00
Ilya Kreymer	627e9a6f14	cleanup crawl config, add separate 'runNow' field crawler: add cpu/memory limits minio: auto-create bucket for local minio	2021-08-19 14:15:21 -07:00
Ilya Kreymer	61a608bfbe	update models: - replace storages with archives, which have a single storage (for now) - crawls associated with archives - users belong to archive, with one admin user (if archive created by default) - update crawlconfig for latest browsertrix-crawler (0.4.4) - k8s: fix permissions for crawler role - k8s: fix minio service (now requiring two ports)	2021-08-18 16:53:49 -07:00
Ilya Kreymer	f77eaccf41	support committing to s3 storage move mongo into separate optional deployment along with minio support for configuring storages support for deleting crawls, associated config and secrets	2021-07-02 15:56:24 -07:00
Ilya Kreymer	a111bacfb5	add k8s support - working apis for adding crawls, removing crawls in mongo, mapped to k8s cronjobs - more complete crawl spec - option to start on-demand job from cronjobs - optional minio in separate deployment/service	2021-06-30 21:48:44 -07:00
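
Commit 5b6aa3bc95 (#256) notes that an external MongoDB is configured by setting MONGO_DB_URL directly, with the connection URL otherwise built from a username, password and mongo host. The following is a minimal Python sketch of that fallback, not the backend's actual code. Only MONGO_DB_URL appears in the log; MONGO_USER, MONGO_PASS and MONGO_HOST are hypothetical variable names chosen for illustration.

```python
import os
from urllib.parse import quote_plus


def resolve_mongo_url(env=None):
    """Return MONGO_DB_URL if set, else build a URL from credentials and host.

    Hypothetical sketch: MONGO_USER, MONGO_PASS and MONGO_HOST are assumed
    names; only MONGO_DB_URL is named in the commit message.
    """
    if env is None:
        env = os.environ
    # A custom URL (e.g. for an external MongoDB) takes precedence.
    url = env.get("MONGO_DB_URL")
    if url:
        return url
    # Otherwise assemble the URL; quote_plus escapes reserved characters
    # such as '@' or ':' that may appear in the username or password.
    user = quote_plus(env["MONGO_USER"])
    password = quote_plus(env["MONGO_PASS"])
    host = env["MONGO_HOST"]
    # No port given, so MongoDB clients fall back to the default 27017.
    return f"mongodb://{user}:{password}@{host}/"
```

For example, `resolve_mongo_url({"MONGO_USER": "root", "MONGO_PASS": "p@ss", "MONGO_HOST": "local-mongo"})` yields `mongodb://root:p%40ss@local-mongo/`, while any non-empty MONGO_DB_URL is returned unchanged.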

130 Commits