browsertrix

Author	SHA1	Message	Date
Ilya Kreymer	05c1129fb8	Frontend + Backend Integrated Deployment (K8s only) (#45) * support running backend + frontend together on k8s * split nginx container into separate frontend service, which uses nginx-base image and the static frontend files * add nginx-based frontend image to docker-compose build (for building only, docker-based combined deployment not yet supported) * backend: - fix paths for email templates - chart: support '--set backend_only=1' and '--set frontend_only=1' to only force deploy one or the other - run backend from root /api in uvicorn	2021-12-03 10:17:22 -08:00
Ilya Kreymer	d0b54dd752	Enable sending emails in K8S, trigger verification e-mail on registration. (#38) * k8s: support email configuration support sending reset password email fix for #32 * fastapi users: update to latest (8.1.2) send verification email upon registration * update to latest fastapi-users(8.1.2), refactor to use UserManager class ensure verification e-mail sent upon registration, w/o requiring separate apicall fixes #32 * add email options to default chart/values.yaml * separate usermanager init from fastapi users init, fix for sending invite emails	2021-11-30 23:50:38 -08:00
Ilya Kreymer	3d4d7049a2	Misc backend fixes for cloud deployment (#26) * misc backend fixes: - fix running w/o local minio - ensure crawler image pull policy is configurable, loaded via chart value - use digitalocean repo for main backend image (for now) - add bucket_name to config only if using default bucket * enable all behaviors, support 'access_endpoint_url' for default storages * debugging: add 'no_delete_jobs' setting for k8s and docker to disable deletion of completed jobs	2021-11-25 11:58:26 -08:00
Ilya Kreymer	57a4b6b46f	add collections api: - collections defined by name per archive - can update collections with additional metadata (currently just description) - crawl config api accepts a list of collections by name, resolved to collection uids and stored in config - finished crawls also associated with collection list - /archives/{aid}/collections/{name} can list all crawl artifacts (wacz files) from a named collection (in frictionless data package-ish format) - /archives/{aid}/collections/$all lists all crawled artifacts for the archive readiness check: add /healthz endpoints for app and nginx ingress: add /data/ route to local bucket storage improvements: - for default storages, store path only, and prepend default storage access endpoint - collections api returns the paths using the storage access endpoint - define default storages as secrets in k8s (can support multiple), hard-coded in docker (only one for now)	2021-10-27 09:39:14 -07:00
Ilya Kreymer	c38e0b7bf7	use redis based queue instead of url for crawl done webhook update docker setup to support redis webhook, add consistent CRAWL_ARGS, additional fixes	2021-10-10 12:18:28 -07:00
Ilya Kreymer	4ae4005d74	add ingress + nginx container for better routing support screencasting to dynamically created service via nginx (k8s only thus far) add crawl /watch endpoint to enable watching, creates service if doesn't exist add crawl /running endpoint to check if crawl is running nginx auth check in place, but not yet enabled add k8s nginx.conf add missing chart files file reorg: move docker config to configs/ k8s: add readiness check for nginx and api containers for smoother reloading ensure service deleted along with job todo: update dockerman with screencast support	2021-10-09 23:47:29 -07:00
Ilya Kreymer	19879fe349	Storage + Data Model Refactor (fixes #3): - Add default vs custom (s3) storage - K8S: All storages correspond to secrets - K8S: Default storages inited via helm - K8S: Custom storage results in custom secret (per archive) - K8S: Don't add secret per crawl config - API for changing storage per archive - Docker: default storage just hard-coded from env vars (only one for now) - Validate custom storage via aiobotocore before confirming - Data Model: remove usage from users - Data Model: support adding multiple files per crawl for parallel crawls - Data Model: track completions for parallel crawls - Data Model: initial support for tags per crawl, add collection as 'coll' tag README fixes	2021-10-09 18:58:40 -07:00
Ilya Kreymer	b6d1e492d7	add redis for storing crawl state data! - supported in both docker and k8s - additional pods with same job id automatically use same crawl state in redis - support dynamic scaling (#2) via /scale endpoint - k8s job parallelism adjusted dynamically for running job (only supported in k8s so far)	2021-09-17 15:02:11 -07:00
Ilya Kreymer	20b19f932f	make crawlTimeout a per-crawlconfig property allow crawl complete/partial complete to update existing crawl state, eg. timeout enable handling backofflimitexceeded / deadlineexceeded failure, with possible success able to override the failure state filter out only active jobs in running crawls listing	2021-08-24 11:29:15 -07:00
Ilya Kreymer	ed27f3e3ee	job handling: - job watch: add watch loop for job failure (backofflimitexceeded) - set job retries + job timeout via chart values - sigterm starts graceful shutdown by default, including for timeout - use sigusr1 to switch to instant shutdown - update stop_crawl() to use new semantics	2021-08-23 21:22:01 -07:00
Ilya Kreymer	7146e054a4	crawls work (#1): - support listing existing crawls - add 'schedule' and 'manual' annotations to jobs, store in Crawl obj - ensure manual jobs are deleted when completed - support deleting crawls by id (but not data) - rename running crawl delete to '/cancel' change paths for local minio/mongo to /tmp	2021-08-23 18:01:29 -07:00
Ilya Kreymer	66c4e618eb	crawls work (#1), support for: - canceling a crawl (via sigterm) - stopping a crawl gracefully (via custom exec sigint)	2021-08-23 12:25:04 -07:00
Ilya Kreymer	627e9a6f14	cleanup crawl config, add separate 'runNow' field crawler: add cpu/memory limits minio: auto-create bucket for local minio	2021-08-19 14:15:21 -07:00
Ilya Kreymer	61a608bfbe	update models: - replace storages with archives, which have a single storage (for now) - crawls associated with archives - users belong to archive, with one admin user (if archive created by default) - update crawlconfig for latest browsertrix-crawler (0.4.4) - k8s: fix permissions for crawler role - k8s: fix minio service (now requiring two ports)	2021-08-18 16:53:49 -07:00
Ilya Kreymer	f77eaccf41	support committing to s3 storage move mongo into separate optional deployment along with minio support for configuring storages support for deleting crawls, associated config and secrets	2021-07-02 15:56:24 -07:00
Ilya Kreymer	a111bacfb5	add k8s support - working apis for adding crawls, removing crawls in mongo, mapped to k8s cronjobs - more complete crawl spec - option to start on-demand job from cronjobs - optional minio in separate deployment/service	2021-06-30 21:48:44 -07:00

266 Commits