browsertrix

Author	SHA1	Message	Date
Ilya Kreymer	c38e0b7bf7	use redis based queue instead of url for crawl done webhook; update docker setup to support redis webhook, add consistent CRAWL_ARGS, additional fixes	2021-10-10 12:18:28 -07:00
Ilya Kreymer	4ae4005d74	add ingress + nginx container for better routing; support screencasting to dynamically created service via nginx (k8s only thus far); add crawl /watch endpoint to enable watching, creates service if doesn't exist; add crawl /running endpoint to check if crawl is running; nginx auth check in place, but not yet enabled; add k8s nginx.conf; add missing chart files; file reorg: move docker config to configs/; k8s: add readiness check for nginx and api containers for smoother reloading; ensure service deleted along with job; todo: update dockerman with screencast support	2021-10-09 23:47:29 -07:00
Ilya Kreymer	19879fe349	Storage + Data Model Refactor (fixes #3): - Add default vs custom (s3) storage - K8S: All storages correspond to secrets - K8S: Default storages inited via helm - K8S: Custom storage results in custom secret (per archive) - K8S: Don't add secret per crawl config - API for changing storage per archive - Docker: default storage just hard-coded from env vars (only one for now) - Validate custom storage via aiobotocore before confirming - Data Model: remove usage from users - Data Model: support adding multiple files per crawl for parallel crawls - Data Model: track completions for parallel crawls - Data Model: initial support for tags per crawl, add collection as 'coll' tag; README fixes	2021-10-09 18:58:40 -07:00
Ilya Kreymer	b6d1e492d7	add redis for storing crawl state data! - supported in both docker and k8s - additional pods with same job id automatically use same crawl state in redis - support dynamic scaling (#2) via /scale endpoint - k8s job parallelism adjusted dynamically for running job (only supported in k8s so far)	2021-09-17 15:02:11 -07:00
Ilya Kreymer	ed27f3e3ee	job handling: - job watch: add watch loop for job failure (backofflimitexceeded) - set job retries + job timeout via chart values - sigterm starts graceful shutdown by default, including for timeout - use sigusr1 to switch to instant shutdown - update stop_crawl() to use new semantics	2021-08-23 21:22:01 -07:00
Ilya Kreymer	7146e054a4	crawls work (#1): - support listing existing crawls - add 'schedule' and 'manual' annotations to jobs, store in Crawl obj - ensure manual jobs are deleted when completed - support deleting crawls by id (but not data) - rename running crawl delete to '/cancel'; change paths for local minio/mongo to /tmp	2021-08-23 18:01:29 -07:00
Ilya Kreymer	627e9a6f14	cleanup crawl config, add separate 'runNow' field; crawler: add cpu/memory limits; minio: auto-create bucket for local minio	2021-08-19 14:15:21 -07:00
Ilya Kreymer	61a608bfbe	update models: - replace storages with archives, which have a single storage (for now) - crawls associated with archives - users belong to archive, with one admin user (if archive created by default) - update crawlconfig for latest browsertrix-crawler (0.4.4) - k8s: fix permissions for crawler role - k8s: fix minio service (now requiring two ports)	2021-08-18 16:53:49 -07:00
Ilya Kreymer	f77eaccf41	support committing to s3 storage; move mongo into separate optional deployment along with minio; support for configuring storages; support for deleting crawls, associated config and secrets	2021-07-02 15:56:24 -07:00
Ilya Kreymer	a111bacfb5	add k8s support - working apis for adding crawls, removing crawls in mongo, mapped to k8s cronjobs - more complete crawl spec - option to start on-demand job from cronjobs - optional minio in separate deployment/service	2021-06-30 21:48:44 -07:00


110 Commits