- ingress: fix proxying /data to minio; use a separate ingress that proxies the correct host so presigned urls work
- presigning: determine whether to sign against the endpoint url (minio) or the access endpoint (cloud bucket) based on whether an access endpoint is provided; set a bool on the storage object (see the sketch after this list)
- chart: fix incorrect indentation of storageClassName configs
- ingress: make 'ingress_class' configurable (set to 'public' for microk8s, defaults to 'nginx')
- minio: use an older minio image that supports the legacy fs-based setup (for now)
- nginx service: add 'nginx_service_use_node_port' config setting: if true, use NodePort for the frontend, otherwise use the default (ClusterIP); applies only to the frontend / nginx service
- chart: no longer change the service type for other services
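A minimal sketch of the endpoint-selection rule from the presigning item above; the field names are illustrative, not the actual storage schema:

```python
# Hypothetical storage model: field names are assumptions for illustration only.
from dataclasses import dataclass
from typing import Optional


@dataclass
class S3Storage:
    endpoint_url: str                           # signing endpoint, e.g. local minio
    access_endpoint_url: Optional[str] = None   # public access endpoint, e.g. cloud bucket
    use_access_for_presign: bool = False

    def presign_endpoint(self) -> str:
        # if an access endpoint is provided, presign against it so the returned
        # urls are reachable from the browser; otherwise sign against the
        # internal (minio) endpoint url
        self.use_access_for_presign = bool(self.access_endpoint_url)
        return self.access_endpoint_url or self.endpoint_url
```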
* add exclusion api, fixes #311
add new apis: `POST crawls/{crawl_id}/exclusion?regex=...` and `DELETE crawls/{crawl_id}/exclusion?regex=...` which will:
- create a new config with 'regex' added as an exclusion (deleting or deactivating the previous config) OR with it removed as an exclusion
- update crawl to point to new config
- update statefulset to point to new config, causing crawler pods to restart
- when adding only: filter urls matching 'regex' out of both the queue and the seen list (currently a bit slow; see the sketch after this list)
- return 400 if the exclusion already exists when adding, or does not exist when removing
- api reads redis list in reverse to match how exclusion queue is used
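A rough sketch of the queue/seen-list filtering step referenced above, assuming an async redis client created with decode_responses=True; the key names and queue entry format are illustrative:

```python
import json
import re


async def filter_queue(redis, crawl_id: str, regex: str):
    """remove urls matching 'regex' from the crawl queue and seen list"""
    rx = re.compile(regex)
    queue_key = f"{crawl_id}:q"   # assumed queue key
    seen_key = f"{crawl_id}:s"    # assumed seen-set key

    # walk the queue in reverse to match how the exclusion queue is consumed;
    # removing by index from the end keeps lower indices stable
    length = await redis.llen(queue_key)
    for idx in range(length - 1, -1, -1):
        entry = await redis.lindex(queue_key, idx)
        if entry and rx.search(json.loads(entry).get("url", "")):
            # per-entry removal by value is what makes this "a bit slow"
            await redis.lrem(queue_key, 1, entry)

    # also drop matching urls from the seen set
    for url in await redis.smembers(seen_key):
        if rx.search(url):
            await redis.srem(seen_key, url)
```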
- regression fix: ensure the correct signals are sent to stop a crawl (SIGUSR1 + SIGTERM)
- crawl stop: if the crawl is still running after 60 seconds, allow the signal to be resent (see the sketch after this list)
- regression fix: ensure crawling with profile is working in k8s
- ensure profile browser DELETE command is working
- ensure profile browser job expires if no initial ping
- logging: print exception for base job if init fails
- only send the signal when stopping; not needed when canceling, as the pods/containers will be removed
- refactor stop/cancel handling to be unified in manager, separate in job
- when stopping / graceful shutdown, return false if sending signal fails
- return success=true in the json response if and only if the stop/cancel actually succeeds, otherwise return an 'error' message; should fix #270
- allow canceling after stopping / if stopping fails
- ensure the finished time is set in case of cancellation before the crawl starts, should fix #273
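An illustrative sketch of the graceful-stop semantics described above; the method and attribute names are hypothetical, not the actual job API:

```python
import time


class StoppableCrawlJob:
    GRACEFUL_STOP_RETRY_SECS = 60

    def __init__(self):
        self._last_stop_attempt = 0.0

    async def _send_signal(self, name: str) -> bool:
        """send the named signal to the crawler; return False on failure"""
        raise NotImplementedError

    async def graceful_stop(self) -> dict:
        # if the crawl is still running 60 seconds after the last attempt,
        # allow the stop signal to be resent
        now = time.monotonic()
        if now - self._last_stop_attempt < self.GRACEFUL_STOP_RETRY_SECS:
            return {"success": False, "error": "stop_already_requested"}

        self._last_stop_attempt = now

        # success=true only if the signal was actually delivered
        if not await self._send_signal("SIGUSR1"):
            return {"success": False, "error": "error_sending_stop_signal"}

        return {"success": True}
```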
* backend: refactor swarm support to also support podman (#260)
- implement podman support as subclass of swarm deployment
- podman is used when 'RUNTIME=podman' env var is set
- podman socket is mapped instead of docker socket
- podman-compose is used instead of docker-compose (docker-compose works with podman but does not support secrets; podman-compose does)
- separate cli utils into SwarmRunner and PodmanRunner, which extends it (see the sketch after this list)
- using config.yaml and config.env, both copied from sample versions
- work on simplifying config: add docker-compose.podman.yml and docker-compose.swarm.yml, plus signing and debug configs, in ./configs
- add {build,run,stop}-{swarm,podman}.sh in scripts dir
- add init-configs, only copy if configs don't exist
- build the local image using the current version of podman, to support both podman 3.x and 4.x
- additional fixes after testing podman on centos
- docs: update Deployment.md to cover swarm, podman, k8s deployment
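A simplified sketch of the SwarmRunner / PodmanRunner split described above; only the compose-command selection is shown and the details are illustrative:

```python
import os
import subprocess


class SwarmRunner:
    """wraps docker swarm / docker-compose cli commands"""

    compose_cmd = "docker-compose"

    def run_compose(self, *args: str) -> int:
        return subprocess.call([self.compose_cmd, *args])


class PodmanRunner(SwarmRunner):
    """podman variant: podman-compose supports secrets, docker-compose does not"""

    compose_cmd = "podman-compose"


def get_runner() -> SwarmRunner:
    # podman support is enabled by setting RUNTIME=podman
    if os.environ.get("RUNTIME") == "podman":
        return PodmanRunner()
    return SwarmRunner()
```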
* k8s: add tolerations for 'nodeType=crawling:NoSchedule' to allow crawling to be scheduled on designated nodes, for crawler and profile jobs and statefulsets
* add affinity for 'nodeType=crawling' on crawling and profile browser statefulsets
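A hedged sketch of the scheduling constraints being added, written here as the dicts that would appear in the rendered pod specs; whether the node affinity is preferred or required is an assumption:

```python
# toleration allowing pods onto nodes tainted nodeType=crawling:NoSchedule
CRAWLING_TOLERATION = {
    "key": "nodeType",
    "operator": "Equal",
    "value": "crawling",
    "effect": "NoSchedule",
}

# node affinity steering crawler / profile browser pods toward crawling nodes
CRAWLING_NODE_AFFINITY = {
    "nodeAffinity": {
        "preferredDuringSchedulingIgnoredDuringExecution": [
            {
                "weight": 1,
                "preference": {
                    "matchExpressions": [
                        {"key": "nodeType", "operator": "In", "values": ["crawling"]}
                    ]
                },
            }
        ]
    }
}
```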
* refactor crawljob: combine crawl_updater logic into base crawl_job
* increment the new 'crawlAttemptCount' counter on the crawlconfig when a crawl is started, not necessarily finished, to avoid deleting configs that had attempted but unfinished crawls
* better external mongodb support: use MONGO_DB_URL to set a custom url directly, otherwise build the url from the username, password, and mongo host (see the sketch below)
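A sketch of the connection-url fallback described above; env var names other than MONGO_DB_URL are assumptions:

```python
import os


def get_mongo_db_url() -> str:
    # an explicitly set full url wins (external / hosted mongodb)
    url = os.environ.get("MONGO_DB_URL")
    if url:
        return url

    # otherwise build the url from the individual settings
    # (these variable names are illustrative)
    user = os.environ.get("MONGO_INITDB_ROOT_USERNAME", "root")
    password = os.environ.get("MONGO_INITDB_ROOT_PASSWORD", "example")
    host = os.environ.get("MONGO_HOST", "localhost")
    return f"mongodb://{user}:{password}@{host}:27017"
```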
- prefer deploying crawler, redis, and job to the same zone
- prefer deploying crawler and job together via crawler node type, redis via redis node type (all optional)
- use python-on-whales to invoke the docker cli directly, creating a docker stack for each crawl or profile browser
- configure storages via storages.yaml secret
- add crawl_job, profile_job, splitting into base and k8s/swarm implementations
- split manager into base crawlmanager and k8s/swarm implementations
- swarm: load the initial scale from the db to avoid modifying fixed configs; in k8s, load it from the configmap
- swarm: support scheduled jobs via swarm-cronjob service
- remove docker dependencies (aiodocker, apscheduler, scheduling)
- swarm: when using local minio, expose via /data/ route in nginx via extra include (in k8s, include dir is empty and routing handled via ingress)
- k8s: cleanup minio chart: move init containers to minio.yaml
- swarm: stateful set implementation made consistent with k8s scaling:
  - don't use service replicas
  - create a unique service with '-N' appended and allocate a unique volume for each replica (see the sketch after this list)
  - allows crawl containers to be restarted without losing data
  - add a volume pruning background service, as volumes can only be deleted after a service shuts down fully
- watch: fully simplify routing, route via replica index instead of ip for both k8s and swarm
- rename network btrix-cloud-net -> btrix-net to avoid conflict with compose network
- use statefulsets instead of deployments for mongo, redis, signer
- use k8s job + statefulset for running crawls
- use separate statefulset for crawl (scaled) and single-replica redis stateful set
- move crawl job update logic to crawl_updater
- remove shared redis chart
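An illustrative sketch of the per-replica naming scheme used to emulate a stateful set on swarm (see the items above); the service and volume names are hypothetical:

```python
def replica_names(crawl_id: str, scale: int):
    """yield (service_name, volume_name) for replicas 0..scale-1"""
    for idx in range(scale):
        # each replica gets its own service ('-N' suffix) and its own volume,
        # so a crawl container can be restarted without losing its data
        yield f"crawl-{crawl_id}-{idx}", f"crawl-data-{crawl_id}-{idx}"
```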
package refactor:
- move shared code to 'btrixcloud'
- move k8s to 'btrixcloud.k8s'
- move docker to 'btrixcloud.docker'