Commit Graph

266 Commits

Author SHA1 Message Date
Ilya Kreymer
05c1129fb8
Frontend + Backend Integrated Deployment (K8s only) (#45)
* support running backend + frontend together on k8s
* split nginx container into separate frontend service, which uses nignx-base image and the static frontend files
* add nginx-based frontend image to docker-compose build (for building only, docker-based combined deployment not yet supported)

* backend:
- fix paths for email templates
- chart: support '--set backend_only=1' and '--set frontend_only=1' to only force deploy one or the other
- run backend from root /api in uvicorn
2021-12-03 10:17:22 -08:00
Ilya Kreymer
d0b54dd752 Enable sending emails in K8S, trigger verification e-mail on registration. (#38)
* k8s: support email configuration
support sending reset password email
fix for #32

* fastapi users: update to latest (8.1.2)
send verification email upon registration

* update to latest fastapi-users(8.1.2), refactor to use UserManager class
ensure verification e-mail sent upon registration, w/o requiring separate apicall
fixes #32

* add email options to default chart/values.yaml

* separate usermanager init from fastapi users init, fix for sending invite emails
2021-11-30 23:50:38 -08:00
Ilya Kreymer
3d4d7049a2
Misc backend fixes for cloud deployment (#26)
* misc backend fixes:
- fix running w/o local minio
- ensure crawler image pull policy is configurable, loaded via chart value
- use digitalocean repo for main backend image (for now)
- add bucket_name to config only if using default bucket

* enable all behaviors, support 'access_endpoint_url' for default storages

* debugging: add 'no_delete_jobs' setting for k8s and docker to disable deletion of completed jobs
2021-11-25 11:58:26 -08:00
Ilya Kreymer
57a4b6b46f add collections api:
- collections defined by name per archive
- can update collections with additional metadata (currently just description)
- crawl config api accepts a list of collections by name, resolved to collection uids and stored in config
- finished crawls also associated with collection list
- /archives/{aid}/collections/{name} can list all crawl artifacts (wacz files) from a named collection (in frictionless data package-ish format)
- /archives/{aid}/collections/$all lists all crawled artifacts for the archive

readiness check: add /healthz endpoints for app and nginx
ingress: add /data/ route to local bucket

storage improvements:
- for default storages, store path only, and prepend default storage access endpoint
- collections api returns the paths using the storage access endpoint
- define default storages as secrets in k8s (can support multiple), hard-coded in docker (only one for now)
2021-10-27 09:39:14 -07:00
Ilya Kreymer
c38e0b7bf7 use redis based queue instead of url for crawl done webhook
update docker setup to support redis webhook, add consistent CRAWL_ARGS, additional fixes
2021-10-10 12:18:28 -07:00
Ilya Kreymer
4ae4005d74 add ingress + nginx container for better routing
support screencasting to dynamically created service via nginx (k8s only thus far)
add crawl /watch endpoint to enable watching, creates service if doesn't exist
add crawl /running endpoint to check if crawl is running
nginx auth check in place, but not yet enabled
add k8s nginx.conf
add missing chart files
file reorg: move docker config to configs/
k8s: add readiness check for nginx and api containers for smoother reloading
ensure service deleted along with job
todo: update dockerman with screencast support
2021-10-09 23:47:29 -07:00
Ilya Kreymer
19879fe349 Storage + Data Model Refactor (fixes #3):
- Add default vs custom (s3) storage
 - K8S: All storages correspond to secrets
 - K8S: Default storages inited via helm
 - K8S: Custom storage results in custom secret (per archive)
 - K8S: Don't add secret per crawl config
 - API for changing storage per archive
 - Docker: default storage just hard-coded from env vars (only one for now)
 - Validate custom storage via aiobotocore before confirming
 - Data Model: remove usage from users
 - Data Model: support adding multiple files per crawl for parallel crawls
 - Data Model: track completions for parallel crawls
 - Data Model: initial support for tags per crawl, add collection as 'coll' tag

README fixes
2021-10-09 18:58:40 -07:00
Ilya Kreymer
b6d1e492d7 add redis for storing crawl state data!
- supported in both docker and k8s
- additional pods with same job id automatically use same crawl state in redis
- support dynamic scaling (#2) via /scale endpoint - k8s job parallelism adjusted dynamically for running job (only supported in k8s so far)
2021-09-17 15:02:11 -07:00
Ilya Kreymer
20b19f932f make crawlTimeout a per-crawconfig property
allow crawl complete/partial complete to update existing crawl state, eg. timeout
enable handling backofflimitexceeded / deadlineexceeded failure, with possible success able to override the failure state
filter out only active jobs in running crawls listing
2021-08-24 11:29:15 -07:00
Ilya Kreymer
ed27f3e3ee job handling:
- job watch: add watch loop for job failure (backofflimitexceeded)
- set job retries + job timeout via chart values
- sigterm starts graceful shutdown by default, including for timeout
- use sigusr1 to switch to instant shutdown
- update stop_crawl() to use new semantics
2021-08-23 21:22:01 -07:00
Ilya Kreymer
7146e054a4 crawls work (#1):
- support listing existing crawls
- add 'schedule' and 'manual' annotations to jobs, store in Crawl obj
- ensure manual jobs are deleted when completed
- support deleting crawls by id (but not data)
- rename running crawl delete to '/cancel'

change paths for local minio/mongo to /tmp
2021-08-23 18:01:29 -07:00
Ilya Kreymer
66c4e618eb crawls work (#1), support for:
- canceling a crawl (via sigterm)
- stopping a crawl gracefully (via custom exec sigint)
2021-08-23 12:25:04 -07:00
Ilya Kreymer
627e9a6f14 cleanup crawl config, add separate 'runNow' field
crawler: add cpu/memory limits
minio: auto-create bucket for local minio
2021-08-19 14:15:21 -07:00
Ilya Kreymer
61a608bfbe update models:
- replace storages with archives, which have a single storage (for now)
- crawls associated with archives
- users below to archive, with one admin user (if archive created by default)
- update crawlconfig for latest browsertrix-crawler (0.4.4)
- k8s: fix permissions for crawler role
- k8s: fix minio service (now requiring two ports)
2021-08-18 16:53:49 -07:00
Ilya Kreymer
f77eaccf41 support committing to s3 storage
move mongo into separate optional deployment along with minio
support for configuring storages
support for deleting crawls, associated config and secrets
2021-07-02 15:56:24 -07:00
Ilya Kreymer
a111bacfb5 add k8s support
- working apis for adding crawls, removing crawls in mongo, mapped to k8s cronjobs
- more complete crawl spec
- option to start on-demand job from cronjobs
- optional minio in separate deployment/service
2021-06-30 21:48:44 -07:00