browsertrix

Author	SHA1	Message	Date
Ilya Kreymer	f55f84c60b	backend: - crawlconfigs cleanup: simplify get_crawl_configs api - return CrawlConfigOut for single crawlconfig api endpoint, include currCrawlId	2022-01-22 17:41:37 -08:00
Ilya Kreymer	77aa5213f2	quickfix: typo fix, return config, not archive, fixes #96	2022-01-22 17:21:29 -08:00
Ilya Kreymer	b506442b21	backend api: add curr crawl to crawlconfig listing (#95 ) * backend api: add current crawl id to crawlconfig listing - model: add 'currCrawlId' to CrawlConfig model - output: add response model to /crawlconfigs api response to show correct openapi model - rename crawl_configs -> crawlConfigs for consistency	2022-01-22 13:52:46 -08:00
Ilya Kreymer	88f1689e0e	crawlconfig: add 'name' property to crawl config superuser init: don't check invite token for verified superuser (automatic init) fix formatting	2022-01-15 19:06:48 -08:00
Ilya Kreymer	c561fe3af4	Support Invite Info APIs (#82 ) * backend: support exposing info about a particular invite, fixes part of #35 new apis are: - GET /users/invite/{token}?email={email} - no auth needed, get invite to new user - GET /users/me/invite/{token} - with auth, to get invite to join an archive for an existing user * get archive.name as well if invite is adding to an archive * first camelCase typo	2022-01-14 22:53:02 -08:00
Ilya Kreymer	53beb84c01	Config superuser (#59 ) * backend: automatically create super user, fixes #57 - if SUPERUSER_EMAIL is set, superuser is created with `is_superuser` and `is_verified` settings, if user doesn't already exist. - if SUPERUSER_PASSWORD is set, the password for superuser is set, otherwise a random password is generated update sample SUPERUSER_EMAIL and SUPERUSER_PASSWORD in config file and chart. - ensure verification email is not sent if user already verified	2021-12-05 14:12:42 -08:00
Ilya Kreymer	eaf8055063	Support unified docker + k8s deployment (#58 ) - adapt nginx config to work both in docker and k8s, using env vars to set urls backend: additional fixes: - use env vars with nginx config - fix settings api route - when sending e-mail, use the Host header for verification urls when available - prepare Dockerfile with full build from scratch in image, (disabled 'yarn install' for faster builds for now) - fix accept invite api for existing user to /archives/accept-invite/{token}	2021-12-05 13:02:26 -08:00
Ilya Kreymer	87c5505c43	Backend Invite System Refactor (#53 ) * backend: - refactor invite system, move to separate InviteOps object, used by archives and user - supporting three invite use cases: 1) superuser invites any user not registered, not added to any archive 2) archive admin invites any user not registered, add to one of their archives 3) archive admin invites existing registered user, add to one of their archives - support superadmin invite via /users/invite (fixes #37) - superadmin invite has no archive set and does not add user to archive - don't send verification email when accepting from invite, fixes #50 - use different email template / accept url for existing user invite, eg, `/invite/accept/` - fix default token value in chart	2021-12-04 12:14:28 -08:00
Ilya Kreymer	11b797d535	Add global settings endpoint (#52 ) * backend: - add /api/settings endpoint for misc system-wide settings - setting 'registrationEnabled' if open registration should be enabled, set via REGISTRATION_ENABLED=1 env var - setting 'jwtTokenLifetimeMinutes' returns the jwt token expiry in seconds, configured in minutes via JWT_TOKEN_LIFETIME_MINUTES env var (default: 60)	2021-12-03 10:56:57 -08:00
Ilya Kreymer	05c1129fb8	Frontend + Backend Integrated Deployment (K8s only) (#45 ) * support running backend + frontend together on k8s * split nginx container into separate frontend service, which uses nginx-base image and the static frontend files * add nginx-based frontend image to docker-compose build (for building only, docker-based combined deployment not yet supported) * backend: - fix paths for email templates - chart: support '--set backend_only=1' and '--set frontend_only=1' to only force deploy one or the other - run backend from root /api in uvicorn	2021-12-03 10:17:22 -08:00
Ilya Kreymer	081d6f8519	User Display Name Support + Token Refresh Support (#44 ) * backend api/data model improvements: - add 'name' property to user, can be set on registration, fixes #43 - in archive user list, include 'name' and 'role' for each user - don't include is_* property in user create/register and update - add /auth/jwt/refresh endpoint for refreshing token, fixes #34, support for #22 * allow jwt token lifetime to be settable via JWT_LIFETIME env var (default 3600)	2021-12-01 18:55:10 -08:00
Ilya Kreymer	d0b54dd752	Enable sending emails in K8S, trigger verification e-mail on registration. (#38 ) * k8s: support email configuration support sending reset password email fix for #32 * fastapi users: update to latest (8.1.2) send verification email upon registration * update to latest fastapi-users(8.1.2), refactor to use UserManager class ensure verification e-mail sent upon registration, w/o requiring separate apicall fixes #32 * add email options to default chart/values.yaml * separate usermanager init from fastapi users init, fix for sending invite emails	2021-11-30 23:50:38 -08:00
Ilya Kreymer	3d4d7049a2	Misc backend fixes for cloud deployment (#26 ) * misc backend fixes: - fix running w/o local minio - ensure crawler image pull policy is configurable, loaded via chart value - use digitalocean repo for main backend image (for now) - add bucket_name to config only if using default bucket * enable all behaviors, support 'access_endpoint_url' for default storages * debugging: add 'no_delete_jobs' setting for k8s and docker to disable deletion of completed jobs	2021-11-25 11:58:26 -08:00
Ilya Kreymer	57a4b6b46f	add collections api: - collections defined by name per archive - can update collections with additional metadata (currently just description) - crawl config api accepts a list of collections by name, resolved to collection uids and stored in config - finished crawls also associated with collection list - /archives/{aid}/collections/{name} can list all crawl artifacts (wacz files) from a named collection (in frictionless data package-ish format) - /archives/{aid}/collections/$all lists all crawled artifacts for the archive readiness check: add /healthz endpoints for app and nginx ingress: add /data/ route to local bucket storage improvements: - for default storages, store path only, and prepend default storage access endpoint - collections api returns the paths using the storage access endpoint - define default storages as secrets in k8s (can support multiple), hard-coded in docker (only one for now)	2021-10-27 09:39:14 -07:00
Ilya Kreymer	c38e0b7bf7	use redis based queue instead of url for crawl done webhook update docker setup to support redis webhook, add consistent CRAWL_ARGS, additional fixes	2021-10-10 12:18:28 -07:00
Ilya Kreymer	4ae4005d74	add ingress + nginx container for better routing support screencasting to dynamically created service via nginx (k8s only thus far) add crawl /watch endpoint to enable watching, creates service if doesn't exist add crawl /running endpoint to check if crawl is running nginx auth check in place, but not yet enabled add k8s nginx.conf add missing chart files file reorg: move docker config to configs/ k8s: add readiness check for nginx and api containers for smoother reloading ensure service deleted along with job todo: update dockerman with screencast support	2021-10-09 23:47:29 -07:00
Ilya Kreymer	19879fe349	Storage + Data Model Refactor (fixes #3 ): - Add default vs custom (s3) storage - K8S: All storages correspond to secrets - K8S: Default storages inited via helm - K8S: Custom storage results in custom secret (per archive) - K8S: Don't add secret per crawl config - API for changing storage per archive - Docker: default storage just hard-coded from env vars (only one for now) - Validate custom storage via aiobotocore before confirming - Data Model: remove usage from users - Data Model: support adding multiple files per crawl for parallel crawls - Data Model: track completions for parallel crawls - Data Model: initial support for tags per crawl, add collection as 'coll' tag README fixes	2021-10-09 18:58:40 -07:00
Ilya Kreymer	b6d1e492d7	add redis for storing crawl state data! - supported in both docker and k8s - additional pods with same job id automatically use same crawl state in redis - support dynamic scaling (#2) via /scale endpoint - k8s job parallelism adjusted dynamically for running job (only supported in k8s so far)	2021-09-17 15:02:11 -07:00
Ilya Kreymer	223658cfa2	misc tweaks: - better error handling for not found resources, ensure 404 - typo in k8smanager - add pylintrc - ensure manual jobs are deleted when complete - fix typos, reformat	2021-08-25 18:34:49 -07:00
Ilya Kreymer	9a3356ad0d	add missing scheduler!	2021-08-25 16:18:53 -07:00
Ilya Kreymer	36fb01cbdf	docker-compose: use fixed network name	2021-08-25 16:04:34 -07:00
Ilya Kreymer	60b48ee8a6	dockermanager + scheduler: - run as child process using aioprocessing - cleanup: support cleanup of orphaned containers - timeout: support crawlTimeout via check in cleanup loop - support crawl listing + crawl stopping	2021-08-25 15:28:57 -07:00
Ilya Kreymer	b417d7c185	docker manager: support scheduling with apscheduler and separate 'scheduler' process	2021-08-25 12:21:03 -07:00
Ilya Kreymer	91e9fc8699	dockerman: initial pass - support for creating, deleting crawlconfigs, running crawls on-demand - config stored in volume - listen to docker events and clean up containers when they exit	2021-08-24 22:49:06 -07:00
Ilya Kreymer	20b19f932f	make crawlTimeout a per-crawlconfig property allow crawl complete/partial complete to update existing crawl state, eg. timeout enable handling backofflimitexceeded / deadlineexceeded failure, with possible success able to override the failure state filter out only active jobs in running crawls listing	2021-08-24 11:29:15 -07:00
Ilya Kreymer	ed27f3e3ee	job handling: - job watch: add watch loop for job failure (backofflimitexceeded) - set job retries + job timeout via chart values - sigterm starts graceful shutdown by default, including for timeout - use sigusr1 to switch to instant shutdown - update stop_crawl() to use new semantics	2021-08-23 21:22:01 -07:00
Ilya Kreymer	7146e054a4	crawls work (#1 ): - support listing existing crawls - add 'schedule' and 'manual' annotations to jobs, store in Crawl obj - ensure manual jobs are deleted when completed - support deleting crawls by id (but not data) - rename running crawl delete to '/cancel' change paths for local minio/mongo to /tmp	2021-08-23 18:01:29 -07:00
Ilya Kreymer	66c4e618eb	crawls work (#1 ), support for: - canceling a crawl (via sigterm) - stopping a crawl gracefully (via custom exec sigint)	2021-08-23 12:25:04 -07:00
Ilya Kreymer	a8255a76b2	crawljob: - support run once on existing crawl job - support updating/patching existing crawl job with new crawl config, new schedule and run once	2021-08-21 22:10:31 -07:00
Ilya Kreymer	ea9010bf9a	add completed crawls to crawls table	2021-08-20 23:53:06 -07:00
Ilya Kreymer	4b08163ead	support usage counters per archive, per user -- handle crawl completion	2021-08-20 23:05:42 -07:00
Ilya Kreymer	170958be37	rename crawls -> crawlconfigs.py add crawls for crawl api management	2021-08-20 15:15:51 -07:00
Ilya Kreymer	f2d9d7ba6a	new features: - sending email for validation + invites, configured via env vars - inviting new users to join an existing archive - /crawldone webhook to track verify crawl id (next: store crawl complete entry)	2021-08-20 11:02:29 -07:00
Ilya Kreymer	627e9a6f14	cleanup crawl config, add separate 'runNow' field crawler: add cpu/memory limits minio: auto-create bucket for local minio	2021-08-19 14:15:21 -07:00
Ilya Kreymer	eaa87c8b43	support for user roles (owner, crawler, viewer), owner users can issue invites to other existing users by email to join existing archives	2021-08-18 20:35:51 -07:00
Ilya Kreymer	61a608bfbe	update models: - replace storages with archives, which have a single storage (for now) - crawls associated with archives - users belong to archive, with one admin user (if archive created by default) - update crawlconfig for latest browsertrix-crawler (0.4.4) - k8s: fix permissions for crawler role - k8s: fix minio service (now requiring two ports)	2021-08-18 16:53:49 -07:00
Ilya Kreymer	f77eaccf41	support committing to s3 storage move mongo into separate optional deployment along with minio support for configuring storages support for deleting crawls, associated config and secrets	2021-07-02 15:56:24 -07:00
Ilya Kreymer	a111bacfb5	add k8s support - working apis for adding crawls, removing crawls in mongo, mapped to k8s cronjobs - more complete crawl spec - option to start on-demand job from cronjobs - optional minio in separate deployment/service	2021-06-30 21:48:44 -07:00
Ilya Kreymer	c3143df0a2	rename archives -> storages add crawlconfig apis run lint pass, prep for k8s / docker crawl manager support	2021-06-29 20:30:33 -07:00
Ilya Kreymer	b08a188fea	initial commit!	2021-06-28 15:48:59 -07:00

190 Commits