browsertrix

Author	SHA1	Message	Date
Ilya Kreymer	b9d7907ab3	Single config and env vars (#267 ) * simplify back to single config.env! - back to good ole env vars! - remove shared secret, which made it difficult to have scheduled crawls, since secrets are immutable, so could not update config if a scheduled crawl existed :/ - all env vars unified in configs/config.env - run-swarm.sh and run-pod.sh 'source' this config - remove config.sample.yaml - customize minio volume dir via config.env - customize redis port via config.env - include authsign ports in debug-ports config	2022-06-16 21:50:03 -07:00
raffaele messuti	70767f0ac2	small fixes on docker swarm deployment (#265 ) * fix COPY with multiple files * Update Deployment.md	2022-06-16 19:56:40 -07:00
sua yoo	b40765134c	Re-run crawl from crawls list view (#264 ) * run crawl from crawls list, and show link to newly started crawl * if crawl is already running, show link to previously running crawl	2022-06-15 18:54:57 -07:00
sua yoo	a8757e2e50	Screencast UX enhancements (#251 ) * animate starting state * consistent fixed-size slots for each browser (url + screencast) * add tooltip for expected number of browsers (workers x scale)	2022-06-15 18:50:14 -07:00
Ilya Kreymer	418c07bf0d	Local swarm + podman support (#261 ) * backend: refactor swarm support to also support podman (#260) - implement podman support as subclass of swarm deployment - podman is used when 'RUNTIME=podman' env var is set - podman socket is mapped instead of docker socket - podman-compose is used instead of docker-compose (though docker-compose works with podman, it does not support secrets, but podman-compose does) - separate cli utils into SwarmRunner and PodmanRunner which extends it - using config.yaml and config.env, both copied from sample versions - work on simplifying config: add docker-compose.podman.yml and docker-compose.swarm.yml and signing and debug configs in ./configs - add {build,run,stop}-{swarm,podman}.sh in scripts dir - add init-configs, only copy if configs don't exist - build local image use current version of podman, to support both podman 3.x and 4.x - additional fixes for after testing podman on centos - docs: update Deployment.md to cover swarm, podman, k8s deployment	2022-06-14 00:13:49 -07:00
Ilya Kreymer	68ec582f73	nginx simplify: (#259 ) - add custom init script for ./docker-entrypoint.d/ to setup resolver from local /etc/resolv.conf - custom init script also removes default.conf, and removes minio route if NO_MINIO_ROUTE=1 is set - assign template vars to nginx vars to avoid conflicts when interpolating - k8s: remove initContainers and volumes, now handled via custom init script in image	2022-06-13 11:53:15 -07:00
Ilya Kreymer	d16c22f45a	Merge branch 'main' into dev	2022-06-11 12:40:18 -07:00
Ilya Kreymer	9fce8cfc1d	frontend: fix missed renames	2022-06-11 12:37:24 -07:00
Ilya Kreymer	5b6aa3bc95	Affinity + Tolerations + Cleanup Crawl Job (#256 ) * k8s: add tolerations for 'nodeType=crawling:NoSchedule' to allow scheduling crawling on designated nodes for crawler and profiles jobs and statefulsets * add affinity for 'nodeType=crawling' on crawling and profile browser statefulsets * refactor crawljob: combine crawl_updater logic into base crawl_job * increment new 'crawlAttemptCount' counter crawlconfig when crawl is started, not necessarily finished, to avoid deleting configs that had attempted but not finished crawls. * better external mongodb support: use MONGO_DB_URL to set custom url directly, otherwise build from username, password and mongo host	2022-06-10 19:21:37 -07:00
sua yoo	710639365b	adjust no files message (#250 ) Change 'no files yet' -> 'no files to replay' when there are no files available for replay.	2022-06-07 22:59:34 -07:00
Ilya Kreymer	dee354f252	affinity: add affinity for k8s crawl deployments: - prefer deploy crawler, redis and job to same zone - prefer deploying crawler and job together via crawler node type, redis via redis node type (all optional)	2022-06-07 21:52:04 -07:00
Ilya Kreymer	21b1a87534	crawljob: detect crawl failure when all crawlers set their status to 'failed'	2022-06-07 21:48:58 -07:00
sua yoo	fa4b71288c	Fix watch crawl running state (#249 )	2022-06-07 12:04:35 -07:00
Ilya Kreymer	e3f268a2e8	CI setup for new swarm mode (#248 ) - build backend and frontend with caching (using GHA cache) - streamline frontend image to reduce layers - setup local swarm with test/setup.sh script, wait for containers to init - copy sample config files as default (add storages.sample.yaml) - add initial backend test for logging in with default superadmin credentials via 127.0.0.1:9871 - must use 127.0.0.1 instead of localhost for accessing frontend container within action	2022-06-06 09:34:02 -07:00
Ilya Kreymer	0c8a5a49b4	refactor to use docker swarm for local alternative to k8s instead of docker compose (#247 ): - use python-on-whale to use docker cli api directly, creating docker stack for each crawl or profile browser - configure storages via storages.yaml secret - add crawl_job, profile_job, splitting into base and k8s/swarm implementations - split manager into base crawlmanager and k8s/swarm implementations - swarm: load initial scale from db to avoid modifying fixed configs, in k8s, load from configmap - swarm: support scheduled jobs via swarm-cronjob service - remove docker dependencies (aiodocker, apscheduler, scheduling) - swarm: when using local minio, expose via /data/ route in nginx via extra include (in k8s, include dir is empty and routing handled via ingress) - k8s: cleanup minio chart: move init containers to minio.yaml - swarm: stateful set implementation to be consistent with k8s scaling: - don't use service replicas, - create a unique service with '-N' appended and allocate unique volume for each replica - allows crawl containers to be restarted w/o losing data - add volume pruning background service, as volumes can be deleted only after service shuts down fully - watch: fully simplify routing, route via replica index instead of ip for both k8s and swarm - rename network btrix-cloud-net -> btrix-net to avoid conflict with compose network	2022-06-05 10:37:17 -07:00
Ilya Kreymer	bf79959a5a	refactoring to use statefulsets + job (#245 ) - use statefulsets instead of deployments for mongo, redis, signer - use k8s job + statefulset for running crawls - use separate statefulset for crawl (scaled) and single-replica redis stateful set - move crawl job update logic to crawl_updater - remove shared redis chart package refactor: - move shared code to 'btrixcloud' - move k8s to 'btrixcloud.k8s' - move docker to 'btrixcloud.docker'	2022-06-05 10:37:17 -07:00
Ilya Kreymer	ae51114a45	backend: fix accessing signed urls when using local minio service - signing url with endpoint_url instead of access_endpoint_url, but replace endpoint_url prefix with access_endpoint_url for access. - keep existing behavior of signing access_endpoint_url only if SIGN_ACCESS_ENDPOINT env var is set	2022-06-04 08:29:57 -07:00
sua yoo	502d687620	Enable duplicating and editing browser profile (#237 ) * ensure editing other config options does not lose profile * support adding/editing/removing profile of existing config * when duplicating config, ensure profile setting is also copied in the duplicate	2022-06-04 08:26:19 -07:00
sua yoo	0c1dc2a1d1	Show crawl replay for running crawls (#235 ) * show replay and watch at same time * add separate section for watch * only show replay if crawl has files, otherwise show 'no files' message	2022-06-04 08:19:09 -07:00
sua yoo	6a78bcd4aa	Delete browser profile (#243 ) - delete browser profile, if not in use - if in use, show error message, listing crawl configs that use the profile - backend: fix check for confirming profile deletion	2022-06-01 19:18:41 -07:00
sua yoo	9cf1ed7d4d	copy yaml (#239 )	2022-06-01 19:06:52 -07:00
Ilya Kreymer	aa1a2bf211	frontend: adjust api for websocket access checks	2022-06-01 15:08:50 -07:00
Ilya Kreymer	c023fe7c9a	Backend API prefix (#240 ) * apply /api prefix consistently, both directly through backend and when accessing via frontend, fixes #236 * docs: update local deployment docs to use 9871 instead of 8000, don't expose 8000 by default * schemas: don't include /openapi.json as /healthz in documentation, keep /healthz at root * k8s: route backend to /api without additional rewriting	2022-05-31 19:29:20 -07:00
sua yoo	2355de3067	docs: remove extra comment	2022-05-31 14:13:17 -07:00
sua yoo	6e19e854be	Fix "Run now" button (#234 )	2022-05-30 16:15:10 -07:00
Ilya Kreymer	955197579e	frontend: support multi wacz replay using the crawl json as input	2022-05-20 09:11:23 -07:00
Ilya Kreymer	3df310ee4f	Backend: Crawls with Multiple WACZ files + Profile + Misc Fixes (#232 ) * backend: k8s: - support crawls with multiple wacz files, don't assume crawl complete after first wacz uploaded - if crawl is running and has wacz file, still show as running - k8s: allow configuring node selector for main pods (eg. nodeType=main) and for crawlers (eg. nodeType=crawling) - profiles: support uploading to alternate storage specified via 'shared_profile_storage' value, if set - misc fixes for profiles * backend: ensure docker run_profile api matches k8s k8s chart: don't delete pvc and pv in helm chart * dependency: bump authsign to 0.4.0 docker: disable public redis port * profiles: fix path, profile browser return value * fix typo in presigned url caching	2022-05-19 18:40:41 -07:00
Ilya Kreymer	cdefb8d06e	frontend: further nginx template, just rename to frontend.template -> frontend.conf.template	2022-05-13 11:29:09 -04:00
Andy Jackson	330c0347dc	frontend: ensure generated config file has correct .conf extension. (#228 )	2022-05-13 10:10:40 -04:00
Ilya Kreymer	0fab6db75e	frontend: add nginx.conf to limit worker processes (#226 ) set the number of nginx workers to 2 to avoid exceeding memory, which can happen with default worker_processes: auto due to the cpu limit setting.	2022-05-10 15:11:35 -04:00
Ilya Kreymer	ff42785410	Profiles Backend (part 2) (#224 ) * profiles: api update: - support profile deletion - support listing crawlconfigs using a profile - support using a browser to update existing profile or create new one - cleanup: move profile creation to POST, profile updates to PATCH endpoints - support updating just profile name or description - add new /navigate api to navigate browser	2022-04-24 10:23:52 -07:00
sua yoo	bda817dadd	View and edit browser profile (#218 )	2022-04-23 20:12:16 -07:00
sua yoo	f157e2031f	Filter and sort crawl templates (#217 )	2022-04-23 20:11:53 -07:00
sua yoo	cb80c6767e	hotfix: update profile ID in crawl template	2022-04-20 19:40:30 -07:00
Ilya Kreymer	38869cdd24	crawl templates: check that lastCrawlState is not null (#220 )	2022-04-20 19:17:24 -07:00
sua yoo	db27b6aaaf	View and edit browser profile (#214 )	2022-04-19 10:44:21 -07:00
sua yoo	71eec4d915	Create crawl template with browser profile (#215 )	2022-04-18 10:36:28 -07:00
Ilya Kreymer	73b8c64ba4	frontend profile browser: cover devtools sidebar with profile sidebar, add try/catch for localStorage override	2022-04-13 21:41:51 -07:00
sua yoo	f5993e8ad8	Create browser profile UI (#211 )	2022-04-13 21:11:13 -07:00
sua yoo	d2653ae835	View browser profiles in UI (#209 )	2022-04-13 21:10:22 -07:00
Ilya Kreymer	2f63c7dcf8	Profiles: Backend API + Nginx Devtools Proxy Support (#212 ) * add profile creation, list endpoints at /archives/<aid>/profiles * add profile browser creation, get, ping, commit, delete endpoints at /archives/<aid>/profiles/browser * support creation of profile browser using browsertrix-crawler 'create-login-profile' in docker and k8s * ensure profile browser expires after set time, k8s job or docker container automatically deleted on exit * profile browser creation returns temporary browser id, or `{"detail": "waiting_for_browser"}` while waiting for browser container init * nginx frontend: proxy /loadbrowser/ to port 9223 in browsertrix-crawler, connecting directly to chrome devtools * profile api auth: use redis for auth - store browserid->archiveid and browserid->browser ip mapping in redis - browser apis: ensure profile browser is associated with specified archive - browser ws: pass archiveid and browserid to ws query args, browserid is part of archive, and browserid corresponds to specified ip * store profiles in /profiles/ directory in default storage, include profileid in profile tar.gz filename * support profile in crawlconfig: - add profileid to CrawlConfig, and profileName to CrawlConfigOut - support resolving profile path via profileid, setting '--profile @{path/to/profile.tar.gz}' for crawler (assuming same storage for profile as output for now) in both docker and k8s setups - docker: support out_filename, custom wacz output filename missing functionality	2022-04-13 19:36:06 -07:00
sua yoo	238ee8f7ee	delete unused component file	2022-04-11 13:18:23 -07:00
sua yoo	8828681e8e	hotfix: fix crawl sort control alignment	2022-04-11 13:13:53 -07:00
sua yoo	d4b3ae3795	delete unused component file	2022-04-11 13:10:23 -07:00
sua yoo	5307138202	enable opening crawl template in new tab	2022-04-11 13:03:19 -07:00
sua yoo	f90ef071de	enable opening crawl in new tab	2022-04-11 13:03:10 -07:00
sua yoo	29b586b03f	Edit crawl config as YAML (#207 )	2022-04-06 17:40:25 -07:00
Ilya Kreymer	9a6483630e	Support for Admin interface for viewing web archives (#198 ) * backend api - superadmin has admin access to all archives - new superadmin endpoints: /archives/all/crawls and /archives/all/crawls/<crawl_id>.json for list all running crawls and loading crawl data by id - frontend superadmin view (fixes #201) * show all archives on superadmin home page * show jump to crawl for super admin (#200) * navbar links for: all archives, all running crawls and jump to crawl Co-authored-by: sua yoo <sua@suayoo.com>	2022-04-06 12:42:04 -07:00
sua yoo	ec3a77b71e	Mobile layout fixes (#206 ) closes #202	2022-03-30 15:54:25 -07:00
Ilya Kreymer	aa83d51f7a	k8s backend improvements: (#205 ) - add liveness probe for crawls, configurable via 'crawler_liveness_port' - add User system:anonymous permissions - treat jobs that have exceeded total as 'partial_complete' (experimental)	2022-03-30 14:39:06 -07:00


212 Commits