browsertrix

Author	SHA1	Message	Date
Ilya Kreymer	0c8a5a49b4	refactor to use docker swarm for local alternative to k8s instead of docker compose (#247): - use python-on-whale to use docker cli api directly, creating docker stack for each crawl or profile browser - configure storages via storages.yaml secret - add crawl_job, profile_job, splitting into base and k8s/swarm implementations - split manager into base crawlmanager and k8s/swarm implementations - swarm: load initial scale from db to avoid modifying fixed configs, in k8s, load from configmap - swarm: support scheduled jobs via swarm-cronjob service - remove docker dependencies (aiodocker, apscheduler, scheduling) - swarm: when using local minio, expose via /data/ route in nginx via extra include (in k8s, include dir is empty and routing handled via ingress) - k8s: cleanup minio chart: move init containers to minio.yaml - swarm: stateful set implementation to be consistent with k8s scaling: - don't use service replicas, - create a unique service with '-N' appended and allocate unique volume for each replica - allows crawl containers to be restarted w/o losing data - add volume pruning background service, as volumes can be deleted only after service shuts down fully - watch: fully simplify routing, route via replica index instead of ip for both k8s and swarm - rename network btrix-cloud-net -> btrix-net to avoid conflict with compose network	2022-06-05 10:37:17 -07:00
Ilya Kreymer	bf79959a5a	refactoring to use statefulsets + job (#245) - use statefulsets instead of deployments for mongo, redis, signer - use k8s job + statefulset for running crawls - use separate statefulset for crawl (scaled) and single-replica redis stateful set - move crawl job update login to crawl_updater - remove shared redis chart package refactor: - move to shared code to 'btrixcloud' - move k8s to 'btrixcloud.k8s' - move docker to 'btrixcloud.docker'	2022-06-05 10:37:17 -07:00
Ilya Kreymer	ae51114a45	backend: fix accessing signed urls when using local minio service - signing url with endpoint_url instead of access_endpoint_url, but replace endpoint_url prefix with access_endpoint_url for access. - keep existing behavior of signing access_endpoint_url only if SIGN_ACCESS_ENDPOINT env var is set	2022-06-04 08:29:57 -07:00
sua yoo	502d687620	Enable duplicating and editing browser profile (#237) * ensure editing other config options does not lose profile * support adding/editing/removing profile of existing config * when duplicating config, ensure profile setting is also copied in the duplicate	2022-06-04 08:26:19 -07:00
sua yoo	0c1dc2a1d1	Show crawl replay for running crawls (#235) * show replay and watch at same time * add separate section for watch * only show replay if crawl has files, otherwise show 'no files' message	2022-06-04 08:19:09 -07:00
sua yoo	6a78bcd4aa	Delete browser profile (#243) - delete browser profile, if not in use - if in use, show error message, listing crawl configs that use the profile - backend: fix check for confirming profile deletion	2022-06-01 19:18:41 -07:00
sua yoo	9cf1ed7d4d	copy yaml (#239)	2022-06-01 19:06:52 -07:00
Ilya Kreymer	aa1a2bf211	frontend: adjust api for websocket access checks	2022-06-01 15:08:50 -07:00
Ilya Kreymer	c023fe7c9a	Backend API prefix (#240) * apply /api prefix consistently, both directly through backend and when accessing via frontend, fixes #236 * docs: update local deployment docs to use 9871 instead of 8000, don't expose 8000 by default * schemas: don't include /openapi.json as /healthz in documentation, keep /healthz at root * k8s: route backend to /api without additional rewriting	2022-05-31 19:29:20 -07:00
sua yoo	2355de3067	docs: remove extra comment	2022-05-31 14:13:17 -07:00
sua yoo	6e19e854be	Fix "Run now" button (#234)	2022-05-30 16:15:10 -07:00
Ilya Kreymer	955197579e	frontend: support multi wacz replay using the crawl json as input	2022-05-20 09:11:23 -07:00
Ilya Kreymer	3df310ee4f	Backend: Crawls with Multiple WACZ files + Profile + Misc Fixes (#232) * backend: k8s: - support crawls with multiple wacz files, don't assume crawl complete after first wacz uploaded - if crawl is running and has wacz file, still show as running - k8s: allow configuring node selector for main pods (eg. nodeType=main) and for crawlers (eg. nodeType=crawling) - profiles: support uploading to alternate storage specified via 'shared_profile_storage' value is set - misc fixes for profiles * backend: ensure docker run_profile api matches k8s k8s chart: don't delete pvc and pv in helm chart * dependency: bump authsign to 0.4.0 docker: disable public redis port * profiles: fix path, profile browser return value * fix typo in presigned url caching	2022-05-19 18:40:41 -07:00
Ilya Kreymer	cdefb8d06e	frontend: further nginx template, just rename to frontend.template -> frontend.conf.template	2022-05-13 11:29:09 -04:00
Andy Jackson	330c0347dc	frontend: ensure generated config file has correct .conf extension. (#228)	2022-05-13 10:10:40 -04:00
Ilya Kreymer	0fab6db75e	frontend: add nginx.conf to limit worker processes (#226) set the number of nginx workers to 2 to avoid exceeding memory, which can happen with default worker_processes: auto due to the cpu limit setting.	2022-05-10 15:11:35 -04:00
Ilya Kreymer	ff42785410	Profiles Backend (part 2) (#224) * profiles: api update: - support profile deletion - support listing crawlconfigs using a profile - support using a browser to update existing profile or create new one - cleanup: move profile creation to POST, profile updates to PATCH endpoints - support updating just profile name or description - add new /navigate api to navigate browser	2022-04-24 10:23:52 -07:00
sua yoo	bda817dadd	View and edit browser profile (#218)	2022-04-23 20:12:16 -07:00
sua yoo	f157e2031f	Filter and sort crawl templates (#217)	2022-04-23 20:11:53 -07:00
sua yoo	cb80c6767e	hotfix: update profile ID in crawl template	2022-04-20 19:40:30 -07:00
Ilya Kreymer	38869cdd24	crawl templates: check that lastCrawlState is not null (#220)	2022-04-20 19:17:24 -07:00
sua yoo	db27b6aaaf	View and edit browser profile (#214)	2022-04-19 10:44:21 -07:00
sua yoo	71eec4d915	Create crawl template with browser profile (#215)	2022-04-18 10:36:28 -07:00
Ilya Kreymer	73b8c64ba4	frontend profile browser: cover devtools sidebar with profile sidebar, add try/catch for localStorage override	2022-04-13 21:41:51 -07:00
sua yoo	f5993e8ad8	Create browser profile UI (#211)	2022-04-13 21:11:13 -07:00
sua yoo	d2653ae835	View browser profiles in UI (#209)	2022-04-13 21:10:22 -07:00
Ilya Kreymer	2f63c7dcf8	Profiles: Backend API + Nginx Devtools Proxy Support (#212) * add profile creation, list endpoints at /archives/<aid>/profiles * add profile browser creation, get, ping, commit, delete endpoints at /archives/<aid>/profiles/browser * support creation of profile browser using browsertrix-crawler 'create-login-profile' in docker and k8s * ensure profile browser expires after set time, k8s job or docker container automatically deleted on exit * profile browser creation returns temporary browser id, or `{"detail": "waiting_for_browser"}` while waiting for browser container init * nginx frontend: proxy /loadbrowser/ to port 9223 in browsertrix-crawler, connecting directly to chrome devtools * profile api auth: use redis for auth - store browserid->archiveid and browserid->browser ip mapping in redis - browser apis: ensure profile browser is associated with specified archive - browser ws: pass archiveid and browserid to ws query args, browserid is part of archive, and browserid corresponds to specified ip * store profiles in /profiles/ directory in default storage, include profileid in profile tar.gz filename * support profile in crawlconfig: - add profileid to CrawlConfig, and profileName to CrawlConfigOut - support resolving profile path via profileid, setting '--profile @{path/to/profile.tar.gz}' for crawler (assuming same storage for profile as output for now) in both docker and k8s setups - docker: support out_filename, custom wacz output filename missing functionality	2022-04-13 19:36:06 -07:00
sua yoo	238ee8f7ee	delete unused component file	2022-04-11 13:18:23 -07:00
sua yoo	8828681e8e	hotfix: fix crawl sort control alignment	2022-04-11 13:13:53 -07:00
sua yoo	d4b3ae3795	delete unused component file	2022-04-11 13:10:23 -07:00
sua yoo	5307138202	enable opening crawl template in new tab	2022-04-11 13:03:19 -07:00
sua yoo	f90ef071de	enable opening crawl in new tab	2022-04-11 13:03:10 -07:00
sua yoo	29b586b03f	Edit crawl config as YAML (#207)	2022-04-06 17:40:25 -07:00
Ilya Kreymer	9a6483630e	Support for Admin interface for viewing web archives (#198) * backend api - superadmin has admin access to all archives - new superadmin endpoints: /archives/all/crawls and /archives/all/crawls/<crawl_id>.json for list all running crawls and loading crawl data by id - frontend superadmin view (fixes #201) * show all archives on superadmin home page * show jump to crawl for super admin (#200) * navbar links for: all archives, all running crawls and jump to crawl Co-authored-by: sua yoo <sua@suayoo.com>	2022-04-06 12:42:04 -07:00
sua yoo	ec3a77b71e	Mobile layout fixes (#206) closes #202	2022-03-30 15:54:25 -07:00
Ilya Kreymer	aa83d51f7a	k8s backend improvements: (#205) - add liveness probe for crawls, configurable via 'crawler_liveness_port' - add User system:anonymous permissions - treat jobs that have exceeded total as 'partial_complete' (experimental)	2022-03-30 14:39:06 -07:00
sua yoo	9e2274f612	remove temp file	2022-03-30 13:51:02 -07:00
Ilya Kreymer	9e45dc35d2	minor frontend-tweaks: (#196) * frontend-tweaks: - treat 'starting' state same as 'running' - default to no schedule instead of weekly for default - add 'Domain' scopeType * backend: also allow 'domain' as a scopeType	2022-03-15 21:19:23 -07:00
sua yoo	8863776c54	Define websocket host in common webpack config (#195) * move websocket host var to common config, better fix for #193	2022-03-15 18:34:49 -07:00
Ilya Kreymer	e6467c3374	backend work: - support {configname}-{username}-@ts-@hostsuffix.wacz as output filename, sanitize username and config name - support returning 'starting' for crawl status if no ips or 0/0 pages found. - fix updating scale via POST crawlconfig update - fix duplicate user error on superuser init	2022-03-15 18:20:25 -07:00
Ilya Kreymer	4b2f89db91	k8s: support for using a pre-made persistent volume/claim for crawling, configurable via CRAWLER_PV_CLAIM, otherwise using emptyDir k8s: ability to set deployment scale for frontend as well	2022-03-15 11:18:23 -07:00
Ilya Kreymer	912004751d	quickfix: partial mitigation for #193, use current host for websock address	2022-03-14 15:29:35 -07:00
Ilya Kreymer	8ce7a9802b	backend quick fix: chart/config: use screencastPort, fixed collection name k8s: set pod to never restart to see logs	2022-03-14 11:42:53 -07:00
sua yoo	6fabea3e7a	Frontend build fixes (#191) * copy specific files * replace api host env var * remove unused dotenv * Update frontend/webpack.dev.js Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>	2022-03-10 23:26:21 -08:00
sua yoo	4190e40964	Show last crawl state in UI (#192) * update crawl list status * show on detail page	2022-03-10 23:25:42 -08:00
Ilya Kreymer	9c99d67b1d	quickfix: backend: docker: fix loading ips for watch	2022-03-04 17:12:19 -08:00
sua yoo	edf6b9ded7	Update home page routing (#186) closes #183	2022-03-04 16:18:41 -08:00
sua yoo	0fe54653be	Fix unable to save edits to simple view (#185)	2022-03-04 16:17:57 -08:00
sua yoo	f2f67c34af	Copy extra hops value when duplicating crawl config (#184) closes #158	2022-03-04 16:17:37 -08:00
sua yoo	4383c5e8d8	Disable error tracking in prod (#182) closes #161	2022-03-04 16:17:05 -08:00


198 Commits