Commit Graph

964 Commits

Author SHA1 Message Date
Ilya Kreymer
c6d9e7d612 config sample: switch back to browsertrix-crawler:latest for now 2022-06-17 13:39:45 -07:00
Ilya Kreymer
37ea3ed2af config/scripts:
- additional fixes to signing.yml config
- add missing 'set -o allexport' before 'source'
2022-06-16 22:36:44 -07:00
Ilya Kreymer
b9d7907ab3
Single config and env vars (#267)
* simplify back to single config.env!
- back to good ole env vars!
- remove shared secret, which made it difficult to have scheduled crawls, since secrets are immutable, so could not update config if a scheduled crawl existed :/
- all env vars unified in configs/config.env - run-swarm.sh and run-pod.sh 'source' this config
- remove config.sample.yaml
- customize minio volume dir via config.env
- customize redis port via config.env
- include authsign ports in debug-ports config
2022-06-16 21:50:03 -07:00
raffaele messuti
70767f0ac2
small fixes on docker swarm deployment (#265)
* fix COPY with multiple files

* Update Deployment.md
2022-06-16 19:56:40 -07:00
sua yoo
b40765134c
Re-run crawl from crawls list view (#264)
* run crawl from crawls list, and show link to newly started crawl
* if crawl is already running, show link to previously running crawl
2022-06-15 18:54:57 -07:00
sua yoo
a8757e2e50
Screencast UX enhancements (#251)
* animate starting state
* consistent fixed-size slots for each browser (url + screencast)
* add tooltip for expected number of browsers (workers x scale)
2022-06-15 18:50:14 -07:00
Ilya Kreymer
418c07bf0d
Local swarm + podman support (#261)
* backend: refactor swarm support to also support podman (#260)
- implement podman support as subclass of swarm deployment
- podman is used when 'RUNTIME=podman' env var is set
- podman socket is mapped instead of docker socket
- podman-compose is used instead of docker-compose (though docker-compose works with podman, it does not support secrets, but podman-compose does)
- separate cli utils into SwarmRunner and PodmanRunner which extends it
- using config.yaml and config.env, both copied from sample versions
- work on simplifying config: add docker-compose.podman.yml and docker-compose.swarm.yml and signing and debug configs in ./configs
- add {build,run,stop}-{swarm,podman}.sh in scripts dir
- add init-configs, only copy if configs don't exist
- build local image use current version of podman, to support both podman 3.x and 4.x
- additional fixes for after testing podman on centos
- docs: update Deployment.md to cover swarm, podman, k8s deployment
2022-06-14 00:13:49 -07:00
Ilya Kreymer
68ec582f73
nginx simplify: (#259)
- add custom init script for ./docker-entrypoint.d/ to setup resolver from local /etc/resolv.conf
- custom init script also removes default.conf, and removes minio route if NO_MINIO_ROUTE=1 is set
- assign template vars to nginx vars to avoid conflicts when interpolating
- k8s: remove initContainers and volumes, now handled via custom init script in image
2022-06-13 11:53:15 -07:00
Ilya Kreymer
d16c22f45a Merge branch 'main' into dev 2022-06-11 12:40:18 -07:00
Ilya Kreymer
9fce8cfc1d frontend: fix missed renames 2022-06-11 12:37:24 -07:00
Ilya Kreymer
5b6aa3bc95
Affinity + Tolerations + Cleanup Crawl Job (#256)
* k8s: add tolerations for 'nodeType=crawling:NoSchedule' to allow scheduling crawling on designated nodes for crawler and profiles jobs and statefulsets
* add affinity for 'nodeType=crawling' on crawling and profile browser statefulsets
* refactor crawljob: combine crawl_updater logic into base crawl_job
* increment new 'crawlAttemptCount' counter crawlconfig when crawl is started, not necessarily finished, to avoid deleting configs that had attempted but not finished crawls.
* better external mongodb support: use MONGO_DB_URL to set custom url directly, otherwise build from username, password and mongo host
2022-06-10 19:21:37 -07:00
sua yoo
710639365b
adjust no files message (#250)
Change 'no files yet' -> 'no files to replay' when there are no files available for replay.
2022-06-07 22:59:34 -07:00
Ilya Kreymer
dee354f252 affinity: add affinity for k8s crawl deployments:
- prefer deploy crawler, redis and job to same zone
- prefer deploying crawler and job together via crawler node type, redis via redis node type (all optional)
2022-06-07 21:52:04 -07:00
Ilya Kreymer
21b1a87534 crawljob: detect crawl failure when all crawlers set their status to 'failed' 2022-06-07 21:48:58 -07:00
sua yoo
fa4b71288c
Fix watch crawl running state (#249) 2022-06-07 12:04:35 -07:00
Ilya Kreymer
e3f268a2e8
CI setup for new swarm mode (#248)
- build backend and frontend with cacheing using GHA cache)
- streamline frontend image to reduce layers
- setup local swarm with test/setup.sh script, wait for containers to init
- copy sample config files as default (add storages.sample.yaml)
- add initial backend test for logging in with default superadmin credentials via 127.0.0.1:9871
- must use 127.0.0.1 instead of localhost for accessing frontend container within action
2022-06-06 09:34:02 -07:00
Ilya Kreymer
0c8a5a49b4 refactor to use docker swarm for local alternative to k8s instead of docker compose (#247):
- use python-on-whale to use docker cli api directly, creating docker stack for each crawl or profile browser
- configure storages via storages.yaml secret
- add crawl_job, profile_job, splitting into base and k8s/swarm implementations
- split manager into base crawlmanager and k8s/swarm implementations
- swarm: load initial scale from db to avoid modifying fixed configs, in k8s, load from configmap
- swarm: support scheduled jobs via swarm-cronjob service
- remove docker dependencies (aiodocker, apscheduler, scheduling)
- swarm: when using local minio, expose via /data/ route in nginx via extra include (in k8s, include dir is empty and routing handled via ingress)
- k8s: cleanup minio chart: move init containers to minio.yaml
- swarm: stateful set implementation to be consistent with k8s scaling:
  - don't use service replicas,
  - create a unique service with '-N' appended and allocate unique volume for each replica
  - allows crawl containers to be restarted w/o losing data
- add volume pruning background service, as volumes can be deleted only after service shuts down fully
- watch: fully simplify routing, route via replica index instead of ip for both k8s and swarm
- rename network btrix-cloud-net -> btrix-net to avoid conflict with compose network
2022-06-05 10:37:17 -07:00
Ilya Kreymer
bf79959a5a refactoring to use statefulsets + job (#245)
- use statefulsets instead of deployments for mongo, redis, signer
- use k8s job + statefulset for running crawls
- use separate statefulset for crawl (scaled) and single-replica redis stateful set
- move crawl job update login to crawl_updater
- remove shared redis chart

package refactor:
- move to shared code to 'btrixcloud'
- move k8s to 'btrixcloud.k8s'
- move docker to 'btrixcloud.docker'
2022-06-05 10:37:17 -07:00
Ilya Kreymer
ae51114a45
backend: fix accessing signed urls when using local minio service
- signing url with endpoint_url instead of access_endpoint_url, but replace endpoint_url prefix with access_endpoint_url for access.
- keep existing behavior of signing access_endpoint_url only if SIGN_ACCESS_ENDPOINT env var is set
2022-06-04 08:29:57 -07:00
sua yoo
502d687620
Enable duplicating and editing browser profile (#237)
* ensure editing other config options does not lose profile
* support adding/editing/removing profile of existing config
* when duplicating config, ensure profile setting is also copied in the duplicate
2022-06-04 08:26:19 -07:00
sua yoo
0c1dc2a1d1
Show crawl replay for running crawls (#235)
* show replay and watch at same time

* add separate section for watch

* only show replay if crawl has files, otherwise show 'no files' message
2022-06-04 08:19:09 -07:00
sua yoo
6a78bcd4aa
Delete browser profile (#243)
- delete browser profile, if not in use
- if in use, show error message, listing crawl configs that use the profile
- backend: fix check for confirming profile deletion
2022-06-01 19:18:41 -07:00
sua yoo
9cf1ed7d4d
copy yaml (#239) 2022-06-01 19:06:52 -07:00
Ilya Kreymer
aa1a2bf211 frontend: adjust api for websocket access checks 2022-06-01 15:08:50 -07:00
Ilya Kreymer
c023fe7c9a
Backend API prefix (#240)
* apply /api prefix consistently, both directly through backend and when accessing via frontend, fixes #236

* docs: update local deployment docs to use 9871 instead of 8000, don't expose 8000 by default

* schemas: don't include /openapi.json as /healthz in documentation, keep /healthz at root

* k8s: route backend to /api without additional rewriting
2022-05-31 19:29:20 -07:00
sua yoo
2355de3067
docs: remove extra comment 2022-05-31 14:13:17 -07:00
sua yoo
6e19e854be
Fix "Run now" button (#234) 2022-05-30 16:15:10 -07:00
Ilya Kreymer
955197579e frontend: support multi wacz replay using the crawl json as input 2022-05-20 09:11:23 -07:00
Ilya Kreymer
3df310ee4f
Backend: Crawls with Multiple WACZ files + Profile + Misc Fixes (#232)
* backend: k8s:
- support crawls with multiple wacz files, don't assume crawl complete after first wacz uploaded
- if crawl is running and has wacz file, still show as running
- k8s: allow configuring node selector for main pods (eg. nodeType=main) and for crawlers (eg. nodeType=crawling)
- profiles: support uploading to alternate storage specified via 'shared_profile_storage' value is set
- misc fixes for profiles

* backend: ensure docker run_profile api matches k8s
k8s chart: don't delete pvc and pv in helm chart

* dependency: bump authsign to 0.4.0
docker: disable public redis port

* profiles: fix path, profile browser return value

* fix typo in presigned url cacheing
2022-05-19 18:40:41 -07:00
Ilya Kreymer
cdefb8d06e frontend: further nginx template, just rename to frontend.template -> frontend.conf.template 2022-05-13 11:29:09 -04:00
Andy Jackson
330c0347dc
frontend: ensure generated config file has correct .conf extension. (#228) 2022-05-13 10:10:40 -04:00
Ilya Kreymer
0fab6db75e
frontend: add nginx.conf to limit worker processes (#226)
set the number of nginx workers to 2 to avoid exceeding memory, which can happen with default worker_processes: auto due to the cpu limit setting.
2022-05-10 15:11:35 -04:00
Ilya Kreymer
ff42785410
Profiles Backend (part 2) (#224)
* profiles: api update:
- support profile deletion
- support listing crawlconfigs using a profile
- support using a browser to update existing profile or create new one
- cleanup: move profile creation to POST, profile updates to PATCH endpoints
- support updating just profile name or description
- add new /navigate api to navigate browser
2022-04-24 10:23:52 -07:00
sua yoo
bda817dadd
View and edit browser profile (#218) 2022-04-23 20:12:16 -07:00
sua yoo
f157e2031f
Filter and sort crawl templates (#217) 2022-04-23 20:11:53 -07:00
sua yoo
cb80c6767e
hotfix: update profile ID in crawl template 2022-04-20 19:40:30 -07:00
Ilya Kreymer
38869cdd24
crawl templates: check that lastCrawlState is not null (#220) 2022-04-20 19:17:24 -07:00
sua yoo
db27b6aaaf
View and edit browser profile (#214) 2022-04-19 10:44:21 -07:00
sua yoo
71eec4d915
Create crawl template with browser profile (#215) 2022-04-18 10:36:28 -07:00
Ilya Kreymer
73b8c64ba4 frontend profile browser: cover devtools sidebar with profile sidebar, add try/catch for localStorage override 2022-04-13 21:41:51 -07:00
sua yoo
f5993e8ad8
Create browser profile UI (#211) 2022-04-13 21:11:13 -07:00
sua yoo
d2653ae835
View browser profiles in UI (#209) 2022-04-13 21:10:22 -07:00
Ilya Kreymer
2f63c7dcf8
Profiles: Backend API + Nginx Devtools Proxy Support (#212)
* add profile creation, list endpoints at /archives/<aid>/profiles
* add profile browser creation, get, ping, commit, delete endpoints at /archives/<aid>/profiles/browser
* support creation of profile browser using browsertrix-crawler 'create-login-profile' in docker and k8s
* ensure profile browser expires after set time, k8s job or docker container automatically deleted on exit
* profile browser creation returns temporary browser id, or `{"detail": "waiting_for_browser"}` while waiting for browser container init
* nginx frontend: proxy /loadbrowser/ to port 9223 in browsertrix-crawler, connecting directly to chrome devtools
* profile api auth: use redis for auth
- store browserid->archiveid and browserid->browser ip mapping in redis
- browser apis: ensure profile browser is associated with specified archive
- browser ws: pass arcchiveid and browserid to ws query args, browserid is part of archive, and browserid corresponds to specified ip
* store profiles in /profiles/ directory in default storage, include profileid in profile tar.gz filename

* support profile in crawlconfig:
- add profileid to CrawlConfig, and profileName to CrawlConfigOut
- support resolving profile path via profileid, setting '--profile @{path/to/profile.tar.gz}' for crawler (assuming same storage for profile as output for now) in both docker and k8s setups
- docker: support out_filename, custom wacz output filename missing functionality
2022-04-13 19:36:06 -07:00
sua yoo
238ee8f7ee
delete unused component file 2022-04-11 13:18:23 -07:00
sua yoo
8828681e8e
hotfix: fix crawl sort control alignment 2022-04-11 13:13:53 -07:00
sua yoo
d4b3ae3795
delete unused component file 2022-04-11 13:10:23 -07:00
sua yoo
5307138202
enable opening crawl template in new tab 2022-04-11 13:03:19 -07:00
sua yoo
f90ef071de
enable opening crawl in new tab 2022-04-11 13:03:10 -07:00
sua yoo
29b586b03f
Edit crawl config as YAML (#207) 2022-04-06 17:40:25 -07:00
Ilya Kreymer
9a6483630e
Support for Admin interface for viewing web archives (#198)
* backend api
- superadmin has admin access to all archives
- new superadmin endpoints: /archives/all/crawls and /archives/all/crawls/<crawl_id>.json for list all running crawls
and loading crawl data by id

- frontend superadmin view (fixes #201)
* show all archives on superadmin home page
* show jump to crawl for super admin (#200)
* navbar links for: all archives, all running crawls and jump to crawl

Co-authored-by: sua yoo <sua@suayoo.com>
2022-04-06 12:42:04 -07:00