Commit Graph

330 Commits

Author SHA1 Message Date
Ilya Kreymer
2842ca6a06 quick fix: add back USER_ID to crawlconfig config map, needed by crawler pod 2022-08-11 17:33:12 -07:00
Ilya Kreymer
3859be009a k8s: don't add entire crawl config as env var from configmap, add only specified env vars from configmap
fix issue with crawls with large number of seeds failing due to unusually large env var
2022-08-11 15:33:22 -07:00
sua yoo
319a8a3c07
make clearer that profile selection is optional and that a default profile is used by default (#290)
- Rename 'Select Profile' -> 'Default Profile'
- Rename 'No Profiles' -> 'No Additional Profiles'
2022-08-10 15:54:39 -07:00
sua yoo
ee6161ad43
Frontend browser profile editor enhancements (#288)
- add button to duplicate profile from main view
- add save / cancel button when editing
- change location of 'full screen' button
2022-08-10 15:51:34 -07:00
Ilya Kreymer
b11a5f136a profile browser deletion/removal:
- ensure profile browser DELETE command is working
- ensure profile browser job expires if no initial ping
- logging: print exception for base job if init fails
2022-08-02 18:31:33 -07:00
Ilya Kreymer
df905682a5 backend: fix scaling api response, return error details if available 2022-06-29 18:37:04 -07:00
sua yoo
9606d59c3d
Improve format of crawl template config error from server (#281)
* better display of api errors, such as fields missing or invalid urls, addresses #280
2022-06-29 17:57:03 -07:00
sua yoo
301b05ff4e
Refactor screencast websocket connection and retry (#276)
* replace ip with index and retry connection, fixes #252
2022-06-29 17:55:32 -07:00
Ilya Kreymer
2717a60763
improvements / bug fixes for stop/cancel handling: (#279)
- only send signal if stopping, no need for canceling as pods/containers will be removed
- refactor stop/cancel handling to be unified in manager, separate in job
- when stopping / graceful shutdown, return false if sending signal fails
- return success=true in json response if and only if stop/cancel actually succeeds, return 'error' message in error, should fix #270
- allow canceling after stopping / if stopping fails
- ensure finished time is set in case of cancelation before crawl starts, should fix #273
2022-06-29 17:47:25 -07:00
sua yoo
1c52902ea0
Update crawl scale label for UI consistency (#275)
closes #254
2022-06-29 16:14:03 -07:00
Ilya Kreymer
50c525853f
validation: ensure seed urls, and other url properties, are validated on POST by using pydantic HttpUrl type, fixes #277 (#278) 2022-06-29 16:09:32 -07:00
Ilya Kreymer
3fec2a9f82
dev server: also proxy /data directory for testing replay from a remote instance locally (#266)
API_BASE_URL will need to be set to 'http://remote.example.com/' instead of 'http://remote.example.com/api/'
2022-06-29 15:47:20 -07:00
sua yoo
92292591ad
Re-run crawl from detail view + handle inactive crawl template (#268)
closes #253
2022-06-29 14:17:09 -07:00
sua yoo
d144591dbf
Display & edit crawl schedule in user local time (#271)
closes #255
2022-06-27 13:01:20 -07:00
sua yoo
c2aa4e6319
Fix AM/PM toggle (#272) 2022-06-23 16:35:47 -07:00
sua yoo
c2be1a27ce
Handle stopping state in UI (#269)
closes #262
2022-06-23 16:35:03 -07:00
Ilya Kreymer
c6d9e7d612 config sample: switch back to browsertrix-crawler:latest for now 2022-06-17 13:39:45 -07:00
Ilya Kreymer
37ea3ed2af config/scripts:
- additional fixes to signing.yml config
- add missing 'set -o allexport' before 'source'
2022-06-16 22:36:44 -07:00
Ilya Kreymer
b9d7907ab3
Single config and env vars (#267)
* simplify back to single config.env!
- back to good ole env vars!
- remove shared secret, which made it difficult to have scheduled crawls, since secrets are immutable, so could not update config if a scheduled crawl existed :/
- all env vars unified in configs/config.env - run-swarm.sh and run-pod.sh 'source' this config
- remove config.sample.yaml
- customize minio volume dir via config.env
- customize redis port via config.env
- include authsign ports in debug-ports config
2022-06-16 21:50:03 -07:00
raffaele messuti
70767f0ac2
small fixes on docker swarm deployment (#265)
* fix COPY with multiple files

* Update Deployment.md
2022-06-16 19:56:40 -07:00
sua yoo
b40765134c
Re-run crawl from crawls list view (#264)
* run crawl from crawls list, and show link to newly started crawl
* if crawl is already running, show link to previously running crawl
2022-06-15 18:54:57 -07:00
sua yoo
a8757e2e50
Screencast UX enhancements (#251)
* animate starting state
* consistent fixed-size slots for each browser (url + screencast)
* add tooltip for expected number of browsers (workers x scale)
2022-06-15 18:50:14 -07:00
Ilya Kreymer
418c07bf0d
Local swarm + podman support (#261)
* backend: refactor swarm support to also support podman (#260)
- implement podman support as subclass of swarm deployment
- podman is used when 'RUNTIME=podman' env var is set
- podman socket is mapped instead of docker socket
- podman-compose is used instead of docker-compose (though docker-compose works with podman, it does not support secrets, but podman-compose does)
- separate cli utils into SwarmRunner and PodmanRunner which extends it
- using config.yaml and config.env, both copied from sample versions
- work on simplifying config: add docker-compose.podman.yml and docker-compose.swarm.yml and signing and debug configs in ./configs
- add {build,run,stop}-{swarm,podman}.sh in scripts dir
- add init-configs, only copy if configs don't exist
- build local image use current version of podman, to support both podman 3.x and 4.x
- additional fixes for after testing podman on centos
- docs: update Deployment.md to cover swarm, podman, k8s deployment
2022-06-14 00:13:49 -07:00
Ilya Kreymer
68ec582f73
nginx simplify: (#259)
- add custom init script for ./docker-entrypoint.d/ to setup resolver from local /etc/resolv.conf
- custom init script also removes default.conf, and removes minio route if NO_MINIO_ROUTE=1 is set
- assign template vars to nginx vars to avoid conflicts when interpolating
- k8s: remove initContainers and volumes, now handled via custom init script in image
2022-06-13 11:53:15 -07:00
Ilya Kreymer
d16c22f45a Merge branch 'main' into dev 2022-06-11 12:40:18 -07:00
Ilya Kreymer
9fce8cfc1d frontend: fix missed renames 2022-06-11 12:37:24 -07:00
Ilya Kreymer
5b6aa3bc95
Affinity + Tolerations + Cleanup Crawl Job (#256)
* k8s: add tolerations for 'nodeType=crawling:NoSchedule' to allow scheduling crawling on designated nodes for crawler and profiles jobs and statefulsets
* add affinity for 'nodeType=crawling' on crawling and profile browser statefulsets
* refactor crawljob: combine crawl_updater logic into base crawl_job
* increment new 'crawlAttemptCount' counter crawlconfig when crawl is started, not necessarily finished, to avoid deleting configs that had attempted but not finished crawls.
* better external mongodb support: use MONGO_DB_URL to set custom url directly, otherwise build from username, password and mongo host
2022-06-10 19:21:37 -07:00
sua yoo
710639365b
adjust no files message (#250)
Change 'no files yet' -> 'no files to replay' when there are no files available for replay.
2022-06-07 22:59:34 -07:00
Ilya Kreymer
dee354f252 affinity: add affinity for k8s crawl deployments:
- prefer deploy crawler, redis and job to same zone
- prefer deploying crawler and job together via crawler node type, redis via redis node type (all optional)
2022-06-07 21:52:04 -07:00
Ilya Kreymer
21b1a87534 crawljob: detect crawl failure when all crawlers set their status to 'failed' 2022-06-07 21:48:58 -07:00
sua yoo
fa4b71288c
Fix watch crawl running state (#249) 2022-06-07 12:04:35 -07:00
Ilya Kreymer
e3f268a2e8
CI setup for new swarm mode (#248)
- build backend and frontend with cacheing using GHA cache)
- streamline frontend image to reduce layers
- setup local swarm with test/setup.sh script, wait for containers to init
- copy sample config files as default (add storages.sample.yaml)
- add initial backend test for logging in with default superadmin credentials via 127.0.0.1:9871
- must use 127.0.0.1 instead of localhost for accessing frontend container within action
2022-06-06 09:34:02 -07:00
Ilya Kreymer
0c8a5a49b4 refactor to use docker swarm for local alternative to k8s instead of docker compose (#247):
- use python-on-whale to use docker cli api directly, creating docker stack for each crawl or profile browser
- configure storages via storages.yaml secret
- add crawl_job, profile_job, splitting into base and k8s/swarm implementations
- split manager into base crawlmanager and k8s/swarm implementations
- swarm: load initial scale from db to avoid modifying fixed configs, in k8s, load from configmap
- swarm: support scheduled jobs via swarm-cronjob service
- remove docker dependencies (aiodocker, apscheduler, scheduling)
- swarm: when using local minio, expose via /data/ route in nginx via extra include (in k8s, include dir is empty and routing handled via ingress)
- k8s: cleanup minio chart: move init containers to minio.yaml
- swarm: stateful set implementation to be consistent with k8s scaling:
  - don't use service replicas,
  - create a unique service with '-N' appended and allocate unique volume for each replica
  - allows crawl containers to be restarted w/o losing data
- add volume pruning background service, as volumes can be deleted only after service shuts down fully
- watch: fully simplify routing, route via replica index instead of ip for both k8s and swarm
- rename network btrix-cloud-net -> btrix-net to avoid conflict with compose network
2022-06-05 10:37:17 -07:00
Ilya Kreymer
bf79959a5a refactoring to use statefulsets + job (#245)
- use statefulsets instead of deployments for mongo, redis, signer
- use k8s job + statefulset for running crawls
- use separate statefulset for crawl (scaled) and single-replica redis stateful set
- move crawl job update login to crawl_updater
- remove shared redis chart

package refactor:
- move to shared code to 'btrixcloud'
- move k8s to 'btrixcloud.k8s'
- move docker to 'btrixcloud.docker'
2022-06-05 10:37:17 -07:00
Ilya Kreymer
ae51114a45
backend: fix accessing signed urls when using local minio service
- signing url with endpoint_url instead of access_endpoint_url, but replace endpoint_url prefix with access_endpoint_url for access.
- keep existing behavior of signing access_endpoint_url only if SIGN_ACCESS_ENDPOINT env var is set
2022-06-04 08:29:57 -07:00
sua yoo
502d687620
Enable duplicating and editing browser profile (#237)
* ensure editing other config options does not lose profile
* support adding/editing/removing profile of existing config
* when duplicating config, ensure profile setting is also copied in the duplicate
2022-06-04 08:26:19 -07:00
sua yoo
0c1dc2a1d1
Show crawl replay for running crawls (#235)
* show replay and watch at same time

* add separate section for watch

* only show replay if crawl has files, otherwise show 'no files' message
2022-06-04 08:19:09 -07:00
sua yoo
6a78bcd4aa
Delete browser profile (#243)
- delete browser profile, if not in use
- if in use, show error message, listing crawl configs that use the profile
- backend: fix check for confirming profile deletion
2022-06-01 19:18:41 -07:00
sua yoo
9cf1ed7d4d
copy yaml (#239) 2022-06-01 19:06:52 -07:00
Ilya Kreymer
aa1a2bf211 frontend: adjust api for websocket access checks 2022-06-01 15:08:50 -07:00
Ilya Kreymer
c023fe7c9a
Backend API prefix (#240)
* apply /api prefix consistently, both directly through backend and when accessing via frontend, fixes #236

* docs: update local deployment docs to use 9871 instead of 8000, don't expose 8000 by default

* schemas: don't include /openapi.json as /healthz in documentation, keep /healthz at root

* k8s: route backend to /api without additional rewriting
2022-05-31 19:29:20 -07:00
sua yoo
2355de3067
docs: remove extra comment 2022-05-31 14:13:17 -07:00
sua yoo
6e19e854be
Fix "Run now" button (#234) 2022-05-30 16:15:10 -07:00
Ilya Kreymer
955197579e frontend: support multi wacz replay using the crawl json as input 2022-05-20 09:11:23 -07:00
Ilya Kreymer
3df310ee4f
Backend: Crawls with Multiple WACZ files + Profile + Misc Fixes (#232)
* backend: k8s:
- support crawls with multiple wacz files, don't assume crawl complete after first wacz uploaded
- if crawl is running and has wacz file, still show as running
- k8s: allow configuring node selector for main pods (eg. nodeType=main) and for crawlers (eg. nodeType=crawling)
- profiles: support uploading to alternate storage specified via 'shared_profile_storage' value is set
- misc fixes for profiles

* backend: ensure docker run_profile api matches k8s
k8s chart: don't delete pvc and pv in helm chart

* dependency: bump authsign to 0.4.0
docker: disable public redis port

* profiles: fix path, profile browser return value

* fix typo in presigned url cacheing
2022-05-19 18:40:41 -07:00
Ilya Kreymer
cdefb8d06e frontend: further nginx template, just rename to frontend.template -> frontend.conf.template 2022-05-13 11:29:09 -04:00
Andy Jackson
330c0347dc
frontend: ensure generated config file has correct .conf extension. (#228) 2022-05-13 10:10:40 -04:00
Ilya Kreymer
0fab6db75e
frontend: add nginx.conf to limit worker processes (#226)
set the number of nginx workers to 2 to avoid exceeding memory, which can happen with default worker_processes: auto due to the cpu limit setting.
2022-05-10 15:11:35 -04:00
Ilya Kreymer
ff42785410
Profiles Backend (part 2) (#224)
* profiles: api update:
- support profile deletion
- support listing crawlconfigs using a profile
- support using a browser to update existing profile or create new one
- cleanup: move profile creation to POST, profile updates to PATCH endpoints
- support updating just profile name or description
- add new /navigate api to navigate browser
2022-04-24 10:23:52 -07:00
sua yoo
bda817dadd
View and edit browser profile (#218) 2022-04-23 20:12:16 -07:00