- Adds version to version.txt in root
- adds update-version.sh which updates version in frontend/package.json and backend/btrixcloud/version.py
- frontend: loads version from $VERSION env var, ../version.txt or package.json
- ci: on new github release, pushes webrecorder/browsertrix-backend and webrecorder/browsertrix-frontend images to Dockerhub with current version, as well as latest.
- version set to 1.1.0-beta.0
- closes#357
- including pagination of queue results (30 results per page currently)
- show numbering on paginated results
- allow user navigation to each result page
Smoother elapsed crawl timer:
- Crawls list: show seconds increment up to 2 minutes, then show minutes only
- Crawls detail: show seconds increment up to one day
- only send signal if stopping, no need for canceling as pods/containers will be removed
- refactor stop/cancel handling to be unified in manager, separate in job
- when stopping / graceful shutdown, return false if sending signal fails
- return success=true in json response if and only if stop/cancel actually succeeds, return 'error' message in error, should fix#270
- allow canceling after stopping / if stopping fails
- ensure finished time is set in case of cancelation before crawl starts, should fix#273
* animate starting state
* consistent fixed-size slots for each browser (url + screencast)
* add tooltip for expected number of browsers (workers x scale)
* backend: refactor swarm support to also support podman (#260)
- implement podman support as subclass of swarm deployment
- podman is used when 'RUNTIME=podman' env var is set
- podman socket is mapped instead of docker socket
- podman-compose is used instead of docker-compose (though docker-compose works with podman, it does not support secrets, but podman-compose does)
- separate cli utils into SwarmRunner and PodmanRunner which extends it
- using config.yaml and config.env, both copied from sample versions
- work on simplifying config: add docker-compose.podman.yml and docker-compose.swarm.yml and signing and debug configs in ./configs
- add {build,run,stop}-{swarm,podman}.sh in scripts dir
- add init-configs, only copy if configs don't exist
- build local image use current version of podman, to support both podman 3.x and 4.x
- additional fixes for after testing podman on centos
- docs: update Deployment.md to cover swarm, podman, k8s deployment
- add custom init script for ./docker-entrypoint.d/ to setup resolver from local /etc/resolv.conf
- custom init script also removes default.conf, and removes minio route if NO_MINIO_ROUTE=1 is set
- assign template vars to nginx vars to avoid conflicts when interpolating
- k8s: remove initContainers and volumes, now handled via custom init script in image
- build backend and frontend with cacheing using GHA cache)
- streamline frontend image to reduce layers
- setup local swarm with test/setup.sh script, wait for containers to init
- copy sample config files as default (add storages.sample.yaml)
- add initial backend test for logging in with default superadmin credentials via 127.0.0.1:9871
- must use 127.0.0.1 instead of localhost for accessing frontend container within action
- use python-on-whale to use docker cli api directly, creating docker stack for each crawl or profile browser
- configure storages via storages.yaml secret
- add crawl_job, profile_job, splitting into base and k8s/swarm implementations
- split manager into base crawlmanager and k8s/swarm implementations
- swarm: load initial scale from db to avoid modifying fixed configs, in k8s, load from configmap
- swarm: support scheduled jobs via swarm-cronjob service
- remove docker dependencies (aiodocker, apscheduler, scheduling)
- swarm: when using local minio, expose via /data/ route in nginx via extra include (in k8s, include dir is empty and routing handled via ingress)
- k8s: cleanup minio chart: move init containers to minio.yaml
- swarm: stateful set implementation to be consistent with k8s scaling:
- don't use service replicas,
- create a unique service with '-N' appended and allocate unique volume for each replica
- allows crawl containers to be restarted w/o losing data
- add volume pruning background service, as volumes can be deleted only after service shuts down fully
- watch: fully simplify routing, route via replica index instead of ip for both k8s and swarm
- rename network btrix-cloud-net -> btrix-net to avoid conflict with compose network
- use statefulsets instead of deployments for mongo, redis, signer
- use k8s job + statefulset for running crawls
- use separate statefulset for crawl (scaled) and single-replica redis stateful set
- move crawl job update login to crawl_updater
- remove shared redis chart
package refactor:
- move to shared code to 'btrixcloud'
- move k8s to 'btrixcloud.k8s'
- move docker to 'btrixcloud.docker'
* ensure editing other config options does not lose profile
* support adding/editing/removing profile of existing config
* when duplicating config, ensure profile setting is also copied in the duplicate
- delete browser profile, if not in use
- if in use, show error message, listing crawl configs that use the profile
- backend: fix check for confirming profile deletion
* apply /api prefix consistently, both directly through backend and when accessing via frontend, fixes#236
* docs: update local deployment docs to use 9871 instead of 8000, don't expose 8000 by default
* schemas: don't include /openapi.json as /healthz in documentation, keep /healthz at root
* k8s: route backend to /api without additional rewriting
* add profile creation, list endpoints at /archives/<aid>/profiles
* add profile browser creation, get, ping, commit, delete endpoints at /archives/<aid>/profiles/browser
* support creation of profile browser using browsertrix-crawler 'create-login-profile' in docker and k8s
* ensure profile browser expires after set time, k8s job or docker container automatically deleted on exit
* profile browser creation returns temporary browser id, or `{"detail": "waiting_for_browser"}` while waiting for browser container init
* nginx frontend: proxy /loadbrowser/ to port 9223 in browsertrix-crawler, connecting directly to chrome devtools
* profile api auth: use redis for auth
- store browserid->archiveid and browserid->browser ip mapping in redis
- browser apis: ensure profile browser is associated with specified archive
- browser ws: pass arcchiveid and browserid to ws query args, browserid is part of archive, and browserid corresponds to specified ip
* store profiles in /profiles/ directory in default storage, include profileid in profile tar.gz filename
* support profile in crawlconfig:
- add profileid to CrawlConfig, and profileName to CrawlConfigOut
- support resolving profile path via profileid, setting '--profile @{path/to/profile.tar.gz}' for crawler (assuming same storage for profile as output for now) in both docker and k8s setups
- docker: support out_filename, custom wacz output filename missing functionality
* backend api
- superadmin has admin access to all archives
- new superadmin endpoints: /archives/all/crawls and /archives/all/crawls/<crawl_id>.json for list all running crawls
and loading crawl data by id
- frontend superadmin view (fixes#201)
* show all archives on superadmin home page
* show jump to crawl for super admin (#200)
* navbar links for: all archives, all running crawls and jump to crawl
Co-authored-by: sua yoo <sua@suayoo.com>
* frontend-tweaks:
- treat 'starting' state same as 'running'
- default to no schedule instead of weekly for default
- add 'Domain' scopeType
* backend: also allow 'domain' as a scopeType
* watch work: proxy directly to crawls instead of redis pubsub
- add 'watchIPs' to crawl detail output
- cache crawl ips for quick access for auth
- add '/ipaccess/{ip}' endpoint for watch ws connection to ensure ws has access to the specified container ip
- enable 'auth_request' in nginx frontend
- requirements: update to latest redis-py
remaining fixes for #134
* frontend docker build: pass GIT_COMMIT_HASH and GIT_BRANCH_NAME as env vars to remove dependency on git in webpack.config.js (for glitchtip)
fixes#150
* default to "unknown" if git and env vars not available
* add comment about error reporting for local use
Co-authored-by: sua yoo <sua@suayoo.com>
* backend support for new watch system (#134):
- support for watch via redis pubsub and websocket connection to backend
- can support watch from any number of crawler instances to support scaled crawls
- use /archives/{aid}/crawls/{crawl_id}/watch/ws websocket endpoint
- ws: ignore graceful connectionclosedok exception, log other exceptions
- set logging to info to instead of debug for now (debug logs all ws traffic)
- remove old watch apis in backend
- remove old websocket routing to crawler instance for old watch system
- oauth bearer check: support websockets, use websocket object if no request object
- crawler args: replace --screencastPort with --screencastRedis
* support for replay via replayweb.page embed, fixes#124
backend:
- pre-sign all files urls
- cache pre-signed urls in redis, presign again when expired (default duration 3600, settable via PRESIGN_DURATION_SECONDS env var)
- change files output -> resources to confirm to Data Package spec supported by replayweb.page
- add CrawlFileOut which contains 'name' (file id), 'path' (presigned url), 'hash', and 'size'
- add /replay/sw.js endpoint to import sw.js from latest replay-web-page release
- update to fastapi-users 9.2.2
- customize backend auth to allow authentication to check 'auth_bearer' query arg if 'Authorization' header not set
- remove sw.js endpoint, handling in frontend
frontend:
- add <replay-web-page> to frontend, include rwp ui.js from latest release in index.html for now
- update crawl api endpoint to end in json
- replay-web-page loads the api endpoint directly!
- update Crawl type to use new format, 'resources' -> instead of 'files', each file has 'name' and 'path'
- nginx: add endpoint to serve the replay sw.js endpoint
- add defer attr to ui.js
- move 'Download' to 'Download Files'
* frontend: support customizing replayweb.page loading url via RWP_BASE_URL env var in Dockerfile
- default prod value set in frontend Dockerfile (set to upcoming 1.5.8 release needed for multi-wacz-file support) (can be overridden during image build via --build-arg)
- rename index.html -> index.ejs to allow interpolation
- RWP_BASE_URL defaults to latest https://replayweb.page/ for testing
- for local testing, add sw.js loading via devServer, also using RWP_BASE_URL (#131)
Co-authored-by: sua yoo <sua@suayoo.com>
- Cancel and stop crawl
- Sorts crawls by start time, status and crawl template ID
- Filters crawls by crawl template ID
- Adds shortcut to copy template ID
frontend:
- add checkbox to basic crawl config component which sets 'extraHops' to 1, otherwise to 0
- text tweaks: rename Scope Type -> Crawl Scope, capitalization
backend: add 'extraHops' to CrawlConfig
fixes#102
* optimizing frontend dockerfile:
- run install first to cache node_modules
- don't pass node_modules to image
- add only needed files before build
* remove language file generation from build step
Co-authored-by: sua yoo <sua@suayoo.com>
- adapt nginx config to work both in docker and k8s, using env vars to set urls
backend: additional fixes:
- use env vars with nginx config
- fix settings api route
- when sending e-mail, use the Host header for verification urls when available
- prepare Dockerfile with full build from scratch in image, (disabled 'yarn install' for faster builds for now)
- fix accept invite api for existing user to /archives/accept-invite/{token}
- Leverage webpack chunk splitting to creating more, smaller JS files rather than one large main file (import(file) syntax)
- Enable long-term caching by adding content hash to output file names
- Copy entire /dist folder contents in Dockerfile
- Changed yarn start-dev -> yarn start since there is no prod server
- Reenable locale picker
* support running backend + frontend together on k8s
* split nginx container into separate frontend service, which uses nignx-base image and the static frontend files
* add nginx-based frontend image to docker-compose build (for building only, docker-based combined deployment not yet supported)
* backend:
- fix paths for email templates
- chart: support '--set backend_only=1' and '--set frontend_only=1' to only force deploy one or the other
- run backend from root /api in uvicorn
- Show toast alert when user is verified
- Redirect to correct page on verified
- Update already-logged in user info on verify
- Adds new toast component
closes#39