* Add API endpoint that lists pending invites for all orgs (superuser-only)
* Add API endpoint that lists pending invites for org
* Add user emails to /api/orgs/<oid> response
Users should only be added as to the default org with Owner permissions
if they are not specifically being invited to another org. This commit
fixes the logic in the post-registration callback to make this the case.
* rename config values
- api -> backend
- nginx -> frontend
* job pods:
- set job_pull_policy from api_pull_policy (same as backend image)
- default to Always, but can be overridden for local deployment (same as backend image)
typo fix: CRAWL_NAMESPACE -> CRAWLER_NAMESPACE (part of #491)
ansible: set default label to :latest instead of :dev for
* Rename archives to orgs and aid to oid on backend
* Rename archive to org and aid to oid in frontend
* Remove translation artifact
* Rename team -> organization
* Add database migrations and run once on startup
* This commit also applies the new by_one_worker decorator to other
asyncio tasks to prevent heavy tasks from being run in each worker.
* Run black, pylint, and husky via pre-commit
* Set db version and use in migrations
* Update and prepare database in single task
* Migrate k8s configmaps
- mongodb: support passwords with '@' by escaping mongo username and password
- superadmin: update superadmin email and password after initial creation if updated in helm values
- add tags to crawl object
- ensure tags are copied from crawlconfig to crawl when crawl is created (both manually and scheduled)
- tests: add test to ensure tags added to crawl, remove redundant wait replaced with fixtures
- Add default switch to Archive (org) model
- Set default org name via values.yaml
- Add check to ensure only one org with default org name exists
- Stop creating new orgs for new users
- Add new API endpoints for creating and renaming orgs (part of #457)
- Make Archive.name unique via index
- Wait for db connection on init, log if waiting
- Make archive-less invites invite user to default org with Owner role
- Rename default org from chart value if changed
- Don't create new org for invited users
backend: object filtering:
- add filtering crawls, crawlconfigs and profiles by userid= query arg, fixes#460
- add filtering crawls by crawlconfig via cid= query arg, fixes#400
- tests: add test_filter_results test suite to test filtering crawls and crawlconfigs by user, also create user with 'crawler' permissions, run second crawl with that user.
* backend: initial tags api support (addresses #365):
- add 'tags' field to crawlconfig (array of strings)
- allow querying crawlconfigs to specify multiple 'tag' query args, eg. tag=A&tag=B
- add /archives/<aid>/crawlconfigs/tags api to query by distinct tag, include index on aid + tag
tests: add tests for adding configs, querying by tags
tests: fix fixtures to retry login if initial attempts fails, use test seed of https://webrecorder.net instead of https://example.com/
* profile browser vnc support + fixes:
- switch profile browser rendering to use VNC
- frontend: add @novnc/novnc as dependency, create separate bundle novnc.js to load into vnc browser (to avoid loading from each container)
- frontend: update proxy paths to proxy websocket, index page to crawler
- frontend: allow browser profiles in all browsers, remove browser compatibility check
- frontend: update webpack dev config, apply prettier
- frontend: node version fix
- backend: get vncpassword, build new URL for proxying to crawler iframe
- backend: fix profile / crawl job pull policy from 'Always' -> 'Never', should use existing image for job
- backend: fix kill signal to use bash -c to work with latest backend image
- backend/chart: add 'profile_browser_timeout_seconds' to chart values to control how long profile browser to remain when idle (default to 60)
- backend: remove utils.py, now using secret.token_hex() for random suffix
Co-authored-by: sua yoo <sua@suayoo.com>
* Use archive_viewer_dep permissions to GET crawls
* Add is_viewer check to archive_dep
* Add API endpoint to add new user to archive directly (/archive/<id>/add-user)
* Add tests
* Refactor tests to use fixtures
* And remove login test that duplicates fixtures
* backend: crawl info apis:
- add /crawls/{crawl_id} api endpoint which just lists the crawl info, without resolving the individual files
- move /crawls/{crawl_id}.json -> /crawls/{crawl_id}/replay.json for clarity that it's used for replay
* frontend: update api for new replay.json endpoint
- fix typos in docs
- update prod deployment info
- update minikube info
- add info on how to run with local images
- bump version to 1.1.0-beta.3 for testing multiarch build
* k8s local deployment work:
- make it easier to deploy w/o ingress by setting 'local_service_port' (suggested port 30870)
- if using local minio, ensure file endpoints set to /data/ and /data/ proxies correctly to local bucket
- if not using minio, ensure file endpoints point to correct access / endpoint url.
- setup should work with docker desktop, minikube, microk8s and k3s!
- nginx chart: bump nginx memory limit to 20Mi
- nginx image: 00-default-override-resolver-config -> 00-browsertrix-nginx-init for clarity
- nginx image: use default nginx.conf, pin to nginx 1.23.2
- mongo: readd readiness probe, bump connect wait timeout (needed for ci)
- config: set superadmin username to 'admin'
- config schema: set 'name' as required
- add sample chart values overrides:
- chart values: local-config.yaml for running locally with 'local_service_port'
- chart values: add microk8s-hosted.yaml for configuring a hosted microk8s setup
- chart values: add microk8s-ci.yaml for ci tests
- ci: remove docker swarm tests
- ci: add microk8s integration tests: launching cluster, logging in, running a crawl of example.com, downloading/checking WACZ
- bump to 1.1.0-beta.2
- ingress: fix proxying /data to minio, use another ingress which proxies correct host to ensure presigned urls work
- presigning: determine if signing endpoint url (minio) or access endpoint (cloud bucket) based on if access endpoint is provided, set bool on storage object
- chart: fix indent on incorrect storageClassName configs
- ingress: make 'ingress_class' configurable (set to 'public' for microk8s, default to 'nginx')
- minio: use older minio image which supports legacy fs based setup (for now)
- nginx service: add 'nginx_service_use_node_port' config setting: if true, will use NodePort for frontend,
other will use default (ClusterIP) and only for the frontend / nginx
- chart: remove changing service type for other services
- Adds version to version.txt in root
- adds update-version.sh which updates version in frontend/package.json and backend/btrixcloud/version.py
- frontend: loads version from $VERSION env var, ../version.txt or package.json
- ci: on new github release, pushes webrecorder/browsertrix-backend and webrecorder/browsertrix-frontend images to Dockerhub with current version, as well as latest.
- version set to 1.1.0-beta.0
- closes#357
* add exclusion api, fixes#311
add new apis: `POST crawls/{crawl_id}/exclusion?regex=...` and `DELETE crawls/{crawl_id}/exclusion?regex=...` which will:
- create new config with add 'regex' as exclusion (deleting or making inactive previous config) OR remove as exclusion.
- update crawl to point to new config
- update statefulset to point to new config, causing crawler pods to restart
- filter out urls matching 'regex' from both queue and seen list (currently a bit slow) (when adding only)
- return 400 if exclusion already existing when adding, or doesn't exist when removing
- api reads redis list in reverse to match how exclusion queue is used
* crawl queue api work: (#329)
- add api to /crawls/{crawl_id}/queue api to get crawl queue, with offset, count, and optional regex. returns results and regex matches within the results, along with total urls in queue.
- add api to match entire crawl queue, /crawls/{crawl_id}/queueMatch with query 'regex' arg, which processes entire crawl queue on backend and returns a list of matches (more experimental)
- if crawl not yet started / redis not available, return empty queue
- only supported for k8s deployment at the moment
At the moment picking "custom" yields a UI error:
```
scopeType: value is not a valid enumeration member; permitted: 'page', 'page-spa', 'prefix', 'host', 'domain', 'any'
```
- regression fix: ensure correct signals are set to stop crawl (SIGUSER1 + SIGTERM)
- crawl stop: if crawl is still running after 60 seconds, allow signal to be resent
- regression fix: ensure crawling with profile is working in k8s
- ensure profile browser DELETE command is working
- ensure profile browser job expires if no initial ping
- logging: print exception for base job if init fails