Commit Graph

15 Commits

Author SHA1 Message Date
Ilya Kreymer
2967f1e320
ingress: simplify ingress config: (fixes #1135) (#1146)
* ingress: simplify ingress config: (fixes #1135)
- use standard Prefix pathTypes
- remove nginx-specific rewriting
- remove 'scheme', use https/http based on 'tls' setting (in ingress and configmap)
- fix signing ingress to use ingressClassName
2023-09-07 09:51:48 -07:00
Ilya Kreymer
63b776bce8
ingress: minor tweaks to ingress to update to latest spec: (#1096)
- use pathType ImplementationSpecific for regexes
- use ingressClassName instead of annotation
2023-08-23 11:36:52 -07:00
Ilya Kreymer
9553115bbe
helm chart tweaks: (#1067)
* helm chart tweaks:
- lower mem requirements for backend and crawler
- disable cors in ingress to pass through cors headers from backend
- crawler statefulset: use ordered instead of parallel scaling policy to avoid single crawl taking up all crawling capacity quickly
2023-08-14 16:43:12 -07:00
Ilya Kreymer
00eb62214d
Uploads API: BaseCrawl refactor + Initial support for /uploads endpoint (#937)
* basecrawl refactor: make crawls db more generic, supporting different types of 'base crawls': crawls, uploads, manual archives
- move shared functionality to basecrawl.py
- create a base BaseCrawl object, which contains start / finish time, metadata and files array
- create BaseCrawlOps, base class for CrawlOps, which supports base crawl deletion, querying and collection add/remove

* uploads api: (part of #929)
- new UploadCrawl object which extends BaseCrawl, has name and description
- support multipart form data data upload to /uploads/formdata
- support streaming upload of a single file via /uploads/stream, using botocore multipart upload to upload to s3-endpoint in parts
- require 'filename' param to set upload filename for streaming uploads (otherwise use form data names)
- sanitize filename, place uploads in /uploads/<uuid>/<sanitized-filename>-<random>.wacz
- uploads have internal id 'upload-<uuid>'
- create UploadedCrawl object with CrawlFiles pointing to the newly uploaded files, set state to 'complete'
- handle upload failures, abort multipart upload
- ensure uploads added within org bucket path
- return id / added when adding new UploadedCrawl
- support listing, deleting, and patch /uploads
- support upload details via /replay.json to support for replay
- add support for 'replaceId=<id>', which would remove all previous files in upload after new upload succeeds. if replaceId doesn't exist, create new upload. (only for stream endpoint so far).
- support patching upload metadata: notes, tags and name on uploads (UpdateUpload extends UpdateCrawl and adds 'name')

* base crawls api: Add /all-crawls list and delete endpoints for all crawl types (without resources)
- support all-crawls/<id>/replay.json with resources
- Use ListCrawlOut model for /all-crawls list endpoint
- Extend BaseCrawlOut from ListCrawlOut, add type
- use 'type: crawl' for crawls and 'type: upload' for uploads
- migration: ensure all previous crawl objects / missing type are set to 'type: crawl'
- indexes: add db indices on 'type' field and with 'type' field and oid, cid, finished, state

* tests: add test for multipart and streaming upload, listing uploads, deleting upload
- add sample WACZ for upload testing: 'example.wacz' and 'example-2.wacz'

* collections: support adding and remove both crawls and uploads via base crawl
- include collection_ids in /all-crawls list
- collections replay.json can include both crawls and uploads

bump version to 1.6.0-beta.2
---------

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2023-07-07 09:13:26 -07:00
DongWoo Lee
b03b848ec2 have ingress for signer only when it is enabled 2023-01-06 14:06:33 -08:00
Ilya Kreymer
82ffc0dfbc
Local Deployment Work: Support running locally + test cluster on CI (#396)
* k8s local deployment work:
- make it easier to deploy w/o ingress by setting 'local_service_port' (suggested port 30870)
- if using local minio, ensure file endpoints set to /data/ and /data/ proxies correctly to local bucket
- if not using minio, ensure file endpoints point to correct access / endpoint url.
- setup should work with docker desktop, minikube, microk8s and k3s!
- nginx chart: bump nginx memory limit to 20Mi
- nginx image: 00-default-override-resolver-config -> 00-browsertrix-nginx-init for clarity
- nginx image: use default nginx.conf, pin to nginx 1.23.2
- mongo: readd readiness probe, bump connect wait timeout (needed for ci)
- config: set superadmin username to 'admin'
- config schema: set 'name' as required 
- add sample chart values overrides:
- chart values: local-config.yaml for running locally with 'local_service_port'
- chart values: add microk8s-hosted.yaml for configuring a hosted microk8s setup
- chart values: add microk8s-ci.yaml for ci tests
- ci: remove docker swarm tests
- ci: add microk8s integration tests: launching cluster, logging in, running a crawl of example.com, downloading/checking WACZ
- bump to 1.1.0-beta.2
2022-12-02 19:58:34 -08:00
Ilya Kreymer
aabb0b2a92
chart / deployment fixes to run on microk8s: (fixes #385) (#387)
- ingress: fix proxying /data to minio, use another ingress which proxies correct host to ensure presigned urls work
- presigning: determine if signing endpoint url (minio) or access endpoint (cloud bucket) based on if access endpoint is provided, set bool on storage object
- chart: fix indent on incorrect storageClassName configs
- ingress: make 'ingress_class' configurable (set to 'public' for microk8s, default to 'nginx')
- minio: use older minio image which supports legacy fs based setup (for now)
- nginx service: add 'nginx_service_use_node_port' config setting: if true, will use NodePort for frontend,
other will use default (ClusterIP) and only for the frontend / nginx
- chart: remove changing service type for other services
2022-11-30 09:21:58 -08:00
Ilya Kreymer
dde4c5ee68 k8s chart:
ingress: use separate ingress for authsign to allow ssl-redirect true on main ingress
mongo: local: disable readiness check for now due to issues with eval command (for now)
2022-10-15 13:46:31 -07:00
Ilya Kreymer
c023fe7c9a
Backend API prefix (#240)
* apply /api prefix consistently, both directly through backend and when accessing via frontend, fixes #236

* docs: update local deployment docs to use 9871 instead of 8000, don't expose 8000 by default

* schemas: don't include /openapi.json as /healthz in documentation, keep /healthz at root

* k8s: route backend to /api without additional rewriting
2022-05-31 19:29:20 -07:00
Ilya Kreymer
2e2b8b329d
Add signing server via authsign (k8s only) (#107)
- add k8s deployment of signing server, if 'signer.enabled' chart value if set
- update ingress to provide access for 'signer.host' if signing server enabled to verify domain, run signing server itself on different port (also turn off ssl redirects to support signing server)
- set WACZ_SIGN_URL and WACZ_SIGN_TOKEN (supported in browesertrix-crawler 0.5.0)
- authsign deployment uses a volume to store current certs
- add sample signer block, with signing disabled by default
2022-01-26 23:27:13 -08:00
Ilya Kreymer
05c1129fb8
Frontend + Backend Integrated Deployment (K8s only) (#45)
* support running backend + frontend together on k8s
* split nginx container into separate frontend service, which uses nignx-base image and the static frontend files
* add nginx-based frontend image to docker-compose build (for building only, docker-based combined deployment not yet supported)

* backend:
- fix paths for email templates
- chart: support '--set backend_only=1' and '--set frontend_only=1' to only force deploy one or the other
- run backend from root /api in uvicorn
2021-12-03 10:17:22 -08:00
Ilya Kreymer
3d4d7049a2
Misc backend fixes for cloud deployment (#26)
* misc backend fixes:
- fix running w/o local minio
- ensure crawler image pull policy is configurable, loaded via chart value
- use digitalocean repo for main backend image (for now)
- add bucket_name to config only if using default bucket

* enable all behaviors, support 'access_endpoint_url' for default storages

* debugging: add 'no_delete_jobs' setting for k8s and docker to disable deletion of completed jobs
2021-11-25 11:58:26 -08:00
Ilya Kreymer
57a4b6b46f add collections api:
- collections defined by name per archive
- can update collections with additional metadata (currently just description)
- crawl config api accepts a list of collections by name, resolved to collection uids and stored in config
- finished crawls also associated with collection list
- /archives/{aid}/collections/{name} can list all crawl artifacts (wacz files) from a named collection (in frictionless data package-ish format)
- /archives/{aid}/collections/$all lists all crawled artifacts for the archive

readiness check: add /healthz endpoints for app and nginx
ingress: add /data/ route to local bucket

storage improvements:
- for default storages, store path only, and prepend default storage access endpoint
- collections api returns the paths using the storage access endpoint
- define default storages as secrets in k8s (can support multiple), hard-coded in docker (only one for now)
2021-10-27 09:39:14 -07:00
Ilya Kreymer
4ae4005d74 add ingress + nginx container for better routing
support screencasting to dynamically created service via nginx (k8s only thus far)
add crawl /watch endpoint to enable watching, creates service if doesn't exist
add crawl /running endpoint to check if crawl is running
nginx auth check in place, but not yet enabled
add k8s nginx.conf
add missing chart files
file reorg: move docker config to configs/
k8s: add readiness check for nginx and api containers for smoother reloading
ensure service deleted along with job
todo: update dockerman with screencast support
2021-10-09 23:47:29 -07:00
Ilya Kreymer
a111bacfb5 add k8s support
- working apis for adding crawls, removing crawls in mongo, mapped to k8s cronjobs
- more complete crawl spec
- option to start on-demand job from cronjobs
- optional minio in separate deployment/service
2021-06-30 21:48:44 -07:00