- collections defined by name per archive
- can update collections with additional metadata (currently just description)
- crawl config api accepts a list of collections by name, resolved to collection uids and stored in config
- finished crawls also associated with collection list
- /archives/{aid}/collections/{name} can list all crawl artifacts (wacz files) from a named collection (in frictionless data package-ish format)
- /archives/{aid}/collections/$all lists all crawled artifacts for the archive

readiness check: add /healthz endpoints for app and nginx

ingress: add /data/ route to local bucket storage

improvements:
- for default storages, store path only, and prepend default storage access endpoint
- collections api returns the paths using the storage access endpoint
- define default storages as secrets in k8s (can support multiple), hard-coded in docker (only one for now)
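
The "store path only, and prepend default storage access endpoint" improvement can be illustrated with a minimal sketch. The helper name `resolve_access_url` and the path `crawls/example.wacz` are hypothetical and chosen only for illustration; the backend's actual function names and storage layout are not shown in this commit message. The endpoint value is the `STORE_ACCESS_ENDPOINT_URL` from the local Docker env settings.

```python
def resolve_access_url(stored_path: str, access_endpoint: str) -> str:
    """Prepend a storage access endpoint to a path-only storage reference.

    Hypothetical helper: assumes the stored value is a path relative to
    the default storage, as the commit message describes.
    """
    # Normalize so exactly one "/" separates the endpoint and the path.
    return access_endpoint.rstrip("/") + "/" + stored_path.lstrip("/")


# Value of STORE_ACCESS_ENDPOINT_URL in the local Docker env settings.
ACCESS_ENDPOINT = "http://localhost:9000/test-bucket/"

# "crawls/example.wacz" is an assumed example path, not a real artifact.
print(resolve_access_url("crawls/example.wacz", ACCESS_ENDPOINT))
```

Keeping only the path in the database means the public endpoint can change (for example, from `localhost` to a real hostname) without rewriting stored records.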
# Env Settings (for local Docker Deployment)

MONGO_HOST=mongo
PASSWORD_SECRET=change_me

MONGO_INITDB_ROOT_USERNAME=root
MONGO_INITDB_ROOT_PASSWORD=example

MINIO_ROOT_USER=ADMIN
MINIO_ROOT_PASSWORD=PASSW0RD

STORE_ENDPOINT_URL=http://minio:9000/test-bucket/
STORE_ACCESS_ENDPOINT_URL=http://localhost:9000/test-bucket/
STORE_ACCESS_KEY=ADMIN
STORE_SECRET_KEY=PASSW0RD

MC_HOST_local=http://ADMIN:PASSW0RD@minio:9000

REDIS_URL=redis://redis/0

# enable to send verification emails
#EMAIL_SMTP_HOST=smtp.gmail.com
#EMAIL_SMTP_PORT=587
#EMAIL_SENDER=user@example.com
#EMAIL_PASSWORD=password

# Browsertrix Crawler image to use
CRAWLER_IMAGE=webrecorder/browsertrix-crawler

CRAWL_ARGS="--timeout 90 --logging stats,behaviors,debug --generateWACZ --screencastPort 9037"
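
As a rough illustration of how settings like these might be consumed, the sketch below parses `KEY=VALUE` lines and splits `CRAWL_ARGS` into an argument list. The `parse_env` helper is hypothetical and deliberately simplified; it is not how Docker or the backend loads this file, and real env-file loaders may treat quoting differently.

```python
import shlex


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments.

    Hypothetical, simplified parser for illustration only.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so values may contain "=".
        key, _, value = line.partition("=")
        # Strip one pair of surrounding double quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        settings[key.strip()] = value
    return settings


sample = '''
# Browsertrix Crawler image to use
CRAWLER_IMAGE=webrecorder/browsertrix-crawler
CRAWL_ARGS="--timeout 90 --logging stats,behaviors,debug --generateWACZ --screencastPort 9037"
'''

settings = parse_env(sample)
# shlex.split tokenizes the argument string the way a POSIX shell would.
crawl_args = shlex.split(settings["CRAWL_ARGS"])
print(settings["CRAWLER_IMAGE"])
print(crawl_args)
```

Splitting with `shlex` rather than `str.split` keeps any quoted argument containing spaces intact, should one be added to `CRAWL_ARGS` later.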