browsertrix/backend
Ilya Kreymer 57a4b6b46f add collections api:
- collections defined by name per archive
- can update collections with additional metadata (currently just description)
- crawl config api accepts a list of collections by name, resolved to collection uids and stored in config
- finished crawls are also associated with the collection list
- /archives/{aid}/collections/{name} lists all crawl artifacts (WACZ files) from a named collection (in a format loosely based on the Frictionless Data Package)
- /archives/{aid}/collections/$all lists all crawl artifacts for the archive
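
The collection flow described above can be sketched roughly as follows. This is a minimal in-memory illustration, not the code in colls.py: `CollectionStore`, `list_artifacts`, and the record fields (`aid`, `colls`, `files`, `filename`, `size`) are hypothetical names, and the `resources` / `path` / `bytes` keys are only assumed to approximate the "data package-ish" output.

```python
from uuid import uuid4


class CollectionStore:
    """Hypothetical in-memory stand-in for the per-archive collections table."""

    def __init__(self):
        # (archive id, collection name) -> collection record
        self._colls = {}

    def add_or_update(self, aid, name, description=""):
        """Create a collection, or update its description if it already exists."""
        coll = self._colls.setdefault((aid, name), {"id": str(uuid4()), "name": name})
        coll["description"] = description
        return coll["id"]

    def resolve_names(self, aid, names):
        """Map collection names to uids, failing if any name is unknown."""
        missing = [name for name in names if (aid, name) not in self._colls]
        if missing:
            raise KeyError(f"unknown collections: {missing}")
        return [self._colls[(aid, name)]["id"] for name in names]


def list_artifacts(crawls, aid, coll_id=None):
    """List WACZ files for one collection, or for the whole archive if coll_id is None."""
    resources = []
    for crawl in crawls:
        if crawl["aid"] != aid:
            continue
        if coll_id is not None and coll_id not in crawl["colls"]:
            continue
        for wacz in crawl["files"]:
            resources.append(
                {"name": wacz["filename"], "path": wacz["filename"], "bytes": wacz["size"]}
            )
    return {"resources": resources}
```

Passing `coll_id=None` plays the role of the `$all` route here; a real implementation would query the database rather than filter a list.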

readiness check: add /healthz endpoints for app and nginx
ingress: add /data/ route to local bucket

storage improvements:
- for default storages, store only the path, and prepend the default storage access endpoint
- collections api returns the paths using the storage access endpoint
- define default storages as secrets in k8s (can support multiple), hard-coded in docker (only one for now)
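
One way to picture the path handling above, as a sketch under stated assumptions: `CrawlFile`, `def_storage_name`, `resolve_url`, and the example endpoint are hypothetical, not names taken from storages.py. Files in a default storage carry only a relative path, and the access endpoint is prepended when the path is returned.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CrawlFile:
    """Hypothetical record for one crawl artifact."""

    filename: str
    # Name of the default storage holding the file; None means a custom
    # storage, in which case `filename` is assumed to already be a full URL.
    def_storage_name: Optional[str] = None


def resolve_url(crawl_file: CrawlFile, default_storages: Dict[str, str]) -> str:
    """Return the public URL for a crawl file."""
    if crawl_file.def_storage_name is None:
        return crawl_file.filename
    endpoint = default_storages[crawl_file.def_storage_name]
    # Join with exactly one slash between endpoint and stored path.
    return endpoint.rstrip("/") + "/" + crawl_file.filename.lstrip("/")


# Assumed example endpoint, standing in for a k8s secret or the docker default.
storages = {"default": "http://localhost/data/"}
print(resolve_url(CrawlFile("archive1/crawl.wacz", "default"), storages))
```

Storing only the relative path means the endpoint can change (for example, a new ingress host) without rewriting stored crawl records.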
2021-10-27 09:39:14 -07:00
archives.py add collections api: 2021-10-27 09:39:14 -07:00
colls.py add collections api: 2021-10-27 09:39:14 -07:00
crawlconfigs.py add collections api: 2021-10-27 09:39:14 -07:00
crawls.py add collections api: 2021-10-27 09:39:14 -07:00
db.py use redis based queue instead of url for crawl done webhook 2021-10-10 12:18:28 -07:00
Dockerfile
dockerman.py add collections api: 2021-10-27 09:39:14 -07:00
emailsender.py
k8sman.py add collections api: 2021-10-27 09:39:14 -07:00
main.py add collections api: 2021-10-27 09:39:14 -07:00
requirements.txt
scheduler.py
storages.py
users.py add ingress + nginx container for better routing 2021-10-09 23:47:29 -07:00