- collections defined by name per archive
- collections can be updated with additional metadata (currently just description)
- crawl config API accepts a list of collections by name, resolved to collection uids and stored in the config
- finished crawls are also associated with the collection list
- /archives/{aid}/collections/{name} lists all crawl artifacts (WACZ files) from a named collection (in a Frictionless Data Package-like format)
- /archives/{aid}/collections/$all lists all crawled artifacts for the archive

readiness check: add /healthz endpoints for app and nginx

ingress: add /data/ route to local bucket

storage improvements:
- for default storages, store path only, and prepend the default storage access endpoint
- collections API returns the paths using the storage access endpoint
- define default storages as secrets in k8s (can support multiple), hard-coded in docker (only one for now)
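A minimal sketch of how the collection listing and the "store path only" change could fit together. It assumes, hypothetically, that each stored WACZ record keeps a relative `filename`, a `hash`, a `size`, the name of the default storage it lives in, and the uids of its collections, and that each default storage has a base access URL. `CrawlFile`, `resolve_url` and `collection_package` are illustrative names, not the repository's actual code; the output only loosely follows the Frictionless Data Package `resources` layout.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple
from urllib.parse import urljoin


@dataclass
class CrawlFile:
    # Hypothetical record for one WACZ artifact of a finished crawl.
    filename: str  # for default storages: path only, relative to the bucket
    hash: str
    size: int
    def_storage_name: Optional[str] = None  # None means a custom storage
    collections: Tuple[str, ...] = ()  # collection uids for this crawl


def resolve_url(file: CrawlFile, access_endpoints: Dict[str, str]) -> str:
    """Prepend the default storage's access endpoint to a stored path."""
    if file.def_storage_name is None:
        # Assumption: custom storages keep the full URL as stored.
        return file.filename
    base = access_endpoints[file.def_storage_name]
    # Normalize so there is exactly one slash between endpoint and path.
    return urljoin(base.rstrip("/") + "/", file.filename.lstrip("/"))


def collection_package(
    name: str,
    files: List[CrawlFile],
    access_endpoints: Dict[str, str],
    coll_uid: Optional[str] = None,
) -> dict:
    """Build a data-package-like dict of WACZ resources.

    coll_uid=None lists every file, mirroring the `$all` listing.
    """
    resources = [
        {
            "name": f.filename.rsplit("/", 1)[-1],
            "path": resolve_url(f, access_endpoints),
            "hash": f.hash,
            "bytes": f.size,
        }
        for f in files
        if coll_uid is None or coll_uid in f.collections
    ]
    return {"profile": "data-package", "name": name, "resources": resources}


if __name__ == "__main__":
    endpoints = {"default": "https://storage.example.com/data/"}
    files = [
        CrawlFile("aid1/crawl-1.wacz", "abc", 100, "default", ("coll-1",)),
        CrawlFile("aid1/crawl-2.wacz", "def", 200, "default"),
    ]
    pkg = collection_package("my-coll", files, endpoints, "coll-1")
    # prints https://storage.example.com/data/aid1/crawl-1.wacz
    print(pkg["resources"][0]["path"])
```

Keeping only the relative path in the database means the public endpoint can change (for example, a different ingress route) without rewriting stored records; the URL is built at read time.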
| File |
|---|
| archives.py |
| colls.py |
| crawlconfigs.py |
| crawls.py |
| db.py |
| Dockerfile |
| dockerman.py |
| emailsender.py |
| k8sman.py |
| main.py |
| requirements.txt |
| scheduler.py |
| storages.py |
| users.py |
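The commit message says the crawl config API accepts collections by name, resolves them to collection uids, and stores the uids in the config. A minimal in-memory sketch of that resolution step, assuming collection names are unique within an archive. `CollectionRegistry` and `make_crawl_config` are hypothetical names; the real backend presumably keeps collections in its database rather than in a dict.

```python
import uuid
from typing import Dict, List, Optional, Tuple


class CollectionRegistry:
    """Hypothetical in-memory stand-in for per-archive named collections."""

    def __init__(self) -> None:
        # (archive id, collection name) -> collection record
        self._colls: Dict[Tuple[str, str], Dict[str, str]] = {}

    def add(self, aid: str, name: str, description: Optional[str] = None) -> str:
        """Create a collection, rejecting duplicate names in one archive."""
        key = (aid, name)
        if key in self._colls:
            raise ValueError(f"collection {name!r} already exists in {aid!r}")
        uid = str(uuid.uuid4())
        self._colls[key] = {"id": uid, "name": name, "description": description or ""}
        return uid

    def update_description(self, aid: str, name: str, description: str) -> None:
        """Update the only extra metadata field mentioned: the description."""
        self._colls[(aid, name)]["description"] = description

    def resolve(self, aid: str, names: List[str]) -> List[str]:
        """Map collection names to uids; unknown names are an error."""
        missing = [n for n in names if (aid, n) not in self._colls]
        if missing:
            raise KeyError(f"unknown collections: {missing}")
        return [self._colls[(aid, n)]["id"] for n in names]


def make_crawl_config(
    registry: CollectionRegistry, aid: str, config: dict, colls: List[str]
) -> dict:
    """Return a copy of the config with collection names replaced by uids."""
    return {**config, "colls": registry.resolve(aid, colls)}
```

Because lookups are keyed by `(aid, name)`, the same collection name can exist independently in different archives, matching "collections defined by name per archive".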