browsertrix/backend/btrixcloud
Ilya Kreymer 8ae032ff88 More friendly WARC prefix inside WACZ based on Org slug + Crawl Name / First Seed URL. (#1537)
Supports setting WARC prefix for WARCs inside WACZ to `<org slug>-<slug
[crawl name | first seed host]>`.
- Prefix set via WARC_PREFIX env var, supported in browsertrix-crawler
1.0.0-beta.4 or higher
If a crawl name is provided, the crawl name is used; otherwise, the
hostname of the first seed is used. The name is 'slugified', using
lowercase alphanumeric characters separated by dashes.

Ex: in an organization called `Default Org`, a crawl of
`https://specs.webrecorder.net/` with no name will have WARCs named:
`default-org-specs-webrecorder-net-....warc.gz`
If the crawl is given the name `SPECS`, the WARCs will be named
`default-org-specs-manual-....warc.gz`
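
The naming rule above can be sketched roughly as follows. This is only an illustration, not the code from this commit: `slugify` and `warc_prefix` are hypothetical helper names, and the exact slug rules (collapsing every run of non-alphanumeric characters into one dash) are an assumption based on the description.

```python
import re
from urllib.parse import urlparse


def slugify(text: str) -> str:
    """Lowercase the text and join alphanumeric runs with single dashes (assumed rule)."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def warc_prefix(org_slug: str, crawl_name: str, first_seed_url: str) -> str:
    """Build '<org slug>-<slug of crawl name or first seed host>' (hypothetical helper)."""
    # Prefer the crawl name; otherwise fall back to the first seed's hostname.
    name = crawl_name or (urlparse(first_seed_url).hostname or "")
    return f"{org_slug}-{slugify(name)}"


if __name__ == "__main__":
    org = slugify("Default Org")
    print(warc_prefix(org, "", "https://specs.webrecorder.net/"))
    print(warc_prefix(org, "SPECS", "https://specs.webrecorder.net/"))
```

In the real backend the resulting value would be passed to the crawler through the WARC_PREFIX env var; whatever the crawler appends after the prefix (the `....` part of the names above) is not modeled here.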

Fixes #412 in a default way.
2024-02-22 23:54:23 -08:00
migrations Format backend with Black 24 (#1507) 2024-02-07 11:35:34 -08:00
__init__.py
auth.py
background_jobs.py Format backend with Black 24 (#1507) 2024-02-07 11:35:34 -08:00
basecrawls.py Add crawl, upload, and collection delete webhook event notifications (#1363) 2023-11-09 18:19:08 -08:00
colls.py Format backend with Black 24 (#1507) 2024-02-07 11:35:34 -08:00
crawlconfigs.py More friendly WARC prefix inside WACZ based on Org slug + Crawl Name / First Seed URL. (#1537) 2024-02-22 23:54:23 -08:00
crawlmanager.py More friendly WARC prefix inside WACZ based on Org slug + Crawl Name / First Seed URL. (#1537) 2024-02-22 23:54:23 -08:00
crawls.py better handling of failed redis connection + exec time updates (#1520) 2024-02-09 16:14:29 -08:00
db.py Format backend with Black 24 (#1507) 2024-02-07 11:35:34 -08:00
emailsender.py Email Templates (#1375) 2023-11-15 15:22:12 -08:00
invites.py Email Templates (#1375) 2023-11-15 15:22:12 -08:00
k8sapi.py More friendly WARC prefix inside WACZ based on Org slug + Crawl Name / First Seed URL. (#1537) 2024-02-22 23:54:23 -08:00
main_op.py Send email to superuser when background job fails (#1355) 2023-11-08 19:55:59 -08:00
main.py Format backend with Black 24 (#1507) 2024-02-07 11:35:34 -08:00
models.py Format backend with Black 24 (#1507) 2024-02-07 11:35:34 -08:00
operator.py More friendly WARC prefix inside WACZ based on Org slug + Crawl Name / First Seed URL. (#1537) 2024-02-22 23:54:23 -08:00
orgs.py Format backend with Black 24 (#1507) 2024-02-07 11:35:34 -08:00
pagination.py Format backend with Black 24 (#1507) 2024-02-07 11:35:34 -08:00
profiles.py Support multiple crawler versions (#1420) 2024-01-16 15:32:12 -08:00
storages.py storages: use asynccontextmanager instead of sync to close client (#1521) 2024-02-08 08:28:53 -08:00
uploads.py Background Jobs Work (#1321) 2023-11-02 13:02:17 -07:00
users.py Add API endpoints for crawl statistics (#1461) 2024-01-10 13:30:47 -08:00
utils.py Add API endpoints for crawl statistics (#1461) 2024-01-10 13:30:47 -08:00
version.py version: bump to 1.10.0-beta.0 2024-02-20 00:22:29 -08:00
webhooks.py Add crawl, upload, and collection delete webhook event notifications (#1363) 2023-11-09 18:19:08 -08:00
zip.py Format backend with Black 24 (#1507) 2024-02-07 11:35:34 -08:00