browsertrix/backend
Ilya Kreymer 8ae032ff88 More friendly WARC prefix inside WACZ based on Org slug + Crawl Name / First Seed URL. (#1537)
Supports setting the WARC prefix for WARCs inside a WACZ to
`<org slug>-<slug [crawl name | first seed host]>`.
- The prefix is set via the `WARC_PREFIX` env var, supported in
browsertrix-crawler 1.0.0-beta.4 or higher
If a crawl name is provided, the crawl name is used; otherwise the
hostname of the first seed is used. The name is slugified: lowercase
alphanumeric characters separated by dashes.
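
A minimal sketch of the prefix rule described above, not the actual backend code. The helper names `slugify` and `warc_prefix` are hypothetical, and the exact slug rules (for example, how non-ASCII characters are handled) are an assumption. Only the `<org slug>-<slug>` part is modelled, not whatever the crawler appends after it.

```python
import re
from typing import Optional
from urllib.parse import urlsplit


def slugify(text: str) -> str:
    """Lowercase the text and join runs of ASCII alphanumerics with dashes."""
    # Assumption: every run of other characters collapses to a single dash.
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def warc_prefix(org_slug: str, crawl_name: Optional[str], first_seed_url: str) -> str:
    """Build '<org slug>-<slug of crawl name, else first seed hostname>'."""
    # Prefer the crawl name; fall back to the hostname of the first seed URL.
    source = crawl_name or (urlsplit(first_seed_url).hostname or "")
    return f"{org_slug}-{slugify(source)}"


if __name__ == "__main__":
    org = slugify("Default Org")
    print(warc_prefix(org, None, "https://specs.webrecorder.net/"))
    print(warc_prefix(org, "SPECS", "https://specs.webrecorder.net/"))
```

In this sketch the resulting string would be the value passed to the crawler through `WARC_PREFIX`.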

Ex: in an organization called `Default Org`, a crawl of
`https://specs.webrecorder.net/` with no name will have WARCs named
`default-org-specs-webrecorder-net-....warc.gz`.
If the crawl is given the name `SPECS`, the WARCs will be named
`default-org-specs-manual-....warc.gz`.

Fixes #412 in a default way.
2024-02-22 23:54:23 -08:00
..
btrixcloud More friendly WARC prefix inside WACZ based on Org slug + Crawl Name / First Seed URL. (#1537) 2024-02-22 23:54:23 -08:00
test Format backend with Black 24 (#1507) 2024-02-07 11:35:34 -08:00
test_nightly Add extra and gifted execution minutes (#1361) 2023-12-07 14:34:37 -05:00
.pylintrc
Dockerfile Backend mem usage fix - use fixed MOTOR_MAX_WORKERS + switch to gunicorn (#1468) 2024-01-16 15:32:42 -08:00
mypy.ini Support multiple crawler versions (#1420) 2024-01-16 15:32:12 -08:00
requirements.txt Backend mem usage fix - use fixed MOTOR_MAX_WORKERS + switch to gunicorn (#1468) 2024-01-16 15:32:42 -08:00
test-requirements.txt Add slugs to org backend (#1250) 2023-10-10 18:30:09 -07:00