browsertrix/backend/btrixcloud
Ilya Kreymer 887cb16146
Allow configurable max pages per crawl in deployment settings (#717)
* backend: max pages per crawl limit, part of fix for #716:
- set 'max_pages_crawl_limit' in values.yaml, default to 100,000
- if set/non-0, automatically set limit if none provided
- if set/non-0, return 400 if adding config with limit exceeding max limit
- return limit as 'maxPagesPerCrawl' in /api/settings
- api: /all/crawls - add runningOnly=0 to show all crawls, default to 1/true (for more reliable testing)
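The limit rules in the bullets above can be sketched in Python, the backend's language. This is a minimal sketch under stated assumptions, not the code from #717. `resolve_page_limit`, `LimitExceededError` and `settings_payload` are hypothetical names. The exception stands in for the HTTP 400 response. Treating a requested limit of 0 the same as "no limit provided" is also an assumption.

```python
from typing import Optional


class LimitExceededError(Exception):
    """Hypothetical stand-in for the HTTP 400 returned on an over-limit config."""

    status_code = 400

    def __init__(self, requested: int, max_limit: int):
        super().__init__(
            f"requested limit {requested} exceeds max_pages_crawl_limit {max_limit}"
        )
        self.requested = requested
        self.max_limit = max_limit


def resolve_page_limit(
    requested: Optional[int], max_pages_crawl_limit: int
) -> Optional[int]:
    """Return the effective page limit for a new crawl config (hypothetical helper).

    - max_pages_crawl_limit == 0: no deployment-wide cap, keep the request as-is.
    - Cap set, no limit requested (None or 0, an assumption): use the cap.
    - Cap set, requested limit above the cap: reject with a 400-style error.
    """
    if not max_pages_crawl_limit:
        # No cap configured: pass the requested value through unchanged
        return requested
    if not requested:
        # Cap configured but no limit given: default to the cap
        return max_pages_crawl_limit
    if requested > max_pages_crawl_limit:
        # Cap configured and exceeded: refuse the config
        raise LimitExceededError(requested, max_pages_crawl_limit)
    # Cap configured and request is within it: keep the request
    return requested


def settings_payload(max_pages_crawl_limit: int) -> dict:
    # Hypothetical shape: the cap is reported to clients as 'maxPagesPerCrawl'
    return {"maxPagesPerCrawl": max_pages_crawl_limit}
```

With the default cap of 100,000, `resolve_page_limit(None, 100000)` yields 100000. `resolve_page_limit(3, 2)` raises, which mirrors the test that sets the limit to a maximum of 2 pages.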

tests: add test for 'max_pages_per_crawl' setting
- ensure 'limit' cannot be set higher than max_pages_per_crawl
- ensure pages crawled is at the limit
- set test limit to max 2 pages
- add settings test
- check for pages.jsonl and extraPages.jsonl when crawling 2 pages
2023-03-28 16:26:29 -07:00
k8s backend: Fix for total crawl time limit. (#665) 2023-03-10 11:43:16 -08:00
migrations Filter and sort crawl and workflow list API endpoints in backend (#724) 2023-03-28 17:55:40 -04:00
__init__.py
colls.py Filter and sort crawl and workflow list API endpoints in backend (#724) 2023-03-28 17:55:40 -04:00
crawl_job.py backend: Fix for total crawl time limit. (#665) 2023-03-10 11:43:16 -08:00
crawlconfigs.py Allow configurable max pages per crawl in deployment settings (#717) 2023-03-28 16:26:29 -07:00
crawlmanager.py backend: Fix for total crawl time limit. (#665) 2023-03-10 11:43:16 -08:00
crawls.py Allow configurable max pages per crawl in deployment settings (#717) 2023-03-28 16:26:29 -07:00
db.py Make pending invites expire via TTL index (#568) 2023-02-14 16:07:14 -05:00
emailsender.py Reformat backend for black 23.1.0 (#548) 2023-02-01 20:01:09 -05:00
invites.py Filter and sort crawl and workflow list API endpoints in backend (#724) 2023-03-28 17:55:40 -04:00
main.py Allow configurable max pages per crawl in deployment settings (#717) 2023-03-28 16:26:29 -07:00
orgs.py Filter and sort crawl and workflow list API endpoints in backend (#724) 2023-03-28 17:55:40 -04:00
pagination.py Filter and sort crawl and workflow list API endpoints in backend (#724) 2023-03-28 17:55:40 -04:00
profile_job.py VNC-Based Profile Browser (#433) 2023-01-10 14:42:42 -08:00
profiles.py Filter and sort crawl and workflow list API endpoints in backend (#724) 2023-03-28 17:55:40 -04:00
storages.py Fix POST /orgs/{oid}/crawls/delete (#591) 2023-02-15 21:06:12 -05:00
users.py Filter and sort crawl and workflow list API endpoints in backend (#724) 2023-03-28 17:55:40 -04:00
version.py version: update to 1.4.0-beta.1 2023-03-17 21:14:42 -07:00
worker.py Fix logic for creating pidfile parent dir (#512) 2023-01-23 17:02:25 -08:00