browsertrix/backend/btrixcloud
Ilya Kreymer 8ea16393c5
Optimize single-page crawl workflows (#2656)
For single page crawls:
- Always force a single browser, ignoring the browser windows/scale
setting.
- Don't use custom PVC volumes for the crawler / redis; use emptyDir
instead, since a single-page crawl has no chance of being interrupted
and restarted on a different machine.
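
As a rough illustration of the two points above, the sketch below picks the browser count and volume source from a single-page flag. The function name, its parameters, and the "crawl-data" volume name are hypothetical and not taken from the btrixcloud code; only the Kubernetes `emptyDir` and `persistentVolumeClaim` volume fields are standard.

```python
from typing import Any, Dict


def crawler_pod_settings(
    is_single_page: bool, browser_windows: int, pvc_name: str
) -> Dict[str, Any]:
    """Hypothetical helper: choose browser count and volume source for a crawl."""
    if is_single_page:
        # Single page: force one browser and use ephemeral storage only,
        # since the crawl will not be restarted on another machine.
        return {
            "browsers": 1,
            "volume": {"name": "crawl-data", "emptyDir": {}},
        }
    # Multi-page crawl: respect the configured windows and persist data in a PVC.
    return {
        "browsers": browser_windows,
        "volume": {
            "name": "crawl-data",
            "persistentVolumeClaim": {"claimName": pvc_name},
        },
    }
```

Returning plain dictionaries keeps the sketch independent of any Kubernetes client library; the actual operator may build its pod specs differently.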

Adds an 'is_single_page' check to CrawlConfig, based on either the limit
or the scopeType with no extra hops.
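
A minimal sketch of what such a check could look like, assuming a page limit of 1 or a "page" scope with zero extra hops counts as a single page. The `CrawlSettings` class and its field names are hypothetical stand-ins for the relevant CrawlConfig fields, and the sketch assumes a single seed URL; the real check may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrawlSettings:
    """Hypothetical stand-in for the relevant CrawlConfig fields."""

    limit: Optional[int] = None  # maximum number of pages, if set
    scope_type: Optional[str] = None  # e.g. "page", "prefix", "host"
    extra_hops: int = 0  # additional link hops beyond the scope


def is_single_page(cfg: CrawlSettings) -> bool:
    """Return True if the crawl can only ever visit one page (assumes one seed)."""
    # A page limit of 1 caps the crawl at a single page regardless of scope.
    if cfg.limit == 1:
        return True
    # A page-only scope with no extra hops never follows links off the seed.
    return cfg.scope_type == "page" and cfg.extra_hops == 0
```

For example, `is_single_page(CrawlSettings(scope_type="page", extra_hops=1))` is false, because one extra hop allows the crawler to follow links from the seed page.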

Fixes #2655
2025-06-10 12:13:57 -07:00
migrations Allow users to run crawls with 1 or 2 browser windows (#2627) 2025-06-03 13:37:30 -07:00
operator Optimize single-page crawl workflows (#2656) 2025-06-10 12:13:57 -07:00
__init__.py
auth.py fixes token lifetime bug / improve security (#2490) 2025-03-19 10:07:09 -07:00
background_jobs.py Rework crawl page migration + MongoDB Query Optimizations (#2412) 2025-02-20 15:26:11 -08:00
basecrawls.py Pause / Resume Crawls Initial Implementation. (#2572) 2025-05-21 14:05:16 -07:00
colls.py remove deleted collections from crawlconfigs (#2615) 2025-05-20 18:38:40 -07:00
crawlconfigs.py Optimize single-page crawl workflows (#2656) 2025-06-10 12:13:57 -07:00
crawlmanager.py Optimize single-page crawl workflows (#2656) 2025-06-10 12:13:57 -07:00
crawls.py Allow users to run crawls with 1 or 2 browser windows (#2627) 2025-06-03 13:37:30 -07:00
db.py Allow users to run crawls with 1 or 2 browser windows (#2627) 2025-06-03 13:37:30 -07:00
emailsender.py Rework crawl page migration + MongoDB Query Optimizations (#2412) 2025-02-20 15:26:11 -08:00
invites.py Reformat with Black for 2025 ruleset (#2349) 2025-01-29 16:57:06 -05:00
k8sapi.py Optimize single-page crawl workflows (#2656) 2025-06-10 12:13:57 -07:00
main_bg.py move db migrations to initContainers: (#2449) 2025-03-03 13:13:15 -08:00
main_migrations.py move db migrations to initContainers: (#2449) 2025-03-03 13:13:15 -08:00
main_op.py move db migrations to initContainers: (#2449) 2025-03-03 13:13:15 -08:00
main.py Allow users to run crawls with 1 or 2 browser windows (#2627) 2025-06-03 13:37:30 -07:00
models.py Allow users to run crawls with 1 or 2 browser windows (#2627) 2025-06-03 13:37:30 -07:00
ops.py move db migrations to initContainers: (#2449) 2025-03-03 13:13:15 -08:00
orgs.py Allow users to run crawls with 1 or 2 browser windows (#2627) 2025-06-03 13:37:30 -07:00
pages.py Add Org Check for Collection access (#2616) 2025-05-20 15:30:22 -07:00
pagination.py
profiles.py support overriding crawler image pull policy per channel (#2523) 2025-03-31 14:11:41 -07:00
storages.py Optimize presigning for replay.json (#2516) 2025-05-20 12:09:35 -07:00
subs.py Add API endpoint to check if subscription is activated (#2582) 2025-05-06 17:36:58 -07:00
uploads.py Add Org Check for Collection access (#2616) 2025-05-20 15:30:22 -07:00
users.py Fix user emails use userout (#2511) 2025-03-24 12:04:39 -07:00
utils.py Allow users to run crawls with 1 or 2 browser windows (#2627) 2025-06-03 13:37:30 -07:00
version.py version: bump to 1.17.0-beta.0 2025-06-02 14:46:32 -07:00
webhooks.py Better caching of presigned URLs + support for thumbnails (#2446) 2025-03-03 12:05:23 -08:00