browsertrix/backend/btrixcloud
Tessa Walsh 6f81d588a9
Ensure crawl page counts are correct when re-adding pages (#2601)
Fixes #2600 

This PR fixes the issue by ensuring that crawl page counts (total,
unique, files, errors) are reset to 0 when crawl pages are deleted, such
as right before being re-added.

It also adds a migration that recalculates file and error page counts
for each crawl without re-adding pages from the WACZ files.
2025-05-13 14:05:41 -04:00
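
The behavior described in the commit message above can be sketched roughly as follows. This is a minimal in-memory illustration, not the code in pages.py or the migration itself: the field names (pageCount, uniquePageCount, filePageCount, errorPageCount, isFile, isError, crawl_id) and all function names are assumptions made for the example, and plain dicts and lists stand in for the database.

```python
from typing import Any

# Hypothetical names for the four counters named in the commit message
# (total, unique, files, errors).
COUNT_FIELDS = ("pageCount", "uniquePageCount", "filePageCount", "errorPageCount")


def reset_page_counts(crawl: dict[str, Any]) -> None:
    """Set every page counter on the crawl back to 0."""
    for field in COUNT_FIELDS:
        crawl[field] = 0


def delete_crawl_pages(
    crawl: dict[str, Any], pages: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Drop the crawl's pages and reset its counters in the same step,
    so that re-adding pages afterwards starts counting from zero."""
    remaining = [p for p in pages if p["crawl_id"] != crawl["id"]]
    reset_page_counts(crawl)
    return remaining


def recalculate_file_and_error_counts(
    crawl: dict[str, Any], pages: list[dict[str, Any]]
) -> None:
    """Recompute file and error counts from the pages already stored,
    without re-reading them from any archive."""
    own = [p for p in pages if p["crawl_id"] == crawl["id"]]
    crawl["filePageCount"] = sum(1 for p in own if p.get("isFile"))
    crawl["errorPageCount"] = sum(1 for p in own if p.get("isError"))
```

Resetting inside the delete step is what keeps a delete followed by a re-add from double counting; the recalculation only needs the stored page records, which matches the commit's note that pages are not re-added from the WACZ files.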
..
migrations Ensure crawl page counts are correct when re-adding pages (#2601) 2025-05-13 14:05:41 -04:00
operator Ensure error and behavior logs are written to database in order (#2540) 2025-04-08 09:35:50 -04:00
__init__.py refactoring to use statefulsets + job (#245) 2022-06-05 10:37:17 -07:00
auth.py fixes token lifetime bug / improve security (#2490) 2025-03-19 10:07:09 -07:00
background_jobs.py Rework crawl page migration + MongoDB Query Optimizations (#2412) 2025-02-20 15:26:11 -08:00
basecrawls.py Add behavior logs from Redis to database and add endpoint to serve (#2526) 2025-04-08 02:16:10 +02:00
colls.py compute top page origins for each collection (#2483) 2025-05-08 14:22:40 -07:00
crawlconfigs.py Sort running crawls first by default (#2587) 2025-05-08 17:21:17 -04:00
crawlmanager.py feat: Apply saved workflow settings to current crawl (#2514) 2025-04-29 11:43:14 -07:00
crawls.py Add behavior logs from Redis to database and add endpoint to serve (#2526) 2025-04-08 02:16:10 +02:00
db.py Ensure crawl page counts are correct when re-adding pages (#2601) 2025-05-13 14:05:41 -04:00
emailsender.py Rework crawl page migration + MongoDB Query Optimizations (#2412) 2025-02-20 15:26:11 -08:00
invites.py Reformat with Black for 2025 ruleset (#2349) 2025-01-29 16:57:06 -05:00
k8sapi.py Fixes #2488 (#2493) 2025-03-19 10:06:25 -07:00
main_bg.py move db migrations to initContainers: (#2449) 2025-03-03 13:13:15 -08:00
main_migrations.py move db migrations to initContainers: (#2449) 2025-03-03 13:13:15 -08:00
main_op.py move db migrations to initContainers: (#2449) 2025-03-03 13:13:15 -08:00
main.py move db migrations to initContainers: (#2449) 2025-03-03 13:13:15 -08:00
models.py compute top page origins for each collection (#2483) 2025-05-08 14:22:40 -07:00
ops.py move db migrations to initContainers: (#2449) 2025-03-03 13:13:15 -08:00
orgs.py Add API endpoint to check if subscription is activated (#2582) 2025-05-06 17:36:58 -07:00
pages.py Ensure crawl page counts are correct when re-adding pages (#2601) 2025-05-13 14:05:41 -04:00
pagination.py Format backend with Black 24 (#1507) 2024-02-07 11:35:34 -08:00
profiles.py support overriding crawler image pull policy per channel (#2523) 2025-03-31 14:11:41 -07:00
storages.py Add thumbnail endpoint (#2468) 2025-03-07 12:29:36 -08:00
subs.py Add API endpoint to check if subscription is activated (#2582) 2025-05-06 17:36:58 -07:00
uploads.py Rework crawl page migration + MongoDB Query Optimizations (#2412) 2025-02-20 15:26:11 -08:00
users.py Fix user emails use userout (#2511) 2025-03-24 12:04:39 -07:00
utils.py Add behavior logs from Redis to database and add endpoint to serve (#2526) 2025-04-08 02:16:10 +02:00
version.py version: bump to 1.16.0 2025-05-08 14:30:00 -07:00
webhooks.py Better cacheing of presigned URLs + support for thumbnails (#2446) 2025-03-03 12:05:23 -08:00