browsertrix/backend/btrixcloud
Ilya Kreymer c134b576ae
Optimize presigning for replay.json (#2516)
Fixes #2515.

This PR introduces significantly optimized logic for presigning URLs
for crawls and collections.
- For collections, the files needed from all crawls are looked up, and
then the 'presign_urls' table is merged in one pass, resulting in a
unified iterator containing files and the presigned URLs for those files.
- For crawls, the presigned URLs are also looked up once, and the same
iterator is used for a single crawl with a passed-in list of CrawlFiles.
- URLs that are already signed are added to the return list.
- For any remaining URLs to be signed, a bulk presigning function is
added, which shares an HTTP connection and signs 8 files in parallel
(customizable via helm chart, though this may not be needed). This function
is used to call the presigning API in parallel.
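
The flow described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `presign_all`, the `sign_one` callback, and the `(filename, cached_url)` tuple shape are hypothetical names chosen for the example, and the shared HTTP connection is not modeled. Already-signed URLs are returned as-is, and the remainder are signed with at most 8 requests in flight.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, List, Optional, Tuple


async def presign_all(
    files: Iterable[Tuple[str, Optional[str]]],
    sign_one: Callable[[str], Awaitable[str]],
    max_parallel: int = 8,
) -> Dict[str, str]:
    """Return a filename -> presigned URL mapping.

    `files` yields (filename, cached_url) pairs, where cached_url is None
    when no valid presigned URL is stored yet. `sign_one` is a hypothetical
    coroutine that signs a single file.
    """
    signed: Dict[str, str] = {}
    pending: List[str] = []

    # Single pass: keep URLs that are already signed, queue the rest.
    for name, cached_url in files:
        if cached_url is not None:
            signed[name] = cached_url
        else:
            pending.append(name)

    # Limit the number of concurrent signing calls.
    limit = asyncio.Semaphore(max_parallel)

    async def sign_bounded(name: str) -> Tuple[str, str]:
        async with limit:
            return name, await sign_one(name)

    results = await asyncio.gather(*(sign_bounded(name) for name in pending))
    for name, url in results:
        signed[name] = url

    return signed
```

A semaphore is used here instead of fixed-size batches so that a slow signing call does not hold up the other seven slots; whether the PR makes the same choice is not stated in the commit message.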
2025-05-20 12:09:35 -07:00
migrations Add ISO-639-1 language code validation to backend (#2602) 2025-05-13 16:54:33 -04:00
operator Ensure error and behavior logs are written to database in order (#2540) 2025-04-08 09:35:50 -04:00
__init__.py refactoring to use statefulsets + job (#245) 2022-06-05 10:37:17 -07:00
auth.py fixes token lifetime bug / improve security (#2490) 2025-03-19 10:07:09 -07:00
background_jobs.py Rework crawl page migration + MongoDB Query Optimizations (#2412) 2025-02-20 15:26:11 -08:00
basecrawls.py Optimize presigning for replay.json (#2516) 2025-05-20 12:09:35 -07:00
colls.py Optimize presigning for replay.json (#2516) 2025-05-20 12:09:35 -07:00
crawlconfigs.py Add ISO-639-1 language code validation to backend (#2602) 2025-05-13 16:54:33 -04:00
crawlmanager.py feat: Apply saved workflow settings to current crawl (#2514) 2025-04-29 11:43:14 -07:00
crawls.py Add behavior logs from Redis to database and add endpoint to serve (#2526) 2025-04-08 02:16:10 +02:00
db.py Add ISO-639-1 language code validation to backend (#2602) 2025-05-13 16:54:33 -04:00
emailsender.py Rework crawl page migration + MongoDB Query Optimizations (#2412) 2025-02-20 15:26:11 -08:00
invites.py Reformat with Black for 2025 ruleset (#2349) 2025-01-29 16:57:06 -05:00
k8sapi.py Fixes #2488 (#2493) 2025-03-19 10:06:25 -07:00
main_bg.py move db migrations to initContainers: (#2449) 2025-03-03 13:13:15 -08:00
main_migrations.py move db migrations to initContainers: (#2449) 2025-03-03 13:13:15 -08:00
main_op.py move db migrations to initContainers: (#2449) 2025-03-03 13:13:15 -08:00
main.py move db migrations to initContainers: (#2449) 2025-03-03 13:13:15 -08:00
models.py Optimize presigning for replay.json (#2516) 2025-05-20 12:09:35 -07:00
ops.py move db migrations to initContainers: (#2449) 2025-03-03 13:13:15 -08:00
orgs.py Add ISO-639-1 language code validation to backend (#2602) 2025-05-13 16:54:33 -04:00
pages.py Optimize presigning for replay.json (#2516) 2025-05-20 12:09:35 -07:00
pagination.py Format backend with Black 24 (#1507) 2024-02-07 11:35:34 -08:00
profiles.py support overriding crawler image pull policy per channel (#2523) 2025-03-31 14:11:41 -07:00
storages.py Optimize presigning for replay.json (#2516) 2025-05-20 12:09:35 -07:00
subs.py Add API endpoint to check if subscription is activated (#2582) 2025-05-06 17:36:58 -07:00
uploads.py Rework crawl page migration + MongoDB Query Optimizations (#2412) 2025-02-20 15:26:11 -08:00
users.py Fix user emails use userout (#2511) 2025-03-24 12:04:39 -07:00
utils.py Add ISO-639-1 language code validation to backend (#2602) 2025-05-13 16:54:33 -04:00
version.py Bump version to 1.16.1 (#2606) 2025-05-13 17:29:49 -04:00
webhooks.py Better cacheing of presigned URLs + support for thumbnails (#2446) 2025-03-03 12:05:23 -08:00