browsertrix/backend/btrixcloud
Ilya Kreymer c134b576ae
Optimize presigning for replay.json (#2516)
Fixes #2515.

This PR significantly optimizes the logic for presigning URLs
for crawls and collections.
- For collections, the files needed from all crawls are looked up, and
then the 'presign_urls' table is merged in one pass, resulting in a
unified iterator containing the files and the presigned URLs for those files.
- For crawls, the presigned URLs are also looked up once, and the same
iterator is used for a single crawl with a passed-in list of CrawlFiles.
- URLs that are already signed are added to the return list.
- For any remaining URLs to be signed, a bulk presigning function is
added, which shares an HTTP connection and signs 8 files in parallel
(customizable via the helm chart, though this may not be needed). This function
is used to call the presigning API in parallel.
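
The flow described above (reuse URLs that are already signed, then sign the rest with bounded parallelism) can be sketched roughly as follows. This is only an illustration, not the code in storages.py: `presign_all`, `sign_one`, `fake_sign`, and the `storage.example` URLs are hypothetical names, and the real signing call and shared HTTP connection are replaced by a stub.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, List, Optional, Tuple


async def presign_all(
    files: Iterable[Tuple[str, Optional[str]]],
    sign_one: Callable[[str], Awaitable[str]],
    parallelism: int = 8,
) -> Dict[str, str]:
    """Return {filename: url}, signing only files that lack a cached URL."""
    results: Dict[str, str] = {}
    to_sign: List[str] = []

    # Single pass: keep URLs that are already signed, queue the rest.
    for filename, cached_url in files:
        if cached_url:
            results[filename] = cached_url
        else:
            to_sign.append(filename)

    # At most `parallelism` signing calls are in flight at any time.
    sem = asyncio.Semaphore(parallelism)

    async def sign_limited(filename: str) -> Tuple[str, str]:
        async with sem:
            return filename, await sign_one(filename)

    signed = await asyncio.gather(*(sign_limited(f) for f in to_sign))
    for filename, url in signed:
        results[filename] = url
    return results


async def fake_sign(filename: str) -> str:
    # Hypothetical stand-in for a real presigning request.
    await asyncio.sleep(0)
    return f"https://storage.example/{filename}?sig=demo"


def demo() -> Dict[str, str]:
    files = [
        ("a.wacz", "https://storage.example/a.wacz?sig=cached"),
        ("b.wacz", None),
    ]
    return asyncio.run(presign_all(files, fake_sign))
```

In this sketch the default of 8 mirrors the parallelism mentioned in the commit message; a semaphore is just one simple way to cap concurrent calls, and the actual implementation may bound concurrency differently.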
2025-05-20 12:09:35 -07:00
..
migrations
operator
__init__.py
auth.py
background_jobs.py
basecrawls.py Optimize presigning for replay.json (#2516) 2025-05-20 12:09:35 -07:00
colls.py Optimize presigning for replay.json (#2516) 2025-05-20 12:09:35 -07:00
crawlconfigs.py
crawlmanager.py
crawls.py
db.py
emailsender.py
invites.py
k8sapi.py
main_bg.py
main_migrations.py
main_op.py
main.py
models.py Optimize presigning for replay.json (#2516) 2025-05-20 12:09:35 -07:00
ops.py
orgs.py
pages.py Optimize presigning for replay.json (#2516) 2025-05-20 12:09:35 -07:00
pagination.py
profiles.py
storages.py Optimize presigning for replay.json (#2516) 2025-05-20 12:09:35 -07:00
subs.py
uploads.py
users.py
utils.py
version.py
webhooks.py