browsertrix/backend/btrixcloud
Anish Lakhwara 037396f3d9
Fix: Stream log downloading from WACZ (#1225)
* Fix(backend): Stream logs without causing OOM

Also be smarter about when to use `heapq.merge` and when to use
`itertools.chain`: if all the logs come from the same instance, we
`chain` them; otherwise we `merge` them.

iterator fixes:
- group wacz files by instance, using the filename suffix, e.g. -0.wacz, -1.wacz, -2.wacz
- sort wacz files, and all logs within each wacz file
- chain log iterators for all log files within wacz group
- merge log iterators across wacz files in different groups
- add type hints to help keep track of iterator helper functions
- add iter_lines() from botocore, use that for line parsing for simplicity
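
The grouping, chaining, and merging steps listed above can be sketched roughly as follows. This is an illustration only, not the code in `storages.py` or `zip.py`: the names `instance_suffix` and `stream_log_lines` are hypothetical, each WACZ is modeled as an in-memory list of log files rather than a streamed download, and every log line is assumed to start with an ISO 8601 timestamp so that plain string comparison orders lines by time.

```python
import heapq
import itertools
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List


def instance_suffix(wacz_name: str) -> str:
    """Return the numeric instance suffix of a WACZ name, e.g. '0' for 'x-0.wacz'.

    Hypothetical helper: names without a '-N.wacz' suffix fall into one shared group.
    """
    match = re.search(r"-(\d+)\.wacz$", wacz_name)
    return match.group(1) if match else ""


def stream_log_lines(wacz_logs: Dict[str, List[List[str]]]) -> Iterator[str]:
    """Lazily yield log lines from all WACZ files.

    wacz_logs maps a WACZ file name to its log files, each an ordered list of
    lines (an assumed stand-in for the real streamed log iterators).
    """
    groups: Dict[str, List[Iterable[str]]] = defaultdict(list)
    # Sort WACZ files by name so logs within one instance stay in order.
    for name in sorted(wacz_logs):
        groups[instance_suffix(name)].extend(wacz_logs[name])

    # Within one instance the logs are sequential, so chaining is enough.
    chained = [itertools.chain.from_iterable(logs) for logs in groups.values()]
    if len(chained) == 1:
        return chained[0]

    # Different instances ran concurrently, so interleave them by timestamp.
    # heapq.merge is lazy and assumes each input is already sorted.
    return heapq.merge(*chained)


example = {
    "crawl-a-0.wacz": [["2023-09-28T10:00:00Z a", "2023-09-28T10:00:05Z c"]],
    "crawl-b-0.wacz": [["2023-09-28T10:00:09Z e"]],
    "crawl-a-1.wacz": [["2023-09-28T10:00:02Z b", "2023-09-28T10:00:07Z d"]],
}
for line in stream_log_lines(example):
    print(line)
```

Both `itertools.chain` and `heapq.merge` return lazy iterators, so under these assumptions only one pending line per group is held at a time, which is the property that avoids loading whole logs into memory.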

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2023-09-28 18:54:52 -07:00
migrations optimization: convert all uses of 'async for' to use iterator directly (#1229) 2023-09-28 12:31:08 -07:00
templates improvements to redis pod: (#1219) 2023-09-28 13:00:31 -07:00
__init__.py
basecrawls.py Remove username lookups for crawls and workflows by storing usernames in db (#1199) 2023-09-28 09:37:23 -07:00
colls.py optimization: convert all uses of 'async for' to use iterator directly (#1229) 2023-09-28 12:31:08 -07:00
crawlconfigs.py optimization: convert all uses of 'async for' to use iterator directly (#1229) 2023-09-28 12:31:08 -07:00
crawlmanager.py
crawls.py Remove username lookups for crawls and workflows by storing usernames in db (#1199) 2023-09-28 09:37:23 -07:00
db.py migration improvements: (#1228) 2023-09-28 12:04:19 -07:00
emailsender.py Disable smtp_use_tls with false instead of empty string (#1184) 2023-09-28 12:10:20 -07:00
invites.py Improved type checking for backend with mypy (#1174) 2023-09-13 19:40:26 -07:00
k8sapi.py ensure max crawl size and max crawl timeout values are set to 0 when unused, instead of null (#1167) 2023-09-13 09:51:26 -07:00
main_op.py Refactor / Cleanup: move ops functions back into classes (#1171) 2023-09-13 11:56:09 -07:00
main.py migration improvements: (#1228) 2023-09-28 12:04:19 -07:00
models.py Remove username lookups for crawls and workflows by storing usernames in db (#1199) 2023-09-28 09:37:23 -07:00
operator.py Track bytes stored per file type and include in org metrics (#1207) 2023-09-22 12:55:21 -04:00
orgs.py optimization: convert all uses of 'async for' to use iterator directly (#1229) 2023-09-28 12:31:08 -07:00
pagination.py
profiles.py Track bytes stored per file type and include in org metrics (#1207) 2023-09-22 12:55:21 -04:00
storages.py Fix: Stream log downloading from WACZ (#1225) 2023-09-28 18:54:52 -07:00
uploads.py Remove username lookups for crawls and workflows by storing usernames in db (#1199) 2023-09-28 09:37:23 -07:00
users.py Improved type checking for backend with mypy (#1174) 2023-09-13 19:40:26 -07:00
utils.py
version.py bump version to 1.7.0-beta.1 2023-09-18 14:33:03 -07:00
webhooks.py Improved type checking for backend with mypy (#1174) 2023-09-13 19:40:26 -07:00
zip.py Fix: Stream log downloading from WACZ (#1225) 2023-09-28 18:54:52 -07:00