browsertrix/backend/btrixcloud/migrations
Tessa Walsh 14189b7cfb
Add crawl pages and related API endpoints (#1516)
Fixes #1502 

- Adds pages to database as they get added to Redis during crawl
- Adds migration to add pages to database for older crawls from
pages.jsonl and extraPages.jsonl files in WACZ
- Adds GET, list GET, and PATCH update endpoints for pages
- Adds POST (add), PATCH, and POST (delete) endpoints for page notes;
each note has its own id, timestamp, and user info in addition to text
- Adds page_ops methods for (1) adding resources/URLs to a page, and
(2) adding automated heuristics and supplemental info (mime, type, etc.)
to a page (for use in the crawl QA job)
- Modifies `Migration` class to accept kwargs so that we can pass in ops
classes as needed for migrations
- Deletes WACZ files and pages from database for failed crawls during
crawl_finished process
- Deletes crawl pages when a crawl is deleted
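
A minimal sketch of how the kwargs-based `Migration` class and the pages backfill described above might fit together. This is not the code from this commit: `BaseMigration`, `InMemoryPageOps`, `add_pages_from_wacz`, `read_wacz_pages`, `make_wacz`, and `demo` are hypothetical names, the "database" is a plain dict mapping a crawl id to WACZ bytes, and the sketch assumes the page lists live at `pages/pages.jsonl` and `pages/extraPages.jsonl` inside the WACZ and begin with a header line carrying a `format` key.

```python
import asyncio
import io
import json
import zipfile


class BaseMigration:
    """Hypothetical base class: holds the database handle and target version."""

    def __init__(self, mdb, migration_version="0001"):
        self.mdb = mdb
        self.migration_version = migration_version

    async def migrate_up(self):
        raise NotImplementedError


def read_wacz_pages(wacz_bytes):
    """Yield page records from both page lists in a WACZ (a ZIP file)."""
    with zipfile.ZipFile(io.BytesIO(wacz_bytes)) as wacz:
        names = set(wacz.namelist())
        for name in ("pages/pages.jsonl", "pages/extraPages.jsonl"):
            if name not in names:
                continue
            for line in wacz.read(name).decode("utf-8").splitlines():
                if not line.strip():
                    continue
                record = json.loads(line)
                # Assumption: the header line has a "format" key and is not a page.
                if "format" in record:
                    continue
                yield record


class InMemoryPageOps:
    """Stand-in for an ops class; a real one would write to the database."""

    def __init__(self):
        self.pages = []

    async def add_pages_from_wacz(self, crawl_id, wacz_bytes):
        for record in read_wacz_pages(wacz_bytes):
            self.pages.append({"crawl_id": crawl_id, **record})


class Migration(BaseMigration):
    """Accepts ops classes through kwargs so each migration takes only what it needs."""

    def __init__(self, mdb, **kwargs):
        super().__init__(mdb, migration_version="0026")
        self.page_ops = kwargs.get("page_ops")

    async def migrate_up(self):
        if self.page_ops is None:
            print("page_ops not provided, skipping pages backfill")
            return
        # Assumption: mdb is a dict of crawl id -> WACZ bytes, for illustration only.
        for crawl_id, wacz_bytes in self.mdb.items():
            await self.page_ops.add_pages_from_wacz(crawl_id, wacz_bytes)


def make_wacz(files):
    """Build a tiny in-memory WACZ-like ZIP from {path: [records]}."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for path, records in files.items():
            zf.writestr(path, "\n".join(json.dumps(r) for r in records))
    return buf.getvalue()


def demo():
    header = {"format": "json-pages-1.0", "id": "pages", "title": "All Pages"}
    wacz = make_wacz({
        "pages/pages.jsonl": [header, {"id": "p1", "url": "https://example.com/"}],
        "pages/extraPages.jsonl": [
            header,
            {"id": "p2", "url": "https://example.com/a"},
            {"id": "p3", "url": "https://example.com/b"},
        ],
    })
    page_ops = InMemoryPageOps()
    migration = Migration({"crawl-1": wacz}, page_ops=page_ops)
    asyncio.run(migration.migrate_up())
    return page_ops.pages
```

Passing ops classes as keyword arguments keeps the constructor signature stable: migrations that do not need `page_ops` can ignore it, and a migration that is run without it can skip its work instead of failing.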

Note: Requires crawler version 1.0.0 beta 3 or later, which supports
`--writePagesToRedis`, to populate pages at crawl completion. Beta 4 is
configured in the test chart, which should be upgraded to stable 1.0.0
once it is released.

Connected to https://github.com/webrecorder/browsertrix-crawler/pull/464

---------

Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2024-02-28 12:11:35 -05:00
__init__.py Format backend with Black 24 (#1507) 2024-02-07 11:35:34 -08:00
migration_0001_archives_to_orgs.py Add crawl pages and related API endpoints (#1516) 2024-02-28 12:11:35 -05:00
migration_0002_crawlconfig_crawlstats.py Add crawl pages and related API endpoints (#1516) 2024-02-28 12:11:35 -05:00
migration_0003_mutable_crawl_configs.py Add crawl pages and related API endpoints (#1516) 2024-02-28 12:11:35 -05:00
migration_0004_config_seeds.py Add crawl pages and related API endpoints (#1516) 2024-02-28 12:11:35 -05:00
migration_0005_operator_scheduled_jobs.py Add crawl pages and related API endpoints (#1516) 2024-02-28 12:11:35 -05:00
migration_0006_precompute_crawl_stats.py Add crawl pages and related API endpoints (#1516) 2024-02-28 12:11:35 -05:00
migration_0007_colls_and_config_update.py Add crawl pages and related API endpoints (#1516) 2024-02-28 12:11:35 -05:00
migration_0008_precompute_crawl_file_stats.py Add crawl pages and related API endpoints (#1516) 2024-02-28 12:11:35 -05:00
migration_0009_crawl_types.py Add crawl pages and related API endpoints (#1516) 2024-02-28 12:11:35 -05:00
migration_0010_collection_total_size.py Add crawl pages and related API endpoints (#1516) 2024-02-28 12:11:35 -05:00
migration_0011_crawl_timeout_configmap.py Add crawl pages and related API endpoints (#1516) 2024-02-28 12:11:35 -05:00
migration_0012_notes_to_description.py Add crawl pages and related API endpoints (#1516) 2024-02-28 12:11:35 -05:00
migration_0013_crawl_name.py Add crawl pages and related API endpoints (#1516) 2024-02-28 12:11:35 -05:00
migration_0014_to_collection_ids.py Add crawl pages and related API endpoints (#1516) 2024-02-28 12:11:35 -05:00
migration_0015_org_storage_usage.py Add crawl pages and related API endpoints (#1516) 2024-02-28 12:11:35 -05:00
migration_0016_operator_scheduled_jobs_v2.py Add crawl pages and related API endpoints (#1516) 2024-02-28 12:11:35 -05:00
migration_0017_storage_by_type.py Add crawl pages and related API endpoints (#1516) 2024-02-28 12:11:35 -05:00
migration_0018_usernames.py Add crawl pages and related API endpoints (#1516) 2024-02-28 12:11:35 -05:00
migration_0019_org_slug.py Add crawl pages and related API endpoints (#1516) 2024-02-28 12:11:35 -05:00
migration_0020_org_storage_refs.py Add crawl pages and related API endpoints (#1516) 2024-02-28 12:11:35 -05:00
migration_0021_profile_filenames.py Add crawl pages and related API endpoints (#1516) 2024-02-28 12:11:35 -05:00
migration_0022_partial_complete.py Add crawl pages and related API endpoints (#1516) 2024-02-28 12:11:35 -05:00
migration_0023_available_extra_exec_mins.py Add crawl pages and related API endpoints (#1516) 2024-02-28 12:11:35 -05:00
migration_0024_crawlerchannel.py Add crawl pages and related API endpoints (#1516) 2024-02-28 12:11:35 -05:00
migration_0025_workflow_db_configmap_fixes.py Add crawl pages and related API endpoints (#1516) 2024-02-28 12:11:35 -05:00
migration_0026_crawl_pages.py Add crawl pages and related API endpoints (#1516) 2024-02-28 12:11:35 -05:00