browsertrix/backend/btrixcloud/migrations/__init__.py
Ilya Kreymer 544346d1d4
backend: make crawlconfigs mutable! (#656) (#662)
* backend: make crawlconfigs mutable! (#656)
- crawlconfig PATCH /{id} can now receive a new JSON config to replace the old one (in addition to scale, schedule, tags)
- exclusions: add / remove APIs mutate the current crawlconfig, do not result in a new crawlconfig created
- exclusions: ensure crawl job 'config' is updated when exclusions are added/removed, unify add/remove exclusions on crawl
- k8s: crawlconfig json is updated along with scale
- k8s: stateful set is restarted by updating annotation, instead of changing template
- crawl object: now has 'config', as well as 'profileid', 'schedule', 'crawlTimeout', 'jobType' properties to ensure anything that is changeable is stored on the crawl
- crawlconfigcore: store shared properties between crawl and crawlconfig in new crawlconfigcore (includes 'schedule', 'jobType', 'config', 'profileid', 'crawlTimeout', 'tags', 'oid')
- crawlconfig object: remove 'oldId', 'newId', disallow deactivating/deleting while crawl is running
- rename 'userid' -> 'createdBy'
- remove unused 'completions' field
- add missing return to fix /run response
- crawlout: ensure 'profileName' is resolved on CrawlOut from profileid
- crawlout: return 'name' instead of 'configName' for consistent response
- update: 'modified', 'modifiedBy' fields to set modification date and user modifying config
- update: ensure PROFILE_FILENAME is updated in configmap if profileid is provided, clear if profileid==""
- update: return 'settings_changed' and 'metadata_changed' if either crawl settings or metadata changed
- tests: update tests to check settings_changed/metadata_changed return values

add revision tracking to crawlconfig:
- store each revision in a separate mongo db collection
- revisions accessible via /crawlconfigs/{cid}/revs
- store 'rev' int in crawlconfig and in crawljob
- only add revision history if crawl config changed

migration:
- update to db v3
- copy fields from crawlconfig -> crawl
- rename userid -> createdBy
- copy userid -> modifiedBy, created -> modified
- skip invalid crawls (missing config), make createdBy optional (just in case)

frontend: Update crawl config keys with new API (#681), update frontend to use new PATCH endpoint, load config from crawl object in details view

---------

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
Co-authored-by: sua yoo <sua@webrecorder.org>
Co-authored-by: sua yoo <sua@suayoo.com>
2023-03-07 20:36:50 -08:00

"""
BaseMigration class to subclass in each migration module
"""
from pymongo.errors import OperationFailure


class MigrationError(Exception):
    """Custom migration exception class"""


class BaseMigration:
    """Base Migration class."""

    def __init__(self, mdb, migration_version="0001"):
        self.mdb = mdb
        self.migration_version = migration_version

    async def get_db_version(self):
        """Get current db version from database."""
        db_version = None
        version_collection = self.mdb["version"]
        version_record = await version_collection.find_one()
        if not version_record:
            return db_version
        try:
            db_version = version_record["version"]
        except KeyError:
            pass
        return db_version

    async def set_db_version(self):
        """Set db version to migration_version."""
        version_collection = self.mdb["version"]
        await version_collection.find_one_and_update(
            {}, {"$set": {"version": self.migration_version}}, upsert=True
        )

    async def migrate_up_needed(self):
        """Verify migration up is needed and return boolean indicator."""
        db_version = await self.get_db_version()
        print(f"Current database version before migration: {db_version}")
        print(f"Migration available to apply: {self.migration_version}")
        # Databases from prior to migrations will not have a version set.
        if not db_version:
            return True
        # Versions are strings, so this comparison is lexicographic. It matches
        # numeric order for zero-padded versions of equal width such as "0001".
        if db_version < self.migration_version:
            return True
        return False

    async def migrate_up(self):
        """Perform migration up."""
        raise NotImplementedError(
            "Not implemented in base class - implement in subclass"
        )

    async def run(self):
        """Run migrations."""
        if await self.migrate_up_needed():
            print("Performing migration up", flush=True)
            try:
                await self.migrate_up()
                await self.set_db_version()
            except OperationFailure as err:
                print(f"Error running migration {self.migration_version}: {err}")
                return False

        else:
            print("No migration to apply - skipping", flush=True)
            return False

        print(f"Database successfully migrated to {self.migration_version}", flush=True)
        return True
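
The version check in migrate_up_needed() compares two strings, so its result depends on the versions being zero-padded to the same width, as the default "0001" suggests. The sketch below isolates that check in a standalone function to show the behaviour. The name needs_migration is hypothetical and not part of the module, and the unpadded versions in the last call are an assumed failure case, not values the project is known to use.

```python
def needs_migration(db_version, migration_version):
    """Hypothetical standalone mirror of the checks in migrate_up_needed()."""
    # No stored version: the database predates migrations, so migrate.
    if not db_version:
        return True
    # Python compares strings lexicographically, character by character.
    return db_version < migration_version


# Zero-padded, equal-width versions order the same way as their integer values.
print(needs_migration(None, "0001"))    # True
print(needs_migration("0002", "0003"))  # True
print(needs_migration("0003", "0003"))  # False
print(needs_migration("0010", "0009"))  # False

# Assumed failure case: without padding, "10" sorts before "9", so a
# database already at version 10 would look like it still needs migration 9.
print(needs_migration("10", "9"))       # True
```

Keeping every migration_version at a fixed width avoids this case without changing the comparison itself.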