browsertrix

Author	SHA1	Message	Date
sua yoo	4c36c80351	feat: Display scale as number of browser windows (#2057 ) Resolves https://github.com/webrecorder/browsertrix/issues/2048 --------- Co-authored-by: Ilya Kreymer <ikreymer@gmail.com> Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>	2024-09-05 17:32:40 -07:00
sua yoo	337454f8c9	feat: Add link to hosted sign-up page (#2045 ) Resolves https://github.com/webrecorder/browsertrix/issues/2043 <!-- Fixes #issue_number --> ### Changes - Shows link to sign up in UI if `sign_up_url` is configured. - Expires settings in session storage (for now)	2024-08-26 17:26:25 -07:00
Ilya Kreymer	7fa2b61b29	Execution time tracking tweaks (#1994 ) Tweaks to how execution time is tracked for more accuracy + excluding waiting states: - don't update if crawl state is in a 'waiting state' (waiting for capacity or waiting for org limit) - rename start states -> waiting states for clarity - reset lastUpdatedTime if two consecutive updates of non-running state, to ensure non-running states don't count, but also account for occasional hiccups -- if only one update detects non-running state, don't reset - webhooks: move start webhook to when crawl actually starts for first time (db lastUpdatedTime is not yet + crawl is running) - don't set lastUpdatedTime until pods actually running - set crawljob update interval to every 10 seconds for more accurate execution time tracking - frontend: show seconds in 'Execution Time' display	2024-08-06 09:44:44 -07:00
Ilya Kreymer	96691a33fa	Fix for cronjob skipping response (#1976 ) If a cronjob is disabled, the operator should quickly return a success value so that the job can be terminated. Was previously returning an incorrect response, causing disabled cronjobs to not be cleaned up. Add proper typing to always return correct response	2024-07-29 12:24:18 -07:00
Ilya Kreymer	b35669af8d	disable behaviors for QA runs via configmap (#1963 ) - make crawl args a reusable template - adds QA_ARGS to configmap, setting to same value as CRAWL_ARGS but with --behaviors= prepended to disable behaviors for QA, to improve performance of QA runs. fixes #1962	2024-07-23 19:54:21 -07:00
Ilya Kreymer	01ddf95a56	allow disabling of auto-resize of crawler pods (#1964 ) - only enable if 'enable_auto_resize' is true, default to false - if true, set memory limit to 1.2 of memory requests, resize when hitting 'soft oom' of initial request, adjust by 1.2 (current behavior) up to max_crawler_memory - if false, set memory limit to max_crawler_memory and never adjust memory requests or memory limits - part of #1959	2024-07-23 21:00:40 -04:00
Ilya Kreymer	9a67e28f13	Adds Subscription API (#1914 ) Fixes https://github.com/webrecorder/browsertrix/issues/1905 - adds a new top-level `/api/subscriptions` endpoint and SubOps handler on the backend. - enable subscriptions API endpoints available only if `billing_enabled` is set in helm chart - new POST /subscriptions/create, /subscriptions/update, /subscriptions/cancel API endpoints - Subscriptions mongo collection storing timestamped /subscription API events - GET /subscriptions/events API to get subscription events, support for filtering and sorting - Subscription data model - Support for setting and handling readOnlyOnCancel on org - /orgs/<id>/billing-portal to lookup portalUrl using external API - subscription in org getter and list views - mark org as readOnly for subscription status `paused_payment_failed`, clears it on status `active` --------- Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2024-07-10 17:41:16 -07:00
Vinzenz Sinapius	01d8bdc5e6	Crawler network policy (#1727 ) Limit egress traffic from crawler/profilebrowser pods to the internet and limited internal services like dns, redis, frontend, auth-signer on certain ports --------- Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>	2024-07-03 10:55:03 -07:00
Tessa Walsh	f076e7d9e3	Add superuser API endpoints to export and import org data (#1394) Fixes #890 This PR introduces new streaming superuser-only API endpoints to export and import database information for an organization. New Administrator deployment documentation on how to manage the process and copy files between S3 buckets as needed is also included. --------- Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics> Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>	2024-07-02 17:14:34 -04:00
Ilya Kreymer	e1ef894275	Extends Org Create endpoint + shared secret auth (#1897) Updates the /api/orgs/create endpoint to: - not have name / slug be required, will be renamed on first user via #1870 - support optional quotas - support optional first admin user email, who will receive an invite to join the org. Also supports a new shared secret mechanism, to allow an external automation to access the /api/orgs/create endpoint (and only that endpoint thus far) via a shared secret instead of normal login.	2024-07-01 09:37:02 -07:00
Ilya Kreymer	3cd52342a7	Remove Crawl Workflow Configmaps (#1894) Fixes #1893 - Removes crawl workflow-scoped configmaps, and replaces with operator-controlled per-crawl configmaps that only contain the json config passed to Browsertrix Crawler (as a volume). - Other configmap settings are replaced by the custom CrawlJob options (mostly already were, just added profile_filename and storage_filename) - Cron jobs also updated to create CrawlJob without relying on configmaps, querying the db for additional settings. - The `userid` associated with cron jobs is set to the user that last modified the schedule of the crawl, rather than whoever last modified the workflow - Various functions that deal with updating configmaps have been removed, including in migrations. - New migration 0029 added to remove all crawl workflow configmaps	2024-06-28 15:25:23 -07:00
Tessa Walsh	7af3980323	Add billing enabled and sales email to Helm chart and /settings API endpoint (#1873 ) Backend work for first two tasks of https://github.com/webrecorder/browsertrix/issues/1875 New /billing API endpoint to be added separately once we have a better idea of what data we can get from the payment processor.	2024-06-25 10:55:29 -04:00
Ilya Kreymer	fa6627ce70	ensure QA configmap is updated for long running QA runs: (#1865) - add a 'expire_at_duration_seconds' which is 75% of actual presign duration time, or <25% remaining until presigned URL actually expires to ensure presigned URLs are updated earlier than when they actually expire - set cached expireAt time to the renew at time for more frequent updates - update QA configmap in place with updated presigned URLs when expireAt time is reached - mount qa config volume under /tmp/qa/ without subPath to get automatic updates, which crawler will handle - tests: fix qa test typo (from main) - fixes #1864	2024-06-12 10:51:35 -07:00
Ilya Kreymer	d42de92d75	QA analysis scale configurable in helm chart (#1843 ) - allow configuring QA run scale via 'qa_scale' setting in helm values (overriding any setting on the qa crawljob) - adds additional comments to browser instances helm values settings for clarity - fixes #1842	2024-05-30 12:59:21 -07:00
Ilya Kreymer	61239a40ed	include workflow config in QA runs + different browser instances for QA (#1829 ) Currently, the workflow crawl settings were not being included at all in QA runs. This mounts the crawl workflow config, as well as QA configmap, into QA run crawls, allowing for page limits from crawl workflow to be applied to QA runs. It also allows a different number of browser instances to be used for QA runs, as QA runs might work better with less browsers, (eg. 2 instead of 4). This can be set with `qa_browser_instances` in helm chart. Default qa browser workers to 1 if unset (for now, for best results) Fixes #1828	2024-05-29 13:32:25 -07:00
Ilya Kreymer	f6c0791dc1	fix missing settings / typos: (#1748 ) - ensure max_crawler_memory_size is inited before it is set! - pass profile_browser_memory / profile_browser_cpu from chart values - map volume to /tmp/home to avoid persisting /tmp for profiles	2024-04-25 09:00:17 +02:00
Ilya Kreymer	ec74eb4242	operator: add 'max_crawler_memory' to limit autosizing of crawler pods (#1746) Adds a `max_crawler_memory` chart setting, which, if set, defines the upper crawler memory limit that crawler pods can be resized up to. If not set, auto resizing is disabled and pods are always set to 'crawler_memory' memory	2024-04-24 15:16:32 +02:00
Ilya Kreymer	b94070160b	allow configuring designated registration org to which new users can register (#1735 ) if 'registration_enabled' is set, check 'registration_org_id' for org id of an existing org that new users should be added to when they register. if omitted, default to the default org Fixes #1729	2024-04-23 17:11:37 -04:00
Vinzenz Sinapius	a8336925b6	Run crawler and profilebrowser with non-root user (#1625 ) With these changes, crawler and profilebrowser jobs run as a non-root user.	2024-04-17 12:03:33 -07:00
Ilya Kreymer	835014d829	restrict qa runs to a 'min_qa_crawler_image' if set in the chart (#1685 ) - fixes #1684 - can be used to optionally restrict QA to only some crawls (eg. with browsertrix-crawler>=1.0.0) - enforce error on backend (return 400) and handle special error on the frontend	2024-04-17 08:48:33 -07:00
Ilya Kreymer	95f5605af7	renumber crawl priority classes: (#1673) - priority classes <-10 are ignored by cluster-autoscaler so QA jobs with too low priorities never run - start crawl priorities at 0 going down (same as before) - start qa run priorities at -2 going down (instead of -100) - this means a crawl with a scale of 3 can be preempted by 1st qa pod, but otherwise crawls have higher priority - rename priority classes as they are otherwise immutable and error on helm upgrade This allows for more room in lower pri classes for other types of objects, while keeping in mind the -10 and below threshold: (see: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md)	2024-04-13 12:24:43 -07:00
Ilya Kreymer	17f49a52de	email templates update + customization + doc update (fixes #1652 ) (#1653 ) - modify invite email template to answer common questions - email templates: make each email template overridable with --set-file - docs: update customization doc to document how to customize email templates --------- Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2024-04-08 12:27:47 -07:00
Ilya Kreymer	c1817cbe04	add horizontal pod autoscaler for backend and frontend via helm charts (#1633 ) Supports horizontal pod autoscaling (hpa) for backend and frontend pods: - use cpu and memory averages - adjust base memory + cpu for backend - threshold set to 80% cpu and 95% memory utilization by default (configurable in values.yaml) - instead of backend and frontend replicas, set max replicas in values.yaml - only enable hpa if backend_max_replicas or frontend_max_replicas is >1, default to 1 for now	2024-03-28 16:39:27 -07:00
Ilya Kreymer	3438133fcb	Crawler pod memory padding + auto scaling (#1631 ) - set memory limit to 1.2x memory request to provide extra padding and avoid OOM - attempt to resize crawler pods by 1.2x when exceeding 90% of available memory - do a 'soft OOM' (send extra SIGTERM) to pod when reaching 100% of requested memory, resulting in faster graceful restart, but avoiding a system-instant OOM Kill - Fixes #1632 --------- Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2024-03-28 16:39:00 -07:00
Ilya Kreymer	4f676e4e82	QA Runs Initial Backend Implementation (#1586 ) Supports running QA Runs via the QA API! Builds on top of the `issue-1498-crawl-qa-backend-support` branch, fixes #1498 Also requires the latest Browsertrix Crawler 1.1.0+ (from webrecorder/browsertrix-crawler#469 branch) Notable changes: - QARun objects contain info about QA runs, which are crawls performed on data loaded from existing crawls. - Various crawl db operations can be performed on either the crawl or `qa.` object, and core crawl fields have been moved to CoreCrawlable. - While running,`QARun` data stored in a single `qa` object, while finished qa runs are added to `qaFinished` dictionary on the Crawl. The QA list API returns data from the finished list, sorted by most recent first. - Includes additional type fixes / type safety, especially around BaseCrawl / Crawl / UploadedCrawl functionality, also creating specific get_upload(), get_basecrawl(), get_crawl() getters for internal use and get_crawl_out() for API - Support filtering and sorting pages via `qaFilterBy` (screenshotMatch, textMatch) along with `gt`, `lt`, `gte`, `lte` params to return pages based on QA results. --------- Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2024-03-20 22:42:16 -07:00
Tessa Walsh	21ae38362e	Add endpoints to read pages from older crawl WACZs into database (#1562 ) Fixes #1597 New endpoints (replacing old migration) to re-add crawl pages to db from WACZs. After a few implementation attempts, we settled on using [remotezip](https://github.com/gtsystem/python-remotezip) to handle parsing of the zip files and streaming their contents line-by-line for pages. I've also modified the sync log streaming to use remotezip as well, which allows us to remove our own zip module and let remotezip handle the complexity of parsing zip files. Database inserts for pages from WACZs are batched 100 at a time to help speed up the endpoint, and the task is kicked off using asyncio.create_task so as not to block before giving a response. StorageOps now contains a method for streaming the bytes of any file in a remote WACZ, requiring only the presigned URL for the WACZ and the name of the file to stream.	2024-03-19 14:14:21 -07:00
Ilya Kreymer	804f755787	Increase startup probe time to account for long-running migrations (#1560) - increases the failureThreshold for startupProbe for the api backend container to account for long running migrations, up to 300 seconds - add `/healthzStartup` which checks if db is ready - bump - keeps `/healthz` to always return 200 when running - increases livenessProbe failureThreshold to be higher than readiness probe, following recommended best practice of liveness probe > readiness probe - fixes #1559	2024-02-28 14:22:33 -08:00
Tessa Walsh	14189b7cfb	Add crawl pages and related API endpoints (#1516 ) Fixes #1502 - Adds pages to database as they get added to Redis during crawl - Adds migration to add pages to database for older crawls from pages.jsonl and extraPages.jsonl files in WACZ - Adds GET, list GET, and PATCH update endpoints for pages - Adds POST (add), PATCH, and POST (delete) endpoints for page notes, each with their own id, timestamp, and user info in addition to text - Adds page_ops methods for 1. adding resources/urls to page, and 2. adding automated heuristics and supplemental info (mime, type, etc.) to page (for use in crawl QA job) - Modifies `Migration` class to accept kwargs so that we can pass in ops classes as needed for migrations - Deletes WACZ files and pages from database for failed crawls during crawl_finished process - Deletes crawl pages when a crawl is deleted Note: Requires a crawler version 1.0.0 beta3 or later, with support for `--writePagesToRedis` to populate pages at crawl completion. Beta 4 is configured in the test chart, which should be upgraded to stable 1.0.0 when it's released. Connected to https://github.com/webrecorder/browsertrix-crawler/pull/464 --------- Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>	2024-02-28 12:11:35 -05:00
Ilya Kreymer	b2a5dbf2cd	enable screenshots by default + fix py version formatting (#1518 ) configmap: add --screenshot thumbnail,view as default screenshots version: update update-version.sh to add newline in version.py to match new black formatting (from changes in #1507) Fixes #1519	2024-02-07 17:07:28 -08:00
Tessa Walsh	07fa46d9aa	Add custom user agent to workflows (#1465 ) Fixes #1341 Adds "User Agent" field to workflow editor under the Browser Settings tab. If not set, the crawler will use the browser's default user agent. Also added to docs and to the workflow details page (if set). --------- Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics> Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>	2024-01-17 17:33:50 -05:00
Ilya Kreymer	90197b2a85	Backend mem usage fix - use fixed MOTOR_MAX_WORKERS + switch to gunicorn (#1468 ) Refactors backend deployment to: - Use MOTOR_MAX_WORKERS (defaulting to 1) to reduce threads used by mongodb connections - Also sets backend workers to 1 by default to reduce default memory usage - Switches to gunicorn with uvloop worker for production use instead of uvicorn (as recommended by uvicorn) Lower thread count should address memory leak/increased usage, which resulted in 5x thread x cpus x workers, eg. potentially 20 or 40 threads just for mongodb connections. Lower default number of workers should make it easier to scale backend with HPA if additional capacity. Fixes #1467	2024-01-16 15:32:42 -08:00
Tessa Walsh	032859f361	Support multiple crawler versions (#1420 ) Fixes #1385 ## Changes Supports multiple crawler 'channels' which can be configured to different browsertrix-crawler versions - Replaces `crawler_image` in helm chart with `crawler_channels` array similar to how storages are handled - The `default` crawler channel must always be provided and specifies the default crawler image - Adds backend `/orgs/{oid}/crawlconfigs/crawler-channels` API endpoint to fetch information about available crawler versions (name, image, and label) and test - Adds crawler channel select to workflow creation/edit screens and profile creation dialog, and updates related API endpoints and configmaps accordingly. The select dropdown is shown only if more than one channel is configured. - Adds `crawlerChannel` to workflow and crawl details. - Add `image` to crawler image, used to display actual image used as part of the crawl. - Modifies `crawler_crawl_id` backend test fixture to use `test` crawler version to ensure crawler versions other than latest work - Adds migration to add `crawlerChannel` set to `default` to existing workflow and profile objects and workflow configmaps --------- Co-authored-by: Ilya Kreymer <ikreymer@gmail.com> Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>	2024-01-16 15:32:12 -08:00
Ilya Kreymer	b23eed5003	Email Templates (#1375 ) - Emails are now processed from Jinja2 templates found in `charts/email-templates`, to support easier updates via helm chart in the future. - The available templates are: `invite`, `password_reset`, `validate` and `failed_bg_job`. - Each template can be text only or also include HTML. The format of the template is: ``` subject ~~~ <html content> ~~~ text ``` - A new `support_email` field is also added to the email block in values.yaml Invite Template: - Currently, only the invite template includes an HTML version, other templates are text only. - The same template is used for new and existing users, with slightly different text if adding user to an existing org. - If user is invited by the superadmin, the invited by field is not included, otherwise it also includes 'You have been invited by X to join Y'	2023-11-15 15:22:12 -08:00
Ilya Kreymer	ff10124d01	charts cleanup: (#1360 ) - move authsign secret to signer and make port configurable - rename storages to more general ops-configs - put 'storages.json' path into env var - rename backend secret to backend-auth - cronjobs: don't keep succeeded jobs around, triggers operator update	2023-11-08 19:24:00 -08:00
Ilya Kreymer	d2d7240455	background jobs fix: ensure bucket is parsed correctly (#1359 ) Follow-up to #1321 - correctly parse the endpoint_url into prefix and bucket path - also add region and s3 provider type to storage secrets	2023-11-08 15:08:23 -08:00
Ilya Kreymer	5530ca92e1	Move backend app templates to be installed from configmap volume (#1331 ) Instead of adding the app templates launched from the backend via `backend/btrixcloud/templates`, add them to a configmap and mount the configmap in the same location. This allows these templates to be updated, like other values in charts/... without having to rebuild any of the images, speeding up dev and maintenance time. Changes include: - move backend/btrixcloud/templates -> chart/app-templates/ - add app-templates/*.yaml to app-templates configmap - mount app-templates configmap to /app/btrixcloud/templates/ in api and op containers	2023-11-06 09:37:48 -08:00
Francesco Servida	0b8bbcf8e6	Allow User to specify custom cluster-issuer (#1332 ) Implemented variable and defaults for cluster-issuer to allow users to specify, if needed, their own cluster issuer. (eg. installations with only outbound traffic that cannot solve ACME https challenge)	2023-11-04 13:29:17 -07:00
Francesco Servida	4998274ab0	correctly suffix Auth-Signer url when running in custom namespace (#1335 )	2023-11-04 10:34:05 -07:00
Ilya Kreymer	fb3d88291f	Background Jobs Work (#1321 ) Fixes #1252 Supports a generic background job system, with two background jobs, CreateReplicaJob and DeleteReplicaJob. - CreateReplicaJob runs on new crawls, uploads, profiles and updates the `replicas` array with the info about the replica after the job succeeds. - DeleteReplicaJob deletes the replica. - Both jobs are created from the new `replica_job.yaml` template. The CreateReplicaJob sets secrets for primary storage + replica storage, while DeleteReplicaJob only needs the replica storage. - The job is processed in the operator when the job is finalized (deleted), which should happen immediately when the job is done, either because it succeeds or because the backoffLimit is reached (currently set to 3). - /jobs/ api lists all jobs using a paginated response, including filtering and sorting - /jobs/<job id> returns details for a particular job - tests: nightly tests updated to check create + delete replica jobs for crawls as well as uploads, job api endpoints - tests: also fixes to timeouts in nightly tests to avoid crawls finishing too quickly. --------- Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2023-11-02 13:02:17 -07:00
Ilya Kreymer	6dc452ebad	Storage Refactor: Replication + Custom Storage Support (#1296 ) - Refactors storage to support replicas + custom storages on the Org. - There is a default primary + replica storage, while an Org can also have primary and replica storages. - StorageRef object is used to store references to default and custom storage. - CrawlFile has been updated to contain a StorageRef instead of a def_storage_name, which references either a default storage (in StorageOps) or custom storage (in Organization) - There is also a 'replicas' Optional[List[StorageRef]] which contains replicas, if any. - CrawlFileOut contain a numReplicas for how many replicas exist for a given file. - Migration: migration 0020 added to migrate existing Orgs, CrawlFile and ProfileFile objects to new storage system (CrawlFile and ProfileFile now extend BaseFile) Part of #1262 --------- Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2023-10-26 21:44:09 -07:00
Tessa Walsh	d58747dfa2	Provide full resources in archived items finished webhooks (#1308 ) Fixes #1306 - Include full `resources` with expireAt (as string) in crawlFinished and uploadFinished webhook notifications rather than using the `downloadUrls` field (this is retained for collections). - Set default presigned duration to one minute short of 1 week and enforce maximum supported by S3 - Add 'storage_presign_duration_minutes' commented out to helm values.yaml - Update tests --------- Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>	2023-10-23 19:01:58 -07:00
Anish Lakhwara	834fa72baf	Refactor microk8s playbook to follow "new" structure (#1264) * Refactor microk8s playbook to follow structure with shared roles - Integrates with btrix/deploy role for deploying - Separated RedHat and Debian into separate roles - Created Common role - allow running remotely by default - use 'browsertrix_cloud_home' for charts path - add additional customizable options to btrix_values.j2 (todo: unify all the templates) - docs: update to new playbook path --------- Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>	2023-10-11 19:33:30 -07:00
Ilya Kreymer	16e7a1d0a2	Storage Ops Refactor (#1257 ) * storage ops refactor: - create StorageOps class similar to other ops classes - init storages list in StorageOps, no longer require lookup up default storages via CrawlManager - convert all storage functions to members, add storageops to operator - remove unused params, ensure crawl exists for rollover restart - add env var to determine if using local minio to use correct endpoint URL * crawls /seeds endpoint: just return empty list if not a crawl (eg. upload) * crawlmanager: remove unused code, rename check_storage -> has_storage	2023-10-10 15:04:23 -07:00
Ilya Kreymer	fa86555eed	Track pod resource usage, detect OOM crashes, handle auto-scaling (#1235 ) * keep track of per pod status on crawljob: - crashes time, and reason - 'used' vs 'allocated' resources - 'percent' used / allocated * crawl log errors: log error when crawler crashes via OOM, either via redis error log or to console * add initial autoscaling support! - detect if metrics server is available via K8SApi.is_pod_metrics_available() - if available, use metrics for 'used' fields - if no metrics, set memory used for redis only (using redis apis) - allow overriding memory and cpu via newMemory and newCpu settings on pod status - scale memory / cpu based on newMemory and newCpu setting - templates: update jinja templates to allow restarting crawler and redis with new resources - ci: enable metrics-server on k3d, microk8s and nightly k3d ci runs * roles: cleanup unused roles, add permissions for listing metrics * stats for running crawls: - update in db via operator - avoids losing stats if redis pod happens to be done - tradeoff is more db access in operator, but less extra connections to redis + already loading from db in backend - size stat: ensure size of previous files is added to the stats * crawler deployment tweaks: - adjust cpu/mem per browser - add --headless flag to configmap to use new headless mode by default!	2023-10-05 20:41:18 -07:00
Ilya Kreymer	86a424af93	migration improvements: (#1228 ) * migration improvements + rerunning migrations: (fixes #1227) - avoid starting some workers while migration is still running - ensure workers that aren't performing migration await for migration to complete - backend will not be valid until migration is run * allow rerunning migration from specified version via --set rerun_from_migration=<VERSION> (replaces rerun_last_migration)	2023-09-28 12:04:19 -07:00
Ilya Kreymer	c9c39d47b7	Scheduled Crawl Refactor: Handle via Operator + Add Skipped Crawls on Quota Reached (#1162 ) * use metacontroller's decoratorcontroller to create CrawlJob from Job * scheduled job work: - use existing job name for scheduled crawljob - use suspended job, set startTime, completionTime and succeeded status on job when crawljob is done - simplify cronjob template: remove job_image, cron_namespace, using same namespace as crawls, placeholder job image for cronjobs * move storage quota check to crawljob handler: - add 'skipped_quota_reached' as new failed status type - check for storage quota before checking if crawljob can be started, fail if not (check before any pods/pvcs created) * frontend: - show all crawls in crawl workflow, no need to filter by status - add 'skipped_quota_reached' status, show as 'Skipped (Quota Reached)', render same as failed * migration: make release namespace available as DEFAULT_NAMESPACE, delete old cronjobs in DEFAULT_NAMESPACE and recreate in crawlers namespace with new template	2023-09-12 13:05:43 -07:00
Ilya Kreymer	ad9bca2e92	Operator refactor to control pods + pvcs directly instead of statefulsets (#1149) - Ability for pod to be Completed, unlike in Statefulset - eg. if 3 pods are running and first one finishes, all 3 must be running until all 3 are done. With this setup, the first finished pod can remain in Completed state. - Fixed shutdown order - crawler pods now correctly shutdown first before redis pods, by switching to background deletion. - Pod priority decreases with scale: 1st instance of a new crawl can preempt 3rd or 2nd instance of another crawl - Create priority classes up to 'max_crawl_scale', configured in values.yaml - Improved scale change reconciliation: if increasing scale, immediately scale up. If decreasing scale, gracefully stop scaled-down instances via redis 'stopone' key, wait until they exit with Completed state before adjusting status.scale / removing scaled down pods. Ensures unaccepted interrupts don't cause scaled down data to be deleted. - Redis pod remains inactive until crawler is first active, or after no crawl pods are active for 60 seconds - Configurable Redis storage with 'redis_storage' value, set to 3Gi by default - CrawlJob deletion starts as soon as post-finish crawl operations are run - Post-crawl operations get their own redis instance, since one during response is being cleaned up in finalizer - Finalizer ignores request with incorrect state (returns 400 if reported as not finished while crawl is finished) - Current resource usage added to status - Profile browser: also manage single pod directly without statefulset for consistency. - Restart pods via restartTime value: if spec.restartTime != status.restartTime, clear out pods and update status.restartTime (using OnDelete policy to avoid recreate loops in edge cases). - Update to latest metacontroller (v4.11.0) - Add --restartOnError flag for crawler (for browsertrix-crawler 0.11.0) - Failed crawl logging: add 'fail_crawl()' to be used for failing a crawl, which prints logs for default container (if enabled) as well as pod status - tests: check other finished states to avoid stuck in infinite loop if crawl fails - tests: disable disk utilization check, which adds unpredictability to crawl testing! fixes #1147 --------- Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2023-09-11 10:38:04 -07:00
Anish Lakhwara	e57148d0e9	feat: add SMTP {port, use_tls} config (#1142 ) * feat: add SMTP {port, use_tls} config * If `password` is None don't attempt to log in * remove 'can be omitted' comment --------- Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>	2023-09-08 08:18:36 -07:00
Ilya Kreymer	2967f1e320	ingress: simplify ingress config: (fixes #1135 ) (#1146 ) * ingress: simplify ingress config: (fixes #1135) - use standard Prefix pathTypes - remove nginx-specific rewriting - remove 'scheme', use https/http based on 'tls' setting (in ingress and configmap) - fix signing ingress to use ingressClassName	2023-09-07 09:51:48 -07:00
Ilya Kreymer	68bc053ba0	Print crawl log to operator log (mostly for testing) (#1148 ) * log only if 'log_failed_crawl_lines' value is set to number of last lines to log from failed container --------- Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2023-09-06 17:53:02 -07:00
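
The crawler pod memory rules described across #1631, #1746 and #1964 can be illustrated with a small sketch. This is not the operator's code: the function and class names are hypothetical, and the sketch assumes the 90% resize threshold from #1631 (the #1964 message instead describes resizing on a 'soft OOM' of the initial request). Only the 1.2 factor, `enable_auto_resize`, and `max_crawler_memory` come from the commit messages.

```python
from dataclasses import dataclass
from typing import Optional

# 1.2x factor from the commit messages, kept as an integer ratio so the
# arithmetic below is exact.
SCALE_NUM, SCALE_DEN = 12, 10
# Resize threshold of 90% (from #1631), also as an integer ratio.
THRESH_NUM, THRESH_DEN = 9, 10


@dataclass
class PodMemory:
    """Hypothetical holder for a pod's memory request and limit, in bytes."""
    request: int
    limit: int


def initial_memory(crawler_memory: int, max_crawler_memory: int,
                   enable_auto_resize: bool) -> PodMemory:
    """Pick the starting request/limit for a crawler pod."""
    if enable_auto_resize:
        # Limit is padded to 1.2x the request.
        return PodMemory(crawler_memory, crawler_memory * SCALE_NUM // SCALE_DEN)
    # Auto-resize disabled: limit is the maximum and is never adjusted.
    return PodMemory(crawler_memory, max_crawler_memory)


def next_request(used: int, mem: PodMemory, max_crawler_memory: int,
                 enable_auto_resize: bool) -> Optional[int]:
    """Return a larger memory request, or None if no resize should happen."""
    if not enable_auto_resize:
        return None
    if used * THRESH_DEN <= mem.request * THRESH_NUM:
        return None  # usage has not exceeded 90% of the request
    grown = min(mem.request * SCALE_NUM // SCALE_DEN, max_crawler_memory)
    return grown if grown > mem.request else None
```

For example, a pod requesting 1000 units and using 950 would be resized to a 1200 request, while a pod already at `max_crawler_memory` is left alone.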


130 Commits
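
The email template layout introduced in #1375 (subject, then HTML content, then text, separated by `~~~`) can be split with a few lines of Python. This is a minimal sketch, not the backend's implementation: `split_email_template` and `EmailParts` are hypothetical names, and it assumes each `~~~` separator sits on its own line and that a text-only template leaves the HTML section empty.

```python
from typing import List, NamedTuple, Optional

SEPARATOR = "~~~"  # section separator named in the #1375 commit message


class EmailParts(NamedTuple):
    """Hypothetical container for the three sections of a rendered template."""
    subject: str
    html: Optional[str]
    text: str


def split_email_template(rendered: str) -> EmailParts:
    """Split an already-rendered template into subject, HTML and text."""
    sections: List[List[str]] = [[]]
    for line in rendered.splitlines():
        if line.strip() == SEPARATOR:
            sections.append([])  # start the next section
        else:
            sections[-1].append(line)
    if len(sections) != 3:
        raise ValueError("expected subject, html and text separated by ~~~")
    subject, html, text = ("\n".join(part).strip() for part in sections)
    # An empty HTML section is treated as a text-only email (assumption).
    return EmailParts(subject, html or None, text)
```

Calling `split_email_template("Welcome\n~~~\n<p>Hi</p>\n~~~\nHi\n")` yields a subject of `Welcome`, an HTML body of `<p>Hi</p>` and a text body of `Hi`.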