browsertrix

Author	SHA1	Message	Date
Emma Segal-Grossman	99dd9b4acb	Remove non-prod & optional dependencies when building frontend in ci (#1455 ) Fixes #1454 ## Motivation We've had a number of cases recently where a build dependency is added to `devDependencies`, the PR passes the frontend build check (`frontend-build-check.yaml`) in the branch, and then fails the cluster run (`k3d-ci.yaml`) in `main` because the frontend build check installs all dependencies, whereas the cluster run uses the frontend Dockerfile, which skips everything but prod dependencies. ## Changes This runs an additional step in the frontend build check, after running unit tests and the l10n build but before doing the build, that re-runs `yarn` with the same arguments as are in the frontend Dockerfile, installing just prod dependencies. This results in slightly longer frontend build check runtimes, but should save us some wasted time fixing broken `main`.	2024-01-10 11:46:17 -08:00
Ilya Kreymer	dfba4b3940	Replace partial_complete -> stopped_by_user or stopped_quota_reached + operator edge cases (#1368 ) - Adds two new crawl finished states, stopped_by_user and stopped_quota_reached - Tracking other possible 'stop reasons' in operator, though not making them distinct states for now. - Updated frontend with 'Stopped by User' and 'Stopped: Time Quota Reached', shown with same icon as current partial_complete - Added migration of partial_complete to either stopped_by_user or complete (no historical quota data available) - Addresses edge case in scaling: if crawl never scaled (no redis entry, no pod), automatically scale down - Edge case in status: if crawl is somehow 'canceled' but not deleted, immediately delete crawl object and begin finalizing. --------- Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2023-11-14 11:17:16 -08:00
Ilya Kreymer	fb3d88291f	Background Jobs Work (#1321 ) Fixes #1252 Supports a generic background job system, with two background jobs, CreateReplicaJob and DeleteReplicaJob. - CreateReplicaJob runs on new crawls, uploads, profiles and updates the `replicas` array with the info about the replica after the job succeeds. - DeleteReplicaJob deletes the replica. - Both jobs are created from the new `replica_job.yaml` template. The CreateReplicaJob sets secrets for primary storage + replica storage, while DeleteReplicaJob only needs the replica storage. - The job is processed in the operator when the job is finalized (deleted), which should happen immediately when the job is done, either because it succeeds or because the backoffLimit is reached (currently set to 3). - /jobs/ api lists all jobs using a paginated response, including filtering and sorting - /jobs/<job id> returns details for a particular job - tests: nightly tests updated to check create + delete replica jobs for crawls as well as uploads, job api endpoints - tests: also fixes to timeouts in nightly tests to avoid crawls finishing too quickly. --------- Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2023-11-02 13:02:17 -07:00
Ilya Kreymer	6384d8b5f1	Additional Type Hints / Type Fix Pass (#1320 ) This PR adds more type safety to the backend codebase: - All ops classes calls should be type checked - Avoiding circular references with TYPE_CHECKING conditional - Consistent UUID usage: uuid.UUID / UUID4 with just UUID - Crawl states moved to models, made into lists - Additional typing added as needed, fixed a few type related errors - CrawlOps / UploadOps / BaseCrawlOps now all have same param init order to simplify changes	2023-10-30 12:59:24 -04:00
Ilya Kreymer	63291e95a5	avoid exception if 'errors' key doesn't exist (#1301 ) - avoid exception if 'errors' (or 'files' keys) don't exist (part of #1297) - ensure 'errors' list always set on output model for consistency, defaulting to empty list - fix tests for 'errors' being an empty list follow-up to #1300 (merging 1.7.1 release into main)	2023-10-19 14:39:54 -07:00
Ilya Kreymer	c591a5755d	test quickfix: microk8s crawls were not running due to exceeding CI resource capacity to fix: - disable metrics-server - lower per-browser mem/cpu requirements	2023-10-10 23:29:10 -07:00
Tessa Walsh	266afdf8d9	Add slugs to org backend (#1250 ) - Add slug field with uniqueness constraint to Organization - Use python-slugify to generate slug from name and import that in migration - Require name in all /rename and org creation requests - Auto-generate slug for new org with no slug or when /rename is called w/o a slug - Auto-generate slug for 'default-org' based on name - Add /api/orgs/slugs GET endpoint to return all slugs in use - tests: extend backend test-requirements.txt from requirements to allow testing slugify - tests: move get_redis_crawl_stats() to avoid extra dependency in utils	2023-10-10 18:30:09 -07:00
Ilya Kreymer	fa86555eed	Track pod resource usage, detect OOM crashes, handle auto-scaling (#1235 ) * keep track of per pod status on crawljob: - crashes time, and reason - 'used' vs 'allocated' resources - 'percent' used / allocated * crawl log errors: log error when crawler crashes via OOM, either via redis error log or to console * add initial autoscaling support! - detect if metrics server is available via K8SApi.is_pod_metrics_available() - if available, use metrics for 'used' fields - if no metrics, set memory used for redis only (using redis apis) - allow overriding memory and cpu via newMemory and newCpu settings on pod status - scale memory / cpu based on newMemory and newCpu setting - templates: update jinja templates to allow restarting crawler and redis with new resources - ci: enable metrics-server on k3d, microk8s and nightly k3d ci runs * roles: cleanup unused roles, add permissions for listing metrics * stats for running crawls: - update in db via operator - avoids losing stats if redis pod happens to be done - tradeoff is more db access in operator, but less extra connections to redis + already loading from db in backend - size stat: ensure size of previous files is added to the stats * crawler deployment tweaks: - adjust cpu/mem per browser - add --headless flag to configmap to use new headless mode by default!	2023-10-05 20:41:18 -07:00
Anish Lakhwara	253a267830	Move DO ansible playbook to new format (#1159 ) * feat: move do_setup to new unified format at root of ansible/ dir to allow sharing roles, inventory with playbooks for other deployment types * fix: pass ansible lint * update do settings to current deployment: - bump main node params - add additional settings to helm values template --------- Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>	2023-09-27 22:36:34 -07:00
Ilya Kreymer	feb7ab7652	Improved type checking for backend with mypy (#1174 ) * add mypy type check - run type check on backend fix ambiguous typing issues - add mypy to lint gh action + precommit hook - add mypy.ini	2023-09-13 19:40:26 -07:00
Tessa Walsh	7cf2b11eb7	Add event webhook tests (#1155 ) * Add success filter to webhook list GET endpoint * Add sorting to webhooks list API and add event filter * Test webhooks via echo server * Set address to echo server on host from CI env var for k3d and microk8s * Add -s back to pytest command for k3d ci * Change pytest test path to avoid hanging on collecting tests * Revert microk8s to only run on push to main	2023-09-12 22:08:40 -07:00
Ilya Kreymer	ad9bca2e92	Operator refactor to control pods + pvcs directly instead of statefulsets (#1149 ) - Ability for pod to be Completed, unlike in Statefulset - eg. if 3 pods are running and first one finishes, all 3 must be running until all 3 are done. With this setup, the first finished pod can remain in Completed state. - Fixed shutdown order - crawler pods now correctly shutdown first before redis pods, by switching to background deletion. - Pod priority decreases with scale: 1st instance of a new crawl can preempt 3rd or 2nd instance of another crawl - Create priority classes up to 'max_crawl_scale', configured in values.yaml - Improved scale change reconciliation: if increasing scale, immediately scale up. If decreasing scale, graceful stop scaled-down instance to complete via redis 'stopone' key, wait until they exit with Completed state before adjust status.scale / removing scaled down pods. Ensures unaccepted interrupts don't cause scaled down data to be deleted. - Redis pod remains inactive until crawler is first active, or after no crawl pods are active for 60 seconds - Configurable Redis storage with 'redis_storage' value, set to 3Gi by default - CrawlJob deletion starts as soon as post-finish crawl operations are run - Post-crawl operations get their own redis instance, since one during response is being cleaned up in finalizer - Finalizer ignores request with incorrect state (returns 400 if reported as not finished while crawl is finished) - Current resource usage added to status - Profile browser: also manage single pod directly without statefulset for consistency. - Restart pods via restartTime value: if spec.restartTime != status.restartTime, clear out pods and update status.restartTime (using OnDelete policy to avoid recreate loops in edge cases). - Update to latest metacontroller (v4.11.0) - Add --restartOnError flag for crawler (for browsertrix-crawler 0.11.0) - Failed crawl logging: add 'fail_crawl()' to be used for failing a crawl, which prints logs for default container (if enabled) as well as pod status - tests: check other finished states to avoid stuck in infinite loop if crawl fails - tests: disable disk utilization check, which adds unpredictability to crawl testing! fixes #1147 --------- Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2023-09-11 10:38:04 -07:00
Tessa Walsh	89e44e5cd6	Add operator logs to nightly tests (#1150 )	2023-09-07 09:15:47 -07:00
Ilya Kreymer	7d0cfa93e2	quick fix: fix typo in publish-helm-chart specifying version	2023-09-05 15:51:10 -04:00
Anish Lakhwara	3bfa69b98a	fix: add "v" to helm chart release filename (#1141 ) * fix: add "v" to helm chart release filename, fixes #1134 * add 'v' to helm chart version and update-version.sh --------- Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>	2023-09-05 15:47:39 -04:00
Ilya Kreymer	a9ab17fc61	publish helm chart on release (fixes #1114 ) (#1117 ) (#1123 ) - no longer using :latest by default in values.yaml, instead updating version with each release - set chart version to match app version in Chart.yaml - update version in helm chart and values.yaml as part of update-version.sh script - update test.yaml and local-config.yaml to enable using :latest tag images - ci: add ci script for packaging current helm chart - docs: updates docs to indicate deploying directly from GitHub release - docs: add script to fill in latest version for 'VERSION' using custom script - chart: set local_service_port to 30870 by default, but use only if no ingress. - default values.yaml set up for local deployment, local-config.yaml contains additional commented out examples - ci draft: add deployment info to draft with helm install command for current version - test: fix password check test	2023-08-30 12:02:02 -07:00
Ilya Kreymer	38f67a6cc0	Optimize Frontend Image Build on CI (#1057 ) * Always run yarn only on build platform with --platform=$BUILDPLATFORM * Remove optional dependencies (playwright + chromium) from build with --ignore-optional and move some devDependencies to be optional * Disable husky pre-commit hook checks on frontend Co-authored-by: sua yoo <sua@suayoo.com>	2023-08-09 12:06:20 -07:00
Anish Lakhwara	b5a9c42df1	feat: add pre-commit to check we don't have real passwords in yml files (#990 ) * feat: use existing pre-commit framework * feat(ci): add github action for password_check * feat: add some simple tests to password_check.py * fix: set `backend_password_secret` in default values.yaml to an allowed password	2023-07-26 13:29:37 -07:00
Tessa Walsh	577416024b	Fix pull_request syntax in ansible lint GH Action (#995 ) * Fix pull_request syntax in ansible lint GH Action * Only lint Digital Ocean playbook for now * fix: pass ansible lint --------- Co-authored-by: Anish Lakhwara <anish+git@lakhwara.com>	2023-07-20 12:13:52 +02:00
Anish Lakhwara	bc82f562dc	feat: ansible lint github action	2023-07-10 17:58:47 -07:00
Ilya Kreymer	00fb8ac048	Concurrent Crawl Limit (#874 ) concurrent crawl limits: (addresses #866) - support limits on concurrent crawls that can be run within a single org - change 'waiting' state to 'waiting_org_limit' for concurrent crawl limit and 'waiting_capacity' for capacity-based limits orgs: - add 'maxConcurrentCrawl' to new 'quotas' object on orgs - add /quotas endpoint for updating quotas object operator: - add all crawljobs as related, appear to be returned in creation order - operator: if concurrent crawl limit set, ensures current job is in the first N set of crawljobs (as provided via 'related' list of crawljob objects) before it can proceed to 'starting', otherwise set to 'waiting_org_limit' - api: add org /quotas endpoint for configuring quotas - remove 'new' state, always start with 'starting' - crawljob: add 'oid' to crawljob spec and label for easier querying - more stringent state transitions: add allowed_from to set_state() - ensure state transitions only happened from allowed states, while failed/canceled can happen from any state - ensure finished and state synched from db if transition not allowed - add crawl indices by oid and cid frontend: - show different waiting states on frontend: 'Waiting (Crawl Limit)' and 'Waiting (At Capacity)' - add gear icon on orgs admin page - and initial popup for setting org quotas, showing all properties from org 'quotas' object tests: - add concurrent crawl limit nightly tests - fix state waiting -> waiting_capacity - ci: add logging of operator output on test failure	2023-05-30 15:38:03 -07:00
Ilya Kreymer	12f7db3ae2	tests: fixes for crawl cancel + crawl stopped (#864 ) * tests: - fix cancel crawl test by ensuring state is not running or waiting - fix stop crawl test by ensuring stop is only initiated after at least one page has been crawled, otherwise result may be failed, as no crawl data has been crawled yet (separate fix in crawler to avoid loop if stopped before any data written webrecorder/browsertrix-crawler#314) - bump page limit to 4 for tests to ensure crawl is partially complete, not fully complete when stopping - allow canceled or partial_complete due to race condition * chart: bump frontend limits in default, not just for tests (addresses #780) * crawl stop before starting: - if crawl stopped before it started, mark as canceled - add test for stopping immediately, which should result in 'canceled' crawl - attempt to increase resync interval for immediate failure - nightly tests: increase page limit to test timeout * backend: - detect stopped-before-start crawl as 'failed' instead of 'done' - stats: return stats counters as int instead of string	2023-05-22 20:17:29 -07:00
Sara Tavares	07fb7317fe	Delete proofread-action.yaml (#760 ) Resulting in a lot of false positives (to revisit later)	2023-04-11 15:49:27 -07:00
Ilya Kreymer	63be81d835	ci: make playwright integration tests run only on PRs involving frontend	2023-04-05 09:57:34 -07:00
Ilya Kreymer	2b0d5ff8b3	misc frontend build fixes: playwright version + chunking (#740 ) * misc frontend build fixes: - fix playwright version to be consistent to fix playwright test - chunking: set max number of chunks generated * lock playwright version * remove intl polyfill --------- Co-authored-by: sua yoo <sua@suayoo.com>	2023-04-03 21:27:44 -07:00
Sara Tavares	b61592b5ed	CI: Add Playwright UI e2e tests + CI (#614 ) Adds Playwright for UI tests. Basic Playwright test to login. Playwright Github Action. --------- Co-authored-by: sua yoo <sua@suayoo.com>	2023-03-22 16:23:22 -07:00
sua yoo	e8f88a797b	Remove new issue project automation config (#718 )	2023-03-21 13:49:34 -07:00
Sara Tavares	3fa93b01b8	ci: Create proofread-action.yaml (#714 )	2023-03-20 21:08:56 -07:00
sua yoo	934ee18044	chore: switch actions for issue assign automation addresses #658	2023-03-08 10:01:00 -08:00
sua yoo	3b61266eed	chore: switch to issue node ID proposed fix for update-project-column	2023-03-06 12:32:08 -08:00
sua yoo	ba2d8db413	chore: fix update-project-column org	2023-03-06 12:27:05 -08:00
sua yoo	0007e9bf0b	chore: remove operation from gh action see: https://github.com/github/update-project-action/pull/50	2023-03-06 12:24:45 -08:00
sua yoo	1e3b384e31	chore: update assign issue automation action	2023-03-06 12:18:28 -08:00
sua yoo	31dc5c56c9	chore: update add-to-project action version	2023-03-06 11:40:28 -08:00
sua yoo	18abc84484	chore: update project automation action	2023-03-06 11:38:10 -08:00
Ilya Kreymer	a86a3b470a	ci: add tokens to fix project automation (to be able to write to shared project)	2023-03-02 09:57:52 -08:00
sua yoo	29f31cd462	ci: add workflows for adding issues to project (#660 )	2023-02-28 18:37:01 -08:00
Tessa Walsh	cbab425fec	Make nightly tests run nightly, not monthly (#624 )	2023-02-22 17:54:16 -05:00
Tessa Walsh	14b349443f	Make pending invites expire via TTL index (#568 ) * Make invites expire after configurable window The value can be set in EXPIRE_AFTER_SECONDS env var and via helm chart values, and defaults to 7 days. * Create nightly test CI and add invite expiration test to it * Update 404 error message for missing or expired invite --------- Co-authored-by: sua yoo <sua@suayoo.com>	2023-02-14 16:07:14 -05:00
Ilya Kreymer	3261e7d666	ci: - run k3d-ci on all changes to charts or backend, including PRs (faster) - run microk8s and k3d-log-ci only on commits to main that change charts or backend (slower) - run lint on all changes to backend, including PRs fix	2023-02-08 11:24:54 -08:00
sua yoo	d128525e4e	Run unit tests in frontend PR check (#569 )	2023-02-06 17:47:15 -08:00
Ilya Kreymer	da60403e4b	ci: set env vars for deploy script	2023-02-03 10:54:03 -08:00
Ilya Kreymer	6866383c6f	ci: fix deploy script typo, ensure version is set	2023-02-03 10:49:17 -08:00
Ilya Kreymer	e8b90b7c3e	Deploy to Dev Cluster Fixes (#542 ) fix typos for on-demand k8s cluster deployment via github action trigger	2023-01-31 16:33:56 -08:00
Ilya Kreymer	e98df10dad	CI: Setup manual workflow for dev deployment (#540 ) * deployment: add initial manual workflow for deploying to dev cluster, addresses #428 * opt: only run k3d-log-ci tests on backend or chart changes	2023-01-31 15:42:50 -08:00
D. Lee	be4f918149	Merge pull request #442 from webrecorder/admin-logging-service Add logging service	2023-01-19 22:15:16 -08:00
sua yoo	24a7b14d63	Add path filter to GH workflows (#500 ) * update docs path * update lint check * cluster runs only on backend changes	2023-01-18 15:02:21 -08:00
sua yoo	f7892d7f2f	Add frontend build check (#498 )	2023-01-18 13:06:33 -08:00
Ilya Kreymer	edfb1bd513	quickfix: pydantic / lint fix (#452 ) * backend: use latest pydantic again, fix pylint with custom .pylintrc (as suggested in pydantic/pydantic#1961)	2023-01-10 18:54:11 -08:00
Ilya Kreymer	56a6d7a5d8	Backend lint check (#451 ) - apply lint + format fixes to backend - add ci for lint + format fixes for backend - use fixed version of pydantic	2023-01-10 16:17:06 -08:00

64 Commits