browsertrix

Author	SHA1	Message	Date
Tessa Walsh	08b3d706a7	btrix helper: Add -microk8s flag to explicitly use microk8s (#888 )	2023-05-30 15:41:26 -07:00
Ilya Kreymer	00fb8ac048	Concurrent Crawl Limit (#874 ) concurrent crawl limits: (addresses #866) - support limits on concurrent crawls that can be run within a single org - change 'waiting' state to 'waiting_org_limit' for concurrent crawl limit and 'waiting_capacity' for capacity-based limits orgs: - add 'maxConcurrentCrawl' to new 'quotas' object on orgs - add /quotas endpoint for updating quotas object operator: - add all crawljobs as related, appear to be returned in creation order - operator: if concurrent crawl limit set, ensures current job is in the first N set of crawljobs (as provided via 'related' list of crawljob objects) before it can proceed to 'starting', otherwise set to 'waiting_org_limit' - api: add org /quotas endpoint for configuring quotas - remove 'new' state, always start with 'starting' - crawljob: add 'oid' to crawljob spec and label for easier querying - more stringent state transitions: add allowed_from to set_state() - ensure state transitions only happen from allowed states, while failed/canceled can happen from any state - ensure finished and state synched from db if transition not allowed - add crawl indices by oid and cid frontend: - show different waiting states on frontend: 'Waiting (Crawl Limit)' and 'Waiting (At Capacity)' - add gear icon on orgs admin page - and initial popup for setting org quotas, showing all properties from org 'quotas' object tests: - add concurrent crawl limit nightly tests - fix state waiting -> waiting_capacity - ci: add logging of operator output on test failure	2023-05-30 15:38:03 -07:00
sua yoo	ab518f51fb	Fix ResizeObserver loop error (#902 )	2023-05-30 14:59:34 -07:00
sua yoo	6208ead040	Sort collection by last updated (modified) (#897 ) Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2023-05-30 14:09:10 -04:00
Ilya Kreymer	4d30a64bc9	collection delete: (#896 ) set delete endpoint to use DELETE verb, fix for #869	2023-05-29 18:19:04 -07:00
Tessa Walsh	df4c4e6c5a	Optimize workflow statistics updates (#892 ) * optimizations: - rename update_crawl_config_stats to stats_recompute_all, only used in migration to fetch all crawls and do a full recompute of all file sizes - add stats_recompute_last to only get last crawl by size, increment total size by specified amount, and incr/decr number of crawls - Update migration 0007 to use stats_recompute_all - Add isCrawlRunning, lastCrawlStopping, and lastRun to stats_recompute_last - Increment crawlSuccessfulCount in stats_recompute_last * operator/crawls: - operator: keep track of filesAddedSize in redis as well - rename update_crawl to update_crawl_state_if_changed() and only update if state is different, otherwise return false - ensure mark_finished() operations only occur if crawl state has changed - don't clear 'stopping' flag, can track if crawl was stopped - state always starts with "starting", don't reset to starting tests: - Add test for incremental workflow stats updating - don't clear stopping==true, indicates crawl was manually stopped --------- Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>	2023-05-26 22:57:08 -07:00
Tessa Walsh	9c7a312a4c	Rework collections to track collections in Crawl (#878 ) * Track collections in Crawl rather than crawls in Collection * Add delete collection API endpoint and tests * Precompute collection crawlCount, pageCount, and tags and add them to GET collection responses * Add modified field to Collection * Update collection replay.json method * Make add and remove crawls accept list of crawl ids * Auto-add new workflow crawls to collections when they successfully complete via CrawlConfig.autoAddCollections field * Move long-running post-crawl operator tasks into asyncio task * Make CrawlConfig.autoAddCollections updatable via /update API endpoint	2023-05-25 15:41:50 -04:00
Tessa Walsh	52acd831cd	Add current context and confirmation dialog to reset/bootstrap methods (#887 )	2023-05-25 13:43:53 -04:00
Ilya Kreymer	d7c19c7613	Wait for DB init for healthcheck + settings (#885 ) * init check: (backend fix for #794) - wait until db is inited before settings /api/settings to return 200 - also return 503 from healthcheck endpoint, until db is available	2023-05-25 09:58:30 -07:00
sua yoo	965aa7ff90	Update backend local development docs (#884 ) * docs refactor: - add local deployment guide local-dev-setup.md - deploy/local.md focuses only on deployment with latest release, links to local-dev-setup.md for local image deployment - add nav to mkdocs.yml to ensure correct order of pages - update microk8s specific info - update minikube specific info --------- Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>	2023-05-25 09:50:56 -07:00
sua yoo	4852532866	Show org creation form if there are no orgs (#883 )	2023-05-24 13:10:12 -07:00
sua yoo	88b5646a91	docs: fix link to dev docs	2023-05-24 10:59:41 -07:00
Henry Wilkinson	f788934ef5	Fix copy tags button disabling when no tags on Crawl Details page (#877 )	2023-05-24 12:30:31 -04:00
Tessa Walsh	e94e179bb9	Fix crawl stopping tests (#875 ) * Update currCrawlStopping references in backend tests * Make sure previous crawl is fully stopped before next test	2023-05-23 12:39:53 -07:00
Tessa Walsh	5c944d4626	Remove uniqueness constraint on collection descriptions Fix for copy-paste error	2023-05-23 11:03:13 -04:00
Ilya Kreymer	12f7db3ae2	tests: fixes for crawl cancel + crawl stopped (#864 ) * tests: - fix cancel crawl test by ensuring state is not running or waiting - fix stop crawl test by ensuring stop is only initiated after at least one page has been crawled, otherwise result may be failed, as no crawl data has been crawled yet (separate fix in crawler to avoid loop if stopped before any data written webrecorder/browsertrix-crawler#314) - bump page limit to 4 for tests to ensure crawl is partially complete, not fully complete when stopping - allow canceled or partial_complete due to race condition * chart: bump frontend limits in default, not just for tests (addresses #780) * crawl stop before starting: - if crawl stopped before it started, mark as canceled - add test for stopping immediately, which should result in 'canceled' crawl - attempt to increase resync interval for immediate failure - nightly tests: increase page limit to test timeout * backend: - detect stopped-before-start crawl as 'failed' instead of 'done' - stats: return stats counters as int instead of string	2023-05-22 20:17:29 -07:00
Tessa Walsh	28f1c815d0	Add crawlSuccessfulCount to workflows (#871 )	2023-05-22 19:06:37 -04:00
Tessa Walsh	bd8b306fbd	Improve sorting workflows by lastUpdated (#826 ) * Precompute config crawl stats Includes a database migration to move previously dynamically computed crawl stats for workflows into the CrawlConfig model. * Add lastRun sorting option and enable it by default * Add modified as final sort key to order non-run workflows * Remove currCrawl* fields and update frontend accordingly * Add isCrawlRunning field to backend and use in frontend	2023-05-22 18:42:30 -04:00
Tessa Walsh	60fac2b677	Add collection sorting and filtering (#863 ) * Sort by name and description (ascending by default) * Filter by name * Add endpoint to fetch collection names for search * Add collation so that utf-8 chars sort as expected	2023-05-22 16:53:49 -04:00
sua yoo	821fbc12d8	Upgrade Shoelace to stable version (v2) (#856 )	2023-05-22 10:01:48 -07:00
Ilya Kreymer	826c2e8298	version: bump to 1.6.0-beta.0	2023-05-19 11:29:31 -07:00
Tessa Walsh	f482831d53	Use collection uuid as id (instead of name) (#855 ) Also ensure name is not empty by adding minimum length of 1 Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>	2023-05-19 09:03:48 -04:00
Ilya Kreymer	d07204e59d	version: bump to 1.5.1	2023-05-18 17:28:42 -07:00
sua yoo	b5781c8869	Fix workflow edit back button (#857 )	2023-05-17 12:07:12 -07:00
Henry Wilkinson	da33231be9	Removes webkit `<summary>` element triangle (#852 )	2023-05-16 18:13:59 -04:00
Ilya Kreymer	a1ef93a46a	version: bump to 1.5.0 for release!	2023-05-16 17:36:58 +02:00
Ilya Kreymer	ebee5e1788	version: bump to 1.5.0-beta.4	2023-05-12 07:34:50 +02:00
sua yoo	f250293794	Fix workflow edit page not loading (#848 ) * fix workflow not loading * don't add hash if editing * remove controller	2023-05-12 07:33:35 +02:00
sua yoo	98d82184e6	Fix superadmin running crawls views (#846 ) - Updates superadmin "Running Crawls" to show active crawls (starting, waiting, running, stopping) and sort by start by default - Navigates to crawl workflow watch view on clicking crawl item - Adds "Copy Crawl ID" to crawl actions for easy paste into "Jump to crawl" - Navigates to crawl workflow watch when jumping to crawl	2023-05-11 08:15:52 +02:00
Ilya Kreymer	d8b36c0ae2	version: bump to 1.5.0-beta.3	2023-05-11 03:05:46 +02:00
sua yoo	a6435ae3d0	Improve Workflow Detail tab and button UX (#840 ) - Adds primary action button next to "Actions" dropdown - Switches "Edit Workflow Settings" button to icon button - Redirects user to "Watch Crawl" tab when starting crawl - Now uses crawl ID from `data.started` in API `/run` response for more responsive UI - Keeps "Watch Crawl" tab navigation button in list but disable when crawl is not running - Also handles watch view when workflow is not running to cover navigational edge cases - Adds banner in "Crawls" list to direct users to the Watch Crawl when workflow is running - Shows notification when crawl is done to make redirect to Crawls tab smoother - Uses workflow scale when updating crawl scale - Removes "All" from "View: All Finished Crawls" on Finished Crawl page for wording consistency	2023-05-11 02:57:38 +02:00
Ilya Kreymer	d1e5b0a021	version: bump to 1.5.0-beta.2	2023-05-10 14:55:35 +02:00
Ilya Kreymer	a6ddde496d	backend: fixes to 0005 migration: (#843 ) - catch any errors on updating config (likely due to missing configmap), fix formatting	2023-05-10 12:00:41 +02:00
Ilya Kreymer	cf15d9c873	backend: ensure cid is a UUID, remove unneeded inactive check on crawls (#842 ) * backend: ensure cid is a UUID, remove unneeded inactive check on crawls * add UUID cast to cancel only	2023-05-10 11:59:44 +02:00
sua yoo	42794cad46	Add stop crawl confirmation dialog (#841 ) * switch dialog control * wait for workflow update to complete before showing dialog * add stop dialog * close scale after save * update crawl text	2023-05-10 07:21:16 +02:00
Ilya Kreymer	82b21b6813	frontend crawl stopping improvements (#836 ) (#838 ) * frontend crawl stopping improvements (#836) - support new backend 'stopping' property - for now, keep 'stopping' indicator state when crawl is running but stopping set to true	2023-05-08 23:52:49 -07:00
Ilya Kreymer	2cae065c46	Add Waiting state on the backend and frontend (#839 ) * operator: add waiting state - add pods as related objects - inspect pod status, set crawl status to 'waiting' if no pods are running frontend: - frontend support for 'waiting' state - show waiting icon from mocks --------- Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>	2023-05-08 17:05:01 -07:00
Ilya Kreymer	70319594c2	crawlconfig: fix default filename template, make configurable (#835 ) * crawlconfig: fix default filename template, make configurable - make default crawl file template configurable with 'default_crawl_filename_template' value in values.yaml - set to '@ts-@hostsuffix.wacz' by default - allow updating via 'crawlFilenameTemplate' in crawlconfig patch, which updates configmap - tests: add test for custom 'default_crawl_filename_template'	2023-05-08 14:03:27 -07:00
Ilya Kreymer	fd7e81b8b7	stopping fix: backend fixes for #836 + prep for additional status fields (#837 ) * stopping fix: backend fixes for #836 - sets 'stopping' field on crawl when crawl is being stopped (both via db and on k8s object) - k8s: show 'stopping' as part of crawljob object, update subchart - set 'currCrawlStopping' on workflow - support old and new browsertrix-crawler stopping keys - tests: add tests for new stopping state, also test canceling crawl (disable test for stopping crawl, currently failing) - catch redis error when getting stats operator: additional optimizations: - run pvc removal as background task - catch any exceptions in finalizer stage (eg. if db is down), return false until finalizer completes	2023-05-08 14:02:20 -07:00
Ilya Kreymer	064cd7e08a	quickfix: fix stopping crawls with current browsertrix-crawler beta	2023-05-06 23:35:25 -07:00
Ilya Kreymer	b40d599e17	operator fixes: (#834 ) - just pass cid from operator for consistency, don't load crawl from update_crawl (different object) - don't throw in update_config_crawl_stats() to avoid exception in operator, only throw in crawlconfigs api	2023-05-06 13:02:33 -07:00
Ilya Kreymer	f992704491	version: bump version to 1.5.0-beta.1	2023-05-06 00:31:03 -07:00
Tessa Walsh	4f121fb868	Update precompute migration to only update active workflows (#833 )	2023-05-05 21:35:03 -07:00
Tessa Walsh	8281ba723e	Pre-compute workflow last crawl information (#812 ) * Precompute config crawl stats * Includes a database migration to move previously dynamically computed crawl stats for workflows into the CrawlConfig model. * Add crawls.finished descending index * Add last crawl fields to workflow tests	2023-05-05 15:12:52 -07:00
sua yoo	9fcbc3f87e	Allow users to set max depth/hop out within scope (#816 ) - Adds an input to the Workflow creation and edit form for specifying crawl depth. This input is conditionally shown for seeded crawls when the scope is set to "Pages on this domain", "Pages on this domain & subdomains" or "Custom page prefix". The "any" scope is also supported for backwards compatibility but is not shown by default or in new configs. - API implementation: The depth value is set in the primary seed config, i.e. the first seed in seeds: [], not in the outer .config.depth property.	2023-05-05 14:26:48 -07:00
Henry Wilkinson	7409e0637e	Improves crawl detail files list truncation (#830 )	2023-05-05 14:25:29 -07:00
sua yoo	0d23b45dac	Crawl workflow detail page improvements (#823 ) Resolves #817 - Adds relevant action buttons to each Workflow detail tab header - Adds "Delete" action menu item to crawls in Crawls tab - Prevent automatically switching to "Watch" tab after running crawl from detail page - Removes "Stop" confirmation prompt and only shows "Cancel" confirmation prompt if there are one or more pages crawled - Replaces "Cancel" confirmation prompt with web component dialog (partially addresses Switch to in-page dialogue boxes #619) - Fixes hash routing to fix going back with browser back button	2023-05-05 13:50:45 -07:00
Ilya Kreymer	aae0e6590e	Ensure Volumes are deleted when crawl is canceled (#828 ) * operator: - ensures crawler pvcs are always deleted before crawl object is finalized (fixes #827) - refactor to ensure finalizer handler always run when finalizing - remove obsolete config entries	2023-05-05 12:05:54 -07:00
Tessa Walsh	48d34bc3c4	Add option to list workflows API endpoint to filter by schedule (#822 ) * Add option to filter workflows by empty or non-empty schedule * Add tests	2023-05-05 12:05:19 -07:00
Tessa Walsh	542ad7a24a	Update scale in workflow when crawl scale is updated (#820 )	2023-05-05 11:59:57 -07:00

636 Commits