- Shows QA review status, latest QA state, and QA run count in the archived
item list. The "Created by" column is removed to save space; its contents
move to the "Date Created" tooltip
- Enables sorting archived items by QA columns
- Adds tooltips to all columns except Name
- Hides non-applicable columns when archived items are filtered to
uploads only
- Fixes issue where tooltips weren't activated on hover when using
`btrix-table`
- Minor refactor to reuse status labels/colors; renames "page review" to
"page approval" for clarity
### Changes
- Improves overflow issues at smaller screen sizes
- Adds icons to buttons
- Updates text & layout to match mocks
- Changes the primary button & button options depending on whether a
QA run is available
- Adds a loading state for QA run status & buttons
- Updates `<btrix-crawl-status>` with a `type` param allowing for
crawls, uploads, and QA runs (see the sketch after this list)
- Updates `<btrix-alert>` to match `<sl-tag>` styling
- Improves overflow issues at smaller viewport sizes by making tab
lists overflow when necessary
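
A minimal sketch of how the updated status component might be used in a Lit template; the exact accepted `type` values are assumptions based on the supported item kinds (crawls, uploads, QA runs):

```ts
import { html } from "lit";

// Hypothetical usage; the accepted `type` values ("crawl" | "upload" | "qa")
// are assumptions, not confirmed by this changeset.
export const renderQaStatus = (state: string) => html`
  <btrix-crawl-status state=${state} type="qa"></btrix-crawl-status>
`;
```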
### Features
- Ability to start/stop/cancel QA runs
https://github.com/webrecorder/browsertrix-cloud/pull/1666 @SuaYoo
- Ability to see progress of current QA run @emma-sg
- Ability to delete QA runs @emma-sg
- Ability to download QA run files
https://github.com/webrecorder/browsertrix-cloud/pull/1666 @SuaYoo
- Review can only be started once a QA run is finished (for now, as an
initial pass). @SuaYoo
- Only the most recent running or successful QA run is displayed in the
header
---------
Co-authored-by: sua yoo <sua@webrecorder.org>
Co-authored-by: sua yoo <sua@suayoo.com>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
QA Details page:
- Enables QA tab with the ability to start an automated analysis QA run and view analysis and manual review status
- Pages listed with review status + overall crawl review status shown on QA details (relates to #1508)
- Initial placeholder for QA run analytics (part of #1589)
- Addresses a good deal of #1477
Automated Analysis QA in Review Mode:
- Ability to select from multiple analysis QA runs / view QA runs in QA details
- Shows analysis screenshot, text, and resource comparison tabs as well as a replay tab (fixes #1496)
- Sorting by worst screenshot / worst text score for each QA run
- Includes pages sidebar with screenshot/text/resource comparison results (fixes #1497)
Manual Review QA in Review Mode:
- Per-page replay available as a separate tab (fixes #1499)
- Supports thumbs up, thumbs down, and notes for each page
- Supports entering a review status (good/acceptable/bad) when finishing review
---------
Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
- Automatically update view to first page if page ID isn't specified
- Show current page URL in location bar (resolves
https://github.com/webrecorder/browsertrix-cloud/issues/1495)
- Approve, reject, or leave notes on a page
- Display temporary list of links to pages in the sidebar
Based on #1534
Figured this should be in place so we can work on other front-end things
with these, rather than dealing with refactoring later
### Changes
- Adds `ArchivedItemPage` and `ArchivedItemPageComment` types from #1534
(thank you @SuaYoo!)
- Adds typedefs for match and resource count properties
- Sets properties that are optional in the db schema to optional in the type
as well
Closes #1527
Improves front-end types & ensures the data being accessed matches the
data sent by the back-end.
Tested by hand by taking the data returned from the `/orgs/${orgId}`
endpoint in prod (where this is happening) and using it in dev.
## Overview
Adds a bunch of ESLint rules, mostly from `typescript-eslint`, and fixes
the issues turning on these rules raises.
Also updates TypeScript & typescript-eslint.
## Rationale
Most of these new rules are auto-fixable, so I've tackled a bunch of the
little fixes that do need manual intervention now, with the intention
that this shouldn't add much, if any, additional friction to future
development work, while giving us a good bump in overall code quality.
A lot of the rules here are also great for catching potential bugs!
## Changes
- Adds `void` to most un-awaited and unhandled promises (i.e. places
where async functions are called but nothing is done with the promise)
- Converts properties that are only ever read to `readonly`
- Adds a new `isApiError` function that informs TypeScript of when an
error is an `APIError` (see the sketch after this list)
- Adds types to a bunch of places that were previously untyped
- Changes instances of `Map<string, any>` in lit property update methods
to `PropertyValues<this>`, or sometimes `PropertyValues<this> &
Map<string, unknown>` where private or protected members are used
(`keyof` doesn't include private and protected members, unfortunately)
- Adds types to a bunch of custom events
- Cleans up a regex by removing unnecessary escape characters
- Makes a number of implied type conversions explicit (by wrapping with
`Boolean(...)` or calling `.toString()`)
- More consistently applies type coercions when necessary, and removes
them when unnecessary
- Converts a couple const strings to an enum
- Removes the need to type debounced functions as `any` by coercing to
the underlying function type at the point where the method is bound
to the event in the `html` block
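
A minimal sketch of what an `isApiError` type guard might look like, assuming `APIError` is a class; the actual implementation in the codebase may differ:

```ts
// Hypothetical shape of the error class; the real APIError may differ.
class APIError extends Error {
  constructor(
    readonly statusCode: number,
    readonly details?: string,
  ) {
    super(`API error: ${statusCode}`);
  }
}

// Type guard: narrows `unknown` so callers can safely read
// statusCode/details inside catch blocks.
function isApiError(error: unknown): error is APIError {
  return error instanceof APIError;
}

try {
  // await apiFetch(...);
} catch (e) {
  if (isApiError(e)) {
    console.warn(e.statusCode, e.details); // `e` is narrowed to APIError
  }
}
```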
New features & enhancements:
- New UI for collection item selection dialog
- Consistent data table styles for collection list and collection item
list
Refactors:
- Adds `btrix-table` as low-level table component
- Adds `btrix-archived-item-list`, removes `checkbox-list` and
deprecates `crawl-list`
- Upgrades Shoelace for `sl-tree` fixes
- Fixes `ArchivedItem` typing
Fixes #1341
Adds "User Agent" field to workflow editor under the Browser Settings
tab. If not set, the crawler will use the browser's default user agent.
Also added to docs and to the workflow details page (if set).
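
A minimal sketch of how the new field might appear in a workflow config payload; the exact property name and placement are assumptions based on the field label:

```ts
// Hypothetical workflow config excerpt; property name and placement
// are assumptions.
const workflowConfig = {
  config: {
    seeds: [{ url: "https://example.com/" }],
    // If omitted, the crawler uses the browser's default user agent.
    userAgent: "Mozilla/5.0 (compatible; ExampleArchiveBot/1.0)",
  },
};
```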
---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
Fixes #1385
## Changes
Supports multiple crawler 'channels' which can be configured to
different browsertrix-crawler versions
- Replaces `crawler_image` in helm chart with `crawler_channels` array
similar to how storages are handled
- The `default` crawler channel must always be provided and specifies
the default crawler image
- Adds backend `/orgs/{oid}/crawlconfigs/crawler-channels` API endpoint
to fetch information about available crawler versions (name, image, and
label), plus tests (see the sketch after this list)
- Adds crawler channel select to workflow creation/edit screens and
profile creation dialog, and updates related API endpoints and
configmaps accordingly. The select dropdown is shown only if more than
one channel is configured.
- Adds `crawlerChannel` to workflow and crawl details.
- Adds `image` to crawl details, used to display the actual crawler image
used as part of the crawl.
- Modifies `crawler_crawl_id` backend test fixture to use `test` crawler
version to ensure crawler versions other than latest work
- Adds migration to add `crawlerChannel` set to `default` to existing
workflow and profile objects and workflow configmaps
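
A minimal sketch of the shape the new endpoint might return and how the frontend could fetch it; the field names follow the description above (name, image, label), while the wrapper shape and URL prefix are assumptions:

```ts
// Field names from the description above; everything else is an assumption.
type CrawlerChannel = {
  name: string;  // e.g. "default" or "test"
  image: string; // container image for this channel
  label: string; // human-readable label shown in the select dropdown
};

async function getCrawlerChannels(oid: string): Promise<CrawlerChannel[]> {
  const resp = await fetch(
    `/api/orgs/${oid}/crawlconfigs/crawler-channels`, // prefix assumed
  );
  const data = (await resp.json()) as { channels: CrawlerChannel[] };
  return data.channels;
}
```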
---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
- Adds notify, navigate, and log-in events to the global event map, handled
in `btrix-app`
- Adds console debugs, which are stripped in prod
- Replaces redundant `navTo`s with the controller implementation
- Refactors rest of `LitElement` helpers into arrow functions
Fixes #1358
- Adds `extraExecMinutes` and `giftedExecMinutes` org quotas, which are
not reset monthly but are updateable amounts that carry across months
- Adds `quotaUpdate` field to `Organization` to track when quotas were
updated with timestamp
- Adds `extraExecMinutesAvailable` and `giftedExecMinutesAvailable`
fields to `Organization` to help with tracking available time left
(includes tested migration to initialize these to 0)
- Modifies org backend to track time across multiple categories, using
monthlyExecSeconds, then giftedExecSeconds, then extraExecSeconds (see
the sketch after this list). All time is also written into
crawlExecSeconds, which is now the monthly total and also contains any
overage time above the quotas
- Updates Dashboard crawling meter to include all types of execution
time if `extraExecMinutes` and/or `giftedExecMinutes` are set above 0
- Updates Dashboard Usage History table to include all types of
execution time (only displaying columns that have data)
- Adds backend nightly test to check handling of quotas and execution
time
- Includes migration to add new fields and copy crawlExecSeconds to
monthlyExecSeconds for previous months
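
A minimal, purely illustrative sketch of the accounting order described above (monthly, then gifted, then extra); the real backend logic is in Python and differs in detail:

```ts
// Illustrative only: drain crawl time from the categories in order.
type ExecSeconds = {
  monthlyExecSeconds: number;
  giftedExecSeconds: number;
  extraExecSeconds: number;
};

function chargeExecTime(
  used: ExecSeconds,   // seconds already used per category
  limits: ExecSeconds, // quota per category, in seconds
  seconds: number,     // new crawl time to record
): number {
  let remaining = seconds;
  for (const key of [
    "monthlyExecSeconds",
    "giftedExecSeconds",
    "extraExecSeconds",
  ] as const) {
    const available = Math.max(limits[key] - used[key], 0);
    const charge = Math.min(remaining, available);
    used[key] += charge;
    remaining -= charge;
  }
  // All time, including any overage, also goes into the monthly total
  // (crawlExecSeconds) per the description above.
  return remaining; // overage above all quotas
}
```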
Co-authored-by: emma <hi@emma.cafe>
Closes #1405
- Properly uses `typescript-eslint`: we were missing the preset from it,
so some of the default `eslint` rules (that don't properly work with
typescript) were being applied and causing false positives
- I also moved the `eslint` config into its own file, and enabled
`typescript-eslint`'s type-awareness, so that we can enable more
type-aware rules in the future if we like
- Adds `ts-lit-plugin` to the typescript config, which _hopefully_ will
allow us to catch issues during build (in CI)
- It looks like `ts-lit-plugin` is sort of abandonware at the moment,
and unfortunately _doesn't_ actually work for this purpose right now,
but the lit team is working on a replacement here:
https://www.npmjs.com/package/@lit-labs/analyzer
- Adds `fork-ts-checker-webpack-plugin`, which allows the typescript
checking process to be run on a separate forked thread in Webpack, which
can help speed up builds & checking
- Enables incremental type checking for better speed
- Fixes a whole bunch of `eslint`-auto-fixable issues (unused imports
and variables, some type issues, etc)
- Fixes a bunch of `lit-analyzer` issues (mostly attribute naming, some
type issues as well)
- Fixes various other type issues:
- Improves type safety in a bunch of places, notably anywhere `apiFetch`
and `APIPaginatedList` are used
- Removes some `any`s
- Adds two new crawl finished states, stopped_by_user and
stopped_quota_reached
- Tracking other possible 'stop reasons' in operator, though not making
them distinct states for now.
- Updated frontend with 'Stopped by User' and 'Stopped: Time Quota
Reached', shown with the same icon as the current partial_complete (see
the sketch after this list)
- Added migration of partial_complete to either stopped_by_user or
complete (no historical quota data available)
- Addresses edge case in scaling: if crawl never scaled (no redis entry,
no pod), automatically scale down
- Edge case in status: if crawl is somehow 'canceled' but not deleted,
immediately delete crawl object and begin finalizing.
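
A minimal sketch of how the two new states might map to display labels in the frontend; the structure is hypothetical, though the labels come from the description above:

```ts
// Hypothetical label map; both states share the partial_complete icon.
const stoppedStateLabels: Record<string, string> = {
  stopped_by_user: "Stopped by User",
  stopped_quota_reached: "Stopped: Time Quota Reached",
};
```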
---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
Fixes #1261, closes #1092
The quota for monthly execution minutes is treated as a hard cap. Once
it is exceeded, an alert indicating that an org has exceeded its monthly
execution minutes will display and the user will be unable to start new
crawls. Any running crawls will be stopped once the quota is exceeded.
An execution minutes meter bar is also added in the Org Dashboard and
displayed if a quota is set. More detail in #1305 which was
merged into this branch.
## Changes
- Enable setting 'maxExecMinutesPerMonth' in orgs list quotas by superadmin
- Enforce quota by stopping crawls in operator once quota is reached
- Show alert banner once execution time quota is hit:
- Once quota is hit, disable Run Crawl buttons in frontend, return 403
message with `exec_minutes_quota_reached` detail in backend from the
crawl config `/run` endpoint, and don't run new workflows on creation
(similar to storage quota); see the sketch after this list
- Display execution time for crawls in the crawl details overview,
immediately below
- Show execution minutes meter on dashboard (from #1305)
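
A minimal sketch of how the frontend might surface the new 403 detail when starting a crawl; the endpoint path and response shape are assumptions beyond what's described above:

```ts
// Hypothetical run-workflow call; path and response fields are assumptions.
async function runWorkflow(oid: string, configId: string): Promise<unknown> {
  const resp = await fetch(`/api/orgs/${oid}/crawlconfigs/${configId}/run`, {
    method: "POST",
  });
  const body = (await resp.json()) as { detail?: string };
  if (resp.status === 403 && body.detail === "exec_minutes_quota_reached") {
    // Matches the banner/disabled-button behavior described above.
    throw new Error("Monthly execution minutes quota reached");
  }
  return body;
}
```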
---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: sua yoo <sua@webrecorder.org>
Partially resolves #1223, fixes #1298
- Adds crawl usage table in dashboard under metrics
- Shows skeleton loading indicator when metrics are loading (@Shrinks99
feel free to adjust how this looks)
- Shows max number of concurrent crawls running if any are running ("`running` / `max` Crawls Running")
- Allows editing of org slugs (actual URL updates will be handled in
https://github.com/webrecorder/browsertrix-cloud/issues/1258.)
- Converts user input to a slug using slugify (see the sketch after this list)
- Adds help text to org name and slug
- Renames tab from "information" to "general" settings
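
A minimal sketch of the slug conversion, assuming the widely used slugify package; the exact options used in the codebase are an assumption:

```ts
import slugify from "slugify";

// Convert user-entered org names to URL-safe slugs.
// Options are assumptions; `lower` and `strict` are common choices.
const toSlug = (name: string) => slugify(name, { lower: true, strict: true });

toSlug("My Archiving Org!"); // => "my-archiving-org"
```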
- Add checkbox to URL list workflows which sets the --failOnFailedSeed crawler flag
- If set, and any of the seeds fails, the entire crawl is marked as a failure
- Add 'Fail Crawl On Failed URL' to crawl workflow setup docs
- Remove config.seeds from workflow and crawl detail endpoints
- Add new paginated GET /crawls/{crawl_id}/seeds and /crawlconfigs/{cid}/seeds endpoints to retrieve seeds for a crawl or workflow
- Include firstSeed in GET /crawlconfigs/{cid} endpoint (was missing before)
- Modify frontend to fetch seeds from new /seeds endpoints with loading indicator
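
A minimal sketch of fetching the new paginated seeds endpoint; the org prefix and pagination field names are assumptions following the APIPaginatedList convention mentioned elsewhere in these notes:

```ts
// Hypothetical paginated response; field names are assumptions.
type PaginatedSeeds = {
  items: { url: string }[];
  total: number;
  page: number;
  pageSize: number;
};

async function getWorkflowSeeds(
  oid: string,
  cid: string,
  page = 1,
): Promise<PaginatedSeeds> {
  const resp = await fetch(
    `/api/orgs/${oid}/crawlconfigs/${cid}/seeds?page=${page}`,
  );
  return (await resp.json()) as PaginatedSeeds;
}
```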
---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
* Remove config from list endpoints
- Remove config field from workflow and crawl list endpoints
- Add seedCount to CrawlConfigOut on backend and Workflow on frontend
- Refactor CrawlConfig and CrawlConfigOut to extend CrawlConfigCore + CrawlConfigAdditional
- Refactor workflow list in frontend to use firstSeed and seedCount
- Frontend uses ListWorkflow type which is Omit<Workflow, "config">
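
A minimal sketch of the type relationship in that last bullet; `firstSeed` and `seedCount` come from the bullets above, and the rest of `Workflow` is elided:

```ts
// Elided: the full Workflow type has many more fields.
type Workflow = {
  id: string;
  firstSeed: string;
  seedCount: number;
  config: { seeds: { url: string }[] }; // omitted from list views
};

// List endpoints no longer include the (potentially large) config.
type ListWorkflow = Omit<Workflow, "config">;
```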
* use Metacontroller's DecoratorController to create CrawlJob from Job
* scheduled job work:
- use existing job name for scheduled crawljob
- use suspended job, set startTime, completionTime and succeeded status on job when crawljob is done
- simplify cronjob template: remove job_image, cron_namespace, using same namespace as crawls,
placeholder job image for cronjobs
* move storage quota check to crawljob handler:
- add 'skipped_quota_reached' as new failed status type
- check for storage quota before checking if crawljob can be started, fail if not (check before any pods/pvcs created)
* frontend:
- show all crawls in crawl workflow, no need to filter by status
- add 'skipped_quota_reached' status, show as 'Skipped (Quota Reached)', render same as failed
* migration: make release namespace available as DEFAULT_NAMESPACE, delete old cronjobs in DEFAULT_NAMESPACE and recreate in crawlers namespace with new template
* Implement in backend
- Track bytesStored in org
- Add migration to pre-calculate based on size of crawlfiles and profilefiles
- Add methods to increase or decrease org storage when crawl or profile files
are added or deleted
- Include storageQuotaReached boolean in API responses that alter storage
- Don't start new crawls and fail uploads if storage quota reached
* Implement in frontend
- Add to orgs-list quotas
- Update org's storageQuotaReached based on backend endpoint responses
- Disable buttons when storage quota is met
- Show toast notification when attempting to run a crawl when org
storage quota is met
- Lists collections that an archived item belongs to in item detail view
- Improves performance of collection add component
---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
Backend:
- add 'maxCrawlSize' to models and crawljob spec
- add 'MAX_CRAWL_SIZE' to configmap
- add maxCrawlSize to new crawlconfig + update APIs
- operator: gracefully stop crawl if current size (from stats) exceeds maxCrawlSize
- tests: add max crawl size tests
Frontend:
- Add Max Crawl Size text box to the Limits tab
- Users enter max crawl size in GB, which is converted to bytes
- Add BYTES_PER_GB as a constant for converting to bytes (see the sketch after this list)
- docs: add Crawl Size Limit to user guide workflow setup section
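
A minimal sketch of the GB-to-bytes conversion; whether the codebase defines BYTES_PER_GB as decimal (1e9) or binary (2^30) is an assumption here (the sketch uses binary):

```ts
// Assumption: "GB" here means gibibytes (2^30 bytes); the codebase
// may instead use the decimal definition (1e9).
const BYTES_PER_GB = 1024 ** 3;

// Convert the user-entered limit (in GB) to bytes for the API.
const maxCrawlSize = (gb: number) => Math.round(gb * BYTES_PER_GB);

maxCrawlSize(2); // => 2147483648
```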
Operator Refactor:
- use 'status.stopping' instead of 'crawl.stopping' to indicate crawl is being stopped, as changing later has no effect in operator
- add is_crawl_stopping() to return if crawl is being stopped, based on crawl.stopping or size or time limit being reached
- crawlerjob status: store raw byte size under 'size' and human-readable size under 'sizeHuman' for clarity
- size stat always exists, so remove unneeded conditional (defaults to 0)
Charts:
- subchart: update crawlerjob crd in btrix-crds to show status.stopping instead of spec.stopping
- subchart: show 'sizeHuman' property instead of 'size'
- bump subchart version to 0.1.1
---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
- rename 'collections' -> 'collectionIds', adding migration 0014
- only populate 'collections' array with {name, id} pair for get_crawl() / single archived item
path, but not for aggregate/list methods
- remove Crawl.get_crawl(), redundant with BaseCrawl.get_crawl() version
- ensure _files_to_resources returns an empty [] instead of None if empty (matching BaseCrawl.get_crawl() behavior to Crawl.get_crawl())
- tests: update tests to use collectionIds for id list, add 'collections' for {name, id} test
- frontend: change Crawl object to have collectionIds instead of collections
---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
- Paginates Crawl Workflows when there are more than 10 workflows
- Refactors workflow search and crawl search to use the same component
- Adds sort by first seed, workflow creation date, and workflow modified date
- Separates "last run" date from "modified" date
- Updates column layout to Name & Schedule (or Manual Run), Latest Crawl (<finish time> in <duration>), total size, and last modified (modified by and modified time)
* collections: support toggling collections public/private, viewable via RWP
- backend: add 'public' to collection model, support patching to update
- backend: add .../collections/<id>/public/replay.json for public access
- backend: add CORS handling for public endpoint
- frontend: support 'make shareable / make private' dropdown actions on collection detail + collection list views
- frontend: show shareable / private icons by collection name on detail + list views
- frontend: link to replayweb.page for standalone browsing
- frontend: add embed code popup when a collection is shareable
- refer to public collections as 'shareable' for now
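
A rough sketch of the kind of embed code the popup might generate, using replayweb.page's `<replay-web-page>` element pointed at the public replay.json endpoint described above; the exact markup and attributes are assumptions:

```ts
import { html } from "lit";

// Hypothetical embed snippet for a shareable collection; the public
// replay.json URL follows the endpoint described above, but the
// element attributes are assumptions.
export const embedCode = (orgId: string, collectionId: string) => html`
  <replay-web-page
    source="/api/orgs/${orgId}/collections/${collectionId}/public/replay.json"
  ></replay-web-page>
`;
```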
---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
operator: ensure transitions from each of these states are supported, including to 'waiting_capacity'
add an extra check on stopping to avoid transitioning back to a running state after the crawl is finished
ui: add states to UI display and localization, add as active states
fixes #263
* support streaming download of collections (part of #927)
- WACZ zip created on the fly using stream-zip
- add 'Download Collection' option to collection detail and list
- after editing collection, return to collection view
- tests: add test for streaming download, ensure WACZ files + datapackage present, STORE compression used
---------
Co-authored-by: sua yoo <sua@suayoo.com>
- Adds top-level "Archived Data" view, replacing "Finished Crawls" and moving it into the new view as "Crawls"
- Adds list for viewing all artifacts/data
- Adds list for viewing all uploaded crawls
- Updates crawl detail view to show upload details
- Edit upload metadata, including 'name'
- Delete uploads
---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
- Adds collections search and list to workflow editor
- Adds collections to workflow details component
- Adds namePrefix filter to backend GET /orgs/{oid}/collections endpoint to support case-insensitive searching of collections
- Adds documentation for new setting
---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
- Support for creating new collections and editing existing collections
- Can select crawl workflows, which adds the entire workflow, and then deselect individual crawls
- Can edit existing collections and add more crawls
- Can view, create and delete collections via new Collections top-level nav entry
concurrent crawl limits: (addresses #866)
- support limits on concurrent crawls that can be run within a single org
- change 'waiting' state to 'waiting_org_limit' for concurrent crawl limit and 'waiting_capacity' for capacity-based
limits
orgs:
- add 'maxConcurrentCrawl' to new 'quotas' object on orgs
- add /quotas endpoint for updating quotas object
operator:
- add all crawljobs as related objects, which appear to be returned in creation order
- operator: if concurrent crawl limit set, ensures current job is in the first N set of crawljobs (as provided via 'related' list of crawljob objects) before it can proceed to 'starting', otherwise set to 'waiting_org_limit'
- api: add org /quotas endpoint for configuring quotas
- remove 'new' state, always start with 'starting'
- crawljob: add 'oid' to crawljob spec and label for easier querying
- more stringent state transitions: add allowed_from to set_state() (see the sketch after the tests list below)
- ensure state transitions only happen from allowed states, while failed/canceled can happen from any state
- ensure finished time and state are synced from the db if a transition is not allowed
- add crawl indices by oid and cid
frontend:
- show different waiting states on frontend: 'Waiting (Crawl Limit)' and 'Waiting (At Capacity)'
- add gear icon on orgs admin page
- add initial popup for setting org quotas, showing all properties from org 'quotas' object
tests:
- add concurrent crawl limit nightly tests
- fix state waiting -> waiting_capacity
- ci: add logging of operator output on test failure
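
A minimal sketch of the allowed_from idea behind set_state(); the real implementation is in the Python backend, so this TypeScript version (matching the other sketches here) is purely illustrative:

```ts
// Illustrative only -- the real set_state() lives in the Python backend.
type CrawlState = string;

const ANY_STATE = Symbol("any");

// A transition is allowed only if the current state is in allowedFrom;
// terminal states like "failed"/"canceled" accept any current state.
function setState(
  current: CrawlState,
  next: CrawlState,
  allowedFrom: CrawlState[] | typeof ANY_STATE,
): CrawlState {
  if (allowedFrom === ANY_STATE || allowedFrom.includes(current)) {
    return next;
  }
  // Not allowed: keep current state (the backend re-syncs from the db).
  return current;
}

// e.g. may only move to "starting" from "waiting_org_limit":
setState("waiting_org_limit", "starting", ["waiting_org_limit"]);
setState("running", "canceled", ANY_STATE); // always allowed
```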
* Precompute config crawl stats
Includes a database migration to move previously dynamically computed
crawl stats for workflows into the CrawlConfig model.
* Add lastRun sorting option and enable it by default
* Add modified as final sort key to order non-run workflows
* Remove currCrawl* fields and update frontend accordingly
* Add isCrawlRunning field to backend and use in frontend
* frontend crawl stopping improvements (#836)
- support new backend 'stopping' property
- for now, keep 'stopping' indicator state when crawl is running but stopping set to true
* operator: add waiting state
- add pods as related objects
- inspect pod status, set crawl status to 'waiting' if no pods are running
frontend:
- frontend support for 'waiting' state
- show waiting icon from mocks
---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
- Adds an input to the Workflow creation and edit form for specifying crawl depth. This input is conditionally shown for seeded crawls when the scope is set to "Pages on this domain", "Pages on this domain & subdomains" or "Custom page prefix". The "any" scope is also supported for backwards compatibility but is not shown by default or in new configs.
- API implementation: The depth value is set in the primary seed config, i.e. the first seed in seeds: [], not in the outer .config.depth property.
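
A minimal sketch of where the depth value lands, per the note above; the surrounding fields are illustrative:

```ts
// Depth is set on the primary (first) seed, not on config.depth.
const crawlConfig = {
  config: {
    seeds: [
      {
        url: "https://example.com/",
        scopeType: "domain", // assumed value for a domain-scoped crawl
        depth: 2,            // <-- crawl depth lives here
      },
    ],
    // depth: 2,  // NOT here, per the API implementation note above
  },
};
```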