Commit Graph

68 Commits

Author SHA1 Message Date
sua yoo
83c9203a11
Initial QA Review UI! (#1624)
QA Details page:
- Enables QA tab with ability to start automated analysis QA Run + view a and manual review status
- Pages listed with review status + overall crawl review status shown on QA details (relates to #1508)
- Initial placeholder for QA run analytics (part of #1589)
- Addresses a good deal of #1477

Automated Analysis QA in Review Mode:
- Ability to select from multiple analysis QA runs / view QA runs in QA details
- Shows analysis screenshot, text and resources compare and replay tabs (fixes #1496)
- Sorting by worst screenshot / worst text score for each QA run
- Includes pages sidebar with screenshot/text/resource compare results (fixes #1497)

Manual Review QA in Review Mode:
- Per-page replay available as separate tab (fixes #1499)
- Supports thumbs up, thumbs down, notes for each page
- Supports entering review status approval (good/acceptable/bad can be entered when finishing review

---------
Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2024-04-04 15:09:52 -07:00
Emma Segal-Grossman
b1e2f1b325
Add ESLint rules for import ordering (#1608)
Follow-up from
https://github.com/webrecorder/browsertrix-cloud/pull/1546#discussion_r1529001599
(cc @SuaYoo)

- Adds `eslint-plugin-import-x` and
`@ianvs/prettier-plugin-sort-imports` and configures rules for them both
so imports get sorted on format & on lint.
- Runs both on everything!
2024-03-18 21:50:02 -04:00
sua yoo
9f312c075e
Manually approve pages in QA review (#1576)
- Automatically update view to first page if page ID isn't specified
- Show current page URL in location bar (resolves
https://github.com/webrecorder/browsertrix-cloud/issues/1495)
- Approve, reject, or leave notes on a page
- Display temporary list of links to pages in the sidebar
2024-03-12 10:08:51 -07:00
Emma Segal-Grossman
8462c08206
Fix a couple linting issues (#1565) 2024-03-11 16:20:37 -07:00
Emma Segal-Grossman
780dd09321
Create ArchivedItemPage and ArchivedItemPageComment types (#1567)
Based on #1534

Figured this should be in place so we can work on other front-end things
with these, rather than dealing with refactoring later

<!-- Fixes #issue_number -->

### Changes

- Adds `ArchivedItemPage` and `ArchivedItemPageComment` types from #1534
(thank you @SuaYoo!)
- Adds typedefs for match and resource count properties
- sets properties optional in the db schema to optional in the type as
well

### Manual testing

1.

### Screenshots

| Page | Image/video |
| ---- | ----------- |
|      |             |

<!-- ### Follow-ups -->
2024-03-04 18:52:09 -05:00
Emma Segal-Grossman
3ed10ad893
Add comments I meant to add in #1528 (#1529)
Tessa merged #1528 before I got to add these comments lol (it's my
fault, should have left the PR as draft until I was actually ready)
2024-02-12 19:33:16 -05:00
Emma Segal-Grossman
d88a6eb07f
Include leading zero in months when accessing usage and quota data (#1528)
Closes #1527 

Improves front-end types & ensures the data being accessed matches the
data sent by the back-end.

Tested by hand by using the returned data from the `/orgs/${orgId}`
endpoint in prod where this is happening in dev
2024-02-12 19:27:42 -05:00
Emma Segal-Grossman
d1156b0145
enable a few more useful eslint suggestions & correct some more types (#1517)
## Changes

Implements suggestions from
https://typescript-eslint.io/blog/consistent-type-imports-and-exports-why-and-how/
and
https://www.totaltypescript.com/method-shorthand-syntax-considered-harmful,
along with a couple more auto-fixable consistency rules.

Of note:
- Functions that return a promise are marked as async
- Suggestions now appear for where to simplify boolean checks,
non-nullish assertions, and optional chaining
2024-02-09 16:14:08 -08:00
Emma Segal-Grossman
3968928ac2
ESLint improvements & Typescript upgrade (#1501)
## Overview

Adds a bunch of ESLint rules, mostly from `typescript-eslint`, and fixes
the issues turning on these rules raises.

Also updates Typescript & typescript-eslint.

## Rationale

Most of these new rules are auto-fixable, so I've tackled a bunch of the
little fixes that do need manual intervention now with the intention
that this shouldn't add much of any additional friction in future
development work, and also give us a good bump in overall code quality.
A lot of the rules here are also great for catching potential bugs!

## Changes

- Adds `void` to most un-awaited and unhandled promises (i.e. places
where async functions are called but nothing is done with the promise)
- Converts properties that are only ever read to `readonly`
- Adds a new `isApiError` function that informs Typescript of when an
error is an `APIError`
- Adds types to a bunch of places that were previously untyped
- Changes instances of `Map<string, any>` in lit property update methods
to `PropertyValues<this>`, or sometimes `PropertyValues<this> &
Map<string, unknown>` where private or protected members are used
(`keyof` doesn't include private and protected members, unfortunately)
  - Adds types to a bunch of custom events
- Cleans up a regex by removing unnecessary escape characters
- Makes a number of implied type conversions explicit (by wrapping with
`Boolean(...)` or calling `.toString()`)
- More consistently applies type coercions when necessary, and removes
them when unnecessary
- Converts a couple const strings to an enum
- Removes the need to type debounced functions as `any` by doing type
coercions to the underlying function type at where the method is bound
to the event in the `html` block
2024-01-31 14:42:06 -05:00
sua yoo
1f55edbe68
Update collection archived item lists (#1457)
New features & enhancements:
- New UI for collection item selection dialog
- Consistent data table styles for collection list and collection item
list

Refactors:
- Adds `btrix-table` as low-level table component
- Adds `btrix-archived-item-list`, removes `checkbox-list` and
deprecates `crawl-list`
- Upgrades Shoelace for `sl-tree` fixes
- Fixes `ArchivedItem` typing
2024-01-22 17:14:53 -08:00
Tessa Walsh
07fa46d9aa
Add custom user agent to workflows (#1465)
Fixes #1341

Adds "User Agent" field to workflow editor under the Browser Settings
tab. If not set, the crawler will use the browser's default user agent.

Also added to docs and to the workflow details page (if set).

---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
2024-01-17 17:33:50 -05:00
Tessa Walsh
032859f361
Support multiple crawler versions (#1420)
Fixes #1385 

## Changes
Supports multiple crawler 'channels' which can be configured to
different browsertrix-crawler versions
- Replaces `crawler_image` in helm chart with `crawler_channels` array
similar to how storages are handled
- The `default` crawler channel must always be provided and specifies
the default crawler image
- Adds backend `/orgs/{oid}/crawlconfigs/crawler-channels` API endpoint
to fetch information about available crawler versions (name, image, and
label) and test
- Adds crawler channel select to workflow creation/edit screens and
profile creation dialog, and updates related API endpoints and
configmaps accordingly. The select dropdown is shown only if more than
one channel is configured.
- Adds `crawlerChannel` to workflow and crawl details.
- Add `image` to crawler image, used to display actual image used as
part of the crawl.
- Modifies `crawler_crawl_id` backend test fixture to use `test` crawler
version to ensure crawler versions other than latest work
- Adds migration to add `crawlerChannel` set to `default` to existing
workflow and profile objects and workflow configmaps

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2024-01-16 15:32:12 -08:00
sua yoo
2bb21c615d
Improve frontend event system (#1450)
- Adds notify, navigate, and log in events to global event map, handle
in `btrix-app`
- Adds console debugs, which are stripped in prod
- Replaces TODO redundant `navTo`s with controller implementation
- Refactors rest of `LitElement` helpers into arrow functions
2023-12-13 14:11:15 -08:00
Tessa Walsh
be41c48c27
Add extra and gifted execution minutes (#1361)
Fixes #1358 

- Adds `extraExecMinutes` and `giftedExecMinutes` org quotas, which are
not reset monthly but are updateable amounts that carry across months
- Adds `quotaUpdate` field to `Organization` to track when quotas were
updated with timestamp
- Adds `extraExecMinutesAvailable` and `giftedExecMinutesAvailable`
fields to `Organization` to help with tracking available time left
(includes tested migration to initialize these to 0)
- Modifies org backend to track time across multiple categories, using
monthlyExecSeconds, then giftedExecSeconds, then extraExecSeconds.
All time is also written into crawlExecSeconds, which is now the monthly
total and also contains any overage time above the quotas
- Updates Dashboard crawling meter to include all types of execution
time if `extraExecMinutes` and/or `giftedExecMinutes` are set above 0
- Updates Dashboard Usage History table to include all types of
execution time (only displaying columns that have data)
- Adds backend nightly test to check handling of quotas and execution
time
- Includes migration to add new fields and copy crawlExecSeconds to
monthlyExecSeconds for previous months

Co-authored-by: emma <hi@emma.cafe>
2023-12-07 14:34:37 -05:00
Emma Segal-Grossman
b15c5ccddd
ESLint & Typescript fixes (#1407)
Closes #1405

- Properly uses `typescript-eslint`: we were missing the preset from it,
so some of the default `eslint` rules (that don't properly work with
typescript) were being applied and causing false positives
- I also moved the `eslint` config into its own file, and enabled
`typescript-eslint`'s type-awareness, so that we can enable more
type-aware rules in the future if we like
- Adds `ts-lit-plugin` to the typescript config, which _hopefully_ will
allow us to catch issues during build (in CI)
- It looks like `ts-lit-plugin` is sort of abandonware at the moment,
and unfortunately _doesn't_ actually work for this purpose right now,
but the lit team is working on a replacement here:
https://www.npmjs.com/package/@lit-labs/analyzer
- Adds `fork-ts-checker-webpack-plugin`, which allows the typescript
checking process to be run on a separate forked thread in Webpack, which
can help speed up builds & checking
- Enables incremental type checking for better speed
- Fixes a whole bunch of `eslint`-auto-fixable issues (unused imports
and variables, some type issues, etc)
- Fixes a bunch of `lit-analyzer` issues (mostly attribute naming, some
type issues as well)
- Fixes various other type issues:
- Improves type safety in a bunch of places, notably anywhere `apiFetch`
and `APIPaginatedList` are used
  - Removes some `any`s
2023-11-24 12:32:53 -05:00
Ilya Kreymer
dfba4b3940
Replace partial_complete -> stopped_by_user or stopped_quota_reached + operator edge cases (#1368)
- Adds two new crawl finished state, stopped_by_user and
stopped_quota_reached
- Tracking other possible 'stop reasons' in operator, though not making
them distinct states for now.
- Updated frontend with 'Stopped by User' and 'Stopped: Time Quota
Reached', shown with same icon as current partial_complete
- Added migration of partial_complete to either stopped_by_user or
complete (no historical quota data available)
- Addresses edge case in scaling: if crawl never scaled (no redis entry,
no pod), automatically scale down
- Edge case in status: if crawl is somehow 'canceled' but not deleted,
immediately delete crawl object and begin finalizing.

---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2023-11-14 11:17:16 -08:00
Tessa Walsh
ea5650f173
Add checkmark next to replicated/backed up files (#1343)
Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
2023-11-08 11:21:31 -05:00
Tessa Walsh
38f32f11ea
Enforce quota and hard cap for monthly execution minutes (#1284)
Fixes #1261 Closes #1092

The quota for monthly execution minutes is treated as a hard cap. Once
it is exceeded, an alert indicating that an org has exceeded its monthly
execution minutes will display and the user will be unable to start new
crawls. Any running crawls will be stopped once the quota is exceeded.

An execution minutes meter bar is also added in the Org Dashboard and
displayed if a quota is set. More detail in #1305 which was
merged into this branch.

## Changes

- Enable setting 'maxExecMinutesPerMonth' in orgs list quotas by superadmin
- Enforce quota by stopping crawls in operator once quota is reached
- Show alert banner once execution time quota is hit:
- Once quota is hit, disable Run Crawl buttons in frontend, return 403
message with `exec_minutes_quota_reached` detail in backend from
crawl config `/run` endpoint, and don't run new workflows on creation
(similar to storage quota)
- Display execution time for crawls in the crawl details overview,
immediately below
- Show execution minutes meter on dashboard (from #1305)

---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: sua yoo <sua@webrecorder.org>
2023-10-26 15:38:51 -07:00
sua yoo
2e5952a444
Display crawl time usage history table (#1304)
Partially resolves #1223, fixes #1298

- Adds crawl usage table in dashboard under metrics
- Shows skeleton loading indicator when metrics are loading (@Shrinks99
feel free to adjust how this looks)
- Shows max number of concurrent crawls running if any are running ("`running` / `max` Crawls Running")
2023-10-23 16:25:16 -07:00
sua yoo
8466caf1d9
Allow org admins to update slug (#1276)
- Allows editing of org slugs (actual URL updates will be handled in
https://github.com/webrecorder/browsertrix-cloud/issues/1258.)
- Converts user input to slug using slugify
- Adds help text to org name and slug
- Renames tab from "information" to "general" settings
2023-10-13 17:00:43 -07:00
Tessa Walsh
b1ead614ee
Add --failOnFailedSeed checkbox to URL list workflows (#1236)
- If set, and any of the seeds fails, the entire crawl is marked as a failure.
- Add checkbox which adds --failOnFailedSeed checkbox to URL list workflows
- Add 'Fail Crawl On Failed URL' to crawl workflow setup docs
2023-10-03 18:46:09 -07:00
sua yoo
941a75ef12
Separate seeds into a new endpoints (#1217)
- Remove config.seeds from workflow and crawl detail endpoints
- Add new paginated GET /crawls/{crawl_id}/seeds and /crawlconfigs/{cid}/seeds endpoints to retrieve seeds for a crawl or workflow
- Include firstSeed in GET /crawlconfigs/{cid} endpoint (was missing before)
- Modify frontend to fetch seeds from new /seeds endpoints with loading indicator

---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2023-10-02 10:56:12 -07:00
Tessa Walsh
9224f52f51
Remove config from list endpoints to speed up responses (#1193)
* Remove config from list endpoints

- Remove config field from workflow and crawl list endpoints
- Add seedCount to CrawlConfigOut on backend and Workflow on frontend
- Refactor CrawlConfig and CrawlConfigOut to extend CrawlConfigCore + CrawlConfigAdditional
- Refactor workflow list in frontend to use firstSeed and seedCount
- Frontend uses ListWorkflow type which is Omit<Workflow, "config">
2023-09-19 11:05:48 -05:00
Ilya Kreymer
c9c39d47b7
Scheduled Crawl Refactor: Handle via Operator + Add Skipped Crawls on Quota Reached (#1162)
* use metacontroller's decoratorcontroller to create CrawlJob from Job
* scheduled job work:
- use existing job name for scheduled crawljob
- use suspended job, set startTime, completionTime and succeeded status on job when crawljob is done
- simplify cronjob template: remove job_image, cron_namespace, using same namespace as crawls,
placeholder job image for cronjobs

* move storage quota check to crawljob handler:
- add 'skipped_quota_reached' as new failed status type
- check for storage quota before checking if crawljob can be started, fail if not (check before any pods/pvcs created)

* frontend:
- show all crawls in crawl workflow, no need to filter by status
- add 'skipped_quota_reached' status, show as 'Skipped (Quota Reached)', render same as failed

* migration: make release namespace available as DEFAULT_NAMESPACE, delete old cronjobs in DEFAULT_NAMESPACE and recreate in crawlers namespace with new template
2023-09-12 13:05:43 -07:00
Tessa Walsh
d2ededc895
Add and enforce org storage quota (#1106)
* Implement in backend

- Track bytesStored in org
- Add migration to pre-calculate based on size of crawlfiles and profilefiles
- Add methods to increase or decrease org storage when crawl or profile files
are added or deleted
- Include storageQuotaReached boolean in API responses that alter storage
- Don't start new crawls and fail uploads if storage quota reached

* Implement in frontend

- Add to orgs-list quotas
- Update org's storageQuotaReached based on backend endpoint responses
- Disable buttons when storage quota is met
- Show toast notification when attempting to run a crawl when org
storage quota is met
2023-09-07 12:45:43 -04:00
sua yoo
ff6650d481
Manage collection from archived item details (#1085)
- Lists collections that an archived item belongs to in item detail view
- Improves performance of collection add component
---------

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2023-09-05 17:52:17 -04:00
Tessa Walsh
e667fe2e97
Add max crawl size option to backend and frontend (#1045)
Backend:
- add 'maxCrawlSize' to models and crawljob spec
- add 'MAX_CRAWL_SIZE' to configmap
- add maxCrawlSize to new crawlconfig + update APIs
- operator: gracefully stop crawl if current size (from stats) exceeds maxCrawlSize
- tests: add max crawl size tests

Frontend:
- Add Max Crawl Size text box Limits tab
- Users enter max crawl size in GB, convert to bytes
- Add BYTES_PER_GB as constant for converting to bytes
- docs: Crawl Size Limit to user guide workflow setup section

Operator Refactor:
- use 'status.stopping' instead of 'crawl.stopping' to indicate crawl is being stopped, as changing later has no effect in operator
- add is_crawl_stopping() to return if crawl is being stopped, based on crawl.stopping or size or time limit being reached
- crawlerjob status: store byte size under 'size', human readable size under 'sizeHuman' for clarity
- size stat always exists so remove unneeded conditional (defaults to 0)
- store raw byte size in 'size', human readable size in 'sizeHuman'

Charts:
- subchart: update crawlerjob crd in btrix-crds to show status.stopping instead of spec.stopping
- subchart: show 'sizeHuman' property instead of 'size'
- bump subchart version to 0.1.1

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2023-08-26 22:00:37 -07:00
Anish Lakhwara
8b16124675
feat: implement 'collections' array with {name, id} for archived item details (#1098)
- rename 'collections' -> 'collectionIds', adding migration 0014
- only populate 'collections' array with {name, id} pair for get_crawl() / single archived item
path, but not for aggregate/list methods
- remove Crawl.get_crawl(), redundant with BaseCrawl.get_crawl() version
- ensure _files_to_resources returns an empty [] instead of none if empty (matching BaseCrawl.get_crawl() behavior to Crawl.get_crawl())
- tests: update tests to use collectionIds for id list, add 'collections' for {name, id} test
- frontend: change Crawl object to have collectionIds instead of collections

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2023-08-25 00:26:46 -07:00
sua yoo
54cf4f23e4
Paginate Workflows and refactor to use server-side queries (#1078)
- Paginates Crawl Workflows when there are more than 10 workflows
- Refactors workflow search and crawl search to use the same component
- Adds sort by first seed, workflow creation date, and workflow modified date
- Separates "last run" date from "modified" date
- Update column layout into Name & Schedule (or Manual Ru'ri=), Latest Crawl (<finish time> in <duration>), total size, and last modified (modified by and modified time)
2023-08-22 16:29:17 -07:00
Ilya Kreymer
362afa47bd
Support for Public / Shareable Collections (#1038)
* collections: support toggling collections public/private, viewable via RWP
- backend: add 'public' to collection model, support patching to update
- backend: add .../collections/<id>/public/replay.json for public access
- backend: add CORS handling for public endpoint
- frontend: support 'make shareable / make private' dropdown actions on collection detail + collection list views
- frontend: show shareable / private icons by collection name on detail + list views
- frontend: link to replayweb.page for standalone browsing
- frontend: add embed code popup when a collection is shareable
- refer to public collections as 'shareable' for now

---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2023-08-03 19:11:01 -07:00
sua yoo
cc52dfd940
Sort Collections by size (#1026)
- Adds "Size" column to Collections list view
- Adds "Size" option to sort dropdown
2023-08-01 09:47:47 -07:00
Ilya Kreymer
06cf9c7cc3
add crawl ending states: 'generate-wacz', 'uploading-wacz', 'pending-wait' that occur after a crawl is finished or is being stopped (#1022)
operator: ensure transitions from each of these states is supported, including to 'waiting_capacity'
add extra check on stopping to avoid transitioning back to a running state after crawl is finished
ui: add states to UI display, localization, add as active states
fixes #263
2023-08-01 00:15:59 -07:00
Ilya Kreymer
6506965d98
Streaming Download for Collections (#1012)
* support streaming download of collections (part of #927)
- WACZ zip created on the fly using stream-zip
- add 'Download Collection' option to collection detail and list
- after editing collection, return to collection view
- tests: add test for streaming download, ensure WACZ files + datapackage present, STORE compression used

---------

Co-authored-by: sua yoo <sua@suayoo.com>
2023-07-26 15:42:17 -07:00
Tessa Walsh
c21153255a
Rename notes to description in frontend and backend (#1011)
- Rename crawl notes to description
- Add migration renaming notes -> description
- Stop inheriting workflow description in crawl
- Update frontend to replace crawl/upload notes with description
- Remove setting of config description from crawl list
- Adjust tests for changes
2023-07-26 13:00:04 -07:00
Tessa Walsh
d5c3a8519f
Add crawler Use Sitemap option to Browsertrix Cloud (#978)
* Add user-guide docs for Use Sitemap option
---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2023-07-19 13:57:52 -04:00
sua yoo
f3660839bf
Allow users to add uploads to collections (#968)
* show uploads in 'Select Uploads' section
2023-07-09 22:21:50 -07:00
sua yoo
de4b18aa67
List crawls, uploads, and all objects in UI (#941)
- Adds top-level "Archived Data" view, replacing "Finished Crawls" and moving it as "Crawls" into view
- Adds list for viewing all artifacts/data
- Adds list for viewing all uploaded crawls
- Updates crawl detail view to show upload details
- Edit upload metadata, including 'name'
- Delete uploads
---------

Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2023-07-07 13:20:28 -07:00
Tessa Walsh
bd6dc79449
Add frontend support for auto-adding collections to workflows (#916)
- Adds collections search and list to workflow editor
- Adds collections to workflow details component
- Adds namePrefix filter to backend GET /orgs/{oid}/collections endpoint to support case-insensitive searching of collections
- Adds documentation for new setting

---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2023-06-12 18:18:05 -07:00
sua yoo
66b3befef9
Frontend collections beta UI (#886)
- Support for creating new collections and editing existing collections
- Can select crawling workflows which adds entire workflow, and then deselect individual crawls
- Can edit existing collections and add more crawls
- Can view, create and delete collections via new Collections top-level nav entry
2023-06-06 17:52:01 -07:00
Ilya Kreymer
00fb8ac048
Concurrent Crawl Limit (#874)
concurrent crawl limits: (addresses #866)
- support limits on concurrent crawls that can be run within a single org
- change 'waiting' state to 'waiting_org_limit' for concurrent crawl limit and 'waiting_capacity' for capacity-based
limits

orgs:
- add 'maxConcurrentCrawl' to new 'quotas' object on orgs
- add /quotas endpoint for updating quotas object

operator:
- add all crawljobs as related, appear to be returned in creation order
- operator: if concurrent crawl limit set, ensures current job is in the first N set of crawljobs (as provided via 'related' list of crawljob objects) before it can proceed to 'starting', otherwise set to 'waiting_org_limit'
- api: add org /quotas endpoint for configuring quotas
- remove 'new' state, always start with 'starting'
- crawljob: add 'oid' to crawljob spec and label for easier querying
- more stringent state transitions: add allowed_from to set_state()
- ensure state transitions only happened from allowed states, while failed/canceled can happen from any state
- ensure finished and state synched from db if transition not allowed
- add crawl indices by oid and cid

frontend: 
- show different waiting states on frontend: 'Waiting (Crawl Limit) and 'Waiting (At Capacity)'
- add gear icon on orgs admin page
- and initial popup for setting org quotas, showing all properties from org 'quotas' object

tests:
- add concurrent crawl limit nightly tests
- fix state waiting -> waiting_capacity
- ci: add logging of operator output on test failure
2023-05-30 15:38:03 -07:00
Tessa Walsh
bd8b306fbd
Improve sorting workflows by lastUpdated (#826)
* Precompute config crawl stats

Includes a database migration to move preciously dynamically computed
crawl stats for workflows into the CrawlConfig model.

* Add lastRun sorting option and enable it by default

* Add modified as final sort key to order non-run workflows

* Remove currCrawl* fields and update frontend accordingly

* Add isCrawlRunning field to backend and use in frontend
2023-05-22 18:42:30 -04:00
Ilya Kreymer
82b21b6813
frontend crawl stopping improvements (#836) (#838)
* frontend crawl stopping improvements (#836)
- support new backend 'stopping' property
- for now, keep 'stopping' indicator state when crawl is running but stopping set to true
2023-05-08 23:52:49 -07:00
Ilya Kreymer
2cae065c46
Add Waiting state on the backend and frontend (#839)
* operator: add waiting state
- add pods as related objects
- inspect pod status, set crawl status to 'waiting' if no pods are running

frontend:
- frontend support for 'waiting' state
- show waiting icon from mocks

---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2023-05-08 17:05:01 -07:00
sua yoo
9fcbc3f87e
Allow users to set max depth/hop out within scope (#816)
- Adds an input to the Workflow creation and edit form for specifying crawl depth. This input is conditionally shown for seeded crawls when the scope is set to "Pages on this domain", "Pages on this domain & subdomains" or "Custom page prefix". The "any" scope is also supported for backwards compatibility but is not shown by default or in new configs.
- API implementation: The depth value is set in the primary seed config, i.e. the first seed in seeds: [], not in the outer .config.depth property.
2023-05-05 14:26:48 -07:00
Henry Wilkinson
7978cb4d85
Crawl detail page update (#808)
- Removes the info bar rendering and moves relevant information to the Overview section
- Adds total crawl size to the overview section
2023-05-03 15:50:15 -04:00
sua yoo
7888c4fde3
Frontend crawl workflows rework (#775) 2023-04-25 14:16:07 -07:00
sua yoo
80bc4a3eb9
Fix additional URLs (#752) 2023-04-05 20:11:09 -07:00
sua yoo
91c2c1ad62
Allow users to set additional page time limits (#744) 2023-04-05 20:06:46 -07:00
sua yoo
c60dc5d086
Crawls list backend pagination (#735) 2023-04-05 10:55:42 -07:00
Tessa Walsh
4724754efc
Filter and sort crawl and workflow list API endpoints in backend (#724)
* Re-implement pagination and paginate crawlconfig revs

First step toward simplifying pagination to set us up for sorting
and filtering of list endpoints. This commit removes fastapi-pagination
as a dependency.

* Migrate all HttpUrl seeds to Seeds

This commit also updates the frontend to always use Seeds and to
fix display issues resulting from the change.

* Filter and sort crawls and workflows

Crawls:
- Filter by createdBy (via userid param)
- Filter by state (comma-separated string for multiple values)
- Filter by first_seed, name, description
- Sort by started, finished, fileSize, firstSeed
- Sort descending by default to match frontend

Workflows:
- Filter by createdBy (formerly userid) and modifiedBy
- Filter by first_seed, name, description
- Sort by created, modified, firstSeed, lastCrawlTime

* Add crawlconfigs search-values API endpoint and test
2023-03-28 17:55:40 -04:00