browsertrix

Author	SHA1	Message	Date
Ilya Kreymer	8a507f0473	Consolidate list page endpoints + better QA sorting + optimize pages fix (#2417 ) - consolidate list_pages() and list_replay_query_pages() into list_pages() - to keep backwards compatibility, add <crawl>/pagesSearch that does not include page totals, keep <crawl>/pages with page total (slower) - qa frontend: add default 'Crawl Order' sort order, to better show pages in QA view - bgjob: account for parallelism in bgjobs, add logging if succeeded mismatches parallelism - QA sorting: default to 'crawl order' by default to get better results. - Optimize pages job: also cover crawls that may not have any pages but have pages listed in done stats - Bgjobs: give custom op jobs more memory	2025-02-21 13:47:20 -08:00
Tessa Walsh	f8fb2d2c8d	Rework crawl page migration + MongoDB Query Optimizations (#2412 ) Fixes #2406 Converts migration 0042 to launch a background job (parallelized across several pods) to migrate all crawls by optimizing their pages and setting `version: 2` on the crawl when complete. Also Optimizes MongoDB queries for better performance. Migration Improvements: - Add `isMigrating` and `version` fields to `BaseCrawl` - Add new background job type to use in migration with accompanying `migration_job.yaml` template that allows for parallelization - Add new API endpoint to launch this crawl migration job, and ensure that we have list and retry endpoints for superusers that work with background jobs that aren't tied to a specific org - Rework background job models and methods now that not all background jobs are tied to a single org - Ensure new crawls and uploads have `version` set to `2` - Modify crawl and collection replay.json endpoints to only include fields for replay optimization (`initialPages`, `pageQueryUrl`, `preloadResources`) if all relevant crawls/uploads have `version` set to `2` - Remove `distinct` calls from migration pathways - Consolidate collection recompute stats Query Optimizations: - Remove all uses of $group and $facet - Optimize /replay.json endpoints to precompute preload_resources, avoid fetching crawl list twice - Optimize /collections endpoint by not fetching resources - Rename /urls -> /pageUrlCounts and avoid $group, instead sort with index, either by seed + ts or by url to get top matches. - Use $gte instead of $regex to get prefix matches on URL - Use $text instead of $regex to get text search on title - Remove total from /pages and /pageUrlCounts queries by not using $facet - frontend: only call /pageUrlCounts when dialog is opened. --------- Co-authored-by: Ilya Kreymer <ikreymer@gmail.com> Co-authored-by: Emma Segal-Grossman <hi@emma.cafe> Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>	2025-02-20 15:26:11 -08:00
Ilya Kreymer	7b2932c582	Add initial pages + pagesQuery endpoint to /replay.json APIs (#2380 ) Fixes #2360 - Adds `initialPages` to /replay.json response for collections, returning up-to 25 pages (seed pages first, then sorted by capture time). - Adds `pagesQueryUrl` to /replay.json - Adds a public pages search endpoint to support public collections. - Adds `preloadResources`, including list of WACZ files that should always be loaded, to /replay.json --------- Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2025-02-13 16:53:47 -08:00
sua yoo	f7b9b73a68	fix: Sort filtered collection page URLs (#2384 ) Fixes https://github.com/webrecorder/browsertrix/issues/2383 - Fixes unpredictable sort order when typing in collection page URL - Fixes page URL results flickering in and out while typing --------- Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2025-02-12 11:59:20 -05:00
Emma Segal-Grossman	f8a44258d8	Merge pull request #2332 from webrecorder/frontend-collection-editing-dialog Collection editing and sharing revamp	2025-02-11 18:27:35 -05:00
Tessa Walsh	0a8df62ab4	Ensure collection stats are updated when WACZ is added on upload (#2351 ) Fixes #2350 Collection earliest/latest dates and the collection modified date are also now updated when crawls or uploads are added to a collection via the collection auto-add feature.	2025-01-30 13:05:56 -08:00
Tessa Walsh	763c654484	feat: Update collection sorting, metadata, stats (#2327 ) - Refactors dashboard and org profile preview to use private API endpoint, to fix public collections not showing when the org visibility is hidden - Adds additional sorting options for collections - Adds unique page url counts for archived items, collections, and organizations to backend and exposes this in collections - Shows collection period (i.e. `dateEarliest` to `dateLatest`) in collections list - Shows same collection metadata in private and public views, updates private view info bar - Fixes "Update Org Profile" action item showing for crawler roles --------- Co-authored-by: sua yoo <sua@webrecorder.org> Co-authored-by: sua yoo <sua@suayoo.com> Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>	2025-01-23 13:32:23 -05:00
Tessa Walsh	4583babecb	feat: Add slug to collections and use it in public collection URLs (#2301 ) Resolves https://github.com/webrecorder/browsertrix/issues/2298 ## Changes - Slugs added to collections, can be specified separately when creating or updating collections or else is based off of supplied collection name - Migration added to backfill slugs for existing collections - Redirect collection to newest slug if changed - Adds option to copy public profile link to "Public Collections" action menu - Show "Back to <Org>" link instead of breadcrumbs --------- Co-authored-by: sua yoo <sua@suayoo.com> Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>	2025-01-15 22:44:32 -08:00
sua yoo	4347fcdba5	feat: Show collection created date (#2302 ) - Shows collection created date in detail view (if present) - Adds `black` formatter to vscode extension recommendations --------- Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2025-01-14 11:22:00 -05:00
Tessa Walsh	d8655d3bc6	Use id for thumbnail size error detail	2025-01-13 15:15:49 -08:00
Tessa Walsh	be9ff04ee8	Make more explicit error message for large thumbnails	2025-01-13 15:15:49 -08:00
Tessa Walsh	a031fab313	Backend work for public collections (#2198 ) Fixes #2182 This rather large PR adds the rest of what should be needed for public collections work in the frontend. New API endpoints include: - Public collections endpoints: GET, streaming download - Paginated list of URLs in collection with snapshot (page) info for each - Collection endpoint to set home URL - Collection endpoint to upload thumbnail as stream - DELETE endpoint to remove collection thumbnail Changes to existing API endpoints include: - Paginating public collection list results - Several `pages` endpoints that previously only supported `/crawls/` in their path, e.g. `/orgs/{oid}/crawls/all/pages/reAdd`, now support `/uploads/` and `/all-crawls/` namespaces as well. This is necessitated by adding pages for uploads to the database (see below). For `/orgs/{oid}/namespace/all/pages/reAdd`, `crawls` or `uploads` will serve as a filter to only affect crawls of that given type. Other endpoints are more liberal at this point, and will perform the same action regardless of the namespace used in the route (we'll likely want to change this in a follow-up to be more consistent). - `/orgs/{oid}/namespace/all/pages/reAdd` now kicks off a background job rather than doing all of the computation in an asyncio task in the backend container. The background job additionally updates collection date ranges, page/size counts, and tags for each collection in the org after pages have been (re)added. Other big changes: - New uploads will now have their pages read into the database! Collection page counts now also include uploads - A migration was added to start a background job for each org that will add the pages for previously-uploaded WACZ files to the database and update collections accordingly - Adds a new `ImageFile` subclass of `BaseFile` for thumbnails that we can use for other user-uploaded image files moving forward, with separate output models for authenticated and public endpoints	2025-01-13 15:15:48 -08:00
Tessa Walsh	190bdeb868	Add public API endpoint for public collections (#2174 ) Fixes #1051 If org with provided slug doesn't exist or no public collections exist for that org, return same 404 response with a detail of "public_profile_not_found" to prevent people from using public endpoint to determine whether an org exists. Endpoint is `GET /api/public-collections/<org-slug>` (no auth needed) to avoid collisions with existing org and collection endpoints.	2025-01-13 15:15:48 -08:00
Tessa Walsh	42ebfd303d	Make changes to collections to support publicly listed collections (#2164 ) Fixes #2158 - Adds `Organization.listPublicCollections` field and API endpoint to update it - Replaces `Collection.isPublic` boolean with `Collection.access` (values: `private`, `unlisted`, `public`) and add database migration - Update frontend to use `Collection.access` instead of `isPublic`, otherwise not changing current behavior --------- Co-authored-by: sua yoo <sua@suayoo.com>	2025-01-13 15:15:47 -08:00
Ilya Kreymer	104ea097c4	switch to simpler streaming download + multiwacz metadata improvements: (#1982 ) - download via presigned URLs via requests instead of boto APIs, remove boto - follow-up to #1933 for streaming download improvements - fixes datapackage.json in multi-wacz to contain the same resources objects with: `name`, `path`, `hash`, `bytes` to match single WACZ. - Add additional metadata to multi-wacz datapackage.json, including `type` (`crawl`, `upload`, `collection`, `qaRun`), `id` (unique id for the object), `title` / `description` if available (for crawl/upload/collection), and `crawlId` for `qaRun`	2024-10-03 16:13:31 -07:00
Tessa Walsh	d41647e6c2	Document all API endpoints with response models (#1928 ) Fixes #1920 Adds response models to all API endpoints that were missing them, documenting current behavior without making any changes at this stage to standardize responses. Follow-up work will involve adding generics to some of the response models	2024-07-16 12:48:38 -07:00
Tessa Walsh	aaf18e70a0	Add created date to Organization and fix datetimes across backend (#1921 ) Fixes #1916 - Add `created` field to Organization and OrgOut, set on org creation - Add migration to backfill `created` dates from first workflow `created` - Replace `datetime.now()` and `datetime.utcnow()` across app with consistent timezone-aware `utils.dt_now` helper function, which now uses `datetime.now(timezone.utc)`. This is in part to ensure consistency in how we handle datetimes, and also to get ahead of timezone naive datetime creation methods like `datetime.utcnow()` being deprecated in Python 3.12. For more, see: https://blog.miguelgrinberg.com/post/it-s-time-for-a-change-datetime-utcnow-is-now-deprecated	2024-07-15 19:46:32 -07:00
Ilya Kreymer	4f676e4e82	QA Runs Initial Backend Implementation (#1586 ) Supports running QA Runs via the QA API! Builds on top of the `issue-1498-crawl-qa-backend-support` branch, fixes #1498 Also requires the latest Browsertrix Crawler 1.1.0+ (from webrecorder/browsertrix-crawler#469 branch) Notable changes: - QARun objects contain info about QA runs, which are crawls performed on data loaded from existing crawls. - Various crawl db operations can be performed on either the crawl or `qa.` object, and core crawl fields have been moved to CoreCrawlable. - While running,`QARun` data stored in a single `qa` object, while finished qa runs are added to `qaFinished` dictionary on the Crawl. The QA list API returns data from the finished list, sorted by most recent first. - Includes additional type fixes / type safety, especially around BaseCrawl / Crawl / UploadedCrawl functionality, also creating specific get_upload(), get_basecrawl(), get_crawl() getters for internal use and get_crawl_out() for API - Support filtering and sorting pages via `qaFilterBy` (screenshotMatch, textMatch) along with `gt`, `lt`, `gte`, `lte` params to return pages based on QA results. --------- Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2024-03-20 22:42:16 -07:00
Tessa Walsh	a898c2b456	Format backend with Black 24 (#1507 ) Fixes #1506	2024-02-07 11:35:34 -08:00
Tessa Walsh	f3cbd9e179	Add crawl, upload, and collection delete webhook event notifications (#1363 ) Fixes #1307 Fixes #1132 Related to #1306 Deleted webhook notifications include the org id and item/collection id. This PR also includes API docs for the new webhooks and extends the existing tests to account for the new webhooks. This PR also does some additional cleanup for existing webhooks: - Remove `downloadUrls` from item finished webhook bodies - Rename collection webhook body `downloadUrls` to `downloadUrl`, since we only ever have one per collection - Fix API docs for existing webhooks, one of which had the wrong response body	2023-11-09 18:19:08 -08:00
Ilya Kreymer	6384d8b5f1	Additional Type Hints / Type Fix Pass (#1320 ) This PR adds more type safety to the backend codebase: - All ops classes calls should be type checked - Avoiding circular references with TYPE_CHECKING conditional - Consistent UUID usage: uuid.UUID / UUID4 with just UUID - Crawl states moved to models, made into lists - Additional typing added as needed, fixed a few type related errors - CrawlOps / UploadOps / BaseCrawlOps now all have same param init order to simplify changes	2023-10-30 12:59:24 -04:00
Ilya Kreymer	16e7a1d0a2	Storage Ops Refactor (#1257 ) * storage ops refactor: - create StorageOps class similar to other ops classes - init storages list in StorageOps, no longer require looking up default storages via CrawlManager - convert all storage functions to members, add storageops to operator - remove unused params, ensure crawl exists for rollover restart - add env var to determine if using local minio to use correct endpoint URL * crawls /seeds endpoint: just return empty list if not a crawl (eg. upload) * crawlmanager: remove unused code, rename check_storage -> has_storage	2023-10-10 15:04:23 -07:00
Anish Lakhwara	1bf531e1ec	Fix: Make Collections Public on Creation (#1213 ) - Add isPublic to Add Collection endpoint, send isPublic from frontend - Fixes #1212	2023-09-29 12:08:10 -07:00
Ilya Kreymer	7eac0fdf95	optimization: convert all uses of 'async for' to use iterator directly (#1229 ) - optimization: convert all uses of 'async for' to use iterator directly instead of converting to list to avoid unbounded size lists - additional cursor.to_list() to async for conversions for stats computation, simplify crawlconfigs stats computation --------- Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2023-09-28 12:31:08 -07:00
Ilya Kreymer	feb7ab7652	Improved type checking for backend with mypy (#1174 ) * add mypy type check - run type check on backend fix ambiguous typing issues - add mypy to lint gh action + precommit hook - add mypy.ini	2023-09-13 19:40:26 -07:00
Ilya Kreymer	4b34da033a	Refactor / Cleanup: move ops functions back into classes (#1171 ) * remove almost all standalone functions and move them back into ops member functions * operator now has access to all the ops classes as well * keep two standalone functions used only in migrations --------- Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2023-09-13 11:56:09 -07:00
Tessa Walsh	7cf2b11eb7	Add event webhook tests (#1155 ) * Add success filter to webhook list GET endpoint * Add sorting to webhooks list API and add event filter * Test webhooks via echo server * Set address to echo server on host from CI env var for k3d and microk8s * Add -s back to pytest command for k3d ci * Change pytest test path to avoid hanging on collecting tests * Revert microk8s to only run on push to main	2023-09-12 22:08:40 -07:00
Tessa Walsh	147bfd9d44	Add event webhook notifications system to backend (#1061 ) Initial set of backend API for event webhook notifications for the following events: * Crawl started (including boolean indicating if crawl was scheduled) * Crawl finished * Upload finished * Archived item added to collection * Archived item removed from collection Configuration of URLs is done via /api/orgs/<oid>/event-webhook-urls. If a URL is configured for a given event, a webhook notification is added to the database and then attempted to be sent (up to a total of 5 tries per overall attempt, with an increasing backoff between, implemented via use of the backoff library, which supports async). Webhook status is available via /api/orgs/<oid>/webhooks. (Additional testing + potential fastapi integration left in separate follow-ups.) Fixes #1041	2023-08-31 19:52:37 -07:00
Anish Lakhwara	8b16124675	feat: implement 'collections' array with {name, id} for archived item details (#1098 ) - rename 'collections' -> 'collectionIds', adding migration 0014 - only populate 'collections' array with {name, id} pair for get_crawl() / single archived item path, but not for aggregate/list methods - remove Crawl.get_crawl(), redundant with BaseCrawl.get_crawl() version - ensure _files_to_resources returns an empty [] instead of none if empty (matching BaseCrawl.get_crawl() behavior to Crawl.get_crawl()) - tests: update tests to use collectionIds for id list, add 'collections' for {name, id} test - frontend: change Crawl object to have collectionIds instead of collections --------- Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>	2023-08-25 00:26:46 -07:00
Ilya Kreymer	8d0a4f2ca9	fix public collections endpoint returning 404 when not public (#1052 ) tests: add tests for public collections endpoint when collection is public and when not	2023-08-04 13:29:13 -04:00
Ilya Kreymer	362afa47bd	Support for Public / Shareable Collections (#1038 ) * collections: support toggling collections public/private, viewable via RWP - backend: add 'public' to collection model, support patching to update - backend: add .../collections/<id>/public/replay.json for public access - backend: add CORS handling for public endpoint - frontend: support 'make shareable / make private' dropdown actions on collection detail + collection list views - frontend: show shareable / private icons by collection name on detail + list views - frontend: link to replayweb.page for standalone browsing - frontend: add embed code popup when a collection is shareable - refer to public collections as 'shareable' for now --------- Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>	2023-08-03 19:11:01 -07:00
Ilya Kreymer	6506965d98	Streaming Download for Collections (#1012 ) * support streaming download of collections (part of #927) - WACZ zip created on the fly using stream-zip - add 'Download Collection' option to collection detail and list - after editing collection, return to collection view - tests: add test for streaming download, ensure WACZ files + datapackage present, STORE compression used --------- Co-authored-by: sua yoo <sua@suayoo.com>	2023-07-26 15:42:17 -07:00
Tessa Walsh	fcd48b1831	Add totalSize to collections and make it sortable in list endpoint (#1001 ) * Precompute collection.totalSize and make sortable * Add migration to recompute collection data with totalSize	2023-07-24 13:12:23 -04:00
Tessa Walsh	4014d98243	Move pydantic models to separate module + refactor crawl response endpoints to be consistent (#983 ) * Move all pydantic models to models.py to avoid circular dependencies * Include automated crawl details in all-crawls GET endpoints - ensure /all-crawls endpoint resolves names / firstSeed data same as /crawls endpoint for crawls to ensure consistent frontend display. fields added in get and list all-crawl endpoints for automated crawls only: - cid - name - description - firstSeed - seedCount - profileName * Add automated crawl fields to list all-crawls test * Uncomment mongo readinessProbe * cleanup CrawlOutWithResources: - remove 'files' from output model, only resources should be returned - add _files_to_resources() to simplify computing presigned 'resources' from raw 'files' - update upload tests to be more consistent, 'files' never present, 'errors' always none --------- Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>	2023-07-20 13:05:33 +02:00
Ilya Kreymer	00eb62214d	Uploads API: BaseCrawl refactor + Initial support for /uploads endpoint (#937 ) * basecrawl refactor: make crawls db more generic, supporting different types of 'base crawls': crawls, uploads, manual archives - move shared functionality to basecrawl.py - create a base BaseCrawl object, which contains start / finish time, metadata and files array - create BaseCrawlOps, base class for CrawlOps, which supports base crawl deletion, querying and collection add/remove * uploads api: (part of #929) - new UploadCrawl object which extends BaseCrawl, has name and description - support multipart form data upload to /uploads/formdata - support streaming upload of a single file via /uploads/stream, using botocore multipart upload to upload to s3-endpoint in parts - require 'filename' param to set upload filename for streaming uploads (otherwise use form data names) - sanitize filename, place uploads in /uploads/<uuid>/<sanitized-filename>-<random>.wacz - uploads have internal id 'upload-<uuid>' - create UploadedCrawl object with CrawlFiles pointing to the newly uploaded files, set state to 'complete' - handle upload failures, abort multipart upload - ensure uploads added within org bucket path - return id / added when adding new UploadedCrawl - support listing, deleting, and patch /uploads - support upload details via /replay.json to support replay - add support for 'replaceId=<id>', which would remove all previous files in upload after new upload succeeds. if replaceId doesn't exist, create new upload. (only for stream endpoint so far). - support patching upload metadata: notes, tags and name on uploads (UpdateUpload extends UpdateCrawl and adds 'name') * base crawls api: Add /all-crawls list and delete endpoints for all crawl types (without resources) - support all-crawls/<id>/replay.json with resources - Use ListCrawlOut model for /all-crawls list endpoint - Extend BaseCrawlOut from ListCrawlOut, add type - use 'type: crawl' for crawls and 'type: upload' for uploads - migration: ensure all previous crawl objects / missing type are set to 'type: crawl' - indexes: add db indices on 'type' field and with 'type' field and oid, cid, finished, state * tests: add test for multipart and streaming upload, listing uploads, deleting upload - add sample WACZ for upload testing: 'example.wacz' and 'example-2.wacz' * collections: support adding and remove both crawls and uploads via base crawl - include collection_ids in /all-crawls list - collections replay.json can include both crawls and uploads bump version to 1.6.0-beta.2 --------- Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2023-07-07 09:13:26 -07:00
Tessa Walsh	c7051d5fbf	Backend API consistency pass (#921 ) * Make API add and update method returns consistent - Updates return {"updated": True} - Adds return {"added": True} - Both can additionally have other fields as needed, e.g. id or name - remove Profile response model, as returning added / id only - reformat --------- Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>	2023-06-16 18:52:46 -07:00
Tessa Walsh	bd6dc79449	Add frontend support for auto-adding collections to workflows (#916 ) - Adds collections search and list to workflow editor - Adds collections to workflow details component - Adds namePrefix filter to backend GET /orgs/{oid}/collections endpoint to support case-insensitive searching of collections - Adds documentation for new setting --------- Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>	2023-06-12 18:18:05 -07:00
Tessa Walsh	325355d991	Fix post-crawl collection stats update and add test (#918 ) This fixes #917, where crawls added to a collection via the workflow autoAddCollections were not successfully represented in the crawl and page count stats in the collection after completing.	2023-06-10 19:06:25 -07:00
sua yoo	66b3befef9	Frontend collections beta UI (#886 ) - Support for creating new collections and editing existing collections - Can select crawling workflows which adds entire workflow, and then deselect individual crawls - Can edit existing collections and add more crawls - Can view, create and delete collections via new Collections top-level nav entry	2023-06-06 17:52:01 -07:00
sua yoo	6208ead040	Sort collection by last updated (modified) (#897 ) Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2023-05-30 14:09:10 -04:00
Ilya Kreymer	4d30a64bc9	collection delete: (#896 ) set delete endpoint to use DELETE verb, fix for #869	2023-05-29 18:19:04 -07:00
Tessa Walsh	9c7a312a4c	Rework collections to track collections in Crawl (#878 ) * Track collections in Crawl rather than crawls in Collection * Add delete collection API endpoint and tests * Precompute collection crawlCount, pageCount, and tags and add them to GET collection responses * Add modified field to Collection * Update collection replay.json method * Make add and remove crawls accept list of crawl ids * Auto-add new workflow crawls to collections when they successfully complete via CrawlConfig.autoAddCollections field * Move long-running post-crawl operator tasks into asyncio task * Make CrawlConfig.autoAddCollections updatable via /update API endpoint	2023-05-25 15:41:50 -04:00
Tessa Walsh	5c944d4626	Remove uniqueness constraint on collection descriptions Fix for copy-paste error	2023-05-23 11:03:13 -04:00
Tessa Walsh	60fac2b677	Add collection sorting and filtering (#863 ) * Sort by name and description (ascending by default) * Filter by name * Add endpoint to fetch collection names for search * Add collation so that utf-8 chars sort as expected	2023-05-22 16:53:49 -04:00
Tessa Walsh	f482831d53	Use collection uuid as id (instead of name) (#855 ) Also ensure name is not empty by adding minimum length of 1 Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>	2023-05-19 09:03:48 -04:00
Tessa Walsh	59e49eacd5	Update collections backend API (#759 ) * Re-implement collections, storing crawlIds in collection * Return collections for crawl endpoints and filter on coll name * Remove crawl from all collections when deleted * Revert get_collection_crawls to flat array of resources * Fix tests	2023-04-14 12:17:18 -04:00
Tessa Walsh	e9b61c632d	Add pageSize to pagination format (#736 )	2023-04-03 15:57:47 -04:00
Tessa Walsh	4724754efc	Filter and sort crawl and workflow list API endpoints in backend (#724 ) * Re-implement pagination and paginate crawlconfig revs First step toward simplifying pagination to set us up for sorting and filtering of list endpoints. This commit removes fastapi-pagination as a dependency. * Migrate all HttpUrl seeds to Seeds This commit also updates the frontend to always use Seeds and to fix display issues resulting from the change. * Filter and sort crawls and workflows Crawls: - Filter by createdBy (via userid param) - Filter by state (comma-separated string for multiple values) - Filter by first_seed, name, description - Sort by started, finished, fileSize, firstSeed - Sort descending by default to match frontend Workflows: - Filter by createdBy (formerly userid) and modifiedBy - Filter by first_seed, name, description - Sort by created, modified, firstSeed, lastCrawlTime * Add crawlconfigs search-values API endpoint and test	2023-03-28 17:55:40 -04:00
Tessa Walsh	e98c7172a9	Paginate API list endpoints (#659 ) * Paginate API list endpoints fastapi-pagination is pinned to 0.9.3, the latest release that plays nicely with pinned versions of fastapi and fastapi-users. * Increase page size via overridden Params and Page classes * update api resource list keys --------- Co-authored-by: sua yoo <sua@suayoo.com>	2023-03-06 14:41:25 -05:00
Sara Tavares	8167d7da8d	fix typos (#640 )	2023-02-24 11:10:49 -08:00

53 Commits