Operator: Modified init behavior to only load redis when at least one crawler pod is available:
- waits for at least one crawler pod to be available before starting the redis pod, to avoid a situation where many crawler pods are pending but redis pods are still running.
- redis statefulset starts at scale of 0
- once crawler pod becomes available, redis sts is scaled to 1 (via `initRedis==true` status)
- crawl remains in 'starting' or 'waiting_capacity' state until a crawler pod becomes available, without the redis pod running
- set to 'running' state only after redis and at least one crawler pod are available
- if no crawler pods are available after running, or if stuck in 'starting' for >60 seconds, switch to 'waiting_capacity' state
- when switching to 'waiting_capacity', also scale redis down to 0, wait for a crawler pod to become available, and only then scale redis back up to 1 and return to 'running'
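
A minimal sketch of this scaling decision, with illustrative names only (the actual operator code differs):

```python
# Hedged sketch of the redis scaling decision described above; all names
# here are assumptions, not the actual operator code.
import time
from dataclasses import dataclass

STARTING_TIMEOUT_SECS = 60

@dataclass
class CrawlStatus:
    state: str = "starting"
    initRedis: bool = False  # new status field, defaults to False

def sync_state(status: CrawlStatus, crawler_available: bool,
               redis_available: bool, started_at: float) -> int:
    """Return the desired redis statefulset scale (0 or 1)."""
    if crawler_available:
        status.initRedis = True  # triggers scaling the redis sts to 1
        if redis_available:
            # both redis and a crawler pod are up
            status.state = "running"
        return 1
    # no crawler capacity: park the crawl and release the redis pod
    if status.state == "running" or (
        status.state == "starting"
        and time.monotonic() - started_at > STARTING_TIMEOUT_SECS
    ):
        status.state = "waiting_capacity"
    status.initRedis = False
    return 0
```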
other tweaks:
- add new status field 'initRedis', defaults to false, not displayed
- crawler pod: consider 'ContainerCreating' state as available, since the container will not be blocked by resource limits
- add a resync after 3 seconds when waiting for a crawler pod or the redis pod to become available, configurable via 'operator_fast_resync_secs'
- set_state: if not updating the state, ensure the state reflects the actual value in the db
- Users can now upload .WACZ archives from the "Archived Data" page.
- Can specify name, description, tags and collection(s) to add the upload to
- Show progress of upload
- Support canceling upload
* Move all pydantic models to models.py to avoid circular dependencies
* Include automated crawl details in all-crawls GET endpoints
- ensure the /all-crawls endpoint resolves name / firstSeed data the same way as the /crawls endpoint, for consistent frontend display. Fields added to the get and list all-crawls endpoints for automated crawls only (sketched after this list):
- cid
- name
- description
- firstSeed
- seedCount
- profileName
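
A hedged pydantic sketch of these extra fields; the model name and optionality are assumptions, not the actual backend model:

```python
from typing import Optional
from pydantic import BaseModel

# hypothetical model name; fields mirror the list above
class AutomatedCrawlExtraFields(BaseModel):
    cid: Optional[str] = None
    name: Optional[str] = None
    description: Optional[str] = None
    firstSeed: Optional[str] = None
    seedCount: Optional[int] = None
    profileName: Optional[str] = None
```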
* Add automated crawl fields to list all-crawls test
* Uncomment mongo readinessProbe
* clean up CrawlOutWithResources:
- remove 'files' from output model; only resources should be returned
- add _files_to_resources() to simplify computing presigned 'resources' from raw 'files'
- update upload tests to be more consistent: 'files' never present, 'errors' always none
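
A rough sketch of what _files_to_resources() does, assuming a presigning helper and file field names that may differ from the real code:

```python
# Hedged sketch: turn raw 'files' entries into presigned 'resources'.
# get_presigned_url() and the field names are assumptions.
async def _files_to_resources(files: list[dict], org, crawl_id: str) -> list[dict]:
    resources = []
    for file_ in files:
        presigned = await get_presigned_url(org, file_)  # assumed helper
        resources.append({
            "name": file_["filename"],
            "path": presigned,
            "size": file_["size"],
            "hash": file_["hash"],
        })
    return resources
```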
---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
* frontend:
- follow-up to #969: fixes crawl workflows by using the crawl-specific endpoint and merging results
* get crawls and uploads concurrently
---------
Co-authored-by: sua yoo <sua@suayoo.com>
* tests: add sleep() between all looping get_crawl() calls to avoid a tight request loop; also remove an unneeded loop
will likely fix occasional '504 timeout' test failures where the frontend is overwhelmed with /replay.json requests
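
A minimal sketch of the polling pattern, with hypothetical names:

```python
import time

# Hedged sketch: poll the crawl state with a delay between requests so the
# test suite doesn't hammer the endpoint. get_crawl() is a stand-in.
def wait_for_state(get_crawl, crawl_id: str, state: str, retries: int = 30) -> dict:
    for _ in range(retries):
        data = get_crawl(crawl_id)
        if data.get("state") == state:
            return data
        time.sleep(5)  # avoid a tight request loop
    raise TimeoutError(f"crawl {crawl_id} never reached state '{state}'")
```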
* additional fixes for #935:
- don't use artifactType for detail pages; ensure the correct artifact is selected based on the path
* naming tweaks:
- from uploads detail, return to 'All Uploads' with filter
- from crawls detail, return to 'All Crawls' with filter
- rename the general view to 'All Archived Data'
- Adds top-level "Archived Data" view, replacing "Finished Crawls" and moving it into the view as "Crawls"
- Adds list for viewing all artifacts/data
- Adds list for viewing all uploaded crawls
- Updates crawl detail view to show upload details
- Edit upload metadata, including 'name'
- Delete uploads
---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
* basecrawl refactor: make crawls db more generic, supporting different types of 'base crawls': crawls, uploads, manual archives
- move shared functionality to basecrawl.py
- create a base BaseCrawl object, which contains start / finish time, metadata and files array
- create BaseCrawlOps, base class for CrawlOps, which supports base crawl deletion, querying and collection add/remove
* uploads api: (part of #929)
- new UploadCrawl object which extends BaseCrawl, has name and description
- support multipart form data upload to /uploads/formdata
- support streaming upload of a single file via /uploads/stream, using botocore multipart upload to upload to the s3 endpoint in parts (a sketch follows this list)
- require 'filename' param to set upload filename for streaming uploads (otherwise use form data names)
- sanitize filename, place uploads in /uploads/<uuid>/<sanitized-filename>-<random>.wacz
- uploads have internal id 'upload-<uuid>'
- create UploadedCrawl object with CrawlFiles pointing to the newly uploaded files, set state to 'complete'
- handle upload failures by aborting the multipart upload
- ensure uploads are added within the org bucket path
- return id / added when adding new UploadedCrawl
- support listing, deleting, and patching /uploads
- support upload details via /replay.json to support replay
- add support for 'replaceId=<id>', which removes all previous files in the upload after the new upload succeeds; if replaceId doesn't exist, a new upload is created (stream endpoint only so far)
- support patching upload metadata: notes, tags and name on uploads (UpdateUpload extends UpdateCrawl and adds 'name')
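
A minimal sketch of the streaming multipart upload flow, using the boto3/botocore client API; the bucket/key handling and chunk size here are assumptions:

```python
import boto3

CHUNK_SIZE = 5 * 1024 * 1024  # S3 requires parts >= 5 MiB (except the last)

def stream_to_s3(client, bucket: str, key: str, stream) -> None:
    """Upload a file-like stream to S3 in parts, aborting on failure."""
    mpu = client.create_multipart_upload(Bucket=bucket, Key=key)
    upload_id = mpu["UploadId"]
    parts = []
    try:
        part_number = 1
        while True:
            chunk = stream.read(CHUNK_SIZE)
            if not chunk:
                break
            resp = client.upload_part(
                Bucket=bucket, Key=key, PartNumber=part_number,
                UploadId=upload_id, Body=chunk,
            )
            parts.append({"ETag": resp["ETag"], "PartNumber": part_number})
            part_number += 1
        client.complete_multipart_upload(
            Bucket=bucket, Key=key, UploadId=upload_id,
            MultipartUpload={"Parts": parts},
        )
    except Exception:
        # on failure, abort so partial parts don't linger in the bucket
        client.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
        raise
```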
* base crawls api: Add /all-crawls list and delete endpoints for all crawl types (without resources)
- support all-crawls/<id>/replay.json with resources
- Use ListCrawlOut model for /all-crawls list endpoint
- Extend BaseCrawlOut from ListCrawlOut, add type
- use 'type: crawl' for crawls and 'type: upload' for uploads
- migration: ensure all previous crawl objects / missing type are set to 'type: crawl'
- indexes: add db indexes on the 'type' field alone, and compound indexes of 'type' with oid, cid, finished, and state
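
A sketch of the index creation in motor/pymongo style; the collection handle and exact key order are assumptions:

```python
# Hedged sketch of the indexes described above; 'crawls' is an assumed
# motor (async MongoDB) collection handle.
async def create_crawl_indexes(crawls):
    await crawls.create_index("type")
    await crawls.create_index([("type", 1), ("oid", 1)])
    await crawls.create_index([("type", 1), ("cid", 1)])
    await crawls.create_index([("type", 1), ("finished", 1)])
    await crawls.create_index([("type", 1), ("state", 1)])
```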
* tests: add test for multipart and streaming upload, listing uploads, deleting upload
- add sample WACZ for upload testing: 'example.wacz' and 'example-2.wacz'
* collections: support adding and removing both crawls and uploads via base crawl
- include collection_ids in /all-crawls list
- collections replay.json can include both crawls and uploads
bump version to 1.6.0-beta.2
---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
* Make API add and update method return values consistent
- Updates return {"updated": True}
- Adds return {"added": True}
- Both can additionally have other fields as needed, e.g. id or name
- remove Profile response model, as only added / id are returned
- reformat
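
An illustrative FastAPI sketch of the consistent response shape; the paths and handler names are hypothetical, not the actual backend routes:

```python
from uuid import uuid4
from fastapi import FastAPI

app = FastAPI()

@app.post("/orgs/{oid}/profiles")
async def add_profile(oid: str):
    profile_id = uuid4()  # stand-in for the real create logic
    # adds return {"added": True}, plus extra fields such as id as needed
    return {"added": True, "id": str(profile_id)}

@app.patch("/orgs/{oid}/profiles/{profile_id}")
async def update_profile(oid: str, profile_id: str):
    # stand-in for the real update logic; updates return {"updated": True}
    return {"updated": True}
```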
---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
- Adds collections search and list to workflow editor
- Adds collections to workflow details component
- Adds namePrefix filter to backend GET /orgs/{oid}/collections endpoint to support case-insensitive searching of collections
- Adds documentation for new setting
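
A sketch of how such a case-insensitive prefix filter can be expressed as a MongoDB query; the helper name is hypothetical:

```python
import re

# Hedged sketch: build a case-insensitive, anchored prefix match on the
# collection 'name' field. name_prefix_query() is a hypothetical helper.
def name_prefix_query(name_prefix: str) -> dict:
    regex = re.compile("^" + re.escape(name_prefix), re.IGNORECASE)
    return {"name": regex}

# e.g. query.update(name_prefix_query("my coll")) before running find()
```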
---------
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
* Updates footer
- Adds documentation link
- Adds label to GitHub link, moves outside of the version code
- Adds copy button to version code for quick access when filing bug reports :)
* Comments out invisible div
* Improves responsiveness on mobile
* scope fix: when using 'Custom Page Prefix' scope (fixes #873)
- don't include the primary seed URL in the include list
- don't always add a trailing slash to extra in-scope URLs
- set seed scope to 'prefix' (supported via webrecorder/browsertrix-crawler#318) instead of re-including the seed URL
- add comments on using 'custom' to indicate 'Custom Prefix Scope' semantics on the frontend, setting the actual scope to 'prefix' on the backend
- remove an unneeded conditional for additional URLs; the main scopeType is overridden per seed anyway
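
A hypothetical sketch of a seed config produced after this fix; the field names follow browsertrix-crawler conventions but the exact shape is an assumption:

```python
# Hedged sketch only: the seed's scope is handled natively by the crawler
# via scopeType 'prefix' (browsertrix-crawler#318), instead of re-including
# the seed URL in the include list.
seed = {
    "url": "https://example.com/path/page.html",
    "scopeType": "prefix",
}
```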
- Unifies trash icons on all pages to use trash3 (there were a few stragglers!)
- Brings styling of the org quotas dialog in line with the rest of our dialogs
- Adds missing localization strings
- Swaps button with icon button to match table row action styling elsewhere