browsertrix

Author	SHA1	Message	Date
Ilya Kreymer	9bd402fa17	New WS Endpoint for Watching Crawl (#152) * backend support for new watch system (#134): - support for watch via redis pubsub and websocket connection to backend - can support watch from any number of crawler instances to support scaled crawls - use /archives/{aid}/crawls/{crawl_id}/watch/ws websocket endpoint - ws: ignore graceful connectionclosedok exception, log other exceptions - set logging to info instead of debug for now (debug logs all ws traffic) - remove old watch apis in backend - remove old websocket routing to crawler instance for old watch system - oauth bearer check: support websockets, use websocket object if no request object - crawler args: replace --screencastPort with --screencastRedis	2022-02-22 10:33:10 -08:00
Ilya Kreymer	aa5207915c	backend: fix crawl config revision links (#149) backend: crawlconfig: - ensure newId is saved on old config being replaced - if old config replaced is being deleted, ensure newId link is set on its old config (if any), and the oldId points to the oldId of config being replaced (if any)	2022-02-21 16:51:27 -08:00
sua yoo	f30b398fea	Deactivate crawl templates in UI (#145) wip #144	2022-02-21 11:37:15 -08:00
Ilya Kreymer	ee68a2f64e	Support for setting scale in crawlconfig (#148) * backend: scale support: - add 'scale' field to crawlconfig - support updating 'scale' field in crawlconfig patch - add constraint for crawlconfig and crawl scale (currently 1-3)	2022-02-20 11:27:47 -08:00
Ilya Kreymer	ca626f3c0a	k8s chart: add permissions for pod exec and logs	2022-02-20 09:39:11 -08:00
sua yoo	aa645d9b15	Enable frontend exception tracking (#140)	2022-02-18 10:34:07 -08:00
Ilya Kreymer	d05f04be9f	Crawl Config Editing Support (#141) * support inactive configs in same collection, configs with `inactive` set to true (#137) - add `inactive`, `newId`, `oldId` to crawlconfigs - filter out inactive configs by default for most operations - add index for aid + inactive field for faster querying - delete returns status: 'deactivated' or 'deleted' - if no crawls ran, config can be deleted, otherwise it is deactivated * update crawl endpoint: add general PATCH crawl config endpoint, support updating schedule and name	2022-02-17 16:04:07 -08:00
Ilya Kreymer	e9d6c68f6a	frontend: replay: use single wacz replay for now (using first wacz file)	2022-02-15 08:34:14 -08:00
Ilya Kreymer	57e5b9fceb	k8s charts: update default resource usage in values.yaml add liveness probe for backend pod	2022-02-14 18:49:56 -08:00
Ilya Kreymer	d28ebcc7b6	backend: crawlconfig: don't pass default settings to crawlconfig to avoid redundant settings, use browsertrix-crawler defaults when config not set	2022-02-14 18:47:52 -08:00
Ilya Kreymer	ca85edc8b3	backend: resource limits: - set resource mem and cpu requests/limits for all used services (not minio for now) - add readiness probe to redis, mongo - adjust crawler limits, set via configmap	2022-02-08 19:53:41 -08:00
sua yoo	c577e36b74	add debug for access token	2022-02-08 17:52:27 -08:00
Ilya Kreymer	71842be94a	backend: k8s setup minor tweaks: - add 'emptyDir' volume for crawl directory (to allow any pod restarts to have access to the data) - rename minio and redis volumes to avoid any confusion - add pod termination grace-period (default to 600 secs)	2022-02-08 15:52:57 -08:00
Ilya Kreymer	8acb43b171	backend: use redis to mark crawls as canceled immediately, avoid dupes in crawl list (even if paging is added for db results)	2022-02-01 15:58:56 -08:00
Ilya Kreymer	4b7522920a	backend: k8s: fix finished check, resource limits increase	2022-02-01 15:07:20 -08:00
sua yoo	02f46f108b	Crawl & crawl config UX improvements (#136)	2022-02-01 14:28:07 -08:00
Ilya Kreymer	b3f21932fc	backend: k8s: list running jobs tweak: if succeeded jobs == number of parallel jobs, filter out from list, assume finished and not stopping	2022-02-01 00:05:13 -08:00
Ilya Kreymer	2b2e6fedfa	Misc backend fixes (#133) * misc backend fixes: - fix uuid typing: roles list, user invites - crawlconfig: fix created date setting, fix userName lookup - docker: fix timezone for scheduler, fix running check - remove prints - fix get crawl stuck in 'stopping' - check finished list first, then run list (in case k8s job has not been deleted)	2022-01-31 19:41:04 -08:00
sua yoo	d7f58c964c	Fix in-app link UX (#132) closes #130, closes #113	2022-01-31 17:36:50 -08:00
Ilya Kreymer	adb5c835f2	Presign and replay (#127) * support for replay via replayweb.page embed, fixes #124 backend: - pre-sign all files urls - cache pre-signed urls in redis, presign again when expired (default duration 3600, settable via PRESIGN_DURATION_SECONDS env var) - change files output -> resources to conform to Data Package spec supported by replayweb.page - add CrawlFileOut which contains 'name' (file id), 'path' (presigned url), 'hash', and 'size' - add /replay/sw.js endpoint to import sw.js from latest replay-web-page release - update to fastapi-users 9.2.2 - customize backend auth to allow authentication to check 'auth_bearer' query arg if 'Authorization' header not set - remove sw.js endpoint, handling in frontend frontend: - add <replay-web-page> to frontend, include rwp ui.js from latest release in index.html for now - update crawl api endpoint to end in json - replay-web-page loads the api endpoint directly! - update Crawl type to use new format, 'resources' -> instead of 'files', each file has 'name' and 'path' - nginx: add endpoint to serve the replay sw.js endpoint - add defer attr to ui.js - move 'Download' to 'Download Files' * frontend: support customizing replayweb.page loading url via RWP_BASE_URL env var in Dockerfile - default prod value set in frontend Dockerfile (set to upcoming 1.5.8 release needed for multi-wacz-file support) (can be overridden during image build via --build-arg) - rename index.html -> index.ejs to allow interpolation - RWP_BASE_URL defaults to latest https://replayweb.page/ for testing - for local testing, add sw.js loading via devServer, also using RWP_BASE_URL (#131) Co-authored-by: sua yoo <sua@suayoo.com>	2022-01-31 17:02:15 -08:00
sua yoo	336cf11521	Fix "View crawl" links (#129) * update key * update in crawl config	2022-01-31 15:45:48 -08:00
sua yoo	d7c0877403	Refactor archive tabs & navigation improvements (#123) closes #112	2022-01-31 15:45:36 -08:00
sua yoo	9de1a3a003	fix stopping gracefully feedback	2022-01-31 12:02:10 -08:00
Ilya Kreymer	f569125a3d	storage: support loading default storage from crawl managers (#126) support s3-compatible presigning with default storage backend support for #120	2022-01-31 11:22:03 -08:00
Ilya Kreymer	523b557eac	replay route: (prepare for replay, #124) - add support for /replay/sw.js - ensure route works in both k8s and docker (routed via main nginx)	2022-01-31 11:18:10 -08:00
Ilya Kreymer	be86505347	backend: crawls api: better fix for graceful stop - k8s: don't use redis, set to 'stopping' if status.active is not set, toggled immediately on delete_job - docker: set custom redis key to indicate 'stopping' state (container still running) - api: remove crawl is_running endpoint, redundant with general get crawl api	2022-01-30 22:01:00 -08:00
Ilya Kreymer	542680daf7	backend fixes: fix graceful stop + stats (#122) * backend fixes: fix graceful stop + stats - use redis to track stopping state, to be overwritten when finished - also include stats in completed crawls - docker: use short container id for crawl id - graceful stop returns 'stopping_gracefully' instead of 'stopped_gracefully' - don't set stopping state when complete! - beginning files support: resolve absolute urls for crawl detail (not pre-signing yet)	2022-01-30 18:58:47 -08:00
sua yoo	be4bf3742f	Initial crawl detail page (#108)	2022-01-30 18:36:43 -08:00
sua yoo	7c067ffe36	Crawl template enhancements (#114) closes #100	2022-01-30 18:30:54 -08:00
Ilya Kreymer	bcbc40059e	Refactor backend data model to support UUID (fixes #118) (#119) * uuid fix: (fixes #118) - update all mongo models to use UUID type as main '_id' (users continue to use 'id' as defined by fastapi-users) - update all foreign doc references to use UUID instead of string - api handlers convert str->uuid as needed api fix: - fix single crawl api, add CrawlOut response model - fix collections api - fix standalone-docker apis - for manual job, set user to current user, overriding the setting from crawlconfig * additional fixes: - rename username -> userName to indicate not the login 'username' - rename user -> userid, archive -> aid for crawlconfig + crawls - ensure invites correctly convert str -> uuid as needed - filter out unset values from browsertrix-crawler config * convert remaining user -> userid variables ensure archive id is passed to crawl_manager as str (via archive.id_str) * remove bulk crawlconfig delete * add support for `stopping` state when gracefully stopping crawl * for get crawl endpoint, check stopped crawls first, then running	2022-01-29 19:00:11 -08:00
sua yoo	b93ca4e833	Add empty state for crawls (#121)	2022-01-29 15:55:44 -08:00
sua yoo	7777a22829	Poll crawls list & add additional details (#116)	2022-01-29 14:37:16 -08:00
Ilya Kreymer	9499ebfbba	Crawls API improvements (#117) * crawls api improvements (fixes #110) - add GET /crawls/{crawlid} api to return single crawl - resolve crawlconfig name, add as `configName` to crawl model - add 'created' date for crawlconfigs - flatten list to single 'crawls' list, instead of separate 'finished' and 'running' (running crawls added first) - include 'fileCount' and 'fileSize', remove files - remove `files` from crawl list response, also remove `aid` - remove `schedule` from crawl data altogether, (available in crawl config) - add ListCrawls response model	2022-01-29 12:08:02 -08:00
sua yoo	2636f33123	Make crawl list interactive (#109) - Cancel and stop crawl - Sorts crawls by start time, status and crawl template ID - Filters crawls by crawl template ID - Adds shortcut to copy template ID	2022-01-29 10:38:58 -08:00
Ilya Kreymer	01ad7e656f	quickfix: for /cancel immediate crawl cancelation, send SIGABRT instead of SIGUSR1	2022-01-27 20:45:03 -08:00
sua yoo	28b59130a6	Initial list of crawls (#105)	2022-01-27 19:28:19 -08:00
Ilya Kreymer	2e2b8b329d	Add signing server via authsign (k8s only) (#107) - add k8s deployment of signing server, if 'signer.enabled' chart value is set - update ingress to provide access for 'signer.host' if signing server enabled to verify domain, run signing server itself on different port (also turn off ssl redirects to support signing server) - set WACZ_SIGN_URL and WACZ_SIGN_TOKEN (supported in browsertrix-crawler 0.5.0) - authsign deployment uses a volume to store current certs - add sample signer block, with signing disabled by default	2022-01-26 23:27:13 -08:00
sua yoo	5fccd07329	Edit crawl schedule (#103)	2022-01-26 22:11:32 -08:00
Ilya Kreymer	0bea0cfff2	crawl config new template: add support for 'extraHops' config option (available in browsertrix-crawler 0.5.0) (#104) frontend: - add checkbox to basic crawl config component which sets 'extraHops' to 1, otherwise to 0 - text tweaks: rename Scope Type -> Crawl Scope, capitalization backend: add 'extraHops' to CrawlConfig fixes #102	2022-01-26 21:18:22 -08:00
sua yoo	2666b6f6aa	Duplicate crawl config from list (#99)	2022-01-25 17:07:54 -08:00
sua yoo	3a461d86d4	Crawl config detail views (#97)	2022-01-25 11:56:34 -08:00
Ilya Kreymer	f55f84c60b	backend: - crawlconfigs cleanup: simplify get_crawl_configs api - return CrawlConfigOut for single crawlconfig api endpoint, include currCrawlId	2022-01-22 17:41:37 -08:00
Ilya Kreymer	77aa5213f2	quickfix: typo fix, return config, not archive, fixes #96	2022-01-22 17:21:29 -08:00
sua yoo	9ed216ba05	Run and delete crawl templates from list view (#94)	2022-01-22 14:18:02 -08:00
Ilya Kreymer	b506442b21	backend api: add curr crawl to crawlconfig listing (#95) * backend api: add current crawl id to crawlconfig listing - model: add 'currCrawlId' to CrawlConfig model - output: add response model to /crawlconfigs api response to show correct openapi model - rename crawl_configs -> crawlConfigs for consistency	2022-01-22 13:52:46 -08:00
sua yoo	ec1a758e42	Upgrade TailwindCSS to v3 (#93)	2022-01-19 22:00:06 -08:00
sua yoo	cb5cf55c69	Add helper for dispatching notify events (#92)	2022-01-19 21:01:47 -08:00
sua yoo	22942527e9	Refactor crawl config into multiple components (#91)	2022-01-19 18:43:19 -08:00
sua yoo	3645e3b096	Create crawl config UX enhancements (#90) closes #87	2022-01-19 11:01:17 -08:00
sua yoo	c3edb4bba4	Allow user to configure crawls with JSON (#86)	2022-01-18 19:58:55 -08:00


131 Commits