Commit Graph

926 Commits

Author SHA1 Message Date
sua yoo
aa645d9b15
Enable frontend exception tracking (#140) 2022-02-18 10:34:07 -08:00
Ilya Kreymer
d05f04be9f
Crawl Config Editing Support (#141)
* support inactive configs in same collection, configs with `inactive` set to true (#137)
- add `inactive`, `newId`, `oldId` to crawlconfigs
- filter out inactive configs by default for most operations
- add index for aid + inactive field for faster querying
- delete returns status: 'deactivated' or 'deleted'
- if no crawls ran, config can be deleted, otherwise it is deactivated

* update crawl endpoint: add general PATCH crawl config endpoint, support updating schedule and name
2022-02-17 16:04:07 -08:00
Ilya Kreymer
e9d6c68f6a frontend: replay: use single wacz replay for now (using first wacz file) 2022-02-15 08:34:14 -08:00
Ilya Kreymer
57e5b9fceb k8s charts: update default resource usage in values.yaml
add liveness probe for backend pod
2022-02-14 18:49:56 -08:00
Ilya Kreymer
d28ebcc7b6 backend: crawlconfig: don't pass default settings to crawlconfig to avoid redundant settings, use browsertrix-crawler defaults
when config not set
2022-02-14 18:47:52 -08:00
Ilya Kreymer
ca85edc8b3 backend: resource limits:
- set resource mem and cpu requests/limits for all used services (not minio for now)
- add readiness proble to redis, mongo
- adjust crawler limits, set via configmap
2022-02-08 19:53:41 -08:00
sua yoo
c577e36b74
add debug for access token 2022-02-08 17:52:27 -08:00
Ilya Kreymer
71842be94a backend: k8s setup minor tweaks:
- add 'emptyDir' volume for crawl directory (to allow any pod restarts to have access to the data)
- rename minio and redis volumes to avoid any confusion
- add pod termination grace-period (default to 600 secs)
2022-02-08 15:52:57 -08:00
Ilya Kreymer
8acb43b171 backend: use redis to mark crawls as canceled immediately, avoid dupes in crawl list (even if paging is added for db results) 2022-02-01 15:58:56 -08:00
Ilya Kreymer
4b7522920a backend: k8s: fix finished check, resource limits increase 2022-02-01 15:07:20 -08:00
sua yoo
02f46f108b
Crawl & crawl config UX improvements (#136) 2022-02-01 14:28:07 -08:00
Ilya Kreymer
b3f21932fc backend: k8s: list running jobs tweak: if succeeded jobs == number of parallel jobs, filter out from list, assume finished and not stopping 2022-02-01 00:05:13 -08:00
Ilya Kreymer
2b2e6fedfa
Misc backend fixes (#133)
* misc backend fixes:
- fix uuid typing: roles list, user invites
- crawlconfig: fix created date setting, fix userName lookup
- docker: fix timezone for scheduler, fix running check
- remove prints
- fix get crawl stuck in 'stopping' - check finished list first, then run list (in case k8s job has not been deleted)
2022-01-31 19:41:04 -08:00
sua yoo
d7f58c964c
Fix in-app link UX (#132)
closes #130, closes #113
2022-01-31 17:36:50 -08:00
Ilya Kreymer
adb5c835f2
Presign and replay (#127)
* support for replay via replayweb.page embed, fixes #124

backend:
- pre-sign all files urls
- cache pre-signed urls in redis, presign again when expired (default duration 3600, settable via PRESIGN_DURATION_SECONDS env var)
- change files output -> resources to confirm to Data Package spec supported by replayweb.page
- add CrawlFileOut which contains 'name' (file id), 'path' (presigned url), 'hash', and 'size'
- add /replay/sw.js endpoint to import sw.js from latest replay-web-page release
- update to fastapi-users 9.2.2
- customize backend auth to allow authentication to check 'auth_bearer' query arg if 'Authorization' header not set
- remove sw.js endpoint, handling in frontend

frontend:
- add <replay-web-page> to frontend, include rwp ui.js from latest release in index.html for now
- update crawl api endpoint to end in json
- replay-web-page loads the api endpoint directly!
- update Crawl type to use new format, 'resources' -> instead of 'files', each file has 'name' and 'path'

- nginx: add endpoint to serve the replay sw.js endpoint
- add defer attr to ui.js
- move 'Download' to 'Download Files'

* frontend: support customizing replayweb.page loading url via RWP_BASE_URL env var in Dockerfile
- default prod value set in frontend Dockerfile (set to upcoming 1.5.8 release needed for multi-wacz-file support) (can be overridden during image build via --build-arg)
- rename index.html -> index.ejs to allow interpolation
- RWP_BASE_URL defaults to latest https://replayweb.page/ for testing
- for local testing, add sw.js loading via devServer, also using RWP_BASE_URL (#131)

Co-authored-by: sua yoo <sua@suayoo.com>
2022-01-31 17:02:15 -08:00
sua yoo
336cf11521
Fix "View crawl" links (#129)
* update key

* update in crawl config
2022-01-31 15:45:48 -08:00
sua yoo
d7c0877403
Refactor archive tabs & navigation improvements (#123)
closes #112
2022-01-31 15:45:36 -08:00
sua yoo
9de1a3a003
fix stopping gracefully feedback 2022-01-31 12:02:10 -08:00
Ilya Kreymer
f569125a3d
storage: support loading default storage from crawl manangers (#126)
support s3-compatible presigning with default storage
backend support for #120
2022-01-31 11:22:03 -08:00
Ilya Kreymer
523b557eac replay route: (prepare for replay, #124)
- add support for /replay/sw.js
- ensure route works in both k8s and docker (routed via main nginx)
2022-01-31 11:18:10 -08:00
Ilya Kreymer
be86505347 backend: crawls api: better fix for graceful stop
- k8s: don't use redis, set to 'stopping' if status.active is not set, toggled immediately on delete_job
- docker: set custom redis key to indicate 'stopping' state (container still running)
- api: remove crawl is_running endpoint, redundant with general get crawl api
2022-01-30 22:01:00 -08:00
Ilya Kreymer
542680daf7
backend fixes: fix graceful stop + stats (#122)
* backend fixes: fix graceful stop + stats
- use redis to track stopping state, to be overwritten when finished
- also include stats in completed crawls
- docker: use short container id for crawl id
- graceful stop returns 'stopping_gracefully' instead of 'stopped_gracefully'
- don't set stopping state when complete!
- beginning files support: resolve absolute urls for crawl detail (not pre-signing yet)
2022-01-30 18:58:47 -08:00
sua yoo
be4bf3742f
Initial crawl detail page (#108) 2022-01-30 18:36:43 -08:00
sua yoo
7c067ffe36
Crawl template enhancements (#114)
closes #100
2022-01-30 18:30:54 -08:00
Ilya Kreymer
bcbc40059e
Refactor backend data model to support UUID (fixes #118) (#119)
* uuid fix: (fixes #118)
- update all mongo models to use UUID type as main '_id' (users continue to use 'id' as defined by fastapi-users)
- update all foreign doc references to use UUID instead of string
- api handlers convert str->uuid as needed
api fix:
- fix single crawl api, add CrawlOut response model
- fix collections api
- fix standalone-docker apis
- for manual job, set user to current user, overriding the setting from crawlconfig

* additional fixes:
- rename username -> userName to indicate not the login 'username'
- rename user -> userid, archive -> aid for crawlconfig + crawls
- ensure invites correctly convert str -> uuid as needed
- filter out unset values from browsertrix-crawler config

* convert remaining user -> userid variables
ensure archive id is passed to crawl_manager as str (via archive.id_str)

* remove bulk crawlconfig delete
* add support for `stopping` state when gracefully stopping crawl
* for get crawl endpoint, check stopped crawls first, then running
2022-01-29 19:00:11 -08:00
sua yoo
b93ca4e833
Add empty state for crawls (#121) 2022-01-29 15:55:44 -08:00
sua yoo
7777a22829
Poll crawls list & add additional details (#116) 2022-01-29 14:37:16 -08:00
Ilya Kreymer
9499ebfbba
Crawls API improvements (#117)
* crawls api improvements (fixes #110)
- add GET /crawls/{crawlid} api to return single crawl
- resolve crawlconfig name, add as `configName` to crawl model
- add 'created' date for crawlconfigs 
- flatten list to single 'crawls' list, instead of separate 'finished' and 'running' (running crawls added first)
- include 'fileCount' and 'fileSize', remove files
- remove `files` from crawl list response, also remove `aid`
- remove `schedule` from crawl data altogether, (available in crawl config)
- add ListCrawls response model
2022-01-29 12:08:02 -08:00
sua yoo
2636f33123
Make crawl list interactive (#109)
- Cancel and stop crawl
- Sorts crawls by start time, status and crawl template ID
- Filters crawls by crawl template ID
- Adds shortcut to copy template ID
2022-01-29 10:38:58 -08:00
Ilya Kreymer
01ad7e656f quickfix: for /cancel immediate crawl cancelation, send SIGABRT instead of SIGUSR1 2022-01-27 20:45:03 -08:00
sua yoo
28b59130a6
Initial list of crawls (#105) 2022-01-27 19:28:19 -08:00
Ilya Kreymer
2e2b8b329d
Add signing server via authsign (k8s only) (#107)
- add k8s deployment of signing server, if 'signer.enabled' chart value if set
- update ingress to provide access for 'signer.host' if signing server enabled to verify domain, run signing server itself on different port (also turn off ssl redirects to support signing server)
- set WACZ_SIGN_URL and WACZ_SIGN_TOKEN (supported in browesertrix-crawler 0.5.0)
- authsign deployment uses a volume to store current certs
- add sample signer block, with signing disabled by default
2022-01-26 23:27:13 -08:00
sua yoo
5fccd07329
Edit crawl schedule (#103) 2022-01-26 22:11:32 -08:00
Ilya Kreymer
0bea0cfff2
crawl config new template: add support for 'extraHops' config option (available in browsertrix-crawler 0.5.0) (#104)
frontend:
- add checkbox to basic crawl config component which sets 'extraHops' to 1, otherwise to 0
- text tweaks: rename Scope Type -> Crawl Scope, capitalization

backend: add 'extraHops' to CrawlConfig
fixes #102
2022-01-26 21:18:22 -08:00
sua yoo
2666b6f6aa
Duplicate crawl config from list (#99) 2022-01-25 17:07:54 -08:00
sua yoo
3a461d86d4
Crawl config detail views (#97) 2022-01-25 11:56:34 -08:00
Ilya Kreymer
f55f84c60b backend:
- crawlconfigs cleanup: simplify get_crawl_configs api
- return CrawlConfigOut for single crawlconfig api endpoint, include currCrawlId
2022-01-22 17:41:37 -08:00
Ilya Kreymer
77aa5213f2 quickfix: typo fix, return config, not archive, fixes #96 2022-01-22 17:21:29 -08:00
sua yoo
9ed216ba05
Run and delete crawl templates from list view (#94) 2022-01-22 14:18:02 -08:00
Ilya Kreymer
b506442b21
backend api: add curr crawl to crawlconfig listing (#95)
* backend api: add current crawl id to crawlconfig listing
- model: add 'currCrawlId' to CrawlConfig model
- output: add response model to /crawlconfigs api response to show correct openapi model
- rename crawl_configs -> crawlConfigs for consistency
2022-01-22 13:52:46 -08:00
sua yoo
ec1a758e42
Upgrade TailwindCSS to v3 (#93) 2022-01-19 22:00:06 -08:00
sua yoo
cb5cf55c69
Add helper for dispatching notify events (#92) 2022-01-19 21:01:47 -08:00
sua yoo
22942527e9
Refactor crawl config into multiple components (#91) 2022-01-19 18:43:19 -08:00
sua yoo
3645e3b096
Create crawl config UX enhancements (#90)
closes #87
2022-01-19 11:01:17 -08:00
sua yoo
c3edb4bba4
Allow user to configure crawls with JSON (#86) 2022-01-18 19:58:55 -08:00
sua yoo
ff77a92108
Schedule time of day when creating config (#85) 2022-01-18 13:58:28 -08:00
sua yoo
f58bc801fc
remove unused package dependency 2022-01-16 14:49:21 -08:00
sua yoo
b2088f5634
Add initial crawl template form (#80) 2022-01-16 14:43:33 -08:00
Ilya Kreymer
b3ca501a19 helm chart: support cloud-based persistent volumes if values 'volume_storage_class' is specified.
use PersistentVolumeClaim to create a persistent volume for each local service (mongo, minio, redis) when running in a cloud setup
if cloud-specified volume storage class not specified, create default hostPath volume (eg. for minikube)
lint: add default icon for chart
2022-01-15 19:34:31 -08:00
Ilya Kreymer
88f1689e0e crawlconfig: add 'name' property to crawl config
superuser init: don't check invite token for verified superuser (automatic init)
fix formatting
2022-01-15 19:06:48 -08:00