Commit Graph

860 Commits

Author SHA1 Message Date
sua yoo
8863776c54
Define websocket host in common webpack config (#195)
* move websocket host var to common config, better fix for #193
2022-03-15 18:34:49 -07:00
Ilya Kreymer
e6467c3374 backend work:
- support {configname}-{username}-@ts-@hostsuffix.wacz as output filename, sanitize username and config name
- support returning 'starting' for crawl status if no ips or 0/0 pages found.
- fix updating scale via POST crawlconfig update
- fix duplicate user error on superuser init
2022-03-15 18:20:25 -07:00
Ilya Kreymer
4b2f89db91 k8s: support for using a pre-made persistent volume/claim for crawling, configurable via CRAWLER_PV_CLAIM, otherwise using emptyDir
k8s: ability to set deployment scale for frontend as well
2022-03-15 11:18:23 -07:00
Ilya Kreymer
912004751d quickfix: partial mitigation for #193, use current host for websock address 2022-03-14 15:29:35 -07:00
Ilya Kreymer
8ce7a9802b backend quick fix:
chart/config: use screencastPort, fixed collection name
k8s: set pod to never restart to see logs
2022-03-14 11:42:53 -07:00
sua yoo
6fabea3e7a
Frontend build fixes (#191)
* copy specific files
* replace api host env var
* remove unused dotenv
* Update frontend/webpack.dev.js
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
2022-03-10 23:26:21 -08:00
sua yoo
4190e40964
Show last crawl state in UI (#192)
* update crawl list status

* show on detail page
2022-03-10 23:25:42 -08:00
Ilya Kreymer
9c99d67b1d quickfix: backend: docker: fix loading ips for watch 2022-03-04 17:12:19 -08:00
sua yoo
edf6b9ded7
Update home page routing (#186)
closes #183
2022-03-04 16:18:41 -08:00
sua yoo
0fe54653be
Fix unable to save edits to simple view (#185) 2022-03-04 16:17:57 -08:00
sua yoo
f2f67c34af
Copy extra hops value when duplicating crawl config (#184)
closes #158
2022-03-04 16:17:37 -08:00
sua yoo
4383c5e8d8
Disable error tracking in prod (#182)
closes #161
2022-03-04 16:17:05 -08:00
Ilya Kreymer
fb51f8e33e
Mongo auth fix (#190)
* backend: makes mongo auth configurable!
use mongo_auth secret in k8s and set env vars in docker
fixes #177 
* docker: update config.sample.env: use ws screencast by default, add NO_DELETE_ON_FAIL option, extend default login lifetime
2022-03-04 15:04:33 -08:00
Ilya Kreymer
cdd0ab34a3
Watch Stream Directly from Browsertrix Crawler (#189)
* watch work: proxy directly to crawls instead of redis pubsub
- add 'watchIPs' to crawl detail output
- cache crawl ips for quick access for auth
- add '/ipaccess/{ip}' endpoint for watch ws connection to ensure ws has access to the specified container ip
- enable 'auth_request' in nginx frontend
- requirements: update to latest redis-py
remaining fixes for #134
2022-03-04 14:55:11 -08:00
sua yoo
c18418ff09
Show invite message to super admin & layout fixes (#181) 2022-03-02 18:09:26 -08:00
sua yoo
fe31f551b2
Add "crawler" role to members (#174)
closes #139
2022-03-02 18:09:10 -08:00
sua yoo
c888a45d97
Fix seed URLs reset on JSON view toggle (#172)
closes #160
2022-03-02 18:08:45 -08:00
sua yoo
373c489b00
Watch crawl from crawl detail page (#156)
closes #164
closes #134 

Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
2022-03-02 18:08:08 -08:00
Ilya Kreymer
51a573ef1f backend prod settings:
- set WEB_CONCURRENCY env var to configure number of backend api workers for both docker and k8s
- set via 'backend_workers' in values.yaml
- also add 'rwp_base_url' to values.yaml
- update containers to use public webrecorder/browsertrix-backend and webrecorder/browsertrix-frontend containers
- make liveness, readiness and startup health checks more tolerant
2022-02-28 18:09:13 -08:00
Ilya Kreymer
84a9079b1f
support signing in docker deployment: (#166)
- add authsign to docker-compose.yml
- add signing.sample.yaml to be copied to signing.yaml for authsign
- add WACZ_SIGN_URL and WACZ_SIGN_TOKEN to config.sample.env
- signing enabled if WACZ_SIGN_URL is set
- add instructions on how to enable signing to Deployment
- update .gitignore, don't commit 'signing.yaml'
- update images to use public repo browsertrix images
2022-02-28 14:32:19 -08:00
sua yoo
83ded98081
Set and update crawl scale (#162)
closes #143
2022-02-28 09:14:27 -08:00
Ilya Kreymer
1053675d7d backend: docker setup quickfix: add placeholder 'tianon/true' container to ensure image is pulled, fixes #165 2022-02-28 00:58:17 +00:00
Ilya Kreymer
92878dec2c
Move deployment info to deployment.md (#159)
* add deployment instructions
2022-02-24 00:16:36 -08:00
sua yoo
3fe3691e74
Update crawl run duration at intervals (#155)
fixes #138
2022-02-23 16:14:01 -08:00
Ilya Kreymer
bb9c953d92
Add License, Logo and README updates for release (#157)
* Add new logo (for now)
* Add agpl license
* Minor README cleanup
2022-02-23 12:10:46 -08:00
sua yoo
4af30a02be
Archive and crawl navigation improvements (#154) 2022-02-23 09:19:48 -08:00
sua yoo
b5874c3f8c
call super disconnected callback after custom callback 2022-02-22 15:59:55 -08:00
sua yoo
c563216582
Allow user to edit crawl template (#147)
closes #144
2022-02-22 13:54:25 -08:00
Ilya Kreymer
8ede386a8b
docker image fix (#151)
* frontend docker build: pass GIT_COMMIT_HASH and GIT_BRANCH_NAME as env vars to remove dependency on git in webpack.config.js (for glitchtip)
fixes #150

* default to "unknown" if git and env vars not available

* add comment about error reporting for local use

Co-authored-by: sua yoo <sua@suayoo.com>
2022-02-22 10:52:27 -08:00
Ilya Kreymer
9bd402fa17
New WS Endpoint for Watching Crawl (#152)
* backend support for new watch system (#134):
- support for watch via redis pubsub and websocket connection to backend
- can support watch from any number of crawler instances to support scaled crawls
- use /archives/{aid}/crawls/{crawl_id}/watch/ws websocket endpoint
- ws: ignore graceful connectionclosedok exception, log other exceptions
- set logging to info to instead of debug for now (debug logs all ws traffic)
- remove old watch apis in backend
- remove old websocket routing to crawler instance for old watch system
- oauth bearer check: support websockets, use websocket object if no request object
- crawler args: replace --screencastPort with --screencastRedis
2022-02-22 10:33:10 -08:00
Ilya Kreymer
aa5207915c
backend: fix crawl config revision links (#149)
backed: crawlconfig:
- ensure newId is saved on old config being replaced
- if old config replaced is being deleted, ensure newId link is set on its old config (if any),
and the oldId points to the oldId of config being replaced (if any)
2022-02-21 16:51:27 -08:00
sua yoo
f30b398fea
Deactivate crawl templates in UI (#145)
wip #144
2022-02-21 11:37:15 -08:00
Ilya Kreymer
ee68a2f64e
Support for setting scale in crawlconfig (#148)
* backend: scale support:
- add 'scale' field to crawlconfig
- support updating 'scale' field in crawlconfig patch
- add constraint for crawlconfig and crawl scale (currently 1-3)
2022-02-20 11:27:47 -08:00
Ilya Kreymer
ca626f3c0a k8s chart: add permissions for pod exec and logs 2022-02-20 09:39:11 -08:00
sua yoo
aa645d9b15
Enable frontend exception tracking (#140) 2022-02-18 10:34:07 -08:00
Ilya Kreymer
d05f04be9f
Crawl Config Editing Support (#141)
* support inactive configs in same collection, configs with `inactive` set to true (#137)
- add `inactive`, `newId`, `oldId` to crawlconfigs
- filter out inactive configs by default for most operations
- add index for aid + inactive field for faster querying
- delete returns status: 'deactivated' or 'deleted'
- if no crawls ran, config can be deleted, otherwise it is deactivated

* update crawl endpoint: add general PATCH crawl config endpoint, support updating schedule and name
2022-02-17 16:04:07 -08:00
Ilya Kreymer
e9d6c68f6a frontend: replay: use single wacz replay for now (using first wacz file) 2022-02-15 08:34:14 -08:00
Ilya Kreymer
57e5b9fceb k8s charts: update default resource usage in values.yaml
add liveness probe for backend pod
2022-02-14 18:49:56 -08:00
Ilya Kreymer
d28ebcc7b6 backend: crawlconfig: don't pass default settings to crawlconfig to avoid redundant settings, use browsertrix-crawler defaults
when config not set
2022-02-14 18:47:52 -08:00
Ilya Kreymer
ca85edc8b3 backend: resource limits:
- set resource mem and cpu requests/limits for all used services (not minio for now)
- add readiness proble to redis, mongo
- adjust crawler limits, set via configmap
2022-02-08 19:53:41 -08:00
sua yoo
c577e36b74
add debug for access token 2022-02-08 17:52:27 -08:00
Ilya Kreymer
71842be94a backend: k8s setup minor tweaks:
- add 'emptyDir' volume for crawl directory (to allow any pod restarts to have access to the data)
- rename minio and redis volumes to avoid any confusion
- add pod termination grace-period (default to 600 secs)
2022-02-08 15:52:57 -08:00
Ilya Kreymer
8acb43b171 backend: use redis to mark crawls as canceled immediately, avoid dupes in crawl list (even if paging is added for db results) 2022-02-01 15:58:56 -08:00
Ilya Kreymer
4b7522920a backend: k8s: fix finished check, resource limits increase 2022-02-01 15:07:20 -08:00
sua yoo
02f46f108b
Crawl & crawl config UX improvements (#136) 2022-02-01 14:28:07 -08:00
Ilya Kreymer
b3f21932fc backend: k8s: list running jobs tweak: if succeeded jobs == number of parallel jobs, filter out from list, assume finished and not stopping 2022-02-01 00:05:13 -08:00
Ilya Kreymer
2b2e6fedfa
Misc backend fixes (#133)
* misc backend fixes:
- fix uuid typing: roles list, user invites
- crawlconfig: fix created date setting, fix userName lookup
- docker: fix timezone for scheduler, fix running check
- remove prints
- fix get crawl stuck in 'stopping' - check finished list first, then run list (in case k8s job has not been deleted)
2022-01-31 19:41:04 -08:00
sua yoo
d7f58c964c
Fix in-app link UX (#132)
closes #130, closes #113
2022-01-31 17:36:50 -08:00
Ilya Kreymer
adb5c835f2
Presign and replay (#127)
* support for replay via replayweb.page embed, fixes #124

backend:
- pre-sign all files urls
- cache pre-signed urls in redis, presign again when expired (default duration 3600, settable via PRESIGN_DURATION_SECONDS env var)
- change files output -> resources to confirm to Data Package spec supported by replayweb.page
- add CrawlFileOut which contains 'name' (file id), 'path' (presigned url), 'hash', and 'size'
- add /replay/sw.js endpoint to import sw.js from latest replay-web-page release
- update to fastapi-users 9.2.2
- customize backend auth to allow authentication to check 'auth_bearer' query arg if 'Authorization' header not set
- remove sw.js endpoint, handling in frontend

frontend:
- add <replay-web-page> to frontend, include rwp ui.js from latest release in index.html for now
- update crawl api endpoint to end in json
- replay-web-page loads the api endpoint directly!
- update Crawl type to use new format, 'resources' -> instead of 'files', each file has 'name' and 'path'

- nginx: add endpoint to serve the replay sw.js endpoint
- add defer attr to ui.js
- move 'Download' to 'Download Files'

* frontend: support customizing replayweb.page loading url via RWP_BASE_URL env var in Dockerfile
- default prod value set in frontend Dockerfile (set to upcoming 1.5.8 release needed for multi-wacz-file support) (can be overridden during image build via --build-arg)
- rename index.html -> index.ejs to allow interpolation
- RWP_BASE_URL defaults to latest https://replayweb.page/ for testing
- for local testing, add sw.js loading via devServer, also using RWP_BASE_URL (#131)

Co-authored-by: sua yoo <sua@suayoo.com>
2022-01-31 17:02:15 -08:00
sua yoo
336cf11521
Fix "View crawl" links (#129)
* update key

* update in crawl config
2022-01-31 15:45:48 -08:00