Commit Graph

522 Commits

Author SHA1 Message Date
Ilya Kreymer
2f63c7dcf8
Profiles: Backend API + Nginx Devtools Proxy Support (#212)
* add profile creation, list endpoints at /archives/<aid>/profiles
* add profile browser creation, get, ping, commit, delete endpoints at /archives/<aid>/profiles/browser
* support creation of profile browser using browsertrix-crawler 'create-login-profile' in docker and k8s
* ensure profile browser expires after set time, k8s job or docker container automatically deleted on exit
* profile browser creation returns temporary browser id, or `{"detail": "waiting_for_browser"}` while waiting for browser container init
* nginx frontend: proxy /loadbrowser/ to port 9223 in browsertrix-crawler, connecting directly to chrome devtools
* profile api auth: use redis for auth
- store browserid->archiveid and browserid->browser ip mapping in redis
- browser apis: ensure profile browser is associated with specified archive
- browser ws: pass archiveid and browserid to ws query args, ensure browserid is part of archive and corresponds to specified ip
* store profiles in /profiles/ directory in default storage, include profileid in profile tar.gz filename

* support profile in crawlconfig:
- add profileid to CrawlConfig, and profileName to CrawlConfigOut
- support resolving profile path via profileid, setting '--profile @{path/to/profile.tar.gz}' for crawler (assuming same storage for profile as output for now) in both docker and k8s setups
- docker: support out_filename, adding the previously missing custom wacz output filename functionality
2022-04-13 19:36:06 -07:00
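The profile browser auth checks described in commit 2f63c7dcf8 above can be illustrated with a minimal sketch. It assumes a plain in-memory dict as a stand-in for redis, and the class and method names (`BrowserRegistry`, `register`, `is_browser_in_archive`, `can_connect_ws`) are hypothetical, not taken from the backend code.

```python
from typing import Dict


class BrowserRegistry:
    """In-memory stand-in for the two redis mappings (assumption, not the real backend)."""

    def __init__(self) -> None:
        # browserid -> archiveid
        self._archive_by_browser: Dict[str, str] = {}
        # browserid -> browser ip
        self._ip_by_browser: Dict[str, str] = {}

    def register(self, browserid: str, archiveid: str, ip: str) -> None:
        """Record which archive a profile browser belongs to and where it runs."""
        self._archive_by_browser[browserid] = archiveid
        self._ip_by_browser[browserid] = ip

    def is_browser_in_archive(self, browserid: str, archiveid: str) -> bool:
        """Browser API check: the browser must be associated with the given archive."""
        return self._archive_by_browser.get(browserid) == archiveid

    def can_connect_ws(self, browserid: str, archiveid: str, ip: str) -> bool:
        """Websocket check: browser is part of the archive and maps to the given ip."""
        return (
            self.is_browser_in_archive(browserid, archiveid)
            and self._ip_by_browser.get(browserid) == ip
        )
```

Keeping both mappings keyed by browserid means a websocket request is rejected if either the archive or the ip does not match, and unknown browser ids fail both checks.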
sua yoo
238ee8f7ee
delete unused component file 2022-04-11 13:18:23 -07:00
sua yoo
8828681e8e
hotfix: fix crawl sort control alignment 2022-04-11 13:13:53 -07:00
sua yoo
d4b3ae3795
delete unused component file 2022-04-11 13:10:23 -07:00
sua yoo
5307138202
enable opening crawl template in new tab 2022-04-11 13:03:19 -07:00
sua yoo
f90ef071de
enable opening crawl in new tab 2022-04-11 13:03:10 -07:00
sua yoo
29b586b03f
Edit crawl config as YAML (#207) 2022-04-06 17:40:25 -07:00
Ilya Kreymer
9a6483630e
Support for Admin interface for viewing web archives (#198)
* backend api
- superadmin has admin access to all archives
- new superadmin endpoints: /archives/all/crawls and /archives/all/crawls/<crawl_id>.json for listing all running crawls and loading crawl data by id

- frontend superadmin view (fixes #201)
* show all archives on superadmin home page
* show jump to crawl for super admin (#200)
* navbar links for: all archives, all running crawls and jump to crawl

Co-authored-by: sua yoo <sua@suayoo.com>
2022-04-06 12:42:04 -07:00
sua yoo
ec3a77b71e
Mobile layout fixes (#206)
closes #202
2022-03-30 15:54:25 -07:00
Ilya Kreymer
aa83d51f7a
k8s backend improvements: (#205)
- add liveness probe for crawls, configurable via 'crawler_liveness_port'
- add User system:anonymous permissions
- treat jobs that have exceeded total as 'partial_complete' (experimental)
2022-03-30 14:39:06 -07:00
sua yoo
9e2274f612
remove temp file 2022-03-30 13:51:02 -07:00
Ilya Kreymer
9e45dc35d2
minor frontend-tweaks: (#196)
* frontend-tweaks:
- treat 'starting' state same as 'running'
- default to no schedule instead of weekly
- add 'Domain' scopeType

* backend: also allow 'domain' as a scopeType
2022-03-15 21:19:23 -07:00
sua yoo
8863776c54
Define websocket host in common webpack config (#195)
* move websocket host var to common config, better fix for #193
2022-03-15 18:34:49 -07:00
Ilya Kreymer
e6467c3374 backend work:
- support {configname}-{username}-@ts-@hostsuffix.wacz as output filename, sanitize username and config name
- support returning 'starting' for crawl status if no ips or 0/0 pages found.
- fix updating scale via POST crawlconfig update
- fix duplicate user error on superuser init
2022-03-15 18:20:25 -07:00
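The output filename pattern from commit e6467c3374 above could be built along these lines. This is a sketch under assumptions: the sanitizing rule (lowercase, non-alphanumeric runs collapsed to a dash) and the function names are hypothetical, and `@ts` and `@hostsuffix` are left as literal placeholders, assumed to be filled in later by the crawler.

```python
import re


def sanitize(value: str) -> str:
    """Lowercase and collapse runs of non-alphanumerics into one dash (assumed rule)."""
    return re.sub(r"[^a-z0-9]+", "-", value.lower()).strip("-")


def output_filename_template(config_name: str, username: str) -> str:
    """Build '{configname}-{username}-@ts-@hostsuffix.wacz' from sanitized parts."""
    # '@ts' and '@hostsuffix' stay as placeholders; they are not expanded here.
    return f"{sanitize(config_name)}-{sanitize(username)}-@ts-@hostsuffix.wacz"
```

For example, `output_filename_template("My Config!", "Jane Doe")` yields `my-config-jane-doe-@ts-@hostsuffix.wacz`.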
Ilya Kreymer
4b2f89db91 k8s: support for using a pre-made persistent volume/claim for crawling, configurable via CRAWLER_PV_CLAIM, otherwise using emptyDir
k8s: ability to set deployment scale for frontend as well
2022-03-15 11:18:23 -07:00
Ilya Kreymer
912004751d quickfix: partial mitigation for #193, use current host for websocket address 2022-03-14 15:29:35 -07:00
Ilya Kreymer
8ce7a9802b backend quick fix:
chart/config: use screencastPort, fixed collection name
k8s: set pod to never restart to see logs
2022-03-14 11:42:53 -07:00
sua yoo
6fabea3e7a
Frontend build fixes (#191)
* copy specific files
* replace api host env var
* remove unused dotenv
* Update frontend/webpack.dev.js
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
2022-03-10 23:26:21 -08:00
sua yoo
4190e40964
Show last crawl state in UI (#192)
* update crawl list status

* show on detail page
2022-03-10 23:25:42 -08:00
Ilya Kreymer
9c99d67b1d quickfix: backend: docker: fix loading ips for watch 2022-03-04 17:12:19 -08:00
sua yoo
edf6b9ded7
Update home page routing (#186)
closes #183
2022-03-04 16:18:41 -08:00
sua yoo
0fe54653be
Fix unable to save edits to simple view (#185) 2022-03-04 16:17:57 -08:00
sua yoo
f2f67c34af
Copy extra hops value when duplicating crawl config (#184)
closes #158
2022-03-04 16:17:37 -08:00
sua yoo
4383c5e8d8
Disable error tracking in prod (#182)
closes #161
2022-03-04 16:17:05 -08:00
Ilya Kreymer
fb51f8e33e
Mongo auth fix (#190)
* backend: makes mongo auth configurable!
use mongo_auth secret in k8s and set env vars in docker
fixes #177 
* docker: update config.sample.env: use ws screencast by default, add NO_DELETE_ON_FAIL option, extend default login lifetime
2022-03-04 15:04:33 -08:00
Ilya Kreymer
cdd0ab34a3
Watch Stream Directly from Browsertrix Crawler (#189)
* watch work: proxy directly to crawls instead of redis pubsub
- add 'watchIPs' to crawl detail output
- cache crawl ips for quick access for auth
- add '/ipaccess/{ip}' endpoint for watch ws connection to ensure ws has access to the specified container ip
- enable 'auth_request' in nginx frontend
- requirements: update to latest redis-py
remaining fixes for #134
2022-03-04 14:55:11 -08:00
sua yoo
c18418ff09
Show invite message to super admin & layout fixes (#181) 2022-03-02 18:09:26 -08:00
sua yoo
fe31f551b2
Add "crawler" role to members (#174)
closes #139
2022-03-02 18:09:10 -08:00
sua yoo
c888a45d97
Fix seed URLs reset on JSON view toggle (#172)
closes #160
2022-03-02 18:08:45 -08:00
sua yoo
373c489b00
Watch crawl from crawl detail page (#156)
closes #164
closes #134 

Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
2022-03-02 18:08:08 -08:00
Ilya Kreymer
51a573ef1f backend prod settings:
- set WEB_CONCURRENCY env var to configure number of backend api workers for both docker and k8s
- set via 'backend_workers' in values.yaml
- also add 'rwp_base_url' to values.yaml
- update containers to use public webrecorder/browsertrix-backend and webrecorder/browsertrix-frontend containers
- make liveness, readiness and startup health checks more tolerant
2022-02-28 18:09:13 -08:00
Ilya Kreymer
84a9079b1f
support signing in docker deployment: (#166)
- add authsign to docker-compose.yml
- add signing.sample.yaml to be copied to signing.yaml for authsign
- add WACZ_SIGN_URL and WACZ_SIGN_TOKEN to config.sample.env
- signing enabled if WACZ_SIGN_URL is set
- add instructions on how to enable signing to Deployment
- update .gitignore, don't commit 'signing.yaml'
- update images to use public repo browsertrix images
2022-02-28 14:32:19 -08:00
sua yoo
83ded98081
Set and update crawl scale (#162)
closes #143
2022-02-28 09:14:27 -08:00
Ilya Kreymer
1053675d7d backend: docker setup quickfix: add placeholder 'tianon/true' container to ensure image is pulled, fixes #165 2022-02-28 00:58:17 +00:00
Ilya Kreymer
92878dec2c
Move deployment info to deployment.md (#159)
* add deployment instructions
2022-02-24 00:16:36 -08:00
sua yoo
3fe3691e74
Update crawl run duration at intervals (#155)
fixes #138
2022-02-23 16:14:01 -08:00
Ilya Kreymer
bb9c953d92
Add License, Logo and README updates for release (#157)
* Add new logo (for now)
* Add agpl license
* Minor README cleanup
2022-02-23 12:10:46 -08:00
sua yoo
4af30a02be
Archive and crawl navigation improvements (#154) 2022-02-23 09:19:48 -08:00
sua yoo
b5874c3f8c
call super disconnected callback after custom callback 2022-02-22 15:59:55 -08:00
sua yoo
c563216582
Allow user to edit crawl template (#147)
closes #144
2022-02-22 13:54:25 -08:00
Ilya Kreymer
8ede386a8b
docker image fix (#151)
* frontend docker build: pass GIT_COMMIT_HASH and GIT_BRANCH_NAME as env vars to remove dependency on git in webpack.config.js (for glitchtip)
fixes #150

* default to "unknown" if git and env vars not available

* add comment about error reporting for local use

Co-authored-by: sua yoo <sua@suayoo.com>
2022-02-22 10:52:27 -08:00
Ilya Kreymer
9bd402fa17
New WS Endpoint for Watching Crawl (#152)
* backend support for new watch system (#134):
- support for watch via redis pubsub and websocket connection to backend
- can support watch from any number of crawler instances to support scaled crawls
- use /archives/{aid}/crawls/{crawl_id}/watch/ws websocket endpoint
- ws: ignore graceful ConnectionClosedOK exception, log other exceptions
- set logging to info instead of debug for now (debug logs all ws traffic)
- remove old watch apis in backend
- remove old websocket routing to crawler instance for old watch system
- oauth bearer check: support websockets, use websocket object if no request object
- crawler args: replace --screencastPort with --screencastRedis
2022-02-22 10:33:10 -08:00
Ilya Kreymer
aa5207915c
backend: fix crawl config revision links (#149)
backend: crawlconfig:
- ensure newId is saved on old config being replaced
- if old config replaced is being deleted, ensure newId link is set on its old config (if any),
and the oldId points to the oldId of config being replaced (if any)
2022-02-21 16:51:27 -08:00
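The revision-link rules in commit aa5207915c above can be read as splicing a doubly linked list. The sketch below is one interpretation, assuming an in-memory dict of configs and assuming that a replaced config is deleted only when it has no crawls; `Config` and `replace_config` are hypothetical names, while `oldId`, `newId` and `inactive` follow the field names in the commit messages.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Config:
    id: str
    crawl_count: int = 0
    inactive: bool = False
    oldId: Optional[str] = None  # previous revision, if any
    newId: Optional[str] = None  # revision that replaced this one, if any


def replace_config(configs: Dict[str, Config], old_id: str, new: Config) -> None:
    """Replace configs[old_id] with `new`, keeping revision links consistent."""
    old = configs[old_id]
    configs[new.id] = new
    if old.crawl_count > 0:
        # Old config has crawls: keep it as an inactive revision and link both ways.
        old.inactive = True
        old.newId = new.id
        new.oldId = old.id
    else:
        # Old config never ran: delete it and splice the new config into its place.
        del configs[old_id]
        new.oldId = old.oldId
        if old.oldId is not None and old.oldId in configs:
            configs[old.oldId].newId = new.id
```

After a deletion, the predecessor's `newId` and the replacement's `oldId` point at each other, so no link refers to a config that no longer exists.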
sua yoo
f30b398fea
Deactivate crawl templates in UI (#145)
wip #144
2022-02-21 11:37:15 -08:00
Ilya Kreymer
ee68a2f64e
Support for setting scale in crawlconfig (#148)
* backend: scale support:
- add 'scale' field to crawlconfig
- support updating 'scale' field in crawlconfig patch
- add constraint for crawlconfig and crawl scale (currently 1-3)
2022-02-20 11:27:47 -08:00
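The scale constraint from commit ee68a2f64e above (currently 1-3) could be enforced with a check like the following. This is a minimal sketch: the constant and function names are hypothetical, and the real backend may express the constraint differently, for example in its request models.

```python
MIN_SCALE = 1  # lower bound stated in the commit message
MAX_SCALE = 3  # upper bound stated in the commit message


def validate_scale(scale: int) -> int:
    """Return `scale` unchanged if it is an integer within the allowed range."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(scale, bool) or not isinstance(scale, int):
        raise TypeError("scale must be an integer")
    if not MIN_SCALE <= scale <= MAX_SCALE:
        raise ValueError(f"scale must be between {MIN_SCALE} and {MAX_SCALE}")
    return scale
```

Applying the same check to both the crawlconfig and a running crawl keeps the two limits from drifting apart.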
Ilya Kreymer
ca626f3c0a k8s chart: add permissions for pod exec and logs 2022-02-20 09:39:11 -08:00
sua yoo
aa645d9b15
Enable frontend exception tracking (#140) 2022-02-18 10:34:07 -08:00
Ilya Kreymer
d05f04be9f
Crawl Config Editing Support (#141)
* support inactive configs in same collection, configs with `inactive` set to true (#137)
- add `inactive`, `newId`, `oldId` to crawlconfigs
- filter out inactive configs by default for most operations
- add index for aid + inactive field for faster querying
- delete returns status: 'deactivated' or 'deleted'
- if no crawls ran, config can be deleted, otherwise it is deactivated

* update crawl endpoint: add general PATCH crawl config endpoint, support updating schedule and name
2022-02-17 16:04:07 -08:00
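The delete-or-deactivate behavior from commit d05f04be9f above can be sketched as follows. It assumes an in-memory dict in place of the database and a `crawl_count` field to record whether any crawls ran; `CrawlConfigRecord`, `remove_config` and `list_active` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CrawlConfigRecord:
    id: str
    crawl_count: int = 0  # assumed field: number of crawls run with this config
    inactive: bool = False


def remove_config(configs: Dict[str, CrawlConfigRecord], config_id: str) -> Dict[str, str]:
    """Delete the config if no crawls ran, otherwise mark it inactive."""
    config = configs[config_id]
    if config.crawl_count == 0:
        del configs[config_id]
        return {"status": "deleted"}
    config.inactive = True
    return {"status": "deactivated"}


def list_active(configs: Dict[str, CrawlConfigRecord]) -> List[CrawlConfigRecord]:
    """Filter out inactive configs, as most operations do by default."""
    return [c for c in configs.values() if not c.inactive]
```

Deactivating rather than deleting keeps a config available to the crawls that reference it while hiding it from default listings.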
Ilya Kreymer
e9d6c68f6a frontend: replay: use single wacz replay for now (using first wacz file) 2022-02-15 08:34:14 -08:00
Ilya Kreymer
57e5b9fceb k8s charts: update default resource usage in values.yaml
add liveness probe for backend pod
2022-02-14 18:49:56 -08:00