Commit Graph

179 Commits

Author SHA1 Message Date
Ilya Kreymer
68ec582f73
nginx simplify: (#259)
- add custom init script for ./docker-entrypoint.d/ to setup resolver from local /etc/resolv.conf
- custom init script also removes default.conf, and removes minio route if NO_MINIO_ROUTE=1 is set
- assign template vars to nginx vars to avoid conflicts when interpolating
- k8s: remove initContainers and volumes, now handled via custom init script in image
2022-06-13 11:53:15 -07:00
Ilya Kreymer
d16c22f45a Merge branch 'main' into dev 2022-06-11 12:40:18 -07:00
Ilya Kreymer
9fce8cfc1d frontend: fix missed renames 2022-06-11 12:37:24 -07:00
sua yoo
710639365b
adjust no files message (#250)
Change 'no files yet' -> 'no files to replay' when there are no files available for replay.
2022-06-07 22:59:34 -07:00
sua yoo
fa4b71288c
Fix watch crawl running state (#249) 2022-06-07 12:04:35 -07:00
Ilya Kreymer
e3f268a2e8
CI setup for new swarm mode (#248)
- build backend and frontend with cacheing using GHA cache)
- streamline frontend image to reduce layers
- setup local swarm with test/setup.sh script, wait for containers to init
- copy sample config files as default (add storages.sample.yaml)
- add initial backend test for logging in with default superadmin credentials via 127.0.0.1:9871
- must use 127.0.0.1 instead of localhost for accessing frontend container within action
2022-06-06 09:34:02 -07:00
Ilya Kreymer
0c8a5a49b4 refactor to use docker swarm for local alternative to k8s instead of docker compose (#247):
- use python-on-whale to use docker cli api directly, creating docker stack for each crawl or profile browser
- configure storages via storages.yaml secret
- add crawl_job, profile_job, splitting into base and k8s/swarm implementations
- split manager into base crawlmanager and k8s/swarm implementations
- swarm: load initial scale from db to avoid modifying fixed configs, in k8s, load from configmap
- swarm: support scheduled jobs via swarm-cronjob service
- remove docker dependencies (aiodocker, apscheduler, scheduling)
- swarm: when using local minio, expose via /data/ route in nginx via extra include (in k8s, include dir is empty and routing handled via ingress)
- k8s: cleanup minio chart: move init containers to minio.yaml
- swarm: stateful set implementation to be consistent with k8s scaling:
  - don't use service replicas,
  - create a unique service with '-N' appended and allocate unique volume for each replica
  - allows crawl containers to be restarted w/o losing data
- add volume pruning background service, as volumes can be deleted only after service shuts down fully
- watch: fully simplify routing, route via replica index instead of ip for both k8s and swarm
- rename network btrix-cloud-net -> btrix-net to avoid conflict with compose network
2022-06-05 10:37:17 -07:00
Ilya Kreymer
bf79959a5a refactoring to use statefulsets + job (#245)
- use statefulsets instead of deployments for mongo, redis, signer
- use k8s job + statefulset for running crawls
- use separate statefulset for crawl (scaled) and single-replica redis stateful set
- move crawl job update login to crawl_updater
- remove shared redis chart

package refactor:
- move to shared code to 'btrixcloud'
- move k8s to 'btrixcloud.k8s'
- move docker to 'btrixcloud.docker'
2022-06-05 10:37:17 -07:00
sua yoo
502d687620
Enable duplicating and editing browser profile (#237)
* ensure editing other config options does not lose profile
* support adding/editing/removing profile of existing config
* when duplicating config, ensure profile setting is also copied in the duplicate
2022-06-04 08:26:19 -07:00
sua yoo
0c1dc2a1d1
Show crawl replay for running crawls (#235)
* show replay and watch at same time

* add separate section for watch

* only show replay if crawl has files, otherwise show 'no files' message
2022-06-04 08:19:09 -07:00
sua yoo
6a78bcd4aa
Delete browser profile (#243)
- delete browser profile, if not in use
- if in use, show error message, listing crawl configs that use the profile
- backend: fix check for confirming profile deletion
2022-06-01 19:18:41 -07:00
sua yoo
9cf1ed7d4d
copy yaml (#239) 2022-06-01 19:06:52 -07:00
Ilya Kreymer
aa1a2bf211 frontend: adjust api for websocket access checks 2022-06-01 15:08:50 -07:00
Ilya Kreymer
c023fe7c9a
Backend API prefix (#240)
* apply /api prefix consistently, both directly through backend and when accessing via frontend, fixes #236

* docs: update local deployment docs to use 9871 instead of 8000, don't expose 8000 by default

* schemas: don't include /openapi.json as /healthz in documentation, keep /healthz at root

* k8s: route backend to /api without additional rewriting
2022-05-31 19:29:20 -07:00
sua yoo
2355de3067
docs: remove extra comment 2022-05-31 14:13:17 -07:00
sua yoo
6e19e854be
Fix "Run now" button (#234) 2022-05-30 16:15:10 -07:00
Ilya Kreymer
955197579e frontend: support multi wacz replay using the crawl json as input 2022-05-20 09:11:23 -07:00
Ilya Kreymer
cdefb8d06e frontend: further nginx template, just rename to frontend.template -> frontend.conf.template 2022-05-13 11:29:09 -04:00
Andy Jackson
330c0347dc
frontend: ensure generated config file has correct .conf extension. (#228) 2022-05-13 10:10:40 -04:00
Ilya Kreymer
0fab6db75e
frontend: add nginx.conf to limit worker processes (#226)
set the number of nginx workers to 2 to avoid exceeding memory, which can happen with default worker_processes: auto due to the cpu limit setting.
2022-05-10 15:11:35 -04:00
sua yoo
bda817dadd
View and edit browser profile (#218) 2022-04-23 20:12:16 -07:00
sua yoo
f157e2031f
Filter and sort crawl templates (#217) 2022-04-23 20:11:53 -07:00
sua yoo
cb80c6767e
hotfix: update profile ID in crawl template 2022-04-20 19:40:30 -07:00
Ilya Kreymer
38869cdd24
crawl templates: check that lastCrawlState is not null (#220) 2022-04-20 19:17:24 -07:00
sua yoo
db27b6aaaf
View and edit browser profile (#214) 2022-04-19 10:44:21 -07:00
sua yoo
71eec4d915
Create crawl template with browser profile (#215) 2022-04-18 10:36:28 -07:00
Ilya Kreymer
73b8c64ba4 frontend profile browser: cover devtools sidebar with profile sidebar, add try/catch for localStorage override 2022-04-13 21:41:51 -07:00
sua yoo
f5993e8ad8
Create browser profile UI (#211) 2022-04-13 21:11:13 -07:00
sua yoo
d2653ae835
View browser profiles in UI (#209) 2022-04-13 21:10:22 -07:00
Ilya Kreymer
2f63c7dcf8
Profiles: Backend API + Nginx Devtools Proxy Support (#212)
* add profile creation, list endpoints at /archives/<aid>/profiles
* add profile browser creation, get, ping, commit, delete endpoints at /archives/<aid>/profiles/browser
* support creation of profile browser using browsertrix-crawler 'create-login-profile' in docker and k8s
* ensure profile browser expires after set time, k8s job or docker container automatically deleted on exit
* profile browser creation returns temporary browser id, or `{"detail": "waiting_for_browser"}` while waiting for browser container init
* nginx frontend: proxy /loadbrowser/ to port 9223 in browsertrix-crawler, connecting directly to chrome devtools
* profile api auth: use redis for auth
- store browserid->archiveid and browserid->browser ip mapping in redis
- browser apis: ensure profile browser is associated with specified archive
- browser ws: pass arcchiveid and browserid to ws query args, browserid is part of archive, and browserid corresponds to specified ip
* store profiles in /profiles/ directory in default storage, include profileid in profile tar.gz filename

* support profile in crawlconfig:
- add profileid to CrawlConfig, and profileName to CrawlConfigOut
- support resolving profile path via profileid, setting '--profile @{path/to/profile.tar.gz}' for crawler (assuming same storage for profile as output for now) in both docker and k8s setups
- docker: support out_filename, custom wacz output filename missing functionality
2022-04-13 19:36:06 -07:00
sua yoo
238ee8f7ee
delete unused component file 2022-04-11 13:18:23 -07:00
sua yoo
8828681e8e
hotfix: fix crawl sort control alignment 2022-04-11 13:13:53 -07:00
sua yoo
d4b3ae3795
delete unused component file 2022-04-11 13:10:23 -07:00
sua yoo
5307138202
enable opening crawl template in new tab 2022-04-11 13:03:19 -07:00
sua yoo
f90ef071de
enable opening crawl in new tab 2022-04-11 13:03:10 -07:00
sua yoo
29b586b03f
Edit crawl config as YAML (#207) 2022-04-06 17:40:25 -07:00
Ilya Kreymer
9a6483630e
Support for Admin interface for viewing web archives (#198)
* backend api
- superadmin has admin access to all archives
- new superadmin endpoints: /archives/all/crawls and /archives/all/crawls/<crawl_id>.json for list all running crawls
and loading crawl data by id

- frontend superadmin view (fixes #201)
* show all archives on superadmin home page
* show jump to crawl for super admin (#200)
* navbar links for: all archives, all running crawls and jump to crawl

Co-authored-by: sua yoo <sua@suayoo.com>
2022-04-06 12:42:04 -07:00
sua yoo
ec3a77b71e
Mobile layout fixes (#206)
closes #202
2022-03-30 15:54:25 -07:00
sua yoo
9e2274f612
remove temp file 2022-03-30 13:51:02 -07:00
Ilya Kreymer
9e45dc35d2
minor frontend-tweaks: (#196)
* frontend-tweaks:
- treat 'starting' state same as 'running'
- default to no schedule instead of weekly for default
- add 'Domain' scopeType

* backend: also allow 'domain' as a scopeType
2022-03-15 21:19:23 -07:00
sua yoo
8863776c54
Define websocket host in common webpack config (#195)
* move websocket host var to common config, better fix for #193
2022-03-15 18:34:49 -07:00
Ilya Kreymer
912004751d quickfix: partial mitigation for #193, use current host for websock address 2022-03-14 15:29:35 -07:00
sua yoo
6fabea3e7a
Frontend build fixes (#191)
* copy specific files
* replace api host env var
* remove unused dotenv
* Update frontend/webpack.dev.js
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
2022-03-10 23:26:21 -08:00
sua yoo
4190e40964
Show last crawl state in UI (#192)
* update crawl list status

* show on detail page
2022-03-10 23:25:42 -08:00
sua yoo
edf6b9ded7
Update home page routing (#186)
closes #183
2022-03-04 16:18:41 -08:00
sua yoo
0fe54653be
Fix unable to save edits to simple view (#185) 2022-03-04 16:17:57 -08:00
sua yoo
f2f67c34af
Copy extra hops value when duplicating crawl config (#184)
closes #158
2022-03-04 16:17:37 -08:00
sua yoo
4383c5e8d8
Disable error tracking in prod (#182)
closes #161
2022-03-04 16:17:05 -08:00
Ilya Kreymer
cdd0ab34a3
Watch Stream Directly from Browsertrix Crawler (#189)
* watch work: proxy directly to crawls instead of redis pubsub
- add 'watchIPs' to crawl detail output
- cache crawl ips for quick access for auth
- add '/ipaccess/{ip}' endpoint for watch ws connection to ensure ws has access to the specified container ip
- enable 'auth_request' in nginx frontend
- requirements: update to latest redis-py
remaining fixes for #134
2022-03-04 14:55:11 -08:00
sua yoo
c18418ff09
Show invite message to super admin & layout fixes (#181) 2022-03-02 18:09:26 -08:00
sua yoo
fe31f551b2
Add "crawler" role to members (#174)
closes #139
2022-03-02 18:09:10 -08:00
sua yoo
c888a45d97
Fix seed URLs reset on JSON view toggle (#172)
closes #160
2022-03-02 18:08:45 -08:00
sua yoo
373c489b00
Watch crawl from crawl detail page (#156)
closes #164
closes #134 

Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
2022-03-02 18:08:08 -08:00
sua yoo
83ded98081
Set and update crawl scale (#162)
closes #143
2022-02-28 09:14:27 -08:00
sua yoo
3fe3691e74
Update crawl run duration at intervals (#155)
fixes #138
2022-02-23 16:14:01 -08:00
Ilya Kreymer
bb9c953d92
Add License, Logo and README updates for release (#157)
* Add new logo (for now)
* Add agpl license
* Minor README cleanup
2022-02-23 12:10:46 -08:00
sua yoo
4af30a02be
Archive and crawl navigation improvements (#154) 2022-02-23 09:19:48 -08:00
sua yoo
b5874c3f8c
call super disconnected callback after custom callback 2022-02-22 15:59:55 -08:00
sua yoo
c563216582
Allow user to edit crawl template (#147)
closes #144
2022-02-22 13:54:25 -08:00
Ilya Kreymer
8ede386a8b
docker image fix (#151)
* frontend docker build: pass GIT_COMMIT_HASH and GIT_BRANCH_NAME as env vars to remove dependency on git in webpack.config.js (for glitchtip)
fixes #150

* default to "unknown" if git and env vars not available

* add comment about error reporting for local use

Co-authored-by: sua yoo <sua@suayoo.com>
2022-02-22 10:52:27 -08:00
Ilya Kreymer
9bd402fa17
New WS Endpoint for Watching Crawl (#152)
* backend support for new watch system (#134):
- support for watch via redis pubsub and websocket connection to backend
- can support watch from any number of crawler instances to support scaled crawls
- use /archives/{aid}/crawls/{crawl_id}/watch/ws websocket endpoint
- ws: ignore graceful connectionclosedok exception, log other exceptions
- set logging to info to instead of debug for now (debug logs all ws traffic)
- remove old watch apis in backend
- remove old websocket routing to crawler instance for old watch system
- oauth bearer check: support websockets, use websocket object if no request object
- crawler args: replace --screencastPort with --screencastRedis
2022-02-22 10:33:10 -08:00
sua yoo
f30b398fea
Deactivate crawl templates in UI (#145)
wip #144
2022-02-21 11:37:15 -08:00
sua yoo
aa645d9b15
Enable frontend exception tracking (#140) 2022-02-18 10:34:07 -08:00
Ilya Kreymer
e9d6c68f6a frontend: replay: use single wacz replay for now (using first wacz file) 2022-02-15 08:34:14 -08:00
sua yoo
c577e36b74
add debug for access token 2022-02-08 17:52:27 -08:00
sua yoo
02f46f108b
Crawl & crawl config UX improvements (#136) 2022-02-01 14:28:07 -08:00
sua yoo
d7f58c964c
Fix in-app link UX (#132)
closes #130, closes #113
2022-01-31 17:36:50 -08:00
Ilya Kreymer
adb5c835f2
Presign and replay (#127)
* support for replay via replayweb.page embed, fixes #124

backend:
- pre-sign all files urls
- cache pre-signed urls in redis, presign again when expired (default duration 3600, settable via PRESIGN_DURATION_SECONDS env var)
- change files output -> resources to confirm to Data Package spec supported by replayweb.page
- add CrawlFileOut which contains 'name' (file id), 'path' (presigned url), 'hash', and 'size'
- add /replay/sw.js endpoint to import sw.js from latest replay-web-page release
- update to fastapi-users 9.2.2
- customize backend auth to allow authentication to check 'auth_bearer' query arg if 'Authorization' header not set
- remove sw.js endpoint, handling in frontend

frontend:
- add <replay-web-page> to frontend, include rwp ui.js from latest release in index.html for now
- update crawl api endpoint to end in json
- replay-web-page loads the api endpoint directly!
- update Crawl type to use new format, 'resources' -> instead of 'files', each file has 'name' and 'path'

- nginx: add endpoint to serve the replay sw.js endpoint
- add defer attr to ui.js
- move 'Download' to 'Download Files'

* frontend: support customizing replayweb.page loading url via RWP_BASE_URL env var in Dockerfile
- default prod value set in frontend Dockerfile (set to upcoming 1.5.8 release needed for multi-wacz-file support) (can be overridden during image build via --build-arg)
- rename index.html -> index.ejs to allow interpolation
- RWP_BASE_URL defaults to latest https://replayweb.page/ for testing
- for local testing, add sw.js loading via devServer, also using RWP_BASE_URL (#131)

Co-authored-by: sua yoo <sua@suayoo.com>
2022-01-31 17:02:15 -08:00
sua yoo
336cf11521
Fix "View crawl" links (#129)
* update key

* update in crawl config
2022-01-31 15:45:48 -08:00
sua yoo
d7c0877403
Refactor archive tabs & navigation improvements (#123)
closes #112
2022-01-31 15:45:36 -08:00
sua yoo
9de1a3a003
fix stopping gracefully feedback 2022-01-31 12:02:10 -08:00
Ilya Kreymer
523b557eac replay route: (prepare for replay, #124)
- add support for /replay/sw.js
- ensure route works in both k8s and docker (routed via main nginx)
2022-01-31 11:18:10 -08:00
sua yoo
be4bf3742f
Initial crawl detail page (#108) 2022-01-30 18:36:43 -08:00
sua yoo
7c067ffe36
Crawl template enhancements (#114)
closes #100
2022-01-30 18:30:54 -08:00
sua yoo
b93ca4e833
Add empty state for crawls (#121) 2022-01-29 15:55:44 -08:00
sua yoo
7777a22829
Poll crawls list & add additional details (#116) 2022-01-29 14:37:16 -08:00
sua yoo
2636f33123
Make crawl list interactive (#109)
- Cancel and stop crawl
- Sorts crawls by start time, status and crawl template ID
- Filters crawls by crawl template ID
- Adds shortcut to copy template ID
2022-01-29 10:38:58 -08:00
sua yoo
28b59130a6
Initial list of crawls (#105) 2022-01-27 19:28:19 -08:00
sua yoo
5fccd07329
Edit crawl schedule (#103) 2022-01-26 22:11:32 -08:00
Ilya Kreymer
0bea0cfff2
crawl config new template: add support for 'extraHops' config option (available in browsertrix-crawler 0.5.0) (#104)
frontend:
- add checkbox to basic crawl config component which sets 'extraHops' to 1, otherwise to 0
- text tweaks: rename Scope Type -> Crawl Scope, capitalization

backend: add 'extraHops' to CrawlConfig
fixes #102
2022-01-26 21:18:22 -08:00
sua yoo
2666b6f6aa
Duplicate crawl config from list (#99) 2022-01-25 17:07:54 -08:00
sua yoo
3a461d86d4
Crawl config detail views (#97) 2022-01-25 11:56:34 -08:00
sua yoo
9ed216ba05
Run and delete crawl templates from list view (#94) 2022-01-22 14:18:02 -08:00
sua yoo
ec1a758e42
Upgrade TailwindCSS to v3 (#93) 2022-01-19 22:00:06 -08:00
sua yoo
cb5cf55c69
Add helper for dispatching notify events (#92) 2022-01-19 21:01:47 -08:00
sua yoo
22942527e9
Refactor crawl config into multiple components (#91) 2022-01-19 18:43:19 -08:00
sua yoo
3645e3b096
Create crawl config UX enhancements (#90)
closes #87
2022-01-19 11:01:17 -08:00
sua yoo
c3edb4bba4
Allow user to configure crawls with JSON (#86) 2022-01-18 19:58:55 -08:00
sua yoo
ff77a92108
Schedule time of day when creating config (#85) 2022-01-18 13:58:28 -08:00
sua yoo
f58bc801fc
remove unused package dependency 2022-01-16 14:49:21 -08:00
sua yoo
b2088f5634
Add initial crawl template form (#80) 2022-01-16 14:43:33 -08:00
sua yoo
b9622df9e6
Update app layout (#78)
closes #75
2022-01-13 17:52:59 -08:00
sua yoo
5492eca117
Fix login page update side effect (#77)
closes #76
2022-01-12 16:43:34 -08:00
sua yoo
5258149a34
Fix password autocomplete and suggestions (#73)
closes #72
2022-01-08 14:28:54 -08:00
Ilya Kreymer
684f112c85 dockerfile hotfix: remove unused 'scripts' directory 2021-12-07 11:22:30 -08:00
sua yoo
83be07bdcf
remove extra word in readme 2021-12-07 09:42:52 -08:00
sua yoo
2c3debfec6
Show invite info to user (#68) 2021-12-07 09:36:03 -08:00
sua yoo
3324bd960f
Refactor to remove sign up and JWT env variables (#65)
closes #63
closes #66
2021-12-06 19:39:04 -08:00
sua yoo
e787e751d9
update alert import (#64) 2021-12-06 18:49:42 -08:00
sua yoo
ddf1b63313
fix invite token logic 2021-12-05 19:01:21 -08:00