browsertrix/backend/btrixcloud
Vinzenz Sinapius bb6e703f6a
Configure browsertrix proxies (#1847)
Resolves #1354

Supports crawling through preconfigured proxy servers, allowing users to select which proxy server to use (requires browsertrix-crawler 1.3+)

Config:
- proxies defined in btrix-proxies subchart
- can be configured via the btrix-proxies key, or via a separate proxies.yaml file deployed with a separate subchart
- if the subchart is deployed, the proxies list is refreshed automatically whenever crawler_proxies.json changes
- support for ssh and socks5 proxies
- proxy keys added to secrets in subchart
- support for a default proxy that is always used if no other proxy is configured; prevent starting the cluster if the default proxy is not available
- prevent starting a manual crawl if its previously configured proxy is no longer available, and return an error
- force the 'btrix' username and group name for the browsertrix-crawler non-root user to support ssh
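
The default-proxy and unavailable-proxy rules in the Config list can be illustrated with a small sketch. This is not the btrixcloud code: `Proxy`, `ProxyUnavailableError`, `check_default_proxy` and `resolve_proxy_id` are hypothetical names chosen for illustration, and proxies are assumed to be held in a dict keyed by proxy id.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Proxy:
    """Hypothetical minimal proxy record (id plus connection URL)."""

    id: str
    url: str  # e.g. "socks5://host:1080" or "ssh://user@host:22"


class ProxyUnavailableError(Exception):
    """Hypothetical error: a referenced proxy is not in the configured list."""


def check_default_proxy(configured: Dict[str, Proxy], default_id: Optional[str]) -> None:
    """Startup check: refuse to start if a default proxy is set but missing."""
    if default_id and default_id not in configured:
        raise ProxyUnavailableError(f"default proxy '{default_id}' not configured")


def resolve_proxy_id(
    requested: Optional[str],
    configured: Dict[str, Proxy],
    default_id: Optional[str],
) -> Optional[str]:
    """Return the proxy id a crawl should use, or None for a direct connection."""
    if requested:
        # A proxy previously saved on the workflow must still exist;
        # otherwise the crawl is not started and an error is raised.
        if requested not in configured:
            raise ProxyUnavailableError(f"proxy '{requested}' no longer available")
        return requested
    # No proxy chosen: fall back to the default proxy (may be None).
    return default_id


if __name__ == "__main__":
    proxies = {"eu-socks": Proxy("eu-socks", "socks5://10.0.0.5:1080")}
    check_default_proxy(proxies, "eu-socks")
    print(resolve_proxy_id(None, proxies, "eu-socks"))  # eu-socks
    try:
        resolve_proxy_id("us-ssh", proxies, "eu-socks")
    except ProxyUnavailableError as err:
        print("error:", err)  # error: proxy 'us-ssh' no longer available
```

Raising an exception rather than silently falling back to the default keeps a workflow from quietly crawling through a different proxy than the one it was configured with.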

Operator:
- support crawling through proxies, pass proxyId in CrawlJob
- support running profile browsers with a designated proxy, pass proxyId to ProfileJob
- prevent starting scheduled crawl if previously configured proxy is no longer available

API / Access:
- /api/orgs/all/crawlconfigs/crawler-proxies - get all proxies (superadmin only)
- /api/orgs/{oid}/crawlconfigs/crawler-proxies - get proxies available to particular org
- /api/orgs/{oid}/proxies - update allowed proxies for particular org (superadmin only)
- superadmin can configure which orgs can use which proxies, stored on the org
- superadmin can also allow an org to access all 'shared' proxies, to avoid having to allow a shared proxy on each org.
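
The per-org access rule above (dedicated proxies stored on the org, plus an opt-in for all shared proxies) amounts to a simple filter. The sketch below is hypothetical: `Proxy.shared`, `OrgProxyAccess`, `allowed_proxies`, `allow_shared_proxies` and `proxies_for_org` are illustrative names, not the actual btrixcloud models or fields.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Proxy:
    """Hypothetical proxy record; `shared` marks proxies any opted-in org may use."""

    id: str
    shared: bool = False


@dataclass
class OrgProxyAccess:
    """Hypothetical per-org settings edited by a superadmin."""

    allowed_proxies: List[str] = field(default_factory=list)  # dedicated proxy ids
    allow_shared_proxies: bool = False


def proxies_for_org(all_proxies: List[Proxy], org: OrgProxyAccess) -> List[Proxy]:
    """Proxies visible to an org: its allowed ones, plus shared ones if enabled."""
    allowed = set(org.allowed_proxies)
    return [
        p
        for p in all_proxies
        if p.id in allowed or (p.shared and org.allow_shared_proxies)
    ]


if __name__ == "__main__":
    proxies = [Proxy("dedicated-1"), Proxy("shared-1", shared=True), Proxy("other")]
    org = OrgProxyAccess(allowed_proxies=["dedicated-1"], allow_shared_proxies=True)
    print([p.id for p in proxies_for_org(proxies, org)])  # ['dedicated-1', 'shared-1']
```

With a single shared-proxies flag, a newly added shared proxy becomes visible to every opted-in org without editing each org's allowed list.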

UI:
- Superadmin has an 'Edit Proxies' dialog to configure, for each org, its dedicated proxies and whether it has access to shared proxies.
- User can select a proxy in Crawl Workflow browser settings
- Users can choose to launch a browser profile with a particular proxy
- Display which proxy was used to create each profile in the profile selector
- Users can choose which default proxy to use for new workflows in Crawling Defaults

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2024-10-02 18:35:45 -07:00
migrations fix resetting of invalid logins: (#2002) 2024-08-07 12:36:06 -07:00
operator Configure browsertrix proxies (#1847) 2024-10-02 18:35:45 -07:00
__init__.py
auth.py Include user and user org info in login response (#2014) 2024-08-12 18:51:42 -07:00
background_jobs.py api docs cleanup + readd webhooks: (#1949) 2024-07-22 09:00:59 -07:00
basecrawls.py Serialize datetimes with Z suffix (#2058) 2024-09-12 16:16:13 -07:00
colls.py Document all API endpoints with response models (#1928) 2024-07-16 12:48:38 -07:00
crawlconfigs.py Configure browsertrix proxies (#1847) 2024-10-02 18:35:45 -07:00
crawlmanager.py Configure browsertrix proxies (#1847) 2024-10-02 18:35:45 -07:00
crawls.py Configure browsertrix proxies (#1847) 2024-10-02 18:35:45 -07:00
db.py Serialize datetimes with Z suffix (#2058) 2024-09-12 16:16:13 -07:00
emailsender.py Subscription Update Quotas (#1988) 2024-08-05 15:59:47 -07:00
invites.py security: tweak get /invite endpoints / InviteOut to: (#2087) 2024-09-20 11:52:56 -07:00
k8sapi.py Configure browsertrix proxies (#1847) 2024-10-02 18:35:45 -07:00
main_op.py Add superuser API endpoints to export and import org data (#1394) 2024-07-02 17:14:34 -04:00
main.py Configure browsertrix proxies (#1847) 2024-10-02 18:35:45 -07:00
models.py Configure browsertrix proxies (#1847) 2024-10-02 18:35:45 -07:00
orgs.py Configure browsertrix proxies (#1847) 2024-10-02 18:35:45 -07:00
pages.py Serialize datetimes with Z suffix (#2058) 2024-09-12 16:16:13 -07:00
pagination.py Format backend with Black 24 (#1507) 2024-02-07 11:35:34 -08:00
profiles.py Configure browsertrix proxies (#1847) 2024-10-02 18:35:45 -07:00
storages.py Implement downloading archived item + QA runs as multi-WACZ (#1933) 2024-07-25 10:28:57 -07:00
subs.py Subscription Update Quotas (#1988) 2024-08-05 15:59:47 -07:00
uploads.py optimize org quota lookups (#1973) 2024-07-25 14:00:16 -07:00
users.py Configure browsertrix proxies (#1847) 2024-10-02 18:35:45 -07:00
utils.py Serialize datetimes with Z suffix (#2058) 2024-09-12 16:16:13 -07:00
version.py version: bump to 1.12.0-beta.0 2024-09-12 14:30:15 -07:00
webhooks.py Add webhooks for qaAnalysisStarted, qaAnalysisFinished, and crawlReviewed (#1974) 2024-07-25 16:53:49 -07:00