Resolves #1354

Supports crawling through pre-configured proxy servers, allowing users to select which proxy servers to use (requires browsertrix crawler 1.3+)

Config:
- proxies defined in btrix-proxies subchart
- can be configured via btrix-proxies key or a separate proxies.yaml file via separate subchart
- proxies list refreshed automatically if crawler_proxies.json changes, if subchart is deployed
- support for ssh and socks5 proxies
- proxy keys added to secrets in subchart
- support for a default proxy to always be used if no other proxy is configured; prevent starting cluster if default proxy is not available
- prevent starting manual crawl if previously configured proxy is no longer available, return error
- force 'btrix' username and group name on browsertrix-crawler non-root user to support ssh

Operator:
- support crawling through proxies, pass proxyId in CrawlJob
- support running profile browsers with designated proxy, pass proxyId to ProfileJob
- prevent starting scheduled crawl if previously configured proxy is no longer available

API / Access:
- /api/orgs/all/crawlconfigs/crawler-proxies - get all proxies (superadmin only)
- /api/orgs/{oid}/crawlconfigs/crawler-proxies - get proxies available to particular org
- /api/orgs/{oid}/proxies - update allowed proxies for particular org (superadmin only)
- superadmin can configure which orgs can use which proxies, stored on the org
- superadmin can also allow an org to access all 'shared' proxies, to avoid having to allow a shared proxy on each org

UI:
- Superadmin has an 'Edit Proxies' dialog to configure, for each org, whether it has dedicated proxies and whether it has access to shared proxies
- User can select a proxy in Crawl Workflow browser settings
- Users can choose to launch a browser profile with a particular proxy
- Display which proxy was used to create a profile in the profile selector
- Users can choose which default proxy to use for new workflows in Crawling Defaults

---------

Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
36 lines
719 B
YAML
apiVersion: btrix.cloud/v1
kind: ProfileJob
metadata:
  name: profilejob-{{ id }}
  labels:
    browser: "{{ id }}"
    role: "job"
    btrix.org: {{ oid }}
    btrix.user: {{ userid }}
    {%- if base_profile %}
    btrix.baseprofile: "{{ base_profile }}"
    {%- endif %}
    btrix.storage: "{{ storage_name }}"

spec:
  selector:
    matchLabels:
      browser: "{{ id }}"

  id: "{{ id }}"
  userid: "{{ userid }}"
  oid: "{{ oid }}"

  storageName: "{{ storage_name }}"
  crawlerImage: "{{ crawler_image }}"

  startUrl: "{{ url }}"
  profileFilename: "{{ profile_filename }}"
  vncPassword: "{{ vnc_password }}"

  proxyId: "{{ proxy_id }}"

  {% if expire_time %}
  expireTime: "{{ expire_time }}"
  {% endif %}
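
As a rough illustration, the template above might render to something like the following when no base profile is set and an expiry time is provided. Every value here (the IDs, image tag, storage name, proxy name, password, and timestamp) is a hypothetical placeholder rather than something taken from the source, and the exact blank lines in real output depend on how Jinja whitespace trimming is configured.

```yaml
# Hypothetical rendered ProfileJob; all values below are made-up placeholders.
apiVersion: btrix.cloud/v1
kind: ProfileJob
metadata:
  name: profilejob-abc123
  labels:
    browser: "abc123"
    role: "job"
    btrix.org: 11111111-2222-3333-4444-555555555555
    btrix.user: 66666666-7777-8888-9999-000000000000
    # btrix.baseprofile is omitted because base_profile is unset in this example
    btrix.storage: "default"

spec:
  selector:
    matchLabels:
      browser: "abc123"

  id: "abc123"
  userid: "66666666-7777-8888-9999-000000000000"
  oid: "11111111-2222-3333-4444-555555555555"

  storageName: "default"
  crawlerImage: "example/browsertrix-crawler:latest"

  startUrl: "https://example.com/"
  profileFilename: ""
  vncPassword: "placeholder-password"

  # proxyId is the field this change adds; "example-proxy" is an assumed proxy id
  proxyId: "example-proxy"

  expireTime: "2024-01-01T00:00:00Z"
```

The `proxyId` field is how the chosen proxy reaches the operator when it launches the profile browser, as described in the commit message.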