Commit Graph

14 Commits

Author SHA1 Message Date
Tessa Walsh
07fa46d9aa
Add custom user agent to workflows (#1465)
Fixes #1341

Adds "User Agent" field to workflow editor under the Browser Settings
tab. If not set, the crawler will use the browser's default user agent.

Also added to docs and to the workflow details page (if set).

---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
2024-01-17 17:33:50 -05:00
Tessa Walsh
032859f361
Support multiple crawler versions (#1420)
Fixes #1385 

## Changes
Supports multiple crawler 'channels' which can be configured to
different browsertrix-crawler versions
- Replaces `crawler_image` in helm chart with `crawler_channels` array
similar to how storages are handled
- The `default` crawler channel must always be provided and specifies
the default crawler image
- Adds backend `/orgs/{oid}/crawlconfigs/crawler-channels` API endpoint
to fetch information about available crawler versions (name, image, and
label) and test
- Adds crawler channel select to workflow creation/edit screens and
profile creation dialog, and updates related API endpoints and
configmaps accordingly. The select dropdown is shown only if more than
one channel is configured.
- Adds `crawlerChannel` to workflow and crawl details.
- Add `image` to crawler image, used to display actual image used as
part of the crawl.
- Modifies `crawler_crawl_id` backend test fixture to use `test` crawler
version to ensure crawler versions other than latest work
- Adds migration to add `crawlerChannel` set to `default` to existing
workflow and profile objects and workflow configmaps

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2024-01-16 15:32:12 -08:00
Henry Wilkinson
ae8804d87f
Improves user documentation intro (#1376)
Closes #1369 

### Changes
- Adds improved getting started steps and intro contact information to
the User Guide homepage
- Adds a small section about the execution minutes graph for orgs with a
quota set
- Moves existing signup content to a dedicated signup page
- Changes admonitions from using em dashes to using colons.
- Em dashes are great and I love em.... But sometimes I love them a
little _too_ much and they were a bad fit here.
- Fixes user guide homepage link
- Fixes `ReplayWeb.page` and `ArchiveWeb.page` names
- Fixes broken links (would be good to have a CI system for this I
think)

---------
Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2023-11-15 17:55:47 -08:00
Henry Wilkinson
0bd8748e68
Minor Workflow Creator UX Changes (#1267)
- Adds `position: sticky` to the workflow creator / editor controls to
affix them to the bottom of the screen, they are now always visible!
- Renames "Extra URLs in Scope" to "Extra URL Prefixes in Scope"
- Updates documentation accordingly
- Adjusts casing for checkboxes
- Adds the multiplication sign to the crawler instances settings to
better communicate that they are increases in scale and not arbitrary
numbers.
2023-10-13 16:55:54 -07:00
Tessa Walsh
b1ead614ee
Add --failOnFailedSeed checkbox to URL list workflows (#1236)
- If set, and any of the seeds fails, the entire crawl is marked as a failure.
- Add checkbox which adds --failOnFailedSeed checkbox to URL list workflows
- Add 'Fail Crawl On Failed URL' to crawl workflow setup docs
2023-10-03 18:46:09 -07:00
Tessa Walsh
e667fe2e97
Add max crawl size option to backend and frontend (#1045)
Backend:
- add 'maxCrawlSize' to models and crawljob spec
- add 'MAX_CRAWL_SIZE' to configmap
- add maxCrawlSize to new crawlconfig + update APIs
- operator: gracefully stop crawl if current size (from stats) exceeds maxCrawlSize
- tests: add max crawl size tests

Frontend:
- Add Max Crawl Size text box Limits tab
- Users enter max crawl size in GB, convert to bytes
- Add BYTES_PER_GB as constant for converting to bytes
- docs: Crawl Size Limit to user guide workflow setup section

Operator Refactor:
- use 'status.stopping' instead of 'crawl.stopping' to indicate crawl is being stopped, as changing later has no effect in operator
- add is_crawl_stopping() to return if crawl is being stopped, based on crawl.stopping or size or time limit being reached
- crawlerjob status: store byte size under 'size', human readable size under 'sizeHuman' for clarity
- size stat always exists so remove unneeded conditional (defaults to 0)
- store raw byte size in 'size', human readable size in 'sizeHuman'

Charts:
- subchart: update crawlerjob crd in btrix-crds to show status.stopping instead of spec.stopping
- subchart: show 'sizeHuman' property instead of 'size'
- bump subchart version to 0.1.1

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2023-08-26 22:00:37 -07:00
Henry Wilkinson
2952988864
docs: formatting fixes & minor content updates (#1091)
Additional tweaks on Browser Profiles pages + general consistency pass
2023-08-21 13:26:43 -07:00
Henry Wilkinson
02a01e7abb
docs: Adds information about 1.6 features to documentation (#1086)
* 1.6 docs update

### Changes

- Adds note in style guide about referencing actions in the app
- Adds page for Browser Profiles
  - Adds callout for uploads in the context of combining items from multiple sources
- Adds page for Collections
- Adds page for Crawl Workflows
- Updates index to link to new dedicated Crawl Workflow page in addition to the Crawl Workflow Setup page
- Updates Org Settings page action styling in accordance with new rules
- Updates Crawl Workflow Setup page with links to the new pages and a hierarchy fix for the first item
- Updates user guide navigation with a new section for crawling related items
---------

Co-authored-by: sua yoo <sua@webrecorder.org>
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
2023-08-18 21:55:20 -07:00
Tessa Walsh
d5c3a8519f
Add crawler Use Sitemap option to Browsertrix Cloud (#978)
* Add user-guide docs for Use Sitemap option
---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2023-07-19 13:57:52 -04:00
Henry Wilkinson
d9e73fcbc3
Reorder Limits section (#966)
* Reorder Limits section

- Minor text change to section names
  - "Limit Per Page" → "Per-Page Limits"
  - "Limit Per Crawl" → "Per-Crawl Limits"

* Reorder limits section in documentation
2023-07-08 08:54:30 -07:00
Henry Wilkinson
ac4716614e
Minor gramatical changes to documentation (#919) 2023-07-04 17:14:49 -04:00
Tessa Walsh
bd6dc79449
Add frontend support for auto-adding collections to workflows (#916)
- Adds collections search and list to workflow editor
- Adds collections to workflow details component
- Adds namePrefix filter to backend GET /orgs/{oid}/collections endpoint to support case-insensitive searching of collections
- Adds documentation for new setting

---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
2023-06-12 18:18:05 -07:00
Henry Wilkinson
79703baa69
Org Settings documetation & Getting Started docs page updates 2023-06-11 17:39:16 -04:00
Henry Wilkinson
8477919989
Adds all workflow settings to the user docs with descriptions (#894)
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2023-06-08 14:28:58 -04:00