Documentation Update for Pausing and Resuming Crawl section (#2639)

- Rename 'Modifying Running Crawls' to 'Running Crawls'
- Add section about pausing/resuming crawls, and that paused crawls will eventually become stopped if not resumed.
- Add new crawl pausing, paused, resuming statuses and icons.

Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
DaleLore 2025-06-11 00:47:03 -04:00 committed by GitHub
parent 3fa0c68922
commit 1a6d2a20c2
GPG Key ID: B5690EEEBB952194
4 changed files with 41 additions and 14 deletions


@ -0,0 +1,4 @@
<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" fill="currentColor" class="bi bi-pause-circle" viewBox="0 0 16 16">
<path d="M8 15A7 7 0 1 1 8 1a7 7 0 0 1 0 14m0 1A8 8 0 1 0 8 0a8 8 0 0 0 0 16"/>
<path d="M5 6.25a1.25 1.25 0 1 1 2.5 0v3.5a1.25 1.25 0 1 1-2.5 0zm3.5 0a1.25 1.25 0 1 1 2.5 0v3.5a1.25 1.25 0 1 1-2.5 0z"/>
</svg>



@ -0,0 +1,4 @@
<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" fill="currentColor" class="bi bi-play-circle" viewBox="0 0 16 16">
<path d="M8 15A7 7 0 1 1 8 1a7 7 0 0 1 0 14m0 1A8 8 0 1 0 8 0a8 8 0 0 0 0 16"/>
<path d="M6.271 5.055a.5.5 0 0 1 .52.038l3.5 2.5a.5.5 0 0 1 0 .814l-3.5 2.5A.5.5 0 0 1 6 10.5v-5a.5.5 0 0 1 .271-.445"/>
</svg>



@ -1,4 +1,4 @@
@import './theme.css';
@import "./theme.css";
/* Font style definitions */
@font-face {
@ -8,9 +8,9 @@
font-display: swap;
src: url("https://cdn.webrecorder.net/fonts/recursive/recursive-latin.woff2")
format("woff2");
unicode-range: U+0000-00FF, U+0131, U+0152-0153, U+02BB-02BC, U+02C6,
U+02DA, U+02DC, U+0304, U+0308, U+0329, U+2000-206F, U+2074, U+20AC,
U+2122, U+2191, U+2193, U+2212, U+2215, U+FEFF, U+FFFD;
unicode-range: U+0000-00FF, U+0131, U+0152-0153, U+02BB-02BC, U+02C6, U+02DA,
U+02DC, U+0304, U+0308, U+0329, U+2000-206F, U+2074, U+20AC, U+2122, U+2191,
U+2193, U+2212, U+2215, U+FEFF, U+FFFD;
}
@font-face {
@ -141,7 +141,10 @@ h3 {
}
.md-typeset {
font-feature-settings: "ss04" off,"ss07" on,"ss08" on;
font-feature-settings:
"ss04" off,
"ss07" on,
"ss08" on;
}
/* Custom badge classes, applies custom overrides to inline-code blocks */


@ -1,4 +1,4 @@
# Modifying Running Crawls
# Running Crawls
Running crawls can be modified from the crawl workflow **Latest Crawl** tab. You may want to modify a running crawl if you find that the workflow is crawling pages that you didn't intend to archive, or if you want a boost of speed.
@ -8,17 +8,20 @@ A crawl workflow that is in progress can be in one of the following states:
| Status | Description |
| ---- | ---- |
| <span class="status-waiting">:bootstrap-hourglass-split: Waiting</span> | The workflow can't start running yet but it is queued to run when resources are available. |
| <span class="status-waiting">:btrix-status-dot: Starting</span> | New resources are starting up. Crawling should begin shortly.|
| <span class="status-success">:btrix-status-dot: Running</span> | The crawler is finding and capturing pages! |
| <span class="status-waiting">:btrix-status-dot: Stopping</span> | A user has instructed this workflow to stop. Finishing capture of the current pages.|
| <span class="status-waiting">:btrix-status-dot: Finishing Downloads</span> | The workflow has finished crawling and is finalizing downloads.|
| <span class="status-waiting">:btrix-status-dot: Generating WACZ</span> | Data is being packaged into WACZ files.|
| <span class="status-waiting">:btrix-status-dot: Uploading WACZ</span> | WACZ files have been created and are being transferred to storage.|
| <span class="status-violet-600">:bootstrap-hourglass-split: Waiting</span> | The workflow can't start running yet but it is queued to run when resources are available. |
| <span class="status-violet-600">:btrix-status-dot: Starting</span> | New resources are starting up. Crawling should begin shortly.|
| <span class="status-green-600">:btrix-status-dot: Running</span> | The crawler is finding and capturing pages! |
| <span class="status-violet-600">:bootstrap-pause-circle: Pausing</span> | The workflow is in the process of being paused. |
| <span class="status-neutral-500">:bootstrap-pause-circle: Paused</span> | The workflow is currently paused. |
| <span class="status-violet-600">:bootstrap-play-circle: Resuming</span> | The workflow is in the process of resuming after being paused. |
| <span class="status-violet-600">:btrix-status-dot: Stopping</span> | A user has instructed this workflow to stop. Finishing capture of the current pages.|
| <span class="status-violet-600">:btrix-status-dot: Finishing Downloads</span> | The workflow has finished crawling and is finalizing downloads.|
| <span class="status-violet-600">:btrix-status-dot: Generating WACZ</span> | Data is being packaged into WACZ files.|
| <span class="status-violet-600">:btrix-status-dot: Uploading WACZ</span> | WACZ files have been created and are being transferred to storage.|
## Watch Crawl
You can watch the current state of the browser windows as the crawler visit pages in the **Watch** tab of **Latest Crawl**. A list of queued URLs are displayed below in the **Upcoming Pages** section.
You can watch the current state of the browser windows as the crawler visits pages in the **Watch** tab of **Latest Crawl**. A list of queued URLs is displayed below in the **Upcoming Pages** section.
## Live Exclusion Editing
@ -34,6 +37,19 @@ Like exclusions, the number of [browser windows](workflow-setup.md#browser-windo
Unlike exclusions, this change will not be applied to future workflow runs.
## Pausing and Resuming Crawls
If you need to reassess or rescope your crawl at any point after it has started, you can pause the running crawl.
To pause a running crawl, click the *Pause* button. The crawl status will change from *Running* to *Pausing* as in-progress pages are completed, and then to *Paused* once the crawler is successfully paused. Paused crawls do not continue to accrue execution time.
While a crawl is paused, it is possible to replay the pages crawled up to that point and to download the WACZ files from the *Latest Crawl* tab.
To resume a paused crawl, click the *Resume* button. The crawl status will update from *Resuming* to *Running* to indicate that the crawler has started crawling again. Any changes to the workflow settings will be applied in the resumed crawl.
???+ Note
Paused crawls that are not resumed within 7 days of being paused are automatically updated to *Stopped*. Once stopped, the crawl is finished and can no longer be resumed.
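The pause-related status changes and the 7-day limit described above can be sketched as follows. This is a minimal illustration only: the status strings, `TRANSITIONS`, `PAUSE_LIMIT`, and the helper functions are hypothetical names chosen for this example and are not part of the Browsertrix API. The exact behavior at the 7-day boundary is also an assumption.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the 7-day limit is taken from the note above. All names in this
# sketch are hypothetical and do not correspond to Browsertrix internals.
PAUSE_LIMIT = timedelta(days=7)

# Only the pause-related transitions described in this section are modeled.
TRANSITIONS = {
    "running": {"pausing"},
    "pausing": {"paused"},
    "paused": {"resuming", "stopped"},
    "resuming": {"running"},
    "stopped": set(),  # a stopped crawl is finished and cannot be resumed
}


def can_transition(current: str, new: str) -> bool:
    """Return True if this section describes a move from `current` to `new`."""
    return new in TRANSITIONS.get(current, set())


def auto_stop_time(paused_at: datetime) -> datetime:
    """Return the time at which an unresumed paused crawl would be stopped."""
    return paused_at + PAUSE_LIMIT


def can_resume(paused_at: datetime, now: datetime) -> bool:
    """Return True while the paused crawl is still inside the 7-day window."""
    return now < auto_stop_time(paused_at)


if __name__ == "__main__":
    paused_at = datetime(2025, 6, 11, tzinfo=timezone.utc)
    print("Resume before:", auto_stop_time(paused_at).isoformat())
    print("Paused -> Resuming allowed:", can_transition("paused", "resuming"))
    print("Stopped -> Resuming allowed:", can_transition("stopped", "resuming"))
```

Modeling the statuses as a small transition table makes it easy to see that *Stopped* is terminal, while *Paused* can lead either back to *Running* (via *Resuming*) or to *Stopped*.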
## End a Crawl
If a crawl workflow is not crawling websites as intended, it may be preferable to end crawling operations and update the crawl workflow's settings before trying again. There are two operations to end crawls, available both on the workflow's details page and in the actions menu in the workflow list.