Documentation Update for Pausing and Resuming Crawl section (#2639)
- Rename 'Modifying Running Crawls' to 'Running Crawls'
- Add section about pausing/resuming crawls, and that paused crawls will eventually become stopped if not resumed.
- Add new crawl pausing, paused, resuming statuses and icons.

Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
parent 3fa0c68922
commit 1a6d2a20c2
@@ -0,0 +1,4 @@
+<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" fill="currentColor" class="bi bi-pause-circle" viewBox="0 0 16 16">
+  <path d="M8 15A7 7 0 1 1 8 1a7 7 0 0 1 0 14m0 1A8 8 0 1 0 8 0a8 8 0 0 0 0 16"/>
+  <path d="M5 6.25a1.25 1.25 0 1 1 2.5 0v3.5a1.25 1.25 0 1 1-2.5 0zm3.5 0a1.25 1.25 0 1 1 2.5 0v3.5a1.25 1.25 0 1 1-2.5 0z"/>
+</svg>
@@ -0,0 +1,4 @@
+<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" fill="currentColor" class="bi bi-play-circle" viewBox="0 0 16 16">
+  <path d="M8 15A7 7 0 1 1 8 1a7 7 0 0 1 0 14m0 1A8 8 0 1 0 8 0a8 8 0 0 0 0 16"/>
+  <path d="M6.271 5.055a.5.5 0 0 1 .52.038l3.5 2.5a.5.5 0 0 1 0 .814l-3.5 2.5A.5.5 0 0 1 6 10.5v-5a.5.5 0 0 1 .271-.445"/>
+</svg>
@@ -1,4 +1,4 @@
-@import './theme.css';
+@import "./theme.css";
 /* Font style definitions */
 
 @font-face {
@@ -8,9 +8,9 @@
   font-display: swap;
   src: url("https://cdn.webrecorder.net/fonts/recursive/recursive-latin.woff2")
     format("woff2");
-  unicode-range: U+0000-00FF, U+0131, U+0152-0153, U+02BB-02BC, U+02C6,
-    U+02DA, U+02DC, U+0304, U+0308, U+0329, U+2000-206F, U+2074, U+20AC,
-    U+2122, U+2191, U+2193, U+2212, U+2215, U+FEFF, U+FFFD;
+  unicode-range: U+0000-00FF, U+0131, U+0152-0153, U+02BB-02BC, U+02C6, U+02DA,
+    U+02DC, U+0304, U+0308, U+0329, U+2000-206F, U+2074, U+20AC, U+2122, U+2191,
+    U+2193, U+2212, U+2215, U+FEFF, U+FFFD;
 }
 
 @font-face {
@@ -141,7 +141,10 @@ h3 {
 }
 
 .md-typeset {
-  font-feature-settings: "ss04" off,"ss07" on,"ss08" on;
+  font-feature-settings:
+    "ss04" off,
+    "ss07" on,
+    "ss08" on;
 }
 
 /* Custom badge classes, applies custom overrides to inline-code blocks */
@@ -1,4 +1,4 @@
-# Modifying Running Crawls
+# Running Crawls
 
 Running crawls can be modified from the crawl workflow **Latest Crawl** tab. You may want to modify a running crawl if you find that the workflow is crawling pages that you didn't intend to archive, or if you want a boost of speed.
 
@@ -8,17 +8,20 @@ A crawl workflow that is in progress can be in one of the following states:
 
 | Status | Description |
 | ---- | ---- |
-| <span class="status-waiting">:bootstrap-hourglass-split: Waiting</span> | The workflow can't start running yet but it is queued to run when resources are available. |
-| <span class="status-waiting">:btrix-status-dot: Starting</span> | New resources are starting up. Crawling should begin shortly.|
-| <span class="status-success">:btrix-status-dot: Running</span> | The crawler is finding and capturing pages! |
-| <span class="status-waiting">:btrix-status-dot: Stopping</span> | A user has instructed this workflow to stop. Finishing capture of the current pages.|
-| <span class="status-waiting">:btrix-status-dot: Finishing Downloads</span> | The workflow has finished crawling and is finalizing downloads.|
-| <span class="status-waiting">:btrix-status-dot: Generating WACZ</span> | Data is being packaged into WACZ files.|
-| <span class="status-waiting">:btrix-status-dot: Uploading WACZ</span> | WACZ files have been created and are being transferred to storage.|
+| <span class="status-violet-600">:bootstrap-hourglass-split: Waiting</span> | The workflow can't start running yet but it is queued to run when resources are available. |
+| <span class="status-violet-600">:btrix-status-dot: Starting</span> | New resources are starting up. Crawling should begin shortly.|
+| <span class="status-green-600">:btrix-status-dot: Running</span> | The crawler is finding and capturing pages! |
+| <span class="status-violet-600">:bootstrap-pause-circle: Pausing</span> | The workflow is in the process of being paused. |
+| <span class="status-neutral-500">:bootstrap-pause-circle: Paused</span> | The workflow is currently paused. |
+| <span class="status-violet-600">:bootstrap-play-circle: Resuming</span> | The workflow is in the process of resuming after being paused. |
+| <span class="status-violet-600">:btrix-status-dot: Stopping</span> | A user has instructed this workflow to stop. Finishing capture of the current pages.|
+| <span class="status-violet-600">:btrix-status-dot: Finishing Downloads</span> | The workflow has finished crawling and is finalizing downloads.|
+| <span class="status-violet-600">:btrix-status-dot: Generating WACZ</span> | Data is being packaged into WACZ files.|
+| <span class="status-violet-600">:btrix-status-dot: Uploading WACZ</span> | WACZ files have been created and are being transferred to storage.|
 
 ## Watch Crawl
 
-You can watch the current state of the browser windows as the crawler visit pages in the **Watch** tab of **Latest Crawl**. A list of queued URLs are displayed below in the **Upcoming Pages** section.
+You can watch the current state of the browser windows as the crawler visits pages in the **Watch** tab of **Latest Crawl**. A list of queued URLs is displayed below in the **Upcoming Pages** section.
 
 ## Live Exclusion Editing
 
@@ -34,6 +37,19 @@ Like exclusions, the number of [browser windows](workflow-setup.md#browser-windo
 
 Unlike exclusions, this change will not be applied to future workflow runs.
 
+## Pausing and Resuming Crawls
+
+If you need to reassess or rescope your crawl at any point after it has started, you can pause the running crawl.
+
+To pause a running crawl, click the *Pause* button. The crawl status will change from *Running* to *Pausing* as in-progress pages are completed, and then to *Paused* once the crawler is successfully paused. Paused crawls do not continue to accrue execution time.
+
+While a crawl is paused, it is possible to replay the pages crawled up to that point and to download the WACZ files from the *Latest Crawl* tab.
+
+To resume a paused crawl, simply click the *Resume* button. The crawl status will update from *Resuming* to *Running* to indicate that the crawler has started crawling again. Any changes to the workflow settings will be applied in the resumed crawl.
+
+???+ Note
+    Paused crawls that are not resumed within 7 days of being paused are automatically updated to *Stopped*. Once stopped, the crawl is finished and can no longer be resumed.
+
 ## End a Crawl
 
 If a crawl workflow is not crawling websites as intended it may be preferable to end crawling operations and update the crawl workflow's settings before trying again. There are two operations to end crawls, available both on the workflow's details page, or as part of the actions menu in the workflow list.
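The pause and resume lifecycle documented in the last hunk can be read as a small state machine. The following is only an illustrative Python sketch of that reading, not Browsertrix code: the names `TRANSITIONS`, `next_status`, `effective_status`, and `PAUSE_EXPIRY` are hypothetical, it covers only the pause-related statuses, and it assumes "within 7 days" means a crawl is stopped once more than 7 days have passed.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical model of the documented statuses; not Browsertrix's actual code.
# Assumption: a paused crawl expires once MORE than 7 days have elapsed.
PAUSE_EXPIRY = timedelta(days=7)

# Pause-related transitions as described in the documentation text.
# Other statuses (Waiting, Stopping, Uploading WACZ, ...) are left out.
TRANSITIONS = {
    "running": {"pausing"},
    "pausing": {"paused"},
    "paused": {"resuming", "stopped"},
    "resuming": {"running"},
    "stopped": set(),  # a stopped crawl can no longer be resumed
}


def next_status(current: str, target: str) -> str:
    """Return `target` if the transition is allowed, else raise ValueError."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"cannot go from {current!r} to {target!r}")
    return target


def effective_status(
    status: str, paused_at: Optional[datetime], now: datetime
) -> str:
    """Treat a crawl that has been paused longer than PAUSE_EXPIRY as stopped."""
    if (
        status == "paused"
        and paused_at is not None
        and now - paused_at > PAUSE_EXPIRY
    ):
        return "stopped"
    return status
```

For example, `next_status("paused", "resuming")` is allowed, while `next_status("stopped", "resuming")` raises `ValueError`, matching the note that a stopped crawl cannot be resumed.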