Archived Items
Archived Items consist of one or more WACZ files created by a crawl workflow, or uploaded to Browsertrix. They can be individually replayed, or combined with other archived items in a collection. The Archived Items page lists all items in the organization.
Uploading Web Archives
WACZ files can be given metadata and uploaded to Browsertrix by pressing the Upload WACZ button on the archived items list page. Only one WACZ file can be uploaded at a time.
Status
The status of an archived item depends on its type. Uploads always have the status :bootstrap-upload: Uploaded; crawls have four possible states:
Status | Description |
---|---|
:bootstrap-check-circle: Complete | The crawl completed according to the workflow's settings. Workflows with limits set may stop running before they capture every queued page, but the resulting archived item will still be marked as "Complete". |
:bootstrap-dash-circle: Stopped | The crawl workflow was stopped gracefully by a user and data is saved. |
:bootstrap-x-octagon: Canceled | The crawl workflow was canceled by a user and no data is saved. |
:bootstrap-exclamation-triangle: Failed | A serious error occurred while crawling and no data is saved. |
Because :bootstrap-x-octagon: Canceled and :bootstrap-exclamation-triangle: Failed crawls do not contain data, they are omitted from the archived items list page and cannot be added to a collection.
Archived Item Details
The archived item details page is composed of five tabs. The Crawl Settings tab is available only for crawls, not uploads.
Overview
The Overview tab displays the item's metadata and statistics associated with its creation process.
To edit the item's description, tags, and associated collections, press the pencil icon at the top right of the metadata section.
Replay
The Replay tab displays the web content contained within the archived item.
For more details on navigating web archives within ReplayWeb.page, see the ReplayWeb.page user documentation.
Exporting Files
While crawling, Browsertrix outputs one or more WACZ files. The crawler aims to output files in consistently sized chunks, and each crawler instance outputs its own separate WACZ files.
The Files tab lists the individually downloadable WACZ files that make up the archived item as well as their file sizes and backup status. To combine one or more archived items and download them all as a single WACZ file, add them to a collection and download the collection.
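Because a WACZ file is a ZIP archive, a downloaded file can be inspected with general-purpose tooling before it is replayed or re-uploaded. The following is a minimal sketch in Python, not part of Browsertrix itself; the function name `list_wacz_entries` and the example file name are hypothetical.

```python
import zipfile


def list_wacz_entries(path):
    """Return (entry_name, size_in_bytes) pairs for every entry in a WACZ file.

    Relies only on the fact that WACZ files are ZIP archives; it does not
    validate the WACZ structure itself.
    """
    with zipfile.ZipFile(path) as wacz:
        return [(info.filename, info.file_size) for info in wacz.infolist()]


# Example usage (hypothetical file name for a file downloaded from the Files tab):
# for name, size in list_wacz_entries("my-archived-item.wacz"):
#     print(f"{size:>12}  {name}")
```

Listing the entries this way is a quick check that a download completed and contains the expected archive data.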
Error Logs
The Error Logs tab displays a list of errors encountered during crawling. Clicking an error in the list will reveal additional information.
All log entries that were recorded in the creation of the archived item can be downloaded in JSONL format by pressing the Download Logs button.
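JSONL stores one JSON object per line, so a downloaded log file can be processed line by line. The sketch below is an illustration rather than a documented Browsertrix tool: the helper names are hypothetical, and the field name passed to `count_by_field` (for example `"logLevel"`) is an assumption that should be checked against the keys actually present in your download.

```python
import json
from collections import Counter


def read_jsonl(path):
    """Yield one parsed JSON value per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def count_by_field(path, field):
    """Count log entries by the value of `field`.

    Assumes every line is a JSON object. Entries that lack the field
    are counted under the key None.
    """
    return Counter(entry.get(field) for entry in read_jsonl(path))


# Example usage (hypothetical file name and field name):
# print(count_by_field("crawl-logs.jsonl", "logLevel"))
```

Grouping entries by a single field gives a quick overview of a large log file before reading individual entries.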
Crawl Settings
The Crawl Settings tab displays the crawl workflow configuration options that were used to generate the resulting archived item.