# Your First Crawl

Let’s crawl your first webpage! Start by opening a webpage that you'd like to crawl, and note the URL for later.

## Logging in

To start crawling with hosted Browsertrix, you'll need a Browsertrix account. [Sign up for an account](./signup.md) and log in.

!!! note "Self-hosting"

    If you'd like to try Browsertrix before signing up, or you have specialized hosting requirements, you can host Browsertrix yourself. [Set up Browsertrix](../deploy/index.md) on your system and log in as your admin user.

## Starting the crawl

Once you've logged in, you should see your org [overview](overview.md). If you land somewhere else, navigate to **Overview**.

1. Tap the _Create New..._ shortcut and select **Crawl Workflow**.
2. Choose **Page List**. We'll get into the details of the options [later](./crawl-workflows.md), but this is a good starting point for a simple crawl.
3. Enter the URL of the webpage that you noted earlier in **Page URL(s)**.
4. Tap _Review & Save_.
5. Tap _Save Workflow_.
6. You should now see your new crawl workflow. Give the crawler a few moments to warm up, and then watch as it archives the webpage!
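
Before entering a URL in **Page URL(s)**, it can help to confirm that it is a complete web address. The sketch below is not part of Browsertrix. It is a small, hypothetical Python helper (`is_crawlable_url`, run against made-up example URLs) that assumes only absolute `http` or `https` URLs with a host name are suitable for a crawl.

```python
from urllib.parse import urlsplit


def is_crawlable_url(url: str) -> bool:
    """Return True if `url` is an absolute http(s) URL with a host.

    Hypothetical helper for illustration; the http/https-only rule
    is an assumption, not a documented Browsertrix requirement.
    """
    parts = urlsplit(url.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# Made-up example values, not taken from the guide.
candidates = [
    "https://example.com/",
    "example.com/about",       # missing scheme
    "ftp://example.com/file",  # not an http(s) URL
]

# Keep only the URLs that pass the check, ready to copy into the form.
valid = [u for u in candidates if is_crawlable_url(u)]
for url in valid:
    print(url)
```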

---

## Next steps

After running your first crawl, check out the following to learn more about Browsertrix's features:

- A detailed list of [crawl workflow setup](workflow-setup.md) options.
- Adding [exclusions](workflow-setup.md#exclusions) to limit your crawl's scope and evading crawler traps by [editing exclusion rules while crawling](running-crawl.md#live-exclusion-editing).
- Best practices for crawling with [browser profiles](browser-profiles.md) to capture content only available when logged in to a website.
- Managing archived items, including [uploading previously archived content](archived-items.md#uploading-web-archives).
- Organizing and combining archived items with [collections](collections.md) for sharing and export.
- [Inviting collaborators](org-members.md) to your org.