Docs: Elaborates on using user agents (#1841)
- Provides a link to Mozilla's page explaining what they are (good for folks new to the concept)
- Provides a link to useragents.me, the same site we link to in the app
- Provides two examples of situations where they may be helpful to get around content restrictions
parent b432d226bd
commit 251aef3ac1
@@ -168,7 +168,20 @@ Will prevent any content from the domains listed in [Steven Black's Unified Host
### User Agent
Sets the browser's user agent in outgoing requests to the specified value. If left blank, the crawler will use the browser's default user agent.
Sets the browser's [user agent](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent) in outgoing requests to the specified value. If left blank, the crawler will use the Brave browser's default user agent. For a list of common user agents, see [useragents.me](https://www.useragents.me/).
??? example "Using custom user agents to get around restrictions"
Although it goes against best practices, some websites block specific browsers based on their user agent: a string of text that browsers send to web servers to identify the browser and operating system requesting content. If Brave is blocked, setting the user agent string of a different browser (such as Chrome or Firefox) may be enough to convince the website that a different browser is being used.
User agents can also be used to voluntarily identify your crawling activity, which can be useful when working with a website's owners to ensure crawls complete successfully. We recommend using a user agent string similar to the following, replacing `orgname` and the URL comment with your own values:
```
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36 orgname.browsertrix (+https://example.com/crawling-explanation-page)
```
If you don't have a webpage that identifies your organization or describes your crawling activities, omit the parenthesized URL comment at the end entirely.
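To make the recommended format concrete, here is a minimal Python sketch that assembles an identifying user agent string from three parts: a base browser string, an `<orgname>.browsertrix` token, and an optional `(+URL)` comment that is left out when no explanation page exists. The `build_user_agent` function and the `BASE_UA` constant are hypothetical helpers for illustration only; Browsertrix itself simply takes the finished string as a crawl setting.

```python
from typing import Optional

# Base browser string taken from the example above (Chrome 125 on Linux).
BASE_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36"
)


def build_user_agent(orgname: str, info_url: Optional[str] = None, base: str = BASE_UA) -> str:
    """Hypothetical helper: append an identifying token to a browser user agent.

    The "(+URL)" comment is only appended when an explanation page is given.
    """
    ua = f"{base} {orgname}.browsertrix"
    if info_url:
        ua += f" (+{info_url})"
    return ua


if __name__ == "__main__":
    # With an explanation page to link to.
    print(build_user_agent("orgname", "https://example.com/crawling-explanation-page"))
    # Without one: the parenthesized comment is omitted entirely.
    print(build_user_agent("orgname"))
```

The printed string is what you would paste into the User Agent setting; the second call shows the form to use when there is no page to link to.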
Provide your custom user agent string to the website's owner so they can allowlist Browsertrix and prevent your crawls from being blocked.
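On the receiving side, a site owner could recognize the identifying token when deciding what to allowlist. The sketch below assumes the owner matches the whitespace-separated `<orgname>.browsertrix` token from the recommended format. `is_allowlisted` is a hypothetical helper; in practice this rule would more likely live in a web server, firewall, or bot-management configuration.

```python
SUFFIX = ".browsertrix"


def is_allowlisted(user_agent: str, allowed_orgs: set) -> bool:
    """Hypothetical check: does the User-Agent carry an "<org>.browsertrix" token for an allowed org?"""
    # Compare whole whitespace-separated tokens rather than substrings,
    # so "evilacme.browsertrix" does not match an allowlisted "acme".
    for token in user_agent.split():
        if token.endswith(SUFFIX) and token[: -len(SUFFIX)] in allowed_orgs:
            return True
    return False


if __name__ == "__main__":
    ua = (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/125.0.0.0 Safari/537.36 acme.browsertrix (+https://example.com/about-crawling)"
    )
    print(is_allowlisted(ua, {"acme"}))
```

Because any client can send any user agent, a match like this only signals a declared identity; it does not prove that the request actually came from Browsertrix.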
### Language