browsertrix/frontend/docs/mkdocs.yml
Tessa Walsh f8fb2d2c8d
Rework crawl page migration + MongoDB Query Optimizations (#2412)
Fixes #2406 

Converts migration 0042 to launch a background job (parallelized across
several pods) to migrate all crawls by optimizing their pages and
setting `version: 2` on the crawl when complete.

Also optimizes MongoDB queries for better performance.

Migration Improvements:

- Add `isMigrating` and `version` fields to `BaseCrawl`
- Add new background job type to use in migration with accompanying
`migration_job.yaml` template that allows for parallelization
- Add new API endpoint to launch this crawl migration job, and ensure
that we have list and retry endpoints for superusers that work with
background jobs that aren't tied to a specific org
- Rework background job models and methods now that not all background
jobs are tied to a single org
- Ensure new crawls and uploads have `version` set to `2`
- Modify crawl and collection replay.json endpoints to only include
fields for replay optimization (`initialPages`, `pageQueryUrl`,
`preloadResources`) if all relevant crawls/uploads have `version` set to
`2`
- Remove `distinct` calls from migration pathways
- Consolidate collection recompute stats
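
The `version` gate on the replay.json endpoints could look roughly like the sketch below. This is only an illustration, not the backend's actual code: the three field names come from the list above, while `gate_replay_fields` and the dict-shaped crawl records are hypothetical.

```python
from typing import Any

# Field names are taken from the commit message; everything else is hypothetical.
REPLAY_OPTIMIZATION_FIELDS = ("initialPages", "pageQueryUrl", "preloadResources")


def gate_replay_fields(
    replay: dict[str, Any], crawls: list[dict[str, Any]]
) -> dict[str, Any]:
    """Return a copy of `replay`, keeping the optimization fields only if
    every crawl/upload record has `version` set to 2."""
    # Records without a `version` key count as not yet migrated.
    # Note: an empty `crawls` list counts as fully migrated, since all([]) is True.
    all_migrated = all(c.get("version") == 2 for c in crawls)
    if all_migrated:
        return dict(replay)
    return {k: v for k, v in replay.items() if k not in REPLAY_OPTIMIZATION_FIELDS}
```

With one unmigrated crawl in the list, the three optimization fields are dropped and the rest of the response is returned unchanged.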

Query Optimizations:
- Remove all uses of $group and $facet
- Optimize /replay.json endpoints to precompute preload_resources, avoid
fetching crawl list twice
- Optimize /collections endpoint by not fetching resources 
- Rename /urls to /pageUrlCounts and avoid $group; instead, sort using an
index, either by seed + ts or by url, to get top matches.
- Use $gte instead of $regex to get prefix matches on URL
- Use $text instead of $regex to get text search on title
- Remove total from /pages and /pageUrlCounts queries by not using
$facet
- frontend: only call /pageUrlCounts when dialog is opened.
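
As a rough illustration of the `$gte` prefix match mentioned above: a range comparison on an indexed string field can stand in for an anchored `$regex`. The sketch below only builds the filter document. The helper name `prefix_range_filter` is hypothetical, and the `$lt` upper bound is an assumption added here to close the range; the commit message itself only mentions `$gte`, so the actual query may instead rely on sorting and a limit.

```python
def prefix_range_filter(field: str, prefix: str) -> dict:
    """Build a MongoDB filter matching values of `field` that start with `prefix`.

    Assumes the default binary string comparison (no custom collation) and a
    prefix ending in an ordinary character, as URLs do.
    """
    if not prefix:
        # An empty prefix matches everything, so no filter is needed.
        return {}
    last = ord(prefix[-1])
    if last >= 0x10FFFF:
        # No larger code point exists; fall back to a lower bound only.
        return {field: {"$gte": prefix}}
    # Smallest string that sorts after every string starting with `prefix`.
    upper = prefix[:-1] + chr(last + 1)
    return {field: {"$gte": prefix, "$lt": upper}}
```

For example, `prefix_range_filter("url", "https://example.com/a")` yields a range from `"https://example.com/a"` up to, but not including, `"https://example.com/b"`, which an index on `url` can serve directly.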


---------

Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Co-authored-by: Emma Segal-Grossman <hi@emma.cafe>
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
2025-02-20 15:26:11 -08:00


site_name: Browsertrix Docs
repo_url: https://github.com/webrecorder/browsertrix-cloud/
repo_name: Browsertrix
edit_uri: edit/main/frontend/docs/docs/
extra_css:
  - stylesheets/extra.css
extra_javascript:
  - js/insertversion.js
theme:
  name: material
  custom_dir: docs/overrides
  features:
    - navigation.tabs
    - navigation.tabs.sticky
    - navigation.instant
    - navigation.tracking
    - navigation.footer
    - content.code.copy
    - content.action.edit
    - content.tooltips
    - search.suggest
  palette:
    scheme: webrecorder
  logo: assets/brand/browsertrix-icon-white.svg
  favicon: assets/brand/favicon.svg
  icon:
    admonition:
      note: bootstrap/pencil-fill
      abstract: bootstrap/file-earmark-text-fill
      info: bootstrap/info-circle-fill
      tip: bootstrap/exclamation-circle-fill
      success: bootstrap/check-circle-fill
      question: bootstrap/question-circle-fill
      warning: bootstrap/exclamation-triangle-fill
      failure: bootstrap/x-octagon-fill
      danger: bootstrap/exclamation-diamond-fill
      bug: bootstrap/bug-fill
      example: bootstrap/mortarboard-fill
      quote: bootstrap/quote
    repo: bootstrap/github
    edit: bootstrap/pencil
    view: bootstrap/eye
nav:
  - Overview: index.md
  - User Guide:
    - Getting Started:
      - user-guide/index.md
      - user-guide/signup.md
      - user-guide/getting-started.md
    - Orgs:
      - user-guide/org.md
      - user-guide/join.md
      - user-guide/overview.md
    - Crawling:
      - user-guide/crawl-workflows.md
      - user-guide/workflow-setup.md
      - user-guide/running-crawl.md
    - Archived Items:
      - user-guide/archived-items.md
      - user-guide/review.md
    - Collections:
      - user-guide/collection.md
      - user-guide/presentation-sharing.md
    - Browser Profiles:
      - user-guide/browser-profiles.md
    - Org Settings:
      - user-guide/org-settings.md
      - user-guide/org-members.md
    - Account Settings:
      - user-guide/user-settings.md
    - user-guide/contribute.md
  - Self-Hosting:
    - Overview: deploy/index.md
    - deploy/local.md
    - deploy/remote.md
    - deploy/customization.md
    - deploy/proxies.md
    - Ansible:
      - deploy/ansible/digitalocean.md
      - deploy/ansible/microk8s.md
      - deploy/ansible/k3s.md
    - Administration:
      - deploy/admin/upgrade-notes.md
      - deploy/admin/org-import-export.md
  - Development:
    - develop/index.md
    - develop/local-dev-setup.md
    - develop/frontend-dev.md
    - develop/localization.md
    - develop/docs.md
  - API Reference: !ENV [ API_DOCS_URL, "/api/" ]
markdown_extensions:
  - toc:
      toc_depth: 3
      permalink: true
  - pymdownx.highlight:
      anchor_linenums: true
  - pymdownx.emoji:
      emoji_index: !!python/name:material.extensions.emoji.twemoji
      emoji_generator: !!python/name:material.extensions.emoji.to_svg
      options:
        custom_icons:
          - docs/overrides/.icons
  - admonition
  - pymdownx.inlinehilite
  - pymdownx.details
  - pymdownx.superfences
  - pymdownx.tabbed:
      alternate_style: true
  - pymdownx.keys
  - def_list
  - attr_list
extra:
  generator: false
  social:
    - icon: bootstrap/globe
      link: https://webrecorder.net
    - icon: bootstrap/chat-left-text-fill
      link: https://forum.webrecorder.net/
    - icon: bootstrap/mastodon
      link: https://digipres.club/@webrecorder
    - icon: bootstrap/youtube
      link: https://www.youtube.com/@webrecorder
  analytics:
    provider: plausible
  enable_analytics: !ENV ENABLE_ANALYTICS
copyright: "Creative Commons Attribution 4.0 International (CC BY 4.0)"
plugins:
  - search
  - redirects:
      redirect_maps:
        "user-guide/collections.md": "user-guide/collection.md"