browsertrix/backend
Tessa Walsh 879e509b39 Backend: Move page file and error counts to crawl replay.json endpoint (#1868)
Backend work for #1859

- Remove file count from qa stats endpoint
- Compute isFile or isError per page when page is added
- Increment filePageCount and errorPageCount per crawl to count number of isFile or isError pages
- Add file and error counts to crawl replay.json endpoint (filePageCount and errorPageCount)
- Add migration 0028 to set isFile / isError for each page, aggregate filePageCount / errorPageCount per crawl
- Determine if page is a file based on loadState == 2, mime type or status code and lack of title
2024-06-20 19:02:57 -07:00
..
btrixcloud Backend: Move page file and error counts to crawl replay.json endpoint (#1868) 2024-06-20 19:02:57 -07:00
test Backend: Move page file and error counts to crawl replay.json endpoint (#1868) 2024-06-20 19:02:57 -07:00
test_nightly Give test_crawl_timeout 10 mins to finish (#1627) 2024-03-26 18:33:30 -07:00
.pylintrc
Dockerfile Backend mem usage fix - use fixed MOTOR_MAX_WORKERS + switch to gunicorn (#1468) 2024-01-16 15:32:42 -08:00
mypy.ini Support multiple crawler versions (#1420) 2024-01-16 15:32:12 -08:00
requirements.txt Add endpoints to read pages from older crawl WACZs into database (#1562) 2024-03-19 14:14:21 -07:00
test-requirements.txt Add slugs to org backend (#1250) 2023-10-10 18:30:09 -07:00