12 Commits

Author SHA1 Message Date
Tessa Walsh
f7ba712646 Add seed file support to Browsertrix backend (#2710)
Fixes #2673 

Changes in this PR:

- Adds a new `file_uploads.py` module and corresponding `/files` API
prefix with methods/endpoints for uploading, retrieving, and deleting seed
files (can be extended to other types of files moving forward)
- Seed files are supported via `CrawlConfig.config.seedFileId` on POST
and PATCH endpoints. This `seedFileId` is replaced by a pre-signed URL when
passed to the crawler by the operator
- Seed files are read when first uploaded to calculate `firstSeed` and
`seedCount` and store them in the database, and these values are copied
into the workflow and crawl documents when they are created.
- Logic is added to store `firstSeed` and `seedCount` for other
workflows as well, and a migration added to backfill data, to maintain
consistency and fix some of the pymongo aggregations that previously
assumed all workflows would have at least one `Seed` object in
`CrawlConfig.seeds`
- Seed file and thumbnail storage stats are added to org stats
- Seed file and thumbnail uploads first check that the org's storage
quota has not been exceeded and return a 400 if so
- A cron background job (run weekly each Sunday at midnight by default,
but configurable) is added to look for seed files at least x minutes old
(1440 minutes, or 1 day, by default, but configurable) that are not in
use in any workflows, and to delete them when they are found. The
backend pods will ensure this k8s batch job exists when starting up and
create it if it does not already exist. A database entry for each run of
the job is created in the operator on job completion so that it'll
appear in the `/jobs` API endpoints, but retrying of this type of
regularly scheduled background job is not supported as we don't want to
accidentally create multiple competing scheduled jobs.
- Adds a `min_seed_file_crawler_image` value to the Helm chart that is
checked before creating a crawl from a workflow if set. If a workflow
cannot be run, return the detail of the exception in
`CrawlConfigAddedResponse.errorDetail` so that we can display the reason
in the frontend
- Add SeedFile model from base UserFile (formerly ImageFile), ensure all
APIs returning uploaded files return an absolute pre-signed URL (either
with external origin or internal service origin)
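
The `firstSeed`/`seedCount` calculation and the cleanup rule for unused seed files described above can be sketched roughly as follows. This is an illustration only, not the code in `file_uploads.py`: it assumes a seed file is plain text with one URL per line and that blank lines are skipped, and the names `SeedStats`, `seed_stats`, and `is_removable` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class SeedStats:
    """Hypothetical container for the two values stored at upload time."""

    first_seed: Optional[str]
    seed_count: int


def seed_stats(text: str) -> SeedStats:
    """Compute the first seed and the seed count from seed file contents.

    Assumption: one URL per line; blank lines are ignored.
    """
    seeds = [line.strip() for line in text.splitlines() if line.strip()]
    return SeedStats(first_seed=seeds[0] if seeds else None, seed_count=len(seeds))


def is_removable(
    uploaded_at: datetime,
    in_use: bool,
    now: datetime,
    min_age_minutes: int = 1440,  # default age threshold named in the commit message
) -> bool:
    """Return True if a seed file is unused and at least `min_age_minutes` old."""
    return not in_use and now - uploaded_at >= timedelta(minutes=min_age_minutes)
```

For example, `seed_stats("https://example.com/\nhttps://example.org/\n")` yields a first seed of `https://example.com/` and a count of 2, and a file uploaded two days ago that no workflow references would be selected for deletion under the default threshold.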

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2025-07-22 19:11:02 -07:00
Tessa Walsh
da77b066a4
Prevent btrix helper from doing anything to k8s contexts other than docker-desktop (#2431)
The `./btrix` development helper shouldn't be used for anything other
than local dev, which this commit helps to enforce.

When running any command, if the k8s context is anything other than
`docker-desktop` the script will now shut down immediately without doing
anything and print the message: "Attempting to modify context other than
docker-desktop not supported. Quitting."
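
The guard described above can be sketched as follows. The helper itself is a bash script, so this Python version only illustrates the logic: the function name `check_context` and the exit status of 1 are assumptions, and the caller is expected to pass in the current context name (for instance the output of `kubectl config current-context`).

```python
import sys

ALLOWED_CONTEXT = "docker-desktop"
REFUSAL_MESSAGE = (
    "Attempting to modify context other than docker-desktop not supported. Quitting."
)


def check_context(current_context: str) -> None:
    """Exit before doing anything if the k8s context is not docker-desktop.

    Assumption: a non-zero exit status (1) is used; the commit message only
    says the script shuts down immediately and prints the message.
    """
    if current_context.strip() != ALLOWED_CONTEXT:
        print(REFUSAL_MESSAGE)
        sys.exit(1)
```

Running the check first, before any command dispatch, is what ensures that no command can touch a non-local cluster.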
2025-02-26 23:13:25 -08:00
Ilya Kreymer
b574f00d2b
Add Repository Index + Chart Rename + Docs Rename (#1708)
Repository Index: Generate ./docs/helm-repo/index.yaml
to allow for browsertrix to be a helm repository.
docs: rename docs.browsertrix.cloud -> docs.browsertrix.com
docs: update deployment doc to mention helm repo as preferred way to
install
docs build action: generate repository index in GH action
publish action: update auto-generated message to mention installing from
the repo.

---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2024-04-21 09:42:25 -07:00
Tessa Walsh
138e2da8b3
Add setup command to btrix helper to copy local config (#1462)
Fixes #1157 

Adds `./btrix setup` command to `btrix` helper which copies the example
local config to `./chart/local.yaml` where `btrix` expects it.

If another command is run when the local config file doesn't yet exist,
the helper will stop and suggest that the user run `./btrix setup` and
edit the resulting file.
2024-01-10 19:32:39 -08:00
Tessa Walsh
aa3f1ebf5f
Add down command to uninstall and delete data (#1285)
Small improvement to `btrix` helper. Adds `./btrix down` command to
uninstall and delete data without resetting the dev environment.
2023-10-13 17:16:12 -07:00
Tessa Walsh
ab76f0f394
Make improvements to reset command (#1160)
* Make improvements to reset command

- Remove running crawls and profile browsers
- Delete cronjobs
- Delete configmaps

---------
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2023-09-12 22:27:09 -07:00
Tessa Walsh
7cf2b11eb7
Add event webhook tests (#1155)
* Add success filter to webhook list GET endpoint

* Add sorting to webhooks list API and add event filter

* Test webhooks via echo server

* Set address to echo server on host from CI env var for k3d and microk8s

* Add -s back to pytest command for k3d ci

* Change pytest test path to avoid hanging on collecting tests

* Revert microk8s to only run on push to main
2023-09-12 22:08:40 -07:00
Anish Lakhwara
32428f4d93
fix: usr/bin/env bash interpreter for btrix (#1028)
2023-08-01 09:28:56 -07:00
Tessa Walsh
08b3d706a7
btrix helper: Add -microk8s flag to explicitly use microk8s (#888)
2023-05-30 15:41:26 -07:00
Tessa Walsh
52acd831cd
Add current context and confirmation dialog to reset/bootstrap methods (#887)
2023-05-25 13:43:53 -04:00
Tessa Walsh
a9c1c54194
Make btrix helper work with microk8s (#768)
* Check for microk8s

* Use python3

* Add note about installing pytest

* Add chart/local.yaml to .gitignore to avoid committing
2023-04-18 08:50:46 -04:00
Tessa Walsh
f6f3b7abba
Add btrix CLI dev helper (#732)
* Add btrix CLI dev helper

* Fix indentation

* Use bash syntax for ifs
2023-04-05 21:51:22 -07:00