Commit Graph

167 Commits

Author SHA1 Message Date
Ilya Kreymer
07e9f51292
backend: update queue apis to work with new sorted queue apis (also b… (#712)
* backend: update queue apis to work with new sorted queue apis (also backwards compatible to existing apis)
designed for browsertrix-crawler 0.9.0-beta.1 but also backwards compatible with older list-based queue as well
2023-03-17 21:11:17 -07:00
Ilya Kreymer
de9212eec7
exclusions editor fix: (#692)
- backend: fix updating model after exclusions change
- frontend: don't check for new_cid, just success
- fixes #691
2023-03-10 22:36:10 -08:00
Ilya Kreymer
86ca9c4bac
backend: Fix for total crawl time limit. (#665)
* backend: fix for total crawl timelimit:
- time limit is computed for total job run time
- when limit is exceeded, job starts to stop crawls gracefully, equivalent to 'stop crawl' operation
- fix for #664

* rename crawl-timeout -> crawl_expire_time

* fix lint
2023-03-10 11:43:16 -08:00
Ilya Kreymer
c2fa78859b
permissions: allow user with 'viewer' permissions to access read-only crawlconfig apis (#687)
addresses issue in #653, fixes #685
2023-03-08 09:29:25 -08:00
Ilya Kreymer
544346d1d4
backend: make crawlconfigs mutable! (#656) (#662)
* backend: make crawlconfigs mutable! (#656)
- crawlconfig PATCH /{id} can now receive a new JSON config to replace the old one (in addition to scale, schedule, tags)
- exclusions: add / remove APIs mutate the current crawlconfig, do not result in a new crawlconfig created
- exclusions: ensure crawl job 'config' is updated when exclusions are added/removed, unify add/remove exclusions on crawl
- k8s: crawlconfig json is updated along with scale
- k8s: stateful set is restarted by updating annotation, instead of changing template
- crawl object: now has 'config', as well as 'profileid', 'schedule', 'crawlTimeout', 'jobType' properties to ensure anything that is changeable is stored on the crawl
- crawlconfigcore: store share properties between crawl and crawlconfig in new crawlconfigcore (includes 'schedule', 'jobType', 'config', 'profileid', 'schedule', 'crawlTimeout', 'tags', 'oid')
- crawlconfig object: remove 'oldId', 'newId', disallow deactivating/deleting while crawl is running
- rename 'userid' -> 'createdBy'
- remove unused 'completions' field
- add missing return to fix /run response
- crawlout: ensure 'profileName' is resolved on CrawlOut from profileid
- crawlout: return 'name' instead of 'configName' for consistent response
- update: 'modified', 'modifiedBy' fields to set modification date and user modifying config
- update: ensure PROFILE_FILENAME is updated in configmap is profileid provided, clear if profileid==""
- update: return 'settings_changed' and 'metadata_changed' if either crawl settings or metadata changed
- tests: update tests to check settings_changed/metadata_changed return values

add revision tracking to crawlconfig:
- store each revision separate mongo db collection
- revisions accessible via /crawlconfigs/{cid}/revs
- store 'rev' int in crawlconfig and in crawljob
- only add revision history if crawl config changed

migration:
- update to db v3
- copy fields from crawlconfig -> crawl
- rename userid -> createdBy
- copy userid -> modifiedBy, created -> modified
- skip invalid crawls (missing config), make createdBy optional (just in case)

frontend: Update crawl config keys with new API (#681), update frontend to use new PATCH endpoint, load config from crawl object in details view

---------

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
Co-authored-by: sua yoo <sua@webrecorder.org>
Co-authored-by: sua yoo <sua@suayoo.com>
2023-03-07 20:36:50 -08:00
Tessa Walsh
e98c7172a9
Paginate API list endpoints (#659)
* Paginate API list endpoints

fastapi-pagination is pinned to 0.9.3, the latest release that plays
nicely with pinned versions of fastapi and fastapi-users.

* Increase page size via overriden Params and Page classes

* update api resource list keys

---------

Co-authored-by: sua yoo <sua@suayoo.com>
2023-03-06 14:41:25 -05:00
Ilya Kreymer
ace4e79e3f version: bump version to 1.4.0-beta.0 2023-03-06 10:20:56 -08:00
Ilya Kreymer
df9a7eccf3 version: bump to 1.3.1 2023-02-28 18:40:15 -08:00
Ilya Kreymer
4901fc2fe9 version: bump to 1.3.0 2023-02-24 18:07:56 -08:00
Tessa Walsh
e2f359c352
CrawlConfig migration and crawl stats query optimization (#633)
* Drop crawl stats fields from CrawlConfig and add migration

* Remove migrate_down from BaseMigration

* Get crawl stats from optimized mongo query
2023-02-24 18:01:15 -08:00
Sara Tavares
8167d7da8d
fix typos (#640) 2023-02-24 11:10:49 -08:00
Tessa Walsh
1b1bc10c60
Fix nightly tests (#632) 2023-02-23 13:57:22 -05:00
Tessa Walsh
567e851235
Dynamically calculate crawl stats for crawlconfig endpoints (#623) 2023-02-22 22:17:45 -05:00
Tessa Walsh
ed94dde7e6
Include firstSeed and seedCount in crawl endpoints (#618) 2023-02-22 10:27:31 -05:00
Ilya Kreymer
0fd18ed3dd version: bump to 1.3.0-beta.0
CHANGES: add upcoming release, link to release changelist for 1.2.0
2023-02-21 10:14:08 -08:00
Tessa Walsh
4234f89d25
Rename crawlconfig name from file suffixes (#610) 2023-02-21 12:52:22 -05:00
Tessa Walsh
30f1930519
Add back GET /users/invite/{token} used by frontend (#607) 2023-02-16 13:02:38 -05:00
Tessa Walsh
bd4fba7af7
Fix POST /orgs/{oid}/crawls/delete (#591)
* Fix POST /orgs/{oid}/crawls/delete

- Add permissions check to ensure crawler users can only delete
their own crawls
- Fix broken delete_crawls endpoint
- Delete files from storage as well as deleting crawl from db
- Add tests, including nightly test that ensures crawl files are
no longer accessible after the crawl is deleted
2023-02-15 21:06:12 -05:00
Tessa Walsh
14b349443f
Make pending invites expire via TTL index (#568)
* Make invites expire after configurable window

The value can be set in EXPIRE_AFTER_SECONDS env var and via
helm chart values, and defaults to 7 days.

* Create nightly test CI and add invite expiration test to it

* Update 404 error message for missing or expired invite

---------

Co-authored-by: sua yoo <sua@suayoo.com>
2023-02-14 16:07:14 -05:00
Tessa Walsh
103d91556f
Remove non-org-scoped invites from backend (#585)
* Remove non-org-scoped invites
- remove POST /users/invite and related tests
- remove GET /users/invite-delete/{token}
2023-02-08 18:56:28 -08:00
Tessa Walsh
b642c53c59
Make crawlconfig name optional (#588) 2023-02-08 18:38:15 -08:00
Tessa Walsh
ce8f426978
Add notes to crawl and crawl updates (#587) 2023-02-08 18:36:22 -08:00
Ilya Kreymer
40fb04b385
backend: /orgs/<id>/remove: return 404 if org user doesn't exist, fix… (#561)
* backend: /orgs/<id>/remove: return 404 if org user doesn't exist, fixes issue in #535

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2023-02-08 16:22:36 -05:00
Tessa Walsh
a7a18b9db0
Add org-specific delete invite endpoint (#575)
Adds POST /orgs/{oid}/invites/delete, which expects the invited
email address in the POST body.

This endpoint will also delete duplicate invites with the same
email/oid combination if env var ALLOW_DUPE_INVITES allows dupes.
2023-02-08 16:10:09 -05:00
Tessa Walsh
95155e6fbf
Invite token improvements (#564)
- URL decode email address in invites.invite_user
- Add tests for accepting invites
2023-02-07 20:40:28 -08:00
Tessa Walsh
6d424a1ae0
Serialize pending invites to return "id" not "_id" (#559) 2023-02-06 12:28:11 -05:00
Ilya Kreymer
67df783885 bump version to 1.2.1-beta.0 2023-02-05 12:27:45 -08:00
Ilya Kreymer
af7ba4c90a version: update to 1.2.0 2023-02-02 23:46:23 -08:00
Tessa Walsh
2e3b3cb228
Add API endpoint to update crawl tags (#545)
* Add API endpoint to update crawls (tags only for now)
* Allow setting tags to empty list in crawlconfig updates
2023-02-01 22:24:36 -05:00
Tessa Walsh
23022193fb
Reformat backend for black 23.1.0 (#548) 2023-02-01 20:01:09 -05:00
Tessa Walsh
58aafc4191
Make API updates for member updates (#541)
* Add API endpoint that lists pending invites for all orgs (superuser-only)
* Add API endpoint that lists pending invites for org
* Add user emails to /api/orgs/<oid> response
2023-02-01 16:44:00 -05:00
Ilya Kreymer
9048d46c6c backend: add extraHops to support #543 2023-02-01 13:21:26 -08:00
Tessa Walsh
7d25565ef4
Add org role to /users/me-with-orgs (#536)
* Add org role to /users/me-with-orgs
* Add SUPERADMIN role and return in /me-with-orgs for superusers
2023-01-31 16:27:13 -05:00
Tessa Walsh
6cb79b580a
Fix issue where users are added to default org as admin (#534)
Users should only be added as to the default org with Owner permissions
if they are not specifically being invited to another org. This commit
fixes the logic in the post-registration callback to make this the case.
2023-01-31 12:55:31 -08:00
Ilya Kreymer
6df31e13ab
backend: profile api: return additional data in profile /browser/<id> endpoint (#537)
supports #533 , switching to client side rendering from VNC websocket
2023-01-31 11:58:50 -08:00
Tessa Walsh
2e6bf7535d
Add support for tags to update_crawl_config API endpoint (#521)
* Add test for updating crawlconfigs
2023-01-30 21:46:54 -08:00
Tessa Walsh
231c37108c
Handle DuplicateKeyError on org rename requests (#514)
* Handle DuplicateKeyError on org rename requests
2023-01-25 17:46:35 -08:00
Tessa Walsh
9f0abd6a28
Only drop indexes if migrations are run (#515) 2023-01-25 17:46:10 -08:00
Tessa Walsh
0486d50fe9
Add new /users/me-with-orgs API endpoint (#510) 2023-01-24 10:23:30 -05:00
Tessa Walsh
31e7939cba
Add new API user management endpoints (#511)
- Remove user from org
- Delete user invite
2023-01-23 17:03:07 -08:00
Tessa Walsh
c0e2ec6155
Fix logic for creating pidfile parent dir (#512) 2023-01-23 17:02:25 -08:00
Ilya Kreymer
ccd87e0dff
Rename api / nginx settings -> backend / frontend, set pull policy job images (#504)
* rename config values
- api -> backend
- nginx -> frontend

* job pods:
- set job_pull_policy from api_pull_policy (same as backend image)
- default to Always, but can be overridden for local deployment (same as backend image)

typo fix: CRAWL_NAMESPACE -> CRAWLER_NAMESPACE (part of #491)
ansible: set default label to :latest instead of :dev for
2023-01-18 20:21:36 -08:00
Ilya Kreymer
1dfa494210
backend: add default behavior time to /api/settings (part of #321) (#499) 2023-01-18 14:52:15 -08:00
Tessa Walsh
0fa60ebc45
Rename archives/teams -> orgs in codebase + add db migration (#486)
* Rename archives to orgs and aid to oid on backend

* Rename archive to org and aid to oid in frontend

* Remove translation artifact

* Rename team -> organization

* Add database migrations and run once on startup

* This commit also applies the new by_one_worker decorator to other
asyncio tasks to prevent heavy tasks from being run in each worker.

* Run black, pylint, and husky via pre-commit

* Set db version and use in migrations

* Update and prepare database in single task

* Migrate k8s configmaps
2023-01-18 14:51:04 -08:00
Ilya Kreymer
d028b93412
backend: password related fixes: (#479)
- mongodb: support passwords with '@' by escaping mongo username and password
- superadmin: update superadmin email and password after initial creation if updated in helm values
2023-01-13 18:22:50 -08:00
Ilya Kreymer
bc67cc8443
backend: registration: (#472)
- if registration is enabled, newly registred users get added to the default org, instead of getting their own org/archive
2023-01-13 00:03:37 -08:00
Ilya Kreymer
827b643262
backend: add 'allow_dupe_invites' option to allow re-inviting users. if not set (default), duplicate invites will result in errors (#471) 2023-01-12 23:25:48 -08:00
Ilya Kreymer
4dbca8c421
email sending tweaks: (#470)
- support 'reply-to' email field in values, and in ansible-based values
- set 'subject' for different types of messages
2023-01-12 23:25:23 -08:00
Ilya Kreymer
a916322c30
ansible: digitalocean tweaks: (#469)
* ansible: digitalocean tweaks:
- add org_name to template
- better check for db existence
- simplify domain, fix default_org

chart:
- make job images pull IfNotPresent
2023-01-12 23:11:20 -08:00
Ilya Kreymer
2daa742585
Copy tags from crawlconfig to crawl (#467), fixes #466
- add tags to crawl object
- ensure tags are copied from crawlconfig to crawl when crawl is created (both manually and scheduled)
- tests: add test to ensure tags added to crawl, remove redundant wait replaced with fixtures
2023-01-12 17:46:19 -08:00