Increase startup probe time to account for long-running migrations (#1560)

- increases the failureThreshold of the startupProbe for the api backend
container to allow up to 300 seconds for long-running migrations
- adds `/healthzStartup`, which returns 200 only once the db is available
and migrations are done
- bump 
- keeps `/healthz`, which now always returns 200 while running
- increases the livenessProbe failureThreshold to be higher than that of the
readiness probe, following the recommended best practice of liveness
threshold > readiness threshold
- fixes #1559
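
The 300-second figure appears to follow from the startupProbe settings in this change (periodSeconds 5, failureThreshold raised from 30 to 60). A rough sketch of that arithmetic, ignoring initialDelaySeconds and per-probe timeouts; `probe_budget_seconds` is a hypothetical helper for illustration, not part of the codebase, and the 30-second-period probes are assumed to be the liveness probes described above:

```python
def probe_budget_seconds(period_seconds: int, failure_threshold: int) -> int:
    """Approximate time a probe may keep failing before Kubernetes acts on it."""
    return period_seconds * failure_threshold


# api container startupProbe: periodSeconds 5, failureThreshold 30 -> 60
old_startup = probe_budget_seconds(5, 30)
new_startup = probe_budget_seconds(5, 60)

# probes with periodSeconds 30, failureThreshold 5 -> 15 (assumed liveness)
old_liveness = probe_budget_seconds(30, 5)
new_liveness = probe_budget_seconds(30, 15)

print(old_startup, new_startup)    # 150 300
print(old_liveness, new_liveness)  # 150 450
```

With these values, a migration has roughly 300 seconds to finish before the container is restarted, up from roughly 150.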
Ilya Kreymer 2024-02-28 14:22:33 -08:00 committed by GitHub
parent 14189b7cfb
commit 804f755787
GPG Key ID: B5690EEEBB952194
2 changed files with 15 additions and 8 deletions

@@ -189,12 +189,20 @@ def main():
     async def openapi() -> JSONResponse:
         return JSONResponse(app_root.openapi())
-    @app_root.get("/healthz", include_in_schema=False)
-    async def healthz():
+    # Used for startup
+    # Returns 200 only when db is available + migrations are done
+    @app_root.get("/healthzStartup", include_in_schema=False)
+    async def healthz_startup():
         if not db_inited.get("inited"):
             raise HTTPException(status_code=503, detail="not_ready_yet")
         return {}
+    # Used for readiness + liveness
+    # Always returns 200 while running
+    @app_root.get("/healthz", include_in_schema=False)
+    async def healthz():
+        return {}
     app_root.include_router(app, prefix=API_PREFIX)
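
The behavior of the two endpoints in the hunk above can be modeled without the web framework. This is a minimal sketch: the function names `healthz_startup_status` and `healthz_status` are hypothetical, and the `{"detail": ...}` body is an assumption that mirrors how the raised `HTTPException` would be rendered; only the `db_inited` dict and the status codes come from the diff.

```python
from typing import Dict, Tuple


def healthz_startup_status(db_inited: Dict[str, bool]) -> Tuple[int, dict]:
    """Model of /healthzStartup: 503 until the db is marked as initialized."""
    if not db_inited.get("inited"):
        return 503, {"detail": "not_ready_yet"}
    return 200, {}


def healthz_status() -> Tuple[int, dict]:
    """Model of /healthz: always 200 while the process is running."""
    return 200, {}


db_inited: Dict[str, bool] = {}  # migrations still running
during_migration = healthz_startup_status(db_inited)

db_inited["inited"] = True  # migrations finished
after_migration = healthz_startup_status(db_inited)

print(during_migration[0], after_migration[0], healthz_status()[0])  # 503 200 200
```

Splitting the checks this way means only the startup probe waits on migrations, while readiness and liveness depend only on the process responding.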

@@ -97,11 +97,10 @@ spec:
     startupProbe:
       httpGet:
-        path: /healthz
+        path: /healthzStartup
         port: 8000
       initialDelaySeconds: 5
       periodSeconds: 5
-      failureThreshold: 30
+      failureThreshold: 60
       successThreshold: 1
     readinessProbe:
@@ -119,7 +118,7 @@ spec:
         port: 8000
       initialDelaySeconds: 5
       periodSeconds: 30
-      failureThreshold: 5
+      failureThreshold: 15
       successThreshold: 1
   - name: op
@@ -176,7 +175,7 @@ spec:
         port: {{ .Values.opPort }}
       initialDelaySeconds: 5
       periodSeconds: 5
-      failureThreshold: 30
+      failureThreshold: 5
       successThreshold: 1
     readinessProbe:
@@ -194,7 +193,7 @@ spec:
         port: {{ .Values.opPort }}
       initialDelaySeconds: 5
       periodSeconds: 30
-      failureThreshold: 5
+      failureThreshold: 15
       successThreshold: 1