Warrior AI — QA Strategy

Test Strategy.
Sign-Off Ready.

Full QA coverage from unit tests through 10,000-user load simulation. Every item Anna needs to sign off — functional, load, security, and CI/CD gates — documented with pass criteria and current status.

✓ WARAI-71 CORS Fixed
✓ WARAI-72 Rate Limiting Live
42 Security Checklist Items
22 Anna's Gate Items
3 Test Environments
100+ Unit Tests (planned)
10 E2E Scenarios
5 Load Test Phases
p95 < 3s Response Threshold
22 Sign-Off Checklist Items

Three QA Pillars

Functional

Every endpoint works correctly, auth is solid, user data is isolated per Warrior.

Manually verified — automated suite to build

Load & Scale

System stays operational at 5,000–10,000 concurrent Warriors with sub-3s p95 response times.

Architecture designed — staged scaling plan documented

Security

Malicious users cannot break, extract, or corrupt Warrior data across any attack surface.

Two critical gaps resolved — full audit documented

Test Environments

All QA work uses Staging only. No test ever runs against the Production Firebase project.

Environment | Firebase Project | Purpose | Access
Staging | attack-with-stack-staging | All QA, load testing, integration testing | Steffen, Weston, QA
Beta | TBD | Pre-production validation with invited Warriors | Weston to provision
Production | TBD | Live system for 3,000+ Warriors | Post-launch only
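
The staging-only rule can be enforced in code rather than by convention. The following is a minimal sketch, assuming the test harness knows which Firebase project ID it is about to use; the function name `assertStagingProject` is hypothetical and not part of either repo.

```typescript
// Staging project ID taken from the environments table.
const STAGING_PROJECT_ID = "attack-with-stack-staging";

// Hypothetical guard: throws unless the target project is the staging project.
function assertStagingProject(projectId: string | undefined): string {
  if (projectId !== STAGING_PROJECT_ID) {
    const shown = projectId ?? "(unset)";
    throw new Error(
      `Refusing to run tests against "${shown}"; expected "${STAGING_PROJECT_ID}"`
    );
  }
  return STAGING_PROJECT_ID;
}
```

Calling a guard like this once in test setup would make a misconfigured run fail before any request is sent.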

Testing Pyramid

QA Testing Pyramid — Unit, Integration, E2E
E2E Tests | 5–10 scenarios | Full conversation: Flutter → Gateway → Dify → Bridge → Firebase
Integration Tests | 30–50 tests | Gateway ↔ Dify, Dify ↔ Bridge, Bridge ↔ Firestore
Unit Tests | 100+ tests | Auth middleware, Zod schemas, route handlers

Unit Tests — Gateway

File | Test | Expected
firebase-auth.test.ts | Valid Firebase ID token | 200 — userId in context
firebase-auth.test.ts | Missing Authorization header | 401 — Unauthorized
firebase-auth.test.ts | Malformed bearer token | 401 — Unauthorized
firebase-auth.test.ts | Expired token | 401 — Unauthorized
firebase-auth.test.ts | Token for User A used by User B | 401 — invalid signature
rate-limit.test.ts | 20 requests in 60s | All 200
rate-limit.test.ts | 21st request in window | 429 with Retry-After header
rate-limit.test.ts | Two users, each at 20 req | Both 200 — per-user limits
rate-limit.test.ts | /health — any request count | Always 200 — not rate limited
chat.test.ts | Valid auth + message → Dify mock | 200 SSE stream
chat.test.ts | Missing/empty message field | 400 — Bad Request
chat.test.ts | Dify returns 5xx | 502 — Upstream Error
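
The rate-limit rows describe a per-user limit of 20 requests per 60 seconds, with the 21st request answered by 429 and a Retry-After value. Below is a minimal in-memory sketch of that behaviour, assuming a fixed-window algorithm; the Gateway's actual algorithm and storage are not stated here, and `FixedWindowLimiter` is a hypothetical name. The clock is passed in so tests stay deterministic.

```typescript
type RateDecision =
  | { allowed: true; status: 200 }
  | { allowed: false; status: 429; retryAfterSeconds: number };

// Hypothetical fixed-window limiter keyed by user ID (assumption: fixed window).
class FixedWindowLimiter {
  private windows = new Map<string, { start: number; count: number }>();
  private limit: number;
  private windowMs: number;

  constructor(limit = 20, windowMs = 60_000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  check(userId: string, nowMs: number): RateDecision {
    const current = this.windows.get(userId);
    // First request, or the previous window has expired: start a new window.
    if (!current || nowMs - current.start >= this.windowMs) {
      this.windows.set(userId, { start: nowMs, count: 1 });
      return { allowed: true, status: 200 };
    }
    if (current.count < this.limit) {
      current.count += 1;
      return { allowed: true, status: 200 };
    }
    // Over the limit: report how long until the window resets.
    const retryAfterSeconds = Math.ceil(
      (current.start + this.windowMs - nowMs) / 1000
    );
    return { allowed: false, status: 429, retryAfterSeconds };
  }
}
```

Because state is keyed by user ID, two users each at 20 requests are both allowed. A route such as /health would simply never call `check`.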

Unit Tests — Firebase Bridge

Test | Expected
Stack content at 5,000 chars | Passes validation
Stack content at 5,001 chars | Rejected — content too long
Stack type not in allowed enum | Rejected — invalid enum
Missing required fields | Rejected — missing field error
digital_trainer_stack flag on create | Always set to true
GET /stacks/:userId — known user | 200 with array
POST /stacks/:userId — valid payload | 201 with digital_trainer_stack: true
PATCH /stacks/:userId/:stackId — non-existent | 404
GET /context/:userId — cold Redis | Response < 300ms
GET /context/:userId — warm Redis | Response < 10ms
PUT /core4/:userId/:date — 6 of 8 toggles true | score = 75.0
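
The validation rows above (length boundary, enum check, required fields, forced flag) can be illustrated with a hand-written validator. This is a sketch only: the Bridge is described as using Zod schemas, which are swapped here for plain checks so the example stays self-contained, and the field names `type` and `content` and the enum values are hypothetical.

```typescript
const MAX_CONTENT_CHARS = 5000; // boundary from the test table

// Hypothetical enum values; the real allowed list is not given in this document.
const ALLOWED_STACK_TYPES = ["power", "example-other"] as const;
type StackType = (typeof ALLOWED_STACK_TYPES)[number];

interface StoredStack {
  type: StackType;
  content: string;
  digital_trainer_stack: true;
}

type ValidationResult =
  | { ok: true; stack: StoredStack }
  | { ok: false; error: "missing field" | "invalid enum" | "content too long" };

function validateStack(input: { type?: unknown; content?: unknown }): ValidationResult {
  const rawType = input.type;
  const content = input.content;
  if (typeof rawType !== "string" || typeof content !== "string") {
    return { ok: false, error: "missing field" };
  }
  const type = ALLOWED_STACK_TYPES.find((t) => t === rawType);
  if (type === undefined) return { ok: false, error: "invalid enum" };
  if (content.length > MAX_CONTENT_CHARS) {
    return { ok: false, error: "content too long" };
  }
  // The flag is set by the server and never read from the payload.
  return { ok: true, stack: { type, content, digital_trainer_stack: true } };
}
```

Setting `digital_trainer_stack` inside the validator, instead of trusting the caller, is one way to satisfy the "always set to true" row.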

Anna's 10 E2E Test Scenarios

Passing all ten scenarios below constitutes QA sign-off. Each must be run in a real staging session.

# | Scenario | Pass Signal
E2E-01 | Power Stack conversation (happy path) | SSE streams within 3s; agent correctly identified; prior stacks referenced
E2E-02 | Coordinator routing accuracy | ≥ 8/10 messages routed to correct specialist (80% threshold)
E2E-03 | Firebase personalization | Agent's response references user's prior stack content
E2E-04 | Stack write and verification | New stack in staging Firestore with digital_trainer_stack: true
E2E-05 | Auth rejection | 401 within 200ms; no Dify call made
E2E-06 | User data isolation | User A and User B see only their own data — zero cross-contamination
E2E-07 | Rate limit enforcement | 21st request returns 429 with Retry-After
E2E-08 | Core4 score calculation | PUT Core4 with 6/8 true → score = 75.0 in Firestore
E2E-09 | Bridge write rate limit | 11th write/minute returns 429 from Bridge
E2E-10 | Full voice conversation | Audio plays within 5s; no transcription gaps
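
E2E-08 expects 6 of 8 true toggles to yield a score of 75.0, which is consistent with a plain percentage of completed toggles. The sketch below assumes that formula; the Bridge's real calculation is not shown in this document, and `core4Score` is a hypothetical name.

```typescript
// Assumption: score = percentage of toggles set to true, rounded to one decimal.
function core4Score(toggles: boolean[]): number {
  if (toggles.length === 0) return 0;
  const done = toggles.filter(Boolean).length;
  return Math.round((done / toggles.length) * 1000) / 10;
}
```

For example, `core4Score([true, true, true, true, true, true, false, false])` gives 75, matching the pass signal for E2E-08.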

Load Testing Plan

Tool: k6. TypeScript-friendly, clean dashboards, CI-ready. SSE streams need an extension such as xk6-sse, since core k6 has no built-in SSE client.

Phase Schedule

Phase | Users | Duration | Goal | Trigger
Baseline | 0 → 100 | 1 hour | Find current ceiling | Now (before Garrett demo)
Confidence | 0 → 200 | 2 hours | Confirm 200-user stability | After baseline passes
Stress | 0 → 500 | 4 hours | Find the failure mode | Pre-beta launch
Scale | 0 → 2,000 | 8 hours | Validate Step 1 upgrade | After VPS upgrade
Production Sim | 0 → 5,000 | 24 hours | Validate Stage 2 architecture | Pre-production launch
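
The phase schedule maps naturally onto k6's `options.stages` array, whose entries have a `duration` string and a `target` user count. The sketch below only builds that array and does not import k6. It assumes each phase is a single linear ramp from 0 to the target over the full duration, which the schedule does not actually specify; `stagesFor` is a hypothetical helper.

```typescript
interface Phase { name: string; targetUsers: number; hours: number }
interface Stage { duration: string; target: number } // shape of a k6 stage entry

// Values copied from the phase schedule.
const PHASES: Phase[] = [
  { name: "Baseline", targetUsers: 100, hours: 1 },
  { name: "Confidence", targetUsers: 200, hours: 2 },
  { name: "Stress", targetUsers: 500, hours: 4 },
  { name: "Scale", targetUsers: 2000, hours: 8 },
  { name: "Production Sim", targetUsers: 5000, hours: 24 },
];

// Assumption: one linear ramp per phase; a ramp-then-hold profile may be intended.
function stagesFor(phaseName: string): Stage[] {
  const phase = PHASES.find((p) => p.name === phaseName);
  if (!phase) throw new Error(`Unknown phase: ${phaseName}`);
  return [{ duration: `${phase.hours}h`, target: phase.targetUsers }];
}
```

In a k6 script the result would be assigned to `options.stages`, so one script could serve all five phases by switching the phase name.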

Metrics Thresholds

Metric | Green | Yellow | Red
p95 response time | < 3s | 3–8s | > 8s
Error rate | < 0.5% | 0.5–2% | > 2%
VPS RAM | < 9GB | 9–11GB | > 11GB
VPS CPU | < 70% | 70–85% | > 85%
Dify worker queue depth | < 10 | 10–50 | > 50
Redis cache hit rate | > 80% | 60–80% | < 60%
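
These thresholds can be encoded once and reused by whatever reports load-test results. The following is a sketch, with hypothetical metric keys and a hypothetical `classify` function; boundary values (for example exactly 3s) are treated as Yellow, following the ranges in the table.

```typescript
type Status = "green" | "yellow" | "red";
type Metric =
  | "p95Seconds" | "errorRatePct" | "ramGb"
  | "cpuPct" | "queueDepth" | "cacheHitPct";

interface Threshold { green: number; red: number; higherIsBetter: boolean }

// Numbers copied from the metrics table; key names are illustrative.
const THRESHOLDS: Record<Metric, Threshold> = {
  p95Seconds: { green: 3, red: 8, higherIsBetter: false },
  errorRatePct: { green: 0.5, red: 2, higherIsBetter: false },
  ramGb: { green: 9, red: 11, higherIsBetter: false },
  cpuPct: { green: 70, red: 85, higherIsBetter: false },
  queueDepth: { green: 10, red: 50, higherIsBetter: false },
  cacheHitPct: { green: 80, red: 60, higherIsBetter: true },
};

function classify(metric: Metric, value: number): Status {
  const t = THRESHOLDS[metric];
  if (t.higherIsBetter) {
    if (value > t.green) return "green";
    return value < t.red ? "red" : "yellow";
  }
  if (value < t.green) return "green";
  return value > t.red ? "red" : "yellow";
}
```

Redis cache hit rate is the only metric where a higher value is better, which is why the comparison direction is carried in the data instead of being hard-coded.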

Anna's Gate — Full Sign-Off Checklist

All items require evidence, not just "looks good". All of F-01–F-09 must be GREEN before the Garrett demo. All of S-01–S-20 must be GREEN before production launch.

7.1 Functional Readiness

F-01 | Unit test suite exists for warrior-hono-gateway — all passing
F-02 | Unit test suite exists for warrior-firebase-bridge — all passing
F-03 | E2E-01: Happy path Power Stack conversation completes successfully
F-04 | E2E-02: Coordinator routes ≥ 8/10 test messages to correct specialist
F-05 | E2E-03: Agent references user's prior stacks in response
F-06 | E2E-04: AI-created stack appears in Firebase with digital_trainer_stack: true
F-07 | E2E-05: Expired token returns 401 within 200ms
F-08 | E2E-06: User A cannot see User B's stack data
F-09 | E2E-09: Bridge write rate limit (429 on 11th write/minute) confirmed

7.2 Load Readiness

L-01 | Baseline load test completed — results documented with p50/p95/p99
L-02 | 150 concurrent users: p95 response < 3s, error rate < 1%
L-03 | VPS RAM < 10GB under 150-user sustained load
L-04 | Redis cache hit rate > 80% under load
L-05 | Celery worker queue depth < 50 under load
L-06 | Scaling upgrade path documented and costed
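
L-01 asks for p50/p95/p99 figures. As a reference for what those numbers mean, here is a small sketch using the nearest-rank definition of a percentile. Load tools may interpolate between samples instead, so their reported values can differ slightly from this calculation; `percentile` and `summarize` are hypothetical helpers.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[rank - 1] as number;
}

function summarize(samplesMs: number[]): { p50: number; p95: number; p99: number } {
  return {
    p50: percentile(samplesMs, 50),
    p95: percentile(samplesMs, 95),
    p99: percentile(samplesMs, 99),
  };
}
```

With 100 samples of 1 ms through 100 ms, `summarize` returns 50, 95, and 99 for p50, p95, and p99 respectively.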

7.3 Security Readiness — Minimum Pre-Launch

The full security analysis is in the Security Documentation. The items below are the minimum gate: all of S-01–S-20 are required before production launch.

S-01 | ALLOWED_ORIGIN set in staging .env (WARAI-71 — merged)
S-02 | Rate limiting confirmed: 21st request returns 429 (WARAI-72 — merged)
S-03 | Direct HTTP to Bridge port 4000 from outside VPS: connection refused
S-04 | Firebase service account credentials absent from all logs
S-05 | Dify API key absent from any client-visible response
S-06 | Input length limit on /chat message field implemented
S-07 | Bridge write rate limit (ADR-W021 Redis pattern) implemented
S-08 | warrior-hono-gateway binds to 127.0.0.1:3000 + nginx reverse proxy (WARAI-69)
S-09 | Qdrant API key set, enforced, .env permissions verified at 600
S-10 | dify-sandbox on bridge mode; cannot reach dify-db or dify-redis
S-11 | SSRF proxy blocklist covers Docker bridge gateway IP range
S-12 | SSH port 22 restricted to Twingate egress IPs — not open to 0.0.0.0/0
S-13 | Firebase .env permissions at 600; git history scanned clean
S-14 | Gateway calls verifyIdToken with checkRevoked: true
S-15 | Bridge write path uses HMAC-signed user token — not bare Dify user_id (ADR-W026)
S-16 | Firestore Security Rules: request.auth.uid == userId enforced
S-17 | Injection pattern blocklist active at Gateway before Dify handoff
S-18 | Identity anchor + security boundary in all 7 agent system prompts
S-19 | Pre-write Code Node validation gate in Dify (user_id from system variable only)
S-20 | Bridge Docker container isolated from dify-sandbox and Dify LLM containers (ADR-W027)
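
S-06 and S-17 both describe checks the Gateway applies to the /chat message before handing it to Dify. The sketch below shows one way those two checks could be combined. The length limit, the patterns, and the 400 status for a blocklist hit are all hypothetical, since this document specifies none of them, and `guardChatMessage` is an illustrative name.

```typescript
// Hypothetical values: neither the real limit nor the real patterns appear in this document.
const MAX_MESSAGE_CHARS = 4000;
const BLOCKED_PATTERNS: RegExp[] = [
  /ignore (all )?previous instructions/i,
  /reveal (your )?system prompt/i,
];

type GuardResult = { ok: true } | { ok: false; status: 400; reason: string };

function guardChatMessage(message: unknown): GuardResult {
  if (typeof message !== "string" || message.trim() === "") {
    return { ok: false, status: 400, reason: "message is required" };
  }
  const text = message;
  if (text.length > MAX_MESSAGE_CHARS) {
    return { ok: false, status: 400, reason: "message too long" };
  }
  // Patterns have no global flag, so RegExp.test keeps no state between calls.
  if (BLOCKED_PATTERNS.some((re) => re.test(text))) {
    return { ok: false, status: 400, reason: "message matches blocklist" };
  }
  return { ok: true };
}
```

A pattern blocklist of this kind is only a first filter; the remaining items in the list (system prompt boundaries, the pre-write validation gate) cover inputs it would miss.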

7.4 CI/CD Readiness

CI-01 | GitHub Actions workflow runs unit tests on every push to dev
CI-02 | bun run typecheck passes in CI for both gateway and bridge repos
CI-03 | Load test baseline runs as scheduled nightly job or on release-candidate tags — not on every push to main

Open Items

Item | Owner | Priority | Blocks
Set ALLOWED_ORIGIN in staging .env | Steffen / Weston | HIGH | S-01
Implement Bridge write rate limit (ADR-W021 Redis) | Jeremy | HIGH | S-07
Add input length limit on Gateway /chat message | Jeremy | MEDIUM | S-06
Build unit test suite for both repos | Jeremy | HIGH | F-01, F-02
Set up GitHub Actions CI pipeline | Jeremy | MEDIUM | CI-01
Provision Beta Firebase project | Weston | MEDIUM | Beta launch
Full security audit — prompt injection, Dify sandboxing | Separate session | HIGH | Production
