Warrior AI — QA Strategy

Test Strategy.
Sign-Off Ready.

Full QA coverage from unit tests through 10,000-user load simulation. Every item Anna needs to sign off (functional, load, security, and CI/CD gates) is documented with pass criteria and current status.

✓ WARAI-71 CORS Fixed
✓ WARAI-72 Rate Limiting Live
42 Security Checklist Items
22 Anna's Gate Items

3 Test Environments
100+ Unit Tests (planned)
10 E2E Scenarios
5 Load Test Phases
p95 < 3s Response Threshold
22 Sign-Off Checklist Items

Three QA Pillars

✅

Functional

Every endpoint works correctly, auth is solid, user data is isolated per Warrior.

Manually verified; automated suite still to be built

⚡

Load & Scale

System stays operational at 5,000–10,000 concurrent Warriors with sub-3s p95 response times.

Architecture designed — staged scaling plan documented

🔒

Security

Malicious users cannot break, extract, or corrupt Warrior data across any attack surface.

Two critical gaps resolved — full audit documented

Test Environments

All QA work uses Staging only. No test ever runs against the Production Firebase project.

Environment	Firebase Project	Purpose	Access
Staging	`attack-with-stack-staging`	All QA, load testing, integration testing	Steffen, Weston, QA
Beta	TBD	Pre-production validation with invited Warriors	Weston to provision
Production	TBD	Live system for 3,000+ Warriors	Post-launch only
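
The staging-only rule can be enforced in test setup rather than left to convention. The following is a minimal sketch, not the project's actual setup code: `assertStagingProject` is a hypothetical helper, and only the project ID `attack-with-stack-staging` comes from the table above.

```typescript
// Hypothetical guard for test setup: refuse to run against anything but Staging.
// Only the project ID below comes from the environments table; the function
// name and the way the ID is supplied are illustrative assumptions.
const STAGING_PROJECT_ID = "attack-with-stack-staging";

function assertStagingProject(projectId: string | undefined): void {
  if (projectId !== STAGING_PROJECT_ID) {
    throw new Error(
      `Refusing to run tests against Firebase project "${projectId ?? "(unset)"}"; ` +
        `expected "${STAGING_PROJECT_ID}".`,
    );
  }
}
```

A test runner could call this once before any suite starts, passing whatever environment variable holds the Firebase project ID (the variable name is left open here because the document does not specify it).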

Testing Pyramid

Layer	Count	Scope
E2E Tests	5–10 scenarios	Full conversation: Flutter → Gateway → Dify → Bridge → Firebase
Integration Tests	30–50 tests	Gateway ↔ Dify, Dify ↔ Bridge, Bridge ↔ Firestore
Unit Tests	100+ tests	Auth middleware, Zod schemas, route handlers

Unit Tests — Gateway

File	Test	Expected
`firebase-auth.test.ts`	Valid Firebase ID token	`200` — userId in context
`firebase-auth.test.ts`	Missing Authorization header	`401` — Unauthorized
`firebase-auth.test.ts`	Malformed bearer token	`401` — Unauthorized
`firebase-auth.test.ts`	Expired token	`401` — Unauthorized
`firebase-auth.test.ts`	Token for User A used by User B	`401` — invalid signature
`rate-limit.test.ts`	20 requests in 60s	All `200`
`rate-limit.test.ts`	21st request in window	`429` with Retry-After header
`rate-limit.test.ts`	Two users, each at 20 req	Both `200` — per-user limits
`rate-limit.test.ts`	/health — any request count	Always `200` — not rate limited
`chat.test.ts`	Valid auth + message → Dify mock	`200` SSE stream
`chat.test.ts`	Missing/empty message field	`400` — Bad Request
`chat.test.ts`	Dify returns 5xx	`502` — Upstream Error
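
The `rate-limit.test.ts` rows describe behaviour that a small per-user limiter would satisfy. The sketch below is one possible reading, not the Gateway's actual middleware: it assumes a fixed (not sliding) 60-second window, in-memory state, and an injected clock so tests stay deterministic. The class and type names are hypothetical.

```typescript
// Sketch of a per-user fixed-window limiter matching the rate-limit.test.ts rows.
// Assumptions: fixed window, in-memory state, caller supplies the current time.
type Decision = { status: 200 } | { status: 429; retryAfterSeconds: number };

interface WindowState {
  start: number; // timestamp (ms) when the current window opened
  count: number; // requests seen in the current window
}

class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, WindowState>();

  constructor(limit = 20, windowMs = 60_000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  check(userId: string, path: string, nowMs: number): Decision {
    // /health is never rate limited.
    if (path === "/health") return { status: 200 };

    const current = this.windows.get(userId);
    if (!current || nowMs - current.start >= this.windowMs) {
      // First request, or the previous window expired: open a new window.
      this.windows.set(userId, { start: nowMs, count: 1 });
      return { status: 200 };
    }
    if (current.count < this.limit) {
      current.count += 1;
      return { status: 200 };
    }
    // Over the limit: report the seconds remaining until the window resets.
    const retryAfterSeconds = Math.ceil((current.start + this.windowMs - nowMs) / 1000);
    return { status: 429, retryAfterSeconds };
  }
}
```

Keying the window map by user ID is what makes the "two users, each at 20 req" row pass: one Warrior exhausting the limit never affects another.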

Unit Tests — Firebase Bridge

Test	Expected
Stack content at 5,000 chars	Passes validation
Stack content at 5,001 chars	Rejected — content too long
Stack type not in allowed enum	Rejected — invalid enum
Missing required fields	Rejected — missing field error
`digital_trainer_stack` flag on create	Always set to `true`
GET /stacks/:userId — known user	`200` with array
POST /stacks/:userId — valid payload	`201` — `digital_trainer_stack: true`
PATCH /stacks/:userId/:stackId — non-existent	`404`
GET /context/:userId — cold Redis	Response < 300ms
GET /context/:userId — warm Redis	Response < 10ms
PUT /core4/:userId/:date — 6 of 8 toggles true	score = 75.0
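
The validation and scoring rows above can be expressed as two small pure functions. This is a sketch under stated assumptions: the field names, the `STACK_TYPES` values, and the error strings are placeholders, and the score formula (true toggles divided by total toggles, times 100) is inferred from the single 6 of 8 = 75.0 example. Only the 5,000-character limit and the forced `digital_trainer_stack: true` flag come directly from the table.

```typescript
// Sketch of the Bridge validation rules and Core4 scoring from the table.
// STACK_TYPES values and field names are hypothetical placeholders.
const MAX_CONTENT_CHARS = 5_000; // counted as JS string length (UTF-16 code units)
const STACK_TYPES = ["power", "placeholder"] as const;

type StackType = (typeof STACK_TYPES)[number];

interface StackRecord {
  type: StackType;
  content: string;
  digital_trainer_stack: true;
}

type ValidationResult =
  | { ok: true; stack: StackRecord }
  | { ok: false; error: string };

function validateStack(input: { type?: unknown; content?: unknown }): ValidationResult {
  if (input.type === undefined || input.content === undefined) {
    return { ok: false, error: "missing field" };
  }
  if (typeof input.content !== "string") {
    return { ok: false, error: "invalid content" };
  }
  if (input.content.length > MAX_CONTENT_CHARS) {
    return { ok: false, error: "content too long" };
  }
  const candidate = input.type;
  const matched = STACK_TYPES.find((t) => t === candidate);
  if (matched === undefined) {
    return { ok: false, error: "invalid enum" };
  }
  // The flag is always set by the Bridge, never taken from the caller.
  return { ok: true, stack: { type: matched, content: input.content, digital_trainer_stack: true } };
}

// Percentage of toggles that are true; 6 of 8 gives 75.
function core4Score(toggles: boolean[]): number {
  if (toggles.length === 0) return 0;
  const done = toggles.filter(Boolean).length;
  return (done / toggles.length) * 100;
}
```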

Anna's 10 E2E Test Scenarios

These are the scenarios that represent a passing QA sign-off. Each requires a real staging session.

#	Scenario	Pass Signal
E2E-01	Power Stack conversation (happy path)	SSE streams within 3s; agent correctly identified; prior stacks referenced
E2E-02	Coordinator routing accuracy	≥ 8/10 messages routed to correct specialist (80% threshold)
E2E-03	Firebase personalization	Agent's response references user's prior stack content
E2E-04	Stack write and verification	New stack in staging Firestore with `digital_trainer_stack: true`
E2E-05	Auth rejection	`401` within 200ms; no Dify call made
E2E-06	User data isolation	User A and User B see only their own data — zero cross-contamination
E2E-07	Rate limit enforcement	21st request returns `429` with `Retry-After`
E2E-08	Core4 score calculation	PUT Core4 with 6/8 true → score = 75.0 in Firestore
E2E-09	Bridge write rate limit	11th write/minute returns `429` from Bridge
E2E-10	Full voice conversation	Audio plays within 5s; no transcription gaps
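
The E2E-02 pass signal is a simple ratio check, which is worth pinning down so that "8 of 10" is evaluated the same way on every run. A minimal sketch, assuming each test message is recorded with the specialist it should reach and the specialist it actually reached; the `RoutingCase` shape and function names are hypothetical.

```typescript
// Sketch of the E2E-02 gate: at least 80% of messages routed to the right specialist.
// The RoutingCase shape is an assumption for illustration.
interface RoutingCase {
  message: string;
  expected: string; // specialist the message should reach
  actual: string; // specialist the coordinator actually chose
}

function routingAccuracy(cases: RoutingCase[]): number {
  if (cases.length === 0) return 0;
  const correct = cases.filter((c) => c.actual === c.expected).length;
  return correct / cases.length;
}

function passesRoutingGate(cases: RoutingCase[], threshold = 0.8): boolean {
  // The threshold is inclusive: exactly 8 of 10 passes.
  return routingAccuracy(cases) >= threshold;
}
```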

Load Testing Plan

Tool: k6. SSE-native, TypeScript-friendly, clean dashboards, CI-ready.

Phase Schedule

Phase	Users	Duration	Goal	Trigger
Baseline	0 → 100	1 hour	Find current ceiling	Now (before Garrett demo)
Confidence	0 → 200	2 hours	Confirm 200-user stability	After baseline passes
Stress	0 → 500	4 hours	Find the failure mode	Pre-beta launch
Scale	0 → 2,000	8 hours	Validate Step 1 upgrade	After VPS upgrade
Production Sim	0 → 5,000	24 hours	Validate Stage 2 architecture	Pre-production launch
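
The phase schedule maps naturally onto k6's `stages` and `thresholds` options. The sketch below only builds the options object as plain data so it can be checked without running k6. It assumes each phase is a single linear ramp from 0 to the target over the full duration, and it uses the p95 < 3s and error rate < 0.5% values as pass criteria. `buildK6Options` and the `Phase` type are hypothetical; `http_req_duration` and `http_req_failed` are k6 built-in metrics, though whether they capture SSE streaming latency adequately is not settled here.

```typescript
// Sketch: derive k6-style options (stages + thresholds) from the phase schedule.
// Assumption: one linear ramp per phase, from 0 to the target over the whole duration.
interface Phase {
  name: string;
  targetUsers: number;
  durationHours: number;
}

const PHASES: Phase[] = [
  { name: "Baseline", targetUsers: 100, durationHours: 1 },
  { name: "Confidence", targetUsers: 200, durationHours: 2 },
  { name: "Stress", targetUsers: 500, durationHours: 4 },
  { name: "Scale", targetUsers: 2000, durationHours: 8 },
  { name: "Production Sim", targetUsers: 5000, durationHours: 24 },
];

function buildK6Options(phaseName: string) {
  const phase = PHASES.find((p) => p.name === phaseName);
  if (!phase) throw new Error(`Unknown phase: ${phaseName}`);
  return {
    stages: [{ duration: `${phase.durationHours}h`, target: phase.targetUsers }],
    thresholds: {
      http_req_duration: ["p(95)<3000"], // p95 under 3s, expressed in milliseconds
      http_req_failed: ["rate<0.005"], // error rate under 0.5%
    },
  };
}
```

In a k6 script the result would be exported as `options`, for example `export const options = buildK6Options("Baseline")`.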

Metrics Thresholds

Metric	Green	Yellow	Red
p95 response time	< 3s	3–8s	> 8s
Error rate	< 0.5%	0.5–2%	> 2%
VPS RAM	< 9GB	9–11GB	> 11GB
VPS CPU	< 70%	70–85%	> 85%
Dify worker queue depth	< 10	10–50	> 50
Redis cache hit rate	> 80%	60–80%	< 60%
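
The threshold table can be encoded once and reused by dashboards and CI gates alike. The following sketch assumes that values falling exactly on a boundary (for example a p95 of exactly 3s) count as Yellow, which is one reading of the ranges above. The metric keys and the `classify` function are hypothetical names.

```typescript
// Sketch: classify a measured value as green, yellow, or red using the table above.
// Assumption: values exactly on a boundary are treated as yellow.
type Status = "green" | "yellow" | "red";

interface Band {
  greenLimit: number; // boundary between green and yellow
  redLimit: number; // boundary between yellow and red
  higherIsBetter?: boolean; // true for cache hit rate
}

const BANDS: Record<string, Band> = {
  p95Seconds: { greenLimit: 3, redLimit: 8 },
  errorRatePct: { greenLimit: 0.5, redLimit: 2 },
  vpsRamGb: { greenLimit: 9, redLimit: 11 },
  vpsCpuPct: { greenLimit: 70, redLimit: 85 },
  difyQueueDepth: { greenLimit: 10, redLimit: 50 },
  redisHitRatePct: { greenLimit: 80, redLimit: 60, higherIsBetter: true },
};

function classify(metric: string, value: number): Status {
  const band = BANDS[metric];
  if (!band) throw new Error(`Unknown metric: ${metric}`);
  if (band.higherIsBetter) {
    if (value > band.greenLimit) return "green";
    return value < band.redLimit ? "red" : "yellow";
  }
  if (value < band.greenLimit) return "green";
  return value > band.redLimit ? "red" : "yellow";
}
```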

Anna's Gate — Full Sign-Off Checklist

All items require evidence, not just "looks good". All of F-01–F-09 must be GREEN before the Garrett demo. All of S-01–S-20 must be GREEN before production launch.

7.1 Functional Readiness

F-01	Unit test suite exists for warrior-hono-gateway — all passing
F-02	Unit test suite exists for warrior-firebase-bridge — all passing
F-03	E2E-01: Happy path Power Stack conversation completes successfully
F-04	E2E-02: Coordinator routes ≥ 8/10 test messages to correct specialist
F-05	E2E-03: Agent references user's prior stacks in response
F-06	E2E-04: AI-created stack appears in Firebase with `digital_trainer_stack: true`
F-07	E2E-05: Expired token returns `401` within 200ms
F-08	E2E-06: User A cannot see User B's stack data
F-09	E2E-09: Bridge write rate limit (`429` on 11th write/minute) confirmed

7.2 Load Readiness

L-01	Baseline load test completed — results documented with p50/p95/p99
L-02	150 concurrent users: p95 response < 3s, error rate < 1%
L-03	VPS RAM < 10GB under 150-user sustained load
L-04	Redis cache hit rate > 80% under load
L-05	Celery worker queue depth < 50 under load
L-06	Scaling upgrade path documented and costed

7.3 Security Readiness — Minimum Pre-Launch

The full security analysis is in the Security Documentation. The items below are the minimum gate; all of S-01–S-20 are required before production launch.

S-01	ALLOWED_ORIGIN set in staging .env (WARAI-71 — merged)
S-02	Rate limiting confirmed: 21st request returns `429` (WARAI-72 — merged)
S-03	Direct HTTP to Bridge port 4000 from outside VPS: connection refused
S-04	Firebase service account credentials absent from all logs
S-05	Dify API key absent from any client-visible response
S-06	Input length limit on /chat message field implemented
S-07	Bridge write rate limit (ADR-W021 Redis pattern) implemented
S-08	warrior-hono-gateway binds to 127.0.0.1:3000 + nginx reverse proxy (WARAI-69)
S-09	Qdrant API key set, enforced, .env permissions verified at 600
S-10	dify-sandbox on bridge mode; cannot reach dify-db or dify-redis
S-11	SSRF proxy blocklist covers Docker bridge gateway IP range
S-12	SSH port 22 restricted to Twingate egress IPs — not open to 0.0.0.0/0
S-13	Firebase .env permissions at 600; git history scanned clean
S-14	Gateway calls `verifyIdToken` with `checkRevoked: true`
S-15	Bridge write path uses HMAC-signed user token — not bare Dify user_id (ADR-W026)
S-16	Firestore Security Rules: `request.auth.uid == userId` enforced
S-17	Injection pattern blocklist active at Gateway before Dify handoff
S-18	Identity anchor + security boundary in all 7 agent system prompts
S-19	Pre-write Code Node validation gate in Dify (user_id from system variable only)
S-20	Bridge Docker container isolated from dify-sandbox and Dify LLM containers (ADR-W027)
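
For S-11, evidence could take the form of a test that asserts known bridge addresses fall inside the blocked range. The sketch below illustrates only the range check itself; how the SSRF proxy is actually configured is outside its scope. It assumes `172.17.0.0/16`, which is Docker's default bridge subnet; the real range on the VPS may differ and should be confirmed with `docker network inspect bridge`. All function names are hypothetical.

```typescript
// Sketch for S-11: test whether an IPv4 address falls inside a blocked CIDR range.
// Assumption: 172.17.0.0/16 (Docker's default bridge subnet) is the range to block.
function ipv4ToInt(ip: string): number {
  if (!/^\d{1,3}(\.\d{1,3}){3}$/.test(ip)) {
    throw new Error(`Invalid IPv4 address: ${ip}`);
  }
  const parts = ip.split(".").map(Number);
  if (parts.some((p) => p > 255)) {
    throw new Error(`Invalid IPv4 address: ${ip}`);
  }
  // Fold the four octets into one unsigned 32-bit value.
  return parts.reduce((acc, p) => acc * 256 + p, 0);
}

function inCidr(ip: string, cidr: string): boolean {
  const [base, bitsText] = cidr.split("/");
  const bits = Number(bitsText);
  if (base === undefined || !Number.isInteger(bits) || bits < 0 || bits > 32) {
    throw new Error(`Invalid CIDR: ${cidr}`);
  }
  // Two addresses share a network when their values agree after dropping the host bits.
  const hostSize = Math.pow(2, 32 - bits);
  return Math.floor(ipv4ToInt(ip) / hostSize) === Math.floor(ipv4ToInt(base) / hostSize);
}

const BLOCKED_RANGES = ["172.17.0.0/16"]; // assumed default Docker bridge subnet

function isBlocked(ip: string): boolean {
  return BLOCKED_RANGES.some((range) => inCidr(ip, range));
}
```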

7.4 CI/CD Readiness

CI-01	GitHub Actions workflow runs unit tests on every push to dev
CI-02	`bun run typecheck` passes in CI for both gateway and bridge repos
CI-03	Load test baseline runs as scheduled nightly job or on release-candidate tags — not on every push to main

Open Items

Item	Owner	Priority	Blocks
Set ALLOWED_ORIGIN in staging .env	Steffen / Weston	HIGH	S-01
Implement Bridge write rate limit (ADR-W021 Redis)	Jeremy	HIGH	S-07
Add input length limit on Gateway /chat message	Jeremy	MEDIUM	S-06
Build unit test suite for both repos	Jeremy	HIGH	F-01, F-02
Set up GitHub Actions CI pipeline	Jeremy	MEDIUM	CI-01
Provision Beta Firebase project	Weston	MEDIUM	Beta launch
Full security audit — prompt injection, Dify sandboxing	Separate session	HIGH	Production
