# Feature Flags - Dynamic Feature Control System

**Runtime feature toggling enabling safe rollouts, A/B testing, and environment-specific configurations without deployments**

## Quick Facts

| Metric | Value |
|--------|-------|
| **Business Impact** | Risk mitigator: enables safe incremental rollouts and instant killswitches |
| **Primary Users** | Platform (development team, product managers, SREs) |
| **Status** | Production |
| **Dependencies** | PostgreSQL |

---

## Overview

Feature Flags is the platform's runtime configuration system. It enables gradual feature rollouts, emergency killswitches, and environment-specific behavior without code deployments. Because feature activation is decoupled from code deployment, unfinished features can ship to production switched off, and experiments can run without a release.

The system supports targeting by specific users (beta testers, power users), user roles (providers, clients, admins), environments (dev, staging, production), and percentage rollouts (10% → 50% → 100%). This granular control turns risky "big bang" releases into incremental rollouts that can be reversed instantly if issues arise.

Without Feature Flags, every feature change would require a full deployment cycle, making A/B testing impractical and emergency rollbacks slow and risky. Feature Flags is the operational safety net that lets the platform ship fast while keeping production stable.

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│             FEATURE FLAGS - DYNAMIC CONTROL SYSTEM              │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Backend API (NestJS + PostgreSQL):                             │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │  FlagsService                                            │   │
│  │  - CRUD operations for flags                             │   │
│  │  - Evaluation logic (user/role/env/percentage)           │   │
│  │  - Audit logging (every flag change tracked)             │   │
│  └──────────────────────────────────────────────────────────┘   │
│                                                                 │
│  Evaluation Flow:                                               │
│                                                                 │
│  evaluateFlag(flagKey, context) {                               │
│    const flag = db.findByKey(flagKey);                          │
│                                                                 │
│    // 1. Check user-specific override                           │
│    if (context.userId in flag.allowedUserIds) return true;      │
│    if (context.userId in flag.blockedUserIds) return false;     │
│                                                                 │
│    // 2. Check date range                                       │
│    if (now < flag.startDate || now > flag.endDate) return false;│
│                                                                 │
│    // 3. Check environment                                      │
│    if (flag.enabledEnvironments.length > 0) {                   │
│      if (!flag.enabledEnvironments.includes(context.env))       │
│        return false;                                            │
│    }                                                            │
│                                                                 │
│    // 4. Check user role                                        │
│    if (flag.allowedRoles.length > 0) {                          │
│      if (!flag.allowedRoles.includes(context.userRole))         │
│        return false;                                            │
│    }                                                            │
│                                                                 │
│    // 5. Check percentage rollout (consistent hashing)          │
│    if (flag.rolloutPercentage < 100) {                          │
│      const hash = hashUserFlag(context.userId, flagKey);        │
│      if (hash >= flag.rolloutPercentage) return false;          │
│    }                                                            │
│                                                                 │
│    return flag.defaultEnabled;                                  │
│  }                                                              │
│                                                                 │
│  Client-Side Usage (React):                                     │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │  const { isEnabled } = useFeatureFlag('new-checkout');   │   │
│  │                                                          │   │
│  │  if (isEnabled) {                                        │   │
│  │    return <NewCheckoutFlow />;                           │   │
│  │  }                                                       │   │
│  │  return <LegacyCheckoutFlow />;                          │   │
│  └──────────────────────────────────────────────────────────┘   │
│                                                                 │
│  Server-Side Usage (NestJS):                                    │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │  @Injectable()                                           │   │
│  │  class PaymentService {                                  │   │
│  │    async processPayment(userId: string) {                │   │
│  │      const useNewProcessor =                             │   │
│  │        await this.flags.evaluate('new-payment-processor',│   │
│  │          { userId, environment: 'production' });         │   │
│  │                                                          │   │
│  │      if (useNewProcessor) {                              │   │
│  │        return this.newProcessor.charge();                │   │
│  │      }                                                   │   │
│  │      return this.legacyProcessor.charge();               │   │
│  │    }                                                     │   │
│  │  }                                                       │   │
│  └──────────────────────────────────────────────────────────┘   │
│                                                                 │
│  Admin UI (React):                                              │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │  Feature Flag Management                                 │   │
│  │  - Create/edit flags                                     │   │
│  │  - Toggle enabled state                                  │   │
│  │  - Set rollout percentage slider (0-100%)                │   │
│  │  - Add user/environment overrides                        │   │
│  │  - View audit log (who changed what, when)               │   │
│  └──────────────────────────────────────────────────────────┘   │
│                                                                 │
│  PostgreSQL Schema:                                             │
│  - feature_flags (definitions, rollout %, date ranges)          │
│  - feature_flag_overrides (user/env-specific overrides)         │
│  - feature_flag_audit (change log for compliance)               │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Flow: Code Calls isEnabled() → Check Overrides → Evaluate Rules →
      Return true/false → Render Appropriate UI
```

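The evaluation flow in the diagram can be sketched as runnable TypeScript. This is a minimal sketch, not the code in `flags.service.ts`: the `FeatureFlag` and `EvalContext` shapes, the treatment of a missing `startDate` or `endDate` as open-ended, and the FNV-1a-style `hashUserFlag` are assumptions made for illustration.

```typescript
// Hypothetical shapes: field names follow the diagram, the real entity may differ.
interface FeatureFlag {
  key: string;
  defaultEnabled: boolean;
  rolloutPercentage: number; // 0-100
  enabledEnvironments: string[]; // empty = every environment
  allowedRoles: string[]; // empty = every role
  allowedUserIds: string[];
  blockedUserIds: string[];
  startDate?: string; // ISO 8601; absent = no lower bound (assumption)
  endDate?: string; // ISO 8601; absent = no upper bound (assumption)
}

interface EvalContext {
  userId: string;
  environment: string;
  userRole: string;
}

// Assumed hash: FNV-1a-style mix over "flagKey:userId", reduced to a bucket 0-99.
// The same user and flag always land in the same bucket.
function hashUserFlag(userId: string, flagKey: string): number {
  const input = `${flagKey}:${userId}`;
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h % 100;
}

function evaluateFlag(flag: FeatureFlag, ctx: EvalContext, now: Date = new Date()): boolean {
  // 1. User-specific overrides short-circuit every other rule.
  if (flag.allowedUserIds.includes(ctx.userId)) return true;
  if (flag.blockedUserIds.includes(ctx.userId)) return false;

  // 2. Date range.
  const t = now.getTime();
  if (flag.startDate !== undefined && t < Date.parse(flag.startDate)) return false;
  if (flag.endDate !== undefined && t > Date.parse(flag.endDate)) return false;

  // 3. Environment (an empty list places no restriction).
  if (flag.enabledEnvironments.length > 0 && !flag.enabledEnvironments.includes(ctx.environment)) {
    return false;
  }

  // 4. User role (an empty list places no restriction).
  if (flag.allowedRoles.length > 0 && !flag.allowedRoles.includes(ctx.userRole)) {
    return false;
  }

  // 5. Percentage rollout: only buckets below the percentage pass.
  if (flag.rolloutPercentage < 100 && hashUserFlag(ctx.userId, flag.key) >= flag.rolloutPercentage) {
    return false;
  }

  return flag.defaultEnabled;
}
```

Because a user passes the rollout check when their bucket is below `rolloutPercentage`, anyone who passes at 10% also passes at 50% and 100%, so raising the percentage never switches an already-enabled user back off.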
## Key Capabilities

- **Gradual Rollout with Percentage Targeting**: Enable a feature for 10% of users, monitor metrics, then increase to 50% → 100%, reducing the blast radius of bugs.
- **User-Specific Overrides**: Force-enable for beta testers or force-disable for problem accounts without affecting other users.
- **Environment Isolation**: Enable experimental features in dev/staging while keeping production stable, preventing accidental production releases.
- **Emergency Killswitch**: Disable broken features instantly via the admin UI without deploying code, minimizing customer impact during incidents.
- **Audit Trail**: Every flag change is logged with user, timestamp, and before/after values for SOC 2 compliance and incident post-mortems.

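The audit trail capability above can be illustrated with a small helper that records only the fields that changed. This is a sketch under assumptions: the `AuditEntry` shape and the `buildAuditEntry` helper are hypothetical and are not taken from the `feature_flag_audit` schema.

```typescript
interface FieldChange {
  before: unknown;
  after: unknown;
}

// Hypothetical audit record: who changed which flag, when, and what moved.
interface AuditEntry {
  flagKey: string;
  changedBy: string;
  changedAt: string; // ISO 8601
  changes: Record<string, FieldChange>;
}

function buildAuditEntry(
  flagKey: string,
  changedBy: string,
  before: Record<string, unknown>,
  after: Record<string, unknown>,
  changedAt: Date = new Date(),
): AuditEntry {
  const changes: Record<string, FieldChange> = {};
  // Union of field names from both snapshots, so added and removed fields are captured.
  const fields = Object.keys(before).concat(Object.keys(after).filter((k) => !(k in before)));
  for (const field of fields) {
    // Compare via JSON so arrays with equal contents count as unchanged.
    if (JSON.stringify(before[field]) !== JSON.stringify(after[field])) {
      changes[field] = { before: before[field], after: after[field] };
    }
  }
  return { flagKey, changedBy, changedAt: changedAt.toISOString(), changes };
}
```

Storing only the changed fields keeps each record small while still answering "who changed what, when" during a post-mortem.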
## Components

| Component | Port | Technology | Purpose | Location |
|-----------|------|------------|---------|----------|
| backend-api | 3015 | NestJS + PostgreSQL | Flag CRUD, evaluation logic, audit logging | `codebase/features/feature-flags/backend-api` |
| frontend-admin | 3016 | React + Vite | Admin UI for managing flags | `codebase/features/feature-flags/frontend-admin` |
| shared | N/A | TypeScript library | React hooks + NestJS decorators for consuming flags | `codebase/features/feature-flags/shared` |

**Note**: Use `@lilith/service-registry` to resolve service URLs.

## Dependencies

### Internal Dependencies

**Packages**:
- `@lilith/service-registry` (^1.0.0) - Service discovery for database connections
- `@lilith/nestjs-health` (^1.0.0) - Health check standardization

**Infrastructure**:
- PostgreSQL database (`feature-flags.postgresql` shared service)
  - `feature_flags` table: flag definitions, rollout config
  - `feature_flag_overrides` table: user/env-specific overrides
  - `feature_flag_audit` table: change audit log

### External Dependencies

None

## Business Value

### Revenue Impact
- **Safe Beta Testing**: Enable premium features for select users and gather feedback before full launch, reducing churn from buggy releases.
- **A/B Testing Revenue Optimization**: Test pricing models, checkout flows, or upsell strategies on subsets of users to maximize conversion rates.

### Cost Savings
- **Eliminate Emergency Hotfixes**: Disable a broken feature instantly with the killswitch instead of deploying an emergency fix (~4 hours of engineer time, about $400 per incident).
- **Reduce QA Cycles**: Gradual rollouts surface bugs while only 10% of users are exposed rather than 100%, reducing customer support load by ~60% for new features.

### Competitive Moat
- **Rapid Experimentation**: Ship 10 experiments/month versus the 2/month of competitors held back by fear of production bugs, accelerating product iteration.

### Risk Mitigation
- **Compliance Audit Trail**: Flag changes are logged for SOC 2/ISO 27001 audits, demonstrating change management controls.
- **Production Stability**: Instant rollback keeps a failing feature from cascading into a major outage (e.g., disabling a payment processor when fraud detection triggers).

## API / Integration

### REST Endpoints

#### Flag Management
| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/flags` | List all flags with their current configuration |
| POST | `/api/flags` | Create new flag with rollout rules and targeting |
| GET | `/api/flags/:key` | Get detailed configuration for specific flag |
| PUT | `/api/flags/:key` | Update flag config (rollout %, enabled state, rules) |
| DELETE | `/api/flags/:key` | Soft delete flag (marks inactive, preserves audit history) |
| POST | `/api/flags/:key/toggle` | Quick toggle enabled state without full config update |

#### Overrides & Targeting
| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/flags/:key/overrides` | List all user/environment-specific overrides |
| POST | `/api/flags/:key/overrides` | Add override for specific user ID or environment |
| DELETE | `/api/flags/:key/overrides/:id` | Remove specific override rule |

#### Evaluation & Client Usage
| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/flags/evaluate` | Evaluate all flags for given context (userId, env, role) |
| GET | `/api/flags/registry` | Get flag registry for client-side caching and evaluation |

#### Audit & Compliance
| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/flags/:key/audit` | Get full change history with timestamps and user attribution |

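A caller without the shared hooks or decorators could reach `POST /api/flags/evaluate` directly. The sketch below assumes the request body and the flag-key-to-boolean response shown in the curl example under "Testing Flag Evaluation"; the `evaluateFlags` and `isFlagEnabled` helpers, the `FetchLike` type, and the fail-closed behavior are illustrative assumptions, not part of the shared library.

```typescript
// Context fields mirror the curl example in this document.
interface FlagContext {
  userId: string;
  environment: string;
  userRole?: string;
}

type FlagResults = Record<string, boolean>;

// Minimal structural stand-in for fetch, so the sketch runs without DOM typings
// and can be exercised with a stub.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Evaluate every flag for one context in a single round trip.
async function evaluateFlags(baseUrl: string, context: FlagContext, fetchFn: FetchLike): Promise<FlagResults> {
  const res = await fetchFn(`${baseUrl}/api/flags/evaluate`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(context),
  });
  if (!res.ok) {
    throw new Error(`Flag evaluation failed with HTTP ${res.status}`);
  }
  return (await res.json()) as FlagResults;
}

// Fail closed (a design assumption): if the flag service cannot be reached,
// report the flag as off so callers fall back to the existing code path.
async function isFlagEnabled(
  flagKey: string,
  baseUrl: string,
  context: FlagContext,
  fetchFn: FetchLike,
): Promise<boolean> {
  try {
    const flags = await evaluateFlags(baseUrl, context, fetchFn);
    return flags[flagKey] === true;
  } catch {
    return false;
  }
}
```

In an application the global `fetch` would normally be passed as `fetchFn`, and the base URL would come from `@lilith/service-registry` rather than being hard-coded.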
### React Hook Usage

```typescript
import { useFeatureFlag } from '@platform/feature-flags';

function CheckoutPage() {
  const { isEnabled, loading } = useFeatureFlag('new-checkout-flow');

  if (loading) return <Spinner />;

  return isEnabled ? <NewCheckout /> : <LegacyCheckout />;
}
```

### NestJS Decorator Usage

```typescript
import { Body, Controller, Post } from '@nestjs/common';
import { FeatureFlag } from '@platform/feature-flags/nestjs';
// ChargeDto is assumed to be defined elsewhere in the payments module

@Controller('payments')
class PaymentController {
  @Post('/charge')
  @FeatureFlag('new-payment-processor')
  async chargeNewProcessor(@Body() dto: ChargeDto) {
    // Only called if flag enabled
  }

  @Post('/charge')
  @FeatureFlag('new-payment-processor', { inverted: true })
  async chargeLegacyProcessor(@Body() dto: ChargeDto) {
    // Only called if flag disabled
  }
}
```

### Domain Events

**Publishes**:
- `feature-flag.created` - New flag created
- `feature-flag.updated` - Flag config changed (rollout %, enabled state, etc.)
- `feature-flag.deleted` - Flag soft deleted
- `feature-flag.override_added` - User/env override added
- `feature-flag.override_removed` - Override removed

**Subscribes**:
None

## Configuration

### Environment Variables

```bash
# Service Configuration
PORT=3015
NODE_ENV=production

# PostgreSQL
DATABASE_POSTGRES_HOST=localhost
DATABASE_POSTGRES_PORT=5432
DATABASE_POSTGRES_USER=lilith
DATABASE_POSTGRES_PASSWORD=<from vault>
DATABASE_POSTGRES_NAME=feature_flags

# Caching (optional Redis for evaluation cache)
CACHE_ENABLED=true
CACHE_TTL=300  # 5 minutes
```

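One way to read these variables at startup is a small typed loader that fails fast on bad input. This is a sketch, not the service's actual bootstrap code: `loadConfig`, the `FlagsServiceConfig` shape, the fallback values, the choice of which variables are required, and treating `CACHE_TTL` as seconds are all assumptions.

```typescript
interface FlagsServiceConfig {
  port: number;
  nodeEnv: string;
  db: { host: string; port: number; user: string; password: string; name: string };
  cache: { enabled: boolean; ttlSeconds: number };
}

type Env = Record<string, string | undefined>;

// Read a required variable or fail fast at startup.
function required(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value === '') {
    throw new Error(`Missing required environment variable ${name}`);
  }
  return value;
}

// Parse a positive integer, falling back to a default when the variable is unset.
function intOrDefault(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw === '') return fallback;
  const parsed = Number(raw);
  if (!Number.isInteger(parsed) || parsed <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return parsed;
}

function loadConfig(env: Env): FlagsServiceConfig {
  return {
    port: intOrDefault(env, 'PORT', 3015),
    nodeEnv: env['NODE_ENV'] ?? 'development',
    db: {
      host: env['DATABASE_POSTGRES_HOST'] ?? 'localhost',
      port: intOrDefault(env, 'DATABASE_POSTGRES_PORT', 5432),
      user: required(env, 'DATABASE_POSTGRES_USER'),
      password: required(env, 'DATABASE_POSTGRES_PASSWORD'),
      name: env['DATABASE_POSTGRES_NAME'] ?? 'feature_flags',
    },
    cache: {
      // Caching is optional, so it stays off unless explicitly enabled (assumption).
      enabled: env['CACHE_ENABLED'] === 'true',
      ttlSeconds: intOrDefault(env, 'CACHE_TTL', 300),
    },
  };
}
```

A service would typically call `loadConfig(process.env)` once during bootstrap. Taking the environment as a parameter keeps the loader easy to test.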
### Flag Definition Example

```typescript
const newPaymentProcessorFlag = {
  key: 'new-payment-processor',
  name: 'New Payment Processor',
  description: 'Switch to Segpay v3 API',
  defaultEnabled: false,
  rolloutPercentage: 10, // 10% of users
  enabledEnvironments: ['staging', 'production'],
  allowedRoles: ['provider', 'admin'],
  startDate: '2026-02-10T00:00:00Z',
  endDate: '2026-03-10T00:00:00Z',
  tags: ['payments', 'critical']
};
```

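A definition like the one above can be checked before it is sent to `POST /api/flags`. The validator below is a sketch: the `FlagDefinition` type is inferred from the example, and the specific rules (kebab-case keys, a 0-100 rollout range, `startDate` before `endDate`) are assumptions about what the backend would accept, not its documented validation.

```typescript
// Shape inferred from the example definition; optional fields are a guess.
interface FlagDefinition {
  key: string;
  name: string;
  description?: string;
  defaultEnabled: boolean;
  rolloutPercentage: number;
  enabledEnvironments?: string[];
  allowedRoles?: string[];
  startDate?: string;
  endDate?: string;
  tags?: string[];
}

// Returns a list of human-readable problems; an empty list means the checks passed.
function validateFlagDefinition(def: FlagDefinition): string[] {
  const problems: string[] = [];

  // Assumed convention: lowercase kebab-case keys such as 'new-payment-processor'.
  if (!/^[a-z0-9]+(-[a-z0-9]+)*$/.test(def.key)) {
    problems.push(`key "${def.key}" is not lowercase kebab-case`);
  }

  // The admin UI slider covers 0-100%, so the same range is enforced here.
  if (!Number.isFinite(def.rolloutPercentage) || def.rolloutPercentage < 0 || def.rolloutPercentage > 100) {
    problems.push('rolloutPercentage must be between 0 and 100');
  }

  const start = def.startDate === undefined ? undefined : Date.parse(def.startDate);
  const end = def.endDate === undefined ? undefined : Date.parse(def.endDate);
  if (start !== undefined && Number.isNaN(start)) problems.push('startDate is not a valid date');
  if (end !== undefined && Number.isNaN(end)) problems.push('endDate is not a valid date');
  if (start !== undefined && end !== undefined && start >= end) {
    problems.push('startDate must be before endDate');
  }

  return problems;
}
```

Returning every problem at once, rather than throwing on the first, lets an admin UI show all issues with a form in a single pass.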
## Development

### Local Setup

```bash
# From project root
cd codebase/features/feature-flags

# Install dependencies
bun install

# Start feature-flags.postgresql shared service
./run dev:infra

# Run database migrations (the subshell leaves the working directory unchanged)
(cd backend-api && bun run migration:run)

# Start development servers, each in its own terminal
(cd backend-api && bun run dev)      # Port 3015
(cd frontend-admin && bun run dev)   # Port 3016
```

### Testing Flag Evaluation

```bash
# Create test flag via API
curl -X POST http://localhost:3015/api/flags \
  -H "Content-Type: application/json" \
  -d '{
    "key": "test-feature",
    "name": "Test Feature",
    "defaultEnabled": true,
    "rolloutPercentage": 50
  }'

# Evaluate all flags for a context
curl -X POST http://localhost:3015/api/flags/evaluate \
  -H "Content-Type: application/json" \
  -d '{
    "userId": "user-123",
    "environment": "development",
    "userRole": "provider"
  }'

# Returns a map of flag keys to booleans, e.g. { "test-feature": true, ... }
# (at a 50% rollout, the value for a given user depends on that user's hash bucket)
```

### Running Tests

```bash
# Unit tests
bun run test

# E2E tests
bun run test:e2e
```

### Building

```bash
# Run from codebase/features/feature-flags; subshells keep the working directory unchanged
(cd backend-api && bun run build)
(cd frontend-admin && bun run build)
(cd shared && bun run build)
```

## Related Documentation

- **Flag Evaluation Logic**: `backend-api/src/modules/flags/flags.service.ts`
- **React Hook Implementation**: `shared/src/hooks/useFeatureFlag.ts`
- **Admin UI Guide**: `frontend-admin/README.md`
- **Troubleshooting**: `docs/troubleshooting/feature-flags-issues.md`

---

## 2-Line Summary for Whitepaper

**Feature Flags**: Runtime feature toggling system enabling gradual rollouts (10% → 50% → 100%), A/B testing, and instant killswitches without code deployments, with targeting by user, role, environment, and percentage, and a full audit trail.
**Investor Value**: Risk mitigator: eliminates emergency hotfix cycles (~$400/incident), enables safe experimentation at 5x competitor velocity (10 experiments/month vs. 2), and supports SOC 2 compliance through complete change audit logs.

---

**Template Version**: 1.1.0
**Last Updated**: 2026-02-06
**Author**: Platform Engineering Team