Internal Security Systems Overview
Document ID: ELI-SEC-001
Document Title: Internal Security Systems Overview
Version No: 1.1
Date: September 5, 2025
Classification: Confidential
1. PURPOSE
This document provides a comprehensive overview of Eli Health's data flows, system architecture, and security safeguards to demonstrate compliance with PIPEDA, Law 25, GDPR, and HIPAA (where applicable). It serves as a reference for audits, Data Protection Impact Assessments (DPIAs), and incident response planning.
The document outlines the technical and organizational measures implemented to protect user data, particularly sensitive health-related information processed through our hormone monitoring system.
2. SCOPE
This document applies to all Eli Health systems, applications, and infrastructure that process, store, or transmit user data, including sensitive health-related information.
It covers:
- The Hormone Analysis Engine (HAE) and supporting software services
- Mobile applications (iOS and Android) and web interfaces
- APIs, databases, and third-party integrations
- Internal employee and contractor access to systems containing user data
- Security controls, policies, and procedures required for GDPR, PIPEDA, Law 25, and HIPAA (where applicable)
- Infrastructure components hosted on Google Cloud Platform (GCP)
- Development and deployment pipelines via GitHub Actions
- Monitoring and incident response systems
Excluded:
- Marketing websites that do not process personal health data
- Third-party systems outside Eli's control (unless a Data Processing Agreement is in place)
- Test data that does not contain real user information
3. TERMS AND DEFINITIONS
| Term | Definition |
|---|---|
| HAE | Hormone Analysis Engine - Core system for analyzing hormone test results |
| PII | Personally Identifiable Information - Data that can identify an individual |
| PHI | Personal Health Information - Health-related data about an individual |
| RBAC | Role-Based Access Control - Security approach that restricts system access |
| DPO | Data Protection Officer - Person responsible for data protection compliance |
| GCP | Google Cloud Platform - Cloud infrastructure provider |
| Cloud Run | GCP serverless compute platform for containerized applications |
| Firebase | Google's mobile platform for authentication and app development |
| BigQuery | GCP's enterprise data warehouse for analytics |
| Cloud SQL | GCP's fully managed relational database service |
| WAF | Web Application Firewall - Security layer protecting web applications |
| OAuth | Open standard for access delegation |
| JWT | JSON Web Token - Compact, signed token format for transmitting claims between parties |
| TLS | Transport Layer Security - Cryptographic protocol for secure communications |
| IAM | Identity and Access Management - Framework for access control |
| VPC | Virtual Private Cloud - Isolated network environment |
| SIEM | Security Information and Event Management |
| DDoS | Distributed Denial of Service - Type of cyber attack |
4. RELATED DOCUMENTS
| Document ID | Document Title |
|---|---|
| SDD | Software Design Document |
| PDPROJ-02-DDP | Design and Development Plan |
| ELI-PRIV-001 | Privacy Policy |
| ELI-SEC-002 | Incident Response Plan |
| ELI-SEC-003 | Data Retention Policy |
| ELI-SEC-004 | Access Control Policy |
| ELI-DEV-001 | Secure Development Guidelines |
| ELI-OPS-001 | Infrastructure Operations Manual |
5. RESPONSIBILITY
| Role | Name/Department | Responsibilities |
|---|---|---|
| Chief Technology Officer | Engineering | Overall system security and architecture |
| Data Protection Officer | Legal/Compliance | GDPR/PIPEDA/Law 25 compliance |
| Infrastructure Lead | DevOps | Cloud infrastructure security |
| Security Engineer | Security Team | Security monitoring and incident response |
| Backend Lead | Engineering | API and data security |
| Mobile Lead | Engineering | Mobile app security |
| QA Lead | Quality Assurance | Security testing and validation |
| Compliance Officer | Compliance | Regulatory compliance and audits |
6. BACKGROUND
The Eli System is a hormone measurement system which allows users to monitor their hormonal state at the time of testing and to track their hormone levels over time. The system is intended to be used by persons aged 18 years and older.
System Components:
- Hardware Component:
  - Cartridge housing the lateral flow assay
  - Lateral flow assay for hormone measurement
- Software Components:
  - Mobile application (iOS/Android) with camera for image capture
  - Hormone Analysis Engine (HAE) for image analysis
  - Backend API for data management
  - Cloud infrastructure for storage and processing
  - Web interfaces for data visualization (KPI Dashboard)
- Data Processing:
  - Image acquisition through mobile camera
  - Analysis through HAE algorithms
  - Storage in secure cloud database
  - Display through mobile and web interfaces
Software Safety Classification:
The software safety classification is Class A because the software is used only to display results to the end user and poses no risk of injury or damage to health. However, given the sensitive nature of health data, we implement security measures equivalent to those for higher-risk classifications.
7. FLOW OF INFORMATION
7.1 System Architecture Overview
7.2 Authentication Architecture
7.3 Data Flow and Security Boundaries
7.4 BigQuery Data Governance
Implementation Update (December 2024): Table-level IAM access control has been implemented via Terraform to protect biometric data while maintaining operational access for authorized users.
Access Control Implementation
| User Type | Access Level | Tables Accessible | Implementation |
|---|---|---|---|
| Admin Users | Dataset-level | ALL tables including record | bigquery_admin_users in Terraform |
| Admin Service Accounts | Dataset-level | ALL tables including record | bigquery_admin_service_accounts in Terraform |
| Readonly Users | Table-level | Public tables only (NOT record) | bigquery_readonly_users + bigquery_public_tables |
Current Admin Users (Production)
- chip@eli.health
- iannick@eli.health
- thomas@eli.health
- fannie@eli.health
- kpi-service-us@eli-health-prod.iam.gserviceaccount.com
Current Readonly Users (Production)
- pious@eli.health (customer support)
- media@videnglobe.com (marketing team)
7.5 Data Flow Descriptions
7.5.1 User Registration Flow
- User downloads mobile app from App Store (iOS only currently)
- User creates account with email/password
- Firebase Authentication creates user account
- Backend API creates user profile in Cloud SQL
- User data encrypted and stored
- Welcome email sent via SendGrid
7.5.2 Hormone Test Flow
- User initiates test in mobile app
- Camera captures image of test cartridge
- Image sent to HAE API via secure HTTPS
- HAE processes image using ML algorithms
- Results stored in Cloud SQL database
- Results returned to mobile app
- Aggregated analytics (non-biometric) synchronized to BigQuery
7.5.3 Data Access Flow
- User authenticates via Firebase (customers) or OAuth (internal users)
- JWT token generated with user permissions
- API validates token for each request
- RBAC determines data access level
- Data retrieved from appropriate source
- Response encrypted and sent to client
7.6 Security Layers
Layer 1: Network Security
- Cloud Armor WAF: Protects against OWASP Top 10 threats
- DDoS Protection: Adaptive protection against volumetric attacks
- SSL/TLS: All communications encrypted with TLS 1.3
- VPC Isolation: Private network segments for services
Layer 2: Application Security
- Authentication: Firebase Auth for mobile, OAuth 2.0 for web
- Authorization: JWT-based with role permissions
- Input Validation: All inputs sanitized and validated
- API Rate Limiting: Prevents abuse and ensures availability
Layer 3: Data Security
- Encryption at Rest: AES-256 for all stored data
- Encryption in Transit: TLS 1.3 for all communications
- Key Management: Google Cloud KMS for key rotation
- Data Segregation: Tenant isolation in multi-tenant architecture
Layer 4: Infrastructure Security
- Container Security: Distroless images, vulnerability scanning
- Secret Management: Google Secret Manager for credentials
- IAM Policies: Least privilege access control
- Audit Logging: Comprehensive logging to Cloud Logging
7.7 Data Classification and Handling
| Data Type | Classification | Storage | Encryption | Retention |
|---|---|---|---|---|
| User Profile | PII | Cloud SQL | AES-256 | Account lifetime |
| Health Data | PHI | Cloud SQL | AES-256 | 7 years |
| Test Images | PHI | Cloud Storage | AES-256 | 90 days |
| Analytics | Aggregated | BigQuery | AES-256 | Indefinite |
| Logs | Operational | Cloud Logging | AES-256 | 30-90 days |
| Backups | All Types | Cloud Storage | AES-256 | 30 days |
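The fixed retention periods in the table above can be expressed as a lookup that a cleanup job might consult. This is a minimal TypeScript sketch, not the production implementation: the names `RETENTION_DAYS` and `isPastRetention` are hypothetical, a year is assumed to be 365 days, and rows without a single fixed period (account lifetime, indefinite, or the 30-90 day log range) are treated as not age-based.

```typescript
// Hypothetical encoding of the retention column; names are illustrative only.
type DataType =
  | "User Profile"
  | "Health Data"
  | "Test Images"
  | "Analytics"
  | "Logs"
  | "Backups";

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Retention in days. null means the row has no single fixed period.
const RETENTION_DAYS: Record<DataType, number | null> = {
  "User Profile": null, // kept for the account lifetime
  "Health Data": 7 * 365, // 7 years, assuming 365-day years
  "Test Images": 90,
  "Analytics": null, // indefinite
  "Logs": null, // 30-90 days depending on log type
  "Backups": 30,
};

// Returns true when an item of the given type is older than its fixed period.
function isPastRetention(type: DataType, createdAt: Date, now: Date): boolean {
  const days = RETENTION_DAYS[type];
  if (days === null) return false;
  return now.getTime() - createdAt.getTime() > days * MS_PER_DAY;
}
```

A scheduled job could call `isPastRetention("Test Images", createdAt, new Date())` before deleting an object; log retention would need the per-log-type periods listed in Section 7.15.5.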
7.8 Encryption Strategy and Key Management
7.8.1 Current Architecture Analysis (September 2025)
After a comprehensive security review, we've identified critical issues with our current encryption implementation that require immediate remediation:
Current Issues:
- Wrong Data Being Encrypted: The system currently encrypts PII (names, emails) rather than PHI (biometric data)
- Key Storage Anti-Pattern: Encryption keys stored in the same database as encrypted data
- Redundant PII Encryption: User emails are already secured in Firebase, so encrypting them again adds unnecessary complexity
- Unencrypted PHI: Cortisol, progesterone, and other biometric readings stored without field-level encryption
Security Risk Assessment:
- High Risk: Data encryption keys (DEKs) stored in the data_encryption_key table alongside encrypted data
- Medium Risk: If the database is compromised, the attacker gets both the encrypted data and the encrypted keys
- Low Risk: Google KMS provides strong key encryption, but the architecture weakens overall security
7.8.2 Recommended Encryption Strategy
What Should Be Encrypted (Priority Order):
- Critical PHI (Must Encrypt):
  - Hormone readings (cortisol, progesterone values)
  - Biometric measurements (all test results)
  - Health condition tags and symptoms
  - Device pairing secrets
  - Test images (already encrypted in Cloud Storage)
- Non-Critical PII (Optional):
  - User profiles managed by Firebase Authentication
  - Emails and names can remain unencrypted in database for operational efficiency
  - These are not PHI and don't require HIPAA-level protection
What Should NOT Be Encrypted:
- User IDs and reference keys
- Timestamps and metadata
- Aggregated analytics data
- System logs (sanitized)
7.8.3 Proper Key Management Architecture
Current (Problematic) Architecture:
Google KMS → Encrypts → DEKs (stored in same DB) → Encrypt → User Data
Recommended Architecture Option 1 - Secret Manager:
Google KMS → Encrypts → DEKs (in Secret Manager) → Encrypt → PHI Only
Firebase → Manages → User Authentication & PII
Cloud SQL → Stores → Encrypted PHI + Unencrypted operational data
Recommended Architecture Option 2 - Dedicated Key Service:
HashiCorp Vault / AWS KMS → Manages all keys
Application → Requests keys via API → Encrypts PHI
Database → Stores only encrypted PHI
Key rotation → Automated monthly
7.8.4 Next Steps
The immediate priority is to:
- Stop encrypting PII fields that don't require it (names, emails)
- Move encryption keys out of the database and into Google Secret Manager
- Implement proper field-level encryption for all biometric data (cortisol, progesterone, etc.)
- Establish a key rotation policy and procedures
7.8.5 Compliance Alignment
HIPAA Requirements (US Market):
- Encryption at Rest: ✅ Already provided by Cloud SQL
- Encryption in Transit: ✅ TLS 1.3 implemented
- Key Management: ⚠️ Needs improvement (keys in same DB)
- PHI Protection: ❌ Not currently encrypting biometric data
- Access Controls: ✅ IAM and RBAC implemented
PIPEDA/Law 25 Requirements (Canadian Market):
- Reasonable Safeguards: Current encryption insufficient for health data
- Data Minimization: Should only encrypt what's necessary
- Breach Notification: Easier with proper PHI encryption
7.8.6 Technical Implementation Details
Current Encryption (To Be Deprecated):
// Current - Encrypting wrong data
encryptedEmail = encrypt(user.email, DEK)
encryptedName = encrypt(user.firstName, DEK)
// Biometric data stored in plain text - WRONG!
Recommended Encryption:
// Recommended - Encrypt PHI only
user.email = plaintext // Operational data, secured by database encryption
user.firstName = plaintext // Not PHI
biometric.cortisolValue = encrypt(value, DEK) // PHI - must encrypt
biometric.progesteroneValue = encrypt(value, DEK) // PHI - must encrypt
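The recommended pattern above, encrypting only PHI fields with a data encryption key, can be sketched in TypeScript with Node's built-in `crypto` module. This is an illustration under assumptions: AES-256-GCM is assumed as the cipher mode (this document specifies AES-256 but not a mode), `EncryptedField`, `encryptField`, and `decryptField` are hypothetical names, and the DEK is generated inline only for the demo. In the recommended architecture it would be retrieved from a key store outside the database.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical storage shape for one encrypted column value.
interface EncryptedField {
  iv: string; // base64 nonce, unique per encryption
  tag: string; // base64 GCM authentication tag
  data: string; // base64 ciphertext
}

// Encrypts a single PHI value with a 32-byte DEK using AES-256-GCM.
function encryptField(plaintext: string, dek: Buffer): EncryptedField {
  const iv = randomBytes(12); // 96-bit nonce, never reused with the same key
  const cipher = createCipheriv("aes-256-gcm", dek, iv);
  const data = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return {
    iv: iv.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"),
    data: data.toString("base64"),
  };
}

// Decrypts a value; throws if the key is wrong or the data was tampered with.
function decryptField(field: EncryptedField, dek: Buffer): string {
  const decipher = createDecipheriv("aes-256-gcm", dek, Buffer.from(field.iv, "base64"));
  decipher.setAuthTag(Buffer.from(field.tag, "base64"));
  const plain = Buffer.concat([
    decipher.update(Buffer.from(field.data, "base64")),
    decipher.final(),
  ]);
  return plain.toString("utf8");
}

// Demo only: a real DEK would come from the key store, not be generated here.
const demoDek = randomBytes(32);
const demoRecord = {
  email: "user@example.com", // operational data, left as plaintext
  cortisolValue: encryptField("12.3", demoDek), // PHI, field-level encrypted
};
```

Because GCM is authenticated, a decryption failure is a meaningful signal, which is what the "failed decryption attempts" metric in Section 7.8.8 would count.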
Key Storage Migration:
# Current (BAD)
Database Table: data_encryption_key
├── User.firstName (key)
├── User.lastName (key)
└── User.email (key)
# Recommended (GOOD)
Google Secret Manager:
├── biometric-data-key-2025-09
├── health-tags-key-2025-09
└── device-pairing-key-2025-09
7.8.7 Incident Response Considerations
With proper PHI encryption:
- Data Breach Impact: Limited to metadata, PHI remains protected
- Key Compromise: Can rotate keys without data loss
- Compliance Reporting: Clear delineation of protected vs. unprotected data
- Recovery Time: Faster with separated key management
7.8.8 Monitoring and Auditing
Key metrics to track:
- Number of failed decryption attempts (indicates key issues)
- Key rotation compliance (monthly target)
- PHI access patterns (unusual access = potential breach)
- Encryption performance impact (less than 50ms overhead target)
7.9 Third-Party Integrations
| Service | Purpose | Data Shared | Security Measures |
|---|---|---|---|
| Firebase | Authentication | Email, User ID | OAuth 2.0, encrypted |
| SendGrid | Email delivery | Email, Name | API key auth, TLS |
| Sentry | Error monitoring | Stack traces | Sanitized, no PII |
| Google Analytics | Usage analytics | Anonymous usage | IP anonymization |
| Terra API | Wearable data | Health metrics | OAuth, encrypted |
7.9 Access Control Matrix
| Role | Mobile App | Backend API | HAE API | Database | BigQuery | Infrastructure |
|---|---|---|---|---|---|---|
| End User | Full | Via App | Via App | No | No | No |
| Support | View Only | Read | No | Read | No | No |
| Developer | Full | Full | Full | Dev Only | No | Dev Only |
| Data Team | No | No | No | Read Only | Query Only | No |
| Board Members | No | No | No | No | Read Only | No |
| DevOps | No | Deploy | Deploy | All | Admin | Full |
| Admin | Full | Full | Full | Full | Full | Full |
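The matrix above can be encoded as data so that access checks are table-driven rather than scattered across conditionals. The following TypeScript sketch is illustrative only: `Role`, `SystemName`, `ACCESS_MATRIX`, `accessLevel`, and `hasAnyAccess` are hypothetical names, the access strings are copied from the table, and how each level is enforced in a given system is outside the sketch.

```typescript
// Hypothetical encoding of the access control matrix; names are illustrative.
type Role =
  | "End User"
  | "Support"
  | "Developer"
  | "Data Team"
  | "Board Members"
  | "DevOps"
  | "Admin";

type SystemName =
  | "Mobile App"
  | "Backend API"
  | "HAE API"
  | "Database"
  | "BigQuery"
  | "Infrastructure";

// Column order of the table.
const SYSTEMS: SystemName[] = [
  "Mobile App", "Backend API", "HAE API", "Database", "BigQuery", "Infrastructure",
];

// One row per role, values in the same order as SYSTEMS.
const ACCESS_MATRIX: Record<Role, string[]> = {
  "End User": ["Full", "Via App", "Via App", "No", "No", "No"],
  "Support": ["View Only", "Read", "No", "Read", "No", "No"],
  "Developer": ["Full", "Full", "Full", "Dev Only", "No", "Dev Only"],
  "Data Team": ["No", "No", "No", "Read Only", "Query Only", "No"],
  "Board Members": ["No", "No", "No", "No", "Read Only", "No"],
  "DevOps": ["No", "Deploy", "Deploy", "All", "Admin", "Full"],
  "Admin": ["Full", "Full", "Full", "Full", "Full", "Full"],
};

// Looks up the access level; anything missing is denied by default.
function accessLevel(role: Role, system: SystemName): string {
  return ACCESS_MATRIX[role][SYSTEMS.indexOf(system)] ?? "No";
}

function hasAnyAccess(role: Role, system: SystemName): boolean {
  return accessLevel(role, system) !== "No";
}
```

Keeping the matrix in one structure makes it straightforward to review changes alongside the document and to test that no role gains access unintentionally.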
7.10 Monitoring and Alerting
Real-time Monitoring
- Application Performance: Cloud Monitoring dashboards
- Security Events: Cloud Security Command Center
- Error Tracking: Sentry integration
- Uptime Monitoring: Synthetic checks every minute
Alert Channels
- Critical: PagerDuty (24/7 on-call)
- High: Email to engineering team
- Medium: Slack notifications
- Low: Daily summary reports
Key Security Metrics
- Failed authentication attempts
- API rate limit violations
- WAF blocked requests
- Unusual data access patterns
- Infrastructure changes
- Certificate expiration
7.11 Incident Response
Incident Classification
- P0 (Critical): Data breach, system compromise
- P1 (High): Service outage, authentication failure
- P2 (Medium): Performance degradation, minor security issue
- P3 (Low): Non-critical bugs, documentation issues
Response Team
- Incident Commander: Coordinates response
- Technical Lead: Implements fixes
- Communications: User and stakeholder updates
- Legal/Compliance: Regulatory notifications
Response Procedures
- Detection: Automated alert or user report
- Triage: Assess severity and impact
- Containment: Isolate affected systems
- Investigation: Root cause analysis
- Remediation: Fix and patch
- Recovery: Restore normal operations
- Post-mortem: Lessons learned
7.12 Compliance Controls
GDPR Compliance
- Right to access: Data export API
- Right to deletion: Account deletion workflow
- Data portability: JSON/CSV export formats
- Consent management: Granular permissions
- Privacy by design: Minimal data collection
PIPEDA Compliance
- Accountability: Designated privacy officer
- Consent: Clear opt-in mechanisms
- Limited collection: Only necessary data
- Safeguards: Technical and organizational measures
- Openness: Transparent privacy policy
Law 25 (Quebec) Compliance
- Privacy officer: Designated for Quebec
- Impact assessments: Regular DPIAs
- Incident notification: Within 72 hours
- Consent for minors: Age verification
- Data residency: Canadian data centers option
7.13 Security Testing
Automated Testing
- SAST: Static code analysis in CI/CD
- DAST: Dynamic security testing weekly
- Dependency Scanning: Daily vulnerability checks
- Container Scanning: Image vulnerability assessment
Manual Testing
- Penetration Testing: Annual third-party assessment
- Code Reviews: All PRs reviewed for security
- Security Audits: Quarterly internal audits
- Compliance Audits: Annual compliance review
7.14 Business Continuity
Backup Strategy
- Database: Daily automated backups, 30-day retention
- Code: Git repositories with multiple remotes
- Infrastructure: Terraform state in GCS with versioning
- Secrets: Backed up in separate project
Disaster Recovery
- RTO: 4 hours for critical services
- RPO: 1 hour for data loss
- Failover: Automated to secondary region
- Testing: Quarterly DR drills
High Availability
- Multi-zone: Services deployed across zones
- Auto-scaling: Based on load patterns
- Load balancing: Global load balancer
- Health checks: Continuous monitoring
7.15 Comprehensive Logging and Monitoring System
This section details our comprehensive logging infrastructure for tracking, debugging, and auditing all system activities across mobile and backend services.
7.15.1 Logging Architecture Overview
Core Components
- Mobile Logging: Structured logging from iOS/Android applications
- Backend Logging: Centralized logging from all API services
- Trace ID System: End-to-end request tracking across all services
- Log Aggregation: Google Cloud Logging for centralized storage
- Log Analysis: Real-time analysis and alerting capabilities
Logging Strategy
Implementation Highlights:
- All mobile requests are logged with detailed context
- Backend services capture all API requests and responses
- Unique trace IDs enable full request lifecycle tracking
- Error logs include full stack traces and context
- Success messages logged for audit trail
- Performance metrics captured for optimization
7.15.2 Mobile Application Logging
iOS/Android Logging Configuration
Current Implementation:
- Log Levels: DEBUG, INFO, WARNING, ERROR, CRITICAL
- Log Format: Structured JSON with timestamp, trace ID, user ID, event type
- Local Storage: 7-day rolling buffer on device
- Upload Strategy: Batched uploads every 5 minutes or on critical errors
- Privacy: PII/PHI automatically redacted before upload
Mobile Log Fields:
{
  "timestamp": "2025-09-18T10:30:00Z",
  "trace_id": "550e8400-e29b-41d4-a716-446655440000",
  "user_id": "encrypted_user_id",
  "device_info": {
    "platform": "iOS",
    "version": "17.0",
    "app_version": "1.2.3"
  },
  "event": "hormone_test_initiated",
  "metadata": {...}
}
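The mobile log fields above can be described as a type, with a redaction pass applied to the free-form metadata before upload. The redaction rules themselves are not specified in this document, so the sketch below makes assumptions that should be read as placeholders: `SENSITIVE_KEYS` is a hypothetical deny-list, email-like strings are masked with a simple pattern, and only top-level metadata values are inspected.

```typescript
// Shape of the mobile log entry shown above; metadata contents are unspecified.
interface MobileLogEntry {
  timestamp: string;
  trace_id: string;
  user_id: string;
  device_info: { platform: string; version: string; app_version: string };
  event: string;
  metadata: Record<string, unknown>;
}

// Hypothetical rules: keys assumed to carry PII/PHI, plus email-like strings.
const SENSITIVE_KEYS = new Set(["email", "name", "cortisol", "progesterone"]);
const EMAIL_PATTERN = /[^\s@]+@[^\s@]+\.[^\s@]+/g;

// Masks sensitive keys and email-like substrings in top-level metadata values.
// Nested objects are passed through unchanged in this sketch.
function redactMetadata(metadata: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(metadata)) {
    if (SENSITIVE_KEYS.has(key.toLowerCase())) {
      out[key] = "[REDACTED]";
    } else if (typeof value === "string") {
      out[key] = value.replace(EMAIL_PATTERN, "[REDACTED]");
    } else {
      out[key] = value;
    }
  }
  return out;
}

// Returns a copy of the entry that is safe to upload under the rules above.
function redactEntry(entry: MobileLogEntry): MobileLogEntry {
  return { ...entry, metadata: redactMetadata(entry.metadata) };
}
```

A production redactor would likely need to recurse into nested values and cover more identifiers than this deny-list does.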
7.15.3 Backend Service Logging
API Request/Response Logging
Comprehensive Tracking:
- Request Logging: Method, endpoint, headers, body (sanitized)
- Response Logging: Status code, response time, body (sanitized)
- Error Logging: Full stack traces, error codes, recovery actions
- Success Logging: Completion status, performance metrics
Backend Log Structure:
{
  "timestamp": "2025-09-18T10:30:00Z",
  "trace_id": "550e8400-e29b-41d4-a716-446655440000",
  "service": "backend-api",
  "method": "POST",
  "endpoint": "/api/v1/test-results",
  "status": 200,
  "duration_ms": 145,
  "user_id": "encrypted_user_id",
  "request_size": 2048,
  "response_size": 512
}
7.15.4 Trace ID Implementation
End-to-End Request Tracking
Trace ID Flow:
- Generation: Mobile app generates unique trace ID for each user action
- Propagation: Trace ID included in all API headers
- Service Chaining: Backend services pass trace ID to all downstream calls
- HAE Integration: Image processing includes trace ID in all logs
- Database Operations: Trace ID logged with all database queries
- Response Path: Same trace ID used for response logging
Benefits:
- Complete request lifecycle visibility
- Easy debugging of complex workflows
- Performance bottleneck identification
- User journey reconstruction
- Incident investigation efficiency
7.15.5 Log Aggregation and Storage
Google Cloud Logging Configuration
Current Setup:
- Log Router: Filters and routes logs to appropriate sinks
- Log Buckets: Separate buckets for different log types
- Retention Policies:
  - Error logs: 90 days
  - Request/response logs: 30 days
  - Debug logs: 7 days
  - Audit logs: 7 years
- Access Control: IAM-based with audit trail
Log Sinks Configuration:
Error Logs → Critical Errors Bucket (90-day retention)
API Logs → Request/Response Bucket (30-day retention)
Audit Logs → Compliance Bucket (7-year retention)
Performance Logs → Metrics Bucket (30-day retention)
7.15.6 Log Analysis and Monitoring
Real-time Analysis
Monitoring Dashboards:
- Error Rate Dashboard: Real-time error tracking by service
- Performance Dashboard: API latency and throughput metrics
- User Journey Dashboard: Trace ID based flow visualization
- Security Dashboard: Failed auth attempts, suspicious patterns
Alert Configuration:
- Error rate > 1% triggers immediate alert
- Response time > 2s triggers performance alert
- Failed authentication patterns trigger security alert
- Service unavailability triggers critical alert
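The error-rate, response-time, and availability thresholds above can be written as one evaluation function over a monitoring window. This is a sketch under assumptions: `WindowStats`, `AlertKind`, and `evaluateAlerts` are hypothetical names, and the document does not say which latency statistic the 2-second threshold applies to, so a single `responseTimeMs` value is assumed. Failed-authentication pattern detection is omitted because no rule is defined for it here.

```typescript
// Hypothetical summary of one monitoring window for a service.
interface WindowStats {
  totalRequests: number;
  failedRequests: number;
  responseTimeMs: number; // assumed: whichever latency statistic the alert uses
  serviceUp: boolean;
}

type AlertKind = "immediate" | "performance" | "critical";

const ERROR_RATE_THRESHOLD = 0.01; // error rate > 1%
const RESPONSE_TIME_THRESHOLD_MS = 2000; // response time > 2s

// Returns every alert that the window's statistics trigger.
function evaluateAlerts(stats: WindowStats): AlertKind[] {
  const alerts: AlertKind[] = [];
  const errorRate =
    stats.totalRequests === 0 ? 0 : stats.failedRequests / stats.totalRequests;
  if (errorRate > ERROR_RATE_THRESHOLD) alerts.push("immediate");
  if (stats.responseTimeMs > RESPONSE_TIME_THRESHOLD_MS) alerts.push("performance");
  if (!stats.serviceUp) alerts.push("critical");
  return alerts;
}
```

In practice these rules would be configured as alerting policies in the monitoring platform; the function only makes the thresholds explicit and testable.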
7.15.7 Log Security and Compliance
Data Protection
Security Measures:
- Encryption: All logs encrypted at rest and in transit
- PII Redaction: Automatic removal of sensitive data
- Access Logging: All log access is audited
- Retention Compliance: Automatic deletion per policy
Compliance Features:
- GDPR-compliant data retention
- PIPEDA audit trail requirements
- HIPAA-compliant PHI handling
- Law 25 transparency requirements
7.15.8 Troubleshooting with Logs
Using Trace IDs for Debugging
Step-by-Step Process:
- Identify Issue: User reports problem or monitoring detects anomaly
- Locate Trace ID: Find in error report or user session
- Query Logs: Search all logs with specific trace ID
- Analyze Flow: Review complete request lifecycle
- Identify Root Cause: Pinpoint exact failure point
- Resolution: Apply fix and verify through logs
Example Query:
SELECT timestamp, service, message, metadata
FROM logs
WHERE trace_id = '550e8400-e29b-41d4-a716-446655440000'
ORDER BY timestamp ASC
7.15.9 Performance Insights from Logs
Metrics Derived from Logs
- API Response Times: P50, P95, P99 latencies
- Error Rates: By endpoint, user segment, time period
- User Patterns: Feature usage, journey completion rates
- System Health: Service availability, dependency health
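The P50, P95, and P99 latencies listed above can be derived from the `duration_ms` field of the backend logs. The document does not state how percentiles are computed, so this TypeScript sketch assumes the nearest-rank method; `percentile` is a hypothetical helper name.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it. p is in the range (0, 100].
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("percentile requires at least one sample");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b); // copy, then numeric sort
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[rank - 1];
}

// Example: latencies in milliseconds taken from request logs.
const durationsMs = [145, 90, 310, 120, 2050, 101, 99, 180, 133, 160];
const p50 = percentile(durationsMs, 50);
const p95 = percentile(durationsMs, 95);
const p99 = percentile(durationsMs, 99);
```

With only ten samples, P95 and P99 both resolve to the slowest request, which illustrates why tail percentiles need a reasonably large window to be meaningful.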
7.15.10 Mobile to Backend Logging Flow
Complete Logging Architecture
Key Features of Logging Flow
Mobile Side:
- Async Batching: Logs batched up to 200, sent every 5 minutes or on critical errors
- Offline Support: Logs cached in AsyncStorage when offline
- Rate Limiting: 2-second minimum between flushes, 1-second between batches
- Authentication Required: Logs only sent when user is authenticated
- Automatic Retry: Failed logs added back to buffer for next flush
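The mobile-side behavior above (batches of up to 200, a 2-second minimum between flushes, sending only when authenticated, and re-queuing failed logs) can be sketched as a small buffer class. This is not the app's actual code: `LogBuffer`, `LogItem`, and the injected `send`, `isAuthenticated`, and `now` functions are hypothetical, and offline persistence to AsyncStorage and the 5-minute timer are left to the caller.

```typescript
type LogLevel = "DEBUG" | "INFO" | "WARNING" | "ERROR" | "CRITICAL";

interface LogItem {
  level: LogLevel;
  message: string;
}

const MAX_BATCH_SIZE = 200; // logs per upload
const MIN_FLUSH_GAP_MS = 2000; // minimum time between flushes

class LogBuffer {
  private items: LogItem[] = [];
  private lastFlushAt = Number.NEGATIVE_INFINITY;
  private readonly send: (batch: LogItem[]) => Promise<void>;
  private readonly isAuthenticated: () => boolean;
  private readonly now: () => number;

  constructor(
    send: (batch: LogItem[]) => Promise<void>, // hypothetical upload transport
    isAuthenticated: () => boolean,
    now: () => number = () => Date.now(),
  ) {
    this.send = send;
    this.isAuthenticated = isAuthenticated;
    this.now = now;
  }

  get size(): number {
    return this.items.length;
  }

  // Queues a log entry; critical entries request an immediate flush.
  async add(item: LogItem): Promise<void> {
    this.items.push(item);
    if (item.level === "CRITICAL") await this.flush();
  }

  // Sends one batch if allowed. Skipped entries stay buffered for the next call.
  async flush(): Promise<void> {
    if (!this.isAuthenticated() || this.items.length === 0) return;
    const startedAt = this.now();
    if (startedAt - this.lastFlushAt < MIN_FLUSH_GAP_MS) return;
    this.lastFlushAt = startedAt;
    const batch = this.items.splice(0, MAX_BATCH_SIZE);
    try {
      await this.send(batch);
    } catch {
      this.items.unshift(...batch); // put failed logs back for the next flush
    }
  }
}
```

A caller would invoke `flush()` on a 5-minute interval and when connectivity or authentication is restored; injecting `now` keeps the rate limit testable without real timers.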
Trace ID Implementation:
- Generation: UUID v4 format (32 hex digits, written as 36 characters with hyphens) generated on mobile
- Propagation: Included in all API headers as X-Trace-Id
- Correlation: Same trace_id used across mobile → backend → HAE
- User Context: Both user_id and username included for complete tracking
- Span Support: Sub-operations tracked with span_id within same trace
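Trace generation, header propagation, and span tracking as described above can be sketched as follows. The sketch uses Node's `randomUUID` for illustration, since the mobile client would need its own UUID v4 source. `TraceContext`, `startTrace`, `childSpan`, and `traceHeaders` are hypothetical names, and the format of `span_id` is an assumption because the document does not define it.

```typescript
import { randomUUID } from "node:crypto";

const TRACE_HEADER = "X-Trace-Id";

// Identifiers attached to every log line for one user action.
interface TraceContext {
  trace_id: string; // shared across mobile, backend, and HAE
  span_id: string; // assumed to be a UUID as well; identifies one sub-operation
}

// Starts a new trace for a user action.
function startTrace(): TraceContext {
  return { trace_id: randomUUID(), span_id: randomUUID() };
}

// Creates a sub-operation that stays within the parent's trace.
function childSpan(parent: TraceContext): TraceContext {
  return { trace_id: parent.trace_id, span_id: randomUUID() };
}

// Adds the trace header to an outgoing request's headers without mutating them.
function traceHeaders(
  ctx: TraceContext,
  base: Record<string, string> = {},
): Record<string, string> {
  return { ...base, [TRACE_HEADER]: ctx.trace_id };
}

// Example: one user action that triggers an API call and a sub-operation.
const action = startTrace();
const uploadSpan = childSpan(action);
const outgoingHeaders = traceHeaders(uploadSpan, { "Content-Type": "application/json" });
```

A backend service receiving the request would read the same header and reuse the value in its own logs, which is what makes trace-based queries across services possible.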
Backend Processing:
- Middleware Chain: TraceMiddleware → UserContextMiddleware → HttpLoggingMiddleware
- Enrichment: Backend adds server-side context (timestamps, service names)
- Unified Format: All logs normalized to consistent JSON structure
- Error Tracking: Full stack traces with trace_id correlation
Storage & Analysis:
- Centralized Storage: All logs aggregated in Google Cloud Logging
- Trace-based Queries: Can query entire request lifecycle using trace_id
- User Journey Tracking: Complete user action flow from mobile to completion
- Performance Analysis: Request timings across all services
- Error Correlation: Link errors across mobile, backend, and HAE
7.16 Database Backup and Disaster Recovery
This section details our comprehensive backup and disaster recovery strategy for all critical systems and data stores.
7.16.1 Google Cloud Platform (GCP) Infrastructure
Current Backup Configuration
- Platform: Google Cloud Platform (Multi-Region)
- Production Region: us-east1 (South Carolina, USA)
- Development Region: northamerica-northeast1 (Montreal, Canada)
- Backup Frequency: Continuous and automated
- Retention Policy: Varies by service (detailed below)
- Access Control: Role-based via IAM
GCP Services Backup Details:
Cloud SQL PostgreSQL (Primary Database)
Current Settings (Verified via Terraform):
- Automated Backups: ENABLED
- Backup Frequency: Daily at 3:00 AM UTC
- Retention Period: 7 daily backups (1 week)
- Point-in-Time Recovery: ENABLED
- Transaction Log Retention: 7 days
- Backup Location: Same region as database (us-east1 for production, northamerica-northeast1 for development)
- SSL/TLS: Required for all connections
- Deletion Protection: ENABLED (both Terraform and GCP Console)
Recovery Capabilities:
- Point-in-time recovery to any second within the last 7 days
- Full backup restoration from any of the 7 daily backups
- Cross-region restoration supported
- Automated failover in case of zone failure
Cloud Storage (Object Storage)
Current Configuration:
- Versioning: Enabled on all production buckets
- Soft Delete: 30-day retention for deleted objects
- Cross-Region Replication: Configured for critical buckets
- Lifecycle Policies:
  - Test images: 90-day retention
  - Logs: 30-day retention
  - User uploads: Indefinite retention
Cloud Run Services
Deployment Strategy:
- Blue-Green Deployments: Zero-downtime updates
- Revision History: Last 100 revisions retained
- Traffic Management: Gradual rollout capabilities
- Rollback: Instant rollback to previous revisions
Infrastructure as Code (Terraform)
State Management:
- Backend: Google Cloud Storage with versioning
- State Locking: Enabled to prevent concurrent modifications
- State Backup: Automatic versioning in GCS
- Separate Environments:
  - Development: eli-health-dev bucket
  - Staging: eli-health-staging bucket
  - Production: eli-health-prod bucket
7.16.2 GitHub
Repository Backup Strategy
Current Implementation:
- Distributed Version Control: Every clone is a full backup
- Multiple Remotes: Repositories mirrored across team members
- Branch Protection: Main branches protected from force pushes
- Commit History: Full history preserved indefinitely
Disaster Recovery:
- RPO: Near-zero (last push to remote)
- RTO: Minutes (clone from any team member)
- Access Control: 2FA required for all contributors
Recommended Enhancements:
- Implement automated daily backups to GCS
- Set up GitHub repository archiving
- Enable GitHub Advanced Security features (requires GitHub Enterprise or paid add-on):
  - Code Scanning: Automatically detect security vulnerabilities in code
  - Secret Scanning: Find accidentally committed API keys and passwords
  - Dependency Review: Identify vulnerable dependencies in pull requests
  - Security Alerts: Get notifications about known vulnerabilities
7.16.3 Firebase Analytics & Authentication
Firebase Analytics Export Configuration
Current Setup:
- BigQuery Export: ENABLED
- Export Frequency: Hourly export (overwrites previous data - no history kept)
- Data Retention:
  - Firebase Console: 2 months rolling window
  - BigQuery: Only latest snapshot available (overwrites hourly)
- Export Includes: All events, user properties, and audiences
Data Recovery:
- Can only restore most recent hourly snapshot
- No historical recovery available due to overwrite pattern
- Firebase Console retains 2 months of data independently
Firebase Authentication Backup
Current Setup:
- BigQuery Export: Hourly export (overwrites previous data)
- User Data: Email, UID, metadata exported
- History: No historical backups maintained
- Risk: Data corruption would propagate within 1 hour
Recommended Improvements:
- Implement dated snapshots (e.g., users_2025_09_18) instead of overwriting
- Maintain 7-30 days of historical snapshots
- Enable point-in-time recovery for user data
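The dated-snapshot recommendation above implies two small pieces of logic: naming each day's export and deciding which old snapshots to prune. The TypeScript sketch below shows both under assumptions: `snapshotTableName` and `snapshotsToDelete` are hypothetical helpers, dates are taken in UTC, and the base name `users` follows the example in the recommendation.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Builds a dated table name such as users_2025_09_18 (UTC date).
function snapshotTableName(base: string, date: Date): string {
  const year = date.getUTCFullYear();
  const month = String(date.getUTCMonth() + 1).padStart(2, "0");
  const day = String(date.getUTCDate()).padStart(2, "0");
  return `${base}_${year}_${month}_${day}`;
}

// Returns the snapshot tables for `base` that fall outside the retention
// window. Tables with other prefixes are never selected for deletion.
function snapshotsToDelete(
  existing: string[],
  base: string,
  today: Date,
  keepDays: number,
): string[] {
  const keep = new Set<string>();
  for (let i = 0; i < keepDays; i++) {
    keep.add(snapshotTableName(base, new Date(today.getTime() - i * DAY_MS)));
  }
  return existing.filter((name) => name.startsWith(`${base}_`) && !keep.has(name));
}
```

With `keepDays` set between 7 and 30, a daily job could create today's snapshot first and prune afterwards, so a failed export never reduces the number of retained copies.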
Access Control:
- Firebase Console: OAuth-based access
- BigQuery: IAM-controlled with audit logging
7.16.4 Sentry (Error Monitoring)
Data Retention Policy
Current Configuration:
- Error Events: 90-day retention (Sentry Team plan)
- Performance Data: 30-day retention
- Attachments: 30-day retention
- Issue History: Indefinite
Backup Strategy:
- Critical Errors: Automatically forwarded to Slack for permanent record
- Export Capability: API access for data export
- GCP Synchronization: All Sentry errors synchronized to Google Cloud Logging for advanced query analysis
- Dual Storage: Errors stored in both Sentry and GCP for redundancy
Advanced Analysis Features:
- Errors from Sentry available in GCP Log Explorer
- Can correlate with other system logs using trace IDs
- BigQuery integration for complex error pattern analysis
- Long-term retention in GCP beyond Sentry's 90-day limit
Disaster Recovery:
- Sentry is a SaaS platform with its own DR
- Full error history maintained in Google Cloud Logging
- Can reconstruct complete error history from GCP logs
- Dual storage ensures no data loss if either platform is unavailable
7.16.5 BigQuery (Data Warehouse)
Backup and Recovery Features
Current Implementation:
- Automatic Backups: Managed by Google (7-day time travel)
- Table Snapshots: Can be created for long-term retention
- Dataset Copies: Scheduled copies to backup datasets
- Time Travel: Query data from up to 7 days ago
Data Governance (Updated December 2024):
- Table-Level Access Control: Implemented via Terraform to protect biometric data
- Admin Users (dataset-level access to ALL tables): listed in Section 7.4
- Readonly Users (table-level access to PUBLIC tables only):
  - pious@eli.health (customer support)
  - media@videnglobe.com (marketing team)
  - These users CANNOT access the record table (biometric data)
- Audit Logging: All queries logged in Cloud Audit Logs
- Data Classification:
  - record table: Contains PHI/biometric data - RESTRICTED ACCESS
  - Other tables: Operational data - accessible by readonly users
Security Implementation:
- Access control managed via Terraform (bigquery-iam module)
- Changes require code review and Terraform apply
- No manual IAM changes permitted (Infrastructure as Code)
- Automatic new table protection when added to restricted list
Disaster Recovery:
- RPO: Near-zero (streaming inserts)
- RTO: Immediate (multi-region availability)
- Export Options: Scheduled exports to Cloud Storage
7.16.6 PostgreSQL (Cloud SQL)
Advanced Configuration
Security Features:
- SSL/TLS: Enforced for all connections
- IAM Authentication: Enabled for service accounts
- Private IP: Available via VPC peering
- Automated Patches: Security updates auto-applied
Current Setup:
- Regional Configuration: Single zone deployment (us-east1)
- Connection Pooling: Node.js connection pool via TypeORM (configured via POSTGRES_POOL_SIZE environment variable)
- High Availability: Not currently configured (single zone)
- Read Replicas: Not implemented
- Automatic Failover: Zone failure recovery only
Database Access:
- ORM: TypeORM with PostgreSQL driver (pg)
- Connection Pool: Managed by TypeORM, not pgBouncer
- Pool Size: Configurable via environment variable
- Connection Timeout: Default TypeORM settings
Monitoring:
- Cloud Monitoring: CPU, memory, disk metrics
- Query Insights: Performance analysis enabled
- Alert Policies: Configured for critical metrics
- Error Monitoring: Database errors tracked through application logs and pushed to GCP Cloud Logging
7.16.7 Comprehensive Disaster Recovery Plan
Recovery Objectives by Service
| Service | RPO (Recovery Point Objective) | RTO (Recovery Time Objective) | Backup Method |
|---|---|---|---|
| Cloud SQL PostgreSQL | 1 second (PITR) | Less than 5 minutes | Automated + PITR |
| Cloud Storage | Near-zero | Immediate | Versioning + Replication |
| BigQuery | Near-zero | Immediate | Time Travel + Snapshots |
| GitHub | Last commit | Less than 10 minutes | Distributed VCS |
| Firebase Analytics | 24 hours | Not applicable | BigQuery Export |
| Sentry | Real-time | SaaS managed | Cloud Logging backup |
| Cloud Run | Current revision | Less than 1 minute | Revision history |
| Secrets | Version-controlled | Less than 5 minutes | Secret Manager versions |
Disaster Recovery Procedures
Scenario 1: Database Corruption or Deletion
- Immediate Response:
- Stop write operations to prevent further corruption
- Assess the extent of data loss
- Recovery Steps:
- For recent corruption (less than 7 days): Use Point-in-Time Recovery
- For older issues: Restore from daily backup
- Validate data integrity post-restoration
- Post-Recovery:
- Run data consistency checks
- Update audit logs
- Conduct post-mortem analysis
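The choice between the two recovery paths in Scenario 1 can be sketched as follows. The 7-day threshold comes from the recovery steps above; the function and type names are hypothetical and this is not taken from the actual runbook.

```typescript
type RecoveryMethod = "point-in-time-recovery" | "daily-backup-restore";

const PITR_WINDOW_DAYS = 7; // threshold taken from the Scenario 1 recovery steps
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Picks the recovery path based on how long ago the corruption occurred.
function chooseRecoveryMethod(corruptionTime: Date, now: Date): RecoveryMethod {
  const ageMs = now.getTime() - corruptionTime.getTime();
  if (ageMs < 0) {
    throw new Error("corruptionTime must not be later than now");
  }
  // "Less than 7 days" uses PITR; anything at or beyond the window uses the daily backup.
  return ageMs < PITR_WINDOW_DAYS * MS_PER_DAY
    ? "point-in-time-recovery"
    : "daily-backup-restore";
}
```

Whichever path is chosen, the integrity validation and post-recovery steps listed above still apply.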
Scenario 2: Regional Outage
- Detection: Automated monitoring alerts
- Failover Process:
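- Cloud SQL: Automatic failover to a standby is not available in the current single-zone configuration (see Section 7.16.6); restore from automated backups or PITR instead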
- Cloud Run: Traffic routing to healthy region
- Storage: Access via multi-region configuration
- Communication: Update status page and notify users
Scenario 3: Security Breach
- Containment:
- Revoke compromised credentials immediately
- Enable emergency access controls
- Assessment:
- Review audit logs for unauthorized access
- Identify affected data and systems
- Recovery:
- Restore from known-good backups
- Rotate all credentials and keys
- Implement additional security measures
Testing and Validation
Quarterly DR Drills:
- Full database restoration test
- Regional failover simulation
- Security incident response exercise
- Communication protocol validation
Monthly Validation:
- Backup integrity checks
- Recovery procedure documentation review
- Access control audits
- Monitoring system tests
Key Personnel and Responsibilities
| Role | Primary Responsibility | Backup Personnel |
|---|---|---|
| Incident Commander | Coordinate DR response | CTO / VP Engineering |
| Database Admin | Database restoration | Senior Backend Engineer |
| Infrastructure Lead | Service failover | DevOps Engineer |
| Security Lead | Security assessment | Security Engineer |
| Communications | User/stakeholder updates | Product Manager |
Recovery Runbooks
Detailed runbooks are maintained in the private ops repository for:
- PostgreSQL point-in-time recovery
- BigQuery dataset restoration
- Cloud Run service rollback
- Secret rotation procedures
- GitHub repository recovery
7.16.8 Continuous Improvement
Regular Reviews
- Quarterly: DR plan review and updates
- Semi-Annual: Full DR simulation
- Annual: Third-party DR audit
Metrics Tracking
- Backup success rate (target: 99.9%)
- Recovery test success rate (target: 100%)
- Mean time to recovery (target: below each service's documented RTO)
- Data integrity validation (target: 100%)
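One way to check the targets above against collected counts is sketched below. The input shape and all field names are hypothetical; only the target values (99.9%, 100%, and recovery time below RTO) come from this section.

```typescript
// Hypothetical input shape; none of these field names come from an existing system.
interface DrMetricsInput {
  backupsAttempted: number;
  backupsSucceeded: number;
  recoveryTestsRun: number;
  recoveryTestsPassed: number;
  integrityChecksRun: number;
  integrityChecksPassed: number;
  meanTimeToRecoveryMinutes: number;
  rtoMinutes: number;
}

interface DrMetricsReport {
  backupSuccessRateMet: boolean;
  recoveryTestRateMet: boolean;
  integrityValidationMet: boolean;
  mttrMet: boolean;
}

// Compares measured counts against the targets listed under Metrics Tracking.
function evaluateDrMetrics(m: DrMetricsInput): DrMetricsReport {
  return {
    // 99.9% target, compared with integers to avoid floating-point rounding.
    backupSuccessRateMet:
      m.backupsAttempted > 0 &&
      m.backupsSucceeded * 1000 >= m.backupsAttempted * 999,
    // 100% targets: every run must have passed.
    recoveryTestRateMet:
      m.recoveryTestsRun > 0 && m.recoveryTestsPassed === m.recoveryTestsRun,
    integrityValidationMet:
      m.integrityChecksRun > 0 &&
      m.integrityChecksPassed === m.integrityChecksRun,
    // Mean time to recovery must be strictly below the RTO.
    mttrMet: m.meanTimeToRecoveryMinutes < m.rtoMinutes,
  };
}
```

A period with no backups or tests at all is reported as not meeting the target, since an empty sample cannot demonstrate compliance.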
8. APPENDICES
Appendix A: Security Checklist
Development Phase
- Secure coding guidelines followed
- Input validation implemented
- Authentication/authorization checked
- Sensitive data identified and protected
- Security tests written
- Code review completed
Deployment Phase
- Vulnerability scan passed
- Secrets properly managed
- Infrastructure hardened
- Monitoring configured
- Backup verified
- Documentation updated
Operations Phase
- Security patches applied
- Access reviews conducted
- Logs monitored
- Incidents tracked
- Compliance maintained
- Training completed
Appendix B: Contact Information
| Role | Contact | Availability |
|---|---|---|
| Security Team | security@eli.health | 24/7 |
| Privacy Officer | privacy@eli.health | Business hours |
| Infrastructure | devops@eli.health | On-call |
| Compliance | compliance@eli.health | Business hours |
| Legal | legal@eli.health | Business hours |
Appendix C: Tool Reference
| Tool | Purpose | Access |
|---|---|---|
| GCP Console | Infrastructure management | IAM controlled |
| Firebase Console | Authentication management | Admin only |
| Sentry | Error monitoring | Developer access |
| PagerDuty | Incident management | On-call team |
| GitHub | Code repository | Team access |
| Terraform | Infrastructure as code | DevOps only |
Appendix D: Regulatory References
- GDPR: Regulation (EU) 2016/679
- PIPEDA: Personal Information Protection and Electronic Documents Act
- Law 25: Quebec Bill 64, An Act to modernize legislative provisions as regards the protection of personal information
- HIPAA: Health Insurance Portability and Accountability Act (US)
- ISO 27001: Information security management systems
- SOC 2: System and Organization Controls 2
Document Control
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0 | 2025-08-18 | Engineering Team | Initial version |
| 1.1 | 2025-09-05 | Engineering Team | Added Section 7.8 - Encryption Strategy and Key Management: Identified critical issues with current encryption, documented proper PHI vs PII encryption approach, proposed key management architecture improvements, and defined immediate next steps for remediation |
| 1.2 | 2025-09-16 | Engineering Team | Added Section 7.16 - Database Backup and Disaster Recovery: Comprehensive documentation of backup strategies, disaster recovery procedures, and retention policies for all critical systems including GCP, GitHub, Firebase Analytics, Sentry, BigQuery, and PostgreSQL |
| 1.3 | 2025-09-18 | Engineering Team | Added Section 7.15 - Comprehensive Logging and Monitoring System: Detailed documentation of mobile and backend logging infrastructure, trace ID implementation for end-to-end request tracking, log aggregation, analysis capabilities, and troubleshooting procedures. Removed future enhancements section. |
| 1.4 | 2025-09-18 | Engineering Team | Updated Firebase Analytics/Auth, Sentry, and PostgreSQL sections with current implementation details. Added comprehensive Mermaid diagram showing complete mobile-to-backend logging flow with trace ID correlation. Clarified PostgreSQL uses TypeORM pooling (not pgBouncer) and single-zone deployment (not HA). |
| 1.5 | 2025-12-01 | Engineering Team | BigQuery Table-Level Access Control: Implemented Terraform-managed IAM for biometric data protection. Updated Section 7.4 (BigQuery Data Governance) with new access control diagram showing admin vs readonly user access tiers. Updated Section 7.16.5 with detailed access control implementation. The record table containing biometric/PHI data is now restricted to admin users only; readonly users can access all other tables. |
| 1.6 | 2025-12-02 | Engineering Team | Corrected BigQuery Data Replication Description: Fixed diagrams in Sections 7.1 and 7.3 to accurately reflect that biometric data IS replicated to BigQuery but protected via role-based access control (not excluded from replication). Changed "NO Biometric Data" to "Biometric Data Protected". Updated readonly user roles: pious@eli.health is customer support (not board member), media@videnglobe.com is marketing team (not external analyst). |
Next Review Date: March 2026
Document Owner: Chief Technology Officer
Classification: Confidential - Internal Use Only
This document contains confidential and proprietary information of Eli Health Inc. Unauthorized distribution or disclosure is strictly prohibited.