Bridgit

Monitoring & Operations Policy

Bridgit Platform (askbridgit.ca)
Version 1.0 | Effective: April 29, 2026 | Next Review: October 29, 2026

1. Purpose and Scope

This policy defines the operational procedures, monitoring practices, backup and recovery processes, and security event detection mechanisms for the Bridgit platform.

Applies to: All personnel with access to Bridgit production systems, source code, or cloud infrastructure.

In scope: Cloud Run services, Cloud SQL database, Redis, GCS storage, GitHub CI/CD pipeline, third-party API integrations, and all application-level logging and monitoring.

Compliance mapping: ISO 27001 A.12, A.13; SOC 2 CC2.1, CC4, CC5.2, CC7.

2. Operational Procedures

Documented Procedures

Operating procedures are maintained in the git repository:

Documentation is updated when code changes affect procedures. A full documentation audit was conducted in February 2026.

Change Management

Three deployment modes via GitHub Actions:

All deployments require:

Emergency changes are deployed directly, with post-deployment review within 48 hours.

Capacity Management

Cloud Run provides automatic scaling. Cloud SQL connection pooling manages database capacity. AI provider usage monitored via ai_usage_logs. Capacity reviewed when performance issues are reported.

3. Malware Protection

Bridgit is cloud-native with no managed endpoints.

Application level: file upload validation (type and size constraints), npm dependency scanning (npm audit, Dependabot), no executable uploads permitted.

Infrastructure level: Cloud Run containers rebuilt from Dockerfiles on each deployment (ephemeral, stateless). Cloud SQL and GCS managed by Google with provider-level security patching.

Development: developer machines rely on OS-level protections (macOS Gatekeeper, XProtect).

Users report suspicious behavior via the Report a Problem form with security triage fields.
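The application-level upload validation above can be sketched as an allow-list plus size-ceiling check. The specific MIME types and the 10 MiB limit below are illustrative assumptions, not the platform's actual configuration:

```javascript
// Illustrative upload validation: reject files outside a MIME allow-list
// or over a size ceiling. The specific values are assumptions, not the
// platform's real configuration. Executables never appear in the list.
const ALLOWED_TYPES = new Set(['application/pdf', 'image/png', 'image/jpeg']);
const MAX_BYTES = 10 * 1024 * 1024; // 10 MiB cap (illustrative)

function validateUpload(file) {
  if (!ALLOWED_TYPES.has(file.mimeType)) {
    return { ok: false, reason: `disallowed type: ${file.mimeType}` };
  }
  if (file.sizeBytes > MAX_BYTES) {
    return { ok: false, reason: `file too large: ${file.sizeBytes} bytes` };
  }
  return { ok: true };
}
```

An allow-list (rather than a deny-list of known-bad types) is what makes "no executable uploads permitted" enforceable: anything not explicitly permitted is rejected.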

4. Backup and Recovery

Backup Policy

Cloud SQL database:

Source code: git repository with full history on GitHub.

File uploads: GCS regional redundancy (no separate backup required).

Secrets: GCP Secret Manager maintains secret versions.

Redis: append-only file persistence. Session data is transient.

Gap: local pg_dump backups are not encrypted at rest.

Backup Schedule

Restoration

  1. Identify appropriate backup (Cloud SQL automated or manual pg_dump)
  2. Restore via GCP console or psql import
  3. Verify key tables (users, activity_instances, form_schemas)
  4. Restart application services
  5. Confirm functionality via smoke test

Backup restoration tested semi-annually.
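Step 3 of the restoration checklist can be partially automated. The sketch below checks a restored schema's reported table names (e.g. from information_schema.tables; the query mechanics are assumed) against the key tables named in the checklist:

```javascript
// Sketch: given the list of table names reported by a restored
// database, confirm the key tables from the restoration checklist
// are present before proceeding to the smoke test.
const KEY_TABLES = ['users', 'activity_instances', 'form_schemas'];

function verifyRestore(restoredTables) {
  const present = new Set(restoredTables);
  const missing = KEY_TABLES.filter((t) => !present.has(t));
  return { ok: missing.length === 0, missing };
}
```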

5. Logging and Monitoring

Events Logged

Log Retention

Gap: Cloud Run logs should be exported to Cloud Storage or BigQuery for longer retention.

Log Review

Gap: formal scheduled log review and automated error rate alerting to be implemented.

6. Network Security

Network Controls

Segmentation

Cloud-native architecture provides isolation:

Monitoring

7. Security Event Detection

Detection Mechanisms

Application-level: authentication failure logging, RBAC unauthorized access logging, API error logging, Report a Problem form with security section.

Infrastructure-level: Cloud Run error rates, Cloud SQL audit logs, GCP IAM audit logs, GitHub audit log.

Dependency-level: npm audit, GitHub Dependabot alerts.

AI monitoring: ai_usage_logs records all provider calls and is reviewed for anomalous patterns.

No formal SIEM deployed.
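Absent a SIEM, the ai_usage_logs review can still be made systematic. As an illustration, a periodic job might flag organizations whose call volume far exceeds their trailing average; the field names (orgId, call counts) and the 3x threshold below are hypothetical:

```javascript
// Hypothetical sketch: flag orgs whose AI call count in the current
// window exceeds a multiple of their trailing average. Field names
// and the 3x multiplier are illustrative assumptions.
function flagAnomalousUsage(history, current, multiplier = 3) {
  const flagged = [];
  for (const [orgId, calls] of Object.entries(current)) {
    const past = history[orgId] || [];
    const avg = past.length
      ? past.reduce((a, b) => a + b, 0) / past.length
      : 0;
    // Flag a spike, or any usage from an org with no history at all.
    if (avg === 0 ? calls > 0 : calls > avg * multiplier) {
      flagged.push({ orgId, calls, avg });
    }
  }
  return flagged;
}
```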

Alerting and Escalation

Follows Incident Response Policy severity levels:

Security-flagged Problem Reports reviewed by Platform Administrator.

Gap: automated threshold alerting to be configured.

8. Anomaly Monitoring

Baselines

Established through operational experience:

Baselines are informal at current scale. Formal statistical baseline collection is a planned improvement.

Anomaly Detection

Rule-based: authentication failure thresholds, RBAC blocking, file upload validation.

Manual: developer observation during deployments, Problem Report investigation, periodic ai_usage_logs review.

Infrastructure: Cloud Run auto-scaling behavior changes, Cloud SQL slow query logs.
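A rule-based authentication-failure threshold of the kind listed above can be sketched as a sliding-window counter. The 5-failure / 15-minute values are illustrative, and production state would live in Redis rather than process memory:

```javascript
// Sketch of a rule-based authentication-failure threshold: count
// failures per principal in a sliding window and trip an alert when
// the limit is exceeded. The 5-failure / 15-minute values are
// illustrative; real state would live in Redis, not process memory.
const WINDOW_MS = 15 * 60 * 1000;
const LIMIT = 5;
const failures = new Map(); // principal -> array of failure timestamps

function recordAuthFailure(principal, now = Date.now()) {
  const recent = (failures.get(principal) || []).filter(
    (t) => now - t < WINDOW_MS
  );
  recent.push(now);
  failures.set(principal, recent);
  return recent.length > LIMIT; // true => raise a security event
}
```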

Response Triggers

Automated: Cloud Run container restart, JWT blacklist enforcement, RBAC blocking.

Manual: unusual AI usage investigation, authentication failure investigation, cross-org data report triage, deployment rollback on error spike.

9. IT General Controls

Program Change Management: git version control, GitHub Actions CI/CD, three deployment modes, staging validation, mandatory backup, rollback capability.

Access to Programs and Data: GCP IAM, GitHub repository access, application RBAC, JWT with Redis blacklisting, AES-256-GCM encrypted OAuth tokens.

Computer Operations: Cloud Run auto-scaling and health management, Cloud Scheduler for GDPR jobs, automated daily database backups.

Program Development: PRD-driven process, architect/critic/code review for systemic changes, automated testing and linting, E2E tests.

Configuration management: docker-compose.yml, GitHub Actions workflows, GCP Secret Manager.
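The "JWT with Redis blacklisting" control above can be illustrated with a minimal revocation check. A Map stands in for Redis, and expiry handling is simplified; real code would use a Redis key with a TTL matching the token's remaining lifetime:

```javascript
// Sketch of token revocation via a blacklist: on logout the token's
// ID (jti) is blacklisted until its natural expiry; every request
// checks membership first. A Map stands in for Redis here.
const blacklist = new Map(); // jti -> expiry timestamp (ms)

function revokeToken(jti, expiresAtMs) {
  blacklist.set(jti, expiresAtMs);
}

function isTokenRevoked(jti, now = Date.now()) {
  const exp = blacklist.get(jti);
  if (exp === undefined) return false;
  if (exp <= now) {
    blacklist.delete(jti); // token has expired; entry no longer needed
    return false;
  }
  return true;
}
```

Keying the blacklist entry to the token's expiry keeps the structure bounded: a revoked token only needs to be remembered until it would have expired anyway.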

10. Information Quality and Internal Controls

Data integrity: PostgreSQL constraints, application-level validation, Sequelize model validation, Survey.js schema validation.

Source reliability: all data from platform database or GCP-managed services.

Timeliness: audit trail recorded at time of action, Cloud Run logs near real-time, AI usage logged synchronously.

Completeness: all API routes pass through middleware (auth, RBAC, billing).

Control issues communicated via: immediate escalation for P1/P2, development backlog for improvements, semi-annual review for gaps.

Critical deficiency escalation: within 24 hours.
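The completeness control above (every API route passes through auth, RBAC, and billing middleware) can be sketched as a chain where any rejecting layer short-circuits before the handler runs. The layer internals and request shape below are stand-ins, not the platform's real logic:

```javascript
// Sketch: every route handler is wrapped in the same middleware chain,
// so no request reaches a handler without passing auth, RBAC and
// billing checks. Layer internals are illustrative stand-ins.
function chain(middlewares, handler) {
  return (req) => {
    for (const mw of middlewares) {
      const verdict = mw(req);
      if (verdict !== true) return { status: 403, error: verdict };
    }
    return handler(req);
  };
}

const auth = (req) => (req.user ? true : 'unauthenticated');
const rbac = (req) => (req.user && req.user.role === 'admin' ? true : 'forbidden');
const billing = (req) => (req.orgActive ? true : 'billing inactive');

const route = chain([auth, rbac, billing], () => ({ status: 200 }));
```

Wrapping handlers at registration time, rather than calling checks inside each handler, is what makes the completeness claim auditable: a route cannot exist without the chain.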

11. Policy Administration

This policy is maintained alongside the platform source code and is subject to version control. Changes require review and re-approval.