Design Rationale
================

.. verified:: 2025-11-12
   :reviewer: Christof Buchbender

This document explains the key design decisions behind the ops-db-api
architecture and why alternative approaches were rejected.

.. contents:: Table of Contents
   :local:
   :depth: 2

The Remote Observatory Challenge
---------------------------------

Understanding why this architecture exists requires understanding the
environment:

Physical Constraints
~~~~~~~~~~~~~~~~~~~~

The CCAT observatory site:

* **Altitude**: 5600 m (18,400 feet) - extreme environment
* **Location**: Atacama Desert, Chile - remote, harsh conditions
* **Distance**: 11,000+ km from the main database in Cologne, Germany
* **Connectivity**: Fiber, subject to outages
* **Bandwidth**: Limited when available
* **Latency**: 200-500 ms when connected, infinite when not

The Unacceptable Solution
~~~~~~~~~~~~~~~~~~~~~~~~~~

**What doesn't work**: Direct writes to the remote database

.. code-block:: python

   # This fails during a network outage
   def record_observation(obs_data):
       try:
           db.insert(obs_data)  # Blocks until the remote DB responds
           db.commit()          # Fails if the network is down
       except NetworkError:
           # What now? The observation already happened!
           # Data is lost or requires manual intervention
           raise

**Problems**:

1. Blocks the observation workflow on network latency
2. Fails completely during outages
3. No recovery mechanism
4. Lost data requires manual intervention

Why Redis for Buffering?
-------------------------

We chose Redis over other buffering solutions for specific reasons:

Redis Advantages
~~~~~~~~~~~~~~~~

**1. In-Memory Speed**

* Sub-millisecond write latency
* No disk I/O blocking
* Observation recording never waits

**2. Persistence Options**

* AOF (Append-Only File) for durability
* Configurable fsync policies
* Survives process restart

**3. Data Structures**

* Lists for the transaction queue (LPUSH/RPOP)
* Hashes for transaction metadata
* Sets for tracking state
* Pub/Sub for real-time updates

**4. Simple Deployment**

* Single binary, minimal dependencies
* Low resource overhead
* Proven reliability

**5. Atomic Operations**

* Transaction queuing is atomic
* No race conditions
* Safe for concurrent access

Alternatives Considered
~~~~~~~~~~~~~~~~~~~~~~~

**RabbitMQ / Kafka**

* **Pro**: Purpose-built for message queuing
* **Pro**: Better durability guarantees
* **Con**: Much heavier weight
* **Con**: More complex deployment
* **Con**: Overkill for single-site buffering

**Local SQLite**

* **Pro**: SQL interface
* **Pro**: ACID guarantees
* **Con**: Disk I/O latency
* **Con**: Locking under concurrent access
* **Con**: No built-in pub/sub for updates

**PostgreSQL Queue Table**

* **Pro**: Same database system
* **Pro**: ACID guarantees
* **Con**: Adds load to the database
* **Con**: Defeats the purpose (still requires a DB connection)
* **Con**: Slower than in-memory

**File System Queue**

* **Pro**: Simple, no dependencies
* **Pro**: Durable by default
* **Con**: No atomicity guarantees
* **Con**: Race conditions on concurrent access
* **Con**: No built-in monitoring

**Decision**: Redis provides the best balance of speed, reliability, and
simplicity for our use case; the buffering pattern is sketched below.
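A minimal sketch of the buffering pattern, assuming redis-py and a local Redis
instance; the key names (``tx:queue``, ``tx:<id>``) and the
``buffer_transaction`` helper are illustrative, not the actual ops-db-api
implementation:

.. code-block:: python

   import json
   import uuid

   import redis

   r = redis.Redis(host="localhost", port=6379, decode_responses=True)

   def buffer_transaction(payload: dict) -> str:
       """Queue an observation write locally; a forwarder drains it later."""
       tx_id = str(uuid.uuid4())
       pipe = r.pipeline(transaction=True)   # MULTI/EXEC: both writes or neither
       pipe.hset(f"tx:{tx_id}", mapping={    # hash holds the transaction metadata
           "status": "pending",
           "payload": json.dumps(payload),
       })
       pipe.lpush("tx:queue", tx_id)         # list acts as the FIFO queue
       pipe.execute()                        # local round-trip only, never waits on Cologne
       return tx_id

Because the pipeline executes atomically, a transaction can never appear in
the queue without its metadata hash (or vice versa), even with concurrent
writers.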
Why PostgreSQL Streaming Replication?
--------------------------------------

We use PostgreSQL's built-in streaming replication rather than
application-level sync:

Streaming Replication Benefits
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**1. Built-in Mechanism**

* Native PostgreSQL feature
* Battle-tested in production
* No custom sync logic needed

**2. Log Sequence Numbers (LSN)**

* Precise replication state tracking
* Know exactly what data has replicated
* Enable smart cache management

**3. Read-Only Replicas**

* Safe by design (can't accidentally write)
* Multiple replicas possible
* No conflict resolution needed

**4. Continuous Sync**

* Changes stream as they happen
* No batch delays
* Minimal lag when the network is available

**5. Automatic Catchup**

* The replica catches up automatically after network restoration
* Uses the WAL (Write-Ahead Log) for efficiency
* No manual intervention required

Why Not Application-Level Sync?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Alternative**: Sync tables at the application level (e.g., with timestamps)

**Problems**:

1. **Conflict Resolution**: What if the same record is modified at both sites?
2. **Cascading Updates**: Foreign key relationships become complex
3. **Schema Changes**: Every DB change needs a sync logic update
4. **Partial Failure**: Some tables sync, others don't - inconsistent state
5. **Performance**: Full table scans or complex timestamp tracking

**PostgreSQL streaming replication** avoids all these issues by replicating
at the database engine level.

Why Eventual Consistency?
--------------------------

We explicitly choose eventual consistency over strong consistency:

The Trade-Off
~~~~~~~~~~~~~

**Strong Consistency** (rejected):

* Every write waits for confirmation from all replicas
* Guarantees immediate consistency everywhere
* **Blocks when the network is down** ❌

**Eventual Consistency** (chosen):

* Writes succeed locally, sync in the background
* Data becomes consistent "eventually" (seconds to minutes)
* **Never blocks operations** ✅

Why This Is Acceptable
~~~~~~~~~~~~~~~~~~~~~~~

**For Observatory Operations**:

* An observation recorded at 00:00:00 doesn't need to be visible in Cologne at 00:00:01
* Scientists querying data hours or days later don't care about seconds of lag
* **Reliability > Immediate Consistency**

**For UI Users**:

* Transfer monitoring: Seconds of lag is imperceptible
* Data browsing: Minutes-old data is fine for most use cases
* Real-time updates: WebSockets provide immediate local feedback

**Critical**: We track replication state precisely with LSNs, so we *know*
when consistency is achieved.

Why Not Multi-Master?
----------------------

**Multi-Master Replication** (rejected): Both sites can write, changes sync
bidirectionally

**Sounds Good But**:

1. **Conflict Resolution**: What if both sites create the same ID?

   .. code-block:: sql

      -- Site A creates:
      INSERT INTO obs (id, status) VALUES (123, 'running');

      -- Simultaneously, Site B creates:
      INSERT INTO obs (id, status) VALUES (123, 'completed');

      -- Which one wins when they sync?

2. **Complex Logic**: Need custom resolution for every conflict type
3. **Partial Updates**: Site A updates field X, Site B updates field Y - merge how?
4. **Schema Evolution**: Database changes must work on both sites simultaneously
5. **Testing Complexity**: Every scenario needs a conflict resolution test

**Our Single-Master Approach**:

* All writes go to Cologne (authoritative)
* The observatory buffers and forwards
* No conflicts possible
* Simple, predictable behavior

Cost of This Approach
~~~~~~~~~~~~~~~~~~~~~~

We pay a cost for this simplicity:

* The observatory can't immediately see writes from Cologne (read-only replica)
* A network partition means buffered data accumulates
* A main-site failure blocks all writes (but buffering prevents data loss)

**We accept these costs** because the alternative (conflict resolution) is far
more complex and error-prone, and writes from Cologne are not needed
immediately. The buffer-and-forward half of this approach is sketched below.
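To make the buffer-and-forward flow concrete, here is a minimal sketch of a
forwarder that drains the Redis queue from the earlier example whenever
Cologne is reachable. It is illustrative only: ``write_to_main`` stands in for
whatever call actually performs the insert on the main database, and the key
names mirror the hypothetical ones used above.

.. code-block:: python

   import json

   import redis

   r = redis.Redis(host="localhost", port=6379, decode_responses=True)

   def drain_queue(write_to_main, batch_limit: int = 100) -> int:
       """Forward buffered transactions to the main site, oldest first."""
       forwarded = 0
       for _ in range(batch_limit):
           tx_id = r.rpop("tx:queue")            # oldest entry (LPUSH/RPOP FIFO)
           if tx_id is None:
               break                             # queue is empty
           payload = json.loads(r.hget(f"tx:{tx_id}", "payload"))
           try:
               write_to_main(payload)            # e.g. an INSERT against Cologne
           except Exception:
               r.rpush("tx:queue", tx_id)        # network down: requeue, retry later
               break
           r.hset(f"tx:{tx_id}", "status", "forwarded")
           forwarded += 1
       return forwarded

If forwarding fails, the transaction is pushed back onto the consuming end of
the list, so nothing is lost and ordering is preserved; the observation
workflow that enqueued it never notices the outage.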
Technology Choices
------------------

Framework: FastAPI
~~~~~~~~~~~~~~~~~~

**Why FastAPI**:

* Modern async Python framework
* Automatic OpenAPI documentation
* Type hints and validation (Pydantic)
* WebSocket support built in
* High performance (comparable to Go, Node.js)

**Why not Django/Flask**:

* Django: Too heavyweight, REST framework less modern
* Flask: No native async (ASGI) support, no automatic docs

ORM: SQLAlchemy
~~~~~~~~~~~~~~~

**Why SQLAlchemy**:

* Industry-standard Python ORM
* Powerful query builder
* Async support (SQLAlchemy 1.4+)
* Used by the ops-db package (consistency)

**Why not Django ORM**:

* Tied to the Django framework
* Less flexible for complex queries

Database: PostgreSQL
~~~~~~~~~~~~~~~~~~~~

**Why PostgreSQL**:

* Streaming replication built in
* LSN tracking for replication state
* JSON/JSONB for flexible schemas
* Proven at scale
* Excellent ops-db package support

**Why not MySQL/MariaDB**:

* Replication less sophisticated
* No direct LSN equivalent
* Weaker JSON support

Cache/Buffer: Redis
~~~~~~~~~~~~~~~~~~~

**Why Redis**:

* Discussed in detail above
* In-memory speed
* Versatile data structures
* Simple deployment

Language: Python
~~~~~~~~~~~~~~~~

**Why Python**:

* Astronomy/science community standard
* Rich ecosystem (astropy, numpy, etc.)
* Easy for scientists to extend
* Same language as the ops-db package

Summary
-------

Every major design decision has a rationale:

* **Unified API**: Shared infrastructure, planned split later
* **Redis Buffering**: Speed + reliability + simplicity
* **PostgreSQL Replication**: Built-in, LSN tracking, proven
* **Eventual Consistency**: Reliability over immediate consistency
* **Single Master**: Simplicity over conflict resolution
* **FastAPI + SQLAlchemy + PostgreSQL + Redis**: Modern, proven, Python ecosystem

These decisions optimize for the **critical requirement**: **Observatory
operations must never fail due to network issues.**

Next Steps
----------

* :doc:`distributed-architecture` - How the database topology works
* :doc:`reliability-first` - Deep dive into reliability mechanisms