Design Rationale
================

.. verified:: 2025-11-12
   :reviewer: Christof Buchbender

This document explains the key design decisions behind the ops-db-api
architecture and why alternative approaches were rejected.

.. contents:: Table of Contents
   :local:
   :depth: 2

The Remote Observatory Challenge
---------------------------------

Understanding why this architecture exists requires understanding the
environment:

Physical Constraints
~~~~~~~~~~~~~~~~~~~~

The CCAT observatory site:

* **Altitude**: 5600 m (18,400 feet) - extreme environment
* **Location**: Atacama Desert, Chile - remote, harsh conditions
* **Distance**: 11,000+ km from the main database in Cologne, Germany
* **Connectivity**: Fiber, subject to outages
* **Bandwidth**: Limited when available
* **Latency**: 200-500 ms when connected, infinite when not

The Unacceptable Solution
~~~~~~~~~~~~~~~~~~~~~~~~~~

**What doesn't work**: Direct writes to the remote database

.. code-block:: python

   # This fails during a network outage
   def record_observation(obs_data):
       try:
           db.insert(obs_data)  # Blocks until the remote DB responds
           db.commit()          # Fails if the network is down
       except NetworkError:
           # What now? The observation already happened!
           # Data is lost or requires manual intervention
           raise

**Problems**:

1. Blocks the observation workflow on network latency
2. Fails completely during outages
3. No recovery mechanism
4. Lost data requires manual intervention

Why Redis for Buffering?
-------------------------

We chose Redis over other buffering solutions for specific reasons:

Redis Advantages
~~~~~~~~~~~~~~~~

**1. In-Memory Speed**

* Sub-millisecond write latency
* No disk I/O blocking
* Observation recording never waits

**2. Persistence Options**

* AOF (Append-Only File) for durability
* Configurable fsync policies
* Survives process restart

**3. Data Structures**

* Lists for the transaction queue (LPUSH/RPOP)
* Hashes for transaction metadata
* Sets for tracking state
* Pub/Sub for real-time updates

**4. Simple Deployment**

* Single binary, minimal dependencies
* Low resource overhead
* Proven reliability

**5. Atomic Operations**

* Transaction queuing is atomic
* No race conditions
* Safe for concurrent access

Alternatives Considered
~~~~~~~~~~~~~~~~~~~~~~~

**RabbitMQ / Kafka**

* **Pro**: Purpose-built for message queuing
* **Pro**: Better durability guarantees
* **Con**: Much heavier weight
* **Con**: More complex deployment
* **Con**: Overkill for single-site buffering

**Local SQLite**

* **Pro**: SQL interface
* **Pro**: ACID guarantees
* **Con**: Disk I/O latency
* **Con**: Locking under concurrent access
* **Con**: No built-in pub/sub for updates

**PostgreSQL Queue Table**

* **Pro**: Same database system
* **Pro**: ACID guarantees
* **Con**: Adds load to the database
* **Con**: Defeats the purpose (still requires a DB connection)
* **Con**: Slower than in-memory

**File System Queue**

* **Pro**: Simple, no dependencies
* **Pro**: Durable by default
* **Con**: No atomicity guarantees
* **Con**: Race conditions on concurrent access
* **Con**: No built-in monitoring

**Decision**: Redis provides the best balance of speed, reliability, and
simplicity for our use case; the buffering pattern is sketched below.
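A minimal sketch of the buffering pattern, assuming redis-py and a local Redis
instance; the key names (``tx:queue``, ``tx:<id>``) and the
``buffer_transaction`` helper are illustrative, not the actual ops-db-api
implementation:

.. code-block:: python

   import json
   import uuid

   import redis

   r = redis.Redis(host="localhost", port=6379, decode_responses=True)

   def buffer_transaction(payload: dict) -> str:
       """Queue an observation write locally; a forwarder drains it later."""
       tx_id = str(uuid.uuid4())
       pipe = r.pipeline(transaction=True)   # MULTI/EXEC: both writes or neither
       pipe.hset(f"tx:{tx_id}", mapping={    # hash holds the transaction metadata
           "status": "pending",
           "payload": json.dumps(payload),
       })
       pipe.lpush("tx:queue", tx_id)         # list acts as the FIFO queue
       pipe.execute()                        # local round-trip only, never waits on Cologne
       return tx_id

Because the pipeline executes atomically, a transaction can never appear in
the queue without its metadata hash (or vice versa), even with concurrent
writers.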
Why PostgreSQL Streaming Replication?
--------------------------------------

We use PostgreSQL's built-in streaming replication rather than
application-level sync:

Streaming Replication Benefits
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**1. Built-in Mechanism**

* Native PostgreSQL feature
* Battle-tested in production
* No custom sync logic needed

**2. Log Sequence Numbers (LSN)**

* Precise replication state tracking
* Know exactly what data has replicated
* Enable smart cache management

**3. Read-Only Replicas**

* Safe by design (can't accidentally write)
* Multiple replicas possible
* No conflict resolution needed

**4. Continuous Sync**

* Changes stream as they happen
* No batch delays
* Minimal lag when the network is available

**5. Automatic Catchup**

* The replica catches up automatically after network restoration
* Uses the WAL (Write-Ahead Log) for efficiency
* No manual intervention required

Why Not Application-Level Sync?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Alternative**: Sync tables at the application level (e.g., with timestamps)

**Problems**:

1. **Conflict Resolution**: What if the same record is modified at both sites?
2. **Cascading Updates**: Foreign key relationships become complex
3. **Schema Changes**: Every DB change needs a sync logic update
4. **Partial Failure**: Some tables sync, others don't - inconsistent state
5. **Performance**: Full table scans or complex timestamp tracking

**PostgreSQL streaming replication** avoids all these issues by replicating
at the database engine level.

Why Eventual Consistency?
--------------------------

We explicitly choose eventual consistency over strong consistency:

The Trade-Off
~~~~~~~~~~~~~

**Strong Consistency** (rejected):

* Every write waits for confirmation from all replicas
* Guarantees immediate consistency everywhere
* **Blocks when the network is down** ❌

**Eventual Consistency** (chosen):

* Writes succeed locally, sync in the background
* Data becomes consistent "eventually" (seconds to minutes)
* **Never blocks operations** ✅

Why This Is Acceptable
~~~~~~~~~~~~~~~~~~~~~~~

**For Observatory Operations**:

* An observation recorded at 00:00:00 doesn't need to be visible in Cologne at 00:00:01
* Scientists querying data hours or days later don't care about seconds of lag
* **Reliability > Immediate Consistency**

**For UI Users**:

* Transfer monitoring: Seconds of lag is imperceptible
* Data browsing: Minutes-old data is fine for most use cases
* Real-time updates: WebSockets provide immediate local feedback

**Critical**: We track replication state precisely with LSNs, so we *know*
when consistency is achieved.

Why Not Multi-Master?
----------------------

**Multi-Master Replication** (rejected): Both sites can write, changes sync
bidirectionally

**Sounds Good But**:

1. **Conflict Resolution**: What if both sites create the same ID?

   .. code-block:: sql

      -- Site A creates:
      INSERT INTO obs (id, status) VALUES (123, 'running');

      -- Simultaneously, Site B creates:
      INSERT INTO obs (id, status) VALUES (123, 'completed');

      -- Which one wins when they sync?

2. **Complex Logic**: Need custom resolution for every conflict type
3. **Partial Updates**: Site A updates field X, Site B updates field Y - merge how?
4. **Schema Evolution**: Database changes must work on both sites simultaneously
5. **Testing Complexity**: Every scenario needs a conflict resolution test

**Our Single-Master Approach**:

* All writes go to Cologne (authoritative)
* The observatory buffers and forwards
* No conflicts possible
* Simple, predictable behavior

Cost of This Approach
~~~~~~~~~~~~~~~~~~~~~~

We pay a cost for this simplicity:

* The observatory can't immediately see writes from Cologne (read-only replica)
* A network partition means buffered data accumulates
* A main-site failure blocks all writes (but buffering prevents data loss)

**We accept these costs** because the alternative (conflict resolution) is far
more complex and error-prone, and writes from Cologne are not needed
immediately. The buffer-and-forward half of this approach is sketched below.
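To make the buffer-and-forward flow concrete, here is a minimal sketch of a
forwarder that drains the Redis queue from the earlier example whenever
Cologne is reachable. It is illustrative only: ``write_to_main`` stands in for
whatever call actually performs the insert on the main database, and the key
names mirror the hypothetical ones used above.

.. code-block:: python

   import json

   import redis

   r = redis.Redis(host="localhost", port=6379, decode_responses=True)

   def drain_queue(write_to_main, batch_limit: int = 100) -> int:
       """Forward buffered transactions to the main site, oldest first."""
       forwarded = 0
       for _ in range(batch_limit):
           tx_id = r.rpop("tx:queue")            # oldest entry (LPUSH/RPOP FIFO)
           if tx_id is None:
               break                             # queue is empty
           payload = json.loads(r.hget(f"tx:{tx_id}", "payload"))
           try:
               write_to_main(payload)            # e.g. an INSERT against Cologne
           except Exception:
               r.rpush("tx:queue", tx_id)        # network down: requeue, retry later
               break
           r.hset(f"tx:{tx_id}", "status", "forwarded")
           forwarded += 1
       return forwarded

If forwarding fails, the transaction is pushed back onto the consuming end of
the list, so nothing is lost and ordering is preserved; the observation
workflow that enqueued it never notices the outage.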
Technology Choices
------------------

Framework: FastAPI
~~~~~~~~~~~~~~~~~~

**Why FastAPI**:

* Modern async Python framework
* Automatic OpenAPI documentation
* Type hints and validation (Pydantic)
* WebSocket support built in
* High performance (comparable to Go, Node.js)

**Why not Django/Flask**:

* Django: Too heavyweight, REST framework less modern
* Flask: No native async (ASGI) support, no automatic docs

ORM: SQLAlchemy
~~~~~~~~~~~~~~~

**Why SQLAlchemy**:

* Industry-standard Python ORM
* Powerful query builder
* Async support (SQLAlchemy 1.4+)
* Used by the ops-db package (consistency)

**Why not Django ORM**:

* Tied to the Django framework
* Less flexible for complex queries

Database: PostgreSQL
~~~~~~~~~~~~~~~~~~~~

**Why PostgreSQL**:

* Streaming replication built in
* LSN tracking for replication state
* JSON/JSONB for flexible schemas
* Proven at scale
* Excellent ops-db package support

**Why not MySQL/MariaDB**:

* Replication less sophisticated
* No direct LSN equivalent
* Weaker JSON support

Cache/Buffer: Redis
~~~~~~~~~~~~~~~~~~~

**Why Redis**:

* Discussed in detail above
* In-memory speed
* Versatile data structures
* Simple deployment

Language: Python
~~~~~~~~~~~~~~~~

**Why Python**:

* Astronomy/science community standard
* Rich ecosystem (astropy, numpy, etc.)
* Easy for scientists to extend
* Same language as the ops-db package

Summary
-------

Every major design decision has a rationale:

* **Unified API**: Shared infrastructure, planned split later
* **Redis Buffering**: Speed + reliability + simplicity
* **PostgreSQL Replication**: Built-in, LSN tracking, proven
* **Eventual Consistency**: Reliability over immediate consistency
* **Single Master**: Simplicity over conflict resolution
* **FastAPI + SQLAlchemy + PostgreSQL + Redis**: Modern, proven, Python ecosystem

These decisions optimize for the **critical requirement**: **Observatory
operations must never fail due to network issues.**

Next Steps
----------

* :doc:`distributed-architecture` - How the database topology works
* :doc:`reliability-first` - Deep dive into reliability mechanisms