Architecture Overview
=====================
.. verified:: 2025-11-12
   :reviewer: Christof Buchbender

This section provides a technical deep dive into the ops-db-api architecture, including system design, database topology, site configuration, authentication, and endpoint categorization.

.. contents:: Table of Contents
   :local:
   :depth: 2

Introduction
------------
The ops-db-api is built on a distributed database architecture designed so that observatory operations keep working when the network link to the main site is slow or unavailable. The architecture combines:

* **FastAPI** as the async Python web framework
* **PostgreSQL** as the relational database, with streaming replication
* **Redis** for transaction buffering and caching
* **SQLAlchemy** for ORM and database abstraction
* **Custom transaction buffering** for network resilience

Key Architectural Features
---------------------------
1. **Site-Aware Behavior**

   The API behaves differently based on site type (MAIN vs SECONDARY), automatically routing operations appropriately.

2. **Transaction Buffering**

   Critical operations at secondary sites are buffered in Redis and executed asynchronously against the main database.

3. **LSN-Based Replication Tracking**

   PostgreSQL Log Sequence Numbers provide precise knowledge of replication state for smart cache management.

4. **Smart Query Management**

   Reads merge data from the database, the transaction buffer, and the read buffer, giving consistent views even during replication lag.

5. **Dual Authentication**

   Supports both GitHub OAuth (for UI users) and API tokens (for service scripts) through a unified interface.

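
The site-aware routing described in items 1 and 2 can be pictured with a small sketch. The request flow later in this section mentions a ``@critical_operation`` marker; the decorator signature, the ``SiteType`` enum, and the plain list used as a buffer here are assumptions made for illustration, not the project's actual code:

.. code-block:: python

   from enum import Enum
   from functools import wraps


   class SiteType(Enum):
       MAIN = "main"
       SECONDARY = "secondary"


   def critical_operation(site_type, buffer):
       """Hypothetical decorator: buffer at secondary sites, run directly at main."""
       def decorator(handler):
           @wraps(handler)
           def wrapper(payload):
               if site_type is SiteType.SECONDARY:
                   # No direct write: queue the payload for later execution.
                   buffer.append(payload)
                   return {"status": "buffered"}
               # Main site: execute the handler immediately.
               return {"status": "committed", "result": handler(payload)}
           return wrapper
       return decorator

With this shape, the same handler can be deployed at both site types and only the injected ``site_type`` decides whether a write is queued or executed.
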
Architecture Diagram
--------------------
High-level system architecture:

.. mermaid::

   graph TB
       subgraph "Client Layer"
           UI[Web Frontend<br/>ops-db-ui]
           Scripts[Observatory<br/>Scripts]
       end

       subgraph "API Layer"
           FastAPI[FastAPI Application]
           Routers[Routers<br/>transfer, obs_unit,<br/>executed_obs_units, etc.]
           Auth[Authentication<br/>GitHub OAuth + API Tokens]
       end

       subgraph "Business Logic"
           TxBuilder[Transaction Builder]
           TxManager[Transaction Manager]
           SmartQuery[Smart Query Manager]
       end

       subgraph "Infrastructure"
           Redis[Redis<br/>Buffer + Cache]
           BgProcessor[Background Processor]
           LSNTracker[LSN Tracker]
       end

       subgraph "Data Layer"
           MainDB[(Main Database<br/>Cologne)]
           ReplicaDB[(Replica Database<br/>Observatory)]
       end

       UI -->|HTTP/WS| FastAPI
       Scripts -->|HTTP| FastAPI
       FastAPI --> Auth
       FastAPI --> Routers
       Routers --> TxBuilder
       Routers --> SmartQuery
       TxBuilder --> TxManager
       TxManager --> Redis
       TxManager --> BgProcessor
       BgProcessor --> MainDB
       BgProcessor --> LSNTracker
       LSNTracker --> ReplicaDB
       SmartQuery --> ReplicaDB
       SmartQuery --> Redis
       MainDB -.->|Replication| ReplicaDB

       style FastAPI fill:#90EE90
       style Redis fill:#FFD700
       style MainDB fill:#87CEEB
       style ReplicaDB fill:#FFB6C1

Component Responsibilities
---------------------------
API Layer
~~~~~~~~~
**FastAPI Application** (``main.py``):
* Application lifecycle management
* Router registration
* CORS configuration
* WebSocket connection tracking
* Startup/shutdown hooks
**Routers**:
* UI-focused: ``transfer``, ``observing_program``, ``sources``, ``visibility``, ``instruments``
* Operations-focused: ``executed_obs_units``, ``raw_data_files``, ``raw_data_package``, ``staging``
* Shared: ``auth``, ``github_auth``, ``api_tokens``, ``site``, ``demo``
**Authentication**:
* Unified token validation (JWT + API tokens)
* Role-based access control (RBAC)
* Permission-based authorization
* Usage tracking for API tokens
Business Logic Layer
~~~~~~~~~~~~~~~~~~~~
**Transaction Builder**:
* Constructs multi-step database transactions
* Generates pre-allocated IDs
* Manages dependencies between steps
* Supports CREATE, UPDATE, DELETE, BULK_CREATE operations
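
A minimal sketch of the builder idea, assuming UUIDs as the pre-allocated IDs and step indices as the dependency mechanism. The class, field, and table names are hypothetical and only illustrate how a later step can reference an ID before anything reaches the database:

.. code-block:: python

   import uuid
   from dataclasses import asdict, dataclass, field


   @dataclass
   class Step:
       operation: str                  # "CREATE", "UPDATE", "DELETE" or "BULK_CREATE"
       table: str
       data: dict
       depends_on: list = field(default_factory=list)  # indices of earlier steps


   class TransactionBuilder:
       def __init__(self):
           self.transaction_id = str(uuid.uuid4())
           self.steps = []

       def create(self, table, data, depends_on=()):
           # Allocate the record ID now so later steps can refer to it.
           record_id = str(uuid.uuid4())
           self.steps.append(
               Step("CREATE", table, {**data, "id": record_id}, list(depends_on))
           )
           return record_id

       def build(self):
           return {
               "transaction_id": self.transaction_id,
               "steps": [asdict(step) for step in self.steps],
           }


   builder = TransactionBuilder()
   obs_id = builder.create("executed_obs_unit", {"status": "running"})
   builder.create(
       "raw_data_file",
       {"executed_obs_unit_id": obs_id, "name": "scan_001.fits"},
       depends_on=[0],
   )
   transaction = builder.build()

Because the ID is generated on the client side of the database, the caller can receive it immediately even though the insert itself runs later.
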
**Transaction Manager**:
* Buffers transactions to Redis
* Manages retry logic and failed queue
* Provides transaction status queries
* Implements write-through caching
**Smart Query Manager**:
* Merges database + buffered + read buffer data
* Handles type conversion for filtering
* Retrieves related records via foreign keys
* Deduplicates and prioritizes fresher data
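
The merge step can be sketched as follows. The precedence used here (replica rows first, overridden by buffered rows, then patched with read-buffer updates) is an assumption derived from "prioritizes fresher data"; the function name and record shapes are illustrative only:

.. code-block:: python

   def merge_records(db_rows, buffered_rows, read_buffer_updates, key="id"):
       """Merge three sources into one list, deduplicated by ``key``."""
       merged = {}
       for row in db_rows:
           merged[row[key]] = dict(row)
       for row in buffered_rows:
           # A buffered record is newer than whatever the replica holds.
           merged[row[key]] = dict(row)
       for record_id, changes in read_buffer_updates.items():
           # Read-buffer entries are partial updates to known records.
           if record_id in merged:
               merged[record_id].update(changes)
       return list(merged.values())


   rows = merge_records(
       db_rows=[{"id": "a", "status": "done"}],
       buffered_rows=[{"id": "b", "status": "running"}],
       read_buffer_updates={"b": {"status": "finished"}},
   )

Keying the merge on the record ID is what removes duplicates: a record present in both the replica and the buffer appears once, in its most recent form.
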
Infrastructure Layer
~~~~~~~~~~~~~~~~~~~~
**Redis**:
* Transaction buffer (list: LPUSH/RPOP)
* Transaction status (hash with TTL)
* Write-through cache (generated IDs)
* Buffered data cache (for smart queries)
* Read buffer (mutable updates to buffered records)
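
Pairing LPUSH with RPOP makes a Redis list behave as a first-in, first-out queue: new transactions enter at the head and the oldest one leaves from the tail. The sketch below imitates those two commands with an in-memory stand-in so that it runs without a server; the class is hypothetical and is not a Redis client:

.. code-block:: python

   import json
   from collections import deque


   class InMemoryList:
       """Stand-in for a single Redis list, for illustration only."""

       def __init__(self):
           self._items = deque()

       def lpush(self, value):
           # LPUSH adds at the head (left) and returns the new length.
           self._items.appendleft(value)
           return len(self._items)

       def rpop(self):
           # RPOP removes from the tail (right); None when the list is empty.
           return self._items.pop() if self._items else None


   buffer = InMemoryList()
   buffer.lpush(json.dumps({"transaction_id": "tx-1"}))
   buffer.lpush(json.dumps({"transaction_id": "tx-2"}))

   first = json.loads(buffer.rpop())   # the oldest entry comes out first
   second = json.loads(buffer.rpop())

Preserving submission order matters here because later transactions may depend on records created by earlier ones.
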
**Background Processor**:
* Polls transaction buffer continuously
* Executes buffered transactions on main DB
* Implements retry with exponential backoff
* Health monitoring and statistics
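
Retry with exponential backoff can be sketched as below. The attempt limit, the delays, the choice of ``ConnectionError`` as the retryable failure, and the list used as a failed queue are all assumptions for the example rather than the processor's real settings:

.. code-block:: python

   import time


   def process_with_retry(execute, transaction, failed_queue,
                          max_attempts=5, base_delay=1.0, max_delay=60.0,
                          sleep=time.sleep):
       """Run ``execute(transaction)``, doubling the wait after each failure."""
       for attempt in range(max_attempts):
           try:
               return execute(transaction)
           except ConnectionError:
               if attempt + 1 < max_attempts:
                   # Waits of 1s, 2s, 4s, ... capped at max_delay.
                   sleep(min(base_delay * 2 ** attempt, max_delay))
       # Every attempt failed: park the transaction for later inspection.
       failed_queue.append(transaction)
       return None

Passing ``sleep`` as a parameter keeps the function testable, since a test can record the requested delays instead of actually waiting.
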
**LSN Tracker**:
* Captures LSN after main DB writes
* Polls replica for replication progress
* Determines when to clean up caches
* Extends cache TTL if replication is delayed
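
PostgreSQL prints an LSN as two hexadecimal numbers separated by a slash, so the values must be parsed before they are compared; comparing the strings directly gives wrong answers. A minimal sketch of the replication check, with hypothetical helper names:

.. code-block:: python

   def parse_lsn(lsn: str) -> int:
       """Convert an LSN such as '0/12345678' into a 64-bit integer."""
       high, low = lsn.split("/")
       return (int(high, 16) << 32) | int(low, 16)


   def is_replicated(write_lsn: str, replay_lsn: str) -> bool:
       """True once the replica has replayed up to the LSN of the write."""
       return parse_lsn(replay_lsn) >= parse_lsn(write_lsn)


   # The replica is still behind the write, so cached data must be kept.
   caught_up = is_replicated("0/12345678", "0/12345600")

When the check returns ``False``, the tracker keeps the cached records alive; once it returns ``True``, the replica can serve the data and the cache entries can be released.
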
Data Layer
~~~~~~~~~~
**Main Database** (PostgreSQL):
* Single authoritative source of truth
* Accepts all write operations
* Generates WAL for replication
* Located in Cologne, Germany
**Replica Database** (PostgreSQL):
* Read-only streaming replica
* Receives WAL from main database
* Serves local reads at secondary sites
* Located at observatory (Chile) and potentially other sites
Request Flow Examples
---------------------
UI Read Request
~~~~~~~~~~~~~~~

.. mermaid::

   sequenceDiagram
       participant UI as Web Frontend
       participant API as FastAPI
       participant Auth as Authentication
       participant Router as Transfer Router
       participant DB as Local Database

       UI->>API: GET /api/transfer/overview
       API->>Auth: Verify JWT token
       Auth-->>API: User authenticated
       API->>Router: Route to handler
       Router->>DB: Query transfers
       DB-->>Router: Transfer data
       Router-->>API: Format response
       API-->>UI: JSON response

Observatory Write Request (Buffered)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. mermaid::

   sequenceDiagram
       participant Script as Observatory Script
       participant API as FastAPI
       participant Auth as Authentication
       participant Router as Executed Obs Router
       participant Builder as Transaction Builder
       participant Manager as Transaction Manager
       participant Redis as Redis Buffer

       Script->>API: POST /executed_obs_units/start
       API->>Auth: Verify API token
       Auth-->>API: Service authenticated
       API->>Router: Route to handler (@critical_operation)
       Router->>Builder: Build transaction
       Builder->>Builder: Generate UUID
       Builder-->>Router: Transaction with pre-gen ID
       Router->>Manager: Buffer transaction
       Manager->>Redis: LPUSH to buffer
       Redis-->>Manager: OK
       Manager-->>Router: Transaction ID
       Router-->>API: 201 Created
       API-->>Script: {"id": "uuid", "status": "buffered"}

Background Processing
~~~~~~~~~~~~~~~~~~~~~~

.. mermaid::

   sequenceDiagram
       participant BG as Background Processor
       participant Redis as Redis Buffer
       participant Executor as Transaction Executor
       participant MainDB as Main Database
       participant LSN as LSN Tracker
       participant Replica as Replica Database

       loop Every 1 second
           BG->>Redis: RPOP from buffer
           Redis-->>BG: Transaction
           BG->>Executor: Execute transaction
           Executor->>MainDB: INSERT/UPDATE/DELETE
           MainDB-->>Executor: Success
           Executor->>MainDB: SELECT pg_current_wal_lsn()
           MainDB-->>Executor: LSN: 0/12345678
           Executor-->>BG: Success + LSN
           BG->>LSN: Check replication (LSN: 0/12345678)
           LSN->>Replica: SELECT pg_last_wal_replay_lsn()
           Replica-->>LSN: LSN: 0/12345600 (behind)
           LSN-->>BG: Not yet replicated
           BG->>Redis: Extend cache TTL
       end

Section Contents
----------------
Explore the architecture in detail:

.. toctree::
   :maxdepth: 1

   system-overview
   database-topology
   site-configuration
   authentication-system
   endpoint-categories

Related Sections
----------------
* :doc:`../philosophy/distributed-architecture` - Why this architecture
* :doc:`../deep-dive/index` - Implementation deep dives
* :doc:`../quickstart/installation` - Getting started
Key Takeaways
-------------
The architecture is designed with several key principles:
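1. **Network Resilience**: Critical operations at secondary sites are buffered locally, so a network outage delays them instead of failing them
2. **Precise Replication Tracking**: Comparing LSNs shows exactly when a write has reached the replica, with no need for time-based guesses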
3. **Consistent Views**: Smart queries merge multiple data sources
4. **Flexible Authentication**: Supports both interactive users and automation
5. **Site-Aware Behavior**: Automatically adapts to site type (main vs secondary)
This architecture enables reliable operation in challenging network environments while maintaining data consistency and providing responsive user experiences.