Architecture Overview
=====================

.. verified:: 2025-11-12
   :reviewer: Christof Buchbender

This section provides a technical deep dive into the ops-db-api architecture, including system design, database topology, site configuration, authentication, and endpoint categorization.

.. contents:: Table of Contents
   :local:
   :depth: 2

Introduction
------------

The ops-db-api is built on a distributed database architecture designed to ensure observatory operations never fail due to network issues. This architecture combines:

* **FastAPI** as the modern async Python web framework
* **PostgreSQL** as the relational database, with streaming replication
* **Redis** for transaction buffering and caching
* **SQLAlchemy** for ORM and database abstraction
* **Custom transaction buffering** for network resilience

Key Architectural Features
---------------------------

1. **Site-Aware Behavior**

   The API behaves differently based on site type (MAIN vs SECONDARY), automatically routing operations appropriately.

2. **Transaction Buffering**

   Critical operations at secondary sites are buffered in Redis and executed asynchronously against the main database.

3. **LSN-Based Replication Tracking**

   PostgreSQL Log Sequence Numbers (LSNs) provide precise knowledge of replication state for smart cache management.

4. **Smart Query Management**

   Reads merge data from the database, the transaction buffer, and the read buffer to give consistent views even during replication lag.

5. **Dual Authentication**

   Supports both GitHub OAuth (for UI users) and API tokens (for service scripts) through a unified interface.

Architecture Diagram
--------------------

High-level system architecture:

.. mermaid::

   graph TB
       subgraph "Client Layer"
           UI[Web Frontend<br/>ops-db-ui]
           Scripts[Observatory<br/>Scripts]
       end

       subgraph "API Layer"
           FastAPI[FastAPI Application]
           Routers[Routers<br/>transfer, obs_unit,<br/>executed_obs_units, etc.]
           Auth[Authentication<br/>GitHub OAuth + API Tokens]
       end

       subgraph "Business Logic"
           TxBuilder[Transaction Builder]
           TxManager[Transaction Manager]
           SmartQuery[Smart Query Manager]
       end

       subgraph "Infrastructure"
           Redis[Redis<br/>Buffer + Cache]
           BgProcessor[Background Processor]
           LSNTracker[LSN Tracker]
       end

       subgraph "Data Layer"
           MainDB[(Main Database<br/>Cologne)]
           ReplicaDB[(Replica Database<br/>Observatory)]
       end

       UI -->|HTTP/WS| FastAPI
       Scripts -->|HTTP| FastAPI
       FastAPI --> Auth
       FastAPI --> Routers
       Routers --> TxBuilder
       Routers --> SmartQuery
       TxBuilder --> TxManager
       TxManager --> Redis
       TxManager --> BgProcessor
       BgProcessor --> MainDB
       BgProcessor --> LSNTracker
       LSNTracker --> ReplicaDB
       SmartQuery --> ReplicaDB
       SmartQuery --> Redis
       MainDB -.->|Replication| ReplicaDB

       style FastAPI fill:#90EE90
       style Redis fill:#FFD700
       style MainDB fill:#87CEEB
       style ReplicaDB fill:#FFB6C1

Component Responsibilities
---------------------------

API Layer
~~~~~~~~~

**FastAPI Application** (``main.py``):

* Application lifecycle management
* Router registration
* CORS configuration
* WebSocket connection tracking
* Startup/shutdown hooks

**Routers**:

* UI-focused: ``transfer``, ``observing_program``, ``sources``, ``visibility``, ``instruments``
* Operations-focused: ``executed_obs_units``, ``raw_data_files``, ``raw_data_package``, ``staging``
* Shared: ``auth``, ``github_auth``, ``api_tokens``, ``site``, ``demo``

**Authentication**:

* Unified token validation (JWT + API tokens)
* Role-based access control (RBAC)
* Permission-based authorization
* Usage tracking for API tokens

Business Logic Layer
~~~~~~~~~~~~~~~~~~~~

**Transaction Builder**:

* Constructs multi-step database transactions
* Generates pre-allocated IDs
* Manages dependencies between steps
* Supports CREATE, UPDATE, DELETE, BULK_CREATE operations

**Transaction Manager**:

* Buffers transactions to Redis
* Manages retry logic and the failed queue
* Provides transaction status queries
* Implements write-through caching

**Smart Query Manager**:

* Merges database, buffered, and read buffer data
* Handles type conversion for filtering
* Retrieves related records via foreign keys
* Deduplicates and prioritizes fresher data

Infrastructure Layer
~~~~~~~~~~~~~~~~~~~~

**Redis**:

* Transaction buffer (list: LPUSH/RPOP)
* Transaction status (hash with TTL)
* Write-through cache (generated IDs)
* Buffered data cache (for smart queries)
* Read buffer (mutable updates to buffered records)

**Background Processor**:

* Polls the transaction buffer continuously
* Executes buffered transactions on the main DB
* Implements retry with exponential backoff
* Provides health monitoring and statistics

**LSN Tracker**:

* Captures the LSN after main DB writes
* Polls the replica for replication progress
* Determines when to clean up caches
* Extends cache TTL if replication is delayed

Data Layer
~~~~~~~~~~

**Main Database** (PostgreSQL):

* Single authoritative source of truth
* Accepts all write operations
* Generates WAL for replication
* Located in Cologne, Germany

**Replica Database** (PostgreSQL):

* Read-only streaming replica
* Receives WAL from the main database
* Serves local reads at secondary sites
* Located at the observatory (Chile) and potentially other sites

Request Flow Examples
---------------------

UI Read Request
~~~~~~~~~~~~~~~

.. mermaid::

   sequenceDiagram
       participant UI as Web Frontend
       participant API as FastAPI
       participant Auth as Authentication
       participant Router as Transfer Router
       participant DB as Local Database

       UI->>API: GET /api/transfer/overview
       API->>Auth: Verify JWT token
       Auth-->>API: User authenticated
       API->>Router: Route to handler
       Router->>DB: Query transfers
       DB-->>Router: Transfer data
       Router-->>API: Format response
       API-->>UI: JSON response

Observatory Write Request (Buffered)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. mermaid::

   sequenceDiagram
       participant Script as Observatory Script
       participant API as FastAPI
       participant Auth as Authentication
       participant Router as Executed Obs Router
       participant Builder as Transaction Builder
       participant Manager as Transaction Manager
       participant Redis as Redis Buffer

       Script->>API: POST /executed_obs_units/start
       API->>Auth: Verify API token
       Auth-->>API: Service authenticated
       API->>Router: Route to handler (@critical_operation)
       Router->>Builder: Build transaction
       Builder->>Builder: Generate UUID
       Builder-->>Router: Transaction with pre-gen ID
       Router->>Manager: Buffer transaction
       Manager->>Redis: LPUSH to buffer
       Redis-->>Manager: OK
       Manager-->>Router: Transaction ID
       Router-->>API: 201 Created
       API-->>Script: {"id": "uuid", "status": "buffered"}

Background Processing
~~~~~~~~~~~~~~~~~~~~~~

.. mermaid::

   sequenceDiagram
       participant BG as Background Processor
       participant Redis as Redis Buffer
       participant Executor as Transaction Executor
       participant MainDB as Main Database
       participant LSN as LSN Tracker
       participant Replica as Replica Database

       loop Every 1 second
           BG->>Redis: RPOP from buffer
           Redis-->>BG: Transaction
           BG->>Executor: Execute transaction
           Executor->>MainDB: INSERT/UPDATE/DELETE
           MainDB-->>Executor: Success
           Executor->>MainDB: SELECT pg_current_wal_lsn()
           MainDB-->>Executor: LSN: 0/12345678
           Executor-->>BG: Success + LSN
           BG->>LSN: Check replication (LSN: 0/12345678)
           LSN->>Replica: SELECT pg_last_wal_replay_lsn()
           Replica-->>LSN: LSN: 0/12345600 (behind)
           LSN-->>BG: Not yet replicated
           BG->>Redis: Extend cache TTL
       end

Section Contents
----------------

Explore the architecture in detail:

.. toctree::
   :maxdepth: 1

   system-overview
   database-topology
   site-configuration
   authentication-system
   endpoint-categories

Related Sections
----------------

* :doc:`../philosophy/distributed-architecture` - Why this architecture
* :doc:`../deep-dive/index` - Implementation deep dives
* :doc:`../quickstart/installation` - Getting started

Key Takeaways
-------------

The architecture is designed with several key principles:

1. **Network Resilience**: Operations never fail due to network issues (transaction buffering)
2. **Precise Replication Tracking**: LSN-based tracking eliminates guesswork
3. **Consistent Views**: Smart queries merge multiple data sources
4. **Flexible Authentication**: Supports both interactive users and automation
5. **Site-Aware Behavior**: Automatically adapts to site type (main vs secondary)

This architecture enables reliable operation in challenging network environments while maintaining data consistency and providing responsive user experiences.
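
As a concrete illustration of the LSN-based replication tracking summarized above, the following minimal sketch shows how two PostgreSQL LSN strings could be compared to decide whether a write has reached the replica. The function names (``parse_lsn``, ``is_replicated``, ``replication_lag_bytes``) are hypothetical and are not taken from the ops-db-api code base. The only fact relied on is the PostgreSQL text format for LSNs: two hexadecimal numbers separated by a slash, representing the high and low 32 bits of a 64-bit WAL position.

.. code-block:: python

   def parse_lsn(lsn: str) -> int:
       """Convert an LSN such as '0/12345678' into a 64-bit integer."""
       high, low = lsn.split("/")
       return (int(high, 16) << 32) | int(low, 16)


   def is_replicated(write_lsn: str, replay_lsn: str) -> bool:
       """Return True once the replica has replayed up to or past the write."""
       return parse_lsn(replay_lsn) >= parse_lsn(write_lsn)


   def replication_lag_bytes(write_lsn: str, replay_lsn: str) -> int:
       """Return how many bytes of WAL the replica still has to replay."""
       return max(0, parse_lsn(write_lsn) - parse_lsn(replay_lsn))


   # Values from the Background Processing diagram: the replica is behind,
   # so a tracker following this logic would keep the cached data alive.
   main_lsn = "0/12345678"     # e.g. result of pg_current_wal_lsn() on the main DB
   replica_lsn = "0/12345600"  # e.g. result of pg_last_wal_replay_lsn() on the replica
   print(is_replicated(main_lsn, replica_lsn))
   print(replication_lag_bytes(main_lsn, replica_lsn))

With the values shown in the diagram, the replica is 120 bytes (``0x78``) behind the write position, which corresponds to the "Not yet replicated" branch in which the cache TTL is extended.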