Database Topology
=================

.. verified:: 2025-11-12
   :reviewer: Christof Buchbender

The ops-db-api uses a distributed PostgreSQL architecture with a single
authoritative database and read-only replicas at remote sites.

.. contents:: Table of Contents
   :local:
   :depth: 2

Overview
--------

The database topology consists of:

* **Main Database**: Cologne, Germany (write target)
* **Replica Database(s)**: Observatory in Chile, potentially other sites (read-only)
* **Streaming Replication**: PostgreSQL WAL-based continuous sync

.. mermaid::

   graph LR
       Main[(Main Database<br/>Cologne<br/>Read + Write)]
       Replica1[(Replica<br/>Observatory<br/>Read Only)]
       Replica2[(Replica<br/>Institute<br/>Read Only)]

       Main -.->|WAL Stream| Replica1
       Main -.->|WAL Stream| Replica2

       API_Main[API @ Main] -->|Writes| Main
       API_Main -->|Reads| Main
       API_Obs[API @ Observatory] -->|Buffered Writes| Main
       API_Obs -->|Reads| Replica1

       style Main fill:#90EE90
       style Replica1 fill:#FFB6C1
       style Replica2 fill:#FFB6C1

Main Database (Cologne)
-----------------------

Characteristics
~~~~~~~~~~~~~~~

**Role**: Single authoritative source of truth

**Location**: Cologne, Germany (data center)

**Access**:

* Direct write access (low latency)
* Direct read access (low latency)
* Public endpoint for remote API instances

**Configuration**:

.. code-block:: ini

   # postgresql.conf
   wal_level = replica
   max_wal_senders = 10
   wal_keep_size = 1GB
   synchronous_commit = on

**Replication Settings**:

.. code-block:: ini

   # Enable WAL archiving
   archive_mode = on
   archive_command = 'cp %p /archive/%f'

   # Replication slots (one per replica)
   max_replication_slots = 10

Schema Management
~~~~~~~~~~~~~~~~~

The database schema is managed by the ``ops-db`` package:

.. code-block:: bash

   # Install the ops-db package
   pip install git+https://github.com/ccatobs/ops-db.git

.. code-block:: python

   # Models are imported from ccat_ops_db
   from ccat_ops_db import models

**Key tables**:

* ``executed_obs_unit`` - Observatory operation records
* ``raw_data_package`` - Data file groupings
* ``raw_data_file`` - Individual file metadata
* ``observing_program`` - Planned observations
* ``source`` - Astronomical sources
* ``user`` - System users
* ``api_token`` - Authentication tokens

Write Operations
~~~~~~~~~~~~~~~~

All writes go through the main database:

.. code-block:: python

   # At the main site: direct write
   async with main_db_session() as session:
       obs = models.ExecutedObsUnit(**data)
       session.add(obs)
       await session.commit()

   # At a secondary site: buffered write
   transaction = builder.create(
       model_class=models.ExecutedObsUnit,
       data=data,
   )
   await transaction_manager.buffer_transaction(transaction)
   # A background processor later writes to the main database

Replica Databases
-----------------

Observatory Replica (Chile)
~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Role**: Local read-only copy for observatory operations

**Location**: CCAT observatory, Chile (5600 m altitude)

**Purpose**:

* Fast local reads (no network latency)
* Continued operations during network outages
* Merging with buffered data for consistent views

**Connection from API**:

.. code-block:: bash

   # Site configuration for a secondary site
   LOCAL_DB_HOST=localhost  # Points to the local replica
   LOCAL_DB_PORT=5432

**Replication Lag**: Typical lag depends on network conditions:

* Good network: 1-10 seconds
* Slow network: 10-60 seconds
* Network down: lag accumulates until reconnection

Institute Replicas (Optional)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Additional replicas can be deployed at partner institutions:

**Purpose**:

* Distributed data access for scientists
* Load balancing for read queries
* Disaster recovery backup

**Configuration**: Same as the observatory replica

PostgreSQL Streaming Replication
--------------------------------

How It Works
~~~~~~~~~~~~

PostgreSQL uses Write-Ahead Logging (WAL) for replication:

1. **The main database writes to the WAL** before applying changes
2. **WAL records are sent to replicas** continuously or in batches
3. **Replicas replay the WAL** to apply the same changes
4. **Replication lag** is the time between a write on main and its replay on a replica

WAL (Write-Ahead Log)
~~~~~~~~~~~~~~~~~~~~~

The WAL contains:

* All data modifications (INSERT, UPDATE, DELETE)
* Transaction commits and rollbacks
* Schema changes (DDL)
* Checkpoint records

**WAL segment size**: Typically 16 MB per file

**WAL retention**: Configured to retain enough WAL for replicas to catch up

**WAL format**: Binary, PostgreSQL-specific

LSN (Log Sequence Number)
~~~~~~~~~~~~~~~~~~~~~~~~~

Each WAL record has a unique LSN:

**Format**: ``file_offset/byte_offset``, for example ``0/12345678``

**Properties**:

* Monotonically increasing
* Unique per WAL record
* Same across main and replicas
* Used to track replication progress

**Querying the LSN**:

.. code-block:: sql

   -- On the main database: current write position
   SELECT pg_current_wal_lsn();
   -- Returns: 0/12345678

   -- On a replica: last replayed position
   SELECT pg_last_wal_replay_lsn();
   -- Returns: 0/12345600 (slightly behind)

**Comparing LSNs**:

.. code-block:: python

   def parse_lsn(lsn_string: str) -> int:
       """Convert an LSN string to an integer for comparison."""
       file_part, offset_part = lsn_string.split('/')
       return (int(file_part, 16) << 32) + int(offset_part, 16)

   main_lsn = parse_lsn("0/12345678")
   replica_lsn = parse_lsn("0/12345600")

   if replica_lsn >= main_lsn:
       print("Replica has caught up")
   else:
       bytes_behind = main_lsn - replica_lsn
       print(f"Replica is {bytes_behind} bytes behind")

Replication Slots
~~~~~~~~~~~~~~~~~

**Purpose**: Ensure the main database retains WAL until each replica has consumed it

**Creating a slot**:

.. code-block:: sql

   SELECT pg_create_physical_replication_slot('observatory_slot');

**Monitoring slots**:

.. code-block:: sql

   SELECT
       slot_name,
       active,
       restart_lsn,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS bytes_behind
   FROM pg_replication_slots;

**Danger**: Inactive slots cause WAL to accumulate and can fill the disk

Connection Configuration
------------------------

API Configuration
~~~~~~~~~~~~~~~~~

The API uses separate connection strings for the main and local databases:

**At the Main Site** (Cologne):

.. code-block:: bash

   SITE_TYPE=main
   MAIN_DB_HOST=localhost
   MAIN_DB_PORT=5432
   LOCAL_DB_HOST=localhost  # Same as main
   LOCAL_DB_PORT=5432

**At a Secondary Site** (Observatory):

.. code-block:: bash

   SITE_TYPE=secondary
   MAIN_DB_HOST=main-db.example.com  # Remote
   MAIN_DB_PORT=5432
   LOCAL_DB_HOST=localhost  # Local replica
   LOCAL_DB_PORT=5432

Replication Monitoring
----------------------

Checking Replication Status
~~~~~~~~~~~~~~~~~~~~~~~~~~~

**On the main database**:

.. code-block:: sql

   -- Active replication connections
   SELECT
       client_addr,
       state,
       sync_state,
       replay_lsn,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS bytes_behind
   FROM pg_stat_replication;

**On a replica**:

.. code-block:: sql

   -- Replication status
   SELECT
       pg_is_in_recovery(),  -- Should be true
       pg_last_wal_receive_lsn(),
       pg_last_wal_replay_lsn(),
       pg_last_xact_replay_timestamp();

Replication Lag Metrics
~~~~~~~~~~~~~~~~~~~~~~~

**Time-based lag**:

.. code-block:: sql

   -- On a replica
   SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;

**Byte-based lag**:

.. code-block:: sql

   -- On the main database
   SELECT
       client_addr,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS bytes_behind
   FROM pg_stat_replication;

LSN Tracking in the API
~~~~~~~~~~~~~~~~~~~~~~~

The API actively tracks replication progress using LSNs:

.. literalinclude:: ../../ccat_ops_db_api/transaction_buffering/lsn_tracker.py
   :language: python
   :lines: 18-52
   :emphasize-lines: 11-12, 27-30

Failover and Recovery
---------------------

Network Partition Scenarios
~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Scenario 1: The observatory loses its connection to main**

* Buffering continues (Redis is local)
* Reads come from the local replica (may be stale)
* The background processor retries periodically
* When reconnected, the buffer drains automatically

**Scenario 2: The main database goes down**

* At the main site: the API becomes unavailable
* At secondary sites: buffering continues
* Reads still work from the replica
* When main is restored, the buffer is processed

**Scenario 3: A replica falls far behind**

* Reads become increasingly stale
* The smart query manager still merges the buffer
* LSN tracking detects the lag
* Alert monitoring triggers
* Manual intervention may be needed

Database Migrations
-------------------

Schema changes must be coordinated:

Migration Process
~~~~~~~~~~~~~~~~~

1. **Develop the migration** in the ops-db package
2. **Test it on a development database**
3. **Apply it to the main database** (Cologne)
4. **Replicas receive it automatically** via WAL
5. **Update the API** to use the new schema
6. **Deploy the API changes** to all sites

Example Migration
~~~~~~~~~~~~~~~~~

.. code-block:: python

   # In the ops-db package
   # alembic/versions/xxx_add_observation_status.py
   from alembic import op
   import sqlalchemy as sa

   def upgrade():
       op.add_column(
           'executed_obs_unit',
           sa.Column('detailed_status', sa.String(50))
       )

   def downgrade():
       op.drop_column('executed_obs_unit', 'detailed_status')

**Apply the migration**:

.. code-block:: bash

   # On the main database
   alembic upgrade head

**Replicas** receive the schema change automatically through the WAL.
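Before deploying API changes that rely on a new schema at a secondary site, it can be useful to check that the local replica has replayed the migration. The sketch below combines the LSN arithmetic from earlier with such a check; the function names are illustrative, and the LSN values would in practice come from ``pg_current_wal_lsn()`` on main and ``pg_last_wal_replay_lsn()`` on the replica:

.. code-block:: python

   def parse_lsn(lsn_string: str) -> int:
       """Convert an LSN string like '0/1A000010' to an integer."""
       file_part, offset_part = lsn_string.split('/')
       return (int(file_part, 16) << 32) + int(offset_part, 16)

   def replica_has_replayed(migration_lsn: str, replica_replay_lsn: str) -> bool:
       """True once the replica's replay position has reached the migration LSN."""
       return parse_lsn(replica_replay_lsn) >= parse_lsn(migration_lsn)

   # Migration committed at 0/1A000010; the replica reports its replay position
   print(replica_has_replayed("0/1A000010", "0/19FFFFF0"))  # replica still behind
   print(replica_has_replayed("0/1A000010", "0/1A000010"))  # replica caught up

A deployment script could poll this check and proceed only once it returns true, mirroring what the LSN tracker does for buffered reads.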
Backup and Restore
------------------

Backup Strategy
~~~~~~~~~~~~~~~

**Main database**:

* Continuous WAL archiving to S3/backup storage
* Daily full backups with ``pg_dump``
* Point-in-time recovery capability
* Retention: 30 days

**Replicas**:

* No backups needed (can be rebuilt from main)
* Optional: snapshots for faster rebuilds

Restore Procedure
~~~~~~~~~~~~~~~~~

.. code-block:: bash

   # Restore from a backup
   pg_restore -d ccat_ops_db backup_file.dump

   # Or rebuild a replica from main
   pg_basebackup -h main-db.example.com -D /var/lib/postgresql/data \
       -U replication -v -P

Performance Considerations
--------------------------

Replication Impact on Main
~~~~~~~~~~~~~~~~~~~~~~~~~~

**WAL generation**: Minimal overhead (< 5%)

**Network bandwidth**: Depends on write volume

* Light writes: < 1 MB/s
* Heavy writes: 10-50 MB/s

**Replication slots**: Can accumulate WAL if a replica is down (monitor them!)

Replica Query Performance
~~~~~~~~~~~~~~~~~~~~~~~~~

**Read-only**: Replicas cannot have indexes that do not exist on main

**Hot standby feedback**: Prevents query cancellation, but may delay vacuum on main

**Connection pooling**: Same as for the main database

**Query routing**: The API automatically queries the local replica

Summary
-------

The database topology provides:

* **Single source of truth**: Main database in Cologne
* **Distributed reads**: Replicas at each site
* **Automatic synchronization**: PostgreSQL streaming replication
* **Precise tracking**: LSN-based replication monitoring
* **Network resilience**: Operations continue during outages

Key characteristics:

* All writes → main database
* Reads from the local replica (with buffer merge)
* WAL-based replication (continuous or near-continuous)
* LSN tracking for cache management
* Monitoring for replication health

Next Steps
----------

* :doc:`site-configuration` - How sites are configured
* :doc:`../deep-dive/transaction-buffering/lsn-tracking` - LSN tracking implementation
* :doc:`../development/debugging-buffering` - Troubleshooting replication issues