Data Lifecycle Management
=========================

.. verified:: 2025-11-06
   :reviewer: Christof Buchbender

.. TBD: needs cleanup to make function references work.

The Data Transfer System implements intelligent data lifecycle policies that balance storage efficiency with data safety. This document explains how data moves through its lifecycle and when deletion operations occur.

Lifecycle Philosophy
--------------------

**Safety First**

Data is never deleted from source locations until:

1. It is verified to exist in at least one long-term archive
2. Checksums are validated at the destination
3. The archive is marked as ``ARCHIVED`` in the database

**Storage Efficiency**

Temporary copies are cleaned up based on:

* Buffer location status and disk pressure
* Completion of transfers and unpacking operations
* Processing job completion and retention policies

**Data Flow**

Data progresses through stages with different retention policies:

.. code-block:: text

   SOURCE (Raw Files)
       ↓ Package into tar
   BUFFER (Temporary at source site)
       ↓ Transfer between sites
   BUFFER (Temporary at LTA site)
       ↓ Unpack and move to permanent storage
   LONG_TERM_ARCHIVE (Permanent)
       ↓ Stage for processing (optional)
   PROCESSING (Temporary)
       ↓ Cleanup after analysis completes
   [Deleted from temporary locations]

.. seealso::

   :doc:`pipeline` - Complete data flow through the system

   :doc:`monitoring` - Buffer monitoring and alerting

Data States
-----------

PhysicalCopyStatus
~~~~~~~~~~~~~~~~~~

:py:class:`ccat_ops_db.models.PhysicalCopyStatus` describes the state of data at each physical location.
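The status values enumerated in this section behave as a small state machine. As an illustration only (the names below mirror the documentation, not the project's actual API), a transition guard for the deletion paths described here might look like:

```python
from enum import Enum


class PhysicalCopyStatus(Enum):
    """Illustrative mirror of the status values described in this section."""
    PRESENT = "present"
    STAGED = "staged"
    DELETION_POSSIBLE = "deletion_possible"
    DELETION_SCHEDULED = "deletion_scheduled"
    DELETION_IN_PROGRESS = "deletion_in_progress"
    DELETED = "deleted"
    DELETION_FAILED = "deletion_failed"


# Normal deletion path, the RawDataFile-specific DELETION_POSSIBLE detour,
# and the failure/retry edge described under "State Transitions".
ALLOWED = {
    PhysicalCopyStatus.PRESENT: {
        PhysicalCopyStatus.DELETION_POSSIBLE,
        PhysicalCopyStatus.DELETION_SCHEDULED,
    },
    PhysicalCopyStatus.DELETION_POSSIBLE: {PhysicalCopyStatus.DELETION_SCHEDULED},
    PhysicalCopyStatus.DELETION_SCHEDULED: {PhysicalCopyStatus.DELETION_IN_PROGRESS},
    PhysicalCopyStatus.DELETION_IN_PROGRESS: {
        PhysicalCopyStatus.DELETED,
        PhysicalCopyStatus.DELETION_FAILED,
    },
    # Failed deletions are re-scheduled for retry.
    PhysicalCopyStatus.DELETION_FAILED: {PhysicalCopyStatus.DELETION_SCHEDULED},
}


def can_transition(old: PhysicalCopyStatus, new: PhysicalCopyStatus) -> bool:
    """Return True if old -> new is a legal lifecycle transition."""
    return new in ALLOWED.get(old, set())
```

Terminal states such as ``DELETED`` have no outgoing edges, which is what makes the audit trail trustworthy: once a copy is marked deleted, its record never moves back to an active state.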
The primary states are:

* ``PRESENT`` - File exists and is accessible at this location
* ``STAGED`` - Package unpacked (used in PROCESSING locations); archive deleted to save space
* ``DELETION_SCHEDULED`` - Marked for deletion, task queued
* ``DELETION_IN_PROGRESS`` - Currently being deleted
* ``DELETION_POSSIBLE`` - (RawDataFiles only) Parent package deleted, eligible for conditional deletion
* ``DELETED`` - Successfully removed from this location
* ``DELETION_FAILED`` - Deletion attempt failed

State Transitions
^^^^^^^^^^^^^^^^^

For normal deletions:

.. code-block:: text

   PRESENT → DELETION_SCHEDULED → DELETION_IN_PROGRESS → DELETED

For failed deletions:

.. code-block:: text

   DELETION_IN_PROGRESS → DELETION_FAILED (marked for retry)

For RawDataFiles in SOURCE/BUFFER locations:

.. code-block:: text

   PRESENT → DELETION_POSSIBLE → DELETION_SCHEDULED → ...

PackageState
~~~~~~~~~~~~

:py:class:`ccat_ops_db.models.PackageState` is the high-level state of packages in the pipeline:

* ``WAITING`` - At source location only, not yet transferred
* ``TRANSFERRING`` - Part of an active DataTransferPackage
* ``ARCHIVED`` - Safe in long-term archive storage
* ``FAILED`` - Transfer or archive operation failed

**Safety Rule**: Data at SOURCE locations is only eligible for deletion when the package state is ``ARCHIVED``.

Deletion Manager
----------------

The :py:mod:`ccat_data_transfer.deletion_manager` module implements all cleanup policies and runs continuously to process eligible data.

Main Entry Point
~~~~~~~~~~~~~~~~

.. autofunction:: ccat_data_transfer.deletion_manager.delete_data_packages
   :noindex:

This is the main orchestration function that coordinates all deletion operations:

.. literalinclude:: ../../ccat_data_transfer/deletion_manager.py
   :pyobject: delete_data_packages
   :language: python

The deletion manager cycles through the following operations:

1. Delete DataTransferPackages from buffers
2. Delete RawDataPackages from SOURCE and LTA buffers
3. Delete individual RawDataFiles from processing locations
4. Delete staged (unpacked) files from processing locations
5. Process RawDataFiles marked as ``DELETION_POSSIBLE``

Deletion Decision Logic
-----------------------

The system uses specific conditions to determine when data can be safely deleted from each location type.

RawDataPackages
~~~~~~~~~~~~~~~

From SOURCE Site Buffers
^^^^^^^^^^^^^^^^^^^^^^^^

:py:func:`ccat_data_transfer.deletion_manager.can_delete_raw_data_package_from_source_buffer`

A RawDataPackage can be deleted from SOURCE site buffers when:

1. The location is of type ``BUFFER`` at a SOURCE site
2. The package exists in at least one ``LONG_TERM_ARCHIVE`` location (not just an LTA site buffer)
3. The physical copy at the LTA has status ``PRESENT``

**Side Effect**: When a :class:`~ccat_ops_db.models.RawDataPackage` is deleted from SOURCE, all associated :class:`~ccat_ops_db.models.RawDataFile` records are marked as ``DELETION_POSSIBLE``.

From LTA Site Buffers
^^^^^^^^^^^^^^^^^^^^^

:py:func:`ccat_data_transfer.deletion_manager.can_delete_raw_data_package_from_lta_buffer`

A :class:`~ccat_ops_db.models.RawDataPackage` can be deleted from LTA site buffers when:

1. The location is of type ``BUFFER`` at an LTA site
2. The package exists in the actual :class:`~ccat_ops_db.models.DataLocation` with type ``LONG_TERM_ARCHIVE`` at the same site
3. The physical copy at the LTA has status ``PRESENT``

Never Deleted From
^^^^^^^^^^^^^^^^^^

* :class:`~ccat_ops_db.models.DataLocation` with type ``LONG_TERM_ARCHIVE`` - These provide permanent storage; data is never automatically deleted from them

Implementation
^^^^^^^^^^^^^^

:py:func:`ccat_data_transfer.deletion_manager.delete_raw_data_packages_bulk`

Bulk deletion implementation for RawDataPackages:

.. literalinclude:: ../../ccat_data_transfer/deletion_manager.py
   :pyobject: delete_raw_data_packages_bulk
   :language: python
   :linenos:

DataTransferPackages
~~~~~~~~~~~~~~~~~~~~

DataTransferPackages are temporary containers that exist only during the transfer process.
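The RawDataPackage rules above reduce to a small predicate over the set of physical copies. A minimal standalone sketch of the SOURCE-buffer check (hypothetical types and helper names; the real logic lives in ``can_delete_raw_data_package_from_source_buffer``):

```python
from dataclasses import dataclass
from enum import Enum, auto


class LocationType(Enum):
    BUFFER = auto()
    LONG_TERM_ARCHIVE = auto()
    PROCESSING = auto()


class CopyStatus(Enum):
    PRESENT = auto()
    STAGED = auto()
    DELETED = auto()


@dataclass
class Copy:
    """Simplified stand-in for a PhysicalCopy record."""
    location_type: LocationType
    is_source_site: bool
    status: CopyStatus


def can_delete_from_source_buffer(copy: Copy, all_copies: list[Copy]) -> bool:
    """Mirror the three SOURCE-buffer deletion conditions described above."""
    # 1. The copy under consideration sits in a BUFFER at a SOURCE site.
    if copy.location_type is not LocationType.BUFFER or not copy.is_source_site:
        return False
    # 2 + 3. At least one copy with status PRESENT exists in a true
    #        LONG_TERM_ARCHIVE location (an LTA *site buffer* does not count).
    return any(
        c.location_type is LocationType.LONG_TERM_ARCHIVE
        and c.status is CopyStatus.PRESENT
        for c in all_copies
    )
```

Note that condition 2 deliberately excludes LTA site buffers: a package sitting in a buffer at the LTA site has not yet reached permanent storage, so the source copy must be kept.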
From SOURCE Site Buffers
^^^^^^^^^^^^^^^^^^^^^^^^

:py:func:`ccat_data_transfer.deletion_manager.can_delete_data_transfer_package_from_source_buffer`

A :class:`~ccat_ops_db.models.DataTransferPackage` can be deleted from SOURCE site buffers when:

1. The location is of type ``BUFFER`` at a SOURCE site
2. It has a completed :class:`~ccat_ops_db.models.DataTransfer` to at least one LTA site
3. The transfer has an ``unpack_status`` of ``COMPLETED``

From LTA Site Buffers
^^^^^^^^^^^^^^^^^^^^^

:py:func:`ccat_data_transfer.deletion_manager.can_delete_data_transfer_package_from_lta_buffer`

A :class:`~ccat_ops_db.models.DataTransferPackage` can be deleted from LTA site buffers when:

1. The location is of type ``BUFFER`` at an LTA site
2. The package has been successfully transferred and unpacked at ALL other LTA site buffers
3. Round-robin routing logic is used to determine the expected destinations

Never Stored In
^^^^^^^^^^^^^^^

* :class:`~ccat_ops_db.models.DataLocation` with type ``LONG_TERM_ARCHIVE`` - DataTransferPackages are unpacked at LTA site buffers; only the extracted :class:`~ccat_ops_db.models.RawDataPackage` objects are moved to LTA storage

Implementation
^^^^^^^^^^^^^^

:py:func:`ccat_data_transfer.deletion_manager.delete_data_transfer_packages`

RawDataFiles
~~~~~~~~~~~~

RawDataFiles follow a two-stage deletion process to handle the large number of individual files efficiently.

Stage 1: Marking as DELETION_POSSIBLE
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When a parent :class:`~ccat_ops_db.models.RawDataPackage` is deleted from SOURCE, all associated :class:`~ccat_ops_db.models.RawDataFile` records are marked as ``DELETION_POSSIBLE``:

:py:func:`ccat_data_transfer.deletion_manager.mark_raw_data_files_for_deletion`

This uses bulk database updates to avoid performance issues:

.. literalinclude:: ../../ccat_data_transfer/deletion_manager.py
   :pyobject: mark_raw_data_files_for_deletion
   :language: python
   :linenos:

Stage 2: Conditional Deletion
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Files marked as ``DELETION_POSSIBLE`` are processed based on retention policies and buffer status:

:py:func:`ccat_data_transfer.deletion_manager.process_deletion_possible_raw_data_files`

The system considers:

* Retention period compliance
* Buffer disk usage and pressure
* Location-specific rules
* Access patterns

Processing Location Cleanup
~~~~~~~~~~~~~~~~~~~~~~~~~~~

RawDataFiles in PROCESSING locations follow different rules based on staging job status.

PRESENT Files (Active Jobs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^

:py:func:`ccat_data_transfer.deletion_manager.delete_processing_raw_data_files`

Files in PROCESSING locations are deleted when:

1. No active :class:`~ccat_ops_db.models.StagingJob` references them
2. All staging jobs using these files have ``active=False``

:py:func:`ccat_data_transfer.deletion_manager.find_deletable_processing_raw_data_files`

STAGED Files (Completed Jobs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:py:func:`ccat_data_transfer.deletion_manager.delete_staged_raw_data_files_from_processing`

After staging jobs complete, unpacked files are cleaned up:

1. The system finds RawDataPackages with status ``STAGED`` in PROCESSING locations
2. It verifies that all staging jobs for these packages have ``active=False``
3. It schedules bulk deletion of the individual RawDataFiles

:py:func:`ccat_data_transfer.deletion_manager.find_deletable_staged_raw_data_files_by_location`

.. literalinclude:: ../../ccat_data_transfer/deletion_manager.py
   :pyobject: find_deletable_staged_raw_data_files_by_location
   :language: python
   :linenos:

Deletion Decision Matrix
------------------------

The following table summarizes when data is eligible for deletion:
.. list-table:: Deletion Rules Summary
   :header-rows: 1
   :widths: 20 20 30 30

   * - Data Type
     - Location Type
     - Deletion Condition
     - Safety Requirement
   * - RawDataPackage
     - SOURCE Buffer
     - Exists in LTA DataLocation
     - ≥1 LTA DataLocation copy with PRESENT status
   * - RawDataPackage
     - LTA Site Buffer
     - Exists in same site's LTA DataLocation
     - Same-site LTA DataLocation copy with PRESENT status
   * - RawDataPackage
     - LTA DataLocation
     - **Never (automatic)**
     - N/A - permanent storage
   * - DataTransferPackage
     - SOURCE Buffer
     - Verified at LTA site buffer
     - Completed transfer + unpack to ≥1 LTA site
   * - DataTransferPackage
     - LTA Site Buffer
     - Replicated to all other LTA sites
     - Completed transfers to all LTA sites
   * - DataTransferPackage
     - LTA DataLocation
     - **Not stored here**
     - N/A - temporary containers only
   * - RawDataFile
     - SOURCE/BUFFER
     - Parent package deleted + retention/buffer rules
     - DELETION_POSSIBLE status + policy compliance
   * - RawDataFile
     - PROCESSING
     - No active staging jobs
     - All StagingJobs have ``active=False``

Worker Implementation
---------------------

Deletion tasks execute on workers with direct access to the storage locations.

Deletion Task Base Class
~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: ccat_data_transfer.deletion_manager.DeletionTask
   :members:
   :noindex:

Single File Deletion
~~~~~~~~~~~~~~~~~~~~

:py:func:`ccat_data_transfer.deletion_manager.delete_physical_copy`

This Celery task handles deletion of a single physical copy:
.. literalinclude:: ../../ccat_data_transfer/deletion_manager.py
   :pyobject: delete_physical_copy
   :language: python
   :linenos:

Bulk Deletion Operations
~~~~~~~~~~~~~~~~~~~~~~~~

For efficiency, the system batches deletions:

**Bulk RawDataFile Deletion**: :py:func:`ccat_data_transfer.deletion_manager.delete_bulk_raw_data_files`

**Bulk RawDataPackage Deletion**: :py:func:`ccat_data_transfer.deletion_manager.delete_bulk_raw_data_packages`

Internal Implementation
^^^^^^^^^^^^^^^^^^^^^^^

The internal bulk deletion function handles the actual deletion work:

.. literalinclude:: ../../ccat_data_transfer/deletion_manager.py
   :pyobject: _delete_bulk_raw_data_files_internal
   :language: python
   :linenos:

**Benefits of Bulk Operations**:

* Reduces the number of Celery task submissions
* Decreases database transaction overhead
* Enables more efficient resource utilization
* Faster overall deletion throughput

Buffer Management Integration
-----------------------------

The deletion manager integrates with the buffer monitoring system to respond to disk pressure.

Buffer Manager
~~~~~~~~~~~~~~

:py:class:`ccat_data_transfer.buffer_manager.BufferManager`

The buffer manager continuously monitors disk usage:

.. literalinclude:: ../../ccat_data_transfer/buffer_manager.py
   :pyobject: BufferManager._check_thresholds
   :language: python
   :linenos:

Buffer Thresholds
~~~~~~~~~~~~~~~~~

Thresholds are configured per environment in :file:`settings.toml`:

.. literalinclude:: ../../ccat_data_transfer/config/settings.toml
   :start-after: # Buffer Management Settings
   :end-before: # Maximum number of parallel transfers
   :language: toml

For the production environment:
.. literalinclude:: ../../ccat_data_transfer/config/settings.toml
   :lines: 84-87
   :language: toml

Buffer Status Integration
~~~~~~~~~~~~~~~~~~~~~~~~~

:py:func:`ccat_data_transfer.deletion_manager.get_buffer_status_for_location`

:py:func:`ccat_data_transfer.deletion_manager.should_delete_based_on_buffer_status`

The system uses different thresholds for different location types:

.. literalinclude:: ../../ccat_data_transfer/deletion_manager.py
   :pyobject: should_delete_based_on_buffer_status
   :language: python
   :linenos:

Escalating Response to Disk Pressure
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The system adapts its behavior based on buffer conditions:

.. code-block:: text

   < 70%:   Normal operations
            • Standard retention policies
            • Full parallel transfer capacity

   70-85%:  Warning state
            • Logged warnings
            • Normal deletion continues

   85-95%:  Critical state
            • Reduced parallel transfers
            • Accelerated deletion of eligible data
            • More frequent manager cycles

   > 95%:   Emergency state
            • New data creation may be paused
            • Aggressive cleanup of all eligible data
            • Administrator alerts sent
            • Minimal parallel transfers

Configuration
-------------

Deletion Manager Settings
~~~~~~~~~~~~~~~~~~~~~~~~~

Key configuration parameters from :mod:`ccat_data_transfer.config.config`:

.. literalinclude:: ../../ccat_data_transfer/config/settings.toml
   :lines: 30-35
   :language: toml

Manager Sleep Times
^^^^^^^^^^^^^^^^^^^

These control how frequently each manager checks for work:

.. code-block:: toml

   RAW_DATA_PACKAGE_MANAGER_SLEEP_TIME = 10  # seconds
   DATA_TRANSFER_PACKAGE_MANAGER_SLEEP_TIME = 5
   TRANSFER_MANAGER_SLEEP_TIME = 5
   DELETION_MANAGER_SLEEP_TIME = 5  # Deletion check frequency
   STAGING_MANAGER_SLEEP_TIME = 5

Retention Policies
~~~~~~~~~~~~~~~~~~
.. literalinclude:: ../../ccat_data_transfer/config/settings.toml
   :lines: 43-44
   :language: toml

* ``RETENTION_PERIOD_MINUTES`` - Default retention for processing data (30 days = 43200 minutes)
* ``DISK_USAGE_THRESHOLD_PERCENT`` - Threshold that triggers accelerated cleanup

Transfer Limits
~~~~~~~~~~~~~~~

.. literalinclude:: ../../ccat_data_transfer/config/settings.toml
   :lines: 41-42
   :language: toml

These settings control how the system responds to buffer pressure:

* ``MAX_CRITICAL_TRANSFERS`` - Maximum parallel transfers when the buffer is critical (1)
* ``MAX_NORMAL_TRANSFERS`` - Maximum parallel transfers under normal conditions (5)

Location-Specific Overrides
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Individual :class:`~ccat_ops_db.models.DataLocation` instances can override the defaults with custom retention policies.

Staging and STAGED Status
-------------------------

The ``STAGED`` status has special meaning in PROCESSING locations.

What is STAGED?
~~~~~~~~~~~~~~~

When a :class:`~ccat_ops_db.models.StagingJob` completes:

1. The RawDataPackage is transferred to the PROCESSING location
2. The package (tar archive) is unpacked
3. The individual RawDataFiles are extracted
4. PhysicalCopy records are created for each RawDataFile
5. The original package archive is **deleted** to save space
6. The RawDataPackagePhysicalCopy status is set to ``STAGED``

``STAGED`` therefore means "unpacked and archive removed":

.. literalinclude:: ../../ccat_data_transfer/staging_manager.py
   :pyobject: _mark_package_as_staged_and_cleanup
   :language: python
   :linenos:

Cleanup Process
~~~~~~~~~~~~~~~

When staging jobs complete (``active=False``):

1. The system identifies STAGED packages with inactive jobs
2. It finds all RawDataFile physical copies for these packages
3. It schedules bulk deletion of the individual files
4. It updates the RawDataPackagePhysicalCopy to ``DELETED``

This two-phase approach (unpack, then delete) allows:

* Efficient access to individual files during processing
* Space savings by removing redundant archives
* A clean separation between "in use" and "cleanup ready" states

Deletion Audit Trail
--------------------

All deletions are logged and tracked for accountability.

Database Records
~~~~~~~~~~~~~~~~

:class:`~ccat_ops_db.models.PhysicalCopy` records are never deleted from the database, only marked:

.. code-block:: python

   class PhysicalCopy:
       status: PhysicalCopyStatus  # DELETED
       deleted_at: datetime        # When deletion occurred
       # Additional tracking fields depend on subclass

PhysicalCopy subclasses retain their records to maintain a complete audit trail:

* :class:`~ccat_ops_db.models.RawDataFilePhysicalCopy`
* :class:`~ccat_ops_db.models.RawDataPackagePhysicalCopy`
* :class:`~ccat_ops_db.models.DataTransferPackagePhysicalCopy`

Deletion Logging
~~~~~~~~~~~~~~~~

The deletion manager includes helper functions for structured logging:

.. code-block:: python

   def _add_deletion_log(
       session: Session,
       physical_copy: models.PhysicalCopy,
       message: str,
   ) -> None:
       """Add deletion log entry for audit trail."""
       # Logs include:
       # - Timestamp
       # - Physical copy ID and type
       # - Location information
       # - Reason for deletion
       # - Success/failure status

Query Deletion History
~~~~~~~~~~~~~~~~~~~~~~

Database queries can retrieve deletion history:

.. code-block:: sql

   -- Show all deletions in the last 24 hours
   SELECT
       pc.id,
       pc.type,
       pc.status,
       pc.deleted_at,
       dl.name AS location_name
   FROM physical_copy pc
   JOIN data_location dl ON pc.data_location_id = dl.id
   WHERE pc.status = 'DELETED'
     AND pc.deleted_at > NOW() - INTERVAL '24 hours'
   ORDER BY pc.deleted_at DESC;

Log Files
~~~~~~~~~

Structured logs capture deletion details using the centralized logging system:
.. code-block:: json

   {
     "timestamp": "2024-11-27T10:30:00Z",
     "level": "INFO",
     "logger": "ccat_data_transfer.deletion_manager",
     "event": "physical_copy_deleted",
     "physical_copy_id": 12345,
     "copy_type": "raw_data_file",
     "location": "ccat_telescope_buffer",
     "size_bytes": 1048576,
     "reason": "parent_package_archived"
   }

Manual Deletion
---------------

Administrators can manually trigger deletion operations when needed.

.. warning::

   Manual deletion should be used with caution. Always verify that data exists in LTA locations before forcing deletion from SOURCE or BUFFER locations.

Available CLI Commands
~~~~~~~~~~~~~~~~~~~~~~

The system provides limited CLI commands for inspection:

**List Data Locations:**

.. code-block:: bash

   # View all available locations
   ccat_data_transfer list-locations

This shows all configured sites and their locations, which is useful for identifying location names for manual operations.

**Monitor Disk Usage:**

.. code-block:: bash

   # Monitor all active disk locations
   ccat_data_transfer disk-monitor --all

   # Monitor a specific location
   ccat_data_transfer disk-monitor --location-name cologne_buffer

   # Monitor by site
   ccat_data_transfer disk-monitor --site cologne

Python API for Manual Operations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For administrative scripting and manual deletion operations, use the Python API:

**Inspect Deletable Data:**
.. code-block:: python

   from ccat_data_transfer.deletion_manager import (
       find_deletable_raw_data_packages_by_location,
       find_deletable_data_transfer_packages,
   )
   from ccat_data_transfer.database import DatabaseConnection

   # Get database connection
   db = DatabaseConnection()
   session, _ = db.get_connection()

   try:
       # Find deletable RawDataPackages by location
       deletable_packages = find_deletable_raw_data_packages_by_location(session)

       print("\n=== Deletable RawDataPackages ===")
       for location, packages in deletable_packages.items():
           total_size = sum(p.size for p in packages)
           print(f"\nLocation: {location.name} ({location.location_type.value})")
           print(f"  Site: {location.site.name}")
           print(f"  Packages: {len(packages)}")
           print(f"  Total size: {total_size / (1024**3):.2f} GB")

       # Find deletable DataTransferPackages
       deletable_transfers = find_deletable_data_transfer_packages(session)

       print("\n=== Deletable DataTransferPackages ===")
       for package, location in deletable_transfers:
           print(f"Package: {package.file_name}")
           print(f"  Location: {location.name}")
           print(f"  Size: {package.size / (1024**3):.2f} GB")
   finally:
       session.close()

**Trigger Manual Deletion Cycle:**

.. code-block:: python

   from ccat_data_transfer.deletion_manager import delete_data_packages

   # Run one deletion cycle with verbose logging
   delete_data_packages(verbose=True)
   print("Deletion cycle completed")

**Schedule Specific Deletions:**
.. code-block:: python

   from ccat_data_transfer.deletion_manager import delete_physical_copy
   from ccat_data_transfer.queue_discovery import route_task_by_location
   from ccat_data_transfer.operation_types import OperationType
   from ccat_data_transfer.database import DatabaseConnection
   from ccat_ops_db import models

   db = DatabaseConnection()
   session, _ = db.get_connection()

   try:
       # Find a specific physical copy to delete
       physical_copy = session.query(models.RawDataPackagePhysicalCopy).filter(
           models.RawDataPackagePhysicalCopy.id == 12345,
           models.RawDataPackagePhysicalCopy.status == models.PhysicalCopyStatus.PRESENT,
       ).first()

       if physical_copy:
           # Safety check: Verify it's actually deletable
           # (Add your safety checks here based on location type and package state)

           # Mark as scheduled
           physical_copy.status = models.PhysicalCopyStatus.DELETION_SCHEDULED
           session.commit()

           # Route to the appropriate queue
           queue_name = route_task_by_location(
               OperationType.DELETION,
               physical_copy.data_location,
           )

           # Schedule the deletion task
           delete_physical_copy.apply_async(
               args=[physical_copy.id],
               kwargs={"queue_name": queue_name},
               queue=queue_name,
           )
           print(f"Scheduled deletion of physical copy {physical_copy.id}")
   finally:
       session.close()

Deletion Service Management
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The deletion manager runs as a continuous service. To control it:

**Start the deletion manager:**

.. code-block:: bash

   # Start as a service (runs continuously)
   ccat_data_transfer deletion-manager

   # Start with verbose logging
   ccat_data_transfer deletion-manager -v

The deletion manager runs in a loop, checking for deletable data every ``DELETION_MANAGER_SLEEP_TIME`` seconds (default: 5 seconds).

**In Docker Compose deployments:**

The deletion manager runs as a service defined in ``docker-compose.yml``. To restart it:
.. code-block:: bash

   # Restart the deletion manager service
   docker-compose restart deletion-manager

   # View deletion manager logs
   docker-compose logs -f deletion-manager

Safety Considerations
~~~~~~~~~~~~~~~~~~~~~

When performing manual deletions:

1. **Verify LTA copies exist** - Always check that data is safely in LTA before deleting from SOURCE
2. **Check package state** - Ensure the RawDataPackage state is ``ARCHIVED``
3. **Review deletion logs** - Check logs to understand why automatic deletion hasn't occurred
4. **Test in development first** - Run manual deletion scripts in the dev environment
5. **Use transactions** - Wrap operations in database transactions for atomicity
6. **Monitor disk space** - Check whether manual deletion is actually needed or whether automatic cleanup is working

Data Recovery
-------------

If data is accidentally deleted, recovery options depend on the location type.

Recovery from LTA
~~~~~~~~~~~~~~~~~

If data was deleted from PROCESSING or BUFFER locations:

1. Verify the data exists in a :class:`~ccat_ops_db.models.DataLocation` with type ``LONG_TERM_ARCHIVE``
2. Create a new :class:`~ccat_ops_db.models.StagingJob` to re-stage the data
3. The system will retrieve the data from LTA and unpack it to the PROCESSING location
4. There is no actual data loss; the data just needs to be re-copied

Recovery from SOURCE
~~~~~~~~~~~~~~~~~~~~

If data was deleted from SOURCE before reaching LTA (this should never happen due to the safety checks):

1. Check the database for :class:`~ccat_ops_db.models.PhysicalCopy` records
2. Verify whether the package exists in any LTA location
3. If in LTA: it can be recovered via staging
4. If not in LTA: **data may be permanently lost** - check backup systems

Prevention Mechanisms
~~~~~~~~~~~~~~~~~~~~~

Multiple safeguards prevent accidental deletion:

* SOURCE deletions require package state ``ARCHIVED``
* The worker double-checks before the actual file deletion
* Database transactions ensure consistency
* The deletion manager logs all decisions
* Physical copy records are retained for audit

Best Practices
--------------

For Instrument Teams
~~~~~~~~~~~~~~~~~~~~

* **File data promptly** - Use ops-db-api to register new data quickly
* **Never manually delete** - Let the system manage the lifecycle automatically
* **Monitor filing status** - Check ops-db-ui for package states
* **Trust the system** - Automatic lifecycle management is safer than manual intervention

For Administrators
~~~~~~~~~~~~~~~~~~

* **Monitor buffer trends** - Add capacity before reaching the warning threshold (70%)
* **Review deletion logs** - Periodically check for unexpected patterns
* **Adjust retention periods** - Tune based on actual usage patterns and disk capacity
* **Test recovery procedures** - Regularly verify that staging from LTA works correctly
* **Monitor metrics** - Use InfluxDB dashboards to track deletion rates

For Scientists
~~~~~~~~~~~~~~

* **Set appropriate retention** - Configure :class:`~ccat_ops_db.models.StagingJob` retention periods based on analysis needs
* **Mark jobs inactive** - Set ``active=False`` when processing completes to enable cleanup
* **Don't rely on PROCESSING** - Use LTA locations for long-term data access, not temporary processing areas
* **Plan disk usage** - Consider data volume when creating multiple staging jobs

For Developers
~~~~~~~~~~~~~~

* **Always check package state** - Verify the ``ARCHIVED`` state before deleting from SOURCE
* **Use bulk operations** - Batch deletions for efficiency when handling many files
* **Add generous logging** - Structured logs are essential for debugging deletion issues
* **Test deletion logic** - Thoroughly test edge cases in safety checks
* **Consider race conditions** - Use database transactions and locks appropriately

Troubleshooting
---------------

Common Issues
~~~~~~~~~~~~~

**Data not deleting from SOURCE**

Check that:

1. The package state is ``ARCHIVED`` (not just ``TRANSFERRING``)
2. A physical copy exists in an LTA location with status ``PRESENT``
3. The deletion manager is running and processing the location
4. The deletion manager cycle logs show no errors

**Buffer filling up**

Solutions:

1. Verify the deletion manager is running correctly
2. Check whether data is actually reaching LTA
3. Review the buffer thresholds in the configuration
4. Consider decreasing ``DELETION_MANAGER_SLEEP_TIME`` (more frequent cycles)
5. Manually trigger cleanup if needed

**Files stuck in DELETION_POSSIBLE**

This means files are waiting on retention/buffer policies:

1. Check the buffer status for the location
2. Verify the retention period settings
3. Review the ``should_delete_based_on_buffer_status`` logic
4. Check that buffer monitoring is active

Debugging
~~~~~~~~~

Enable verbose logging:

.. code-block:: python

   from ccat_data_transfer.deletion_manager import delete_data_packages

   # Run with verbose logging
   delete_data_packages(verbose=True)

Check Redis for buffer status:

.. code-block:: python

   from ccat_data_transfer.deletion_manager import get_buffer_status_for_location

   status = get_buffer_status_for_location("cologne_buffer")
   print(f"Disk usage: {status.get('disk_usage_percent')}%")

Query the database for deletion candidates:
.. code-block:: python

   from ccat_data_transfer.deletion_manager import (
       find_deletable_raw_data_packages_by_location,
   )
   from ccat_data_transfer.database import DatabaseConnection

   db = DatabaseConnection()
   session, _ = db.get_connection()

   deletable = find_deletable_raw_data_packages_by_location(session)
   for location, packages in deletable.items():
       print(f"{location.name}: {len(packages)} packages")

Next Steps
----------

* :doc:`monitoring` - Buffer monitoring, metrics, and alerting
* :doc:`pipeline` - Complete data flow including lifecycle stages
* :doc:`concepts` - Data model fundamentals (PhysicalCopy, Package concepts)
* :doc:`api_reference` - Complete API documentation for deletion functions

.. seealso::

   **Related Modules:**

   * :mod:`ccat_data_transfer.deletion_manager` - Deletion orchestration
   * :mod:`ccat_data_transfer.buffer_manager` - Buffer monitoring
   * :mod:`ccat_data_transfer.staging_manager` - Staging operations that create the STAGED status
   * :mod:`ccat_ops_db.models` - Database models including PhysicalCopy and PackageState