steampipe/pkg
Nathan Wallace e8256d8728 fix: Make OCI installations atomic to prevent inconsistent states (fixes #4758) (#4902)
* test: Add tests demonstrating non-atomic OCI installation bug

Add TestInstallFdwFiles_PartialInstall_BugDocumentation and
TestInstallDbFiles_PartialMove_BugDocumentation to demonstrate
issue #4758 where OCI installations can leave the system in an
inconsistent state if they fail partway through.

The FDW test simulates a scenario where:
- Binary is extracted successfully (v2.0)
- Control file move fails (permission error)
- System left with v2.0 binary but v1.0 control/SQL files

The DB test simulates a scenario where:
- MoveFolderWithinPartition fails partway through
- Some files updated to v2.0, others remain v1.0
- Database in inconsistent state

These tests will fail initially, demonstrating the bug exists.

Related to #4758

Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: Make OCI installations atomic to prevent inconsistent states

Fixes #4758

This commit implements atomic installation for both FDW and DB OCI
installations using a staging directory approach.

Changed installFdwFiles() to use a two-stage process:

1. **Stage 1 - Prepare files in staging directory:**
   - Extract binary to staging/bin/
   - Copy control file to staging/
   - Copy SQL file to staging/
   - If ANY operation fails, no destination files are touched

2. **Stage 2 - Move all files to final destinations:**
   - Remove old binary (Mac M1 compatibility)
   - Move staged binary to destination
   - Move staged control file to destination
   - Move staged SQL file to destination
   - Includes rollback on failure

Benefits:
- If staging fails, destination files unchanged (safe failure)
- All files validated before touching destinations
- Rollback attempts if final move fails

Changed installDbFiles() to use atomic directory rename:

1. Move all files to staging directory (dest + ".staging")
2. Rename existing destination to backup (dest + ".backup")
3. Atomically rename staging to destination
4. Clean up backup on success
5. Rollback on failure (restore backup)

Benefits:
- Directory rename is atomic on most filesystems (within a single filesystem)
- Either all DB files update or none do
- Backup allows rollback on failure

The bug documentation tests demonstrate the issue:
- TestInstallFdwFiles_PartialInstall_BugDocumentation
- TestInstallDbFiles_PartialMove_BugDocumentation

These tests intentionally fail to show the bug exists. With the
atomic implementation, the actual install functions prevent the
inconsistent states these tests demonstrate.

Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>

* Improve idempotency: cleanup old backup/staging directories

Add cleanup of .backup and .staging directories at the start of DB
installation to handle cases where the process was killed during a
previous installation attempt. This prevents accumulation of leftover
directories and ensures installation can proceed cleanly.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Remove bug documentation tests that are now fixed by atomic installation

The TestInstallDbFiles_PartialMove_BugDocumentation and
TestInstallFdwFiles_PartialInstall_BugDocumentation tests were added
during rebase from other PRs (#4895, #4898, #4900). They document bugs
where partial installations could leave the system in an inconsistent
state.

However, PR #4902's atomic staging approach fixes these bugs, so the
tests now fail (because the bugs no longer exist). Since tests should
validate current behavior rather than document old bugs, these tests
have been removed entirely. The bugs are well-documented in the PR
descriptions and git history.

Also removed unused 'io' import from fdw_test.go.

* Preserve Mac M1 safety in FDW binary installation

During rebase conflict resolution, the Mac M1 safety mechanism from
PR #4898 was inadvertently weakened. The original fix ensured the new
binary was fully ready before deleting the old one.

Original PR #4898 approach:
1. Extract new binary
2. Verify it exists
3. Move to .tmp location
4. Delete old binary
5. Rename .tmp to final location

Our initial PR #4902 rebase broke this:
1. Extract to staging
2. Delete old binary (too early!)
3. Move from staging

If the move failed, the system would be left with NO binary at all.

Fixed approach (preserves both Mac M1 safety AND atomic staging):
1. Extract to staging directory
2. Move staged binary to .tmp location (verifies move works)
3. Delete old binary (now safe - new one is ready)
4. Rename .tmp to final location (atomic)

This ensures we never delete the old binary until the new one is
confirmed ready, while still using the staging directory approach
for atomic multi-file installations.

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-11-17 04:46:37 -05:00