# Advanced Features: Production-Ready Connector

**Summary:** Add enterprise features to your connector: schema evolution, dedupe mode with primary keys, CDC support for deletions, and performance optimization. After completing this guide, your connector will be production-ready.

---

## Prerequisites

**Complete:** [4-write-operations.md](./4-write-operations.md) - Your connector must have working append and overwrite modes.

---

## What You'll Build

After completing this guide, you'll have a production-ready connector with:

- **Schema evolution:** Automatic column add/drop/modify when the source schema changes
- **Dedupe mode:** MERGE with primary key for incremental syncs
- **CDC support:** Handle hard/soft deletes from change data capture sources
- **Performance optimization:** Efficient batch writes for large datasets
- **Production readiness:** Suitable for enterprise use

---

## Advanced Phase 1: Schema Evolution

**Goal:** Automatically adapt to schema changes

**Checkpoint:** Can add, drop, and modify columns

**📋 Schema Evolution Scenarios:**

1. **Add column**: Source adds a new field → add the column to the destination
2. **Drop column**: Source removes a field → drop the column from the destination (optional)
3. **Change type**: Source changes a field's type → alter the column (with casting)
4. **Change nullability**: Source changes nullability → alter the column's constraints

### Advanced Step 1: Implement discoverSchema()

**File:** Update `client/{DB}AirbyteClient.kt`

```kotlin
override suspend fun discoverSchema(tableName: TableName): TableSchema {
    val columns = getColumnsFromDb(tableName)
    return TableSchema(columns)
}

private fun getColumnsFromDb(tableName: TableName): Map<String, ColumnType> {
    val columns = mutableMapOf<String, ColumnType>()
    dataSource.connection.use { connection ->
        // Postgres/MySQL: query information_schema.columns
        val sql =
            """
            SELECT column_name, data_type, is_nullable
            FROM information_schema.columns
            WHERE table_schema = '${tableName.namespace}'
              AND table_name = '${tableName.name}'
            """
        connection.createStatement().use { statement ->
            val rs = statement.executeQuery(sql)
            while (rs.next()) {
                val columnName = rs.getString("column_name")
                // Skip Airbyte metadata columns
                if (columnName in AIRBYTE_META_COLUMNS) continue
                val dataType = rs.getString("data_type")
                    .takeWhile { it != '(' } // Strip precision (e.g., VARCHAR(255) → VARCHAR)
                val nullable = rs.getString("is_nullable") == "YES"
                columns[columnName] = ColumnType(dataType, nullable)
            }
        }
    }
    return columns
}

private val AIRBYTE_META_COLUMNS = setOf(
    "_airbyte_raw_id",
    "_airbyte_extracted_at",
    "_airbyte_meta",
    "_airbyte_generation_id"
)
```

**Database-specific approaches:**

- **Snowflake**: `DESCRIBE TABLE` or `SHOW COLUMNS`
- **Postgres/MySQL**: `information_schema.columns`
- **ClickHouse**: `system.columns` or client API `getTableSchema()`
- **BigQuery**: `INFORMATION_SCHEMA.COLUMNS`

### Advanced Step 2: Implement computeSchema()

**File:** Update `client/{DB}AirbyteClient.kt`

```kotlin
override fun computeSchema(
    stream: DestinationStream,
    columnNameMapping: ColumnNameMapping
): TableSchema {
    // Modern CDK pattern: schema is pre-computed by CDK using TableSchemaMapper
    return TableSchema(stream.tableSchema.columnSchema.finalSchema)
}
```

**What this does:**

- Returns the pre-computed final schema from the stream
- CDK has already applied `TableSchemaMapper.toColumnType()` and `toColumnName()` to compute this
- No manual type conversion needed - TableSchemaMapper handles it

**Why this is simpler than manual conversion:**

- TableSchemaMapper (from Phase 3.1) defines the type mappings
- CDK calls the mapper during catalog initialization
- The result is stored in `stream.tableSchema.columnSchema.finalSchema`
- `computeSchema()` just returns this pre-computed value
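To make the division of labor concrete, here is a minimal sketch of the kind of type mapping a `TableSchemaMapper` defines. The `AirbyteType` subtype names and the Postgres-style type choices are assumptions for illustration - your actual mapper from Phase 3.1 is the source of truth:

```kotlin
// Sketch only - assumes the AirbyteType hierarchy used in earlier guides.
// Defer to your TableSchemaMapper.toColumnType() from Phase 3.1.
fun toColumnType(type: AirbyteType): ColumnType =
    when (type) {
        is BooleanType -> ColumnType("boolean", true)
        is IntegerType -> ColumnType("bigint", true)
        is NumberType  -> ColumnType("numeric", true)
        is StringType  -> ColumnType("character varying", true)
        else           -> ColumnType("jsonb", true) // objects, arrays, unions
    }
```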
### Advanced Step 3: Implement alterTable() - ADD COLUMN

**File:** Update `client/{DB}SqlGenerator.kt`

```kotlin
fun alterTable(
    tableName: TableName,
    columnsToAdd: Map<String, ColumnType>,
    columnsToDrop: Map<String, ColumnType>,
    columnsToChange: Map<String, TypeChange>,
): Set<String> {
    val statements = mutableSetOf<String>()

    // ADD COLUMN (simplest - implement first)
    columnsToAdd.forEach { (name, type) ->
        val nullableClause = if (type.nullable) "" else " NOT NULL"
        statements.add(
            "ALTER TABLE ${fullyQualifiedName(tableName)} ADD COLUMN \"$name\" ${type.type}$nullableClause".andLog()
        )
    }

    // DROP COLUMN (implement second)
    columnsToDrop.forEach { (name, _) ->
        statements.add(
            "ALTER TABLE ${fullyQualifiedName(tableName)} DROP COLUMN \"$name\"".andLog()
        )
    }

    // MODIFY COLUMN (implement last - most complex)
    columnsToChange.forEach { (name, typeChange) ->
        // See Advanced Step 4 below for the implementation
    }

    return statements
}
```

### Advanced Step 4: Implement alterTable() - MODIFY COLUMN

**Add to alterTable():**

```kotlin
columnsToChange.forEach { (name, typeChange) ->
    when {
        // Safe: NOT NULL → NULL (widen constraint)
        !typeChange.originalType.nullable && typeChange.newType.nullable -> {
            // Postgres/MySQL syntax; on Snowflake use MODIFY COLUMN instead of ALTER COLUMN
            statements.add(
                "ALTER TABLE ${fullyQualifiedName(tableName)} ALTER COLUMN \"$name\" DROP NOT NULL".andLog()
            )
        }

        // Unsafe: type change - use the temp column approach
        typeChange.originalType.type != typeChange.newType.type -> {
            val tempColumn = "${name}_${UUID.randomUUID().toString().replace("-", "").take(8)}"
            val backupColumn = "${name}_backup"

            // 1. Add temp column with the new type
            statements.add(
                "ALTER TABLE ${fullyQualifiedName(tableName)} ADD COLUMN \"$tempColumn\" ${typeChange.newType.type}".andLog()
            )
            // 2. Cast and copy data
            statements.add(
                "UPDATE ${fullyQualifiedName(tableName)} SET \"$tempColumn\" = CAST(\"$name\" AS ${typeChange.newType.type})".andLog()
            )
            // 3. Rename original to backup
            statements.add(
                "ALTER TABLE ${fullyQualifiedName(tableName)} RENAME COLUMN \"$name\" TO \"$backupColumn\"".andLog()
            )
            // 4. Rename temp to original
            statements.add(
                "ALTER TABLE ${fullyQualifiedName(tableName)} RENAME COLUMN \"$tempColumn\" TO \"$name\"".andLog()
            )
            // 5. Drop backup
            statements.add(
                "ALTER TABLE ${fullyQualifiedName(tableName)} DROP COLUMN \"$backupColumn\"".andLog()
            )
        }

        // Unsafe: NULL → NOT NULL (skip - may fail with existing NULLs)
        typeChange.originalType.nullable && !typeChange.newType.nullable -> {
            log.info { "Skipping change from nullable to non-nullable for column $name (may have NULL values)" }
        }

        // No change needed
        else -> {
            log.debug { "No schema change needed for column $name" }
        }
    }
}
```

**Alternative: Table Recreation (ClickHouse pattern):**

For databases where ALTER is expensive or impossible (e.g., changing the primary key):

```kotlin
fun recreateTable(
    stream: DestinationStream,
    oldTableName: TableName,
    newTableName: TableName,
    columnMapping: ColumnNameMapping
): List<String> {
    return listOf(
        // 1. Create new table with the new schema
        createTable(stream, newTableName, columnMapping, replace = false),
        // 2. Copy data (with type casting)
        copyTable(columnMapping, oldTableName, newTableName),
        // 3. Drop old table
        dropTable(oldTableName),
        // 4. Rename new table to the old name
        "ALTER TABLE ${fullyQualifiedName(newTableName)} RENAME TO ${oldTableName.name.quote()}".andLog()
    )
}
```
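Note that the five-statement type change above is not atomic: a failure between the renames leaves the table half-migrated. On databases with transactional DDL (Postgres qualifies; MySQL and Snowflake auto-commit DDL statements), you can wrap the sequence in a single transaction. A minimal sketch - `executeInTransaction` is a hypothetical helper, not a CDK API:

```kotlin
import javax.sql.DataSource

// Hypothetical helper: run a statement sequence in one transaction so a
// mid-sequence failure rolls back cleanly. Only effective on databases
// with transactional DDL (e.g., Postgres).
fun executeInTransaction(dataSource: DataSource, statements: List<String>) {
    dataSource.connection.use { connection ->
        connection.autoCommit = false
        try {
            connection.createStatement().use { stmt ->
                statements.forEach { stmt.execute(it) }
            }
            connection.commit()
        } catch (e: Exception) {
            connection.rollback()
            throw e
        }
    }
}
```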
### Advanced Step 5: Implement applyChangeset()

**File:** Update `client/{DB}AirbyteClient.kt`

```kotlin
override suspend fun applyChangeset(
    stream: DestinationStream,
    columnNameMapping: ColumnNameMapping,
    tableName: TableName,
    expectedColumns: TableColumns,
    columnChangeset: ColumnChangeset,
) {
    if (columnChangeset.isNoop()) return

    log.info { "Summary of table alterations for ${tableName}:" }
    log.info { "  Added columns: ${columnChangeset.columnsToAdd.keys}" }
    log.info { "  Dropped columns: ${columnChangeset.columnsToDrop.keys}" }
    log.info { "  Modified columns: ${columnChangeset.columnsToChange.keys}" }

    val statements = sqlGenerator.alterTable(
        tableName,
        columnChangeset.columnsToAdd,
        columnChangeset.columnsToDrop,
        columnChangeset.columnsToChange,
    )
    statements.forEach { execute(it) }
}
```

### Advanced Step 6: Implement ensureSchemaMatches()

**File:** Update `client/{DB}AirbyteClient.kt`

```kotlin
override suspend fun ensureSchemaMatches(
    stream: DestinationStream,
    tableName: TableName,
    columnNameMapping: ColumnNameMapping
) {
    val actualSchema = discoverSchema(tableName)
    val expectedSchema = computeSchema(stream, columnNameMapping)
    val changeset = actualSchema.diff(expectedSchema)

    if (!changeset.isNoop()) {
        log.info { "Schema mismatch detected for ${tableName}, applying changes" }
        applyChangeset(stream, columnNameMapping, tableName, expectedSchema.columns, changeset)
    } else {
        log.debug { "Schema matches for ${tableName}, no changes needed" }
    }
}
```

**When is this called?**

- `StreamLoader.start()` calls `ensureSchemaMatches()` before writing
- If the source schema changed since the last sync, the schema changes are applied
- Automatic - no user intervention needed
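For intuition, the changeset that `diff()` returns falls out of a plain comparison of the two column maps. A minimal sketch, assuming the changeset shape used in Advanced Step 5 (the CDK's actual diff implementation may differ in detail):

```kotlin
// Sketch of schema diffing: actual = what discoverSchema() found,
// expected = what computeSchema() wants. TypeChange pairs old and new types.
data class TypeChange(val originalType: ColumnType, val newType: ColumnType)

fun diff(
    actual: Map<String, ColumnType>,
    expected: Map<String, ColumnType>,
): Triple<Map<String, ColumnType>, Map<String, ColumnType>, Map<String, TypeChange>> {
    val toAdd = expected.filterKeys { it !in actual }             // in expected only
    val toDrop = actual.filterKeys { it !in expected }            // in actual only
    val toChange = expected
        .filterKeys { it in actual }                              // in both...
        .filter { (name, type) -> actual.getValue(name) != type } // ...but different
        .mapValues { (name, type) -> TypeChange(actual.getValue(name), type) }
    return Triple(toAdd, toDrop, toChange)
}
```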
### Advanced Step 7: Create TableSchemaEvolutionTest

**File:** `src/test-integration/kotlin/.../component/{DB}TableSchemaEvolutionTest.kt`

```kotlin
package io.airbyte.integrations.destination.{db}.component

import io.airbyte.cdk.load.component.TableOperationsClient
import io.airbyte.cdk.load.component.TableSchema
import io.airbyte.cdk.load.component.TableSchemaEvolutionClient
import io.airbyte.cdk.load.component.TableSchemaEvolutionSuite
import io.airbyte.cdk.load.component.TestTableOperationsClient
import io.airbyte.cdk.load.schema.TableSchemaFactory
import io.micronaut.test.extensions.junit5.annotation.MicronautTest
import org.junit.jupiter.api.Test

@MicronautTest(environments = ["component"])
class {DB}TableSchemaEvolutionTest(
    override val client: TableSchemaEvolutionClient,
    override val opsClient: TableOperationsClient,
    override val testClient: TestTableOperationsClient,
    override val schemaFactory: TableSchemaFactory,
) : TableSchemaEvolutionSuite {

    // Provide the expected schema for your database's type representations.
    // This validates that discoverSchema() and computeSchema() produce consistent results.
    private val expectedAllTypesSchema: TableSchema by lazy {
        // Build the expected schema based on your TableSchemaMapper.toColumnType() implementation.
        // Example for Postgres:
        // TableSchema(mapOf(
        //     "boolean_col" to ColumnType("boolean", true),
        //     "integer_col" to ColumnType("bigint", true),
        //     "number_col" to ColumnType("numeric", true),
        //     "string_col" to ColumnType("character varying", true),
        //     ...
        // ))
        TODO("Define expected schema for all types")
    }

    @Test
    override fun `discover recognizes all data types`() {
        super.`discover recognizes all data types`(expectedAllTypesSchema)
    }

    @Test
    override fun `computeSchema handles all data types`() {
        super.`computeSchema handles all data types`(expectedAllTypesSchema)
    }

    @Test
    override fun `noop diff`() {
        super.`noop diff`()
    }

    @Test
    override fun `changeset is correct when adding a column`() {
        super.`changeset is correct when adding a column`()
    }

    @Test
    override fun `changeset is correct when dropping a column`() {
        super.`changeset is correct when dropping a column`()
    }

    @Test
    override fun `changeset is correct when changing a column's type`() {
        super.`changeset is correct when changing a column's type`()
    }

    @Test
    override fun `apply changeset - handle sync mode append`() {
        super.`apply changeset - handle sync mode append`()
    }

    @Test
    override fun `apply changeset - handle changing sync mode from append to dedup`() {
        super.`apply changeset - handle changing sync mode from append to dedup`()
    }

    @Test
    override fun `apply changeset - handle changing sync mode from dedup to append`() {
        super.`apply changeset - handle changing sync mode from dedup to append`()
    }

    @Test
    override fun `apply changeset - handle sync mode dedup`() {
        super.`apply changeset - handle sync mode dedup`()
    }

    @Test
    override fun `change from string type to unknown type`() {
        super.`change from string type to unknown type`()
    }

    @Test
    override fun `change from unknown type to string type`() {
        super.`change from unknown type to string type`()
    }
}
```

**What each test validates:**

| Test | What It Validates |
|------|-------------------|
| `discover recognizes all data types` | discoverSchema() correctly identifies existing table columns |
| `computeSchema handles all data types` | computeSchema() produces the correct schema from the stream definition |
| `noop diff` | No changes detected when schemas match |
| `changeset is correct when adding a column` | Detects when new columns need to be added |
| `changeset is correct when dropping a column` | Detects when columns should be dropped |
| `changeset is correct when changing a column's type` | Detects type changes |
| `apply changeset - handle sync mode *` | Schema evolution works across sync mode changes |
| `change from string/unknown type` | Complex type transformations work |

**Note:** This test validates both `TableSchemaEvolutionClient` AND `TableSchemaMapper` (via `computeSchema`).
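A failure mode called out in the next step's table is a type-name mismatch between `discoverSchema()` and `computeSchema()` (e.g., `VARCHAR(255)` vs. `varchar`). A small shared helper - hypothetical, but mirroring the `takeWhile` trick from Advanced Step 1 - keeps both sides comparing equal:

```kotlin
// Hypothetical helper: normalize raw database type names so the discovered
// and computed schemas compare equal. Call it from both code paths.
// normalizeTypeName("VARCHAR(255)") == normalizeTypeName("varchar") // "varchar"
fun normalizeTypeName(raw: String): String =
    raw.takeWhile { it != '(' }.trim().lowercase()
```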
### Advanced Step 8: Validate Schema Evolution

**Validate:**

```bash
$ ./gradlew :destination-{db}:componentTest --tests "*TableSchemaEvolutionTest*"
$ ./gradlew :destination-{db}:componentTest    # All component tests should pass
$ ./gradlew :destination-{db}:integrationTest  # Integration tests should pass
```

**Common failures and fixes:**

| Failure | Likely Cause | Fix |
|---------|--------------|-----|
| `discover recognizes all data types` fails | discoverSchema() returns wrong types | Check the information_schema query and type name normalization |
| `computeSchema handles all data types` fails | TableSchemaMapper.toColumnType() returns wrong types | Update the type mapping in TableSchemaMapper |
| Type mismatch between discover/compute | Inconsistent type names (e.g., "VARCHAR" vs "varchar") | Normalize type names in both methods |
| `apply changeset` fails | ALTER TABLE syntax is wrong for your database | Check the SqlGenerator.alterTable() implementation |

✅ **Checkpoint:** Schema evolution works + all previous phases still work

---

## Advanced Phase 2: Dedupe Mode

**Goal:** Support primary key deduplication

**Checkpoint:** Can deduplicate by primary key with "last write wins"

**📋 Dedupe Strategy** (illustrated by the sketch after the lists below):

1. Write all records to a temp table
2. Deduplicate in the temp table (ROW_NUMBER() by primary key, ordered by cursor DESC)
3. Upsert from temp to final:
   - Match on primary key
   - Update if the cursor is newer
   - Insert if there is no match

**Sync modes:**

- **Append** (previous guide): just insert
- **Overwrite** (previous guide): swap tables
- **Dedupe** (this phase): upsert with primary key
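Before writing the SQL, it helps to see the "last write wins" rule in isolation. A pure-Kotlin illustration (the real work happens in SQL below): per primary key, keep the record with the greatest cursor, breaking ties by extraction time.

```kotlin
// Illustration of the rule the ROW_NUMBER() queries below implement:
// per primary key, keep the row with the greatest cursor value,
// breaking ties with _airbyte_extracted_at.
data class Row(val pk: String, val cursor: Long, val extractedAt: Long)

fun dedupe(rows: List<Row>): List<Row> =
    rows.groupBy { it.pk }
        .values
        .map { group -> group.maxWith(compareBy({ it.cursor }, { it.extractedAt })) }

fun main() {
    val rows = listOf(Row("1", 10, 100), Row("1", 12, 101), Row("2", 5, 100))
    println(dedupe(rows)) // [Row(pk=1, cursor=12, extractedAt=101), Row(pk=2, cursor=5, extractedAt=100)]
}
```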
### Advanced Step 1: Implement upsertTable() in SQL Generator

**Option A: MERGE Statement (Snowflake, SQL Server, BigQuery)**

**File:** Update `client/{DB}SqlGenerator.kt`

```kotlin
fun upsertTable(
    stream: DestinationStream,
    columnNameMapping: ColumnNameMapping,
    source: TableName,
    target: TableName
): String {
    val importType = stream.importType as Dedupe
    val pkColumns = importType.primaryKey.map { columnNameMapping[it]!! }
    val allColumns = columnNameMapping.values.toList()

    // Cursor column (for "last write wins" logic)
    val cursorColumn = if (importType.cursor.isNotEmpty()) {
        columnNameMapping[importType.cursor.first()]!!
    } else {
        "_airbyte_extracted_at" // Fall back to the extraction timestamp
    }

    // Deduplication CTE (keep the latest record per primary key)
    val dedupCte = if (pkColumns.isNotEmpty()) {
        """
        WITH deduped AS (
            SELECT *, ROW_NUMBER() OVER (
                PARTITION BY ${pkColumns.joinToString(", ") { "\"$it\"" }}
                ORDER BY "$cursorColumn" DESC, "_airbyte_extracted_at" DESC
            ) AS rn
            FROM ${fullyQualifiedName(source)}
        )
        SELECT * FROM deduped WHERE rn = 1
        """
    } else {
        // No primary key - nothing to deduplicate on, so pass all records through
        "SELECT * FROM ${fullyQualifiedName(source)}"
    }

    // Primary key match condition (handles NULLs)
    val pkMatch = pkColumns.joinToString(" AND ") { col ->
        """(target."$col" = source."$col" OR (target."$col" IS NULL AND source."$col" IS NULL))"""
    }

    // Cursor comparison (for the UPDATE condition)
    val cursorComparison =
        """
        (
            target."$cursorColumn" < source."$cursorColumn"
            OR (target."$cursorColumn" = source."$cursorColumn"
                AND target."_airbyte_extracted_at" < source."_airbyte_extracted_at")
            OR (target."$cursorColumn" IS NULL AND source."$cursorColumn" IS NOT NULL)
        )
        """.trimIndent()

    // Column assignments for UPDATE
    val columnAssignments = allColumns.joinToString(",\n    ") { col ->
        "\"$col\" = source.\"$col\""
    }

    // Column list for INSERT
    val columnList = allColumns.joinToString(", ") { "\"$it\"" }
    val sourceColumnList = allColumns.joinToString(", ") { "source.\"$it\"" }

    return """
        MERGE INTO ${fullyQualifiedName(target)} AS target
        USING ( $dedupCte ) AS source
        ON $pkMatch
        WHEN MATCHED AND $cursorComparison THEN UPDATE SET
            $columnAssignments
        WHEN NOT MATCHED THEN INSERT ( $columnList ) VALUES ( $sourceColumnList )
        """.trimIndent().andLog()
}
```

**Option B: INSERT ... ON CONFLICT (Postgres, SQLite)**

```kotlin
fun upsertTable(
    stream: DestinationStream,
    columnNameMapping: ColumnNameMapping,
    source: TableName,
    target: TableName
): List<String> {
    val importType = stream.importType as Dedupe
    val pkColumns = importType.primaryKey.map { columnNameMapping[it]!! }
    val allColumns = columnNameMapping.values.toList()

    val cursorColumn = if (importType.cursor.isNotEmpty()) {
        columnNameMapping[importType.cursor.first()]!!
    } else {
        "_airbyte_extracted_at"
    }

    val columnList = allColumns.joinToString(", ") { "\"$it\"" }
    val updateAssignments = allColumns
        .filter { it !in pkColumns } // Don't update PK columns
        .joinToString(", ") { "\"$it\" = EXCLUDED.\"$it\"" }

    // 1. Dedupe into a temp table first
    val dedupSql =
        """
        CREATE TEMP TABLE deduped AS
        SELECT * FROM (
            SELECT *, ROW_NUMBER() OVER (
                PARTITION BY ${pkColumns.joinToString(", ") { "\"$it\"" }}
                ORDER BY "$cursorColumn" DESC, "_airbyte_extracted_at" DESC
            ) AS rn
            FROM ${fullyQualifiedName(source)}
        ) AS ranked
        WHERE rn = 1
        """.trimIndent().andLog()

    // 2. Upsert with the cursor check
    val upsertSql =
        """
        INSERT INTO ${fullyQualifiedName(target)} ($columnList)
        SELECT $columnList FROM deduped
        ON CONFLICT (${pkColumns.joinToString(", ") { "\"$it\"" }})
        DO UPDATE SET $updateAssignments
        WHERE ${fullyQualifiedName(target)}."$cursorColumn" < EXCLUDED."$cursorColumn"
           OR (${fullyQualifiedName(target)}."$cursorColumn" = EXCLUDED."$cursorColumn"
               AND ${fullyQualifiedName(target)}."_airbyte_extracted_at" < EXCLUDED."_airbyte_extracted_at")
        """.trimIndent().andLog()

    // 3. Cleanup
    val cleanupSql = "DROP TABLE deduped".andLog()

    return listOf(dedupSql, upsertSql, cleanupSql)
}
```

Note that `ON CONFLICT` only fires if the target table has a unique index or constraint on the primary-key columns.
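Both Option B and Option C (below) stage rows in a `deduped` temp table. On Postgres, `CREATE TEMP TABLE` is session-scoped, so concurrent syncs don't collide. If your database has no session-scoped temp tables, generate a unique scratch-table name per upsert instead - `dedupScratchName` here is a hypothetical helper:

```kotlin
import java.util.UUID

// Hypothetical helper for databases without session-scoped temp tables:
// derive a collision-free scratch table name per upsert, and be sure to
// drop it even when the upsert fails.
fun dedupScratchName(target: TableName): String {
    val suffix = UUID.randomUUID().toString().replace("-", "").take(8)
    return "${target.name}_dedup_$suffix"
}
```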
**Option C: Separate DELETE + INSERT (fallback)**

```kotlin
fun upsertTable(...): List<String> {
    val pkColumns = (stream.importType as Dedupe).primaryKey.map { columnNameMapping[it]!! }
    val allColumns = columnNameMapping.values.toList()

    // 1. Dedupe (same as above)
    val dedupSql = "..."

    // 2. Delete existing records with matching PKs
    val deleteSql =
        """
        DELETE FROM ${fullyQualifiedName(target)}
        WHERE (${pkColumns.joinToString(", ") { "\"$it\"" }})
            IN (SELECT ${pkColumns.joinToString(", ") { "\"$it\"" }} FROM deduped)
        """.trimIndent().andLog()

    // 3. Insert all rows from deduped
    val insertSql =
        """
        INSERT INTO ${fullyQualifiedName(target)}
        SELECT ${allColumns.joinToString(", ") { "\"$it\"" }} FROM deduped
        """.trimIndent().andLog()

    // 4. Cleanup
    val cleanupSql = "DROP TABLE deduped".andLog()

    return listOf(dedupSql, deleteSql, insertSql, cleanupSql)
}
```

### Advanced Step 2: Implement upsertTable() in Client

**File:** Update `client/{DB}AirbyteClient.kt`

```kotlin
override suspend fun upsertTable(
    stream: DestinationStream,
    columnNameMapping: ColumnNameMapping,
    sourceTableName: TableName,
    targetTableName: TableName
) {
    // Option A: the generator returns a single MERGE statement
    execute(sqlGenerator.upsertTable(stream, columnNameMapping, sourceTableName, targetTableName))

    // Options B/C: the generator returns a List<String> instead -
    // execute each statement in order:
    // sqlGenerator.upsertTable(stream, columnNameMapping, sourceTableName, targetTableName)
    //     .forEach { execute(it) }
}
```

### Advanced Step 3: Update Writer for Dedupe Mode

**File:** Update `write/{DB}Writer.kt`

```kotlin
override fun createStreamLoader(stream: DestinationStream): StreamLoader {
    val initialStatus = if (::initialStatuses.isInitialized) {
        initialStatuses[stream] ?: DirectLoadInitialStatus(null, null)
    } else {
        DirectLoadInitialStatus(null, null)
    }

    val tableNameInfo = names[stream]
    val (realTableName, tempTableName, columnNameMapping) = if (tableNameInfo != null) {
        Triple(
            tableNameInfo.tableNames.finalTableName!!,
            tempTableNameGenerator.generate(tableNameInfo.tableNames.finalTableName!!),
            tableNameInfo.columnNameMapping
        )
    } else {
        val tableName = TableName(
            namespace = stream.mappedDescriptor.namespace ?: "test",
            name = stream.mappedDescriptor.name
        )
        Triple(tableName, tempTableNameGenerator.generate(tableName), ColumnNameMapping(emptyMap()))
    }

    // Choose the StreamLoader based on sync mode and import type
    return when (stream.minimumGenerationId) {
        0L -> when (stream.importType) {
            is Dedupe -> DirectLoadTableDedupStreamLoader(
                stream, initialStatus, realTableName, tempTableName,
                columnNameMapping, client, client, streamStateStore
            )
            else -> DirectLoadTableAppendStreamLoader(
                stream, initialStatus, realTableName, tempTableName,
                columnNameMapping, client, client, streamStateStore
            )
        }
        stream.generationId -> when (stream.importType) {
            is Dedupe -> DirectLoadTableDedupTruncateStreamLoader(
                stream, initialStatus, realTableName, tempTableName,
                columnNameMapping, client, client, streamStateStore, tempTableNameGenerator
            )
            else -> DirectLoadTableAppendTruncateStreamLoader(
                stream, initialStatus, realTableName, tempTableName,
                columnNameMapping, client, client, streamStateStore
            )
        }
        else -> throw SystemErrorException("Hybrid refresh not supported")
    }
}
```

**What changed:**

- Added a `when (stream.importType)` check inside the generation ID check
- Four StreamLoader types are now supported (the sketch below summarizes the selection logic):
  - DirectLoadTableAppendStreamLoader (incremental append)
  - DirectLoadTableAppendTruncateStreamLoader (full refresh overwrite)
  - DirectLoadTableDedupStreamLoader (incremental dedupe)
  - DirectLoadTableDedupTruncateStreamLoader (full refresh dedupe)
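As a compact summary, here is the same selection logic as a pure function. Illustration only - the enum and function names are descriptive, not CDK API:

```kotlin
// Mirrors the when-expression in createStreamLoader() above.
enum class LoaderKind { APPEND, APPEND_TRUNCATE, DEDUP, DEDUP_TRUNCATE }

fun selectLoader(minimumGenerationId: Long, generationId: Long, isDedupe: Boolean): LoaderKind =
    when (minimumGenerationId) {
        // Incremental sync
        0L -> if (isDedupe) LoaderKind.DEDUP else LoaderKind.APPEND
        // Full refresh (truncate)
        generationId -> if (isDedupe) LoaderKind.DEDUP_TRUNCATE else LoaderKind.APPEND_TRUNCATE
        else -> error("Hybrid refresh not supported")
    }
```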
### Advanced Step 4: Enable Tests

**File:** Update `src/test-integration/kotlin/.../component/{DB}TableOperationsTest.kt`

```kotlin
@Test
override fun `upsert tables`() {
    super.`upsert tables`()
}
```

### Advanced Step 5: Validate

**Validate:**

```bash
$ ./gradlew :destination-{db}:testComponentUpsertTables  # 1 test should pass
$ ./gradlew :destination-{db}:componentTest              # 13 tests should pass
$ ./gradlew :destination-{db}:integrationTest            # 3 tests should pass
```

✅ **Checkpoint:** Dedupe mode works + all previous phases still work

---

## Advanced Phase 3: CDC Support (Optional)

**Goal:** Handle source deletions

**Checkpoint:** Can process CDC deletion events

**📋 CDC (Change Data Capture):**

- Tracks INSERT, UPDATE, and DELETE operations from the source
- Deletions are marked with an `_ab_cdc_deleted_at` timestamp
- Two modes:
  - **Hard delete**: Remove the record from the destination
  - **Soft delete**: Keep the record, with the deletion timestamp

### Advanced Step 1: Add CDC Configuration

**File:** Update `spec/{DB}Specification.kt`

```kotlin
@get:JsonProperty("cdc_deletion_mode")
@get:JsonPropertyDescription(
    """Whether to execute CDC deletions as hard deletes (propagate source deletions) or soft deletes (leave tombstone record). Defaults to hard deletes."""
)
val cdcDeletionMode: CdcDeletionMode? = null

enum class CdcDeletionMode(@get:JsonValue val value: String) {
    @JsonProperty("hard_delete") HARD_DELETE("Hard delete"),
    @JsonProperty("soft_delete") SOFT_DELETE("Soft delete"),
}
```

**File:** Update `spec/{DB}Configuration.kt`

```kotlin
data class {DB}Configuration(
    // ... existing fields
    val cdcDeletionMode: CdcDeletionMode,
)

// In ConfigurationFactory
override fun makeWithoutExceptionHandling(pojo: {DB}Specification): {DB}Configuration {
    return {DB}Configuration(
        // ... existing fields
        cdcDeletionMode = pojo.cdcDeletionMode ?: CdcDeletionMode.HARD_DELETE,
    )
}
```

### Advanced Step 2: Add CDC Logic to upsertTable()

**File:** Update `client/{DB}SqlGenerator.kt`

```kotlin
fun upsertTable(...): String {
    val importType = stream.importType as Dedupe
    val pkColumns = importType.primaryKey.map { columnNameMapping[it]!! }
    val allColumns = columnNameMapping.values.toList()

    val hasCdc = stream.schema.asColumns().containsKey(CDC_DELETED_AT_COLUMN)
    val isHardDelete = config.cdcDeletionMode == CdcDeletionMode.HARD_DELETE

    val cursorColumn = if (importType.cursor.isNotEmpty()) {
        columnNameMapping[importType.cursor.first()]!!
    } else {
        "_airbyte_extracted_at"
    }

    // Deduplication CTE (same as before)
    val dedupCte = """..."""
    val pkMatch = """..."""
    val cursorComparison = """..."""

    // CDC DELETE clause (must come BEFORE the UPDATE clause)
    val cdcDeleteClause = if (hasCdc && isHardDelete) {
        """WHEN MATCHED AND source."_ab_cdc_deleted_at" IS NOT NULL AND $cursorComparison THEN DELETE"""
    } else {
        ""
    }

    // Skip INSERT for deleted records (hard delete mode)
    val cdcSkipInsertClause = if (hasCdc && isHardDelete) {
        "AND source.\"_ab_cdc_deleted_at\" IS NULL"
    } else {
        ""
    }

    val columnAssignments = allColumns.joinToString(",\n    ") { "\"$it\" = source.\"$it\"" }
    val columnList = allColumns.joinToString(", ") { "\"$it\"" }
    val sourceColumnList = allColumns.joinToString(", ") { "source.\"$it\"" }

    return """
        MERGE INTO ${fullyQualifiedName(target)} AS target
        USING ( $dedupCte ) AS source
        ON $pkMatch
        $cdcDeleteClause
        WHEN MATCHED AND $cursorComparison THEN UPDATE SET $columnAssignments
        WHEN NOT MATCHED $cdcSkipInsertClause THEN INSERT ($columnList) VALUES ($sourceColumnList)
        """.trimIndent().andLog()
}

private val CDC_DELETED_AT_COLUMN = "_ab_cdc_deleted_at"
```

**Key Points:**

- The DELETE clause must come **before** the UPDATE clause (MERGE clauses are evaluated in order)
- Check the cursor - only delete if the deletion is newer than the existing record
- Skip INSERT for deleted records (don't re-insert deleted rows)
- Soft delete needs no special clauses - just upsert the deletion record with its timestamp

### Advanced Step 3: Test CDC

CDC tests are typically included in the integration tests automatically if you have CDC streams configured. No separate test enablement is needed - the framework tests CDC if the stream has an `_ab_cdc_deleted_at` column.

### Advanced Step 4: Validate

**Validate:**

```bash
$ ./gradlew :destination-{db}:componentTest    # 13 tests should pass
$ ./gradlew :destination-{db}:integrationTest  # 3 tests should pass (CDC tested automatically if applicable)
```

✅ **Checkpoint:** Full CDC support + all previous phases still work

---

## Advanced Phase 4: Optimization & Polish

**Goal:** Production-ready performance
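This guide's summary promises efficient batch writes for large datasets; in most JDBC-based destinations, the biggest single win is batching inserts instead of executing them row by row. A minimal sketch, assuming a JDBC `DataSource` and a parameterized INSERT from your SqlGenerator (the batch size of 1,000 is a starting point to tune, not a recommendation):

```kotlin
import javax.sql.DataSource

// Batching sketch: buffer rows into a JDBC batch and flush every
// batchSize records, so each round trip carries many rows.
// `insertSql` is assumed to be a parameterized INSERT such as
// "INSERT INTO ... VALUES (?, ?, ...)".
fun writeBatched(
    dataSource: DataSource,
    insertSql: String,
    rows: Sequence<List<Any?>>,
    batchSize: Int = 1_000,
) {
    dataSource.connection.use { connection ->
        connection.prepareStatement(insertSql).use { stmt ->
            var pending = 0
            rows.forEach { row ->
                row.forEachIndexed { i, value -> stmt.setObject(i + 1, value) }
                stmt.addBatch()
                if (++pending == batchSize) {
                    stmt.executeBatch() // One round trip per batch, not per row
                    pending = 0
                }
            }
            if (pending > 0) stmt.executeBatch()
        }
    }
}
```

Where your database has a native bulk path (for example, `COPY` on Postgres or `COPY INTO` on Snowflake), prefer it - those generally outperform JDBC batching for large syncs.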
---

## Next Steps

**Next:** Your connector is now production-ready! Continue to [6-testing.md](./6-testing.md) to run BasicFunctionalityIntegrationTest and validate all features.