import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Troubleshooting Postgres Sources

## Connector Limitations

- Changing a column data type or removing a column might break connections.

### Xmin Limitations

There are some notable shortcomings associated with the Xmin replication method:

- Row deletions: This replication method cannot detect row deletions.
- Performance: Requires a full table scan, which can lead to poor performance on large tables (see the sketch after this list).
- Row-level granularity: The xmin column is stored at the row level. This means a row is still synced if it has been modified, regardless of whether the modification touched the subset of columns the user is interested in.
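
For intuition, here is a hypothetical sketch of the kind of query an xmin-based sync must run; `my_table` and the cursor value `123456` are placeholders, not the connector's actual SQL:

```sql
-- Read every row whose xmin is newer than the cursor saved by the last sync.
-- The predicate cannot use an index, so each sync scans the whole table, and
-- deleted rows simply never appear in the result.
SELECT xmin::text::bigint AS xmin_cursor, *
FROM my_table
WHERE xmin::text::bigint > 123456;
```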

Syncs that read from a standby server can fail with the exception `Caused by: org.postgresql.util.PSQLException: FATAL: terminating connection due to conflict with recovery`.

Possible solutions include:

- Recommended: Set [`hot_standby_feedback`](https://www.postgresql.org/docs/14/runtime-config-replication.html#GUC-HOT-STANDBY-FEEDBACK) to `true` on the replica server, as shown in the sketch after this list. This parameter prevents the primary server from deleting the write-ahead logs while the replica is busy serving user queries. However, the downside is that the write-ahead log will grow in size.
- Recommended: Sync data when there are no updates running on the primary server, or sync data from the primary server instead.
- Not Recommended: Increase [`max_standby_archive_delay`](https://www.postgresql.org/docs/14/runtime-config-replication.html#GUC-MAX-STANDBY-ARCHIVE-DELAY) and [`max_standby_streaming_delay`](https://www.postgresql.org/docs/14/runtime-config-replication.html#GUC-MAX-STANDBY-STREAMING-DELAY) to be larger than the amount of time needed to complete the data sync. However, it is usually hard to predict how long a sync will take, so this approach is not very practical.
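
A minimal sketch of the recommended fix, run on the replica server (on managed platforms you may need to change this setting through the provider's console instead):

```sql
-- Enable hot_standby_feedback; it takes effect on a configuration reload,
-- so no server restart is required.
ALTER SYSTEM SET hot_standby_feedback = on;
SELECT pg_reload_conf();

-- Confirm the setting took effect.
SHOW hot_standby_feedback;
```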
### Under CDC incremental mode, there are still full refresh syncs

The connector waits for the default initial wait time of 5 minutes (300 seconds).

If you know there are database changes to be synced, but the connector cannot read those changes, the root cause may be insufficient waiting time. In that case, you can increase the waiting time (for example, set it to 600 seconds) to test whether that is indeed the root cause. On the other hand, if you know there are no database changes, you can decrease the wait time to speed up zero-record syncs.

### (Advanced) Resolving sync failures due to WAL disk consumption {#advanced-wal-disk-consumption-and-heartbeat-action-query}

When using the `Read Changes using Write-Ahead Log (CDC)` update method, you might encounter a situation where your initial sync is successful, but further syncs fail. You may also notice that the `confirmed_flush_lsn` column of the server's `pg_replication_slots` view doesn't advance as expected (see the diagnostic query below).

This is a general issue that affects databases, schemas, and tables with small transaction volumes. There are complexities in the way PostgreSQL disk space can be consumed by WAL files, and these can cause issues for the connector when dealing with low transaction volumes. Airbyte's connector depends on Debezium; these complexities are outlined in [Debezium's documentation](https://debezium.io/documentation/reference/stable/connectors/postgresql.html#postgresql-wal-disk-space), if you want to learn more.
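
As a diagnostic sketch, the following query shows each replication slot's progress and roughly how much WAL it is holding back. Run it on the primary; `pg_current_wal_lsn()` is not valid on a standby:

```sql
-- Show slot progress and the amount of WAL retained for each slot.
SELECT slot_name,
       confirmed_flush_lsn,
       restart_lsn,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;
```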

#### Simple fix (read-only)

The easiest way to fix this issue is to add one or more tables with high transaction volumes to the Airbyte publication. You do not need to actually sync these tables, but adding them to the publication will advance the log sequence number (LSN), ensuring the sync can succeed without you giving Airbyte write access to the database. However, this may greatly increase disk consumption.

```sql
ALTER PUBLICATION <publicationName> ADD TABLE <high_volume_table>;
```
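
If you are not sure which table qualifies as high-volume, a query like the following can help you choose. This is a sketch using PostgreSQL's cumulative statistics views, not something the connector requires:

```sql
-- Rank user tables by total write activity since statistics were last reset.
SELECT schemaname,
       relname,
       n_tup_ins + n_tup_upd + n_tup_del AS total_writes
FROM pg_stat_user_tables
ORDER BY total_writes DESC
LIMIT 10;
```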

If you do not want to increase disk consumption, use the following solutions, which require write access.

#### Fix when reading against a primary or standalone (write)

To fix the issue when reading against a primary or standalone server, artificially add events to a heartbeat table the Airbyte user can write to.

1. Create an `airbyte_heartbeat` table in the database and schema being tracked.

   ```sql
   CREATE TABLE airbyte_heartbeat (
       id SERIAL PRIMARY KEY,
       timestamp TIMESTAMP NOT NULL DEFAULT current_timestamp,
       text TEXT
   );
   ```

2. Add this table to the Airbyte publication.

   ```sql
   ALTER PUBLICATION <publicationName> ADD TABLE airbyte_heartbeat;
   ```

3. In the Postgres source connector in Airbyte, configure the `Debezium heartbeat query` property. For example:

   ```sql
   INSERT INTO airbyte_heartbeat (text) VALUES ('heartbeat')
   ```

   Airbyte periodically executes this query on the `airbyte_heartbeat` table.
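
To confirm the publication now includes the heartbeat table, you can run a check like the following sketch; replace `<publicationName>` with your actual publication name:

```sql
-- List the tables covered by the publication used for CDC.
SELECT * FROM pg_publication_tables WHERE pubname = '<publicationName>';
```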

#### Fix when reading against a read replica (write)

To fix the issue when reading against a read replica:

1. [Add pg_cron as an extension](#wal-pg-cron).
2. [Periodically add events](#wal-heartbeat-table) to a heartbeat table your Airbyte user can write to.

##### Add the pg_cron extension {#wal-pg-cron}

[pg_cron](https://github.com/citusdata/pg_cron) is a cron-based job scheduler for PostgreSQL. It runs inside the database as an extension, so you can schedule PostgreSQL commands directly from the database.

<Tabs>
<TabItem value="Google Cloud" label="Google Cloud" default>

1. Ensure your PostgreSQL instance is version 10 or higher; version 10 is the minimum version that supports pg_cron.

   ```sql
   SELECT version();
   ```

2. Configure your database flags to enable pg_cron. For help with this, see [Google Cloud's docs](https://cloud.google.com/sql/docs/postgres/flags).

   1. Set the `cloudsql.enable_pg_cron` flag to `on`. For more information, see [Google Cloud's docs](https://cloud.google.com/sql/docs/postgres/extensions#pg_cron).

   2. Set the `shared_preload_libraries` flag to include `pg_cron`.

      ```
      shared_preload_libraries = 'pg_cron'
      ```

      If you already have other libraries in this parameter, add `pg_cron` to the list, separating each library with a comma.

      ```
      shared_preload_libraries = 'pg_cron,pg_stat_statements'
      ```

3. Restart your PostgreSQL instance. For help with this, see [Google Cloud's docs](https://cloud.google.com/sql/docs/postgres/start-stop-restart-instance#restart).

   PostgreSQL now preloads the `pg_cron` extension when the instance starts.

</TabItem>
<TabItem value="Amazon RDS" label="Amazon RDS">

1. Ensure your RDS for PostgreSQL instance is version 12.5 or later; pg_cron requires at least this version.

   ```sql
   SELECT version();
   ```

2. Modify the parameter group associated with your PostgreSQL database instance to enable pg_cron. For help modifying parameter groups, see the [AWS docs](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_WorkingWithParamGroups.Modifying.html).

   1. If your RDS for PostgreSQL database instance uses the `rds.allowed_extensions` parameter to specify which extensions can be installed, add `pg_cron` to that list.

   2. Edit the custom parameter group associated with your PostgreSQL DB instance. Modify the `shared_preload_libraries` parameter to include the value `pg_cron`.

3. Reboot your PostgreSQL database instance. For help, see the [AWS docs](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_RebootInstance.html#USER_RebootInstance.steps).

   PostgreSQL now preloads the `pg_cron` extension when the instance starts.

</TabItem>
</Tabs>
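
On either platform, you can optionally confirm that the extension is now available to install. This is a sketch; the exact `default_version` will vary:

```sql
-- pg_cron should appear here once shared_preload_libraries is set correctly.
SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name = 'pg_cron';
```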

##### Enable the pg_cron extension, create a heartbeat table, and schedule a cron job {#wal-heartbeat-table}

1. Verify pg_cron was successfully added to `shared_preload_libraries`.

   ```sql
   SHOW shared_preload_libraries;
   ```

2. Enable the pg_cron extension, create a `periodic_log` (heartbeat) table, and schedule a cron job.

   ```sql
   CREATE EXTENSION IF NOT EXISTS pg_cron;

   CREATE TABLE periodic_log (
       log_id SERIAL PRIMARY KEY,
       log_time TIMESTAMP DEFAULT current_timestamp
   );

   SELECT cron.schedule(
       'periodic_logger',                          -- job name
       '*/1 * * * *',                              -- cron expression (every minute)
       $$INSERT INTO periodic_log DEFAULT VALUES$$ -- the SQL statement to run
   );
   ```

3. Verify the scheduled job.

   ```sql
   SELECT * FROM cron.job;
   ```

4. Verify the periodic update.

   ```sql
   SELECT * FROM periodic_log ORDER BY log_time DESC;
   ```

5. Alter the publication to include this table on the primary.

   ```sql
   ALTER PUBLICATION airbyte_publication ADD TABLE periodic_log;
   ```

6. Sync normally from the primary to the replica.

##### Stop the sync

If you need to stop syncing later, unschedule the cron job.

```sql
SELECT cron.unschedule('periodic_logger');
```
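
If you are also retiring the heartbeat table, you can clean it up afterwards. A sketch, assuming the `airbyte_publication` and `periodic_log` names used above:

```sql
-- Remove the heartbeat table from the publication, then drop it.
ALTER PUBLICATION airbyte_publication DROP TABLE periodic_log;
DROP TABLE periodic_log;
```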