mechanical changes to get the new interfaces. Presumably if it compiles,
it works.
Ended up doing this as a separate PR just to avoid polluting my test
components PR with this stuff.
## What
* Revert the introduction of the column cache to the formatter (originally
added to improve performance)
## How
* Restore previous logic
## Review guide
* `SnowflakeRecordFormatter.kt`
## User Impact
## Can this PR be safely reverted and rolled back?
- [X] YES 💚
- [ ] NO ❌
## What
* Cache the Airbyte metadata column lookup in the record formatter to
improve performance
## How
* Add a cache to the formatters (see the sketch after this list)
* Restore accidentally deleted test
* Add logging
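A minimal sketch of the caching idea, assuming the formatter is built once per stream and invoked once per record; the class, field, and column-name prefix below are illustrative, not the actual `SnowflakeRecordFormatter.kt` API:
```kotlin
// Illustrative sketch only -- not the real SnowflakeRecordFormatter.
// Assumption: the formatter is constructed once per stream and reused per record.
class CachedMetaColumnFormatter(private val columnNames: List<String>) {

    // Resolve the Airbyte metadata column positions once, instead of scanning the
    // column list for every record.
    private val metaColumnIndexes: Map<String, Int> =
        columnNames
            .withIndex()
            .filter { (_, name) -> name.startsWith("_airbyte_", ignoreCase = true) }
            .associate { (index, name) -> name to index }

    fun format(record: Map<String, Any?>): List<Any?> {
        val row = MutableList<Any?>(columnNames.size) { null }
        columnNames.forEachIndexed { i, name -> row[i] = record[name] }
        // Metadata values are placed using the cached positions.
        metaColumnIndexes.forEach { (name, i) -> row[i] = record[name] }
        return row
    }
}
```
The point is simply that the per-record path no longer re-derives which columns are Airbyte metadata.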
## Review guide
1. `SnowflakeRecordFormatter.kt`
## Can this PR be safely reverted and rolled back?
- [X] YES 💚
- [ ] NO ❌
## What
With `QUOTED_IDENTIFIERS_IGNORE_CASE = true` set on the session, Snowflake
treats quoted identifiers as case-insensitive and resolves them as uppercase.
Example SQL:
```sql
ALTER SESSION SET QUOTED_IDENTIFIERS_IGNORE_CASE = true;
create table foo (id int);
select count(*) as "total" from foo;
```
The returned result set will have `TOTAL` as the column name, rather than
`total`.
## How
Just always upcase the `total` column 🤷
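A hedged sketch of the idea on the read side, assuming the count is fetched over JDBC (the method name and column handling in the connector may differ):
```kotlin
import java.sql.ResultSet

// Illustrative only: always look the count up via the upcased column label. When
// QUOTED_IDENTIFIERS_IGNORE_CASE is on, Snowflake returns the column as TOTAL;
// when it is off, JDBC's case-insensitive column-label matching still resolves
// "TOTAL" against the lowercase `total` column.
fun readTotal(resultSet: ResultSet): Long {
    check(resultSet.next()) { "expected exactly one row" }
    return resultSet.getLong("TOTAL")
}
```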
## Review guide
## User Impact
## Can this PR be safely reverted and rolled back?
- [x] YES 💚
- [ ] NO ❌
## What
This should fix the `User character length limit (16777216) exceeded by
string` error.
## Can this PR be safely reverted and rolled back?
- [X] YES 💚
- [ ] NO ❌
## What
Brings SF to the latest CDK
## Notes
We no longer set max concurrent streams to 50, which is a good thing, as
that could have led to massive disk usage. We now evict if we use over
1750 MB (350 MB * 5) of disk, which is much more reasonable.
## User Impact
We no longer immediately evict aggregates once some static amount of
interleaving is exceeded. Instead, we also check the total size of all active
aggregates and only evict when it exceeds the configured size.
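A rough sketch of that eviction rule, under the assumption that the CDK tracks per-aggregate disk usage against a configurable total budget; the names here are hypothetical, not the real CDK API:
```kotlin
// Hypothetical names -- this only illustrates the policy described above.
data class Aggregate(val streamName: String, val bytesOnDisk: Long)

class DiskBudgetEvictionPolicy(
    // 350 MB per slot * 5 slots = 1750 MB total budget.
    private val maxTotalBytes: Long = 5L * 350 * 1024 * 1024,
) {
    // Evict only when the combined size of all active aggregates exceeds the budget,
    // rather than as soon as a fixed count of interleaved streams is reached.
    fun shouldEvict(active: Collection<Aggregate>): Boolean =
        active.sumOf { it.bytesOnDisk } > maxTotalBytes
}
```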
## What
* Configurable GZIP compression level for the CSV file output of
destination-snowflake
## How
* Expose the ability to set the compression level on the
`GZIPOutputStream` (see the sketch below)
* Default the level to 5 (faster/less compression than the default level 6)
* Inline the file format into the COPY command to avoid conflicts when
modifying the file format
**N.B.** Testing showed about a 12% speed improvement over the current
code. Ideally, we would dial this in to the best tradeoff between speed and
compression, possibly even exposing the setting to the user for advanced
cases where more compression is preferred over speed.
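As a sketch of the `GZIPOutputStream` part (the real code lives in `SnowflakeInsertBuffer.kt` and may differ), the level can be applied through the protected `Deflater` that `GZIPOutputStream` writes through:
```kotlin
import java.io.OutputStream
import java.util.zip.GZIPOutputStream

// Sketch only: expose a configurable compression level on GZIPOutputStream.
// Level 5 trades a little compression ratio for speed versus the default level 6.
class ConfigurableGzipOutputStream(
    out: OutputStream,
    level: Int = 5,
) : GZIPOutputStream(out) {
    init {
        // `def` is the protected Deflater inherited from DeflaterOutputStream.
        def.setLevel(level)
    }
}
```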
## Review guide
1. `SnowflakeInsertBuffer.kt`
## Can this PR be safely reverted and rolled back?
- [X] YES 💚
- [ ] NO ❌
---------
Co-authored-by: Edward Gao <edward.gao@airbyte.io>
## What
For https://github.com/airbytehq/oncall/issues/9621. Branched from
https://github.com/airbytehq/airbyte/pull/67575. Stop upcasing the
column name before dropping it.
This only really comes into play if either (a) a schema evolution is
interrupted, or (b) the user manually adds a column with lowercase name
to their table. Most schema evolutions are working fine, in spite of
this bug.
## How
The bug is that when we run `describe table`, we call
`toSnowflakeCompatibleName()` on the column names, which upcases them.
Then, when we go to drop the column, it obviously doesn't exist, because we
modified the name incorrectly.
This PR switches to the method that does the quote-escaping but doesn't
upcase the name.
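A small illustration of the difference; the helper below is hypothetical and shown only to make the contrast concrete (only `toSnowflakeCompatibleName()` is a real method referenced by this PR):
```kotlin
// Hypothetical helper: quote-escape an identifier without changing its case,
// unlike toSnowflakeCompatibleName(), which upcases it.
fun quoteIdentifier(name: String): String =
    "\"" + name.replace("\"", "\"\"") + "\""

// Buggy:  DROP COLUMN "X"  -- the name from `describe table` was upcased first
// Fixed:  DROP COLUMN "x"  -- quote-escaped only, so it matches the real column
```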
Tested manually:
```sql
alter table edgao_test.variant_test add column "x" object;
alter table edgao_test.variant_test add column "y" object;
```
Then run this thing to simulate a sync (the sync will create a table
with column `X`). Without this change, it fails; with this change, it
correctly drops the `x` and `y` columns, and leaves the `X` column
untouched.
```kotlin
@Test
fun arst() {
    val stream =
        DestinationStream(
            "edgao_test",
            "variant_test",
            importType = Append,
            schema =
                ObjectType(
                    properties =
                        linkedMapOf(
                            "x" to FieldType(ObjectType(properties = linkedMapOf()), nullable = true)
                        )
                ),
            generationId = 42,
            minimumGenerationId = 0,
            syncId = 42,
            namespaceMapper = namespaceMapperForMedium()
        )
    runSync(
        updatedConfig,
        stream,
        listOf(
            InputRecord(
                stream,
                """
                {
                  "x": {"foo": "bar"}
                }""".trimIndent(),
                emittedAtMs = 1000,
                checkpointId = checkpointKeyForMedium()?.checkpointId
            )
        )
    )
}
```
## Can this PR be safely reverted and rolled back?
- [x] YES 💚
- [ ] NO ❌
## What
Ignore backslashes in field values
## How
Snowflake defaults to treating `\` as an escape character (...ish:
`\foo,bar` treats it as a normal character, but `foo\,bar` treats it as
an escape).
We're already doing the normal CSV escaping, so just tell Snowflake to
stop doing this (see the sketch below).
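A hedged sketch of one way to do that, assuming the connector builds the COPY statement as a Kotlin string; the exact statement, stage, and the specific file-format option used in the connector may differ:
```kotlin
// Sketch only: table and staged-file names are illustrative.
fun copyIntoStatement(table: String, stagedFile: String): String =
    """
    COPY INTO $table
    FROM @%$table/$stagedFile
    FILE_FORMAT = (
        TYPE = CSV
        FIELD_OPTIONALLY_ENCLOSED_BY = '"'
        -- Rely on standard CSV quoting instead of backslash escaping.
        ESCAPE_UNENCLOSED_FIELD = NONE
    )
    """.trimIndent()
```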
## Can this PR be safely reverted and rolled back?
- [x] YES 💚
- [ ] NO ❌
Following #61584. Bumping certified connector versions to make sure the version and code commits align. Doing this in 2 parts.
Bump BQ, SF, S3, S3-data-lake.