* sweep all scheduler application code and new-scheduler conditional logic
* remove airbyte-scheduler from deployments and docs
* format
* remove 'v2' from github actions
* add back scheduler in delete deployment command
* remove scheduler parameters from helm chart values
* add back job cleaner + test and add comment
* remove now-unused env vars from code and docs
* format
* remove feature flags from web backend connection handler as it is no longer needed
* remove feature flags from config api as it is no longer needed
* remove feature flags input from config api test
* format + shorter url
* remove scheduler parameters from helm chart readme
## What
Finale of https://github.com/airbytehq/airbyte/pull/13122.
We've renamed all directories in previous PRs. Here we remove the fat jar configuration and add publishing to all subprojects.
Explanation for what is happening:
Identically named subprojects have the following issues:
* publishing as-is leads to classpath confusion when jars with the same names are placed in the Java distribution. This causes NoClassDefFoundError at runtime.
* deconflicting the jar names without changing directory names leads to dependency errors, as the OSS jar pom files are generated using project dependencies (a dependency on a sibling subproject in the same repo) that use the subproject's group and name as a reference. This means the generated jars look for jars that do not exist (since their names have been changed) and cannot compile.
* the workaround for changing a subproject's name involves resetting the subproject's name in settings.gradle and depending on the new name in each build.gradle. This increases configuration burden and hurts readability, since one has to check settings.gradle to know what the right subproject name is. See https://github.com/gradle/gradle/issues/847 for more info.
* given that Gradle itself doesn't support identically named subprojects (see the linked issue), the simplest solution is to not allow duplicate directory names. I've only renamed conflicting directories here to keep things simple. I will create a follow-up issue to enforce non-identical subproject names in our builds.
## How
* Remove fat jar configuration.
* Add publishing to all subprojects.
* Migrate OSS to temporal scheduler
* add comment about migration being performed in server
* add comments about removing migration logic
* formatting and add tests for migration logic
* rm duplicated test
* remove more duplicated build task
* remove retry
* disable acceptance tests that call temporal directly when on kube
* set NEW_SCHEDULER and CONTAINER_ORCHESTRATOR_ENABLED env vars to true to be consistent
* set default value of container orchestrator enabled to true
* Revert "set default value of container orchestrator enabled to true"
This reverts commit 21b36703a9.
* Revert "set NEW_SCHEDULER and CONTAINER_ORCHESTRATOR_ENABLED env vars to true to be consistent"
This reverts commit 6dd2ec04a2.
* Revert "Revert "set NEW_SCHEDULER and CONTAINER_ORCHESTRATOR_ENABLED env vars to true to be consistent""
This reverts commit 2f40f9da50.
* Revert "Revert "set default value of container orchestrator enabled to true""
This reverts commit 26068d5b31.
* fix sync workflow test
* remove defunct cancellation tests due to internal temporal error
* format - remove unused imports
* revert changes that set container orchestrator enabled to true everywhere
* remove NEW_SCHEDULER feature flag from .env files, and set CONTAINER_ORCHESTRATOR_ENABLED flag to true for kube .env files
Co-authored-by: Benoit Moriceau <benoit@airbyte.io>
This adds a metadata entry in order to make the DB secret store the default one.
It avoids having secrets outside of the secret table.
Introduce a migration to a secret manager
If a secret manager is specified, the migration goes through all the configs and saves each secret in the configured secret store. If a secret is already in a store, it is not migrated to the secret store again.
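A minimal sketch of that pass, assuming a flat key/value connector config and a coordinate-style secret store; `SecretsMigrationSketch`, `SecretStore` and the coordinate prefix are illustrative names, not Airbyte's actual classes:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Illustrative sketch only: walk a config, push raw secret values into the
// configured store, and skip values that already look like stored coordinates.
public class SecretsMigrationSketch {

  /** Pretend secret store keyed by generated coordinates. */
  static class SecretStore {
    private final Map<String, String> secrets = new HashMap<>();

    String write(final String rawValue) {
      final String coordinate = "airbyte_secret_" + UUID.randomUUID();
      secrets.put(coordinate, rawValue);
      return coordinate;
    }

    static boolean isCoordinate(final String value) {
      return value.startsWith("airbyte_secret_");
    }
  }

  /**
   * Replace every raw value with a coordinate. In practice only the fields the
   * connector spec marks as secret would be traversed; this sketch migrates all
   * fields to keep the example short. Re-running it does not move secrets twice.
   */
  static Map<String, String> migrate(final Map<String, String> config, final SecretStore store) {
    final Map<String, String> migrated = new HashMap<>();
    config.forEach((key, value) -> {
      if (SecretStore.isCoordinate(value)) {
        migrated.put(key, value); // already in a store, leave it alone
      } else {
        migrated.put(key, store.write(value)); // move the raw value into the store
      }
    });
    return migrated;
  }

  public static void main(String[] args) {
    final SecretStore store = new SecretStore();
    System.out.println(migrate(Map.of("api_key", "raw-api-key-value"), store));
  }
}
```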
This restores the JSON traversal library. A bug was introduced in the JSON path library; this PR fixes it.
In a JSON schema, an enum can be defined without specifying a "type" attribute. This wasn't handled in the previous implementation. We now return the right type from the getType method and process it the same way as an integer/boolean/string type.
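A rough illustration of the idea, assuming a Jackson-based helper; `JsonSchemaTypeSketch` and `SchemaType` are made-up names and the real logic lives in Airbyte's JSON schema utilities:

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

// Illustrative sketch: an enum schema without an explicit "type" attribute still
// gets a concrete type, inferred from its first enum value instead of failing.
public class JsonSchemaTypeSketch {

  enum SchemaType { STRING, INTEGER, NUMBER, BOOLEAN, ARRAY, OBJECT }

  static SchemaType getType(final JsonNode schema) {
    if (schema.has("type")) {
      return SchemaType.valueOf(schema.get("type").asText().toUpperCase());
    }
    // enum without a "type": infer the type from the enum values
    if (schema.has("enum")) {
      final JsonNode first = schema.get("enum").get(0);
      if (first.isInt()) return SchemaType.INTEGER;
      if (first.isNumber()) return SchemaType.NUMBER;
      if (first.isBoolean()) return SchemaType.BOOLEAN;
      return SchemaType.STRING;
    }
    return SchemaType.OBJECT;
  }

  public static void main(String[] args) throws Exception {
    final JsonNode schema = new ObjectMapper().readTree("{\"enum\": [\"EU\", \"US\"]}");
    System.out.println(getType(schema)); // STRING, even though "type" is absent
  }
}
```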
* Revert "Revert "Build OSS branch for deploying to Cloud env (#11474)""
This reverts commit 55e3c1e051.
* add action to get dev branch tag to OSS project instead of doing it in cloud
* remove dev branch version action, going to do this in cloud afterall
* add VERSION buildArg to Dockerfiles, default to current airbyte version but overwritable
* use VERSION env var consistently as Dockerfile buildArg, jar version, and tag
pass version and image_tag into docker build task function
* add github action for building and pushing an OSS branch for Cloud to consume
* allow AirbyteVersion to validate versions containing 'oss-branch' prefix
* change oss-branch prefix to dev for branch-based versions
* better action name
* add docker-compose-cloud.build.yaml to define minimum set of cloud images that are pushed by oss branch action
* update local dev docs to describe optional usage of VERSION env var
* make branch_version_tag input optional, if not provided, generates dev-<commit_hash>
* fix typo
* fix missed merge conflict
* update docker docs
* update integrationRunner isDev check
The OSS deployment allows exporting and importing the workspace configuration. If a user is not running a secret manager in their deployment, the export will return API keys and passwords in plain text. This is a serious security issue, especially since @timroes has shown that some instances are exposed on the public internet. To avoid returning passwords or other values that should be stored as secrets, we need to sanitize the export. This is what this PR does.
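A minimal sketch of the sanitization step, assuming the secret field names are known from the connector spec; `ExportSanitizerSketch` and the mask value are illustrative, not the actual export code:

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;
import java.util.Set;

// Illustrative sketch: mask secret fields before writing a config export archive.
public class ExportSanitizerSketch {

  private static final String MASK = "**********";

  /** The secret field names here are examples; the real list comes from the connector spec. */
  static JsonNode sanitize(final JsonNode config, final Set<String> secretFields) {
    final ObjectNode copy = config.deepCopy(); // assumes an object-shaped config
    secretFields.forEach(field -> {
      if (copy.has(field)) {
        copy.put(field, MASK); // never write the raw value into the export
      }
    });
    return copy;
  }

  public static void main(String[] args) throws Exception {
    final JsonNode config = new ObjectMapper()
        .readTree("{\"host\":\"db.example.com\",\"password\":\"hunter2\"}");
    System.out.println(sanitize(config, Set.of("password", "api_key")));
  }
}
```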
* Add Disable Failing Connections feature
* Rename and cleanup
* list jobs based off connection id
* Move variables to env config and update unit tests
* Fix env flag name
* Fix missing name changes
* Add comments to unit test
* Address PR comments
* Support multiple config types
* Update unit tests
* Remove the attemptId notion in the connectionManagerWorkflow (#10780)
This removes the attemptId from the create attempt activity and replaces it with the attemptNumber. This will be modified in the workflow in a later commit.
* Revert "Remove the attemptId notion in the connectionManagerWorkflow (#10780)" (#11057)
This reverts commit 99338c852a.
* Revert "Revert "Remove the attemptId notion in the connectionManagerWorkflow (#10780)" (#11057)" (#11073)
This reverts commit 892dc7ec66.
* Revert "Revert "Revert "Remove the attemptId notion in the connectionManagerWorkflow (#10780)" (#11057)" (#11073)" (#11081)
This reverts commit e27bb74050.
* Add Disable Failing Connections feature
* Rename and cleanup
* Fix rebase
* only disable if the first job is older than the max days (see the sketch after this commit list)
* Return boolean for activity
* Return boolean for activity
* Add unit tests for ConnectionManagerWorkflow
* Utilize object output for activity and ignore non success or failed runs
* Utilize object output for activity and ignore non success or failed runs
Co-authored-by: Benoit Moriceau <benoit@airbyte.io>
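A rough sketch of how such a check could look, assuming the activity receives the connection's jobs newest-first; `AutoDisableSketch`, `AutoDisableOutput` and the parameter names are hypothetical, not the actual Temporal activity:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

// Illustrative sketch: only consecutive failures count, jobs that are neither
// succeeded nor failed are ignored, and the connection is only disabled once the
// oldest considered job is older than the configured window.
public class AutoDisableSketch {

  enum JobStatus { SUCCEEDED, FAILED, CANCELLED, RUNNING }

  record Job(JobStatus status, Instant createdAt) {}

  /** Object output rather than a bare boolean, so more fields can be added later. */
  record AutoDisableOutput(boolean disabled) {}

  static AutoDisableOutput shouldDisable(final List<Job> jobsNewestFirst,
                                         final int maxConsecutiveFailures,
                                         final Duration maxDisableWindow,
                                         final Instant now) {
    int consecutiveFailures = 0;
    Instant oldestConsidered = now;
    for (final Job job : jobsNewestFirst) {
      if (job.status() == JobStatus.SUCCEEDED) {
        return new AutoDisableOutput(false); // a success resets the streak
      }
      if (job.status() != JobStatus.FAILED) {
        continue; // ignore cancelled/running jobs entirely
      }
      consecutiveFailures++;
      oldestConsidered = job.createdAt();
    }
    final boolean oldEnough =
        Duration.between(oldestConsidered, now).compareTo(maxDisableWindow) >= 0;
    return new AutoDisableOutput(consecutiveFailures >= maxConsecutiveFailures && oldEnough);
  }
}
```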
Add a metric to monitor the monitoring app's rate of publishing metrics. Though this isn't perfect, it gives us some insight into whether metric publishing is okay or running into issues.
Refactor for readability.
Create a separate enum class; a dev adds a constant to this enum whenever another metric needs to be added to the reporter.
This helps us isolate the emission logic and scheduling configuration from the threads that actually push the metrics.
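A minimal sketch of that pattern, assuming each enum constant carries its own query and period; `MetricReporterSketch` and the `ToEmit` contents are illustrative, not the actual airbyte-metrics reporter:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Illustrative sketch: each metric to emit is an enum constant carrying its own
// emission logic and period, and one scheduler wires every constant onto a pool.
public class MetricReporterSketch {

  enum ToEmit {
    NUM_PENDING_JOBS(() -> 42L, 15),
    // adding a new metric means adding a new constant here, nothing else
    NUM_RUNNING_JOBS(() -> 7L, 15);

    final Supplier<Long> countQuery;
    final long periodSeconds;

    ToEmit(final Supplier<Long> countQuery, final long periodSeconds) {
      this.countQuery = countQuery;
      this.periodSeconds = periodSeconds;
    }
  }

  public static void main(String[] args) {
    final ScheduledExecutorService pool =
        Executors.newScheduledThreadPool(ToEmit.values().length);
    for (final ToEmit metric : ToEmit.values()) {
      pool.scheduleAtFixedRate(
          () -> System.out.printf("emit %s=%d%n", metric.name(), metric.countQuery.get()),
          0, metric.periodSeconds, TimeUnit.SECONDS);
    }
  }
}
```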
* auto-upgrade connectors that are in use only when the new version is a patch release (see the sketch after this commit list)
* update check version docstring
* remove try/catch from hasNewPatchVersion
* refactor write std defs function
* run format
* add unit test and change exception
* update airbyte version function name to be more clear
* correct unit test in migration tests
* run format
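A rough sketch of the patch-only check, assuming plain MAJOR.MINOR.PATCH version strings; `PatchUpgradeSketch` and the exact signature of `hasNewPatchVersion` are assumptions, not the actual definition-update code:

```java
// Illustrative sketch: an in-use connector is only auto-upgraded when the new
// version differs from the current one by the patch component alone.
public class PatchUpgradeSketch {

  /** true only when major and minor match and the candidate patch is strictly newer. */
  static boolean hasNewPatchVersion(final String current, final String candidate) {
    final int[] cur = parse(current);
    final int[] cand = parse(candidate);
    return cur[0] == cand[0] && cur[1] == cand[1] && cand[2] > cur[2];
  }

  private static int[] parse(final String version) {
    final String[] parts = version.split("\\.");
    if (parts.length != 3) {
      throw new IllegalArgumentException("Expected a MAJOR.MINOR.PATCH version, got: " + version);
    }
    return new int[] {
        Integer.parseInt(parts[0]), Integer.parseInt(parts[1]), Integer.parseInt(parts[2])
    };
  }

  public static void main(String[] args) {
    System.out.println(hasNewPatchVersion("0.2.3", "0.2.5")); // true  -> auto-upgrade
    System.out.println(hasNewPatchVersion("0.2.3", "0.3.0")); // false -> explicit upgrade needed
  }
}
```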
* Pass worker metadata to connector
* Fix compilation
* Pass in job id and image from worker
* Remove application version
* Add default job environment variables
* Add back removed comment
* Rename env map to job metadata
* Fix env configs
* Read connector from application
* Use empty string
* Remove println
* Fix unit test
* Fix compilation error
* Introduce constants for worker env
* Add worker env to ENV_VARS_TO_TRANSFER
* Pass into getWorkerMetadata map to all constructions
* Format code
* Format octavia cli
* Fix test compilation
* Fix typos
* test: new failures metadata for segment tracking
* new failures metadata for segment tracking
failure_reasons: array of all failures (as json objects) for a job
- for general analytics on failures
main_failure_reason: main failure reason (as json object) for this job
- for operational usage (for Intercom)
- currently this is just the first failure reason chronologically
- we'll probably change this when we have more data on how to
determine failure reasons more intelligently
- added an attempt_id to failures so we can group failures by attempt
- removed stacktrace from failures since it's not clear how we'd use
these in an analytics use case (and because Segment has a 32kb size
limit for events); see the sketch after this commit list
* remove attempt_id
attempt info is already in failure metadata
* explicitly sort failures array chronologically
* replace "unknown" enums with null
note: ImmutableMaps don't allow nulls
* move sorting to the correct place
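A minimal sketch of the resulting payload shape, assuming a simplified failure record; the class and field names are illustrative, not the actual job tracker code:

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: failure reasons are sorted chronologically, the first one
// becomes main_failure_reason, and stacktraces are left out so the event stays
// well under Segment's 32kb size limit.
public class FailureTrackingSketch {

  record FailureReason(String origin, String type, Instant timestamp) {}

  static Map<String, Object> buildJobMetadata(final List<FailureReason> failures) {
    final List<FailureReason> sorted = failures.stream()
        .sorted(Comparator.comparing(FailureReason::timestamp))
        .toList();
    // plain HashMap on purpose: ImmutableMap/Map.of reject null values, and
    // main_failure_reason is null (not "unknown") when there were no failures
    final Map<String, Object> metadata = new HashMap<>();
    metadata.put("failure_reasons", sorted);
    metadata.put("main_failure_reason", sorted.isEmpty() ? null : sorted.get(0));
    return metadata;
  }
}
```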
* upgrade log4j2 to 2.17.0
* Use colors instead of background.
* Put this back.
* Fix syntax.
* go back to original log colors
* fix comments + remove system property -> env var fallback mechanism
* upgrade all the way to 2.17.1
Co-authored-by: Davin Chia <davinchia@gmail.com>
Co-authored-by: lmossman <lake@airbyte.io>
* add misc todos
* save work so far
* configure async pod processing
* remove comment
* fmt
* working except logging propagation?
* add comment
* add logging and misc configuration fixes
* add output propagation
* fix state reading
* logging is working (but background highlighting is not)
* fix log highlighting
* use sys instead of ctx
* comment
* clean up and test state management
* clean up orchestrator app construction
* unify launcher workers and handle resuming
* respond to comments
* misc
* disable
* fix comment
* respond to comments
* Add a job notification
The new scheduler was missing a notification step after the job is done.
This is needed in order to report the number of records for a sync.
* Acceptance test with the new scheduler
Add a new github action task to run the acceptance tests with the new scheduler
* Retry on failure
* PR comments
- Add comments to the interface methods in Configs.java.
- Add new document on configuring airbyte. Transfer the non internal-only variables to this document.
This gets rid of scheduling via the scheduler app and the job submitter. It is replaced by a Temporal workflow which is responsible for scheduling the syncs on time.
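A minimal sketch of that pattern with the Temporal Java SDK, assuming one long-running workflow per connection; `ConnectionWorkflow`, `SyncActivity` and their methods are hypothetical stand-ins, not Airbyte's actual ConnectionManagerWorkflow:

```java
import io.temporal.activity.ActivityInterface;
import io.temporal.activity.ActivityOptions;
import io.temporal.workflow.Workflow;
import io.temporal.workflow.WorkflowInterface;
import io.temporal.workflow.WorkflowMethod;
import java.time.Duration;
import java.util.UUID;

// Illustrative sketch: instead of a scheduler app polling for due jobs, a
// per-connection workflow sleeps until the next run, triggers the sync through
// an activity, then continues as new for the following cycle.
public class TemporalSchedulingSketch {

  @ActivityInterface
  public interface SyncActivity {
    Duration getTimeUntilNextRun(UUID connectionId);

    void runSync(UUID connectionId);
  }

  @WorkflowInterface
  public interface ConnectionWorkflow {
    @WorkflowMethod
    void run(UUID connectionId);
  }

  public static class ConnectionWorkflowImpl implements ConnectionWorkflow {

    private final SyncActivity activity = Workflow.newActivityStub(
        SyncActivity.class,
        ActivityOptions.newBuilder().setStartToCloseTimeout(Duration.ofHours(2)).build());

    @Override
    public void run(final UUID connectionId) {
      // durable timer: survives worker restarts, unlike an in-process scheduler
      Workflow.sleep(activity.getTimeUntilNextRun(connectionId));
      activity.runSync(connectionId);
      // start the next scheduling cycle with a fresh event history
      Workflow.continueAsNew(connectionId);
    }
  }
}
```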
* Rename GcsStorageBucket to GcsLogBucket.
* Update all references to GCP_STORAGE_BUCKET to GCS_LOG_BUCKET.
* Undo this for configuration files for older Airbyte versions.
* Clean up Job env vars. (#8462)
* Rename MAX_SYNC_JOB_ATTEMPTS to SYNC_JOB_MAX_ATTEMPTS.
* Rename MAX_SYNC_TIMEOUT_DAYS to SYNC_JOB_MAX_TIMEOUT_DAYS.
* Rename WORKER_POD_TOLERATIONS to JOB_POD_TOLERATIONS.
* Rename WORKER_POD_NODE_SELECTORS to JOB_POD_NODE_SELECTORS.
* Rename JOB_IMAGE_PULL_POLICY to JOB_POD_MAIN_CONTAINER_IMAGE_PULL_POLICY.
* Rename JOBS_IMAGE_PULL_SECRET to JOB_POD_MAIN_CONTAINER_IMAGE_PULL_SECRET.
* Rename JOB_SOCAT_IMAGE to JOB_POD_SOCAT_IMAGE.
* Rename JOB_BUSYBOX_IMAGE to JOB_POD_BUSYBOX_IMAGE.
* Rename JOB_CURL_IMAGE to JOB_POD_CURL_IMAGE.
* Rename KUBE_NAMESPACE to JOB_POD_KUBE_NAMESPACE.
* Rename RESOURCE_CPU_REQUEST to JOB_POD_MAIN_CONTAINER_CPU_REQUEST.
* Rename RESOURCE_CPU_LIMIT to JOB_POD_MAIN_CONTAINER_CPU_LIMIT.
* Rename RESOURCE_MEMORY_REQUEST to JOB_POD_MAIN_CONTAINER_MEMORY_REQUEST.
* Rename RESOURCE_MEMORY_LIMIT to JOB_POD_MAIN_CONTAINER_MEMORY_LIMIT.
* Remove worker suffix from created pods to reduce confusion with actual worker pods.
* Use sync instead of worker to name job pods.
* Source-MySql: transform binary data to base64 format, add integration tests (see the sketch after this commit list)
* Source-MySql: fix code style
* Source-MySql: bump versions
* Source-MySql: bump versions in source_specs.yaml
* Source-MySql: added test for stream with binary data for DestinationAbstractTest
* Source-MySql: added format
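A rough sketch of the base64 transformation, assuming a JDBC ResultSet read path; `BinaryToBase64Sketch` and `readBinaryColumn` are illustrative names, not the actual source-mysql code:

```java
import java.nio.charset.StandardCharsets;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Base64;

// Illustrative sketch: BINARY/VARBINARY/BLOB column values are emitted as base64
// strings instead of raw bytes, so the resulting record stays valid JSON.
public class BinaryToBase64Sketch {

  static String readBinaryColumn(final ResultSet resultSet, final int columnIndex) throws SQLException {
    final byte[] bytes = resultSet.getBytes(columnIndex);
    return bytes == null ? null : Base64.getEncoder().encodeToString(bytes);
  }

  public static void main(String[] args) {
    // usage without a database, just to show the encoding step
    final byte[] raw = "\u00de\u00ad\u00be\u00ef".getBytes(StandardCharsets.ISO_8859_1);
    System.out.println(Base64.getEncoder().encodeToString(raw));
  }
}
```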