Commit Graph

218 Commits

Author SHA1 Message Date
Loïc Mathieu
b8f6102b27 fix(system): compilation issue 2025-12-16 14:40:59 +01:00
Loïc Mathieu
c208fe1cb9 fix(system): compilation issue and test failures 2025-12-15 16:52:51 +01:00
Loïc Mathieu
c46e58048b feat(execution): Execution Command
Use a command pattern for all changes to the execution for the Executor to process them.
2025-12-15 16:44:42 +01:00
Florian Hussonnois
b5cda54342 fix(scheduler): fix TriggerSchedulerMonitor
Adds new `trigger_id` to the executions table with index
Adds new method findAllByTrigger on ExecutionRepository
2025-12-15 16:44:42 +01:00
Loïc Mathieu
6f9ae15661 feat(system): move the indexer in its own module
Part-of: https://github.com/kestra-io/kestra-ee/issues/5751
2025-12-15 16:44:42 +01:00
Florian Hussonnois
3252b695bc fix(scheduler): rellocate core package 2025-12-15 16:44:42 +01:00
Loïc Mathieu
b209b6358e chore(system): fix merge issues 2025-12-15 16:44:42 +01:00
Loïc Mathieu
7440855f47 feat(system): metastores refactoring and FlowListeners removal 2025-12-15 16:44:41 +01:00
Florian Hussonnois
bd8a22026f fix(triggers): migrate to new model and fix api
* Rewrites TriggerRepositoryInterface tests
* Migrates Trigger model to TriggerState model
* Migrates next_execution_date to next_evaluation_epoch (fix timezone issue)
* Refactors and cleanup trigger REST APIs
* Migrates trigger.Toggle task to use new trigger event queue
* Removes legacy trigger queue
* Adds migration trigger script
2025-12-15 16:44:41 +01:00
Florian Hussonnois
8f29b09959 feat(scheduler): new scheduler implementation
Introduces new scheduler based on parralel event-loop and
consistent hashing for trigger distribution across scheduler instances
2025-12-15 16:44:41 +01:00
Loïc Mathieu
20dd44a5d7 fix(core): remove PostgresSchedulerScheduleTest as other JDBC impl didn't have it 2025-12-15 16:42:54 +01:00
Loïc Mathieu
3f848454d4 feat(system): refactor concurrency limit 2025-12-15 16:42:54 +01:00
Loïc Mathieu
9d54f4c407 feat(system): move the DefaultServiceLivenessCoordinator to the executor
As it is only started by the executor it should be inside this module
2025-12-15 16:42:54 +01:00
Loïc Mathieu
adcdab7e7e feat(services): use a single service liveness coordinator 2025-12-15 16:42:54 +01:00
Loïc Mathieu
4e50f4c363 feat(system): un-couple queues and repositories 2025-12-15 16:42:54 +01:00
Loïc Mathieu
c532fc3cc8 feat(system): Executor v2 2025-12-15 16:42:54 +01:00
Florian Hussonnois
ed8e810791 refactor(system): extract JdbcQueuePoller class from JdbcQueue
Extract a JdbcQueueConfiguration and JdbcQueuePoller classes from
JdbcQueue to improve clarity, testability and reuse of the code.
2025-12-15 16:22:34 +01:00
Loïc Mathieu
d5ba7e7304 feat(system): add a lock mechanism 2025-12-15 16:22:34 +01:00
Loïc Mathieu
2e2b05c227 chore(system): switch new migrations to V3 2025-12-15 16:22:34 +01:00
Loïc Mathieu
40d33f91d1 feat(flows): remove Templates
Part-of: https://github.com/kestra-io/kestra-ee/issues/3238
2025-12-15 16:22:33 +01:00
Loïc Mathieu
8eae8aba72 feat(executions): add a protection mecanism to avoid any potential concurrency overflow
Concurrency limit is based on a counter that increment and decrement the limit each time a flow is started and terminated.

This count should always be accurate.

But if some unexpected event occurs (bug or user manually do something wrong), the count may not be accurate anymore.

To avoid any potential issue, when we decrement the counter, we chech that concurrency count is bellow the limit before unqueing an execution.

Fixes #12031
Closes  #13301
2025-12-11 14:33:30 +01:00
Shankar
56bb3ca29c Fix week format in filter 2025-12-09 10:00:37 +01:00
Loïc Mathieu
14029e8c14 chore(tests): isolate concurrency related tests in their own class 2025-12-09 09:57:49 +01:00
brian.mulier
682d258e7b feat(ns-files): add a metadata layer on top for better performance & versioned ns files
part of https://github.com/kestra-io/kestra/issues/5617
2025-12-01 18:13:49 +01:00
Roman Acevedo
fbe6df34ca feat(executions): set duration to null for non-terminated execs and fix frontend duration 2025-12-01 16:06:35 +01:00
brian.mulier
c3d94dc8ff fix(tests): remove JdbcTestUtils.drop usages as it defeats concurrent test runs 2025-11-27 19:23:17 +01:00
Loïc Mathieu
1d1a065833 fix(tests): lower termination grace period
If a test mis-behave, for ex starting an execution but not terminating it, as the default termination grace period is 5mn it can take very long time to wait for post-test terminaison.
Switching to a termination grace period of 5s may help.

I also detect that the ExecutionControllerRunner test when launching the test suite, would not properly kill the `sleep-long` flow so waiting for it to complete, or the termination grace period. When a test that use this flow is launched separatly it works properly. As a safety net I reduce the sleep from 5mn to 30s.
2025-11-19 11:58:06 +01:00
Loïc Mathieu
735697ac71 chore(system): share JDBC repository code in an abstract CRUD repository 2025-11-18 11:13:16 +01:00
Loïc Mathieu
8346874c43 chore(system): reduce repository code duplication between OSS and EE
Part-of: https://github.com/kestra-io/kestra-ee/issues/1684
2025-11-17 10:03:45 +01:00
Roman Acevedo
71e49f9eb5 feat(executions): add IN, NOT_IN, CONTAINS LABELS #11916
- advance on https://github.com/kestra-io/kestra/issues/11587
- companion PR: https://github.com/kestra-io/kestra-ee/pull/5617
2025-10-31 10:20:05 +01:00
brian.mulier
07e90de835 fix(core): CrudEvent should not be done on the repository side for KV 2025-10-30 16:57:57 +01:00
Loïc Mathieu
c06ffb3063 feat(system): set taskrun attempt to resubmitted when a taskrun is resubmitted to a worker
Closes https://github.com/kestra-io/kestra/issues/12481
2025-10-30 15:46:05 +01:00
brian.mulier
d9144c8c4f feat(core): introduce KV Metadata in-repository storing (#12342)
part of https://github.com/kestra-io/kestra/issues/12341
2025-10-29 17:18:43 +01:00
Loïc Mathieu
948a5beffa fix(executions): set tasks to submitted after sending to the Worker
When computing the next tasks to run, all task runs are created in the CREATED state.
Then when computed tasks to send to the worker, CREATED task runs are listed and converted into worker task.
The issue is that on the next execution message, if tasks sent to the worker are still in CREATED (for ex because the Worker didn't start them yet), they would still be evaluted as to send to the worker.
Setting them to a new SUBMITTED state would prevent them to be taken into account again until they are really terminated.

This should avoid the deduplicateWorkerTask state but this is kept for now with a warning and would be removed later if it proves to work in all cases.
2025-10-10 11:06:31 +02:00
Loïc Mathieu
4a9564be3c fix(system): refactor concurrency limit to use a counter
A counter allow to lock by flow which solves the race when two executions are created at the same time and the executoion_runnings table is empty.

Evaluating concurrency limit on the main executionQueue method also avoid an unexpected behavior where the CREATED execution is processed twice as its status didn't change immediatly when QUEUED.

Closes https://github.com/kestra-io/kestra-ee/issues/4877
2025-10-09 15:39:59 +02:00
Loïc Mathieu
5c24308e71 fix(executions): evaluate multiple conditions in a separate queue
By evaluating multiple condition in a separate queue, we serialize their evaluation which avoir races when we compute the outputs for flow triggers.
This is because evaluation is a multi step process: first you get the existing condtion, then you evaluate, then you store the result. As this is not guarded by a lock you must not do it concurrently.

The race can still occurs if muiltiple executors run but this is less probable. A re-implementation would be needed probably in 2.0 for that.

Fixes https://github.com/kestra-io/kestra-ee/issues/4602
2025-10-03 10:35:49 +02:00
YannC
296fb2fb7a feat: implement Flows as a DataSource for dashboards (#11439)
* feat: implement Flows as a DataSource for dashboards

* chore: review changes

* fix: method signature changes from another commit apply in new flow fetchData method
2025-10-01 12:57:25 +02:00
Nicolas K.
b294457953 feat(tests): rework runner utils to not use the queue during testing (#11380)
* feat(tests): rework runner utils to not use the queue during testing

* feat(tests): rework runner utils to not use the queue during testing

* test: rework RetryCaseTest to not rely on executionQueue

* fix(tests): don't catch the Queue exception

* fix(tests): don't catch the Queue exception

* fix compile

* fix(test): concurrency error and made runner test parallel ready

* fix(tests): remove test instance

* feat(tests): use Test Runner Utils

* fix(tests): flaky tests

* fix(test): flaky tests

* feat(tests): rework runner utils to not use the queue during testing

* feat(tests): rework runner utils to not use the queue during testing

* test: rework RetryCaseTest to not rely on executionQueue

* fix(tests): don't catch the Queue exception

* fix(tests): don't catch the Queue exception

* fix compile

* fix(test): concurrency error and made runner test parallel ready

* fix(tests): remove test instance

* feat(tests): use Test Runner Utils

* fix(tests): flaky tests

* fix(test): flaky tests

* fix(tests): flaky set test

* fix(tests): remove RunnerUtils

* fix(tests): fix flaky

* feat(test): rework runner tests to remove the queue usage

* feat(test): fix a flaky and remove parallelism from mysql test suit

* fix(tests): flaky tests

* clean(tests): unwanted test

* add debug exec when fail

* feat(tests): add thread to mysql thread pool

* fix(test): flaky and disable a test

---------

Co-authored-by: nKwiatkowski <nkwiatkowski@kestra.io>
Co-authored-by: Roman Acevedo <roman.acevedo62@gmail.com>
2025-09-24 08:18:02 +02:00
Loïc Mathieu
02d9c589fb chore(system): remove the task run page
Part-of: https://github.com/kestra-io/kestra-ee/issues/5174
2025-09-23 14:48:30 +02:00
Roman Acevedo
504f925085 test: make AbstractExecutionRepositoryTest parallelizable (#11295)
* test: make AbstractExecutionRepositoryTest parallelizable

* feat(tests): play jdbc h2 tests in parallel

* fix(tests): failing unit tests

* tests: add await until timeout on some tests

* fix(tests): failing unit tests

* fix(tests): failing unit tests

---------

Co-authored-by: nKwiatkowski <nkwiatkowski@kestra.io>
Co-authored-by: Nicolas K. <nk_mikmak@hotmail.com>
2025-09-17 17:41:10 +02:00
Loïc Mathieu
617daa79db fix(executions): truncate the execution_running table as in 0.24 there was an issue in the purge
This table contains executions for flows that have a concurrency that are currently running.
It has been added in 0.24 but in that release there was a bug that may prevent some records to being correctly removed from this table.
To fix that, we truncate it once.
2025-09-15 17:29:28 +02:00
Loïc Mathieu
be5e24217b chore(system): extract the scheduler to its own module 2025-09-04 11:04:13 +02:00
Loïc Mathieu
a5724bcb18 chore(system): extract the executor to its own module 2025-09-04 11:04:13 +02:00
Florian Hussonnois
3929bf6172 feat(system): add distinct server-events for reporting
Refactor the services used to generate periodic reports on server usage.

Related-to: kestra-io/kestra-ee#3014
2025-08-20 12:20:31 +02:00
Florian Hussonnois
194ae826e5 chore(system): add WorkerJobQueueInterface to properly pass workerId on subscribe 2025-08-12 19:26:31 +02:00
Loïc Mathieu
2acf37e0e6 fix(executions): properly fail the task if it contains unsupported unicode sequence
This occurs in Postgres using the `\u0000` unicode sequence. Postgres refuse to store any JSONB with this sequence as it has no textual representation.
We now properly detect that and fail the task.

Fixes #10326
2025-08-11 11:53:39 +02:00
YannC
c68c1b16d9 fix: set postgres and mysql queue offset as a bigint (#10344) 2025-07-28 16:28:09 +02:00
Loïc Mathieu
ab464fff6e fix(executions): flow concurrency limit not honors when executions are created at a high rate
This is due to the fact that we now process the execution queue concurrently so there is a race when counting currently running executions. This can be seen easily using a ForEachItem as it could create tens or hundreds of executions almost instantly leading to almost all those executions started as they would all see 0 executions running...

Using a dedicated execution running queue, as done in EE, would serialize the messages and fix the issue.

However, if using multiple executor instances and concurrency limit = 1, there is a theoretical race as no locks will be done if no execution is running. A max surge of executions could be as high as the number of executor but this race is less probable to happen in real world scenario.

Fixes #10167
2025-07-25 11:35:00 +02:00
YannC.
b1d41f6f47 fix: handle label filter with and instead or for flow
close #4390
2025-07-22 09:42:45 +02:00
Loïc Mathieu
05b50c22e3 feat(executions): allow suspending an execution at a breakpoint
- When creating an execution, you can pass a breakpoint of the form `taskId.value` and an execution kind.
- An execution with a breakpoint will be suspended in the `BREAKPOINT` state when arriving at the point where the breakpoint task should be executed
- You can resume an execution from a breakpoint, this would resume the execution and remove the existing breakpoint. At this time a new breakpoint can be passed.
- You can pass a breakpoint when replaying an execution.

Part-of: https://github.com/kestra-io/kestra-ee/issues/1547
2025-07-09 10:56:00 +02:00