rfc: Optimized Refresh and Detection of New Objects

Signed-off-by: Martin Atkins <mart@degeneration.co.uk>
2025-12-19 17:59:05 -05:00 · 2025-10-29 11:03:27 -07:00
parent ab51186a30
commit 0062ee2f33
1 changed files with 138 additions and 0 deletions
--- a/rfc/20251029-optimized-refresh.md
+++ b/rfc/20251029-optimized-refresh.md
@@ -0,0 +1,138 @@
 # Optimized Refresh and Detection of New Objects
 By default, OpenTofu's "plan" phase currently includes a step of requesting the
 latest settings for every object tracked in the prior state, to check whether
 something has changed in the remote system outside of OpenTofu's workflow.
 This extra step is useful because it ensures that OpenTofu is planning against
 the true current state of the remote objects rather than a stale snapshot of
 what was current at the end of the previous plan/apply round, but the current
 approach has some drawbacks too:
 - The current provider protocol uses a separate call to refresh each object,
  which means that configurations that include many objects are often slow
  to refresh and may cause API rate limits to be exceeded.
    This tends to cause those with larger configurations to use `-refresh=false`
    to completely disable refreshing, or to use the `-target=...` option
    to work with only small fragments of configuration at a time, both of
    which can cause OpenTofu to be left with an inconsistent view of the
    remote objects, potentially causing problems later.
 - Refreshing individual objects doesn't allow OpenTofu to detect entirely new
  objects that might've been created outside of OpenTofu. We therefore have a
  separate "import" workflow to deal with those, but that workflow is useful
  only if the operator already knows that the additional objects exist.
 This document proposes some changes to OpenTofu's default behavior that should
 hopefully strike a better compromise where for most operators the refresh
 behavior will be a benefit rather than a burden. It also proposes retaining
 something more like the current default behavior as an opt-in, so it will
 still be available for those who wish to prioritize having a completely-updated
 state, and those folks can still benefit from some of the performance
 improvements even when they force full refreshing.
 Related issues:
 - [Option to skip refreshing resource instances whose configuration hasn't changed since the last apply](https://github.com/opentofu/opentofu/issues/1703)
 - [Make Terraliths a Thing of the Past – Enable Scalable Root Modules in OpenTofu](https://github.com/opentofu/opentofu/issues/2860)
 - [More granular state storage, locking, and planning](https://github.com/opentofu/opentofu/issues/2662)
 - [Auto Import Resources If Possible](https://github.com/opentofu/opentofu/issues/2321)
 - [Ability to have OpenTofu automatically import an existing object if it exists, or create it otherwise](https://github.com/opentofu/opentofu/issues/1760)
 ## Proposed Solution
 There are three main parts to this proposal that could potentially be
 implemented separately but that are proposed together because they complement
 each other to produce a better overall system:
 1. Allow providers to optionally optimize their refresh calls by performing
   many at once in a single request to the remote system, wherever the remote
   API has support for that.
    This requires a provider protocol extension.
 2. Introduce a new configuration language feature for describing search queries
   that might discover new remote objects that would be considered to be in
   the management scope of the current configuration.
    This would extend the meaning of "refresh" to also include discovery of
    objects that OpenTofu didn't create, after which the operator can decide
    whether to adopt them into the desired state (by generating configuration)
    or to delete them as unwanted drift.
    The full functionality of this part of the proposal requires a provider
    protocol extension, but partial support is possible with existing provider
    protocol features.
 3. Change OpenTofu's default behavior so that it will only refresh objects whose
   resource instance configurations have changed since the most recent
   plan/apply round or which are dependencies of resource instances whose
   configurations have changed.
    This new compromise gives OpenTofu access to up-to-date information about
    the objects involved in an intentional configuration change while allowing
    unrelated objects to remain stale until a future run.
    The current behavior of always refreshing everything would remain available
    as a new planning option useful for e.g. periodic "drift detection" runs,
    and OpenTofu would also continue to refresh everything in the `-refresh-only`
    planning mode where detecting differences is the primary purpose.
    This does not require a provider protocol change, and so would provide
    immediate benefit regardless of which providers are being used.
 The following subsections describe each of the items above in more detail.
 ### Bulk refresh
 Today's OpenTofu calls the provider protocol's "read managed resource" operation
 separately for each resource instance from the prior state, as part of the
 process of planning each resource instance.
 ### Automatic discovery of new objects
 ### Refreshing only objects whose configuration has changed
 ## Open Questions and Alternatives
 ### Continue refreshing everything by default?
 This document has proposed that we change the _default_ behavior of OpenTofu
 so that it will refresh only the subset of objects whose configurations have
 changed (or that have actions planned for any other reason).
 We could also potentially choose to retain the current default and require
 those who want the new behavior to explicitly opt in to it.
 The proposal to change the default is founded in the assumption that more
 operators would want the new behavior than would want to keep the old behavior,
 that this behavior change is not significant enough to be considered "breaking",
 and that those who need the previous behavior would be able to add the new
 planning option relatively easily.
 This echoes a tradeoff from a similar change made long ago in OpenTofu's
 predecessor:
 Originally the "apply" command performed the plan and apply phases together
 immediately without any interactive confirmation prompt, and so anyone who
 wanted to review their plan before applying it needed to use the saved plan
 workflow, which is pretty inconvenient for those who are running the program
 interactively from a shell prompt.
 Someone proposed adding a new option to enable an interactive mode for "apply"
 which would show the plan and then prompt for confirmation before proceeding.
 Subsequent discussion found that actually the interactive approval prompt was
 the more commonly-needed mode, and so the interactive prompt was implemented
 as the new default behavior and the `-auto-approve` option added for the
 minority who wanted the previous behavior.
 That appears to have been a good decision in the long run, even though it was
 admittedly inconvenient for those who needed to adjust their existing usage
 patterns or wrapper scripts at the time. Similarly, I think that partial
 refreshing is the better default behavior for most operators, and that the
 current full-refresh behavior is more suited to special situations like when
 implementing a "drift detection" system which runs periodically with the
 explicit goal of finding changes made outside of OpenTofu.