Files
opentf/rfc/20251105-use-cobra-instead-of-mitchellh.md
2025-12-12 17:04:44 +02:00

18 KiB

Exploring cobra replacement

Tickets that are targeted to be resolved by this:

When OpenTofu predecessor project started, the options for CLI libraries were slim and were lacking features heavily. Because of that, mitchellh/cli has been created and served its purpose. Now, it is archived and some dependencies of that (e.g.: https://github.com/posener/complete) are not maintained anymore, putting OpenTofu in a weird position where, in order to be able to improve on layers controlled by it, we would need to either fork and maintain those ourselves or switch to another maintained library.

The main road blocks we hit because of using this old library are as follows:

  • Deprecated autocompletion scripts for zsh.
  • Hardcoded handling of the autocompletion scripts. The library used (https://github.com/posener/complete) is writing these scripts to hardcoded file paths without any option to specify the stream to be written into.
  • Flags parsing is scattered all over the place and help text is written and formatted manually for each and every flag.
  • The flags format we have is the golang style which is way more uncommon compared with the POSIX style one.

In this RFC, after several in-depth tests, I want to share a possible solution of switching to cobra.

I chose cobra for this because of its adoption in the go community, of its vast catalogue of features, and for its extensibility. In this RFC I want to list some of the approaches that we could take on making this transition and what would be the steps for it.

Backwards compatibility

From the get go, we need to be clear that by attempting such a change, we need to be mindful about the risks and the promises that we need to keep about the UI of OpenTofu.

Risks and careful consideration:

  • flags parsing could result in different parsed values because of some structures that we have today.
  • flags ordering and propagation between commands could be slightly different. This would be handled by cobra instead and not by custom implementation that we have today.
  • autocompletion by using the previously installed scripts could be broken

Note

Recommended is to add unit tests for these risks before doing any changes.

Ensure that:

  • once a user will switch to the new version, no change will be needed for it to work as before.
  • all the features should work the same as they did before the switch:
    • flags should be parsable with single dash in front (e.g.: -flag).
    • autocomplete should work with the previously installed scripts.
    • the help function(s) should print the same way. The only acceptable change could be that the text will be correctly (automatically) wrapped at 80 chars.

Proposed Solution

Autocomplete

In order to preserve the current functionality and the one provided by cobra we would have to create some kind of a "bridge" between the two.

This means that when autocompletion scripts are already installed into a system by an old tofu binary, we could forward the requests to posener/complete to ensure that the users that didn't update yet the autocompletion scripts, will still have the autocompletion work properly.

To be able to do so, understanding how the options work is crucial.

posener/complete

posener/complete has been created initially as a pure bash autocomplete library. Because of that, it relies heavily on the bash inner works to provide this capability, by using two environment variable:

  • COMP_LINE
  • COMP_POINT

As the library evolved, it included support for others shells (fish, zsh) but it didn't add official support of those, but took the shortest path on achieving this capability by exporting the env vars aforementioned before calling the binary to provide suggestions (E.g.: fish generated script).

In the same time, to make this work for zsh, it's not using the built-in functionality for autocompletion, but instead it loads the bashcompinit to achieve this (which is bash specific library that zsh offers for compatibility purposes).

spf13/cobra

Cobra instead, relies on an automatically included, hidden __complete command. This command contains a generic logic for handling autocompletion based on the arguments received. The scripts that are generated for each supported shell are baked in cobra, use official completion APIs of the targeted shell and converts the specific shell information in values understandable by the __complete command.

Here is a list of the scripts cobra generate and their associated official documentation for reference:

  • bash - official docs (programmable completion and builtins)
    • Uses complete builtin function together with other functions to provide the aforementioned env vars for the target application to provide completion suggestions.
  • zsh - official docs
    • To be able to use source <(tofu completion zsh), it uses the compdef function as the second line of the completion script. This works similarly with the complete command in bash.
    • To allow zsh recommended way to install these scripts, the first line from the cobra script contains #compdef directive. This is used by zsh to lazy load the script.
  • fish - official docs
  • powershell - official docs
    • Uses Microsoft PS Core Register-ArgumentCompleter to register a completion script that calls into the cobra logic.

One possible downside of the cobra approach around this is about the length of the scripts. Because for each shell, the script is responsible with converting the shell specific information in arguments for the cobra command, the scripts tend to be quite complex. But the advantage of having those scripts baked in cobra is that the maintenance of those is ensured by the community supporting cobra from different shell communities. Therefore, for the foreseeable future, the scripts will be updated with the best practices for each shell and we can inherit those with each update of the library.

"Bridge" between old and new autocompletion scripts

To ensure backwards compatibility, posener/complete should be the first called to provide autocompletion suggestions if possible. When it will provide none, we will allow cobra to execute, which internally will provide suggestions if __complete command has been invoked. The logic for such a "bridge" is already in mitchellh/cli. The idea is quite simple: since posener/complete relies on the COMP_LINE env var to execute its logic, it checks if that is configured and if it isn't returns false announcing that it could not provide suggestions. The idea that I sketched is exactly like that, where I do the same check, and if it returns true, I build the autocompletion context required by posener/complete and run its logic. Check this for details.

Another advantage of using cobra is that the autocompletion scripts can be written to any buffer (default stdout), which allows us to do several things:

  • allow users to source the scripts directly, without the need to write those to a file. Helpful when OpenTofu is executed on a system where the files like .zshrc are read only.
  • allows generating the scripts right before the release and package those in the delivery archives for each OS.
    • a good idea got during reviews, was to not generate the scripts during the release, but instead to generate the scripts by using go generate in the normal development flow and include the files in the final binary and serve it like that. This way, we can configure goreleaser to include the already generated files without relying on the binaries compiled during release. An added benefit in doing this is that the scripts will be reviewable before release. Such an example can be seen in the experiment repo.

Cobra offers a ValidArgsFunction that can calculate on the fly the valid arguments for a command. One such example can be seen in tofu workspace select.

Flags

When it comes to flags, this is more of a sensitive topic since these control OpenTofu in a more functional manner, so we need to ensure that the parsing of the values is not affected. One other thing that we need to be sure of, is that the position of these will not raise errors to the users once they start using a new version that will include the cobra integration.

Problem statement

Note

The issue with how in general flags are handled in OpenTofu, is a topic for another RFC.

In this one we will explore only the challenges of migrating from the flags format we have to POSIX compliant ones.

Because OpenTofu uses golang stdlib flag package, and because that lib supports long format flags with a single dash in front (e.g.: -flagname), the migration to a POSIX compliant format (e.g.: --flagname) is more challenging especially since we don't want to break already configured CI/CD that uses go style flags.

In order for tidying up and making the flags handling more clear in OpenTofu, we strive towards defining all the flags by using spf13/pflag, which works hand in hand with spf13/cobra.

The next two sections shows different approaches on achieving a migration without breaking existing flows.

Copy spf13/pflag/FlagSet to flag/FlagSet

This was a first attempt since the golang style flags do support single and double dashes for the defined flags (getting us closer to POSIX formatted flags).

spf13/pflag offers a functionality to copy the flags defined into the golang standard lib flag.FlagSet.

This would achieve 2 things:

  • Common way to define flags by using pflag
  • A backwards compatible way to parse flags from the arguments

One downside that I encountered while testing this is related to the integration of this approach in cobra.

To be able to use this approach, we would do the following:

  • defined flags using pflag
  • disable cobra flags parsing
  • copy the flags to stdlib flag.FlagSet
  • use the stdlib flag.FlagSet.Parse(os.Args)

Because of the 2nd step, since cobra does not parse the flags from the args, it will pass all the args (including flags) into the args of the execution function (Run, PreRun, etc) of each command (root, sub command, etc).

For example, having this command line:

tofu -chdir=test apply -auto-approve planfile

By having the flags parsing disabled, all the execution functions of the commands will receive exactly the same args slice: ["-chdir=test", "-auto-approve", "planfile"]. I am talking here about the rootCmd.PersistentPreRun, rootCmd.Run/RunE and applyCmd.Run, etc.

This adds maintenance cost, mixing of concerns and a quite tangled way to have logic bound to the flags.

Note

As noted in the official documentation, tried also to use Command.TraverseChildren to force args given to each command object to contain only its defined flags, but that works only when Command.DisableFlagParsing = false, which breaks the first requirement we have to be able to copy the flags in the flag stdlib.

Use the spf13/pflag parsing

To use this approach, is quite straight forward:

  • For each command define its flags
  • Those will be parsed before execution of the command

One big benefit that we have with this is that combined with Command.TraverseChildren, each *cobra.Command object receives strictly the arguments and not the flag values, flags being already parsed and injected in the structs used to configure the flags.

An additional advantage compared with the copying of the flags, is that for any complex flag configured with a custom type behind right now, we avoid having its backing structure handle the parsing in unknown ways.

Probably till now, you asked yourself, ok, but what about backwards compatibility? For this approach to work, the backwards compatibility would be covered by a silly hack visible here.

Considering that right now all the OpenTofu's flags are long form, ensuring that each argument that looks like a single-dash flag should be converted to a double-dash one, is safe enough to unlock all the inner functionality of cobra.

For the TF_CLI_ARGS I don't have a specific proposal, but there are multiple ways of doing it:

  • similar with what we have today where we alter the os.Args before executing cobra command
  • process this in the executed command PreRun function
  • parse again the command defined flagset against the TF_CLI_ARGS env var and merge it with the struct that was parsed directly by cobra (following the precedence in place right now)
  • out of this scope, but we could use also viper that can bind to flags and when configs loaded, it follows the same precedence as we have now.

Help text

Once we have all the things mentioned above, cobra offers the option to customise the help by our needs. There is already a good example wrote in my experiment. The function is configured on the root command and any other sub command makes use of it when -h/--help is provided.

For backwards compatibility of this part, we should render the flags with one dash only, but personally, I am inclined generating these with 2 dashes in front to have a smooth transition for the newcomers.

Open Questions

  • Why there are flags that are hidden or never visible? Like -install-autocomplete.
    • Asking because I would like to make those visible with this change

Future Considerations

This RFC is tackling the most important aspects of such a migration but there are still a lot to be done around details of this:

  • TF_CLI_ARGS
  • Streams and View/UI proper handling
  • CLI configuration before chdir and provider sources
  • TF_REATTACH_PROVIDERS
  • cleanup provider clients
  • Suggestions for misspelled commands. Something like this

We might consider including opentofu#3050 under these changes too.

When implementing this, we might want to consider having this under an experimental flag that will allow users to opt-in the new CLI integration for one minor version, and in the next minor version, we can change the meaning of the experimental flag from enabling the new CLI to enable the old CLI, making the new implementation the default one.

Potential Alternatives

  • Do not migrate to a new library and try to do the most we can by forking the existing ones and just add the missing features
  • Do nothing :(
  • Check other libraries

Additional resources

Cobra completion command reference (link). Cobra user guide (link)