Overcoming YAML Hell in Build Pipelines

 

Photo by Osman Rana on Unsplash

TL;DR: YAML can be a wolf in sheep’s clothing when used for CI infrastructure. Alternatives include moving to build systems like CAKE/FAKE, or using TeamCity and its Kotlin DSL. Both approaches have their own pros and cons. NUKE, of which I’m the author, combines the strengths of both: the flexibility of a build system, and CI-specific features like parallelization and build queue optimizations.

If you dive into the DevOps world, chances are high that you’ll meet YAML around the next corner. For some tools, like Docker and Kubernetes, I think it’s a good match. However, for CI infrastructure it often becomes a nightmare, and I’m not alone in feeling this way. Recently, a tweet by Jeff Fritz started a debate about YAML in DevOps, and the general consensus can be summarized as:

Welcome to the world of YAML pain

Many of us only use YAML reluctantly. Personally, I would even say that the idea of Configuration as Code is a lie, because it doesn’t feel like coding (more about this in the next section). Yet, almost every CI/CD service out there is YAML-first.

Some try to steer clear of YAML, only to replace it with other suboptimal solutions. For instance, Jenkins uses Groovy-flavoured configuration files, which aren’t really great either.

By the way, Azure Pipelines actually provides some better tooling around editing YAML, but apparently, it doesn’t really help:

YAML CAN KISS MY &SS

But let’s try getting more to the bottom of this.

What’s wrong with YAML

While I see how YAML configuration can be attractive, I truly believe that for CI/CD purposes, this is only because the sample pipelines in talks and blog posts are often just that: samples. YAML has a bad reputation for several reasons. Some of the most important ones, in particular for CI/CD, are:

  • It imposes long feedback loops. Typically, the only way to test your configuration is to commit your changes to the repository. Then the CI server needs to pick up the changes, and finally you might need to wait until an agent is available to trigger a new build. This adds up to a lot of time, especially given the following pitfalls.
  • It’s error-prone. We might indent too much or too little, mistype a well-known property, or forget to escape properly without even knowing. Almost any input is syntactically valid YAML, so such mistakes go unnoticed, and there is no proper syntax highlighting. There are schema files that enable rudimentary code completion, but as a C# developer, this still feels clunky.
  • It’s not refactoring-safe. This is a matter of tooling again. Whenever we’re dealing with IDs and their references, the best choice we have is Search & Replace. This should only be the last resort.
  • It’s declarative, not imperative. Not everyone needs imperative logic, but usually there comes a time when you want to iterate over a collection, filter items, write some more complex conditions, and other funky stuff. YAML is just the wrong format for that.
  • It causes vendor lock-in. Each CI system has its very own format. Switching to a different CI system becomes non-trivial, as we have to rewrite the complete configuration.
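To make the declarative-versus-imperative point concrete, here is a minimal C# sketch of the kind of logic that is awkward to express in YAML: iterating over a collection, filtering it, and applying a condition. The project names and the isNightly flag are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical inputs: project names and a flag that a CI trigger might set.
var projects = new List<string> { "Core", "Core.Tests", "Web", "Web.Tests", "Web.IntegrationTests" };
var isNightly = false;

// Pick the test projects, skipping slow integration tests unless this is a nightly build.
var selected = projects
    .Where(p => p.EndsWith("Tests", StringComparison.Ordinal))
    .Where(p => isNightly || !p.EndsWith("IntegrationTests", StringComparison.Ordinal))
    .ToList();

// Emit one command per selected project.
foreach (var project in selected)
    Console.WriteLine($"dotnet test {project}");
```

In a general-purpose language this is a handful of lines with full IDE support. In most YAML dialects it would require vendor-specific expressions or an inline script.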

One more important fact is that many YAML configurations define inline Bash or PowerShell scripts. Typically, those make it hard to use any kind of IDE tooling. However, in JetBrains IDEs we can use language injections:

So I've recently been working on some @GitHub Actions with steps written in shell. The bad thing is that I was doing that in YAML, so the syntax highlighting... oh, wait a sec... 😎 @intellijidea #github #intellij

Seems like JetBrains has at least partially fixed YAML! 🤓

Modern Configuration as Code

Hilarious. Calling it modern almost sounds like a second attempt to make it actually attractive. Following the same discussion from earlier, Jeff Fritz is on to something:

I would rather have a scripting language over this and generate the YAML in the appropriate dialect. Extra space? Whoops, your build didn't execute properly.

In fact, TeamCity has supported this approach since 2016. We can use the Kotlin DSL to implement our complete build pipeline, which TeamCity then converts internally to its own runner format, which is XML. I’m absolutely not a Java or Kotlin developer, but writing Kotlin scripts is actually pretty decent and discoverable when using IntelliJ IDEA. We get all the IDE features like syntax highlighting, code completion, navigation, and refactorings:

Writing Kotlin in IntelliJ IDEA

In my opinion, this is much better than YAML. We don’t have to commit our configuration just to realize that we missed an indentation, or mistyped a reference. Our IDE will tell us right away whether something is semantically broken. As a bonus, whenever we feel lost in the Kotlin DSL, we can fall back to the UI wizards and let TeamCity show us the particular configuration as Kotlin code:

Viewing DSL from UI wizard
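The idea from the tweet, writing the pipeline in a real language and generating the YAML from it, can be sketched in a few lines of C#. This is a deliberately tiny, hypothetical generator (the GenerateWorkflow function and its parameters are my own invention for illustration), not how TeamCity or any real tool implements it:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical generator: emits a minimal GitHub-Actions-like workflow with one job per image.
// Indentation is produced by code, so a stray space cannot sneak in by hand.
string GenerateWorkflow(string name, IEnumerable<string> images, IEnumerable<string> targets)
{
    var command = "./build.cmd " + string.Join(" ", targets);
    var yaml = new StringBuilder();
    yaml.AppendLine($"name: {name}");
    yaml.AppendLine("on: [push]");
    yaml.AppendLine("jobs:");
    foreach (var image in images)
    {
        yaml.AppendLine($"  {image}:");
        yaml.AppendLine($"    runs-on: {image}");
        yaml.AppendLine("    steps:");
        yaml.AppendLine($"      - run: {command}");
    }
    return yaml.ToString();
}

// Usage: two images, two targets.
Console.Write(GenerateWorkflow("continuous", new[] { "ubuntu-latest", "macOS-latest" }, new[] { "Test", "Pack" }));
```

The point is not the generator itself, but that the input is ordinary typed code that an IDE can check, complete, and refactor.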

Getting into Build Systems

If you’re a .NET developer, I can understand if you’re feeling reluctant to use the Kotlin DSL. After all, I’m a huge fan of the philosophy of using the same language for the build implementation as for the rest of the project [1]. Following this philosophy, and using build systems such as FAKE, CAKE, or BullsEye, …

The YAML is essentially: run build.cake

Great thing, right? We gain the benefit of being only loosely coupled to the CI system, so we don’t experience vendor lock-in, and can easily switch if we need to. Another plus is that we can easily execute the build locally, which makes troubleshooting much easier. However, this approach also comes with a drawback: we deliberately avoid using features of the CI system like parallelization of tasks or build queue optimization. We basically gain portability and ease of use at the cost of the value the CI system provides.

Merging Approaches

Can there be a way to get the best of both worlds without any of the disadvantages? For sure. What it boils down to is what Damian Hickey suggested a little earlier already:

Going to see if I can run individual bullseye defined targets-as-steps. Will share how I get on.

This means that we use both a CI configuration and a build system, where the CI configuration defines multiple steps, each invoking the build system with a separate target. Typically, we get much better log output this way, and also allow the CI system to gather statistical data. However, even if the CI configuration is quite simple, writing it ourselves still has the potential to break things. For instance, when a target or input parameter name gets changed, we need to update the configuration. Secondly, it’s hard to share state between different target invocations. For instance, one target might calculate an in-memory list, which should be reported in the next target. How would that work?

Integration with NUKE

NUKE is a build system that I’m working on, which is similar to CAKE and FAKE. One unique aspect of NUKE is that it can generate the CI configuration from the C# build implementation itself. It currently supports Azure Pipelines, AppVeyor, GitHub Actions, TeamCity, GitLab, and JetBrains Space. Let’s start with a simpler example and see how we can use GitHub Actions for our CI build:

[GitHubActions(
    "continuous",
    GitHubActionsImage.UbuntuLatest,
    GitHubActionsImage.MacOsLatest,
    On = new[] { GitHubActionsTrigger.Push },
    InvokedTargets = new[] { nameof(Test), nameof(Pack) },
    ImportSecrets = new[] { nameof(SlackWebhook), nameof(GitterAuthToken) })]
partial class Build : NukeBuild
{
    public static int Main() => Execute<Build>(x => x.Pack);
    
    [Parameter("Gitter Auth Token")] readonly string GitterAuthToken;
    [Parameter("Slack Webhook")] readonly string SlackWebhook;
    
    Target Test => /* ... */
    Target Pack => /* ... */
}

In the example above, we’re adding the GitHubActionsAttribute to our build class (Line 1) to define a new workflow called continuous. The workflow invokes the Test and Pack targets (Line 6) on two different images (Lines 3-4) whenever we push new changes (Line 5). Additionally, we import the secrets GitterAuthToken and SlackWebhook (Line 7). Note that everything is refactoring-safe! Images and triggers are defined via enumerations. The targets and parameters are referenced with the nameof operator. If we rename them, our CI configuration will change as well. Finally, here’s the generated YAML file based on our attribute:

# <auto-generated />

name: continuous

on: [push]

jobs:
  ubuntu-latest:
    name: ubuntu-latest
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v1
      - name: Run './build.cmd Test Pack'
        run: ./build.cmd Test Pack
        env:
            SlackWebhook: ${{ secrets.SlackWebhook }}
            GitterAuthToken: ${{ secrets.GitterAuthToken }}
  macOS-latest:
    name: macOS-latest
    runs-on: macOS-latest
    steps:
      - uses: actions/checkout@v1
      - name: Run './build.cmd Test Pack'
        run: ./build.cmd Test Pack
        env:
            SlackWebhook: ${{ secrets.SlackWebhook }}
            GitterAuthToken: ${{ secrets.GitterAuthToken }}
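The refactoring safety described above rests on the nameof operator, which the compiler resolves to the identifier's name. Here is a minimal standalone sketch of that mechanism. The Test and Pack functions are hypothetical stand-ins; in a NUKE build they would be Target properties of the build class.

```csharp
using System;

// Hypothetical stand-ins for build targets.
void Test() { }
void Pack() { }

// nameof is resolved at compile time to the identifier's name as a string.
// Renaming Test or Pack via an IDE refactoring updates these references too,
// and a stale name fails to compile instead of silently breaking the pipeline.
var invokedTargets = new[] { nameof(Test), nameof(Pack) };
var command = "./build.cmd " + string.Join(" ", invokedTargets);

Console.WriteLine(command); // prints "./build.cmd Test Pack"
```

Compare this with a hand-written YAML file, where the same command is a plain string that no compiler ever looks at.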

A more complex example is the following usage of the TeamCityAttribute that generates a configuration for TeamCity:

[TeamCity(
    TeamCityAgentPlatform.Windows,
    VcsTriggeredTargets = new[] { nameof(Pack), nameof(Test) },
    NightlyTriggeredTargets = new[] { nameof(Test) },
    ManuallyTriggeredTargets = new[] { nameof(Publish) })]
partial class Build : NukeBuild
{
    AbsolutePath TestResultDirectory => OutputDirectory / "test-results";
    
    [Partition(2)] readonly Partition TestPartition;
    IEnumerable<Project> TestProjects => TestPartition.GetCurrent(Solution.GetProjects("*.Tests"));

    Target Test => _ => _
        .DependsOn(Compile)
        .Produces(TestResultDirectory / "*.trx")
        .Produces(TestResultDirectory / "*.xml")
        .Partition(() => TestPartition)
        .Executes(() =>
        {
           // Test invocation
        });

}

This time we won’t look at the generated code, but point out individual features:

  • Nightly builds are defined via the NightlyTriggeredTargets property (Line 4). Again, we can reference the Test target with the nameof operator. Internally, NUKE will generate a scheduled trigger.
  • Manual builds are defined via ManuallyTriggeredTargets (Line 5). In this case, we choose to represent the Publish target as a deployment build configuration.
  • Parallelization can be achieved in three easy steps. Firstly, we declare a TestPartition object along with its size (Line 10). Secondly, we assign the partition to the Test target by calling Partition(() => TestPartition) (Line 17). This causes TeamCity to use a composite build configuration with multiple sub-configurations according to the partition size. In the last step, we use the partition to get the current slice of test projects for the currently running sub-configuration (Line 11).
  • Publishing and consuming artifacts is just a matter of calling Produces(...) and Consumes(...) (Lines 15-16). This can be used to forward data to a subsequent target, or to provide file downloads through the TeamCity UI.
  • Build queue optimization is more of an implicit feature in TeamCity that comes along with the separation of different targets. Whenever a target has already been executed and could be reused, for instance when the relevant files haven’t changed, TeamCity will happily do that and save you resources.
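To illustrate the partitioning idea from the list above, here is a rough sketch of how a collection could be split into slices so that each sub-configuration only processes its own share. The round-robin strategy and the GetCurrent helper are assumptions of mine for illustration. NUKE's actual Partition type may distribute items differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical round-robin slicing: the item at position i belongs to part (i % partCount) + 1.
IEnumerable<T> GetCurrent<T>(IEnumerable<T> items, int partIndex, int partCount) =>
    items.Where((_, position) => position % partCount == partIndex - 1);

// Made-up test projects; partCount mirrors the [Partition(2)] declaration.
var testProjects = new[] { "A.Tests", "B.Tests", "C.Tests", "D.Tests", "E.Tests" };
const int partCount = 2;

// On a CI server each part would run in its own sub-configuration; here we simply loop.
for (var partIndex = 1; partIndex <= partCount; partIndex++)
{
    var slice = string.Join(", ", GetCurrent(testProjects, partIndex, partCount));
    Console.WriteLine($"Part {partIndex}/{partCount}: {slice}");
}
```

With the inputs above, part 1 gets A.Tests, C.Tests, and E.Tests, while part 2 gets B.Tests and D.Tests. Every project is covered exactly once across the parts.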

Here are a few illustrations of how things look in TeamCity, including the Run Build dialog, which automatically exposes all parameters declared in the build class:

If you want to learn more about NUKE and its CI integration, check out the documentation.

One remaining issue is to allow different build steps to share state on a .NET process level. For instance, changes to a field of type List<Data> should be available in the next build step. A possible solution is to deserialize the build object before a build step is invoked and serialize it again afterwards. TeamCity actually provides a great extension point with the .teamcity directory, which is automatically published as a hidden artifact. Other CI systems have not been evaluated for this functionality yet.
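As a rough sketch of that serialize/deserialize idea, the following uses System.Text.Json to carry a list from one step to the next. It is only an illustration under simple assumptions, not something NUKE implements: the state file location and the package names are made up, and both steps run in one process here, whereas on a CI server they would be separate processes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

// Hypothetical state file; on TeamCity the .teamcity directory could serve this purpose.
var stateFile = Path.Combine(Path.GetTempPath(), "build-state.json");

// Step 1: a target computes a list in memory, and the state is serialized afterwards.
var computed = new List<string> { "Core.1.0.0.nupkg", "Web.1.0.0.nupkg" };
File.WriteAllText(stateFile, JsonSerializer.Serialize(computed));

// Step 2: before the next target runs, the state is deserialized again.
var restored = JsonSerializer.Deserialize<List<string>>(File.ReadAllText(stateFile))
               ?? new List<string>();

Console.WriteLine($"Restored {restored.Count} items: {string.Join(", ", restored)}");
```

A real implementation would have to deal with non-serializable members and with making the file available on whichever agent runs the next step.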

Conclusion

As far as I’m concerned, extending build systems to generate CI configuration files is an interesting way to provide a better developer experience when building in different environments. Publishing artifacts, nightly builds, parallelization, build queue optimizations, and other gems are just an attribute away. We don’t need to know CI systems inside out, and there is much less effort involved when we switch between them.

Skip the YAML pain and try NUKE!

  1. Credits to Gary Ewan Park