May 8th, 2015
Configuration management from Git to Consul
Author: Ryan Breen
Cimpress (the parent company of Vistaprint) runs on a large number of geographically distributed computing systems. Configuring them has always been a challenge, but that challenge has grown considerably now that we have begun shifting to a microservices architecture. The scale, geographic reach, and volatility of these systems will all increase dramatically as we make this shift, and traditional systems of configuration management are poorly suited to that environment.
Historically, we relied on a configuration system that was database-centric. This was a good fit for a more centralized architecture, but it works poorly once you spread out geographically. To maintain the same design, the database must be replicated everywhere your application runs, and that dependency can be cumbersome.
Another issue is that we’re moving some critical home-built systems to open-source alternatives that are driven by configuration files. With a database-centric solution, we’d have to teach these third-party tools to talk to a configuration database, or write some mediation layer to concoct configuration files from database records, both of which feel like swimming upstream.
Ultimately, we decided that files are the most straightforward way to reason about a unit of configuration data. Even our home-built applications know how to read configurations from files, and it’s hard to imagine a third-party platform that doesn’t have first-class support for files.
Selecting a System of Record
If files are your unit of configuration, what should you use as the source of truth about the state of the system? Systems like Apache Zookeeper work well for this, and we did give Zookeeper serious consideration. Ultimately, we decided that Zookeeper is cumbersome to configure, and that cost must be paid in every datacenter. We also decided that Zookeeper is overly reliant on the health of its server nodes, and we were concerned about adding another single point of failure to a complex distributed system.
We opted, instead, to use another wildly successful open-source project as the system of record for our configuration files: Git. Think about all the reasons why revision control systems are so valuable. They give you an audit trail for all changes. They give you exceptional tools for analyzing what has changed from revision to revision and from branch to branch. And in our case, we already had a Git solution (Atlassian’s Stash) federated with Active Directory. This meant that all of the work to define who should be allowed to modify what had already been done for us: teams using the system could simply set permissions for configuration repos in Stash, as they would for any other repo.
While Git makes a good solution for centralized storage of configuration files, we weren’t as sold on orchestrating configuration pushes with Git’s clone or pull operations. For one, we always want to keep the installed footprint low on our servers, and a Git client feels like a clunky dependency to add. Also, there’s no clean mechanism to push changes to our distributed set of servers, so the only option would be to have each server poll Git for updates. This could be burdensome for our shared Git infrastructure, and it increases the latency in moving configurations around the network. Finally, the granularity of Git is bound at the repo level: if you have a configuration repo with thousands of files, a server that needs only two of those files still needs to clone and pull the entire repo.
What we really wanted was a low-latency mechanism to push changes to servers, where servers subscribe to the specific set of changes relevant to them. One project we’ve had our eye on for a bit is Consul from Hashicorp. We’ve had a lot of success using other Hashicorp tools like Vagrant and Packer, and Consul seemed like a really intriguing fit for our needs. It provides a distributed key-value (“K/V”) store with rich support for long-polling requests, solving our desire to push changes to subscribing nodes.
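To make the long-polling idea concrete, here is a minimal sketch of how a subscriber might watch a K/V prefix through Consul's HTTP API. It relies on the blocking-query parameters (index, wait), the X-Consul-Index response header, and the base64-encoded Value field that Consul returns for K/V reads. The function names, the example prefix, and the 5-minute wait are assumptions for illustration, not part of our tooling, and error handling is left out.

```python
import base64
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def blocking_kv_url(base_url, prefix, last_index, wait="5m"):
    """Build a blocking-query URL for every key under `prefix`.

    Consul holds the request open until something under the prefix changes
    past `last_index`, or until `wait` elapses.
    """
    query = urlencode({"recurse": "true", "index": last_index, "wait": wait})
    return f"{base_url}/v1/kv/{quote(prefix)}?{query}"


def decode_kv_response(body):
    """Turn a Consul K/V JSON response into a {key: bytes} dict.

    Consul base64-encodes values and reports an empty value as null.
    """
    entries = json.loads(body)
    return {e["Key"]: base64.b64decode(e["Value"] or "") for e in entries}


def watch(base_url, prefix):
    """Yield the current contents of `prefix` each time the index moves.

    Hypothetical helper: no retries or timeouts, and a prefix that does not
    exist yet (HTTP 404) will surface as an HTTPError from urlopen.
    """
    index = 0
    while True:
        with urlopen(blocking_kv_url(base_url, prefix, index)) as resp:
            new_index = int(resp.headers["X-Consul-Index"])
            body = resp.read()
        if new_index != index:
            index = new_index
            yield decode_kv_response(body)
```

A subscriber would iterate over watch("http://localhost:8500", "web/config/") against its local Consul agent. Each iteration blocks until Consul reports a change, so no traffic flows while nothing is changing.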
Consul also provides an elegant mechanism for service discovery. That doesn’t buy us a whole lot for the topic of configuration management, but it does give us another reason to tolerate the minimal footprint of Consul on server nodes. In fact, multiple teams are interested in adopting Consul for its service discovery features alone, so it’s an easy sell as our configuration distribution mechanism.
Integrating Consul and Git
Consul gave us a robust mechanism for moving data around the network, but we still needed components that would put that configuration data into Consul and then extrude it onto the filesystem. Neither existed, so we built and open-sourced them. git2consul runs on centralized infrastructure, mirroring Git repositories into Consul’s K/V store. fsconsul runs on machines that wish to subscribe to configuration data and translates relevant K/Vs into files.
git2consul monitors a set of Git repositories and branches and turns each file into a Consul K/V. It’s based on the principle that a Value in Consul’s K/V store might as well be an entire configuration file. Since a Value can be up to 512 KB, we have plenty of headroom for any realistic configuration data.
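The file-to-K/V mapping can be illustrated with a short Python sketch. This is not git2consul's actual code (git2consul is a separate project with its own implementation); it only shows the principle of one file becoming one Value. The repo/branch/path key layout and the function name are assumptions chosen for the example.

```python
import os

MAX_VALUE_BYTES = 512 * 1024  # the Value size limit mentioned above


def files_to_kv(checkout_dir, repo_name, branch):
    """Map every file in a Git checkout to a K/V pair.

    Keys look like "<repo_name>/<branch>/<path within repo>" (an assumed
    layout); values are the raw file contents.
    """
    kv = {}
    for root, dirs, files in os.walk(checkout_dir):
        # Skip Git's own metadata directory.
        dirs[:] = [d for d in dirs if d != ".git"]
        for name in files:
            path = os.path.join(root, name)
            rel = os.path.relpath(path, checkout_dir).replace(os.sep, "/")
            with open(path, "rb") as fh:
                data = fh.read()
            if len(data) > MAX_VALUE_BYTES:
                raise ValueError(f"{rel} is too large for a single Value")
            kv[f"{repo_name}/{branch}/{rel}"] = data
    return kv
```

Each resulting pair would then be written to Consul with an HTTP PUT to /v1/kv/<key>, with the file contents as the request body.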
Git is not designed to protect confidential information, so we created the gosecret utility to support end-to-end encryption of secrets within configuration files. gosecret ensures that configuration contents that should be kept secret are only viewable by two parties: the user with the ability to read or grant these privileges (for example, the operator allowed to grant a new database password) and the system on which these privileges should be read. Any other user will only see an opaque string of data, so it’s safe to put all configuration files in world-readable source control.
fsconsul integrates gosecret as a shared library, decrypting any ciphertext found in configuration data before writing the data to disk. This avoids any potential race conditions where encrypted data sits on disk for a couple of seconds waiting for a decryption routine to run. fsconsul also relies on Consul’s built-in long-polling support, blocking and waiting for notification of a change. Network utilization is thus kept as low as possible: changes to configuration data are sent over the network only to the set of nodes subscribing to that configuration.
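The decrypt-then-write ordering described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not fsconsul's implementation: the function name is hypothetical, and the decrypt parameter is a stand-in for gosecret, defaulting to a no-op. The point is that decryption happens in memory and the file is moved into place atomically, so a reader never sees ciphertext or a half-written file.

```python
import os
import tempfile


def write_kv_to_dir(kv, prefix, target_dir, decrypt=lambda data: data):
    """Write each K/V under `prefix` to a file beneath `target_dir`.

    `kv` is a {key: bytes} dict. `decrypt` is a hypothetical stand-in for a
    gosecret-style routine; it runs before anything touches the disk.
    Returns the list of paths written.
    """
    target_dir = os.path.abspath(target_dir)
    written = []
    for key, value in sorted(kv.items()):
        if not key.startswith(prefix):
            continue
        rel = key[len(prefix):].lstrip("/")
        if not rel or rel.endswith("/"):
            continue  # a bare prefix or folder-style key, not a file
        dest = os.path.abspath(os.path.join(target_dir, *rel.split("/")))
        # Refuse keys that would escape the target directory (e.g. "..").
        if os.path.commonpath([target_dir, dest]) != target_dir:
            raise ValueError(f"key {key!r} escapes {target_dir}")
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        # Write to a temp file in the same directory, then rename atomically.
        fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(dest))
        with os.fdopen(fd, "wb") as fh:
            fh.write(decrypt(value))
        os.replace(tmp_path, dest)
        written.append(dest)
    return written
```

Because the temporary file lives in the destination directory, os.replace stays on one filesystem, and the final rename is atomic on POSIX systems.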
The combination of these tools yields a pipeline that starts with robust, versioned configuration management as source. Configuration data is then pushed securely and in near real-time directly to subscribing nodes. There is no wasted traffic. There is no need to wonder who authorized a configuration change. We call this system Cicero.
Perhaps our favorite aspect of the system, though, is how composable it is: all the pieces are designed to work with each other, but you can use individual components if that’s all you need. For example, we created the hiera-gosecret project to support encrypting individual configuration elements managed by Puppet’s Hiera system. For some teams, that’s the only aspect of Cicero they need.
Cicero meets all of our goals and does so with minimal custom code to maintain. It’s a system built by integrating best-of-breed solutions into a cohesive whole.