Scalability of CFEngine and Puppet

UPDATE: Please see the updated post instead: http://www.blogcompiler.com/2012/09/30/scalability-of-cfengine-and-puppet-2/

NOTE: These tests were carried out at the beginning of 2011, with what were then the newest stable versions of CFEngine and Puppet (from the CFEngine web site and the SuSE Linux repository, respectively). New versions of both solutions have been released since.

Introduction

CFEngine and Puppet are configuration management solutions that can help you automate IT infrastructure. Practical examples include adding local users, installing Apache, and making sure password-based authentication in sshd is turned off. The more servers and complexity you have, the more interesting such a solution becomes for you.

The companies and communities behind CFEngine and Puppet frequently claim that their solutions “are scalable”. They point to users managing hundreds, thousands, even tens of thousands of servers with the tools. But what does this mean? Are CFEngine and Puppet indistinguishable with respect to scalability?

In this post, we will highlight one aspect of scalability: how many clients a server can handle. As always, to find useful answers, we should do measurements — and stop listening to the marketing departments.


Test setup

Amazon EC2 is used to measure performance at the server while the number of clients is increased in steps every 30 minutes: we start with 25 clients, then go to 50, 100, 200, and finally 400.

CPU usage at the central server and client execution time are measured. Amazon CloudWatch is used to monitor resource usage.
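
For reference, the CloudWatch numbers can be pulled programmatically. Below is a minimal sketch, not the script used in the test, assuming the Python boto library (version 2), AWS credentials in the environment, and a placeholder instance ID:

    # Minimal sketch: fetch average CPU usage of the server instance
    # from Amazon CloudWatch. The region and instance ID are placeholders.
    import datetime
    import boto.ec2.cloudwatch

    conn = boto.ec2.cloudwatch.connect_to_region('us-east-1')
    end = datetime.datetime.utcnow()
    start = end - datetime.timedelta(minutes=30)

    datapoints = conn.get_metric_statistics(
        period=300,                      # one datapoint per 5 minutes
        start_time=start,
        end_time=end,
        metric_name='CPUUtilization',
        namespace='AWS/EC2',
        statistics=['Average'],
        dimensions={'InstanceId': 'i-00000000'},
    )
    for dp in sorted(datapoints, key=lambda d: d['Timestamp']):
        print(dp['Timestamp'], dp['Average'])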

The following cloud instances were used:

  • Clients: t1.micro (up to 2 ECUs, 1 core, 613 MB memory), SuSE 11.1, 64-bit
  • Server: m1.large (4 ECUs, 2 cores, 7.5 GB memory), SuSE 11.1, 64-bit

The tests used these versions of each tool:

  • CFEngine 3.1.5 (with Enterprise extensions)
  • Puppet 0.24.8

Both tools were set to run every 5 minutes.
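
Exactly how the client instances were brought up is a fair question (it comes up in the comments below). Purely as an illustration, one step of the ramp-up could be scripted against EC2 with boto; the AMI ID and key name here are hypothetical placeholders, not values from the test:

    # Illustrative sketch only: launch one ramp-up step of 25 extra
    # t1.micro clients. The AMI ID and key name are hypothetical.
    import boto.ec2

    conn = boto.ec2.connect_to_region('us-east-1')
    reservation = conn.run_instances(
        'ami-00000000',          # stand-in for a SuSE 11.1 64-bit image
        min_count=25,
        max_count=25,
        instance_type='t1.micro',
        key_name='benchmark-key',
    )
    print([i.id for i in reservation.instances])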


Policy/manifest details

Neither CFEngine nor Puppet performs any action unless you tell it to. During these tests, a policy (CFEngine) or manifest (Puppet) that copies 20 configuration files from the server, totalling 140 KB, was used to give it something realistic to do.

File copies and templating are very common in configuration management, simply because Unix systems are largely configured through text files.
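
The original files are not reproduced here, but a comparable payload is easy to recreate. A minimal sketch, assuming 20 synthetic files of roughly 7 KB each (about 140 KB in total) and a placeholder target directory:

    # Minimal sketch: generate 20 synthetic configuration files,
    # roughly 7 KB each (about 140 KB in total), for the server to
    # distribute. The directory path is a placeholder.
    import os

    payload_dir = '/srv/cm-test/files'
    os.makedirs(payload_dir, exist_ok=True)
    for i in range(20):
        path = os.path.join(payload_dir, 'app%02d.conf' % i)
        with open(path, 'w') as f:
            f.write('# synthetic config %d\n' % i)
            f.write('key = value\n' * 600)   # ~7 KB of dummy settings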


Results

The CFEngine server was not much affected by the extra load of more clients. At 400 clients, the average server CPU usage is below 10%. Client execution time stays roughly constant at 7.85 seconds, independent of the number of clients.

[Figure: CFEngine server CPU usage]


For Puppet, the story looks different. The Puppet server worked up to 50 clients, but stopped responding to client requests at 100. Reducing the number of clients back down to 50 made the Puppet server start responding again.

Another thing to note is that the CPU graph for Puppet has many more spikes than the smoother graph for CFEngine.

[Figure: Puppet server CPU usage]


Conclusions

We did not manage to find the maximum client count for the CFEngine server, but extrapolating the CPU graph suggests it lies somewhere between 4000 and 5000 clients. The Puppet server started failing between 50 and 100 clients. The difference is mind-blowing!
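
To make that extrapolation explicit: if server CPU usage grows roughly linearly with client count, an assumption the graph only suggests, saturation is reached at about ten times the load measured at 400 clients:

    # Back-of-the-envelope linear extrapolation of CFEngine server
    # capacity, assuming CPU usage grows linearly with client count.
    clients_measured = 400
    cpu_measured = 10.0        # percent, upper bound observed at 400 clients
    cpu_saturation = 100.0

    max_clients = clients_measured * cpu_saturation / cpu_measured
    print(max_clients)         # 4000.0, a rough lower estimate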

We have only discussed the number of clients per server here. Scalability is an abstract term, so there are definitely other interesting aspects to highlight. One example: how easy is it to scale out to multiple servers?


15 Responses to Scalability of CFEngine and Puppet

  1. Jeff Blaine says:

    Why on earth would you test with Puppet 0.24.8, which is no less than 3 years old?

  2. Laurent says:

    He also used CFEngine 2.x instead of 3.x. I guess he just used whatever package was available on the distribution he used.

    Anyway I don’t think the latest Puppet will be 10x faster and CFEngine 3.x 10x slower.

    • Jeff Blaine says:

      Laurent, he used CFEngine Enterprise 2.0.2, which is CFEngine 3. I don’t think the latest Puppet will be 10x faster than 0.24.8 either, but it’s a completely bogus comparison that negates most of the effort he went to.

      • author says:

        Is there any evidence that the new Puppet is faster than the old?

        It seems the primary reasons for the heavy load on the Puppet master are the heavily centralised architecture (it basically does all the work for the clients) and the fact that the Ruby interpreter is not that efficient. This has not changed, no?

        The reason they are not the newest packages is that this test was done about a year ago – but I didn’t have the time to publish until now.

  3. Mark Burgess says:

    This comparison does seem to be a bit old. While I appreciate the spirit of the study, its bottom line would be easier to accept if it were current. (There doesn’t seem to be a date on the post, so it is probably an old thing that someone rediscovered now). It would be really nice and valuable to users to find a truly independent party to make a thorough price/performance/functionality comparison. Very few people out there are credibly independent, even if their intentions are honest – so that isn’t an easy task, and in my experience it is considered bad form to compare open source software like that.

    One issue that is often forgotten is not just speed, but the cost of scaling. Of course, you can scale any software by brute force, just throwing more hardware at the problem (the old Microsoft strategy), but then you need to include that hardware cost and the additional management overhead in the calculation. Today, especially small users don’t care about scalability; they are more interested in quick and easy. But I believe that will change when you see how much extra it costs to run heavyweight software, especially in the cloud, where this is a very visible cost. I have seen some cost estimates claiming that the cost of running Puppet vs CFEngine in EC2 was a factor of 10x. That also seems hard to believe, but it would be valuable to users if an independent investigator were to publish a detailed study of these things where all of the assumptions and details were included.

    I have read that the default Puppet settings are not really suitable out of the box, but that if you install a bunch of extra stuff, it will perform better. It is not clear from the study here whether this was the default or not, so there are many questions. Perhaps this should be viewed as a challenge to someone out there to update this study.

  4. author says:

    I have updated the post now, so that it is clear that the versions tested are not current anymore.

    Thanks for the feedback!

  5. Thank you for publishing your findings. I subscribe to the maxim “better late than never” so applaud you for reporting your findings.

    You might have mentioned in your post that you work for CFEngine. :-) (Even though that is clear from other articles on your blog.)

    Mark, could you please clarify why this would be considered bad form? Would it have been better if Puppet did a technical review of the article before it was published? Is it bad form because the report is tainted by affiliation? I’m not disagreeing, just want to understand this better. :)

  6. Seth Thomas says:

    So, reading this very much as a challenge to update, I am curious about the details of how the test was conducted. Questions like: what exactly were the actions the policies/manifests were performing? How were the nodes instantiated/bootstrapped? Were the tests repeated multiple times?

    Without more detail, and given the age of the software in play, it is hard to take away much value from this comparison.

    • author says:

      Seth,

      I agree, more details & reproducibility would be good. Unfortunately, I don’t currently have all the manifest/policy details, which is an important aspect of this. What were you wondering about with respect to bootstrapping? Anything else you would like to see?

      Thanks.

      • Seth Thomas says:

        Curious how the machines were brought up – from a script, ec2-tools, web interface. Multiple runs of each test (perhaps at different times of day) would also be ideal in a benchmark scenario to help average out the variability of EC2 performance. Also throwing Chef into the mix would be extremely interesting as that would round out the major players in the config management space. Still very interesting data even if dated.

  7. Pingback: Scalability of CFEngine 3.3.5 and Puppet 2.7.19 | BlogCompiler

  8. author says:

    Hi guys,

    I have created an updated post based on your feedback: http://www.blogcompiler.com/2012/09/30/scalability-of-cfengine-and-puppet-2/

    I hope this addresses all your concerns.
    Thanks again.
