Sysadmin #6

Puppet at Gandi

a B&B production

Aurelien Rougemont

Arthur Gautier

Disclaimer

image

What is puppet ?

An open-source configuration management tool mainly written in Ruby and Clojure.

Puppet is extensible.

Puppet is modular.

Puppet is a framework.

What is NOT puppet ?

First and foremost, puppet is not a silver bullet.

Puppet is not an orchestrator.

Puppet is not lightweight.

Why did Gandi pick Puppet ?

  • pull model : puppet agent sends facts attached to a configuration request to a central service (puppetmasters)
  • promise theory
  • early adopters : in 2009 very few people could stand cfengine, chef had just had its first release in January and puppet had been around since 2005
  • both master and masterless models are possible
  • simple DSL : the puppet DSL does not require you to be fluent in ruby
  • descriptive : the puppeteer describes the wanted state, puppet promises to make the system conform to it
  • popular : puppet was widely used, already almost an industry standard. 7 years later this is still true : Google, Intel, Wikimedia, Paypal, Twitter, Zynga, Spotify, Dell, Rackspace and so on.

A bit of Gandi CM history

  • before 2009 : shell ninjas and mad packaging
  • 2009 : puppet repo creation at Gandi without ruby knowledge in the ops team
  • 2010 : it’s a mess but we manage. (CSV, MySQL, ENC, Git). no modules. no cry.
  • 2011 : it is becoming a gigantic mess. We design the first gandi module template with hiera at its heart
  • 2013 : puppet drops the dynamic scoping feature. We can’t upgrade to puppet 3.x, so we end up maintaining 2 production branches
  • 2014 : we integrate monitoring into puppet
  • 2015 : we normalize 250+ puppet modules in 10 days… and switch to puppet 3.7 with a few cool CI things
  • 2016 : we now face the orchestration challenge with laser eyes

Puppet is code, right ?

image

But it’s also authors

image

Bringing order to this mess

Well we are in 2011 and we have a few means to organize all this. Let’s do this.

External Node Classifier

At first we stored the ENC data in a database, but it became hard to correlate the database’s state with the repository’s content at a given commit.

An ENC is a command called by puppet during catalog compilation. It queries a data source and returns the node’s declaration (its classes) along with the related metadata.

In 2015 we switched to yaml files storage for ENC : /path/to/puppet/enc/<fqdn>.yaml

---
classes:
  - myfarmclass
parameters:
  country: fr
  datacenter: equinix
  farm: myfarmclass
  flavor: myflavor

Now a simple commit gives us code/enc/facts context
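
As an illustration of the file-based ENC described above, here is a minimal sketch in Ruby. It relies on the documented ENC contract (puppet passes the node name as the only argument and reads YAML from stdout). The `ENC_DIR` variable, the `classify` helper and the hostname check are assumptions made for this example, not Gandi's actual script.

```ruby
#!/usr/bin/env ruby
# Minimal ENC sketch: print the classification stored in <enc_dir>/<fqdn>.yaml.
require "yaml"

# Hypothetical default; the slides only give /path/to/puppet/enc/<fqdn>.yaml
ENC_DIR = ENV.fetch("ENC_DIR", "/path/to/puppet/enc")

# Returns the YAML classification for fqdn, or nil when the node is unknown.
def classify(fqdn, enc_dir = ENC_DIR)
  # Refuse anything that does not look like a hostname (no path separators).
  return nil unless fqdn =~ /\A[a-z0-9][a-z0-9.-]*\z/i

  path = File.join(enc_dir, "#{fqdn}.yaml")
  return nil unless File.file?(path)

  data = YAML.safe_load(File.read(path)) || {}
  # Only pass along the two keys used in the example above.
  YAML.dump({ "classes"    => data.fetch("classes", []),
              "parameters" => data.fetch("parameters", {}) })
end

# Command-line entry point: `enc.rb <fqdn>`; a non-zero exit reports a failure.
if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
  output = classify(ARGV[0])
  abort("unknown node: #{ARGV[0]}") if output.nil?
  puts output
end
```

Because the script only reads files from the repository, checking out a given commit is enough to reproduce what the ENC answered at that point in time.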

from none to a bunch of servers

image

Facts matter

Puppet uses facter, which collects facts that enrich the metadata available for each node

[...]
kernelversion => 3.14.43
lsbdistcodename => jessie
lsbdistdescription => Debian GNU/Linux 8.1 (jessie)
lsbdistid => Debian
lsbdistrelease => 8.1
lsbmajdistrelease => 8
[...]
country => fr
datacenter => equinix
diskdrives => sda,sdb
diskid_765bc821-a587-45ac-a352-400d2ff8cc0e => sdb
diskid_d17a4911-c56d-40a8-b8bd-17570826e82e => sda
diskmodel_sda => QEMU HARDDISK   
diskmodel_sdb => QEMU HARDDISK   
diskscheduler_sda => deadline
diskscheduler_sdb => deadline
[...]

This metadata is very important. The more you have, the happier you will be when describing a state in puppet code or when orchestrating an infrastructure modification transaction.

Farms, flavors and facts

As said before, the more hosts you have, the messier it gets. We mostly rely on :

  • farm : it’s a group of functionally identical servers. (e.g. : farm: api)
  • flavor : it’s a farm subset derived from a common base. (e.g. : farm: api, flavor: iaas)
  • facts : facter metadata, custom metadata embedded in puppet modules or ENC metadata

in-farm-ative numbers

image

But you said modular and extensible ?

image

Let’s talk about modules

In our first setup there was no module registry.

No one had common functions

In 2009-2011 we could not find any shared coding style inside public modules

Later on, the entry cost of using the puppet library for people who are not fluent in ruby was, and still is, too high

As a result, in 2011 we came up with this simple module template design. So far it is still valid

Gandi module tree

Nothing peculiar about it, but it is the same in 95% of gandi puppet modules

When reality kicks in, we allow a very few exceptions, but they have to be discussed with the team.

/path/to/module
           |-README                      # hiera configuration snip and information about the module
           |-metadata.json               # deprecated module metadata
           |-manifests                   # contains the puppet code
           | |-init.pp                   ## interface with data, also handles files inclusions
           | |-install.pp                ## handles package installation
           | |-config.pp                 ## handles templates (config files)
           | |-files.pp                  ## handles folders, static or binary files 
           | |-cron.pp                   ## handles cron jobs
           | |-service.pp                ## receives modification notifications and handles services accordingly
           |
           |-lib
           | |-facter
           |   |-fact.rb                 ## if necessary add facts for the module
           |
           |-templates
           | |-prefix
           |   |-path
           |     |-file.ext.erb          ## config template
           |
           |-files
             |-prefix
               |-path
                 |-file.ext              ## blob file

Gandi module flow

image

Coding style

We enforce a strict policy on coding style for pragmatic reasons:

  • more human readable
  • portable code
  • best practices for a quicker and more efficient catalog compilation
  • easy to modify with your favorite do_your_magic_with_sed_awk_whatever

hiera as datasource

Hiera is a hierarchical key/value lookup tool; we use it with YAML files.

Gandi hiera levels (bottom to top) :
  • nodes
  • farm_datacenter_flavor
  • farm_datacenter
  • farm_flavor
  • farm
  • room
  • datacenter
  • country
  • os-lsb
  • os
  • common

Those levels are easily extensible (and we did add levels over time)
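
To make the hierarchy more concrete, here is a small Ruby sketch that turns a node's metadata into the ordered list of YAML files a lookup would consult. The `params/<level>/<name>.yaml` layout follows the hieracles output shown later in this deck. The fact names attached to each level and the way multi-key levels are joined with `_` are assumptions made for illustration only.

```ruby
# Level name => metadata keys used to build the file name, most specific first.
# The key names (fqdn, room, lsbdistcodename, operatingsystem) are assumptions.
LEVELS = [
  ["nodes",                  %w[fqdn]],
  ["farm_datacenter_flavor", %w[farm datacenter flavor]],
  ["farm_datacenter",        %w[farm datacenter]],
  ["farm_flavor",            %w[farm flavor]],
  ["farm",                   %w[farm]],
  ["room",                   %w[room]],
  ["datacenter",             %w[datacenter]],
  ["country",                %w[country]],
  ["os-lsb",                 %w[lsbdistcodename]],
  ["os",                     %w[operatingsystem]],
  ["common",                 []]
].freeze

# Returns the candidate files for a node, skipping levels whose keys are unset.
def lookup_paths(scope)
  LEVELS.each_with_object([]) do |(level, keys), paths|
    values = keys.map { |key| scope[key] }
    next if values.any? { |value| value.nil? || value.to_s.empty? }

    name = values.empty? ? "common" : values.join("_")
    paths << "params/#{level}/#{name}.yaml"
  end
end
```

Adding a level then amounts to adding one entry to the list, which matches the "easily extensible" remark above.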

Level up

image

Flee you fools

A bit of puppet’s feature history

Feature 0.23.x 0.24.x 0.25.x 2.6.x 2.7.0 3.x 3.2.x 3.4.x
Dynamic variable scope X X X X X
Appending to attributes in class inheritance (+>) X X X X X X X X
Multi-line C-style comments X X X X X X X
Arrays of resource references allowed in relationships X X X X X X X
Overrides in class inheritance X X X X X X X
Appending to variables in child scopes (+=) X X X X X X X
Class names starting with 0-9 X X X
Regular expressions in node definitions X X X X X X
Assigning expressions to variables X X X X X X
Regular expressions in conditionals/expressions X X X X X X
elsif in if statements X X X X X
Chaining Resources X X X X X
Hashes X X X X X
Class Parameters X X X X X
Run Stages X X X X X
The “in” operator X X X X X
$title, $name, and $module_name available in parameter lists X X X X X
Optional trailing comma in parameter lists X X X X
Hyphens/dashes allowed in variable names * X
Automatic class parameter lookup via data bindings X X X
“Unless” conditionals X X X
Iteration over arrays and hashes X X
The modulo (%) operator X X
$trusted hash X

A bit of puppet’s feature vanishment history

Feature 0.23.x 0.24.x 0.25.x 2.6.x 2.7.0 3.x 3.2.x 3.4.x
Dynamic variable scope X X X X X
Appending to attributes in class inheritance (+>) X X X X X X X X
Multi-line C-style comments X X X X X X X
Arrays of resource references allowed in relationships X X X X X X X
Overrides in class inheritance X X X X X X X
Appending to variables in child scopes (+=) X X X X X X X
Class names starting with 0-9 X X X
Regular expressions in node definitions X X X X X X
Assigning expressions to variables X X X X X X
Regular expressions in conditionals/expressions X X X X X X
elsif in if statements X X X X X
Chaining Resources X X X X X
Hashes X X X X X
Class Parameters X X X X X
Run Stages X X X X X
The “in” operator X X X X X
$title, $name, and $module_name available in parameter lists X X X X X
Optional trailing comma in parameter lists X X X X
Hyphens/dashes allowed in variable names X
Automatic class parameter lookup via data bindings X X X
“Unless” conditionals X X X
Iteration over arrays and hashes X X
The modulo (%) operator X X
$trusted hash X

Dynamic variable scoping

Up to and including puppet 2.6, dynamic scoping was allowed

Starting with the 3.x versions it has been dropped

<%= myvar %>
became
<%= scope.lookupvar('mymodule::myvar') %>

To sum it up, you were able to use any variable from anywhere, leading to naming conflicts and performance loss

# portable version between 2.6, 2.7, 3.7 and 3.8
<%= scope.lookupvar('mymodule::myvar') %>
# can now be written (>= 3.7)
<%= scope['mymodule::myvar'] %>
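
The two template forms can be tried outside puppet with plain ERB. The sketch below uses a stand-in `FakeScope` class (hypothetical, written for this example) that only mimics the two accessors used in the templates; it is not puppet's real scope object.

```ruby
require "erb"

# Stand-in for puppet's scope, for illustration only: it resolves fully
# qualified variable names from a hash and supports both access styles.
class FakeScope
  def initialize(vars)
    @vars = vars
  end

  def lookupvar(name)
    @vars[name]
  end

  def [](name)
    lookupvar(name)
  end
end

# Renders a template against a binding where the local `scope` is visible.
def render(template, scope)
  ERB.new(template).result(binding)
end

scope = FakeScope.new("mymodule::myvar" => "42")
old_style = render("value=<%= scope.lookupvar('mymodule::myvar') %>", scope)
new_style = render("value=<%= scope['mymodule::myvar'] %>", scope)
```

Both renderings produce the same text; only the fully qualified name matters, which is what removes the naming conflicts of dynamic scoping.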

Sad panda

image

Exported resources

Exported resources are useful, but costly in terms of execution time and overall performance

Puppet datatypes appearance

Before Puppet 3.x you could do some dirty tricks to iterate over arrays

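$users = [ 'user1', 'user2' ]

# defined type: $name holds the title of each declared instance
define print_users {
        $user = $name
        notify { "Found user ${user}": }
}

# passing an array as the title declares one instance per element
print_users { $users: }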

Puppet version 3.x introduced a very important feature : variable types

"false" != false

Hiera merging policy change

Our default hiera merging policy is now deeper merge, but this was not always the case

# from common.yaml
ssh_config:
    package:
        version: "latest"
    acl:
        root:
            - "georges"
            - "abitbol"

# from farm.yaml
ssh_config:
    package:
        version: "1:6.7p1-5+deb8u1"
    acl:
        root:
            - "peter"
            - "steeven"

# will result in :
ssh_config:
    package:
        version: "1:6.7p1-5+deb8u1"
    acl:
        root:
            - "georges"
            - "abitbol"
            - "peter"
            - "steeven"

Beware of deeper merge with arrays !

With the deeper merge policy, arrays from all levels are concatenated, so a more specific level cannot override (replace or shrink) the content of an array
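
The behaviour shown above can be reproduced with a few lines of Ruby. This is a simplified sketch of a deeper-style merge written for this example (the `deeper_merge` name is hypothetical); it is not the library hiera uses internally.

```ruby
# lower:  data from the less specific level (e.g. common.yaml)
# higher: data from the more specific level (e.g. farm.yaml)
def deeper_merge(lower, higher)
  if lower.is_a?(Hash) && higher.is_a?(Hash)
    # Hashes are merged key by key, recursing on conflicts.
    lower.merge(higher) { |_key, low, high| deeper_merge(low, high) }
  elsif lower.is_a?(Array) && higher.is_a?(Array)
    # Arrays are unioned: entries can be added, never removed.
    (lower + higher).uniq
  else
    # Scalars: the more specific level wins.
    higher
  end
end

common = { "ssh_config" => { "package" => { "version" => "latest" },
                             "acl" => { "root" => %w[georges abitbol] } } }
farm   = { "ssh_config" => { "package" => { "version" => "1:6.7p1-5+deb8u1" },
                             "acl" => { "root" => %w[peter steeven] } } }
merged = deeper_merge(common, farm)
```

The package version is overridden by the farm level, while the `root` list keeps the entries from both files, which is exactly the pitfall mentioned above.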

workflow integration

Each gitlab MR triggers a jenkins job which validates several things:

  • merge-ability to the production branch
  • puppet linting
  • gandi module layout
  • puppet syntax validation
  • YAML validation
  • ERB syntax validation

Branching policy

image

Feature branch example

local~:# git checkout -b features/ninjas-and-lasers
[...]
local~:# git commit -m 'and shit'
local~:# git push origin features/ninjas-and-lasers

# open an MR on gitlab

local~:# ssh testserver
testserver~:# puppet_dryrun --logdest=console --environment=mr42

Hey MR.

image

Under the hood

Each puppetmaster runs a python flask application that receives events from gitlab webhooks. Every merge-request event is sent to each puppetmaster, which then checks out a fresh copy of the repository.

Required gitlab patch (https://github.com/gitlabhq/gitlabhq/pull/8872)

Git strategy

Consider the following situation:

image

What would you deploy?

Git strategy (ours)

image

git merge --no-commit master

git-merge(1):

--commit, --no-commit
  Perform the merge and commit the result. This option can be used to override --no-commit.

  With --no-commit perform the merge but pretend the merge failed and do not autocommit, to give the user a
  chance to inspect and further tweak the merge result before committing.
Side-effect:
  • better (read: simpler) caching strategy.
Cons:
  • Every MR has to be redeployed whenever either its source (feature-branch) or its target (master) branch moves.

Git magic

# the deployment directory is named after both commit ids
DEPLOY="/srv/hasheds/${SOURCECOMMITID}_${TARGETCOMMITID}"
git --git-dir bare.puppet.git fetch
git clone --shared -b origin/feature-branch bare.puppet.git "$DEPLOY"
git -C "$DEPLOY" reset --hard "$SOURCECOMMITID"
git -C "$DEPLOY" clean -xdf
git -C "$DEPLOY" merge --no-commit "$TARGETCOMMITID"
# build the new symlink aside, then rename it over the old one
ln -sfn "$DEPLOY" /srv/branches/mr42.new
mv -T /srv/branches/mr42.new /srv/branches/mr42
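
The last two commands are what makes a deployment atomic: the new symlink is created under a temporary name and then renamed over the live one. A minimal Ruby sketch of that swap follows; the `switch_deployment` helper is a hypothetical name, and the sketch assumes both paths live on the same filesystem.

```ruby
require "fileutils"

# Points `link` at `target` atomically: readers see either the previous
# deployment or the new one, never a missing path.
def switch_deployment(target, link)
  tmp = "#{link}.new"
  FileUtils.rm_f(tmp)       # clean up after an interrupted run
  File.symlink(target, tmp) # build the new symlink aside
  File.rename(tmp, link)    # rename(2) replaces the old symlink in one step
end
```

For example, `switch_deployment("/srv/hasheds/<source>_<target>", "/srv/branches/mr42")` would make the environment `mr42` serve the freshly merged tree.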

write once, read many

all deployments are atomic

number of inodes under control

Precious tools

Basic code validation

puppet-lint

Checks the code style against upstream puppet rules and some specific, stricter Gandi rules.

fqdn~:# puppet-lint \
   --fail-on-warnings \
   --show-ignored \
   --no-nested_classes_or_defines-check \
   --no-autoloader_layout-check \
   --no-80chars-check \
   --with-filename your_file.pp

puppet parser validate

Validates the syntax of a puppet DSL file.

fqdn~:# puppet parser validate your_file.pp

ERB checker

Checks the syntax of an ERB (Ruby) template.

fqdn~:# erb -P -x -T '-' $1 | ruby -c
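
The same check can be done from Ruby, which is handy when a CI job has to walk many templates. The sketch below mirrors the pipeline above: ERB produces Ruby source with the `-` trim mode, and that source is compiled without being executed. The `erb_syntax_error` helper is a name chosen for this example, and `RubyVM::InstructionSequence` assumes the standard MRI interpreter.

```ruby
require "erb"

# Returns nil when the template compiles, or the syntax error message.
def erb_syntax_error(template)
  # Same trim mode as `erb -T '-'`; #src is the Ruby code ERB generates.
  ruby_source = ERB.new(template, trim_mode: "-").src
  # Compile only: nothing from the template is executed.
  RubyVM::InstructionSequence.compile(ruby_source)
  nil
rescue SyntaxError => e
  e.message
end
```

A job could call it as `erb_syntax_error(File.read(path))` for every `.erb` file and fail as soon as one call returns a message.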

YAML validation

Validate the YAML with ruby.

fqdn~:# ruby -e 'require "yaml"; File.open(ARGV[0]) { |f| YAML.load(f.read()) }' myfile.yaml
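
For a CI job that has to check many files at once, the one-liner can be extended into a small script. This is a sketch under assumptions: the `yaml_errors` helper is a hypothetical name, and it only parses the files without loading them into Ruby objects.

```ruby
require "yaml"

# Returns a hash of path => error message for every file that fails to parse.
def yaml_errors(paths)
  paths.each_with_object({}) do |path, errors|
    begin
      YAML.parse_file(path) # parse only: no objects are instantiated
    rescue Psych::SyntaxError => e
      errors[path] = e.message
    end
  end
end
```

Called as `yaml_errors(Dir.glob("params/**/*.yaml"))`, an empty result means every file parsed cleanly.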

puppet code deployment

image

puppetdb

It’s a database (RDBMS) with a REST API. It stores everything: every fact, exported resource and internal information.

curl -X GET --data '
{
  "query": "
    [\"and\"
      ,[\"=\", \"exported\",  true]
      ,[ \"not\",
        [\"=\", [\"parameter\", \"ensure\"], \"absent\"]
      ]
      ,[\"=\", [\"node\", \"active\"], true]
      ,[\"=\", \"type\", \"Nagios_host\"]
    ]",
  "order-by": "[{\"field\": \"title\"}]"
}' http://puppetdb/v3/resources
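
Escaping the nested JSON by hand is error-prone. As a sketch, the same payload can be built from plain Ruby arrays and serialized with the standard `json` library. The two helper names are assumptions for this example; the payload shape simply mirrors the curl call above.

```ruby
require "json"

# The query is an AST of nested arrays; Ruby arrays map onto it directly.
def exported_resources_query(type)
  ["and",
   ["=", "exported", true],
   ["not", ["=", ["parameter", "ensure"], "absent"]],
   ["=", ["node", "active"], true],
   ["=", "type", type]]
end

# Mirrors the payload of the curl call: both values are JSON-encoded strings.
def puppetdb_payload(type)
  JSON.generate({
    "query"    => JSON.generate(exported_resources_query(type)),
    "order-by" => JSON.generate([{ "field" => "title" }])
  })
end
```

`puppetdb_payload("Nagios_host")` yields the body used above, with the quoting handled by the serializer.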

Pros:

  • Very fast
  • !ZOMG! fast
  • Extremely expressive api
  • Dumps json
  • You can pipe output to jinja and generate … nagios configuration
  • Have I said it was fast?

Cons:

  • brainfuck^Wclojure oriented api.

mcollective

Marionette collective is a Distributed RPC framework

Every gandi server runs mcollective

A mcollective client can broadcast encrypted orders according to metadata-based filters. Once the order is received and deciphered by the server, the server checks whether it should apply it (security policy deployed through puppet).

mcollective aggregates results.

mcollective can control execution concurrency.

[P]mcollectiveclient:~# mc-package -F grsec=unavailable -F country=fr -F lsbdistcodename=jessie status ssmtp

* [ ============================================================> ] 79 / 79

host1.domain.tld                         version = ssmtp-2.64-8
[...]
host79.domain.tld                        version = ssmtp-purged

---- package agent summary ----
           Nodes: 79 / 79
                   Versions: 33 * 2.64-8, 4 * absent, 42 * purged
                       Elapsed Time: 0.47 s

We mainly use mcollective for triggering puppet runs among farms.

Monitoring-integration

eg: params/farm/readdb.yaml

percona_mysql_config:
  tuning:
     innodb_buffer_pool_size:   "8G"
  [...]

monitoring_config:
  services_nrpe:
    df-mountpoint:
      check_command: "check_disk -E -e -w 10% -c 5% -W 10% -K 5% -p /srv/readdb"
    mysql_running:
      check_command: "check_mysql"
    mysql_slave_running:
      check_command: "gandi/check_mysql_replication -w :5 -c :15"
    mysql_config:
      description: "checks running mysqld config is in sync with /etc/mysql"
      check_command: "gandi/check_mysql_config -c 0:0"
  service_dependencies:
    mysql-running-config:
      service_description: mysql_running
      dependent_service_description: mysql_config
    mysql-running-slave:
      service_description: mysql_running
      dependent_service_description: mysql_slave_running

Context is important!

Service definition and its associated checks in the same file.

Monitoring-integration (2/x)

  • puppet agent requests a catalog from the puppet master
  • puppet master compiles the catalog
  • puppet master exports resources to puppetdb
  • puppet agent applies the catalog
  • Custom script queries puppetdb and compiles the result into nagios configuration
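
The last step can be sketched as follows. The slides mention jinja for the real tool; this example uses Ruby's ERB instead, purely for illustration. The template, the `nagios_hosts` helper and the choice of the `address` parameter are assumptions, not Gandi's actual generator.

```ruby
require "erb"
require "json"

# One nagios host definition per exported Nagios_host resource.
HOST_TEMPLATE = ERB.new(<<~TEMPLATE, trim_mode: "-")
  <%- hosts.each do |host| -%>
  define host {
      host_name  <%= host["title"] %>
      address    <%= host["parameters"]["address"] %>
  }
  <%- end -%>
TEMPLATE

# resources_json: the JSON array returned by a puppetdb resources query.
def nagios_hosts(resources_json)
  hosts = JSON.parse(resources_json).sort_by { |resource| resource["title"] }
  HOST_TEMPLATE.result_with_hash(hosts: hosts)
end
```

Since the script reads from puppetdb rather than from a compiled catalog, regenerating the whole nagios configuration does not require a puppet run on the monitoring server.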

Monitoring-integration (3/x)

Cons:

  • Puppet has to run on every node to get a new check deployed

Pros:

  • Expressive
  • hiera granularity
  • Deep merge FTW
  • Result: x4 checks on same infrastructure

Monitoring-integration (4/x)

Don’t:

  • Ever, EVER import exported resources from puppet (even using puppetdb)

    Nagios_host <<| |>>
  • This is slow
  • Like hell
  • Mostly because marshalling / unmarshalling a catalog with thousands of objects takes forever

Homemade tools

hieracles / hieraviz

As the number of levels in our hiera setup grows, it becomes difficult for a human to merge the hiera data (i.e. to do the deeper merge) in their head. We wrote the spec and mose wrote a ruby CLI tool to do that : hieracles. He also added a web interface/REST API, hieraviz, to show the deeper merge in a graphical way. In the near future, users should be able to modify hiera variables using this web interface/REST API.

hiera{cles,viz} are now opensourced and published here : https://github.com/Gandi/hieracles

fqdn~:# hc hostname.domain.tld allparams '.*lldp.*'
[-] (merged)
[0] params/farm/myfarm.yaml
[1] params/datacenter/equinix.yaml
[2] params/common/common.yaml

[2] lldpd_config.cron_enable false
[2] lldpd_config.package_version latest
[2] lldpd_config.prefixes.files lldpd/default
[2] lldpd_config.prefixes.templates lldpd/default
[2] lldpd_config.service false
[2] lldpd_config.subscribe []

shell aliases

Small wrappers that unify our toolbox, avoid command errors and spare us relearning commands when they change between versions of puppet.

Run by devops on local system:

[P]fqdn:~# puppet_status 
Last run: 2016-02-02 12:01:21.463621299
unlocked
[P]fqdn:~# puppet_lock test
[P]fqdn:~# puppet_status 
Last run: 2016-02-02 12:01:21.463621299
locked by 1.2.3.4 : test @ 2016-02-17 16:56:45.147621420 +0100
[P]fqdn:~# puppet_unlock 
by 1.2.3.4 : test


[P]fqdn:~# puppet_dryrun
[...]
[P]fqdn:~# puppet_run
[...]

pragmatic thoughts

  • we do need an orchestrator
  • we need to get rid of the ENC
  • we need to refactor hiera data. if some mad man with mad programming skillz has some time i have a mad idea. yes mad.
  • we might want to rethink our current hiera information guidelines

Thanks

Questions ?