Monday, November 17, 2014

Service discovery with consul and consul-template

I talked in the past about an "Ops Design Pattern: local haproxy talking to service layer". I described how we used a local haproxy on pretty much all nodes at a given layer of our infrastructure (webapp, API, e-commerce) to talk to services offered by the layer below it. So each webapp server has a local haproxy that talks to all API nodes it sends requests to. Similarly, each API node has a local haproxy that talks to all e-commerce nodes it needs info from.

This seemed like a good idea at the time, but it turns out it has a couple of annoying drawbacks:
  • each local haproxy runs health checks against N nodes, so if you have M nodes running haproxy, each of the N nodes will receive M health checks; if M and N are large, then you have a health check storm on your hands
  • to take a node out of a cluster at any given layer, we tag it as 'inactive' in Chef, then run chef-client on all nodes at the layers above it that run haproxy and talk to that node; this gets old pretty fast, especially when you're doing anything that might conflict with Chef and that the chef-client run might overwrite (I know, I know, you're not supposed to do anything of that nature, but we are all human :-)
For the second point, we are experimenting with haproxyctl so that we don't have to run chef-client on every node running haproxy. But it still feels like a heavy-handed approach.
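For reference, what haproxyctl wraps is roughly the haproxy admin socket, which lets you pull a server out of rotation without touching the config file or restarting haproxy. A minimal sketch, assuming the stats socket is enabled with "level admin" at /var/run/haproxy.sock and using hypothetical backend/server names:

# hypothetical backend/server names; requires "stats socket /var/run/haproxy.sock level admin" in haproxy.cfg
$ echo "disable server api-backend/api03" | socat stdio /var/run/haproxy.sock

# and to put the server back in rotation:
$ echo "enable server api-backend/api03" | socat stdio /var/run/haproxy.sock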

If I were to do this again (which I might), I would still have an haproxy instance in front of our webapp servers, but for communicating from one layer of services to another I would use a proper service discovery tool such as the grandpa of them all, Apache ZooKeeper, or the newer kids on the block, etcd from CoreOS and consul from HashiCorp.

I settled on consul for now, so in this post I am going to show how you can use consul in conjunction with the recently released consul-template to discover services and to automate configuration changes. At the same time, I wanted to experiment a bit with Ansible as a configuration management tool. So the steps I'll describe were actually automated with Ansible, but I'll leave that for another blog post.

The scenario I am going to describe involves 2 haproxy instances, each pointing to 2 Wordpress servers running Apache, PHP and MySQL, with Varnish fronting the Wordpress application. One of the 2 Wordpress servers is considered primary as far as haproxy is concerned, and the other one is a backup server, which will only get requests if the primary server is down. All servers are running Ubuntu 12.04.

Install and run the consul agent on all nodes

The agent will start in server mode on the 2 haproxy nodes, and in regular client mode on the 2 Wordpress nodes.

I first deployed consul to the 2 haproxy nodes. I used a modified version of the ansible-consul role from jivesoftware. The configuration file /etc/consul.conf for the first server (lb1) is:

{
  "domain": "consul.",
  "data_dir": "/opt/consul/data",
  "log_level": "INFO",
  "node_name": "lb1",
  "server": true,
  "bind_addr": "10.0.0.1",
  "datacenter": "us-west-1b",
  "bootstrap": true,
  "rejoin_after_leave": true
}

(and similar for lb2, with only node_name and bind_addr changed to lb2 and 10.0.0.2 respectively)

The ansible-consul role also creates a consul user and group, and an upstart configuration file like this:

# cat /etc/init/consul.conf

# Consul Agent (Upstart unit)
description "Consul Agent"
start on (local-filesystems and net-device-up IFACE!=lo)
stop on runlevel [06]

exec sudo -u consul -g consul /opt/consul/bin/consul agent -config-dir /etc/consul.d -config-file=/etc/consul.conf >> /var/log/consul 2>&1
respawn
respawn limit 10 10
kill timeout 10

To start/stop consul, I use:

# start consul
# stop consul

Note that "server" is set to true and "bootstrap" is also set to true, which means that each consul server will be the leader of a cluster with 1 member, itself. To join the 2 servers into a consul cluster, I did the following:
  • join lb1 to lb2: on lb1 run consul join 10.0.0.2
  • tail /var/log/consul on lb1, note messages complaining about both consul servers (lb1 and lb2) running in bootstrap mode
  • stop consul on lb1: stop consul
  • edit /etc/consul.conf on lb1 and set  "bootstrap": false
  • start consul on lb1: start consul
  • tail /var/log/consul on both lb1 and lb2; it should show no more errors
  • run consul info on both lb1 and lb2; the output should show server=true on both nodes, but leader=true only on lb2
Next I ran the consul agent in regular non-server mode on the 2 Wordpress nodes. The configuration file /etc/consul.conf on node wordpress1 was:

{
  "domain": "consul.",
  "data_dir": "/opt/consul/data",
  "log_level": "INFO",
  "node_name": "wordpress1",
  "server": false,
  "bind_addr": "10.0.1.1",
  "datacenter": "us-west-1b",
  "rejoin_after_leave": true
}

(and similar for wordpress2, with the node_name set to wordpress2 and bind_addr set to 10.0.1.2)

After starting up the agents via upstart, I joined them to lb2 (although they could be joined to any of the existing members of the cluster). I ran this on both wordpress1 and wordpress2:

# consul join 10.0.0.2

At this point, running consul members on any of the 4 nodes should show all 4 members of the cluster:

Node          Address         Status  Type    Build  Protocol
lb1           10.0.0.1:8301   alive   server  0.4.0  2
wordpress2    10.0.1.2:8301   alive   client  0.4.0  2
lb2           10.0.0.2:8301   alive   server  0.4.0  2
wordpress1    10.0.1.1:8301   alive   client  0.4.0  2

Install and run dnsmasq on all nodes

The ansible-consul role does this for you. Consul piggybacks on DNS for service naming, and by default the internal Consul names live under the consul. domain. In my case this is configured in /etc/consul.conf via "domain": "consul."

The dnsmasq configuration file for consul is:

# cat /etc/dnsmasq.d/10-consul

server=/consul./127.0.0.1#8600

This causes dnsmasq to forward DNS queries for names under the consul. domain to a DNS server on 127.0.0.1 port 8600, which is the port the local consul agent listens on to provide DNS resolution.

To start/stop dnsmasq, use: service dnsmasq start | stop.
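If you want to verify that the consul agent itself is answering DNS queries, independently of dnsmasq, you can query it directly on port 8600. For example, on lb1 something like this should return the A record for the node (10.0.0.1 in this setup):

$ dig @127.0.0.1 -p 8600 lb1.node.consul

;; ANSWER SECTION:
lb1.node.consul. 0 IN A 10.0.0.1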

Now that dnsmasq is running, you can look up names that end in .node.consul from any member node of the consul cluster (there are 4 member nodes in my cluster, 2 servers and 2 agents). For example, I ran this on lb2:

$ dig wordpress1.node.consul

; <<>> DiG 9.8.1-P1 <<>> wordpress1.node.consul
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 2511
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;wordpress1.node.consul. IN A

;; ANSWER SECTION:
wordpress1.node.consul. 0 IN A 10.0.1.1

;; Query time: 1 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Fri Nov 14 00:09:16 2014
;; MSG SIZE  rcvd: 76

Configure services and checks on consul agent nodes

Internal DNS resolution within the .consul domain becomes even more useful when nodes define services and checks. For example, the 2 Wordpress nodes run varnish and apache (on port 80 and port 443) so we can define 3 services as JSON files in /etc/consul.d. On wordpress1, which is our active/primary node in haproxy, I defined these services:

$ cat http_service.json
{
    "service": {
        "name": "http",
        "tags": ["primary"],
        "port": 80,
        "check": {
            "id": "http_check",
            "name": "HTTP Health Check",
            "script": "curl -H 'Host: www.mydomain.com' http://localhost",
            "interval": "5s"
        }
    }
}

$ cat ssl_service.json
{
    "service": {
        "name": "ssl",
        "tags": ["primary"],
        "port": 443,
        "check": {
            "id": "ssl_check",
            "name": "SSL Health Check",
            "script": "curl -k -H 'Host: www.mydomain.com' https://localhost:443",
            "interval": "5s"
        }
    }
}

$ cat varnish_service.json
{
    "service": {
        "name": "varnish",
        "tags": ["primary"],
        "port": 6081,
        "check": {
            "id": "varnish_check",
            "name": "Varnish Health Check",
            "script": "curl http://localhost:6081",
            "interval": "5s"
        }
    }
}

Each service we defined has a name, a port and a check with its own ID, name, script that runs whenever the check is executed, and an interval that specifies how often the check is run. In the examples above I specified simple curl commands against the ports that these services are running on. Note also that each service has a list of tags associated with it. In my case, the services on wordpress1 have the tag "primary". The services defined on wordpress2 are identical to the ones on wordpress1 with the only difference being the tag, which on wordpress2 is "backup".
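For completeness, here is what the varnish service definition would look like on wordpress2, given that only the tag changes (the http and ssl definitions follow the same pattern):

$ cat varnish_service.json
{
    "service": {
        "name": "varnish",
        "tags": ["backup"],
        "port": 6081,
        "check": {
            "id": "varnish_check",
            "name": "Varnish Health Check",
            "script": "curl http://localhost:6081",
            "interval": "5s"
        }
    }
}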

After restarting consul on wordpress1 and wordpress2, the following service-related DNS names are available for resolution on all nodes in the consul cluster (I am going to include only relevant portions of the dig output):

$ dig varnish.service.consul

;; ANSWER SECTION:
varnish.service.consul. 0 IN A 10.0.1.1
varnish.service.consul. 0 IN A 10.0.1.2

This name resolves in DNS round-robin fashion to the IP addresses of all nodes that are running the varnish service, regardless of their tags and regardless of the data centers that their nodes run in. In our case, it resolves to the IP addresses of wordpress1 and wordpress2.

Note that the IP address of a given node only appears in the DNS result set if the service running on that node is passing its health check. If the check fails, then consul's DNS service will not include the IP of that node in the result set. This is very important for the dynamic discovery of healthy services.
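The health status behind this behavior is also exposed via the Consul HTTP API. A quick way to list only the instances of a service whose checks are passing would be something along these lines (the passing filter should be verified against the Consul version you run):

$ curl 'http://localhost:8500/v1/health/service/varnish?passing'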

$ dig varnish.service.us-west-1b.consul

;; ANSWER SECTION:
varnish.service.us-west-1b.consul. 0 IN A 10.0.1.2
varnish.service.us-west-1b.consul. 0 IN A 10.0.1.1

If we include the data center (in our case us-west-1b) in the DNS name we query, then only the services running on nodes in that data center will be returned in the result set. In our case though, all nodes run in the us-west-1b data center, so this query returns, like the previous one, the IP addresses of wordpress1 and wordpress2. Note that the IPs can be returned in any order, because of DNS round-robin. In this case the IP of wordpress2 was first.

$ dig SRV varnish.service.consul

;; ANSWER SECTION:
varnish.service.consul. 0 IN SRV 1 1 6081 wordpress1.node.us-west-1b.consul.
varnish.service.consul. 0 IN SRV 1 1 6081 wordpress2.node.us-west-1b.consul.

;; ADDITIONAL SECTION:
wordpress1.node.us-west-1b.consul. 0 IN A 10.0.1.1
wordpress2.node.us-west-1b.consul. 0 IN A 10.0.1.2

A useful feature of the consul DNS service is that it returns the port number that a given service runs on when queried for an SRV record. So this query returns the names and IPs of the nodes that the varnish service runs on, as well as the port number, which in this case is 6081. The application querying for the SRV record needs to interpret this extra piece of information, but this is very useful for the discovery of internal services that might run on non-standard port numbers.
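For scripting purposes, the +short form of the SRV query is easier to parse; the fields come back as priority, weight, port and target (and, as with any round-robin result, the order of the two answers can vary). A rough sketch of extracting host:port pairs:

$ dig +short SRV varnish.service.consul
1 1 6081 wordpress1.node.us-west-1b.consul.
1 1 6081 wordpress2.node.us-west-1b.consul.

$ dig +short SRV varnish.service.consul | awk '{print $4 ":" $3}'
wordpress1.node.us-west-1b.consul.:6081
wordpress2.node.us-west-1b.consul.:6081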

$ dig primary.varnish.service.consul

;; ANSWER SECTION:
primary.varnish.service.consul. 0 IN A 10.0.1.1

$ dig backup.varnish.service.consul

;; ANSWER SECTION:
backup.varnish.service.consul. 0 IN A 10.0.1.2

The 2 DNS queries above show that it's possible to query a service by its tag, in our case 'primary' vs. 'backup'. The result set will contain the IP addresses of the nodes tagged with the specific tag and running the specific service we asked for. This feature will prove useful when dealing with consul-template in haproxy, as I'll show later in this post.

Load balance across services

It's easy now to see how an application can take advantage of the internal DNS service provided by consul and load balance across services. For example, an application that needs to load balance across the 2 varnish services on wordpress1 and wordpress2 would use varnish.service.consul as the DNS name it talks to when it needs to hit varnish. Every time this DNS name is resolved, a random node from wordpress1 and wordpress2 is returned via the DNS round-robin mechanism. If varnish were to run on a non-standard port number, the application would need to issue a DNS request for the SRV record in order to obtain the port number as well as the IP address to hit.
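As a concrete (if simplistic) example, a client running on any member of the consul cluster could hit varnish like this, letting DNS pick one of the healthy nodes on each resolution (www.mydomain.com being the same placeholder vhost used in the health checks above):

$ curl -H 'Host: www.mydomain.com' http://varnish.service.consul:6081/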

Note that this method of load balancing has health checks built in. If the varnish health check fails on one of the nodes providing the varnish service, that node's IP address will not be included in the DNS result set returned by the DNS query for that service.

Also note that the DNS query can be customized for the needs of the application, which can query for a specific data center, or a specific tag, as I showed in the examples above.
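Tags and data centers can also be combined in a single query. For example, resolving only the primary varnish service in the us-west-1b data center should look something like this and return only the IP of wordpress1:

$ dig primary.varnish.service.us-west-1b.consul

;; ANSWER SECTION:
primary.varnish.service.us-west-1b.consul. 0 IN A 10.0.1.1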

Force a node out of service

I am still looking for the best way to take nodes in and out of service for maintenance or other purposes. One way I found so far is to deregister a given service via the Consul HTTP API. Here is an example of a curl command that accomplishes that, executed on node wordpress1:

$ curl -v http://localhost:8500/v1/agent/service/deregister/varnish
* About to connect() to localhost port 8500 (#0)
*   Trying 127.0.0.1... connected
> GET /v1/agent/service/deregister/varnish HTTP/1.1
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> Host: localhost:8500
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Mon, 17 Nov 2014 19:01:06 GMT
< Content-Length: 0
< Content-Type: text/plain; charset=utf-8
<
* Connection #0 to host localhost left intact
* Closing connection #0

The effect of this command is that the varnish service on node wordpress1 is 'deregistered', which for my purposes means 'marked as down'. DNS queries for varnish.service.consul will only return the IP address of wordpress2:

$ dig varnish.service.consul

;; ANSWER SECTION:
varnish.service.consul. 0 IN A 10.0.1.2

We can also use the Consul HTTP API to verify that the varnish service does not appear in the list of active services on node wordpress1. We'll use the /agent/services API call and we'll save the output to a file called services.out, then we'll use the jq tool to pretty-print the output:

$ curl -v http://localhost:8500/v1/agent/services -o services.out

$ jq . services.out
{
 "http": {
   "ID": "http",
   "Service": "http",
   "Tags": [
     "primary"
   ],
   "Port": 80
 },
 "ssl": {
   "ID": "ssl",
   "Service": "ssl",
   "Tags": [
     "primary"
   ],
   "Port": 443
 }
}

Note that only the http and ssl services are shown.

Force a node back in service

Again, I am still looking for the best way to mark a service as 'up' once it has been marked as 'down'. One way would be to register the service again via the Consul HTTP API, by sending a request to the /v1/agent/service/register endpoint with a JSON payload describing the service. Another way is to simply restart the consul agent on the node in question, which will register the service that had been deregistered previously.
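Here is a sketch of what re-registering the varnish service on wordpress1 via the HTTP API might look like. Note that the registration payload uses the agent registration format (capitalized field names), which differs slightly from the on-disk service definition file; the exact HTTP method and field names should be double-checked against the Consul version in use:

$ cat register_varnish.sh
#!/bin/bash

# Sketch only: re-register the varnish service with the local agent on wordpress1.
# Payload follows the /v1/agent/service/register API; verify against your Consul version.
curl -X PUT -d @- http://localhost:8500/v1/agent/service/register <<'EOF'
{
  "ID": "varnish",
  "Name": "varnish",
  "Tags": ["primary"],
  "Port": 6081,
  "Check": {
    "Script": "curl http://localhost:6081",
    "Interval": "5s"
  }
}
EOF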

Install and configure consul-template

For the next few steps, I am going to show how to use consul-template in conjunction with consul to discover services and automate configuration changes for haproxy based on the discovered services.

I automated the installation and configuration of consul-template via an Ansible role that I put on Github, but I am going to discuss the main steps here. See also the instructions on the consul-template Github page.

In my Ansible role, I copy the consul-template binary to the target node (in my case the 2 haproxy nodes lb1 and lb2), then create a directory structure /opt/consul-template/{bin,config,templates}. The consul-template configuration file is /opt/consul-template/config/consul-template.cfg and it looks like this in my case:

$ cat config/consul-template.cfg
consul = "127.0.0.1:8500"

template {
  source = "/opt/consul-template/templates/haproxy.ctmpl"
  destination = "/etc/haproxy/haproxy.cfg"
  command = "service haproxy restart"
}

Note that consul-template needs to be able to talk to a consul agent, which in my case is the local agent listening on port 8500. The template it maintains is defined in a separate file, /opt/consul-template/templates/haproxy.ctmpl. consul-template watches the services and keys referenced in that template; whenever any of them changes, it re-renders the template, copies the result to the destination file (in my case the haproxy configuration file /etc/haproxy/haproxy.cfg), and then executes a command, which in my case restarts the haproxy service.
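Before wiring this into haproxy restarts, it can be useful to preview what consul-template would render without writing the destination file or running the restart command. Something along these lines should work, assuming your consul-template version supports the -dry and -once flags (check consul-template -h):

$ /opt/consul-template/bin/consul-template \
    -config=/opt/consul-template/config/consul-template.cfg \
    -dry -once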

Here is the actual template file for my haproxy config, which is written in the Go template format:

$ cat /opt/consul-template/templates/haproxy.ctmpl

global
  log 127.0.0.1   local0
  maxconn 4096
  user haproxy
  group haproxy

defaults
  log     global
  mode    http
  option  dontlognull
  retries 3
  option redispatch
  timeout connect 5s
  timeout client 50s
  timeout server 50s
  balance  roundrobin

# Set up application listeners here.

frontend http
  maxconn {{key "service/haproxy/maxconn"}}
  bind 0.0.0.0:80
  default_backend servers-http-varnish

backend servers-http-varnish
  balance            roundrobin
  option httpchk GET /
  option  httplog
{{range service "primary.varnish"}}
    server {{.Node}} {{.Address}}:{{.Port}} weight 1 check port {{.Port}}
{{end}}
{{range service "backup.varnish"}}
    server {{.Node}} {{.Address}}:{{.Port}} backup weight 1 check port {{.Port}}
{{end}}

frontend https
  maxconn            {{key "service/haproxy/maxconn"}}
  mode               tcp
  bind               0.0.0.0:443
  default_backend    servers-https

backend servers-https
  mode               tcp
  option             tcplog
  balance            roundrobin
{{range service "primary.ssl"}}
    server {{.Node}} {{.Address}}:{{.Port}} weight 1 check port {{.Port}}
{{end}}
{{range service "backup.ssl"}}
    server {{.Node}} {{.Address}}:{{.Port}} backup weight 1 check port {{.Port}}
{{end}}


To the trained eye, this looks like a regular haproxy configuration file, with the exception of the portions enclosed in {{ }}. These are Go template snippets which rely on a couple of template functions exposed by consul-template above and beyond what the Go templating language offers. Specifically, the key function queries a key stored in the Consul key/value store and outputs the value associated with that key (or an empty string if the value doesn't exist). The service function queries a consul service by its DNS name and returns a result set used inside the range statement. The variables inside the result set can be inspected for properties such as Node, Address and Port, which correspond to the Consul service's node name, IP address and port number for that particular service.

In my example above, I use the value of the key service/haproxy/maxconn as the value of maxconn. In the http-varnish backend, I used 2 sets of service names, primary.varnish and backup.varnish, because I wanted to differentiate in haproxy.cfg between the primary server (wordpress1 in my case) and the backup server (wordpress2). In the ssl backend, I did the same but with the ssl service.

Everything so far would work fine with the exception of the key/value pair represented by the key service/haproxy/maxconn. To define that pair, I used the Consul key/value store API (this can be run on any member of the Consul cluster):

$ cat set_haproxy_maxconn.sh
#!/bin/bash

MAXCONN=4000

curl -X PUT -d "$MAXCONN" http://localhost:8500/v1/kv/service/haproxy/maxconn

To verify that the value was set, I used:

$ cat query_consul_kv.sh
#!/bin/bash

curl -v "http://localhost:8500/v1/kv/?recurse"

$ ./query_consul_kv.sh
* About to connect() to localhost port 8500 (#0)
*   Trying 127.0.0.1... connected
> GET /v1/kv/?recurse HTTP/1.1
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> Host: localhost:8500
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: application/json
< X-Consul-Index: 30563
< X-Consul-Knownleader: true
< X-Consul-Lastcontact: 0
< Date: Mon, 17 Nov 2014 23:01:07 GMT
< Content-Length: 118
<
* Connection #0 to host localhost left intact
* Closing connection #0
[{"CreateIndex":10995,"ModifyIndex":30563,"LockIndex":0,"Key":"service/haproxy/maxconn","Flags":0,"Value":"NDAwMA=="}]

At this point, everything is ready for starting the consul-template service. On Ubuntu, I did it via this Upstart configuration file:

# cat /etc/init/consul-template.conf
# Consul Template (Upstart unit)
description "Consul Template"
start on (local-filesystems and net-device-up IFACE!=lo)
stop on runlevel [06]

exec /opt/consul-template/bin/consul-template  -config=/opt/consul-template/config/consul-template.cfg >> /var/log/consul-template 2>&1

respawn
respawn limit 10 10
kill timeout 10

# start consul-template

Once consul-template starts, it will perform the actions corresponding to the functions defined in the template file /opt/consul-template/templates/haproxy.ctmpl. In my case, it will query Consul for the value of the key service/haproxy/maxconn and for information about the varnish and ssl services (via their primary and backup tags). It will then save the generated file to /etc/haproxy/haproxy.cfg and restart the haproxy service. The relevant snippets from haproxy.cfg are:

frontend http
  maxconn 4000
  bind 0.0.0.0:80
  default_backend servers-http-varnish

backend servers-http-varnish
  balance            roundrobin
  option httpchk GET /
  option  httplog

    server wordpress1 10.0.1.1:6081 weight 1 check port 6081


    server wordpress2 10.0.1.2:6081 backup weight 1 check port 6081

and

frontend https
  maxconn            4000
  mode               tcp
  bind               0.0.0.0:443
  default_backend    servers-https

backend servers-https
  mode               tcp
  option             tcplog
  balance            roundrobin

    server wordpress1 10.0.1.1:443 weight 1 check port 443


    server wordpress2 10.0.1.2:443 backup weight 1 check port 443

I've been running this as a test on lb2. I don't consider my setup quite production-ready because I don't have monitoring in place, and I also want to experiment with consul security tokens for better security. But this is a pattern that I think will work.





