Wednesday, July 13, 2016

Using JMESPath queries with the AWS CLI

The AWS CLI, built on the botocore Python library, is the recommended way of automating interactions with AWS. In this post I'll show some examples of more advanced AWS CLI usage, using the --query mechanism based on the JMESPath JSON query language.

Installing the AWS CLI tools is straightforward. On Ubuntu via apt-get:

# apt-get install awscli

Or via pip:

# apt-get install python-pip
# pip install awscli

The next step is to configure awscli by specifying the AWS Access Key ID and AWS Secret Access Key, as well as the default region and output format:

# aws configure
AWS Access Key ID: your-aws-access-key-id
AWS Secret Access Key: your-aws-secret-access-key
Default region name [us-west-2]: us-west-2
Default output format [None]: json

The configure command creates a ~/.aws directory containing two files: config and credentials.

You can specify more than one pair of AWS keys by creating profiles in these files. For example, in ~/.aws/credentials you can have:

[profile1]
aws_access_key_id = key1
aws_secret_access_key = secretkey1

[profile2]
aws_access_key_id = key2
aws_secret_access_key = secretkey2

In ~/.aws/config, where section names do take the "profile" prefix, you can have:

[profile profile1]
region = us-west-2

[profile profile2]
region = us-east-1

You can specify a given profile when you run the aws command, for example:

# aws ec2 describe-regions --profile profile1

Let's assume you want to write a script using awscli that deletes EBS snapshots older than N days. Let's go through this one step at a time.

Here's how you can list all snapshots owned by you:

# aws ec2 describe-snapshots --profile profile1 --owner-ids YOUR_AWS_ACCT_NUMBER --query "Snapshots[]"

Note the use of the --query option. It takes a parameter representing a JMESPath JSON query string. It's not trivial to figure out how to build these query strings, and I advise you to spend some time reading over the JMESPath tutorial and JMESPath examples.

In the example above, the query string is simply "Snapshots[]", which represents all the snapshots that are present in the AWS account associated with the profile profile1. The default output in our case is JSON, but you can specify --output text at the aws command line if you want to see each snapshot on its own line of text.

Let's assume that when you create the snapshots, you specify a description that contains PROD or STAGE for EBS volumes attached to production and stage EC2 instances respectively. If you want to display only the snapshots whose description contains the string PROD, you would do:

# aws ec2 describe-snapshots --profile profile1 --owner-ids YOUR_AWS_ACCT_NUMBER --query "Snapshots[?contains(Description, \`PROD\`) == \`true\`]" --output text

The Snapshots[] array now contains a filter condition, introduced by the question mark ?. The condition uses the contains() function from the JMESPath specification and is applied to the Description field of each object in the Snapshots[] array, checking that it contains the string PROD. Note the backquotes surrounding PROD and true in the condition: in JMESPath they delimit JSON literals. I spent some quality time troubleshooting my queries when I used single or double quotes, to no avail. Inside a double-quoted shell string the backquotes also need to be escaped so that the shell doesn't interpret them as commands to be executed.
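One way to sidestep the escaping, as long as the query contains no shell variables, is to wrap the whole query in single quotes, inside which the shell leaves backquotes alone. A minimal sketch; the variable names QUERY and ESCAPED are only for illustration:

```shell
# Inside single quotes the shell treats backquotes literally,
# so no backslashes are needed.
QUERY='Snapshots[?contains(Description, `PROD`) == `true`]'

# The escaped, double-quoted form produces exactly the same string.
ESCAPED="Snapshots[?contains(Description, \`PROD\`) == \`true\`]"

echo "$QUERY"

# Example invocation (needs AWS credentials, so it is left commented out):
# aws ec2 describe-snapshots --profile profile1 --owner-ids YOUR_AWS_ACCT_NUMBER --query "$QUERY" --output text
```

The single-quoted form stops being convenient once a shell variable has to be interpolated into the query, which is why the later examples stay with double quotes and escaped backquotes.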

To restrict the PROD snapshots even further, to the ones older than, say, 7 days, you can do something like this:

DAYS=7
TARGET_DATE=`date --date="$DAYS day ago" +%Y-%m-%d`

# aws ec2 describe-snapshots --profile profile1 --owner-ids YOUR_AWS_ACCT_NUMBER --query "Snapshots[?contains(Description, \`PROD\`) == \`true\`]|[?StartTime < \`$TARGET_DATE\`]" --output text

Here I used the StartTime field of the objects in the Snapshots[] array and compared it against the target date. In this case, string comparison is good enough for the query to work.
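String comparison is enough here because ISO 8601 timestamps sort lexicographically in the same order as the dates they encode. A small sketch of the same idea outside of JMESPath; the function name is_older_than and the sample timestamps are made up for illustration:

```shell
# Succeeds (exit status 0) when the timestamp sorts before the target date.
# Neither value looks like a number to awk, so "<" compares them as strings.
is_older_than() {
    awk -v ts="$1" -v target="$2" 'BEGIN { exit !(ts < target) }'
}

if is_older_than "2016-07-01T10:15:00.000Z" "2016-07-06"; then
    echo "older"
fi
```

Note that a timestamp taken on the target date itself, such as 2016-07-06T00:00:00.000Z, does not count as older, because a longer string sorts after its own prefix.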

In all the examples above, the aws command returned a subset of the Snapshots[] array and displayed all fields for each object in the array. If you wanted to display specific fields, let's say the ID, the start time and the description of each snapshot, you would run:

# aws ec2 describe-snapshots --profile profile1 --owner-ids YOUR_AWS_ACCT_NUMBER --query "Snapshots[?contains(Description, \`PROD\`) == \`true\`]|[?StartTime < \`$TARGET_DATE\`].[SnapshotId,StartTime,Description]" --output text
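With --output text, each snapshot comes out as one tab-separated line, which makes the result easy to consume from a shell loop. A rough sketch, with a made-up sample line standing in for the aws output and a hypothetical function name print_snapshots:

```shell
# Reads tab-separated "SnapshotId StartTime Description" lines from stdin
# and prints one labelled line per snapshot.
print_snapshots() {
    while IFS="$(printf '\t')" read -r snap_id start_time description; do
        echo "id=$snap_id started=$start_time desc=$description"
    done
}

# Made-up sample line in place of real aws output.
printf 'snap-11111111\t2016-07-01T10:15:00.000Z\tPROD db volume\n' | print_snapshots
```

Splitting on tabs only keeps a description that contains spaces in one piece.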

To delete old snapshots, you can use the aws ec2 delete-snapshot command, which needs a snapshot ID as a parameter. You could use the command above to list only the SnapshotId for snapshots older than N days, then for each of these IDs, run something like this:

# aws ec2 delete-snapshot --profile profile1 --snapshot-id $id
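Looping over the IDs might look like the sketch below. The function name delete_snapshots and the DRY_RUN switch are illustrative additions, not part of the AWS CLI; with DRY_RUN=1 (the default here) the commands are only printed, which is a sensible first pass before deleting anything.

```shell
# Default to a dry run unless the caller explicitly sets DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"

# Reads snapshot IDs from stdin, one per line, and deletes each one
# using the profile given as the first argument.
delete_snapshots() {
    profile="$1"
    while read -r id; do
        # Skip blank lines.
        [ -n "$id" ] || continue
        if [ "$DRY_RUN" = "1" ]; then
            echo "aws ec2 delete-snapshot --profile $profile --snapshot-id $id"
        else
            # Keep aws from consuming the ID list on stdin.
            aws ec2 delete-snapshot --profile "$profile" --snapshot-id "$id" < /dev/null
        fi
    done
}

# Example with made-up IDs; in dry-run mode this only prints the commands.
printf 'snap-11111111\nsnap-22222222\n' | delete_snapshots profile1
```

The function expects one ID per line, so the listing command that feeds it should project a single field per row, for example with .[SnapshotId] at the end of the query.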

All this is well and good when you run these commands interactively at the shell. However, I had no luck running them out of cron. The backquotes resulted in boto3 syntax errors. I had to do it the hard way, by listing all snapshots first, then going all in with sed and awk:

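# Settings (values taken from the examples above; adjust to your setup)
PROFILE=profile1
REGION=us-west-2
# Temporary files for the intermediate listings
TMP_SNAPS=`mktemp`
TMP_PROD_SNAPS=`mktemp`

aws ec2 describe-snapshots --profile $PROFILE --region $REGION --owner-ids YOUR_AWS_ACCT_NUMBER --output=text --query "Snapshots[].[SnapshotId,StartTime,Description]" > $TMP_SNAPS

DAYS=7
TARGET_DATE=`date --date="$DAYS day ago" +%Y-%m-%d`

# Keep PROD snapshots, strip the time of day, and keep rows older than the target date
grep PROD $TMP_SNAPS | sed 's/T[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\.000Z//' | awk -v target_date="$TARGET_DATE" '{if ($2 < target_date){print}}' > $TMP_PROD_SNAPS

echo PRODUCTION SNAPSHOTS OLDER THAN $DAYS DAYS
cat $TMP_PROD_SNAPS

for sid in `awk '{print $1}' $TMP_PROD_SNAPS` ; do
    echo Deleting PROD snapshot $sid
    aws ec2 delete-snapshot --profile $PROFILE --region $REGION --snapshot-id $sid
done

rm -f $TMP_SNAPS $TMP_PROD_SNAPS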

Ugly, but it works out of cron. Hope it helps somebody out there.
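For anyone adapting the script, the filtering stage can be tried in isolation, without touching AWS, by feeding it a few made-up lines. A sketch, assuming the same tab-separated layout as the listing; the function name filter_old_prod and the sample data are hypothetical:

```shell
# Keeps PROD rows whose date (second field, time of day stripped)
# sorts before the target date given as the first argument.
filter_old_prod() {
    target_date="$1"
    grep PROD \
        | sed 's/T[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\.000Z//' \
        | awk -v target_date="$target_date" '$2 < target_date'
}

# Three made-up snapshots: an old PROD one, a recent PROD one, an old STAGE one.
printf 'snap-1\t2016-07-01T10:15:00.000Z\tPROD db\nsnap-2\t2016-07-12T10:15:00.000Z\tPROD db\nsnap-3\t2016-07-01T10:15:00.000Z\tSTAGE db\n' \
    | filter_old_prod 2016-07-06
```

Only the first sample line should survive the filter, since the second is too recent and the third is not a PROD snapshot.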

July 14th 2016: I initially forgot to include this very good blog post from Joseph Lawson on advanced JMESPath usage with the AWS CLI.
