Thursday, December 22, 2011

Load balancing and SSL in EC2

Here is another post I wrote for this year's Sysadvent blog. It briefly mentions some ways you can do load balancing in EC2, and focuses on how to upload SSL certificates to an Elastic Load Balancer using command-line tools. Any comments appreciated!

Monday, December 12, 2011

Analyzing logs with Pig and Elastic MapReduce

This is a blog post I wrote for this year's Sysadvent series. If you are at all interested in system administration/devops topics and are not yet familiar with the Sysadvent blog, you should be. It is maintained by the indefatigable Jordan Sissel, and it publishes 25 posts per year from various authors, one a day from Dec 1st through Dec 25th. The posts cover a variety of topics; to give you a taste, here are the articles so far this year:

Day 1: "Don't bash your process outputs" by Phil Hollenback
Day 2: "Strategies for Java deployment" by Kris Buytaert
Day 3: "Share skills and permissions with code" by Jordan Sissel
Day 4: "A guide to packaging systems" by Jordan Sissel
Day 5: "Tracking requests with Request Tracker" by Christopher Webber
Day 6: "Always be hacking" by John Vincent
Day 7: "Change and proximity of communication" by Aaron Nichols
Day 8: "Running services with systemd" by Jordan Sissel
Day 9: "Data in the shell" by Jordan Sissel
Day 10: "Analyzing logs with Pig and Elastic MapReduce" by yours truly
Day 11: "Simple disk-based server backups with rsnapshot" by Phil Hollenback
Day 12: "Reverse-engineer servers with Blueprint" by Richard Crowley

Jordan needs more articles for this year, so if you have something to contribute, please propose it on the mailing list.

Wednesday, December 07, 2011

Crowd Mood - an indicator of health for products/projects

I thought I just coined a new term -- Crowd Mood -- but a quick Google search revealed a 2009 paper on "Crowd Behavior at Mass Gatherings: A Literature Review" (PDF) which says:

In the mass-gathering literature, the use of terms “crowd behavior”, “crowd type”, “crowd management”, and “crowd mood” are used in variable contexts. More practically, the term “crowd mood” has become an accepted measure of probable crowd behavior outcomes. This is particularly true in the context of crowds during protests/riots, where attempts have been made to identify factors that lead to a change of mood that may underpin more violent behavior.

Instead of protests and riots, the crowd behavior I'm referring to is the reaction of users of software products or projects. I think the overall Crowd Mood of these users is a good indicator of the health of those products/projects. I may be stating the obvious here, and maybe it's been done already, but I'm not aware of large-scale studies that try to correlate the success or failure of a particular software project with the mood of its users, as expressed in messages to mailing lists, in blog posts, and of course on Twitter.

I'm aware of Sentiment Analysis and I know there are companies who offer this service by mining Twitter. But a more rigorous study would include other data sources. I have in mind something similar to this study by Kalev Leetaru: "Culturomics 2.0: Forecasting large-scale human behavior using global news media tone in time and space". This study mined information primarily from the archives of the "Summary of World Broadcasts (SWB)" global news monitoring service. It analyzed the tone or mood of the news regarding a particular event/person/place, and it established correlations between the sentiment it mined (for example negative news regarding Egypt's then-president Mubarak) and events that happened shortly afterwards, such as the Arab Spring of 2011. The study actually talks not only about correlation, but also about forecasting events based on current news tone.
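To make the idea a bit more concrete, here is a minimal sketch of what tracking tone over time could look like for a software project. It is not the method used in the study above: the word lists are tiny hypothetical placeholders, the function names `tone` and `mood_by_period` are made up for illustration, and the input is assumed to be (period, text) pairs already extracted from mailing lists, blog posts or tweets.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy lexicons; a real study would use a vetted tone dictionary.
POSITIVE = {"love", "great", "fast", "thanks", "works"}
NEGATIVE = {"hate", "broken", "slow", "regression", "bad"}


def tone(text):
    """Return (positive words - negative words) / total words for one message."""
    words = [w.strip(".,!?;:()\"'").lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    score = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    return score / len(words)


def mood_by_period(messages):
    """Average the tone of (period, text) pairs per period, sorted by period."""
    buckets = defaultdict(list)
    for period, text in messages:
        buckets[period].append(tone(text))
    return {period: mean(scores) for period, scores in sorted(buckets.items())}


if __name__ == "__main__":
    sample = [
        ("2011-10", "great release"),
        ("2011-11", "broken and slow"),
        ("2011-11", "works great"),
    ]
    for period, mood in mood_by_period(sample).items():
        print(period, round(mood, 3))
```

A falling average from one period to the next would be the kind of signal worth correlating with what happens to the project afterwards; anything beyond that (weighting sources, handling negation or sarcasm) is outside what this sketch attempts.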

I believe similar studies mining the Crowd Mood would be beneficial to any large-scale software product or project. For example, Canonical would be well-advised to conduct such a study in order to determine whether their decision to drop Gnome in favor of Unity was good or not (my feeling? it was BAD! -- and I think the Crowd Mood surrounding this decision would confirm it).

Another example: Python 3 and the decision not to continue releasing Python 2 past 2.7 (that is, not to release a 2.8). Good or bad? I say BAD! (see also Armin Ronacher's recent blog post on the subject, which brilliantly expresses the issues around this decision).

Yet one more example: the recent changes in the Google UI, especially GMail. BAD!

These examples have a common theme in my opinion: the unilateral decision by a company or project to make non-backwards-compatible changes without really consulting its users. I know Apple can pull this off, but they're the exception, not the rule. The attitude of "trust us, we know what's good for you" leads to failure in the long run.

Somebody should build a product (or even better, an Open Source project) around Crowd Mood analysis. Maybe based on Google's Prediction API, which has functionality for sentiment analysis.
