If It’s Not on Prod, It Doesn’t Count: The Value of Frequent Releases

At Shutterstock, we like to release code.  A lot.  We do it about 60 times per week.

Frequent code releases have become somewhat of a mantra among today’s fast-moving startups, but the value they bring isn’t always articulated well.  In fact, there are a lot of reasons not to push frequently: you could release shoddy or incomplete software, it might not be thoroughly tested, or you might not like the constant pressure of production deployments.

So it’s worth stepping back to look at all the benefits that frequent releases bring:

1) You deliver value to customers more quickly.

This is the first principle of the agile manifesto: “Our highest priority is to satisfy the customer through early and continuous delivery of valuable software.”  Features that are sitting in your development environment aren’t benefiting your customers.  Frequent releases get those features into the wild so that your customers can use them.

Be sure to relentlessly focus on delivering value to customers.  Too often, frequent releases are interpreted as breaking big, complicated projects into component parts: first you tackle database schema changes, then business logic, then graphic design.  That’s not the point.  The point is to deliver complete, valuable features to customers as quickly as possible.

This idea also isn’t about releasing half-baked or hacky code.  The art is in finding the smallest implementation the team can develop, test, and release within a short period of time.   It helps to ask yourself, “what is the smallest impactful change we can make to get to our goal?”  Then, challenge what you decided was “smallest” — can you really not get there with an even smaller implementation?  In the end, you want to do the minimal work to test your idea with customers, then learn and repeat.

2) You learn quickly.

The lean software movement has popularized a revolutionary business philosophy: We don’t know what the best thing to do is.  The only way we can know it is to put something in front of customers and get their reactions to it.

By releasing software frequently, you have many more opportunities to get customer feedback and pivot based on it.  You avoid going too far down a path that’s not valuable.

3) It forces you to break big ideas into manageable pieces.

Big projects are risky, complex, and interminable.  By breaking big projects into small pieces and releasing one piece at a time, we not only deliver value more quickly, but we avoid death marches that demoralize software teams.

This is far easier said than done, because everyone loves big, splashy projects — they generate attention, they get people excited, and they offer a fleeting sense of accomplishment.  But users rarely like big, splashy projects.  In fact, users generally don’t like any sort of change; it forces them to re-learn something that they don’t want to re-learn.  By delivering small pieces of functionality, you provide additional value to users without surprising them with radical change.

Some people will object that an incremental process ultimately takes longer than a monolithic one.  That’s okay — it’s a trade-off we’re very happy to make, for two reasons: first, although the final result may end up in customers’ hands later, we’ve been delivering small pieces of value the whole time.  Second, it lets us change direction along the way as we learn instead of committing to a big project that we’re not sure has value.

4) You avoid horrible merges.

Merging code has always been and always will be a pain in the ass.  The more we can avoid it, the happier and more productive we’ll all be.  Frequent releases mean that code merges are small and simple (if they’re necessary at all).  This means you can move more quickly, and developers stay happier.

5) With good automated testing and an a/b testing platform, you reduce risk.

One complication of releasing frequently is making sure that your software works well and is thoroughly tested.  That’s why automated tests are so important in an agile environment — they let you quickly and thoroughly ensure that your code works.  Shorter release cycles inherently produce smaller code pushes.  In general, smaller code pushes are less risky simply because fewer things can go wrong.  By coupling small code pushes with automated testing, you can move quickly with little risk.

A good a/b testing platform also lets you iterate rapidly with low risk.  If you’re able to test changes on 1% of your customers, you drastically reduce the risk associated with rolling out new features, and are able to learn and adapt more quickly.
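One common way to expose a change to roughly 1% of customers is deterministic bucketing. The sketch below is not our a/b testing platform. The function names, the experiment label, and the hashing scheme are all assumptions made for illustration.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: map a customer to a stable bucket from 0 to 99.
# Hashing the experiment name together with the id keeps bucket
# assignments independent between experiments.
sub bucket_for {
    my ($experiment, $customer_id) = @_;
    my $digest = md5_hex("$experiment:$customer_id");
    return hex(substr($digest, 0, 8)) % 100;
}

# True when the customer falls inside the rollout percentage.
sub in_rollout {
    my ($experiment, $customer_id, $percent) = @_;
    return bucket_for($experiment, $customer_id) < $percent ? 1 : 0;
}

# Example: show a (hypothetical) new feature to about 1% of customers.
for my $id (1001 .. 1005) {
    printf "customer %d: %s\n", $id,
        in_rollout("new_search_box", $id, 1) ? "variant" : "control";
}
```

Because the bucket depends only on the experiment name and the customer id, a customer sees the same variant on every visit, and widening the rollout from 1% to 5% only adds customers to the variant.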

6) You reduce complexity.

Lots of developers like to over-engineer.  Given enough time, we’ll build dozens of layers of unnecessary abstraction (see Parkinson’s Law).  By requiring frequent releases, we push ourselves to choose the simplest path forward.

If not done well, it is possible to paint yourself into a corner with this approach.  It’s important to remember that frequent releases don’t mean short-sighted thinking.  You can still get to a distant goal by approaching it one step at a time.

7) It keeps people motivated.

Who wants to work on a project for months (or years) and never have the thrill of showing it off to their friends?  Or hearing what customers think of it?  Frequent releases motivate people by letting them see the results of their hard work.

We use the scrum/agile framework, with two-week sprints and a demo at the end of each iteration.  A few years ago we started enforcing a rule to drive this point home: you can only demo what’s on production.  If it’s not on prod, it doesn’t count. That’s our way of saying, “You can code all you want, but all that matters is what our customers can do with it.”

For all these reasons, we evangelize frequent releases.  That’s not meant to minimize their difficulty.  It’s often very challenging to figure out how to take a small step forward that delivers value to customers while working towards a more distant goal and letting you change direction if necessary.  We never said it was easy.  In fact, it’s probably one of the most difficult problems in modern software development, because it requires developers to not only be great architects but also appreciate customer needs and product development.  But it’s the best method we’ve found for moving our business forward quickly while minimizing risk.


Perl: When DWIM Doesn’t

We’ve written in the past of our love for Perl. We meant it. But in any loving relationship, there will also be hard parts and unpleasant surprises. These are some tales of unpleasant surprises.

Surprise One: Bonus Feature

Here is some code that sets up a global $config hash, setting a file path the application should read data from.

our $config;
$config->{file_paht} = "/opt/app/data_file";

And the code that reads the data file:

open (my $fh, "<", $config->{file_path})
    or  die "can't open $config->{file_path}: $!";

my $data;
{
    local $/ = undef;
    $data = <$fh>;
}

You probably spotted that file_paht typo before we did. A warning or error would have helped us spot it earlier, but instead we got a bonus feature.

Perl decided what we really wanted was an anonymous temporary file, and provided us one. A brand-new, anonymous tempfile, that could never have been written to, opened for reading.

This bonus feature is documented as a “special” case in the sixteenth or seventeenth paragraph of perldoc -f open. Special, indeed. So special that to debug it we ran an strace, thinking …

where the f^H Sam Hill did that open(“/tmp/PerlIO_Z2sAqY”, O_RDWR…) come from?

… and grepped the source to find the answer, and re-read perldoc -f open to try to find our sanity.

Avoiding this bug requires being more defensive, which is always a good idea when reading disk files in production code:

if (exists $config->{file_path}  and  -r $config->{file_path}) {
    ...
}
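The same check can be folded into a small helper so that a missing or misspelled key fails loudly before open is ever called. This is a minimal sketch, not the code we run in production; the helper name slurp_from_config and its error messages are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole file named by a config key,
# refusing to call open() when the key is missing or undefined.
sub slurp_from_config {
    my ($config, $key) = @_;

    die "config key '$key' is missing or undefined\n"
        unless defined $config->{$key};

    my $path = $config->{$key};
    open(my $fh, "<", $path)
        or die "can't open $path: $!\n";

    local $/ = undef;    # slurp mode: read the whole file at once
    my $data = <$fh>;
    close $fh;
    return $data;
}

# The same typo as in the example above now raises an error
# instead of silently opening an anonymous temporary file.
my $config = { file_paht => "/opt/app/data_file" };
my $data = eval { slurp_from_config($config, "file_path") };
print "caught: $@" if $@;
```

Checking with defined rather than exists also catches a key that is present but was set to undef.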

In writing this article we began to consider this case a bug in perl, and went to file one at rt.perl.org, only to find that the wonderful Perl 5 Porters had beaten us to it, and that there is a current thread on the mailing list concerning this bug. Thanks p5p!

Surprise Two: When DWIM Doesn’t

We are constantly A/B testing at Shutterstock. Sometimes we need to usurp a random test assignment to view specific variants. The overrides are cached in the session:

$session->{ab_variant_overrides} = [34, 29];

Code checks if the variants are being usurped, and builds the appropriate template data structure:

if (exists $session->{ab_variant_overrides}) {
    # template expects custom_overrides to be [int, ...]
    $template->{custom_overrides} = $session->{ab_variant_overrides};
}

At one point we needed a quick hack to do something special inside a usurper variant:

if (grep { $_ == 42 } @{ $session->{ab_variant_overrides} }) {
    # give them something special
}

You may have spotted a bug in that code. If we haven’t yet assigned to $session->{ab_variant_overrides}, we’ll be dereferencing an undefined value. What should happen in that case?

One might expect Perl’s fatal “Can’t use an undefined value as an ARRAY reference” under strictures. Instead, the presence of the grep springs an empty array reference into place and assigns it to $session->{ab_variant_overrides}. Oops.

This behavior is hinted at in item 6 of the “Making References” section in perlref.

References of the appropriate type can spring into existence if you dereference them in a context that assumes they exist. Because we haven’t talked about dereferencing yet, we can’t show you any examples yet.

A quick fix here is to be more defensive by changing the dereferencing:

grep { $_ == 42 } @{ $session->{ab_variant_overrides} || [] }
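The difference between the two forms is easy to see side by side. The sketch below wraps each in a function; the function names are hypothetical, and the sessions are plain empty hashes rather than real session objects.

```perl
use strict;
use warnings;

# Dereferences the key directly inside grep, as in the quick hack above.
sub has_variant_unsafe {
    my ($session, $id) = @_;
    return scalar grep { $_ == $id } @{ $session->{ab_variant_overrides} };
}

# Falls back to an empty array reference, so the session is only read.
sub has_variant_safe {
    my ($session, $id) = @_;
    return scalar grep { $_ == $id } @{ $session->{ab_variant_overrides} || [] };
}

my $first = {};
has_variant_unsafe($first, 42);
print "unsafe version created the key\n"
    if exists $first->{ab_variant_overrides};

my $second = {};
has_variant_safe($second, 42);
print "safe version left the session alone\n"
    unless exists $second->{ab_variant_overrides};
```

The unsafe version matters here because the later exists check on ab_variant_overrides would then succeed for a session that never had an override assigned.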

What are your tales of surprise?


Introducing Rickshaw: A JavaScript toolkit for creating interactive time series graphs

We’re happy to share a project we’ve been working on that helps us see into our data.  It’s a JavaScript toolkit for creating interactive time series graphs, called Rickshaw.

At Shutterstock we use Rickshaw to read A/B tests, to monitor application and site health in real time, to see into dense product metrics, and all sorts of other things.

It has been a primary goal during the development of Rickshaw to keep the API simple. It has also been a primary goal to not obscure what lies beneath.  We use Mike Bostock’s wonderful d3 library to manipulate SVG, and those layers stay accessible if you want to get fancy.

Finally, we’ve kept the scope of our problem domain small.  Getting started with a simple graph is a couple of lines of JavaScript and HTML.  From there we can add new functionality by consuming extensions that come with the library.  Here is an example that shows some of them off.

Rickshaw helps us visualize dense time series data.  We hope you have similar success if you give it a try.  Here’s a listing of examples, and a tutorial to get you started.


Feersum in the Wild: Perl’s Evented Web Server

We use open source software in just about every form it takes: programming languages, operating systems, web servers, databases… even firewalls.  We try to release some of our own software, too.  Open source software has all kinds of advantages, but one of my favorites is how easy it is to fix problems if any arise.

Earlier this year, we added autocomplete functionality to our search interface.  Autocomplete is a simple concept that has some tricky implementation details, especially if you have a big data set.  It requires a fast server-side lookup table, and the server response has to be lightning quick.

We’ve experimented with a lot of web servers at Shutterstock.  Our mainstay is Apache, but over the years we’ve toyed with lighttpd, nginx, node, and others.  But hooking logic into a full-featured webserver leaves you with a pretty bulky system, and we thought we’d be better off going with something lighter.

We looked around and found Feersum.  Feersum is an event-based webserver (like nginx and node) that’s written in Perl (or more accurately, a combination of Perl and C) and is based on EV/libev (the same event loop that node uses).  We whipped up a prototype with it and were impressed by its speed — 2,000 requests/sec with a 30ms mean response time with 100 concurrent connections on a lightweight box.  That’s quick!
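To give a feel for how little code a lightweight autocomplete daemon needs, here is a minimal sketch written against the PSGI interface, which Feersum can serve. It is not our production implementation: the term list, the q query parameter, and the limit of 10 results are assumptions made for illustration, and a real service would use a faster lookup structure than a linear scan.

```perl
use strict;
use warnings;
use JSON::PP qw(encode_json);

# Hypothetical in-memory term list; a real service would load its data set.
my @terms = sort qw(camera canyon car cat city);

# Return up to $limit terms that start with $prefix.
sub complete {
    my ($prefix, $limit) = @_;
    return () unless defined $prefix && length $prefix;
    my @matches = grep { index($_, $prefix) == 0 } @terms;
    splice(@matches, $limit) if @matches > $limit;
    return @matches;
}

# PSGI application: reads ?q=prefix and answers with a JSON array.
my $app = sub {
    my ($env) = @_;
    my ($q) = ($env->{QUERY_STRING} // '') =~ /(?:^|&)q=([^&]*)/;
    if (defined $q) {
        $q =~ tr/+/ /;                                # form-encoded spaces
        $q =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;    # percent-decoding
    }
    my @matches = complete(lc($q // ''), 10);
    return [
        200,
        ['Content-Type' => 'application/json'],
        [encode_json(\@matches)],
    ];
};

# A .psgi file would end by returning the application: $app;
```

With Plack and Feersum installed from CPAN, an app like this can typically be started with something like plackup -s Feersum app.psgi; check the Feersum documentation for the options your version supports.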

So we wrote an implementation of autocomplete with it and launched it. And it was a great success — when it worked.  We noticed that sometimes it would simply fail on certain requests.  The host servers seemed fine.  The daemon was still listening and responding to requests.  But for some reason we’d sporadically get “400 Bad Request” errors.

At first we assumed this was a problem with the client: our AJAX code must have somehow been buggy and passed in bad data.  But we ruled that out pretty quickly, and soon isolated the problem to the daemon.  We were able to reproduce the issue by sending simple and innocuous HTTP requests that would nonetheless return “400 Bad Request” responses.  We scratched our heads a bit, and then did that glorious thing that open source software lets you do: we dove into the code.

Here, life got more interesting.  It turns out Feersum is based on another open source project, picohttpparser.  That presented a challenge, both because it was slightly harder to isolate the problem and also because picohttpparser is meant to be lightning fast and is therefore written with a bunch of effective but obscure optimizations.

So we spent a weekend hacking away at it, adding sprintf’s (debugger, bah!) to every line we could to understand the problem.  We got pretty close to figuring it out, but ultimately got tripped up by not knowing whether the problem was in how Feersum was calling picohttpparser, or in picohttpparser itself.

Happily, open source software gives you an easy next step: contact the author.  So we gathered all the information we could about the problem, tried to sum it up as succinctly as we could, and posted an issue on GitHub.  Within two days, the author had identified the error, patched it, and released a new version.  Thanks, stash!

Delighted with the quick response, we installed the new version, ran some tests, and saw our daemon work flawlessly.

And check it out — we’re now humming along with Feersum serving a snappy response on every keypress of every image search!  That’s way cool.


Faster TCP Slow Starts

At Shutterstock we’re obsessed with speed. Faster page loads mean happier customers, and we like happy customers.

Shutterstock’s customers are widely distributed around the globe. Our primary set of web servers is co-located in New York. We push a lot of our static assets (image thumbnails, JavaScript, CSS, and heck, let’s include DNS entries in that category) to globally distributed edge caches, but our servers still have to generate markup and push it over the long wire to web browsers far away from NYC.

After some advocating, hacking and testing (well reported over at LWN), a patch has landed in Linux 2.6.39 that raises TCP/IP’s initial congestion window from 4 packets to 10 packets. It’s going to take some time for this patch to make its way into “the enterprise”, but we’d like our customers to benefit from this larger initial payload immediately. So we’ve applied a patch to our web server kernels and measured the benefit to our customers.
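A little arithmetic shows why the larger initial window helps distant customers most. The sketch below uses an idealized model of slow start (the window doubles every round trip, with no packet loss and no receiver limit), and the page size and segment size are assumed numbers, not measurements from our servers.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Idealized slow start: the window starts at $initcwnd segments and
# doubles every round trip, with no loss and no receiver window limit.
sub round_trips {
    my ($bytes, $mss, $initcwnd) = @_;
    my $segments = ceil($bytes / $mss);
    my ($sent, $cwnd, $rtts) = (0, $initcwnd, 0);
    while ($sent < $segments) {
        $sent += $cwnd;    # one window's worth of segments per round trip
        $cwnd *= 2;
        $rtts++;
    }
    return $rtts;
}

# Assumed numbers for illustration: 60,000 bytes of markup, 1,460-byte segments.
for my $initcwnd (4, 10) {
    printf "initcwnd %2d: %d round trips\n",
        $initcwnd, round_trips(60_000, 1460, $initcwnd);
}
```

Under these assumptions the page needs four round trips with an initial window of 4 and three with an initial window of 10. A round trip to New York is cheap from nearby and expensive from the other side of the globe, so saving one matters far more to remote customers.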

The results are fantastic. Things stayed fast in New York, and got a lot faster around the globe.

(Update based on some feedback in these comments and on proggit):

The data being measured there is how long it takes us to push page markup over the wire to WebMetrics polling agents in the cities that are labeled. The X axis is days. Those are numbered 1 to 7, left to right. Y is markup delivery time in fractions of seconds, with a base of zero. Pushing markup over the wire from our webservers to the WebMetrics polling agent in New York is fast, and stays slightly noisy but fairly constant over the week period in the graph.

Pushing markup over the wire from our webservers to the WebMetrics polling agents in remote geographies gets significantly faster (closer to zero seconds) as the 6th day on the graph paints. Then it stays faster on the 7th day.


Why Perl?

At Shutterstock we use Perl, and have built our industry-leading website using a Perl and open source stack.  Here are some reasons we love Perl and why developers feel fanatical about it!

Perl Programmers love their language. For most Perl programmers, using Perl is more than a job; it’s part of one’s identity.  When using Perl I often feel like I am part of something greater than the individual code I may be writing at this temporal moment.  Objective studies seem to validate this anecdotal experience.

True Freedom. Perl is free software / open source software.  This means I can always find the source code and often can find the core developers in charge of some code.  I never need to worry about when some company is going to service my bug ticket, and I am never hostage to the changing whim of corporate strategy.   Because of this true freedom the Perl community is completely in charge of itself and has spent years doing the hard job of self-organizing and learning to coordinate our long-term objectives.

Great Community. Since Perl programmers know that the future of our language is solely in our own hands, this has fostered a strong sense of community and shared destiny.  This is not to say we live in a sort of new Eden; certainly there are arguments and differences of opinion.  However, our willingness to respect those who prove their point with code and not just words enhances our contentious meritocracy for the benefit of all.

CPAN. Quite simply there is nothing like it, and it gets better all the time.   With Perl you have one command to access literally tens of thousands of open sourced modules, covering everything from the quirky to the religious to the serious.  Additionally, CPAN is more than just the free modules; its ecosystem includes a distributed delivery system as well as a test collection framework (more than 16 million tests collected across a variety of platforms and Perl versions).

Awesome Tools. Although it goes without saying that my CPAN comment above would cover this, I think it is worth a shout out to a few of my personal favorites, without which I might have left the Perl community years ago.

  • Moose and the extended MooseX software ecosystem.  Simply the best way to model objects in Perl or any other language as far as I am concerned.
  • Plack.  This creates a strong foundation for web application building in Perl that is easy on developers, straightforward to deploy and encourages an unprecedented level of cooperation between all the different frameworks for authoring dynamic websites.
  • Perlbrew, local::lib, and Module::Install.  A great tool-chain for developers to organize, code and distribute applications.
  • Test::*.  Perl just has the best and most developer friendly testing code.  No surprise we have such a strong, test-centered culture!

Awesome Third Party Support. Want to connect to Twitter?  Access Facebook?  Search with Google or Bing?  Want to deploy or manage your EC2 clouds?  Or maybe you like Rackspace?  Maybe you love Github?  Or you are using a Platform as a Service provider like Dotcloud or Stackato for easy deployment?  Perl has you covered for this and much, much more!

Jobs. Shutterstock.com is always looking for awesome Perl developers.  In addition, Perl jobs tend to be very developer-centered.  The best Perl developers are often respected within their companies.  My personal experience as an IT worker has been significantly better than the average of my peers in other languages.  As a Perl developer, I have never interviewed and been hired into a job that I regretted or didn’t like at a later date.



The Importance of “noatime”

At Shutterstock we’ve been putting a lot of effort into rolling out an infrastructure-wide configuration management / provisioning system to ensure all our servers are built correctly every time.  This system consists of cobbler / puppet to ensure the appropriate packages and configurations on each server set that we have (thumb, web, DB, memcache, etc.), and we use some other cool tools like fabric / mcollective to do bulk jobs across pools of servers.  It’s been a lot of work and it’s always nice to see some validation that it was worth it.  I’ll write a larger post on some of these later on.

Below is a good example of a server that is currently not “puppetized” and largely was built by a human to be a replica of the other three servers in the pool.  This server was missing a simple “noatime” mount option.  By simple we mean the fix was completely trivial, though finding the cause of the problem itself was something we discussed for quite some time.  Not a ton of ops time was lost… but I think we spent some hours scratching our heads before one of our engineers really wanted to sort this out.  Check out the difference that this made on load.

Before:

After:

There are a few wins here:

Performance – Major decrease in load on thumb02

Sexy – A graph that looks like the server set is scaling horizontally as we would expect

Validation – The warm fuzzy thought knowing that with puppet on hosts a misconfiguration like this should never happen again (and if it does we can always do a diff to find out what’s awry).
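Configuration management is the real fix, but a quick audit can also catch a hand-built server like this one. The sketch below is a hypothetical check, not part of our puppet setup: it scans text in the format of Linux's /proc/mounts and lists mount points that lack the noatime option. The function name and the sample data are made up for illustration.

```perl
use strict;
use warnings;

# Given text in /proc/mounts format, return the mount points of the given
# filesystem types whose options do not include noatime.
sub mounts_without_noatime {
    my ($mounts_text, @fstypes) = @_;
    my %wanted = map { $_ => 1 } @fstypes;
    my @missing;
    for my $line (split /\n/, $mounts_text) {
        # Fields: device, mount point, filesystem type, options, dump, pass
        my ($device, $mount_point, $fstype, $options) = split ' ', $line;
        next unless defined $options && $wanted{$fstype};
        my %opt = map { $_ => 1 } split /,/, $options;
        push @missing, $mount_point unless $opt{noatime};
    }
    return @missing;
}

# Hypothetical sample; on a Linux host the text would come from /proc/mounts.
my $sample = <<'END';
/dev/sda1 / ext4 rw,relatime 0 0
/dev/sdb1 /data/thumbs ext4 rw,noatime 0 0
/dev/sdc1 /data/cache xfs rw,relatime 0 0
END

print "missing noatime: $_\n" for mounts_without_noatime($sample, qw(ext4 xfs));
```

Run across a pool of servers with a bulk tool, a check like this would flag the one host whose mounts differ from its siblings.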


Our Developer Ethos

Over the years, the Shutterstock development team has grown from two people to almost twenty today.  We’ve tried to maintain a consistent development culture during that time, because it provides a spirit that makes us want to be here and keep getting better at what we do.

A while back we decided to see if we could write down a few core principles that we stand for.  What we came up with is our “Developer Ethos.”  We do our best to uphold it and look to it for guidance.

It’s a living document that we like to re-visit and update.  Here’s its latest incarnation.

Get feedback as early as possible, and work together

  • discuss design, implementation, approach, etc
  • get feedback from developers, product managers, QA, end users
  • feedback mitigates risk that we’ll get too far off course
  • working together consistently produces better ideas, better code
  • sharing encourages us to write code we’re proud to show

Prefer encapsulated, loosely coupled systems for core functionality

  • loose coupling means you’re only solving one problem at a time
  • it’s a good sign if we can release the code on CPAN
  • as with unix command line utils, firefox extensions, etc, less is more

Choose the smallest implementation that provides a way forward

  • if we need to iterate, we’ll iterate — let’s solve the problem first
  • “a way forward” is a key part of this ethic — avoid creating tech debt, and think about what’s good for the business in the long-run
  • think incrementally, develop incrementally, deploy incrementally (this is harder than it sounds — challenge yourself to break things up into smaller pieces)
  • diffs should consist of core functionality changes
  • smaller implementations mean less testing, fewer bugs
  • if a similar problem has been solved before, reuse that solution
  • if we need to refactor, let’s, but separately from new functionality

Make your code readable, understandable, maintainable

  • optimize for humans reading your code, then for performance
  • name variables carefully, be explicit, write comments
  • a wise man once said, “write for someone who’s not as smart as you”

Welcome to the Shutterstock Tech Blog

One of the core principles of Shutterstock is to provide the digital building blocks for designers to produce creative content. Within the tech team, we think of ourselves as doing that at a lower level: we provide the digital building blocks to make the Shutterstock sites possible. We do this implicitly, by creating code that serves millions of pageviews a day, and explicitly, by creating building blocks that other developers can use to be more productive and gain more insight into their work. We hope to talk a lot about building blocks on this blog.

All of our sites are built on the LAMP stack, with the “P” being Perl.  Perl isn’t the sexiest of languages these days, but there are a ton of exciting things going on in the Perl world lately, from its post-modern object framework to the unmatched number of libraries freely available on CPAN. We’ll be talking more about Perl, and why we consider it to be the sexiest unsexy language around.

We’re also proud to use open source software in all parts of our site. In fact, until recently we could say that every packet entering and leaving our site traveled through commodity hardware running on an open source stack (we just moved to commercial firewalls and load balancers — but everything else remains open source). Our team contributes to a few open source projects, which we keep track of at code.shutterstock.com.

We’ve been inspired by recent trends to merge development and operations teams, and we try to actively promote that within our own group. We’re doing cool things to bring performance data into the hands of developers, and allow people to monitor how their changes affect key site metrics. We’ll be showing off some of what we’ve come up with soon.

Finally, we’re fervently agile. We release code to production as often as we can. Instead of treating frequent code pushes as a risk, we treat our rate of deployments as a matter of pride. We work in two-week iterations, crave user feedback, and pivot based on the metrics we collect.

We’ll be talking about all these topics and a lot more on this blog. We welcome your comments as we reveal some of our favorite projects and practices.
