Our engineering teams support many different sites, including Shutterstock Images, Shutterstock Videos, Shutterstock Contributor, Bigstock, Offset, and Skillfeed (now part of Bigstock).

All these sites rely on a core set of REST services for functionality like authentication, payment, and search. Because these core services are so critical, we need to know at all times whether they’re functioning properly, and to be alerted when they aren’t. There are plenty of solutions for server-level monitoring, but we couldn’t find a good, simple solution for service or API monitoring. So we built one. It’s called ntf, for network testing framework, and it’s part of a large collection of tools that we’ve open-sourced.

ntf Overview

ntf is based on Node.js and nodeunit. The framework provides a server that polls multiple endpoints on all our services once per minute and verifies that they’re responding correctly. We’ve connected it to Icinga and OpenPOM, the two monitoring tools we use, so that we can get alerted if any of our ntf tests fail.

We’ve set things up so that our developers just need to add some tests to a git repository and deploy their changes to production. From there, we get automated testing, reporting, and alerting for free.

Working With ntf

The ntf framework is broken into three pieces:

  • ntf is a command-line tool to run specific tests
  • ntfd is a library for creating a daemon that runs ntf tests at specified intervals in an infinite loop and sends the results to ntfserver
  • ntfserver is a server that stores events from ntfd in a MySQL database and provides a web interface to report the status of current and past tests

Let’s work through a full example, and start with ntf itself. Here’s a simple test to check if a particular service is reachable:

var accounts = require('ntf').http('https://accounts');
exports.accounts_reachable = accounts.get('/', function(test) {
  // expect a 200 status code
  test.statusCode(200);
  // signal that the test is finished
  test.done();
});

This test checks the root URL of our accounts service, and makes sure it’s returning a “200 OK” HTTP response.

We can also add a test to make sure the response to the login resource looks ok:

exports.accounts_login = accounts.get('/login', function(test) {
  // check the response body for the text "Sign in"
  test.body("Sign in");
  test.done();
});

If we put both of these tests in a file called accounts.js and run it, we get:

$ ntf accounts.js

accounts.js
✓ accounts_reachable
✓ accounts_login

OK: 2 assertions (512ms)
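
To make the two assertions concrete, here is a dependency-free sketch of roughly what they check, written with Node’s built-in http and https modules rather than ntf. The names checkResponse and fetchAndCheck are hypothetical, and the body check assumes a plain substring match, which may differ from how ntf compares bodies.

```javascript
var http = require('http');
var https = require('https');

// Hypothetical helper (not part of ntf): compare a response with expectations
// and return a list of human-readable failures; an empty list means success.
function checkResponse(response, expected) {
  var failures = [];
  if (expected.statusCode !== undefined &&
      response.statusCode !== expected.statusCode) {
    failures.push('expected status ' + expected.statusCode +
                  ', got ' + response.statusCode);
  }
  if (expected.bodyContains !== undefined &&
      String(response.body).indexOf(expected.bodyContains) === -1) {
    failures.push('body does not contain "' + expected.bodyContains + '"');
  }
  return failures;
}

// Hypothetical helper: GET a URL, buffer the body, then run checkResponse.
// The callback receives (error, failures).
function fetchAndCheck(url, expected, callback) {
  // Pick the client module that matches the URL scheme.
  var client = url.indexOf('https:') === 0 ? https : http;
  client.get(url, function(res) {
    var body = '';
    res.setEncoding('utf8');
    res.on('data', function(chunk) { body += chunk; });
    res.on('end', function() {
      var response = { statusCode: res.statusCode, body: body };
      callback(null, checkResponse(response, expected));
    });
  }).on('error', callback);
}
```

Calling fetchAndCheck('https://accounts/login', { statusCode: 200, bodyContains: 'Sign in' }, callback) would combine both checks from the tests above in a single request.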

ntf offers a range of tests to cover more complicated interactions with all our services. For a full list, see the documentation.

To run the tests continually in a loop, we use ntfd. ntfd is a library that you include in a Node project, much like Express. You can build a simple daemon with ntfd like so:

var ntfd = require('ntfd')

ntfd({
  path: __dirname + '/tests',
  agent: 'test',
  plugin: [
    new ntfd.plugin.ConsoleEmitter(),
    new ntfd.plugin.HttpEmitter('http://localhost:8000/store')
  ],
  test: {
    interval: 10
  }
})
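
Conceptually, a daemon like this repeats one cycle: run every test, then hand each result to every configured emitter. The sketch below models that cycle in plain Node.js. It is not ntfd’s implementation, and runOnce, startLoop, and the shape of the test and emitter objects are all hypothetical.

```javascript
// Conceptual model only; none of these names come from ntfd.

// Run each test once and pass every result to every emitter.
// A test is { name, run }, where run() returns true on success.
function runOnce(tests, emitters) {
  return tests.map(function(t) {
    var started = Date.now();
    var ok;
    try {
      ok = t.run() === true;
    } catch (err) {
      ok = false; // a thrown error counts as a failed test
    }
    var result = { name: t.name, ok: ok, durationMs: Date.now() - started };
    emitters.forEach(function(emitter) { emitter.emit(result); });
    return result;
  });
}

// Repeat the cycle at a fixed interval; returns the timer so callers can stop it.
function startLoop(tests, emitters, intervalMs) {
  runOnce(tests, emitters);
  return setInterval(function() { runOnce(tests, emitters); }, intervalMs);
}

// An emitter that prints results, playing the role of a console plugin.
var consoleEmitter = {
  emit: function(result) {
    console.log((result.ok ? 'ok   ' : 'FAIL ') + result.name);
  }
};
```

In this model, a second emitter that POSTs each result to an HTTP endpoint would play the role that HttpEmitter has in the configuration above.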

The easiest way to get started with ntfd is to copy the ‘example’ directory in the ntfd repository, put the test files (like accounts.js, above) in the ‘tests’ directory, and run:

$ node .

After a minute, you’ll see the output of the tests on the command line. With an HttpEmitter defined, ntfd also sends test data to http://localhost:8000/store, where ntfserver is meant to receive it. ntfserver provides a web interface to the test results. To start ntfserver, run:

$ ./bin/ntfserver
   info  - socket.io started

Then, navigate to http://localhost:8000 to see the ntfserver dashboard.

These three components provide a complete framework to manage, run, and monitor tests.

How We Use ntf

As we create RESTful services, we write ntf tests for each new resource to verify that the resource keeps functioning properly. We have a git repo dedicated to our ntf tests, which contains a directory for each of our services. Each directory holds the tests that run against that service.

We always have the ntfserver dashboard up on a monitor. The dashboard lets us drill down to find out the details of any problems that it reports.

Meanwhile, Icinga and OpenPOM can hit the same dashboard and request a /status resource to determine whether anyone needs to be alerted to a problem.

ntf has been a great help in letting us rapidly expand our services infrastructure with the confidence of knowing our systems are always functional. It’s open source and available on GitHub, and we’d love for you to check it out and let us know what you think.