Shutterstock developers pay a lot of attention to the user experience of our website. We have a fleet of User Experience experts who help make sure the error states our web application shows to customers are useful and actionable.

But when we’re building backend APIs instead of HTML forms, that experience doesn’t translate directly. What’s the equivalent of a clear, actionable form error in an API?


The Shutterstock Contributor Team has been building our next-generation content-review system, so that we can scale our image-review operation. We’re building it in a service-oriented fashion, in Ruby, with DataMapper as an ORM.

As developers building backend APIs, it’s solely our responsibility to provide useful information to the developers who will use our services. A good error validation framework preserves the integrity of our applications’ data and empowers developers to integrate with a new API.

Rather than write custom validation for each API endpoint, we took a systematic approach to add validation to all of them. Now we can avoid many application crashes, while providing useful information to developers.


One of the first things the review system needs is to learn about new items needing review:

POST /items

{
  "domain": "shutterstock-photo",
  "owner": "81",
  "item": "3709",
  "item_type": "photo",
  "queue": "main"
}

This call puts the photo with item id 3709 and owner id 81 into the main review queue. The expected result is HTTP 201 Created with a Location: header giving the URL of the created item.

There are several other Shutterstock teams that will eventually integrate with this review service. Sometimes, when developers are still writing the software, they will post invalid data:

POST /items

{
  "domain": "shutterstock-photo",
  "owner": "81",
  "item": "3709",
  "item_type": "photo"
  // "queue": "main"
}

Whoops! This POST left out the queue name, so the review system doesn’t know who’s supposed to review it. Without data validation, our application will throw a 500 error:

500 Internal Server Error
TypeError: expected String, got NilClass

It would be better if we told the programmer what they’ve done wrong. Also, we’d like to return HTTP 400 Bad Request instead of an internal server error.

Our team realized that there’s a tool to help us do this sort of thing: the json-schema Ruby gem, an implementation of the IETF JSON Schema spec. To use this, we’ll need to build up a schema. For the items route, it would look like this:

{
  "id": "http://review.shutterstock.com/items.schema",
  "type": "object",
  "required": ["domain", "item", "item_type", "owner", "queue"],
  "properties": {
    "create_time": {"type": "string"},
    "item":        {"type": "string"},
    "domain":      {"type": "string"},
    "item_type":   {"type": "string"},
    "owner":       {"type": "string"},
    "queue":       {"type": "string"}
  }
}

Now we will make our review service pass the incoming POST data through json-schema’s JSON::Validator before doing anything else:

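require 'json'
require 'json-schema'

# Inside the Sinatra route handler for POST /items:
rest_data = JSON.parse(request.body.read)
# fully_validate returns an array of error messages (empty when valid)
json_errors = JSON::Validator.fully_validate(
   schema,
   rest_data,
   :version => :draft4)
unless json_errors.empty?
   content_type 'application/json'
   halt 400, JSON.generate(:errors => json_errors)
end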

If there are any errors, the response looks like this instead:

400 Bad Request

{"errors": [
  "The property '#/' did not contain a required property of 'queue'
   in schema http://review.shutterstock.com/items.schema#"
]}

This message tells us that there’s a property missing in the JSON document root (#/). If more than one property is missing, the validator will identify them all. The validator does more than check for the existence of the required fields; it also checks the type of each field. If someone passes in a Hash instead of a string, like so:

POST /items

{
  "owner": "81",
  // eek, I'm not a string, I'm a Hash:
  "item": {"domain": "shutterstock-photo", "id": "3709"},
  "item_type": "photo",
  "queue": "main"
}

then they’ll get an error message about item. (Previously the application would have returned another Internal Server Error about a TypeError as soon as it tried to treat item as a string.)
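To make the two kinds of checks concrete, here is a minimal stdlib-only sketch of what a validator does with "required" and "type". It is not how the json-schema gem is implemented: check_payload, DEMO_SCHEMA and the message wording are hypothetical, and the sketch only handles flat objects.

```ruby
require 'json'

# Trimmed-down schema for illustration; only two of the fields are typed here.
DEMO_SCHEMA = {
  'required'   => %w[domain item item_type owner queue],
  'properties' => {
    'item'  => { 'type' => 'string' },
    'owner' => { 'type' => 'string' }
  }
}.freeze

# JSON Schema type names mapped to the Ruby classes JSON.parse produces.
RUBY_CLASSES = { 'string' => String, 'object' => Hash, 'array' => Array }.freeze

# Hypothetical helper: collects every problem found, not just the first.
def check_payload(schema, data)
  errors = schema['required']
           .reject { |name| data.key?(name) }
           .map { |name| "missing required property '#{name}'" }
  schema['properties'].each do |name, rules|
    next unless data.key?(name)
    next if data[name].is_a?(RUBY_CLASSES.fetch(rules['type']))
    errors << "property '#{name}' is not of type #{rules['type']}"
  end
  errors
end

# A payload with two missing fields and a Hash where a string is expected.
def demo_errors
  payload = JSON.parse('{"owner": "81", "item": {"id": "3709"}, "item_type": "photo"}')
  check_payload(DEMO_SCHEMA, payload)
end

puts demo_errors
```

Because the helper collects errors rather than raising on the first one, a client sees everything wrong with a request in a single round trip.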

There’s just one problem. We have a variety of resource types to manage. It would be really great if we didn’t have to write a custom schema for all of them. It’s a fair amount of text to write; it’s easy to get wrong; the hand-written schema can fall out of sync with the actual code; and above all, it’s redundant! Most of that validation information is already encoded in our ORM layer, where it looks like this:

class Item
    include DataMapper::Resource

    property :id, DataMapper::Property::Serial
    property :create_time, DateTime,
             :default => lambda {|_,_| DateTime.now }
    property :external_id, String, :required => true

    belongs_to :domain
    belongs_to :item_type
    belongs_to :owner

    validates_uniqueness_of :external_id,
      :scope => :domain,
      :message => "Item must be unique to a domain"

    has n, :reviews
    has n, :queues, :through => :queue_items
    ...

It turns out that we can use this class definition to build our schema:

  • figure out the class of the resource in question (we’ll call it resource_class)
  • ask the resource_class for a list of its properties (resource_class.properties)
  • ignore properties that our application can automatically populate (like the internal database id and create_time)
  • figure out the data type for the remaining properties (property.primitive)
  • ask the properties whether they’re required (property.required)
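The steps above can be sketched roughly as follows. To keep the example runnable without DataMapper, PropertyInfo is a hypothetical stand-in for the objects returned by resource_class.properties; build_schema, AUTO_POPULATED and JSON_TYPES are likewise names invented for illustration, not part of our codebase or of DataMapper.

```ruby
require 'date'
require 'json'

# Stand-in for a DataMapper property: a name, a Ruby primitive, a required flag.
PropertyInfo = Struct.new(:name, :primitive, :required)

# Properties the application populates itself, so clients never send them.
AUTO_POPULATED = %i[id create_time].freeze

# Assumed mapping from Ruby primitives to JSON Schema type names.
JSON_TYPES = { String => 'string', Integer => 'integer' }.freeze

def build_schema(resource_name, properties)
  client_props = properties.reject { |p| AUTO_POPULATED.include?(p.name) }
  {
    'id'         => "http://review.shutterstock.com/#{resource_name}.schema",
    'type'       => 'object',
    'required'   => client_props.select(&:required).map { |p| p.name.to_s },
    'properties' => client_props.map do |p|
      [p.name.to_s, { 'type' => JSON_TYPES.fetch(p.primitive) }]
    end.to_h
  }
end

# Mirrors the scalar properties of the Item class shown earlier.
ITEM_PROPERTIES = [
  PropertyInfo.new(:id, Integer, false),
  PropertyInfo.new(:create_time, DateTime, false),
  PropertyInfo.new(:external_id, String, true)
].freeze

puts JSON.pretty_generate(build_schema('items', ITEM_PROPERTIES))
```

In the real application the property list would come from the ORM rather than a hand-built array, which is what keeps the schema in sync with the code.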

Once we’ve done that, we almost have enough information to build a schema. There are a few other wrinkles: our properties include things like domain_id as an integer instead of a string, and we want our consumers to specify shutterstock-photo instead of the internal database ID. So for those, the schema exposes the relationship by name (domain rather than domain_id) and types it as a string.

Finally, we present all this data in the JSON Schema format.
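One way to handle the foreign-key wrinkle is to rewrite the generated schema so each internal key is exposed under its relationship name as a string. This is only a sketch under assumptions: expose_relations and RELATION_KEYS are hypothetical names, and the key list is inferred from the belongs_to declarations in the Item class.

```ruby
# Internal foreign-key columns mapped to the names clients should send.
RELATION_KEYS = {
  'domain_id'    => 'domain',
  'item_type_id' => 'item_type',
  'owner_id'     => 'owner'
}.freeze

# Returns a copy of the schema with foreign keys renamed and typed as strings.
def expose_relations(schema)
  properties = schema['properties'].map do |name, rules|
    if RELATION_KEYS.key?(name)
      [RELATION_KEYS[name], { 'type' => 'string' }]
    else
      [name, rules]
    end
  end.to_h
  required = schema['required'].map { |name| RELATION_KEYS.fetch(name, name) }
  schema.merge('properties' => properties, 'required' => required)
end
```

An explicit mapping is used instead of stripping an "_id" suffix, because a field such as external_id ends in "_id" without being a foreign key.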

That’s all the information we need to build schemas for all of our resource types. By computing and caching this at application load time, we can provide a basic schema for all POST and PUT requests.
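Computing the schemas once and reusing them could look something like this. SchemaCache is a hypothetical class written for this example; the block passed to it stands in for whatever code actually generates a schema for a resource.

```ruby
# Builds each schema at most once and hands back the frozen result afterwards.
class SchemaCache
  def initialize(&builder)
    @builder = builder
    @schemas = {}
  end

  # Build every schema up front, e.g. once at application load time.
  def preload(resource_names)
    resource_names.each { |name| fetch(name) }
    self
  end

  # Return the cached schema, building it on first use.
  def fetch(resource_name)
    @schemas[resource_name] ||= @builder.call(resource_name).freeze
  end
end
```

Freezing the cached hash guards against a request handler accidentally modifying a schema that every later request will share.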

We may need to customize a generated schema for certain routes that are special cases. For instance, we’ve decided that the POST /items route calls its logical ID field item in the POST and external_id in the database. Such customization is straightforward to accomplish.
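As an illustration of such a customization, the sketch below renames one property in a generated schema, which is the kind of change the POST /items route needs. rename_property is a hypothetical helper, not necessarily how our service does it.

```ruby
# Returns a copy of the schema with one property renamed everywhere it appears.
def rename_property(schema, from, to)
  properties = schema['properties'].map { |name, rules| [name == from ? to : name, rules] }.to_h
  required = schema['required'].map { |name| name == from ? to : name }
  schema.merge('properties' => properties, 'required' => required)
end

generated = {
  'required'   => ['external_id'],
  'properties' => { 'external_id' => { 'type' => 'string' } }
}
items_post_schema = rename_property(generated, 'external_id', 'item')
puts items_post_schema
```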

Our final realization was that once we had all the information about how a schema ought to look, we could make the schema available to our users. So now they can issue a request against http://review.shutterstock.com/items.schema (or domains.schema and owners.schema) and see for themselves exactly which fields the system expects when creating a new resource. By providing a URL to the schema in the error message, we end up with a self-documenting API!
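A small sketch of an error body that points back at the schema is shown below. The validator messages shown earlier already mention the schema URL; the separate "schema" key and the validation_error_body helper are assumptions made for this example.

```ruby
require 'json'

# Builds a 400 response body that lists the errors and links to the schema.
def validation_error_body(errors, schema)
  JSON.generate('errors' => errors, 'schema' => schema['id'])
end

items_schema = { 'id' => 'http://review.shutterstock.com/items.schema' }
puts validation_error_body(["missing required property 'queue'"], items_schema)
```

A client that receives this body can fetch the linked schema to see which fields the route expects.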