How We Built a Cutting-Edge Color Search App

Engineers love working at Shutterstock because they get to build cool things. We aim to solve problems that matter to customers, and we’re constantly trying out new ideas through rapid prototyping.  One of the great things about our culture at Shutterstock is that an idea can come from anywhere, from the newest engineer to the CEO — we’ll try them out equally and see what resonates with users. This is how one of those ideas, our Spectrum color search, came to life.

Finding a Problem to Solve

Shutterstock serves a very visual audience. Creative directors, designers, freelancers, and others come to us to find visually appealing content. On most stock sites, searching for images is a process of entering keywords and toggling filters to find those that best match an idea.

For our visual-centric audience, this often doesn’t provide an easy path to finding the right images. We realized that an important problem to solve was how we could give our customers a way to search using visual cues in the images themselves: whether an image is color or black-and-white; bright or dark; vibrant or muted; textured or flat.

Experimenting with Color Search

Under the hood, searching by color is an exercise in complex math. At the start, there were a few interesting problems that we needed to work out:

  • What color space do we use to represent our image data?

  • How do we build an algorithm with these values that helps us find interesting images?

  • What kind of interface and controls would make this method of searching intuitive?

Many users are familiar with RGB color palettes, but we needed more options to find the right algorithm, so we experimented with HSV, HSL, and finally the Lab/LCH color spaces, which turned out to be the most intuitive of the bunch.

We began by indexing LCH (Lightness, Chroma, Hue) data for a few thousand images into an instance of MongoDB. Each histogram represented the number of pixels in an image for different ranges of lightness, chroma, and hue, from which we were able to compute various other statistics that we added to our index. We then threw together a simple interface where we could plug in numbers, try different equations, and see what images came out.
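
To make that step more concrete, here is a rough Python sketch of how per-image LCH histograms like these could be computed before indexing. It is an illustration rather than our actual indexing code, and the bin counts and summary fields are placeholders:

```python
# Illustrative sketch: compute LCH histograms for one image.
# Assumes an 8-bit RGB image and uses scikit-image for the Lab conversion.
import numpy as np
from skimage import io, color

def lch_histograms(image_path, l_bins=10, c_bins=10, h_bins=12):
    rgb = io.imread(image_path)[..., :3] / 255.0      # drop alpha, scale to [0, 1]
    lab = color.rgb2lab(rgb)                          # CIE L*a*b*

    L = lab[..., 0]                                   # lightness, roughly 0..100
    a, b = lab[..., 1], lab[..., 2]
    C = np.hypot(a, b)                                # chroma
    H = np.degrees(np.arctan2(b, a)) % 360.0          # hue angle, 0..360

    hist_l, _ = np.histogram(L, bins=l_bins, range=(0, 100))
    hist_c, _ = np.histogram(C, bins=c_bins, range=(0, 150))
    hist_h, _ = np.histogram(H, bins=h_bins, range=(0, 360))

    # Normalize to pixel fractions so images of different sizes are comparable,
    # and keep a couple of summary statistics alongside the histograms.
    n = L.size
    return {
        "hist_l": (hist_l / n).tolist(),
        "hist_c": (hist_c / n).tolist(),
        "hist_h": (hist_h / n).tolist(),
        "mean_l": float(L.mean()),
        "mean_c": float(C.mean()),
    }
```

A document like this can then be written straight into the data store and queried with whatever equation we want to try.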

[Image: forest_green_balanced]

To understand how the mathematics of color spaces connected to a particular visual experience, we broke the numbers down into charts, which gave us inspiration for finding the right algorithms.

[Image: shutterstock_56644495_mixer]
Manually plugging numbers into input fields was fine for a first phase of development, but it certainly wasn’t the interface we wanted to give our customers. Inspiration came when Wyatt Jenkins, our head of product (and a former DJ), proposed using sliders to give the tool the feel of a mixing board. The next version of the prototype had over 20 sliders to control all the visual attributes we had indexed.

[Image: lch_sliders_briliant_flat]

One of the first prototypes had 24 sliders that you could move to find images with different visual attributes.

Closing in on the Final Product

As our engineers worked on refining the accuracy and performance of the technology behind our color-search prototype, product and user-experience design specialists joined in to help build an interface that was intuitive for customers.

This meant tackling many of the details that we avoided in our initial rough prototypes:

  • Twenty-plus sliders were far too many to jam into an interface, so we tried a version with six, then simplified it down to three. Eventually our designers and engineers tuned the experience to work with just one slider.

  • Our initial prototype only had about 100,000 images, and we wanted to run it on over 10 million. To speed up search queries on that much data, we switched the backend data store to Apache Solr (a rough sketch of the kind of query this enables appears after this list). To process our image color data faster, we used the Perl PDL module PDL::Graphics::Colorspace, which was written by one of our algorithm experts, Maggie Xiong. To speed up the interface even further, we added a few layers of caching and primed them with a set of warming queries.

  • Our customer research team and product specialists found that some queries didn’t produce appealing results: if a query was too specific, some colors returned too few results, and other queries returned results that were less than stellar. A few of our engineers kept iterating on the sorting algorithms until they found the right equations to surface the most evenly colored, high-contrast, and vibrant images.

  • We wanted to provide a unique search experience, but didn’t want it cluttered with all the search options on the main Shutterstock site. We decided to put this in our Labs environment so that we could build it as a standalone app and get customer feedback on it.
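
To give a sense of how a slider position can drive a search, here is a hedged sketch of the kind of proximity query this setup enables. The Solr host, core name, and field names (mean_l, mean_c, mean_h) are assumptions for the example, not our actual schema:

```python
# Illustrative sketch: sort images by the distance between the slider's target
# LCH values and each image's stored mean LCH values, using a Solr function
# query. Host, core, and field names are made up for this example.
import requests

SOLR_URL = "http://localhost:8983/solr/images/select"   # assumed core name

def color_search(keywords, target_l, target_c, target_h, rows=60):
    # dist(2, ...) is Solr's Euclidean-distance function query; the first three
    # arguments after the power are document fields, the last three constants.
    sort_fn = (
        f"dist(2,mean_l,mean_c,mean_h,"
        f"{target_l},{target_c},{target_h}) asc"
    )
    params = {
        "q": keywords,      # e.g. "forest"
        "sort": sort_fn,    # nearest colors first
        "rows": rows,
        "wt": "json",
    }
    resp = requests.get(SOLR_URL, params=params)
    resp.raise_for_status()
    return resp.json()["response"]["docs"]
```

A real implementation would also have to account for hue wrapping around at 360 degrees, which a plain Euclidean distance ignores.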


[Image: color_prototype_3]

One of the last iterations before launch had three sliders to adjust lightness, hue, and chroma (saturation). Certain combinations of slider positions gave us some cool effects, like the one above.

Where We Are Today

After a few rounds of iterating, both on the interface and the back-end implementation, we finally had the simple, intuitive, responsive color exploration experience we had been after.

Even after release, we continued to iterate behind the scenes, tweaking and tuning the index and search algorithms. In the latest iteration, we further optimized the Solr index so that we could sort on a single Solr field for each slider position rather than run complex spatial sorting functions at query time.  We also migrated the color data from the standalone Solr instance we had dedicated to this app into our main Solr pool so that we could use a wider range of data for future iterations of color search.
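
As a rough illustration of that last optimization, here is a sketch of how per-position sort fields could be precomputed at index time so that each query sorts on a single field instead of evaluating a distance function. The quantization, target-color mapping, and field names are assumptions for the example, not our exact implementation:

```python
# Illustrative sketch: precompute one sortable score per discrete slider
# position when an image is indexed. At query time the app then just asks
# Solr for something like sort=slider_score_42 asc.
import math

SLIDER_POSITIONS = 100   # assume the slider is quantized into 100 stops

def slider_target(pos):
    """Map a slider position to a target (L, C, H) triple (made up for the example)."""
    hue = 360.0 * pos / SLIDER_POSITIONS
    return 60.0, 80.0, hue   # fixed lightness and chroma; hue sweeps the wheel

def sort_fields_for_image(mean_l, mean_c, mean_h):
    """Build the per-position sort fields stored on each image's Solr document."""
    fields = {}
    for pos in range(SLIDER_POSITIONS):
        tl, tc, th = slider_target(pos)
        dh = min(abs(mean_h - th), 360.0 - abs(mean_h - th))   # circular hue difference
        score = math.sqrt((mean_l - tl) ** 2 + (mean_c - tc) ** 2 + dh ** 2)
        fields[f"slider_score_{pos}"] = round(score, 3)
    return fields
```

The trade-off is a larger index (one field per slider stop) in exchange for much cheaper queries, since Solr can sort on a stored numeric field without evaluating a function for every matching document.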

[Image: spectrum-final]

The final version of Spectrum as it appears today. Try it out at www.shutterstock.com/labs/spectrum

Spectrum is just one example of the cool things that a group of passionate people can build at Shutterstock. Along with other apps like Shutterstock Instant, we have numerous other prototypes in various stages of development. Every week, engineers and product specialists come up with new ideas and throw together quick proofs of concept to assess their potential. As we work on them, we’ll continue validating our ideas with customers, getting better at solving real problems, and building valuable features that help our users around the world.

Interested in working at Shutterstock? We're hiring!

About Chris Becker

Chris Becker is the Principal Engineer of Search at Shutterstock, where he's worked on numerous areas of the search stack, including the search platform, Solr, relevance algorithms, data processing, analytics, internationalization, and customer experience.


12 Responses to How We Built a Cutting-Edge Color Search App

  1. Stomme poes says:

    Hey, did SOLR end up completely replacing Mongo or do you still use Mongo for some indexing/caching? It sounds kinda redundant to keep Mongo in this situation but was just wondering. I’ve been hearing of other companies using both some noSQL DB and one of the uber search engines together, but that could be these companies not realising the full potential of things like SOLR or ElasticSearch.

    We’re using both Sphinx and Redis but here, Redis is merely a cache while Sphinx is restricted to searching specific indices.

    • Chris Becker says:

      Once we started using Solr, we no longer used MongoDB as a back-end data store. The main use that Mongo served was to store arbitrary data structures during the initial prototyping phase. Once we had optimized our data down to a set of specific fields that we needed to search on, we defined a Solr schema that met our exact needs.

  2. Raja says:

    We built a color driven product search engine – http://snaps.brika.com. We used a native rtree implementation instead of using something like mongo or solr. Would love to hear a little bit more about how you are utilizing solr (nearest neighbour search for multidimensional spatial data?)

    • Chris Becker says:

      Very cool! We once tried playing around with something similar to rtrees (mtrees), but had some difficulty getting it to scale for millions of images. Our solr implementation basically ran spatial queries on some processed LCH histogram data. For each position on the slider we basically did a proximity query between the selected color and the color in each image.

  3. Anon says:

    Please change your browser check. Instead of blocking users completely, you should at least provide a link for bypassing the browser check. I’m certain that my browser (Opera) is compatible.

  4. Richard Albritton says:

    I came here via a FB link provided by Edward Tufte: “Color-space based quilts, great example of principle: a visual solution for a visual problem.” That’s some high potency praise so I had to come see. The spectrum search is fun to play with besides being very practical, much like a good audio synthesizer tool (there are a lot of really bad synth interfaces, actually). I did find myself wanting a second spectrum slider so I could more easily dial up photos possessing my favorite team colors (orange and blue–War Eagle!).

    • Chris Becker says:

      Hi Richard,
      Glad you’re enjoying it! The team had a lot of fun building Spectrum. We’re definitely on the lookout for great ideas to build next, and there’s a lot of possibilities that multiple sliders could open up. We’re definitely looking to release some cool new tools in 2014.

  5. dirk says:

    That’s nifty. Using the default “forest” query, I wasn’t ever able to move the slider into a position that wasn’t overrun by gross lime green. Where’s the grit?