Stop Buying Load Balancers and Start Controlling Your Traffic Flow with Software

When it comes to traditional load balancers, you can either splurge on expensive hardware or go the software route. Hardware load balancers typically have poor, outdated API designs and are, at least in my experience, slow. You can find a few software load balancing products with decent APIs, but trying to use free alternatives like HAProxy leaves you with bolt-on software that generates the configuration file for you. Even then, if you need high throughput you have to rely on vertical scaling of your load balancer, or on round-robin DNS to distribute horizontally.

We were trying to figure out how to avoid buying a half million dollars' worth of load balancers every time we needed a new data center. What if you didn't want to use a regular layer 4/7 load balancer and, instead, relied exclusively on layer 3? This seems entirely possible, especially after reading about how CloudFlare uses Anycast to solve this problem. There are a few ways to accomplish this. You can go full-blown BGP and run that all the way down to your top-of-rack switches, but that's a commitment and likely requires a handful of full-time network engineers on your team. Running a BGP daemon on your servers is the easiest way to mix “Anycast for load balancing” into your network, and there are multiple daemons to choose from, including ExaBGP, BIRD, and Quagga.

After my own research, I decided that ExaBGP is the easiest way to manipulate routes. The entire application is written in Python, making it perfect for hacking on. ExaBGP has a decent API, which even supports JSON for parts of it. The API works by reading commands from your process's STDOUT and sending information back to your process through STDIN. In the end, I'm looking for automated control over my network, rather than more configuration management.

At this point, I can create a basic “healthcheck” process that might look like:

#!/usr/bin/env bash
STATE="down"

while true; do
  # grep -q stays quiet on a match; anything this script prints to STDOUT
  # is read by ExaBGP as an API command, so only announce/withdraw lines
  # should ever be echoed.
  curl -s localhost:4000/healthcheck.html 2>/dev/null | grep -q OK

  if [[ $? -eq 0 ]]; then
    if [[ "$STATE" != "up" ]]; then
      echo "announce 10.1.1.2/32 next-hop self"
      STATE="up"
    fi
  else
    if [[ "$STATE" != "down" ]]; then
      echo "withdraw 10.1.1.2/32 next-hop self"
      STATE="down"
    fi
  fi

  sleep 2
done
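Because ExaBGP treats every line the process prints as an API command, the loop above deliberately announces or withdraws only when the state actually changes. That logic can be exercised on its own by stubbing out the curl check — a rough sketch, where check_health and step are illustrative helpers of mine, not part of ExaBGP:

```shell
#!/usr/bin/env bash
# Stand-in for the curl | grep healthcheck; flip HEALTH to simulate failures.
HEALTH=0   # 0 = healthy, non-zero = unhealthy

check_health() { return "$HEALTH"; }

STATE="down"

# One iteration of the loop: emit an ExaBGP command only when the state flips,
# so the BGP session is not flooded with duplicate announcements.
step() {
  if check_health; then
    if [[ "$STATE" != "up" ]]; then
      echo "announce 10.1.1.2/32 next-hop self"
      STATE="up"
    fi
  else
    if [[ "$STATE" != "down" ]]; then
      echo "withdraw 10.1.1.2/32 next-hop self"
      STATE="down"
    fi
  fi
}

step        # down -> up: prints the announce
step        # still up: prints nothing
HEALTH=1
step        # up -> down: prints the withdraw
```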

Then in your ExaBGP configuration file, you would add something like this:

group anycast-test {
  router-id 10.1.10.11;
  local-as 65001;
  peer-as 65002;

  process watch-application {
    run /usr/local/bin/healthcheck.sh;
  }

  neighbor 10.1.10.1 {
    local-address 10.1.10.11;
  }
}

Now, anytime your curl | grep check passes, your BGP neighbor (10.1.10.1) will have a route to your service IP (10.1.1.2). When the check begins to fail, the route will be withdrawn from the neighbor. If you now deploy this on a handful of servers, your upstream BGP neighbor will have multiple routes to the same IP. At that point, you have to configure your router to spread traffic across the multiple equal-cost paths. In JUNOS, this would look like:

set policy-options policy-statement load-balancing-policy then load-balance per-packet
set routing-options forwarding-table export load-balancing-policy
commit
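Once two or more servers are announcing the service IP, it is worth confirming that the router actually installed multiple equal-cost next-hops. On JUNOS, commands along these lines (addresses matching the examples above) will show them:

```
show route 10.1.1.2/32
show route forwarding-table destination 10.1.1.2/32
```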

Even though the above says load-balance per-packet, it actually behaves more like load-balance per-flow, since each TCP session will stick to one route rather than individual packets going to different backend servers. As far as I can tell, the naming stems from legacy chipsets that did not support per-flow packet distribution. You can read more about this configuration on Juniper's website. Below is our new network topology for accessing a service:

[Figure: new network topology for accessing a service — backends announce the service IP via BGP, and the router ECMPs traffic across them]
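That per-flow behavior amounts to hashing each connection's tuple to pick one of the equal-cost next-hops, so every packet in a TCP session takes the same path. Here is a rough illustration of the idea — the cksum hash and the next-hop list are made up for demonstration and bear no relation to what the Juniper ASIC actually computes:

```shell
#!/usr/bin/env bash
# Equal-cost next-hops learned for the service IP (illustrative addresses).
NEXT_HOPS=(10.1.10.21 10.1.10.22 10.1.10.23)

# Deterministically map a flow key ("src:sport->dst:dport") to one next-hop.
pick_next_hop() {
  local sum
  sum=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
  echo "${NEXT_HOPS[$((sum % ${#NEXT_HOPS[@]}))]}"
}

# The same flow always hashes to the same backend...
pick_next_hop "192.0.2.10:51515->10.1.1.2:80"
pick_next_hop "192.0.2.10:51515->10.1.1.2:80"
# ...while a different source port may land elsewhere.
pick_next_hop "192.0.2.10:51516->10.1.1.2:80"
```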

There are some scale limitations, though. It comes down to how many equal-cost next-hops your hardware router can handle for ECMP. I know a Juniper MX240 can handle 16 next-hops, and I have heard rumors that a software update will bump this to 64, but it is something to keep in mind. A tiered approach may be appropriate if you need a high number of backend machines: a layer of route servers running BIRD or Quagga, with your backend services peering to it using ExaBGP. You could even use this approach to scale HAProxy horizontally.
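For that tiered approach, the route-server layer might look something like this BIRD fragment — purely a sketch, with illustrative addresses and ASNs, not a tested production configuration:

```
# bird.conf on a route server (addresses and ASNs are illustrative)
router id 10.1.10.5;

# Learn the /32 service routes from a backend running ExaBGP
protocol bgp backend1 {
  local as 65003;
  neighbor 10.1.10.21 as 65003;
  import all;
  export none;
}

# Re-advertise the learned routes up to the hardware router
protocol bgp upstream {
  local as 65003;
  neighbor 10.1.10.1 as 65002;
  import none;
  export all;
}
```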

In conclusion, replacing a traditional load balancer with layer 3 routing is entirely possible. In fact, it can even give you more control over where traffic flows in your datacenter if done right. I look forward to rolling this out with more backend services over the coming months and learning what problems may arise. The possibilities are endless, and I'd love to hear more about what others are doing.



7 Responses to Stop Buying Load Balancers and Start Controlling Your Traffic Flow with Software

  1. luben says:

    Another lightweight option is to use Linux LVS with direct routing and keepalived for healthchecks and group/peer management. You could even cluster 2 balancers as active/passive using VRRP.

  2. david esquivel says:

    How about balanceNG?

    http://www.inlab.de/balance.html

  3. Marcelo says:

    This is the kind of real-world application of technologies that I like to read about. Thank you so much for letting the rest of us know a little bit about your infrastructure.

  4. Craig says:

    “The entire application is written in Python, making it perfect for hacking on.”

    Python automatically makes it hackable and “perfect”? Why do people have these bizarre emotional attachments to tools? Python is very far from perfect.

    • Twirrim says:

      Craig: “perfect for hacking on” != “Python is perfect”

      The author makes no claim about Python being perfect.

    • Martin Barry says:

      Nothing is perfect. I think it’s just an offhand comment based on the relative ease of modifying it compared to something written in C.

    • Allan Feid says:

      As others have pointed out, I’m not saying Python is perfect, just that it’s easy to modify without recompiling.