Building a Quick Reverse Proxy

Posted on May 20, 2014

Our universe is made up of a seemingly infinite number of rules, ranging from little tidbits like magnetism and inertia to the unchangeable truths of buyer's remorse and Steam downtime occurring on a weekend. Still, there is another rule that seems to be more faithful than gravity itself: given ‘n’ public-facing IP addresses, you will receive ‘n+1’ requests for their allocation. Six IPs? I'll see your six IPs and raise you seven servers. Ouch.

So, what are our options?

  • Futilely try to convince the person that their server is less important than the existing ones.
  • Yank one of the other IPs and hope that nobody notices. 1
  • Act like appending port numbers to the end of an IP is not a usability nightmare.
  • Perform a bit of magic, courtesy of a reverse proxy.

With the options enumerated, I suspect that most of us can agree that option four is the best for all parties. To that end, let's set up a reverse proxy. Though this can be done with IIS on Windows, I'm going to stick with my favorite, NGinx.

A Reverse What?

A reverse proxy is very similar to its close relative, the proxy server. If you've used a hotspot at a Starbucks or a college campus, chances are that your traffic has been routed through a regular proxy server before being sent out across the general internet. Since it acts as a man-in-the-middle, the proxy server can speed things up by caching frequently accessed resources, or filter traffic based on content 2 or destination.

In theory, a reverse proxy is extremely similar. The difference is that whereas a proxy server normally proxies outbound traffic, a reverse proxy proxies inbound traffic. By inspecting each incoming request, we can perform filtering and load balancing or, as we plan to in this case, hide several servers behind a small pool of IPs and decide which one should field each request based on the HTTP headers.

The Setup

Assuming we've already got an Ubuntu machine available, we should be able to get NGinx installed extremely quickly. If you're using another platform, be it Windows, Mac, or another Linux, you can still run NGinx, but you'll need to find the correct installation method for your particular system.

sudo apt-get install nginx

Simple enough.

Configuration

Next, we'll need to add some configuration directives for NGinx to follow. For the following steps, I'll assume that the NGinx configuration resides in /etc/nginx. Let's create a new configuration file in /etc/nginx/sites-available. We'll name it after the site we're proxying. In this case, we'll use /etc/nginx/sites-available/jira.conf.

The first thing that we're going to need to add is a definition of our upstream server. In our case, we'll assume that Jira is using its default port (8080) and that it is hosted internally on 10.1.2.3.

upstream jira {
    server 10.1.2.3:8080;
}
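One reason to give the backend a named upstream block is that later changes to it stay in one place. As a rough sketch, assuming a second Jira host existed at 10.1.2.4 (a made-up address that is not part of this setup), a fallback could be added like so:

```nginx
upstream jira {
    # Primary Jira host, as used throughout this post.
    server 10.1.2.3:8080;

    # Hypothetical standby host; the address is an assumption for illustration.
    # "backup" means it only receives requests when the primary is unavailable.
    server 10.1.2.4:8080 backup;
}
```

Anything that refers to the upstream by its name, jira, would pick up the change without being edited.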

Next we're going to need to let NGinx know which requests should be handled by our new rules. For this, we'll add a server section to the configuration. This will tell it to listen on port 80 of all NICs, and to only field requests for jira.example.com.

server {
    listen       *:80;
    server_name  jira.example.com;
}

Now that we've told NGinx which sites should be affected by our directives, we should define what the directives actually are. For this, we'll add a location block inside of the server block.

location / {
    proxy_pass  http://jira;
    proxy_set_header        Host            $host;
    proxy_set_header        X-Real-IP       $remote_addr;
    proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;

    proxy_connect_timeout   180;
    proxy_send_timeout      180;
    proxy_read_timeout      180;
}

We're doing several things here. To start with, we're defining that requests should be proxied to the upstream that we defined earlier. 3 Next, we define a few headers that should be passed along upstream. First, we make sure that the Host header sent to the final server is the same one we received. Then, we add some information about the machine requesting the page in X-Real-IP and X-Forwarded-For. Finally, we set a three-minute timeout for connecting to the upstream server, and another three minutes each for sending and receiving data. 4

At this point, our completed configuration file should look as follows:

upstream jira {
    server 10.1.2.3:8080;
}

server {
    listen       *:80;
    server_name  jira.example.com;

    location / {
        proxy_pass  http://jira;
        proxy_set_header        Host            $host;
        proxy_set_header        X-Real-IP       $remote_addr;
        proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;

        proxy_connect_timeout   180;
        proxy_send_timeout      180;
        proxy_read_timeout      180;
    }
}

Verification

With the configuration complete, we have a total of three more actions before we're done. Earlier, we placed our configuration file in /etc/nginx/sites-available. While it is good to keep a copy of the configuration there, NGinx won't actually load it from here, so we need to create a symlink to this file in sites-enabled.

sudo ln -s /etc/nginx/sites-available/jira.conf /etc/nginx/sites-enabled/

Now that we've linked the file into place, let's have NGinx verify our configuration doesn't have a problem.

sudo nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful

If there were no errors, we should be able to restart NGinx now.

sudo service nginx restart

Requests to jira.example.com should now show our actual Jira instance, without having to dedicate an IP just for it. By repeating this strategy at the network gateway, we can greatly expand the number of servers we can provide external access for without having to add more IPs. 5 6
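To make "repeating this strategy" a bit more concrete, here is a sketch of what a second site sharing the same public IP might look like. The hostname wiki.example.com, the upstream name wiki, and the address 10.1.2.10:8090 are all invented for illustration. It would live in its own file, say /etc/nginx/sites-available/wiki.conf, and be symlinked and tested the same way as the Jira one.

```nginx
# Hypothetical second site; every name and address below is an assumption.
upstream wiki {
    server 10.1.2.10:8090;
}

server {
    # Same port and interfaces as the Jira site.
    listen       *:80;
    # Only the hostname differs, and that is what NGinx matches on.
    server_name  wiki.example.com;

    location / {
        proxy_pass  http://wiki;
        proxy_set_header        Host            $host;
        proxy_set_header        X-Real-IP       $remote_addr;
        proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

Both sites listen on the same IP and port. NGinx chooses between them by comparing the Host header of each request against the server_name of each server block.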

Edit

The last configuration file said ‘localhost’ originally instead of ‘10.1.2.3’. Props to David Ruttka for noticing. Also, added note four about timeout length.

Footnotes


  1. They noticed. Oh, believe me, they noticed. They're probably emailing you about it right now. ↩︎

  2. Actual content filtering assumes that we aren't using HTTPS. At that point, the best we could do would really be host-based filtering. ↩︎

  3. Technically we didn't have to define the upstream. We could have just put our address directly in the proxy_pass line. While this would have worked, it doesn't lend itself to a maintainable configuration as addresses change or fallbacks are added. ↩︎

  4. Three minutes is a bit long, right? Well, most of the time, yes. If you're doing bulk operations on a large number of items in Jira, however, you'll discover that everything is run in one go rather than in batches. Now, three minutes goes from being a long stretch of time to a gamble that it might not be long enough. ↩︎

  5. An important detail here. We now have the capability to expose large numbers of internal servers to our public network. However, it seems a reminder might be necessary that could != should. ↩︎

  6. It is important to note that this only works for HTTP and HTTPS traffic. ↩︎