Lightweight Streaming HTTP Proxy for Rack (Ruby CPU-light HTTP Client Library)

So, I'm experimenting with a situation where I want to transfer huge files from a third-party URL, through my server, to the requesting client.

So far I've tried to implement this with Curb or Net::HTTP, following the standard Rack practice of yielding the response body to the client chunk by chunk via each, for example:

class StreamBody
  ...
  # Rack iterates the response body by calling each; every yielded chunk
  # is written out to the requesting client.
  def each
    some_http_library.on_body do |body_chunk|
      yield(body_chunk)
    end
  end
end
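
To be clear, such a body object is returned as the third element of the Rack response triplet, roughly like this (the wrapper app and the constructor argument are just illustrative; StreamBody's setup is elided above):

# config.ru - a rough sketch of how the streaming body plugs into Rack.
class ProxyApp
  def call(env)
    upstream_url = 'http://upstream.example.com/huge-file.bin' # placeholder
    headers = { 'Content-Type' => 'application/octet-stream' }
    # The server iterates the body with #each and writes every yielded
    # chunk to the client as it arrives.
    [200, headers, StreamBody.new(upstream_url)]
  end
end

run ProxyApp.new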


However, I cannot get this scheme to run at less than, say, 40% CPU (on my MacBook Air). If I try to do the same with Goliath, using em-synchrony (as advised on the Goliath page), I can get CPU usage down to about 25%, but I cannot get the headers to flush. My streaming download "hangs" in the requesting client, and the headers only show up once the entire response has been sent to the client, no matter which headers I supply.

Am I correct in assuming that this is one of those cases where Ruby just sucks at the job, and I have to turn to the Gos and Node.js's of the world instead?

By comparison, we currently have a PHP script that streams from CURL to the PHP output stream, and that works with very little CPU overhead.

Or is there some upstream proxy solution I could ask to handle this for me? The problem is that I want to reliably call a Ruby function once the whole body has been sent to the socket, and things like nginx proxies will not do that for me.

UPDATE: I tried to do a simple benchmark of the HTTP clients, and it looks like most of the CPU use comes from the HTTP client libraries themselves. There are benchmarks for Ruby HTTP clients, but they are based on response time, whereas CPU usage is never mentioned. In my test I performed an HTTP streamed download, writing the result to /dev/null, and got consistent 30-40% CPU usage, which roughly matches the CPU usage I get when streaming through any Rack handler.
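
For reference, the test I ran looked roughly like the sketch below (the URL is a placeholder); I simply watched the process CPU usage in top while it ran:

# A rough sketch of the benchmark described above, using Net::HTTP.
require 'net/http'
require 'uri'

uri = URI('http://example.com/huge-file.bin') # placeholder URL
Net::HTTP.start(uri.host, uri.port) do |http|
  http.request(Net::HTTP::Get.new(uri)) do |response|
    File.open('/dev/null', 'wb') do |devnull|
      # Stream the body chunk by chunk and throw it away
      response.read_body { |chunk| devnull.write(chunk) }
    end
  end
end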

UPDATE: It turns out that most Rack handlers (Unicorn etc.) use a write() loop on the response body, which can enter a busy wait (with high CPU usage) when the response cannot be written fast enough. This can be mitigated to a degree by using rack.hijack and writing to the output socket with write_nonblock and IO.select (surprising that the servers don't do this themselves).

lambda do |socket|
  begin
    rack_response_body.each do |chunk|
      begin
        bytes_written = socket.write_nonblock(chunk)
        # If we could only write partially, make sure we retry on the next
        # iteration with the remaining part. Slice by bytes, since
        # write_nonblock returns a byte count.
        if bytes_written < chunk.bytesize
          chunk = chunk.byteslice(bytes_written, chunk.bytesize - bytes_written)
          raise Errno::EINTR
        end
      rescue IO::WaitWritable, Errno::EINTR # The output socket is saturated.
        IO.select(nil, [socket]) # Wait for the socket to become writable again
        retry # and off we go...
      rescue Errno::EPIPE # Happens when the client aborts the connection
        return
      end
    end
  ensure
    socket.close rescue nil # the socket might already be closed
    rack_response_body.close if rack_response_body.respond_to?(:close)
  end
end
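
For completeness, here is roughly how such a lambda gets wired up via a partial hijack: the response headers carry a rack.hijack callable, which the server invokes with the raw client socket after it has written the headers. The app class and header values below are just illustrative:

# A minimal sketch, assuming a web server with partial hijack support (e.g. Puma).
class HijackingProxy
  def call(env)
    return [500, {}, ['Hijack unsupported']] unless env['rack.hijack?']

    streamer = lambda do |socket|
      # ... the write_nonblock / IO.select loop from above goes here ...
    end

    # With a partial hijack the server writes the status and headers itself,
    # strips the rack.hijack entry, and then calls it with the client socket.
    [200, {'Content-Type' => 'application/octet-stream', 'rack.hijack' => streamer}, []]
  end
end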


1 answer


There were no answers, but in the end we did manage to find a solution. It has been remarkably successful, as we are pumping terabytes of data through it daily. Here are the key ingredients:

  • Patron as the HTTP client. I'll explain the choice further down in the answer.
  • A robust multithreaded web server (like Puma)
  • The sendfile gem

The main problem with wanting to build something like this with Ruby is what I call string churn. Basically, allocating strings in the VM is not free. When you push a lot of data through, you end up allocating one Ruby String per chunk of data received from the upstream source, and you may also end up allocating strings if you cannot write() the entire chunk to the socket that represents your client's TCP connection. So, out of all the approaches we tried, we could not find a solution that would let us avoid this string churn - until we stumbled upon Patron, that is.

Patron, as it turns out, is the only Ruby HTTP client that allows direct-to-file writes in user space. This means that you can download data over HTTP without allocating a Ruby string for the data you pull in. Patron has a function that will open a FILE* pointer and write directly to that pointer using libCURL callbacks. This happens while the Ruby GVL is unlocked, since everything is folded down to the C level. In practice, this means that at the "pull" stage nothing is allocated in the Ruby heap to store the response body.

Note that curb, another widely used CURL binding library, does not have this feature - it allocates Ruby strings on the heap and hands them to you, which defeats the purpose.
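
For illustration, the Patron call in question looks like this (the URL is a placeholder); the response body never surfaces as a Ruby String:

# A small sketch of Patron's direct-to-file download.
require 'patron'
require 'tempfile'

tf = Tempfile.new('download')
sess = Patron::Session.new
# get_file writes the response body straight to the given path from C,
# via libCURL's write callback - no per-chunk Ruby String allocations.
sess.get_file('http://upstream.example.com/huge-file.bin', tf.path)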

The next step is serving that content to the TCP socket. As it happens - again - there are three ways to do it.



  • Read the data from the downloaded file into the Ruby heap and write it to the socket
  • Write a thin C shim that performs the socket writes for you, bypassing the Ruby heap
  • Use the sendfile() syscall to perform the file-to-socket operation in kernel space, avoiding user space entirely.

You need to get at the TCP socket anyway, so you need full or partial Rack hijack support (check your web server's documentation to see whether it has it).

We decided to go with the third option. sendfile is a lovely gem by the author of Unicorn and Rainbows, and it does exactly that - pass it a Ruby File object and a TCPSocket, and it will ask the kernel to send the file into the socket, bypassing as much machinery as possible. Again, nothing has to be read into the heap. So in the end, here is the approach we went with (pseudocode-ish, does not handle edge cases):

# Use Tempfile to allocate a unique file name
tf = Tempfile.new('chunk')

# Download a part of the file using the Range header
Patron::Session.new.get_file(the_url, tf.path, {'Range' => '..-..'})

# Use the blocking sendfile call (for demo purposes; you can also send in chunks).
# Note that non-blocking sendfile() is broken on OSX.
socket.sendfile(tf, 0, tf.size)

# Make sure to get rid of the file
tf.close; tf.unlink
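
To tie the pieces together, the whole flow inside the hijack lambda looks roughly like the sketch below. The chunk size, URL and total size are placeholders (in practice the size would come from a HEAD request or the Content-Range of the first ranged response), and error handling is omitted:

# A rough sketch of the full flow, using placeholder values.
require 'patron'
require 'sendfile'
require 'tempfile'

CHUNK_SIZE = 16 * 1024 * 1024                        # illustrative chunk size
the_url    = 'http://upstream.example.com/file.bin'  # placeholder URL
total_size = 1024 * 1024 * 1024                      # assumption: learned from an upstream HEAD request

streamer = lambda do |socket|
  session = Patron::Session.new
  offset  = 0
  begin
    while offset < total_size
      tf = Tempfile.new('chunk')
      # Ask the upstream for just this slice; Patron writes it straight to
      # tf.path from C, so nothing lands in the Ruby heap.
      session.get_file(the_url, tf.path, {'Range' => "bytes=#{offset}-#{offset + CHUNK_SIZE - 1}"})
      # Hand the slice to the kernel for the socket write.
      socket.sendfile(tf, 0, tf.size)
      offset += tf.size
      tf.close; tf.unlink
    end
  ensure
    socket.close rescue nil
  end
end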


This allows us to serve multiple connections, without eventing, with very little CPU usage and very little heap pressure. That said, we routinely see boxes serving hundreds of users at about 2% CPU. And the Ruby GC stays happy. Basically, the only thing we do not like about this implementation is the 8 MB of RAM per thread imposed by MRI. To work around that, we would need to switch to an evented server (spaghetti code galore) or write our own IO reactor that multiplexes a large number of connections onto a much smaller set of threads, which is certainly doable but would take too much time.

Hope this helps someone.
