An efficient way to serve a large amount of JSON on Heroku
I have built a simple API with one endpoint. It serves data dumped from files and currently holds about 30,000 records. Ideally, I would like to fetch all of these records as JSON with a single HTTP call.
Here is my Sinatra view code:
require 'sinatra'
require 'json'
require 'mongoid'
Mongoid.identity_map_enabled = false
get '/' do
  content_type :json
  Book.all
end
I have also tried the following, using multi_json with yajl:
require './require.rb'
require 'sinatra'
require 'multi_json'
MultiJson.engine = :yajl
Mongoid.identity_map_enabled = false
get '/' do
  content_type :json
  MultiJson.encode(Book.all)
end
The problem with this approach is that I get an R14 error (memory quota exceeded). I get the same error when I try to use the "oj" gem.
I would just cache the whole thing as one long string in Redis, but the Heroku Redis service costs $30/month for the instance size I need (>10 MB).
My current solution is a background task that creates objects and fills them with jsonified records up to the MongoDB document size limit (16 MB). The problems with this approach: it still takes nearly 30 seconds to render, and I have to do post-processing in the receiving application to properly extract the JSON from the objects.
Does anyone have a better idea of how I can serve JSON for 30k records in one call without moving away from Heroku?
It sounds like you want to stream the JSON directly to the client instead of building it all up in memory. That is probably the best way to cut memory usage. For example, you could use yajl to encode JSON directly to a stream.
Edit: I have rewritten all of the code for yajl, because its API is much nicer and allows for much cleaner code. I have also included an example for reading the response in chunks. Here is the streaming JSON array helper I wrote:
require 'yajl'

module JsonArray
  class StreamWriter
    def initialize(out)
      super()
      @out = out
      @encoder = Yajl::Encoder.new
      @first = true
    end

    def <<(object)
      @out << ',' unless @first
      @out << @encoder.encode(object)
      @out << "\n"
      @first = false
    end
  end

  def self.write_stream(app, &block)
    app.stream do |out|
      out << '['
      block.call StreamWriter.new(out)
      out << ']'
    end
  end
end
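To see the comma-first pattern in isolation: the sketch below mimics the StreamWriter above but uses Ruby's stdlib JSON encoder instead of Yajl and writes into a StringIO instead of a Sinatra stream (both substitutions are mine, for portability — this is not the answer's code):

```ruby
require 'json'
require 'stringio'

# Minimal stand-in for the StreamWriter above. The comma-first logic
# is the same: emit a comma before every element except the first,
# so the concatenation forms a valid JSON array.
class StdlibStreamWriter
  def initialize(out)
    @out = out
    @first = true
  end

  def <<(object)
    @out << ',' unless @first
    @out << JSON.generate(object)
    @out << "\n"
    @first = false
  end
end

out = StringIO.new
out << '['
writer = StdlibStreamWriter.new(out)
writer << { 'title' => 'Book 1' }
writer << { 'title' => 'Book 2' }
out << ']'

puts out.string  # a valid JSON array, built element by element
```

Because the commas come before each element rather than after, the writer never has to know which element is the last one — a useful property when the collection is streamed from a database cursor.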
Usage:
require 'sinatra'
require 'mongoid'
Mongoid.identity_map_enabled = false
# use a server that supports streaming
set :server, :thin
get '/' do
  content_type :json
  JsonArray.write_stream(self) do |json|
    Book.all.each do |book|
      json << book.attributes
    end
  end
end
To decode on the client side, you can read and parse the response in chunks, for example with em-http. Note that this solution requires the client to have enough memory to hold the entire array of objects. Here is the corresponding streaming parser helper:
require 'yajl'

module JsonArray
  class StreamParser
    def initialize(&callback)
      @parser = Yajl::Parser.new
      @parser.on_parse_complete = callback
    end

    def <<(str)
      @parser << str
    end
  end

  def self.parse_stream(&callback)
    StreamParser.new(&callback)
  end
end
Usage:
require 'em-http'
parser = JsonArray.parse_stream do |object|
  # block is called when we are done parsing the
  # entire array; now we can handle the data
  p object
end

EventMachine.run do
  http = EventMachine::HttpRequest.new('http://localhost:4567').get
  http.stream do |chunk|
    parser << chunk
  end
  http.callback do
    EventMachine.stop
  end
end
Alternative solution
You can simplify the whole thing by dropping the requirement to generate a "proper" JSON array. What the approach above generates is JSON in this form:
[{ ... book_1 ... }
,{ ... book_2 ... }
,{ ... book_3 ... }
...
,{ ... book_n ... }
]
However, we could instead send each book as a separate JSON document, reducing the format to the following:
{ ... book_1 ... } { ... book_2 ... } { ... book_3 ... } ... { ... book_n ... }
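Each line in that format is a complete JSON document on its own, so a consumer can decode it with nothing more than the stdlib: split on newlines and parse each line. A minimal sketch (the payload is hypothetical sample data, and stdlib JSON stands in for yajl):

```ruby
require 'json'

# One JSON document per line ("newline-delimited JSON"), as in the
# reduced format above. Hypothetical sample payload, not real data.
payload = <<~NDJSON
  {"title":"Book 1","pages":100}
  {"title":"Book 2","pages":200}
NDJSON

# Parse each line independently -- no surrounding array needed.
books = payload.each_line.map { |line| JSON.parse(line) }

books.each { |book| p book }
```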
The server-side code then becomes much simpler:
require 'sinatra'
require 'mongoid'
require 'yajl'
Mongoid.identity_map_enabled = false
set :server, :thin
get '/' do
  content_type :json
  encoder = Yajl::Encoder.new
  stream do |out|
    Book.all.each do |book|
      out << encoder.encode(book.attributes) << "\n"
    end
  end
end
And on the client:
require 'em-http'
require 'yajl'
parser = Yajl::Parser.new
parser.on_parse_complete = Proc.new do |book|
  # this will now be called separately for every book
  p book
end

EventMachine.run do
  http = EventMachine::HttpRequest.new('http://localhost:4567').get
  http.stream do |chunk|
    parser << chunk
  end
  http.callback do
    EventMachine.stop
  end
end
The great thing is that the client no longer has to wait for the complete response; instead it parses each book as it arrives. However, this will not work if one of your clients expects one single large JSON array.
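If em-http is not available, the same incremental behaviour can be reproduced with a small line-buffering parser fed arbitrary chunks, the way http.stream delivers them. A stdlib-only sketch (the chunk boundaries below are invented to show that a document may be split anywhere):

```ruby
require 'json'

# Buffers incoming chunks and fires the callback once per complete
# line, i.e. once per book -- roughly what the Yajl parser does in
# the client above, but using only the stdlib.
class LineStreamParser
  def initialize(&callback)
    @buffer = +''
    @callback = callback
  end

  def <<(chunk)
    @buffer << chunk
    # Emit every complete line currently in the buffer.
    while (newline = @buffer.index("\n"))
      line = @buffer.slice!(0..newline).strip
      @callback.call(JSON.parse(line)) unless line.empty?
    end
  end
end

books = []
parser = LineStreamParser.new { |book| books << book }

# Chunks may split a document anywhere, just like an HTTP stream:
parser << '{"title":"Boo'
parser << "k 1\"}\n{\"title\":\"Book 2\"}\n"

p books
```

The partial first chunk is held in the buffer until its closing newline arrives, so the callback only ever sees complete, parseable documents.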