Converting xml to native Ruby data structure
I am grabbing data from an API which returns XML like this:
<?xml version="1.0" encoding="utf-8" ?> <seriess realtime_start="2013-01-28" realtime_end="2013-01-28"> <series id="GDPC1" realtime_start="2013-01-28" realtime_end="2013-01-28" title="Real Gross Domestic Product, 1 Decimal" observation_start="1947-01-01" observation_end="2012-07-01" frequency="Quarterly" frequency_short="Q" units="Billions of Chained 2005 Dollars" units_short="Bil. of Chn. 2005 $" seasonal_adjustment="Seasonally Adjusted Annual Rate" seasonal_adjustment_short="SAAR" last_updated="2012-12-20 08:16:28-06" popularity="93" notes="Real gross domestic product is the inflation adjusted value of the goods and services produced by labor and property located in the United States. For more information see the Guide to the National Income and Product Accounts of the United States (NIPA) - (http://www.bea.gov/national/pdf/nipaguid.pdf)"/> </seriess>
I am new to deserialization, but what I think is appropriate is to parse this xml into a ruby object, which I can then refer to as objectFoo.seriess.series.frequency which will return "Quarterly".
From my searches here and on Google, there doesn't seem to be an obvious solution to this in Ruby (NOT Rails), which makes me think I'm missing something pretty obvious. Any ideas?
Edit: I set up a test case based on Winfield's suggestion.
class Exopenstruct
require 'ostruct'
def initialize()
hash = {"seriess"=>{"realtime_start"=>"2013-02-01", "realtime_end"=>"2013-02-01", "series"=>{"id"=>"GDPC1", "realtime_start"=>"2013-02-01", "realtime_end"=>"2013-02-01", "title"=>"Real Gross Domestic Product, 1 Decimal", "observation_start"=>"1947-01-01", "observation_end"=>"2012-10-01", "frequency"=>"Quarterly", "frequency_short"=>"Q", "units"=>"Billions of Chained 2005 Dollars", "units_short"=>"Bil. of Chn. 2005 $", "seasonal_adjustment"=>"Seasonally Adjusted Annual Rate", "seasonal_adjustment_short"=>"SAAR", "last_updated"=>"2013-01-30 07:46:54-06", "popularity"=>"93", "notes"=>"Real gross domestic product is the inflation adjusted value of the goods and services produced by labor and property located in the United States.\n\nFor more information see the Guide to the National Income and Product Accounts of the United States (NIPA) - (http://www.bea.gov/national/pdf/nipaguid.pdf)"}}}
object_instance = OpenStruct.new( hash )
end
end
In irb, I loaded the rb file and instantiated the class. However, when I tried to access an attribute (like instance.seriess) I got: NoMethodError: undefined method `seriess'
Again, I apologize if I'm missing something obvious.
You might be better off using standard XML Hash parsing, like the one included with Rails:
object_hash = Hash.from_xml(xml_string)
puts object_hash['seriess']
If you are not using the Rails stack, you can use a library like Nokogiri.
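For a plain-Ruby starting point, here is a minimal sketch that uses REXML (which ships with Ruby) in place of Nokogiri. The helper name element_to_hash is made up for illustration, the XML is a shortened version of the sample from the question, and repeated sibling tags would overwrite each other in this simple form.

```ruby
require 'rexml/document'

# Hypothetical helper (not part of REXML): turn an element into a Hash of
# its attributes, nesting each child element under its tag name.
# Repeated sibling tags overwrite each other in this simple version.
def element_to_hash(element)
  hash = {}
  element.attributes.each { |name, value| hash[name] = value }
  element.elements.each do |child|
    hash[child.name] = element_to_hash(child)
  end
  hash
end

# Shortened version of the XML from the question.
xml = '<seriess realtime_start="2013-01-28">' \
      '<series id="GDPC1" frequency="Quarterly"/></seriess>'

root = REXML::Document.new(xml).root
data = { root.name => element_to_hash(root) }

puts data['seriess']['series']['frequency'] # prints "Quarterly"
```

The result has the same nested-Hash shape as the Hash.from_xml example, so it can be wrapped in the same way.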
EDIT: If you're looking for object behavior, using OpenStruct is a great way to wrap a hash for this:
object_instance = OpenStruct.new( Hash.from_xml(xml_string) )
puts object_instance.seriess
NOTE: With deeply nested data you may need to recursively convert the inner hashes to OpenStruct instances. That is, if an attribute such as seriess above holds a hash of values, it stays a Hash rather than becoming an OpenStruct.
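The recursive conversion mentioned in the note could look roughly like this. It is only a sketch: deep_ostruct is a hypothetical helper name, and the hash is a trimmed copy of the one in the question.

```ruby
require 'ostruct'

# Hypothetical helper: wrap every nested Hash in an OpenStruct so that
# chained method calls work all the way down. Arrays are walked too.
def deep_ostruct(value)
  case value
  when Hash
    OpenStruct.new(value.transform_values { |v| deep_ostruct(v) })
  when Array
    value.map { |v| deep_ostruct(v) }
  else
    value
  end
end

# Trimmed copy of the hash from the question.
hash = {
  'seriess' => {
    'realtime_start' => '2013-02-01',
    'series' => { 'id' => 'GDPC1', 'frequency' => 'Quarterly' }
  }
}

object_foo = deep_ostruct(hash)
puts object_foo.seriess.series.frequency # prints "Quarterly"
```

With a plain OpenStruct.new(hash), object_foo.seriess would still be a Hash, so the second method call in the chain would fail.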
I've just started using Damien Le Berrigaud's fork of HappyMapper and I'm very happy with it. You define simple Ruby classes and include HappyMapper. When you call parse, it uses Nokogiri to slurp in the XML, and you get back a full tree of bona fide Ruby objects.
I have used it to parse multi-megabyte XML files and found it fast and reliable. Check out the README.
One hint: since the encoding declaration of an XML file sometimes lies, you may need to sanitize your XML like this:
def sanitize(xml)
  xml.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')
end
Do this before passing it to the #parse method, to avoid the Nokogiri error "Input is not proper UTF-8, indicate encoding !".
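One thing worth knowing about this helper: because it treats the input as binary, it drops every byte outside ASCII, including valid multi-byte UTF-8 characters. The sketch below shows that on a small made-up string, next to a possible alternative based on String#scrub (Ruby 2.1+) that removes only malformed bytes. The name scrub_invalid and the sample string are assumptions for illustration.

```ruby
# The helper from the answer: treats the input as raw bytes and drops every
# byte outside ASCII, since those have no defined conversion from binary.
def sanitize(xml)
  xml.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')
end

# Hypothetical alternative: keep valid UTF-8 and drop only malformed bytes.
def scrub_invalid(xml)
  xml.dup.force_encoding('UTF-8').scrub('')
end

# "\xC3\xA9" is a valid UTF-8 "é"; "\xFF" can never appear in valid UTF-8.
raw = "caf\xC3\xA9 \xFF ok".b

sanitized = sanitize(raw)      # "caf  ok"  (the é is lost as well)
scrubbed  = scrub_invalid(raw) # "café  ok" (only the stray byte is removed)

puts sanitized
puts scrubbed
```

Which one fits depends on the data: if the feed is pure ASCII apart from stray bytes, both give the same result.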
Update
I went ahead and implemented the OP's example with HappyMapper:
XML_STRING = '<?xml version="1.0" encoding="utf-8" ?> <seriess realtime_start="2013-01-28" realtime_end="2013-01-28"> <series id="GDPC1" realtime_start="2013-01-28" realtime_end="2013-01-28" title="Real Gross Domestic Product, 1 Decimal" observation_start="1947-01-01" observation_end="2012-07-01" frequency="Quarterly" frequency_short="Q" units="Billions of Chained 2005 Dollars" units_short="Bil. of Chn. 2005 $" seasonal_adjustment="Seasonally Adjusted Annual Rate" seasonal_adjustment_short="SAAR" last_updated="2012-12-20 08:16:28-06" popularity="93" notes="Real gross domestic product is the inflation adjusted value of the goods and services produced by labor and property located in the United States. For more information see the Guide to the National Income and Product Accounts of the United States (NIPA) - (http://www.bea.gov/national/pdf/nipaguid.pdf)"/> </seriess>'
require 'happymapper'

class Series; end # forward reference

class Seriess
  include HappyMapper
  tag 'seriess'
  attribute :realtime_start, Date
  attribute :realtime_end, Date
  has_many :seriess, Series, :tag => 'series'
end

class Series
  include HappyMapper
  tag 'series'
  attribute 'id', String
  attribute 'realtime_start', Date
  attribute 'realtime_end', Date
  attribute 'title', String
  attribute 'observation_start', Date
  attribute 'observation_end', Date
  attribute 'frequency', String
  attribute 'frequency_short', String
  attribute 'units', String
  attribute 'units_short', String
  attribute 'seasonal_adjustment', String
  attribute 'seasonal_adjustment_short', String
  attribute 'last_updated', DateTime
  attribute 'popularity', Integer
  attribute 'notes', String
end

# Parse the sample document and return a single Seriess instance.
def test
  Seriess.parse(XML_STRING, :single => true)
end
and here's what you can do with it:
>> a = test
>> a.class
=> Seriess
>> a.seriess.first.frequency
=> "Quarterly"
>> a.seriess.first.observation_start
=> #<Date: 1947-01-01 ((2432187j,0s,0n),+0s,2299161j)>
>> a.seriess.first.popularity
=> 93
Nokogiri handles the parsing. How you process the data is up to you; here I use OpenStruct as an example:
require 'nokogiri'
require 'ostruct'
require 'open-uri'
# Kernel#open no longer fetches URLs as of Ruby 3.0; open-uri provides URI.open
doc = Nokogiri.parse URI.open('http://www.w3schools.com/xml/note.xml')
note = OpenStruct.new
note.to = doc.at('to').text
note.from = doc.at('from').text
note.heading = doc.at('heading').text
note.body = doc.at('body').text
note
=> #<OpenStruct to="Tove", from="Jani", heading="Reminder", body="ToveJaniReminderDon't forget me this weekend!\r\n">
This is just a teaser; your problem could be many times bigger. It is only meant to give you a head start.
Edit: after searching Google and Stack Overflow, here is a hybrid of my answer and @Winfield's, using Rails' Hash#from_xml:
> require 'active_support/core_ext/hash/conversions'
> xml = Nokogiri::XML.parse(URI.open('http://www.w3schools.com/xml/note.xml'))
> Hash.from_xml(xml.to_s)
=> {"note"=>{"to"=>"Tove", "from"=>"Jani", "heading"=>"Reminder", "body"=>"Don't forget me this weekend!"}}
, , ActiveRecord:: Base , .
http://nokogiri.org/
http://ruby-doc.org/stdlib-1.9.3/libdoc/ostruct/rdoc/OpenStruct.html
fooobar.com/questions/93586/...
If you are happy to convert the XML into a Hash, the nori gem can do that.
For example:
require 'nori'
xml = '<?xml version="1.0" encoding="utf-8" ?> <seriess realtime_start="2013-01-28" realtime_end="2013-01-28"> <series id="GDPC1" realtime_start="2013-01-28" realtime_end="2013-01-28" title="Real Gross Domestic Product, 1 Decimal" observation_start="1947-01-01" observation_end="2012-07-01" frequency="Quarterly" frequency_short="Q" units="Billions of Chained 2005 Dollars" units_short="Bil. of Chn. 2005 $" seasonal_adjustment="Seasonally Adjusted Annual Rate" seasonal_adjustment_short="SAAR" last_updated="2012-12-20 08:16:28-06" popularity="93" notes="Real gross domestic product is the inflation adjusted value of the goods and services produced by labor and property located in the United States. For more information see the Guide to the National Income and Product Accounts of the United States (NIPA) - (http://www.bea.gov/national/pdf/nipaguid.pdf)"/> </seriess>'
hash = Nori.new.parse(xml)
hash['seriess']
hash['seriess']['series']
puts hash['seriess']['series']['@frequency']
Note the '@' prefix on the keys that come from attributes of the 'series' element.
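If the '@' prefix gets in the way, the keys can be cleaned up after parsing. The following is a rough sketch only: strip_attribute_prefix is a hypothetical helper, and the input is a hand-written, simplified stand-in for the shape of hash shown above, so the gem itself is not needed to run it.

```ruby
# Hypothetical helper: recursively drop the leading '@' from Hash keys.
def strip_attribute_prefix(value)
  case value
  when Hash
    value.each_with_object({}) do |(key, child), result|
      result[key.to_s.delete_prefix('@')] = strip_attribute_prefix(child)
    end
  when Array
    value.map { |item| strip_attribute_prefix(item) }
  else
    value
  end
end

# Hand-written, simplified stand-in for the parsed hash (assumed shape).
nori_style = {
  'seriess' => {
    '@realtime_start' => '2013-01-28',
    'series' => { '@id' => 'GDPC1', '@frequency' => 'Quarterly' }
  }
}

clean = strip_attribute_prefix(nori_style)
puts clean['seriess']['series']['frequency'] # prints "Quarterly"
```

Be aware that dropping the prefix can cause a collision if an element has an attribute and a child element with the same name.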