
Converting XML to a native Ruby data structure

I am grabbing data from an API that returns XML like this:

<?xml version="1.0" encoding="utf-8" ?> <seriess realtime_start="2013-01-28" realtime_end="2013-01-28"> <series id="GDPC1" realtime_start="2013-01-28" realtime_end="2013-01-28" title="Real Gross Domestic Product, 1 Decimal" observation_start="1947-01-01" observation_end="2012-07-01" frequency="Quarterly" frequency_short="Q" units="Billions of Chained 2005 Dollars" units_short="Bil. of Chn. 2005 $" seasonal_adjustment="Seasonally Adjusted Annual Rate" seasonal_adjustment_short="SAAR" last_updated="2012-12-20 08:16:28-06" popularity="93" notes="Real gross domestic product is the inflation adjusted value of the goods and services produced by labor and property located in the United States. For more information see the Guide to the National Income and Product Accounts of the United States (NIPA) - (http://www.bea.gov/national/pdf/nipaguid.pdf)"/> </seriess>


I am new to deserialization, but I think the appropriate approach is to parse this XML into a Ruby object, which I could then access as objectFoo.seriess.series.frequency, returning "Quarterly".

From my searches here and on Google, there doesn't seem to be an obvious solution to this in plain Ruby (NOT Rails), which makes me think I'm missing something pretty obvious. Any ideas?

Edit: I am setting up a test case based on Winfield's suggestion:

require 'ostruct'

class Exopenstruct
  def initialize
    hash = {"seriess"=>{"realtime_start"=>"2013-02-01", "realtime_end"=>"2013-02-01", "series"=>{"id"=>"GDPC1", "realtime_start"=>"2013-02-01", "realtime_end"=>"2013-02-01", "title"=>"Real Gross Domestic Product, 1 Decimal", "observation_start"=>"1947-01-01", "observation_end"=>"2012-10-01", "frequency"=>"Quarterly", "frequency_short"=>"Q", "units"=>"Billions of Chained 2005 Dollars", "units_short"=>"Bil. of Chn. 2005 $", "seasonal_adjustment"=>"Seasonally Adjusted Annual Rate", "seasonal_adjustment_short"=>"SAAR", "last_updated"=>"2013-01-30 07:46:54-06", "popularity"=>"93", "notes"=>"Real gross domestic product is the inflation adjusted value of the goods and services produced by labor and property located in the United States.\n\nFor more information see the Guide to the National Income and Product Accounts of the United States (NIPA) - (http://www.bea.gov/national/pdf/nipaguid.pdf)"}}}

    object_instance = OpenStruct.new(hash)
  end
end


In irb, I loaded the .rb file and instantiated the class. However, when I tried to access an attribute (like instance.seriess), I got: NoMethodError: undefined method `seriess'

Again, I apologize if I'm missing something obvious.
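For what it's worth, the NoMethodError comes from object_instance being a local variable: it is created inside initialize and thrown away when initialize returns, and Exopenstruct itself never defines a seriess method. A minimal sketch of one way around it, keeping the OpenStruct in an instance variable behind a reader (the data reader and the constructor argument are illustrative assumptions, not part of the original code):

```ruby
require 'ostruct'

# Sketch: store the OpenStruct on the instance instead of in a local variable.
class Exopenstruct
  # Hypothetical reader name; exposes the wrapped OpenStruct.
  attr_reader :data

  def initialize(hash)
    @data = OpenStruct.new(hash)
  end
end

instance = Exopenstruct.new({ "seriess" => { "series" => { "frequency" => "Quarterly" } } })

# seriess now resolves, but only the top level is an OpenStruct:
# the nested value is still a plain Hash.
puts instance.data.seriess.class                   # prints "Hash"
puts instance.data.seriess["series"]["frequency"]  # prints "Quarterly"
```

Subclassing OpenStruct (class Exopenstruct < OpenStruct) would also let you call instance.seriess directly; either way, nested hashes stay hashes unless they are converted recursively.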



4 answers


You might be better off using standard XML-to-Hash parsing, like the one included with Rails:

require 'active_support/core_ext/hash/conversions'  # already loaded inside Rails

object_hash = Hash.from_xml(xml_string)
puts object_hash['seriess']


If you are not using the Rails stack, you can use a library like Nokogiri.
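If you would rather avoid extra gems entirely, Ruby's own REXML (shipped with Ruby; a bundled gem since Ruby 3.0) can do the parsing, and a small recursive helper can turn the tree into nested hashes. This is a rough stand-in for Hash.from_xml, not its exact behaviour: the element_to_hash helper is hypothetical, attributes and child elements share one set of keys, and repeated child tags are collected into an Array.

```ruby
require 'rexml/document'

# Hypothetical helper: converts an REXML element into nested Hashes.
# - attributes become String keys
# - child elements are nested under their tag name
# - repeated child tags are gathered into an Array
# - an element with no attributes and no children becomes its text
def element_to_hash(element)
  children = element.elements.to_a
  return element.text.to_s.strip if children.empty? && element.attributes.empty?

  result = {}
  element.attributes.each { |name, value| result[name] = value }
  children.each do |child|
    value = element_to_hash(child)
    if result.key?(child.name)
      existing = result[child.name]
      result[child.name] = existing.is_a?(Array) ? existing << value : [existing, value]
    else
      result[child.name] = value
    end
  end
  result
end

# Shortened version of the XML from the question.
xml = <<~XML
  <?xml version="1.0" encoding="utf-8" ?>
  <seriess realtime_start="2013-01-28" realtime_end="2013-01-28">
    <series id="GDPC1" frequency="Quarterly" frequency_short="Q"/>
  </seriess>
XML

doc = REXML::Document.new(xml)
hash = { doc.root.name => element_to_hash(doc.root) }
puts hash["seriess"]["series"]["frequency"]  # prints "Quarterly"
```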



EDIT: If you're looking for object behavior, using OpenStruct is a great way to wrap a hash for this:

require 'ostruct'

object_instance = OpenStruct.new( Hash.from_xml(xml_string) )
puts object_instance.seriess


NOTE: Deeply nested data may require recursively converting the nested hashes into OpenStruct instances. That is, if the attribute above holds a hash of values, it will be a plain Hash, not an OpenStruct.
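A minimal sketch of that recursive conversion, assuming the parsed data consists only of Hashes, Arrays and plain values (the to_ostruct name is hypothetical; transform_values needs Ruby 2.4+):

```ruby
require 'ostruct'

# Hypothetical helper: walks a parsed structure and wraps every Hash in an
# OpenStruct, converting Hashes nested inside Arrays as well.
def to_ostruct(value)
  case value
  when Hash  then OpenStruct.new(value.transform_values { |v| to_ostruct(v) })
  when Array then value.map { |v| to_ostruct(v) }
  else value
  end
end

# Trimmed-down version of the hash from the question.
data = {
  "seriess" => {
    "realtime_start" => "2013-02-01",
    "series" => { "id" => "GDPC1", "frequency" => "Quarterly" }
  }
}

obj = to_ostruct(data)
puts obj.seriess.series.frequency  # prints "Quarterly"
puts obj.seriess.realtime_start    # prints "2013-02-01"
```

With a helper like this, OpenStruct.new(Hash.from_xml(xml_string)) can become to_ostruct(Hash.from_xml(xml_string)), which gives the objectFoo.seriess.series.frequency style of access asked for in the question.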



I've just started using Damien Le Berrigaud's fork of HappyMapper and I'm very happy with it. You define simple Ruby classes and include HappyMapper. When you call parse, it uses Nokogiri to slurp in the XML, and you get back a full tree of bona fide Ruby objects.

I have used it to parse multi-megabyte XML files and found it fast and reliable. Check out the README.

One hint: since the encoding declared in an XML file sometimes lies, you may need to sanitize your XML like this:

def sanitize(xml)
  xml.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')
end


before passing it to the #parse method, to avoid the Nokogiri error "Input is not proper UTF-8, indicate encoding !".
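One caveat: transcoding from 'binary' treats every byte above 0x7F as undefined, so the sanitize method above also strips valid multibyte UTF-8 characters (accented letters, for instance), not just the broken ones. If you only want to drop invalid byte sequences, String#scrub (Ruby 2.1+) is an alternative. A sketch, where scrub_utf8 is a hypothetical name:

```ruby
# Hypothetical alternative to sanitize: reinterpret the bytes as UTF-8 and
# remove only the sequences that are not valid UTF-8.
def scrub_utf8(xml)
  xml.dup.force_encoding(Encoding::UTF_8).scrub('')
end

raw = "caf\xC3\xA9 \xFF".b   # valid UTF-8 "café " followed by a stray 0xFF byte
clean = scrub_utf8(raw)
puts clean                   # prints "café "
puts clean.valid_encoding?   # prints "true"
```

If the document is really in another encoding (say ISO-8859-1 mislabeled as UTF-8), converting it with encode from that encoding is a better fix than deleting bytes.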



Update

I went ahead and implemented the OP's example in HappyMapper:

XML_STRING = '<?xml version="1.0" encoding="utf-8" ?> <seriess realtime_start="2013-01-28" realtime_end="2013-01-28"> <series id="GDPC1" realtime_start="2013-01-28" realtime_end="2013-01-28" title="Real Gross Domestic Product, 1 Decimal" observation_start="1947-01-01" observation_end="2012-07-01" frequency="Quarterly" frequency_short="Q" units="Billions of Chained 2005 Dollars" units_short="Bil. of Chn. 2005 $" seasonal_adjustment="Seasonally Adjusted Annual Rate" seasonal_adjustment_short="SAAR" last_updated="2012-12-20 08:16:28-06" popularity="93" notes="Real gross domestic product is the inflation adjusted value of the goods and services produced by labor and property located in the United States. For more information see the Guide to the National Income and Product Accounts of the United States (NIPA) - (http://www.bea.gov/national/pdf/nipaguid.pdf)"/> </seriess>'

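require 'date'         # Date and DateTime are used as attribute types below
require 'happymapper'  # HappyMapper (the Nokogiri-based fork)

class Series; end              # forward reference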

class Seriess
  include HappyMapper
  tag 'seriess'

  attribute :realtime_start, Date
  attribute :realtime_end, Date
  has_many :seriess, Series, :tag => 'series'
end
class Series
  include HappyMapper
  tag 'series'

  attribute 'id', String
  attribute 'realtime_start', Date
  attribute 'realtime_end', Date
  attribute 'title', String
  attribute 'observation_start', Date
  attribute 'observation_end', Date
  attribute 'frequency', String
  attribute 'frequency_short', String
  attribute 'units', String
  attribute 'units_short', String
  attribute 'seasonal_adjustment', String
  attribute 'seasonal_adjustment_short', String
  attribute 'last_updated', DateTime
  attribute 'popularity', Integer
  attribute 'notes', String
end

def test
  Seriess.parse(XML_STRING, :single => true)
end


and here's what you can do with it:

>> a = test
>> a.class
=> Seriess
>> a.seriess.first.frequency
=> "Quarterly"
>> a.seriess.first.observation_start
=> #<Date: 1947-01-01 ((2432187j,0s,0n),+0s,2299161j)>
>> a.seriess.first.popularity
=> 93




Nokogiri handles the parsing. How you process the data is up to you; here I use OpenStruct as an example:

require 'nokogiri'
require 'ostruct'
require 'open-uri'

# Nokogiri::XML forces XML parsing. Nokogiri.parse treats an IO (such as the
# one URI.open returns) as HTML, which wraps the whole note in an HTML <body>.
doc = Nokogiri::XML(URI.open('http://www.w3schools.com/xml/note.xml'))

note = OpenStruct.new

note.to = doc.at('to').text
note.from = doc.at('from').text
note.heading = doc.at('heading').text
note.body = doc.at('body').text

note
=> #<OpenStruct to="Tove", from="Jani", heading="Reminder", body="Don't forget me this weekend!">


This is just a teaser; your actual problem may be considerably bigger, but it should give you a head start. As @Winfield mentioned, Rails has Hash#from_xml, which you can also load from ActiveSupport on its own:

> require 'active_support/core_ext/hash/conversions'
> xml = Nokogiri::XML.parse(URI.open('http://www.w3schools.com/xml/note.xml'))
> Hash.from_xml(xml.to_s)
=> {"note"=>{"to"=>"Tove", "from"=>"Jani", "heading"=>"Reminder", "body"=>"Don't forget me this weekend!"}}



http://nokogiri.org/
http://ruby-doc.org/stdlib-1.9.3/libdoc/ostruct/rdoc/OpenStruct.html
fooobar.com/questions/93586/...





To convert the XML into a Hash, take a look at the nori gem.

For example:

require 'nori'

xml = '<?xml version="1.0" encoding="utf-8" ?> <seriess realtime_start="2013-01-28" realtime_end="2013-01-28"> <series id="GDPC1" realtime_start="2013-01-28" realtime_end="2013-01-28" title="Real Gross Domestic Product, 1 Decimal" observation_start="1947-01-01" observation_end="2012-07-01" frequency="Quarterly" frequency_short="Q" units="Billions of Chained 2005 Dollars" units_short="Bil. of Chn. 2005 $" seasonal_adjustment="Seasonally Adjusted Annual Rate" seasonal_adjustment_short="SAAR" last_updated="2012-12-20 08:16:28-06" popularity="93" notes="Real gross domestic product is the inflation adjusted value of the goods and services produced by labor and property located in the United States. For more information see the Guide to the National Income and Product Accounts of the United States (NIPA) - (http://www.bea.gov/national/pdf/nipaguid.pdf)"/> </seriess>'

hash = Nori.new.parse(xml)    
hash['seriess']
hash['seriess']['series']
puts hash['seriess']['series']['@frequency']


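Note that Nori prefixes attribute names with '@', which is why the frequency attribute of 'series' is read as '@frequency'.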
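If you would prefer plain keys, you can post-process the hash. A sketch, assuming the '@'-prefixed output shape shown above; strip_attribute_prefix is a hypothetical helper, delete_prefix needs Ruby 2.5+, keys come back as Strings, and an attribute that shares its name with a child element will be overwritten by whichever key comes later:

```ruby
# Hypothetical helper: recursively removes a leading "@" from Hash keys, so
# "@frequency" becomes "frequency". Arrays are processed element by element.
def strip_attribute_prefix(value)
  case value
  when Hash
    value.each_with_object({}) do |(key, v), out|
      out[key.to_s.delete_prefix("@")] = strip_attribute_prefix(v)
    end
  when Array
    value.map { |v| strip_attribute_prefix(v) }
  else
    value
  end
end

# Hand-written hash shaped like the Nori result for the question's XML
# (assumed shape, trimmed to a few attributes).
nori_hash = {
  "seriess" => {
    "series" => { "@id" => "GDPC1", "@frequency" => "Quarterly" },
    "@realtime_start" => "2013-01-28"
  }
}

plain = strip_attribute_prefix(nori_hash)
puts plain["seriess"]["series"]["frequency"]  # prints "Quarterly"
puts plain["seriess"]["realtime_start"]       # prints "2013-01-28"
```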









