Ruby algorithm for determining the correct HTML structure

I have to as input an array with hashes, each hash is a html tag description (open and end position in the text and tag type). I need to create another array where the tags are ordered.

For example:

input = [
         {start_p: 0, end_p: 100, start_t: '<p>', end_t: '</p>'},
         {start_p: 10, end_p: 50, start_t: '<p>', end_t: '</p>'},
         {start_p: 0, end_p: 100, start_t: '<span>', end_t: '</span>'},
         {start_p: 20, end_p: 30, start_t: '<em>', end_t: '</em>'},
         {start_p: 40, end_p: 50, start_t: '<em>', end_t: '</em>'},
         {start_p: 50, end_p: 60, start_t: '<em>', end_t: '</em>'},
         {start_p: 70, end_p: 80, start_t: '<em>', end_t: '</em>'},
         {start_p: 8, end_p: 99, start_t: '<strong>', end_t: '</strong>'}
        ]

expected_output: [<p><span><strong><p><em></em><em></em></p><em></em><em></em></strong></span></p>]

      

not just output tags, each tag must be a hash with position and tag, for example:

     {position: 0, tag: '<p>'}

      

Most importantly, it needs to be ordered in the correct order, following the HTML rules that do not have overlapping tags (if multiple tags end at the same position, the first, open last, should go first, if it ends and the other opens at same position, end will be first, etc.).

It is part of a legacy system and the login and logout cannot be changed at this time. Also, the input can be very large (hundreds of thousands of items)

Better solution than just brute force, recursive?

+3


source to share


2 answers


input.group_by { |h| h[:start_p] }.
      values.
      flat_map do |a|
        x = 1.0
        a.flat_map do |h|
          x /= 2.0
          [[h[:start_p] += x, h[:start_t]], [h[:end_p] -= x, h[:end_t]]]
        end
      end.sort_by(&:first).map(&:last).join
#=> "<span><p><strong><p><em></em><em></p></em><em></em><em></em></strong></p></span>"

      

Following are the steps.

b = input.group_by { |h| h[:start_p] }
  #=> { 0=>[{:start_p=>0, :end_p=>100, :start_t=>"<p>", :end_t=>"</p>"},
  #        {:start_p=>0, :end_p=>100, :start_t=>"<span>", :end_t=>"</span>"}],
  #    10=>[{:start_p=>10, :end_p=>50, :start_t=>"<p>", :end_t=>"</p>"}],
  #    20=>[{:start_p=>20, :end_p=>30, :start_t=>"<em>", :end_t=>"</em>"}],
  #    40=>[{:start_p=>40, :end_p=>50, :start_t=>"<em>", :end_t=>"</em>"}],
  #    50=>[{:start_p=>50, :end_p=>60, :start_t=>"<em>", :end_t=>"</em>"}],
  #    70=>[{:start_p=>70, :end_p=>80, :start_t=>"<em>", :end_t=>"</em>"}],
  #     8=>[{:start_p=> 8, :end_p=>99, :start_t=>"<strong>", :end_t=>"</strong>"}]}
c = b.values
  #=> [[{:start_p=>0, :end_p=>100, :start_t=>"<p>", :end_t=>"</p>"},
  #     {:start_p=>0, :end_p=>100, :start_t=>"<span>", :end_t=>"</span>"}],
  #    [{:start_p=>10, :end_p=>50, :start_t=>"<p>", :end_t=>"</p>"}],
  #   ...
  #    [{:start_p=>8, :end_p=>99, :start_t=>"<strong>", :end_t=>"</strong>"}]]
d = c.flat_map do |a|
      x = 1.0
      a.flat_map do |h|
        x /= 2.0
        [[h[:start_p] += x, h[:start_t]], [h[:end_p] -= x, h[:end_t]]]
      end
    end
  #=> [[0.5, "<p>"], [99.5, "</p>"], [0.25, "<span>"], [99.75, "</span>"],
  #    [10.5, "<p>"], [49.5, "</p>"], [20.5, "<em>"], [29.5, "</em>"],
  #    [40.5, "<em>"], [49.5, "</em>"], [50.5, "<em>"], [59.5, "</em>"],
  #    [70.5, "<em>"], [79.5, "</em>"], [8.5, "<strong>"], [98.5, "</strong>"]]

      



The first four elements d

(tuples) are the most important to understand the approach I have used.

e = d.sort_by(&:first)
  #=> [[0.25, "<span>"], [0.5, "<p>"], [8.5, "<strong>"], [10.5, "<p>"],
  #    [20.5, "<em>"], [29.5, "</em>"], [40.5, "<em>"], [49.5, "</p>"],
  #    [49.5, "</em>"], [50.5, "<em>"], [59.5, "</em>"], [70.5, "<em>"],
  #    [79.5, "</em>"], [98.5, "</strong>"], [99.5, "</p>"], [99.75, "</span>"]]

f = e.map(&:last)
  #=> ["<span>", "<p>", "<strong>", "<p>", "<em>", "</em>", "<em>", "</p>",
  #    "</em>", "<em>", "</em>", "<em>", "</em>", "</strong>", "</p>", "</span>"]
f.join
  #=> "<span><p><strong><p><em></em><em></p></em><em></em><em></em></strong></p></span>"

      

When asked, make the calculation d

above.

+1


source


I'm not sure what you mean by brute force, recursive, but it can be done with sort_by

and map

. This is a question of how to make yours sort_by

correct to achieve the required HTML rules.

output = input.sort_by { |hsh| hsh[:start_p] }.map{|x| x.slice(:start_p, :start_t)}
output.each do |h|
  h[:position] = h.delete(:start_p)  
  h[:tag] = h.delete(:start_t)  
end

      



Monkey trimming method.

module MyExtension
  module Hash 
    def slice(*keys)
      ::Hash[[keys, self.values_at(*keys)].transpose]
    end
  end
end
Hash.include MyExtension::Hash

      

0


source







All Articles