Using FeedJira to Build an RSS Aggregator / Reader

I am trying to create my own RSS reader app in Ruby on Rails. I want to store news stories in my database so that I can later pull them out and show each story with its title, image, summary, etc. in a good layout. I am working with the Feedjira library and I am fairly new to RoR. I know that these two commands in the Rails console fetch RSS feeds and parse them somehow:

urls = %w[http://feedjira.com/blog/feed.xml https://github.com/feedjira/feedjira/feed.xml]
feeds = Feedjira::Feed.fetch_and_parse urls

      

While these two commands work on RSS feeds, I was wondering how I can set up my database / model and then save the news posts I get from Feedjira to the db. I tried looking at the Railscast on this, but it's a little outdated. Any help on this would be hugely appreciated! Thanks in advance!



2 answers


Here's one way:

Create a model like:

class Entry < ActiveRecord::Base

  # The whitelist must match the attributes passed to create! below
  attr_accessible :entry_id, :feed_id, :url, :title, :summary, :description, :published_at

  # Looks up the Feed record by name, fetches its URL and stores new entries
  def self.update_from_feed(feed_name)
    feed = Feed.find_by_name(feed_name)
    feed_data = Feedjira::Feed.fetch_and_parse(feed.feed_url)
    add_entries(feed_data.entries, feed)
  end

  def self.add_entries(entries, feed)
    entries.each do |entry|
      # Stop at the first entry already stored (assumes newest-first order)
      break if exists? :entry_id => entry.id

      create!(
        :entry_id     => entry.id,
        :feed_id      => feed.id,
        :url          => entry.url,
        :title        => entry.title.sanitize,
        :summary      => entry.summary.sanitize,
        :description  => entry.content.sanitize,
        :published_at => entry.published
      )
    end
  end
  # A bare `private` does not apply to `def self.` methods
  private_class_method :add_entries
end

      

Then you can call this from cli / cron or whatever, for example:

rails runner -e development 'Entry.update_from_feed("feedname")'

This runs the update_from_feed method in the context of your Rails application using a separate Rails instance (a bit like rails console), so it does not affect the Rails instance that serves your app.

This example assumes a separate Feed model that has a name and a feed_url, so the URL is looked up based on the name provided.

This code does not use Feedjira's ability to check for updates, so the exists? check is used instead. (An issue on GitHub says to avoid using the #update method.)

Note that using break assumes that new posts are always added to the beginning of the feed. If you don't trust the feed, replace break if with next if (or wrap the create! call in an unless block) so that the remaining entries are still checked. The URL can be used as an alternative unique identifier.
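
To make the difference concrete, here is a minimal plain-Ruby sketch that needs neither Rails nor Feedjira. FeedEntry, new_entries and the stop_at_first_known flag are hypothetical names used only for illustration, and the array of stored ids stands in for the entry_id column:

```ruby
require "set"

# Hypothetical stand-in for a parsed feed entry (not Feedjira's own class).
FeedEntry = Struct.new(:id, :title)

# Returns the entries that would be saved. `stored_ids` plays the role of
# the entry_id column; `stop_at_first_known` switches between break and next.
def new_entries(entries, stored_ids, stop_at_first_known:)
  seen = stored_ids.to_set
  fresh = []
  entries.each do |entry|
    if seen.include?(entry.id)
      break if stop_at_first_known # trust newest-first ordering
      next                         # otherwise just skip this entry
    end
    seen << entry.id
    fresh << entry
  end
  fresh
end

# A feed that is NOT newest-first: a new post ("c") follows an old one ("a").
entries = [
  FeedEntry.new("b", "New B"),
  FeedEntry.new("a", "Old A"),
  FeedEntry.new("c", "New C")
]
stored = ["a"]

with_break = new_entries(entries, stored, stop_at_first_known: true).map(&:id)
with_next  = new_entries(entries, stored, stop_at_first_known: false).map(&:id)
```

With this out-of-order feed, the break variant stops at "a" and never reaches "c", while the next variant picks up both new posts at the cost of checking every entry.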

Edit:



Here's a version of the update_from_feed method that takes advantage of Feedjira's ability to handle multiple feeds:

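# Note: this name shadows ActiveRecord's built-in Entry.update_all;
# pick another name if you rely on that method.
def self.update_all
  feed_urls = Feed.pluck :feed_url
  # With several URLs, the result is a hash keyed by feed URL
  feeds = Feedjira::Feed.fetch_and_parse(feed_urls)

  feed_urls.each do |feed_url|
    feed = Feed.find_by_feed_url(feed_url)
    add_entries(feeds[feed_url].entries, feed)
  end
end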

      

pluck returns all values of the specified column (feed_url in this case) as an array. Similarly, you can modify the method to accept an array of names, from which it looks up an array of URLs to pass to Feedjira.
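
Since the results are indexed by URL, a failed fetch for one URL is worth guarding against before calling entries on it. The sketch below is plain Ruby with hypothetical stand-ins (ParsedFeed, partition_results). It assumes that a failed slot holds something other than a parsed feed, such as an HTTP status code; check this against the Feedjira version you use:

```ruby
# Hypothetical stand-in; in the real app these objects come from Feedjira.
class ParsedFeed
  attr_reader :entries

  def initialize(entries)
    @entries = entries
  end
end

# Splits a {url => result} hash into usable feeds and failed URLs.
# A result counts as usable only if it responds to #entries.
def partition_results(results)
  ok, failed = results.partition { |_url, result| result.respond_to?(:entries) }
  [ok.to_h, failed.map(&:first)]
end

results = {
  "http://example.com/a.xml" => ParsedFeed.new(["post 1", "post 2"]),
  "http://example.com/b.xml" => 404 # assumed shape of a failed fetch
}

feeds, failed_urls = partition_results(results)
```

Only the URLs left in feeds would then be passed on to add_entries, and failed_urls could be logged or retried later.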

Finally, if you want to use the self-looping method, you can include:

def self.update_all_periodically(frequency = 15.minutes)
  loop do
    update_all
    sleep frequency.to_i
  end
end

      

Then this:

rails runner -e development 'Entry.update_all_periodically'

will not return until you terminate the process, and will update all feeds at the default frequency or at the frequency given as an optional argument.
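
One caveat with a bare loop is that an exception raised during a single update (a network error, say) ends the whole process. Here is a minimal plain-Ruby sketch of a more forgiving loop. run_periodically and its max_runs and sleeper parameters are hypothetical names, introduced here so the loop can be exercised without actually waiting:

```ruby
# Calls the block repeatedly, pausing `interval` seconds between runs.
# Errors are collected and the loop continues instead of exiting.
def run_periodically(interval, max_runs: nil, sleeper: ->(secs) { sleep secs })
  errors = []
  runs = 0
  loop do
    begin
      yield
    rescue StandardError => e
      errors << e
    end
    runs += 1
    break if max_runs && runs >= max_runs
    sleeper.call(interval)
  end
  errors
end

# Demonstration with a fake sleeper so nothing actually waits.
calls = 0
naps = []
errors = run_periodically(900, max_runs: 3, sleeper: ->(secs) { naps << secs }) do
  calls += 1
  raise "feed unreachable" if calls == 2
end
```

In the Rails app, the block would call the update method and max_runs would be left at nil so that the loop runs until the process is terminated.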

If you want to trigger updates asynchronously in your main Rails process, then a background worker like Sidekiq, Resque, or DelayedJob will do ... the job. :)



Scheduling the fetching and parsing of all these feeds can be incredibly difficult and time-consuming, which means you should absolutely not do it from within the Rails application itself. At best, you should do it using an "offline" script.



You can also just rely on existing APIs like Superfeedr and its Rack middleware.
