Optimizing feed loading

I am currently working on a site that needs to load user feeds. How can I best optimize the fetching when I have a database of, say, 300 feeds? I'm going to create a cron job that fetches the feeds, but should I run it something like 5 times per second?

Any ideas on how to do this best in PHP?

+1




4 answers


Based on the new information, I think I would do something like this:

Let the "first" client initiate the update and store a timestamp with it. Everey other clients who will request information receive cashing information until this information becomes old. The next click from the client will then update the cashe which will then be used by all clients until the next time to the old one.



The client who actually initiates the update doesn't have to wait for it to go Finnish, just serve the old cashed version and keep doing this until the job is done.

This way you don't need to update everything if clients don't ask for it.
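
A minimal sketch of that idea in PHP, assuming a "feed_cache" table with "items", "refreshed" and "refreshing" columns, a PDO connection, and a hypothetical fetch_feed() helper that does the actual HTTP request; adjust it to your own schema:

    const CACHE_TTL = 3600; // serve cached items for up to an hour

    function getFeedItems( PDO $db, $feedId ) {
        // Always read whatever is currently cached (assumes one row per feed already exists).
        $stmt = $db->prepare( 'select items, refreshed from feed_cache where feed_id = ?' );
        $stmt->execute( array( (int) $feedId ) );
        $row = $stmt->fetch( PDO::FETCH_ASSOC );

        $cached  = $row ? json_decode( $row['items'], true ) : array();
        $isStale = !$row || ( time() - strtotime( $row['refreshed'] ) ) > CACHE_TTL;

        if ( $isStale && $row ) {
            // Try to "claim" the refresh so only one request does the work.
            $claim = $db->prepare( 'update feed_cache set refreshing = 1 where feed_id = ? and refreshing = 0' );
            $claim->execute( array( (int) $feedId ) );

            if ( $claim->rowCount() > 0 ) {
                // Run the actual fetch after the response has been sent, so even this
                // client gets the old copy immediately instead of waiting.
                register_shutdown_function( function () use ( $db, $feedId ) {
                    $items = fetch_feed( $feedId ); // hypothetical HTTP fetch helper
                    $db->prepare( 'update feed_cache set items = ?, refreshed = now(), refreshing = 0 where feed_id = ?' )
                       ->execute( array( json_encode( $items ), (int) $feedId ) );
                } );
            }
        }

        return $cached; // everyone, including the "first" client, gets the cached copy
    }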

+2




If I understand your question correctly, you are essentially building an aggregator site?

You can do the following: start by fetching every feed once an hour (for example). Once you have some posts from a feed, calculate the average interval between its posts, and then use that as the fetch interval for that feed.

For example, if a site has published 7 articles in the last 7 days, you can fetch its feed every 24 hours (once a day).

I use this algorithm with a few modifications: when I calculate the average interval, I divide it by 2 (so the checks don't become too infrequent). If the result is less than 60 minutes, I clamp the interval to 1 hour; if it is greater than 24 hours, I clamp it to 24 hours.



For example, something like this:

    public function updateRefreshInterval() {
        // Count how many articles this feed produced in the last 7 days.
        $sql = 'select count(*) _count ' .
               'from article ' .
               'where created > adddate(now(), interval -7 day) and feed_id = ' . (int) $this->getId();
        $array = Db::loadArray( $sql );
        $count = $array['_count'];

        // Average number of seconds between posts over the last week,
        // halved so the feed is polled a bit more often than it publishes.
        $interval = 7 * 24 * 60 * 60 / ( $count + 1 );
        $interval = $interval / 2;

        // Clamp between the minimum (1 hour) and maximum (24 hours) intervals.
        if ( $interval < self::MIN_REFRESH_INTERVAL ) {
            $interval = self::MIN_REFRESH_INTERVAL;
        }
        if ( $interval > self::MAX_REFRESH_INTERVAL ) {
            $interval = self::MAX_REFRESH_INTERVAL;
        }

        Db::execute( 'update feed set refresh_interval = ' . $interval . ' where id = ' . (int) $this->getId() );
    }


The "feed" table, "refreshed" is the timestampt when the feed was last refreshed, and "refresh_interval" is the desired time interval between two samples of the same feed.

+3




It's best to be a "good citizen" rather than overload the feeds with a lot of unnecessary requests. I have set the update interval to 1 hour for one of my web apps, which tracks about 150 blogs for updates. I store the time each feed was last checked in the database and use that to decide when to check it again. The feeds were added at random times, so they don't all come due at the same moment.
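
A sketch of that fixed-interval variant, assuming a PDO connection and a "feed" table with "url" and "last_checked" columns (names are just placeholders):

    // Pick the feeds that haven't been checked in the last hour.
    $stmt = $db->query( 'select id, url from feed where last_checked < date_sub(now(), interval 1 hour)' );
    foreach ( $stmt->fetchAll( PDO::FETCH_ASSOC ) as $feed ) {
        // fetch $feed['url'], store any new items, then record the check time
        $db->prepare( 'update feed set last_checked = now() where id = ?' )
           ->execute( array( $feed['id'] ) );
    }

Because the feeds were added at random times, their "last_checked" values stay naturally staggered, so they don't all come due in the same cron run.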

0




I wrote pfetch to do this for me. It's small, but it has a couple of really important aspects:

  • It is written in Twisted and can handle massive concurrency even when the network is slow.
  • It doesn't require any cron hacks or anything else.

I actually wrote it because my cron-based fetchers became a problem. I now have it set up to fetch assorted things I want from around the internet and run scripts whenever something changes, to update parts of my own website.

0

