Fastest service for crawling web pages or calling APIs (iTunes in particular)?

We need to download metadata for all iOS apps on a daily basis. We plan to extract the information by crawling the iTunes site and using the iTunes Search API. Since there are more than 700K apps out there, we need an efficient way to do this.

One approach is to set up a bunch of EC2 instances and run scripts on them in parallel. Before we go down that road, are there services like 80legs that people have used to accomplish a similar task? Basically, we want something to help us crawl hundreds of thousands of pages (or make a bunch of API calls) very quickly.


1 answer


You might want to take a look at the Apple Enterprise Partner Feed (EPF). This is likely to be a lot cheaper than renting a bunch of EC2 machines or building a crawling infrastructure to scrape the data. From the EPF description itself:

The Enterprise Partner Feed is a data feed of the complete set of metadata from iTunes and the App Store. It is available to partners so they can fully integrate aspects of iTunes and the App Store into a website or app.

EPF has two feed modes



iTunes generates EPF data in two modes:

full mode
incremental mode

The full export is generated weekly and contains a complete snapshot of iTunes metadata as of the day of generation. The incremental export is generated daily and contains records that have been added or changed since the last full export. Incremental exports are located relative to the full export on which they are based.

So use the full export to populate your data initially, then use the incremental exports for daily updates.
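
To make the full-then-incremental workflow concrete, here is a minimal sketch in Python. It assumes the export files have already been downloaded and parsed into dictionaries, which is not shown. The names `load_full`, `apply_incremental` and the `app_id` key are hypothetical, chosen for illustration only, and are not part of EPF itself.

```python
from typing import Dict, Iterable, Mapping

Record = Mapping[str, str]


def load_full(records: Iterable[Record], key: str = "app_id") -> Dict[str, Record]:
    """Build a fresh local store from the records of a weekly full export."""
    # "app_id" is an assumed key name; use whatever uniquely identifies a record.
    return {record[key]: record for record in records}


def apply_incremental(
    store: Dict[str, Record], records: Iterable[Record], key: str = "app_id"
) -> Dict[str, Record]:
    """Upsert the added or changed records from a daily incremental export."""
    for record in records:
        store[record[key]] = record  # new ids are added, existing ids are replaced
    return store


# Made-up sample data standing in for parsed export files.
full_export = [
    {"app_id": "1", "title": "Alpha"},
    {"app_id": "2", "title": "Beta"},
]
daily_export = [
    {"app_id": "2", "title": "Beta 2"},  # changed since the full export
    {"app_id": "3", "title": "Gamma"},   # added since the full export
]

store = load_full(full_export)
apply_incremental(store, daily_export)
```

Because each incremental export covers everything added or changed since the last full export, the upsert is always applied on top of the data from the full export it is based on.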

Good luck.
