Specifying individual lines when extracting on import.io

There must be a very simple solution that I am overlooking. I have set up import.io to extract from the Wikipedia page here, but I cannot get each entry in the alphabetical list to come out on a separate line when retrieved. During training it selects everything as a single line, so the result is unusable. Any ideas?





1 answer


Wikipedia is a very complex site, not because of how the data is delivered (HTML, JavaScript or AJAX), but because it is hard to extract from automatically. Wikipedia is free and open to edit, which results in millions of different page structures.

There are several ways to get around this. They are easy to apply, but which one fits depends on your use case. Instead of relying on our standard training mode, you can train the extractor manually by specifying an XPath. For example, if the data is always structured in a table, you can use the XPath: //table This will search the entire page for any tables and extract them. However, it will most likely pick up unwanted tables as well, so you will need to specify which table you want. For example, the table on this page has the class "wikitable". Therefore, we would specify it as: //table[@class="wikitable"]
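To see what those two XPath expressions select, here is a minimal sketch in Python using only the standard library. It is not how import.io works internally; the markup in PAGE is a made-up, simplified stand-in for a Wikipedia page, and extract_rows is a hypothetical helper. ElementTree supports only a subset of XPath, so ".//table" plays the role of "//table" here. The extra "/tr" step is my addition: it returns one result per table row, which is what puts each entry on its own line.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a Wikipedia page.
# (ElementTree needs well-formed XML; real pages usually need an HTML parser.)
PAGE = """
<html><body>
  <table class="infobox"><tr><td>Unwanted sidebar</td></tr></table>
  <table class="wikitable">
    <tr><th>Name</th><th>Country</th></tr>
    <tr><td>Alpha</td><td>A-land</td></tr>
    <tr><td>Beta</td><td>B-land</td></tr>
  </table>
</body></html>
"""


def extract_rows(markup):
    """Return one list of cell texts per data row of the 'wikitable' table."""
    root = ET.fromstring(markup.strip())
    rows = []
    # Only tables whose class attribute is exactly "wikitable" match,
    # so the "infobox" table is ignored.
    for tr in root.findall(".//table[@class='wikitable']/tr"):
        cells = [(td.text or "").strip() for td in tr.findall("td")]
        if cells:  # the header row has only <th> cells, so it is skipped
            rows.append(cells)
    return rows


rows = extract_rows(PAGE)
for row in rows:
    print(" | ".join(row))  # one entry per line
```

Note that @class="wikitable" is an exact string match. If a table carries more than one class, full XPath offers contains(@class, "wikitable"), which ElementTree's subset does not support.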



And then, of course, you will need to make sure that the same XPath applies to all the other striker pages. The data is easy for a human to recognize, but much harder for a computer: the difficult part is finding a common element shared by the data you are looking for and telling the robot that anything with that common element is what it should extract.

Thanks,
Meg


