Await in a nested for...of loop
async traverse(url) {
  const ts = new TournamentScraper()
  const ms = new MatchScraper()
  const results = []
  const tournaments = await ts.run(url)
  for (let href of tournaments.map(t => t.href)) {
    let matches = await ms.run(href)
    let pages = ms.getPages()
    let seasons = ms.getSeasons()
    //console.log(pages)
    //console.log(seasons)
    results.push(matches)
    for (let href of pages) {
      //console.log(href)
      matches = await ms.run(href)
      //console.log(matches)
      results.push(matches)
    }
  }
  return results
}
TournamentScraper returns an array of objects, which usually looks like this:
{name: 'Foo', href: 'www.example.org/tournaments/foo/'}
The link points to the first page of the first season of the tournament. This page contains links to the other seasons and pages (if any).
Running MatchScraper returns some data and sets the dom instance property. getPages() and getSeasons() consume this property, and each returns an array of links.
The problem is that results contains only the first batch of matches. I can see the matches for the 2nd page in the console log, but they are not in the results array when traverse returns.
I found this rule, which advises against using await inside a for loop. The problem is that I have to await ms.run(href) because it sets dom, and getPages() and getSeasons() need dom to be set before they can return the links I want.
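To make that ordering dependency concrete, here is a minimal sketch. FakeMatchScraper and the site object are hypothetical stand-ins, since the real MatchScraper is not shown; the only assumption taken from the description above is that run() sets dom and getPages() reads it.

```javascript
// Hypothetical stand-in for MatchScraper (the real class is not shown).
// run() stores per-page state in `dom`; getPages() reads it afterwards.
class FakeMatchScraper {
  constructor(site) {
    this.site = site; // map of href -> { matches, pages }
    this.dom = null;
  }

  async run(href) {
    // Simulate network latency before the state becomes available.
    await new Promise(resolve => setTimeout(resolve, 10));
    this.dom = this.site[href];
    return this.dom.matches;
  }

  getPages() {
    if (this.dom === null) throw new Error('run() has not completed yet');
    return this.dom.pages;
  }
}

// Made-up data standing in for two scraped pages.
const site = {
  '/foo/': { matches: ['foo-1'], pages: ['/foo/2/'] },
  '/foo/2/': { matches: ['foo-2'], pages: [] },
};

async function demo() {
  const ms = new FakeMatchScraper(site);
  let threw = false;
  const pending = ms.run('/foo/'); // started, but not awaited yet
  try {
    ms.getPages(); // dom is still null at this point
  } catch (e) {
    threw = true;
  }
  await pending; // now dom is set
  const pages = ms.getPages();
  return { threw, pages };
}
```

In this sketch getPages() only becomes usable once the promise returned by run() has settled, which is why the first run() for each tournament has to be awaited before its page links can be read.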
I think this should work. It uses Promise.all instead of for loops.
async function getMatches(href) {
  // One MatchScraper per tournament, so concurrent calls do not overwrite each other's dom.
  const ms = new MatchScraper();
  const matches = await ms.run(href);
  // Read the page links before starting any further run() on this instance.
  const pages = ms.getPages();
  const pageResults = await Promise.all(pages.map(page => ms.run(page)));
  return [matches, ...pageResults];
}
async function traverse(url) {
  const ts = new TournamentScraper();
  const tournaments = await ts.run(url);
  // Scrape all tournaments concurrently; Promise.all keeps the input order.
  const matches = await Promise.all(tournaments.map(t => getMatches(t.href)));
  // Flatten the per-tournament arrays into a single list of results.
  return matches.flat();
}
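If you want to try the Promise.all approach without the real scrapers, a self-contained sketch could look like the following. StubTournamentScraper, StubMatchScraper, SITE and the hrefs are all made up for illustration. The sketch also assumes it is fine to create one scraper instance per tournament, so that concurrent calls do not share a dom.

```javascript
// Hypothetical stubs: the real TournamentScraper and MatchScraper are not shown.
const SITE = {
  '/t/foo/': { matches: ['foo-1'], pages: ['/t/foo/2/', '/t/foo/3/'] },
  '/t/foo/2/': { matches: ['foo-2'], pages: [] },
  '/t/foo/3/': { matches: ['foo-3'], pages: [] },
  '/t/bar/': { matches: ['bar-1'], pages: [] },
};

const delay = millis => new Promise(resolve => setTimeout(resolve, millis));

class StubTournamentScraper {
  async run(url) {
    await delay(5);
    return [
      { name: 'Foo', href: '/t/foo/' },
      { name: 'Bar', href: '/t/bar/' },
    ];
  }
}

class StubMatchScraper {
  async run(href) {
    await delay(Math.random() * 10); // responses arrive in arbitrary order
    this.dom = SITE[href];
    return this.dom.matches;
  }

  getPages() {
    return this.dom.pages;
  }
}

async function getMatches(href) {
  const ms = new StubMatchScraper(); // own instance, own dom
  const first = await ms.run(href);
  const pages = ms.getPages(); // read before any further run()
  const rest = await Promise.all(pages.map(page => ms.run(page)));
  return [first, ...rest];
}

async function traverse(url) {
  const tournaments = await new StubTournamentScraper().run(url);
  const perTournament = await Promise.all(tournaments.map(t => getMatches(t.href)));
  return perTournament.flat(); // one entry per scraped page
}
```

Promise.all resolves to its results in the order of the input array, not in completion order, so the batches come back grouped by tournament and page even though the simulated requests finish at random times.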