Golang web scraper ignoring specific table cells

I am working on a small web scraper just to get a feel for Go. It currently reads a spreadsheet-like table on a wiki page and pulls information out of specific cells. I don't have the code with me (I'm not at home), but it looks something like this:

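    package main

    import (
        "fmt"
        "log"

        "github.com/PuerkitoBio/goquery"
    )

    func main() {
        doc, err := goquery.NewDocument("http://monsterhunter.wikia.com/wiki/MH4:_Item_List")
        if err != nil {
            log.Fatal(err)
        }

        doc.Find("tbody").Each(func(i int, s *goquery.Selection) {
            title := s.Find("td").Text()
            fmt.Print(title) // Print, not Printf: the text is not a format string
        })
    }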

      

The problem is that on this site the first cell of each row holds an image, so the output includes the image source, which I don't want. How can I ignore the first cell in every row of a large table?

1 answer


Let me clear up a few things first. A Selection is a collection of nodes that match some criteria. doc.Find() is really Selection.Find() (a Document embeds a *Selection), which returns a new Selection containing the elements that match the criteria. Selection.Each() iterates over the elements of the collection and calls the function value passed to it once for each of them.

So, in your case, Find("tbody") finds all the tbody elements, and Each() iterates over those tbody elements and calls your anonymous function for each one.

Inside your anonymous function, s is a Selection holding a single tbody element. You call s.Find("td"), which returns a new Selection containing all the td elements of the current table. So when you call Text() on it, you get the combined text content of every td element, including their descendants. This is not what you want.

What you need to do is call another Each() on the Selection returned by s.Find("td"), and check whether the Selection passed to the second anonymous function has an img child.



Sample code:

    doc.Find("tbody").Each(func(i int, s *goquery.Selection) {
        // s here is a tbody element
        s.Find("td").Each(func(j int, s2 *goquery.Selection) {
            // s2 here is a td element
            // Find() never returns nil, so checking Length() is enough
            if s2.Find("img").Length() > 0 {
                return // This td contains at least one img, skip it
            }
            fmt.Println(s2.Text())
        })
    })

      

Alternatively, you can search for the tr elements and skip the first td child of each row by checking whether the index passed to the third anonymous function is 0 (the first child), something like this:

    doc.Find("tbody").Each(func(i int, s *goquery.Selection) {
        // s here is a tbody element
        s.Find("tr").Each(func(j int, s2 *goquery.Selection) {
            // s2 here is a tr element
            s2.Find("td").Each(func(k int, s3 *goquery.Selection) {
                // s3 here is a td element
                if k == 0 {
                    return // This is the first td in the row, skip it
                }
                fmt.Println(s3.Text())
            })
        })
    })

      
