Goquery - combine a tag with the one that follows
For some background: I'm new to Go (3 or 4 days in), but I'm starting to get more comfortable with it.
I am trying to use goquery to parse a web page. (Eventually I want to put some of the data into a database.) For my problem, an example is the simplest way to explain it:
<html>
<body>
<h1>
<span class="text">Go </span>
</h1>
<p>
<span class="text">totally </span>
<span class="post">kicks </span>
</p>
<p>
<span class="text">hacks </span>
<span class="post">its </span>
</p>
<h1>
<span class="text">debugger </span>
</h1>
<p>
<span class="text">should </span>
<span class="post">be </span>
</p>
<p>
<span class="text">called </span>
<span class="post">ogle </span>
</p>
<h3>
<span class="statement">true</span>
</h3>
</body>
</html>
I would like to:
- Extract the content of <h1..."text".
- Insert (and merge) this extracted content into the content of <p..."text".
- Do this only for the <p> tag that immediately follows the <h1> tag.
- Do this for all <h1> tags on the page.
So, I want it to look like this:
<html>
<body>
<p>
<span class="text">Go totally </span>
<span class="post">kicks </span>
</p>
<p>
<span class="text">hacks </span>
<span class="post">its </span>
</p>
<p>
<span class="text">debugger should </span>
<span class="post">be </span>
</p>
<p>
<span class="text">called </span>
<span class="post">ogle</span>
</p>
<h3>
<span class="statement">true</span>
</h3>
</body>
</html>
Given code that starts like this:
package main
import (
"fmt"
"strings"
"github.com/PuerkitoBio/goquery"
)
func main() {
html_code := strings.NewReader(`code_example_above`)
doc, _ := goquery.NewDocumentFromReader(html_code)
I know that I can read <h1..."text" with:
h1_tag := doc.Find("h1 .text")
I also know that I can add the content of <h1..."text" to the content of <p..."text" like this:
doc.Find("p .text").Before("h1 .text")
^ But this command inserts the content of every instance of <h1..."text" before every instance of <p..."text".
Then I found a way to get a step closer to what I want:
doc.Find("p .text").First().Before("h1 .text")
^ This command inserts the content of every instance of <h1..."text" only before the first instance of <p..."text" (which is closer to what I want).
I also tried using goquery's Each() function, but I couldn't get close to what I wanted with it (although I'm sure there is a way to do it using Each(), right?)
My biggest problem is that I can't figure out how to tie each instance of <h1..."text" to the instance of <p..."text" that immediately follows it.
If it helps, <h1..."text" is always followed by <p..."text" on the web pages that I am trying to parse.
My brain is out of juice. Are there any Go geniuses who know how to do this and are willing to explain it? Thanks in advance.
EDIT
I found out something else that I can do:
doc.Find("h1").Each(func(i int, s *goquery.Selection) {
nex := s.Next().Text()
fmt.Println(s.Text(), nex, "\n\n")
})
^ This prints out what I want: the content of each instance of <h1..."text" followed by the <p..."text" instance that comes immediately after it. I thought s.Next() would return the next instance of <h1>, but it returns the next tag in doc, relative to the *goquery.Selection it is iterating through. Is that right?
Or, as mattn pointed out, I could also use doc.Find("h1+p").
I am still having a hard time adding <h1..."text" to <p..."text". I'll post that as another question, because this can be broken down into multiple questions and mattn has already answered one of them.