Get all content in a tag using BeautifulSoup
I am trying to get all the content inside an article tag, for example on http://magazine.magix.com/de/5-tipps-fuer-die-fotobearbeitung/
However, when I use
print(soup.article)
the output stops at "... Foto auf verschiedene Art und Weise und für verschiedene Zwecke bearbeiten".
Whole code:
from bs4 import BeautifulSoup
import requests
# Download the page
request_page = requests.get('http://magazine.magix.com/de/5-tipps-fuer-die-fotobearbeitung/')
source = request_page.text
# Parse the HTML and print the text of the first <article> tag
soup = BeautifulSoup(source, "html.parser")
print(soup.article.text)
How can I get everything?
Ok, finally found it. Welcome to the wonderful world of scraping.
There are </br> tags inside the <article> tag; the author almost certainly meant <br/>.
Either way, the stray closing tag breaks the HTML structure when BeautifulSoup tries to parse it, so the article gets cut off there.
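To check whether a page has this problem before parsing it, one option is to scan the raw HTML for closing </br> tags. This is only a minimal sketch using the standard library; the helper name find_stray_br and the sample string are made up for illustration and are not taken from the real page:

```python
import re

# Matches a closing br tag such as </br>, </BR> or </br >.
STRAY_BR = re.compile(r"</br\s*>", re.IGNORECASE)

def find_stray_br(html, context=20):
    """Return (offset, surrounding text) for each closing </br> tag in html."""
    hits = []
    for match in STRAY_BR.finditer(html):
        start = max(match.start() - context, 0)
        end = min(match.end() + context, len(html))
        hits.append((match.start(), html[start:end]))
    return hits

# Hypothetical sample, not the real page.
sample = "<article><p>first part</br>second part</p></article>"
for offset, snippet in find_stray_br(sample):
    print(offset, repr(snippet))
```

An empty result means the markup has no closing </br> tags, so a truncated article would have some other cause.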
This is how I solved it:
from bs4 import BeautifulSoup
import requests
# Download the page
request_page = requests.get('http://magazine.magix.com/de/5-tipps-fuer-die-fotobearbeitung/')
source = request_page.text
# Turn the invalid </br> closing tags into valid <br/> tags before parsing
source = source.replace('</br>', '<br/>')
soup = BeautifulSoup(source, "html.parser")
print(soup.article)
(I replaced </br> with <br/>...)
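The plain source.replace call above only catches the exact spelling </br>. A slightly more tolerant variant, offered as a sketch rather than a definitive fix, uses a regular expression that also covers </BR> and </br >. The helper normalize_br is a hypothetical name, not part of BeautifulSoup or requests, and the sample string is invented:

```python
import re

# Closing br tags in any letter case, with optional whitespace before ">".
CLOSING_BR = re.compile(r"</br\s*>", re.IGNORECASE)

def normalize_br(html):
    """Rewrite each closing </br> tag as a self-closing <br/>."""
    return CLOSING_BR.sub("<br/>", html)

# Hypothetical sample, not the real page.
sample = "<article><p>one</br>two</BR >three<br/>four</p></article>"
cleaned = normalize_br(sample)
print(cleaned)
```

The cleaned string can then be handed to BeautifulSoup(cleaned, "html.parser") in the same way as source in the code above.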
This is a good example: malformed markup like this is everywhere, count on it :)