Serialize nice soup and xpath tree in python

First, there is a python script to get the xpath tree and nice soup given by url.

# get tree
def get_tree(url):
    r = requests.get(url)
    tree = html.fromstring(r.content)
    return tree

# get soup
def get_soup(url):
    r = requests.get(url)
    data = r.text
    soup = BeautifulSoup(data)
    return soup

      

We often need wood and soup to navigate to the source page and extract the useful information we need. But we can often have a mistake and then fix it, or we change our mind about what information we really need. So in this sense, serializing the soup and the xpath tree might be a good idea, and we can only do it once. But how can we serialize nice soup and xpath tree in python? Is there any kind of database for storing soup or wood? If not, any example code to serialize them manually? Thanks to

+3


source to share


1 answer


What I understand from your question is that you want to keep the soup variable so that you don't have to re-request the url while debugging. It looks like you don't know the python pickle module that can serialize any object. It is not without problems, but for your debugging needs, it can help you and it is very simple.

import pickle
pickle.dump(soup, open("soup.pickle","w"))
# then when you need to load the soup again
soup = pickle.load(open("soup.pickle","r"))

      



And now you have salty soup!: D

+3


source







All Articles