Parsing HTML from a local file

I am using Google App Engine with Python. I want to parse an HTML file from the same project as my Python script into a tree. I've tried many things, for example an absolute URL (http://localhost:8080/nl/home.html) and a relative URL (/nl/home.html). Neither seems to work. I am using this code:

class HomePage(webapp2.RequestHandler):    
    def get(self):

        path = self.request.path

        htmlfile = etree.parse(path)
        template = jinja_environment.get_template('/nl/template.html')

        pagetitle = htmlfile.find(".//title").text
        body = htmlfile.get_element_by_id("body").toString()

      

It returns the following error: IOError: Error reading file '/nl/home.html': Could not load external entity "/nl/home.html"

Does anyone know how to get an HTML file tree from the same project with Python?

EDIT

This is the working code:

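import logging
import urllib

import webapp2
from lxml import html

class HomePage(webapp2.RequestHandler):
    def get(self):
        # Drop the leading "/" so the path is relative to the application directory.
        path = self.request.path.replace("/", "", 1)
        logging.info(path)

        # Python 2: urllib.urlopen() also opens plain local file paths.
        htmlfile = html.fromstring(urllib.urlopen(path).read())
        template = jinja_environment.get_template('/nl/template.html')

        pagetitle = htmlfile.find(".//title").text
        body = innerHTML(htmlfile.get_element_by_id("body"))

def innerHTML(node):
    # Concatenate the serialized child elements of node.
    buildString = ''
    for child in node:
        buildString += html.tostring(child)
    return buildString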
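
For readers without lxml at hand, the title lookup and the innerHTML idea can be sketched with the standard library's xml.etree.ElementTree instead. This is a swapped-in technique, not the code from the question: it assumes Python 3 and well-formed markup, and the sample document and the name inner_markup are made up for illustration.

```python
import xml.etree.ElementTree as ET


def inner_markup(node):
    """Concatenate the serialized children of node, similar to innerHTML above."""
    # node.text holds any text that precedes the first child element.
    parts = [node.text or ""]
    for child in node:
        # tostring() also emits the child's tail text (text after its end tag).
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


# Hypothetical sample page; ElementTree only accepts well-formed markup.
doc = ET.fromstring(
    '<html><head><title>Home</title></head>'
    '<body><div id="body"><p>Hello</p><p>World</p></div></body></html>'
)
page_title = doc.find(".//title").text
body_node = doc.find(".//*[@id='body']")
body_markup = inner_markup(body_node)
```

Real-world HTML is often not well-formed, which is one reason to stay with lxml.html as in the working code above.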

      

+3




3 answers


Your working directory is the base of your application directory. Therefore, if your application is organized like this:

  • app.yaml
  • nl/
    • home.html


Then you can read your file as nl/home.html

(assuming you haven't changed your working directory).
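
A minimal sketch of that idea, assuming the working directory is still the application root. The helper name read_app_file is hypothetical; it turns the request path into a relative path and reads the file from disk.

```python
def read_app_file(request_path):
    """Read a file addressed by a URL-style path, relative to the working directory."""
    # '/nl/home.html' would be treated as an absolute filesystem path;
    # dropping the leading slash makes it resolve against the working directory.
    relative_path = request_path.lstrip("/")
    with open(relative_path, "rb") as handle:
        return handle.read()
```

Inside the handler this would be called as read_app_file(self.request.path), and the returned bytes can then be handed to the HTML parser.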

+2




This looks like a permissions issue; make sure your Python script can access the file. Does it work if you make the file readable by everyone?



0




I believe the error is in your file path. You are assuming that your application directory is the root directory of the server, which is not necessarily the case. I couldn't find any documentation on where the files end up, so this is what I do (it works on the dev server; I haven't tried it anywhere else yet):

I am assuming that Google preserves the relative locations of the files in my application. So if I know the location of one file, I can work out the location of the rest. Fortunately, Python lets you locate the current source file programmatically, like this:

import os

def get_src_dir():
    # Directory containing this source file, with symlinks resolved.
    return os.path.dirname(os.path.realpath(__file__))

      

get_src_dir() returns the directory that contains the source file.

os.path.join(get_src_dir(), rel_path_to_asset)

      

will now give you the path to your asset. rel_path_to_asset is the path to the asset relative to the source file that the get_src_dir() function is in ...
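
The two pieces above can be combined into one helper. This is only a sketch: asset_path is a hypothetical name, and it takes the source file as an argument (normally __file__) so that it can be tried outside a module as well.

```python
import os


def asset_path(source_file, rel_path_to_asset):
    """Build an absolute path to an asset stored relative to a source file."""
    # Directory that contains the given source file, with symlinks resolved.
    src_dir = os.path.dirname(os.path.realpath(source_file))
    return os.path.join(src_dir, rel_path_to_asset)


# Typical call from inside a module: asset_path(__file__, "nl/home.html")
```

Because the result is anchored to the source file rather than to the working directory, it keeps working even if something changes the working directory at runtime.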

0








