Scraping a web page based on fonts and font sizes

Scraping HTML text can be done with various libraries available on the Internet. I am trying to extract only the largest title (the main heading) from a number of different HTML pages.

I want to auto-detect the main title element on several hundred pages (a page could be a product page, an article page, etc.). It would be great if my parsing solution could use the font and font size of the text on the page. Since the main title is almost always the text with the largest font size on a web page, this would tell me a lot about where to find the title.

So the question is: is there a way to accomplish this?



1 answer


I suppose you could do it like this, but it is a very resource-intensive task because you loop through every HTML element in the body and read its computed font size.



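var text = "",
    size = 0;

// Visit every element and remember the text of the one with the largest font size.
$("body, body *").each(function() {
    // .css("fontSize") returns the computed size as a pixel string, e.g. "32px"
    var f_size = parseFloat($(this).css("fontSize"));
    var el_text = $.trim($(this).text());
    // Skip empty elements; only a strictly larger size replaces the current best
    if (el_text && size < f_size) {
        text = el_text;
        size = f_size;
    }
    console.log(this.tagName + " " + f_size);
});

// Largest font size found and the text of the first element that uses it
console.log(size + "px: " + text);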
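
As a possible refinement, here is a minimal sketch without jQuery that only considers elements directly containing text and skips script and style elements, which may cut down the number of candidates. The names pickLargestText and collectCandidates are hypothetical helpers introduced for illustration; they are not part of any library. The selection logic is kept separate from the DOM access so it can be tried on plain data.

```javascript
// Hypothetical helper: pick the entry with the largest font size from a list
// of { text, fontSize } records, where fontSize is a CSS pixel string.
function pickLargestText(candidates) {
    var best = { text: "", size: 0 };
    candidates.forEach(function (c) {
        var size = parseFloat(c.fontSize);   // "32px" -> 32, NaN if unparsable
        var text = (c.text || "").trim();
        if (text && size > best.size) {      // NaN > x is false, so bad values are skipped
            best = { text: text, size: size };
        }
    });
    return best;
}

// Hypothetical helper (browser only): collect records for elements that
// directly contain non-empty text nodes.
function collectCandidates(root) {
    var records = [];
    root.querySelectorAll("*").forEach(function (el) {
        if (el.tagName === "SCRIPT" || el.tagName === "STYLE") { return; }
        var hasOwnText = false;
        el.childNodes.forEach(function (node) {
            // nodeType 3 is a text node
            if (node.nodeType === 3 && node.nodeValue.trim()) { hasOwnText = true; }
        });
        if (hasOwnText) {
            records.push({
                text: el.textContent,
                fontSize: window.getComputedStyle(el).fontSize
            });
        }
    });
    return records;
}

// Only touch the DOM when one is available (for example in a browser console).
if (typeof document !== "undefined") {
    console.log(pickLargestText(collectCandidates(document.body)));
}
```

Ties go to the first element found in document order, and hidden elements are not filtered out, so the result is a heuristic guess rather than a guaranteed title.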

      
