An alternative to HTML parsing with regex

Question


I am parsing HTML with regex in node.js to return a string. However, I was told in this post that this is not a good idea: Pull a specific string from an HTTP request in node.js

What are the more stable alternatives?

I am new to programming, so links to tutorials would be very helpful. Some documentation explanations are hard for me to understand.


javascript node.js regex parsing

mnort9 Apr 07 12 at 22:19


1 answer

josh3736 · Accepted Answer · 2012-04-07T23:09:18+0000

node-htmlparser does all the heavy lifting of HTML parsing, and node-soupselect lets you use CSS-style selectors to find the elements you are looking for.

However, I looked at your other question, and the question you really should be asking is not "how do I scrape this data from the HTML page?" but rather "is there a better way to get the data I'm looking for?" The USGS has APIs that provide its data in a machine-readable form.
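As a rough sketch of the "load this file" step, assuming Node.js 18 or later (where a global fetch is built in), the helper below downloads a URL and parses the response body as JSON. The name loadJson is hypothetical, and the actual USGS URL for your site is not reproduced here; pass in whatever link the service gives you.

```javascript
// Minimal sketch: download a URL and parse the response body as JSON.
// Assumes Node.js 18+, which provides a global fetch().
// loadJson is a hypothetical helper name, not part of any USGS library.
async function loadJson(url) {
    const response = await fetch(url);
    if (!response.ok) {
        // Fail loudly on HTTP errors instead of trying to parse an error page.
        throw new Error('Request failed with status ' + response.status);
    }
    // response.json() reads the whole body and parses it as JSON.
    return response.json();
}
```

Usage would look like loadJson(url).then(function (d) { /* inspect d.value.timeSeries */ }). Compared with a regex over HTML, nothing here depends on the page's markup: if the service returns valid JSON, the structure is already there.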

Here's a JSON object for the location you're interested in. To get the "most recent instantaneous" value for the elevation of the reservoir water surface, load this file, parse it with JSON.parse into a variable d (var d = JSON.parse(...)), and run:

var result;
for (var i = 0; i < d.value.timeSeries.length; i++) {
    var series = d.value.timeSeries[i];
    if (series.variable.variableName === 'Elevation of reservoir water surface above datum, ft') {
        var values = series.values[0].value;   // all readings for this variable
        result = values[values.length - 1];    // the last entry is the most recent reading
    }
}

result will now look like { dateTime: "2012-04-07T17:15:00.000-05:00", value: "1065.91" }.
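The same lookup can be wrapped in a small reusable function. This is a sketch, assuming the parsed response has the shape the loop above walks (d.value.timeSeries[i].variable.variableName and .values[0].value); latestReading is a hypothetical name. Unlike the loop, it stops at the first matching series and returns null when the variable is missing, rather than leaving result undefined.

```javascript
// Minimal sketch: return the most recent reading for one variable from a
// parsed USGS-style JSON response, or null if the variable is not present.
// Assumes the response shape used by the loop above; latestReading is hypothetical.
function latestReading(d, variableName) {
    for (const series of d.value.timeSeries) {
        if (series.variable.variableName === variableName) {
            const readings = series.values[0].value;
            // As in the loop above, the last entry is taken as the most recent one.
            return readings.length > 0 ? readings[readings.length - 1] : null;
        }
    }
    return null; // no series with that variable name
}
```

For example, latestReading(d, 'Elevation of reservoir water surface above datum, ft') returns an object with dateTime and value fields like the one shown above. Note that value comes back as a string, so use parseFloat if you need to do arithmetic with it.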
