Scraping a price from a web page

I am trying to scrape a price from a web page using PHP and regexes. The price will be in the format £ 123.12 or $ 123.12 (i.e. pounds or dollars).

I am downloading the content using libcurl, and the output then goes into preg_match_all. So it looks something like this:

$contents = curl_exec($curl);

preg_match_all('/(?:\$|£)[0-9]+(?:\.[0-9]{2})?/', $contents, $matches);

      

So far so simple. The problem is that PHP doesn't match anything at all, even though there are prices on the page. I've narrowed it down to a problem with the "£" symbol: PHP doesn't seem to like it.

I think it might be an encoding issue. But whatever I do, I can't get PHP to match this! Does anyone have any idea?

(Edit: It should be noted that if I try the same regex and page content in a regex test tool, it works fine.)



3 answers


Have you tried using \ before £?

preg_match_all('/(\$|\£)[0-9]+(\.[0-9]{2})/', $contents, $matches);

      



I tried this expression in .NET with \£ and it works. I just edited it and removed some ":".

http://clip2net.com/clip/m12122/1227972904-clip-9kb.png

Read my comment on the possibility that Curl will give you bad encoding (comment on this post).
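To make the encoding idea concrete, here is a minimal sketch, assuming the page arrives as ISO-8859-1 while the pattern is treated as UTF-8. The helper name find_prices and the $charset parameter are hypothetical; in practice the charset would have to be read from the Content-Type header or a meta tag. It relies on the mbstring extension being available.

```php
<?php
// Hypothetical helper (not from the original post): find prices in a page
// whose character set is known. $charset is an assumption supplied by the caller.
function find_prices(string $contents, string $charset): array
{
    // Normalise to UTF-8 so the pound sign is always the code point U+00A3.
    $utf8 = mb_convert_encoding($contents, 'UTF-8', $charset);

    // \x{A3} is the pound sign; the u modifier makes PCRE treat both the
    // pattern and the subject as UTF-8 instead of raw bytes.
    preg_match_all('/(?:\$|\x{A3})[0-9]+(?:\.[0-9]{2})?/u', $utf8, $matches);

    return $matches[0];
}

// "\xA3" is the single byte ISO-8859-1 uses for the pound sign.
$latin1Page = "Was \xA3123.12, now \$99";
$prices = find_prices($latin1Page, 'ISO-8859-1');
```

Writing the pound sign as \x{A3} also keeps the pattern independent of the encoding the PHP file itself is saved in.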



Maybe the pound sign has been replaced by an HTML entity? I think you should try your regex with some kind of testing program (i.e. match it against fixed text locally).



I would change the regex like this: '/(?:\$|£)\d+(?:\.\d{2})?/'
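If the page really does use an entity for the pound sign, one possible approach is to decode entities before matching. This is only a sketch: the sample markup below is made up for illustration, and it assumes the decoded text should be UTF-8.

```php
<?php
// Made-up sample markup: the pound sign appears as a named and as a numeric entity.
$html = '<span class="price">&pound;123.12</span> <span>&#163;5</span> <span>$40.00</span>';

// Turn entities such as &pound; and &#163; into real characters (UTF-8 output).
$decoded = html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// \x{A3} is the pound sign; the u modifier makes PCRE read the subject as UTF-8.
preg_match_all('/(?:\$|\x{A3})\d+(?:\.\d{2})?/u', $decoded, $entityMatches);

// $entityMatches[0] now holds the three prices found in the decoded text.
```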



This should work for simple values.

'#(?:\$|\£|\€)(\d+(?:\.\d+)?)#'

      

This will not work with thousands separators, as in 234.343 or 34.454.45.
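A rough sketch of one way to tolerate thousands separators, under some loud assumptions: groups are exactly three digits separated by "." or ",", and a trailing separator followed by exactly two digits is the decimal part. A value like 1.234 is inherently ambiguous (it is read here as a thousands group), so treat this as a starting point only.

```php
<?php
// Assumed rule: 1-3 digits, then any number of [.,]ddd groups, then an
// optional [.,]dd decimal part. (?!\d) stops a match from ending mid-number.
$pattern = '/(?:\$|\x{A3}|\x{20AC})(?:\d{1,3}(?:[.,]\d{3})*|\d+)(?:[.,]\d{2})?(?!\d)/u';

// "\u{A3}" is the pound sign and "\u{20AC}" the euro sign (PHP 7+ string escapes).
$text = "\u{A3}34.454.45 and \$1,234.56 and \u{20AC}234.343 and \$7";

preg_match_all($pattern, $text, $separatorMatches);

// $separatorMatches[0] holds the four prices, each with its currency symbol.
```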
