How can I extract numeric data from a text file?
I want a Perl script that reads a text file and saves a modified copy as another text file. Each line of the input contains a URL for a jpg, for example "http://pics1.riyaj.com/thumbs/000/082/104/small.jpg". I want the script to extract the last 6 digits of each jpg URL (for example 082104) into a variable. That variable should then be inserted into each line of the new text, wherever the URL appeared in that line.
Input text:
text http://pics1.riyaj.com/thumbs/000/082/104/small.jpg text
text http://pics1.riyaj.com/thumbs/000/569/315/small.jpg text
Output text:
text php?id=82104 text
text php?id=569315 text
Thanks.
What have you tried so far?
Here's a short program that handles the core of the problem; you can build the rest around it:
while (<>) { s|http://.*/\d+/(\d+)/(\d+).*?jpg|php?id=$1$2|; print; }
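This is very close to a command-line program that handles the looping and printing for you with the -p switch (see perlrun for details):
perl -pe 's|http://.*/\d+/(\d+)/(\d+).*?jpg|php?id=$1$2|' inputfile > outputfile
To edit the file in place instead, add -i with a backup suffix and drop the redirection. With -i the result is written back to inputfile (the original is kept as inputfile.old), so redirecting standard output would only produce an empty file:
perl -pi.old -e 's|http://.*/\d+/(\d+)/(\d+).*?jpg|php?id=$1$2|' inputfile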
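To see what that substitution does without touching any files, here is a minimal sketch that applies it to the two sample lines from the question. The helper name rewrite_line is made up for illustration; the regex itself is the one from the one-liner.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the one-liner's substitution to a single line.
sub rewrite_line {
    my ($line) = @_;
    # The greedy .* backs off until three slash-separated digit groups remain;
    # the last two groups are captured as $1 and $2.
    $line =~ s|http://.*/\d+/(\d+)/(\d+).*?jpg|php?id=$1$2|;
    return $line;
}

for my $line (
    'text http://pics1.riyaj.com/thumbs/000/082/104/small.jpg text',
    'text http://pics1.riyaj.com/thumbs/000/569/315/small.jpg text',
) {
    print rewrite_line($line), "\n";
}
# prints:
# text php?id=082104 text
# text php?id=569315 text
```

Note that the leading zero of 082 is kept, because $1 and $2 are interpolated as strings.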
I wasn't sure whether to answer according to what you described ("last 6 digits") or to assume that every line matches the pattern you showed, so I decided to answer both ways.
Here is a method that can handle strings that are more varied than your examples.
use strict;
use warnings;
use FileHandle;

# Input and output file names are taken from the command line.
my ( $file_in, $file_out ) = @ARGV;

my $jpeg_RE = qr{
    (.*?)            # Anything, watching out for patterns ahead
    \s+              # At least one space
    (?> http:// )    # Once we match "http://" we're onto the next section
    \S*?             # Any non-space, watching out for what follows
    ( (?: \d+ / )*   # Digits followed by a slash, any number of times
      \d+            # another group of digits
    )                # end group
    \D*?             # Any number of non-digits looking ahead
    \.jpg            # literal string '.jpg'
    \s+              # At least one space
    (.*)             # The rest of the line
}x;

my $infile  = FileHandle->new( $file_in,  'r' ) or die "Cannot read $file_in: $!";
my $outfile = FileHandle->new( $file_out, 'w' ) or die "Cannot write $file_out: $!";
while ( my $line = <$infile> ) {
    if ( my ( $pre_text, $digits, $post_text ) = ( $line =~ m/$jpeg_RE/ ) ) {
        $digits =~ s/\D//g;    # drop the slashes, keep only the digits
        $line = "$pre_text php?id=" . substr( $digits, -6 ) . " $post_text\n";
    }
    $outfile->print( $line );  # lines without a matching URL pass through unchanged
}
$infile->close();
$outfile->close();
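The key step in that loop is collecting the digit groups in front of ".jpg", stripping the slashes, and keeping the last six characters. A minimal sketch of just that step, using a hypothetical helper last_six_digits and a second, invented example.com URL to show a less regular path:

```perl
use strict;
use warnings;

# Hypothetical helper: return the last six digits from the run of
# slash-separated digit groups that precedes ".jpg".
# Returns nothing if the URL contains no such run.
sub last_six_digits {
    my ($url) = @_;
    my ($digits) = $url =~ m{ ( (?: \d+ / )* \d+ ) \D*? \.jpg }x
        or return;
    $digits =~ s/\D//g;              # "000/082/104" becomes "000082104"
    return substr( $digits, -6 );    # a negative offset counts from the end
}

print last_six_digits('http://pics1.riyaj.com/thumbs/000/082/104/small.jpg'), "\n";  # 082104
print last_six_digits('http://example.com/img/7/12/345/678/pic.jpg'), "\n";          # 345678
```

The "1" in "pics1" is not picked up, because the digits must be followed only by non-digits up to ".jpg".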
However, if the input is as regular as you show, it becomes much easier:
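use strict;
use warnings;
use FileHandle;

# Input and output file names are taken from the command line.
my ( $file_in, $file_out ) = @ARGV;

my $jpeg_RE = qr{
    (?> \Qhttp://pics1.riyaj.com/thumbs/\E )    # fixed URL prefix, quoted literally
    \d{3}        # first group of three digits (ignored)
    /
    ( \d{3} )    # second group, captured as $1
    /
    ( \d{3} )    # third group, captured as $2
    \S*?         # rest of the path up to the extension
    \.jpg        # literal string '.jpg'
}x;

my $infile  = FileHandle->new( $file_in,  'r' ) or die "Cannot read $file_in: $!";
my $outfile = FileHandle->new( $file_out, 'w' ) or die "Cannot write $file_out: $!";
while ( my $line = <$infile> ) {
    $line =~ s/$jpeg_RE/php?id=$1$2/g;    # replace every matching URL on the line
    $outfile->print( $line );
}
$infile->close();
$outfile->close();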
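One detail worth checking: the sample output in the question shows php?id=82104 for the first line, without the leading zero, while the substitution above produces 082104. If the zero really should be dropped (this is an assumption; it may just be a typo in the question), one option is to evaluate the replacement with the /e modifier and force the digits into a number. A minimal sketch, with the hypothetical helper name rewrite_line_numeric and the URL prefix escaped by hand:

```perl
use strict;
use warnings;

my $jpeg_RE = qr{
    http://pics1\.riyaj\.com/thumbs/
    \d{3} / ( \d{3} ) / ( \d{3} )
    \S*? \.jpg
}x;

# Hypothetical helper: like the loop body above, but the id is numeric.
sub rewrite_line_numeric {
    my ($line) = @_;
    # /e runs the replacement as Perl code; adding 0 turns "082104" into 82104.
    $line =~ s/$jpeg_RE/'php?id=' . ("$1$2" + 0)/ge;
    return $line;
}

print rewrite_line_numeric('text http://pics1.riyaj.com/thumbs/000/082/104/small.jpg text'), "\n";
print rewrite_line_numeric('text http://pics1.riyaj.com/thumbs/000/569/315/small.jpg text'), "\n";
# prints:
# text php?id=82104 text
# text php?id=569315 text
```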