Comparing rowset and file using Perl

Question

I am trying to write a Perl script that detects the difference between a set of lines and a file, and prints the lines of the file that do not match any line in the set.

My INPUT1 (the rowset) will look like this. These are user IDs that should be passed to the script:

AAAAA
BBBBB
CCCCC
DDDDD
EEEEE

My INPUT2 will be a User.txt file with multiple IDs, including the ones above.

ABBAAA
ACARVAV
AAAAA
BBBBB
CCCCC
DDDDD
EEEEE
BGATA
ETYUIOL

I want my output to look like this:

ABBAAA
ACARVAV
BGATA
ETYUIOL

So far I have this:

my @things_to_find = qw(AAAAAA BBBBB CCCCC DDDDD EEEEE);
my $comparefile = "User.txt";
open ( my $compare_filehandle, "<", $comparefile ) or die $!;
while ( my $line = <$compare_filehandle> ) 
{
    foreach my $thing ( @things_to_find )
    {
        print "Match found with: $line" if $line !~ /$thing/;
    }
}

But this does not give the desired result. I am very new to Perl so any suggestions from you will be very helpful to me.

arrays regex perl

Divya ramachandran Aug 14 '14 at 11:30

4 answers

Toto · Answer 1 · 2014-08-14T11:41:48+0000

Try:

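use strict;
use warnings;
use List::Util qw(none);   # none() requires List::Util 1.33 or later

my @things_to_find = qw(AAAAA BBBBB CCCCC DDDDD EEEEE);
my $comparefile = "User.txt";
open ( my $compare_filehandle, "<", $comparefile ) or die $!;
while ( my $line = <$compare_filehandle> ) 
{
    # print the line only if none of the IDs occurs in it as a whole word
    print $line if none { $line =~ /\b\Q$_\E\b/ } @things_to_find;
}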

Documentation: List::Util

Kalanidhi · Answer 2 · 2014-08-14T11:41:04+0000

You can try this simple one, which uses grep to test for a matching entry.

use strict;
use warnings;
use autodie;

my @users = qw(AAAAA BBBBB CCCCC DDDDD EEEEE);

my $file = "User.txt";
open my $fh, "<", $file;
while ( my $line = <$fh> ) {
    chomp $line;
    # print only the lines that equal none of the listed IDs
    print "$line\n" unless grep {$line eq $_} @users;
}

Note: grep and map are better suited than foreach or for when searching a list.

Borodin · Answer 3 · 2014-08-14T12:10:52+0000

As it stands, your code prints a line once for every entry in the list that it does not contain, when it should print a line only if it equals none of the entries. You need to change the containment test to an equality test, skip the line as soon as a match is found, and use chomp to remove the trailing newline from lines read from the file.
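Applied directly to the loop from the question, those three changes might look like the sketch below. The helper name lines_not_in_list and the in-memory stand-in for User.txt are assumptions made here so the example runs on its own; they are not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines of $fh that equal none of @$wanted.
sub lines_not_in_list {
    my ($fh, $wanted) = @_;
    my @kept;
    LINE: while ( my $line = <$fh> ) {
        chomp $line;                       # drop the trailing newline
        foreach my $thing (@$wanted) {
            next LINE if $line eq $thing;  # equality test; skip at the first match
        }
        push @kept, $line;                 # reached only when nothing matched
    }
    return @kept;
}

# In-memory stand-in for User.txt (an assumption for this sketch).
my $sample = join "\n", qw(ABBAAA ACARVAV AAAAA BBBBB CCCCC DDDDD EEEEE BGATA ETYUIOL);
open my $fh, '<', \$sample or die $!;

print "$_\n" for lines_not_in_list( $fh, [qw(AAAAA BBBBB CCCCC DDDDD EEEEE)] );
```

With the sample data this prints ABBAAA, ACARVAV, BGATA and ETYUIOL, one per line and in file order.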

There are two obvious ways to write this. The first is to build a hash, which is essentially an array indexed by strings instead of integers. Fill the hash using the entries from the file, then delete the ones that appear in the list of strings; whatever is left is the result. It will look like this:

use strict;
use warnings;

my $comparefile = 'User.txt';
my @users = qw/ AAAAA BBBBB CCCCC DDDDD EEEEE /;

open my $users_fh, '<', $comparefile or die $!;

my %file_users;
while (my $user = <$users_fh> ) {
  chomp $user;
  $file_users{$user} = 1;
}

delete $file_users{$_} for @users;

print "$_\n" for sort keys %file_users;

Output

ABBAAA
ACARVAV
BGATA
ETYUIOL
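Because the program above stores the file entries as hash keys, its output is sorted and repeated lines in the file collapse into one. If file order matters, one possible variant (not from the original answer) is to put the list of IDs into the hash instead and test each line against it. The helper name filter_with_lookup and the sample data are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: keeps the lines of $fh that are not in @exclude.
sub filter_with_lookup {
    my ($fh, @exclude) = @_;
    my %is_excluded = map { $_ => 1 } @exclude;   # IDs become hash keys
    my @kept;
    while ( my $line = <$fh> ) {
        chomp $line;
        push @kept, $line unless $is_excluded{$line};   # one hash lookup per line
    }
    return @kept;
}

# In-memory sample, deliberately unsorted and with a repeated line.
my $sample = join "\n", qw(ETYUIOL AAAAA ABBAAA BBBBB ETYUIOL);
open my $fh, '<', \$sample or die $!;

print "$_\n" for filter_with_lookup( $fh, qw(AAAAA BBBBB) );
```

Here the surviving lines come out as ETYUIOL, ABBAAA, ETYUIOL: file order is kept and the repeated line appears twice.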

Another way is to build a regular expression from the strings and use it to pick out the lines of the file to ignore. It looks like the code below, and the result is identical to that of the previous program. This solution will be faster, but it involves some more advanced ideas like regular expressions and map, so you might prefer the former.

use strict;
use warnings;

my $comparefile = 'User.txt';
my @users = qw/ AAAAA BBBBB CCCCC DDDDD EEEEE /;

my $re = join '|', map "^\Q$_\E\$", @users;
$re = qr/$re/;

open my $users_fh, '<', $comparefile or die $!;

my @file_users;
while (my $user = <$users_fh> ) {
  chomp $user;
  push @file_users, $user unless $user =~ $re;
}

print "$_\n" for sort @file_users;
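To see what the join and map line produces, the sketch below returns the pattern as a plain string before it is compiled. The helper name build_pattern and the ID A.B are made up for illustration; A.B is included only to show that \Q...\E escapes regex metacharacters.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the alternation string before compilation.
sub build_pattern {
    my @ids = @_;
    # \Q...\E escapes regex metacharacters in each ID; ^ and $ anchor it
    # so that only a whole line can match.
    return join '|', map "^\Q$_\E\$", @ids;
}

my $pattern = build_pattern(qw(AAAAA A.B));
print "$pattern\n";    # ^AAAAA$|^A\.B$

my $re = qr/$pattern/;
for my $candidate (qw(AAAAA AAAAAA A.B AxB)) {
    printf "%-7s %s\n", $candidate, $candidate =~ $re ? 'ignored' : 'kept';
}
```

Only AAAAA and A.B are reported as ignored. AAAAAA is kept because of the anchors, and AxB is kept because the dot was escaped.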

fugu · Answer 4 · 2014-08-14T11:37:48+0000

use strict;
use warnings;
use autodie;

# in.txt holds the IDs to exclude, one per line; in_2.txt is the file to filter
open my $in, '<', 'in.txt';
open my $in2, '<', 'in_2.txt';

my (%data1, %data2);
while(<$in>){
    chomp;
    $data1{$_} = 1;
}

while(<$in2>){
    chomp;
    $data2{$_} = 2;
}


foreach(sort keys %data2){
    print "$_\n" unless $data1{$_};
}
