Capturing nonzero elements, graphs, and sparse matrix indices

I have the following sparse matrix A.

   2   3   0   0   0
   3   0   4   0   6
   0  -1  -3   2   0
   0   0   1   0   0
   0   4   2   0   1

      

Then I would like to get the following information from there:

  • the total number of records since the matrix is ​​scanned column by column. Yielding:

    Ap = [0, 2, 5, 9, 10, 12];

  • row indices of records as the matrix is ​​scanned column by column. Yielding:

    Ai = [0, 1, 0, 2, 4, 1, 2, 3, 4, 2, 1, 4];

  • Non-zero elements of the matrix, since the matrix is ​​scanned column by column. Yielding:

    Ax = [2, 3, 3, -1, 4, 4, -3, 1, 2, 2, 6, 1];

Since the actual matrix A is potentially very large, is there any efficient way in Perl that can capture these elements? Moreover, without any matrix A into RAM.

I am sticking with the following code. Which doesn't give what I want.

use strict;
use warnings;

my (@Ax, @Ai, @Ap) = ();
while (<>) {
    chomp;
    my @elements = split /\s+/;
    my $i = 0;
    my $new_line = 1;
    while (defined(my $element = shift @elements)) {
        $i++;
        if ($element) {
            push @Ax, 0 + $element;
            if ($new_line) {
                push @Ai, scalar @Ax;
                $new_line = 0;
            }

            push @Ap, $i;
        }
    }
}
push @Ai, 1 + @Ax;
print('@Ax  = [', join(" ", @Ax), "]\n");
print('@Ai = [', join(" ", @Ai), "]\n");
print('@Ap = [', join(" ", @Ap), "]\n");

      

+2


source to share


5 answers


A common strategy for storing sparse data is to drop the values ​​you don't need (zeros) and store the row and column indices with every value you care about, while preserving its positional information:

[VALUE, ROW, COLUMN]

      



In your case, you can save further, since all your needs can be met by processing the data by columns, which means that we do not need to repeat COLUMN for each value.

use strict;
use warnings;
use Data::Dumper;

my ($r, $c, @dataC, @Ap, @Ai, @Ax, $cumul);

# Read data row by row, storing non-zero values by column.
#    $dataC[COLUMN] = [
#        [VALUE, ROW],
#        [VALUE, ROW],
#        etc.
#    ]
$r = -1;
while (<DATA>) {
    chomp;
    $r ++;
    $c = -1;
    for my $v ( split '\s+', $_ ){
        $c ++;
        push @{$dataC[$c]}, [$v, $r] if $v;
    }
}

# Iterate through the data column by column
# to compute the three result arrays.
$cumul = 0;
@Ap = ($cumul);
$c = -1;
for my $column (@dataC){
    $c ++;
    $cumul += @$column;
    push @Ap, $cumul;
    for my $value (@$column){
        push @Ax, $value->[0];
        push @Ai, $value->[1];
    }
}

__DATA__
2   3   0   0   0
3   0   4   0   6
0  -1  -3   2   0
0   0   1   0   0
0   4   2   0   1

      

+1


source


This is what you are looking for, I think:



#!/usr/bin/perl

use strict;
use warnings;

use Data::Dumper::Simple;

my @matrix;

# Populate @matrix
while (<>) {
    push @matrix, [ split /\s+/ ];
}

my $columns = @{ $matrix[0] };
my $rows    = @matrix;

my ( @Ap, @Ai, @Ax );
my $ap = 0;

for ( my $j = 0 ; $j <= $rows ; $j++ ) {
    for ( my $i = 0 ; $i <= $columns ; $i++ ) {
        if ( $matrix[$i]->[$j] ) {
            $ap++;
            push @Ai, $i;
            push @Ax, $matrix[$i]->[$j];
        }
    }
    push @Ap, $ap;
}

print Dumper @Ap;
print Dumper @Ai;
print Dumper @Ax;

      

+1


source


Updated based on FM comment. If you don't want to store any raw data:

#!/usr/bin/perl

use strict;
use warnings;

my %matrix_info;

while ( <DATA> ) {
    chomp;
    last unless /[0-9]/;
    my @v = map {0 + $_ } split;
    for (my $i = 0; $i < @v; ++$i) {
        if ( $v[$i] ) {
            push @{ $matrix_info{$i}->{indices} }, $. - 1;
            push @{ $matrix_info{$i}->{nonzero} }, $v[$i];
        }
    }
}

my @cum_count = (0);
my @row_indices;
my @nonzero;

for my $i ( sort {$a <=> $b } keys %matrix_info ) {
    my $mi = $matrix_info{$i};
    push @nonzero, @{ $mi->{nonzero} };

    my @i = @{ $mi->{indices} };

    push @cum_count, $cum_count[-1] + @i;
    push @row_indices, @i;
}

print(
    "\@Ap = [@cum_count]\n",
    "\@Ai = [@row_indices]\n",
    "\@Ax = [@nonzero]\n",
);

__DATA__
2   3   0   0   0
3   0   4   0   6
0  -1  -3   2   0
0   0   1   0   0
0   4   2   0   1

      

Output:

C: \ Temp> m
@Ap = [0 2 5 9 10 12]
@Ai = [0 1 0 2 4 1 2 3 4 2 1 4]
@Ax = [2 3 3 -1 4 4 -3 1 2 2 6 1]
+1


source


Ap is easy: just start with zeros and increase every time you encounter a non-zero number. I don't see you trying to write anything in @Ap, so it should come as no surprise that this is not how you wish.

Ai and Ax are more complex: you want to order the columns while scanning. You won't be able to do anything in place since you don't yet know how many elements the columns will yield, so you cannot know the position of the elements in advance.

Obviously, it would be much easier if you just changed the requirement so that the sketches are ordered instead. Otherwise you can get complicated and collect (i, j, x) triplets. Collecting, they will naturally be ordered (i, j). Post-collection, you just want to sort them by (j, i).

0


source


The code you provided works line by line. To get results column by column, you have to accumulate your values ​​into separate arrays, one for each column:

# will look like ([], [], [] ...), one [] for each column.
my @columns;

while (<MATRIX>) {
    my @row = split qr'\s+';
    for (my $col = 0; $col < @row; $col++) {

        # push each non-zero value into its column
        push @{$columns[$col]}, $row[$col] if $row[$col] > 0;

    }
}

# now you only need to flatten it to get the desired kind of output:
use List::Flatten;
@non_zero = flat @columns;

      

See also List :: Flatten .

0


source







All Articles