What is the most efficient way to compare rows in a MySQL table with Java?

This is largely a conceptual question, so I have no code to show. I will try to explain this as best I can. I am writing a program that needs to find common sequences of numbers found in a large table of random combinations.

So, for example, take this data:

1 5 3 9 6 3 8 8 3 3
6 7 5 5 5 4 9 2 0 1
6 4 4 3 7 8 3 9 5 6
2 4 2 4 5 5 3 4 7 7
1 5 6 3 4 9 9 3 3 2
0 2 7 9 4 5 3 9 8 3

These are random combinations of the digits 0-9. For every sequence of 3 or more digits that is found more than once, I need to put it in a separate table. So the first line contains "5 3 9" and the 6th line also contains "5 3 9". I would put this sequence in that separate table along with the number of times it was found.

I am still developing an algorithm for actually doing these comparisons, but I think I will need to start with "1 5 3", compare that to every three-number trio in the table, then move on to "5 3 9", then "3 9 6", and so on.

MY MAIN PROBLEM RIGHT NOW is that I don't know how to do this when these numbers are stored in the database. There are 11 columns in my database table: one column for each individual number and one column for the 10-digit sequence as a whole. The columns are named Sequence, 1stNum, 2ndNum, 3rdNum ... 10thNum.

Visual: The first line in my database for the above data would be:

|  1 5 3 9 6 3 8 8 3 3  |  1  |  5  |  3  |  9  |  6  |  3  |  8  |  8  |  3  |  3  |

      

("|" divides columns)

How do I do these comparisons efficiently with Java? As I see it, I would iterate over every row in the table many times: once to pick each initial sequence to compare, and then, for each of those sequences, once more through every row. Basically a for loop inside a for loop. It sounds like it's going to take a ton of queries and could take forever if the table gets massive (which it will).

Is it more computationally efficient to iterate through the database using queries, or to dump the database to a file and iterate over the file?

I tried to explain this as best I can; it is a very confusing process for me. I can clarify anything you need. I just need some guidance on what the best course of action would be.



2 answers


Here's what I would do, assuming you have the sequences in a list:

import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

List<String> sequences = Arrays.asList("1539638833", "6755549201", "6443783956", "2424553477", "1563499332", "0279453983");
Map<String, Integer> count = new HashMap<>();
for (String seq : sequences) {
    // Slide a 3-character window across each sequence.
    for (int i = 0; i + 3 <= seq.length(); i++) {
        String sub = seq.substring(i, i + 3);
        // Add 1 to this window's counter, starting at 1 for a window not seen before.
        count.merge(sub, 1, Integer::sum);
    }
}
System.out.println(count);

      

Output:



{920=1, 783=1, 945=1, 332=1, 963=1, 644=1, 156=1, 983=1, 453=1, 153=1, 388=1, 534=1,
 455=1, 245=1, 539=2, 554=1, 242=1, 555=1, 553=1, 437=1, 883=1, 349=1, 755=1, 675=1,
 638=1, 395=1, 201=1, 956=1, 933=1, 499=1, 634=1, 839=1, 794=1, 027=1, 477=1, 833=1,
 347=1, 492=1, 378=1, 279=1, 993=1, 443=1, 396=1, 398=1, 549=1, 563=1, 424=1}

      

Then you can store these values in the database from the map.
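As a rough sketch of that last step: since only sequences found more than once are wanted, the map can be filtered first and then written with a single JDBC batch. The class name, the `common_sequences` table, and its `seq` and `occurrences` columns are made up for illustration, and the `Connection` is assumed to be opened elsewhere with whatever driver you use.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;

class RepeatedSequenceStore {

    // Keep only the sequences that were found more than once.
    static Map<String, Integer> repeatedOnly(Map<String, Integer> count) {
        Map<String, Integer> repeated = new HashMap<>();
        for (Map.Entry<String, Integer> e : count.entrySet()) {
            if (e.getValue() > 1) {
                repeated.put(e.getKey(), e.getValue());
            }
        }
        return repeated;
    }

    // Write the given sequences and their counts in one JDBC batch.
    // Assumption: a table common_sequences(seq, occurrences) already exists;
    // these names are hypothetical, not taken from the question.
    static void store(Connection conn, Map<String, Integer> repeated) throws SQLException {
        String sql = "INSERT INTO common_sequences (seq, occurrences) VALUES (?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (Map.Entry<String, Integer> e : repeated.entrySet()) {
                ps.setString(1, e.getKey());
                ps.setInt(2, e.getValue());
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }
}
```

With the sample data above, `repeatedOnly(count)` would leave just the entry for 539, so only one row would be inserted.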



You can do this in SQL with UNION ALL. (A plain UNION removes duplicate rows, so two column windows that happen to produce the same count for the same sequence would only be added once.)

select sum(c), sequence
from
(
    select
        count(*) as c, concat(col1, col2, col3) as sequence
        from t
        group by col1, col2, col3
    union all
    select
        count(*) as c, concat(col2, col3, col4) as sequence
        from t
        group by col2, col3, col4
    union all (... and so on enumerating through the column combinations)
) as tt
group by sequence

      



I would imagine a pure Java implementation would be faster and have less memory overhead. But if you already have this in your database, it might be fast enough.
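To make the pure Java route a bit more concrete, here is a minimal sketch that reads the table once with a single query and does all the counting in memory, so there is no loop-within-a-loop of queries. It counts every window of 3 or more digits, as the question asks. The class and method names are invented for this example. The table name `t` is borrowed from the query above, the `Sequence` column name comes from the question, and I am assuming that column holds the digits separated by spaces.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.HashMap;
import java.util.Map;

class SinglePassCounter {

    // Add every window of length minLen..seq.length() from one row to the counts.
    static void addWindows(String seq, int minLen, Map<String, Integer> count) {
        for (int len = minLen; len <= seq.length(); len++) {
            for (int i = 0; i + len <= seq.length(); i++) {
                count.merge(seq.substring(i, i + len), 1, Integer::sum);
            }
        }
    }

    // Read the whole table with one query and count windows in memory.
    // Assumptions: the table is named t and its Sequence column stores
    // the row as text such as "1 5 3 9 6 3 8 8 3 3".
    static Map<String, Integer> countFromTable(Connection conn) throws SQLException {
        Map<String, Integer> count = new HashMap<>();
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT Sequence FROM t")) {
            while (rs.next()) {
                // Strip the spaces so each digit is one character.
                String digits = rs.getString(1).replace(" ", "");
                addWindows(digits, 3, count);
            }
        }
        return count;
    }
}
```

A 10-digit row yields 36 windows of length 3 to 10, so the work grows linearly with the number of rows. The map itself can get large on a massive table, since longer windows are mostly unique; whether that fits in memory is something you would have to measure.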
