How do you manage millions of records?
I really need expert help to answer my request.
Here's the scenario:
I'm using a SQL SELECT query to fetch a million records. I need to sort and group the resulting records, which are stored in a DataTable (in one execution), and I loop through it to do the grouping and sorting. I know this is naive and not the right way to handle it. How can I efficiently manage millions of records and apply grouping and sorting to them?
I really need help here. I've heard about fetching the data in batches, but how do we implement grouping and sorting when we don't have all the data in hand? I can't do the ordering and grouping directly in SQL, as that is contrary to my requirement.
Here's what I'm doing right now:
I have the following objects, i.e. the column names to group and sort by:
List<Group> groupList;
List<Sort> sortList;
DataTable reportData; // holds all of the records from the db
I'm looping through "reportData" row by row and comparing the current row with the previous one for custom grouping and sorting. I would like to know how this can be done when we use batch execution, or is there an alternative solution?
"I need to sort and group the resulting records, which I'm storing in a DataTable (in one execution) and looping through for grouping and sorting."
What for?
Seriously.
Don't pull the data and then try playing smart with a silly object model (and datasets aren't particularly smart, sorry).
Group and sort in your SELECT statement, pull out the data already grouped and sorted, and run with it.
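Once the rows arrive already ordered by the grouping column, the "compare current and previous row" idea from the question no longer needs the whole result in memory. A minimal sketch of that, assuming the input is already sorted by the key (the names SortedStream and GroupAdjacent are made up for illustration, not from any library):

```csharp
using System;
using System.Collections.Generic;

static class SortedStream
{
    // Emits one (key, rows) pair each time the key changes between
    // consecutive rows. Only correct if the input is ordered by that key,
    // e.g. by an ORDER BY in the query that produced it.
    public static IEnumerable<KeyValuePair<TKey, List<TRow>>> GroupAdjacent<TRow, TKey>(
        IEnumerable<TRow> rows, Func<TRow, TKey> keyOf)
    {
        var comparer = EqualityComparer<TKey>.Default;
        var current = new List<TRow>();
        var currentKey = default(TKey);

        foreach (var row in rows)
        {
            var key = keyOf(row);
            // Key changed: hand the finished group to the caller and start a new one.
            if (current.Count > 0 && !comparer.Equals(key, currentKey))
            {
                yield return new KeyValuePair<TKey, List<TRow>>(currentKey, current);
                current = new List<TRow>();
            }
            currentKey = key;
            current.Add(row);
        }

        // Flush the last group, if there was any input at all.
        if (current.Count > 0)
            yield return new KeyValuePair<TKey, List<TRow>>(currentKey, current);
    }
}
```

Only one group is held at a time, so memory use depends on the largest group rather than on the full million rows. The rows could come from a data reader loop instead of a filled DataTable.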
A million records was already a small amount of data for SQL Server when the original version was released (4.2, which was Sybase SQL Server) 17 years ago. These days it is something that fits into a CPU-level cache and is done before SQL Server even realizes what it just processed.
SQL is great at projections, and since MARS (Multiple Active Result Sets) was introduced, you can even run multiple queries over the same connection, which comes in handy here.
So go back: dump the dataset and the "I'm trying to program a sort" approach, and write the appropriate SQL statements to pull the data as you need it.
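Since the question keeps the group and sort columns in lists, one way to "write the appropriate SQL" is to turn those lists into an ORDER BY clause. This is only a sketch: the Group and Sort types are not shown in the question, so plain column-name strings are assumed here, and ReportQuery.Build is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;

static class ReportQuery
{
    // Bracket-quote an identifier for SQL Server, escaping any closing bracket.
    static string Quote(string name)
    {
        return "[" + name.Replace("]", "]]") + "]";
    }

    // Builds "SELECT * FROM [table] ORDER BY <group columns>, <sort columns>".
    // Group columns come first so rows of one group arrive next to each other.
    public static string Build(string table,
                               IEnumerable<string> groupColumns,
                               IEnumerable<string> sortColumns)
    {
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var orderBy = new List<string>();

        foreach (var column in groupColumns)
            if (seen.Add(column)) orderBy.Add(Quote(column));
        foreach (var column in sortColumns)
            if (seen.Add(column)) orderBy.Add(Quote(column)); // skip columns already listed

        var sql = "SELECT * FROM " + Quote(table);
        if (orderBy.Count > 0)
            sql += " ORDER BY " + string.Join(", ", orderBy);
        return sql;
    }
}
```

For instance, calling ReportQuery.Build("Documents", new[] { "DocumentTypeID" }, new[] { "DocumentName" }) gives SELECT * FROM [Documents] ORDER BY [DocumentTypeID], [DocumentName]. The column names should still come from a trusted list, because identifiers cannot be passed as query parameters.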
It looks like you should implement partition pruning. Partitioning lets you split the data according to how you query it, so that queries run faster.
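For example (assuming reportData is a DataTable, so rows are read through AsEnumerable() and Field<T>(); the column types used here are assumptions):
1. Suppose you would like to group them by their DocumentTypeID
var groupByType = reportData.AsEnumerable()
                            .GroupBy(r => r.Field<int>("DocumentTypeID"));
2. Sort alphabetically
var sortAlphabetically = reportData.AsEnumerable()
                                   .OrderBy(r => r.Field<string>("DocumentName"));
3. Grouping and sorting
// Group by type, then sort the rows inside each group by name.
var groupAndSort = reportData.AsEnumerable()
                             .GroupBy(r => r.Field<int>("DocumentTypeID"))
                             .Select(g => g.OrderBy(r => r.Field<string>("DocumentName")));
4. Sort and group
// Sort first; GroupBy keeps the rows of each group in source order.
var sortAndGroup = reportData.AsEnumerable()
                             .OrderBy(r => r.Field<string>("DocumentName"))
                             .GroupBy(r => r.Field<int>("DocumentTypeID"));
5. Multiple groupings and sorting
// Group on a composite key (type and month) after sorting by name.
var multipleGroupAndSort = reportData.AsEnumerable()
                                     .OrderBy(r => r.Field<string>("DocumentName"))
                                     .GroupBy(r => new { TypeId = r.Field<int>("DocumentTypeID"),
                                                         Month = r.Field<DateTime>("CreatedOnDate").Month });
and so on.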
But I would still discourage loading a million rows into the application. It will cost memory. Of course, there are ways to handle it on the database side with stored procedures, etc.
If I understand correctly, in your case I would create a temporary table in the database with the structure I want, designed specifically to support my grouping.
Then I would select the records from the main tables and insert them into the temp table, applying all the changes, including the grouping, along the way.
An index matching the order in which you want to sort them should also be created.
After that, just select from that table, do what you need to do, and finally, if the data is no longer needed, drop the temp table.
I would go with the above solution because a million entries in memory smells like trouble to me ...
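The temp-table steps described above might look roughly like this from the application side. It is a sketch under several assumptions: SQL Server is the database, the Documents table and its columns are hypothetical (borrowed from the LINQ examples in this thread), and the connection string is supplied by the caller.

```csharp
using System;
using System.Data.SqlClient;

static class TempTableReport
{
    public static void Run(string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();

            // 1. Copy the rows of interest into a session-scoped temp table.
            Exec(conn, "SELECT DocumentTypeID, DocumentName, CreatedOnDate " +
                       "INTO #Report FROM Documents");

            // 2. Index the temp table in the order the report will be read.
            Exec(conn, "CREATE CLUSTERED INDEX IX_Report " +
                       "ON #Report (DocumentTypeID, DocumentName)");

            // 3. Stream the rows back, already ordered by group and then name.
            using (var cmd = new SqlCommand(
                "SELECT DocumentTypeID, DocumentName FROM #Report " +
                "ORDER BY DocumentTypeID, DocumentName", conn))
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                    Console.WriteLine("{0}\t{1}", reader[0], reader[1]);
            }

            // 4. Drop the temp table once the data is no longer needed.
            Exec(conn, "DROP TABLE #Report");
        }
    }

    // Runs a statement that returns no rows on the already-open connection.
    static void Exec(SqlConnection conn, string sql)
    {
        using (var cmd = new SqlCommand(sql, conn))
            cmd.ExecuteNonQuery();
    }
}
```

All statements run on the same open connection, because a #temp table is only visible to the session that created it.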