Select unique ids from two data tables with three queries and merge

I have two data tables that are populated from a database. One is master data and the other is an audit table that records when a record from master data has been inserted, updated, or deleted.

Note: The deleted records will not be in the master data, but will have an audit history.

Ultimately I want a DataTable of the unique IDs, each with its latest "AUDITDATE", sorted by that "AUDITDATE" in descending order. That DataTable then tells me which unique IDs have changed, most recent first.

"AUDITDATE" and "ID" are double data types with the format "yyyyMMddhhmsms".

So I thought I could write three queries:

  • Which audit records are in the master data (gives me the existing records that were changed)
  • Which audit records are NOT in the master data (gives me the deleted records)
  • Which master records are NOT in the audit data (gives me the records that were never changed)

    var setCore = new HashSet<double>(geometry.dtGEOM_TABLE_CORE.AsEnumerable().Select(p => p.Field<double>("ID")));
    var setAudit = new HashSet<double>(geometry.dtGEOM_TABLE_AUDIT.AsEnumerable().Select(p => p.Field<double>("ID")));
    
    var resultAuditInCore = geometry.dtGEOM_TABLE_AUDIT.AsEnumerable()
        .Select(r => new { AUDITDATE = r.Field<double>("AUDITDATE"), ID = r.Field<double>("ID"), STATUS = "EXISTING" })
        .Where(r => setCore.Contains(r.ID))
        .OrderByDescending(r => r.AUDITDATE)
        .GroupBy(r => r.ID)
        .CopyToDataTable();
    
    var resultAuditNotInCore = geometry.dtGEOM_TABLE_AUDIT.AsEnumerable()
        .Select(r => new { AUDITDATE = r.Field<double>("AUDITDATE"), ID = r.Field<double>("ID"), STATUS = "DELETED" })
        .Where(r => !setCore.Contains(r.ID))
        .OrderByDescending(r => r.AUDITDATE)
        .GroupBy(r => r.ID)
        .CopyToDataTable();
    
    var resultCoreNotInAudit = geometry.dtGEOM_TABLE_CORE.AsEnumerable()
        .Select(r => new { AUDITDATE = (double)0, ID = r.Field<double>("ID"), STATUS = "NA" })
        .Where(r => !setAudit.Contains(r.ID))
        .OrderByDescending(r => r.ID)
        .CopyToDataTable();
    
          

After that I was going to combine all three and sort again on "AUDITDATE".

Problem: CopyToDataTable() does not work, although it does when I remove the Select().

Questions:

  • How do I get the three queries to produce DataTables?
  • Once the DataTables have been created, how do I combine them?




1 answer


You do understand that the result of GroupBy is a sequence of IGrouping<TKey, TElement> objects, right? Each group can contain a different number of elements, and such a sequence cannot be converted to a DataTable.
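To make the IGrouping point concrete, here is a minimal sketch that uses hypothetical in-memory (ID, AUDITDATE) tuples instead of DataRows, written as a C# 9 top-level program. It shows that GroupBy yields one group per ID, and that you must reduce every group to a single element (here: the newest audit) before you have something row-shaped again.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical audit rows: (ID, AUDITDATE), both doubles as in the question.
var audit = new[]
{
    (ID: 1.0, AUDITDATE: 20230101120000.0),
    (ID: 1.0, AUDITDATE: 20230305080000.0),
    (ID: 2.0, AUDITDATE: 20230210090000.0),
};

// GroupBy yields IGrouping<double, (double ID, double AUDITDATE)>:
// one group per ID, each group holding every audit row for that ID.
IEnumerable<IGrouping<double, (double ID, double AUDITDATE)>> groups =
    audit.GroupBy(r => r.ID);

foreach (var group in groups)
    Console.WriteLine($"ID {group.Key}: {group.Count()} audit row(s)");

// Reduce every group to one element: the row with the newest AUDITDATE.
var latestPerId = groups
    .Select(g => g.OrderByDescending(r => r.AUDITDATE).First())
    .ToList();
```

GroupBy never produces an empty group, so First() cannot throw here.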

According to MSDN, System.Data.DataTableExtensions.CopyToDataTable<T> constrains its generic parameter T to DataRow. My compiler complains that the result of your GroupBy cannot be passed to CopyToDataTable because your anonymous class is not a DataRow.
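If your elements are not DataRows, one workaround is to skip CopyToDataTable and fill a DataTable yourself. A minimal sketch, assuming anonymous objects shaped like the ones the question's Select() produces (the values and the table name "AuditSummary" are hypothetical):

```csharp
using System;
using System.Data;

// Hypothetical anonymous-type results, shaped like the question's Select().
var results = new[]
{
    new { AUDITDATE = 20230305080000.0, ID = 1.0, STATUS = "EXISTING" },
    new { AUDITDATE = 20230210090000.0, ID = 2.0, STATUS = "DELETED" },
};

// CopyToDataTable<T> requires T : DataRow, so define the schema by hand...
var table = new DataTable("AuditSummary");
table.Columns.Add("AUDITDATE", typeof(double));
table.Columns.Add("ID", typeof(double));
table.Columns.Add("STATUS", typeof(string));

// ...and add one row per anonymous object; Rows.Add(params object[])
// fills the columns in the order they were declared above.
foreach (var r in results)
    table.Rows.Add(r.AUDITDATE, r.ID, r.STATUS);
```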

So we have to convert your untouched, deleted and audited items into one ordered sequence that contains the data you want in your table and that can be converted to DataRows.


I am assuming that the items in your core data table have at least an Id, and that the items in the audit data table have at least an AuditDate and a CoreId that references the Id of the audited item in the core data table (at least as long as that item exists).

Apparently you want all the data in one table. I am assuming that each row of that table comes from an object of class TableData.

Since untouched items are not in the AuditedData table, the untouched items in the core data table must have a TableData.

All deleted items are no longer part of the master data, so the deleted items in the AuditedData table must also have a TableData.

Since you want to order all data by DateTime, your TableData must also contain an AuditDate. Some items were never audited; their AuditDate is null.

TODO: decide how to sort items that were never audited: first or last in the ordered result?
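Either choice is a single LINQ expression. A minimal sketch with hypothetical (Id, AuditDate) tuples, where a null AuditDate means "never audited":

```csharp
using System;
using System.Linq;

// Hypothetical items: Id plus a nullable AuditDate (null = never audited).
var items = new[]
{
    (Id: 1, AuditDate: (DateTime?)new DateTime(2023, 1, 1)),
    (Id: 2, AuditDate: (DateTime?)null),
    (Id: 3, AuditDate: (DateTime?)new DateTime(2023, 3, 5)),
};

// Newest first, never-audited last: the default comparer for DateTime?
// treats null as smaller than any date, so a descending sort puts it at the end.
var nullsLast = items
    .OrderByDescending(i => i.AuditDate)
    .Select(i => i.Id)
    .ToList();

// Never-audited first, then newest first: sort on "has a date"
// (false sorts before true) before sorting on the date itself.
var nullsFirst = items
    .OrderBy(i => i.AuditDate.HasValue)
    .ThenByDescending(i => i.AuditDate)
    .Select(i => i.Id)
    .ToList();
```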

    class TableData
    {
        public DateTime? AuditDate { get; set; } // used to order by date; null if never audited
        ...
    }

    class CoreData
    {
        public int Id { get; set; }                // int may be any other type
        public TableData TableData { get; set; }   // all untouched items should have this
                                                   // all others: choose what you want
        ...
    }

    class AuditData
    {
        public DateTime AuditDate { get; set; }
        public int CoreId { get; set; }            // Id of the audited CoreData item
        public TableData TableData { get; set; }   // all deleted items should have this
                                                   // all others: choose what you want
        ...
    }

    IEnumerable<CoreData> coreData = ...
    IEnumerable<AuditData> auditedData = ...
      

I think you should be able to write LINQ queries that transform your raw core data and audit data into these two sequences.

Items that were never touched (never edited or deleted)
= all items that are in coreData but not in auditedData:

    // a HashSet makes every Contains check a fast lookup instead of a scan
    var auditedItemIds = new HashSet<int>(auditedData
        .Select(auditedItem => auditedItem.CoreId));
    IEnumerable<TableData> untouchedItems = coreData
        .Where(coreItem => !auditedItemIds.Contains(coreItem.Id))
        .Select(coreItem => coreItem.TableData);
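Applied directly to DataTables like the question's, the same "in core but not in audit" query can stay a sequence of DataRows, and then CopyToDataTable does work. A minimal sketch; the two tables are hypothetical stand-ins for dtGEOM_TABLE_CORE and dtGEOM_TABLE_AUDIT:

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical stand-ins for dtGEOM_TABLE_CORE and dtGEOM_TABLE_AUDIT.
var core = new DataTable();
core.Columns.Add("ID", typeof(double));
foreach (var id in new[] { 1.0, 2.0, 3.0 })
    core.Rows.Add(id);

var audit = new DataTable();
audit.Columns.Add("ID", typeof(double));
audit.Columns.Add("AUDITDATE", typeof(double));
audit.Rows.Add(1.0, 20230101120000.0);
audit.Rows.Add(4.0, 20230210090000.0); // ID 4 is no longer in core: deleted

// Build the set of audited IDs once, as the question already does.
var auditedIds = new HashSet<double>(
    audit.AsEnumerable().Select(r => r.Field<double>("ID")));

// Untouched = core rows whose ID never appears in the audit table.
// No Select() into an anonymous type, so the elements are still DataRows.
DataTable untouched = core.AsEnumerable()
    .Where(r => !auditedIds.Contains(r.Field<double>("ID")))
    .CopyToDataTable();
```

Note that CopyToDataTable throws an InvalidOperationException when the source contains no rows, so check for an empty result first if that can happen.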

      



Deleted items
= all items in auditedData whose core item is no longer in coreData. We only need the newest auditedData item per core item:

    var coreItemIds = new HashSet<int>(coreData.Select(coreItem => coreItem.Id));
    IEnumerable<TableData> deletedItems = auditedData
        // take only the audited items that are not in core data anymore
        .Where(auditedItem => !coreItemIds.Contains(auditedItem.CoreId))
        // group the remaining items by CoreId
        .GroupBy(auditedItem => auditedItem.CoreId)
        // from every group (= all audited items belonging to the same core item)
        // take the item with the newest AuditDate; GroupBy never creates
        // an empty group, so First() is safe
        .Select(group => group
            .OrderByDescending(auditedItem => auditedItem.AuditDate)
            .First()
            .TableData);

      

Edited items
= all items that are both in coreData and in auditedData. I'm not sure whether you want the TableData from coreData or from auditedData; the LINQ query is similar either way.

Assuming you want the TableData from the newest audit:

    // coreItemIds: the same set of core Ids as used for the deleted items
    IEnumerable<TableData> newestAuditData = auditedData
        // take only the audited items that are still in core data
        .Where(auditedItem => coreItemIds.Contains(auditedItem.CoreId))
        // group the remaining items by CoreId
        .GroupBy(auditedItem => auditedItem.CoreId)
        // from every group take the item with the newest AuditDate
        .Select(group => group
            .OrderByDescending(auditedItem => auditedItem.AuditDate)
            .First()
            .TableData);

      

Have you noticed that everything after the Where is the same for the deleted items as for the edited items?
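Because only the Where differs, the shared part can be factored out. A minimal sketch on DataTables shaped like the question's; the local function LatestPerId and the sample rows are hypothetical. Keeping the elements as DataRows means each result could still go to CopyToDataTable:

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical stand-ins for the question's tables (ID and AUDITDATE as double).
var core = new DataTable();
core.Columns.Add("ID", typeof(double));
core.Rows.Add(1.0);
core.Rows.Add(2.0);

var audit = new DataTable();
audit.Columns.Add("ID", typeof(double));
audit.Columns.Add("AUDITDATE", typeof(double));
audit.Rows.Add(1.0, 20230101120000.0);
audit.Rows.Add(1.0, 20230305080000.0);
audit.Rows.Add(4.0, 20230110090000.0);
audit.Rows.Add(4.0, 20230210090000.0); // ID 4 is no longer in core: deleted

var coreIds = new HashSet<double>(
    core.AsEnumerable().Select(r => r.Field<double>("ID")));

// The shared part of both queries: keep only the newest audit row per ID.
IEnumerable<DataRow> LatestPerId(IEnumerable<DataRow> rows) => rows
    .GroupBy(r => r.Field<double>("ID"))
    .Select(g => g.OrderByDescending(r => r.Field<double>("AUDITDATE")).First());

// Only the Where differs between "edited" and "deleted".
IEnumerable<DataRow> auditRows = audit.AsEnumerable();
List<DataRow> edited = LatestPerId(
    auditRows.Where(r => coreIds.Contains(r.Field<double>("ID")))).ToList();
List<DataRow> deleted = LatestPerId(
    auditRows.Where(r => !coreIds.Contains(r.Field<double>("ID")))).ToList();
```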

Now you need to decide how to sort all these items and put them into one DataTable.

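    IOrderedEnumerable<TableData> OrderQueryResult(
        IEnumerable<TableData> untouchedItems,
        IEnumerable<TableData> deletedItems,
        IEnumerable<TableData> auditedItems)
    {
        // TODO decide what order to use; this example sorts newest AuditDate first.
        // A null AuditDate (never audited) counts as the oldest, so it ends up last.
        return untouchedItems
            .Concat(deletedItems)
            .Concat(auditedItems)
            .OrderByDescending(tableData => tableData.AuditDate);
    }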

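    IOrderedEnumerable<TableData> orderedTableData = OrderQueryResult(
        untouchedItems, deletedItems, newestAuditData);

    // DataRow has no public constructor, so rows cannot be created with
    // new MyDataRow(...); let the DataTable create them with NewRow() instead
    DataTable myTable = new DataTable();
    myTable.Columns.Add("AuditDate", typeof(DateTime));
    // add a column for every other TableData property you want in the table
    foreach (TableData tableData in orderedTableData)
    {
        DataRow row = myTable.NewRow();
        row["AuditDate"] = tableData.AuditDate.HasValue
            ? (object)tableData.AuditDate.Value
            : DBNull.Value;
        // copy the other TableData properties into their columns here
        myTable.Rows.Add(row);
    }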
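For the question's own column names, the whole pipeline can be sketched end to end: newest audit per ID labelled EXISTING or DELETED, never-audited core rows labelled NA with AUDITDATE 0 (as in the question's third query), everything concatenated, sorted by AUDITDATE descending and copied into one DataTable. This is a sketch under assumptions: the sample rows are hypothetical, and it uses Max because only the newest date is needed per ID.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical stand-ins for dtGEOM_TABLE_CORE and dtGEOM_TABLE_AUDIT.
var core = new DataTable();
core.Columns.Add("ID", typeof(double));
core.Rows.Add(1.0);
core.Rows.Add(2.0);

var audit = new DataTable();
audit.Columns.Add("ID", typeof(double));
audit.Columns.Add("AUDITDATE", typeof(double));
audit.Rows.Add(1.0, 20230101120000.0);
audit.Rows.Add(1.0, 20230305080000.0);
audit.Rows.Add(3.0, 20230210090000.0); // ID 3 was deleted from core

var coreIds = new HashSet<double>(core.AsEnumerable().Select(r => r.Field<double>("ID")));
var auditedIds = new HashSet<double>(audit.AsEnumerable().Select(r => r.Field<double>("ID")));

// Newest audit per ID, labelled like the question's first two queries.
var audited = audit.AsEnumerable()
    .GroupBy(r => r.Field<double>("ID"))
    .Select(g => new
    {
        ID = g.Key,
        AUDITDATE = g.Max(r => r.Field<double>("AUDITDATE")),
        STATUS = coreIds.Contains(g.Key) ? "EXISTING" : "DELETED"
    });

// Never-audited core rows get AUDITDATE 0, as in the question, so they sort last.
var untouched = core.AsEnumerable()
    .Select(r => r.Field<double>("ID"))
    .Where(id => !auditedIds.Contains(id))
    .Select(id => new { ID = id, AUDITDATE = 0.0, STATUS = "NA" });

// Both anonymous types have the same member names, types and order,
// so they are the same type and Concat can combine them.
var result = new DataTable();
result.Columns.Add("AUDITDATE", typeof(double));
result.Columns.Add("ID", typeof(double));
result.Columns.Add("STATUS", typeof(string));
foreach (var item in audited.Concat(untouched).OrderByDescending(x => x.AUDITDATE))
    result.Rows.Add(item.AUDITDATE, item.ID, item.STATUS);
```

Sorting the double values directly works because a yyyyMMdd... number with the most significant part first orders the same way numerically as chronologically.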

      

Finally, I'm not sure whether your database already exists or whether you are still developing it. Your queries are this complex because you have two tables containing the same data: one with your current data and one with the history of that data. With the following tables your queries would be much simpler.

I am writing this in Entity Framework style. If you use another method, I'm sure you can derive the SQL tables from it.

    public class CoreData
    {
        public int Id { get; set; }
        public ... NeverChangingItems { get; set; }

        public bool IsObsolete { get; set; }

        // HistoricData is data that might change through history
        public virtual ICollection<HistoricData> HistoricData { get; set; }
    }

    public class HistoricData
    {
        public int Id { get; set; }
        public DateTime AuditDate { get; set; }
        public ... ItemsThatChangeThroughoutHistory { get; set; }

        // foreign key to the owning CoreData
        public int CoreDataId { get; set; }
        public virtual CoreData CoreData { get; set; }
    }

      

If you have a table with immutable master data and a table with its history, your three queries become much simpler. In addition, you do not need copies of the unmodified master data, and items are never deleted: they are only marked IsObsolete.

Note that with your current method the deleted items are not really removed either, because they remain in your audit table.
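To illustrate how much simpler the queries get with that schema, here is a sketch that uses hypothetical in-memory tuples shaped like CoreData/HistoricData instead of an Entity Framework context. Because the history hangs off its core item, "last change per item" needs no set differences at all:

```csharp
using System;
using System.Linq;

// Hypothetical in-memory data shaped like CoreData/HistoricData:
// each core item carries its own history dates and an IsObsolete flag.
var coreItems = new[]
{
    (Id: 1, IsObsolete: false, History: new[] { new DateTime(2023, 1, 1), new DateTime(2023, 3, 5) }),
    (Id: 2, IsObsolete: true,  History: new[] { new DateTime(2023, 2, 10) }),
    (Id: 3, IsObsolete: false, History: Array.Empty<DateTime>()),
};

// One query replaces the three: status from IsObsolete and the history count,
// last audit = newest history date (null if the item was never changed).
var lastChanges = coreItems
    .Select(c => new
    {
        c.Id,
        Status = c.IsObsolete ? "DELETED" : c.History.Length == 0 ? "NA" : "EXISTING",
        LastAudit = c.History.Length == 0 ? (DateTime?)null : c.History.Max()
    })
    .OrderByDescending(x => x.LastAudit) // null (never changed) sorts last
    .ToList();
```

Against a real Entity Framework model the same shape would presumably be written over the CoreData set using c.HistoricData, but check the SQL your provider generates before relying on it.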









