Merge rows with duplicate values, merge cells if they are different

I have a similar question: [concatenate rows with duplicate values] [1] Excel VBA - concatenate rows with duplicate values ​​in one cell and merge values ​​in another cell

I have data in this format (rows are sorted)


Pub     ID      CH      Ref
no      15      1      t2
no      15      1      t88
yes     15      2      t3
yes     15      2      t3
yes     15      2      t6

      


compare adjacent lines (say lines 4 and 5), if col 2 and 3 are the same, then if col 4 is different by merging col4, remove the line. if col 2,3,4 match then remove the line, don't concatenate col 4


Desired exit

key     ID      CH      Text  
no      15      1       t2   t88
yes     15      2       t3   t6

      

This first section of code is not working correctly

Sub mergeCategoryValues()
    Dim lngRow As Long

    With ActiveSheet
        Dim columnToMatch1 As Integer: columnToMatch1 = 2
        Dim columnToMatch2 As Integer: columnToMatch2 = 3
        Dim columnToConcatenate As Integer: columnToConcatenate = 4


        lngRow = .Cells(65536, columnToMatch1).End(xlUp).row
        .Cells(columnToMatch1).CurrentRegion.Sort key1:=.Cells(columnToMatch1), Header:=xlYes
        .Cells(columnToMatch2).CurrentRegion.Sort key1:=.Cells(columnToMatch2), Header:=xlYes

        Do
            If .Cells(lngRow, columnToMatch1) = .Cells(lngRow - 1, columnToMatch1) Then 'check col 2 row lngRow, lngRow-1
              If .Cells(lngRow, columnToMatch2) = .Cells(lngRow - 1, columnToMatch2) Then 'check col 3 row lngRow, lngRow-1
                 If .Cells(lngRow - 1, columnToConcatenate) = .Cells(lngRow, columnToConcatenate) Then
                    Else
                    .Cells(lngRow - 1, columnToConcatenate) = .Cells(lngRow - 1, columnToConcatenate) & "; " & .Cells(lngRow, columnToConcatenate)
                 End If
                .Rows(lngRow).Delete
              End If
            End If
            lngRow = lngRow - 1
        Loop Until lngRow = 1
    End With

      

The actual output is wrong because when the merge of cells t3 will not match t3; t6, my comparison on col 4 will only work in a very simple case.

Actual output

key ID  CH  Text
no  15  1   t2; t88
yes 15  2   t3; t3; t6

      

So I had to add these two sections to separate the Concatenate cells and then remove the duplicates

'split cell in Col d to col e+ delimited by ;
        With Range("D2:D6", Range("D" & Rows.Count).End(xlUp))
            .Replace ";", " ", xlPart
            .TextToColumns other:=True
        End With

 'remove duplicates in each row

    Dim x, y(), i&, j&, k&, s$
    With ActiveSheet.UsedRange
        x = .Value: ReDim y(1 To UBound(x, 1), 1 To UBound(x, 2))
        For i = 1 To UBound(x)
            For j = 1 To UBound(x, 2)
                If Len(x(i, j)) Then
                    If InStr(s & "|", "|" & x(i, j) & "|") = 0 Then _
                       s = s & "|" & x(i, j): k = k + 1: y(i, k) = x(i, j)
                End If
            Next j: s = vbNullString: k = 0
        Next i
        .Value = y()
    End With
    End Sub

      

With additional code output

Pub ID  CH  Ref 
no  15  1   t2  t88
yes 15  2   t3  t6

      

Question: There must be a much easier way to get it right than using three different methods? How about inserting new 5+ columns if the col 4 items don't match?

Note. Remove duplicate code was found on user nilem at excelforum.

Edit: Col 1 will always be the same if Col 2 and 3 are the same. If the solution is much simpler, we can assume that Col 1 is empty and ignores the data.

I have a book lookup table and need to convert to a simple format that will be used in hardware using a 1960 language that has very limited commands. I am trying to preformat this data, so I only need to search for one line that has all the information.

The final Col D output could be in Column D with a delimiter, or Column DK (8 max Ref only) because I will be parsed for use on another machine. Whichever method is simpler.

+3


source to share


3 answers


The canonical practice for deleting lines is to start at the bottom and work in an upward direction. This way no lines are skipped. The trick here is to find the rows above the current position that match columns B and C, and concatenate the rows from column D before deleting the row. There are some good worksheet formulas that can get the row number in two columns. Applying one of them in practice with the help application.Evaluate

seems to be the most appropriate method for collecting values ​​from column D.

Sub dedupe_and_collect()
    Dim rw As Long, mr As Long, wsn As String

    With ActiveSheet   '<- set this worksheet reference properly!
        wsn = .Name
        With .Cells(1, 1).CurrentRegion
            .RemoveDuplicates Columns:=Array(2, 3, 4), Header:=xlYes
        End With
        With .Cells(1, 1).CurrentRegion  'redefinition after duplicate removal
            For rw = .Rows.Count To 2 Step -1 'walk backwards when deleting rows
                If Application.CountIfs(.Columns(2), .Cells(rw, 2).Value, .Columns(3), .Cells(rw, 3).Value) > 1 Then
                    mr = Application.Evaluate("MIN(INDEX(ROW(1:" & rw & ")+(('" & wsn & "'!B1:B" & rw & "<>'" & wsn & "'!B" & rw & ")+('" & wsn & "'!C1:C" & rw & "<>'" & wsn & "'!C" & rw & "))*1E+99, , ))")
                    'concatenate column D
                    '.Cells(mr, 4) = .Cells(mr, 4).Value & "; " & .Cells(rw, 4).Value
                    'next free column from column D
                    .Cells(mr, Columns.Count).End(xlToLeft).Offset(0, 1) = .Cells(rw, 4).Value
                    .Rows(rw).EntireRow.Delete
                End If
            Next rw
        End With
    End With
End Sub

      



Removing records when matching with three columns is done using the VBA equivalent in Date ► Data Tools ► Remove Duplicates. This only counts columns B, C and D and removes the bottom duplicates (keeping close to row 1). If column A is important in this regard, additional coding should be added.

It is not clear to me if you want Column D as a dividing row or as separate cells as the end result. Could you clarify?

+1


source


As I wrote above, I will iterate over the data and collect things into a custom object. There is no need to sort the data in this method; and the duplicate REF

will be omitted.

One of the benefits of a user-defined object is that it makes debugging easier because you can see more clearly what you've done.

We concatenate each line where ID

both CH

match, using a property of the Collection object to raise the error if identical keys are used.

Since combining Refs in one cell with a delimiter, versus single cells in D: K columns, can be done simply. I decided to split into columns, but changing it to join into one column would be trivial.

After installing the class module, you need to rename it: cID_CH

You will notice that I have placed the results on separate sheets. You can overwrite the original data, but I would suggest doing that.

Class module


Option Explicit
Private pID As Long
Private pCH As Long
Private pPUB As String
Private pREF As String
Private pcolREF As Collection

Public Property Get ID() As Long
    ID = pID
End Property
Public Property Let ID(Value As Long)
    pID = Value
End Property

Public Property Get CH() As Long
    CH = pCH
End Property
Public Property Let CH(Value As Long)
    pCH = Value
End Property

Public Property Get PUB() As String
    PUB = pPUB
End Property
Public Property Let PUB(Value As String)
    pPUB = Value
End Property

Public Property Get REF() As String
    REF = pREF
End Property
Public Property Let REF(Value As String)
    pREF = Value
End Property

Public Property Get colREF() As Collection
    Set colREF = pcolREF
End Property

Public Sub ADD(refVAL As String)
    On Error Resume Next
        pcolREF.ADD refVAL, refVAL
    On Error GoTo 0
End Sub

Private Sub Class_Initialize()
    Set pcolREF = New Collection
End Sub

      




Regular module


Option Explicit
Sub CombineDUPS()
    Dim wsSRC As Worksheet, wsRES As Worksheet
    Dim vSRC As Variant, vRES() As Variant, rRES As Range
    Dim cI As cID_CH, colI As Collection
    Dim I As Long, J As Long
    Dim S As String

'Set source and results worksheets and results range
Set wsSRC = Worksheets("sheet1")
Set wsRES = Worksheets("sheet2")
Set rRES = wsRES.Cells(1, 1)

'Get Source data
With wsSRC
    vSRC = .Range("A2", .Cells(.Rows.Count, "D").End(xlUp))
End With

'Collect and combine data
Set colI = New Collection
On Error Resume Next
For I = 1 To UBound(vSRC, 1)
    Set cI = New cID_CH
    With cI
        .PUB = vSRC(I, 1)
        .ID = vSRC(I, 2)
        .CH = vSRC(I, 3)
        .REF = vSRC(I, 4)
        .ADD .REF
        S = CStr(.ID & "|" & .CH)
        colI.ADD cI, S
        If Err.Number = 457 Then
            Err.Clear
            colI(S).ADD .REF
        ElseIf Err.Number <> 0 Then
            Debug.Print Err.Number, Err.Description
            Stop
        End If
    End With
Next I
On Error GoTo 0

'Create and populate Results Array
ReDim vRES(0 To colI.Count, 1 To 11)

'Header row
vRES(0, 1) = "Pub"
vRES(0, 2) = "ID"
vRES(0, 3) = "CH"
vRES(0, 4) = "Ref"

'populate array
For I = 1 To colI.Count
    With colI(I)
        vRES(I, 1) = .PUB
        vRES(I, 2) = .ID
        vRES(I, 3) = .CH
        For J = 1 To .colREF.Count
            vRES(I, J + 3) = .colREF(J)
        Next J
    End With
Next I

'Write the results to the worksheet
Set rRES = rRES.Resize(UBound(vRES, 1) + 1, UBound(vRES, 2))
With rRES
    .EntireColumn.Clear
    .Value = vRES
    With .Rows(1)
        .Font.Bold = True
        .HorizontalAlignment = xlCenter
        Range(.Cells(4), .Cells(11)).HorizontalAlignment = xlCenterAcrossSelection
    End With
    .EntireColumn.AutoFit
End With

End Sub

      


Original

Original Data

Processed results

Results

+1


source


using the dictionary below

Sub test()
    Dim Dic As Object: Set Dic = CreateObject("Scripting.Dictionary")
    Dic.Comparemode = vbTextCompare
    Dim Cl As Range, x$, y$, i&, Key As Variant
    For Each Cl In Range("B2:B" & Cells(Rows.Count, "B").End(xlUp).Row)
        x = Cl.Value & "|" & Cl.Offset(, 1).Value
        y = Cl.Offset(, 2).Value
        If Not Dic.exists(x) Then
            Dic.Add x, Cl.Offset(, -1).Value & "|" & y & "|"
        ElseIf Dic.exists(x) And Not LCase(Dic(x)) Like "*|" & LCase(y) & "|*" Then
            Dic(x) = Dic(x) & "|" & y & "|"
        End If
    Next Cl
    Range("A2:D" & Cells(Rows.Count, "B").End(xlUp).Row).ClearContents
    i = 2
    For Each Key In Dic
        Cells(i, "A") = Split(Dic(Key), "|")(0)
        Range(Cells(i, "B"), Cells(i, "C")) = Split(Key, "|")
        Cells(i, "D") = Replace(Split(Replace(Dic(Key), "||", ";"), "|")(1), ":", ";")
        i = i + 1
    Next Key
    Set Dic = Nothing
End Sub

      

before

enter image description here

after

enter image description here

+1


source







All Articles