Merge rows with duplicate values, merge cells if they are different
I have a similar question: [concatenate rows with duplicate values] [1] Excel VBA - concatenate rows with duplicate values in one cell and merge values in another cell
I have data in this format (rows are sorted)
Pub ID CH Ref
no 15 1 t2
no 15 1 t88
yes 15 2 t3
yes 15 2 t3
yes 15 2 t6
compare adjacent lines (say lines 4 and 5), if col 2 and 3 are the same, then if col 4 is different by merging col4, remove the line. if col 2,3,4 match then remove the line, don't concatenate col 4
Desired exit
key ID CH Text
no 15 1 t2 t88
yes 15 2 t3 t6
This first section of code is not working correctly
Sub mergeCategoryValues()
Dim lngRow As Long
With ActiveSheet
Dim columnToMatch1 As Integer: columnToMatch1 = 2
Dim columnToMatch2 As Integer: columnToMatch2 = 3
Dim columnToConcatenate As Integer: columnToConcatenate = 4
lngRow = .Cells(65536, columnToMatch1).End(xlUp).row
.Cells(columnToMatch1).CurrentRegion.Sort key1:=.Cells(columnToMatch1), Header:=xlYes
.Cells(columnToMatch2).CurrentRegion.Sort key1:=.Cells(columnToMatch2), Header:=xlYes
Do
If .Cells(lngRow, columnToMatch1) = .Cells(lngRow - 1, columnToMatch1) Then 'check col 2 row lngRow, lngRow-1
If .Cells(lngRow, columnToMatch2) = .Cells(lngRow - 1, columnToMatch2) Then 'check col 3 row lngRow, lngRow-1
If .Cells(lngRow - 1, columnToConcatenate) = .Cells(lngRow, columnToConcatenate) Then
Else
.Cells(lngRow - 1, columnToConcatenate) = .Cells(lngRow - 1, columnToConcatenate) & "; " & .Cells(lngRow, columnToConcatenate)
End If
.Rows(lngRow).Delete
End If
End If
lngRow = lngRow - 1
Loop Until lngRow = 1
End With
The actual output is wrong because when the merge of cells t3 will not match t3; t6, my comparison on col 4 will only work in a very simple case.
Actual output
key ID CH Text
no 15 1 t2; t88
yes 15 2 t3; t3; t6
So I had to add these two sections to separate the Concatenate cells and then remove the duplicates
'split cell in Col d to col e+ delimited by ;
With Range("D2:D6", Range("D" & Rows.Count).End(xlUp))
.Replace ";", " ", xlPart
.TextToColumns other:=True
End With
'remove duplicates in each row
Dim x, y(), i&, j&, k&, s$
With ActiveSheet.UsedRange
x = .Value: ReDim y(1 To UBound(x, 1), 1 To UBound(x, 2))
For i = 1 To UBound(x)
For j = 1 To UBound(x, 2)
If Len(x(i, j)) Then
If InStr(s & "|", "|" & x(i, j) & "|") = 0 Then _
s = s & "|" & x(i, j): k = k + 1: y(i, k) = x(i, j)
End If
Next j: s = vbNullString: k = 0
Next i
.Value = y()
End With
End Sub
With additional code output
Pub ID CH Ref
no 15 1 t2 t88
yes 15 2 t3 t6
Question: There must be a much easier way to get it right than using three different methods? How about inserting new 5+ columns if the col 4 items don't match?
Note. Remove duplicate code was found on user nilem at excelforum.
Edit: Col 1 will always be the same if Col 2 and 3 are the same. If the solution is much simpler, we can assume that Col 1 is empty and ignores the data.
I have a book lookup table and need to convert to a simple format that will be used in hardware using a 1960 language that has very limited commands. I am trying to preformat this data, so I only need to search for one line that has all the information.
The final Col D output could be in Column D with a delimiter, or Column DK (8 max Ref only) because I will be parsed for use on another machine. Whichever method is simpler.
source to share
The canonical practice for deleting lines is to start at the bottom and work in an upward direction. This way no lines are skipped. The trick here is to find the rows above the current position that match columns B and C, and concatenate the rows from column D before deleting the row. There are some good worksheet formulas that can get the row number in two columns. Applying one of them in practice with the help application.Evaluate
seems to be the most appropriate method for collecting values from column D.
Sub dedupe_and_collect()
Dim rw As Long, mr As Long, wsn As String
With ActiveSheet '<- set this worksheet reference properly!
wsn = .Name
With .Cells(1, 1).CurrentRegion
.RemoveDuplicates Columns:=Array(2, 3, 4), Header:=xlYes
End With
With .Cells(1, 1).CurrentRegion 'redefinition after duplicate removal
For rw = .Rows.Count To 2 Step -1 'walk backwards when deleting rows
If Application.CountIfs(.Columns(2), .Cells(rw, 2).Value, .Columns(3), .Cells(rw, 3).Value) > 1 Then
mr = Application.Evaluate("MIN(INDEX(ROW(1:" & rw & ")+(('" & wsn & "'!B1:B" & rw & "<>'" & wsn & "'!B" & rw & ")+('" & wsn & "'!C1:C" & rw & "<>'" & wsn & "'!C" & rw & "))*1E+99, , ))")
'concatenate column D
'.Cells(mr, 4) = .Cells(mr, 4).Value & "; " & .Cells(rw, 4).Value
'next free column from column D
.Cells(mr, Columns.Count).End(xlToLeft).Offset(0, 1) = .Cells(rw, 4).Value
.Rows(rw).EntireRow.Delete
End If
Next rw
End With
End With
End Sub
Removing records when matching with three columns is done using the VBA equivalent in Date ► Data Tools ► Remove Duplicates. This only counts columns B, C and D and removes the bottom duplicates (keeping close to row 1). If column A is important in this regard, additional coding should be added.
It is not clear to me if you want Column D as a dividing row or as separate cells as the end result. Could you clarify?
source to share
As I wrote above, I will iterate over the data and collect things into a custom object. There is no need to sort the data in this method; and the duplicate REF
will be omitted.
One of the benefits of a user-defined object is that it makes debugging easier because you can see more clearly what you've done.
We concatenate each line where ID
both CH
match, using a property of the Collection object to raise the error if identical keys are used.
Since combining Refs in one cell with a delimiter, versus single cells in D: K columns, can be done simply. I decided to split into columns, but changing it to join into one column would be trivial.
After installing the class module, you need to rename it: cID_CH
You will notice that I have placed the results on separate sheets. You can overwrite the original data, but I would suggest doing that.
Class module
Option Explicit
Private pID As Long
Private pCH As Long
Private pPUB As String
Private pREF As String
Private pcolREF As Collection
Public Property Get ID() As Long
ID = pID
End Property
Public Property Let ID(Value As Long)
pID = Value
End Property
Public Property Get CH() As Long
CH = pCH
End Property
Public Property Let CH(Value As Long)
pCH = Value
End Property
Public Property Get PUB() As String
PUB = pPUB
End Property
Public Property Let PUB(Value As String)
pPUB = Value
End Property
Public Property Get REF() As String
REF = pREF
End Property
Public Property Let REF(Value As String)
pREF = Value
End Property
Public Property Get colREF() As Collection
Set colREF = pcolREF
End Property
Public Sub ADD(refVAL As String)
On Error Resume Next
pcolREF.ADD refVAL, refVAL
On Error GoTo 0
End Sub
Private Sub Class_Initialize()
Set pcolREF = New Collection
End Sub
Regular module
Option Explicit
Sub CombineDUPS()
Dim wsSRC As Worksheet, wsRES As Worksheet
Dim vSRC As Variant, vRES() As Variant, rRES As Range
Dim cI As cID_CH, colI As Collection
Dim I As Long, J As Long
Dim S As String
'Set source and results worksheets and results range
Set wsSRC = Worksheets("sheet1")
Set wsRES = Worksheets("sheet2")
Set rRES = wsRES.Cells(1, 1)
'Get Source data
With wsSRC
vSRC = .Range("A2", .Cells(.Rows.Count, "D").End(xlUp))
End With
'Collect and combine data
Set colI = New Collection
On Error Resume Next
For I = 1 To UBound(vSRC, 1)
Set cI = New cID_CH
With cI
.PUB = vSRC(I, 1)
.ID = vSRC(I, 2)
.CH = vSRC(I, 3)
.REF = vSRC(I, 4)
.ADD .REF
S = CStr(.ID & "|" & .CH)
colI.ADD cI, S
If Err.Number = 457 Then
Err.Clear
colI(S).ADD .REF
ElseIf Err.Number <> 0 Then
Debug.Print Err.Number, Err.Description
Stop
End If
End With
Next I
On Error GoTo 0
'Create and populate Results Array
ReDim vRES(0 To colI.Count, 1 To 11)
'Header row
vRES(0, 1) = "Pub"
vRES(0, 2) = "ID"
vRES(0, 3) = "CH"
vRES(0, 4) = "Ref"
'populate array
For I = 1 To colI.Count
With colI(I)
vRES(I, 1) = .PUB
vRES(I, 2) = .ID
vRES(I, 3) = .CH
For J = 1 To .colREF.Count
vRES(I, J + 3) = .colREF(J)
Next J
End With
Next I
'Write the results to the worksheet
Set rRES = rRES.Resize(UBound(vRES, 1) + 1, UBound(vRES, 2))
With rRES
.EntireColumn.Clear
.Value = vRES
With .Rows(1)
.Font.Bold = True
.HorizontalAlignment = xlCenter
Range(.Cells(4), .Cells(11)).HorizontalAlignment = xlCenterAcrossSelection
End With
.EntireColumn.AutoFit
End With
End Sub
Original
Processed results
source to share
using the dictionary below
Sub test()
Dim Dic As Object: Set Dic = CreateObject("Scripting.Dictionary")
Dic.Comparemode = vbTextCompare
Dim Cl As Range, x$, y$, i&, Key As Variant
For Each Cl In Range("B2:B" & Cells(Rows.Count, "B").End(xlUp).Row)
x = Cl.Value & "|" & Cl.Offset(, 1).Value
y = Cl.Offset(, 2).Value
If Not Dic.exists(x) Then
Dic.Add x, Cl.Offset(, -1).Value & "|" & y & "|"
ElseIf Dic.exists(x) And Not LCase(Dic(x)) Like "*|" & LCase(y) & "|*" Then
Dic(x) = Dic(x) & "|" & y & "|"
End If
Next Cl
Range("A2:D" & Cells(Rows.Count, "B").End(xlUp).Row).ClearContents
i = 2
For Each Key In Dic
Cells(i, "A") = Split(Dic(Key), "|")(0)
Range(Cells(i, "B"), Cells(i, "C")) = Split(Key, "|")
Cells(i, "D") = Replace(Split(Replace(Dic(Key), "||", ";"), "|")(1), ":", ";")
i = i + 1
Next Key
Set Dic = Nothing
End Sub
before
after
source to share