Excel - Counting unique values matching IDs optimized for 100,000+ cases
regarding excel screen capture below, i am looking for a formula solution that counts the number of unique values in column B (color) for each id number in column A.
I imputed the desired result into column C. So, for example, ID 1 (A2) has only one unique color, Gray (B2), which will return 1 in C2. ID 2 has only one unique color, yellow (B3, B4) and returns 1 in C3 and C4. ID 3, has two unique colors, blue and purple, which returns 2 in C5-C8. Etc.
Since this will work for nearly 100,000 rows, many of the index and / or collation based solutions I have come across go far ahead to compute. I found that the ID values, which are all in ascending order, can be used to speed things up by running the formula with = IF (A2 = A1 or something like that. Thanks in advance to anyone who has some ideas on how to Solve this using a lean formula.
Note. I am working with files that also have about 100 columns. A solution that doesn't require helper columns would be ideal.
EDIT / ADDITION: My main data file has instances of blank cells in column B. Is there a way to ignore blank cells when counting the results in column C?
source to share
Here is a VBA routine that should run quickly for this number of records. We create a class module (custom object) that consists of a collection (Dictionary) of colors associated with each identifier and a count of that color. (There isn't really a need for counting, but it's trivial to add it if you want it for some other purpose, and also as a demonstration of what can be done.)
We then output the results in the adjacent column as shown in the screenshot. Results can be displayed elsewhere, even on a different sheet, with minor code changes.
Be sure to read the notes at the beginning of the modules for important information and proper configuration.
Class module
Option Explicit
'RENAME this module: cID
Private pID As String
Private pColor As String
Private pColors As Dictionary
Public Property Get ID() As String
ID = pID
End Property
Public Property Let ID(Value As String)
pID = Value
End Property
Public Property Get Color() As String
Color = pColor
End Property
Public Property Let Color(Value As String)
pColor = Value
End Property
Public Property Get Colors() As Dictionary
Set Colors = pColors
End Property
Public Function ADDColor(Value As String)
'Might as well also count # of times this color assigned
If Not pColors.Exists(Value) Then
pColors.Add Key:=Value, Item:=1
Else
pColors(Value) = pColors(Value) + 1
End If
End Function
Private Sub Class_Initialize()
Set pColors = New Dictionary
End Sub
Regular module
EDIT (edited to exclude counting for blank lines)
Option Explicit
'Set reference to Microsoft Scripting Runtime (Tools/References)
Sub IDColorCount()
Dim cID As cID, dID As Dictionary
Dim wsData As Worksheet, rData As Range
Dim vData As Variant, vRes As Variant
Dim I As Long
'Set the data worksheet and range
'Read the data into an array for faster calculations
Set wsData = Worksheets("sheet1")
With wsData
Set rData = .Range(.Cells(1, 1), .Cells(.Rows.Count, 1).End(xlUp)).Resize(columnsize:=2)
vData = rData
End With
'Go through the data and collect the information
Set dID = New Dictionary
For I = 2 To UBound(vData, 1)
If Not vData(I, 1) = "" Then
Set cID = New cID
With cID
.ID = vData(I, 1)
.Color = vData(I, 2)
.ADDColor .Color
If Not dID.Exists(.ID) Then
dID.Add Key:=.ID, Item:=cID
Else
dID(.ID).ADDColor .Color
End If
End With
End If
Next I
'Size the results array
ReDim vRes(1 To UBound(vData), 1 To 1)
vRes(1, 1) = "Count"
For I = 2 To UBound(vData, 1)
If Not vData(I, 1) = "" Then _
vRes(I, 1) = dID(CStr(vData(I, 1))).Colors.Count
Next I
'The results can be written anyplace
With rData.Offset(0, 2).Resize(columnsize:=1)
.EntireColumn.Clear
.Value = vRes
End With
End Sub
source to share
Ok, I solved your problem using this array formula:
=SUM(IF(FREQUENCY(IF($A$2:$A$21=A2,MATCH($B$2:$B$21,$B$2:$B$21,0),""),MATCH($B$2:$B$21,$B$2:$B$21,0))>0,1))
Place this formula in a cell C2
and press CTRL+ SHIFT+ ENTER, then drag the formula down. If have any questions, comment below
source to share