Excel - Counting unique values ​​matching IDs optimized for 100,000+ cases

regarding excel screen capture below, i am looking for a formula solution that counts the number of unique values ​​in column B (color) for each id number in column A.

I imputed the desired result into column C. So, for example, ID 1 (A2) has only one unique color, Gray (B2), which will return 1 in C2. ID 2 has only one unique color, yellow (B3, B4) and returns 1 in C3 and C4. ID 3, has two unique colors, blue and purple, which returns 2 in C5-C8. Etc.

Since this will work for nearly 100,000 rows, many of the index and / or collation based solutions I have come across go far ahead to compute. I found that the ID values, which are all in ascending order, can be used to speed things up by running the formula with = IF (A2 = A1 or something like that. Thanks in advance to anyone who has some ideas on how to Solve this using a lean formula.

Note. I am working with files that also have about 100 columns. A solution that doesn't require helper columns would be ideal.

EDIT / ADDITION: My main data file has instances of blank cells in column B. Is there a way to ignore blank cells when counting the results in column C?

enter image description here

+3


source to share


2 answers


Here is a VBA routine that should run quickly for this number of records. We create a class module (custom object) that consists of a collection (Dictionary) of colors associated with each identifier and a count of that color. (There isn't really a need for counting, but it's trivial to add it if you want it for some other purpose, and also as a demonstration of what can be done.)

We then output the results in the adjacent column as shown in the screenshot. Results can be displayed elsewhere, even on a different sheet, with minor code changes.

Be sure to read the notes at the beginning of the modules for important information and proper configuration.

Class module


Option Explicit
'RENAME this module:  cID

Private pID As String
Private pColor As String
Private pColors As Dictionary

Public Property Get ID() As String
    ID = pID
End Property
Public Property Let ID(Value As String)
    pID = Value
End Property

Public Property Get Color() As String
    Color = pColor
End Property
Public Property Let Color(Value As String)
    pColor = Value
End Property

Public Property Get Colors() As Dictionary
    Set Colors = pColors
End Property
Public Function ADDColor(Value As String)
    'Might as well also count # of times this color assigned
    If Not pColors.Exists(Value) Then
        pColors.Add Key:=Value, Item:=1
    Else
        pColors(Value) = pColors(Value) + 1
    End If
End Function

Private Sub Class_Initialize()
    Set pColors = New Dictionary
End Sub

      




Regular module

EDIT (edited to exclude counting for blank lines)


Option Explicit
'Set reference to Microsoft Scripting Runtime (Tools/References)

Sub IDColorCount()
    Dim cID As cID, dID As Dictionary
    Dim wsData As Worksheet, rData As Range
    Dim vData As Variant, vRes As Variant
    Dim I As Long

'Set the data worksheet and range
'Read the data into an array for faster calculations
Set wsData = Worksheets("sheet1")
With wsData
    Set rData = .Range(.Cells(1, 1), .Cells(.Rows.Count, 1).End(xlUp)).Resize(columnsize:=2)
    vData = rData
End With

'Go through the data and collect the information
Set dID = New Dictionary
For I = 2 To UBound(vData, 1)
  If Not vData(I, 1) = "" Then
    Set cID = New cID
    With cID
        .ID = vData(I, 1)
        .Color = vData(I, 2)
        .ADDColor .Color

        If Not dID.Exists(.ID) Then
            dID.Add Key:=.ID, Item:=cID
        Else
            dID(.ID).ADDColor .Color
        End If
    End With
  End If
Next I

'Size the results array
ReDim vRes(1 To UBound(vData), 1 To 1)
vRes(1, 1) = "Count"
For I = 2 To UBound(vData, 1)
    If Not vData(I, 1) = "" Then _
        vRes(I, 1) = dID(CStr(vData(I, 1))).Colors.Count
Next I

'The results can  be written anyplace
With rData.Offset(0, 2).Resize(columnsize:=1)
    .EntireColumn.Clear
    .Value = vRes
End With

End Sub

      


+3


source


Ok, I solved your problem using this array formula:

=SUM(IF(FREQUENCY(IF($A$2:$A$21=A2,MATCH($B$2:$B$21,$B$2:$B$21,0),""),MATCH($B$2:$B$21,$B$2:$B$21,0))>0,1))

      



Place this formula in a cell C2

and press CTRL+ SHIFT+ ENTER, then drag the formula down. If have any questions, comment below

+3


source







All Articles