Knn predictions with clustering

I have a dataset with 60,000 observations and 40 variables, on which I used CLARA (`clara()` from the cluster package), mainly because of memory constraints.

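library(cluster)
library(dplyr)

# Label each row as "normal" or "attack"
kddnew <- mutate(kddnew, Att = ifelse(Class == "normal", "normal", "attack"))
# Drop columns 20, 21 and 40 before clustering
ds <- dat[, c(-20, -21, -40)]

clus <- clara(ds, 3, samples = 500, sampsize = 100, pamLike = TRUE)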

This returned an object containing the medoids (`clus$medoids`).
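It helps to look at the shape of what `clara()` returns. A minimal sketch, assuming simulated data of the same size as in the question (60,000 rows; the 10 random normal columns are hypothetical and purely illustrative): `medoids` has one row per cluster, while `clustering` has one entry per observation.

```r
library(cluster)

# Hypothetical stand-in data: 60,000 rows, 10 numeric columns
set.seed(42)
sim <- matrix(rnorm(60000 * 10), ncol = 10)

# CLARA runs PAM on `samples` random subsamples of `sampsize` rows each and keeps
# the best medoids, so it never builds a full 60,000 x 60,000 dissimilarity matrix
clus_sim <- clara(sim, k = 3, samples = 50, sampsize = 100, pamLike = TRUE)

dim(clus_sim$medoids)        # one row per medoid, one column per variable
length(clus_sim$clustering)  # one cluster label per observation
```

The medoid matrix is tiny compared with the data, which is exactly what becomes relevant when choosing what to pass to `knn()`.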

Now I am trying to use `knn()` (from the class package) to make a prediction like this:

medoidz <- clus$medoids
r <- knn(medoidz, ds, cl=ds$targetvariable)

It fails with the error:

'train' and 'class' have different lengths
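`class::knn()` requires one label in `cl` for every row of `train`. A small sketch that reproduces the error on the built-in `iris` data (used here only as a hypothetical stand-in for the question's dataset): the medoid matrix has 3 rows, but the label vector has 150 entries.

```r
library(cluster)
library(class)

feats <- iris[, 1:4]  # numeric features only
clus  <- clara(feats, 3, samples = 5, sampsize = 50, pamLike = TRUE)

nrow(clus$medoids)    # number of training rows passed to knn()
length(iris$Species)  # number of labels passed as cl

# Mismatched train/cl: knn() stops with a length error, captured here as text
res <- tryCatch(knn(train = clus$medoids, test = feats, cl = iris$Species),
                error = function(e) conditionMessage(e))
print(res)
```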

Can anyone shed some light on how to use it?


1 answer


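The error occurs because `knn()` expects one label in `cl` for each row of `train`: `clus$medoids` has only 3 rows, while `ds$targetvariable` has one value per observation. It works if you train on the observations themselves and use the cluster assignments (`clus$clustering`) as the labels: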

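library(cluster)  # clara()
library(class)    # knn()

data(iris)
ds   <- iris
ds$y <- as.numeric(ds$Species)  # numeric label, not used as a feature
ds$Species <- NULL

# Random 60/40 split into training and test rows
set.seed(1)
idx      <- sample(nrow(ds), round(0.6 * nrow(ds)))
training <- ds[idx, ]
testing  <- ds[-idx, ]
x        <- training[, names(training) != "y"]  # features only
y        <- training$y
x1       <- testing[, names(testing) != "y"]
y1       <- testing$y

# Cluster the training rows; clus$clustering has one label per row of x
clus <- clara(x, 3, samples = 1, sampsize = nrow(x), pamLike = TRUE)

# Give each test row the majority cluster among its 10 nearest training rows
knn(train = x, test = x1, cl = clus$clustering, k = 10, l = 0, prob = TRUE, use.all = TRUE)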
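If you only need to assign new observations to existing clusters, an alternative that stays closer to the question's idea of using the medoids is to treat each medoid as a single labelled training point and run 1-nearest-neighbour. This is a sketch, not the answer's method, assuming `clara()`'s defaults (Euclidean metric, no standardization), under which it mirrors how `clara()` assigns each observation to its nearest medoid. `i.med` is the component of a `clara` result holding the medoids' row numbers; the `iris` split and the `nearest_medoid` helper are hypothetical.

```r
library(cluster)
library(class)

set.seed(1)
feats <- iris[, 1:4]
idx   <- sample(nrow(feats), 90)
train <- feats[idx, ]
test  <- feats[-idx, ]

clus <- clara(train, 3, samples = 1, sampsize = nrow(train), pamLike = TRUE)

# Cluster label of each medoid (i.med holds the medoids' row numbers in `train`)
medoid_labels <- clus$clustering[clus$i.med]

# 1-NN against the medoids: each row gets the label of its closest medoid
nearest_medoid <- function(newdata) {
  knn(train = clus$medoids, test = newdata, cl = medoid_labels, k = 1)
}

pred_test <- nearest_medoid(test)
table(pred_test)

# On the training rows this should (up to distance ties) match clara's own assignment
agree <- mean(as.integer(as.character(nearest_medoid(train))) == clus$clustering)
agree
```

Compared with `k = 10` on all training rows, this needs only the k medoids at prediction time, which may matter with 60,000 observations.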

Note, though, that 3 is a poor choice for the number of clusters in this dataset, so the predictions are not very good. Make sure you choose an appropriate number of clusters for your data; you can assess how well a clustering generalizes with `prediction.strength()` from the fpc package, or with other methods.
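As one of those other methods (a different criterion, not prediction strength), you could compare the average silhouette width of `clara()` solutions for several values of k, using `silhouette()` from the cluster package. A sketch on the `iris` features, assuming the full distance matrix fits in memory: that holds for 150 rows but not for 60,000, where you would compute it on a random subsample instead. The range `2:6` is an arbitrary illustrative choice.

```r
library(cluster)

feats <- as.matrix(iris[, 1:4])
d     <- dist(feats)  # full Euclidean distances: fine for 150 rows, not for 60,000

ks <- 2:6
avg_width <- sapply(ks, function(k) {
  cl  <- clara(feats, k, samples = 5, sampsize = 50, pamLike = TRUE)$clustering
  sil <- silhouette(cl, d)  # one row per observation: cluster, neighbor, sil_width
  mean(sil[, "sil_width"])
})
names(avg_width) <- ks
avg_width
ks[which.max(avg_width)]  # k with the largest average silhouette width
```

Larger average widths indicate better-separated clusters; it is a heuristic and can disagree with prediction strength.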
