kNN predictions with clustering
I have a dataset with 60,000 observations and 40 variables, on which I ran clara, mainly because of memory constraints.
library(cluster)
library(dplyr)
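# Collapse every non-"normal" class into a single "attack" label and keep the result
dat <- mutate(kddnew, Att = ifelse(Class == "normal", "normal", "attack"))
# Drop columns 20, 21 and 40 before clustering
ds <- dat[, c(-20, -21, -40)]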
clus <- clara(ds, 3, samples=500, sampsize=100, pamLike=TRUE)
This returned a table of medoids.
Now I am trying to use knn to make a prediction, like this:
medoidz <- clus$medoids
r <- knn(medoidz, ds, cl=ds$targetvariable)
And it returns this error:
'train' and 'class' are of different length
Can anyone shed some light on how to use it?
1 answer
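knn() needs cl to hold exactly one label per row of train. In your call, train is the medoid table (3 rows), while cl has one entry per row of ds, hence the error. This works: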
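library(cluster)
library(class)
data(iris)
ds <- iris
ds$y <- as.numeric(ds$Species)
ds$Species <- NULL
# Random 60/40 split; sample() returns valid, distinct row indices
set.seed(1)  # arbitrary seed, only for reproducibility
idx <- sample(nrow(ds), round(0.6 * nrow(ds)))
training <- ds[idx, ]
testing <- ds[-idx, ]
x <- training
y <- training$y
x1 <- testing
y1 <- testing$y
# Cluster the training rows; the cluster assignments become the class labels
clus <- clara(x, 3, samples = 1, sampsize = nrow(x), pamLike = TRUE)
# cl now has one label per row of train, as knn() requires
knn(train = x, test = x1, cl = clus$clustering, k = 10, l = 0, prob = TRUE, use.all = TRUE)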
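If you would rather keep the medoids as the training set, as in the question, here is a minimal sketch of that variant. It assumes you want each new row assigned to its nearest medoid; the 100-row split, the seed and the names feats, train, test and medoid_labels are made up for illustration.

```r
library(cluster)
library(class)

set.seed(42)                       # arbitrary seed, only for reproducibility
feats <- iris[, 1:4]               # numeric columns only
idx   <- sample(nrow(feats), 100)  # 100 rows for clustering, the rest held out
train <- feats[idx, ]
test  <- feats[-idx, ]

clus <- clara(train, 3, samples = 50, pamLike = TRUE)

# One label per medoid row, so 'train' and 'cl' have the same length
medoid_labels <- factor(seq_len(nrow(clus$medoids)))

# With k = 1, every test row receives the label of its nearest medoid
pred <- knn(train = clus$medoids, test = test, cl = medoid_labels, k = 1)
table(pred)
```

The labels here are just cluster numbers, not the original classes, so they still have to be mapped to whatever target you care about.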
Note, though, that 3 is clearly a bad choice for the number of clusters in this dataset, so the prediction is not good. Hopefully you can choose a suitable number of clusters for your data; you can check how well a clustering predicts using prediction.strength from the fpc package, or other methods.
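A rough sketch of how prediction.strength could be called on the iris features. The range of cluster numbers (2 to 5), the number of resampling splits M = 20 and the seed are arbitrary choices for illustration; claraCBI is the clara/pam interface that ships with fpc, and centroidname = "medoids" is passed explicitly so the medoids are used as centroids.

```r
library(fpc)

set.seed(42)                        # arbitrary seed, only for reproducibility
feats <- as.matrix(iris[, 1:4])     # numeric columns only

# Estimate prediction strength for 2..5 clusters using 20 random splits
ps <- prediction.strength(feats, Gmin = 2, Gmax = 5, M = 20,
                          clustermethod = claraCBI,
                          classification = "centroid",
                          centroidname = "medoids")

ps$mean.pred   # mean prediction strength per number of clusters
ps$optimalk    # largest number of clusters whose strength exceeds the cutoff
```

On your own data, replace feats with the numeric columns you clustered and widen Gmax if you expect more clusters.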