A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array

The code is below. I am trying to use the training data for GBRT regression trees. The same data works well with other classifiers, but gives the above error for GBRT. Please help:

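import scipy.sparse as sp
from sklearn.datasets import load_files
from sklearn.feature_extraction.text import TfidfVectorizer

dataset = load_files('train')
vectorizer = TfidfVectorizer(encoding='latin1')
X_train = vectorizer.fit_transform((open(f).read() for f in dataset.filenames))
assert sp.issparse(X_train)
print("n_samples: %d, n_features: %d" % X_train.shape)
y_train = dataset.target

def benchmark(clf_class, params, name):
    # This is the call that raises the error for GBRT: X_train is still sparse here
    clf = clf_class(**params).fit(X_train, y_train)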

      



2 answers


GBRT in sklearn requires X (the training data) to be array-like, not a sparse matrix: sklearn-gbrt
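To make that concrete, here is a minimal sketch of the workaround. The tiny corpus, the labels, and the `n_estimators=10` setting are made up for illustration and stand in for the files loaded with `load_files('train')`. The only step that matters is calling `toarray()` on the TF-IDF output before fitting.

```python
import scipy.sparse as sp
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus; in the question this comes from load_files('train').
docs = [
    "cheap pills buy now",
    "buy cheap watches now",
    "meeting agenda for monday",
    "monday project meeting notes",
]
labels = [1, 1, 0, 0]

vectorizer = TfidfVectorizer()
X_sparse = vectorizer.fit_transform(docs)  # TfidfVectorizer returns a scipy sparse matrix
assert sp.issparse(X_sparse)

# Convert to a dense numpy.ndarray before fitting, as the error message suggests.
X_dense = X_sparse.toarray()

clf = GradientBoostingClassifier(n_estimators=10, random_state=0)
clf.fit(X_dense, labels)

# New documents go through the same vectorizer and the same conversion.
X_new = vectorizer.transform(["buy pills now"]).toarray()
pred = clf.predict(X_new)
print(pred.shape)
```

Inside a `benchmark` function like the one in the question, the same idea would be to pass the converted array to `fit` instead of the sparse `X_train`.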



Hope this helps you!



I faced the same problem while trying to train GradientBoostingClassifier using data loaded with load_svmlight_files. Solved by converting the sparse matrix to a numpy array.



X_train = X_train.toarray()  # toarray() returns an ndarray; todense() returns a numpy.matrix
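One thing worth checking before converting: a dense array stores every entry, zeros included, so a wide TF-IDF matrix can take far more memory dense than sparse. A rough way to estimate the dense size from the shape is sketched below. The helper name `dense_size_bytes` and the example shape are hypothetical, and 8 bytes per entry assumes float64 values.

```python
def dense_size_bytes(n_samples, n_features, itemsize=8):
    """Bytes needed for a dense array of this shape (itemsize=8 assumes float64)."""
    return n_samples * n_features * itemsize

# Hypothetical shape, for illustration only: 2,000 documents, 50,000 features.
size = dense_size_bytes(2000, 50000)
print("%.2f GiB" % (size / 2.0 ** 30))
```

With a real matrix, the arguments would be the two values of `X_train.shape`.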

      
