A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array

Question

The code looks like this. I am trying to train GBRT regression trees on this data. The same data works well with other classifiers, but GBRT raises the error in the title. Please help:

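import scipy.sparse as sp
from sklearn.datasets import load_files
from sklearn.feature_extraction.text import TfidfVectorizer

dataset = load_files('train')
vectorizer = TfidfVectorizer(encoding='latin1')
# fit_transform returns a scipy.sparse matrix, not a dense array
X_train = vectorizer.fit_transform((open(f).read() for f in dataset.filenames))
assert sp.issparse(X_train)
print("n_samples: %d, n_features: %d" % X_train.shape)
y_train = dataset.target

def benchmark(clf_class, params, name):
    # X_train is still sparse here; this is the call that fails for GBRT
    clf = clf_class(**params).fit(X_train, y_train)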

python scikit-learn

Dhananjay Ambekar May 28 '15 @ 9:18 am

2 answers

Chung-yen hung · Answer 1 · 2015-05-28T09:44:39+0000

GBRT in sklearn requires X (the training data) to be array-like, not a sparse matrix: sklearn-gbrt

Hope this helps you!
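
The conversion described in this answer can be sketched roughly as follows. This is only an illustration: the toy corpus, the target values and the estimator settings are made up for the example, and it assumes the dense matrix fits in memory (for a large TF-IDF vocabulary it may not).

```python
import scipy.sparse as sp
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus standing in for the files under 'train'.
docs = ["spam spam eggs", "eggs and ham", "ham ham spam", "green eggs"]
y_train = [0.0, 1.0, 0.5, 1.0]

# TfidfVectorizer produces a scipy.sparse matrix.
X_sparse = TfidfVectorizer().fit_transform(docs)
assert sp.issparse(X_sparse)

# toarray() returns a dense numpy.ndarray with the same shape and values.
X_dense = X_sparse.toarray()

# Small, arbitrary settings chosen only to keep the example fast.
clf = GradientBoostingRegressor(n_estimators=10, random_state=0)
clf.fit(X_dense, y_train)
predictions = clf.predict(X_dense)
```

In the question's code, the equivalent change would be passing X_train.toarray() instead of X_train to fit inside benchmark.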

Peiqin · Answer 2 · 2016-05-16T07:28:29+0000

I faced the same problem while trying to train GradientBoostingClassifier on data loaded with load_svmlight_files. Solved by converting the sparse matrix to a dense one:

X_train.todense()
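
One detail worth noting: for a scipy.sparse matrix such as csr_matrix, todense() returns a numpy.matrix, while toarray() (the method named in the error message) returns a plain numpy.ndarray. The small sketch below, using a made-up 2x2 matrix, shows the difference. Both hold the same values.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical tiny sparse matrix standing in for X_train.
X_train = sp.csr_matrix(np.array([[0.0, 1.5], [2.0, 0.0]]))

as_matrix = X_train.todense()  # numpy.matrix
as_array = X_train.toarray()   # numpy.ndarray

print(type(as_matrix).__name__)
print(type(as_array).__name__)

# The contents are identical; only the container type differs.
same_values = np.array_equal(np.asarray(as_matrix), as_array)
```

If a plain array is wanted, toarray() is probably the more direct choice.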
