Target column encoding for classification in TensorFlow

Question

I've been working with TensorFlow for some time, but one thing I can't figure out is how to encode a categorical target column for a tf.contrib.learn model.

I know that we define an input function similar to the code below:

def input_fn(joined):
    continuous_cols = {k: tf.constant(joined[k].values)
                     for k in CONTINUOUS_COLUMNS}

    categorical_cols = {k: tf.SparseTensor(
      indices=[[i, 0] for i in range(joined[k].size)],
      values=joined[k].values,
      dense_shape=[joined[k].size, 1])
                      for k in CATEGORICAL_COLUMNS}

    # Merges the two dictionaries into one (does not require hashable values).
    feature_cols = {**continuous_cols, **categorical_cols}
    target = tf.constant(joined[target_col].values)
    return feature_cols, target

def train_input_fn():
    return input_fn(train_frame)
def test_input_fn():
    return input_fn(test_frame)

This works well for binary classification, or when the target variable is pre-encoded with LabelEncoder or a similar method. But how do I encode this variable within TensorFlow so that tf.contrib.learn accepts it?

I tried changing the code for the target column as follows:

target = tf.SparseTensor(
      indices=[[i, 0] for i in range(joined[target_col].size)],
      values=joined[target_col].values,
      dense_shape=[joined[target_col].size, 1])

Since the target is a string variable, I thought a sparse tensor would work, but this gives an error:

ValueError: SparseTensor is not supported.

Can anyone help me determine what I should use in the input function for the categorical target variable with DNNClassifier?

python tensorflow

Anmol, Apr 25 '17 at 11:58

1 answer

user1454804 · Accepted Answer · 2017-05-02T23:30:49+0000

DNNClassifier supports labels as dense (non-sparse) tensors. The expected label looks like [[4], [5], [1]], where each value is an integer class id. If you have string labels, you can use the label_keys argument.
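To make the expected label shape concrete, here is a minimal sketch in plain Python of turning string labels into a column of integer class ids. The helper names (build_label_vocab, encode_labels) and the sample labels are hypothetical and chosen only for illustration. The TensorFlow calls appear only in comments, since tf.contrib.learn is specific to TensorFlow 1.x.

```python
def build_label_vocab(labels):
    """Return the sorted list of distinct string labels (hypothetical helper)."""
    return sorted(set(labels))


def encode_labels(labels, vocab):
    """Map each string label to its index in vocab, as a column [[id], ...]."""
    index = {label: i for i, label in enumerate(vocab)}
    return [[index[label]] for label in labels]


# Hypothetical target column; in the question this would be
# joined[target_col].values.
raw_labels = ["cat", "dog", "bird", "dog"]

vocab = build_label_vocab(raw_labels)
label_ids = encode_labels(raw_labels, vocab)

# Option 1 (pre-encoded ids): inside input_fn, replace the SparseTensor with
#   target = tf.constant(label_ids)   # dense integer tensor of shape [n, 1]
#
# Option 2 (per the answer): keep the string labels as a dense tensor and
# pass the vocabulary to the classifier, assuming a TF 1.x version whose
# DNNClassifier accepts label_keys:
#   target = tf.constant(joined[target_col].values)
#   tf.contrib.learn.DNNClassifier(..., n_classes=len(vocab), label_keys=vocab)
```

Either way, the target is a dense tensor rather than a SparseTensor, which is what the ValueError in the question is complaining about.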
