read_csv converters for columns not known in advance

Question

I am trying to read a CSV file that contains multiple values in each cell, and I want to encode them into a single byte-formatted value to be stored in a pandas cell (e.g. (1, 1) -> 771). For this, I would like to use the converters parameter of read_csv. The problem is that I don't know the column names beforehand, and the value passed to converters must be a dict with the column names as keys. What I actually want is to convert all columns with the same converter function. For this, it would be better to write:

read_csv(fhand, converter=my_encoding_function)

than:

read_csv(fhand, converters={'col1': my_encoding_function,
                            'col2': my_encoding_function,
                            'col3': my_encoding_function})

Is this possible? Right now, to solve the problem I am doing:

import numpy
from pandas import read_csv

dataframe = read_csv(fhand)
enc_func = numpy.vectorize(encoder.encode_genotype)
dataframe = dataframe.apply(enc_func, axis=1)

But I guess this approach may be less efficient. By the way, I have similar doubts about the formatters used by the to_string method.

pandas csv

Jose Blanca 07 Mar 12 at 7:56

1 answer

Wes McKinney · Accepted Answer · 2012-03-07T20:04:31+0000

Instead of names, you can pass integer column positions (0, 1, 2) as the keys. From the docstring:

converters : dict, optional
    Dict of functions for converting values in certain columns. Keys can either
    be integers or column labels
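
A minimal sketch of how integer keys could be used to apply one converter to every column when the names are not known in advance. The encoder `encode_pair`, the "a/b" cell format, and the sample data are hypothetical stand-ins, not the asker's actual encoding; the column count is obtained by reading only the header first.

```python
import io

import pandas as pd


def encode_pair(cell):
    """Hypothetical encoder: pack a cell like "1/0" into a single integer."""
    a, b = (int(part) for part in cell.split("/"))
    return a * 256 + b


# Sample data standing in for the real file (assumed format).
csv_text = "s1,s2,s3\n1/1,0/1,1/0\n0/0,1/1,0/1\n"

# Read only the header row to learn how many columns there are.
n_cols = len(pd.read_csv(io.StringIO(csv_text), nrows=0).columns)

# Integer keys refer to column positions, so no column names are needed.
converters = {i: encode_pair for i in range(n_cols)}

df = pd.read_csv(io.StringIO(csv_text), converters=converters)
print(df)
```

With a real file handle instead of a fresh StringIO, the handle would need to be rewound (for example with fhand.seek(0)) or reopened between the header read and the full read.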
