Reading a multi-column TSV file with line numbers in Python

What's the cleanest way to read a multi-column TSV file with headers in Python, where the first column has no header and instead contains the line number of each row?

This appears to be a common format for files exported from R data frames.

Example:

    A      B  C
1   a1     b1 c1
2   a2     b2 c2
3   a3     b3 c3

      

Any ideas?

+5




3 answers


It depends on what you want to do with the data afterwards (and on whether the file really is a TSV delimited by \t). If you just want each row as a list, you can use the csv module like this:

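import csv

# newline="" lets the csv module handle line endings itself
with open("tsv.tsv", newline="") as tsvfile:
    tsvreader = csv.reader(tsvfile, delimiter="\t")
    for line in tsvreader:
        # skip the first field, which holds the row number on data rows
        print(line[1:])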
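If you also want the column names lined up with the values, here is a minimal sketch built on the same csv module. The function name read_numbered_tsv and the in-memory sample are made up for illustration, and it assumes the row numbers are plain integers. It copes both with a header that starts with an empty cell and with a header that is simply one field shorter than the data rows.

```python
import csv
import io


def read_numbered_tsv(handle):
    """Return (column_names, rows), where rows maps row number -> {column: value}.

    Hypothetical helper: assumes tab-delimited input whose first field on
    every data row is an integer row number.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    # Some files keep an empty cell above the row numbers; drop it if present.
    if header and header[0] == "":
        header = header[1:]
    rows = {}
    for fields in reader:
        if not fields:  # skip blank lines
            continue
        rows[int(fields[0])] = dict(zip(header, fields[1:]))
    return header, rows


# Made-up sample in the layout from the question.
sample = "\tA\tB\tC\n1\ta1\tb1\tc1\n2\ta2\tb2\tc2\n3\ta3\tb3\tc3\n"
columns, rows = read_numbered_tsv(io.StringIO(sample))
print(columns)
print(rows[2])
```

With a real file you would pass the handle from open("tsv.tsv", newline="") instead of the StringIO object.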

      

However, I would also recommend the DataFrame class from pandas for anything beyond simple Python operations. It can be used like this:



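from pandas import DataFrame
# DataFrame.from_csv only exists in old pandas releases (it was removed in 1.0);
# on current versions use pandas.read_csv("tsv.tsv", sep="\t", index_col=0).
df = DataFrame.from_csv("tsv.tsv", sep="\t")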

      

DataFrames allow high-level manipulation of datasets, such as adding columns, computing averages, etc.
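As a rough sketch of that kind of manipulation, assuming pandas is installed and recent enough that read_csv is the function to use: the sample values and the A_plus_B column name are invented for illustration, and index_col=0 tells pandas to treat the unnamed first column (the row numbers) as the index.

```python
import io

import pandas as pd

# Made-up numeric sample in the question's layout:
# the header cell above the row numbers is empty.
sample = "\tA\tB\tC\n1\t1.0\t2.0\tx\n2\t3.0\t4.0\ty\n3\t5.0\t6.0\tz\n"

# index_col=0 makes the first column (the row numbers) the index.
df = pd.read_csv(io.StringIO(sample), sep="\t", index_col=0)

# Add a derived column and compute an average.
df["A_plus_B"] = df["A"] + df["B"]
mean_a = df["A"].mean()
print(df)
print(mean_a)
```

For a file on disk, replace the StringIO object with the path, e.g. "tsv.tsv".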

+17




How about using plain Python, like this:



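with open('tsvfilename') as f:
    # iterate over the file directly so a missing trailing newline loses no data
    for i, line in enumerate(f):
        if i == 0:  # header: column names only, nothing for the row-number column
            column_names = line.split()
            # ...
        else:
            # split() breaks on any whitespace; use line.rstrip('\n').split('\t')
            # if the fields themselves may contain spaces
            data = line.split()  # data[0] is the row number
            # ...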

      

+1




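df = DataFrame.from_csv("tsv.tsv", sep="\t")

is outdated: DataFrame.from_csv was deprecated and has since been removed from pandas. Use the module-level read_csv function instead:

import pandas as pd
df = pd.read_csv("tsv.tsv", sep="\t", index_col=0)

Here index_col=0 makes the first column (the row numbers) the index, which is what from_csv did by default.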

0

