Reading a multi-column TSV file with line numbers in Python
What's the cleanest way to read a multi-column TSV file with headers in Python, where the first column has no header and instead contains the line number of each row?
This appears to be a common format for files exported from R data frames.
Example:
A B C
1 a1 b1 c1
2 a2 b2 c2
3 a3 b3 c3
Any ideas?
It depends on what you want to do with the data afterwards (and on whether the file really is tab-delimited, i.e. separated by \t). If you just want the rows as a set of lists, you can use the csv module like this:
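import csv

# newline="" lets the csv module handle line endings itself
with open("tsv.tsv", newline="") as tsvfile:
    tsvreader = csv.reader(tsvfile, delimiter="\t")
    for line in tsvreader:
        # skip the first field, which holds the row number
        print(line[1:])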
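One caveat with slicing every line: depending on how the file was written from R, the header row may have one field fewer than the data rows, in which case `line[1:]` would also drop the first column name. A minimal sketch that handles both layouts is below. The `read_r_tsv` helper and the `SAMPLE` string are made up for illustration, and the sample assumes real tab separators.

```python
import csv
import io

# Sample data in the question's layout, assuming real tab separators and
# a header row that has no field for the row-number column.
SAMPLE = "A\tB\tC\n1\ta1\tb1\tc1\n2\ta2\tb2\tc2\n3\ta3\tb3\tc3\n"


def read_r_tsv(handle):
    """Return (header, rows) with the leading row-number column removed."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    rows = [line for line in reader if line]  # ignore blank lines
    # If the header is as wide as the data rows, its first field is only a
    # placeholder for the row-number column, so drop it as well.
    if rows and len(header) == len(rows[0]):
        header = header[1:]
    return header, [row[1:] for row in rows]


header, rows = read_r_tsv(io.StringIO(SAMPLE))
# Pair each value with its column name.
records = [dict(zip(header, row)) for row in rows]
print(header)
print(records[0])
```

For a real file, pass the object returned by `open("tsv.tsv", newline="")` instead of the `io.StringIO` wrapper.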
However, I would also recommend the DataFrame class from pandas for anything beyond simple Python operations. It can be used like this:
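import pandas as pd

# DataFrame.from_csv was removed in pandas 1.0; read_csv with index_col=0
# likewise uses the first column (the row numbers) as the index
df = pd.read_csv("tsv.tsv", sep="\t", index_col=0)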
DataFrames allow high-level manipulation of datasets, such as adding columns, computing averages, etc.
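As a rough illustration of those two operations, here is a small sketch. The numeric `SAMPLE` data and the `total` column are invented for the example, and the header row is assumed to start with an empty field for the unnamed row-number column.

```python
import io

import pandas as pd

# Hypothetical numeric sample; the header row is assumed to start with an
# empty field for the unnamed row-number column.
SAMPLE = "\tA\tB\tC\n1\t1.0\t2.0\t3.0\n2\t4.0\t5.0\t6.0\n3\t7.0\t8.0\t9.0\n"

# index_col=0 turns the row-number column into the DataFrame index.
df = pd.read_csv(io.StringIO(SAMPLE), sep="\t", index_col=0)

# Add a column derived from the existing ones.
df["total"] = df["A"] + df["B"] + df["C"]

# Column-wise averages, returned as a Series keyed by column name.
means = df.mean()

print(df)
print(means)
```

With a file on disk, the `io.StringIO(SAMPLE)` argument would be replaced by the file path.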