Pandas: read in a file without a separator between columns

I want to read in a file that looks like this:

 1.49998061E-01 2.49996769E-01 3.99994830E-01 5.99992245E-01 9.99987075E-01
 1.49998061E+00 2.49996769E+00 5.99992245E+00 9.99987075E+00 1.99997415E+01
 4.99993537E+01 9.99987075E+01  .00000000E+00-2.70636350E+03-6.37027451E+03
-1.97521328E+04-4.64928272E+04-1.09435407E+05-3.39323088E+05-7.98702345E+05
-1.87999269E+06-5.82921376E+06-1.37207895E+07-2.26385807E+07-4.25429547E+07
-7.60167523E+07-1.25422049E+08-2.35690283E+08-3.88862033E+08-7.30701955E+08
-1.30546599E+09-2.15348023E+09-4.04455001E+09-4.54896210E+09-5.32533888E+09

      

Each column occupies a fixed width of 15 characters, but there is no separator between columns. Is there a way to read this with pandas?

+3




3 answers


Yes! It's called pd.read_fwf, and it reads fixed-width formatted files:



from io import StringIO
import pandas as pd

txt = """ 1.49998061E-01 2.49996769E-01 3.99994830E-01 5.99992245E-01 9.99987075E-01
 1.49998061E+00 2.49996769E+00 5.99992245E+00 9.99987075E+00 1.99997415E+01
 4.99993537E+01 9.99987075E+01  .00000000E+00-2.70636350E+03-6.37027451E+03
-1.97521328E+04-4.64928272E+04-1.09435407E+05-3.39323088E+05-7.98702345E+05
-1.87999269E+06-5.82921376E+06-1.37207895E+07-2.26385807E+07-4.25429547E+07
-7.60167523E+07-1.25422049E+08-2.35690283E+08-3.88862033E+08-7.30701955E+08
-1.30546599E+09-2.15348023E+09-4.04455001E+09-4.54896210E+09-5.32533888E+09"""

pd.read_fwf(StringIO(txt), widths=[15] * 5, header=None)

              0             1             2             3             4
0  1.499981e-01  2.499968e-01  3.999948e-01  5.999922e-01  9.999871e-01
1  1.499981e+00  2.499968e+00  5.999922e+00  9.999871e+00  1.999974e+01
2  4.999935e+01  9.999871e+01  0.000000e+00 -2.706363e+03 -6.370275e+03
3 -1.975213e+04 -4.649283e+04 -1.094354e+05 -3.393231e+05 -7.987023e+05
4 -1.879993e+06 -5.829214e+06 -1.372079e+07 -2.263858e+07 -4.254295e+07
5 -7.601675e+07 -1.254220e+08 -2.356903e+08 -3.888620e+08 -7.307020e+08
6 -1.305466e+09 -2.153480e+09 -4.044550e+09 -4.548962e+09 -5.325339e+09
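
As a rough cross-check of what the fixed-width parsing does, the sketch below reads two rows of the question's data three ways: with `widths`, with the equivalent `colspecs` ranges, and with plain string slicing. The variable names (`width`, `ncols`, `manual`) are only illustrative and not part of the original answer.

```python
from io import StringIO
import pandas as pd

# Two rows taken from the question's sample data.
txt = (
    " 4.99993537E+01 9.99987075E+01  .00000000E+00-2.70636350E+03-6.37027451E+03\n"
    "-1.97521328E+04-4.64928272E+04-1.09435407E+05-3.39323088E+05-7.98702345E+05\n"
)

width, ncols = 15, 5
# Half-open (start, end) character ranges: (0, 15), (15, 30), ...
colspecs = [(i * width, (i + 1) * width) for i in range(ncols)]

by_widths = pd.read_fwf(StringIO(txt), widths=[width] * ncols, header=None)
by_colspecs = pd.read_fwf(StringIO(txt), colspecs=colspecs, header=None)

# Plain-Python equivalent: cut each line every 15 characters and convert.
manual = [
    [float(line[start:end]) for start, end in colspecs]
    for line in txt.splitlines()
]
```

`widths` is the shorter spelling when the fields are contiguous; `colspecs` is useful when some character ranges should be skipped.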

      

+7




Use pd.read_fwf:



import pandas as pd
df = pd.read_fwf(csv_file, widths=[15] * 5, header=None)  # csv_file: path to the data file
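
A self-contained sketch of the same call against a real file on disk. It assumes every line has the same layout, so the number of fields can be derived from the length of the first line instead of being hard-coded. The temporary file and the names `WIDTH` and `ncols` are assumptions made for the example.

```python
import os
import tempfile
import pandas as pd

WIDTH = 15  # field width stated in the question

sample = (
    " 1.49998061E-01 2.49996769E-01 3.99994830E-01 5.99992245E-01 9.99987075E-01\n"
    "-1.97521328E+04-4.64928272E+04-1.09435407E+05-3.39323088E+05-7.98702345E+05\n"
)

# Write the sample to a temporary file so the example can run on its own.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as fh:
    fh.write(sample)
    csv_file = fh.name

try:
    # Derive the number of fields from the length of the first line.
    with open(csv_file) as fh:
        first_line = fh.readline().rstrip("\n")
    ncols = len(first_line) // WIDTH

    df = pd.read_fwf(csv_file, widths=[WIDTH] * ncols, header=None)
finally:
    os.remove(csv_file)
```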

      

+3




If the columns are separated by whitespace, you can let pandas split on it, for example with housing.data:

dataset = pd.read_csv('c:/1/housing.data', engine='python', sep=r'\s+', header=None)
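
Note that splitting on whitespace only works when every pair of adjacent fields has at least one blank between them. The sketch below uses two lines from the question to illustrate the difference: the first line happens to be whitespace-separated, while on the third line the negative values run into the preceding field. The names `ok_line` and `bad_line` are made up for the example.

```python
from io import StringIO
import pandas as pd

# Line 1 of the question's data: all values positive, so blanks separate them.
ok_line = " 1.49998061E-01 2.49996769E-01 3.99994830E-01 5.99992245E-01 9.99987075E-01"
# Line 3 of the question's data: the minus signs fill the separating blank.
bad_line = " 4.99993537E+01 9.99987075E+01  .00000000E+00-2.70636350E+03-6.37027451E+03"

# Whitespace splitting recovers all five fields on the first line...
ok = pd.read_csv(StringIO(ok_line), sep=r"\s+", header=None)

# ...but sees only three tokens on the third, because the last three run together.
tokens = bad_line.split()

# Fixed-width parsing does not depend on blanks and recovers all five fields.
fixed = pd.read_fwf(StringIO(bad_line), widths=[15] * 5, header=None)
```

For the file in the question, pd.read_fwf is therefore the safer choice.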

      


0

