Reading part of a large xlsx file using python

Question


I have a large .xlsx file with 1 million rows. I don't want to load the whole file at once. Is it possible to read a chunk of the file, process it, and then read the next chunk? (I'd prefer to use pandas for this.)


python pandas

Adel, Jul 27 '16 at 21:19


2 answers

bpachev · Answer 1 · 2016-07-27T21:33:00+0000

Yes. Pandas supports reading in chunks. You would read an Excel file like this:

import pandas as pd

xl = pd.ExcelFile("myfile.xlsx")
for sheet_name in xl.sheet_names:
    # note: per the 2019 update in Answer 2, chunksize is not supported for Excel files
    reader = xl.parse(sheet_name, chunksize=1000)
    for chunk in reader:
        # parse chunk here
        pass
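Per the 2019 update in the other answer, the pandas Excel readers do not honor chunksize, so the loop above will not give you real chunks. Below is a minimal sketch of the same per-sheet loop that does read rows lazily. It assumes openpyxl is installed and that the first row of every sheet is a header. openpyxl's read_only=True mode yields rows on demand, and itertools.islice groups them into batches that are turned into DataFrames. The helper name iter_sheet_chunks is hypothetical, not a pandas or openpyxl API.

```python
from itertools import islice

import pandas as pd
from openpyxl import load_workbook


def iter_sheet_chunks(path, chunksize=1000):
    """Yield (sheet_name, DataFrame) pairs with up to `chunksize` data rows each.

    Hypothetical helper; assumes the first row of every sheet is a header row.
    """
    # read_only=True makes openpyxl stream rows lazily instead of
    # building the whole workbook in memory.
    wb = load_workbook(path, read_only=True)
    try:
        for ws in wb.worksheets:
            rows = ws.iter_rows(values_only=True)  # tuples of cell values
            header = next(rows, None)
            if header is None:
                continue  # empty sheet, nothing to yield
            while True:
                # Pull the next `chunksize` rows from the row iterator.
                batch = list(islice(rows, chunksize))
                if not batch:
                    break
                yield ws.title, pd.DataFrame(batch, columns=header)
    finally:
        # Read-only workbooks keep the file handle open until closed.
        wb.close()
```

Usage: for sheet, chunk in iter_sheet_chunks("myfile.xlsx", chunksize=1000): process chunk. Because each DataFrame infers its own dtypes, a column can come back with different dtypes in different chunks (for example float in a chunk where it is all empty).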

MaxU · Answer 2 · 2016-07-27T21:32:52+0000

UPDATE: 2019-09-05

The chunksize parameter has been deprecated, as it wasn't used by pd.read_excel(), because of the nature of the XLSX file format, which is read into memory as a whole during parsing.

More on this in this great SO answer ...

OLD answer:

You can use the read_excel() method:

import pandas as pd

chunksize = 10**5
for chunk in pd.read_excel(filename, chunksize=chunksize):
    # process 'chunk' DF
    pass

If you have multiple sheets in your Excel file, take a look at bpachev's solution.
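Since read_excel() no longer accepts chunksize, one possible workaround is to page through a sheet with the skiprows and nrows parameters, which read_excel does support. This is a sketch under that assumption, and the helper name read_excel_in_chunks is hypothetical. Each call re-opens the file and parses it from the top, so it caps the size of each DataFrame but total parsing time grows with the number of chunks. For very large sheets, streaming rows with openpyxl in read-only mode may be a better fit.

```python
import pandas as pd


def read_excel_in_chunks(path, chunksize=100_000, sheet_name=0):
    """Yield DataFrames of up to `chunksize` rows from one sheet.

    Hypothetical helper: every read_excel call re-parses the file from the
    top, so this bounds the size of each chunk, not the total work.
    """
    # Read the header plus one row just to learn the column names.
    columns = pd.read_excel(path, sheet_name=sheet_name, nrows=1).columns
    start = 0
    while True:
        chunk = pd.read_excel(
            path,
            sheet_name=sheet_name,
            header=None,         # the header row is skipped via skiprows
            names=columns,       # reuse the names read above
            skiprows=1 + start,  # skip the header plus rows already read
            nrows=chunksize,
        )
        if chunk.empty:
            break
        yield chunk
        if len(chunk) < chunksize:
            break  # short chunk means we reached the end of the sheet
        start += chunksize
```

As with any per-chunk parsing, dtypes are inferred separately for each chunk, so pass dtype= to read_excel if you need them to be consistent.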
