Pandas get row number of data with composite index
I have a directory of .csv files containing 60-minute bars of stock data, and a Python script that loads them all into a single pandas DataFrame indexed by symbol and date/time, as shown below:
import pandas as pd
import glob
import numpy as np
allFiles = glob.glob("D:\\Data\\60 Min Bar Stocks\\*.csv")
frame = pd.DataFrame()
list_ = []
for file_ in allFiles:
    df = pd.read_csv(file_,index_col=None, header=0)
    list_.append(df)
frame = pd.concat(list_)
frame.set_index(['Symbol','Date'],inplace=True)
print(frame.loc["AAL", :])
print(frame.loc["AAL", :].loc["05-Jun-2017 09:00", :])
The first print returns the following:
Open High Low Close Volume
Date
05-Jun-2017 09:00 49.53 49.88 49.40 49.64 560155
05-Jun-2017 10:00 49.58 49.89 49.58 49.85 575165
The second print returns the following:
Open 49.53
High 49.88
Low 49.40
Close 49.64
Volume 560155.00
Name: 05-Jun-2017 09:00, dtype: float64
How can I find the row index for that single row in the data frame and then get a slice that will be 12 rows consisting of the previous row, the current row, and the next 10 rows?
I think you need get_loc to find the position of the (Symbol, Date) key in the MultiIndex, then select by position with iloc:
d = '05-Jun-2017 09:00'
s = 'AAL'
pos = df.index.get_loc((s,d))
df1 = df.iloc[pos-1:pos + 11]
print (df1)
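A self-contained version of the same idea, with a small made-up frame standing in for the CSV data (the prices are illustrative only, and only a Close column is kept), might look like this. It assumes the (Symbol, Date) index is unique, so get_loc returns a single integer position.

```python
import pandas as pd

# Illustrative hourly bars for one symbol; the values are made up.
dates = [f"05-Jun-2017 {h:02d}:00" for h in range(8, 18)]
df = pd.DataFrame(
    {"Symbol": "AAL", "Date": dates, "Close": [float(i) for i in range(10)]}
).set_index(["Symbol", "Date"])

d = "05-Jun-2017 09:00"
s = "AAL"

# With a full (Symbol, Date) key on a unique MultiIndex, get_loc returns an integer position
pos = df.index.get_loc((s, d))

# Previous row, current row and the next 10 rows: positions pos-1 .. pos+10
df1 = df.iloc[pos - 1: pos + 11]
print(pos)
print(df1)
```

Here the frame only has 10 rows, so the slice stops at the end of the frame: iloc truncates a slice stop that runs past the last row instead of raising an error.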
But there is a problem if the timestamp d is the first value in the index or one of the last 10 values, so clamp the slice bounds:
df1 = df.iloc[max(pos-1,0): min(pos+11,len(df.index))]
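The clamped slice can be wrapped in a small helper. The name bar_window and its before/after parameters are hypothetical, chosen here for illustration; the sketch again assumes a unique (Symbol, Date) index and uses made-up data.

```python
import pandas as pd


def bar_window(df: pd.DataFrame, symbol: str, date: str,
               before: int = 1, after: int = 10) -> pd.DataFrame:
    """Return up to `before` rows, the matching row and up to `after` rows, by position.

    Hypothetical helper: the start is clamped at 0 so a negative start cannot
    wrap around to the end of the frame, and the stop is clamped at len(df.index).
    """
    pos = df.index.get_loc((symbol, date))
    start = max(pos - before, 0)
    stop = min(pos + after + 1, len(df.index))
    return df.iloc[start:stop]


# Illustrative data, same shape as the example frame; the values are made up.
dates = [f"05-Jun-2017 {h:02d}:00" for h in range(8, 18)]
df = pd.DataFrame(
    {"Symbol": "AAL", "Date": dates, "Close": [float(i) for i in range(10)]}
).set_index(["Symbol", "Date"])

first = bar_window(df, "AAL", "05-Jun-2017 08:00")  # no previous row exists
late = bar_window(df, "AAL", "05-Jun-2017 15:00")   # fewer than 10 rows follow
print(len(first), len(late))  # prints: 10 4
```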
Example:
print (df)
Open High Low Close Volume
Symbol Date
AAL 05-Jun-2017 08:00 1.1801 1.1819 1.1801 1.1817 4
05-Jun-2017 09:00 1.1817 1.1818 1.1804 1.1814 18
05-Jun-2017 10:00 1.1817 1.1817 1.1802 1.1806 12
05-Jun-2017 11:00 1.1807 1.1815 1.1795 1.1808 26
05-Jun-2017 12:00 1.1803 1.1806 1.1790 1.1806 4
05-Jun-2017 13:00 1.1801 1.1801 1.1779 1.1786 23
05-Jun-2017 14:00 1.1795 1.1801 1.1776 1.1788 28
05-Jun-2017 15:00 1.1793 1.1795 1.1782 1.1789 10
05-Jun-2017 16:00 1.1780 1.1792 1.1776 1.1792 12
05-Jun-2017 17:00 1.1788 1.1792 1.1788 1.1791 4
d = '05-Jun-2017 09:00'
s = 'AAL'
pos = df.index.get_loc((s,d))
df1 = df.iloc[max(pos-1,0): min(pos+11,len(df.index))]
print (df1)
Open High Low Close Volume
Symbol Date
AAL 05-Jun-2017 08:00 1.1801 1.1819 1.1801 1.1817 4
05-Jun-2017 09:00 1.1817 1.1818 1.1804 1.1814 18
05-Jun-2017 10:00 1.1817 1.1817 1.1802 1.1806 12
05-Jun-2017 11:00 1.1807 1.1815 1.1795 1.1808 26
05-Jun-2017 12:00 1.1803 1.1806 1.1790 1.1806 4
05-Jun-2017 13:00 1.1801 1.1801 1.1779 1.1786 23
05-Jun-2017 14:00 1.1795 1.1801 1.1776 1.1788 28
05-Jun-2017 15:00 1.1793 1.1795 1.1782 1.1789 10
05-Jun-2017 16:00 1.1780 1.1792 1.1776 1.1792 12
05-Jun-2017 17:00 1.1788 1.1792 1.1788 1.1791 4
The previous row cannot be selected when the timestamp is the first value in the index:
d = '05-Jun-2017 08:00'
s = 'AAL'
pos = df.index.get_loc((s,d))
df1 = df.iloc[max(pos-1,0): min(pos+11,len(df.index))]
print (df1)
Open High Low Close Volume
Symbol Date
AAL 05-Jun-2017 08:00 1.1801 1.1819 1.1801 1.1817 4
05-Jun-2017 09:00 1.1817 1.1818 1.1804 1.1814 18
05-Jun-2017 10:00 1.1817 1.1817 1.1802 1.1806 12
05-Jun-2017 11:00 1.1807 1.1815 1.1795 1.1808 26
05-Jun-2017 12:00 1.1803 1.1806 1.1790 1.1806 4
05-Jun-2017 13:00 1.1801 1.1801 1.1779 1.1786 23
05-Jun-2017 14:00 1.1795 1.1801 1.1776 1.1788 28
05-Jun-2017 15:00 1.1793 1.1795 1.1782 1.1789 10
05-Jun-2017 16:00 1.1780 1.1792 1.1776 1.1792 12
05-Jun-2017 17:00 1.1788 1.1792 1.1788 1.1791 4
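The max(pos-1, 0) clamp matters because a negative slice start in iloc counts from the end of the frame rather than raising an error. A minimal sketch on a plain 10-row frame with made-up values shows the difference when the requested row is the first one:

```python
import pandas as pd

# Made-up 10-row frame; only positions matter here.
df = pd.DataFrame({"Close": range(10)})

pos = 0  # the requested row is the first row
unclamped = df.iloc[pos - 1: pos + 11]        # start -1 means "the last row"
clamped = df.iloc[max(pos - 1, 0): pos + 11]  # start clamped at 0

print(len(unclamped), len(clamped))  # prints: 1 10
```

The unclamped slice silently returns only the last row. The stop side needs no such care, since iloc truncates an out-of-range slice stop, so the min(...) on the upper bound is a safeguard rather than a requirement.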
Not all 10 following rows can be selected when the timestamp is the third value from the end:
d = '05-Jun-2017 15:00'
s = 'AAL'
pos = df.index.get_loc((s,d))
df1 = df.iloc[max(pos-1,0): min(pos+11,len(df.index))]
print (df1)
Open High Low Close Volume
Symbol Date
AAL 05-Jun-2017 14:00 1.1795 1.1801 1.1776 1.1788 28
05-Jun-2017 15:00 1.1793 1.1795 1.1782 1.1789 10
05-Jun-2017 16:00 1.1780 1.1792 1.1776 1.1792 12
05-Jun-2017 17:00 1.1788 1.1792 1.1788 1.1791 4
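If the frame holds several symbols stacked one after another, as the concatenated CSV frame in the question does, a positional window near the end of one symbol's block can spill into the next symbol's rows. One possible variation, not part of the answer above, is to take the symbol's sub-frame with DataFrame.xs first and slice within it. The second symbol XYZ and all values below are hypothetical.

```python
import pandas as pd

# Two symbols stacked as in the concatenated frame; XYZ and all values are made up.
rows = [("AAL", f"05-Jun-2017 {h:02d}:00", float(h)) for h in range(8, 18)]
rows += [("XYZ", f"05-Jun-2017 {h:02d}:00", float(h)) for h in range(8, 18)]
df = pd.DataFrame(rows, columns=["Symbol", "Date", "Close"]).set_index(["Symbol", "Date"])

s = "AAL"
d = "05-Jun-2017 15:00"

# Whole-frame positions: the window runs past the end of AAL into XYZ rows
pos = df.index.get_loc((s, d))
mixed = df.iloc[max(pos - 1, 0): pos + 11]

# Per-symbol positions: xs drops the Symbol level, so get_loc takes just the date
one = df.xs(s, level="Symbol")
pos_one = one.index.get_loc(d)
window = one.iloc[max(pos_one - 1, 0): pos_one + 11]

print(mixed.index.get_level_values("Symbol").unique().tolist())  # prints: ['AAL', 'XYZ']
print(len(window))  # prints: 4
```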