Speeding up timestamp operations

The following conversion (ms -> datetime -> EST time) takes a long time (4 minutes), possibly because I am working with a large dataframe:

for column in ['A', 'B', 'C', 'D', 'E']:
    # Data comes in unix time (ms) so I need to convert it to datetime
    df[column] = pd.to_datetime(df[column], unit='ms')

    # Get times in EST
    df[column] = df[column].apply(lambda x: x.tz_localize('UTC').tz_convert('US/Eastern'))


Is there a way to speed it up? Am I already using Pandas data structures and methods in the most efficient way?


2 answers


tz_localize and tz_convert are also available as DatetimeIndex methods, which will be much faster than calling them row by row with apply:

df[column] = pd.DatetimeIndex(df[column]).tz_localize('UTC').tz_convert('US/Eastern')




Note: from pandas 0.15.0 you can access them via the .dt accessor:

df[column] = df[column].dt.tz_localize('UTC').tz_convert('US/Eastern')
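
For context, a minimal sketch of the question's loop rewritten with this vectorized approach (same df and column names as in the question; assumes pandas 0.15.0+ for the .dt accessor):

import pandas as pd

for column in ['A', 'B', 'C', 'D', 'E']:
    # Parse the Unix ms timestamps (vectorized, as in the question)
    df[column] = pd.to_datetime(df[column], unit='ms')
    # Localize and convert the whole column at once instead of a per-row apply()
    df[column] = df[column].dt.tz_localize('UTC').tz_convert('US/Eastern')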




I would try it in Bash with the date command. date turns out to be faster than even gawk for routine conversions, and Python struggles to keep up with it for this kind of work.

To speed it up even more, export column A into one temp file, column B into another, and so on (you can do this from Python). Then process the 5 columns in parallel:

# Dump each column's raw millisecond values to its own temp file, one per line
with open('thefileA', 'w') as f:
    for value in df['A']:
        f.write('%d\n' % value)
with open('thefileB', 'w') as f:
    for value in df['B']:
        f.write('%d\n' % value)




Then a Bash script:

#!/usr/bin/env bash
# Read the exported timestamps (one per line) and convert each of them.
# The question's values are in milliseconds, so divide by 1000 to get
# the epoch seconds that date expects.
readarray -t a < thefileA
for i in "${a[@]}"; do
    date -r "$((i / 1000))"    # BSD/macOS; on Linux use: date -d "@$((i / 1000))"
done


You will need a master Bash script that first runs the Python part (python pythonscript.py) and then launches each of the per-column Bash scripts in the background (./FILEA.sh &, and so on). This processes each column separately and lets the OS spread the work across cores. I am not 100% sure about the syntax of the Bash loop after readarray. If you are on Linux, use date -d @$i instead of date -r.
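
For illustration, a rough Python sketch of that master step (the script names FILEA.sh through FILEE.sh are hypothetical; each is assumed to wrap the date loop above for one of the temp files):

import subprocess

# Launch one per-column conversion script in the background, then wait for all of them
scripts = ['./FILEA.sh', './FILEB.sh', './FILEC.sh', './FILED.sh', './FILEE.sh']
procs = [subprocess.Popen(['bash', s]) for s in scripts]
for p in procs:
    p.wait()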
