Pandas: Apply a function through "Column A" while reading "Column B" at the same time

I am using Pandas to drive a Python function. From inputs.csv, I use each row of "Column A" as input for a function.

The CSV also has a "Column B" containing values that I want to read into a variable x inside this function. The function should not be applied to "Column B"; it should still be applied to "Column A". Is this possible?


This is the current code, which applies the function to "Column A":

import pandas as pd
df = pd.read_csv("inputs.csv", delimiter=",")

def function(a):
    #variables c, d, e are created here
    ###I would like to create x from Column B if possible
    return pd.Series([c, d, e])
df[["Column C", "Column D", "Column E"]] = df["Column A"].apply(function)

      


Post-edit: This question has been identified as a possible duplicate of another question. While the answer may be the same, the question is not. It is probably not obvious to future readers that applying a function to two columns is interchangeable with applying it to one column and "reading" another column at the same time. Therefore, the question should remain open.



2 answers


Yes. You are currently using Series.apply(); instead, you can use DataFrame.apply() with axis=1 so that each row is passed to the function. Inside the function, you can access the columns as row[<column>].

Example:



In [37]: df
Out[37]:
   X  Y  Count
0  0  1      2
1  0  1      2
2  1  1      2
3  1  0      1
4  1  1      2
5  0  0      1

In [38]: def func1(r):
   ....:     print(r['X'])
   ....:     print(r['Y'])
   ....:     return r
   ....:

In [39]: df.apply(func1,axis=1)
0
1
0
1
1
1
1
0
1
1
0
0
Out[39]:
   X  Y  Count
0  0  1      2
1  0  1      2
2  1  1      2
3  1  0      1
4  1  1      2
5  0  0      1

      

This is just a very basic example; you can adapt it to whatever you actually need to do.
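Applied to the column names from the question, the same idea might look like the sketch below. The sample data stands in for inputs.csv, whose contents are not shown, and the arithmetic that produces c, d and e is a placeholder assumption, since the real calculations are not given.

```python
import pandas as pd

# Hypothetical stand-in for inputs.csv; the real file's contents are not shown.
df = pd.DataFrame({"Column A": [1, 2, 3], "Column B": [10, 20, 30]})

def function(row):
    a = row["Column A"]  # the value the function is "applied" to
    x = row["Column B"]  # the extra value read from the same row
    # Placeholder calculations; the question does not say how c, d, e are built.
    c = a + x
    d = a * x
    e = a - x
    return pd.Series([c, d, e])

# axis=1 hands each row to the function, so both columns are available.
df[["Column C", "Column D", "Column E"]] = df.apply(function, axis=1)
print(df)
```

The function still does its main work on "Column A"; "Column B" is only read from the same row.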



Passing axis=1 to the apply method passes the entire row to the function as a single argument, which can be unpacked like a tuple.

However, this is much slower than applying to a single column. I would advise against it if performance is an issue.



def scrape(x):
    a, b = x
    # Magically create c, d, e from a
    print(b)
    return pd.Series([c, d, e])

df[["Column C", "Column D", "Column E"]] = df[["Column A", "Column B"]].apply(scrape, axis=1)
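If the slowdown from axis=1 does matter, one alternative is to skip apply altogether and iterate over the two columns in parallel with zip. This is a different technique from the one in this answer, not a variant of it. The sample data and the placeholder arithmetic are assumptions for illustration, and whether it is actually faster is worth measuring on your own data.

```python
import pandas as pd

# Hypothetical stand-in for inputs.csv; the real file's contents are not shown.
df = pd.DataFrame({"Column A": [1, 2, 3], "Column B": [10, 20, 30]})

def scrape(a, x):
    # Placeholder logic; the real creation of c, d, e is not given.
    c = a + x
    d = a * x
    e = a - x
    return c, d, e

# Walk both columns in lockstep, avoiding the per-row Series that axis=1 builds.
results = [scrape(a, x) for a, x in zip(df["Column A"], df["Column B"])]

# Reuse df.index so the new columns line up with the existing rows.
df[["Column C", "Column D", "Column E"]] = pd.DataFrame(results, index=df.index)
print(df)
```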

      
