Counting True/False Values for Multiple Conditions

I am just starting to use pandas.

I am looking for mutations in several patients, and I have 16 different conditions. For each condition I check the MUT column for that mutation, which gives True or False per row, and then I count the True and False values. So far I have only written the code for 4 of the conditions.

Can you suggest an easier way, such as a loop, instead of writing the same code 16 times?

s1=df["MUT"]
A_T= s1.str.contains("A:T")
ATnum= A_T.value_counts(sort=True)

s2=df["MUT"]
A_G=s2.str.contains("A:G")
AGnum=A_G.value_counts(sort=True)

s3=df["MUT"]
A_C=s3.str.contains("A:C")
ACnum=A_C.value_counts(sort=True)

s4=df["MUT"]
A__=s4.str.contains("A:-")
A_num=A__.value_counts(sort=True)

      



2 answers


I'm not an expert on using Pandas, so I don't know if there is a cleaner way to do this, but maybe the following might work?

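chars = 'TGC-'
nums = {}
s = df["MUT"]  # the column is the same for every pattern

for char in chars:
    # True where the entry contains e.g. "A:T"
    A = s.str.contains("A:" + char)
    # number of True and False rows for this pattern
    nums[char] = A.value_counts(sort=True)

ATnum = nums['T']
AGnum = nums['G']
# ...etc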

      



Basically, go through each unique character (T, G, C, -), build the True/False series for that pattern, and store its counts in the dictionary under that character. Once the loop is over, you can pull every count you need out of the dictionary.
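The same loop idea can be stretched to cover all 16 conditions at once. The sketch below is only an illustration: the sample data is made up, and it assumes the 16 conditions are each of the four bases paired with a different base or with "-", which the question does not actually spell out.

```python
import pandas as pd

# Hypothetical sample data; the real df["MUT"] comes from the asker's file.
df = pd.DataFrame({"MUT": ["A:T", "A:G", "C:-", "A:T", "G:C", "T:A"]})

bases = "ACGT"
# Assumption: each base paired with a different base or "-" (4 x 4 = 16 patterns).
patterns = [f"{ref}:{alt}" for ref in bases for alt in bases + "-" if ref != alt]

# One boolean column per pattern. regex=False treats the pattern as literal
# text, and na=False counts missing entries as "no match".
matches = pd.DataFrame(
    {p: df["MUT"].str.contains(p, regex=False, na=False) for p in patterns}
)

# Number of True and False rows for every pattern, in one table.
true_counts = matches.sum()
false_counts = len(matches) - true_counts
summary = pd.DataFrame({"True": true_counts, "False": false_counts})
print(summary)
```

Each row of summary is one pattern, so summary.loc["A:T"] plays the role of ATnum from the question.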



Just use value_counts; this will give you the count of every unique value in your column, so there is no need to create 16 variables:



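In [5]:
import numpy as np
import pandas as pd

# random example data: 100 integers in [0, 16), so the counts vary per run
df = pd.DataFrame({'MUT':np.random.randint(0,16,100)})
df['MUT'].value_counts()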

Out[5]:
6     11
14    10
13     9
12     9
1      8
9      7
15     6
11     6
8      5
5      5
3      5
2      5
10     4
4      4
7      3
0      3
dtype: int64
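Applied to mutation labels rather than random integers, this might look like the sketch below. It assumes each MUT entry holds exactly one label such as "A:T" (value_counts matches whole values, not substrings), and the sample data and the wanted list are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data: one mutation label per row.
df = pd.DataFrame({"MUT": ["A:T", "A:G", "A:T", "C:-", "A:T"]})

# Count of every distinct label that occurs in the column.
counts = df["MUT"].value_counts()
print(counts)

# Labels that never occur are absent from counts; reindex fills them with 0.
wanted = ["A:T", "A:G", "A:C", "A:-"]
selected = counts.reindex(wanted, fill_value=0)
print(selected)
```

If an entry can contain more than one mutation, value_counts on the raw column will not separate them, and a str.contains check per pattern is the safer route.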

      
