Counting T / F Values for Multiple Conditions
I am just starting to use pandas.
I am looking for mutations in several patients, and I have 16 different conditions. For each condition I check the MUT column for a pattern, which gives a True/False Series, and then I count the True and False values. So far I have written this for only 4 of the conditions. How do I do this in a loop?
Can you suggest an easier way than writing the same code 16 times?
s1=df["MUT"]
A_T= s1.str.contains("A:T")
ATnum= A_T.value_counts(sort=True)
s2=df["MUT"]
A_G=s2.str.contains("A:G")
AGnum=A_G.value_counts(sort=True)
s3=df["MUT"]
A_C=s3.str.contains("A:C")
ACnum=A_C.value_counts(sort=True)
s4=df["MUT"]
A__=s4.str.contains("A:-")
A_num=A__.value_counts(sort=True)
I'm not an expert on pandas, so there may be a cleaner way to do this, but the following might work:
chars = 'TGC-'
nums = {}
for char in chars:
    s = df["MUT"]
    A = s.str.contains("A:" + char)
    num = A.value_counts(sort=True)
    nums[char] = num
ATnum = nums['T']
AGnum = nums['G']
# ...etc
Basically, loop over each character (T, G, C, -), compute the counts you want, and store each result in the dictionary under that character. Once the loop is over, you can pull every count you need out of the dictionary.
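The same loop can be extended to all 16 conditions at once. The sketch below is only an illustration: the sample DataFrame is made up, and the pattern list assumes the 16 conditions are each base (A, T, G, C) changed to one of the other three bases or to a deletion ("-"), which the question does not actually confirm. It uses `regex=False` so each pattern is matched as a literal substring, and `na=False` so missing values count as False.

```python
import pandas as pd

# Hypothetical sample data; the real df and its MUT values are not shown in the question.
df = pd.DataFrame({"MUT": ["A:T", "A:G", "A:T", "C:-", "G:A", "A:-"]})

# Assumed pattern list: each base paired with the other three bases or a deletion ("-").
bases = "ATGC"
patterns = [f"{ref}:{alt}" for ref in bases for alt in bases + "-" if ref != alt]

# Summing a boolean Series counts its True values, so one dict
# comprehension replaces the 16 hand-written blocks.
true_counts = pd.Series(
    {p: int(df["MUT"].str.contains(p, regex=False, na=False).sum()) for p in patterns},
    name="true_count",
)

# Every row that is not True for a pattern is False for it.
false_counts = len(df) - true_counts
```

Individual results can then be looked up by pattern, for example `true_counts["A:T"]` and `false_counts["A:T"]`, instead of keeping a separate variable per condition.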
Just use value_counts; this will give you the count of every unique value in your column, so there is no need to create 16 variables:
In [5]:
import numpy as np
import pandas as pd
df = pd.DataFrame({'MUT':np.random.randint(0,16,100)})
df['MUT'].value_counts()
Out[5]:
6 11
14 10
13 9
12 9
1 8
9 7
15 6
11 6
8 5
5 5
3 5
2 5
10 4
4 4
7 3
0 3
dtype: int64
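The example above uses random integers, but the same call works on string labels. Here is a minimal sketch, assuming the MUT column holds exactly one mutation label per row (the question does not show the data, so the sample values are made up). Note that value_counts counts exact values, while str.contains matches substrings, so the two approaches only agree under that assumption.

```python
import pandas as pd

# Hypothetical data: one mutation label per row (an assumption about the real column).
df = pd.DataFrame({"MUT": ["A:T", "A:G", "A:T", "A:-", "A:C", "A:T"]})

# One call counts every distinct label in the column.
counts = df["MUT"].value_counts()

# The True count for a condition is its entry (0 if the label never occurs);
# the False count is the remaining rows.
at_true = counts.get("A:T", 0)
at_false = len(df) - at_true
```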