Bank statement violation by category in R

This is an approximate part of a bank statement:

Category<-c(
"Merchandise",
"Dining",
"Lodging",
"Other Services",
"Dining",
"Merchandise",
"Merchandise",
"Other Services",
"Entertainment",
"Merchandise",
"Merchandise",
"Internet",
"Other Services",
"Merchandise",
"Merchandise",
"Merchandise",
"Other Services",
"Phone/Cable",
"Airfare",
"Airfare",
"Other Services",
"Merchandise",
"Merchandise",
"Internet",
"Other Services",
"Other Services",
"Phone/Cable",
"Other Services",
"Healthcare"
)

Debit<-as.numeric(c(
"26.34",
"4.75",
"9.88",
"31.26",
"8.67",
"64.64",
"5.18",
"15.5",
"10",
"12.93",
"10.02",
"6.95",
"39.93",
"16.39",
"24",
"40.35",
"27.33",
"11.12",
"214.2",
"214.2",
"4",
"86.28",
"19.99",
"19.99",
"13.68",
"205",
"10.96",
"85",
"1525"
))

df<-data.frame(Category,Debit)

      

with the following output:

         Category   Debit
1     Merchandise   26.34
2          Dining    4.75
3         Lodging    9.88
4  Other Services   31.26
5          Dining    8.67
6     Merchandise   64.64
7     Merchandise    5.18
8  Other Services   15.50
9   Entertainment   10.00
10    Merchandise   12.93
11    Merchandise   10.02
12       Internet    6.95
13 Other Services   39.93
14    Merchandise   16.39
15    Merchandise   24.00
16    Merchandise   40.35
17 Other Services   27.33
18    Phone/Cable   11.12
19        Airfare  214.20
20        Airfare  214.20
21 Other Services    4.00
22    Merchandise   86.28
23    Merchandise   19.99
24       Internet   19.99
25 Other Services   13.68
26 Other Services  205.00
27    Phone/Cable   10.96
28 Other Services   85.00
29     Healthcare 1525.00

      

From there, in order to see the total amount of money I spent in individual categories, for example "Product", I have to do this:

> sum(df$Debit[which(df$Category=="Merchandise")])
[1] 306.12

      

But this is cumbersome to do for each category one at a time. I was wondering if there is a faster way to display it so that in one column I get all the levels listed for df$Category

, and in the second column the sum for each category.

Something like that:

Merchandise 306.12
Other Services  421.7
Phone/Cable 22.08
etc...

      

Any suggestions?

+3


source to share


3 answers


In R base, you can use aggregate()

:

aggregate(Debit ~ Category, df, FUN = sum)        

      



This gives:

        Category   Debit
1        Airfare  428.40
2         Dining   13.42
3  Entertainment   10.00
4     Healthcare 1525.00
5       Internet   26.94
6        Lodging    9.88
7    Merchandise  306.12
8 Other Services  421.70
9    Phone/Cable   22.08

      

+2


source


Also easily resolved with the package data.table

:



library(data.table)
dt <- as.data.table(df)
dt[, list(DebitTotal = sum(Debit)), by = Category]

#           Category DebitTotal
# 1:    Merchandise     306.12
# 2:         Dining      13.42
# 3:        Lodging       9.88
# 4: Other Services     421.70
# 5:  Entertainment      10.00
# 6:       Internet      26.94
# 7:    Phone/Cable      22.08
# 8:        Airfare     428.40
# 9:     Healthcare    1525.00

      

+2


source


You also have a classic SQL group of choice. First you need to download the sqldf package. library (sqldf)

sqldf (" select Category, sum(Debit) `debit_sum` from df group by Category ")

He will give you the following

Category sum 1 Airfare 428.40 2 Dining 13.42 3 Entertainment 10.00 4 Healthcare 1525.00 5 Internet 26.94 6 Lodging 9.88 7 Merchandise 306.12 8 Other Services 421.70 9 Phone/Cable 22.08

+1


source







All Articles