Bank statement violation by category in R
This is an approximate part of a bank statement:
Category<-c(
"Merchandise",
"Dining",
"Lodging",
"Other Services",
"Dining",
"Merchandise",
"Merchandise",
"Other Services",
"Entertainment",
"Merchandise",
"Merchandise",
"Internet",
"Other Services",
"Merchandise",
"Merchandise",
"Merchandise",
"Other Services",
"Phone/Cable",
"Airfare",
"Airfare",
"Other Services",
"Merchandise",
"Merchandise",
"Internet",
"Other Services",
"Other Services",
"Phone/Cable",
"Other Services",
"Healthcare"
)
Debit<-as.numeric(c(
"26.34",
"4.75",
"9.88",
"31.26",
"8.67",
"64.64",
"5.18",
"15.5",
"10",
"12.93",
"10.02",
"6.95",
"39.93",
"16.39",
"24",
"40.35",
"27.33",
"11.12",
"214.2",
"214.2",
"4",
"86.28",
"19.99",
"19.99",
"13.68",
"205",
"10.96",
"85",
"1525"
))
df<-data.frame(Category,Debit)
with the following output:
Category Debit
1 Merchandise 26.34
2 Dining 4.75
3 Lodging 9.88
4 Other Services 31.26
5 Dining 8.67
6 Merchandise 64.64
7 Merchandise 5.18
8 Other Services 15.50
9 Entertainment 10.00
10 Merchandise 12.93
11 Merchandise 10.02
12 Internet 6.95
13 Other Services 39.93
14 Merchandise 16.39
15 Merchandise 24.00
16 Merchandise 40.35
17 Other Services 27.33
18 Phone/Cable 11.12
19 Airfare 214.20
20 Airfare 214.20
21 Other Services 4.00
22 Merchandise 86.28
23 Merchandise 19.99
24 Internet 19.99
25 Other Services 13.68
26 Other Services 205.00
27 Phone/Cable 10.96
28 Other Services 85.00
29 Healthcare 1525.00
From there, in order to see the total amount of money I spent in individual categories, for example "Product", I have to do this:
> sum(df$Debit[which(df$Category=="Merchandise")])
[1] 306.12
But this is cumbersome to do for each category one at a time. I was wondering if there is a faster way to display it so that in one column I get all the levels listed for df$Category
, and in the second column the sum for each category.
Something like that:
Merchandise 306.12
Other Services 421.7
Phone/Cable 22.08
etc...
Any suggestions?
source to share
Also easily resolved with the package data.table
:
library(data.table)
dt <- as.data.table(df)
dt[, list(DebitTotal = sum(Debit)), by = Category]
# Category DebitTotal
# 1: Merchandise 306.12
# 2: Dining 13.42
# 3: Lodging 9.88
# 4: Other Services 421.70
# 5: Entertainment 10.00
# 6: Internet 26.94
# 7: Phone/Cable 22.08
# 8: Airfare 428.40
# 9: Healthcare 1525.00
source to share
You also have a classic SQL group of choice. First you need to download the sqldf package. library (sqldf)
sqldf ("
select Category, sum(Debit) `debit_sum`
from df
group by Category
")
He will give you the following
Category sum
1 Airfare 428.40
2 Dining 13.42
3 Entertainment 10.00
4 Healthcare 1525.00
5 Internet 26.94
6 Lodging 9.88
7 Merchandise 306.12
8 Other Services 421.70
9 Phone/Cable 22.08
source to share