A repeatable way to remove rows from a data frame where a value ends with a specific character
So, I have a small data frame and, as the title says, I would like to remove all rows whose Team value ends in a specific letter, "n".
Here is the code that will give you the data I'm working with:
url = "http://www.basketball-reference.com/leagues/NBA_1980.html"
library(XML)
x1 = readHTMLTable(url)
east.1980 = x1[["E_standings"]]
west.1980 = x1[["W_standings"]]
east.1980 = east.1980[c(1,2)]
west.1980 = west.1980[c(1,2)]
names(east.1980) = c("Team", "W")
names(west.1980) = c("Team", "W")
wins.1980 = rbind(east.1980, west.1980)
wins.1980$Team = gsub("\\b\\d+\\b", "", wins.1980$Team)
wins.1980$Team = gsub(" +"," ",gsub("^ +","",gsub("[^a-zA-Z0-9 ]","",wins.1980$Team)))
View(wins.1980)
Here's an example of what the resulting data frame looks like:
Team W
1 Atlantic Division
2 Boston Celtics 61
3 Philadelphia 76ers 59
4 Washington Bullets 39
5 New York Knicks 39
6 New Jersey Nets 34
7 Central Division
8 Atlanta Hawks 50
9 Houston Rockets 41
10 San Antonio Spurs 41
11 Indiana Pacers 37
12 Cleveland Cavaliers 37
13 Detroit Pistons 16
14 Midwest Division
15 Milwaukee Bucks 49
16 Kansas City Kings 47
17 Denver Nuggets 30
Basically, I want to remove the division rows ("Atlantic Division", "Central Division", etc.). It just so happens that all of these rows end with "n", so I'm trying to write a for loop that deletes every row where wins.1980$Team ends with "n".
I want to repeat this process for more than 30 seasons, so repeatability is a must.
Here are the two loops I've tried so far:
for (i in 1:nrow(wins.1980)) {
if ((str_sub(wins.1980$Team[i], -1)) == "n") {
eval(parse(text=paste0("wins.","1980","[-", i, ",]")))
}
}
for (i in 1:nrow(wins.1980)) {
if ((str_sub(wins.1980$Team[i], -1)) == "n") {
wins.1980[-i,]
}
}
I have used a for loop with if ((str_sub(myData$Column[i], -1)) == "letter") before to do something when the last character equals "letter", so I'm pretty sure that part of the loop works.
Since there are only 6 divisions in the NBA, I would also be fine with something repeatable along the lines of if (wins.1980$Team == "Atlantic Division" | "Midwest Division" | etc...) that then deletes the row. However, I don't think the problem in my loop is selecting the correct rows, just deleting them.
I don't get any errors when I run either of the loops above. They run, but the result doesn't seem to be saved anywhere.
Starting from the data frame above, I would like the result to look like this:
Team W
2 Boston Celtics 61
3 Philadelphia 76ers 59
4 Washington Bullets 39
5 New York Knicks 39
6 New Jersey Nets 34
8 Atlanta Hawks 50
9 Houston Rockets 41
10 San Antonio Spurs 41
11 Indiana Pacers 37
12 Cleveland Cavaliers 37
13 Detroit Pistons 16
15 Milwaukee Bucks 49
16 Kansas City Kings 47
17 Denver Nuggets 30
Again, I would like to be able to repeat this over many other data frames. Any ideas?
I'm new to R, so I may be overlooking simpler solutions, and simplicity would be much appreciated! Thanks in advance!
Here's an easier way:
wins.1980[grep("Division$", wins.1980$Team, invert = TRUE), ]
grep("Division$", ...) matches anything in the Team column that ends in Division (this is probably safer than selecting anything that ends in n, but you can do that with the same technique), and invert = TRUE inverts the match, so you get the indices of everything that doesn't end in Division. Using these indices to subset gives you all rows whose Team does not end in Division. Note that this returns a new data frame; assign the result (for example, back to wins.1980) if you want to keep it.
You can turn this into a function that works on many data frames:
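As a minimal sketch of how this behaves, here is the same subsetting on a small hand-made data frame. The name toy and its contents are assumptions for illustration (a few rows copied from the question), not the scraped table itself.

```r
# Toy data mimicking the structure of wins.1980 (rows copied from the question)
toy <- data.frame(
  Team = c("Atlantic Division", "Boston Celtics", "Philadelphia 76ers",
           "Central Division", "Atlanta Hawks"),
  W = c("", "61", "59", "", "50"),
  stringsAsFactors = FALSE
)

# Indices of rows whose Team does NOT end in "Division"
keep <- grep("Division$", toy$Team, invert = TRUE)

# Subsetting returns a new data frame, so assign the result back to keep it
toy <- toy[keep, ]

# Same technique for "ends in n": only the pattern changes
ends_in_n <- grep("n$", c("Central Division", "Atlanta Hawks"), invert = TRUE)
```

The assignment is the step the loops in the question leave out: wins.1980[-i, ] on its own computes a result and then discards it.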
no_div <- function(x) {
x[grep("Division$", x$Team, invert = TRUE), ]
}
This assumes you want to subset them all based on a column named Team; if you are using different columns, you will have to change the function to accept an additional argument. Then call it on your data with no_div(wins.1980).
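One possible way to add that extra argument and apply the function to several seasons at once is sketched below. The name no_div_col, its col and pattern parameters, and the seasons list are all hypothetical, and the two small data frames are stand-ins built from rows in the question.

```r
# Hypothetical variant of no_div that takes the column name and pattern as arguments
no_div_col <- function(x, col = "Team", pattern = "Division$") {
  # grepl() returns TRUE/FALSE per row; keep the rows that do NOT match
  x[!grepl(pattern, x[[col]]), , drop = FALSE]
}

# Stand-in data: a named list with one data frame per season
seasons <- list(
  season_a = data.frame(Team = c("Atlantic Division", "Boston Celtics"),
                        W = c("", "61"), stringsAsFactors = FALSE),
  season_b = data.frame(Team = c("Central Division", "Atlanta Hawks"),
                        W = c("", "50"), stringsAsFactors = FALSE)
)

# Apply the same cleaning step to every data frame in the list
cleaned <- lapply(seasons, no_div_col)
```

Keeping the seasons in a named list avoids building variable names such as wins.1980 with paste0() and eval(parse()), which is what the first loop in the question attempts.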
For this you can use substrings and subsets.
First, find the rows that end in Division:
matches <- substr(wins.1980$Team, nchar(wins.1980$Team) - 7, nchar(wins.1980$Team)) == "Division"
Then subset the data based on this:
wins.1980 <- subset(wins.1980, !matches)
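A small sketch of this approach on made-up input, with base R's endsWith() (available from R 3.3.0) shown as a shorter equivalent check. The toy data frame is an assumption for illustration, using rows from the question.

```r
# Toy data with one division row (rows copied from the question)
toy <- data.frame(
  Team = c("Midwest Division", "Milwaukee Bucks", "Kansas City Kings"),
  W = c("", "49", "47"),
  stringsAsFactors = FALSE
)

# "Division" has 8 characters, so take the last 8 characters of each name
n <- nchar(toy$Team)
matches <- substr(toy$Team, n - 7, n) == "Division"

# Equivalent check without counting characters by hand
matches_alt <- endsWith(toy$Team, "Division")

# Keep only the rows that are not division headers
result <- subset(toy, !matches)
```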
Edit: The best example here is fooobar.com/questions/483512 / ...