A repeatable way to remove a line from a data frame that ends with a specific character

Question


As the title says, I have a small data frame, and I would like to remove all rows that end in a specific letter, "n".

Here is the code that will give you the data I'm working with:

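library(XML)  # provides readHTMLTable()

url = "http://www.basketball-reference.com/leagues/NBA_1980.html"
x1 = readHTMLTable(url)

# Pull out the Eastern and Western conference standings tables
east.1980 = x1[["E_standings"]]
west.1980 = x1[["W_standings"]]

# Keep only the first two columns (team name and wins)
east.1980 = east.1980[c(1,2)]
west.1980 = west.1980[c(1,2)]

names(east.1980) = c("Team", "W")
names(west.1980) = c("Team", "W")

# Stack both conferences into one data frame
wins.1980 = rbind(east.1980, west.1980)

# Remove standalone numbers from the team names
wins.1980$Team = gsub("\\b\\d+\\b", "", wins.1980$Team)
# Drop non-alphanumeric characters, then leading spaces, then collapse repeated spaces
wins.1980$Team = gsub(" +"," ",gsub("^ +","",gsub("[^a-zA-Z0-9 ]","",wins.1980$Team)))

View(wins.1980)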

Here's an excerpt of what the data frame looks like:

                Team    W
1   Atlantic Division    
2   Boston Celtics      61
3   Philadelphia 76ers  59
4   Washington Bullets  39
5   New York Knicks     39
6   New Jersey Nets     34
7   Central Division     
8   Atlanta Hawks       50
9   Houston Rockets     41
10  San Antonio Spurs   41
11  Indiana Pacers      37
12  Cleveland Cavaliers 37
13  Detroit Pistons     16
14  Midwest Division     
15  Milwaukee Bucks     49
16  Kansas City Kings   47
17  Denver Nuggets      30

Basically, I want to remove the division rows ("Atlantic Division", "Central Division", etc.). All of these rows happen to end with "n", so I'm trying to write a for loop that deletes every row where wins.1980$Team ends with "n".

I want to repeat this process for more than 30 seasons, so repeatability is a must.

Here are the two loops I've tried so far:

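library(stringr)  # provides str_sub()

# Attempt 1: build the subsetting expression as a string and evaluate it
for (i in 1:nrow(wins.1980)) {
  if ((str_sub(wins.1980$Team[i], -1)) == "n") {
    eval(parse(text=paste0("wins.","1980","[-", i, ",]")))
  }
}

# Attempt 2: subset directly
for (i in 1:nrow(wins.1980)) {
  if ((str_sub(wins.1980$Team[i], -1)) == "n") {
    wins.1980[-i,]
  }
}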

I have previously used a for loop with if ((str_sub(myData$Column[i], -1)) == "letter") to do something when the last character was equal to "letter", so I'm pretty sure that part of the loop works.
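As a side note, the last-character test itself does not need a loop, because comparisons in R are vectorized. A minimal base-R sketch, assuming Team is a character column; the small data frame teams below is hypothetical sample data in the same shape as wins.1980:

```r
# Hypothetical sample data in the same shape as wins.1980
teams <- data.frame(
  Team = c("Atlantic Division", "Boston Celtics", "Central Division", "Atlanta Hawks"),
  W = c(NA, 61, NA, 50),
  stringsAsFactors = FALSE
)

# Last character of every Team value at once
# (base-R equivalent of stringr::str_sub(teams$Team, -1))
last_char <- substr(teams$Team, nchar(teams$Team), nchar(teams$Team))
print(last_char)

# One TRUE/FALSE per row: TRUE where the name ends in "n"
ends_in_n <- last_char == "n"
print(ends_in_n)
```

The resulting logical vector has one entry per row, so it can be used directly for row indexing.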

Since there are only 6 divisions in the NBA, I would also be fine with something repeatable along the lines of if (wins.1980$Team == "Atlantic Division" | "Midwest Division" | etc...) and then deleting that row. However, I don't think the problem in my loop is choosing the correct rows, just deleting them.
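For the name-based alternative, note that wins.1980$Team == "Atlantic Division" | "Midwest Division" is not valid R: == is evaluated first, and | then needs logical operands, so a character string on its right-hand side raises an error. The usual idiom for "equals any of these values" is %in%. A sketch on hypothetical sample data; the division list is illustrative (it includes "Pacific Division", which does not appear in the excerpt above):

```r
# Hypothetical sample data in the same shape as wins.1980
teams <- data.frame(
  Team = c("Atlantic Division", "Boston Celtics", "Midwest Division", "Milwaukee Bucks"),
  W = c(NA, 61, NA, 49),
  stringsAsFactors = FALSE
)

# Illustrative list of division header names to drop
divisions <- c("Atlantic Division", "Central Division",
               "Midwest Division", "Pacific Division")

# %in% gives TRUE for every Team value found in `divisions`
is_division <- teams$Team %in% divisions

# Keep the rows that are NOT division headers
teams_no_div <- teams[!is_division, ]
print(teams_no_div)
```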

I don't get any errors when I run either of the loops above; they run, but I think the result is simply never saved.
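The symptom described here has a standard explanation in R: an expression such as wins.1980[-i, ] builds a new, smaller data frame but does not modify wins.1980 unless the result is assigned back, and inside a for loop the value is not auto-printed either, so nothing visible happens (the eval(parse(...)) version behaves the same way). Assigning inside the loop would cause a second problem, because removing a row shifts the positions of all later rows. A sketch of a loop that avoids both issues by collecting the row numbers first and subsetting once, on hypothetical sample data:

```r
# Hypothetical sample data in the same shape as wins.1980
teams <- data.frame(
  Team = c("Atlantic Division", "Boston Celtics", "Central Division", "Atlanta Hawks"),
  W = c(NA, 61, NA, 50),
  stringsAsFactors = FALSE
)

# Collect the positions of rows whose Team ends in "n"
to_drop <- integer(0)
for (i in seq_len(nrow(teams))) {
  team <- teams$Team[i]
  if (substr(team, nchar(team), nchar(team)) == "n") {
    to_drop <- c(to_drop, i)
  }
}

# Subset once and assign the result back so it is kept.
# The guard matters: teams[-integer(0), ] would return zero rows.
if (length(to_drop) > 0) {
  teams <- teams[-to_drop, ]
}
print(teams)
```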

Starting from the data frame above, I would like the result to look like this:

                 Team   W
2   Boston Celtics      61
3   Philadelphia 76ers  59
4   Washington Bullets  39
5   New York Knicks     39
6   New Jersey Nets     34
8   Atlanta Hawks       50
9   Houston Rockets     41
10  San Antonio Spurs   41
11  Indiana Pacers      37
12  Cleveland Cavaliers 37
13  Detroit Pistons     16
15  Milwaukee Bucks     49
16  Kansas City Kings   47
17  Denver Nuggets      30

Again, I would like to be able to repeat this over many other data frames. Any ideas?

I'm new to R, so I may be overlooking simpler solutions, and simplicity would be much appreciated! Thanks in advance!

+3

string for-loop r

Matt collins 23 Apr 15 at 11:45 pm


4 answers

You can use grepl, like this:

df <- data.frame(Team=c("Boston Celtics","Atlantic Division",
                        "Central Division","Atlanta Hawks"),
                 W=sample(10:20, 4))

# Keep only the rows whose Team does NOT end in "n"
df <- df[!grepl("n$", df$Team),]

where "n$" is a regular expression meaning "the string ends with n" ($ anchors the match to the end of the string).
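To make the steps visible, here is the same example with a fixed W column instead of sample(), so the result is reproducible: grepl() returns one TRUE/FALSE per row, ! flips it, and logical indexing keeps the original row names (as in the desired output in the question). Base R 3.3.0 and later also has endsWith(), a non-regex alternative shown for comparison; it requires a character column, hence stringsAsFactors = FALSE:

```r
df <- data.frame(Team = c("Boston Celtics", "Atlantic Division",
                          "Central Division", "Atlanta Hawks"),
                 W = c(61, NA, NA, 50),
                 stringsAsFactors = FALSE)

# TRUE where Team ends in "n" ("$" anchors the pattern at the end)
ends_n <- grepl("n$", df$Team)
print(ends_n)

# Keep the other rows; the original row names (1 and 4) are preserved
kept <- df[!ends_n, ]
print(kept)

# Non-regex alternative (base R >= 3.3.0)
kept2 <- df[!endsWith(df$Team, "n"), ]
```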

+2

Remko Duursma Apr 24 15 at 0:04


One way to do this is with substrings and subsetting.

First, find the rows whose Team ends in "Division":

# "Division" has 8 characters, so it starts 7 positions before the last character
matches <- substr(wins.1980$Team,nchar(wins.1980$Team)-7,nchar(wins.1980$Team)) %in% c("Division")

Then subset the data based on this:

wins.1980 <- subset(wins.1980, !matches)
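The start position in substr() is easy to get wrong by one: "Division" has 8 characters, so it starts at nchar(x) - 7, while nchar(x) - 8 also picks up the space before it and never equals "Division". A small sketch that computes the offset from the suffix itself; the helper name has_suffix is hypothetical:

```r
# Hypothetical helper: TRUE where each element of x ends with `suffix`
has_suffix <- function(x, suffix) {
  n <- nchar(x)
  # Compare the last nchar(suffix) characters with the suffix
  substr(x, n - nchar(suffix) + 1, n) == suffix
}

teams <- data.frame(Team = c("Atlantic Division", "Boston Celtics"),
                    W = c(NA, 61),
                    stringsAsFactors = FALSE)

# The off-by-one version includes the leading space: " Division"
print(substr(teams$Team[1], nchar(teams$Team[1]) - 8, nchar(teams$Team[1])))

matches <- has_suffix(teams$Team, "Division")
teams <- subset(teams, !matches)
print(teams)
```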

Edit: The best example here is fooobar.com/questions/483512 / ...

+1

jprockbelly Apr 24 At 12:10 am


If you like the syntax of the dplyr and magrittr packages:

library(dplyr) ; library(magrittr)
wins.1980 %<>% filter(!grepl("Division", Team))
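For comparison, %<>% is magrittr's assignment pipe, so the line above is shorthand for wins.1980 <- wins.1980 %>% filter(...). A base-R sketch of the same step with subset(), on hypothetical sample data. Note that grepl("Division", Team) matches "Division" anywhere in the name, whereas "Division$" would only match at the end:

```r
teams <- data.frame(Team = c("Atlantic Division", "Boston Celtics",
                             "Central Division", "Atlanta Hawks"),
                    W = c(NA, 61, NA, 50),
                    stringsAsFactors = FALSE)

# Same filter as the dplyr version, written in base R
teams <- subset(teams, !grepl("Division", Team))
print(teams)
```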

+1

Sam firke Apr 24 15 at 12:17 am


Kara woo · Accepted Answer · 2015-04-24T00:01:30+0000

Here's an easier way:

wins.1980[grep("Division$", wins.1980$Team, invert = TRUE), ]

grep("Division$", ...) matches everything in the Team column that ends in "Division" (this is probably safer than selecting anything that ends in "n", but you can do that with the same technique), and invert = TRUE inverts the match, so you get everything that doesn't end in "Division". Using these indices to subset gives you all rows whose Team does not end in "Division".
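One reason to prefer invert = TRUE over the common pattern df[-grep(...), ]: if no row matches, grep() returns integer(0), and negative indexing with an empty vector selects no rows at all, so the whole data frame would be lost. With invert = TRUE, grep() returns every row instead. A small demonstration on hypothetical data that has no division rows:

```r
teams <- data.frame(Team = c("Boston Celtics", "Atlanta Hawks"),
                    W = c(61, 50),
                    stringsAsFactors = FALSE)

# No Team ends in "Division", so grep() finds nothing
idx <- grep("Division$", teams$Team)
print(idx)  # integer(0)

# Negative indexing with integer(0) drops every row
print(nrow(teams[-idx, ]))

# invert = TRUE returns the indices of all non-matching rows instead
print(nrow(teams[grep("Division$", teams$Team, invert = TRUE), ]))
```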

You can wrap this in a function so it can be applied to many data frames:

no_div <- function(x) {
  x[grep("Division$", x$Team, invert = TRUE), ]
}

This assumes you want to subset all of them based on a Team column; if you are using different columns, you will have to change the function to accept an additional argument. Then run it on your data with no_div(wins.1980).
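Following that suggestion, the function can take the column name as an extra argument, and lapply() can then run it over many seasons at once. A sketch; the col argument, its default, and the season data frames below are hypothetical:

```r
# Remove rows whose value in `col` ends in "Division"
no_div <- function(x, col = "Team") {
  x[grep("Division$", x[[col]], invert = TRUE), ]
}

# Hypothetical data frames for two seasons (values are made up)
wins.1980 <- data.frame(Team = c("Atlantic Division", "Boston Celtics"),
                        W = c(NA, 61), stringsAsFactors = FALSE)
wins.1981 <- data.frame(Team = c("Central Division", "Milwaukee Bucks"),
                        W = c(NA, 60), stringsAsFactors = FALSE)

# Apply the same cleaning to every season in a named list
seasons <- list("1980" = wins.1980, "1981" = wins.1981)
cleaned <- lapply(seasons, no_div)
print(cleaned[["1981"]])
```

Keeping the seasons in one named list means adding another year is just one more list element rather than another copy of the code.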
