A repeatable way to remove rows from a data frame that end with a specific character

I have a small data frame and, as the title says, I would like to remove all rows that end in a specific letter, "n".

Here is the code that will give you the data I'm working with:

url = "http://www.basketball-reference.com/leagues/NBA_1980.html"

library(XML)
x1 = readHTMLTable(url)

east.1980 = x1[["E_standings"]]
west.1980 = x1[["W_standings"]]

east.1980 = east.1980[c(1,2)]
west.1980 = west.1980[c(1,2)]

names(east.1980) = c("Team", "W")
names(west.1980) = c("Team", "W")

wins.1980 = rbind(east.1980, west.1980)

wins.1980$Team = gsub("\\b\\d+\\b", "", wins.1980$Team)
wins.1980$Team = gsub(" +"," ",gsub("^ +","",gsub("[^a-zA-Z0-9 ]","",wins.1980$Team)))

View(wins.1980)

      

Here's an example of what a data frame would look like:

                Team    W
1   Atlantic Division    
2   Boston Celtics      61
3   Philadelphia 76ers  59
4   Washington Bullets  39
5   New York Knicks     39
6   New Jersey Nets     34
7   Central Division     
8   Atlanta Hawks       50
9   Houston Rockets     41
10  San Antonio Spurs   41
11  Indiana Pacers      37
12  Cleveland Cavaliers 37
13  Detroit Pistons     16
14  Midwest Division     
15  Milwaukee Bucks     49
16  Kansas City Kings   47
17  Denver Nuggets      30

      

Basically, I want to remove the division rows ("Atlantic Division", "Central Division", etc.). It just so happens that all of these rows end with "n", so I'm trying to write a for loop that deletes every row where wins.1980$Team ends with "n".

I want to repeat this process for over 30 years of data, so repeatability is a must.

Here are the two loops I've tried so far:

library(stringr)  # provides str_sub()

for (i in 1:nrow(wins.1980)) {
  if ((str_sub(wins.1980$Team[i], -1)) == "n") {
    eval(parse(text=paste0("wins.","1980","[-", i, ",]")))
  }
}




for (i in 1:nrow(wins.1980)) {
  if ((str_sub(wins.1980$Team[i], -1)) == "n") {    
   wins.1980[-i,]                      
  }
}

      

I have used a for loop with if ((str_sub(myData$Column[i], -1)) == "letter") before to do something when the last character was equal to "letter", so I'm pretty sure that part of the loop works.

Since there are only 6 divisions in the NBA, I would be fine with something repeatable that says if (wins.1980$Team == "Atlantic Division" | "Midwest Division" | etc...) and then deletes that row. However, I don't think the problem in my loop is choosing the correct rows, just deleting them.

I don't get any errors when I run either of the loops above. They run, but the result doesn't seem to be saved anywhere.

Starting from the data frame above, I would like the result to look like this:

                 Team   W
2   Boston Celtics      61
3   Philadelphia 76ers  59
4   Washington Bullets  39
5   New York Knicks     39
6   New Jersey Nets     34
8   Atlanta Hawks       50
9   Houston Rockets     41
10  San Antonio Spurs   41
11  Indiana Pacers      37
12  Cleveland Cavaliers 37
13  Detroit Pistons     16
15  Milwaukee Bucks     49
16  Kansas City Kings   47
17  Denver Nuggets      30

      

Again, I would like to be able to repeat this over many other data frames. Any ideas?

I'm new to R, so I may be overlooking simpler solutions, and simplicity would be much appreciated! Thanks in advance!



4 answers


Here's an easier way:

wins.1980[grep("Division$", wins.1980$Team, invert = TRUE), ]

      

grep("Division$", ...) matches anything in the column Team that ends in Division (this is probably safer than selecting anything that ends in n, but you can do that with the same technique), and invert = TRUE inverts the match so you get everything that doesn't end in Division. Subsetting with this gives you all rows where Team does not end in Division.
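To see the two pieces in isolation, here is a small sketch on a made-up data frame (the `toy` name and its values are assumptions for illustration, not the scraped data). It also shows why `invert = TRUE` is a safer habit than negating the result of `grep` with a minus sign: when nothing matches, the negated index is empty and selects no rows at all.

```r
# Toy data standing in for wins.1980 (made up for illustration)
toy <- data.frame(
  Team = c("Atlantic Division", "Boston Celtics", "Central Division", "Atlanta Hawks"),
  W    = c("", "61", "", "50"),
  stringsAsFactors = FALSE
)

# Positions of the rows whose Team ends in "Division"
hits <- grep("Division$", toy$Team)

# Positions of the rows whose Team does NOT end in "Division"
keep <- grep("Division$", toy$Team, invert = TRUE)

teams_only <- toy[keep, ]

# With a pattern that matches nothing, negative indexing drops every row,
# while invert = TRUE keeps every row
none_neg    <- toy[-grep("Conference$", toy$Team), ]
none_invert <- toy[grep("Conference$", toy$Team, invert = TRUE), ]
```

The original row numbers are kept as row names in `teams_only`, which matches the desired output shown in the question.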



You can turn this into a function that applies to many data frames:

no_div <- function(x) {
  x[grep("Division$", x$Team, invert = TRUE), ]
}

      

This assumes you want to subset them all based on a column named Team; if you are using different columns, you will have to change the function to accept an additional argument. Then call it on your data with no_div(wins.1980).
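One way to add that extra argument, as a sketch only: the function name `drop_rows` and the parameter names `col` and `pattern` below are my own choices, not part of the answer above, and the two small data frames are placeholders rather than real scraped standings. Holding each season's data frame in a named list lets `lapply` clean them all in one call.

```r
# Drop rows whose value in column `col` matches `pattern`.
# `drop_rows`, `col` and `pattern` are hypothetical names for illustration.
drop_rows <- function(x, col = "Team", pattern = "Division$") {
  x[grep(pattern, x[[col]], invert = TRUE), , drop = FALSE]
}

# Placeholder stand-ins for two seasons of scraped standings
seasons <- list(
  season_a = data.frame(Team = c("Atlantic Division", "Boston Celtics"),
                        W = c("", "61"), stringsAsFactors = FALSE),
  season_b = data.frame(Team = c("Midwest Division", "Milwaukee Bucks"),
                        W = c("", "49"), stringsAsFactors = FALSE)
)

# Apply the same cleaning step to every season at once
cleaned <- lapply(seasons, drop_rows)
```

Passing a different pattern, for example `drop_rows(wins.1980, pattern = "n$")`, would reproduce the "ends with n" rule from the question.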



You can use grepl like this:

df <- data.frame(Team=c("Boston Celtics","Atlantic Division",
                        "Central Division","Atlanta Hawks"),
                 W=sample(10:20, 4))

df <- df[!grepl("n$", df$Team),]

      



where "n$" is a regular expression meaning "string ends with n".
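The assignment back to the data frame on the filtering line is the step the loops in the question are missing: indexing a data frame returns a new object and leaves the original untouched unless the result is stored. A minimal sketch on made-up data (the `standings` name and its values are assumptions for illustration):

```r
standings <- data.frame(
  Team = c("Atlantic Division", "Boston Celtics", "Central Division"),
  W    = c("", "61", ""),
  stringsAsFactors = FALSE
)

# Evaluating the subset without assigning it changes nothing
standings[!grepl("n$", standings$Team), ]
rows_before <- nrow(standings)   # still 3

# Logical vector: TRUE where Team ends in "n"
ends_in_n <- grepl("n$", standings$Team)

# Storing the result is what actually removes the rows
standings <- standings[!ends_in_n, ]
rows_after <- nrow(standings)    # now 1
```

Even with an assignment, deleting rows one index at a time inside a loop shifts the remaining rows, so the vectorised form is usually the simpler route.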



For this you can use substrings and subsets.

First, find the rows that end in Division:

# "Division" has 8 characters, so take the last 8 (positions n-7 to n)
matches <- substr(wins.1980$Team, nchar(wins.1980$Team) - 7, nchar(wins.1980$Team)) %in% c("Division")

      

Then subset the data based on this:

wins.1980 <- subset(wins.1980, !matches)

      

Edit: The best example here is fooobar.com/questions/483512 / ...
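A shorter way to express the same suffix test, assuming R 3.3.0 or later, is base R's `endsWith`, which avoids counting characters by hand. This is a swapped-in alternative rather than this answer's own method, and the `toy` data frame is made up for illustration:

```r
toy <- data.frame(
  Team = c("Midwest Division", "Milwaukee Bucks", "Denver Nuggets"),
  W    = c("", "49", "30"),
  stringsAsFactors = FALSE
)

# TRUE for rows whose Team ends with the literal string "Division"
is_division <- endsWith(toy$Team, "Division")

# Keep only the rows that are not division headers
toy <- subset(toy, !is_division)
```

`endsWith` expects a character vector, so a factor column would need `as.character()` first.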



If you like the syntax of the dplyr and magrittr packages:

library(dplyr) ; library(magrittr)
wins.1980 %<>% filter(!grepl("Division", Team))

      
