Sorting full names alphabetically by last name in R

Full names (and captions, etc.) usually need to be split across multiple columns in order to sort the strings alphabetically by last name. I've never come across an easy way to achieve this in SQL when one column contains the full name.

However, I know that R has thousands of libraries, and although I have not come across an example that does this without splitting first names, last names, and titles into their own columns, I thought I'd ask whether there is a slightly more efficient way to handle this situation.

In the dataset I work with, there is one column containing the full names. For example:

     Names
1    Robert Johnson                                  
2    Billy Joel                               
3    Donald Fagen                          
4    Trent Reznor                                
5    Wolfgang Mozart
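
For anyone who wants to reproduce this, the table above can be rebuilt roughly as follows. This is only a minimal sketch: it assumes the data frame is called data (the name used in the rest of the code) and that Names is a plain character column rather than a factor.

```r
# Assumed setup: a data frame named `data` with a single character column, `Names`
data <- data.frame(
  Names = c("Robert Johnson", "Billy Joel", "Donald Fagen",
            "Trent Reznor", "Wolfgang Mozart"),
  stringsAsFactors = FALSE  # keep Names as character, not factor
)
```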

      

I need to sort them alphabetically by last name without creating additional columns. I'm not sure whether that is possible, but I did find a workaround that is relatively painless. Fortunately, every name follows the pattern "first name" (space) "last name", so I can use separate() from the tidyr library to isolate the last names easily:

library(tidyverse)
library(magrittr)

# Separate into "first name" and "last name" columns
data %<>% separate(Names, c('first_name', 'last_name'), sep = ' ')

    first_name       last_name
1     Robert           Johnson                                    
2     Billy            Joel                                    
3     Donald           Fagen                                    
4     Trent            Reznor
5     Wolfgang         Mozart

      

Then I can sort by the new last name column alphabetically using arrange() and immediately rebuild the original column using unite():

# Arrange rows alphabetically by last name
data %<>% arrange(last_name)

# Rebuild original column and dissolve temporary 2nd column
data %>% unite(Names, first_name:last_name, sep=' ')

      

This successfully rebuilds the original table with the Names column alphabetized by last name. Is there any other way to achieve this without ever (even temporarily) creating that second "last name" column? Answers using any additional R libraries are welcome. Thank you!



2 answers


The tidyverse function to use here is str_extract from the stringr package. It's also a little simpler than gsub or str_replace, since you don't need to replace the matched part of the string with "".



library(tidyverse)
library(stringr)

# Sort on everything after the first space (the last name), without adding a column
data %>%
    arrange(str_extract(Names, '\\s.*$'))
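
To see why this works, it can help to look at the key that arrange() actually sorts on. The sketch below rebuilds the five-name data frame from the question so that it runs on its own (the construction of data is an assumption about how the table is stored). It also shows word() from stringr as one possible variant for names with more than two parts; that variant is my own illustration, not part of the answer above.

```r
library(dplyr)
library(stringr)

# Assumed setup: the question's table as a one-column data frame
data <- data.frame(
  Names = c("Robert Johnson", "Billy Joel", "Donald Fagen",
            "Trent Reznor", "Wolfgang Mozart"),
  stringsAsFactors = FALSE
)

# The sort key: everything from the first whitespace to the end of the string
keys <- str_extract(data$Names, "\\s.*$")

# Sort on that key directly; no extra column is ever added to `data`
sorted <- data %>% arrange(str_extract(Names, "\\s.*$"))

# Illustrative variant: sort on the last space-separated word only,
# which may behave better when some names include a middle name
sorted_last_word <- data %>% arrange(word(Names, -1))
```

Note that the extracted key keeps its leading space (for example " Johnson"); since every key has it, the ordering is unaffected.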

      



You can do this with dplyr and a simple call to gsub.



library(dplyr)
# Strip everything up to the last space, then sort on what remains (the last name)
data %>%
  arrange(gsub(".*\\s", "", Names))

            Names
1    Donald Fagen
2      Billy Joel
3  Robert Johnson
4 Wolfgang Mozart
5    Trent Reznor
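
The same idea should also work without dplyr. As a rough sketch, assuming the same one-column data frame (rebuilt here so the block is self-contained), base R's order() can be given the stripped last names directly:

```r
# Assumed setup: the question's table as a one-column data frame
data <- data.frame(
  Names = c("Robert Johnson", "Billy Joel", "Donald Fagen",
            "Trent Reznor", "Wolfgang Mozart"),
  stringsAsFactors = FALSE
)

# Remove everything up to and including the last whitespace,
# then reorder the rows by what is left (the last name)
last_names <- sub(".*\\s", "", data$Names)
sorted <- data[order(last_names), , drop = FALSE]  # drop = FALSE keeps a data frame

# Renumber the rows 1..n so the printout matches the dplyr result
rownames(sorted) <- NULL
```

Here drop = FALSE matters because data has only one column; without it the subset would collapse to a plain character vector.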

      
