Sorting full names alphabetically by last name in R
Full names (and captions, etc.) usually need to be split across multiple columns in order to sort the strings alphabetically by last name. I've never come across an easy way to achieve this in SQL when a single column contains the full name.
However, R has thousands of libraries, and although I have not come across examples that do this without splitting first names, last names, and titles into their own columns, I thought I'd ask whether there is a slightly more efficient way to handle this situation.
In the dataset I work with, there is one column containing the full names. For example:
Names
1 Robert Johnson
2 Billy Joel
3 Donald Fagen
4 Trent Reznor
5 Wolfgang Mozart
I need to sort them alphabetically by last name without creating additional columns. I'm not sure whether this is possible, but I did find a workaround that is relatively painless. Fortunately, every name follows the pattern "first name" (space) "last name", so I can use separate() from the tidyr library to isolate the surnames easily:
library(tidyverse)
library(magrittr)
# Separate into "first name" and "last name" columns
data %<>% separate(Names, c('first_name', 'last_name'), sep = ' ')
first_name last_name
1 Robert Johnson
2 Billy Joel
3 Donald Fagen
4 Trent Reznor
5 Wolfgang Mozart
Then I can sort the new last_name column alphabetically using arrange() and immediately rebuild the original column using unite():
# Arrange rows alphabetically by last name
data %<>% arrange(last_name)
# Rebuild original column and dissolve temporary 2nd column
data %>% unite(Names, first_name:last_name, sep=' ')
This successfully rebuilds the original table, with the Names column alphabetized by last name. Is there any other way to achieve this without ever (even temporarily) creating that second "last name" column? Any additional R libraries are welcome. Thank you!
The tidyverse function to use here is str_extract from the stringr package. It's also a little easier than gsub or str_replace, since you don't need to replace the unwanted part of the string with "".
library(tidyverse)
library(stringr)
# Sort by the text from the first space onward (the last name); no new column is created
data %>%
  arrange(str_extract(Names, '\\s.*$'))
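For what it's worth, here is a minimal self-contained sketch of the same idea, using the five sample names from the question so it can be run as is. The object names sorted and sorted_last_word are my own, and the second pattern ('\\S+$') is an assumed variant rather than part of the approach above: it sorts on the last word only, which may be closer to what you want if some names contain a middle name.

```r
library(dplyr)
library(stringr)

# Sample data copied from the question
data <- data.frame(
  Names = c("Robert Johnson", "Billy Joel", "Donald Fagen",
            "Trent Reznor", "Wolfgang Mozart"),
  stringsAsFactors = FALSE
)

# Sort on the text from the first space onward; Names stays the only column
sorted <- data %>%
  arrange(str_extract(Names, "\\s.*$"))

# Assumed variant: sort on the last run of non-space characters instead,
# so anything before the final word (e.g. a middle name) is ignored
sorted_last_word <- data %>%
  arrange(str_extract(Names, "\\S+$"))

print(sorted)
```

With this sample data both versions order the rows as Donald Fagen, Billy Joel, Robert Johnson, Wolfgang Mozart, Trent Reznor, and the data frame still has just the Names column.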