How to sort by Library of Congress Classification (LCC) number in R

Library of Congress Classification numbers are used in libraries to give call numbers to items so that they can be ordered on the shelf. They can be simple or quite complex, with a few required parts but many optional ones. (See "Entering Call Numbers in 050" on "050 Library of Congress Call Number" for how they break down, or lc_callnumber for a Ruby tool that sorts them.)

I would like to sort by LCC number in R. I looked at "Sorting a list of non-trivial elements in R" and "Sorting a list of custom class elements in R?" but couldn't work out how to apply them.

Here are four call numbers, entered in sorted order:

call_numbers <- c("QA 7 H3 1992", "QA 76.73 R3 W53 2015", "QA 90 H33 2016", "QA 276.45 R3 A35 2010")

      

sort sorts them by character, so 276 < 7 < 76.73 < 90.

> sort(call_numbers)
[1] "QA 276.45 R3 A35 2010" "QA 7 H3 1992"          "QA 76.73 R3 W53 2015"  "QA 90 H33 2016"       

      

To sort them correctly, I think I would need to define a class and then some methods on it, for example:

library(stringr)
class(call_numbers) <- "LCC"

## Just pick out the letters and digits for now, leave the rest
## until sorting works, then work down more levels.
lcc_regex <- '([[:alpha:]]+?) ([[:digit:]\\.]+?) (.*)'

"<.LCC" <- function(x, y) {
    x_lcc <- str_match(x, lcc_regex)
    y_lcc <- str_match(y, lcc_regex)
    if(x_lcc[2] < y_lcc[2]) return(x)
    if(as.integer(x_lcc[3]) < as.integer(y_lcc[3])) return(x)
}
"==.LCC" <- function(x, y) {
    x_lcc <- str_match(x, lcc_regex)
    y_lcc <- str_match(y, lcc_regex)
    x_lcc[2] == y_lcc[2] && x_lcc[3] == y_lcc[3]
}

">.LCC" <- function(x, y) {
    x_lcc <- str_match(x, lcc_regex)
    y_lcc <- str_match(y, lcc_regex)
    if(x_lcc[2] > y_lcc[2]) return(x)
    if(as.integer(x_lcc[3]) > as.integer(y_lcc[3])) return(x)
}

      

This does not change the sort order. I haven't defined the subset ("[.myclass") method because I have no idea what it should be.



3 answers


mixedsort from the gtools package (a CRAN package, not part of base R) turns out to do just the trick:

library(gtools)
call_numbers <- c("QA 7 H3 1992", "QA 76.73 R3 W53 2015", "QA 90 H33 2016", "QA 276.45 R3 A35 2010")
mixedsort(call_numbers)
## [1] "QA 7 H3 1992"          "QA 76.73 R3 W53 2015"  "QA 90 H33 2016"        "QA 276.45 R3 A35 2010"

      



mixedorder can also be used to sort a data frame by one column.

This is a special case of what was answered earlier in How to sort a character vector where elements contain letters and numbers in R?
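For the data-frame case, a minimal sketch might look like the following. The data frame books and its shelf column are made up for illustration; only mixedorder comes from gtools.

```r
library(gtools)

# Hypothetical data frame, for illustration only
books <- data.frame(
  call_number = c("QA 276.45 R3 A35 2010", "QA 7 H3 1992",
                  "QA 90 H33 2016", "QA 76.73 R3 W53 2015"),
  shelf = c("D", "A", "C", "B"),
  stringsAsFactors = FALSE
)

# mixedorder() returns a row permutation, used the same way as order()
sorted_books <- books[mixedorder(books$call_number), ]
```

One thing to watch: mixedorder compares each run of digits as a whole number, so it would put H4 before H33, whereas Cutter numbers file as decimals (H33 before H4). That does not matter for these four call numbers, but it may for a larger collection.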



This might be an easier approach. It assumes that each call number has the following format: 2-letter code, space, number, space, letter-number, space ... year.

The strategy is to split each LOC number on the spaces, extract the first three fields as three columns of data, and then sort on each column in turn using the order function.



call_numbers <- c("QA 7 H3 1992", "QA 76.73 R3 W53 2015", "QA 90 H33 2016", "QA 276.45 R3 A35 2010")

# split on the spaces (names chosen so base R's split() and letters are not masked)
parts <- strsplit(call_numbers, " ")
# retrieve the 2-letter code
class_letters <- sapply(parts, function(x) x[1])
# retrieve the 2nd group (the class number) and convert it to numeric for sorting
class_number <- sapply(parts, function(x) as.numeric(x[2]))
# obtain the 3rd group
third <- sapply(parts, function(x) x[3])
# find the year (the last field; not used in the sort below)
year <- sapply(parts, function(x) x[length(x)])

# sort on the letters, then the class number, then the 3rd group
call_numbers[order(class_letters, class_number, third)]

      

For this limited dataset, the technique works.



I feel like I spent too much time figuring out a solution to exactly what you are trying to do - only mine was for JavaScript. But it basically comes down to the notion of "normalizing" these numbers so that they can be sorted alphabetically.

Maybe this solution can be used and ported to R. At least hopefully this can get you started. It includes some regexes and some additional scripting to get the call numbers into a state in which they can be sorted.

https://github.com/rayvoelker/js-loc-callnumbers/blob/master/locCallClass.js
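To give a rough idea of what "normalizing" could look like in R, here is a minimal base R sketch. It is not a port of the script linked above. lcc_sort_key is a hypothetical helper name, and the sketch assumes every call number starts with class letters followed by a class number, with everything after that compared as plain text.

```r
# Hypothetical helper: turn a call number into a string that sorts correctly.
# Assumes the form "<class letters> <class number> <rest>".
lcc_sort_key <- function(x) {
  pattern <- "^([[:alpha:]]+) *([[:digit:]]+\\.?[[:digit:]]*) *(.*)$"
  parts <- regmatches(x, regexec(pattern, x))
  vapply(parts, function(p) {
    if (length(p) == 0) return(NA_character_)  # does not look like a call number
    # Zero-pad the class number to a fixed width so that 7 < 76.73 < 276.45
    # also holds when the keys are compared as text
    class_number <- sprintf("%010.4f", as.numeric(p[3]))
    paste(toupper(p[2]), class_number, p[4])
  }, character(1))
}

call_numbers <- c("QA 276.45 R3 A35 2010", "QA 7 H3 1992",
                  "QA 90 H33 2016", "QA 76.73 R3 W53 2015")
keys <- lcc_sort_key(call_numbers)

# method = "radix" compares strings bytewise (C locale), so the result
# does not depend on the session's collation settings
sorted <- call_numbers[order(keys, method = "radix")]
```

Comparing the remainder as text happens to suit Cutter numbers, which file as decimals (H33 before H4), but anything more unusual in a call number would need extra rules.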

Good luck!
