
Why is "<some string>" >= <a number> TRUE?

This may be a stupid question, but while playing with a subset I ran into this and I can't figure out why it happens. For example, take a string, say "a", and an integer, say 3. Why does this expression return TRUE?

```r
"a" >= 3
[1] TRUE
```



1 answer


When you compare a string with a number, R coerces the number to a string, so 3 becomes "3".
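A quick way to see this coercion in action is to compare the original expression with one where the conversion is done explicitly. This is a minimal sketch; the variable names are illustrative only.

```r
x <- "a"
n <- 3

# When one operand is character, R converts the other with as.character()
# before comparing, so these two expressions are evaluated the same way.
lhs <- x >= n                 # implicit: 3 is coerced to "3"
rhs <- x >= as.character(n)   # explicit coercion
identical(lhs, rhs)           # TRUE

as.character(n)               # "3"
```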

Comparison operators (`<`, `>`, `<=`, `>=`) applied to strings check whether the condition holds in alphabetical order. For example:

```r
> "a" < "b"
[1] TRUE
> "b" > "c"
[1] FALSE
```

This is because, in R's ascending order, a comes before b, which comes before c. Digits usually sort before letters (just look at files sorted by name: those starting with a number come first). That is why "3" sorts before "a" and you get:

```r
"a" >= 3
[1] TRUE
```
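Because the comparison is done on strings, character by character, numbers end up in string order rather than numeric order. The sketch below temporarily switches collation to the C locale (plain byte order) so the output does not depend on your system settings; the variable names are illustrative.

```r
old_collate <- Sys.getlocale("LC_COLLATE")
invisible(Sys.setlocale("LC_COLLATE", "C"))  # byte-order collation

# Digits sort before letters, and "10" sorts before "3" because
# its first character "1" is compared with "3".
sorted <- sort(c("b", "3", "a", "10"))
print(sorted)                # "10" "3" "a" "b"

ten_lt_nine <- "10" < "9"    # TRUE: "1" is compared with "9" first
num_vs_str  <- 10 < "9"      # also TRUE: 10 is coerced to "10"

invisible(Sys.setlocale("LC_COLLATE", old_collate))  # restore settings
```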


Finally, note that your result may vary depending on your locale and how it defines alphabetical order. The R documentation for the comparison operators (`?Comparison`) states:

> Comparison of strings in character vectors is lexicographic within a string, using the collation sequence of the locale in use: see locales. Usually the collation sequence of locales such as en_US is different from C (which should use ASCII) and may be unexpected. Beware of making any assumptions about the sort order: e.g. in Estonian Z is between S and T, and sorting is not necessarily character-by-character: in Danish aa is sorted as one letter, after z. In Welsh ng may or may not be a single sorting unit: if so, it follows g. Some platforms may not respect the locale and always sort in numerical byte order in an 8-bit locale, or in Unicode code-point order for a UTF-8 locale (and may not sort in the same order for the same language in different character sets). Collation of non-letters (spaces, punctuation, hyphens, fractions, etc.) is even more problematic.
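The locale dependence is easy to observe with upper- versus lower-case letters, a case where byte order and typical language collation disagree. A minimal sketch, assuming your session allows switching collation with `Sys.setlocale()`; the result in your own locale is not guaranteed and depends on your platform.

```r
old_collate <- Sys.getlocale("LC_COLLATE")

# Byte-order (ASCII) collation: every upper-case letter precedes every
# lower-case one, since "B" is 0x42 and "a" is 0x61.
invisible(Sys.setlocale("LC_COLLATE", "C"))
c_locale_result <- "B" < "a"       # TRUE

# Back in the session's own locale; in many English locales (e.g. en_US)
# this is typically FALSE, because "a" sorts before "B" there.
invisible(Sys.setlocale("LC_COLLATE", old_collate))
current_locale_result <- "B" < "a"

cat("C locale:", c_locale_result,
    "| current locale:", current_locale_result, "\n")
```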

This matters whenever you use comparison operators on strings, whether or not one side is a number.
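If what you actually want is a numeric comparison, one option is to convert the string explicitly and let non-numeric strings become `NA` instead of silently comparing as text. `ge_numeric()` below is a hypothetical helper for illustration, not a base R function.

```r
# Hypothetical helper (not part of base R): compare a string with a number
# numerically, returning NA when the string does not parse as a number.
ge_numeric <- function(s, n) {
  num <- suppressWarnings(as.numeric(s))  # "a" becomes NA; warning suppressed
  num >= n                                # NA >= 3 is NA, not a misleading TRUE
}

ge_numeric("a", 3)    # NA
ge_numeric("10", 3)   # TRUE: 10 >= 3 numerically
```

The helper is vectorised like `>=` itself, so `ge_numeric(c("5", "x"), 3)` returns `TRUE` for the first element and `NA` for the second.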
