Displaying content in local language: R
I am trying to download data from a website that contains content in both English and local languages (not English). I was able to get the data that was in English, but for the content that was in the local language I got something like below. My question is, how do I display both?
X1 X2 X3
NA
1 <U+0926><U+094B><U+0932><U+0916><U+093E> <U+0915><U+093E><U+0932><U+093F><U+0928><U+094D><U+091A><U+094B><U+0915> <U+0917><U+093E><U+0909><U+0901><U+092A><U+093E><U+0932><U+093F><U+0915><U+093E>
2 <U+0926><U+094B><U+0932><U+0916><U+093E> <U+0915><U+093E><U+0932><U+093F><U+0928><U+094D><U+091A><U+094B><U+0915> <U+0917><U+093E><U+0909><U+0901><U+092A><U+093E><U+0932><U+093F><U+0915><U+093E>
3 <U+0926><U+094B><U+0932><U+0916><U+093E> <U+0915><U+093E><U+0932><U+093F><U+0928><U+094D><U+091A><U+094B><U+0915> <U+0917><U+093E><U+0909><U+0901><U+092A><U+093E><U+0932><U+093F><U+0915><U+093E>
4 <U+0926><U+094B><U+0932><U+0916><U+093E> <U+0915><U+093E><U+0932><U+093F><U+0928><U+094D><U+091A><U+094B><U+0915> <U+0917><U+093E><U+0909><U+0901><U+092A><U+093E><U+0932><U+093F><U+0915><U+093E>
5 <U+0926><U+094B><U+0932><U+0916><U+093E> <U+0915><U+093E><U+0932><U+093F><U+0928><U+094D><U+091A><U+094B><U+0915> <U+0917><U+093E><U+0909><U+0901><U+092A><U+093E><U+0932><U+093F><U+0915><U+093E>
6 <U+0926><U+094B><U+0932><U+0916><U+093E> <U+0915><U+093E><U+0932><U+093F><U+0928><U+094D><U+091A><U+094B><U+0915> <U+0917><U+093E><U+0909><U+0901><U+092A><U+093E><U+0932><U+093F><U+0915><U+093E>
7 <U+0926><U+094B><U+0932><U+0916><U+093E> <U+0915><U+093E><U+0932><U+093F><U+0928><U+094D><U+091A><U+094B><U+0915> <U+0917><U+093E><U+0909><U+0901><U+092A><U+093E><U+0932><U+093F><U+0915><U+093E>
8 <U+0926><U+094B><U+0932><U+0916><U+093E> <U+0915><U+093E><U+0932><U+093F><U+0928><U+094D><U+091A><U+094B><U+0915> <U+0917><U+093E><U+0909><U+0901><U+092A><U+093E><U+0932><U+093F><U+0915><U+093E>
9 <U+0926><U+094B><U+0932><U+0916><U+093E> <U+0915><U+093E><U+0932><U+093F><U+0928><U+094D><U+091A><U+094B><U+0915> <U+0917><U+093E><U+0909><U+0901><U+092A><U+093E><U+0932><U+093F><U+0915><U+093E>
10 <U+0926><U+094B><U+0932><U+0916><U+093E> <U+0915><U+093E><U+0932><U+093F><U+0928><U+094D><U+091A><U+094B><U+0915> <U+0917><U+093E><U+0909><U+0901><U+092A><U+093E><U+0932><U+093F><U+0915><U+093E>
source to share
You probably have the text you want, it just doesn't display correctly.
I can reproduce your problem. Your example data had the same lines 10 times. To keep the display sane, I only repeat three times.
## Hex codes from your example
S1 = c("0926", "094B", "0932", "0916", "093E")
S2 = c("0915", "093E", "0932", "093F", "0928", "094D", "091A", "094B", "0915")
S3 = c("0917", "093E", "0909", "0901", "092A", "093E", "0932", "093F", "0915", "093E")
## Convert to Devanagari strings
X1 = rep(intToUtf8(strtoi(S1, base=16L)), 3)
X2 = rep(intToUtf8(strtoi(S2, base=16L)), 3)
X3 = rep(intToUtf8(strtoi(S3, base=16L)), 3)
df = data.frame(X1, X2, X3, stringsAsFactors=FALSE)
Will now X1
display correctly, but df
not
In Bizarrely, df$X1
and will df[,1]
display unicode, but df[1, ]
not.
The workaround is to as.matrix(df)
display the whole thing as Unicode characters.
This is apparently a known bug in the Windows version of RGui. Some other research on this can be found on this Earlier SO question and this Mailing list post
Adding
Writing these lines in a readable Unicode file requires some care. This created a csv file for my example.
Mat = as.matrix(df)
F <- file("Test1.csv", "wb", encoding="UTF-8")
BOM <- charToRaw('\xEF\xBB\xBF')
writeBin(BOM, F)
for(r in 1:nrow(Mat)) {
Line = paste(Mat[r,], collapse=",")
writeLines(Line, F, useBytes=T)
}
close(F)
source to share