Importing JSON into R with embedded quotes

I am trying to read a JSON file ("my_file.json") in R. The file contains the following:

[{"id":"484","comment":"They call me "Bruce""}]

      

With the jsonlite package (0.9.12), the following:

library(jsonlite)
fromJSON(readLines('~/my_file.json'))

      

gets the error:

"Error in parseJSON(txt) : lexical error: invalid char in json text.
84","comment":"They call me "Bruce""}]
           (right here) ------^"

      

Here is how R displays the file contents (as an escaped string literal):

readLines('~/my_file.json')

"[{\"id\":\"484\",\"comment\":\"They call me \"Bruce\"\"}]"

      

Removing quotes around "Bruce" solves the problem, as in:

my_file.json

[{"id":"484","comment":"They call me Bruce"}]

      

But what is the problem with the escaping?



1 answer


In R, string literals can be written with either single or double quotes, e.g.:

s1 <- 'hello'
s2 <- "world"

      

Of course, if you want to include double quotes in a string literal delimited by double quotes, you have to escape the inner quotes with a backslash; otherwise the R parser cannot tell where the string ends (the same is true for single quotes inside a single-quoted literal), e.g.:

s1 <- "Hello, my name is \"John\""

      

If you print this string to the console (using cat¹) or write it to a file, you get the actual contents of the string, not its R literal representation, that is:

> cat("Hello, my name is \"John\"")
Hello, my name is "John"
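To make the difference between the two views concrete, here is a minimal sketch using only base R functions. The variable name s is just for illustration.

```r
s <- "Hello, my name is \"John\""

# print() shows the R literal: inner quotes appear escaped
print(s)

# cat() and writeLines() show the actual characters of the string
cat(s, "\n")
writeLines(s)

# The backslashes are not part of the string itself: it has 24 characters
nchar(s)
```

The backslash exists only in the source code literal; it never reaches a file written from this string.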

      

The JSON parser reads the actual contents of the string, so in your case it sees:

[{"id":"484","comment":"They call me "Bruce""}]

      

and not the R literal representation:

"[{\"id\":\"484\",\"comment\":\"They call me \"Bruce\"\"}]" 

      

That being said, the JSON parser, just like the R parser, requires double quotes inside a string to be escaped.

Hence the content of your file should be changed to:

[{"id":"484","comment":"They call me \"Bruce\""}]

      



If you just modify your file by adding a backslash, you should be able to read the json.

Note that the corresponding R-literal representation of this string would be:

"[{\"id\":\"484\",\"comment\":\"They call me \\\"Bruce\\\"\"}]"

      

and parsing it actually works:

> fromJSON("[{\"id\":\"484\",\"comment\":\"They call me \\\"Bruce\\\"\"}]")
   id              comment
1 484 They call me "Bruce"
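The same check can be run against a file rather than a literal. The sketch below assumes jsonlite is installed and uses a temporary file as a hypothetical stand-in for the corrected my_file.json; validate() and fromJSON() are jsonlite functions.

```r
library(jsonlite)

# Hypothetical temp file standing in for the corrected my_file.json.
# In a single-quoted R literal, \\" writes a backslash followed by a quote.
path <- tempfile(fileext = ".json")
writeLines('[{"id":"484","comment":"They call me \\"Bruce\\""}]', path)

txt <- readLines(path)

# The original, unescaped content is rejected by the parser...
bad <- '[{"id":"484","comment":"They call me "Bruce""}]'
bad_ok <- isTRUE(as.logical(validate(bad)))

# ...while the corrected file content is accepted and parsed
good_ok <- isTRUE(as.logical(validate(txt)))
df <- fromJSON(txt)
cat(df$comment, "\n")
```

Checking with validate() first is optional; it is used here only to show that the unescaped text fails and the escaped text passes.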

      


¹ R's default print function (also invoked when you type a value and press ENTER) shows the corresponding R string literal. If you want to print the actual string, you need to use print(stringToPrint, quote = FALSE) or the cat function.


EDIT (in reply to @EngrStudent's comment about automating the escaping of quotes):

A JSON parser cannot escape the quotes for you automatically.
Put yourself in the computer's shoes and imagine you have to parse this (unescaped) string as JSON:
{ "foo1" : " : "foo2" : "foo3" }

I see at least three possible escapings that each give valid JSON:
{ "foo1" : " : \"foo2\" : \"foo3" }


{ "foo1\" : " : "foo2\" : \"foo3" }


{ "foo1\" : \" : \"foo2" : "foo3" }

As you can see from this small example, escaping is really necessary to avoid ambiguity.
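The three candidates can also be checked mechanically. This is a small sketch assuming jsonlite is installed; the vector name candidates is only illustrative.

```r
library(jsonlite)

# The three escaped variants from the answer, written as single-quoted
# R literals (so \\" stands for a backslash followed by a quote)
candidates <- c(
  '{ "foo1" : " : \\"foo2\\" : \\"foo3" }',
  '{ "foo1\\" : " : "foo2\\" : \\"foo3" }',
  '{ "foo1\\" : \\" : \\"foo2" : "foo3" }'
)

# Each variant parses, but to a different key/value pair
keys <- character(0)
for (txt in candidates) {
  parsed <- fromJSON(txt)
  keys <- c(keys, names(parsed))
  cat("key:  ", names(parsed), "\n")
  cat("value:", parsed[[1]], "\n\n")
}
```

All three inputs are valid JSON, yet each yields a different key, which is why a parser has no principled way to pick one.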

If the strings you need to fix have a specific structure that lets you recognize, without ambiguity, which double quotes must be escaped, you can write your own automatic escaping routine, but you have to start from scratch because nothing built-in does this.
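As an illustration of such a routine, here is a rough base R sketch. It is not part of jsonlite, and the function name escape_inner_quotes is hypothetical. It assumes compact JSON (no whitespace around separators) and string values that never contain a quote directly next to one of the characters { [ , : } ]. Under those assumptions, any quote that is not adjacent to a structural character is treated as an inner quote.

```r
# Hypothetical helper: escape quotes that are not adjacent to JSON
# structural characters. Assumes compact JSON and "well-behaved" values.
escape_inner_quotes <- function(txt) {
  # A quote is left alone if it is preceded by { , : [ or a backslash,
  # or followed by : , } ]. Everything else gets a backslash prepended.
  pattern <- '(?<![{,:\\[\\\\])"(?![:,}\\]])'
  gsub(pattern, '\\\\"', txt, perl = TRUE)
}

broken <- '[{"id":"484","comment":"They call me "Bruce""}]'
fixed <- escape_inner_quotes(broken)
cat(fixed, "\n")
```

The heuristic fails as soon as a value contains something like "hi", inside it, because the quote before the comma then looks structural. That is exactly the ambiguity described above, so a routine like this is only safe for data whose shape you control.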
