PCDATA invalid Char in R

First of all, I'm sorry if this is a duplicate question. I've been searching for hours and have found solutions for PHP and other languages, but not for R.

I am fetching data from the last.fm site using their API. You need an API key to fetch the data I'm trying to get, so to make it easier I've included a sample response below.

Here is my problem: at some point while fetching data, I run into an error that stops my request. I got past it once, but it keeps coming back. The error is always the same: PCDATA invalid Char value #

Here's an example:

string = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<lfm status=\"ok\">\n<results for=\"a\" xmlns:opensearch=\"http://a9.com/-/spec/opensearch/1.1/\">\n<opensearch:Query role=\"request\" searchTerms=\"a\" startPage=\"1382\" />\n<opensearch:totalResults>212588</opensearch:totalResults>\n<opensearch:startIndex>1381</opensearch:startIndex>\n<opensearch:itemsPerPage>1</opensearch:itemsPerPage><artistmatches>\n<artist>\n    <name>!B0A \0348E09;&gt;2</name>\n                <listeners>1672</listeners>\n                <mbid></mbid>\n                        <url>http://www.last.fm/music/!B0A+%1C8E09;%3E2</url>\n    <streamable>0</streamable>\n            <image size=\"small\">http://userserve-ak.last.fm/serve/34/88015017.png</image>\n        <image size=\"medium\">http://userserve-ak.last.fm/serve/64/88015017.png</image>\n        <image size=\"large\">http://userserve-ak.last.fm/serve/126/88015017.png</image>\n        <image size=\"extralarge\">http://userserve-ak.last.fm/serve/252/88015017.png</image>\n        <image size=\"mega\">http://userserve-ak.last.fm/serve/_/88015017/B0A+8E092+15286997.png</image>\n    </artist></artistmatches>\n</results></lfm>\n"

      

When I try to parse this text, I get an error:

library(XML)
doc = xmlParse(string, asText = TRUE)
PCDATA invalid Char value 28
Error: 1: PCDATA invalid Char value 28

      

I believe the error comes from this part of the string:

<name>!B0A \0348E09;&gt;2</name>\n 

      

But I cannot be sure.
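One way to check this suspicion is to look at the character codes in that fragment. The following is only a minimal sketch in base R: the variable names are made up for illustration, and it assumes that the value 28 in the error message is a decimal character code (the octal escape \034 is 3*8 + 4 = 28).

```r
# Fragment of the <name> element, with the entity &gt; written out as ">".
suspect <- "!B0A \0348E09;>2"

# Convert the string to its integer code points.
codes <- utf8ToInt(suspect)

# Keep control characters below 32, except tab (9), LF (10) and CR (13),
# which XML 1.0 does allow.
bad <- codes[codes < 32 & !(codes %in% c(9, 10, 13))]
print(bad)
```

If the only code reported matches the number in the error message, the \034 escape is very likely the character the parser rejects.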

What I am looking for is one of the following solutions. The first would be ideal, but any of the others would make me happy:

1 - Allow R to accept these invalid characters.

2 - Eliminate the invalid characters and continue parsing without stopping.

3 - Skip the line with the invalid characters and continue parsing.

4 - Create a function that finds invalid characters, so I can apply it when fetching data from last.fm.

I hope the question is clear and that you can help me with this. Thanks in advance.
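To make options 2 and 4 concrete, here is a rough sketch of what such helpers might look like in base R. The function names `find_invalid_xml_chars` and `strip_invalid_xml_chars` are hypothetical (they do not come from the XML package), and the pattern only covers the ASCII control characters that XML 1.0 forbids, not every invalid code point.

```r
# Hypothetical helpers; names are illustrative, not part of any package.
# Control characters disallowed in XML 1.0: everything below 0x20
# except tab (0x09), line feed (0x0A) and carriage return (0x0D).
invalid_xml_pattern <- "[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F]"

# Option 4: return the positions of invalid characters in a single string.
find_invalid_xml_chars <- function(x) {
  m <- gregexpr(invalid_xml_pattern, x, perl = TRUE)[[1]]
  if (m[1] == -1) return(integer(0))  # no match found
  as.integer(m)                       # drop the match attributes
}

# Option 2: remove the invalid characters and keep everything else.
strip_invalid_xml_chars <- function(x) {
  gsub(invalid_xml_pattern, "", x, perl = TRUE)
}

sample_xml <- "<name>!B0A \0348E09;&gt;2</name>"
positions <- find_invalid_xml_chars(sample_xml)
cleaned <- strip_invalid_xml_chars(sample_xml)
print(positions)
print(cleaned)
```

The cleaned string could then be passed to xmlParse(cleaned, asText = TRUE). Note that this silently changes the artist name, so it trades fidelity for a parse that does not stop.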
