How do I read escape characters as text in Java?

public List<String> readRSS(String feedUrl, String openTag, String closeTag)
            throws IOException, MalformedURLException {

        URL url = new URL(feedUrl);
        BufferedReader reader = new BufferedReader(new InputStreamReader(url.openStream()));

        String currentLine;
        List<String> tempList = new ArrayList<String>();
        while ((currentLine = reader.readLine()) != null) {
            Integer tagEndIndex = 0;
            Integer tagStartIndex = 0;
            while (tagStartIndex >= 0) {
                tagStartIndex = currentLine.indexOf(openTag, tagEndIndex);
                if (tagStartIndex >= 0) {
                    tagEndIndex = currentLine.indexOf(closeTag, tagStartIndex);
                    tempList.add(currentLine.substring(tagStartIndex + openTag.length(), tagEndIndex) + "\n");
                }
            }
        }
        if (tempList.size() > 0) {
            if(openTag.contains("title")){
                tempList.remove(0);
                tempList.remove(0);
            }
            else if(openTag.contains("desc")){
                tempList.remove(0);
            }
        }
        return tempList;
    }

      

I wrote this code to read an RSS feed. Everything works fine, but when the parser finds a character reference like &#xD;, it breaks. This is because it cannot find the closing tag, since the XML is escaped.

I don't know how I can fix this inside my code. Can anyone help me solve this problem?

1 answer


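The problem is that &#xD; is the XML character reference for a carriage return, and in your feed it is followed by a real line break, so the start and end tags of that element end up on different lines. Because your code reads line by line and expects both tags on the same line, it cannot handle this case.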
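To make the failure concrete, here is a minimal sketch that feeds a two-line title through `BufferedReader.readLine()`. The class name `LineSplitDemo`, the helper `readLines`, and the feed fragment are made up for illustration; they are not part of the question's code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class LineSplitDemo {
    // Reads the text line by line, the same way the loop in the question does.
    static List<String> readLines(String text) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical feed fragment: the reference is followed by a real newline.
        String feed = "<title>another &#xD;\ntitle 0</title>";
        List<String> lines = readLines(feed);

        System.out.println(lines.size());                     // 2
        System.out.println(lines.get(0).indexOf("</title>")); // -1
    }
}
```

With the close tag missing from the first line, `indexOf(closeTag, ...)` returns -1, and passing -1 as the end index to `substring` throws a `StringIndexOutOfBoundsException`.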

You can try something like this:

// Assumes reader, currentLine and tempList are declared as in the question.
StringBuilder fullLine = new StringBuilder();

while ((currentLine = reader.readLine()) != null) {
    int tagStartIndex = currentLine.indexOf(openTag, 0);
    int tagEndIndex = currentLine.indexOf(closeTag, tagStartIndex);

    if (tagStartIndex != -1 && tagEndIndex != -1) {
        // both tags on the same line: process the whole line
        tempList.add(currentLine);
        fullLine = new StringBuilder();
    } else if (tagStartIndex == -1 && tagEndIndex == -1 && fullLine.length() > 0) {
        // no tags on this line but the buffer has been started:
        // the line is part of a larger element, so append it
        fullLine.append(currentLine);
    } else if (tagStartIndex != -1 && tagEndIndex == -1) {
        // start tag only: the element continues on a later line,
        // so start a new buffer with this line
        fullLine = new StringBuilder(currentLine);
    } else if (tagEndIndex != -1 && tagStartIndex == -1) {
        // end tag only: append to the current buffer, then process the buffer
        fullLine.append(currentLine);
        tempList.add(fullLine.toString());
        fullLine = new StringBuilder();
    }
}

      

Given this example input:

<title>another &#xD;
title 0</title>
<title>another title 1</title>
<title>another title 2</title>
<title>another title 3</title>
<desc>description 0</desc>
<desc>another &#xD;
description 1</desc>
<title>another title 4</title>
<title>another &#xD;
another line in between &#xD;
title 5</title>

      



The complete lines in tempList for title will become:

<title>another &#xD;title 0</title>
<title>another title 1</title>
<title>another title 2</title>
<title>another title 3</title>
<title>another title 4</title>
<title>another &#xD;another line in between &#xD;title 5</title>

      

And for desc:

<desc>description 0</desc>
<desc>another &#xD;description 1</desc>

      

You should test this approach for performance on your full RSS feed. Also, note that character references such as &#xD; are not decoded: they stay in the output as literal text.
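If the references need to be turned into the characters they stand for, a small decoder is one option. The sketch below is an illustration only, not part of the original answer: the class name `CharRefDecoder` and its `decode` method are made up. It handles numeric references such as &#xD; or &#13;, and leaves named entities such as &amp; untouched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharRefDecoder {
    // Matches hexadecimal (&#xD;) and decimal (&#13;) numeric character references.
    private static final Pattern NUMERIC_REF =
            Pattern.compile("&#(x[0-9A-Fa-f]{1,6}|[0-9]{1,7});");

    // Replaces each numeric character reference with the character it denotes.
    static String decode(String text) {
        Matcher m = NUMERIC_REF.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());
            String body = m.group(1);
            int codePoint = body.startsWith("x")
                    ? Integer.parseInt(body.substring(1), 16)
                    : Integer.parseInt(body, 10);
            if (Character.isValidCodePoint(codePoint)) {
                out.appendCodePoint(codePoint);
            } else {
                // Not a valid Unicode code point: keep the reference as written.
                out.append(m.group());
            }
            last = m.end();
        }
        out.append(text, last, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String decoded = decode("another &#xD;title 0");
        // The reference is now a single carriage-return character.
        System.out.println(decoded.contains("\r")); // true
    }
}
```

A full XML parser would decode these references and the named entities on its own, so this hand-rolled decoder only makes sense if you want to keep the string-scanning approach.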
