Java: how to encode single quotes and double quotes as HTML entities?

How can I encode " to &quot; and ' to &apos;?

I am very surprised that the single and double quotation marks are not defined in the HTML 4.0 entities, so StringEscapeUtils cannot escape those two characters into their respective entities.

Is there another String related tool that can do this?

Any reason why single quote and double quote are not defined in HTML Entities 4.0?

Apart from single and double quotes, is there any utility that can encode every Unicode character into its respective entity? Any Unicode character can be converted manually to a decimal reference and displayed in HTML, so I wonder whether a tool can do the conversion automatically.



1 answer


  • Single quote and double quote not defined in HTML 4.0

Only the single quote is undefined in HTML 4.0; the double quote has been defined as &quot; since HTML 2.0.

  1. StringEscapeUtils cannot escape these two characters into corresponding entities

escapeXml11 in StringEscapeUtils supports converting the single quote to &apos;.

Example:

StringEscapeUtils.escapeXml11("'");   // returns &apos;
StringEscapeUtils.escapeHtml4("\"");  // returns &quot;

      

  1. Is there another String related tool that can do this?

HtmlUtils from the Spring Framework handles both single and double quotes, and can also convert values to decimal references (such as &#39; and &#34;). The following example is taken from the answer to this question:

import org.springframework.web.util.HtmlUtils;
[...]
HtmlUtils.htmlEscapeDecimal("&"); // gives &#38;
HtmlUtils.htmlEscape("&");        // gives &amp;

      



  1. Any reason why single quote and double quote are not defined in HTML Entities 4.0?

According to the HTML 4 Character Entity References, the single quote is undefined. The double quote has been available since HTML 2.0, whereas the single quote (&apos;) is supported as part of XHTML 1.0.
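To illustrate the difference described above, here is a minimal dependency-free sketch that writes the single quote either as &apos; (XML/XHTML 1.0) or as the numeric reference &#39;, which does not rely on a named entity being defined. The class QuoteEscaper and its escape method are hypothetical names used only for this illustration; they are not part of Commons Lang or Spring.

```java
// Hypothetical helper for illustration only; not a Commons Lang or Spring API.
final class QuoteEscaper {

    // Escapes the five markup-significant characters.
    // xhtmlApos = true  -> single quote becomes &apos; (defined for XML/XHTML 1.0)
    // xhtmlApos = false -> single quote becomes &#39; (numeric reference)
    static String escape(String s, boolean xhtmlApos) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append(xhtmlApos ? "&apos;" : "&#39;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("It's \"good\"", false)); // It&#39;s &quot;good&quot;
        System.out.println(escape("It's \"good\"", true));  // It&apos;s &quot;good&quot;
    }
}
```

The numeric form is the safer default when the output may be parsed as HTML 4, since &apos; is not in that entity set.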

  1. A tool or method for encoding any Unicode character into the corresponding entity

There is a very nice and simple Java implementation mentioned as part of the answer to this question .

Below is a sample program based on this answer:

import org.apache.commons.lang3.StringEscapeUtils;

public class HTMLCharacterEscaper {
    public static void main(String[] args) {        
        //With StringEscapeUtils
        System.out.println("Using SEU: " + StringEscapeUtils.escapeHtml4("\" ¶"));
        System.out.println("Using SEU: " + StringEscapeUtils.escapeXml11("'"));

        //Single quote & double quote
        System.out.println(escapeHTML("It's good"));
        System.out.println(escapeHTML("\" Grit \""));

        //Unicode characters
        System.out.println(escapeHTML("This is copyright symbol ©"));
        System.out.println(escapeHTML("Paragraph symbol ¶"));
        System.out.println(escapeHTML("This is pound £"));
    }

    public static String escapeHTML(String s) {
        StringBuilder out = new StringBuilder(Math.max(16, s.length()));
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 127 || c == '"' || c == '<' || c == '>' || c == '&' || c == '\'') {
                out.append("&#");
                out.append((int) c);
                out.append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

}
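One caveat with the char-based loop above: characters outside the Basic Multilingual Plane are stored as two surrogate chars in a Java String, so each half would be emitted as its own numeric reference. The following is a sketch of a code-point-based variant, assuming the same escaping rules as escapeHTML; the class name CodePointEscaper is hypothetical and chosen only for this example.

```java
// Hypothetical variant of escapeHTML that iterates over code points instead of chars.
final class CodePointEscaper {

    static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(Math.max(16, s.length()));
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i); // full code point, even for surrogate pairs
            if (cp > 127 || cp == '"' || cp == '<' || cp == '>' || cp == '&' || cp == '\'') {
                out.append("&#").append(cp).append(';'); // decimal numeric reference
            } else {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp); // advance by 1 or 2 chars
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeHtml("\u00A9 'x'"));   // &#169; &#39;x&#39;
        System.out.println(escapeHtml("\uD83D\uDE00")); // U+1F600 -> &#128512;
    }
}
```

For input that only contains BMP characters, this should produce the same output as the original method.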

      

Below are some interesting links that I came across during the answer:
