Jsoup converts & to &amp; when I need the text left as it is
In a couple of cases, I pass in JSON containing the URL of the page where the user performed some action. The page URL includes whatever part of the query string I need, so that my application can redirect the user back to the same page on request. My JSON looks like this:
{
"userId":"123456789",
"pageUrl":"http://exampl.com/designs.jsp?templateId=f348aaf2-45e4-4836-9be4-9a7e63105932&kind=123",
"action":"favourite"
}
But when I run this JSON through Jsoup.clean(json, Whitelist.basic()), I see that & has been replaced with &amp;. Can I customize Jsoup to leave this character unescaped?
The escaping happens in org.jsoup.nodes.Entities. This is the code in question:
    static void escape(StringBuilder accum, String string,
            Document.OutputSettings out, boolean inAttribute,
            boolean normaliseWhite, boolean stripLeadingWhite) {
        boolean lastWasWhite = false;
        boolean reachedNonWhite = false;
        EscapeMode escapeMode = out.escapeMode();
        CharsetEncoder encoder = out.encoder();
        // The decompiled output showed a synthetic accessor (access$300) here;
        // it stands for the private CoreCharset.byName(...) lookup.
        CoreCharset coreCharset = CoreCharset.byName(encoder.charset().name());
        Map<Character, String> map = escapeMode.getMap();
        int length = string.length();
        int codePoint;
        for (int offset = 0; offset < length; offset += Character.charCount(codePoint)) {
            codePoint = string.codePointAt(offset);
            if (normaliseWhite) {
                if (StringUtil.isWhitespace(codePoint)) {
                    if (stripLeadingWhite && !reachedNonWhite)
                        continue;
                    if (lastWasWhite)
                        continue;
                    accum.append(' ');
                    lastWasWhite = true;
                    continue;
                }
                lastWasWhite = false;
                reachedNonWhite = true;
            }
            if (codePoint < 65536) {
                char c = (char) codePoint;
                switch (c) {
                    case '&':
                        // always escaped, in text and in attribute values
                        accum.append("&amp;");
                        break;
                    case 0xA0: // non-breaking space
                        if (escapeMode != EscapeMode.xhtml)
                            accum.append("&nbsp;");
                        else
                            accum.append(c);
                        break;
                    case '<':
                        if (!inAttribute)
                            accum.append("&lt;");
                        else
                            accum.append(c);
                        break;
                    case '>':
                        if (!inAttribute)
                            accum.append("&gt;");
                        else
                            accum.append(c);
                        break;
                    case '"':
                        if (inAttribute)
                            accum.append("&quot;");
                        else
                            accum.append(c);
                        break;
                    default:
                        if (canEncode(coreCharset, c, encoder))
                            accum.append(c);
                        else if (map.containsKey(c))
                            accum.append('&')
                                    .append(map.get(c))
                                    .append(';');
                        else
                            accum.append("&#x")
                                    .append(Integer.toHexString(codePoint))
                                    .append(';');
                }
            } else {
                // supplementary code point (outside the Basic Multilingual Plane)
                String c = new String(Character.toChars(codePoint));
                if (encoder.canEncode(c))
                    accum.append(c);
                else
                    accum.append("&#x").append(Integer.toHexString(codePoint))
                            .append(';');
            }
        }
    }
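To see why the & in the URL is always affected, the relevant cases of that switch can be reduced to a few lines. This is a minimal sketch with no jsoup dependency, not the library's own code: the class name EscapeSketch and its escape method are made up for illustration, and it only covers the four characters &, <, > and ".

```java
class EscapeSketch {
    // Hypothetical helper that mirrors only the '&', '<', '>' and '"' cases
    // of the switch shown above; everything else is copied through unchanged.
    static String escape(String input, boolean inAttribute) {
        StringBuilder accum = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':
                    // no condition here: escaped in text and in attributes alike
                    accum.append("&amp;");
                    break;
                case '<':
                    accum.append(inAttribute ? "<" : "&lt;");
                    break;
                case '>':
                    accum.append(inAttribute ? ">" : "&gt;");
                    break;
                case '"':
                    accum.append(inAttribute ? "&quot;" : "\"");
                    break;
                default:
                    accum.append(c);
            }
        }
        return accum.toString();
    }

    public static void main(String[] args) {
        String url = "http://exampl.com/designs.jsp?templateId=f348aaf2-45e4-4836-9be4-9a7e63105932&kind=123";
        System.out.println(escape(url, false)); // as text content
        System.out.println(escape(url, true));  // as an attribute value
    }
}
```

Both calls print the URL with &amp;kind=123, since the '&' case does not look at inAttribute or at the escape mode. That is why there is no output setting that switches it off.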
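A quick way to get what you need is to unescape the entities again after cleaning, something like this:
    import org.jsoup.Jsoup;
    import org.jsoup.parser.Parser;
    import org.jsoup.safety.Whitelist;

    String str = "http://exampl.com/designs.jsp?templateId=f348aaf2-45e4-4836-9be4-9a7e63105932&kind=123";
    str = Jsoup.clean(str, Whitelist.basic());
    System.out.println(str); // the & is now &amp;
    str = Parser.unescapeEntities(str, true); // second argument: treat the input as an attribute value
    System.out.println(str); // the & is back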
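One caveat with the snippet above: Parser.unescapeEntities decodes every entity, so a &lt; that the cleaner deliberately left escaped turns back into <. If only the ampersand matters, a narrower option is to reverse just &amp;. The sketch below uses plain string replacement and no jsoup calls; AmpersandRestorer and restoreAmpersands are hypothetical names, and it assumes the result goes back into JSON rather than into an HTML page.

```java
class AmpersandRestorer {
    // Hypothetical helper: undo only the &amp; escaping and leave every
    // other entity (for example &lt; and &gt;) exactly as the cleaner wrote it.
    static String restoreAmpersands(String cleaned) {
        return cleaned.replace("&amp;", "&");
    }

    public static void main(String[] args) {
        // What the cleaned URL from the question looks like
        String cleaned = "http://exampl.com/designs.jsp?templateId=f348aaf2-45e4-4836-9be4-9a7e63105932&amp;kind=123";
        System.out.println(restoreAmpersands(cleaned));

        // Escaped markup stays escaped
        System.out.println(restoreAmpersands("a &amp; b &lt;script&gt;"));
    }
}
```

Applied to the cleaned URL, this gives back the original query string with &kind=123, while the second call keeps &lt;script&gt; as it is.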
Another way would be to change the method that causes the problem, but it is package-private (default visibility), so you cannot reach it from your own code. You would have to download the source, change the visibility, and build your own copy of the library. The method is also static, so it cannot actually be overridden in a subclass; in practice this means patching the method itself.