XSLT tokenize - capturing delimiters
Here is a piece of XSL code that tokenizes text into chunks separated by punctuation and similar characters. Is there any way to capture the delimiter at which the text was split, such as a comma or a period?
<xsl:stylesheet version="2.0" exclude-result-prefixes="xs xdt err fn" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fn="http://www.w3.org/2005/xpath-functions" xmlns:err="http://www.w3.org/2005/xqt-errors" xmlns:xdt="http://www.w3.org/2005/xpath-datatypes">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="GENERUJ">
    <TEXT>
      <!-- Read the file named in @filename and collapse runs of whitespace -->
      <xsl:variable name="text" select="normalize-space(unparsed-text(@filename, 'UTF-8'))"/>
      <!-- Split on punctuation; &quot; is required because the attribute itself is double-quoted -->
      <xsl:for-each select="tokenize($text, '(\s+(&quot;|\(|\[|\{))|((&quot;|,|;|:|\s\-|\)|\]|\})\s+)|((\.|\?|!|;)&quot;?\s*)')">
        <xsl:choose>
          <xsl:when test="string-length(.) > 0">
            <FRAGMENT>
              <CONTENT>
                <xsl:value-of select="."/>
              </CONTENT>
              <LENGTH>
                <xsl:value-of select="string-length(.)"/>
              </LENGTH>
            </FRAGMENT>
          </xsl:when>
          <xsl:otherwise>
            <!-- Empty token, e.g. because the text ended with a delimiter -->
            <FRAGMENT_COUNT>
              <xsl:value-of select="last() - 1"/>
            </FRAGMENT_COUNT>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each>
    </TEXT>
  </xsl:template>
</xsl:stylesheet>
As you can see, the stylesheet generates CONTENT and LENGTH elements. I would like to add a third one, SEPARATOR, holding the delimiter that was matched. I couldn't find an answer to this on the internet and I'm just a beginner with XSL transformations, so I'm looking for a quick solution. Thank you in advance.
The tokenize() function does not tell you what the delimiters were. If you need them, you will have to use xsl:analyze-string instead. Given the same regex you used for tokenize(), it passes the "tokens" to the xsl:non-matching-substring instruction and the "delimiters" to the xsl:matching-substring instruction.
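Below is a minimal sketch of that approach, assuming an XSLT 2.0 processor (e.g. Saxon) and the same GENERUJ/@filename input as in the question. It works in two passes: xsl:analyze-string first writes every token and every delimiter, in order, into a temporary tree, and a second loop then builds one FRAGMENT per token with the delimiter(s) that follow it as SEPARATOR. The TOK and SEP element names and the $delims variable are illustrative choices, not anything from the original code. Two details are worth knowing: xsl:analyze-string never produces zero-length non-matching substrings, so the empty-token check that produced FRAGMENT_COUNT no longer fires and the sketch simply counts the tokens instead; and the regex attribute of xsl:analyze-string is an attribute value template, so literal braces in it would have to be doubled. Keeping the pattern in a variable and writing regex="{$delims}" avoids that.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">
  <xsl:output method="xml" indent="yes"/>

  <!-- Same delimiter regex as the tokenize() call in the question.
       Kept in a variable so the braces need no AVT escaping below. -->
  <xsl:variable name="delims" as="xs:string"
      select="'(\s+(&quot;|\(|\[|\{))|((&quot;|,|;|:|\s\-|\)|\]|\})\s+)|((\.|\?|!|;)&quot;?\s*)'"/>

  <xsl:template match="GENERUJ">
    <xsl:variable name="text" as="xs:string"
        select="normalize-space(unparsed-text(@filename, 'UTF-8'))"/>

    <!-- Pass 1: flat temporary tree of TOK (non-matching) and SEP (matching)
         elements in document order; TOK/SEP are illustrative names only. -->
    <xsl:variable name="parts">
      <xsl:analyze-string select="$text" regex="{$delims}">
        <xsl:matching-substring>
          <SEP><xsl:value-of select="."/></SEP>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
          <TOK><xsl:value-of select="."/></TOK>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </xsl:variable>

    <TEXT>
      <!-- Pass 2: one FRAGMENT per token. SEPARATOR joins every SEP whose
           nearest preceding TOK is the current token (empty if none follows). -->
      <xsl:for-each select="$parts/TOK">
        <FRAGMENT>
          <CONTENT><xsl:value-of select="."/></CONTENT>
          <LENGTH><xsl:value-of select="string-length(.)"/></LENGTH>
          <SEPARATOR>
            <xsl:value-of select="following-sibling::SEP[preceding-sibling::TOK[1] is current()]"
                          separator=""/>
          </SEPARATOR>
        </FRAGMENT>
      </xsl:for-each>
      <!-- analyze-string yields no empty tokens, so count them directly -->
      <FRAGMENT_COUNT><xsl:value-of select="count($parts/TOK)"/></FRAGMENT_COUNT>
    </TEXT>
  </xsl:template>
</xsl:stylesheet>
```

For a file containing `Hello, world. Bye.` this should give three FRAGMENT elements (Hello, world, Bye) whose SEPARATOR values are `, `, `. ` and `.`. If you only need the delimiters in document order rather than nested inside each FRAGMENT, a single xsl:analyze-string that writes a SEPARATOR element from xsl:matching-substring is enough.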