How to create ENTITY declarations in a DOCTYPE using Perl / XML::LibXML

I am trying to create the following DTD declarations containing entities:

<!DOCTYPE LinkSet PUBLIC "-//NLM//DTD LinkOut 1.0//EN" "https://www.ncbi.nlm.nih.gov/projects/linkout/doc/LinkOut.dtd" 
[ <!ENTITY icon.url "https://example.com/icon.png"> 
<!ENTITY base.url "https://example.com/content/" > ]>

      

I can successfully create a DOCTYPE without entity references:

#!/usr/bin/perl -w
use strict;
use XML::LibXML;

my $doc = XML::LibXML::Document->new('1.0','UTF-8');
my $dtd = $doc->createInternalSubset( "LinkSet", "-//NLM//DTD LinkOut 1.0//EN", "https://www.ncbi.nlm.nih.gov/projects/linkout/doc/LinkOut.dtd" );

my $ls = $doc->createElement( "LinkSet" );
$doc->setDocumentElement($ls);

print $doc->toString;
exit;

      

Results in:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE LinkSet PUBLIC "-//NLM//DTD LinkOut 1.0//EN" "https://www.ncbi.nlm.nih.gov/projects/linkout/doc/LinkOut.dtd">
<LinkSet/>

      

The XML::LibXML documentation shows how to add an entity reference to a document, but not how to declare an entity in the DOCTYPE.

A similar (but PHP-based) question suggests writing the ENTITY declarations as a string and parsing that. Is this the best approach in Perl too?


1 answer


The documentation for XML::LibXML::Document says this:

"[Document class] inherits all functionality from XML::LibXML::Node as specified in the DOM Specification. This provides access to nodes other than the document-level root element, for example a DTD. Support for these nodes is limited at this time."

It also appears that the source of this limitation is libxml2 itself, not the Perl module. This makes sense, because a DTD has a completely different syntax from XML elements (or even XML processing instructions), although it looks superficially similar.

The only practical way is to parse a minimal document that already contains the required DTD, and then work with the resulting document object, like this:

use strict;
use warnings 'all';

use XML::LibXML;

my $doc = XML::LibXML->load_xml(string => <<'__END_XML__');
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE LinkSet PUBLIC "-//NLM//DTD LinkOut 1.0//EN" "https://www.ncbi.nlm.nih.gov/projects/linkout/doc/LinkOut.dtd" 
[
  <!ENTITY icon.url "https://example.com/icon.png"> 
  <!ENTITY base.url "https://example.com/content/">
]>

<LinkSet/>
__END_XML__

print $doc;

      

Output

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE LinkSet PUBLIC "-//NLM//DTD LinkOut 1.0//EN" "https://www.ncbi.nlm.nih.gov/projects/linkout/doc/LinkOut.dtd" [
<!ENTITY icon.url "https://example.com/icon.png">
<!ENTITY base.url "https://example.com/content/">
]>
<LinkSet/>
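
If the entity names and values are only known at run time, the same idea can be sketched by building the DOCTYPE text first and parsing the result. This is an illustration rather than an established recipe: the helper name build_linkset_doc is hypothetical, the IconUrl child is attached directly to the root only to show an entity reference in content (it is not checked against LinkOut.dtd), and load_ext_dtd => 0 is passed on the assumption that you do not want the parser to try to fetch the external DTD.

```perl
use strict;
use warnings;

use XML::LibXML;

# Hypothetical helper (not part of XML::LibXML): builds a skeleton document
# whose internal subset declares the given entities, then parses it.
sub build_linkset_doc {
    my (%entities) = @_;

    # One <!ENTITY ...> declaration per name/value pair, sorted for stable output.
    # Values must not contain '"', '%' or '&'; this sketch does not escape them.
    my $decls = '';
    for my $name (sort keys %entities) {
        $decls .= qq{  <!ENTITY $name "$entities{$name}">\n};
    }

    my $xml = <<"END_XML";
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE LinkSet PUBLIC "-//NLM//DTD LinkOut 1.0//EN" "https://www.ncbi.nlm.nih.gov/projects/linkout/doc/LinkOut.dtd" [
${decls}]>
<LinkSet/>
END_XML

    # load_ext_dtd => 0 keeps the parser from trying to load LinkOut.dtd.
    return XML::LibXML->load_xml(string => $xml, load_ext_dtd => 0);
}

my $doc = build_linkset_doc(
    'icon.url' => 'https://example.com/icon.png',
    'base.url' => 'https://example.com/content/',
);

# Illustrative child element that refers to one of the declared entities.
my $icon = $doc->createElement('IconUrl');
$icon->appendChild( $doc->createEntityReference('icon.url') );
$doc->documentElement->appendChild($icon);

print $doc->toString;
```

Once the document has been parsed, the rest of the tree can be built with the usual createElement / appendChild calls, and createEntityReference inserts a reference to an entity declared in the internal subset.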

      
