How can I extract RDFa from HTML using PHP or Java?

I am a newbie and have been learning about RDF, RDFa and related topics for a few days.

My question is: given the following HTML + RDFa code, is it possible to extract (separate out) the RDF part? If so, could you demonstrate it with a simple piece of code (PHP or Java)?

I heard that this is possible with Jena but could not find a tutorial explaining it. If it is possible with Jena, could someone please post a code snippet?

<html xmlns=""
version="XHTML+RDFa 1.0" xml:lang="en">
  <head>
    <title>John's Home Page</title>
    <base href="" />
    <meta property="dc:creator" content="Jonathan Doe" />
    <link rel="foaf:primaryTopic" href="" />
  </head>
  <body about="">
    <h1>John's Home Page</h1>
    <p>My name is <span property="foaf:nick">John D</span> and I like
      <a href="" rel="foaf:interest"
        xml:lang="de">Einstürzende Neubauten</a>.
      My <span rel="foaf:interest" resource="urn:ISBN:0752820907">favorite
      book</span> is the inspiring <span about="urn:ISBN:0752820907"><cite
      property="dc:title">Weaving the Web</cite> by
      <span property="dc:creator">Tim Berners-Lee</span></span>
    </p>
  </body>
</html>




4 answers

Yes, you can extract RDF from pages containing RDFa markup. Once you have retrieved it, you can load it into a local RDF triplestore if you want to work with that data yourself, or you can add it to a shared (global) triplestore so that you can query it together with existing RDF data.
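To make the extraction step more concrete, here is a deliberately simplified, dependency-free Java sketch. It is only an illustration, not a real RDFa parser: the class ToyRdfaExtractor, its match helper and the example.org URIs are hypothetical, prefixes such as foaf: are left unexpanded, and attributes like @typeof, @rev and @datatype are ignored. It also assumes the page is well-formed XHTML, because it uses the JDK's XML parser. For real pages, use a conforming parser such as java-rdfa or Apache Any23.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

/**
 * Toy illustration only (hypothetical class): extracts a small subset of RDFa
 * (@about, @property/@content, @rel with @resource/@href) from well-formed XHTML.
 * NOT a conforming RDFa processor: no prefix expansion, no @typeof/@rev/@datatype,
 * no @base resolution, no incomplete-triple chaining.
 */
public class ToyRdfaExtractor {

    /** One extracted statement; predicates stay as unexpanded CURIEs like "foaf:nick". */
    public record Triple(String subject, String predicate, String object) {}

    /** Hypothetical sample page; example.org URIs stand in for real ones. */
    static final String SAMPLE = """
        <html><body about="http://example.org/john-d/#me">
          <p>My name is <span property="foaf:nick">John D</span>.
          My <span rel="foaf:interest" resource="urn:ISBN:0752820907">favorite book</span>
          is <span about="urn:ISBN:0752820907"><cite property="dc:title">Weaving the Web</cite>
          by <span property="dc:creator">Tim Berners-Lee</span></span>.</p>
        </body></html>
        """;

    public static List<Triple> extract(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Element root = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            List<Triple> triples = new ArrayList<>();
            walk(root, "", triples); // "" stands for the document itself (no base resolution)
            return triples;
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XHTML", e);
        }
    }

    private static void walk(Element el, String subject, List<Triple> out) {
        // @about sets the subject for this element and, in this toy, all its descendants.
        if (el.hasAttribute("about")) {
            subject = el.getAttribute("about");
        }
        // @property yields a literal: @content if present, otherwise the element's text.
        if (el.hasAttribute("property")) {
            String value = el.hasAttribute("content")
                    ? el.getAttribute("content")
                    : el.getTextContent().trim();
            out.add(new Triple(subject, el.getAttribute("property"), value));
        }
        // @rel links the subject to a resource given by @resource, else @href.
        if (el.hasAttribute("rel")) {
            String object = el.hasAttribute("resource")
                    ? el.getAttribute("resource")
                    : el.getAttribute("href");
            out.add(new Triple(subject, el.getAttribute("rel"), object));
        }
        for (Node child = el.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element childEl) {
                walk(childEl, subject, out);
            }
        }
    }

    /** Naive triple-pattern match over an in-memory list; null acts as a wildcard. */
    public static List<Triple> match(List<Triple> triples, String s, String p, String o) {
        List<Triple> result = new ArrayList<>();
        for (Triple t : triples) {
            if ((s == null || s.equals(t.subject()))
                    && (p == null || p.equals(t.predicate()))
                    && (o == null || o.equals(t.object()))) {
                result.add(t);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Triple> triples = extract(SAMPLE);
        triples.forEach(System.out::println);
        // "What do we know about the book?"
        match(triples, "urn:ISBN:0752820907", null, null).forEach(System.out::println);
    }
}
```

Here the in-memory list plays the role of the local triplestore and match() that of a query; a real store would index the triples and answer SPARQL queries instead. The subject-inheritance rule is also simplified: real RDFa processing threads the parent object through child elements and completes "hanging" @rel triples, which this sketch does not do.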

Here is a relevant discussion of Java RDFa parsers.



Have a look at Damian Steer's java-rdfa. You can use it with Apache Jena; here is a code snippet:

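Class.forName("net.rootdev.javardfa.jena.RDFaReader"); // loads java-rdfa so Jena knows the "XHTML" and "HTML" readers
Model model = ModelFactory.createDefaultModel();
model.read(url, "XHTML"); // xml parsing (url = address of the page)
model.read(url, "HTML");  // or: html parsing, for pages that are not well-formed XML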


Another option in Java is Apache Any23.



Parsing RDFa in PHP: (use the 0.8 / master branch to get an RDFa parser)

Parsing RDFa in Java:



You cannot really separate the RDF from the HTML, as the RDF provides additional information about things in the HTML.

It would be like taking the footnotes and bibliography out of the book and throwing the book away: mostly pointless.


