How can I extract RDFa from HTML using PHP or Java?

I am a newbie and have been learning about RDF, RDFa and related topics for a few days.

My question: given the following HTML + RDFa code, is it possible to extract the RDF part from it? It would help if you could demonstrate a simple piece of code (PHP or Java).

I heard that it is possible to do this with Jena, but I could not find a tutorial explaining how. If it is possible with Jena, could someone post a code snippet, please?

<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
version="XHTML+RDFa 1.0" xml:lang="en">
  <head>
    <title>John Home Page</title>
    <base href="http://example.org/john-d/" />
    <meta property="dc:creator" content="Jonathan Doe" />
    <link rel="foaf:primaryTopic" href="http://example.org/john-d/#me" />
  </head>
  <body about="http://example.org/john-d/#me">
    <h1>John Home Page</h1>
    <p>My name is <span property="foaf:nick">John D</span> and I like
      <a href="http://www.neubauten.org/" rel="foaf:interest"
        xml:lang="de">Einstürzende Neubauten</a>.
    </p>
    <p>
      My <span rel="foaf:interest" resource="urn:ISBN:0752820907">favorite
      book is the inspiring <span about="urn:ISBN:0752820907"><cite
      property="dc:title">Weaving the Web</cite> by
      <span property="dc:creator">Tim Berners-Lee</span></span>
     </span>
    </p>
  </body>
</html>

      

java html php rdf rdfa




4 answers


Yes, you can extract RDF from pages containing RDFa markup. Once you have extracted it, you can load it into a local RDF triplestore if you want to work with the data yourself, or add it to a global triplestore so that it can be queried together with existing RDF data.
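To make concrete what "extracting RDF" means, here is a minimal, standard-library-only Java sketch that walks well-formed XHTML with the JDK's DOM parser and returns N-Triples-like lines. It is an illustration under stated assumptions, not a replacement for a real parser: the class SimpleRdfaExtractor and its extract method are hypothetical names, and only a small subset of RDFa 1.0 is handled (about, rel, href, resource, property, content and xmlns: prefix declarations). It ignores rev, typeof, datatype, xml:lang, XML literals and incomplete ("hanging") rel values, and it rejects HTML that is not well-formed XML. For real pages, use a full RDFa parser.

```java
import java.io.StringReader;
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper for a small RDFa subset. NOT a conformant RDFa parser: it ignores
// rev, typeof, datatype, xml:lang, XML literals and incomplete ("hanging") rels.
public class SimpleRdfaExtractor {

    // Returns N-Triples-like lines; a <base href> in the page overrides documentUrl.
    public static List<String> extract(String xhtml, String documentUrl) {
        Document doc;
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(false); // keeps xmlns:foaf etc. as plain attributes
            // Assumes the JDK's built-in parser; avoids fetching the XHTML DTD
            f.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            doc = f.newDocumentBuilder().parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse input as XHTML", e);
        }
        String base = documentUrl;
        Element baseEl = (Element) doc.getElementsByTagName("base").item(0); // null if absent
        if (baseEl != null && baseEl.hasAttribute("href")) {
            base = URI.create(documentUrl).resolve(baseEl.getAttribute("href")).toString();
        }
        List<String> triples = new ArrayList<>();
        walk(doc.getDocumentElement(), base, base, new HashMap<>(), triples);
        return triples;
    }

    private static void walk(Element el, String parentObject, String base,
                             Map<String, String> inherited, List<String> out) {
        // Prefix mappings declared here (xmlns:prefix="uri") apply to descendants too
        Map<String, String> prefixes = new HashMap<>(inherited);
        NamedNodeMap attrs = el.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node a = attrs.item(i);
            if (a.getNodeName().startsWith("xmlns:")) {
                prefixes.put(a.getNodeName().substring(6), a.getNodeValue());
            }
        }
        String rel = el.getAttribute("rel");
        String target = el.hasAttribute("resource") ? el.getAttribute("resource") : el.getAttribute("href");
        String targetUri = target.isEmpty() ? null : URI.create(base).resolve(target).toString();
        // Subject: @about, else @resource/@href when there is no @rel, else inherited
        String subject = parentObject;
        if (el.hasAttribute("about")) {
            subject = URI.create(base).resolve(el.getAttribute("about")).toString();
        } else if (rel.isEmpty() && targetUri != null) {
            subject = targetUri;
        }
        // @rel with @resource/@href: one resource-valued triple per CURIE
        String object = null;
        if (!rel.isEmpty() && targetUri != null) {
            object = targetUri;
            for (String p : expand(rel, prefixes)) {
                out.add("<" + subject + "> <" + p + "> <" + object + "> .");
            }
        }
        // @property: literal from @content, else the element's text content
        String property = el.getAttribute("property");
        if (!property.isEmpty()) {
            String value = el.hasAttribute("content") ? el.getAttribute("content") : el.getTextContent();
            String lit = value.replace("\\", "\\\\").replace("\"", "\\\"")
                              .replace("\n", "\\n").replace("\r", "\\r");
            for (String p : expand(property, prefixes)) {
                out.add("<" + subject + "> <" + p + "> \"" + lit + "\" .");
            }
        }
        // Children use the object resource (if any) as their default subject
        String next = object != null ? object : subject;
        for (Node c = el.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element) walk((Element) c, next, base, prefixes, out);
        }
    }

    // Expands space-separated CURIEs such as "foaf:nick"; unknown prefixes are skipped
    private static List<String> expand(String curies, Map<String, String> prefixes) {
        List<String> result = new ArrayList<>();
        for (String curie : curies.trim().split("\\s+")) {
            int colon = curie.indexOf(':');
            if (colon > 0 && prefixes.containsKey(curie.substring(0, colon))) {
                result.add(prefixes.get(curie.substring(0, colon)) + curie.substring(colon + 1));
            }
        }
        return result;
    }
}
```

Called as SimpleRdfaExtractor.extract(html, "http://example.org/john-d/") on the page from the question, it should return, among others, <http://example.org/john-d/#me> <http://xmlns.com/foaf/0.1/nick> "John D" . A conformant RDFa 1.0 parser would additionally tag that literal with the @en language inherited from xml:lang.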



Here is a relevant discussion of Java RDFa parsers.



Have a look at Damian's java-rdfa. You can use it with Apache Jena; here is a code snippet:

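Class.forName("net.rootdev.javardfa.RDFaReader"); // load java-rdfa so Jena knows the "XHTML" and "HTML" readers
Model model = ModelFactory.createDefaultModel();  // empty in-memory Jena model
model.read(url, "XHTML"); // parse the page at url as XHTML (XML parsing)
model.read(other, "HTML"); // parse the page at other as HTML (HTML parsing)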

      



Another option in Java is Apache Any23.



Parsing RDFa in PHP: https://github.com/njh/easyrdf/ (use the 0.8 or master branch, which includes an RDFa parser)

Parsing RDFa in Java: http://semarglproject.org/



You cannot separate RDF from HTML as RDF provides additional information about things in HTML.

It would be like taking the footnotes and bibliography out of the book and throwing the book away: mostly pointless.







