XQuery Full Text Search with Word1 and NOT Word2

Question

XQuery Full Text Search with Word1 and NOT Word2

Following is the XML structure -

<Docs>
  <Doc>
    <Name>Doc 1</Name>
    <Notes>
        <specialNote>
          This is a special note section. 
           <B>This B Tag is used for highlighting any text and is optional</B>        
           <U>This U Tag will underline any text and is optional</U>        
           <I>This I Tag is used for highlighting any text and is optional</I>        
        </specialNote>      
        <generalNote>
           <P>
            This will store the general notes and might have number of paragraphs. This is para no 1. NO Child Tags here         
           </P>
           <P>
            This is para no 2            
           </P>  
        </generalNote>      
    </Notes>  
    <Desc>
        <P>
          This is used for Description and might have number of paragraphs. Here too, there will be B, U and I Tags for highlighting the description text and are optional
          <B>Bold</B>
          <I>Italic</I>
          <U>Underline</U>
        </P>
        <P>
          This is description para no 2 with I and U Tags
          <I>Italic</I>
          <U>Underline</U>
        </P>      
    </Desc>
</Doc>

There will be 1000 tags Doc

. I want to give the user search criteria where he can search WORD1

and NOT WORD2

. Following is the request -

for $x in doc('Documents')/Docs/Doc[Notes/specialNote/text() contains text 'Tom' 
ftand  ftnot 'jerry' or 
Notes/specialNote/text() contains text 'Tom' ftand ftnot 'jerry' or 
Notes/specialNote/B/text() contains text 'Tom' ftand ftnot 'jerry' or 
Notes/specialNote/I/text() contains text 'Tom' ftand ftnot 'jerry' or 
Notes/specialNote/U/text() contains text 'Tom' ftand ftnot 'jerry' or
Notes/generalNote/P/text() contains text 'Tom' ftand ftnot 'jerry' or 
Desc/P/text() contains text 'Tom' ftand ftnot 'jerry' or 
Desc/P/B/text() contains text 'Tom' ftand ftnot 'jerry' or 
Desc/P/I/text() contains text 'Tom' ftand ftnot 'jerry' or 
Desc/P/U/text() contains text 'Tom' ftand ftnot 'jerry']
return $x/Name

The result of this query is incorrect. I mean the result contains some document with Tom

and jerry

. So I changed the request to -

for $x in doc('Documents')/Docs/Doc[. contains text 'Tom' ftand ftnot 'jerry'] 
return $x/Name

This query gives me the exact result, i.e. only those documents with Tom

and Not jerry

, BUT GOING ON A HUGE TIME ... Approx. 45 seconds, while the previous one took 10 seconds!

I am using XML BaseX 7.5 database.

We need expert comments on this :)

+3

full-text-search basex xquery-3.0

John Feb 19 At 10:26

source to share

1 answer

Leo Wörteler · Accepted Answer · 2013-02-19T11:24:39+0000

The first request checks each text node in the document separately, so it Tom and Jerry

will match because the first text node contains Tom, but not Jerry.

The second query performs a full-text search on the entire text content of the elements Doc

as if they were concatenated into one string. There is no answer to this ( BaseX full-text index which indexes each text node separately.

The solution would be to do a full text search for each term separately and merge the results at the end. This can be done for each text node separately, so an index can be used:

for $x in (doc('Documents')/Docs/Doc[.//text() contains text 'Tom']
            except doc('Documents')/Docs/Doc[.//text() contains text 'Jerry'])
return $x/Name

The above query is rewritten by the query optimizer to this equivalent using two index calls:

for $x in (db:fulltext("Documents", "Tom")/ancestor::*:Doc
            except db:fulltext("Documents", "Jerry")/ancestor::*:Doc)
return $x/Name

You can even customize the order in which you combine the results so that the intermediate results are small if you want.

XQuery Full Text Search with Word1 and NOT Word2

More articles: