
Information about _queryIndex/10

 

The _queryIndex/10 built-in executes a search in a Lucene full-text index. Typically this is an index that OntoBroker manages automatically, but external indexes can also be queried. The built-in has 10 arguments; the first five must be bound.

Argument                 Bound/Free  Description
<module>                 b           The module whose index should be queried
<option list>            b           List of optional parameters (see the list below for details)
<lucene query text>      b           The query string in Lucene query syntax; see http://lucene.apache.org/java/2_9_3/queryparsersyntax.html for details
<offset>                 b           Index of the first hit to return (starts at 0)
<limit>                  b           Maximum number of hits to return
<object>                 f           Term of the object hit
<total count>            f           Total number of hits
<score>                  f           Lucene ranking of the hit
<order>                  f           Order number for sorting the hits into the correct order
<optional output list>   f           Contents depend on <option list>
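
The <offset> and <limit> arguments together select a window of hits, which lends itself to paging through a result set. The following Python sketch is not part of OntoBroker; the helper names page_window and page_count are hypothetical. It computes the values to pass as <offset> and <limit> for a zero-based page number, and derives the number of pages from a <total count> value returned by a query.

```python
import math

def page_window(page, page_size):
    """Return (offset, limit) for a zero-based page number."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    return page * page_size, page_size

def page_count(total_count, page_size):
    """Number of pages needed to show total_count hits."""
    return math.ceil(total_count / page_size)

# Third page (page index 2) with 10 hits per page.
offset, limit = page_window(2, 10)
print(offset, limit)        # 20 10
# 45 hits in total need 5 pages of 10.
print(page_count(45, 10))   # 5
```

With these values, the third page would be requested by passing 20 as <offset> and 10 as <limit>.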

The <option list> argument is a list of optional parameters. If no optional parameters are needed, use the empty list, i.e. []

Supported optional parameters for the <option list> argument:

return(<field>)
The content of the field is returned for the hit in the <optional output list> variable. Note that the field must be defined as “stored” in the Lucene index, otherwise nothing is returned.

Example:

?- _queryIndex(module1, [return("name_en")], "name_en:foo", 0, 10, ?Obj, ?Tc, ?Sc, ?Order, ?Opt).

 
For every “name_en” field of a hit document in the Lucene index, the <optional output list> contains an item name_en("content of field").

stringmetric(<metric>)

Sets the string metric to be used for the fuzzy search. If this option is not set, the default is taken from the property “defaultStringMetric” in the fulltextindex-config.xml. If that property is not set either, the string metric “Jaro” is used.

Supported string metric values are:
"Levenstein", "MongeElkan", "NeedlemanWunch", "QGrams", "Jaro", "JaroScaled", "JaroWinkler", "DamerauLevenshtein", "DamerauLevenshteinScaled", "MaxJaroDamerauLevenshteinScaled", "DamerauLevenshteinSoundex", "Jaccard", "Soundex", "SmithWaterman"

Side remark:
You can use the built-in _distance2 to see how two strings compare under one of these string metrics, e.g.

?- _distance2("Jaro", "good", "food", 0, ?X).

?X is bound to a similarity value between 0 and 1.0, here 0.833.

If you perform a fuzzy search with the string metric Jaro, e.g. with the Lucene query text "good~0.8", this will match "food", as 0.833 is >= 0.8.
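
To see where the value 0.833 comes from, the following Python sketch implements the textbook Jaro similarity. It is an independent illustration, not OntoBroker's implementation, and other metrics in the list above will produce different values. The function name jaro is chosen here for illustration only.

```python
def jaro(s1, s2):
    """Textbook Jaro similarity between two strings (0.0 to 1.0)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Two equal characters count as a match only if their positions
    # are no further apart than this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3.0

similarity = jaro("good", "food")
print(round(similarity, 3))   # 0.833
print(similarity >= 0.8)      # True, so "good~0.8" matches "food"
```

For "good" and "food", three of the four characters match without transpositions, which gives (3/4 + 3/4 + 3/3) / 3 = 0.833.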

Example:
?- _queryIndex(module1, [stringmetric("Jaro")], "name_en:good~0.8", 0, 10, ?Obj, ?Tc, ?Sc, ?Order, ?Opt).
 

Uses the string metric "Jaro" for the fuzzy search.

includeall

Includes all imported modules of <module> (first argument) in the search.

Example:
?- _queryIndex(module1, [includeall], "foo", 0, 10, ?Obj, ?Tc, ?Sc, ?Order, ?Opt).

defaultfield(<field>)

Sets the default field to be used for search terms whose field is not given explicitly. E.g. with the Lucene query text "all:foo bar", the search term "foo" is searched in the field "all" and "bar" is searched in the default field.

If the default field is not set in the option, the default field specified in the fulltextindex-schema.xml (tag defaultSearchField) is used. If this is also not set, the default field is "all".

Example:

?- _queryIndex(module1, [defaultfield("name_en")], "foo", 0, 10, ?Obj, ?Tc, ?Sc, ?Order, ?Opt).

Uses the field "name_en" as the default field.

solrparam(<name>,<value>)

Sets additional Apache Solr parameters. The _queryIndex built-in also uses the core of Apache Solr on top of Lucene. With this option you can set one or more parameters for this layer.

Example:
?- _queryIndex(module1, [solrparam("hl", "true"), solrparam("hl.fl","name_en"),
solrparam("hl.snippets", "2"),
solrparam("hl.fragsize", "200")], "foo", 0, 10, ?Obj, ?Tc, ?Sc, ?Order, ?Opt).

These parameters enable Solr highlighting. Please note that only stored fields can be used for highlighting. More details about the Solr highlighting parameters can be found here:
http://wiki.apache.org/solr/HighlightingParameters

externalindexesonly

If this option is set, the module in the first argument is ignored. Note that in this case the option externalindex(<path>) must be set.

externalindex(<path>)

Adds one or more external Lucene indexes to the search. Note that the fields used must nonetheless be defined in the fulltextindex-config.xml.

Example:

?- _queryIndex(dummy, [externalindexesonly, externalindex("d:/index1"),externalindex("d:/index2")], "foo", 0, 10, ?Obj, ?Tc, ?Sc, ?Order, ?Opt).
Includes the Lucene indexes located in the directories d:\index1 and d:\index2.

Extended syntax for <Lucene query text> argument
You can use the Solr query syntax extensions in the <Lucene query text> argument. This allows customized query parsers to add new functionality to the search. A customized query parser is selected by starting the query text with “{!parsername param1=value1 param2=value2}”. Here parsername is the name of the query parser, and param1=value1 and param2=value2 are sample parameter/value pairs.

OntoBroker currently supports two extended query parsers: lucene and multifield

lucene

This is a normal Lucene query with some additional parameters specified directly in the query text.

Parameter      Description
q.op           Default operator (either AND or OR). The standard default operator is AND.
df             Default field (see above)
stringMetric   String metric (see above)
sort           Sort results, e.g. sort='id desc'

Important restriction:

Sorting can be done on the "score" of the document, or on any field with multiValued="false" and indexed="true", provided that the field is either non-tokenized (i.e. has no Analyzer) or uses an Analyzer that produces only a single Term (i.e. uses the KeywordTokenizer).
For more details, see:
http://wiki.apache.org/solr/CommonQueryParameters

Example:

?- _queryIndex(module1, [], "{!lucene df=name_en q.op=OR sort='id asc'} foo bar", 0, 10, ?Obj, ?Tc, ?Sc, ?Order, ?Opt).
Queries for "foo OR bar" in the default field "name_en" and sorts the results in ascending order by the field id.

multifield

This query parser searches for the search terms in multiple default fields. It supports the same parameters as the lucene query parser, plus the following:

Parameter   Description
fields      Fields to search, e.g. fields='name_en^2 docu_en'. Here a hit in the field name_en is additionally boosted by a factor of 2.

Example:

?- _queryIndex(module1, [return(name_en),return(docu_en)], "{!multifield fields='name_en docu_en' q.op=OR sort='id asc'} foo bar", 0, 10, ?Obj, ?Tc, ?Sc, ?Order, ?Opt).
Queries for "foo OR bar" in the fields "name_en" and "docu_en", sorts the results in ascending order by the field id, and returns the fields "name_en" and "docu_en".
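
When the query text is assembled by a program, the “{!parsername ...}” prefix can be generated from a parameter map. The following Python sketch is only an illustration under stated assumptions: the helper name local_params_query is hypothetical, values containing whitespace are wrapped in single quotes as in the examples above, and values that themselves contain a single quote are rejected because no escaping is attempted.

```python
def local_params_query(parser, params, query):
    """Build the text '{!parser name=value ...} query'."""
    parts = ["!" + parser]
    for name, value in params.items():
        value = str(value)
        if "'" in value:
            # This sketch does not attempt to escape embedded quotes.
            raise ValueError("single quotes in values are not supported")
        if any(c.isspace() for c in value):
            # Quote values with whitespace, e.g. sort='id asc'.
            value = "'" + value + "'"
        parts.append(name + "=" + value)
    return "{" + " ".join(parts) + "} " + query

print(local_params_query(
    "lucene", {"df": "name_en", "q.op": "OR", "sort": "id asc"}, "foo bar"))
# {!lucene df=name_en q.op=OR sort='id asc'} foo bar

print(local_params_query(
    "multifield",
    {"fields": "name_en docu_en", "q.op": "OR", "sort": "id asc"},
    "foo bar"))
# {!multifield fields='name_en docu_en' q.op=OR sort='id asc'} foo bar
```

The resulting text reproduces the query texts of the two examples above and still has to be passed as a string in the third argument of _queryIndex.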

Available fields for objects in modules

If fulltext indexing is enabled, OntoBroker creates index entries for every ObjectLogic object which is used in the given module as a concept, instance, attribute or relation. This means that every hit refers to an indexed ObjectLogic object, whose term is returned in the <object> argument.

Side remark

Fulltext indexing is enabled by the OntoConfig.prp parameter, e.g.

FullTextIndex  = on

The fields in the full-text index are defined in the fulltextindex-config.xml and fulltextindex-schema.xml (see section “Fulltext indexer settings” in the OntoBroker Manual Appendix for details).

As a default, the following fields are filled for every object:

Field                   Stored  Indexed            Description
id                      yes     yes (untokenized)  This field stores the untokenized ObjectLogic term representation of the object.
lid                     yes     yes                Field for indexing the local name of the ObjectLogic object term (for terms which are not IRIs this is the same as the id).
type                    yes     yes                This field contains the types of the object: i = Instance, c = Concept, a = Attribute specification, r = Relation specification, p = Property specification, u = Rule, q = Query, t = Constraint.
assertedisa             yes     yes                For instances this field contains the ids of their concepts.
repr_de, repr_en        yes     yes                Contains the language-dependent label for a given object.
docu_de, docu_en        no      yes                Contains the language-dependent documentation for a given object.
syn_de, syn_en          yes     yes                Contains language-dependent synonyms.
name_de, name_en, ...   yes     yes                This field contains the label and the synonyms in the given language. By default the indexer only creates fields for the languages “de” and “en”. If this field is returned (e.g. <option list> = […,return(name_en),…]), the first line always contains the label.
syn                     yes     yes                Contains all synonyms for all languages.
all                     no      yes                Contains all text of the fields lid, name_{lang}, docu_{lang}, attval, syn.
axiomtext               yes     yes                Contains the rule text for rules, queries and constraints.

All fields which are indexed can be used in the query text. For all fields which are stored a “return” option can be specified.

Example:

?- _queryIndex(<http://company.com#onto1>, [return(name_en),return(type),includeall], "+name_en:city +type:i +assertedisa:\"<http://company.com#Region>\"", 0, 20, ?OBJ,?TC,?SCORE,?ORDER,?OPT).

This query searches for instances of <http://company.com#Region> in the module <http://company.com#onto1> whose English representation or synonym contains the word “city”.

Here are some more examples of valid Lucene query texts:

all:city

searches in the field “all” for the word “city”

city

same as “all:city”, provided the default field is “all”

+name_en:village +type:i

searches for instances whose English representation or synonym contains the word “village”

id:"http://company.com#Project"

searches for the object with the id <http://company.com#Project>
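
Query texts like the ones above can also be composed programmatically. The following Python sketch is an illustration only; the helper name clause is hypothetical. It handles plain words and phrases and does not cover fuzzy, wildcard or range syntax. Values that are not a single alphanumeric word are written as a quoted phrase, with backslashes and double quotes escaped by a backslash.

```python
def clause(field, value, required=False):
    """Return one 'field:value' clause of a Lucene query text."""
    if not value.isalnum():
        # Anything beyond a plain word is written as a quoted phrase.
        value = '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    prefix = "+" if required else ""
    return prefix + field + ":" + value

print(" ".join([clause("name_en", "village", required=True),
                clause("type", "i", required=True)]))
# +name_en:village +type:i

print(clause("id", "http://company.com#Project"))
# id:"http://company.com#Project"
```

When such a text is embedded in an ObjectLogic query, the double quotes inside it must additionally be escaped for the surrounding string, as in the assertedisa example above.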