 

Custom Index Fields

 

The indexer can now be customized with additional fields. Values for a defined set of attributes can be added to additional Lucene index fields and then used with the _queryIndex/10 built-in or for autocompletion.

Configuration

If the full-text index engine is enabled, i.e. the parameter "FullTextIndex = on" is set in OntoConfig.prp, OntoBroker creates a full-text index for every ontology. The fields and values can be configured with two files in the conf directory:

fulltextindex-config.xml

Contains the general options and the definition of the index fields.

fulltextindex-schema.xml

Contains the index schema definition, i.e. field types and analyzers to be used for indexing and querying.
See also Apache Solr documentation wiki.apache.org/solr/SchemaXml
In contrast to Apache Solr, the index fields are not defined in the schema definition, but in the fulltextindex-config.xml.

Also note that the full-text index engine only works with extensional data, i.e. facts stored in the OntoBroker data model by import from ontology files, API insert/delete, or materialization.

General Options

property

Description

schema

path to schema definition, relative to conf directory

excludedOntologies

set of ontologies, identified by ontologyURI, which should not be full-text indexed

batchSize

number of objects processed in one batch during index creation

flushSize

number of objects processed before flushing to disk

waitTime

waiting time before starting processing invalidated objects

languages

languages to consider for representations, synonyms and custom fields with language pattern

booleanQueryMaxClauseCount

maximum number of Lucene boolean clauses

defaultQueryOperator

default query operator if no operator is specified. Default is OR, alternative is AND, but this has a major impact on the overall behavior of the search

defaultStringMetric

string metric used for fuzzy search if not explicitly specified. Possible values are e.g.

Jaro, Levenshtein, JaroScaled, JaroWinkler, DamerauLevenshtein, DamerauLevenshteinScaled, MaxJaroDamerauLevenshteinScaled, Jaccard, Soundex, SmithWaterman
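
The effect of defaultQueryOperator on a multi-word search can be illustrated with a toy model. The following sketch is not OntoBroker or Lucene code; the class DefaultOperatorSketch and its match method are hypothetical, and matching is reduced to whole-word comparison on whitespace-separated tokens. It only shows why switching from OR to AND usually shrinks the result set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model (hypothetical, not part of OntoBroker): shows how the default
// operator changes which documents match a query with several terms.
final class DefaultOperatorSketch {

    enum Operator { OR, AND }

    // Returns the documents matching the query terms under the given operator.
    static List<String> match(List<String> docs, String query, Operator op) {
        String[] terms = query.toLowerCase().trim().split("\\s+");
        List<String> hits = new ArrayList<>();
        for (String doc : docs) {
            Set<String> tokens =
                new HashSet<>(Arrays.asList(doc.toLowerCase().split("\\s+")));
            long found = Arrays.stream(terms).filter(tokens::contains).count();
            // AND: every term must occur; OR: at least one term must occur.
            boolean hit = (op == Operator.AND) ? found == terms.length : found > 0;
            if (hit) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("transport vehicle", "transport", "cargo ship");
        System.out.println("OR:  " + match(docs, "transport vehicle", Operator.OR));
        System.out.println("AND: " + match(docs, "transport vehicle", Operator.AND));
    }
}
```

With OR, both documents containing "transport" are returned; with AND, only the document containing both words remains.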

Standard index fields

With the default configuration, there are the following index fields available for every ontology:

Field name

Description

id

object id

lid

local name of object id

type

Object type: c = Concept, i = Instance, a = Attribute, r = Relation, u = Rule, q = Query, t = Constraint

assertedisa

all concept ids for which this object id (instance) has an assertedisa fact,

i.e. ?- $assertedisa(obj,?V).

i.e. extensional ?- obj:?V.

axiomtext

text of rules, queries, and constraints

syn

local name, all representations and all synonyms (restricted to values of language property)

all

all text values (values from fields id, lid, axiomtext, syn)

repr_de, repr_en, ...

language-dependent representation,

i.e. extensional value of ?- obj[_representation(lang)->?V]

docu_de, docu_en, ...

language-dependent documentation,

i.e. extensional value of ?- obj[_documentation(lang)->?V]

syn_de, syn_en, ...

language-dependent synonyms, plus all language-independent synonyms

i.e. extensional value of ?- obj[_synonym(lang)->?V] OR obj[_synonym->?V]

name_de, name_en, ...

language-dependent names: representation (or local name as fallback if no representation is defined) and synonyms
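
The fallback rule described for the name_ fields can be sketched as follows. This is an illustration only, not the indexer's implementation: the class NameFieldSketch and the method nameValues are hypothetical, and the sketch assumes a missing representation is passed as null or as an empty string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the name_<lang> field content: the representation
// (or the local name if no representation exists), followed by the synonyms.
final class NameFieldSketch {

    static List<String> nameValues(String representation, String localName,
                                   List<String> synonyms) {
        List<String> values = new ArrayList<>();
        boolean hasRepresentation = representation != null && !representation.isEmpty();
        // Fall back to the local name when no representation is defined.
        values.add(hasRepresentation ? representation : localName);
        values.addAll(synonyms);
        return values;
    }

    public static void main(String[] args) {
        System.out.println(nameValues("Lorry", "truck", List.of("HGV")));
        System.out.println(nameValues(null, "truck", List.of("HGV")));
    }
}
```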

Custom index fields

You can add additional fields by defining a bean of class com.ontoprise.indexer.FullTextIndexField.
FullTextIndexField has the following properties:

property name

Description

fieldName

name of the index field. Attribute 'languageAware' (boolean): If true, the fieldName is expanded if it contains the languagePattern.

fieldType

name of field type as defined in fulltextindex-schema.xml

ontobrokerInternal

default=false: flag that marks the field as important for various OntoBroker functionalities (SearchHelper, AutocompleteHelper)

attributeNames

set of property names whose value should be included in this field. If an attributeName contains the value of the property languagePattern, it is replaced by a concrete language code.

languagePattern

pattern to be replaced by the language values (see general options above). In this case this field definition is translated into multiple fields.

restrictedToInstancesOf

set with concept ids. The field is only created for an object if it is an asserted instance of one of the given concept ids

indexed

false: Should field be indexed? (overwrites value from field type)

tokenized

false: Should values be tokenized? (overwrites value from field type)

stored

false: Should the value be stored in the index, i.e. can values be retrieved on search? (overwrites value from field type)

binary

false: is this a binary field? (overwrites value from field type)

compressed

false: should stored text field be compressed? (overwrites value from field type)

omitNorms

false: advanced option: omit norms associated with this field (changes ranking). (overwrites value from field type)

omitTermFreqAndPositions

false: omit term freq, positions and payloads from postings for this field? (overwrites value from field type)

termVectors

false: expert field option (see Lucene documentation for details)

termPositions

false: expert field option (see Lucene documentation for details)

termOffsets

false: expert field option (see Lucene documentation for details)

Configuration example

  ...

 

 <bean class="com.ontoprise.indexer.FullTextIndexField">

     <property name="languagePattern" value="$LANGUAGE$" />

     <property name="fieldName" value="xname_$LANGUAGE$" />

     <property name="fieldType" value="otext" />

     <property name="stored" value="true" />

     <property name="attributeNames">

         <set>

             <value>&lt;http://my.namespace#name&gt;("$LANGUAGE$")</value>

         </set>

     </property>

 </bean>

  ...

In this example, one custom index field definition with a language pattern is defined. This means that multiple index fields are created, one for each language defined in the property languages (see General Options above). Assuming that the defined languages are "de" and "en", the effective index fields are "xname_de", containing the values of the attribute <http://my.namespace#name>("de"), and "xname_en", containing the values of the attribute <http://my.namespace#name>("en"). The field type used here is "otext", a standard field type used by many standard index fields of OntoBroker. The values are stored in the index for retrieval; this is only needed if the field is used for autocompletion or if you want to use the return() option of the _queryIndex built-in.
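
The expansion of a language pattern into one field per language can be pictured with a small stand-alone sketch. It is not the OntoBroker implementation; the class LanguagePatternSketch and the method expand are hypothetical and simply replace the pattern literally in the field name and in the attribute name.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: expands one language-aware field definition into
// one (field name -> attribute name) pair per configured language.
final class LanguagePatternSketch {

    static Map<String, String> expand(String pattern, String fieldName,
                                      String attributeName, List<String> languages) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String lang : languages) {
            // String.replace performs a literal (non-regex) replacement,
            // so the '$' characters in the pattern need no escaping.
            fields.put(fieldName.replace(pattern, lang),
                       attributeName.replace(pattern, lang));
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> fields = expand(
            "$LANGUAGE$",
            "xname_$LANGUAGE$",
            "<http://my.namespace#name>(\"$LANGUAGE$\")",
            List.of("de", "en"));
        fields.forEach((field, attribute) ->
            System.out.println(field + " <- " + attribute));
    }
}
```

For the languages "de" and "en", the sketch yields the two fields xname_de and xname_en, each paired with the attribute for its language.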

Search example

To search in this field with _queryIndex/10, use the Lucene syntax for fields (here "xname_en:") in front of your search text.

?- _queryIndex(<http://your.company.com#ontology>, [return(xname_en)], "xname_en:Transp*", 0, 10, ?OBJ, ?TC, ?SCORE, ?ORDER, ?OPT).
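
When the search text comes from user input, the field-prefixed search string can be assembled programmatically. The sketch below is an assumption-laden illustration: the class FieldQuerySketch and its methods are hypothetical, and it assumes the search string follows the classic Lucene query syntax, in which the characters listed in SPECIAL must be escaped with a backslash. It only builds the string and does not call OntoBroker.

```java
// Hypothetical helper: builds a Lucene-style prefix query for one index field.
final class FieldQuerySketch {

    // Characters with special meaning in the classic Lucene query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    // Prefixes every special character with a backslash.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Builds "<field>:<escaped prefix>*" for a single-word prefix search.
    static String prefixQuery(String field, String prefix) {
        return field + ":" + escape(prefix) + "*";
    }

    public static void main(String[] args) {
        System.out.println(prefixQuery("xname_en", "Transp"));
    }
}
```

The resulting string can then be passed as the search text argument of _queryIndex. Prefixes containing whitespace are not handled by this sketch.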

Autocomplete example

For autocompletion, set the search field in the AutocompleteHelper options.

AutocompleteHelper helper = ...

Ontology ontology = ...

AutocompleteHelper.Type type = ...

AutocompleteHelper.Options options = new AutocompleteHelper.Options();

 

options.setSearchField("xname_en");

 

options.set...

 

CompletionResults results = helper.getCompletion(ontology, options, type, "Transp", 0, 10);

..