Previous topic  Top  Next topic  Print this Topic
 

Data Integration

 

Besides serving as a common communication language and representing expert knowledge, ontologies serve as an integration means of different legacy systems. The ontology is used to reinterpret given information sources in a common language and thus provide a common and single view of different data sources. An ontology can collect different sources and integrate them in a common logical model. This goes far beyond just building connectors between applications. The goal of integration is to consolidate distributed information intelligently, free of redundancy and providing users and applications with simple access to information without considering the underlying data structure or system.

 

Supported external Databases:

Microsoft SQL Server 2000, 2005 and 2008
Oracle 10g and 11g
Oracle TimesTen in-memory database 11g
Oracle TimesTen in-memory database cache 11g
DB2 9.0 and 9.5
other JDBC drivers can be embedded by configuration

Creation of database mapping rules

Database Schema Import in OntoStudio

Database mapping rules will be created automatically when using the database schema import.

SQLExecute

You can create your own db mapping rules, using the sqlexecute predicate:

?X:"Person"@_defaultModule :- _sqlexecute("SELECT colname FROM tblname",[?X],"mydbconnection").

 

NOTE: There must be a fact stored in an loaded ontology defining the connection details of "mydbconnection", e.g:

_dbaccessuserdata("mydbconnection", "oracle","databasename","localhost:1521","user","pwd").

For more information see the built-in “_dbaccessuser”.

Database User Management and Database Configuration

For external database access you will need read access to the tables and read & write access to the relevant database for creating temporary tables.
The conversion of data types in a database can be different from the data type conversion of OntoBroker. Of course it can happen that when you execute built-ins like string2number on the database, the result differs from the OntoBroker result. When non-trivial queries involving rules with _dbaccessuser built-ins are executed, it can be necessary to create temporary tables for storing intermediate results. Therefore your connection needs permissions to create and delete temporary tables and insert facts in these tables.

MS SQL Server

The following user privileges are required:

Select
Insert
Create table
Drop table

The tempdb database is used. It is necessary that the database and the tempdb have the same collation.

Oracle

The following user privileges are required:

Select
Insert
Create table
Drop table

If you don´t have write access to the schema, proceed in the following way:

1.Create a new user.
2.Grant writing access to this new user on his own schema.
3.Grant just read access to this user on the schema with data.
4.Login with this new user when doing a DB schema import in OntoStudio.
5.Choose the schema with data and import the tables.

DB2

The following user privileges are required:

Select
Insert
Create table
Drop table
Use on user temporary tablespace (for Version DB2 8)

You need a user temporary tablespace for Version DB2 8.

Optimization

ShrinkDBAccess, SQLRewriter

Access to the database will be optimized whilst executing a query: _dbaccessuser/3 literals are rewritten in a way so that only the required columns are selected. Furthermore, the dbaccessuser literals are combined with some built-ins in a SQL query so that there are some restrictions.

Example:

?- _dbaccessuser("projects",F(id,VAR1,project,VAR1),"mydbconnection") and equal(VAR1,"myproject").

will be rewritten

?- _sqlexecute("select id, project from projects where project =’myproject’",[?VAR1,?VAR2],"mydbconnection").

It is also possible to combine multiple _dbaccessuser/3 literals. _sqlexecute/3 literals are not optimized.

You may encounter performance problems when dbaccessuser literals are not executed correctly (e.g. needless columns are selected, dbaccessuser literals are executed without restrictions). For a detailed analysis you can turn on the SQL tracing.

You may also try out if changing some of the following switches significantly improves the performance:

Other evaluation methods
Use namesground switches: CNG, ANG, MNG on/off.
Using StrongUnfolder (can also be used if you only use a couple of rules)

Evaluation method DynamicFiltering2 with ANG=on is often a good choice.

Null Values

The access of external databases with the dbaccessuser built-in may result in null values from the database. A built-in with a null value as an argument is assumed to be false. As a consequence the built-ins are not called as soon as a null value occurs in its arguments.

SuppressNull

NOTE: there was a bugfix on SuppressNull! This bugfix can change the behavior of old ontologies with db mapping!

OntoConfig Option:

SuppressNull = on

Default value is on.

Example:

Table "test" has following rows:

ID         NAME

1         a

2         null

Rule:

p(?X) :- _dbaccessuser("test", F(ID,?X,NAME,?Y), "connectiondatasource").

Result before the bugfix:

?X

1

2

Result after the bugfix:

?X

1

The second row (2, null) is not selected now because the NAME is null. For getting the old result you should change the rule as follows:

p(?X) :- _dbaccessuser("tblname", F(ID,?X), "connectiondatasource").

Query Option Fillnull

Example:

@{q,options[fillNull]} ?- ?X:Person AND ?X[hasForename->?Y].

 

Result:

SuppressNull = off

SuppressNull = on

1,B

2,null

1,B

2,null

For more information on fillnull, see the accordant chapter.

Changing the Connection Data

You can change the connectionsdata by editing dbaccessuserdata_ fact:

_dbaccessuserdata("key","oracle","databasename","host:port","user","password").

For more information on data integration, see the OntoStudio Manual -> Import -> Import.

Connector for tuple cache

Connector Cache Options for OntoConfig

Connector.Database.Cache.Lifetime = query | unlimited
Connector.SPARQL.Cache.Lifetime = query | unlimited
Connector.OntoBroker.Cache.Lifetime = query | unlimited
Connector.Excel.Cache.Lifetime = query | unlimited
Connector.WebService.Cache.Lifetime = query | unlimited

query - tuple cache is removed after the query execution. This is the default value for this option for all connectors in OntoStudio and for the WebService connector in Ontobroker.

unlimited - tuple cache is not removed after the query execution and can be used for the next query. This option is better for performance and should always be used if a database won't be updated a lot. The cache update should be done manually (see "Clear connector cache command"). This is the default value for this option in OntoBroker for all of the connectors except for the WebService connector.

Connector cache clear command

clear_connector_cache all | cache key

The command cleans a connector cache. This command can be executed only if Ontobroker is started with the option Connector.*.Cache.Lifetime = unlimited.

all - all tuples will be deleted from the cache.

cache key - only tuples for the cache key will be deleted. The cache key is a map term and can contain different values for different connectors.

Examples:

Database connector:

clear_connector_cache [_host->"data",_port->1521,_database->"orcl11",_user->"testuser",_table->"tblname"]

The table is optional; if no host is defined, the localhost will be used.

MergeImports

If the MergeImports parameter is set to "on" in the OntoConfig.prp, imported ontologies are directly merged into the importing root ontology. This means that the imported ontologies are not available as separate ontologies as for OntoStudio.

Example

To activate the MergeImports switch, set in the OntoConfig.prp

MergeImports = on

Assume that you have three ontologies A,B, and C. A imports B and B imports C. Then, after loading you have only one ontology A' (consisting of rules and facts from A,B, and C).

NOTE 1: The flag influences only the loading process on OntoBroker start (or reload command).

NOTE 2: The MergeImports flag cannot be used together with project files.