Connecting to an Exasol database is the simplest way to get started with Virtual Schemas. You don't have to install any JDBC driver, because it is already installed in the Exasol database and also included in the JAR of the JDBC adapter.
Upload the latest available release of Exasol Virtual Schema to Bucket FS.
Then create a schema to hold the adapter script.
CREATE SCHEMA SCHEMA_FOR_VS_SCRIPT;
The SQL statement below creates the adapter script, defines the Java class that serves as entry point and tells the UDF framework where to find the libraries (JAR files) for Virtual Schema and database driver.
CREATE JAVA ADAPTER SCRIPT SCHEMA_FOR_VS_SCRIPT.ADAPTER_SCRIPT_EXASOL AS
%scriptclass com.exasol.adapter.RequestDispatcher;
%jar /buckets/<BFS service>/<bucket>/virtual-schema-dist-12.0.0-exasol-8.0.0.jar;
/
Define the connection to the other Exasol cluster as shown below:
CREATE CONNECTION JDBC_CONNECTION
TO 'jdbc:exa:<host>:<port>'
USER '<user>'
IDENTIFIED BY '<password>';
For connecting via TLS you can specify the certificate's fingerprint in the JDBC URL like this:
CREATE CONNECTION JDBC_CONNECTION
TO 'jdbc:exa:<host>/<fingerprint>:<port>'
USER '<user>'
IDENTIFIED BY '<password>';
You can learn more about defining named connections in the Exasol online handbook.
You have three options to pick from when connecting to an Exasol instance or cluster. The options are explained below.
Exasol provides the faster and parallel IMPORT FROM EXA
command for loading data from another Exasol instance. You can tell the adapter to use this command instead of IMPORT FROM JDBC
by setting the IMPORT_FROM_EXA
property.
In this case you have to provide the additional EXA_CONNECTION
which contains the name of the connection definition used for the internally used IMPORT FROM EXA
command.
That means you will have two named connections: a JDBC connection and an EXA connection. The Virtual Schema adapter uses the JDBC connection for reading metadata. The EXA connection is used by the EXALoader that runs the IMPORT
statement.
Please refer to the CREATE CONNECTION documentation for more details about how to define an EXA connection.
CREATE CONNECTION EXA_CONNECTION
TO '<host-or-list>:<port>'
USER '<user>'
IDENTIFIED BY '<password>'
With Exasol 7.1.0 and later you can specify the TLS certificate's fingerprint:
CREATE CONNECTION EXA_CONNECTION
TO '<host-or-list>/<fingerprint>:<port>'
USER '<user>'
IDENTIFIED BY '<password>'
CREATE VIRTUAL SCHEMA VIRTUAL_EXASOL
USING SCHEMA_FOR_VS_SCRIPT.ADAPTER_SCRIPT_EXASOL WITH
CONNECTION_NAME = 'JDBC_CONNECTION'
SCHEMA_NAME = '<schema name>'
IMPORT_FROM_EXA = 'true'
EXA_CONNECTION = 'EXA_CONNECTION';
Unlike for a JDBC connection IMPORT FROM EXA
does not use an explicit datatype mapping. In consequence columns of type HASHTYPE
are mapped to VARCHAR
and joining such a column therefore failed in Exasol 7.1 with error message Feature not supported: Incomparable Types: VARCHAR(32) UTF8 and HASHTYPE(16 BYTE)!
.
Exasol Virtual Schema in version 7.2.0 and later mitigates this by offering parameter GENERATE_JDBC_DATATYPE_MAPPING_FOR_EXA
with values true
and false
(default):
CREATE VIRTUAL SCHEMA VIRTUAL_EXASOL
USING SCHEMA_FOR_VS_SCRIPT.ADAPTER_SCRIPT_EXASOL WITH
CONNECTION_NAME = 'JDBC_CONNECTION'
SCHEMA_NAME = '<schema name>'
IMPORT_FROM_EXA = 'true'
EXA_CONNECTION = 'EXA_CONNECTION'
GENERATE_JDBC_DATATYPE_MAPPING_FOR_EXA = 'true';
This will add explicit datatype mapping to the generated command when using IMPORT FROM EXA
.
Example for the generated pushdown query with GENERATE_JDBC_DATATYPE_MAPPING_FOR_EXA = 'false'
(default):
IMPORT FROM EXA AT "EXA_CONNECTION" STATEMENT '...'
Pushdown query with GENERATE_JDBC_DATATYPE_MAPPING_FOR_EXA = 'true'
:
IMPORT INTO (c1 DECIMAL(36,1), c2 .... ) FROM EXA AT "EXA_CONNECTION" STATEMENT '...'
In case you run into a Data type mismatch
issue which looks like this:
Adapter generated invalid pushdown query for virtual table <TABLENAME>: Data type mismatch in column number <COLUMN NUMBER> (1-indexed).Expected <EXPECTED IMPORT TYPE>, but got <IMPORT TYPE>.
You can set the datatype mapping to true: GENERATE_JDBC_DATATYPE_MAPPING_FOR_EXA = 'true'
.
This will usually solve the issue by providing type hints.
You can alternatively use a regular JDBC connection for the IMPORT
. Note that this option is slower because it lacks the parallelization the IMPORT FROM EXA
variant.
CREATE VIRTUAL SCHEMA <virtual schema name>
USING SCHEMA_FOR_VS_SCRIPT.ADAPTER_SCRIPT_EXASOL WITH
CONNECTION_NAME = 'JDBC_CONNECTION'
SCHEMA_NAME = '<schema name>';
If the data source is the same Exasol instance or cluster Virtual Schemas runs on, then the best possible connection type is a so called "local" connection.
Add the following parameter to CREATE VIRTUAL SCHEMA
:
IS_LOCAL = 'true'
The IS_LOCAL
parameter provides an additional speed-up in this particular use case.
The way this works is that Virtual Schema generates a regular SELECT
statement instead of an IMPORT
statement.
And that SELECT
can be directly executed by the core database, whereas the IMPORT
statement takes a detour via the ExaLoader.
Important: Please note that since the generated SELECT
command runs with the permissions of the owner of the Virtual Schema, that user must have privileges to access what you plan to select!
IMPORT
statements use a connection definition which allows connecting with a different user account. Generated SELECT
statements do not open additional connections (hence the "local" moniker) so they inherit the context of the Virtual Schema query they are executed in — including permissions.
CREATE VIRTUAL SCHEMA <virtual schema name>
USING SCHEMA_FOR_VS_SCRIPT.ADAPTER_SCRIPT_EXASOL WITH
CONNECTION_NAME = 'JDBC_CONNECTION'
SCHEMA_NAME = '<schema name>'
IS_LOCAL = 'true';
Note that you still need to provide a JDBC connection. This is used by the Virtual Schema adapter internally. It is not used for mass data transfer though. And that is where the performance gain comes from.
With version 7.1 Exasol introduced TLS encryption on the database port. Other ports were TLS capable before that already.
To establish a TLS connection with an Exasol Virtual Schema, you must use Exasol Virtual Schema 5.0.4 or later and connect to a compatible Exasol server version (see section "Exasol Server Versions and TLS Support").
The reason why you need the Exasol Virtual Schema 5.0.4 or later is that with this version we built-in a JDBC driver that is TLS-capable.
Check the JDBC driver documentation in our online handbook for details.
The following Exasol Server Versions support TLS connections via JDBC:
- 6.2.15 and later 6.2.x versions
- 7.0.10 and later 7.0.x versions
- All versions from 7.1.x on
Older versions (6.2.0 … 6.2.14 and 7.0.0 … 7.0.9) automatically fall back to legacy encryption. These versions are outdated anyway. If you still have one of them running, you should still disable TLS support explicitly by specifying parameter legacyencryption=1
in the JDBC URL (see details about supported driver properties). This is a visual reminder that TLS won't work with those versions.
If you are interested, you can find even more detailed information in our roadmap ticket "TLS for all Exasol Drivers".
If you want to connect to a cluster running Exasol 7.0.x or earlier with a recent Exasol Virtual Schema (5.0.4 or later), you should explicitly disable TLS on in the JDBC connection the Virtual Schema uses, otherwise the driver will attempt to establish a TLS connection to the server that does not support it and the connection attempt will fail.
The JDBC driver property legacyencryption
switches between TLS and the encryption scheme of older Exasol versions. Set it to 1
to disable TLS.
Example:
CREATE CONNECTION LEGACY_JDBC_CONNECTION
TO 'jdbc:exa:<host>:<port>;legacyencryption=1'
USER '<user>'
IDENTIFIED BY '<password>';
The Exasol SQL dialect supports all capabilities that are supported by the virtual schema framework.
Using literals and constant expressions with TIMESTAMP WITH LOCAL TIME ZONE
data type in Virtual Schemas can produce incorrect results.
- We recommend using
TIMESTAMP
instead. - If you are willing to take the risk and want to use
TIMESTAMP WITH LOCAL TIME ZONE
anyway, please, create a Virtual Schema with the following additional propertyIGNORE_ERRORS = 'TIMESTAMP_WITH_LOCAL_TIME_ZONE_USAGE'
. - We also recommend to set Exasol system
time_zone
to UTC while working withTIMESTAMP WITH LOCAL TIME ZONE
.
Clause ORDER BY
can be used without limitations when using local connection.
Connections EXA and JDBC are using IMPORT
which only supports unordered data transfer. Therefore the outermost order of the imported result rows is not guaranteed.
If you need ordering then please
- apply the ordering on top-level in the Exasol target database outside the virtual schema,
- use a sub-query to access the virtual schema and
- annotate the sub-query with
ORDER BY FALSE
to prevent push down of the top-levelORDER BY
, e.g.
SELECT * FROM (<virtual-schema-query> ORDER BY FALSE) ORDER BY <criteria> [, ...]
See also General Known Limitations in official Exasol documentation on Virtual Schemas.