Presto 330 Documentation

6.22. Redshift Connector

The Redshift connector allows querying and creating tables in an external Amazon Redshift cluster. This can be used to join data between different systems like Redshift and Hive, or between two different Redshift clusters.

Configuration

To configure the Redshift connector, create a catalog properties file in etc/catalog. Naming the file redshift.properties, for example, mounts the Redshift connector as the redshift catalog. Create the file with the following contents, replacing the connection properties as appropriate for your setup:

connector.name=redshift
connection-url=jdbc:postgresql://example.net:5439/database
connection-user=root
connection-password=secret
scan-cache-enabled=true

Transparent scan cache

The transparent scan cache stores data scanned from Redshift in the in-memory Ampool connector. On subsequent access, the Redshift data scan is replaced by a scan of the cached Ampool table, which improves performance. A cache invalidation mechanism deletes cached tables based on data changes and Ampool resource consumption.

To enable the transparent scan cache, set scan-cache-enabled=true in the Redshift catalog properties file. The session property query_scancaching_enabled enables or disables the cache for a particular session. Both the catalog property and the session property must be true for the cache to be used.
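As a rough illustration, the session-level switch could be toggled from a Presto client as shown here. This sketch assumes that query_scancaching_enabled is exposed as a system session property. If it is registered as a catalog session property instead, the name would need a catalog prefix such as redshift.query_scancaching_enabled, which is an assumption and not something this page states.

```sql
-- Assumption: query_scancaching_enabled is a system-level session property.
-- Turn the transparent scan cache on for the current session.
SET SESSION query_scancaching_enabled = true;

-- List session properties to check the value that is now in effect.
SHOW SESSION;

-- Turn the cache off again for this session only.
SET SESSION query_scancaching_enabled = false;
```

The catalog property scan-cache-enabled must still be true in the properties file. The session property alone does not enable caching.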

Multiple Redshift Databases or Clusters

The Redshift connector can only access a single database within a Redshift cluster. Thus, if you have multiple Redshift databases, or want to connect to multiple Redshift clusters, you must configure multiple instances of the Redshift connector.

To add another catalog, add another properties file to etc/catalog with a different name, making sure it ends in .properties. For example, if you name the properties file sales.properties, Presto creates a catalog named sales using the configured connector.
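One way to check that an additional catalog was picked up is to list the catalogs from a Presto client. This sketch assumes the sales.properties file from the example above exists and that Presto has been restarted so that it loads the new file.

```sql
-- Both redshift and sales should be listed if both properties files were loaded.
SHOW CATALOGS;

-- Each catalog exposes the schemas of the single database it connects to.
SHOW SCHEMAS FROM sales;
```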

Querying Redshift

The Redshift connector provides a schema for every Redshift schema. You can see the available Redshift schemas by running SHOW SCHEMAS:

SHOW SCHEMAS FROM redshift;

If you have a Redshift schema named web, you can view the tables in this schema by running SHOW TABLES:

SHOW TABLES FROM redshift.web;

You can see a list of the columns in the clicks table in the web schema using either of the following:

DESCRIBE redshift.web.clicks;
SHOW COLUMNS FROM redshift.web.clicks;

Finally, you can access the clicks table in the web schema:

SELECT * FROM redshift.web.clicks;

If you used a different name for your catalog properties file, use that catalog name instead of redshift in the above examples.
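Because every catalog is addressed by its fully qualified name, a single query can combine Redshift data with data from another system. The following is a minimal sketch rather than a tested query. It assumes a second catalog named hive, a hypothetical table hive.web.users, and hypothetical columns user_id and country, none of which are defined on this page.

```sql
-- Hypothetical example: count Redshift clicks per country,
-- using a user table that is assumed to live in a Hive catalog.
SELECT u.country, count(*) AS click_count
FROM redshift.web.clicks AS c
JOIN hive.web.users AS u
  ON c.user_id = u.user_id   -- user_id is an assumed column in both tables
GROUP BY u.country
ORDER BY click_count DESC
LIMIT 10;
```

The same pattern applies to a join between two Redshift clusters: replace hive with the name of the second Redshift catalog.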

Redshift Connector Limitations

The following SQL statements are not yet supported: