Amazon Redshift

Amazon Redshift is a cloud-hosted data warehouse service. It is part of Amazon Web Services (AWS), the cloud computing platform offered by Amazon.

Objects Extracted

Permissions to Configure the Resource

If you create a new user, ensure that you configure read permission on the Amazon Redshift data source for the user account.

Prerequisites

Obtain JDBC driver file
Verify that you are using the correct version of the Amazon Redshift JDBC driver:
Copy the files to the RedShiftScanner directory in the <INFA_HOME>/services/CatalogService/ScannerJars/externalDependencies directory. You need not recycle the Catalog Service.
Note: Verify that ANTLR version 4.8.1 is included in the driver JAR file.
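One way to check the note above is to list the entries in the driver JAR and look for bundled ANTLR classes. The following is a minimal sketch, not an Informatica utility. It assumes that the ANTLR runtime, if present, appears under a path containing org/antlr/ (possibly with a relocated prefix). The function name is hypothetical, and the check does not confirm the ANTLR version.

```python
import zipfile


def antlr_entries(jar_path):
    """Return JAR entries whose path suggests bundled ANTLR classes.

    Assumption: ANTLR classes live under a path containing "org/antlr/".
    This only shows that ANTLR is present; it does not verify the version.
    """
    with zipfile.ZipFile(jar_path) as jar:
        return [name for name in jar.namelist() if "org/antlr/" in name]
```

An empty result suggests that the driver JAR does not bundle ANTLR under that package path.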
Alternatively, you can perform the following steps to complete the prerequisites:
  1. Verify that you are using the correct version of the Amazon Redshift JDBC driver:
     Note: Verify that ANTLR version 4.8.1 is included in the driver JAR file.
  2. Include the <Informatica Installation directory>/externaljdbcjars/<redshift-jdbc version>.jar file in the redshiftJars.zip file.
  3. Copy the redshiftJars.zip file to the <INFA_HOME>/services/CatalogService/ScannerBinaries directory.
  4. Open the <INFA_HOME>/services/CatalogService/ScannerBinaries/CustomDeployer/scannerDeployer.xml file and add the following lines to the file:
     <ExecutionContextProperty isLocationProperty="true" dependencyToUnpack="redshiftJars.zip">
       <PropertyName>RedShiftScanner_DriverLocation</PropertyName>
       <PropertyValue>scanner_miti/RedShift/Drivers</PropertyValue>
     </ExecutionContextProperty>
  5. Save the scannerDeployer.xml file.
  6. If you want to enable profiling for the Amazon Redshift resource, copy the downloaded Amazon Redshift JDBC driver file to the <INFA_HOME>/connectors/thirdparty/informatica.amazonredshift/common/ directory.
  7. Recycle the Catalog Service.
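The packaging step in the procedure above can be scripted. The sketch below uses only the Python standard library. It assumes that the driver JAR belongs at the root of redshiftJars.zip, which the procedure does not state explicitly. The function name and any paths you pass in are illustrative.

```python
import zipfile
from pathlib import Path


def build_driver_zip(jar_path, zip_path):
    """Create a zip archive that contains the driver JAR at the archive root.

    Assumption: the scanner expects the JAR at the top level of the archive.
    """
    jar_path = Path(jar_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # arcname drops the source directory so that only the file name is stored.
        archive.write(jar_path, arcname=jar_path.name)
    return zip_path
```

After you create the archive, copy it to the ScannerBinaries directory as described in the procedure.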
Update the JDBC driver file
To replace the obsolete JDBC driver JAR with the latest driver JAR, perform the following steps:
  1. Open the <INFA_HOME>/services/CatalogService/ScannerBinaries/ directory.
  2. Replace the redshift-jdbc41 driver JAR with the redshift-jdbc42 driver JAR in the redshiftJars.zip file.
  3. Recycle the Catalog Service.
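Because a zip archive cannot be edited in place with the Python standard library, replacing the driver typically means rewriting the archive. The following sketch shows one way to do that. It assumes that the old driver can be identified by a file name prefix such as redshift-jdbc41 and that the new JAR belongs at the archive root. The function and argument names are hypothetical.

```python
import os
import tempfile
import zipfile
from pathlib import Path


def replace_driver_in_zip(zip_path, old_prefix, new_jar):
    """Rewrite the archive without the old driver JAR and add the new one."""
    zip_path = Path(zip_path)
    new_jar = Path(new_jar)
    # Write to a temporary file in the same directory, then swap it into place.
    fd, tmp_name = tempfile.mkstemp(suffix=".zip", dir=zip_path.parent)
    os.close(fd)
    with zipfile.ZipFile(zip_path) as src, \
            zipfile.ZipFile(tmp_name, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            # Skip every entry whose file name starts with the old driver prefix.
            if Path(item.filename).name.startswith(old_prefix):
                continue
            dst.writestr(item, src.read(item.filename))
        dst.write(new_jar, arcname=new_jar.name)
    os.replace(tmp_name, zip_path)
```

Consider keeping a backup copy of the original redshiftJars.zip file before you run a script like this.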

Basic Information

The General tab includes the following basic information about the resource:
Information
Description
Name
The name of the resource.
Description
The description of the resource.
Resource type
The type of the resource.
Execute On
You can choose to execute on the default catalog server or offline.

Resource Connection Properties

The General tab includes the following properties:
Property
Description
User
The user name used to access the database.
Password
The password associated with the user name.
Host
Host name or IP address of Amazon Redshift service.
Port
Amazon Redshift server port number. Default is 5439.
Database
The name of the database instance.
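For orientation, the Host, Port, and Database properties correspond to the parts of a typical Amazon Redshift JDBC URL. The sketch below only illustrates that mapping. The scanner builds its own connection, and the function name and sample host name are hypothetical.

```python
def redshift_jdbc_url(host, database, port=5439):
    """Combine host, port, and database into a Redshift-style JDBC URL.

    The default port matches the documented default of 5439.
    """
    return f"jdbc:redshift://{host}:{port}/{database}"


# Hypothetical host and database names, for illustration only.
print(redshift_jdbc_url("example-cluster.example.com", "dev"))
```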
The following image shows sample connection properties on the General tab:
The Metadata Load Settings tab includes the following properties:
Property
Description
Enable Source Metadata
Extracts metadata from the data source.
Import System Objects
Select this option to specify that the system objects must be imported.
Schema
Click Select... to specify the Amazon Redshift schemas that you want to import. You can use one of the following options from the Select Schema dialog box to import the schemas:
  • Select from List: Use this option to select the required schemas from a list of available schemas.
  • Select using regex: Provide an SQL regular expression to select schemas that match the expression.
S3 Bucket Name
Provide a valid Amazon S3 bucket name for the Amazon Redshift data source. You must provide this value if you want to enable profiling for Amazon Redshift. If you do not want to enable profiling, retain the default value. The bucket must be accessible with the access key or private key specified in the Data Integration Service (DIS) connection.
Case Sensitive
Specifies whether the resource is configured for case sensitivity. Select one of the following values:
  • True. Select this check box to specify that the resource is configured as case sensitive.
  • False. Clear this check box to specify that the resource is configured as case insensitive.
The default value is False.
Memory
The memory required to run the scanner job. Select one of the following values based on the data set size imported:
  • Low
  • Medium
  • High
Note: For more information about the memory values, see the Tuning Enterprise Data Catalog Performance article on the How-To Library Articles tab in the Informatica Doc Portal.
Custom Options
JVM parameters that you can set to configure the scanner container. Use the following arguments to configure the parameters:
  • -Dscannerloglevel=<DEBUG/INFO/ERROR>. Sets the log level of the scanner to DEBUG, INFO, or ERROR. Default value is INFO.
  • -Dscanner.container.core=<No. of core>. Increases the number of cores for the scanner container. The value must be a number.
  • -Dscanner.yarn.app.environment=<key=value>. Key-value pairs that you need to set in the YARN environment. Use a comma to separate the key-value pairs.
  • -Dscanner.pmem.enabled.container.memory.jvm.memory.ratio=<1.0/2.0>. Increases the scanner container memory when pmem is enabled. The default value is 1.
  • -DenableDirectRead=true. Enables direct reading of profiling information from an Amazon Redshift data source by avoiding the staging phase during the resource scan.
    Note: Enable the option only if direct read is enabled in the Data Integration Service.
Track Data Source Changes
View metadata source change notifications in Enterprise Data Catalog.
Agent Options
Specify the Enterprise Data Catalog Agent options to run the scanner job.
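The Custom Options arguments described above can be assembled programmatically before you paste them into the field. The following sketch is illustrative only. It assumes that multiple arguments are separated by spaces, which this guide does not state, and the function and parameter names are hypothetical.

```python
def build_custom_options(log_level="INFO", cores=None, yarn_env=None,
                         pmem_ratio=None, direct_read=False):
    """Build a Custom Options string from the documented JVM arguments.

    Assumption: arguments are separated by a single space.
    """
    if log_level not in {"DEBUG", "INFO", "ERROR"}:
        raise ValueError("log_level must be DEBUG, INFO, or ERROR")
    options = [f"-Dscannerloglevel={log_level}"]
    if cores is not None:
        options.append(f"-Dscanner.container.core={int(cores)}")
    if yarn_env:
        # The key=value pairs are comma-separated, as described in the table.
        pairs = ",".join(f"{key}={value}" for key, value in yarn_env.items())
        options.append(f"-Dscanner.yarn.app.environment={pairs}")
    if pmem_ratio is not None:
        options.append(
            "-Dscanner.pmem.enabled.container.memory.jvm.memory.ratio="
            f"{pmem_ratio}"
        )
    if direct_read:
        # Enable only if direct read is enabled in the Data Integration Service.
        options.append("-DenableDirectRead=true")
    return " ".join(options)
```

For example, build_custom_options("DEBUG", cores=2) produces a string that raises the log level and requests two cores for the scanner container.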
You can enable data discovery for an Amazon Redshift resource. For more information, see the Enable Data Discovery topic.
Note: Effective in version 10.5.2.1, you can run a profile and perform data domain discovery on external tables for an Amazon Redshift resource.
You can enable composite data domain discovery for an Amazon Redshift resource. For more information, see the Composite Data Domain Discovery topic.