AWS Glue

AWS Glue is an extract, transform, and load (ETL) service in the Amazon Web Services (AWS) ecosystem that moves data between different data stores. Glue captures the metadata of multiple data stores that are part of the AWS ecosystem.

Objects Extracted

The Glue resource extracts metadata from the following assets in a Glue data source:

Permissions to Configure the Resource

To access Glue, make sure that you perform one of the following steps before you configure the resource:

Connect to a Glue Data Source Enabled for SSL

To connect to a Glue data source enabled for SSL, perform the following steps:
  1. Download the Glue SSL certificates using a web browser.
     Note: Make sure that you import the Glue Trust Services certificate in the Certificates directory.
  2. Copy the certificates to the <INFA_HOME>/services/shared/security/ directory.
  3. Go to the <INFA_HOME>/source/java/jre/bin directory and then run the following keytool command to import each copied certificate as a trusted certificate into the Informatica domain keystore:
     keytool -import -file <INFA_HOME>/services/shared/security/<certificate>.cer -alias <alias name> -keystore <INFA_HOME>/services/shared/security/infa_truststore.jks -storepass <Informatica domain keystore password>
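
When several certificates have to be imported, the keytool step can be scripted. The following Python sketch only assembles the argument list for one certificate, using the paths and flags from the steps above. The function name, the /opt/informatica install location, and the certificate, alias, and password values are hypothetical placeholders, not part of the product.

```python
from pathlib import PurePosixPath


def keytool_import_command(infa_home, certificate_file, alias, storepass):
    """Build the keytool argument list for importing one certificate.

    infa_home stands in for <INFA_HOME>; all other values are supplied
    by the caller. Nothing is executed here.
    """
    home = PurePosixPath(infa_home)
    security_dir = home / "services" / "shared" / "security"
    keytool = home / "source" / "java" / "jre" / "bin" / "keytool"
    return [
        str(keytool),
        "-import",
        "-file", str(security_dir / certificate_file),
        "-alias", alias,
        "-keystore", str(security_dir / "infa_truststore.jks"),
        "-storepass", storepass,
    ]


# Hypothetical values; the command is printed rather than run.
command = keytool_import_command(
    "/opt/informatica", "glue.cer", "glue", "<keystore password>"
)
print(" ".join(command))
```

To run the import, pass the returned list to subprocess.run once for each copied certificate. keytool may ask for confirmation before it trusts a certificate, so an interactive session is the safer place to run it.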

Basic Information

The General tab includes the following basic information about the resource:
Name: The name of the resource.
Description: The description of the resource.
Resource type: The type of the resource.
Execute On: You can choose to execute on the default catalog server or offline.

Resource Connection Properties

The General tab includes the following properties:
Role-based Authentication: Option to use the Amazon Elastic Compute Cloud (Amazon EC2) instance profile credentials when Enterprise Data Catalog is installed on an Amazon EC2 instance.
AWS Access Key: Access Key of the Amazon Web Services account.
AWS Secret Key: Secret key of the Amazon Web Services account.
AWS Region: Amazon Web Services region from where you want to scan the Glue Catalog.
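
The relationship between these connection properties can be sketched as a small pre-flight check. The Python example below is illustrative only: the class and method names are hypothetical, and it assumes that the access key and secret key are needed only when Role-based Authentication is not selected, which this guide does not state explicitly.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GlueConnectionProperties:
    """Hypothetical holder for the four connection properties listed above."""
    aws_region: str
    role_based_authentication: bool = False
    aws_access_key: Optional[str] = None
    aws_secret_key: Optional[str] = None

    def problems(self) -> List[str]:
        """Return a list of human-readable issues; an empty list means none found."""
        issues = []
        if not self.aws_region:
            issues.append("AWS Region is empty")
        # Assumption (not confirmed by the guide): the keys are required
        # only when instance profile credentials are not used.
        if not self.role_based_authentication:
            if not self.aws_access_key:
                issues.append("AWS Access Key is empty")
            if not self.aws_secret_key:
                issues.append("AWS Secret Key is empty")
        return issues


# Example: role-based authentication on an Amazon EC2 instance.
props = GlueConnectionProperties(aws_region="us-east-1", role_based_authentication=True)
print(props.problems())
```
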
The following table describes the properties that you can configure in the Source Metadata section of the Metadata Load Settings tab:
Enable Source Metadata: Enables metadata extraction.
Database Filter: Filter that enables you to include or exclude databases in the resource run. You can also specify a regular expression that represents the databases that you want to include or exclude.
Table Filter: Filter that uses a combination of regular expressions and wildcard characters to include or exclude matching assets in the resource run. You can also enter table names to include them in the resource run. Use a semicolon to separate the wildcard patterns and table names.
Enable Reference Resources: Option to extract metadata about assets that are not included in this resource but are referred to in the resource. Examples include source and target tables in PowerCenter mappings, and source tables and files from Tableau reports.
Create Athena Resources: Indicates whether to create an Athena data source.
Retain Unresolved Reference Assets: Option to retain unresolved reference assets in the catalog after you assign connections. Retaining unresolved reference assets helps you view the complete lineage. The unresolved assets include deleted files, temporary tables, and other assets that are not present in the primary resource.
Auto Assign Connections: Indicates whether the connections must be assigned automatically.
Memory: The memory value required to run a scanner job. Specify one of the following memory values:
  • Low
  • Medium
  • High
  Note: For more information about the memory values, see the Tuning Enterprise Data Catalog Performance article on the How To-Library Articles tab in the Informatica Doc Portal.
Custom Options: JVM parameters that you can set to configure the scanner container. Use the following arguments to configure the parameters:
  • -Dscannerloglevel=<DEBUG/INFO/ERROR>. Changes the log level of the scanner to a value such as DEBUG, INFO, or ERROR. Default value is INFO.
  • -Dscanner.container.core=<No. of core>. Increases the number of cores for the scanner container. The value must be a number.
  • -Dscanner.yarn.app.environment=<key=value>. Key-value pair that you need to set in the Yarn environment. Use a comma to separate multiple key-value pairs.
  • -Dscanner.pmem.enabled.container.memory.jvm.memory.ratio=<1.0/2.0>. Increases the scanner container memory when pmem is enabled. The default value is 1.
Track Data Source Changes: Option to view metadata source change notifications in Enterprise Data Catalog.
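
The Table Filter and Custom Options values are plain strings, so they can be assembled ahead of time. The Python sketch below joins filter entries with semicolons and builds the JVM arguments listed above. The function names and sample values are hypothetical. It also assumes that multiple arguments in the Custom Options field are separated by spaces, and it accepts only the three log levels that this guide names.

```python
def build_table_filter(entries):
    """Join wildcard patterns and table names with semicolons."""
    cleaned = [entry.strip() for entry in entries if entry.strip()]
    for entry in cleaned:
        if ";" in entry:
            raise ValueError("an entry must not contain a semicolon: " + entry)
    return ";".join(cleaned)


def build_custom_options(log_level="INFO", cores=None, yarn_env=None, pmem_ratio=None):
    """Assemble the scanner JVM arguments described in the Custom Options entry."""
    # Only the values named in the guide are accepted here.
    if log_level not in ("DEBUG", "INFO", "ERROR"):
        raise ValueError("unsupported log level: " + str(log_level))
    args = ["-Dscannerloglevel=" + log_level]
    if cores is not None:
        if not isinstance(cores, int) or cores < 1:
            raise ValueError("cores must be a positive whole number")
        args.append("-Dscanner.container.core=" + str(cores))
    if yarn_env:
        # Multiple key=value pairs are separated by commas.
        pairs = ",".join(key + "=" + str(value) for key, value in yarn_env.items())
        args.append("-Dscanner.yarn.app.environment=" + pairs)
    if pmem_ratio is not None:
        args.append(
            "-Dscanner.pmem.enabled.container.memory.jvm.memory.ratio=" + str(pmem_ratio)
        )
    # Assumption: arguments are separated by spaces in the Custom Options field.
    return " ".join(args)


# Hypothetical values for illustration.
print(build_table_filter(["sales_*", "orders"]))
print(build_custom_options(log_level="DEBUG", cores=2))
```
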