Property | Description |
---|---|
Name | Name of the task. |
Description | Optional. Description of the task. |
Connection Name | Name of the cloud provisioning configuration to use with the workflow. |
Connection Type | Choose a Hadoop distribution from the list. Default is Amazon EMR. |

Property | Description |
---|---|
Cluster Name | Name of the cluster to create. |
Release Version | EMR version to run on the cluster. Enter the AWS version tag string to designate the version, for example emr-5.8.0. Default is the latest supported version. |
Connection Name | Name of the Hadoop connection that you configured for use with the cluster workflow. |
S3 Log URI | Optional. S3 location of logs for cluster creation, in the format s3://<bucket name>/<folder name>. If you do not supply a location, no cluster logs are stored. |
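
The Release Version and S3 Log URI formats described above can be checked before the values are entered. The following is a minimal sketch, not part of the product: the function name `check_input_properties` and both regular expressions are assumptions derived only from the example tag emr-5.8.0 and the s3://<bucket name>/<folder name> format shown in the table.

```python
import re

# Assumed patterns, inferred from the formats shown in the table above.
RELEASE_TAG = re.compile(r"^emr-\d+\.\d+\.\d+$")
S3_URI = re.compile(r"^s3://[^/\s]+/\S+$")


def check_input_properties(release_version="", s3_log_uri=""):
    """Return a list of problems; an empty list means the values look well formed.

    Both properties may be left empty: the release version then falls back to
    its default, and no cluster logs are stored.
    """
    problems = []
    if release_version and not RELEASE_TAG.match(release_version):
        problems.append(
            "Release Version %r does not look like an EMR tag such as emr-5.8.0"
            % release_version
        )
    if s3_log_uri and not S3_URI.match(s3_log_uri):
        problems.append(
            "S3 Log URI %r does not match s3://<bucket name>/<folder name>"
            % s3_log_uri
        )
    return problems


# Hypothetical values, used only for illustration.
print(check_input_properties("emr-5.8.0", "s3://my-bucket/cluster-logs"))
print(check_input_properties("5.8.0", "my-bucket/cluster-logs"))
```
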

Property | Description |
---|---|
Master Instance Type | Master node EC2 instance type. You can specify any available EC2 instance type. Default is m4.4xlarge. |
Master Instance Maximum Spot Price | Maximum spot price for the master node. Setting this property changes the purchasing option of the master instance group to Spot instead of On-demand. |

Property | Description |
---|---|
Core Instance Type | Core node EC2 instance type. You can specify any available EC2 instance type. Default is m4.4xlarge. |
Core Instance Count | Number of core EC2 instances to create in the cluster. Default is 2. |
Core Instance Maximum Spot Price | Maximum spot price for core nodes. Setting this property changes the purchasing option of the core instance group to Spot instead of On-demand. |
Core Auto-Scaling Policy | Optional. Auto-scaling policy for core instances. Type the policy JSON statement here, or provide a path to a file that contains a JSON statement. Format: file:\\<path_to_policy_config_file> |

Property | Description |
---|---|
Task Instance Type | Task node EC2 instance type. You can specify any available EC2 instance type. Default is m4.4xlarge. |
Task Instance Count | Number of task EC2 instances to create in the cluster. Default is 2. |
Task Instance Maximum Spot Price | Maximum spot price for task nodes. Setting this property changes the purchasing option of the task instance group to Spot instead of On-demand. |
Task Auto-Scaling Policy | Optional. Auto-scaling policy for task instances. Type the policy JSON statement here, or provide a path to a file that contains a JSON statement. Format: file:\\<path_to_policy_config_file> |
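
The Core and Task Auto-Scaling Policy properties accept either a JSON statement or a path to a file that contains one. The sketch below builds a policy in Python and produces both forms. The policy shape follows the Amazon EMR automatic scaling policy structure (Constraints plus Rules); that this tool accepts exactly this shape is an assumption, and the rule name, thresholds, file name, and the helper `write_policy` are hypothetical.

```python
import json
import os
import tempfile

# Prefix copied from the documented format; "\\\\" is two literal backslashes.
FILE_PREFIX = "file:\\\\"

# Hypothetical policy: add one node when available YARN memory drops below 15%.
core_policy = {
    "Constraints": {"MinCapacity": 2, "MaxCapacity": 10},
    "Rules": [
        {
            "Name": "ScaleOutOnLowMemory",
            "Description": "Add a node when available YARN memory is low",
            "Action": {
                "SimpleScalingPolicyConfiguration": {
                    "AdjustmentType": "CHANGE_IN_CAPACITY",
                    "ScalingAdjustment": 1,
                    "CoolDown": 300,
                }
            },
            "Trigger": {
                "CloudWatchAlarmDefinition": {
                    "ComparisonOperator": "LESS_THAN",
                    "EvaluationPeriods": 1,
                    "MetricName": "YARNMemoryAvailablePercentage",
                    "Namespace": "AWS/ElasticMapReduce",
                    "Period": 300,
                    "Statistic": "AVERAGE",
                    "Threshold": 15.0,
                    "Unit": "PERCENT",
                }
            },
        }
    ],
}


def write_policy(policy, directory, filename="core_autoscaling_policy.json"):
    """Write the policy as JSON and return the path of the file created."""
    path = os.path.join(directory, filename)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(policy, handle, indent=2)
    return path


# Option 1: a single-line JSON statement to type into the property.
inline_value = json.dumps(core_policy)

# Option 2: a file reference in the documented file:\\<path> format.
policy_path = write_policy(core_policy, tempfile.mkdtemp())
file_value = FILE_PREFIX + policy_path

print(inline_value)
print(file_value)
```
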

Property | Description |
---|---|
Applications | Optional. Applications to install in addition to the default applications that AWS installs when it creates an EMR cluster. Select the additional applications from the drop-down list. This field is equivalent to the Software Configuration list in the AWS EMR cluster creation wizard. |
Tags | Optional. Tags to propagate to cluster EC2 instances. Tags assist in identifying EC2 instances. Format: TagName1=TagValue1,TagName2=TagValue2 |
Software Settings | Optional. Custom configurations to apply to the applications installed on the cluster. This field is equivalent to the Edit Software Settings field in the AWS cluster creation wizard. Use it to modify the software configuration on the cluster. Type the configuration JSON statement here, or provide a path to a file that contains a JSON statement. Format: file:\\<path_to_custom_config_file> |
Steps | Optional. Commands to run after cluster creation. For example, you can use this property to run Linux, HDFS, or Hive commands. This field is equivalent to the Add Steps field in the AWS cluster creation wizard. Type the command statement here, or provide a path to a file that contains a JSON statement. Format: file:\\<path_to_command_file> |
Bootstrap Actions | Optional. Actions to perform after EC2 instances are running, and before applications are installed. Type the JSON statement here, or provide a path to a file that contains a JSON statement. Format: file:\\<path_to_policy_config_file> |
Task Recovery Strategy | Choose one of the available options. Default is Restart task. |
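
The Tags and Software Settings values in the table above can be generated rather than typed by hand. This is a sketch under stated assumptions: `format_tags` is a hypothetical helper that produces the documented TagName1=TagValue1,TagName2=TagValue2 string, and the Software Settings example uses the Classification/Properties list that the AWS EMR wizard's software settings field accepts. The tag names and the Hive property chosen are illustrative only.

```python
import json


def format_tags(tags):
    """Join a dict into the TagName1=TagValue1,TagName2=TagValue2 format.

    Rejects names or values containing '=' or ',' because the documented
    format shows no way to escape them (an assumption, not a stated rule).
    """
    parts = []
    for name, value in tags.items():
        text = str(value)
        if any(ch in name or ch in text for ch in "=,"):
            raise ValueError("tag %r=%r contains '=' or ','" % (name, text))
        parts.append("%s=%s" % (name, text))
    return ",".join(parts)


# Hypothetical tags for the cluster EC2 instances.
tags_value = format_tags({"Project": "demo", "Owner": "data-eng"})

# Hypothetical software settings: one Hive configuration override.
software_settings = [
    {
        "Classification": "hive-site",
        "Properties": {"hive.exec.parallel": "true"},
    }
]
software_settings_value = json.dumps(software_settings)

print(tags_value)
print(software_settings_value)
```
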

Property | Description |
---|---|
Cluster Name | Name of the cluster to create. |
Azure Cluster Type | Type of the cluster to be created. Choose one of the options in the drop-down list. Default is Hadoop. |
HDInsight version | HDInsight version to run on the cluster. Enter the HDInsight version tag string to designate the version. Default is the latest version supported. |
Azure Cluster Location | Use the drop-down list to choose the location in which to create the cluster. |
Head Node VM Size | Size of the head node instance to create. Default is Standard_D12_v2. |
Number of Worker Node Instances | Number of worker node instances to create in the cluster. Default is 2. |
Worker Node VM Size | Size of the worker node instance to create. Default is Standard_D13_v2. |
Default Storage Type | Primary storage type to be used for the cluster. Choose one of the available options. Default is BLOB storage. |
Default Storage Container or Root Mount Path | Default container for data. Type the path to the storage container or root mount. |
Log Location | Optional. Path to the directory to store workflow event logs. Default is /app-logs. |
Attach External Hive Metastore | Select this option to attach an external Hive metastore to the cluster. The workflow attaches the metastore only if you configured one in the cloud provisioning configuration. |
Bootstrap JSON String | JSON statement to run during cluster creation. You can use this statement to configure cluster details. For example, you could designate a Hadoop connection for the cluster, add tags to cluster resources, or run script actions. Choose one of the following methods to populate the property: type the JSON statement in the following format: { "core-site" : { "<sample_property_key1>": "<sample_property_val1>", "<sample_property_key2>": "<sample_property_val2>" }, "tags": { "<tag_key>": "<tag_val>" }, "scriptActions": [ { "name": "setenvironmentvariable", "uri": "scriptActionUri", "parameters": "headnode" } ] } or provide a path to a file that contains the JSON statement, in the following format: file://<path_to_bootstrap_file> |
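
The Bootstrap JSON String sample above has three sections: core-site, tags, and scriptActions. The following sketch assembles such a statement in Python and runs a basic shape check before it is pasted into the property or saved to a file. The helper `validate_bootstrap`, the set of expected script action keys (taken from the sample, not from a published schema), and all concrete values, including the script URI, are hypothetical.

```python
import json

# Hypothetical values that fill in the placeholders of the documented sample.
bootstrap = {
    "core-site": {"fs.trash.interval": "360"},
    "tags": {"department": "finance"},
    "scriptActions": [
        {
            "name": "setenvironmentvariable",
            "uri": "https://example.com/scripts/set-env.sh",
            "parameters": "headnode",
        }
    ],
}

# Keys that each script action carries in the sample (assumed to be expected).
ACTION_KEYS = {"name", "uri", "parameters"}


def validate_bootstrap(doc):
    """Return a list of shape problems; an empty list means none were found."""
    problems = []
    for section in ("core-site", "tags"):
        if not isinstance(doc.get(section, {}), dict):
            problems.append("%s must be a JSON object" % section)
    actions = doc.get("scriptActions", [])
    if not isinstance(actions, list):
        return problems + ["scriptActions must be a JSON array"]
    for index, action in enumerate(actions):
        missing = ACTION_KEYS - set(action)
        if missing:
            problems.append(
                "scriptActions[%d] is missing %s" % (index, sorted(missing))
            )
    return problems


problems = validate_bootstrap(bootstrap)
bootstrap_json = json.dumps(bootstrap)  # single-line statement for the property

print(problems)
print(bootstrap_json)
```

To use the file method instead, the same string can be written to a file and referenced in the file://<path_to_bootstrap_file> format.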