
Secure Agent services

Secure Agent services are pluggable microservices that the Secure Agent uses for data processing. Each Secure Agent service runs independently of the other services that run on the agent.
The independent services architecture provides the following benefits:
The services that run on a Secure Agent vary based on your licenses and the Informatica Intelligent Cloud Services that your organization uses. For Data Ingestion and Replication, the following Secure Agent services are available:
  • - Database Ingestion service
  • - Mass Ingestion (Files)
  • - CMI Streaming Agent
Each service has a unique set of configuration properties. You might need to configure a service or change the service properties to optimize performance or if you are instructed to do so by Informatica Global Customer Support.

Database Ingestion service

Both Application Ingestion and Replication and Database Ingestion and Replication use the Database Ingestion agent service to run jobs.
After you download the Secure Agent to your runtime environment and enable the Database Ingestion service, the Database Ingestion packages are pushed to the on-premises system where the Secure Agent runs. You can then optionally configure properties for the Database Ingestion service that runs on the Secure Agent.

Database Ingestion service properties

To change or optimize the behavior of the Database Ingestion service that your Secure Agent group uses, you can configure Database Ingestion agent configuration properties for your runtime environment.
To configure the properties, open a Secure Agent in your runtime environment and click Edit. Under System Configuration Details or Custom Configuration Details, select Database Ingestion as the service and DBMI_AGENT_CONFIG as the type.
The following table describes the Database Ingestion agent service properties:
Property
Description
maxTaskUnits
The maximum number of application ingestion and replication task units and database ingestion and replication task units that can run concurrently on an on-premises machine where the Secure Agent is running.
Task units are not related to the capacity and availability of your hardware or software. You can configure maxTaskUnits to precisely control CPU usage. Valid values are 1 to 2000000000 (2 billion).
To calculate a reasonable number of task units for your Secure Agent machine, Informatica recommends that you divide the number of cores by 3 or 4. For example, if you have an 8-core machine, you could set this property to 2. Then monitor CPU usage and adjust the property value as needed to tune performance.
During initial load processing, this property determines the number of tables that can be unloaded simultaneously. Remaining tables are queued and start unload processing when resources become available.
Note: A single job can process many tables. The total number of tables that can be processed is limited only by available memory. On average, an initial load task requires 25 MB of RAM per table, based on a 1 KB row size.
During incremental load processing, this property determines the number of application ingestion and replication and database ingestion and replication jobs that can run simultaneously.
Setting this property to a value greater than the number of cores on the Secure Agent machine can increase parallelism for task execution but can also cause performance bottlenecks at task execution time.
serviceLogRetentionPeriod
The number of days to retain each internal Database Ingestion service log file after the last update is written to the file. When this retention period elapses, the log file is deleted. The default value is 7 days.
Service logs are retained on the Secure Agent host where they are created: <infaagent>/apps/Database_Ingestion/logs.
Note: This property is applicable to both Application Ingestion and Replication and Database Ingestion and Replication.
taskLogRetentionPeriod
The number of days to retain each job log file after the last update is written to the file. When this retention period elapses, the log file is deleted. The default value is 7 days.
ociPath
For Oracle sources and targets, the path to the Oracle Call Interface (OCI) directory that contains the oci.dll or libclntsh.so file. By default, Oracle uses $ORACLE_HOME/lib on Linux or %ORACLE_HOME%\bin on Windows. Database ingestion CDC tasks use the OCI library to connect to Oracle.
For a DBMI agent that is running, this value is appended to the PATH environment variable value on Windows or to the LD_LIBRARY_PATH environment variable value on Linux. This property is not required if you already included the OCI path in the PATH or LD_LIBRARY_PATH environment variable.
Note: This property is applicable only to Database Ingestion and Replication.
serviceUrl
The URL that the Database Ingestion service uses to connect to the Informatica Intelligent Cloud Services cloud.
Note: This property is applicable to both Application Ingestion and Replication and Database Ingestion and Replication.
logLevel
The level of detail to include in the logs that the Database Ingestion service produces. Options are:
  • - TRACE
  • - DEBUG
  • - INFO
  • - WARN
  • - ERROR
The default value is TRACE.
Note: This property is applicable to both Application Ingestion and Replication and Database Ingestion and Replication.
taskExecutionHeapSize
The maximum heap size, in gigabytes, for the Task Execution service. This value, together with the maxTaskUnits property, affects the number of concurrent application ingestion and replication and database ingestion and replication tasks that can run on a Secure Agent. To run more tasks concurrently, try increasing the heap size. Enter the value followed by g for gigabytes, for example, 9g. The default value is 8g.
Note: This property is applicable to both Application Ingestion and Replication and Database Ingestion and Replication.
useProxy
Set this property to true to enable the DBMI Agent to go through a proxy when connecting to or writing data to targets. The DBMI Agent then uses the proxy settings from the Secure Agent proxy configuration. By default, proxy settings are not used.
Note: This property is applicable to both Application Ingestion and Replication and Database Ingestion and Replication.
intermediateStorageDirectory
For incremental load and combined initial and incremental load jobs, the local root directory under which intermediate files that contain data are stored when the Enable Persistent Storage option is selected in the associated task definitions.
Note: This property is applicable only to Database Ingestion and Replication.
storageBackupDirectory
For incremental load and combined initial and incremental load jobs, the path to the directory that stores backup files when the Enable Persistent Storage option is selected in the associated task definitions.
Note: This property is applicable only to Database Ingestion and Replication.
storageProperties
For incremental load and combined initial and incremental load jobs, a comma-separated list of key=value pairs that is used when the Enable Persistent Storage option is selected in the associated task definitions. Specify this property only at the direction of Informatica Global Customer Support.
Note: This property is applicable only to Database Ingestion and Replication.
task_container.jvm.allowExceptionForInvalidEncodedData
If you receive transliteration errors that report invalid encoding to UTF-8, and you do not want to repair or correct the source data, set this property to false so that database ingestion and replication jobs do not fail when trying to unload the data from the source. With this setting, the Database Ingestion service passes an equivalent Java property to the DataDirect JDBC driver to prevent the exception from occurring. After you set this property, you must restart the Database Ingestion service.
Note: This property is applicable only to Database Ingestion and Replication.
supportedLoadTypes
For application ingestion and replication jobs and database ingestion and replication jobs, the load types that the Database Ingestion agent service can process. You can enter one or more of the following values, separated by a comma (,):
  • - INITIAL. Initial load jobs or the initial load phase of combined initial and incremental load jobs.
  • - INCREMENTAL. Incremental load jobs or the incremental phase of combined initial and incremental load jobs, which write to your target.
  • - INCREMENTAL_STAGING. CDC staging tasks of incremental load or combined load jobs.
Default is INITIAL,INCREMENTAL,INCREMENTAL_STAGING, which indicates all load types.
Note: If multiple Database Ingestion agents are configured to support the same load types, the jobs use the agent with the most available task units.
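The sizing guidelines in the table above (divide the core count by 3 or 4 for maxTaskUnits, budget about 25 MB of RAM per table for an initial load, list load types separated by commas) can be captured in a few helper functions. This is a minimal sketch, not an Informatica tool: the function names are hypothetical, clamping the result to at least 1 task unit on small machines is an assumption, and the 25 MB figure is only the average quoted above for 1 KB rows. You still enter the values in DBMI_AGENT_CONFIG yourself and then monitor CPU usage.

```python
# Hypothetical helpers that apply the Database Ingestion sizing guidelines.
# They do not read or write any Secure Agent configuration.

MB_PER_TABLE_INITIAL_LOAD = 25  # average per table for 1 KB rows, per the guide
MIN_TASK_UNITS = 1
MAX_TASK_UNITS = 2_000_000_000  # upper bound of the valid maxTaskUnits range

VALID_LOAD_TYPES = ("INITIAL", "INCREMENTAL", "INCREMENTAL_STAGING")


def suggest_max_task_units(cores: int, divisor: int = 4) -> int:
    """Starting value for maxTaskUnits: the core count divided by 3 or 4."""
    if divisor not in (3, 4):
        raise ValueError("the guideline divides the core count by 3 or 4")
    # Assumption: never suggest less than the minimum valid value of 1.
    return max(MIN_TASK_UNITS, min(cores // divisor, MAX_TASK_UNITS))


def estimate_initial_load_ram_mb(table_count: int) -> int:
    """Rough RAM, in MB, needed to unload table_count tables in an initial load."""
    return table_count * MB_PER_TABLE_INITIAL_LOAD


def parse_supported_load_types(value: str) -> list[str]:
    """Split a supportedLoadTypes value and reject unknown load types."""
    types = [part.strip() for part in value.split(",") if part.strip()]
    unknown = [t for t in types if t not in VALID_LOAD_TYPES]
    if unknown:
        raise ValueError(f"unknown load types: {unknown}")
    return types


if __name__ == "__main__":
    print(suggest_max_task_units(8))  # 8 cores // 4 -> 2
    print(estimate_initial_load_ram_mb(200))  # 200 tables * 25 MB -> 5000
    print(parse_supported_load_types("INITIAL,INCREMENTAL,INCREMENTAL_STAGING"))
```

Treat the suggested task-unit count as a starting point only; the guide's advice is to adjust it after observing CPU usage on the agent machine.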

Database Ingestion Agent environment variables

To change or optimize the behavior of the Database Ingestion agent service, you can define the following environment variables.
To configure environment variables, open a Secure Agent in your runtime environment and click Edit. Under System Configuration Details or Custom Configuration Details, select Database Ingestion as the service and DBMI_AGENT_ENV as the type.
Environment Variable
Description
DBMI_REPLACE_UNSUPPORTED_CHARS
For Microsoft Azure Synapse Analytics targets, controls whether an application ingestion and replication job or database ingestion and replication job replaces characters in character data that the target cannot process correctly. To enable character replacement, set this environment variable to true.
DBMI_REPLACE_UNSUPPORTED_CHARS=true
Application Ingestion and Replication or Database Ingestion and Replication then uses the character that is specified in the DBMI_UNSUPPORTED_CHARS_REPLACEMENT environment variable to replace unsupported characters.
DBMI_UNSUPPORTED_CHARS_REPLACEMENT
If the DBMI_REPLACE_UNSUPPORTED_CHARS environment variable is set to true, specifies the character that replaces the characters in source data that a Microsoft Azure Synapse Analytics target cannot process correctly.
Default value: ? (question mark)
Note: Define this environment variable only for Database Ingestion and Replication.
DBMI_WRITER_CONN_POOL_SIZE
Indicates the number of connections that an application ingestion and replication job or database ingestion and replication job uses to propagate the change data to the target. The default value is 8. Valid values are 4 through 8.
DBMI_WRITER_RETRIES_MAX_COUNT
If a network issue occurs while a database ingestion and replication job is loading source data to an Amazon S3 or Microsoft Azure Data Lake Storage Gen2 target, indicates the maximum number of times that the job retries a request to continue the initial load or incremental load. If all of the retries fail, the job fails.
The default value is 5.
DBMI_WRITER_RETRIES_INTERVAL_IN_MILLIS
Specifies the time interval, in milliseconds, that a database ingestion and replication job waits before retrying the request to continue the initial load or incremental load to an Amazon S3 or Microsoft Azure Data Lake Storage Gen2 target if a network issue occurs.
The default value is 1000.
Note: After you define or change an environment variable, restart the Database Ingestion Agent for the changes to take effect.
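Before you restart the agent, it can help to check the DBMI_AGENT_ENV values against the defaults and ranges in the table above. The following is a minimal sketch under stated assumptions: the class and function names are hypothetical, the agent does its own parsing, and the worst-case wait assumes a fixed interval before every retry with no backoff, which this guide does not state.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class WriterSettings:
    conn_pool_size: int
    retries_max_count: int
    retry_interval_ms: int
    replace_unsupported_chars: bool
    replacement_char: str

    def worst_case_retry_wait_ms(self) -> int:
        # Assumption: a fixed interval before every retry, with no backoff.
        return self.retries_max_count * self.retry_interval_ms


def read_writer_settings(env: Mapping[str, str] = os.environ) -> WriterSettings:
    """Read the DBMI_* variables, falling back to the documented defaults."""
    pool = int(env.get("DBMI_WRITER_CONN_POOL_SIZE", "8"))
    if not 4 <= pool <= 8:
        raise ValueError("DBMI_WRITER_CONN_POOL_SIZE must be 4 through 8")
    return WriterSettings(
        conn_pool_size=pool,
        retries_max_count=int(env.get("DBMI_WRITER_RETRIES_MAX_COUNT", "5")),
        retry_interval_ms=int(env.get("DBMI_WRITER_RETRIES_INTERVAL_IN_MILLIS", "1000")),
        # Replacement is enabled only when the variable is set to true.
        replace_unsupported_chars=env.get("DBMI_REPLACE_UNSUPPORTED_CHARS") == "true",
        replacement_char=env.get("DBMI_UNSUPPORTED_CHARS_REPLACEMENT", "?"),
    )


if __name__ == "__main__":
    settings = read_writer_settings({"DBMI_WRITER_RETRIES_MAX_COUNT": "3"})
    print(settings.worst_case_retry_wait_ms())  # 3 retries * 1000 ms -> 3000
```

Passing a plain dictionary instead of os.environ lets you check a planned configuration without changing the environment of the current process.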

Mass Ingestion (Files)

To change or optimize the behavior of File Ingestion and Replication that your Secure Agent group uses, configure Mass Ingestion agent service properties for your runtime environment in Administrator.
You can configure the following properties:
Type
Name
Description
AGENT_RUNTIME_SETTINGS
file-listener-snapshot-dir
The directory where snapshots of new file listener components are added. You can specify either of the following paths:
  • - A path relative to the MassIngestionRuntime directory. For example, ../data/monitor.
  • - The absolute path. For example, <Secure Agent installation directory>/apps/MassIngestionRuntime/data/monitor, where <Secure Agent installation directory> is the directory where the Secure Agent is installed.
Note: When a Secure Agent group contains multiple Secure Agents, use a snapshot directory that is shared by all agents in the group.
AGENT_RUNTIME_SETTINGS
mi-task-workspace-dir
A directory in the agent that file ingestion and replication tasks use as an intermediate staging area when transferring files to a target. The directory is a custom location in the agent. The path can be a shared location, mounted location, or a location apart from the default location in the agent.
AGENT_RUNTIME_SETTINGS
mi-task-project-dir
A directory where the file ingestion and replication task stores the project files. The directory is a custom location in the agent. The path can be a shared location, mounted location, or a location apart from the default location in the agent.
AGENT_RUNTIME_SETTINGS
mi-task-logs-dir
A directory where the file ingestion and replication task stores the task log files. The directory is a custom location in the agent. The path can be a shared location, mounted location, or a location apart from the default location in the agent.
AGENT_RUNTIME_SETTINGS
mi-task-quarantine-dir
A directory where the file ingestion and replication task stores the infected files detected when you run a virus scan. The directory is a custom location in the agent. The path can be a shared location, mounted location, or a location apart from the default location in the agent.
For example, userdata\quarantine
Note: To automatically clean up the quarantine directory, set the agent property for the quarantine location to a system temporary files location such as /tmp/informatica/fmi/quarantine.
AGENT_RUNTIME_SETTINGS
agent-dedup-repository
By default, information about skipped duplicate files is saved in Informatica Intelligent Cloud Services (IICS). To save the skipped duplicate files information in the Secure Agent instead, set the property to true.
Default is false.
For more information about saving the skipped duplicate information, see the File Ingestion and Replication guide.
AGENT_RUNTIME_SETTINGS
mi-dedup-snapshot-dir
Enter the path to store the information about skipped duplicate files in the Secure Agent.
Applies only when the agent-dedup-repository property is set to true.
AGENT_RUNTIME_SETTINGS
file-listener-max-pool-size
The maximum number of threads to execute the file listener.
Default is 20.
AGENT_RUNTIME_SETTINGS
file-listener-core-pool-size
The initial or minimum number of threads for the file listener.
Default is 20.
AGENT_RUNTIME_SETTINGS
fmi-task-max-pool-size
The maximum number of threads to execute the file ingestion and replication task.
Default is 50.
AGENT_RUNTIME_SETTINGS
fmi-task-core-pool-size
The initial or minimum number of threads.
Default is 20.
AGENT_RUNTIME_SETTINGS
ftp-receive-socket-buffer-size
The buffer size for FTP inbound packets.
Default is 16 bytes.
AGENT_RUNTIME_SETTINGS
ftp-send-socket-buffer-size
The buffer size for FTP outbound packets.
Default is 16 bytes.
AGENT_RUNTIME_SETTINGS
http-client-timeout
The timeout duration in seconds for Agent requests to Informatica Intelligent Cloud Services.
Default is 30 seconds.
PGP_SETTINGS
public-keyring-path
The location of the public key ring file. You can specify either of the following paths:
  • - A path relative to the directory where Data Ingestion and Replication is installed. For example, ../data/pubring.pkr, where pubring.pkr is the name of the file that stores the public key ring.
  • - The absolute path. For example, <Secure Agent installation directory>/apps/MassIngestionRuntime/data/pubring.pkr, where pubring.pkr is the name of the file that stores the public key ring and <Secure Agent installation directory> is the directory where the Secure Agent is installed.
PGP_SETTINGS
secret-keyring-path
The location of the secret key ring file. You can specify either of the following paths:
  • - A path relative to the directory where Data Ingestion and Replication is installed. For example, ../data/secring.pkr, where secring.pkr is the name of the file that stores the secret key ring.
  • - The absolute path. For example, <Secure Agent installation directory>/apps/MassIngestionRuntime/data/secring.pkr, where secring.pkr is the name of the file that stores the secret key ring and <Secure Agent installation directory> is the directory where the Secure Agent is installed.
JVM_SETTINGS
app-heap-size
The minimum and maximum heap sizes of the File Ingestion and Replication application.
Default is -Xms256m -Xmx2048m.
JVM_SETTINGS
lcm-heap-size
The minimum and maximum heap sizes of life-cycle management scripts.
Default is -Xms32m -Xmx128m.
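The app-heap-size and lcm-heap-size values are standard JVM -Xms (minimum heap) and -Xmx (maximum heap) options. A minimal sketch for checking such a value before you save it, assuming every size carries a k, m, or g suffix (the JVM also accepts plain byte counts, which this sketch does not handle); the function name is hypothetical:

```python
import re

# Megabytes per size suffix accepted by this sketch.
_MB_PER_UNIT = {"k": 1 / 1024, "m": 1, "g": 1024}
# Matches -Xms<size><unit> or -Xmx<size><unit>; JVM flags are case-sensitive,
# but the unit suffix may be upper or lower case.
_HEAP_FLAG = re.compile(r"-Xm([sx])(\d+)([kKmMgG])")


def parse_heap_mb(setting: str) -> tuple[float, float]:
    """Return (minimum, maximum) heap in MB from a value such as '-Xms256m -Xmx2048m'."""
    sizes = {}
    for flag, amount, unit in _HEAP_FLAG.findall(setting):
        sizes[flag] = int(amount) * _MB_PER_UNIT[unit.lower()]
    if "s" not in sizes or "x" not in sizes:
        raise ValueError("expected both -Xms and -Xmx")
    if sizes["s"] > sizes["x"]:
        raise ValueError("minimum heap (-Xms) is larger than maximum heap (-Xmx)")
    return sizes["s"], sizes["x"]


if __name__ == "__main__":
    print(parse_heap_mb("-Xms256m -Xmx2048m"))  # default app-heap-size
    print(parse_heap_mb("-Xms32m -Xmx128m"))  # default lcm-heap-size
```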
You can configure the following properties in the Custom Configuration Details area when you edit a Secure Agent:
Type
Name
Description
AGENT_RUNTIME_SETTINGS
ComplexFileDisableWriteChecksum
Set the value to True to ignore the CRC file, so that jobs that use Hadoop Files V2 as the source and Snowflake Cloud Data Warehouse V2 as the target run successfully.

Guidelines for specifying the folder path

A folder path can be a shared location, mounted location, or a location apart from the default location in the Secure Agent.
The following table describes how to use slashes in the source folder path:
Source
Folder Path
Windows
<folder path>
For example, C:\temp
Linux
/<folder path>/
For example, /root/path
Windows shared location
<folder path> with each backslash (\) doubled
For example, the path \\INV12B2B01\Shared\path is specified as \\\\INV12B2B01\\Shared\\path
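Only the Windows shared location row calls for extra slashes; a local Windows path such as C:\temp is entered as-is. If you generate shared folder paths in a script, doubling every backslash produces the form shown in the table. A minimal sketch with a hypothetical function name:

```python
def escape_windows_share_path(path: str) -> str:
    """Double every backslash in a Windows shared (UNC) folder path."""
    return path.replace("\\", "\\\\")


if __name__ == "__main__":
    print(escape_windows_share_path(r"\\INV12B2B01\Shared\path"))
    # prints \\\\INV12B2B01\\Shared\\path
```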

CMI Streaming Agent

Use the CMI Streaming Agent to define and deploy streaming ingestion and replication tasks. You configure streaming ingestion and replication tasks in the Data Ingestion and Replication service.
A CMI Streaming Agent runs on an on-premises system and works in conjunction with Streaming Ingestion and Replication. On the on-premises system, the CMI Streaming Agent runs the jobs that Streaming Ingestion and Replication deploys and provides status and statistics updates for each job.
On Linux, the CMI Streaming Agent does not start if the agent installation directory name contains a space. The agent returns a connection timeout status. After a few restart attempts, the agent goes into the error state.

CMI Streaming Agent properties

To change or optimize the behavior of the CMI Streaming Agent, configure agent properties for your runtime environment. Configure CMI Streaming Agent properties in the System Configuration Details area when you edit a Secure Agent.
You can configure Engine, Agent, and Script properties of a CMI Streaming Agent.
The following image shows some of the CMI Streaming Agent properties:
[Image: Some of the configurable CMI Streaming Agent properties]
You can configure the following CMI Streaming Agent properties:
Type
Property Name
Description
Engine
MaxLogFileSize
The maximum size of the log file that the engine can create.
Default is 5 MB.
Engine
LogLevel
The log level for the engine.
Agent
DataflowPullInterval
The time interval after which the agent checks for updates in the task.
Default is 60 seconds.
Agent
JVM
List of JVM properties for the agent. For example: [-Xms256M -Xmx256M]
Agent
LogLevel
The log level for the agent.
Agent
MaxLogFileSize
Maximum size of the log files that an agent can create.
Default is 10 MB.
Agent
MaxNumberOfBackups
Maximum number of backup log files for the agent.
Default is 5.
Scripts
LogLevel
The log level of the scripts.
Scripts
MaxFileSize
The maximum file size after which the log rolls over and creates a new file.
Default is 10 MB.
Scripts
MaxBackupIndex
Maximum number of backup files maintained after rolling over.
Default is 5.
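The log size and backup count properties together bound how much disk space the rolling logs can use. A minimal sketch of that arithmetic, assuming one active log file plus the configured number of backups, each no larger than the maximum file size (this guide does not describe the rollover mechanics in more detail, and the engine's backup count is not listed, so it is left out). The function and constant names are hypothetical:

```python
def max_log_disk_mb(max_file_size_mb: int, max_backups: int) -> int:
    """Upper bound on disk used by one rolling log: the active file plus its backups."""
    # Assumption: each rolled-over backup is at most max_file_size_mb in size.
    return max_file_size_mb * (max_backups + 1)


# Defaults from the table above.
AGENT_LOG_DEFAULTS = {"MaxLogFileSize": 10, "MaxNumberOfBackups": 5}
SCRIPT_LOG_DEFAULTS = {"MaxFileSize": 10, "MaxBackupIndex": 5}

if __name__ == "__main__":
    agent = max_log_disk_mb(
        AGENT_LOG_DEFAULTS["MaxLogFileSize"], AGENT_LOG_DEFAULTS["MaxNumberOfBackups"]
    )
    scripts = max_log_disk_mb(
        SCRIPT_LOG_DEFAULTS["MaxFileSize"], SCRIPT_LOG_DEFAULTS["MaxBackupIndex"]
    )
    print(agent, scripts)  # 60 60
```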

Streaming Agent offline mode

You can run and monitor a streaming ingestion and replication job when the CMI Streaming Agent is offline or not connected to the internet.
The Streaming Agent supports both online and offline modes of communication. In offline mode, a streaming ingestion and replication job continues to run even if the Streaming Agent does not communicate with Informatica Intelligent Cloud Services for an extended period of time. The Streaming Agent continues to monitor the health and statistics of the jobs locally. When the Streaming Agent comes back online and connects to the cloud services, it applies any configuration changes for the agent and jobs and sends the health and statistics updates to the services.
To switch between the offline and online modes, use the command line utility that the Streaming Ingestion and Replication service provides. To start the command line utility, run one of the following commands:
On Windows: <Informatica_Secure_Agent>/apps/Streaming_Ingestion_Agent/<version>/runcli.bat
On UNIX: <Informatica_Secure_Agent>/apps/Streaming_Ingestion_Agent/<version>/runcli.sh
The command line utility uses the command prompt infa/stream :> and provides three groups of commands: agent application commands, built-in commands, and streaming ingestion task commands.
You can change the communication modes only through the command line utility. The Streaming Agent preserves the communication mode when the agent restarts.
The following table lists the commands of this command line utility:
Command
Description
Example
app-config
Shows the current configuration of the Streaming Agent application.
infa/stream :>app-config
deploy.pull.interval : 60
health.poll.interval : 30
minifi.ingester.file.location : ./conf
siagent.communication.mode : Online
siagent.monitoring.persist.dir : ../data
siagent.statistics.post.batchsize : 720
siagent.statistics.post.concurrency : 60
siagent.status.persist.dir : ../data
statistics.poll.interval : 30
app-setconfig
Use this command to configure the following properties:
  • - siagent.communication.mode. Use to configure offline or online communication mode.
  • - siagent.statistics.post.batchsize. Use to define the number of snapshots in a batch.
  • - siagent.statistics.post.concurrency. Use to define the number of worker threads to post statistics.
The --key and --value tokens are optional.
infa/stream :>app-setconfig --key siagent.statistics.post.batchsize --value 20
or
infa/stream :>app-setconfig siagent.statistics.post.batchsize 20
app-status
Shows the current status of the Streaming Agent.
The health status code and health error message indicate the status of the agent (service) as shown in Administrator.
uptime indicates the number of seconds that the Streaming Agent application has been available.
infa/stream :>app-status
health error message : No errors
health status code : RUNNING(0)
uptime : 67828
app-statistics
Shows metadata and status of overall statistics collection in the Streaming Agent.
  • - collection interval. Interval of statistics collection, in seconds.
  • - post interval. How often statistics are posted or a post is attempted.
  • - max batch size. Maximum number of snapshots posted in a single HTTP POST request.
  • - last batch size. Number of snapshots in the last HTTP POST request.
  • - last time collected. Timestamp when any statistics were last collected.
  • - last time posted. Timestamp when any statistics were last posted.
infa/stream :>app-statistics
collection interval : 30
last batch size : 2
last time collected : 7/3/20 10:19:03 AM IST
last time posted : 7/3/20 10:18:53 AM IST
max batch size : 20
pending snapshots : 3
post interval : 30
clear
Clears the screen.
-
exit, quit
Quits the application.
-
help
Shows a summary of all the commands available.
infa/stream :>help
AVAILABLE COMMANDS
Agent Application Commands
app-config: Show agent application configuration
app-setmode: Set the communication mode [Online/Offline]
app-status: Show agent application status
Built-In Commands
clear: Clear the shell screen.
exit, quit: Exit the shell.
help: Display help about available commands.
Streaming Ingestion Task Commands
task-health: Show streaming ingestion task health
task-list: Show streaming ingestion task list
task-metadata: Show streaming ingestion task metadata
task-list
Shows the list of streaming ingestion and replication jobs currently deployed on the Streaming Agent.
infa/stream :>task-list
6e61e76f-2618-4292-ab3d-dd181f47ee91
ad5053c7-5ac2-493f-8cbb-a24900b61f71
task-health
Shows the health status of all streaming ingestion and replication jobs in the Streaming Agent.
Use the options --name or --id to specify a job.
If neither option is specified, all jobs are listed.
infa/stream :>task-health --name aby_df4
processors : [{"id":"14a7a095-7fac-4fc3-ac5c-705369132516","status":"ERROR"},{"id":"821e6730-3aed-4d3f-b875-45f424b6b963","status":"RUNNING"}]
status : ERROR
timestamp : Sat May 09 06:04:08 IST 2020
infa/stream :>task-health
6e61e76f-2618-4292-ab3d-dd181f47ee91
processors : [{"id":"2a0b8715-aa7a-46c5-9d6a-6a356f5a0102","status":"ERROR"},{"id":"1172f3a8-35dd-41ef-be4b-bc0cf37e3794","status":"RUNNING"}]
status : ERROR
timestamp : Sat May 09 06:04:08 IST 2020
ad5053c7-5ac2-493f-8cbb-a24900b61f71
processors : [{"id":"14a7a095-7fac-4fc3-ac5c-705369132516","status":"ERROR"},{"id":"821e6730-3aed-4d3f-b875-45f424b6b963","status":"RUNNING"}]
status : ERROR
timestamp : Sat May 09 06:04:08 IST 2020
task-metadata
Shows the metadata of all streaming ingestion and replication jobs in the Streaming Agent.
Use the options --name or --id to specify a job.
If neither option is specified, all jobs are listed.
infa/stream :>task-metadata --name aby_df4
id : ad5053c7-5ac2-493f-8cbb-a24900b61f71
name : aby_df4
runId : 9071
version : 1
infa/stream :>task-metadata
6e61e76f-2618-4292-ab3d-dd181f47ee91
id : 6e61e76f-2618-4292-ab3d-dd181f47ee91
name : aby_df2
runId : 9069
version : 8
ad5053c7-5ac2-493f-8cbb-a24900b61f71
id : ad5053c7-5ac2-493f-8cbb-a24900b61f71
name : aby_df4
runId : 9071
version : 1
task-statistics
Shows the statistics details of all streaming ingestion and replication jobs in the Streaming Agent.
Use the options --name or --id to specify a job.
If neither option is specified, all jobs are listed.
infa/stream :>task-statistics --name aby_df1
dataflow name : aby_df1
last time collected : 1590861803731
last time posted : 1590861806091
infa/stream :>task-statistics
7b7d3c09-df43-482f-b6c8-8dd80187e6d7
dataflow name : aby_df2
last time collected : 1590861770731
last time posted : 1590861741132
decfad0a-20df-4226-84f9-1ff1ab6ef96a
dataflow name : aby_df1
last time collected : 1590861768730
last time posted : 1590861771054
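The utility prints most results as key : value lines, so captured output is easy to post-process, for example to feed a local health check while the agent is offline. This is a minimal sketch under stated assumptions: the function names are hypothetical, and treating the task-statistics timestamps as milliseconds since the Unix epoch is an inference from the sample values (1590861803731 falls on 30 May 2020 in UTC), not something this guide states.

```python
from datetime import datetime, timezone


def parse_cli_output(text: str) -> dict[str, str]:
    """Collect 'key : value' lines from commands such as app-status or app-config."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:  # skips prompt lines such as 'infa/stream :>app-status'
            fields[key.strip()] = value.strip()
    return fields


def epoch_millis_to_utc(value: str) -> datetime:
    """Assumption: task-statistics timestamps are milliseconds since the Unix epoch."""
    return datetime.fromtimestamp(int(value) / 1000, tz=timezone.utc)


def format_uptime(seconds: str) -> str:
    """Render the uptime value (seconds) as hours, minutes, and seconds."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}h {minutes}m {secs}s"


if __name__ == "__main__":
    sample = """infa/stream :>app-status
health error message : No errors
health status code : RUNNING(0)
uptime : 67828"""
    status = parse_cli_output(sample)
    print(status["health status code"])  # RUNNING(0)
    print(format_uptime(status["uptime"]))  # 18h 50m 28s
    print(epoch_millis_to_utc("1590861803731").isoformat())
```

For output that covers several jobs, such as task-health without --name, the same keys repeat for each job, so split the text on the job ID lines before parsing each part.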

Online mode to Offline mode

By default, the Streaming Agent is in online mode.
To change the Streaming Agent to offline mode:
  1. Launch the command line utility by running one of the following commands:
    On Windows:
    <Informatica_Secure_Agent>/apps/Streaming_Ingestion_Agent/<version>/runcli.bat
    On UNIX:
    <Informatica_Secure_Agent>/apps/Streaming_Ingestion_Agent/<version>/runcli.sh
  2. Set the Streaming Agent to offline mode with either of the following commands:
    app-setconfig --key siagent.communication.mode --value Offline
    or
    app-setconfig siagent.communication.mode Offline
    The Streaming Agent stops sending health updates and statistics for all streaming ingestion and replication jobs.

Offline mode to Online mode

To change the Streaming Agent to online mode:
  1. Launch the command line utility by running one of the following commands:
    On Windows:
    <Informatica_Secure_Agent>/apps/Streaming_Ingestion_Agent/<version>/runcli.bat
    On UNIX:
    <Informatica_Secure_Agent>/apps/Streaming_Ingestion_Agent/<version>/runcli.sh
  2. Set the Streaming Agent to online mode with either of the following commands:
    app-setconfig --key siagent.communication.mode --value Online
    or
    app-setconfig siagent.communication.mode Online
    The Streaming Agent starts sending health updates for all streaming ingestion and replication jobs, and the updates appear on the Monitoring page. It also sends the service statistics for all streaming ingestion and replication jobs, including the statistics backlog collected while the agent was offline. Finally, it synchronizes updates to existing streaming ingestion and replication jobs and adds any new jobs that were deployed while the agent was offline.
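Both procedures above come down to the same two pieces: the launcher script for the operating system and the app-setconfig command for the target mode. If you keep runbooks or maintenance scripts, the following minimal sketch builds those strings. It does not start the utility or send input to it. The agent_dir and version arguments stand in for the <Informatica_Secure_Agent> and <version> placeholders, the function names are hypothetical, and accepting only the exact spellings Online and Offline is an assumption based on the examples in this guide.

```python
import posixpath

VALID_MODES = ("Online", "Offline")
MODE_KEY = "siagent.communication.mode"


def runcli_path(agent_dir: str, version: str, windows: bool) -> str:
    """Path of the command line utility launcher, using forward slashes as this guide does."""
    script = "runcli.bat" if windows else "runcli.sh"
    return posixpath.join(agent_dir, "apps", "Streaming_Ingestion_Agent", version, script)


def set_mode_command(mode: str, use_key_value_flags: bool = True) -> str:
    """Command to type at the infa/stream :> prompt to switch communication mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}")
    if use_key_value_flags:
        return f"app-setconfig --key {MODE_KEY} --value {mode}"
    # The --key and --value tokens are optional.
    return f"app-setconfig {MODE_KEY} {mode}"


if __name__ == "__main__":
    print(runcli_path("/opt/infaagent", "1.0", windows=False))
    print(set_mode_command("Offline"))
```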