Secure Agent services

Secure Agent services are pluggable microservices that the Secure Agent uses for data processing. Each Secure Agent service runs independently of the other services that run on the agent.

The independent services architecture provides the following benefits:

• The Secure Agent does not restart when you add a connector or package.
• Services are not impacted when another service restarts.
• Downtime during upgrades is minimized. The upgrade process installs a new version of the Secure Agent, updates connector packages, and applies configuration changes for the services. To minimize downtime, the old agent remains available and continues to run ingestion jobs during the upgrade. The new version of the Secure Agent runs jobs that start after the upgrade process completes.

The services that run on a Secure Agent vary based on your licenses and the Informatica Intelligent Cloud Services that your organization uses. For Data Ingestion and Replication, the following Secure Agent services are available:

• Database Ingestion - For running application ingestion and replication jobs and database ingestion and replication jobs
• CMI Streaming Agent - For running streaming ingestion and replication jobs
• Mass Ingestion - For running file ingestion and replication jobs

Each service has a unique set of configuration properties. You might need to configure a service or change the service properties to optimize performance or if you are instructed to do so by Informatica Global Customer Support.

Database Ingestion service

Both Application Ingestion and Replication and Database Ingestion and Replication use the Database Ingestion agent service to run jobs.

After you download the Secure Agent to your runtime environment and enable the Database Ingestion service, the Database Ingestion packages are pushed to the on-premises system where the Secure Agent runs. You can then optionally configure properties for the Database Ingestion service that runs on the Secure Agent.

Database Ingestion service properties

To change or optimize the behavior of the Database Ingestion service that your Secure Agent group uses, you can configure Database Ingestion agent configuration properties for your runtime environment.

To configure the properties, open a Secure Agent in your runtime environment and click Edit. Under System Configuration Details, select Database Ingestion as the service and DBMI_AGENT_CONFIG or LOCAL_TASK_CONFIG as the type.

The following table describes the Database Ingestion agent service properties for the DBMI_AGENT_CONFIG type:

Property	Description
maxTaskUnits	The maximum number of application ingestion and replication task units and database ingestion and replication task units that can run concurrently on an on-premises machine where the Secure Agent is running. Task units are not related to the capacity and availability of your hardware or software. You can configure maxTaskUnits to precisely control CPU usage. Valid values are 1 to 2000000000 (2 billion). To calculate a reasonable number of task units for your Secure Agent machine, Informatica recommends that you divide the number of cores by 3 or 4. For example, if you have an 8-core machine, you could set this property to 2. Then monitor CPU usage and adjust the property value as needed to tune performance. During initial load processing, this property determines the number of tables that can be unloaded simultaneously. Remaining tables are queued and start unload processing when resources become available. Note: A single job can process many tables. The total number of tables that can be processed is limited only by available memory. On average, 25 MB of RAM is required per table for an initial load task based on a 1 KB row size. During incremental load processing, this property determines the number of application ingestion and replication jobs and database ingestion and replication jobs that can run simultaneously. Setting this property to a value greater than the number of cores on the Secure Agent machine can increase parallelism for task execution but can also cause performance bottlenecks at task execution time.
serviceLogRetentionPeriod	The number of days to retain each internal Database Ingestion service log file after the last update is written to the file. When this retention period elapses, the log file is deleted. The default value is 7 days. Service logs are retained on the Secure Agent host where they are created: <infaagent>/apps/Database_Ingestion/logs. Note: This property is applicable to both Application Ingestion and Replication and Database Ingestion and Replication.
taskLogRetentionPeriod	The number of days to retain each job log file after the last update is written to the file. When this retention period elapses, the log file is deleted. The default value is 7 days.
ociPath	For Oracle sources and targets, the path to the Oracle Call Interface (OCI) directory that contains the oci.dll or libclntsh.so file. By default, Oracle uses $ORACLE_HOME/lib on Linux or %ORACLE_HOME%\bin on Windows. The OCI library is used by database ingestion CDC tasks to connect to Oracle. For a DBMI agent that is running, this value is appended to the PATH environment variable value on Windows or to the LD_LIBRARY_PATH environment variable value on Linux. This property is not required if you already included the OCI path in the PATH or LD_LIBRARY_PATH environment variable. Note: This property is applicable only to Database Ingestion and Replication.
serviceUrl	The URL that the Database Ingestion service uses to connect to the Informatica Intelligent Cloud Services cloud. Note: This property is applicable to both Application Ingestion and Replication and Database Ingestion and Replication.
logLevel	The level of detail to include in the logs that the Database Ingestion service produces. Options are: - TRACE - DEBUG - INFO - WARN - ERROR The default value is TRACE. Note: This property is applicable to both Application Ingestion and Replication and Database Ingestion and Replication.
taskExecutionHeapSize	The maximum heap size, in gigabytes, for the Task Execution service. This heap size is used for the single Java process that hosts all application ingestion and replication tasks and database ingestion and replication tasks that run in the common Container service. You might need to increase this value if you run a large number of tasks or process a large volume of data. Enter the property value followed by "g" for gigabytes. The default value is '8g'. Note: This property is applicable to both Application Ingestion and Replication and Database Ingestion and Replication.
useProxy	Set this property to true to enable the DBMI Agent to go through a proxy when connecting to or writing data to targets. The DBMI Agent then uses the proxy settings from the Secure Agent proxy configuration. By default, proxy settings are not used. Note: This property is applicable to both Application Ingestion and Replication and Database Ingestion and Replication.
intermediateStorageDirectory	For incremental load and combined initial and incremental load jobs, the local root directory under which intermediate files that contain data are stored when the Enable Persistent Storage option is selected in the associated task definitions. Note: This property is applicable only to Database Ingestion and Replication.
storageBackupDirectory	For incremental load and combined initial and incremental load jobs, the path to the directory that stores backup files when the Enable Persistent Storage option is selected in the associated task definitions. Note: This property is applicable only to Database Ingestion and Replication.
storageProperties	For incremental load and combined initial and incremental load jobs, a comma-separated list of key=value pairs that is used when the Enable Persistent Storage option is selected in the associated task definitions. Specify this property only at the direction of Informatica Global Customer Support. Note: This property is applicable only to Database Ingestion and Replication.
supportedLoadTypes	For application ingestion and replication jobs and database ingestion and replication jobs, the load types that the Database Ingestion agent service can process. You can enter one or more of the following values, separated by a comma (,): - INITIAL. Initial load jobs or the initial load phase of combined initial and incremental load jobs. - INCREMENTAL. Incremental load jobs or the incremental phase of combined initial and incremental load jobs, which write to your target. - INCREMENTAL_STAGING. CDC staging tasks of incremental load or combined load jobs. Default is INITIAL,INCREMENTAL,INCREMENTAL_STAGING, which indicates all load types. Notes: - If multiple Database Ingestion agent services are configured to support the same load types, the jobs use the agent with the most available task units. - If the Database Ingestion agent service where a combined load job is running does not support the initial load type, the initial load phase of the combined load job is directed to another agent in the Secure Agent group where the initial load type is enabled.
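
The sizing guidance for maxTaskUnits can be worked through as a small calculation. The following Python sketch is illustrative only: the function names are hypothetical, rounding down to a whole number (with a minimum of 1) is an assumption, and the 25 MB per table figure is the average quoted above for a 1 KB row size.

```python
def suggest_max_task_units(cores: int, divisor: int = 4) -> int:
    """Starting value for maxTaskUnits: core count divided by 3 or 4.

    Rounding down and the floor of 1 are assumptions of this sketch.
    """
    if divisor not in (3, 4):
        raise ValueError("the guideline divides the core count by 3 or 4")
    if cores < 1:
        raise ValueError("cores must be a positive integer")
    return max(1, cores // divisor)


def estimate_initial_load_ram_mb(table_count: int, mb_per_table: int = 25) -> int:
    """Rough RAM estimate for an initial load at about 25 MB per table."""
    return table_count * mb_per_table


print(suggest_max_task_units(8))          # 8 cores divided by 4
print(estimate_initial_load_ram_mb(200))  # 200 tables at 25 MB each
```

Treat the result as a starting point only, then monitor CPU usage and adjust the property value as described for maxTaskUnits.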

The following table describes the Database Ingestion agent service properties for the LOCAL_TASK_CONFIG type:

Property	Description
taskExecutionHeapSize	The maximum heap size, in gigabytes, available for running a single task, such as the CDC Staging Task, in its own Java Virtual Machine (JVM). You might need to increase this size if the task processes a large volume of data. Enter the property value followed by "g" for gigabytes. The default value is '2g'.
taskStartTimeoutSeconds	The number of seconds that must elapse before an attempt to start a task in its own JVM times out. The default value is 120.
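
Because taskExecutionHeapSize must be entered as a number followed by "g", it can be useful to check a value before you enter it. The following Python sketch is a hypothetical helper, not part of the product. It assumes that only a positive whole number with a lowercase "g" suffix is acceptable, which is the only form that this topic shows.

```python
import re

# A positive whole number followed by a lowercase "g", for example "2g" or "8g".
_HEAP_PATTERN = re.compile(r"^([1-9][0-9]*)g$")


def parse_heap_size_gb(value: str) -> int:
    """Return the number of gigabytes in a value such as '2g'."""
    match = _HEAP_PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"expected a number followed by 'g', got {value!r}")
    return int(match.group(1))


print(parse_heap_size_gb("2g"))  # LOCAL_TASK_CONFIG default
print(parse_heap_size_gb("8g"))  # DBMI_AGENT_CONFIG default
```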

Database Ingestion Agent environment variables

To change or optimize the behavior of the Database Ingestion agent service, you can define environment variables.

To configure environment variables, open a Secure Agent in your runtime environment and click Edit. Under System Configuration Details or Custom Configuration Details, select Database Ingestion as the service and DBMI_AGENT_ENV as the type.

Environment Variable	Description
DBMI_REPLACE_UNSUPPORTED_CHARS	For Microsoft Azure Synapse Analytics targets, controls whether an application ingestion and replication job or database ingestion and replication job replaces characters in character data that the target cannot process correctly. To enable character replacement, set this environment variable to true. DBMI_REPLACE_UNSUPPORTED_CHARS=true Application Ingestion and Replication or Database Ingestion and Replication then uses the character that is specified in the DBMI_UNSUPPORTED_CHARS_REPLACEMENT environment variable to replace unsupported characters.
DBMI_UNSUPPORTED_CHARS_REPLACEMENT	If the DBMI_REPLACE_UNSUPPORTED_CHARS environment variable is set to true, specifies the character that replaces the characters in source data that a Microsoft Azure Synapse Analytics target cannot process correctly. Default value: ? (question mark) Note: Define this environment variable only for Database Ingestion and Replication.
DBMI_WRITER_CONN_POOL_SIZE	Indicates the number of connections that an application ingestion and replication job or database ingestion and replication job uses to propagate the change data to the target. The default value is 8. Valid values are 4 through 8.
DBMI_WRITER_RETRIES_MAX_COUNT	If a network issue occurs while a database ingestion and replication job is loading source data to an Amazon S3 or Microsoft Azure Data Lake Storage Gen2 target, indicates the maximum number of times that the job retries a request to continue the initial load or incremental load. If all of the retries fail, the job fails. The default value is 5.
DBMI_WRITER_RETRIES_INTERVAL_IN_MILLIS	Specifies the time interval, in milliseconds, that a database ingestion and replication job waits before retrying the request to continue the initial load or incremental load to an Amazon S3 or Microsoft Azure Data Lake Storage Gen2 target if a network issue occurs. The default value is 1000.

Note: After you define or change an environment variable, restart the Database Ingestion Agent for the changes to take effect.
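
To illustrate how DBMI_WRITER_RETRIES_MAX_COUNT and DBMI_WRITER_RETRIES_INTERVAL_IN_MILLIS interact, the following Python sketch models a fixed-interval retry loop that uses the default values of 5 and 1000. It is a conceptual model only and not the product's implementation. The function name, the use of ConnectionError to represent a network issue, and the injectable sleep parameter are all assumptions made for the example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(
    request: Callable[[], T],
    max_retries: int = 5,          # models DBMI_WRITER_RETRIES_MAX_COUNT
    interval_millis: int = 1000,   # models DBMI_WRITER_RETRIES_INTERVAL_IN_MILLIS
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request(); after a network error, wait and retry up to max_retries times."""
    retries_used = 0
    while True:
        try:
            return request()
        except ConnectionError:
            if retries_used >= max_retries:
                raise  # all retries failed, so the job would fail
            retries_used += 1
            sleep(interval_millis / 1000.0)


# Demonstration: the first two attempts fail, the third succeeds.
calls = {"n": 0}


def flaky_write() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated network issue")
    return "written"


waits = []  # record the waits instead of actually sleeping
result = run_with_retries(flaky_write, sleep=waits.append)
print(result, calls["n"], waits)
```

With the defaults in this model, a request that never succeeds is attempted six times in total (one initial attempt plus five retries), with about five seconds of waiting overall.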

Mass Ingestion (Files)

To change or optimize the behavior of File Ingestion and Replication that your Secure Agent group uses, configure Mass Ingestion agent service properties for your runtime environment in Administrator.

You can configure the following properties:

Type	Name	Description
AGENT_RUNTIME_SETTINGS	file-listener-snapshot-dir	A directory where the snapshots of new file listener components are added. You can add the following directory paths: - A path relative to the MassIngestionRuntime directory. For example, ../data/monitor. - The absolute path. For example, <Secure agent installation directory>/apps/MassIngestionRuntime/data/monitor where Secure agent installation directory is the name of the directory where the Secure Agent is installed. Note: When a group contains multiple Secure Agents, use a snapshot directory that is shared with all agents in the group.
AGENT_RUNTIME_SETTINGS	mi-task-workspace-dir	A directory in the agent that file ingestion and replication tasks use as an intermediate staging area when transferring files to a target. The directory is a custom location in the agent. The path can be a shared location, mounted location, or a location apart from the default location in the agent.
AGENT_RUNTIME_SETTINGS	mi-task-project-dir	A directory where the file ingestion and replication task stores the project files. The directory is a custom location in the agent. The path can be a shared location, mounted location, or a location apart from the default location in the agent.
AGENT_RUNTIME_SETTINGS	mi-task-logs-dir	A directory where the file ingestion and replication task stores the task log files. The directory is a custom location in the agent. The path can be a shared location, mounted location, or a location apart from the default location in the agent.
AGENT_RUNTIME_SETTINGS	mi-task-quarantine-dir	A directory where the file ingestion and replication task stores the infected files detected when you run a virus scan. The directory is a custom location in the agent. The path can be a shared location, mounted location, or a location apart from the default location in the agent. For example, userdata\quarantine Note: To automatically clean up the quarantine directory, set the agent property for the quarantine location to a system temporary files location such as /tmp/informatica/fmi/quarantine.
AGENT_RUNTIME_SETTINGS	agent-dedup-repository	Controls where information about skipped duplicate files is saved. By default, this information is saved in Informatica Intelligent Cloud Services (IICS). To save it in the Secure Agent instead, set the property to true. Default is false. For more information about saving the skipped duplicate information, see the File Ingestion and Replication guide.
AGENT_RUNTIME_SETTINGS	mi-dedup-snapshot-dir	Enter the path to store the information about skipped duplicate files in the Secure Agent. Applies only when the agent-dedup-repository property is set to true.
AGENT_RUNTIME_SETTINGS	file-listener-max-pool-size	The maximum number of threads to execute the file listener. Default is 20.
AGENT_RUNTIME_SETTINGS	file-listener-core-pool-size	The total number of threads. Default is 20.
AGENT_RUNTIME_SETTINGS	fmi-task-max-pool-size	The maximum number of threads to execute the file ingestion and replication task. Default is 50.
AGENT_RUNTIME_SETTINGS	fmi-task-core-pool-size	The initial or minimum number of threads. Default is 20.
AGENT_RUNTIME_SETTINGS	ftp-receive-socket-buffer-size	The buffer size for FTP inbound packets. Default is 16 bytes.
AGENT_RUNTIME_SETTINGS	ftp-send-socket-buffer-size	The buffer size for FTP outbound packets. Default is 16 bytes.
AGENT_RUNTIME_SETTINGS	http-client-timeout	The timeout duration in seconds for Agent requests to Informatica Intelligent Cloud Services. Default is 30 seconds.
PGP_SETTINGS	public-keyring-path	The directory to store the public key ring. You can add the following directory paths: - A path relative to the directory where Data Ingestion and Replication is installed. For example, ../data/pubring.pkr where pubring.pkr is the name of the file where you store the public key ring. - The absolute path. For example, <Secure agent installation directory>/apps/MassIngestionRuntime/data/pubring.pkr where pubring.pkr is the name of the file where you store the public key ring and Secure agent installation directory is the name of the directory where the agent is installed.
PGP_SETTINGS	secret-keyring-path	The directory to store the secret key ring. You can add the following directory paths: - A path relative to the directory where Data Ingestion and Replication is installed. For example, ../data/secring.pkr where secring.pkr is the name of the file where you store the secret key ring. - The absolute path. For example, <Secure agent installation directory>/apps/MassIngestionRuntime/data/secring.pkr where secring.pkr is the name of the file where you store the secret key ring and Secure Agent installation directory is the name of the directory where the agent is installed.
JVM_SETTINGS	app-heap-size	The minimum and maximum heap sizes of the File Ingestion and Replication application. Default is -Xms256m -Xmx2048m.
JVM_SETTINGS	lcm-heap-size	The minimum and maximum heap sizes of life-cycle management scripts. Default is -Xms32m -Xmx128m.
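
The app-heap-size and lcm-heap-size values use the standard JVM -Xms (initial heap) and -Xmx (maximum heap) options. The following Python sketch shows one way to check such a value before you enter it. The helper name is hypothetical, and the rule that both options must be present with a k, m, or g suffix is an assumption of the sketch, not a documented product requirement.

```python
import re
from typing import Tuple

_UNIT_BYTES = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_heap_flags(setting: str) -> Tuple[int, int]:
    """Return (min_bytes, max_bytes) from a value such as '-Xms256m -Xmx2048m'."""
    sizes = {}
    for flag, number, unit in re.findall(r"-(Xms|Xmx)(\d+)([kKmMgG])", setting):
        sizes[flag] = int(number) * _UNIT_BYTES[unit.lower()]
    if set(sizes) != {"Xms", "Xmx"}:
        raise ValueError("expected both -Xms and -Xmx with a k, m, or g suffix")
    if sizes["Xms"] > sizes["Xmx"]:
        raise ValueError("minimum heap size exceeds maximum heap size")
    return sizes["Xms"], sizes["Xmx"]


print(parse_heap_flags("-Xms256m -Xmx2048m"))  # app-heap-size default
print(parse_heap_flags("-Xms32m -Xmx128m"))    # lcm-heap-size default
```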

You can configure the following properties in the Custom Configuration Details area when you edit a Secure Agent:

Type	Name	Description
AGENT_RUNTIME_SETTINGS	ComplexFileDisableWriteChecksum	Set the value to True to ignore the crc file. The job runs successfully with Hadoop Files V2 as source and Snowflake Cloud Data Warehouse V2 as the target.

Guidelines to specify the folder path: A folder path can be a shared location, mounted location, or a location apart from the default location in the Secure Agent.

CMI Streaming Agent

Use the CMI Streaming Agent to define and deploy streaming ingestion and replication tasks. You configure streaming ingestion and replication tasks in the Data Ingestion and Replication service.

A CMI Streaming Agent runs on an on-premises system and works in conjunction with Streaming Ingestion and Replication. In an on-premises system, the CMI Streaming Agent runs the jobs deployed by Streaming Ingestion and Replication. The agent provides status and statistics updates for each job.

On Linux, the CMI Streaming Agent does not start if the agent installation directory name contains a space. The agent returns a connection timeout status. After a few restart attempts, the agent goes into the error state.

CMI Streaming Agent properties

To change or optimize the behavior of the CMI Streaming Agent, configure agent properties for your runtime environment. Configure CMI Streaming Agent properties in the System Configuration Details area when you edit a Secure Agent.

You can configure Engine, Agent, and Script properties of a CMI Streaming Agent.

The following image shows some of the CMI Streaming Agent properties:

[Image: some of the Streaming Ingestion Agent configurable properties]

You can configure the following CMI Streaming Agent properties:

Type	Property Name	Description
Engine	MaxLogFileSize	The maximum size of the log file that the engine can create. Default is 5 MB.
Engine	LogLevel	The log level for the engine.
Agent	DataflowPullInterval	The time interval after which the agent checks for updates in the task. Default is 60 seconds.
Agent	JVM	List of JVM properties for the agent. For example: [-Xms256M -Xmx256M]
Agent	LogLevel	The log level for the agent.
Agent	MaxLogFileSize	Maximum size of the log files that an agent can create. Default is 10 MB.
Agent	MaxNumberOfBackups	Maximum number of backup log files for the agent. Default is 5.
Scripts	LogLevel	The log level of the scripts.
Scripts	MaxFileSize	The maximum file size after which the log rolls over and creates a new file. Default is 10 MB.
Scripts	MaxBackupIndex	Maximum number of backup files maintained after rolling over. Default is 5.
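
When you plan disk space for logs, the size and backup-count properties can be combined into a rough upper bound. The following Python sketch assumes conventional rolling-log behavior, in which one active log file exists alongside the retained backup files. That behavior is an assumption and is not stated in this topic, and the function name is hypothetical.

```python
def max_log_footprint_mb(max_file_size_mb: int, max_backups: int) -> int:
    """Upper bound on log disk usage: one active file plus the retained backups."""
    return max_file_size_mb * (max_backups + 1)


# Agent defaults: MaxLogFileSize of 10 MB and MaxNumberOfBackups of 5.
print(max_log_footprint_mb(10, 5))
```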

Streaming Agent offline mode

You can run and monitor a streaming ingestion and replication job when the CMI Streaming Agent is offline or not connected to the internet.

The Streaming Agent supports both online and offline modes of communication. In offline mode, the streaming ingestion and replication job continues to run even if the Streaming Agent does not communicate with Informatica Intelligent Cloud Services for an extended period of time. The Streaming Agent continues to monitor the health and statistics of the jobs locally. When the Streaming Agent returns to online mode and connects to the cloud services, it picks up any configuration changes for the agent and jobs and sends the health and statistics updates to the services.

To switch between the offline and online modes, you can use the command line utility provided by the Streaming Ingestion and Replication service. Run the following command to start the command line utility on Windows. On UNIX, run runcli.sh from the same directory:

<Informatica Secure Agent>/apps/Streaming_Ingestion_Agent/<version>/runcli.bat

The command line utility uses the command prompt infa/stream> and provides three groups of commands.

You can change the communication modes only through the command line utility. The Streaming Agent preserves the communication mode when the agent restarts.

The following table lists the commands of this command line utility:

Command	Description	Example
app-config	Shows the current configuration of the Streaming Agent application.	infa/stream :>app-config deploy.pull.interval : 60 health.poll.interval : 30 minifi.ingester.file.location : ./conf siagent.communication.mode : Online siagent.monitoring.persist.dir : ../data siagent.statistics.post.batchsize : 720 siagent.statistics.post.concurrency : 60 siagent.status.persist.dir : ../data statistics.poll.interval : 30
app-setconfig	Use this command to configure the following properties: - siagent.communication.mode. Use to configure offline or online communication mode. - siagent.statistics.post.batchsize. Use to define the number of snapshots in a batch. - siagent.statistics.post.concurrency. Use to define the number of worker threads to post statistics. The --key and --value tokens are optional.	infa/stream :>app-setconfig --key siagent.statistics.post.batchsize --value 20 or infa/stream :>app-setconfig siagent.statistics.post.batchsize 20
app-status	Shows the current status of the Streaming Agent. The health status code and health error message indicate the status of the agent (service) shown in Administrator. uptime indicates the number of seconds that the Streaming Agent application has been available.	infa/stream :>app-status health error message : No errors health status code : RUNNING(0) uptime : 67828
app-statistics	Shows metadata and status of overall statistics collection in the Streaming Agent. - collection interval. Interval of statistics collection, in seconds. - post interval. How often statistics are posted or a post is attempted. - max batch size. Maximum number of snapshots posted in a single HTTP post. - last batch size. Number of snapshots in the last HTTP post. - last time collected. Timestamp when any statistics were last collected. - last time posted. Timestamp when any statistics were last posted.	infa/stream :>app-statistics collection interval : 30 last batch size : 2 last time collected : 7/3/20 10:19:03 AM IST last time posted : 7/3/20 10:18:53 AM IST max batch size : 20 pending snapshots : 3 post interval : 30
clear	Clears the screen.	-
exit, quit	Quits the application.	-
help	Shows a summary of all the commands available.	infa/stream :>help AVAILABLE COMMANDS Agent Application Commands app-config: Show agent application configuration app-setmode: Set the communication mode [Online/Offline] app-status: Show agent application status Built-In Commands clear: Clear the shell screen. exit, quit: Exit the shell. help: Display help about available commands. Streaming Ingestion Task Commands task-health: Show streaming ingestion task health task-list: Show streaming ingestion task list task-metadata: Show streaming ingestion task metadata
task-list	Shows the list of streaming ingestion and replication jobs currently deployed on the Streaming Agent.	infa/stream :>task-list 6e61e76f-2618-4292-ab3d-dd181f47ee91 ad5053c7-5ac2-493f-8cbb-a24900b61f71
task-health	Shows the health status of all streaming ingestion and replication jobs in the Streaming Agent. Use the options --name or --id to specify a job. If none are specified, all jobs are listed.	infa/stream :>task-health --name aby_df4 processors : [{"id":"14a7a095-7fac-4fc3-ac5c-705369132516","status":"ERROR"},{"id":"821e6730-3aed-4d3f-b875-45f424b6b963","status":"RUNNING"}] status : ERROR timestamp : Sat May 09 06:04:08 IST 2020 infa/stream :>task-health 6e61e76f-2618-4292-ab3d-dd181f47ee91 processors : [{"id":"2a0b8715-aa7a-46c5-9d6a-6a356f5a0102","status":"ERROR"},{"id":"1172f3a8-35dd-41ef-be4b-bc0cf37e3794","status":"RUNNING"}] status : ERROR timestamp : Sat May 09 06:04:08 IST 2020 ad5053c7-5ac2-493f-8cbb-a24900b61f71 processors : [{"id":"14a7a095-7fac-4fc3-ac5c-705369132516","status":"ERROR"},{"id":"821e6730-3aed-4d3f-b875-45f424b6b963","status":"RUNNING"}] status : ERROR timestamp : Sat May 09 06:04:08 IST 2020
task-metadata	Shows the metadata of all streaming ingestion and replication jobs in the Streaming Agent. Use the options --name or --id to specify a job. If none are specified, all jobs are listed.	infa/stream :>task-metadata --name aby_df4 id : ad5053c7-5ac2-493f-8cbb-a24900b61f71 name : aby_df4 runId : 9071 version : 1 infa/stream :>task-metadata 6e61e76f-2618-4292-ab3d-dd181f47ee91 id : 6e61e76f-2618-4292-ab3d-dd181f47ee91 name : aby_df2 runId : 9069 version : 8 ad5053c7-5ac2-493f-8cbb-a24900b61f71 id : ad5053c7-5ac2-493f-8cbb-a24900b61f71 name : aby_df4 runId : 9071 version : 1
task-statistics	Shows the statistics details of all streaming ingestion and replication jobs in the Streaming Agent. Use the options --name or --id to specify a job. If none are specified, all jobs are listed.	infa/stream :>task-statistics --name aby_df1 dataflow name : aby_df1 last time collected : 1590861803731 last time posted : 1590861806091 infa/stream :>task-statistics 7b7d3c09-df43-482f-b6c8-8dd80187e6d7 dataflow name : aby_df2 last time collected : 1590861770731 last time posted : 1590861741132 decfad0a-20df-4226-84f9-1ff1ab6ef96a dataflow name : aby_df1 last time collected : 1590861768730 last time posted : 1590861771054
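
If you capture the output of commands such as app-config or task-statistics for later review, the key : value pairs can be turned into a dictionary. The following Python sketch assumes that the utility prints one pair per line (the examples in the table appear flattened onto a single line) and that the last time collected and last time posted values shown for task-statistics are epoch timestamps in milliseconds. Both points are assumptions, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_key_value_output(text: str) -> dict:
    """Parse lines of the form 'key : value' into a dictionary of strings."""
    pairs = {}
    for line in text.splitlines():
        key, separator, value = line.partition(" : ")
        if separator:  # skip lines that are not key : value pairs
            pairs[key.strip()] = value.strip()
    return pairs


def epoch_millis_to_utc(millis: int) -> datetime:
    """Convert an epoch timestamp in milliseconds to an aware UTC datetime."""
    return _EPOCH + timedelta(milliseconds=millis)


sample = """deploy.pull.interval : 60
health.poll.interval : 30
siagent.communication.mode : Online"""

config = parse_key_value_output(sample)
print(config["siagent.communication.mode"])
print(epoch_millis_to_utc(1590861803731).isoformat())
```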

Online mode to Offline mode

By default, the Streaming Agent is in online mode.

To change the Streaming Agent to offline mode:

1. Launch the command line utility using the following command:

In Windows:

<Informatica_Secure_Agent>/apps/Streaming_Ingestion_Agent/<version>/runcli.bat

In UNIX:

<Informatica_Secure_Agent>/apps/Streaming_Ingestion_Agent/<version>/runcli.sh

2. Set the Streaming Agent to offline mode:

app-setconfig --key siagent.communication.mode --value Offline

or

app-setconfig siagent.communication.mode Offline

The Streaming Agent stops sending health updates and statistics of any streaming ingestion and replication job.

Offline mode to Online mode

To change the Streaming Agent to online mode:

1. Launch the command line utility using the following command:

In Windows:

<Informatica_Secure_Agent>/apps/Streaming_Ingestion_Agent/<version>/runcli.bat

In UNIX:

<Informatica_Secure_Agent>/apps/Streaming_Ingestion_Agent/<version>/runcli.sh

2. Set the Streaming Agent to online mode:

app-setconfig --key siagent.communication.mode --value Online

or

app-setconfig siagent.communication.mode Online

The Streaming Agent starts sending health updates for all streaming ingestion and replication jobs, and the updates appear on the Monitoring page. It also sends the statistics for all streaming ingestion and replication jobs to the service, including the backlog of statistics collected while it was offline. In addition, it synchronizes updates to streaming ingestion and replication jobs, and adds any new streaming ingestion and replication jobs, that were deployed while it was offline.
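
If you keep runbooks or generate instructions for operators, the two app-setconfig forms shown in this topic can be produced by a small helper. The following Python sketch only builds the command text to type at the utility prompt. It does not run the utility, and the function name is hypothetical. It restricts keys to the three properties that this topic lists for app-setconfig.

```python
DOCUMENTED_KEYS = (
    "siagent.communication.mode",
    "siagent.statistics.post.batchsize",
    "siagent.statistics.post.concurrency",
)


def build_setconfig_command(key: str, value: str, use_tokens: bool = True) -> str:
    """Return an app-setconfig command, with or without the optional tokens."""
    if key not in DOCUMENTED_KEYS:
        raise ValueError(f"not a property listed for app-setconfig: {key}")
    if key == "siagent.communication.mode" and value not in ("Online", "Offline"):
        raise ValueError("communication mode must be Online or Offline")
    if use_tokens:
        return f"app-setconfig --key {key} --value {value}"
    return f"app-setconfig {key} {value}"


print(build_setconfig_command("siagent.communication.mode", "Offline"))
print(build_setconfig_command("siagent.communication.mode", "Online", use_tokens=False))
```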