Frequently asked questions for Streaming Ingestion and Replication behavior
Review the following frequently asked questions to understand the product behavior.
Are the checkpoints stored on the Secure Agent, cloud repository, source, or target?
The storage location of checkpoints depends on the source and target. Checkpoints can be stored on the Secure Agent, in a cloud repository, on the source, or on the target.
For example, for an Amazon Kinesis Streams source, Streaming Ingestion and Replication stores checkpoint information in an Amazon DynamoDB table.
Can I set the checkpoint after deploying or undeploying a task?
No, you can't set the checkpoint after deploying or undeploying a task.
In a streaming ingestion and replication task with a Kafka source and a Kafka target, can I rely on the offset stored in Kafka as a restart checkpoint?
Yes, you can treat the offset stored in Kafka as a restart checkpoint.
In a streaming ingestion and replication task with a Kafka source and a Kafka target, does the group.id property help to restart the checkpoint after the tasks are undeployed and deployed?
The group.id is a shared property in Kafka that helps in checkpoint recovery. When a Kafka source within a group processes a message successfully, it updates its offset in the topic partition. This offset is stored along with the group.id. When a Kafka dataflow transitions from the undeployed to the deployed status, and is up and running, it uses the same group.id and the stored offset to resume processing from where the dataflow stopped.
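The resume behavior described above can be sketched as a minimal in-memory simulation in Python. This is not the real Kafka client; the class, group, and topic names are illustrative only. It shows how an offset committed under a group.id lets a redeployed dataflow pick up where it stopped:

```python
# Minimal in-memory sketch of offset-based resume keyed by group.id.
# This simulates the behavior only; it does not use a real Kafka client.

class OffsetStore:
    """Stands in for Kafka's committed offsets, keyed by (group.id, topic, partition)."""
    def __init__(self):
        self._committed = {}

    def commit(self, group_id, topic, partition, offset):
        self._committed[(group_id, topic, partition)] = offset

    def committed(self, group_id, topic, partition):
        # Next offset to read; 0 if the group has never committed.
        return self._committed.get((group_id, topic, partition), 0)


def run_dataflow(group_id, store, log, upto):
    """Process messages from the committed offset up to 'upto', committing as we go."""
    start = store.committed(group_id, "events", 0)
    processed = []
    for offset in range(start, upto):
        processed.append(log[offset])                     # deliver the message to the target
        store.commit(group_id, "events", 0, offset + 1)   # checkpoint after success
    return processed


log = ["m0", "m1", "m2", "m3", "m4"]
store = OffsetStore()

first_run = run_dataflow("si-dataflow-1", store, log, upto=3)   # then undeploy
second_run = run_dataflow("si-dataflow-1", store, log, upto=5)  # redeploy, same group.id
# second_run resumes at m3 instead of reprocessing from the beginning
```

Because the second run reuses the same group.id, it reads the stored offset and resumes at the fourth message rather than replaying the stream.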
Can you guarantee at-least-once delivery, allowing potential duplicates in the target after a task failure?
Yes, at-least-once delivery is guaranteed.
Is there any difference in delivery depending on the source and target connectors?
Each connector has a unique implementation, but all are designed to guarantee at-least-once delivery.
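The at-least-once guarantee above can be illustrated with a short Python sketch. The function and variable names are hypothetical, not part of any connector API; the point is that a crash between writing to the target and advancing the checkpoint causes the same message to be redelivered on restart:

```python
# Sketch of at-least-once delivery: after a failure, processing restarts from the
# last checkpoint, so the target may see duplicates and must tolerate them.

def deliver_at_least_once(messages, checkpoint, fail_at=None):
    """Deliver from the last checkpoint; a crash before commit causes redelivery."""
    delivered = []
    offset = checkpoint["offset"]
    while offset < len(messages):
        delivered.append(messages[offset])       # write to the target first
        if offset == fail_at:
            return delivered, checkpoint, False  # crash before the checkpoint advances
        checkpoint["offset"] = offset + 1        # then advance the checkpoint
        offset += 1
    return delivered, checkpoint, True


msgs = ["a", "b", "c"]
ckpt = {"offset": 0}

first, ckpt, ok = deliver_at_least_once(msgs, ckpt, fail_at=1)   # crashes after writing "b"
second, ckpt, ok = deliver_at_least_once(msgs, ckpt)             # restart from checkpoint
target = first + second   # "b" appears twice: delivered at least once, possibly duplicated
```

No message is lost, but the redelivered message appears twice in the target, which is exactly the trade-off at-least-once semantics make.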
How does recovery occur after a failure?
After the Secure Agent recovers from a failure, the runtime engine restarts, and the reconciliation process takes place.
Does Streaming Ingestion and Replication support a Secure Agent group with more than one Secure Agent?
Streaming Ingestion and Replication supports multiple Secure Agents under a single agent group. However, a dataflow can't run across multiple Secure Agents at the same time.
Does Streaming Ingestion and Replication support shared Secure Agent groups?
Streaming Ingestion and Replication doesn't support shared Secure Agent groups.
Why does the following warning message appear?
"The Secure Agent is either offline or not reachable."
You might receive this warning message for the following reasons:
- The Secure Agent is facing a network error.
- The Channel Service in the agent is not working properly, preventing the tunnel from opening for communication.
- There is an issue with Java heap memory.
- Some old dataflows block other dataflows, making the agent unreachable.
- A permission issue occurs while creating and starting the agent with different users.
Questions on POD upgrades
How are the Streaming Ingestion (SI) Agent upgrades and patches applied?
When a new version of the SI Agent is downloaded, it upgrades using the REPLACE mode. The following actions are performed:
1. The old package stops, creating an archive of all logs and configurations for future debugging.
2. The new package is deployed.
3. The new package starts.
The SI Agent is designed to share states and configurations across subsequent SI Agent versions. The new version uses the previous version's state and configuration to resume in the same state. After the upgrade, the system creates a new version directory and switches from the old version to the new version seamlessly. The old version remains on the agent for a week before it is purged.
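The REPLACE-mode sequence above can be sketched in Python. The directory layout, file names, and purge window here are illustrative assumptions, not actual product paths:

```python
# Minimal sketch of the REPLACE-mode upgrade sequence described above:
# archive the old package, deploy the new one carrying state over, then start it.
# Directory and file names are illustrative, not product paths.
import shutil
import tempfile
from pathlib import Path

PURGE_AFTER_DAYS = 7  # old version kept for about a week before it is purged


def upgrade(agent_root: Path, old_version: str, new_version: str) -> str:
    old_dir = agent_root / old_version
    new_dir = agent_root / new_version
    # 1. Stop the old package and archive its logs and configuration for debugging.
    archive = shutil.make_archive(
        str(agent_root / f"{old_version}-archive"), "zip", old_dir
    )
    # 2. Deploy the new package, carrying the old state and configuration over.
    new_dir.mkdir()
    for name in ("state.json", "config.json"):
        if (old_dir / name).exists():
            shutil.copy(old_dir / name, new_dir / name)
    # 3. Start the new package (represented here by a marker file).
    (new_dir / "RUNNING").touch()
    return archive


root = Path(tempfile.mkdtemp())
(root / "1.0").mkdir()
(root / "1.0" / "state.json").write_text('{"offset": 42}')
archive = upgrade(root, "1.0", "1.1")
# The new version directory now holds the carried-over state and a RUNNING marker.
```

The key design point the sketch captures is that the state file moves forward with the upgrade, so the new version resumes where the old one left off.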
What happens when the SI Agent is unavailable?
The SI Agent regularly checks for updates to dataflows and deploys them. If the SI Agent is unavailable during a dataflow update, it pulls the changes after it restarts.
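The catch-up behavior described above can be sketched as a simple version-comparison loop in Python. The function and dataflow names are hypothetical; the idea is that because the agent compares the service's dataflow versions against what it last deployed, a restart naturally picks up any update it missed while it was down:

```python
# Sketch of the update-polling behavior: deploy every dataflow whose version
# on the service differs from the version the agent last deployed.
# Names are illustrative, not part of the product API.

def poll_once(service_versions, deployed_versions):
    """One polling cycle: deploy each dataflow whose version changed."""
    deployed_now = []
    for name, version in service_versions.items():
        if deployed_versions.get(name) != version:
            deployed_versions[name] = version   # "deploy" the new definition
            deployed_now.append(name)
    return deployed_now


service = {"orders-flow": 3, "clicks-flow": 7}
agent_state = {"orders-flow": 3, "clicks-flow": 6}   # agent was down for one update

picked_up = poll_once(service, agent_state)  # the missed update is applied on restart
```

Because the comparison is against persisted state rather than a live notification, no update is lost when the agent is temporarily unavailable.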
How is the SI Agent restart handled?
The SI Agent restarts based on its status. If the SI Agent stops, the corresponding instance also stops.
What happens if the agent host is not available?
If the agent host is not reachable from the service, the dataflow continues to run, but updates to the dataflow might not be deployed to the agent. When the agent is back online, the SI Agent resumes checking for updates and pulls the latest version of the dataflow.
What happens if the agent core is unavailable?
The system automatically monitors and restarts the agent core. As a result, the SI Agent operates continuously without any interruption.
What happens when the agent core is patched?
When you patch the agent core, the SI Agent continues to run. After a successful patch, the SI Agent resumes operations without affecting the dataflows.