
Troubleshooting a database ingestion and replication task

If you change an unsupported data type of a source column to a supported data type, the change might not be replicated to the target.
This problem occurs when the Modify column schema drift option is set to Replicate and the Add column option is set to Ignore.
Database Ingestion and Replication does not create target columns for source columns that have unsupported data types when you deploy a task. If you change the unsupported data type to a supported data type for the source column later, Database Ingestion and Replication processes the modify column operation on the source but does not replicate the change to the target. When Database Ingestion and Replication tries to add a column with the supported data type to the target, the operation is ignored because the schema drift option Add column is set to Ignore.
To handle this situation, perform the following steps:
  1. On the Schedule and Runtime Options page in the database ingestion and replication task wizard, under Schema Drift Options, set the Add column option to Replicate.
  2. Change the source column data type to a supported data type again so that the database ingestion and replication job can detect this schema change.
    The job processes the DDL operation and creates the new target column.
    Note: The job does not propagate the column values that were added prior to changing the source column data type.
  3. If you want to propagate all of the values from the source column to the target, resynchronize the target table with the source.
If you change a primary key constraint on the source, Database Ingestion and Replication stops processing the source table on which the DDL change occurred.
This problem occurs if you add or drop a primary key constraint, or if you add or drop a column from an existing primary key.
To resume processing the source table for combined initial and incremental jobs, resynchronize the target table with the source.
To resume processing the source table for incremental jobs, perform the following steps:
  1. On the Source tab in the database ingestion and replication task definition, add a table selection rule to exclude the source table.
  2. Redeploy the task.
    Database Ingestion and Replication deploys the edited task and deletes the information about the primary keys of the excluded table.
  3. Edit the task again to delete the table selection rule that excluded the source table.
  4. Redeploy the task.
If a DDL column-level change causes a source table subtask to stop or be in error and then you resume the database ingestion and replication job, the expected change in the table state is delayed.
If a DDL column-level change on a source table causes a table subtask to stop or be in error and you then resume the database ingestion and replication job, the state of the table subtask might remain unchanged until a DML operation occurs on the table. For example, suppose you set a schema drift option to Stop Table for an incremental or combined initial and incremental database ingestion and replication task and then deploy and run the job. When a DDL change occurs on a source table, the job monitoring details show the table subtask in the Error state. If you stop the job and then resume it with a schema drift override to replicate the DDL change, the table subtask temporarily remains in the Error state until the first DML operation occurs on the source table.
Database Ingestion and Replication failed to deploy a task that has a Snowflake target with the following error:
Information schema query returned too much data. Please repeat query with more selective predicates.
This error occurs because of a known Snowflake issue related to schema queries. For more information, see the Snowflake documentation.
In Database Ingestion and Replication, the error can cause the deployment of a database ingestion and replication task that has a Snowflake target to fail when a large number of source tables are selected.
To handle the deployment failure, drop the target tables, update the database ingestion and replication task to select fewer source tables for generating the target tables, and then try to deploy the task again.
A database ingestion and replication job that runs on Linux ends abnormally with the following out-of-memory error:
java.lang.OutOfMemoryError: unable to create new native thread
The maximum number of user processes that is set for the operating system might have been exceeded. If the Linux ulimit value for maximum user processes is not already set to unlimited, set it to unlimited or a higher value. Then resume the job.
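For example, assuming a bash shell and that the job runs under the user account whose process limit is being exceeded, you can display and raise the limit with the ulimit command; raising a hard limit might require administrator access:
ulimit -u
ulimit -u unlimited
To make the setting persistent across logins, a system administrator can also add a nproc entry for that user account in /etc/security/limits.conf.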
If you copy an asset to another location that already includes an asset of the same name, the operation might fail with one of the following errors:
Operation succeeded on 1 artifacts, failed on 1 artifacts.
Operation did not succeed on any of the artifacts.
If you try to copy an asset to another location that already has an asset of the same name, Database Ingestion and Replication displays a warning message that asks if you want to keep both assets, one with a suffix such as "- Copy 1". If you choose to keep both assets, Database Ingestion and Replication validates the name length to ensure that it does not exceed the maximum of 50 characters after the suffix is added. If the name would exceed 50 characters, the copy operation fails. In this case, copy the asset to another location, rename the copy, and then move the renamed asset back to the original location.
A Kafka consumer ends with one of the following errors:
org.apache.avro.AvroTypeException: Invalid default for field meta_data: null not a {"type":"array"...
org.apache.avro.AvroTypeException: Invalid default for field header: null not a {"type":"record"...
This error might occur because the consumer has been upgraded to a new Avro version but still uses the Avro schema files from the older version.
To resolve the problem, use the new Avro schema files that Database Ingestion and Replication provides.
A database ingestion and replication job that propagates incremental change data to a Kafka target that uses Confluent Schema Registry fails with the following error:
io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Register operation timed out; error code: 50002
This problem might occur when the job is processing many source tables, which requires Confluent Schema Registry to process many schemas. To resolve the problem, try increasing the value of the Confluent Schema Registry kafkastore.timeout.ms option. This option sets the timeout for an operation on the Kafka store. For more information, see the Confluent Schema Registry documentation.
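For example, you might raise the timeout in the Schema Registry configuration file, which is typically named schema-registry.properties; the value is in milliseconds, and the value shown here is only an illustration:
kafkastore.timeout.ms=10000
Restart the Schema Registry service after changing the property so that the new value takes effect.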
Subtasks of a database ingestion and replication job that has a Google BigQuery target fail to complete initial load processing of source tables with the following error:
The job has timed out on the server. Try increasing the timeout value.
This problem occurs when the job is configured to process many source tables and the Google BigQuery target connection times out before initial load processing of the source tables is complete. To resolve this problem, increase the timeout interval in the Google BigQuery V2 target connection properties.
  1. In Administrator, open the Google BigQuery V2 connection that is associated with the database ingestion job in Edit mode.
  2. In the Provide Optional Properties field, set the timeout property to the required timeout interval in seconds. Use the following format:
    "timeout": "<timeout_interval_in_seconds>"
  3. Save the connection.
  4. Redeploy the database ingestion and replication task.
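For example, to allow up to 10 minutes for initial load processing before the connection times out, you might enter the following value in the Provide Optional Properties field in step 2; the interval shown is only an example and should be adjusted to your environment:
"timeout": "600"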
A database ingestion and replication task with an Amazon Redshift target returns one of the following errors during deployment:
Database Ingestion and Replication could not find target table 'table_name' which is mapped to source table 'table_name' when deploying the database ingestion task.
com.amazon.redshift.util.RedshiftException: ERROR: Relation "table_name" already exists
This problem occurs because Amazon Redshift reads table and column names as lowercase by default.
To prevent this error, you can set the enable_case_sensitive_identifier parameter to "true" when configuring the database parameter group. For more information about this parameter, see the AWS Amazon Redshift documentation at https://docs.aws.amazon.com/redshift/latest/dg/r_enable_case_sensitive_identifier.html.
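For example, if you want to verify the behavior before changing the parameter group, you can enable the setting at the session level with a standard Amazon Redshift command:
SET enable_case_sensitive_identifier TO true;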
Deployment of a database ingestion and replication task fails if the source table or column names include multibyte or special characters and the target is Databricks Delta.
When a new Databricks target table is created during deployment, an entry is added to the Hive metastore that Databricks uses, which is typically a MySQL database. More specifically, the column names are inserted into the TABLE_PARAMS table of the metastore. The PARAM_VALUE column of TABLE_PARAMS uses the latin1 character set with the latin1_bin collation, which does not support Japanese characters. To resolve the problem, create an external metastore with UTF-8 as the charset and UTF-8_bin as the collation. For more information, see the Databricks documentation at https://docs.microsoft.com/en-us/azure/databricks/kb/metastore/jpn-char-external-metastore and https://kb.databricks.com/metastore/jpn-char-external-metastore.html.
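For example, if the external metastore is backed by MySQL, the database that hosts it could be created with a Unicode character set and a binary collation. This statement is only a sketch; the database name is a placeholder, and the exact character set and collation names depend on your MySQL version:
CREATE DATABASE metastore CHARACTER SET utf8 COLLATE utf8_bin;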
A database ingestion and replication initial load task with an Azure Synapse Analytics target and an Oracle, PostgreSQL, or SAP HANA source that includes tables with Japanese data might write the Japanese data as ??? characters to the target columns
To resolve this issue, edit the task to add custom data type mappings that map the affected source character data types to target data types that can store the Japanese data.
A database ingestion and replication initial load job that has a SQL Server source with XML data and a Microsoft Azure Synapse Analytics target fails
When you run a database ingestion and replication job that has a SQL Server source with XML column data of 500,000 or more single-byte characters and a Microsoft Azure Synapse Analytics target, subtasks might fail when Synapse Analytics tries to process SQL queries for creating the target tables. Before writing data to the target tables, Database Ingestion and Replication truncates XML data to 500,000 bytes by default and adds x bytes of auxiliary metadata. Synapse Analytics stores each source character as 2 bytes and has a maximum row size of 1000000 bytes. As a result, the number of bytes in a row to be written to a target table can be greater than the maximum target row size. In this case, the subtask fails and a trace message reports information such as:
Unexpected error encountered filling record reader buffer: HadoopExecutionException: The size of the schema/row at ordinal 1 is 1000050 bytes. It exceeds the maximum allowed row size of 1000000 bytes for Polybase.
To correct the problem, determine an appropriate lower truncation point and specify it in the unloadClobTruncationSize custom property on the Target page of the task wizard. If only one XML column occurs in a row, decrease the truncation point by the difference between the schema/row size reported in the message and the maximum allowed row size. For example, based on the preceding sample message, you would calculate the lower truncation point for a row with a single XML column as 500000 - 50, or 499950 bytes.
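For example, applying the calculation above, you might set the custom property to the lower truncation point. The name-value pair is shown here in a generic form; enter it in whatever format the Target page of the task wizard expects:
unloadClobTruncationSize=499950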
A database ingestion and replication incremental load job with a Db2 for z/OS source ends abnormally when processor resource limits are exceeded during Informatica stored procedure processing
If you use the Db2 DSNRLSTxx resource limit table to limit the amount of processor resources that are used by SQL operations, such as SELECT, INSERT, UPDATE, and DELETE, on a z/OS source system, database ingestion and replication incremental load jobs with Db2 for z/OS sources might end abnormally. The jobs end if the default or set resource limits are not large enough to accommodate long-running processing of the WLM stored procedure that database ingestion and replication jobs use to process captured change data. If abends related to resource limits occur, perform the following steps:
  1. Add a row to the resource limit table specifically for the database ingestion and replication packages in your runtime environment, with column values that are appropriate for your environment.
  2. For the changes to the resource limit table to take effect, issue the Db2 -START RLIMIT command.
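For example, a DBA might activate the updated limits with a command such as the following, where xx is the suffix of the DSNRLSTxx table that contains the new row; verify the exact syntax for your Db2 for z/OS version:
-START RLIMIT ID=xx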
If you do not have the authority to perform these steps, contact your Db2 DBA or z/OS system programmer.