The following list identifies considerations for using Microsoft Fabric OneLake targets:
•When you define a database ingestion and replication task that has a Microsoft Fabric OneLake target, you can select CSV, Avro, or Parquet as the format for the generated output files that contain the source data to be applied to the target.
•If you select the CSV output format, Database Ingestion and Replication creates the following files on the target for each source table:
- A schema.ini file that describes the schema and includes some settings for the output file on the target.
- One or more output files for each source table, which contain the source data. Database Ingestion and Replication names these text files based on the source table name with an appended date and time.
The schema.ini file lists a sequence of columns for the rows in the corresponding output file. The schema.ini file contains the following entries:
- ColNameHeader. Indicates whether the source data files include column headers.
- Format. Describes the format of the output files. Database Ingestion and Replication uses a comma (,) to delimit column values.
- CharacterSet. Specifies the character set that is used for the output files. Database Ingestion and Replication generates the files in the UTF-8 character set.
- COL<sequence_number>. The name and data type of the column.
Important: You should not edit the schema.ini file.
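For illustration only, a generated schema.ini file might look similar to the following sketch. The output file name, column names, and data types are placeholders, and the exact entry values that the product writes can differ:

```ini
[customers_20240101_120000.csv]
ColNameHeader=False
Format=Delimited(,)
CharacterSet=UTF-8
COL1=CUSTOMER_ID Integer
COL2=CUSTOMER_NAME Text
```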
•If you select the Avro output format, you can select an Avro format type, a file compression type, an Avro data compression type, and the directory that stores the Avro schema definitions generated for each source table. The schema definition files have the following naming pattern: schemaname_tablename.txt.
•If you select the Parquet output format, you can optionally select a compression type that Parquet supports.
•On Microsoft Fabric OneLake targets, Database Ingestion and Replication creates an empty directory for each empty source table.
•If database ingestion and replication incremental load or combined initial and incremental load jobs replicate Update operations that change primary key values on the source to a Microsoft Fabric OneLake target that uses the CSV output format, the job processes each Update record as two records on the target: a Delete followed by an Insert. The Delete contains the before image, and the Insert contains the after image of the same row.
For Update operations that do not change primary key values, database ingestion and replication jobs process each Update as a single operation and write only the after image to the target.
Note: If source tables do not have primary keys, Database Ingestion and Replication treats the tables as if all columns were part of the primary key. In this case, each Update operation is processed as a Delete followed by an Insert.
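The update-handling rules above can be sketched as follows. The function name and record layout are illustrative, not part of the product:

```python
def expand_update(before, after, pk_columns):
    """Expand one replicated Update record into target operations.

    If the Update changes any primary key column, emit a Delete that
    carries the before image followed by an Insert that carries the
    after image. Otherwise, emit a single Update with the after image.
    """
    pk_changed = any(before[col] != after[col] for col in pk_columns)
    if pk_changed:
        return [("DELETE", before), ("INSERT", after)]
    return [("UPDATE", after)]

# A key-changing update becomes a Delete followed by an Insert:
ops = expand_update({"id": 1, "name": "a"}, {"id": 2, "name": "a"}, ["id"])
# A non-key update is written as a single after image:
ops2 = expand_update({"id": 1, "name": "a"}, {"id": 1, "name": "b"}, ["id"])
```

The same logic covers tables without primary keys: treating every column as part of the key makes any change a key change, so each Update expands to a Delete plus an Insert.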
•Database Ingestion and Replication jobs unload binary data in hexadecimal format when the data is sent to a Microsoft Fabric OneLake target. Each hexadecimal column value has the "0x" prefix. If you want to use output files to load the data to a target, you might need to edit the files to remove the "0x" prefixes.
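If you need to strip the "0x" prefixes before reloading the files, a small script along the following lines could do it. The column positions and sample data are assumptions for the example:

```python
import csv
import io

def strip_hex_prefixes(csv_text, binary_columns):
    """Remove the '0x' prefix from hexadecimal values in the given
    zero-based column positions of comma-delimited text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(csv_text)):
        for i in binary_columns:
            if row[i].startswith("0x"):
                row[i] = row[i][2:]
        writer.writerow(row)
    return out.getvalue()

# Column 1 holds hexadecimal binary data in this sample:
cleaned = strip_hex_prefixes("1,0x4142\n2,0x4344\n", [1])
```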
Using open mirroring to replicate data to Microsoft Fabric OneLake
Open mirroring enables near real-time replication to a mirrored database in Microsoft Fabric OneLake.
The data is first replicated to a landing zone associated with the mirrored database in Microsoft Fabric OneLake. This replicated data is stored in Parquet file format, which Microsoft Fabric reads from the OneLake landing zone to create or update tables in the mirrored database. Mirroring ensures that changes in the source system are quickly reflected in Microsoft Fabric to keep the mirrored data continuously up-to-date and ready for timely analysis and reporting.
Target preparation
Before you enable mirroring in a task that loads data to Microsoft Fabric OneLake, perform the following tasks:
1. Create a mirrored database in your Microsoft Fabric workspace.
OneLake generates a unique landing zone URL based on the mirrored database name and workspace. Note this landing zone URL for use in your task configuration.
2. Replication starts automatically after you create the mirrored database. Check the mirrored database settings and ensure that the replication status is running.
For detailed information about creating a mirrored database, see the Microsoft Fabric OneLake documentation.
Task configuration
To configure an ingestion and replication task that loads data to Microsoft Fabric OneLake, perform the following tasks:
•Ensure that each source table that you select has primary or unique key columns defined to avoid replication failures. These keys are required to handle updates and deletes.
•In the task target properties, enable Open Mirroring and enter the Landing Zone URL value so that mirrored data files are written to the correct Microsoft Fabric OneLake path.
Landing zone file structure
When the task runs, the replicated data is stored in Parquet files that are organized into schema and table folders in the OneLake landing zone, in the following path structure: <landing-zone>/<schema>/<table>/
Each selected source table has its own folder that contains the following files:
•A _metadata.json file specifying key columns and metadata.
•Parquet data files named sequentially with a fixed-width, zero-padded 20-digit numeric format to preserve ordering. For example, 00000000000000000001.parquet.
For incremental changes, each record in the Parquet files includes an operation row marker that indicates the type of DML operation, which can be insert, update, or delete. This marker is derived from the infa_operation_type metadata column. It enables Microsoft Fabric OneLake to apply changes correctly.
Microsoft Fabric OneLake reads these Parquet files from the landing zone to create or update tables in the mirrored database.
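The fixed-width naming scheme described above can be reproduced with simple zero-padding; the sequence values here are illustrative:

```python
def parquet_file_name(sequence_number: int) -> str:
    """Return a landing-zone data file name, zero-padded to 20 digits
    so that lexicographic order matches numeric order."""
    return f"{sequence_number:020d}.parquet"

name = parquet_file_name(1)  # "00000000000000000001.parquet"
```

Zero-padding matters because the landing zone files are consumed in name order: without it, a file such as 10.parquet would sort before 2.parquet.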
Limitations
The following limitations apply when you enable open mirroring:
•The _metadata.json file for each table is generated by the task. You cannot modify it while the task is deployed. To make changes, you need to update the task and then redeploy it.
•Do not modify or delete mirrored data files in the landing zone, as this might cause replication or downstream ingestion failures.
•Schema drift automatically adds new columns when they appear in the source data. However, if a column is deleted in the source, this change is not reflected in the target.
•You can mirror a maximum of 500 tables within a single mirrored database.