Big Data Management Component Architecture
The Big Data Management components include client tools, application services, repositories, and third-party tools that Big Data Management uses for a big data project. The specific components involved depend on the task you perform.
The following image shows the components of Big Data Management:
Hadoop Environment
Big Data Management connects to Hadoop clusters that are distributed by third parties. Hadoop is an open-source software framework that enables distributed processing of large data sets across clusters of machines. You might also need to use third-party software clients to set up and manage your Hadoop cluster.
Big Data Management can connect to Hadoop as a data source and push job processing to the Hadoop cluster. It can also connect to HDFS, which enables high-performance access to files across the cluster. It can connect to Hive, a data warehouse that connects to HDFS and uses SQL-like queries to run MapReduce jobs on Hadoop, and to YARN, which manages resources across the Hadoop cluster. It can also connect to NoSQL databases such as HBase, a key-value database on Hadoop that performs operations in real time.
The Data Integration Service pushes mapping and profiling jobs to the Blaze, Spark, or Hive engine in the Hadoop environment.
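The pushdown described above can be pictured with a small model. The following Python sketch is an illustration only and is not part of any Informatica API: the names `Engine`, `Job`, and `choose_engine` are hypothetical, and the first-match selection rule is an assumption, because the text does not say how an engine is chosen. It captures only what the text states, namely that mapping and profiling jobs run on the Blaze, Spark, or Hive engine in the Hadoop environment.

```python
from dataclasses import dataclass
from enum import Enum


class Engine(Enum):
    """Run-time engines named in the text."""
    BLAZE = "Blaze"
    SPARK = "Spark"
    HIVE = "Hive"


# Job types that the text says are pushed to the Hadoop environment.
PUSHABLE_JOB_TYPES = {"mapping", "profile"}


@dataclass(frozen=True)
class Job:
    name: str
    job_type: str  # hypothetical field: "mapping" or "profile"


def choose_engine(job, enabled_engines):
    """Return the first enabled engine for a pushable job.

    The first-match rule is an assumption made for this sketch; the
    text does not describe how an engine is actually selected.
    """
    if job.job_type not in PUSHABLE_JOB_TYPES:
        raise ValueError(f"{job.job_type!r} jobs are not pushed to Hadoop in this model")
    if not enabled_engines:
        raise ValueError("no engine is enabled")
    return enabled_engines[0]


job = Job(name="m_load_orders", job_type="mapping")
engine = choose_engine(job, [Engine.SPARK, Engine.HIVE])
print(f"{job.name} -> {engine.value}")
```

Passing the enabled engines as an ordered list keeps the caller's preference explicit instead of hiding a default inside the function.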
Clients and Tools
Based on your product license, you can use multiple Informatica tools and clients to manage big data projects.
Use the following tools to manage big data projects:
- Informatica Administrator
Monitor the status of profile, mapping, and MDM Big Data Relationship Management jobs on the Monitoring tab of the Administrator tool. The Monitoring tab of the Administrator tool is called the Monitoring tool. You can also design a Vibe Data Stream workflow in the Administrator tool.
- Informatica Analyst
Create and run profiles on big data sources, and create mapping specifications to collaborate on projects and define business logic that populates a big data target with data.
- Informatica Developer
Create and run profiles against big data sources, and run mappings and workflows on the Hadoop cluster from the Developer tool.
Application Services
Big Data Management uses application services in the Informatica domain to process data.
Big Data Management uses the following application services:
- Analyst Service
The Analyst Service runs the Analyst tool in the Informatica domain. The Analyst Service manages the connections between service components and the users that have access to the Analyst tool.
- Data Integration Service
The Data Integration Service can process mappings in the native environment or push them to the Hadoop cluster for processing in the Hadoop environment. The Data Integration Service also retrieves metadata from the Model repository when you run a Developer tool mapping or workflow. The Analyst tool and Developer tool connect to the Data Integration Service to run profile jobs and store profile results in the profiling warehouse.
- Model Repository Service
The Model Repository Service manages the Model repository. The Model Repository Service connects to the Model repository when you run a mapping, mapping specification, profile, or workflow.
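The way these services cooperate can be sketched as a toy model. The Python below is a minimal sketch and is not Informatica code: every class and method name (`ModelRepositoryService`, `DataIntegrationService`, `run_mapping`, `run_profile`) is hypothetical, and in-memory dictionaries stand in for the Model repository and the profiling warehouse. It mirrors three statements from the descriptions above: the Model Repository Service manages the Model repository, the Data Integration Service retrieves metadata from it before running a mapping in the native or Hadoop environment, and profile results are stored in the profiling warehouse.

```python
class ModelRepositoryService:
    """Hypothetical stand-in that manages an in-memory Model repository."""

    def __init__(self):
        self._repository = {}

    def save(self, name, metadata):
        self._repository[name] = metadata

    def fetch(self, name):
        return self._repository[name]


class DataIntegrationService:
    """Hypothetical stand-in for the service that runs mappings and profiles."""

    ENVIRONMENTS = ("native", "hadoop")

    def __init__(self, model_repository_service):
        self._mrs = model_repository_service
        self.profiling_warehouse = {}  # stand-in for the profiling warehouse

    def run_mapping(self, name, environment="native"):
        if environment not in self.ENVIRONMENTS:
            raise ValueError(f"unknown environment: {environment!r}")
        metadata = self._mrs.fetch(name)  # metadata comes from the Model repository
        where = "locally" if environment == "native" else "on the Hadoop cluster"
        return f"ran {metadata['kind']} {name} {where}"

    def run_profile(self, name, rows):
        self._mrs.fetch(name)  # the profile must exist in the Model repository
        result = {"row_count": len(rows)}
        self.profiling_warehouse[name] = result  # results are stored, not only returned
        return result


mrs = ModelRepositoryService()
mrs.save("m_orders", {"kind": "mapping"})
mrs.save("p_orders", {"kind": "profile"})

dis = DataIntegrationService(mrs)
print(dis.run_mapping("m_orders", environment="hadoop"))
print(dis.run_profile("p_orders", rows=[{"id": 1}, {"id": 2}]))
```

Keeping the repository behind its own service object reflects the separation in the text: the Data Integration Service never touches the repository storage directly in this model.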
Repositories
Big Data Management uses repositories and other databases to store data related to connections, source metadata, data domains, data profiling, data masking, and data lineage. Big Data Management uses application services in the Informatica domain to access data in repositories.
Big Data Management uses the following databases:
- Model repository
The Model repository stores profiles, data domains, mappings, and workflows that you manage in the Developer tool. The Model repository also stores profiles, data domains, and mapping specifications that you manage in the Analyst tool.
- Profiling warehouse
The Data Integration Service runs profiles and stores profile results in the profiling warehouse.