Content Management for the Profiling Warehouse
To create and run profiles and scorecards, you must associate the Data Integration Service with a profiling warehouse. You can specify the profiling warehouse when you create the Data Integration Service or when you edit the Data Integration Service properties.
The profiling warehouse stores profiling data and metadata. If you specify a new profiling warehouse database, you must create the profiling content. If you specify an existing profiling warehouse, you can use the existing content or delete and create new content.
You can create or delete content for a profiling warehouse at any time. You may choose to delete the content of a profiling warehouse to delete corrupted data or to increase disk or database space.
Creating and Deleting Profiling Warehouse Content
The Data Integration Service must be running when you create or delete profiling warehouse content.
1. In the Administrator tool, click the Manage tab > Services and Nodes view.
2. In the Domain Navigator, select a Data Integration Service that has an associated profiling warehouse.
3. To create profiling warehouse content, click the Actions menu on the Manage tab and select Profiling Warehouse Database Contents > Create.
4. To delete profiling warehouse content, click the Actions menu on the Manage tab and select Profiling Warehouse Database Contents > Delete.
Database Management
You need to periodically review and manage the profiling warehouse database growth. You can remove profile information that you no longer need and monitor or maintain the profiling warehouse tables.
The need for maintenance depends on different scenarios, such as short-term projects or when you no longer need the profile results. You can remove unused profile results and recover disk space used by the results so that you can reuse the database space for other purposes.
Purge
Purges profile and scorecard results from the profiling warehouse.
The infacmd ps Purge command uses the following syntax:
Purge
<-DomainName|-dn> domain_name
[<-Gateway|-hp> gateway_name]
[<-NodeName|-nn>] node_name
<-UserName|-un> user_name
<-Password|-pd> Password
[<-SecurityDomain|-sdn> security_domain]
<-MrsServiceName|-msn> MRS_name
<-DsServiceName|-dsn> data_integration_service_name
<-ObjectType|-ot> object_type
<-ObjectPathAndName|-opn> MRS_object_path
[<-RetainDays|-rd> results_retain_days]
[<-ProjectFolderPath|-pf> project_folder_path]
[<-ProfileName|-pt> profile_task_name]
[<-Recursive|-r> recursive]
[<-PurgeAllResults|-pa> purge_all_results]
The following table describes infacmd ps Purge options and arguments:
Option | Argument | Description |
---|
-DomainName -dn | domain_name | Required. The name of the Informatica domain. You can set the domain name with the -dn option or the environment variable INFA_DEFAULT_DOMAIN. If you set a domain name with both methods, the -dn option takes precedence. |
-Gateway -hp | gateway_name | Optional if you run the command from the Informatica installation \bin directory. Required if you run the command from another location. The gateway node name. Use the following syntax: [Domain_Host]:[HTTP_Port] |
-NodeName -nn | node_name | Required. The name of the node where the Data Integration Service runs. |
-UserName -un | user_name | Required if the domain uses Native or LDAP authentication. User name to connect to the domain. You can set the user name with the -un option or the environment variable INFA_DEFAULT_DOMAIN_USER. If you set a user name with both methods, the -un option takes precedence. Optional if the domain uses Kerberos authentication. To run the command with single sign-on, do not set the user name. If you set the user name, the command runs without single sign-on. |
-Password -pd | Password | Required if you specify the user name. Password for the user name. The password is case sensitive. You can set a password with the -pd option or the environment variable INFA_DEFAULT_DOMAIN_PASSWORD. If you set a password with both methods, the password set with the -pd option takes precedence. |
-SecurityDomain -sdn | security_domain | Required if the domain uses LDAP authentication. Optional if the domain uses native authentication or Kerberos authentication. Name of the security domain to which the domain user belongs. You can set a security domain with the -sdn option or the environment variable INFA_DEFAULT_SECURITY_DOMAIN. If you set a security domain name with both methods, the -sdn option takes precedence. The security domain name is case sensitive. If the domain uses native or LDAP authentication, the default is Native. If the domain uses Kerberos authentication, the default is the LDAP security domain created during installation. The name of the security domain is the same as the user realm specified during installation. |
-MrsServiceName -msn | MRS_name | Required. The Model Repository Service name. |
-DsServiceName -dsn | data_integration_service_name | Required. The Data Integration Service name |
-ObjectType -ot | - | Required. Enter profile or scorecard. |
-ObjectPathAndName -opn * | MRS_object_path | Optional. Do not use with ProjectFolderPath or Recursive. The path to the profile or scorecard in the Model repository. Use the following syntax: ProjectName/FolderName/.../{SubFolder_Name/ObjectName|ProjectName/ObjectName} |
-RetainDays -rd | results_retain_days | Optional. The number of days that the profiling warehouse stores profile or scorecard results before it purges the results. |
-ProjectFolderPath -pf * | project_folder_path | Optional. Do not use with ObjectPathAndName or ProfileTaskName. The names of the project and folder where the profile or scorecard is stored. Use the following syntax: ProjectName/FolderName |
-ProfileName -pt * | profile_task_name | Optional. The name of the profile task that you want to purge. If you specified the ProjectFolderPath, you do not need to specify this option because the ProjectFolderPath includes the name of the enterprise discovery profile that contains the profile task. |
-Recursive -r | recursive | Optional. Do not use with ObjectPathAndName. Applies the command to objects in the folder that you specify and its subfolders. |
-PurgeAllResults -pa | purge_all_results | Optional. Set this option to purge all results for the profile or scorecard object. Use with the -recursive option to apply the command to profile and scorecard results in the folder that you specify and its subfolders. |
* To run the command, you need to specify ObjectPathAndName or ProjectFolderPath or ProfileTaskName. |
Tablespace Recovery
As part of the regular profile operations, the Data Integration Service writes profile results to the profiling warehouse and deletes results from the profiling warehouse. The indexes and base tables can become fragmented over a period of time. You need to reclaim the unused disk space, especially for Index Organized Tables in Oracle database.
Most of the profiling warehouse tables contain relatively small amount of data and you do not need to recover the tablespace and index space.
The following tables store large amounts of profile data and deleting the tables can leave the tables fragmented:
Name | Description |
---|
IDP_FIELD_VERBOSE_SMRY_DATA | Stores the value frequencies |
IDP_VERBOSE_FIELD_DTL_RES | Stores the staged data |
When you perform the tablespace recovery, ensure that no user runs a profile task. After you recover the data, update the database statistics to reflect the changed structure.
IBM DB2
The recommendation is to shut down the Data Integration Service when you reorganize the tables and indexes.
To recover the database for a table, run the following command:
REORG TABLE <TABLE NAME>
REORG INDEXES ALL FOR TABLE <TABLE NAME> ALLOW WRITE ACCESS CLEANUP ONLY ALL
Oracle
You can rebuild Index Organized Tables in Oracle. This action reclaims unused fragments inside the index and applies to the IDP_FIELD_VERBOSE_SMRY_DATA and IDP_FIELD_VERBOSE_SMRY_DATA profiling warehouse tables.
To recover the database for a table, run the following command:
ALTER TABLE <Table Name> MOVE ONLINE
Microsoft SQL Server
Microsoft SQL Server reclaims unused space back into the tablespace and compacts indexes when rows are deleted. You do not need to maintain the database.
Database Statistics
Update the database statistics to allow the database to quickly run the queries on the profiling warehouse.
Database Statistics on IBM DB2
IBM DB2 recommends that you run the RUNSTATS command to update the statistics after a lot of updates have been made to a table or after a reorganization of the table.
To update the statistics, run the following command:
RUNSTATS ON TABLE <TABLE NAME> WITH DISTRIBUTION AND DETAILED INDEXES ALL
Database Statistics on Oracle
By default, Oracle gathers database statistics and therefore, you do not need to perform any action. For more information, refer the documentation on Oracle DBMS_STATS command.
Database Statistics on Microsoft SQL Server
By default, Microsoft SQL Server gathers statistics and therefore, no action is required. To update the statistics more frequently than the default recommended option, refer the documentation on SQL Server UPDATE STATISTICS command.