Saturday, December 1, 2012

Loading Multiple Target Tables Having the Same Definition from a Source Table

For this illustration, the Customers table in the Sales History (SH) schema in Oracle 10g is used as the source table. Click on each image if needed to open a larger view of the image.

Scenario: When the source and target tables reside in the same database

In this case, all the target tables reside in the SH source schema. The Customers source table will populate three target tables: Customers_Finance, Customers_Sales and Customers_Requests.

The source table definition is shown below. Similarly, there will be three targets with the same table definition.


A mapping that links all the fields from the source table to those of the target tables is shown below. This is the simplest approach, but it does not scale well: adding more target tables means changing both the mapping and the session.





Let's look at an approach that scales better and keeps the logic in a single mapping and session.

Create a file Target_Tables_List.txt as shown below that lists all the target tables to be loaded from the Customers source table.
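Given the three target tables named earlier, the file would simply list one table name per line, for example (assuming a single-column layout with no header row):

Customers_Finance
Customers_Sales
Customers_Requests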


The file definition is shown below. Use this file as the source for the mapping.


Create a SQL transformation SQL_Insert_Customers_Data after the Source Qualifier. Use the default options as shown below.



Drag the Table_Name port from the Source Qualifier transformation to the SQL transformation. Edit the SQL transformation. Check the Add Statistic Output Port check box in the SQL Settings tab as shown below. The Add Statistic Output Port attribute adds a NumRowsAffected output port that returns the total number of rows affected by INSERT, UPDATE and DELETE statements for each input row.


Click on the SQL Ports tab. Click on the section highlighted below to add the SQL query that inserts data into the target tables.


Click on the Table_Name port under String Substitution to substitute the table name in the query as shown below.


Complete the query by adding all the column names of the table in the query as shown below.
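For reference, the completed query reads roughly as follows; the column list is the one from the Customers source definition:

INSERT INTO ~Table_Name~
  (CUST_ID, CUST_FIRST_NAME, CUST_LAST_NAME, CUST_GENDER, CUST_YEAR_OF_BIRTH, CUST_MARITAL_STATUS, CUST_STREET_ADDRESS, CUST_POSTAL_CODE, CUST_CITY, CUST_CITY_ID, CUST_STATE_PROVINCE, CUST_STATE_PROVINCE_ID, COUNTRY_ID, CUST_MAIN_PHONE_NUMBER, CUST_INCOME_LEVEL, CUST_CREDIT_LIMIT, CUST_EMAIL, CUST_TOTAL, CUST_TOTAL_ID, CUST_SRC_ID, CUST_EFF_FROM, CUST_EFF_TO, CUST_VALID)
SELECT
  CUST_ID, CUST_FIRST_NAME, CUST_LAST_NAME, CUST_GENDER, CUST_YEAR_OF_BIRTH, CUST_MARITAL_STATUS, CUST_STREET_ADDRESS, CUST_POSTAL_CODE, CUST_CITY, CUST_CITY_ID, CUST_STATE_PROVINCE, CUST_STATE_PROVINCE_ID, COUNTRY_ID, CUST_MAIN_PHONE_NUMBER, CUST_INCOME_LEVEL, CUST_CREDIT_LIMIT, CUST_EMAIL, CUST_TOTAL, CUST_TOTAL_ID, CUST_SRC_ID, CUST_EFF_FROM, CUST_EFF_TO, CUST_VALID
FROM CUSTOMERS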


The above query could also have been written as INSERT INTO ~Table_Name~ SELECT * FROM CUSTOMERS, but that only works if the column order is identical for all the tables involved. To be safe, list all the column names in both the INSERT and the SELECT part, in the same order.

Create a SQL_Audit table to maintain a log of all the tables loaded for every run. This table can be customized further based on additional requirements. The SQL_Audit table definition is shown below.
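Based on the ports described in the following steps, a plausible definition is sketched below; the column names and datatypes are assumptions, apart from SQL_Error, which is referenced further on:

CREATE TABLE SQL_AUDIT
(
  TABLE_NAME         VARCHAR2(30),    -- name of the target table loaded (assumed name)
  NUM_ROWS_AFFECTED  NUMBER,          -- value of the NumRowsAffected output port (assumed name)
  SQL_ERROR          VARCHAR2(4000),  -- SQL error text, or 'No errors occurred'
  LOAD_DATE          DATE             -- load timestamp generated in EXP_Audit (assumed name)
);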






Create an expression transformation EXP_Audit to generate the load date timestamp and also customize the error message if no SQL errors are encountered. The expression transformation ports are shown below.





The expression in the SQLError_out port is: IIF(ISNULL(SQLError_in), 'No errors occurred', SQLError_in).
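For the load timestamp, an expression as simple as SYSDATE in the load date output port would do; SESSSTARTTIME is an alternative if the session start time is preferred (the exact port name here is an assumption).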

Connect the output ports from the SQL transformation to the SQL_Audit table. The complete mapping is shown below.





After running the session for the first time, the data loaded in the SQL_Audit table is shown below. As seen from the table, all three target tables are loaded correctly with 55,500 rows each from the source Customers table.


The target tables have a primary key defined on CUST_ID, so if we rerun the job, the load should fail for all three tables with a unique constraint violation. The session itself will still succeed because the SQL_Audit target table is loaded correctly with 3 rows, but on checking the SQL_Audit table it is evident that no data was loaded into the 3 target tables; the reason is captured in the SQL_Error column of the SQL_Audit table as shown below.


The complete SQL Error loaded in the SQL_Error column for the CUSTOMERS_FINANCE target table is shown below.
ORA-00001: unique constraint (SH.CUSTOMERS_FINANCE_PK) violated
Database driver error...
Function Name : executeDirect
SQL Stmt : INSERT INTO CUSTOMERS_FINANCE (CUST_ID, CUST_FIRST_NAME, CUST_LAST_NAME, CUST_GENDER, CUST_YEAR_OF_BIRTH, CUST_MARITAL_STATUS, CUST_STREET_ADDRESS, CUST_POSTAL_CODE, CUST_CITY, CUST_CITY_ID, CUST_STATE_PROVINCE, CUST_STATE_PROVINCE_ID, COUNTRY_ID, CUST_MAIN_PHONE_NUMBER, CUST_INCOME_LEVEL, CUST_CREDIT_LIMIT, CUST_EMAIL, CUST_TOTAL, CUST_TOTAL_ID, CUST_SRC_ID, CUST_EFF_FROM, CUST_EFF_TO, CUST_VALID) SELECT CUST_ID, CUST_FIRST_NAME, CUST_LAST_NAME, CUST_GENDER, CUST_YEAR_OF_BIRTH, CUST_MARITAL_STATUS, CUST_STREET_ADDRESS, CUST_POSTAL_CODE, CUST_CITY, CUST_CITY_ID, CUST_STATE_PROVINCE, CUST_STATE_PROVINCE_ID, COUNTRY_ID, CUST_MAIN_PHONE_NUMBER, CUST_INCOME_LEVEL, CUST_CREDIT_LIMIT, CUST_EMAIL, CUST_TOTAL, CUST_TOTAL_ID, CUST_SRC_ID, CUST_EFF_FROM, CUST_EFF_TO, CUST_VALID FROM CUSTOMERS
Database driver error...
Function Name : ExecuteDirect

Any new target tables can simply be added to the Target_Tables_List.txt flat file, without any changes to the mapping or the session.

Thursday, May 31, 2012

Matching Data Between Two Sources Using Part of a String

The illustration below explains different scenarios to match data between two sources using part of a string.

Let us understand the requirement with an example. Click on each image if needed to open a larger view of the image.

The MASTER_LIST source contains a list of strings that need to be searched against the BANK_LIST source. The two sources are shown below.


For instance, if the search string is 'INDIA' in the MASTER_LIST source, the matching rows in the BANK_LIST source will be 'Indian Bank' and 'STATE BANK OF INDIA' since both bank names contain the string 'INDIA'. The resultant rows will be loaded into the BANK_MASTER table as shown below.


The matching between the two sources should not be case-sensitive.


Scenario 1: When both the sources are relational tables and reside in the same database. The database for this example is Oracle.

The INSTR function returns the position of a substring within a string. If INSTR returns 0, the substring is not present in the string. A SQL query that matches the data between the two sources is given below.

SELECT
mstr.master_id,
bank.bank_id,
bank.bank_name
FROM
master_list mstr, bank_list bank
WHERE
INSTR(bank.bank_name, mstr.search_string) > 0
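As a quick worked example of the function: INSTR('STATE BANK OF INDIA', 'INDIA') returns 15 (a match), whereas INSTR('Indian Bank', 'INDIA') returns 0, because INSTR is case-sensitive.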

The above query returns only one row as shown below.


Since the match should not be case-sensitive, the query can be re-written as below by converting both the string (bank.bank_name) and the substring (mstr.search_string) to lower case. This query should be added to the Sql Query section in the Source Qualifier properties.

SELECT
mstr.master_id,
bank.bank_id,
bank.bank_name
FROM
master_list mstr, bank_list bank
WHERE
INSTR(LOWER(bank.bank_name), LOWER(mstr.search_string)) > 0

The mapping is a simple pass-through mapping as shown below.


The final output will be the same as shown in the BANK_MASTER table below with the desired four resultant rows.




Scenario 2: When both the sources are flat files or one source is a flat file and the other is a relational table.

In this example, both the sources are flat files. The mapping implementation is shown below.



After the Source Qualifier transformations, create two expression transformations as shown above. Create an output port MASTER_KEY in the expression transformation EXP_Master_List and an output port BANK_KEY in EXP_Bank_List, and in the expression editor set each of them to the integer value 1. These ports serve as a dummy join key for merging the rows from the two flat files in the joiner transformation.


In the joiner transformation, designate the MASTER_LIST source as the "Master" source, since it has fewer rows than the BANK_LIST source. The Joiner condition is shown below. The Join type is "Normal Join".
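With the dummy ports created above, the join condition boils down to MASTER_KEY = BANK_KEY.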





Because both sides carry the same constant key, the joiner transformation effectively performs a cross join, i.e. every row from one source is paired with every row from the other. We then need to keep only the rows that meet the matching criteria, which is done with a filter transformation whose Filter Condition is shown below.
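The condition mirrors the SQL used in Scenario 1, written in Informatica expression language, so it would be along the lines of:

INSTR(LOWER(BANK_NAME), LOWER(SEARCH_STRING)) > 0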





Scenario 3: When both the sources are relational tables but reside in different databases.



A similar approach to Scenario 2 could be used here too, but that would mean joining every row of the two source tables in the joiner transformation. Instead, we can issue a query against the BANK_LIST table similar to the one in Scenario 1, except that the query cannot reference the MASTER_LIST table since it resides in a different database.


The SQL transformation can be used to process queries midstream and to get the matching rows from the BANK_LIST table.



The source definitions for the MASTER_LIST and BANK_LIST tables are shown below. The two tables are in separate databases and there is no database link between them.
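In terms of structure, the two sources boil down to something like the following (the datatypes and sizes here are assumptions):

CREATE TABLE MASTER_LIST
(
  MASTER_ID      NUMBER,         -- key carried through to the target
  SEARCH_STRING  VARCHAR2(100)   -- string to search for, e.g. 'INDIA'
);

CREATE TABLE BANK_LIST
(
  BANK_ID    NUMBER,          -- key carried through to the target
  BANK_NAME  VARCHAR2(100)    -- bank name to be searched, e.g. 'STATE BANK OF INDIA'
);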




The BANK_MASTER target definition is shown below.
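Its columns are simply the ones populated at the end of the mapping (again, the datatypes are assumptions):

CREATE TABLE BANK_MASTER
(
  MASTER_ID  NUMBER,
  BANK_ID    NUMBER,
  BANK_NAME  VARCHAR2(100)
);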




Create a new mapping. Drag the MASTER_LIST source definition and BANK_MASTER target definition into the Mapping Designer workspace as shown below.




Create a SQL transformation SQL_Get_Bank_Details as shown below. Click Create to proceed.




Proceed with the default settings as shown below. Click OK and Done to continue.





The SQL transformation needs to run in Query Mode since the SQL queries issued to the BANK_LIST table are dynamic: the MASTER_LIST table contains two search strings, 'INDIA' and 'AMERICA', so two queries will be issued, as given below.



SELECT BANK_ID, BANK_NAME
FROM
BANK_LIST
WHERE
INSTR(LOWER(BANK_NAME), LOWER('INDIA')) > 0;


SELECT BANK_ID, BANK_NAME
FROM
BANK_LIST
WHERE
INSTR(LOWER(BANK_NAME), LOWER('AMERICA')) > 0;


Drag the MASTER_ID and SEARCH_STRING ports from the Source Qualifier to the SQL transformation as shown below.





Double click on the SQL transformation to edit it. Go to the SQL Ports tab. Uncheck SEARCH_STRING as an output port, since it is not required in the target, as shown below. Only MASTER_ID needs to be passed to the target, so it remains an Input/Output port.




Add two SQL output ports, BANK_ID and BANK_NAME, as shown below, ensuring that the correct Native Type and Precision are selected for each.





Next, click on the section highlighted in red above to open the SQL Editor, which will contain the SQL query that gets issued midstream. Type the query as shown below. Ensure that the order of the fields in the SELECT clause matches the order of the SQL output ports.




Now, since the search string needs to change dynamically, as shown in the two queries above, we need to use String Substitution. Click on the SEARCH_STRING port under String Substitution to add it to the query as shown below.




Modify the query as shown below, so that it matches the above two queries that need to be issued to the BANK_LIST table.
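With the string substitution in place, the query would read roughly as follows; the single quotes around ~SEARCH_STRING~ assume that the port value is substituted as-is and therefore needs to be turned into a string literal in the generated SQL:

SELECT BANK_ID, BANK_NAME
FROM
BANK_LIST
WHERE
INSTR(LOWER(BANK_NAME), LOWER('~SEARCH_STRING~')) > 0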





Click OK to continue. Link the MASTER_ID_output, BANK_ID and BANK_NAME ports from the SQL transformation to the target definition. The complete mapping is shown below.





In the session task, specify the correct relational connections. A relational connection (Database_B) needs to be specified for the SQL transformation too. As shown below, the MASTER_LIST table is in Database_A, the BANK_LIST table is in Database_B and the BANK_MASTER target table is in Database_C.