Labels

Apache Hadoop (3) ASP.NET (2) AWS S3 (2) Batch Script (3) BigQuery (21) BlobStorage (1) C# (3) Cloudera (1) Command (2) Data Model (3) Data Science (1) Django (1) Docker (1) ETL (7) Google Cloud (5) GPG (2) Hadoop (2) Hive (3) Luigi (1) MDX (21) Mongo (3) MYSQL (3) Pandas (1) Pentaho Data Integration (5) PentahoAdmin (13) Polybase (1) Postgres (1) PPS 2007 (2) Python (13) R Program (1) Redshift (3) SQL 2016 (2) SQL Error Fix (18) SQL Performance (1) SQL2012 (7) SQOOP (1) SSAS (20) SSH (1) SSIS (42) SSRS (17) T-SQL (75) Talend (3) Vagrant (1) Virtual Machine (2) WinSCP (1)

Friday, April 10, 2015

Talend: Incremental Load using Talend ETL tool

Consider you want to perform an incremental load between Source and Target table. Let assume you have both source and target table in MS SQL database. Perform the below steps for incremental loading:

1. Create a reference table for incremental load in target database (e.g., dbo.Incrementalload with columns TableName and Incrementaldate1).  Your table will look like this:

---------------------------------------------------------------
TableName   |  Incrementaldate1 | Incrementaldate2
----------------------------------------------------------------
Event            | 2000-01-01           | NULL
----------------------------------------------------------------
User             | 2010-01-01            | NULL
----------------------------------------------------------------


2. Design your Talend job as like below:


3. Create a context variable 'Incrementaldate1'

4. In tMSSQLInput_2: Get incremental date to assign to context variable.


 5. tJavaRow_1: Assign incremental date to context variable.



6. tMSSQLInput_3: Retrive data from source.



In WHERE condition call the context variable.

7. tMSSQLOutput_2: Load data to target table.

8. tMSSQLRow_2 : Update the Incrementaldate1 in IncrementalLoad table with Talend Getdate()



So in next load, only the data with modified date greater than incremental date will be retrieved and loaded.

Talend: How to add StageModifiedDate and a CheckSum value in Text file Output

Consider you have a text file with User details, the text file doesn't have any data about created or modified date, in that case when you need to move this data to OLEDB database or to Data warehouse you may need stage modified date and Checksum value to improve the performance of the data load.

In this case, make sure you pass StageModifiedDate and CheckSum value to your staging tables, so those values helps you to perform incremental load to target main table.

To do this with Talend, design your job as like below:



In tMap, create a new variable 'stagemodifieddate' as shown below (make sure you use date format and output date pattern provide the required date format), this will add new output StageModifiedDate to schema.



To get CheckSum output, add tAddCRCRow component, and check all those columns that may undergo update, in below example I checked all. By doing so a new CRC column will be generated.



To verify tAddCRCRow, click on Edit Schema, you can see new column CRC


 Now, in text output you can add these two columns along with other data.