Labels

Apache Hadoop (3) ASP.NET (2) AWS S3 (2) Batch Script (3) BigQuery (21) BlobStorage (1) C# (3) Cloudera (1) Command (2) Data Model (3) Data Science (1) Django (1) Docker (1) ETL (7) Google Cloud (5) GPG (2) Hadoop (2) Hive (3) Luigi (1) MDX (21) Mongo (3) MYSQL (3) Pandas (1) Pentaho Data Integration (5) PentahoAdmin (13) Polybase (1) Postgres (1) PPS 2007 (2) Python (13) R Program (1) Redshift (3) SQL 2016 (2) SQL Error Fix (18) SQL Performance (1) SQL2012 (7) SQOOP (1) SSAS (20) SSH (1) SSIS (42) SSRS (17) T-SQL (75) Talend (3) Vagrant (1) Virtual Machine (2) WinSCP (1)

Monday, December 26, 2011

Define Data Warehouse


A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process.

Subject-Oriented: A data warehouse can be used to analyze a particular subject area. For example, "sales" can be a particular subject.

Integrated: A data warehouse integrates data from multiple data sources. For example, source A and source B may have different ways of identifying a product, but in a data warehouse, there will be only a single way of identifying a product.

Time-Variant: Historical data is kept in a data warehouse. For example, one can retrieve data from 3 months, 6 months, 12 months, or even older data from a data warehouse. This contrasts with a transactions system, where often only the most recent data is kept. For example, a transaction system may hold the most recent address of a customer, where a data warehouse can hold all addresses associated with a customer.

Non-volatile: Once data is in the data warehouse, it will not change. So, historical data in a data warehouse should never be altered.

Different Data Models in Data Warehouse

There are three types of data modelling used to construct a data warehouse, namely:
  1. Conceptual Data Model
  2. Logical Data Model
  3. Physical Data Model
Before going in depth, let us consider some importatnt objects created while designing a database:
Entity Names, Entity relationships, Attributes, Primary and Foriegn keys, Table names, Column names and Data types.

The below chart explains about the difference between the data models based on the database objects:


Conceptual Data Model:

Conceptual data model describes about highest level of relationships between entities. It includes only the entities names and their relationships. No table names, primary keys, foreign keys and column names are referred in this model.

Logical Data Model:

Logical data model provides more information on database structure, it include entities and their relationships, Primary and forign keys and provide more possibilities of handling normalization at earlier stage.



Physical Data Model:
A physical database model shows all table structures, including column name, column data type, column constraints, primary key, foreign key, and relationships between tables. Denormalization occurs based on user requirements in this stage.



























We can see that the complexity increases from conceptual to logical to physical. This is why we always first start with the conceptual data model (so we understand at high level what are the different entities in our data and how they relate to one another), then move on to the logical data model (so we understand the details of our data without worrying about how they will actually implemented), and finally the physical data model (so we know exactly how to implement our data model in the database of choice). In a data warehousing project, sometimes the conceptual data model and the logical data model are considered as a single deliverable.

Wednesday, December 21, 2011

SQL Agent Error: Unable to Create or Edit Step in SQL Jobs

In SQL Server 2008\R2 we may come cross this error. The user may not able to edit or create new steps due to the below error :

Creating an instance of the COM component with CLSID {AA40D1D6-CAEF-4A56-B9BB-D0D3DC976BA2} from the IClassFactory failed due to the following error: c001f011. (Microsoft.SqlServer.ManagedDTS)..”


The above error can be solved by executing below command in Command Prompt:

C:\Windows\System32>regsvr32 “c:\Program Files\Microsoft SQL Server\100\DTS\Binn\DTS.dll”

Tuesday, December 20, 2011

Excel not Supported in 64bit: DTS_E_OLEDB_EXCEL_NOT_SUPPORTED

[Connection manager "SourceExcel"] Error: SSIS Error Code DTS_E_OLEDB_EXCEL_NOT_SUPPORTED: The Excel Connection Manager is not supported in the 64-bit version of SSIS, as no OLE DB provider is available.

When you come across the above error in SSIS while using Excel Connection perform the following action:

Go to the property page of the project and set Run64BitrunTime to 'False'





Also in SQL Server Job, which was created to execute the package containing Excel Connection we need to check "Use 32 bit runTime" check box by navigating to the respective Steps > Edit > ExecutionOption.


Monday, December 19, 2011

SQL Query to get Bank Statement As on Date

DECLARE @Transactions TABLE(

[Date] DateTime,

Remarks VARCHAR(100),

CRAmt DECIMAL(18,3),

DRAmt DECIMAL(18,3),

Balance DECIMAL(18,3),

[BalanceType] CHAR(2))

INSERT INTO @Transactions VALUES(GETDATE()-312,'Remark 1',100, NULL,NULL,NULL)

INSERT INTO @Transactions VALUES(GETDATE()-212,'Remark 2',NULL, 50,NULL,NULL)

INSERT INTO @Transactions VALUES(GETDATE()-12,'Remark 3', 200, NULL,NULL,NULL)

SELECT * FROM @Transactions

DECLARE @DrTotal AS DECIMAL(18,3) = 0

DECLARE @CrTotal AS DECIMAL(18,3) = 0

DECLARE @Balance AS DECIMAL(18,3) = 0

UPDATE @Transactions

SET @DrTotal = @DrTotal + ISNULL(DRAmt,0),

@CrTotal = @CrTotal + ISNULL(CRAmt,0),

@Balance = @Balance + ISNULL(DRAmt,0) - ISNULL(CRAmt,0),

Balance = ABS(@Balance),

[BalanceType] = CASE WHEN @Balance > 0 THEN 'Dr' ELSE 'Cr' END



SELECT [Date],Remarks, CRAmt , DRAmt , Balance , [BalanceType] FROM @Transactions

UNION ALL

SELECT GETDATE() [Date],'Total' Remarks

, @CrTotal CRAmt , @DrTotal DRAmt

, ABS(@Balance) Balance , CASE WHEN @Balance > 0 THEN 'Dr' ELSE 'Cr' END [BalanceType]

SSIS Merge Join for Incremental Load

We can use Merge Join transaction to achieve incremental loading of data:

Add a Data Flow task containing the following process:

Step 1: Add two OLEDBSource configured to Customer table in Source DB and DimCustomer table in Target DB.

Step 2: Add two Sort transaction (see Image 1 below) and sort data by Customer ID.




















Step 3: Add a Merge Join, and config as shown below:



















Step 4: Add a Conditional Split and create a condition to get CustomerID with Null values.




















Step 5: Now redirect rows containing CustomerID with Null values to OLEDB Insert DimCustomer and other records to OLEDB Update DimCustomer (OLEDB Command).

Step 6: In OLEDB Command add the below query and in SQL Command and perform the mapping as shown in images below:

UPDATE [Demo].[dbo].[DimCustomer]
   SET [CustomerName] = ?
      ,[MaritalStatus] = ?
      ,[Gender] = ?
      ,[AddressLine1] = ?
      ,[AddressLine2] = ?
      ,[Phone] = ?
 WHERE [CustomerID] = ?







Now execute the package to perform incremental load.




Friday, December 16, 2011

SQL to get Running Total of Transactions

The below query helps the user to get the running total of transaction based on date, consider below tables for example:



Step 1: Create a view with following query (Query 1):

USE [Demo]
GO
SET
ANSI_NULLS ON
GO
SET
QUOTED_IDENTIFIER ON
GO
CREATE
VIEW [dbo].[vwFactTransactionAgg]
AS
SELECT
TOP (100) PERCENT CustomerKey, DateKey, SUM(Amount) AS Amount
FROM dbo.FactTransaction
GROUP BY CustomerKey, DateKey
ORDER BY CustomerKey, DateKey
GO


Step 2: Execute the below Query



SELECT
T.CustomerKey ,
T.DateKey ,
T.Amount ,
RT.RunningTotal
FROM vwFactTransactionAGG T
CROSS APPLY (SELECT SUM(Amount) AS runningTotal
FROM vwFactTransactionAgg
WHERE DateKey <= t.Datekey AND CustomerKey = t.CustomerKey
) AS RT
ORDER BY T.CustomerKey, DateKey


Thursday, December 15, 2011

DimDate - Date Dimension

The below script helps to create valid date dimension attribures:


USE

[DatabaseName]

GO

IF OBJECT_ID('Date','U') IS NOT NULL

DROP TABLE Date

GO

 

CREATE TABLE [dbo].[Date](

[DateSK] [int] NOT NULL,

[FullDate] [datetime] NOT NULL,

[DateName] [char](11) NOT NULL,

[DayOfWeek] [tinyint] NOT NULL,

[DayNameOfWeek] [char](10) NOT NULL,

[DayOfMonth] [tinyint] NOT NULL,

[DayOfYear] [smallint] NOT NULL,

[WeekdayWeekend] [char](7) NOT NULL,

[WeekOfYear] [tinyint] NOT NULL,

[MonthName] [char](10) NOT NULL,

[MonthOfYear] [tinyint] NOT NULL,

[CalendarQuarter] [tinyint] NOT NULL,

[CalendarYear] [smallint] NOT NULL,

[CalendarYearMonth] [char](7) NOT NULL,

[CalendarYearQtr] [char](15) NOT NULL,

CONSTRAINT PK_Date_DateID PRIMARY KEY (DateSK)

) ON [PRIMARY]

GO

RAISERROR
('Table Date created successfully!',0,1)

DECLARE @StartDate datetime, @EndDate datetime

-- Set StartDate and EndDate as per your requirement

SELECT @StartDate = '2009-01-01', @EndDate = '2010-12-31'

WHILE (@StartDate <= @EndDate )

BEGIN

INSERT
INTO Date

SELECT

CAST(CONVERT(varchar(8),@StartDate,112) AS int) DateSK

,@StartDate AS [Date]

,CONVERT(varchar(20),@StartDate,106) AS DateName

,DATEPART(DW,@StartDate) [DayOfWeek]

,DATENAME(DW,@StartDate) [DayNameOfWeek]

,DATENAME(DD,@StartDate) [DayOfMonth]

,DATENAME(DY,@StartDate) [DayOfYear]

,CASE WHEN DATEPART(DW,@StartDate) IN (1,7) THEN 'WeekEnd'

ELSE 'WeekDay' END [WeekdayWeekend]

,DATEPART(WW,@StartDate) [WeekOfYear]

,DATENAME(MM ,@StartDate) [MonthName]

,DATEPART(MM ,@StartDate) [MonthOfYear]

,DATEPART(QQ,@StartDate) [CalendarQuarter]

,DATEPART(YY ,@StartDate) [CalendarYear]

,DATENAME(YY,@StartDate)+'-'+RIGHT('0'+CAST(Month(@StartDate) as varchar),2) [CalendarYearMonth]

,DATENAME(YY,@StartDate)+'-Q'+DATENAME(QQ,@StartDate) [CalendarYearQtr]

SET @StartDate = @StartDate +1

END

GO