Labels

Apache Hadoop (3) ASP.NET (2) AWS S3 (2) Batch Script (3) BigQuery (21) BlobStorage (1) C# (3) Cloudera (1) Command (2) Data Model (3) Data Science (1) Django (1) Docker (1) ETL (7) Google Cloud (5) GPG (2) Hadoop (2) Hive (3) Luigi (1) MDX (21) Mongo (3) MYSQL (3) Pandas (1) Pentaho Data Integration (5) PentahoAdmin (13) Polybase (1) Postgres (1) PPS 2007 (2) Python (13) R Program (1) Redshift (3) SQL 2016 (2) SQL Error Fix (18) SQL Performance (1) SQL2012 (7) SQOOP (1) SSAS (20) SSH (1) SSIS (42) SSRS (17) T-SQL (75) Talend (3) Vagrant (1) Virtual Machine (2) WinSCP (1)

Monday, April 11, 2016

AWS S3 to Redshift Incremental load


Performing incremental load between AWS S3 and Redshift table.

Step 1: Create a stageing table in AWS Redshift database similar to your  target table. Execute below command to load data from S3 to Staging table.


QUERY 1

copy stage.logtracking from 's3://Bucket/FolderName'
credentials aws_access_key_id="S3Accesskey";aws_secret_access_key="S3Secretkey"
delimiter   '|'  ACCEPTINVCHARS iGNOREHEADER 1;


 Step 2: Run the below query to perform incremental load similar to UPSERT/MERGE statement

QUERY 2

-- Start a new transaction
BEGIN TRANSACTION;

-- Update the target table using an inner join with the staging table
-- The join includes a redundant predicate to collocate on the distribution key
-- A filter on saletime enables a range-restricted scan on SALES

UPDATE ga.logtracking
SET  usertype= stage.logtracking.usertype, sessioncount= stage.logtracking.sessioncount
   , dayssincelastsession = stage.logtracking.dayssincelastsession, userdefinedvalue = stage.logtracking.userdefinedvalue
   , date = CAST(stage.logtracking.date AS DATETIME)
   , userid = stage.logtracking.userid
   , users = stage.logtracking.users
   , newusers = stage.logtracking.newusers
   , percentnewsessions = stage.logtracking.percentnewsessions
   , sessionsperuser = stage.logtracking.sessionsperuser
   , timeonsite = stage.logtracking.timeonsite
   , checksum = stage.logtracking.checksum
FROM stage.logtracking
WHERE ga.logtracking.checksum = stage.logtracking.checksum;

-- Delete matching rows from the staging table
-- using an inner join with the target table

DELETE FROM stage.logtracking
USING ga.logtracking
WHERE ga.logtracking.checksum = stage.logtracking.checksum;

-- Insert the remaining rows from the staging table into the target table
INSERT INTO ga.logtracking
SELECT usertype, sessioncount, dayssincelastsession, userdefinedvalue, CAST(date AS DATETIME), userid
, users, newusers, percentnewsessions, sessionsperuser, timeonsite, checksum
FROM stage.logtracking;

-- End transaction and commit
END TRANSACTION;

Monday, March 21, 2016

How to filter MDX result Set by Current Year?



Below MDX query is to perform dynamic monthly comparison of a measure with current and previous year months,  to achieve this, a current year parameter is applied as like below

WITH
MEMBER [Measures].[ActiveLearnerPercentPreviousYear] as ([Measures].[Active Learner %],
PARALLELPERIOD ([MonthlyPeriod].[Calendar Period].[Month],12,[MonthlyPeriod].[Calendar Period].CurrentMember)),  FORMAT_STRING ='Percent'
MEMBER [Measures].[ActiveLearnerPercentCurrentYear]   as([Measures].[Active Learner %])
SELECT  {[Measures].[ActiveLearnerPercentCurrentYear] , [Measures].[ActiveLearnerPercentPreviousYear] } ON COLUMNS ,
([MonthlyPeriod].[Calendar Period].[Month], [MonthlyPeriod].[Month No].[Month No]) ON ROWS
FROM [MonthlyMetrics]
WHERE (
      strtomember("[MonthlyPeriod].[Year].&[" + Format(now(), "yyyy") + "]")
      )



Wednesday, March 16, 2016

When We Start Pseudo Distributed Service We Face 'unregognized service' Error Message?

 
We may come across the below issue when we start HDFS service in cloudera:

$ for x in `cd /etc/init.d ; ls hadoop-hdfs-*` ; do sudo service $x start ; done error unrecognized service 

I was able to overcome this issue by running below commands:
  • $ sudo service hadoop-hdfs-namenode start 
  • $ sudo service hadoop-hdfs-datenode start 
  • $ sudo service hadoop-hdfs-secondarynamenode start

Friday, February 12, 2016

Creating Named Set for Current Month, Last 6 Months, and Parallel Period Months


Assume you have a date attributes in the following format:

[Date].[Calendar Date].[Year].&[1999].&[1].&[1].&[01/16/1999]

First create a named set [Current month], with below script. Then followed by  Last 6 Months and parallel period.

Current Month:
StrToMember('[Date].[Calendar Date].[Date].&[' + FORMAT(Now(), "MM/dd/yyyy") + ']').Parent

 Last 6 Months:
TAIL({NULL: [Current Month].ITEM(0)}, 6)

 Last 6 Months - Previous Year
TAIL({NULL: PARALLELPERIOD([Date].[Calendar Date].[Year], 1, [Current Month].ITEM(0))}, 6)

Wednesday, February 10, 2016

SSRS - How to Split a Single Column to Multiple Columns

Let us assume we have dataset with below fields:
 
StudentName                Score
A 2,90,100,8
B 1,80,100,7
C 1,90,100,2

Score can be defined as (attemptno, scored, total, rank)
Now, I want to split this score column to four columns as 'Attempts', 'Scored', 'Total', and 'Rank'. To add this column we need to add below expression to text boxes.

=(Split(Fields!Score.Value,",")).GetValue(0)
=(Split(Fields!Score.Value,",")).GetValue(1)
=(Split(Fields!Score.Value,",")).GetValue(2)
=(Split(Fields!Score.Value,",")).GetValue(3) 
 











































Sunday, February 7, 2016

Passing Config File Variables in Other Python Files

Assume you have a config file (ETLConfig.cfg) with below texts (variables):

[ETL]
etldbname = *****
etllogin = ******
etlpassword = ****
etlport = ****
etlserver = *****

[FPATH]
filepath = D:/ETLFile/

Now we need to use these variable in other python files.

Create a python file (test.py) with below script:

1. Read variables from ETLConfig.cfg using configparser and write those variables to connection strings in your file:

import configparser
import sys
sys.path.append('C:\Python34\Lib\site-packages')
import pprint
import time
import datetime
import os
import csv
import sqlite3
from datetime import date, timedelta

import mysql.connector
from mysql.connector import errorcode

config = configparser.RawConfigParser()
config.read('ETLConfig.cfg')


#GET ETL CONFIG
m1servername = config.get('ETL', 'etlserver')
m1dbname = config.get('ETL', 'etldbname')
m1login = config.get('ETL', 'etllogin')
m1password = config.get('ETL', 'etlpassword')
m1port = config.get('ETL', 'etlport')

cnx = {
  'user': m1login,
  'password': m1password,
  'host': m1servername,
  'database': m1dbname,
  'raise_on_warnings': True,
  'use_pure': False
  }

melcnx=mysql.connector.connect(**cnx)
print(melcnx)

#GET ETL FILEPATH CONFIG

etlfilepath = config.get('FPATH', 'filepath')
print(etlfilepath )

2. On executing the python file (test.py), you can see the printed connection strings having same values available in CFG file.

How to Create Configuration File (*.cfg) Using Python?

The below code help you to create a sample config file using python script. The below code is tested with python 3.4 version:


Step 1:  Check you have python 3.* installed in your machine and also you have python configparser package installed.

Step 2: Create a python file with below scripts, say the python file is saved as CreateConfig.py.

import configparser

config = configparser.RawConfigParser()

#MYSQL_MEL
config.add_section('ETL')
config.set('ETL', 'etldbname', '*****')
config.set('ETL', 'etllogin', '******')
config.set('ETL', 'etlpassword', '****')
config.set('ETL', 'etlport', '****')
config.set('ETL', 'etlserver', '*****')

#FilePath
config.add_section('FPATH')
config.set('FPATH', 'filepath', 'D:/ETLFile/')


# Writing our configuration file to 'ETLConfig.cfg'
with open('ETLConfig.cfg', 'wt') as configfile:
    config.write(configfile)

Step 3: On executing CreateConfig.py file, you can see in your working directory, a new CFG file named ETLConfig.cfg is created with below texts.

[ETL]
etldbname = *****
etllogin = ******
etlpassword = ****
etlport = ****
etlserver = *****

[FPATH]
filepath = D:/ETLFile/

This ETLConfig file can be called in other python files, and the variables can be used as Global Variables in all packages.