
Monday, May 23, 2011

Stored procedure to rethrow the original error



USE [DBNAME]
GO
/****** Object:  StoredProcedure [dbo].[f9001_pcrtw_Error]    Script Date: 05/24/2011 12:13:42 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- =====================================================================
-- Author   :Mahadevan
-- Create date  :09/06/2010
-- Description  :The procedure rethrows the original error
-- Modified Date :Modified By   Reason
-- =====================================================================
CREATE PROCEDURE [dbo].[f9001_pcrtw_Error]
AS
BEGIN
    SET NOCOUNT ON
    -- Return if there is no error information to retrieve.
    IF ERROR_NUMBER() IS NULL
        RETURN;
    DECLARE @ErrorMessage    NVARCHAR(4000)
    DECLARE @ErrorNumber     INT
    DECLARE @ErrorSeverity   INT
    DECLARE @ErrorState      INT
    DECLARE @ErrorLine       INT
    DECLARE @ErrorProcedure  NVARCHAR(200)
    -- Assign variables to error-handling functions that
    -- capture information for RAISERROR.
    SELECT @ErrorNumber    = ERROR_NUMBER(),
           @ErrorSeverity  = ERROR_SEVERITY(),
           @ErrorState     = ERROR_STATE(),
           @ErrorLine      = ERROR_LINE(),
           @ErrorProcedure = ISNULL(ERROR_PROCEDURE(), '-');
    -- Build the message string that will contain the original
    -- error information.
    SELECT @ErrorMessage = N'Error %d, Level %d, State %d, Procedure %s, Line %d, '
                         + 'Message: ' + ERROR_MESSAGE();
    -- Raise an error: the msg_str parameter of RAISERROR will contain
    -- the original error information.
    RAISERROR
    (
        @ErrorMessage,
        @ErrorSeverity,
        1,
        @ErrorNumber,    -- parameter: original error number.
        @ErrorSeverity,  -- parameter: original error severity.
        @ErrorState,     -- parameter: original error state.
        @ErrorProcedure, -- parameter: original error procedure name.
        @ErrorLine       -- parameter: original error line number.
    )
    SET NOCOUNT OFF
END
GO

Use the procedure "f9001_pcrtw_Error" inside the CATCH block of your other stored procedures, as follows:

CREATE PROCEDURE <name>
AS
BEGIN
   BEGIN TRY
      <SQL statements>
   END TRY
   BEGIN CATCH
      EXEC f9001_pcrtw_Error
   END CATCH
END
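As a quick sanity check (assuming the procedure has been created in the current database), forcing a divide-by-zero error shows the wrapper rethrowing the original details:

BEGIN TRY
    SELECT 1/0  -- forces error 8134, divide by zero
END TRY
BEGIN CATCH
    EXEC dbo.f9001_pcrtw_Error
END CATCH

The CATCH block raises a message of the form "Error 8134, Level 16, State 1, Procedure -, Line ..., Message: Divide by zero error encountered."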
Sunday, May 22, 2011

How to shrink SQL Server Database

We can shrink a SQL Server database by shrinking its log file using the command below:

USE YourDatabase
GO
DECLARE @LogFileLogicalName SYSNAME
SELECT @LogFileLogicalName = name
FROM sys.database_files WHERE type = 1  -- type 1 = log file
PRINT @LogFileLogicalName
DBCC SHRINKFILE(@LogFileLogicalName, 1)
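To see how large each log is and how much of it is actually in use (before and after the shrink), DBCC SQLPERF can report log-space usage for every database on the instance:

-- Log size (MB) and percent used, per database
DBCC SQLPERF(LOGSPACE)

Note that under the FULL recovery model the log cannot shrink past its active portion; take a log backup first and then rerun DBCC SHRINKFILE.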

Thursday, May 19, 2011

How to Connect Multiple Data Sources in Different Domains through SSIS

How do you connect multiple data sources in different domains via SSIS?
Example:
Let us say I have an Excel source and need to load the data in Excel sheet 1 to three different databases (two in domain A and one in domain B). In this case we can use a Multicast transformation to load the data to all destinations, but we would need a connection manager per database, which becomes unmanageable as the number of data sources grows (e.g., 50 connections instead of 3). To avoid this we can use a For Each Loop container.
Follow the below steps:
1.      Create a table “dbo.Loop” and save the details of the data sources:
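The exact table layout is not shown here; based on the lookup query in step 5, a minimal sketch of dbo.Loop would be the following (column sizes are assumptions):

CREATE TABLE dbo.Loop
(
    ServerDW   VARCHAR(128) NOT NULL,  -- destination server/instance name
    DatabaseDW VARCHAR(128) NOT NULL,  -- destination database name
    Username   VARCHAR(128) NOT NULL,  -- SQL login used by the connection expression
    Password   VARCHAR(128) NOT NULL,  -- stored in plain text here; consider protecting this column
    IsActive   BIT NOT NULL DEFAULT 1  -- only rows with IsActive = 1 are looped
)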
2.      Create the following variables as shown below:
3.      Create the following connection managers as shown below:
·         SQL Server connection LoadSourceDetails: used to read the details of the databases to which data has to be loaded.
·         SQL Server connection Datasource: a connection manager configured against one of the destination data sources.
In this connection manager, go to the Expressions property and set the ConnectionString expression to the value below:
"Provider=SQLNCLI10.1;Data Source="+ @[User::LoopServer] +";Persist Security Info=True;Password="+ @[User::Password] +";User ID="+ @[User::Username] +";Initial Catalog="+ @[User::LoopDB] +";"
4.       Create a package as shown below:
5.       Configure Load DatasourceDetails as shown below:
SELECT ServerDW, DatabaseDW, Username, Password
FROM dbo.Loop
WHERE IsActive = 1


6.       In ResultSet map the User::LoadConnection variable:
7.       In the For Each Loop container, perform the below configurations:
In Variable mappings:
8.       In the Data Flow Task, create a simple flow as shown below:
Now execute the package; data will be loaded to all databases listed in the “dbo.Loop” table.

Monday, May 16, 2011

Change Data Capture in SQL Server

Change Data Capture (CDC): This is a new feature available in SQL Server 2008 to track INSERTED, UPDATED or DELETED records.
Steps to enable CDC:
1.       Check CDC Status in your databases.

Check the status of “is_CDC_enabled” column by running below query:
SELECT [Name], database_ID, is_cdc_enabled FROM SYS.DATABASES
WHERE [name] = 'BIPractice'


2.       To enable change data capture in your database execute the below query:

USE BIPractice  -- [dbname]
GO
EXEC sys.sp_cdc_enable_db
GO 

Verify CDC Method 1: Re-run the query from step 1 and confirm that the is_cdc_enabled column now shows 1.
Verify CDC Method 2: A new schema named cdc will have been created in the Schemas folder of your database.

Verify CDC Method 3: The following system tables will be created in the database:

Ø  cdc.captured_columns: returns the list of captured columns.
Ø  cdc.change_tables: returns the list of all tables enabled for capture.
Ø  cdc.ddl_history: contains the history of all DDL changes since capture was enabled.
Ø  cdc.index_columns: contains the indexes associated with the change tables.
Ø  cdc.lsn_time_mapping: maps LSN numbers (more on LSNs later) to time.
3.       Run the below query to enable CDC capture against the table which you want to track, e.g., dbo.Sales.
EXEC sys.sp_cdc_enable_table
    @source_schema = 'dbo',
    @source_name = 'sales',
    @role_name = 'db_reader',
    @supports_net_changes = 1
On successful execution, the following messages are displayed:
Job 'cdc.BIPractice_capture' started successfully.
Job 'cdc.BIPractice_cleanup' started successfully.
               
Ø  @source_schema is the schema name of the table that you want to enable for CDC
Ø  @source_name is the table name that you want to enable for CDC
Ø  @role_name is a database role which will be used to determine whether a user can access the CDC data; the role will be created if it doesn't exist.  You can add users to this role as required; you only need to add users that aren't already members of the db_owner fixed database role.
Ø  @supports_net_changes determines whether you can summarize multiple changes into a single change record; set to 1 to allow, 0 otherwise.
4.       In order to view the tables for which CDC is enabled run the below query:
 SELECT name, type, type_desc, is_tracked_by_cdc from sys.tables
    WHERE is_tracked_by_cdc = 1

Example:
1.       Create Table Sales
CREATE TABLE [dbo].[Sales](
       [RowKey] [int] IDENTITY(1,1) NOT NULL,
       [Firstname] [varchar](50) NOT NULL,
       [ProductCategory] [varchar](50) NOT NULL,
       [Product] [varchar](50) NULL,
       [CreatedDate] [date] NULL,
       [Country] [varchar](50) NULL,
       [State] [varchar](50) NULL,
       [City] [varchar](50) NULL,
       [Sales] [float] NULL,
 CONSTRAINT [PK_Sales] PRIMARY KEY CLUSTERED
(
       [RowKey] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
2.       Enable CDC for the Table Sales
3.       Insert a record
INSERT INTO [BIPractice].[dbo].[Sales]
([Firstname],[ProductCategory],[Product],[CreatedDate],[Country],[State],[City],[Sales])
VALUES('ABC','ABC','ABC','1/1/2010','India','Tamil Nadu','Chennai',1234)
Run the below Query:
DECLARE @begin_lsn BINARY(10), @end_lsn BINARY(10)
SELECT @begin_lsn = sys.fn_cdc_get_min_lsn('dbo_sales')
SELECT @end_lsn = sys.fn_cdc_get_max_lsn()

PRINT @begin_lsn
PRINT @end_lsn

SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_sales(@begin_lsn, @end_lsn, 'all');



The __$operation column values stand for:
1 = delete
2 = insert
3 = update (values before update)
4 = update (values after update)
(Try executing UPDATE and DELETE statements on the table and re-running the query.)
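Because the capture instance was enabled with @supports_net_changes = 1 (which requires a primary key, satisfied here by PK_Sales), multiple changes to the same row can also be collapsed into a single record using the net-changes function generated for the dbo_sales capture instance:

DECLARE @begin_lsn BINARY(10), @end_lsn BINARY(10)
SELECT @begin_lsn = sys.fn_cdc_get_min_lsn('dbo_sales')
SELECT @end_lsn = sys.fn_cdc_get_max_lsn()

-- One row per changed source row: only the net effect of all changes in the range
SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_sales(@begin_lsn, @end_lsn, 'all');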


--==============================================================================
--DISABLE CHANGE DATA CAPTURE IN TABLE
--==============================================================================

USE [BIPractice]
GO

EXEC sys.sp_cdc_disable_table
@source_schema = N'dbo',
@source_name   = N'sales',
@capture_instance = N'dbo_sales'
GO

--================================================================================

--==============================================================================
--DISABLE CHANGE DATA CAPTURE IN DATABASE
--==============================================================================

USE [BIPractice]
GO
EXEC sys.sp_cdc_disable_db
GO 
--******************************************************************************


Monday, May 9, 2011

Tips for Performance Tuning in SSIS package

Some Tips for Performance Tuning in SSIS package:

Data Flow Optimization Modes
The data flow task has a property called RunInOptimizedMode. When this property is enabled, any downstream component that doesn't use any of the source component's columns is automatically disabled, as is any unused column. The net result of enabling the RunInOptimizedMode property is improved performance of the entire data flow task.
SSIS projects also have a RunInOptimizedMode property. When set, it overrides the RunInOptimizedMode property of the individual data flow tasks at design time, so that all of the data flow tasks in the project run in optimized mode during debugging.
Buffers
A buffer is an in-memory dataset object utilized by the data flow engine to transform data.  The data flow task has a configurable property called DefaultBufferMaxRows, which is set to 10,000 by default, and a configurable property called DefaultBufferSize, which is set to 10 MB by default.  Additionally, the data flow task has an internal MaxBufferSize limit of 100 MB, which cannot be changed.
Buffer Sizing
When performance tuning a data flow, the goal should be to pass as many records as possible through a single buffer while efficiently utilizing memory.  What does "efficiently utilizing memory" mean?  SSIS estimates the size of a buffer row from the data source metadata at design time.  Optimally, the buffer row size should be as small as possible, which can be accomplished by employing the smallest possible data type for each column.  SSIS multiplies the estimated buffer row size by the DefaultBufferMaxRows setting to determine how much memory to allocate to each buffer in the data flow engine.  If this amount of memory exceeds the MaxBufferSize (100 MB), SSIS automatically reduces the number of buffer rows to fit within the 100 MB memory boundary.
The data flow task also has a MinBufferSize of 64 KB, which cannot be changed.  If the amount of memory estimated by SSIS to be allocated for each buffer is below 64 KB, SSIS will automatically increase the number of rows per buffer in order to exceed the MinBufferSize memory boundary.

Buffer Tuning

The data flow task exposes a custom log event called BufferSizeTuning.  When this event is enabled in the package's logging configuration, SSIS will add entries to the SSIS log indicating where it has adjusted the buffer size.  While buffer tuning, the goal should be to fit as many rows into a buffer as possible; thus, the value of DefaultBufferMaxRows should be as large as possible without the total buffer size exceeding 100 MB.

Parallelism

SSIS natively supports the parallel execution of packages, tasks, and transformations.  Therefore, parallelism can greatly improve the performance of a package when it is configured within the constraints of system resources.  A package has a property called MaxConcurrentExecutables, which sets the maximum number of threads that can execute in parallel per package.  By default, this is set to -1, which translates to the number of logical machine processors plus 2.  All or some of the operations in a package can execute in parallel.
Additionally, the data flow task has a property called EngineThreads, which defines how many threads the data flow engine can create and run in parallel.  This property applies equally to both the source threads that the data flow engine creates for sources and the worker threads that the engine creates for transformations and destinations.  For example, setting the EngineThreads property to 10 indicates that the data flow engine can create up to 10 source threads and 10 worker threads.

Thursday, May 5, 2011

SSRS Report Life Cycle

1. Report authoring: This stage involves creating the reports that are published, using the Report Definition Language (RDL). RDL is an XML-based industry standard for defining reports.
2. Report management: This involves managing the published reports as part of the web service. The reports are checked for consistency and performance, and they can be executed on demand or on a schedule.
In short, report management includes:
- Organizing reports and data sources,
- Scheduling report execution and delivery, and
- Tracking reporting history.
3. Report delivery: Reports can be delivered to consumers either on demand or based on an event. They can then view them in a web-based format.
4. Report security: It is important to protect reports as well as report resources. Therefore, Reporting Services implements a flexible, role-based security model.

Impersonate Account Setting in SSAS XMLA script

When you generate a Create SSAS Database XMLA script, you can see that the password field of the impersonation account does not appear anywhere in the script. If you execute the script as-is, the database will be created but will fail whenever you process it. To avoid this, add the account password used to run the cube, as shown below:

<ImpersonationInfo>
  <ImpersonationMode>ImpersonateAccount</ImpersonationMode>
  <Account>Domain\UserName</Account>
  <Password>Password</Password>
</ImpersonationInfo>

Configuring Reporting Services to Use SSIS Package Data

Reporting Services includes a data processing extension that retrieves data from a SQL Server Integration Services (SSIS) package. The SSIS data processing extension is not supported: it is a non-production feature that is off by default, and using it on a production server is not recommended at this time.
If you choose to enable the feature and use it on your report server, be aware that at run time the package will be processed under the security identity of the Report Server Windows service account or the Report Server Web service account. This has implications for how you secure the data sources that the package accesses. Specifically, you will need to grant the service accounts permission to read the data and perform any operation that you define in the package. Use this feature only if you take the following precautions:
·         Make sure the content of the package is controlled (for example, you can use the password protection features in SSIS or set permissions on the file). For more information, see Setting the Protection Level of Packages.
·         Design the package so that it can only perform those operations that are necessary for getting the data you want.
·         When configuring the report server service accounts, always use least-privilege user accounts or service accounts.
Before you can use the SSIS data processing extension, you must modify configuration files to enable the extension on your report server and in Report Designer. Follow the instructions in this topic to edit the configuration files.
To enable the data processing extension, you must remove comments from two sections in the RSReportDesigner.config file.
1.       Install Integration Services on the same computer on which you installed Reporting Services.
2.       Using Windows Explorer, locate <drive>:\Program Files\Microsoft Visual Studio 9.0\Common7\IDE\PrivateAssemblies.
3.       Using a text editor, open the RSReportDesigner.config file.
4.       Remove the comment marks (delete <!-- and --> characters from the beginning and end of the line) for this entry in the <Data> section:
<!-- <Extension Name="SSIS" Type="Microsoft.SqlServer.Dts.DtsClient.DtsConnection,Microsoft.SqlServer.Dts.DtsClient, Version=10.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91"/> -->

5.       In the <Designer> section, remove the comments from this entry:
<!-- <Extension Name="SSIS" Type="Microsoft.ReportingServices.QueryDesigners.GenericQueryDesigner,Microsoft.ReportingServices.QueryDesigners"/> -->

6.       Save the file.
After you modify the configuration file, the SSIS data processing extension becomes available as a data source type that you can select in the Data Source dialog box in Report Designer.

Before you can process a report that retrieves package data, you must remove comments from RSReportServer.config file.
1.       Open the RSReportServer.config file, located by default at <drive>:\Program Files\Microsoft SQL Server\MSRS10.MSSQLSERVER\Reporting Services\ReportServer.
2.       Remove the comment marks (delete <!-- and --> characters from the beginning and end of the line) for this entry:
<!-- <Extension Name="SSIS" Type="Microsoft.SqlServer.Dts.DtsClient.DtsConnection,Microsoft.SqlServer.Dts.DtsClient, Version=10.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91"/> -->

3.       Save the file.
After you edit RSReportServer.config, the SSIS data processing extension can be used on the report server. To verify that the data processing extension is available, open a data source property page in either Report Manager or Management Studio and view the list of data processing extensions in Data Source Type. If you modified the configuration file correctly, SSIS appears in the list of data source types.

Difference between SSRS 2005 and SSRS 2008

SSRS 2005 vs SSRS 2008

Tuesday, May 3, 2011

Script to Create Linked Server

/****** Object:  LinkedServer [PREMERATEST]    Script Date: 05/03/2011 13:57:55 ******/
EXEC master.dbo.sp_addlinkedserver @server = N'PREMERATEST', @srvproduct=N'SQLServer OLEDB Provider', @provider=N'SQLNCLI', @datasrc=N'PremeraBCP2'
GO
EXEC master.dbo.sp_serveroption @server=N'PREMERATEST', @optname=N'collation compatible', @optvalue=N'false'
GO
EXEC master.dbo.sp_serveroption @server=N'PREMERATEST', @optname=N'data access', @optvalue=N'true'
GO
EXEC master.dbo.sp_serveroption @server=N'PREMERATEST', @optname=N'dist', @optvalue=N'false'
GO
EXEC master.dbo.sp_serveroption @server=N'PREMERATEST', @optname=N'pub', @optvalue=N'false'
GO
EXEC master.dbo.sp_serveroption @server=N'PREMERATEST', @optname=N'rpc', @optvalue=N'false'
GO
EXEC master.dbo.sp_serveroption @server=N'PREMERATEST', @optname=N'rpc out', @optvalue=N'false'
GO
EXEC master.dbo.sp_serveroption @server=N'PREMERATEST', @optname=N'sub', @optvalue=N'false'
GO
EXEC master.dbo.sp_serveroption @server=N'PREMERATEST', @optname=N'connect timeout', @optvalue=N'0'
GO
EXEC master.dbo.sp_serveroption @server=N'PREMERATEST', @optname=N'collation name', @optvalue=null
GO
EXEC master.dbo.sp_serveroption @server=N'PREMERATEST', @optname=N'lazy schema validation', @optvalue=N'false'
GO
EXEC master.dbo.sp_serveroption @server=N'PREMERATEST', @optname=N'query timeout', @optvalue=N'0'
GO
EXEC master.dbo.sp_serveroption @server=N'PREMERATEST', @optname=N'use remote collation', @optvalue=N'true'
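The script above only creates the server definition and its options. Before querying the linked server you will usually also map a login; a sketch is below (the remote user name and password are placeholders):

EXEC master.dbo.sp_addlinkedsrvlogin
    @rmtsrvname = N'PREMERATEST',
    @useself = N'False',
    @locallogin = NULL,
    @rmtuser = N'RemoteUser',        -- placeholder
    @rmtpassword = N'RemotePassword' -- placeholder
GO
-- Test the link with a four-part name:
SELECT name FROM PREMERATEST.master.sys.databases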