This blog contains posts related to data warehouse. All posts are used in my real time project and can be used as reusable codes and helpful to BI developers.
Labels
Apache Hadoop
(3)
ASP.NET
(2)
AWS S3
(2)
Batch Script
(3)
BigQuery
(21)
BlobStorage
(1)
C#
(3)
Cloudera
(1)
Command
(2)
Data Model
(3)
Data Science
(1)
Django
(1)
Docker
(1)
ETL
(7)
Google Cloud
(5)
GPG
(2)
Hadoop
(2)
Hive
(3)
Luigi
(1)
MDX
(21)
Mongo
(3)
MYSQL
(3)
Pandas
(1)
Pentaho Data Integration
(5)
PentahoAdmin
(13)
Polybase
(1)
Postgres
(1)
PPS 2007
(2)
Python
(13)
R Program
(1)
Redshift
(3)
SQL 2016
(2)
SQL Error Fix
(18)
SQL Performance
(1)
SQL2012
(7)
SQOOP
(1)
SSAS
(20)
SSH
(1)
SSIS
(42)
SSRS
(17)
T-SQL
(75)
Talend
(3)
Vagrant
(1)
Virtual Machine
(2)
WinSCP
(1)
Wednesday, April 4, 2012
Monday, April 2, 2012
Pentaho Data Connection with Variables
Q: When you create a normal database connection, you have to edit the transformation or job to connect to a different host or database. How can I make this a variable?
A: Here are the steps to make a connection based on variables and share the connection for easier reuse:
1. In Spoon, open the transformation containing the current hardcoded form of the DB connection.
2. Expand the "Database connections" section of the navigation tree.
3. Double click on the connection you currently have defined to edit it.
4. In the "server host name" textbox, change the currently hardcoded value (e.g. DBserver) to a variable (e.g. ${DB_HOSTNAME})
5. Save and close that dialog (Click OK)
6. Right click the connection you just edited and select the option "Share", to share it.
7. Save the transformation you had open. (Shared connections don't get written out until you save something)
8. Using a file explorer, navigate to the .kettle directory inside your home directory (i.e. "C:\Documents and Settings\<username>\.kettle" for Windows XP/2000 ", "C:\Profiles\<username>\.kettle" for Windows Vista ", /home/<username>/.kettle" for Linux/Unix)
9. The shared connection should now be in .kettle/shared.xml. This file can be copied and pasted to any new Kettle installation.
10. Edit the kettle.properties file using a standard text editor.
11. Create a new line in it below the comments with the name of the variable you defined in step 4.
(The new line would read as follows if you named the variable DB_HOSTNAME: DB_HOSTNAME = localhost)
12. On any new installation, you can edit that kettle.properties file and define a new value for that variable.
13. Once you have completed all of the above, either restart kettle or select the Set environment variables option in the Edit menu.
A: Here are the steps to make a connection based on variables and share the connection for easier reuse:
1. In Spoon, open the transformation containing the current hardcoded form of the DB connection.
2. Expand the "Database connections" section of the navigation tree.
3. Double click on the connection you currently have defined to edit it.
4. In the "server host name" textbox, change the currently hardcoded value (e.g. DBserver) to a variable (e.g. ${DB_HOSTNAME})
5. Save and close that dialog (Click OK)
6. Right click the connection you just edited and select the option "Share", to share it.
7. Save the transformation you had open. (Shared connections don't get written out until you save something)
8. Using a file explorer, navigate to the .kettle directory inside your home directory (i.e. "C:\Documents and Settings\<username>\.kettle" for Windows XP/2000 ", "C:\Profiles\<username>\.kettle" for Windows Vista ", /home/<username>/.kettle" for Linux/Unix)
9. The shared connection should now be in .kettle/shared.xml. This file can be copied and pasted to any new Kettle installation.
10. Edit the kettle.properties file using a standard text editor.
11. Create a new line in it below the comments with the name of the variable you defined in step 4.
(The new line would read as follows if you named the variable DB_HOSTNAME: DB_HOSTNAME = localhost)
12. On any new installation, you can edit that kettle.properties file and define a new value for that variable.
13. Once you have completed all of the above, either restart kettle or select the Set environment variables option in the Edit menu.
Tuesday, March 27, 2012
SQL Server Agent Functions
SQL Server Agent
Question Difficulty = Easy
Question 1: What sorts of functionality does SQL Server Agent provide?- SQL Server Agent is a Windows service that accompanies each instance of SQL Server on a machine for most editions of SQL Server.
- SQL Server Agent is primarily a job scheduler for executing T-SQL, SSIS, DOS, etc. scripts.
- SQL Server Agent is also responsible for defining Operators and Alerts.
- Operators can be associated with Jobs or Alerts, so that particular people (email addresses, pagers, NET SEND) are notified or distribution lists are notified if an issue occurs.
- Alerts can be setup for custom conditions or errors of a particular severity level.
- Additional information - Setting Up Alerts for All Jobs
- No - The SQL Server Express Edition does not have a SQL Server Agent Service.
Question 3: If SQL Server Express does not have a job scheduling interface what alternatives are available?
- Windows Task Scheduler.
- Third party solutions.
- Additional information - Free Job Scheduling Tool for SQL Server Express and MSDE
- True.
- MSDB
- Additional information - Accessing SQL Server Agent Data
- Question 1: How many options are available to identify failed jobs?
- Manually review the failed Jobs in Management Studio.
- Setup an automated process to query the msdb.dbo.sysjobhistory system table to find the failures.
- Additional information - Finding SQL Server Agent Job Failures
- Question 2: How many of the SQL Server Agent system tables can you name with their associated purpose?
- sysjobactivity stores data about job activity
- sysjobhistory stores data for all historical executions of all jobs
- sysjobs stores data about each job such as the name
- sysjobschedules stores job schedule information
- sysjobservers stores server information related to a job
- sysjobsteps stores specific job step information such as the type of code being issued, the actual code, etc.
- sysjobstepslogs stores specific job step log information for each run if this is enabled.
- Additional information - Accessing SQL Server Agent Data
- Question 3: How many of the SQL Server Agent system stored procedures can you name with their associated purpose?
- sp_help_job
- This stored procedure returns information about the job.
- If no parameters are used information is returned for all jobs.
- If a specific job_id is passed it gives you job information, job step information, schedule information and last run information.
- sp_help_jobactivity
- This stored procedure returns information about the status of the job run.
- If no parameters are used information is returned for all jobs.
- sp_help_jobcount
- This stored procedure gives you a count of how many jobs a schedule is tied to.
- This stored procedure requires either @schedule_id or @schedule_name to be passed as a parameter.
- sp_help_jobhistory
- This stored procedure returns all history information for all of the job runs.
- If no parameters are used information is returned for all jobs.
- If you also use parameter @mode = N'FULL' this provides additional information about each job step.
- sp_help_jobs_in_schedule
- This stored procedure gives you a list of the jobs that are tied to a schedule.
- This requires either @schedule_id or @schedule_name to be passed as a parameter.
- sp_help_jobschedule
- This stored procedure provides jobs schedule information for a particular job.
- This stored procedure requires either @job_id or @job_name to be passed.
- sp_help_jobserver
- This stored procedure provides information about a specific server tied to a job.
- This stored procedure requires either @job_id or @job_name to be passed.
- sp_help_jobstep
- This stored procedure provides information about the job steps for a specific job.
- This stored procedure requires either @job_id or @job_name to be passed.
- sp_help_jobsteplog
- This stored procedure returns information about a specific job step log.
- This stored procedure requires either @job_id or @job_name to be passed.
- sp_get_composite_job_info
- This stored procedure returns combined data for all jobs in the system.
- If no parameters are used info is returned for all jobs.
- Additional information - Accessing SQL Server Agent Data
- sp_help_job
- Question 4: What resources are available to troubleshoot SQL Server Agent?
- SQL Server Agent Log is a record of all entries written by the SQL Server Agent service.
- Additional Information - SQL Server 2005 Exposed - Log File Viewer
- Performance Monitor and Profiler can be setup to monitor the status of a particular job.
- Additional Information - Collecting Performance Statistics for Scheduled Jobs
- SQL Server Agent Log is a record of all entries written by the SQL Server Agent service.
- Question 5: True or False. Besides the MSDB database, SQL Server Agent also has configuration parameter related data stored in the registry.
- True.
- Additional information - Missing SQL Server Agent History
- True.
- Question 1: What is multi-server administration and when would you use it?
- Job management paradigm with a master server and one or more target servers. The master server sends and receives jobs from the target servers with all of the job and job step related information stored on the master server. When the jobs complete on the target servers notification is sent to the master server so this server has the updated information. Multi-server administration is used in a enterprise environment where a consistent set of jobs need to run on numerous SQL Servers, this technology helps to consolidate the creation, execution and management of those jobs.
- Question 2: What is a SQL Server Agent Proxy? Can you name some of the sub-systems proxies? Why are the proxies of any significance?
- A SQL Server Agent Proxy is an account that is setup to help secure a particular sub-system, so that if an login\user is trying to access the particular sub-system and does not have rights, the proxy will grant rights.
- The SQL Server Agent Proxies include:
- ActiveX Script
- Operating System (CmdExec)
- Replication Distributor
- Replication Merge
- Replication Queue Reader
- Replication Snapshot
- Replication Transaction-Log Reader
- Analysis Services Command
- Analysis Services Query
- SSIS Package Execution
- Unassigned Proxies
- The SQL Server Agent Proxies offer a new level of granularity for SQL Server Agent that was not previously available.
- Additional information - SQL Server Agent Proxies
- Question 3: What are the new SQL Server Agent Fixed Database Roles and what is the significance of each role?
- SQLAgentUserRole - Ability to manage Jobs that they own
- SQLAgentReaderRole - All of the SQLAgentUserRole rights and the ability to review multi-server jobs, their configurations and history
- SQLAgentOperatorRole - All of the SQLAgentReaderRole rights and the ability to review operators, proxies and alerts, execute, stop or start all local jobs, delete the job history for any local job as well as enable or disable all local jobs and schedules
- Additional information - <-- SQL Server Security Note --> SQL Server Agent Fixed Database Roles
Monday, March 12, 2012
Ovewrite Excel file using SSIS
Create an SSIS package as shown below:
Step A: Add an SQL task and execute a command "Truncate Table Sales", where Sales is the name of the Excel table.
Step B: Add an SQL task and executye a command to create a table with necessary columns, see example below:
CREATE TABLE Sales(
SalesID NUMBER,
InvoiceNo NUMBER,
SalesAmount NUMBER,
OrderQty NUMBER,
CustomerID NUMBER,
SalesID NUMBER,
InvoiceNo NUMBER,
SalesAmount NUMBER,
OrderQty NUMBER,
CustomerID NUMBER,
SalesDate DATE
)
)
A new table will be created in sheet 'Sales', then create an DFT task to load data into excel sheet Sales.
Wednesday, February 29, 2012
FIX: A connection cannot be made. Ensure that the server is running. (Microsoft.AnalysisServices.AdomdClient)
Most often we may come across issues like below when tried to connect Analysis Service Data Connection:
"A connection cannot be made. Ensure that the server is running. (Microsoft.AnalysisServices.AdomdClient)"
To fix this follow below steps:
Step 1: Navigate to Programs|SQL Server XXXX|Configuration tools|SQL Server Surface Area Configuration
Step 2: Change the service account to local system as shown below:
"A connection cannot be made. Ensure that the server is running. (Microsoft.AnalysisServices.AdomdClient)"
To fix this follow below steps:
Step 1: Navigate to Programs|SQL Server XXXX|Configuration tools|SQL Server Surface Area Configuration
Step 2: Change the service account to local system as shown below:
Wednesday, February 22, 2012
How to shrink database and database logical files
1. Execute the below command to find the logical file names and their location in local folder:
USE <DB_NAME>
Execute SP_Helpfile
E.g., Use AdventureWorks
Execute SP_HelpFile
2. Command to shrink a database:
DBCC SHRINKDATBASE (AdventureWorks)
3. Command to Shrink Logical files (.MDF, .LDF):
If the MDF logical file name of AdventureWorks database is AdventureWorks_Data
Then the command is,
DBCC SHRINKFILE (AdventureWorks_Data)
USE <DB_NAME>
Execute SP_Helpfile
E.g., Use AdventureWorks
Execute SP_HelpFile
2. Command to shrink a database:
DBCC SHRINKDATBASE (AdventureWorks)
3. Command to Shrink Logical files (.MDF, .LDF):
If the MDF logical file name of AdventureWorks database is AdventureWorks_Data
Then the command is,
DBCC SHRINKFILE (AdventureWorks_Data)
MDF, NDF and LDF Files in DQL Database
Primary data files: MDF FilesPrimary data file is the starting point of the database. It points to the other files in the database. Therefore, every database has one primary data file. Also, all the data in the database objects (tables, stored procedures, views, triggers.. etc.) are stored in the primary data files. The recommended and the most common file name extension for primary data files is .mdf.
Secondary data files: NDF Files
You can only have one primary data file for a database. Rest made up by secondary data files. But its not necessary to have a secondary data file. Therefore some databases may not have any secondary data file. But its also possible to have multiple secondary data files for a single database. .ndf is usually recommended to denote secondary data files. It’s also possible to store the secondary data file in a separate physical drive than the one which primary data file is stored.
Log filesL LDF files
Log files in SQL Server databases hold all the log information. Those information can be later used to recover the database. Size of the log file is determined by the logging level you have set up on the database. There must be at least one log file for each database. But it is also possible to have more than one log file for a single database. The recommended file name extension for log files is .ldf.
Secondary data files: NDF Files
You can only have one primary data file for a database. Rest made up by secondary data files. But its not necessary to have a secondary data file. Therefore some databases may not have any secondary data file. But its also possible to have multiple secondary data files for a single database. .ndf is usually recommended to denote secondary data files. It’s also possible to store the secondary data file in a separate physical drive than the one which primary data file is stored.
Log filesL LDF files
Log files in SQL Server databases hold all the log information. Those information can be later used to recover the database. Size of the log file is determined by the logging level you have set up on the database. There must be at least one log file for each database. But it is also possible to have more than one log file for a single database. The recommended file name extension for log files is .ldf.
Subscribe to:
Posts (Atom)

