
Friday, December 23, 2016

Install AWS Components in Python Virtual Environment


Step 1: >>$ sudo su
Step 2: >>$ pip install awscli
Step 3: >>$ aws configure
Complete the configuration prompts with your AWS Access Key ID and Secret Access Key.
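The steps above install awscli as root for the whole system. A minimal sketch of doing the same inside a Python virtual environment, as the title suggests, could look like the block below. The helper names (run, setup_aws_venv) and the default venv path are assumptions made for illustration; setting DRY_RUN=1 prints the commands instead of running them.

```shell
# Sketch only: run, setup_aws_venv, and the default path are illustrative names.
# run executes its arguments, or just prints them when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

setup_aws_venv() {
  venv_dir="${1:-$HOME/aws-venv}"
  run python3 -m venv "$venv_dir"           # create the virtual environment
  run "$venv_dir/bin/pip" install awscli    # install the AWS CLI into it
  run "$venv_dir/bin/aws" configure         # prompts for the key, secret, region, output format
}
```

Calling the helper with DRY_RUN=1 set first (for example, DRY_RUN=1 setup_aws_venv ~/aws-venv in bash) lets you review the three commands before running them for real.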

Install gcloud components in python virtual environment


Step 1: Download the Google Cloud SDK file (below is an example for Mac)
Step 2: Extract the tar file and navigate to the extracted google-cloud-sdk folder
>> ./install.sh
If you hit an error such as a missing 'platform' module, run the command below and retry:
>>$ export CLOUDSDK_PYTHON_SITEPACKAGES=1

Step 3: Run gcloud init
Step 4: Open a new terminal and run 'bq ls'. An authorization prompt appears; complete the authorization.
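Before running 'bq ls', it can help to confirm that the SDK tools are actually on the PATH of the new terminal. The sketch below assumes the installer was allowed to update your shell profile; missing_tools is an illustrative helper name, not part of the SDK.

```shell
# Sketch: print each named tool that is NOT found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# The Cloud SDK ships gcloud, bq, and gsutil; empty output means all three resolve.
missing_tools gcloud bq gsutil
```

If any name is printed, re-open the terminal or re-run install.sh and let it update the PATH.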

Monday, November 14, 2016

How to Change Default Version of Python to another Version

Follow the steps below to change the default version of Python to another version:

Step 1: Go to the /home/<user> directory in your terminal.
Step 2: Run sudo vim ~/.bashrc_aliases
Step 3: Add alias python=python3
Step 4: Run source ~/.bashrc_aliases

This makes the python command run python3 instead of the default version.
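The same change can be scripted. The sketch below appends the alias only if it is not already present, so it is safe to run more than once. add_python_alias is an illustrative helper name, and the default file simply follows the steps above.

```shell
# Sketch: append the alias to a shell startup file only if it is not already there.
add_python_alias() {
  rc_file="${1:-$HOME/.bashrc_aliases}"
  alias_line='alias python=python3'
  touch "$rc_file"                          # make sure the file exists
  if ! grep -qxF "$alias_line" "$rc_file"; then
    printf '%s\n' "$alias_line" >> "$rc_file"
  fi
}
```

After calling add_python_alias, run source ~/.bashrc_aliases and check the result with python --version. Note that a shell alias normally affects only interactive sessions; scripts that name an interpreter in their first line are not changed by it.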

Tuesday, October 25, 2016

Google Cloud BQ Command Line Data Load

'bq' is the command-line tool provided by Google Cloud Platform to access BigQuery tables and perform operations such as DDL and DML.

Refer to http://mahadevanrv.blogspot.in/2016/06/install-google-bigquery-command-line.html for gcloud installation.

Loading data with bq can use one of three write modes:

1. Empty (Default): Writes data into an empty table. If the table already contains data, it throws an error.

bq query -n=1000 --destination_table=<table_name> 'SELECT * FROM [project:dataset.source_table];'

2. Replace: Replaces the current table with the newly obtained query output. The existing data in the destination table is lost. Use it with care for incremental loads that involve both updates and inserts.

bq query --replace --destination_table=<table_name> 'SELECT * FROM [project:dataset.source_table];'

3. Append: Appends new records to the existing table. If the same command is executed more than once, it creates duplicate records. It can be used for incremental loads that involve only inserts.

bq query --append_table --destination_table=<table_name> 'SELECT * FROM [project:dataset.source_table];'
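The three modes differ only in one flag, which a small wrapper can make explicit. The sketch below builds and prints the bq command for a chosen mode rather than running it. bq_write_cmd, its argument order, and the mydataset.target table name are assumptions for illustration and are not part of the bq tool.

```shell
# Sketch: build (and print) a bq query command for one of the three write modes.
bq_write_cmd() {
  mode="$1"; dest="$2"; query="$3"
  case "$mode" in
    empty)   flag="" ;;                 # default: fails if the table already has data
    replace) flag="--replace" ;;        # overwrite the destination table
    append)  flag="--append_table" ;;   # add rows to the destination table
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  # ${flag:+$flag } expands to the flag plus a space, or to nothing for 'empty'
  echo "bq query ${flag:+$flag }--destination_table=$dest '$query'"
}

bq_write_cmd append mydataset.target 'SELECT * FROM [project:dataset.source_table];'
```

Printing the command first is a cheap safeguard, since replace discards existing data and append can duplicate rows when repeated.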

Sunday, September 25, 2016

A Tour to Google Cloud Platform

Nowadays, cloud services have become one of the most important technologies in the software industry. Many organizations have already migrated to the cloud, and many more are planning to move their applications there. But choosing the right cloud platform is not always easy. One needs to identify the right provider by studying one’s own infrastructure and applications. There are three big cloud service providers in today’s market: AWS, Azure, and Google Cloud Platform. Each offers different products and services, and we need to understand how these can help us meet our demands and serve us best.

We need to ask ourselves three questions before migrating to the cloud:

Does it support our existing system with little or no change?
Does it require any re-architecture?
Is it affordable, and does it save cost?

This article is intended to provide a basic list of some of the popular products and services of Google Cloud Platform (GCP). Google Cloud Platform is Google’s own infrastructure, which Google’s teams used internally for more than 16 years. This history speaks to the maturity and reliability of Google Cloud and answers doubts about Google’s maturity as a cloud provider. Google entered the cloud market in 2008 with the release of Google App Engine. Since then it has made considerable progress and launched several products such as Google Compute Engine, Cloud Storage, Cloud SQL, Cloud Datastore, and BigQuery.

   

Compute

Virtual Servers            : Compute Engine
Autoscale                  : Autoscaling
Virtual Server Disk        : Persistent Disk
Container Management       : Container Engine
Backend Processing Logic   : Cloud Functions
Microservices              : Cloud Functions
Web Applications           : App Engine
Marketplace                : Cloud Launcher

Google Compute Engine (GCE)

Google Compute Engine (GCE) provides scalable, high-performance virtual machines. You can create a Compute Engine instance within a few minutes and customize it at any time. GCE supports operating systems such as Debian, CentOS, CoreOS, SUSE, Ubuntu, Red Hat, FreeBSD, and Windows Server 2008 R2 and 2012 R2. Instances are accessible through the command line (gcloud), the Compute Engine console, the Google API client libraries, and the RESTful API.
GCE also provides options to add additional storage disks and to create snapshots of disks.
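As a rough illustration of the command-line route, the sketch below prints the gcloud commands for creating an instance and snapshotting its boot disk, which by default shares the instance's name. The function name, instance name, zone, machine type, and snapshot name are all placeholder assumptions; the commands are printed, not executed.

```shell
# Sketch: print gcloud commands for creating a VM and snapshotting its boot disk.
gce_demo_cmds() {
  name="${1:-demo-vm}"
  zone="${2:-us-central1-a}"
  echo "gcloud compute instances create $name --zone=$zone --machine-type=n1-standard-1"
  echo "gcloud compute disks snapshot $name --zone=$zone --snapshot-names=$name-snap"
}

gce_demo_cmds
```

Review the printed commands against your own project settings before running them, since creating instances and snapshots incurs cost.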

Google App Engine (GAE)

Google App Engine is a platform to deploy and manage your web and mobile applications. App Engine takes care of server management, comes with built-in services and APIs, and scales automatically based on the traffic to your sites. It supports popular development tools such as Eclipse, IntelliJ, Maven, Git, Jenkins, and PyCharm.

Google Container Engine (GKE)

Google Container Engine runs on a group of Google Compute Engine instances. Developers use it to create or resize Docker container clusters; to create pods, replication controllers, jobs, services, or load balancers; and to build and test enterprise applications.

Storage

Object Storage         : Cloud Storage
Archiving and Backup   : Cloud Storage Nearline (Storage)
Content Delivery       : Cloud CDN

Google Cloud Storage

Google Cloud Storage is a RESTful service for storing and accessing large volumes of frequently used data with high performance (total storage capacity is effectively unlimited). Cloud Storage is project based: we create buckets and folders within the respective project. Buckets are the primary storage containers and act as file repositories that are easily accessible from the APIs and the command line (the gsutil command). The biggest advantages of Google Cloud Storage are that it is accessible from anywhere and that it reduces the cost of operation.
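The basic bucket workflow with gsutil is short enough to sketch. The function below prints the commands to make a bucket, copy a file into it, and list its contents. The function name and the bucket and file names passed to it are placeholders, not values from this article.

```shell
# Sketch: print the gsutil commands for a create / upload / list round trip.
gcs_demo_cmds() {
  bucket="$1"; file="$2"
  echo "gsutil mb gs://$bucket"           # make the bucket in the current project
  echo "gsutil cp $file gs://$bucket/"    # upload a local file
  echo "gsutil ls gs://$bucket"           # list the bucket contents
}

gcs_demo_cmds my-example-bucket report.csv
```

Bucket names are global across Cloud Storage, so the placeholder name would need to be replaced with one that is unique.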

Google Storage Nearline

As data grows over time, it becomes necessary to archive infrequently used data. Such data can be archived to Google Storage Nearline. By moving infrequently used data to Nearline, we pay much less for storage (1 cent per GB/month) than we would for standard Cloud Storage. When required, data can be moved from Nearline back to Cloud Storage at high speed.
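To get a feel for the quoted rate, the sketch below computes a monthly storage bill at 1 cent per GB/month. The rate is the figure given in this article and may no longer be current; nearline_monthly_cost is an illustrative helper, and retrieval or network charges are not modeled.

```shell
# Sketch: monthly Nearline storage cost at the 1 cent per GB/month rate quoted above.
nearline_monthly_cost() {
  gb="$1"
  cents=$((gb * 1))    # 1 cent per GB per month
  printf '$%d.%02d\n' $((cents / 100)) $((cents % 100))
}

nearline_monthly_cost 500    # 500 GB -> prints $5.00
```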

Database

Relational Database   : Cloud SQL
NoSQL Database        : Cloud Datastore
Data Warehouse        : BigQuery
Table Storage         : Cloud Bigtable
Caching               : Memcache (App Engine)

BigQuery

Google BigQuery allows users to store and analyze multiple terabytes of data with SQL-like queries. BigQuery tables can be accessed from the browser, the command line, the REST interface, and even an Excel connector. Although BigQuery resembles a relational database, it is not one. In addition, BigQuery allows us to add views and functions and perform analytics with them.

Google Cloud SQL

Cloud SQL is a fully managed MySQL relational database hosted on Google Cloud. It can scale to 10 TB of data with fast access. It provides features such as high-availability failover, replication, and backup configuration. It can be connected to from anywhere, including GCE instances and your own workstation.

Cloud Datastore

Google Cloud Datastore is a highly available NoSQL database on the Google platform. It is an object store and doesn’t require a fixed schema.

Some of the other Google Cloud Platform services and products are listed below:

Analytics & Big Data

Big Data Processing   : Cloud Dataproc
Data Orchestration    : Cloud Dataflow
Analytics             : Cloud Dataflow
Visualization         : Cloud Datalab
Machine Learning      : Cloud Machine Learning Prediction API
Intelligence API      : Translate, Speech, Vision
Search                : Search API (App Engine)
Genomics              : Google Genomics

Network

Networking           : Cloud Virtual Network
Domain Name System   : Cloud DNS
Dedicated Network    : Cloud Interconnect
Load Balancing       : Cloud Load Balancing

Application Service

Messaging        : Cloud Pub/Sub, App Engine - Task Queue
App Testing      : Cloud Test Lab
Email Address    : App Engine - Email Service
API Management   : Cloud Endpoints

Security

Authentication and Authorization : IAM, Cloud Resource Manager, Google Sign-In, Google Identity Toolkit
Encryption                       : BYOK, Platform-Level Encryption
Security                         : Cloud Security Scanner

Streaming

Streaming            : Cloud Dataflow

Mobile Services

Pro App Development        : App Engine, Firebase