Labels

Apache Hadoop (3) ASP.NET (2) AWS S3 (2) Batch Script (3) BigQuery (21) BlobStorage (1) C# (3) Cloudera (1) Command (2) Data Model (3) Data Science (1) Django (1) Docker (1) ETL (7) Google Cloud (5) GPG (2) Hadoop (2) Hive (3) Luigi (1) MDX (21) Mongo (3) MYSQL (3) Pandas (1) Pentaho Data Integration (5) PentahoAdmin (13) Polybase (1) Postgres (1) PPS 2007 (2) Python (13) R Program (1) Redshift (3) SQL 2016 (2) SQL Error Fix (18) SQL Performance (1) SQL2012 (7) SQOOP (1) SSAS (20) SSH (1) SSIS (42) SSRS (17) T-SQL (75) Talend (3) Vagrant (1) Virtual Machine (2) WinSCP (1)

Saturday, November 11, 2017

How to Remove HTML Tags and Get the Word Count of a Content in Bigquery


REGEXP to remove html tags and get the word count of a content

SELECT content_id, COUNT(words) wordcount
FROM (
SELECT content_id, SPLIT(tt, '') words
FROM (
SELECT content_id,
REGEXP_REPLACE(content, r'(<[^>]+>|\&(nbsp;)|(amp;)|&#\d\d\d\d)', '') tt,
FROM [project:dataset.table] ))
GROUP BY content_id

Sunday, September 10, 2017

BigQuery Get First and Last Day of Months

The below Query helps you to get the First Day and Last Day of a month. This can also be used in TABLE_DATE_RANGE to retrieve data for Previous Months.

SQL

SELECT DATE_ADD(DATE_ADD(CURRENT_DATE(),-DAY(CURRENT_DATE())+1,"DAY") ,-1,"MONTH") First_Day_Previous_Month,
       DATE_ADD(DATE_ADD(CURRENT_DATE(),-DAY(CURRENT_DATE()),"DAY") ,0,"MONTH") Last_Day_Previous_Month,

       DATE_ADD(DATE_ADD(CURRENT_DATE(),-DAY(CURRENT_DATE())+1,"DAY") ,0,"MONTH") First_Day_Current_Month




Tuesday, May 30, 2017

Bigquery Leagacy SQL vs Standard SQL - Array Queries


Bigquery supports both Standard and Legacy SQL, below is the example of how to migrate array related SQLs from legacy to standard SQL.

When we use array type of data in our tables then we should follow below Bigquery Syntaxes:


Legacy

SELECT date, id, code, country, gender, dob, city, location, product.category
FROM  TABLE_DATE_RANGE([PROJECT:DATASET.TABLE_],DATE_ADD(CURRENT_TIMESTAMP(), -1, 'DAY'),DATE_ADD(CURRENT_TIMESTAMP(), -0, 'DAY'))

Standard

SELECT date, id, code, country, gender, dob, city, location, prod.category
FROM `PROJECT.DATASET.TABLE_*`,
UNNEST(product) AS prod
WHERE (_TABLE_SUFFIX BETWEEN FORMAT_DATE("%Y%m%d", DATE_ADD(CURRENT_DATE(), INTERVAL -1 day))
AND FORMAT_DATE("%Y%m%d", CURRENT_DATE()))

Big Query Legacy Vs Standard Date Ranges Query


Bigquery supports both Standard and Legacy SQL, below are the few tips how we can migrate SQLs from legacy to standard SQL.

Legacy

For date shard table:

SELECT * FROM
TABLE_DATE_RANGE([PROJECT:DATASET.TABLE_],DATE_ADD(CURRENT_TIMESTAMP(), -1, 'DAY'),DATE_ADD(CURRENT_TIMESTAMP(), -0, 'DAY'))

For partitioned table use

_PARTITIONTIME BETWEEN TIMESTAMP(DATE_ADD(CURRENT_TIMESTAMP(), -4, 'Day'))
AND TIMESTAMP(CURRENT_TIMESTAMP())

Standard

For date shard table:

SELECT * FROM
FROM `PROJECT.DATASET.TABLE_*`
WHERE (_TABLE_SUFFIX BETWEEN FORMAT_DATE("%Y%m%d", DATE_ADD(CURRENT_DATE(), INTERVAL -1 day))
AND FORMAT_DATE("%Y%m%d", CURRENT_DATE()))


For partitioned table replace _TABLE_SUFFIX with DATE(_PARTITIONTIME)

Sunday, April 23, 2017

Unable to install mysqlclient in python virtualenv

While installing mysqlclient in python virtualenv, some might have experienced issue like 
"failed building wheel for mysqlclient". This issue occurs mostly in Windows, to overcome this we need to download appropriate version of mysqlclient-xxx-xx.whl files, search for below files

and run the command

$ pip install mysqlclient-1.3.10.cp27m-win32.whl

you can see mysqlclient installed successfully.

Monday, April 17, 2017

Connecting to MySQL database in Google Compute Engine - Remote Access

This page help you to access MYSQL database in Google Compute Engine from your system or network.

  1. Get your public IP address.
  2. Open Google Compute Engine in browser window
  3. Navigate to /etc/mysql/
  4. Open my.cnf file and add line:  bind-address = <internal IP>
  5. Restart mysql server: $ service mysql restart
  6. Sign in as mysql root user: $mysql -u root -p
  7. Create new user: mysql>CREATE USER 'user1'@'<internal-ip-my-client>' IDENTIFIED BY '<some-password>'
  8. Provide access to the user: mysql> GRANT ALL PRIVILEGES ON *.* TO 'user1'@'<internal-ip-my-client>' IDENTIFIED BY '<some-password>'; 
  9. Restart mysql server


Wednesday, February 22, 2017

Django Migrate Sqllite to MySql DB

Steps to migrate Django Sqllite DB to MySql DB

1. python manage.py dumpdata -o datadump.json
2. Change settings.py to your mysql

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'database',
        'USER': 'username',
        'PASSWORD': 'password',
        'HOST': 'localhost',   # Or an IP Address that your DB is hosted on
        'PORT': '3306',
    }
}


3. Check you have mysqlclient, else run below command: 
       
       pip install mysqlclient

4. python manage.py migrate --run-syncdb5. python manage.py loaddata datadump.json

Friday, January 27, 2017

Setting up Python Virtual Environment


Follow below steps to create a virtual environment with python 2.7 version, 
you can replace it with any other version.

    $ pip install virtualenv
    $ virtualenv -p python2.7 mypython27
   Run below command to activate python 2.7 virtualenv

   > Go to mypython27 directory
   $ . bin/activate

Wednesday, January 4, 2017

Determining _PARTITION details in BigQuery Partitioned Table



Run below query to check the partition summary of Bigquery table:

SELECT DATE(_PARTITIONDATE) AS PT, DATE(CURRENT_TIMESTAMP()) , DATE(DATE_ADD(CURRENT_TIMESTAMP(), -1, 'DAY'))
FROM [ProjectId:Dataset.Table]
GROUP BY PT

SELECT project_id, dataset_id, table_id, partition_id
, MSEC_TO_TIMESTAMP(creation_time) Created_date, MSEC_TO_TIMESTAMP(last_modified_time) modified_time
from [ProjectId:Dataset.Table$__PARTITIONS_SUMMARY__]