Labels

Apache Hadoop (3) ASP.NET (2) AWS S3 (2) Batch Script (3) BigQuery (21) BlobStorage (1) C# (3) Cloudera (1) Command (2) Data Model (3) Data Science (1) Django (1) Docker (1) ETL (7) Google Cloud (5) GPG (2) Hadoop (2) Hive (3) Luigi (1) MDX (21) Mongo (3) MYSQL (3) Pandas (1) Pentaho Data Integration (5) PentahoAdmin (13) Polybase (1) Postgres (1) PPS 2007 (2) Python (13) R Program (1) Redshift (3) SQL 2016 (2) SQL Error Fix (18) SQL Performance (1) SQL2012 (7) SQOOP (1) SSAS (20) SSH (1) SSIS (42) SSRS (17) T-SQL (75) Talend (3) Vagrant (1) Virtual Machine (2) WinSCP (1)

Tuesday, May 30, 2017

Bigquery Leagacy SQL vs Standard SQL - Array Queries


Bigquery supports both Standard and Legacy SQL, below is the example of how to migrate array related SQLs from legacy to standard SQL.

When we use array type of data in our tables then we should follow below Bigquery Syntaxes:


Legacy

SELECT date, id, code, country, gender, dob, city, location, product.category
FROM  TABLE_DATE_RANGE([PROJECT:DATASET.TABLE_],DATE_ADD(CURRENT_TIMESTAMP(), -1, 'DAY'),DATE_ADD(CURRENT_TIMESTAMP(), -0, 'DAY'))

Standard

SELECT date, id, code, country, gender, dob, city, location, prod.category
FROM `PROJECT.DATASET.TABLE_*`,
UNNEST(product) AS prod
WHERE (_TABLE_SUFFIX BETWEEN FORMAT_DATE("%Y%m%d", DATE_ADD(CURRENT_DATE(), INTERVAL -1 day))
AND FORMAT_DATE("%Y%m%d", CURRENT_DATE()))

Big Query Legacy Vs Standard Date Ranges Query


Bigquery supports both Standard and Legacy SQL, below are the few tips how we can migrate SQLs from legacy to standard SQL.

Legacy

For date shard table:

SELECT * FROM
TABLE_DATE_RANGE([PROJECT:DATASET.TABLE_],DATE_ADD(CURRENT_TIMESTAMP(), -1, 'DAY'),DATE_ADD(CURRENT_TIMESTAMP(), -0, 'DAY'))

For partitioned table use

_PARTITIONTIME BETWEEN TIMESTAMP(DATE_ADD(CURRENT_TIMESTAMP(), -4, 'Day'))
AND TIMESTAMP(CURRENT_TIMESTAMP())

Standard

For date shard table:

SELECT * FROM
FROM `PROJECT.DATASET.TABLE_*`
WHERE (_TABLE_SUFFIX BETWEEN FORMAT_DATE("%Y%m%d", DATE_ADD(CURRENT_DATE(), INTERVAL -1 day))
AND FORMAT_DATE("%Y%m%d", CURRENT_DATE()))


For partitioned table replace _TABLE_SUFFIX with DATE(_PARTITIONTIME)