
Wednesday, October 10, 2018

BigQuery - Querying a Day-Partitioned Table Using Legacy and Standard SQL

FOR PARTITIONED TABLE

#legacySQL
SELECT *
FROM [Project:Dataset.Table]
WHERE _PARTITIONTIME BETWEEN DATE_ADD(CURRENT_TIMESTAMP(), -3, 'DAY') AND CURRENT_TIMESTAMP()
LIMIT 1000


#standardSQL
SELECT *
FROM `Project.Dataset.Table`
WHERE _PARTITIONTIME BETWEEN TIMESTAMP(DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY)) AND CURRENT_TIMESTAMP()
LIMIT 1000
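
If the query needs to be run from a terminal, the bq command-line tool can execute the same Standard SQL. This is a minimal sketch; it assumes the Google Cloud SDK is installed and authenticated, and that the placeholder project/dataset/table names above are substituted:

bq query --use_legacy_sql=false \
'SELECT COUNT(*) AS row_count
 FROM `Project.Dataset.Table`
 WHERE _PARTITIONTIME >= TIMESTAMP(DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY))'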



Saturday, September 29, 2018

BigQuery - SQL for Flattening Custom Metrics Values

Google Analytics streams data into BigQuery in a nested JSON format, which can make it difficult to flatten custom metrics data for each event. This can be overcome with the custom metric temp function below (Standard SQL only), which takes the metric index and the hit's customMetrics array (index/value pairs) as parameters.

CREATE TEMP FUNCTION
  customMetricByIndex(indx INT64,
    arr ARRAY<STRUCT<index INT64,
    value INT64>>) AS ( (
    SELECT
      x.value
    FROM
      UNNEST(arr) x
    WHERE
      indx=x.index) );

SELECT
  visitStartTime,
  visitId,
  visitNumber,
  hit.hitNumber AS session_hit_count,
  hit.type AS hit_type,
  hit.page.hostname AS url_domain_name,
  hit.page,
  customMetricByIndex(3, hit.customMetrics) AS custom_metrics_1
FROM
  `project.dataset.ga_sessions_20180909`,
  UNNEST(hits) AS hit
    

BigQuery - SQL for Flattening Custom Dimensions Values

Google Analytics streams data into BigQuery in a nested JSON format, which can make it difficult to flatten custom dimension data for each event. This can be overcome with the custom dimension temp function below (Standard SQL only), which takes the dimension index and the hit's customDimensions array (index/value pairs) as parameters.

CREATE TEMP FUNCTION
  customDimensionByIndex(indx INT64,
    arr ARRAY<STRUCT<index INT64,
    value STRING>>) AS ( (
    SELECT
      x.value
    FROM
      UNNEST(arr) x
    WHERE
      indx=x.index) );

SELECT
  visitStartTime,
  visitId,
  visitNumber,
  hit.hitNumber AS session_hit_count,
  hit.type AS hit_type,
  hit.page.hostname AS url_domain_name,
  hit.page,
  customDimensionByIndex(165, hit.customDimensions) AS custom_variable_1
FROM
  `project.dataset.ga_sessions_20180909`,
  UNNEST(hits) AS hit

BigQuery Views for Google Analytics Realtime Sessions - Standard SQL


People who have started using Google Analytics real-time streaming into BigQuery may come across a query conflict when querying the ga_realtime_sessions table with a date range filter.

For example, when we execute the query below

SELECT * FROM 
TABLE_DATE_RANGE([project:dataset.ga_realtime_sessions_], CURRENT_TIMESTAMP(),CURRENT_TIMESTAMP()) 
LIMIT 1000

we end up with the error message:
Query Failed
Error: Cannot output multiple independently repeated fields at the same time.
The reason is that the real-time table and its view share the same naming pattern:

Realtime Table: project:dataset.ga_realtime_sessions_20180929
Realtime View: project:dataset.ga_realtime_sessions_view_20180929

In addition, the real-time view is only available in Legacy SQL, so we cannot use it in Standard SQL queries. To overcome this, it is good to save the query below as a view to get real-time data for today (a bq command for creating the view follows the query).

SELECT  * FROM
  `project.dataset.ga_realtime_sessions_2*`
WHERE
  CONCAT('2', CAST(_TABLE_SUFFIX AS string)) = FORMAT_DATE("%Y%m%d", CURRENT_DATE())
  AND exportKey IN (
  SELECT
    exportKey
  FROM (
    SELECT
      exportKey,
      exportTimeUsec,
      MAX(exportTimeUsec) OVER (PARTITION BY visitKey) AS maxexportTimeUsec
    FROM
      `project.dataset.ga_realtime_sessions_2*`
    WHERE
      CONCAT('2', CAST(_TABLE_SUFFIX AS string)) = FORMAT_DATE("%Y%m%d", CURRENT_DATE()))
  WHERE 
    exportTimeUsec >= maxexportTimeUsec)
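
One way to save this as a Standard SQL view from the command line is with the bq tool. This is a sketch: it assumes the query above has been saved to a file, and the file name and view name used here are placeholders:

# save the query above to realtime_today.sql, then create the view from it
bq mk --use_legacy_sql=false \
  --view "$(cat realtime_today.sql)" \
  dataset.ga_realtime_sessions_today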

Monday, August 13, 2018

There was a problem confirming the ssl certificate : [SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol version (_ssl.c:645)

PIP Issue while installing psycopg2:

I created a virtual environment and tried to install psycopg2, but the install ended with the following error message:

There was a problem confirming the ssl certificate
: [SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol version (_ssl.c:645)

This error typically means the Python installation is linked against an OpenSSL release too old to negotiate TLS 1.2, which PyPI now requires. To overcome the issue, follow the steps below:

1. Check which Python and OpenSSL version each interpreter uses
 

python -c "import ssl; print(ssl.OPENSSL_VERSION)"
OpenSSL 1.0.2f 28 Jan 2016


python3 -c "import ssl; print (ssl.OPENSSL_VERSION)"
OpenSSL 0.9.8zh 14 Jan 2016



2. Check the pip version

pip --version
pip 9.0.1 from /Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages (python 3.5)

3. Upgrade pip for python3
 

curl https://bootstrap.pypa.io/get-pip.py | python3
pip --version
pip 10.0.1 from /Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pip (python 3.5)

which pip
/Library/Frameworks/Python.framework/Versions/3.5/bin/pip


This solved the issue when installing psycopg2.
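
With the upgraded pip in place, the install can be retried and verified; a minimal check (the import one-liner is just an illustration, not part of the original fix):

pip install psycopg2
python3 -c "import psycopg2; print(psycopg2.__version__)"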

Sunday, April 29, 2018

SQOOP Import and Export Examples


Below are some sample Sqoop export and import commands for moving data between a relational database (MySQL is used here) and an HDFS location.

Export:
sqoop export --connect jdbc:mysql://mysqldb.****.****/database --table <table_name> --username ******* --password ****** --fields-terminated-by ',' -m 1 --export-dir <HDFS Path>

Import:
sqoop import --connect jdbc:mysql://mysqldb.******.****/MyDB --table customers --username ****** --password ****** --target-dir batch/sqoop/job1 -m 1

=> -m 1 loads all data into a single part file.
=> -m 5 loads data into 5 separate part files (one per mapper).

Extract specific columns:
sqoop import --connect jdbc:mysql://mysqldb.edu.cloudlab.com/retail_db --table customers --username labuser --password edureka --target-dir batch/sqoop/job1 --columns "column1,column2" -m 1
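
To sanity-check an import, the target directory can be listed and previewed in HDFS. The part file name below assumes Sqoop's default mapper output naming with -m 1:

hdfs dfs -ls batch/sqoop/job1
hdfs dfs -cat batch/sqoop/job1/part-m-00000 | head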


Saturday, March 24, 2018

Build a Docker Image for Logstash - Ubuntu

The steps below help to launch a Logstash container using a Dockerfile.

Step 1: Install Docker

>> sudo apt-get install docker-ce (for ubuntu)

Step 2: Create a folder called docker-image

Step 3: Create a file called Dockerfile with the content below:

FROM docker.elastic.co/logstash/logstash:6.2.2
RUN rm -f /usr/share/logstash/pipeline/logstash.conf # (optional)
RUN mkdir -p /usr/share/logstash/template # (optional)
COPY your_pipeline.conf /usr/share/logstash/pipeline/your_pipeline.conf
CMD ["/usr/share/logstash/bin/logstash", "-f", "/usr/share/logstash/pipeline/your_pipeline.conf"]

Step 4: Navigate to the Dockerfile location and run the command below in a terminal:

>> docker build -t test_logstash:v1 <docker dir>

Step 5: Run the image:

>> docker run test_logstash:v1 (or use the <image id> in place of the tag)
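
To confirm the container started and the pipeline loaded, the standard Docker commands can be used (the container ID will differ on each run):

>> docker ps
>> docker logs <container id>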