Saturday, March 24, 2018

Build Docker for Logstash - Ubuntu

The steps below launch a Logstash container from a Dockerfile.

Step 1: Install Docker

>> sudo apt-get install docker-ce (for ubuntu)

Step 2: Create a folder named docker-image.

Step 3: In that folder, create a file named Dockerfile with the following contents:

FROM docker.elastic.co/logstash/logstash:6.2.2
# Optional: remove the default pipeline configuration
RUN rm -f /usr/share/logstash/pipeline/logstash.conf
# Optional: create a directory for templates
RUN mkdir -p /usr/share/logstash/template
COPY your_pipeline.conf /usr/share/logstash/pipeline/your_pipeline.conf
CMD ["/usr/share/logstash/bin/logstash", "-f", "/usr/share/logstash/pipeline/your_pipeline.conf"]

Step 4: In a terminal, navigate to the folder containing the Dockerfile and run the command below (use . as <docker dir> for the current folder):

>> docker build -t test_logstash:v1 <docker dir>

Step 5: Run the image, using either its tag or its image ID:

>> docker run test_logstash:v1
>> docker run <image id>

Merge Pandas DataFrames and Remove NaN Records


The method below merges multiple DataFrames that share the same columns into a single DataFrame.

Assume we have two DataFrames, r1 and r2, and need to discard records that contain null values. Use pd.concat() followed by dropna():

import pandas as pd

merged_df = pd.concat([r1, r2], axis=0).dropna()
merged_df.to_csv('output.csv', index=False, doublequote=False)
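
To make the effect of dropna() concrete, here is a small sketch with made-up data. The column names and values in r1 and r2 are assumptions for illustration only. It stacks the two frames row-wise and drops every row that contains a missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frames with the same columns (illustrative data only)
r1 = pd.DataFrame({"device": ["television", "mobile"], "cost": [100.0, np.nan]})
r2 = pd.DataFrame({"device": ["driver", None], "cost": [110.0, 75.0]})

# Stack the rows of r1 and r2, then drop any row with a missing value
merged_df = pd.concat([r1, r2], axis=0, ignore_index=True).dropna()
print(merged_df)
```

By default dropna() removes a row if any of its columns is missing. Passing how='all' drops only rows in which every column is missing.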

Split Strings in Bigquery Using REGEXP

Assume that we have a BigQuery column named pair with values like these:

---------------------------------------------------------
pair
----------------------------------------------------------
television:100
mobile:250
driver: 110
----------------------------------------------------------

Expected Output
---------------------------------------------------------
Device     | Cost
---------------------------------------------------------
television | 100
mobile     | 250
driver     | 110
---------------------------------------------------------

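Use the expressions below in the SELECT list to split the column. REGEXP_MATCH is a legacy SQL function; in standard SQL, use REGEXP_CONTAINS instead. Here attribute_name corresponds to Device and attribute_value to Cost in the expected output:

CASE
  WHEN REGEXP_MATCH(pair, ":") THEN REGEXP_EXTRACT(pair, r'(\w*):')
  ELSE pair
END AS attribute_name,
REGEXP_EXTRACT(pair, r'\:(.*)') AS attribute_value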
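
The same patterns can be tried locally before running them in BigQuery. The sketch below mirrors the two expressions with Python's re module. split_pair is a hypothetical helper name, and Python's regex engine is not identical to the one BigQuery uses, so treat this as an approximation rather than an exact equivalent.

```python
import re

def split_pair(pair):
    """Split 'name:value' into (name, value), mirroring the CASE expression above."""
    name_match = re.search(r"(\w*):", pair)   # word characters before the colon
    value_match = re.search(r":(.*)", pair)   # everything after the first colon
    name = name_match.group(1) if name_match else pair  # no colon: keep whole string
    value = value_match.group(1) if value_match else None
    return name, value

for sample in ["television:100", "mobile:250", "driver: 110", "keyboard"]:
    print(split_pair(sample))
```

Note that the value pattern keeps any whitespace after the colon, so "driver: 110" yields " 110" with a leading space. Trim the result if that matters for your data.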

Python Fundamental - Operators

#Save the below code as python file and execute to see output.

varA = 15
varB = 6

# 1. Addition operator
add_sample = varA + varB
print(add_sample)

# 2. Subtract operator
sub_sample = varA - varB
print(sub_sample)

# 3. Multiply operator
multiply_sample = varA * varB
print(multiply_sample)

# 4. Division operator
division_sample = varA / varB
print(division_sample)

# 5. Add assignment (useful in loops; either of the two forms below can be used)

add_sample += 3
print(add_sample)

add_sample = add_sample + 1
print(add_sample)

# Similarly, for other operators, put the operator sign before the equals sign:
# examples:  -=, *=, /=

# 6. Modulus

mod_sample = varA % varB
print(mod_sample)


# 7. Exponentiation

exp_sample = varA ** 2
print(exp_sample)

# Note: operator precedence rule
# BODMAS: Brackets, Orders (powers), Division, Multiplication, Addition, Subtraction
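
The precedence note above can be checked with a few expressions. This is a minimal sketch that reuses varA and varB from the listing; the other variable names are made up for illustration.

```python
varA = 15
varB = 6

no_brackets = varA + varB * 2      # multiplication first: 15 + 12
with_brackets = (varA + varB) * 2  # brackets first: 21 * 2
power_first = 2 * varB ** 2        # exponent first: 2 * 36
left_to_right = varA / varB * 2    # equal precedence, left to right: 2.5 * 2

print(no_brackets, with_brackets, power_first, left_to_right)
```

Division and multiplication share the same precedence in Python, as do addition and subtraction, so operators within each pair are evaluated left to right.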

Python Fundamental - Strings and Indexes Example

#Save the below code as python file and execute to see output.

# 1. Normal

strA = 'My First string in quotes'
strB = "My first string in double quotes"

print (strA + "; " + strB)

#2. Escape Sequence
# escA= "My "first" double quotes" (This will result in error)
escA = "My 'first' double quotes"
escB = "My \"first\" double quotes"
print( escA + "; " + escB)

#3 String Index
# Index starts at 0 in python
indA = strA[0]
indB = strA[5]
print("Print indexes: " + indA + "; " + indB)

#4 Slicing of Strings

strC = "Python"

sliceA = strC[:3] # gives the first 3 characters
sliceB = strC[3:] # gives the last 3 characters (index 3 to the end)
sliceC = strC[2:4] # gives the 3rd and 4th characters (indexes 2 and 3)
print("Print slice indexes: " + sliceA + "; " + sliceB + "; "+ sliceC)
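
Slices also accept negative indexes and a step value. The short sketch below extends the strC example; the variable names are chosen here for illustration.

```python
strC = "Python"

last_char = strC[-1]        # negative index counts from the end
last_three = strC[-3:]      # last 3 characters
every_second = strC[::2]    # every second character, starting at index 0
reversed_str = strC[::-1]   # a step of -1 walks the string backwards

print(last_char + "; " + last_three + "; " + every_second + "; " + reversed_str)
```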

Python Fundamental Variables and Datatypes - Examples

#Save the below code as python file and execute to see output.

# 1. Add a variable and assign datatype int

myInt = 5
print(myInt)

# 2. Add a variable and assign datatype float

myFloat = 5.5
print(myFloat)

# 3. Add a variable and assign datatype boolean
myBool = True
print(myBool)
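
Python infers the datatype from the assigned value, and the built-in type() function reports it. The sketch below reuses the three variables from the listing; mixed is an extra name added here for illustration.

```python
myInt = 5
myFloat = 5.5
myBool = True

# Print each value next to the name of its datatype
for value in (myInt, myFloat, myBool):
    print(value, type(value).__name__)

# Mixing int and float in arithmetic produces a float
mixed = myInt + myFloat
print(mixed, type(mixed).__name__)
```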