
Monday, September 21, 2015

Creating a New Table in HIVE




CREATE TABLE IF NOT EXISTS mydb.employee (empid int, firstname String, salary String)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

Here, mydb is the database name and employee is the table name.
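The table above expects '|'-delimited fields and newline-terminated records. As an illustration (the file name and sample values below are made up for this sketch), such a data file can be generated in Python:

```python
# Write a pipe-delimited data file matching the employee table layout:
# empid|firstname|salary, one record per line.
rows = [
    (1, "Alice", "50000"),
    (2, "Bob", "62000"),
]

with open("employee.txt", "w") as f:
    for empid, firstname, salary in rows:
        f.write(f"{empid}|{firstname}|{salary}\n")
```

The resulting file can then be loaded with LOAD DATA LOCAL INPATH 'employee.txt' INTO TABLE mydb.employee;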

Saturday, September 12, 2015

[ERROR] Terminal initialization failed; falling back to unsupported: java.lang.IncompatibleClassChangeError


While configuring Apache Hive on top of Hadoop, we may get the error message below:
[ERROR] Terminal initialization failed; falling back to unsupported
java.lang.IncompatibleClassChangeError


To fix this, open the "hive-config.sh" file and add the line below:

export HADOOP_USER_CLASSPATH_FIRST=true

Friday, September 11, 2015

[ERROR]: XDG_RUNTIME_DIR not set in the environment.


While configuring Apache Hadoop, we may often come across the error message below:
[ERROR]: XDG_RUNTIME_DIR not set in the environment.
To fix this, run the command below:

sudo pkexec env DISPLAY=$DISPLAY XAUTHORITY=$XAUTHORITY gedit

Top Useful HDFS Commands



# Open a terminal window to the current working directory.
# /usr/local/hadoop
# 1. Print the Hadoop version
hadoop version
# 2. List the contents of the root directory in HDFS

hadoop fs -ls /
# 3. Report the amount of space used and available on the currently mounted filesystem

hadoop fs -df hdfs:/
# 4. Count the number of directories/files and bytes under the paths that match the specified file pattern

hadoop fs -count hdfs:/
# 5. Run a DFS filesystem checking utility

hadoop fsck /
# 6. Run a cluster balancing utility

hadoop balancer
# 7. Create a new directory named “warehouse” below the /user/mypractice directory in HDFS.
hadoop fs -mkdir /user/mypractice/warehouse
# 8. Add a sample text file from the local directory named “data” to the new directory you created in HDFS  during the previous step.

hadoop fs -put data/sample.txt /user/mypractice/warehouse
# 9. List the contents of this new directory in HDFS.

hadoop fs -ls /user/mypractice/warehouse
# 10. Add the entire local directory called “retail” to the /user/mypractice directory in HDFS.

hadoop fs -put data/retail /user/mypractice
# 11. Since /user/mypractice is your home directory in HDFS, any command that does not have an absolute path is interpreted as relative to that directory. The next command will therefore list your home directory, and should show the items you’ve just added there.

hadoop fs -ls
# 12. See how much space this directory occupies in HDFS.

hadoop fs -du -s -h retail
# 13. Delete a file ‘customers’ from the “retail” directory.

hadoop fs -rm retail/customers
# 14. Ensure this file is no longer in HDFS.

hadoop fs -ls retail/customers
# 15. Delete all files from the “retail” directory using a wildcard.

hadoop fs -rm retail/*
# 16. To empty the trash

hadoop fs -expunge
# 17. Finally, remove the entire retail directory and all of its contents in HDFS.

hadoop fs -rm -r retail
# 18. List the home directory again to confirm the deletion

hadoop fs -ls
# 19. Add the purchases.txt file from the local directory named “/home/training/” to the directory you created in HDFS

hadoop fs -copyFromLocal /home/training/purchases.txt /user/mypractice/warehouse
# 20. To view the contents of your text file purchases.txt which is present in your hadoop directory.

hadoop fs -cat /user/mypractice/warehouse/purchases.txt
# 21. Copy the purchases.txt file from the HDFS directory "/user/mypractice/warehouse" to the local directory "/home/training/data"

hadoop fs -copyToLocal /user/mypractice/warehouse/purchases.txt /home/training/data
# 22. cp is used to copy files between directories present in HDFS

hadoop fs -cp /user/mypractice/*.txt /user/mypractice/warehouse
# 23. '-get' can be used as an alternative to '-copyToLocal'

hadoop fs -get /user/mypractice/warehouse/sample.txt /home/training/
# 24. Display last kilobyte of the file “purchases.txt” to stdout.

hadoop fs -tail /user/mypractice/warehouse/purchases.txt
# 25. New files in HDFS get permissions 644 by default (666 masked by the cluster's umask). Use '-chmod' to change the permissions of a file

hadoop fs -ls /user/mypractice/warehouse/purchases.txt
sudo -u hdfs hadoop fs -chmod 600 /user/mypractice/warehouse/purchases.txt
# 26. Default names of owner and group are training, training. Use ‘-chown’ to change owner name and group name simultaneously

hadoop fs -ls /user/mypractice/warehouse/purchases.txt
sudo -u hdfs hadoop fs -chown root:root /user/mypractice/warehouse/purchases.txt
# 27. Default name of group is training
# Use ‘-chgrp’ command to change group name

hadoop fs -ls /user/mypractice/warehouse/purchases.txt
sudo -u hdfs hadoop fs -chgrp training /user/mypractice/warehouse/purchases.txt
# 28. Move a directory from one location to other
hadoop fs -mv /user/mypractice/warehouse /user/mypractice/retail
# 29. The default replication factor of a file is 3. Use '-setrep' to change the replication factor of a file

hadoop fs -setrep -w 2 /user/mypractice/warehouse/sample.txt
# 30. Copy a directory from one cluster to another
# Use the 'distcp' tool to copy;
# -overwrite overwrites existing files,
# -update synchronizes the two directories

hadoop distcp hdfs://namenodeA/apache_hadoop hdfs://namenodeB/hadoop
# 31. Command to make the NameNode leave safe mode

sudo -u hdfs hdfs dfsadmin -safemode leave
# 32. List all the hadoop file system shell commands

hadoop fs
# 33. Last but not least, always ask for help!

hadoop fs -help
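When the same sequence of commands needs to be repeated, it can be scripted. A minimal Python sketch (the helper names here are hypothetical, and it assumes the `hadoop` binary is on the PATH):

```python
import subprocess

def hdfs_cmd(*args):
    """Build the argument list for a 'hadoop fs' subcommand."""
    return ["hadoop", "fs", *args]

def run_hdfs(*args):
    """Run a 'hadoop fs' command and return its stdout (raises if it fails)."""
    result = subprocess.run(hdfs_cmd(*args), capture_output=True, text=True, check=True)
    return result.stdout

# Example calls (commented out; they require a running Hadoop cluster):
# print(run_hdfs("-ls", "/"))
# print(run_hdfs("-du", "-s", "-h", "/user/mypractice/warehouse"))
```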

Thursday, September 3, 2015

Hive: How to View Column Header with Resultset

In general, when we query in Hive, the result set is displayed without column names. To overcome this, execute the command below in the Hive CLI:

set hive.cli.print.header=true;
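Note that set only affects the current CLI session. To make the header the default for every session, the same property can be set in hive-site.xml:

```xml
<property>
  <name>hive.cli.print.header</name>
  <value>true</value>
</property>
```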