HDFS Commands

Command | Description | Usage
fsck | HDFS command to check the health of the Hadoop file system | hdfs fsck
ls | HDFS command to display the list of files and directories in HDFS | hdfs dfs -ls
mkdir | | hdfs dfs -mkdir /directory_name
touchz | HDFS command to create a file in HDFS with a file size of 0 bytes | hdfs dfs -touchz /directory/filename
du | HDFS command to […]

Optimizing Hive Query Performance Through Mapjoin

Let us explore three parameters that have a significant impact on Hive query performance: hive.auto.convert.join.noconditionaltask = true; hive.auto.convert.join.noconditionaltask.size = 10000000; hive.mapjoin.smalltable.filesize. hive.auto.convert.join.noconditionaltask: Added in Hive 0.11.0, and true by default. This means that if the sum of the sizes of n-1 of the tables/partitions in an n-way join is smaller than the size specified by hive.auto.convert.join.noconditionaltask.size (10 MB by default), the join is directly converted to […]
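The size rule in this excerpt can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Hive's actual planner code: the function name and constant are hypothetical, and it assumes the n-1 tables considered are the n-1 smallest ones, with the largest table left as the streamed side.

```python
# Default of hive.auto.convert.join.noconditionaltask.size, in bytes (10 MB).
DEFAULT_NOCONDITIONALTASK_SIZE = 10_000_000


def qualifies_for_direct_mapjoin(table_sizes_bytes,
                                 threshold_bytes=DEFAULT_NOCONDITIONALTASK_SIZE):
    """Return True if the n-1 smallest inputs of an n-way join fit under the threshold.

    Hypothetical helper: it models only the size comparison described in the
    text and ignores everything else the Hive optimizer takes into account.
    """
    if len(table_sizes_bytes) < 2:
        raise ValueError("a join needs at least two tables or partitions")
    ordered = sorted(table_sizes_bytes)
    # Assumption: the largest input is the big table and is excluded from the sum.
    small_side_total = sum(ordered[:-1])
    return small_side_total < threshold_bytes


if __name__ == "__main__":
    # A 5 GB fact table joined with two small dimension tables (3 MB + 4 MB).
    print(qualifies_for_direct_mapjoin([5_000_000_000, 3_000_000, 4_000_000]))
    # Same join, but the small tables now total 11 MB.
    print(qualifies_for_direct_mapjoin([5_000_000_000, 6_000_000, 5_000_000]))
```

With the default threshold, the first join qualifies because the two small tables total 7 MB, while the second does not because they total 11 MB.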

TEZ Memory Tuning Checklist

Tez Application Master: tez.am.resource.memory.mb should be a multiple of yarn.scheduler.minimum-allocation-mb but less than yarn.scheduler.maximum-allocation-mb. The Application Master Java heap size (tez.am.launch.cmd-opts) should by default be 80% of tez.am.resource.memory.mb.
Tez Container: Set hive.tez.container.size to be the same as, or a small multiple (1 or 2 times) of, the YARN container size yarn.scheduler.minimum-allocation-mb, but NEVER more than yarn.scheduler.maximum-allocation-mb, […]
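The checklist items in this excerpt lend themselves to a small sanity check. The sketch below is a hypothetical helper, not part of Tez, Hive, or YARN. It assumes all values are in MB, that the Application Master size is meant to be a multiple of the YARN minimum allocation (a value cannot be a multiple of the maximum allocation and also stay below it), and that equalling the maximum allocation is acceptable.

```python
def suggested_am_heap_mb(am_mb):
    """Return 80% of tez.am.resource.memory.mb, rounded down, as a heap size in MB."""
    return am_mb * 80 // 100


def check_tez_memory(am_mb, container_mb, min_alloc_mb, max_alloc_mb):
    """Return a list of checklist violations (an empty list means all checks pass).

    am_mb        -> tez.am.resource.memory.mb
    container_mb -> hive.tez.container.size
    min_alloc_mb -> yarn.scheduler.minimum-allocation-mb
    max_alloc_mb -> yarn.scheduler.maximum-allocation-mb
    """
    problems = []
    if am_mb % min_alloc_mb != 0:
        problems.append("tez.am.resource.memory.mb is not a multiple of the minimum allocation")
    if am_mb > max_alloc_mb:
        problems.append("tez.am.resource.memory.mb exceeds the maximum allocation")
    if container_mb % min_alloc_mb != 0:
        problems.append("hive.tez.container.size is not a multiple of the minimum allocation")
    if container_mb > max_alloc_mb:
        problems.append("hive.tez.container.size exceeds the maximum allocation")
    return problems


if __name__ == "__main__":
    # Illustrative values only; they are not recommendations from the text.
    print(check_tez_memory(am_mb=4096, container_mb=4096,
                           min_alloc_mb=1024, max_alloc_mb=8192))
    # Heap size that would go into -Xmx in tez.am.launch.cmd-opts.
    print(f"-Xmx{suggested_am_heap_mb(4096)}m")
```

Running the check before applying new settings makes it easy to spot a container size that YARN would refuse or round up.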