In this blog post, we discuss troubleshooting, administering, and optimizing Hadoop.
TROUBLESHOOTING:
Let's look at the Hadoop base directory.
-Remember: logs are your best friend
*error messages
*Java exceptions
In the Hadoop home directory, you can find all the logs in the logs folder. If for some reason there is no logs folder, the logs can be found in the libexec folder. Let's see where Hadoop stores its files:
-Logs are named by machine
*user, daemon (see the log-scanning sketch after this list)
-Cluster start-up issues are most often due to misconfiguration.
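As a minimal log-scanning sketch (assuming a default install where $HADOOP_HOME points at the Hadoop base directory and daemon logs follow the usual hadoop-<user>-<daemon>-<hostname>.log naming):

    # List the daemon logs; each file name encodes the user, the
    # daemon, and the machine, e.g. hadoop-hduser-namenode-node1.log
    ls $HADOOP_HOME/logs

    # Scan the NameNode log for error messages and Java exceptions
    grep -iE "error|exception" $HADOOP_HOME/logs/hadoop-*-namenode-*.log | tail -20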
Administration:
- Commission and decommission: commissioning is just adding a node to the slaves file, configuring its hdfs-site and mapred-site files, and bringing it online.
To decommission a node, add it to an exclude file so HDFS can retire it gracefully, and also remove it from the slaves file (see the decommissioning sketch after this list).
-Check for corruption: fsck is a great way to check for file corruption (example after this list).
-Override the default configuration: settings in the site-specific *-site.xml files override the shipped defaults.
-Copy data in, out, and across clusters: distributed copy (distcp) actually uses MapReduce to spread the copy work across the clusters (example after this list).
-Tuning and optimization
-Troubleshooting jobs and nodes
-Safe mode: it's actually a read-only mode for HDFS (commands after this list).
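A minimal decommissioning sketch, assuming a Hadoop 1.x-style setup where hdfs-site.xml points dfs.hosts.exclude at an exclude file (the file path and host name below are illustrative):

    # Add the node to the exclude file that dfs.hosts.exclude points at
    echo "node5.example.com" >> /etc/hadoop/conf/excludes

    # Tell the NameNode to re-read its include/exclude lists; HDFS then
    # re-replicates the node's blocks before retiring it
    hadoop dfsadmin -refreshNodes

    # Watch progress; the node is listed as "Decommission in progress"
    hadoop dfsadmin -report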
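For example, fsck can be run against the whole namespace or a single path (the paths here are illustrative):

    # Check the whole filesystem for missing or corrupt blocks
    hadoop fsck /

    # Show per-file block details and their locations for one directory
    hadoop fsck /user/data -files -blocks -locations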
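A short sketch of the copy commands (host names and paths are illustrative):

    # Copy data into and out of HDFS from the local filesystem
    hadoop fs -put localfile.txt /user/data/
    hadoop fs -get /user/data/part-00000 .

    # Distributed copy across clusters; distcp launches a MapReduce job
    # whose map tasks do the copying in parallel
    hadoop distcp hdfs://namenode1:8020/src hdfs://namenode2:8020/dest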
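The safe mode commands themselves are simple:

    # Check whether the NameNode is in safe mode
    hadoop dfsadmin -safemode get

    # Enter or leave it manually; HDFS stays read-only while inside
    hadoop dfsadmin -safemode enter
    hadoop dfsadmin -safemode leave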
Optimization:
Below are the areas we can optimize; most of the optimization falls under MapReduce (a sketch of common knobs follows).
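As a hedged sketch (the property names below are the classic Hadoop 1.x ones, and the jar and class names are made up for illustration), cluster-wide defaults live in mapred-site.xml, while a job that uses ToolRunner can override them per run:

    # Per-job overrides via the -D flags:
    #   io.sort.mb          - map-side sort buffer size in MB
    #   mapred.reduce.tasks - number of reduce tasks for the job
    hadoop jar myjob.jar com.example.MyJob \
        -D io.sort.mb=200 -D mapred.reduce.tasks=10 \
        /input /output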
