Knowledge Hub: Big Data: Introduction to ZooKeeper

Saturday, August 16, 2014

Big Data: Introduction to ZooKeeper

So far we have covered the following topics in Big Data:

· Big Data- The Rise and the Future

· Big data: Technology Stack

· Big Data: Hadoop Distributed Filesystem (HDFS)

· Big Data: Map Reduce

· Big Data- Installing Hadoop ( Single Node)

· Big Data- Apache Hadoop Multi Node

· Big Data: Troubleshooting, Administering and optimizing Hadoop

· Big Data: Managing HDFS

· Big Data: Map Reduce Development

· Big Data: Introduction to Pig

· Introduction to Hbase

In this post we will discuss ZooKeeper.

In the world of Hadoop, the theme is distribution. What if you want to build your own distributed application?

You have to worry about centralized configuration, synchronization, and serialization.

ZooKeeper is a distributed coordination service for distributed applications. It acts as a centralized repository that those applications can share.

ZooKeeper Overview

What is ZooKeeper?

- Distributed coordination service for distributed applications

- Used for synchronization, serialization, and coordination

- Handles the 'nitty-gritty' side of distributed application development

- Applications use these services to coordinate distributed processing

Distributed Challenges:

- Coordination is error prone

- Race conditions, deadlocks, partial failures, inconsistencies
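
To make the first of these challenges concrete, here is a minimal sketch of a lost update caused by a race condition, followed by one common remedy (a version check before writing). It is a plain in-memory illustration, not ZooKeeper code: the `store` dictionary and the function names are hypothetical, and the interleaving of the two clients is written out by hand so the result is deterministic.

```python
def unsafe_increments(store: dict) -> None:
    """Clients A and B each add 1, but both read before either writes."""
    a_seen = store["value"]          # A reads the current value
    b_seen = store["value"]          # B reads the same value (stale once A writes)
    store["value"] = a_seen + 1      # A writes its result
    store["value"] = b_seen + 1      # B overwrites it, so A's update is lost


def checked_write(store: dict, seen_version: int, new_value: int) -> bool:
    """Apply the write only if the version is unchanged since it was read."""
    if store["version"] != seen_version:
        return False                 # someone else wrote first; caller must retry
    store["value"] = new_value
    store["version"] += 1
    return True


def safe_increments(store: dict) -> None:
    """Same interleaving, but B's stale write is rejected and retried."""
    a_value, a_version = store["value"], store["version"]
    b_value, b_version = store["value"], store["version"]
    checked_write(store, a_version, a_value + 1)               # A succeeds
    if not checked_write(store, b_version, b_value + 1):       # B is rejected
        b_value, b_version = store["value"], store["version"]  # B re-reads
        checked_write(store, b_version, b_value + 1)           # B retries


lost = {"value": 0}
unsafe_increments(lost)

kept = {"value": 0, "version": 0}
safe_increments(kept)

print(lost["value"], kept["value"])  # prints "1 2"
```

Two increments should give 2, but the uncoordinated version ends at 1. A coordination service exists so that each application does not have to reinvent this kind of check.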

ZooKeeper Goals

- Serialization

- Reliability

- Simple API

- Atomicity

Typical Uses

- Configuration

- Message queue

- Notification/Synchronization

Now let's talk about the ZooKeeper architecture.

If we look closely at the diagram above, we see the following:

Znodes

- Container for data and other nodes

- Stores stat metadata and user data (up to 1 MB)
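
The znode idea above can be sketched as a small tree structure. This is a toy in-memory model, not the ZooKeeper client API: the class `ToyZnode`, its methods, and the single `version` field standing in for the stat metadata are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict

MAX_DATA_BYTES = 1024 * 1024  # the 1 MB per-znode limit mentioned above


def check_size(data: bytes) -> None:
    """Reject payloads larger than the per-znode limit."""
    if len(data) > MAX_DATA_BYTES:
        raise ValueError("znode data exceeds 1 MB")


@dataclass
class ToyZnode:
    """Hypothetical stand-in for a znode: holds data and child znodes."""
    data: bytes = b""
    version: int = 0  # simplified "stat": counts data changes only
    children: Dict[str, "ToyZnode"] = field(default_factory=dict)

    def set_data(self, data: bytes) -> None:
        check_size(data)
        self.data = data
        self.version += 1  # every successful write bumps the version

    def add_child(self, name: str, data: bytes = b"") -> "ToyZnode":
        if name in self.children:
            raise KeyError(f"child {name!r} already exists")
        check_size(data)
        child = ToyZnode(data=data)
        self.children[name] = child
        return child


root = ToyZnode()
app = root.add_child("app")                       # a znode used as a container
config = app.add_child("config", b"timeout=30")   # a znode holding user data
config.set_data(b"timeout=60")

print(config.data, config.version)
```

The point of the sketch is that a single znode plays both roles at once: `app` holds a child the way a directory would, while `config` holds a small payload the way a file would.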

Znode types

- Persistent

- Ephemeral

- Sequential
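
The difference between the three types is easiest to see in a small simulation. The sketch below is a toy in-memory model, not the real ZooKeeper client: `ToyZooKeeper`, its methods, and the integer session ids are hypothetical. It assumes the usual behavior that a persistent znode stays until it is deleted, an ephemeral znode disappears when the session that created it ends, and a sequential znode gets a zero-padded 10-digit counter appended to its name. The counter handling is simplified, and parent paths are not checked.

```python
from typing import Dict


class ToyZooKeeper:
    """Hypothetical in-memory model of znode types (not the real API)."""

    def __init__(self) -> None:
        self.nodes: Dict[str, bytes] = {}    # path -> data
        self.owners: Dict[str, int] = {}     # ephemeral path -> session id
        self.counters: Dict[str, int] = {}   # parent path -> next sequence number
        self.next_session = 1

    def open_session(self) -> int:
        session = self.next_session
        self.next_session += 1
        return session

    def create(self, session: int, path: str, data: bytes = b"",
               ephemeral: bool = False, sequential: bool = False) -> str:
        if sequential:
            # Simplification: one counter per parent, advanced only here.
            parent = path.rsplit("/", 1)[0]
            number = self.counters.get(parent, 0)
            self.counters[parent] = number + 1
            path = f"{path}{number:010d}"
        if path in self.nodes:
            raise KeyError(f"{path} already exists")
        self.nodes[path] = data
        if ephemeral:
            self.owners[path] = session      # remember who must stay alive
        return path

    def close_session(self, session: int) -> None:
        # Ephemeral znodes vanish with their session; persistent ones stay.
        for path in [p for p, s in self.owners.items() if s == session]:
            del self.nodes[path]
            del self.owners[path]


zk = ToyZooKeeper()
first = zk.open_session()
second = zk.open_session()

zk.create(first, "/config", b"v1")                                   # persistent
w1 = zk.create(first, "/workers/w-", ephemeral=True, sequential=True)
w2 = zk.create(second, "/workers/w-", ephemeral=True, sequential=True)

zk.close_session(first)   # simulates the first client going away
print(sorted(zk.nodes))
```

After the first session closes, its ephemeral worker znode is gone, while `/config` (persistent, created by the same session) and the second session's worker znode remain.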
