Data Structure : Hashing

What is Hashing?

Hashing is a technique used for storing and retrieving information as quickly as possible. It is used to perform optimal searches and is useful in implementing symbol tables

Why Hashing?

In the Trees chapter we saw that balanced binary search trees support operations such as insert, delete and search in O( logn) time. In applications, if we need these operations in O( 1), then hashing provides a way.

Understanding Hashing

We can treat Array has Hash table. But there is a problem.

Example: Write an algo for printing the first repeated character if there are duplicated elements in it.

Simple and Brute Force Way :

Given a string, for each character check whether that character is repeated or not. The time complexity of this approach is O( n2) with O( 1) space complexity.

Better Way : Use Array

Suppose the number of characters are ASCII char which is 256.
We create and array of length 256
If any char is repeated increment its count

Problem is :Suppose given array can have combination of numbers. In this case the set of possible values is infinity.

The process of mapping the keys to locations is called hashing.(linked list can grow dynamically)

Components of Hashing:

Hashing has four key components:

1) Hash Table :

Hash table is a generalization of array.

The basic idea behind hashing is to take a field in a record, known as the key, and convert it through some fixed process to a numeric value, known as the hash key, which represents the position to either store or find an item in the table.

Why Hash Table?

That means, given a key k, we find the element whose key is k by just looking in the kth position of the array. This is called direct addressing. if we have less locations and more possible keys, then simple array implementation is not enough.

In these cases one option is to use hash tables.

Hash table or hash map is a data structure that stores the keys and their associated values, and hash table uses a hash function to map keys to their associated values. The general convention is that we use a hash table when the number of keys actually stored is small relative to the number of possible keys.

2) Hash Functions :

The fixed process to convert a key to a hash key is known as a hash function.

The hash function is used to transform the key into the index. elements, a hash function that maps each item into a unique slot is referred to as a perfect hash function. Given an arbitrary collection of elements, there is no systematic way to construct a perfect hash function.

Our goal is to create a hash function that minimizes the number of collisions, is easy to compute, and evenly distributes the elements in the hash table.

Characteristics of Good Hash Functions :

A good hash function should have the following characteristics:

Minimize collision
Be easy and quick to compute
Distribute key values evenly in the hash table
Use all the information provided in the key
Have a high load factor for a given set of keys

3) Collisions :

Hash functions are used to map each key to a different address space, but practically it is not possible to create such a hash function and the problem is called collision. Collision is the condition where two records are stored in the same location.

4) Collision Resolution Techniques : See Link

Load Factor:

The load factor of a non-empty hash table is the number of items stored in the table divided by the size of the table. This is the decision parameter used when we want to rehash or expand the existing hash table entries.

Data Structure

Thursday, 13 April 2017

Hashing