What is Hashing? Hashing with an Example

Pawan Kumar Yadav
3 min read · Dec 2, 2023

Image credit: DALL-E 3

What is hashing?

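Hashing is a technique for transforming input data of any size into a fixed-size output called a hash value (or digest). The same input always produces the same hash value, and a good hash function makes it very unlikely that two different inputs produce the same one, although such collisions are possible in principle because the output size is fixed. Hash values are used for various purposes, such as verifying data integrity, storing and looking up data efficiently, and protecting sensitive information.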
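To make the fixed-size property concrete, here is a minimal sketch using Python's standard hashlib module. The helper name sha256_hex and the sample strings are made up for illustration.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of `text` as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

short_digest = sha256_hex("a")
long_digest = sha256_hex("a" * 10_000)

# Both digests are 64 hex characters (256 bits), whatever the input length.
print(len(short_digest), len(long_digest))

# The same input always gives the same digest...
print(sha256_hex("hello") == sha256_hex("hello"))
# ...while changing a single character gives a different one.
print(sha256_hex("hello") == sha256_hex("hellp"))
```

A 1-character input and a 10,000-character input both produce a digest of the same length, which is what makes hash values convenient to store and compare.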

Why do we need hashing?

Hashing is used for various purposes in computer science and information security due to its unique properties. Here are some key reasons why hashing is important:

1. Password Security:

  • Problem: Storing passwords in plain text is a significant security risk. If a database is compromised, all user passwords could be exposed.
  • Solution: Hashing passwords before storing them ensures that even if the database is breached, attackers see only the hash values, not the actual passwords. This adds a layer of security to user accounts.

2. Data Integrity:

  • Problem: When transmitting data over a network or storing it, there’s a risk of accidental or intentional data corruption.
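  • Solution: Hashing is used to create checksums and is a building block of digital signatures, allowing data recipients to verify the integrity of the received data. If the hash the recipient computes matches the one the sender published, the data is almost certainly intact. A plain hash guards against accidental corruption; guarding against deliberate tampering also requires that the published hash itself be trustworthy, for example by signing it.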

3. Efficient Data Retrieval:

  • Problem: In large datasets, searching for specific information can be time-consuming.
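  • Solution: Hash functions are employed in data structures like hash tables, which map a key directly to a storage location and so allow lookups in roughly constant time on average. This is essential for fast key-based searching in databases and in-memory indexes. Hash tables do not keep data in sorted order.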

4. Preventing Duplicate Data:

  • Problem: Storing several copies of the same data wastes space, and comparing large items byte by byte to find duplicates is slow.
  • Solution: Hash functions can be used to check if data already exists in a dataset, preventing the storage of identical information.

5. Consistent File Retrieval:

  • Problem: Finding and retrieving files quickly in a file system.
  • Solution: Hashing a file's name or contents produces a compact identifier that is unique for practical purposes, allowing for rapid and consistent file retrieval.
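Two of the uses above, data integrity and duplicate detection, can be sketched in a few lines. This is an illustration under assumed data, not a production design: the helper names digest and deduplicate and the sample byte strings are hypothetical.

```python
import hashlib

def digest(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()

# Data integrity: the sender publishes a digest, the receiver recomputes it.
original = b"quarterly-report-v1"
published_digest = digest(original)

received_ok = b"quarterly-report-v1"
received_corrupted = b"quarterly-report-v2"
print(digest(received_ok) == published_digest)         # True
print(digest(received_corrupted) == published_digest)  # False

# Duplicate detection: keep one copy per distinct digest.
def deduplicate(blobs):
    seen = set()
    unique = []
    for blob in blobs:
        key = digest(blob)
        if key not in seen:
            seen.add(key)
            unique.append(blob)
    return unique

print(deduplicate([b"alpha", b"beta", b"alpha"]))  # [b'alpha', b'beta']
```

Comparing short digests instead of full payloads is what keeps both checks cheap, even when the underlying data is large.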

Most commonly used Hashing Algorithms

  1. MD5 (Message Digest 5)
  2. SHA-1 (Secure Hash Algorithm 1)
  3. SHA-256 (Secure Hash Algorithm 2, 256-bit digest)
  4. SHA-3 (Secure Hash Algorithm 3)
  5. CRC32 (Cyclic Redundancy Check 32)
  6. MurmurHash
  7. XXHash
  8. SpookyHash

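The choice of hashing algorithm depends on the specific needs of the application. For security-sensitive work, use SHA-256 or SHA-3; MD5 and SHA-1 have known collision attacks and should be kept only for legacy compatibility. CRC32, MurmurHash, XXHash, and SpookyHash are fast non-cryptographic functions suited to checksums and hash tables, not to protecting data.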
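As a rough comparison, the sketch below prints the digest size of the algorithms from the list that ship with Python's standard library (hashlib and zlib). MurmurHash, XXHash, and SpookyHash need third-party packages and are left out. The sample message is arbitrary.

```python
import hashlib
import zlib

message = b"hashing example"

# Cryptographic hash functions from hashlib; digest_size is reported in bytes.
digest_bits = {}
for name in ("md5", "sha1", "sha256", "sha3_256"):
    h = hashlib.new(name, message)
    digest_bits[name] = h.digest_size * 8
    print(f"{name:<9} {digest_bits[name]:>3} bits  {h.hexdigest()}")

# CRC32 is a 32-bit checksum, not a cryptographic hash.
crc = zlib.crc32(message)
print(f"{'crc32':<9} {32:>3} bits  {crc:08x}")
```

A longer digest leaves more room between distinct inputs, which is one reason the 32-bit CRC32 is fine for spotting transmission errors but unsuitable where collisions must be hard to produce.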

Example of Hashing

Hashing Passwords in Python

Let’s hash a column of passwords stored in a pandas DataFrame. We’ll use the hashlib library for hashing and the pandas library for working with DataFrames.

Import Necessary Libraries

import pandas as pd
import hashlib

Let’s create a simple DataFrame with a column containing passwords.

data = {'Username': ['user1', 'user2', 'user3'],
        'Password': ['password123', 'secret456', 'secure789']}

df = pd.DataFrame(data)

Now, let’s create a new column in the DataFrame where we store the hashed values of the passwords.

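def hash_password(password):
    # Hash the UTF-8 bytes of the password with SHA-256
    sha256 = hashlib.sha256()
    sha256.update(password.encode('utf-8'))
    # Return the digest as a 64-character hexadecimal string
    return sha256.hexdigest()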

df['Hashed_Password'] = df['Password'].apply(hash_password)

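Now, if you print the DataFrame, you’ll see the original passwords next to their hashes. The listing below illustrates the layout; the hash values are abbreviated for illustration, so compare the structure rather than the exact digits.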

print(df)

   Username     Password                      Hashed_Password
0     user1  password123  fcaf00c65b3b3e470436380cf8e1c91a...
1     user2    secret456  1b79f7127f9b17112efb56b32f5e217a...
2     user3    secure789  e23eab389f62d2c135580e3924a4a812...

The plaintext Password column is kept here only to show the mapping. In a real application you would drop it and store only the hashed values.

Remember that a single pass of a fast hash such as SHA-256 is not enough for real-world password storage. Add a unique random salt to each password so that identical passwords produce different hashes and precomputed (rainbow table) attacks stop working. Also prefer an algorithm designed for passwords, such as bcrypt, scrypt, Argon2, or PBKDF2, which is deliberately slow to resist brute-force guessing.
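A minimal sketch of salting with only the standard library might look like the following. It uses hashlib.pbkdf2_hmac rather than the plain SHA-256 call from the example above. The function names hash_password_salted and verify_password, the 16-byte salt, and the iteration count are assumptions chosen for illustration, so tune them for your own environment.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor for illustration; tune for your hardware

def hash_password_salted(password, salt=None):
    """Derive a hash from `password` with PBKDF2-HMAC-SHA256.

    Returns (salt, derived_key). A fresh random salt is generated
    when none is supplied.
    """
    if salt is None:
        salt = os.urandom(16)
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, derived

def verify_password(password, salt, expected):
    """Recompute the hash with the stored salt and compare in constant time."""
    _, derived = hash_password_salted(password, salt)
    return hmac.compare_digest(derived, expected)

# Store both the salt and the derived key; the plaintext is never stored.
salt, stored = hash_password_salted("password123")
print(verify_password("password123", salt, stored))  # True
print(verify_password("wrong-guess", salt, stored))  # False
```

Because each user gets a different salt, two users with the same password end up with different stored hashes, and hmac.compare_digest avoids leaking information through comparison timing.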
