Secure hashing using Python Hashlib

This tutorial shows you how to create secure hashes using the built-in functionality of Python’s hashlib module.

Understanding why hashes matter and how to compute secure hashes programmatically is useful even if you don’t work in application security. But why?

As you work on Python projects, you may come across situations where you are concerned about storing passwords or other sensitive information in databases or source code files. In these cases, it is safer to run a hashing algorithm on the sensitive information and store the hash instead of the information.

This guide explains what hashing is and how it differs from encryption, and covers the properties of secure hash functions. We then compute hashes of plaintext messages in Python with a common hashing algorithm, using the built-in hashlib module.

With all that in mind, let’s get started!

What is a hash?

The hashing process takes a message string and produces a fixed-length output called a hash. For a given hashing algorithm, the length of the output hash is fixed, regardless of the length of the input. But how is it different from encryption?

Encryption transforms a message, or plaintext, into an encrypted output using an encryption algorithm and a key. You can then run the corresponding decryption algorithm on the encrypted output to retrieve the original message.

Hashing works differently. As we just saw, encryption is reversible: you can go from the message to the encrypted output and back again.

Unlike encryption, hashing is not a reversible process. That is, you cannot proceed from the hash to the input message.
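As a small illustration of the fixed-length property described above, the following sketch hashes three inputs of very different lengths with SHA256 and compares the lengths of the resulting hex digests. The sample messages are arbitrary placeholders, not values from this tutorial.

```python
import hashlib

# Three inputs of very different lengths (illustrative values only).
messages = [b"", b"hello", b"x" * 10_000]

# hexdigest() returns two hex characters per digest byte,
# so a 32-byte SHA256 digest is always 64 characters long.
digest_lengths = [len(hashlib.sha256(m).hexdigest()) for m in messages]
print(digest_lengths)  # [64, 64, 64]
```

Note that the hash object offers no counterpart to a decryption step: there is no method that turns a digest back into the message.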

Hash function properties

Let’s take a quick look at some properties that a hash function should satisfy.

  • Deterministic: Given a message m, the hash of m is always the same.
  • Preimage resistant: We already touched on this when we said that hashing is not a reversible operation. Preimage resistance means that, given an output hash, it is computationally infeasible to find a message m that produces it.
  • Collision resistant: It must be computationally infeasible to find two different message strings m1 and m2 such that the hash of m1 is equal to the hash of m2.
  • Second preimage resistant: Given a message m1 and its hash, hash(m1), it is computationally infeasible to find a different message m2 such that hash(m1) = hash(m2).

Python’s hashlib module

Python’s built-in hashlib module provides implementations of several hash and message digest algorithms, including the SHA and MD5 algorithms.

To use the constructors and functions of the hashlib module, import it into your working environment as follows:

 import hashlib

The hashlib module provides two constants, algorithms_available and algorithms_guaranteed. algorithms_available is the set of algorithms available in the running Python interpreter, while algorithms_guaranteed is the set of algorithms guaranteed to be supported on all platforms.

So algorithms_guaranteed is a subset of algorithms_available.

Start the Python REPL, import hashlib, and inspect the algorithms_available and algorithms_guaranteed constants. The exact contents vary with your Python version and platform.

 >>> hashlib.algorithms_available
 # Output
{'md5', 'md5-sha1', 'sha3_256', 'shake_128', 'sha384', 'sha512_256', 'sha512', 'md4', 
'shake_256', 'whirlpool', 'sha1', 'sha3_512', 'sha3_384', 'sha256', 'ripemd160', 'mdc2', 
'sha512_224', 'blake2s', 'blake2b', 'sha3_224', 'sm3', 'sha224'}
 >>> hashlib.algorithms_guaranteed
 # Output
{'md5', 'shake_256', 'sha3_256', 'shake_128', 'blake2b', 'sha3_224', 'sha3_384', 
'sha384', 'sha256', 'sha1', 'sha3_512', 'sha512', 'blake2s', 'sha224'}

The output confirms that algorithms_guaranteed is a subset of algorithms_available.
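Rather than comparing the two sets by eye, you can check the subset relationship programmatically. The sketch below uses Python's set comparison operator; the variable names are chosen here for illustration only.

```python
import hashlib

# Set comparison: True when every guaranteed algorithm is also available.
is_subset = hashlib.algorithms_guaranteed <= hashlib.algorithms_available
print(is_subset)

# Algorithms available in this interpreter but not guaranteed on every platform.
platform_specific = sorted(hashlib.algorithms_available - hashlib.algorithms_guaranteed)
print(platform_specific)
```

The second list depends on your Python build, so it may be empty or differ from the output shown earlier.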

Create a hash object in Python

Next, let’s learn how to create a hash object in Python. We will compute the SHA256 hash of a message string in two ways:

  • The generic new() constructor
  • An algorithm-specific constructor

Using the new() constructor

Let’s initialize the message string.

 >>> message = " is awesome!"

To instantiate a hash object, use the new() constructor and pass in the name of the algorithm:

 >>> sha256_hash = hashlib.new("SHA256")

Now you can call the hash object’s update() method with the message string as the argument.

 >>> sha256_hash.update(message)

Doing this will result in an error because hashing algorithms only work on byte strings.

 Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Unicode-objects must be encoded before hashing

To get the encoded string, call the encode() method on the message string and pass the result to update(). You can then call the hexdigest() method to obtain the SHA256 hash corresponding to the message string.

 sha256_hash.update(message.encode())
sha256_hash.hexdigest()
# Output:'b360c77de704ad8f02af963d7da9b3bb4e0da6b81fceb4c1b36723e9d6d9de3d'

Instead of encoding the message string using the encode() method, you can also define the message as a byte string by prefixing the string literal with b. Because update() appends to any data the hash object has already received, create a fresh hash object first:

 message = b" is awesome!"
sha256_hash = hashlib.new("SHA256")
sha256_hash.update(message)
sha256_hash.hexdigest()
# Output: 'b360c77de704ad8f02af963d7da9b3bb4e0da6b81fceb4c1b36723e9d6d9de3d'

The obtained hash is the same as the previous hash, confirming the deterministic nature of the hash function.
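One detail worth keeping in mind when reusing a hash object is that repeated update() calls are cumulative: they hash the concatenation of everything passed so far. The following sketch, with placeholder byte strings, shows that feeding a message in two pieces gives the same digest as hashing it in one call.

```python
import hashlib

# Feed the message to one hash object in two pieces.
incremental = hashlib.sha256()
incremental.update(b"hello, ")
incremental.update(b"world")

# Hash the concatenated message in a single call.
one_shot = hashlib.sha256(b"hello, world")

print(incremental.hexdigest() == one_shot.hexdigest())  # True
```

This is why calling update() twice with the same message on a single object does not reproduce the hash of that message alone.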

Furthermore, small changes in the message string can result in large changes in the hash (also known as the “avalanche effect”).

To confirm this, let’s change the “a” in “awesome” to “A” and calculate the hash.

 message = " is Awesome!"
h1 = hashlib.new("SHA256")
h1.update(message.encode())
h1.hexdigest()
# Output: '3c67f334cc598912dc66464f77acb71d88cfd6c8cba8e64a7b749d093c1a53ab'

You can see that the hash has changed completely.
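To put a rough number on the avalanche effect, you can count how many of the 256 output bits differ between the two digests. This is only a sketch; differing_bits is a hypothetical helper written for this example, not part of hashlib.

```python
import hashlib

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which the SHA256 digests of a and b differ."""
    x = int.from_bytes(hashlib.sha256(a).digest(), "big")
    y = int.from_bytes(hashlib.sha256(b).digest(), "big")
    # XOR leaves a 1 wherever the two digests disagree.
    return bin(x ^ y).count("1")

changed = differing_bits(b" is awesome!", b" is Awesome!")
print(f"{changed} of 256 bits differ")
```

For a well-behaved hash function you would expect the count to land somewhere near half of the bits, even though the inputs differ in a single character.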

Using algorithm-specific constructors

In the previous example, we used the generic new() constructor and passed “SHA256” as the name of the algorithm to create the hash object.

Alternatively, you can use the sha256() constructor as shown below.

 sha256_hash = hashlib.sha256()
message = " is awesome!"
sha256_hash.update(message.encode())
sha256_hash.hexdigest()
# Output: 'b360c77de704ad8f02af963d7da9b3bb4e0da6b81fceb4c1b36723e9d6d9de3d'

The output hash is the same as the hash we obtained earlier for the message string “ is awesome!”.
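The equivalence of the two constructors can also be checked directly. In this sketch the message is a placeholder, and the data is passed to each constructor up front, which has the same effect as calling update() afterwards.

```python
import hashlib

message = b"hashlib example"  # placeholder message for this sketch

# Both constructors accept the initial data as an optional argument.
via_new = hashlib.new("sha256", message).hexdigest()
via_named = hashlib.sha256(message).hexdigest()

print(via_new == via_named)  # True
```

The named constructor is usually the more convenient choice when the algorithm is fixed, while new() is handy when the algorithm name comes from configuration or user input.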

Examining the attributes of a hash object

Hash objects have several useful attributes.

  • The digest_size attribute gives the size of the digest in bytes. For example, the SHA256 algorithm returns a 256-bit hash, which is 32 bytes.
  • The block_size attribute gives the internal block size of the hashing algorithm in bytes.
  • The name attribute is the name of the algorithm, in a form that can be passed to the new() constructor. Checking this attribute is useful when the variable holding the hash object does not have a descriptive name.

You can see these attributes of the sha256_hash object you created earlier.

 >>> sha256_hash.digest_size
32
>>> sha256_hash.block_size
64
>>> sha256_hash.name
'sha256'

Next, let’s look at some interesting applications of hashing using Python’s hashlib module.

Practical example of hashing

Verifying software and file integrity

As developers, we constantly download and install software packages. This is true whether you’re working on a Linux distribution, Windows or Mac.

However, some mirrors for software packages may not be trustworthy. Download pages often list a hash (or checksum) next to the download link. You can verify the integrity of the downloaded software by computing its hash and comparing it to the official hash.

This also applies to files on your machine. Even small changes to the file’s contents can change the hash significantly. You can check if a file has been modified by validating the hash.

Here’s a simple example: Create a text file “my_file.txt” in your working directory and add content to it.

 $ cat my_file.txt
This is a sample text file.
We are  going to compute the SHA256 hash of this text file and also
check if the file has been modified by
recomputing the hash.

Next, open the file in read binary mode ( 'rb' ), read the file contents, and calculate the SHA256 hash as shown below.

 >>> import hashlib
>>> with open("my_file.txt","rb") as file:
...     file_contents = file.read()
...     sha256_hash = hashlib.sha256()
...     sha256_hash.update(file_contents)
...     original_hash = sha256_hash.hexdigest()

Here, the variable original_hash is the hash of “my_file.txt” in its current state.

 >>> original_hash
# Output: '53bfd0551dc06c4515069d1f0dc715d002d451c8799add29f3e5b7328fda9f8f'

Next, modify the file “my_file.txt”. You can remove the extra leading white space before the word “going”. 🙂

Compute the hash again and store it in the computed_hash variable.

 >>> import hashlib
>>> with open("my_file.txt","rb") as file:
...     file_contents = file.read()
...     sha256_hash = hashlib.sha256()
...     sha256_hash.update(file_contents)
...     computed_hash = sha256_hash.hexdigest()

Then you can add a simple assert statement that asserts whether computed_hash is equal to original_hash .

 >>> assert computed_hash == original_hash

If the file has changed (which it has here), an AssertionError is raised.

 Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError
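The steps above read the whole file into memory, which is fine for small files. For large downloads, one possible variation is to feed the file to the hash object in chunks. The following is a minimal sketch under that assumption: file_sha256 and its chunk_size default are hypothetical names chosen here, and a temporary file stands in for my_file.txt so the example runs anywhere.

```python
import hashlib
import os
import tempfile

def file_sha256(path, chunk_size=65536):
    """Return the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Create a throwaway file and record its hash.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"We are  going to compute the SHA256 hash.\n")
    demo_path = tmp.name
original_hash = file_sha256(demo_path)

# Modify the file (remove the extra space) and compare hashes.
with open(demo_path, "wb") as f:
    f.write(b"We are going to compute the SHA256 hash.\n")
file_changed = file_sha256(demo_path) != original_hash
print(file_changed)  # True

os.remove(demo_path)  # clean up the temporary file
```

Returning a boolean instead of using assert lets the calling code decide how to react to a mismatch.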

You can also use hashes when storing sensitive information such as passwords in a database, and for password authentication when connecting to a database: the hash of the entered password is validated by comparing it to the stored hash of the correct password.
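The password check described above might be sketched as follows. Note that this example deliberately swaps in a different technique from the plain SHA256 calls used earlier: it uses hashlib.pbkdf2_hmac with a random salt, since a single fast hash is generally considered too weak for stored passwords. The function names, the 16-byte salt, and the iteration count are all assumptions made for illustration, not recommendations from this tutorial.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune for your own hardware

def hash_password(password: str):
    """Return (salt, derived_key) for storage; hypothetical helper."""
    salt = os.urandom(16)  # a fresh random salt per password
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key

def verify_password(password: str, salt: bytes, stored_key: bytes) -> bool:
    """Recompute the key from the entered password and compare it."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(candidate, stored_key)

salt, key = hash_password("correct horse")
print(verify_password("correct horse", salt, key))  # True
print(verify_password("wrong guess", salt, key))    # False
```

Only the salt and the derived key would be stored; the plaintext password never needs to be kept.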

Conclusion

I hope this tutorial helped you learn about generating secure hashes using Python. The key points are:

  • Python’s hashlib module provides ready-to-use implementations of several hashing algorithms. You can use hashlib.algorithms_guaranteed to get the set of algorithms guaranteed on all platforms.
  • To create a hash object, you can use the generic new() constructor with the syntax hashlib.new("algo-name"). Alternatively, you can use a constructor that corresponds to a specific hashing algorithm, such as hashlib.sha256() for SHA256 hashes.
  • After creating the hash object, call its update() method with the message encoded as bytes, then call hexdigest() to retrieve the hash.
  • Hashes are useful for checking the integrity of software artifacts or files, or for storing sensitive information in a database.
