The Public Key Infrastructure (PKI) industry recommends that any infrastructure object using SHA-1 be converted to the more secure SHA-2. This article describes why and how to do this.
In 2016, migrating to SHA-2 was good preparation for the general deadline; now the transition is mandatory to ensure security. Many devices and applications that use digital certificates already display warnings or errors, or refuse to work, if a certificate uses SHA-1 or an older, weaker hashing algorithm. Why these forced changes? Because serious cryptographic weaknesses have been discovered in SHA-1, and the days when its protection can still be relied on are numbered.
Up until 2022, SHA-1 was the most popular hash used for cryptographic signatures, and some, particularly older, applications and devices did not accept or understand hashes or certificates based on the SHA-2 algorithm. This was the main problem with the transition to the new standard.
What is a Python hash function
Hash functions are used in cryptographic algorithms, electronic signatures, message authentication codes, tamper detection, data fingerprinting, checksums (verifying message integrity), hash tables, password storage, and much more.
As a Python developer, you may need these functions to find duplicate data and files, check data integrity when transmitting information over a network, securely store passwords in databases, or perhaps do some cryptography-related work.
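As one illustration of the password-storage use case, here is a minimal sketch using PBKDF2 from the standard hashlib module. The function names hash_password and verify_password and the iteration count are our own assumptions, not a prescribed API; a real system would also store the iteration count alongside the salt and key.

```python
import hashlib
import hmac
import os
from typing import Tuple

# Assumed work factor for this sketch; tune it for your hardware.
ITERATIONS = 200_000

def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, derived_key) to store in a database (hypothetical helper)."""
    salt = os.urandom(16)  # a fresh random salt for every password
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key

def verify_password(password: str, salt: bytes, stored_key: bytes) -> bool:
    """Recompute the key with the stored salt and compare in constant time."""
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(key, stored_key)

salt, key = hash_password("correct horse")
print(verify_password("correct horse", salt, key))  # True
print(verify_password("wrong guess", salt, key))    # False
```

Only the salt and the derived key are stored; the password itself never is.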
Note that hash functions are not a cryptographic protocol; they do not encrypt or decrypt information, but are a fundamental part of many cryptographic protocols and tools.
Attacks on hashes
The strength of a cryptographic hash function also depends on each distinct message producing, in practice, a distinct hash. At the same time, it must be infeasible to reconstruct the original message from its hash alone; a preimage attack tries to defeat this property. In addition, it should be infeasible to find two different messages that produce the same hash. Such a pair is called a collision, and the birthday attack is aimed at finding one.
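To make the birthday attack concrete, the sketch below truncates SHA-256 to its first 3 bytes (24 bits), an assumption made purely so the search finishes quickly; the helper names are hypothetical. Because of the birthday paradox, a collision typically appears after a number of attempts on the order of 2^12 (a few thousand) rather than 2^24.

```python
import hashlib
from itertools import count

def truncated_hash(message: bytes, n_bytes: int = 3) -> bytes:
    """First n_bytes of SHA-256: a deliberately weak 'hash' for this demo."""
    return hashlib.sha256(message).digest()[:n_bytes]

def find_collision(n_bytes: int = 3):
    """Birthday search: hash distinct messages until two share a truncated digest."""
    seen = {}  # truncated digest -> message that produced it
    for i in count():
        message = f"message-{i}".encode()
        digest = truncated_hash(message, n_bytes)
        if digest in seen:
            return seen[digest], message, i + 1  # (first, second, attempts)
        seen[digest] = message

m1, m2, tries = find_collision()
print(m1, m2, tries)
```

Against the full 256-bit output, the same search would need around 2^128 attempts, which is why the truncation is only a teaching device.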
Conventional cryptographic hash functions are initially considered cryptographically strong, but over time, attackers find mathematical tricks that weaken their security.
The computational complexity of breaking a strong hash is 2 raised to the power of its stated effective bit length minus 1. Thus, as long as no flaws are known, a 128-bit hash has a computational complexity of 2^127. Once someone finds a mathematical method that cracks a hash with fewer operations than that, the hash is considered weakened. As a rule, all widely used hashes weaken over time, and as the effective bit length shrinks, the hash becomes less secure and less valuable. When a hash is believed to be crackable in a reasonable amount of time with relatively modest computing resources (costing hundreds of thousands to millions of dollars), it is considered "cracked" and should no longer be used. Malware authors and other attackers have used cracked hashes to create software that appears to be legitimately digitally signed; a well-known example is the Flame malware. In short, weak hashes can be exploited and should not be used.
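The arithmetic behind "computational complexity" can be made tangible with a quick estimate. The attacker speed of 10^12 hash evaluations per second is a hypothetical figure chosen only for illustration, as is the 2^63 shortcut used for comparison.

```python
# Hypothetical attacker speed; an assumption for illustration only.
HASHES_PER_SECOND = 10**12
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def years_to_search(exponent: int, rate: float = HASHES_PER_SECOND) -> float:
    """Years needed to perform 2**exponent hash evaluations at the given rate."""
    return 2**exponent / rate / SECONDS_PER_YEAR

print(f"{years_to_search(127):.2e} years")  # full strength of a 128-bit hash (2^127)
print(f"{years_to_search(63):.2f} years")   # a hypothetical shortcut down to 2^63
```

The first figure is on the order of 10^18 years, while the second is a fraction of a year: this gap is what "weakened" means in practice.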
Popular Python hash functions
Some commonly used hash functions:
- MD5: The algorithm produces a 128-bit hash value. It is widely used to check data integrity, but because of its known security vulnerabilities it is not suitable for security-sensitive uses.
- SHA: A family of algorithms whose earlier members, including SHA-1 and SHA-2, were developed by the US National Security Agency (NSA). They are part of the US Federal Information Processing Standard and are widely used in cryptographic applications. Depending on the variant, the hash (digest) length ranges from 160 to 512 bits.
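Python's standard hashlib module can report the output size of each of these algorithms, which is a quick way to compare the SHA variants with MD5. The selection of names below is our own.

```python
import hashlib

names = ("md5", "sha1", "sha224", "sha256", "sha384", "sha512")
# digest_size is given in bytes; multiply by 8 to get bits
sizes = {name: hashlib.new(name).digest_size * 8 for name in names}

for name, bits in sizes.items():
    print(f"{name:>7}: {bits} bits")
```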
The hashlib module in the Python standard library provides an interface to the most popular hashing algorithms. hashlib implements some algorithms itself; if OpenSSL is available, hashlib can also use the algorithms that OpenSSL provides.
This code is designed to work in Python 3.5 and higher. To run these examples in Python 2.x, simply remove the lines that use algorithms_available and algorithms_guaranteed.
First, the hashlib module is imported:
```python
import hashlib
```
Next, algorithms_available and algorithms_guaranteed are used to list the available algorithms:
```python
print(hashlib.algorithms_available)
print(hashlib.algorithms_guaranteed)
```
algorithms_available is a set of the names of all algorithms available on the system, including those provided by OpenSSL, so you may notice what look like duplicate names in it. algorithms_guaranteed contains only the algorithms the module supports on all platforms; md5, sha1, sha224, sha256, sha384 and sha512 are always present.
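With the module imported, hashing data takes a single call. The sketch below shows one-shot and incremental hashing with SHA-256, plus a chunked file-hashing helper; file_sha256 and its 64 KiB chunk size are our own illustrative choices.

```python
import hashlib

# One-shot hashing of a byte string
print(hashlib.sha256(b"abc").hexdigest())
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad

# Incremental hashing gives the same result as one-shot hashing
h = hashlib.sha256()
h.update(b"a")
h.update(b"bc")
print(h.hexdigest() == hashlib.sha256(b"abc").hexdigest())  # True

def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing file_sha256 results is one simple way to detect duplicate files or verify a download.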
Handling legacy SHA-1 hashes
All major web browser vendors (e.g. Microsoft, Google, Mozilla, Apple) and other trusted parties have recommended that all clients, services and products currently using SHA-1 migrate to SHA-2, although what must migrate, and when, depends on the vendor. For example, many vendors care only about TLS certificates (i.e. web servers), and only Microsoft is concerned about the use of SHA-1 in digital certificates issued by "public" certificate authorities. But you can expect all vendors eventually to require all applications and devices to move to SHA-2, even those that are not ready to do so. Currently, most browsers show an error message if a website uses a public digital certificate signed with SHA-1, but some let you bypass the warning and continue to the site. All major browser vendors may soon prohibit bypassing these errors and navigating to sites that use SHA-1 digital certificates.
Unfortunately, changing from SHA-1 to SHA-2 is a one-way operation in most server scenarios. For example, once you start using a SHA-2 digital certificate instead of a SHA-1 one, users whose software does not understand SHA-2 certificates will start receiving warnings, error messages, or even outright refusals. For users of apps and devices that don't support SHA-2, the transition will be a risky leap.
SHA-3
SHA-3 is the newest hashing algorithm in the SHA family; it was published by NIST in 2015 but has not yet been widely adopted. Although it belongs to the same family, its internal structure is completely different: it is based on the "sponge construction". A sponge is built on a random function or random permutation of data; it accepts any amount of input and can produce any amount of output, and its output is pseudo-random with respect to all previous inputs. This gives SHA-3 greater flexibility, and one goal is for it to be able to replace SHA-2 in typical TLS or VPN protocols that use that algorithm to verify data integrity and authenticity.
SHA-3 was created as an alternative to SHA-2, not because SHA-2 is unsafe to use, but to have a plan B in case of a successful attack against SHA-2, so that the two would coexist. In practice, SHA-3 has for many years been far less widely used than SHA-2.
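Since Python 3.6, hashlib exposes SHA-3 directly, along with the SHAKE functions, which are built on the same sponge and let the caller choose the output length; they show the "any amount of output" property in practice. The sample input is arbitrary.

```python
import hashlib

data = b"The quick brown fox"

# Fixed-length SHA-3 digest: 256 bits, i.e. 64 hex characters
print(hashlib.sha3_256(data).hexdigest())

# SHAKE128 is an extendable-output function: the length argument is in bytes
short_out = hashlib.shake_128(data).hexdigest(8)
long_out = hashlib.shake_128(data).hexdigest(32)
print(short_out)
print(long_out)
print(long_out.startswith(short_out))  # True: longer output extends the shorter one
```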
Operation and Specifications
SHA-3 uses a "sponge" design: data is "absorbed" and then "squeezed" out as output of the desired length. During the absorbing phase, each block of input is XORed into part of the internal state, which is then transformed by a permutation function. Part of the state (the capacity) is never output, and these extra bits protect SHA-3 against length extension attacks, to which MD5, SHA-1 and SHA-2 (in its untruncated forms such as SHA-256 and SHA-512) are vulnerable. Another important feature is its flexibility, which makes it easier to test against cryptanalytic attacks and suitable for lightweight applications. In software, SHA-512 is currently about twice as fast as SHA3-512, but SHA-3 lends itself to hardware implementation, where it can be just as fast or even faster.
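To show the absorb/squeeze structure, here is a toy sponge. It is NOT SHA-3 and not secure: the real Keccak-f permutation is replaced by SHA-256 as a hypothetical stand-in transformation, and the 8-byte rate and 24-byte capacity are arbitrary assumptions. It only mirrors the steps described above: pad the input, XOR each block into the outer part of the state, transform, then read output blocks.

```python
import hashlib

RATE = 8                 # bytes absorbed/squeezed per step (the "rate")
CAPACITY = 24            # bytes of state never output directly (the "capacity")
WIDTH = RATE + CAPACITY  # 32 bytes, the size of a SHA-256 digest

def f(state: bytes) -> bytes:
    """Toy stand-in for Keccak-f: maps a WIDTH-byte state to a new WIDTH-byte state."""
    return hashlib.sha256(state).digest()

def pad(message: bytes) -> bytes:
    """Byte-level pad10*1: append 0x01, zero-fill to a multiple of RATE, set the top bit."""
    padded = bytearray(message)
    padded.append(0x01)
    while len(padded) % RATE != 0:
        padded.append(0x00)
    padded[-1] |= 0x80
    return bytes(padded)

def toy_sponge(message: bytes, out_len: int) -> bytes:
    state = bytes(WIDTH)  # all-zero initial state
    data = pad(message)
    # Absorbing: XOR each RATE-byte block into the outer part of the state, then apply f.
    for i in range(0, len(data), RATE):
        block = data[i:i + RATE]
        outer = bytes(s ^ b for s, b in zip(state[:RATE], block))
        state = f(outer + state[RATE:])
    # Squeezing: read RATE bytes at a time, applying f between reads.
    out = bytearray()
    while True:
        out += state[:RATE]
        if len(out) >= out_len:
            return bytes(out[:out_len])
        state = f(state)

print(toy_sponge(b"abc", 20).hex())
```

The capacity bytes (state[RATE:]) are never copied to the output directly; this hidden part of the state is what the design relies on to resist length extension.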
List of coins that use the SHA-256 algorithm
Bitcoin (BTC), Bitcoin Cash (BCH) and Bitcoin SV (BSV) are three well-known coins that use the SHA-256 hashing algorithm.
In addition, there are hundreds of altcoins that you can mine with a SHA-256 ASIC. However, note that most of these projects are abandoned.
Mining such coins is pointless, because they are hard to sell: they are rarely traded on exchanges. That is why we list only well-known coins with a reasonable trading volume.
Profitability will be the same for all coins. We suggest you use mining calculators to find out which one is more profitable at the moment.
Profitability calculators:
- whattomine
- Profit-mine
- Coincalculators
How did the concept of hash come about?
Let’s take a short break from the stream of terms and information that can be hard for ordinary users to follow, and talk about the history of the term “hash”. For ease of understanding, we lay out the information in a table.
| Date (year) | Chronology of events |
|---|---|
| 1953 | According to the renowned mathematician and programmer Donald Knuth, it was in this period that IBM employee Hans Peter Luhn first proposed the idea of hashing. |
| 1956 | Arnold Dumey described the hashing principle that most modern programmers know: he proposed using the remainder of division by a prime number as the hash code. He also saw hashing as an ideal tool for solving the “dictionary problem”. |
| 1957 | A paper by Wesley Peterson in the IBM Journal of Research and Development was the first to seriously address searching in large files; it identified open addressing and the performance degradation caused by deletion. |
| 1963 | Werner Buchholz published a thorough study of hash functions. |
| 1967 | The modern hashing model was first mentioned in Herbert Hellerman's book “Digital Computer System Principles”. |
| 1968 | An influential survey by Robert Morris, published in Communications of the ACM, is considered the starting point for the concept of hashing and the term “hash” in the scientific literature. |
Interesting! Back in 1956, Soviet programmer Andrei Ershov called hashing “arrangement” and hash collisions “conflicts”. Unfortunately, neither term caught on.
What kind of “beast” is hashing?
To avoid a muddle in readers' minds, let’s start with what these terms mean in digital technology:
- hash function (“convolution”) – an algorithm that transforms an input data stream of arbitrary size into a short string with a fixed number of characters (the number depends on the algorithm);
- hashing – the process described in the previous paragraph;
- hash (hash code, hash sum) – the resulting short string (block) of several dozen seemingly random characters, in other words, the result of hashing;
- collision – the same hash for different data sets.
Based on these definitions, hashing is the process of compressing an input stream of information of any size (even the complete works of William Shakespeare) into a short “digest”: a fixed-length string of seemingly random letters and digits.
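The fixed-length property is easy to check with hashlib: inputs ranging from zero bytes to about 1.2 MB all produce a 64-character SHA-256 hex digest. The inputs are arbitrary examples.

```python
import hashlib

inputs = [b"", b"a", b"Shakespeare " * 100_000]  # from empty to about 1.2 MB
for data in inputs:
    digest = hashlib.sha256(data).hexdigest()
    # input length, digest length, and the start of the digest
    print(len(data), len(digest), digest[:16] + "...")
```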
Collisions
A hash collision occurs when two different sets of data produce the same hash code. Collisions are possible because a hash contains relatively few characters: the fewer characters the output uses, the more likely the same hash code is to repeat for different data sets. Some systems also apply hashing twice or chain two different algorithms; Bitcoin, for example, hashes block headers with SHA-256 applied twice and derives addresses from public keys using SHA-256 followed by RIPEMD-160. Experts generally recommend avoiding reliance on hashing in critical projects where possible. If a cryptographic hash function is unavoidable, the protocol must be tested for compatibility with the keys.
Important! Collisions will always exist. An algorithm that maps inputs of any size onto hash codes with a fixed number of characters will inevitably produce duplicates, because there are far more possible inputs than possible outputs of a given length. The risk of collisions can only be reduced.
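The dependence on hash length can be quantified with the standard birthday-bound approximation p ≈ 1 − exp(−k(k−1)/2^(n+1)) for k inputs and an n-bit hash, assuming the hash outputs behave like uniformly random values. The function name and the sample of 100,000 items are illustrative choices.

```python
import math

def collision_probability(n_items: int, hash_bits: int) -> float:
    """Approximate chance that at least two of n_items share an n-bit hash."""
    exponent = n_items * (n_items - 1) / 2 ** (hash_bits + 1)
    return -math.expm1(-exponent)  # 1 - e^(-x), accurate even for tiny x

for bits in (16, 32, 64, 128):
    print(bits, collision_probability(100_000, bits))
```

The probability falls sharply as the number of bits grows, which matches the point that shorter hashes repeat more often.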
Technical specifications
The fundamental characteristics of hashing protocols are as follows:
- Built-in equations that convert an arbitrary amount of information into a short string of characters and digits of a given length.
- Transparency for cryptographic audit.
- Functions that make it possible to reliably encode the original information.
- The ability to compute a hash sum using medium-power computing equipment.
It is also worth noting the important properties of these algorithms: they can “collapse” any data array, produce a hash of a specific length, and distribute output values evenly. Any change to the input message (another letter, digit or punctuation mark, even an extra space) changes the resulting hash code: it will have the same length but different characters.
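This sensitivity (often called the avalanche effect) is easy to observe: below, an extra trailing space changes the SHA-256 digest completely. The bit_difference helper is our own; for a well-designed hash, roughly half of the 256 bits are expected to differ, though the exact count varies with the input.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length digests differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

original = hashlib.sha256(b"To be, or not to be").digest()
modified = hashlib.sha256(b"To be, or not to be ").digest()  # one extra space

print(original.hex())
print(modified.hex())
print(bit_difference(original, modified), "of", len(original) * 8, "bits differ")
```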
Requirements
The following requirements are put forward for a hash function that is effective in all respects:
- the protocol must be sensitive to changes in the input: rearranged paragraphs, changed hyphens and other edits to text data must produce a different hash, even when the meaning of the text stays the same;
- the technology must transform the flow of information in such a way that it is impossible in practice to carry out the reverse procedure - to restore the original data from the hash value;
- the protocol must use mathematical equations that make collisions extremely unlikely (as noted above, they cannot be eliminated entirely).
These requirements are only feasible when the protocol is based on complex mathematical equations.