To understand digital fingerprints, or hashing, you first need to understand what a fingerprint is and what ‘digital’ means (no kidding).
What is a fingerprint?
Normally, a fingerprint in biology and biometrics is the unique pattern of whorls and lines on the fingertip of a human being. Forget all that for a while.
Just consider a fingerprint as a unique pattern.
A pattern so unique that an almost limitless number of separate patterns can be generated without any correlation between them. Imagine a world full of numbers, where every item you see, every sound you hear, and every other perception is a number. A fingerprint then needs two things: distinction from every other fingerprint, and similarity of some sort.
For example, if you have to compare two human beings, you can take their fingerprints, which share the same characteristics but are totally distinct.
What is digital?
In computers, all information is stored as binary numbers. For more clarity on how everything can be stored as 1s and 0s, you may read this short article: ‘What is digital information and how does the computer work? For a lawyer’.
Binary information is then stored on the storage device in units called files. Files vary in length. The word ‘India’ takes 5 bytes to store on a hard drive as a plain text file, while the entire Ramayana would take about three and a half million bytes, or 3.5 MB.
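To see the ‘India’ example in practice, here is a minimal sketch in Python. It assumes the word is stored in the common UTF-8 text encoding (an assumption, since the encoding is not named above); the variable names are only illustrative.

```python
# Turn a short word into the bytes a computer would store,
# then write each byte out as eight 0s and 1s.
word = "India"
data = word.encode("utf-8")   # assumed encoding: UTF-8, one byte per letter here

bits = " ".join(f"{byte:08b}" for byte in data)

print(len(data))  # 5 bytes for the 5 letters
print(bits)       # the same word as a pattern of 0s and 1s
```

Each group of eight digits in the output is one byte, so the five letters become five groups.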
What is a digital fingerprint?
While electronic file sizes are of variable length, the files are all made up of a similar structure of 0s and 1s.
The required distinction is the pattern in their composition of 0s and 1s, and the required similarity is that they are made up of patterns of 0s and 1s.
A digital fingerprint is a string of letters and numbers that is, for practical purposes, unique to every file. It is of a fixed length, whatever the size of the file. It is generated from the binary data of the file.
The words ‘digital fingerprint’, ‘message digest’, ‘digest’, ‘checksum’ and ‘hash’ are often used interchangeably.
A mathematical function called a hash function is then used to convert these long strings of binary data into a prescribed number of characters, say 32, 64 or 128.
This mathematical function works only one way. Many different inputs can produce the same fingerprint, so the fingerprint alone does not contain enough information to rebuild the source data, and with a well-designed hash function it is not practically feasible to find an input that produces a given fingerprint.
For example, if I were told to reduce a string of numbers into a digital fingerprint of two characters, I would break the original string of numbers into its individual digits and add the digits until I reach a two-digit number.
7778889990 = 7+7+7+8+8+8+9+9+9+0 = 72
It would then be impossible to work back from the number 72 to 7778889990, because many other numbers add up to 72 as well.
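The digit-adding example can be tried out with a few lines of Python. This is only a toy sketch of the idea, not a real hash function; the name `toy_fingerprint` is made up for illustration, and the input is assumed to contain digits only.

```python
def toy_fingerprint(number: str) -> int:
    """Toy fingerprint: add the digits, repeating until at most two digits remain."""
    total = sum(int(ch) for ch in number)
    while total > 99:  # still more than two digits, so add the digits again
        total = sum(int(ch) for ch in str(total))
    return total

print(toy_fingerprint("7778889990"))  # 72
print(toy_fingerprint("9998887770"))  # 72 again, from a different number
```

The second line shows why 72 cannot be worked back to the original: a different number gives the same result. Real hash functions are designed so that such clashes are extremely hard to find.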
Similarly the text:
“Internet developed rapidly leaving little or no scope for its terminologies to develop. Most internet terms and phrases are English loanwords most analogous to the concept being described.”
can first be changed to a string of binary numbers (you can read about it here) and then a mathematical function can be used to reduce the string to a specific set of numbers.
This reduction of a large file into a fixed set of numbers is called hashing. You can visit this site MD5 Online Generator to generate the MD5 hash of any text.
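If you would rather not use a website, the same thing can be done with Python's standard `hashlib` module. The sketch below assumes the text is encoded as UTF-8 before hashing and uses only the first sentence of the quoted passage; an online generator will give the same result only if it is fed exactly the same characters.

```python
import hashlib

text = ("Internet developed rapidly leaving little or no scope "
        "for its terminologies to develop.")

# Step 1: text -> bytes (the 0s and 1s). Step 2: bytes -> fixed-length MD5 hash.
digest = hashlib.md5(text.encode("utf-8")).hexdigest()

print(digest)       # the MD5 fingerprint, written in hexadecimal
print(len(digest))  # 32 characters, whatever the length of the text
```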
Properties of a hash
The hash of any file generated therefore:
- is the result of a one-way function (unlike encryption, it is not meant to be reversed)
- is quicker to transfer than the original source file
- changes extensively even with a small change to the input
- appears uncorrelated with any other hash value
- cannot, in practice, be recreated from a different input when a strong hash function is used
- is always the same with the same input
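Two of these properties are easy to check for yourself. The sketch below uses MD5 from Python's `hashlib`; the helper name `md5_hex` and the sample words are assumptions chosen for illustration.

```python
import hashlib

def md5_hex(text: str) -> str:
    """Return the MD5 hash of a piece of text as 32 hexadecimal characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

first = md5_hex("India")
second = md5_hex("india")  # only the case of the first letter differs

print(first == md5_hex("India"))  # True: same input, same hash
print(first == second)            # False: a tiny change gives a different hash

# Count the positions at which the two hashes differ.
differing = sum(a != b for a, b in zip(first, second))
print(f"{differing} of {len(first)} characters differ")
```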
What is the use of hashing?
File or Email transfer
Hashing is used mostly in internet communication, where one party needs to send a file securely to another party.
For example, Bimal wants to download a file from Amazon, and wants to be sure it is the same file and that it has not been infected with any malware while being transferred. He asks Amazon to deliver the MD5 hash of the file through a separate channel. After downloading and before using the file, Bimal computes the MD5 hash of the file and compares it with the hash that Amazon provided. If they are the same, Bimal can be confident that the file was not altered in transit and is the same file that Amazon hashed.
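Bimal's check could look something like the sketch below. The function names, the throwaway file and its contents are all hypothetical stand-ins, since no real download is described here. MD5 is used only because the example above uses it; `hashlib.sha256` could be swapped in the same way.

```python
import hashlib
import os
import tempfile

def file_md5(path: str, chunk_size: int = 65536) -> str:
    """Compute the MD5 hash of a file, reading it in chunks to save memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published_hash(path: str, published_hash: str) -> bool:
    """Compare the file's hash with the hash the sender provided separately."""
    return file_md5(path) == published_hash.strip().lower()

# Demo: a temporary file stands in for the downloaded file.
contents = b"example file contents"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(contents)
    download_path = tmp.name

published_hash = hashlib.md5(contents).hexdigest()  # what the sender would publish

unchanged = matches_published_hash(download_path, published_hash)
tampered = matches_published_hash(download_path, "0" * 32)  # a hash that does not match
os.remove(download_path)

print(unchanged)  # True
print(tampered)   # False
```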
Password verification
A properly built password form takes the password you enter, hashes it and compares it with the hash stored in its database. If the hashes match, access is granted.
Why hash it? Storing all user passwords as plain text can result in a massive security breach if the password file itself is compromised.
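A minimal sketch of such a password check follows. It goes a little beyond what is described above: it assumes a random ‘salt’ is stored with each password and uses PBKDF2 (a deliberately slow hashing scheme in Python's `hashlib`) rather than a single plain hash, as is commonly done for passwords. The sample password, the iteration count and the function names are illustrative assumptions.

```python
import hashlib
import hmac
import os

def hash_password(password: str, salt: bytes) -> bytes:
    """Hash a password with PBKDF2-HMAC-SHA256 (iteration count is an assumption)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

# Registration: only the salt and the hash are stored, never the password itself.
salt = os.urandom(16)
stored_hash = hash_password("correct horse", salt)

def verify(attempt: str) -> bool:
    """Hash the attempt the same way and compare it with the stored hash."""
    return hmac.compare_digest(hash_password(attempt, salt), stored_hash)

print(verify("correct horse"))  # True
print(verify("wrong guess"))    # False
```

If the database is stolen, the thief gets the salts and hashes, not the passwords themselves.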
If you would like to know more about hashing or digital fingerprints please leave your comments below.