Think of the number of the times you've signed your name to a piece
of paper during the last week. You sign checks, credit card statements,
legal documents, and letters. Your signature attests to the fact
that you (as opposed to someone else) have acknowledged and/or agreed with
the document's contents. In a digital world, one often want to indicate
the owner or creator of a document, or to signify one's agreement with
a document's content. A digital signature is a cryptographic
technique for achieving these goals in a digital world.
Just as with human signatures, digital signing should be done in such
a way that a digital signatures are verifiable, non-forgible, and non-repudiable.
That is, it must be possible to "prove" that a document signed by an individual
was indeed signed by that individual (the signature must be verifiable)
and that only that individual could have signed the document
(the signature can not be forged, and a signer can not later repudiate
or deny having signed the document). This is easily accomplished
with public key cryptography.
Does the digital signature, dB(m), meet our requirements of being verifiable, non-forgible, and non-repudiable? Suppose Alice has m and dB(m). She wants to prove in court (being litigious) that Bob had indeed signed the document and was the only person who could have possibly signed the document. Alice takes Bob's public key, eB, and applies it to the digital signature, dB(m), associated with the document, m. That is, she computes eB(dB(m)), and voila, with a dramatic flurry, she produces m, which exactly matches the original document! Alice then argues that only Bob could have signed the document because:
Thus we see that public key cryptography techniques provide a simple and elegant way to digitally sign documents that is verifiable, non-forgible, and non-repudiable, and that protects against later modification of the document.
Given the overheads of encryption and decryption, signing data via complete encryption/decryption can be overkill. A more efficient approach using so-called message digests can accomplish these two goals without full message encryption.
- the sender of the data is as claimed, i.e., that the sender has signed the data and this signature can be checked
- the transmitted data has not been changed since the sender created and signed the data.
A message digest is in many ways like a checksum. Message digest algorithms take a message, m, of arbitrary length and compute a fixed length "fingerprint" of the data known as a message digest, H(m). The message digest protects the data in the sense that if m is changed to m' (either maliciously or by accident) then H(m), computed for the original data (and transmitted with that data), will not match the H(m) computed over the changed data. While the message digest provides for data integrity, how does it help with signing the message m? The goal here is that rather than having Bob digitally sign (encrypt) the entire message by computing dB(m), he should be able to sign just the message digest by compting dB(H(m)). That is, having m and dB(H(m)) together (note that m is not typically encrypted) should be "just as good as" having a signed complete message, dB(m); this means that m and dB(H(m)) together should be non-forgible, verifiable, and non-repudiable. Nonforgible will require that the message digest algorithm that computes the message digest have some special properties, as we will see below.
Our definition of a message digest may seem quite similar to the definition of a checksum (e.g., the Internet checksum, see section 4.4) or a more powerful error detection code such as a cyclic redundancy check (see section 5.1). Is it really any different? Checksums, cyclic redundancy checks, and message digests are all examples of so-called hash functions. As shown in Figure 7.4-2, a hash function takes an input, m, and computes a fixed-size string known as a hash. The Internet checksum, CRC's and message digests all meet this definition. If signing a message digest is going to be "just as good as" signing the entire message, in particular if it is going to satisfy the non-forgibility requirement, then a message digest algorithm must have the following additional properties:
In the context of Bob sending a message to Alice, Figure 7.4-3 provides a summary of the operational procedure of creating a digital signature. Bob puts his original long message through a hash function to create a messge digest. He then encrypts the message digest with his own private key. The original message (in clear text) along with the digitally signed message digest (henceforth referred to as the digital signature) is then sent to Alice. Figure 7.4-4 provides a summary of the operational procedure of verifying message integrity. Alice applies the Bob's public key to the message to recover the message digest. Alice also applies the hash function to the clear text message to obtain a second message digest. If the two message digests match, then the recipientAlice can be sure about the integrity of the message, and sure that Bob sent the message.
Figure 7.4-5 (top) shows that the 4-byte checksum for this message is B2 C1 D2 AC. A slightly different message (and a much more costly one for Bob) is shown in the bottom half of Figure 7.5-1. The message "IOU100.99BOB" and "IOU900.19BOB" have the same checksum! Thus, this simple checksum algorithm violates the two required requirements above. Given the original data, it is simple to find another set of data with the same checksum. Clearly, for security purposes. we are going to need a more powerful hash function than a checksum.
Figure 7.4-5: Initial message and fraudulent message have the
same checksum!
The MD5 message digest algorithm by Ron Rivest [RFC 1321] is in wide use today. It computes a 128-bit message digest in a four-step process consisting of a padding step (adding a 1 followed by enough zero's so that the length of the message satisfies certain conditions), an append step (appending a 64-bit representation of the message length before padding), an initialization of an accumulator, and a final looping step in which the message's 16-word blocks are processed (mangled) in four rounds of processing. It is not known whether MD5 actually satisfies the requirements listed above. The author of MD5 claims "It is conjectured that the difficulty of coming up with two messages having the same message digest is on the order of 264 operations, and that the difficulty of coming up with any message having a given message digest is on the order of 2128 operations. "[RFC 1321]. No one has argued with this claim. For a description of MD5 (including a C source code implementation) see [RFC 1321]. Computational aspects of MD5 are discussed in [RFC 1810].
The second major message digest algorithm in use today is SHA-1, the Secure Hash Algorithm [FIPS 1995]. This algorithm is based on principles similar to those used in the design of MD4 [RFC 1320], the predecessor to MD5. The Secure Hash Algorithm (SHA-1), a US federal standard, is required for use whenever a secure message digest algorithm is required for federal applications. It produces a 160-bit message digest.