7.4 Integrity

Think of the number of the times you've signed your name to a piece of paper during the last week. You sign checks, credit card statements, legal documents, and letters. Your signature attests to the fact that you (as opposed to someone else) have acknowledged and/or agreed with the document's contents. In a digital world, one often want to indicate the owner or creator of a document, or to signify one's agreement with a document's content. A digital signature is a cryptographic technique for achieving these goals in a digital world.

Just as with human signatures, digital signing should be done in such a way that a digital signatures are verifiable, non-forgible, and non-repudiable. That is, it must be possible to "prove" that a document signed by an individual was indeed signed by that individual (the signature must be verifiable) and that only that individual could have signed the document (the signature can not be forged, and a signer can not later repudiate or deny having signed the document). This is easily accomplished with public key cryptography.

7.4.1 Generating Digital Signatures

Suppose that Bob wants to digitally sign a "document," m. We can think of the document as a file or a message that Bob is going to sign and send. As shown in Figure 7.4-1, to sign this document, Bob simply uses his private decryption key, d_B, to compute d_B(m). At first, it might seem odd that Bob is running a decryption algorithm over a document that hasn't been encrypted. But recall that "decryption" is nothing more than a mathematical operation (exponentiation to the power of d in RSA; see section 7.2) and recall that Bob's goal is not to scramble or obscure the contents of the document, but rather to sign the document in a manner that is verifiable, non-forgible, and non-repudiable. Bob has the document, m, and his digital signature of the document, d_B(m).

Figure 7.4-1: Creating a digital signature for a document.

Does the digital signature, d_B(m), meet our requirements of being verifiable, non-forgible, and non-repudiable? Suppose Alice has m and d_B(m). She wants to prove in court (being litigious) that Bob had indeed signed the document and was the only person who could have possibly signed the document. Alice takes Bob's public key, e_B, and applies it to the digital signature, d_B(m), associated with the document, m. That is, she computes e_B(d_B(m)), and voila, with a dramatic flurry, she produces m, which exactly matches the original document! Alice then argues that only Bob could have signed the document because:

Whoever signed the message must have used the private encryption key, d_Bin computing the signatured_B(m), such that e_B(d_B(m)) = m.
The only person who could known the private key, d_B , is Bob. Recall from our discussion of RSA in section 7.2 that knowing the public key e_B is of no help in learning the private key d_B. Therefore, the only person who could know d_Bis the person who generated the pair of keys, (e_B,d_B) in the firstplace, Bob.

It is also important to note that if the original document, m, is ever modified to some alternate form, m', the signature that Bob created for m will not be valid for m', since e_B(d_B(m)) does not equal m'.

Thus we see that public key cryptography techniques provide a simple and elegant way to digitally sign documents that is verifiable, non-forgible, and non-repudiable, and that protects against later modification of the document.

7.4.2 Message Digests

We have seen above that public key encryption technology can be used to create a digital signature. One concern with signing data by encryption, however, is that encryption and decryption are computationally expensive. When digitally signing a really important document, say a merger between two large multinational corporations or an agreement with a child to have him/her clean her room weekly, computational cost may not may be important. However, many network devices and processes (e.g., routers exchanging routing table information and email user agents exchanging email) routinely exchange data that may not need to be encrypted. Nonetheless, they do want to ensure that:

the sender of the data is as claimed, i.e., that the sender has signed the data and this signature can be checked

the transmitted data has not been changed since the sender created and signed the data.

Given the overheads of encryption and decryption, signing data via complete encryption/decryption can be overkill. A more efficient approach using so-called message digests can accomplish these two goals without full message encryption.

A message digest is in many ways like a checksum. Message digest algorithms take a message, m, of arbitrary length and compute a fixed length "fingerprint" of the data known as a message digest, H(m). The message digest protects the data in the sense that if m is changed to m' (either maliciously or by accident) then H(m), computed for the original data (and transmitted with that data), will not match the H(m) computed over the changed data. While the message digest provides for data integrity, how does it help with signing the message m? The goal here is that rather than having Bob digitally sign (encrypt) the entire message by computing d_B(m), he should be able to sign just the message digest by compting d_B(H(m)). That is, having m and d_B(H(m)) together (note that m is not typically encrypted) should be "just as good as" having a signed complete message, d_B(m); this means that m and d_B(H(m)) together should be non-forgible, verifiable, and non-repudiable. Nonforgible will require that the message digest algorithm that computes the message digest have some special properties, as we will see below.

Figure 7.4-2: Hash functions are used to create message digests.

Our definition of a message digest may seem quite similar to the definition of a checksum (e.g., the Internet checksum, see section 4.4) or a more powerful error detection code such as a cyclic redundancy check (see section 5.1). Is it really any different? Checksums, cyclic redundancy checks, and message digests are all examples of so-called hash functions. As shown in Figure 7.4-2, a hash function takes an input, m, and computes a fixed-size string known as a hash. The Internet checksum, CRC's and message digests all meet this definition. If signing a message digest is going to be "just as good as" signing the entire message, in particular if it is going to satisfy the non-forgibility requirement, then a message digest algorithm must have the following additional properties:

Given a message digest value, x, it is computationally infeasible to find a message, y, such that H(y) = x;
It is computationally infeasible to find any two messages x and y such that H(x) = H(y).

Informally, these two properties mean that it is computationally infeasible for an intruder to substitute one message for another message that is protected by a message digest. That is, if (m,H(m)) are the message and message digest pair created by the sender, then an intruder can not forge the contents of another message, y, that has the same message digest value as the original message. When Bob signs m by computing d_B(H(m)), we know that no other message can be substituted for m. Furthermore, Bob's digital signature of H(m) uniquely identifies Bob as the verifiable, non-repudiable signer of H(m) (and as a consequence, m as well) as discussed above in section 7.4.1.

Figure 7.4-3: Sending a digitally signed message.

In the context of Bob sending a message to Alice, Figure 7.4-3 provides a summary of the operational procedure of creating a digital signature. Bob puts his original long message through a hash function to create a messge digest. He then encrypts the message digest with his own private key. The original message (in clear text) along with the digitally signed message digest (henceforth referred to as the digital signature) is then sent to Alice. Figure 7.4-4 provides a summary of the operational procedure of verifying message integrity. Alice applies the Bob's public key to the message to recover the message digest. Alice also applies the hash function to the clear text message to obtain a second message digest. If the two message digests match, then the recipientAlice can be sure about the integrity of the message, and sure that Bob sent the message.

Figure 7.4-4: Verifying the integrity of a signed message.

7.4.3 Hash Function Algorithms

Let's convince ourselves that a simple checksum, such as the Internet checksum, would make a poor message digest algorithm. Rather than performing 1 complement's arithmetic (as in the Internet checksum), let us compute a checksum by treating each character as a byte and adding the bytes together using 4-byte chunks at a time. Suppose Bob owes Alice $100.99" and sends an IOU to Alice consisting of the text string "IOU100.99BOB". The ASCII representation (in hexadecimal notation) for these letters is 49, 4F, 55, 31, 30, 30, 2E, 39, 39, 42, 4F, 42.

Figure 7.4-5 (top) shows that the 4-byte checksum for this message is B2 C1 D2 AC. A slightly different message (and a much more costly one for Bob) is shown in the bottom half of Figure 7.5-1. The message "IOU100.99BOB" and "IOU900.19BOB" have the same checksum! Thus, this simple checksum algorithm violates the two required requirements above. Given the original data, it is simple to find another set of data with the same checksum. Clearly, for security purposes. we are going to need a more powerful hash function than a checksum.

Two messages with same checksum
Figure 7.4-5: Initial message and fraudulent message have the same checksum!

The MD5 message digest algorithm by Ron Rivest [RFC 1321] is in wide use today. It computes a 128-bit message digest in a four-step process consisting of a padding step (adding a 1 followed by enough zero's so that the length of the message satisfies certain conditions), an append step (appending a 64-bit representation of the message length before padding), an initialization of an accumulator, and a final looping step in which the message's 16-word blocks are processed (mangled) in four rounds of processing. It is not known whether MD5 actually satisfies the requirements listed above. The author of MD5 claims "It is conjectured that the difficulty of coming up with two messages having the same message digest is on the order of 2⁶⁴ operations, and that the difficulty of coming up with any message having a given message digest is on the order of 2¹²⁸ operations. "[RFC 1321]. No one has argued with this claim. For a description of MD5 (including a C source code implementation) see [RFC 1321]. Computational aspects of MD5 are discussed in [RFC 1810].

The second major message digest algorithm in use today is SHA-1, the Secure Hash Algorithm [FIPS 1995]. This algorithm is based on principles similar to those used in the design of MD4 [RFC 1320], the predecessor to MD5. The Secure Hash Algorithm (SHA-1), a US federal standard, is required for use whenever a secure message digest algorithm is required for federal applications. It produces a 160-bit message digest.

References

[FIPS 95] Federal Information Processing Standard, "Secure Hash Standard", FIPS Publication 180-1.
[RFC 1320] R. Rivest, The MD4 Message-Digest Algorithm, RFC 1320, April 1992.
[RFC 1321] R.L. Rivest, The MD5 Message-Digest Algorithm, RFC 1321, April 1992.
[RFC 1810] J. Touch, "Report on MD5 Performance," RFC 1810, June 1995.