Wednesday, November 16, 2011

Encryption in C#

C# has good support for almost all modern cryptographic functions, but they are rarely easy to understand.  Most people know that encryption is meant to prevent data from being read or used in an unauthorized manner, but that's usually as far as they want to take it.  Implementing encryption implies a certain level of complication, and writing it into your application can be a daunting prospect.  Hopefully this article will make things a bit more digestible.

There are three main types of encryption mechanisms: one-way, symmetric, and asymmetric.  They each have strengths and weaknesses, and are practical for different things. In this article I will explain one-way and symmetric; asymmetric is an article in itself.

One-way Encryption
One-way encryption is a bit of a misnomer; most people refer to these as hashes, or checksums.  It involves taking some data and deriving a digital "signature" from it - the same data will always produce the same signature.  The data cannot, however, be derived from the signature, and it is possible (though very rare, especially with modern hashing algorithms) that different data will produce the same signature.  This is called a collision, and in some cases steps need to be taken to reduce the chance of two different inputs producing the same hash.


It's called "one-way encryption" because the hashed data cannot be "unhashed"... the signature does not contain the original data.  There are quite a few advantages to this:
  1. Hashing is usually very fast compared to reversible encryption techniques.  On a small scale, this is not especially noticeable, but when dealing with large amounts of data, this can become rather significant.
  2. Due to its relative speed, hashing is useful for comparing large pieces of data with other large pieces of data.  Often, when used in this manner, a hash is called a checksum.  A good use case is verifying a large file successfully transferred over the network; the file publisher says the checksum should be X, and when you run the same checksum algorithm on your side, you should also get X... if you don't, there's a discrepancy that needs to be investigated.
  3. Hashing is also good if you don't want to be able to derive the original data, for security reasons.  For example, passwords are often stored in a database in hashed form.  When authenticating against a stored password hash, you would hash the password submitted by the user and compare the signatures; if they match, the user has authenticated.  (Note: if you care enough to hash your passwords, make sure you salt them first, and prefer a deliberately slow password-hashing function such as PBKDF2 over a plain fast hash... out of scope for this article, but google salting passwords for a quick how-to)
  4. Hashes have a fixed length (the lengths are different depending on the algorithm used to hash, but the data has no bearing on the size of the hash), which makes them fit nicely into various strongly-typed storage containers.
In C#, it's pretty easy to write a simple hashing function to suit your needs:


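        // Requires: using System.Security.Cryptography; using System.Text;
        public static byte[] GenerateHash(string toHash)
        {
            // Hash the UTF-8 bytes of the string; the same text always yields the same hash.
            var bytes = Encoding.UTF8.GetBytes(toHash);
            // SHA384.Create() returns the platform's SHA-384 implementation.
            using (var algorithm = SHA384.Create())
            {
                return algorithm.ComputeHash(bytes);
            }
        }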
Here, you simply provide the string you wish to hash, and the function returns its hash as a byte array.  I chose SHA-384 (a member of the SHA-2 family), as it's fairly fast and fairly secure.  I recommend against MD5, CRC, and SHA-1: MD5 and SHA-1 have known collision attacks (different data can be deliberately crafted to produce the same hash), and CRC is an error-detection code that was never designed to resist deliberate tampering.
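
The checksum use case from the list above can be sketched along the same lines. This is only a minimal illustration, assuming the publisher hands you the checksum as a hex string; the class and method names (ChecksumExample, ComputeChecksum, MatchesPublishedChecksum) are made up for this example.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class ChecksumExample
{
    // Hashes everything in the stream with SHA-384 and returns lowercase hex.
    public static string ComputeChecksum(Stream data)
    {
        using (var algorithm = SHA384.Create())
        {
            byte[] hash = algorithm.ComputeHash(data);
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }

    // Returns true when the data's checksum equals the published one.
    // Assumption: the published value is hex text; case and surrounding
    // whitespace are ignored.
    public static bool MatchesPublishedChecksum(Stream data, string publishedChecksum)
    {
        string actual = ComputeChecksum(data);
        return string.Equals(actual, publishedChecksum.Trim(), StringComparison.OrdinalIgnoreCase);
    }
}
```

To check a downloaded file you would open it with File.OpenRead and pass the stream in. Whatever the size of the file, the SHA-384 checksum is always 48 bytes, which is 96 hex characters.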

Symmetric Encryption
Symmetric encryption is a form of reversible encryption (i.e., the data can be decrypted and used) that uses a single encryption key to both encrypt and decrypt the data.  Whoever has the encryption key and knows how the data was encrypted can easily decrypt it.  Some things to remember about symmetric encryption:
  1. Symmetric encryption is usually significantly slower than hashing, but significantly faster than asymmetric encryption.
  2. Symmetric encryption is good if you can reasonably protect the encryption key, but the key needs to be available to all people who need to encrypt or decrypt the data.  You have to trust the data managers.
  3. Symmetrically encrypted data needs to be decrypted in the same manner it was encrypted in. It's kinda a no-brainer, but it means that your encryption method should be paired with your decryption method so you can keep your algorithms and settings properly synchronized.
  4. Since the actual data is contained in the encrypted payload, if your data is big, your encrypted output will also be big.
In C#, this is also fairly straightforward to implement.  The namespaces we will need are:
    using System;
    using System.IO;
    using System.Linq;
    using System.Security.Cryptography;
    using System.Text;

...and here's the code:

        public static byte[] EncryptString(string toEncrypt, byte[] encryptionKey)
        {
            var toEncryptBytes = Encoding.UTF8.GetBytes(toEncrypt);
            using (var provider = new AesCryptoServiceProvider())
            {
                provider.Key = encryptionKey;
                provider.Mode = CipherMode.CBC;
                provider.Padding = PaddingMode.PKCS7;
                using (var encryptor = provider.CreateEncryptor(provider.Key, provider.IV))
                {
                    using (var ms = new MemoryStream())
                    {
                        using (var cs = new CryptoStream(ms, encryptor, CryptoStreamMode.Write))
                        {
                            cs.Write(toEncryptBytes, 0, toEncryptBytes.Length);
                            cs.FlushFinalBlock();
                            var retVal = new byte[16 + ms.Length];
                            provider.IV.CopyTo(retVal, 0);
                            ms.ToArray().CopyTo(retVal, 16);
                            return retVal;
                        }
                    }
                }
            }
        }

        public static string DecryptString(byte[] encryptedString, byte[] encryptionKey)
        {
            using (var provider = new AesCryptoServiceProvider())
            {
                provider.Key = encryptionKey;
                provider.Mode = CipherMode.CBC;
                provider.Padding = PaddingMode.PKCS7;
                provider.IV = encryptedString.Take(16).ToArray();
                using (var ms = new MemoryStream(encryptedString, 16, encryptedString.Length - 16))
                {
                    using (var decryptor = provider.CreateDecryptor(provider.Key, provider.IV))
                    {
                        using (var cs = new CryptoStream(ms, decryptor, CryptoStreamMode.Read))
                        {
                            // A single Read call is not guaranteed to return every
                            // byte, so copy until the stream is exhausted.
                            using (var plain = new MemoryStream())
                            {
                                cs.CopyTo(plain);
                                return Encoding.UTF8.GetString(plain.ToArray());
                            }
                        }
                    }
                }
            }
        }
A couple of things to note about the above code:
  1. It took me a while to wrap my brain around the IV (Initialization Vector).  I now think of it as similar to a salt value on a hash, because it serves the same practical purpose (out of scope for this article).  Briefly: in CBC mode, each block of data is combined with the previously encrypted block before it is encrypted itself, so identical blocks of plaintext come out looking different.  The first block has no predecessor, so the IV "pretends" to be data you've already encrypted.  It's generated randomly with each instantiation of the CryptoServiceProvider.  End result?  You can encrypt the same data with the same key a bunch of times, and the encrypted data payload will look different, but hold the same info.
  2. The IV is prepended to the actual payload of the data upon encryption, and stripped off before decryption.  This is a good practice, but not the only good practice.  A lot of people would want to save that separately.  I did it this way so I only have to pass the payload and key around.  Works pretty well.
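  3. The mode (CBC) and the padding (PKCS7) I chose are both reasonably secure; I primarily chose them because they were easy to test.  Keep in mind that CBC only keeps the data secret; it does not detect tampering, so if that matters to you, add an integrity check (such as an HMAC) over the payload.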
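
To see the effect of the IV described in the first note, here is a compact, self-contained sketch. It is not the code above: it uses Aes.Create() and TransformFinalBlock instead of streams to stay short, and the names (IvExample, Encrypt, Decrypt) are invented for the illustration. It keeps the same layout, with a 16-byte IV followed by the ciphertext.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class IvExample
{
    // Encrypts with AES-CBC under a fresh random IV and returns IV + ciphertext.
    public static byte[] Encrypt(string text, byte[] key)
    {
        using (var aes = Aes.Create())
        {
            aes.Key = key;
            aes.Mode = CipherMode.CBC;
            aes.Padding = PaddingMode.PKCS7;
            aes.GenerateIV(); // new random 16-byte IV for every message
            byte[] plain = Encoding.UTF8.GetBytes(text);
            using (var encryptor = aes.CreateEncryptor())
            {
                byte[] cipher = encryptor.TransformFinalBlock(plain, 0, plain.Length);
                return aes.IV.Concat(cipher).ToArray();
            }
        }
    }

    // Reads the IV from the first 16 bytes, then decrypts the remainder.
    public static string Decrypt(byte[] payload, byte[] key)
    {
        using (var aes = Aes.Create())
        {
            aes.Key = key;
            aes.Mode = CipherMode.CBC;
            aes.Padding = PaddingMode.PKCS7;
            aes.IV = payload.Take(16).ToArray();
            using (var decryptor = aes.CreateDecryptor())
            {
                byte[] plain = decryptor.TransformFinalBlock(payload, 16, payload.Length - 16);
                return Encoding.UTF8.GetString(plain);
            }
        }
    }
}
```

Calling Encrypt twice with the same text and the same key should give two different payloads, and both decrypt back to the original text. The payload size also shows the "big data, big output" point: it is 16 bytes of IV plus the data padded up to the next multiple of 16 bytes.
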
That's it!  These methods can encrypt your data anywhere, without a lot of code.  If you want to save your data in string format you can use these functions for easy conversion:

     Convert.ToBase64String(byteArray);
     Convert.FromBase64String(base64String);
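
As a small illustration of that round trip, here is a sketch with two thin wrappers. The names (Base64Example, Encode, Decode) are just placeholders for this example; the only real API involved is the Convert class.

```csharp
using System;

public static class Base64Example
{
    // Turns binary data (such as an encrypted payload) into storable text.
    public static string Encode(byte[] data)
    {
        return Convert.ToBase64String(data);
    }

    // Turns the stored text back into the original bytes.
    public static byte[] Decode(string text)
    {
        return Convert.FromBase64String(text);
    }
}
```

Base64 uses four characters for every three bytes, so the text is roughly a third larger than the binary data; a 32-byte payload becomes 44 characters.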

To generate a new encryption key for the above symmetric encryption functions, you can use something like this:

        public static byte[] GenerateAESKey()
        {
            using (var provider = new AesManaged())
            {
                provider.KeySize = 256;
                provider.GenerateKey();
                return provider.Key;
            }
        }
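
If you keep the key as a Base64 string (in a config file, say), it may be worth checking its length when you load it, since AES only accepts keys of 128, 192, or 256 bits. The following is a rough sketch under that assumption; KeyExample and LoadKey are hypothetical names, not part of the code above.

```csharp
using System;
using System.Security.Cryptography;

public static class KeyExample
{
    // Decodes a Base64-encoded key and verifies that AES can use it.
    public static byte[] LoadKey(string base64Key)
    {
        byte[] key = Convert.FromBase64String(base64Key);
        using (var aes = Aes.Create())
        {
            // ValidKeySize expects the length in bits, not bytes.
            if (!aes.ValidKeySize(key.Length * 8))
            {
                throw new ArgumentException("Key must be 16, 24, or 32 bytes long.", "base64Key");
            }
        }
        return key;
    }
}
```

A key produced by GenerateAESKey() and stored with Convert.ToBase64String should load without complaint, while a truncated or mistyped key fails here with a clear message instead of later, in the middle of encryption.
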

...and you should be good to go!












