Compression, Encryption and Hashing: (a) Lossy vs Lossless compression. (b) Run length encoding and dictionary coding for lossless compression. (c) Symmetric and asymmetric encryption. (d) Different uses of hashing.
Ever been frustrated by your slow computer or have you ever had to delete images or videos off your phone just so you could make space for more? The reality is that computer storage and network bandwidth are both limited. If we can reduce the amount of file bytes we get more storage space and faster transmission speeds by removing unnecessary bits of data. This is where compression comes in! Go on to the presentation and be sure to read the essentials below.
Useful videos on this topic
-Great video on Cryptography - this will give you some perspective and context as well as inspire you!
Another great video on Hashing
What you need to know – the essentials for this topic
Compression ratio is how the level of compression is measured: size of compressed file/size of the original, the smaller the better.
A number of compression packages exist, such as: zip, rar, tgz and 7-zip, and their compression performance is slightly different.
7-zip is worth remembering as it is Open Source and uses multiple algorithms to achieve very high compression ratios, as well as 256-bit encryption.
Lossy and Lossless
There are two ways to reduce file volume: lossy and lossless.
Lossy removes more data during compression but some of this data is lost and doesn’t get restored.
Lossless creates a perfect copy of the original after decompression.
The choice between them is not always clear, it depends on particular needs.
Run-length encoding is popular in lossless compression. Any repeated characters in a file are replaced with a flag character followed by the character replaced and how many times it was repeated. It will not compress sequences of characters that are shorter than four, as that four would include itself, a flag, and a digit as additional data. Often used to compress media like graphics or music where a similar pattern can repeat every frame (portion of a second).
Another lossless approach is dictionary encoding where commonly occurring sequences in a type of file are stored in a dictionary as the file is scanned to search for such sequences. All the multiple instances of the sequences are replaced with a link to the sequence in the dictionary – thus saving space.
Huffman encoding: This is one of the easiest dictionary encoding methods, where most characters used in a file are placed in a dictionary and then their occurrences in a file are replaced with numerical codes that link to the dictionary. The more common the character, the shorter the code.
Compression works better on longer files as more similarities and patterns exist in such files.
Encryption scrambles plain text into cipher text which can’t be understood. Decryption is the process of unscrambling the encrypted message to bring back the original. The piece of information needed to decrypt the cipher text (and kept secure) is called the key.
Did you know that Alan Turing was the first to use a computer to break enemy codes, which he did successfully during WWII.
Ciphers have been around for a long time. Caesar’s cipher is simple and famous. Every letter in an original message is replaced with the one either a few positions ahead or behind it in an alphabet. This can easily be broken by modern computers by trying all possible combinations.
Encryption is now used in almost all communication and the power of computers is making older encryption methods unsafe.
In symmetric encryption, decryption is simply the opposite of the encryption process and the same key is used for both. It is easy to do but is also easily cracked by the brute force of modern computers.
In asymmetric encryption, otherwise known as ‘public key encryption’, the encryption and decryption keys are different, so the party that encrypted the message can’t decrypt it and the party that can decrypt the message can’t encrypt it. It is much more secure but was unknown until recently. In this method, the encryption key is public and can be freely shared by the party that needs to receive an encrypted communication. The decryption key is private and secret, though. Anybody can find a public key of another person and use it to send them an encrypted message which only the recipient can decrypt.
This is achieved via a mathematical one-way function that can only be reversed under a single circumstance. Such functions are called RSA ciphers (from the three inventors’ initials). The function requires a person to choose three large prime numbers, of which one and the product of the other two are published collectively as a public key – for everybody to encrypt messages to this person. An exponential function with modular division is applied to every ASCII character in the message. The original recipient is the only one who knows the three secret numbers and there are too many combinations of the products of these numbers for somebody to guess them.
While secure, public key encryption is vulnerable to corruption and interference with message. Hashing helps with that. A hash is a unique string of data through which the original message is put. Unlike a key, which is unique to a person but is the same for all the messages from/to this person, a hash is unique for every message and is sent along with the encrypted message to the recipient. Any smallest change in the original message would generate a completely new hash. Upon receiving the message and the hash, the recipient decrypts the message, generates a hash and if whatever has been generated doesn’t match the hash that came along with a message, the message has been tampered with! Hashes are secure; they can’t be reversed without trying an almost infinite number of possible combinations. So, a hash is a bit like a unique signature.
One of the most common signatures is the Digital Signature Algorithm (DSA). It is possible to apply a signature to a hash, creating two layers of authentication.
Students often misunderstand that the compression ratio is better when it is smaller.
Students also confuse lossless and lossy compression or think that lossless is always better.
They also tend to forget the names of encoding schemes.
In practical work, students may not be able to do encryption reliably, and in programming it, they may not be familiar with string manipulation and nested loops.
Students also confuse public and private keys (which one is used for encryption, and which for decryption).
They may also not know what modular division or exponential functions are, which may need clarification in class.
PowerPoint loading ...
Your teacher may ask you to do one or more of the following tasks
Worksheets for this section - see the link to the left for 04-07
1. Complete a learning poster (a single slide) in which all of the learning objectives for this topic are explained
*Refer to the theory PowerPoint and the "What you will learn objectives" slide.
Research Power Point
1. Complete a research powerpoint on the topic selected by your teacher using the following template.
End of topic assessment
Well done on completing this topic!
Your teacher will direct you as to which task needs to be handed in