Entropy and Certainty in Lossless Data Compression

James Jay Jacobs


Data compression is the art of using encoding techniques to represent data symbols using less storage space compared to the original data representation. The encoding process builds a relationship between the entropy of the data and the certainty of the system. The theoretical limits of this relationship are defined by the theory of entropy in information that was proposed by Claude Shannon. Lossless data compression is uniquely tied to entropy theory as the data and the system have a static definition. The static nature of the two requires a mechanism to reduce the entropy without the ability to alter either of these key components. This dissertation develops the Map of Certainty and Entropy (MaCE) in order to illustrate the entropy and certainty contained within an information system and uses this concept to generate the proposed methods for prefix-free, lossless compression of static data. The first method, Select Level Method (SLM), increases the efficiency of creating Shannon-Fano-Elias code in terms of CPU cycles. SLM is developed using a sideways view of the compression environment provided by MaCE. This view is also used for the second contribution, Sort Linear Method Nivellate (SLMN) which uses the concepts of SLM with the addition of midpoints and a fitting function to increase the compression efficiency of SLM to entropy values L ( x ) ≤ H (x ) + 1. Finally, the third contribution, Jacobs, Ali, Kolibal Encoding (JAKE), extends SLM and SLMN to bases larger than binary to increase the compression even further while maintaining the same relative computation efficiency.