DICTIONARY_CODE is a MATLAB library which can apply a dictionary code to a text file.
A common feature of lossless compression schemes is the construction of a "dictionary" of the symbols or words occurring in the file, and the replacement of those symbols by their dictionary indices.
These functions illustrate that idea, starting with a version of the Gettysburg Address. To simplify our work, we remove punctuation and capitalization. Using MATLAB's textread() function, we can create a cell array in which each entry is a word in the file. Using MATLAB's unique() function, we can construct a "dictionary" that lists, in alphabetical order, every word occurring in the file. Using a surprisingly obscure MATLAB function, we can then replace every word in the text file by its dictionary index. This is the operation of the dictionary_encode() function.
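The encoding steps can be sketched in a few lines. This is only an illustration under stated assumptions, not the library's dictionary_encode() itself: the sample string stands in for the cleaned text file, strsplit() stands in for reading the file with textread(), and ismember() is one possible choice for the index lookup, since the function actually used is not named here.

```matlab
% Sample text, already lowercased with punctuation removed (illustrative only).
text = 'of the people by the people for the people';

% Split on blanks to get a cell array with one word per entry.
words = strsplit ( text, ' ' );

% The dictionary: every distinct word, in sorted order.
dict = unique ( words );

% For each word, find its position in the dictionary.
% (ismember is an assumed choice for this lookup.)
[ ~, code ] = ismember ( words, dict );

disp ( dict );
disp ( code );
```

For this sample, the dictionary holds the five words by, for, of, people, the, and the encoded message is the index sequence 3 5 4 1 5 4 2 5 4.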
To decode or uncompress the file, we need both the encoded file and the dictionary. For our example, the dictionary is stored as a separate file, although practical compression schemes typically pack the encoded text and the dictionary together. The dictionary_decode() function can then recover the original message.
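Decoding amounts to indexing the dictionary by the encoded message. The sketch below is a minimal illustration rather than the library's dictionary_decode(): the dictionary and index values are made-up sample data, defined inline instead of being read from the two files.

```matlab
% Illustrative dictionary and encoded message (hypothetical sample values).
dict = { 'by', 'for', 'of', 'people', 'the' };
code = [ 3, 5, 4, 1, 5, 4, 2, 5, 4 ];

% Indexing the dictionary by the code recovers the word sequence.
words = dict ( code );

% Join the words with single blanks to rebuild the text.
text = strjoin ( words, ' ' );

disp ( text );
```

Because punctuation and capitalization were removed before encoding, the recovered message matches the simplified text, not the original formatted document.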
The computer code and data files described and made available on this web page are distributed under the GNU LGPL license.
DICTIONARY_CODE is available in a MATLAB version.
ATBASH, a MATLAB library which applies the Atbash substitution cipher to a string of text.
CAESAR, a MATLAB library which can apply a Caesar Shift Cipher to a string of text.
CHRPAK, a MATLAB library which works with characters and strings.
FILUM, a MATLAB library which can work with information in text files.
MONOALPHABETIC, a MATLAB library which can apply a monoalphabetic substitution cipher to a string of text.
ROT13, a MATLAB library which can encipher a string using the ROT13 cipher for letters, and the ROT5 cipher for digits.