Please write HuffmanNode.java and HuffmanTree.java given HuffmanGUI.java. Instructions below: HuffmanGUI.java given belo
Posted: Fri Jun 10, 2022 11:55 am
Please write HuffmanNode.java and HuffmanTree.java given
HuffmanGUI.java. Instructions below:
HuffmanGUI.java given below, it was too long to paste so I had
to take images instead:
For this last CS211 assignment, you are to submit the following Java code for a Huffman Tree data structure, along with the Huffman Node structure that you have used. For the Huffman Nodes, read through the description of Huffman compression below and the following organization will become apparent. We're building a binary tree, where the "data" is a count of the frequency of each character. public class HuffmanNode implements Comparable { public public int frequency; public char character; HuffmanNode left; public HuffmanNode right; The Huffman Node class will also need a boolean isLeaf() method, plus a static method to actually provide a count of the characters in an input file, and place those counts into a Map, with character as the unique key mapped into the integer counts of that character: public static Map getCounts (FileInputStream input) Your Huffman Tree class must have the following defined: public public HuffmanTree (Map counts) // Constructor StringBuilder compress (InputStream inputFile) // inputFile is a text file public StringBuilder decompress (StringBuilder inputString) //inputString 1'3 & 0's public String printSideways () // as per the method presented in Chapter 17. Note that the input file here can actually be any format, but plain text will allow us to see what we're doing. The compress method returns a string of 1's and 0's (bits) as per the description below. The decompress method can take that string of 1's and 0's, use the Huffman tree structure (different for every text file) and recreate the text file from those bits. Finally, the printSideways here is similar to what is presented in Chapter 17 of Building Java Programs, but in this case we build a string to return into a graphical text area. You should actually use the String Builder class, as it is mutable. We normally don't use System.out calls in a GUI. But I understand they are useful during debugging. You must comment out all System.out calls before submitting your work. The client program (HuffmanGUI.java) is provided, and you need to meet these specifications listed above. I expect two files to be submitted (HuffmanNode.java and Huffman Tree.java) which I will copy into my Eclipse folder and run the provided GUI for testing. Huffman coding is an algorithm devised by David A. Huffman of MIT in 1952 for compressing text data to make a file occupy a smaller number of bytes. This relatively simple compression algorithm is powerful enough that variations of it are still used today in computer networks, fax machines, modems, HDTV, and other areas. Normally text data is stored in a standard format of 8 bits per character, commonly using an encoding called ASCII that maps every character to a binary integer value from 0-255. The idea of Huffman coding is to abandon the rigid 8-bits- percharacter requirement and use different-length binary encodings for different characters. The advantage of doing this is that if a character occurs frequently in the file, such as the letter 'e', it could be given a shorter encoding (fewer bits), making the file smaller. The tradeoff is that some characters may need to use encodings that are longer than 8 bits, but this is reserved for characters that occur infrequently, so the extra cost is worth it.
The table below compares ASCII values of various characters to possible Huffman encodings for the text of Shakespeare's Hamlet. First step is to count the number of occurrences of each character in the text. Frequent characters such as space and 'e should have short encodings, while rarer ones likez have longer ones. Character ASCII value ASCII (binary) Huffman (binary) 32 00100000 10 97 01100001 0001 'b' 98 01100010 0111010 001100 'C' 99 01100011 'e' 101 01100101 1100 'Z' 122 01111010 00100011010 The steps involved in Huffman coding a given text source file into a destination compressed file are the following: 1. Examine the source file's contents and count the number of occurrences of each character. 2. Place each character and its frequency (count of occurrences) into a sorted "priority" queue. Convert the contents of this priority queue into a binary tree with a particular structure. 3. 4. Traverse the tree to discover the binary encodings of each character. 5. Re-examine the source file's contents, and for each character, output the encoded binary version of that character to the destination file. Encoding a File: For example, suppose we have a file named example.txt with the following contents: ab ab cab In the original file, this text occupies 10 bytes (80 bits) of data. The 10th is a special "end-of-file" (EOF) byte. byte 1 2 3 4 5 7 8 9 I char 'a' 'a' 1 'b' 'C' 'a' 10 EOF 256 ASCII 97 98 32 97 98 32 99 97 98 binary 01100001 01100010 00100000 01100001 01100010 00100000 01100011 01100001 01100010 N/A In Step 1 of Huffman's algorithm, a count of each character is computed. (i.e. the getCounts method). The counts are represented as a map: {'¹=2, 'a'=3, 'b'=3, 'c'=1, EOF=1} Step 2 of the algorithm places these counts into binary tree nodes, each storing a character and a count of its occurrences. The nodes are put into a priority queue, which keeps them in sorted order with smaller counts at the front of the queue. (The priority queue is somewhat arbitrary in how it breaks ties, such as being before EOF and being before 'a'). 'b' 2 3 front 1 'C' 1 EOF 3 back T 1 'b' Now the algorithm repeatedly removes the two nodes from the front of the queue (the two with the smallest frequencies) and joins them into a new node whose frequency is their sum. The two nodes are placed as children of the new node; the first removed becomes the left child, and the second the right. The new node is re-inserted into the queue in sorted order: 2 3 3 front back 'b' 'a' do 'c' EOF I
This process is repeated until the queue contains only one binary tree node with all the others as its children. This will be the root of our finished Huffman tree. The following diagram shows this process: 3 3 10 'b' 'a' de dado 2 2 2 3 'b' 2 3 1 'c' EOF EOF 1 EOF Notice that the nodes with low frequencies end up far down in the tree, and nodes with high frequencies end up near the root of the tree. This structure can be used to create an efficient encoding. The Huffman code is derived from this tree by thinking of each left branch as a bit value of 0 and each right branch as a bit value of 1: 10 0 1 3 1 'C' EOF The code for each character can be determined by traversing the tree. To reach. we go left twice from the root, so the code for is 00. The code for 'c' is 010, the code for EOF is 011, the code for 'b' is 10 and the code for 'a' is 11. By traversing the tree, we can produce a map from characters to their binary representations. For this tree, it would be: {' '=00, 'a'=11, 'b'=10, 'c'=010, EOF=011} Using this map, we can encode the file into a shorter binary representation. The text ab ab cab would be encoded as: 'a' I 'b' " 'C' 'a' 'b' char binary 11 'b' 10 EOF 011 00 11 10 00 010 11 10 Use of the word "map" above is not random. Implementing this in code, you might find it useful to create a Map data structure that maps characters into the right string of bits. This really simplifies things, but still requires the building of a Huffman Tree to complete the assignment. The overall encoded contents are 1110001110000101110011, which is 22 bits, or almost 3 bytes, compared to the original file which was 10 bytes. (Many Huffman-encoded text files compress to about half their original size.) Modern encoding algorithms are faster and more efficient, but this is where it all started over 50 years ago, and has often been elevated into other technology (http://en.wikipedia.org/wiki/Huffman coding). 2 4 1 2 0 3 'b' 6 1 3
1 3 byte char a b a b ca b EOF binary 11 10 00 11 10 00 010 1 1 10 011 00 Since the character encodings have different lengths, often the length of a Huffman-encoded file does not come out to an exact multiple of 8 bits. Files are stored as sequences of whole bytes, so in cases like this the remaining digits of the last bit are filled with Os. You do not need to worry about this in the assignment; it is part of the underlying file system. It might worry you that the characters are stored without any delimiters between them, since their encodings can be different lengths and characters can cross byte boundaries, as with 'a' at the end of the second byte above. But this will not cause problems in decoding the compressed file, because Huffman encodings have a prefix property where no character's encoding can ever occur as the start of another's encoding. This is important when you decode the file later. Decoding a File: You can use a Huffman tree to decode text that was compressed with its encodings. The decoding algorithm is to read each bit from the file, one at a time, and use this bit to traverse the Huffman tree. If the bit is a 0, you move left in the tree. If the bit is 1, you move right. You do this until you hit a leaf node. Leaf nodes represent characters, so once you reach a leaf, you output that character. For example, suppose we are asked to decode a file containing the following bits: 101101000110111011 Using the Huffman tree, we walk from the root until we find characters, then we output them and go back to the root. 0 1 1 1 'c' EOF First we read a 1 (right), then a o (left). We reach 'b' and output b. Back to the root. 101101000110111011 We read a 1 (right), then a 1 (right). We reach 'a' and output a. . 101101000110111011 101101000110111011 . We read a o (left), then a 1 (right), then a o (left). We reach 'c' and output c. . We read a o (left), then a o (left). We reach and output a space. 1 We read a 1 (right), then a 1 (right). We reach 'a' and output a. 101101000110111011 101101000110111011 • We read a 0 (left), then a 1 (right), then a 0 (left). We reach 'c' and output c. We read a 1 (right), then a 1 (right). We reach 'a' and output a. So the overall decoded text is bac aca- (The input source reports when we reach an EOF character of 011, so we stop.) 101101000110111011 101101000110111011 When the selected input files get very large (try Moby.txt) the display will take forever (maybe 20 minutes) to load. This is a well know problem with the Java set text methods, as they require constant refresh, rebuffer, and reload. So I provide a check box on the GUI to skip the text displays, so you can see your compression can really run in under a second even for 2 0 1
huge files. In real compression algorithms, the 100011100... bit stream would actually go out over the wire as binary data, and not be converted into character, strings, and graphics (so slow). Deliverables: 1) HuffmanNode.java 2) HuffmanTree.java 3) QA document with screenshots showing your working program.
import java.io.*; import java.awt."; import java.awt.event.*; import java.util.*; import javax.swing.* public class HuffmanGUI implements ActionListener { // Runs the program and starts the GUI. public static void main(String[] args) { new HuffmanGUI().start(); } // font size for input/output file text private static final Font MONOSPACED_FONT= new Font("monospaced", Font.PLAIN, 12); // fields of the Iverson GUI private JFrame frame; private JTextField inputFileField; private JTextArea inputFileArea, huffTreeArea, outputFileArea; private JButton inputBrowse, inputLoad, compress, decompress, clear; private JLabel inputStatusLabel, outputStatusLabel, compressStatusLabel; private JFileChooser chooser; private JCheckBox displayOption; private Map counts = new TreeMap(); private HuffmanTree huffmanTree; private StringBuilder huffResults; private String filelnput; // There were file streams in the original UW version // to simplify, we'll read the whole input file into a string // and the entire stream of bits from the Huffman process into another string (huffResults) // Construct the GUI, sets up all graphical components, layout, and events. // A little complicated, as I started with a very different GUI for the UW version public HuffmanGUI() { // main window frame frame = new JFrame("CS211 (CSE143) Huffman Coding"); frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE); // LEFT Panel for input file user interaction Container textAndBrowse = new JPanel(new BorderLayout()); inputFileField = new JTextField(10); inputFileField.addActionListener(this); textAndBrowse.add(new JLabel("Input file: "), BorderLayout.NORTH); textAndBrowse.add(inputFileField); // Another sub-Container for the Buttons Container textButtons = new JPanel(new FlowLayout()); inputBrowse = createButton("Browse...", 'B', this, textButtons); inputLoad = createButton("Reload", "L', this, textButtons); clear = createButton("Clear", "C', this, textButtons); textAndBrowse.add(textButtons, BorderLayout.SOUTH);
// text area in which to load input file contents (left side) inputFileArea = new JTextArea(20, 40); inputFileArea.setFont(MONOSPACED_FONT); inputFileArea.setEditable(false); inputFileArea.setFocusable(false); inputStatusLabel = new JLabel("No file loaded"); // and a container to hold text area and label together Container south = new JPanel(new BorderLayout()); south.add(new JScrollPane(inputFileArea)); south.add(inputStatusLabel, BorderLayout.SOUTH); // finally a container with all of above together Container inputFilePanel = new JPanel(new BorderLayout()); inputFilePanel.add(textAndBrowse, BorderLayout.NORTH); inputFilePanel.add(south, BorderLayout.CENTER); // For displaying the derived bit stream Container bitStream = new JPanel(new BorderLayout()); bitStream.add(new JLabel("Output Stream (String of bits): "), BorderLayout.NORTH); // text area in which to load output file contents outputFileArea = new JTextArea(20, 40); outputFileArea.setFont (MONOSPACED_FONT); outputFileArea.setEditable(false); outputFileArea.setFocusable(false); outputStatusLabel = new JLabel("No file loaded"); bitStream.add(new JScrollPane(outputFileArea)); bitStream.add(outputStatusLabel, BorderLayout.SOUTH); Container outputFilePanel = new JPanel(new BorderLayout()); outputFilePanel.add(bitStream, BorderLayout.NORTH); outputFilePanel.add(bitStream, BorderLayout.CENTER); // text area in which to load tree huffTreeArea = new JTextArea(20, 40); huffTreeArea.setFont(MONOSPACED_FONT); huffTreeArea.setEditable(false); huffTreeArea.setFocusable(false); Container right = new JPanel(new BorderLayout()); right.add(new JScrollPane(huffTreeArea)); Container huffTreePanel = new JPanel(new BorderLayout()); huffTreePanel.add(new JLabel("printSideways() Huffman Tree: "), BorderLayout.NORTH); huffTreePanel.add(right, BorderLayout.CENTER); // bottom section for buttons to compress/decompress Container compressPanel = new JPanel(new BorderLayout()); Container compressCenter = new JPanel(new FlowLayout()); compress = createButton("Compress", "C", this, compressCenter); decompress = createButton("Decompress", "D", this, compressCenter); compressStatusLabel = new JLabel("No file has been compressed yet");
compressStatusLabel.setHorizontalAlignment(SwingConstants.CENTER); displayOption = new JCheckBox("Check to display data", true); displayOption.addActionListener(this); compressPanel.add(compressCenter, BorderLayout.CENTER); compressPanel.add(compressStatusLabel, BorderLayout.SOUTH); compressPanel.add(displayOption, BorderLayout.EAST); // overall window's layout JPanel contentPane = new JPanel(new BorderLayout(15,10)); contentPane.setBorder(BorderFactory.createEmptyBorder(10, 10, 10, 10)); contentPane.add(inputFilePanel, BorderLayout.WEST); contentPane.add(outputFilePanel, BorderLayout.CENTER); contentPane.add(compressPanel, BorderLayout.SOUTH); contentPane.add(huffTreePanel, BorderLayout.EAST); // size frame and place it in the center of the screen frame.add(contentPane); frame.pack(); center(frame); } // Called whenever the user clicks a button or presses Enter on a text field. public void action Performed (ActionEvent event) { Object source = event.getSource(); JTextField field inputFileField; JTextArea area = inputFileArea; JLabel label=inputStatusLabel; if (source == inputBrowse) { // pop up a Browse... file chooser String current Dir = System.getProperty("user.dir"); if (chooser == null) { chooser = new JFileChooser(currentDir); } // load the selected file int result = chooser.showOpenDialog(frame); if (result == JFileChooser.APPROVE_OPTION) { File inputFile = chooser.getSelectedFile(); if (inputFile != null) { // shorten displayed file names if possible for prettier GUI String filename=inputFile.getAbsolutePath(); if (filename.startsWith(currentDir + File.separator)) { filename = filename.replace(currentDir + File.separator, ""); } field.setText(filename); loadFile(filename, area, label); } } } else if (source == inputLoad) { // load file from disk and display on text area String filename = field.getText().trim(); loadFile(filename, area, label); } else if (source == clear) { // clear display of file name and contents
area.setText(" "); //field.setText(" "); } else if (source == inputFileField) { // user pressed Enter on input file text field; load file and display it String inputFileName=inputFileField.getText().trim(); if (inputFileName.length() == 0) { return; } loadFile(inputFileName, inputFileArea, inputStatusLabel); } else if (source == compress) { // compress the currently selected file compress(); } else if (source == decompress) { // decompress the currently selected file decompress(); } // Helper to load the contents of the given FILE and display them into the // given text area. The given label displays status updates along the way. // Works for both text and binary files. public void loadFile(final String filename, final JTextArea area, final JLabel label) { final File file = new File(filename); if (file.exists()) { new Thread(new Runnable() { public void run() { label.setText("Loading..."); try { area.setText(readEntireFile(filename)); //helper to convert file to a String area.setLineWrap(false); label.setText("Text file, " + file.length() + " bytes (" + file.length() / 1024+" kB)"); area.setSelection Start(0); // scroll to top area.setSelectionEnd(0); } catch (IOException ioe) { // these exceptions are tough to get to now area.setText(""); label.setText("No file loaded"); ioe.printStackTrace(); JOptionPane.showMessageDialog(frame, "I/O error occurred:\n" + ioe.getMessage(), "I/O error", JOptionPane.ERROR_MESSAGE); } catch (OutOfMemoryError mem) { area.setText("(file too large to display)"); label.setText("No file loaded"); mem.printStackTrace(); JOptionPane.showMessageDialog(frame, "Out of memory error occurred:\n" + mem.getMessage(), "Memory error", JOptionPane.ERROR_MESSAGE); } }).start(); } else { // user has not yet selected an input and output file area.setText(""); label.setText("No file loaded"); }
// Runs the GUI by displaying its graphical frame. public void start() { frame.setVisible(true); } // Helper to compress the currently selected input file to the selected output file. private void compress() { final String inputFileName=inputFileField.getText().trim(); if (inputFileName.length() == 0) { return; } // run compression in a separate thread of execution because it can be slow new Thread(new Runnable() { public void run() { try { //get file's character counts compressStatusLabel.setText("Compression in progress..."); outputFileArea.setText("..."); outputStatusLabel.setText("Counting characters..."); counts = HuffmanNode.getCounts(new FileInputStream(inputFileName)); huffmanTree = new HuffmanTree(counts); if (displayOption.isSelected()) huffTreeArea.setText(huffmanTree.printSideways()); else huffTreeArea.setText("size=" + huffmanTree.printSideways().length() outputStatusLabel.setText("Creating encodings..."); // compress FileInputStream input = new FileInputStream(inputFileName); // use Huffman tree and counts to compress the input file // (also time it so we can display the runtime afterward) outputStatusLabel.setText("Compressing ..."); long startTime = System.currentTimeMillis(); huffResults = huffmanTree.compress(input); // display compressed file if (displayOption.isSelected()) outputFileArea.setText(new String(huffResults)); // only use for small files else outputFileArea.setText("size=" + huffResults.length() + " bits"); outputFileArea.setLineWrap(true); outputStatusLabel.setText("String of bits, size = "+huffResults.length()/8+" bytes ("+huffResults.length() / outputFileArea.setSelectionStart(0); // scroll to top long endTime = System.current TimeMillis(); long elapsed = end Time - startTime; compressStatusLabel.setText("Compression complete (" + elapsed + "ms)"); } catch (IOException ioe) { ioe.printStackTrace(); (8*1024) + " kB) "); + characters");
JOptionPane.showMessageDialog(frame, "I/O error occurred:\n" JOptionPane.ERROR_MESSAGE); + ioe.getMessage(), "I/O error", } catch (OutOfMemoryError mem) { mem.printStackTrace(); outputFileArea.setText("(file too large to display)"); JOptionPane.showMessageDialog(frame, "Out of memory error occurred:\n" + mem.getMessage(), "Memory error", JOptionPane.ERROR_MESSAGE); } } }).start(); } private void decompress() { new Thread(new Runnable() { public void run() { try { // use Huffman tree and counts to decompress // (also time it so we can display the runtime afterward) outputStatusLabel.setText("Decompressing above bits..."); long startTime = System.currentTimeMillis(); try { filelnput = new String(huffman Tree.decompress(huffResults)); } catch (Exception e) {} long endTime = System.currentTimeMillis(); long elapsed = endTime - startTime; // display decompressed file inputFileArea.setText(filelnput); inputFileArea.setLineWrap(true); inputStatusLabel.setText("Decompressed file size= " + filelnput.length()+" bytes (" + filelnput.length() / (1024) + kB) "); inputFileArea.setSelection Start(0); // scroll to top outputStatusLabel.setText("Done..."); compressStatusLabel.setText("Decompression complete (" + elapsed + "ms)"); } catch (OutOfMemoryError mem) { JOptionPane.showMessageDialog(frame, "Out of memory error occurred:\n" + mem.getMessage(), "Memory error", JOptionPane.ERROR_MESSAGE); mem.printStackTrace(); } } }).start(); } // Moves the given window to the center of the screen. private static void center(Component comp) { Dimension size = comp.getSize(); Dimension screen = Toolkit.getDefaultToolkit().getScreenSize(); comp.setLocation(Math.max(0, (screen.width - size.width) / 2), Math .max(0, (screen.height - 24 - size.height)/2)); } // Helper method to create a JButton with the given properties.
private static JButton createButton(String text, char mnemonic, ActionListener listen, Container panel) { JButton button = new JButton(text); if (mnemonic != '\0') { button.setMnemonic(mnemonic); } button.addActionListener(listen); if (panel != null) { panel.add(button); } button.setFocusable(false); return button; } // Returns the entire contents of the given file as a string. // To be used with text files. private static String readEntireFile(String filename) throws IOException { File file = new File(filename); if (!file.exists()) { return null; } StringBuilder sb = new StringBuilder((int) file.length() + 10); Reader in = new Buffered Reader(new FileReader(file)); while (in.ready()) { sb.append((char) in.read()); } return sb.toString();
HuffmanGUI.java. Instructions below:
HuffmanGUI.java given below, it was too long to paste so I had
to take images instead:
For this last CS211 assignment, you are to submit the following Java code for a Huffman Tree data structure, along with the Huffman Node structure that you have used. For the Huffman Nodes, read through the description of Huffman compression below and the following organization will become apparent. We're building a binary tree, where the "data" is a count of the frequency of each character. public class HuffmanNode implements Comparable { public public int frequency; public char character; HuffmanNode left; public HuffmanNode right; The Huffman Node class will also need a boolean isLeaf() method, plus a static method to actually provide a count of the characters in an input file, and place those counts into a Map, with character as the unique key mapped into the integer counts of that character: public static Map getCounts (FileInputStream input) Your Huffman Tree class must have the following defined: public public HuffmanTree (Map counts) // Constructor StringBuilder compress (InputStream inputFile) // inputFile is a text file public StringBuilder decompress (StringBuilder inputString) //inputString 1'3 & 0's public String printSideways () // as per the method presented in Chapter 17. Note that the input file here can actually be any format, but plain text will allow us to see what we're doing. The compress method returns a string of 1's and 0's (bits) as per the description below. The decompress method can take that string of 1's and 0's, use the Huffman tree structure (different for every text file) and recreate the text file from those bits. Finally, the printSideways here is similar to what is presented in Chapter 17 of Building Java Programs, but in this case we build a string to return into a graphical text area. You should actually use the String Builder class, as it is mutable. We normally don't use System.out calls in a GUI. But I understand they are useful during debugging. You must comment out all System.out calls before submitting your work. The client program (HuffmanGUI.java) is provided, and you need to meet these specifications listed above. I expect two files to be submitted (HuffmanNode.java and Huffman Tree.java) which I will copy into my Eclipse folder and run the provided GUI for testing. Huffman coding is an algorithm devised by David A. Huffman of MIT in 1952 for compressing text data to make a file occupy a smaller number of bytes. This relatively simple compression algorithm is powerful enough that variations of it are still used today in computer networks, fax machines, modems, HDTV, and other areas. Normally text data is stored in a standard format of 8 bits per character, commonly using an encoding called ASCII that maps every character to a binary integer value from 0-255. The idea of Huffman coding is to abandon the rigid 8-bits- percharacter requirement and use different-length binary encodings for different characters. The advantage of doing this is that if a character occurs frequently in the file, such as the letter 'e', it could be given a shorter encoding (fewer bits), making the file smaller. The tradeoff is that some characters may need to use encodings that are longer than 8 bits, but this is reserved for characters that occur infrequently, so the extra cost is worth it.
The table below compares ASCII values of various characters to possible Huffman encodings for the text of Shakespeare's Hamlet. First step is to count the number of occurrences of each character in the text. Frequent characters such as space and 'e should have short encodings, while rarer ones likez have longer ones. Character ASCII value ASCII (binary) Huffman (binary) 32 00100000 10 97 01100001 0001 'b' 98 01100010 0111010 001100 'C' 99 01100011 'e' 101 01100101 1100 'Z' 122 01111010 00100011010 The steps involved in Huffman coding a given text source file into a destination compressed file are the following: 1. Examine the source file's contents and count the number of occurrences of each character. 2. Place each character and its frequency (count of occurrences) into a sorted "priority" queue. Convert the contents of this priority queue into a binary tree with a particular structure. 3. 4. Traverse the tree to discover the binary encodings of each character. 5. Re-examine the source file's contents, and for each character, output the encoded binary version of that character to the destination file. Encoding a File: For example, suppose we have a file named example.txt with the following contents: ab ab cab In the original file, this text occupies 10 bytes (80 bits) of data. The 10th is a special "end-of-file" (EOF) byte. byte 1 2 3 4 5 7 8 9 I char 'a' 'a' 1 'b' 'C' 'a' 10 EOF 256 ASCII 97 98 32 97 98 32 99 97 98 binary 01100001 01100010 00100000 01100001 01100010 00100000 01100011 01100001 01100010 N/A In Step 1 of Huffman's algorithm, a count of each character is computed. (i.e. the getCounts method). The counts are represented as a map: {'¹=2, 'a'=3, 'b'=3, 'c'=1, EOF=1} Step 2 of the algorithm places these counts into binary tree nodes, each storing a character and a count of its occurrences. The nodes are put into a priority queue, which keeps them in sorted order with smaller counts at the front of the queue. (The priority queue is somewhat arbitrary in how it breaks ties, such as being before EOF and being before 'a'). 'b' 2 3 front 1 'C' 1 EOF 3 back T 1 'b' Now the algorithm repeatedly removes the two nodes from the front of the queue (the two with the smallest frequencies) and joins them into a new node whose frequency is their sum. The two nodes are placed as children of the new node; the first removed becomes the left child, and the second the right. The new node is re-inserted into the queue in sorted order: 2 3 3 front back 'b' 'a' do 'c' EOF I
This process is repeated until the queue contains only one binary tree node with all the others as its children. This will be the root of our finished Huffman tree. The following diagram shows this process: 3 3 10 'b' 'a' de dado 2 2 2 3 'b' 2 3 1 'c' EOF EOF 1 EOF Notice that the nodes with low frequencies end up far down in the tree, and nodes with high frequencies end up near the root of the tree. This structure can be used to create an efficient encoding. The Huffman code is derived from this tree by thinking of each left branch as a bit value of 0 and each right branch as a bit value of 1: 10 0 1 3 1 'C' EOF The code for each character can be determined by traversing the tree. To reach. we go left twice from the root, so the code for is 00. The code for 'c' is 010, the code for EOF is 011, the code for 'b' is 10 and the code for 'a' is 11. By traversing the tree, we can produce a map from characters to their binary representations. For this tree, it would be: {' '=00, 'a'=11, 'b'=10, 'c'=010, EOF=011} Using this map, we can encode the file into a shorter binary representation. The text ab ab cab would be encoded as: 'a' I 'b' " 'C' 'a' 'b' char binary 11 'b' 10 EOF 011 00 11 10 00 010 11 10 Use of the word "map" above is not random. Implementing this in code, you might find it useful to create a Map data structure that maps characters into the right string of bits. This really simplifies things, but still requires the building of a Huffman Tree to complete the assignment. The overall encoded contents are 1110001110000101110011, which is 22 bits, or almost 3 bytes, compared to the original file which was 10 bytes. (Many Huffman-encoded text files compress to about half their original size.) Modern encoding algorithms are faster and more efficient, but this is where it all started over 50 years ago, and has often been elevated into other technology (http://en.wikipedia.org/wiki/Huffman coding). 2 4 1 2 0 3 'b' 6 1 3
1 3 byte char a b a b ca b EOF binary 11 10 00 11 10 00 010 1 1 10 011 00 Since the character encodings have different lengths, often the length of a Huffman-encoded file does not come out to an exact multiple of 8 bits. Files are stored as sequences of whole bytes, so in cases like this the remaining digits of the last bit are filled with Os. You do not need to worry about this in the assignment; it is part of the underlying file system. It might worry you that the characters are stored without any delimiters between them, since their encodings can be different lengths and characters can cross byte boundaries, as with 'a' at the end of the second byte above. But this will not cause problems in decoding the compressed file, because Huffman encodings have a prefix property where no character's encoding can ever occur as the start of another's encoding. This is important when you decode the file later. Decoding a File: You can use a Huffman tree to decode text that was compressed with its encodings. The decoding algorithm is to read each bit from the file, one at a time, and use this bit to traverse the Huffman tree. If the bit is a 0, you move left in the tree. If the bit is 1, you move right. You do this until you hit a leaf node. Leaf nodes represent characters, so once you reach a leaf, you output that character. For example, suppose we are asked to decode a file containing the following bits: 101101000110111011 Using the Huffman tree, we walk from the root until we find characters, then we output them and go back to the root. 0 1 1 1 'c' EOF First we read a 1 (right), then a o (left). We reach 'b' and output b. Back to the root. 101101000110111011 We read a 1 (right), then a 1 (right). We reach 'a' and output a. . 101101000110111011 101101000110111011 . We read a o (left), then a 1 (right), then a o (left). We reach 'c' and output c. . We read a o (left), then a o (left). We reach and output a space. 1 We read a 1 (right), then a 1 (right). We reach 'a' and output a. 101101000110111011 101101000110111011 • We read a 0 (left), then a 1 (right), then a 0 (left). We reach 'c' and output c. We read a 1 (right), then a 1 (right). We reach 'a' and output a. So the overall decoded text is bac aca- (The input source reports when we reach an EOF character of 011, so we stop.) 101101000110111011 101101000110111011 When the selected input files get very large (try Moby.txt) the display will take forever (maybe 20 minutes) to load. This is a well know problem with the Java set text methods, as they require constant refresh, rebuffer, and reload. So I provide a check box on the GUI to skip the text displays, so you can see your compression can really run in under a second even for 2 0 1
huge files. In real compression algorithms, the 100011100... bit stream would actually go out over the wire as binary data, and not be converted into character, strings, and graphics (so slow). Deliverables: 1) HuffmanNode.java 2) HuffmanTree.java 3) QA document with screenshots showing your working program.
import java.io.*; import java.awt."; import java.awt.event.*; import java.util.*; import javax.swing.* public class HuffmanGUI implements ActionListener { // Runs the program and starts the GUI. public static void main(String[] args) { new HuffmanGUI().start(); } // font size for input/output file text private static final Font MONOSPACED_FONT= new Font("monospaced", Font.PLAIN, 12); // fields of the Iverson GUI private JFrame frame; private JTextField inputFileField; private JTextArea inputFileArea, huffTreeArea, outputFileArea; private JButton inputBrowse, inputLoad, compress, decompress, clear; private JLabel inputStatusLabel, outputStatusLabel, compressStatusLabel; private JFileChooser chooser; private JCheckBox displayOption; private Map counts = new TreeMap(); private HuffmanTree huffmanTree; private StringBuilder huffResults; private String filelnput; // There were file streams in the original UW version // to simplify, we'll read the whole input file into a string // and the entire stream of bits from the Huffman process into another string (huffResults) // Construct the GUI, sets up all graphical components, layout, and events. // A little complicated, as I started with a very different GUI for the UW version public HuffmanGUI() { // main window frame frame = new JFrame("CS211 (CSE143) Huffman Coding"); frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE); // LEFT Panel for input file user interaction Container textAndBrowse = new JPanel(new BorderLayout()); inputFileField = new JTextField(10); inputFileField.addActionListener(this); textAndBrowse.add(new JLabel("Input file: "), BorderLayout.NORTH); textAndBrowse.add(inputFileField); // Another sub-Container for the Buttons Container textButtons = new JPanel(new FlowLayout()); inputBrowse = createButton("Browse...", 'B', this, textButtons); inputLoad = createButton("Reload", "L', this, textButtons); clear = createButton("Clear", "C', this, textButtons); textAndBrowse.add(textButtons, BorderLayout.SOUTH);
// text area in which to load input file contents (left side) inputFileArea = new JTextArea(20, 40); inputFileArea.setFont(MONOSPACED_FONT); inputFileArea.setEditable(false); inputFileArea.setFocusable(false); inputStatusLabel = new JLabel("No file loaded"); // and a container to hold text area and label together Container south = new JPanel(new BorderLayout()); south.add(new JScrollPane(inputFileArea)); south.add(inputStatusLabel, BorderLayout.SOUTH); // finally a container with all of above together Container inputFilePanel = new JPanel(new BorderLayout()); inputFilePanel.add(textAndBrowse, BorderLayout.NORTH); inputFilePanel.add(south, BorderLayout.CENTER); // For displaying the derived bit stream Container bitStream = new JPanel(new BorderLayout()); bitStream.add(new JLabel("Output Stream (String of bits): "), BorderLayout.NORTH); // text area in which to load output file contents outputFileArea = new JTextArea(20, 40); outputFileArea.setFont (MONOSPACED_FONT); outputFileArea.setEditable(false); outputFileArea.setFocusable(false); outputStatusLabel = new JLabel("No file loaded"); bitStream.add(new JScrollPane(outputFileArea)); bitStream.add(outputStatusLabel, BorderLayout.SOUTH); Container outputFilePanel = new JPanel(new BorderLayout()); outputFilePanel.add(bitStream, BorderLayout.NORTH); outputFilePanel.add(bitStream, BorderLayout.CENTER); // text area in which to load tree huffTreeArea = new JTextArea(20, 40); huffTreeArea.setFont(MONOSPACED_FONT); huffTreeArea.setEditable(false); huffTreeArea.setFocusable(false); Container right = new JPanel(new BorderLayout()); right.add(new JScrollPane(huffTreeArea)); Container huffTreePanel = new JPanel(new BorderLayout()); huffTreePanel.add(new JLabel("printSideways() Huffman Tree: "), BorderLayout.NORTH); huffTreePanel.add(right, BorderLayout.CENTER); // bottom section for buttons to compress/decompress Container compressPanel = new JPanel(new BorderLayout()); Container compressCenter = new JPanel(new FlowLayout()); compress = createButton("Compress", "C", this, compressCenter); decompress = createButton("Decompress", "D", this, compressCenter); compressStatusLabel = new JLabel("No file has been compressed yet");
compressStatusLabel.setHorizontalAlignment(SwingConstants.CENTER); displayOption = new JCheckBox("Check to display data", true); displayOption.addActionListener(this); compressPanel.add(compressCenter, BorderLayout.CENTER); compressPanel.add(compressStatusLabel, BorderLayout.SOUTH); compressPanel.add(displayOption, BorderLayout.EAST); // overall window's layout JPanel contentPane = new JPanel(new BorderLayout(15,10)); contentPane.setBorder(BorderFactory.createEmptyBorder(10, 10, 10, 10)); contentPane.add(inputFilePanel, BorderLayout.WEST); contentPane.add(outputFilePanel, BorderLayout.CENTER); contentPane.add(compressPanel, BorderLayout.SOUTH); contentPane.add(huffTreePanel, BorderLayout.EAST); // size frame and place it in the center of the screen frame.add(contentPane); frame.pack(); center(frame); } // Called whenever the user clicks a button or presses Enter on a text field. public void action Performed (ActionEvent event) { Object source = event.getSource(); JTextField field inputFileField; JTextArea area = inputFileArea; JLabel label=inputStatusLabel; if (source == inputBrowse) { // pop up a Browse... file chooser String current Dir = System.getProperty("user.dir"); if (chooser == null) { chooser = new JFileChooser(currentDir); } // load the selected file int result = chooser.showOpenDialog(frame); if (result == JFileChooser.APPROVE_OPTION) { File inputFile = chooser.getSelectedFile(); if (inputFile != null) { // shorten displayed file names if possible for prettier GUI String filename=inputFile.getAbsolutePath(); if (filename.startsWith(currentDir + File.separator)) { filename = filename.replace(currentDir + File.separator, ""); } field.setText(filename); loadFile(filename, area, label); } } } else if (source == inputLoad) { // load file from disk and display on text area String filename = field.getText().trim(); loadFile(filename, area, label); } else if (source == clear) { // clear display of file name and contents
area.setText(" "); //field.setText(" "); } else if (source == inputFileField) { // user pressed Enter on input file text field; load file and display it String inputFileName=inputFileField.getText().trim(); if (inputFileName.length() == 0) { return; } loadFile(inputFileName, inputFileArea, inputStatusLabel); } else if (source == compress) { // compress the currently selected file compress(); } else if (source == decompress) { // decompress the currently selected file decompress(); } // Helper to load the contents of the given FILE and display them into the // given text area. The given label displays status updates along the way. // Works for both text and binary files. public void loadFile(final String filename, final JTextArea area, final JLabel label) { final File file = new File(filename); if (file.exists()) { new Thread(new Runnable() { public void run() { label.setText("Loading..."); try { area.setText(readEntireFile(filename)); //helper to convert file to a String area.setLineWrap(false); label.setText("Text file, " + file.length() + " bytes (" + file.length() / 1024+" kB)"); area.setSelection Start(0); // scroll to top area.setSelectionEnd(0); } catch (IOException ioe) { // these exceptions are tough to get to now area.setText(""); label.setText("No file loaded"); ioe.printStackTrace(); JOptionPane.showMessageDialog(frame, "I/O error occurred:\n" + ioe.getMessage(), "I/O error", JOptionPane.ERROR_MESSAGE); } catch (OutOfMemoryError mem) { area.setText("(file too large to display)"); label.setText("No file loaded"); mem.printStackTrace(); JOptionPane.showMessageDialog(frame, "Out of memory error occurred:\n" + mem.getMessage(), "Memory error", JOptionPane.ERROR_MESSAGE); } }).start(); } else { // user has not yet selected an input and output file area.setText(""); label.setText("No file loaded"); }
// Runs the GUI by displaying its graphical frame. public void start() { frame.setVisible(true); } // Helper to compress the currently selected input file to the selected output file. private void compress() { final String inputFileName=inputFileField.getText().trim(); if (inputFileName.length() == 0) { return; } // run compression in a separate thread of execution because it can be slow new Thread(new Runnable() { public void run() { try { //get file's character counts compressStatusLabel.setText("Compression in progress..."); outputFileArea.setText("..."); outputStatusLabel.setText("Counting characters..."); counts = HuffmanNode.getCounts(new FileInputStream(inputFileName)); huffmanTree = new HuffmanTree(counts); if (displayOption.isSelected()) huffTreeArea.setText(huffmanTree.printSideways()); else huffTreeArea.setText("size=" + huffmanTree.printSideways().length() outputStatusLabel.setText("Creating encodings..."); // compress FileInputStream input = new FileInputStream(inputFileName); // use Huffman tree and counts to compress the input file // (also time it so we can display the runtime afterward) outputStatusLabel.setText("Compressing ..."); long startTime = System.currentTimeMillis(); huffResults = huffmanTree.compress(input); // display compressed file if (displayOption.isSelected()) outputFileArea.setText(new String(huffResults)); // only use for small files else outputFileArea.setText("size=" + huffResults.length() + " bits"); outputFileArea.setLineWrap(true); outputStatusLabel.setText("String of bits, size = "+huffResults.length()/8+" bytes ("+huffResults.length() / outputFileArea.setSelectionStart(0); // scroll to top long endTime = System.current TimeMillis(); long elapsed = end Time - startTime; compressStatusLabel.setText("Compression complete (" + elapsed + "ms)"); } catch (IOException ioe) { ioe.printStackTrace(); (8*1024) + " kB) "); + characters");
JOptionPane.showMessageDialog(frame, "I/O error occurred:\n" JOptionPane.ERROR_MESSAGE); + ioe.getMessage(), "I/O error", } catch (OutOfMemoryError mem) { mem.printStackTrace(); outputFileArea.setText("(file too large to display)"); JOptionPane.showMessageDialog(frame, "Out of memory error occurred:\n" + mem.getMessage(), "Memory error", JOptionPane.ERROR_MESSAGE); } } }).start(); } private void decompress() { new Thread(new Runnable() { public void run() { try { // use Huffman tree and counts to decompress // (also time it so we can display the runtime afterward) outputStatusLabel.setText("Decompressing above bits..."); long startTime = System.currentTimeMillis(); try { filelnput = new String(huffman Tree.decompress(huffResults)); } catch (Exception e) {} long endTime = System.currentTimeMillis(); long elapsed = endTime - startTime; // display decompressed file inputFileArea.setText(filelnput); inputFileArea.setLineWrap(true); inputStatusLabel.setText("Decompressed file size= " + filelnput.length()+" bytes (" + filelnput.length() / (1024) + kB) "); inputFileArea.setSelection Start(0); // scroll to top outputStatusLabel.setText("Done..."); compressStatusLabel.setText("Decompression complete (" + elapsed + "ms)"); } catch (OutOfMemoryError mem) { JOptionPane.showMessageDialog(frame, "Out of memory error occurred:\n" + mem.getMessage(), "Memory error", JOptionPane.ERROR_MESSAGE); mem.printStackTrace(); } } }).start(); } // Moves the given window to the center of the screen. private static void center(Component comp) { Dimension size = comp.getSize(); Dimension screen = Toolkit.getDefaultToolkit().getScreenSize(); comp.setLocation(Math.max(0, (screen.width - size.width) / 2), Math .max(0, (screen.height - 24 - size.height)/2)); } // Helper method to create a JButton with the given properties.
private static JButton createButton(String text, char mnemonic, ActionListener listen, Container panel) { JButton button = new JButton(text); if (mnemonic != '\0') { button.setMnemonic(mnemonic); } button.addActionListener(listen); if (panel != null) { panel.add(button); } button.setFocusable(false); return button; } // Returns the entire contents of the given file as a string. // To be used with text files. private static String readEntireFile(String filename) throws IOException { File file = new File(filename); if (!file.exists()) { return null; } StringBuilder sb = new StringBuilder((int) file.length() + 10); Reader in = new Buffered Reader(new FileReader(file)); while (in.ready()) { sb.append((char) in.read()); } return sb.toString();