solutions that the team can leverage to achieve their larger goal.

For this homework, you will start by manually predicting the performance of two sorting algorithms on three types of input (by considering the algorithms and the input data format). This produces an initial "model" for how each algorithm will behave. Note that rather than producing a model from initial sample data, as in empirical analysis, we are "seeding" our understanding of these novel algorithm/data scenarios with our existing understanding of other inputs (e.g., we understand that insertion sort on nearly ordered data is linear time). The next step will be to run the algorithms on the input data and see what the result is in practice. This produces two things: 1) we are able to evaluate and improve our own intuition (was the way we thought it would work correct?), and 2) it produces a model for how the algorithm behaves. The first aspect is the reason why we want to make our own prediction instead of jumping directly to benchmarking. (Consider: the more accurate our mental models are, the faster we can find solutions to problems. A side effect of solving a problem is becoming better at problem-solving in general.)

To perform the benchmarking that validates our initial "guess" at performance, we will create three different types of input data that may give different results when sorted. The two sorting algorithms will then be benchmarked on these three types of data. Each algorithm will be run twice, on different dataset sizes, in order to get times that we can use in the doubling formula (see slide 23, "Modeling Small Datasets," in the Analysis of Algorithms slide deck for details). The doubling formula is

    b = lg(T(2N) / T(N))

where lg indicates a base-2 log. If we compute the formula, then we will be able to figure out the algorithm's Big-Oh for a particular type of input data, since it will be O(n^b); b is simply the power.
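As a concrete illustration, the doubling formula translates into a few lines of Java. The class name DoublingFormulaExample is made up for this sketch; Java has no base-2 logarithm, so the usual trick of dividing natural logarithms is used:

```java
public class DoublingFormulaExample {

    // b = lg(t2 / t1), where t1 = T(N) and t2 = T(2N).
    // lg(x) = ln(x) / ln(2), since Java's Math has no base-2 log.
    public static double computeDoublingFormula(double t1, double t2) {
        return Math.log(t2 / t1) / Math.log(2);
    }

    public static void main(String[] args) {
        // If doubling the input quadruples the run time, b is 2,
        // suggesting the algorithm is O(n^2) on that input.
        System.out.println(computeDoublingFormula(1.0, 4.0));
    }
}
```

For instance, times of 1.0 s and 4.0 s give b ≈ 2 (quadratic), while 1.0 s and 2.0 s give b ≈ 1 (linear).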
This document is separated into four sections: Background, Requirements, Testing, and Submission. You have almost finished reading the Background section already. In Requirements, we discuss what is expected of you in this homework. In Testing, we suggest some basic tests you can use to start verifying your implementation. Lastly, Submission discusses how your source code should be submitted on Canvas/Gradescope.
2 Requirements [36 points total]

For this assignment you will be writing code to generate test data and benchmark sorting algorithms on it (adapted from Sedgewick and Wayne, exercise 2.1.36). Already provided for you is CompletedBenchmarkTool.java, which contains the sorting algorithms insertion sort and shellsort. As usual, it's not complete just yet. This class implements an interface called BenchmarkTool which defines all of the methods that you need to create. You may create additional private helper methods.

Before writing any code, you should write up your hypotheses (3 per algorithm) to describe what you think the running time will look like for each algorithm/input combination (see Writeup below).

Your first step in benchmarking the algorithms will be to write a series of methods to generate test data that is "non-uniform":

• Implement the method public Integer[] generateTestDataBinary(int size). Half the data is 0s, half 1s. For example, an input of length 8 might look like [0, 0, 0, 0, 1, 1, 1, 1]. See the interface. [4 points]

• Implement the method public Integer[] generateTestDataHalves(int size). Half the data is 0s, half the remainder is 1s, half the remainder is 2s, half the remainder is 3s, and so forth. For example, an input of length 8 might look like [0, 0, 0, 0, 1, 1, 2, 3]. See the interface. [6 points]

• Implement the method public Integer[] generateTestDataHalfRandom(int size). Half the data is 0s, half random int values (use nextInt() from Java's Random class, with Integer.MAX_VALUE as its argument to ensure the values are nonnegative). For example, an input of length 8 might look like [0, 0, 0, 0, 54119567, 4968, 650736346, 138617093]. See the interface. [4 points]

Each of these three techniques should be implemented as a method that takes an int representing the size of a dataset, and returns an Integer array containing that number of elements generated with the corresponding rule. Do not randomize (shuffle) the contents of the generated arrays.
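To make the "halves" pattern concrete, here is one possible sketch of the generation logic (your own implementation may well differ, and the class name HalvesExample is made up for illustration):

```java
import java.util.Arrays;

public class HalvesExample {

    // Fills half the array with 0s, half the remainder with 1s, and so on,
    // halving the count each round; the single slot left at the end gets
    // the next value (e.g., the final 3 in [0, 0, 0, 0, 1, 1, 2, 3]).
    public static Integer[] generateTestDataHalves(int size) {
        Integer[] data = new Integer[size];
        int idx = 0;
        int value = 0;

        for (int count = size / 2; count >= 1; count /= 2) {
            for (int i = 0; i < count; i++)
                data[idx++] = value;
            value++;
        }
        while (idx < size)          // fill the last remaining slot
            data[idx++] = value;

        return data;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(generateTestDataHalves(8)));
        // prints [0, 0, 0, 0, 1, 1, 2, 3]
    }
}
```

Since the assignment guarantees power-of-2 lengths, each halving step produces a whole number of elements.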
You may assume that only arrays with a power-of-2 length will need to be created.

For each of the sorting algorithms, your program should run them on the three types of test data. Test them with dataset sizes of 4096 and 4096*2. (If your system is so fast that you don't get good results, you may increase the dataset size. If your system continues to generate strange values even with a large dataset size, try turning off compiler optimizations.) Time each of the tests with the Stopwatch class discussed in class. The program needs to compute the result of the doubling formula on the run times from the sample data generated to get the power ("b") for that algorithm on that type of input, and then display it. Six different values should be shown if you have properly implemented all of the benchmarks.

• Implement the method public double computeDoublingFormula(double t1, double t2). See the interface for more information. [2 points]

• Implement the method public double benchmarkInsertionSort(Integer[] small, Integer[] large). See the interface for more information. [2 points]

• Implement the method public double benchmarkShellsort(Integer[] small, Integer[] large). See the interface for more information. [2 points]

• Implement the method public void runBenchmarks(int size). Whitespace is flexible (any number of tabs or spaces) but you must show three decimal places. See the interface for more information. Hint: this should call the two benchmark methods above. The output should look like the following. [4 points]

            Insertion   Shellsort
    Bin     ####        ####
    Half    ####        ####
    RanInt  ####        ####
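The overall timing flow for one benchmark method might look like the following sketch. The class name BenchmarkFlowExample is invented, and a tiny insertion sort plus System.nanoTime() stand in for the provided insertionSort and Stopwatch so the example is self-contained; in your submission you would use the provided versions instead:

```java
public class BenchmarkFlowExample {

    // Tiny insertion sort, standing in for the provided insertionSort method.
    static void insertionSort(Integer[] a) {
        for (int i = 1; i < a.length; i++)
            for (int j = i; j > 0 && a[j] < a[j - 1]; j--) {
                Integer t = a[j]; a[j] = a[j - 1]; a[j - 1] = t;
            }
    }

    // b = lg(t2 / t1), as described for computeDoublingFormula.
    static double computeDoublingFormula(double t1, double t2) {
        return Math.log(t2 / t1) / Math.log(2);
    }

    // Sketch of the benchmark flow: time the small run, time the large
    // (doubled) run, and feed both times to the doubling formula.
    public static double benchmarkInsertionSort(Integer[] small, Integer[] large) {
        long start = System.nanoTime();           // Stopwatch stand-in
        insertionSort(small);
        double t1 = (System.nanoTime() - start) / 1000000000.0;

        start = System.nanoTime();
        insertionSort(large);
        double t2 = (System.nanoTime() - start) / 1000000000.0;

        return computeDoublingFormula(t1, t2);
    }
}
```

For the three-decimal-place output, new java.text.DecimalFormat("0.000").format(b) is one way to meet the formatting requirement using the allowed imports.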
2.1 Packages

Do not import any packages other than java.text.DecimalFormat or java.util.Random. (Do not use any star imports.)

Required Files: CompletedBenchmarkTool.java
Optional Files: (none)

Table 1: Submission ZIP file contents.

2.2 Writeup [12 points]

Using the code you implemented, test and evaluate hypotheses about the effect of input on the performance of insertion sort and shellsort. In a separate document (to be submitted as a PDF; typically two pages double spaced), discuss:

• Your hypotheses: Write up your hypotheses (3 per algorithm): describe what you think the running time will look like (O(n)? O(n²)? O(n³)?) on each data set, and explain briefly why you think that. As long as your ideas make sense, and you do the analysis prior to benchmarking, you will receive full credit on the hypotheses. [4 points]

• Benchmark Results: Include the data (these are really the powers on n) that your program outputs. If your results are strange, please explain possible issues (see the disadvantages of empirical analysis). [1 point]

• Evaluation/Validation: Compare and contrast your hypotheses for the run-time with the data that you generated. What was supported? What is still unclear? (Your hypotheses and your results will probably be different!) [4 points]

• Scenario: Suppose that you are on a team that was given a legacy codebase that uses shellsort to process an array with half zeros, half random data. The legacy documentation indicates that shellsort is the best choice, but you are asked to investigate and confirm. Based on your findings above, would you recommend, for better performance, that the team stay with shellsort or switch to insertion sort? Multiple answers are possible but must be logical. (If you do not feel you have enough information to make a decision, state that, and justify.) [3 points]
4 Submission

The submission for this assignment has two parts: a written hypothesis document, and a source code submission.

Writeup: Submit a PDF that discusses your hypotheses. Include a header that contains your name, the class, and the assignment. We are not expecting anything longer than two pages. (If it makes your writeup clearer, you may use additional pages.)

Source Code: The source file must be named "CompletedBenchmarkTool.java", and then added to a ZIP file (which can be called anything). The class must be in the "edu.ser222.m02_01" package, as already done in the provided base file (do not change it!). You will submit the ZIP on Gradescope.
Stopwatch.java

package edu.ser222.m02_01;

/**
 * A utility class to measure the running time (wall clock) of a program.
 *
 * Improved over textbook implementation by switching to nanoTime.
 *
 * @author Acuna
 * @author Sedgewick
 * @author Wayne
 * @version 1.0
 */
public class Stopwatch {

    private final long start;

    /**
     * Initializes a new stopwatch.
     */
    public Stopwatch() {
        start = System.nanoTime();
    }

    /**
     * Returns the elapsed CPU time (in seconds) since the stopwatch was created.
     *
     * @return elapsed CPU time (in seconds) since the stopwatch was created
     */
    public double elapsedTime() {
        long now = System.nanoTime();
        return (now - start) / 1000000000.0;
    }
}
BenchmarkTool.java

package edu.ser222.m02_01;

/** An interface that defines a student's solution to the Section 02.01 ...*/
public interface BenchmarkTool {

    /**
     * Generates an array of integers where half the data is 0s, half 1s.
     *
     * @param size number of elements in the array.
     * @return generated test set.
     */
    public Integer[] generateTestDataBinary(int size);

    /**
     * Generates an array of integers where half the data is 0s, half the
     * remainder is 1s, half the remainder is 2s, half the remainder is 3s,
     * and so forth.
     *
     * @param size number of elements in the array.
     * @return generated test set.
     */
    public Integer[] generateTestDataHalves(int size);

    /**
     * Generates an array of integers where half the data is 0s, and half
     * random int values. All values will be positive.
     *
     * @param size number of elements in the array.
     * @return generated test set.
     */
    public Integer[] generateTestDataHalfRandom(int size);

    /**
     * Computes the doubling formula value for two run times.
     *
     * @param t1 first time
     * @param t2 second time
     * @return b value
     */
    public double computeDoublingFormula(double t1, double t2);

    /**
     * Computes an empirical b value for insertion sort by running it on a pair
     * of inputs and using the doubling formula.
     *
     * @param small small test data array
     * @param large large test data array, twice the size of the small array.
     * @return b value
     */
    public double benchmarkInsertionSort(Integer[] small, Integer[] large);

    /**
     * Computes an empirical b value for shellsort by running it on a pair
     * of inputs and using the doubling formula.
     *
     * @param small small test data array
     * @param large large test data array, twice the size of the small array.
     * @return b value
     */
    public double benchmarkShellsort(Integer[] small, Integer[] large);

    /**
     * Runs the two sorting algorithms on the three types of test data to
     * produce six different b values. B values are displayed to the user.
     *
     * @param size size of benchmark array, to be doubled later.
     */
    public void runBenchmarks(int size);
}
CompletedBenchmarkTool.java

package edu.ser222.m02_01;

/** (basic description of the program or class) ...*/
public class CompletedBenchmarkTool implements BenchmarkTool {

    /***************************************************************************
     * START - SORTING UTILITIES, DO NOT MODIFY (FROM SEDGEWICK)
     **************************************************************************/

    public static void insertionSort(Comparable[] a) {
        int N = a.length;

        for (int i = 1; i < N; i++) {
            // Insert a[i] among a[i-1], a[i-2], a[i-3]...
            for (int j = i; j > 0 && less(a[j], a[j-1]); j--)
                exch(a, j, j-1);
        }
    }

    public static void shellsort(Comparable[] a) {
        int N = a.length;
        int h = 1;

        while (h < N/3) h = 3*h + 1; // 1, 4, 13, 40, 121, 364, 1093, ...

        while (h >= 1) {
            // h-sort the array.
            for (int i = h; i < N; i++) {
                // Insert a[i] among a[i-h], a[i-2*h], a[i-3*h]...
                for (int j = i; j >= h && less(a[j], a[j-h]); j -= h)
                    exch(a, j, j-h);
            }
            h = h/3;
        }
    }

    private static boolean less(Comparable v, Comparable w) {
        return v.compareTo(w) < 0;
    }

    private static void exch(Comparable[] a, int i, int j) {
        Comparable t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    /***************************************************************************
     * END - SORTING UTILITIES, DO NOT MODIFY
     **************************************************************************/

    //TODO: implement interface methods.

    public static void main(String args[]) {
        BenchmarkTool me = new CompletedBenchmarkTool();
        int size = 4096; //NOTE: feel free to change size here. all other code
                         //must go in the methods.

        me.runBenchmarks(size);
    }
}
Summary: In this homework, you will be implementing code to generate test data and benchmark sorting.

1 Background

Learning Objectives:

• LO1: Gather data to support and refute ideas (EM@FSE C).
• LO2: Propose a tentative model for an algorithm's performance.
• LO3: Develop code to produce sample inputs to evaluate a sorting algorithm.
• LO4: Develop code to evaluate a sorting algorithm and empirically estimate a Big-Oh run-time.

A typical problem with using an existing solution to an algorithmic problem is needing to understand its behavior. To make matters more difficult, behavior may depend on the input's structure (e.g., insertion sort's linear-time performance on nearly sorted data). Algorithms may be very complicated, or we may not have access to the source code; in either case, the algorithm may need to be treated as a black box whose behavior is unknown. To understand its behavior, we need to perform an empirical evaluation (benchmarking).

This problem often occurs in the context of a software development team. For example, a team may receive a piece of legacy software that is not documented properly (e.g., "this algorithm performs in O(n²) on data that looks like..."), or they may be asked to work with a black-box solution from another team or a 3rd party. In these cases, a software developer on the team would be asked to analyze/quantify the performance of the algorithm. The team can be thought of as a customer, who the developer is trying to serve by providing knowledge on the internal