1 Background
Learning Objectives:
• LO1: Gathers data to support and refute ideas (EM@FSE C)
• LO2: Propose a tentative model for an algorithm's performance.
• LO3: Develop code to produce sample inputs to evaluate a sorting algorithm.
• LO4: Develop code to evaluate a sorting algorithm and empirically estimate a Big-Oh run-time.
A typical problem with using an existing solution to an
algorithmic problem is needing to understand its
behavior. To make matters more difficult, behavior may depend on the input's structure (e.g., insertion sort's linear-time performance on nearly sorted data). Algorithms may be very complicated, or we may not have access to the source code; in either case, the algorithm may need to be treated as a black box whose behavior is unknown. To understand its behavior, we need to perform an
empirical evaluation (benchmarking). This
problem often occurs in the context of a software development
team. For example, a team may receive a
piece of legacy software that is not documented properly (e.g., "this algorithm performs in O(n^2) on data that looks like..."), or they may be asked to work with a black-box solution from another team or a third party.
In these cases, a software developer on the team would be asked
to analyze/quantify the performance of the
algorithm. The team can be thought of as a customer whom the developer is trying to serve by providing knowledge of the internal solutions that the team can leverage
to achieve their larger goal.
For this homework, you will start by manually predicting the performance of two sorting algorithms on
three types of input (by considering the algorithms and the
input data format). This produces an initial
"model" for how the algorithm will behave. Note that rather than
producing a model from initial sample
data like in empirical analysis, we are "seeding" the
understanding of these novel algorithm/data scenarios
with our existing understanding for other inputs (e.g., we
understand that insertion sort on nearly ordered
data is linear time). The next step will be to evaluate the algorithm on the input data and see what the result is in practice. This produces two things: 1) We are able
to evaluate and improve our own intuition
(was the way we thought it would work correct?). 2) It produces
a model for how the algorithm behaves.
The first aspect is the reason why we want to make our own prediction instead of jumping directly to benchmarking. (Consider: the more accurate our mental models are, the faster we can find solutions to problems. A side effect of solving a problem is becoming better at problem-solving in general.)
To perform the benchmarking to validate our initial guess for performance, we will create three different types of input data that may give different results when sorted.
The two sorting algorithms will then be
benchmarked on these three types of data. Each algorithm will be
run twice, for different dataset sizes, in order to get times that we can use to apply the doubling formula (see slide 23, "Modeling Small Datasets," in the Analysis of Algorithms slide deck for details on the doubling formula). The doubling formula is b = lg(T(2N) / T(N)), where lg indicates a base-2 log. If we compute the formula, then we will be able to figure out the algorithm's Big-Oh for a particular type of input data, since it will be O(n^b); b is simply the power.
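For example, if T(N) = 0.5 s and T(2N) = 2.0 s, then b = lg(2.0 / 0.5) = lg 4 = 2, suggesting O(n^2) for that type of input. As a minimal sketch of what such a benchmarking harness might look like (the input kinds "sorted", "reversed", and "random", the function names, and the use of Python's built-in list sort as a stand-in algorithm are assumptions for illustration, not requirements of the assignment), the following generates sample inputs, times the sort at sizes N and 2N, and applies the doubling formula:

import math
import random
import time

def make_input(kind, n, seed=0):
    # Build one of three hypothetical input types; the exact formats
    # required by the assignment may differ.
    rng = random.Random(seed)
    data = list(range(n))
    if kind == "sorted":
        return data              # already in ascending order
    if kind == "reversed":
        return data[::-1]        # strictly descending
    if kind == "random":
        rng.shuffle(data)
        return data
    raise ValueError("unknown input kind: " + kind)

def time_sort(sort_fn, data):
    # Wall-clock seconds taken to sort a copy of the data.
    copy = list(data)
    start = time.perf_counter()
    sort_fn(copy)
    return time.perf_counter() - start

def estimate_b(sort_fn, kind, n):
    # Doubling formula: b = lg(T(2N) / T(N)).
    t_n = time_sort(sort_fn, make_input(kind, n))
    t_2n = time_sort(sort_fn, make_input(kind, 2 * n))
    return math.log2(t_2n / t_n)

if __name__ == "__main__":
    # list.sort() stands in for whichever algorithm you are benchmarking;
    # substitute your own implementation here.
    for kind in ("sorted", "reversed", "random"):
        b = estimate_b(lambda xs: xs.sort(), kind, n=50_000)
        print(kind, "estimated b =", round(b, 2))

A single timing per size can be noisy, so in practice you may want to repeat each measurement and average, and to choose N large enough that the times are well above the timer's resolution.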