You will implement a simple generic Differential Evolution search
You will apply this generic algorithm to two problems
You are provided with scaffolding code that you will need to complete
You will also perform experiments and report on your results
In machine learning, a hyperparameter is a parameter whose value is set before the learning process
begins. By contrast, the values of other parameters like the weights of a neural network are derived
via training. Examples of hyperparameters for neural networks include the number of layers, the
number of neurons in each layer and the learning rate.
Different training algorithms require different hyperparameters. Some simple algorithms (such as
ordinary least squares regression) require none. The optimal values for the hyperparameters depend
also on the dataset. In this assignment, you will implement a search algorithm to find good values
for a small set of hyperparameters of a neural network.
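As a concrete (hypothetical) illustration, hyperparameters such as those above might be collected in a dictionary; the names and values below are made up for this sketch and are not part of the scaffolding code:

```python
# Hyperparameters are fixed before training begins; by contrast,
# parameters (the network weights) are derived via training.
hyperparameters = {
    "num_hidden_layers": 2,    # architecture choice
    "neurons_per_layer": 40,   # architecture choice
    "learning_rate": 1e-3,     # training-algorithm choice
}
```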
Differential Evolution (DE) is an efficient, high-performing optimizer for real-valued functions.
DE is the search technique that you are required to use in this assignment to solve the optimization
problems. As DE is based on evolutionary computing, it performs well on multi-modal,
discontinuous optimization landscapes. Another appealing feature of DE is that it is still applicable
even if the function we try to optimize is not differentiable. This is often the case when some
simulation is involved in the computation of the real-valued function being optimized. In these
scenarios, a gradient descent approach is not applicable. DE performs a parallel search over a
population of fixed size, where each population member is a higher dimensional vector.
In this assignment, the core problem is to adapt DE to perform a search for four hyperparameters of
a neural network. In the next section, we describe DE in more detail.
Let f(w) be the cost function which must be minimized. Note that maximization can be performed
by considering the function -f . The function f(w) takes a candidate solution as argument, in the form
of a vector w of real numbers, and produces a real number as output which indicates the cost of the
candidate solution.
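To make the interface concrete, here is a minimal sketch of such a cost function. The sphere function below is a stand-in example, not the cost function of the assignment; it simply has the required shape (vector of reals in, one real out):

```python
import numpy as np

def f(w):
    """Toy cost function (the sphere function), minimized at w = 0.
    Takes a candidate solution w (vector of reals), returns its cost."""
    w = np.asarray(w, dtype=float)
    return float(np.sum(w ** 2))

# Maximizing a function g is the same as minimizing -g:
def neg_f(w):
    return -f(w)
```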
There exist many variations of the DE algorithm. The variation that you have to implement has
been chosen for its simplicity. In the rest of this section, variable names written in blue italic
correspond to Python program variables used in the provided scaffolding code.
We constrain the search by specifying bounds for each component of w with a list of pairs called
bounds in the scaffolding code. That is, we impose
bounds[k][0] <= w[k] <= bounds[k][1]
for k in range(len(w))
For example, if bounds is [(1,100),(1,100),(-6,2),(-6,1)] , then w[0] is constrained to the interval
[1, 100] and w[2] is constrained to the interval [-6, 2].
To make the code generic, we normalize each component of w to the interval [0, 1] by mapping
with an affine function the interval [ bounds[k][0], bounds[k][1] ] to the interval [0, 1].
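This affine mapping can be sketched as follows; the helper names `normalize` and `denormalize` are chosen for this illustration and need not match the scaffolding code:

```python
import numpy as np

bounds = [(1, 100), (1, 100), (-6, 2), (-6, 1)]
lo, hi = np.asarray(bounds, dtype=float).T  # per-component lower and upper bounds

def denormalize(w_norm):
    """Map a normalized vector in [0, 1]^d back to the original search box."""
    return lo + np.asarray(w_norm, dtype=float) * (hi - lo)

def normalize(w):
    """Map a vector from the original search box to [0, 1]^d."""
    return (np.asarray(w, dtype=float) - lo) / (hi - lo)
```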
We first initialize all individuals w of the initial population with random points from the search-space.
Until a termination criterion is met (maximum number of iterations/generations done, or acceptable
cost is reached), we repeat the following loop:
For each individual w in the population do:
Pick three distinct individuals a, b and c from the current population at random. The
individuals a, b and c must be distinct from w as well.
Create a mutant vector a + mut * (b - c), where mut is a constant (passed as an argument to
the function).
Clip the mutant entries to the interval [0, 1].
Create a trial vector by assigning to trial[k], with probability crossp, the value mutant[k],
and, with probability 1-crossp, the value w[k].
Evaluate the cost of the trial vector. If the trial vector is better than w, then replace w by the
trial vector.
In other words, for each individual w in the current population, we create a challenger trial vector.
If this challenger is better than w, then w is replaced by the trial vector.
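One generation of the loop above can be sketched as follows, working on a normalized population in [0, 1]^d. This is an illustrative sketch under assumed interfaces (the function name `de_step` and the argument `f_norm`, a cost function on normalized vectors, are made up here); the scaffolding's actual `differential_evolution` function is structured differently, e.g. as a generator:

```python
import numpy as np

rng = np.random.default_rng(0)

def de_step(pop, costs, f_norm, mut=0.8, crossp=0.7):
    """One DE generation: mutate, clip, cross over, and select."""
    n, d = pop.shape
    for i in range(n):
        # Pick three distinct individuals a, b, c, all different from pop[i].
        idxs = [j for j in range(n) if j != i]
        a, b, c = pop[rng.choice(idxs, 3, replace=False)]
        # Mutant vector a + mut * (b - c), clipped to [0, 1].
        mutant = np.clip(a + mut * (b - c), 0.0, 1.0)
        # Binomial crossover: take mutant[k] with probability crossp.
        cross = rng.random(d) < crossp
        trial = np.where(cross, mutant, pop[i])
        # Selection: the challenger replaces w only if it is better.
        cost = f_norm(trial)
        if cost < costs[i]:
            pop[i], costs[i] = trial, cost
    return pop, costs
```

Because a trial vector only ever replaces an individual it beats, the cost of each population member is non-increasing from one generation to the next.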
The mutation factor mut and the crossover constant crossp are numerically defined in the
scaffolding code.
The file “my_submission.py” contains scaffolding code. The key function is?differential_evolution.
This function returns a generator. You should have seen generators in the Python pracs.
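As a reminder of how a generator-based optimizer is consumed, here is a small sketch; `toy_optimizer` and the `(iteration, best_cost)` pairs it yields are invented for this example, and the actual signature of `differential_evolution` is in the scaffolding code:

```python
def toy_optimizer(max_iter):
    """Yield (iteration, best_cost_so_far) once per generation."""
    best = float("inf")
    for it in range(max_iter):
        best = min(best, 1.0 / (it + 1))  # stand-in for the best cost found
        yield it, best

# The caller drives the search by iterating over the generator:
for it, best in toy_optimizer(3):
    print(it, best)
```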
To validate your implementation of the function differential_evolution, you should first test and
debug it with the function task_1, as it tries to solve a simpler problem than task_2.
In Task 1, you fit a polynomial to a noisy dataset. This is the same problem as you have seen in the
early pracs, where you solved it with the pseudo-inverse and a gradient-descent approach.