How to maximize a non-analytic function over a variable space

I have a fairly simple question that I assumed would have a solution online, but I haven't found one yet.

I have a function (not a closed-form mathematical expression, not analytic) that computes a value F from a set of variables a, b, c, d by reading files / databases / doing online traversal, and I want to find the set of values of a, b, c, d that maximizes F. Searching the entire space of a, b, c, d is infeasible, and differentials / derivatives are of no use since F is not analytic. I would really appreciate a pointer to packages / algorithms I could use, or just how to get started. Most of what I've seen online about optimization in Python deals with analytic / mathematical functions (f(x) = x^2 + ...), not with non-analytic problems like this.

For example:

def F(a, b, c, d):
    # ... a lot of computation using a, b, c, d (different float values),
    # reading from databases, etc. ...
    return output  # output is a float


Each of a, b, c, d takes values from a discrete set, say [0, 0.1, 0.2, ..., 1.0]. The values are discrete and I don't need extreme precision in my optimization.

I want to find the set of values of a, b, c, d that gives me the highest F.

Oh, and I have no constraints on F or on a, b, c, d.
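For scale, the brute-force sweep I want to avoid would look something like this (every call to F is expensive, and the grid grows exponentially with the number of variables):

import itertools

values = [i / 10 for i in range(11)]  # the grid 0.0, 0.1, ..., 1.0

# exhaustive sweep: 11**4 = 14,641 calls to F for four variables,
# and exponentially more as variables are added -- infeasible for me
best = max(itertools.product(values, repeat=4), key=lambda v: F(*v))
print(best)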



3 answers


I was able to build on the other answers; thanks everyone for the help and comments here. The DEAP documentation was very helpful, but I'd like to share my solution anyway, along with some comments that will hopefully help others.

I used the OneMax example here https://github.com/DEAP/deap/blob/b46dde2b74a3876142fdcc40fdf7b5caaa5ea1f4/examples/ga/onemax.py with the walkthrough from here: https://deap.readthedocs.org/en/latest/examples/ga_onemax.html. I found these two pages invaluable as well, on operators: https://deap.readthedocs.org/en/latest/tutorials/basic/part2.html#next-step and on creating types: https://deap.readthedocs.org/en/latest/tutorials/basic/part1.html#creating-types

So here is my solution (apologies for the formatting; this is my first time posting long code with comments). Basically, all you really need to do is make the fitness function (here eval_Individual) the function F you are trying to optimize, and you can limit the range (and distribution) of each of the N variables / inputs of F by limiting their possible values at initialization (here random_initialization) and mutation (here mutate_inputs).
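As a concrete illustration, the three hooks (left as stubs in the listing below) might look like this for the 0.0 to 1.0 grid from the question; my real versions are problem-specific:

import random

GRID = [i / 10 for i in range(11)]  # allowed values 0.0, 0.1, ..., 1.0

def random_initialization():
    # generates one variable; DEAP's initRepeat calls this N times per individual
    return random.choice(GRID)

def mutate_inputs(individual, indpb):
    # resample each element with probability indpb, staying on the grid;
    # DEAP mutation operators return a tuple of individuals
    for i in range(len(individual)):
        if random.random() < indpb:
            individual[i] = random.choice(GRID)
    return (individual,)

def eval_Individual(individual):
    # run the expensive computation; DEAP expects a tuple of fitness values
    # NOTE: weights=(1.0,) in FitnessMax maximizes this value; (-1.0,) minimizes
    return (F(*individual),)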

One final note: I made my code multicore using the multiprocessing library (only two lines of code, plus changing the map function you use!). More details here: https://deap.readthedocs.org/en/default/tutorials/distribution.html



Code (read my comments for a more detailed explanation):

import random
import time

from deap import base
from deap import creator
from deap import tools

start_clock = time.perf_counter()  # time.clock() was removed in Python 3.8


NGEN =  100  # number of generations to run evolution for
pop_size = 10000 # number of individuals in the population. this is the number of points in the N-dimensional space you start with
CXPB  = 0.5 # probability of cross-over (reproduction) replacing individuals in the population with their offspring
MUTPB = 0.2 # probability of mutation
mutation_inside = 0.05 # probability of mutation within an individual
num_cores = 6
N = 8 # the number of variables you are optimizing over. you can limit the range (and distribution) of each of them by limiting their possible values at initialization and mutation.


def eval_Individual(individual):
    # this code runs on your individual and outputs the 'fitness' of the individual
    # (DEAP expects the fitness as a tuple, even for a single objective)
    ...

def mutate_inputs(individual, indpb):
    # my own mutation function: takes an individual and changes each element
    # of the tuple with probability indpb. there are many great built-in
    # mutation functions as well
    ...

def random_initialization():
    # generates one of the N values of an individual; tools.initRepeat below
    # calls this N times to build each individual
    ...

creator.create("FitnessMax", base.Fitness, weights=(-1.0,)) # negative if trying to minimize mean, positive if trying to maximize sharpe. can be a tuple if you are trying to maximize/minimize over several outputs at the same time e.g. maximize mean, minimize std for fitness function that returns (mean, std) would need you to use (1.0, -1.0)
creator.create("Individual", list, fitness=creator.FitnessMax)

toolbox = base.Toolbox()
# Attribute generator
toolbox.register("attr_floatzzz", random_initialization) # i call it attr_floatzzz to make sure you know you can call it whatever you want.
# Structure initializers
toolbox.register("individual", tools.initRepeat, creator.Individual, 
    toolbox.attr_floatzzz, N) # N is the number of variables in your individual e.g [.5,.5,.5,.5,.1,100] that get 
# fed to your fitness function evalOneMax
toolbox.register("population", tools.initRepeat, list, toolbox.individual)

import multiprocessing as mp

pool = mp.Pool(processes=num_cores)
toolbox.register("map", pool.map) # these 2 lines allow you to run the computation multicore. You will need to change the map functions everywhere to toolbox.map to tell the algorithm to use a multicored map

# Operator registering
toolbox.register("evaluate", eval_Inidividual)
toolbox.register("mate", tools.cxTwoPoint)
toolbox.register("mutate", mutate_inputs, indpb = mutation_inside)
toolbox.register("select", tools.selTournament, tournsize=3)


def main():
#     random.seed(64)

    pop = toolbox.population(n=pop_size) # these are the different individuals in this population, 
                                    # each is a random combination of the N variables

    print("Start of evolution")

    # Evaluate the entire population
    fitnesses = list(toolbox.map(toolbox.evaluate, pop))
    for ind, fit in zip(pop, fitnesses):
        ind.fitness.values = fit   # assign each individual its computed fitness (a tuple)

#     print("  Evaluated %i individuals" % len(pop))

    # Begin the evolution
    for g in range(NGEN):
        print("-- Generation %i --" % g)

        # Select the next generation individuals
        offspring = toolbox.select(pop, len(pop)) # tournament selection: biased toward the fitter individuals in the population
        # Clone the selected individuals
        offspring = list(toolbox.map(toolbox.clone, offspring)) # ensures we don't hold references to the originals but completely independent copies

        # Apply crossover and mutation on the offspring
        for child1, child2 in zip(offspring[::2], offspring[1::2]): #this takes all the odd-indexed and even-indexed pairs child1, child2 and mates them
            if random.random() < CXPB:
                toolbox.mate(child1, child2)
                del child1.fitness.values
                del child2.fitness.values

        for mutant in offspring:
            if random.random() < MUTPB:
                toolbox.mutate(mutant)
                del mutant.fitness.values

        # Evaluate the individuals with an invalid fitness
        invalid_ind = [ind for ind in offspring if not ind.fitness.valid]
        fitnesses = toolbox.map(toolbox.evaluate, invalid_ind)
        for ind, fit in zip(invalid_ind, fitnesses):
            ind.fitness.values = fit

#         print("  Evaluated %i individuals" % len(invalid_ind))

        # The population is entirely replaced by the offspring
        pop[:] = offspring

        # Gather all the fitnesses in one list and print the stats
        fits = [ind.fitness.values[0] for ind in pop]

#         length = len(pop)
#         mean = sum(fits) / length
#         sum2 = sum(x*x for x in fits)
#         std = abs(sum2 / length - mean**2)**0.5

        print("  Min %s" % min(fits))
#         print("  Max %s" % max(fits))
#         print("  Avg %s" % mean)
#         print("  Std %s" % std)

    print("-- End of (successful) evolution --")

    best_ind = tools.selBest(pop, 1)[0]
    print("Best individual is %s with fitness %s" % (best_ind, 
        best_ind.fitness.values[0]))

    done = time.perf_counter() - start_clock # the timer doesn't work on the multicore version. I have no idea how to make it work :)
    print("time taken: ", done, "seconds")


if __name__ == "__main__":
    main()


P.S. The timer doesn't work on the multicore version. I don't know how to make it work :)



For a non-analytic function, you can explore the parameter space with a genetic algorithm or similar evolutionary computation, then find the peaks or "hills" of the resulting space to locate a solution that maximizes your function. I would suggest using a library rather than writing this yourself; DEAP looks quite promising.
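A minimal sketch of that two-stage idea, with plain random sampling standing in for the evolutionary stage (F and the 0.1 grid are taken from the question; a GA library like DEAP would do the exploration far better):

import random

GRID = [i / 10 for i in range(11)]

# stage 1: explore -- sample the space broadly
samples = [tuple(random.choice(GRID) for _ in range(4)) for _ in range(100)]
scored = [(F(*s), s) for s in samples]
best_f, best = max(scored)

# stage 2: climb the local "hill" by trying +/- 0.1 moves on each variable
improved = True
while improved:
    improved = False
    for i in range(4):
        for step in (-0.1, 0.1):
            cand = list(best)
            cand[i] = round(cand[i] + step, 1)
            if 0.0 <= cand[i] <= 1.0:
                f = F(*cand)
                if f > best_f:
                    best, best_f, improved = tuple(cand), f, True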





You have framed your problem well; it is, as you suspect, a state-space search.

I don't know of libraries for this in any language other than Prolog (where search is effectively the language's own solution engine), but one of the most useful state-space search algorithms is A* ("A-star"), a form of heuristic search. Your problem looks like it would yield to a close cousin of A* called greedy best-first search.

Basically, you start with some set of parameters, call F on those parameters, and then tweak each parameter a little to see how it changes F. You head "uphill", adopting the tweak that increases F the most, and potentially queue the "other" paths to explore later. This greedily moves you toward the "top of the hill", a local maximum. Once you reach a local maximum, you can restart with some random combination of parameters. You can even use something like simulated annealing to reduce the size of your tweaks over time: search very chaotically at first, then settle down as you learn about the unknown problem landscape.

The only way to guarantee an optimal result is an exhaustive search such as BFS, but there are many good ways to make it very likely that you will get an optimal result. Which one works best depends on F: the hill climbing presented here does well if the mapping from inputs to outputs is at least close to continuous, with only small discontinuities.
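A rough sketch of greedy best-first search over the discrete grid from the question (F, the grid, and the expansion budget are assumptions, not a definitive implementation):

import heapq
import itertools
import random

GRID = [i / 10 for i in range(11)]   # discrete values from the question
tie = itertools.count()              # tie-breaker so the heap never compares points

def neighbors(point):
    # every point reachable by tweaking one parameter by +/- 0.1
    for i in range(len(point)):
        for step in (-0.1, 0.1):
            v = round(point[i] + step, 1)
            if 0.0 <= v <= 1.0:
                yield point[:i] + (v,) + point[i + 1:]

start = tuple(random.choice(GRID) for _ in range(4))
frontier = [(-F(*start), next(tie), start)]   # max-heap via negated F
seen = {start}
best, best_f = start, -frontier[0][0]

for _ in range(50):  # expansion budget; each expansion calls F on up to 2*N neighbors
    if not frontier:
        break
    neg_f, _, point = heapq.heappop(frontier)  # most promising point so far
    if -neg_f > best_f:
        best, best_f = point, -neg_f
    for nb in neighbors(point):
        if nb not in seen:
            seen.add(nb)
            heapq.heappush(frontier, (-F(*nb), next(tie), nb))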
