Friday, June 10, 2011

Vectorizing Scripts for Improved Performance: Faster Fitting an Ellipse to Noisy Data

ExampleEllipse[4]

Problem Statement:

The optimization problem to fit an ellipse is too slow.

Discussion:

The easiest thing to improve performance in a script is to try compiling it using a tool like psyco or py2exe. Beyond that, the time critical portions of the code can be written in C/C++ and embedded into the script using a library like scipy.weave or a tool like Cython.

If a Python script has elements which are naturally parallelizable, then there are additional approaches to improving the script’s performance. One approach is to break the code up and execute it among several CPU cores using a library like the multiprocessing library or parallel python. Another approach to try taking advantage of the CPU and GPU hardware using PyOpenCL. Finally, the script can be vectorized, where all of the equations are expressed as vector operations.

Since the objective in the Ellipse fitting problem is easily decomposed into a set of parallel equations, the solution considered here is the use of numpy and vectorization of the code.

Solution:

The original problem, presented previously, is broken into a couple of modules. The first module generates the ellipse data used in the fitting exercise.

ellipse.py

"""
Module to generate noisy data samples about an ellipse. 
"""
from numpy import *
from numpy import linalg as LA
from random import uniform,normalvariate

def generate_noisy_pts(a,foci1,foci2,
                    num_pts=200, angles=None,
                    x_noise = None, y_noise=None):

    '''
       Generate points for an ellipse given
          the foci, and
          the distance to the intersection of the minor axis and ellipse.
    
       Optionally, 
          the number of points can be specified,
          the angles for the points wrt to the centroid of the ellipse, and 
          a noise offset for each point in the x and y axis.
    '''
    c = (1/2.0)*LA.norm(foci1-foci2)
    b = sqrt(a**2-c**2)
    x1 = foci1[0]
    y1 = foci1[1]
    x2 = foci2[0]
    y2 = foci2[1]
    if angles is None:
        t = arange(0,2*pi,2*pi/float(num_pts))
    else:
        t = array(angles)
            
    ellipse_x = (x1+x2)/2 +(x2-x1)/(2*c)*a*cos(t) - (y2-y1)/(2*c)*b*sin(t)
    ellipse_y = (y1+y2)/2 +(y2-y1)/(2*c)*a*cos(t) + (x2-x1)/(2*c)*b*sin(t)
    try:
        # try adding noise to the ellipse points
        ellipse_x = ellipse_x + x_noise
        ellipse_y = ellipse_y + y_noise
    except TypeError:
        pass
    return (ellipse_x,ellipse_y)

def generate_point_list(num_samples,a_ref,foci1,foci2,sigma=0.2):    
    # generate list of noisy samples on the ellipse
    angles = [uniform(-pi,pi) for i in range(0,num_samples)]
    sigma = 0.2
    x_noise = [normalvariate(0,sigma) for t in angles]
    y_noise = [normalvariate(0,sigma) for t in angles]
    x_list,y_list = generate_noisy_pts(a_ref,foci1,foci2,
                                    angles  = angles,
                                    x_noise = x_noise,
                                    y_noise = y_noise)
    point_list = [array([x,y]) for x,y in zip(x_list,y_list)]    
    return point_list

    
if __name__=='__main__':
    import matplotlib.pylab as plt
    
    # define the foci locations
    foci1 = array([2,-1])
    foci2 = array([-2,1])
    # define distance from foci to ellipse circumference
    a_ref = 2.5


    num_pts = 200
    point_list = generate_point_list(num_pts,a_ref,foci1,foci2)
    
    plt.figure()
    x,y = zip(*point_list)
    plt.plot(x,y,'.b')
    plt.show()

The next two modules define the objective function in two different ways. In the first, the code calculates the objective function using scalar math. In the second, the objective function is calculated using numpy vector math.

The first module, objective_scalar.py, uses a class to define the objective function. This is done for several reasons. The first is that the class allows the parameters, the values which will not be changed by the optimization search, to be bound to the objective without a need to pass them through the optimizer. Additionally, the __init__ and __del__ methods in the class provide a convenient place to allocate special purpose services and hardware which may be used when evaluating the objective. To simplify the use of the class, the __call__ method is used so the instance of the class looks like a function to the optimizer.

objective_scalar.py

'''
This module contains an objective function and logic to determine the starting 
    point for the ellipse fitting problem.

This objective is coded using operations on scalar quantities.

'''

from numpy import *
from numpy import linalg as LA
    
class Objective(object):
    def __init__(self,parameters):
        # setup parameters which will be used later
        self._p = parameters['point_list']
        self._sigma = parameters.get('sigma',0.2)
        self.x0 = self._find_x0()
        self._ahat_max = self.x0[0]
        

    def _find_x0(self):
        '''
        Determine the initial value, x0, for the optimization problem.
        '''
        points = array(self._p)
        x_list,y_list = zip(*points)
        # find x mean
        x_mean = array(x_list).mean()
        # find y mean
        y_mean = array(y_list).mean()
        # find point farthest away from mean
        center = array([x_mean,y_mean])
        distances = zeros((len(x_list),1))
        for i,point in enumerate(points):
            distances[i,0]=LA.norm(point-center)
        ind = where(distances==distances.max())
        max_pt = points[ind[0],:][0]
        # find point between mean and max point
        foci1 = (max_pt+center)/2.0
        # find point opposite from 
        foci2 = 2*center - max_pt
        return [distances.max(), foci1[0],foci1[1],foci2[0],foci2[1]]

    def __call__(self,x):
        '''
        Calculate the objective cost in the optimization problem using scalar
           equations which are summed.
        '''
        
        point_list = self._p
        
        foci1 = array([x[1],x[2]])
        foci2 = array([x[3],x[4]])
        a     = x[0]
        n = float(len(point_list))
        lambda_   = 0.1
        sigma    = self._sigma
        ahat_max = self._ahat_max
        total = 0
        for point in point_list:
            total += ((LA.norm(point-foci1,2)+LA.norm(point-foci2,2)-2*a)**2)/n
        total += lambda_*ahat_max*sigma*exp((a/ahat_max)**4)
        
        return total

    
if __name__=='__main__':
    import time
    from random import uniform, normalvariate, seed

    # local modules
    import ellipse
    
    ####################################################################

    # setup test conditions
    num_reps = 100    
    num_pts = 256
    precision = 'float'
    seed(1234567890)      # set the random generator to get repeatable results
    
    # setup the reference ellipse
    
    # define the foci locations
    foci1_ref = array([2,-1])
    foci2_ref = array([-2,1])
    # define distance from foci to ellipse circumference
    a_ref = 2.5
    
    point_list = ellipse.generate_point_list(num_pts,a_ref,foci1_ref,foci2_ref)
    
    parameters = { "point_list" : point_list }

    # test the function on an NVIDIA card
    t0 = time.time()
    my_objective = Objective(parameters)
    x0 = my_objective.x0
    t1 = time.time()
    for i in range(0,num_reps):
        y  = my_objective(x0)    
    t2 = time.time()
    
    print ''
    print 'Initialization took %f sec' % (t1-t0)
    print 'Execution took %f sec' % (t2-t1)
    print 'Executed %i times.' % (num_reps)
    print ''

    

The next module vectorizes the codes for faster execution. To simplify the coding, the vectorized objective inherits the interfaces and initialization from the scalar objective code module.

objective_vectorized.py

'''
This module contains an objective function for the ellipse fitting problem.
The objective is coded using vectorized operations.

'''

from numpy import *
from numpy import linalg as LA
import objective_scalar
    
class Objective(objective_scalar.Objective):

    def __call__(self,x):
        '''
        Calculate the objective cost in the optimization problem using
           vectorized equations.
        '''
        
        point_list = self._p
        foci1 = array([x[1],x[2]])
        foci2 = array([x[3],x[4]])
        a     = x[0]
        n = float(len(point_list))
        _lambda = 0.1
        sigma    = self._sigma
        ahat_max = self._ahat_max
           
        pt_diff1 = point_list - foci1
        pt_diff2 = point_list - foci2
     
        x_f1_diff = pt_diff1[:,0]
        x_f2_diff = pt_diff2[:,0]
        y_f1_diff = pt_diff1[:,1]
        y_f2_diff = pt_diff2[:,1]
     
        x_f1_diff_sq = power(x_f1_diff,2)   
        x_f2_diff_sq = power(x_f2_diff,2)   
        y_f1_diff_sq = power(y_f1_diff,2)   
        y_f2_diff_sq = power(y_f2_diff,2)
        
        norm_pt_to_f1 = power(x_f1_diff_sq+y_f1_diff_sq,0.5)
        norm_pt_to_f2 = power(x_f2_diff_sq+y_f2_diff_sq,0.5)
        
        temp = power(norm_pt_to_f1+norm_pt_to_f2-2*a,2)/n
        total = sum(temp)
        total += _lambda*ahat_max*sigma*exp((a/ahat_max)**4)
        
        return total

    
if __name__=='__main__':
    import time
    from random import seed

    # local modules
    import ellipse
    
    ####################################################################

    # setup test conditions
    num_reps = 100    
    num_pts = 256
    precision = 'float'
    seed(1234567890)      # set the random generator to get repeatable results
    
    # setup the reference ellipse
    
    # define the foci locations
    foci1_ref = array([2,-1])
    foci2_ref = array([-2,1])
    # define distance from foci to ellipse circumference
    a_ref = 2.5
    
    point_list = ellipse.generate_point_list(num_pts,a_ref,foci1_ref,foci2_ref)
    
    parameters = { "point_list" : point_list }

    # test the function on an NVIDIA card
    t0 = time.time()
    my_objective = Objective(parameters)
    x0 = my_objective.x0
    t1 = time.time()
    for i in range(0,num_reps):
        y  = my_objective(x0)    
    t2 = time.time()
    
    print ''
    print 'Initialization took %f sec' % (t1-t0)
    print 'Execution took %f sec' % (t2-t1)
    print 'Executed %i times.' % (num_reps)
    print ''
    

 

Executing objective_scalar.py and objective_vectorized.py will provide some sense of the difference in speed between the two approach. However, it is a good idea to measure the performance under different conditions. The next script, main.py, evaluates the implementation of the ellipse fit objective when different numbers of points are available for the fit. It also evaluates how the time to solve the ellipse fit problem changes between the scalar and the vectorized objectives when the number of points changes. To reduce the variation due to changes in the data points, the random seed is set to the same value each time points are generated.

main.py

'''
This module contains an objective function for the ellipse fitting problem.
The objective is coded using vectorized operations.

'''

__author__ = 'Ed Tate'
__email__  = 'edtate<at>gmail-dot-com'
__website__ = 'exnumerus.blogspot.com'
__license__ = 'Creative Commons Attribute By - http://creativecommons.org/licenses/by/3.0/us/'''


import time
from random import uniform, normalvariate, seed
from numpy import *
import pylab as plt
from openopt import NLP

# local modules
import ellipse
import objective_scalar
import objective_vectorized

####################################################################

# setup test conditions
num_reps = 100   

seed(1234567890)      # set the random generator to get repeatable results

# setup the reference ellipse

# define the foci locations
foci1_ref = array([2,-1])
foci2_ref = array([-2,1])
# define distance from foci to ellipse circumference
a_ref = 2.5


######################################################
# measure the timing to setup and call the objective function

num_pts_list = [10,100,250,500,1000,2500,5000,10000,50000]

objective_functions = [objective_scalar,objective_vectorized]
setup_times = []
call_times  = []
for num_pts in num_pts_list:
    seed(1234567890)      # set the random generator to get repeatable results    
    point_list = ellipse.generate_point_list(num_pts,a_ref,foci1_ref,foci2_ref)
    
    parameters = { "point_list" : point_list }
    
    test_setup_times = []
    test_call_times = []
    for objective_function in objective_functions:
        t0 = time.time()
        my_objective = objective_function.Objective(parameters)
        x0 = my_objective.x0
        t1 = time.time()
        for i in range(0,num_reps):
            y  = my_objective(x0)    
        t2 = time.time()
        test_setup_times.append(t1-t0)
        test_call_times.append(t2-t1)
        
    setup_times.append(test_setup_times)
    call_times.append(test_call_times)
    print 'Evaluated %i points' % num_pts

# plot the results
plt.figure()
plt.loglog(num_pts_list,call_times)
plt.legend(['scalar','vectorized'],loc='upper left')
plt.xlabel('Number of points')
plt.ylabel('Execution Time [sec]')
plt.title('Times for %i calls to objective function' % num_reps)
plt.grid(True)

#############################################################
# test the performance when used in an optimizer

num_pts_list = [10,100,250,500,1000,2500,5000]

objective_functions = [objective_scalar,objective_vectorized]
setup_times = []
call_times  = []
for num_pts in num_pts_list:
    seed(1234567890)      # set the random generator to get repeatable results    
    point_list = ellipse.generate_point_list(num_pts,a_ref,foci1_ref,foci2_ref)
    
    parameters = { "point_list" : point_list }
    
    test_setup_times = []
    test_call_times = []
    for objective_function in objective_functions:
        t0 = time.time()
        my_objective = objective_function.Objective(parameters)
        x0 = my_objective.x0
        p_vectorized = NLP(my_objective, x0, iprint = -1)
        t1 = time.time()
        opt_result = p_vectorized.solve('ralg')
        t2 = time.time()
        test_setup_times.append(t1-t0)
        test_call_times.append(t2-t1)
        
        print opt_result.xf
        
    setup_times.append(test_setup_times)
    call_times.append(test_call_times)
    print 'Evaluated %i points' % num_pts
    print

# plot the results
plt.figure()
plt.loglog(num_pts_list,call_times)
plt.legend(['scalar','vectorized'],loc='upper left')
plt.xlabel('Number of points')
plt.ylabel('Execution Time [sec]')
plt.title('Times to solve the ellipse fit problem')
plt.grid(True)

plt.show()

Testing Results and Conclusions

To easily see the results which cover a large range of numbers, the results are plotted on a log-log graph. Comparing the scalar calculations against the vectorized calculations of the objective function shows almost an order of magnitude improvement in computing time.

objective call times

 

When the vectorized objective is used in the ellipse fit problem, the performance improvement is almost an order of magnitude. For a small number of points, the overhead from the optimization routine is significant and reduces the improvement (as should be expected).

Ellipse fit times

 

Test Conditions:

This work is licensed under a Creative Commons Attribution By license.

No comments:

Post a Comment