Saturday, July 31, 2010

How to start Open Office (with UNO services) from an external copy of Python

Problem Statement:

Demonstrate how to start Open Office from an external installation of Python so the pyuno library can be used to talk to the UNO interfaces in Open Office.

Target Application:

The purpose of this code snippet is to test a feature which automates Open Office from an external copy of Python (not the one shipped with Open Office).

Discussion:

There are several ways to start a process external to Python. One of the easiest is to use the  os module’s system function. This allows a command to be passed to a shell, just like it was typed at the command line. This function starts a shell and passes a string to that shell, then return execution to python once the new shell terminates.

To start Open Office so that pyuno can automate it, some command line arguments must be passed.  The following snippet illustrates how to use os.system(…) to start Open Office so pyuno can talk to it. While os.system is an easy way to start a process, it blocks execution until Open Office is closed.

import os
workingDir = 'C:\\Program Files (x86)\\OpenOffice.org 3\\program'
cmd = "\"\"" + workingDir + '\\soffice\" '+'"-accept=socket,host=localhost,port=8100;urp;StarOffice.NamingService"'
# blocking call 
os.system(cmd) 
# this call block execution until open office terminated 
# do something after Open Office closes 

To allow a Python script to continue to execute while Open Office is running, the subprocess module offers more powerful ways to launch and control the Open Office process. The following code snippet illustrates how to start Open Office as a listener for UNO calls using subprocess.Popen(…) .

import subprocess
workingDir = 'C:\\Program Files (x86)\\OpenOffice.org 3\\program'
cmd  = workingDir + '\\soffice.exe'
opts = '"-accept=socket,host=localhost,port=8100;urp;StarOffice.NamingService"'
OOprocess = subprocess.Popen([cmd,opts])   # this call allows continued execution
# do more stuff while Open Office runs

An important difference between using subprocess.Popen(…) and os.system(…) is that subprocess.Popen(…) eliminates the need to add double quotes around path names with spaces in Windows. In fact, if you add the double quotes you will get “WindowsError: [Error 5] Access is denied” because the path and file name are not properly interpreted.

 Test Environment:

  • PythonXY 2.6.5.1
  • Open Office 3.2.0
  • Windows 7

References:

Monday, July 26, 2010

How to prevent integer division errors in Python 2.x

When doing scientific computing, the usual assumption is that all numbers are floating point numbers. In Python 2.x, if two numbers are typed as integers, the result of division is the floor of the result. For example, dividing integer 3 by 4 results in 0 rather than 0.75. It is very easy to let subtle coding errors creep into a program because of this behavior.

>>> 3/4
0
>>> 3./4
0.75

What can make a bug really subtle is if a routine is defined which uses division. Then later a call is made with integers rather than floats.

>>> def testFunction(x,y):
...    return x/y
...
>>> testFunction(3,4)
0
>>> testFunction(3.,4)
0.75

One way to avoid this issue is to guard all divisions in your code with a float conversion. However, this required a lot of diligence to maintain. A potentially easier solution is to import the division definitions from the __future__ module. Once this module is imported, it redefined how the ‘/’ operator behaves and eliminates this potential source of errors.

>>> from __future__ import division
>>> 3/4
0.75

Wednesday, July 21, 2010

Regression and Curve Fitting in Python – Pt 2

Weighted Curve Fitting.

ErrorBand

Introduction

When using least-squares linear regression, an assumption in typical  implementations is that the noise is Gaussian, white, and has the same statistics for all measurements. All of the solutions discussed in part 1 of this tutorial make this assumption including the polyfit function. This means that the regression will only work correctly is the measurement device always has the same statistics in its measurement errors. Sometimes, when data is collected, the noise statistics vary with each measurement. For example, this can happen when the background noise changes over time. Weighted least squares is a way to find fit a curve or find parameters when this occurs.

An Example

SCAN0899 (2)

Consider a simplified ballistic problem. Ignoring the effect of air resistance, the vertical position of a projective is governed by its initial position, initial velocity, and the force of gravity. Since air resistance is not considered, the measured altitude of projectile is a closed form function of time:

image

The noise term at the end of the equation is the error introduced by the measurement system. The noise, varies as a function of time and is represented by a normal distribution with time varying variance. For this problem, assume the noise gets worse with each subsequent sample. The mean of the error in each measurement is zero and has a standard deviation equal to 1 plus 4 times the number of seconds since the measurements started:

image

This might happen because the measurement hardware heats up with each sample, or the projectile moves away from the sensor. To visualize how this distribution of measurement errors looks, the measurements are taken many times. Each experiment is plotted over the previous experiments. The density of the measurements provides a visual feel for how the increase in error spreads out the measurements. This is illustrated here:

ErrorCloud The challenge in this kind of problem is using the knowledge of the model of the behavior (e.g. the ballistics) and the model of the noise to find an optimal fit. If a suboptimal method is used, the error in the fit is significantly greater than if an optimal method is used. A single comparison on a one data set is insufficient to see the benefits of one approach versus another. When a large number of experiments are performed where the true value is know and the estimated values are know, the distribution of the estimation errors due to different approaches is visible. The following graph shows how using weighted least-squares versus least-squares assuming constant weights improves on the distribution of errors.

ErrorDistributionThese graphs were generated from the following script:

from random import normalvariate
from pylab import *

class System(object):
    def __init__(self,p0=0.0,v0=10.0,a=-9.8):
        self.p0 = p0
        self.v0 = v0
        self.a = a

    def position(self,time):
        return self.p0+self.v0*t+0.5*self.a*t**2
        
class Sensor(object):
    def __init__(self,system=System(),errorStd=0.0):
        self.system = system
        self.errorStd = errorStd
    def error(self,t=0):
        try:
            std = self.errorStd(t=t)
            err = normalvariate(0,std)
        except TypeError:
            err = normalvariate(0,self.errorStd)
        return err
        
class PositionSensor(Sensor):
    def measure(self,t):
        err = Sensor.error(self,t=t)
        return self.system.position(t)+err
        
    def errorStd(self,t):
        try:
           return self.errStd(t)
        except TypeError:
            return self.errStd


def weightedPolyFit(xList,yList,sigmaList,order=1,crossesAtZero=False):
    '''fit the data using a weighted least squares for a polynomial model
        xList is a list of input values
        yList is a list of output values
        sigmaList is a list with the standard deviation for each y value
        order defines the order of the model, 
            order = 1 -> linear model
            order = 2 -> quadratic model
        crossesAtZero specifies whether the polynomial must be equal to zer
            at x = 0
    '''
    fList = [(lambda x,n=n: x**n) for n in range(order,-1,-1)]
    if crossesAtZero: 
        # eliminate the first column so the model corsses at 0
        del fList[0]
    # build row for each element in y
    bList = []
    A_List = []
    for (thisX,thisY) in zip(xList,yList):
        bList.append(thisY)
        A_Row = [f(thisX) for f in fList]
        A_List.append(A_Row)
    W = diag([1.0 /sigma for sigma in sigmaList])
    b = matrix(bList).T
    A = matrix(A_List)
    b2 = W*b
    A2 = W*A
    w = inv(A2.T*A2)*A2.T*b2
    return w.T.tolist()[0]


if __name__=='__main__':
    import random
    
    errStdFun = lambda t: 1 + 4.0*t
    
    random.seed(0)

    p0True = 0.0
    v0True = 10.0
    aTrue = 9.8

    s = System()
    p = PositionSensor(s,errStdFun)
    tSamples = arange(0,2,0.025)
 
    pErr1List = []
    pErr2List = []
    vErr1List = []
    vErr2List = []
    aErr1List = []
    aErr2List = []
    
    for i in range(0,20000):
        measuredPosition = [p.measure(t) for t in tSamples]
        
        xList = tSamples
        yListMeasured = measuredPosition
        sigmaList = [p.errorStd(t) for t in tSamples]
        w =  weightedPolyFit(xList,yListMeasured,sigmaList,order=2)
        p0Est = w[2]
        v0Est = w[1]
        aEst  = w[0]*2
        pErr1List.append(p0Est-p0True)
        vErr1List.append(v0Est-v0True)
        aErr1List.append(aEst-aTrue)
        
        w2 =  polyfit(xList,yListMeasured,2)
        p0Est2 = w2[2]
        v0Est2 = w2[1]
        aEst2  = w2[0]*2
        pErr2List.append(p0Est2-p0True)
        vErr2List.append(v0Est2-v0True)
        aErr2List.append(aEst2-aTrue)

    figure(1)
    subplot(3,1,1)
    hist(pErr1List,50,normed=True)
    hist(pErr2List,50,normed=True,alpha=0.5)
    grid(True)
    title('Error in initial position estimate')
    
    
    subplot(3,1,2)
    hist(vErr1List,50,normed=True)
    hist(vErr2List,50,normed=True,alpha=0.5)
    grid(True)
    title('Error in initial velocity estimate')

    subplot(3,1,3)
    hist(aErr1List,50,normed=True,label='Weighted Least Sq Fit')
    hist(aErr2List,50,normed=True,label='Least Sq Fit',alpha=0.5)
    legend()
    grid(True)
    xlabel('error')
    title('Error in acceleration estimate')


    show() 

Some Theory: Finding the best fit

Recall from pt 1, that a least-squares fit is performed by reducing the equations to a matrix expression,

image then using the KKT conditions to find the weights which minimize the sum of errors squared.

In weighted least-squares, each error can have a different relative importance in the minimization problem. Usually, this weighting is equal to the inverse of the standard deviation of the error and each error is assumed to be uncorrelated. If these conditions are met, the relative weighting is in a diagonal matrix. Using the KKT conditions, this minimization problem,

image

is solved using

image 

The solved example

To solve this problem, a system class and a sensor class is created. The sensor class is subclassed into a position sensor. The system class provides a model of the true behavior of the system. The sensor class provides a model of the data measured by a sensor which detecting the true position of the system in the presence of noise.

The weightedPolyFit function, in the listing,  provides the logic to generate a weighted fit for parameters in a polynomial equation, which describes the position of the projectile.

 

This plot shows the true trajectory of the projectile, the measured positions, and the estimated positions.

Position and Measurements

The full code for the example

from random import normalvariate
from pylab import *

class System(object):
    def __init__(self,p0=0.0,v0=10.0,a=-9.8):
        self.p0 = p0
        self.v0 = v0
        self.a = a

    def position(self,time):
        return self.p0+self.v0*t+0.5*self.a*t**2
        
class Sensor(object):
    def __init__(self,system=System(),errorStd=0.0):
        self.system = system
        self.errorStd = errorStd
    def error(self,t=0):
        try:
            std = self.errorStd(t=t)
            err = normalvariate(0,std)
        except TypeError:
            err = normalvariate(0,self.errorStd)
        return err
        
class PositionSensor(Sensor):
    def measure(self,t):
        err = Sensor.error(self,t=t)
        return self.system.position(t)+err
        
    def errorStd(self,t):
        try:
           return self.errStd(t)
        except TypeError:
            return self.errStd


def weightedPolyFit(xList,yList,sigmaList,order=1,crossesAtZero=False):
    '''fit the data using a weighted least squares for a polynomial model
        xList is a list of input values
        yList is a list of output values
        sigmaList is a list with the standard deviation for each y value
        order defines the order of the model, 
            order = 1 -> linear model
            order = 2 -> quadratic model
        crossesAtZero specifies whether the polynomial must be equal to zer
            at x = 0
    '''
    fList = [(lambda x,n=n: x**n) for n in range(order,-1,-1)]
    if crossesAtZero: 
        # eliminate the first column so the model corsses at 0
        del fList[0]
    # build row for each element in y
    bList = []
    A_List = []
    for (thisX,thisY) in zip(xList,yList):
        bList.append(thisY)
        A_Row = [f(thisX) for f in fList]
        A_List.append(A_Row)
    W = diag([1.0 /sigma for sigma in sigmaList])
    b = matrix(bList).T
    A = matrix(A_List)
    b2 = W*b
    A2 = W*A
    w = inv(A2.T*A2)*A2.T*b2
    return w.T.tolist()[0]


if __name__=='__main__':
    import random
    
    errStdFun = lambda t: 1 + 4.0*t
    
    random.seed(0)
    s = System()
    p = PositionSensor(s,errStdFun)
    
    tSamples = arange(0,2,0.025)
    truePosition = [s.position(t) for t in tSamples]
    measuredPosition = [p.measure(t) for t in tSamples]
    
    xList = tSamples
    yListMeasured = measuredPosition
    sigmaList = [p.errorStd(t) for t in tSamples]
   
    w =  weightedPolyFit(xList,yListMeasured,sigmaList,order=2)
    p0Est = w[2]
    v0Est = w[1]
    aEst  = w[0]*2
    print '..p0Est = %f, v0Est=%f, aEst = %f' % (p0Est,v0Est,aEst)
    sEst = System(p0=p0Est,v0=v0Est,a=aEst)
    est1Position = [sEst.position(t) for t in tSamples] 
 
    figure(1)
    plot(tSamples,truePosition)
    plot(tSamples,est1Position,'--k')
    plot(tSamples,measuredPosition,'+r',markeredgewidth=2)
    grid(True)
    ylabel('Position [meters]')
    xlabel('Time [sec]')
    title('Position, Measurements, and Estimated Position')
    legend(['True Position','Estimated Position','Measured Position'])
    savefig('Position and Measurements.png')

    figure(2)
    err3Sigma = [2.0*p.errorStd(t) for t in tSamples]
    errorbar(tSamples,truePosition,err3Sigma)
    #plot(tSamples,truePosition)
    plot(tSamples,est1Position,'--k')
    plot(tSamples,measuredPosition,'+r',markeredgewidth=2)
    grid(True)
    ylabel('Position [meters]')
    xlabel('Time [sec]')
    title('2 sigma error band')
    savefig('ErrorBand.png')

    figure(3)
    for i in range(0,250):
        measuredPosition = [p.measure(t) for t in tSamples]
        plot(tSamples,measuredPosition,'xr',alpha=0.1,markeredgewidth=2)
    plot(tSamples,truePosition,'-b')
    upperLimit = [ 2.0*p.errorStd(t)+s.position(t) for t in tSamples]
    lowerLimit = [-2.0*p.errorStd(t)+s.position(t) for t in tSamples]
    plot(tSamples,upperLimit,'--k')
    plot(tSamples,lowerLimit,'--k')
    grid(True)
    ylabel('Position [meters]')
    xlabel('Time [sec]')
    title('True Position, Measurements, & 95% Bounds on Measurements')

    show()

 


See part 1 here.


All text is copyright © 2010, Ed Tate, All Rights Reserved.

All software and example codes are subject to the MIT License

Copyright (c) 2010, Ed Tate, Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Saturday, July 17, 2010

A simple example of how to add engineering units to numbers in Python

Problem Statement:
              When building numeric analyses, one source of errors is units. It is very common to merge data from multiple sources, where each source uses a different, and incompatible set of engineering units. Once an error like this is introduced, it can be difficult to find the error and correct it.
    This tutorial shows how to create a class which behaves like a floating point number, but carries a descriptive string which can be used for various purposes in a program. This is a very simple approach is provided to understand how to create a class which inherits from an immutable class in python.

Use Case:

The use case of this class is simple. It is intended to allow a unit to be attached to a floating point value when the value is passed between functions. However, the units will not be preserved through math operations. There are several libraries which will do all of this.

>>x = FloatWithUnits(3.1415,'miles')   

>>y = FloatWithUnits(2.7182,'km')  

>>print x

... 3.1415+0.0j miles

>>print y

... 2.7182+0.0j km

>>z = x*y

>>print z

... 8.5392253



Solution and Discussion:


   Any class which emulates numeric types needs to override many class methods. By inheriting from an existing type, like a float, all of these methods will be defined with default behavior.   For this problem, the challenge is to attach and engineering unit, while preserving all of the methods of a floating point number. The obvious approach to this problem is to create a class which inherits from floats.

   When inheriting from an immutable type, things are a little different than inheriting from other classes. The __new__ method must generally be overridden. Otherwise, the __new__ method will prevent an overridden __init__ method from executing.  Also, since we’d like the class to display properly, the __str__ and __repr__ methods are overridden. Here is the new class:

class FloatWithUnits(float):
    def __new__(cls,value,units=None):
        return float.__new__(cls,value)
        
    def __init__(self,value,units=None):
        self.units=units
        
    def __repr__(self):
        return 'FloatWithUnits('+str(self.real)+'+'+str(self.imag)+'j,"'+self.units+'")'
        
    def __str__(self):
        return str(self.real)+'+'+str(self.imag)+'j '+self.units

if __name__=='__main__':

    print FloatWithUnits(4.543)**2


    import unittest
                
    class Test_FloatWithUnits(unittest.TestCase):
        
        def setUp(self):
            pass
            
        def test_DefaultInstanciation(self):
            self.assertTrue(FloatWithUnits(1.45)==1.45)
            self.assertAlmostEqual(FloatWithUnits(1.45)*2,2*1.45)
            self.assertTrue(FloatWithUnits(-1.45)<0)
            self.assertTrue(FloatWithUnits(1.45)>0)
            
        def test_Units(self):
            x = FloatWithUnits(2.3,'miles')
            #print x.units
            self.assertTrue(x.units=='miles')
        
            
    #unittest.main()
    suite = unittest.TestLoader().loadTestsFromTestCase(Test_FloatWithUnits)
    unittest.TextTestRunner(verbosity=2).run(suite)

 

What doesn’t work and why.

   As important as it is to see a working example, seeing a nonworking example is useful. A naive approach to solving this problem is to inherit from float and simply override the __init__ method. This will fail for several reasons, but, the most perplexing error will be the related to the number of arguments during class creation. Try the following code snippet.

class FloatWithUnits(float):
    def __init__(self,value,units=None):
        super(FloatWithUnits,self).__init__()
        self.units=units
 
 

>>x = FloatWithUnits(2.3,'miles’)   

... TypeError: float() takes at most 1 argument (2 given)

    This exception occurs because the __new__ method for float only accepts one value. The inheritance of floats in this class brings in the float __new__ method, and the __init__ method never has a chance to run.


Related Libraries and Packages:

   There are several libraries which add engineering units to scalar and matrix math. Here are a few of them:


References:



All text is copyright © 2010, Ed Tate, All Rights Reserved.
All software and example codes are subject to the MIT License
Copyright (c) 2010, Ed Tate, Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.