mcmc

Source Code for Module mcmc

1 """Adaptive Markov-Chain Monte-Carlo algorithm. 2 This can be used to generate samples from a probability distribution, 3 or also as a simulated annealing algorithm for maximization. 4 This can be used as a script, from the command line, or (more likely) 5 it can be imported and its functions and classes can be used. 6 7 The central interfaces are the L{BootStepper} class, and within 8 it, the C{step()} method is used iteratively to take a Markov step. 9 10 To use it as a script, you create your own module that implements 11 a few functions, and run C{mcmc.py}. It will import your module 12 and call your functions repeatedly. 13 14 The algorithm is described in Kochanski and Rosner 2010, 15 and earlier versions have been used in ZZZ. 16 It evolved originally from amoeba_anneal (in Numerical Recipes, Press et al.). 17 The essential feature is that it keeps a large archive of previous positions 18 (possibly many times more than C{N} of them). It samples two positions 19 from the archive and subtracts them to generate candidate steps. 20 This has the nice property that when sampling from a multivariate 21 Gaussian distribution, the candidate steps match the distribution nicely. 22 23 It can be operated in two modes (or set to automatically switch). 24 One is optimization mode where it heads for the maximum of the probability 25 distribution. The other mode is sampling mode, where it 26 asymptotically follows a Markov sampling procedure and has the 27 proper statistical properties. 28 29 As a script, it takes the following arguments: 30 - C{-o fff} Specifies a log file for some basic logging. 31 32 - {C -py ddd/mm} Specifies the module to load and to optimize. 33 If the name contains a slash, it searches C{ddd} 34 instead of the python search path, looking for a module 35 named C{mm}. If no slash, then it looks up the module 36 in C{PYTHONPATH}. 37 38 - C{-NI NN} Sets the number of iterations. Required, unless the 39 module defines C{yourmod.NI}. 40 41 - C{--} Stops interpretation of the argument list. 42 The remainder is passed to yourmod.init(), if 43 it is defined, then to yourmod.start(), if that is 44 defined. 45 46 It uses the following functions and variables: 47 48 - C{yourmod.c} (required). 49 This is an object that is passed into C{yourmod.resid()} 50 or C{yourmod.logp()}. You can make it be anything you 51 want, as it is only touched by your functions. 52 Often, it contains data or parameters for these functions, 53 but it can be L{None} if you have no need for it. 54 55 - C{yourmod.start(arglist, c)} (Not required). 56 This is where you initialize the Monte-Carlo iteration. 57 If you define yourmod.start(), it is passed the arglist, 58 after the standard C{mcmc.py} flags and arguments are removed 59 from it. C{Yourmod.start()} 60 must return a list of one or more starting vectors, 61 (either a list of numpy arrays or a list of sequences 62 of floats). These starting vectors are used to seed the iteration. 63 64 If it is not defined, mcmc.py will read a list 65 of starting vectors in ascii format from the standard input 66 using L{_read}(). 67 68 - C{yourmod.init(ndim, ni, c, arglist)} (Not required). 69 This is primarily there to open a log file. 70 Init is passed the remainder of the scripts argument 71 list, after yourmod.start() looks at it. It can do anything 72 it wishes with the argument list. It can return an object or None. 73 74 If it returns something other than L{None}, it will be used 75 this way:: 76 77 x = yourmod.init(ndim, ni, yourmod.c, arglist) 78 x.add(parameters, iteration) 79 # add is there to accumulate averages of parameters 80 # or to log parameter vectors. 81 x.finish(sys.stdout) 82 # finish can be used to print a summary or close a log file. 83 84 C{Ndim} is the length of the parameter vector, 85 C{ni} is the number of iterations, C{c} is C{yourmod.c}. 86 87 - C{yourmod.resid(x, c)} (Either yourmod.logp() or yourmod.resid() is required.) 88 89 - C{yourmod.logp(x, c)} (Either C{yourmod.logp} or C{yourmod.resid} is required.) 90 These funtions are the thing which is to be optimized. 91 C{Yourmod.log} returns the logarithm (base e) of the probability 92 density at position C{x}. (It does not need to be normalized, so it really 93 only needs something proportional to the probability density.) 94 C{X} is a numpy array (a 1-dimensional vector), 95 and C{c} is an arbitrary object that you define. 96 97 C{Yourmod.logp} should return L{None} if x is a silly value. Either that, 98 or it can raise a L{NotGoodPosition} exception. Note that 99 the optimizer will tend to increase values of logp. It's a maximizer, 100 not a minimizer! 101 102 If you give yourmod.resid() instead, logp() will be calculated 103 as -0.5 times the sum of the squares of the residuals. It should 104 return a numpy vector. It may return None, or raise L{NotGoodPosition}, which will 105 cause the optimizer to treat the position as extremely bad. 106 107 - yourmod.log(result_of_init, i, p, w) (Not required). 108 This is basically a function to log your data. 109 No return value. It is called at every step and 110 passed the result of yourmod.init() (which might 111 be a file descriptor to write into). It is also passed the iteration 112 count, i, the current parameter vector, p, and w, which is reserved 113 for possible future use. 114 115 116 - yourmod.NI (required unless the -NI argument is given on the command line) 117 Specifies how many iterations. This is an integer. 118 119 - yourmod.STARTLOW (not required). Integer 0 or 1. 120 If true, the iteration is started from the largest value of logp 121 that is known. 122 123 124 - yourmod.V (Not required) 125 This specifies the covariance matrix that is used to hop from one 126 test point to another. It is 127 a function that takes the result of yourmod.start() and returns 128 a Numeric/numarray 2-dimensional square matrix, 129 130 If it is not specified, mcmc will use some crude approximation 131 and will probably start more slowly, but should work OK. 132 If it is not specified and yourmod.start() returns only 133 one starting position, then it will just use an identity matrix 134 as the covariance matrix, and it might still work, but 135 don't expect too much. 136 """ 137 from __future__ import with_statement 138 139 import sys 140 141 import numpy 142 143 from gmisclib import die 144 from gmisclib import mcmc 145 from gmisclib import gpkmisc 146 from gmisclib import g_implements 147 from gmisclib import multivariate_normal as MVN 148 149 Debug = 0 150 MEMORYS_WORTH_OF_PARAMETERS = 1e8 151 152 153 154 155

156 -def logp_from_resid(x, c, resid_fcn):

157 r = resid_fcn(x, c) 158 if r is None: 159 raise mcmc.NotGoodPosition 160 return -numpy.sum(r**2)/2.0

161 162 163

164 -def _print(fd, prefix, p):

165 fd.writelines( ' '.join( [prefix] + [ '%g'%pj for pj in p ] ) ) 166 fd.writelines('\n')

167 168

169 -def _read(fd):

170 o = [] 171 for t in fd: 172 if t.startswith('#'): 173 continue 174 t = t.strip() 175 if t == '': 176 continue 177 o.append( numpy.array( [float(x) for x in t.split() ], 178 numpy.float) 179 ) 180 np = o[-1].shape[0] 181 # die.info("np=%d" % np) 182 return o[ max(0, len(o)-2*np*np) : ]

183 184 185 186

187 -def go(argv, theStepper):

188 """Run the optimization, controlled by command line flags.""" 189 global Debug 190 import load_mod 191 if len(argv) <= 1: 192 print __doc__ 193 return 194 python = None 195 out = None 196 NI = None 197 arglist = argv[1:] 198 while len(arglist) > 0 and arglist[0][0] == '-': 199 arg = arglist.pop(0) 200 if arg == '-o': 201 out = arglist.pop(0) 202 elif arg == '-py': # path/module 203 python = arglist.pop(0) 204 elif arg == '-NI': 205 NI = int(arglist.pop(0)) 206 elif arg == '-debug': 207 Debug += 1 208 elif arg == '--': 209 break 210 else: 211 die.die("Unrecognized flag: %s" % arg) 212 213 assert python is not None, "Need to use the -py flag." 214 215 mod = load_mod.load_named_module(python, use_sys_path='/' in python) 216 print "# mod= %s" % str(mod) 217 218 if out: 219 logfd = open(out, "w") 220 else: 221 logfd = None 222 223 mod_c = getattr(mod, "c", None) 224 if not hasattr(mod, "start"): 225 start = _read(sys.stdin) 226 else: 227 start = mod.start(arglist, mod_c) 228 229 if hasattr(mod, 'logp'): 230 logPfcn = mod.logp 231 elif hasattr(mod, 'resid'): 232 logPfcn = lambda x, c, m=mod: logp_from_resid(x, c, m.resid) 233 else: 234 die.die("Can't find logp() or resid() function in module.") 235 236 if NI is None: 237 NI = mod.NI 238 print "# NI= %d" % NI 239 240 241 mod_c = getattr(mod, "c", None) #A kluge, in case mod.start() changed c. 242 if hasattr(mod, "V"): 243 V = numpy.asarray(mod.V(start, mod_c), numpy.float) 244 elif mcmc.start_is_list_a(start) and len(start)>1: 245 V = mcmc.diag_variance(start) 246 else: 247 V = numpy.identity(start[0].shape[0], numpy.float) 248 249 mod_c = getattr(mod, "c", None) #A kluge, in case mod.V() changed c. 250 if hasattr(mod, "init"): 251 logptr = mod.init(V.shape[0], NI, mod_c, arglist) 252 else: 253 logptr = def_logger(V.shape[0], NI, mod_c, arglist) 254 255 sort = getattr(mod, 'SORT_STRATEGY', mcmc.BootStepper.SSAUTO) 256 257 fixer = getattr(mod, 'fixer', None) 258 259 mod_c = getattr(mod, "c", None) #A kluge, in case mod.init() changed c. 260 sys.stdout.flush() 261 x = theStepper(logPfcn, start, V, c=mod_c, sort=sort, fixer=fixer) 262 263 sys.stdout.writelines('[P]\n') 264 for i in range(NI): 265 x.step() 266 p = x.prms() 267 sys.stdout.flush() 268 if logfd is not None: 269 logfd.writelines('# ' + x.status() + '\n') 270 _print(logfd, '%d'%i , p) 271 logfd.flush() 272 if logptr is not None: 273 logptr.add(p, i) 274 if logptr is not None: 275 logptr.finish(sys.stdout)

276 277

278 -class def_logger:

279 - def __init__(self, ndim, ni, c, arglist):

280 self.v = numpy.zeros((ndim,ndim), numpy.float) 281 self.n = 0 282 self.NI = ni 283 self.c = c

284

285 - def add(self, p, i):

286 if i >= self.NI/10: 287 self.v += numpy.outer(p, p) 288 self.n += 1

289

290 - def finish(self, stdout):

291 self.v /= float(self.n) 292 sys.stdout.writelines('[V]\n') 293 sys.stderr.writelines(str( self.v ) + "\n") 294 evalues, evec = numpy.linalg.eigh(self.v) 295 stdout.writelines('[Evals]\n') 296 stdout.writelines("%s\n" % str(evalues)) 297 stdout.writelines('[Evecs]\n') 298 stdout.writelines("%s\n" % str(evec))

299 300 301 302 303 304 305 if __name__ == '__main__': 306 try: 307 import psyco 308 psyco.full() 309 except ImportError: 310 pass 311 312 try: 313 go(sys.argv, mcmc.bootstepper) 314 except: 315 die.catch('Unexpected exception') 316 raise 317