Module mcmc
[frames] | no frames]

Source Code for Module mcmc

  1  """Adaptive Markov-Chain Monte-Carlo algorithm. 
  2  This can be used to generate samples from a probability distribution, 
  3  or also as a simulated annealing algorithm for maximization. 
  4  This can be used as a script, from the command line, or (more likely) 
  5  it can be imported and its functions and classes can be used. 
  6   
  7  The central interfaces are the L{BootStepper} class, and within 
  8  it, the C{step()} method is used iteratively to take a Markov step. 
  9   
 10  To use it as a script, you create your own module that implements 
 11  a few functions, and run C{mcmc.py}.  It will import your module 
 12  and call your functions repeatedly. 
 13   
 14  The algorithm is described in Kochanski and Rosner 2010, 
 15  and earlier versions have been used in ZZZ. 
 16  It evolved originally from amoeba_anneal (in Numerical Recipes, Press et al.). 
 17  The essential feature is that it keeps a large archive of previous positions 
 18  (possibly many times more than C{N} of them). It samples two positions 
 19  from the archive and subtracts them to generate candidate steps. 
 20  This has the nice property that when sampling from a multivariate 
 21  Gaussian distribution, the candidate steps match the distribution nicely. 
 22   
 23  It can be operated in two modes (or set to automatically switch). 
 24  One is optimization mode where it heads for the maximum of the probability 
 25  distribution.   The other mode is sampling mode, where it 
 26  asymptotically follows a Markov sampling procedure and has the 
 27  proper statistical properties. 
 28   
 29  As a script, it takes the following arguments: 
 30          - C{-o fff}     Specifies a log file for some basic logging. 
 31   
 32          - {C -py ddd/mm}        Specifies the module to load and to optimize. 
 33                  If the name contains a slash, it searches C{ddd} 
 34                  instead of the python search path, looking for a module 
 35                  named C{mm}.  If no slash, then it looks up the module 
 36                  in C{PYTHONPATH}. 
 37   
 38          - C{-NI NN}     Sets the number of iterations.  Required, unless the 
 39                  module defines C{yourmod.NI}. 
 40   
 41          - C{--} Stops interpretation of the argument list. 
 42                  The remainder is passed to yourmod.init(), if 
 43                  it is defined, then to yourmod.start(), if that is 
 44                  defined. 
 45   
 46  It uses the following functions and variables: 
 47   
 48          - C{yourmod.c}  (required). 
 49                  This is an object that is passed into C{yourmod.resid()} 
 50                  or C{yourmod.logp()}.  You can make it be anything you 
 51                  want, as it is only touched by your functions. 
 52                  Often, it contains data or parameters for these functions, 
 53                  but it can be L{None} if you have no need for it. 
 54   
 55          - C{yourmod.start(arglist, c)}  (Not required). 
 56                  This is where you initialize the Monte-Carlo iteration. 
 57                  If you define yourmod.start(), it is passed the arglist, 
 58                  after the standard C{mcmc.py} flags and arguments are removed 
 59                  from it.   C{Yourmod.start()} 
 60                  must return a list of one or more starting vectors, 
 61                  (either a list of numpy arrays or a list of sequences 
 62                  of floats).    These starting vectors are used to seed the iteration. 
 63                   
 64                  If it is not defined, mcmc.py will read a list 
 65                  of starting vectors in ascii format from the standard input 
 66                  using L{_read}(). 
 67                   
 68          - C{yourmod.init(ndim, ni, c, arglist)}   (Not required). 
 69                  This is primarily there to open a log file. 
 70                  Init is passed the remainder of the scripts argument 
 71                  list, after yourmod.start() looks at it.    It can do anything 
 72                  it wishes with the argument list.  It can return an object or None. 
 73                   
 74                  If it returns something other than L{None}, it will be used 
 75                  this way:: 
 76   
 77                  x = yourmod.init(ndim, ni, yourmod.c, arglist) 
 78                  x.add(parameters, iteration) 
 79                  # add is there to accumulate averages of parameters 
 80                  # or to log parameter vectors. 
 81                  x.finish(sys.stdout) 
 82                  # finish can be used to print a summary or close a log file. 
 83   
 84                  C{Ndim} is the length of the parameter vector, 
 85                  C{ni} is the number of iterations, C{c} is C{yourmod.c}. 
 86   
 87          - C{yourmod.resid(x, c)}  (Either yourmod.logp() or yourmod.resid() is required.) 
 88   
 89          - C{yourmod.logp(x, c)}  (Either C{yourmod.logp} or C{yourmod.resid} is required.) 
 90                  These funtions are the thing which is to be optimized. 
 91                  C{Yourmod.log} returns the logarithm (base e) of the probability 
 92                  density at position C{x}.  (It does not need to be normalized, so it really 
 93                  only needs something proportional to the probability density.) 
 94                  C{X} is a numpy array (a 1-dimensional vector), 
 95                  and C{c} is an arbitrary object that you define. 
 96                   
 97                  C{Yourmod.logp} should return L{None} if x is a silly value.  Either that, 
 98                  or it can raise a L{NotGoodPosition} exception.  Note that 
 99                  the optimizer will tend to increase values of logp.  It's a maximizer, 
100                  not a minimizer! 
101                   
102                  If you give yourmod.resid() instead, logp() will be calculated 
103                  as -0.5 times the sum of the squares of the residuals.   It should 
104                  return a numpy vector.  It may return None, or raise L{NotGoodPosition}, which will 
105                  cause the optimizer to treat the position as extremely bad. 
106   
107          - yourmod.log(result_of_init, i, p, w)   (Not required). 
108                  This is basically a function to log your data. 
109                  No return value.    It is called at every step and 
110                  passed the result of yourmod.init() (which might 
111                  be a file descriptor to write into).  It is also passed the iteration 
112                  count, i, the current parameter vector, p, and w, which is reserved 
113                  for possible future use. 
114   
115   
116          - yourmod.NI  (required unless the -NI argument is given on the command line) 
117                  Specifies how many iterations.  This is an integer. 
118   
119          - yourmod.STARTLOW (not required).  Integer 0 or 1. 
120                  If true, the iteration is started from the largest value of logp 
121                  that is known. 
122   
123   
124          - yourmod.V     (Not required) 
125                  This specifies the covariance matrix that is used to hop from one 
126                  test point to another.   It is 
127                  a function that takes the result of yourmod.start() and returns 
128                  a Numeric/numarray 2-dimensional square matrix, 
129   
130                  If it is not specified, mcmc will use some crude approximation 
131                  and will probably start more slowly, but should work OK. 
132                  If it is not specified and yourmod.start() returns only 
133                  one starting position, then it will just use an identity matrix 
134                  as the covariance matrix, and it might still work, but 
135                  don't expect too much. 
136  """ 
137  from __future__ import with_statement 
138   
139  import sys 
140   
141  import numpy 
142   
143  from gmisclib import die 
144  from gmisclib import mcmc 
145  from gmisclib import gpkmisc 
146  from gmisclib import g_implements 
147  from gmisclib import multivariate_normal as MVN 
148   
149  Debug = 0 
150  MEMORYS_WORTH_OF_PARAMETERS = 1e8 
151   
152   
153   
154   
155   
156 -def logp_from_resid(x, c, resid_fcn):
157 r = resid_fcn(x, c) 158 if r is None: 159 raise mcmc.NotGoodPosition 160 return -numpy.sum(r**2)/2.0
161 162 163
164 -def _print(fd, prefix, p):
165 fd.writelines( ' '.join( [prefix] + [ '%g'%pj for pj in p ] ) ) 166 fd.writelines('\n')
167 168
169 -def _read(fd):
170 o = [] 171 for t in fd: 172 if t.startswith('#'): 173 continue 174 t = t.strip() 175 if t == '': 176 continue 177 o.append( numpy.array( [float(x) for x in t.split() ], 178 numpy.float) 179 ) 180 np = o[-1].shape[0] 181 # die.info("np=%d" % np) 182 return o[ max(0, len(o)-2*np*np) : ]
183 184 185 186
187 -def go(argv, theStepper):
188 """Run the optimization, controlled by command line flags.""" 189 global Debug 190 import load_mod 191 if len(argv) <= 1: 192 print __doc__ 193 return 194 python = None 195 out = None 196 NI = None 197 arglist = argv[1:] 198 while len(arglist) > 0 and arglist[0][0] == '-': 199 arg = arglist.pop(0) 200 if arg == '-o': 201 out = arglist.pop(0) 202 elif arg == '-py': # path/module 203 python = arglist.pop(0) 204 elif arg == '-NI': 205 NI = int(arglist.pop(0)) 206 elif arg == '-debug': 207 Debug += 1 208 elif arg == '--': 209 break 210 else: 211 die.die("Unrecognized flag: %s" % arg) 212 213 assert python is not None, "Need to use the -py flag." 214 215 mod = load_mod.load_named_module(python, use_sys_path='/' in python) 216 print "# mod= %s" % str(mod) 217 218 if out: 219 logfd = open(out, "w") 220 else: 221 logfd = None 222 223 mod_c = getattr(mod, "c", None) 224 if not hasattr(mod, "start"): 225 start = _read(sys.stdin) 226 else: 227 start = mod.start(arglist, mod_c) 228 229 if hasattr(mod, 'logp'): 230 logPfcn = mod.logp 231 elif hasattr(mod, 'resid'): 232 logPfcn = lambda x, c, m=mod: logp_from_resid(x, c, m.resid) 233 else: 234 die.die("Can't find logp() or resid() function in module.") 235 236 if NI is None: 237 NI = mod.NI 238 print "# NI= %d" % NI 239 240 241 mod_c = getattr(mod, "c", None) #A kluge, in case mod.start() changed c. 242 if hasattr(mod, "V"): 243 V = numpy.asarray(mod.V(start, mod_c), numpy.float) 244 elif mcmc.start_is_list_a(start) and len(start)>1: 245 V = mcmc.diag_variance(start) 246 else: 247 V = numpy.identity(start[0].shape[0], numpy.float) 248 249 mod_c = getattr(mod, "c", None) #A kluge, in case mod.V() changed c. 250 if hasattr(mod, "init"): 251 logptr = mod.init(V.shape[0], NI, mod_c, arglist) 252 else: 253 logptr = def_logger(V.shape[0], NI, mod_c, arglist) 254 255 sort = getattr(mod, 'SORT_STRATEGY', mcmc.BootStepper.SSAUTO) 256 257 fixer = getattr(mod, 'fixer', None) 258 259 mod_c = getattr(mod, "c", None) #A kluge, in case mod.init() changed c. 260 sys.stdout.flush() 261 x = theStepper(logPfcn, start, V, c=mod_c, sort=sort, fixer=fixer) 262 263 sys.stdout.writelines('[P]\n') 264 for i in range(NI): 265 x.step() 266 p = x.prms() 267 sys.stdout.flush() 268 if logfd is not None: 269 logfd.writelines('# ' + x.status() + '\n') 270 _print(logfd, '%d'%i , p) 271 logfd.flush() 272 if logptr is not None: 273 logptr.add(p, i) 274 if logptr is not None: 275 logptr.finish(sys.stdout)
276 277
278 -class def_logger:
279 - def __init__(self, ndim, ni, c, arglist):
280 self.v = numpy.zeros((ndim,ndim), numpy.float) 281 self.n = 0 282 self.NI = ni 283 self.c = c
284
285 - def add(self, p, i):
286 if i >= self.NI/10: 287 self.v += numpy.outer(p, p) 288 self.n += 1
289
290 - def finish(self, stdout):
291 self.v /= float(self.n) 292 sys.stdout.writelines('[V]\n') 293 sys.stderr.writelines(str( self.v ) + "\n") 294 evalues, evec = numpy.linalg.eigh(self.v) 295 stdout.writelines('[Evals]\n') 296 stdout.writelines("%s\n" % str(evalues)) 297 stdout.writelines('[Evecs]\n') 298 stdout.writelines("%s\n" % str(evec))
299 300 301 302 303 304 305 if __name__ == '__main__': 306 try: 307 import psyco 308 psyco.full() 309 except ImportError: 310 pass 311 312 try: 313 go(sys.argv, mcmc.bootstepper) 314 except: 315 die.catch('Unexpected exception') 316 raise 317