
Back to approximate enumeration

Let's return to our first hack at choosing transition probabilities. We chose a step direction uniformly at random; if it was an immediate reversal we deleted the last step, and otherwise we tried to add it. That is, $\Pr(\varphi_n \to \varphi_{n+1}) = \Pr(\varphi_{n+1} \to \varphi_n) = \tfrac14$, and detailed balance gave us $\pi(\varphi_n)\cdot\tfrac14 = \pi(\varphi_{n+1})\cdot\tfrac14$ and so $\pi(\varphi_n) = \pi(\varphi_{n+1})$. That is — the probability of sampling any walk at all was equal. Hence the probability of sampling a walk of length $n$ was $\Pr(\text{length } n) \propto c_n$. Consequently, $\Pr(\text{length } n)/\Pr(\text{length } 0) = c_n/c_0 = c_n$. So — one (very crappy) way we could estimate $c_n$ would be to run our (very naive) sampling code and compare the probabilities that we sample a walk of length $n$ vs one of length $0$.

Of course, this is going to be very, very slow, since our code will need to run an exponentially long time to see one walk of length $n$ for every $c_n$ walks of length $0$. How can we improve this?

Well — let's look at our improved version of simple sampling — when we put in our Metropolis-Hastings step. Here we have $\Pr(\varphi_n \to \varphi_{n+1}) = \tfrac14 \cdot \tfrac1\mu$ and $\Pr(\varphi_{n+1} \to \varphi_n) = \tfrac14$, and so $\pi(\varphi_{n+1}) = \pi(\varphi_n)/\mu$. So the probability of sampling a walk of length $n$ was $\Pr(\text{length } n) \propto c_n \mu^{-n}$, and consequently $\Pr(\text{length } n)/\Pr(\text{length } 0) = c_n\mu^{-n}$. We saw that this was very roughly constant. Hence from a good estimate of $\mu$ we can run our code, estimate the relative probability of seeing a SAW of length $n$ compared to one of length $0$, and so get our estimate of $c_n$. We then went a bit further and used actual enumeration data to tune the transition probabilities to get a very flat histogram, giving $\Pr(\text{length } n) \approx \Pr(\text{length } m)$ for all $n, m$.
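As a reminder, here is a minimal sketch of that acceptance rule. This is my paraphrase of the earlier approach rather than its actual listing; it assumes the (approximate) square-lattice growth constant $\mu \approx 2.638$.

from random import random

mu = 2.638   # (approximate) growth constant of square-lattice SAWs

def accept_addition():
  # accept a proposed extension of the walk with probability 1/mu;
  # proposed deletions (immediate reversals) are always carried out
  return random() < 1.0 / mu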

But what I’d like to do now is turn this problem around. Rather than setting probabilities and observing the resulting histograms, I’d like to use the histograms to set the probabilities.

Transitions from a running histogram

First up — let $h_n$ be the number of walks of length $n$ sampled up to the current time step. This is clearly a function of the total number of samples the algorithm has produced — call that number $T$. Then we should have $h_n(T) \approx T \cdot \Pr(\text{length } n)$. Okay, we know that if we start out with our very simple naive approach ($\Pr(\varphi_n \to \varphi_{n+1}) = \Pr(\varphi_{n+1} \to \varphi_n) = \tfrac14$) then $\Pr(\text{length } n) \propto c_n$ and so we can estimate $c_n$ from our data as $c_n \approx h_n(T)/h_0(T)$.
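In code, that estimator is just a ratio of histogram buckets. A minimal sketch, assuming `counts` is the length-histogram accumulated by the naive sampler (the helper name is mine):

def estimate_cn(counts):
  # c_0 = 1, so the ratio of bucket n to bucket 0 estimates c_n directly
  if counts[0] == 0:
    raise ValueError("no samples of length 0 yet")
  return [counts[n] / counts[0] for n in range(len(counts))]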

We saw earlier that we get a nice flat histogram when we set $\Pr(\varphi_n \to \varphi_{n+1}) = \tfrac14 \cdot \tfrac{c_n}{c_{n+1}}$.

So in the absence of knowing the counts exactly, we can use our histogram estimates to set $\Pr(\varphi_n \to \varphi_{n+1}) = \tfrac14 \cdot \tfrac{h_n(T)}{h_{n+1}(T)}$. Ah — but this is (so far) garbage.

  • It will take an exponentially long time for the histogram buckets to settle to the right levels, since we need $h_n(T)/h_0(T) \approx c_n$, and
  • We only get $h_n(T) \propto c_n$ when we use our original naive probabilities — if our histogram is flat then our probabilities are wrong, and if our probabilities are right, then our histogram will be flat. Urgh.

The first one might be easy to fix — we could work with something like the logarithm of the number of samples — call this quantity $G_n$. We could set $\Pr(\varphi_n \to \varphi_{n+1}) = \tfrac14 \cdot e^{G_n - G_{n+1}}$, and each time we sample a configuration of length $n$ we increase $G_n$ by a constant. This avoids us having to take exponentially large numbers of samples to get the right ratios.

This might also sort out the second problem — by keeping $G_n$ separated from the actual histogram, if we do manage to choose the probabilities correctly (or converge to the correct ones), then a flat histogram means all our $G_n$'s will be increased by roughly the same amount, leaving our probabilities roughly unchanged.

Let's try this very simple $G$-sampler (okay, "$G$-sampler" is not the name used in the literature, but we'll shortly get to the right algorithm and then give it its proper name).

from random import random, randint
from math import exp
import matplotlib.pyplot as plt

N=71
S=10**6

correct=[1,4, 12, 36, 100, 284, 780, 2172, 5916, 16268, \
  44100, 120292, 324932, 881500, 2374444, 6416596, 17245332, \
    46466676, 124658732, 335116620, 897697164, 2408806028, \
    6444560484, 17266613812, 46146397316, 123481354908, \
    329712786220, 881317491628, 2351378582244, 6279396229332,\
    16741957935348, 44673816630956, 119034997913020, 317406598267076,\
    845279074648708, 2252534077759844, 5995740499124412, \
    15968852281708724, 42486750758210044, 113101676587853932,\
    300798249248474268, 800381032599158340, 2127870238872271828,\
    5659667057165209612, 15041631638016155884, 39992704986620915140,\
    106255762193816523332, 282417882500511560972, 750139547395987948108,\
    1993185460468062845836, 5292794668724837206644, 14059415980606050644844,\
    37325046962536847970116, 99121668912462180162908, 263090298246050489804708,\
    698501700277581954674604, 1853589151789474253830500, 4920146075313000860596140,\
    13053884641516572778155044, 34642792634590824499672196, 91895836025056214634047716,\
    243828023293849420839513468, 646684752476890688940276172, 1715538780705298093042635884,\
    4549252727304405545665901684, 12066271136346725726547810652, 31992427160420423715150496804,\
    84841788997462209800131419244, 224916973773967421352838735684, 596373847126147985434982575724,\
    1580784678250571882017480243636, 4190893020903935054619120005916]

counts=[0 for i in range(N+1)]
G=[0 for i in range(N+1)]

class SAW:
  xMoves=[1,-1,0,0]
  yMoves=[0,0,1,-1]

  def __init__(self):
    self.length=0
    self.walk=[(0,0)]
    self.occupied=set()
    self.occupied.add( (0,0) )

  def deleteStep(self):
    x,y=self.walk.pop() 
    self.occupied.remove( (x,y) )
    self.length-=1

  def addStep(self, p):
    #metropolis-hastings step
    if(self.length==N or random() > exp(G[self.length] - G[self.length+1]) ):
      return()

    if( p not in self.occupied ):
        self.walk.append(p)
        self.occupied.add(p)
        self.length+=1

  def doMove(self):
      xn,yn=self.walk[-1] #last vertex
      c=randint(0,3); xn += SAW.xMoves[c]; yn += SAW.yMoves[c]

      if(self.length==0): #cannot delete
          self.addStep( (xn,yn) )
      else:
          if( (xn,yn)==self.walk[-2] ):
              self.deleteStep() #immediate reversal
          else:
              self.addStep( (xn,yn) )


phi=SAW()

for s in range(S):
  phi.doMove()
  if( s%3==0 ):
    counts[ phi.length ]+=1
    G[ phi.length ]+=1

plt.bar(range(N+1), counts, color="blue")
plt.show()

normG = list(map( lambda x:exp(x-G[0]), G ))
plt.plot(range(N+1), normG, "bo", range(N+1), correct, 'r-')

plt.yscale('log')
plt.show()

It gives us a nice flat histogram

Flat but...

And now look at the estimates — we should have $c_n \approx e^{G_n - G_0}$. Plotting these against our oracle-Iwan data we can see something is more than a little suspect:

G is bad

There are a couple of problems we need to correct.

Up and down agnostic

First up — the way we have designed our transition probabilities assumes that there are always fewer objects of size $n$ than of size $n+1$ — this is indeed the case for SAWs. However, we can see from our plot of $e^{G_n - G_0}$ that our estimates (as the code runs) need not obey this — ie at some point in time we might have $G_n > G_{n+1}$. This definitely screws up detailed balance and our results.

There is an easy fix — which also neatly symmetrises our transitions. Fix $\Pr(\text{accept a move from length } n \text{ to length } m) = e^{G_n - G_m}$; this is independent of the relative sizes of $G_n$ and $G_m$ — we no longer need to assume that $G_n \le G_m$ or vice-versa.

Of course, this means that frequently the "probability" we have stated above will be bigger than $1$. So we can fix that by setting $\Pr(\text{accept a move from length } n \text{ to length } m) = \min\!\left(1, e^{G_n - G_m}\right)$. Notice that if $G_n \le G_m$ then this is just $e^{G_n - G_m}$ — just like we had previously.
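Written as code, the symmetrised acceptance rule is one small function. A sketch only; the listings below inline the same test rather than calling a helper like this.

from math import exp
from random import random

def accept_move(G_from, G_to):
  # accept with probability min(1, exp(G_from - G_to));
  # exp() is only evaluated when its argument is negative, so this works
  # whichever of G_from and G_to is larger
  diff = G_from - G_to
  return diff >= 0 or random() < exp(diff)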

So the above means that we add a Metropolis-Hastings step into our code when we add a step, and also when we delete a step. Plugging this into detailed balance we get
$$\frac{\Pr(\text{length } n+1)}{\Pr(\text{length } n)} = \frac{c_{n+1}}{c_n} \cdot \frac{\min\!\left(1, e^{G_n - G_{n+1}}\right)}{\min\!\left(1, e^{G_{n+1} - G_n}\right)} = \frac{c_{n+1}}{c_n} \cdot e^{G_n - G_{n+1}},$$
so if our histogram is flat then the LHS is just $1$, and hence $c_{n+1}/c_n = e^{G_{n+1} - G_n}$ — that is, $c_n \approx e^{G_n - G_0}$.

Our newly corrected code is then

from random import random, randint
from math import exp
import matplotlib.pyplot as plt

N=71
S=10**6

correct=[1,4, 12, 36, 100, 284, 780, 2172, 5916, 16268, \
  44100, 120292, 324932, 881500, 2374444, 6416596, 17245332, \
    46466676, 124658732, 335116620, 897697164, 2408806028, \
    6444560484, 17266613812, 46146397316, 123481354908, \
    329712786220, 881317491628, 2351378582244, 6279396229332,\
    16741957935348, 44673816630956, 119034997913020, 317406598267076,\
    845279074648708, 2252534077759844, 5995740499124412, \
    15968852281708724, 42486750758210044, 113101676587853932,\
    300798249248474268, 800381032599158340, 2127870238872271828,\
    5659667057165209612, 15041631638016155884, 39992704986620915140,\
    106255762193816523332, 282417882500511560972, 750139547395987948108,\
    1993185460468062845836, 5292794668724837206644, 14059415980606050644844,\
    37325046962536847970116, 99121668912462180162908, 263090298246050489804708,\
    698501700277581954674604, 1853589151789474253830500, 4920146075313000860596140,\
    13053884641516572778155044, 34642792634590824499672196, 91895836025056214634047716,\
    243828023293849420839513468, 646684752476890688940276172, 1715538780705298093042635884,\
    4549252727304405545665901684, 12066271136346725726547810652, 31992427160420423715150496804,\
    84841788997462209800131419244, 224916973773967421352838735684, 596373847126147985434982575724,\
    1580784678250571882017480243636, 4190893020903935054619120005916]

counts=[0 for i in range(N+1)]
G=[0 for i in range(N+1)]

class SAW:
  xMoves=[1,-1,0,0]
  yMoves=[0,0,1,-1]

  def __init__(self):
    self.length=0
    self.walk=[(0,0)]
    self.occupied=set()
    self.occupied.add( (0,0) )

  def deleteStep(self):
    #metropolis-hastings step
    if( random() > exp(G[self.length] - G[self.length-1]) ):
      return()

    x,y=self.walk.pop() 
    self.occupied.remove( (x,y) )
    self.length-=1

  def addStep(self, p):
    #metropolis-hastings step
    if(self.length==N or random() > exp(G[self.length] - G[self.length+1]) ):
      return()

    if( p not in self.occupied ):
        self.walk.append(p)
        self.occupied.add(p)
        self.length+=1

  def doMove(self):
      xn,yn=self.walk[-1] #last vertex
      c=randint(0,3); xn += SAW.xMoves[c]; yn += SAW.yMoves[c]

      if(self.length==0): #cannot delete
          self.addStep( (xn,yn) )
      else:
          if( (xn,yn)==self.walk[-2] ):
              self.deleteStep() #immediate reversal
          else:
              self.addStep( (xn,yn) )

phi=SAW()

for s in range(S):
  phi.doMove()
  if( s%3==0 ):
    counts[ phi.length ]+=1
    G[ phi.length ]+=1

plt.bar(range(N+1), counts, color="blue")
plt.show()

normG = list(map( lambda x:exp(x-G[0]), G ))
plt.plot(range(N+1), normG, "bo", range(N+1), correct, 'r-')
plt.yscale('log')
plt.show()

It gives us a nice flat histogram

Nice and flat

And now look at the estimates — we should have $c_n \approx e^{G_n - G_0}$. Plotting these against our oracle-Iwan data we can see something is still suspect:

G still wonky

Urgh.

But — all is not lost. Notice what happens when instead of increasing $G_n$ by $1$ each sample, we increase it by $0.1$:

  G[ phi.length ]+=0.1 

Another nicely flat-ish histogram

Again nicely flat

But now our estimates look a lot better:

G is getting better

Why is this? Roughly speaking, it is because an increment of $1$ is big.

Say (hypothetically) that we have managed to get our distribution to converge beautifully, so that $e^{G_n} \propto c_n$, and that our algorithm is currently sitting at a configuration of length $n$. Then we should have $e^{G_{n-1} - G_n} = c_{n-1}/c_n$. Now our algorithm decides to take a down step and increment the histogram and $G_{n-1}$. After this single step we have $e^{G_{n-1} - G_n} = e \cdot c_{n-1}/c_n$, and our nice distribution is all messed up in that single step.

So, by making the increment smaller, we are smoothing out this effect — it takes way more samples to move the distribution around. This is why our results look better.
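A quick back-of-the-envelope check of the size of this effect. This is a toy calculation, not part of the sampler above; it assumes $G$ has converged and uses the approximate square-lattice growth constant $\mu \approx 2.638$.

from math import exp

mu = 2.638  # (approximate) growth constant of square-lattice SAWs

# With converged G-values, the acceptance probability for extending a walk
# from length n-1 to n should be exp(G[n-1] - G[n]) = c_{n-1}/c_n, roughly 1/mu.
# Bumping G[n-1] by an increment delta multiplies that acceptance by e^delta.
for delta in (1.0, 0.1):
  print("increment", delta, "-> acceptance becomes roughly", exp(delta) / mu)
# increment 1.0 -> about 1.03  (the acceptance is completely wrecked)
# increment 0.1 -> about 0.42  (only a small perturbation from 1/mu = 0.38)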

But since our initial distribution is so wrong, reducing the increment means it is going to take much longer to converge at longer lengths. Hence, we'd really like a system in which the $G$-increment starts big but gets smaller with time. We now have all the ingredients of Wang-Landau sampling.

Getting better all the time

Let us define our increment as $f$ (that seems to be the traditional symbol in this game). We'd like $f$ to start out "big" so that we make big changes to our initial very wrong distribution, and then as time ticks along we'd like $f$ to get smaller so that we are "polishing" our distribution instead of making big changes.

One way to do this is to just reduce $f$ after some fixed number of iterations. This is very easy to code. Our updated code is then

from random import random, randint
from math import exp, log
import matplotlib.pyplot as plt

correct=[1,4, 12, 36, 100, 284, 780, 2172, 5916, 16268, 44100, 120292, 324932, 881500, 2374444, 6416596, 17245332, 46466676, 124658732, 335116620, 897697164, 2408806028, 6444560484, 17266613812,
    46146397316, 123481354908, 329712786220, 881317491628, 2351378582244, 6279396229332, 16741957935348, 44673816630956, 119034997913020, 317406598267076, 845279074648708, 2252534077759844,
    5995740499124412, 15968852281708724, 42486750758210044, 113101676587853932, 300798249248474268, 800381032599158340, 2127870238872271828, 5659667057165209612, 15041631638016155884,
    39992704986620915140, 106255762193816523332, 282417882500511560972, 750139547395987948108, 1993185460468062845836, 5292794668724837206644, 14059415980606050644844,
    37325046962536847970116, 99121668912462180162908, 263090298246050489804708, 698501700277581954674604, 1853589151789474253830500, 4920146075313000860596140, 13053884641516572778155044,
    34642792634590824499672196, 91895836025056214634047716, 243828023293849420839513468, 646684752476890688940276172, 1715538780705298093042635884, 4549252727304405545665901684,
    12066271136346725726547810652, 31992427160420423715150496804, 84841788997462209800131419244, 224916973773967421352838735684, 596373847126147985434982575724,
    1580784678250571882017480243636, 4190893020903935054619120005916]

N=71
S=10**6

class SAW:
  xMoves=[1,-1,0,0]
  yMoves=[0,0,1,-1]

  def __init__(self):
    self.length=0
    self.walk=[(0,0)]
    self.occupied=set()
    self.occupied.add( (0,0) )

  def doMove(self):
      xn,yn=self.walk[-1] #last vertex
      c=randint(0,3); xn += SAW.xMoves[c]; yn += SAW.yMoves[c]

      if(self.length==0): #cannot delete
          self.addStep( (xn,yn) )
      else:
          if( (xn,yn)==self.walk[-2] ):
              self.deleteStep() #immediate reversal
          else:
              self.addStep( (xn,yn) )

  def deleteStep(self):
    #metropolis-hastings step
    if( random() > exp(G[self.length] - G[self.length-1]) ):
      return()

    x,y=self.walk.pop() 
    self.occupied.remove( (x,y) )
    self.length-=1

  def addStep(self, p):
    #metropolis-hastings step
    if(self.length==N or random() > exp(G[self.length] - G[self.length+1]) ):
      return()

    if( p not in self.occupied ):
        self.walk.append(p)
        self.occupied.add(p)
        self.length+=1

counts=[0 for i in range(N+1)] #total samples
H = [0 for k in range(N+1)] #running histogram
G = [0 for k in range(N+1)] #G-distribution
f = 1 #G-incrementer

phi=SAW()

while(f >=0.1):
  for s in range(S):
    phi.doMove()

    if( s%17==0 ):
      counts[ phi.length ]+=1
      G[ phi.length ]+=f
      H[ phi.length ]+=1

  f /= 2
  g0 = G[0]
  G = list(map(lambda X: X-g0, G) )
  c = list(map(lambda X: exp(X), G) )
  print(c)

plt.bar(range(N+1), counts, color="blue")
plt.show()

plt.plot(range(N+1), c, "bo", range(N+1), correct, 'r-')
plt.yscale('log')
plt.show()

It gives us a nice flat histogram

Nice and flat

And now look at the estimates against oracle-Iwan data:

Nice and flat

Very nice.

Getting betterer all the time

The above works very nicely, but our choice of when to reduce $f$ is pretty arbitrary. We'd like to choose when to reduce it by something a bit more objective. To do so we'd like a measure of when our distribution is looking pretty good. One such measure is the flatness of our histogram — let us use that to decide when to reduce $f$.

  • Start with $f = 1$ (or some other number of roughly that size)
  • Every $S$ samples, test the histogram for flatness — eg compare the min-bucket against the max-bucket
  • If "flat enough" then reduce $f$ (eg halve it).
  • Reset the histogram and continue.

This is pretty easy to code up. This is the Wang-Landau algorithm, named for Fugao Wang and David Landau who proposed this in 2001. It has since become a very standard simulation tool.

Let's look at some code:

from random import random, randint
from math import exp, log
import matplotlib.pyplot as plt

correct=[1,4, 12, 36, 100, 284, 780, 2172, 5916, 16268, 44100, 120292, 324932, 881500, 2374444, 6416596, 17245332, 46466676, 124658732, 335116620, 897697164, 2408806028, 6444560484, 17266613812,
    46146397316, 123481354908, 329712786220, 881317491628, 2351378582244, 6279396229332, 16741957935348, 44673816630956, 119034997913020, 317406598267076, 845279074648708, 2252534077759844,
    5995740499124412, 15968852281708724, 42486750758210044, 113101676587853932, 300798249248474268, 800381032599158340, 2127870238872271828, 5659667057165209612, 15041631638016155884,
    39992704986620915140, 106255762193816523332, 282417882500511560972, 750139547395987948108, 1993185460468062845836, 5292794668724837206644, 14059415980606050644844,
    37325046962536847970116, 99121668912462180162908, 263090298246050489804708, 698501700277581954674604, 1853589151789474253830500, 4920146075313000860596140, 13053884641516572778155044,
    34642792634590824499672196, 91895836025056214634047716, 243828023293849420839513468, 646684752476890688940276172, 1715538780705298093042635884, 4549252727304405545665901684,
    12066271136346725726547810652, 31992427160420423715150496804, 84841788997462209800131419244, 224916973773967421352838735684, 596373847126147985434982575724,
    1580784678250571882017480243636, 4190893020903935054619120005916]

N=71
S=10**5

class SAW:
  xMoves=[1,-1,0,0]
  yMoves=[0,0,1,-1]

  def __init__(self):
    self.length=0
    self.walk=[(0,0)]
    self.occupied=set()
    self.occupied.add( (0,0) )

  def doMove(self):
      xn,yn=self.walk[-1] #last vertex
      c=randint(0,3); xn += SAW.xMoves[c]; yn += SAW.yMoves[c]

      if(self.length==0): #cannot delete
          self.addStep( (xn,yn) )
      else:
          if( (xn,yn)==self.walk[-2] ):
              self.deleteStep() #immediate reversal
          else:
              self.addStep( (xn,yn) )

  def deleteStep(self):
    if( log(random()) > G[self.length]-G[self.length-1] ):
        return()
    x,y=self.walk.pop()
    self.occupied.remove( (x,y) )
    self.length-=1

  def addStep(self, p):
    if(self.length==N or log(random()) > G[self.length]-G[self.length+1] ):
        return

    if( p not in self.occupied ):
        self.walk.append(p)
        self.occupied.add(p)
        self.length+=1

counts=[0 for i in range(N+1)] #total samples
H = [0 for k in range(N+1)] #running histogram
G = [0 for k in range(N+1)] #G-distribution
f = 1 #G-incrementer

def flatten():
  global f,H,G
  r=min(H)/max(H)
  if(r>0.9): #time to reduce f
    f=f/2
    H = [0 for k in range(N+1)] #reset histogram
    g0 = G[0] 
    G = list(map(lambda X: X-g0, G) ) #avoid overflows
    print("Histogram is flat = ",r, "\tReduce f to ",f)
  else:
    print("Histogram not flat enough", r, "\tf left as ", f)

phi=SAW()

while(f >=0.01):
  for s in range(S):
    phi.doMove()
    if( s%17==0 ):
      counts[ phi.length ]+=1
      G[ phi.length ]+=f
      H[ phi.length ]+=1

  flatten()

plt.bar(range(N+1), counts, color="blue")
plt.show()

g0 = G[0]
c = list(map(lambda X: exp(X-g0), G) )

plt.plot(range(N+1), c, "bo", range(N+1), correct, 'r-')
plt.yscale('log')
plt.show()

It runs along, and every $S$ iterations it checks the flatness of the histogram; if it is flat enough it reduces $f$. The results are very nice:

Look at that histogram

And now look at the estimates against oracle-Iwan data:

Strong agreement

Really really nice! With very little effort we are getting very good approximate enumeration data.

A couple of comments

Now — we should be a little careful with what we have done above. By tuning the probabilities as the chain runs, we are not sampling from a stable distribution. We should really run our code in two phases:

  1. $G$-tuning phase — in which we tune our distribution to the desired level of flatness (and so get our approximate enumeration data). There are lots of tricks we might try here — eg we could run several Wang-Landau chains separately and then average the $G$-data to smooth it out. We could also try different $f$-reduction criteria — one good way is to simply let $f$ reduce with time, eg something like $f \propto 1/t$. We could also try some sort of statistical smoothing of the $G$-data, or .... Two of these tricks are sketched just after this list.

  2. Sampling phase — once we've gotten our $G$-data sorted out, our algorithm produces samples at all lengths. We can then use those samples to do statistical analysis of our problem.
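Here is a rough sketch of two of the tuning-phase tricks mentioned above: averaging the $G$-arrays from several independent runs, and letting $f$ decay with time rather than halving it at flatness checks. The helper names are my own and nothing here comes from the listings above.

def average_G(runs):
  # average the G-arrays from several independent Wang-Landau runs,
  # after shifting each so that its G[0] is zero
  shifted = [[g - run[0] for g in run] for run in runs]
  return [sum(column) / len(column) for column in zip(*shifted)]

def f_schedule(t, f0=1.0):
  # let the increment decay with the number of samples taken so far,
  # eg f ~ 1/t once 1/t drops below the initial value f0
  return min(f0, 1.0 / max(t, 1))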