
Not so simple sampling

So above we have proved that simple sampling won’t get us long self-avoiding walks — they die out too quickly. Let’s try something else.

To make the mathematics a little simpler, let us assume, for the moment, that we will fix a maximum length of walk. This way our state space will be finite and computing normalisations for probabilities (etc) is a simpler proposition. This can be extended to infinite state spaces, but then we definitely have to make sure that we suppress longer walks so that our normalisations stay finite.

Building on what we know

Let's assume that we’ve managed to construct a self-avoiding walk of some reasonable length. Given the existing walk, construct a new walk from it. For SAWs, there are a few obvious ways to do this:

  • add a step onto the end of the walk
  • delete the last step of the walk

Notice that we can produce every single self-avoiding walk by some sequence of these operations, and that the operations undo each other. We could also do more sophisticated(?) or complicated(?) things like “insert an edge into the middle of the walk” and “delete an edge from the middle of the walk” — indeed such things are necessary for sampling polygons (BFACF algorithm) — but let's start by doing simpler things.

Now — we cannot just add edges randomly, we still need to maintain self-avoidance, so let’s tweak the add/subtract rule as follows:

  • take a SAW of length $n$
  • choose a direction uniformly at random
  • if that direction is an immediate reversal of the last step, then delete the last edge, returning the SAW of length $n-1$.
  • otherwise try to add an edge in that direction; if the resulting step is legal (self-avoiding), add that edge and return the SAW of length $n+1$, otherwise keep the current walk.

Notice that we can summarise this by saying — pick an edge neighbouring the end of the walk, if it is vacant, occupy it, and if it is occupied, vacate it. Keep the result if it is a valid walk.

This defines a Markov chain on the set of walks. The Markov chain we have described above is (nearly) the Berretti-Sokal algorithm — named after the authors of the paper in which it was introduced in 1985. It does sometimes lead to unfortunate sounding abbreviations (as we’ll see).

It still has a few issues, but we can hope that running the above process will produce nice random self-avoiding walks for us.

A very little bit of Markov chain theory

We have a Markov chain, but we’d like to know a bit more about it — in particular, if we use these rules to walk around on the set of SAWs, do we see each SAW with the correct probability? That is, we’d like to make sure that every walk of length $n$ is visited with the same probability. While we don’t mind too much that there might be biases between lengths, we really don’t want (at this stage) to introduce sampling biases within any given fixed length.

First up — let's make sure that we can build every walk of length $n$. Notice that there is always a non-zero probability of building any given “legal” sequence of steps (each step is proposed with probability $\frac14$). Similarly, there is always a positive probability of deleting all the steps to return to the walk of length $0$. Hence there is a non-zero probability of getting between any two given walks of finite length: delete down to the trivial walk, then build up the target. This ensures that the way we have defined the chain has not accidentally split the set of SAWs into disconnected subsets.

Next, we need something called detailed balance — this is a very easy way to ensure that a Markov chain samples objects from a given prescribed distribution. Say we’d like our chain to sample a walk $\varphi$ with probability $\pi(\varphi)$. To be more precise — we’d like the stationary distribution of the Markov chain to be given by $\pi$.

Then it suffices to show that $\pi(\varphi) P(\varphi \to \psi) = \pi(\psi) P(\psi \to \varphi)$, where $P(\varphi \to \psi)$ is the probability of moving from $\varphi$ to $\psi$ in a single step of the chain.

So consider our chain as described above and two SAWs $\varphi$ and $\psi$.

  • If $\varphi$ and $\psi$ are not separated by a single legal step of the chain, then $P(\varphi \to \psi) = P(\psi \to \varphi) = 0$ and detailed balance is satisfied.
  • Let's assume that the SAWs are separated by a single step, and, without loss of generality, assume that $\varphi$ is 1 step shorter. To highlight this, let's denote our walks $\varphi_n$ and $\psi_{n+1}$.
  • The probability of going from $\varphi_n$ to $\psi_{n+1}$ is just $P(\varphi_n \to \psi_{n+1}) = \frac14$ — we must pick exactly the direction that adds the right edge.
  • The probability of going from $\psi_{n+1}$ to $\varphi_n$ is then $P(\psi_{n+1} \to \varphi_n) = \frac14$ — we must pick exactly the immediate-reversal direction.
  • Hence we require that $\frac14 \pi(\varphi_n) = \frac14 \pi(\psi_{n+1})$, and so $\pi(\varphi_n) = \pi(\psi_{n+1})$.
  • Hence we are sampling from the uniform distribution — every configuration (regardless of length) will be sampled with the same probability.
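
As a sanity check, here is a small brute-force verification of my own (not from the original post — all the names below are mine): it enumerates every SAW up to a tiny maximum length, builds the transition probabilities of the chain described above, and confirms that the uniform distribution is stationary.

MOVES = [(1,0), (-1,0), (0,1), (0,-1)]
NMAX = 3

def enumerate_saws():
  #breadth-first enumeration of all SAWs (as tuples of vertices) of at most NMAX steps
  frontier = [((0,0),)]
  saws = list(frontier)
  while frontier:
    nxt = []
    for w in frontier:
      if len(w)-1 == NMAX:
        continue
      x,y = w[-1]
      for dx,dy in MOVES:
        p = (x+dx, y+dy)
        if p not in w:
          nxt.append( w + (p,) )
    saws += nxt
    frontier = nxt
  return saws

def step_distribution(w):
  #probabilities of each possible outcome of one chain move from walk w
  probs = {}
  x,y = w[-1]
  for dx,dy in MOVES:
    p = (x+dx, y+dy)
    if len(w) > 1 and p == w[-2]:
      target = w[:-1]    #immediate reversal: delete the last step
    elif p not in w and len(w)-1 < NMAX:
      target = w + (p,)  #legal addition
    else:
      target = w         #blocked: keep the current walk
    probs[target] = probs.get(target, 0) + 0.25
  return probs

saws = enumerate_saws()
pi = 1.0/len(saws)
flow = dict( (w, 0.0) for w in saws )
for w in saws:
  for target, pr in step_distribution(w).items():
    flow[target] += pi*pr
#uniform pi is stationary iff the probability flowing into each walk equals pi
assert all( abs(f-pi) < 1e-12 for f in flow.values() )
print("uniform is stationary over", len(saws), "walks")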

This might not work out so well…

Let’s first do this wrong

So before we correct what is a fairly obvious problem with our Markov chain, I want to do this the wrong way to illustrate a point that will become important for approximate enumeration later. We’ll start by asking our chain to produce all SAWs (up to a given maximum length) with equal probability — sounds innocent enough — no biases. Notice, though, at any given step of the chain, the probability of trying to add a step is about 3 times greater than the probability of deleting a step. This will certainly cause us to sample way more long walks. Let's look at some code though — we can fix this problem we are busy creating once we’ve got working code. To slightly butcher Knuth (or at least his words):

Premature optimisation is the root of all evil (or at least most of it) in programming.

Notice — we aren’t going to collect data after every chain step — we should really leave enough iterations between observations to get statistically independent samples. There are methods for deciding how many, but for the moment, let's just do something simple. Again — something that can be fixed later.
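
For what it’s worth, here is one rough (and entirely optional) way to check, after the fact, whether the gap between observations is big enough: estimate the autocorrelation of the recorded walk lengths at a given lag, and make the gap large enough that it is close to zero. This is a sketch of my own — the function name is not from the original code.

def autocorrelation(series, lag):
  #naive normalised autocorrelation estimate of a list of numbers at the given lag
  n = len(series) - lag
  mean = sum(series)/len(series)
  var = sum( (x-mean)**2 for x in series )/len(series)
  return sum( (series[i]-mean)*(series[i+lag]-mean) for i in range(n) )/(n*var)

#eg: autocorrelation(recorded_lengths, 17) should be near zero if 17 steps is enough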

We’ll also try to encapsulate the code in a more intelligent way. We’ll create a SAW-object using python’s class structure. This allows us to keep all the SAW code contained away from the other code. Our SAW object will know how to initialise itself, how to add a step, how to delete a step and how to do a single Markov-chain move. The main code can then just tell the SAW-object to do a step. This keeps our code cleaner and (if we do it right) allows us to reuse stuff more easily.

from random import random, randint
import matplotlib.pyplot as plt

N=100
S=10**6
counts=[0 for i in range(N+1)]

class SAW:
  #these will belong to all instances of SAW
  xMoves=[1,-1,0,0]
  yMoves=[0,0,1,-1]

  def __init__(self):
    # "self.blah" belongs to just this SAW.
    self.length=0
    self.walk=[(0,0)]
    self.occupied=set()
    self.occupied.add( (0,0) )

  def deleteStep(self):
    x,y=self.walk.pop() 
    self.occupied.remove( (x,y) )
    self.length-=1

  def addStep(self, p):
    if(self.length==N):
      return #don't get too long
    if( p not in self.occupied ):
        self.walk.append(p)
        self.occupied.add(p)
        self.length+=1

  def doMove(self):
    xn,yn=self.walk[-1] #last vertex
    c=randint(0,3) #pick one of the 4 lattice directions
    xn += SAW.xMoves[c]
    yn += SAW.yMoves[c]

    #be careful of length 0 walks.
    if(self.length==0): #cannot delete
      self.addStep( (xn,yn) )
    else:
      if( (xn,yn)==self.walk[-2] ):
        self.deleteStep() #immediate reversal
      else:
        self.addStep( (xn,yn) )

phi=SAW()
for s in range(S):
  phi.doMove()
  if( s%17==0 ): #move a few times then record
    counts[ phi.length ]+=1

plt.bar(range(N+1), counts, color="blue")
plt.yscale('log')
plt.show()

xs,ys = zip( *phi.walk ) 
plt.axis([ min(xs)-1, max(xs)+1, min(ys)-1, max(ys)+1])
plt.plot(xs,ys,'bo-')
plt.show()

We can then look at one of the SAWs it produces

A SAW

and at the distribution of SAWs it produces

A poor distribution

Clearly there is a big problem here.

By choosing (innocently) to sample all SAWs with the same probability, we are just finding lots of long SAWs and almost no short SAWs — this is because there are exponentially more long SAWs than short SAWs. We need some bias to make sure that we don’t spend all our time sampling long walks.

Worse — if we never shrink down to short walks, then we never manage to erase our first few choices of steps; they’ll be there almost all the time, giving us crappy and over-correlated statistics.

Let’s do it less wronger

Think about our choices of the probabilities of adding and deleting steps. Ignore that we are sampling self-avoiding walks for a moment, and just concentrate on what this means about the length of the underlying walks. With probability $\frac14$ we choose to try to shorten the walk, and with probability $\frac34$ we try to lengthen it. This describes a biased 1d random walk whose length drifts upwards by $\frac34 - \frac14 = \frac12$ per step on average — no wonder we end up with way too many long walks from our first BS-sampler (there is that unfortunate abbreviation).
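
To see just how strong that drift is, here is a little toy model (my own illustration, not from the original post): forget self-avoidance entirely and just track a length that grows with probability $\frac34$ and shrinks with probability $\frac14$, capped at $N$.

from random import random

N=100
S=10**5
length=0
hist=[0 for i in range(N+1)]
for s in range(S):
  if length==0 or random()<0.75:
    length = min(length+1, N) #attempted add
  else:
    length -= 1 #delete
  hist[length] += 1

#the walk piles up hard against the maximum length
print( "mean length:", sum( i*h for i,h in enumerate(hist) )/S )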

Instead we’d like to tune our sampler to get samples at all lengths — not least because we’d like our algorithm to return to short lengths regularly in order to erase most of the “history” of the SAW — ie those early steps in the configuration. To do this, we’d like the random-walk in length to be unbiased — with $P(\text{lengthen}) = P(\text{shorten}) = \frac14$.

But what would this imply about the underlying sample? Back to our detailed balance equations: If the total probabilities of lengthening and shortening are both $\frac14$, then any specific add-move happens with probability $P(\varphi_n \to \psi_{n+1}) = \frac14 \cdot \frac13 = \frac1{12}$, and we are actually forcing the distribution: $\frac1{12}\pi(\varphi_n) = \frac14\pi(\psi_{n+1})$, ie $\pi(\psi_{n+1}) = \frac13\pi(\varphi_n)$. This implies that we are sampling from the distribution: $\pi(\varphi) \propto 3^{-|\varphi|}$, where $|\varphi|$ denotes the length of the walk $\varphi$.

Now — to make this work, we could

  • scan all 4 directions to work out which results in delete and which results in add.
  • then with probability $\frac12$ choose the delete direction, and else choose one of the other 3 directions uniformly at random — so that lengthening and shortening are equally likely.

This feels like a very clunky solution. Instead, it is easier to make this work by adding a (more explicit) “do nothing” move into the chain. Effectively, this do-nothing move is introduced to reduce the probability of adding a step.

  • Choose direction uniformly at random.
  • If immediate reversal then delete (no change to this bit)
  • Otherwise it is an add move — but…
  • with probability $p$ do the add move (as before), but with probability $1-p$ do nothing and keep the current walk.

So now $P(\varphi_n \to \psi_{n+1}) = \frac{p}{4}$ while $P(\psi_{n+1} \to \varphi_n) = \frac14$. Setting $p = \frac13$ gives $P(\varphi_n \to \psi_{n+1}) = \frac1{12}$ and $\pi(\psi_{n+1}) = \frac13\pi(\varphi_n)$, just as above. And now our chain step becomes

  • Choose direction uniformly at random.
  • If immediate reversal then delete the last step
  • Otherwise with probability $\frac13$ attempt to add a step in the chosen direction, and otherwise retain the current configuration.

This is barely any change to our code — we just need to introduce this small “rejection” probability into the add-step routine.

   if( random() > 0.33333 ):
      return

This “do nothing” move is an idea that is known as the Metropolis-Hastings step (or algorithm). I’ll discuss it a bit more shortly. It is named after Nicholas Metropolis (who came up with the idea in the early 50s — though there is some disputing this) and Wilfred Hastings (who extended it in the early 70s). Anyway — it is known as Metropolis-Hastings now.
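
For reference (this is standard Markov-chain material, not something spelled out in the original post): the general Metropolis-Hastings recipe is to propose a move from $\varphi$ to $\psi$ with some probability $Q(\varphi \to \psi)$, and then accept it with probability

$$A(\varphi \to \psi) = \min\left(1, \frac{\pi(\psi)\,Q(\psi \to \varphi)}{\pi(\varphi)\,Q(\varphi \to \psi)}\right),$$

doing nothing otherwise. A little algebra shows the resulting chain satisfies detailed balance with respect to $\pi$. Our tweak is exactly this with $Q$ being the uniform choice of direction: for an add-move $\pi(\psi_{n+1})/\pi(\varphi_n) = \frac13$ so we accept with probability $\frac13$, while a delete-move is always accepted.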

Our code is now

from random import random, randint
import matplotlib.pyplot as plt

N=100
S=10**6


class SAW:
  xMoves=[1,-1,0,0]
  yMoves=[0,0,1,-1]

  def __init__(self):
    self.length=0
    self.walk=[(0,0)]
    self.occupied=set()
    self.occupied.add( (0,0) )

  def deleteStep(self):
    x,y=self.walk.pop() 
    self.occupied.remove( (x,y) )
    self.length-=1

  def addStep(self, p):
    if(self.length==N):
      return 
    #metropolis-hastings step
    if( random() > 0.33333 ):
      return

    if( p not in self.occupied ):
        self.walk.append(p)
        self.occupied.add(p)
        self.length+=1

  def doMove(self):
    xn,yn=self.walk[-1] #last vertex
    c=randint(0,3) #pick one of the 4 lattice directions
    xn += SAW.xMoves[c]
    yn += SAW.yMoves[c]

    if(self.length==0): #cannot delete
      self.addStep( (xn,yn) )
    else:
      if( (xn,yn)==self.walk[-2] ):
        self.deleteStep() #immediate reversal
      else:
        self.addStep( (xn,yn) )

counts=[0 for i in range(N+1)]
phi=SAW()

for s in range(S):
  phi.doMove()
  if( s%17==0 ):
    counts[ phi.length ]+=1

plt.bar(range(N+1), counts, color="blue")
plt.yscale('log')
plt.show()

Running it gives us the following distribution:

A better distribution

So — we are now getting plenty of walks at short lengths, but not enough walks at long lengths. We have clearly made it too hard to add steps. The reason for this is that while every deletion-move succeeds, a large portion of addition-moves fail due to self-intersection — leaving us with a “do nothing” move that is unstated in the above calculation.
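
We can actually measure this. The following sketch (my own, reusing the SAW class above — none of the variable names come from the original post) runs the chain and records what fraction of proposed add-directions point at an already-occupied vertex:

phi = SAW()
proposed, blocked = 0, 0
for s in range(10**5):
  x,y = phi.walk[-1]
  c = randint(0,3)
  p = (x+SAW.xMoves[c], y+SAW.yMoves[c])
  if phi.length>0 and p==phi.walk[-2]:
    phi.deleteStep() #immediate reversal
  else:
    proposed += 1
    if p in phi.occupied:
      blocked += 1 #this add must fail: self-intersection
    phi.addStep(p)
print( "blocked fraction of proposed adds:", blocked/proposed )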

Let's back up a couple of equations to here: $\pi(\psi_{n+1}) = p\,\pi(\varphi_n)$, ie $\pi(\varphi) \propto p^{|\varphi|}$. To avoid issues with normalisations, set: $\pi(\varphi) = \frac{p^{|\varphi|}}{Z}$ with $Z = \sum_{\psi} p^{|\psi|} = \sum_{n \geq 0} c_n p^n$, where $c_n$ denotes the number of SAWs of length $n$. Note here — the above is finite provided our state space is finite, but we have to be more careful if we consider the set of all self-avoiding walks.

Then the probability of sampling the trivial walk is $\frac1Z$, while the probability of sampling a given walk of length $n$ is $\frac{p^n}{Z}$. But then, the probability of sampling any walk of length $n$ is $\frac{c_n p^n}{Z}$. We saw (no pun intended) that $c_n$ grows like $\mu^n$, with growth rate $\mu \approx 2.638$ — and since $p = \frac13 < \frac1\mu$, the probability of us sampling a walk of length $n$ is decaying exponentially with $n$.

Notice that this implies that even if we do take the set of all self-avoiding walks (instead of capping at some finite maximum length) then the normalisation $Z = \sum_n c_n 3^{-n} \approx \sum_n (\mu/3)^n$ is a finite constant.

To avoid this exponential decay (or at least mitigate it a bit), we should choose (roughly) $p = \frac1\mu$, where $\mu$ is the growth rate of SAWs. If we do this for the finite state-space version our normalisation will be finite, but we might have to think more about what happens for the infinite state-space version. Does our normalisation converge? Since we don’t know $\mu$ exactly, best to just pick some number that is guaranteed to be a touch larger than $\mu$, and set $p$ to be its reciprocal.
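
A quick numeric sanity check (my own, using the first few known SAW counts) shows the difference this makes. The relative weight $c_n p^n$ of length $n$ starts to decay for $p = \frac13$ but stays close to flat for $p = 0.375$ — and by length 100 the gap between the two is enormous:

c = [1, 4, 12, 36, 100, 284, 780, 2172, 5916, 16268, 44100] #c_0 .. c_10
for p in (1.0/3.0, 0.375):
  weights = [ cn * p**n for n,cn in enumerate(c) ]
  print( p, [ round(w,3) for w in weights ] )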

To achieve this distribution we tweak our Metropolis-Hastings to accept the add-move with probability $p$. This translates to $P(\varphi_n \to \psi_{n+1}) = \frac{p}{4}$. Detailed-balance then gives us $\frac{p}{4}\pi(\varphi_n) = \frac14\pi(\psi_{n+1})$ and so $\pi(\varphi) \propto p^{|\varphi|}$ as required.

So — the only change we need to make to our code is to tweak the MH probability. Since $\mu \approx 2.638$, let's set $p = \frac38 = 0.375$ (which is a touch smaller than $\frac1\mu \approx 0.379$). Again — this is a tiny change to our code:

   if( random() > 0.375 ):
      return

but we now get:

An even better distribution

which looks much better.

Let’s do even betterer

For the moment — suspend disbelief — and assume that we know everything about SAW enumeration. Assume that we have an oracle — for various reasons it should be called Iwan — that can tell us the number $c_n$ of self-avoiding walks of length $n$ in constant (and short) time.

How can I use that information to choose $p$? Say I want to sample all walks up to length $N$ (that is as far as Iwan has worked), and I’d like equal probability of sampling a walk at any of those lengths — and the probability of sampling any walk of a given length to be equal. That is, I want to bias between different lengths, but not within a given length.

Then I’d like to be able to set $\Pr(\text{sample a walk of length } n) = \frac{1}{N+1}$, but I know this is equal to $c_n \cdot \pi(\varphi_n)$.

But then I need to set $\pi(\varphi_n) \propto \frac{1}{c_n}$ — which makes intuitive sense. We have a bunch of buckets of SAWs, each with different numbers of objects in them. The probability of picking any given bucket needs to be constant, so the probability of picking a given SAW from its bucket needs to “undo” the number of SAWs in that bucket. In order to make that work we are going to need to have different Metropolis-Hastings probabilities at different lengths — see the ratios computed below.
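
Here is what those per-length probabilities will look like in practice (a quick illustration of mine using the known counts): the ratios $\frac{c_n}{c_{n+1}}$ settle down towards $\frac1\mu \approx 0.379$.

c = [1, 4, 12, 36, 100, 284, 780, 2172, 5916, 16268, 44100]
print( [ round( c[n]/c[n+1], 4 ) for n in range(len(c)-1) ] )
#[0.25, 0.3333, 0.3333, 0.36, 0.3521, 0.3641, 0.3591, 0.3671, 0.3637, 0.3689]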

So back to our equations but now being more careful about lengths. We know we want to choose $\pi(\varphi_n) \propto \frac{1}{c_n}$. Then detailed balance is $\frac{1}{c_n} \cdot \frac34 \cdot \frac13 \cdot p_n = \frac{1}{c_{n+1}} \cdot \frac14$. Remember that stray factor of $\frac13$ comes from the fact that there are 3 ways to add a step (1 way to delete), so if we want to add a specific edge, there is a $\frac13$ probability of picking it among the add-directions.

Now if we leave $P(\text{delete}) = \frac14$ constant (easier to do this), then we set the probability of stepping from a configuration of length $n$ to (a specific) one of length $n+1$ to be $\frac14 \cdot \frac{c_n}{c_{n+1}}$.

Hence when we are at a configuration of length $n$ and we attempt an add-move, we set our MH probability to be $p_n = \frac{c_n}{c_{n+1}}$. Let's look at some code (using our Iwan-oracle data).

from random import random, randint
import matplotlib.pyplot as plt

N=70
S=10**6

correct=[1,4, 12, 36, 100, 284, 780, 2172, 5916, 16268, \
  44100, 120292, 324932, 881500, 2374444, 6416596, 17245332, \
    46466676, 124658732, 335116620, 897697164, 2408806028, \
    6444560484, 17266613812, 46146397316, 123481354908, \
    329712786220, 881317491628, 2351378582244, 6279396229332,\
    16741957935348, 44673816630956, 119034997913020, 317406598267076,\
    845279074648708, 2252534077759844, 5995740499124412, \
    15968852281708724, 42486750758210044, 113101676587853932,\
    300798249248474268, 800381032599158340, 2127870238872271828,\
    5659667057165209612, 15041631638016155884, 39992704986620915140,\
    106255762193816523332, 282417882500511560972, 750139547395987948108,\
    1993185460468062845836, 5292794668724837206644, 14059415980606050644844,\
    37325046962536847970116, 99121668912462180162908, 263090298246050489804708,\
    698501700277581954674604, 1853589151789474253830500, 4920146075313000860596140,\
    13053884641516572778155044, 34642792634590824499672196, 91895836025056214634047716,\
    243828023293849420839513468, 646684752476890688940276172, 1715538780705298093042635884,\
    4549252727304405545665901684, 12066271136346725726547810652, 31992427160420423715150496804,\
    84841788997462209800131419244, 224916973773967421352838735684, 596373847126147985434982575724,\
    1580784678250571882017480243636, 4190893020903935054619120005916]

class SAW:
  xMoves=[1,-1,0,0]
  yMoves=[0,0,1,-1]

  def __init__(self):
    self.length=0
    self.walk=[(0,0)]
    self.occupied=set()
    self.occupied.add( (0,0) )

  def deleteStep(self):
    x,y=self.walk.pop() 
    self.occupied.remove( (x,y) )
    self.length-=1

  def addStep(self, p):
    if(self.length==N):
      return 
    #metropolis-hastings step
    if( random() > correct[self.length]/correct[self.length+1] ):
      return

    if( p not in self.occupied ):
        self.walk.append(p)
        self.occupied.add(p)
        self.length+=1

  def doMove(self):
    xn,yn=self.walk[-1] #last vertex
    c=randint(0,3) #pick one of the 4 lattice directions
    xn += SAW.xMoves[c]
    yn += SAW.yMoves[c]

    if(self.length==0): #cannot delete
      self.addStep( (xn,yn) )
    else:
      if( (xn,yn)==self.walk[-2] ):
        self.deleteStep() #immediate reversal
      else:
        self.addStep( (xn,yn) )


counts=[0 for i in range(N+1)]
phi=SAW()

for s in range(S):
  phi.doMove()
  if( s%17==0 ):
    counts[ phi.length ]+=1

plt.bar(range(N+1), counts, color="blue")
plt.show()

and this gives us a beautifully flat histogram of samples:

Beautifully flat

Notice — we are not plotting the histogram with a log-scale.

Of course, this isn’t so much use in the “real world”, since we don’t have all that enumeration data lying around. We used the enumeration data to get a flat-histogram, but we can turn that around and use a flat(ish)-histogram to get enumeration data. Back to approximate enumeration.