
hammersley-clifford theorem

Any strictly positive probability distribution that satisfies the Markov property with respect to a graph is a Gibbs random field with respect to that graph, and vice versa.
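
Spelling out the two sides of the equivalence (these are the standard definitions; \(U\) stands for all the variables and \(\delta x\) for the neighbors of \(x\), as in the proof below): the (local) Markov property says that each variable is independent of all the others given its neighbors, and being a Gibbs random field means that the distribution factorizes over the cliques of the graph: \[ P(x \mid U \setminus \{x\}) = P(x \mid \delta x) \quad \text{for every variable } x, \qquad Pr(U) = \prod_{C\ \text{a clique}} \phi_C(C) \] for some non-negative functions \(\phi_C\), one per clique \(C\).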

1 proof:

Proof from wikipedia with my commentary

1.1 mrf \(\Rightarrow\) grf

1.1.1 lemma 1

  • Let \(U\) be all the variables under consideration.
  • Let \(\Theta, \Phi_1, \ldots, \Phi_n \subseteq U\) and \(\Psi_1, \ldots, \Psi_m \subseteq U\) be arbitrary sets of variables
  • Here, when applicable, \(X\) will also refer to an (arbitrary) assignment to the variables \(X\)

If \[ Pr(U) = f(\Theta)\prod_{i=1}^{n}g_i(\Phi_i) = \prod_{j=1}^{m}h_j(\Psi_j) \] for functions \(f\), \(g_i\) and \(h_j\), then there exist functions \(g_i'\) and \(h_j'\) such that: \[ Pr(U) = \prod_{j=1}^{m}h_j'(\Theta \cap \Psi_j)\prod_{i=1}^{n}g_i'(\Phi_i) \]

In other words, the \(h_j\) give a template for further factoring \(f\).
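
As a tiny concrete instance (my own example, not from the Wikipedia proof): take \(U = \{a, b, c\}\) and suppose \[ Pr(U) = f(a,b)\,g_1(c) = h_1(a,c)\,h_2(b,c) \] Here \(\Theta = \{a,b\}\), \(\Phi_1 = \{c\}\), \(\Psi_1 = \{a,c\}\), \(\Psi_2 = \{b,c\}\), so the lemma produces \[ Pr(U) = h_1'(\Theta \cap \Psi_1)\,h_2'(\Theta \cap \Psi_2)\,g_1'(\Phi_1) = h_1'(a)\,h_2'(b)\,g_1'(c) \] i.e. in this case the lemma tells us that the two factorizations can only coexist if \(a\), \(b\) and \(c\) are in fact mutually independent.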

Now, with lemma 1, we can finish the proof. The local Markov property tells us that for any variable \(x\), we can write:

\begin{equation} Pr(U) = f_{x}(x,\delta x)f_{-x}(U\setminus\{x\}) \label{factor} \end{equation}

where \(\delta x\) are the neighbors of \(x\). Namely: \[ Pr(U) = \underbrace{P(\delta x)P(x \mid \delta x)}_{f_{x}} \underbrace{P(U\setminus(\{x\} \cup \delta x) \mid \delta x)}_{f_{-x}} \] (the last factor only needs to be conditioned on \(\delta x\) because, by the local Markov property, the remaining variables are independent of \(x\) given \(\delta x\)).

Then, let's repeatedly apply lemma 1 to Equation \ref{factor}. The goal is to end up with an expression that only contains functions over cliques. Let's take a look at what happens when our template is: \[ Pr(U) = f_{y}(y, \delta y) f_{-y}(U\setminus\{y\}) \] Then, by lemma 1 (taking \(\Theta = \{x\} \cup \delta x\)), applying this template to Equation \ref{factor} gets us: \[ f_{x,y}'(\{x,\delta x\} \cap \{y, \delta y\})\,h'(\{x,\delta x\} \cap (U\setminus \{y\}))\,f_{-x}'(U\setminus \{x\}) \] Since we get to choose which template to apply, let's pick \(y\) to be a node in the same maximal clique as \(x\).

To take stock:

  • \(f_{x,y}'\) is a function over \(\{x,\delta x\} \cap \{y, \delta y\}\), i.e. over variables that are (closed) neighbors of both \(x\) and \(y\)
  • \(h'\) is now a function over \(\{x,\delta x\} \setminus \{y\}\)
  • \(f_{-x}'\) is still a function over \(U \setminus \{x\}\)

Here is the key realization:

  • Let's recursively take templates and factor each one of the terms
  • so for \(f_{x,y}'\), let's continue to factor it, i.e. by using the templates for \(z\), where \(z\) is some node in the same maximal clique as \(x\) and \(y\)
  • once that first term consists only of nodes in a maximal clique, we can move on to the next term and rinse and repeat
  • how do we know that this process will terminate? because the set of variables a term depends on strictly shrinks every time we factor it, and it can only shrink finitely many times (see the sketch after this list)
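
Here is a small sketch of that bookkeeping in code (my own illustration, not taken from the Wikipedia proof; it tracks only the set of variables each factor depends on, and uses one concrete rule for picking the next template). Splitting a factor over a set \(S\) with the template for a node \(u\), i.e. lemma 1 with \(\Psi_1 = \{u\} \cup \delta u\) and \(\Psi_2 = U \setminus \{u\}\), replaces \(S\) by \(S \cap (\{u\} \cup \delta u)\) and \(S \setminus \{u\}\); if \(S\) is not already contained in a clique, \(u\) can be chosen so that both pieces are strictly smaller.

  def closed_nbhd(u, adj):
      # {u} together with its neighbors, i.e. the scope of the template for u
      return {u} | adj[u]

  def is_clique(S, adj):
      # S lies in a clique exactly when its members are pairwise adjacent
      return all(v in adj[u] for u in S for v in S if u != v)

  def factor_into_cliques(S, adj):
      """Recursively split the variable set S until every piece lies in a clique."""
      if is_clique(S, adj):
          return [S]
      # pick u in S whose closed neighborhood does not cover all of S
      # (such a u exists because S is not a clique), so both pieces below shrink
      u = next(u for u in S if not S <= closed_nbhd(u, adj))
      return (factor_into_cliques(S & closed_nbhd(u, adj), adj)   # misses some non-neighbor of u
              + factor_into_cliques(S - {u}, adj))                # misses u

  # example: the 4-cycle a-b-c-d-a, starting from the local Markov factorization at a
  adj = {'a': {'b', 'd'}, 'b': {'a', 'c'}, 'c': {'b', 'd'}, 'd': {'a', 'c'}}
  U = set(adj)
  for S in ({'a'} | adj['a'], U - {'a'}):
      print(sorted(S), '->', [sorted(piece) for piece in factor_into_cliques(S, adj)])

On the 4-cycle this splits \(\{a,b,d\}\) into \(\{a,b\}\) and \(\{a,d\}\), and \(\{b,c,d\}\) into \(\{b,c\}\) and \(\{c,d\}\): exactly the maximal cliques (the edges) of the cycle.
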
1.1.1.1 proof of lemma 1

Let \(\bar\theta\) denote a particular (fixed) assignment to the variables \(U\setminus \Theta\). For an arbitrary set of variables \(X\), let \(\bar\theta[X]\) denote the assignment that \(\bar\theta\) gives to those variables of \(X\) that fall outside \(\Theta\).

Then, we will write \[ Pr(U) = f(\Theta)\prod_{i=1}^{n}g_i(\Phi_i) \] as \[ Pr(U) = f(\Theta)\prod_{i=1}^{n}g_i(\Phi_i \cap \Theta, \bar\theta[\Phi_i])\prod_{i=1}^{n}\frac{g_i(\Phi_i)}{g_i(\Phi_i\cap\Theta, \bar\theta[\Phi_i])} \] where \(g_i(\Phi_i \cap \Theta, \bar\theta[\Phi_i])\) is the function \(g_i\) with the variables in \(\Phi_i\) that lie outside \(\Theta\) fixed to their values under \(\bar\theta\). Then, let:

  • \(f'(\Theta) = f(\Theta)\prod_{i=1}^{n}g_i(\Theta\cap\Phi_i, \bar\theta[\Phi_i])\)
  • \(g_i'(\Phi_i) = \frac{g_i(\Phi_i)}{g_i(\Phi_i \cap \Theta, \bar\theta[\Phi_i])}\)

Then, \[ Pr(U) = f'(\Theta)\prod_{i=1}^{n} g_i'(\Phi_i) = \prod_{j=1}^{m}h_j(\Psi_j) \]

Now, note that \(g_i'(\Phi_i) = 1\) whenever the variables of \(\Phi_i\) that lie outside \(\Theta\) take exactly the values assigned by \(\bar\theta\). So, let's see what happens when we fix the assignments outside \(\Theta\) to be exactly the assignments in \(\bar\theta\). Then, we have:

\[ Pr(\Theta, \bar\theta) = f'(\Theta)\prod_{i=1}^n 1 = \prod_{j=1}^{m} h_j(\Psi_j\cap \Theta, \bar\theta[\Psi_j]) \] So, \[ f'(\Theta) = \prod_{j=1}^{m} h_j(\Psi_j\cap \Theta, \bar\theta[\Psi_j]) \] Then, let's set \[ h'_j(\Psi_j \cap \Theta) = h_j(\Psi_j \cap \Theta, \bar\theta[\Psi_j]) \] So, now we can finally say: \[ Pr(U) =\left( \prod_{j=1}^{m}h_j'(\Psi_j \cap \Theta)\right) \left(\prod_{i=1}^{n} g_i'(\Phi_i) \right) \]
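
To sanity-check the construction, here is a small numerical sketch (again my own, reusing the three-variable instance from the lemma's statement above; the particular tables are arbitrary). It builds \(g_1'\), \(h_1'\) and \(h_2'\) exactly as defined in this proof and checks the resulting factorization against \(Pr(U)\) on every assignment.

  import itertools
  import numpy as np

  # the instance from above: U = (a, b, c), all binary,
  # Pr(U) = f(a,b) g1(c) = h1(a,c) h2(b,c), which forces a, b, c independent,
  # so build Pr from three arbitrary marginals
  pa, pb, pc = np.array([0.3, 0.7]), np.array([0.6, 0.4]), np.array([0.2, 0.8])

  f  = lambda a, b: pa[a] * pb[b]      # f(Theta),   Theta = {a, b}
  g1 = lambda c: pc[c]                 # g_1(Phi_1), Phi_1 = {c}
  h1 = lambda a, c: pa[a] * pc[c]      # h_1(Psi_1), Psi_1 = {a, c}
  h2 = lambda b, c: pb[b]              # h_2(Psi_2), Psi_2 = {b, c}
  Pr = lambda a, b, c: f(a, b) * g1(c)

  c_bar = 0                            # reference assignment \bar\theta (only c lies outside Theta)

  # the functions constructed in the proof
  g1_prime = lambda c: g1(c) / g1(c_bar)   # g_i' = g_i divided by g_i with outside-Theta variables fixed
  h1_prime = lambda a: h1(a, c_bar)        # h_j' = h_j with outside-Theta variables fixed to \bar\theta
  h2_prime = lambda b: h2(b, c_bar)

  for a, b, c in itertools.product([0, 1], repeat=3):
      assert np.isclose(Pr(a, b, c), h1_prime(a) * h2_prime(b) * g1_prime(c))
  print("Pr(U) = h1'(a) h2'(b) g1'(c) on all 8 assignments")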

1.2 grf \(\Rightarrow\) mrf

Consider two sets of nodes \(X\) and \(Y\) separated by a set of nodes \(S\): every path from \(X\) to \(Y\) passes through \(S\). No clique can contain a node from \(X\)'s side and a node from \(Y\)'s side of \(S\), since those two nodes would have to be adjacent, giving a path around \(S\). So once all the separating variables have been fixed, the clique factorization breaks into two neat pieces: one piece whose terms only involve \(X\)'s side (and \(S\)) and one piece that only involves \(Y\)'s side (and \(S\)). Hence \(X\) and \(Y\) are conditionally independent given \(S\), which is the Markov property.
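
A quick numerical illustration of this (my own sketch, with arbitrary potentials): take the path graph \(a - s - b\), so \(s\) separates \(a\) from \(b\), define a Gibbs random field from clique potentials on the two edges, and check that \(a\) and \(b\) are independent once \(s\) is fixed.

  import numpy as np

  rng = np.random.default_rng(0)
  phi = rng.random((2, 2)) + 0.1      # phi[a, s] > 0, potential on the clique {a, s}
  psi = rng.random((2, 2)) + 0.1      # psi[s, b] > 0, potential on the clique {s, b}

  P = np.einsum('as,sb->asb', phi, psi)
  P /= P.sum()                        # joint Pr(a, s, b) of the Gibbs random field

  for s in (0, 1):
      cond = P[:, s, :] / P[:, s, :].sum()        # Pr(a, b | s)
      pa = cond.sum(axis=1, keepdims=True)        # Pr(a | s)
      pb = cond.sum(axis=0, keepdims=True)        # Pr(b | s)
      assert np.allclose(cond, pa * pb)           # a and b are independent given s
  print("a and b are conditionally independent given the separator s")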

Created: 2021-09-14 Tue 21:44