Minimum Hamming Distance Probability Distribution Function Analysis


2005-09-14

Introduction

Given an n bit binary string, 2n different combinations or code words can be constructed. Any given pair of codewords can be described by a hamming distance.

If a set of M codewords is taken without replacement from the full N=2n member set of codewords, the set can be described by the minimum hamming distance between any pair of codewords in the set.

Error correction benefits from selecting a set of codewords such that the minimum hamming distance of the codeword set is a maximum possible for a given n and M. [reference (maybe webpages I've viewed).

In an effort to understand how good a code can be achieved, I am computing the "probability density function" (PDF) of the value of the minimum hamming distance (MHD) for varying (n,M) combinations.

A Monte Carlo approach is selected to compute the distribution since for most (n,M) combinations, the number, N choose M (binomial coefficient) combinations cannot be computed exhaustively. The density is therefore approximated by...

The raw, un-normalized frequency distributions can be viewed here.

add ref on plot page to jpgrapgh


How the results are computed and accumulated

A small applet is used to facilitate generation of the distributions.

Download the applet here.

The zip file, once extracted on the target (Windows) box will produce a setup program to install the applet. The installation will add "codeham" to your start menu.

The applet allows results to be accumulated on any Internet connected box and sent to a common server (currently this server). The client server architecture allows me (and even you) to run the program on multiple boxes, speeding up the data collection process. The applet dynamically adjusts the number of results accumulated prior to initiating a UDP packet to the server. This is done to minimize network traffic. The applet will attempt to send a result set about once every couple of seconds.

The server side code is very primitive and cannot yet handle incomplete UDP packets, so it is possible for packet fragmentation to keep particular boxes from successfully sending results to the server. The maxmimum packet size selected worked during my testing but your mileage may vary.


Objectives

  1. Given the number of bits (n) and a target minimum hamming distance (MHD), find directly from experimental data the maximum number of codewords that can be selected from the full codeword set while maintaining the MHD constraint.

    This first objective has been achieved, at least insofar as data is available to support the inquiry.

    Maximum codeword set size lookup tool.

  2. Derive an analytical predictor of the maximum codeword set size given n and MHD.
  3. Given the number of bits (n) and a target minimum hamming distance (MHD), find from a predictive code derived from analysis of the experimental data, the maximum number of codewords that can be selected from the full codeword set while maintaining the MHD constraint.
Naturally occurring distributions can be compared and possibly fit to gain insight into how to construct a better code.


References

Beej's Guide to Network Programming

CodeHam Home