wclique Marker Selection How-To Document

How to use wclique for selecting genetic or Radiation Hybrid (RH) markers for framework mapping.

Introduction

wclique is a C++ program that helps select genetic or RH markers for framework mapping. wclique is copyrighted material, but may be freely distributed under the terms of the GNU General Public License.

Synopsis

% wclique [-r] [-b {breaks}] [-e {exact}] [-i {iters}] {input file}
-r
toggles optional weighting of markers by retention frequency instead of the default, informativity. May be of interest when selecting RH markers.
-b
followed by an integer specifies the minimum number of breaks allowed between selected markers. The default value is 1.
-e
followed by an integer specifies the maximum number of test markers to exhaustively search. The default is 50. Increasing this number slows execution, but may find better cliques. Setting this value to the number of markers in the input file guarantees finding the optimal solution, but is prohibitively slow for more than 150 markers.
-i
followed by an integer specifies the number of random trials to run when searching for cliques. If the -e parameter is set to the number of markers in the file, this parameter may be reduced to 1. The default value is 100.
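
To make the cost of a fully exhaustive search concrete, here is a minimal Python sketch that enumerates every subset of markers. It is not wclique's partial enumeration algorithm. It rests on two assumptions inferred from the sample outputs in this document, not taken from the wclique source: a break between two markers is a chromosome on which both phases are known and differ, and informativity is the number of chromosomes on which a marker's phase is known. The function names are illustrative, and ROWS is copied from the sample file in the Input file section.

```python
from itertools import combinations

# Phase rows from the sample input file, one string per chromosome.
ROWS = ["PMUP", "UPPM", "PUUM", "MMPM", "MUPU"]

def breaks(rows, i, j):
    # Assumed definition: both phases known ("P" or "M") and different.
    return sum(1 for r in rows if r[i] != "U" and r[j] != "U" and r[i] != r[j])

def informativity(rows, i):
    # Assumed weight: number of chromosomes where the marker's phase is known.
    return sum(1 for r in rows if r[i] != "U")

def best_clique(rows, min_breaks=1):
    """Brute-force search over all marker subsets (exponential in marker count)."""
    m = len(rows[0])
    best_weight, best_set = 0, ()
    for size in range(1, m + 1):
        for subset in combinations(range(m), size):
            # A subset is a clique if every pair has enough breaks.
            if all(breaks(rows, a, b) >= min_breaks
                   for a, b in combinations(subset, 2)):
                weight = sum(informativity(rows, i) for i in subset)
                if weight > best_weight:
                    best_weight, best_set = weight, subset
    # Report 1-based marker indices, as wclique does.
    return best_weight, [i + 1 for i in best_set]

print(best_clique(ROWS, min_breaks=1))  # prints (14, [1, 2, 3, 4])
```

On the four-marker sample this agrees with the clique of weight 14 shown under Output format. The number of subsets doubles with every added marker, which is why a full enumeration becomes impractical for large marker sets.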

Input file

wclique requires an input file that contains an integer M, the number of markers to be analyzed, followed by M lines of marker names; then an integer N, the number of chromosomes to be analyzed, followed by N lines of M characters each, giving the phase of each marker on that chromosome as "P", "M", or "U".

Here's a (small) sample file with 4 markers and 5 chromosomes.

4
NAME1
NAME2
NAME3
NAME4
5
PMUP
UPPM
PUUM
MMPM
MUPU
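
The layout above can be read with a few lines of Python. This is a minimal sketch, not part of the wclique distribution; the function name parse_wclique_input is hypothetical, and the checks only enforce what the format description states (M names, N rows of M characters drawn from "P", "M", and "U").

```python
def parse_wclique_input(text):
    """Split wclique input text into marker names and per-chromosome phase rows."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    m = int(lines[0])                  # M: number of markers
    names = lines[1:1 + m]             # M marker names
    n = int(lines[1 + m])              # N: number of chromosomes
    rows = lines[2 + m:2 + m + n]      # N rows of M phase characters
    if len(names) != m or len(rows) != n:
        raise ValueError("file is shorter than its declared counts")
    for row in rows:
        if len(row) != m or set(row) - set("PMU"):
            raise ValueError(f"bad phase row: {row!r}")
    return names, rows

SAMPLE = """4
NAME1
NAME2
NAME3
NAME4
5
PMUP
UPPM
PUUM
MMPM
MUPU
"""

names, rows = parse_wclique_input(SAMPLE)
print(names)  # prints ['NAME1', 'NAME2', 'NAME3', 'NAME4']
print(rows)   # prints ['PMUP', 'UPPM', 'PUUM', 'MMPM', 'MUPU']
```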

For genetic mapping we use origins, from the BPE package, to infer the grandparental origins of a set of markers in a three-generation pedigree, then filter the labels file with a Perl script (mwc-prep.perl). Here is how we typically use the programs:

% origins -p pedin.dat -d datain.dat -r /dev/null -l labels.dat
% mwc-prep labels.dat > wclique.in
% wclique wclique.in

For RH mapping we also have a Perl script (rh-prep.perl) that converts radmap input files into wclique format. Given a .mat and a .l file from radmap, it produces the input file. Here is how we typically use the programs:

% rh-prep map.mat map.l > rhclique.in
% wclique -r rhclique.in

Output format

wclique writes a series of cliques to its standard output. Each is a maximal weight clique, reported as it is discovered by the partial enumeration algorithm. The final clique in the output is a maximal weight clique, although there may be others before it of equal weight:

Minimum number of breaks = 1
Found clique of size 4 , weight 14
1 2 3 4

The first line states the minimum number of breaks allowed between any two markers in the clique. Subsequent pairs of lines give the clique size and weight, then list the indices of the clique markers (for example, "1 2 3" means the first, second, and third markers in the file form part of a clique).

The output is similar when run with the -r option:

Using retained weight function
Minimum number of breaks = 1
Found clique of size 4 , weight 7
1 2 3 4 
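
Because the last clique reported is the one to keep, a script that consumes wclique output only needs the final "Found clique" pair of lines. The following Python sketch shows one way to extract it; the function name parse_cliques and the regular expression are assumptions based solely on the two sample outputs above.

```python
import re

def parse_cliques(output):
    """Return (size, weight, indices) for each clique reported in wclique output."""
    cliques = []
    lines = output.splitlines()
    for k, line in enumerate(lines):
        # Matches e.g. "Found clique of size 4 , weight 7".
        match = re.match(r"Found clique of size\s+(\d+)\s*,\s*weight\s+(\d+)", line)
        if match and k + 1 < len(lines):
            # The marker indices are on the line after the size/weight line.
            indices = [int(tok) for tok in lines[k + 1].split()]
            cliques.append((int(match.group(1)), int(match.group(2)), indices))
    return cliques

OUTPUT = """Using retained weight function
Minimum number of breaks = 1
Found clique of size 4 , weight 7
1 2 3 4
"""

print(parse_cliques(OUTPUT)[-1])  # prints (4, 7, [1, 2, 3, 4])
```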

We have two other Perl tools. The first, getnames.perl, reads an input file in wclique format and a list of clique marker indices, and outputs the name of each marker in the list.

% cat test.set
1 2 3
% getnames wclique.in test.set
NAME1
NAME2
NAME3
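
The lookup that getnames performs amounts to mapping 1-based indices onto the name lines of the input file. Here is a rough Python equivalent, offered as a sketch rather than a port of getnames.perl; the function name marker_names is hypothetical.

```python
def marker_names(input_text, indices_text):
    """Map 1-based marker indices to the names listed in a wclique input file."""
    lines = [ln.strip() for ln in input_text.splitlines() if ln.strip()]
    m = int(lines[0])          # number of markers
    names = lines[1:1 + m]     # the M name lines follow the count
    return [names[int(tok) - 1] for tok in indices_text.split()]

WCLIQUE_IN = "4\nNAME1\nNAME2\nNAME3\nNAME4\n5\nPMUP\nUPPM\nPUUM\nMMPM\nMUPU\n"

for name in marker_names(WCLIQUE_IN, "1 2 3"):
    print(name)
```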

The second, verify.perl, reads an input file, a list of marker indices, and a minimum number of breaks, and checks that every pair of markers in the list does in fact have the required number of breaks:

% verify wclique.in test.set 3
Verifying that all markers have at least 3 breaks.
Markers = 4
Chromosomes = 5
Marker 1 has only 1 breaks with marker 2.
Marker 1 has only 2 breaks with marker 3.
Marker 2 has only 1 breaks with marker 3.
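
The same check can be sketched in Python. The break definition used here is an assumption: a chromosome counts as a break between two markers when both phases are known ("P" or "M") and differ. It is not taken from verify.perl, but it does reproduce the three counts reported above for the sample file. The function name short_pairs is hypothetical.

```python
from itertools import combinations

# Phase rows from the sample input file, one string per chromosome.
ROWS = ["PMUP", "UPPM", "PUUM", "MMPM", "MUPU"]

def short_pairs(rows, indices, min_breaks):
    """List (marker_a, marker_b, breaks) for pairs with fewer than min_breaks breaks."""
    failures = []
    for a, b in combinations(indices, 2):
        # Assumed definition of a break: both phases known and different.
        count = sum(1 for r in rows
                    if r[a - 1] != "U" and r[b - 1] != "U" and r[a - 1] != r[b - 1])
        if count < min_breaks:
            failures.append((a, b, count))
    return failures

for a, b, count in short_pairs(ROWS, [1, 2, 3], 3):
    print(f"Marker {a} has only {count} breaks with marker {b}.")
```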

Humberto Ortiz Zuazaga
humberto@hpcf.upr.edu

Most recent change: 2005/1/6 at 11:58