-----------------------
PoooL v1.0 for Windows
-----------------------


Department of Statistics and Finance,
University of Science and Technology of China.
Written by
                          Han Zhang

in collaboration with
                      Dr. Hsin-Chou Yang
                  and
                      Prof. Yaning Yang

Latest revision: April 2008

I'd like to thank my BEST friend, Jianming Wu, for his technical support with the programming.

Please visit
http://home.ustc.edu.cn/~zhanghan
or
http://staff.ustc.edu.cn/~ynyang
for updated versions of PoooL.

--------------------------
Introduction
--------------------------

PoooL is new software accompanying our paper,
PoooL: An efficient algorithm for estimating haplotype frequencies from large DNA pools,
which has recently been submitted to Bioinformatics. PoooL efficiently estimates haplotype
frequencies from DNA pools of arbitrarily large size. We reformulate the EM-based model
as a constrained maximum entropy model and then solve it with the improved iterative scaling
(IIS) algorithm. Simulations show that the computational complexity of our algorithm is
independent of pool size, so its computational efficiency for large pools is substantially
improved over existing estimation methods.
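The IIS step can be illustrated with a minimal, generic sketch of improved iterative scaling
in Python. It minimises sum_j p_j*log(p_j/r_j) over a finite set of outcomes subject to
constraints sum_j p_j*f_k(j) = a_k with 0/1 features f_k; the solution has the form
p_j proportional to r_j*exp(sum_k lambda_k*f_k(j)). This is not PoooL's source code: the
function name iis, its arguments and the toy constraints (two loci, allele frequencies 0.7
and 0.4, uniform reference) are hypothetical, and the constraints PoooL actually builds from
the pooled data are described in the paper.

```python
import math


def iis(ref, features, targets, max_iter=2000, tol=1e-12):
    """Improved iterative scaling (generic sketch, not PoooL's code).

    Minimises sum_j p_j*log(p_j/ref_j) subject to
    sum_j p_j*features[k][j] == targets[k] for every k.
    features[k][j] must be 0 or 1; ref must be positive.
    """
    n = len(ref)
    # f#(j): number of features active on outcome j (used by the IIS update)
    fsharp = [sum(f[j] for f in features) for j in range(n)]
    lam = [0.0] * len(features)

    def model(lam):
        # p_j is proportional to ref_j * exp(sum_k lam_k * f_k(j))
        w = [ref[j] * math.exp(sum(l * f[j] for l, f in zip(lam, features)))
             for j in range(n)]
        s = sum(w)
        return [x / s for x in w]

    p = model(lam)
    for _ in range(max_iter):
        deltas = []
        for f, a in zip(features, targets):
            # Solve sum_j p_j*f_j*exp(d*f#_j) == a for d by bisection; the
            # left side increases with d because f#_j >= 1 wherever f_j == 1.
            lo, hi = -50.0, 50.0
            for _ in range(100):
                mid = 0.5 * (lo + hi)
                val = sum(p[j] * math.exp(mid * fsharp[j])
                          for j in range(n) if f[j])
                if val < a:
                    lo = mid
                else:
                    hi = mid
            deltas.append(0.5 * (lo + hi))
        # All deltas are computed from the same model, then applied together.
        lam = [l + d for l, d in zip(lam, deltas)]
        p = model(lam)
        if max(abs(d) for d in deltas) < tol:
            break
    return p


# Toy problem: outcome j encodes two loci as bits (locus 1 = bit 0).
features = [[j & 1 for j in range(4)],          # allele at locus 1
            [(j >> 1) & 1 for j in range(4)]]   # allele at locus 2
p = iis([1.0] * 4, features, [0.7, 0.4])
print([round(x, 4) for x in p])  # close to [0.18, 0.42, 0.12, 0.28]
```

With only single-locus constraints and a uniform reference, the maximum entropy solution is
the product of the marginals, which gives an easy correctness check for the sketch.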

--------------------------
Usage of the PoooL software.
--------------------------

Users need to provide a data file (real_data.txt) and a supporting parameter file (para.txt).

(1). para.txt

This file specifies the pool size, the number of pools and the number of loci.
For example, a typical "para.txt" contains the following three lines:

pool size    >>100
pool number  >>30
loci         >>3

The above parameters mean that the pool size is N=100, the number of pools is T=30 and the
number of loci is q=3.
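A minimal Python sketch of reading para.txt, assuming (from the example above) that each
line holds a label, then ">>", then an integer. The function name parse_para is
hypothetical, and PoooL's own parser may be stricter about spacing or line order than
this one.

```python
def parse_para(text):
    """Parse the contents of para.txt into (N, T, q).

    Assumes each line looks like '<label> >><integer>'.
    """
    values = {}
    for line in text.splitlines():
        if ">>" not in line:
            continue  # skip blank or unrecognised lines
        label, _, value = line.partition(">>")
        # Collapse runs of spaces so 'pool size    ' becomes 'pool size'
        values[" ".join(label.split())] = int(value.strip())
    return values["pool size"], values["pool number"], values["loci"]


example = "pool size    >>100\npool number  >>30\nloci         >>3\n"
N, T, q = parse_para(example)
print(N, T, q)  # 100 30 3
```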

(2). real_data.txt

The data file has T lines (one per pool) and q columns (one per locus).
Each line looks like this:

	allele_freq_locus_1	allele_freq_locus_2 ... 	allele_freq_locus_q

where allele_freq_locus_k is the allele frequency at the k-th locus in that pool, k=1,2,...,q.
The fields must be separated by tabs.
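The following Python sketch reads real_data.txt and checks it against the T and q given in
para.txt. The function name read_real_data and the validation rules (exactly q
tab-separated fields per line, values in [0, 1], exactly T non-blank lines) are assumptions
chosen for illustration, not documented PoooL behaviour. The sample input uses the first
three columns of the first two rows of the example data in the next section.

```python
def read_real_data(text, T, q):
    """Parse real_data.txt contents: T lines of q tab-separated allele frequencies."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines, e.g. a trailing newline
        fields = line.strip().split("\t")
        if len(fields) != q:
            raise ValueError(f"expected {q} fields, got {len(fields)}: {line!r}")
        row = [float(x) for x in fields]
        if any(not 0.0 <= x <= 1.0 for x in row):
            raise ValueError(f"allele frequency outside [0, 1]: {line!r}")
        rows.append(row)
    if len(rows) != T:
        raise ValueError(f"expected {T} pools, got {len(rows)}")
    return rows


sample = "0.890000\t0.970000\t0.557500\n0.865000\t0.985000\t0.585000\n"
print(read_real_data(sample, T=2, q=3))
# [[0.89, 0.97, 0.5575], [0.865, 0.985, 0.585]]
```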

After setting up the two files, double-click "PoooL v1.0.exe" and press Enter to run the program.
When the program terminates, the output is shown on the screen; press "q" and then Enter to quit.

--------------------------
Example
--------------------------

The two files, para.txt and real_data.txt, are placed in the
same folder as the software.

(1). Parameter settings in para.txt are as follows:

pool size    >>200
pool number  >>30
loci         >>10

(2). Pooled genotype data in real_data.txt are as follows:

0.890000	0.970000	0.557500	...	0.947500
0.865000	0.985000	0.585000	...	0.965000
...	...
0.855000	0.985000	0.572500	...	0.955000


(3). Double-click PoooL v1.0.exe and press Enter to start the program. The output
looks like the following:

------------------------output----------------------------------------

entropy= 5.147860


====================================

loci= 10
pool number= 30
pool size= 200

idx     hap     freq

[272]   0000100010  >>0.028
[617]   1001011001  >>0.027
[619]   1101011001  >>0.025
[718]   0111001101  >>0.021
[751]   1111011101  >>0.493
[963]   1100001111  >>0.012
[970]   0101001111  >>0.102
[1003]  1101011111  >>0.226
[1023]  1111111111  >>0.023

-------------------------end of output------------------------------------


where "entropy" is the maximum entropy value sum_j p_j*log(p_j/Rh_j).
From this output, the frequency of haplotype 751 is estimated to be 0.493, while
its true value is 0.5075 (see the remark below for all true values).
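For reference, the entropy formula can be evaluated with the short Python function below.
The README does not define Rh_j or the base of the logarithm; this sketch assumes Rh is a
list of positive reference weights aligned with p and uses the natural logarithm, taking
terms with p_j = 0 as 0. Because Rh is not given here, it cannot reproduce the value
5.147860 printed above. The name relative_entropy is hypothetical.

```python
import math


def relative_entropy(p, rh):
    """Return sum_j p_j*log(p_j/rh_j), skipping terms with p_j == 0 (0*log 0 = 0)."""
    return sum(pj * math.log(pj / rj) for pj, rj in zip(p, rh) if pj > 0)


# Toy check: half the mass on each of two outcomes, uniform reference over four.
print(relative_entropy([0.5, 0.5, 0.0, 0.0], [0.25] * 4))  # log(2), about 0.6931
```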



Remark: in the above example, the true haplotype frequencies are as follows (zero frequencies are not listed):

p111=0.0333
p491=0.0166
p587=0.0167
p617=0.0167
p619=0.0167
p751=0.5075
p922=0.0167
p963=0.0333
p970=0.1000
p1003=0.1925
p1023=0.0500
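The haplotype indices in the output and in the list above can be decoded as in the Python
sketch below. The bit order is inferred from the example output rather than stated in the
README: index 751 is printed as 1111011101 and 1023 as 1111111111, which fits reading the
k-th character as bit k-1 of a 0-based index. The helper names are hypothetical. Summing
the true haplotype frequencies per locus gives the allele frequencies they imply.

```python
def hap_string(idx, q):
    """Haplotype string for a 0-based index; character k is bit k of idx."""
    return "".join(str((idx >> k) & 1) for k in range(q))


def hap_index(hap):
    """Inverse of hap_string: character k contributes 2**k."""
    return sum(int(c) << k for k, c in enumerate(hap))


def allele_freqs(hap_freqs, q):
    """Frequency of allele '1' at each locus implied by {index: frequency}."""
    return [sum(f for idx, f in hap_freqs.items() if (idx >> k) & 1)
            for k in range(q)]


# True frequencies from the remark above (zero frequencies omitted).
true_freqs = {111: 0.0333, 491: 0.0166, 587: 0.0167, 617: 0.0167,
              619: 0.0167, 751: 0.5075, 922: 0.0167, 963: 0.0333,
              970: 0.1000, 1003: 0.1925, 1023: 0.0500}

print(hap_string(751, 10))      # 1111011101
print(hap_index("1101011111"))  # 1003
print([round(x, 4) for x in allele_freqs(true_freqs, 10)[:3]])
# [0.8833, 0.9833, 0.5908]
```

The implied values for the first three loci are close to the first three columns of
real_data.txt shown above, which suggests that each column holds the frequency of the
allele coded 1; this too is an inference from the example.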