-----------------------
PoooL v1.0 for Windows
-----------------------
Department of Statistics and Finance,
University of Science and Technology of China.

Written by Han Zhang in collaboration with Dr. Hsin-Chou Yang and
Prof. Yaning Yang.
Latest revision: April 2008

I'd like to thank my best friend, Jianming Wu, for his technical support
in programming.

Please visit
    http://home.ustc.edu.cn/~zhanghan
or
    http://staff.ustc.edu.cn/~ynyang
for upgraded versions of PoooL.

--------------------------
Introduction
--------------------------
PoooL is new software accompanying our paper, "PoooL: An efficient
algorithm for estimating haplotype frequencies from large DNA pools",
which was submitted to Bioinformatics recently. PoooL can efficiently
estimate haplotype frequencies from DNA pools of arbitrarily large size.
We reformulate the EM-based model as a constrained maximum entropy model
and solve it with the IIS algorithm. Simulations show that the
computational complexity of our algorithm is independent of pool size, so
the computational efficiency for large pools is substantially improved
over existing estimation methods.

--------------------------
Usage of the PoooL software
--------------------------
The user needs to provide a data file (real_data.txt) and a supporting
file (para.txt).

(1). para.txt

This file specifies the parameters: pool size, number of pools and number
of loci. For example, a typical "para.txt" has the following three lines:

pool size >>100
pool number >>30
loci >>3

These parameters mean that the pool size is N=100, the number of pools is
T=30 and the number of loci is q=3.

(2). real_data.txt

The data file has T lines (T is the number of pools) and q columns (one
for each of the q loci). Each line looks like this:

allele_freq_locus_1  allele_freq_locus_2  ...  allele_freq_locus_q

where allele_freq_locus_k is the allele frequency at the k-th locus,
k=1,2,...,q. The fields must be separated by tabs.
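The two input files described above can also be generated by a script
instead of by hand. The following is a minimal Python sketch, not part of
PoooL itself. The function names (write_para, write_real_data, check_shape)
are hypothetical, and two details are assumptions taken from the examples
in this README rather than from a documented specification: that no space
follows ">>" in para.txt, and that frequencies are written with six
decimal places.

```python
import os
import tempfile


def write_para(path, pool_size, pool_number, loci):
    """Write the three parameter lines (layout assumed from the README)."""
    lines = [
        f"pool size >>{pool_size}",
        f"pool number >>{pool_number}",
        f"loci >>{loci}",
    ]
    with open(path, "w") as fh:
        fh.write("\n".join(lines) + "\n")


def write_real_data(path, freqs):
    """Write one tab-separated line of allele frequencies per pool."""
    with open(path, "w") as fh:
        for row in freqs:
            fh.write("\t".join(f"{x:.6f}" for x in row) + "\n")


def check_shape(freqs, pool_number, loci):
    """Return True if freqs has T rows and q columns with values in [0, 1]."""
    if len(freqs) != pool_number:
        return False
    return all(
        len(row) == loci and all(0.0 <= x <= 1.0 for x in row)
        for row in freqs
    )


# Toy input: T=2 pools, q=3 loci (made-up values for illustration only).
toy = [[0.89, 0.97, 0.5575], [0.865, 0.985, 0.585]]
out_dir = tempfile.mkdtemp()
if check_shape(toy, pool_number=2, loci=3):
    write_para(os.path.join(out_dir, "para.txt"), 100, 2, 3)
    write_real_data(os.path.join(out_dir, "real_data.txt"), toy)
```

Checking the shape before writing catches the most likely mistake, a data
file whose row or column count does not match T and q in para.txt.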
After setting up the two files above, double-click "PoooL v1.0.exe" and
press Enter to run the program. When the program terminates, the output is
shown on the screen; type "q" and press Enter to quit.

--------------------------
Example
--------------------------
Two files, para.txt and real_data.txt, are stored in the same folder as
the software.

(1). Parameter settings in para.txt are as follows:

pool size >>200
pool number >>30
loci >>10

(2). Pooled genotype data in real_data.txt are as follows:

0.890000  0.970000  0.557500  ...  0.947500
0.865000  0.985000  0.585000  ...  0.965000
...
...
0.855000  0.985000  0.572500  ...  0.955000

(3). Double-click "PoooL v1.0.exe" and press Enter to start the program.
The output looks like the following:

------------------------output----------------------------------------
entropy= 5.147860
====================================
loci= 10
pool number= 30
pool size= 200
idx      hap          freq
[272]    0000100010   >>0.028
[617]    1001011001   >>0.027
[619]    1101011001   >>0.025
[718]    0111001101   >>0.021
[751]    1111011101   >>0.493
[963]    1100001111   >>0.012
[970]    0101001111   >>0.102
[1003]   1101011111   >>0.226
[1023]   1111111111   >>0.023
-------------------------end of output------------------------------------

where "entropy" is the maximum entropy value sum_j p_j*log(p_j/Rh_j).
From this output, the frequency of the 751st haplotype is estimated to be
0.493; its true value is 0.5075 (see the remark below for the true
values).

Remark: in the above example, the true haplotype frequencies are (zero
frequencies are not listed)

p111=0.0333
p491=0.0166
p587=0.0167
p617=0.0167
p619=0.0167
p751=0.5075
p922=0.0167
p963=0.0333
p970=0.1000
p1003=0.1925
p1023=0.0500
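In the example output, the "idx" column appears to be the haplotype
string read as a binary number with the first locus as the least
significant bit (for instance, 1111011101 gives 751). This is inferred
from the nine rows listed above and is not a documented rule. The Python
sketch below, with hypothetical helper names hap_to_idx and idx_to_hap,
checks that reading against every listed row and then compares the
estimated frequencies with the true values from the remark, treating any
haplotype missing from either list as having frequency zero.

```python
def hap_to_idx(hap):
    """Index of a 0/1 haplotype string, first locus = least significant bit."""
    return sum(int(bit) << k for k, bit in enumerate(hap))


def idx_to_hap(idx, loci):
    """Inverse of hap_to_idx for a haplotype spanning `loci` loci."""
    return "".join(str((idx >> k) & 1) for k in range(loci))


# Rows copied from the example output: (idx, hap, estimated freq).
output_rows = [
    (272, "0000100010", 0.028), (617, "1001011001", 0.027),
    (619, "1101011001", 0.025), (718, "0111001101", 0.021),
    (751, "1111011101", 0.493), (963, "1100001111", 0.012),
    (970, "0101001111", 0.102), (1003, "1101011111", 0.226),
    (1023, "1111111111", 0.023),
]

# True frequencies copied from the remark (zero frequencies omitted).
true_freq = {
    111: 0.0333, 491: 0.0166, 587: 0.0167, 617: 0.0167, 619: 0.0167,
    751: 0.5075, 922: 0.0167, 963: 0.0333, 970: 0.1000, 1003: 0.1925,
    1023: 0.0500,
}

# Every listed row is consistent with the inferred indexing rule.
consistent = all(hap_to_idx(hap) == idx for idx, hap, _ in output_rows)

# Absolute error per haplotype over the union of both index sets.
estimated = {idx: freq for idx, _, freq in output_rows}
errors = {
    i: abs(estimated.get(i, 0.0) - true_freq.get(i, 0.0))
    for i in sorted(set(estimated) | set(true_freq))
}

print("indexing rule holds for all rows:", consistent)
for i, err in errors.items():
    print(f"p{i}  {idx_to_hap(i, 10)}  abs error = {err:.4f}")
```

Using the union of indices means that haplotypes reported by the program
but absent from the truth (such as idx 272) and true haplotypes that the
program did not report (such as idx 111) both show up in the comparison.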