Author Info.

Lab. website


WinHAP 2.0

Phasing Software Package


WinHAP2.0 is a significantly improved version of the WinHAP program that both processes longer sequences with less memory and computes faster. We used a Segmenting-Merging strategy to phase longer sequences and parallelized the program with the OpenMP programming model to speed it up. We tested it on large-scale datasets and found that the new algorithm could phase 500 genotypes with 100,000 SNPs using just 12.8 MB of memory on a personal computer, which is nearly impossible for any other existing tool. WinHAP2 also improved computing speed 10-fold with 16 threads. The parallelized WinHAP2 is several orders of magnitude faster than existing large-scale phasing algorithms.


The software WinHAP2.0 is free for non-commercial use. Linux and Windows executables are available for download:

The large-scale genotype dataset for testing can be downloaded here:

The original code can be downloaded here:

The code is currently somewhat disorganized. We will clean it up later.

If the buttons do not work, you can send an e-mail to the authors that includes:

Subject: Request for WinHAP Package
1. Name.
2. Affiliation.
3. Version of operating system.
4. Maximum data size (# of genotypes in your data / # of SNPs in your data).

Program Usage

You can run the executable directly as follows:

$ ./WinHap2 genotypesfile haplotypesfile [options]

genotypesfile :      The input genotypes file. Default value = "./genotypes.in".

haplotypesfile :     The output haplotypes file. Default value = "./haplotypes.out".


-b (=1000)       Length of each segment except for the last one. The value should be larger than 1,000.

-p (=1)            Number of threads used to compute.

-d (="./")         Directory for temporary files. The software creates a folder named "winhap" in this
                       directory to store temporary files needed during computation.

-s (=0)            If the value is 0, the "winhap" folder is deleted after the result is obtained. If the value is 1,
                       the folder is kept. No other value is permitted. In the folder, "subfile_*"
                       are the genotype segments produced by the first phase, "subresult_*" are the haplotype results
                       for the "subfile_*" files, and "mergeresult_*" are the temporary files of the last phase.
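The options above can be assembled into a command line programmatically. The following is a minimal Python sketch, not part of WinHAP itself: the function name build_winhap_command and its parameter names are hypothetical, and it assumes each option takes its value as a separate, space-separated argument (for example "-b 2000"), which this page does not state explicitly. Because 1000 is listed as the default for -b, the sketch accepts 1000 and above.

```python
def build_winhap_command(genotypes_file="./genotypes.in",
                         haplotypes_file="./haplotypes.out",
                         segment_length=1000,
                         threads=1,
                         temp_dir="./",
                         keep_temp=0,
                         executable="./WinHap2"):
    """Return the argument list for one WinHap2 run (hypothetical helper)."""
    # -b: segment length. 1000 is the documented default, so it is accepted here.
    if segment_length < 1000:
        raise ValueError("-b should be at least 1000")
    # -p: number of threads.
    if threads < 1:
        raise ValueError("-p must be a positive number of threads")
    # -s: only 0 (delete the winhap folder) or 1 (keep it) is permitted.
    if keep_temp not in (0, 1):
        raise ValueError("-s accepts only 0 or 1")
    # Assumption: option values are passed as separate arguments.
    return [executable, genotypes_file, haplotypes_file,
            "-b", str(segment_length),
            "-p", str(threads),
            "-d", temp_dir,
            "-s", str(keep_temp)]
```

The returned list can be handed to subprocess.run(cmd, check=True) from the Python standard library. If the executable expects a different option syntax, only the final return statement needs to change.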

If you can't remember the parameters of the program, you can get help by running:

$ ./WinHap2 --help

The program can also be run using the default parameters as follows:

$ ./WinHap2

To use the default parameters, make sure the file genotypes.in exists and is not empty.

Input file format:

    One line per genotype; SNP values are in {0,1,2,?}
    0 - homozygous SNP with major allele
    1 - homozygous SNP with minor allele
    2 - heterozygous SNP
    ? - missing data
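Before running the program, it can help to check that an input file follows this format. The following is a minimal Python sketch, not part of WinHAP: the function names are hypothetical, and since this page does not say whether SNP values are separated by whitespace, the sketch accepts both contiguous ("0120?") and whitespace-separated ("0 1 2 0 ?") lines. It also assumes every genotype must have the same number of SNPs.

```python
VALID_GENOTYPE_CODES = set("012?")


def parse_genotype_line(line):
    """Return one genotype as a string of codes, with any whitespace removed."""
    codes = "".join(line.split())
    invalid = set(codes) - VALID_GENOTYPE_CODES
    if invalid:
        raise ValueError("invalid SNP value(s): " + ", ".join(sorted(invalid)))
    return codes


def parse_genotypes(text):
    """Parse the contents of a genotypes file into a list of code strings."""
    # Blank lines are skipped; every remaining line is one genotype.
    rows = [parse_genotype_line(line) for line in text.splitlines() if line.strip()]
    if not rows:
        raise ValueError("no genotypes found")
    # Assumption: all genotypes cover the same SNPs, so lengths must match.
    if len({len(row) for row in rows}) != 1:
        raise ValueError("genotypes have different numbers of SNPs")
    return rows
```

For example, parse_genotypes(open("genotypes.in").read()) returns one string per genotype, or raises ValueError on the first problem it finds.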

Output file format:

Two haplotypes per genotype.

    One line per haplotype; SNP values are in {0,1}
    0 - major allele SNP
    1 - minor allele SNP
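Taken together, the input and output codes imply a simple consistency rule: at a SNP where the genotype is 0 or 1, both haplotypes carry that allele; where it is 2, the two haplotypes differ; where it is ?, any pair of alleles is acceptable. The following is a minimal Python sketch of that check, not part of WinHAP. The function names are hypothetical, and it assumes the two haplotypes of genotype i appear on consecutive output lines (2i and 2i+1, counting from 0), in the same order as the input genotypes. Each row is a string of codes with whitespace already removed.

```python
def pair_matches_genotype(genotype, hap_a, hap_b):
    """Return True if the two haplotypes are consistent with the genotype."""
    if not (len(genotype) == len(hap_a) == len(hap_b)):
        return False
    for g, a, b in zip(genotype, hap_a, hap_b):
        if a not in ("0", "1") or b not in ("0", "1"):
            return False          # haplotype values must be 0 or 1
        if g == "0":
            if (a, b) != ("0", "0"):
                return False      # homozygous major allele
        elif g == "1":
            if (a, b) != ("1", "1"):
                return False      # homozygous minor allele
        elif g == "2":
            if a == b:
                return False      # heterozygous: alleles must differ
        elif g != "?":
            return False          # unknown genotype code
        # g == "?" (missing data) accepts any pair of alleles
    return True


def mismatched_genotypes(genotypes, haplotypes):
    """Return the indices of genotypes whose haplotype pair is inconsistent."""
    if len(haplotypes) != 2 * len(genotypes):
        raise ValueError("expected two haplotypes per genotype")
    # Assumption: haplotypes 2*i and 2*i+1 belong to genotype i.
    return [i for i, g in enumerate(genotypes)
            if not pair_matches_genotype(g, haplotypes[2 * i], haplotypes[2 * i + 1])]
```

An empty list from mismatched_genotypes means every haplotype pair explains its genotype under the stated ordering assumption.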

Sample input and output:

    The input file genotypes.in contains 9 genotypes, each with 96 SNPs.

    The output file haplotypes.out is the result of phasing genotypes.in with WinHAP; it contains 18 haplotypes, each with 96 SNPs.


If you use WinHAP in published work, please report the WinHAP version used and cite the publication:


Weihua Pan
Phone: +8615856386154
Email: whpan@mail.ustc.edu.cn
Office: 502, NHPCC(Hefei), USTC, Jinzhai Road
Homepage: http://home.ustc.edu.cn/~whpan/

Yun Xu
Phone: +86-551-3602441
Email: xuyun@ustc.edu.cn
Homepage: http://staff.ustc.edu.cn/~xuyun/

Since July 1, 2013.