PTIFS software

The algorithm and software were developed by Mr. Zhanfeng Wang.

The program can also be downloaded from his website: http://202.38.64.10/~zfw/PTIFS.htm

Supplementary information (simulation results)

Source: Wang Z, Chang YC, Ying Z, Zhu L, Yang Y. PTIFS: A parsimonious threshold-independent protein feature selection method through the area under receiver operating characteristic curve. Bioinformatics. 2007 Oct 15;23(20):2788-9.

DOWNLOAD:

PTIFS program, flow chart of PTIFS algorithm, example, data set 1, data set 2

GENERAL DESCRIPTION:

The parsimonious threshold-independent protein feature selection (PTIFS) method was proposed for selecting protein (peptide) biomarkers from mass spectrometry data, but it can also be applied to other types of data, such as gene expression data. PTIFS is a parsimonious feature selection method that selects features (proteins) in a way similar to the LARS (least angle regression) method. The area under the receiver operating characteristic (ROC) curve, or AUC, is used as the criterion for selecting features. The threshold parameter is determined by cross-validation, which is why the method is described as threshold-independent. The current version of PTIFS is designed for two-class classification problems.
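To make the selection criterion concrete, the following is a minimal Python sketch of the empirical AUC for a single feature. It is not taken from PTIFS.f90; the function name `auc` and the sample values are hypothetical. It uses the standard rank-based definition: the fraction of (case, control) pairs in which the case value is larger, with ties counted as one half. No classification cutoff appears anywhere in the computation.

```python
def auc(case_values, control_values):
    """Empirical AUC of one feature (Mann-Whitney form).

    Returns the fraction of (case, control) pairs in which the case
    value exceeds the control value; ties contribute one half.
    """
    if not case_values or not control_values:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for x in case_values:
        for y in control_values:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


# Hypothetical intensities of one feature in four cases and four controls.
case = [2.1, 3.4, 1.9, 4.0]
control = [1.2, 2.0, 1.9, 0.7]
print(auc(case, control))  # 14.5 winning pairs out of 16 -> 0.90625
```

An AUC of 0.5 means the feature does not separate the two groups at all; values near 1 (or near 0, with the direction reversed) indicate strong separation.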

PROGRAM:

PTIFS is a release version of PTIFS.f90 and can be executed on Windows systems. The program is designed for two-class classification problems. It needs two input data files (casedata.txt and controldata.txt) and one parameter specification file (para.txt):

INPUT:

(1) Data sets (casedata.txt and controldata.txt)

The files casedata.txt and controldata.txt contain the data for the case (diseased) group and the control (normal) group, respectively. Note that in both files, rows correspond to features and columns to individuals.
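The layout can be checked before running the program with a short Python sketch like the one below. It assumes the values are whitespace-separated numbers, which this page does not state; the helper names `parse_matrix` and `load_matrix` and the sample values are hypothetical.

```python
def parse_matrix(text):
    """Parse a features-by-individuals matrix from text.

    Assumption: one feature per line, values separated by whitespace.
    """
    rows = [[float(v) for v in line.split()]
            for line in text.splitlines() if line.strip()]
    if not rows:
        raise ValueError("no data rows found")
    if len({len(r) for r in rows}) != 1:
        raise ValueError("rows have different numbers of columns")
    return rows


def load_matrix(path):
    """Read a data file such as casedata.txt or controldata.txt."""
    with open(path) as handle:
        return parse_matrix(handle.read())


# Hypothetical content: 2 features (rows) measured on 3 individuals (columns).
sample_text = "1.0 2.5 3.0\n4.2 5.1 6.3\n"
matrix = parse_matrix(sample_text)
n_features, n_individuals = len(matrix), len(matrix[0])
print(n_features, n_individuals)
```

The number of rows should be the same in both files, since each row is one feature, while the number of columns may differ between the case and control groups.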

(2) Parameter File (para.txt)

1st row: number of features

2nd row: sample sizes for case and control groups

3rd row: training sample sizes for case and control groups

4th row: K (number of partitions of the samples in K-fold cross-validation)

5th row: step size in the gradient descent optimization algorithm

6th row: lambda
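As an illustration of the six rows above, here is a minimal Python sketch that assembles the text of a parameter file. It assumes one parameter per line and a single space between the two values on rows 2 and 3; the exact format expected by the program is not specified on this page. The function name `build_para` and all numeric values are hypothetical.

```python
def build_para(n_features, n_case, n_control,
               n_case_train, n_control_train,
               k, step_size, lam):
    """Return the text of a para.txt file, one parameter row per line.

    Assumption: rows 2 and 3 hold two space-separated integers
    (case first, then control).
    """
    if not (0 < n_case_train <= n_case):
        raise ValueError("case training size must be in 1..n_case")
    if not (0 < n_control_train <= n_control):
        raise ValueError("control training size must be in 1..n_control")
    if k < 2:
        raise ValueError("K-fold cross-validation needs K >= 2")
    lines = [
        str(n_features),                       # row 1: number of features
        f"{n_case} {n_control}",               # row 2: sample sizes
        f"{n_case_train} {n_control_train}",   # row 3: training sample sizes
        str(k),                                # row 4: K
        str(step_size),                        # row 5: gradient descent step size
        str(lam),                              # row 6: lambda
    ]
    return "\n".join(lines) + "\n"


# Hypothetical settings: 100 features, 40 cases and 30 controls,
# of which 30 cases and 20 controls are used for training.
para_text = build_para(100, 40, 30, 30, 20, 5, 0.01, 0.1)
print(para_text)
```

Writing `para_text` to a file named para.txt in the program folder would complete the step; the samples not assigned to training are presumably the ones used as the testing data set reported in the output.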

OUTPUT:

Results are stored in the file results.txt:

1st row: number of features selected

2nd row: indices of the selected features

3rd row: threshold parameter tau selected by cross-validation

4th row: AUCs for training and testing data sets

5th row: classification result for training data set

6th row: classification result for testing data set
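The rows above could be read back for further analysis with a Python sketch such as the following. The exact layout of results.txt is an assumption here: six non-empty lines with whitespace-separated values, and the two AUCs on row 4 in the order training, testing. Because the format of the classification results on rows 5 and 6 is not described, those rows are kept as raw tokens. The function name `parse_results` and the sample content are hypothetical.

```python
def parse_results(text):
    """Parse the six result rows into a dictionary (assumed layout)."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if len(rows) < 6:
        raise ValueError("expected at least six non-empty rows")
    result = {
        "n_selected": int(rows[0][0]),                     # row 1
        "selected_features": [int(v) for v in rows[1]],    # row 2
        "tau": float(rows[2][0]),                          # row 3
        "auc_train": float(rows[3][0]),                    # row 4, first value
        "auc_test": float(rows[3][1]),                     # row 4, second value
        "train_classification": rows[4],                   # row 5, raw tokens
        "test_classification": rows[5],                    # row 6, raw tokens
    }
    # Rows 1 and 2 describe the same selection, so they should agree.
    if result["n_selected"] != len(result["selected_features"]):
        raise ValueError("row 1 does not match the number of indices in row 2")
    return result


# Hypothetical file content, for illustration only (not real PTIFS output).
sample_results = "3\n12 45 78\n0.25\n0.95 0.90\n1 1 0 0\n1 0\n"
parsed = parse_results(sample_results)
print(parsed["selected_features"], parsed["tau"])
```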

EXAMPLE:

An example session is available here. Download and extract it; the extracted folder contains the input data sets (casedata.txt and controldata.txt) and the parameter file (para.txt). Click PTIFS.exe to run the program. Results are stored in results.txt.

Last updated July 30, 2007