Application of Genetic Programming (GP) in Prediction of Gas Chromatographic Retention Time of some Pesticides

Mohammad Hossein Fatemi, Zahra Pahlevan Yali



Keywords: Quantitative structure–retention relationships, pesticide, retention time, multiple linear regression, genetic programming


In this study, a quantitative structure–retention relationship (QSRR) approach was used to model the gas chromatographic retention times of 74 pesticides. Stepwise multiple linear regression (SW-MLR) was used to select the most important descriptors. Multiple linear regression (MLR) and genetic programming (GP) were then used to build a linear model and a symbolic regression model, respectively. Comparison of the statistical parameters of the MLR and GP models indicates that the GP symbolic regression equation is the best-fitting model. For this model, the squared correlation coefficients (R2) were 0.943 and 0.911, and the root-mean-square errors (RMSE) were 2.56 and 2.77, for the training and test sets, respectively. The GP model was assessed by leave-one-out cross-validation (Q2cv = 0.79, SPRESS = 2.57) as well as by external validation. In addition, sensitivity analysis of the GP model suggests that structural features and polarity are important factors governing the gas chromatographic retention times of the studied pesticides.
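The validation statistics quoted in the abstract (R2, RMSE, leave-one-out Q2cv and SPRESS) can be illustrated with a minimal pure-Python sketch. It assumes an ordinary least-squares MLR as the model being validated; it does not reproduce the authors' GP model, descriptors, or data, and the toy descriptor matrix below is hypothetical. It also assumes Q2cv = 1 - PRESS/SS_tot and SPRESS = sqrt(PRESS/n); some QSRR papers divide PRESS by n - p - 1 instead, so treat that choice as an assumption.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_mlr(X: Matrix, y: Vector) -> Vector:
    """OLS via the normal equations; returns [b0, b1, ..., bp] (b0 = intercept)."""
    A = [[1.0] + list(row) for row in X]  # prepend intercept column
    k, n = len(A[0]), len(A)
    ata = [[sum(A[i][r] * A[i][c] for i in range(n)) for c in range(k)] for r in range(k)]
    aty = [sum(A[i][r] * y[i] for i in range(n)) for r in range(k)]
    return solve(ata, aty)


def predict(coef: Vector, X: Matrix) -> Vector:
    return [coef[0] + sum(b * x for b, x in zip(coef[1:], row)) for row in X]


def r2(y: Vector, yhat: Vector) -> float:
    """Squared Pearson correlation between observed and predicted values."""
    n = len(y)
    my, mp = sum(y) / n, sum(yhat) / n
    sxy = sum((a - my) * (b - mp) for a, b in zip(y, yhat))
    sxx = sum((a - my) ** 2 for a in y)
    syy = sum((b - mp) ** 2 for b in yhat)
    return sxy * sxy / (sxx * syy)


def rmse(y: Vector, yhat: Vector) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))


def loo(X: Matrix, y: Vector):
    """Leave-one-out: refit without sample i, predict it; return (Q2cv, SPRESS)."""
    n = len(y)
    press = 0.0
    for i in range(n):
        coef = fit_mlr(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, [X[i]])[0]) ** 2
    my = sum(y) / n
    ss_tot = sum((v - my) ** 2 for v in y)
    return 1.0 - press / ss_tot, math.sqrt(press / n)


if __name__ == "__main__":
    # Hypothetical two-descriptor data set (not from the study).
    X = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 7], [6, 4]]
    y = [3.2, 6.7, 6.1, 11.4, 9.8, 16.1]
    coef = fit_mlr(X, y)
    yhat = predict(coef, X)
    q2, spress = loo(X, y)
    print(f"R2={r2(y, yhat):.3f} RMSE={rmse(y, yhat):.3f} Q2cv={q2:.3f} SPRESS={spress:.3f}")
```

Because each left-out prediction comes from a model that never saw that sample, PRESS is at least the training residual sum of squares for OLS, so Q2cv cannot exceed the training R2 and SPRESS (as defined here) cannot fall below the training RMSE. This is why the cross-validated values are read as a more conservative check of predictive ability than the fitted statistics.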