The activity of 34 sulfonamide derivatives has been estimated by means of multiple linear regression (MLR), artificial neural network (ANN), simulated annealing (SA) and genetic algorithm (GA) techniques. These models were also utilized to select the most efficient subsets of descriptors in a cross-validation procedure for nonlinear log (IC50) prediction. The results obtained using GA-ANN were compared with those of the MLR-MLR, MLR-ANN, SA-ANN and MLR-GA approaches. A high predictive ability was observed for the MLR-MLR, MLR-ANN, SA-ANN, MLR-GA and GA-ANN models, with root mean square errors (RMSE) of 0.3958, 0.1006, 0.0359, 0.0326 and 0.0282 in the gas phase and 0.2871, 0.0475, 0.0268, 0.0376 and 0.0097 in solvent, respectively (N=34). The results obtained using the GA-ANN method indicated that the activity of sulfonamide derivatives depends on different parameters, including the DP03, BID, AAC, RDF035v, JGI9, TIE, R7e+ and BELm6 descriptors in the gas phase and the Mor32u, ESpm03d, RDF070v, ATS8m, MATS2e, R4p, L1u and R3m descriptors in solvent. In conclusion, comparison of the quality of the ANN models with that of the different MLR models showed that ANN has a better predictive ability.
INTRODUCTION
The sulfonamide group is considered a pharmacophore that is present in a number of biologically active molecules, particularly in antimicrobial agents.^{1−5} It is also present in inhibitors of carbonic anhydrase,^{6−10} anticancer^{11} and anti-inflammatory agents,^{12} which are derivatives of sulfonamides.
Most diseases that involve G-protein receptors in the central nervous system cause abnormal behavior, owing to drug addiction to sulfonamides. Recent studies have shown that the opioid receptors play an important role in regulating other receptors that interact with drugs and other substances of abuse.^{13−16}
The quantitative structure-activity relationship (QSAR) is one of the most important aspects of chemometrics and provides information useful for molecular design and medicinal chemistry.^{17−19}
QSAR models are mathematical equations that relate chemical structures to biological activities. The first step in a QSAR study is to find the set of descriptors with the greatest impact on biological activity.^{20−23} In QSAR models, a wide range of descriptors is used, which can be constitutional, geometrical, etc.
Several QSAR studies^{26,27} have been carried out using effective computational methods to examine the inhibition mechanism.
In the present study, multiple linear regression (MLR) as a linear model, and artificial neural networks (ANN), simulated annealing (SA) and genetic algorithm (GA)^{21−25} as nonlinear models, were applied to investigate the QSAR of sulfonamide derivatives. The QSAR models were used to select the best descriptors for predicting the inhibitory activity of the sulfonamide compounds, and the models were then compared.
THEORY AND COMPUTATIONAL METHODS
 General methods
The geometric optimizations of the sulfonamide compounds were carried out using Gaussian 03W at the B3LYP/6-31G level.^{28} The polarizable continuum model (PCM) was applied to account for the nonspecific solvent effect, and all molecules were optimized in H_{2}O solvent.
A total of 3226 molecular descriptors in the topological, geometrical, MoRSE,^{30,31} RDF,^{31,32} GETAWAY,^{33,23} autocorrelation^{34} and WHIM^{35,36} groups were calculated using the Dragon program.^{29}
The number of descriptors was reduced in three steps through an objective feature selection. First, the descriptors that had the same value for at least 70% of the sulfonamide compounds in the dataset were removed. Thereafter, the descriptors whose correlation coefficient with the dependent variable (log IC50) was less than 0.25 were considered redundant and removed.^{37}
After these two steps, the number of descriptors was reduced to 1047 in the gas phase and 1110 in the solvent phase. In the third step, a stepwise multiple linear regression procedure was used to reject further descriptors. An ideal QSAR model has a high correlation coefficient (R), a low standard deviation, the smallest number of independent variables, a high predictive ability and a high F statistic value.^{38}
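The first two filtering steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure actually run in the study: the function names, the toy data, and the use of the absolute value of the Pearson correlation are assumptions, while the 70% and 0.25 thresholds are taken from the text.

```python
import math


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def dominant_fraction(values):
    """Fraction of compounds sharing the most frequent descriptor value."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return max(counts.values()) / len(values)


def objective_filter(descriptors, activity, max_same=0.70, min_abs_r=0.25):
    """Return the names of descriptors that survive both filters.

    descriptors: dict mapping a descriptor name to one value per compound.
    activity:    list of activities (e.g. log IC50), one per compound.
    """
    kept = []
    for name, values in descriptors.items():
        if dominant_fraction(values) >= max_same:
            continue  # step 1: same value for at least 70% of compounds
        if abs(pearson_r(values, activity)) < min_abs_r:
            continue  # step 2: |r| with the activity below 0.25 (assumed absolute)
        kept.append(name)
    return kept


# Hypothetical toy data: five compounds, three made-up descriptors.
activity = [1.0, 2.0, 3.0, 4.0, 5.0]
descriptors = {
    "d_const": [0.0, 0.0, 0.0, 0.0, 1.0],         # 80% identical values
    "d_uncorrelated": [1.0, -1.0, 0.0, -1.0, 1.0],  # r = 0 with activity
    "d_good": [2.1, 3.9, 6.2, 8.0, 9.9],          # strongly correlated
}
selected = objective_filter(descriptors, activity)
```

On the toy data only `d_good` survives; the third step (stepwise regression) would then operate on the surviving descriptors.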
The best subset of descriptors selected by MLR was fed into the neural networks in the MLR-ANN method. The neural networks used in this study were all three-layer feedforward networks. The networks were trained on the training set (TSET) members with the Levenberg-Marquardt algorithm.^{39}
In the SA-ANN and GA-ANN methods, the 1047 and 1110 descriptors in the gas and solvent phases, respectively, were considered as possible inputs and fed into the input layer of the ANNs (Fig. 1). All calculations in the present study were done in the MATLAB environment (V 7.12, The MathWorks, Inc.) using the SA, GA and Neural Fitting toolboxes.
Fig. 1. The procedure employed for finding the optimum descriptors of the ANN models.
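The idea of wrapping a genetic algorithm around a model to pick descriptor subsets can be illustrated with a short, self-contained Python sketch. It is not the MATLAB GA toolbox procedure used in this work: the operators (truncation selection, one-point crossover, bit-flip mutation), all parameter values, and the toy fitness function are assumptions chosen for brevity. In a real GA-ANN run the fitness would be the prediction error of a network trained on the descriptors switched on in the mask.

```python
import random


def ga_select(n_features, fitness, pop_size=20, generations=30,
              p_mut=0.05, seed=0):
    """Return the best 0/1 feature mask found by a simple genetic algorithm.

    fitness(mask) must return a score to minimise, e.g. a validation RMSE.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    best = min(pop, key=fitness)
    for _ in range(generations):
        parents = sorted(pop, key=fitness)[: pop_size // 2]  # truncation selection
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n_features - 1)             # one-point crossover
            child = a[:cut] + b[cut:]
            # bit-flip mutation with probability p_mut per gene
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        best = min(pop + [best], key=fitness)                # never lose the best mask
    return best


# Hypothetical stand-in for the ANN error: descriptors 0, 3 and 5 are
# "informative"; missing one costs 1.0, each superfluous descriptor costs 0.1.
INFORMATIVE = {0, 3, 5}


def toy_fitness(mask):
    missed = sum(1 for i in INFORMATIVE if mask[i] == 0)
    extra = sum(1 for i, g in enumerate(mask) if g == 1 and i not in INFORMATIVE)
    return missed + 0.1 * extra


best_mask = ga_select(8, toy_fitness)
```

Because the best mask seen so far is always retained, the returned solution is never worse than the best member of the initial random population.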
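The mean square error (MSE) of all the models was calculated using the following equation:
$$\mathrm{MSE} = \frac{1}{n}\sum \left(y_{i} - y_{o}\right)^{2} \qquad (1)$$
where y_{i} is the desired output, y_{o} is the value predicted by the model, n is the number of molecules in this study’s data set, and the sum runs over all n molecules. The RMSE values reported in this work are the square roots of the corresponding MSE values.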
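As a small numerical illustration of the error measure defined above (desired output y_{i}, predicted value y_{o}, n molecules), the following Python sketch computes the MSE and its square root, the RMSE. The function names and the four data points are hypothetical and are not taken from the study's data set.

```python
import math


def mse(observed, predicted):
    """Mean square error between desired outputs and model predictions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error: the square root of the MSE."""
    return math.sqrt(mse(observed, predicted))


# Hypothetical values: every prediction is off by exactly 0.5.
observed = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 5.5, 7.5, 7.5]
print(mse(observed, predicted))   # 0.25
print(rmse(observed, predicted))  # 0.5
```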
RESULTS AND DISCUSSION
Thirty-four different sulfonamide derivatives were selected as a sample set, and the geometry of the compounds was optimized using Gaussian 09W at the B3LYP/6-31G level. All the optimized sulfonamide compounds are shown in Fig. 2.
Fig. 2. Optimized structures of the compounds used to build the QSAR models at the B3LYP/6-31G level in the gas phase.
Linear and nonlinear feature selection methods, such as MLR-MLR (stepwise MLR), MLR-ANN, SA-ANN, MLR-GA and GA-ANN, were used to select the most significant descriptors. SPSS^{40} software was used for the stepwise MLR models, as shown in Table 10. The RMSE in MLR-MLR for the predicted activity was found to be 0.39576 in the gas phase and 0.2871 in the solvent phase. Also, the squared correlation coefficient (R^{2}) calculated for the PSET was 0.8226 in the gas phase and 0.90671 in the solvent phase. Table 10 shows that the MLR-MLR method is better than the other linear methods (MLR-PLS1 and MLR-PCR). The definition of the descriptors in the MLR-MLR method is shown in Table 1.
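The text does not spell out how R^{2} was computed. A minimal sketch, assuming it is the squared Pearson correlation between observed and predicted activities, is given below; the function name and the data are hypothetical. The toy example also shows why an error measure such as the RMSE is reported alongside R^{2}: predictions that are perfectly correlated with the observations but systematically biased still give R^{2} = 1.

```python
def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o, mean_p = sum(observed) / n, sum(predicted) / n
    s_op = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    s_oo = sum((o - mean_o) ** 2 for o in observed)
    s_pp = sum((p - mean_p) ** 2 for p in predicted)
    if s_oo == 0 or s_pp == 0:
        raise ValueError("R^2 is undefined when either series is constant")
    return (s_op * s_op) / (s_oo * s_pp)


# Hypothetical values: predictions are exactly twice the observations,
# so the correlation is perfect although every prediction is wrong.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 4.0, 6.0, 8.0]
print(r_squared(observed, predicted))  # 1.0
```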
Table 1. The best descriptors selected using the MLR-MLR method in the gas phase
The descriptors selected using the MLR-MLR model were fed into the neural networks to establish the MLR-ANN model. In this model, the RMSE for the predicted activity was 0.1006 in the gas phase and 0.0475 in the solvent phase, and the RMSE for the TSET compounds was 0.1162 in the gas phase and 0.0458 in the solvent phase (Table 9).
To establish the SA-ANN, MLR-GA and GA-ANN models, the 1047 and 1110 descriptors in the gas and solvent phases, respectively, were fed into the neural network to select the best descriptors. Three neurons were used in the hidden layer of the GA-ANN model (Fig. 1).
The descriptors selected using the QSAR models are shown in Tables 1–8. These parameters relate the structure to the activity of the optimized compounds.
Table 2. The best descriptors selected using the MLR-MLR method in the solvent phase
Table 3. The best descriptors selected using the SA-ANN method in the gas phase
Table 4. The best descriptors selected using the SA-ANN method in the solvent phase
Table 5. The best descriptors selected using the MLR-GA method in the gas phase
Table 6. The best descriptors selected using the MLR-GA method in the solvent phase
Table 7. The best descriptors selected using the GA-ANN method in the gas phase
Table 8. The best descriptors selected using the GA-ANN method in the solvent phase
MATS5e and GATS2p (Tables 1 and 2), GATS3e and ATS4v (Table 4), and ATS8m and MATS2e (Table 8) are 2D autocorrelation descriptors. The 2D autocorrelation descriptors explain how the values of certain functions, at intervals equal to the lag, are correlated.^{41}
EEig0 (Table 1), EEig13d (Tables 2 and 5) and ESpm03d (Table 8) are edge adjacency indices. The edge adjacency relationships in molecular graphs have been used to define a new topographic index.^{41}
RDF130p, RDF115v, RDF095v, RDF035v, RDF070v and RDF115p (Tables 1, 2, 5, 7, 8 and 6, respectively) are RDF descriptors. The radial distribution function (RDF) descriptors are based on the distance distribution in the molecule.^{42}
IC5 (Table 2), IC0 (Table 5) and AAC (Table 7) are information indices. The total information content (I) is obtained by multiplying the mean information content by the number of elements.^{43}
G1s and G1v (Table 1), L1m (Table 3), KM (Table 6), L1u (Table 6) and TP (Table 5) are WHIM descriptors. WHIM descriptors are built to capture the relevant 3D molecular information regarding molecular size, shape, symmetry and atom distribution with respect to invariant reference frames.^{31}
R4e+ and R5p+ (Table 1), R7u+ (Table 2), H6v, RTe and R6u+ (Table 3), H5m (Table 4), R7e+ (Table 7), and R4p and R3m (Table 8) are GETAWAY descriptors. GETAWAY (GEometry, Topology, and Atom-Weights AssemblY) descriptors encode the geometrical information obtained from the molecular matrix, the topological information obtained from the molecular graph and the information obtained from atomic weights, and are specially designed to match the 3D molecular geometry.^{31}
Mor08u, Mor17v and Mor23e (Table 2), Mor02v and Mor17u (Table 4), Mor17e (Table 5) and Mor32u (Table 8) are 3D-MoRSE descriptors. The 3D-MoRSE descriptors are obtained through the molecular transformation employed in electron diffraction studies.^{43}
GGI6 (Table 3) and JGI9 (Table 7) are topological charge indices. The topological charge indices were proposed to evaluate the charge transfer between pairs of atoms and, therefore, the global charge transfer in the molecule.^{31}
MPC05 (Table 3), MWC03 (Table 6) and BID (Table 7) are walk and path counts. The molecular walk count of kth order (MWCk) is the total number of walks of length k in the hydrogen-suppressed molecular graph.^{31}
F06[C-C] (Table 6) and F09[C-O] are 2D frequency fingerprint descriptors. Fragment descriptors are representations of local atomic environments.^{31}
BEHm6 (Table 3), Belem (Table 4), BEHp6 (Table 6) and BELm6 are Burden eigenvalue descriptors. The B matrix has been defined in terms of the number of atoms, the bond order between two atoms or the electronegativity of the atoms.^{31}
VED2 (Table 4), Eig1m (Table 5) and Adige (Table 6) are eigenvalue-based index descriptors. The eigenvalue sum descriptors are computed from weighted distance matrices of a hydrogen-depleted molecular graph.
QYYM (Table 5) and DP03 (Table 7) are geometrical and Randic molecular profile descriptors, respectively. The Randic molecular profile DP_{k} is derived from the distance distribution moments of the geometry matrix G as the average row sum of its entries raised to the k^{th} power and normalized by the factor k!.^{31} The geometrical variables incorporate information about the magnitude of the displacement between the molecular centroid (center of mass) and the polarizability field (center of charge).^{4}
SOK and TI2 (Table 6), J (Table 5) and TIE (Table 7) are topological descriptors. Topological indices mathematically encode information regarding the structure of molecules, which are depicted as graphs, and are often sensitive to the size, shape, branching, cyclicity and, to a certain extent, the electronic characteristics of molecules.^{31}
The statistical parameters of all QSAR models are shown in Tables 9 and 10. For training, 80% of the sulfonamide compounds were used. In the GA-ANN model, the RMSE and R-square were 0.0282 and 0.9716 in the gas phase and 0.0097 and 0.9894 in the solvent phase, respectively. The GA-ANN model was therefore better than the other models, and as such only the descriptors used in this model were evaluated in this study. These descriptors are shown in Tables 7 and 8. The observed and predicted values of −log IC50 obtained using the MATLAB program are shown in Tables 11 and 12. Plots of the observed versus predicted log IC50 values are shown in Figs. 3 and 4.
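How the compounds were assigned to the training portion is not stated beyond the 80% figure. The following Python sketch shows one common way to do it, a seeded random shuffle; the function name, the seed, and the rounding rule are assumptions made for illustration only.

```python
import random


def split_train_test(n_compounds, train_fraction=0.8, seed=0):
    """Randomly split compound indices into a training and a test part."""
    indices = list(range(n_compounds))
    random.Random(seed).shuffle(indices)          # reproducible shuffle
    n_train = round(train_fraction * n_compounds)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


# 34 compounds, as in this study: 27 for training, 7 held out.
train_idx, test_idx = split_train_test(34)
print(len(train_idx), len(test_idx))  # 27 7
```

With only 34 compounds the held-out part is small, so error estimates obtained from a single split of this kind can vary noticeably with the seed.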
Table 9. Statistical parameters of different nonlinear QSAR models
Table 10. Statistical parameters of different linear QSAR models in the gas and solvent phases
Table 11. Observed and predicted values of −log IC50 using GA-ANN in the gas phase
Table 12. Observed and predicted values of −log IC50 using GA-ANN in the solvent phase
Fig. 3. Plot of observed vs predicted log (1/IC50) using GA-ANN descriptors in the gas phase.
Fig. 4. Plot of observed vs predicted log (IC50) using GA-ANN descriptors in the solvent phase.
The DP03, BID, AAC, RDF035v, JGI9, TIE, R7e+ and BELm6 descriptors in the gas phase (Fig. 5) and the Mor32u, ESpm03d, RDF070v, ATS8m, MATS2e, R4p, L1u and R3m descriptors in the solvent phase (Fig. 6) were plotted against the experimental negative logarithm of the half-maximal inhibitory concentration (−log IC50) using Excel. The descriptor values of the GA-ANN method in the gas and solvent phases were normalized using equation (2) in Excel.
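The normalization formula itself is not reproduced in this text. Since the normalized descriptor values discussed below lie between 0 and 1, a plausible reading is min-max scaling; the Python sketch below illustrates that assumption only and should not be taken as the formula actually used. The function name and the sample values are hypothetical.

```python
def min_max_normalize(values):
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant descriptor: no spread to scale
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw descriptor values for four compounds.
raw = [2.0, 4.0, 6.0, 10.0]
print(min_max_normalize(raw))  # [0.0, 0.25, 0.5, 1.0]
```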
Fig. 5. Plot of experimental log IC50 versus the normalized DP03, BID, AAC, RDF035v, JGI9, TIE, R7e^{+} and BELm6 descriptors in the gas phase.
Fig. 6. Plot of experimental log IC50 versus the normalized Mor32u, ESpm03d, RDF070v, ATS8m, MATS2e, R4p, L1u and R3m descriptors in the solvent phase.
The charts for the gas phase show that the experimental negative logarithm of the half-maximal inhibitory concentration (−log IC50) increases with increasing DP03 (molecular profile no. 3), BID (Balaban ID number), R7e+ (weighted by atomic Sanderson electronegativities) and BELm6 (weighted by atomic masses) descriptors; thus, the half-maximal inhibitory concentration (IC50) is reduced. Therefore, these descriptors are the best among the eight descriptors in the gas phase. As the RDF035v descriptor (weighted by atomic van der Waals volumes) increases, −log IC50 decreases. For the JGI9 descriptor (topological charge index), the response does not change up to about 0.8, but between 0.8 and 1 the bar chart shows an increase in −log IC50. As the TIE descriptor (E-state topological parameter, which is sensitive to the size, shape and electronic characteristics of molecules) increases up to 0.4, −log IC50 increases; a further increase in TIE decreases −log IC50.
The charts for the solvent phase show that as the Mor32u (which indicates that the size of the inhibitor molecule has a certain effect on the extent of the interaction between the drug and the molecule), RDF070v (weighted by atomic van der Waals volumes), R4p (weighted by atomic polarizabilities), L1u (size direction index) and R3m (weighted by atomic masses) descriptors increase, −log IC50 is reduced. However, with an increase in ATS8m (Broto-Moreau autocorrelation of a topological structure), −log IC50 first increases, then decreases, and finally rises sharply. With increasing ESpm03d (spectral moment 03 from the edge adjacency matrix weighted by dipole moments), −log IC50 first remains constant and then increases. For MATS2e (weighted by atomic Sanderson electronegativities), no change in −log IC50 is seen up to 0.8, but from 0.8 to 1 an increase is seen.
Selected descriptors that are common to all the QSAR methods are shown in Table 14. The GETAWAY descriptors played an important role in predicting the log IC50 of the sulfonamide compounds. The GETAWAY descriptors were plotted against the experimental negative logarithm of the half-maximal inhibitory concentration (−log IC50) using Excel.
Table 13. Physicochemical descriptors in the GA-ANN method in the gas and solvent phases
Table 14. The common descriptors selected using the QSAR methods
Fig. 7 shows that the −log IC50 value increases with increasing R5p, R7u+, H5m and R7e+ descriptors. As the R3m descriptor increases, the −log IC50 value decreases.
Fig. 7. Plot of GETAWAY descriptors versus log (IC50).
However, with an increase in the RTe and R6u+ descriptors, −log IC50 first increases and then decreases (Fig. 7). For the R4p descriptor, −log IC50 first decreases and then increases. With increasing R4e+, −log IC50 remains constant. R5p, H5m, R7e+, RTe, R4p and R3m are physicochemical descriptors, weighted by atomic polarizabilities, atomic masses or Sanderson electronegativities. Table 13 shows the physicochemical descriptors in the GA-ANN method in the gas and solvent phases. The physicochemical descriptors were found to play an important role in the change in activity (Figs. 5 and 6). These descriptors reduce the half-maximal inhibitory concentration (IC50).
The statistical parameters and QSAR models of sulfonamide compounds from the previous literature are presented in Table 15.^{45−48} It shows that the results of the GA-ANN method in this work (Table 9) are better than those of the QSAR models in previous studies.
Table 15. Statistical parameters and QSAR models from the previous literature
CONCLUSION
Among the QSAR models used in this study, the nonlinear feature selection models were demonstrated to be better than the linear ones, and the results of the GA-ANN method were better than those of the other nonlinear models used. These results also indicated that the DP03, BID, AAC, RDF035v, JGI9, TIE, R7e^{+} and BELm6 descriptors in the gas phase and the Mor32u, ESpm03d, RDF070v, ATS8m, MATS2e, R4p, L1u and R3m descriptors in the solvent phase were more significant than the other descriptors in building this QSAR model and predicting the biological activity of sulfonamide substitution patterns.
Acknowledgements
Publication cost of this paper was supported by the Korean Chemical Society.
Katzung B. G. (1995) Basic and Clinical Pharmacology, 6th ed. University of California, San Francisco.
Joshi S., Khosla N., Tiwari P. (2004) In Vitro Study of Some Medicinally Important Mannich Bases Derived from an Antitubercular Agent. Bioorg. Med. Chem. 12, 571. DOI: 10.1016/j.bmc.2003.11.001
Anand N., Wolff M. E. (1996) Sulfonamides and Sulfones. Burger’s Medicinal Chemistry and Drug Discovery. John Wiley & Sons Inc., New York, 527.
Kamal A., Khan M. N. A., Reddy K. S., Rohini K., Sastry G. N., Sateesh B., Sridhar B. (2007) Bioorg. Med. Chem. Lett. 17, 5400. DOI: 10.1016/j.bmcl.2007.07.043
Garaj V., Puccetti L., Fasolis G., Winum J.-Y., Montero J.-L., Scozzafava A., Vullo D., Innocenti A., Supuran C. T. (2004) Bioorg. Med. Chem. Lett. 14, 5427. DOI: 10.1016/j.bmcl.2004.07.087
Weber A., Casini A., Heine A., Kuhn D., Supuran C. T., Scozzafava A., Klebe G. (2004) J. Med. Chem. 47, 550. DOI: 10.1021/jm030912m
PubChem Home Page. https://pubchem.ncbi.nlm.nih.gov
Dhawan B. N., Cesselin F., Raghubir R., Reisine T., Bradley P. B., Portoghese P. S., Hamon M. (1996) Pharmacol. Rev. 48, 567.
Janecka A., Fichna J., Janecki T. (2004) Curr. Top. Med. Chem. 4, 1.
Putta S., Eksterowicz J., Lemmen C., Stanton R. (2003) J. Chem. Inf. Comput. Sci. 43, 1623. DOI: 10.1021/ci0256384
Guha R., Serra J. R., Jurs P. C. (2004) J. Mol. Graph. Model. 23.
Todeschini R. Milano Chemometrics and QSAR Research Group. http://www.disat.unimib.it/chem
Todeschini R., Consonni V. (2000) Handbook of Molecular Descriptors. Wiley-VCH.
Levenberg K. (1944) A Method for the Solution of Certain Non-Linear Problems in Least Squares. Quarterly of Applied Mathematics 2, 164.
SPSS (Version 19). http://www.spssscience.com
Asadollahi T., Dadfarnia S., Mohammad A., Shabani H., Ghasemi J. B. (2014) MATCH Commun. Math. Comput. Chem. 71, 287.
Strand website. www.strandls.com/sarchitect/.../desctheory
Sisodiya D., Dashora K. (2014) Int. J. of Phyto. Pharm. 4, 153.
Jaiswal D., Karthikeyan C., Shirastava S. K., Trivedi P. (2006) Internet Electron. J. Mol. Des. 5, 345.
Eroglu E., Turkmen H., Guler S., Palaz S., Oltulu O. (2007) Int. J. Mol. Sci. 8, 145. DOI: 10.3390/i8020145