Model development in freshwater ecology with a case study using evolutionary computation
Journal of Ecology and Environment. 2010. Dec, 33(4): 275-288
Copyright ©2010, The Ecological Society of Korea
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
  • Received : July 07, 2010
  • Accepted : September 20, 2010
  • Published : December 01, 2010
About the Authors
Dong-Kyun Kim
School of Computer Science & Engineering, Seoul National University, Seoul 151-744, Korea
Kwang-Seuk Jeong
Department of Biology, Pusan National University, Busan 609-735, Korea
Robert Ian(Bob) McKay
School of Computer Science & Engineering, Seoul National University, Seoul 151-744, Korea
Tae-Soo Chon
Department of Biology, Pusan National University, Busan 609-735, Korea
Hyun-Woo Kim
Department of Environmental Education, Sunchon National University, Suncheon 540-742, Korea
Gea-Jae Joo
Department of Biology, Pusan National University, Busan 609-735, Korea
gjjoo@pusan.ac.kr

Abstract
Ecological modeling faces some unique problems in dealing with complex environment-organism relationships, making it one of the toughest domains that a modeler might encounter. Newer technologies and ecosystem modeling paradigms have recently been proposed, all as part of a broader effort to reduce the uncertainty in models arising from qualitative and quantitative imperfections in the ecological data. In this paper, evolutionary computation modeling approaches are introduced and proposed as useful modeling tools for ecosystems. The results of our case study support the applicability of an algal predictive model constructed via genetic programming. In conclusion, we propose that evolutionary computation may constitute a powerful tool for the modeling of highly complex objects, such as river ecosystems.
INTRODUCTION
Diverse ecosystem phenomena arising from combinations of living organisms and their interactions with the physical environment are highly nonlinear, very complex, and frequently chaotic (Fielding 1999). In a Newtonian physical simulation of a thrown ball, for example, it is necessary to incorporate factors such as the effects of gravity and the mass of the ball. In many circumstances, we can ignore aspects such as air density and wind. In other circumstances (golf, baseball), however, these factors may prove important. Generally, we know with a fair degree of accuracy what must be included and what can be omitted to construct a model with the required level of accuracy, according to the degree of relevance to the issue. By contrast, in ecology we frequently possess little of this type of knowledge, unlike in physics or chemistry.
Hence, it can prove quite difficult to forecast and explain the broad variety of environmental aspects and their emergent behaviors in ecosystems, especially as compared to other scientific systems. Ecological modeling faces some unique problems in dealing with complex environment-organism relationships, and is one of the toughest domains that a modeler might encounter. The relevant difficulties derive both from the complexity of the systems being modeled and from the quality and quantity of data available for model development (Shan et al. 2006).
However, system complexity is not the only reason that ecological modeling is difficult. The data, too, introduce difficulty. Ecological data is frequently both rough and noisy, particularly when it is sampled from the field. Field sampling is generally expensive, since samples are often collected by hand. The data is frequently sparse (with missing values) and/or collected in an irregular fashion, owing to exceptional conditions including illness, equipment failure, or holidays. As reported previously by Lek (2007), ecological data frequently contains sampling errors, measurement errors, and intermittent estimation mistakes, thereby introducing uncertainty into the resultant models. Moreover, the sources of errors themselves may be biased, thus creating errors that are correlated with the measurements.
Although the development of sampling and measuring technologies for data collection has ameliorated these problems to some degree, many ecological datasets have been collected over periods of multiple years, and these improvements have had limited impact thus far. The issues relevant to coping with imperfect data remain very important in the field of ecological modeling. Moreover, additional difficulties arise from the large numbers of variables relative to the number of instances in ecological datasets: because we do not generally know which relationships are important, we tend to include all available measurements, rather than risk omitting an important one. Consequently, the redundant variables may create additional difficulties in the development of automated modeling methods.
A broad variety of methods have been employed in the development of such models, ranging from classical mathematical modeling (Recknagel and Benndorf 1982, Chapra and Reckhow 1983) to evolutionary computation (EC) (Kim et al. 2007b, Cao et al. 2008). EC can be used to create functions or models automatically, producing diverse candidates with a nonlinear computational structure; EC, as well as artificial neural networks (ANN), has yielded promising results in the prediction of certain environmental phenomena in ecological research (Recknagel et al. 2002, Cho and Sung 2004, Park et al. 2006a). In this paper, we discuss the relevance of EC to ecological modeling, illustrating it with an application to water quality modeling, and specifically to plankton population dynamics.
The remainder of this paper is structured as follows. First, we detail the relevant background of ecological modeling, describing the wide range of techniques that have been used thus far. We then attempt to identify the appropriate situations for the use of ecological modeling, and describe some nature-inspired computational methods (of which EC constitutes a sub-class). Next, we examine the important considerations in the development of an ecological model. We illustrate these via a specific application to water quality, and then conclude with a discussion of the applicability of EC to ecological modeling.
NECESSITY OF ECOLOGICAL MODELING
- Why is ecological modeling special?
A model can be broadly defined as a specific representation of a system, in which each component involves a combination of relationships and interactions. In some cases, the models do not reflect the full mechanisms of the dynamic and integrated systems: relatively simple model approaches such as regression, logistic-type models and predator-prey models may be employed in order to gain insight into general principles and probabilities (Lotka 1925, Volterra 1926, Schaefer 1968, Boerema and Gulland 1973, Cloern 1996). However, the ultimate objective of almost all ecological model construction is a system that can reproduce and simulate patterns of outcomes. Thus, the constructed models must be sufficiently sophisticated to accurately represent the target system, with the additional assumption that all of the knowledge is suited to the representation. Such models can be employed in the interpretation of general possibilities or the prediction of outcomes for particular populations, communities, or ecosystems.
Initially, ecosystems researchers engaged in great debates as to whether in vitro or in vivo investigations were more appropriate for ecosystems research. Although both experimental approaches require more time and effort than mathematical or other theoretical approaches, such experiments do not guarantee a high probability that the system’s performance will be satisfactory. This has compelled researchers to search for methods capable of representing the target system in ways suitable to the principal objectives of ecological modeling.
Many approaches to ecosystems modeling have been developed that reproduce a system and reveal interactions and relationships, particularly when other experimental approaches prove impossible or impractical. Since Eugene Odum introduced theoretical modeling methods for use in systems ecology (Odum 1983), a number of models have been constructed in efforts to elucidate ecological processes more accurately. Jørgensen (1992) previously proposed the concept of exergy, as well as methods for computing ecosystem quality, to better understand the information level and interactions between ecological theory and the models. Deaton and Winebrake (2000) previously surveyed a variety of dynamic models that could be applied to environmental systems to model growth patterns, coupled predator-prey populations, water pollution, global warming, and so forth.
- Ecological issues for freshwater systems in South Korea
These recent developments in modeling techniques have been previously applied to a case study examining algal communities in freshwater ecosystems. In Korea, modeling has been more frequently applied to the fields of hydrology and hydraulics than to limnology and freshwater ecology (Park and Lee 2002, Cho and Sung 2004). In this paper, we demonstrate the application of EC to ecological analysis and modeling in the context of Korean freshwater ecosystems.
The majority of freshwater ecosystems in South Korea no longer bear any resemblance to natural streams or lakes. They have generally been heavily modified by physical alterations, including dam construction and estuarine barrages (Kim et al. 1998, Kim et al. 2004). Trophic states are largely nutrient-enriched due to the approximately forty million people residing within this relatively small area (Joo et al. 1997). Additionally, climate characteristics, particularly the biased rainfall pattern (rainy summer and dry winter), are known to accentuate the effects of this freshwater eutrophication. Korean freshwater ecosystems, therefore, differ profoundly from, and are perhaps more complex to model than, some other modified ecosystems.
MAJOR APPROACHES TO ECOLOGICAL MODELING
- Conventional modeling
Statistical methods have been extensively employed for the analysis of datasets across different scientific regimes. In the field of ecological research, statistical analysis has given rise to the increasingly important field of biostatistics (Zar 1999). In the infancy of this discipline, readily applicable linear and statistical approaches were employed to isolate and identify significant ecosystem properties. In particular, many ecologists have analyzed their experimental data primarily via multivariate analyses such as principal component analysis (PCA) and canonical correspondence analysis. These ordination methods have commonly been employed in efforts to simplify the aquatic ecology data (Magadza 1980, Matta and Marshall 1984, van Tongeren et al. 1992, ter Braak and Verdonschot 1995, Romo et al. 1996). The limitations of these methods have been well established (e.g., horseshoe and arch effects). However, we do not discuss this in depth herein, since EC seldom deals with ordination methods, especially in ecological areas.
Second, a variety of time-series analyses have also been employed. Among statistical approaches, multivariate linear regression (MLR) methods are probably the most popular. However, they are limited in several ways, including strong distortion deriving from nonlinear relations and from outliers, heteroscedasticity, and collinearity (Zuur et al. 2009). Among more advanced linear methods, an autoregressive model is a type of random process employed in the prediction of certain types of values and phenomena. Autoregressive (integrated) moving average models (ARMA/ARIMA) are representative examples, which are used for the prediction of continuous values, particularly in time-series analyses. Harding and Perry (1997) predicted a long-term increase in phytoplankton biomass using ARMA, and Mishra and Desai (2006) conducted comparative experiments between linear statistical models and neural networks to forecast droughts on the basis of the precipitation index of the river basin. Recently, Jeong et al. (2008) also compared forecasting performances between ARIMA and autoregressive ANN in predicting chlorophyll a. Generally, these approaches appear to have a somewhat limited ability to capture non-stationary and nonlinear peaks in ecological data. Consequently, ecologists searching for better prediction methods have become increasingly interested in artificial intelligence methods, which are able to deal with data in highly nonlinear structures.
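To make the autoregressive idea concrete, the following is a minimal sketch of a first-order autoregressive fit by ordinary least squares. It is an illustration only, not the ARMA/ARIMA procedure used in the cited studies; the function names `fit_ar1` and `forecast` are hypothetical and written for this example.

```python
def fit_ar1(series):
    """Fit y[t] = c + phi * y[t-1] by ordinary least squares."""
    x = series[:-1]   # lagged values
    y = series[1:]    # values to be predicted
    n = len(x)
    if n < 2:
        raise ValueError("need at least three observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("series is constant; slope is undefined")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    return c, phi


def forecast(series, c, phi, steps=1):
    """Iterate the fitted recurrence forward from the last observation."""
    value = series[-1]
    out = []
    for _ in range(steps):
        value = c + phi * value
        out.append(value)
    return out
```

Because the forecast is a linear recurrence, it decays smoothly toward a fixed level, which is one way to see why such models struggle with the sharp nonlinear peaks mentioned above.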
In addition to linear statistical approaches, mathematical and numerical modeling techniques provide some of the most common tools used for the quantitative description of a system, frequently relying on mass balance equations. In these models, all components employed to represent and evaluate the system are described in the initial stages of model construction. Each component of the system interconnects and interacts with others in the model, based on known causal relationships; the succession of the resultant values generates the results. The majority of such models are deterministic models, which are represented as individual-based and object-oriented processes. Commonly, such models consist of a set of ordinary differential equations that model the dynamic system. For example, Odum (1983) previously introduced and exemplified many types of deterministic models to represent virtual ecosystems. In freshwater systems, a plethora of water quality models have already been designed and developed. Hakanson and Boulion (2003) presented a general dynamic model to predict phytoplankton biomass and production, and Arhonditsis and Brett (2005) developed a more complex model that incorporated phyto- and zooplankton in Lake Washington. For assessments of streams and rivers, QUAL2E is one of the most popular water quality models (Brown and Barnwell 1987). However, this technique has had some difficulties in cases in which the errors between predicted and observed values have been too large for direct application to target river systems. Hence, Park and Lee (2002) added some tuning parameters, such as autochthonous sources, in order to improve their model predictions. Nonetheless, this technique is still limited in terms of its ability to predict specific values (e.g., biochemical oxygen demand and chlorophyll a) relevant to water quality, particularly in regulated river systems (Choi et al. 2008).
In addition to these QUAL-based models, POTAMON is a unidimensional, non-stationary model that was designed to simulate potamoplankton. This is a more biologically friendly technique than QUAL2E, but does not reduce the errors inherent to the prediction of real observed values (Everbecq et al. 2001).
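The mass-balance, differential-equation style of deterministic model described above can be sketched with a deliberately small example. This is not QUAL2E, POTAMON, or any model cited here; it is a hypothetical closed system with two compartments (dissolved nutrient and algae), integrated with simple Euler steps, and all parameter names and values are assumptions chosen for illustration.

```python
def simulate_np(nutrient, algae, uptake=0.5, half_sat=1.0, loss=0.1,
                dt=0.01, steps=1000):
    """Euler integration of a closed two-compartment mass-balance model.

    dN/dt = recycling - growth
    dA/dt = growth - recycling
    Every flux leaving one compartment enters the other, so the total
    mass N + A stays constant (up to floating-point rounding).
    """
    for _ in range(steps):
        # Saturating (Monod-type) nutrient uptake by algae.
        growth = uptake * nutrient / (half_sat + nutrient) * algae
        # Algal losses returned to the nutrient pool.
        recycling = loss * algae
        nutrient += dt * (recycling - growth)
        algae += dt * (growth - recycling)
    return nutrient, algae
```

The sketch also shows why such models need every relevant compartment to be represented: if a flux were sent to an unmodeled sink, the conservation check on N + A would no longer hold.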
- Empirical modeling
The rapid advance of computer science has ushered in a host of new technologies relevant to a broad range of sciences since the 1990s. Newer technologies and paradigms of ecosystem modeling have been proposed, aiming to reduce the uncertainty in models arising from qualitative and quantitative imperfections in the ecological data (Lek 2007). With the advent of computer-based modeling, data-collecting systems have also been developed and larger quantities of data have become available. This phenomenon has grown to encompass and delineate a wholly novel research field, referred to as ecological informatics (Recknagel 2006).
Computational algorithms take advantage of quick iterative calculations conducted with large volumes of data. Generally, empirical computational ecosystem models are designed to derive the best-fitting representation for an ecological dataset via a training and validation process (Fielding 1999). As many empirical computational models are constructed via data learning, they also fall under the rubrics of ‘machine learning’, ‘inductive model’ or ‘data-driven model’ (Recknagel 2006). Some representative examples include ANN, EC, decision tree models, fuzzy logic, etc. (Silvert 1997, Whigham and Recknagel 2001a, Goethals et al. 2003, Shan et al. 2006).
Among these, ANN and EC may be classified as biologically inspired methods, and ecological scientists have begun to take increasing interest in applying them to ecosystem modeling. Recknagel (2001) previously demonstrated some useful empirical models for ecological time-series modeling, emphasizing the limitation in the complexity of deductive ecological models with their rigid structures. Jeong et al. (2003) described an empirical predictive model in a comparison between statistical linear models and evolutionary computation. Kim et al. (2007a) also interpreted ecological significance on the basis of an empirical predictive model.
EVOLUTIONARY COMPUTATIONS AND RELATED RESEARCH
Genetic algorithms (GA) are a mechanism originally inspired by natural evolution (Holland 1975), which operate on strings of bits that are analogous to chromosomes. One unique attribute of the GA is that it adopts the evolutionary mechanisms of heritable variation and selection. Crossover and mutation processes in the GA cause variations in the population (chromosomes) over time. The individuals with poor fitness are excluded in the selection of the next generation’s parents. A near-optimal solution eventually results from the iterated application of these mechanisms.
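The GA cycle of selection, crossover, and mutation on bit strings can be sketched as follows. This is a generic textbook-style illustration rather than any implementation cited in this paper; binary tournament selection, one-point crossover, the parameter defaults, and the function names `tournament` and `run_ga` are all assumptions made for the example.

```python
import random


def tournament(pop, fitness, rng):
    """Binary tournament: the fitter of two random individuals becomes a parent."""
    a, b = rng.choice(pop), rng.choice(pop)
    return a if fitness(a) >= fitness(b) else b


def run_ga(fitness, n_bits=20, pop_size=30, generations=40,
           crossover_rate=0.9, mutation_rate=0.02, seed=0):
    """Evolve bit strings (lists of 0/1) to maximise the given fitness function."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            p1 = tournament(pop, fitness, rng)
            p2 = tournament(pop, fitness, rng)
            if rng.random() < crossover_rate:
                # One-point crossover: head of one parent, tail of the other.
                cut = rng.randint(1, n_bits - 1)
                child = p1[:cut] + p2[cut:]
            else:
                child = list(p1)
            # Bit-flip mutation, applied independently to each position.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
        # Remember the best individual seen in any generation.
        best = max(pop + [best], key=fitness)
    return best
```

For example, `run_ga(sum)` searches for the string with the most ones; in a parameter-fitting setting the bit string would instead encode model constants and the fitness would be a (negated) prediction error.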
Genetic programming (GP) is an extension of the GA concept, in which the individuals exhibit a more complex (labeled tree) structure, thereby allowing them to reflect more complex target solutions (Koza 1992), comparable with ANN. This may ease the process of creating new offspring populations from the two parents. New populations are generated by removing a branch from one tree and inserting it into another, or replacing it with a whole new branch, by analogy with genetic operators such as crossover and mutation (Fig. 1). The overall procedure of the GP is described in Fig. 2. The population, P(t), refers to the set of candidate tree models at time t, starting from the initial population. Better individuals in the population are selected via reproduction and genetic operations. A single cycle of this process is referred to as a generation, with the computation eventually halting when a predetermined maximum number of generations is reached.
At the termination of the computation, GP supplies labeled tree structures that can, in principle, be understood by the user. This is an advantage of GP in terms of the readability of the model, whereas ANNs are a black-box model (their meaning is not readily comprehensible to humans). Nonetheless, ANNs have been utilized more extensively in ecological research (Lek et al. 1996, Chon et al. 2001, Park et al. 2006b, Goethals et al. 2007), and relatively few ecologists have presented the results of predictive modeling using GP (Savic et al. 1999, Whigham and Recknagel 2001a).
Fig. 1. Basic structure and evolutionary principle of genetic programming (letters from a to e denote constant parameters and Vi denotes a variable parameter for inputs).
Fig. 2. Computational procedure of genetic programming.
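The branch-swapping operators described for GP can be sketched on a small tree representation. This is an illustrative sketch, not the C++ system used in the case study: trees are nested tuples `(operator, left, right)`, strings are input variables, numbers are constants, and the helper names (`paths`, `get`, `replace`, `random_tree`, `crossover`, `mutate`) are hypothetical. For brevity it does not enforce a maximum tree depth after crossover.

```python
import operator
import random

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(tree, env):
    """Evaluate a tree against a dict of input variable values."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(tree, str):
        return env[tree]
    return tree  # numeric constant


def paths(tree, prefix=()):
    """List the position of every node as a tuple of child indices."""
    found = [prefix]
    if isinstance(tree, tuple):
        found += paths(tree[1], prefix + (1,))
        found += paths(tree[2], prefix + (2,))
    return found


def get(tree, path):
    """Return the subtree found by following path from the root."""
    for i in path:
        tree = tree[i]
    return tree


def replace(tree, path, new):
    """Return a copy of tree with the subtree at path swapped for new."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(parts)


def random_tree(rng, variables, depth):
    """Grow a random tree of at most the given depth."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return rng.choice(variables)
        return round(rng.uniform(-1.0, 1.0), 2)
    return (rng.choice(list(OPS)),
            random_tree(rng, variables, depth - 1),
            random_tree(rng, variables, depth - 1))


def crossover(a, b, rng):
    """Insert a random branch of b at a random position in a."""
    return replace(a, rng.choice(paths(a)), get(b, rng.choice(paths(b))))


def mutate(tree, rng, variables, depth=2):
    """Replace a random branch with a freshly generated one."""
    return replace(tree, rng.choice(paths(tree)), random_tree(rng, variables, depth))
```

Because every individual is an explicit expression tree, the evolved model can be printed and read directly, which is the readability advantage over ANNs noted above.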
Table 1 presents some environmental and ecological research related to the applications of EC. Internationally, EC has been fairly broadly employed in environmental research. In particular, GA has been generally perceived as a favorable tool for parameter optimization in the engineering field, and has consequently come into common use for the constant fitting of complex structured models such as QUAL2K (Pelletier et al. 2006, Cho and Lee 2009). This methodology has been recently adopted for model optimization in Korea (Cho et al. 2004) and utilized for operational purposes in management policy (Lee and Chung 2004, Park et al. 2006a). Nonetheless, domestic applications of this technique in biological research remain far fewer than those reported internationally. Moreover, it appears that GA is more familiar to domestic researchers than is GP. GP has been used only rarely in the environmental engineering field, although its solutions are more transparent and extensible than those of GA. In rainfall-runoff modeling, GA-optimized tank-structured models (Paik et al. 2005) and GP-based self-automated models (Khu et al. 2001, Rabunal et al. 2007) have been used in both domestic and international research.
Table 1. Comprehensive environmental and ecological research in relation to evolutionary computations
COMPARATIVE ADVANTAGES OF DIFFERENT MODELING APPROACHES
In assessing specific phenomena and ecological events, we must first gain insight into the properties of the different potential modeling methods. In this section, we compare the characteristics of each modeling method, delineating the advantages and disadvantages of the methods.
Statistical models and analyses are the most commonly used tools in many scientific disciplines. They are predicated on simple statistical relationships (generally correlations) between important parameters: most often linear, commonly also polynomial or logarithmic, but always in a pre-defined simple form. MLR models have been broadly employed for the prediction of responses to independent effects. However, ecological datasets frequently contain many variables, particularly relative to the total number of instances, and too many variables can conceal causal relationships, confusing attempts to extract them via automated methods. In addition, it has been known for some time that the limitation of classical statistical models to the extraction of linear relationships means that these models might miss important nonlinear relationships in ecosystems (Lek et al. 1996, Jeong et al. 2003).
Mathematical mechanistic models are used to construct a representation of the ecosystem on the basis of known physical principles, most commonly the mass balance between various components within the ecosystem boundaries. In mechanistic models, it is important to model all relevant components within the system (otherwise, the assumption of mass balance may be invalid). Such models have been particularly favored for decision-making by managers and administrators in the field of water resource operations, owing primarily to the completeness of the models; this means that very flexible operation, extrapolating beyond the range of previous data, might prove possible. However, they commonly evidence very complicated architectures. As with statistical models, prediction uncertainty is apt to be large, owing to a lack of knowledge regarding the non-mass-balance components of the ecosystem. Generally, the prediction accuracy is not sufficiently high for practical applications. Thus, determining how to incorporate the benefits of mechanistic models, while dealing with the uncertainty and nonlinearity of ecological data, is one of the most important issues in the field of ecological modeling.
Table 2. Comparison between conventional models and evolutionary computation
By way of contrast with the above methods, empirical computational models can be employed in constructing a representation of an ecosystem on the basis of the observed data. Their primary objective is usually to find the best model structure for the target ecosystem ('best' is usually taken to mean 'lowest predictive error') based on computations and reasoning from large quantities of data. The higher level of automation makes it feasible for end users to select and apply the most appropriate methods. In this regard, machine learning (ML) techniques are employed in order to extract information regarding the relevant interactions and relationships between environmental entities, through the optimization of a model to fit the target ecosystem. A major premise in this regard is that data is inherently noisy, and that this noise may mask weaker relationships within the data, making the development of a perfect and complete ecosystem model impossible; these methods are premised on finding the best model justified by the specific data available. These methods are also thought to be particularly useful when the important relationships within the target ecosystem are not fully known, or are too complicated to represent in a model, or when the quantity and quality of the data are insufficient for the construction of a complete representation of the system (Table 2).
CASE STUDY: WATER QUALITY PREDICTION IN THE LOWER NAKDONG RIVER
- Site description and methods
The study site (Mulgeum) was located within the lower part of the Nakdong River, the longest river (ca. 525 km) in South Korea (Fig. 3). The river is persistently eutrophic (chlorophyll a: 40 μg/L) throughout the year, except during the summer heavy rainfall season. Algal proliferation poses two severe problems: 1) summer cyanobacterial blooms and 2) winter diatom blooms (Ha et al. 1999, Ha et al. 2003). A large population also resides in this area, and thus the demand for water resources is relatively high.
A total of 17 input variables were used to generate a one-week-ahead predictive GP model to forecast algal abundance. Hydrological and meteorological data (flow rate, 4 dam discharges and rainfall) were acquired from the Korean Water Management Information System, and other data (water temperature, dissolved oxygen, pH, Secchi disc depth, conductivity, alkalinity, turbidity, nitrate, phosphate, silica and nitrogen:phosphorus ratio) were collected and measured via field sampling (Table 3). The concentration of chlorophyll a was employed as a proxy for algal abundance as the output measure. Data from 1994 to 2008 were used for model construction (N = 782).
Fig. 3. Site location in the Nakdong River.
Table 3. Data used in evolutionary computation modeling (N = 782)
In this study, we employed a GP program written in C++, originally designed by Cao et al. (2006). One key issue in this type of time-series prediction is how to allocate data to the training and testing of the model (the two must be kept separate for fair validation). We employed 702 data instances for training, with the remaining 80 reserved for testing; the data were repartitioned by the bootstrapping method (Adams et al. 1997) in each trial (200 runs) to avoid tedious k-fold cross-validation. The initial population size was fixed at 5,000, and the maximum tree depth (the length of the model structure, i.e., the limit on model complexity) was 5. The GP system was free to construct solutions from the standard arithmetic operators (+, -, *, /), the exponential and logarithmic functions, arithmetic relations (>, =, <), and the Boolean if-then-else construct. Each GP run continued for 100 generations, and 200 runs were performed overall. The root mean squared error (RMSE) was used as the fitness function in this experiment.
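The data partition and fitness measure described above can be sketched as follows. The paper does not spell out its partitioning procedure in detail, so this sketch assumes a fresh random split without replacement in each run (702 training and 80 test instances out of 782); the function names `rmse` and `random_partition` are hypothetical.

```python
import math
import random


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    total = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(total / len(observed))


def random_partition(n_total, n_test, rng):
    """Shuffle the instance indices and hold out n_test of them for testing."""
    idx = list(range(n_total))
    rng.shuffle(idx)
    test = sorted(idx[:n_test])
    train = sorted(idx[n_test:])
    return train, test


# One partition per run; each run would train on `train` and report RMSE on `test`.
rng = random.Random(0)
partitions = [random_partition(782, 80, rng) for _ in range(200)]
```

Keeping the two index sets disjoint within each run is what makes the test RMSE a fair estimate; repeating the split across runs gives the spread of errors reported later as a mean and standard deviation.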
RESULTS AND DISCUSSION
The best predictive model was generated via selection by both RMSE and the determination coefficient (r²). The optimal model contained eight input variables, and was as follows:
(Equation 1: GP predictive model; equation image not available)
where
  • WT: water temperature
  • chl.a: chlorophyll a
  • DO: dissolved oxygen
  • FL: flow rate
  • AD: Andong dam discharge
  • Se: Secchi disc depth
Water temperature (WT) was selected as the conditional criterion of the model, as the pattern of chlorophyll a concentration is affected profoundly by temperature. At high temperatures, the rule-based expression was rather simple, whereas more complicated expressions were produced by GP for normal and lower temperature ranges.
The overall prediction error was 31.32 (RMSE) with r² = 0.45. Note that random data partitioning between the training and test sets was used. Fig. 4 shows the comparison between the observed and predicted values for chlorophyll a concentration. Although the predicted peak values were generally slightly underestimated, the model accurately depicts the dynamic pattern of chlorophyll a and also accurately matches the timing. If we regard 40 μg/L, a high eutrophic level, as the critical indicator of water quality deterioration at the study site, the predictive model performs with an accuracy of 82.5% (212 of 257 cases) when employed as an early warning system for the management of the river ecosystem. Additionally, the stability of the model predictions should be taken into consideration when assessing the application of the models. Error ranges for the predictive models generated by GP were 32.2 ± 1.5 (mean ± standard deviation) for training and 37.1 ± 12.9 for test (N = 200).
Fig. 4. Comparative result between observed and predicted data for algal dynamics
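The early-warning accuracy quoted above is a threshold agreement rate, which can be sketched in a few lines. The exact rule used in the study is not given, so this sketch assumes that a case counts as correct when the observed and predicted chlorophyll a values fall on the same side of the 40 μg/L threshold; the function name `warning_accuracy` and the sample values in the usage note are hypothetical.

```python
def warning_accuracy(observed, predicted, threshold=40.0):
    """Fraction of cases where observation and prediction agree on
    whether the threshold is reached."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    hits = sum((o >= threshold) == (p >= threshold)
               for o, p in zip(observed, predicted))
    return hits / len(observed)


# The reported figure follows directly from the counts in the text.
reported = round(212 / 257 * 100, 1)  # 82.5
```

For instance, `warning_accuracy([50, 10, 45, 30], [42, 12, 35, 41])` gives 0.5: the first two cases agree on the threshold and the last two do not.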
From the GP predictive models, we can observe that input variable selection provides important information. The frequency with which GP selects a variable in model construction is a good proxy for the degree of influence it exerts (Kim et al. 2007a). The selection frequencies for the different variables are quite diverse; their distribution is presented in Fig. 5. WT was most frequently selected, whereas pH and Secchi depth were also included in more than 50% of the models. In fresh water, these factors are highly influential on chlorophyll a concentrations, because algal growth is regulated markedly by temperature and light intensity (Jeong et al. 2001). Additionally, the more frequent selection of silica than nitrogen and phosphorus observed in this study is consistent with ecological knowledge regarding the recurrence of winter diatom blooms in the lower Nakdong River (Ha and Joo 2000). Moreover, according to equation 1, lower silica concentrations result in increased algal biomass. Kilham et al. (1986) previously stressed that Stephanodiscus species, a predominant diatom in the Nakdong River, required a high supply rate of phosphorus, but could grow successfully under low silica and light conditions, although diatom species employ silica to build their shells (frustules). However, it is also reasonable to assume that high silica consumption accompanies increasing algal concentrations, particularly of winter diatom species, as there is a time lag for algal growth via nutrient absorption (Kim et al. 2007a). Although we can understand and explain this effect, the importance of silica was somewhat counter to our expectations. This highlights one important advantage of GP: it can be used to extract unexpected information via learning in data-driven modeling.
Fig. 5. Selectivity of input variables in the genetic programming predictive models. FL, flow rate; AD, Andong dam discharge; IH, Imha dam discharge; NG, Namgang dam discharge; HC, Hapcheon dam discharge; Ra, rainfall; WT, water temperature; DO, dissolved oxygen; Se, Secchi depth; Co, conductivity; Al, alkalinity; Tu, turbidity; NO, nitrate; Si, silica; PO, phosphate; NP, nitrogen:phosphorus ratio.
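The selection-frequency analysis described above amounts to counting, across the evolved models, how often each input variable appears. The sketch below assumes each model has already been reduced to the set of variable names it uses; the function name `selection_frequency` and the toy model sets in the usage note are hypothetical and are not the study's results.

```python
from collections import Counter


def selection_frequency(models, variables):
    """Share of models in which each candidate variable appears at least once."""
    if not models:
        raise ValueError("at least one model is required")
    counts = Counter()
    for used in models:
        # A variable counts once per model, however often it occurs in the tree.
        counts.update(set(used) & set(variables))
    return {v: counts[v] / len(models) for v in variables}
```

For example, with the four toy models `[{"WT", "pH"}, {"WT", "Si"}, {"WT"}, {"Se", "pH"}]`, WT would be reported at 0.75 and pH at 0.5, mirroring the kind of ranking shown in Fig. 5.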
With regard to predictability, the most significant issue is how to acquire larger quantities of higher quality data. The data quality issue is related directly to how we can obtain data from stable analytical methodologies (i.e., high consistency in monitoring and measuring). In addition to the qualitative issue, empirical models such as EC require large quantities of data for learning and training, perhaps larger quantities than are required for other methods. A great deal of time may be required to gain sufficient data using traditional methods, but we anticipate that the rapid development of ecological monitoring and analysis systems will help to remedy this problem before too long. Data cleaning is a favorable option not only for the extraction of potentially useful information, but also for the removal of outliers and noise from data. Consequently, it should prove possible to reduce predictive errors through appropriate data cleaning techniques.
APPLICABILITY OF INTEGRATED MODELS IN FUTURE ECOSYSTEMS RESEARCH
In ecological research, data accumulation is accelerating precipitously, as the measuring equipment used for ecosystems is under rapid and continuous development. A broad variety of tools and techniques for the analysis and assessment of ecological properties is continuously being created and deployed. Although we introduced a variety of analytical methodologies and categorized them, we are currently unable to pre-determine a specific framework of modeling approaches for a particular range of ecosystems. Each modeling approach has some useful properties for the analysis of a target ecosystem, which may prove valuable in the interpretation and understanding of that ecosystem. For instance, in a comparison between linear (PCA and correspondence analysis, CA) and nonlinear methods (self-organizing map, SOM), it may prove desirable to employ nonlinear methods in ecological patterning to avoid the horseshoe (PCA) and arch (CA) effects, but alternatives such as SOM do not allow for the control of gradient directions (Giraudel and Lek 2001). Consequently, a combination or fusion of analytical techniques is desirable, particularly in the patterning and clustering of the structures of ecosystem populations and communities.
The applicability of different modeling methodologies is a matter under continual discussion, regardless of whether deductive or inductive approaches are employed. Previously, conventional modeling techniques involved a variety of standardized mathematical and stochastic methods, such as differential equations, multivariate statistics, and regression models, whereas recent modeling approaches have been biased toward heavily computational models based on data warehousing and biologically inspired algorithms (Dolk 2000, Recknagel 2006). Additionally, a few ecological scientists have reported some promising results via hybrid approaches. Hybrid evolutionary algorithms, in which rule sets and algebraic equations define the model architecture but the content is selected via evolution, have been employed in the prediction of chlorophyll a concentrations in rivers and lakes (Kim et al. 2007a, Welk et al. 2008). Atanasova et al. (2006) reported good simulation results for chlorophyll a in Lake Kasumigaura using an assembly of ODEs. Additionally, some of the generic lake models (SALMO and Lake Washington model) have been upgraded and updated via GP techniques (Cetin et al. 2005, Cao et al. 2008).
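The idea behind a hybrid evolutionary approach, in which a fixed rule structure is combined with evolved content, can be sketched as follows. This is a deliberately simplified illustration and not the HEA of the cited studies: the IF-THEN-ELSE template, the choice of water temperature and silica as inputs, the parameter ranges, and the synthetic data are all assumptions made for the example.

```python
import random


def predict(params, wt, si):
    """Fixed rule template: IF wt < threshold THEN a * si ELSE b * si."""
    threshold, a, b = params
    return a * si if wt < threshold else b * si


def rmse(params, data):
    """Root mean squared error of the rule over (wt, si, observed) records."""
    errors = [(predict(params, wt, si) - obs) ** 2 for wt, si, obs in data]
    return (sum(errors) / len(errors)) ** 0.5


def evolve(data, generations=200, pop_size=20, seed=0):
    """Evolve the rule parameters with truncation selection and Gaussian mutation.

    Returns the best parameter tuple and the best error per generation.
    The better half of the population always survives, so the best error
    never increases from one generation to the next.
    """
    rng = random.Random(seed)
    population = [(rng.uniform(0, 30), rng.uniform(0, 5), rng.uniform(0, 5))
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda p: rmse(p, data))
        history.append(rmse(population[0], data))
        parents = population[: pop_size // 2]
        children = [tuple(g + rng.gauss(0, 0.3) for g in rng.choice(parents))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    best = min(population, key=lambda p: rmse(p, data))
    history.append(rmse(best, data))
    return best, history


# Synthetic records generated from a known rule (threshold 10, a = 3.0, b = 0.5).
inputs = [(4, 2.0), (6, 1.5), (8, 3.0), (12, 2.5), (18, 1.0), (25, 2.0)]
data = [(wt, si, (3.0 if wt < 10 else 0.5) * si) for wt, si in inputs]

if __name__ == "__main__":
    best, history = evolve(data)
    print("best parameters:", best)
    print("initial error:", history[0], "final error:", history[-1])
```

In this sketch the modeler fixes the architecture (one threshold rule with two linear branches) and evolution only searches the parameter values. Evolving the structure of the rule itself, as GP does, would require a tree or grammar representation in place of the fixed parameter tuple.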
In studies of South Korean freshwater ecosystems, ecological scientists have undertaken only a limited amount of modeling in comparison with hydrological engineers. Thus far, the majority of such research has been biased toward specific analysis methods, particularly statistically based approaches (Yoo 2002, An et al. 2006, Kim et al. 2007c). Mechanistic models have been employed in a few applications, and these have focused principally on pollutant transportation (Shim et al. 1995, Park and Lee 2002). These models, however, regarded the physicochemical impacts as more important than the biological influences, whereas in the lakes and regulated rivers of South Korea, grazing activity by zooplankton is a critical component in determining water quality during the dry winter period (Kim et al. 2000). Although the modified QUAL2E (QUAL-NIER) incorporated 31 variables in the model, zooplankton activity is not one of them (Choi et al. 2008). By comparison, in regard to the use of empirical modeling approaches, only a few ML techniques have been applied thus far to the prediction of population and community dynamics in stream and river ecosystems (Chon et al. 2000, Jeong et al. 2006); the number of such studies is relatively small compared to other countries.
This imbalance in model application may limit future scientific research. Thus, interdisciplinary collaborations may prove an effective solution for understanding and improving ecological modeling. In turn, the development of better ecological models is expected to allow for the development of effective and efficient strategies for water resource management.
Acknowledgements
This work was supported by a research project (355-2008-1-C00047) of the Korea Research Foundation during Dr. Kim, Dong-Kyun’s postdoctoral fellowship. This study was also partially supported by the LTER Programme of the Ministry of Environment in the Nakdong River. The authors are grateful to Dr. Cao, Hongqing for sharing the GP programming source code and to Mr. Oh, Insoo for revising and updating the code. The Seoul National University Institute for Computer Science and Technology provided facilities for the research.
References
Adams DC , Gurevitch J , Rosenberg MS 1997 Resampling tests for meta-analysis of ecological data. Ecology 78 1277 - 1283    DOI : 10.1890/0012-9658(1997)078[1277:RTFMAO]2.0.CO;2
Ahmed JA , Sarma AK 2005 Genetic algorithm for optimal operating policy of a multipurpose reservoir Water Resour Manag 19 145 - 161    DOI : 10.1007/s11269-005-2704-7
An KG , Park SJ , Choi SM , Park JS 2006 Comparative analysis of long-term water quality data monitored in Andong and Imha Reservoirs. Korean J Limnol 39 21 - 31    KOI: KISTI1.1003/JNL.JAKO200618317186401
Arhonditsis GB , Brett MT 2005 Eutrophication model for Lake Washington (USA): part I. model description and sensitivity analysis. Ecol Model 187 140 - 178    DOI : 10.1016/j.ecolmodel.2005.01.040
Atanasova N , Recknagel F , Todorovski L , Dzeroski S , Kompare B 2006 Computational assemblage of ordinary differential equations for chlorophyll-a using a lake process equation library and measured data of Lake Kasumigaura. 409-428 Springer-Verlag Berlin In: Ecological Informatics: Scope Techniques and Applications (Recknagel F ed).
Bobbin J , Recknagel F 2001 Knowledge discovery for prediction and explanation of blue-green algal dynamics in lakes by evolutionary algorithms Ecol Model 146 253 - 262
Boerema LK , Gulland JA 1973 Stock assessment of the Peruvian anchovy (Engraulis ringens) and management of the fishery J Fish Res Board Can 30 2226 - 2235
Brown LC , Barnwell TO Jr 1987 The Enhanced Stream Water Quality Models QUAL2E and QUAL2E-UNCAS: Documentation and User Manual. U.S. Environmental Protection Agency Athens GA. EPA/600/3-87/007.
Cao H , Recknagel F , Cetin L , Zhang B 2008 Process-based simulation library SALMO-OO for lake ecosystems: part 2. multi-objective parameter optimization by evolutionary algorithms Ecol Inform 3 181 - 190    DOI : 10.1016/j.ecoinf.2008.02.001
Cao H , Recknagel F , Joo GJ , Kim DK 2006 Discovery of predictive rule sets for chlorophyll-a dynamics in the Nakdong River (Korea) by means of the hybrid evolutionary algorithm HEA. Ecol Inform 1 43 - 53    DOI : 10.1016/j.ecoinf.2005.08.001
Cetin L , Zhang B , Recknagel F 2005 Process-based simulation library SALMO-OO for lake ecosystems. 318-324 Melbourne International Congress on Modelling and Simulation 2005 Dec 12-15
Chapra SC , Reckhow KH 1983 Engineering Approaches for Lake Management. Vol. II: Mechanistic Modeling Butterworth Publishers Boston MA.
Cho JH , Lee CH 2009 Parameter optimization of QUAL2K using influence coefficient algorithm and genetic algorithm J Environ Impact Assess 18 99 - 109    KOI: KISTI1.1003/JNL.JAKO200910103465200
Cho JH , Sung KS 2004 A study on the river water quality management model using genetic algorithm J Korean Soc Water Wastewater 18 453 - 460
Cho JH , Sung KS , Ha SR 2004 A river water quality management model for optimising regional wastewater treatment using a genetic algorithm J Environ Manag 73 229 - 242    DOI : 10.1016/j.jenvman.2004.07.004
Choi JK , Chung S , Ryoo JI 2008 Comparative evaluation of QUAL2E and QUAL-NIER models for water quality prediction in eutrophic river. J Korean Soc Water Qual 24 54 - 62
Chon TS , Park YS , Cha EY 2000 Patterning of community changes in benthic macroinvertebrates collected from urbanized streams for the short term prediction by temporal artificial neuronal networks. Springer Berlin In: Artificial Neuronal Networks: Application to Ecology and Evolution (Lek S Guegan JF eds).
Chon TS , Kwak IS , Park YS , Kim TH , Kim Y 2001 Patterning and short-term predictions of benthic macroinvertebrate community dynamics by using a recurrent artificial neural network. Ecol Model 146 181 - 193    DOI : 10.1016/S0304-3800(01)00305-2
Cloern JE 1996 Phytoplankton bloom dynamics in coastal ecosystems: a review with some general lessons from sustained investigation of San Francisco Bay California Rev Geophys 34 127 - 168    DOI : 10.1029/96RG00986
Deaton ML , Winebrake JJ 2000 Dynamic Modeling of Environmental Systems Springer-Verlag New York NY.
Dolk DR 2000 Integrated model management in the data warehouse era Eur J Oper Res 122 199 - 218    DOI : 10.1016/S0377-2217(99)00229-5
Dorado J , Rabunal J , Puertas J , Santos A , Rivero D 2002 Prediction and modelling of the flow of a typical urban basin through genetic programming 190-201 Springer Berlin In: Applications of Evolutionary Computing (Cagnoni S Gottlieb J Hart E Middendorf M Raidl G eds)
Everbecq E , Gosselain V , Viroux L , Descy JP 2001 Potamon: a dynamic model for predicting phytoplankton composition and biomass in lowland rivers Water Res 35 901 - 912
Fielding AH 1999 An introduction to machine learning methods Kluwer Academic Publishers Norwell MA In: Machine Learning Methods for Ecological Applications (Fielding AH ed).
Giraudel JL , Lek S 2001 A comparison of self-organizing map algorithm and some conventional statistical methods for ecological community ordination Ecol Model 146 329 - 339
Goethals P , Dedecker A , Gabriels W , De Pauw N 2003 Development and application of predictive river ecosystem models based on classification trees and artificial neural networks. 91-107 Springer-Verlag New York NY In: Ecological Informatics (Recknagel F ed).
Goethals PLM , Dedecker AP , Gabriels W , Lek S , De Pauw N 2007 Applications of artificial neural networks predicting macroinvertebrates in freshwaters Aquat Ecol 41 491 - 508    DOI : 10.1007/s10452-007-9093-3
Ha K , Joo GJ 2000 Role of silica in phytoplankton succession: an enclosure experiment in the downstream Nakdong River (Mulgum). Korean J Ecol 23 299 - 307    KOI: KISTI1.1003/JNL.JAKO200011921336281
Ha K , Jang MH , Joo GJ 2003 Winter Stephanodiscus bloom development in the Nakdong River regulated by an estuary dam and tributaries. Hydrobiologia 506 221 - 227    DOI : 10.1023/B:HYDR.0000008564.64010.4c
Ha K , Cho EA , Kim HW , Joo GJ 1999 Microcystis bloom formation in the lower Nakdong River South Korea: importance of hydrodynamics and nutrient loading Mar Freshw Res 50 89 - 94    DOI : 10.1071/MF97039
Hakanson L , Boulion VV 2003 A general dynamic model to predict biomass and production of phytoplankton in lakes Ecol Model 165 285 - 301    DOI : 10.1016/S0304-3800(03)00096-6
Harding LW , Perry ES 1997 Long-term increase of phytoplankton biomass in Chesapeake Bay 1950-1994 Mar Ecol Prog Ser 157 39 - 52    DOI : 10.3354/meps157039
Holland JH 1975 Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology Control and Artificial Intelligence University of Michigan Press Ann Arbor MI.
Icaga Y 2005 Genetic algorithm usage in water quality monitoring networks optimization in Gediz (Turkey) river basin. Environ Monit Assess 108 261 - 277    DOI : 10.1007/s10661-005-4328-z
Jeong KS , Recknagel F , Joo GJ 2006 Prediction and elucidation of population dynamics of a blue-green algae (Microcystis aeruginosa) and diatom (Stephanodiscus hantzschii) in the Nakdong River-Reservoir System (South Korea) by artificial neural networks. 255-273 Springer Berlin In: Ecological Informatics: Scope Techniques and Applications (Recknagel F ed)
Jeong KS , Kim DK , Whigham P , Joo GJ 2003 Modelling Microcystis aeruginosa bloom dynamics in the Nakdong River by means of evolutionary computation and statistical approach Ecol Model 161 67 - 78    DOI : 10.1016/S0304-3800(02)00280-6
Jeong KS , Joo GJ , Kim HW , Ha K , Recknagel F 2001 Prediction and elucidation of phytoplankton dynamics in the Nakdong River (Korea) by means of a recurrent artificial neural network Ecol Model 146 115 - 129    DOI : 10.1016/S0304-3800(01)00300-3
Jeong KS , Kim DK , Jung JM , Kim MC , Joo GJ 2008 Non-linear autoregressive modelling by Temporal Recurrent Neural Networks for the prediction of freshwater phytoplankton dynamics Ecol Model 211 292 - 300    DOI : 10.1016/j.ecolmodel.2007.09.029
Joo GJ , Kim HW , Ha K , Kim JK 1997 Long-term trend of the eutrophication of the lower Nakdong River Korean J Limnol 30 472 - 480
Jørgensen SE 1992 Integration of Ecosystem Theories: A Pattern Kluwer Academic Publishers Dordrecht.
Khu ST , Liong SY , Babovic V , Madsen H , Muttil N 2001 Genetic programming and its application in real-time runoff forecasting J Am Water Resour Assoc 37 439 - 451    DOI : 10.1111/j.1752-1688.2001.tb00980.x
Kilham P , Kilham SS , Hecky RE 1986 Hypothesized resource relationships among African planktonic diatoms Limnol Oceanogr 31 1169 - 1181    DOI : 10.4319/lo.1986.31.6.1169
Kim DK , Jeong KS , Whigham PA , Joo GJ 2007a Winter diatom blooms in a regulated river in South Korea: explanations based on evolutionary computation Freshw Biol 52 2021 - 2041    DOI : 10.1111/j.1365-2427.2007.01804.x
Kim DK , Cao H , Jeong KS , Recknagel F , Joo GJ 2007b Predictive function and rules for population dynamics of Microcystis aeruginosa in the regulated Nakdong River (South Korea) discovered by evolutionary algorithms Ecol Model 203 147 - 156    DOI : 10.1016/j.ecolmodel.2006.03.040
Kim G , Kim Y , Song M , Ji K , Yu P , Kim C 2007c Evaluation of water quality characteristics in the Nakdong River using multivariate analysis. J Korean Soc Water Qual 23 814 - 821
Kim HW , Ha K , Joo GJ 1998 Eutrophication of the lower Nakdong River after the construction of an estuarine dam in 1987. Int Rev Hydrobiol 83 65 - 72    DOI : 10.1002/iroh.19980830107
Kim HW , Hwang SJ , Joo GJ 2000 Zooplankton grazing on bacteria and phytoplankton in a regulated large river (Nakdong River Korea). J Plankton Res 22 1559 - 1577    DOI : 10.1093/plankt/22.8.1559
Kim LH , Choi E , Gil KI , Stenstrom MK 2004 Phosphorus release rates from sediments and pollutant characteristics in Han River Seoul Korea. Sci Total Environ 321 115 - 125
Koza JR 1992 Genetic Programming: On the Programming of Computers by Means of Natural Selection The MIT Press New York NY
Lavric V , Iancu P , Pleșu V 2005 Genetic algorithm optimisation of water consumption and wastewater network topology J Clean Prod 13 1405 - 1415    DOI : 10.1016/j.jclepro.2005.04.014
Jørgensen KS 2004 Optimal operation rules for multireservoir systems using genetic algorithm J Korean Soc Civil Eng KOI: KISTI1.1003/JNL.JAKO200410103402860 24 9 - 17
Lek S 2007 Uncertainty in ecological models Ecol Model 207 1 - 2    DOI : 10.1016/j.ecolmodel.2007.03.015
Lek S , Delacoste M , Baran P , Dimopoulos I , Lauga J , Aulagnier S 1996 Application of neural networks to modelling nonlinear relationships in ecology Ecol Model 90 39 - 52    DOI : 10.1016/0304-3800(95)00142-5
Lotka AJ 1925 Elements of Physical Biology. Dover Publications New York NY.
Magadza CHD 1980 The distribution of zooplankton in the Sanyati Bay Lake Kariba: a multivariate analysis Hydrobiologia 70 57 - 67    DOI : 10.1007/BF00015491
Makkeasorn A , Chang NB , Li J 2009 Seasonal change detection of riparian zones with remote sensing images and genetic programming in a semi-arid watershed. J Environ Manag 90 1069 - 1080    DOI : 10.1016/j.jenvman.2008.04.004
Matta JF , Marshall HG 1984 A multivariate analysis of phytoplankton assemblages in the western North Atlantic J Plankton Res 6 663 - 675    DOI : 10.1093/plankt/6.4.663
McKay RIB , Hao HT , Mori N , Hoai NX , Essam D 2006 Model-building with interpolated temporal data Ecol Inform 1 259 - 268    DOI : 10.1016/j.ecoinf.2006.02.005
McNyset KM 2005 Use of ecological niche modelling to predict distributions of freshwater fish species in Kansas Ecol Freshw Fish 14 243 - 255    DOI : 10.1111/j.1600-0633.2005.00101.x
Mishra AK , Desai VR 2006 Drought forecasting using feed-forward recursive neural network Ecol Model 198 127 - 138    DOI : 10.1016/j.ecolmodel.2006.04.017
Odum HT 1983 Ecological and General Systems: An Introduction to Systems Ecology University Press of Colorado Niwot CO.
Paik K , Kim JH , Kim HS , Lee DR 2005 A conceptual rainfall-runoff model considering seasonal variation Hydrol Process 19 3837 - 3850    DOI : 10.1002/hyp.5984
Park SS , Lee YS 2002 A water quality modeling study of the Nakdong River Korea. Ecol Model 152 65 - 75    DOI : 10.1016/S0304-3800(01)00489-6
Park SY , Choi JH , Wang S , Park SS 2006a Design of a water quality monitoring network in a large river system using the genetic algorithm. Ecol Model 199 289 - 297    DOI : 10.1016/j.ecolmodel.2006.06.002
Park YS , Tison J , Lek S , Giraudel JL , Coste M , Delmas F 2006b Application of a self-organizing map to select representative species in multivariate analysis: a case study determining diatom distribution patterns across France Ecol Inform 1 247 - 257    DOI : 10.1016/j.ecoinf.2006.03.005
Pelletier GJ , Chapra SC , Tao H 2006 QUAL2Kw: a framework for modeling water quality in streams and rivers using a genetic algorithm for calibration. Environ Model Software 21 419 - 425    DOI : 10.1016/j.envsoft.2005.07.002
Peterson AT , Ball LG , Cohoon KP 2002 Predicting distributions of Mexican birds using ecological niche modelling methods. Ibis 144 E27 - E32
Rabunal JR , Puertas J , Suarez J , Rivero D 2007 Determination of the unit hydrograph of a typical urban basin using genetic programming and artificial neural networks Hydrol Process 21 476 - 485    DOI : 10.1002/hyp.6250
Recknagel F 2001 Applications of machine learning to ecological modelling. Ecol Model 146 303 - 310    DOI : 10.1016/S0304-3800(01)00316-7
Recknagel F 2006 Ecological Informatics: Scope Techniques and Applications. Springer-Verlag Berlin
Recknagel F , Benndorf J 1982 Validation of the ecological simulation model “SALMO”. Int Rev Gesamten Hydrobiol 67 113 - 125
Recknagel F , Bobbin J , Whigham P , Wilson H 2002 Comparative application of artificial neural networks and genetic algorithms for multivariate time-series modelling of algal blooms in freshwater lakes J Hydroinformatics 4 125 - 133
Recknagel F , van Ginkel C , Cao H , Cetin L , Zhang B 2008 Generic limnological models on the touchstone: testing the lake simulation library SALMO-OO and the rule-based Microcystis agent for warm-monomictic hypertrophic lakes in South Africa Ecol Model 215 144 - 158
Romo S , Van Donk E , Gylstra R , Gulati R 1996 A multivariate analysis of phytoplankton and food web changes in a shallow biomanipulated lake. Freshw Biol 36 683 - 696    DOI : 10.1046/j.1365-2427.1996.d01-511.x
Savic DA , Walters GA , Davidson JW 1999 A genetic programming approach to rainfall-runoff modelling. Water Resour Manag 13 219 - 231    DOI : 10.1023/A:1008132509589
Schaefer MB 1968 Methods of estimating effects of fishing on fish populations. Trans Am Fish Soc 97 231 - 241    DOI : 10.1577/1548-8659(1968)97[231:MOEEOF]2.0.CO;2
Shan Y , Paull D , McKay RI 2006 Machine learning of poorly predictable ecological data Ecol Model 195 129 - 138    DOI : 10.1016/j.ecolmodel.2005.11.015
Shim SB , Oh YS , Lee YS , Koh DK 1995 Eutrophication forecasting of Daechong Reservoir using WASP5 water quality model J Inst Constr Technol 14 41 - 53
Silvert W 1997 Ecological impact classification with fuzzy sets Ecol Model 96 1 - 10    DOI : 10.1016/S0304-3800(96)00051-8
Stockman AK , Beamer DA , Bond JE 2006 An evaluation of a GARP model as an approach to predicting the spatial distribution of non-vagile invertebrate species Divers Distrib 12 81 - 89    DOI : 10.1111/j.1366-9516.2006.00225.x
ter Braak CJF , Verdonschot PFM 1995 Canonical correspondence analysis and related multivariate methods in aquatic ecology Aquat Sci 57 255 - 289
Underwood EC , Klinger R , Moore PE 2004 Predicting patterns of non-native plant invasions in Yosemite National Park California USA. Divers Distrib 10 447 - 459    DOI : 10.1111/j.1366-9516.2004.00093.x
van Tongeren OFR , van Liere L , Gulati RD , Postema G , Boesewinkel-De Bruyn PJ 1992 Multivariate analysis of the plankton communities in the Loosdrecht lakes: relationship with the chemical and physical environment Hydrobiologia 233 105 - 117    DOI : 10.1007/BF00016100
Volterra V 1926 Fluctuations in the abundance of a species considered mathematically Nature 118 558 - 560    DOI : 10.1038/118558a0
Welk A , Recknagel F , Cao H , Chan WS , Talib A 2008 Rule-based agents for forecasting algal population dynamics in freshwater lakes discovered by hybrid evolutionary algorithms Ecol Inform 3 46 - 54    DOI : 10.1016/j.ecoinf.2007.12.002
Whigham PA 2000 Induction of a marsupial density model using genetic programming and spatial relationships Ecol Model 131 299 - 317    DOI : 10.1016/S0304-3800(00)00248-9
Whigham PA , Recknagel F 2001a An inductive approach to ecological time series modelling by evolutionary computation. Ecol Model 146 275 - 287    DOI : 10.1016/S0304-3800(01)00313-1
Whigham PA , Recknagel F 2001b Predicting chlorophyll-a in freshwater lakes by hybridising process-based models and genetic algorithms. Ecol Model 146 243 - 251    DOI : 10.1016/S0304-3800(01)00310-6
Yoo HS 2002 Statistical analysis of factors affecting the Han River water quality J Korean Soc Environ Engin 24 2139 - 2150    KOI: KISTI1.1003/JNL.JAKO200216240803954
Zar JH 1999 Biostatistical Analysis Prentice-Hall Upper Saddle River NJ
Zuur AF , Ieno EN , Walker NJ , Saveliev AA , Smith GM 2009 Limitations of linear regression applied on ecological data. 11-33 Springer New York NY In: Mixed Effects Models and Extensions in Ecology with R (Zuur AF Ieno EN Walker NJ Saveliev AA Smith GM eds).