Modern Probabilistic Machine Learning and Control Methods for Portfolio Optimization
International Journal of Fuzzy Logic and Intelligent Systems. 2014. Jun, 14(2): 73-83
Copyright © 2014, Korean Institute of Intelligent Systems
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
  • Received : May 21, 2014
  • Accepted : June 24, 2014
  • Published : June 25, 2014
About the Authors
Jooyoung Park
Department of Control & Instrumentation Engineering, Korea University
Jungdong Lim
Department of Control & Instrumentation Engineering, Korea University
Wonbu Lee
Department of Control & Instrumentation Engineering, Korea University
Seunghyun Ji
Department of Control & Instrumentation Engineering, Korea University
Keehoon Sung
Department of Control & Instrumentation Engineering, Korea University
Kyungwook Park
School of Business Administration, Korea University

Abstract
Many recent theoretical developments in the field of machine learning and control have rapidly expanded its relevance to a wide variety of applications. In particular, a variety of portfolio optimization problems have recently been considered as a promising application domain for machine learning and control methods. In highly uncertain and stochastic environments, portfolio optimization can be formulated as optimal decision-making problems, and for these types of problems, approaches based on probabilistic machine learning and control methods are particularly pertinent. In this paper, we consider probabilistic machine learning and control based solutions to a couple of portfolio optimization problems. Simulation results show that these solutions work well when applied to real financial market data.
Keywords
1. Introduction
Recent theoretical progress in the field of machine learning and control has many implications for related academic and professional fields. The field of financial engineering is one particular area that has benefited greatly from these advancements. Portfolio optimization problems [1-8] and the pricing/hedging of derivatives [9] can be performed more effectively using recently developed machine learning and control methods. In particular, since portfolio optimization problems are essentially optimal decision-making problems that rely on actual data observed in a stochastic environment, theoretical and practical solutions can be formulated in light of recent advancements. These problems include the traditional mean-variance efficient portfolio problem [10], index tracking portfolio formulation [6-8, 11], risk-adjusted expected return maximizing strategy [1, 2, 12], trend following strategy [13-17], long-short trading strategy (including the pairs trading strategy) [13, 18-20], and behavioral portfolio management.
Modern machine learning and control methods can effectively handle almost all of the portfolio optimization problems just listed. In this paper, we consider a solution to the trend following trading problem based on the natural evolution strategy (NES) [21-23, 25] and a risk-adjusted expected profit maximization problem based on an approximate value function (AVF) method [27-30].
This paper is organized as follows: In Section 2, we briefly discuss relevant probabilistic machine learning and control methods. The exponential NES and iterated approximate value function method, which are the two main tools employed in this paper, are also summarized. Solutions to the trend following trading problem and the risk-adjusted expected profit maximization problem as well as simulation results using real financial market data are presented in Section 3. Finally, in Section 4, we present our concluding remarks.
2. Modern Probabilistic Machine Learning and Control Methods
In this section, we describe relevant advanced versions of the NES and AVF methods that will be applied later in this paper.
The NES method belongs to the family of evolution strategy (ES) type optimization methods. Evolution strategies, in general, attempt to optimize utility functions that cannot be modeled directly but can be efficiently sampled by users. A probability distribution (typically, a multivariate Gaussian distribution) is utilized by NES to generate a group of candidate solutions. In the process of updating distribution parameters based on the utility values of candidates, NES employs the so-called natural gradient [23, 31] to obtain a sample-based search direction. In other words, the main idea of NES is to follow a sampled natural gradient of expected utility to update the search distribution. The samples of NES are generated according to the search distribution π(·|θ), and by utilizing these samples, NES tries to locate a parameter update direction that will increase the performance index, J(θ). This performance index is defined as the expected value of the utility, f(z), under the search distribution:
J(θ) = E_{z∼π(·|θ)}[f(z)] = ∫ f(z) π(z|θ) dz.
Note that by the log-likelihood trick, the gradient of the expected utility with respect to the search distribution parameter θ can be expressed as

∇_θ J(θ) = E_{z∼π(·|θ)}[ f(z) ∇_θ log π(z|θ) ].
Hence, when we have independent and identically distributed samples z_i, i ∈ {1, ..., n}, a sample-based approximation to the regular policy gradient (often referred to as the vanilla gradient) of the expected utility can be expressed as

∇_θ J(θ) ≈ (1/n) Σ_{i=1}^{n} f(z_i) ∇_θ log π(z_i|θ).
It is widely accepted that using the natural gradient is more advantageous than the vanilla gradient when it is necessary to search for optimal distribution parameters while staying close to the present search distribution [23, 31]. In the natural gradient based search method, the search direction is obtained by replacing the gradient ∇_θ J(θ) with the natural gradient defined by F(θ)^{-1} ∇_θ J(θ), where F(θ) = E[∇_θ log π(x|θ) ∇_θ log π(x|θ)^T] is the Fisher information matrix. Note that the Fisher information matrix can be estimated from samples. Therefore, the core procedure of the NES algorithm can be summarized as follows [22]:
Preliminary steps:
  • 1. Choose the learning rate, η, the number of samples in each generation, n, and the utility function, f.
  • 2. Initialize the parameter θ of the search distribution π(·|θ).
Main steps: Repeat the following procedure until the stopping condition is met (a minimal code sketch of the loop is given after step 4).
1. For i = 1, ..., n: draw a sample z_i from the current search distribution; compute the utility of the sample, f(z_i); and compute the gradient of the log-likelihood, ∇_θ log π(z_i|θ).
2. Obtain the Monte-Carlo estimate of the gradient:

∇_θ J(θ) ≈ (1/n) Σ_{i=1}^{n} f(z_i) ∇_θ log π(z_i|θ).
3. Obtain the Monte-Carlo estimate of the Fisher information matrix:

F(θ) ≈ (1/n) Σ_{i=1}^{n} ∇_θ log π(z_i|θ) ∇_θ log π(z_i|θ)^T.
4. Update the parameter θ of the search distribution:

θ ← θ + η F(θ)^{-1} ∇_θ J(θ) (using the Monte-Carlo estimates above).
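To make the procedure above concrete, the following minimal sketch (written in Python, which is an assumption since the paper does not specify an implementation language) runs the four main steps for a Gaussian search distribution whose mean is the only adapted parameter; the toy utility function and all parameter values are illustrative, not the settings used in our experiments.

```python
import numpy as np

def nes_minimal(f, dim=2, n=50, eta=0.1, sigma=0.3, iters=200, seed=0):
    """Minimal NES sketch: search distribution N(mu, sigma^2 I) with only the
    mean adapted; vanilla gradient and Fisher matrix estimated from samples."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(dim)                                  # search-distribution parameter
    for _ in range(iters):
        z = mu + sigma * rng.standard_normal((n, dim))  # step 1: draw samples z_i
        u = np.array([f(zi) for zi in z])               # utilities f(z_i)
        u = u - u.mean()                                # baseline subtraction (variance-reduction modification noted in the text)
        g_log = (z - mu) / sigma**2                     # grad_mu log pi(z_i | mu)
        grad = (u[:, None] * g_log).mean(axis=0)        # step 2: vanilla-gradient estimate
        F = (g_log[:, :, None] * g_log[:, None, :]).mean(axis=0)  # step 3: Fisher estimate
        mu = mu + eta * np.linalg.solve(F, grad)        # step 4: natural-gradient update
    return mu

# toy usage: maximize a concave utility with optimum near (1, -1)
print(nes_minimal(lambda z: -np.sum((z - np.array([1.0, -1.0]))**2)))
```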
The procedure shown above is a basic form of NES and can be modified based on the application. For example, the concept of the baseline can be employed to reduce the estimation variance [21]. Recent improvements to the NES procedure can be found in [21-23, 25], and one of the most remarkable improvements is the exponential NES [23, 25]. The main idea of the exponential NES method is to represent the covariance matrix of the multivariate Gaussian distribution, π(·|θ), using the following matrix exponential map:
C = exp(M) = Σ_{k=0}^{∞} M^k / k!.
Two remarkable advantages of using the matrix exponential are that it enables the covariance matrix to be updated in a vector space, and it makes the resultant algorithm invariant under linear transformations. The key idea of the exponential NES is the use of natural coordinates defined according to the following change of variables [23]:

(δ, M) ↦ (μ + A^T δ, A^T exp(M) A),
where A and M are matrix variables satisfying C = A^T A = exp(M) for the covariance matrix C of the search distribution. The use of natural coordinates renders the step of inverting the Fisher information matrix unnecessary, hence bypassing a major computational burden of the original NES. In Section 3, we apply the exponential NES, which is now a state-of-the-art evolution strategy, to the problem of finding a flexible long-flat-short type rule for a trend following trading strategy.
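For illustration, the sketch below performs one generation of an exponential-NES-style update in the natural coordinates just described, following the update form reported in [23, 25]; the fitness-shaping rule and the learning rates are simplifying assumptions of this sketch rather than details taken from the experiments reported later.

```python
import numpy as np
from scipy.linalg import expm

def xnes_generation(f, mu, A, n=20, eta_mu=1.0, eta_A=0.1, rng=None):
    """One generation of an exponential-NES-style update (sketch).
    Search distribution: N(mu, C) with C = A.T @ A; samples z = mu + A.T @ s, s ~ N(0, I)."""
    rng = np.random.default_rng() if rng is None else rng
    d = mu.size
    S = rng.standard_normal((n, d))                      # standard-normal samples s_k
    Z = mu + S @ A                                       # z_k = mu + A.T s_k (row form)
    util = np.array([f(z) for z in Z])
    w = (util - util.mean()) / (util.std() + 1e-12)      # simple utility shaping (assumption)
    G_delta = (w[:, None] * S).mean(axis=0)              # natural gradient w.r.t. the mean
    G_M = (w[:, None, None] *
           (S[:, :, None] * S[:, None, :] - np.eye(d))).mean(axis=0)  # w.r.t. log-covariance
    mu_new = mu + eta_mu * A.T @ G_delta                 # mean update in natural coordinates
    A_new = A @ expm(0.5 * eta_A * G_M)                  # multiplicative covariance-factor update
    return mu_new, A_new
```

Note that the covariance factor is updated multiplicatively through the matrix exponential, so no Fisher matrix is ever formed or inverted, which is exactly the computational advantage mentioned above.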
Another tool used in the portfolio optimization applications of this paper is a special class of approximate value function methods [27-30]. In general, stochastic optimal control problems can be solved by utilizing state value functions, which quantify the achievable performance starting from a given state. Solving stochastic optimal control problems by means of the value function is known as dynamic programming. For more details on the various applications of dynamic programming, please refer to [32, 33]. Solving stochastic control problems by dynamic programming corresponds to finding the best state-feedback control policy

u(t) = φ(x(t)),  φ : X → U,
to optimize the performance index subject to the specified constraints and dynamics

x(t + 1) = f(x(t), u(t), w(t)),  t = 0, 1, 2, ...,
where x(t) is the state, u(t) is the control input, and w(t) is the disturbance. The expected sum of discounted stage costs,

J = E[ Σ_{t=0}^{∞} γ^t ℓ(x(t), u(t)) ],

with discount factor γ ∈ (0, 1) and stage cost ℓ(·,·), is widely used as the performance index. The minimal performance index value, J*, is obtained by minimizing the performance index over all admissible control policies; the optimal control policy achieving J* is denoted by φ*.
The state value function is defined as the minimum total expected cost achieved by an optimal control policy for the given initial state x(0) = z. Formally,

V*(z) = min_φ E[ Σ_{t=0}^{∞} γ^t ℓ(x(t), u(t)) | x(0) = z, u(t) = φ(x(t)) ].
Note that the optimal performance index value is J* = V*(x_0) for the given initial condition x_0. The state value function defined above is the fixed point of the following Bellman equation:

V*(z) = min_{v∈U} E_w[ ℓ(z, v) + γ V*(f(z, v, w)) ].
In operator equation form, this fixed point property can be written as

V* = T V*,
where the Bellman operator T is defined by

(T V)(z) = min_{v∈U} E_w[ ℓ(z, v) + γ V(f(z, v, w)) ].
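To illustrate the fixed-point property, the following toy example applies the Bellman operator T repeatedly (value iteration) to a small finite Markov decision process; the two-state transition probabilities and stage costs are hypothetical numbers chosen only for illustration.

```python
import numpy as np

# Toy MDP (hypothetical numbers): 2 states, 2 actions.
# P[a, i, j] = Prob(next state = j | state = i, action = a); ell[i, a] = stage cost.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
ell = np.array([[1.0, 2.0],
                [0.5, 3.0]])
gamma = 0.95

def bellman_T(V):
    """Bellman operator: (T V)(i) = min_a [ ell(i, a) + gamma * E[V(next) | i, a] ]."""
    Q = ell + gamma * np.einsum('aij,j->ia', P, V)   # Q[i, a]
    return Q.min(axis=1)

V = np.zeros(2)
for _ in range(500):                                 # value iteration: V <- T V
    V_next = bellman_T(V)
    if np.max(np.abs(V_next - V)) < 1e-10:           # stop at (numerical) fixed point
        break
    V = V_next
policy = (ell + gamma * np.einsum('aij,j->ia', P, V)).argmin(axis=1)
print("V* approx:", V, " greedy policy:", policy)
```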
Because the optimal control policy φ* that satisfies the Bellman equation is hard to compute except in special cases [32], an approximate value function (AVF), V̂ ≈ V*, together with an associated approximate policy φ̂ : X → U, is utilized to obtain approximate solutions to the stochastic control problem. In an example considering a set of real financial market data, we utilize an approximate dynamic programming (ADP) based solution procedure built on the iterated AVF policy approach of O'Donoghue, Wang, and Boyd [29]. In the iterated AVF method [29], by letting the parameters of the approximate value functions V̂_0, ..., V̂_M satisfy the iterated Bellman inequalities

V̂_t ≤ T V̂_{t+1},  t = 0, ..., M,

with V̂_{M+1} = V̂_M, it is ensured that V̂_0 is a lower bound of the optimal state value function V* [27, 29]. Also, by optimizing this lower bound V̂_0(x_0) via convex optimization, the iterated AVF approach finds the approximate state value functions V̂_0, ..., V̂_M and the associated policies φ̂_0, ..., φ̂_M. Note that in this step, the associated iterated AVF policies are obtained by

φ̂_t(z) = argmin_{v∈U} E_w[ ℓ(z, v) + γ V̂_{t+1}(f(z, v, w)) ]

for t = 0, ..., M, and φ̂_t = φ̂_M for t > M [29].
3. Machine Learning and Control Based Portfolio Optimization
In this section, we present probabilistic machine learning and control based solutions to two important portfolio optimization problems: the trend following trading problem and the risk-adjusted expected profit maximization problem. The first topic of our portfolio application concerns trading strategy. There has been a great deal of theoretical study on trading strategies in financial markets; however, a majority of it has focused on the contra-trend strategy, which includes trading policies for a mean-reverting market [14]. Research interest in trend-following trading rules is also growing. A strong mathematical foundation has led to some important theorems regarding the trend following trading strategy [14-17]. We consider an exponential NES based solution to find an efficient trend following strategy. One of the key references for our solution is the stochastic control approach of Dai et al. [14, 15]. In [14], the authors considered a bull-bear switching market, where the drift of the asset price switches back and forth between two values representing the bull market mode and the bear market mode. These switching patterns follow an unobservable Markov chain. More precisely, the governing equation for the asset price, S_r, in [14] is given by
dS_r = S_r [ μ(α_r) dr + σ dB_r ],
where α_r ∈ {1, 2} is the mode of the market at time r, and its movement follows a two-state Markov chain. The first state, α_r = 1, represents the bull market mode, and the second state, α_r = 2, represents the bear market mode. The drifts, μ(1) = μ_1 and μ(2) = μ_2, represent the expected return rates in the bull and bear markets, respectively. Clearly, these drift values must satisfy μ_1 > 0 and μ_2 < 0. The Markov chain for the movement of α_r is described by the following generator [14]:
Q = [ −λ_1   λ_1
       λ_2  −λ_2 ].
Note that in this Markov chain generator, λ_1 and λ_2 are the switching intensities for the bull-to-bear transition and bear-to-bull transition, respectively. In [14], it is assumed that the switching market mode, {α_r}, and the Brownian motion for the asset price variation, {B_r}, are independent. Moreover, Dai et al. [14] showed that the optimal trend following long-flat type trading rules can be found by solving the associated Hamilton-Jacobi-Bellman (HJB) equation and employing the conditional probability for the bull market to generate the trade signals. In their problem formulation, the transaction cost and risk-free interest rate were fixed at 100K% and ρ, respectively. The resulting optimal buying times, τ_1, τ_2, ..., and selling times, v_1, v_2, ..., were obtained by optimizing a performance index measuring the expected discounted log-return of the sequence of trades, net of transaction costs, for both the initially flat and the initially long cases [14]. The results of [14] are mathematically rigorous and establish a strong theoretical justification for the trend following trading theory. We utilize a less mathematical, but hopefully easier to understand, approach based on the exponential NES method [23, 25] for the trend following trading problem. Note that in the exponential NES approach, the performance index may be chosen with more flexibility. This paper extends our previous work on this topic [13] in two ways. First, we utilize a more advanced version of the NES, the exponential NES [23, 25], which is now the state-of-the-art method in the field. Second, we focus on more flexible long-flat-short type trading rules, whereas our previous paper considered only the long-flat strategy. According to [14], in the case of the finite-horizon problem for obtaining the optimal trend following long-flat type trading rule, there exist two monotonically increasing optimal sell and buy boundaries, expressed as thresholds on the conditional probability of the bull market.
When long-term investments are emphasized, the behavior of the system is similar to the infinite horizon case, and these threshold functions can be approximated by constants [14, 15]. Following this approximation scheme, we try to find threshold constants for each of the position transitions that can occur in long-flat-short type trading by applying the exponential NES method [23, 25]. The price-series sample paths generated in accordance with the switching geometric Brownian motion model above are required during the training phase. Simulating both Markov chains and geometric Brownian motions is not difficult; thus, the price-series sample path generation can be performed efficiently. By combining the thresholds found by the exponential NES method with the Wonham filter [14, 34], which estimates the conditional probability that the market mode is bull, one can obtain a long-flat-short type trend following trading strategy. To illustrate the applicability of the exponential NES based trading strategy, we considered the problem of determining a trend following trading rule for the NASDAQ index. For this example, NASDAQ closing data from 1991 to 2008 was considered (see Fig. 2). According to the estimation results of [14], the parameters of the switching geometric Brownian motion for the NASDAQ are as follows: μ_1 = 0.875, μ_2 = −1.028, σ_1 = 0.273, σ_2 = 0.35, σ = 0.31, λ_1 = 2.158, λ_2 = 2.3, where σ is the simple average of σ_1 and σ_2. Furthermore, the ratio of slippage per transaction and the risk-free interest rate were fixed as K = 0.001 and ρ = 5.4%, respectively. For training data, we generated episodes by utilizing these parameter estimation results. By performing the exponential NES based training on these episodes, we obtained the threshold values for the trend following trading rules.
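As an illustration of how such training episodes can be produced, the sketch below simulates the bull-bear switching geometric Brownian motion and an Euler-discretized Wonham filter for the conditional bull-market probability; the discretization scheme and step size are our own simplifying assumptions, not necessarily the exact procedure behind the reported results.

```python
import numpy as np

def simulate_switching_gbm(T=252, dt=1/252, S0=1.0,
                           mu=(0.875, -1.028), sigma=0.31,
                           lam=(2.158, 2.3), seed=0):
    """Sample path of the bull/bear switching GBM and a discretized Wonham filter
    for p = P(bull market | price history). Euler discretization (sketch)."""
    rng = np.random.default_rng(seed)
    mu1, mu2 = mu
    lam1, lam2 = lam                        # bull->bear and bear->bull intensities
    alpha = 0                               # hidden mode: 0 = bull, 1 = bear
    S = np.empty(T + 1); S[0] = S0
    p = np.empty(T + 1); p[0] = 0.5         # filtered probability of the bull mode
    for k in range(T):
        # hidden regime switch over one step (small-dt approximation)
        if rng.random() < (lam1 if alpha == 0 else lam2) * dt:
            alpha = 1 - alpha
        drift = mu1 if alpha == 0 else mu2
        ret = drift * dt + sigma * np.sqrt(dt) * rng.standard_normal()   # dS/S over the step
        S[k + 1] = S[k] * (1.0 + ret)
        # Wonham filter, Euler step in innovation form (assumption of this sketch)
        pk = p[k]
        innov = ret - (pk * mu1 + (1 - pk) * mu2) * dt
        pk += (lam2 * (1 - pk) - lam1 * pk) * dt \
              + pk * (1 - pk) * (mu1 - mu2) / sigma**2 * innov
        p[k + 1] = min(max(pk, 0.0), 1.0)
    return S, p

S, p = simulate_switching_gbm()
print(S[-1], p[-1])
```

In a long-flat-short rule, trades are triggered whenever the filtered probability p crosses the threshold constants found by the exponential NES training.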
Figures 1-6 show the simulation results for the exponential NES based trend following trading rules. For these simulations, we set the initial wealth to one. Figure 1 shows the learning curve, which graphs the average total cost sums versus the policy update over a set of 10 simulation runs. As shown in the curve, the exponential NES method exhibits desirable behavior within fewer than 250 policy updates. This indicates that the exponential NES method works well for finding a long-flat-short type trading strategy. Figure 2 shows the NASDAQ index values together with the long-flat-short trading signals resulting from a policy obtained by the exponential NES method over the entire period. For comparison purposes, we also show the long-flat trading signals obtained by the NES approach [13]. According to Fig. 2, the total number of position changes in the long-flat-short type trading strategy (2nd panel) is 345. Note that this value differs from the corresponding number of position changes in the long-flat type trading strategy (3rd panel, 100 changes) obtained by the NES approach [13]. Simulation results in Fig. 3 show that with the exponential NES based long-flat-short type trading strategy, trade returns are generally large and wealth steadily increases until it reaches 21.52 at the final time. In comparison, Fig. 4 shows that with the long-flat type trading rule, the trade returns are generally small, and wealth increases at a relatively slower rate. Its wealth value at the final time is only 10.65. From these simulation results, one can see that when K = 0.001, allowing short positions changed the number of trades and significantly improved wealth. To investigate the robustness of NES based trading rules against system changes, we performed simulations for various values of the transaction cost ratio. In particular, we considered the case when the transaction cost increased tenfold (i.e., K = 0.01); simulation results are shown in Figs. 5 and 6. Figure 5 shows the trading positions resulting from the long-flat-short type strategy (top) and the long-flat type strategy (bottom) when K = 0.01. When the trading cost increases, both the long-flat-short type strategy and the long-flat type strategy trade less frequently compared to the case when K = 0.001. Interestingly, the trading frequency of the long-flat-short type strategy decreased at a slower rate than that of the long-flat type trading strategy. We believe that this difference is due to the fact that the long-flat-short type strategy has more flexibility and can cope with system changes with less sensitivity. Finally, Fig. 6 shows the wealth resulting from the long-flat-short type policy (top) and the long-flat type policy (bottom) when K = 0.01. Note that the wealth values of the long-flat-short type strategy and the long-flat type strategy at the final time are 10.85 and 5.09, respectively. These values are still considerably larger than the wealth values of the buy-and-hold strategy (3.34) and the risk-free interest rate (2.74).
Figure 1. Learning curve.
Figure 2. NASDAQ index and trading position when K = 0.001.
Figure 3. Trade return and wealth resulting from the long-flat-short type policy when K = 0.001.
Figure 4. Trade return and wealth resulting from the long-flat type policy when K = 0.001.
Figure 5. Trading position resulting from the long-flat-short type policy (top) and the long-flat type policy (bottom) when K = 0.01.
Figure 6. Wealth resulting from the long-flat-short type policy (top) and the long-flat type policy (bottom) when K = 0.01.
For the second application example, we considered the risk-adjusted expected profit maximization problem and utilized an AVF [27-30] based procedure to find an efficient solution. To express the risk-adjusted expected profit maximization problem in state-space format, it is necessary to define the state and control input together with the performance index that is used as an optimization criterion. To do this, we follow the research of Boyd et al. [1, 28, 30]. We define the state vector as the collection of the portfolio positions. Let x_i(t) denote the dollar value of asset i at the beginning of time t. Then the state vector is given by

x(t) = (x_1(t), ..., x_n(t)) ∈ R^n.
The control input considered for this problem is a vector of trades,

u(t) = (u_1(t), ..., u_n(t)) ∈ R^n,
executed for portfolio x(t) at the beginning of each time step t. Note that u_i(t) represents buying or selling assets; the asset associated with x_i(t) is bought when u_i(t) > 0 and sold when u_i(t) < 0. Having these state and input definitions, the state transition is given by

x(t + 1) = diag(r(t)) (x(t) + u(t)),

where r(t) is the vector of (gross) asset returns in period t. The return vector, r(t), is independent and identically distributed with mean vector r̄ and covariance matrix Σ. Note that the mean vector and covariance matrix do not change over time. For the performance index, we considered
J = E[ Σ_{t=0}^{∞} γ^t ℓ(x(t), u(t)) ],   ℓ(x, u) = 1^T u + λ (x + u)^T Σ (x + u) + u^T diag(s) u + κ ‖u‖_1,
where the total gross cash entered into the portfolio is 1^T u, λ (x + u)^T Σ (x + u) is the quadratic post-trade risk penalty, u^T diag(s) u is the quadratic transaction cost, and κ ‖u‖_1 is the linear transaction cost. Furthermore, λ ≥ 0 is the risk aversion parameter, s_i ≥ 0 is the price-impact cost for the i-th asset, and κ is the ratio of slippage per transaction. We considered the case when the initial portfolio x(0) was fixed at x_0. In general, when portfolio optimization problems are solved, the control input u(t) should satisfy certain naturally arising constraints. In particular, we considered the control input bound constraint:
‖u(t)‖_∞ ≤ U_bdd,  t = 0, 1, 2, ...,
which means that only a limited amount of trading is allowed for each asset. Thus, the risk-adjusted profit maximization problem can be expressed as
minimize   E[ Σ_{t=0}^{∞} γ^t ℓ(x(t), u(t)) ]
subject to x(t + 1) = diag(r(t)) (x(t) + u(t)),  ‖u(t)‖_∞ ≤ U_bdd,  x(0) = x_0,

where the minimization is over state-feedback policies u(t) = φ_t(x(t)).
To solve this optimization problem, we utilized the iterated AVF approach [29], which is one of the most advanced AVF methods. In the iterated AVF approach, convex quadratic functions
V̂_t(z) = z^T P_t z + 2 p_t^T z + q_t,  t = 0, ..., M,

are used to approximate the state value function at time t, and the (P_t, p_t, q_t) parameters satisfy a series of Bellman inequalities

V̂_t ≤ T V̂_{t+1},  t = 0, ..., M,

with V̂_{M+1} = V̂_M. The Bellman inequalities guarantee that V̂_0 is a lower bound of the optimal state value function V* [27-29]. The iterated AVF method maximizes this lower bound using convex optimization [29]. In addition, the input bound constraint ‖u‖_∞ ≤ U_bdd can be written in terms of quadratic inequalities as

(e_i^T u)^2 ≤ U_bdd^2,  i = 1, ..., n,
where e_i is the i-th column of the identity matrix I_n. This constraint enables us to obtain sufficient conditions for the constrained Bellman inequality requirements in the form of linear matrix inequalities (LMIs) using the S-procedure [28, 35]. As a real-market example of portfolio optimization, we examined an application of the iterated AVF approach [29] to a set of real financial market data [2, 6]. The data considered five major stocks: IBM, 3M, Altria, Boeing, and AIG (with ticker symbols IBM, MMM, MO, BA, and AIG, respectively). For the training data, we used the weekly prices of the five stocks from Jan. 2, 1990 to Dec. 27, 2004, and obtained the exponentially weighted moving average (EWMA) estimates of the mean return vector, r̄ (with an effective window size of three years), and the covariance matrix, Σ. During the test period (2005 to 2007), iterated AVF based trading was performed every four weeks (20 trading days). For the risk-free rate, we assumed ρ = 0.05 as in [2], and the discount factor, γ, was defined accordingly (i.e., γ = exp(−ρ/(52/4))). For the upper bound of the trading amount, we used U_bdd = 20. The coefficients of the risk penalty and transaction costs were λ = 0.005, κ = 0.005, and s_i = 0.005 for i = 1, ..., 5. For the iterated Bellman inequalities, we considered M = 150 time steps. Finally, the initial portfolio vector and the initial wealth level were chosen to be x_0 = [0, 0, 0, 0, 0]^T and W = 100, respectively.
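As a reference for the estimation step, the sketch below shows one standard way to form EWMA estimates of the mean return vector and covariance matrix; the recursion and the decay factor implied by the three-year (156-week) effective window are assumptions of this sketch, and the exact estimator of [2] may differ in detail.

```python
import numpy as np

def ewma_mean_cov(returns, effective_window=156):
    """EWMA estimates of the mean vector and covariance matrix of weekly returns.
    `returns` has shape (T, n); the decay factor gives an effective window of ~156 weeks."""
    lam = 1.0 - 1.0 / effective_window           # decay factor (sketch assumption)
    T, n = returns.shape
    r_bar = returns[0].copy()
    Sigma = np.zeros((n, n))
    for t in range(1, T):
        r_bar = lam * r_bar + (1 - lam) * returns[t]
        diff = returns[t] - r_bar
        Sigma = lam * Sigma + (1 - lam) * np.outer(diff, diff)
    return r_bar, Sigma

# usage with synthetic weekly returns for 5 assets (illustrative only)
rng = np.random.default_rng(0)
r_bar, Sigma = ewma_mean_cov(rng.normal(0.001, 0.02, size=(780, 5)))
```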
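Once the quadratic AVFs have been obtained, executing the associated policy at each rebalancing date reduces to a small convex program. The sketch below shows this greedy step for a single quadratic AVF using cvxpy; the closed-form expectation E[diag(r) P diag(r)] = P ∘ (Σ + r̄ r̄^T), the placeholder AVF coefficients (P, p), and the illustrative data are assumptions of the sketch rather than quantities from our experiments.

```python
import numpy as np
import cvxpy as cp

def avf_trade(x, P, p, r_bar, Sigma, lam, s, kappa, gamma, U_bdd):
    """Greedy iterated-AVF step (sketch): choose the trade u minimizing
    stage cost + gamma * E[ V_hat(diag(r)(x+u)) ] for quadratic V_hat(z) = z'Pz + 2p'z + q."""
    n = x.size
    # E[diag(r) P diag(r)] = P * (Sigma + r_bar r_bar'), E[diag(r)] p = r_bar * p (elementwise)
    P_exp = P * (Sigma + np.outer(r_bar, r_bar))
    p_exp = r_bar * p
    u = cp.Variable(n)
    xu = x + u
    stage = (cp.sum(u) + lam * cp.quad_form(xu, Sigma)
             + cp.sum(cp.multiply(s, cp.square(u))) + kappa * cp.norm1(u))
    future = gamma * (cp.quad_form(xu, P_exp) + 2 * p_exp @ xu)   # constant term q dropped
    prob = cp.Problem(cp.Minimize(stage + future), [cp.norm_inf(u) <= U_bdd])
    prob.solve()
    return u.value

# illustrative call with placeholder data for 5 assets
n = 5
Sigma = 0.001 * np.eye(n)
u_star = avf_trade(x=np.zeros(n), P=0.01 * np.eye(n), p=-0.05 * np.ones(n),
                   r_bar=1.001 * np.ones(n), Sigma=Sigma,
                   lam=0.005, s=0.005 * np.ones(n), kappa=0.005,
                   gamma=np.exp(-0.05 / 13), U_bdd=20.0)
print(u_star)
```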
Figures 7-13 show the simulation results of the portfolio optimization example under the iterated AVF method. Figure 7 depicts the portfolio profile during the test period. From this figure, one can see that with the passage of time, the portfolio profile slowly changes its direction to increase the performance index. Figure 8 shows the gross cash put into the portfolio, and Fig. 9 plots the cumulative cost sums. From Fig. 8, it is clear that cash is entered into the portfolio during the early stage of trade; as time progresses, the portfolio gains income. Furthermore, based on the trend of this figure, we can expect more profit to be derived in the later stage of trading. Figure 10 shows the transaction cost. When the portfolio is being built, the transaction cost is very high; however, it stabilizes over time. Figure 11 shows the risk penalty, which is always non-negative and increasing. The wealth history and the amount of cash holdings are plotted in Figs. 12 and 13, respectively. Note that in this scenario, wealth steadily increases as trading proceeds and reaches approximately 220 (yielding a 120% profit at the end of 2007). Also note that according to Fig. 13, the amount of cash held briefly remains at the initial value, rapidly decreases, and then slowly recovers. This behavior suggests that in the iterated AVF based trading strategy, a large amount of profit is obtained when aggressive initial investments are made and the position is subsequently cashed out.
Figure 7. Portfolio profile.
Figure 8. Gross cash put into the portfolio.
Figure 9. Cumulative cost.
Figure 10. Transaction cost.
Figure 11. Risk penalty.
Figure 12. Wealth.
Figure 13. Amount of cash.
4. Conclusion
Machine learning and control methods have been applied to a variety of portfolio optimization problems. In particular, we considered two important classes of portfolio optimization problems: the trend following trading problem and the risk-adjusted profit maximization problem. The exponential NES and iterated approximate value function methods were applied to solve these problems. Simulation results showed that these probabilistic machine learning and control based solutions worked well when applied to real financial market data. In the future, we plan to consider more extensive simulation studies, which will further identify the strengths and weaknesses of probabilistic machine learning and control based methods, and applications of our methods to other types of financial decision making problems.
No potential conflict of interest relevant to this article was reported.
Acknowledgements
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2011-0021188).
BIO
Jooyoung Park received his BS in Electrical Engineering from Seoul National University in 1983 and his PhD in Electrical and Computer Engineering from the University of Texas at Austin in 1992. He joined Korea University in 1993, where he is currently a professor at the Department of Control and Instrumentation Engineering. His recent research interests are in the areas of machine learning, control theory, and financial engineering.
E-mail: parkj@korea.ac.kr
Jungdong Lim received his BS in Control and Instrumentation Engineering from Korea University in 2013. Currently, he is a graduate student at Korea University majoring in Control and Instrumentation Engineering. His research areas include approximate dynamic programming, machine learning, and control applications.
E-mail: huhuvwvw@korea.ac.kr
Wonbu Lee received his BS in Control and Instrumentation Engineering from Korea University in 2013. Currently, he is a graduate student at Korea University majoring in Control and Instrumentation Engineering. His research areas include approximate inference, machine learning, and control applications.
E-mail: karimia@korea.ac.kr
Seunghyun Ji received his BS in Control and Instrumentation Engineering from Korea University in 2014. Currently, he is a graduate student at Korea University majoring in Control and Instrumentation Engineering. His research areas include control theory and machine learning.
E-mail: mysky5871@korea.ac.kr
Keehoon Sung received his BS in Control and Instrumentation Engineering from Korea University in 2014. Currently, he is a graduate student at Korea University majoring in Control and Instrumentation Engineering. His research areas include control theory and machine learning.
E-mail: skh0910@korea.ac.kr
Kyungwook Park received his BBA and MBA from Seoul National University and his PhD in Finance from the University of Texas at Austin in 1993. He joined Korea University in 1994, where he is currently a professor at the School of Business Administration. His recent research interests are in the areas of derivatives and hedging, control-theory-based assets and derivatives management, and cost of capital estimation with derivative pricing.
E-mail: pkw@korea.ac.kr
References
Boyd S. , Mueller M. T. , O’Donoghue B. , Wang Y. 2014 “Performance bounds and suboptimal policies for multiperiod investment,” Foundations and Trends in Optimization http://dx.doi.org/10.1561/2400000001 1 (1) 1 - 72    DOI : 10.1561/2400000001
Primbs J. A. “Portfolio optimization applications of stochastic receding horizon control,” Proceeding of the 2007 American Control Conference New York, NY July 9-13, 2007 http://dx.doi.org/10.1109/ACC.2007.4282251 1811 - 1816    DOI : 10.1109/ACC.2007.4282251
Calafiore G. C. 2008 “Multi-period portfolio optimization with linear control policies,” Automatica http://dx.doi.org/10.1016/j.automatica.2008.02.007 44 (10) 2463 - 2473    DOI : 10.1016/j.automatica.2008.02.007
Alenmyr S. , Ögren A. 2010 "Model Predictive Control for Stock Portfolio Selection," M.S. Thesis, Department of Automatic Control, Lund University, Sweden
Barmish B. R. 2011 “On performance limits of feedback controlbased stock trading strategies,” Proceedings of 2011 American Control Conference San Francisco, CA June 29-July 1, 2011 3874 - 3879
Primbs J. A. , Sung C. 2008 “A stochastic receding horizon control approach to constrained index tracking,” Asia-Pacific Financial Markets http://dx.doi.org/10.1007/s10690-008-9073-1 15 (1) 3 - 24    DOI : 10.1007/s10690-008-9073-1
Beasley J. E. , Meade N. , Chang T. J. 2003 “An evolutionary heuristic for the index tracking problem,” European Journal of Operational Research http://dx.doi.org/10.1016/S0377-2217(02)00425-3 148 (3) 621 - 643    DOI : 10.1016/S0377-2217(02)00425-3
Jeurissen R. , van den Berg J. 2005 “Index tracking using a hybrid genetic algorithm,” Proceedings of the ICSC Congress on Computational Intelligence Methods and Applications Istanbul, Turkey http://dx.doi.org/10.1109/CIMA.2005.1662364    DOI : 10.1109/CIMA.2005.1662364
Primbs J. A. 2009 “Dynamic hedging of basket options under proportional transaction costs using receding horizon control,” International Journal of Control http://dx.doi.org/10.1080/00207170902783341 82 (10) 1841 - 1855    DOI : 10.1080/00207170902783341
Markowitz H. 1959 Portfolio Selection: Efficient Diversification of Investments (Cowles Foundation for Research in Economics at Yale University Monograph 16) Wiley New York, NY
Park J. , Yang D. , Park K. 2013 “Approximate dynamic programming-based dynamic portfolio optimization for constrained index tracking,” International Journal of Fuzzy Logic and Intelligent Systems http://dx.doi.org/10.5391/IJFIS.2013.13.1.19 13 (1) 19 - 28    DOI : 10.5391/IJFIS.2013.13.1.19
Park J. , Jeong J. , Park K. 2012 “An investigation on dynamic portfolio selection problems utilizing stochastic receding horizon approach,” Journal of Korean Institute of Intelligent Systems http://dx.doi.org/10.5391/JKIIS.2012.22.3.386 22 (3) 386 - 393    DOI : 10.5391/JKIIS.2012.22.3.386
Park J. , Yang D. , Park K. 2013 “Investigations on dynamic trading strategy utilizing stochastic optimal control and machine learning,” Journal of Korean Institute of Intelligent Systems http://dx.doi.org/10.5391/JKIIS.2013.23.4.348 23 (4) 348 - 353    DOI : 10.5391/JKIIS.2013.23.4.348
Dai M. , Zhang Q. , Zhu Q. J. 2010 “Trend following trading under a regime switching model,” SIAM Journal on Financial Mathematics http://dx.doi.org/10.1137/090770552 1 (1) 780 - 810    DOI : 10.1137/090770552
Dai M. , Zhang Q. , Zhu Q. J. 2011 “Optimal trend following trading rules,” Social Science Research Network http://dx.doi.org/10.2139/ssrn.1762118    DOI : 10.2139/ssrn.1762118
Kong H. T. , Zhang Q. , Yin G. G. 2011 “A trend-following strategy: conditions for optimality,” Automatica http://dx.doi.org/10.1016/j.automatica.2011.01.039 47 (4) 661 - 667    DOI : 10.1016/j.automatica.2011.01.039
Yu J. , Zhang Q. 2012 “Optimal trend-following trading rules under a three-state regime switching model,” Mathematical Control and Related Fields http://dx.doi.org/10.3934/mcrf.2012.2.81 2 (1) 81 - 100    DOI : 10.3934/mcrf.2012.2.81
Kim S. J. , Primbs J. , Boyd S. 2008 “Dynamic spread trading,”
Primbs J. A. 2009 “A control systems based look at financial engineering,”
Mudchanatongsuk S. , Primbs J. A. , Wong W. 2008 “Optimal pairs trading: a stochastic control approach,” Proceedings of the American Control Conference Seattle, WA June 11-13, 2008 http://dx.doi.org/10.1109/ACC.2008.4586628 1035 - 1039    DOI : 10.1109/ACC.2008.4586628
Wierstra D. , Schaul T. , Peters J. , Schmidhuber J. “Natural evolution strategies,” Proceedings of the IEEE World Congress on Evolutionary Computation Hong Kong June 1-6, 2008 http://dx.doi.org/10.1109/CEC.2008.4631255 3381 - 3387    DOI : 10.1109/CEC.2008.4631255
Wierstra D. , Schaul T. , Glasmachers T. , Sun Y. , Schmidhuber J. 2011 “Natural evolution strategies,” http://arxiv.org/abs/1106.4487
Glasmachers T. , Schaul T. , Yi S. , Wierstra D. , Schmidhuber J. “Exponential natural evolution strategies,” Proceedings of the 12th Genetic and Evolutionary Computation Conference Portland, OR July 7-11, 2010
Schaul T. “Benchmarking exponential natural evolution strategies on the noiseless and noisy black-box optimization testbeds,” Proceedings of the 14th Genetic and Evolutionary Computation Conference Philadelphia, PA July 7-11, 2012 http://dx.doi.org/10.1145/2330784.2330816    DOI : 10.1145/2330784.2330816
Wang Y. , O'Donoghue B. , Boyd S. 2014 "Approximate dynamic programming via iterated Bellman inequalities," International Journal of Robust and Nonlinear Control
O’Donoghue B. , Yang W. , Boyd S. “Min-max approximate dynamic programming,” Proceedings of the IEEE International Symposium on Computer-Aided Control System Design Denver, CO September 28-30, 2011 http://dx.doi.org/10.1109/CACSD.2011.6044538 424 - 431    DOI : 10.1109/CACSD.2011.6044538
O’Donoghue B. , Wang Y. , Boyd S. “Iterated approximate value functions,” Proceedings European Control Conference Zurich, Switzerland July 17-19, 2013 3882 - 3888
Keshavarz A. , Boyd S. 2014 “Quadratic approximate dynamic programming for input-affine systems,” International Journal of Robust and Nonlinear Control http://dx.doi.org/10.1002/rnc.2894 24 (3) 432 - 449    DOI : 10.1002/rnc.2894
Peters J. , Schaal S. 2008 “Natural actor-critic,” Neurocomputing http://dx.doi.org/10.1016/j.neucom.2007.11.026 71 (7-9) 1180 - 1190    DOI : 10.1016/j.neucom.2007.11.026
Bertsekas D. P. 1995 Dynamic Programming and Optimal Control Athena Scientific Belmont, MA
Powell W. B. 2007 Approximate Dynamic Programming : Solving the Curses of Dimensionality Wiley-Interscience Hoboken, NJ
Wonham W. M. 1965 “Some applications of stochastic differential equations to optimal non-linear filtering,” SIAM Journal on Control 2 347 - 369
Boyd S. P. 1994 Linear Matrix Inequalities in System and Control Theory Society for Industrial and Applied Mathematics Philadelphia, PA