The objective of this study is to propose a method to calculate prediction intervals for one-day-ahead hourly forecasts of photovoltaic power generation and to evaluate its performance. One year of data from two systems, representing contrasting examples of forecast accuracy, was used. The method is based on maximum likelihood estimation, on the similarity between the input data of future and past forecasts of photovoltaic power, and on an assumption about the distribution of the forecast error. Two assumptions for the forecast error distribution were evaluated: a Laplacian and a Gaussian distribution. The results show that the proposed method models the photovoltaic power forecast error well when the Laplacian distribution is used. For both systems, and for intervals calculated with 4 confidence levels, the share of intervals containing the true photovoltaic power generation was close to the expected one.
1. Introduction
Methods to forecast photovoltaic (PV) power generation are expected to play an important role in the integration of PV systems into current power grids. This is due to their ability to anticipate occasional strong variations of PV power generation caused by changes in the weather. The information provided by PV power forecast methods can help power companies and users to prepare for such events. Many methods to forecast PV power generation have been proposed for different time and spatial scales [1-3]. Moreover, comprehensive reviews of several approaches are also available [4, 5].
Regardless of the method and input data used, the strong dependence of PV power on weather conditions makes it difficult to produce consistently accurate forecasts. The problem becomes even more acute at locations with unstable weather and for time scales longer than a few hours. In this case, besides a forecast value of PV power generation for a given time and location, it is also useful to have information about the uncertainty of the forecast. Uncertainty can be expressed in several ways; one of them is through prediction intervals, which are expected to contain a future point observation with a given confidence level.
Thus, the objective of this study is to present a simple method to calculate prediction intervals for one-day-ahead forecasts of power generation of single PV systems. The method is based on maximum likelihood estimation and on the concept of similarity between PV power forecasts for different hours and different days.
To validate the method, it was applied to calculate prediction intervals, with different theoretical confidence levels, for 1 year of hourly forecasts of power generation of two PV systems installed in different locations in Japan. The forecasts of PV power were made using a previously proposed method [6], which is based on numerical weather prediction data and a support vector regression algorithm. The performance of the prediction interval calculation method was verified by analyzing the correspondence between the prespecified confidence levels used in the interval calculations and the annual forecast error coverage they achieved. Moreover, comparisons with 2 naïve reference approaches were made to evaluate the sizes of the prediction intervals relative to their forecast error coverage.
2. Prediction Intervals Methods
In this section the proposed method to calculate prediction intervals is presented and compared with previous approaches. Moreover, two naïve reference methods to calculate the intervals are presented. Their objective is to provide a basis of comparison to analyze the performance of the proposed method.
 2.1 Proposed method
The desired prediction interval for any point forecast of PV power generation f_{i} is an interval that will contain the true PV power generation y_{i} with a given confidence level. One way to approach this problem is to make assumptions about the distribution of the forecast error e(x), regarded as f(x) − y, where x represents the input variables used to make the forecasts f(x). If the true distribution of the forecast error is known, prediction intervals can be obtained with a given confidence level from the corresponding probability distribution. If the true distribution is not known, one option is to assume that the forecast errors follow a known distribution and to estimate the parameters of that distribution via maximum likelihood estimation [7].
In a previous study, Lin and Weng [8] proposed to calculate prediction intervals with the approach described for forecasts made with support vector machines. They estimated the forecast error of the method from the errors of a cross-validation procedure on the training data used to construct the forecast model. Furthermore, they assumed that the forecast errors followed a symmetric Gaussian distribution, as shown in Eq. 1, or a symmetric Laplacian one, as shown in Eq. 2. From these assumptions it is possible to estimate the scale parameter σ of each of these 2 distributions by maximizing the likelihood. In this case, σ for a Gaussian distribution is the root mean square error of the forecasts, and σ for a Laplacian one is the mean absolute error of the forecasts.
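Eq. 1 and Eq. 2 are not reproduced in this text. As a hedged reconstruction, assuming they are the standard zero-mean Gaussian and Laplacian densities with scale σ, the forms below are consistent with the statement that maximizing the likelihood gives the root mean square error and the mean absolute error, respectively. The symbols p_{G}, p_{L}, N and the hats are introduced here for illustration only.

```latex
% Assumed zero-mean error densities with scale \sigma
p_{G}(e) = \frac{1}{\sqrt{2\pi}\,\sigma}
           \exp\!\left(-\frac{e^{2}}{2\sigma^{2}}\right),
\qquad
p_{L}(e) = \frac{1}{2\sigma}
           \exp\!\left(-\frac{|e|}{\sigma}\right)

% Maximizing the likelihood over N past errors e_{1}, \dots, e_{N}
\hat{\sigma}_{G} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} e_{i}^{2}},
\qquad
\hat{\sigma}_{L} = \frac{1}{N}\sum_{i=1}^{N} |e_{i}|
```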
If the forecast error follows a known probability distribution, then for a given probability 1 − s, the prediction interval limits can be calculated from the upper s^{th} percentile p_{s} of the corresponding probability distribution, Eq. 3. In Eq. 3, L_{lim} and U_{lim} are the lower and upper limits of the prediction interval.
For a Gaussian distribution, the upper p_{s} is given by Eq. 4, where Φ^{-1} is the quantile function of the distribution.
For a Laplacian distribution, p_{s} is given by Eq. 5.
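As a rough illustration of the approach described above, the following Python sketch estimates σ from a set of past errors and converts it into a symmetric interval around a forecast. It is not the authors' code: all function names are hypothetical, the error distributions are assumed to be zero-mean, and the interval is parameterized by a two-sided confidence level c rather than by s, so the half-width is σ Φ^{-1}((1 + c)/2) in the Gaussian case and −σ ln(1 − c) in the Laplacian case.

```python
import math
from statistics import NormalDist


def gaussian_scale(errors):
    """MLE scale of a zero-mean Gaussian: root mean square error."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def laplacian_scale(errors):
    """MLE scale of a zero-mean Laplacian: mean absolute error."""
    return sum(abs(e) for e in errors) / len(errors)


def half_width(sigma, confidence, distribution):
    """Half-width of a symmetric interval with two-sided coverage `confidence`."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    if distribution == "gaussian":
        return sigma * NormalDist().inv_cdf((1.0 + confidence) / 2.0)
    if distribution == "laplacian":
        return -sigma * math.log(1.0 - confidence)
    raise ValueError("distribution must be 'gaussian' or 'laplacian'")


def prediction_interval(forecast, errors, confidence, distribution="laplacian"):
    """Return (lower, upper) limits around a point forecast."""
    if distribution == "gaussian":
        sigma = gaussian_scale(errors)
    else:
        sigma = laplacian_scale(errors)
    p = half_width(sigma, confidence, distribution)
    return forecast - p, forecast + p
```

For the same σ, the Laplacian half-width grows faster than the Gaussian one as the confidence level approaches 1, which is one reason the two assumptions can produce intervals of different sizes.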
According to Lin and Weng [8], a similar approach was also used by Platt [9] for classification problems. However, this kind of approach presents 2 problems when applied to PV power generation forecasts. First, the forecast errors are estimated from a cross-validation procedure. PV power generation forecasting is a time series problem. As such, cross-validation applied directly will not yield good error estimates for the forecast model. The second problem is that the calculation of σ as proposed implies that the forecast error distribution depends on the input only through the forecasted value [8]. In other words, each forecast model will have just one prediction interval size regardless of the magnitude of the input variables. In PV power forecasting this assumption poses a problem because forecasts for different periods of the day and different weather conditions should have prediction intervals of different sizes. For example, forecasts for hours at the beginning and the end of the day should have narrower prediction intervals than forecasts for hours around noon.
In fact, we showed in a previous study that applying such an approach without modification is not effective for PV power generation forecasts, and proposed a simple modification based on the target hours of the forecasts to improve the prediction intervals [10].
In this study we propose to use past forecast errors instead of the errors of a cross-validation procedure applied to training data. Furthermore, a criterion based on input data similarity is used to obtain suitable prediction intervals according to the forecast hour and the input data of the forecasts. The hypothesis behind this approach is that, for a specific location at a given time, similar input data should yield similar forecast errors of PV power generation, and these errors should belong to the same distribution. Thus, a prediction interval for the PV power generation under sunny weather at noon will be based on past forecast errors for sunny weather at noon. Calculated this way for a given location, the prediction intervals will vary according to the input data, weather conditions, and target hour of the forecasts.
To identify past input data similar to the input data of a target forecast, the Euclidean distance was used as the similarity measure. Thus, to calculate the prediction interval of a forecast of PV generation for a given hour, the input data that generated the forecast are compared with the input data of the hourly forecasts made in the previous 60 days. From this comparison the n% most similar hours are retrieved and used in the calculation of the prediction interval. Based on a preliminary assessment of how much data are necessary and how similar the data have to be to obtain good prediction intervals, n was set to 5% (42 hours) of the initial set of data. The preliminary assessment results are in the initial version of this paper, presented at the 2014 International Conference on Electrical Engineering [11].
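The selection of similar hours described above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, each past hour is represented by a list of input values, and the inputs are assumed to be on comparable scales so that the plain Euclidean distance is meaningful (the text does not say whether any scaling was applied).

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equally long input vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar_errors(target_inputs, past_inputs, past_errors, fraction=0.05):
    """Errors of the past hours whose inputs are closest to the target inputs.

    `fraction` is the share of past hours to keep (5% in the study).
    """
    if len(past_inputs) != len(past_errors):
        raise ValueError("past_inputs and past_errors must have the same length")
    keep = max(1, round(fraction * len(past_inputs)))
    ranked = sorted(
        range(len(past_inputs)),
        key=lambda i: euclidean_distance(target_inputs, past_inputs[i]),
    )
    return [past_errors[i] for i in ranked[:keep]]
```

With 840 past hours, a fraction of 0.05 keeps 42 hours, which matches the figure quoted above; the retrieved errors would then be passed to the scale estimation step.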
Finally, two physical constraints were adopted in the proposed method. First, the lower limit of the prediction intervals was not allowed to fall below zero, the minimum PV power generation. Second, the upper limit of the prediction intervals for a PV system was capped at its maximum theoretical power generation at the same hour, given the same extraterrestrial insolation conditions.
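A minimal sketch of these two constraints is given below. The function name is hypothetical, and clamping both limits into the same range (so that the lower limit can never exceed the upper one) is a choice made here, not something stated in the text.

```python
def apply_physical_limits(lower, upper, max_theoretical_power):
    """Clamp interval limits to the range [0, max_theoretical_power]."""

    def clamp(value):
        return min(max(value, 0.0), max_theoretical_power)

    return clamp(lower), clamp(upper)
```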
 2.2 Reference method 1
If past forecast data are available, a simple approach to obtain prediction intervals is to use these data directly without making any assumption about the distribution of the forecast errors. In this case, the intervals are directly estimated from the quantiles of the data sets. With this method, to calculate the prediction interval for a given forecast with a confidence level of 90%, for example, it is enough to identify the 5% quantile and the 95% quantile of the past forecasts whose input data were similar to the input data of the target forecast.
This method may work well with databases containing many years of past forecasts. In this study, however, its application provides an assessment of the validity of the hypothesis made in section 2.1 when several years of past data are not available.
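The quantile-based reference method can be sketched as below. Several details are assumptions made for illustration: the quantiles are taken over the past forecast errors (defined as forecast minus observation) of the similar hours, linear interpolation between order statistics is used, and the function names are hypothetical.

```python
def empirical_quantile(sorted_values, q):
    """Linearly interpolated quantile of already sorted values, q in [0, 1]."""
    position = q * (len(sorted_values) - 1)
    low = int(position)
    high = min(low + 1, len(sorted_values) - 1)
    weight = position - low
    return sorted_values[low] + weight * (sorted_values[high] - sorted_values[low])


def empirical_interval(forecast, similar_errors, confidence):
    """Interval from the empirical error quantiles, with no distribution assumed."""
    errors = sorted(similar_errors)
    tail = (1.0 - confidence) / 2.0
    error_low = empirical_quantile(errors, tail)
    error_high = empirical_quantile(errors, 1.0 - tail)
    # error = forecast - observation, so observation = forecast - error
    return forecast - error_high, forecast - error_low
```

With only a few dozen errors, the 5% and 95% quantiles are determined by two or three extreme values, which makes the resulting limits sensitive to the particular sample.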
 2.3 Reference method 2
A different reference approach to calculate the prediction intervals is one where they are defined by the maximum and minimum possible values of the forecasts of PV power generation for each hour of the day. In this way the intervals will always contain the true PV power generation and will provide a forecast error coverage of 100%. In reality, however, the resulting intervals will be so large that they will not have any practical application.
Nevertheless, this method provides a reference value for the size of the intervals obtained with the method proposed in section 2.1. If the proposed method yields intervals as large as the ones obtained with this reference method, they will not be useful.
To obtain the maximum possible values for the forecasts of PV power, the horizontal extraterrestrial insolation for every forecast hour is used. With this information the PV power generation was calculated using the model presented in Eq. 6, proposed by Mellit & Pavan [1].
In Eq. 6, P_{pv} is the photovoltaic power generated in kW, A is the total area of the modules in m^{2}, n_{pv} is the conversion efficiency, n_{bos} is the system efficiency, and G is the insolation in kW/m^{2}. To obtain a maximum theoretical value for the PV power, G was taken as the horizontal extraterrestrial insolation. Furthermore, to avoid problems with shadows, module tilt angle, orientation angle, and the first and last hours of daylight, a correction factor of 5% of the rated power of the PV system was added to P_{pv}.
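Eq. 6 is not reproduced in this text. The sketch below assumes it is the simple product of the quantities just defined, P_{pv} = A · n_{pv} · n_{bos} · G, and adds the 5% correction on top. The function name and the numbers in the usage note are hypothetical.

```python
def max_theoretical_power(area_m2, n_pv, n_bos, extraterrestrial_kw_m2,
                          rated_power_kw, correction=0.05):
    """Upper bound on hourly PV power in kW (assumed product form of Eq. 6)."""
    p_pv = area_m2 * n_pv * n_bos * extraterrestrial_kw_m2
    # Correction factor: a fixed share of the rated power added to P_pv
    return p_pv + correction * rated_power_kw
```

For example, with hypothetical values of 70 m^{2} of modules, n_{pv} = 0.14, n_{bos} = 0.85, G = 1.0 kW/m^{2} and a rated power of 10 kW, the bound would be 70 × 0.14 × 0.85 × 1.0 + 0.5 = 8.83 kW.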
3. Forecast Method
The prediction interval methods described in section 2 can provide intervals for forecasts of PV power generation made with any kind of method. They depend only on the past input data used and the output data the forecast method yielded. In this study they were applied to provide intervals for forecasts made with a method based on support vector regression, hourly extraterrestrial insolation, and numerical weather prediction data. These data are provided on the day preceding the forecast day; the forecast horizon is therefore one day ahead. The numerical weather prediction data are the grid-point value forecasts of the mesoscale model, GPV-MSM, of the Japan Meteorological Agency.
The input data used for any forecast hour of PV power are listed in Table 1. The method provides hourly forecasts based on the hourly input data, and for each forecast day the model is trained with hourly input data and measured PV power of the previous 60 days. Details about the setup of the algorithm and its application are in previous studies [6, 12].
Table 1. Input data used in the forecasts of PV power.
*The value for the hour of forecast and the preceding one are used as input.
4. PV Systems Data
Prediction intervals were calculated for one year, 2010, of hourly power forecasts of 2 PV systems. One PV system is located in Saitama prefecture, north of Tokyo, and the other in Aichi prefecture, southwest of Tokyo. Both PV systems have a rated power of 10 kW, and their specifications and installation conditions are in Table 2.
Table 2. PV systems specification and installation data.
These 2 PV systems were chosen because they provide examples of PV power forecasts with high average annual forecast errors (PV2 in Table 2) and low average annual forecast errors (PV1 in Table 2). Thus, the performance of the prediction interval methods can also be assessed for different kinds of forecast errors.
5. Results
In Fig. 1, the annual forecast error coverage achieved with each confidence level is presented. Fig. 1a shows the results for PV system 1, and Fig. 1b shows the results for PV system 2. Each figure also contains a dotted line representing the ideal relation between the confidence levels and the forecast error coverage. Finally, Fig. 1a and Fig. 1b also include the results obtained using reference method 1.
Fig. 1. Annual forecast error coverage with prediction intervals versus the corresponding prespecified confidence levels used in the calculation of the intervals for a PV system with low forecast errors (a) and another with high ones (b).
Based on the results in Fig. 1a and Fig. 1b, it is clear that reference method 1 performs poorly regardless of the PV system and the confidence level. This reflects the fact that a data set of 42 hours of similar input data is not sufficient for direct estimation of prediction intervals.
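The forecast error coverage discussed here can be computed as the share of hours in which the measured PV power falls inside its prediction interval. The sketch below illustrates this; the function name is hypothetical, and counting observations that lie exactly on a limit as covered is an assumption.

```python
def forecast_error_coverage(observations, lower_limits, upper_limits):
    """Fraction of observations lying within their prediction intervals."""
    inside = sum(
        1
        for y, lower, upper in zip(observations, lower_limits, upper_limits)
        if lower <= y <= upper
    )
    return inside / len(observations)
```

A well-calibrated method should return a value close to the prespecified confidence level, for example about 0.90 for intervals calculated at the 90% level.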
Regarding the distribution assumptions, the results in Fig. 1 show that the difference between the Laplacian and the Gaussian distribution was small. However, assuming a Laplacian distribution clearly caused the prediction interval method to approximate well the slope of the ideal curve for both PV systems. With the Gaussian distribution assumption, the confidence levels tended to underestimate the forecast error coverage for low values, 85% and 90%, and to overestimate it for high values, 95% and 97.5%. This behavior can be seen in Fig. 1a.
Comparing both PV systems, PV system 2, which generally had high forecast errors, caused the proposed method to yield prediction intervals naturally larger than the ones obtained for PV system 1. The overall result was higher forecast error coverage for PV system 2 than for PV system 1. However, there was not a strong difference; it was not higher than 1.5% in the worst case.
Another important factor to consider in the evaluation of a prediction interval method is the size of the intervals it yields given different prespecified confidence levels.
In the case of PV power prediction intervals, their size can be regarded as a kind of reserve power needed by the PV system operator to deal with the forecast error. For example, given a forecast of PV power for an hour, if the forecast underestimates the true value, there will be an excess of power relative to what was expected. This excess can be thought of as a quantity that has to be absorbed, wasted, or sent somewhere else in the power grid or to a battery, so that the balance between power demand and supply can be kept. If this excess power is not wasted, the upper limit of the prediction interval can be thought of as a measure of the capacity reserved to store surplus PV power generation.
On the other hand, if the forecasted PV power overestimates the true value, power has to be delivered by the power grid or by a battery to fill the gap between what was expected and what was generated. In this case, the lower limit of the prediction interval expresses a reserved capacity available to deliver power in case of overestimations.
In both cases the prediction intervals can be seen as a measure of how much power has to be reserved. An example of this way of seeing the prediction intervals is illustrated in Fig. 2.
Fig. 2. Prediction intervals as a measure of reserve power to deal with PV power generation fluctuations.
Considering the intervals as reserve power, they imply costs. Therefore, it is desirable to obtain intervals that are only as large as necessary. The size of the intervals for given confidence levels thus provides a useful measure when comparing prediction interval methods.
It should be noted that a proper prediction interval method will yield intervals that ultimately reflect the level of forecast error. If the forecast errors for a given hour or weather condition are high, so should be the related prediction interval. Therefore, comparisons of interval sizes only make sense when comparing different prediction interval methods to evaluate which one better reflects the characteristics of the forecast errors.
An initial evaluation of the interval sizes is presented in Fig. 3(a) for PV system 1 and in Fig. 3(b) for PV system 2. The reserve power is normalized by the PV system rated power, and the value required for each prespecified confidence level is presented. The reserve power required by reference methods 1 and 2 is also presented in Fig. 3(a) and Fig. 3(b).
Fig. 3. Annual reserve power required with each prediction interval method for different confidence levels, for a PV system with low forecast errors (a) and one with high ones (b).
The results in Fig. 3 show that with reference method 2, 100% of the forecast errors are covered. However, the reserve power required to do so was significantly higher than that required by the proposed method. For example, in Fig. 3(a), using the Laplacian distribution assumption with a confidence level of 97.5%, a forecast error coverage of 97.1% was achieved using 36% less reserve power than reference method 2.
Comparing the distribution assumptions, the Gaussian distribution generally required less reserve power than the Laplacian distribution. Nevertheless, as shown in Fig. 1, the effective forecast error coverage was also slightly lower than the one achieved with the Laplacian distribution assumption.
In Fig. 3(a) and Fig. 3(b), the results indicate that reference method 1 required the lowest reserve power regardless of the confidence level. However, the results in Fig. 1 also show that such low reserve power values were associated with poor forecast error coverage, making the method actually the worst of the ones evaluated.
From Fig. 3(b), one can see that the differences between the reserve power required by reference method 2 and that of the other methods are smaller than in Fig. 3(a). For the PV system with high forecast errors, the proposed method was less effective than for the PV system with low forecast errors. For example, in Fig. 3(b), using the Laplacian distribution assumption with a confidence level of 97.5%, a forecast error coverage of 98.2% was achieved using 18% less reserve power than with reference method 2. This value is half the difference achieved for PV system 1 in Fig. 3(a).
The performance of each method can be better understood by directly comparing the effective forecast error coverage with the corresponding reserve power for each method and PV system. These results are in Fig. 4.
Fig. 4. Annual forecast error coverage versus reserve power for different PV systems and prediction interval methods.
To identify how much reserve power is required relative to what is actually generated, the reserve power in Fig. 4 was normalized by the annual power generation of each PV system.
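One possible reading of this normalization is sketched below. It assumes that the annual reserve power is the sum of the hourly interval widths and that it is divided by the sum of the hourly measured generation; the text does not spell out the exact formula, and the function name is hypothetical.

```python
def reserve_power_ratio(lower_limits, upper_limits, generation):
    """Total interval width divided by total generation over the same hours."""
    total_reserve = sum(
        upper - lower for lower, upper in zip(lower_limits, upper_limits)
    )
    return total_reserve / sum(generation)
```

Under this reading, a ratio of 1.5 means that the summed interval widths amount to 1.5 times the energy generated over the year.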
The results in Fig. 4 indicate that the Gaussian distribution with the proposed method yielded in general smaller prediction intervals (expressed as the reserve power ratio) than the Laplacian distribution. However, as also noted in Fig. 1a, for the PV system with low forecast errors the Gaussian distribution assumption caused strong overestimations of the forecast error coverage for low confidence levels and slight underestimations for high confidence levels. With the Laplacian distribution assumption the proposed method behaved more uniformly, approximating the prespecified confidence levels better.
For the PV system with high forecast errors, the Gaussian distribution was a better fit. Furthermore, with the Gaussian distribution also the lowest reserve power ratio was achieved.
These results can be understood by considering the shapes of the Laplacian and Gaussian curves and the distribution of the forecast errors of both PV systems. For PV system 1, with generally low forecast errors, most of the forecast errors will be around zero. Moreover, the frequency of forecast errors will decrease sharply as their absolute values increase. This behavior more closely resembles the shape of the Laplacian distribution. In the case of PV system 2, with generally higher forecast errors, the frequency of low forecast errors will be lower than for PV system 1, yielding a forecast error distribution more similar to the Gaussian curve.
Comparing the results of the proposed method with those of reference method 2, the benefits in terms of less reserve power are clear. For example, to cover 97.1% of the forecast errors for PV system 1, a reserve of 1.5 times the total PV power generated in the year was necessary. To cover all forecast errors with reference method 2, nearly 2.35 times the total PV power generated in the year was required.
Fig. 5 shows examples of prediction intervals calculated with the proposed method for a given day. The calculations were done for PV system 1 using the Laplacian distribution assumption. In Fig. 5, the green line indicates the upper limit of the prediction interval obtained with reference method 2.
Fig. 5. Examples of forecasts of PV power generation with prediction intervals calculated with the proposed method.
6. Conclusion
The objective of this study was to present a simple method to calculate prediction intervals for forecasts of power generation of PV systems. The method is based on the use of the maximum likelihood estimation, and on the concept of similarity between the input data used in the forecasts.
The results showed that the proposed method used with the Laplacian distribution assumption is more suitable to PV systems with low forecast errors. For PV systems with high forecast errors the Gaussian distribution assumption was more suitable.
In spite of that, focusing only on the relation between forecast error coverage and confidence levels of the intervals, the use of the Laplacian distribution is indicated. The Gaussian distribution assumption yielded a stronger tendency to overestimate prediction intervals for low confidence level values and to underestimate them for high confidence level values than the Laplacian distribution assumption.
Based on the results, it can be concluded that the proposed method to calculate prediction intervals for forecasts of PV power generation is valid. The forecast error coverage obtained with it approximated the confidence levels of the intervals well, and it used significantly less reserve power than reference method 2. Moreover, it requires just 60 days of past forecasts, making it a useful option when large databases with past PV power generation and forecast data are not available.
Still, the results are based on PV systems’ data representing extreme cases regarding annual forecast errors. In further studies a comprehensive analysis containing a wide range of PV systems installed in Japan will be done to better characterize the validity of the method and of the forecast error distribution assumptions.
Acknowledgements
This work was funded by NEDO in the project Research and Development of PV Performance and Reliability Characterization Technologies, and by the JST Agency (CREST).
BIO
Joao Gari da Silva Fonseca Junior received a doctoral degree in Mechanical Systems from Kobe University, Japan, in 2009, and after that he worked at the National Institute of Advanced Industrial Science and Technology, Japan. He is currently working at the Institute of Industrial Science, Collaborative Research Center for Energy Engineering of the University of Tokyo, Japan. His research topics are related to photovoltaic power generation forecast techniques and machine learning.
Takashi Oozeki received a doctoral degree in computer science and engineering from Tokyo University of Agriculture and Technology, Japan, in September 2005. In the same year, he started to work as a researcher at the National Institute of Advanced Industrial Science and Technology in Japan. His main research topics are related to photovoltaic systems.
Hideaki Ohtake received a doctoral degree in Earth and Environmental Sciences from Hokkaido University, Japan, in March 2009. In the same year he started working at the Meteorological Research Institute of the Japan Meteorological Agency. In April 2011, he started working at the National Institute of Advanced Industrial Science and Technology in Japan. Currently he is working at the Research Center for Photovoltaic Technologies of the same institute, and his main research topics are numerical weather prediction models and solar irradiance forecasting.
Takumi Takashima received a master's degree in Science and Engineering from Tsukuba University in March 1992. In the same year he started to work at the National Institute of Advanced Industrial Science and Technology in Japan. His main research topics are related to performance evaluations of photovoltaic systems.
Kazuhiko Ogimoto graduated as an electronic engineer from the University of Tokyo, Japan, in March 1979. In April of the same year he started to work at J-POWER on grid planning. Since January 2008 he has been a professor at the Institute of Industrial Science, Collaborative Research Center for Energy Engineering of the University of Tokyo, Japan.
References
[1] Mellit A., Pavan A. M., "A 24-h forecast of solar irradiance using artificial neural network: Application for performance prediction of a grid-connected PV plant at Trieste, Italy," Solar Energy, 84(5), 807-821, 2010. DOI: 10.1016/j.solener.2010.02.006
[2] Lorenz E., Heinemann D., Wickramarathne H., Beyer H. G., Bofinger S., "Forecast of Ensemble Power Production by Grid-Connected PV Systems," Proceedings of the 20th European PV Conference, Italy, 3.9-7.9, 2007.
[3] Yona A., Senjyu T., Saber A. Y., Funabashi T., Sekine H., Kim C. H., "Application of Neural Network to One-Day-Ahead 24 hours Generating Power Forecasting for Photovoltaic System," Proceedings of International Conference on Intelligent Systems Applications to Power Systems 2007, 1-6, 2008.
[4] Paulescu M., Paulescu E., Gravila P., Badescu V., Weather Modeling and Forecasting of PV Systems Operation, Springer, 2012.
[5] Espinar B., Aznarte J. L., Girard R., Moussa A. M., Kariniotakis G., "Photovoltaic Forecasting: A state of the art," Proceedings 5th European PV-Hybrid and Mini-Grid Conference, Spain, 250-255, 2010.
[6] Fonseca J. G. da S., Oozeki T., Takashima T., Koshimizu G., Uchida Y., Ogimoto K., "Use of support vector regression and numerically predicted cloudiness to forecast power output of a photovoltaic power plant in Kitakyushu, Japan," Progress in Photovoltaics: Research and Applications, 20(7), 874-882, 2012. DOI: 10.1002/pip.1152
[7] Geisser S., Predictive Inference, CRC Press, 1993.
[8] Lin C. J., Weng R. C., "Simple probabilistic predictions for support vector regression," Natl. Taiwan Univ., Taipei, 2004.
[9] Platt J., "Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods," Advances in Large Margin Classifiers, 10(3), 61-74, 1999.
[10] Fonseca J. G. S. Jr., Oozeki T., Ohtake H., Shimose K., Takashima T., Ogimoto K., "Uncertainty Information in Forecasts of Photovoltaic Power Generation with Support Vector Regression: A Preliminary Study," Proceedings of the 17th International Conference on Intelligent System Applications to Power Systems, Japan, 2013.
[11] Fonseca J. G. da S. Jr., Oozeki T., Ohtake H., Takashima T., Ogimoto K., "On the Use of Maximum Likelihood Estimation and Data Similarity to Obtain Prediction Intervals for Forecasts of Photovoltaic Power Generation," Proceedings of the International Conference on Electrical Engineering 2014, Jeju, 1181-1188, 2014.
[12] Fonseca J. G. da S. Jr., Oozeki T., Ohtake H., Shimose K., Takashima T., Ogimoto K., "A Comprehensive Study of Photovoltaic Power Generation Forecasts in Multiple Locations in Japan," Proceedings of the 28th European Photovoltaic Solar Energy Conference and Exhibition, France, 3601-3606, 2013.