**Last update 17.9.1999**

A time series is a finite data set, for example f0, f1, f2, ..., fn-1, measured at points t0, t1, t2, ..., tn-1 in time. Standard time series prediction methods try to estimate fn at time tn. Usually, it is assumed that all time steps are equal and need not be stored explicitly. Moreover, it is assumed that a good estimate of fn may be obtained from the m previous values fn-1, fn-2, fn-3, ..., fn-m. Therefore, a formula or mechanism of the form f'i=F(fi-1, fi-2, fi-3, ..., fi-m) is sought, such that f'i=fi+ei, where ei is the prediction error. Since fi is known for all i<n, the error ei is known for i<n, and the formula F can be applied for i>m-1. Now, F is sought in such a way that ei is minimized for all m-1<i<n. When a formula is found that leads to small errors ei, one can assume that the prediction f'n=F(fn-1, fn-2, fn-3, ..., fn-m) is accurate. Note that the formula F is not unique, i.e., one can find various formulae Fj that result in different predictions f'nj; in other words, the predictions are uncertain. When a long-term prediction f'k with k>n is desired, one can iteratively apply f'i=F(fi-1, fi-2, fi-3, ..., fi-m), starting with i=n and continuing with i=n+1, i=n+2, ..., i=k. In the first step, the input variables fi-1, fi-2, fi-3, ..., fi-m of F are known (measured) data. In the successive steps, these values are replaced by estimates obtained in the previous steps, which drastically increases the uncertainty and the errors of the estimate f'k. Therefore, this procedure rarely leads to good long-term predictions.
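The iterative scheme above can be sketched in a few lines of Python. The formula F used here is a simple, hand-chosen weighted average of the last m values, purely for illustration; in practice F would be fitted so that the errors ei become small.

```python
# Sketch of the iterative long-term prediction scheme described above.
# F is a hypothetical weighted sum of the last m values; a real method
# would fit the weights to minimize the one-step errors e_i.
def predict_next(history, weights):
    """One-step prediction f'_i = F(f_{i-1}, ..., f_{i-m})."""
    m = len(weights)
    recent = history[-m:]
    return sum(w * f for w, f in zip(weights, recent))

def long_term_forecast(series, weights, steps):
    """Iterate the one-step formula, feeding estimates back as inputs."""
    extended = list(series)
    for _ in range(steps):
        extended.append(predict_next(extended, weights))
    return extended[len(series):]

# Measured data f_0 .. f_{n-1} (a simple trend for illustration)
data = [1.0, 2.0, 3.0, 4.0, 5.0]
# A linear-extrapolation rule with m=2: f'_i = 2*f_{i-1} - f_{i-2}
forecast = long_term_forecast(data, weights=[-1.0, 2.0], steps=3)
print(forecast)  # -> [6.0, 7.0, 8.0]
```

Note how, after the first step, the inputs of F are themselves estimates; with a less benign F or noisy data, the errors compound quickly, which is exactly why this procedure rarely works for long-term prediction.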

GGP offers a new method for long-term prediction that seems to be much better in most cases. Here, the time series is considered as a function f(t), sampled at the points t0, t1, t2, ..., tn-1 with the values f0, f1, f2, ..., fn-1. Now, a generalized series expansion of the form f(t)=Sum (k=1,2,...,K) Ak Fk(pk,t) is sought, where the Ak are linear parameters, the pk nonlinear parameters, and the Fk basis functions. Usually, only a small set of basis functions (typically K=3) is constructed by an improved Genetic Programming technique, and the parameters Ak and pk are optimized in such a way that a good extrapolation is obtained when f(t)=Sum (k=1,2,...,K) Ak Fk(pk,t) is evaluated for t>tn-1. To improve the prediction, GGP offers several features such as subdivision of the known data set into an approximation and an extrapolation range, data pre-processing, and a huge number of system parameters. An optimal setting of these parameters is not known because the author (Christian Hafner) had no time for extensive tests, but the default values seem to be quite good in most situations when the given data set is properly scaled.
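The structure of such an expansion can be illustrated with a small sketch. This is not the GGP algorithm itself: the basis functions and the nonlinear parameters pk are fixed by hand here (in GGP they are evolved by Genetic Programming), and only the linear parameters Ak are fitted by least squares, with K=2.

```python
import math

# Sketch of f(t) = sum_k A_k * F_k(p_k, t) with K=2 hand-chosen basis
# functions. GGP would evolve the F_k and optimize the nonlinear p_k;
# here only the linear A_k are fitted (2x2 normal equations).
def basis(t, p):
    return [math.sin(p[0] * t), t]  # F_1(p_1,t)=sin(p_1*t), F_2(t)=t

def fit_linear(ts, fs, p):
    """Least-squares fit of the linear coefficients A_k."""
    g = [[0.0, 0.0], [0.0, 0.0]]  # Gram matrix
    b = [0.0, 0.0]
    for t, f in zip(ts, fs):
        phi = basis(t, p)
        for i in range(2):
            b[i] += phi[i] * f
            for j in range(2):
                g[i][j] += phi[i] * phi[j]
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    a0 = (b[0] * g[1][1] - b[1] * g[0][1]) / det
    a1 = (g[0][0] * b[1] - g[1][0] * b[0]) / det
    return [a0, a1]

# Synthetic series sampled from f(t) = 2 sin(t) + 0.5 t
p = [1.0]
ts = [0.1 * i for i in range(50)]
fs = [2.0 * math.sin(t) + 0.5 * t for t in ts]
A = fit_linear(ts, fs, p)          # recovers A ~ [2.0, 0.5]
# Extrapolation: evaluate the expansion beyond the last sample (t > t_{n-1})
t_future = 6.0
estimate = sum(a * phi for a, phi in zip(A, basis(t_future, p)))
```

The key point the sketch shows: once suitable basis functions are found, the expansion can be evaluated at any t, so extrapolation beyond tn-1 is direct rather than iterative.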

More information on GGP and the GGP time series prediction philosophy.

The following picture shows an excellent GGP long-term prediction of the properly scaled and weighted Dow Jones index over almost 100 years, obtained with standard GGP system parameters and K=3 basis functions. The time scale starts in 1900 and ends in 1998. The data in the approximation range are used to find the linear and non-linear parameters. The basis functions are searched in such a way that a good fit is obtained in the approximation range, and the extrapolation quality of the solutions is checked in the extrapolation range. The data in the prediction range are not known to the GGP algorithm. The best 10 solutions are plotted. Their quality is indicated by colors (dark for relatively bad, bright for relatively good; the best solution is green). Excellent predictions have been found for a range of approximately 25 years! It seems that GGP found three different solution sets. Two of them overestimate the performance.

Since the time range is very long, one can assume that the data in the approximation range that are far away from the prediction range are less important for the prediction than those that are near the prediction range. To take this into account, the error function to be minimized by GGP was exponentially weighted. If this weighting is omitted, the GGP prediction becomes wrong:

This reflects the uncertainty of the prediction. Note that the prediction starts in an area with relatively high uncertainty. When the border of the prediction range is moved, the difference between the weighted and unweighted GGP results becomes less pronounced.
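The exponential weighting of the error function mentioned above can be sketched as follows. The decay rate lam is a hypothetical parameter for illustration; the idea is only that data near the border of the prediction range count more than data far in the past.

```python
import math

# Sketch of an exponentially weighted error function: points near the
# border of the prediction range get weight ~1, old points are damped.
# The decay rate lam is a hypothetical parameter, not GGP's actual one.
def weighted_error(ts, fs, model, lam=0.05):
    """Sum of squared errors, weighted by exp(-lam * distance to last point)."""
    t_last = ts[-1]
    total = 0.0
    for t, f in zip(ts, fs):
        w = math.exp(-lam * (t_last - t))  # w == 1 at the last data point
        total += w * (model(t) - f) ** 2
    return total

ts = list(range(100))
fs = [0.1 * t for t in ts]
# A model that misses only the oldest data is penalized far less than one
# that misses the most recent data by the same amount.
err_old = weighted_error(ts, fs, lambda t: 0.1 * t + (1.0 if t < 20 else 0.0))
err_new = weighted_error(ts, fs, lambda t: 0.1 * t + (1.0 if t >= 80 else 0.0))
```

With lam=0 all weights become 1 and the ordinary (unweighted) error function is recovered, which corresponds to the "weighting omitted" case above.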

The uncertainty of the prediction depends on various properties of the given data set. Often the data set is noisy, and noise can drastically reduce the quality of a prediction. Financial data can be very noisy. It is known that such data also have some fractal aspects that should not be mixed up with noise. Theoretically, it would be possible to take the fractal aspects into account and to correctly predict such data. With the standard settings, GGP can also create discontinuous functions, but it turns out that such solutions die out quite quickly and GGP focuses on smooth, continuous functions. Note that the green solution in the figure above seems to be noisy. In fact, GGP has created a function with a small non-smooth portion that looks like noise. This portion is caused by a purely deterministic basis function. Obviously, it does not contribute much to the quality of the solution. Therefore, GGP would replace the corresponding basis function by a better one if one ran GGP for a longer time. However, noise can considerably disturb the prediction, and it seems that the noise content in financial data increases as the observation interval becomes shorter. Moreover, indices like the Dow Jones are less noisy than stock values of individual companies - especially small ones. A typical example is shown in the following figure:

GGP has a noise estimation feature that indicates that the AMD data over a period of 8 years are noisy. As one can see, most of the GGP predictions are completely wrong. If one ran GGP for a longer time, it might find better solutions, but since GGP offers many solutions with a similar quality in the approximation and extrapolation ranges, yet with completely different behavior in the prediction range, it is impossible to gain confidence in any of the predictions.

Microsoft is a much bigger company with less noisy data, as one can see in the following figure.

As one can see, all GGP predictions show the same trend and are almost identical in the approximation and extrapolation ranges. Because of the more or less exponential behavior, it would be reasonable to analyze the logarithm of the stock value, i.e., to do some pre-processing first. GGP could easily do this, but since the author (Christian Hafner) had no access to precise data (the data were extracted from a bitmap file of a chart found on the WWW), taking the logarithm would amplify the inaccuracies of the low values, which would act like additional noise.

As one can see in the following figure (General Electric - a big company with low noise and behavior similar to the Dow Jones), GGP first creates solutions that are obviously wrong. These solutions do not even approximate the data in the approximation range. After a while, good approximations are found, although their quality in the extrapolation range remains low.

When the given data set is simple enough and not too noisy, GGP can find several solutions with a good quality in the approximation and extrapolation ranges. When these solutions remain within a limited area and look reasonable in the prediction range, one can gain some confidence in the prediction. To gain more (or less) confidence, one should analyze the same data in several GGP runs with slightly different system parameters, different weighting, a different size of the approximation/extrapolation range, etc.
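This confidence check can be sketched as follows: collect the forecasts of several runs and measure their spread at each future time step. The forecast values below are hypothetical; a small spread across runs suggests more confidence, a large one less.

```python
# Sketch of the confidence check suggested above: compare the forecasts
# of several runs (with different system parameters, weighting, range
# sizes, ...) and measure their spread per prediction step.
def spread(forecasts):
    """Max minus min across runs, at each prediction step."""
    return [max(step) - min(step) for step in zip(*forecasts)]

# Hypothetical forecasts from three runs over three future time steps
runs = [
    [10.0, 11.0, 12.5],  # run 1
    [10.2, 11.4, 13.5],  # run 2
    [9.9, 10.8, 12.0],   # run 3
]
per_step_spread = spread(runs)
# The spread typically grows with the forecast horizon; if it stays
# within a limited band, the prediction deserves some confidence.
```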