**Last update 17.9.1999**

Assume that a function f(t) is known in a finite set of points ti, i=1,2,...,I, i.e., you have a finite data set of pairs fi,ti with fi=f(ti)+e(ti), where ei=e(ti) is the error of the measurement that is not known. Usually, the time points are ordered in such a way that ti<tj if i<j. Now, you would like to predict fj for j>I, i.e., at a future time point. If you know, for example, that you observe a harmonic function of the form f(t)=A*cos(omega*t+phi), you can estimate the parameters A, omega, and phi from the given data set, provided that I>2. Usually, you don't know so much about the function f(t) and this makes the prediction harder. In physics, you sometimes know sometimes about the process that generates f(t), but you cannot describe f(t) by a simple formula with a finite number of parameters. When you observe stock quotes, the Dow Jones index, etc. you might only have very unclear ideas on what is behind the time series you observe. In this case, f(t) might be described by complicated formulae, by functions that are not well known or not even known yet. As a consequence, the problem becomes ill-posed, because there are infinitely many formulae fk(t) that approximate f(t) in all points ti with an arbitrarily small error. For example, you can set up a Fourier series expansion, power series expansion, or any other series expansion s(t) with a sufficient number of parameters that may be tuned in such a way that s(ti)=fi. For a prediction in some points tj, j=1...J, tj>tI, you can even assume arbitrary values fj=f(tj). Then you can consider the the data set fk(tk) with k=1,2,...,I,I+1,I+2,...,I+J as e new, finite data set that can be approximated by infinitely may formulae fk(t), for example, by series expansions with K=I+J tunable parameters. Therefore, you have no chance to obtain any prediction if you really don't know anything about the time series you observe.

What can be done? First of all, it is important to recognize that one never knows nothing about the time series that is observed. For example, an old law in physics is that nature makes no jumps. Maybe, this law is not true, but we often observe functions without jumps and sometimes the functions are even very smooth. Therefore, it is reasonable to focus on smooth functions in many situations and focusing on smooth functions drastically reduces the number of possible candidates for a given time series. Stock quotes usually don't look smooth at all, but it seems that one can assume smoothness at least for long term predictions of trends. Although one may observe pronounced jumps, the values of observed time series usually are limited within a band that is known from experience. This also reduces the number of possible candidates for a given time series.

We construct an approximation fk(t) always with known mathematical tools. The number of such tools is finite and we usually restrict ourselfs to a quite small number of tools that are known to be often sucessful. This means that we make some selection that also drastically reduces the number of possible candidates for a given time series. This selection might have the consequence that we are unable to obtain a correct prediction. It seems that mathematics has coevolved with physics. Therefore, this selection is often well adapted to the observed f(t), especially in physics. In economy, one has less experience with the mathematical description of observed processes. Therefore, it is less likely that good predictions are obtained with well known mathematical tools.

An approximation fk(t) can be considered as a "theory" that explains the observations f(ti). According to Einstein, theories should be a simple as possible in some sense. For time series predictions, simple theories have as few tunable parameters as possible. Series expansions such as Fourier series with infinitely many tunable parameters are not simple in this sense, whereas truncated series with a few terms can be considered as simple theories. GGP works with short, truncated series of the form fk(t)=sum Al*fl(t,pl), where Al is a linear parameter, fl a basis function, and pl a non-linear parameter. GGP optimizes the parameters Al and pl and it constructs and obtimizes the basis functions fl from a set of elementary, i.e., simple mathematical functions. The construction of the fl is more sophisticated than in standard Genetic Programming (GP) and can be represented by trees. GGP works with full binary trees that are much smaller, i.e., simpler than the trees that are used in standard GP codes.