Estimating Real GDP Growth
Introduction
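In this paper, we search for a relationship between the change rates of real GDP, gross fixed capital formation, labor productivity and investment in knowledge as reported by the OECD. The findings of our analysis are summarized in a model to explain and estimate real GDP growth as a function of the other growth rates in the form:
<math> \Delta GDP = \beta_0 + \beta_1 \Delta FC + \beta_2 \Delta LP + \beta_3 \Delta IK</math>
with <math> \Delta GDP</math> being the dependent variable real GDP growth and <math> \Delta FC</math>, <math> \Delta LP</math> and <math> \Delta IK</math> being the explanatory variables, and <math> \beta_0</math>, <math> \beta_1</math>, <math> \beta_2</math> and <math> \beta_3</math> the coefficients to be determined.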
Our intuitive hypothesis is that capital formation growth, labor productivity growth and investment in knowledge growth each have a positive effect on real GDP growth.
Such a model may be useful to derive policy recommendations for any government that aims to improve a country's economic performance in terms of real GDP growth.
Research Methodology
The data are taken from the general statistics section of the OECD website. The dataset comprises 32 observations (the 30 OECD member countries plus the totals for the OECD and the EU15) and four variables, which are defined as follows:
 Real GDP Growth rates
The OECD website provides the GDP growth rates for nearly all 32 observations from 1971 to 2004. Our dataset contains the average for the period from 1997 to 2004.
 Change in Gross Fixed Capital
As only the absolute values for the gross fixed capital stock are given in the OECD table, we compute the change rates from 1997 to 2004 and take their average values.
 Change in Labor Productivity
The dataset contains the averages of the change rates from 1997 to 2004 that are given in the OECD table.
 Change in Investment in Knowledge
The corresponding OECD table contains the absolute values of investment in knowledge for 19 countries (Australia, Austria, Belgium, Canada, Denmark, Finland, France, Germany, Greece, Ireland, Italy, Japan, Korea, Netherlands, Portugal, Spain, Sweden, United Kingdom, United States). We transform these into average change rates as we do for the other variables above, but with a lag of two years, i.e. from 1995 to 2002. All available observations are clustered in terms of the first three variables. The missing values are replaced by the means of the fourth variable within the respective clusters.
The data analysis is conducted with XploRe.
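The two data preparation steps described above, averaging year-over-year change rates and replacing missing values by cluster means, can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names and the sample values are hypothetical, the cluster labels are taken as given because the clustering method is not specified here, and the original analysis was carried out in XploRe.

```python
from statistics import mean

def average_change_rate(levels):
    """Mean of the year-over-year percentage changes of a series of absolute values."""
    rates = [(curr / prev - 1.0) * 100.0 for prev, curr in zip(levels, levels[1:])]
    return mean(rates)

def impute_by_cluster_mean(values, clusters):
    """Replace None entries by the mean of the observed values in the same cluster.

    Raises KeyError if a cluster contains no observed value at all.
    """
    observed = {}
    for value, cluster in zip(values, clusters):
        if value is not None:
            observed.setdefault(cluster, []).append(value)
    return [mean(observed[c]) if v is None else v for v, c in zip(values, clusters)]

# Hypothetical absolute values for three consecutive years (not OECD data)
levels = [100.0, 110.0, 121.0]
print(average_change_rate(levels))

# Hypothetical change rates with missing entries and assumed cluster labels
values = [2.0, None, 4.0, 1.0, None]
clusters = ["a", "a", "a", "b", "b"]
print(impute_by_cluster_mean(values, clusters))
```

Taking the mean of the yearly rates, rather than a compound rate over the whole period, follows the wording "compute the change rates ... and take their average values".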
Univariate Data Analysis
In this part, each variable is briefly analyzed by means of its boxplot, its histograms and some key figures.
Real GDP Growth
The boxplot for real GDP growth shows only one outlier: Ireland has the maximum average growth rate of 7.67%. The mean of 3.17% is slightly lower than the median of 3.23%. The minimum observation, Japan with 0.86%, is not an outlier.
Excluding Ireland does not change the picture a lot. The new maximum is Luxembourg with a growth rate of 5.44% and is not an outlier. The new mean of 3.01% is now further away from the new median of 3.19%. 

The histograms are right-skewed, with a skewness of 1.03, and show at least two high peaks. The kurtosis of 5.08 is rather high.
The histograms excluding the outlier Ireland look quite different: the kurtosis is now less than half as high, with a value of 2.29. The skewness even turns negative (–0.03), which means that the histograms are now very slightly left-skewed. The variance is reduced from 1.81 to 1.13.
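For reference, the key figures used in this part (variance, skewness, kurtosis) can be computed from the central moments, as in the following Python sketch. It assumes moment estimators with divisor n and a non-excess kurtosis, for which the normal distribution has the value 3. The exact estimators used by XploRe may differ, and the sample values are hypothetical.

```python
def moments(xs):
    """Mean, variance, skewness and (non-excess) kurtosis from central moments."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n   # variance with divisor n
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return {
        "mean": m,
        "variance": m2,
        "skewness": m3 / m2 ** 1.5,   # 0 for a symmetric sample
        "kurtosis": m4 / m2 ** 2,     # 3 for a normal distribution
    }

# Hypothetical symmetric sample: the skewness is zero
print(moments([1.0, 2.0, 3.0, 4.0, 5.0]))
```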
Change in Fixed Capital
The highest fixed capital change rate of 3.41% is observed in Greece, the lowest of –3.68% in Turkey. Both extremes, however, still lie within 1.5 times the interquartile range of the quartiles, so the boxplot for the change in fixed capital does not show any outlier.
The mean of 0.07% is slightly higher than the median of –0.09%. 
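The outlier rule behind the boxplots can be sketched in Python as follows. An observation is flagged if it lies more than 1.5 times the interquartile range below the first or above the third quartile. The quartile convention (`method="inclusive"`) is an assumption and may differ from the one used by XploRe. The sample values are hypothetical.

```python
from statistics import quantiles

def boxplot_outliers(xs, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in xs if x < lower or x > upper]

# Hypothetical sample with one extreme value
print(boxplot_outliers([1, 2, 3, 4, 5, 6, 7, 8, 30]))
```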

The histogram is almost unskewed, as the skewness of 0.02 is very close to zero. In the average-shifted histogram, we see multiple peaks again. The kurtosis is 2.32 and the variance is 3.45.
Change in Labor Productivity
The boxplot for the change in labor productivity again shows only one outlier: the Slovak Republic has the highest growth rate in labor productivity with 5.23%. All countries have positive values, with Italy at the minimum of 0.49%. The mean of 2.37% and the median of 2.19% are again rather close to one another.
If the Slovak Republic is excluded, there is no new outlier. The new maximum observation is Ireland with 4.56%. The new mean of 2.27% is still slightly higher than the corresponding median of 2.18%. 

The histogram and the average-shifted histogram are slightly right-skewed, with a skewness of 0.71. There seem to be multiple peaks, but they are rather close to each other.
The histograms excluding the Slovak Republic are less right-skewed, with a skewness of 0.59. The two upper peaks are now more clearly distinct. The variance is reduced slightly from 1.38 to 1.12 and the kurtosis from 2.67 to 2.49.
Change in Investment in Knowledge
The boxplot of the change in investment in knowledge shows once more a single outlier on the top: Greece has the highest average increase in investment in knowledge with 8.83%. This time Ireland has the minimum value that is even negative with –0.82%. The mean of 2.76% lies slightly above the median of 2.64%.
Having excluded Greece, the new maximum observation is Austria with 5.7% average increase in investment in knowledge. The new mean of 2.55% is now below the unchanged median of 2.64%. 

If Greece is excluded, then the skewness almost disappears as it is reduced to 0.02, but the multimodality becomes more apparent. In addition, the variance and the kurtosis are reduced to 1.94 and 2.99 respectively. 
None of the four variables is distributed normally. They are all multimodal, skewed to either the right or the left and have kurtosis higher or lower than the normal distribution.
The highest and lowest cases don’t show any clear pattern. Countries being among the highest or lowest cases in one variable are usually not in the others. An exception is Ireland which is among the highest cases in three out of the four variables.
Bivariate Data Analysis
As our aim is to explain the behavior of the GDP growth rate with respect to the changes in capital formation, labor productivity and investment in knowledge, two-dimensional scatter plots can give us deeper insight than the univariate analysis.
Plotting All Variables Against GDP Growth
The plot of real GDP growth against fixed capital formation growth reveals a considerable correlation between the two variables. There seem to be two different linear trends in the data. Dividing the data into two subgroups can help to obtain a better fit of the regression line. The countries plotted in cyan, namely Turkey, the Slovak Republic, Korea, Luxembourg and Ireland (from left to right), seem to have a steeper regression line than the other countries. This means that the growth rates of the cyan-colored countries are more sensitive to changes in fixed capital formation growth.
The second scatter plot also reveals a linear correlation between the GDP growth rate and labor productivity growth, though the linearity is less apparent than in the plot of real GDP growth against fixed capital formation growth. Spain, Luxembourg and Ireland (left to right) seem to lie above the other countries. Most EU15 countries as well as the US and Japan are in the lower left part of the graph. Two subgroups are visible: developed countries are usually located in the left part of the graph (plotted in blue) and the emerging countries in the right part (plotted in cyan). As real GDP and labor productivity are already relatively high in developed countries, their growth rates are low. Spain, Luxembourg and Ireland seem to outperform the other countries in their subgroups by achieving above-average real GDP growth relative to their labor productivity growth.
As we try to capture the role of technological improvement with the variable “growth of investment in knowledge”, we also show the plot of real GDP growth against investment in knowledge growth here, although we could not find a pattern in the data. This third graph only helps us to see the outliers, namely Ireland (top left) and Greece (far right).
To reach a clearer conclusion about technological improvement, we analyze the investment in knowledge growth variable further in the next section.
Investment in Knowledge Growth versus Other Variables
The graphs in the left column show all the data, including the countries for which the investment in knowledge growth variable is missing in the original data. Ireland and Greece are the left and right outliers, respectively, in all of them. Ireland has a negative investment in knowledge growth (the only negative value of this variable in our data); on the other hand, it achieves high growth rates in real GDP, fixed capital formation and labor productivity. Ireland seems to outperform the other OECD countries, since it also stands out as an outlier in all scatter plots analyzed in the previous sections. Greece, an outlier in the opposite direction, has the highest investment in knowledge growth rate, but it is not an outlier in any graph analyzed in the previous sections; the country achieves a moderate rate of growth in GDP, fixed capital formation and labor productivity.
Besides showing distinctive outliers, the scatter plots of investment in knowledge growth do not reveal any general pattern in the data. At this point no exponential or logarithmic transformation is helpful, since we have already converted all variables into growth terms. In addition to the scatter plots of all data points in the left column, the countries for which original investment in knowledge data exist are plotted in the right column, in order to rule out any error introduced by the replacement of missing values. Nevertheless, no improvement is achieved.

Multivariate Data Analysis
In this section, we apply some tools to analyse all of the variables and observations at the same time.
Parallel Coordinates Plot
The parallel coordinates plot makes it possible to visualize, and thus to analyze, more than two or three variables at the same time. The integers on the x-axis represent the four variables, while the respective realizations (i.e. the change rates in %) are marked on the y-axis. Each line in the plot represents one observation. The red line is Germany, the blue lines correspond to the other EU 15 member countries, the black lines to the non-EU 15 OECD countries, and the green lines stand for the EU 15 and OECD totals.
The latter lie in the middle, as one would expect, since they are just averages of some or all of the other lines. The majority of the blue lines look quite similar to the green ones. The two lines which deviate most correspond to Ireland and Greece, which already stood out in the previous analysis. The black lines are far less homogeneous. Most of them seem to follow the same pattern as the green lines, while some, such as Turkey and the Slovak Republic, do so in a more extreme way. Germany is located in the lower part of the graph, which means that it is among the countries with the lowest change rates for most of the variables.
Andrews Curves
The Andrews curves are another tool for multivariate data analysis. The idea is quite similar to that of the parallel coordinates plot. Each observation is transformed into a curve and plotted in the interval from <math> -\pi</math> to <math> \pi</math>. Again the green curves, which represent averages of the others, lie in the middle. The other curves as a whole seem to follow the same pattern. Only two of the blue curves are strikingly different. These are Ireland and Greece again.
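As an illustration of how an observation is turned into a curve, the following Python sketch evaluates the usual Andrews function f(t) = x1/√2 + x2 sin(t) + x3 cos(t) + x4 sin(2t) + ... for one observation. The function name and the sample vectors are hypothetical. The sketch also shows that the curve depends on the order in which the variables enter.

```python
import math

def andrews_curve(x, t):
    """Evaluate the Andrews curve of observation x at t in [-pi, pi]."""
    value = x[0] / math.sqrt(2.0)
    for i, xi in enumerate(x[1:], start=1):
        freq = (i + 1) // 2                 # frequencies 1, 1, 2, 2, 3, ...
        basis = math.sin(freq * t) if i % 2 == 1 else math.cos(freq * t)
        value += xi * basis
    return value

# Hypothetical observation with four variables, and the same values reordered
x = [1.0, 2.0, 3.0, 4.0]
x_reordered = [4.0, 3.0, 2.0, 1.0]
grid = [-math.pi + k * (2.0 * math.pi / 8) for k in range(9)]
for t in grid:
    print(round(t, 3), round(andrews_curve(x, t), 3), round(andrews_curve(x_reordered, t), 3))
```

Because each variable is tied to a fixed sine or cosine term, permuting the variables generally produces a different curve, so the chosen order should be reported together with the plot.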
Star Diagrams
Each country is plotted as a four-cornered star. Each corner represents the standardized value of the respective variable for that observation on an axis from 0 to 1.
The “stars” of the EU 15 countries are on average smaller than those of the other OECD countries. So are the stars of the totals. This means that the EU 15 countries have had lower growth rates. Only Ireland and Greece are much bigger than the other EU 15 countries and even bigger than most of the others. 
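The standardization to an axis from 0 to 1 can be sketched as a min-max scaling. This is an assumption about how the star axes were built, since the exact standardization is not stated. The sample uses the minimum, mean and maximum of real GDP growth reported in the univariate analysis.

```python
def min_max_scale(xs):
    """Map values linearly so that the minimum becomes 0 and the maximum becomes 1."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        raise ValueError("all values are equal, scaling is undefined")
    return [(x - lo) / (hi - lo) for x in xs]

# Minimum (Japan), mean and maximum (Ireland) of real GDP growth from the text
print(min_max_scale([0.86, 3.17, 7.67]))
```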
Regressions
Introductory Analysis
Correlation matrix of the four variables:
<math> \Delta GDP</math> <math> \Delta FC</math> <math> \Delta LP</math> <math> \Delta IK</math>
[1,] <math> \Delta GDP</math> 1 0.40281 0.58194 0.1316
[2,] <math> \Delta FC</math> 0.40281 1 0.19776 0.01092
[3,] <math> \Delta LP</math> 0.58194 0.19776 1 0.058048
[4,] <math> \Delta IK</math> 0.1316 0.01092 0.058048 1
After examining the behavior of the variables and the relationships among them, we try to formalize these relationships by multiple regression. The correlation and scatter plot matrices give a general picture of the relations. GDP growth has a correlation of 0.40 with fixed capital formation growth and of 0.58 with labor productivity growth. The scatter plot matrices also give clues about the linear relationships. Neither the correlation matrix nor the scatter plot matrices provide any clear information about the relationship between investment in knowledge growth and the other variables, which is in line with our previous analysis.
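Each entry of the correlation matrix is a Pearson correlation coefficient between two growth-rate series. A minimal Python sketch of the computation is given below. The function name and the sample series are hypothetical and are not the OECD data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical growth rates for five countries
gdp_growth = [3.1, 2.4, 5.0, 1.2, 3.8]
lp_growth = [2.0, 1.5, 3.9, 0.9, 2.2]
print(round(pearson(gdp_growth, lp_growth), 3))
```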
Regression With Fixed Capital Formation, Labor Productivity and Investment in Knowledge Growth
[ 2,] "A N O V A            SS      df     MSS     F-test   P-value"
[ 3,] "________________________________________________________________________"
[ 4,] "Regression           33.003   3    11.001   14.588   0.0000"
[ 5,] "Residuals            19.606  26     0.754"
[ 6,] "Total Variation      52.609  29     1.814"
[ 7,] ""
[ 8,] "Multiple R      = 0.79204"
[ 9,] "R^2             = 0.62732"
[10,] "Adjusted R^2    = 0.58432"
[11,] "Standard Error  = 0.86838"
"PARAMETERS     Beta     SE       StandB   t-test   P-value"
"________________________________________________________________________"
[16,] "b[ 0,]=      1.4885   0.4560   0.0000   3.265   0.0031"
[17,] "b[ 1,]=      0.3889   0.0881   0.5389   4.412   0.0002"
[18,] "b[ 2,]=      0.7836   0.1404   0.6828   5.582   0.0000"
[19,] "b[ 3,]=      0.0738   0.0905   0.0978   0.816   0.4220"
Although investment in knowledge growth does not seem to have any significant linear relation with any other variable, we still include the variable in our regression model for the time being, since we have hypothesized that real GDP growth is positively related to fixed capital formation growth, labor productivity growth and investment in knowledge growth. However, the parameter table implies that we should not include the investment in knowledge growth variable: the p-value of its coefficient (0.42) is far above the significance level of 0.05, so the null hypothesis of no relation between investment in knowledge growth and real GDP growth cannot be rejected. In other words, the coefficient does not pass the t-test. In addition, the size of the coefficient (0.0738) is very small compared to the coefficients of the other variables (fixed capital formation growth 0.39, labor productivity growth 0.78). Furthermore, none of the transformations that we have tried on this variable improves the results significantly.
Therefore we drop the investment in knowledge growth variable from our regression and proceed with the other variables.
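The t-tests in the parameter table can be retraced from the reported coefficients and standard errors, since each t statistic is the ratio Beta / SE. The Python sketch below does this for the first regression. The critical value 2.056 (two-sided, 5% level, 26 degrees of freedom) is taken from standard t tables and is an assumption added here, not part of the XploRe output.

```python
# (Beta, SE) pairs copied from the PARAMETERS table of the first regression
coefficients = {
    "intercept": (1.4885, 0.4560),
    "fixed capital formation growth": (0.3889, 0.0881),
    "labor productivity growth": (0.7836, 0.1404),
    "investment in knowledge growth": (0.0738, 0.0905),
}

# Two-sided 5% critical value of Student's t with 26 degrees of freedom,
# taken from standard tables (assumption, not part of the output above)
T_CRITICAL = 2.056

def t_statistic(beta, se):
    """t statistic for the null hypothesis that the coefficient is zero."""
    return beta / se

t_values = {name: t_statistic(b, se) for name, (b, se) in coefficients.items()}
significant = {name: abs(t) > T_CRITICAL for name, t in t_values.items()}

for name, t in t_values.items():
    print(f"{name}: t = {t:.3f}, significant at 5%: {significant[name]}")
```

The recomputed ratios agree with the reported t-test column up to rounding of the printed coefficients, and only the investment in knowledge growth coefficient falls below the critical value.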
Regression With Fixed Capital Formation and Labor Productivity Growth
[ 2,] "A N O V A            SS      df     MSS     F-test   P-value"
[ 3,] "_________________________________________________________________________"
[ 4,] "Regression           32.501   2    16.250   21.820   0.0000"
[ 5,] "Residuals            20.108  27     0.745"
[ 6,] "Total Variation      52.609  29     1.814"
[ 7,] ""
[ 8,] "Multiple R      = 0.78599"
[ 9,] "R^2             = 0.61778"
[10,] "Adjusted R^2    = 0.58947"
[11,] "Standard Error  = 0.86299"
[12,] ""
[13,] ""
"PARAMETERS     Beta     SE       StandB   t-test   P-value"
"________________________________________________________________________"
[16,] "b[ 0,]=      1.2697   0.3664   0.0000   3.465   0.0018"
[17,] "b[ 1,]=      0.3889   0.0876   0.5390   4.440   0.0001"
[18,] "b[ 2,]=      0.7902   0.1393   0.6885   5.673   0.0000"
Contents of beta
[1,] 1.2697
[2,] 0.3889
[3,] 0.79018
Regressing real GDP growth on fixed capital formation growth and labor productivity growth only increases the explanatory power of the regression slightly: the adjusted R² is 0.58432 in the former regression and 0.58947 in the new one. In addition, the F-test shows that our model is significant. The F statistic of the regression is 21.820, which is well above the critical value (3.3541) at the significance level of 0.05 with 2 and 27 degrees of freedom. Furthermore, in this regression model all the coefficients are significant; they all pass the t-test with p-values well below our significance level of 0.05. The size of the coefficients (0.39 for fixed capital formation growth and 0.79 for labor productivity growth) is also considerable.
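The summary figures of this regression follow directly from the sums of squares and degrees of freedom in the ANOVA table. The Python sketch below retraces them. All inputs are copied from the table, the variable names are illustrative, and the number of observations (30) is inferred from the total degrees of freedom (29).

```python
import math

# Values copied from the ANOVA table of the second regression
ss_regression = 32.501
ss_residual = 20.108
df_regression = 2
df_residual = 27

ss_total = ss_regression + ss_residual
df_total = df_regression + df_residual        # 29, i.e. 30 observations

r_squared = ss_regression / ss_total
adj_r_squared = 1.0 - (1.0 - r_squared) * df_total / df_residual
ms_residual = ss_residual / df_residual
f_statistic = (ss_regression / df_regression) / ms_residual
standard_error = math.sqrt(ms_residual)

print(f"R^2            = {r_squared:.5f}")
print(f"Adjusted R^2   = {adj_r_squared:.5f}")
print(f"F statistic    = {f_statistic:.3f}")
print(f"Standard error = {standard_error:.5f}")
```

The results reproduce the reported R², adjusted R², F-test value and standard error up to rounding of the printed sums of squares.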
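Results
Our regression model can be summarized by the following formula:
<math> \Delta GDP = 1.2697 + 0.3889 \, \Delta FC + 0.7902 \, \Delta LP + \varepsilon</math>
where <math> \Delta GDP</math> is real GDP growth, <math> \Delta FC</math> is fixed capital formation growth, <math> \Delta LP</math> is labor productivity growth, and <math> \varepsilon</math> is the residual.
A one percentage point increase in fixed capital formation growth raises real GDP growth by 0.39 percentage points, and a one percentage point increase in labor productivity growth raises real GDP growth by 0.79 percentage points, holding the other variable constant.
The size of the labor productivity growth coefficient (0.79018), together with its larger standardized coefficient (StandB 0.6885, compared with 0.5390 for fixed capital formation growth), indicates that labor productivity growth has the strongest effect on real GDP growth.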
The regression model is also illustrated by the regression plane, in which the first dimension stands for fixed capital formation growth, the second for labor productivity growth and the third for real GDP growth. The distance between the regression plane and the individual observations reveals the size of the deviations from the regression model. These deviations can cause problems when the estimated model is used for predictions. In fact, when we predict the real GDP growth rates of the countries in the dataset with our estimated model, high percentage errors occur for those countries which deviate strongly from the regression plane.
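Prediction with the estimated model and the percentage error of a prediction can be sketched in Python as follows. The coefficients are those reported in the second regression, while the function names are illustrative. As a plausibility check, the model is evaluated at the sample means reported in the univariate analysis (0.07% for fixed capital and 2.37% for labor productivity). These means were computed on the full dataset, whereas the degrees of freedom of the regression suggest 30 observations, so the comparison is only approximate.

```python
# Coefficients reported for the second regression
B0, B1, B2 = 1.2697, 0.3889, 0.7902

def predict_gdp_growth(fc_growth, lp_growth):
    """Predicted real GDP growth (in %) from the estimated regression plane."""
    return B0 + B1 * fc_growth + B2 * lp_growth

def percentage_error(actual, predicted):
    """Prediction error relative to the actual value, in percent."""
    return (predicted - actual) / actual * 100.0

# Evaluate at the reported means of the two explanatory variables
predicted = predict_gdp_growth(0.07, 2.37)
print(f"predicted growth at the means: {predicted:.2f}%")

# Compare with the reported mean real GDP growth of 3.17%
print(f"percentage error: {percentage_error(3.17, predicted):.2f}%")
```

At the means the prediction is close to the reported mean real GDP growth of 3.17%, which is what one expects from a least squares fit. Large percentage errors arise for observations far from the plane.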
Conclusion
Our data do not follow a normal or unimodal distribution for any variable. There are different subgroups of countries, which may follow different paths to higher real GDP growth. As we have a small dataset, some countries, especially the outliers, should be analyzed separately. Outperforming outliers can be regarded as benchmarks in policy recommendations for the poor performers. Ireland seems to be the most outstanding example. The country's high GDP growth comes from its high growth in both fixed capital formation and labor productivity. Although the Slovak Republic achieves higher growth in labor productivity, it achieves only moderate real GDP growth because of a reduction in fixed capital formation. Increasing fixed capital formation growth is the obvious policy recommendation for this country.
Fixed capital formation and labor productivity have a considerable and significant impact on real GDP growth. Countries should try to increase both their fixed capital formation and their labor productivity in order to achieve high GDP growth rates. Labor productivity seems to have the bigger effect on real GDP growth. Countries that want to achieve high real GDP growth should focus on improving this variable.
Although we try to represent technological improvement by the investment in knowledge growth variable and to explain real GDP growth explicitly in terms of improvements in technology, we find that this is not possible with the available data. Improvements in technology are nevertheless important and affect the growth of output in an economy. There might be other variables that represent improvements in technology better. However, in choosing the variables we are restricted by the data available. In addition, we lag the investment in knowledge variable by two years. A lag of five or ten years might give better results, but the data are not available for those years.
References
 Härdle, W., S. Klinke, M. Müller (2000), XploRe Learning Guide, Springer-Verlag, Berlin Heidelberg
 Härdle, W., L. Simar (2003), Applied Multivariate Statistical Analysis, Springer-Verlag, Berlin Heidelberg
 OECD General Statistics, November 2006: http://stats.oecd.org/WBOS/default.aspx?DatasetCode=CSP6
 OECD Definitions, November 2006: http://stats.oecd.org/glossary/index.htm
 Wikipedia Handbuch, November 2006: http://de.wikipedia.org/wiki/Wikipedia:Handbuch
Comments
 Why do you take the average of GDP growth 1997-2004?
 The multimodality, especially for "Change in Investment in Knowledge", may come from a wrong binwidth
 Is Ireland part of the analysis? You stressed the fact that Ireland is an "outlier", but you do not say that it is excluded. I'll assume it is part of the analysis
 I am not sure that I would assume two regressions in "Plotting All Variables Against GDP Growth". What about the following interpretation: POL and HUN are part of the lower group and we have a small outlier group IRL, LUX, KOR, SLO and R?
 Typo in "GDP growth rate and labor productivity growth"
 Cannot read axes labels in Knowvsall.jpeg
 In Andrews curves: which order of variables did you choose? The curves depend on the order of variables
 If you compare regression coefficients then you should use the StandB coefficients
 Did you attend Prof. Rönz's lecture "Computergestützte Statistik II"? It would be useful to do some regression diagnostics here to judge the quality of the regression.