Analysis of trends in worldwide internet

From Teachwiki
Revision as of 10:48, 16 January 2007 by WikiSysop (Talk | contribs) (References)



This thesis takes a first look at worldwide internet access, with the aim of explaining the growth in the number of internet users. The internet is today the fastest-developing medium. It is a powerful tool for spreading knowledge, education and modern business. We are therefore interested in the factors that influence its growth.



Our data were downloaded from EarthTrends, a comprehensive online database maintained by the World Resources Institute that focuses on the environmental, social, and economic trends that shape our world. It provides many different attributes for each country, so we chose the variables relevant to our topic. For each characteristic we also have a time series with yearly frequency. The original sources of the data are the World Bank, the International Telecommunication Union (ITU) and UNESCO. Our dataset covers 208 countries, and we chose the following 12 attributes:

Table 1: Variable List
Variable     Meaning                                                 Type
Name         Name of the country                                     text
Users2002    Number of internet users per 1000 people in 2002        numeric
Users2001    Number of internet users per 1000 people in 2001        numeric
Users2000    Number of internet users per 1000 people in 2000        numeric
Investment   Annual investment per person in telecommunications      numeric
GDPgrowth    GDP growth in 2001 (in percent)                         numeric
GDP          GDP per person in 2001 (in US dollars)                  numeric
Computers    Number of computers per 1000 people in 2001             numeric
Handies      Number of mobile phones per 1000 people in 2001         numeric
Telephone    Number of telephone lines per 1000 people in 2001       numeric
NetHosts     Number of internet hosts per 1000 people in 2001        numeric
Television   Number of television sets per 1000 people in 2001       numeric
Population   Population                                              numeric

Missing values

Because of the political situation, the absence of credible sources and similar reasons, we could not obtain complete data for all countries. We handle missing values as follows. First, we find the country in the same region with the most similar values. Then we estimate the missing value so that it keeps the same ratio as the one observed between the corresponding values of the matching country. For every attribute except the number of internet users, we use the time series of the attribute for this purpose (the time series of the number of internet users is itself the subject of our analysis, so we could not estimate it from itself).
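The replacement rule can be read in more than one way. The following is a minimal sketch of one possible reading, assuming that the missing value is obtained by scaling the country's own value from another year by the ratio observed in the matching country. The function name `impute_by_ratio` and the numbers are hypothetical and do not come from the dataset.

```python
def impute_by_ratio(own_known, match_known, match_target):
    """Estimate a missing value from a matching country's ratio.

    own_known    -- value of the incomplete country in a year where it is known
    match_known  -- value of the matching country in that same year
    match_target -- value of the matching country in the year that is missing
    """
    if match_known == 0:
        raise ValueError("the matching country's reference value must be non-zero")
    # Assumption: the incomplete country changed by the same ratio as its match.
    return own_known * match_target / match_known


# Hypothetical example: the matching country went from 50 to 75,
# so a country known to be at 40 is estimated at 40 * 75 / 50.
estimate = impute_by_ratio(40.0, 50.0, 75.0)
print(estimate)
```

Choosing the matching country (the "most similar" one in the same region) is left out of the sketch, because the text does not say how similarity was measured.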


Because our work is concerned with trends, we also have to be careful when removing outliers. We cannot simply remove big countries such as the USA or China, even though they are often outliers, because they strongly shape the trends. On the other hand, small countries can have a disproportionate influence on our statistics. The situation on a small Pacific island with just one internet provider is too specific to tell us how economic indicators influence internet growth. Hence we remove from our dataset all countries with fewer than half a million inhabitants. After also removing the countries with missing values, we usually work with about 120 countries. We can therefore say that our data are of sufficient quality for finding trends in worldwide internet use, but they cannot be used for an exact comparison of two countries (this warning is also included in the original description of the data).
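The population filter can be sketched in a few lines. This is only an illustration: the rows below are placeholders rather than real data, and the dictionary layout (keys `Name` and `Population`, following Table 1) is an assumption about how the dataset might be held in memory.

```python
MIN_POPULATION = 500_000  # threshold stated in the text: half a million inhabitants


def drop_small_countries(rows, min_population=MIN_POPULATION):
    """Keep only the rows whose Population reaches the threshold."""
    return [row for row in rows if row["Population"] >= min_population]


# Placeholder rows, not real data.
sample = [
    {"Name": "Country A", "Population": 120_000},
    {"Name": "Country B", "Population": 500_000},
    {"Name": "Country C", "Population": 38_000_000},
]

kept = drop_small_countries(sample)
print([row["Name"] for row in kept])
```

Big countries stay in the sample on purpose, since the filter only has a lower bound.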

World overview

For the pictures in the first part of our thesis, we use the following color notation:

  • European countries are green
  • Asian countries are yellow
  • Sub-Saharan African countries are red
  • Central America, South America and the Caribbean are blue
  • The Middle East and North Africa are cyan
  • USA, Canada and Australia are magenta

The USA, Canada, Australia and a few bigger countries from Oceania form a separate group because they are often outliers, so it is useful to be able to identify them.

Number of users per 1000 inhabitants, year 2002

Like wealth, internet access is not uniformly distributed across the world. From the boxplots we can see that the gap between developing regions such as Africa and the high-income European countries is large, a factor of around 22 (the mean of the European countries is 224, that of Africa 10). But in each group there are countries where people have very poor opportunities to connect to the internet. In Europe these are Albania (3.9) and Ukraine (18.7), in Asia Tajikistan (0.5) and Bangladesh (1.5). South Africa (191) and Zimbabwe (41) are high above the African median, but they still have a long way to go to reach the level of the European countries. The leading countries are Iceland (647) and South Korea (551). As we can see, countries like South Korea and Japan strongly influence the statistics of the Asian countries, whose mean (55) is nevertheless much lower than the median of the European countries (224).

Except for the Middle East, the relative growth (Users 2002 / Users 2001) seems to be more uniform across the continents. This only tells us, however, that countries from less developed regions can hardly catch up with the rich countries. Only income from petroleum resources helps the Middle East develop much faster than the rest of the world. The countries with the fastest relative growth are among those with a small number of users (Syria: 21 users, growth rate 6.3). The countries with the highest absolute growth (number of users in 2002 minus number of users in 2001, both per 1000 inhabitants) are partly different. These are countries that have already built up the basic infrastructure for spreading the internet, so nothing inhibits an internet boom. The market in the most developed countries is fairly saturated, so there is less room for growth.

Users 2002/ Users 2001
Users 2002-Users 2001
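The two growth measures behind these pictures can be computed as in the following minimal sketch. The function names and the two example countries are hypothetical; only the definitions (ratio and difference of users per 1000 inhabitants) are taken from the text.

```python
def relative_growth(users_2001, users_2002):
    """Users 2002 / Users 2001 (both per 1000 inhabitants)."""
    if users_2001 <= 0:
        raise ValueError("relative growth needs a positive 2001 value")
    return users_2002 / users_2001


def absolute_growth(users_2001, users_2002):
    """Users 2002 - Users 2001 (both per 1000 inhabitants)."""
    return users_2002 - users_2001


# Hypothetical values: a low-start country and a high-start country.
low_start = (2.0, 8.0)
high_start = (400.0, 450.0)

for label, (u01, u02) in (("low start", low_start), ("high start", high_start)):
    print(label, relative_growth(u01, u02), absolute_growth(u01, u02))
```

In this made-up example the low-start country quadruples its users but gains only 6 per 1000 inhabitants, while the high-start country grows by a factor of 1.125 and gains 50. This is the kind of difference between the two rankings that the paragraph above describes.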

The boxplots of annual investment only tell us that a rich country can afford to invest a lot of money in telecommunications, although these are mostly not the countries with the fastest growth. But the picture tells us nothing about the quality of internet access: in these countries, the number of broadband internet users may be growing. An interesting fact is that none of the richest Arab states invests as much as the European countries, although they have a comparable GDP.

Annual investment into telecommunication

More interesting are the statistics for countries that are "clever" and invest a lot of money relative to their GDP. The following boxplots show the ratio of annual investment to GDP per person, multiplied by 1000. The unexplained outliers are Cyprus (254), Latvia (97) and Cambodia (40). The second picture plots GDP against relative investment (the new variable just defined). We can see that none of the richest countries treats investment in telecommunications as a spending priority. By contrast, in some of the very poor countries, where people "starve of food", it seems they cannot "live without internet". This could be caused by international corporations trying to capture a share of a potential future market. A well-known example is Afghanistan (not included in our dataset), where after the war the first foreign companies to enter were mobile phone companies.

Boxplot of relative investment
Relative investment versus GDP
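The derived variable can be written down as a one-line function. This is a sketch under the definition given in the text (annual investment divided by GDP per person, times 1000); the function name and the example figures are hypothetical.

```python
def relative_investment(investment_per_person, gdp_per_person):
    """Annual telecom investment relative to GDP per person, scaled by 1000."""
    if gdp_per_person <= 0:
        raise ValueError("GDP per person must be positive")
    return 1000.0 * investment_per_person / gdp_per_person


# Hypothetical country: 50 dollars invested per person, GDP of 20000 dollars per person.
print(relative_investment(50.0, 20000.0))
```

With this scaling, a value of 2.5 means that investment amounts to 0.25 percent of GDP per person.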

The last pair of pictures compares a new and an old technology. The first plots the number of televisions against GDP, the second the number of internet users against GDP. They show that time helps to reduce the differences, but rich countries are still better equipped. They also show that television is still a more common household item than internet access, even in rich countries (note the different scales of the two pictures).

Number of televisions versus GDP
Number of internet users versus GDP

Focus on internet growth


We now turn to explaining internet growth. Our first question is: does internet growth depend linearly on annual investment in telecommunications? We derive our results from these three variables:

  • Internet users 2002
  • Internet users 2001
  • Investment into telecommunication (year 2001)

We have two basic ways to combine them:

  • Discounting (absolute growth): Users in 2002 - Users in 2001
  • Dividing (relative growth): Users in 2002 / Users in 2001

Each approach gives different results and has a different interpretation, although, given the 2001 value, one variable can simply be calculated from the other. For linear regression, the discounting approach may make sense, but the dividing approach does not. From the dividing picture we can see two things. Countries with small investment grow a lot in relative terms, and there are big differences between them; this could be caused by their low starting level. More developed countries, even if they invest twice as much as another country at a similar level, cannot double their number of internet users.

Users 2002-Users 2001
Users 2002/ Users 2001

Because the countries on each continent have different opportunities, we apply linear regression for the discounting case to each continent separately. As the table shows, for some continents the model fits well (Middle East: adjusted R^2 of 0.64), while for others it does not make sense (the adjusted R^2 for Asia is 0.03).

Region        Adjusted R^2   B[1]
ALL           0.46           1.89
EUROPE        0.14           1.10
AFRICA        0.19           0.98
ASIA          0.03           0.51
MIDDLE EAST   0.64           3.45
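For readers who want to reproduce this kind of table, the following is a minimal sketch of a one-predictor least-squares fit that returns the slope and the adjusted R^2. It is not the authors' XploRe code; the function name `simple_ols` and the toy numbers are assumptions made for illustration. It would be applied once per region, with investment as `x` and the absolute growth as `y`.

```python
def simple_ols(x, y):
    """Fit y = b0 + b1 * x by least squares.

    Returns (b1, b0, r2, adjusted_r2) for a model with one predictor.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equally long samples with at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = 1.0 - ss_res / syy
    # Adjusted R^2 with p = 1 predictor: 1 - (1 - R^2) * (n - 1) / (n - p - 1)
    adjusted_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - 2)
    return b1, b0, r2, adjusted_r2


# Toy numbers, not taken from the dataset.
investment = [1.0, 2.0, 3.0, 4.0]
growth = [1.0, 3.0, 2.0, 4.0]
b1, b0, r2, adj = simple_ols(investment, growth)
print(b1, b0, r2, adj)
```

The adjustment penalises small samples, which matters here because some regional groups contain only a few countries.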

Applying multiple linear regression gives similar results. For some continents the discounting approach was more sensible, for others the dividing approach. After stepwise selection, the important variables also differed between continents. So internet growth can be described by linear regression for some groups of countries, but with different attributes.

Region        Adjusted R^2 (dividing)   Adjusted R^2 (discounting)
ALL           0.10                      0.47
EUROPE        0.26                      0.24
AFRICA        0.32                      0.48
ASIA          0.64                      0.49
MIDDLE EAST   0.22                      0.28

Multivariate analysis gives us a first view of the variables that could influence internet growth. But before using a model to estimate the growth of a particular country, we always have to check whether the model fitted that country in previous years, or whether specific national conditions exist. We can use these results as guidance on which variables to focus on when predicting internet growth.

Parallel Coordinate Plots

The dependences between growth and country attributes are probably more complicated than linear regression can capture. We therefore use parallel coordinate plots, another tool for displaying several variables at the same time. For the final pictures we chose just five variables, for which the results were most pronounced.

  1. Users 2002/users 2001
  2. Users 2001
  3. Investment 2001
  4. GDP growth
  5. Telephone lines

We use this new color notation:

  1. Growth below 0.5 times the median of all countries in the sample
  2. Growth between 0.5 times the median and the median
  3. Growth between the median and 1.5 times the median
  4. Growth above 1.5 times the median of all countries in the sample
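The four classes can be assigned as in the sketch below. It assumes that the classes are consecutive bands around the sample median; how values that fall exactly on a boundary are treated is not stated in the text and is chosen arbitrarily here. The function name and the growth values are hypothetical.

```python
from statistics import median


def growth_class(growth, sample_median):
    """Return the class number (1 to 4) of a growth value relative to the median."""
    if growth < 0.5 * sample_median:
        return 1
    if growth <= sample_median:      # boundary handling is an assumption
        return 2
    if growth <= 1.5 * sample_median:
        return 3
    return 4


# Hypothetical relative growth values for six countries.
growths = [0.6, 1.0, 1.2, 1.5, 2.0, 6.3]
m = median(growths)
classes = [growth_class(g, m) for g in growths]
print(classes)
```

Each class is then drawn in its own color, so that fast-growing and slow-growing countries can be told apart along all axes of the plot.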

We can see that countries with small investment in telecommunications, a small number of users and high GDP growth have the fastest relative growth. This is because those countries are in a period of economic growth, people can afford to buy new technologies, and their starting level is low. This trend is similar on all continents.

Parallel Coordinate Plots for Europe - Dividing
Parallel Coordinate Plots for World - Dividing

On the other hand, high absolute growth requires a lot of users and a lot of investment in telecommunications. This means that although countries with a low level of internet access develop relatively fast, in absolute numbers their growth is still slower than that of the high-level countries. This trend is clear in Asia, but less so in Europe, possibly because the European market is saturated.

Parallel Coordinate Plots for Europe - Discounting
Parallel Coordinate Plots for Asia - Discounting
Parallel Coordinate Plots for World - Discounting


From a global point of view, we can find clear dependences between internet growth and country characteristics, but they are more complicated than a simple linear regression. The most important variables are the number of users, investment in telecommunications and GDP growth. Predicting accurate growth figures would require a deeper analysis. Grouping countries by cluster analysis instead of by region could be a good way forward. Including factors such as market conditions (monopoly or several providers) or the legal background should also be useful.


  • Härdle W., Simar L., Applied Multivariate Statistical Analysis, Springer 2003
  • Härdle W., Hlávka Z., Klinke S., XploRe Application Guide, Springer 2000
  • Härdle W., Klinke S., Müller M., XploRe Learning Guide, Springer 2000


  • Did you carry out the missing value replacement yourself?
  • Nice boxplots!
  • The continents as groups are very heterogeneous; other groupings would have been better
  • No XploRe programs (not even in the appendix)