Analysis of Assets of U.S. Companies

From Teachwiki


In this thesis we analyze data on 79 U.S. companies, focusing on their assets. First, we examine the relationship between a company's assets and its sector (e.g. Finance or HiTech); the idea is to test whether these two variables are independent. Then we analyze the relationship between assets and other variables such as the company's market value, number of employees, and so on. The result is given in the form of a multiple regression function.

As a result, we find that the relationship between assets and sectors is only borderline significant: independence is rejected at the 10% level but not at the 5% level. The final regression function shows that market value and costs have a positive impact on assets, while the number of employees has a negative impact.

Analysis Background

Introduction to Data

Our data set comes from MD*base. It consists of measurements on U.S. companies from a list of the top 500. There are 79 observations on 8 variables, 2 nominal and 6 numeric (see Table 1). The companies come from 9 different sectors: Communication, Energy, Finance, High Technology, Manufacturing, Medical, Retail, Transportation, and Other. The observations include top-500 companies such as IBM and Apple Computer.

Table 1: Variable List
Variable       Meaning                                      Symbol   Type
Company        Name of the U.S. company                     CP       nominal
Assets         Assets of the company                        AS       numeric
Sales          Sales volume of the company                  SA       numeric
Market Value   Stock market value of the company            MA       numeric
Profits        Net profit of the company                    PR       numeric
Cash Flow      Net cash flow of the company                 CF       numeric
Employees      Number of employees of the company           EM       numeric
Sector         Sector to which the company belongs          SE       nominal

Why assets?

Market value is a very popular measure of the wealth of people or companies. For example, a few years ago people often said that Bill Gates was the wealthiest man in the world because the market value of Microsoft was so high, but recently nobody says that anymore. Market value is highly volatile.

In our analysis, we use assets as the measure of a company's wealth or power. In our view, assets combine different kinds of valuables, such as stocks, bonds, fixed assets, human capital and so on. Assets are much more stable than market value and may better reflect the true situation of a company.

A widely accepted accounting definition of an asset is the one used by the International Accounting Standards Board: "An asset is a resource controlled by the enterprise as a result of past events and from which future economic benefits are expected to flow to the enterprise."

In accounting, assets are related to liabilities and owners' equity by the identity

$$\text{Assets} = \text{Liabilities} + \text{Owners' Equity}$$

which is known as the accounting equation.

Analysis Objectives

  • Sectors and assets: independent or not?

Our observations come from 9 different sectors. Intuitively, companies in different sectors may have different levels of assets. Our first objective is therefore to test whether assets are independent of sector.

  • Regression on assets

Our data contain 6 numeric variables. We take assets as the dependent variable and the other 5 variables (sales, market value, profits, cash flow, and employees) as candidate independent variables, study the relationship between assets and these 5 variables, and finally determine a regression function for assets.

Descriptive Statistics

Descriptive statistics give useful information about the variables, such as the shape of their distributions, outliers, and possible relationships. The data set has no missing values.

  • Histograms

The histograms show that the distributions of all 6 numeric variables are similar (see Fig. 1): each has one extremely high peak, the remaining bars are very low, and all have long right tails.
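The right skew seen in the histograms can also be summarized by a single number. As a minimal Python sketch (our own analysis was done in XploRe), the moment coefficient of skewness below is positive for a distribution with a long right tail; the function name and the sample values are hypothetical.

```python
def sample_skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5.

    Positive values indicate a long right tail, as in Fig. 1.
    """
    n = len(values)
    mean = sum(values) / n
    # Second and third central moments
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed sample: one large value among several small ones
print(round(sample_skewness([1, 1, 1, 10]), 4))  # 1.1547
```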

Fig.1 Histograms of 6 numeric variables

  • Andrews Curves

We also draw Andrews curves for all observations (see Fig. 2). Most observations have similar curves, and many of the curves overlap. Only one curve is completely different from the others.
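In an Andrews curve, each observation x = (x1, ..., xp) is drawn as the function f_x(t) = x1/sqrt(2) + x2 sin t + x3 cos t + x4 sin 2t + x5 cos 2t + ... over -pi <= t <= pi. A minimal Python sketch of one curve follows, assuming the variables have already been standardized; the observation vector in the usage lines is hypothetical. Because the first variables receive the low-frequency terms, the shape of the curves depends on the order of the variables.

```python
import math


def andrews_curve(x, t):
    """Andrews curve of observation x evaluated at angle t, with -pi <= t <= pi.

    f_x(t) = x1/sqrt(2) + x2*sin(t) + x3*cos(t) + x4*sin(2t) + x5*cos(2t) + ...
    """
    value = x[0] / math.sqrt(2)
    for k, xk in enumerate(x[1:], start=1):
        freq = (k + 1) // 2  # x2, x3 -> frequency 1; x4, x5 -> frequency 2; ...
        # Odd positions k get the sine term, even positions the cosine term
        value += xk * (math.sin(freq * t) if k % 2 == 1 else math.cos(freq * t))
    return value


# Hypothetical standardized observation with 6 variables, sampled at 101 points
ts = [-math.pi + 2 * math.pi * i / 100 for i in range(101)]
curve = [andrews_curve([1.0, 0.5, -0.2, 0.3, 0.1, 0.0], t) for t in ts]
```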

Fig.2 Andrews Curves for all observations

  • Face Plot

The Chernoff-Flury faces are all very similar, except for one clearly outstanding face (see Fig. 3). This observation is no. 40, IBM, which we regard as an outlier and may delete in later analyses.

Fig.3 Face Plot for all observations

  • Scatter Plot Matrix

After deleting the outlier, we draw a scatter plot matrix of the 6 numeric variables (see Fig. 4) to explore their pairwise relationships. Profits and cash flow are clearly positively correlated. Assets appear to be related to the other 5 variables, but these relationships are not very clear, so we use linear regression analysis to investigate them.
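The visual impression from a scatter plot matrix can be complemented by a correlation matrix. Below is a minimal pure-Python sketch using the Pearson correlation coefficient; the function names, column names and numbers are hypothetical and not taken from our data set.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """All pairwise correlations for a dict {variable name: values}."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Hypothetical values for two variables
data = {
    "profits": [1.0, 2.0, 3.0, 4.0],
    "cash_flow": [2.1, 3.9, 6.2, 7.8],
}
r = correlation_matrix(data)
print(round(r[("profits", "cash_flow")], 3))
```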

Fig.4 Scatter Plot Matrix

Test of Independence

Having examined the descriptive features of the variables, we now analyze the relationship between companies' assets and sectors. First, let us look at the data again.

Sectors and Companies

Table 2 below shows the 9 sectors and the number of companies in each. Our data contain 2 communication companies, 15 energy companies, 17 finance companies, and so on.

Table 2: Sectors and Companies
Sector            Number of Observations
Communication      2
Energy            15
Finance           17
HiTech             8
Manufacturing     10
Medical            4
Other              7
Retail            10
Transportation     6
Fig.5 Boxplots of assets

Fig. 5 shows boxplots of company assets for the 9 sectors. From left to right they are Communication, Energy, Finance, HiTech, Manufacturing, Medical, Other, Retail, and Transportation.

The HiTech sector has the highest mean and the largest single asset value, which appears as a circle in the boxplot; this is observation no. 40, IBM. The Finance sector has a larger median than the HiTech sector, and the sector "Other" has the lowest median. These boxplots suggest that assets may differ among the 9 sectors. To check whether sectors and assets are related, we need a statistical test.
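In common boxplot conventions, a point is drawn as a separate circle when it lies more than 1.5 interquartile ranges beyond the quartiles (Tukey's fences). The Python sketch below applies this rule. We assume this convention; the quartile method of Python's statistics.quantiles (default "exclusive") may differ slightly from the one used by XploRe, and the function name and sample values are hypothetical.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return the values lying outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles, 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical asset values with one extreme company
print(tukey_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))  # [100]
```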

We use the chi-square test of independence, with the null hypothesis $H_0$ that assets and sectors are independent.

Theoretical Background

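It can be shown that, under the null hypothesis of independence, the statistic

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

approximately follows a chi-square distribution with $df = (r-1)(c-1)$ degrees of freedom, where $r$ and $c$ are the numbers of rows and columns of the contingency table, $O_{ij}$ is the observed frequency and $E_{ij}$ is the expected frequency in cell $(i, j)$. The expected frequencies are computed from the margins:

$$E_{ij} = \frac{O_{i\cdot} \, O_{\cdot j}}{n},$$

where $O_{i\cdot}$ and $O_{\cdot j}$ are the row and column totals and $n$ is the total number of observations.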
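The statistic can be computed directly from a contingency table. The Python sketch below is a minimal illustration of the formulas above, not the crosstable implementation; as input it uses the counts that crosstable reports for our 2 asset groups and 9 sectors (see the Test Result section). The function name is hypothetical.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    Expected counts under independence: E_ij = O_i. * O_.j / n.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df


# Counts from the crosstable output: rows = asset groups 1, 2; columns = sectors 1..9
counts = [
    [0, 8, 4, 3, 5, 3, 6, 6, 5],
    [2, 7, 13, 5, 5, 1, 1, 4, 1],
]
stat, df = chi_square_independence(counts)
print(round(stat, 2), df)  # 14.96 8
```

Several expected counts are small (for Communication, about 1.0 in each row), so the chi-square approximation is rough for those cells.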

Transformation and Grouping

To carry out the chi-square test of independence with "crosstable" in XploRe, we need a pair of numeric vectors. The nominal sector vector therefore has to be transformed into a numeric one, and the assets vector has to be grouped so that the test statistic approximately follows a chi-square distribution.

  • Transformation of Nominal Vector into Numeric One

Below is the original, nominal sector vector.

[ 1,] "Communication"
[ 2,] "Communication"
[ 3,] "Energy"
[ 4,] "Energy"
[ 5,] "Energy"
[ 6,] "Energy"
[ 7,] "Energy"
[ 8,] "Energy"
[ 9,] "Energy"
[10,] "Energy"
[11,] "Energy"
[12,] "Energy"
…     …
…     …

To transform it into a numeric vector, we use the following code:


y1=x.text[,2]=="Communication"
y2=x.text[,2]=="Energy"
y3=x.text[,2]=="Finance"
y4=x.text[,2]=="HiTech"
y5=x.text[,2]=="Manufacturing"
y6=x.text[,2]=="Medical"
y7=x.text[,2]=="Other"
y8=x.text[,2]=="Retail"
y9=x.text[,2]=="Transportation"
y=y1+2*y2+3*y3+4*y4+5*y5+6*y6+7*y7+8*y8+9*y9


This gives a new numeric sector vector y:

Contents of y

[ 1,]        1 
[ 2,]        1 
[ 3,]        2 
[ 4,]        2 
[ 5,]        2 
[ 6,]        2 
[ 7,]        2 
[ 8,]        2 
[ 9,]        2 
[10,]        2 
[11,]        2 
[12,]        2 
…            …
…            …

where 1 stands for Communication, 2 for Energy, and so on.
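For readers not using XploRe, the same encoding can be sketched in Python as a dictionary lookup. The function and constant names are hypothetical; the sector order follows the codes above, and an unknown label raises a KeyError instead of receiving a code.

```python
SECTORS = ["Communication", "Energy", "Finance", "HiTech", "Manufacturing",
           "Medical", "Other", "Retail", "Transportation"]


def encode_sectors(labels):
    """Map sector names to the numeric codes 1..9 (order as in SECTORS)."""
    code = {name: i for i, name in enumerate(SECTORS, start=1)}
    return [code[label] for label in labels]


print(encode_sectors(["Communication", "Energy", "Finance"]))  # [1, 2, 3]
```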

  • Grouping of Assets

To keep the expected cell frequencies large enough for the chi-square approximation to be reasonable, we divide assets into only 2 groups, split at their median.

x1=x.double[,1]<=median(x.double[,1])
x2=x.double[,1]>median(x.double[,1])
x=x1+2*x2

This gives a new grouped assets vector x:

Contents of x

[ 1,]        2 
[ 2,]        2 
[ 3,]        2 
[ 4,]        1 
[ 5,]        1 
[ 6,]        2 
[ 7,]        2 
[ 8,]        2 
[ 9,]        1 
[10,]        1 
[11,]        1 
[12,]        1
…            …
…            …

where 1 marks companies whose assets are less than or equal to the median of all assets and 2 marks companies whose assets are greater than the median.
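The median split can be sketched in Python as follows (the function name and values are hypothetical). With 79 distinct asset values the median is the 40th smallest value, so group 1 receives 40 companies and group 2 receives 39, which matches the row totals in the crosstable output.

```python
import statistics


def median_split(values):
    """Group 1 if a value is <= the median, group 2 if it is above the median."""
    med = statistics.median(values)
    return [1 if v <= med else 2 for v in values]


print(median_split([5.0, 1.0, 3.0, 2.0, 4.0]))  # [2, 1, 1, 1, 2]
```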

Test Result

The result of the independence test, obtained with "crosstable" for the vectors x and y in XploRe, is shown below:

Contents of cross

[ 1,] " "
[ 2,] "                   "
[ 3,] "Crosstable for variables Asset, Sector"
[ 4,] " "
[ 5,] "         |      1.0000  2.0000  3.0000  4.0000  5.0000  6.0000  7.0000  8.0000  9.0000 |"
[ 6,] "---------|-----------------------------------------------------------------------------|---------"
[ 7,] " 1.0000  |      0       8       4       3       5       3       6       6       5      |      40 "
[ 8,] " 2.0000  |      2       7      13       5       5       1       1       4       1      |      39 "
[ 9,] "---------|-----------------------------------------------------------------------------|---------"
[10,] "         |      2      15      17       8      10       4       7      10       6      |      79 "
[11,] "                   "
[12,] "Chi^2 test of independence"
[13,] " "
[14,] "  chi^2 statistic:                    14.96"
[15,] "  degrees of freedom:                     8"
[16,] "  significance level for rejection:  0.0599"
[17,] " "
[18,] "  contingency coefficient:             0.40"
[19,] "  corrected contingency coefficient:   0.56"
[20,] "                   "

The p-value is 0.0599. At the 10% significance level we reject $H_0$ and conclude that assets and sectors are not independent, i.e. there is a relationship between the two variables. At the 5% significance level we cannot reject $H_0$, so the data do not provide sufficient evidence against independence.
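The reported significance level and contingency coefficients can be reproduced from the chi-square statistic. For an even number of degrees of freedom the chi-square upper-tail probability has the closed form exp(-x/2) * sum over k = 0..df/2-1 of (x/2)^k / k!, which the minimal Python sketch below uses. The contingency coefficient is C = sqrt(chi^2 / (chi^2 + n)), and the corrected version divides C by its maximum sqrt((m - 1)/m), where m is the smaller table dimension. We assume these are the definitions crosstable uses; they reproduce the printed values.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable with even df."""
    if df % 2 != 0:
        raise ValueError("closed form only valid for even degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


chi2, df, n = 14.96, 8, 79
p_value = chi2_sf_even_df(chi2, df)
c = math.sqrt(chi2 / (chi2 + n))            # contingency coefficient
m = 2                                        # min(rows, columns) of the 2 x 9 table
c_corrected = c / math.sqrt((m - 1) / m)     # corrected contingency coefficient
print(round(p_value, 4), round(c, 2), round(c_corrected, 2))  # 0.0599 0.4 0.56
```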

The evidence for a relationship between assets and sectors is therefore only borderline. Some relationship is not surprising, since companies from different sectors may differ in performance and production processes. As the boxplots of assets (Fig. 5) already suggest, Finance has a larger median, while Retail and Transportation have relatively smaller ones.

Regression Analysis

After checking the relationship between assets and sectors, we turn to regression analysis.


Before the regression analysis, we transform the variables. Net profit equals sales volume minus total cost (Profits = Sales - Cost). Since both profits and sales are known, we create a new variable, cost (CO), computed as sales minus profits, and use it in place of profits and sales. This gives the linear regression model:

$$\mathrm{AS} = b_0 + b_1 \cdot \mathrm{MV} + b_2 \cdot \mathrm{CF} + b_3 \cdot \mathrm{EM} + b_4 \cdot \mathrm{CO}$$

Where AS stands for assets, MV for market values, CF for cash flows, EM for employees and CO for costs.
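The cost transformation and the least-squares fit of this model can be sketched in Python with NumPy. This is an illustration under stated assumptions, not the XploRe procedure we used: the function name is hypothetical, and since our data are not reproduced here, the usage example generates synthetic data with known coefficients to show that the fit recovers them.

```python
import numpy as np


def fit_assets_model(assets, market_value, cash_flow, employees, sales, profits):
    """OLS estimates of AS = b0 + b1*MV + b2*CF + b3*EM + b4*CO, with CO = sales - profits."""
    cost = np.asarray(sales, dtype=float) - np.asarray(profits, dtype=float)
    X = np.column_stack([
        np.ones(len(cost)),                    # intercept b0
        np.asarray(market_value, dtype=float),
        np.asarray(cash_flow, dtype=float),
        np.asarray(employees, dtype=float),
        cost,
    ])
    coef, *_ = np.linalg.lstsq(X, np.asarray(assets, dtype=float), rcond=None)
    return coef  # [b0, b1, b2, b3, b4]


# Synthetic (hypothetical) data with known coefficients 10, 0.5, -2, -3, 1.5
rng = np.random.default_rng(0)
n = 30
mv, cf, em, sales, profits = (rng.uniform(1, 100, n) for _ in range(5))
assets = 10 + 0.5 * mv - 2 * cf - 3 * em + 1.5 * (sales - profits)
print(np.round(fit_assets_model(assets, mv, cf, em, sales, profits), 6))
```

The sketch returns only the point estimates; standard errors, t-tests and the ANOVA table in the XploRe output require further computation.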

Regression I

We run the linear regression in XploRe without observation no. 40 and obtain the following result:

[ 1,] ""
[ 2,] "A  N  O  V  A                 SS      df             MSS     F-test   P-value"
[ 3,] "_________________________________________________________________________"
[ 4,] "Regression        4182348244.692       4  1045587061.173     32.817    0.0000"
[ 5,] "Residuals         2357693938.979      74    31860728.905"
[ 6,] "Total Variation   6540042183.671      78    83846694.662"
[ 7,] ""
[ 8,] "Multiple R      = 0.79969"
[ 9,] "R^2             = 0.63950"
[10,] "Adjusted R^2    = 0.62001"
[11,] "Standard Error  = 5644.53088"
[12,] ""
[13,] ""
[14,] "PARAMETERS       Beta           SE         StandB        t-test   P-value"
[15,] "_________________________________________________________________________"
[16,] "b[ 0,]=       2561.1287     818.6294       0.0000         3.129    0.0025"
[17,] "b[ 1,]=          0.5219       0.2546       0.6443         2.050    0.0439"
[18,] "b[ 2,]=         -2.8154       2.1943      -0.3707        -1.283    0.2035"
[19,] "b[ 3,]=        -96.9105      25.8494      -0.6827        -3.749    0.0003"
[20,] "b[ 4,]=          1.6241       0.2807       1.1316         5.786    0.0000"

The overall F-test is highly significant (p-value reported as 0.0000), and the adjusted R² of 0.620 is good. However, the p-value of b_2 (cash flow, CF) is 0.2035, well above the 0.05 significance level, so we remove this variable from the model.
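Each t statistic in the output is Beta divided by SE. The Python sketch below reproduces them from the printed columns and, as an approximation, converts the t value of b_2 into a two-sided p-value using the standard normal distribution. This gives roughly 0.20; the printed 0.2035 is slightly larger because it is based on the t distribution with the residual degrees of freedom. The helper name is hypothetical.

```python
import math

# Beta and SE columns of the Regression I output, for b0..b4
beta = [2561.1287, 0.5219, -2.8154, -96.9105, 1.6241]
se = [818.6294, 0.2546, 2.1943, 25.8494, 0.2807]

t_stats = [b / s for b, s in zip(beta, se)]


def normal_two_sided_p(t):
    """Two-sided p-value from the standard normal approximation to the t distribution."""
    return math.erfc(abs(t) / math.sqrt(2))


print([round(t, 3) for t in t_stats])
print(round(normal_two_sided_p(t_stats[2]), 3))  # approximate p-value for b2 (cash flow)
```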

Regression II

The new model is:

$$\mathrm{AS} = b_0 + b_1 \cdot \mathrm{MV} + b_2 \cdot \mathrm{EM} + b_3 \cdot \mathrm{CO}$$

Running the linear regression again gives the following result:

[ 1,] ""
[ 2,] "A  N  O  V  A                 SS      df             MSS     F-test   P-value"
[ 3,] "___________________________________________________________________________"
[ 4,] "Regression        4129897384.073       3  1376632461.358     42.839    0.0000"
[ 5,] "Residuals         2410144799.598      75    32135263.995"
[ 6,] "Total Variation   6540042183.671      78    83846694.662"
[ 7,] ""
[ 8,] "Multiple R      = 0.79466"
[ 9,] "R^2             = 0.63148"
[10,] "Adjusted R^2    = 0.61674"
[11,] "Standard Error  = 5668.79740"
[12,] ""
[13,] ""
[14,] "PARAMETERS       Beta         SE           StandB        t-test   P-value"
[15,] "________________________________________________________________________"
[16,] "b[ 0,]=       2363.5850     807.4776       0.0000         2.927   0.0045"
[17,] "b[ 1,]=          0.2260       0.1082       0.2789         2.088   0.0402"
[18,] "b[ 2,]=        -95.3962      25.9335      -0.6720        -3.678   0.0004"
[19,] "b[ 3,]=          1.6190       0.2819       1.1280         5.744   0.0000"

The F-test p-value is reported as 0.0000 (i.e. below 0.00005), R² is 0.631 and the adjusted R² is 0.617. The model therefore explains about 63% of the variation in assets (about 61.7% after adjusting for the number of regressors), which is a reasonably good fit. All parameters are significant at the 5% level.
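The fit measures follow from the sums of squares in the ANOVA table: R² = SS_reg / SS_total, adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), and F = (SS_reg / p) / (SS_res / (n - p - 1)). In the minimal Python sketch below, n is taken as the total degrees of freedom plus one (78 + 1 = 79) and p is the number of regressors; the function name is hypothetical.

```python
import math


def anova_summary(ss_reg, ss_total, n, p):
    """R^2, adjusted R^2, F statistic and residual standard error from sums of squares.

    n = number of observations, p = number of regressors (excluding the intercept).
    """
    ss_res = ss_total - ss_reg
    df_reg, df_res = p, n - p - 1
    r2 = ss_reg / ss_total
    adj_r2 = 1 - (1 - r2) * (n - 1) / df_res
    f_stat = (ss_reg / df_reg) / (ss_res / df_res)
    std_error = math.sqrt(ss_res / df_res)
    return r2, adj_r2, f_stat, std_error


# Sums of squares from the Regression II ANOVA table
r2, adj_r2, f_stat, std_error = anova_summary(4129897384.073, 6540042183.671, n=79, p=3)
print(round(r2, 5), round(adj_r2, 5), round(f_stat, 3), round(std_error, 4))
```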

The final model is:

$$\mathrm{AS} = 2363.59 + 0.23 \cdot \mathrm{MV} - 95.40 \cdot \mathrm{EM} + 1.62 \cdot \mathrm{CO}$$

Where AS stands for assets, MV for market values, EM for employees and CO for costs.

Variable of employees has negative impact on assets. That is, holding market value and costs fixed, the more employees a company has, the lower its assets. Market value and costs, in contrast, have positive coefficients: the higher the market value or the costs, the higher the assets. Companies that have high costs must have high assets to afford these high costs. Otherwise, high costs might make them bankrupt.
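To illustrate how the coefficients are read, the Python sketch below evaluates the final model (with the unrounded Regression II estimates) for a hypothetical company and then increases the number of employees by one unit while holding market value and costs fixed; the predicted assets change by exactly the EM coefficient. The function name and input values are hypothetical and are expressed in the units of the data set.

```python
def predict_assets(mv, em, co):
    """Fitted assets from the final model, using the Regression II estimates."""
    return 2363.5850 + 0.2260 * mv - 95.3962 * em + 1.6190 * co


# Hypothetical company; one more unit of EM with MV and CO held fixed
base = predict_assets(mv=1000.0, em=10.0, co=5000.0)
more_staff = predict_assets(mv=1000.0, em=11.0, co=5000.0)
print(round(more_staff - base, 4))  # -95.3962
```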


Conclusion

Through descriptive statistics, the test of independence and regression analysis, we draw the following conclusions:

1. At the 10% significance level, we reject independence: assets and sectors are related. At the 5% significance level, independence cannot be rejected. Overall, the evidence for a relationship between assets and sectors is only borderline (p-value 0.0599).

2. Using linear regression analysis, we obtain the final regression function for assets. Market value and costs have a positive impact on assets, while the number of employees has a negative impact.

The final model is:

$$\mathrm{AS} = 2363.59 + 0.23 \cdot \mathrm{MV} - 95.40 \cdot \mathrm{EM} + 1.62 \cdot \mathrm{CO}$$

Where AS stands for assets, MV for market values, EM for employees and CO for costs.




Comments

  • Andrews curves and faces rely heavily on the order of the variables.
  • Fig. 3 says "Face Plot for all observations"; why do I not see 500 faces?
  • Why did you make only two groups from your assets variable?
  • Why did you choose alpha = 10%?
  • In the conclusion you sometimes state facts like "Variable of employees has negative impact on assets.", and sometimes you try to interpret, e.g. "Companies that have high costs must have high assets to afford these high costs.".