Correlation Coefficient for Bi-Variate Data

Subject: Business Statistics

Overview

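Bivariate correlation measures the strength and direction of the linear relationship between two variables observed together. When the paired observations are grouped into a two-way frequency table, Karl Pearson's coefficient can still be computed from the class midpoints and the cell frequencies.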

Karl Pearson’s Correlation in Bivariate Frequency Table

When the number of observations in a bivariate distribution is large, the data are usually classified in a correlation table, also called a bivariate frequency table, which shows the joint frequency distribution of two related variables. Such a table is the starting point for examining the correlation between two grouped series. The class intervals of one variable are written in the captions (column headings) and those of the other in the stubs (row headings on the left of the table), and each cell holds the number of observations that fall in that pair of classes. The correlation coefficient of the bivariate distribution is then calculated using the formula below.

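  • Using deviations from the assumed means: \( r = \frac{N \sum fUV - (\sum fU)(\sum fV)}{\sqrt{N \sum fU^2 - (\sum fU)^2}\,\sqrt{N \sum fV^2 - (\sum fV)^2}} \)

Where,

N = total frequency

V = Y - B, where Y is the class midpoint of variable Y

B = assumed mean of variable Y

U = X - A, where X is the class midpoint of variable X

A = assumed mean of variable X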

  • Using step deviations (when the class intervals of each variable are of equal width): \( r = \frac{N \sum fU'V' - (\sum fU')(\sum fV')}{\sqrt{N \sum fU'^{2} - (\sum fU')^{2}}\,\sqrt{N \sum fV'^{2} - (\sum fV')^{2}}} \)

Where,

N = total frequency

V' = \( \frac{Y - B}{K} \)

B = assumed mean of variable Y

K = class size of variable Y

U' = \( \frac{X - A}{h} \)

A = assumed mean of variable X

h = class size of variable X
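The following short derivation is added as a sketch of why either formula gives the same r whichever assumed means and class sizes are chosen. It assumes h > 0 and K > 0, and uses Cov and σ (covariance and standard deviation of the grouped data), which are symbols not used elsewhere in these notes.

```latex
U' = \frac{X - A}{h}, \qquad V' = \frac{Y - B}{K}
\;\Longrightarrow\;
\operatorname{Cov}(U', V') = \frac{\operatorname{Cov}(X, Y)}{hK}, \qquad
\sigma_{U'} = \frac{\sigma_X}{h}, \qquad
\sigma_{V'} = \frac{\sigma_Y}{K}

r_{U'V'} = \frac{\operatorname{Cov}(U', V')}{\sigma_{U'}\,\sigma_{V'}}
         = \frac{\operatorname{Cov}(X, Y)/(hK)}{(\sigma_X/h)(\sigma_Y/K)}
         = \frac{\operatorname{Cov}(X, Y)}{\sigma_X\,\sigma_Y}
         = r_{XY}
```

Subtracting A and B shifts the origin and dividing by h and K changes the scale. Neither affects r, so A and B can be picked purely to keep the arithmetic small.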

Steps:

  • List the class intervals of one variable as the column headings and those of the other variable as the row headings.
  • Find the midpoint of each class interval of X and Y, then take the deviations U and V (or the step deviations U' and V') of these midpoints from the assumed means.
  • Add the frequencies of all the cells for each X class to get its total frequency f. Do the same for each Y class.
  • To calculate fU, multiply the total frequency of each X class by the corresponding value of U, then add the resulting products to get ∑fU. Calculate fV and ∑fV in the same way from the Y classes.
  • Multiply fU by U to get fU², and fV by V to get fV². Add them to get ∑fU² and ∑fV².
  • Multiply each cell's f, U, and V values, and write the product fUV in a corner of the cell.
  • The last column (or row), fUV, is obtained by adding the corner values along each row (or column). Its total is ∑fUV.
  • To compute r, substitute all the sums into the formula.
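The steps above can be sketched in Python as follows. This is an illustrative sketch rather than a standard routine: the function name `grouped_pearson_r` and its parameter names are assumptions made for this example, and the frequencies in the demonstration are those of the worked example in these notes, with advertising expenditure expressed in Rs. '000.

```python
from math import sqrt

def grouped_pearson_r(freq, x_mid, y_mid, a, h, b, k):
    """Pearson's r from a bivariate frequency table, using step deviations.

    freq[i][j] is the frequency of the cell whose X class has midpoint
    x_mid[i] and whose Y class has midpoint y_mid[j].
    a, b are the assumed means and h, k the class sizes of X and Y.
    """
    # Step deviations of the class midpoints from the assumed means.
    u = [(x - a) / h for x in x_mid]
    v = [(y - b) / k for y in y_mid]

    # Total frequency of each X class (row sums) and each Y class (column sums).
    fx = [sum(row) for row in freq]
    fy = [sum(col) for col in zip(*freq)]
    n = sum(fx)

    # Sums of fU', fU'^2, fV', fV'^2 from the class totals.
    sum_fu = sum(f * ui for f, ui in zip(fx, u))
    sum_fu2 = sum(f * ui ** 2 for f, ui in zip(fx, u))
    sum_fv = sum(f * vj for f, vj in zip(fy, v))
    sum_fv2 = sum(f * vj ** 2 for f, vj in zip(fy, v))

    # Cell-by-cell products f * U' * V', summed over the whole table.
    sum_fuv = sum(freq[i][j] * u[i] * v[j]
                  for i in range(len(u)) for j in range(len(v)))

    numerator = n * sum_fuv - sum_fu * sum_fv
    denominator = (sqrt(n * sum_fu2 - sum_fu ** 2)
                   * sqrt(n * sum_fv2 - sum_fv ** 2))
    return numerator / denominator

# Rows: sales revenue classes (X); columns: advertising expenditure classes (Y).
freq = [
    [4, 1, 0, 0],
    [7, 6, 2, 1],
    [1, 3, 4, 2],
    [1, 1, 3, 4],
]
x_mid = [100, 150, 200, 250]     # midpoints of the sales revenue classes
y_mid = [7.5, 12.5, 17.5, 22.5]  # midpoints of the expenditure classes, Rs. '000

r = grouped_pearson_r(freq, x_mid, y_mid, a=200, h=50, b=12.5, k=5)
print(round(r, 3))  # 0.596
```

Empty cells are entered as 0. Changing `a`, `b`, `h`, or `k` changes the intermediate sums but not the returned value of r.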

Example:

Using the following bivariate frequency distribution, calculate the coefficient of correlation and its probable error.

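Rows: sales revenue (Rs. in lakh). Columns: advertising expenditure (Rs.).

| Sales revenue (Rs. in lakh) | 5000-10000 | 10000-15000 | 15000-20000 | 20000-25000 |
|---|---|---|---|---|
| 75-125 | 4 | 1 | - | - |
| 125-175 | 7 | 6 | 2 | 1 |
| 175-225 | 1 | 3 | 4 | 2 |
| 225-275 | 1 | 1 | 3 | 4 |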

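Solution:

Let X be the midpoint of a sales revenue class (Rs. in lakh) and Y the midpoint of an advertising expenditure class (Rs. '000). Taking the assumed means A = 200 and B = 12.5:

U' = \( \frac{X - 200}{50} \)

V' = \( \frac{Y - 12.5}{5} \)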

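Each cell shows the frequency f, with the product fU'V' in parentheses.

| Sales revenue | Mid-value X | U' | Adv. exp. (Rs. '000) 5-10 | 10-15 | 15-20 | 20-25 | f | fU' | fU'² | fU'V' |
|---|---|---|---|---|---|---|---|---|---|---|
| Mid-value Y | | | 7.5 | 12.5 | 17.5 | 22.5 | | | | |
| V' | | | -1 | 0 | 1 | 2 | | | | |
| 75-125 | 100 | -2 | 4 (8) | 1 (0) | - | - | 5 | -10 | 20 | 8 |
| 125-175 | 150 | -1 | 7 (7) | 6 (0) | 2 (-2) | 1 (-2) | 16 | -16 | 16 | 3 |
| 175-225 | 200 | 0 | 1 (0) | 3 (0) | 4 (0) | 2 (0) | 10 | 0 | 0 | 0 |
| 225-275 | 250 | 1 | 1 (-1) | 1 (0) | 3 (3) | 4 (8) | 9 | 9 | 9 | 10 |
| f | | | 13 | 11 | 9 | 7 | N = 40 | ∑fU' = -17 | ∑fU'² = 45 | ∑fU'V' = 21 |
| fV' | | | -13 | 0 | 9 | 14 | ∑fV' = 10 | | | |
| fV'² | | | 13 | 0 | 9 | 28 | ∑fV'² = 50 | | | |
| fU'V' | | | 14 | 0 | 1 | 6 | ∑fU'V' = 21 | | | |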

Karl Pearson’s coefficient of correlation is given by:

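\( r = \frac{N \sum fU'V' - (\sum fU')(\sum fV')}{\sqrt{N \sum fU'^{2} - (\sum fU')^{2}}\,\sqrt{N \sum fV'^{2} - (\sum fV')^{2}}} \)

\( = \frac{40(21) - (-17)(10)}{\sqrt{40(45) - (-17)^{2}}\,\sqrt{40(50) - (10)^{2}}} \)

\( = \frac{840 + 170}{\sqrt{1511}\,\sqrt{1900}} = \frac{1010}{1694.37} \)

Therefore, r = 0.596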

Probable error = \( 0.6745 \times \frac{1 - r^{2}}{\sqrt{N}} \)

\( = 0.6745 \times \frac{1 - (0.596)^{2}}{\sqrt{40}} \)

\( = 0.6745 \times \frac{0.6448}{6.3246} \)

= 0.069
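The probable error calculation can be checked with a few lines of Python. This is a minimal sketch: the helper name `probable_error` is an assumption made for this example. The comparison with six times the probable error is a commonly taught rule of thumb for judging whether r is significant, not something stated in the example itself.

```python
from math import sqrt

def probable_error(r, n):
    """Probable error of r for n observations: 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)

r = 0.596
n = 40
pe = probable_error(r, n)

print(round(pe, 4))  # 0.0688
print(r > 6 * pe)    # True
```

The probable error evaluates to roughly 0.069. Because r = 0.596 is more than six times that value, the correlation would usually be regarded as significant under the rule of thumb.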

