Sunday, 31 March 2019

Correlation and regression

CORRELATION
Correlation is a statistical technique that shows the degree and direction of the relationship between two or more variables. It is important for the following reasons:
· It measures the extent of the relationship between two or more variables, which is a central question in statistics.
· It allows predictions to be made (for example, if rainfall is adequate, the food position is likely to be good).
· Where the value of one variable is known, the value of the other variable can be estimated.
Types of correlation :
The important types of correlation are positive (direct) and negative (indirect), OR linear and non-linear correlation. Other classifications are simple and multiple correlation, OR partial and total correlation, OR logical and illogical correlation.
Positive correlation : Where two variables move in the same direction, i.e. if there is a decline in one variable and the second variable also shows a decline, the correlation is direct or positive. For example, as the price of a commodity increases, the supply of the commodity is likely to increase. Hence, between price and supply, the correlation is positive.

Negative correlation : Where two variables move in opposite directions, i.e. if there is a decline in one variable and the second variable shows an increase, the correlation is indirect or negative. For example, as the price of a commodity increases, the demand for the commodity is likely to decrease.
Linear correlation : If a change in one variable brings a change in the other variable at the same constant rate over the entire range of values, the correlation is linear.
Two variables are linearly related if there is a relation of the form Y = a + b X between them.
Linear correlation is positive when the line moves upward from left to right and negative when the line moves downward from left to right.
Non-linear (or curvi-linear) correlation : If a change in one variable brings a change in the other variable at a different rate over the entire range of values, the correlation is non-linear. Two variables are non-linearly related if there is a relation of the form y = ax^2 + bx + c or y = a·b^x between them.
Non-linear correlation may be positive when the curve moves upward from left to right and negative when the curve moves downward from left to right.
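To make the "same rate" versus "different rate" distinction concrete, here is a minimal Python sketch. The data values and the helper name `unit_changes` are illustrative assumptions and do not come from these notes.

```python
def unit_changes(ys):
    """Change in y for each step in x (x is assumed to rise by 1 each time)."""
    return [later - earlier for earlier, later in zip(ys, ys[1:])]

xs = [0, 1, 2, 3, 4, 5]
linear = [2 + 3 * x for x in xs]   # Y = a + bX with made-up values a = 2, b = 3
curved = [x * x + 1 for x in xs]   # a made-up curve, y = x^2 + 1

linear_steps = unit_changes(linear)   # the same change at every step
curved_steps = unit_changes(curved)   # a different change at every step

print(linear_steps)
print(curved_steps)
```

For the straight line every unit step in x changes y by the same amount, while for the curve the change grows from step to step.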
Degree and interpretation of correlation coefficient:
The coefficient of correlation lies between two limits, i.e. +1 and -1. For perfect positive correlation, the value would be +1 and for perfect negative correlation, the value would be -1. When the value is 0, there is no relationship.
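The three reference points +1, -1 and 0 can be checked on small made-up series. This is only a sketch: the helper `pearson_r` and all the numbers are assumptions chosen for illustration, and a value of 0 here means "no linear relationship".

```python
from math import sqrt

def pearson_r(xs, ys):
    """Coefficient of correlation computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_xx = sum(a * a for a in dx)
    sum_yy = sum(b * b for b in dy)
    return sum_xy / sqrt(sum_xx * sum_yy)

xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]     # exact straight line going up
falling = [10 - 3 * x for x in xs]   # exact straight line going down
no_trend = [2, 1, 3, 1, 2]           # symmetric pattern with no upward or downward trend

r_rising = pearson_r(xs, rising)
r_falling = pearson_r(xs, falling)
r_no_trend = pearson_r(xs, no_trend)

print(r_rising, r_falling, abs(r_no_trend) < 1e-9)
```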
Methods for studying correlation:
There are broadly two graphical methods, i.e. the scatter diagram method and the graphic method, in addition to the algebraic (mathematical) method.
Scatter diagram method: Under this method (which is also called dot diagram, dotogram or scattergram), the data is plotted on a graph in the form of dots. The term scatter means the dispersion or spread of dots on the graph. Based on the spread of these dots, the correlation may be interpreted as under:
· Where the points are close to each other, this shows high correlation, but where they are not close, it means poor correlation.
· If the points show an upward or downward trend, there is correlation. But if there is no trend, the variables are not related, i.e. uncorrelated.
Graphic method: Under this method (which is also called correlogram), the data is plotted on graph paper and two curves are drawn, one for each variable. By observing the direction and closeness of these curves, it can be concluded whether the variables are related or unrelated. If they move in the same direction, it shows positive correlation. But if they move in opposite directions, there is negative correlation.
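As a rough illustration of the scatter diagram method described above, the sketch below draws the dots as text so that it needs no plotting library. The function name `ascii_scatter`, the grid size and the price and supply figures are all assumptions made for this example.

```python
def ascii_scatter(xs, ys, width=20, height=10):
    """Return a list of text rows with one '*' per plotted point."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"   # row 0 is the top of the plot
    return ["".join(cells) for cells in grid]

price = [1, 2, 3, 4, 5, 6, 7, 8]    # made-up values
supply = [2, 3, 3, 5, 6, 6, 8, 9]   # made-up values that rise with price

rows = ascii_scatter(price, supply)
for line in rows:
    print("|" + line)
```

The dots run from the bottom left to the top right and stay close together, which is the pattern the notes read as high positive correlation.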
Algebraic or mathematical method : Under these methods the value of the coefficient of correlation remains between plus and minus 1. The main one is Karl Pearson's covariance method (or coefficient of correlation). This is based on the arithmetic mean and standard deviation. The sum of the products of the corresponding deviations of the two series, divided by the number of observations, gives the covariance, which is then divided by the product of the standard deviations of the two series to obtain the coefficient.
The calculation of covariance : It can be done both for individual data and for grouped data, using the direct method as well as the short-cut method.
Direct method (when deviations are taken from actual means): r = Σxy / (N σx σy)
where Σxy / N = covariance of x and y
where x = (X - X̄) means deviation in X series from its actual mean
where y = (Y - Ȳ) means deviation in Y series from its actual mean
where σx = standard deviation of X series
where σy = standard deviation of Y series
where N = number of observations.
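The direct method can be followed step by step in a few lines of Python. This is a minimal sketch: the two series are made-up numbers, and the standard deviations are computed with divisor N to match the formula r = Σxy / (N σx σy).

```python
from math import sqrt

X = [1, 2, 3, 4, 5]   # made-up X series
Y = [2, 4, 5, 4, 5]   # made-up Y series
N = len(X)

mean_x = sum(X) / N
mean_y = sum(Y) / N

# Deviations from the actual means: x = X - mean of X, y = Y - mean of Y
x = [value - mean_x for value in X]
y = [value - mean_y for value in Y]

sum_xy = sum(a * b for a, b in zip(x, y))
sigma_x = sqrt(sum(a * a for a in x) / N)
sigma_y = sqrt(sum(b * b for b in y) / N)

covariance = sum_xy / N
r = sum_xy / (N * sigma_x * sigma_y)

print(covariance, round(r, 4))   # prints 1.2 0.7746
```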
Measures to describe the degree of correlation
Coefficient of determination :
It is the primary measure of the extent or strength of association. It is calculated as: r^2 = 1 - (v1 / v2) (with v1 read as the unexplained variation and v2 as the total variation in the dependent variable).
It may be remembered that the coefficient of determination measures the strength of a linear relationship between two variables. If we have a lot of x, y points randomly scattered on the circumference of a circle, there is clearly a relationship, but it is not linear. Hence, the coefficient of determination would be zero or close to zero.
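The circle example can be checked numerically. In the sketch below the points are spaced evenly around a unit circle rather than scattered at random, purely so that the result is reproducible; the helper `pearson_r` and the choice of 12 points are assumptions for illustration.

```python
from math import cos, sin, pi, sqrt

def pearson_r(xs, ys):
    """Coefficient of correlation computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    return sum_xy / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# 12 points on the circumference of a unit circle (evenly spaced, by assumption)
angles = [2 * pi * k / 12 for k in range(12)]
xs = [cos(t) for t in angles]
ys = [sin(t) for t in angles]

r_squared = pearson_r(xs, ys) ** 2
print(r_squared < 1e-9)   # the coefficient of determination is effectively zero
```

Every point satisfies x^2 + y^2 = 1 exactly, so x and y are strongly related, yet r^2 detects none of it because the relationship is not linear.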
Covariance : By computing the deviations of each point from the means of x and y, we can obtain a measure of the direction and strength of the relationship. This is done by multiplying the paired deviations together, adding the cross products of the deviations over all the points, and dividing by the number of points.
Covariance (X, Y) = Σ x'y' / n, where x' and y' are the deviations of X and Y from their means.
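A short sketch of this calculation, using the price, supply and demand examples from earlier in the notes. The figures themselves are invented for illustration, and `covariance` is just a helper name chosen here.

```python
def covariance(xs, ys):
    """Sum of cross products of deviations from the means, divided by n."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross_products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]
    return sum(cross_products) / n

price = [10, 12, 14, 16, 18]    # made-up values
supply = [20, 24, 25, 30, 31]   # rises with price
demand = [40, 38, 35, 30, 27]   # falls as price rises

cov_supply = covariance(price, supply)   # positive: same direction
cov_demand = covariance(price, demand)   # negative: opposite direction

print(cov_supply, cov_demand)
```

The sign of the covariance gives the direction of the relationship; its size depends on the units of measurement, which is why it is usually converted to the coefficient of correlation.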

Coefficient of correlation : A dimensionless value showing the extent and direction of the relationship is the coefficient of correlation. It describes how far one variable is explained by the other. It can be calculated as:
Coefficient of correlation = r = covariance (x, y) / (σx σy)
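One way to see what "dimensionless" means is to change the unit of one variable and recompute. In this sketch the prices are re-expressed in a unit 100 times smaller; the data and the helper names are assumptions for illustration.

```python
from math import sqrt

def covariance(xs, ys):
    """Sum of cross products of deviations from the means, divided by n."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def std_dev(xs):
    """Standard deviation with divisor n (the covariance of a series with itself)."""
    return sqrt(covariance(xs, xs))

def correlation(xs, ys):
    """r = covariance(x, y) / (sigma_x * sigma_y)."""
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))

price = [10, 12, 14, 16, 18]               # made-up values
supply = [20, 24, 25, 30, 31]              # made-up values
price_scaled = [100 * p for p in price]    # same prices in a unit 100 times smaller

cov_before = covariance(price, supply)
cov_after = covariance(price_scaled, supply)
r_before = correlation(price, supply)
r_after = correlation(price_scaled, supply)

print(cov_before, cov_after)
print(round(r_before, 4), round(r_after, 4))
```

The covariance grows by a factor of 100 with the change of unit, while r stays the same.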
REGRESSION ANALYSIS
The statistical technique of estimating or predicting the unknown value of a dependent variable from the known value of an independent variable is called regression analysis. If it is known that two variables, say price (X) and demand (Y), are closely related, the most probable value of Y can be found for a given value of X.
Regression analysis can be classified on the basis of: change in proportion and number of variables.
Regression on the basis of change in proportion: On the basis of change in proportion, regression can be classified as linear regression and non-linear regression.
Linear regression : When the dependent variable moves in a fixed proportion to the unit movement of the independent variable, it is called linear regression. When it is plotted on graph paper, it forms a straight line.
Mathematically it can be expressed as:
yi = a + b xi + ei (where a and b are known as regression parameters, ei denotes the residual term, xi represents the value of the independent variable and yi is the value of the dependent variable; a is the value of the dependent variable y when the value of the independent variable x is zero).
Again, b denotes the slope of the regression line of y on x, and ei denotes the combined effect of all other variables on y.
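The notes do not say how a and b are obtained. A common choice is the method of least squares, which is assumed in the sketch below; the data and the helper name `fit_line` are also illustrative.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for y = a + b*x (least squares is an assumption here)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    b = sum_xy / sum_xx        # slope of the regression line of y on x
    a = mean_y - b * mean_x    # value of y when x is zero
    return a, b

xs = [1, 2, 3, 4, 5]   # made-up independent variable
ys = [2, 4, 5, 4, 5]   # made-up dependent variable

a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]   # the e_i terms

print(round(a, 4), round(b, 4))
```

With an intercept in the model, the least-squares residuals sum to zero, which is a quick check that the fit was computed correctly.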
Non-linear regression : In such regression the value of the dependent variable, say y, does not change by a constant absolute amount for a unit change in the value of the independent variable, say x. If the data is plotted on a graph, it forms a curve instead of a straight line. Hence it is also called curvi-linear regression.
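One simple way to handle a curve, sketched here as an assumption rather than as the method of these notes, is to transform it into a straight line. For an exponential curve y = a * b^x, taking logarithms gives ln y = ln a + x ln b, which can be fitted like a linear regression. The data and helper names are made up for the example.

```python
from math import exp, log

def fit_line(xs, ys):
    """Least-squares (intercept, slope) for a straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum_xy / sum_xx
    return mean_y - slope * mean_x, slope

def fit_exponential(xs, ys):
    """Fit y = a * b**x by fitting a straight line to (x, ln y); needs every y > 0."""
    log_ys = [log(y) for y in ys]
    intercept, slope = fit_line(xs, log_ys)
    return exp(intercept), exp(slope)

xs = [0, 1, 2, 3, 4]
ys = [3, 6, 12, 24, 48]   # made-up data that doubles at every step

a, b = fit_exponential(xs, ys)
print(round(a, 6), round(b, 6))
```

Here y rises by 3, 6, 12 and 24 for successive unit changes in x, so the absolute change is not constant, but the fitted curve recovers a constant growth factor b.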
Regression on the basis of number of variables: Regression analysis can be simple, partial or multiple regression.
Simple regression : When only two variables are studied, it is known as simple regression. One of these variables is independent and the other dependent. The functional relationship between the price and demand of a product is an example of such regression.
Partial regression : When more than two variables are studied in a functional relationship but the relationship of only two variables is analysed at a time, keeping the others constant, it is partial regression.
Multiple regression : When more than two variables are studied and their functional relationships are worked out simultaneously, it is a case of multiple regression. A study of growth in bank deposits in relation to the occupation and wealth of people is an example of such regression.
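For the multiple regression case, a minimal sketch with two independent variables is given below. It assumes a least-squares fit of y = a + b1*x1 + b2*x2, solved from the two normal equations written in deviation form; the function name and the data are hypothetical and are not taken from any bank deposit study.

```python
def fit_two_predictors(x1, x2, y):
    """Least-squares (a, b1, b2) for y = a + b1*x1 + b2*x2."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(p * p for p in d1)
    s22 = sum(q * q for q in d2)
    s12 = sum(p * q for p, q in zip(d1, d2))
    s1y = sum(p * t for p, t in zip(d1, dy))
    s2y = sum(q * t for q, t in zip(d2, dy))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("x1 and x2 are perfectly collinear")
    # Cramer's rule on: s11*b1 + s12*b2 = s1y  and  s12*b1 + s22*b2 = s2y
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2

x1 = [1, 2, 3, 4, 5]                                      # made-up first predictor
x2 = [2, 1, 4, 3, 6]                                      # made-up second predictor
y = [1 + 2 * p + 3 * q for p, q in zip(x1, x2)]           # built exactly from a=1, b1=2, b2=3

a, b1, b2 = fit_two_predictors(x1, x2, y)
print(round(a, 6), round(b1, 6), round(b2, 6))
```

Because y was constructed without any residual term, the fit recovers the coefficients used to build it, which makes the example easy to verify by hand.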
Regression lines : It is a graphic technique to show the functional relationship between two variables, say X and Y, i.e. independent and dependent. It is the line which shows the average relationship between the two variables X and Y.
Regression equation : These are the algebraic expressions of the regression lines.
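The notes do not write the regression equations out. In the usual textbook form, stated here as a reference sketch with the symbols byx, bxy, r, σx and σy used elsewhere in these notes, the two equations are:

```latex
\begin{aligned}
Y - \bar{Y} &= b_{yx}\,(X - \bar{X}), \qquad b_{yx} = r\,\frac{\sigma_y}{\sigma_x} \\
X - \bar{X} &= b_{xy}\,(Y - \bar{Y}), \qquad b_{xy} = r\,\frac{\sigma_x}{\sigma_y}
\end{aligned}
```

The first is the regression of Y on X (used to estimate Y from X) and the second is the regression of X on Y. Multiplying the two coefficients gives byx · bxy = r^2.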
Properties of regression coefficient:
1 Both the regression coefficients bxy and byx cannot be greater than unity at the same time. In other words, the square root of the product of the two regression coefficients, which equals r in magnitude, must lie between -1 and +1.
2 Both the regression coefficients will have the same sign, i.e. if bxy is positive, then byx will be positive.
3 If r is zero, then bxy and byx shall be zero.
4 Regression coefficients are independent of change of origin, but not of scale.
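These properties can be checked numerically. The sketch below uses made-up data and a helper `regression_coefficients` (a name chosen here) that computes byx, bxy and r from deviations about the means.

```python
from math import sqrt

def regression_coefficients(xs, ys):
    """Return (byx, bxy, r) computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    byx = sum_xy / sum_xx   # regression coefficient of y on x
    bxy = sum_xy / sum_yy   # regression coefficient of x on y
    r = sum_xy / sqrt(sum_xx * sum_yy)
    return byx, bxy, r

X = [1, 2, 3, 4, 5]   # made-up data
Y = [2, 4, 5, 4, 5]

byx, bxy, r = regression_coefficients(X, Y)

# Properties 1 and 2: the product equals r squared, and the signs agree
product_equals_r_squared = abs(byx * bxy - r * r) < 1e-12
same_sign = (byx > 0) == (bxy > 0)

# Property 4: a change of origin leaves the coefficients alone, a change of scale does not
shifted = regression_coefficients([x + 100 for x in X], [y - 50 for y in Y])
scaled = regression_coefficients([10 * x for x in X], Y)

print(byx, bxy, product_equals_r_squared, same_sign)
print(shifted[:2], scaled[:2])
```

Multiplying X by 10 divides byx by 10 and multiplies bxy by 10, so their product, and hence r, is unchanged.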
Correlation and regression: Relationship
These are two important tools to study the functional relationship between variables. The coefficient of correlation is a measure of the degree of covariation between x and y, while the aim of regression analysis is to study the nature of the relationship between the variables. This helps to know the value of one variable on the basis of another.
Correlation and regression: Difference
Correlation analysis tests how closely the variables are related, while regression analysis measures the extent of change in one variable for a given change in the other.
Correlation analysis does not study the cause and effect relationship between two variables, but in regression analysis the causal relationship is studied, with one variable treated as dependent and the other as independent.
In correlation analysis there may be spurious correlation between variables, but in regression there is no such type of relation.
Correlation analysis is a relative measure of linear relationship and regression analysis is an absolute measure.
Utility of regression analysis: It helps in predicting unknown values. It helps in establishing the nature of the relationship between two variables. It provides regression coefficients, which are generally used in the calculation of the coefficient of correlation. It is helpful in estimating the error involved in using the regression line as the basis for estimation.
Limitation of regression analysis: It is based on a linear relationship, which on certain occasions may not be available. It is calculated on the basis of a static condition of relationship. The relationship can be ascertained within limits only.
Standard error of estimate: It is the square root of the mean of the squared differences between the actual (observed) values and the estimated (computed) values of the dependent variable.
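A minimal sketch of the standard error of estimate, taken here as the square root of the mean squared difference between observed and estimated values of the dependent variable. It assumes a least-squares line and a divisor of N (some textbooks divide by N - 2 instead); the data and the function name are illustrative.

```python
from math import sqrt

def standard_error_of_estimate(xs, ys):
    """sqrt( sum((y - y_estimated)^2) / N ) for the least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    squared_errors = [(y - (a + b * x)) ** 2 for x, y in zip(xs, ys)]
    return sqrt(sum(squared_errors) / n)   # divisor N is an assumption

xs = [1, 2, 3, 4, 5]   # made-up independent variable
ys = [2, 4, 5, 4, 5]   # made-up dependent variable

se = standard_error_of_estimate(xs, ys)
print(round(se, 4))
```

The smaller this value, the closer the observed points lie to the regression line and the more reliable the estimates made from it.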
