For example, suppose you have four data pairs for x and y. Your table may look like this:
x || y
1 || 1
2 || 3
4 || 5
5 || 7
Using the example above, note that you have four values for x. To calculate the mean, add all the values given for x, then divide by 4. Your calculation would look like this:
$$\mu_x = (1+2+4+5)/4$$
$$\mu_x = 12/4$$
$$\mu_x = 3$$
In the example above, you also have four values for y. Add all these values, then divide by 4. Your calculations would look like this:
$$\mu_y = (1+3+5+7)/4$$
$$\mu_y = 16/4$$
$$\mu_y = 4$$
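The two mean calculations above can be checked with a few lines of Python. This is only a minimal sketch: the names `x_values`, `y_values`, and `mean` are illustrative and do not come from the original article.

```python
# Sample data from the table above (x and y are paired by position).
x_values = [1, 2, 4, 5]
y_values = [1, 3, 5, 7]


def mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    return sum(values) / len(values)


mean_x = mean(x_values)
mean_y = mean(y_values)

print(mean_x)  # 3.0
print(mean_y)  # 4.0
```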
$$\sigma_x = \sqrt{\frac{1}{n-1}\sum (x-\mu_x)^2}$$
With the sample data, your calculations should look like this:
$$\sigma_x = \sqrt{\frac{1}{4-1}\left((1-3)^2+(2-3)^2+(4-3)^2+(5-3)^2\right)}$$
$$\sigma_x = \sqrt{\frac{1}{3}(4+1+1+4)}$$
$$\sigma_x = \sqrt{\frac{1}{3}(10)}$$
$$\sigma_x = \sqrt{\frac{10}{3}}$$
$$\sigma_x \approx 1.83$$
With the sample data, your calculations should look like this:
$$\sigma_y = \sqrt{\frac{1}{4-1}\left((1-4)^2+(3-4)^2+(5-4)^2+(7-4)^2\right)}$$
$$\sigma_y = \sqrt{\frac{1}{3}(9+1+1+9)}$$
$$\sigma_y = \sqrt{\frac{1}{3}(20)}$$
$$\sigma_y = \sqrt{\frac{20}{3}}$$
$$\sigma_y \approx 2.58$$
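As a check on the hand calculation, the sample standard deviation (the version that divides by n − 1) can be sketched in Python as follows. The function name `sample_std` is an illustrative choice, not something from the original article.

```python
import math

# Sample data from the worked example.
x_values = [1, 2, 4, 5]
y_values = [1, 3, 5, 7]


def sample_std(values):
    """Sample standard deviation: sqrt of the summed squared deviations over n - 1."""
    n = len(values)
    mu = sum(values) / n
    squared_deviations = sum((v - mu) ** 2 for v in values)
    return math.sqrt(squared_deviations / (n - 1))


sigma_x = sample_std(x_values)
sigma_y = sample_std(y_values)

print(round(sigma_x, 2))  # 1.83
print(round(sigma_y, 2))  # 2.58
```

Python's standard library function `statistics.stdev` also divides by n − 1, so it should agree with this sketch on the same data.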
$$\rho = \left(\frac{1}{n-1}\right)\sum \left(\frac{x-\mu_x}{\sigma_x}\right)\left(\frac{y-\mu_y}{\sigma_y}\right)$$
You may notice slight variations in the formula, here or in other texts. For example, some will use the Greek notation with rho and sigma, while others will use r and s. Some texts may show slightly different formulas, but they will be mathematically equivalent to this one.
Using the sample data, you would enter your data in the correlation coefficient formula and calculate as follows:
$$\rho = \left(\frac{1}{n-1}\right)\sum \left(\frac{x-\mu_x}{\sigma_x}\right)\left(\frac{y-\mu_y}{\sigma_y}\right)$$
$$\rho = \left(\frac{1}{3}\right)\left[\left(\frac{1-3}{1.83}\right)\left(\frac{1-4}{2.58}\right)+\left(\frac{2-3}{1.83}\right)\left(\frac{3-4}{2.58}\right)+\left(\frac{4-3}{1.83}\right)\left(\frac{5-4}{2.58}\right)+\left(\frac{5-3}{1.83}\right)\left(\frac{7-4}{2.58}\right)\right]$$
$$\rho = \left(\frac{1}{3}\right)\left(\frac{6+1+1+6}{4.721}\right)$$
$$\rho = \left(\frac{1}{3}\right)(2.965)$$
$$\rho = \frac{2.965}{3}$$
$$\rho \approx 0.988$$
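The same formula can be sketched in Python to check the arithmetic. The function names `sample_std` and `correlation` are illustrative assumptions, not part of the original article. Note that this sketch carries the unrounded standard deviations through the calculation, so it gives about 0.990 rather than the 0.988 obtained by hand with the standard deviations rounded to 1.83 and 2.58. The difference is rounding error only.

```python
import math

# Sample data from the worked example.
x_values = [1, 2, 4, 5]
y_values = [1, 3, 5, 7]


def sample_std(values):
    """Sample standard deviation, dividing by n - 1."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((v - mu) ** 2 for v in values) / (n - 1))


def correlation(xs, ys):
    """Correlation coefficient: sum of products of standardized values over n - 1."""
    if len(xs) != len(ys):
        raise ValueError("x and y must contain the same number of values")
    n = len(xs)
    mu_x = sum(xs) / n
    mu_y = sum(ys) / n
    sigma_x = sample_std(xs)
    sigma_y = sample_std(ys)
    total = sum(
        ((x - mu_x) / sigma_x) * ((y - mu_y) / sigma_y)
        for x, y in zip(xs, ys)
    )
    return total / (n - 1)


rho = correlation(x_values, y_values)
print(round(rho, 3))  # 0.99
```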
Because the correlation coefficient is positive, you can say there is a positive correlation between the x-data and the y-data. This means that as the x values increase, you expect the y values to increase also. Because the correlation coefficient is very close to +1, the x-data and y-data are very closely connected. If you were to graph these points, you would see that they form a very good approximation of a straight line.
For example, at the website http://ncalculators.com/statistics/correlation-coefficient-calculator.htm, you will find one horizontal box for entering x-values and a second horizontal box for entering y-values. You enter your terms, separated only by commas. Thus, the x-data set that was calculated earlier in this article should be entered as 1,2,4,5. The y-data set should be 1,3,5,7. At another site, http://www.alcula.com/calculators/statistics/correlation-coefficient/, you can enter data either horizontally or vertically, as long as you keep the data points in order.
Each calculator will have slightly different key commands. This article will give the specific instructions for the Texas Instruments TI-86. Enter the Stat function by pressing [2nd]-Stat (above the + key), then hit F2-Edit.
Use the arrow keys to move the cursor to highlight the heading “xStat.” Then press Clear and Enter. This should clear all values in the xStat column. Use the arrow keys to highlight the yStat heading. Press Clear and Enter to empty the data from that column as well.
Continue entering all the x-data values. When you complete the x-data, use the arrow keys to move to the yStat column and enter the y-data values. After all the data has been entered, hit Exit to clear the screen and leave the Stat menu.
Enter the Stat function and then hit the Calc button. On the TI-86, this is [2nd][Stat][F1]. Choose the Linear Regression calculations. On the TI-86, this is [F3], which is labeled “LinR.” The graphic screen should then display the line “LinR _,” with a blinking cursor. You now need to enter the names of the two variables that you want to calculate. These are xStat and yStat. On the TI-86, select the Names list by hitting [2nd][List][F3]. The bottom line of your screen should now show the available variables. Choose [xStat] (this is probably button F1 or F2), then enter a comma, then [yStat]. Hit Enter to calculate the data.
$y = a + bx$: This is the general formula for a straight line. However, instead of the familiar “y=mx+b,” this is presented in reverse order.
$a =$: This is the value of the y-intercept of the best-fit line.
$b =$: This is the slope of the best-fit line.
$\text{corr} =$: This is the correlation coefficient.
$n =$: This is the number of data pairs that were used in the calculation.
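To see how these four outputs relate to the data, here is a minimal Python sketch that computes them for the sample data (1,2,4,5) and (1,3,5,7) using the standard least-squares formulas. It is not a description of how the TI-86 computes its results internally, and the function name `linear_fit` is an illustrative assumption. The correlation is written here in a sum-of-products form that is mathematically equivalent to the formula used earlier in this article.

```python
import math

# Sample data from the worked example.
x_values = [1, 2, 4, 5]
y_values = [1, 3, 5, 7]


def linear_fit(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, corr, n)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products of deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                    # slope of the best-fit line
    a = mean_y - b * mean_x          # y-intercept of the best-fit line
    corr = sxy / math.sqrt(sxx * syy)
    return a, b, corr, n


a, b, corr, n = linear_fit(x_values, y_values)
print(round(a, 3), round(b, 3), round(corr, 3), n)  # -0.2 1.4 0.99 4
```

For this data set the best-fit line is therefore approximately y = −0.2 + 1.4x.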
For example, if you were to measure the heights and ages of children up to the age of about 12, you would expect to find a strong positive correlation. As children get older, they tend to get taller. An example of negative correlation would be data comparing a person’s time spent practicing golf shots and that person’s golf score. As the practice increases, the score should decrease. Finally, you would expect very little correlation, either positive or negative, between a person’s shoe size, for example, and SAT scores.
The mean of a variable is denoted by the variable with a horizontal line above it. This is often referred to as “x-bar” or “y-bar” for the x and y data sets. Alternatively, the mean may be signified by the lower-case Greek letter mu, μ. To indicate the mean of x-data points, for example, you could write μx or μ(x). As an example, if you have a set of x-data points (1,2,5,6,9,10), then the mean of this data is calculated as follows:
$$\mu_x = (1+2+5+6+9+10)/6$$
$$\mu_x = 33/6$$
$$\mu_x = 5.5$$
Symbolically, standard deviation is expressed with either the lower-case letter s or the lower-case Greek letter sigma, σ. Thus, the standard deviation of the x-data is written as either sx or σx.
As an example, if you have a set of x-data points (1,2,5,6,9,10), then ∑x means: 1+2+5+6+9+10 = 33.
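In code, the summation sign corresponds to adding up every value in a list. A minimal Python sketch for the data set (1,2,5,6,9,10), with illustrative variable names, shows both the sum and the mean that follows from it:

```python
# The x-data points from the example above.
x_values = [1, 2, 5, 6, 9, 10]

# The summation: add every value in the data set.
total = sum(x_values)

# The mean is the sum divided by the number of data points.
mean_x = total / len(x_values)

print(total)   # 33
print(mean_x)  # 5.5
```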