Manual Calculation of the Correlation Coefficient
You: “I want to analyze the timing of buying and selling by comparing the data of two stocks.”
Teacher: “Calculate the correlation coefficient, then you’ll get the answer.”
You: “What is a correlation coefficient??”
If you have this kind of problem, this blog is for you.
It is for you if any of the following applies:
- You wonder what the correlation coefficient is.
- You learned about the correlation coefficient at university but don't remember it at all.
- You want to know how similar two data sets are but don't know how to measure it.
- You want to be able to calculate the correlation coefficient.
By reading this blog, you can understand what the correlation coefficient is and how to calculate it.
I'll explain step by step.🙂
What is the correlation coefficient?
The correlation coefficient is a number that shows how similar two data sets are, that is, how closely they move together.
The word "similarity" here is the same one you use when two faces look quite alike.
The correlation coefficient of Mark Zuckerberg and Savanna Takahashi seems to be high.😁
How about Hirose Suzu and Tuchiya Tao?
Let’s see👀
The correlation coefficient seems high😍
How to? 〜Manual calculation of the correlation coefficient〜
The correlation coefficient r is defined as follows.
$$\begin{align*}
r&=\frac{\sum (x_{i}-\bar{x})(y_{i}-\bar{y})}{\sqrt{\sum (x_{i}-\bar{x})^2}\sqrt{\sum (y_{i}-\bar{y})^2}} \\
&=\frac{\sum (x_{i}-\bar{x})(y_{i}-\bar{y})/n}{\sqrt{\frac{\sum(x_{i}-\bar{x})^2}{n}}\sqrt{\frac{\sum(y_{i}-\bar{y})^2}{n}}} \\
&=\frac{V_{xy}}{\sigma_{x}\sigma_{y}} \\
\end{align*}$$
$$\begin{align*}
V_{xy}&: \text{covariance of } x \text{ and } y \\
\sigma_{x}&: \text{standard deviation of } x \\
\sigma_{y}&: \text{standard deviation of } y \\
\end{align*}$$
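If a formula is easier to read as code, here is a minimal Python sketch of the first line of the definition. Python and the function name `pearson_r` are assumptions for illustration (this post itself uses MATLAB), and the sample values are the x and y from the worked example in this post.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient r from the sum form of the definition."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    # Numerator: sum of the products of the deviations
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: product of the square roots of the summed squared deviations
    den = math.sqrt(sum((x - x_bar) ** 2 for x in xs)) * \
          math.sqrt(sum((y - y_bar) ** 2 for y in ys))
    # Note: den is 0 when one series is constant, so r is undefined there
    return num / den

r = pearson_r([1, 2, 3, 4, 5], [2, 8, 6, 4, 10])
print(round(r, 2))  # 0.6
```

Dividing the numerator and the denominator by n gives the second and third lines of the definition, so the result does not change.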
You might be thinking, "What?? No idea🤔."
Don’t worry.
I'll explain in the 4 steps below.
- Calculate the averages
- Calculate the standard deviations
- Calculate the covariance of x and y
- Calculate the correlation coefficient
In this example, let’s use the x and y shown below.
You can do the whole calculation by hand with a calculator.
Let’s check it out!
$$\begin{align*}
x&=(1,2,3,4,5) \\
y&=(2,8,6,4,10)
\end{align*}$$
1. Calculate the averages
$$\begin{align*}
\bar{x}&=\frac{1+2+3+4+5}{5} =3 \\
\bar{y}&=\frac{2+8+6+4+10}{5} =6 \\
\end{align*}$$
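The same averages in a short Python sketch (Python is an assumption here, chosen only for illustration; the variable names `x_bar` and `y_bar` are mine):

```python
x = [1, 2, 3, 4, 5]
y = [2, 8, 6, 4, 10]

# Average = sum of the values divided by how many there are
x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)
print(x_bar, y_bar)  # 3.0 6.0
```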
2. Calculate the standard deviations
First, calculate the squared deviations.
$$\begin{align*}
(x_{i}-\bar{x})^2&=((1-3)^2,(2-3)^2,(3-3)^2,(4-3)^2,(5-3)^2) \\
&=(4,1,0,1,4) \\
\end{align*}$$
$$\begin{align*}
(y_{i}-\bar{y})^2&=((2-6)^2,(8-6)^2,(6-6)^2,(4-6)^2,(10-6)^2) \\
&=(16,4,0,4,16) \\
\end{align*}$$
Then calculate the square root of the average of the squared deviations.
$$\begin{align*}
\sigma_{x}&= \sqrt{\frac{4+1+0+1+4}{5}} \fallingdotseq 1.41 \\
\sigma_{y}&=\sqrt{\frac{16+4+0+4+16}{5}} \fallingdotseq 2.83 \\
\end{align*}$$
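Here is a minimal Python sketch of this step. The helper name `population_std` is hypothetical. Note that it divides by n, exactly as in the formulas above, not by n-1.

```python
import math

x = [1, 2, 3, 4, 5]
y = [2, 8, 6, 4, 10]

def population_std(values):
    """Square root of the average squared deviation (divides by n)."""
    mean = sum(values) / len(values)
    squared_devs = [(v - mean) ** 2 for v in values]
    return math.sqrt(sum(squared_devs) / len(values))

sigma_x = population_std(x)
sigma_y = population_std(y)
print(round(sigma_x, 2), round(sigma_y, 2))  # 1.41 2.83
```

Python's standard library offers `statistics.pstdev` for this same divide-by-n version, while `statistics.stdev` divides by n-1 and would give slightly larger values.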
3. Calculate the covariance of x and y
Multiply the deviations of x and y pairwise.
$$\begin{align*}
(x_{i}-\bar{x})*(y_{i}-\bar{y})&=((1-3)(2-6),(2-3)(8-6),(3-3)(6-6),(4-3)(4-6),(5-3)(10-6)) \\
&=(8,-2,0,-2,8) \\
\end{align*}$$
Then calculate the average of these products: $$ V_{xy}= \frac{8-2+0-2+8}{5} =2.4 $$
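The covariance step can be sketched in Python like this (the variable names `products` and `v_xy` are mine, picked to mirror the formulas):

```python
x = [1, 2, 3, 4, 5]
y = [2, 8, 6, 4, 10]
x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)

# Pairwise products of the deviations of x and y
products = [(xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)]
# Covariance = average of those products (divides by n)
v_xy = sum(products) / len(products)

print(products)  # [8.0, -2.0, 0.0, -2.0, 8.0]
print(v_xy)      # 2.4
```

A positive product means x and y were on the same side of their averages at that point, and a negative product means they were on opposite sides.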
4. Calculate the correlation coefficient
$$\begin{align*}
r&=\frac{V_{xy}}{\sigma_{x}\sigma_{y}}=\frac{2.4}{1.41*2.83} \fallingdotseq 0.6 \\
\end{align*}$$
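The correlation coefficient is now calculated: the answer is 0.6. The coefficient always lies between -1 and 1. The closer it is to 1, the stronger the positive correlation, and values near 0 mean there is little linear relationship.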
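To get a feel for the scale of r, the sketch below puts the four steps together in one hypothetical helper, `corr`, and tries it on the data from this post and on two extreme cases that I made up: y rising exactly in step with x, and y falling exactly as x rises.

```python
import math

def corr(xs, ys):
    """r = V_xy / (sigma_x * sigma_y), with every average taken over n."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    v_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    sigma_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    sigma_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / n)
    return v_xy / (sigma_x * sigma_y)

x = [1, 2, 3, 4, 5]
r_example = corr(x, [2, 8, 6, 4, 10])  # the data from this post
r_up = corr(x, [2, 4, 6, 8, 10])       # y rises exactly in step with x
r_down = corr(x, [10, 8, 6, 4, 2])     # y falls exactly as x rises
print(round(r_example, 2), round(r_up, 2), round(r_down, 2))  # 0.6 1.0 -1.0
```

So 0.6 sits between "no linear relationship" (0) and "perfectly in step" (1).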
As a side note, the MATLAB code is shown below.
It is very simple!!
x=[1,2,3,4,5];
y=[2,8,6,4,10];
r=corrcoef(x,y)  % returns a 2x2 matrix; the off-diagonal element r(1,2) is the correlation coefficient
Caution
Correlation does not necessarily mean causality.
For example, even if the correlation coefficient between a stock and some economic index is high, the stock did not necessarily rise because the economic index rose.
The YouTube video below is also good for learning about this caution.
It is important to think for yourself about the causes hidden behind the data.
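As a small illustration of this caution, here is a Python sketch with completely made-up numbers. A hypothetical hidden factor, `trend`, drives both a hypothetical stock price and a hypothetical economic index, and neither of the two affects the other. The helper `corr` and all of the values are assumptions for illustration only.

```python
import math

def corr(xs, ys):
    """Correlation coefficient: covariance over the product of standard deviations."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    v_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    sigma_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    sigma_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / n)
    return v_xy / (sigma_x * sigma_y)

# All numbers below are invented for this example.
trend = [1, 2, 3, 4, 5, 6, 7, 8]           # hidden factor behind both series
stock_wiggle = [1, -2, 0, 2, -1, 1, -2, 1]  # small stock-specific movements
index_wiggle = [-1, 1, 2, -2, 0, -1, 1, 0]  # small index-specific movements

# Each series depends on the trend, but not on the other series
stock = [100 + 5 * t + w for t, w in zip(trend, stock_wiggle)]
index = [50 + 3 * t + w for t, w in zip(trend, index_wiggle)]

r = corr(stock, index)
print(round(r, 2))
```

The coefficient comes out close to 1, even though in this construction the index never causes the stock to move. The shared hidden factor alone is enough to produce the high correlation.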
Future
Next time, I’ll explain future prediction using a model.
Reference
The books below are good for beginners.
If you want to know more about Hirose Suzu, please see the books below ^_^.
Thank you for reading😊
Hiroki🐶