I am trying to run grangercausalitytests
on two time series:
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests
n = 1000
ls = np.linspace(0, 2*np.pi, n)
df1 = pd.DataFrame(np.sin(ls))
df2 = pd.DataFrame(2*np.sin(1+ls))
df = pd.concat([df1, df2], axis=1)
df.plot()
grangercausalitytests(df, maxlag=20)
However, I am getting
Granger Causality
number of lags (no zero) 1
ssr based F test: F=272078066917221398041264652288.0000, p=0.0000 , df_denom=996, df_num=1
ssr based chi2 test: chi2=272897579166972095424217743360.0000, p=0.0000 , df=1
likelihood ratio test: chi2=60811.2671, p=0.0000 , df=1
parameter F test: F=272078066917220553616334520320.0000, p=0.0000 , df_denom=996, df_num=1
Granger Causality
number of lags (no zero) 2
ssr based F test: F=7296.6976, p=0.0000 , df_denom=995, df_num=2
ssr based chi2 test: chi2=14637.3954, p=0.0000 , df=2
likelihood ratio test: chi2=2746.0362, p=0.0000 , df=2
parameter F test: F=13296850090491009488285469769728.0000, p=0.0000 , df_denom=995, df_num=2
...
/usr/local/lib/python3.5/dist-packages/numpy/linalg/linalg.py in _raise_linalgerror_singular(err, flag)
88
89 def _raise_linalgerror_singular(err, flag):
---> 90 raise LinAlgError("Singular matrix")
91
92 def _raise_linalgerror_nonposdef(err, flag):
LinAlgError: Singular matrix
and I am not sure why this is the case.
Kenil Vasani
The problem arises due to the perfect correlation between the two series in your data. From the traceback, you can see, that internally a wald test is used to compute the maximum likelihood estimates for the parameters of the lag-time series. To do this an estimate of the parameters covariance matrix (which is then near-zero) and its inverse is needed (as you can also see in the line
invcov = np.linalg.inv(cov_p)
in the traceback). This near-zero matrix is now singular for some maximum lag number (>=5) and thus the test crashes. If you add just a little noise to your data, the error disappears: