"calculate_bartlett_sphericity" test outputting nan values

I have a dataframe V as follows:

       ECON1     ECON2     ECON3     FOOD1     FOOD2     FOOD3      ENV1  \
28  0.310071  0.096913  0.228500  0.234986  0.260894  0.267858  0.489309   
28  0.353609  0.045075  0.222571  0.222803  0.248388  0.330560  0.060107   
28  0.280600  0.170201  0.232027  0.226792  0.233379  0.316765  0.114550   
28  0.299062  0.127866  0.198080  0.189948  0.222982  0.327082  0.052881   
28  0.346291  0.645534  0.371397  0.389068  0.380557  0.386004  0.186583   

        ENV2      HEA1      HEA2      HEA3     PERS1     PERS2     PERS3  \
28  0.206320  0.252537  0.266968  0.248452  0.184450  0.093345  0.173952   
28 -0.206570  0.263673  0.126182  0.265908  0.134481  0.191341  0.113324   
28  0.237818  0.257337  0.102037  0.214423  0.159002  0.321451  0.165960   
28  0.345857  0.272412  0.069192  0.251301  0.130606  0.132732  0.174925   
28  0.372713  0.382155  0.373531  0.468293  0.364305  0.299510  0.350822   

        COM1      COM2      POL1      POL2  
28  0.781430  0.487822  0.361886  0.233124  
28  0.083918  0.005381  0.266604  0.237078  
28  0.395897  0.257888  0.330607  0.229079  
28  0.000000  0.000000  0.307907  0.238908  
28  0.188402  0.101147  0.410619  0.385933  

I want to run Bartlett's test of sphericity to check whether the observed variables (dataframe V) are intercorrelated at all, by comparing the observed correlation matrix against the identity matrix.

from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity
chi_square_value, p_value=calculate_bartlett_sphericity(V)
print(chi_square_value, p_value)

The issue I find is that the output looks as follows:

nan nan

I am not sure what I am doing wrong. All values in V are numeric. Can someone comment on this, please?

There is 1 answer below.

The Bartlett sphericity test can return a NaN value in a few situations.

Your case appears to be the one where some variables are almost perfectly correlated with each other.
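For context, the statistic is usually written as chi2 = -(n - 1 - (2p + 5)/6) * ln|R|, where R is the p x p correlation matrix of n observations, with p(p - 1)/2 degrees of freedom. The sketch below applies that textbook formula with NumPy only; `bartlett_statistic` is a hypothetical helper, the data are random stand-ins, and it is not necessarily line for line what factor_analyzer does. It shows why collinearity is a problem: the determinant collapses towards zero, so its logarithm is a huge negative number, -inf, or nan, depending on floating-point rounding.

```python
import numpy as np

def bartlett_statistic(X):
    """Textbook Bartlett sphericity statistic (hypothetical helper).

    Returns the statistic and the determinant of the correlation matrix.
    """
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)      # p x p correlation matrix
    det = np.linalg.det(R)
    # log(det) is nan when det < 0 and -inf when det == 0
    with np.errstate(divide="ignore", invalid="ignore"):
        log_det = np.log(det)
    statistic = -(n - 1 - (2 * p + 5) / 6) * log_det
    return statistic, det

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))             # random stand-in data
healthy_stat, healthy_det = bartlett_statistic(X)

# Append a copy of the first column: perfect collinearity.
X_collinear = np.column_stack([X, X[:, 0]])
degenerate_stat, degenerate_det = bartlett_statistic(X_collinear)

print("healthy:   ", healthy_stat, healthy_det)
print("degenerate:", degenerate_stat, degenerate_det)
```

In the second case the determinant is zero up to rounding error, so the statistic is no longer meaningful even when it happens to print as a finite number.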

Let's load the data:

from io import StringIO
import pandas as pd
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity
data = """
ECON1,ECON2,ECON3,FOOD1,FOOD2,FOOD3,ENV1,ENV2,HEA1,HEA2,HEA3,PERS1,PERS2,PERS3,COM1,COM2,POL1,POL2
0.310071,0.096913,0.2285,0.234986,0.260894,0.267858,0.489309,0.20632,0.252537,0.266968,0.248452,0.18445,0.093345,0.173952,0.78143,0.487822,0.361886,0.233124
0.353609,0.045075,0.222571,0.222803,0.248388,0.33056,0.060107,-0.20657,0.263673,0.126182,0.265908,0.134481,0.191341,0.113324,0.083918,0.005381,0.266604,0.237078
0.2806,0.170201,0.232027,0.226792,0.233379,0.316765,0.11455,0.237818,0.257337,0.102037,0.214423,0.159002,0.321451,0.16596,0.395897,0.257888,0.330607,0.229079
0.299062,0.127866,0.19808,0.189948,0.222982,0.327082,0.052881,0.345857,0.272412,0.069192,0.251301,0.130606,0.132732,0.174925,0.0,0.0,0.307907,0.238908
0.346291,0.645534,0.371397,0.389068,0.380557,0.386004,0.186583,0.372713,0.382155,0.373531,0.468293,0.364305,0.29951,0.350822,0.188402,0.101147,0.410619,0.385933
"""
# Convert the string data to a file-like object
data_io = StringIO(data)
# Read the data into a pandas DataFrame
V = pd.read_csv(data_io)

Let's count, for each variable, how many of its correlations are greater than .95 (each count includes the variable's correlation of 1 with itself):

(V.corr() > .95).sum(1).sort_values(ascending=False)
POL2     9
ECON2    7
PERS1    7
ECON3    6
FOOD1    6
FOOD2    6
PERS3    4
HEA1     4
HEA3     4
COM2     2
COM1     2
POL1     1
ECON1    1
PERS2    1
ENV2     1
ENV1     1
FOOD3    1
HEA2     1
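Since the diagonal of a correlation matrix is always 1, a count like this one also counts each variable against itself. A minimal sketch that masks the diagonal so only correlations with other variables are counted; `count_high_correlations` is a hypothetical helper and the toy frame is made up for illustration:

```python
import numpy as np
import pandas as pd

def count_high_correlations(df, threshold=0.95):
    """Per column, count the OTHER columns correlated above `threshold`."""
    corr = df.corr()
    off_diagonal = ~np.eye(len(corr), dtype=bool)   # False on the diagonal
    counts = ((corr > threshold) & off_diagonal).sum(axis=1)
    return counts.sort_values(ascending=False)

# Toy data (not the values from the question)
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.1, 6.0, 8.2, 10.0],   # almost exactly 2 * a
    "c": [5.0, 1.0, 4.0, 2.0, 3.0],    # only weakly related to a and b
})
counts = count_high_correlations(toy)
print(counts)
```

With the diagonal masked, a variable that is not strongly correlated with any other variable gets a count of 0 instead of 1.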

Let's drop, one at a time, each of the three variables with the highest counts and see whether the Bartlett test then returns a proper value:

for c in ['POL2','ECON2','PERS1']:
    V_fix = V.drop(c, axis=1)
    chi_square_value, p_value = calculate_bartlett_sphericity(V_fix)
    print(c, chi_square_value, p_value)
POL2 nan nan
ECON2 -1181.9125463026403 1.0
PERS1 -1182.5543638437994 1.0
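A negative chi-square with a p-value of 1.0 is still not a usable result. One possible reading, offered as an assumption rather than a confirmed diagnosis: V has only 5 rows, and the correlation matrix of 5 observations can have rank 4 at most, so it stays singular however many of the 17 or 18 columns are dropped one at a time. In addition, the multiplier n - 1 - (2p + 5)/6 from the textbook formula is negative for n = 5 and p = 17, which would flip the sign of the statistic. The sketch below checks both points on random stand-in data of the same shape, not on the values from the question:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 17                          # shape of V after dropping one column
X = rng.normal(size=(n, p))           # random stand-in data

R = np.corrcoef(X, rowvar=False)      # 17 x 17 correlation matrix
rank = np.linalg.matrix_rank(R)       # cannot exceed n - 1
multiplier = n - 1 - (2 * p + 5) / 6  # factor in the textbook formula

print("rank of R:", rank, "out of", p)
print("multiplier:", multiplier)
```

If this is what is happening here, the practical fix would be to collect more observations than variables (or to test far fewer variables at a time) rather than to drop individual columns.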