Pandas plot dataframe as scatter complains of unknown item


I have thousands of data points for two values, Tm1 and Tm2, for a series of text labels, in this form:

     Tm1  Tm2
ID
A01   51  NaN
A03   51  NaN
A05   47   52
A07   47   52
A09   49  NaN

I managed to create a pandas DataFrame with the values from a CSV file. I now want to plot Tm1 and Tm2 as y values against the text IDs as x values in a scatter plot, with differently colored dots, using pandas/matplotlib.

With a test case like this I can get a line plot

from pandas import DataFrame  # import only the name that is used
df2 = DataFrame([52, 54, 56], index=["A01", "A02", "A03"], columns=["Tm1"])
df2["Tm2"] = [None,42,None]


     Tm1  Tm2
A01   52  NaN
A02   54   42
A03   56  NaN

Plot obtained from DataFrame

I want to not connect the individual values with lines and just have the Tm1 and Tm2 values as scatter dots in different colors.

When I try to plot using

df2.reset_index().plot(kind="scatter",x='index',y=["Tm1"])

I get an error:

KeyError: u'no item named index'

I know this is a very basic plotting command, but I have no idea how to achieve this in pandas/matplotlib. The scatter command needs an x and a y value, but I am missing some key pandas concept for how to supply them here.

Best answer

I think the problem here is that you are trying to plot a scatter graph against a non-numeric series. That will fail - although the error message you are given is so misleading that it could be considered a bug.
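To see what the x column actually contains, here is a minimal sketch that rebuilds the test frame from the question and inspects the result of reset_index(). It assumes a reasonably recent pandas where pd.api.types.is_numeric_dtype is available; the variable x_is_numeric is only an illustrative name.

```python
import pandas as pd

# Rebuild the test frame from the question.
df2 = pd.DataFrame([52, 54, 56], index=["A01", "A02", "A03"], columns=["Tm1"])
df2["Tm2"] = [None, 42, None]

# reset_index() moves the text labels into a column named 'index'
# and gives the frame a fresh integer index (0, 1, 2, ...).
df1 = df2.reset_index()

# The 'index' column holds text labels, so it is not a numeric dtype.
x_is_numeric = pd.api.types.is_numeric_dtype(df1["index"])
print(df1.dtypes)
print("x column numeric?", x_is_numeric)
```

The new integer index is what can serve as numeric x positions, while the 'index' column supplies the tick labels.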

You could, however, explicitly set the xticks to use one per category and use the second argument of xticks to set the xtick labels. Like this:

import matplotlib.pyplot as plt

df1 = df2.reset_index() #df1 will have a numeric index, and a 
                        #column named 'index' containing the index labels from df2
plt.scatter(df1.index,df1['Tm1'],c='b',label='Tm1')
plt.scatter(df1.index,df1['Tm2'],c='r',label='Tm2')
plt.legend(loc=4) # Optional - show labelled legend, loc=4 puts it at bottom right
plt.xticks(df1.index,df1['index']) # explicitly set one tick per category and label them
                                   # according to the labels in column df1['index']
plt.show()

I've just tested it with 1.4.3 and it worked OK


For the example data you gave, this yields:

[image: resulting scatter plot]
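As an alternative, here is a sketch (not part of the tested answer above) that stays within the pandas plotting API: passing style="o" to DataFrame.plot draws a marker per point without connecting lines. The Agg backend and the output file name tm_scatter.png are assumptions made only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Rebuild the test frame from the question.
df2 = pd.DataFrame([52, 54, 56], index=["A01", "A02", "A03"], columns=["Tm1"])
df2["Tm2"] = [None, 42, None]

# use_index=False plots against positions 0, 1, 2, ...;
# style="o" draws circle markers only, one color per column.
ax = df2.plot(style="o", use_index=False)

# Put one tick at each position and label it with the text ID.
ax.set_xticks(range(len(df2)))
ax.set_xticklabels(df2.index)
ax.set_xlim(-0.5, len(df2) - 0.5)  # small margin so edge points are visible
ax.legend(loc=4)  # bottom right, as in the answer

plt.savefig("tm_scatter.png")  # hypothetical output file name
```

With thousands of IDs, one tick per label will overlap heavily, so it may be worth labeling only a subset of the positions.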