I am a bit confused about how scatter_matrix in the pandas.plotting module works. For example, see the plot below from https://www.geeksforgeeks.org/pair-plots-using-scatter-matrix-in-pandas/
The 3 plots along the main diagonal look like distributions. But the x- and y-axis labels indicate that each one plots a variable against itself, so shouldn't it be a straight line? Where did the distribution come from?
By default, pandas.plotting.scatter_matrix plots histograms on the diagonal. Each histogram shows the counts for just that column of data. Otherwise, as you mentioned, we'd only have (useless) straight lines on the diagonal. There is a diagonal parameter to choose between a histogram ('hist', the default) and a kernel density estimate ('kde').
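Here is a minimal sketch of both settings. The DataFrame, its column names, and the random data are made up for illustration; only scatter_matrix and its diagonal parameter come from pandas. The 'kde' option relies on SciPy, so that call is guarded in case SciPy is not installed.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to view with plt.show()

import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical sample data: three numeric columns of random values.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 3)), columns=["a", "b", "c"])

# Default behaviour: a histogram of each column on the diagonal.
axes_hist = scatter_matrix(df, diagonal="hist")

# Kernel density estimate on the diagonal instead (requires SciPy).
try:
    axes_kde = scatter_matrix(df, diagonal="kde")
except ImportError:
    axes_kde = None  # SciPy is not available in this environment
```

In both cases the return value is a 2-D array of matplotlib Axes, one per cell of the grid, so the diagonal cells can be picked out as axes_hist[i, i] if you want to restyle them.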