I've opened a Google Sheet in Colab using gspread
document = gc.open_by_url('https://docs.google.com/myspreadsheet')
sheet = elem.worksheet('Sheet1')
data = sheet.get_all_values()
df = pd.DataFrame(data)
The document contains element data and a print of head() looks like this:
0 1 ... 26 27
0 AtomicNumber Element ... NumberofShells NumberofValence
1 1 Hydrogen ... 1 1
2 2 Helium ... 1
3 3 Lithium ... 2 1
4 4 Beryllium ... 2 2
The problem I have is that when I try to reference by title, for example:
df.plot(x = 'AtomicNumber', y= 'AtomicMass', kind = 'scatter')
I get an error. I have also tried:
df.plot(x = df.AtomicNumber, y= df.AtomicMass, kind = 'scatter')
and
df.plot(x = df['AtomicNumber'], y= df['AtomicMass'], kind = 'scatter')
but I have no joy either. Unless I am using the column references like so:
df.plot(x = 0, y= 17, kind = 'scatter')
I get nothing. It will get tiring pretty fast if I have to keep referencing the .csv file to figure out which column reference I need!!
Finally, when I print:
df.columns.values
I get:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27])
I can't seem to not get this - even if I try to create a new dataframe that contains every row of df bar row index 0
I'm pretty new with this so I'm sure it's pretty simple, but I've hit an impasse... Help!
It seems, that sheet file contains as a first row numbers, which are used as column names. You need to drop first row before converting sheet to dataframe. I'm not very familiar with gspread API, but suppose that following should work:
After that it should be possible to address column by names.