Problem with list_rows (max_results set) and to_dataframe in Kaggle's "Intro to SQL" course

I could use some help. In part 1, "Getting Started with SQL and BigQuery", I'm running into the following issue. I've gotten as far as In[7]:

# Preview the first five lines of the "full" table
client.list_rows(table, max_results=5).to_dataframe()

and I get the error:

getting_started_with_bigquery.py:41: UserWarning: Cannot use bqstorage_client if max_results is set, reverting to fetching data with the tabledata.list endpoint.
  client.list_rows(table, max_results=5).to_dataframe()

I'm writing my code in Notepad++ and running it from the command prompt on Windows. Everything else has worked up to this point, but I can't find a solution to this problem. A Google search led me to the source code for google.cloud.bigquery.table, which made it look like this error should only come up if pandas is not installed. So I installed pandas and added import pandas to my code, but I still get the same error.

Here is my full code:

from google.cloud import bigquery
import os 
import pandas

#set the path to the credentials file
credential_path = (r"C:\Users\crlas\learningPython\google_application_credentials.json")
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = credential_path

#create a "Client" object
client = bigquery.Client()

#construct a reference to the "hacker_news" dataset
dataset_ref = client.dataset("hacker_news", project="bigquery-public-data")
#API request - fetch the dataset 
dataset = client.get_dataset(dataset_ref)

#list all tables in the dataset
tables = list(client.list_tables(dataset))
#print all table names
for table in tables:
    print(table.table_id)
print()

#construct a reference to the "full" table
table_ref = dataset_ref.table("full")
#API request - fetch the table
table = client.get_table(table_ref)
#print info on all the columns in the "full" table
print(table.schema)
# print("table schema should have printed above")
print()
#preview first 5 lines of the table
client.list_rows(table, max_results=5).to_dataframe()
Best answer

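As the message itself says (UserWarning: Cannot use bqstorage_client if max_results is set, reverting to fetching data with the tabledata.list endpoint.), this is a warning, not an error.

The call still works: because max_results is set, it skips the BigQuery Storage client and retrieves the rows through the tabledata.list API instead. You see no table because a script run from the command prompt, unlike a notebook cell, does not display the value of its last expression. Assign the result to a DataFrame and print it, like below: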

df = client.list_rows(table, max_results=5).to_dataframe()
print(df)
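To see why the same line shows a table in a Kaggle notebook but prints nothing in a script, here is a minimal standard-library sketch. It compiles one expression statement in "exec" mode (how a .py file run with python treats it) and in "single" mode (how the interactive prompt treats it, echoing the value through sys.displayhook). The sum(range(5)) expression is only a stand-in for client.list_rows(...).to_dataframe(), and run is a hypothetical helper; the sketch assumes a Jupyter/IPython cell echoes its last expression in the same way, which is IPython's default behaviour.

```python
import contextlib
import io

# Stand-in for the last line of the script: an expression whose value
# is never assigned or printed (hypothetical example, not BigQuery code).
source = "sum(range(5))"


def run(mode):
    """Compile and execute `source` in `mode`, returning whatever it printed."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(compile(source, "<demo>", mode), {})
    return buf.getvalue()


# "exec": the value is computed and silently discarded, as in `python script.py`.
script_output = run("exec")
# "single": the value is passed to sys.displayhook, which writes its repr.
notebook_output = run("single")

print(repr(script_output))    # ''
print(repr(notebook_output))  # '10\n'
```

Under these assumptions, the script-style run produces no output at all, which is why an explicit print(df) is needed when the code runs outside a notebook.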