ML Engine BigQuery: Request had insufficient authentication scopes


I'm running a TensorFlow model, submitting the training job on ML Engine. I have built an input pipeline that reads from BigQuery, using tf.contrib.cloud.python.ops.bigquery_reader_ops.BigQueryReader as the reader for the queue.
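
For context, the pipeline is set up roughly like the sketch below (I show only a few of the table's columns, and the feature types here are illustrative, not the real schema):

    import tensorflow as tf
    from tensorflow.contrib.cloud.python.ops import bigquery_reader_ops

    # Feature spec matching the BigQuery table schema (illustrative subset
    # of the real columns; the types are assumptions).
    features = {
        "LABEL": tf.FixedLenFeature([1], tf.int64),
        "ETA": tf.FixedLenFeature([1], tf.int64),
        "PROV": tf.FixedLenFeature([1], tf.string, default_value=["UNK"]),
    }

    reader = bigquery_reader_ops.BigQueryReader(
        project_id="pasquinelli-bigdata",
        dataset_id="Transactions",
        table_id="t_11_Hotel_25_w_train",
        timestamp_millis=1505224768418,  # table snapshot time
        num_partitions=1,
        features=features)

    # Queue holding the table partitions to read.
    partition_queue = tf.train.string_input_producer(reader.partitions())

    # Each read returns one serialized tf.Example from BigQuery.
    _, example_serialized = reader.read(partition_queue)
    example = tf.parse_single_example(example_serialized, features=features)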

Everything works fine in Datalab and locally, with the GOOGLE_APPLICATION_CREDENTIALS environment variable pointing to the JSON credentials key file. However, when I submit the training job in the cloud I get these errors (I'm posting just the two main ones):

  1. Permission denied: Error executing an HTTP request (HTTP response code 403, error code 0, error message '') when reading schema for...

  2. There was an error creating the model. Check the details: Request had insufficient authentication scopes.

I've already checked everything else, such as correctly defining the table schema in the script and the project/dataset/table IDs and names.

For clarity, I paste the whole error from the log below:

message: "Traceback (most recent call last):

File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)

File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals

File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 131, in <module>
    hparams=hparam.HParams(**args.__dict__)

File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 210, in run
    return _execute_schedule(experiment, schedule)

File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 47, in _execute_schedule
    return task()

 File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 495, in train_and_evaluate
    self.train(delay_secs=0)

File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 275, in train
    hooks=self._train_monitors + extra_hooks)

File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 665, in _call_train
    monitors=hooks)

File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 289, in new_func
    return func(*args, **kwargs)

File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 455, in fit
    loss = self._train_model(input_fn=input_fn, hooks=hooks)

File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1007, in _train_model
    _, loss = mon_sess.run([model_fn_ops.train_op, model_fn_ops.loss])

File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 521, in __exit__
    self._close_internal(exception_type)

File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 556, in _close_internal
    self._sess.close()

File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 791, in close
    self._sess.close()

File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 888, in close
    ignore_live_threads=True)

File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)

File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/queue_runner_impl.py", line 238, in _run
    enqueue_callable()

File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1063, in _single_operation_run
    target_list_as_strings, status, None)

File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()

File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
PermissionDeniedError: Error executing an HTTP request (HTTP response code 403, error code 0, error message '')
     when reading schema for pasquinelli-bigdata:Transactions.t_11_Hotel_25_w_train@1505224768418
     [[Node: GenerateBigQueryReaderPartitions = GenerateBigQueryReaderPartitions[columns=["F_RACC_GEST", "LABEL", "F_RCA", "W24", "ETA", "W22", "W23", "W20", "W21", "F_LEASING", "W2", "W16", "WLABEL", "SEX", "F_PIVA", "F_MUTUO", "Id_client", "F_ASS_VITA", "F_ASS_DANNI", "W19", "W18", "W17", "PROV", "W15", "W14", "W13", "W12", "W11", "W10", "W7", "W6", "W5", "W4", "W3", "F_FIN", "W1", "ImpTot", "F_MULTIB", "W9", "W8"], dataset_id="Transactions", num_partitions=1, project_id="pasquinelli-bigdata", table_id="t_11_Hotel_25_w_train", test_end_point="", timestamp_millis=1505224768418, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Any suggestion would be extremely helpful, since I'm relatively new to Google Cloud. Thank you all.

1 Answer

Support for reading BigQuery data from Cloud ML Engine is still under development, so what you are doing is currently unsupported. The issue you are hitting is that the machines ML Engine runs on do not have the right OAuth scopes to talk to BigQuery. Another issue you may encounter, even running locally, is poor performance when reading from BigQuery. These are two examples of the work that still needs to be addressed.
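
If you want to confirm this from inside a running job, one option (just a sketch, not part of the original setup) is to ask the GCE metadata server which OAuth scopes the machine's service account was granted. If https://www.googleapis.com/auth/bigquery (or the broader cloud-platform scope) is missing from the list, BigQuery calls will fail with exactly this kind of 403:

    import urllib2

    # Ask the metadata server for the OAuth scopes granted to this
    # machine's default service account.
    req = urllib2.Request(
        "http://metadata.google.internal/computeMetadata/v1/"
        "instance/service-accounts/default/scopes",
        headers={"Metadata-Flavor": "Google"})
    print(urllib2.urlopen(req).read())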

In the meantime, I recommend exporting your data to GCS for training. This is much more scalable, so you don't have to worry about degraded training performance as your data grows. It can also be a good pattern in its own right: preprocess your data once, write the result to GCS in CSV format, and then do multiple training runs to try out different algorithms or hyperparameters.
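
As a rough sketch of that pattern (the bucket name here is hypothetical, and the export uses the google-cloud-bigquery client library, which is not part of the original code):

    from google.cloud import bigquery

    client = bigquery.Client(project="pasquinelli-bigdata")

    # Reference the training table from the question.
    table_ref = client.dataset("Transactions").table("t_11_Hotel_25_w_train")

    # Export to sharded CSV files on GCS (bucket name is hypothetical).
    extract_job = client.extract_table(
        table_ref,
        "gs://your-bucket/exports/t_11_Hotel_25_w_train-*.csv")
    extract_job.result()  # block until the export job completes

Your training input pipeline can then read the exported shards directly from GCS with the standard TensorFlow readers (again only a sketch; the record_defaults must match your real column types):

    import tensorflow as tf

    # Queue over the exported CSV shards on GCS.
    filename_queue = tf.train.string_input_producer(
        tf.train.match_filenames_once(
            "gs://your-bucket/exports/t_11_Hotel_25_w_train-*.csv"))

    reader = tf.TextLineReader()
    _, line = reader.read(filename_queue)

    # record_defaults must match the exported column types;
    # two illustrative numeric columns shown here.
    eta, label = tf.decode_csv(line, record_defaults=[[0], [0]])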