Keras callback error due to folder permissions


I have a Keras model with a ModelCheckpoint callback.

When I set the path in the callback to the tmp folder, it works perfectly, but when I set it to another folder called kaggle, I get an error.

The error is quite long, and this is the last part of it:

    21/22 [===========================>..] - ETA: 0s - loss: 0.7804 - acc: 0.5048
    2020-04-28 17:36:20.771950: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Invalid argument: indices[5,12] = 11086 is not in [0, 11086)
         [[{{node embedding/embedding_lookup}}]]
    2020-04-28 17:36:20.778527: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Invalid argument: indices[5,12] = 11086 is not in [0, 11086)
         [[{{node embedding/embedding_lookup}}]]
         [[dense_1_target/_2]]
    Traceback (most recent call last):
      File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/DIRECTORY2/train.py", line 76, in <module>
        Train(args)
      File "/DIRECTORY2/train.py", line 28, in __init__
        Train.train(params.read(configs))
      File "/DIRECTORY2/train.py", line 69, in train
        verbose = 1)
      File "/DIRECTORY/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1433, in fit_generator
        steps_name='steps_per_epoch')
      File "/DIRECTORY/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_generator.py", line 264, in model_iteration
        batch_outs = batch_function(*batch_data)
      File "/DIRECTORY/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1175, in train_on_batch
        outputs = self.train_function(ins)  # pylint: disable=not-callable
      File "/DIRECTORY/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 3443, in __call__
        outputs = self._graph_fn(*converted_inputs)
      File "/DIRECTORY/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 561, in __call__
        return self._call_flat(args)
      File "/DIRECTORY/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 660, in _call_flat
        outputs = self._inference_function.call(ctx, args)
      File "/DIRECTORY/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 434, in call
        ctx=ctx)
      File "/DIRECTORY/lib/python3.6/site-packages/tensorflow/python/eager/execute.py", line 67, in quick_execute
        six.raise_from(core._status_to_exception(e.code, message), None)
      File "<string>", line 3, in raise_from
    tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[5,12] = 11086 is not in [0, 11086)
         [[{{node embedding/embedding_lookup}}]] [Op:__inference_keras_scratch_graph_3082]
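
Read literally, the final message reports that index 11086 was looked up where only indices 0 to 11085 are valid, so it may be worth checking the input data separately from the folder permissions. Below is a minimal sketch of such a check in plain Python. The helper `find_out_of_range` and the sample batch are hypothetical and not taken from the training script; 11086 is the bound shown in the message.

```python
def find_out_of_range(batch, input_dim):
    """Return (row, col, value) for every index outside [0, input_dim)."""
    bad = []
    for r, row in enumerate(batch):
        for c, idx in enumerate(row):
            if not 0 <= idx < input_dim:
                bad.append((r, c, idx))
    return bad


# Hypothetical batch of token-index sequences (not real data from the question).
sample_batch = [[1, 5, 11085], [3, 11086, 0]]

# 11086 is the upper bound reported in the error message.
print(find_out_of_range(sample_batch, input_dim=11086))
# → [(1, 1, 11086)]
```

Running a check like this over the batches the generator yields would show whether any index reaches the reported bound, independently of where the checkpoint is written.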

I printed the permissions for both folders, and they appear to be the same.
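
One way to compare two directories beyond the permission string is to look at the owner, the group, and whether the current user can actually create a file there. The following is a rough sketch using only the Python standard library. The helper names `describe_dir` and `can_create_file` are made up for illustration, and the demo uses a temporary directory rather than the real tmp and kaggle paths.

```python
import os
import stat
import tempfile


def describe_dir(path):
    """Collect the mode string, owner, group, and write access for a directory."""
    st = os.stat(path)
    return {
        "mode": stat.filemode(st.st_mode),  # e.g. 'drwxr-xr-x'
        "uid": st.st_uid,
        "gid": st.st_gid,
        # W_OK | X_OK: creating files in a directory needs write and search permission
        "writable": os.access(path, os.W_OK | os.X_OK),
    }


def can_create_file(path):
    """Try to create (and automatically delete) a temporary file inside path."""
    try:
        with tempfile.NamedTemporaryFile(dir=path):
            return True
    except OSError:
        return False


# Demo on a throwaway directory; replace with the two real checkpoint folders.
with tempfile.TemporaryDirectory() as demo_dir:
    print(describe_dir(demo_dir))
    print(can_create_file(demo_dir))
```

Calling both helpers on each checkpoint folder and comparing the results would show differences in ownership or effective write access that an identical mode string can hide.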

Edited (1):

The directory that caused the error was transferred to my Linux user account from a Windows machine using WinSCP, while the other one (tmp) was created locally on Linux.

Edited (2):

I deleted the directory that caused the error and created the same one locally, and the error disappeared! I'm quite sure that the error is due to directory permissions, but I don't know what the source was.
