I have data saved in Parquet format, and I am using the Petastorm library to read it in batches for training.
The code works on my local system, but the same code does not work on Databricks.
Code I used on my local system:
import tensorflow as tf
from petastorm import make_batch_reader
from petastorm.tf_utils import make_petastorm_dataset

# Create an iterator object, train_reader; num_epochs is the number of epochs to train the model for.
with make_batch_reader('file:///config/workspace/scaled.parquet', num_epochs=4, shuffle_row_groups=False) as train_reader:
    train_ds = make_petastorm_dataset(train_reader).unbatch().map(lambda x: tf.convert_to_tensor(x)).batch(2)
    for ele in train_ds:
        tensor = tf.reshape(ele, (2, 1, 15))
        model.fit(tensor, tensor)
Code I used on Databricks:
with make_batch_reader('dbfs://output/scaled.parquet', num_epochs=4, shuffle_row_groups=False) as train_reader:
    train_ds = make_petastorm_dataset(train_reader).unbatch().map(lambda x: tf.convert_to_tensor(x)).batch(2)
    for ele in train_ds:
        tensor = tf.reshape(ele, (2, 1, 15))
        model.fit(tensor, tensor)
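The two snippets differ only in the URL passed to make_batch_reader. As a quick check, here is a minimal sketch using only the Python standard library that shows how each URL splits into scheme, host, and path. The helper name `describe` is hypothetical and is not part of Petastorm.

```python
from urllib.parse import urlparse

# The two URLs passed to make_batch_reader in the snippets above.
local_url = "file:///config/workspace/scaled.parquet"
databricks_url = "dbfs://output/scaled.parquet"


def describe(url):
    """Return (scheme, host, path) for a dataset URL (hypothetical helper)."""
    parts = urlparse(url)
    return parts.scheme, parts.netloc, parts.path


for url in (local_url, databricks_url):
    print(describe(url))
# ('file', '', '/config/workspace/scaled.parquet')
# ('dbfs', 'output', '/scaled.parquet')
```

Note that with two slashes after `dbfs:`, `output` is parsed as a host name and not as the first directory of the path, and that the scheme changes from `file` to `dbfs`.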
The error I get from the Databricks code is:
TypeError: __init__() missing 2 required positional arguments: 'instance' and 'token'
I have checked the documentation but could not find any argument named instance or token. However, in an example for the similar Petastorm method make_reader on Azure Databricks, I see the following code:
# create sas token for storage account access, use your own adls account info
remote_url = "abfs://container_name@storage_account_url"
account_name = "<<adls account name>>"
linked_service_name = '<<linked service name>>'
TokenLibrary = spark._jvm.com.microsoft.azure.synapse.tokenlibrary.TokenLibrary
sas_token = TokenLibrary.getConnectionString(linked_service_name)
with make_reader('{}/data_directory'.format(remote_url), storage_options={'sas_token': sas_token}) as reader:
    for row in reader:
        print(row)
Here I see a 'sas_token' being passed in through storage_options.
How do I resolve this error?
I tried changing the path of the Parquet file, but that did not help.
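The argument names in the error, `instance` and `token`, match the constructor of the `DatabricksFileSystem` class in fsspec, which accesses DBFS over the REST API. This suggests, though it is not confirmed here, that the `dbfs://` scheme is being routed to that class without credentials. One path form commonly used on Databricks is a `file://` URL that goes through the local `/dbfs` FUSE mount. The following is a minimal sketch under that assumption; `to_fuse_url` and its `mount_point` parameter are hypothetical names and not part of Petastorm or Databricks.

```python
from urllib.parse import urlparse


def to_fuse_url(dbfs_url, mount_point="/dbfs"):
    """Rewrite a dbfs: URL as a file:// URL under the local FUSE mount.

    Hypothetical helper; assumes DBFS is mounted locally at `mount_point`.
    """
    parts = urlparse(dbfs_url)
    if parts.scheme != "dbfs":
        raise ValueError(f"expected a dbfs: URL, got {dbfs_url!r}")
    # In dbfs://output/x the first segment is parsed as a host, so rejoin it
    # with the path and drop any empty segments.
    segments = [s for s in (parts.netloc + parts.path).split("/") if s]
    return "file://" + mount_point + "/" + "/".join(segments)


print(to_fuse_url("dbfs://output/scaled.parquet"))
# file:///dbfs/output/scaled.parquet
```

The resulting URL would be passed to make_batch_reader in place of the `dbfs://` one. Whether the mount is available depends on the cluster configuration, so treat this as something to try, not a guaranteed fix.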
The SAS token used in the code can be generated for your container with the following steps:
Generate SAS