How to generate Sindhi sentence-level embeddings


I am working on a project that needs sentence-level embeddings for Sindhi. I am using the available pretrained Word2Vec model, as described in its sample code. That sample only produces word-level embeddings, whereas I need one embedding for the entire sentence. Any pooling strategy, such as averaging, would be fine. However, I am running into issues with my pipeline:

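import sparknlp
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, WordEmbeddingsModel, SentenceEmbeddings

# Start a Spark session with Spark NLP loaded
spark = sparknlp.start()

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")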

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# Use WordEmbeddings instead of WordEmbeddingsModel

word_embeddings = WordEmbeddingsModel.pretrained("w2v_cc_300d","sd") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

# Use SentenceEmbeddings for obtaining sentence embeddings

sentence_embeddings = SentenceEmbeddings() \
    .setInputCols(["document", "word_embeddings"]) \
    .setOutputCol("sentence_embeddings") \
    .setPoolingStrategy("AVERAGE")

pipeline = Pipeline(stages=[documentAssembler, tokenizer, word_embeddings, sentence_embeddings])

data = spark.createDataFrame([["مون کي اسپارڪ اين ايل پي سان پيار آهي"]]).toDF("text")

result = pipeline.fit(data).transform(data)

# Extract the final embeddings
sentence_embeddings = result.select("sentence_embeddings.result").first()[0]
print(sentence_embeddings)
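
For reference, the "Average" strategy mentioned above is generally understood as the element-wise mean of a sentence's word vectors. The following is a minimal pure-Python sketch of that idea, not Spark NLP's actual implementation; `average_pool` and the toy 3-dimensional vectors are hypothetical stand-ins for the 300-dimensional `w2v_cc_300d` vectors.

```python
from typing import List, Sequence


def average_pool(word_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Return the element-wise mean of equally sized word vectors."""
    if not word_vectors:
        raise ValueError("need at least one word vector")
    dim = len(word_vectors[0])
    if any(len(v) != dim for v in word_vectors):
        raise ValueError("all word vectors must have the same dimension")
    n = len(word_vectors)
    # Average each dimension across all words in the sentence
    return [sum(v[i] for v in word_vectors) / n for i in range(dim)]


# Toy vectors (hypothetical values), one per token of a two-word sentence
toy_vectors = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
sentence_vector = average_pool(toy_vectors)
print(sentence_vector)
```

The result has the same dimension as each word vector, so a sentence of any length maps to one fixed-size vector.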

The error is: (screenshot not included)
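
Without the error text the cause cannot be confirmed, but one mismatch is visible in the code above: the word-embedding stage writes to a column named "embeddings", while the sentence-embedding stage reads from "word_embeddings". The sketch below is a plain-Python check of that wiring, assuming each stage can be summarized as (name, input columns, output column). The helper `find_missing_inputs` is hypothetical and is not part of Spark NLP.

```python
from typing import Dict, List, Tuple

Stage = Tuple[str, List[str], str]  # (stage name, input columns, output column)


def find_missing_inputs(stages: List[Stage], source_columns: List[str]) -> Dict[str, List[str]]:
    """Map each stage to the input columns that no earlier stage produced."""
    available = set(source_columns)
    missing: Dict[str, List[str]] = {}
    for name, inputs, output in stages:
        absent = [col for col in inputs if col not in available]
        if absent:
            missing[name] = absent
        # The stage's output becomes available to every later stage
        available.add(output)
    return missing


# Column names copied from the pipeline in the question
question_stages = [
    ("documentAssembler", ["text"], "document"),
    ("tokenizer", ["document"], "token"),
    ("word_embeddings", ["document", "token"], "embeddings"),
    ("sentence_embeddings", ["document", "word_embeddings"], "sentence_embeddings"),
]

problems = find_missing_inputs(question_stages, ["text"])
print(problems)
```

If this mismatch is indeed the problem, making the two names agree (for example, passing ["document", "embeddings"] to setInputCols on the SentenceEmbeddings stage) would be the first thing to try.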
