Apparently, as of Spark 2.4 the LSHModel in MLlib supports Spark Structured Streaming (https://issues.apache.org/jira/browse/SPARK-24465).
However, it's not clear to me how. For instance, could an approxSimilarityJoin transformation from MinHashLSH (https://spark.apache.org/docs/latest/ml-features#lsh-operations) be applied directly to a streaming DataFrame?
I can't find any more information about it online. Could someone help me?
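You need to persist the fitted model (modelFitted) somewhere accessible to your streaming job. This is done outside of your streaming job.
Inside the streaming job, load the model and transform the streaming DataFrame (df) with it.
It might be necessary to first get the streaming DataFrame into the correct format to be used in the model prediction.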
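
The flow described above (fit and save the model outside the streaming job, then load it and apply it to the streaming DataFrame) could be sketched in PySpark roughly as follows. This is only a minimal sketch under assumptions: the model path, the toy training rows, the column names "features" and "hashes", and the helper names fit_and_save and hash_stream are all hypothetical and not taken from the question. It also assumes the streaming DataFrame already has a vector column named "features".

```python
# Hypothetical location; any storage both jobs can reach would do.
MODEL_PATH = "/tmp/minhash-lsh-model"


def fit_and_save(spark, path=MODEL_PATH):
    """Batch step, run outside the streaming job: fit MinHashLSH and persist it."""
    # Imports are local so that defining these helpers does not require Spark.
    from pyspark.ml.feature import MinHashLSH
    from pyspark.ml.linalg import Vectors

    # Toy static training data (an assumption for illustration only).
    training = spark.createDataFrame(
        [
            (0, Vectors.sparse(6, [0, 1, 2], [1.0, 1.0, 1.0])),
            (1, Vectors.sparse(6, [2, 3, 4], [1.0, 1.0, 1.0])),
        ],
        ["id", "features"],
    )
    lsh = MinHashLSH(inputCol="features", outputCol="hashes", numHashTables=3)
    model_fitted = lsh.fit(training)
    model_fitted.write().overwrite().save(path)
    return model_fitted


def hash_stream(streaming_df, path=MODEL_PATH):
    """Streaming step: load the saved model and apply it to a streaming DataFrame.

    streaming_df is assumed to already contain a vector column "features".
    """
    from pyspark.ml.feature import MinHashLSHModel

    model = MinHashLSHModel.load(path)
    # transform() adds the "hashes" column; the result is still a streaming
    # DataFrame and can be written out with writeStream as usual.
    return model.transform(streaming_df)
```

The sketch only covers transform on the streaming side. Whether approxSimilarityJoin itself accepts a streaming DataFrame is not demonstrated here and would need to be checked against your Spark version.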