I have a Neo4j FULLTEXT INDEX with ~60k records (keywords). This is my keyword vocabulary. I need to extract every keyword that is present in this index from different input texts. Is this possible to implement with Neo4j, Cypher, or APOC?
UPDATED
For example there is a text:
Looking for Apache Spark expert to coach me on the core concepts of optimizing the parallelism of Spark using Scala and OpenAcc programming model.
The mentor must have comprehensive hands-on knowledge of Big Data analytics in large scale of data (especially Spark and GPU programming) to design the software tool with sample data analysis using Scala language and OpenAcc directives.
In the Neo4j database with FULLTEXT INDEX I have the following keywords:
apache-spark
scala
gpu
I need to extract from the text above
Apache Spark
Scala
GPU
Generally, a full-text index is meant for the opposite use case: storing the texts in the index and matching keywords against them. Nevertheless:
Poor Man's Solution
Query the index with your text. For example, given the following setup
Use your text as the search query
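A minimal sketch of this step in Python, not the original snippet. It assumes a full-text index named keywords over the name property of Keyword nodes (both names are chosen here for illustration) and escapes Lucene's reserved characters first, since raw text containing parentheses or colons could otherwise be read as query syntax. The Cypher is kept as plain strings so the block runs without a database.

```python
import re

# Hypothetical setup: the index name "keywords" and the label/property
# Keyword.name are assumptions made for this illustration.
SETUP_STATEMENTS = [
    "CREATE FULLTEXT INDEX keywords FOR (k:Keyword) ON EACH [k.name]",
    "UNWIND ['apache-spark', 'scala', 'gpu'] AS name MERGE (:Keyword {name: name})",
]

# The whole input text is passed as the Lucene query through the $q parameter.
SEARCH_QUERY = (
    "CALL db.index.fulltext.queryNodes('keywords', $q) "
    "YIELD node, score "
    "RETURN node.name AS keyword, score"
)

# Characters that Lucene's classic query parser treats as syntax.
_LUCENE_SPECIAL = re.compile(r'([\\+\-!():^\[\]"{}~*?|&/])')


def lucene_escape(text: str) -> str:
    """Prefix every reserved Lucene character with a backslash.

    Note: uppercase AND / OR / NOT are also operators and are not handled here.
    """
    return _LUCENE_SPECIAL.sub(r"\\\1", text)


if __name__ == "__main__":
    text = "Big Data analytics (especially Spark and GPU programming)"
    print(lucene_escape(text))
```

With the official Neo4j Python driver, something like session.run(SEARCH_QUERY, q=lucene_escape(text)) should send it, but treat that call as an assumption about your environment rather than part of the answer.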
Since a Lucene query by default combines all tokens of the text with an OR operator, it will work.
Result :
Limitations :
This relies on the OR operator, so while it works here, be aware that when you index the keywords, a keyword like apache-spark actually produces two tokens in the index, namely apache and spark. That keyword would therefore also be returned if your text contained Apache Age.
Alternative solution
Do it the other way around. The process would be :
These are the Lucene queries produced
Result
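To make the difference between the two approaches concrete, here is a pure-Python simulation rather than Neo4j code. It assumes a rough stand-in for the analyzer (lowercase, split on non-alphanumeric characters) and assumes that, in the reversed approach, each keyword is turned into a quoted phrase query. Both are illustrative assumptions; the helper names are hypothetical and real Lucene analysis and scoring are more involved.

```python
import re


def tokenize(text):
    """Approximate analyzer: lowercase and split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def or_match(text, keywords):
    """Text used as the query: a keyword hits if it shares ANY token with the text."""
    text_tokens = set(tokenize(text))
    return [k for k in keywords if text_tokens & set(tokenize(k))]


def phrase_query(keyword):
    """Assumed query form for the reversed approach, e.g. apache-spark -> "apache spark"."""
    return '"' + " ".join(tokenize(keyword)) + '"'


def phrase_match(text, keywords):
    """Keyword used as the query: all its tokens must appear consecutively in the text."""
    tokens = tokenize(text)
    hits = []
    for keyword in keywords:
        kt = tokenize(keyword)
        n = len(kt)
        if n and any(tokens[i:i + n] == kt for i in range(len(tokens) - n + 1)):
            hits.append(keyword)
    return hits


if __name__ == "__main__":
    keywords = ["apache-spark", "scala", "gpu"]
    for sample in ("Looking for Apache Spark expert using Scala and GPU programming",
                   "Apache Age is a graph extension"):
        print(sample)
        print("  OR match:    ", or_match(sample, keywords))
        print("  phrase match:", phrase_match(sample, keywords))
```

In this simulation the Apache Age text still triggers apache-spark under OR matching but not under phrase matching. The trade-off is that phrase matching misses texts that mention the words apart or in a different order.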
Summary
In my opinion, there is no truly bulletproof solution.