I'm not sure whether this is the right place to post my question, but I thought I'd give it a go.
I'm working on a project where I take text data from a public knowledge base and want to use this text to automatically expand tag-based search queries with additional terms that should be relevant to the original query. The public knowledge base is basically a collection of data from Wikipedia; in my case, the abstracts of 3.74 million articles.
In the beginning I simply ran a search for the original query, collected the words used in the articles that matched it, and did a simple term frequency calculation to get the N most used terms.
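To make that first approach concrete, here is a minimal sketch of it in Python. Everything in it is illustrative rather than my actual code: the three sample abstracts, the `STOPWORDS` set, the "an abstract matches if it contains any query term" rule, and the names `tokenize` and `expand_query` are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stand-ins for the article abstracts (the real collection is far larger).
ABSTRACTS = [
    "Python is a high-level programming language.",
    "A python is a large nonvenomous snake.",
    "Java is a programming language and computing platform.",
]

# Assumed stopword list; without one, words like "is" and "a" dominate the counts.
STOPWORDS = {"a", "an", "and", "is", "of", "the"}


def tokenize(text):
    """Lowercase the text and split it into runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def expand_query(query, abstracts, n=3):
    """Return the n most frequent non-query terms from the matching abstracts."""
    query_terms = set(tokenize(query))
    counts = Counter()
    for abstract in abstracts:
        tokens = tokenize(abstract)
        # Assumed matching rule: the abstract shares at least one term with the query.
        if query_terms & set(tokens):
            counts.update(
                t for t in tokens
                if t not in query_terms and t not in STOPWORDS
            )
    return [term for term, _ in counts.most_common(n)]


if __name__ == "__main__":
    print(expand_query("programming", ABSTRACTS, n=1))
```

With this toy data, a query for "python" pulls in terms from both the programming-language abstract and the snake abstract, because raw term counts have no notion of which sense of the word was meant.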
It seemed like a simple idea that worked at first, but as I tested more queries I started running into problems. It's clear that I need some kind of semantic analysis of my custom text collection, but I have no idea where to even begin with something like this. All the tools I find online that are supposed to do this kind of semantic analysis only work on a predefined collection of texts. As stated: I need something that can process a custom collection and build an index that I can later run searches against.
Any ideas or suggestions?