I have an API that returns most_similar_approx results from a Magnitude model. The model is built from the native Word2Vec format with 50 dimensions and 50 trees; the .magnitude file is close to 350 MB and contains approximately 350,000 tokens. While load testing this API I observed that performance deteriorates as I increase the topn value passed to most_similar_approx, and I need a high number of similar tokens for downstream activities. With topn=150 I get a throughput of about 500 transactions per second from the API; reducing it gradually, I get about 800 transactions per second with topn=50 and roughly 1,300 with topn=10. The server instance is not under any memory or CPU load; I am using a c5.xlarge AWS EC2 instance.
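For reference, this is roughly how the API loads and queries the model (the path and query token below are placeholders, not my actual values):

    from pymagnitude import Magnitude

    # Load the pre-built .magnitude file once, at API startup
    vectors = Magnitude("/models/word2vec_50d.magnitude")

    # The call whose throughput drops as topn grows; topn=150 is what the
    # downstream steps need, topn=50 and topn=10 were only tried for comparison.
    similar = vectors.most_similar_approx("example_token", topn=150)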
Is there any way I can tune the model to improve performance for a high topn value? My aim is to obtain most_similar tokens from word embeddings, and pymagnitude was the most recommended option I found. Are there any similarly high-performing alternatives?
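For completeness, the .magnitude file was built from the native Word2Vec binary roughly as follows (a sketch; the -a and -t flag names are as I recall them from the pymagnitude converter docs, so please check them against your installed version):

    import subprocess

    # Equivalent to running the pymagnitude converter from a shell: it produces
    # the ~350 MB .magnitude file with the approximate-similarity (Annoy) index
    # and the 50 trees mentioned above.
    subprocess.run([
        "python", "-m", "pymagnitude.converter",
        "-i", "word2vec_50d.bin",        # native Word2Vec binary, 50 dimensions
        "-o", "word2vec_50d.magnitude",  # output file consumed by the API
        "-a",                            # add the approximate nearest-neighbour index
        "-t", "50",                      # number of Annoy trees
    ], check=True)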