(This question is about strategy and a high-level approach to data refinement, not programming, so if it is off-topic... sorry in advance, but I couldn't find a better Stack Exchange community.)
So, we are in a (typical) scenario in which new data are introduced by a multitude of users (bottom-up contribution) and periodically refined, corrected, categorized and enriched by moderators/administrators/trusted users (top-down refining).
This scenario is quite common on websites (Stack Exchange tags are a good example).
Is there a "best strategy" to minimize effort and maximize data quality?
Here are some of my doubts:
- Force data to pass a validation process before publication, or let them populate the system immediately (accepting a certain degree of incorrectness/inconsistency) and fix/enrich the most popular entries as problems arise (a rough sketch of such a review queue follows this list).
- Pre-fill the system top-down with as much data as you can, anticipating the bottom-up contributions.
- Help bottom-up entries stay consistent with the existing data (autocomplete and did-you-mean boxes for the user; see the second sketch below).
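
To make the first option concrete, here is a minimal sketch of a popularity-driven review queue, assuming a Python backend; `ReviewQueue`, `submit`, and `next_to_review` are hypothetical names, and a real system would re-prioritize entries as their view counts change:

```python
import heapq

class ReviewQueue:
    """Hypothetical moderation queue: unvalidated entries go live
    immediately, but moderators review the most popular ones first."""

    def __init__(self):
        self._heap = []    # max-heap via negated popularity
        self._counter = 0  # tie-breaker so heapq never compares entries directly

    def submit(self, entry: str, popularity: int = 0) -> None:
        # New bottom-up entries enter the system right away (accepting
        # some inconsistency) and are queued for later top-down review.
        heapq.heappush(self._heap, (-popularity, self._counter, entry))
        self._counter += 1

    def next_to_review(self) -> str | None:
        # Moderators always see the most popular unreviewed entry first,
        # so correction effort concentrates where errors are most visible.
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

queue = ReviewQueue()
queue.submit("pyton", popularity=120)  # a misspelled but popular tag
queue.submit("obscure-topic", popularity=2)
print(queue.next_to_review())          # 'pyton' gets moderator attention first
```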
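
And for the third option, a minimal did-you-mean helper built only on Python's standard-library `difflib`; `KNOWN_TAGS`, `suggest_tags`, and the 0.6 cutoff are assumptions for illustration, not part of any existing system:

```python
import difflib

# Hypothetical set of curated tags; in a real system this would come
# from the database of moderator-approved entries.
KNOWN_TAGS = ["data-cleaning", "data-quality", "crowdsourcing", "moderation", "taxonomy"]

def suggest_tags(user_input: str, known_tags: list[str], max_suggestions: int = 3) -> list[str]:
    """Return up to `max_suggestions` known tags similar to the user's input.

    Uses difflib's similarity ratio; a cutoff of 0.6 keeps only reasonably
    close matches, so genuinely new tags fall through untouched.
    """
    normalized = user_input.strip().lower().replace(" ", "-")
    return difflib.get_close_matches(normalized, known_tags, n=max_suggestions, cutoff=0.6)

if __name__ == "__main__":
    # A near-miss is steered toward the existing curated tag.
    print(suggest_tags("data cleanin", KNOWN_TAGS))    # ['data-cleaning']
    print(suggest_tags("crowd-sourcing", KNOWN_TAGS))  # ['crowdsourcing']
```

The idea is to reduce downstream moderation work by catching near-duplicates at input time, rather than merging them after the fact.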