Sorry for the very vague title, but I am a little new to this area. Let me try to explain my question.
I have 5 different pages, each containing 50 questions, so 250 questions in total. Looking closely, I can see that some of them repeat. Each question on a page can be linked to its source and/or characteristics, like:
1. What is natural selection?
- Subject -> Biology
- Chapter -> Evolution
- Sub-Chapter -> Natural Selection
- points -> 4
and some others.
So how can I store these questions in a form that lets me attach those tags, and then later run a program to find the most repeated questions, the chapters contributing the most points, or the occurrence trend of a certain question across those 5 papers?
XML? RDF? The semantic web?
Please point me in the right direction: what should I learn or do to convert those questions from physical papers into something semantic enough to be read by a web app?
And please ask if anything in the question is unclear.
XML or JSON would be good formats to use if you want to process the data using another program. Most languages have good libraries for parsing both formats.
There are two ways you could organize the data in either format: hierarchical and tagged. Here are some examples of how you could represent it:
XML hierarchical:
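As a rough sketch of what a hierarchical XML layout might look like, here the nesting itself carries the subject, chapter and sub-chapter. The element and attribute names (`papers`, `paper`, `subject`, `chapter`, `subchapter`, `question`, `points`) are my own assumptions, not a standard schema, and the Python around it only shows that the structure can be read back with the standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical hierarchical layout: nesting encodes subject > chapter > sub-chapter.
HIERARCHICAL_XML = """
<papers>
  <paper id="1">
    <subject name="Biology">
      <chapter name="Evolution">
        <subchapter name="Natural Selection">
          <question points="4">What is natural selection?</question>
        </subchapter>
      </chapter>
    </subject>
  </paper>
</papers>
"""

def flatten_hierarchical(xml_text):
    """Walk the nesting and return one flat dict per question."""
    root = ET.fromstring(xml_text)
    rows = []
    for paper in root.findall("paper"):
        for subject in paper.findall("subject"):
            for chapter in subject.findall("chapter"):
                for sub in chapter.findall("subchapter"):
                    for q in sub.findall("question"):
                        rows.append({
                            "paper": paper.get("id"),
                            "subject": subject.get("name"),
                            "chapter": chapter.get("name"),
                            "subchapter": sub.get("name"),
                            "points": int(q.get("points")),
                            "text": q.text.strip(),
                        })
    return rows

hier_rows = flatten_hierarchical(HIERARCHICAL_XML)
```

Each shared value (subject, chapter) is written once per group, which keeps the file compact, but the reader has to walk every level to get at a question.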
XML tags:
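A tagged layout could look something like the following, where every question is a flat element that carries all of its own metadata as attributes. Again, the attribute names and the second sample question are made up for illustration. The small function is one possible way to find the most repeated question, assuming repeats are worded identically.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical tagged layout: one flat element per question, metadata as attributes.
TAGGED_XML = """
<questions>
  <question paper="1" subject="Biology" chapter="Evolution"
            subchapter="Natural Selection" points="4">What is natural selection?</question>
  <question paper="2" subject="Biology" chapter="Evolution"
            subchapter="Natural Selection" points="4">What is natural selection?</question>
  <question paper="2" subject="Biology" chapter="Genetics"
            subchapter="Inheritance" points="2">What is a gene?</question>
</questions>
"""

def most_repeated(xml_text, n=1):
    """Return the n most frequent question texts with their counts."""
    root = ET.fromstring(xml_text)
    # Exact text match only; differently worded repeats would need normalising first.
    counts = Counter(q.text.strip() for q in root.iter("question"))
    return counts.most_common(n)

top_questions = most_repeated(TAGGED_XML)
```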
The second will be easier to parse, but contains more redundant information. There are also many other ways you could organize the data.
JSON hierarchical:
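A hierarchical JSON version might nest objects in the same way. The key names below (`papers`, `subjects`, `text`, `points`) are assumptions of mine, and the function sketches one way to total the points each chapter contributes.

```python
import json
from collections import defaultdict

# Hypothetical hierarchical layout: subject > chapter > sub-chapter > list of questions.
HIERARCHICAL_JSON = """
{
  "papers": [
    {
      "id": 1,
      "subjects": {
        "Biology": {
          "Evolution": {
            "Natural Selection": [
              {"text": "What is natural selection?", "points": 4}
            ]
          }
        }
      }
    }
  ]
}
"""

def points_per_chapter(json_text):
    """Sum question points for every (subject, chapter) pair."""
    data = json.loads(json_text)
    totals = defaultdict(int)
    for paper in data["papers"]:
        for subject, chapters in paper["subjects"].items():
            for chapter, subchapters in chapters.items():
                for questions in subchapters.values():
                    for q in questions:
                        totals[(subject, chapter)] += q["points"]
    return dict(totals)

chapter_points = points_per_chapter(HIERARCHICAL_JSON)
```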
JSON tags:
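A tagged JSON version could simply be a flat list of records, one per question, each repeating its own tags. The field names and the extra sample records are illustrative assumptions. The function shows one possible way to get the occurrence trend of a question across papers, again assuming identical wording.

```python
import json
from collections import Counter

# Hypothetical tagged layout: a flat list where each record carries all its tags.
TAGGED_JSON = """
[
  {"paper": 1, "text": "What is natural selection?", "subject": "Biology",
   "chapter": "Evolution", "subchapter": "Natural Selection", "points": 4},
  {"paper": 3, "text": "What is natural selection?", "subject": "Biology",
   "chapter": "Evolution", "subchapter": "Natural Selection", "points": 4},
  {"paper": 3, "text": "What is a gene?", "subject": "Biology",
   "chapter": "Genetics", "subchapter": "Inheritance", "points": 2}
]
"""

def occurrences_by_paper(json_text, question_text):
    """Count how often a question appears in each paper, ordered by paper number."""
    records = json.loads(json_text)
    counts = Counter(r["paper"] for r in records if r["text"] == question_text)
    return dict(sorted(counts.items()))

trend = occurrences_by_paper(TAGGED_JSON, "What is natural selection?")
```

Because every record is self-contained, filtering and counting need only a single loop, at the cost of repeating the subject and chapter on each question.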