How to convert a physical document to a semantic document that can be read by a web application

Sorry for the very vague title, but I am a little new to this area. Let me try to explain my question and what I am curious about.

I have 5 different papers, each containing 50 questions, so 250 questions in total. Looking closely, I can see that some of them repeat, and each question can be linked to its source and/or characteristics, like:

1. What is natural selection?

  • Subject -> Biology
  • Chapter -> Evolution
  • Sub-Chapter -> Natural Selection
  • Points -> 4

and some others.

So how can I store these questions in a form that lets me attach those tags, and then later run a program to find the most repeated questions, the chapters contributing the most points, or the occurrence trend of a certain question across those 5 papers?

XML? RDF? Semantic web?

Please point me in the right direction: what should I learn or do to convert those questions from physical papers into something semantic enough to be read by a web app?

And please ask if anything in the question is unclear.

Best answer

XML or JSON would be good formats to use if you want to process the data using another program. Most languages have good libraries for parsing both formats.

There are two ways you could organize the data in either format: hierarchical or tagged. Here are some examples of how you could represent it:

XML hierarchical:

<document>
  <subject name="Biology">
    <chapter name="Evolution">
      <subChapter name="Natural Selection">
        <question points="4">Some question</question>
      </subChapter>
    </chapter>
  </subject>
</document>

XML tags:

<document>
  <question>
    <content>Some question</content>
    <subject>Biology</subject>
    <chapter>Evolution</chapter>
    <subChapter>Natural Selection</subChapter>
    <points>4</points>
  </question>
</document>

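The second will be easier to parse, because every question is a self-contained record, but it contains more redundant information, since the subject, chapter, and sub-chapter are repeated for each question. There are also many other ways you could organize the data.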
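As a rough illustration of that trade-off, here is a minimal Python sketch that reads both XML layouts with the standard library's xml.etree.ElementTree. The sample strings mirror the two layouts above; the function names parse_hierarchical and parse_tagged are made up for this example, and both return the same flat list of dictionaries.

```python
import xml.etree.ElementTree as ET

# Sample data mirroring the two XML layouts shown above.
XML_HIERARCHICAL = """
<document>
  <subject name="Biology">
    <chapter name="Evolution">
      <subChapter name="Natural Selection">
        <question points="4">Some question</question>
      </subChapter>
    </chapter>
  </subject>
</document>
"""

XML_TAGGED = """
<document>
  <question>
    <content>Some question</content>
    <subject>Biology</subject>
    <chapter>Evolution</chapter>
    <subChapter>Natural Selection</subChapter>
    <points>4</points>
  </question>
</document>
"""


def parse_hierarchical(xml_text):
    """Walk subject -> chapter -> subChapter -> question (hypothetical helper)."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for subject in root.findall("subject"):
        for chapter in subject.findall("chapter"):
            for sub_chapter in chapter.findall("subChapter"):
                for question in sub_chapter.findall("question"):
                    records.append({
                        "question": question.text,
                        "subject": subject.get("name"),
                        "chapter": chapter.get("name"),
                        "subChapter": sub_chapter.get("name"),
                        "points": int(question.get("points")),
                    })
    return records


def parse_tagged(xml_text):
    """Each <question> already carries all of its tags (hypothetical helper)."""
    root = ET.fromstring(xml_text.strip())
    return [
        {
            "question": question.findtext("content"),
            "subject": question.findtext("subject"),
            "chapter": question.findtext("chapter"),
            "subChapter": question.findtext("subChapter"),
            "points": int(question.findtext("points")),
        }
        for question in root.findall("question")
    ]


hierarchical_records = parse_hierarchical(XML_HIERARCHICAL)
tagged_records = parse_tagged(XML_TAGGED)
print(hierarchical_records == tagged_records)
```

The tagged parser needs a single loop, while the hierarchical one needs a nested loop per level, which is roughly what "easier to parse" means in practice.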

JSON hierarchical:

{
  "Biology": {
    "Evolution": {
      "Natural Selection": [
        {"question": "Some Question", "points": 4},
        {"question": "Some other Question", "points": 2}
      ]
    }
  }
}
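If the data is stored hierarchically, it can still be turned into tagged records when it is time to analyze it. The sketch below assumes exactly the three nesting levels shown above (subject, chapter, sub-chapter); the flatten function is a hypothetical helper, not part of any library.

```python
import json

# Same structure as the hierarchical JSON example above.
HIERARCHICAL_JSON = """
{
  "Biology": {
    "Evolution": {
      "Natural Selection": [
        {"question": "Some Question", "points": 4},
        {"question": "Some other Question", "points": 2}
      ]
    }
  }
}
"""


def flatten(tree):
    """Convert subject -> chapter -> sub-chapter nesting into flat records."""
    records = []
    for subject, chapters in tree.items():
        for chapter, sub_chapters in chapters.items():
            for sub_chapter, questions in sub_chapters.items():
                for question in questions:
                    records.append({
                        "question": question["question"],
                        "subject": subject,
                        "chapter": chapter,
                        "subChapter": sub_chapter,
                        "points": question["points"],
                    })
    return records


records = flatten(json.loads(HIERARCHICAL_JSON))
print(json.dumps(records, indent=2))
```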

JSON tags:

[
  {
    "question": "Some Question",
    "subject": "Biology",
    "chapter": "Evolution",
    "subChapter": "Natural Selection",
    "points": 4
  }
]
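Once the questions are in the tagged JSON form, the analyses asked about in the question take only a few lines. The following is a minimal sketch, not a definitive implementation: the sample questions are invented, the "paper" field is an assumption added here so that occurrences can be tracked per paper, and normalize is a crude hypothetical helper that only ignores case and extra whitespace when deciding whether two questions are the same.

```python
import json
from collections import Counter, defaultdict

# Invented sample data. The "paper" field is an assumption; it is not part of
# the format shown above, but something like it is needed to track trends.
QUESTIONS_JSON = """
[
  {"paper": 1, "question": "What is natural selection?", "subject": "Biology",
   "chapter": "Evolution", "subChapter": "Natural Selection", "points": 4},
  {"paper": 2, "question": "What is natural selection?", "subject": "Biology",
   "chapter": "Evolution", "subChapter": "Natural Selection", "points": 4},
  {"paper": 2, "question": "Define osmosis.", "subject": "Biology",
   "chapter": "Cell Biology", "subChapter": "Transport", "points": 2},
  {"paper": 3, "question": "what is  natural selection? ", "subject": "Biology",
   "chapter": "Evolution", "subChapter": "Natural Selection", "points": 5}
]
"""


def normalize(text):
    """Lowercase and collapse whitespace so near-identical wording matches."""
    return " ".join(text.lower().split())


questions = json.loads(QUESTIONS_JSON)

# How often does each question appear across all papers?
question_counts = Counter(normalize(q["question"]) for q in questions)

# How many points does each chapter contribute in total?
points_by_chapter = defaultdict(int)
for q in questions:
    points_by_chapter[q["chapter"]] += q["points"]

# In which papers does each question occur?
papers_by_question = defaultdict(set)
for q in questions:
    papers_by_question[normalize(q["question"])].add(q["paper"])

most_common_question, occurrences = question_counts.most_common(1)[0]
print("Most repeated:", most_common_question, "x", occurrences)
print("Points per chapter:", dict(points_by_chapter))
print("Appears in papers:", sorted(papers_by_question[most_common_question]))
```

Real exam questions are often reworded between papers, so exact matching after normalization will miss some repeats; assigning each distinct question a manual ID while typing the papers in may be more reliable.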