How to create data types dynamically depending on the input


Update to my original question

I did not describe my problem completely before; sorry about that. Here is the real problem:

I have a txt file containing patent data, such as:

1/1523 DWPI

AP - JP29446999A 19991015

PN - JP2000188399 A 20000704 DW200044 JP4568930B2 B2 20101027 DW201071

AN - 2000495116

PA - (NPDE ) DENSO CORP

PR - JP1998000295406 19981016

MC - U11-C18A3,U12-D02A

OPD - 1998.10.16

ICAI - H01L29/12,H01L29/78,H01L21/265,H01L21/336

TI - Planar type metal oxide semiconductor field effect transistor

AB - <-contents eliminated for simplicity->

CPY - NPDE

FN - JP2000188399

There are 1523 items in a similar format. I want to analyze the patent data, so I first have to parse it. I have defined a data type for every field, such as:

data AP = AP String Day String

data PN = PN String Day String

data AN = AN String

data PD = PD Day

....  -- many other data types are not shown just for simplicity.

I have written a parser for every field with megaparsec, such as apField, pnField, anField, etc.

However, not every record has the same fields. For example, the 2nd item may contain only the fields AP, PN, PA, PR, OPD, TI, AB, CPY and FN, with AN, MC and ICAI missing. Besides, different people may be interested in different fields: someone might export a txt file whose records contain only the fields AP, PN, PA, OPD and CPY.

Now I want to write generic code that can parse records containing whichever fields the user is interested in, and write the parsing results into a SQLite database.

For example, if I want to parse records with the fields AP, PN, PA, OPD and CPY, I can construct a record parser from the input, such as toParser "ap,pn,pa,opd,cpy" or toParser "ap,pa,cpy"; this part I have figured out. The parsed result should be Record AP PN PA OPD CPY or Record AP PA CPY respectively. Then I'd like to write the parsed results into a database. Since every record in the data corresponds to a Record data type, and the set of fields to be parsed may differ, I have to construct a Record data type with different fields depending on the user's input. This is where I am stuck.

I can work around it by defining every field as data Field = Field [String] and the record as data Record = Record [Field]. However, I want more control over the types, such as a date as a Day and an id number as an Int.

If constructing a Record data type with different fields depending on the input is impossible, maybe there are other ways to solve my problem. I appreciate any advice! And sorry for the long description and for my ambiguous wording before.

Best answer

Well, if I got your question right: no, you can't write a single function that returns different data types depending on the value of its input. What you can do, however, is write a function that returns a single data type that can be constructed in different ways depending on the input, for example:

data PatentRecord = PN String Day String
                  | AN String
                  | PD Day

Now you can write a function such as parseRecord :: String -> Maybe PatentRecord, which parses your input and, depending on what it matches, returns a PatentRecord built with the PN constructor, the AN constructor, and so on.
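To make the idea concrete, here is a minimal sketch of such a function. It is not the asker's megaparsec code: it splits each line with `words` so that it needs only `base` and `time`. The type name `PatentField`, the constructor shapes, and the helpers `parseCompactDay` and `parseDottedDay` are assumptions made for illustration, based on the sample lines in the question.

```haskell
import Data.Time.Calendar (Day, fromGregorianValid)
import Text.Read (readMaybe)

-- One sum type covering every kind of field. The constructor shapes are
-- illustrative assumptions, not the asker's real definitions.
data PatentField
  = AP String Day   -- e.g. "AP - JP29446999A 19991015"
  | AN String       -- e.g. "AN - 2000495116"
  | OPD Day         -- e.g. "OPD - 1998.10.16"
  deriving (Show, Eq)

-- Parse a date written as YYYYMMDD; invalid calendar dates give Nothing.
parseCompactDay :: String -> Maybe Day
parseCompactDay s
  | length s == 8 = do
      y <- readMaybe (take 4 s)
      m <- readMaybe (take 2 (drop 4 s))
      d <- readMaybe (drop 6 s)
      fromGregorianValid y m d
  | otherwise = Nothing

-- Parse a date written as YYYY.MM.DD by checking the dot positions
-- and delegating to parseCompactDay.
parseDottedDay :: String -> Maybe Day
parseDottedDay [a, b, c, d, '.', e, f, '.', g, h] =
  parseCompactDay [a, b, c, d, e, f, g, h]
parseDottedDay _ = Nothing

-- Parse one "TAG - value" line into the matching constructor.
parseField :: String -> Maybe PatentField
parseField line =
  case words line of
    ["AP", "-", number, date] -> AP number <$> parseCompactDay date
    ["AN", "-", number]       -> Just (AN number)
    ["OPD", "-", date]        -> OPD <$> parseDottedDay date
    _                         -> Nothing
```

Every line yields a value of the same type, so the caller can pattern match on the constructor to find out which field was parsed while still getting a real `Day` for the date fields.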

PS: Implementation tip: prefer Either SomeErrorType over Maybe, so that a parse failure carries richer information ;-)
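As a rough sketch of that tip, the variant below returns `Either` with a small error type. `ParseError` and its constructors, `PatentField`, and `parseDottedDay` are hypothetical names chosen for this example, and only two fields are handled to keep it short. It also shows how `traverse` turns a list of lines into a list of typed fields, which is one way to represent a record whose set of fields is only known at run time.

```haskell
import Data.Time.Calendar (Day, fromGregorianValid)
import Text.Read (readMaybe)

-- Hypothetical, reduced field type: only two fields for brevity.
data PatentField = AN String | OPD Day
  deriving (Show, Eq)

-- Hypothetical error type describing why a line was rejected.
data ParseError
  = UnknownTag String      -- the tag is not one we know how to parse
  | MalformedLine String   -- the line does not have the expected shape
  | BadDate String         -- the date text is not a valid calendar date
  deriving (Show, Eq)

-- Parse a date written as YYYY.MM.DD.
parseDottedDay :: String -> Maybe Day
parseDottedDay [a, b, c, d, '.', e, f, '.', g, h] = do
  y  <- readMaybe [a, b, c, d]
  m  <- readMaybe [e, f]
  dd <- readMaybe [g, h]
  fromGregorianValid y m dd
parseDottedDay _ = Nothing

-- Parse one "TAG - value" line, reporting the reason on failure.
parseField :: String -> Either ParseError PatentField
parseField line =
  case words line of
    ["AN", "-", number] -> Right (AN number)
    ["OPD", "-", date]  -> maybe (Left (BadDate date)) (Right . OPD) (parseDottedDay date)
    (tag : "-" : _)
      | tag `elem` ["AN", "OPD"] -> Left (MalformedLine line)
      | otherwise                -> Left (UnknownTag tag)
    _ -> Left (MalformedLine line)

-- A record is just the list of fields that were present; parsing stops
-- at the first line that fails and returns its error.
parseRecord :: [String] -> Either ParseError [PatentField]
parseRecord = traverse parseField
```

With this shape, a record exported with only some of the fields is simply a shorter list, and the code that writes to the database can match on each constructor to decide which column to fill.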