Infer the type information for arbitrary CSV files?


I want to use the following console program to get the type information (not the data) that the CSV type provider infers for a file. The file name will be passed as a command-line argument. However, it seems that CsvProvider<> only accepts a constant literal.

Is there a way to work around this? Is it possible to do it with an F# script? Or can the F# compiler service help?

Or is there any other project that does this?

open FSharp.Data
open Microsoft.FSharp.Collections
open System

[<Literal>] 
let fn = """C:\...\myfile.csv""" // Want to dynamically set the fn from arguments

[<EntryPoint>]
let main argv = 
    let myFile = CsvProvider<fn>.GetSample()
    // The following doesn't work
    let fn = argv.[0]
    let myFile = CsvProvider<fn>.GetSample()

    // code to get type information of myFile

There are 2 best solutions below

Best answer

As Tomas suggested, the InferColumnTypes function that F# Data exposes on CsvFile can be used to resolve the issue.

open System.Globalization // for CultureInfo
open FSharp.Data

let data = CsvFile.Load(....)
let inferredProperties =
    // InferColumnTypes : inferRows:int 
    // * missingValues:string [] 
    // * cultureInfo:CultureInfo 
    // * schema:string 
    // * assumeMissingValues:bool 
    // * preferOptionals:bool 
    // * ?unitsOfMeasureProvider:IUnitsOfMeasureProvider 
    // -> PrimitiveInferedProperty list
    data.InferColumnTypes(10000, [|""|], CultureInfo.InvariantCulture, "", false, true)

I am not sure what values the parameters should take, but the settings above seem to work.
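If InferColumnTypes is not reachable in a given F# Data version, a rough hand-rolled guess can stand in for it. The sketch below is not the library's inference algorithm: it is a simplified substitute that only tries int, float, and DateTime parsing under the invariant culture. The helper names guessType and guessColumnTypes and the inline sample data are made up for illustration, and the first line assumes an F# script run on a recent .NET SDK.

```fsharp
#r "nuget: FSharp.Data" // assumption: F# script on a recent .NET SDK
open System
open System.Globalization
open FSharp.Data

// Hypothetical helper: the first of int/float/DateTime that parses every value, else string.
let guessType (values: string list) =
    let inv = CultureInfo.InvariantCulture
    // An empty column never counts as a match, so it falls through to "string".
    let all (parses: string -> bool) = not values.IsEmpty && List.forall parses values
    if all (fun v -> fst (Int32.TryParse(v, NumberStyles.Integer, inv))) then "int"
    elif all (fun v -> fst (Double.TryParse(v, NumberStyles.Float, inv))) then "float"
    elif all (fun v -> fst (DateTime.TryParse(v, inv, DateTimeStyles.None))) then "DateTime"
    else "string"

// Hypothetical helper: pair each header with a guessed type, sampling at most maxRows rows.
let guessColumnTypes (maxRows: int) (csv: CsvFile) =
    let rows = csv.Rows |> Seq.truncate maxRows |> Seq.toList
    let headers = defaultArg csv.Headers [||]
    headers
    |> Array.mapi (fun i name -> name, guessType [ for row in rows -> row.[i] ])

// Made-up inline data; CsvFile.Load(path) would read a real file instead.
let sample = CsvFile.Parse("Date,High,Volume\n2024-01-02,10.5,1200\n2024-01-03,11.25,900")

for name, typeName in guessColumnTypes 1000 sample do
    printfn "%s: %s" name typeName
```

Unlike the library function, this sketch ignores missing values, optional columns, booleans, and units of measure, so it is only a starting point.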


I think you might be misunderstanding the purpose of the CSV type provider. The idea is that you have a representative sample of your data available at compile time, which guides the type inference. At runtime, you give it a file (possibly a different one) with the same format. This gives you a nice way of handling files with a known format.
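The intended workflow can be sketched as follows. The column names and values are made up for illustration, the sample is given as inline text rather than a file path, and the first line assumes an F# script run on a recent .NET SDK.

```fsharp
#r "nuget: FSharp.Data" // assumption: F# script on a recent .NET SDK
open FSharp.Data

// The sample has to be a compile-time literal; this one is a hypothetical inline snippet.
[<Literal>]
let Sample = "Date,High,Low\n2024-01-02,10.5,9.8"

type Prices = CsvProvider<Sample>

// At runtime, any text with the same columns can be parsed.
// Prices.Load(path) does the same for a path that is only known at runtime.
let prices = Prices.Parse("Date,High,Low\n2024-02-01,11.0,10.2\n2024-02-02,11.4,10.9")

for row in prices.Rows do
    // Date, High and Low are statically typed properties inferred from Sample.
    printfn "%A: high %A, low %A" row.Date row.High row.Low
```

The path passed to Load may come from a command-line argument; only the sample that defines the type has to be a literal.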

If you want to parse arbitrary CSV files (with different headers etc.), then the CSV type provider won't help. However, you can still use the CsvFile type from F# Data, which provides a simple CSV parser. Example from the documentation:

// Download the stock prices
let msft = CsvFile.Load("http://ichart.finance.yahoo.com/table.csv?s=MSFT")

// Print the prices in the HLOC format
for row in msft.Rows do
  printfn "HLOC: (%s, %s, %s)" (row.GetColumn "High") 
     (row.GetColumn "Low") (row.GetColumn "Date")

Here, you lose the nice static typing, but you can load a file with any format (and then dynamically look at the columns that are available in the file).
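Looking at the columns dynamically might go roughly like this. The helper name describe and the inline data are made up for illustration, and the first line assumes an F# script run on a recent .NET SDK.

```fsharp
#r "nuget: FSharp.Data" // assumption: F# script on a recent .NET SDK
open FSharp.Data

// Hypothetical helper: print the header names and the first few rows of any CsvFile.
let describe (csv: CsvFile) =
    match csv.Headers with
    | Some headers -> printfn "Columns: %s" (String.concat ", " headers)
    | None -> printfn "No header row"
    for row in csv.Rows |> Seq.truncate 3 do
        // row.Columns holds the raw string values of the row.
        printfn "%s" (String.concat " | " row.Columns)

// Made-up inline data so the sketch runs without a file.
describe (CsvFile.Parse("Name,Age\nAda,36\nGrace,45"))
```

In the console program from the question, CsvFile.Load(argv.[0]) would take the place of CsvFile.Parse, since Load accepts an ordinary runtime string.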