Reading Arrow Feather files in Go or JavaScript


I am looking for a way to read Feather files from Go or JavaScript, or from some other language that does not require users to install anything extra.

My goal is to provide a user interface that reads a Feather file and converts it back to a human-readable CSV. However, I can't find many resources on how to do this.

Currently I have a test Feather file generated by the code below:

import pandas as pd
import datetime
import numpy as np
import pyarrow.feather as feather

# Create a dummy dataframe
todays_date = datetime.datetime.now().date()
index = pd.date_range(todays_date-datetime.timedelta(10), periods=10, freq='D')

columns = ['A','B', 'C']
df = pd.DataFrame(index=index, columns=columns)
df = df.fillna(0) # with 0s rather than NaNs

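# Feather is a binary Arrow format rather than CSV, so give it a .feather extension.
# pyarrow compresses Feather V2 files with LZ4 by default; apache-arrow for
# JavaScript has historically not read compressed buffers, so write them uncompressed.
feather.write_feather(df, 'test.feather', compression='uncompressed')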

Thanks in advance.


Accepted answer

The JavaScript package apache-arrow ships with a script, arrow2csv, that does exactly this. You can find its source here: https://github.com/apache/arrow/blob/master/js/bin/arrow2csv.js

If it does not do exactly what you want, the script should still serve as an example of how to use the API to read a Feather file.
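Feather V2, which pyarrow writes by default, is the Arrow IPC file format on disk, which is why an Arrow IPC reader such as apache-arrow can load it; the older Feather V1 format is laid out differently. Because the file in the question carries a .csv extension, a UI may want to check what it was actually given before passing the bytes to the reader. The following is a minimal sketch under that assumption: detectFeatherVersion and hasMagicAt are hypothetical helper names (not apache-arrow APIs), and the check only looks at the magic strings ("ARROW1" at both ends of an Arrow IPC file, "FEA1" at both ends of a Feather V1 file), not at anything else in the file.

```javascript
// Hypothetical helper: true if `magic` (ASCII) appears in `bytes` at `offset`.
// Works on any Uint8Array, including a Node Buffer.
function hasMagicAt(bytes, magic, offset) {
  if (offset < 0 || offset + magic.length > bytes.length) return false;
  for (let i = 0; i < magic.length; i++) {
    if (bytes[offset + i] !== magic.charCodeAt(i)) return false;
  }
  return true;
}

// Hypothetical helper: classify bytes by the magic strings that wrap the file.
// 'v2' = Arrow IPC file (Feather V2), 'v1' = legacy Feather V1, else 'unknown'.
function detectFeatherVersion(bytes) {
  const wraps = (magic) =>
    bytes.length >= 2 * magic.length &&
    hasMagicAt(bytes, magic, 0) &&
    hasMagicAt(bytes, magic, bytes.length - magic.length);
  if (wraps('ARROW1')) return 'v2';
  if (wraps('FEA1')) return 'v1';
  return 'unknown';
}

// Synthetic example bytes; these are not real Feather files.
const enc = (s) => Uint8Array.from(s, (c) => c.charCodeAt(0));
console.log(detectFeatherVersion(enc('ARROW1\0\0' + 'x'.repeat(16) + 'ARROW1'))); // v2
console.log(detectFeatherVersion(enc('a,b,c\n1,2,3\n'))); // unknown
```

In Node this could be called as detectFeatherVersion(fs.readFileSync(filePath)); in a browser, on new Uint8Array(await file.arrayBuffer()). Arrow IPC streams (as opposed to files) have no leading magic string, so this check would report them as 'unknown'.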

Another answer

Thanks for the hint, @Pace. It turns out I can simply load the .feather file with arrow.Table.from([arrow]) (apache-arrow 7.0 and later replaced Table.from with tableFromIPC) and then write the rows out as CSV. For anyone who runs into the same issue, here is my code for reference.

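const apArrow = require('apache-arrow');
const fs = require('fs');
const path = require('path');

const outputDir = 'output/feather';
const outputFile = path.join(outputDir, 'test_feather.csv');

// appendFileSync is synchronous: it throws on failure and takes no callback.
const writeIntoFile = (data) => {
  fs.appendFileSync(outputFile, data);
};

// Build one CSV line for row i by reading each column vector.
// Values are joined as-is, without CSV quoting.
const readDataFromRow = (columns, i) => {
  return columns
    .map((col) => {
      const value = col.get(i);
      return value == null ? '' : String(value); // nulls become empty fields
    })
    .join(',');
};

const arrowReader = (filePath) => {
  console.log('filePath', filePath);
  const bytes = fs.readFileSync(filePath);
  // tableFromIPC is the API in apache-arrow 7.0 and later;
  // older releases used apArrow.Table.from([bytes]) instead.
  const table = apArrow.tableFromIPC(bytes);

  const names = table.schema.fields.map((f) => f.name);
  const columns = names.map((name) => table.getChild(name));

  // Start a fresh file with the header row instead of appending to an old one.
  fs.mkdirSync(outputDir, { recursive: true });
  fs.writeFileSync(outputFile, names.join(',') + '\n');

  let buf = '';
  for (let i = 0; i < table.numRows; i++) {
    buf += `${readDataFromRow(columns, i)}\n`;
    // Flush to disk every 10000 rows and keep going until every row is written.
    if ((i + 1) % 10000 === 0) {
      writeIntoFile(buf);
      buf = '';
    }
  }

  writeIntoFile(buf); // write whatever is left over
};

// Usage: node arrow-to-csv.js path/to/file.feather
arrowReader(process.argv[2]);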
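readDataFromRow above joins raw values with commas, which is fine for the numeric test data but would split any string value containing a comma, a double quote, or a line break across several fields. A minimal sketch of RFC 4180-style quoting follows; toCsvField and toCsvRow are hypothetical helper names, and quoting only when needed is one policy among several (always quoting strings is another).

```javascript
// Hypothetical helper (not part of apache-arrow): format one value as a CSV field.
function toCsvField(value) {
  if (value === null || value === undefined) return ''; // nulls become empty fields
  const s = String(value); // String() also handles BigInt values such as 0n
  // Quote only when the field contains a delimiter, quote, or line break;
  // embedded double quotes are escaped by doubling them.
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Hypothetical helper: format an array of values as one CSV line (no trailing newline).
function toCsvRow(values) {
  return values.map(toCsvField).join(',');
}

console.log(toCsvRow(['A', 0n, null, 'x,y', 'say "hi"']));
// A,0,,"x,y","say ""hi"""
```

With this in place, the .join(',') in readDataFromRow could be replaced by passing the same array of values to toCsvRow, and the header line could be built the same way from the column names.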