Preventing "getCurrentDirectory: resource exhausted (Too many open files)" error

944 Views Asked by At

I am trying to run a Parsec parser over a whole bunch of small files, and getting an error saying I have too many open files. I understand that I need to use strict IO, but I'm not sure how to do that. This is the problematic code:

files = getDirectoryContents historyFolder

hands :: IO [Either ParseError [Hand]]
hands = join $ sequence <$> parseFromFile (many hand) <<$>> files

Note: my <<$>> function is this:

(<<$>>) :: (Functor f1, Functor f2) => (a -> b) -> f1 (f2 a) -> f1 (f2 b)
a <<$>> b = (a <$>) <$> b
2

There are 2 best solutions below

1
On

I don't know what your parseFromFile function looks like right now (probably a good idea to include that in the question), but I'm guessing you're using Prelude.readFile, which as @Markus1189 points out includes lazy I/O. To get to strict I/O, you just need a strict readFile, such as Data.Text.IO.readFile.

A streaming data library like pipes or conduit would allow you to avoid reading the entire file into memory at once, though- to my knowledge- parsec doesn't provide a streaming interface to allow this to happen. attoparsec, on the other hand, does include such a streaming interface, and both pipes and conduit have attoparsec adapter libraries (e.g., Data.Conduit.Attoparsec).

tl;dr: You probably just need the following helper function:

import qualified Data.Text as T
import qualified Data.Text.IO as TIO

readFileStrict :: FilePath -> IO String
readFileStrict = fmap T.unpack . TIO.readFile
0
On

You can use the BangPatterns language extension to enforce strictness of your IO operations, in this case parseFromFile. For example the function hands can be changed in:

hands :: [String] → IO [Either ParseError [Hand]]
hands [] = return []
hands (f:fs) = do
  !res ← parseFromFile hand f
  others ← hands fs
  return (res:others)

This version of hands waits for the results of each call of parseFromFile before moving to the next file in the list. Once you have this, the problem should disappear. A full working toy example is:

{-# LANGUAGE BangPatterns #-}
import Control.Monad
import Control.Applicative hiding (many)
import Data.Char (isDigit)
import System.Directory (getDirectoryContents)
import System.FilePath ((</>))
import Text.ParserCombinators.Parsec

data Hand = Hand Int deriving Show

hand :: GenParser Char st [Hand]
hand = do
  string "I'm file "
  num ← many digit
  newline
  eof
  return [Hand $ read num]

files :: IO [String]
files = map ("manyfiles" </>)
      ∘ filter (all isDigit) <$> getDirectoryContents "manyfiles"

hands :: [String] → IO [Either ParseError [Hand]]
hands [] = return []
hands (f:fs) = do
  !res ← parseFromFile hand f
  others ← hands fs
  return (res:others)

main :: IO 
main = do
  results ← files >≥ hands
  print results