Reimplementing getContents using getChar

On my journey towards grasping lazy IO in Haskell I tried the following:

main = do
  chars <- getContents
  consume chars

consume :: [Char] -> IO ()
consume [] = return ()
consume ('x':_) = consume []
consume (c : rest) = do
  putChar c
  consume rest

which just echoes every character typed on stdin until I hit 'x'.

So, I naively thought it should be possible to reimplement getContents using getChar doing something along the following lines:

myGetContents :: IO [Char]
myGetContents = do
  c <- getChar
  -- And now?
  return (c: ???) 

It turns out this is not so simple, since the ??? would require a function of type IO [Char] -> [Char], which would, I think, break the whole idea of the IO monad.

Checking the implementation of getContents (or rather hGetContents) reveals a whole sausage factory of dirty IO stuff. Is my assumption correct that myGetContents cannot be implemented without using dirty, i.e. monad-breaking, code?

There are 3 best solutions below

1
On BEST ANSWER

You need an extra primitive, unsafeInterleaveIO :: IO a -> IO a (exported by System.IO.Unsafe), which delays the execution of its argument action until the result of that action is evaluated. Then

import System.IO.Unsafe (unsafeInterleaveIO)

myGetContents :: IO [Char]
myGetContents = do
  c <- getChar
  rest <- unsafeInterleaveIO myGetContents
  return (c : rest)
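
To see the delaying behaviour in isolation, here is a minimal sketch. The name deferredDemo and the use of an IORef as an observable side effect are my own illustration, not part of the answer above. The write wrapped in unsafeInterleaveIO should only happen once its result is forced.

```haskell
import Control.Exception (evaluate)
import Data.IORef (newIORef, readIORef, writeIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical demo: returns the IORef contents observed before and after
-- the result of the deferred action is forced.
deferredDemo :: IO (String, String)
deferredDemo = do
  ref <- newIORef "not run yet"
  -- The write is wrapped, so it is postponed until `v` is evaluated.
  v <- unsafeInterleaveIO (writeIORef ref "ran" >> return (42 :: Int))
  before <- readIORef ref
  _ <- evaluate v          -- forcing the result triggers the deferred write
  after <- readIORef ref
  return (before, after)
```

Running deferredDemo should give ("not run yet","ran"): the first read happens before anything demands v, so the write has not run yet.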
2
On

You should avoid anything in System.IO.Unsafe wherever possible. Those functions can break referential transparency, and they are rarely used in Haskell unless absolutely necessary.

If you change your type signature a little I suspect you can get a more idiomatic approach to your problem.

consume :: Char -> Bool
consume 'x' = False
consume _   = True

main :: IO ()
main = loop
  where
    loop = do
      c <- getChar
      if consume c
      then do
        putChar c
        loop
      else return ()
0
On

You can do this without any hacks.

If your goal is simply to read all of stdin into a String, you don't need any of the unsafe* functions.

IO is a Monad, and a Monad is an Applicative Functor. A Functor is defined by the function fmap, whose signature is:

fmap :: Functor f => (a -> b) -> f a -> f b

that satisfies these two laws:

fmap id = id
fmap (f . g) = fmap f . fmap g
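
As a rough illustration, the two laws can be spot-checked on a functor whose values can be compared. IO values cannot be compared with ==, so this sketch uses Maybe as a stand-in; identityLaw and compositionLaw are hypothetical helper names, and passing for a few inputs is of course not a proof.

```haskell
-- First law: mapping id changes nothing.
identityLaw :: Eq a => Maybe a -> Bool
identityLaw x = fmap id x == id x

-- Second law: mapping a composition is the same as composing two maps.
compositionLaw :: Eq c => (b -> c) -> (a -> b) -> Maybe a -> Bool
compositionLaw f g x = fmap (f . g) x == (fmap f . fmap g) x
```

For example, identityLaw (Just 'a') and compositionLaw succ succ (Just 'a') should both evaluate to True.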

Effectively, fmap applies a function to wrapped values.

Given a specific character 'c', what is the type of fmap ('c':)? We can write the two types down, then unify them:

fmap        :: Functor f => (a      -> b     ) -> f a      -> f b
     ('c':) ::               [Char] -> [Char]
fmap ('c':) :: Functor f =>                       f [Char] -> f [Char]
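
A small sketch of what that unified type means in practice: the same expression fmap ('c':) works at any functor. The names inMaybe, inList, and inIO are just illustrative.

```haskell
-- fmap ('c':) specialised to three different functors.
inMaybe :: Maybe String
inMaybe = fmap ('c':) (Just "at")      -- Just "cat"

inList :: [String]
inList = fmap ('c':) ["at", "ow"]      -- ["cat","cow"]

inIO :: IO String
inIO = fmap ('c':) (return "at")       -- yields "cat" when run
```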

Recalling that IO is a functor, if we want to define myGetContents :: IO [Char], it seems reasonable to use this:

myGetContents :: IO [Char]
myGetContents = do
  x <- getChar
  fmap (x:) myGetContents

This is close, but not quite equivalent to getContents: this version will attempt to read past the end of the file and throw an error instead of returning a string. Looking at the code makes that clear: there is no base case that returns a finite list, only an endless chain of conses. Knowing that the base case is "" at EOF (and using the infix syntax <$> for fmap) brings us to:

import System.IO
myGetContents :: IO [Char]
myGetContents = do
  reachedEOF <- isEOF
  if reachedEOF
  then return []
  else do
    x <- getChar
    (x:) <$> myGetContents

The Applicative class affords a (slight) simplification.

Recall that IO is an Applicative Functor, not just any old Functor. There are "Applicative Laws" associated with this typeclass much like the "Functor Laws", but we'll look specifically at <*>:

(<*>) :: Applicative f => f (a -> b) -> f a -> f b

This is almost identical to fmap (a.k.a. <$>), except that the function to apply is also wrapped. We can then avoid the bind in our else clause by using the Applicative style:

import System.IO
myGetContents :: IO String
myGetContents = do
  reachedEOF <- isEOF
  if reachedEOF
  then return []
  else (:) <$> getChar <*> myGetContents
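
To make the equivalence concrete, here is a sketch of the same "cons inside a context" shape written both ways. consA and consM are hypothetical names, and the code is generalised over any Applicative or Monad rather than fixed to IO, so it can be tried with Maybe as well.

```haskell
-- Applicative style: lift (:) over two wrapped values.
consA :: Applicative f => f a -> f [a] -> f [a]
consA x xs = (:) <$> x <*> xs

-- The same thing with explicit binds, as in the earlier do-block version.
consM :: Monad m => m a -> m [a] -> m [a]
consM x xs = do
  h <- x
  t <- xs
  return (h : t)
```

With Maybe, consA (Just 'a') (Just "bc") and consM (Just 'a') (Just "bc") should both give Just "abc". With IO, consA getChar myGetContents is the else branch above.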

One modification is necessary if the input may be infinite.

Remember when I said that you don't need the unsafe* functions if you just want to read all of stdin into a String? Well, if you just want some of the input, you do. If your input might be infinitely long, you definitely do. The final program differs in one import and a single word:

import System.IO
import System.IO.Unsafe
myGetContents :: IO [Char]
myGetContents = do
  reachedEOF <- isEOF
  if reachedEOF
  then return []
  else (:) <$> getChar <*> unsafeInterleaveIO myGetContents

The defining function of lazy IO is unsafeInterleaveIO (from System.IO.Unsafe). It delays running the IO action until its result is demanded.
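
The difference between the strict and the lazy variant can be sketched without touching stdin. In the code below, collectStrict, collectLazy, and popChar are hypothetical names, and an IORef String stands in for the input stream (an assumption made purely so the effect is observable). The reader action returns Nothing where the real code would see EOF.

```haskell
import Data.IORef
import System.IO.Unsafe (unsafeInterleaveIO)

-- Strict: keeps reading until the source is exhausted, then returns.
collectStrict :: IO (Maybe a) -> IO [a]
collectStrict next = do
  m <- next
  case m of
    Nothing -> return []
    Just x  -> (x :) <$> collectStrict next

-- Lazy: reads one element now and defers the rest until the tail is forced.
collectLazy :: IO (Maybe a) -> IO [a]
collectLazy next = do
  m <- next
  case m of
    Nothing -> return []
    Just x  -> (x :) <$> unsafeInterleaveIO (collectLazy next)

-- Stand-in for stdin: pops one character off the stored string.
popChar :: IORef String -> IO (Maybe Char)
popChar ref = atomicModifyIORef' ref step
  where
    step []       = ([], Nothing)
    step (c : cs) = (cs, Just c)
```

With a finite source both functions return the same list, but collectLazy only consumes as much as is demanded. With an endless source such as newIORef (cycle "ab"), taking the first few elements of the collectLazy result should still terminate, whereas collectStrict would never return.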