This was actually something I wanted to understand with my previous question, but I worded it badly by taking for granted that the solution would have to build on sequence and repeat. I got an interesting answer, which is definitely useful for solving the problem at hand, but it didn't satisfy my curiosity, so I'm rephrasing the question as in the title.

As you can see from the linked question, I initially wrote

main :: IO ()
main = do
  x <- takeWhile (/= 'q') <$> sequence (repeat getChar)
  print "you pressed q"

thinking x would be a finite-length [Char].

But I'm now under the impression that I've not "just" used the wrong tool above (takeWhile, a pure function), but that there's actually no (orthodox) tool at all to bail out of that IO [Char] that is sequence (repeat getChar).

Am I correct?

If not, then how do you bail out of that?

Beware, I'm not asking the same question as before. I'm looking for a solution that does make use of sequence (repeat getChar) and still manages to reach the print "you pressed q" action, or an explanation of why such a solution doesn't exist.

Best answer, by K. A. Buhr

If you sequence an infinite list of IO operations, you can't bail out via "pure" program logic. So, you are correct that there is no "orthodox" tool to bail out of sequence (repeat getChar).

The only way to bail out is to invoke an IO effect to do "impure" bailing, for example by raising an exception, killing the thread, exiting the program, or similar. In these cases, you can't easily return a value, so this wouldn't be very useful for your use case. Alternatively, instead of bailing out, you can "cheat" and use unsafe lazy I/O, which allows you to execute an initial prefix of the infinite list of actions and inspect the initial prefix of the returned list. You can either continue processing the list indefinitely or "abandon" the list after processing some finite number of elements.
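A minimal sketch of the exception route, assuming we smuggle the characters out through an IORef (the exception type StopReading and the function readUntilQViaException are hypothetical names for this illustration). The infinite sequence (repeat step) is kept as is; step throws when it reads 'q', and try catches that. The list that sequence was building is lost, so the characters survive only because each step also records its character as a side effect, which is exactly the awkwardness described above.

```haskell
import Control.Exception (Exception, throwIO, try)
import Data.IORef

-- Hypothetical exception type, used only to escape the infinite sequence.
data StopReading = StopReading
  deriving Show

instance Exception StopReading

-- Runs the infinite sequence (repeat step) unchanged and escapes it by
-- throwing. try discards the list that sequence was building, so each
-- character is also recorded in an IORef as a side effect.
readUntilQViaException :: IO Char -> IO String
readUntilQViaException source = do
  seen <- newIORef []
  let step = do
        c <- source
        if c == 'q'
          then throwIO StopReading
          else do
            modifyIORef' seen (c :)
            pure c
  result <- try (sequence (repeat step)) :: IO (Either StopReading [Char])
  case result of
    Left StopReading -> reverse <$> readIORef seen
    Right _ -> error "unreachable: an infinite sequence never returns normally"

main :: IO ()
main = do
  x <- readUntilQViaException getChar
  print x
  print "you pressed q"
```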

In general, though, either of these approaches is going to be bad practice, at least for your problem.

If you want to write a proper monadic action that reads characters until the first 'q', then the right way to do it with elementary Haskell code is something like this:

getToQ :: IO String
getToQ = do
  c <- getChar
  if c == 'q' then pure [] else do
    cs <- getToQ
    pure (c:cs)

The technical reason you can't use sequence to implement this logic is that this is a fundamentally monadic algorithm, but sequence is "only" an applicative combinator. Applicative combinators can't use the results of applicative actions to influence the "structure" of the computation. When you sequence an infinite list of IO actions, you have committed to executing an infinite loop, and the results of the actions cannot change this into a finite loop, except as a side effect (exception, etc.). In particular, sequence can't implement the part of getToQ where we inspect the return value c and decide whether the rest of the computation is going to be pure [] or a recursive call to getToQ that does more getChars. You can see this from the following (simplified) definition of sequence:

sequence [] = pure []
sequence (x:xs) = do
  a <- x
  as <- sequence xs
  pure (a:as)

See how the returned a value from the x action cannot be used to influence whether or not the sequence xs action in the next line executes? That's how sequence differs structurally from getToQ.
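The simplified definition above never uses the result a to decide what to run next, so it can be rewritten with <$> and <*> alone, which makes the point visible in its type: only Applicative is required. This is a sketch; sequenceA' is a hypothetical name chosen to avoid clashing with the Prelude's sequenceA. The Maybe lines only check that it behaves like sequence, and the counter shows that all five actions run even though the third character is 'q'.

```haskell
import Data.IORef

-- Applicative-only version of the simplified sequence: there is no point
-- at which the result of x could be inspected before sequenceA' xs runs.
sequenceA' :: Applicative f => [f a] -> f [a]
sequenceA' [] = pure []
sequenceA' (x:xs) = (:) <$> x <*> sequenceA' xs

main :: IO ()
main = do
  print (sequenceA' [Just 1, Just 2, Just 3 :: Maybe Int])   -- Just [1,2,3]
  print (sequenceA' [Just 1, Nothing, Just 3 :: Maybe Int])  -- Nothing
  counter <- newIORef (0 :: Int)
  -- Each action bumps the counter before returning its character.
  let tick c = modifyIORef' counter (+ 1) >> pure c
  s <- sequenceA' (map tick "abqcd")
  n <- readIORef counter
  print (takeWhile (/= 'q') s, n)  -- ("ab",5)
```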

Now, there are some packages, like extra and monad-loops, that provide monadic combinators that can be used to write the equivalent of getToQ succinctly. For example, from monad-loops, unfoldWhileM does exactly what you want:

unfoldWhileM (/= 'q') getChar

There's nothing magical about these combinators. A simplified implementation of this combinator would be:

unfoldWhileM :: Monad m => (a -> Bool) -> m a -> m [a]
unfoldWhileM p x = do
  a <- x
  if p a then do
      as <- unfoldWhileM p x
      pure (a:as)
    else pure []

See how this combinator, in contrast to sequence, is fundamentally a non-applicative, monadic combinator. It uses the return value a from the action x to determine the structure of the rest of the computation: (i) recurse to execute more x actions; or (ii) end the computation.

Answer by Ben

sequence (repeat getChar) has type IO [Char]. That type says that it's a single IO action which when run to completion will produce a [Char].

Of course we know that this single action was produced by stitching together a series of smaller actions, but there's no way for that to help us. There's no way to unpick it again into the individual steps and run them one at a time so that we can look at each individual getChar result and decide whether to continue. A value of type IO [Char] simply has no ability for you to run it partially and get part of the result and then decide whether you want to continue; that is a much more complicated interface than IO provides. Once you've got an IO [Char] your choice is to run it or not to run it; that's it.

(You can very easily write programs that do what you want, of course, but you don't do it by combining the actions into a single action that unconditionally runs them all and then trying to post-process the resulting action to add the conditions; you do it by putting the conditions into a more complicated combining operation than sequence.)
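A sketch of such a combining operation, under the hypothetical name sequenceWhile: it still accepts repeat getChar, but, unlike sequence, it inspects each result before running the rest of the list. It is essentially the unfoldWhileM idea from the other answer, generalised to a list of possibly different actions.

```haskell
-- Hypothetical combinator: like sequence, but it inspects each result
-- before deciding whether to run the rest of the list, which needs Monad.
sequenceWhile :: Monad m => (a -> Bool) -> [m a] -> m [a]
sequenceWhile _ [] = pure []
sequenceWhile p (x:xs) = do
  a <- x
  if p a
    then (a :) <$> sequenceWhile p xs
    else pure []  -- stop here: the remaining actions are never run

main :: IO ()
main = do
  x <- sequenceWhile (/= 'q') (repeat getChar)
  print x
  print "you pressed q"
```

Because the constraint is only Monad, the same function also works in pure monads such as Maybe or Either, which is a convenient way to check that nothing after the 'q' is run.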

This isn't even specific to infinite lists of IO actions; exactly the same "problem" happens with something like this:

takeWhile (/= 'q') <$> sequence (replicate 10 getChar)

sequence (replicate 10 getChar) will get 10 characters from stdin and then sequence them into a list in the order they were read. That's simply what that code means. Applying further computations to the resulting list (like takeWhile (/= 'q')) doesn't change what IO actions happened, it just computes a pure value from the pure string that was produced after all the I/O was done. Try it and see; if you type q before 10 characters it will still keep calling getChar a total of 10 times, and you won't see any output printed until you hit the 10th key. takeWhile (/= 'q') is like a stage in a pipeline; it changes what comes out of the pipeline, but it cannot change the behaviour of the stages before it to make them do less I/O.
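The claim is easy to check without a terminal. This sketch (countingSource and firstTen are hypothetical helpers) feeds the question's pipeline a fake character source that counts its calls; although 'q' arrives third, the counter still reaches 10.

```haskell
import Data.IORef

-- Wraps a character source so that every call is counted.
countingSource :: IORef Int -> IO Char -> IO Char
countingSource counter source = do
  modifyIORef' counter (+ 1)
  source

-- The question's pipeline on a finite list: takeWhile only sees the
-- string after all ten actions have completed.
firstTen :: IO Char -> IO String
firstTen source = takeWhile (/= 'q') <$> sequence (replicate 10 source)

main :: IO ()
main = do
  counter <- newIORef 0
  ref <- newIORef ("abq" ++ repeat 'z')
  -- Fake stand-in for getChar: pops the next character from ref.
  let fake = atomicModifyIORef' ref (\(c:cs) -> (cs, c))
  s <- firstTen (countingSource counter fake)
  n <- readIORef counter
  print (s, n)  -- ("ab",10)
```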

(And that's a very good thing; if we're writing code that interacts with a user we're probably doing lots of I/O that doesn't produce any result that's read by downstream code. If the downstream code could "switch off" some of the I/O we do by not reading its result it would be a nightmare to control what the user experience is going to be. If we want to let calling code control exactly what bits of I/O are done, we have to provide those bits as separate I/O actions and let it decide how to compose them)

The simple answer is that the program you are trying to express is not "run all of these actions, combine their results into a list, then filter the list". But that's exactly what takeWhile _ <$> sequence _ means. If you want to do something else, write different code. In particular sequence is simply not what you want if you want to run your actions one at a time, examine the results, and decide whether to continue. But neither is anything else that gives you a single IO [Char]; don't combine your actions into one if you want to apply separate logic!


This can make IO seem very different from normal Haskell, since in normal Haskell you can apply takeWhile (/= 'q') to an infinite list produced by an infinite computation and stop part way through. But note that when you do that, there is no way for takeWhile (/= 'q') to change the computation whose output it is post-processing. If the list you apply it to was produced by let xs = xs :: [Char], applying takeWhile can't stop that from running forever (and neither can any other function that examines the list in any way). In exactly the same way, if you write code that specifies an IO [Char] that runs forever (like sequence (repeat getChar)), then nothing you apply to it after the fact can convert it into an IO [Char] that runs for a finite time and produces a different value (with different observable effects!).
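The pure side of this comparison, as a small sketch: takeWhile terminates on an infinite list that is produced lazily, because it only demands the prefix it needs. The closing comment notes the case it cannot rescue.

```haskell
-- An infinite list, produced by a lazy (pure) computation.
letters :: [Char]
letters = cycle "abq"

main :: IO ()
main = do
  -- takeWhile demands only the prefix it needs, so this terminates.
  putStrLn (takeWhile (/= 'q') letters)  -- prints: ab
  -- With let xs = xs :: [Char] instead, takeWhile (/= 'q') xs would never
  -- produce a result (GHC may detect this and report <<loop>>): takeWhile
  -- cannot change how its input list is computed.
```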

But it's not that IO makes lazy things non-lazy. When you have an IO [Char], the [Char] that is produced is still lazy in exactly the same way as an ordinary [Char]. You can see this with something like:

getInfiniteChars :: IO String
getInfiniteChars = do
  c <- getChar
  pure $ repeat c

take 10 <$> getInfiniteChars will work just fine, because getInfiniteChars is a finite IO action that produces an infinite [Char], so we can run the action to completion and then post-process the result lazily; there's no need to "interrupt" the I/O and decide to stop it. sequence (repeat getChar), by contrast, is an action doing an infinite amount of I/O, rather than a finite action producing an infinite [Char].
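To observe this, here is a sketch that generalises getInfiniteChars over its input action (infiniteCharsFrom is a hypothetical name) and counts how often that action runs: taking 10 characters from the result performs exactly one read.

```haskell
import Data.IORef

-- getInfiniteChars, generalised over its input action so the number of
-- reads can be observed: one read, then an infinite (lazy) result.
infiniteCharsFrom :: IO Char -> IO String
infiniteCharsFrom source = do
  c <- source
  pure (repeat c)

getInfiniteChars :: IO String
getInfiniteChars = infiniteCharsFrom getChar

main :: IO ()
main = do
  readCount <- newIORef (0 :: Int)
  -- Stand-in for getChar that counts its runs and always returns 'x'.
  let countedX = modifyIORef' readCount (+ 1) >> pure 'x'
  s <- take 10 <$> infiniteCharsFrom countedX
  n <- readIORef readCount
  print (s, n)  -- ("xxxxxxxxxx",1)
```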

It's the IO part of IO [Char] that is "not lazy", but this isn't special; there are loads of other types that can't be consumed lazily (Integer, Int, Char, strict Text, etc.). IO is simply like those, not like lists. Don't think of an IO [Char] as an actual string that exists and is simply tagged as being impure; the IO [Char] value is more like a program that could produce a string when you run it (which is why it can produce different strings when you run it multiple times; getChar :: IO Char could not possibly model console input if it were "a Char value tagged as impure", after all). You can't consume the program lazily in order to run parts of it and not others, but you can consume its result lazily once it has run to completion and handed over a result.
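A small sketch of the program-versus-value distinction, using a fake getChar stand-in that pops characters from an IORef (readTwo and fakeGetChar are hypothetical names): one IO [Char] value, run twice, yields two different strings.

```haskell
import Data.IORef

-- One IO [Char] value describing "read two characters".
readTwo :: IO Char -> IO [Char]
readTwo source = sequence [source, source]

main :: IO ()
main = do
  ref <- newIORef "abcdef"
  -- Hypothetical stand-in for getChar: pops the next character from ref.
  let fakeGetChar = atomicModifyIORef' ref (\(c:cs) -> (cs, c))
      twoChars = readTwo fakeGetChar
  first <- twoChars   -- running the program once...
  second <- twoChars  -- ...and again produces a different string
  print (first, second)  -- ("ab","cd")
```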

(You could do something like fork a thread to run the action and interrupt it from another thread while it's part way through; but there's no way to examine the memory managed by that thread and figure out which bits correspond to "the first part of its result" or "the character it just read" - at least not without dropping WAY down into low-level code that would make it hard to even recognise Haskell-level values - so that doesn't help you do what you're after either)
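The base library's System.Timeout.timeout is a ready-made form of interrupting an action from outside: when the limit expires it aborts the action with an asynchronous exception and returns Nothing. A sketch, with slowChar as a hypothetical stand-in for getChar, shows that whatever the infinite sequence had produced so far is simply unavailable.

```haskell
import Control.Concurrent (threadDelay)
import System.Timeout (timeout)

-- Hypothetical stand-in for getChar: "reads" an 'x' every millisecond.
slowChar :: IO Char
slowChar = threadDelay 1000 >> pure 'x'

-- Interrupts the infinite sequence after 50 ms. timeout aborts it with an
-- asynchronous exception, and the partially built list is simply lost.
interrupted :: IO (Maybe [Char])
interrupted = timeout 50000 (sequence (repeat slowChar))

main :: IO ()
main = interrupted >>= print  -- prints: Nothing
```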


Now the elephant in the room: there is such a thing as lazy I/O, and it's even in the Prelude in the form of things like getContents :: IO String. This does produce a string that can be filtered with things like takeWhile, and how much of the string you read influences how much I/O actually takes place.
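For contrast, a sketch of the question's program rewritten with getContents (untilQ is a hypothetical helper name). Input is read on demand as putStrLn forces more of the result, in practice in buffer-sized chunks; a terminal also normally delivers input a line at a time, so nothing arrives until Enter is pressed.

```haskell
-- Pure helper: the prefix of the input before the first 'q'.
untilQ :: String -> String
untilQ = takeWhile (/= 'q')

main :: IO ()
main = do
  -- getContents returns all of stdin as a lazy String; reading happens
  -- only as putStrLn forces more of untilQ's result.
  s <- getContents
  putStrLn (untilQ s)
  print "you pressed q"
```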

However, it is impossible to implement something like getContents with "normal" Haskell. It requires the use of unsafeInterleaveIO, which is a special low-level operation involving compiler internals (and as the name suggests, is potentially unsafe), breaking the normal rules of the language. So it is probably possible for you to use things like unsafeInterleaveIO to implement a magic lazySequence :: [IO a] -> IO [a] that allows consumption of the resulting [a] to drive whether the individual IO a actions are run. But it would require tricky and advanced code, and still wouldn't be anything you could just apply to sequence (repeat getChar) after the fact; you would still need to replace sequence.
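A minimal sketch of such a lazySequence (a hypothetical name, not a library function), assuming lazy I/O is acceptable: each step is wrapped in unsafeInterleaveIO, so an action runs only when the corresponding cell of the result list is demanded. As said above, this replaces sequence rather than rescuing sequence (repeat getChar), and it inherits all the pitfalls of lazy I/O.

```haskell
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical lazy counterpart of sequence. Nothing runs when the
-- returned IO action runs; forcing a cons cell of the result runs the
-- corresponding action and defers everything after it.
lazySequence :: [IO a] -> IO [a]
lazySequence [] = pure []
lazySequence (x:xs) = unsafeInterleaveIO $ do
  a <- x
  as <- lazySequence xs
  pure (a : as)

main :: IO ()
main = do
  x <- takeWhile (/= 'q') <$> lazySequence (repeat getChar)
  print x  -- forcing x is what actually performs the getChar calls
  print "you pressed q"
```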

It is something of a consensus now that lazy I/O like getContents might have been a mistake to include in the prelude. It is very convenient for some simple programs, and being able to write those without needing to import things is arguably good. But lazy I/O can very easily lead to some very surprising behaviour causing bugs, which is especially harmful for beginners, so arguably it should have been kept out of the basic prelude that beginners typically learn from. And that's the pre-existing lazy I/O operations; writing your own lazy I/O by using unsafeInterleaveIO directly is not something to take lightly.

This answer didn't feel complete without acknowledging the existence of lazy I/O, but I would very much recommend against solving this kind of problem with it.