iteratee Iteratee breaks monadic laws

John Lato jwlato at gmail.com
Wed Feb 15 16:45:39 GMT 2012


Hi Michael,

Thanks very much for your reports, and this discussion.  It's been
very useful to me.

On Wed, Feb 15, 2012 at 11:14 AM, Michael Baikov <manpacket at gmail.com> wrote:
> Hello John.
>
> I see. That's a very interesting second law, and it will be part of my
> next bug report :)
>
> "All iteratees must enter the done state (i.e. call the first
> continuation) upon receipt of EOF."
> This must be extended to something like "must enter the done state or
> emit an exception to
> perform seek or anything else to change this EOF state"

I take your point here.

>
> Suppose you have a file and you want to do seeking. And you want to
> seek only for specific marks.
> So first you read whole file, remembering all marks location and then
> after getting EOF just seek for first such mark and do some magical
> stream processing.

This is a reasonable thing to want to do.  When I've had to deal with
this myself, I've essentially worked around it by not advancing the
stream when I know I'm at the last mark.  This was possible for the
file format I was working with, but it may not always be the case.

>
>
>
> I know, this is a bad example, but only for simplicity sake.
> First two options dies after 10 loops, as expected, third one waits
> forever because of throwErr in joinI.

If I don't change the second law, then it's your "consumer" that's broken :)

But in all seriousness, what would be the desired semantics of joinI
if the second law is changed as you suggest?  If we take

> joinI outer = outer >>= lift . run

as a semantic definition, the behavior is identical to the current
joinI up to bottom (e.g. this throws an exception, the current is
non-terminating).

Or from another perspective, how exactly is the continuation meant to
be constructed in the general case?  Since "joinI :: Iteratee s m
(Iteratee s' m a) -> Iteratee s m a", where are the required "s' "s
meant to come from?  It's not a problem with Control.Monad.join,
because that function has both a different type and different
semantics (even when the types do match).

In short, although I do think it's reasonable to want to change my
second law as you suggest, I'm not certain how to do so sensibly.  The
second law essentially provides that "run :: Iteratee s m a -> m a" is
total for all iteratees that obey it.  Your consumer causes "run" to
diverge.  That makes joinI divergent in turn, by the meaning of joinI
which explicitly terminates the inner iteratee with "run".  Of course
(like many laws) it can be usefully violated, but only if you avoid
the cases that the law is meant to protect against.  Or if you are
exceedingly clever.

I can make joinI friendlier by having it propagate an exception
instead of non-terminating or calling error, but for now I think
that's the extent of what's possible.

>
> import Data.Iteratee.Iteratee
> import Control.Monad
> import Data.Iteratee.IO
> import Control.Monad.IO.Class
> import qualified Data.Iteratee as I
>
> enumFileForever :: String -> Enumerator String IO a
> enumFileForever fname iter = enumFileRandom 10 fname iter >>= enumEof
>>>= enumFileForever fname
>
> consumer :: Iteratee String IO ()
> consumer = liftI (go 0)
>    where
>        go c (Chunk s) = liftIO (putStr s) >> liftI (go c)
>        go 10 e = error "I did 10 loops, let's die"
>        go n e = I.seek 0 >> liftI (go (n+1))
>
> main = do
>  --  enumFileForever "foo" consumer >>= run
>  --  enumFileForever "foo" (join $ (mapChunks id) $ consumer) >>= run
>  --  enumFileForever "foo" (joinI $ (mapChunks id) $ consumer) >>= run
>
>
>
> PS: There is something wrong with iteratee maillist against.  Feb is missing.
> http://projects.haskell.org/pipermail/iteratee

I should probably look for a new host.  It's disappointing, but
projects.haskell.org has been problematic.

John


>
>
> On Wed, Feb 15, 2012 at 6:12 PM, John Lato <jwlato at gmail.com> wrote:
>> Hi Michael,
>>
>> Yes, this is unfortunate.  Iteratees that attempt to inject data into
>> a stream are unsound, for exactly this reason.  Currently, I am aware
>> of two laws that well-formed iteratees should follow:
>>
>> 1.  Any data returned to a stream (via the second parameter of
>> "idone") should be a subset of data received from the stream.
>> 2.  All iteratees must enter the done state (i.e. call the first
>> continuation) upon receipt of EOF.
>>
>> The first is (in part) because of this problem; the second (together
>> with the first) guarantees a result upon termination.
>>
>> The first law may in fact be too strict, at least from the perspective
>> of monad laws.  It's probably possible to write a safe version of
>> "inject" provided that you first take some stream data, e.g.
>>
>> inject :: Int -> Iteratee [Int] IO ()
>> inject x = liftI step
>>  where
>>  step (Chunk xs) = idone () (Chunk (x:xs))
>>  step str        = idone () str
>>
>> Of course this won't actually inject values unless you apply it to an
>> enumerator:
>>
>> main = do
>>  (enumPure1Chunk [] test') >>= run >>= print
>>  (enumPure1Chunk [] test'') >>= run >>= print
>>
>> Both of these do print "[0,1,2]".  However if your iteratee starts
>> injecting data into a stream it will alter the stream length, which
>> will cause issues with things like "I.take" and "countConsumed".
>> Which is another reason for the first law.
>>
>> The monadic law violation is a consequence of the current design, in
>> which remainder stream data is carried along with a final result.  In
>> iteratee-0.9, I'm changing the definition of iteratee to something
>> closer to (haven't quite decided yet):
>>
>> newtype Iteratee s m a = Iteratee{ runIter :: forall r.
>>          (a -> r) ->
>>          ((Stream s -> m (Iteratee s m a, Stream s)) -> r) ->
>>          (Iteratee s m a -> SomeException -> r) ->
>>          (forall b. m b -> (b -> (Iteratee s m a)) -> r) ->
>>          r}
>>
>> This should fix the inject problem, although injecting elements will
>> still be unsound WRT I.take and similar.
>>
>> John L.
>>
>> On Wed, Feb 15, 2012 at 4:47 AM, Michael Baikov <manpacket at gmail.com> wrote:
>>> There are three known monadic laws which every monad must obey:
>>>
>>> "Left identity":      return a >>= f  ≡  f a
>>> "Right identity":    m >>= return  ≡  m
>>> "Associativity":    (m >>= f) >>= g  ≡  m >>= (\x -> f x >>= g)
>>>
>>> Iteratee breaks the third one: "do { a; b; c }" gets desugared to this:
>>> "a >> (b >> c)" plain usage of "a >> b >> c" is the same as "(a >> b) >> c",
>>> since (>>)  is defined with infixl.
>>>
>>> Here is a simple example which works with do notation, but fails to
>>> work with (>>)
>>>
>>> import qualified Data.Iteratee  as I
>>> import Data.Iteratee.Iteratee
>>>
>>> inject :: Int -> Iteratee [Int] IO ()
>>> inject x = idone () (Chunk [x])
>>>
>>> test' :: Iteratee [Int] IO [Int]
>>> test' = do
>>>    inject 2
>>>    inject 1
>>>    inject 0
>>>    I.stream2list
>>>
>>> test'' :: Iteratee [Int] IO [Int]
>>> test'' = inject 2 >> inject 1 >> inject 0 >> I.stream2list
>>>
>>> main = do
>>>    run test' >>= print  -- [0, 1, 2]
>>>    run test'' >>= print -- [2]



More information about the Iteratee mailing list