iteratee Iteratee breaks monadic laws

John Lato jwlato at gmail.com
Tue Feb 21 15:21:06 GMT 2012


Ok, this ended up being a little trickier than I expected, but I've
pushed some code to the repos.

New variants of enumeratees are in Data.Iteratee.PTerm (they're also
exported by Data.Iteratee).  I'm pretty sure they all work properly.
I haven't add variants of 'group' or 'groupBy' yet though.

I'd appreciate it if you could confirm that they do provide the
expected behavior.

Cheers,
John

On Mon, Feb 20, 2012 at 1:36 PM, John Lato <jwlato at gmail.com> wrote:
> Hi Michael,
>
> A small update on this issue, and putting some information on the list
> for posterity.
>
> "It's not a bug, this behavior is by design".
>
> Currently, as a general rule enumeratees do not pass EOF to their
> inner iteratees.  This is to allow for compositions like the
> following:
>
> compile :: StreamExpr -> Enumerator Vec m a
> compile (FileSource stuff) i = tryRun =<< enumFile (fp stuff) (I.drop
> (offset stuff) >> I.takeUpTo (count stuff) i)
> compile (Seq exprs) i = foldr ((>=>) . compile) enumEof exprs i
>
> If enumeratees passed along terminators, then the inner iteratee in
> the "FileSource" line couldn't be retrieved, as it would always be in
> a Done state after it's extracted from I.takeUpTo.
>
> For what you want to do, though, the enumeratee must pass along EOF in
> order to signal that state to the consumer, and then return an outer
> iteratee in either the done or cont state depending on the inner
> iteratee's behavior.
>
> I think the best approach is to duplicate all provided enumeratees so
> that both semantics are available.  Fortunately the required changes
> are pretty minimal.  I've re-implemented the enumeratees from
> Data.Iteratee.Iteratee, and the new "mapChunks" seems to behave as
> required.  Many core functions, such as eneeCheckIfDone* and joinI,
> should be unaffected.
>
> I'll see if I can get a patch into the repo today, and if it works for
> you push it to hackage later.
>
> Best,
> John
>
> On Thu, Feb 16, 2012 at 1:35 AM, Michael Baikov <manpacket at gmail.com> wrote:
>> Hello John.
>>
>>> This is a reasonable thing to want to do.  When I've had to deal with
>>> this myself, I've essentially worked around it by not advancing the
>>> stream when I know I'm at the last mark.  This was possible for the
>>> file format I was working with, but it may not always be the case.
>>
>> It's not possible in general case. And in my case it's even more complicated -
>> file gets appended by separate process and I want to be able to follow it's
>> tail.
>>
>>
>>>> I know, this is a bad example, but only for simplicity sake.
>>>> First two options dies after 10 loops, as expected, third one waits
>>>> forever because of throwErr in joinI.
>>>
>>> If I don't change the second law, then it's your "consumer" that's broken :)
>>
>> True, but I can change EOF Nothing with EOF EndOfFileReachedDoSeekNow.
>> This extra field inside EOF constructor was meant to be used to send controll
>> messages from Enumerator to Iteratees/Enumeratees. This is mentioned in Oleg's
>> original paper, he also mentioned this in our conversation about typesafe
>> exception passing. So it there is anything broken - it's iteratee
>> implementation :)
>>
>>
>>
>>>>> Both of these do print "[0,1,2]".  However if your iteratee starts
>>>>> injecting data into a stream it will alter the stream length, which
>>>>> will cause issues with things like "I.take" and "countConsumed".
>>>>> Which is another reason for the first law.
>>
>> Those are two different streams. Those functions are "take from outer stream"
>> and "count consumed from outer stream", and not "feed into consumer" and
>> "count consumed by ultimate consumer". I still can have a transformer which
>> will break [Word64] into much larger [Word8] or even [Bit].
>>
>>
>>
>> On Thu, Feb 16, 2012 at 12:45 AM, John Lato <jwlato at gmail.com> wrote:
>>> Hi Michael,
>>>
>>> Thanks very much for your reports, and this discussion.  It's been
>>> very useful to me.
>>>
>>> On Wed, Feb 15, 2012 at 11:14 AM, Michael Baikov <manpacket at gmail.com> wrote:
>>>> Hello John.
>>>>
>>>> I see. That's a very interesting second law, and it will be part of my
>>>> next bug report :)
>>>>
>>>> "All iteratees must enter the done state (i.e. call the first
>>>> continuation) upon receipt of EOF."
>>>> This must be extended to something like "must enter the done state or
>>>> emit an exception to
>>>> perform seek or anything else to change this EOF state"
>>>
>>> I take your point here.
>>>
>>>>
>>>> Suppose you have a file and you want to do seeking. And you want to
>>>> seek only for specific marks.
>>>> So first you read whole file, remembering all marks location and then
>>>> after getting EOF just seek for first such mark and do some magical
>>>> stream processing.
>>>
>>> This is a reasonable thing to want to do.  When I've had to deal with
>>> this myself, I've essentially worked around it by not advancing the
>>> stream when I know I'm at the last mark.  This was possible for the
>>> file format I was working with, but it may not always be the case.
>>>
>>>>
>>>>
>>>>
>>>> I know, this is a bad example, but only for simplicity sake.
>>>> First two options dies after 10 loops, as expected, third one waits
>>>> forever because of throwErr in joinI.
>>>
>>> If I don't change the second law, then it's your "consumer" that's broken :)
>>>
>>> But in all seriousness, what would be the desired semantics of joinI
>>> if the second law is changed as you suggest?  If we take
>>>
>>>> joinI outer = outer >>= lift . run
>>>
>>> as a semantic definition, the behavior is identical to the current
>>> joinI up to bottom (e.g. this throws an exception, the current is
>>> non-terminating).
>>>
>>> Or from another perspective, how exactly is the continuation meant to
>>> be constructed in the general case?  Since "joinI :: Iteratee s m
>>> (Iteratee s' m a) -> Iteratee s m a", where are the required "s' "s
>>> meant to come from?  It's not a problem with Control.Monad.join,
>>> because that function has both a different type and different
>>> semantics (even when the types do match).
>>>
>>> In short, although I do think it's reasonable to want to change my
>>> second law as you suggest, I'm not certain how to do so sensibly.  The
>>> second law essentially provides that "run :: Iteratee s m a -> m a" is
>>> total for all iteratees that obey it.  Your consumer causes "run" to
>>> diverge.  That makes joinI divergent in turn, by the meaning of joinI
>>> which explicitly terminates the inner iteratee with "run".  Of course
>>> (like many laws) it can be usefully violated, but only if you avoid
>>> the cases that the law is meant to protect against.  Or if you are
>>> exceedingly clever.
>>>
>>> I can make joinI friendlier by having it propagate an exception
>>> instead of non-terminating or calling error, but for now I think
>>> that's the extent of what's possible.
>>>
>>>>
>>>> import Data.Iteratee.Iteratee
>>>> import Control.Monad
>>>> import Data.Iteratee.IO
>>>> import Control.Monad.IO.Class
>>>> import qualified Data.Iteratee as I
>>>>
>>>> enumFileForever :: String -> Enumerator String IO a
>>>> enumFileForever fname iter = enumFileRandom 10 fname iter >>= enumEof
>>>>>>= enumFileForever fname
>>>>
>>>> consumer :: Iteratee String IO ()
>>>> consumer = liftI (go 0)
>>>>    where
>>>>        go c (Chunk s) = liftIO (putStr s) >> liftI (go c)
>>>>        go 10 e = error "I did 10 loops, let's die"
>>>>        go n e = I.seek 0 >> liftI (go (n+1))
>>>>
>>>> main = do
>>>>  --  enumFileForever "foo" consumer >>= run
>>>>  --  enumFileForever "foo" (join $ (mapChunks id) $ consumer) >>= run
>>>>  --  enumFileForever "foo" (joinI $ (mapChunks id) $ consumer) >>= run
>>>>
>>>>
>>>>
>>>> PS: There is something wrong with iteratee maillist against.  Feb is missing.
>>>> http://projects.haskell.org/pipermail/iteratee
>>>
>>> I should probably look for a new host.  It's disappointing, but
>>> projects.haskell.org has been problematic.
>>>
>>> John
>>>
>>>
>>>>
>>>>
>>>> On Wed, Feb 15, 2012 at 6:12 PM, John Lato <jwlato at gmail.com> wrote:
>>>>> Hi Michael,
>>>>>
>>>>> Yes, this is unfortunate.  Iteratees that attempt to inject data into
>>>>> a stream are unsound, for exactly this reason.  Currently, I am aware
>>>>> of two laws that well-formed iteratees should follow:
>>>>>
>>>>> 1.  Any data returned to a stream (via the second parameter of
>>>>> "idone") should be a subset of data received from the stream.
>>>>> 2.  All iteratees must enter the done state (i.e. call the first
>>>>> continuation) upon receipt of EOF.
>>>>>
>>>>> The first is (in part) because of this problem; the second (together
>>>>> with the first) guarantees a result upon termination.
>>>>>
>>>>> The first law may in fact be too strict, at least from the perspective
>>>>> of monad laws.  It's probably possible to write a safe version of
>>>>> "inject" provided that you first take some stream data, e.g.
>>>>>
>>>>> inject :: Int -> Iteratee [Int] IO ()
>>>>> inject x = liftI step
>>>>>  where
>>>>>  step (Chunk xs) = idone () (Chunk (x:xs))
>>>>>  step str        = idone () str
>>>>>
>>>>> Of course this won't actually inject values unless you apply it to an
>>>>> enumerator:
>>>>>
>>>>> main = do
>>>>>  (enumPure1Chunk [] test') >>= run >>= print
>>>>>  (enumPure1Chunk [] test'') >>= run >>= print
>>>>>
>>>>> Both of these do print "[0,1,2]".  However if your iteratee starts
>>>>> injecting data into a stream it will alter the stream length, which
>>>>> will cause issues with things like "I.take" and "countConsumed".
>>>>> Which is another reason for the first law.
>>>>>
>>>>> The monadic law violation is a consequence of the current design, in
>>>>> which remainder stream data is carried along with a final result.  In
>>>>> iteratee-0.9, I'm changing the definition of iteratee to something
>>>>> closer to (haven't quite decided yet):
>>>>>
>>>>> newtype Iteratee s m a = Iteratee{ runIter :: forall r.
>>>>>          (a -> r) ->
>>>>>          ((Stream s -> m (Iteratee s m a, Stream s)) -> r) ->
>>>>>          (Iteratee s m a -> SomeException -> r) ->
>>>>>          (forall b. m b -> (b -> (Iteratee s m a)) -> r) ->
>>>>>          r}
>>>>>
>>>>> This should fix the inject problem, although injecting elements will
>>>>> still be unsound WRT I.take and similar.
>>>>>
>>>>> John L.
>>>>>
>>>>> On Wed, Feb 15, 2012 at 4:47 AM, Michael Baikov <manpacket at gmail.com> wrote:
>>>>>> There are three known monadic laws which every monad must obey:
>>>>>>
>>>>>> "Left identity":      return a >>= f  ≡  f a
>>>>>> "Right identity":    m >>= return  ≡  m
>>>>>> "Associativity":    (m >>= f) >>= g  ≡  m >>= (\x -> f x >>= g)
>>>>>>
>>>>>> Iteratee breaks the third one: "do { a; b; c }" gets desugared to this:
>>>>>> "a >> (b >> c)" plain usage of "a >> b >> c" is the same as "(a >> b) >> c",
>>>>>> since (>>)  is defined with infixl.
>>>>>>
>>>>>> Here is a simple example which works with do notation, but fails to
>>>>>> work with (>>)
>>>>>>
>>>>>> import qualified Data.Iteratee  as I
>>>>>> import Data.Iteratee.Iteratee
>>>>>>
>>>>>> inject :: Int -> Iteratee [Int] IO ()
>>>>>> inject x = idone () (Chunk [x])
>>>>>>
>>>>>> test' :: Iteratee [Int] IO [Int]
>>>>>> test' = do
>>>>>>    inject 2
>>>>>>    inject 1
>>>>>>    inject 0
>>>>>>    I.stream2list
>>>>>>
>>>>>> test'' :: Iteratee [Int] IO [Int]
>>>>>> test'' = inject 2 >> inject 1 >> inject 0 >> I.stream2list
>>>>>>
>>>>>> main = do
>>>>>>    run test' >>= print  -- [0, 1, 2]
>>>>>>    run test'' >>= print -- [2]



More information about the Iteratee mailing list