iteratee Strange behaviour eneeCheckIfDone

Michael Baikov manpacket at gmail.com
Thu Jul 7 10:50:27 BST 2011


On Thu, Jul 7, 2011 at 6:10 PM, John Lato <jwlato at gmail.com> wrote:
> Hi Michael,
> Thanks for this, more comments inline.
>
> On Thu, Jul 7, 2011 at 3:59 AM, Michael Baikov <manpacket at gmail.com> wrote:
>>
>> First, let's import some modules which will be used later:
>>
>> > import Data.Iteratee as I
>> > import Data.Iteratee.Char as I
>> > import Data.Iteratee.IO as I
>> > import Control.Monad.IO.Class
>> > import Control.Monad.Trans.Class
>> > import Control.Exception
>> > import Control.Monad (when)
>> > import Data.Char (toUpper)
>>
>>
>> And then let's define some Iteratees
>>
>> This one just dumps everything it gets from its input
>>
>> > dump = printLinesUnterminated
>>
>> This one performs one seek and then dumps everything else
>>
>> > dumpAndSeek = I.seek 6 >> dump
>>
>> Let's define some Enumeratees
>>
>> This one uses the regular mapChunks (and eneeCheckIfDone). (Actually we
>> could use streamMap, but mapChunks's type signature looks better.)
>>
>> > upStream :: Enumeratee String String IO a
>> > upStream = mapChunks (map toUpper)
>>
>> This one uses my mapChunks' (and a modified eneeCheckIfDone)
>>
>> > upStream' :: Enumeratee String String IO a
>> > upStream' = mapChunks' (map toUpper)
>>
>> Now it's time for some tests. The file "hello.txt" contains the message
>> "Hello world!!!!\n\n"
>>
>> > test1 = enumFileRandom 1 "hello.txt" dump
>>
>> As expected: Hello world!!!!
>>
>> > test2 = enumFileRandom 1 "hello.txt" dumpAndSeek
>>
>> world!!!!
>>
>> > test3 = enumFileRandom 1 "hello.txt" (joinI $ upStream dump)
>>
>> HELLO WORLD!!!!
>>
>> > test4 = enumFileRandom 1 "hello.txt" (joinI $ upStream dumpAndSeek)
>>
>> The seek request ends up in throwErr in eneeCheckIfDone, so it just
>> hangs forever. Unexpected behaviour.
>
> This is indeed a bug.
>
>>
>> > test5 = enumFileRandom 1 "hello.txt" (joinI $ upStream' dumpAndSeek)
>>
>> With the modified version it works fine:
>> WORLD!!!!
>>
>> > test6 = enumFileRandom 1 "hello.txt" (joinI $ upStream (I.seek 6 >> stream2list)) >>= run >>= print
>>
>> hangs forever
>
> This looks like the same bug, since 'upStream' is defined in terms of
> 'mapChunks', which in turn is defined with 'eneeCheckIfDone'.

It is the same bug; I just wanted to show it once more :)
I found my version of this bug in the middle of a huge multi-threaded
Haskell program which takes ages to run, so I decided to provide you
with a nice and simple version :)

>
>>
>> > test7 = enumFileRandom 1 "hello.txt" (joinI $ upStream' (I.seek 6 >> stream2list)) >>= run >>= print
>>
>> "WORLD!!!!\n\n"
>>
>> I don't see why it should behave differently when I apply a simple
>> transformation to the stream.
>> And if I am misunderstanding something, what is the proper way to
>> dump the file contents from byte offset 6
>> to the end while applying map toUpper to them, using iteratees?
>
> I would put the seek outside of the enumeratee stream.  Or, since you know
> you're using ASCII characters, use drop instead.

Sure, that will work, but again, this is a very simplified problem. In
the real world you need several layers of transformations, and the
decision to seek is made in the topmost iteratee. So we need to be
able to pass exceptions through as transparently as possible.
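To make the difference between passing a seek request up and swallowing it concrete, here is a minimal toy model in plain Haskell. It does not use the iteratee library: `Iter`, `enumList`, `collect`, `mapPass` and `mapIgnore` are hypothetical stand-ins invented for this sketch, with `mapPass` following the idea behind eneeCheckIfDonePass and `mapIgnore` the idea behind eneeCheckIfDoneIgnore. It does not reproduce the hang itself, only what each policy does with the request.

```haskell
import Data.Char (toUpper)

-- Toy model only (NOT the iteratee library's types). A consumer of elements
-- of type e is finished, wants the next element (Nothing signals EOF), or
-- asks whoever feeds it to seek to an absolute offset and then continue.
data Iter e a
  = Done a
  | Cont (Maybe e -> Iter e a)
  | Seek Int (Iter e a)

-- Toy enumerator over an in-memory "file" that honours Seek requests.
-- Returns Nothing if the consumer still wants input after EOF.
enumList :: [e] -> Iter e a -> Maybe a
enumList file = go file
  where
    go _ (Done a) = Just a
    go _ (Seek n it) = go (drop n file) it
    go (x : xs) (Cont k) = go xs (k (Just x))
    go [] (Cont k) = case k Nothing of
      Cont _ -> Nothing
      it -> go [] it

-- Collects every element up to EOF (in the spirit of stream2list).
collect :: Iter e [e]
collect = loop []
  where
    loop acc = Cont $ \m -> case m of
      Nothing -> Done (reverse acc)
      Just x -> loop (x : acc)

-- "Pass" policy: map each element and forward Seek requests unchanged to
-- the outer producer. Forwarding the offset as-is is only valid because
-- the mapping is one element in, one element out.
mapPass :: (e -> e') -> Iter e' a -> Iter e a
mapPass _ (Done a) = Done a
mapPass f (Cont k) = Cont (mapPass f . k . fmap f)
mapPass f (Seek n it) = Seek n (mapPass f it)

-- "Ignore" policy: drop the request and keep feeding the inner consumer
-- from the current position.
mapIgnore :: (e -> e') -> Iter e' a -> Iter e a
mapIgnore _ (Done a) = Done a
mapIgnore f (Cont k) = Cont (mapIgnore f . k . fmap f)
mapIgnore f (Seek _ it) = mapIgnore f it
```

In GHCi, `enumList "Hello world!!!!\n\n" (mapPass toUpper (Seek 6 collect))` gives `Just "WORLD!!!!\n\n"`, matching test7, while the same expression with `mapIgnore` gives `Just "HELLO WORLD!!!!\n\n"` because the request never reaches the producer.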


>> test8 = enumFileRandom 1 "hello.txt" (I.seek 6 >> joinI (upStream stream2list)) >>= run >>= print
>
>>
>> And here is my modified implementation; it uses
>> eneeCheckIfDonePass (icont . step)
>> instead of
>> eneeCheckIfDone (liftI . go)
>>
>> > mapChunks' :: (Monad m, NullPoint s) => (s -> s') -> Enumeratee s s' m a
>> > mapChunks' f = eneeCheckIfDonePass (icont . step)
>> >     where
>> >         step k (Chunk xs)     = eneeCheckIfDonePass (icont . step) . k . Chunk $ f xs
>> >         step k str@(EOF mErr) = idone (k $ EOF mErr) str
>>
>>
>> eneeCheckIfDonePass does not try to handle any exceptions; it just
>> passes them to the parent Enumeratee/Enumerator.
>>
>>
>> > eneeCheckIfDonePass :: (Monad m, NullPoint elo) =>
>> >     ((Stream eli -> Iteratee eli m a) -> Maybe SomeException -> Iteratee elo m (Iteratee eli m a))
>> >     -> Enumeratee elo eli m a
>> > eneeCheckIfDonePass f inner = Iteratee $ \od oc ->
>> >     let onDone x s = od (idone x s) (Chunk empty)
>> >         onCont k e = runIter (f k e) od oc
>> >     in runIter inner onDone onCont
>>
>> eneeCheckIfDoneHandle has a separate handler for exceptions, so the user
>> can decide whether to handle an exception or pass it to the parent.
>>
>> > eneeCheckIfDoneHandle :: (Monad m, NullPoint elo) =>
>> >     ((Stream eli -> Iteratee eli m a) -> Maybe SomeException -> Iteratee elo m (Iteratee eli m a))
>> >     -> ((Stream eli -> Iteratee eli m a) -> SomeException -> Iteratee elo m (Iteratee eli m a))
>> >     -> Enumeratee elo eli m a
>> > eneeCheckIfDoneHandle f h inner = Iteratee $ \od oc ->
>> >     let onDone x s = od (idone x s) (Chunk empty)
>> >         onCont k Nothing = runIter (f k Nothing) od oc
>> >         onCont k (Just e) = runIter (h k e)      od oc
>> >     in runIter inner onDone onCont
>>
>> eneeCheckIfDoneIgnore ignores all exceptions.
>>
>> > eneeCheckIfDoneIgnore :: (Monad m, NullPoint elo) =>
>> >     ((Stream eli -> Iteratee eli m a) -> Maybe SomeException -> Iteratee elo m (Iteratee eli m a))
>> >     -> Enumeratee elo eli m a
>> > eneeCheckIfDoneIgnore f inner = Iteratee $ \od oc ->
>> >     let onDone x s = od (idone x s) (Chunk empty)
>> >         onCont k _ = runIter (f k Nothing) od oc
>> >     in runIter inner onDone onCont
>
> I need to spend a little more time reviewing these, but they all seem like
> useful alternatives.  Sometimes it makes sense for the stream transformer
> (enumeratee) to handle an exception, sometimes not.  In particular, seeking
> would have to be passed up to the handle enumerator.
> Unfortunately it's not quite that simple in all cases.   If there isn't a
> 1-1 correspondence between elements of the inner stream and the outer
> stream, how should seek behave?  Should it attempt to seek in the inner
> stream (which may not be possible), or just pass it up and assume you know
> what you're doing?  The second is much easier to implement, but I think the
> former would be more useful.
> Thoughts?
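A toy sketch of the non-1-1 case, using the same kind of hypothetical model as before (the `Iter` type and every function name here are made up for illustration and are not the iteratee API). Each outer element, a byte, becomes two inner elements, its hex digits. The transformer therefore cannot forward an inner offset unchanged; assuming a fixed 2:1 ratio, it translates inner offset n into outer offset n `div` 2 and then skips the remaining n `mod` 2 inner elements itself.

```haskell
import Data.Char (intToDigit)

-- Toy model only (NOT the iteratee library's types).
data Iter e a
  = Done a
  | Cont (Maybe e -> Iter e a)
  | Seek Int (Iter e a)

-- Toy enumerator over an in-memory "file" that honours Seek requests.
enumList :: [e] -> Iter e a -> Maybe a
enumList file = go file
  where
    go _ (Done a) = Just a
    go _ (Seek n it) = go (drop n file) it
    go (x : xs) (Cont k) = go xs (k (Just x))
    go [] (Cont k) = case k Nothing of
      Cont _ -> Nothing
      it -> go [] it

-- Collects every element up to EOF.
collect :: Iter e [e]
collect = loop []
  where
    loop acc = Cont $ \m -> case m of
      Nothing -> Done (reverse acc)
      Just x -> loop (x : acc)

-- One byte (assumed to be in 0..255) becomes two lowercase hex digits.
hexPair :: Int -> (Char, Char)
hexPair b = (intToDigit (b `div` 16), intToDigit (b `mod` 16))

-- Each outer element expands to exactly two inner elements, so an inner
-- offset n is passed up as outer offset n `div` 2, and the leftover
-- n `mod` 2 inner elements are skipped here before resuming the consumer.
expand2 :: (e -> (e', e')) -> Iter e' a -> Iter e a
expand2 f = outer
  where
    -- waiting for the next outer element
    outer (Done a) = Done a
    outer (Seek n it) = Seek (n `div` 2) (skip (n `mod` 2) it)
    outer (Cont k) = Cont $ \m -> case m of
      Nothing -> outer (k Nothing)
      Just x -> let (a, b) = f x in feed b (k (Just a))
    -- deliver the pending second half of the current outer element
    feed b (Cont k) = outer (k (Just b))
    feed _ it = outer it
    -- after a seek to an odd inner offset, drop the first half of the element
    skip 0 it = outer it
    skip _ (Cont k) = Cont $ \m -> case m of
      Nothing -> outer (k Nothing)
      Just x -> outer (k (Just (snd (f x))))
    skip _ it = outer it
```

With the bytes `[0x48, 0x69, 0x21]` (hex "486921"), `enumList bytes (expand2 hexPair (Seek 3 collect))` gives `Just "921"`, the same as dropping three hex digits. With no fixed ratio (a variable-length decoder, say) there is no such arithmetic, and that is the case where blindly passing the request up would be wrong.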

Sure. You just need to create several different seeks, and probably
rename I.seek to something more specific, like fileHandleSeek. So if
you want to go to a specific point in a file (and you know that your
enumeratee chain can handle such a seek), you send an exception named
(FileSeek Offset). If you want to go to a specific time frame, you
just fire another exception, (TimeSeek TimeOffset), and handle it in
the appropriate place. You can use drop for a stream of chars if you
are not doing any transformations, but if each chunk takes 1 second to
process and you need to drop 1 million of them...

So just create several types of exceptions, place handlers in
reasonable places, handle the exceptions that you can, pass them
further if you can't, and create several different seek functions.
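A minimal sketch of the "several exception types" idea, using only `Control.Exception` from base, in IO. The names FileSeek and TimeSeek come from the message above; their payload types, the layer functions `decoderLayer` and `fileLayer`, and the constant `bytesPerSecond` (a constant-bitrate assumption) are hypothetical. Each layer installs a typed handler, so it catches only the request it understands and anything else propagates to its parent automatically. Inside the iteratee library such requests arrive in-band as a `SomeException` (see the handler argument of eneeCheckIfDoneHandle above), so `classify` shows the same selection done with `fromException`.

```haskell
import Control.Exception

-- Request types named after the proposal; the payload types are assumptions.
newtype FileSeek = FileSeek Integer deriving Show -- absolute byte offset
newtype TimeSeek = TimeSeek Double deriving Show  -- seconds from the start

instance Exception FileSeek
instance Exception TimeSeek

-- Hypothetical constant-bitrate assumption used to turn time into bytes.
bytesPerSecond :: Double
bytesPerSecond = 1000

-- A layer that understands time (e.g. a decoder): it catches only TimeSeek,
-- translates it and rethrows a FileSeek for its parent. Any other exception
-- does not match the typed handler and propagates unchanged.
decoderLayer :: IO a -> IO a
decoderLayer act =
  act `catch` \(TimeSeek t) -> throwIO (FileSeek (round (t * bytesPerSecond)))

-- The outermost layer owns the file handle, so it is the one that can act
-- on a FileSeek; here it only reports what it would do.
fileLayer :: IO String -> IO String
fileLayer act =
  act `catch` \(FileSeek off) -> pure ("seek file handle to byte " ++ show off)

-- The same "handle what you can, pass the rest" test on a bare SomeException.
classify :: SomeException -> String
classify e
  | Just (TimeSeek t) <- fromException e = "time seek to " ++ show t ++ "s"
  | Just (FileSeek o) <- fromException e = "file seek to byte " ++ show o
  | otherwise = "not a seek: pass it to the parent"
```

For example, `fileLayer (decoderLayer (throwIO (TimeSeek 2.5)))` returns "seek file handle to byte 2500", while an unrelated `ErrorCall` thrown inside passes through both layers untouched.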

> John



More information about the Iteratee mailing list