iteratee FileOffset of data read with s

Thu Jan 12 15:41:58 GMT 2012

Hello,

(Apologies for the fairly unstructured reply)

You can take the approach I do in sndfile-enumerators, which creates a
dictionary of embedded enumerators.  Each dict entry stores the chunk
length and an enumerator to use for that chunk (wave chunk), e.g.

> data WAVEDEENUM =
>   WENDUB (forall a m. (MonadCatchIO m, Functor m) =>
>     MEnumeratorM2 (Vector Word8) (Vector Double) m a)
>
> type MEnumeratorM2 sfrom sto m a = Iteratee sto m a -> Iteratee sfrom m (Iteratee sto m a)

The code to create the enumerator looks like this:

>   WENDUB (\iter_dub -> do
>     I.seek (8 + fromIntegral offset)
>     let iter = convStream (convFunc fmt) iter_dub
>     joinI . takeUpTo count $ iter)
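To make the dictionary idea concrete without pulling in the iteratee machinery, here is a minimal sketch using only base. It is not the sndfile-enumerators code: ChunkId, ChunkEntry, readChunk and decodeU8 are hypothetical names, a plain list of bytes stands in for the stream, and a pure decoding function stands in for the embedded enumerator.

```haskell
import Data.Word (Word8)

-- Hypothetical chunk identifiers; not the names used by sndfile-enumerators.
data ChunkId = FmtChunk | DataChunk
  deriving (Eq, Show)

-- Each entry records where the chunk payload starts, how many bytes it
-- holds, and a decoder standing in for the embedded enumerator.
data ChunkEntry = ChunkEntry
  { chunkOffset :: Int
  , chunkLength :: Int
  , chunkDecode :: [Word8] -> [Double]
  }

-- The dictionary itself: a simple association list keyed by chunk id.
type ChunkDict = [(ChunkId, ChunkEntry)]

-- Run the stored decoder over exactly the bytes that belong to one chunk,
-- mimicking "seek to the offset, then take up to count elements".
readChunk :: ChunkDict -> ChunkId -> [Word8] -> Maybe [Double]
readChunk dict cid file = do
  ChunkEntry off len dec <- lookup cid dict
  pure (dec (take len (drop off file)))

-- Decode unsigned 8-bit PCM samples into the range [-1, 1).
decodeU8 :: [Word8] -> [Double]
decodeU8 = map (\b -> (fromIntegral b - 128) / 128)
```

Looking up DataChunk in a dictionary built from one ChunkEntry decodes only that chunk's bytes, and a missing chunk id gives Nothing. The real version stores an enumeratee instead of a pure function, so the seek and the length limit happen inside the iteratee.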

The dictionary approach works reasonably well.  However, this is all a
bit aside from the point of seeking through enumeratees, which
unfortunately is a slightly messy process.

There are a couple of (possibly conflicting) cases to account for.  I've
always been inspired by Oleg's design, which has the handle/fd
enumerator keep track of the stream position, and never exposes it to
an iteratee.  Extending that further, I don't think you need the
Offset wrapper at all, but there are at least two good reasons why you
may want to keep it.

You can use a function like this:

> convSeek1
>   :: (Monad m, Nullable s)
>   => Iteratee s m s'
>   -> Integer    -- ^ Offset
>   -> Rational  -- ^ Conversion ratio
>   -> Enumeratee s s' m a

With this and support for relative seeking you'd get everything the
"Offset a" type gives you, with one notable exception.  It would be
more difficult to get access to the current offset.  I like this
design, but that's a very big drawback.

Of course, you'll need something like "convSeek1" to support seeking
forward whenever the conversion ratio isn't 1:1.  And there's no
reason you couldn't use "convSeek1" with the Offset wrapper as well.
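The offset arithmetic that a fixed conversion ratio implies can be sketched on its own.  This is only an illustration: toSourceOffset, toInnerOffset and pcm16 are hypothetical names, and I am assuming the ratio means "source elements per converted element" (the signature above does not fix the direction).

```haskell
import Data.Ratio ((%))

-- Assumed meaning of the ratio: source elements consumed per converted
-- element.  For 16-bit PCM decoded to Double that is 2 bytes per sample.
pcm16 :: Rational
pcm16 = 2 % 1

-- | Map a position in the converted (inner) stream to a position in the
-- source stream.  'base' is the source offset at which the converted
-- region starts.
toSourceOffset :: Integer -> Rational -> Integer -> Integer
toSourceOffset base ratio innerPos =
  base + floor (ratio * fromIntegral innerPos)

-- | The opposite direction: the index of the converted element that
-- contains a given source offset.
toInnerOffset :: Integer -> Rational -> Integer -> Integer
toInnerOffset base ratio srcPos =
  floor (fromIntegral (srcPos - base) / ratio)
```

With a 44-byte header and 16-bit samples, sample 100 starts at source offset 44 + 2 * 100 = 244.  Using a Rational also covers packed formats such as 3 bytes per 2 samples, where the result is rounded down to the start of the containing group.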

If the conversion ratio isn't fixed (e.g. decoding bytes to Unicode)
it gets messier.  You can only reliably seek back within the current
chunk (provided that it exists somewhere up the chain), and forward
seeking really can't be done at all except via dropping chunks.
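To illustrate why a variable ratio rules out computing the target offset directly, here is a small base-only sketch for UTF-8.  The names isLeadByte and byteOffsetOfChar are made up for this example, and a list of bytes stands in for the stream.  The only way to find where the n-th character starts is to walk the bytes before it, which is the same as dropping data.

```haskell
import Data.Bits ((.&.))
import Data.Word (Word8)

-- | True for a byte that starts a UTF-8 code point, i.e. anything that is
-- not a continuation byte of the form 10xxxxxx.
isLeadByte :: Word8 -> Bool
isLeadByte b = b .&. 0xC0 /= 0x80

-- | Byte offset at which the n-th code point (0-based) starts, found by
-- scanning from the beginning of the input.  Returns Nothing when the
-- input holds fewer than n + 1 code points.
byteOffsetOfChar :: Int -> [Word8] -> Maybe Int
byteOffsetOfChar n bytes =
  case drop n [i | (i, b) <- zip [0 ..] bytes, isLeadByte b] of
    (i : _) -> Just i
    []      -> Nothing
```

For the bytes of "aéb" (0x61, 0xC3, 0xA9, 0x62) the third character starts at byte 3 rather than byte 2, and nothing short of inspecting the preceding bytes reveals that.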

John

On Thu, Jan 12, 2012 at 4:26 AM, Conrad Parker <conrad at metadecks.org> wrote:
> Hi,
>
> I've been playing around with some code to track the file offsets of
> data being read from an iteratee. It is currently in a branch named
> "offset-bytestring" in zoom-cache, but I'd like some feedback before
> doing anything more with it (eg. merging it into something, splitting
> it out into a new package etc.)
>
> The approach is to introduce a wrapper type (Offset a):
>
>> data Offset a = Offset {-# UNPACK #-}!FileOffset !a
>
> with instances for Nullable, NullPoint, Monoid, FoldableLL, ListLike defined in:
>
> https://github.com/kfish/zoom-cache/blob/offset-bytestring/Data/Offset.hs
>
> We then read data from a file using an enumerator of stream (Offset
> ByteString). When each Chunk (Offset ByteString) is constructed it is
> tagged with the current file position before reading; see
> makeFdCallbackOBS in:
>
> https://github.com/kfish/zoom-cache/blob/offset-bytestring/Data/Iteratee/IO/OffsetFd.hs
>
> Data can be read from such a stream using any Iteratee ByteString,
> with the iteratee transformer convOffset (a hacked-up version of
> countConsumed, which Alex Lang recently contributed to iteratee; I
> wonder if it's possible to implement convOffset using
> countConsumed...). There are also iteratee versions of tell, take etc.
> which operate on (Offset ByteString) and update the offset tag
> appropriately, defined in:
>
> https://github.com/kfish/zoom-cache/blob/offset-bytestring/Data/Iteratee/Offset.hs
>
> All this seems to be working ok, and that branch of zoom-cache reports
> the offsets of packets and summaries, and the corresponding branch of
> scope works on streams of type (Offset Block) etc.
>
> The definition of (Offset a) may be inefficient, and the
> implementation of OffsetFd.hs is currently just a hacked-up version of
> Data.Iteratee.IO.Posix.
>
> The next step in zoom-cache is to actually use the file offsets for
> building seek tables and so on. However I'm wondering if this approach
> is the right way to go, or is there a simpler way to associate file
> offsets with an iteratee stream.
>
> thoughts?
>
> Conrad.