Accelerate

Manuel M T Chakravarty chak at cse.unsw.edu.au
Wed Jul 28 02:12:53 EDT 2010


Hi Ben,

> (I've sent this email to the accelerate mailing list - is this a better destination for these discussions?)

Yes, but Rami, BenLi, and Gabi haven't signed up yet.  [You can do that here: <http://projects.haskell.org/cgi-bin/mailman/listinfo/accelerate>]

> To handle a 5-point stencil, the shape of the intermediate stencil array can be a one-dimensional array of 5 elements. The location of a, b, c, d and e in the stencil array is given by the ordering of the array as specified by the stencil function. For example:
> 
> stencilFn :: dimS -> dim' -> dim
> stencilFn i (y, x) =
>  case i of
>    0 -> (y - 1,     x)  -- a
>    1 -> (    y, x - 1)  -- b
>    2 -> (    y,     x)  -- c
>    3 -> (    y, x + 1)  -- d
>    4 -> (y + 1,     x)  -- e
> 
> One piece of information we did leave out of our specification is that the stencil reduction operator can expect that the intermediate stencil array will be folded over, sequentially, in row-major order. These semantics allow the stencil reduction operator to treat stencil array elements uniquely if it is important to do so.

Ok, that makes sense.

> Your question, however, does highlight an interesting point, which is that the stencil array doesn't really need to be an array of any shape. Our semantics are that the stencil function simply gathers a bunch of elements in some defined order. Perhaps, the specification of the stencil array extent should be removed from the type signature and replaced with simply the number of elements to gather. For example:
> 
> stencil
>    :: (Ix dim, Ix dim', Ix dimS, Elem a, Elem b)
>    => Acc (Array dim a)                                        -- Source array
>    -> (Exp dim  -> Exp dim')                                   -- Function specifying extent of the destination array
>    -> Exp DIM1                                                 -- Number of elements gathered by stencil
>    -> (Exp DIM1 -> Exp dim' -> Exp dim)                        -- Stencil specification function
>    -> (Exp dim' -> Exp b -> Exp a -> Exp b)                    -- Stencil reduction operator
>    -> Exp b                                                    -- Stencil reduction identity value
>    -> Acc (Array dim' b)                                       -- Destination array
> 
> Do you think this is a good way to handle situations like a 5-point stencil?
> 
> To answer your other question about the stencil specification function (Exp dimS -> Exp dim' -> Exp dim) potentially being too general: we agree that it is general but we see stencil specifications being a regular pattern for gathering input array elements, even if they aren't neighbours. Whilst regularity can not be guaranteed statically, if the stencil pattern is regular, there are similar opportunities for locality-based optimisations.

Personally, I would prefer to err on the side of being too specialised.  If we define a version of stencil now that is too restrictive, we can always generalise it once we have a concrete example that demonstrates the need for generalisation.  A more specialised version is easier to implement, easier to optimise, and easier to understand.

From a parallel programming point of view, the stencil specification function specifies a kind of permutation.  In data-parallel programming, permutations are big and inefficient sledge hammers that should be avoided where possible (and be replaced with more structured operations).  In fact, I get the distinct feeling that we should characterise more precisely what a stencil actually is.  What are a stencil's properties and constraints?  (I do mean the actual stencil, not the operation that applies the stencil to an entire source array, which we slightly confusingly call 'stencil' at the moment.)

A stencil is static; ie, it is not computed by the code running on the GPU, but determined by the Haskell program embedding the Accelerate program.  So, we do have the full Haskell language to characterise stencils.  Before trying to characterise stencils in Haskell, let me list a few constraints that I believe stencils meet:

* A stencil is always drawn from a neighbourhood of locations in the source array — ie, from a sub-hypercube of the same dimensionality as the source array.
* The focal point of a stencil is not necessarily the middle of that neighbourhood.  (It would be easier of it always were the middle, but I guess that would be too narrow — correct me if I'm wrong.)
* As the 5-point stencil shows, some points from the neighbourhood may be ignored.

Is that right?

Manuel

> On 28/07/2010, at 12:21 AM, Manuel M T Chakravarty wrote:
> 
> Hi Ben,
> 
> Your revised definition seems like a way to get out of having to use nesting.  One point, I'm wondering about is how you like to represent something like a 5-point stencil
> 
>   a
> bcd
>   e
> 
> where some elements of the stencil array are not used.
> 
> Cheers,
> Manuel
> 
> Hi Manuel,
> 
> Thanks for your comments about our suggested stencil function. I also haven't been able to add comments on Trac, so this response is being broadcasted to everyone.
> 
> Rami, Trevor and I made a start on a CUDA backend for the stencil function today, and quickly realised that, in its current form, it is non trivial to implement because it allows for nested data parallelism. After some discussion, we arrived at a modified version of stencil that, whilst being more restrictive, should make the backend simpler and also allow for further optimisations. At this stage, we believe it is sufficient for computer vision use cases.
> 
> The attached stencil.hs file contains type signatures of proposed modified stencil functions. The key modifications are:
> 
> 
> *   Direct access to the stencil array(s) has been removed to avoid the nested data parallelism problem:
>  *   Instead, the semantics of the stencil function is to apply a fold reduction over each intermediate stencil array to produce an output array element
>  *   In the case of stencil2 and higher, multiple intermediate stencil arrays are folded together, element-wise, to produce an output array element
> *   For stencil2 and higher, the stencil arrays, produced from elements of the source arrays, are of the same extent - this is necessary to support the element-wise folding
> 
> It would be good to get your's and other's thoughts on these modifications.
> 
> Trevor says he's happy to convey some of the background reasoning behind this proposal in your group meeting tomorrow. If, however, you think it would be useful for us to join the meeting and explain further, please let us know, and we'll definitely come in.
> 
> 
> Regards,
> 
> Ben Lever
> Senior Researcher Engineer
> National ICT Australia
> 
> 
> 
> 
> Ben Lever
> Senior Researcher Engineer
> National ICT Australia
> 
> NICTA l Locked Bag 9013 l Alexandria NSW 1435
> T + 612 8306 0742 | F +612 9376 2027
> www.nicta.com.au<http://www.nicta.com.au/> l ben.lever at nicta.com.au<mailto:rami.mukhtar at nicta.com.au>
> 
> From imagination to impact.
> 
> 
> 
> 
> 
> ________________________________
> The information in this e-mail may be confidential and subject to legal professional privilege and/or copyright. National ICT Australia Limited accepts no liability for any damage caused by this email or its attachments.




More information about the Accelerate mailing list