Ben Lever Ben.Lever at
Wed Aug 25 01:28:13 EDT 2010

Hi Manuel,

> You should be able to write stencil functions that build and return stencil functions.  It may be a bit tricky (involving type classes) as we use native Haskell pattern matching (and not a facility in the embedded language) for pattern matching in the stencil; this does make it *much* nicer to write stencils manually, though.  In the worst case, there is always Template Haskell (a meta-programming facility for Haskell), with which we can even generate vanilla Haskell code.  So, generating stencil functions is surely possible.
> You are saying that a convolution function would take a convolution kernel as its input.  How would you want to represent that kernel?

I think your suggestion of using Template Haskell is a good one. In that case, I guess how the kernel is represented is less of an issue.

Just to provide further motivation for where I'm coming from, perhaps a more compelling example is that of feature evaluation. In feature evaluation you run a stencil over a feature array (computed from the original input frame) and inspect a small, scattered subset of the elements within each stencil. The scattering is specified by a feature classifier and is static at the time of evaluation. The classifier typically specifies only a few hundred features out of a much larger set of possibilities. For example, the stencil may be a 36x36 window with 8 features/element (i.e. 36x36x8 = 10368 features).

Because the feature classifier is static, it would be good to generate a stencil function that is customised for the contents of a particular classifier. Such a generator would need to make the stencil pattern the correct size, and the generated stencil function would need to access only the required elements. What you don't want is a stencil function that consults the feature classifier at run-time to determine which elements to evaluate.

I haven't used Template Haskell before, but from what I understand it seems like it could probably support generating stencil code in this way.
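
A minimal sketch of the kind of generator described above, built on the Template Haskell AST types. Everything named here is hypothetical and only for illustration: the `Classifier` type, the `genStencil` function, the small 3x3 window, and the choice of summing the selected elements. It assumes the stencil is matched as a nested tuple of rows and is not tied to the actual Accelerate stencil interface.

```haskell
module Main where

import Language.Haskell.TH

-- | Hypothetical classifier: the (row, column) positions to inspect.
type Classifier = [(Int, Int)]

-- | Build the syntax tree of a lambda over an n-by-n nested tuple.
-- Positions named by the classifier are bound to variables; all other
-- positions become wildcards, so unused elements are visible
-- syntactically. The body sums the selected elements (an assumption;
-- a real feature evaluation would do something else here).
genStencil :: Int -> Classifier -> Exp
genStencil n used = LamE [pat] body
  where
    -- mkName keeps the sketch pure; newName in the Q monad would be
    -- the more hygienic choice in a real generator.
    nameFor (r, c) = mkName ("x" ++ show r ++ "_" ++ show c)
    cell r c
      | (r, c) `elem` used = VarP (nameFor (r, c))
      | otherwise          = WildP
    pat  = TupP [ TupP [ cell r c | c <- [0 .. n - 1] ] | r <- [0 .. n - 1] ]
    plus = VarE (mkName "+")
    vars = map (VarE . nameFor) used
    body
      | null vars = LitE (IntegerL 0)
      | otherwise = foldl1 (\a b -> InfixE (Just a) plus (Just b)) vars

main :: IO ()
main =
  -- Print the generated code rather than splicing it, so the example
  -- fits in a single module.
  putStrLn (pprint (genStencil 3 [(0, 0), (1, 2), (2, 1)]))
```

In a real program the generator would live in its own module and be spliced with something like `$(return (genStencil 3 classifier))`, with the classifier contents known at compile time.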

>> Is my understanding correct? Maybe there is a way to implement a function such as convolution ...
>> This is not an immediate problem, but maybe it can be discussed again when we investigate larger stencil patterns in the future. For example, in one of our use cases (36x36 stencil), it would be ideal if the stencil function could be generated, where the generation is driven by data specifying which coordinates within the stencil bounding box are to be analysed.
> In principle, I think, we can scale the current approach up to 36x36 stencils and use a code generator for the stencil code, but it may be a bit awkward.  So, we may want to have a look at whether we'd like an alternative interface for large stencils.  However, I'm worried that if we use arrays instead of tuples to describe the stencil, it'll get harder to identify unused stencil positions in the backend, so it depends a bit on what kind of convolution functions we are looking at.

How are unused stencil positions currently identified?

As mentioned above, identifying unused stencil positions can be an issue with larger stencils that have this scattering pattern. That said, from the perspective of a CUDA backend determining its shared-memory requirements, a stencil is just a sliding window, so across neighbouring windows most elements will eventually be accessed anyway.
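
To illustrate the sliding-window point, here is a rough sketch that counts which input positions are touched when a scattered stencil is applied at every output position of a square tile (for example, the outputs computed by one thread block). The tile sizes, the 5x5 window, the three offsets, and the names `touched` and `coverage` are all assumptions made up for this example.

```haskell
module Main where

import Data.List (nub)

-- | Hypothetical scattered offsets inside a w-by-w stencil window.
type Offsets = [(Int, Int)]

-- | Distinct input positions read when the stencil is applied at every
-- output position of a t-by-t tile.
touched :: Int -> Offsets -> [(Int, Int)]
touched t offs =
  nub [ (y + dy, x + dx) | y <- [0 .. t - 1], x <- [0 .. t - 1], (dy, dx) <- offs ]

-- | Fraction of the (t + w - 1)-square input region (tile plus halo)
-- that is read at least once.
coverage :: Int -> Int -> Offsets -> Double
coverage t w offs =
  fromIntegral (length (touched t offs)) / fromIntegral ((t + w - 1) ^ (2 :: Int))

main :: IO ()
main = do
  let w    = 5
      offs = [(0, 0), (2, 3), (4, 1)]
  -- Print the coverage for increasingly large tiles.
  mapM_ (\t -> putStrLn (show t ++ "x" ++ show t ++ " tile: " ++ show (coverage t w offs)))
        [1, 4, 16]
```

A single window reads only the three selected elements, but as the tile grows the union of reads covers a growing share of the region, so shared memory would likely need to hold most of it regardless of how sparse one window is.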



More information about the Accelerate mailing list