language.c: support for statements in MonadTrav

Aaron Tomb aarontomb at gmail.com
Sat Feb 5 22:54:21 GMT 2011


Hi,

As Benedikt mentioned, I've been the main person working on the
analysis component of language-c. As it stands right now, there is no
callback mechanism in MonadTrav to deal with anything deeper in the
AST than the top level of function bodies. I considered the
possibility of adding code to dive into statements, but I generally
prefer the idea of using generic programming libraries for that sort
of thing.

For traversal of statements and expressions, I've had success doing
generic programming with Uniplate, which is sufficient for everything
I've needed, and purely static. As a result, it's almost as efficient
as a hand-written traversal, and will never fail at runtime.

I wrote up a set of Uniplate instances for statements and expressions,
and have been meaning to upload it to Hackage (as a separate package,
so that it doesn't force language-c to depend on uniplate), but there
are a couple of things I want to fix about it before I make it fully
public. If you want to see it in its current state, take a look at my
Github repository:

https://github.com/atomb/language-c-uniplate

However, using generic programming this way does, as others pointed
out, depend on having some sort of symbol table in advance. The
current strategy I've been considering is to do a first pass over the
AST using MonadTrav and the existing type checker, and use this to
build up a map from Name -> Type, or Name -> something else, and then
do all other traversals using generic programming, looking up types or
other metadata in the symbol table from then on. If I remember
correctly, I haven't completely tied in the code to build a full
symbol table yet, but I don't think there's much work left to be done.

The existing code will build up a map containing information about all
global declarations, but my idea is to extend that to include local
declarations, as well, to make it useful when doing repeated traversal
of statements and expressions within function bodies.

Aaron

On Fri, Feb 4, 2011 at 8:46 AM, Benedikt Huber <benedikt.huber at gmail.com> wrote:
>
> On 04.02.2011, at 17:05, Alexander Bernauer wrote:
>
>> Hi
>> [..]
>>>
>>> Ah, sorry. What I had in mind is that visitors are not necessary if
>>> you can use generic programming
>>> [http://d.hatena.ne.jp/syd_syd/20080813#p1].
>>
>> I can't read nay Japanese, can you?
>
> Hi,
> No :) But if you know a little bit about Data.Generics the google-translated
> webpage gives you a very good idea what it means to manipulate
> syntax trees using 'everywhere' and friends.
>>
>> But I googled a bit and found the Lämmel & Jones papers [1] and I guess
>> you refer to that.
>
> correct.
>>
>> I am pretty new to Haskell so I am not completely sure about the
>> following. But I think generic programming depends on run-time type
>> checks which could be expensive at runtime, couldn't it?
>
> They do depend on run-time type checks, obviously; I have no idea
> how 'expensive' generic programming is, though. I think Neil Mitchell
> did performance measurements for his Uniplate
> (http://hackage.haskell.org/package/uniplate)
> and a few other frameworks, but I can't find the page right now.
>
>> On the other hand I think a generic visitor implementation also requires
>> having the boilerplate code for traversal only once. So, what advantages
>> do
>> you see for using generic programming?
>
> I think visitor frameworks and generic programming are related concepts.
> Renaming all variables by writing
>> let ast' = everywhere (mkT renameVarExpr) ast
> looks elegant and 'functional' to me, but maybe you can achieve
> the same thing using visitors? A (in this cass imho less important) benefit
> of generic programming is that the library author does not have to write any
> code.
> But you definitely benefit from an language-c specific visitor or generic
> programming framework; you know that all visited elements are CNodes,
> for example; or (as I said before) provide symbol table and type
> information.
>
>>> But on the raw C AST, generic programming is not trivial, because
>>> you need to have symbol tables and type information at hand to do
>>> something useful. For example, when building a callgraph, you need
>>> to know the current state of the symbol table at each call site,
>>> because a local definition might shadow a global one.
>>
>> As far as I know there are only local function declarations but no
>> definitions. Furthermore, local function declarations always refer to
>> global function definitions. This means, you can only change the type of
>> a function locally but you can not shadow a global function.
>>
>> So, collecting the call graph should be easy. Simply find all function
>> calls in all function definitions and use the function name as a
>> reference.  Did I miss anything?
>
> Yes, you did :)
>> int f(int (*g)(void)) {
>>    return g();
>> }
>
>>
>>> CIL does a lot of preprocessing, which greatly simplifies the C AST.
>>> If this is an option for you, maybe [1] would be interesting to you.
>>
>> Hmm, interesting. But this seems to be a bit incomplete, doesn't it?
>>
>>> Otherwise, you will benefit from Language.C exposing all the
>>> cruelities of the C Syntax :)
>>
>> Not the syntax but the semantics. Nevertheless, it's cruel, I know :-/
>>
>> I think I will give the visitor framework a try.
>
> Ok fine. I hope you'll make good progress :)
>
> cheers, benedikt
>
> p.s.: I'm on holiday for 3 weeks now, so replies will be delayed even
> more than usual ;)
>
>>
>> best
>>
>> Alex
>>
>> PS: I am subscribed to the list now, so you don't have to CC me any
>> more. Thanks.
>>
>> [1] http://research.microsoft.com/en-us/um/people/simonpj/papers/hmap/
>
>
> _______________________________________________
> Language-c mailing list
> Language-c at projects.haskell.org
> http://projects.haskell.org/cgi-bin/mailman/listinfo/language-c
>



More information about the Language-c mailing list