Haskell style scanner

The program was inspired by a python script (called haskell_style_check) from www.cs.caltech.edu/courses/cs11/material/haskell/misc/haskell_style_guide.html. The program scan is supposed to serve the same purpose. The main difference is that the scanner is more aware of haskell tokens and comments (and that it is written in Haskell).

Since version 0.1.0.6, some comment checking and changing is also supported. For example,
-- |The 'square' function squares an integer.
-- |It takes one argument, of type 'Int'.
can be changed to:
{- |The 'square' function squares an integer.
It takes one argument, of type 'Int'. -}
or with the option -g4 to:
{- |The 'square' function squares an integer.
    It takes one argument, of type 'Int'. -}
Note that the wrong haddock marker in the second line has been removed.

To avoid disaster when joining comments, the comment check reports nested comments and comment delimiters within line comments.

A block comment on a single and separate line can be changed to a line comment.

Installation

scan can be installed locally by:
cabal update
cabal install scan
The binary scan will be installed into the directory ~/.cabal/bin that you should add to your search path. (Under unix with a bash shell you would add export PATH=~/.cabal/bin:$PATH to your file ~/.bashrc.)

(The darcs source repository is under http://code.haskell.org/style-scanner/.)

Usage

usage: scan [options] [--] <file>+
  -h         --help                    show usage message and exit
  -v         --version                 show version and exit
  -w         --windows                 create windows (CRLF) file
  -t         --template-haskell        no hints for $ in template haskell
  -l <n>     --line-length=<n>         report lines longer than <n> (default -l80)
  -m <n>     --multiple-blanks=<n>     report more than <n> blanks (default -m1)
  -s <b>     --check-spacing=<b>       check spacing around symbols (default True)
  -c <b>     --check-comments=<b>      check comment delimiters (default True)
  -C <b>     --change-comments=<b>     change to some line comments (default True)
  -j <b>     --join-comments=<b>       join consecutive comments (default True)
  -g <n>     --comment-gap=<n>         spaces between joined comments (default -g0)
  -b <n>     --blank-lines=<n>         remove more than <n> blank lines (default -b2)
  -i[<ext>]  --inplace-modify[=<ext>]  modify file in place (backup if <ext> given)
  -e <ext>   --extension=<ext>         create output file with given extension <ext>
  -o <file>  --output-file=<file>      output modified input to <file>
  -O <dir>   --output-directory=<dir>  output modified file to <dir>

scan expects at least one filename on the command line. The corresponding (non-literate) Haskell source file is scanned and diagnostic messages are output. If scan seems to do nothing despite a valid filename it simply means that the file passed all style checks, which is a good sign but pretty unlikely for haskell code that has not been scanned before.
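Because a file that passes all checks produces no output, the result can be tested from a script. The following is a minimal sketch, not part of scan: the function name is_style_clean is hypothetical, and it assumes that diagnostics appear on stdout or stderr (both are captured) rather than relying on any particular exit status.

```sh
#!/bin/sh
# Hypothetical helper (not shipped with scan): succeeds when scan prints
# nothing for the given file, i.e. when the file passes all style checks.
is_style_clean() {
    # "--" protects against file names that look like options.
    output=$(scan -- "$1" 2>&1)
    [ -z "$output" ]
}
```

A call such as is_style_clean Foo.hs && echo "Foo.hs passes" would then print the message only for a file without diagnostics. Note that a missing scan binary also produces output (the shell's error message) and is therefore treated as "not clean".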

scan does not change your file unless you add one of the options "-i, -e, -o, -O" that are explained below. (In the unlikely case that you have filenames looking like options they must follow the "--" option.)

The options -m0 -sf -cf -Cf -jf -b0 along with one option to write back will only remove your tabs and trailing white spaces. Along with the option -w you would get proper CRLF line endings on windows. To ensure a single final newline, use a reasonable number for the -b option. The boolean flag values can be given by any non-empty upper- or lowercase prefix of "True" or "False", therefore the "f" in -sf switches off checking spacing around symbols.
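The prefix rule for boolean flag values can be illustrated by a small shell function. This is only a sketch of the rule as described above, not scan's actual implementation; the name parse_flag_value is hypothetical, and treating mixed-case input such as "tRu" as valid is an assumption.

```sh
#!/bin/sh
# Sketch of the prefix rule for <b> values; not scan's actual code.
# Prints True or False, and fails for the empty string and non-prefixes.
parse_flag_value() {
    # Assumption: the comparison ignores case for the whole value.
    value=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    [ -n "$value" ] || return 1
    # The value is accepted if "true" or "false" starts with it.
    case "true" in "$value"*) echo True; return 0 ;; esac
    case "false" in "$value"*) echo False; return 0 ;; esac
    return 1
}
```

Under this reading, f, F, fal, and False all select False, while t, TRU, and true select True; an empty value or a word like yes is rejected.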

Options

-w
This option assumes that you work with windows files that have two characters (CR and LF) as line delimiters. The diagnostics and any file that is finally written are adjusted accordingly.
-t
This option allows scan to handle template haskell source files. It prevents, for example, putting a space between a "$" and an open parenthesis "(", since "$(" indicates a splice in template haskell and is not an infix application of $. This option reports fewer messages and makes fewer changes to output files if applied to normal haskell source files.
-l <n>
This option complains about lines longer than <n> characters. Without this option a line length of 80 is assumed. This option is irrelevant when writing back files since lines are not yet broken automatically.
-m <n>
This option controls consecutive blanks between tokens. By default multiple blanks are currently only allowed for indentation and before a line comment. (Line comments are expected to be aligned, but that isn't checked yet.) In case you have your code aligned in a tabular fashion, you may increase the number of consecutive blanks, e.g. to 10. This does not check your alignment; it only tolerates your gaps. Multiple blanks following the keywords do, let, of, or where are considered "suspicious" regarding layout and are always reported, unless you use the -m0 option to switch off all hints about multiple blanks.
-s <b>
This option allows you to switch off (or on again) checking and changing the spacing around infix symbols and after commas which is the actual purpose of this tool. But maybe you only want to adjust your comments.
-c <b>
This option allows you to switch off (or on again) checking and changing of your line and block comments that are not pragmas or DrIFT directives.
-C <b>
Plain block comments at the end of a single line are converted to line comments. Pass -Cf to switch this off.
-j <b>
Line or block comments on consecutive lines will be joined together into a single block comment. Thereby also some wrong haddock markup will be removed. This feature may be switched off.
-g <n>
When joining consecutive comments the comment start delimiter of the second comment will be deleted. In order to keep your text aligned you may want to replace this delimiter by blanks. Pass -g3 to exactly replace the removed delimiter.
-b <n>
This option sets the limit for the maximal number of consecutive blank lines. The default value is 2. If you pass -b0, blank lines are not checked. Note that initial blank lines are counted, but more than one final line will be reported or removed (except for -b0).
-i[<ext>]
This and the following options will not report suggestions but apply some of them to your input file, possibly destroying your layout, see below. If you supply an extension, e.g. by -ibak, a backup of your original input file (ending with .bak) will be created.
-e <ext>
With this option the original file will not be overwritten. Instead a file with the given extension will be created (possibly destroying an existing one).
-o <file>
This is an alternative option to write out the modifications, leaving the input file unchanged (if the output file name is different). Using "-ibak -o temp.hs" in this order will create a backup file and only write to the temp.hs file.
-O <dir>
This is yet another way to specify an output file. The directory name will be joined with the input file name.
If options of the same kind are given multiple times only the last one will apply. (Canceling the -w or -t option is not possible by subsequent options.)
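The output options can be combined with diff to review changes before touching the original file. The following is a minimal sketch under stated assumptions: the function name preview_changes is hypothetical, mktemp is assumed to be available, and nothing is assumed about scan's exit status.

```sh
#!/bin/sh
# Hypothetical helper: write the adjusted source to a temporary file via
# -o and show the differences, leaving the input file untouched.
preview_changes() {
    input=$1
    tmp=$(mktemp) || return 1
    # -o sends the modified input to the given file instead of the original.
    scan -o "$tmp" -- "$input"
    # diff returns 0 if nothing would change and 1 if there are differences.
    diff -u "$input" "$tmp"
    status=$?
    rm -f "$tmp"
    return $status
}
```

Calling preview_changes Foo.hs prints a unified diff of the changes that a write-back would apply; if the diff looks harmless, the same file can then be processed with -i or -ibak.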

Examples

The diagnostics created by scan applied to Examples.hs are shown in scan.log whereas the differences that will be created by "scan -i Examples.hs" are shown in diff.log. If you additionally pass the -t option to both calls of scan, there would be no hint to insert a blank before "(" and no such change would be applied to the possible splice "$(".

Emacs integration

The emacs integration is similar to compilation-mode in conjunction with the haskell emacs mode. It is an adaptation of hs-lint.el called hs-scan.el. It allows navigation between messages using M-g n and M-g p (or M-g M-n and M-g M-p) or by clicking on a position in the *hs-scan* buffer. A possible entry for your .emacs may contain the following snippet:
(load "~/emacs-modes/haskell-mode/haskell-site-file")
(load-file "~/emacs-modes/hs-scan.el")
(load-file "~/emacs-modes/hs-lint.el")
(defun my-haskell-mode-hook ()
 (global-set-key [f6] 'hs-lint)
 (global-set-key [f7] 'hs-scan))
(add-hook 'haskell-mode-hook 'my-haskell-mode-hook)
Pressing F7 on a haskell source file would invoke scan and create the *hs-scan* buffer.

Related application

Another inspiration for the scanner was hlint, which is based on a parser from haskell-src-exts and also gives you hints to improve your source code on the expression level.

Combined application

Assuming that you install scan and hlint using "cabal install" (in your local repository), you can use a combined checking shell script:
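#!/bin/sh
# Run scan and hlint on every file given on the command line.
# "$@" and "$f" are quoted so that file names with blanks survive.
for f in "$@"
do
 "$HOME/.cabal/bin/scan" "$f"
 "$HOME/.cabal/bin/hlint" "$f"
## haddock -w "$f"
done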
You may add some -i options to hlint if you want.

Calling haddock on a file without knowing the ghc compiler flags is usually a bad idea, but might be used to spot haddock parsing problems earlier. (Unfortunately, not only parsing but also scanning may depend on the language extensions chosen, which scan does not consider.)

Writing back files

The -i, -e, -o, or -O options of scan allow you to write back an adjusted source file, which may, however, no longer be valid haskell due to insertion or deletion of blanks or comment delimiters! At the moment the changes applied are those governed by the -m, -s, -c, -C, -j, and -b options, plus the removal of tabs and trailing white spaces.

It needs to be pointed out that the notion of infix operators is broader here than the haskell notion of it. The key symbols like |, \, =, <-, ->, =>, :: are also treated like infix operators (but not !, #, @, ~, and not $ with the -t option).

The unary and binary minus are distinguished. For example, (-x) requires no space but y - x does.

Warning

The layout may be destroyed if there are multiple blanks between the keywords do, let, of, or where and the actual layout start. Therefore you may consider breaking the line after do, of, or where. Starting a line with let and indenting further equations by four blanks is fine, though! Leaving two blanks between do and the first statement in order to indent subsequent statements by four blanks is not supported by scan (and is therefore not a good idea).

To keep the risk of loss low, only a single file argument on the command line is accepted when an adjusted file is to be written out.
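Since only one file is accepted when writing back, several files have to be processed by separate calls. The following is a minimal sketch, assuming a POSIX shell; the function name scan_each_in_place is hypothetical. It uses -ibak so that each original is kept as a backup ending with .bak, which helps if the layout is damaged.

```sh
#!/bin/sh
# Hypothetical helper: apply scan's changes to several files, one call
# per file, keeping a .bak backup of every original.
scan_each_in_place() {
    for file in "$@"; do
        # "--" protects against file names that look like options.
        scan -ibak -- "$file"
    done
}
```

A call such as scan_each_in_place src/*.hs would then rewrite each file in turn. Comparing each result against its backup before deleting the .bak files is advisable.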

Things to do

Versions overview

scan-0.1.0.9
scan-0.1.0.8
scan-0.1.0.7
made it use any parsec version
scan-0.1.0.6
scan-0.1.0.5
made it compilable with ghc-6.12.2:
src/scan.hs:1:0:
    The main function `main' is not exported by module `Main'
scan-0.1.0.4
scan-0.1.0.3
scan-0.1.0.2
scan-0.1.0.1
initial version
scan-0.1.0.0
non-compilable upload (missing source file):
src/scan.hs:24:7:
    Could not find module `Language.Haskell.Scanner':