Haskell style scanner
The program was inspired by a Python script (called
haskell_style_check) from
www.cs.caltech.edu/courses/cs11/material/haskell/misc/haskell_style_guide.html
that reports:
- TABS
- LINE IS TOO LONG
- PUT SPACE AFTER COMMA
- PUT SPACE AROUND OPERATORS
- PUT SPACE AFTER OPEN COMMENT
The program scan is supposed to serve the same purpose. The main difference is
that the scanner is more aware of Haskell tokens and comments (and that it is
written in Haskell).
Since version 0.1.0.6 some comment checking and changing is also supported.
For example,
    -- |The 'square' function squares an integer.
    -- |It takes one argument, of type 'Int'.
can be changed to:
    {- |The 'square' function squares an integer.
    It takes one argument, of type 'Int'. -}
or, with the option -g4, to:
    {- |The 'square' function squares an integer.
        It takes one argument, of type 'Int'. -}
Note that the wrong haddock marker in the second line has been removed.
To avoid disaster when joining comments, the checker reports nested comments
and comment delimiters within line comments.
A block comment on a single and separate line can be changed to a line comment.
Installation
scan can be installed locally by:
    cabal update
    cabal install scan
The binary scan will be installed into the directory ~/.cabal/bin, which you
should add to your search path. (Under Unix with a bash shell you would add
export PATH=~/.cabal/bin:$PATH to your file ~/.bashrc.)
(The darcs source repository is at http://code.haskell.org/style-scanner/.)
Usage
usage: scan [options] [--] <file>+
  -h         --help                    show usage message and exit
  -v         --version                 show version and exit
  -w         --windows                 create windows (CRLF) file
  -t         --template-haskell        no hints for $ in template haskell
  -l <n>     --line-length=<n>         report lines longer than <n> (default -l80)
  -m <n>     --multiple-blanks=<n>     report more than <n> blanks (default -m1)
  -s <b>     --check-spacing=<b>       check spacing around symbols (default True)
  -c <b>     --check-comments=<b>      check comment delimiters (default True)
  -C <b>     --change-comments=<b>     change to some line comments (default True)
  -j <b>     --join-comments=<b>       join consecutive comments (default True)
  -g <n>     --comment-gap=<n>         spaces between joined comments (default -g0)
  -b <n>     --blank-lines=<n>         remove more than <n> blank lines (default -b2)
  -i[<ext>]  --inplace-modify[=<ext>]  modify file in place (backup if <ext> given)
  -e <ext>   --extension=<ext>         create output file with given extension <ext>
  -o <file>  --output-file=<file>      output modified input to <file>
  -O <dir>   --output-directory=<dir>  output modified file to <dir>
scan expects at least one filename on the command line. The
corresponding (non-literate) Haskell source file is scanned and diagnostic
messages are output.
If scan seems to do nothing despite a valid filename, it simply means
that the file passed all style checks, which is a good sign but pretty unlikely
for Haskell code that has not been scanned before.
scan does not change your file unless you add one of the options
"-i, -e, -o, -O" that are explained below.
(In the unlikely case that you have filenames looking like options
they must follow the "--" option.)
The options -m0 -sf -cf -Cf -jf -b0 along with one option to write
back will only remove your tabs and trailing white spaces. Along with the
option -w you would get proper CRLF line endings on windows. To ensure
a single final newline, use a reasonable number for the -b option.
The boolean flag values can be given by any non-empty upper- or lowercase
prefix of "True" or "False", therefore the "f" in -sf switches off
checking spacing around symbols.
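The prefix rule for boolean flag values can be sketched in a few lines of Haskell. This is only an illustration of the rule as described above, not code taken from scan; the name parseBoolFlag is hypothetical, and the sketch assumes that mixed-case input such as "tRu" is accepted as well.

```haskell
import Data.Char (toLower)
import Data.List (isPrefixOf)

-- | Hypothetical helper (not from scan's source): read a flag value as a
-- non-empty, case-insensitive prefix of "True" or "False".
parseBoolFlag :: String -> Maybe Bool
parseBoolFlag s
  | null s = Nothing
  | l `isPrefixOf` "true" = Just True
  | l `isPrefixOf` "false" = Just False
  | otherwise = Nothing
  where
    l = map toLower s

main :: IO ()
main = mapM_ (print . parseBoolFlag) ["f", "T", "fal", "TRUE", "", "yes"]
```

Under this reading, the value "f" in -sf is accepted because it is a prefix of "False", while an empty value or an unrelated word such as "yes" is rejected.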
Options
- -w
- This option assumes that you work with windows files that have two
characters (CR and LF) as line delimiters. The diagnostics and a finally
written file are adjusted accordingly.
- -t
- This option supports Template Haskell source files. For example, it prevents
putting a space between a "$" and an open parenthesis "(",
since "$(" indicates a splice in Template Haskell and is not an infix
application of $. This option reports fewer messages and makes fewer
changes to output files if applied to normal Haskell source files.
- -l <n>
- This option complains about lines being longer than <n>
characters. Without this option a line length of 80 is assumed. This option
is irrelevant when writing back files since lines are not yet broken
automatically.
- -m <n>
- This option controls consecutive blanks between tokens. By default multiple
blanks are currently only allowed for indentation and before a line comment.
(Line comments are expected to be aligned, but that isn't checked yet.)
If your code is aligned in a tabular fashion, you may increase the number of
consecutive blanks, e.g. to 10. This does not check your alignment; it only
tolerates your gaps.
Multiple blanks following the keywords do, let, of,
or where are considered "suspicious" regarding layout and are always
reported, unless you use the -m0 option to switch off all hints
about multiple blanks.
- -s <b>
- This option allows you to switch off (or on again) checking and changing
the spacing around infix symbols and after commas which is the actual purpose
of this tool. But maybe you only want to adjust your comments.
- -c <b>
- This option allows you to switch off (or on again) checking and changing
of your line and block comments that are not pragmas or DrIFT directives.
- -C <b>
- Plain block comments at the end of a single line are converted to line
comments. Pass -Cf to switch this off.
- -j <b>
- Line or block comments on consecutive lines will be joined together into a
single block comment. Thereby also some wrong haddock markup will be removed.
This feature may be switched off.
- -g <n>
- When joining consecutive comments, the comment start delimiter of the second
comment will be deleted. In order to keep your text aligned you may want to
replace this delimiter by blanks. Pass -g3 to exactly replace the
removed delimiter.
- -b <n>
- This option sets the limit for the maximal number of consecutive blank
lines. The default value is 2. If you pass -b0, blank lines are not
checked. Note that blank lines at the start of the file are counted too,
whereas any blank line at the end of the file will be reported or removed
(except for -b0).
- -i[<ext>]
- This and the following options do not report but apply some
suggestions to your input file, possibly destroying your layout
(see below). If you supply an extension,
e.g. -ibak, a backup of your original input file (ending
with .bak) will be created.
- -e <ext>
- With this option the original file will not be overwritten. Instead a file
with the given extension will be created (possibly destroying an existing
one).
- -o <file>
- This is an alternative option to write out the modifications, leaving the
input file unchanged (if the output file name is different).
Using "-ibak -o temp.hs" in this order will create a
backup file and only write to the temp.hs file.
- -O <dir>
- This is yet another way to specify an output file. The directory name will
be joined with the input file name.
If options of the same kind are given multiple times only the last one will
apply. (Canceling the -w or -t option is not possible by
subsequent options.)
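The "last one wins" rule for repeated options can be pictured as a left fold over the options in command-line order. The sketch below is an assumption about how such a rule could be modelled, not scan's actual option handling; the types Option and Config and the function configure are hypothetical, and only two of the numeric options are covered.

```haskell
-- | Hypothetical option type covering two of the numeric options.
data Option = LineLength Int | BlankLines Int

-- | Hypothetical settings record.
data Config = Config { lineLength :: Int, blankLines :: Int }
  deriving (Eq, Show)

-- | Defaults as listed in the usage message (-l80, -b2).
defaultConfig :: Config
defaultConfig = Config { lineLength = 80, blankLines = 2 }

-- | A later option of the same kind overwrites the earlier setting.
applyOption :: Config -> Option -> Config
applyOption c (LineLength n) = c { lineLength = n }
applyOption c (BlankLines n) = c { blankLines = n }

-- | Fold the options from left to right, starting from the defaults.
configure :: [Option] -> Config
configure = foldl applyOption defaultConfig

main :: IO ()
main = print (configure [LineLength 100, BlankLines 1, LineLength 72])
```

In this model, passing -l100 -b1 -l72 leaves a line length of 72 and a blank-line limit of 1.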
Examples
The diagnostics created by scan applied
to Examples.hs are shown
in scan.log whereas the differences that will be created
by "scan -i Examples.hs" are shown in diff.log.
If you additionally pass the -t option to both calls of scan, there
would be no hint to insert a blank before "(" and no such change would be
applied to the possible splice "$(".
Emacs integration
The Emacs integration is similar to compilation-mode in conjunction with the
Haskell Emacs mode. It is an adaptation of hs-lint.el called hs-scan.el.
It allows navigation
between messages using M-g n and M-g p (or M-g M-n
and M-g M-p) or by clicking on a position in the *hs-scan*
buffer. A possible entry for your .emacs may contain the following
snippet:
    (load "~/emacs-modes/haskell-mode/haskell-site-file")
    (load-file "~/emacs-modes/hs-scan.el")
    (load-file "~/emacs-modes/hs-lint.el")
    (defun my-haskell-mode-hook ()
      (global-set-key [f6] 'hs-lint)
      (global-set-key [f7] 'hs-scan))
    (add-hook 'haskell-mode-hook 'my-haskell-mode-hook)
Pressing F7 on a haskell source file would invoke scan and
create the *hs-scan* buffer.
Related application
Another inspiration for the scanner was hlint, which is based on a parser from
haskell-src-exts and also gives you hints to improve your source code on the
expression level, like:
- Redundant
  - brackets
  - lambda
  - return
  - do
  - if
  - $
- Use camelCase
- Eta reduce
- Use null, etc.
Combined application
Assuming that you install scan and hlint using "cabal
install" (in your local repository), you can use a combined checking
shell script:
    #!/bin/sh
    for f in "$@"
    do
      "$HOME/.cabal/bin/scan" "$f"
      "$HOME/.cabal/bin/hlint" "$f"
      ## haddock -w "$f"
    done
You may add some -i options to hlint if you want.
Calling haddock on a file
without knowing the ghc compiler flags is usually a bad idea, but
might be used to spot haddock parsing problems earlier.
(Unfortunately, not only parsing but also scanning may depend on the chosen
language extensions, which scan does not consider.)
The -i, -e, -o, or -O options of scan allow you to
write back an adjusted source file, which may, however, no longer be valid
Haskell due to insertion or deletion of blanks or comment delimiters! At the
moment the following changes are applied:
- spaces are inserted after commas (but not within comments or string
literals!)
- spaces are inserted around infix operators
- spaces between parentheses and infix operators within sections are
removed
- tabs are replaced by blanks
- trailing white spaces are removed
- more than two consecutive blanks (or n with the -m
option and n > 1) are removed
- more than two consecutive blank lines (or n with the
-b option and n > 0)
and all final blank lines are removed
- the file will end with a single final newline
- comment delimiters are adjusted to contain a single space
- single line block comments are turned into line comments
- comments on consecutive lines are joined together
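Some of the whitespace changes in the list above (tabs, trailing white space, blank lines, final newline) can be sketched as a small pipeline over the lines of a file. This is a rough illustration under stated assumptions, not scan's implementation: the function names are hypothetical, tab stops are assumed to be every 8 columns, and the argument n plays the role of the -b limit (with 0 leaving blank lines alone).

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- | Replace tabs by blanks, assuming tab stops every 8 columns.
expandTabs :: String -> String
expandTabs = go 0
  where
    go :: Int -> String -> String
    go _ [] = []
    go col ('\t' : cs) = replicate n ' ' ++ go (col + n) cs
      where
        n = 8 - col `mod` 8
    go col (c : cs) = c : go (col + 1) cs

-- | Keep at most n consecutive blank lines; n <= 0 switches the limit off.
limitBlankLines :: Int -> [String] -> [String]
limitBlankLines n
  | n <= 0 = id
  | otherwise = go 0
  where
    go :: Int -> [String] -> [String]
    go _ [] = []
    go k (l : ls)
      | null l = if k < n then l : go (k + 1) ls else go (k + 1) ls
      | otherwise = l : go 0 ls

-- | Untabify, strip trailing white space, limit blank lines, drop final
-- blank lines (unless n <= 0) and end the text with a single newline.
cleanUp :: Int -> String -> String
cleanUp n =
  unlines . trimEnd . limitBlankLines n
    . map (dropWhileEnd isSpace . expandTabs) . lines
  where
    trimEnd
      | n > 0 = dropWhileEnd null
      | otherwise = id

main :: IO ()
main = putStr (cleanUp 2 "a\t \n\n\n\n\nb\n\n\n")
```

The spacing and comment changes in the list need token-level knowledge of Haskell and are deliberately left out of this sketch.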
Note that the notion of infix operators is broader here than in Haskell
itself. Key symbols like |, \, =, <-, ->, =>, and :: are also treated like
infix operators (but not !, #, @, ~, and not $ with the -t option).
Unary and binary minus are distinguished. For example, (-x) requires no
space but y - x does.
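As a small illustration of these spacing rules, the following definitions use the forms described above: no blank after a unary minus, blanks around a binary minus, and no blank between a parenthesis and the operator of a section. The function names are made up for this example, and whether scan reports nothing at all for this file has not been verified.

```haskell
-- | Unary minus: no blank between the sign and its operand.
negated :: Int -> Int
negated x = (-x)

-- | Binary minus: blanks on both sides of the operator.
difference :: Int -> Int -> Int
difference y x = y - x

-- | Section: no blank between the open parenthesis and the operator.
increment :: Int -> Int
increment = (+ 1)

main :: IO ()
main = print (negated 3, difference 5 2, increment 1)
```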
Warning
The layout may be destroyed if there are multiple blanks between the
keywords do, let, of, or where and the actual layout
start. Therefore, consider breaking the line after do, of,
or where. Starting a line with let and indenting further
equations by four blanks is fine, though! Leaving two blanks
between do and the first statement in order to indent subsequent
statements by four blanks is not supported by scan (and is therefore not a
good idea).
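The following sketch shows a layout that follows this advice: the line is broken after do and where, and let starts its own line with the second equation indented by four blanks. The names are invented for the example. The risky form is only shown inside a block comment, since it is the kind of code that could be shifted out of alignment when multiple blanks are reduced.

```haskell
-- | The line is broken after "where"; the bindings start the layout block.
message :: String
message = render greeting target
  where
    greeting = "hello"
    target = "world"
    render a b = a ++ ", " ++ b

-- | The line is broken after "do"; "let" starts a line and the second
-- equation is indented by four blanks relative to it.
main :: IO ()
main = do
  let shout = message ++ "!"
      quiet = message ++ "."
  putStrLn shout
  putStrLn quiet

{- Risky layout, with two blanks after "do". If they were reduced to a single
blank, the first statement would no longer line up with the second one:

main = do  putStrLn "a"
           putStrLn "b"
-}
```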
To keep the risk of data loss low, only a single file argument on the command
line will be accepted when an adjusted file should be written out.
Things to do
- add a separate switch for warnings about semicolons.
They are part of the -c switch, currently.
- delete single line comments without text
- remove bug regarding magic hash for instance in unsafeCoerce#
- treat haddock comments better that are directly preceded by plain comments
- check alignment on ::, =, ->, <-, -- signs and reduce
complaining about multiple blanks
- maybe complain about multiple left hand sides for functions and
suggest case expressions
- do not treat $ like a section if used as TH splice
- do not complain that a block comment could be a line comment when
the block comment was internally created from a line comment for subsequent
joining
Versions overview
- scan-0.1.0.9
  - read (and write) files in binary mode to avoid failures on latin1
    files
  - also allow compilation with parsec1
- scan-0.1.0.8
  - recognize quasi-quotes
  - leading spaces in comments are preserved when joining without checking
    comments using the -cf flag
  - template haskell quotes (aka [e|) are recognized if
    the -t flag is supplied
- scan-0.1.0.7
  - made it use any parsec version
- scan-0.1.0.6
  - added options
    - for windows files
    - to switch off spacing of code
    - to adjust comments
    - for the maximal number of blank lines
    - to further control output files to be written
  - more robust option parsing
  - check for nested comments
  - check for comment delimiters in line comments
  - turn short block comments into line comments
  - join consecutive comments into one block comment
  - consider haddock markup, pragmas and DrIFT directives
  - minor bug fixes
- scan-0.1.0.5
  - made it compilable with ghc-6.12.2:
    src/scan.hs:1:0:
    The main function `main' is not exported by module `Main'
- scan-0.1.0.4
  - added options
    - for version and usage messages
    - to set the maximal line length
    - to support template haskell source files
    - to suppress hints on multiple blanks (-m0)
    - to control output and backup files written
  - removed "-" option
    (fails with openFile: does not exist)
  - check if top-level code starts in column 1
- scan-0.1.0.3
  - improved messages
  - untabify and remove trailing white spaces of the whole file that
    is written back
  - recognize carriage returns of windows files and remove them when
    written back
- scan-0.1.0.2
  - show positions like hlint does
  - added usage message and version string
  - allow scanning Unicode operators, template haskell quotes, magic double
    hashes
- scan-0.1.0.1
  - initial version
- scan-0.1.0.0
  - non-compilable upload (missing source file):
    src/scan.hs:24:7:
    Could not find module `Language.Haskell.Scanner':