[Takusen] Takusen package split and source re-org

Tue Jun 21 22:25:55 BST 2011

Hello all,

I'd like to resume work on Takusen, a little at a time, but one
barrier is that I cannot build the tests - I get some inscrutable
cabal error message. Rather than waste time investigating cabal
internals, I thought maybe this would be a good trigger to restructure
the entire project :-) Here's a proposal; please let me know if you
think it's any good (or if there are things you'd like done differently):

 1. split Takusen into a core package and backend-specific packages
 2. separate tests from the main src tree (but keep them in the same
package); have src and test folders in the repo
 3. rename modules to use the product name as the top-level namespace

---------------------------------------------------------

Regarding (1): people have said that it's wrong for Takusen to have a
setup that changes the API depending on what you have installed, and
I'm inclined to agree. So I think we should split Takusen into these
separate packages:
 - takusen-core
 - takusen-odbc
 - takusen-sqlite
 - takusen-postgres
 - takusen-oracle

This should make it easier to install, and make Setup.hs much simpler -
trivial, even - which is good from a maintenance POV.

The odbc, sqlite, etc packages would have takusen-core as a package
dependency. These packages would contain both the low-level FFI
module, and the left-fold Enumerator module.
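
As a rough illustration of what a backend package description might
look like under this split, here is a minimal sketch for the ODBC
backend. The version number, the dependency bounds and the
extra-libraries entry are placeholders (the native library name
differs by platform), and the module names assume the Takusen.*
renaming discussed in (3):

```cabal
-- odbc/takusen-odbc.cabal (sketch only; values marked below are assumptions)
name:               takusen-odbc
version:            0.1                 -- placeholder version
cabal-version:      >= 1.8
build-type:         Simple              -- the default, trivial Setup.hs suffices

library
  hs-source-dirs:   src
  -- both the left-fold Enumerator module and the low-level FFI module
  exposed-modules:  Takusen.ODBC.Enumerator
                    Takusen.ODBC.OdbcFunctions
  build-depends:    base >= 4 && < 5,   -- placeholder bounds
                    takusen-core
  extra-libraries:  odbc                -- assumption: unixODBC naming
```

With build-type: Simple there is no need for custom detection logic in
Setup.hs; installing a backend just requires its native client library
to be present.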

We could still keep the single source repo, and just have a separate
source subtree for each package. This would mean quite a bit of repo
reorganisation, but it's just darcs moves and renames, so should be
pretty simple, and we get to keep the source history.

Jason has suggested also packaging the low-level FFI modules as
separate libraries on hackage; if we did that then the package
structure becomes very fine-grained:
 - takusen-core
 - takusen-odbc
 - takusen-odbc-ffi
 - takusen-sqlite
 - takusen-sqlite-ffi
 - takusen-postgres
 - takusen-postgres-ffi
 - takusen-oracle
 - takusen-oracle-ffi

Regarding (2): it seems tidier to have the tests in a separate module
namespace and source tree.

Regarding (3): The Database.* namespace is a bit too generic. The
convention seems to be to use a product name as the top-level
namespace e.g. Takusen.*

So in takusen-core we might have these modules:
 Takusen.Enumerator
 Takusen.InternalEnumerator
 Takusen.Util
 Foreign.C.UTF8
 Control.Exception.Extensible
 Control.Exception.MonadIO

and then in takusen-odbc:
 Takusen.ODBC.Enumerator

and in takusen-odbc-ffi:
 Takusen.ODBC.OdbcFunctions

etc.

So I guess the new repo structure would be a bit like this:

takusen\
 _darcs\
 LICENSE

 core\
   src\
     Control\
       Exception\
         Extensible.hs
         MonadIO.hs
     Foreign\
       C\
         UTF8.hs
     Takusen\
       Enumerator.lhs
       InternalEnumerator.lhs
       Util.lhs
   test\
     Takusen\
       Enumerator.lhs
       Performance.lhs
       Util.lhs
     Test\
       MiniUnit.hs
       MiniUnitTest.hs
     Main.hs
   Setup.hs
   takusen-core.cabal

 odbc-ffi\
   src\
     Takusen\
       ODBC\
         OdbcFunctions.hsc
   test\
     Takusen\
       ODBC\
         OdbcFunctions.hsc
     Main.hs
   Setup.hs
   takusen-odbc-ffi.cabal

 odbc\
   src\
     Takusen\
       ODBC\
         Enumerator.lhs
   test\
     Takusen\
       ODBC\
         Enumerator.lhs
     Main.hs
   Setup.hs
   takusen-odbc.cabal

..etc.

There's plenty of other stuff to fix, obviously, but I think this will
give us a better base. Just as a note, here's a list of some of the
things that I think should be fixed/improved; do you have any to add?:

 - separate core and backend packages
 - make low-level FFI code separate hackage packages
 - use QuickCheck2 in tests
 - fix Oracle bind buffer premature release bug
 - Oracle resource leak: Ref Cursors (StmtHandles) not freed
 - Out bind parameters for ODBC
 - Multiple result-sets from ODBC procedure call
 - better result-set <-> iteratee validation. Check column types?
 - replace internal code with these hackage equivalents:
    - extensible exceptions
    - EMonadIO or MonadCatchIO-mtl (or MonadCatchIO-transformers)
    - regions?
    - iteratee?
 - implement fetchAll or something similar: fetch generic rows
    which can be introspected to determine datatypes.
    Useful for random/unknown queries.
 - support multiple (heterogeneous) database connections
 - look at converting from mtl to transformers
 - use hsc2hs to create #define constants from header files,
    rather than hard-code them.
 - Blob support (and clob?).
 - FreeTDS back-end.
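
To make the hsc2hs item above concrete, here is a small sketch of how
constants could be pulled from the ODBC header instead of being
hard-coded. It is an .hsc fragment (input to hsc2hs, not plain
Haskell), the Haskell-side names are made up for illustration, and it
assumes sql.h is on the include path (on Windows, windows.h would
probably need to be included first):

```haskell
-- Sketch of an .hsc fragment; hsc2hs replaces each #{const ...}
-- with the numeric value of the C macro when the package is built.
-- The names sqlSuccess, sqlNoData and sqlError are illustrative only.
module Takusen.ODBC.OdbcFunctions where

import Foreign.C.Types (CShort)

#include <sql.h>

-- ODBC return codes (SQLRETURN is a C short), taken from the header
sqlSuccess, sqlNoData, sqlError :: CShort
sqlSuccess = #{const SQL_SUCCESS}
sqlNoData  = #{const SQL_NO_DATA}
sqlError   = #{const SQL_ERROR}
```

Cabal runs hsc2hs automatically for modules with an .hsc extension, so
this should not complicate the build beyond needing the headers
installed.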

Alistair