Comments on CCPN data model 0.9.6
---------------------------------

Tom Goddard
Dec 5, 2001

I downloaded the CCPN 0.9.6 data model distribution and tried reading
the sample XML files with the provided Python code.  I also wrote
Python code to convert the CCPN data model objects to Sparky data
objects.  Here are problems I encountered and comments.

Web documentation
-----------------

The web site documentation for CCPN 0.9.6 isn't organized for someone
new to the data model.  The Data Model Figures (pdf files) and 
Documentation sections were the most useful for me but they are not
at the top.  I think it would make the CCPN data model a lot easier
to approach if you provided lead-in documentation that does not
include any Python code or implementation details.  In other words
make the figures and the CcpnNmrData and CcpnNmrRefData documentation
that describes the classes come first in your documentation.  Also
I would filter out those auto-generated implementation classes.  And
I would give a high level descriptive overview covering some major
classes (Molecule, Experiment, ...).  Next would come the API text
description, then example data, and last the links to specific Python
files with brief descriptions.

Figures
-------

The figures on the web have open diamonds to indicate the parent/child
tree structure.  Apparently some diamonds are left out as obvious.  This
was confusing.  The figures at the workshop with all the open diamonds
shown were clearer.  The new figures haven't made it to the web site.
The UML figure legend text will need updating to say the open diamonds
are not optional.

Why does every attribute in the figures have a "+" sign in front of it?
This makes the figures less readable.

Python XML parser
-----------------

Python 2.1 does not include an XML SAX parser.  It is necessary to
install the PyXML package to get the expat parser.  You have a note
suggesting this might be the case on your web site.  You should say
that PyXML is necessary and provide a link to PyXML.  Python
documentation does not seem to say anywhere whether an XML parser is
included or not.
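
A start-up check along the following lines would let the CCPN code say
plainly when no SAX parser is installed.  This is only a sketch written
for a current Python; the function name sax_parser_available is made up
for illustration and is not part of the CCPN distribution.

```python
import xml.sax
from xml.sax import SAXReaderNotAvailable


def sax_parser_available():
    """Return True if xml.sax can create a parser on this installation."""
    try:
        xml.sax.make_parser()
    except SAXReaderNotAvailable:
        # No SAX driver (for example expat) could be loaded.
        return False
    return True


def parser_status_message():
    """Build a message suitable for printing when the reader starts."""
    if sax_parser_available():
        return "SAX parser found"
    return "No SAX parser found; install the PyXML package"
```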

CCPN not a Python package
-------------------------

The first thing I wanted to try was to read the sample data in Python.
I had to set the PYTHONPATH environment variable to point to my CCPN
directory.  I'd prefer if the CCPN distribution were a Python package
so I could put it in lib/python2.1/site-packages/ccpn and use "import
ccpn".  This avoids namespace pollution, important when I try to use
the data model Python code in Sparky.  I tried making it a package by
adding an __init__.py file.  That failed because an XML parser
callback tries to import a CCPN file and can't find it.  I didn't go
further and change that "import xxx" to the needed "from ccpn import xxx".
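
The package-qualified import can be illustrated with a throwaway package.
This is a sketch under assumptions: the package name ccpn_demo and the
modules helper and callback are hypothetical stand-ins, not files from
the CCPN distribution.  The callback module imports its sibling through
the package name, which is the change the parser callback would need.

```python
import importlib
import os
import sys
import tempfile

PACKAGE = "ccpn_demo"  # hypothetical stand-in for a real "ccpn" package


def write_file(path, text):
    with open(path, "w") as f:
        f.write(text)


def build_demo_package(parent_dir):
    """Create PACKAGE with an __init__.py and two small modules."""
    pkg_dir = os.path.join(parent_dir, PACKAGE)
    os.mkdir(pkg_dir)
    write_file(os.path.join(pkg_dir, "__init__.py"), "")
    write_file(os.path.join(pkg_dir, "helper.py"), "VALUE = 42\n")
    # The callback imports its sibling through the package name,
    # not with a bare "import helper".
    write_file(os.path.join(pkg_dir, "callback.py"),
               "def load_helper():\n"
               "    from %s import helper\n"
               "    return helper.VALUE\n" % PACKAGE)


def run_demo():
    """Import the callback from the package and call it."""
    with tempfile.TemporaryDirectory() as parent:
        build_demo_package(parent)
        # Stands in for installing the package under site-packages.
        sys.path.insert(0, parent)
        try:
            importlib.invalidate_caches()
            callback = importlib.import_module(PACKAGE + ".callback")
            return callback.load_helper()
        finally:
            sys.path.remove(parent)
```

Only the directory containing the package has to be on the path, so
none of the package's module names appear at top level.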

Data model browser
------------------

I wanted to look at the Python data model objects for the example
data.  I ended up writing Python code to print the parts of the data model I
am interested in (data source, cross peaks, assignments).  I wrote a
few hundred lines of Python code.  That was a very difficult way to
inspect data model objects.  Perhaps CCPN could provide a graphical
browser using Python/Tk.  It would start at Study and show all attribute
values and all links you can traverse.  You could traverse links by
clicking with the mouse.  This is a good way for someone new to the
data model to get familiar with it.
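
Even a generic text dump would cover much of this, and a Tk browser
could be built on the same traversal.  The sketch below is text-only,
not the graphical browser suggested above.  It assumes data model
objects keep their attributes and links as ordinary Python instance
attributes, which I have not verified for the CCPN classes.  The Study
and Experiment classes here are simplified stand-ins, and the attribute
names name and experiments are made up.

```python
def dump(obj, name="root", depth=0, max_depth=3, seen=None, out=None):
    """Collect one text line per attribute, following links to objects."""
    if seen is None:
        seen = set()
    if out is None:
        out = []
    indent = "  " * depth
    if not hasattr(obj, "__dict__"):
        # Plain value (string, number, ...): show it and stop.
        out.append("%s%s = %r" % (indent, name, obj))
        return out
    out.append("%s%s: <%s>" % (indent, name, obj.__class__.__name__))
    if id(obj) in seen or depth >= max_depth:
        return out  # already shown, or too deep: do not descend again
    seen.add(id(obj))
    for attr in sorted(vars(obj)):
        value = getattr(obj, attr)
        if isinstance(value, (list, tuple)):
            # A to-many link: visit each linked object in turn.
            for i, item in enumerate(value):
                dump(item, "%s[%d]" % (attr, i), depth + 1,
                     max_depth, seen, out)
        else:
            dump(value, attr, depth + 1, max_depth, seen, out)
    return out


class Study:  # simplified stand-in, not the CCPN class
    def __init__(self, name):
        self.name = name
        self.experiments = []


class Experiment:  # simplified stand-in, not the CCPN class
    def __init__(self, name, ndim):
        self.name = name
        self.ndim = ndim
```

Calling dump(study) returns a list of indented lines that can be
printed or loaded into a tree widget.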

Undefined variable
------------------

My code to print data model objects at one point tried to traverse a
nonexistent link.  The CCPN code then threw an exception while trying
to print an error message because of an undefined variable.

CcpnAPIbase.py line 917, variable "link" not defined:

            print ("Warning, invalid link reference in link %s for obj. %s,%s"
                   % (link,self.treeroot.name,self.ID) )
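
The failure mode is easy to reproduce outside CCPN.  The sketch below
uses made-up function and variable names.  I do not know which variable
CcpnAPIbase.py should be using in place of "link", so this shows only
the pattern and not the actual fix.

```python
MESSAGE = "Warning, invalid link reference in link %s for obj. %s,%s"


def report_bad_link_buggy(root_name, obj_id):
    # 'undefined_link' is never assigned anywhere, so building the
    # warning raises NameError and hides the original problem.
    return MESSAGE % (undefined_link, root_name, obj_id)


def report_bad_link_fixed(link_name, root_name, obj_id):
    # The link name is passed in, so the warning can always be built.
    return MESSAGE % (link_name, root_name, obj_id)
```

The mistake sits in an error path that only runs on bad input, so it
goes unnoticed until someone actually follows an invalid link.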


Alphabetical order for class documentation
------------------------------------------

The Ccpn API descriptions of all the classes aren't entirely in
alphabetical order.  This makes it hard to find classes.

Experiment/DataSource questions
-------------------------------

In writing code to convert CCPN Experiment/DataSource into Sparky objects
there were many questions I could not resolve using the documentation.
Many were about the definition of attributes.  Some questions were answered
at the workshop, but I include them here since the documentation should
answer them.

Do multiple DataSources for an Experiment mean different processed versions?
Is DataSource.ndim equal to Experiment.ndim?
How do I know the file format for DataSource.filename?
Does DataDim.dim equal ExpDim.dim?  If not does it reflect permutation of axes?
How do I get sweep widths?
What is ExpDimRef.maxValue, ExpDimRef.minValue?
Are FreqDataDim.npointsOrig and DataDimRef.valuePerPoint adequate to compute
  sweep width?
What are the units of DataDimRef.valuePerPoint?  Are they ExpDimRef.units?
What strings can ExpDimRef.units be?  PPM? ppm? Ppm? Hz? hz? point?
Why does ExpDimRef link to more than one isotope?  Unusual experiments?
How does ExpDimRef distinguish shift by sweep width type folding from 
  reflection type folding?
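
If the answer to the sweep width question is yes, the computation I
would expect is the one sketched below.  This is a guess at the intended
meaning of the attributes, not something the documentation states.  The
function names are made up.  The spectrometer frequency argument is a
hypothetical input, since I did not find where the data model stores it.

```python
def sweep_width(npoints_orig, value_per_point):
    """Sweep width in the units of value_per_point.

    Assumes npoints_orig is the point count before any points were
    discarded and value_per_point is the spacing between points.
    """
    return npoints_orig * value_per_point


def ppm_to_hz(width_ppm, spectrometer_freq_mhz):
    """Convert a width in ppm to Hz.

    spectrometer_freq_mhz is the observe frequency for the nucleus on
    this axis, so 1 ppm corresponds to spectrometer_freq_mhz Hz.
    """
    return width_ppm * spectrometer_freq_mhz
```

For example, 1024 points at 0.01 ppm per point would give a 10.24 ppm
sweep width under this reading.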

Crosspeak/Assignment questions
------------------------------

What are the precise meanings of AbstractXpk attributes volumeESU, heightESU,
  and figOfMerit?
Is XpkDim.widthPoints a half height width?
What is Residue.resCode?
Is AssignmentGroup intended to address unknown stereo-specific assignments?
  Are there any other intended uses?
What are possible values of SingleAssignment nuclGroupType and atomSiteType?
What are the possible/suggested values for ChemComponent.molType?

What is a CCPN Chain?
---------------------

What is a Chain?  CCPN documentation says "a specific instance of a given
molecule in a specific environment."  So a Chain is a Molecule in a specific
Condition?  A Chain is not a sub-part of a Molecule as in a PDB file?
Then what is the equivalent of a PDB chain ID?  Maybe PolymerSequenceBlock?
However, PolymerSequenceBlock has an organismName and organismScientificName
but no one-letter code (from, say, a PDB entry).  Also MolResidue has a seqID
but no PDB-style chain ID.  How is the PDB molecule chain represented in the
CCPN data model?

