Wednesday, March 29, 2006

The DAG/POSET conjecture and Ordered Bioinformatic Sequences: Sole Thread for Disclosure/Discussion

At the sci.math thread:

************** DAG/POSET consultancy fee conjecture: sole thread here for all future postings *****************

contributions from Stas Buysgin, Guenter Stertenbrink, Robin Houston, David Wagner, Jerry Rieper, and Stephen M. Fortescues have all pretty much established that "dim 2" DAGs other than pre-ordered finite rooted directed trees (and ordered forests trivially built therefrom) can be obtained by construction.  It remains to be seen whether any complete characterization of "dim 2" DAGs comes out of this or not, pending:

a) review of Jerry Rieper's second and third installments, to be posted shortly; b) review of David Wagner's approach (to be posted in mid-August); c) review of the approach that Guenter Stertenbrink will ultimately present.

It is extremely encouraging to the knowledgeable bio-informatic staff at Cumulative Inquiry that the "construction approach" needs to be taken, at least initially, for guaranteed ID of certain classes of "dim 2" DAGs. The reason is as follows.

(Before reading further, please take a moment to read the posting at sci.math which begins "Loyalty to a Research Aesthetic ...", because the following is a 5% disclosure of Cumulative Inquiry's "intellectual capital".)

Suppose we have four DNA bases {t,c,a,g} or their mRNA equivalents {u,c,a,g}, which I will use here. The alphabet {u,c,a,g} can be thought of as a four-letter alphabet {++, -+, --, +-} in THREE different ways, because biochemically these four bases are cross-cut by three orthogonal chemical dichotomies:

The W(eak):S(trong) (W:S) Dichotomy:

u,a are W(eak) bases and c,g are S(trong) bases, where W means two H-bonds in canonical Watson-Crick u:a base pairs and S means three H-bonds in canonical Watson-Crick c:g base pairs.

The pYrimidine:puRine (Y:R) Dichotomy:

u,c are pyrimidines; they lack the five-rings of the larger a,g purines.

The kEto:aMino (E:M) Dichotomy:

u,g are the keto bases, which have O's where the amino bases a,c have NH2's.

Combining any two of these dichotomies completely specifies the four bases. For example, if we rename the W:S dichotomy as +W:-W and the Y:R dichotomy as +Y:-Y, we obtain:

u: +W, +Y
c: -W, +Y
g: -W, -Y
a: +W, -Y

Renaming the E:M dichotomy as +E:-E, we obtain

u: +W, +E
g: -W, +E
c: -W, -E
a: +W, -E

and

u: +Y, +E
g: -Y, +E
a: -Y, -E
c: +Y, -E
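The three tables above can be reproduced mechanically. What follows is only a minimal sketch: the names PLUS, encode and encode_sequence are made up for illustration, and the sign conventions (+W = weak, +Y = pyrimidine, +E = keto) are read straight off the tables rather than taken from any Cumulative Inquiry code.

```python
# Hypothetical helper names; the sign conventions are read off the tables in the text:
# +W = weak (u, a), +Y = pyrimidine (u, c), +E = keto (u, g).
PLUS = {"W": set("ua"), "Y": set("uc"), "E": set("ug")}


def encode(base, first, second):
    """Return the two-sign symbol for one base under the dichotomy pair (first, second)."""
    def sign(dichotomy):
        return "+" if base in PLUS[dichotomy] else "-"
    return sign(first) + sign(second)


def encode_sequence(seq, first, second):
    """Translate an mRNA string over {u,c,a,g} into a list of two-sign symbols."""
    return [encode(base, first, second) for base in seq.lower()]


if __name__ == "__main__":
    for pair in (("W", "Y"), ("W", "E"), ("Y", "E")):
        print(pair, {base: encode(base, *pair) for base in "ucag"})
```

Each of the three pairings sends the four bases to four distinct symbols, which is the sense in which any two dichotomies "completely specify" the bases.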

Using the alphabet {++,-+,--,+-} to represent any of these three alphabets, we can obviously build "dim 2" DAGs from mRNA sequences simply by taking the elements of this alphabet as operations which add vertices successively to a given trivial graph of just one vertex:

++: adds a vertex (xi,yi) "above and to the right" of all existing vertices (xj,yj), i.e. (xi > xj) and (yi > yj), so (xi,yi) must be a descendant of every (xj,yj)

-+: adds a vertex (xi,yi) "above and to the left" of all existing vertices (xj,yj), i.e. (xi < xj) and (yi > yj), so (xi,yi) "comes before" every (xj,yj) in the trivial extension of the dim 2 poset of the DAG, but is not an ancestor of (xj,yj)

--: adds a vertex (xi,yi) "below and to the left" of all existing vertices (xj,yj), i.e. (xi < xj) and (yi < yj), so (xi,yi) is an ancestor of every (xj,yj)

+-: adds a vertex (xi,yi) "below and to the right" of all existing vertices (xj,yj), i.e. (xi > xj) and (yi < yj), so (xi,yi) "comes after" every (xj,yj) in the trivial extension of the dim 2 poset of the DAG, but is not a descendant of (xj,yj)
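The four operations can be sketched in a few lines. This is an illustration under stated assumptions, not CI's actual procedure: the function names are hypothetical, the starting vertex is placed at (0,0), new coordinates are chosen one unit beyond the current extremes, and the first sign is read as the x direction and the second as the y direction. A vertex a is treated as an ancestor of b exactly when a lies below and to the left of b, so the edge set produced is the full order relation rather than a reduced (Hasse) diagram.

```python
def build_points(ops, start=(0, 0)):
    """Place one vertex per operation, starting from a single vertex.

    Each op is a two-character string: the first sign picks right (+) or
    left (-) of all existing vertices, the second picks above (+) or below (-).
    """
    points = [start]
    for op in ops:
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        x = max(xs) + 1 if op[0] == "+" else min(xs) - 1
        y = max(ys) + 1 if op[1] == "+" else min(ys) - 1
        points.append((x, y))
    return points


def dominance_edges(points):
    """Return all (ancestor, descendant) pairs: a precedes b in both coordinates."""
    return {(a, b) for a in points for b in points
            if a[0] < b[0] and a[1] < b[1]}


if __name__ == "__main__":
    pts = build_points(["++", "-+", "--", "+-"])
    print(pts)
    print(sorted(dominance_edges(pts)))
```

In this small run the vertex added by "--" becomes an ancestor of every earlier vertex, while the vertex added by "+-" is comparable to nothing, matching the descriptions of the four operations.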

Using this concept of "dim 2 DAG-building" alone, Dr. Robert Jamison (Clemson) obtained in 1994-1996 an empirically useful notion of "tree (on a torus) with half-turn symmetry" that can be built via the above operations, and William F. Mann helped in 1998-199 to clarify the two basic types of mRNA sequences which build such "trees invertible under a half-turn" ("head-centered" and "bar-centered" sequences).

Furthermore, through laborious statistical effort from 1994-2003, Dr. Jacques Fresco (Princeton), Dr. Arthur Lesk (Cambridge, UK), and I were finally able to identify a grouping of the 20 amino acids into four groups {++,-+,--,+-}, so that the amino acid chains of protein primary structures can also be treated as sequences which build "dim 2" DAGs using the above four operations. (That is, an empirically productive and scientifically rationalizable grouping of the 20 amino acids into four groups {++,-+,--,+-}.)

But here is where the work of Stas/Guenter/Robin/Jerry/David comes in. Their constructions for building non-trivial "dim 2" DAGs (i.e. non-trees/non-forests) will permit Cumulative Inquiry (CI) to extend the sequence > DAG will specify certain DAGs and posets. In other words, SUBSEQUENCES of bases in mRNA sequences (not individual bases) will, when put together successively, build certain types of complex DAGs/POSETS, and the same for SUBsequences of amino acids (not individual amino acids) in protein primary structures.

So if CI's basic heuristic hypothesis is correct, then "formally interesting" graph/poset-building sequences will correlate well with "empirically interesting" mRNA coding sequences, and also with "empirically interesting" amino acid chains in protein primary structure. Furthermore, CI has some reason to believe that things will work out this nicely; see URL

(Otherwise, CI wouldn't be spending the money on the pure math research.)

And if the CI hypotheses are NOT correct, it's been a whole lot of fun that maybe has generated some clarifying insights into the dimensionality of "orderable DAGs."