Dec 18 2011


A quodlibet is a piece of music combining several different melodies, usually popular tunes, in counterpoint and often a light-hearted, humorous manner. The term is Latin, meaning “whatever” or literally, “what pleases.”

Quodlibet (QUOD) is a suite of network creation, editing and querying software I am currently developing. QUOD is a software application that displays biochemical pathway data in a way that is interactive and information intensive.

In addition to providing information about gene-protein interactions, QUOD also allows additional information to be harvested, including data on the effects of natural products on gene-protein expression.

A bit of the Adipocytokine network as displayed in QUOD. Amber nodes indicate high betweenness centrality and in-degree. Dark green tags indicate that a natural product has been associated with the regulation of that node.

Unlike most biochemical pathway/network depiction programs, QUOD actually ‘thinks’ in a social network sense: it analyzes the network and reports on many graph measures, including betweenness centrality, PageRank, and clustering coefficients. This allows an immediate understanding of which nodes play a critically important role in the network.
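To make two of these measures concrete, here is a minimal Python sketch that computes in-degree and (unnormalized) betweenness centrality for a small directed graph. It is not QUOD's code: the function names, the `toy_network` data, and the choice of Brandes' algorithm for unweighted graphs are all assumptions made for illustration.

```python
from collections import deque

def in_degree(graph):
    """Count incoming edges for every node of a directed graph."""
    counts = {node: 0 for node in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return counts

def betweenness_centrality(graph):
    """Unnormalized betweenness for a directed, unweighted graph (Brandes)."""
    nodes = set(graph) | {t for ts in graph.values() for t in ts}
    score = {n: 0.0 for n in nodes}
    for source in nodes:
        # Breadth-first search that counts shortest paths from `source`.
        order = []
        preds = {n: [] for n in nodes}
        n_paths = {n: 0 for n in nodes}
        dist = {n: -1 for n in nodes}
        n_paths[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph.get(v, ()):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies.
        dependency = {n: 0.0 for n in nodes}
        for w in reversed(order):
            for v in preds[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1.0 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    return score

# Hypothetical four-node network: A and B both act on C, which acts on D.
toy_network = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}
degrees = in_degree(toy_network)
centrality = betweenness_centrality(toy_network)
```

In this toy network node C has the highest in-degree and is the only node lying on shortest paths between other nodes, which is the kind of node the amber coloring is meant to flag.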

It was designed to be simple, easy to use and fun to edit and develop in. Because it is web-based, no special software is required, other than a modern browser and a decent Internet connection. QUOD runs under the DataPunk platform and is open-access. Curators can use the extensive editing tools to add to, alter, or create entirely new networks.

One of the more powerful aspects of QUODLIBET is its ability to highlight naturopathic procedures and agents that have been shown to exert an influence on the expression or function of elements in a molecular network.  This will have a major influence on the future practice of Generative Medicine, since complex patterns of relationships between naturopathic agents and procedures (traditional as well as biomedical) can be superimposed over the network analysis of complex molecular graphs so as to allow clinicians to derive extraordinarily high quality suggestions about specific approaches that may more closely approximate the holism of the Vis Medicatrix Naturae.

Since the project is community-based, we are always looking for volunteers who are interested in helping out. Volunteers need not have any medical or advanced computer skills, just a passion for exactitude and a desire to learn more about genetics and biology. To learn more about QUODLIBET, you can download the User Guide and visit the Community Forums.

What is needed now is a community of volunteers willing to devote time to curating additional maps. Hopefully this call to action will not prove too disappointing: I can think of no better way to enhance one’s own understanding and knowledge of a complex topic than to build mind-maps. Thus it is to the students of our profession that I direct this challenge, though any interested party with a computer and a decent Internet connection is welcome to help out.

Phase I of our map development program will involve translating the KEGG genomic and metabolic maps into QUOD format. KEGG maps are good and even hyperlink to relevant KEGG entries for enzyme activities, drugs, etc. Their limitation is that they are hand drawn and cannot perform network graph computations. Still, KEGG has done much of the heavy lifting: converting a KEGG map to a QUOD map makes development quick and painless. From there QUOD will link all subsequent maps into clustered networks, allowing for greater and greater information processing.


Nov 12 2011

Beautiful Data

It has been said that if one really wants to learn something, one should teach it. We may want to expand that aphorism: if one really wants to learn something, one should teach it, organize it or animate it.

The commonality between science and art is in trying to see profoundly – to develop strategies of seeing and showing. –Edward Tufte

One of the main goals of our Datapunk bioinformatics platform is the development of new and exciting information visualization (InfoViz) tools that can be used to develop new appreciations for the relationships between data. We are currently developing two new InfoViz platforms that I think have great potential. But more than that, these tools feature stunning interfaces that go a long way towards again proving that information, presented imaginatively, can not only yield amazing secrets, but can be stunningly beautiful as well.


The first InfoViz platform we developed is a full-bodied genomic network depictor called PathScrubber, which runs inside Datapunk. PathScrubber draws network graphs of gene-protein relationships. Each node in the network is clickable and links to a popup that provides information on that gene, through an API to OMIM (Online Mendelian Inheritance in Man). Perhaps more significantly, Datapunk is the first informatics tool that is harvesting scientific references detailing phytochemicals and dietary agents that have been reported in the literature to influence the expression of these gene-proteins.


Simply enter any gene-protein terms you wish to include in your network. Don’t worry about partial terms; PathScrubber will return a list of possible terms for you to consider and you can check which ones are appropriate in the next screen. After the program draws the network, you can zoom in or out with either the mouse or via a slider. Nodes containing genomic expression information on phytochemical or dietary agents are coded orange. PathScrubber has an extensive help page. PathScrubber is programmed in Perl, utilizing Léon Brocard’s GraphViz module to draw the graphs.

InfoViz Democratizers

This platform is designed to provide a venue that allows naturopathic physicians and researchers to animate their own data. One of the most exciting and important things we are developing is a set of guidelines for authors on how to structure their data so that we can easily port it into a stunning visual display. Most JavaScript data is encoded in a format called JSON. Although JSON was designed to be easily readable by humans when compared to other data structures, it paradoxically requires a heavy degree of ‘nesting’ of the data, which most non-programmers would find confusing and which makes data entry errors more likely. So we developed a simple language that only requires that the data be entered in a simple text file with a few codes. This is then parsed by Perl into JSON and piped to the page as HTML. Here are two examples:
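As a rough illustration of the idea (not the actual Datapunk format, whose codes are not described here), the following Python sketch assumes a hypothetical outline format in which two spaces of indentation mean "child of the line above", and converts it to nested JSON. The `name` and `children` keys and the sample labels are assumptions as well.

```python
import json

def outline_to_tree(text, indent="  "):
    """Turn an indented outline into nested {'name', 'children'} nodes."""
    root = {"name": "root", "children": []}
    stack = [(-1, root)]  # (depth, node) pairs for the current branch
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        stripped = line.lstrip(" ")
        depth = (len(line) - len(stripped)) // len(indent)
        node = {"name": stripped.strip(), "children": []}
        # Pop back up until the top of the stack is this node's parent.
        while stack[-1][0] >= depth:
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((depth, node))
    return root

# Illustrative input only; the labels are placeholders.
sample = """
Lectins
  Plant lectins
    Legume lectins
  Animal lectins
"""

tree = outline_to_tree(sample)
as_json = json.dumps(tree, indent=2)
```

The author only has to get the indentation right; the brackets, braces and commas that make hand-written JSON error-prone are generated by the parser.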

Lectins: Classification and Taxonomy
Our second infoviz tool is a depiction of the taxonomy and classification of known animal, plant and microbial lectins. Lectins are protein molecules that attach to sugars and modulate a variety of cell functions, including mitosis, agglutination, metastasis and infections. Most of the data for this infoviz is from my textbook, Fundamentals of Generative Medicine. Clicking on a node should move the tree and center that node. This infoviz makes extensive use of JavaScript, especially the JavaScript InfoVis Toolkit. The tree-like structure opens and closes as one clicks on the various categories.

Lectin Classifications

Radio buttons at the top allow the user to display the tree in different aspects and to choose between a ‘normal’ display, where the categories open up in a linear fashion, or a ‘centering’ mode, where the newly selected category moves to the center of the tree. Finally, branches of the tree often contain hyperlinks to additional information.

Actions of Medicinal Plants
Our third infoviz tool is a depiction of a paper developed by Eric Yarnell, ND entitled ‘A Compendium Pharmacological Actions of Medicinal Plants and Their Constituents.’ Clicking on a node should move the tree and center that node. This InfoViz also makes extensive use of JavaScript, especially the JavaScript InfoVis Toolkit, to display data in the form of a morphing hypertree. The centered node’s children are displayed in a relations list in the right column. The data set for this InfoViz is still in development.

Compendium of Medicinal Herbs


Nov 09 2011

When you point one finger, four usually point back at you.

There has been no shortage of supercilious, poorly researched articles about the blood type diets in the major media. They are usually provoked by a statement of support by some celebrity or media icon who has experienced success with the plan. Typically within a week my phone rings off the hook with one major media outlet or another needing ‘to talk with me as soon as possible because we are doing a story on your diet and are on deadline.’

This rushed approach characterizes almost every encounter I’ve had with major media. They gobble up whatever simple facts there are so as to explain the gist of the story, then go looking for an opposing viewpoint so they can get off being held responsible for any direct conclusions. This is called ‘balanced journalism.’

The article usually begins with a roll call of all the famous people who are following or have been on the diet. I usually have never heard of any of these folks, but my daughters often serve to tell me a bit about who they are. Then the diets are described, but never fully. Some reporters concentrate on the lectin-blood group specificity, others the anthropology, some the digestive differences in physiology. However, I’ve never seen a single article explain all aspects to some degree, which is of course the strongest argument for the theory: That it can be verified in multiple dimensions of analysis.

On the positive side, many articles feature a personal story about an average-type person who has had success with the diet. Usually the focus is weight loss, for obvious reasons. Very few of these articles profile people who have been cured of any physical diseases by adopting this way of eating. Again, this is due to legal issues. However, trying to heal or control a physical illness is the most common reason why people try the blood type diets.

Hard Facts from the Fiction Department

Finally, and this is almost always rolled out at the end, the article presents one or two nutrition experts to pass judgement on the merit, risk and need to follow a diet for your blood type. Unless the expert has had some exposure to the deeper, scientific basis for the theory (never) their comments are almost universally negative.

These comments usually fall into common categories:

  • The diets are dangerous. This statement is usually proffered by experts concerned that, by restricting certain foods by blood type, people will develop nutrient deficiencies. However, each diet variant (A, O, B and AB) is a carefully engineered balance of foods that ensures full nutritional value. This criticism has a long and hallowed record of institutional whoring for agribusiness: ‘whole grains are fine, except if you are gluten sensitive’, ‘high fructose corn syrup can be part of a balanced healthy diet’, etc. Curiously, this concern is often matched with the next:
  • Of course people get better, the diets are all healthy: If you tell someone to get off of diet soda, they will feel better. I actually have no problem owning this criticism. We do include a lot of good naturopathic food wisdom with the blood type recommendations. I fail to see the problem with that. Paradoxically, if one were to go back and read my earliest popular book, Eat Right For Your Type they would find perhaps some of the earliest references to the value of using grass-fed beef, spelt and sprouted breads, quinoa and amaranth grains and a host of other weird foods. Now these are all part of the popular culture, but back then nobody recommended these things. I’ve always thought that the basic benefit of using blood type as a guide to proper eating was its ability to let people know exactly which, of all the supposedly healthy diets, would be best for them.
  • There is no scientific validation. This criticism is particularly nefarious, since it says one thing but means another. To say that the blood type diet theory has not been tested in large numbers, via a double-blind study, at some neutral university, is in fact true. There are reasons why this has not occurred. One is the sheer size and cost of a study of this sort. Food and diet studies are notoriously difficult to control, require constant supervision, often must be pursued for years, and are enormously expensive. Now, on top of all that, multiply the work and expense by four. But let’s talk about what this criticism actually means. Most of the time what the critic is really implying is that the theory has no scientific basis, which is a form of intellectual dishonesty, since not one of these critics has ever taken the time to read through the extensive collation of existing research, much of it freely available online, that supports the conclusions drawn to form the basis of the blood type diet. I don’t spend a lot of time writing critiques of other diet theories, but if I were planning to do so, I’d probably pick up the phone, call the other guy, explain my concerns and see if I’d gotten my facts straight. Yet in the last fifteen years I’ve never received any such phone call.
  • This is pseudoscience. This pejorative term is often used in conventional science to tar and feather ideas and practices that have no basis in accepted science, such as a theory that flouts the basic laws of physics. Of course, one man’s pseudoscience is another man’s frontier science, but that is the continual rub of forward progress and most honest scientists agree that it is part of the game. However, usage of the term has increasingly become a favorite tactic of scoundrels to manufacture disinformation and stop interest and inquiry into a topic they might detest for any number of non-scientific reasons. By the way, there is nothing in the theory behind eating for your blood type that flouts the laws of physics, chemistry, immunology or physiology. On the contrary, that’s the problem: To talk intelligently about it requires that you know more than a little about these disciplines.

Case History

My office was recently contacted by Alexia Elejalde-Ruiz, a young reporter for the Chicago Tribune, and told that they were doing an article on the blood type diet. Having given literally hundreds, if not thousands, of interviews at this point, I have my PR person begin the process with a few questions, such as what prompted the interest. In this case the interest appeared to stem from the health editor, who is apparently a type O vegan and somewhat distraught by the conclusions drawn from my theories. However, the reporter was respectful and several emails went back and forth with the aim of addressing concerns from some of the nutrition experts. Since some were quite technical, we directed the reporter to various links that discussed those points in detail. However, none of our corrections were turned into teaching points (for example, ‘a lot of people often think this about the diet, but in fact..’). Instead it seemed that they just went down their list to the next concern.

My Spidey senses were tingling, and I suspected that the article already had a preconceived agenda. Not surprisingly, when released the article pretty much went out of its way to diss the entire concept. Because of an association with Gannett Publishing, this article has appeared in other publications as well.

Famously, one of the experts firing the dreaded Parthian Shot, a Michael Greger, MD, who is touted as the head of something called, was quoted as follows:

Dr. Michael Greger, founder of, said the premise of the blood-type diet is wrong: The blood-type system, which predates humans, is far more complicated than just ABO, he said. “People crave individualized, personalized science, but this is pseudoscience,” said Greger, a general practitioner specializing in clinical nutrition.

I will not dwell on the stupendously ignorant basis of Dr. Greger’s criticism, since it betrays a complete lack of understanding about how blood types function in the body. You can read my answer to a similarly uninformed vegan doc here. What is interesting is the use of the pseudoscience label. Not simply because it is being applied by a person ignorant in the basic science, but rather because a quick look at Dr. Greger’s website ( and his ostensibly important job as head of the rather questionable show instead a vegan-biased, rather jaundiced army of one, and certainly not the vaunted expert the Tribune article purports him to be. Although I certainly acknowledge the value of vegan diets for some people (but certainly not everyone) others argue that the vegan diet theory itself is a pseudoscience, hence the title of this blog.

Conclusions, if any.

Media profilers of science and health need to start vetting their so-called experts. Here’s a news flash: Many critics of diet books have their own diet books and dietary agendas to protect. You’d think as professional news media this would cross their minds, but apparently it doesn’t. Balanced journalism should not mean that you just go out and find someone to disagree with the premise of your coverage.


Oct 12 2011

Walking on Eggshells

After a self-declared coding holiday, I was back at things this weekend working on the Pathscrubber module of the Datapunk platform. A recently developed vexing problem that needed to be addressed was actually two problems intertwined. If you used Pathscrubber and clicked on any gene/protein node, PS would query Entrez-gene for the descriptive text and pull a bunch of theory and clinical stuff together and send it all out as a pop-up window. For some reason the response time (on their end) was unbearably slow. The second problem was a change to the interface between NCBI and the OMIM database. OMIM is run by Johns Hopkins and suddenly one day the NCBI query tool that PS uses to get the OMIM entry on any gene stopped working. It was certainly their problem since the NCBI’s own links do not work. However I discovered that OMIM was now available for download (something like 200 megabytes total).

Gotta love having an email address that ends in ‘.edu’!

Datapunk Logo.



However, there were problems with the data files, beyond the fact that they were incredibly huge. They are not in a common data file format, where each record is delimited by a carriage return (‘enter’) and each field in the record is delimited by a tab, comma or pipe (|) character. The OMIM gene records are a weird blend of individual lines that contain data and other lines that name fields, all of which are variable in length and appearance. I’ve dealt with files like these before (some KEGG files have this format) and you have to really work hard to code a way for Perl (the computer language I typically use) to tease out what you need. Fortunately, Perl has a vibrant community of programmers that produce different ‘modules’ that expand Perl’s capabilities. Thus you do not have to reinvent the wheel if someone has already done it.
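To show what this kind of file involves, here is a small Python sketch (rather than Perl) that groups lines into records when some lines are markers and others are data. The `*RECORD*` and `*FIELD*` markers, the field codes and the sample entry are assumptions made for the example, not a statement of the exact OMIM layout.

```python
def parse_tagged_records(lines):
    """Group marker-delimited lines into one dict per record."""
    records = []
    current = None   # record being built
    field = None     # name of the field currently collecting lines
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("*RECORD*"):
            current = {}
            records.append(current)
            field = None
        elif line.startswith("*FIELD*"):
            field = line[len("*FIELD*"):].strip()
            if current is not None:
                current[field] = []
        elif current is not None and field is not None:
            current[field].append(line)
    # Join each field's lines into a single string.
    return [{k: "\n".join(v).strip() for k, v in rec.items()} for rec in records]

# Made-up sample in the assumed layout; IDs and titles are placeholders.
sample = """*RECORD*
*FIELD* NO
100001
*FIELD* TI
EXAMPLE GENE; EXG
*FIELD* TX
First line of text.
Second line of text.
*RECORD*
*FIELD* NO
100002
"""

records = parse_tagged_records(sample.splitlines())
```

The parser has to carry state from line to line (which record, which field), which is exactly what makes these files harder to handle than one-record-per-line formats.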

One module I use a lot is called BIO::PERL. This has lots of cool interfaces and tools, including one that parses (reads) OMIM gene files. Normally that would be the end of the story. However, that BIO::PERL module, while doing a good job, was too slow, so I developed a work-around that involved using the module to tease out specific data, which was then re-organized and written to new data files indexed by the OMIM gene ID number. By the time I was done, I had four different data files that the program could query rapidly.
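One common way to make a flat file quick to query by ID is to record where each line starts and jump straight to it. The Python sketch below illustrates that idea only; the tab-separated layout, the function names and the sample rows are hypothetical, and nothing here is claimed to match the four files described above.

```python
import io

def build_offset_index(handle):
    """Map the first tab-separated column of each line to its byte offset."""
    index = {}
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:
            break
        key = line.rstrip(b"\n").split(b"\t", 1)[0].decode("utf-8")
        index[key] = offset
    return index

def fetch(handle, index, key):
    """Seek directly to the line for `key` and return its fields."""
    handle.seek(index[key])
    return handle.readline().decode("utf-8").rstrip("\n").split("\t")

# An in-memory stand-in for a pre-processed data file (placeholder rows).
data = io.BytesIO(b"100001\tEXAMPLE GENE\n100002\tANOTHER GENE\n")
index = build_offset_index(data)
row = fetch(data, index, "100002")
```

The slow parsing happens once, when the index is built; each later lookup is a single seek and one line read.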

One problem I encountered doing this was the exceedingly complex nature of the data returned from the BIO::PERL parser. Much of it was nested inside a series of ‘hash arrays.’ In the computer world an array is a place to store data, much like an egg carton stores eggs: once the eggs are in the carton, you can specify which egg you want by naming the column and row number of the egg you want. Easy enough, but in computer world, in addition to an egg (or an empty space), the location of any place in our egg carton can also contain the location of another egg carton!

This is how data often gains meaning from organization.
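The egg carton picture can be written out directly. In this Python sketch (the `carton` contents and the `count_eggs` helper are invented for illustration) one cell of the carton holds a second, smaller carton, and a recursive function counts eggs across both.

```python
# A carton with two rows and three columns; None marks an empty space.
carton = [
    ["egg", "egg", None],
    ["egg", ["egg", None], "egg"],  # row 1, column 1 holds another carton
]

def count_eggs(container):
    """Count eggs, descending into any carton stored inside a cell."""
    total = 0
    for item in container:
        if isinstance(item, list):
            total += count_eggs(item)
        elif item == "egg":
            total += 1
    return total

first_row_second_egg = carton[0][1]   # pick an egg by row and column
nested_egg = carton[1][1][0]          # follow the cell into the inner carton
total_eggs = count_eggs(carton)
```

Each extra pair of brackets in `carton[1][1][0]` is one more step into a nested container, which is why deeply nested parser output takes some care to unpack.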




Sep 24 2011

Simple Differences

The last century has seen science and technology used to justify any and all supremacist theories, culminating in the development of a pseudoscience called “Eugenics”, which advocated the improvement of society through what might be called selective breeding. Now, not all of the eugenic goals were crackpot and indeed many prominent scientists (including one of the greatest scientists of all, R.A. Fisher) allied themselves with the movement, at least in its early stages. Indeed one can still see some aspects of eugenic thinking in society’s use of prenatal testing and screening, genetic counseling and birth control.

However, eugenics had a far seedier side. For example, in July 1933 Germany passed a law allowing for the involuntary sterilization of “hereditary and incurable drunkards, sexual criminals, lunatics, and those from an incurable disease which would be passed on to their offspring.” Sweden, the USA, Canada, and virtually every non-Catholic country had Eugenic Societies. In the USA, immigration policies were motivated by the goals of eugenics, in particular a desire to exclude “inferior” races from the national gene pool.

As the human legacy of Nazism became known to the postwar scientific community, it shuddered at the consequences, and many scientists began to look upon genetics and anthropology as the very opposite of race-definers; they saw them instead as a way of showing just how bankrupt the notion of racial stereotyping was.

William Boyd and Isaac Asimov put the first modern scientific approach to race forward in a simple, readable, and completely forgotten book called Races and People. Written in 1955, it is an unabashed championing of the essential value of any human being. Asimov, well known to three generations of Science Fiction readers, had grown up Jewish in an era when significant portions of the world found anti-Semitism innocuous or even virtuous. Boyd, a blood type anthropologist, science fiction writer and the discoverer of the blood type specificity of certain lectins (talk about a life!), used research with blood groups to demonstrate that the superficial characteristics which so many of us use to define race and determine our value vis-à-vis other human beings are utterly without scientific basis. (1)

Publishing their book in a time when racial segregation and colonialism were still the norm and in the wake of terrible genocide, Boyd and Asimov set the pattern for all future anthropologic and genetic analysis of race. However, with the onset of those classic liberal values we so identify with the 1960’s and 1970’s and their effects in popular culture and academia, the pendulum began to swing the other way round. In scientific circles, race became a non-entity, possessing no significance whatsoever.

'Around a flowering tree, one finds many insects.' - Proverb from Guinea

Boyd later defined race as “not an individual, not a single genotype, but a group of individuals more or less from the same geographical area (a population), usually with a number of identical genes, but in which many different types may occur.” For Boyd, as with Livingstone, you got your racial characteristics from where you live more than from your genes, and this explained why the variability made the notions of race untenable. (2)

Rather than being racists themselves, I think we should consider the early blood group researchers rare beacons of tolerance in a world still coming to grips with the notion of equality for all.

However, just because you say something doesn’t exist doesn’t necessarily make it go away, and it is childish to think that we can contribute to the elimination of racism by putting our heads in the sand with the belief that there are no clearly defined races. One of the primary blood type/anthropology sources I’ve cited, Frank Livingstone, (3) even rejected the concept of race altogether. Livingstone suggested that explaining the variability in the frequency of any gene does not require the concept of race. He pointed out that although it is true that there is biological variability between the populations of organisms which comprise a species, this variability does not conform to the discrete packages we call ‘races’. In other words, there are no races, there are only clines (a ‘cline’ is a gradient of physiological change in a group of related organisms, usually along a line of environmental transition). This is still the guiding principle in contemporary anthropology; at least in name, if not in practice. Instead of racial distributions, we now have “clines”: distribution lines very much like those you would see on a weather map. Not surprisingly, most of these clines do a very nice job of delineating population differences that any person could have arrived at by simply traveling to that area and having a look around.

Alice Brues, a well-known physical anthropologist, addressed the folly of avoiding race as a physical characteristic:

“A popular political statement now is, “There is no such thing as race.” I wonder what people think when they hear this. They would have to suppose that the speaker, if he were dropped by parachute into downtown Nairobi, would be unable to tell, by looking around him, whether he was in Nairobi or Stockholm. This could only damage his credibility. The visible differences between different populations of the world tell everyone that there is something there.”

An important paper written against the use of race as a method of classification argued that since the probability of mis-classifying an individual based on variation in a single gene is approximately 30%, race is an invalid taxonomic construct: In short because humans share 50% of their DNA with a rose bush, we must be 50% the same. This was countered by an argument (“Lewontin’s Fallacy”) which held that if one took into account more genetic markers, the possibility of a racial mis-classification rapidly dropped to almost 0%. The counterargument to this counterargument is that if we looked at enough genes we could presumably distinguish Swedes and Norwegians as two distinct races.
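The arithmetic behind "more markers, fewer errors" can be sketched with a toy model. The Python below assumes, purely for illustration, that each marker independently points to the wrong group 30% of the time and that the final call is a majority vote across markers; this is not the calculation used in either paper, and real markers are not independent.

```python
from math import comb

def majority_vote_error(n_markers, per_marker_error=0.3):
    """P(more than half of n independent markers point to the wrong group).

    Use an odd n_markers so that ties cannot occur.
    """
    p = per_marker_error
    return sum(
        comb(n_markers, k) * p**k * (1 - p) ** (n_markers - k)
        for k in range(n_markers // 2 + 1, n_markers + 1)
    )

# Error rate for 1, 3, 11 and 101 markers under the toy assumptions.
errors = {n: majority_vote_error(n) for n in (1, 3, 11, 101)}
```

With one marker the error is the assumed 30%; with three it is already lower, and it keeps shrinking as markers are added, which is the shape of the counterargument described above.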

Let’s take a moment to remember that while Boyd and Asimov did not deny the existence of race, they demolished the notion of using race to determine an individual’s value. For our purposes, we’ll use race and ethnicity simply to get additional information that may be valuable in helping to design a more intelligent lifestyle for you, the reader. Let’s just assume that you can and do belong to certain human groupings whose members have more in common with each other than they do with other groupings.

What we call a “race” is really just another fact. Moreover, when we try to subsume it into non-existence, we do injustice to both sides of the distinction. When you share a fact with someone, it makes no one the better or worse, just better informed.

“History is bunk,” wrote the industrialist Henry Ford. It is a quote with the ring of truth in it. We are destined to interpret past events through the eyes of whoever left the record (usually the winner) and our own modern day thoughts and rationales. Losers rarely write history, and it is just about impossible for the average person to put himself or herself in the mindset of a person living in a world without light, heat, supermarkets, and the Internet.

Science is fact-based, but scientists can sometimes be charmingly naïve. One of the most common ways they display this naiveté is the coining of politically correct euphemisms. Thus, instead of the negatively charged term “race” you sometimes see the phrase “mutually inbred ancestral groups” which, at least to me, sounds even worse.

Despite the gloss, we at least now have a framework to allow us to collect and categorize those genes and polymorphisms that show different frequencies between races.

Called Ancestry-Informative Markers (AIMs), this category of genes includes blood groups, markers of pigmentation and other SNPs that distinguish between races but don’t always result in some visually detectable difference. A collection of AIMs that distinguish African and European populations contains over 3000 highly differentiated SNPs.

An example of an AIM gene is called “Duffy,” which codes for the Duffy blood group. The Duffy blood group has a variant that codes for a Duffy blood type (Duffy Null allele) that is found in 100% of Sub-Saharan Africans, but occurs very infrequently in other races. Interestingly, like some of the hemoglobins, this variant has been known to provide some resistance to malaria infection.

Another interesting variant of the APOA1 gene (the TT genotype) is seen in high concentration in African Americans. This variant may help to explain their higher rates of heart disease as a genetic factor leading to difficulty in adapting to new nutritional environments.  (1)

Once, after a public lecture, I was approached by an attendee who asked if I was aware that there were criticisms of my work as ‘racist’ on the internet, and claims that I had derived my conclusions from long-discredited research done by the Nazis in concentration camps. It turned out that the accuser was a zealous follower of veganism, who thought this might be an effective way to quell further interest in my conclusions.

In all of my trolling through the scientific literature on blood groups since 1910 I’ve not recovered a single reference on ABO blood group that supported any of the racial notions then in vogue in Nazi Germany. My suspicion is that if any research was done the results were not supportive of their racist prejudices — i.e. the subjects were more alike on a blood group basis than they would have liked to admit.

  1. Boyd WC and Asimov I. Races and People. Abelard-Schuman 1955
  2. Boyd WC. 1952 The Contribution of Genetics to Anthropology. in Anthropology Today, ed. by A.L. Kroeber.
  3. Livingstone FB. 1962 On the non-existence of human races. Current Anthropology 3 (3):279-281.
  4. Lutucuta S, Ballantyne CM, Elghannam H, Gotto AM Jr, Marian AJ. Novel polymorphisms in promoter region of ATP-binding cassette transporter gene and plasma lipids, severity, progression, and regression of coronary atherosclerosis and response to therapy. Circ Res. May 11; 88(9):969-73. (2001)


Sep 12 2011

Transposable elements

DNA sequence is not static. A considerable amount of DNA jumps around from place to place. While other elements compete for representation at a given locus, transposable elements accumulate by copying themselves to new locations in the genome. Transposable elements are sequences of DNA that can move around to different positions within the genome of a single cell, a process called transposition. In the process, they can cause mutations and change the amount of DNA in the genome.

Consequently, there may be tens of thousands of active copies of a single transposable element dotted around the genome of a single individual, and different individuals may have their insertions in different places in the genome. Being small and adapted to integrating themselves into novel places in the genome, transposable elements are also capable of moving between species. Because of this amazingly expansive drive (both within and between species), transposable elements appear to have colonized all eukaryotic species and have radiated into a bewildering array of subtypes. Most species have multiple types or families of transposable elements, each present in multiple copies per genome. About half of our own genome is derived from transposable elements.

There are three main types of transposable elements, which have little in common in structure or mechanism other than the fact that they are relatively short and encode only a few proteins. DNA (or Class II) transposons typically encode one protein. They usually move by a mechanism analogous to cut and paste, rather than copy and paste, using an enzyme called transposase, which recognizes the ends of an element, cuts it out, and reinserts it elsewhere in the genome. Even so, these cut and paste mechanisms can lead to an increase in copy number. Transposons typically produce insertion-type frameshift mutations.

The other two classes transpose via an RNA intermediate through the action of reverse transcriptase. Retrotransposons copy themselves to RNA, and then the RNA is copied into DNA by a reverse transcriptase and inserted back into the genome.

Barbara McClintock (1902-1992) first discovered transposable elements in 1952, working with the Ac and Ds elements (both DNA transposons) in various strains of maize. Her initial reports were met with such a degree of hostility and skepticism that she did not publish any additional research on the subject until 1959. As it turns out, the variegated color of the “Indian corn” people often use as a holiday decoration around Thanksgiving is the result of transposition!

The simpler class, long interspersed repetitive elements (LINE’s), typically encodes one or two proteins. LINE’s are a group of genetic elements that are found in large numbers in eukaryotic genomes. They are transcribed to an RNA using an RNA polymerase II promoter that resides inside the LINE. LINE’s encode a multifunctional enzyme with domains for DNA binding, DNA cleavage, and the reverse transcription of RNA into DNA. The reverse transcriptase has a higher specificity for the LINE RNA than other RNA and makes a DNA copy of the RNA that can be integrated into the genome at a new site. Unlike most host genes, which have their promoter region upstream of their transcription start site, many LINE’s have an internal promoter. By carrying its own promoter, the element increases the probability that it will be transcribed regardless of where it happens to land in the genome. (3) Because LINE’s move by copying themselves (instead of moving, as transposons do), they enlarge the genome. The human genome, for example, contains about 20,000-40,000 LINE’s, which is roughly 21% of the genome. (4)
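The difference between moving and copying is easy to see in a toy model. What follows is only a minimal sketch in Python, assuming a genome can be pictured as a simple list of labels; the function names and the "TE" label are hypothetical and exist purely for illustration. It also ignores any mechanism by which cut and paste elements can still gain copies.

```python
import random


def cut_and_paste(genome, element, rng):
    """Move one copy of `element` to a randomly chosen position (DNA transposon style)."""
    genome = list(genome)          # work on a copy so the input is left untouched
    genome.remove(element)         # excise the element from its current site
    genome.insert(rng.randrange(len(genome) + 1), element)  # reinsert it somewhere
    return genome


def copy_and_paste(genome, element, rng):
    """Insert an extra copy of `element` at a random position (retrotransposon style)."""
    genome = list(genome)
    if element not in genome:
        raise ValueError("no source copy to transcribe from")
    genome.insert(rng.randrange(len(genome) + 1), element)  # original copy stays put
    return genome


if __name__ == "__main__":
    rng = random.Random(0)
    genome = ["geneA", "TE", "geneB", "geneC"]   # made-up toy genome
    print(len(cut_and_paste(genome, "TE", rng)), "entries after cut and paste")
    print(len(copy_and_paste(genome, "TE", rng)), "entries after copy and paste")
```

In this picture the cut and paste list stays the same length, while every copy and paste event makes the list one entry longer, which is the sense in which copying elements enlarge the genome.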

The long terminal repeats (LTR’s) encode five to six proteins, typically two to three structural proteins (capsid and nucleocapsid), and three enzymes (protease, reverse transcriptase, and integrase). LTR’s are thought to be an amalgam of the other two types, since the reverse transcriptase is homologous to that of the LINE2, and the integrase is homologous to some transposases. About 8% of the human genome and approximately 10% of the mouse genome are composed of the LTR transposons. (5)

Short interspersed repetitive elements (SINE’s) are short DNA sequences (<500 bases) that do not encode any proteins themselves but instead have evolved to parasitize the LINE retrotransposon machinery. SINE’s do not encode a functional reverse transcriptase protein and rely on other mobile elements for transposition. With about 1,500,000 copies, SINE’s make up about 13% of the human genome. (6) While these elements were historically viewed as “junk DNA,” recent research suggests that in some rare cases both LINE’s and SINE’s were incorporated into novel genes so as to evolve new functionality. The distribution of these elements has been implicated in some genetic diseases and cancers. The most common SINE’s in primates are called Alu sequences. We have about one million copies of Alu and its relatives in our DNA, and about 7,000 are unique to humans. It is estimated that about 10.7% of the human genome consists of Alu sequences. Alu elements appear to influence gene expression by inserting themselves throughout the genome. They are about 280 base pairs long, do not contain any coding sequences, and can be recognized by the restriction enzyme AluI (hence the name).
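The restriction enzyme in question, AluI, recognizes the four-base sequence AGCT. As a small illustration of what "recognized by" means in practice, here is a hedged Python sketch that lists where that site occurs in a DNA string. The function name and the test sequence are made up for the example; the sequence is not a real Alu element.

```python
import re

ALUI_SITE = "AGCT"  # AluI recognition sequence; the enzyme cuts between the G and the C


def find_alui_sites(sequence):
    """Return the 0-based start position of every AluI site in a DNA string."""
    return [match.start() for match in re.finditer(ALUI_SITE, sequence.upper())]


if __name__ == "__main__":
    toy = "ttAGCTggcaAGCTa"  # invented sequence, for illustration only
    print(find_alui_sites(toy))
```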

Single nucleotide DNA variations in an Alu element have been linked to human disease. For example, a SNP in the promoter region of the myeloperoxidase (MPO) gene has been associated with a variety of disorders, including Alzheimer’s disease, lung cancer, stomach cancer, and lupus nephritis. (8)

Alu insertions are associated with several diseases: (7)
•    Breast cancer
•    Ewing’s sarcoma
•    Familial hypercholesterolemia
•    Hemophilia
•    Neurofibromatosis
•    Diabetes mellitus type II

Despite their proliferative capacities, there is abundant evidence from genome sequencing studies that transposable elements often go extinct within a host species, with active copies of the element disappearing from the gene pool. In the human genome there are hundreds of thousands of inactive “fossil” DNA transposons grouped into 63 families, all of which proliferated to varying degrees at various times in our history and are now completely inactive. (9)

Transposable elements have some very sophisticated enzymatic capabilities that in certain circumstances may be useful to the rest of the genome. An apparently clear-cut co-option of a transposable element has taken place within the evolution of the vertebrate immune system. Most vertebrates have immunoglobulin (Ig) and T-cell receptor (TCR) genes that are “split” and must be re-assembled by recombination before they can be expressed. The split nature of immunoglobulin and T-cell-receptor genes appears to derive from germ line insertion of a transposable element into an ancestral receptor gene soon after the evolutionary divergence of jawed and jawless vertebrates.

This recombination, called V(D)J recombination, occurs only in lymphocytes and in most vertebrates is responsible for generating much of the diversity of antigen receptors within an individual organism, the assembly process resulting in slightly different genes in different cells. This assembly process is initiated by proteins encoded by the RAG1 and RAG2 genes, which cleave the Ig and TCR genes in a manner very similar to that initiating DNA-based transposition. Moreover, the RAG1 and RAG2 genes are immediately adjacent to each other in the genome. These observations led to the suggestion that RAG1, RAG2, and the repeating domain they recognize in the Ig and TCR genes are descendants of ancient transposons that have since become domesticated for host benefit. (10)

Genetic instability is one of the principal hallmarks and causative factors in cancer. Human transposable elements have been reported to cause human diseases, including several types of cancer, through insertional mutagenesis of genes critical for preventing or driving malignant transformation. (11)

Portions excerpted from Fundamentals of Generative Medicine copyright 2010, Drum Hill Publishing, USA.

1.    Sawyer SA, Parsch J, Zhang Z, Hartl DL. Prevalence of positive selection among nearly neutral amino acid replacements in Drosophila. Proc. Natl. Acad. Sci. U.S.A. 104 (16): 6504–10 (2007)
2.    Vega F, Medeiros A. Chromosomal translocations involved in non-Hodgkin lymphomas. Archiv Path Lab Med 127 (9): 1148–60(2003)
3.    Burt A and Trivers R. Genes in Conflict: The biology of selfish genetic elements. Belknap Harvard Cambridge MA (2006)
4.    Singer MF. SINE’s and LINE’s: highly repeated short and long interspersed sequences in mammalian genomes. Cell 28 (3): 433–4. (1982)
5.    McCarthy EM, McDonald JF. Long terminal repeat retrotransposons of Mus musculus. Genome Biol. 5 (3): R14. (2004).
6.    Ibid 3.
7.    Batzer MA, Deininger PL. Alu repeats and human genomic diversity. Nat. Rev. Genet. 3 (5): 370–9 (May 2002)
9.    Lander ES et al. Initial sequencing and analysis of the human genome. Nature. 2001 Feb 15;409 (6822):860-921
10.    Agrawal A, Eastman QM, Schatz DG. Transposition mediated by RAG1 and RAG2 and its implications for the evolution of the immune system. Nature. 1998 Aug 20; 394(6695):744-51

11.    Belancio V, Roy-Engel A, and Deininger P. All y’all need to know ‘bout retroelements in cancer. Semin Cancer Biol. 2010 August; 20(4): 200–210.

There are 2 comments on this article so far

Aug 30 2011

Epigenetic inheritance


The stream of time sweeps away errors, and leaves the truth for the inheritance of humanity.

—Georg Brandes



Approximately 10% of the protein pool encoded by the mammalian genome plays a role in transcription or chromatin regulation. Given that the mammalian genome consists of 3,000,000,000 base pairs, this gives rise to an astounding array of possible regulatory messages, including DNA binding interactions, histone modifications, histone variants, nucleosome remodeling, DNA methylation, and non-coding RNA.

The maintenance of a repressed or activated status of a gene is often necessary for cellular differentiation. This observation should not be very surprising: a person’s liver cells, skin cells and kidney cells look different and behave quite differently, yet they all contain the same genetic information. With very few exceptions the differences between specialized cells are epigenetic (e.g. “post-genomic” or “post-translational”), not genetic. The remarkable thing about specialized cells is that not only can they acquire specialized traits and functions through development; they can also pass on these phenotypic manifestations to their own daughter cells. Although their DNA sequences remain unchanged during development, differentiated cells nevertheless acquire information that they pass on to their succeeding generations. The systems that transmit this sort of information are known as epigenetic inheritance systems (EIS).

Although histone modifications have been predicted to affect transcription almost since their discovery in the mid 1960’s, the existence of epigenetic inheritance was not widely recognized until the mid 1970’s. Embryologists had always wrestled with the basis for cell differentiation, but were more interested in the signaling that switched genes on and off and the cascade of events that lead cells to become specialized. Less research interest was placed on how cells seem to remember this new state and how they passed it on to their progeny. In 1975, a series of articles independently suggested a mechanism that would enable states of gene activity and inactivity to be maintained and transmitted to future cell generations.


The Types of Epigenetic Inheritance Systems:

1. Self-sustaining loops
2. Architectural changes
3. Chromatin marking systems



The most elemental form of EIS is known as the self-sustaining loop, first described theoretically by the American geneticist Sewall Wright in 1945. The essence of a self-sustaining loop is that X causes Y, and Y causes X. An example might be a temporary environmental cue that turns a gene on, with the product of that gene in turn ensuring the continued activity of the gene. In this case the product of gene A is gene A’s own regulator, attaching to the control region of A and keeping it active long after the environmental cue that induced it has disappeared. Following cell division, the level of protein A is high enough in the daughter cells to induce further activity from their own genes. You might recognize that there is significant potential for phenotypic variation here: since protein production is itself subject to stochastic variation, it is quite possible that two daughter cells could have differing amounts of protein A, and perhaps in one the level is below the amount needed to activate the gene A regulator. In this daughter cell, the gene might then deactivate, producing two daughter cells with quite different phenotypes. In a self-sustaining loop, the functional state is dependent on the interactions between the constituent elements. The state of the loop is transmitted from generation to generation as a whole, and it varies as a whole. The nondecomposable nature of the information in this type of system is called holistic, and it is very different from decomposable systems like DNA, where the components (nucleotides) can be changed without destroying the whole system.
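A self-sustaining loop of this kind can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the threshold and production numbers are hypothetical, protein A is never degraded, and the protein copies are split at random between the two daughters at each division. None of these details come from the literature cited here.

```python
import random

THRESHOLD = 6    # hypothetical: copies of protein A needed to keep gene A switched on
PRODUCTION = 16  # hypothetical: copies of protein A made per generation while the gene is on


def grow(protein):
    """Return (protein level before division, gene state) for one generation."""
    gene_on = protein >= THRESHOLD   # protein A activates its own gene
    if gene_on:
        protein += PRODUCTION        # an active gene makes more protein A
    return protein, gene_on


def divide(protein, rng):
    """Partition the protein copies at random between two daughter cells."""
    first = sum(1 for _ in range(protein) if rng.random() < 0.5)
    return first, protein - first


def follow_lineage(start_protein, generations, seed=0):
    """Follow one daughter per division and record whether gene A was on."""
    rng = random.Random(seed)
    protein, history = start_protein, []
    for _ in range(generations):
        protein, gene_on = grow(protein)
        history.append(gene_on)
        protein, _sister = divide(protein, rng)
    return history


if __name__ == "__main__":
    print("induced lineage:      ", follow_lineage(start_protein=16, generations=10))
    print("never-induced lineage:", follow_lineage(start_protein=0, generations=10))
```

A lineage that starts without protein A can never switch the gene on by itself, while an induced lineage tends to stay on with no further cue. Because the split at division is random, a daughter can occasionally inherit too few copies and drop out of the loop, which is the source of the variation described above.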

The second type of epigenetic inheritance involves architectural changes to the cell structure and the subsequent transmission of these structural changes to the offspring. The British biologist Thomas Cavalier-Smith has advanced the basis for this form of inheritance. Cell membranes, such as the plasma membrane that surrounds the cell or the internal membrane system of the endoplasmic reticulum, differ from each other in both composition and location. They cannot assemble without guidance, and their consistency and continuity depend on preexisting membranes, which act as templates for more membranes with the same structure. From this templating, the membrane grows and eventually divides between daughter cells. Cavalier-Smith refers to this system of membranes as the membranome, and believes that many of the most important landmark events in early development, including the formation of the first cell, were dependent on changes to the membranome.

There appears to be some evidence for this mechanism in the unique patterns of inheritance seen in the prion diseases, such as Creutzfeldt-Jakob disease and kuru. A prion is an infectious agent that is composed primarily of protein. To date, all such agents have been discovered to propagate by transmitting a misfolded protein state. As with viruses, the protein does not replicate by itself; rather, it induces existing polypeptides in the host organism to take on the rogue form. Prion particles can be transferred from one cell generation to the next and in each instance modify normal proteins to assume prion-like characteristics. Certain sea slugs use a prion-like protein to remember certain experiences, and undoubtedly more prion-like mechanisms will be identified in the future.

The third system of epigenetic inheritance is known as the chromatin marking system. Chromatin is the stuff of chromosomes. It is the DNA plus all the RNA, proteins and whatever other molecules happen to be associated with it. We’re interested in the non-DNA features of chromatin, as these are the aspects of chromatin transmitted from generation to generation that enable states of gene activity to be perpetuated in the cell lineages.


Chromatin Marking Systems:

1. DNA methylation and demethylation
2. Histone methylation and demethylation
3. Histone acetylation and de-acetylation
4. Phosphorylation
5. Ubiquitination, deubiquitination, and SUMOylation
6. RNA interference (secondary)




  1. Luger K, Mäder AW, Richmond RK, Sargent DF, Richmond TJ. Crystal structure of the nucleosome core particle at 2.8 A resolution. Nature 389 (6648): 251–60. 1997
  2. Kupiec JJ. The Origin of Individuals. World Scientific Press Singapore (2009)
  3. Ibid 2.
  4. Allfrey VG, Faulkner R, Mirsky AE. Acetylation and methylation of histones and their possible role in the regulation of RNA synthesis. Proc. Natl. Acad. Sci. U.S.A. 51:1964
  5. Felsenfeld G, Groudine M. Controlling the double helix. Nature 421 (6921): 448–53. 2003
  6. Jablonka E and Lamb M. Evolution in Four Dimensions. The MIT Press. Cambridge MA 2006
  7. Holliday R, Pugh JE. DNA modification mechanisms and gene activity during development. Science. 1975 Jan 24; 187(4173):226-32.
  8. Riggs AD. X inactivation, differentiation, and DNA methylation. Cytogenet Cell Genet. 1975; 14(1):9-25.
  9. Ibid 6.
  10. Cavalier-Smith TH. “Membranome and Membrane Heredity in Development and Evolution” in: Organelles, Genomes, and Eukaryote Phylogeny: An Evolutionary Synthesis in the Age of Genomics. Editors: Hirt RP and Horner DS. CRC Press (2004)

There are 2 comments on this article so far

Jun 18 2011

Genetics as an Interactive Sport

“For your information, I would like to ask a question.”
—Samuel Goldwyn

From an evolutionary point of view, in order for something to carry information, there must first be some sort of “receiver” that reacts to the source of information and interprets it. Through its reaction and interpretation, the receiver’s functional state is changed in a way that is related to the form and organization of the source. Receivers usually have no intentionality in themselves, although they often benefit from the results. Like receivers, most sources do not change when a receiver reacts to and acquires information from them. Humans, for example, don’t change much when reading one recipe or another. Your computer doesn’t change physically if you change whatever software you are running on it.

What is information?

A common definition of information is “the knowledge of specific events or situations that has been gathered or received by communication.” (1) Communication is the change of information from one state to another. Only a minute fraction of the energy used by most living systems is employed for information processing. In living systems theory, when communications are processed they often shift from one matter-energy state to another, from one sort of marker to another. Matter-energy and information always flow together.

For our needs, information means that a stretch of DNA embodies in an encoded form the particular amino acid sequence for a polypeptide chain. It could just as well be said that a particular DNA sequence provides the code of a regulatory protein to attach to itself, thereby perhaps stopping the transcription of another gene in another stretch of code.

By asking certain questions about the information dynamics of a particular system of heredity transmission, we can help pinpoint similarities and differences. For example, we think of DNA as a “linear” sequence of units (the nucleotides A, T, C, and G) in which any site in the sequence can be occupied by any one of the four possible nucleotides. These nucleotides are interchangeable; replacing one nucleotide in the sequence with another will not influence the sequence of nucleotides that come before or after it. This means that quite a large number of sequences are possible. A sequence of 100 nucleotides, made up of four nucleotides, will be capable of producing 4^100 different sequences. Even at this graspable level, we are producing unfathomable numbers: This is a number greater than the total number of atoms in our galaxy! (2)
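The size of that number is easy to check for yourself. The few lines of Python below are just a worked calculation, four choices at each of 100 positions; the variable and function names are my own and carry no special meaning.

```python
NUCLEOTIDES = "ATCG"   # the four possible letters at each site
LENGTH = 100           # number of sites in the sequence


def count_sequences(alphabet_size, length):
    """Number of distinct sequences when every site is chosen independently."""
    return alphabet_size ** length


total = count_sequences(len(NUCLEOTIDES), LENGTH)
print(total)                       # the exact integer
print(f"roughly {total:.2e}")      # the same number in scientific notation
print(len(str(total)), "digits")
```

Python integers are arbitrary precision, so the exact value is printed rather than a rounded approximation.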

Another peculiar property of DNA is that, much like a photocopier, it will reproduce The Gettysburg Address or a Rorschach Drawing with the same degree of fidelity: DNA has a fundamental indifference to what is being replicated. The combination of its vast capability for possible variations and its inherent indifference to the outcome of replication means that DNA can provide a lot of raw material for natural selection. The downside is that many nonsensical and useless DNA variations can also be generated.

Control of Gene Activity. A: The product of gene P binds to the control region of gene Q and prevents transcription. B: A regulatory molecule associates with the product of gene P, which changes it to a nonfunctional shape, making it unable to bind to gene Q. Gene Q is now transcribed and the mRNA translated into a protein. (after Jablonka and Lamb, 2005).
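The logic of panels A and B can be written as a tiny truth table. This is a deliberately simplified Boolean sketch in Python, assuming each component is either present or absent; the function and argument names are hypothetical.

```python
def gene_q_transcribed(p_product_present, regulator_present):
    """Gene Q is transcribed unless a functional P product is sitting on its control region."""
    # The regulatory molecule changes the P product into a nonfunctional shape.
    repressor_functional = p_product_present and not regulator_present
    return not repressor_functional


if __name__ == "__main__":
    for p in (False, True):
        for r in (False, True):
            print(f"P product={p!s:5}  regulator={r!s:5}  Q transcribed={gene_q_transcribed(p, r)}")
```

Only one of the four combinations, P product present with no regulatory molecule around, keeps gene Q silent.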

Control of the process of gene transcription affects patterns of gene expression and thereby allows a cell to adapt to a changing environment, perform specialized roles within an organism, and maintain basic metabolic processes necessary for survival. A protein involved in regulating gene expression is called a regulatory protein or regulatory molecule. It is usually bound to a DNA binding site that is typically located near the gene’s promoter site. Regulatory proteins often bind to a regulatory binding site to switch a gene on (activator) or to shut a gene off (repressor). Generally, as organisms grow more sophisticated, their cellular protein regulation becomes more complicated, and many activators and repressors working together can control many of our genes. A major feature of multicellular animals is the use of morphogen gradients, which in effect provide a “global positioning system” that tells a cell where in the body it is, and subsequently what sort of cell to become. A gene that is turned on in one cell may make a product that leaves the cell and diffuses through adjacent cells, entering them and turning on genes only when it is present above a certain threshold level. These cells are thus induced into a new fate, and they may even generate other morphogens that signal back to the original cell.
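The threshold idea can be illustrated with a one-dimensional row of cells. The Python sketch below assumes the morphogen level falls off exponentially with distance from the source cell, which is a common simplification and my own assumption here, not something taken from the references. The numbers and function names are likewise hypothetical.

```python
import math


def morphogen_concentration(distance, source_level=100.0, decay_length=3.0):
    """Assumed exponential fall-off of morphogen with distance (in cell widths)."""
    return source_level * math.exp(-distance / decay_length)


def cell_fates(n_cells, threshold, source_level=100.0, decay_length=3.0):
    """Mark each cell 'on' if the morphogen it sees is at or above the threshold."""
    return [
        "on" if morphogen_concentration(i, source_level, decay_length) >= threshold else "off"
        for i in range(n_cells)
    ]


if __name__ == "__main__":
    # Cell 0 is the source; cells further along the row see less morphogen.
    print(cell_fates(n_cells=10, threshold=20.0))
```

Because the concentration only ever falls with distance, the row splits cleanly into an "on" region near the source and an "off" region beyond it, so a single smooth gradient is enough to draw a sharp boundary between two cell fates.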

Cells are mostly the mRNA and proteins that arise from gene expression. These mRNA and proteins interact with each other with various degrees of specificity. Some diffuse around the cell. Others are bound to cell membranes, interacting with molecules in the environment. Still others pass through cell membranes and mediate long-range signals to other cells in a multi-cellular organism. These molecules and their interactions comprise a gene regulatory network.

Over longer distances, morphogens may use the active process of signal transduction. Such signaling controls embryogenesis and maintains and regulates adult bodies through feedback processes. The loss of such feedback because of a mutation can be responsible for the cell proliferation that is seen in cancer. In parallel with this process of building structure, the gene cascade turns on genes that make structural proteins that give each cell the physical properties it needs. It has been suggested that, because biological molecular interactions are intrinsically stochastic (random), gene networks are the result of cellular processes and not their cause.

Thus, molecular networks are not the cause but the result of cellular processes, because the latter restrict the stochastic variability of molecular interactions. (3) Genes are ruled by probabilistic mechanisms that allow cells to differentiate stochastically: man may be a machine, but he is a random one. (4)

This realization leads us to my second favorite definition of information, courtesy of Shu-Kun Lin (b. 1957):

“Information is the amount of the data after data compression.”

  1. Miller JG. Living Systems. McGraw Hill New York, NY (1978)
  2. Jablonka E and Lamb M. Evolution in Four Dimensions. MIT Press, Cambridge MA USA (2005)
  3. Laforge B, Guez D, Martinez M, Kupiec J. Modeling embryogenesis and cancer: an approach based on equilibrium between the autostabilization of stochastic gene expression and the interdependence of cells for proliferation. Progress in Biophysics and Molecular Biology 89 (2005) 93–120
  4. Kupiec J. The Origin of Individuals. World Scientific Publishing Company (2009)

There is 1 comment on this article so far

Apr 22 2011

Enterotypes and blood types

My mailbox got choked up the other morning with emails from friends and strangers wondering about my thoughts on the recent study published in Nature on the mapping of gut bacterial (microbiome) patterns to basically three general groups. The findings were extensively reported in the media, including Wired Magazine, which managed to start its article off with the bizarre claim that humans can belong to any one of eight blood groups. Although I suspect that the writer was alluding to ABO and Rhesus (4*2=8), that is not a very accurate way of putting things, since there are a large number of determinants. In fact, the secretor status (FUT2) that controls ABO presence in bodily secretions is a much more significant ‘blood group’ on a phenotypic level than Rhesus (Rh), which is a true erythrocyte antigen and is not found in the gut or body secretions.

The authors found three distinctive “enterotypes,” or bacterial communities dominated by a distinct genus — Bacteroides, Prevotella or Ruminococcus — each of which is found with a particular community of bacteria.

The abstract describes the study as follows:

By combining 22 newly sequenced fecal metagenomes of individuals from four countries with previously published data sets, here we identify three robust clusters (referred to as enterotypes hereafter) that are not nation or continent specific. We also confirmed the enterotypes in two published, larger cohorts, indicating that intestinal microbiota variation is generally stratified, not continuous. This indicates further the existence of a limited number of well-balanced host-microbial symbiotic states that might respond differently to diet and drug intake.

There is much being made out of the apparent similarities between these enterotypes and the human blood groups. And well there should be. Except the link between blood groups and the microbiome is already well-known. I first wrote about it in my book Live Right For Your Type over ten years ago. (1)


The distal human intestine represents an anaerobic bioreactor programmed with an enormous population of bacteria, dominated by relatively few divisions that are highly diverse at the strain/subspecies level. This microbiota and its collective genomes (microbiome) provide us with genetic and metabolic attributes we have not been required to evolve on our own, including the ability to harvest otherwise inaccessible nutrients. New studies are revealing how the gut microbiota has coevolved with us and how it manipulates and complements our biology in ways that are mutually beneficial.

We are also starting to understand how certain keystone members of the microbiota operate to maintain the stability and functional adaptability of this microbial organ. It is estimated that the human digestive tract may contain up to 100 trillion microorganisms (2) and the human gut may host up to 500-1000 different species of bacteria, of which as little as 7% have been successfully cultured in the laboratory. (3)

What's your type?

The human GI tract is predominantly a bacterial ecosystem. Cell densities in the colon (10^11-10^12/ml contents) are the highest recorded for any known ecosystem. The vast majority of phylotypes belong to two divisions (superkingdoms) of Bacteria: the Bacteroidetes (48%) and the Firmicutes (51%). The remaining phylotypes are distributed among the Proteobacteria, Verrucomicrobia, Fusobacteria, Cyanobacteria, Spirochaetes, and the candidate phylum VadinBE97.

Gut bacteria can have direct effects on gene activation that may be essential for proper gut development. Bacteria-induced expression of mammalian genes has been known since the 1980’s, when Japanese researchers were able to show that a fucosyltransferase enzyme (fucosyl-asialo GM1) was induced by bacteria but was absent from germ-free strains. (4)

This is especially interesting in light of the fact that many of the fucosyltransferase enzymes convey blood group and/or secretor status. (5) Human feces contain enzymes produced by enteric bacteria that degrade the A, B, and H blood group antigens of gut mucin glycoproteins.

The autosomal dominant ABH secretor gene together with the ABO blood group gene controls the presence and specificity of A, B, and H blood group antigens in human gut mucin glycoproteins. There is evidence that the host’s ABO blood group and secretor status affect the specificity of blood group-degrading enzymes produced by his fecal bacteria in vitro. (6)

In essence, bacteria ‘eat right for their type’ even if we sometimes don’t.

Comparatively small populations of fecal bacteria produce blood group-degrading enzymes, but their presence is highly correlated with the ABO/secretor phenotype of the host: fecal populations of B-degrading bacteria were stable over time, and their population density averaged 50,000-fold greater in blood group B secretors than in other subjects. In fact, the large populations of fecal anaerobes may be an additional source of blood group antigen substrate for blood group antigen-degrading bacteria: antigens cross-reacting with blood group antigens were detected on cell walls of anaerobic bacteria from three of 10 cultures inoculated. (7,8)

Another example of “eco-phenotypic cooperation” between a host’s polymorphism and gut bacteria may be seen in the development of the early vascular networks. In this case, a mechanism of postnatal animal development has been described in which microbes colonizing a mucosal surface are assigned responsibility for regulating elaboration of the underlying microvasculature by signaling through a bacteria-sensing epithelial cell. This mechanism may in turn be related to a polymorphic phenotype on the part of the host.

  1. D’Adamo P, Whitney C. Live Right For Your Type. 2001. GP Putnam and Sons, NYC
  2. Bäckhed F, Ley RE, Sonnenburg JL, Peterson DA, Gordon JI. Host-bacterial mutualism in the human intestine. Science. Mar 25; 307(5717):1915-20. (2005)
  4. Eckburg, P. B., Bik, E. M., Bernstein, C. N., Purdom, E., Dethlefsen, L., Sargent, M., Gill, S. R., Nelson, K. E. & Relman, D. A. Diversity of the human intestinal microbial flora. Science 308, 1635-38. (2005)
  5. Umesaki Y. Immunohistochemical and biochemical demonstration of the change in glycolipid composition of the intestinal epithelial cell surface in mice in relation to epithelial cell differentiation and bacterial association. J Histochem Cytochem. 1984 Mar; 32(3):299-304.
  6. D’Adamo PJ, Kelly GS. Metabolic and immunologic consequences of ABH secretor and Lewis subtype status. Altern Med Rev. Aug;6(4):390-405; 2001
  7. Hoskins LC, Boulding ET. Degradation of blood group antigens in human colon ecosystems. I. In vitro production of ABH blood group-degrading enzymes by enteric bacteria. J Clin Invest Jan;57(1):63-73;1976
  8. Hoskins LC, Boulding ET. Degradation of blood group antigens in human colon ecosystems. II. A gene interaction in man that affects the fecal population density of certain enteric bacteria. J Clin Invest Jan;57(1):74-82; 1976

There are 6 comments on this article so far

Mar 15 2011

Carbohydrates: More than just calories

Carbohydrates comprise only about 1 percent of the human body; proteins comprise 15 percent, fatty substances 15 percent and inorganic substances 5 percent (the rest being water). Nevertheless, carbohydrates are important constituents of the human diet, accounting for a high percentage of the calories consumed. Thus some 40 percent of the calorie intake of Americans (and some 50 percent of that of Britons and Israelis) is in the form of carbohydrates: glucose, fructose, lactose (milk sugar, a disaccharide of glucose and galactose), sucrose, and starch.

Carbohydrates are the fuel of life, being the main source of energy for living organisms and the central pathway of energy storage and supply for most cells. They are the major products through which the energy of the sun is harnessed and converted into a form that can be utilized by living organisms. According to rough estimates, more than 100 billion tons of carbohydrates are formed each year on the earth from carbon dioxide and water by the process of photosynthesis. Polymers of glucose, such as the starches and the glycogens, are the mediums for the storage of energy in plants and animals respectively. Coal, peat, and petroleum were probably formed from carbohydrates by microbiological and chemical processes.

Carbohydrates are the most abundant group of biological compounds on the earth, and the most abundant carbohydrate is cellulose, a polymer of glucose; it is the major structural material of plants. Another abundant carbohydrate is chitin, a polymer of N-acetylglucosamine; it is the major organic component of the exoskeleton of arthropods, such as insects, crabs, and lobsters, which make up the largest class of organisms, comprising some 900,000 species (more than are found in all other families and classes together). It has been estimated that millions of tons of chitin are formed yearly by a single species of crab. (1)

The name carbohydrate was originally assigned to compounds thought to be hydrates of carbon, that is, to consist of carbon, hydrogen, and oxygen, with hydrogen and oxygen in the 2:1 ratio found in water. Glucose, fructose, and galactose are typical hexose monosaccharides, meaning that they have six carbon atoms. However, carbohydrates now include polyhydroxy aldehydes, ketones, alcohols, acids, and amines, their simple derivatives, and the products formed by the condensation of these different compounds through glycosidic linkages (essentially oxygen bridges) into oligomers (oligosaccharides) and polymers (polysaccharides).

The biological roles of carbohydrates are particularly important in the assembly of complex multicellular organs and organisms, which requires interactions between cells and the surrounding matrix. All cells and numerous macromolecules in nature carry an array of covalently attached sugars (monosaccharides) or sugar chains (oligosaccharides and polysaccharides), which are generically referred to as “glycans.” (2)

Localization of glycoconjugates in the intracellular and extracellular compartments.

Because many carbohydrates are on the outer surface of cellular and secreted macromolecules, and are often freestanding entities, they are in a position to modulate or mediate a wide variety of events in cell–cell, cell–matrix, and cell–molecule interactions critical to the development and function of a complex multicellular organism. Much of the current interest in carbohydrates is focused on glycoproteins and glycolipids, complex carbohydrates in which sugars are linked to proteins and lipids, respectively. Together these are termed glycoconjugates. Glycans can also act as mediators in the interactions between different organisms (for example, between a host and a parasite). In addition, simple, rapidly turning over, protein-bound glycans are abundant within the nucleus and cytoplasm, where they can serve as regulatory switches. A more complete paradigm of molecular biology must therefore include glycans, often in covalent combination with other macromolecules (glycoconjugates), such as glycoproteins and glycolipids. (3) The term glycan may also be used to refer to the carbohydrate portion of a glycoconjugate, such as a glycoprotein, glycolipid, or proteoglycan.

During the initial phase of the molecular biology revolution of the 1960s and 1970s, studies of glycans lagged far behind those of other major classes of molecules. This was in large part due to their inherent structural complexity and the great difficulty in determining their sequences. Also inhibiting interest was the fact that their biosynthesis could not be directly predicted from a DNA template. In addition, unlike genome products, glycans are highly dynamic and have extraordinarily complex biosynthetic pathways. The development of many new technologies for exploring the structures and functions of glycans has since opened a new frontier of molecular biology. The coming together of the traditional disciplines of carbohydrate chemistry and biochemistry with a modern understanding of the cell and molecular biology of glycans, and in particular, their conjugates with proteins and lipids, is called “glycobiology.” (4)

Analogous to genomics and proteomics, glycomics represents the systematic methodological elucidation of the “glycome” (the totality of glycan structures) of a given cell type or organism. Glycomics is a subset of glycobiology, and the glycome it studies is immense and far more complex than the genome or proteome. In the past decade, over 30 genetic diseases have been identified that alter glycan synthesis and structure, and ultimately the function of nearly all organ systems. Many of the causal mutations affect key biosynthetic enzymes, but more recent discoveries point to defects in chaperones and Golgi-trafficking complexes that impair several glycosylation pathways. As more glycosylation disorders and patients with these disorders are identified, the functions of the glycome are starting to be revealed. (5,6)

  1. Sharon N. Carbohydrates. Sci Am. 1980; 245(5):90-116.
  2. Varki A, Sharon N. Historical Background and Overview. In: Varki A, Cummings R, Esko J, Freeze H, Stanley P, Bertozzi C, Hart G, Etzler M, editors. Essentials of Glycobiology. 2nd edition. Cold Spring Harbor (NY): Cold Spring Harbor Laboratory Press; 2009.
  3. Ibid.
  4. Rademacher TW, Parekh RB, Dwek RA. Glycobiology. Annu Rev Biochem. 1988; 57:785-838.
  5. Freeze HH. Genetic defects in the human glycome. Nat Rev Genet. 2006 Jul; 7(7):537-51.
  6. Taylor ME, Drickamer K. Introduction to Glycobiology. 2nd edition. Oxford University Press; 2006.
