The gene sequence is not enough

Evolutionarily preserved mechanism governs use of genes:
[Via Eureka! Science News – Popular science news]

Researchers at Uppsala University have found that the protein coding parts of a gene are packed in special nucleosomes. The same type of packaging is found in the roundworm C elegans, which is a primeval relative of humans. The mechanism can thereby be traced back a billion years in time, according to the study presented in the journal Genome Research.


Simply knowing someone's genomic sequence will not tell us exactly what is going on with them. We all know that environment has an effect on a person’s life, but too many people seem to think that knowing the genetic information will tell us all we need to know.

As epigenetics has amply demonstrated, there can be very different cellular responses to stimuli without any change in the underlying DNA sequences. For example, we can have differentiated cells that all have the same DNA but very different forms (e.g. white blood cells vs. muscle cells). Knowing the DNA sequence would not tell us why one cell is used to fight infection and the other to move bones around.

So, just knowing the DNA sequences does not mean one knows just what a person looks like or what diseases they may be susceptible to. In fact, due to epigenetic factors, two animals with the same DNA may not be exactly alike. Life is much more complex than that. Life responds to environmental pressures by altering the proteins that are produced, even when these proteins are produced by the same gene. Yes, the same genetic sequence can produce different proteins with different activities through a process called splicing.

This paper, “Nucleosomes are well positioned in exons and carry characteristic histone modifications” (which is Open Access and thus free to read), discusses some possible epigenetic processes that appear to be involved with splicing. It ends with these very interesting words:

Our results show that exons are functional units, defined not only by their coding capacity but also by the way they are packaged in nucleosomes. This may have an impact on their stability and evolution.

It proposes an interesting model where specifically modified proteins serve as markers to delineate where important segments of a gene sequence are, permitting cellular processes to more easily produce a variety of novel proteins in response to different environmental pressures.

Nucleosomes are a big part of the packaging that permits the huge length of DNA in each chromosome to be bundled small enough to fit into a nucleus. Almost all of the DNA in a chromosome is wound around them, which covers up much of the sequence. This is the structural role for nucleosomes that has been known for many years. (All images from Wikipedia)

chromatin structure

One of the continuing questions has been whether nucleosome packaging is ‘random’ with respect to gene transcriptional units (that is, it only serves a structural role) or if there are distinct packaging arrangements that are used to identify where genes are. The latter has been shown to be true with respect to the start of genes. This paper postulates that there are also distinct nucleosome arrangements that identify other parts of the gene, called exons.

Our genes, as well as those of almost every multicellular organism, are split into regions that code for proteins (exons) and regions that seldom code for anything (introns). The introns are removed from a large pre-mRNA by a process called splicing, producing a contiguous sequence that can now be translated into a protein by the ribosomes.
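To make the idea of removing introns concrete, here is a minimal toy sketch in Python. The sequence, the exon coordinates and the `splice` function are all invented for illustration; real splicing depends on splice-site signals and the spliceosome, not on a list of known coordinates.

```python
def splice(pre_mrna, exons):
    """Join the exon segments of a pre-mRNA, dropping everything between them."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical pre-mRNA: three short exons separated by two introns.
pre_mrna = (
    "AUGGCC"          # exon 1
    "GUAAGUUUUCAG"    # intron 1 (made up)
    "GAAUUC"          # exon 2
    "GUAAGUCCCCAG"    # intron 2 (made up)
    "UGGUAA"          # exon 3
)

# Assumed exon coordinates as (start, end) pairs, end exclusive.
exons = [(0, 6), (18, 24), (36, 42)]

mature = splice(pre_mrna, exons)
print(mature)  # the contiguous sequence left once both introns are removed
```

The result is a single uninterrupted coding sequence, which is what the ribosome gets to read.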


The discovery of split genes for protein coding has been one of the more interesting conundrums in biology. Why would organisms produce large mRNAs whose introns would then have to be removed? Why not just make the mRNA directly without the extra step? Why produce an added biological process, with all the extra proteins and RNAs needed, to create the spliceosome?

Not all of these questions have been answered but it does appear that this arrangement provides some added flexibility for the production of novel proteins. Through a process called alternative splicing, different exons can be brought together to produce proteins with different activities. A particular exon could be ‘skipped’ in the splicing process, allowing a new transcript to be produced that codes for a different protein than another transcript, even though they were produced from the same RNA segment.
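Exon skipping can be sketched the same way. This is only a toy model: the three exon sequences, the tiny codon table and the function names are made up for the example, and the table covers just the codons used here.

```python
# Partial codon table, covering only the codons in this toy example.
CODONS = {"AUG": "M", "GCC": "A", "GAA": "E", "UUC": "F", "UGG": "W", "UAA": "*"}


def join_exons(exon_seqs, keep):
    """Build a transcript from the exons whose indices are listed in keep."""
    return "".join(exon_seqs[i] for i in keep)


def translate(mrna):
    """Translate codon by codon until a stop codon ('*') is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODONS[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


exon_seqs = ["AUGGCC", "GAAUUC", "UGGUAA"]  # hypothetical exons 1, 2 and 3

full = join_exons(exon_seqs, [0, 1, 2])   # all three exons included
skipped = join_exons(exon_seqs, [0, 2])   # exon 2 skipped

print(translate(full), translate(skipped))
```

Both transcripts start from the same three exons, yet the one that skips the middle exon codes for a shorter protein.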


In fact, there are a variety of splicing scenarios that can produce a large number of different looking transcripts from a single gene in the same organism.

more splicing

So, the presence of splicing provides the ability of a single gene to produce multiple mRNA transcripts and many different proteins, each with bits and pieces in common with others, but with unique structures and activities of their own. Thus a large range of protein activities can be created from a relatively small number of genes.
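A rough way to see how quickly the numbers grow is to count the combinations in a deliberately simplified model. The sketch below assumes that the first and last exons are always kept and that every internal exon can be independently included or skipped. Real genes are far more constrained, and they also use other splicing modes, so treat this as an illustration and not a prediction. The exon names are placeholders.

```python
from itertools import combinations


def isoforms(exons):
    """List every transcript that keeps the first and last exon and any
    subset of the internal exons, preserving their original order."""
    first, *internal, last = exons
    result = []
    for k in range(len(internal) + 1):
        for subset in combinations(internal, k):
            result.append([first, *subset, last])
    return result


gene = ["E1", "E2", "E3", "E4", "E5"]  # hypothetical five-exon gene
variants = isoforms(gene)
print(len(variants))  # 2 ** 3 possibilities for three internal exons
```

Under these assumptions each additional internal exon doubles the number of possible transcripts.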

More importantly, differing biological conditions can result in different modes of alternative splicing and thus different sorts of proteins from the same gene. So, just knowing someone’s genetic sequence is not really enough to know just what proteins are being produced. Understanding the epigenetics of splicing is critical.

Trying to understand how these multitudes of alternative splicing events are controlled has been an area of intense investigation.

Splicing generally takes place in the nucleus. It appears in some cases that splicing takes place as the RNA transcript is produced (co-transcriptional) and in others after the entire pre-mRNA is put together (post-transcriptional). There is some debate about which is more likely, but this paper postulates that transcription and splicing happen at roughly the same time.

But how does this work? Nucleosomes, as well as other proteins, cover up all the DNA. How does the cellular apparatus know where to go, particularly when different splice junctions can be thousands of nucleotides apart?

As this paper suggests, some of the proteins making up the nucleosomes can be modified, providing a way to tell nucleosomes apart. Thus, there is a sort of Dewey decimal system for genes. Nucleosomes with the right modified proteins are placed at exons, helping the cellular machinery find them.

Thus, nucleosomes around exons are tagged, making it much easier for the transcription/spliceosome apparatus to find them. Or not find them if the properly tagged modifications are not present in the nucleosomal proteins.

In this model, control of alternative splicing might be facilitated by the epigenetic presence and absence of modifications on specific nucleosomes. This is an important process:

Thus, exons are not only characterized by their coding capacity but also by their nucleosome organization, which seems evolutionary conserved since it is present in both primates and nematodes

An interesting model but that is not the amazing thing about this paper. They produced this model by mining data that are in available databases. As stated in the press release:

The study is based on extremely large amounts of data published by other scientists, but not previously analyzed in such detail.

No new data. Just a reanalysis of previous data in novel ways. Cool.

They have now presented a useful model that can be experimentally examined. Maybe they have correctly identified the mechanism or maybe not. Further work will be needed.

But they have identified something interesting. There is something going on and if this is not yet the explanation, their analysis will still lead to an answer to a question that had not even been formulated when the original data was created.

These researchers have been able to ask a question, use previously generated data to find an answer and propose a model for explanation. It is a fascinating view of the scientific process.

And it could have been done by anyone with access to the Internet and open databases. Well, anyone with the training and interest. But, there are some very informed ‘amateurs’ who might be able to do something like this.

Perhaps someday we will have biology amateurs, like astronomy has amateurs, who provide key insights by mining databases in novel ways. The barrier to entry for scientific research could be greatly lowered.
