A research team of scientists from EMBL Grenoble and the IGBMC in Strasbourg, France, have, for the first time, described in molecular detail the architecture of the central scaffold of TFIID: the human protein complex essential for transcription from DNA to mRNA. The study, published January 7 in Nature, opens new perspectives in the study of transcription and of the structure and mechanism of other large multi-protein assemblies involved in gene regulation.
How in the world do viruses and ice give us the structure of a major protein complex required to convert DNA into mRNA?
I’ve always loved protein structures. These are the molecules that make us what we are, and determining what they look like requires a grasp of many concepts. You need to understand biology, chemistry, physics and math, all at pretty high levels, to do it well.
Luckily, you can still enjoy the work without needing that degree of knowledge. Computer software has gotten to the point that much of the work can be automated. And the visualization procedures have moved from metal ball and stick models to full 3D images (the technology we use to see 3D movies today was developed to see molecules in 3D).
Things get trickier though when we want to look at multi-protein complexes. It is hard enough to determine the structure of a single protein with perhaps 1000 atoms tightly linked together in it. Complexes not only have many more proteins but the proteins are not strongly linked together. Sometimes parts are missing.
Structure determination works best when every complex is exactly the same. If some of the complexes are missing elements, it becomes very difficult.
That is partly what this paper solved. In order to determine the structure, you need to isolate the proteins or protein complex. Here, they were actually working with several protein complexes, each made up of several proteins.
When they tried to express each protein separately, they could not easily reconstitute the entire complex, even when all the genes were placed in a single cell. Some proteins were expressed at higher levels than others, altering the relative amounts of each protein needed to make a complex – the stoichiometry.
If there is too little of one type of protein, it becomes very hard to recreate the entire complex.
So they fixed this by using a trick from some viruses – they express the needed proteins as one long polyprotein and then clip out each protein by use of another enzyme. This way the stoichiometry always matches. If you need two copies of 3 different proteins, then instead of hoping the cell makes enough copies, make it as one long polyprotein that then gets clipped into 6 different pieces – two copies of three.
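To make the stoichiometry argument concrete, here is a toy sketch in Python. It is not the authors' method, just bookkeeping: the subunit names A, B and C, the expression levels and the "|" cleavage marker are all made up for illustration. It compares how many complete complexes you can build when each protein is expressed separately at uneven levels with how many you get when every chain is a polyprotein that gets clipped into two copies of each of three proteins.

```python
from collections import Counter

# Hypothetical subunit names; the real TFIID subunits are not modeled here.
SUBUNITS = ["A", "B", "C"]
COPIES_PER_COMPLEX = 2  # "two copies of three", as in the text


def count_complete_complexes(pool):
    """Complete complexes are limited by the scarcest subunit in the pool."""
    return min(pool[name] // COPIES_PER_COMPLEX for name in SUBUNITS)


def cleave_polyprotein(polyprotein, site="|"):
    """Clip one polyprotein chain into its individual proteins."""
    return polyprotein.split(site)


def pool_from_polyproteins(n_chains):
    """Express n_chains polyproteins and clip each into 6 pieces."""
    polyprotein = "|".join(SUBUNITS * COPIES_PER_COMPLEX)  # A|B|C|A|B|C
    pool = Counter()
    for _ in range(n_chains):
        pool.update(cleave_polyprotein(polyprotein))
    return pool


# Separate expression with made-up, uneven expression levels.
separate_pool = {"A": 1000, "B": 400, "C": 90}
separate_complexes = count_complete_complexes(separate_pool)

# Polyprotein expression: every chain yields exactly one complex worth of parts.
poly_pool = pool_from_polyproteins(100)
poly_complexes = count_complete_complexes(poly_pool)

print(separate_complexes)  # 45, limited by the scarce protein C
print(poly_complexes)      # 100, one complete complex per chain
```

In the separate case most of protein A and B is left over with nothing to pair with, which is exactly the problem the polyprotein trick avoids.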
By doing this they were able to make large mixtures of complete complexes. But even with complete complexes, it can be hard to get high-resolution data. The best approaches – using X-rays or magnetic resonance – become difficult when there are as many atoms as are found in these large complexes.
This particular complex is about 650,000 daltons (Carbon has a mass of about 12 daltons). So we are talking about a lot of atoms.
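For a rough sense of scale, here is the back-of-the-envelope arithmetic as a short Python sketch. The 650,000 and 12 dalton figures come from the text above. The 110 daltons per amino acid is a commonly used rule of thumb that I am adding as an assumption, so treat the residue count as an estimate only.

```python
COMPLEX_MASS_DA = 650_000   # mass of the complex, from the text
CARBON_MASS_DA = 12         # approximate mass of one carbon atom, from the text
AVG_RESIDUE_MASS_DA = 110   # assumption: rule-of-thumb mass of one amino acid

# If every atom weighed as much as carbon, this is how many there would be.
# Real proteins contain many lighter hydrogens, so the true count is higher.
carbon_equivalents = round(COMPLEX_MASS_DA / CARBON_MASS_DA)

# Approximate number of amino acids in the whole complex.
approx_residues = round(COMPLEX_MASS_DA / AVG_RESIDUE_MASS_DA)

print(carbon_equivalents)  # 54167
print(approx_residues)     # 5909
```

Even by the conservative carbon-only count, that is tens of thousands of atoms to place.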
We have gotten around that limitation using electron microscopy and ice. Cryo-EM does not have the resolution of the other approaches but can deal with very large complexes.
Essentially, the complexes are frozen in a very special way, in a thin layer of glass-like ice, so that a beam of electrons can be passed through them. We record how the electrons are scattered and get a 2D image. Done carefully, this lets us visualize the molecules while the ice itself stays invisible. You get something like this:
The raw data is on the left, with some actual images arrayed below it. As you can see, while you can visualize each complex, they are arrayed randomly across the 2D image. But this is a feature, not a bug.
Using computers, thousands of these images can be overlaid on one another in such a way as to recreate the entire complex. Think of it this way. A CAT scan looks at 2D layers of a structure one at a time, then adds these layers together to get a 3D image. Here the 2D images are spread out randomly but the computer can still put them back together.
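Why does overlaying thousands of noisy images help? Here is a minimal sketch in plain Python, under a big simplifying assumption: every image is already lined up in the same orientation. Real cryo-EM software also has to work out how each particle is rotated before it can combine them, and that step is not shown. The tiny 5 by 5 "particle", the noise level and the image count are all made up for illustration.

```python
import random

# A made-up 5x5 "particle" standing in for one view of the complex.
TRUE_IMAGE = [
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
]


def noisy_copy(image, rng, noise_sd=2.0):
    """Return the image with Gaussian noise added to every pixel."""
    return [[pixel + rng.gauss(0.0, noise_sd) for pixel in row] for row in image]


def average_images(images):
    """Average a list of equally sized, already aligned images pixel by pixel."""
    n = len(images)
    rows, cols = len(images[0]), len(images[0][0])
    return [
        [sum(img[r][c] for img in images) / n for c in range(cols)]
        for r in range(rows)
    ]


def rms_error(image, reference):
    """Root-mean-square difference between an image and the reference."""
    squared = [
        (p - q) ** 2
        for row, ref_row in zip(image, reference)
        for p, q in zip(row, ref_row)
    ]
    return (sum(squared) / len(squared)) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
particles = [noisy_copy(TRUE_IMAGE, rng) for _ in range(2000)]

single_error = rms_error(particles[0], TRUE_IMAGE)
averaged_error = rms_error(average_images(particles), TRUE_IMAGE)

print(f"one image:       {single_error:.3f}")
print(f"2000 averaged:   {averaged_error:.3f}")
```

Any single image is buried in noise much larger than the signal, but the noise is random and the particle is not, so the average drifts toward the true picture as more images are added.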
Here is the model you get when you do average thousands of these (the model is rotated 90° each time):
Now this particular complex is actually just one small part of the entire complex they were looking at. They used a handful of different approaches, ranging from structural data of individual proteins (highlighted in the last figure on the right) to the use of antibodies to locate just where specific regions lay on the entire complex.
And they did a lot of refinement along the way.
At the end they had this model:
What is cool is that by using ice we can get a low-resolution look at a huge molecular complex. This can tell us a lot about how it actually works.
And, since you needed to have the proteins in the first place, using a viral approach to make a polyprotein is quite nice.
Without our knowledge of viruses, and our knowledge of ice, we would not have a new understanding of a pivotal biological molecule. Cool.