Some bumps in the road

sequencing machines by jurvetson
Your SNPs are your information:
[Via business|bytes|genes|molecules]

A quick follow up to my previous post. Reading Rob Carlson’s wonderfully written post on the C&Ds sent out by the California Department of Public Health to 13 consumer genetics companies got me thinking. He writes

There appears to be some tension between the interpretation of tests ordered for diagnostic purposes, which probably should require a prescription, and sequencing or genotyping services that provide information about a consumer’s genetic makeup.
[More]

There is a really interesting conversation going on about various genotyping companies and their interaction with the California Department of Public Health. There are real questions about the department’s jurisdiction, and some people wonder why doctors have to be involved at all. (That is, why pay the cost of a doctor’s visit to get a prescription to have my own DNA examined? That sounds odd. Why can’t I just send in a cheek swab?)

Some of this also seems to run afoul of the state’s consumer protection focus. Again, this seems a little odd but I guess a point could be made to apply some sort of precautionary principle (i.e. the companies need to demonstrate that they are able to accurately produce a genotype with a very low level of errors) since it is likely that a large percentage of the people using the services may view the results as something more than just informational. Look what happened to Alec Guinness when he was given incorrect information regarding a medical condition.

Having done molecular biology since the late 70s, I know that these techniques are not necessarily an exact science. What level of redundancy will be needed? What are the error rates? How good are the microarray chips used? What are the verified QA/QC procedures and who does the verification? The results had better have a high degree of accuracy.

I was interested in the accuracy of the chips used by these genotyping services. I am basing the following on my own research experience and what is actually written on the web sites. If I have made some fundamental error, please let me know. It would not be the first time I went off on a ragged trail only to be brought up short by a fatal assumption. But I think I have done the math right.

The important use for 23andme is the generation of a very deep and rich database that can be used to find linkages between important genes and SNPs. The accuracy of an individual’s sample is not strictly that important for this purpose since the law of large numbers means any individual errors will be swamped out. But individual errors in an individual’s sample may be important for that individual (Yes, I am channeling Po from Kung Fu Panda – ‘Legend tells of a legendary warrior whose kung fu skills were the stuff of legend’. Or Criswell from Plan 9 from Outer Space – ‘And remember my friend, future events such as these will affect you in the future.’)

I had to do a little digging to get some numbers but 23andme claims a call rate that averages 99%, meaning that 1% of the SNPs cannot be called. With nearly 600,000 SNPs to look at, this means that about 6,000 SNPs are not called in a single sample. But that is based on an average. Without knowing the standard deviation, it is possible that some samples will have much higher ‘No call’ rates. So someone who has a SNP linked to an important trait might never learn that they had it.

While there is no data given for what percentage of incorrect calls are made, they do say that reproducibility is 99.9%, so that if the sample is run again 99.9% of the information will be the same. (I would assume this also means that the 6,000 ‘No Calls’ are also ‘No Calls’ the second time.) So for about 600 SNPs in an individual sample (0.1% of 600,000), a second run would give a different answer, which leaves those calls ambiguous.
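For anyone who wants to check my arithmetic, here is a minimal sketch in Python. The three figures are the ones quoted above; treating the average call rate and the reproducibility figure as fixed per-SNP rates is my assumption, and the names are made up for illustration.

```python
# Figures quoted in the post. Treating them as exact, uniform per-SNP
# rates is an assumption made only for this back-of-the-envelope check.
SNPS_ON_CHIP = 600_000      # "nearly 600,000 SNPs"
NO_CALL_RATE = 0.01         # 99% average call rate -> 1% not called
DISCORDANT_RATE = 0.001     # 99.9% reproducibility -> 0.1% differ on a rerun


def expected_count(n_snps: int, rate: float) -> int:
    """Expected number of SNPs affected in one sample, rounded to a whole SNP."""
    return round(n_snps * rate)


no_calls = expected_count(SNPS_ON_CHIP, NO_CALL_RATE)
ambiguous = expected_count(SNPS_ON_CHIP, DISCORDANT_RATE)

print(f"Expected no-calls per sample:        {no_calls}")
print(f"Expected ambiguous calls per sample: {ambiguous}")
```

These are averages only. Without the standard deviation, nothing here says how bad an unlucky sample could be.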

Is this a problem? I really do not know, not being an expert. For a research database that will be examining thousands of genomes, this may all be irrelevant. There will be some redundancy from the power of large numbers. The ‘noise’ from errors will be swamped out by the overall SNPs called. So it would be possible to find links between SNPs and phenotypes with this level of error.

But, for individuals who will examine their own data, this could be a little more problematic. Using my rough rule of thumb from all those years I did Poisson distributions: with a 0.1% chance of an ambiguous call per SNP per sample, a group of about 3000 samples averages 3 ambiguities at each SNP, so for any given one of the 600,000 SNPs on the chip there is a greater than 90% chance that at least one sample in the group carries an ambiguity (error?) there. So if, say, there was a SNP that tracked an important phenotype, such as Alzheimer’s, someone in that group would most likely get the wrong information.

And in a group of about 300 people, the 1% ‘no call’ rate gives the same greater than 90% chance that any given SNP goes uncalled in at least one person when there should be a call. So while each individual would have a low probability of having an error in their data, it is very likely that someone in the group will have an error at an important SNP.
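Here is a rough sketch of that rule of thumb in Python, for one particular SNP. It assumes every sample is independent and that the error rate is the same at every SNP, neither of which I can verify from the web site, and the function names are my own.

```python
import math


def prob_at_least_one(rate: float, n_samples: int) -> float:
    """Exact chance that at least one of n independent samples is hit."""
    return 1 - (1 - rate) ** n_samples


def poisson_at_least_one(rate: float, n_samples: int) -> float:
    """Poisson approximation: 1 - exp(-expected number of hits)."""
    return 1 - math.exp(-rate * n_samples)


# Assumed rates, taken from the figures quoted in the post.
p_ambiguous = prob_at_least_one(0.001, 3000)   # 0.1% ambiguity, 3000 samples
p_no_call = prob_at_least_one(0.01, 300)       # 1% no-call, 300 people

print(f"At least one ambiguity at a given SNP: {p_ambiguous:.3f}")
print(f"At least one no-call at a given SNP:   {p_no_call:.3f}")
print(f"Poisson shortcut (3 expected hits):    {poisson_at_least_one(0.001, 3000):.3f}")
```

Both cases work out to an expected 3 hits per SNP across the group, which is why the two group sizes give nearly the same answer. Note that this is the chance for any one SNP, not the chance that all 600,000 are hit at once.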

Like the lottery, the chances are low of ‘winning’ but you can be assured that someone will ‘win.’

Of course, they would not know they had ‘won.’ And considering that 23andme claims to have performed 10 billion genotypes since 2007, there are a lot of individual samples with important errors. Not important if you are creating a database of 10 billion genotypes but it might be of some concern to an individual.

I understand the principles behind all of this and recognize the extreme scientific importance of such a database. But I do not believe the average person does. In 1988, 21% of the people surveyed thought the Sun revolved around the Earth. Similar numbers were seen in 1999. Only 47% of the people surveyed correctly answered that it takes a year for the Earth to go around the Sun. Only half the people in one survey understood probabilities (which explains the popularity of lotteries). This is from last year:

The most recent National Science Foundation Science Indicators report draws on different surveys to tell us that only about 54 percent of Americans realize that antibiotics do not kill viruses, fewer than half know that genetically modified foods are in their neighborhood grocery store, and only 44 percent believe that human beings developed from other animal species (about three-quarters of those responding realize that the theory of evolution says this, but many reject the theory).

Indeed, more people believe that houses can be haunted than accept the theory of the Big Bang, and 29 percent are not certain that the earth revolves around the sun rather than vice versa. (my emphasis)

Now sometimes I think we really are headed towards Idiocracy, but not everyone has the time or inclination to understand the ramifications of what they are doing when they engage biotech. (I have had so many dinner conversations about genetically modified foods. I sometimes feel like my father having to explain the oil industry, where he worked as an exploration geologist for 30 years, to those who do not understand its complexity. Although I think people have a greater visceral fear of ‘Frankenfoods’ than they do of OPEC.)

But these same people make real life decisions based on the results of a 23andme scan. They do not have the time or the ability to understand deeply what the results really mean (at least the 50% that do not understand probabilities). How many errors are usually present? What does level of confidence really mean? Which companies are really legit and which are just jumping on a money-making opportunity? Few people are able to figure this out. (23andme does an excellent job trying to provide this information, though.)

That is why the American people have designated surrogates in the government to do this for them. Thus the FDA, or the California Department of Public Health. They attempt to bring some sort of validation to very complex processes.

I mean, I know that those ‘natural’ pills will have no real effect on my prowess, no matter how big a smile that guy has. But apparently lots of people believed him. While caveat emptor applies, these sorts of companies are hit all the time with fines and notices by the FDA. The makers of Enzyte apparently defrauded people of $100 million! Its president, his mother and others were found guilty. He may have to serve 20 years in prison, yet the company’s web site is still there, it looks like you can still buy Enzyte and all they are doing is mulling whether they should change their name.

So there is a role for some sort of oversight here and I would expect these companies to welcome some. Now the key question is really who should do the overseeing? I would prefer a Federal group rather than a state one but, just as we seem to need New York Attorneys General to go after high tech companies, there may be a need for states to intercede if no Federal agency will.

[Lest anyone think I am picking on 23andme, they are the epitome of an open site, providing all the information I needed and stating in very open and transparent terms exactly what is going on. They appear to possess as much integrity as any organization I can find out about online. The main benefit of 23andme is the database being generated. They are open about sharing this information for research purposes. Their blog is a constant source of very interesting science. I would not hesitate in giving them a sample because I would not care at all what the results really were. My sample really helps them more than it helps me. I actually kind of like their mission, but I am a well-educated guy who knows that the Earth goes around the Sun in a year. I am not an average customer.]

Of course, this is another example of the rapidly changing world hitting up against regulations that were developed for something completely different. It took some time for DNA fingerprinting techniques to be readily accepted, and even then individual labs screw up without proper oversight. While I do not believe the same sorts of consequences will arise from genotyping, I expect that oversight will be worked out because the technology is just too easy to use.

This collision of old and new resembles the problems copyright has when dealing with the digital world. The game has changed, and bureaucracies that were created to deal with old style medicine will have to learn how to deal with this newfangled biotech, because it is likely that in just a few years it will be possible to sequence a person’s entire genome in perhaps a single day for under $1000.

Garage Biotech will really be in full swing. How will the California Department of Public Health deal with that?!


3 thoughts on “Some bumps in the road”

  1. That is nice data to see. Reproducibility and robustness are important aspects of this technology.

    Now they just need to convince some bureaucrats.
