A quick word about how this sequencing thing works, then. A strand of DNA is made up of four different nucleotides: A, C, T, and G. They're mixed together fairly evenly, with a slight excess of A's and T's. When one nucleotide is repeated over a stretch, say 8 A's in a row, that can cause problems when the strand is replicated. The polymerase (the enzyme that makes new DNA) can lose count and add only 7 before going on to whatever comes next. Isolated polymerases in the laboratory make lots of that kind of mistake. The replication machinery in our cells (the polymerase plus dozens of other proteins in an enormous complex that proofreads the new strand and keeps it faithful to the old one) is quite good at making accurate copies, but even so, these 'homopolymer' sequences are a common source of mutations. A mistake is made, a base is lost, and everything that comes after it is out of phase.
So when I'm looking for mutations in a gene to tell a family where their cancer risk comes from, I pay special attention to these homopolymers.
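If you wanted to flag those runs in a sequence programmatically, a minimal Python sketch might look like this. The function name `find_homopolymers` and the default threshold of 6 bases are my own illustrative choices, not anything from the sequencer's software.

```python
from itertools import groupby


def find_homopolymers(seq, min_len=6):
    """Return (start, base, length) for every run of a single base at least min_len long.

    Hypothetical helper for illustration; the threshold is an arbitrary choice.
    """
    runs = []
    pos = 0  # 0-based position of the start of the current run
    # groupby yields consecutive identical characters as one group.
    for base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs
```

For example, `find_homopolymers("GC" + "A" * 8 + "TC")` picks out the 8-A run starting at position 2.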
Our new sequencer uses a trick in which each base incorporated into the strand being synthesized gives off a fixed amount of light. One base, one unit of light. Two bases of the same kind, two units. The counting is pretty good up through 5 bases, but once you get up to 7 or 8, it's hard to tell exactly how many units you have. With ten identical bases in a row, the machine doesn't have time to get through them all in a single cycle, so you just can't read runs of 10 or more.
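To make the 'one base, one unit of light' idea concrete, here is a sketch of the ideal, noise-free signal such a machine would produce. It assumes the nucleotides are offered one at a time in a fixed repeating order (the `TACG` default is an assumption for illustration) and that each offering lights up in proportion to how many bases get added. `ideal_flowgram` is a hypothetical name, and the sketch ignores both the noise that makes 7 and 8 hard to tell apart and the 10-base limit.

```python
def ideal_flowgram(seq, flow_order="TACG"):
    """Noise-free signal per flow: how many bases are added each time a nucleotide is offered.

    seq is the strand being synthesized, so no complementing is needed.
    The flow order is an illustrative assumption.
    """
    seq = seq.upper()
    unknown = set(seq) - set(flow_order)
    if unknown:
        # A base that is never offered would stall the loop forever.
        raise ValueError(f"bases not in flow order: {sorted(unknown)}")
    signals = []
    i = 0     # position in the sequence
    flow = 0  # index of the current flow
    while i < len(seq):
        base = flow_order[flow % len(flow_order)]
        n = 0
        # Every consecutive copy of the offered base is added in this one flow,
        # so a run of n identical bases gives a signal of n units.
        while i < len(seq) and seq[i] == base:
            n += 1
            i += 1
        signals.append(n)
        flow += 1
    return signals
```

A run of 8 A's shows up as a single flow with a signal of 8; the machine never sees eight separate events, which is why the count has to be inferred from brightness alone.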
Our sequencer also reads the DNA you give it one molecule at a time, and the sequence from each molecule is called a 'read'. The software lines up the reads against the reference sequence and tells you how many of each kind of base you have, from start to finish.
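The 'how many of each kind of base' tally can be pictured as a pileup. This is a simplified sketch that assumes the reads have already been placed on the reference with a known start position and no gaps; real aligners have to handle insertions and deletions, which is exactly where homopolymers make life hard. `pileup` is a hypothetical helper, not the commercial software's API.

```python
from collections import Counter


def pileup(reads, ref_len):
    """Count bases at each reference position.

    reads: list of (start, sequence) pairs, assumed already aligned with no gaps.
    Returns one Counter per reference position. Hypothetical helper for illustration.
    """
    columns = [Counter() for _ in range(ref_len)]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < ref_len:  # ignore any part of a read hanging off the reference
                columns[pos][base] += 1
    return columns
```

With reads `(0, "ACG")`, `(1, "CGT")`, and `(2, "GA")` on a 4-base reference, the last position collects one T and one A, the kind of split that would hint at a difference between the two copies of a gene.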
Let me show you its counting problem.
The software knows that you have to have a whole number of bases. 7 or 8. No such thing as 7.6 bases. So it takes every read with an intermediate value for the length of a run and rounds it off to the nearest whole base.
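Here is a small illustration of why that rounding hides exactly what I'm looking for. The raw signal values are made up for the example, not taken from any real sample: one set is a single broad cluster around 7.6, the other is two tight clusters near 7 and 8. I've assumed round-half-up; I don't know which rule the commercial software actually uses.

```python
import math
from collections import Counter


def call_run_length(signal):
    """Round one read's raw signal to the nearest whole number of bases (halves round up)."""
    return math.floor(signal + 0.5)


def called_counts(signals):
    """Tally how many reads were called at each whole-number run length."""
    return Counter(call_run_length(s) for s in signals)


# Made-up raw signals for one homopolymer run, for illustration only.
one_population = [7.3, 7.4, 7.5, 7.6, 7.7, 7.8]   # one broad cluster
two_populations = [7.0, 7.1, 7.9, 8.0, 8.0, 8.1]  # clusters near 7 and near 8

print(called_counts(one_population))
print(called_counts(two_populations))
```

Both sets come out as two reads called 7 and four called 8, so once the values are rounded the two situations can't be told apart.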



This is exactly what I'd expect from a real sample with a one-base deletion on one of her two copies of the gene. Ironically, the commercial software gives me exactly the same values as the preceding example! The second experiment to confirm or reject the proposed mutation is underway, but it looks pretty good. Two populations of values. The difference between the peaks is not exactly one base (it's 0.8), but it's pretty close. If this sample doesn't have a mutation, we'll be having some serious talks about the technique. I have some other examples of real deletion mutations, and they look just like this.
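One way to put a number on 'two populations of values' is to split the raw, unrounded signals into two groups and see how far apart the group centers are. The sketch below uses a plain one-dimensional k-means with k = 2; that's a swapped-in technique for illustration, not what I or the vendor's software actually do, and `two_means_1d` is a hypothetical name.

```python
import statistics


def two_means_1d(values, max_iter=100):
    """Split 1-D values into two clusters (k-means, k = 2); return (low_center, high_center).

    Illustrative technique only; not the sequencer software's method.
    """
    xs = sorted(values)
    c_lo, c_hi = xs[0], xs[-1]  # start the two centers at the extremes
    if c_lo == c_hi:
        return c_lo, c_hi       # all values identical: nothing to split
    for _ in range(max_iter):
        # Assign each value to the nearer center (ties go to the low group).
        lo = [x for x in xs if abs(x - c_lo) <= abs(x - c_hi)]
        hi = [x for x in xs if abs(x - c_lo) > abs(x - c_hi)]
        if not lo or not hi:
            break
        new_lo, new_hi = statistics.fmean(lo), statistics.fmean(hi)
        if (new_lo, new_hi) == (c_lo, c_hi):
            break               # assignments have stopped changing
        c_lo, c_hi = new_lo, new_hi
    return c_lo, c_hi
```

Keep in mind that k-means will always hand back two centers, even when the data are really one cluster, so a gap like the 0.8 in this sample only means something when it's large compared with the spread inside each group.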

No mutation.
All the DNA has 8 bases.
Not 7. Not 6.
.
sigh.
.
4 comments:
my brain hurts
I found this really interesting, and it just goes to show that you can't make assumptions based on one experiment. I guess with something as potentially life-changing to people as an indication of cancer would be, it pays to be just as careful about giving out false positives as about failing to spot genuine mutations.
I like the light blue and the dark blue colours on the graphs.
!"*&%^$@#~?<kZQ