Science Beat: What major technical advances have allowed this
new way of working?
Hawkins: It's all been due to the Human Genome Project. We built
these very large, industrial-scale processors because we were desperate
to sequence the human genome, but now we're sitting on the ability to
sequence hundreds of bases every second. And that pretty much doubles
every six months. We work closely with outside technology developers,
and the spin-off for us is that we get their new machines and new reagents
cheaper and faster.
There are two big areas we're focusing on today. One is functional genomics;
the other is the computing area.
Rokhsar: Comparative genomics is like the Rosetta stone. You see
these ancient texts, and you know they must mean something, but how do
you know what they mean?
Science Beat: You mean by comparing those languages with languages
you can understand?
Rokhsar: Yes, but the difference between the Rosetta stone and
the genomic situation is that these aren't dead languages. They're living
inside the bodies of these animals.
Now that we can generate all this sequence, it puts a lot more pressure
on us to assemble it. Our JAZZ assembler [a computer program] was developed
here over about a year. Its claim to fame is that it's the only assembler
in the public domain and it takes into account aspects of the data
that other assemblers simply ignore, for example the fact that we sequence
from both ends of a piece of DNA. Which is, again, a technology that JGI
has employed from the very beginning.
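[To illustrate why sequencing from both ends helps: the two end reads of one DNA fragment should land facing each other, roughly one fragment length apart. The following is a minimal sketch of that consistency check, not the JAZZ code itself. The names Placement and mates_consistent, and the insert-size and tolerance values, are hypothetical and chosen only for illustration.]

```python
from dataclasses import dataclass


@dataclass
class Placement:
    """Where one end read was placed in a draft assembly (hypothetical type)."""
    contig: str    # name of the assembled contig
    start: int     # leftmost position of the read on that contig
    forward: bool  # True if the read lies on the forward strand


def mates_consistent(a, b, insert_size, tolerance):
    """Return True if two end reads from one fragment agree with the assembly.

    Assumption: this sketch only judges pairs placed on the same contig.
    Pairs that span two contigs are reported as False here, although a real
    assembler could use them to link contigs together.
    """
    if a.contig != b.contig:
        return False
    if a.forward == b.forward:
        return False  # end reads should lie on opposite strands
    left, right = (a, b) if a.start <= b.start else (b, a)
    if not left.forward:
        return False  # the two reads should point toward each other
    distance = right.start - left.start
    return abs(distance - insert_size) <= tolerance


if __name__ == "__main__":
    read_1 = Placement("contig_1", 100, True)
    read_2 = Placement("contig_1", 3050, False)
    print(mates_consistent(read_1, read_2, insert_size=3000, tolerance=200))
```

[A pair that fails this kind of test hints that the assembly has joined two stretches of sequence that do not belong together.]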
Science Beat: And when you have that assembly, you have to figure
out how it all functions.
Richardson: Trying to figure out the functions of genes includes
looking at things like alternative splicing. Remarkably, there's a single
gene of Drosophila that has 38,000 different splice variants! It's
a very large gene with many exons [coding sequences], and it directs neuronal
growth. By shuffling these exons around, and making different mRNA, which
leads to different proteins, it's able to direct those neurons to different
parts of the body. You can say that's one gene, but it's performing many
different functions.
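[The arithmetic behind a number like 38,000 can be sketched as follows. The interview does not name the gene. The exon counts below are those commonly reported for the Drosophila gene Dscam and are an assumption here, as is the simplification that each transcript uses exactly one alternative from each exon cluster.]

```python
from math import prod

# Assumed number of mutually exclusive alternatives in each exon cluster
# (figures commonly reported for Drosophila Dscam, not given in the interview).
ALTERNATIVES_PER_CLUSTER = {
    "exon 4": 12,
    "exon 6": 48,
    "exon 9": 33,
    "exon 17": 2,
}


def count_isoforms(alternatives):
    """Count distinct mRNAs when one alternative is chosen per cluster."""
    return prod(alternatives.values())


if __name__ == "__main__":
    print(count_isoforms(ALTERNATIVES_PER_CLUSTER))  # prints 38016
```

[Under these assumptions the choices multiply: 12 x 48 x 33 x 2 = 38,016 possible variants from one gene.]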
Functional genomics also includes looking at gene expression: what are
the cues that turn genes on? And we're also getting into an area that
is now called proteomics, which is the next step after functional
genomics: actually getting right down to the biochemical function
of the proteins.
One of the first things we're doing, because we're interested in regulatory
translations, is looking at how expressed proteins bind to DNA, and getting
an idea of what genes they might affect.
Science Beat: What are the next steps for these comparative-genome
projects?
Hawkins: The squirt is just the start. We have other genomes in
mind that will help us compare with some of the networks that we've found
in the sea squirt. One of the next genomes that we're seriously considering
is the frog, Xenopus tropicalis.
When you look at a sea squirt you don't think of human development, but
when you look at a frog embryo, there's a lot of similarity between a
frog embryo and a human embryo. So if you begin to see something in the
sea squirt, and then begin to see it in the frog, and then someone else
starts to see it in the mouse, you can say, well this is probably going
to be present in the human.
Richardson: In Ciona, we're trying to get an idea of which
genes function in the patterning of an embryo and the development of the
notochord [a precursor of the spinal cord, found in some primitive animals
and in vertebrate embryos]. And those will have homologs, or genes that
are similar, in every other organism that has a similar body plan. In
a more primitive organism the same gene may carry out the functions of,
say, two, three, or four genes in a more specialized organism, because
oftentimes those genes may have been duplicated or even reduplicated.
Rokhsar: In computation, the next hurdle is allowing people to
make use of this comparative data and all the other data. As people
use those elements in their own experiments, they're going to bring information
back that's much more diverse than just the sequence of As, Cs, Ts, and
Gs [the nucleotide bases of DNA whose order determines genetic coding].
You want to be able to interrogate the whole system and ask questions
at the meta-level, and not just be tied to finding a particular gene.
Hawkins: With sequence production, before now, only the "bravest
and fittest" were able to interpret this data. We want to make it
so even your grandmother can use it.
As we look forward to producing other kinds of data, the burden is on
us to produce tools to allow people to get the most out of the data sets.