Comparative Genomics at the Joint Genome Institute: an interview

January 14, 2002

		Part 2: Comparative genomics at the JGI - it all started with the Human Genome

		Contact: Paul Preuss, paul_preuss@lbl.gov

Science Beat: What major technical advances have allowed this new way of working?

Hawkins: It's all been due to the Human Genome Project. We built these very large, industrial-scale processors because we were desperate to sequence the human genome, but now we're sitting on the ability to sequence hundreds of bases every second. And that pretty much doubles every six months. We work closely with outside technology developers, and the spin-off for us is that we get their new machines and new reagents cheaper and faster.

There are two big areas we're focusing on today. One is functional genomics; the other is the computing area.

Rokhsar: Comparative genomics is like the Rosetta stone. You see these ancient texts, and you know they must mean something, but how do you know what they mean?

Science Beat: You mean by comparing those languages with languages you can understand?

Rokhsar: Yes, but the difference between the Rosetta stone and the genomic situation is that these aren't dead languages. They're living inside the bodies of these animals.

Now that we can generate all this sequence, it puts a lot more pressure on us to assemble it. Our JAZZ assembler [a computer program] was developed here over about a year. Its claim to fame is that it's the only assembler in the public domain — and it takes into account aspects of the data that other assemblers simply ignore, for example the fact that we sequence from both ends of a piece of DNA. Which is, again, a technology that JGI has employed from the very beginning.
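[Editor's illustration: the interview does not describe how JAZZ works internally, so the following is only a minimal sketch of the general idea behind using reads from both ends of a piece of DNA. If the two end reads of one clone are placed on an assembled stretch of sequence, they should point toward each other and lie roughly one clone length apart. The names ReadPlacement and mate_pair_consistent, and the sizes used, are hypothetical and are not taken from JAZZ.]

```python
from dataclasses import dataclass


@dataclass
class ReadPlacement:
    start: int     # leftmost position of the read on the assembly (0-based)
    length: int    # read length in bases
    forward: bool  # True if the read aligns to the forward strand


def mate_pair_consistent(a, b, insert_size, tolerance):
    """Check whether two end reads from one clone are plausibly placed.

    Hypothetical helper: the reads must face each other, and the distance
    from the left read's start to the right read's end must be within
    `tolerance` bases of the expected clone (insert) size.
    """
    left, right = (a, b) if a.start <= b.start else (b, a)
    # Reads from opposite ends of a clone should point toward each other.
    if not (left.forward and not right.forward):
        return False
    span = (right.start + right.length) - left.start
    return abs(span - insert_size) <= tolerance


# Assumed example values: a 3,000-base clone, 500-base reads.
well_placed = mate_pair_consistent(
    ReadPlacement(1000, 500, True), ReadPlacement(3600, 500, False),
    insert_size=3000, tolerance=300)
too_far_apart = mate_pair_consistent(
    ReadPlacement(1000, 500, True), ReadPlacement(9000, 500, False),
    insert_size=3000, tolerance=300)
same_direction = mate_pair_consistent(
    ReadPlacement(1000, 500, True), ReadPlacement(3600, 500, True),
    insert_size=3000, tolerance=300)
```

[In this sketch only the first pair passes. An assembler that ignores pairing would have no way to notice that the other two placements are suspicious.]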

Science Beat: And when you have that assembly, you have to figure out how it all functions.

Richardson: Trying to figure out the functions of genes includes looking at things like alternative splicing. Remarkably, there's a single gene of Drosophila that has 38,000 different splice variants! It's a very large gene with many exons [coding sequences], and it directs neuronal growth. By shuffling these exons around, and making different mRNA, which leads to different proteins, it's able to direct those neurons to different parts of the body. You can say that's one gene, but it's performing many different functions.
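[Editor's illustration: the interview does not name the gene or break down the number. The figure matches what has been published for the Drosophila gene Dscam, so the exon counts below are taken from that literature as an assumption, not from the speakers. The sketch shows how picking one alternative from each of a few variable exon clusters multiplies into tens of thousands of possible mRNAs.]

```python
from math import prod

# Assumed number of alternative exons in each variable cluster
# (published Dscam figures; not stated in the interview).
ALTERNATIVES_PER_CLUSTER = {
    "exon 4": 12,
    "exon 6": 48,
    "exon 9": 33,
    "exon 17": 2,
}


def count_splice_variants(alternatives):
    """Count distinct mRNAs when exactly one alternative is used per cluster."""
    return prod(alternatives.values())


total_variants = count_splice_variants(ALTERNATIVES_PER_CLUSTER)
print(total_variants)  # 12 * 48 * 33 * 2 = 38016
```

[Under these assumed counts the product is 38,016, consistent with the roughly 38,000 variants quoted above.]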

Functional genomics also includes looking at gene expression: what are the cues that turn genes on. And we're also getting into an area that is now called proteomics — which is the next step after functional genomics — actually getting right down to the biochemical function of the proteins.

One of the first things we're doing, because we're interested in regulatory translations, is looking at how expressed proteins bind to DNA, and getting an idea of what genes they might affect.

Science Beat: What are the next steps for these comparative-genome projects?

Hawkins: The squirt is just the start. We have other genomes in mind that will help us compare with some of the networks that we've found in the sea squirt. One of the next genomes that we're seriously considering is the frog, Xenopus tropicalis.

When you look at a sea squirt you don't think of human development, but when you look at a frog embryo, there's a lot of similarity between a frog embryo and a human embryo. So if you begin to see something in the sea squirt, and then begin to see it in the frog, and then someone else starts to see it in the mouse, you can say, well this is probably going to be present in the human.

Richardson: In Ciona, we're trying to get an idea of which genes function in the patterning of an embryo and the development of the notochord [a precursor of the spinal cord, found in some primitive animals and in vertebrate embryos]. And those will have homologs, or genes that are similar, in every other organism that has a similar body plan. In a more primitive organism the same gene may carry out the functions of, say, two, three, or four genes in a more specialized organism. Because oftentimes those genes may have been duplicated or even reduplicated.

Rokhsar: In computation, the next hurdle is allowing people to make use of this comparative data — and all the other data. As people use those elements in their own experiments, they're going to bring information back that's much more diverse than just the sequence of As, Cs, Ts, and Gs [the nucleotide bases of DNA whose order determines genetic coding]. You want to be able to interrogate the whole system and ask questions at the meta-level, and not just be tied to finding a particular gene.

Hawkins: With sequence production, before now, only the "bravest and fittest" were able to interpret this data. We want to make it so even your grandmother can use it.

As we look forward to producing other kinds of data, the burden is on us to produce tools to allow people to get the most out of the data sets.
