An Instagram post here, a YouTube video there, a few thousand financial transactions over there. We live in the world of big data — and it’s only going to get bigger. In fact, the accumulation of data worldwide is expected to hit 44 trillion gigabytes by 2020. That’s enough to fill the memory of a pile of iPads reaching to the moon and back three times – and halfway there again.
The technology we have to store all those pieces of digital information hasn’t kept up with the rate at which we’re producing it. So some computer scientists are going back to the basics, looking to the very building blocks of nature for a solution. That’s right: they want to encode digital data into DNA. And a study released this week by University of Washington researchers demonstrates a system for how this can happen on a broad scale.
Most data today is stored on disks or tapes, which can take up a lot of space and last at most a few decades. But DNA, the molecule that carries the genetic code for the development and functioning of every living organism, could be stored as a small drop on a thin piece of paper and can last for hundreds, if not thousands, of years. By using DNA in place of our current storage methods, “researchers could shrink the space needed to store digital data that today would fill a Walmart supercenter down to the size of a sugar cube,” according to a press release sent out on Thursday.
Each DNA molecule is composed of strands that are made up of a sequence of the four basic units called nucleotides: adenine (A), cytosine (C), guanine (G), and thymine (T). But DNA isn’t sacrosanct to life: “DNA is also just a chemical,” study co-author Georg Seelig told Crosscut. And it’s a chemical that we have the ability to synthesize, nucleotide by nucleotide, into a sequence that, instead of encoding for something living, can provide the instructions for assembling the video from a court trial, say, or a cute cat photo.
In fact, the UW research team did just that, coding the photograph of a kitten featured at the top of this story into DNA, and then extracting it again.
Physicist Mikhail Neiman first proposed the idea of storing digital information in the form of DNA in 1965, but it wasn’t until 1999 that researchers were finally able to do it (scientists encoded the message, “the enemy’s masterpiece of espionage,” a reference to a means of concealing messages developed by the Germans in WWII). Since then, several other teams have stored digital files in DNA and read them back by sequencing, mapping the 0s and 1s of binary data onto the four nucleotides.
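To make the idea of turning 0s and 1s into nucleotides concrete, here is a minimal Python sketch. The two-bits-per-nucleotide table is an assumption chosen for simplicity; it is not the scheme used in the UW study, and working systems add safeguards (such as error correction) that are left out here. The names `encode` and `decode` are illustrative only.

```python
# Illustrative mapping only: two bits per nucleotide.
# This table is an assumption for the sketch, not the UW team's encoding.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of A/C/G/T, four nucleotides per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a string of A/C/G/T back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"cat")
    print(strand)          # CGATCGACCTCA
    print(decode(strand))  # b'cat'
```

Under this toy mapping, each byte becomes four nucleotides, so the three-letter word "cat" becomes a 12-nucleotide strand.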
The new UW study improves upon the current technology and outlines a scheme for an entire DNA-based archival storage system. As part of the study, the researchers successfully coded four images into DNA, and then read, or sequenced, the DNA, re-creating the images, including one of a cat. (“Computer science often uses images of cats for research,” study co-author Luis Ceze told Crosscut.)
The archival system outlined by the new study includes a DNA synthesizer to encode the data that is to be stored in DNA, a storage container, and a DNA sequencer that converts the code back into digital data. One of the most notable things about the study, however, is that it describes a way to allow for “random access” to a particular piece of data that is encoded in a bigger pool.
“Let’s say there are a few things in your pool of DNA (a photo, some music) and you want to take just one thing out,” Seelig explained. By using PCR, the polymerase chain reaction, the researchers figured out how to home in on that particular photo by amplifying only the desired data.
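The selective amplification described here can be loosely imitated in software. The Python sketch below is a simplified model under stated assumptions, not the researchers' protocol: each stored strand is assumed to begin with a short tag identifying its file, the tag sequences in `PRIMERS` are made up for illustration, and each PCR cycle is assumed to double the number of copies of every matching strand. Real PCR relies on primers binding at both ends of a strand, which is not modeled.

```python
from collections import Counter

# Hypothetical tag sequences, invented for this sketch.
PRIMERS = {"photo": "ACGTAC", "music": "TGCATG"}


def amplify(pool: Counter, primer: str, cycles: int) -> Counter:
    """Double the copy count of every strand that starts with `primer`,
    once per cycle. Strands with other tags are left unchanged."""
    amplified = Counter(pool)
    for strand, copies in pool.items():
        if strand.startswith(primer):
            amplified[strand] = copies * 2 ** cycles
    return amplified


def fraction_matching(pool: Counter, primer: str) -> float:
    """Share of all molecules in the pool that carry the given tag."""
    total = sum(pool.values())
    matching = sum(n for strand, n in pool.items() if strand.startswith(primer))
    return matching / total


if __name__ == "__main__":
    # One copy each of two "photo" strands and two "music" strands.
    pool = Counter({
        PRIMERS["photo"] + "CGATCGAC": 1,
        PRIMERS["photo"] + "CTCAGGTA": 1,
        PRIMERS["music"] + "GGTTCCAA": 1,
        PRIMERS["music"] + "TTAACCGG": 1,
    })
    print(fraction_matching(pool, PRIMERS["photo"]))  # 0.5
    boosted = amplify(pool, PRIMERS["photo"], cycles=10)
    print(fraction_matching(boosted, PRIMERS["photo"]) > 0.99)  # True
```

In this model the photo strands start as half of the pool and make up nearly all of it after ten cycles, so a sequencer sampling the pool would read almost nothing but the photo.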
Still, we’re not talking about replacing your everyday flash drive here. Reading the DNA strands takes at least a day or two. “It would be good for storing anything you want to keep for a long time, but not read all the time,” Seelig said.
For example, the researchers also encoded and retrieved data that includes interviews from the Rwandan war crime tribunal. It is mandated that these videos be maintained, but they will likely be looked at only on rare occasions when requested by a lawyer or other personnel, making them a prime case for this long-term storage method.
There is a long way to go until storing digital information in the form of DNA really takes off, however: It is still a costly, time-intensive process. But Seelig is hopeful that advances will be made quickly, as the science also directly benefits the biotechnology industry.
For example, even the rate at which we’re deciphering biological DNA is adding to the problem of data storage: More human genomes are being sequenced on a daily basis than researchers have been able to store.
The solution to that problem may be found surprisingly close to the problem itself.