Who needs pen drives and hard disks when you can just store all the data in the world in your DNA?

By Akshaiyaa V S

Technology has evolved rapidly over the past decade, and it is hard to keep up with its pace. As technology advances, more data is being produced, and storing it has become the main issue to address. After all, what is the use of producing so much data if we cannot store it for future use?

DNA for data storage

As bizarre as it may sound, storing data in DNA is becoming a trend. DNA is naturally equipped to store genetic information efficiently. For Nick Goldman, who introduced this concept, the ‘Eureka’ moment came at a hotel in Hamburg, Germany, where he was talking with some of his bioinformaticist peers about how they could afford to store reams of genome sequences.

Digital data is being produced at an unstoppable rate, and storing it with conventional computing technology is frustrating, given its limitations and rising costs. Nick says, “That’s when we thought, ‘What’s to stop us using DNA to store information?’”

DNA is much denser than modern storage media such as hard disks: a small amount of DNA can hold more data than thousands of hard disks. It is also more durable. Stored in the right environment, it can last for decades or even centuries, unlike hard drives and flash drives, which sometimes crash without any warning. Magnetic tapes and DVDs may survive for a long time, but DNA is close to immortal! This reminds me of the movie “A.I. Artificial Intelligence”, in which, far in the future, a robot is found fully intact after centuries have passed and is brought back to life to answer questions.

Being ultracompact makes DNA easy to store, and its error rate is said to be as low as one in a billion. We also know that hard drives and DVDs are becoming more obsolete by the day. With DNA, becoming obsolete is out of the question. With so many advantages, it is high time we started using it for data storage on a much larger scale.

Within just two years of research, Nick and his colleague Ewan Birney had successfully encoded five files, including Shakespeare’s sonnets and part of Martin Luther King’s famous “I Have a Dream” speech. Together these came to about 739 kilobytes. In July 2016, Microsoft successfully encoded 200 megabytes of information, including high-definition video, text, and audio.

It has been estimated that the world’s digital archive will reach a whopping 44 trillion gigabytes by the year 2020. If ongoing research enables data to be packed into DNA as densely as it is in the genes of the bacterium E. coli, the whole world’s data could be stored in about a kilogram of DNA. One huge drawback is that storing data in DNA is currently quite expensive and time-consuming. If we find ways to work around the drawbacks and build on the advantages, it would be a revolution!
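To get a feel for where such density figures come from, here is a rough back-of-envelope calculation. It is only a sketch: it assumes an idealised 2 bits per nucleotide and an average nucleotide mass of about 330 grams per mole, and it ignores redundancy, addressing, and the many copies of each strand that a real system would need.

```python
# Back-of-envelope estimate; every constant below is an assumption for illustration.
AVOGADRO = 6.022e23                # molecules per mole
GRAMS_PER_MOL_NUCLEOTIDE = 330.0   # assumed rough average mass of one nucleotide
BITS_PER_NUCLEOTIDE = 2            # four bases -> log2(4) = 2 bits, no redundancy


def bytes_per_gram() -> float:
    """Idealised number of bytes that one gram of single-stranded DNA could hold."""
    nucleotides_per_gram = AVOGADRO / GRAMS_PER_MOL_NUCLEOTIDE
    return nucleotides_per_gram * BITS_PER_NUCLEOTIDE / 8


def grams_needed(total_bytes: float) -> float:
    """Idealised mass of DNA needed to hold total_bytes."""
    return total_bytes / bytes_per_gram()


archive_bytes = 44e12 * 1e9  # 44 trillion gigabytes, the estimate quoted above
print(f"{bytes_per_gram():.2e} bytes per gram")
print(f"{grams_needed(archive_bytes):.0f} g for the whole archive")
```

Under these idealised assumptions the answer comes out at roughly a hundred grams. Practical schemes add error correction, labels, and multiple copies, so real-world figures would be considerably higher.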

Data storage

Conventional storage devices such as hard drives and DVDs store binary values by changing the magnetic, electrical, or optical properties of a material. Storing data in DNA is not very different. As we all learnt in high school, DNA molecules are made up of smaller molecules called nucleotides, which come in four types: adenine (A), cytosine (C), thymine (T), and guanine (G). Just as electronic media use binary values (0s and 1s), DNA uses sequences of nucleotides to store data.
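The simplest way to picture this is to let each nucleotide stand for two bits. The sketch below does exactly that. The particular assignment (00 to A, 01 to C, 10 to G, 11 to T) is an arbitrary choice made for illustration, not the mapping used by any specific research group.

```python
# Illustrative mapping only: any one-to-one assignment of bit pairs to bases would work.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of A/C/G/T, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(sequence: str) -> bytes:
    """Turn a string of A/C/G/T back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in sequence)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)           # CAGACGGC
print(decode(strand))   # b'Hi'
```

Each byte becomes four bases, so the two-character text "Hi" turns into an eight-base sequence.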

The general idea is to assign bit patterns to the DNA nucleotides. The most commonly used encoding technique is Huffman coding, which converts the data into compact binary codes. These binary codes are then converted into a sequence of A, T, G, and C. This process is called DNA encoding. The resulting code is then used to chemically produce strands of DNA, which brings an added advantage: many copies of each strand are made, so backup copies are created automatically. Finally, the strands are placed in cold storage. Cold, dry storage protects them from humidity and light, which might induce unwanted reactions.
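For readers curious about the Huffman step, here is a minimal sketch of textbook binary Huffman coding, which gives frequent symbols shorter codes. It is not the exact scheme of any DNA storage project (published schemes may use variants of the idea), and the function names are made up for this example.

```python
import heapq
from collections import Counter


def huffman_table(data: bytes) -> dict:
    """Build a prefix-free binary code: frequent bytes get shorter bit strings."""
    counts = Counter(data)
    if not counts:
        return {}
    if len(counts) == 1:  # a lone symbol still needs one bit
        return {next(iter(counts)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: code_so_far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees prepends one more bit to every code inside them.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def huffman_encode(data: bytes) -> tuple:
    """Return the encoded bit string and the code table needed to decode it."""
    table = huffman_table(data)
    return "".join(table[b] for b in data), table


def huffman_decode(bits: str, table: dict) -> bytes:
    """Walk the bit string, emitting a byte whenever a full code is matched."""
    reverse = {code: sym for sym, code in table.items()}
    out, current = bytearray(), ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return bytes(out)


bits, table = huffman_encode(b"abracadabra")
print(len(bits), "bits instead of", 8 * len(b"abracadabra"))
```

The resulting bit string could then be turned into A, C, G, and T by whatever bit-to-base mapping the scheme uses.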

One important point is that we need to use short strands of DNA, because chemically building a long strand is arduous. Using shorter strands and assigning keys or labels to them makes the data easy to access and lets the pieces be put back in the correct order. Note also that data cannot be rewritten in DNA, so if any changes are required, we have to create a new sequence.

Other useful encoding techniques include the comma code, in which the frames are separated by ‘commas’, the alternating code, and an improved Huffman coding scheme.
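The idea of short, labelled strands can be sketched as follows. The payload length of 20 bases and the 4-base index are assumptions picked to keep the example small; real systems use their own strand lengths and labelling schemes.

```python
BASES = "ACGT"


def split_into_strands(sequence: str, payload_len: int = 20, index_len: int = 4) -> list:
    """Cut a long base sequence into short strands, each prefixed with a base-4 index."""
    strands = []
    for n, start in enumerate(range(0, len(sequence), payload_len)):
        if n >= 4 ** index_len:
            raise ValueError("sequence too long for the chosen index length")
        # Write the chunk number in base 4, most significant digit first.
        index = "".join(BASES[(n >> (2 * k)) & 3] for k in reversed(range(index_len)))
        strands.append(index + sequence[start:start + payload_len])
    return strands


def reassemble(strands: list, index_len: int = 4) -> str:
    """Sort strands by their index prefix and join the payloads."""
    def index_of(strand: str) -> int:
        value = 0
        for base in strand[:index_len]:
            value = value * 4 + BASES.index(base)
        return value

    return "".join(s[index_len:] for s in sorted(strands, key=index_of))


long_sequence = "ACGT" * 13                      # 52 bases
strands = split_into_strands(long_sequence)      # 3 short strands
shuffled = list(reversed(strands))               # order is lost in the test tube
print(reassemble(shuffled) == long_sequence)     # True
```

Because every strand carries its own label, the original order can be recovered even though the strands float around unordered in storage.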

Data retrieval

Whenever data needs to be retrieved, the strands are taken out of cold storage. They are read using a standard DNA sequencing machine, and the resulting sequence is converted back into binary. Primers for the polymerase chain reaction (PCR) are used to amplify the DNA snippets belonging to a particular file, though doing this for just one part of a file is still a challenge. Even if the backup copies are lost, more copies can be made by replicating the DNA, just as nature does all the time. Files can also be read in parallel, so multiple files can be read at the same time.

Random access is the key to using DNA as a practical memory, but until recently it had been demonstrated only on about 0.15 megabytes of data. A research team from Microsoft has since improved on this. According to their report, random access is now possible for 400 megabytes of data encoded in DNA, with no errors.
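As a toy picture of primer-based access, imagine every strand of a file starting with that file's primer sequence. The sketch below simply filters a pool of strands by prefix. Real PCR amplifies matching strands chemically, typically using primers at both ends, rather than filtering text; the primers and payloads here are invented for illustration.

```python
def select_file(pool: list, primer: str) -> list:
    """Mimic primer-based selection: keep strands that begin with the file's primer."""
    return [strand[len(primer):] for strand in pool if strand.startswith(primer)]


# Hypothetical primers and payloads, chosen only for this example.
PRIMER_A = "ACGTAC"
PRIMER_B = "TGCATG"
pool = [
    PRIMER_A + "AAAACAGA",
    PRIMER_B + "AAAAGGGG",
    PRIMER_A + "AAACCGGC",
]

print(select_file(pool, PRIMER_A))  # ['AAAACAGA', 'AAACCGGC']
```

Only the strands tagged for the requested file come back, which is the essence of reading one file without sequencing everything in the tube.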

Potential risks

To see whether DNA could be used to exploit a security vulnerability, researchers encoded a well-known and well-understood piece of malware into a DNA strand. With the malware’s help, the security experts were able to break into and exploit the computer that processed the sequenced DNA, and they published their results. In essence, the easiest route of attack is to hide malware in a DNA sample that is sent to a sequencing lab to be read. When asked about the possible consequences, the security experts said that such attacks are not easy to pull off in practice. They also said that research is under way to keep these “potential” threats from becoming “actual” ones.

Future scope

DNA storage is still in the experimental phase. The main challenges include making it completely automated and improving the way data is read. As of now, we can write only a few hundred bytes per second, in contrast with a hard drive, which can write hundreds of millions of bytes per second. Even a simple task such as saving a text file takes several hours with DNA, whereas modern devices do it in a fraction of a second. Given current progress and falling costs, we may see a fully functional DNA storage system by 2030.
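To put that speed gap in perspective, here is a quick calculation. The rates are assumptions picked from within the ranges quoted above (400 bytes per second for DNA, 200 million bytes per second for a hard drive), and the file size is the 200 megabytes that Microsoft encoded.

```python
def write_time_seconds(size_bytes: float, bytes_per_second: float) -> float:
    """Time needed to write size_bytes at a constant rate."""
    return size_bytes / bytes_per_second


FILE_SIZE = 200e6   # 200 megabytes
DNA_RATE = 400.0    # assumed: "a few hundred bytes per second"
DISK_RATE = 200e6   # assumed: "hundreds of millions of bytes per second"

dna_seconds = write_time_seconds(FILE_SIZE, DNA_RATE)
disk_seconds = write_time_seconds(FILE_SIZE, DISK_RATE)
print(f"DNA:  {dna_seconds / 86400:.1f} days")
print(f"Disk: {disk_seconds:.1f} seconds")
```

At these assumed rates, a file the hard drive writes in about a second would take DNA several days.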
