In late 2015, a Silicon Valley-based DNA sequencing company partnered with MediaAgility to develop a robust genetic data analysis and discovery platform. The company is dedicated to simplifying genetic testing and making it affordable. Its aim is to make genetic information globally available to help blood banks, hospitals, commercial laboratories, government agencies, and alternate-care testing sites.
In early 2016, MediaAgility delivered a cloud-based genetic data analysis and discovery platform on Amazon Web Services (AWS). The solution stores the analytical data derived from scientists' experiments in an easily accessible, well-organized form, and provides smart filters to analyze data and monitor the progress of experiments through the various stages of analysis.
DNA sequencing of a single human genome can produce several hundred gigabytes of data, so the company's experimental data grew to petabytes within months. Within six months of implementation, the platform's rapidly rising storage costs alarmed the customer.
Their entire technical team was struggling with the rapidly growing data and its associated costs. They approached the cloud experts at MediaAgility and expressed their concerns, asking for an optimized, cost-effective DNA sequencing process. Their storage bill had reached almost $30,000 per month in May and climbed to nearly $50,000 per month by June 2016.
Here’s how MediaAgility curbed their cloud storage costs…
Costs were growing exponentially. To bring them under control, the data analytics team at MediaAgility started talking to the scientific groups and researchers at the customer's organization. The team found that raw experimental data was the only dataset category that was rarely accessed once research and annotation were complete, yet it was consuming massive amounts of storage.
The MediaAgility team devised a robust data storage and processing mechanism in which the system asked scientists to mark or flag the data they wanted to save or archive. Using Object Lifecycle Management configurations, the team downgraded the storage class of experiment data that had not been accessed in the last 365 days to Coldline Storage, and moved experiment data accessed between once a month and a few times a year to Nearline Storage. The mechanism helped the customer optimize their data pipeline and saved them approximately $80,000 on the storage front.
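The lifecycle rules described above can be sketched as a Google Cloud Storage bucket lifecycle configuration like the one below. Note this is an illustrative sketch, not the customer's actual configuration: GCS lifecycle conditions are based on object age rather than last access, so age is used here as a proxy, and the 30-day Nearline threshold is an assumption (the text describes access frequency, not an exact age).

```json
{
  "rule": [
    {
      "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
      "condition": {"age": 30}
    },
    {
      "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
      "condition": {"age": 365}
    }
  ]
}
```

Saved as `lifecycle.json`, a configuration like this can be applied with `gsutil lifecycle set lifecycle.json gs://EXPERIMENT_DATA_BUCKET`, where the bucket name is hypothetical.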
What followed? Well, a comparison.
We did not stop at curbing storage costs. We decided to do a quick comparison of Google Cloud Platform and Amazon Web Services to ensure our solution ran on the most suitable computing platform.
Here’s a quick dissection of the story, or rather, our observations when we compared the two major computing platforms for this particular solution: