A New Design for Managing the Deluge of Research Data

If there’s one thing a top-10 public research university has in droves, it’s data.

As University of Minnesota researchers continue to push the boundaries of knowledge in their fields, they’re producing more and more data—several hundred terabytes a month, fueled in part by data-intensive practices like DNA sequencing, high-resolution imaging, and supercomputing. Data storage needs at the U were forecasted to more than double from 2016 to 2019.

In response to the looming need for data storage, a team of data experts from across colleges and units has been working to redesign the University’s approach to data management and ensure the U can keep up with the increasing demand. The recently formed University Storage Council (USC) aims to better coordinate and allocate data to improve the experience for researchers seeking reliable storage and make data storage more efficient and cost-effective. Experts on the council come from the Academic Health Center, many colleges, the Office of Information Technology (OIT), the Office of the Vice President for Research, the system campuses, and University Libraries.

Prior to the USC, there was no University-wide, big-picture approach to continually reviewing data storage needs and planning for future demand. Claudia Neuhauser, Ph.D., associate vice president for research and director of Research Computing, said the USC will help the U’s storage approach become more sustainable and collaborative.

“The need for data storage is increasing exponentially,” said Neuhauser, who helped lead the committee that assessed the U’s data storage needs and recommended the creation of the USC. “We have to help people manage their data. Creating a uniformly good experience for researchers—that’s really what the goal is.”

How much data does the University currently store? More than 45 petabytes in total, Neuhauser said. To put that number in perspective, consider that the average smartphone holds just 32 gigabytes. You would need 1.4 million smartphones to house what the U already stores. Stacked like pancakes, that many phones would tower about 35,000 feet in the air—cruising altitude for a commercial airliner.
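For readers who want to check the comparison, the arithmetic can be sketched in a few lines of Python. This is only a rough illustration: it assumes decimal units (1 petabyte = 10^15 bytes) and a phone thickness of about 7.6 millimeters, neither of which comes from the University, and the function names are made up for the example.

```python
# Rough check of the smartphone comparison.
# Assumptions (not from the article): decimal units and a 7.6 mm thick phone.
PETABYTE = 10**15
GIGABYTE = 10**9
MM_PER_FOOT = 304.8


def phones_needed(total_bytes: int, phone_bytes: int) -> int:
    """Number of phones required to hold total_bytes (rounded up)."""
    return -(-total_bytes // phone_bytes)  # ceiling division


def stack_height_feet(phone_count: int, thickness_mm: float) -> float:
    """Height in feet of phone_count phones stacked flat."""
    return phone_count * thickness_mm / MM_PER_FOOT


phones = phones_needed(45 * PETABYTE, 32 * GIGABYTE)
height = stack_height_feet(phones, 7.6)
print(f"{phones:,} phones, roughly {height:,.0f} feet tall")
```

Under those assumptions the count comes to about 1.4 million phones and the stack to roughly 35,000 feet, in line with the figures above. A thicker or thinner phone would shift the height proportionally.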

The Right Fit for the Data

A key part of the storage redesign focuses on how to allocate data through the different storage providers and formats available at the U. Some projects need access to specific types of storage, such as high-performance computing or high-security storage, but these options are limited and costly. More expansive alternatives, like cloud storage through Google Drive, can provide the right fit for many researchers’ purposes while clearing up more specialized storage for those who need it.

These efficiencies can also help the USC assess where the greatest storage needs are and plan for upgrades and hardware replacement, said Michael Langhus, storage and data protection service owner in OIT and USC co-chair.

“Traditionally, data stored at the University has been managed across multiple colleges, departments, and units,” Langhus said. “By having all the storage providers working in a more coordinated and unified manner, not only will we be able to better address gaps in our current services, but we will also reduce costs by eliminating overlaps.”

To improve storage efficiency without making things more complicated for researchers, the USC will launch the Storage Champion Program. The program will assign data storage experts, or “storage champions,” to each college and unit to find the best storage option for a given research project. Researchers may not know where to house their data, or may be familiar with one storage option when another would better fit their project. Soon, researchers will be able to describe their data to a storage champion and let them take care of the rest.

“Researchers want to do their research, not worry about data storage,” said Lisa Johnston, research data management/curation lead and codirector of the University Digital Conservancy in University Libraries, as well as a USC co-chair. “Our mission is to do a better job when helping researchers gain access to the best storage option to fit their particular need, and to make that happen as seamlessly as possible.”

Over the next 12 months, the University Storage Council will roll out the new storage approach in phases, including the launch of a web portal researchers can use to connect with their college or unit’s storage champion. The USC plans to communicate the changes to faculty and staff as these services become available.

Quick Facts: University Data Storage

  • The U currently stores an estimated 45 petabytes of data
  • Cloud storage is growing by 30 terabytes a month, with Google Drive alone holding more than 116 million files
  • High-performance computing storage currently holds 25 percent of all research data
  • The three largest storage providers at the U are OIT, the Minnesota Supercomputing Institute, and the St. Anthony Falls Lab/Polar Geospatial Center