Q&A: In a flash: How to build an enterprise imaging AI infrastructure

Building the infrastructure to support the accelerating adoption of AI in healthcare is the mission of Pure Storage and its FlashBlade technology, an all-flash scale-out object-based solution that can expand to petabytes of capacity. As Esteban Rubens says, infrastructure to power AI, machine learning and deep learning needs to be effortless, efficient and evergreen to ensure success today and into the future. Here’s how. 

Why is flash storage so important in imaging and research?

Esteban Rubens: If you look around, flash storage is everywhere. Flash is where storage is going, so that’s where the research and innovation are happening. Hard drives were great in the ’90s and the ’00s, but they have since plateaued in terms of bit density and performance. The obvious next step for storage is flash. Performance and low latency are key in medical imaging and in research, and you get both from flash.

We see healthcare facilities using flash these days because the access patterns required for AI research in healthcare are massively parallel. In the same way that they need massively parallel compute, which is what GPUs give you, they need massively parallel storage. The way to feed those GPU pipelines is with something that looks like a GPU, but on the storage side, and that’s what FlashBlade is. It can get very busy, but it doesn’t get slower.
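As a rough illustration of what a parallel access pattern looks like from the client side, here is a minimal Python sketch that issues many reads at once rather than one after another. It is not FlashBlade-specific and says nothing about how the array works internally; the function names, the worker count and the demo files are hypothetical and chosen only for illustration.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_file(path):
    """Read one file fully; stands in for loading one image or study."""
    return Path(path).read_bytes()


def read_all(paths, workers=8):
    """Issue many reads at once and return the contents in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves the order of `paths` even though the reads overlap.
        return list(pool.map(read_file, paths))


if __name__ == "__main__":
    # Hypothetical demo data: 16 small files in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i in range(16):
            p = Path(tmp) / f"sample_{i}.bin"
            p.write_bytes(bytes([i]) * 1024)
            paths.append(p)
        blobs = read_all(paths)
        print(len(blobs), "files read,", sum(len(b) for b in blobs), "bytes")
```

With storage that serves requests serially, adding workers mostly adds queueing. The point made above is that storage built for parallelism can absorb many such readers at once.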

What impact will AI and machine learning have on storage?

We see artificial intelligence changing the way healthcare facilities look at infrastructure. That’s because the requirements of AI-driven healthcare IT are totally different from those of “traditional” applications. It all goes back to parallelism, and that is what really sets FlashBlade apart.

Always-on data reduction is important too because it doesn’t make sense to waste capacity storing things that can be deduplicated or compressed. In a healthcare AI world in which we’re having to store so much data to train models, like image and genomics data, you have to be smart about how you consume your storage.
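To show what deduplication and compression mean in practice, here is a small Python sketch. It is only an illustration under stated assumptions, not a description of how FlashBlade reduces data: the fixed 4 KiB block size, the SHA-256 fingerprints, the use of zlib and all function names are hypothetical choices made for this example.

```python
import hashlib
import zlib


def reduce_blocks(data, block_size=4096):
    """Split data into fixed-size blocks and keep each unique block once, compressed."""
    store = {}    # fingerprint -> compressed block
    recipe = []   # ordered fingerprints needed to rebuild the data
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:
            # Deduplication: a block already seen is not stored again.
            store[digest] = zlib.compress(block)
        recipe.append(digest)
    return store, recipe


def restore(store, recipe):
    """Rebuild the original bytes from the block store and the recipe."""
    return b"".join(zlib.decompress(store[d]) for d in recipe)


def reduction_ratio(data, store):
    """Logical bytes divided by stored bytes (recipe overhead is ignored)."""
    stored = sum(len(c) for c in store.values())
    return len(data) / stored


if __name__ == "__main__":
    # Hypothetical, highly repetitive data; real images will reduce far less.
    data = (b"A" * 4096 + b"B" * 4096) * 8
    store, recipe = reduce_blocks(data)
    assert restore(store, recipe) == data
    print(f"{len(recipe)} blocks, {len(store)} unique, "
          f"ratio {reduction_ratio(data, store):.0f}:1")
```

How much capacity this saves depends entirely on the data. Imaging formats that are already compressed leave less room for reduction than repetitive or uncompressed data.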

What are healthcare systems looking for in infrastructure for today and the future?

Progressive health systems want the flexibility to deploy infrastructure today knowing that it will still be useful in the new world of AI. For instance, we support the S3 protocol in FlashBlade for local storage. S3 is becoming a de facto standard, and it’s what modern developers want. It also offers investment protection and flexibility because it runs alongside traditional file protocols such as NFS and SMB.

Healthcare organizations often find themselves being asked to do more with less. How does that impact data storage?

Simplicity matters here too. IT management doesn’t want to dedicate full-time employees to tuning storage the way they have in the past. We make sure the storage devices automatically optimize themselves. It just happens, and then IT folks can take care of the important things, such as making sure that critical applications are running and clinicians can get the data they need at the time they need it. With Pure, our subscription to innovation guarantees that you are always running on the latest and greatest we offer.

Why is GPU compute important to AI?

Health systems are going to use GPU compute for AI. It’s clear they want to keep those environments busy to avoid GPU starvation, which happens when GPUs sit idle waiting for data. That is suboptimal in terms of productivity and ROI. That’s why we partnered with Nvidia on AIRI. Nvidia recognizes that there aren’t many storage platforms that can keep its GPUs busy. They wanted to be able to offer an all-in-one converged product that combines their GPU-based supercomputer, the DGX, with FlashBlade, plus the appropriate 100-gigabit-per-second networking and curated software. It’s a turnkey solution for AI.
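The arithmetic behind GPU starvation can be sketched in a few lines of Python. This is a back-of-the-envelope model, not a benchmark: it assumes training is limited only by how fast storage delivers data, and every number in it (GPU count, per-GPU demand, storage throughput) is hypothetical rather than a measured AIRI, DGX or FlashBlade figure.

```python
def gpu_busy_fraction(storage_gb_per_s, gpu_count, gb_per_s_per_gpu):
    """Upper bound on the fraction of time the GPUs can stay busy.

    Assumes the only bottleneck is storage throughput; real pipelines also
    depend on networking, preprocessing and caching.
    """
    demand = gpu_count * gb_per_s_per_gpu
    if demand <= 0:
        raise ValueError("demand must be positive")
    return min(1.0, storage_gb_per_s / demand)


if __name__ == "__main__":
    # Hypothetical setup: 8 GPUs that each consume 1 GB/s of training data.
    for supply in (4.0, 8.0, 16.0):
        busy = gpu_busy_fraction(supply, gpu_count=8, gb_per_s_per_gpu=1.0)
        print(f"{supply:>5.1f} GB/s -> GPUs busy at most {busy:.0%}, "
              f"idle at least {1 - busy:.0%}")
```

Under these assumptions, storage that delivers half of what the GPUs can consume leaves them idle at least half the time, which is the productivity and ROI cost described above.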