Compute-intensive requirements of fMRI research

Background:

Laura is a professor in the Psychology department at California Polytechnic State University. Her research investigates how the various senses (vision, touch, hearing, etc.) work together and separately to produce an understanding of the world around us, focusing on how we perceive and remember the objects in our environment as well as their spatial relationships. Her work supports the view that vision is typically the sense we rely on most for these cognitive functions, since it provides the fastest and richest processing of an object or scene. To conduct this research, Laura places subjects in a functional MRI machine and has them focus on different objects while taking measurements to determine which parts of the brain are active during perception. The result is a set of raw fMRI files that must be processed to build a three-dimensional map of the brain illustrating this activity.

Problem:


Laura needs compute resources to process the resulting fMRI files.  She is able to process this data with open source software on a high-powered iMac desktop, but she runs the process manually.  Her current process has the following constraints:

  • Each subject can take anywhere between 12 and 90 hours to process, depending on the data
  • Files are processed serially so as not to impose too much memory and compute burden on the same machine she uses for daily work
  • The process is prone to errors, and when a subject fails it has to be reprocessed


In the end, processing all 30 subjects, combined with her other work activity, took six months to complete.

Solution:

Laura engaged the Cloud Computing Research Committee on campus to see if her research could benefit from cloud technology.  It turns out this is an ideal use case for cloud compute: the ability to leverage larger machines that can parallelize the problem is just what she needed.  Laura then worked with Cal Poly technology engineers to run the processing in the AWS cloud.  Building off the work done by Paul Wighton at Harvard University, we used AWS Batch, a customized Docker container, and Amazon's Elastic Container Registry (ECR) to install the open source software in a Docker container and configure the container to pull the raw data from an S3 bucket and store the post-processed files back on S3.

AWS Batch excels at running jobs like this on right-sized instances.  As the job queue grew, AWS Batch provisioned larger instances to process the Docker jobs accordingly, then automatically scaled the resources back down after the jobs completed.  Since we ran all jobs in parallel, we were able to process all 30 subjects within the time it took to run the longest subject.  The end result: a task that previously took six months was completed in just 90 hours, for $284.56 in AWS resource costs.
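
To make the scale-out concrete, the snippet below is a minimal sketch of how one Batch job per subject could be submitted with boto3.  The job queue, job definition, bucket names, and subject IDs are placeholders for illustration, not the actual names used in this project.

    # Sketch: submit one AWS Batch job per fMRI subject (all names are placeholders).
    import boto3

    batch = boto3.client("batch", region_name="us-west-2")

    JOB_QUEUE = "fmri-processing-queue"       # assumed Batch job queue
    JOB_DEFINITION = "freesurfer-recon-all"   # assumed job definition backed by the ECR image
    RAW_BUCKET = "example-fmri-raw"           # assumed S3 bucket holding raw fMRI data
    OUTPUT_BUCKET = "example-fmri-processed"  # assumed S3 bucket for post-processed output

    subject_ids = [f"subj{n:02d}" for n in range(1, 31)]  # all 30 subjects

    for subject in subject_ids:
        response = batch.submit_job(
            jobName=f"recon-{subject}",
            jobQueue=JOB_QUEUE,
            jobDefinition=JOB_DEFINITION,
            containerOverrides={
                # The container entrypoint reads these variables to know which
                # subject to pull from S3 and where to write the results.
                "environment": [
                    {"name": "SUBJECT_ID", "value": subject},
                    {"name": "RAW_BUCKET", "value": RAW_BUCKET},
                    {"name": "OUTPUT_BUCKET", "value": OUTPUT_BUCKET},
                ]
            },
        )
        print(f"Submitted {subject}: job id {response['jobId']}")

Because AWS Batch queues the jobs and scales the compute environment as needed, submitting all subjects at once is what lets the total wall-clock time collapse to that of the longest single subject.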

Future Improvements:

The FreeSurfer program that processes the raw fMRI data runs in a single thread.  A newer version of the program exists that supports both GPU cores and multiple threads.  In our experiment, we were unsuccessful on the first pass in getting either of these options to work.  Future iterations will focus on debugging those failed runs.
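
For reference, a multi-threaded run would look roughly like the call below, assuming a FreeSurfer build whose recon-all accepts the -openmp flag; the subject ID and thread count are placeholders.

    # Sketch: invoke recon-all with multiple OpenMP threads (assumes a FreeSurfer
    # build where the -openmp option is available inside the container).
    import subprocess

    subject = "subj01"  # placeholder subject ID
    threads = "4"       # number of OpenMP threads to request

    subprocess.run(
        ["recon-all", "-s", subject, "-all", "-openmp", threads],
        check=True,  # raise an error if FreeSurfer exits non-zero
    )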

Reusable Architecture:

This framework could easily be adapted to any workload that is compute intensive and can run inside a Docker container.  The only work involved is configuring the Docker container for your specific use case and pushing that customized container to AWS ECR.
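
As a sketch of what that adaptation looks like, the snippet below registers a Batch job definition that points at a customized image in ECR; the account ID, repository name, command, and resource sizes are placeholders you would change for your workload.

    # Sketch: register an AWS Batch job definition backed by a customized ECR image
    # (account ID, repository, command, and sizing below are placeholders).
    import boto3

    batch = boto3.client("batch", region_name="us-west-2")

    response = batch.register_job_definition(
        jobDefinitionName="my-compute-intensive-workload",  # hypothetical name
        type="container",
        containerProperties={
            # URI of the customized container image pushed to ECR
            "image": "123456789012.dkr.ecr.us-west-2.amazonaws.com/my-workload:latest",
            "command": ["python", "run_job.py"],            # your container's entrypoint
            "resourceRequirements": [
                {"type": "VCPU", "value": "8"},
                {"type": "MEMORY", "value": "32768"},       # MiB
            ],
        },
        retryStrategy={"attempts": 2},  # re-run a job once if it fails
    )
    print("Registered:", response["jobDefinitionArn"])

From there, jobs for the new workload are submitted against this job definition in the same way as the earlier example.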

Questions?

We are happy to share this solution with others who are interested.

Contact: Darren Kraker dkraker@calpoly.edu
