Forbes recently published an article about MD Anderson’s $62M misfire: its project to use IBM Watson, which was designed to integrate the knowledge of MD Anderson’s clinicians and researchers so that patients could be treated with the most effective, safe and evidence-based standard of care available. Creating machine learning models is only a small part of the overall process of data analysis or data science. In the zeal to incorporate AI into healthcare and other industries, several lessons stand out to me from our work with clients who want to leverage data science.
1. Understand the subject matter, previous knowledge and goals. Many companies are undecided about their targets and the problem they want to solve. Ambiguous goals lead to ambiguous results. When the goals are unclear, much time is wasted and projects fall into a trial-and-error approach, with more things to try than you can possibly implement. In the case of MD Anderson, the original target switched from leukemia to lung cancer, and a change in medical record software left Watson unable to work with the data.
2. Data integration, selection, cleaning and pre-processing are critical and often time-consuming. It is important to use high-quality data. The more data you have, the more potential there is for problems, because data is inherently dirty. You need domain expertise to deal with data gaps, duplicates, incorrect formatting and a host of other issues. Since data science allows you to take in many types of data, companies need to clean up data that they may not have paid close attention to in the past. Data scientists commonly say they spend as much as 80% of their time prepping the data just to begin doing data science. Having knowledgeable experts, on your staff or from outside sources, who understand how to prep the data for machine learning is critical, and it is a step most companies overlook. The cleaner and better prepared the data is for machine learning, the more time data scientists can actually spend creating the algorithms.
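As a rough illustration of the clean-up work described in tip 2, here is a minimal Python sketch that handles the three problems named there: duplicates, inconsistent formatting and gaps. The records, the field names (`patient_id`, `visit_date`, `weight_kg`) and the two accepted date formats are all invented for this example. Real data will need rules that come from people who know the domain.

```python
from datetime import datetime

# Hypothetical raw records; every field name and value is invented for illustration.
raw_records = [
    {"patient_id": "A1", "visit_date": "2017-01-05", "weight_kg": "70.5"},
    {"patient_id": "A1", "visit_date": "2017-01-05", "weight_kg": "70.5"},  # exact duplicate
    {"patient_id": "B2", "visit_date": "01/07/2017", "weight_kg": ""},      # other date format, missing weight
    {"patient_id": "C3", "visit_date": "not recorded", "weight_kg": "82"},  # unusable date
]

# Assumption: dates arrive either as ISO (YYYY-MM-DD) or as US-style MM/DD/YYYY.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(text):
    """Return the date as an ISO string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def parse_float(text):
    """Return a float, or None so that a gap stays visible instead of becoming 0."""
    try:
        return float(text)
    except ValueError:
        return None


def clean(records):
    """Drop exact duplicates, normalize formats, and set aside unusable rows."""
    seen = set()
    cleaned, rejected = [], []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key in seen:
            continue  # skip a row we have already processed
        seen.add(key)
        visit_date = parse_date(rec["visit_date"])
        if visit_date is None:
            rejected.append(rec)  # keep for review by someone who knows the data
            continue
        cleaned.append({
            "patient_id": rec["patient_id"],
            "visit_date": visit_date,
            "weight_kg": parse_float(rec["weight_kg"]),
        })
    return cleaned, rejected


cleaned, rejected = clean(raw_records)
print(len(cleaned), "usable rows,", len(rejected), "set aside for review")
```

Even this toy version forces a judgment call: is `01/07/2017` January 7 or July 1? The code cannot answer that. Someone who understands where the data came from has to, which is why the prep work takes both time and domain expertise.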
3. Data mining techniques come in two forms: supervised and unsupervised. Both can find hidden patterns in data sets. Supervised techniques are used when you have a specific target value you would like to predict. The correct outputs (labels) are known for the training data and are used to train the model to produce the desired results. In unsupervised learning no labeled outputs are provided; instead, the data is grouped into clusters based on similarity. With unsupervised learning the target is not known, but the results can be compelling. Supervised learning is the most mature and most studied approach, and it is the type of learning used in most machine learning applications. Learning with supervision is much easier than learning without it. Understanding these different approaches helps companies set the proper expectations and goals around results.
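To make the distinction in tip 3 concrete, here is a small Python sketch that runs both approaches on the same made-up points. The supervised half is a simple nearest-centroid classifier and the unsupervised half is a bare-bones k-means. Both were chosen only because they are short enough to read, not because they are what any particular project should use. The data, the labels `"low"` and `"high"`, and the function names are all hypothetical.

```python
from math import dist  # Euclidean distance, available in Python 3.8+

# Made-up 2-D measurements that fall into two obvious groups.
points = [(1.0, 1.2), (5.0, 5.2), (0.8, 1.0), (5.3, 4.9), (1.1, 0.9), (4.8, 5.1)]
# Known answers for each point. Only the supervised approach gets to see these.
labels = ["low", "high", "low", "high", "low", "high"]


def mean_point(pts):
    """Average a list of points coordinate by coordinate."""
    return tuple(sum(coords) / len(pts) for coords in zip(*pts))


def fit_centroids(pts, ys):
    """Supervised: use the known labels to compute one centroid per label."""
    groups = {}
    for p, y in zip(pts, ys):
        groups.setdefault(y, []).append(p)
    return {y: mean_point(members) for y, members in groups.items()}


def predict(centroids, point):
    """Predict the label whose centroid is closest to the new point."""
    return min(centroids, key=lambda y: dist(centroids[y], point))


def kmeans(pts, k, iterations=10):
    """Unsupervised: no labels; group points purely by distance to k centers."""
    centers = list(pts[:k])  # naive start: the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda i: dist(centers[i], p))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [mean_point(c) if c else centers[i] for i, c in enumerate(clusters)]
    return [min(range(k), key=lambda i: dist(centers[i], p)) for p in pts]


centroids = fit_centroids(points, labels)
print(predict(centroids, (0.9, 1.1)))   # answers in the business's own terms
assignments = kmeans(points, k=2)
print(assignments)                      # only anonymous cluster numbers
```

The supervised model answers with a label the business already understands. The unsupervised one returns only cluster numbers, and someone still has to work out what each group means and whether it is useful. That difference is a large part of why expectations should be set differently for the two approaches.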
4. Consolidating results and taking action on the results are two different things. Many companies do not take action on their results, and much of the machine learning remains in the lab. Companies often have vigorous debate about the findings from data science and what to do with them. Operationalizing findings, building the leadership skills to understand results, and deciding how best to take action are evolving processes for many companies. Data science sometimes poses new questions, and truly strategic thinking is required from leadership teams to understand how to change operations and business processes to take advantage of the information in a meaningful way.
5. Start slow and pick a smaller project. Many companies want to jump on the data science bandwagon, but the technology is new and evolving. Educating your organization about data science, the types of data you want to evaluate and the amount of time needed to prep the data, and creating feedback loops between the data science findings and the business stakeholders, are much less disruptive when you give the entire organization time to learn with a smaller project.
Data science, and machine learning in particular, holds a lot of promise for all industries, including healthcare. Artificial intelligence is no magic bullet, but we want it to be. MD Anderson indicated the project did achieve operational objectives; however, largely missing from the coverage of this project was any evidence that the technology improved patient outcomes, lowered costs, or provided some other benefit. I hope these tips help your organization avoid a costly data science misfire.