It was years ago, Richard Gilder said, that he first met Clementine, one of the world’s most popular data-mining software tools. Its analytical capabilities intrigued him, but at $300,000 for one license for one person, it cost way too much.
About three years ago, the Baylor Health Care System biostatistician said, he recognized Clementine again, this time embedded in the code for IBM’s high-powered analytical and predictive program called Modeler.
At $50,000, the price was affordable. Paired with a $20,000 customized computer from Dell, the program is now a key piece of Texas-based Baylor’s new data-mining laboratory, part of what some are calling a big data revolution that is transforming American business, including health care.
Striving to save money and improve care, some providers are taking advantage of powerful and affordable computer hardware and software that can turn an avalanche of data into knowledge. In the past, executives say, the healthcare industry tended to be data-rich but information-poor.
Big data is also enabling ever more ambitious medical research. In a joint venture between Children’s Medical Center Dallas and the University of Texas-Southwestern Medical Center, Sean Morrison and his team are sequencing the genetic material in melanoma tumors, hoping to find a way to prevent the most deadly of the cancer cells from spreading.
It wasn’t long ago, Morrison said, that some thought such science was impossible because there were no computers big enough. “We are opening profound new opportunities,” he said.
UT-Southwestern is even establishing a new department of bioinformatics, aiming to maximize the impact of big data-related technology advances on biomedical research.
This is a critical time for the healthcare industry. Providers face significant cuts in reimbursements as the Affordable Care Act takes full effect. All are under intense pressure to raise quality and lower costs.
“Whether you survive or not is going to be far more objectively determined than it is today,” said Daniel Varga, the chief clinical officer for Arlington, Texas-based Texas Health Resources. Reputations and market share won’t mean as much as they have in the past. Survival will depend upon results. Using big data, providers hope to improve those results.
Already, because of an analysis by its data lab, Baylor is changing nurse staffing at some hospitals to improve patient satisfaction.
Texas Health, in partnership with Healthways, is combining clinical and insurance claims information to determine its high-risk patients of tomorrow and offer them customized interventions today, before serious problems occur.
Dallas-area Methodist Health System is analyzing accountable-care organization claims from 14,000 Medicare beneficiaries and 6,000 employees to predict who among them is most likely to need high-cost care in the future.
“We are really just beginning to unlock the potential of analytics,” said Melissa Gerdes, chief medical officer of outpatient services and accountable-care organization strategy at Methodist.
What is big data? The term can mean different things to different people.
In their recent book of that title, Viktor Mayer-Schönberger and Kenneth Cukier say that because of the improvements in high-performance digital technologies, researchers can use literally all the data available in their areas of interest rather than relying on smaller sample sets.
“More trumps less,” they write. As more and more data go into machine-learning algorithms, performance improves dramatically.
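The "more trumps less" effect can be sketched with a toy experiment in plain Python. The data and the nearest-neighbor "model" here are invented for illustration, not drawn from any hospital system mentioned in this article; the point is only that the same simple algorithm gets measurably better as its training set grows.

```python
import random

random.seed(0)  # make the toy experiment repeatable

def make_point():
    # Two synthetic features in [0, 1); the label is 1 when their sum is large.
    x = (random.random(), random.random())
    return x, int(x[0] + x[1] > 1.0)

def knn_accuracy(n_train, n_test=500):
    # Train a 1-nearest-neighbor classifier and measure accuracy on fresh data.
    train = [make_point() for _ in range(n_train)]
    test = [make_point() for _ in range(n_test)]
    correct = 0
    for x, y in test:
        # Predict the label of the closest training point (squared distance).
        nearest = min(train, key=lambda p: (p[0][0] - x[0]) ** 2
                                         + (p[0][1] - x[1]) ** 2)
        correct += (nearest[1] == y)
    return correct / n_test

small = knn_accuracy(10)      # accuracy with a tiny sample
large = knn_accuracy(10_000)  # accuracy with far more of the same data
```

With the same algorithm and no tuning, the larger training set alone closes most of the error, which is the book's core argument in miniature.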
One example of more-data-is-better occurred recently when The Journal of the American Medical Association published an analysis of the profit margins for all of Texas Health’s 30,000-plus surgery inpatients during 2010. Operative word: all.
The study found that hospitals benefit financially from preventable complications, which extend stays. That finding was suspected, but previous efforts to prove the thesis were “limited by use of small data sets” or simplified approaches, the study said.
With big data methods, researchers use vast data sets and computers that operate in a massively parallel fashion, meaning a number of processors run coordinated tasks.
A typical Excel spreadsheet can contain up to 1,048,576 rows and 16,384 columns of information. In a big data analysis, Gilder said, data potentially could fill millions of rows and billions of columns.
“A regular desktop computer would churn three days and then flash the blue screen of death,” Gilder said. He then nodded at the Dell. “With this thing, it may run two days, but you get an answer.”