Can we use Big Data to improve health reporting?

Our project:

Ali and I built a tool that uses Big Data to help journalists who report on health see how their coverage of a subject (e.g., diabetes) stacks up next to what actually kills people and the related research dollars invested. The idea was that the tool could give journalists a sense of whether they are covering health issues that actually affect their readers and also whether there are topics with lots of research being done that they might be under-covering.

Using the tool:

As a health journalist, I was interested in trying out the tool and seeing whether it could give me a sense of missed opportunities in my reporting or ways to improve coverage.

For this demo, we used New York Times and Wall Street Journal data. I looked at the 2011 visualization for the New York Times, and imagined I was a journalist working at that paper. I found that while heart disease kills a lot of Americans, I was hardly covering it. Meanwhile, I was dedicating a great deal of ink to Parkinson’s disease, even though relatively few Americans are afflicted. Similarly, few die by suicide, yet it received the maximum media attention at my paper, while accidents—which kill many more people—got hardly any coverage. Chronic lower respiratory disease and stroke are both major killers in America that attract a sizable amount of research funding, yet we barely reported on these health subjects.

These gaps between mortality and media attention left me with ideas about how I might be able to diversify and strengthen my health journalism. It allowed me to reflect on where my focus was, and where my blind spots might be. I’d think about doing stories on accidents and heart disease, for example, and looking into possibilities for reporting on COPD, which got no coverage.

Still, this is just a demo with a lot of room for improvement.

Future directions:

1) We want to use DALYs data instead of mortality as our population health measure. DALYs (disability-adjusted life years) are a measure of overall disease burden that counts the number of life years lost due to poor health. They include a range of factors such as smoking, diet, pollution, cancer, depression and asthma. So DALYs would encompass a broader scope of health and lifestyle issues than the mortality measure.

2) We want to add other media outlets from around the world to the tool, and create a feature that allows users to upload data from their own websites to see how they compare.

3) We need to improve our user interface so that the information is displayed on one screen and there is no need to scroll down.

4) We want to build our database so users can compare media attention over a longer period of time and see how their focus is shifting over the years.

5) We need to make sure we are comparing apples to apples. On almost every measure, the Wall Street Journal appeared to have almost zero coverage next to the New York Times and we need to explore why this is happening.

6) We included measures health researchers might be interested in, such as the ratio of research investment to mortality in the population. We hope to work with health researchers to ensure our methodology is robust enough, and to invite them to use our tool for their studies.

How good is your health journalism? by Ali and Julia

As we have discussed in previous posts, health journalism isn’t always as diverse and accurate as it could be. While measuring quality in journalism is a difficult task, Ali and I wanted to create a tool that could allow journalists to see where there may be space for improvement or missed opportunities in their health coverage.

So we propose a tool that would allow journalists to compare the proportion of coverage at the top 25 media outlets in the world on a given health subject to 1) the proportion of related research spending in a given year and 2) the relative health impact of that subject on the population.

For example, on our website, a user would select:
1) a news source, such as the New York Times;
2) a health subject, such as “diabetes.”

Then the website we build will spit out a visualization (probably a bubble chart) of how much coverage the disease garnered at that media source compared to how much money was invested in diabetes-related research globally and how much that health issue impacts quality of life compared to other factors. We could also rate the robustness of a news outlet’s coverage (e.g., we could give a poor grade to a source that is severely under-covering an important health issue or a subject for which there is a lot of research output).

Ali has access to a database of media stories from the world’s top 25 outlets and we will do key word searches in each source to find out how often the top disease burden factors are mentioned.

To measure health impact, we are using global data from the Institute for Health Metrics and Evaluation (IHME) about the relative impact on DALYs, or disability-adjusted life years. This “is a measure of overall disease burden, expressed as the number of years lost due to ill-health, disability or early death,” which gives a sense of which diseases or exposures lead to the most deaths. DALYs include everything from obesity to cancer and pollution.

For the data on health-research investment, we are using global data from IHME, as well.

The data will be visualized in bubbles that allow users to easily see where there are gaps between coverage, research spending, and public-health impact. The user will get a sense of whether the health coverage at a given paper is actually representing related research and disease burden.

We hope that journalists and editors who cover health could use this tool to find missed opportunities and ideas for how to expand their work.



My dream activist-journalist website

With the shift toward “view from somewhere” journalism, we are probably going to see more activist-style stories permeate the news pages. So how do you engage readers who might want to do something about what they are reading?

Rather than designing a widget, I think the way to get readers talking and acting is to simply do a better job of helping them find content they might be interested in and that might provide ammunition for causes they already hold dear.

As we discussed in early classes, the media do a poor job of matching readers with content. In my experience, beyond simple algorithms at the news sites I wrote for, it was typically up to me and online editors to identify communities that would be interested in a new story and target them by emailing a link, tweeting at them with appropriate hashtags, or encouraging sources to share the story. This was time consuming and we weren’t always aware of the key players and interest groups on the subject—especially globally—or we didn’t have time to find their contact information.

This assignment made me wonder whether there is a systematic way to match journalists who write about particular causes to appropriate online communities, newsletters, interest groups, policymakers, discussion boards, Facebook pages, et cetera, so that they can better target their stories and mobilize people for action.

For example, I imagine a website for journalists that allows them to search key words related to a story, perhaps narrowing by geographic region. Let’s say I am writing about the need for transparency in clinical trials. I enter the words “transparency” and “clinical trials” and the website spits out contacts and Twitter handles for advocates of the AllTrials campaign, other journalists, health professionals, politicians, researchers, and activists who already write or talk about clinical trials transparency, and Facebook or other social pages related to the issue. It would be even better if the journalist could enter the entire text of her story, and the website could return all the appropriate related social content and contact information (e.g., Twitter handles, Facebook pages, hashtags, etc.) so that the journalist could save time searching. I think this kind of resource would help journalists and editors better match their content with eager readers who are likely to care about and act upon a given story.





Journalism check-up: Are reporters doing a good job of covering health?

By Ali and Julia

It’s no secret that journalists fall into many traps when covering the contradictory and sometimes convoluted area of health research. As a 2013 Columbia Journalism Review article—titled ‘Survival of the Wrongest’—summed up: “Even while following what are considered the guidelines of good science reporting, (journalists) still manage to write articles that grossly mislead the public, often in ways that can lead to poor health decisions with catastrophic consequences.”

This can take the form of reporting science out of context, misinterpreting conclusions, or missing big stories altogether. So we set out to gather data on the places where health journalism goes wrong.

We had a grim starting place: We looked at the leading causes of death in America and compared that to how well the most comprehensive national newspaper—The New York Times—covered related stories. We wanted to see whether public health issues that matter to people are under-reported.

First, we gathered mortality data from the CDC’s most recent National Vital Statistics Report, which included 2010 deaths:

| Cause of death | Number of deaths | Percent of total deaths |
|---|---|---|
| All causes | 2,468,435 | 100 |
| Heart disease | 597,689 | 24.2 |
| Cancer | 574,743 | 23.3 |
| Chronic lower respiratory diseases | 138,080 | 5.6 |
| Stroke (cerebrovascular diseases) | 129,476 | 5.2 |
| Accidents (unintentional injuries) | 120,859 | 4.9 |
| Alzheimer’s disease | 83,494 | 3.4 |
| Diabetes | 69,071 | 2.8 |
| Nephritis, nephrotic syndrome, and nephrosis | 50,476 | 2.0 |
| Influenza and pneumonia | 50,097 | 2.0 |
| Intentional self-harm (suicide) | 38,364 | 1.6 |
| Septicemia | 34,812 | 1.4 |
| Chronic liver disease and cirrhosis | 31,903 | 1.3 |
| Essential hypertension and hypertensive renal disease | 26,634 | 1.1 |
| Parkinson’s disease | 22,032 | 0.9 |
| Pneumonitis due to solids and liquids | 17,011 | 0.7 |
| All other causes | 483,694 | 19.6 |

Here, the leading causes of death are represented in a bubble chart; the biggest bubbles relate to America’s leading killers: heart disease, cancer, chronic lower respiratory disease, stroke, accidents, et cetera.

[Bubble chart: cause of death data]

Then, we did a query in The New York Times corpus of key search terms related to the top 15 causes of death in America. Here, we found the number of 2010 stories which mention those key words:

| Times stories in 2010 | Keywords |
|---|---|
| 1,630 | “cancer” |
| 1,470 | “heart disease” |
| 527 | “diabetes” |
| 456 | “alzheimer” |
| 331 | “suicide” |
| 216 | “stroke” |
| 214 | “parkinson’s” |
| 183 | “accident” |
| 121 | “liver disease” “cirrhosis” |
| 95 | “influenza” “pneumonia” |
| 88 | “hypertension” “renal disease” |
| 27 | “respiratory diseases” “copd” |
| 2 | “nephritis” |
| 1 | “septicemia” |
| 1 | “pneumonitis” |
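For readers curious about what a keyword count like this involves, here is a minimal sketch in Python. It is not the query we actually ran against the Times corpus: the `count_stories` function and the sample texts are made up for illustration, and it assumes a story counts once if it mentions any keyword in a group, matched as a case-insensitive substring.

```python
def count_stories(stories, keywords):
    """Count stories that mention at least one keyword.

    Assumption: case-insensitive substring matching, so "alzheimer"
    also matches "Alzheimer's". A story is counted once per keyword group.
    """
    lowered = [keyword.lower() for keyword in keywords]
    return sum(
        1 for text in stories
        if any(keyword in text.lower() for keyword in lowered)
    )


# Made-up sample texts for illustration only; these are not Times stories.
sample_stories = [
    "New study links diet to heart disease risk.",
    "City council debates road safety after a bus accident.",
    "Researchers test an Alzheimer's drug.",
    "Cirrhosis and other liver disease cases are rising.",
]

counts = {
    "heart disease": count_stories(sample_stories, ["heart disease"]),
    "liver disease / cirrhosis": count_stories(sample_stories, ["liver disease", "cirrhosis"]),
    "alzheimer": count_stories(sample_stories, ["alzheimer"]),
    "stroke": count_stories(sample_stories, ["stroke"]),
}

for label, total in counts.items():
    print(f"{label}: {total}")
```

Note that the second sample text would count toward “accident” even though it is not a health story, which is the kind of noise a keyword search can pick up.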

We then created an index to represent the media attention focused on America’s leading killers. We did this by dividing the number of New York Times stories by the number of deaths in America and then multiplying that number by 100,000. So: (New York Times stories/deaths)*100,000. Here’s what we found:

| Cause of death | Media attention index |
|---|---|
| Parkinson’s disease | 971 |
| Intentional self-harm (suicide) | 863 |
| Diabetes | 763 |
| Alzheimer’s disease | 546 |
| Chronic liver disease and cirrhosis | 379 |
| Essential hypertension and hypertensive renal disease | 330 |
| Cancer | 284 |
| Heart disease | 246 |
| Influenza and pneumonia | 190 |
| Stroke (cerebrovascular diseases) | 167 |
| Accidents (unintentional injuries) | 151 |
| Chronic lower respiratory diseases | 20 |
| Pneumonitis due to solids and liquids | 6 |
| Nephritis, nephrotic syndrome, and nephrosis | 4 |
| Septicemia | 3 |
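The index arithmetic can be reproduced in a few lines of Python. This is a sketch for checking the numbers, not the spreadsheet we used: the function name `media_attention_index` is ours for illustration, the story and death counts for four of the causes are copied from the tables in this post, and rounding to the nearest whole number is an assumption that happens to match the published figures.

```python
# (Times stories in 2010, deaths in 2010), copied from the tables in this post.
DATA = {
    "Parkinson's disease": (214, 22_032),
    "Intentional self-harm (suicide)": (331, 38_364),
    "Heart disease": (1_470, 597_689),
    "Chronic lower respiratory diseases": (27, 138_080),
}


def media_attention_index(stories, deaths):
    """Stories per 100,000 deaths: (stories / deaths) * 100,000.

    Assumption: rounded to the nearest whole number.
    """
    return round(stories / deaths * 100_000)


index = {
    cause: media_attention_index(stories, deaths)
    for cause, (stories, deaths) in DATA.items()
}

# Print from most to least coverage relative to deaths.
for cause, value in sorted(index.items(), key=lambda item: item[1], reverse=True):
    print(f"{cause}: {value}")
```

In other words, an index of 971 for Parkinson’s means roughly 971 Times stories for every 100,000 deaths, against about 20 for chronic lower respiratory diseases.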

[Bubble chart: media attention index by cause of death]

As you can see, the big bubbles (Parkinson’s, suicide, diabetes, Alzheimer’s) suggest there’s a lot of coverage proportional to the number of deaths, while barely visible bubbles mean these killers are under-covered by the media compared to mortality. If these data are correct, the third leading cause of death in America (COPD) was hardly covered in the newspaper, and neither was the fifth leading cause (accidents). Meanwhile, heart disease and cancer, the top killers, got relatively little attention when compared to Parkinson’s, Alzheimer’s, diabetes, and suicide.

So what does this mean?

The focus by the media on chronic diseases and diseases of aging—instead of, for example, accidents and COPD—probably reflects the interests of the more mature readership of the Times and the emphasis in newsrooms on “news you can use,” health journalism commentator Gary Schwitzer said.

He also offered another interpretation: This exercise may reflect the work of advocacy campaigns. “Maybe, in this sample, advocacy groups for Parkinson’s, liver disease, suicide, flu, diabetes, Alzheimer’s, et cetera, were just that much more successful in priming the pump by getting stuff in the New York Times.”

What’s more, our data might not be representative. Schwitzer noted that searching by key terms could turn up spurious correlations. For example, “Suicide showing up as a key word may mean that it comes from all sorts of general news stories. That may not be comparable to stroke showing up as a keyword from a stroke study. Yes, it’s what’s in the paper, but it’s not necessarily a comparison of what health care/medical/science journalists chose to report on.”


Of course, our data have other limitations. In addition to the potential flaws of searching for key terms, we used New York Times coverage as a proxy for health coverage. As Schwitzer pointed out, “‘What we journalists cover’ doesn’t necessarily equate to ‘what the New York Times did.’ To some degree, yes, because of copycat journalism. But to a large degree, day in and day out, not so much.” Similarly, the data only reflect one year of coverage.

Health editor and Retraction Watch blogger Ivan Oransky wondered whether the quantity of studies on a given topic drives coverage. “There may simply be more studies and press releases about the subjects that New York Times are more likely to cover,” he said. “And if that’s the case, this is another good reminder why letting journals set the agenda can skew what reporters cover.”

Andre Picard, a long-time public health reporter at Canada’s Globe and Mail, asked whether reflecting causes of mortality was truly the best measure for quality health coverage: “Should our choice of story topics be based (or influenced) by the impact of a disease/condition on the impact of the population?”

Picard’s answer was ‘sort of.’ “We should base our story choices in part, on the impact of diseases/conditions on the population. But I’m not sure mortality is the best metric for judging impact and I’m really sure that we should pay a lot more attention to the causes of illness than to illnesses themselves. We do that a bit – smoking as a cause of heart disease and lung cancer, for example. But we tend to shy away from issues that don’t have medical treatments.” He added: “I think availability of treatments, more than anything else, influences our coverage.”

What may not get a lot of attention in the health news pages, even though it drives human health more than anything, are the “causes of the causes of disease” such as poverty, Picard said. “We know that income is the single biggest determinant of health, followed by education. But I’m betting ‘poverty’ wouldn’t even show up as a tiny blip on your chart of health story topics. The poor and uneducated are many times more likely to die of heart disease, cancer, COPD, suicide, car crashes, etc., you name it.”

Future research

We were also interested in seeing whether there’s a disconnect between public investment in research spending and mortality. To look at this question, we tallied the dollar amounts of research funding by disease category at the NIH in 2010, and compared those to the data on the top causes of death in America. We then created an index for the research/death ratios. The bigger bubbles—stroke, Parkinson’s, Alzheimer’s, heart disease—are areas with relatively more research funding compared to mortality. Again, diseases related to aging attracted funding, as did those related to cardiovascular health.

[Bubble chart: research spending]

In summary, our findings raised more questions than answers. This exercise gave us a chance to reflect on what other metrics we could use to measure the quality of health journalism and better identify the gaps in health reporting. Considering the limitations of our data, we plan to gather a more robust data set so that we can be more confident in our findings and recommendations to journalists.

How big is the gap between health news and research?

For my data storytelling assignment, I’d like to see how health journalism coverage in the US compares to mortality data and health-research spending. I hope to tell a story about whether there is adequate coverage in the US media of things that harm and kill people here, and how that maps against where the government invests in health research to find out whether there are under-covered areas of science.

NIH has data on spending by category for 2010 and beyond:

CDC has mortality data by cause (most recent being 2010):

I hope to work with a programmer to access all news articles related to health from LexisNexis and use automated clustering to identify the most covered topics.

I’ll probably use Many Eyes to visualize the data in a bubble chart so that comparisons among research spending, journalism, and mortality can be easily made. But suggestions welcome!

Toronto gets a cat video festival

I was poking around news sites to find events I might want to cover for this assignment. Lo and behold, I found out that Toronto was going to host its first “cat video festival.” What’s more, none other than the Prime Minister’s wife, Laureen Harper, will appear at the event.

I made this short commentary in Zeega about the news and reaction to the news.

I also collected reactions on Twitter using Storify.

Here are some highlights:

[Screenshots: Twitter reactions]

Notes about process: Both Zeega and Storify were extremely intuitive and easy to use. I do, however, regret the fact that I could not embed cat videos in Zeega.


Julia’s 4-hour challenge: Do people expect more from marriage today?


At the American Association for the Advancement of Science conference, a general science meeting taking place in Chicago this week, psychologist Eli Finkel from Northwestern University delivered a talk about how marriage has changed over time. He was mainly focused on debunking the notion that people expect more from marriage now than they ever have. “It’s not about more or less,” he said. “It’s about where you are in Maslow’s hierarchy.”

Finkel used psychologist Abraham Maslow’s famous theory about the “hierarchy of needs” to explain how marriage has evolved. Pre-industrialization, pairing off was a pragmatic endeavor. “What people looked for at that time from marriage was ability to achieve things like food production,” he said. Marriage satisfied basic needs, such as safety and hunger, at the bottom of Maslow’s hierarchy.

As Americans began to move from farms to cities, their marriages moved up the hierarchy of needs. Now that basic requirements were met, the “companionate marriage” became more common: people could think about linking up for love and belonging.

In the 1960s, the birth control pill emerged and civil rights movements brewed. There were a “slew of countercultural revolutions” and increasingly, people began to look to marriage as a mechanism for personal growth. Finkel quoted the sociologist Robert N. Bellah: love is “the mutual exploration of infinitely rich, complex and exciting selves.” Through marriage, said Finkel, “All of us want to become our own unique special butterfly… We want to discover who it is we are and become the best version of ourselves.”

To illustrate Finkel’s talk, I used Venngage to make this infographic. (You need to click through to see the full picture.) It took me less than three hours to watch the talk and produce the infographic. Venngage was very efficient but I only realized after using the service that you must subscribe in order to properly export whatever image you create.

P.S.: You can read more about Finkel’s research on marriage in this recent New York Times piece.

Julia’s Media Diary: An American Affair

As a journalist, I was interested in learning about the kinds of news media I generally consume: where it’s from, the sources I tend to go to, whether information is pushed to me (via social media or e-newsletters) or I actively seek it out. I used RescueTime and a media log to figure out how and where I was getting my news.

An American Affair

I was surprised to learn that almost all of my top news media sources were American. In fact, US-based news sources far outstripped my media from any other country, with Canada and the UK trailing behind. Mainstream media sources—NPR, the New Yorker, the New York Times—represented the bulk of my media diet. I also spent a lot of time reading about the changes in the media industry these past few days. (General news stories were the main focus, followed by science-related stories, and then media-related stories.)


Push vs. Pull

As well, I found that I spent almost as much time wading through news information that was pushed to me as I did seeking out news.


The Sadness of Endless Scroll

When it comes to news media, I mostly gathered information online (using my computer and my iPhone), though I also got a healthy portion of news through the radio or podcasts. What didn’t make it on to my graph was the vast number of websites I had visited for only minutes or even seconds, according to RescueTime. Of my top news sources, Twitter and Feedly represented a large proportion of the time I spent consuming media, which suggests I was scrolling through stories and reading tweets and snippets of stories instead of diving deep.


The Tyranny of Gmail

Finally, I was distressed by the amount of time I dedicated to Gmail. Nearly seven hours in five days, and two of those days fell on a weekend when I tend to use email less than during weekdays.

[Chart: top media activities]

Reading books, for classes and pleasure, was my third most popular media activity, after creating content in Microsoft Word. This was somewhat comforting, though I am not sure it represents my typical media diet. Since I’m on leave from work at the moment and back at school, my sources of news and the way I use them differ quite a bit from the usual. Still, I would like to offer these media diet resolutions:

1) Cut the Gmail habit.
2) Dedicate more time to reading alternative news sources.
3) Seek out news sources from countries other than the US, UK, and Canada.
4) Spend less time with information that is pushed at me on the endless social media scroll and more time lingering on stories I seek out.