Data on high schools in Boston

For this data piece I will tell the story of finding the answer to this seemingly simple question: “how many high schools are there in Boston?”

I am also calling every high school in Boston and telling the story of trying to collect data by cold-calling school receptionists, documenting their responses to quantitative and qualitative questions such as “how many students attend your school?” and “what makes your school special?”

‘It is just an accident’


I could not concentrate on this assignment, mainly because my 8-year-old daughter was recently injured in her PE class. Her face was cut just below her eyes, and she needed 11 stitches at the ER, where the doctors suggested that we consider cosmetic surgery after a year to reduce the visibility of the scar.

The PE teacher in charge accused me of having the nerve to write ‘rude messages’ after I questioned the safety measures taken in the class. The vice principal of the Graham and Parks School said, ‘That is just an accident. No one is responsible. There are children breaking their legs and arms every year at schools.’ Yes, indeed.

“In 2009, an estimated 2.6 million children aged 0–19 years were treated in U.S. EDs for sports- and recreation-related injuries,” counted as “unintentional injury.” According to a simple search I made of the National Electronic Injury Surveillance System (NEISS), in 2012 alone (there are no figures later than that) a total of 573 children were injured in PE classes at US schools. Well, in 2014, one of the victims was my child.

Sorry for this highly personal and underdeveloped assignment. It is a limited attempt to connect data with the human dimension.

(Photos: Bahar after the accident; Bahar before the operation.)

Visualizing GIFGIF by country

Kevin Hu & Travis Rich built a site called GIFGIF, which aims to crowd-tag animated gifs with various emotions. From GIFGIF’s website: “An animated gif is a magical thing. It contains the power to convey emotion, empathy, and context in a subtle way that text or emoticons simply can’t. GIFGIF is a project to capture that magic with quantitative methods. Our goal is to create a tool that lets people explore the world of gifs by the emotions they evoke, rather than by manually entered tags.”

For this project, Kevin and I are building a map tool, along the lines of What We Watch, so that people can explore GIFGIF’s current dataset to see which gifs are most representative of certain emotions in each country.
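
A minimal sketch of how a per-country ranking might be computed. It assumes, hypothetically (the API has not been released yet), that each vote records the voter's country, an emotion, and which of two gifs was judged to express that emotion better. The `Vote` record and `top_gif_by_country` function are illustrative names, and a plain win rate stands in for whatever ranking method GIFGIF itself uses.

```python
from collections import defaultdict
from typing import NamedTuple


class Vote(NamedTuple):
    """Hypothetical vote record; the real GIFGIF API schema is not yet published."""
    country: str
    emotion: str
    winner: str  # id of the gif judged to express the emotion better
    loser: str   # id of the other gif in the pair


def top_gif_by_country(votes, min_comparisons=1):
    """Return {(country, emotion): gif_id}, picking the gif with the highest win rate."""
    wins = defaultdict(int)  # (country, emotion, gif) -> comparisons won
    seen = defaultdict(int)  # (country, emotion, gif) -> comparisons appeared in
    for v in votes:
        wins[(v.country, v.emotion, v.winner)] += 1
        seen[(v.country, v.emotion, v.winner)] += 1
        seen[(v.country, v.emotion, v.loser)] += 1

    best = {}  # (country, emotion) -> (gif, win rate)
    for (country, emotion, gif), n in seen.items():
        if n < min_comparisons:
            continue  # too few votes to trust this gif's rate
        rate = wins.get((country, emotion, gif), 0) / n
        key = (country, emotion)
        if key not in best or rate > best[key][1]:
            best[key] = (gif, rate)
    return {key: gif for key, (gif, _) in best.items()}
```

A win rate ignores how strong each opponent was; a rating system such as Elo or TrueSkill would account for that. The `min_comparisons` threshold keeps a gif with only one or two votes from topping a country's list.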

GIFGIF’s data will soon be made publicly available through an API.

Data Story: Why People Take Free Online Courses (MOOCs)

Millions of people have signed up for Massive Open Online Courses, known as MOOCs. Early studies show that the majority of those who have signed up already have a college degree, and most do not opt to pay for a certificate to prove they passed the class. Put simply, they’re not looking to get college credit in any way. So I’m curious to dig deeper into what motivates these online “students.”

I am late to post because I’ve been digging around for a killer data set on this. I’ve made requests to HarvardX and to some researchers who have a large MOOC dataset, but so far no one has been willing to share their raw numbers. HarvardX has, however, published some demographic and survey data (not much). My sense is that these data do not answer the question very well (most MOOC surveys offer only a few multiple-choice options on motivation).

So for the assignment, I’m focusing on playing around with fuzzier “data” – the student postings to forums in a MOOC. In many MOOCs, students post short introductions in the forums at the beginning of the term, usually saying why they are taking the course. I’ll analyze the intro discussion postings in one MOOC and group them into broad categories (my categories won’t capture everything, but there are definitely clear patterns in the responses).
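
If the posts need a quick first pass before each one is read closely, a keyword tagger can pre-sort them. This is only a sketch: the category names and keyword lists are hypothetical placeholders, matching is crude substring search (so "interest" also matches "interesting"), and posts that match nothing or several categories would still need a human decision.

```python
# Hypothetical categories and keywords for a first pass; the real categories
# would come from reading the posts themselves.
CATEGORIES = {
    "career": ["job", "career", "work", "profession"],
    "curiosity": ["curious", "always wondered", "fascinated", "interest"],
    "teaching": ["teacher", "my students", "classroom"],
    "hobby": ["telescope", "stargazing", "hobby", "amateur"],
}


def tag_post(text, categories=CATEGORIES):
    """Return every category whose keywords appear in the post (case-insensitive)."""
    lowered = text.lower()
    return [name for name, words in categories.items()
            if any(word in lowered for word in words)]
```

An empty result flags a post for manual coding rather than assigning it a category by default.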

My plan is to use an astronomy course on edX that has just started. There are only about 200 intro posts, so it should be doable in the short time frame.

I plan to pull out the one student post that best exemplifies each category I create. The interface will be a simple pie chart showing the percentage of posts citing each reason for taking a MOOC; when you click on a specific slice, you’ll be taken to that category’s example post so viewers can “meet” its author.
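
Once each post has a hand-assigned category, the pie-chart data reduces to counting. The sketch below assumes hypothetical inputs: a list of `(post_id, category)` pairs and a dictionary naming the exemplar post chosen for each category. Each returned slice carries the post that a click should open; `pie_data` is an illustrative name.

```python
from collections import Counter


def pie_data(coded_posts, exemplars):
    """coded_posts: list of (post_id, category) pairs from hand coding.
    exemplars: {category: post_id} of the post chosen as the best example.
    Returns one slice per category, largest first, with a percentage and the
    exemplar post (None if no exemplar has been chosen yet)."""
    counts = Counter(category for _, category in coded_posts)
    total = sum(counts.values())
    return [
        {"category": cat,
         "percent": round(100 * n / total, 1),
         "example_post": exemplars.get(cat)}
        for cat, n in counts.most_common()
    ]
```

Rounding to one decimal place means the slices may not sum to exactly 100; a charting tool would normally compute angles from the raw counts anyway.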

I’m certainly open to suggestions on tools, critique, etc.

Data Story: Congress and the Financial Crisis

For this assignment, I plan to use data to illustrate how the United States Congress works and responds to societal challenges. Specifically, I want to illuminate how Congress responded to the 2008 financial crisis through text data such as bills, reports, and hearings. What was the legislative response to the crisis? Who were the key players? What was the content of successful laws?

I am working with data accessible through .gov websites such as that of the Government Printing Office. I will also draw on some academic efforts to parse bills and laws into sections. My hope is that this approach will also be useful for analyzing current congressional activity.
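
As an illustration of splitting a bill into sections, here is a rough sketch. It assumes plain-text bill versions start each section on its own line with a header like `SEC. 2. DEFINITIONS.` (or `SECTION 1.` for the first). That convention, the regular expression, and the `split_sections` name are assumptions for illustration. The academic parsers mentioned above may work quite differently, and structured XML versions of bills, where available, would be more reliable than a regex.

```python
import re

# Assumption: each section header starts a line, e.g. "SECTION 1. SHORT TITLE."
# or "SEC. 101. DEFINITIONS."; [ \t] (not \s) keeps matches on a single line.
SECTION_HEADER = re.compile(
    r"^[ \t]*(?:SECTION|SEC\.)[ \t]+(\d+[A-Z]?)\.[ \t]*(.*)$", re.MULTILINE
)


def split_sections(bill_text):
    """Return a list of (number, heading, body) tuples, one per section."""
    matches = list(SECTION_HEADER.finditer(bill_text))
    sections = []
    for i, m in enumerate(matches):
        start = m.end()
        # A section's body runs until the next header (or the end of the text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(bill_text)
        sections.append((m.group(1), m.group(2).strip(), bill_text[start:end].strip()))
    return sections
```

Text before the first header (the bill number, enacting clause, and so on) is simply skipped.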

More generally, and beyond the scope of this assignment, I’d be interested in people’s ideas about how data-driven approaches could help us understand and analyze the processes of government and democracy.

Mapping the conflict in Syria

For the data story assignment, I would like to present data on the current conflict in Syria that I found while working on a GIS project. To map the conflict, I will use crowdsourced maps such as Syria Tracker and OpenStreetMap, which are based on the work of volunteers.

I am currently learning how to use GIS (geographic information systems, tools for storing and analyzing geospatial data), and I thought it would be interesting to use the geospatial data I am working with to tell a story: a story of conflict and its link to geographic features.

Syria Tracker has geospatial data on the number of deaths “resulting from the Assad regime”, recorded by volunteers on the ground since March 2011. Although these data must be taken with a grain of salt, they give a good overview of patterns such as female versus male casualties and the location of casualties among opposition forces and the population living under opposition control.

OpenStreetMap, on the other hand, gives an overview of the main roads, waterways, and land cover in Syria. By overlaying the data sets, one could see, for example, whether there is any link between conflict density and proximity to roads.
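
A minimal sketch of the roads-versus-conflict overlay: for each casualty location, find the distance to the nearest road segment, then total the casualties in distance bands. It assumes both layers have already been projected into a metric coordinate system (raw latitude/longitude would need reprojecting first) and represents each road as a hypothetical list of (x, y) vertices. The brute-force search is fine for a class project but slow for a full national road network; the function names are illustrative.

```python
import math
from collections import Counter


def point_segment_distance(p, a, b):
    """Planar distance from point p to segment ab (coordinates in meters)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)  # degenerate segment: a single point
    # Project p onto the line through a and b, clamped to the segment's ends.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_nearest_road(p, roads):
    """roads: list of polylines, each a list of at least two (x, y) vertices."""
    return min(point_segment_distance(p, a, b)
               for road in roads for a, b in zip(road, road[1:]))


def casualties_by_distance_band(events, roads, band_m=1000):
    """events: list of ((x, y), casualty_count).
    Returns Counter {band index: total casualties}, where band 0 is 0 to band_m meters."""
    bands = Counter()
    for point, count in events:
        bands[int(distance_to_nearest_road(point, roads) // band_m)] += count
    return bands
```

Raw totals per band will be skewed by where people live, since settlements tend to cluster along roads, so comparing each band against its population or area would make any link more convincing.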

Below is an example of a crowdsourced map showing the main roads in the country. I plan to produce a more comprehensive map for this assignment, but please let me know if you have any suggestions for improvement.

—  Elissar


Data project: Internet in Romania

I will look at data on internet penetration in Romania and at a paradox: Romania is one of the countries with the fastest internet connections, yet it has one of the lowest penetration rates in Europe (second to last in the EU28).

I’m trying to tie that to what I can find on e-governance and on how citizens relate to authorities through digital means. As a new generation of civic-minded activists looks online for offline change, I thought it would be interesting to survey the infrastructure landscape.

I’ll use Eurostat data for the most part, and I’ll come back with questions as they arise, as I’m sure they will.

How big is the gap between health news and research?

For my data storytelling assignment, I’d like to see how health journalism in the US compares with mortality data and health-research spending. I hope to tell a story about whether the US media adequately cover the things that harm and kill people here, and how that coverage maps against where the government invests in health research, to find out whether some areas of science are under-covered.

NIH has data on spending by category for 2010 and beyond:

CDC has mortality data by cause (most recent being 2010):

I hope to work with a programmer to pull all health-related news articles from LexisNexis and use automated clustering to identify the most-covered topics.
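
To give a sense of what automated clustering could involve, here is a small pure-Python sketch. Articles become TF-IDF vectors and are grouped by spherical k-means (k-means using cosine similarity); the heaviest terms in each cluster hint at its topic. This is one common technique chosen for illustration, not a description of the planned pipeline. In practice a programmer would likely reach for a library such as scikit-learn, and the number of clusters `k` still has to be chosen by hand. All function names here are illustrative.

```python
import math
import re
from collections import Counter


def dot(a, b):
    """Dot product of two sparse vectors stored as dicts."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def normalize(vec):
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def tfidf_vectors(docs):
    """One L2-normalized sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    # Smoothed IDF, so terms found in every document keep a small weight.
    return [normalize({t: c * (math.log((1 + n) / (1 + df[t])) + 1)
                       for t, c in Counter(tokens).items()})
            for tokens in tokenized]


def spherical_kmeans(vectors, k, iterations=20):
    """Return a cluster label (0..k-1) for each vector."""
    # Farthest-point initialization: deterministic and spreads the seeds apart.
    centroids = [vectors[0]]
    while len(centroids) < k:
        far = min(range(len(vectors)),
                  key=lambda i: max(dot(vectors[i], c) for c in centroids))
        centroids.append(vectors[far])
    labels = None
    for _ in range(iterations):
        new_labels = [max(range(k), key=lambda j: dot(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            total = Counter()
            for v, lab in zip(vectors, labels):
                if lab == j:
                    total.update(v)
            if total:  # keep the old centroid if a cluster ends up empty
                centroids[j] = normalize(dict(total))
    return labels


def top_terms(vectors, labels, cluster, n=3):
    """Terms with the largest summed weight in one cluster."""
    total = Counter()
    for v, lab in zip(vectors, labels):
        if lab == cluster:
            total.update(v)
    return [t for t, _ in total.most_common(n)]
```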

I’ll probably use Many Eyes to visualize the data in a bubble chart so that comparisons among research spending, journalism, and mortality can be easily made. But suggestions welcome!
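
Article counts, deaths, and dollars sit on very different scales. One way to put them in a single chart is to convert each to a share of its own total. The sketch below does that and divides each cause's share of coverage by its share of deaths, so values below 1 flag causes that get less coverage than their death toll would suggest. The function name and inputs are hypothetical. It also assumes NIH spending categories and CDC causes of death have already been mapped onto common labels, which will likely take manual work because the two sources classify diseases differently.

```python
def coverage_ratios(articles, deaths, spending):
    """Each argument maps a cause label to a raw total (article count, deaths, dollars).
    Returns {cause: (article_share, death_share, spending_share, coverage_ratio)}
    for causes present in all three inputs, where
    coverage_ratio = article_share / death_share."""
    def shares(totals):
        grand_total = sum(totals.values())
        return {cause: value / grand_total for cause, value in totals.items()}

    a, m, s = shares(articles), shares(deaths), shares(spending)
    result = {}
    for cause in set(a) & set(m) & set(s):
        ratio = a[cause] / m[cause] if m[cause] else float("inf")
        result[cause] = (a[cause], m[cause], s[cause], ratio)
    return result
```

In a bubble chart, the death share and spending share could serve as the two axes, with bubble size showing the coverage share.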

Data stories: Narrative of education

I am working on a data story about the narrative of education in Pakistan, particularly how we talk about education and how that narrative has changed in recent times. My data corpora include education stories curated by Alif Ailaan, a Pakistani political advocacy group, and mainstream media streams curated through Media Cloud. I am going to look at the latent semantic structure of the text corpus provided by Alif Ailaan. In addition, I will juxtapose specific events with text highlights to understand how the narrative of education is framed.
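
Latent semantic structure is usually recovered with latent semantic analysis (LSA): build a term-document matrix and keep only its top k singular vectors, so documents that use related vocabulary end up close together in a small latent space. A minimal sketch, assuming NumPy is installed and using raw word counts and whitespace tokenization for brevity (TF-IDF weighting and proper tokenization would be more typical); the `lsa` function name is illustrative.

```python
import numpy as np


def lsa(docs, k=2):
    """Latent semantic analysis via truncated SVD of a term-document count matrix.
    Returns (vocabulary, term_vectors, doc_vectors), each vector in k dimensions."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(vocab), len(docs)))  # rows: terms, columns: documents
    for j, d in enumerate(docs):
        for w in d.lower().split():
            X[index[w], j] += 1
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep the k strongest latent dimensions, scaled by their singular values.
    term_vectors = U[:, :k] * S[:k]
    doc_vectors = Vt[:k].T * S[:k]
    return vocab, term_vectors, doc_vectors
```

Cosine similarity between rows of `doc_vectors` then gives a rough measure of how thematically close two stories are, and the terms with the largest weights on each latent dimension suggest what that dimension is about.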