map – Future of News and Participatory Media
https://partnews.mit.edu
Treating newsgathering as an engineering problem... since 2012!

Is San Francisco’s Hot Housing Market Literally On Fire?
https://partnews.mit.edu/2015/04/07/is-san-franciscos-hot-housing-market-literally-on-fire/
Wed, 08 Apr 2015 04:11:15 +0000

This project is a collaboration between David Jimenez, Charles Kaioun, Celeste LeCompte, and Léa Steinacker.

In San Francisco, there is a growing concern about residential fires, which have displaced more than 100 residents from their homes since the beginning of the year. Have there been more big fires? If so, why? We turned to the data to answer the question.

Read on for more background on our analysis.

Analysis: Beware of data!

This warning should apply to every dataset and visualization tool. Data can illuminate a question and bring a new perspective to it, but using the wrong dataset, or presenting the right one misleadingly, is as easy as discovering new information.

We started with building-fire data from the San Francisco Fire Department and news reports, alongside the San Francisco Public Health Department’s no-fault evictions data. We came up with a nice map overlaying the two datasets:


Dots are fires — the redder the dot, the bigger the fire. The shaded areas on the map show the rate of evictions — the darker the shading, the higher the eviction rate.

You could easily look at these maps and draw the conclusion: more fires happen where there is a lower eviction rate, or there is a higher eviction rate in zones with fewer fires.

But, look more closely, and you’ll see that this conclusion is flawed.

First, the datasets come from different time frames and sources. The published evictions data covers 2005-2010, while the fire data comes from the city for 2011-2012 and from news reports for August 2014 to the present.

Second, the evictions data is only available by block group, averaged over the entire time period; averaging it this way erases the change and movement of eviction patterns over time.

Finally, causation is a very subtle thing to test — even with good datasets and strong correlation, this is not so easy. I invite you to check this website (http://www.tylervigen.com/) if you don’t trust me.
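The first problem above, mismatched time frames, can be checked before two datasets are ever overlaid. The sketch below is a minimal illustration in Python, not part of the original analysis: the function name `overlap_days` is hypothetical, and the exact start and end days are assumptions filled in around the year and month ranges given above (the end of the news-report window is assumed to be this post's date).

```python
from datetime import date


def overlap_days(a_start, a_end, b_start, b_end):
    """Return how many days two inclusive date ranges share (0 if none)."""
    start = max(a_start, b_start)
    end = min(a_end, b_end)
    return max(0, (end - start).days + 1)


# Assumed boundaries: the text gives only years (and a month for news reports).
evictions = (date(2005, 1, 1), date(2010, 12, 31))
city_fires = (date(2011, 1, 1), date(2012, 12, 31))
news_fires = (date(2014, 8, 1), date(2015, 4, 7))

print(overlap_days(*evictions, *city_fires))  # 0
print(overlap_days(*evictions, *news_fires))  # 0
```

With no shared days between the evictions data and either fire source, any pattern on the overlay compares events from different periods.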

We aren’t the only ones asking these questions. An online mapping project has attempted to answer some of these same questions with their own data visualization. However, their maps have many of the same problems — and other problems — as those we tested.

But wrong data and wrong visualizations are not solely the fault of the writers; data providers share the responsibility. Aiming for more transparency and a better-informed public is a noble quest, but doing it badly can lead to disastrous effects, such as the flawed conclusion above. The San Francisco Fire Department stopped publishing its public data after mid-2012 (though you can request it with a 10-day lead time), and the data it did provide was full of duplicates, unclear, and very difficult to fetch. (I invite you to look at how they publish it: a collection of XML files with incremental data, packed into ZIP archives. A real pleasure.)
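To give a sense of the cleanup this kind of publication format demands, here is a rough sketch in Python of reading XML records out of several ZIP archives and keeping one record per incident. It is not the code we used, and it is not the Fire Department's actual schema: the element names `incident` and `id`, the helper names, and the sample data are all hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def read_incidents(zip_bytes_list):
    """Collect incident records from ZIPs of XML files, one record per id."""
    seen = {}
    for zip_bytes in zip_bytes_list:
        with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
            for name in archive.namelist():
                if not name.endswith(".xml"):
                    continue
                root = ET.fromstring(archive.read(name))
                # "incident" and "id" are assumed element names.
                for node in root.iter("incident"):
                    incident_id = node.findtext("id")
                    if incident_id is None:
                        continue
                    # A later file overwrites an earlier one, so incremental
                    # updates replace older copies of the same incident.
                    seen[incident_id] = {child.tag: child.text for child in node}
    return list(seen.values())


def make_zip(filename, xml_text):
    """Build a small in-memory ZIP so the sketch runs without real files."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(filename, xml_text)
    return buffer.getvalue()


first = make_zip(
    "a.xml",
    "<incidents>"
    "<incident><id>1</id><alarms>1</alarms></incident>"
    "<incident><id>2</id><alarms>2</alarms></incident>"
    "</incidents>",
)
second = make_zip(
    "b.xml",
    "<incidents><incident><id>2</id><alarms>3</alarms></incident></incidents>",
)

records = read_incidents([first, second])
print(len(records))  # 2: the repeated incident is kept once
```

Keying on an identifier only works if the publisher provides a stable one; without it, duplicates have to be matched on fields such as date and address, which is far more error-prone.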

If cities want to embrace open data, they need to find better ways to publish, maintain, and support the information they’re making available.

Updated with revised image, April 8, 2015, 8:45am

Natural Disasters Vs Mining Operations in Indonesia
https://partnews.mit.edu/2015/02/23/natural-disasters-vs-mining-operations-in-indonesia/
Tue, 24 Feb 2015 01:32:32 +0000

I started this set of data visualizations at 4:30 pm today and finished it almost four hours later. This is the first time I have tried to visualize several datasets using CartoDB, after participating in a workshop on the tool last January.

The idea is to combine three datasets about natural disasters in Indonesia (floods, landslides, and forest fires) to see where they happened most often in 2014, and then layer them with a map of where non-oil-and-gas mining operates in the country.

I suspect that most of these disasters happened in places where nature has lost its ability to maintain its balance because of overexploitation of resources. Obviously, although the trigger is a natural cause, disasters such as floods, landslides, and forest fires are basically human-made.

All of the datasets used here are taken from the government database available at http://data.go.id.

I tried to find other relevant datasets to combine with the disasters map, such as industrial zone maps, a deforestation map, and oil and gas mining zones, but unfortunately those maps don’t have similar georeference codes that can be read in CartoDB. So I finally settled on only a distribution map of the non-oil-and-gas mining industry.

Initially I wasn’t sure how to make the connection, because Indonesia has more than 1,000 mining locations spread all over the archipelago, but then I found the Torque heat animation, which I think can represent the different concentration levels of the mining industry in different places. The heat animation highlights and contrasts with the disasters map, whose events are represented only by simple circles in different colors.

From doing this exercise, I realized the complexity of data visualization, the importance of having a clean dataset, and the powerful image it can give to the audience. I hope that when people look at this map, they can really make the connection between these horrible disasters and the mining industry, which for years has been operating without clear environmental regulation and oversight. (*)

Click here to see the map: Indonesian Natural Disasters Vs Mining Operation Distribution Map

GIFGIFmap
https://partnews.mit.edu/2014/04/02/gifgifmap/
Wed, 02 Apr 2014 05:13:18 +0000

Background: Kevin Hu & Travis Rich built a site called GIFGIF, which aims to crowd tag animated gifs with various emotions. From GIFGIF’s website: “An animated gif is a magical thing. It contains the power to convey emotion, empathy, and context in a subtle way that text or emoticons simply can’t. GIFGIF is a project to capture that magic with quantitative methods. Our goal is to create a tool that lets people explore the world of gifs by the emotions they evoke, rather than by manually entered tags.” As we know, animated gifs are also a popular storytelling mechanism for social news and entertainment websites.

The cultural phenomenon of using animated gifs to express emotions has been the subject of numerous journalistic inquiries:

Fresh From the Internet’s Attic – NYTimes

Christina Hendricks on an Endless Loop: The Glorious GIF Renaissance – Slate.com

GIF hearts Tumblr: a fairytale for the Internet age – Wired.co.uk

Visualization project for this week: Kevin, Travis, and I built a map tool so people can explore GIFGIF’s current dataset to see which gifs are most representative of certain emotions across different countries. Out of 1.8 million votes, 1.4 million votes had IP data which links the votes to the location of the voter. GIFGIFmap can be found here.


In a future version, we would like to show the top gifs per emotion that countries have in common with each other, and which top gifs are unique to each country (along the lines of What We Watch). However, the GIFGIF dataset has limitations in terms of global coverage. For example, the top 21 countries account for 92% of the votes. Additionally, we excluded countries that had fewer than 10,000 total votes across all categories, so as to avoid making generalizations based on limited data. We chose to include the number of votes per country (per emotion) to make the dataset more transparent in terms of representation.
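The exclusion rule and the coverage figure described above can be sketched in a few lines of Python. This is an illustration only, not GIFGIFmap's actual code: the function names are hypothetical and the vote counts are made-up sample data, with only the 10,000-vote threshold taken from the text.

```python
def filter_countries(votes_by_country, min_votes=10_000):
    """Keep only countries with at least `min_votes` total votes."""
    return {
        country: votes
        for country, votes in votes_by_country.items()
        if votes >= min_votes
    }


def top_share(votes_by_country, n):
    """Return the fraction of all votes cast by the n largest countries."""
    counts = sorted(votes_by_country.values(), reverse=True)
    total = sum(counts)
    return sum(counts[:n]) / total if total else 0.0


# Made-up totals for four placeholder countries.
sample_votes = {"A": 50_000, "B": 12_000, "C": 9_999, "D": 500}

kept = filter_countries(sample_votes)
print(sorted(kept))  # ['A', 'B']
print(round(top_share(sample_votes, 2), 3))
```

Reporting the share next to the filtered list makes it clear how much of the picture comes from a handful of countries.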

We think the tool we are building could complement existing stories about the phenomenon of using animated gifs to communicate (stories like the ones we linked to above).

These are some potential questions that we hope journalists could explore using a map interface to the GIFGIF dataset:

1) Do people from different countries interpret the emotional content of gifs differently?

2) If there are variances in interpretation, are there clusters of countries that have more similar interpretations? Do these match up with proximity, or immigration patterns?

3) What top gifs per emotion are unique to a given country?

 

Note: GIFGIF’s data will soon be made publicly available through an API.

 
