MAS S61 final project

An Analysis of Sina Weibo Censorship Using WeiboScope Search Data

Starting at 4:28 PM May 19, 2012, I posted on my Sina Weibo account two names as well as the Chinese words for “Taiwanese independence.” The first name I posted was “Chen Guangcheng” (in English and Chinese), the blind lawyer who escaped house arrest in Shandong province and made his way to the U.S. Embassy in Beijing. The second name was “Bo Xilai” (in English and Chinese), the former Party Secretary of Chongqing who recently fell from power. Less than 14 hours later, I received a message from Sina Weibo’s system administrator informing me that my two posts on “Chen Guangcheng” were “inappropriate” and had been censored. While I can still see the two “Chen Guangcheng” posts on my Sina Weibo account page, no one else can. Surprisingly, my posts on “Bo Xilai” and “Taiwan independence” were not censored.

Herein lies the conundrum with censorship in China. We know that certain topics are censored from blogs hosted in China, Chinese search engines and Weibos. But we don’t know where the line lies. Part of the reason is that the line is constantly moving. Baidu, Sina and Tencent could help identify the line by publishing a list of banned topics or keywords, but they don’t. Rather, they hire “monitoring editors” and rely on self-censorship to ensure that user-generated content does not run afoul of Chinese authorities.

Some computer scientists in academia have tried to make sense of censorship in Sina Weibo by analyzing the data. In March 2012, David Bamman, Brendan O’Connor and Noah Smith at Carnegie Mellon University published a paper entitled “Censorship and deletion practices in Chinese social media” in First Monday. After analyzing 56 million Sina Weibo messages, they found that more than 16% had been deleted. King-Wa Fu and Cedric Sam at the University of Hong Kong’s Journalism and Media Studies Centre have built the Weibo Scope Search, which archives deleted posts on Sina Weibo.

For my MIT Media Lab final project, I’ve tried to build on King-Wa Fu and Cedric Sam’s work by analyzing the data collected by the Weibo Scope Search to make some sense of Sina Weibo censorship. From its inception on February 1, 2012 through May 20, 2012, the Weibo Scope Search collected 12,032 deleted messages from Sina Weibo. The first thing I did was simply plot all the deleted messages on a timeline from February 1, 2012 to May 20, 2012, and this is what I got:

My findings were consistent with the Carnegie Mellon team’s findings. There are spikes in Sina Weibo censorship as a result of media reports and rumors. During the Carnegie Mellon survey period from June 27, 2011 to September 30, 2011, a rumor that former President Jiang Zemin had passed away caused a spike in Sina Weibo deletions. From February 1, 2012 to May 20, 2012, the following incidents in China caused the censors employed by Sina Weibo to work overtime:

  • February 6, 2012 – Chongqing Public Security Bureau head Wang Lijun goes to U.S. Consulate in Chengdu with information about the death of British businessman Neil Heywood that implicates Chongqing Party Secretary Bo Xilai.
  • March 8, 2012 – Chongqing Party Secretary Bo Xilai fails to show up at the National People’s Congress, sparking rumors that he has fallen from power.
  • March 15, 2012 – Bo Xilai is removed from post as Chongqing Party Secretary.
  • March 18, 2012 – A Ferrari crashes on Beijing’s Fourth Ring Road, killing one person and injuring two.
  • April 22, 2012 – Blind lawyer Chen Guangcheng escapes from house arrest in Shandong province and makes his way to U.S. Embassy in Beijing.
  • May 14, 2012 – The Beijing Daily posts a message on its official Weibo charging that U.S. Ambassador to China Gary Locke is posing as an ordinary citizen and calls for Locke to disclose his wealth.

Interestingly, deletions of Sina Weibo messages tend to hit a low on Saturdays. I’m not too sure why that is, except that maybe censors want to take time off on weekends as well. If you want to maximize the length of time your message will remain on Sina Weibo, the best time to post is probably after 11 PM on Friday.

The second analysis I did with the Weibo Scope Search was to try to figure out how long it took the censors to delete messages on Sina Weibo. Each Sina Weibo post has a time stamp for when it was created. The Weibo Scope Search checks Sina Weibo’s timeline at most four times a day (but usually fewer due to limits that Sina Weibo imposes). Say, for instance, a user posts a message on Sina Weibo at 8 AM, and Weibo Scope Search checks Sina Weibo’s timeline at 9 AM, 3 PM, 9 PM, and 3 AM. If the message was deleted by the censor at 10 AM, its “deleted time” in Weibo Scope Search would show up as 3 PM.
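To make the polling effect concrete, here is a minimal Python sketch. It assumes the four check times from the example above (the real crawler's schedule is not documented here, and `first_poll_after` is a name made up for illustration). It shows that the recorded "deleted time" is the first check at or after the real deletion, so it can lag the true deletion by several hours.

```python
from datetime import datetime, timedelta

# Assumed polling schedule, taken from the example above (hours of the day).
POLL_HOURS = (3, 9, 15, 21)

def first_poll_after(deleted_at, poll_hours=POLL_HOURS):
    """Return the first scheduled check at or after the true deletion time."""
    day_start = deleted_at.replace(hour=0, minute=0, second=0, microsecond=0)
    # Consider today's and tomorrow's checks so late-night deletions roll over.
    candidates = [day_start + timedelta(days=d, hours=h)
                  for d in (0, 1) for h in sorted(poll_hours)]
    return min(t for t in candidates if t >= deleted_at)

deleted = datetime(2012, 5, 20, 10, 0)   # censor removes the post at 10 AM
recorded = first_poll_after(deleted)      # the crawler only notices at 3 PM
print("recorded as deleted at:", recorded)
print("detection lag:", recorded - deleted)
```

Because the true deletion can fall anywhere between two checks, any "time to deletion" computed from this data is an upper bound rather than an exact figure.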

shortest time 0:04:04 hours
longest time 3401:51:45 hours
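The durations here are written as hours:minutes:seconds, with the hours allowed to run past 24. A small sketch of how such a figure could be computed from a created/deleted timestamp pair follows. It assumes both timestamps are strings in the same time zone and in the `YYYY-MM-DD HH:MM:SS` layout, and `elapsed_hms` is a helper name invented for this example.

```python
from datetime import datetime

def elapsed_hms(created, deleted, fmt="%Y-%m-%d %H:%M:%S"):
    """Format the gap between two timestamps as total hours:minutes:seconds."""
    delta = datetime.strptime(deleted, fmt) - datetime.strptime(created, fmt)
    seconds = int(delta.total_seconds())
    hours, rest = divmod(seconds, 3600)   # hours are not capped at 24
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"

# Sample timestamps in the same format as the Weibo Scope Search export.
print(elapsed_hms("2011-12-17 20:52:01", "2012-04-27 20:40:12"))
```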

The fastest a post was deleted on Sina Weibo was just over 4 minutes. The longest time it took for the censor to get around to deleting a message on Sina Weibo was over four months. For the posts created on May 20, 2012 and deleted on the same day, it took on average 11 hours for Weibo Scope Search to detect the deletion. It took the censors about 14 hours to delete my post “chen guangcheng.” Determining the average time it takes for censors to delete “irresponsible” messages is a bit tricky since we don’t have data on exactly how long it takes for each post to be deleted. Out of curiosity, I pulled up three messages that took over four months to delete to see what they said:

Time created: 2011-12-29 00:30:41
Time deleted: 2012-05-18 18:22:25
Hours: 3401:51:45
Message: “如果明年欧美名校在三四月份一起召开家长会的话,那么中国的十八大就很可能开不了了。”
Translation: “If the top universities in Europe and the U.S. hold their parent-teacher conference next March or April, then China will not be able to hold its 18th Party Congress then.”

Time created: 2011-12-17 20:52:01
Time deleted: 2012-04-27 20:40:12
Hours: 3167:48:11
Message: “【媿尔公侯高窃位,怆然世事急抢滩】国际盲人日当天@张海迪 通过私信@我是闻正兵 公布了她之前为光诚的努力。当年我和袁伟静嫂子向她求助时抱了很大期望,但坦率说从未得到她哪怕一个电话询问或慰问,这肯定谈不上“做了应该做的一切”。如今舆论环境更好而光诚处境则更糟,难道她一点努力都不能做吗?”
Translation: “On World Blind Day, paraplegic writer Zhang Haidi told Wen Zhengbin in a private letter that she did her utmost to help Chen Guangcheng. At the time, Chen Guangcheng’s wife Yuan Weijing and I asked her for help, but she didn’t even call us or ask us how we were doing. She didn’t do everything she should have. Today, Chen Guangcheng’s situation is even worse while there is greater openness for debate. Shouldn’t she have done more?”

Time created: 2011-11-26 16:13:26
Time deleted: 2012-03-28 07:55:28
Hours: 2943:42:02
Message: RT: “演藝界人士周星馳表支持唐英年參選下屆行政長官,欣賞他的處事,為人豁達開通。他又說,唐英年一點也不蠢,是有智慧的人,自己亦不會與蠢的人做朋友。對於唐英年有感情缺失,會否流失支持,周星馳認為並無關係,因為現時是選特首,不是選男朋友。”
Translation: “Hong Kong actor Stephen Chow said he supports Henry Tang’s bid to be the next Chief Executive of Hong Kong. Chow admires Tang’s way of doing things and open mindedness. Chow added that Tang is not stupid, but a smart person. Chow says that he wouldn’t be friends with stupid people. Regarding whether Tang’s infidelities will cause him to lose support, Chow says that it shouldn’t matter because people are voting for the Chief Executive, not choosing a boyfriend.”

I’m not too sure why it took so long to delete the posts. Cedric Sam points out that the posts may have been in the Weibo Scope Search database to begin with and they just didn’t turn up until several months later. The researchers at University of Hong Kong’s Journalism and Media Studies Centre are constantly adding new Sina Weibo to their list. Or, they could have just turned on the deletion marking system in the Weibo Scope Search so that it would have caught some censored posts that weren’t caught before.

To be sure, there is no way to tell with certainty whether some of the posts were deleted by the users themselves rather than by “monitoring editors.” Sina’s API returns two types of error messages: “Weibo does not exist” and “Permission denied.” We assume that when a post is deleted by the user, the “Weibo does not exist” error message comes up, and that when a post is censored, the “Permission denied” error message comes up. Weibo Scope Search keeps track of all the deleted posts that have the “Permission denied” error message.
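The assumption about the two error messages can be written down as a tiny classifier. This is only a sketch of the reasoning, not Weibo Scope Search's code. The exact wording and casing of the API errors may differ, and `classify_deletion` is a hypothetical helper name.

```python
from collections import Counter

# The two error strings quoted above; matching is case-insensitive to be safe.
CENSORED_ERROR = "permission denied"
USER_DELETED_ERROR = "weibo does not exist"

def classify_deletion(error_message):
    """Guess who removed a post from the API error returned for it."""
    text = error_message.lower()
    if CENSORED_ERROR in text:
        return "censored"        # assumed to be removed by a monitoring editor
    if USER_DELETED_ERROR in text:
        return "user_deleted"    # assumed to be removed by the author
    return "unknown"

errors = ["Permission denied", "Weibo does not exist", "Permission denied"]
print(Counter(classify_deletion(e) for e in errors))
```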

If I had more time (and knew how to code), I would have liked to have analyzed more of the data that Weibo Scope Search came up with. Among the things I would have liked to explore are:

  • Geographic distribution of deleted messages on Sina Weibo – The Carnegie Mellon paper also looked at the geographic distribution of censored Sina Weibo messages and found that messages issued from Tibet, Qinghai and Ningxia are deleted at a higher rate. Weibo Scope Search also had data on the city and province that each message originated from. However, I didn’t have enough time to figure out how to convert Sina’s city and province data into a form that could be plotted on a map.
  • Relationships between the most censored Sina Weibo accounts – Using Weibo Scope Search, we’re able to rank the 3,524 users whose messages were deleted, from most deletions to fewest. One thing I’d be interested in exploring is how many followers these Sina Weibo accounts have and whether they follow each other. It’s not clear to me whether the censors have compiled a list of influential Sina Weibo accounts and are tracking them daily, or whether they are using keyword searches to figure out what to censor.
  • A deeper analysis of the most censored Chinese words on Sina Weibo – Several weeks ago, I did a word cloud of the most censored Chinese words on Sina Weibo to see what came up. By far, the most censored words were the Chinese words for “retweet” followed by “ha ha” or some variation. It makes sense, but it’s not very helpful. Given more time, I would have liked to dig a little deeper to see if there were any words or code words that came up consistently after filtering out the “retweets,” “ha ha,” and other stop words.
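For the last idea, a word count with a stop list might look like the Python sketch below. It assumes the posts have already been split into words (Chinese text needs a word segmenter, which is not shown), and the stop list is an illustrative guess, not the list behind the word cloud.

```python
from collections import Counter

# Assumed stop list: the Weibo term for "repost" plus "ha ha" variants.
STOP_WORDS = {"转发", "哈哈", "哈哈哈"}

def top_terms(tokenized_posts, stop_words=STOP_WORDS, n=10):
    """Count words across posts, skipping stop words, and return the top n."""
    counts = Counter(
        token
        for post in tokenized_posts
        for token in post
        if token.strip() and token not in stop_words
    )
    return counts.most_common(n)

# Made-up, pre-segmented posts purely for illustration.
posts = [["转发", "光诚", "哈哈"], ["光诚", "重庆"], ["转发", "重庆", "光诚"]]
print(top_terms(posts, n=2))
```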

How to Analyze WeiboScope Search Data

King-Wa Fu and Cedric Sam at the University of Hong Kong’s Journalism and Media Studies Centre have built a WeiboScope Search that sends all of the deleted Weibo posts to a server in Hong Kong and stores them.  However, the data is in JSON format, which looks like this:

To make sense of the data collected, we first need to clean it up. I used Google Refine to do this as follows:

      1) Download + install Google Refine
      2) Click on “Create Project”
      3) Click on “Web Addresses (URLs)”

      4) Insert link
      5) Click “Next”

      6) Highlight the fields you’re interested in and left click the mouse.

    Google Refine should automatically put all the fields into columns:

      7) Click “Create Project”
      8) Click “Export”
      9) Click “Excel”
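For readers who would rather script this step than click through it, the same flattening can be sketched in a few lines of Python. The field names (`created_at`, `deleted`, `text`) and the assumption that the feed is a JSON list of objects are guesses for illustration; the real Weibo Scope Search output may be structured differently.

```python
import csv
import io
import json

def json_to_csv(json_text, fields):
    """Turn a JSON list of records into CSV text with one column per field."""
    records = json.loads(json_text)
    out = io.StringIO()
    # extrasaction="ignore" drops any keys we did not ask for.
    writer = csv.DictWriter(out, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for record in records:
        writer.writerow({name: record.get(name, "") for name in fields})
    return out.getvalue()

# A made-up record in the assumed shape.
sample = ('[{"created_at": "2012-05-19 16:28:00", '
          '"deleted": "2012-05-20 03:00:00", "text": "example", "extra": 1}]')
print(json_to_csv(sample, ["created_at", "deleted", "text"]))
```

The resulting CSV opens directly in Excel or Tableau.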

Now that we have the data formatted, we want to make sense of it.

The first project I did was to graph the deleted weibos on a timeline. My classmate Eugene Wu suggested that the best software to visualize the data is Tableau.

      1) Download + install Tableau
      2) Click on “Open Data”
      3) Under “Connect to Data: In a file”, click on “Microsoft Excel”
      4) When the “Excel Workbook Connection” window pops up, click “Ok”

      5) Change the format for the “created at” and “deleted” columns from “text” to “Date & time” by right clicking the mouse, selecting “Change Data Type” then “Date & time.”

      6) Go to the Dimensions box and drag the “deleted” data set to “Columns”
      7) Go to the Measures box and drag the “Number of Records” data set to “Rows”
      8) In the “Show Me” box, select the type of graph you want. Voila!
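The counts behind such a timeline can also be computed without Tableau. The sketch below assumes the “deleted” column holds strings like `2012-05-18 18:22:25`. It tallies deletions per calendar day and per day of the week, the latter being what a Saturday dip would show up in.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def deletions_per_day(deleted_times, fmt="%Y-%m-%d %H:%M:%S"):
    """Number of deleted posts detected on each calendar date."""
    return Counter(datetime.strptime(t, fmt).date() for t in deleted_times)

def deletions_per_weekday(deleted_times, fmt="%Y-%m-%d %H:%M:%S"):
    """Number of deleted posts detected on each day of the week."""
    return Counter(WEEKDAYS[datetime.strptime(t, fmt).weekday()]
                   for t in deleted_times)

# Made-up sample values in the assumed format.
times = ["2012-05-18 18:22:25", "2012-05-18 09:00:00", "2012-05-19 12:00:00"]
print(deletions_per_day(times))
print(deletions_per_weekday(times))
```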



A tool to automatically geocode and analyze location based data

As state and national government agencies continue to make community-related data available online (e.g., NY community health data), there is an exciting opportunity to look for relationships between the datasets.

For example, one dataset may contain low birth rates across counties in New York, while another dataset may contain youth pregnancy data. Exploring correlations between the two datasets is an important first step towards uncovering the next big scoop (or interesting facts!).

Throughout the semester, I identified three hurdles that make these types of analyses difficult.

  1. Data import: Simply getting data into a program you’re comfortable with is hard. The data may be stored in JSON, or displayed as an HTML table on a website. Government datasets often come as hundreds of files in a zipfile.
  2. Extracting location information: Many datasets are very difficult to deal with because the statistics (e.g., low birth rates) are assigned to communities, cities, counties, or states. One dataset may report birth rates by county, while another reports them by zip code. We can only compare them by knowing how to equate zip codes with county names.
  3. Looking for patterns: Managing even 5 different datasets can quickly be unwieldy.
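To illustrate the second hurdle, here is a small Python sketch of rolling zip-code-level counts up to counties with a lookup table. The crosswalk and the counts are made-up examples, and `rollup_to_county` is a hypothetical helper, not part of EasyData. Note that counts can simply be summed, whereas rates would need to be weighted by population.

```python
from collections import defaultdict

def rollup_to_county(zip_counts, zip_to_county):
    """Sum zip-level counts into county totals; report zips with no match."""
    county_totals = defaultdict(int)
    unmatched = []
    for zip_code, count in zip_counts.items():
        county = zip_to_county.get(zip_code)
        if county is None:
            unmatched.append(zip_code)   # keep track instead of silently dropping
            continue
        county_totals[county] += count
    return dict(county_totals), unmatched

# Illustrative crosswalk and counts only.
zip_to_county = {"12203": "Albany", "12208": "Albany", "14850": "Tompkins"}
births_by_zip = {"12203": 120, "12208": 95, "14850": 210, "99999": 3}
print(rollup_to_county(births_by_zip, zip_to_county))
```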

EasyData is a prototype that tries to make each of these three steps less cumbersome by automating them as much as possible.

Easy Data Import

EasyData will do a good enough job of importing your data. If the data contains headers (e.g., “birth_rate”, “county_name”), it will try to identify them. If the data has errors, it will ignore them.
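As a rough idea of what forgiving import can mean, the sketch below guesses whether the first row is a header and skips malformed rows. The heuristic is an assumption made for this example, not EasyData's actual logic: a first row with no numeric cells, followed by rows that do contain numbers, is treated as a header.

```python
import csv
import io

def _is_number(cell):
    try:
        float(cell)
        return True
    except ValueError:
        return False

def load_table(text):
    """Parse CSV text, guess whether it has a header, and drop broken rows."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if not rows:
        return [], []
    first, rest = rows[0], rows[1:]
    has_header = (not any(_is_number(c) for c in first)
                  and any(_is_number(c) for row in rest for c in row))
    header = first if has_header else [f"col{i}" for i in range(len(first))]
    data = rest if has_header else rows
    # Rows with the wrong number of cells are treated as errors and ignored.
    clean = [row for row in data if len(row) == len(header)]
    return header, clean

# Made-up sample with one broken line.
sample = "county_name,birth_rate\nAlbany,7.9\nbroken line\nKings,8.4\n"
print(load_table(sample))
```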

Automatic Geocoding

EasyData will try to analyze your data to see if any of the columns are zipcodes, state names, addresses, or latitude longitude coordinates. If it thinks it sees an address and a state name, it will try to automatically geocode the table. Otherwise, you can tell EasyData which columns to geocode and it will do the rest.
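A simplified version of this kind of column sniffing could look like the following Python sketch. The patterns, the shortened state list, and the function names are all assumptions for illustration. EasyData's real detection rules are not described here, and address detection is left out.

```python
import re

ZIP_RE = re.compile(r"^\d{5}(-\d{4})?$")
# Deliberately tiny sample; a real list would cover every state.
US_STATES = {"new york", "ny", "massachusetts", "ma", "california", "ca"}

def _is_latlon(value):
    parts = value.split(",")
    if len(parts) != 2:
        return False
    try:
        lat, lon = float(parts[0]), float(parts[1])
    except ValueError:
        return False
    return -90 <= lat <= 90 and -180 <= lon <= 180

def guess_column_type(values):
    """Return 'zipcode', 'state', 'latlon' or 'unknown' for a column."""
    cells = [v.strip() for v in values if v.strip()]
    if not cells:
        return "unknown"
    if all(ZIP_RE.match(c) for c in cells):
        return "zipcode"
    if all(c.lower() in US_STATES for c in cells):
        return "state"
    if all(_is_latlon(c) for c in cells):
        return "latlon"
    return "unknown"

print(guess_column_type(["02139", "10027-1234"]))
print(guess_column_type(["42.36,-71.09"]))
```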

Automatic Correlation Search

Once your data has been geocoded, EasyData will combine the statistics of data that references the same location (e.g., downtown Boston), and see if there are any interesting trends. It will plot the most interesting trends.
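One plausible way to search for “interesting trends” is to compute a correlation for every pair of statistics that share locations and rank the pairs by strength. The sketch below uses Pearson correlation. That choice, the minimum of three shared locations, and the function names are assumptions, since the text does not say which measure EasyData uses.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant column carries no trend
    return cov / (sd_x * sd_y)

def rank_correlations(datasets, min_shared=3):
    """datasets maps name -> {location: value}; strongest pairs come first."""
    scored = []
    for a, b in combinations(sorted(datasets), 2):
        shared = sorted(set(datasets[a]) & set(datasets[b]))
        if len(shared) < min_shared:
            continue
        r = pearson([datasets[a][k] for k in shared],
                    [datasets[b][k] for k in shared])
        scored.append((abs(r), a, b, r))
    scored.sort(reverse=True)
    return [(a, b, r) for _, a, b, r in scored]

# Made-up numbers keyed by made-up locations.
datasets = {
    "x": {"A": 1, "B": 2, "C": 3},
    "y": {"A": 2, "B": 4, "C": 6},
    "z": {"A": 3, "B": 1, "C": 2},
}
print(rank_correlations(datasets))
```

A strong correlation found this way is only a lead to investigate, not evidence of cause.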

Interested? Try it out!

(It’s a prototype so may be really slow if multiple people are using it.)

Here’s a screenshot so you know it’s real.


Using Dataforager to Report on Pakistan

Several hours before my classmate J. Nathan Matias first showed me his new tool Dataforager on Sunday, Pakistan blocked Twitter, apparently because some tweets were urging people to join the third “Everybody Draw Muhammad Day” campaign on May 20, 2012. While playing around with Dataforager, I applied it to five news reports about Pakistan blocking (then unblocking) Twitter and came up with the table below:

newspaper Washington Post BBC New York Times Global Voices Guardian
title Pakistan blocks, then restores, Twitter access Pakistan restores Twitter after block Pakistan Blocks Twitter Over Cartoon Contest Pakistan: Twitter Goes Through Weekend of Censorship Pakistan blocks Twitter amid blasphemy fears
data forager results @Innovations @SenRehmanMalik @nytimesworld @FizaBatoolGilan @GdnPolitics
@fispahani @MarkLGoldberg @JonathanHaynes
@marvisirmed @’SenRehmanMalik @mediaguardian
@sherryrehman @abidbeli
missed Twitterers Rehman Malik Fizza Batool Gilani Farieha Aziz
Cyril Almeida Arif Rafiq Imran Khan
Rehman Malik Emrys Schoemaker
Ali Dayan Hasan
Raza Rumi

The Washington Post and Global Voices do the best job of coding links in the story to interviewees’ Twitter accounts. The New York Times and Guardian cite sources as having Twitter accounts, but force the reader to search for each Twitter account.
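To show why linked accounts matter to a tool like this, here is a minimal Python sketch that pulls Twitter handles out of an article's HTML. It is not Dataforager's implementation. The two regular expressions and the `find_handles` name are assumptions for illustration. A source mentioned only by name, with no link or @handle, would be missed by this approach.

```python
import re

# Links such as href="https://twitter.com/#!/name" or href="http://twitter.com/name".
TWITTER_LINK = re.compile(
    r'href="https?://(?:www\.)?twitter\.com/(?:#!/)?([A-Za-z0-9_]{1,15})"',
    re.IGNORECASE,
)
# Plain @mentions, ignoring things like email addresses.
MENTION = re.compile(r"(?<![A-Za-z0-9_])@([A-Za-z0-9_]{1,15})")

def find_handles(html):
    """Return the unique Twitter handles found in links and @mentions."""
    found = TWITTER_LINK.findall(html) + MENTION.findall(html)
    seen, ordered = set(), []
    for handle in found:
        key = handle.lower()
        if key not in seen:
            seen.add(key)
            ordered.append("@" + handle)
    return ordered

# Made-up snippet for illustration.
html = ('<p><a href="https://twitter.com/#!/sherryrehman">Sherry Rehman</a> '
        'and @marvisirmed commented; write to x@example.com.</p>')
print(find_handles(html))
```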

One of J. Nathan Matias’ original intentions for Dataforager was to help users compile a list of experts to learn more about a particular subject or topic. Since I know very little about Pakistan besides the fact that it borders India and it’s where Osama Bin Laden was living for the past six years, I tested J. Nathan Matias’ theory to see if I could find enough information to write a story about Pakistan using the list of Twitterers compiled using Dataforager.

For this experiment, I used Dataforager to compile a list of experts on Pakistan on Twitter from the Washington Post article “Pakistan blocks, then restores, Twitter access” by Richard Leiby and the Storify article “Flurry of tweets in wake of Pakistani Twitter ban” by Annie Ali Khan.

From Annie Ali Khan’s Storify article, Dataforager pulled up a list of 17 Pakistanis on Twitter:

From Richard Leiby’s Washington Post article, Dataforager pulled up a list of 3 Pakistanis on Twitter:

Going through the tweets from both lists on Tuesday morning, I found that both mention some kind of unrest in Karachi. Entrepreneur Mohammed Sumair Kolia tweeted that 8 people were killed and 30 injured in the unrest in Karachi. Pakistan’s Ambassador to the U.S. and former journalist Sherry Rehman tweeted that two journalists were injured during the firing at the Awami Tehreek rally in Karachi.

By themselves, the tweets aren’t enough to piece together what happened in Karachi. We know how many people were injured, but we still don’t know what happened, who did it, how it happened, and why it happened. Fortunately, the tweets use a #Karachi hashtag. Doing a “#Karachi” search on Twitter, I get a list of tweets about what happened in Karachi.

A sports reporter with GEO News, Faizan Lakhani, was the first one to tweet “Reports of firing in Boltun Market and old city area. #Karachi” with a “#Karachi” hashtag.

Two and a half hours later, the Express Tribune posted on its web site and Twitter a live blog/news article about the riots that unfolded in Karachi after unidentified gunmen opened fire on an Awami Tehreek and Peoples Amn Committee rally.

A trip to the Pacific: mapping invisible environmental risks

The remote, low-lying coral atoll of Vaitupu in the Pacific Ocean could be helped by a satellite mapping project at MIT that seeks to expose environmental risks invisible to human eyes.

Vaitupu, with 1,600 inhabitants, is part of the tiny nation of Tuvalu, where the highest point is just 4 metres above sea level, making it among the places most vulnerable to sea level rise caused by climate change. The new environmental maps could help set a benchmark against which to monitor future changes in places like Tuvalu.

Normal maps show “streets and buildings – but they don’t really tell us much about the environment,” says Arlene Ducao of MIT, who is also co-principal with Ilias Koen of the DuKode Studio in Brooklyn, which runs the “Open-IR: Infrared for Everyone” project.

Blue pin = Vaitupu

Areas with buildings show up in pink

The maps use infrared, outside the visible spectrum, and filters that make hard-to-spot features – such as vegetation, soil, water or height above sea level – “pop” out from the background when translated into bright visible colours such as pink, blue or green.
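The “translation” into visible colours can be pictured as a band swap. In the sketch below, a near-infrared band is displayed in the red channel, so that surfaces reflecting strongly in near-infrared, such as vegetation, come out red. This is the common colour-infrared recipe and is an assumption here. The exact band combinations OpenIR uses are not given, and the pixel values are made up.

```python
def false_color(nir, red, green):
    """Build an RGB image from three bands, showing near-infrared as red.

    Each band is a list of rows of 0-255 values. The result is a list of
    rows of (r, g, b) tuples: NIR -> red, red -> green, green -> blue.
    """
    return [
        [(n, r, g) for n, r, g in zip(nir_row, red_row, green_row)]
        for nir_row, red_row, green_row in zip(nir, red, green)
    ]

# Two made-up pixels: a vegetated one (high NIR) and a water one (low NIR).
nir = [[200, 30]]
red = [[40, 35]]
green = [[60, 50]]
image = false_color(nir, red, green)
print(image)
```

Swapping in a different band for each channel produces the other views, which is why the same scene can look pink, red or blue depending on the filter.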

water shows up as black or blue

Ducao and Koen obtained satellite maps of Tuvalu from U.S. Landsat satellite data for this News and Participatory Media item. Many experts say seas could rise a metre this century, as glaciers and ice caps melt and ocean waters expand – a creeping threat to Vaitupu’s population with projected erosion and salt contamination of cropland and fresh water supplies.

“The satellite data is available, it’s just difficult to use,” Ducao said in an interview in her office at MIT filled with a jumble of shelves with bits and pieces of metal, plastic, books and papers from other projects.

For Vaitupu, the images could help set a benchmark to track if the areas of vegetation or soil shrink in coming years. Vaitupu is the largest of Tuvalu’s atolls and the site of the nation’s only secondary school.

Vegetation shows up as red

The main pilot project of OpenIR shows areas of Jakarta that are vulnerable to tsunamis – apart from the coastline it unexpectedly reveals some low-lying built-up areas inland to the east of the city that are also at high risk since there is little vegetation to slow any waves from the sea.

Another uses satellite data to peer into abandoned city-owned lots in New York to show which have vegetation and could be easily converted to parks.

Some places are harder to map than others. The main island of Tuvalu – Funafuti – is elusive on U.S. Landsat satellite data, apparently because it lies exactly on the far side of the planet from the north-south line set as the usual baseline for longitude in Greenwich, England.

“I think that island falls…just on the seam,” Koen said. Vaitupu and several other atolls north of Funafuti are visible.

Sitting in front of her computer, Ducao applies a new filter to the map of Jakarta. “The vegetation just pops,” she says of a sudden shift to bright-red tracts around Jakarta that highlight vegetation far better than a normal camera using visible light does.

She switches on a different filter and all the impermeable surfaces – pavement, concrete buildings – are shown in bright pink. “The downtown just pops,” she said. And an infrared filter well outside the visible spectrum reveals large areas around the city – apparently rice paddies.

Rice paddies are “not something you can see in true colour. And it doesn’t show up on street maps,” she said with a laugh. has some similar data but it is targeted at scientists and is hard to use.

Jakarta map shows tsunami risks -- the redder the more vulnerable

“Our initial audience is crisis responders,” she said.

The maps could also help emergency relief teams deploy after a disaster such as Haiti’s earthquake, for instance by adding data about faults where new tremors might occur or places where shelter is available.

In the longer term, it could help plan how to site buildings or agriculture out of harm’s way. Insurance companies might also be interested in the data for long-term premiums. Others from engineers to health workers could find uses.

“A lot of this stuff is done manually by crisis responders. Our value proposition is that a lot of this can be automated,” she said. Ducao and Koen were planning to create a web application where you could plot in latitude and longitude and get out a risk map.

The pilot pictures are based on free images by the Landsat satellite system, which ended in 2006. Ducao and Koen are now turning to data from modern satellites.

In New York, the project is helping an organization called 596 Acres, named after the area of city-owned land in Brooklyn that was going unused.

“The idea is to make abandoned public lots known to everyone so that people can make better use of these lots,” she said. Some lots are boarded off, but satellites can see whether a lot comprises cement or buildings – harder to use immediately – or simply vegetation.

“If you can identify what is green it shows where you easily could develop a community garden, for instance,” said Koen. “This could be a very interesting civic use of the data.”

“NASA and government satellite agencies are not particularly interested in urban areas,” said Ducao. “Maybe if this project gains enough traction it could be taken over by agencies who have the computing power to take it over to make models, predictive modeling.”

“That would be cool.”


To access prototypes:



Old Satellites, New Tricks

Why seeing the un-seeable should be a superpower available to everyone.

For the last few years Arlene Ducao, a researcher at MIT, and her design partner Ilias Koen, have been building an awesome piece of software. It’s a visualization tool with the uncanny power to reveal hidden worlds; it’s called Open IR and they’re about to show me their first prototype.

I meet Ducao in her office on the Media Lab’s 3rd floor. It has a narrow view of the Boston skyline and what look like drawing boards are leaning sideways against the desks. Prototypes for other projects are scattered everywhere. Ducao is petite, colorfully dressed and bursting with enthusiasm. She greets me with a huge grin.

Her partner Koen is already here, but invisibly so because he’s only with us via Skype. His avatar on Ducao’s iPad is a comic book hero I don’t recognize, but he assures me he really is in Brooklyn. We position him on my knee so that he can see Ducao’s large-screen monitor where the demo is about to happen. Every now and then the noise of New York traffic breaks the studious peace of the Media Lab.

The first thing Ducao loads is a Google Map satellite view of Jakarta, Indonesia’s capital city on the North West coast of Java. It’s the largest city in South East Asia; home to over 10 million people. The map displays all the usual features of a sprawling metropolis as seen through Google-vision; a digital network of yellow lines and white text with the names of towns, suburbs and major roads imposed on the dusty green fuzz of reality.

According to Ducao, what we have here is an optical satellite image that has high spatial resolution (each pixel represents one meter) and lots of street names and landmarks but not much about the natural world. “You can just make out what’s vegetation and what’s urban sprawl, but it’s all a bit murky”.

Then she switches to the Open IR interface, and everything changes. With a similarly scaled satellite view of Jakarta, she begins to transform the city before our eyes. Each new click reveals a new layer of reality and features that were barely visible to the naked eye are suddenly amplified with eye-popping colors.

With the first click, the city itself turns a shocking pink which leaks down into the southern hillsides in spidery clotted lines like spilt nail polish. The pink areas are all the impermeable surfaces, the built world of concrete and tarmac.

Jakarta's impermeable surfaces in hot pink.

With the next click, the colors switch. The metropolis turns green but it’s now spattered with a livid red that gradually blooms at the city’s edges into a red cloud billowing out over the southern slopes and the land to the east and west. Where red meets green the border is ragged, the colors interlaced like meshed fingers. From the iPad on my knee, Koen tells me over the traffic noise that the red is everything that is photosynthesizing.

With the final click, the ordinary-looking fields to the North East of the city give up a secret. They throb a deep navy, almost black. This view highlights moisture and the fact that they’ve turned the same color as the sea means they are very wet. Paddy fields perhaps? “Yes that’s what we think, either that or marshland” says Ducao. A close zoom in with Google maps view confirms they are in fact rice paddies.

What we’re seeing here is our familiar world enhanced by infrared vision. It’s the superior machine-vision of Landsat 7, a satellite that’s been part of NASA’s long-running Earth Observing System since 1999 and is orbiting somewhere above us right now.

Ducao and Koen discovered this rich dataset when they were working as science visualisation artists at the New York Museum of Natural History. Previously they’d both trained as computer artists at the New York School of Visual Arts.

“Coming out of there, most people wanted to work for Pixar or wanted to become gallery artists but I wasn’t feeling either of them,” Ducao says, laughing.

Instead they ended up making art out of science for the Museum. “Our job was to keep the permanent exhibitions fresh,” says Ducao, which they did with a mix of video and computer installations. They worked in small teams; a producer, an animator and a scientist, with the scientist doing most of the data processing.

“Gradually we started doing more and more of the data processing and visualisation ourselves,” says Ducao. “When we came across the Landsat data, we had some sense of its power.”

NASA describes Landsat 7 as not only providing the longest record of the earth’s continental surface as seen from space, but a record “unmatched in quality, detail, coverage, and value.”

Unlike the optical satellite images we are familiar with, Landsat’s specialty is multispectral imaging, in other words it measures energy reflected from the earth’s surface across a broad range of the light spectrum from the visible to the lower frequency mid-infrared.

The ability to ‘see beyond the visible’ has given scientists an enhanced picture of what goes on on the ground in places they can’t easily get to, particularly environmental changes. They’ve used it to analyse oil spills, earthquake damage, logging-related landslides, soil composition, fault lines and water-depth. In each of these situations, objects on the ground consisting of organic and inorganic matter reflect back a unique array of electromagnetic waves according to their chemical composition and temperature. The more of these waves that can be detected, the more scientists can determine what the objects are made of, which is why multispectral measuring devices are considered so useful.

Since 2008 the data has been available for anyone to use. And indeed NASA publishes a Science Data User’s Handbook. The problem is it comes in what Ducao refers to knowingly as “g-zipped text files”; reams of encoded data that need to be converted into something recognizable before they can be interpreted – a time-consuming process. “That’s why it’s really only scientists who use it,” says Ducao.

The pair believe that this data deserves to be liberated. Open IR is all about releasing it into the wild.

The first thing they did was to create an animation using data about China’s Pearl River Delta, on the mainland opposite Hong Kong. They called it the Human Footprint because it focuses on human impact on the natural landscape. But the animation also demonstrates powerfully how different combinations of visible and infrared light can be used to reveal differences in land cover such as whether a field is ready for planting, or for harvesting, or what’s recently been deforested or built.

As Ducao explains, it’s not that IR shows you things you can’t see; it’s more that it detects more detail and enhances what is rather murky with merely optical vision.

It was the Japanese Tsunami that sharpened Ducao and Koen’s thinking about how the data might be used. As a wealthy country Japan had access to the best quality map resources to guide its disaster relief work, in contrast to the developing world and the enduring chaos that remains in many areas hit by the 2004 Tsunami in South East Asia. Ducao and Koen realized that easily accessible online IR maps could help the most vulnerable understand changes to their environments and vastly improve their disaster relief efforts.

Jakarta was a natural choice for their next visualization. It’s not only economically and environmentally vulnerable, but some NGOs, like OpenStreetMap, are already working with local communities to improve their woeful ground-mapping resources, so it was easy for Ducao and Koen to find partners to work with.

Japan’s crisis supplied another idea – the risk map, created by experts to determine evacuation zones and disseminated to the public via the media.

When Ducao clicks on the risk-map they made for Jakarta, a grim picture appears. The whole city turns ‘at-risk’ red, including areas far from the coast. The low-lying rice paddies are, of course, set to be obliterated, but Koen is quick to point out that the risk map is not predictive. It merely reflects how vulnerable each land-cover type is to flooding, based on a ten-point list gleaned from previous crisis management research.

Jakarta Tsunami risk map.

There aren’t really any surprises that come out of this list (low elevation concrete is the most vulnerable, high elevation vegetation is the least), but the risk map certainly makes it clear that smooth hard streets let water flow much more easily than barriers of vegetation, even if they are at greater elevation.
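A lookup like the one described can be sketched in a few lines of Python. Everything numeric below is hypothetical. Only the ordering of the extremes (low concrete worst, high vegetation best) and the point that hard surfaces outrank vegetation even at greater elevation come from the text. The 10 m elevation threshold and the individual scores are invented for illustration and are not Open IR's ten-point list.

```python
# Hypothetical scores on a 1-10 scale (10 = most vulnerable to flooding).
RISK_SCORES = {
    ("impermeable", "low"): 10,
    ("impermeable", "high"): 6,
    ("vegetation", "low"): 5,
    ("vegetation", "high"): 1,
}

LOW_ELEVATION_LIMIT_M = 10  # assumed cut-off between "low" and "high" ground

def risk_score(land_cover, elevation_m):
    """Look up a vulnerability score for one map cell; not a prediction."""
    band = "low" if elevation_m < LOW_ELEVATION_LIMIT_M else "high"
    return RISK_SCORES[(land_cover, band)]

print(risk_score("impermeable", 2))    # low-lying street
print(risk_score("vegetation", 40))    # vegetated hillside
```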

Ducao and Koen are looking for partners to develop new projects with Open IR. They see a plethora of users amongst the lay-public, from agricultural workers to architects, urban planners to environmental activists, public health workers to community groups and even travelers looking for the greenest city to visit.

Perhaps the most compelling use of Open IR will always be for crisis management in the aftermath of a natural disaster.

“Imagine if you could combine this environmental data with a crowd-map like Ushahidi,” says Ducao, “then you’d have location-based human information and dynamic real-time environmental information all in one place.”

But for that dream to happen, they’re going to have to jump ship. Sadly, Landsat 7 suffered a hardware component failure in 2003. It’s still up there measuring, but bits of information will always be missing.

Koen tells me they now have their eyes set on ASTER, a Japanese sensor on board NASA’s Terra Earth-orbiting satellite (also launched in 1999). It’s capable of recording energy from 15 different bands of the electromagnetic spectrum and has double the spatial resolution of Landsat, allowing you to zoom in closer.

“We are excited because you can put in a request to have the satellite collect some data for you,” says Koen. They’ve put in a request for new data to update their Jakarta visualisation; it is still pending.

For now, Ducao and Koen are building more visualizations city by city. They are currently working on London (although with no risk map yet) and New York, and will be building up a database of cities and ecosystems at high risk of either natural disaster or climate change.

But it’s not all disasters and emergencies. Ducao and Koen are working with 596 Acres, a local community organization in Brooklyn that is trying to reclaim vacant lots for public use by turning them into green spaces. 596 Acres has retrieved a complete list of vacant lots from local government and is surveying them, lot by lot, to see what’s there. The New York Open IR map will automate that process, revealing which of the lots are already green and could be developed further, and which neighborhoods are in dire need of more green space, thus helping the organisation focus its resources.

Infrared visualizations may look esoteric compared to the average Google map, more like an abstract Rothko painting. But with Open IR’s democratising vision of releasing the data and making sense of it, we may soon all be able to understand their bright colors and strange visions, and many will be using them for social change.

“My best friend recently said to me,” says Ducao, “that climate change is getting personal. It used to be something we’d only read about or watched on the news, but now we’re experiencing it firsthand.” If that’s really how people feel, then having environmental data visible in our online maps may prove a really useful tool, and who knows what people will do with it.

NASA’s satellites may be old, but as is happening in so many areas of culture, it seems the participatory internet is about to give their data a new lease of life.



Community-based Q & A fact-checking service

The community-based Q & A fact-checking service helps journalists and bloggers corroborate the veracity of a quote or statement by consulting a pool of certified experts. I have set up a pilot Q & A platform aimed at reporters and bloggers focused on climate change and sustainability in cities and the built environment.

Ask your questions here & get your answers instantly from a domain expert:

Should Kenyans #PayInterns?

This article is part of an assignment in which I tested the DataForager software to support my research.

Kenyans this week have been debating an issue that seems common to companies everywhere: should companies offer unpaid internships? Internship debates always highlight strong underlying social disagreements about employer fairness, education, and the purpose of work. How do those themes play out in Kenya, where at last report, 38 percent of Kenyan young people between 15 and 29 are neither students nor employed? (pdf, p74)

On Twitter, Jackie LifestyleDiva asks if this is a genuine debate:

Who is actually talking about #PayInterns? According to a Global Voices article by Ndesanjo Macha, the online debate began with a tweet by tech blogger Robert Alai, asking tweeps to ask companies if they pay interns. The story escalated into a general conversation about internship pay, with some companies even tweeting details of just how much they pay their interns.

This sentence brought to you by DataForager:

Other people quoted in the Global Voices article include entrepreneurs, IT staff, a corporate social responsibility consultant, journalists, and even the Kenya Police department.

What is a typical internship?

Here are some internships I found on Kenyan job listing sites like JobsEastAfrica247, Dealfish, TipTopJobs, the InternKenya blog, and the University of Nairobi’s jobs and internships web page.

The Kenyan Association of Manufacturers recently posted 3-6 month unpaid internships in policy research and advocacy for advanced students and recent graduates. They do not offer the possibility of employment after the internship. This looks like a classic unpaid internship, occurring over an extended period with no offer of tangible outcomes.

The multinational investment bank Renaissance Group offers a much better internship (though apparently unpaid). This two-month programme offers training in specific areas and promises an opportunity to join the Renaissance Academy, the bank’s training scheme for incoming employees.

ACTED, an NGO, has offered an internship with a $300 USD/month living allowance, in addition to reimbursement for accommodation, food, travel, and life insurance. The reporting internship lasts six months, but unlike the investment bank opportunity, offers no promises of future work. At $300/month plus insurance and expenses, the internship offers nearly as much value as the median salary of administrative assistants in Kenya, according to But the internship is advertised globally; my guess is that Kenyan students have only a small chance at this one.

This IT Internship opportunity from Toolkit Solutions is for computer science students with their own laptops and 3G modem sticks. What do they get? Programming experience as well as unlimited monthly Internet paid for by the company.

Advice for Prospective Interns

Several of Kenya’s universities offer support for interns. The Strathmore University career development office organises a global internship fair. They also coordinate “Industrial Attachments” internships for 3rd year undergraduates which grant academic credit. The University of Kenya offers similar “placement services.”

So, should we #PayInterns?

I think this is the wrong question. As we can see from the diversity of work experience opportunities (as well as discussion on #PayInterns), it’s more important for employers and prospective interns to choose arrangements that provide everyone with the value they need. The best internships, whether they offer reimbursement or pay, offer training, work experience, networks, and a job opportunity at the end. I have seen plenty of well-paid internships where employers don’t offer any training and interns leave with no experience beyond the coffee machine and the file cabinet.

Overall, I think Kenyan employers are doing a bad job of signaling what students get out of internships. Too many internship listings include a page or more about the kind of applicant they expect, with no mention of what they plan to offer their interns, paid or not. Companies with no plan for delivering learning and mentorship to interns have a problem much deeper than pay.

A company may choose not to pay interns, but it does have a responsibility to make sure interns get something valuable out of the experience.