Tracing the links of the Germanwings disaster

A week ago a German jet crashed into the Alps, killing all 150 people on board. For the first several hours after the tragedy it was considered an accident, but it is now apparent that the plane’s co-pilot, Andreas Lubitz, was responsible, and details about his past continue to emerge. As more facts surface, news outlets covering the tragedy have released them in incremental updates. These updates have touched on a wide variety of questions: Why was no one aware of, or worried about, his mental health issues? Should he have been flying a plane in the first place? Have pilots deliberately crashed planes before? How have small German towns, such as the home town of the 16 high school students on board or the co-pilot’s hometown, reacted to the horrific event?

When publishing these updates, publishers often link back to previous stories as a proxy for background information. The “original” story breaking the news tends to have few hyperlinks (such as the first link above, which only links to a Germany topic page), while later updates start linking back to archival stories for context. I was curious whether these internal, archival hyperlinks could be followed to automatically create a community of stories, one that touches on a variety of aspects of the incident. Links are rarely added to stories retroactively, so in general, following the links means traveling back in time. Could a crawler organize all the links for me and present historical content (whether from the past 3 days or the past 10 years) about the Germanwings disaster?

I built a crawler that follows the inline, internal links in an article and builds a graph spidering out from the source, storing metadata such as each link’s location and anchor text along the way. It ignores navigational links and considers only links inside the article text, and it stays on the publisher’s own site: starting from a Times article, it follows links to other Times pages, but not to YouTube or Wikipedia. This quickly builds up a network of stories within a publisher’s archive around a single story; from there, it is easy to experiment with simple ranking schemes like the most-cited, the oldest, or the longest article.
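To make the crawl loop concrete, here is a minimal sketch in Python; it is not the crawler used for this post. It assumes the story body is wrapped in an `<article>` element (real publisher markup varies), and it takes a hypothetical `fetch` function in place of a real HTTP client. The names `Link`, `ArticleLinkParser`, `crawl`, the `news.example` URLs, and the toy `archive` are all invented for illustration. It records each link's anchor text and paragraph position, keeps only same-site links, and stops expanding at a configurable depth.

```python
from collections import deque
from dataclasses import dataclass
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


@dataclass
class Link:
    source: str       # page the link was found on
    target: str       # absolute URL the link points to
    anchor_text: str  # visible text of the <a> element
    paragraph: int    # index of the <p> in the article body (link location)


class ArticleLinkParser(HTMLParser):
    """Collect <a href> links inside <article>, skipping nav/header/footer links.

    Hypothetical assumption: the story body is wrapped in an <article> element.
    """

    def __init__(self):
        super().__init__()
        self.article_depth = 0
        self.paragraph = -1
        self.href = None
        self.text = []
        self.links = []  # (href, anchor_text, paragraph) tuples

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.article_depth += 1
        elif self.article_depth and tag == "p":
            self.paragraph += 1
        elif self.article_depth and tag == "a":
            self.href = dict(attrs).get("href")
            self.text = []

    def handle_endtag(self, tag):
        if tag == "article" and self.article_depth:
            self.article_depth -= 1
        elif tag == "a" and self.href is not None:
            self.links.append((self.href, "".join(self.text).strip(), self.paragraph))
            self.href = None

    def handle_data(self, data):
        if self.href is not None:
            self.text.append(data)


def crawl(start_url, fetch, max_depth=3):
    """Breadth-first spider over same-site links found in article text.

    `fetch` (hypothetical) maps a URL to its HTML, or None if unavailable.
    Pages more than `max_depth` hops from `start_url` are never fetched.
    """
    site = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    edges = []
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = ArticleLinkParser()
        parser.feed(html)
        for href, text, paragraph in parser.links:
            if not href:
                continue
            target, _ = urldefrag(urljoin(url, href))
            if urlparse(target).netloc != site:
                continue  # external link (e.g. YouTube, Wikipedia): not recorded
            edges.append(Link(url, target, text, paragraph))
            if target not in seen and depth < max_depth:
                seen.add(target)
                queue.append((target, depth + 1))
    return edges


# Toy three-page "archive" with hypothetical URLs, so the sketch runs offline.
archive = {
    "https://news.example/update": (
        '<nav><a href="/">Home</a></nav>'
        '<article><p>See <a href="/crash">our first report</a> and '
        '<a href="https://en.wikipedia.org/wiki/Airbus_A320">Wikipedia</a>.</p></article>'
    ),
    "https://news.example/crash": (
        '<article><p>Background.</p>'
        '<p>Compare <a href="/egyptair-1999">the 1999 crash</a>.</p></article>'
    ),
    "https://news.example/egyptair-1999": "<article><p>No links here.</p></article>",
}

edges = crawl("https://news.example/update", archive.get, max_depth=3)
for e in edges:
    print(e.source, "->", e.target, repr(e.anchor_text), "paragraph", e.paragraph)
```

The navigation link and the Wikipedia link are both dropped, so only the two in-article, same-site links become edges. Breadth-first order with a depth cap mirrors the “3 levels out” limit described for the Post crawl below; a real crawler would also need politeness delays and error handling.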

I chose three incremental update articles from March 30, one each from the Times, the Post, and the Guardian, all reporting that Lubitz had been treated for suicidal tendencies.

For each of these three, I spidered out as far as the links would go (in the case of the Times, the crawl seemed to grow without end, so I had to cut it off at some point).

New York Times

My first strategy was simply to look at the links the article already contained. Although the system can track links pointing in as well as out, this article had only outlinks, presumably because (a) it was very recent at the time of the query, and (b) a given spider cannot guarantee it has found every related story that links to it.

Clicking on a card reveals that card’s own links in turn, both inlinks and outlinks.
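Inlinks fall out of the same data: inverting the outlink edge list gives, for each page, the set of pages that link to it, which is also what a simple “most-cited” ranking needs. The sketch below is an assumption-laden illustration rather than the post’s actual ranking code; the page names are hypothetical stand-ins for article URLs, and “most cited” is defined here as the number of distinct linking pages.

```python
from collections import Counter, defaultdict


def build_inlinks(edges):
    """Invert (source, target) outlink pairs into target -> set of linking pages."""
    inlinks = defaultdict(set)
    for source, target in edges:
        inlinks[target].add(source)
    return inlinks


def most_cited(edges, n=3):
    """Rank pages by how many distinct pages link to them (their in-degree)."""
    counts = Counter({page: len(srcs) for page, srcs in build_inlinks(edges).items()})
    return counts.most_common(n)


# Hypothetical edge list: the newest update links out, but nothing links to it yet.
edges = [
    ("update-3", "crash-report"),
    ("update-3", "crash-report"),  # a repeated link from the same page counts once
    ("update-2", "crash-report"),
    ("live-blog", "crash-report"),
    ("update-3", "egyptair-1999"),
    ("update-2", "egyptair-1999"),
    ("update-3", "update-2"),
]

print(most_cited(edges, n=2))                      # [('crash-report', 3), ('egyptair-1999', 2)]
print(build_inlinks(edges).get("update-3", set()))  # set()
```

The empty inlink set for `update-3` is the situation described above: a fresh article has outlinks but, within the crawled graph, nothing pointing back at it yet.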

The “germanwings-crash.html” article had several links that formed a clear community, including archival stories about plane crashes from 1999 and 2005. The 1999 story was about an EgyptAir crash that has also been deemed a pilot suicide. This suggests that old related articles could surface from following hyperlinks, even if they were not initially tagged or indexed as being related. The 2005 crash is linked in the context of early speculation about the cause of the crash (cabin depressurization was initially considered). It is a weaker signal, but it could still be useful in the right context.

This community of links is generally relevant, but it does veer into other territories sometimes. The Times’ large topic pages about France, Spain, and Germany all led the crawler towards stories about the Eurozone economy and the Charlie Hebdo shooting.

Washington Post

The Post article collected a small community of just 32 links. When I limited the spidering to 3 levels out, it yielded 12 Germanwings stories covering various aspects of the incident, as well as two older ones, one of which is titled “Ten major international airlines disasters in the past 50 years.”

Click on the image to see the graph in Fusion Tables.

The Washington Post articles reached furthest into the past, with tangential but still related events like the missing Malaysia Airlines flight and the debate over in-flight cell phone rules.

The Guardian

The Guardian crawl pulled in 59 links, including the widest variety of topic and entity pages, though it also picked up authors’ profile pages (e.g. http://www.theguardian.com/profile/melissa-davey). Of these links, 32 turned out to be relevant Germanwings articles, far more than I expected; I wouldn’t have guessed the Guardian had published so many stories about the crash so quickly. They ranged from the forthcoming Lufthansa lawsuit to the safety of the Airbus.

Click on the image to see the graph in Fusion Tables.

The Guardian seems to have amassed the biggest network, and tellingly, it already has a dedicated topic page to show for it, even if that page is just a simple timeline. Its graph appears more clustered than the Post’s, which was more sequential. But it doesn’t reach as far back into the past, and at one point the crawler wandered off on a classical music tangent (the culprit was a story about an opera performance that honored the Germanwings victims).

Conclusion

In the end, the crawler worked well on a limited scope, but I found two problems for link-oriented recommendation and context provision:

  1. The links were often relevant, but it wasn’t clear why. More detail about the context of each link is crucial. This could be provided by previewing the paragraph where the link occurs, so a reader could see why a story was cited before diving into it. In short, a simple list of links would convey less than a complete graph or more advanced views.
  2. The topic pages were important hubs, but also noisy and impermanent. Most NYT topic pages feature the most recent stories tagged with that topic; this works better for a page like “Airbus SAS” than for one like “France.” A recommendation algorithm therefore needs to treat topic pages with more nuance. And if topic pages are considered “explainer” pages in their own right, one wonders how they could be improved or customized for a given event or incident.

Another wrinkle: when I returned to the NYT article the next day, after making a few improvements to the crawler, I found that the Times had removed a crucial link from the article, one that connected it to the rest of the nodes. So my data is already outdated! This shows how fragile a link-oriented recommendation scheme is as things stand.
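One way to at least notice this kind of change, offered as a sketch rather than something the crawler here actually did, is to store each crawl’s edges and compare successive snapshots with set difference. The function name `diff_links` and the page names are hypothetical.

```python
def diff_links(old_edges, new_edges):
    """Compare two crawls of the same pages and report which links changed."""
    old, new = set(old_edges), set(new_edges)
    return {"removed": old - new, "added": new - old}


# Hypothetical snapshots of one article's outlinks, taken a day apart.
day_one = [("article", "archive-story"), ("article", "topic-page")]
day_two = [("article", "topic-page")]

changes = diff_links(day_one, day_two)
print(changes["removed"])  # {('article', 'archive-story')}
```

A removed edge that was the article’s only path into an existing cluster would show up here, flagging that part of the graph for re-crawling.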

Demystifying the Internet in Cuba

A group of early adopters at CENIAI, Havana, 1996. Photo courtesy of Larry Press.

When it comes to the Internet, Cuba is routinely compared to countries like China, Iran, and Vietnam, where broad-reaching Internet censorship regimes exist. The Cuban government does control Internet use to a great degree. But unlike in these and many other countries, there is no evidence that the Cuban government conducts systematic censorship of online content.

Similarly, there is no reliable data on how many people in Cuba actually use the Internet: regularly cited statistics range from 2.9% to 25%. And one could spend years reading western media coverage of Cuba’s Internet and its embattled blogging community (as both of these authors have) and never figure out precisely how the Internet works there, how many people use it, and what kinds of restrictions they face in doing so. Like many other aspects of public life and experience on the island, Cuba’s digital culture is poorly understood by outsiders…

Read the full explainer by Elaine and me on Medium.

Explainer: Nigerian elections

Since Saturday’s presidential election in Nigeria, the world has been watching. First, Nigerians and observers feared that cycles of electoral violence and rigged results might repeat themselves. Second, a win for presidential challenger Muhammadu Buhari, which looked likely early on Tuesday, would make President Goodluck Jonathan the first incumbent in the country’s history to lose a bid for re-election. Before the results and Buhari’s historic victory were confirmed later today, I created an infographic to give a brief background explainer on the Nigerian elections. If I had had more time, I would have liked to include more information on the social media and tech innovations used during this election.

NBA Coaching Great Phil Jackson’s Triumphant Return to New York by Tammy Drummond

I used Timeline JS to chart Phil Jackson’s career as one of the most legendary coaches in NBA history.

<iframe src='http://cdn.knightlab.com/libs/timeline/latest/embed/index.html?source=0Ar5D8ga_6p3LdDl4NU12MXhQN3JObkRSQXVJOUVqaWc&font=Bevan-PotanoSans&maptype=toner&lang=en&height=650' width='100%' height='650' frameborder='0'></iframe>

This is an explainer for a Yahoo News story about Tuesday’s press conference, at which Knicks officials introduced Phil Jackson as the organization’s new president.

FOLD prototype

Kevin and I are working on a project called FOLD, which borrows the accordion metaphor for understanding the news that Ethan described last class, and tries to anticipate the reader’s contextual needs.

FOLD allows you to expand and contract elements of a story (to get more or less detail), and associates a context bar with each section of the story. A context bar can include many elements, including historical background, maps, photographs, citizen media, videos, or technical descriptions.

From observing many people consume news, we have noticed that readers spend significant time acquiring contextual information in additional browser tabs, which takes their attention away from the story at hand. FOLD offers journalists a way to provide readers with a curated “tangent.”

We decided to use the FOLD prototype to create an explainer of the current situation in Ukraine and Crimea. We chose this story because historical context is very important for understanding the political, economic, and social dynamics at play in the region.

The FOLD prototype is live at fold.meteor.com (works best in Chrome for now).

Contextualizing the Crimean invasion

Ukraine reports Russian ‘invasion’ on eve of Crimea vote

Ukraine accused Russia on Saturday of invading a region bordering Crimea and vowed to use “all necessary measures” to repel an attack that came on the eve of the Black Sea peninsula’s breakaway vote.

The invasion reported by the Ukrainian foreign ministry was small in scale and concerned a region that lies just off the northeast coast of Crimea called the Arabat Spit.

The dramatic escalation of the most serious East-West crisis since the Cold War set a tense stage for Sunday’s referendum on Crimea’s secession from Ukraine in favour of Kremlin rule — a vote denounced by both the international community and Kiev.

The predominantly Russian-speaking region of two million people was overrun by Kremlin-backed troops days after the February 22 fall in Kiev of a Moscow-backed regime and the rise of nationalist leaders who favour closer ties with the West.

President Vladimir Putin has defended Moscow’s decision to flex its military muscle arguing that ethnic Russians in Ukraine needed “protection” from violent ultranationalists — even though Russian Foreign Minister Sergei Lavrov told US Secretary of State John Kerry on Friday that Moscow had no plans “to invade the southeast region of Ukraine.”

But the Ukrainian foreign ministry said 80 Russian military personnel had seized a village on the Arabat Spit called Strilkove with the support of four military helicopters and three armoured personnel carriers.

The ministry in a statement demanded that “the Russian side immediately withdraw its military forces from the territory of Ukraine.”

“Ukraine reserves the right to use all necessary measures to stop the military invasion by Russia.”

Footage released…