I wrote an explainer about the Cuban Thaw, which refers to the recent normalization of relations between the US and Cuba. All the gifs in the article were made with a tool I’m working on called Glyph, which is like an Instagram for making detailed gifs from YouTube videos.
Here’s the story: https://readfold.com/read/sannabh/what-you-need-to-know-about-the-cuban-thaw-CST8k5cg
“More than seven decades after the war, 100,000 bodies waiting to be found” (headline in the Spanish digital outlet 20 minutos).
Every other day there is news of another mass grave found in Spain. These findings coincide with a resurgence of the divisions that led to the Spanish Civil War (1936-1939) and the decades of dictatorship under General Franco that followed. As families exhume the bodies of those killed during and after the conflict, and the grandchildren of the victims push for justice, Spaniards have started asking themselves whether they made the right decision when they chose to impose forgiveness of the past instead of confronting it. Explaining what has gone wrong in Spain could help other countries in their transitions and in dealing with the aftermath of civil conflicts.
Building on my work for the interview assignment, where I explored embedding citations in a page, this is the flip side: how can you make it easy for other people to cite you as a source, e.g. when writing an explainer?
Of particular interest was following the chain of citation: maybe it’s okay to cite a Wikipedia article if you can see where that Wikipedia article gets its own citations from. In the webpage, try clicking “cite” with text selected in a paragraph that has a citation and in one that doesn’t.
(The actual post was written on Fold, with some more media and explanations.)
Disclaimer: This article is about Cloud computing. If you’re looking for information regarding visible masses of liquid droplets, you should probably head over to Wikipedia.
Last year I was riding in a car with a friend when we saw a large roadside advertisement that said “Microsoft Azure: The cloud for modern business”. The friend, a high school teacher, turned to me and asked: “What exactly is this cloud everyone is talking about and why should I give a damn?”. In this article I will try to answer both parts of that question.
Cloud computing is commonly defined as “the practice of using a network of remote servers hosted on the Internet to store, manage, and process data, rather than a local server or a personal computer.” Some common examples, from an end user’s perspective, are Google Docs, which replaces your local word processor, and Dropbox, which replaces your physical hard-disk backup.
However, according to the above definition, it seems that every time you access a webpage which is hosted on a remote computer you’re using the Cloud. To an extent that’s true, but to truly understand the Cloud we need to zoom out of our personal perspective as a user and think like a business.
In order to think like a business, let’s imagine we want to build a startup called PuppyGram – a social network for sharing images of… Puppies! We will go through the evolution of how we would build such a service, starting at the early days of the internet and ending today.
PuppyGram will have two major software components: a database application to store all the puppy images, and a web server application that serves those images in webpages and allows users to register, upload images, and comment on them. Now all we need to do is get a computer, install those two pieces of software, connect it to the internet, and then forget about it in the basement. Voila, PuppyGram is online!
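To make those two pieces a little more concrete, here is a minimal sketch of what they might look like in code. It assumes Python, with Flask standing in for the web server and SQLite for the database; the route and table names are made up purely for illustration.

```python
# A toy version of PuppyGram's two components: a SQLite database to
# store puppy images and a Flask web server that accepts uploads.
# (Flask, SQLite, and the names below are illustrative choices.)
import sqlite3
from flask import Flask, request

app = Flask(__name__)
DB = "puppygram.db"

def init_db():
    # The database component: one table holding images and captions.
    with sqlite3.connect(DB) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS puppies "
            "(id INTEGER PRIMARY KEY, image BLOB, caption TEXT)"
        )

@app.route("/upload", methods=["POST"])
def upload():
    # The web server component: take an uploaded image and store it.
    with sqlite3.connect(DB) as conn:
        conn.execute(
            "INSERT INTO puppies (image, caption) VALUES (?, ?)",
            (request.files["image"].read(), request.form.get("caption", "")),
        )
    return "Puppy received!", 201

if __name__ == "__main__":
    init_db()
    app.run(host="0.0.0.0", port=8000)  # the computer forgotten in the basement
```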
This approach works, and for a while running your own hardware was very common. However, as PuppyGram grows exponentially (because everyone loves puppies), we will quickly need more network bandwidth, more storage, and more computers. These computers will require maintenance and physical cooling, and in case of a power outage all hell will break loose. All we wanted to do was build a puppy network, and now we have a huge electricity bill, we are running out of physical space for our computers, and we are hiring infrastructure people to handle all of it. Something doesn’t feel right here.
What can we do? In the spirit of capitalism, we outsource. The next evolutionary step is to pay a hosting company to physically host our computers and provide the electricity, network, and maintenance: basically, we rent space for our hardware and pay a management fee. Better still, we can also rent the actual computers and avoid the upfront cost of the hardware. This allows us, the developers of PuppyGram, to focus on our actual product instead of on infrastructure. That’s not entirely true, though: if we ever want to replace or upgrade our hardware, we still need someone at the hosting company to physically go to the machine and do the job.
All of that changed in the summer of 2006. The change came from a bookstore called Amazon. Amazon, much like PuppyGram, was concerned that developers were spending too much of their time maintaining hardware rather than focusing on building the best software they could. Their solution: use virtualization, a technology that allows virtual computers to run on physical ones, and build a management layer that automatically creates and destroys these virtual computers. The product was called EC2 (short for Elastic Compute Cloud). With EC2 you can go to a webpage, specify exactly what type of computer you need (which processor, how much memory, and how much network bandwidth), and create it on the fly, within minutes. You can also upgrade, destroy, and clone any existing computer, paying only for the resources you’ve used. Amazon released EC2 to the public so that anyone could use their infrastructure, and the response was overwhelming. From Dropbox to Airbnb and Instagram, almost every major tech company of the past decade is or was hosted on Amazon EC2.
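To get a sense of how “on the fly” this is, here is roughly what creating and destroying an EC2 machine looks like in code today, using Amazon’s boto3 Python library. This is only a sketch: the machine image id is a placeholder, and in practice you would also configure networking and security.

```python
# Creating and destroying a virtual computer on EC2 with boto3.
# The ImageId below is a placeholder for a real machine image.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Specify exactly what type of computer you need and create it.
response = ec2.run_instances(
    ImageId="ami-xxxxxxxx",   # placeholder: the operating system image to boot
    InstanceType="t2.micro",  # processor and memory profile
    MinCount=1,
    MaxCount=1,
)
instance_id = response["Instances"][0]["InstanceId"]
print("Created virtual machine:", instance_id)

# When it's no longer needed, destroy it and stop paying for it.
ec2.terminate_instances(InstanceIds=[instance_id])
```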
EC2 and similar systems, whether it’s Google Cloud Platform or Microsoft Azure, are commonly referred to as Cloud platforms because they obscure the underlying infrastructure. When you create a virtual machine with one of these services, you have no idea where the physical machine that runs it is geographically located. More than that, it can actually swap locations if there’s a problem with its host. The magical thing is: you just don’t care.
So why should you give a damn? The fact that modern web applications like PuppyGram can be built in months, weeks, or even days is a direct result of the existence of these Cloud platforms. These services allow small groups of entrepreneurs in coffee shops and garages across the world to build their products on the same infrastructure as the industry’s giants.
A week ago a Germanwings jet crashed into the Alps, killing all 150 people on board. For the first several hours after the tragedy it was considered an accident, but it is now apparent that the plane’s co-pilot, Andreas Lubitz, is responsible, and details continue to emerge about his past. As more facts surface, news outlets covering the tragedy have released them in incremental updates. These updates have touched on a wide variety of questions: Why was no one aware of or worried about his mental health issues? Should he have been flying a plane in the first place? Have suicide plane crashes happened before? How has small-town Germany — such as the town of the 16 high school students on board, or the co-pilot’s hometown — reacted to the horrific event?
When publishing these updates, publishers are often linking back to previous stories as a proxy for background information. The “original” story breaking the incident tends to be low on hyperlinks (such as the first link above, which only links to a Germany topic page) while later updates start to link back to archival stories for context. I was curious whether these internal, archival hyperlinks could be followed in order to automatically create a community of stories, one that touches on a variety of aspects of the incident. Links are rarely added to stories retroactively, so in general, following the links means traveling back in time. Could a crawler organize all the links for me, and present historical content (whether over the past 3 days or 10 years) for the Germanwings disaster?
I built a crawler that follows the inline, internal links in an article, and subsequently builds a graph spidering out from the source, storing metadata like link location and anchor text along the way. It doesn’t include navigational links, only links inside the article text; and it won’t follow links to YouTube or Wikipedia, just, for instance, the Times. This quickly builds up a dialogue of stories within a publisher’s archive, around one story; from here, it is easy to experiment with simple ranking algorithms like the most-cited, the oldest, or the longest article.
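The crawler itself is conceptually simple. The sketch below is not my exact code, but it captures the idea under a few assumptions: Python with requests and BeautifulSoup, a breadth-first walk limited to the publisher’s own domain, a depth cap so the graph can’t grow forever, and a guessed CSS selector ("article p a") standing in for each publisher’s real article-body markup.

```python
# A simplified sketch of the archive crawler: breadth-first spidering of
# inline, same-domain links, keeping anchor text along the way.
from collections import deque
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

def crawl(seed_url, max_depth=3):
    domain = urlparse(seed_url).netloc
    graph = {}                         # url -> list of (target_url, anchor_text)
    queue = deque([(seed_url, 0)])
    seen = {seed_url}

    while queue:
        url, depth = queue.popleft()
        if depth > max_depth:
            continue
        soup = BeautifulSoup(requests.get(url).text, "html.parser")
        links = []
        # Only links inside the article body, not navigation.
        # "article p a" is a guess; real selectors vary by publisher.
        for a in soup.select("article p a[href]"):
            target = urljoin(url, a["href"])
            if urlparse(target).netloc != domain:
                continue               # skip YouTube, Wikipedia, etc.
            links.append((target, a.get_text(strip=True)))
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
        graph[url] = links
    return graph

def most_cited(graph):
    # One of the simple ranking experiments: count inlinks within the community.
    counts = {}
    for links in graph.values():
        for target, _anchor in links:
            counts[target] = counts.get(target, 0) + 1
    return sorted(counts, key=counts.get, reverse=True)
```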
I chose three incremental update articles from March 30, one each from the Times, the Post, and the Guardian, all reporting that Lubitz was treated for suicidal tendencies:
For each of these three, I spidered out as far as the crawler could go (though in the case of the Times the graph turned out to be effectively infinite, so I had to stop it somewhere).
New York Times
My first strategy was to simply look at the links that the article already contained. While the system can track links pointing in as well as out, this article only had outlinks; presumably this is because (a) it was a very recent article at the time of the query, and (b) we cannot be sure that we have all of the related stories from the given spider.
Clicking on a card will reveal that card’s links in turn: both inlinks and outlinks.
The “germanwings-crash.html” article had several links that formed a clear community, including archival stories about plane crashes from 1999 and 2005. The 1999 story was about an EgyptAir crash that was also deemed a pilot suicide. This suggests that old related articles can surface from following hyperlinks, even if they were never tagged or indexed as related. The 2005 crash is linked in the context of early speculation about the cause of the crash (cabin depressurization was initially considered). That makes it a less useful signal, but one that could still be valuable in the right context.
This community of links is generally relevant, but it does veer into other territories sometimes. The Times’ large topic pages about France, Spain, and Germany all led the crawler towards stories about the Eurozone economy and the Charlie Hebdo shooting.
Washington Post
The Wapo article collected a small community of just 32 links. When I limited the spidering to just 3 levels out, it yielded 12 Germanwings stories covering various aspects of the incident, as well as two older ones, one of which is titled “Ten major international airlines disasters in the past 50 years.”
Click on the image to see the graph in Fusion Tables.
The Washington Post articles dipped the farthest back in the past, with tangential but still related events like the missing Malaysia Airlines flight and the debate over airline cell phone regulations.
The Guardian
The Guardian crawler pulled 59 links, including the widest variety of topic and entity pages. It also picked up article author homepages, though (e.g. http://www.theguardian.com/profile/melissa-davey). 32 of these links ended up being relevant Germanwings articles, which is far more than I expected to see; I wouldn’t have guessed the Guardian had published so many stories about it so quickly. These ranged from the forthcoming Lufthansa lawsuit to the safety of the Airbus.
Click on the image to see the graph in Fusion Tables.
The Guardian seems to have amassed the biggest network, and tellingly, it already has a dedicated topic page to show for it, even if it’s just a simple timeline format. The graph appears more clustered than Wapo’s, which was more sequential. But it doesn’t dip as far back into the past, and at one point the crawler did find itself off-topic on a classical music tangent (the culprit was a story about an opera performance that honored the Germanwings victims).
Conclusion
In the end, the crawler worked well on a limited scope, but I found two problems for link-oriented recommendation and context provision:
The links were often relevant, but it wasn’t always clear why. More context around each link is crucial. This could be served by previewing the paragraph on the page where the link occurs, so a reader could decide whether to dive into the story itself. In short, a simple list wouldn’t be as informative as a complete graph or more advanced views.
The topic pages were important hubs, but also noisy and impermanent. Most NYT topic pages feature the most recent stories that have been tagged with that topic; this works better for a page like “Airbus SAS” than it does for “France.” An algorithm like this one therefore needs to treat topic pages with more nuance. Considering topic pages as “explainer” pages in their own right, one wonders how they could be improved or customized for a given event or incident.
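One rough way to build that nuance in, sticking with the crawler sketch above: recognize topic pages by URL pattern and down-weight the broadest ones when ranking. The patterns and the list of “broad” topics here are illustrative guesses, not the publishers’ real conventions.

```python
# Down-weighting topic-page links when ranking a story community.
# The URL patterns and the broad-topic list are guesses for illustration.
import re

TOPIC_PATTERNS = [
    re.compile(r"nytimes\.com/topic/"),
    re.compile(r"theguardian\.com/world/[a-z-]+$"),
]
BROAD_TOPICS = {"france", "spain", "germany"}  # country-sized hubs

def link_weight(url):
    if not any(p.search(url) for p in TOPIC_PATTERNS):
        return 1.0   # an ordinary article link counts in full
    slug = url.rstrip("/").rsplit("/", 1)[-1].lower()
    if slug in BROAD_TOPICS:
        return 0.1   # "France" drags in the Eurozone and Charlie Hebdo
    return 0.5       # narrower hubs like "Airbus SAS" are still useful
```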
Another wrinkle: I returned to the NYT article the next day after a few improvements to the crawler, and found that they had removed a crucial link from the article, one that connected it to the rest of the nodes. So already my data is outdated! This shows the fragility of a link-oriented recommendation scheme as it stands now.
The way I did it was by creating a timeline of the latest events in Mexico that could be involved in the story. One of the main beliefs is that it was the result of some of the investigative journalism she has been doing, as well as her criticism of the Federal Administration, but there isn’t any evidence of that.
Since Saturday’s presidential election in Nigeria, the world has been watching. Firstly, Nigerians and observers feared that cycles of electoral violence and rigged results might repeat themselves. Secondly, a win for presidential challenger Muhammadu Buhari – which looked likely early on Tuesday – would make President Goodluck Jonathan the first incumbent not to win re-election in the country’s history. Before the results and Buhari’s historic victory were confirmed later today, I created an infographic to give a brief background explainer on the Nigerian elections. If I had had more time, I would have liked to include more info on the social media and tech innovations used during this election.