So what exactly is this “Cloud” everyone is talking about?

(The actual post was written on fold with some more media and explanations)

Disclaimer: This article is about Cloud computing. If you’re looking for information regarding visible masses of liquid droplets, you should probably head over to Wikipedia.

Last year I was riding in a car with a friend when we saw a large roadside advertisement that said “Microsoft Azure: The cloud for modern business”. The friend, a high school teacher, turned to me and asked: “What exactly is this cloud everyone is talking about and why should I give a damn?”. In this article I will try to answer both parts of that question.

Cloud computing is commonly defined as “the practice of using a network of remote servers hosted on the Internet to store, manage, and process data, rather than a local server or a personal computer.” Some common examples, from an end user’s perspective, are Google Docs, which replaces your local word processor, and Dropbox, which replaces your physical hard-disk backup.

However, according to the above definition, it seems that every time you access a webpage which is hosted on a remote computer you’re using the Cloud. To an extent that’s true, but to truly understand the Cloud we need to zoom out of our personal perspective as a user and think like a business.

In order to think like a business, let’s imagine we want to build a startup called PuppyGram – a social network for sharing images of… Puppies! We will go through the evolution of how we would build such a service, starting at the early days of the internet and ending today.

PuppyGram will have two major software components: a database application that stores all the puppy images, and a web server application that serves these images in webpages and allows users to register, upload images and comment on them. Now all we need to do is get a computer, install those two pieces of software, connect it to the internet and then forget about it in the basement. Voila, PuppyGram is online!

This approach works, and for a while running your own hardware was very common. However, as PuppyGram grows exponentially (because everyone loves puppies), we will quickly need more network bandwidth, more storage and more computers. These computers will require maintenance and physical cooling, and in case of a power outage all hell will break loose. All we wanted to do was build a puppy network, and now we have a huge electricity bill, we are running out of physical space for our computers, and we are hiring infrastructure people to handle all of that. Something doesn’t feel right here.

What can we do? In the spirit of Capitalism, we outsource. The next evolutionary step is to pay a hosting company that will physically host our computers and provide us with electricity, network and maintenance. Basically, we rent space for our hardware and pay a management fee. More than that, we can also rent the actual computers and avoid the initial cost of the hardware. This allows us, the developers of PuppyGram, to focus on our actual product instead of infrastructure. Well, almost: if we ever want to replace or upgrade our hardware, we still need to contact a person at the hosting company who will physically go to the machine and do the job.

All of that changed in the summer of 2006. The change came from a bookstore called Amazon. Amazon, much like PuppyGram, was concerned that developers were spending too much of their time maintaining hardware rather than focusing on building the best software they could. Their solution: use virtualization, a technology that allows for the creation of virtual computers that run on physical computers, and build a management layer that automatically creates and destroys virtual computers. This product was called EC2 (short for Elastic Compute Cloud). With EC2 you can go to a webpage, specify exactly what type of computer you need (which processor, how much memory and how much network bandwidth) and create it on the fly, within minutes. You can also upgrade, destroy and clone any existing computer, paying only for the resources you’ve used. Amazon released EC2 to the public so that anyone could use their infrastructure, and the response was overwhelming. From Dropbox to Airbnb and Instagram, many of the major tech companies of the past decade are or were hosted on Amazon EC2.
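To make the “management layer” idea concrete, here is a toy sketch in Python. It is not the EC2 API and involves no real virtualization: the class names, the pricing rule and the numbers are all made up for illustration. It only shows the shape of the deal, where machines are created and destroyed on demand and you are billed for the hours you actually used.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class VirtualMachine:
    vm_id: int
    cpus: int
    memory_gb: int
    cents_per_hour: int  # made-up price, not a real EC2 rate
    hours_used: int = 0


@dataclass
class ToyCloud:
    """A pretend management layer that creates, runs and destroys machines."""
    machines: dict = field(default_factory=dict)
    bill_cents: int = 0
    _next_id: count = field(default_factory=lambda: count(1), repr=False)

    def create(self, cpus, memory_gb):
        # Hypothetical pricing rule: bigger machines cost more per hour.
        rate = 2 * cpus + memory_gb
        vm = VirtualMachine(next(self._next_id), cpus, memory_gb, rate)
        self.machines[vm.vm_id] = vm
        return vm

    def run(self, vm_id, hours):
        # Record usage; nothing is charged until the machine is destroyed.
        self.machines[vm_id].hours_used += hours

    def destroy(self, vm_id):
        # Pay only for the hours this machine actually ran.
        vm = self.machines.pop(vm_id)
        self.bill_cents += vm.hours_used * vm.cents_per_hour
        return vm


cloud = ToyCloud()
web = cloud.create(cpus=2, memory_gb=4)  # "specify what you need"
cloud.run(web.vm_id, hours=10)           # PuppyGram serves puppies
cloud.destroy(web.vm_id)                 # traffic is gone, so is the machine
```

In this pretend world the machine costs 8 cents an hour, so ten hours of puppies adds 80 cents to the bill, and once the machine is destroyed nothing more is charged.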

EC2 and similar systems, whether it’s Google Compute Engine or Microsoft Azure, are commonly referred to as Cloud Platforms because they obscure the underlying infrastructure. When you create a virtual machine with one of those services, you have no idea where the actual physical machine that runs it is geographically located. More than that, it can actually swap locations if there’s a problem with its host. The magical thing is that you just don’t care.

So why should you give a damn? Modern web applications like PuppyGram can be built in months, weeks, or even days, and the main reason is the existence of these Cloud platforms. These services allow small groups of entrepreneurs in coffee shops and garages across the world to build their products on the same infrastructure as the industry’s giants.
Posted in All

Tracing the links of the Germanwings disaster

A week ago a German jet crashed into the Alps, killing all 150 people on board. For the first several hours after the tragedy it was considered an accident, but it is now apparent that the plane’s co-pilot, Andreas Lubitz, is responsible, and details continue to emerge about his past. As more facts surface, news outlets covering the tragedy have released them in incremental updates. These updates have touched on a wide variety of questions: Why was no one aware of or worried about his mental health issues? Should he have been flying a plane in the first place? Have suicide plane crashes happened before? How has small-town Germany (such as the town of the 16 high school students on board, or the pilot’s hometown) reacted to the horrific event?

When publishing these updates, publishers often link back to previous stories as a proxy for background information. The “original” story breaking the incident tends to be low on hyperlinks (such as the first link above, which only links to a Germany topic page) while later updates start to link back to archival stories for context. I was curious whether these internal, archival hyperlinks could be followed in order to automatically create a community of stories, one that touches on a variety of aspects of the incident. Links are rarely added to stories retroactively, so in general, following the links means traveling back in time. Could a crawler organize all the links for me, and present historical content (whether over the past 3 days or 10 years) for the Germanwings disaster?

I built a crawler that follows the inline, internal links in an article, and subsequently builds a graph spidering out from the source, storing metadata like link location and anchor text along the way. It doesn’t include navigational links, only links inside the article text; and it won’t follow links to YouTube or Wikipedia, just, for instance, the Times. This quickly builds up a dialogue of stories within a publisher’s archive, around one story; from here, it is easy to experiment with simple ranking algorithms like the most-cited, the oldest, or the longest article.
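As a rough illustration of that description (a sketch, not the actual crawler), the core loop could look something like this in Python. It assumes that article text lives inside an `<article>` element, which will not hold for every publisher, and it reads pages from a small in-memory dictionary with made-up URLs instead of the network. The `fetch` parameter, `PAGES` and the `news.example` host are all hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ArticleLinkParser(HTMLParser):
    """Collects (href, anchor text) pairs for links inside an <article> element."""

    def __init__(self):
        super().__init__()
        self.in_article = False
        self.current_href = None
        self.current_text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.in_article = True
        elif tag == "a" and self.in_article:
            self.current_href = dict(attrs).get("href")
            self.current_text = []

    def handle_data(self, data):
        if self.current_href is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "article":
            self.in_article = False
        elif tag == "a" and self.current_href is not None:
            anchor = "".join(self.current_text).strip()
            self.links.append((self.current_href, anchor))
            self.current_href = None


def crawl(start_url, fetch, max_depth=3):
    """Breadth-first crawl of same-host article links; returns a list of edges."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    edges = []
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # stop spidering so the crawl cannot run forever
        html = fetch(url)
        if html is None:
            continue
        parser = ArticleLinkParser()
        parser.feed(html)
        for position, (href, anchor) in enumerate(parser.links):
            target = urljoin(url, href)
            if urlparse(target).netloc != host:
                continue  # skip links that leave the publisher's site
            edges.append({"source": url, "target": target,
                          "anchor": anchor, "position": position})
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return edges


# Hypothetical pages standing in for a publisher's archive.
PAGES = {
    "https://news.example/crash": (
        '<nav><a href="/home">Home</a></nav>'
        '<article>See the <a href="/1999-crash">1999 crash</a> and '
        '<a href="https://video.example/clip">a clip</a>.</article>'
    ),
    "https://news.example/1999-crash": (
        '<article>Background on <a href="/crash">the new crash</a>.</article>'
    ),
}
edges = crawl("https://news.example/crash", PAGES.get)
```

Here the navigation link and the external video link are both ignored, and each remaining edge keeps its anchor text and its position among the article's links. The `max_depth` cutoff is one simple way to keep a crawl from spidering out indefinitely.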

I chose three incremental update articles from March 30, one each from the Times, the Post, and the Guardian, all reporting that Lubitz was treated for suicidal tendencies:

For each of these three, I spidered out as far as they could go (though in the case of the Times that turned infinite, so I had to stop it somewhere).

New York Times

My first strategy was to simply look at the links that the article already contained. While the system can track links pointing in as well as out, this article only had outlinks; presumably this is because a) it was a very recent article at the time of the query, and b) we cannot be sure that we have all of the related stories from the given spider.
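To illustrate the difference between inlinks and outlinks, here is a minimal Python sketch. The edge list and article names are hypothetical, and this is only one plausible way to organize the data: given pairs of (article containing the link, article linked to), it builds both indexes and ranks articles by how often they are cited.

```python
from collections import defaultdict


def index_links(edges):
    """Build outlink and inlink sets for every article in a list of edges."""
    outlinks = defaultdict(set)
    inlinks = defaultdict(set)
    for source, target in edges:
        outlinks[source].add(target)
        inlinks[target].add(source)
    return outlinks, inlinks


def most_cited(edges):
    """Rank articles by number of distinct inlinks, most-cited first."""
    _, inlinks = index_links(edges)
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(inlinks, key=lambda url: (-len(inlinks[url]), url))


# Hypothetical edges: (article containing the link, article linked to).
EDGES = [
    ("update-3", "update-2"),
    ("update-3", "original"),
    ("update-2", "original"),
    ("update-2", "1999-crash"),
]
outlinks, inlinks = index_links(EDGES)
ranking = most_cited(EDGES)
```

In this toy data the newest article, `update-3`, has outlinks but no inlinks, because nothing has been published yet that could link back to it. The `original` story collects the most citations, which is the “most-cited” signal in its simplest form.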

Clicking on a card will reveal the card’s links in turn–both inlinks and outlinks.

The “germanwings-crash.html” article had several links that formed a clear community, including archival stories about plane crashes from 1999 and 2005. The 1999 story was about an EgyptAir crash that has also been deemed a pilot suicide. This suggests that old related articles could surface from following hyperlinks, even if they were not initially tagged or indexed as being related. The 2005 crash is linked in the context of early speculation about the cause of the crash (cabin depressurization was initially considered). It is a less useful signal, but it could be useful in the right context.

This community of links is generally relevant, but it does veer into other territories sometimes. The Times’ large topic pages about France, Spain, and Germany all led the crawler towards stories about the Eurozone economy and the Charlie Hebdo shooting.

Washington Post

The Wapo article collected just 32 links, forming a small community. When I limited the spidering to just 3 levels out, it yielded 12 Germanwings stories covering various aspects of the incident, as well as two older ones, one of which is titled “Ten major international airlines disasters in the past 50 years.”

Click on the image to see the graph in Fusion Tables.

The Washington Post articles dipped the farthest back in the past, with tangential but still related events like the missing Malaysia Airlines flight and the debate over airline cell phone regulations.

The Guardian

The Guardian crawler pulled 59 links, including the widest variety of topic and entity pages. It also picked up article author homepages, though. 32 of these links ended up being relevant Germanwings articles, which is well more than I expected to see; I wouldn’t have guessed the Guardian had published so many stories about it so quickly. These ranged from the forthcoming Lufthansa lawsuit to the safety of the Airbus.

Click on the image to see the graph in Fusion Tables

The Guardian seems to have amassed the biggest network, and tellingly, they already have the dedicated topic page to show for it, even if it’s just a simple timeline format. The graph appears more clustered than Wapo’s, which was more sequential. But it doesn’t dip as far back in the past, and at one point, the crawler did find itself off-topic on a classical music tangent (the culprit was a story about an opera performance that honored the Germanwings victims).


In the end, the crawler worked well on a limited scope, but I found two problems for link-oriented recommendation and context provision:

  1. The links were often relevant, but it wasn’t clear why. More detail about the context of each link is crucial. This could be served by previewing the paragraph on the page where the link occurs, so a reader could dive into the story itself. In short, a simple list wouldn’t be as detailed as a complete graph or more advanced views.
  2. The topic pages were important hubs, but also noisy and impermanent. Most NYT topic pages feature the most recent stories that have been tagged as such; this works better for a page like “Airbus SAS” than it does for “France.” A link-following algorithm therefore needs to treat topic pages with more nuance. Considering topic pages as “explainer” pages in their own right, one wonders how they could be improved or customized for a given event or incident.
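The paragraph preview suggested in the first point could be sketched roughly as follows. This is an illustration under simplifying assumptions, not a feature of the crawler: it assumes that links sit inside plain `<p>` elements and that the link's `href` is known exactly, and the sample HTML and the `/egyptair` path are invented.

```python
from html.parser import HTMLParser


class LinkContextParser(HTMLParser):
    """Records the text of each <p> that contains a link to target_href."""

    def __init__(self, target_href):
        super().__init__()
        self.target_href = target_href
        self.in_paragraph = False
        self.has_target = False
        self.text = []
        self.contexts = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_paragraph = True
            self.has_target = False
            self.text = []
        elif tag == "a" and self.in_paragraph:
            if dict(attrs).get("href") == self.target_href:
                self.has_target = True

    def handle_data(self, data):
        if self.in_paragraph:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self.in_paragraph:
            if self.has_target:
                # Collapse whitespace so the preview reads as one clean line.
                self.contexts.append(" ".join("".join(self.text).split()))
            self.in_paragraph = False


def link_context(html, target_href):
    """Return the paragraphs in html that link to target_href."""
    parser = LinkContextParser(target_href)
    parser.feed(html)
    return parser.contexts


# Invented sample markup standing in for an article body.
HTML = (
    "<p>Investigators are still searching the site.</p>"
    '<p>A 1999 <a href="/egyptair">EgyptAir crash</a> was also deemed deliberate.</p>'
)
previews = link_context(HTML, "/egyptair")
```

Only the second paragraph is returned, since it is the one that contains the link. Showing that sentence next to the linked story would tell a reader why the two are connected.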

Another wrinkle: I returned to the NYT article the next day after a few improvements to the crawler, and found that they had removed a crucial link from the article, one that connected it to the rest of the nodes. So already my data is outdated! This shows the fragility of a link-oriented recommendation scheme as it stands now.

Demystifying the Internet in Cuba

A group of early adopters at CENIAI, Havana, 1996. Photo courtesy of Larry Press.


When it comes to the Internet, Cuba is routinely compared to countries like China, Iran, and Vietnam, where broad-reaching Internet censorship regimes exist. The degree to which Internet use is controlled by the Cuban government is great. But unlike these and many other countries, there is no evidence that the Cuban government conducts systematic censorship of online content.

Similarly, there is no reliable data on how many people in Cuba actually use the Internet; regularly cited statistics range from 2.9% to 25%. And one could spend years reading western media coverage of Cuba’s Internet and its embattled blogging community (as both of these authors have) and never figure out precisely how the Internet works there, how many people use it, and what kinds of restrictions they face in doing so. Like many other aspects of public life and experience on the island, Cuba’s digital culture is poorly understood by outsiders…

Read the whole explainer by me and Elaine on Medium.

Timeline for the firing of Aristegui

For this assignment I worked on the New York Times article about Carmen Aristegui’s dismissal, “In Mexico, Firing of Carmen Aristegui Highlights Rising Pressures on News Media”.

My approach was to create a timeline of the latest events in Mexico that could be involved in the story. One of the main beliefs is that the firing was the result of some of the investigative journalism that she has been doing, as well as her criticism of the Federal Administration, but there isn’t any evidence of that.

Here’s the timeline and the link to it:

Screenshot 2015-03-31 18.02.47


Explainer: Nigerian elections

Since Saturday’s presidential election in Nigeria, the world has been watching. Firstly, Nigerians and observers feared that cycles of electoral violence and rigged results might repeat themselves. Secondly, a win for presidential challenger Muhammadu Buhari, which looked likely early on Tuesday, would make President Goodluck Jonathan the first incumbent in the country’s history to lose a re-election bid. Before results and Buhari’s historic victory were confirmed later today, I created an infographic to give a brief background explainer about the Nigerian elections. If I had had more time, I would have liked to include more info on social media and tech innovations used during this election.




What’s going on in Yemen and why you should care

The assignment was done in collaboration with Vladimir using Thinglink and Google Maps

From Reuters: Saudi troops clashed with Yemeni Houthi fighters on Tuesday in the heaviest exchange of cross-border fire since the start of a Saudi-led air offensive last week, while Yemen’s foreign minister called for a rapid Arab intervention on the ground.

Saudi Arabia has been leading a coalition of Arab states since last Thursday in an air campaign against the Shi’ite Houthis, who emerged as the most powerful force in the Arabian Peninsula’s poorest country when they seized Yemen’s capital last year.

The Saudis say their aim is to restore President Abd-Rabbu Mansour Hadi, who left the country last week. The Houthis are allied with Saudi Arabia’s regional foe Iran, and backed by army units loyal to longtime ruler Ali Abdullah Saleh, who was pushed out three years ago after “Arab Spring” demonstrations.

Who’s fighting inside the country

Who’s joining in from the outside and why

Putin says he is against “external interference”

Russian president Vladimir Putin has prompted an angry response from Saudi Arabia after sending a letter to the Arab League commenting on Saudi-led air strikes against Yemen’s Houthi fighters. The letter was read out at a summit of Arab leaders in Egypt on Sunday.

“We support the Arabs’ aspirations for a prosperous future and for the resolution of all the problems the Arab world faces through peaceful means, without any external interference,” Putin’s letter read, going on to condemn extremist groups such as ISIS.

Yemen has been a battleground between East and West for decades

From Wikipedia: In less than a year, after the Egyptian–Syrian unification in 1958, Egypt’s pro-Soviet strategy had returned to power. Saud had once again joined their alliance, which declined the US-Saudi relationship to a fairly low point especially after he announced in 1961 that he changed his mind on renewing the U.S. base.[11] In 1962, however, Egypt attacked Saudi Arabia from bases in Yemen during the 1962 Yemeni revolution because of Saudi Arabia’s Anti-revolution propaganda, which made Saud seek the U.S. support. President John F. Kennedy immediately responded to Saud’s request by sending U.S. war planes in July 1963 to the war zone to stop the attack which was putting U.S. interests in risk.

British colonialism as possible root cause

From Foreign Policy Journal: Arab nationalism reached unprecedented heights as a result of Western interference in the Middle Eastern affairs, especially after the Suez Canal debacle in 1956. Rebels in Yemen supported by then Egyptian President Gamal Abdul Nasser began guerrilla attacks on British forces stationed in Aden to force their withdrawal from the region.

Civil War In Yemen In 1962

Egypt’s incursion into Yemen in the 1960s is remembered as the “Egyptian Vietnam”

From the Washington Post: In the 1960s, Egypt entered into a long, costly quagmire in Yemen. The Egyptian president at the time, Gamal Abdel Nasser, a secular autocrat and a champion of pan-Arabism, chose to intervene in Yemen in support of a republican coup led by military officers seeking to oust the country’s monarchy in 1962. Nasser himself came to power the decade prior on the back of an officers’ coup which overthrew Egypt’s fusty constitutional monarchy. Now, he wanted to help a neighboring Arab nation follow in Egypt’s mold.

But Saudi Arabia was set against this state of affairs and sought to return Yemen’s ruling Imam to the throne, and pumped in arms and money to royalist militias. Ironically, these included many tribesmen from the Shiite Zaydi sect, which now forms the backbone of the Houthi rebellion the Saudis are so desperate to quash.

The tens of thousands of soldiers Egypt sent in as an expeditionary force into Yemen soon found themselves on the front line of a civil war, taking the lead in the defense of Yemeni republicanism. What followed was a long, difficult conflict that ground on for nearly a decade.

More than 10,000 Egyptian soldiers died, prompting some historians to call the war the “Egyptian Vietnam”.

U.S. forces leaving Yemen

From AP, March 21st: The U.S. troops, including Special Forces commandos, were leaving the al-Annad air base near the southern city of al-Houta, Yemeni military and security officials said.

U.S. staying nearby


The U.S. military has a base in Djibouti, just across the sea from Yemen. This base serves as a launch pad for drone strikes in the region.

The base, Camp Lemonnier, even has a Facebook page. Feel free to ask them a question.


What is Deep Learning, anyway?

Recently in the tech world, the term “deep learning” has become almost as big a buzzword as “big data” or “machine learning”. With all the hype around this emerging research area, a lot of tech news articles seem overly optimistic about the technology, so this explainer tries to explain what deep learning is, where it stands, its recent trends, and what deep learning is not.

This presentation uses graphical navigation, where each node presents a chunk of information. The node color denotes a related semantic topic group, and connections suggest similarity between chunks of information. When you hover over a node, the highlighted nodes are the suggested next chunks to read. Due to time constraints, this explainer is a very crude proof of concept, but hopefully it shows the gist of the design philosophy.


Is today’s pot GMO?

I was deep into a longish story about pot potency last week when I was surprised to see an unchallenged assertion that today’s marijuana is GMO, or a genetically modified organism.

I thought that question was settled by the Pulitzer-winning truth-checkers at Politifact. Last year Politifact took on Patrick Kennedy, a spokesman for Smart Approaches to Marijuana (SAM), a national group opposed to legalizing weed. Kennedy came under scrutiny for saying modern marijuana is genetically modified and much stronger than what Barack Obama smoked as a teenager. Politifact agreed that pot potency had increased, but said Kennedy’s claim about genetic modification didn’t hold up.

“The most off-base part of Kennedy’s claim is that the rise in THC levels comes from ‘genetic modification,’” said Politifact. “It’s actually from genetic selection, a very old process of producing desired traits from crops.”

