NewsMap: a long-term way of processing news stories

Below is a mockup of a concept I’ve had for a while called NewsMap*, which is a way of curating and annotating the news by storing news stories in a personal categorized dashboard.

NewsMap would allow people to process the news in a more coherent manner, as opposed to jumping from headline to headline, day to day. For example, if someone is interested in MOOCs, they can store an interesting article about MOOCs in the Ed Tech folder on their NewsMap dashboard. Two weeks later, when there’s another front-page article about MOOCs, they can drag and drop it to that same folder, compare the two stories, and start to build a narrative of who the players are, what the trends are, etc.

NewsMap would galvanize people to action, first and foremost by supporting their intellectual curiosity and helping them make connections between related news stories (as well as information on the web that isn’t necessarily categorized as “news,” such as organizations, campaigns, emails, and blog posts). Their dashboard would become a visual representation of the connections they make between research, policy, and practice.

NewsMap mockup

In the long term, this concept is something I plan to integrate into my school mapping project.

Always looking for feedback and assistance 🙂



*I am aware that the name NewsMap is already taken.


Old and New Collide at Harvard Art Museums’ Lightbox Gallery

In November 2014, after the three stalwarts of the Boston art scene (the Fogg Museum, Busch-Reisinger Museum, and Arthur M. Sackler Museum) became one institution under the Harvard Art Museums, a smaller, experimental gallery launched within its walls under the masterful direction of Harvard’s metaLAB, “an idea foundry, knowledge-design lab, and production studio dedicated to innovation and experimentation in the networked arts and humanities.” The space was designed to push the boundaries of an otherwise traditional museum space through “digital experiments and new media projects that respond to collections held at the Harvard Art Museums.”

The entirety of the room is covered with screens, projectors, network jacks, and various other necessities that would make any experimental artist salivate. But what might make this gallery most exciting is the juxtaposition of next-generation tech and the museum’s traditional artworks.

Nothing feels better suited to this surprising marriage than the upcoming video exhibition “YOUR STORY HAS TOUCHED MY HEART,” opening May 23. The exhibition highlights the truly extraordinary American Professional Photographers Collection, over 20,000 photographs depicting American life in the late 19th and early 20th centuries.

The photographs depict the hopes and dreams—and fears—of Americans as they imagined themselves at their best. Your Story Has Touched My Heart combines these photographs with new video footage, sound, and fragments of text that put the work in dialogue with memory, individuality, ephemerality, and the meaning of visual abundance as these images find their way in the digital realm.

– metaLAB

A special screening will be held on May 25, followed by a discussion of the video and corresponding works by the metaLAB’s Matthew Battles and Sarah Newman, in concert with one of the APPC’s curators, Professor Kate Palmer Albers.

The collection of more than 20,000 photographs is impressive in both its size and the consistent quality of its photos and negatives. To get more exposure to it before the event, follow our Boston Bot on Twitter and receive periodic links to images in the collection!

Alfie: The Audio Butler

(Team: Brittany, Sravanti, Ashley D.)

As avid listeners of public radio and podcasts, we found ourselves wondering why it would be easier to book a trip to Timbuktu from our phones than to share an interesting clip from the latest episode of This American Life on Twitter. Most of the media we consume doesn’t have this problem. You read or watch, you export a link directly to social media with your comment and call it a day.

We have watched with interest as several notable names in audio – This American Life included – have taken steps to address this shareability gap with mixed success. Whether it’s klpr or Clammr, audiograms or Audible or Anchor, there does not yet appear to be a complete solution to sharing audio content with your networks. Either it isn’t user-driven or it isn’t integrated with the social media platforms people are actually using.

Part of the problem, we learned, is the internet itself – it simply wasn’t built with audio in mind. And the user experience for audio often remains laborious, even if the listening experience can also be serendipitous and the friend of multitaskers everywhere. Several knowledgeable people have explored why audio doesn’t tend to go viral (and when it might) – but we aren’t necessarily looking for virality. We’re looking for something deeper, something that befits the level of engagement that makes this medium uniquely valuable.

If podcasts are truly the last undiscovered country, the media market poised for continued growth rather than decline, we deserve a set of tools that allows us to easily and readily build a community – a conversation – around that content. Especially for those citizen journalist podcasters out there, all around the world, who are trying to build an audience and generate some dialogue around their grassroots endeavor.

So we set out to prototype a mobile application that would act like a sharing plug-in, overlaid on top of existing listening platforms. (Of course, this would require those platforms to integrate such an app, and while we feel it would be in their best interest to do so, let’s agree to suspend disbelief for a few minutes.) This application, which we’re calling Alfie the Audio Butler, allows the user to 1) export a cool clip to Facebook or Twitter, 2) annotate as they listen and see what their existing social networks have to say about the same content, and/or 3) leave a comment with their own voice. It’s immediate, it’s simple, and it’s on-the-go.

You can check out the interactive prototype below (thanks to Ally Palanzi’s Clipper on GitHub for getting us started with the code), including some demos geared toward this audio story we reported on the subject:

Demo “Alfie” here.




What does Hillary Clinton’s Inbox look like?

I, too, am tired of hearing about Hillary’s use of a private email server. On the other hand, it led to a pretty neat data set to unpack: a dump of emails she’s sent and received.

I played around with this data set a bit and was particularly interested in how different groups of people interacted with Hillary. Did men use shorter sentences than women, for example? Did her staffers send one-liners versus ambassadors who sent lengthy emails? Did she have interesting relationships with people we might not be familiar with?

I didn’t get a chance to answer all of these questions, but I ended up being interested in the way words in her email were clustered, and decided to come up with a visualization based on that.

For a simple representation to start, I created a scatter plot visualization using mpld3, which creates interactive matplotlib graphs for the browser. It’s clunky to navigate (you need to switch to a zoom-in mode, drag a rectangular portion of the graph to zoom in on, then switch again to the cursor mode to scroll over words), but as a first step it’s interesting to see which words appear together.

Isn’t it interesting that “bipartisan” appears well outside the main cluster of words?


Lesson learned along the way: visualizing text is hard. I found that the usual text visualizations out there, such as word clouds or circle packing, were too reductive for some of the data I had, like topic models or k-means clusters.

While I didn’t create data visualizations for some of the questions I posed earlier, I do have some statistics:

For males:

6187 sentences
83764 word tokens
10762 word types
7.78 average tokens per type
13.54 average sentence length
5.01 average word token length
7.34 average word type length
Hapax legomena (words that appear only once – an indicator of vocabulary usage) comprise 49.60% of the types

For females:

22845 sentences
369517 word tokens
30386 word types
12.16 average tokens per type
16.17 average sentence length
4.94 average word token length
7.84 average word type length
Hapax legomena comprise 49.76% of the types
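
For anyone who wants to reproduce numbers like these, here is a minimal sketch of how such statistics can be computed. It is not the code used for this post: the `text_stats` function, the regex-based sentence splitter, and the tokenizer are all assumptions, and a proper NLP tokenizer would give somewhat different counts. It takes "average tokens per type" to be tokens divided by types and "average sentence length" to be tokens divided by sentences, which matches the figures above.

```python
import re
from collections import Counter


def text_stats(text):
    """Return corpus statistics for one non-empty block of text."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Naive tokenizer: lowercase runs of letters, digits, and apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(tokens)
    types = list(counts)  # distinct words
    hapaxes = [word for word, n in counts.items() if n == 1]
    return {
        "sentences": len(sentences),
        "tokens": len(tokens),
        "types": len(types),
        "tokens_per_type": len(tokens) / len(types),
        "avg_sentence_length": len(tokens) / len(sentences),
        "avg_token_length": sum(len(t) for t in tokens) / len(tokens),
        "avg_type_length": sum(len(t) for t in types) / len(types),
        "hapax_share": len(hapaxes) / len(types),
    }


# Placeholder text, not taken from the email data set.
sample = "The cat sat. The dog sat too!"
stats = text_stats(sample)
```

Running the function once per group (for example, once on all emails from men and once on all emails from women) would give two dictionaries that can be compared side by side.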

I analyzed all of Taylor Swift’s lyrics so you don’t have to.

At the 58th Grammy Awards earlier this year, Taylor Swift became the first woman to win Album of the Year twice for a solo album.

By the numbers, this shouldn’t come as a shock. Swift — an objectively gifted singer, songwriter, and performer — has had a wildly successful career by any metric. That said, if I had to list the top 10 female performers of my lifetime I’m not sure Swift would make the cut. As culture critic Camille Paglia so delicately put it for The Hollywood Reporter, I find her music to be “mainly complaints about boyfriends, faceless louts who blur in her mind as well as ours.”

While the internet is rife with Taylor Swift listicles analyzing the lyrics of her songs, data-driven analysis is scarce (or, more likely, just private). So, in the spirit of collect and verify, I decided to do a textual analysis of TSwift’s work using Word Counter to see just how boy-centric her lyrics actually were.

True to Sands’ prediction from last class: 80% of my time was spent on data collection, 15% was spent sifting through said data, and I’m wrapping up the remaining 5% now. Using the database AZLyrics, I combed through the many, many songs of Taylor Swift. To date, she has released five studio albums, two live albums, two video albums, two extended plays (EPs), 37 singles, three featured singles, and eleven promotional singles. To keep things simple, I decided to stick with her five studio albums: Taylor Swift (2006), Fearless (2008), Speak Now (2010), Red (2012), and 1989 (2014).

Word Counter is a pretty straightforward tool: it counts the words, bigrams, and trigrams in a plain text document which you can either paste directly into the browser or upload to the site. From there, you can download the single word counts, bigrams (2 contiguous words), and trigrams (3 contiguous words) into .csv format. Between the five albums, I copied in text from 69 songs and then downloaded the data.
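
For readers curious about what a tool like this does under the hood, here is a rough sketch of word, bigram, and trigram counting with CSV output. It is not Word Counter's actual implementation: the `ngram_counts` and `to_csv` helpers and the simple regex tokenizer are assumptions made for illustration.

```python
import csv
import io
import re
from collections import Counter


def ngram_counts(text, n):
    """Count every run of n contiguous words in text."""
    words = re.findall(r"[a-z']+", text.lower())
    # Zip the word list against shifted copies of itself to form n-grams.
    grams = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


def to_csv(counts):
    """Render counts as CSV text, most frequent n-gram first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["ngram", "count"])
    writer.writerows(counts.most_common())
    return buffer.getvalue()


# Placeholder text, not real lyrics.
lyrics = "la la la oh la la la"
bigrams = ngram_counts(lyrics, 2)
trigrams = ngram_counts(lyrics, 3)
```

Note that this sketch lets n-grams run across line breaks, so the last word of one lyric line pairs with the first word of the next. Counting line by line instead would be a reasonable alternative.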

Then the process became a bit less straightforward. Comparing single word counts of individual songs and albums side by side didn’t really give me a ton of useful insight (not to mention, it’s a fairly boring way to see the data). I decided to compare Swift’s two “Albums of the Year,” 1989 (in blue) and Fearless (in magenta), by plugging the songs’ text into Tagul, a very user-friendly word cloud art generator.


Other than showing Ms. Swift is a thematically consistent songwriter, this didn’t give me much to go by. Perhaps, if I compared the two albums’ most frequently used trigrams?


Aha, now we were getting somewhere. Where Fearless (right) reinforces my earlier criticism, the trigrams from 1989 (namely, those from the song “Shake It Off”) focus more intensely on Swift herself. As she explained to Rolling Stone in 2014: “When you live your life under that kind of scrutiny, you can either let it break you, or you can get really good at dodging punches. And when one lands, you know how to deal with it. And I guess the way that I deal with it is to shake it off.”

Ultimately, my textual investigation should have been part of a broader investigation that also compared the songs Swift wrote with those she co-produced and weighted the songs by popularity. From the data I did collect, it seems Camille Paglia and I should maybe give Swift another chance: the pop star is shifting tone, however incrementally, from the lovestruck ballads of albums past.


Whales, whales, whales


A few weeks ago a friend of mine shared this image that a friend of hers had originally posted to Facebook. The image was not linked to an article and did not cite a source (I have since found that it came from The Sun). The image sent me down a rabbit hole learning about whale beachings. There have been two large ones since the start of the year: one of a pod of sperm whales in the North Sea and the other of a pod of pilot whales off the coast of India.


Some articles posed theories about how and why these animals were beaching, but most said that no conclusive reasons had been established yet. It seems that conducting complete necropsies on whales is time-consuming and expensive. The report on 21 pilot whales beached in Scotland in 2013 was released only at the end of 2015. That report supported the most likely theory that I had read among the different articles: that the whales had ingested so much mercury over their lifetimes that it had damaged their ability to navigate the waters and resulted in their fatal disorientation. Most papers reported that the sperm whales beached in the North Sea had gotten lost in shallow waters looking for giant squid and noted that this is often thought to be the reason that whales beach: they get lost in shallow waters and then cannot get out or cannot find food, and die before reaching the shore.


But I wondered why the whales were getting lost and whether they were getting lost more often than before. Wikipedia offered a listing of all the reported beachings of sperm whales since the mid-1700s, but when I graphed this data it seemed erratic. Then I decided to graph all the reported mass beachings of pilot whales, and the steady increase was much more evident. I dropped the sperm whales from the exercise and decided to focus on the pilot whales.
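
The tally behind a graph like this can be sketched in a few lines. This is only an illustration of one way to bucket reported events by decade; the `events_per_decade` helper is hypothetical, and the years below are placeholders, not the Wikipedia figures.

```python
from collections import Counter


def events_per_decade(years):
    """Count events per decade, keyed by the first year of each decade."""
    counts = Counter((year // 10) * 10 for year in years)
    return dict(sorted(counts.items()))


# Placeholder years for illustration only, not the Wikipedia data.
example_years = [1902, 1955, 1958, 1981, 1984, 1989]
per_decade = events_per_decade(example_years)
```

The resulting dictionary maps each decade to its number of reported events and can be handed to any charting tool as a bar or line series.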

The studies are still inconclusive on whether increased mercury levels cause neurological damage and disorientation specifically in whales, but this damage has been proven conclusively in other higher-order mammals, and one article in National Geographic cited the study of the pilot whales and referenced the possible link between the toxins and the beachings.

As further context, I visited the New Bedford Whaling Museum as part of my research and had a nice time talking for a few hours with a docent there. The museum seemed to target elementary school programs, and I think a bit of that aesthetic seeped into my video!

The map that appears in the video is originally from this site.


Posted in All

What Poverty Steals

I’m fascinated by the tangle of life expectancy, wealth and poverty, income inequality and social mobility.

My data viz was prompted by new research detailing poor people’s shorter life expectancies and The Atlantic article about the 47 percent of cash-strapped Americans who said they couldn’t come up with $400 for an unexpected expense.

Economic stability is about income, but it’s also about assets and wealth. It’s about having a cushion to shield you from the inevitable unexpected expense of car repairs or a medical emergency.

Poor people don’t have that margin of error, which is one of the reasons that economic mobility is so low.  Kids who are born poor in Shelby County (the county that holds Memphis) die poor. Only 2.6 percent of children raised in the bottom quintile of household income in the Memphis area rise to the top quintile by adulthood. According to a New York Times interactive, “Shelby County is very bad for income mobility for children in poor families. It is better than only about 9 percent of counties.”

Since Shelby County is majority black and a disproportionate share of the poor people are black, I wanted to focus specifically on black people. (Hispanics also have an insanely high poverty rate, but there are relatively few of them in Shelby County/Memphis and most are recent immigrants.)

Here’s what I wanted to determine for people in the county where I live, Shelby County:

If poor people had the same life expectancy as rich people, how much more could they expect to earn over those additional working years? If you add up all those dollars, how many millions of dollars are poor black people in Shelby County forfeiting simply because they’re poor, black and live where they do?

If I could answer this question, I wanted to show the data similar to how Periscopic animated the years lost to gun deaths.

Periscopic gun deaths

Spoiler alert: I don’t have the data to answer the question I was trying to answer. Especially not on the $$ end.

Nevertheless, I did a short video, 1:06. And I figured out how to add music.

I was going to build a little bar chart showing the difference in life expectancy between poor people and rich people in Shelby County – or one comparing the income disparity in life expectancy by the biggest counties in Tennessee, but there wasn’t a whole lot of difference. And I can do a bar chart, so I was trying to figure out what I didn’t know how to do.

*** I’m pretty sure my math is all wrong, because there are far more than 26,000 black adults in Shelby County who are poor (as defined by living in the bottom quartile of household income). Would love to think through how to answer my question with someone who knows.
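
One possible way to frame the arithmetic, offered as a starting point and not as the calculation used in the video: multiply the number of people by the gap in years and by a yearly income. The `forgone_earnings` function and every input below are hypothetical, and the sketch leans on two big simplifying assumptions: that every additional year of life would be a working year, and that income stays constant.

```python
def forgone_earnings(people, extra_years, annual_income):
    """Total earnings lost, assuming every extra year is a working year."""
    return people * extra_years * annual_income


# Placeholder inputs chosen only to make the arithmetic easy to follow.
# They are not estimates for Shelby County.
total = forgone_earnings(people=1_000, extra_years=2.5, annual_income=10_000)
```

A more careful version would need real inputs for each term (the size of the group, the life expectancy gap, and typical earnings at the relevant ages), plus some treatment of retirement age, since years gained late in life are mostly not working years.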

How to win a Nobel

Last fall, I scraped and cleaned data for the more than 21,000 nominations submitted for Nobel Prizes between 1901 and 1966 — the only years for which data were publicly available. For each nomination, the database contains the names of both the nominator and the nominee, along with such information as their gender, hometown, birth year, death year, and profession.

Some surprising factoids began to jump out at me as I looked over the data. I thought I’d tell the story of one of them for this assignment.

To see the story, download the zip from and open index.html in your browser.


Profile (belated) and data story: Sands Fish

I interviewed Sands Fish for our class profiles assignment months ago and decided to try to profile him through the medium in which he is an expert: data visualization. However, I ran into a roadblock that I wasn’t able to resolve until our data visualization class. So I’m combining two assignments in one and finally presenting my results.

After Sands and I talked, I transcribed 25 minutes of our interview, including even the “um”s and “yeah”s. Then I analyzed the text from several different perspectives, trying to echo Sands’ work with MediaCloud, which crunches massive amounts of data to discover the relationships between words and the people who use them. In our case, I wanted to get a visual representation of the themes and rhythm of our interview.

First, I analyzed the language we each used. Here are the words I used most often:

Screen Shot 2016-04-19 at 10.52.27 PM

And the ones Sands used most often:

Screen Shot 2016-04-19 at 10.53.21 PM

There wasn’t a lot of overlap.
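
A comparison like this can be sketched as follows. It is not the tool used for the screenshots above: the `top_words` helper and its optional stopword filter are assumptions, and the two strings are placeholders, not quotes from the interview.

```python
import re
from collections import Counter


def top_words(text, k=10, stopwords=frozenset()):
    """Return the k most frequent words in text, skipping any stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word not in stopwords)
    return [word for word, _ in counts.most_common(k)]


# Placeholder utterances, not quotes from the actual interview.
interviewer = "yeah ok yeah so data yeah"
interviewee = "data maps data stories maps data"

# Words that appear in both speakers' top-four lists.
shared = set(top_words(interviewer, 4)) & set(top_words(interviewee, 4))
```

Passing a stopword set (for example, common function words like "the" and "and") keeps the lists focused on content words, which matters for a transcript that includes every "um" and "yeah".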

Then I counted the number of words in each uninterrupted chunk of speech and made a spreadsheet recording each of those chunks under our respective names, with the minute timestamp interspersed. For example, here are the first five minutes:

Screen Shot 2016-04-19 at 11.12.56 PM

Here is a streamgraph that shows our individual share of the conversation, and the overall give and take. I used total words per person per minute to produce this graph on

Screen Shot 2016-04-19 at 10.56.54 PM
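
The "total words per person per minute" table that feeds a streamgraph like this one can be built with a short script. This is a sketch under assumptions: the transcript is represented as hypothetical `(minute, speaker, text)` tuples, and the turns shown are dummy text, not lines from the interview.

```python
from collections import defaultdict


def words_per_minute(turns):
    """Sum words per speaker per minute from (minute, speaker, text) turns."""
    totals = defaultdict(lambda: defaultdict(int))
    for minute, speaker, text in turns:
        totals[minute][speaker] += len(text.split())
    # Convert to plain dicts, ordered by minute, for easy export.
    return {minute: dict(by_speaker) for minute, by_speaker in sorted(totals.items())}


# Placeholder turns made of dummy words, not the actual transcript.
turns = [
    (0, "interviewer", "one two three four five six"),
    (0, "interviewee", "one two three four five six seven eight nine"),
    (1, "interviewer", "one"),
    (1, "interviewee", "one two three four five six seven"),
]
table = words_per_minute(turns)
```

Each minute becomes one x-axis position and each speaker one layer of the stream, so the table can be written out as one row per minute with one column per speaker.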

Then I took a more granular look at the first 10 minutes of conversation, using cumulative word count instead of minutes as the x-axis value. That gave me a better sense of the frequency of volleys between us, and the duration of each uninterrupted chunk of speech:

Screen Shot 2016-04-19 at 11.15.35 PM
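
Switching the x-axis from minutes to cumulative word count amounts to taking a running total over the chunks. Here is a minimal sketch of that step; the `cumulative_spans` helper is hypothetical and the chunk sizes are placeholders, not figures from the interview.

```python
from itertools import accumulate


def cumulative_spans(chunks):
    """Turn (speaker, word_count) chunks into (speaker, start, end) spans."""
    ends = list(accumulate(count for _, count in chunks))
    starts = [0] + ends[:-1]  # each chunk starts where the previous one ended
    return [(speaker, start, end) for (speaker, _), start, end in zip(chunks, starts, ends)]


# Placeholder chunk sizes, not figures from the actual interview.
chunks = [("interviewer", 6), ("interviewee", 40), ("interviewer", 1), ("interviewee", 25)]
spans = cumulative_spans(chunks)
```

Each span's width is the length of one uninterrupted chunk of speech, so short interjections show up as thin slivers between the longer answers.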

Here are a few takeaways I gleaned about my interview style by representing the interview visually:

  • I affirm understanding in lazy ways (yeah, OK, mhmm), and I interrupt a lot.
  • It would be better to remain silent until the end of my interviewee’s explanation, and then affirm my understanding in a summary that uses key words and phrases that he or she has shared.
  • Overall the share of conversation is roughly appropriate for interviewer and interviewee, though the spike at minute 22 represents a story I shared that probably didn’t add much to the interview.