Overview: Find stories faster in massive document dumps

If you were tasked with reviewing and making sense of a huge stack of documents you’ve never seen before, you would probably go about it in a pretty standard way. Skim the first page and make a quick decision about whether it’s relevant or about a specific topic, then move to page two and make that decision again. After a few pages, you might have a few separate piles describing what you’ve seen so far.

As you continue reading, the piles might get more sophisticated. In one pile, you might place emails containing specific complaints to the school board. In another, policy proposals from a public official’s top adviser. On and on you go until you get through enough of the pile to have a fairly good idea of what’s inside.

For investigative journalists reviewing massive document dumps — responses to public records requests, for example — this may be one of the very first steps in the reporting process. The faster reporters understand what they have, the faster they can decide whether there’s a story worth digging into.

Overview, a project to help journalists sift through massive document dumps

Making sense of documents as efficiently as possible is the primary purpose of Overview, an open-source tool originally developed by The Associated Press and funded by a collection of grants from the Knight Foundation and Google, among others.

Upload your documents into Overview and it will automatically process them, first using optical character recognition. It then applies a clustering algorithm built on term frequency-inverse document frequency (TF-IDF) weighting to sort each individual document into a series of piles — somewhat similar to the way a human reporter would sort documents if she were reading the pages one by one.

TF-IDF is built on a really basic assumption. It counts the number of times each word is used in each document — say a single email in a batch of thousands. It then compares those counts to the number of times the same words are used in the larger collection of documents. If a few of the emails have words in common that are relatively uncommon in the whole collection of emails, the assumption is that those documents are related in some way.
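The counting described above can be sketched in a few lines of Python. The document names and texts here are invented for illustration, and this is only the weighting step, not Overview’s actual implementation:

```python
import math
from collections import Counter

# Three invented "emails" standing in for a document dump.
docs = {
    "email1": "board meeting complaint about bus routes",
    "email2": "complaint to the board about bus safety",
    "email3": "policy proposal for the annual budget",
}

def tf_idf(docs):
    n = len(docs)
    df = Counter()                      # in how many docs each word appears
    for text in docs.values():
        df.update(set(text.split()))
    scores = {}
    for name, text in docs.items():
        counts = Counter(text.split())
        total = sum(counts.values())
        scores[name] = {
            word: (count / total) * math.log(n / df[word])
            for word, count in counts.items()
        }
    return scores

scores = tf_idf(docs)
# Words shared by only a few documents ("complaint", "board") get
# positive scores; a word appearing in every document would score zero.
```

Documents whose high-scoring words overlap — here, the two complaint emails — are the ones a clustering step would then put in the same pile.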

Overview doesn’t actually derive any meaning from the words it’s counting, so the assumption the algorithm makes about documents being related might be wrong or totally unhelpful. But Overview also allows users to tag individual documents (or whole piles) with custom labels. It might, for example, help a reporter more quickly identify those complaints to the school board or the policy proposals to the public official because they’re all grouped together by the algorithm.

Overview has a few other helpful features, like fast searching and the ability to rerun the clustering algorithm with different parameters — specific terms of interest or stop words, for example. It’s also seamlessly integrated with another tool called DocumentCloud, a popular platform journalists use to annotate and publish documents online.

Visual Explanatory Illustrations: “Back of the Napkin” methodology


In reaction to our newfound access to huge amounts of information, we’ve seen a surge of explanatory media. Vox.com is known for its tagline “Explain the news”, theSkimm has a set of guides to hot news topics, and the tool FOLD lets writers link media cards alongside their writing to provide more context.

News and storytelling already rely on images, audio, maps, cards, data diagrams, and more to support their arguments and provide context. There is, however, an underuse of illustrations that explain how systems work. We are visual thinkers, and most of us learn better with pictures. Glorified illustrations of data and aesthetically pleasing designs are appealing, but here I mean pictures that enable understanding, for example by showing how things are connected. Future news sources that leverage explanatory illustrations, and thereby satisfy readers’ demand for understanding the news, will be at an advantage.

Figure 1: Example of an explanatory illustration

A specific tool that teaches anyone to problem-solve and communicate with pictures is Dan Roam’s book The Back of the Napkin: Solving Problems and Selling Ideas with Pictures. Roam provides a methodology for discovering, developing, and selling ideas through pictures, showing how to decompose a problem and come up with both simple pictures, as illustrated in Fig. 1, and more complex ones.


Dan Roam describes the process of visual thinking as four steps, with separate chapters describing how to do each step:
1) looking, i.e. collecting and screening
2) seeing, i.e. selecting and clumping
3) imagining, i.e. seeing what is not there
4) showing, i.e. making it all clear

The book also includes concrete methodology charts, as shown in Figure 2, that can be useful starting points when determining how best to illustrate a topic or your ideas with pictures.

Figure 2: A chart to help determine how best to visualize a problem. The rows specify what type of problem it is (who/what, where, etc.) and the columns specify what should be highlighted (quality vs. quantity, vision vs. execution, etc.).



Anushka’s bio


My name is Anushka Shah, and I work as a researcher at Ethan Zuckerman’s Center for Civic Media here at the MIT Media Lab. My work focuses on using text analytics to analyze news language and on producing research with a new analytics tool called Media Cloud.

Home is Mumbai (really, Bombay) for me. It’s where I grew up, where I went to school, and where my family lives. I studied Government and Economics in the U.K. for my undergraduate education, with the hope of returning to India to participate in the political sector. When I did return home, I slowly came to realize there were two Indias: a socially and economically comfortable one that I grew up in, and a difficult, dark, disadvantaged one that I saw only at a distance.

I spent the next three years working with non-profit organizations and grass-roots political parties, trying to understand various aspects of this other India. It was an important experience for me, not because I learned much about how certain issues could be positively affected, or which policies worked on the ground and which didn’t, but because I came to understand how deeply complex rural India is.

Among other things, the simplistic narratives about rural India that I and many others grew up with kept the two Indias apart. I became interested in media as a way to affect opinion, knowledge, and eventually civic engagement in India. I studied applied quantitative research with a focus on news analytics, and now work in Ethan’s lab using Media Cloud to research Indian media.

Going forward, I want to use my quantitative media skills and field experience in India to design effective media messaging back home.


Tools to transcribe audio and video content

I’m pretty new at making podcasts. It’s not always easy when English is not your first language, especially the transcription! If I had to do it all by hand, it would take ages before I could even start editing. But with help from some tools, I can edit and produce podcasts without pain. I’ve only used the first one, but I saw a demo of the second at ONA last year, and it was impressive.

  • Pop Up Archive is a good tool for transcribing audio material. The accuracy is pretty good and I love the timestamping features.
  • Trint is a tool for transcribing both audio and video material. It also offers timestamps that can be adjusted, and the text itself is editable. You can also highlight a segment you want to use, and it automatically tells you the duration of the selected part.

FYI, when producing audio or video, I always listen to or watch the entire raw interview. Even if you have everything transcribed, the transcript is just a guide for editing. Find the best parts of the interview using your own eyes and ears!

Media Cloud: A tool for news analysis

The news plays a critical role in civic engagement today. Our existing knowledge of an issue, our ability to identify with a cause, or to empathize with a group within a civic movement, often depends on how the news educates us. Deconstructing the influence of news, in order to shape public opinion, design media campaigns, and strategize advocacy, is key to improving civic engagement.

Media Cloud is a big data, open-source platform designed to bring together media and civic engagement. Developed by the Center for Civic Media at the MIT Media Lab (where I work as a researcher on this platform) and the Harvard Berkman Klein Center, this web-based tool aggregates news stories daily from over 50,000 sources across the world, and delivers analysis and visualizations on media influence and attention.

Citizens, activists, journalists, and others interested in media can use Media Cloud to find data-based answers to questions such as: How much news attention did a topic receive? Which sources were influential in driving a specific conversation? What impact did a media campaign have? How do liberal and conservative sources, or online and traditional newspapers, differ in their framing of an issue?

Media Cloud has been used to assess campaigns such as Black Lives Matter in the U.S. and Dalit Lives Matter in India, to advocate to Indian news sources about coverage gaps around women’s issues, to help organizations like the Gates Foundation encourage local philanthropy in developing countries by mapping existing perceptions around the topic, to identify strategic news partners for improved public health conversations, and to map information availability around contraceptive use in Kenya and Nigeria.

Media Cloud has the potential for immense impact and can be applied across practices and geographies around the world.



PGP: An Old Technology for a New Media Environment

Data privacy is, and should be, top of mind for journalists. As the Trump Administration takes an antagonistic approach to the media, it’s not unrealistic to imagine the President signing an executive order any day now forcing news organizations to release emails to the government, or to pay significant fines or even face jail time if they do not reveal the sources of leaks.

Just this week, President Trump tweeted about the “illegal leaks coming out of Washington” following the resignation of Michael Flynn as National Security Advisor. Flynn’s resignation was due in large part to reporters from The New York Times, The Washington Post, and other outlets publishing stories based on leaked information from government officials about Flynn’s conversations with Russia.

For journalists to keep informing the public of the stories that the Administration is trying to hide or ignore, they must continue using anonymous sources from within the government. These leaks cannot stop, regardless of whatever measures the Administration tries to put in place to stop government employees from speaking out and contacting the press.

The Need for Encryption

But for many of these employees, there are major ramifications to divulging top-secret or sensitive information. Before any government employee considers leaking information to the press, they need to be sure that the communication is delivered securely and that their identity is not divulged. Outside of secret in-person meetups, Deep Throat-style, this means the journalist will need to use encryption to keep the information secure. Similarly, the journalist will need to keep the information secure afterward, so that sources remain private and the stories that need to be told can continue to be reported.

PGP: A Golden Standard

Pretty Good Privacy (PGP) is a free encryption and decryption program, created by Phil Zimmermann in 1991 and typically used for email. The name, a tribute to A Prairie Home Companion, is misleading: the tool is known to be more than just “pretty good” at maintaining a user’s privacy. In a post titled “Why do you need PGP?,” Zimmermann explains the need for the encryption tool:

Intelligence agencies have access to good cryptographic technology. So do the big arms and drug traffickers. So do defense contractors, oil companies, and other corporate giants. But ordinary people and grassroots political organizations mostly have not had access to affordable military grade public-key cryptographic technology. Until now. PGP empowers people to take their privacy into their own hands. There’s a growing social need for it.

Encryption itself is a very old technology that is still just as relevant and powerful as when it was first invented. Through encryption, the message you send is muddled into a meaningless string of letters and numbers so that anyone snooping through your email cannot decipher it. Only those with the correct key can unlock the meaning.

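The public/private key idea behind PGP can be sketched with a toy RSA example in Python. The tiny primes and the message are invented for illustration; real PGP keys are thousands of bits long, and this toy offers no actual security:

```python
# Toy RSA sketch of the public-key idea behind PGP (illustration only).
p, q = 61, 53                # two secret primes
n = p * q                    # 3233, the public modulus
phi = (p - 1) * (q - 1)      # 3120
e = 17                       # public exponent (public key: e, n)
d = pow(e, -1, phi)          # 2753, private exponent (kept secret)

def encrypt(msg, e, n):
    # Anyone with the public key can encrypt, character by character...
    return [pow(ord(ch), e, n) for ch in msg]

def decrypt(cipher, d, n):
    # ...but only the private-key holder can turn it back into text.
    return "".join(chr(pow(c, d, n)) for c in cipher)

cipher = encrypt("leak", e, n)       # a meaningless list of numbers
plain = decrypt(cipher, d, n)        # "leak" again
```

The source encrypts with the journalist’s public key; only the journalist’s private key can reverse the operation, which is why the public key can be shared openly.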

To start using PGP, you need to download GNU Privacy Guard (GnuPG), either through GPGTools (OS X) or Gpg4win (Windows). Once you have your own PGP key, you can communicate through encryption with anyone else, so long as the recipient also has a PGP key. There are several browser extensions you can download to make sending an encrypted email quicker, including PGP Anywhere and Mailvelope. PGP also works with mail clients such as Mozilla Thunderbird for email encryption.
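In practice, the basic GnuPG workflow at the command line looks roughly like this (the email addresses and file names below are placeholders, not from the original post):

```shell
# Generate your own key pair (interactive; pick RSA and a long key).
gpg --full-generate-key

# Export your public key so sources can encrypt messages to you.
gpg --armor --export reporter@example.com > reporter_pub.asc

# Import a source's public key once they send it to you.
gpg --import source_pub.asc

# Encrypt a file so only that source can read it.
gpg --armor --encrypt --recipient source@example.com notes.txt

# Decrypt a message that was encrypted with your public key.
gpg --decrypt message.txt.asc
```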

The biggest hurdle for anyone new to PGP is finding others who have their own PGP keys as well. Without keys on both sides, you cannot exchange encrypted messages. This may be a deterrent for reporters who cannot convince sources to set up a PGP key because of the time it takes. But for journalists who want to protect information and confidentiality, the upfront costs are worth the privacy gained through encryption.

To avoid this issue, there are other encryption tools journalists can use, such as Virtru. This tool is used in conjunction with other platforms such as Gmail and Salesforce to keep information secure through data encryption. However, unlike PGP, Virtru and other similar products are not free for users.

PGP is only the first step

Though email encryption is only one step journalists can take to keep their messages secure and the privacy of their sources intact, it’s one of the most important and the first they should consider. PGP is not a perfect solution, as several government agencies are believed to have the ability to unlock keys and decipher messages. But PGP can be a gateway for journalists to better maintain confidentiality and keep information secure. Creating a key and locking their emails is the first step journalists can take on the road to better privacy habits.

Sara’s bio


I’m a first-year graduate student in Comparative Media Studies and a Research Assistant at MIT’s Open Documentary Lab. Before coming to MIT, I was the Researcher on Central America at Amnesty International, based in Mexico City. There I covered human rights issues in the region and led a year-long project on Central American migrants fleeing (and being deported back to) unrelenting violence. Before that, I was the Americas Program Researcher at the Committee to Protect Journalists, based in New York, where I covered press freedom issues in Latin America and the United States. I’ve also worked as a freelance journalist and with a number of international NGOs and foundations throughout Latin America, predominantly in Argentina and Colombia, as well as in my home town of New York City. I’m a journalism junkie and film buff, and I’m interested in applying new narrative and storytelling techniques to the human rights issues I’ve been working on for the past several years, particularly freedom of expression.


Bio

Hello everybody,

I am Eva, and I am currently a Master of Public Administration student at the Harvard Kennedy School.

I am French, and grew up in Paris.

I have always been passionate about advocacy, public policy, and international relations, and I have worked for both the French government and international institutions. When I was 25, I moved to Washington, DC, and lived there for five years before coming to Cambridge. I worked first for the French Embassy and then for the World Bank, where I focused on education and social protection issues and traveled to many countries in Africa to support World Bank policies and programs.

I am an avid media consumer and have always admired news reporters and journalists. In my own work, I have tried to draw inspiration from their ability to connect with people and shape public opinion. A while back, I participated in the creation of a European daily newspaper.

I love writing, playing the piano, making sculptures, reading novels, re-reading favorite books, traveling, listening to others’ stories, meeting strangers and getting to know them, wandering around with a camera, drinking espressos and green tea, going to exhibitions, and giving advice on great places to go in Paris.

My hope for this class is to learn more about media today and the different tools that can be used, and to think about ways to build bridges between classic media and social media. I am very excited to be part of such a diverse group and to learn from all of you!


A few thoughts on media and storytelling tools

There are several tools that I had never heard about before reading the articles assigned for this week’s class, and that I believe can have important implications for the future of news and storytelling.

I believe that news organizations need tools that enable collaboration with social media: on the one hand, social media can benefit from the higher-quality content news provides; on the other, news can benefit from the bottom-up information sharing that thrives on social media. In particular, when it comes to sharing stories, tools such as Shorthand Social, StoryMap.js, or Storyful multisearch could be very interesting and fruitful.

I also believe that data visualization has an important role to play. We live in a world with a huge amount of data, and many people are not aware of the figures or do not know how to read them. Data provide a lot of information, but that information has to be processed. That’s why I believe tools such as Silk.co and DataPortals.org are worth exploring.

Finally, I believe that tools built on top of existing platforms to analyze them, such as Twitter’s advanced search and TweetDeck, might be particularly interesting in the months and years to come.

Politwoops: tracking politicians’ social media stumbles

Deleting tweets is something we’ve probably all done from time to time – whether it’s just to fix a typo or to tone down our reaction to the latest aggravating news story. As private citizens, erasing an earlier post is a reasonable expectation. Yet it might be argued that for politicians in public office, what is said (and read) should stay said, much as a hot-mic gaffe, for example, can’t be taken back.

Twitter has become an important medium for politicians, whether campaigning for office or serving constituents. But sometimes politicians (and their staffers) get a bit carried away – and become just as susceptible as the rest of us to post-tweet regret. Fortunately, the website Politwoops, now hosted for U.S. politicians by ProPublica, preserves these deleted tweets. Its archive offers an interesting insight into the tweets politicians wish they could have (and perhaps believe they have) taken back. Given the Tweeter-in-Chief’s no-holds-barred nocturnal musings, for example, it’s a tool that may well prove useful for journalists in the coming years.

Several journalists have already noted, for example, the chronological coincidence that President-elect Trump praised Russia’s nonchalant response to U.S. sanctions at exactly the time his recently fired National Security Adviser Mike Flynn was holding sensitive discussions with the Russian ambassador. That wasn’t a tweet Trump ever deleted – but it’s certainly reassuring to know that if he had, it would still be on record.