Google Translate: A keystone for global communication

Google Translate is a tool that most of us already know and use. One of Google's more popular products, it currently serves 500 million monthly users. While Google Translate has historically been helpful for casual browsing of the internet, it hasn't been reliable enough for everyday conversation, nor for a comprehensive understanding of a foreign website.

Google's recent update of Google Translate, however, has changed that. As of December last year, Google introduced A.I. into Google Translate, making the product astoundingly better. The New York Times shares the following example:

“Uno no es lo que es por lo que escribe, sino por lo que ha leído.”
With the original Google Translate: “One is not what is for what he writes, but for what he has read.”
With the new A.I.-rendered version: “You are not what you write, but what you have read.”

The difference is stark. Not only has the improvement enabled more coherent and seamless translations; the Google Neural Machine Translation system can now translate between language pairs it has never been explicitly trained on. That is, Google Translate has (figuratively speaking) developed its own internal language into which it translates all languages, enabling it to translate between two languages that were never explicitly linked. This improvement opens the door to more language pairings without much of the previous heavy lifting of explicitly linking one language to another.

This change has interesting implications for the future of news. It makes international news articles accessible to everyone. It gives journalists much easier, faster, and more reliable access to sources, whether those are other people or documentation and data. More information will simply be more accessible.

It may also have implications for the labor force in the news industry: local speakers may eventually no longer be needed for reporting. How might this change the type of coverage we get? At a time when some news articles are already written by bots, will Google Translate improve our coverage because we can "understand" more? Or will it make news stories even more impersonal and spotty as we miss the cultural nuances and context that only a local expert can provide? The potential implications seem both exciting and daunting.

 

Sources and more information:

Google’s AI translation tool seems to have invented its own secret internal language

Posted in All

Hi! I’m Aileen.

Hi! I’m Aileen, a second year Sloan MBA who is coffee & pastry obsessed. In a world where I have oodles of money, I would own a high end bakery, and smell the smell of baking croissants all day. I hum when I feel awkward.

But perhaps more relevantly–


  • My Background: 
    • Education: Majored in Political Science, Minored in Economics. Originally I wanted to be a journalist to pay the bills as I worked my way through the next great American novel. Was fascinated most in my classes by the role of media in political society.
    • Work Experience (journalism ended up not working out): 
      • Advertising: I used analytics and statistics to optimize media placement, brand messaging, and media mix for clients like JetBlue, and Match.com.
      • Google: Decided I wanted to understand how businesses worked. I helped launch and grow a new product, and also did operations strategy.
      • Entrepreneurship: Creating your own product felt compelling, and still does. I am a co-founder of Armoire, a startup that went through MIT’s summer accelerator this past summer and is still going strong.
  • My Personal interests:
    • Better media for the average person: After studying mass media in American democracy during my undergrad, I struggled with some of the shortcomings of today’s media: the sensational headlines, dizzyingly short news cycles, parachute journalism, and inaccessibility to the average American. I’m passionate about finding a media structure that is engaging and educational for everyone, not just people who read The Economist.
    • Food science: Because science makes everything tasty!
    • Other things I do in my free time: Learning how to photograph & edit, blogging & writing, learning French, baking, and learning how to gracefully lose at chess.

 

Mic Check: Jeneé O.

Peace! I’m a Nieman Fellow (Nieman Foundation for Journalism at Harvard). I’m also a lifestyle columnist and culture critic at The Kansas City Star where I write about race, gender and civil rights issues through the lens of pop culture.

Journalism is rapidly changing and we can’t just change with it, we have to innovate, too. And it’s important to me that we think about how to do that inclusively.  Diversity and accessibility in digital storytelling is a must.

When I’m not learning as much as possible and representing for my Hogwarts family, I’m walking my two boxers or listening to trap music and doing yoga. You can find me on Twitter @jeneeinkc.

 

Posted in Bio

Overview: Find stories faster in massive document dumps

If you were tasked with reviewing and making sense of a huge stack of documents you’ve never seen before, you would probably go about it in a pretty standard way. Skim the first page and make a quick decision about whether it’s relevant or about a specific topic, then move to page two and make that decision again. After a few pages, you might have a few separate piles describing what you’ve seen so far.

As you continue reading, the piles might get more sophisticated. In one pile, you might place emails containing specific complaints to the school board. In another, policy proposals from a public official’s top adviser. On and on you go until you get through enough of the pile to have a fairly good idea of what’s inside.

For investigative journalists reviewing massive document dumps — responses to public records requests, for example — this may be one of the very first steps in the reporting process. The faster reporters understand what they have, the faster they can decide whether there’s a story worth digging into.

Overview, a project to help journalists sift through massive document dumps

Making sense of documents as efficiently as possible is the primary purpose of Overview, an open-source tool originally developed by The Associated Press and funded by a collection of grants from the Knight Foundation and Google, among others.

Upload your documents into Overview and it will first process them automatically using optical character recognition. It then uses term frequency–inverse document frequency (TF-IDF) weighting with a clustering algorithm to sort the individual documents into a series of piles. It’s somewhat similar to the way a human reporter would sort documents if she were reading the pages one by one.

TF-IDF is built on a really basic assumption. It counts the number of times each word is used in each document — say a single email in a batch of thousands. It then compares those counts to the number of times the same words are used in the larger collection of documents. If a few of the emails have words in common that are relatively uncommon in the whole collection of emails, the assumption is that those documents are related in some way.
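The weighting scheme described above can be sketched in a few lines of Python. This is a minimal illustration of the TF-IDF idea, not Overview's actual implementation; the documents and terms here are invented for the example:

```python
import math

# Three tiny invented "documents" -- stand-ins for, say, emails in a dump.
docs = [
    "school board complaint about bus routes",
    "another complaint to the school board",
    "policy proposal from the adviser",
]
tokenized = [d.split() for d in docs]

def tf_idf(term, doc_tokens, all_docs):
    # Term frequency: how often the term appears in this one document.
    tf = doc_tokens.count(term) / len(doc_tokens)
    # Document frequency: how many documents in the whole collection use it.
    df = sum(1 for d in all_docs if term in d)
    # Inverse document frequency: terms rare across the collection
    # get a higher weight than terms that appear everywhere.
    idf = math.log(len(all_docs) / df)
    return tf * idf

# "complaint" appears in two of three documents, "policy" in only one,
# so "policy" gets the higher weight where it occurs.
w_complaint = tf_idf("complaint", tokenized[0], tokenized)
w_policy = tf_idf("policy", tokenized[2], tokenized)
```

Each document thus becomes a vector of term weights, and documents whose weight vectors look similar, like the two emails sharing the relatively distinctive word "complaint," end up clustered into the same pile.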

Overview doesn’t actually derive any meaning from the words it’s counting, so the assumption the algorithm makes about documents being related might be wrong or totally unhelpful. But Overview also allows users to tag individual documents (or whole piles) with custom labels. It might, for example, help a reporter more quickly identify those complaints to the school board or the policy proposals to the public official because they’re all grouped together by the algorithm.

Overview has a few other helpful features, like fast searching and the ability to rerun the clustering algorithm with different parameters — specific terms of interest or stop words, for example. It’s also seamlessly integrated with another tool called DocumentCloud, a popular platform journalists use to annotate and publish documents online.

Visual Explanatory Illustrations: “Back of the Napkin” methodology

[[* I reviewed the lists of tools, but understood that the selected tool does not need to be among the ones listed *]]

As a reaction to the access to huge amounts of information, we’ve seen a surge of explanatory media. Vox.com is known for its tagline “Explain the news”, theSkimm has a set of guides to hot news topics, and the tool FOLD lets writers link media cards along with their writing to provide more context.

News and storytelling already rely on images, audio, maps, cards, data diagrams, and more to support their arguments and provide context. There is, however, an underuse of illustrations that help explain how systems work. We are visual thinkers, and most of us learn better with pictures. While glorified illustrations of data and aesthetically pleasing designs are appealing, I am talking here about pictures that enable understanding by, for example, showing how things are connected. Future news sources that leverage this tool of explanatory illustrations, and successfully satisfy readers’ demand for understanding the news, will be at an advantage.

Figure 1: Example of an explanatory illustration

A specific tool that teaches anyone to problem-solve and communicate with pictures is Dan Roam’s book The Back of the Napkin: Solving Problems and Selling Ideas with Pictures. Roam provides a methodology for discovering, developing, and selling ideas through pictures. He shows how to decompose a problem and come up with both simple pictures, as illustrated in Fig. 1, and more complex ones.


Dan Roam describes the process of visual thinking as four steps, with separate chapters describing how to do each step:
1) looking, i.e. collecting and screening
2) seeing, i.e. selecting and clumping
3) imagining, i.e. seeing what is not there
4) showing, i.e. making it all clear

The book also includes concrete methodology charts, as shown in Figure 2, that can be useful starting points when determining how best to illustrate a topic or your ideas with pictures.

Figure 2: A chart to help determine how best to visualize a problem. The rows specify what type of problem it is (who/what, where, etc.) and the columns specify what should be highlighted (quality vs. quantity, vision vs. execution, etc.).



Anushka’s bio

 

My name is Anushka Shah, and I work as a researcher at Ethan Zuckerman’s Center for Civic Media here at the MIT Media Lab. My work focuses on using text analytics to analyze news language and on producing research with a new analytics tool called Media Cloud.

Home is Mumbai (really, Bombay) for me. It’s where I grew up, where I went to school, and where my family lives. I studied Government and Economics in the U.K. for my undergraduate education, with the hope of returning to India to participate in the political sector. When I did return home, I slowly came to realize there were two Indias: a socially and economically comfortable one that I grew up in, and a difficult, dark, disadvantaged one that I only saw at a distance.

I spent the next three years working with non-profit organizations and grass-roots political parties trying to understand various aspects of this other India. It was an important experience for me, not because I learned much about how certain issues could be positively affected, or which policies worked on the ground and which didn’t, but because I came to understand how deeply complex rural India is.

Among other things, the simplistic narratives about rural India that I and many others grew up with kept the two Indias apart. I became interested in media as a way to affect opinion, knowledge, and eventually civic engagement in India. I studied applied quantitative research with a focus on news analytics, and now work in Ethan’s lab using Media Cloud to research Indian media.

Going forward, I want to use my quantitative media skills and field experience in India to design effective media messaging back home.


Tools to transcribe audio and video content

I’m pretty new at making podcasts. It’s not always easy when English is not your first language, especially the transcription! If I had to do it all by hand, it would take ages before I could even start editing. But with help from some tools, I can edit and produce podcasts without pain. I’ve only used the first one, but I saw a demo of the second at ONA last year, and it was impressive.

  • Pop Up Archive is good for transcribing audio material. The accuracy is pretty good and I love the timestamping features.
  • Trint is a tool for transcribing audio and video material. It also has timestamping features, with a function to adjust them, and the text itself can be edited as well. You can also highlight the segment you want to use and it automatically tells you the duration of the selected part.

FYI, for audio/video production, I always listen to or watch the entire raw interview. Even if you have everything transcribed, the transcript is just a guide for editing. Find the best parts of the interview using your own eyes and ears!

Media Cloud: A tool for news analysis

The news plays a critical role in civic engagement today. Our existing knowledge of an issue, our ability to identify with a cause, or to empathize with a group within a civic movement, often depends on how the news educates us about them. Deconstructing the influence of the news, in order to shape public opinion, design media campaigns, and strategize advocacy, is key to improving civic engagement.

Media Cloud is a big data, open-source platform designed to bring together media and civic engagement. Developed by the Center for Civic Media at the MIT Media Lab (where I work as a researcher on this platform) and the Harvard Berkman Klein Center, this web-based tool aggregates news stories daily from over 50,000 sources across the world, and delivers analysis and visualizations on media influence and attention.

Citizens, activists, journalists, and others interested in media can use Media Cloud to find data-based answers to questions such as how much news attention a topic received, which sources were influential in driving a specific conversation, what impact a media campaign had, and how liberal and conservative sources, or online and traditional newspapers, differ in their framing of an issue.

Media Cloud has been used to assess campaigns such as Black Lives Matter in the U.S. and Dalit Lives Matter in India, to advocate to Indian news sources about coverage gaps around women’s issues, to help organizations like the Gates Foundation encourage local philanthropy in developing countries by mapping existing perceptions of the topic, to identify strategic news partners for improved public health conversations, and to map information availability around contraceptive use in Kenya and Nigeria.

Media Cloud has the potential for immense impact across a wide variety of practices and geographies around the world.

 


PGP: An Old Technology for a New Media Environment

Data privacy is, and should be, top of mind for journalists. As the Trump Administration takes an antagonistic approach to the media, it’s not unrealistic to imagine the President signing an executive order any day now forcing news organizations to release emails to the government, or to pay significant fines or even face jail time if they do not reveal the sources of leaks.

Just this week, President Trump tweeted about the “illegal leaks coming out of Washington” following the resignation of Michael Flynn as National Security Advisor. Flynn’s resignation was due in large part to reporters from The New York Times, The Washington Post, and other outlets publishing stories based on leaked information from government officials about Flynn’s conversations with Russia.

For journalists to keep informing the public of the stories that the Administration is trying to hide or ignore, they must continue using anonymous sources from within the government. These leaks cannot stop, regardless of whatever measures the Administration tries to put in place to stop government employees from speaking out and contacting the press.

The Need for Encryption

But for many of these employees, there are major ramifications to divulging top-secret or sensitive information. Before any government employee considers leaking information to the press, they need to be sure that the communication is delivered securely and that their identity is not divulged. Outside of secret, in-person, Deep Throat-style meetups, this means the journalist will need to use encryption to keep the information secure. Similarly, the journalist will need to keep that information secure in order to keep sources private and continue reporting the stories that need to be told.

PGP: A Golden Standard

Pretty Good Privacy (PGP) is a free encryption and decryption program created by Phil Zimmermann in 1991 and typically used for email. The name, a tribute to A Prairie Home Companion, is misleading: the tool is known to be more than just “pretty good” at maintaining a user’s privacy. In a post titled “Why do you need PGP?,” Zimmermann explains the need for the encryption tool:

Intelligence agencies have access to good cryptographic technology. So do the big arms and drug traffickers. So do defense contractors, oil companies, and other corporate giants. But ordinary people and grassroots political organizations mostly have not had access to affordable military grade public-key cryptographic technology. Until now. PGP empowers people to take their privacy into their own hands. There’s a growing social need for it.

Encryption itself is a very old technology that is still just as relevant and powerful as it was when first invented. Through encryption, the message you send is scrambled into a meaningless string of letters and numbers so that anyone snooping through your email cannot decipher it. Only those with the correct key can unlock the meaning.

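The public-key idea underlying PGP can be sketched with a textbook RSA example. This is a toy for illustration only, not how PGP actually works: real PGP combines public-key and symmetric ciphers with enormous keys, while the tiny primes below offer no security at all.

```python
# Toy RSA: anyone may hold the public key (n, e); only the
# recipient holds the private exponent d.
p, q = 61, 53              # tiny textbook primes (insecure!)
n = p * q                  # public modulus
phi = (p - 1) * (q - 1)
e = 17                     # public exponent, shared openly
d = pow(e, -1, phi)        # private exponent (modular inverse; Python 3.8+)

def encrypt(m: int) -> int:
    # Anyone with the public key can scramble a message...
    return pow(m, e, n)

def decrypt(c: int) -> int:
    # ...but only the private-key holder can unscramble it.
    return pow(c, d, n)

message = 65
ciphertext = encrypt(message)   # meaningless to a snooper
assert decrypt(ciphertext) == message
```

A leaker encrypting with a journalist's public key thus produces ciphertext that not even the leaker can reverse; only the journalist's private key unlocks it.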

To start using PGP, you need to download GNU Privacy Guard (GnuPG), either through GPGTools (OS X) or Gpg4win (Windows). Once a person has their own PGP key, they can communicate with anyone else through encryption, so long as the recipient also has a PGP key. There are several browser extensions you can download to make sending an encrypted email quicker, including PGP Anywhere and Mailvelope. PGP also works with mail clients such as Mozilla Thunderbird for email encryption.

The biggest hurdle for anyone new to PGP is finding others who have their own PGP keys as well. Without keys on both ends, you cannot exchange encrypted messages. This may be a deterrent for some reporters who cannot convince sources to use PGP because of the time it takes to set up. But for journalists who want to protect information and confidentiality, the upfront costs are worth the privacy gained through encryption.

To avoid this issue, there are other encryption tools journalists can use, such as Virtru. It works in conjunction with platforms such as Gmail and Salesforce to keep information secure through data encryption. However, unlike PGP, Virtru and other similar products are not free for users.

PGP is only the first step

Though email encryption is only one step journalists can take to keep their messages secure and the privacy of their sources intact, it is one of the most important and the first they should consider. PGP is not a perfect solution, as several government agencies have the ability to unlock keys and decipher messages. But PGP can serve as a gateway for journalists to better maintain confidentiality and keep information secure. Creating a key and locking their emails is the first step journalists can take toward better privacy habits.

Sara’s bio


I’m a first-year graduate student in Comparative Media Studies and a Research Assistant at MIT’s Open Documentary Lab. Before coming to MIT, I was the Researcher on Central America at Amnesty International, based in Mexico City. There I covered human rights issues in the region and led a year-long project on Central American migrants fleeing (and being deported back to) unrelenting violence. Before that I was the Americas Program Researcher at the Committee to Protect Journalists, based in New York, where I covered press freedom issues in Latin America and the United States. I’ve also worked as a freelance journalist and with a number of international NGOs and foundations throughout Latin America, predominantly in Argentina and Colombia, as well as in my hometown of New York City. I’m a journalism junkie and film buff, and I’m interested in how to apply new narrative and storytelling techniques to the human rights issues I’ve been working on for the past several years, particularly in the area of freedom of expression.
