What Is A Bot, Anyway?

Posted on March 30, 2016 by sands

(with Adrienne)

Bots are having their 15 minutes, so to speak. Recently, Microsoft launched the “Tay” AI bot and chaos ensued. But bots had already been making a name for themselves on Twitter, on Tumblr, and even on collaboration platforms like Slack or GitHub. Still, being able to recognize a bot when we see one doesn’t help us understand what’s going on. To make the lives of non-coders everywhere easier, we’ve prototyped an app that can create and configure a veritable cornucopia of bots, no code required.

* For those who are interested in a little more detail, we’ve also created a simple example, an activist bot that echoes excerpts from the Boston Police Patrolmen’s Association newsletter, which is…unfortunately surprising.

What is a bot?

Broadly speaking, a bot is a computer program that acts like a human user on a social media platform. That said, no artificial intelligence has yet passed the Turing Test, so it is pretty easy to distinguish the humans from the code. Essentially, a bot takes in some information or content from source A (or A + B, or A + B + C, or…well, you get the idea), potentially transforms it based on rules the developer has given it, and saves the newly crafted content to a database. From here, the bot could also have instructions to share its creation on Twitter, but it’s not a requirement.

Minimum Viable Bot is just Information In, Information Out.
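To make "Information In, Information Out" concrete, here is a minimal sketch of that loop. It is not the code behind our app; the names run_bot, source_a, and source_b are made up for illustration, and a plain Python list stands in for the database.

```python
from typing import Callable, Iterable, List


def run_bot(sources: Iterable[Callable[[], List[str]]],
            transform: Callable[[str], str],
            store: List[str]) -> List[str]:
    """Pull items from every source, transform each one, and save the results."""
    created = []
    for fetch in sources:
        for item in fetch():
            created.append(transform(item))
    store.extend(created)  # stand-in for saving to a database
    return created


# Hypothetical sources A and B; a real bot might read a feed or a web page.
def source_a() -> List[str]:
    return ["bots are having a moment"]


def source_b() -> List[str]:
    return ["no code required"]


database: List[str] = []
new_posts = run_bot([source_a, source_b], str.upper, database)
print(new_posts)
```

Sharing on Twitter would be one more optional step after the save, which is why it is left out here.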

What are the different kinds of bots?

Bots can take lots of different forms depending on their purpose. Some bots can help you schedule meetings through email. Others are more nefarious, and try to circumvent spam filters in your email or on Twitter. Funnily enough, the hugely popular @horse_ebooks started out as a scam bot, until it was taken over by a reporter from Buzzfeed.

This is a very special technique that I have never seen

— Horse ebooks (@Horse_ebooks) September 23, 2013

We should note that there is no canonized taxonomy, but we’re going to offer a few informal categories here.

Mash Up Bots:
These bots combine different sources of content and post them.
Example: A bot that tweets out a combination of headlines.

US Natural Journalists Count Is Near Its Record Low

— Two Headlines (@TwoHeadlines) March 27, 2016
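One simple way to get a mash-up like this is to swap the subject of one headline for the subject of another. The sketch below assumes each headline already comes paired with its subject; the headlines are invented, and this is not necessarily how @TwoHeadlines itself works.

```python
import random


def mash_up(headline: str, subject: str, other_subject: str) -> str:
    """Replace one headline's subject with the subject of another headline."""
    return headline.replace(subject, other_subject)


# Made-up headlines, each paired with its subject (an assumption of this sketch).
headlines = [
    ("Mayor Opens New Library Downtown", "Mayor"),
    ("Comet Passes Close To Earth Tonight", "Comet"),
]


def random_mash_up(items, rng=random):
    """Pick two different headlines at random and swap in the second subject."""
    (first, first_subject), (_, second_subject) = rng.sample(items, 2)
    return mash_up(first, first_subject, second_subject)


print(random_mash_up(headlines))
```

Finding the subject of a real headline automatically is the hard part, and it is where most of the effort in a bot like this would go.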

Image Poster Bots:
These bots post an image, sometimes with additional information, or generated content.
Example: A bot that posts live TV stills and improvises subtitles for them.

A magical and revolutionary iPod at an unbelievable laptop. My goodness, my cyberwarfare! pic.twitter.com/FE8cEzcjhG

— TV Helper (@TVCommentBot) March 30, 2016

Smart Learner Bots:
Some bots will grow more “intelligent” the more they are interacted with. Smart learner bots require an extra level of human care, as Microsoft learned with Tay. To learn more about ethics in bot curation, Motherboard just posted a great explainer with some of the leaders in social bot technology.
Example: Microsoft’s ill-fated “Tay”, who “learned” by accepting as valuable everything that was said to it.

Auto Notifier Bots:
Auto Notifiers listen to a content source, and then perform an action when new content is posted, or something changes. It’s kind of like If This, Then That, the extremely popular service for connecting various web platforms together. These bots are also very common in journalism. They frequently take template text and “fill in the blanks” with the latest relevant information.

Our demo bot is a version of this kind of bot, because we are not transforming our text in any way. We are simply waiting for a new newsletter to be posted, and then periodically tweeting sentences from it.

Example: A Twitter bot that tweets each time there is an earthquake near L.A.

3.1 magnitude earthquake occurred 3.11mi WSW of Huntington Beach, California. Details: https://t.co/hvAa0SD6P5 Map: https://t.co/RUfnmwUCpP

— LA QuakeBot (@earthquakesLA) March 28, 2016
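The "listen, then act when something is new" part of an auto notifier can be sketched in a few lines. This is an illustration only: check_for_new is a hypothetical helper, the feed is an in-memory list rather than a real data source, and notify just appends to a list instead of tweeting.

```python
def check_for_new(fetch_events, seen_ids, notify):
    """Call notify() once for each event whose id has not been seen before."""
    fresh = [event for event in fetch_events() if event["id"] not in seen_ids]
    for event in fresh:
        notify(event)
        seen_ids.add(event["id"])
    return len(fresh)


# Hypothetical in-memory feed standing in for a real content source.
feed = [{"id": "q1", "text": "3.1 magnitude earthquake near Huntington Beach"}]
seen = set()
outbox = []

first = check_for_new(lambda: feed, seen, outbox.append)   # new item, so notify
second = check_for_new(lambda: feed, seen, outbox.append)  # nothing new, stay quiet
feed.append({"id": "q2", "text": "1.0 magnitude earthquake near Lytle Creek"})
third = check_for_new(lambda: feed, seen, outbox.append)   # only q2 is new
print(first, second, third)
```

A real bot would run this check on a schedule and would need to keep the set of seen ids somewhere that survives restarts.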

Replier Bots:
These bots talk to the user based on rules written by the developer. Sometimes this needs to be something the user says directly to the bot, and sometimes these bots will tweet at someone in reaction to something that’s been said. Many platforms (e.g. Twitter) have rules for keeping these bots on their best behavior.
Example: A bot that takes nouns from your tweets and turns them into tributes to deities.

HEARTBEATS FOR THE HEARTBEAT GOD! PHOTOSYNTHESISES FOR THE PHOTOSYNTHESIS GODDESS

— Appropriate Tributes (@godtributes) March 26, 2016

Expert Bots:
Much like phone trees, these bots may either offer (semi-) useful information, or take responses and decide what to say next based on them. These bots can also sometimes be found on e-commerce sites with services like Live Chat. The bot will help to quickly sort the chatter for a human.
Example: The Bank of America customer service bot.

@NellaDesigns Thank you! Our customers mean everything to us; we strive to exceed your expectations. ^bm

— Bank of America Help (@BofA_Help) September 7, 2010

Where do bots live?

Email
GitHub
Slack
Twitter
IRC
and many more!

How do bots work?

Bots typically have a place where they get their content from. In some cases, this may be a very advanced system. In the case of our demo app and bot, we simply feed in a web address pointing to our desired content, and the bot posts the content located there, sentence by sentence.

As with any program that deals with a large amount of data, most of the work typically goes into cleaning up the data so that, in this case, what the bot says is correct.

Some bots will try to detect what is relevant in the data you feed them. Some will simply take the data and reproduce it without a second thought. Tay’s “repeat after me” feature did this, to disastrous effect.

It’s common for one person, once they have acquired the skills, to make and manage many bots! To see this ailment in action, have a look at the work of the wonderful Darius Kazemi!

To end, here is an example of the code that would run our bot that tweets out random sentences from the Boston Police Patrolmen’s Association newsletter. This script would typically be set up on a server and run on a schedule.
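A script along these lines might look roughly like the sketch below. Treat it as an illustration under stated assumptions rather than the bot's actual code: the function names are ours, the tag stripping and sentence splitting are deliberately crude, and post is a placeholder for whatever function actually sends the tweet (no Twitter library is assumed).

```python
import html
import random
import re
import urllib.request

TWEET_LIMIT = 140  # Twitter's character limit at the time of writing


def fetch_text(url: str) -> str:
    """Download a page and return its raw HTML as text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def clean(raw_html: str) -> str:
    """Crudely strip tags and collapse whitespace; real data needs more care."""
    without_tags = re.sub(r"<[^>]+>", " ", raw_html)
    return re.sub(r"\s+", " ", html.unescape(without_tags)).strip()


def split_sentences(text: str) -> list:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def pick_tweet(sentences, rng=random):
    """Choose a random sentence short enough to tweet, or None if none fit."""
    candidates = [s for s in sentences if len(s) <= TWEET_LIMIT]
    return rng.choice(candidates) if candidates else None


def run_once(url, post):
    """One scheduled run: fetch, clean, pick a sentence, hand it to post()."""
    tweet = pick_tweet(split_sentences(clean(fetch_text(url))))
    if tweet:
        post(tweet)
    return tweet


# Offline demonstration on a made-up snippet instead of a live web address.
sample = "<p>First sentence here. Second one!</p><p>A third &amp; final question?</p>"
sentences = split_sentences(clean(sample))
print(pick_tweet(sentences))
```

On a server, a scheduler such as cron would call run_once with the newsletter's web address and a real posting function.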

It may not have been as complicated as you thought to build your own bot! If you would like an even more automated route, have a look at the article How To Make A Twitter Bot with Google Spreadsheets.

The Indispensable Value Of Fox News in an Uninformed Media Landscape >.<

Posted on March 9, 2016 by sands

(Submitted as an exercise in Motivated Reasoning and as an example of how news claims need to be debunked; it includes cherry-picked data.)

In news, we value truth, rigor, and awareness. These are unquestionable universals in modern society. In an information-saturated environment, we are awash with perspectives, many of which are from under- or uninformed sources. The rise of the blogosphere has led to a proliferation of citizen journalism based on nothing but a singular perspective informed by nothing but confirmation bias and, many times, radical views that no serious American would take seriously. And thanks to the “international” perspectives on life in the United States, the waters are further muddied with perspectives that aren’t even informed by an everyday experience in this proud country. Even worse, this has become fodder for the mainstream media to feed off of. And so we’re left to the mercy of a media that stabs in the dark and makes claims largely based on indefensible sources.

The solution to this truly doesn’t lie in a messy conglomeration of independent voices vying for attention. What is needed is a combination of two things: a large, well-funded organization dedicated to producing relevant, up-to-the-minute news, and a voice that isn’t just an echo of every other media outlet; a voice that is controversial, as all truth tends to be in a world of media distortion.

We investigated the language that Fox News uses the most frequently in the context of truth. Do these issues matter to you?

A focus on what really matters: the Fox News Truth Language

Do you value a focus on family? What about a focus on the victims of crimes? How about God and love? As evidenced by this data, these are some of the core topics that Fox News cares about.

In the top list of words from the New York Times, instead you’ll find “democratic”, “headline”, “twitter”, and “celebrity”. Not quite a balanced and valuable view of the universe.

Now let’s look at the blogs that one might be tempted to retreat to given the lame stream media’s pathetic attempts to mislead and beguile you. The first thing you will notice is that the word “internet” is front and center. (The word “Google” isn’t far behind. Talk about a misleading liberal media super-power.) This is a community mainly interested in talking about themselves. For those of us who live in the real world, this should be a red flag. We need attention on real issues that affect real people.

You’ll find the word “opinions” high up on this list as well. That’s great if you’re not interested in objective reality and facts, but who has time to dawdle around in a sea of opinions when we have jobs to work, families to shop for, and taxes to pay??? You’ll also find “emotions”. This is not how one defines a valuable source of news.

So as it turns out, despite all of the attacks on Fox News, they really do represent a valuable source of news, and an alternative to a media landscape gone awry. The data doesn’t lie. The liberal media does.

>.<

Opposition To Muslim Cemeteries In The U.S. Mainly Relying On Narrative Of Water Contamination

Posted on February 23, 2016 by sands

Residents opposing Muslim cemeteries in Texas (left) and Massachusetts (right)

In the rural town of Dudley, Massachusetts, a cemetery has been proposed. People who live close to the land the cemetery is to be built on are opposed to it. The concerns voiced focus primarily on the environmental impact of the cemetery due to burial practices and the potential contamination of local wells. The conditions agreed to by the organization attempting to build the cemetery, however, would seem to address these concerns. They are quoted as saying “the group will comply with whatever the town wants when it comes to burials, even if it means not strictly following tradition”. Yet there is still opposition to the construction of the cemetery.

While there is “no law in Massachusetts that directly address green burials”, people have still taken issue with a cemetery located close to the wells they rely on, which are adjacent to the property. A hearing was held in accordance with Massachusetts law, and many residents turned up to protest the cemetery plans. However, final approval lies in the hands of the town and the Board of Health, whose job it is to make a final and informed decision about how best to protect the residents’ health. Despite comments from the town suggesting that the cemetery will move forward, people are still upset.

If this were the entire story, it seems likely it would come to a quiet end with the Board of Health’s approval, given the assumption that residents trust their government and Board of Health officials. The aspect of the story that is not represented above, however, is that the proposed cemetery is being brought forth by the Islamic Center of Greater Worcester. Despite the assurances from the organization that they will comply with town policies and laws surrounding safe burial, and that they are willing to adapt burial practices to satisfy concerns in the town, opposition remains.

It seems the only other substantive concern residents and abutters have is a question of whether there will be noise pollution. One attendee asked whether “he was going to have to listen to ‘crazy music’ like the call to prayer.”

Sadly, what is left seems to be a simmering anxiety based on poorly understood cultural practices and a series of statements that are hard to categorize as anything other than racism and xenophobia. The debate in Dudley echoes almost identically a number of other challenges to Muslim cemeteries in the U.S., for instance in Walpole, MA and in Farmersville, TX, where opponents have mainly cited a risk to local water supplies and many residents have insisted that the opposition is not at all based in religious prejudice.

In response to the concerns, the president of the Islamic Center of Greater Worcester, Khalid Sadozai, stated “We are the residents of the Commonwealth of Massachusetts. We want to bury our loved ones somewhere in Massachusetts area.”

At the time of this writing, neither the Islamic Center of Greater Worcester, nor the Dudley Board of Health, nor the Dudley Water Department, nor the Dudley Zoning Board, nor the Dudley Police Chief could be reached for comment.

Journaling Media Consumption – Content, Source, Choice

Posted on February 17, 2016 by sands

In tracking my media usage for the week, I gained something that I imagine most people gain when engaging in this exercise: anxiety. Anxiety and paranoia that I had stopped paying attention to what media I was being exposed to / exposing myself to, that there were aspects of my media consumption that I was significantly less aware of, or that I was generally unconscious of the majority of my consumption on a daily basis. In other words, it worked.

My strategy for designing my media journal was not simply to find out how often I was accessing media, but to develop an ontology for engaging with media and test it to see what properties of media access were the most revelatory about my habits. What follows is a breakout by each of those properties, some of which are revelatory, and some of which might benefit from collecting over a longer timeline.

First, I categorized my media consumption by what it was about. One thing about recording this was that it drew attention to how frequently I was consuming more than one form of media at the same time. Obviously the largest category, music, was mostly consumed while also engaging with a number of the others. It is no surprise to me that work and social are among the biggest categories, but it was surprising just how large of a percentage was dedicated to art (and Instagram, which I struggled to categorize given that it is a platform with multiple types of content; I went with “social;art”, since my primary use for it is to follow artists and designers).

One of the aspects of the journaling I was most interested in was how much media consumption was a choice vs. forced on me by context and environment. Admittedly, I am likely to have dramatically underreported the media I was involuntarily exposed to. Reflecting on walking through the city, it already occurs to me that, for example, I stood in front of a number of advertisements on the back wall of the subway platform that I was subconsciously aware of, but which didn’t rise to the level of conscious consumption. That said, I do think what I realized from this process is that, for the media I am at least partially engaged with, most of it is quite purposeful. Non-discretionary is listed separately, since some media, such as presentations or readings for classwork, were voluntary but not optional, having been assigned by others. (This chart is based on number of engagements, not amount of time spent on each piece of media. If this were based on time, it would look dramatically different, skewing toward non-discretionary.)

Media type gives a bit more resolution in terms of what I was consuming. What was surprising for me was the variety. If I were to imagine the various types of media I was engaging with on a daily basis, I would have guessed perhaps only 3-4, but it appears there is more diversity than that in the ways I consume media. Again, advertisements are not broken out here, which might have been interesting. A stand-out is the “platform” category, which represents types such as Twitter, Instagram, Instant Messaging platforms, etc., bringing into focus the number of times I engage with media in an ecosystem where I am likely to be exposed to many other types of content.

I tracked what channels the media I consumed came to me from. No surprise that I’m the top culprit here in terms of choosing to expose myself to media. Community, friends, and classes are about on equal footing, but on a long enough timeline, I’d be curious to see how this actually played out. My suspicion is that class would spike and the influence of my friends or online communities would stay mostly the same. (I recently purposefully locked myself out of Facebook and handed the keys to a trusted friend, so it was an interesting time for me to journal. I shudder to think what these charts would look like with my usual habit of being tempted into a lot of admittedly good, yet likely superfluous, content.)

The amount of social media content here is alarming, even without Facebook. I think that if I spent more time counting the various exposures during class time (when I appropriately wasn’t diverting my attention to log every item) this would balance out with social, or at least that’s what I’m going to tell myself…

In terms of what devices this media is experienced through, I would have expected “laptop” to dominate my phone accesses much more. Another comparison I’m going to continue to keep my eye on. Again, this is not based on time spent engaging with the media; if it were, this chart would likely skew toward “laptop”, given the number of times I use it to read long-form items, which I can’t stomach via my phone. At the same time, if I added up every one of the micro-engagements I had on my phone, it’s possible the gap would be smaller than I’m imagining.

In general, I think the voluntary/involuntary comparison and source of media analyses were the most educational in understanding my own habits of consumption. No doubt, even tracking a handful of metrics for a short time period heightened my conscious awareness of the beginning of an interaction with a given form of media dramatically.

The Growth of Robo-Journalism

Posted on February 10, 2016 by sands

Robo-Journalism saw a lot of coverage in the news in the past few years, with a lot of attention being given to a particular type of event in 2014. When a magnitude 4.7 earthquake shook the Los Angeles area, a robo-journalism program wrote and helped to publish the first story about it.

A 1.0 magnitude earthquake occurred 2.49mi S of Lytle Creek, California. Details: https://t.co/DneAcreoLq Map: https://t.co/c12wmekInr

— LA QuakeBot (@earthquakesLA) February 10, 2016

There is nothing terribly remarkable about this, technically speaking. It is essentially a template whose blanks are filled out based on simple bits of information gathered programmatically. Think Mail Merge for extremely simple news stories. To proclaim the end of journalists based on technology like QuakeBot is alarmist. Firstly, this is more a collaboration with an algorithm than anything else; a collaboration between a programmer, a copy editor, and a human gatekeeper who decides whether the article should go out, be edited, or be rejected. Even in a more extreme case, one could imagine algorithms feeding a journalist with plain text descriptions of facts that, when concatenated, wouldn’t really amount to a story, but could streamline the task of writing an article, and contextualizing it in a nuanced way.
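The Mail Merge comparison can be made concrete with a short sketch. The template wording follows the QuakeBot tweet above, but the field names and the publish_if_approved helper are assumptions made for illustration, not QuakeBot's actual code; the approve function stands in for the human gatekeeper.

```python
# Template with blanks, modeled on the tweet above (field names are hypothetical).
QUAKE_TEMPLATE = (
    "A {magnitude:.1f} magnitude earthquake occurred "
    "{distance_mi:.2f}mi {direction} of {place}."
)


def write_story(event: dict) -> str:
    """Fill the template's blanks with values gathered programmatically."""
    return QUAKE_TEMPLATE.format(**event)


def publish_if_approved(story, approve, publish):
    """Send the story out only if the gatekeeper's approve() says yes."""
    if approve(story):
        publish(story)
        return True
    return False


event = {
    "magnitude": 1.0,
    "distance_mi": 2.49,
    "direction": "S",
    "place": "Lytle Creek, California",
}
story = write_story(event)

published = []
# A trivial stand-in for a human editor: approve anything that is not empty.
went_out = publish_if_approved(story, lambda text: bool(text.strip()), published.append)
print(story)
```

Everything journalistic, such as deciding whether a quake this small deserves a story at all, still sits with whoever plays the approve role.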

That said, what evolves from this idea in the future has the potential to impact the field significantly. One of the first steps has already been taken: stories are being compiled and sent “out to the wire without human intervention”. As algorithms and methods become more advanced, the scope of what is reasonable to be reported on by an algorithm will expand. Companies like Narrative Science are already deploying an artificial intelligence based natural language generation product that claims to create “perfectly written narratives to convey meaning for any intended audience”. These systems use technologies such as entity extraction to identify entities of interest such as people, places, and organizations. Add to this the ability to break down the structure of text with part-of-speech parsing and tagging, and you can see how the consumption of large bodies of knowledge could be summarized, identifying the most significant elements and focusing attention on them.

Conversational text generation has produced some humorous results, but has already advanced beyond this stage. These provocations provide opportunities to define what is most valuable in journalism, as well as the perils of technologies that we will interact with in the future. Can an article be mistakenly generated that causes great disaster? Can a negative conversation with a chat bot that mirrors previous undesirable real-world human inputs cause real emotional distress? Can we trust the organizations designing and deploying these technologies? It would be trivial to have an algorithm exclude references to specific people, words, or topics. What are the ethics of blending generated forms of knowledge with human-authored publications?

“Who needs pants anyway?”

Hello, My Name Is Sands Fish

Posted on February 10, 2016 by sands

My name is Sands Fish. I am currently a first-year master’s student in the Media Arts and Sciences program at the MIT Media Lab. I work under Ethan Zuckerman in the MIT Center for Civic Media as a Research Assistant, primarily focused on the Media Cloud platform. I design data visualizations that reveal hidden patterns in the content and structure of the news at large scales. My current efforts are in detecting conversations and frames in issues discussed online, anywhere from mainstream media to citizen blogs. This effort (initially called a “Network of Frames”) is a network visualization that represents media sources and the words they use most frequently. The network shows common usages of words between different media sources, and the layout of the network highlights clusters of language, indicating at the very least themes emerging from overall coverage of the issue. Before arriving at the Media Lab, I worked as a fellow and researcher at Harvard’s Berkman Center for Internet & Society and as a senior software engineer and data scientist at the MIT Libraries.

In my role as a designer and artist, I focus on using generative visuals, artificial intelligence, and hardware interfaces to expose beauty and intricacy in patterns from the natural and digital world. I am also one of the organizers of the Cambridge/Somerville based Tech Poetics community; a loose collection of new media and technology artists in the greater Boston area practicing or interested in the use of technology (loosely defined) in their artistic practice.

You can find me on Twitter at @sandsfish and my home on the web is http://sands.fish.

Future of News and Participatory Media

Treating newsgathering as an engineering problem… since 2012!