Trends in Cosmetic Surgery … as measured in Units of Real Housewives (Data Visualization Assignment)

In researching an NPR series on women and image last year, I came across data from the American Society of Plastic Surgeons (ASPS) that charts changes in cosmetic procedures since 2000, both surgical and “minimally invasive” (generally injectables like Botox, and fillers).

In 2013 Americans spent $12.6 billion on such procedures, almost all of them elective. (The ASPS keeps separate data for medically necessary procedures, including reconstruction after cancer or injury.)

As you can imagine, this is pretty rich fodder. There are numerous ways to compare or slice the data, some of which the ASPS does itself (by region, gender, age, etc.). Below are some surprising discoveries, two of which we attempted to visualize.

– Some procedures have seen dramatic changes in 13 years.

For example: there has been a 4,565% increase in “Upper Arm Lift Surgery” – a category or procedure that practically didn’t exist in 2000.

“Lower Body Lifts” are up 3,417% – a procedure the ASPS describes as “improv(ing) the shape and tone of the underlying tissue that supports fat and skin.” Basically, get rid of your sagging butt, your flabby belly, your dimpled thighs – all in one go! Amazing, right? That is, if you like that kind of thing and are willing to spend some time in recovery.

– In the minimally invasive sphere, “injectables” like Botox and “soft tissue fillers” like Restylane show triple-digit increases. These substances have largely appeared since 2000, and are increasingly advertised in women’s publications and in dermatologists’ waiting rooms. In some cities, this stuff has become the “new normal” for women in certain industries or age groups, or in the public eye.

There’s some good reporting to be done about how these chemicals have been created or repurposed, approved and brought to market as material that can be injected, absorbed and broken down by the body.

– There are also declines – nose jobs and liposuction are down.

I encourage anyone curious to simply look over the data with a critical eye, as some things show dramatic growth by virtue of being new, while others, like nose jobs, eye lifts and breast augmentation, still account for the lion’s share of these elective surgeries.

I simply wanted to show some of these dramatic changes. I could say this is “value-neutral,” but obviously I have some thoughts about how we are reshaping the norms of female appearance (the vast majority of these procedures are performed on women). Thanks to classmate Celeste LeCompte, I wandered my way into exporting data to Excel and making charts, and then into Photoshop to illustrate them.

We originally tried to render some of these comparisons as Excel charts, but they were both visually boring and confusing when we tried to compare growth rates across procedures.

So, we settled on some icons who’ve helped introduce America to the Brave New World of Surgical Enhancement:  the Real Housewives.

Below are two procedures: the lower body lift, as represented by Real Housewives of New Jersey’s Jacqueline Laurita.

And the nose job, as visualized here by Atlanta’s own Real Housewife Kim Zolciak.

Several caveats apply: data is available for 2000, missing for 2001-2004, and picks up again for 2005-2013. It is self-reported by the ASPS rather than collected by any government agency (classmate Gideon Gil says such reporting is not required). The Photoshopped images are – to scale, sort of. And there are probably a hundred other things that make this scientifically squishy.
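Icons sized “to scale” can mislead in one specific way: if both the height and the width of an image are multiplied by the ratio of two counts, its area changes by the square of that ratio. Here is a minimal sketch, assuming the goal is for icon area to track the number of procedures; `icon_scale` is a hypothetical helper, and the counts are the ASPS nose-reshaping figures for 2000 and 2013 cited in this post.

```python
import math

def icon_scale(count, baseline):
    """Return (area_ratio, side_scale) for resizing an icon.

    Assumes the icon's AREA should be proportional to `count`,
    so each linear dimension scales by the square root of the ratio.
    """
    area_ratio = count / baseline
    side_scale = math.sqrt(area_ratio)
    return area_ratio, side_scale

# Nose reshaping: 389,155 procedures in 2000 vs. 221,053 in 2013 (ASPS)
area, side = icon_scale(221_053, 389_155)
print(f"area ratio {area:.3f}; scale width and height by {side:.3f}")
```

Here each side shrinks by roughly 0.75. Multiplying both sides by the raw ratio of about 0.57 instead would leave an icon with only about a third of the original area (0.57² ≈ 0.32), overstating the decline.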

Let’s start with what’s NOT happening as often:  nose jobs.


“Nose reshaping surgery” has fallen from 389,155 procedures in 2000 to only 221,053 in 2013, a drop of 43%, or only a little more than half as many Kim Zolciaks as before. (The reality star has denied having rhinoplasty, as recently as last week.)

Since there are a few years of missing data, it’s hard to pinpoint when the decline started. As for why? That’s entirely speculative. People happier with the noses God gave them? Who nose?

Let’s turn, instead, to a growth industry: the Lower Body Lift.


As you can (sort of) see, in 2000, it was almost non-existent – some 207 procedures.

By 2013, 7,281 people had this done in a single year. Note the drop from 2006 to 2007, a bump in 2010, then further drops. RHONJ’s Ms. Laurita has publicly discussed her several procedures, so we’re not casting aspersions by using her image.
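The growth rates in this post are simple percent changes between the 2000 and 2013 counts. A quick check in Python, using only the figures quoted here (`percent_change` is a hypothetical helper name), shows how they are derived and why a tiny base year inflates growth: the lower body lift’s 3,417% comes from a base of just 207 procedures.

```python
def percent_change(old, new):
    """Percentage change from `old` to `new` (positive means growth)."""
    return (new - old) / old * 100

# Figures quoted in this post (ASPS, 2000 vs. 2013)
lower_body_lift = percent_change(207, 7_281)        # about +3,417%
nose_reshaping = percent_change(389_155, 221_053)   # about -43%

print(f"Lower body lift: {lower_body_lift:+,.0f}%")
print(f"Nose reshaping: {nose_reshaping:+,.0f}%")
```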

So much here is value-laden, with so many, many possible interpretations. Among our questions: Did the economic meltdown of 2008 have an impact (it seemed to in some instances), given that these are essentially discretionary purchases? Could any tool show a predictive association? Is there a way to cross-reference the data against marketing pushes by the pharmaceutical industry? Will Joan Rivers’ death at a medical day surgery center have any impact on the safety of these procedures?

I’ve treated this as a light-hearted exercise, simply to get practice with new tools (thanks, Celeste!) like basic table manipulation and Photoshopping. But there are myriad possibilities for some serious news gathering here, and for some even more serious discussion of what we make normative. Women have been enhancing their appearance at least since Cleopatra, so who am I to judge whether surgery or fillers are somehow less acceptable than, say, wearing lipstick? But seeing this data leaves me uneasy, and I hope it’s something we as a society can consider.



Posted in All

Moral Values and the Discussion on Abortion on Social Media

One of my main interests is analyzing user-generated data, whether comments, tweets, or check-ins. I have a side research project on abortion and public policy, so I decided to use this homework assignment to get started on analyzing that project’s data.

I did most of the work in Python, using the awesome libraries tweepy (a Twitter API wrapper), matplotlib (plotting), pymongo (an interface to MongoDB), and nltk (the Natural Language Toolkit). I stored the data in a MongoDB database, but that wasn’t strictly necessary (plain text files would easily suffice). I forgot to account for how long the scripts would take to crunch through all the data, so when I got started last night, I quickly realized I’d better let them run overnight and write up this post in the morning.

My dataset consisted of 663,131 abortion-related tweets collected during 2013. To find them, I looked for key terms such as “abortion”, “abort” + “baby”, “abort” + “birth”, “prolife”, “prochoice”, and some others, including common hashtags.
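Keyword matching like this might look something like the sketch below. It is an illustration, not the actual filter used here: the term lists contain only the examples named in this post (the “some others” are unknown), and reading “abort” + “baby” as “both words appear in the same tweet” is an assumption.

```python
import re

# Single terms or hashtags: any one of these marks a tweet as relevant.
# Only the examples named in the post; the full list is not published.
SINGLE_TERMS = {"abortion", "prolife", "prochoice"}

# Term pairs: both words must appear somewhere in the same tweet (assumption).
PAIRED_TERMS = [("abort", "baby"), ("abort", "birth")]

TOKEN_RE = re.compile(r"[a-z]+")

def is_abortion_related(text):
    """Return True if a tweet matches any single term or any term pair."""
    # Lowercase and split into alphabetic tokens, so '#prolife' -> 'prolife'
    tokens = set(TOKEN_RE.findall(text.lower()))
    if tokens & SINGLE_TERMS:
        return True
    return any(a in tokens and b in tokens for a, b in PAIRED_TERMS)

tweets = [
    "Standing with #prolife marchers today",
    "Should you ever abort a baby? Big debate tonight",
    "Had to abort the mission, the baby woke up",  # false positive worth auditing
    "Texas legislature votes on new budget",
]
matches = [t for t in tweets if is_abortion_related(t)]
```

The third sample tweet matches even though it has nothing to do with abortion, which is exactly the kind of noise the data audit discussed below would need to catch.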

Here is some basic info on the tweets I collected:

Total volume of tweets over time (x = month of 2013, y = number of tweets)

You can see that the volume varies quite a bit. Here are the top five words used each month after removing stopwords (very common English words), each followed by the number of times it appeared in that month’s tweets:

Jan: prolife (5,783), women (2,894), life (2,540), roe (2,460), baby (2,276)
Feb: prolife (3,300), women (1,965), baby (1,606), bill (1,334), prochoice (1,307)
Mar: prolife (3,026), dakota (2,528), north (2,441), ban (2,030), baby (1,926)
Apr: gosnell (14,940), prolife (6,591), clinic (5,682), trial (4,586), baby (4,081)
May: gosnell (6,672), prolife (5,483), murder (3,401), doctor (3,301), baby (3,156)
Jun: texas (12,705), bill (11,218), women (6,538), prolife (6,530), filibuster (4,771)
Jul: texas (14,946), bill (11,675), prolife (8,196), women (7,289), law (5,142)
Aug: prolife (4,958), women (3,112), tcot (2,493), prochoice (2,343), like (1,822)
Sep: prolife (3,572), pope (2,950), baby (2,577), women (2,238), church (1,884)
Oct: prolife (3,905), texas (3,393), baby (2,562), judge (2,550), law (2,272)
Nov: weeks (5,254), texas (4,721), baby (4,634), prolife (4,285), women (3,596)
Dec: praytoendabortion (28,535), prolife (5,228), life (5,082), women (4,954), baby (4,083)
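Both the monthly volume chart and these top-five lists come down to counting. A minimal sketch, assuming each tweet has already been reduced to a (month, text) pair; the tiny stopword set is a stand-in for NLTK’s English list (`nltk.corpus.stopwords.words("english")`, which needs a one-time `nltk.download("stopwords")`), and `monthly_stats` is a hypothetical name.

```python
import re
from collections import Counter, defaultdict

# Stand-in stopword list; NLTK's English stopwords are far more complete.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on", "for", "rt"}
TOKEN_RE = re.compile(r"[a-z]+")

def monthly_stats(tweets, top_n=5):
    """Return (tweet volume per month, top_n non-stopword words per month).

    `tweets` is an iterable of (month, text) pairs, e.g. ("Jun", "...").
    """
    volume = Counter()
    words = defaultdict(Counter)
    for month, text in tweets:
        volume[month] += 1
        tokens = TOKEN_RE.findall(text.lower())
        words[month].update(t for t in tokens if t not in STOPWORDS)
    # most_common returns (word, count) pairs, highest count first
    top = {m: c.most_common(top_n) for m, c in words.items()}
    return volume, top

sample = [
    ("Jun", "Texas bill filibuster"),
    ("Jun", "the Texas bill and women"),
    ("Jul", "Texas law"),
]
volume, top = monthly_stats(sample, top_n=2)
```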

Some of the volume appears to be driven by news events, such as Texas state senator Wendy Davis’s June filibuster to block a restrictive abortion bill. Other drivers perhaps include Twitter campaigns (#praytoendabortion). This is also a good point at which to audit one’s data, zooming in on weird findings to check that the data is properly cleaned. I didn’t have time to do that here, but if I had, I would look at the tweets behind the stranger top-five words I didn’t understand, and either learn something new about abortion or find ways to remove invalid tweets from the dataset.

The last thing I did was analyze the language in the tweets for moral values. This is part of a larger research project on modeling ideology and linking it to policy change. You can see a more complete version of this work, from when I looked at same-sex marriage, here. Another important step, which I am skipping, is validation: checking whether numbers crunched from the data correlate with traditionally collected data, such as census or poll numbers.
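Validation of this kind usually means correlating a tweet-derived metric with an external series measured over the same periods. Here is a minimal Pearson correlation sketch; both series are made-up placeholder numbers, not results, and pairing monthly tweet metrics with monthly poll readings is an assumption about how one might set this up.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Placeholder values only: a monthly tweet-derived metric and a poll series.
tweet_metric = [0.12, 0.15, 0.11, 0.18]
poll_support = [41.0, 45.0, 40.0, 46.0]
r = pearson(tweet_metric, poll_support)
```

With only twelve monthly points in a year of data, a correlation like this is a sanity check rather than proof.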

To analyze moral values, I am using a supplemental LIWC dictionary, built by political psychologists and linguists, that attempts to match keywords with underlying moral values. The five moral values we use come from research on moral foundations by social psychologist Jonathan Haidt and colleagues. They are an attempt to understand the underlying values that people find important. Do you care more about fairness, or more about loyalty and authority? Not surprisingly, these moral foundations correlate somewhat with liberal or conservative ideology.

So, using the keywords assigned to each moral foundation, I computed the relative occurrence of each foundation within every tweet, then averaged that relative occurrence across all tweets in a month. The following is the outcome:

Moral Values Over Time (x=month in 2013, y=average relative occurrence in tweets)



Though we see more authority and harm language than language for the other three foundations, this doesn’t necessarily mean people think more about those values, because the keyword lists can’t capture every way a value is expressed and may cover some values better than others. We can, however, track a single value over time. For instance, it’s pretty notable how purity language jumps in October. I didn’t have time to dig into why, but that would be my next step.
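The per-tweet relative occurrence and monthly averaging described above can be sketched as follows. The word lists are tiny hypothetical placeholders, not the actual moral foundations dictionary, which has many more entries per foundation; entries ending in `*` are treated as prefix matches, as in LIWC dictionaries. Defining “relative occurrence” as matching words divided by total words in the tweet is an assumption.

```python
import re
from collections import defaultdict

# Hypothetical placeholder entries; the real dictionary is far larger.
FOUNDATIONS = {
    "harm": ["harm*", "kill*", "protect*"],
    "fairness": ["fair*", "equal*", "rights"],
    "loyalty": ["loyal*", "together"],
    "authority": ["law*", "obey*", "authorit*"],
    "purity": ["pure*", "sacred", "sin"],
}
TOKEN_RE = re.compile(r"[a-z']+")

def matches(token, entry):
    """LIWC-style match: 'law*' matches any token starting with 'law'."""
    if entry.endswith("*"):
        return token.startswith(entry[:-1])
    return token == entry

def relative_occurrence(text):
    """Share of a tweet's words that match each foundation's word list."""
    tokens = TOKEN_RE.findall(text.lower())
    if not tokens:
        return {f: 0.0 for f in FOUNDATIONS}
    # A token counts at most once per foundation, even if several entries match.
    return {
        f: sum(any(matches(t, e) for e in entries) for t in tokens) / len(tokens)
        for f, entries in FOUNDATIONS.items()
    }

def monthly_averages(tweets):
    """Average each foundation's relative occurrence over a month's tweets."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for month, text in tweets:
        counts[month] += 1
        for f, share in relative_occurrence(text).items():
            sums[month][f] += share
    return {m: {f: s / counts[m] for f, s in fs.items()} for m, fs in sums.items()}

sample = [("Oct", "a sacred and pure life"), ("Oct", "obey the law")]
averages = monthly_averages(sample)
```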

In future work, I intend to look at these traits broken down by state. One way to do that is to analyze the location field people specify in their profiles and try to match it to a state. I would also like to look at several of the other LIWC categories and come up with some features of my own. Finally, it would be interesting to look at how features change over time, for instance leading up to and after key events.
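Matching the free-text profile location to a state could start with something as simple as the sketch below. It is a hypothetical, deliberately naive matcher: it knows only a handful of states (a real version would list all 50 plus D.C.), checks full state names longest-first so that “West Virginia” is not mistaken for “Virginia”, then falls back to capitalized two-letter postal codes.

```python
import re

# Partial, illustrative table; a real version would cover every state.
STATES = {
    "texas": "TX",
    "north dakota": "ND",
    "massachusetts": "MA",
    "virginia": "VA",
    "west virginia": "WV",
}
ABBREVIATIONS = set(STATES.values())

def location_to_state(location):
    """Guess a US state postal code from a profile location, or return None."""
    if not location:
        return None
    lowered = location.lower()
    # 1) Full state names, longest first so "west virginia" wins over "virginia".
    for name in sorted(STATES, key=len, reverse=True):
        if re.search(r"\b" + re.escape(name) + r"\b", lowered):
            return STATES[name]
    # 2) Two-letter codes written in capitals, e.g. "Boston, MA".
    for token in re.findall(r"\b[A-Z]{2}\b", location):
        if token in ABBREVIATIONS:
            return token
    return None
```

Many profile locations (“the internet”, “somewhere over the rainbow”) will stay unmatched, and with the full table, capitalized words such as “OR” or “ME” would collide with state codes, so the match rate itself is worth reporting.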

Boston’s Urban Orchards

With the weather warming up again, for this week’s assignment I reported on a lighter and sweeter topic: urban orchards in the Boston area.

I was fascinated to learn that there are, in fact, publicly forageable fruit trees and berry bushes around Cambridge, Somerville, Boston, and beyond. This dataset from the City of Boston data portal lists the known plants, and I overlaid it onto a color-coded map.

I was unable to get the embed working in WordPress, which does not allow iframes, so here’s the link!

MIT’s Finances

Gideon Gil, Michael Greshko, and I set out to figure out MIT’s “Brown Book,” its yearly report on its finances. This is a work in progress and more visualizations are coming, but we found a number of interesting facts.

Is San Francisco’s Hot Housing Market Literally On Fire?

This project is a collaboration between David Jimenez, Charles Kaioun, Celeste LeCompte, and Léa Steinacker.

In San Francisco, there is a growing concern about residential fires, which have displaced more than 100 residents from their homes since the beginning of the year. Have there been more big fires? If so, why? We turned to the data to answer the question.

Read on for more background on our analysis.

Hot and Cold in the Media Lab

The new Media Lab building, E14, was opened in 2009. The beautiful building, designed by the famous architect Fumihiko Maki of Japan, celebrates transparency, creativity and collaboration. The new building has also been equipped with various sensors across all internal spaces.

These sensors offer a unique view into the building. In this article I will focus on temperature readings and look into the stories behind the edge cases: the hottest and coldest spaces in E14.

First, some general statistics. E14 tracks 180 spaces: every open space, meeting room and personal office, and even the storage units, are monitored. The average temperature is 21.8°C, which closely matches the average thermostat setting of 21.9°C. The readings I used were taken at 7 p.m. on April 7. The historical data shows that temperatures are stable throughout the day.
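The averages and the two edge cases can be pulled out of a sensor export in a few lines. A minimal sketch, assuming a CSV with hypothetical columns `space`, `temperature_c` and `setpoint_c`; the real Chain API summary file may be laid out differently. The 24.8°C and 13°C rows reuse the readings reported in this post, the third row is a placeholder, and the empty setpoint for E14-396T stands for an unknown value.

```python
import csv
import io
from statistics import mean

def summarize(csv_file):
    """Return mean temperature, mean setpoint, hottest and coldest space.

    Assumes hypothetical columns: space, temperature_c, setpoint_c.
    Rows with an empty setpoint are left out of the setpoint average only.
    """
    rows = list(csv.DictReader(csv_file))
    temps = [(float(r["temperature_c"]), r["space"]) for r in rows]
    setpoints = [float(r["setpoint_c"]) for r in rows if r["setpoint_c"]]
    return {
        "mean_temp": mean(t for t, _ in temps),
        "mean_setpoint": mean(setpoints),
        "hottest": max(temps),  # (temperature, space) of the warmest reading
        "coldest": min(temps),  # (temperature, space) of the coolest reading
    }

sample = io.StringIO(
    "space,temperature_c,setpoint_c\n"
    "E14-433,24.8,22.0\n"
    "E14-396T,13.0,\n"
    "E14-PLACEHOLDER,21.5,21.9\n"
)
stats = summarize(sample)
```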

Warm and Empathetic – Opera of the Future


The highest temperature reading in E14, 24.8°C, came from E14-433, the living room of the Opera of the Future group on the upper deck of the Swatch Lab. A quick visit reveals that the space is indeed warmer than others in the building, despite being an open space with a thermostat set to 22°C. It is quite possible that heat from the entire Swatch Lab accumulates at this spot.

Regardless of the reason, the warmth suits the Opera of the Future group well, fitting right in with the creativity and empathy that guide the group in its work.
(waiting for a comment from Tod Machover)

Cold and Mysterious – E14-396T


The lowest temperature reading in E14, 13°C, was measured in E14-396T. A mysterious locked door and a room number sign are all the innocent spectator has access to. Although it’s conceivable that behind the door is a storage unit or an electrical breaker box, I can’t help but wonder: in the mystical playground that is the Media Lab, could an off-the-grid experiment be hidden behind that door, one that requires a cool 13°C?

Update: According to the Media Lab facilities department, this space belongs to MIT IS&T and is dedicated to communications. Intriguingly, no one in the Media Lab has access to it.

About the data: 
All the data was collected through the Responsive Environments Chain API, an open-source sensor data aggregation framework.
Media Lab sensors summary: (CSV)
You can also see this data live in the DoppelLab project.



Visualizing Newspaper Words

For this assignment I worked on a project that evolved and became part of a larger one. I’m currently collaborating with my hometown university (ITESO) to understand what newspapers are publishing about political candidates. With elections for mayors and the local congress 60 days away, the team in Mexico is collecting campaign news every day.

Treating the news as a dataset, I processed it with Wordij, a tool that generates semantic networks from .txt files. The software also produces word counts as CSV files, which were processed in Excel and then visualized in Tableau. With all the data in place, we can run queries to see how many times each candidate has been mentioned by each newspaper, and by adding data every day we are building a larger picture of whom the newspapers are talking about.
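The core query, how many times each candidate is mentioned by each newspaper, is a grouped count. The pipeline here ran through Wordij, Excel and Tableau; the Python sketch below is a swapped-in alternative that counts directly from article text, assuming articles arrive as (newspaper, text) pairs. The newspaper names and sentences are placeholders; the surnames are the candidates mentioned in this post.

```python
import re
from collections import defaultdict

CANDIDATES = ["Alfaro", "Villanueva", "Petersen"]

def mentions_by_newspaper(articles):
    """Count whole-word, case-insensitive candidate mentions per newspaper.

    `articles` is an iterable of (newspaper, text) pairs.
    """
    patterns = {c: re.compile(r"\b" + re.escape(c) + r"\b", re.IGNORECASE)
                for c in CANDIDATES}
    # Each newspaper starts with a zero count for every candidate
    counts = defaultdict(lambda: dict.fromkeys(CANDIDATES, 0))
    for paper, text in articles:
        for candidate, pattern in patterns.items():
            counts[paper][candidate] += len(pattern.findall(text))
    return counts

articles = [
    ("Diario A", "Alfaro y Villanueva debaten; Alfaro responde."),
    ("Diario B", "Petersen presenta su plan."),
]
counts = mentions_by_newspaper(articles)
```

Running it on each day’s new articles and adding the results to the running totals mirrors the add-data-every-day approach described above.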


As part of another project I’m involved with to monitor the political campaigns, we decided to include the data-viz tool on the site. All the information is in Spanish, but if anyone wants to try the tool, you can use the right-side panel to adjust dates and word frequency or to search for specific words: for example, the political parties “PRI”, “PAN” or “MC”, or the candidates’ last names “Villanueva”, “Alfaro” or “Petersen”. The visualization is here.

Visualization showing queries for “Alfaro”, “Villanueva” and “Petersen”

The Bright Knight of Boston: Comparing Streetlight Density and Crime Density

Crime is often associated with how people perceive the environment they live in. Some places in a city are more dangerous than others. But can this be attributed, at least in part, to the infrastructure within the community?

One hypothesis involves the brightness of the surrounding environment. The relationship may not run both ways, but in unsafe regions, dark areas are perceived as especially dangerous. To test this hypothesis, I use Boston Police crime report data and map the crimes occurring at night against streetlight deployment. Although incident reports appear nearly everywhere, crime density still seems noticeably higher in regions that are less safe and have fewer streetlights.

Further, when we divide the crime incidents by type, we observe a heavy tail of crimes that occur far from the nearest light, emphasizing how unsafe dark areas are perceived to be. For example, crime types such as vandalism, burglary, and forgery show clear heavy tails. It is interesting that crime categories like vigilante and violence also share this characteristic to a certain degree. This quick study offers a new perspective: interventions that change how safe an area feels may be one way to think about solving current municipal problems.
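The distance-to-nearest-light measure behind these tails can be computed with the haversine formula. A brute-force sketch, assuming crimes and streetlights are available as (latitude, longitude) pairs in degrees; the coordinates are placeholders, not Boston Police or streetlight records, and the 100 m cutoff is an arbitrary illustrative threshold. For tens of thousands of points, a spatial index such as a k-d tree would be much faster than comparing every pair.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))

def nearest_light_distances(crimes, lights):
    """For each crime location, the distance in meters to the closest light."""
    return [min(haversine_m(c, l) for l in lights) for c in crimes]

def share_beyond(distances, cutoff_m):
    """Fraction of crimes farther than cutoff_m from any light (a tail measure)."""
    return sum(d > cutoff_m for d in distances) / len(distances)

# Placeholder coordinates near downtown Boston.
lights = [(42.3601, -71.0589), (42.3611, -71.0570)]
crimes = [(42.3602, -71.0589), (42.3650, -71.0500)]
distances = nearest_light_distances(crimes, lights)
tail = share_beyond(distances, cutoff_m=100)
```

Computing `share_beyond` separately for each crime type’s distances is one way to compare how heavy the tails are across types.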

Follow the link to see the visualization of the data analysis (Note: the page may take some time to load).

Jackpot … or not? New Haven School Lottery Odds

By Melissa Bailey and Audrey Cerdan.

Every year, thousands of New Haven parents try their luck in a lottery for schools of “choice.” Most walk away disappointed: In 2012, “9,333 local and suburban students applied for 2,677 open seats at 29 charter and magnet schools covering grades pre-K to 12,” according to Bailey. Students who live near the school, or already have a sibling in the school, get first dibs on open seats.
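Priority rules are what can push some applicants’ odds to exactly zero: seats go to the highest-priority group first, and a lower group is drawn from only if seats remain. A simplified sketch, assuming a single school fills its seats tier by tier with a uniform random draw inside the tier that runs out of room (the real lottery has more rules, and students apply to several schools). The school and applicant counts are hypothetical; for scale, the 2012 totals quoted above work out to roughly 29 seats per 100 applicants.

```python
def tiered_odds(seats, tiers):
    """Chance of admission for one applicant in each priority tier.

    `tiers` lists (tier_name, applicant_count) from highest to lowest priority.
    Higher tiers are seated first; the first tier that does not fit is
    decided by a uniform random draw, and every tier after it gets zero.
    """
    remaining = seats
    odds = {}
    for name, applicants in tiers:
        if remaining >= applicants:
            odds[name] = 1.0
            remaining -= applicants
        else:
            odds[name] = remaining / applicants
            remaining = 0
    return odds

# Hypothetical school with 50 open seats.
odds = tiered_odds(50, [("sibling", 20), ("neighborhood", 60), ("everyone else", 300)])

# Overall ratio from the 2012 figures quoted above.
overall = 2_677 / 9_333
```

In this toy example, Maria, living across town with no sibling at the school, would be in the last tier, and her odds would be exactly zero.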

Bailey wrote about this in a 2013 article. The interviews were good, but the data was … not so easy for the reader to picture. So Bailey recruited top Parisian digital journalist Audrey Cerdan to try to visualize it better.

Our mission: Give disappointed parents a better sense of their odds in the lottery, to inform their future choices. For example: Maria really wants her daughter to go to Barnard school, which has a fancy vegetable garden and a good reputation. But she lives way across town. What are her chances of getting in?

We answer that question — and many more — in this exciting data visualization. Click “search” to try it out.


We used Caspio because Bailey liked the interactive dropdown menus and got a free account at a recent Super Computer-Assisted Reporting Conference. BUT it was not so easy to use. And in the end we decided bar charts weren’t the best choice.

Our major hurdle was how to represent the concept of “zero,” as in: you have zero chance of getting into that school if you don’t live nearby. At the end of our struggle with Caspio, we thought of a better way to visualize the data, one we think would give a much better sense of a person’s odds. Here’s a sketch by Cerdan:


Data Story: Ferguson A Timeline

Using data aggregated from major news outlets, I made a timeline with Timeline JS about the shooting of Michael Brown in Ferguson, MO (August 2014).

I then used Weebly to build a site that hosts my timeline and lets me analyze the information I discovered from each of the major news outlets and various other timelines.

Once on the website, click “about” to read my analysis and discoveries (i.e., the differing information each news source chose to stress or neglected to report), and to find links to other developments related to Ferguson.
