For the final class project, I want to do something with the data collected by the University of Hong Kong Journalism and Media Studies Centre’s WeiboScope Search project. In class last week, Ethan Zuckerman suggested that one option might be an online art piece built from the most censored Chinese words on Sina Weibo. Out of curiosity, I pulled together a draft list of the 100 most censored words to see what came up. Here’s a quick translation:
转发微博 | retweet weibo (simplified Chinese) |
转 | retweet |
转发 | retweet |
轉發微博 | retweet weibo (traditional Chinese) |
哈哈 | ha ha |
偷笑 | smile |
嘻嘻 | hee hee |
呵呵 | he he |
哈 | ha |
哈哈哈 | ha ha ha |
蜡烛 | candle |
怒 | anger |
吃惊 | surprise |
泪 | tears |
围观 | crowd |
话筒 | microphone |
思考 | think |
赞 | praise |
威武 | mighty |
求证 | confirm |
衰 | decline |
挖鼻屎 | pick boogers |
The most common words are the Chinese equivalent of “retweet” or “RT.” The next most common are expressions, such as “ha ha” or “anger.” It doesn’t make much sense that the 50 Cent Party is simply censoring emotions, so I’ll need to find a way to dig one layer deeper.
Hi, I’ve joined some groups on SoundCloud that do field recordings, and I’ve found recordings from China. If you are interested, I’d like to try adding a soundtrack to the censorship story and see if it works in any way.
I could add the track just above the translations, after the intro, and people could respond. If no one likes it, I’ll just take it down. What do you reckon?
H
Hi Chi Chu,
A common approach to this is to create a list of “stop words” — things which you have identified as meaningless for your analysis — and remove them from the data source before you do your analysis. That might help you dig further.
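Here is a minimal sketch of that idea in Python. It assumes the posts have already been split into individual words (Chinese text needs a word segmenter for that, which is not shown here). The stop-word list is just the retweet and laughter terms from the post above, and the function name `top_terms` and the sample data are made up for illustration.

```python
from collections import Counter

# Illustrative stop-word list (an assumption): the retweet markers and
# laughter terms from the list in the post. Extend it as you find more noise.
STOP_WORDS = {
    "转发微博", "转", "转发", "轉發微博",
    "哈哈", "偷笑", "嘻嘻", "呵呵", "哈", "哈哈哈",
}


def top_terms(tokens, stop_words=STOP_WORDS, n=100):
    """Count tokens, skipping stop words, and return the n most common."""
    counts = Counter(t for t in tokens if t not in stop_words)
    return counts.most_common(n)


# Made-up sample tokens, standing in for words pulled from censored posts.
sample = ["转发微博", "蜡烛", "哈哈", "蜡烛", "求证", "转发"]
print(top_terms(sample, n=2))
```

You would probably go back and forth a few times: run the count, look at what still seems meaningless, add it to the stop-word list, and run it again.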