If you are interested in Chinese Internet censorship, I highly recommend you flip through Xia Chu’s latest update to his* research project “Complete GFW Rulebook for Wikipedia.” The latest revision of a document originally released last October, it identifies a massive list of actual trigger words (which Xia calls “rules” because they are often attached to specific conditions) that cause a Chinese Internet user’s connection to specific sites like Wikipedia to be disrupted by the Great Firewall (GFW). Not only that, it also includes a list of over 3,600 websites that he has currently confirmed to be unreachable from within China due to the GFW. The conclusions in the paper don’t necessarily upend anything that we thought about the GFW, but if you want a peek behind the curtain of how the GFW works (big takeaway: IT’S REALLY HAPHAZARD), this is as close as we can currently get.
The methodology behind Xia’s testing is sound and the breadth is among the most comprehensive attempts to document the Great Firewall’s blacklisted keywords—though Xia notes his debt to Jed Crandall et al’s ConceptDoppler paper, GreatFire.org, and others, including arrested civil rights lawyer Xu Zhiyong, for inspiring him. The paper is mostly jargon-free, and the testing process used is transparent and not at all ultra-sophisticated (a compliment!); an amateur coder like myself could replicate everything that Xia has done in the paper. The paper is pretty self-explanatory and there’s not much commentary for me to add, but below are a few notes I’ll make including a description of a similar tool I’ve developed for identifying sensitive keywords in Chinese news articles as well as how there are curious coincidences between how Sina Weibo and the GFW censor.
Looking back on 2013: Five Blocked on Weibo posts I particularly liked from last year
2013 has personally been an incredibly fun year. I finished grad school, my book was published, and I started working for this neat research lab. Chinese Weibo users though, especially prominent ones, had a much rougher time, with increased harassment and censorship by authorities inducing an unfortunate chill on discussion of sensitive topics on the site. Here’s hoping the next year brings a relaxation of such policies: I couldn’t be happier if I had nothing to write about on this blog.
So before we move on to 2014, a look back at five Blocked on Weibo keywords and posts that I particularly enjoyed uncovering and writing about in the past year:
Infographic showing Weibo censorship being linked to offline events in Bo Xilai scandal
In the infographic below, we have collected data from a number of sources, including GreatFire.org, China Digital Times, Blocked on Weibo, and Twitter users to chart the moments when Bo’s name became blocked or unblocked on Weibo. The speculation is that the authorities blocked his name when online conversations got too unpredictable to control and unblocked it when they sought to give netizens the space to criticize Bo. We have lined up those moments with what was taking place offline at the same time, showing how real-life political turmoil was often reflected in changes in censorship online. Click to launch the interactive infographic.
Concept, Research, and Authorship: JASON Q. NG
Design: JANE GOWAN and ANDREW HILTS
Development: ANDREW HILTS
Update: The Chinese keywords on messaging app LINE’s “bad words” list and why they are “bad”
Last week, the research lab I pitch in at published the first in a series of posts investigating censorship and privacy concerns in three chat applications: WeChat, LINE, and KakaoTalk. These instant messaging programs, which often replace text messages on smartphones, are expanding rapidly across the world. While WeChat has garnered most of the foreign press, LINE, a Japanese subsidiary of the Korean Internet giant Naver, is no pushover: it has over 200 million registered users, generated $130 million in revenue last year, and is poised for a $10 billion market cap value when it goes public next year.
In addition to the series of 21 blog posts I did on the first chunk of the original list of uncovered “bad words” in LINE, I have translated the remainder of the 150 keywords on the original list as well as translated the majority of the 370 keywords on the recently decrypted list in the following spreadsheets:
The Chinese keywords on messaging app LINE's "bad words" list and why they are "bad"
For an updated summary of the Citizen Lab’s excellent research into censorship on LINE to which I contributed, see this Nov 21, 2013 post.
Back in May, Twitter user @hirakujira was poking around in the code for Lianwo, the Chinese version of the popular mobile chat app LINE, when he noticed a curious line: “<key>warning.badWords</key>” followed by a string that read in Chinese: “Your message contains sensitive words, please adjust and send again.” Hirakujira subsequently identified the application files which contain these so-called “bad words” and posted them. The Next Web and Tech in Asia reported that even though LINE (which is a Japanese spin-off of Naver, a Korean company) wasn’t yet actively censoring messages sent through its Chinese-branded app, the inclusion of such files indicated they had built such a capability into the program, a forward-thinking move for any foreign content provider/distributor that hopes to succeed in China.
What I hope to do in the next few days is to take a closer look at the first roughly 40 of the 150 words that Hirakujira posted, translating and explaining the significance of those words with respect to current Chinese politics. Some are quite obvious, but others are quite obscure. By examining the words, we may hope to get a sense of what LINE thinks is worth censoring in order to appease their Chinese regulators. So for the next few days, consider this site rebranded as “Blocked on LINE (maybe in the future).”
The twenty-one posts (links will be added as the posts go up):
"People are often torn when they start, but later they go numb and just do the job," said one former censor, who left because he felt the career prospects were poor. "One thing I can tell you is that we are worked very hard and paid very little."
“Our job prevents Weibo from being shut down and that gives people a big platform to speak from. It’s not an ideally free one, but it still lets people vent,” said a second former censor.
They said women shunned the work because of the night shifts and constant exposure to offensive material.
2. Can we have a whole book of this? My appetite has only been whetted. Some day maybe there’ll be a Chinese remake of The Lives of Others, but with a Weibo censor instead of a Stasi one. (It might be hopelessly boring though if it’s anything like what the article makes their lives out to be.)
3. The number of censors cited in the article (“40 censors work in 12-hour shifts” and “100 people worked non-stop for 24 hours” during intense periods) fits with the theory that much of the censorship is computer assisted. Without using algorithms to flag posts, Zhu et al’s paper (The Velocity of Censorship) estimated that you’d need 4,200 censors to manually read every new post on Weibo, which would be inefficient. But is this Tianjin office the only such facility Sina Weibo has, or is it just one of many? The article doesn’t make that clear.
4. “The most frequently deleted posts are the political ones, especially those criticising the government…” said one censor. This is interesting in light of Gary King and co’s argument that the primary cause for deletion of posts online is the material’s potential for collective action, meaning whether there might be actual demonstrations or protests stemming from the post. Their argument (I’m oversimplifying of course) based on the data they’ve collected is that you can criticize the authorities as much as you want, so long as you don’t call on others to do something together. They might get a pass because their first research paper (How Censorship in China Allows Government Criticism but Silences Collective Expression) doesn’t cover Weibo, but their new paper does (A Randomized Experimental Study of Censorship in China). So either King’s conclusions really don’t apply to Weibo or this one censor doesn’t really have a good sense of what his team is actually censoring. Then again, calling posts “political” doesn’t exclude them from also being calls for collective action, but there are vast numbers of critical, anti-gov posts that don’t call for such action. My gut is that the King conclusion may not apply well to Weibo or is perhaps an overreach, but as far as folks can tell, their data seems pretty irrefutable (would love to get my hands on the second paper’s dataset).
5. For the most part, the details revealed about how the censorship works jibe with the research out there (high five researchers!). For instance, Zhu et al’s estimate of how efficient censors are (can read 50 posts per minute) is exactly what the censors report (3000 posts per hour).
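The arithmetic behind points 3 and 5 can be sketched in a few lines of Python. The 50-posts-per-minute reading rate, the 3,000-posts-per-hour figure, and the 4,200-censor estimate come from the text above. The posting volume (70,000 new posts per minute) and the three-shift day are assumptions used here because they reproduce the 4,200 figure; check them against the paper before relying on them. The function name is made up for illustration.

```python
import math


def censors_needed(posts_per_minute, reads_per_censor_per_minute, shifts_per_day):
    """Head count needed to manually read every new post, summed over all shifts."""
    # Censors who must be on duty at any given moment to keep up with new posts.
    on_duty = math.ceil(posts_per_minute / reads_per_censor_per_minute)
    # Each shift needs its own full crew.
    return on_duty * shifts_per_day


READS_PER_MINUTE = 50  # per-censor reading rate cited above

# Point 5: 50 posts per minute is the same rate as 3,000 posts per hour.
posts_per_hour = READS_PER_MINUTE * 60

# Point 3. ASSUMPTION: 70,000 new posts per minute and three shifts per day
# are illustrative inputs, not figures given in this post.
total = censors_needed(70_000, READS_PER_MINUTE, 3)

print(posts_per_hour, total)
```

Under those assumptions the result is two orders of magnitude above the 40-person shifts the article describes, which is why the head count points toward computer-assisted filtering.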
Interactive charts showing changes in Weibo keyword censorship (Jun - Aug 2013)
Thanks to the excellent work being done by researchers and journalists at China Digital Times, GreatFire.org, and many others, there has never been more information about what is being censored online in China. However, what is less discussed and written about are instances when the censors withdraw keywords or topics from their censorship watchlists.
480 keywords blocked from searching on Weibo as of Jun 29, 2013
During the past month, I’ve been working as a summer research fellow at The Citizen Lab in Toronto. It’s been great to not only have time to dedicate to updating this blog and pushing forward collaborative projects with researchers I’ve been fortunate enough to meet over the past two years, but also to pitch in with all the amazing work being done here at this one-of-a-kind lab. Among the projects I’ve been helping out with is one pertaining to the list of censorship and surveillance keywords in the Chinese chat clients TOM-Skype and Sina UC, which the team decrypted then analyzed in collaboration with Jed Crandall and Jeff Knockel at the University of New Mexico.
Of course, my first desire was to take the keywords they extracted and to test them on Weibo. Below are 480 unique keywords which were blocked from searching on Sina Weibo as of June 29. I’ve written more about the other censorship games I’ve detected in this post over at The Citizen Lab’s blog. Among the things I discuss are the overlap of keywords between different Internet services in China as well as what drastic changes in the number of search results for keywords might mean.
A full spreadsheet of the data mentioned in the report can be viewed in this Google Fusion Table or downloaded in .csv format for further analysis by all you researchers reading along at home. I look forward to sharing other relevant work my colleagues and I get done at the Lab during the rest of the summer.
“The removal of “June 4” from the list of blocked terms—an area of much ridicule for Weibo both in Western media and among Chinese netizens, many of whom evade the censors by using alternative coded slang to stand in for sensitive keywords—may be a sign that Weibo has become more comfortable trusting its human censors to manually delete sensitive posts quickly and effectively. They’re slowly moving away from the crutch of the keyword block, which while certainly effective at preventing the spread of sensitive information, is also at times overly broad and not responsive enough to more precise needs. … What is and is not off-limits has now become slightly harder to determine—another step in making censorship invisible and all-pervasive.”—My article “Weibo Keyword Un-Blocking Is Not a Victory Against Censorship” on Tea Leaf Nation, cross-posted on The Atlantic website
Which Chinese politicians get blocked (from criticism) on Weibo
A momentary break from the usual posts: first, a thank you to Tumblr for featuring the blog in its Tumblr Radar. Glad to reach out to new folks and hope you all continue to find this interesting. Second, I’ll be presenting a draft paper I wrote with Pierre Landry at the Chinese Internet Research Conference held this year at the Oxford Internet Institute. I’m flying out tomorrow and look forward to meeting all the attendees. Below is the abstract of the paper and a link to the pdf. After the jump, find out which 19 CCP politicians are still explicitly blocked on Weibo (they’re probably not the ones you expect).
This paper seeks to use the dynamics of Internet censorship by China’s most important social media site, Sina Weibo, to achieve a better understanding of the 18th National Congress of the Chinese Communist Party in November 2012. To this end, searches were performed daily on the names of all 2,270 delegates to the Party Congress on Sina Weibo for five weeks before and after the event. Data recorded included information on the number of results reported and whether the keywords were reported to be blocked or not. As a complement to work by researchers including Gary King, David Bamman, King-wa Fu, and Tao Zhu into Chinese social media censorship, our study concludes that Sina Weibo actively manipulated and filtered the search results of Communist Party delegates—particularly higher-ranked and incumbent officials—during the observation period, with an apparent decrease in search blocks after the Party Congress. This study offers evidence that the Party, through proxies like Sina Weibo, proactively attempts to shape public opinion online, just as they do in traditional media. The decrease in search blocks perhaps indicates that the Party is still seeking to find a balance between utilizing the Internet as a check on officials and suppressing social media to prevent dissent; or perhaps it is a short-term effect due to a new wave of leaders taking office.
“Wikipedia may be hesitating to switch to HTTPS-only because they fear they could be blocked completely in China. The fact that the censors have not fully blocked Gmail and Github, which have already switched to this HTTPS-only approach, speaks against this. On the other hand, the fact that Wikipedia has been fully blocked in the past shows that it’s a possibility. We argue that even if Wikipedia is blocked, that is better than the current, censored version. The reason that Wikipedia is better than for example Baidu Baike is that it’s not censored. By allowing the authorities to selectively censor articles, that whole argument is lost. Wikipedia should take a bold step clearly showing that they do not accept any level of censorship.”—The good folks at GreatFire.org make a forceful argument that Wikipedia should enable HTTPS by default in China in order to prevent the continued censorship of sensitive articles (more background | 中文).