<List(s) of Chinese keywords for censorship testing and sensitive content collection>

Last week, a researcher during The Citizen Lab’s annual Connaught Summer Institute workshop raised an interesting problem. She wanted to test for censorship on a Chinese online service, and she had somewhat limited resources and time. What keywords should she use for her test?

In theory, this is a solved problem, given the numerous lists of censored and sensitive Chinese keywords available on the web, including those shared by this site. However, a given keyword list may be too broad for one’s purposes, or may simply contain too many keywords to use efficiently. And what if I only want to test the most sensitive keywords, e.g., Falun Gong, June 4, Xi Jinping, and so on? For those not experienced in Chinese or Internet censorship, winnowing existing lists down to something more usable can be a daunting task.

Thus, a few of us sat down at the workshop, collected 8 known Chinese keyword lists (see below), and aggregated them into a single, easily shareable and sortable file, which we’ve posted to Github. The CSV files contain not just the keywords but also other info like translations and tags (though not for every keyword; it’s an ongoing, open-source project, and you are welcome to contribute).

As of Aug 4, there are 8,087 sensitive keywords collected from 8 different lists. To get a sense of what data is included in these CSV files, you can view a spreadsheet of these 8,087 keywords sorted by the number of lists they appear on.
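For readers who want to do something similar with their own lists, here is a minimal sketch of the aggregation step: merge several keyword lists, record which lists each keyword appears on, and sort by that count. The list names (`list_a`, etc.), the sample keywords, and the output column names are placeholders I made up for illustration; they are not the actual layout of the CSV files on Github.

```python
import csv
import io
from collections import defaultdict


def aggregate(lists):
    """Return (keyword, [list names]) rows, most widely listed keywords first."""
    sources = defaultdict(set)
    for list_name, keywords in lists.items():
        for kw in keywords:
            kw = kw.strip()
            if kw:  # skip empty entries
                sources[kw].add(list_name)
    # Sort by number of lists (descending), then by keyword for stable output.
    return sorted(
        ((kw, sorted(names)) for kw, names in sources.items()),
        key=lambda row: (-len(row[1]), row[0]),
    )


def to_csv(rows):
    """Serialize the aggregated rows as CSV text (column names are assumptions)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["keyword", "list_count", "lists"])
    for kw, names in rows:
        writer.writerow([kw, len(names), ";".join(names)])
    return buf.getvalue()


# Hypothetical sample data, not the real lists.
sample_lists = {
    "list_a": ["六四", "坦克", "法轮功"],
    "list_b": ["六四", "法轮功"],
    "list_c": ["六四"],
}
rows = aggregate(sample_lists)
print(to_csv(rows))
```

Sorting by the number of lists a keyword appears on is one rough way to surface the "most sensitive" terms when time and resources are limited, though a keyword appearing on only one list is not necessarily less sensitive.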

| Creator | Tested on/found from | # of keywords | Year | Method + source |
| --- | --- | --- | --- | --- |
| The Citizen Lab | Sina UC | 1,818 | 2013 | reverse engineered from the client; more analysis here; download link |
| The Citizen Lab | Tom-Skype | 2,574 | 2013 | reverse engineered from the client; more analysis here; download link |
| The Citizen Lab | LINE | 673 | 2014 | reverse engineered from the client; more analysis here; download link |
| Jason Q. Ng (Blocked on Weibo) | Sina Weibo | 839 | 2013 | running Wikipedia China article titles through Sina Weibo search; more analysis and book |
| Xia Chu | Great Firewall | 669 | 2014 | HTTP request scans of Wikipedia China articles to see if they’d trigger GFW block; more analysis here; download link (removed duplicates and keywords related to meta and user pages) |
| China Digital Times | Sina Weibo | 2,448 | ongoing | crowdsourced testing of suspected sensitive keywords on Sina Weibo; more analysis on CDT and in CDT’s Grass Mud Horse Lexicon e-book; download link |
| GreatFire.org | Wikipedia | 488 | 2013 | testing to see if Wikipedia pages are available in China; more info; download link |
| Google/ATGFW.org | Google/Great Firewall | 456 | 2012 | ATGFW.org and GreatFire.org reverse engineered the keywords Google was using to warn users of censorship while using their service in China; download link |

To follow future changes to these lists, you can follow the Github repository. You are encouraged to adapt and update these lists as you see fit; however, please credit the Github repo if you do. Hopefully this is helpful to researchers who are searching for sensitive content in Chinese or testing for network interference.

<64 Tiananmen-related words blocked today (June 4, 2014)>


     (photo credit: CND.org?)

Today, on the 25th anniversary of troops being ordered into Tiananmen Square to clear student demonstrators, I tested several hundred June 4-related keywords on Weibo. I used the same set of keywords that I tested last year at The Citizen Lab, and I found relatively similar levels of censorship this year. Also like last year, the enhanced restrictions on keyword searches were apparently implemented specifically for the anniversary: for instance, 坦克 (tank) and 六四 (6-4, i.e., June 4) were free to be searched as recently as May 11.

A short article over at WSJ provides some more context and lists those 64 keywords that I identified as being blocked from searching on Weibo today. Of the keywords I tested, there were nine that were unblocked last year but have been added to the blacklist this year, including 八九 (89), 维多利亚公园 (Victoria Park, the site of the commemorative vigil in Hong Kong), and VIIV (roman numerals for June 4).

After the jump are the 64 keywords that are currently blocked from searching on Weibo (though some will no doubt be unblocked once this sensitive period passes):


地铁涨价方案 (subway price increase plan / dìtiě zhǎngjià fāng’àn) refers to Beijing’s plan to shift its subway fares from the current rate of 2 yuan per trip (roughly 30 cents) to a distance-based price system. Beijing has among the cheapest subway fares in China, and the system is heavily subsidized by the state, with the government claiming they lose 5 yuan on every trip.

Why it is blocked: The plan was floated in late 2013, igniting outcry among subway users online after forthcoming price adjustments were officially confirmed in March of this year. The story picked up steam again last week when a photograph of a supposed document outlining the specific price increases was circulated. The alleged document stated that short trips would rise to 3 yuan, and long trips would be capped at 5 yuan.

However, while previous discussion of the price increases appears to have been mostly uncensored, with large numbers of posts criticizing the plan (though a fair number do support it, arguing that the increases may reduce the subway’s notorious overcrowding), posts sharing this new document were censored according to Free Weibo. Authorities quickly denounced the document and stated that pricing plans were still being evaluated. Not all posts containing the photo of the document were deleted: for instance, this one from China Daily, a state newspaper, was allowed to stand as it contained a message refuting the document.

No doubt, Beijing authorities are very sensitive to the potential for increasing unrest surrounding the issue, which gets at the rising income inequality in the city. (More about this in my WSJ article.)

Photo source: AFP/Getty Images via WSJ

On March 1, an organized group of knife-wielding attackers indiscriminately stabbed passengers at a railway station in the Chinese city of Kunming. Over 140 were injured and 33 were reported killed. Suspicions on Weibo and state media turned to Uyghur separatists from the northwestern province of Xinjiang. China Digital Times reported that the government had issued the following directive to news organizations:

Media may publish a moderate amount of criticism and Internet commentary which oppose terrorism and violence and which condemn the killers. However, do not hype this incident.

The following word combinations are found to be currently blocked from searching on Weibo:

  • 恐怖 + 新疆 (terrorist + Xinjiang / has been blocked in past)
  • 砍杀儿童 (children stabbed and killed / also connected to other past incidents)
  • 新疆 + 昆明火车站 (Xinjiang + Kunming train station)
  • 穆斯林 + 昆明火车站 (Muslim + Kunming train station)
  • 维族 + 昆明火车站 (Uyghur + Kunming train station)
  • 东突 + 昆明火车站 (East Turkestan Liberation Organization + Kunming train station)

Note: 昆明火车站 (Kunming train station) on its own is searchable.
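To make the "word combination" idea concrete, here is a small sketch of how such rules behave from a searcher's point of view: a query is blocked only if it contains every term of at least one rule, which is why 昆明火车站 alone still goes through. The rule list is taken from the combinations above; the plain substring matching and the function name `is_blocked` are my assumptions for illustration, since how Weibo actually implements its matching is not known.

```python
# Each rule is a tuple of terms; a query is blocked when ALL terms of ANY rule appear.
# Rules copied from the list above; the matching logic itself is an assumption.
RULES = [
    ("恐怖", "新疆"),
    ("砍杀儿童",),
    ("新疆", "昆明火车站"),
    ("穆斯林", "昆明火车站"),
    ("维族", "昆明火车站"),
    ("东突", "昆明火车站"),
]


def is_blocked(query, rules=RULES):
    """Return True if the query contains every term of at least one rule."""
    return any(all(term in query for term in rule) for rule in rules)


for q in ["昆明火车站", "新疆 昆明火车站", "砍杀儿童", "新疆"]:
    print(q, "->", "blocked" if is_blocked(q) else "searchable")
```

Testing terms both alone and in pairs, as in the list above, is what lets one tell a single-keyword block apart from a combination block.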

秘书帮 (secretary gang / mìshu bāng): the secretaries in question are not your typical clerical workers, but the powerful party secretaries and protégés of embattled retired Politburo member Zhou Yongkang. In the past week, two more close political associates of Zhou’s were detained for investigation for “serious violation of discipline” (a euphemism for corruption) by the CCP’s Discipline Committee. After a pair of China Business Journal articles popularly described the group as a “secretary gang,” the term is now found in articles in local papers, Baidu Baike, and even the official China Youth Daily. “Secretary gang” is also a sort of reference to the Shanghai gang, a term used to criticize former president Jiang Zemin’s close allies, who were also accused of corruption.

Why it is blocked: While none of the articles make direct mention of Zhou Yongkang’s name, it appears that Zhou is being methodically prepared for a downfall. (Update: As of February 27, a Baidu Baike user edited the article’s oblique reference about “a certain retired member of the standing committee” to name Zhou directly; one wonders how long that version will stay.) The investigation of Zhou, who is already reportedly under house arrest, is drawing intense scrutiny from domestic and foreign observers as to how far Xi Jinping is willing to crack down on corruption at even the highest levels of the Party, or use the charge as a fig leaf to take down someone who was once Bo Xilai’s most ardent supporter.

While blocking a politician’s name is often about protecting them from criticism, one might argue that in this case the government is less concerned with protecting Zhou than with controlling any sort of discussion that might spring up from an opening up of Zhou’s misdeeds for public discussion. The government’s crackdown last summer on online rumors—which included the targeting of journalists and anti-corruption watchdogs who were once encouraged by authorities—shows that officials are still incredibly wary of the unpredictable nature of Internet discourse.

七宗罪 (seven deadly sins / qī zōng zuì) are a category of vices that, according to Catholic teachings, threaten a person with eternal damnation. They are wrath, greed, sloth, pride, lust, envy, and gluttony. The phrase was blocked on Weibo until October 2012, at which point it was unblocked. It remains searchable to this day.

Why it is blocked: At the time I was writing my book in 2012, I theorized that the word might be related to religious sensitivity or some other moral issue, but otherwise was quite mystified as to the specific reason for the censorship. However, while reading Emily Parker’s Now I Know Who My Comrades Are, I finally connected the dots: in the book, she interviews numerous Chinese bloggers and activists, including Michael Anti. She notes that in 2002, Anti, who was fed up with traditional Chinese news media, wrote a guide for aspiring reporters entitled Manual for New Journalists (新新闻人自学手册), which included exhortations like:

After we’ve said ‘F**ck’ to the giant media system we can begin our individual journeys through the desert to become new journalists. Maybe we won’t successfully reach the Holy Land of freedom of the press, but at least we will leave the enslavement of truth. (translation from Parker)

The guide included an appendix (也谈中国新闻记者七宗罪 | Sina mirror) that listed the “seven deadly sins” of Chinese reporters: ignorance, cowardice, thirst for power, naïveté, pride, low self-esteem, and despair. The “seven deadly sins” are thus not related to any religious improprieties, but rather the moral failures Chinese journalists should watch out for. The “seven deadly sins” appendix along with the other documents in the manual serve as a passionate attack on traditional Chinese news media and a call for a new generation of journalists to take their place—criticism that authorities no doubt did not look kindly upon.

Note: If you are interested in digital activism, I highly recommend you take a look at Emily Parker’s Now I Know Who My Comrades Are, out next week. I hope to write more about it later, but suffice it to say it’s an insightful look at how some bloggers and activists in China, Cuba, and Russia are using the Internet. Filled with interviews of these folks on the front lines, it was an especially good palate cleanser for me after reading Evgeny Morozov’s To Save Everything, Click Here, yet she also does a very good job acknowledging the limitations of the Internet and those who utilize it. Clearly, as evidenced by this post, I learned a lot from it.

老胡同 (old hutong / lǎo hútòng) is an urban feature found in the historic districts of several Chinese cities, most notably in Beijing. Hutongs are literally the narrow streets or alleys in these old neighborhoods, some of which trace their roots back nearly a thousand years, but hutongs now generally refer to these old neighborhoods themselves and the distinctive style of architecture and traditional culture held within.

Why it is blocked: Hutongs stand as a marked contrast to the new commercial and dense residential buildings found in cities across China. Not surprisingly, hutongs have been the source of numerous controversies, especially in recent years as urban development in China continues. The destruction of hutongs, which admittedly has been ongoing for centuries in China, received particular attention in the run-up to the 2008 Beijing Olympics, for which city officials razed numerous old neighborhoods in order to build infrastructure and modern buildings. It was reported that citizens who were ordered to vacate their homes were undercompensated, and the loss of history was mourned by local residents, preservationists, netizens, and tourists alike. It is this combination of citizens protesting the loss of their homes—and as with any story about land grabs in China, a whiff of corruption between city officials and the developers who stand to profit the most also hangs in the air—and foreign media attention that causes 老胡同 to be a sensitive term.

credit: Sean Gallagher

Updated Jan 30: The peerless Brendan O’Kane smartly points out that 老胡同 could simply be a mocking nickname for Hu Jintao, in which case 老胡同 would translate to Old Comrade Hu. I wasn’t aware whether this nickname was popularly used, but even if not, this would not be the first time an incredibly obscure reference to an official was blocked because it was insulting. However, a little sleuthing reveals that it has been used, although apparently not always in a mocking fashion. Some of the references appear genuine (though how much irony is being lost on me, I don’t know, since one would have to be part of the community to really get whether it’s an in-joke or not).

Another theory (though as @bokane notes, it’s not quite grammatical): It could also be an abbreviation for 胡锦涛老同志, that is, Hu Jintao’s old comrades (同 might also be short for 同学, classmate). In that case, 老胡同 would be a criticism of the Communist Party patronage system, wherein top officials promote and appoint their longtime friends, business partners, and classmates. Hu Jintao wasn’t quite as notorious as some top leaders for bringing his old boys’ network with him to the top (or perhaps he wasn’t as successful at it as Jiang Zemin, whose Shanghai Clique ruled much of Chinese government and business throughout his time in power), but the so-called Youth League Faction was seen as Hu’s base of support. Though Hu Jintao and numerous other top officials are now technically unblocked from searching on Weibo, many combinations of the surnames of Hu, Wen (Jiabao), and Xi (Jinping) with other words are still blocked, and 老胡同 would fit that pattern.

Here’s a long, wide-ranging interview I did with VICE magazine about Internet censorship in China. Thank you Reihan for the thoughtful questions and for putting up with my rambling.

Also, if any folks are in DC, I’ll be giving a lunchtime talk at Georgetown’s School of Foreign Service on Friday, Jan 24. It’s free and open to the public, so please share with any who are in the area and might be interested. Much appreciated.

<Comments and takeaways from Xia Chu’s “Complete GFW Rulebook for Wikipedia”>

Note: Update to “64-byte search string limitation indicates Weibo and GFW” section added Jan 11

If you are interested in Chinese Internet censorship, I highly recommend you flip through Xia Chu’s latest update to his* research project “Complete GFW Rulebook for Wikipedia.” A revision of a document originally released last October, it identifies a massive list of actual trigger words (which Xia calls “rules” because they are often attached to specific conditions) which cause a Chinese Internet user’s connection to specific sites like Wikipedia to be disrupted by the Great Firewall (GFW). Not only that, it also includes a list of over 3,600 websites that he has currently confirmed to be unreachable from within China due to the GFW. The conclusions in the paper don’t necessarily upend anything that we thought about the GFW, but if you want a peek behind the curtain of how the GFW works (big takeaway: IT’S REALLY HAPHAZARD), this is as close as we can currently get.

The methodology behind Xia’s testing is sound, and in breadth it is among the most comprehensive attempts to document the Great Firewall’s blacklisted keywords, though Xia notes his debt to Jed Crandall et al.’s ConceptDoppler paper, GreatFire.org, and others, including arrested civil rights lawyer Xu Zhiyong, for inspiring him. The paper is mostly jargon-free, and the testing process used is transparent and not at all ultra-sophisticated (a compliment!); an amateur coder like myself could replicate everything that Xia has done in the paper. The paper is pretty self-explanatory and there’s not much commentary for me to add, but below are a few notes, including a description of a similar tool I’ve developed for identifying sensitive keywords in Chinese news articles and some curious coincidences between how Sina Weibo and the GFW censor.


<Looking back on 2013: Five Blocked on Weibo posts I particularly liked from last year>

2013 has personally been an incredibly fun year. I finished grad school, my book was published, and I started working for this neat research lab. Chinese Weibo users though, especially prominent ones, had a particularly rough time, with increased harassment and censorship by authorities inducing an unfortunate chill on discussion of sensitive topics on the site. Here’s hoping the next year brings a relaxation of such policies: I couldn’t be happier if I had nothing to write about on this blog.

So before we move on to 2014, a look back at five Blocked on Weibo keywords and posts that I particularly enjoyed uncovering and writing about in the past year:

1) Jan 23: 宪法法院 (constitutional court) is blocked during the Southern Weekend censorship controversy.

2) Mar 9: Weibo censors delete post of masked Mao portrait criticizing Beijing air pollution.

3) Jun 4: “The Flower of Freedom” (自由花) is a Cantonese song written by Hong Kong lyricist Thomas Chow to commemorate the victims of the 1989 Tiananmen crackdown.


我沒有敵人 ("I Have No Enemies" / wǒ méiyǒu dírén) is a speech written by jailed dissident and Nobel Peace Prize winner Liu Xiaobo. Liu was arrested in December 2008 just before the release of Charter 08, a document he co-authored calling for various political and legal reforms in China. He was formally charged in June 2009 with “suspicion of inciting subversion of state power” and was tried on December 23, 2009. He was convicted and began serving an 11-year sentence four years ago today.

Why it is blocked: "I Have No Enemies: My Final Statement" was a prepared speech Liu read to the court during his trial. However, after 14 minutes, the judge cut him off, saying Liu had used up his allotted time. The full speech was published and widely circulated online in Chinese in January 2010 and gained even more prominence when it was read aloud in English by actress Liv Ullmann during the 2010 Nobel Peace Prize Ceremony (video: part 1 | part 2).

Though Liu rejects the supposed crimes he is accused of committing and maintains his innocence, the speech is not an angry rant. Liu primarily takes on a martyr’s role: sticking to his ideals, accepting his fate as a victim, thanking his prosecutors and judges for their decency during the trial, and noting with optimism that change is on the horizon. Liu then addresses his wife, Liu Xia:

Throughout all these years that I have lived without freedom, our love was full of bitterness imposed by outside circumstances, but as I savor its aftertaste, it remains boundless. I am serving my sentence in a tangible prison, while you wait in the intangible prison of the heart. Your love is the sunlight that leaps over high walls and penetrates the iron bars of my prison window, stroking every inch of my skin, warming every cell of my body, allowing me to always keep peace, openness, and brightness in my heart, and filling every minute of my time in prison with meaning. My love for you, on the other hand, is so full of remorse and regret that it at times makes me stagger under its weight. I am an insensate stone in the wilderness, whipped by fierce wind and torrential rain, so cold that no one dares touch me. But my love is solid and sharp, capable of piercing through any obstacle. Even if I were crushed into powder, I would still use my ashes to embrace you.

The speech is part-love letter, part-reflection on Liu’s past, part-manifesto. I’ll cite a few notable passages, but it should be read in full (HRIC translation | David Kelly translation):

When I think about it, my most dramatic experiences after June Fourth have been, surprisingly, associated with courts: My two opportunities to address the public have both been provided by trial sessions at the Beijing Municipal Intermediate People’s Court, once in January 1991, and again today. Although the crimes I have been charged with on the two occasions are different in name, their real substance is basically the same - both are speech crimes. […]

But I still want to say to this regime, which is depriving me of my freedom, that I stand by the convictions I expressed in my “June Second Hunger Strike Declaration” twenty years ago ‑ I have no enemies and no hatred. None of the police who monitored, arrested, and interrogated me, none of the prosecutors who indicted me, and none of the judges who judged me are my enemies. Although there is no way I can accept your monitoring, arrests, indictments, and verdicts, I respect your professions and your integrity, including those of the two prosecutors, Zhang Rongge and Pan Xueqing, who are now bringing charges against me on behalf of the prosecution. During interrogation on December 3, I could sense your respect and your good faith.

Hatred can rot away at a person’s intelligence and conscience. Enemy mentality will poison the spirit of a nation, incite cruel mortal struggles, destroy a society’s tolerance and humanity, and hinder a nation’s progress toward freedom and democracy. That is why I hope to be able to transcend my personal experiences as I look upon our nation’s development and social change, to counter the regime’s hostility with utmost goodwill, and to dispel hatred with love. […]

I hope that I will be the last victim of China’s endless literary inquisitions and that from now on no one will be incriminated because of speech. Freedom of expression is the foundation of human rights, the source of humanity, and the mother of truth. To strangle freedom of speech is to trample on human rights, stifle humanity, and suppress truth. In order to exercise the right to freedom of speech conferred by the Constitution, one should fulfill the social responsibility of a Chinese citizen. There is nothing criminal in anything I have done. [But] if charges are brought against me because of this, I have no complaints. Thank you, everyone.

<Infographic showing Weibo censorship being linked to offline events in Bo Xilai scandal>

In the infographic below, we have collected data from a number of sources, including GreatFire.org, China Digital Times, Blocked on Weibo, and Twitter users, to chart the moments when Bo’s name became blocked or unblocked on Weibo. The speculation is that the authorities blocked his name when online conversations got too unpredictable to control and unblocked it when they sought to give netizens the space to criticize Bo. We have lined up those moments with what was taking place offline at the same time, showing how real-life political turmoil was often reflected in changes in censorship online. Click to launch the interactive infographic.

Concept, Research, and Authorship: JASON Q. NG
Development: ANDREW HILTS

Source: The Citizen Lab

<Update: The Chinese keywords on messaging app LINE’s “bad words” list and why they are “bad”>

Last week, the research lab I pitch in at published the first in a series of posts investigating censorship and privacy concerns in three chat applications: WeChat, LINE, and KakaoTalk. These instant messaging programs, which often replace text messages on smartphones, are expanding rapidly across the world. While WeChat has garnered most of the foreign press, LINE, a Japanese subsidiary of the Korean Internet giant Naver, is no pushover: it has over 200 million registered users, generated $130 million in revenue last year, and is poised for a $10 billion market cap value when it goes public next year.

I’ve already written a number of blog posts translating and describing some of the 150 words that were initially revealed to be on LINE’s “bad words” list. This list, uncovered by Twitter user @hirakujira, was thought to be a precursor to future censorship by the LINE application, but The Citizen Lab’s recent reports uncovered a second set of 370 keywords which do trigger censorship, but only for users who have registered with a Chinese phone number. Thus, LINE users in China would receive error messages when sending messages that contain any of these keywords and asterisked-out text when receiving them.
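As a rough illustration of the behavior described above (and emphatically not LINE's actual code), the sketch below models the two observable effects: a sender registered with a Chinese number gets an error, and a recipient registered with a Chinese number sees asterisks. Several things here are my assumptions: that "Chinese phone number" can be approximated by a `+86` prefix, that only the keyword itself is masked rather than the whole message, and all function names. The keyword used is a placeholder borrowed from elsewhere on this blog.

```python
def is_censored_user(phone_number):
    """Assumption: censorship applies to accounts registered with a +86 number."""
    return phone_number.startswith("+86")


def send(message, sender_phone, bad_words):
    """Return the message if it may be sent; raise an error for censored senders."""
    if is_censored_user(sender_phone) and any(w in message for w in bad_words):
        raise ValueError("message could not be sent")
    return message


def receive(message, recipient_phone, bad_words):
    """Mask listed keywords with asterisks for censored recipients only."""
    if not is_censored_user(recipient_phone):
        return message
    for w in bad_words:
        # Assumption: only the keyword is masked, one asterisk per character.
        message = message.replace(w, "*" * len(w))
    return message


bad_words = ["64屠城"]  # placeholder keyword for illustration
print(receive("关于64屠城", "+8613800000000", bad_words))
print(receive("关于64屠城", "+81312345678", bad_words))
```

The point of the sketch is simply that the same keyword list produces different results depending on the account's registration, which is why testing from a non-Chinese number would show no censorship at all.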


In addition to the series of 21 blog posts I did on the first chunk of the original list of uncovered “bad words” in LINE, I have translated the remainder of the 150 keywords on the original list as well as translated the majority of the 370 keywords on the recently decrypted list in the following spreadsheets:


64屠城 (June 4 massacre / 64 túchéng) refers to the crackdown on the student-led protesters in Beijing’s Tiananmen Square on June 4, 1989. After months of demonstrations in the heart of Beijing and much internal infighting about how to handle the protesters, authorities decided to send in tanks and troops to quell the “disorder.” Even though according to recent revelations, there wasn’t actually much blood spilled on the square itself, hundreds or thousands were still killed (the exact figure is debated) that night across the city. These victims are commemorated every year on June 4 in Chinese diasporic communities around the world—most notably in Hong Kong which holds an annual memorial in Victoria Park. However, such remembrances are strongly discouraged on the mainland and every year there is a marked increase in censorship around the date.

屠城 is an interesting phrase in that it generally refers to massacres that take place in cities—typically the killing of civilians in a captured city by military forces. The Nanjing Massacre of 1937 (also known as the Rape of Nanking) is perhaps the most notable recent 屠城 in Chinese history; others include the Yangzhou Massacre and the three massacres in Jiading in 1645.

In a break from our usual series of highlighting words blocked from searching on Weibo, for the next two days I’ll be looking more deeply at the keywords on chat messenger app LINE’s “bad words” list. For more about this series, see this introductory post.

李正源李刚 (Li Zhengyuan; Li Gang / Lǐ Zhèngyuán Lǐ Gāng) are the names of two unrelated people named Li who were both involved in scandals involving drunk driving, censorship, and attempted abuses of power. In addition to the similarities in names and other parallels, both cases showed the growing power of social media as a tool for correcting injustices.

Li Zhengyuan, the son of Taiyuan city’s police chief Li Yali, beat up a traffic cop in Oct 2012. Zhengyuan had been pulled over for drunk driving by a traffic cop (交警 / jiāojǐng), Xia Kun (夏坤), at which point he assaulted the officer in front of multiple eyewitnesses, who posted evidence of the beating online. He was not arrested and instead was escorted home by other police officers. A cover-up followed, with surveillance footage deleted and a blackout on reporting of the incident. Zhengyuan’s father, Li Yali, was eventually found to be responsible for the cover-up in addition to selling positions on the police force, and was removed from his post in Dec 2012.

Li Gang is the name of a deputy police chief whose son, Li Qiming, was involved in a hit-and-run. In Oct 2010, Qiming drunkenly drove into and killed a rollerblading college student on Hebei University’s campus grounds. He drove away, and when security officers caught up to him, he sought to escape punishment by declaring “My father is Li Gang!” on the assumption that this gave him immunity. After this was reported, outraged internet users tracked down Qiming, turning him and his brazen declaration into a meme and symbol of injustice in Chinese society. Authorities tried, and failed, to control the increasing outrage by censoring the event, and in the end Qiming was sentenced to six years in prison.
