If you are interested in Chinese Internet censorship, I highly recommend you flip through Xia Chu’s latest update to his* research project “Complete GFW Rulebook for Wikipedia.” The latest revision of a document originally released last October, it identifies a massive list of actual trigger words (which Xia calls “rules” because they are often attached to specific conditions) that cause a Chinese Internet user’s connection to specific sites like Wikipedia to be disrupted by the Great Firewall (GFW). It also includes a list of over 3,600 websites that he has confirmed to be currently unreachable from within China due to the GFW. The conclusions in the paper don’t necessarily upend anything that we thought about the GFW, but if you want a peek behind the curtain of how the GFW works (big takeaway: IT’S REALLY HAPHAZARD), this is as close as we can currently get.
The methodology behind Xia’s testing is sound, and the paper is among the most comprehensive attempts to document the Great Firewall’s blacklisted keywords, though Xia notes his debt to Jed Crandall et al.’s ConceptDoppler paper, GreatFire.org, and others, including arrested civil rights lawyer Xu Zhiyong, for inspiring him. The paper is mostly jargon-free, and the testing process is transparent and not at all ultra-sophisticated (a compliment!); an amateur coder like myself could replicate everything that Xia has done. The paper is pretty self-explanatory and there’s not much commentary for me to add, but below are a few notes, including a description of a similar tool I’ve developed for identifying sensitive keywords in Chinese news articles and a look at some curious coincidences between how Sina Weibo and the GFW censor.