<The Chinese keywords on messaging app LINE’s “bad words” list and why they are “bad”>

For an updated summary of the Citizen Lab’s excellent research into censorship on LINE to which I contributed, see this Nov 21, 2013 post.

Back in May, Twitter user @hirakujira was poking around in the code for Lianwo, the Chinese version of the popular mobile chat app LINE, when he noticed a curious line: “<key>warning.badWords</key>” followed by a string that read in Chinese: “Your message contains sensitive words, please adjust and send again.” Hirakujira subsequently identified the application files that contain these so-called “bad words” and posted them. The Next Web and Tech in Asia reported that even though LINE (a Japanese spin-off of Naver, a Korean company) wasn’t yet actively censoring messages sent through its Chinese-branded app, the inclusion of such files indicated that the company had built such a capability into the program, a forward-thinking move for any foreign content provider/distributor that hopes to succeed in China.

What I hope to do in the next few days is to take a closer look at roughly the first 40 of the 150 words that Hirakujira posted, translating and explaining the significance of those words with respect to current Chinese politics. Some are quite obvious, but others are quite obscure. By examining the words, we may hope to get a sense of what LINE thinks is worth censoring in order to appease its Chinese regulators. So for the next few days, consider this site rebranded as “Blocked on LINE (maybe in the future).”

The twenty-one posts (links will be added as the posts go up):

  1. 浙江签单哥 / Zhejiang’s receipt-signing Brother
  2. 警察杜平 / Police officer Du Ping; 宣恩杀人现场 / Xuanen murder scene
  3. 叶迎春内衣 / Ye Yingchun underwear; 叶迎春 / Ye Yingchun
  4. 孙国相拆迁 / Sun Guoxiang demolition
  5. 中央领导内幕 / Central leadership insider
  6. 盘锦开枪 / Panjin shot; 四学者建言 / Four scholars suggestions
  7. 盘锦二表哥姜伟华 / Panjin, Second Watch Brother: Jiang Weihua; 姜伟华名表 / Jiang Weihua namebrand watches; 江诗丹顿 表叔 / Vacheron Constantin uncle
  8. 只身挡坦克 / Blocking tanks alone
  9. 爆料不孝女 / Expose: unfilial daughter; 爆料朱熹后人 竟是政协委员 / Expose: Zhu Xi’s descendants, suddenly CPPCC committee members
  10. 人大附中择校费杨东平 / Renmin High School, school choice fees; Yang Dongping
  11. 奥数叫而不停 / Complaints about Math Olympiad have not ceased
  12. 帝都 实行宵禁 / Imperial Capital implements night curfew
  13. 11月5日至15日 出租车禁行 / Nov 5 to 15 taxis banned; 表叔 陈应春 / Uncle Chen Yingchun
  14. 江泽民被控制 / Jiang Zemin has been controlled; 江系军委被撤 / Jiang withdraws from Military Commission
  15. 张蓓莉200万耳环 / Zhang Beili 2 million RMB earrings; 温家 戴梦得 / Wen [Jiabao] Diamond; 温家宝 27亿 / Wen Jiabao 2.7 billion [USD]; 影帝温家 / Actor Wen Jiabao; 温家 资产700亿 / Wen Jiabao assets 70 billion
  16. 网络封锁 / Internet blockade
  17. 维族 砍人 / Uyghurs stab people
  18. 和田 暴乱 / Hotan rebellion
  19. 万鄂湘亚视 / Wan Exiang, Asia Television Limited
  20. 李正源李刚 / Li Zhengyuan, Li Gang; 交警夏坤 / Traffic cop Xia Kun
  21. 64屠城 / June 4 massacre


<480 keywords blocked from searching on Weibo as of Jun 29, 2013>

During the past month, I’ve been working as a summer research fellow at The Citizen Lab in Toronto. It’s been great to not only have time to dedicate to updating this blog and pushing forward collaborative projects with researchers I’ve been fortunate enough to meet over the past two years, but also to pitch in with all the amazing work being done here at this one-of-a-kind lab. Among the projects I’ve been helping out with is one pertaining to the lists of censorship and surveillance keywords in the Chinese chat clients TOM-Skype and Sina UC, which the team decrypted and then analyzed in collaboration with Jed Crandall and Jeff Knockel at the University of New Mexico.

Of course, my first desire was to take the keywords they extracted and to test them on Weibo. Below are 480 unique keywords which were blocked from searching on Sina Weibo as of June 29. I’ve written more about the other censorship games I’ve detected in this post over at The Citizen Lab’s blog. Among the things I discuss are the overlap of keywords between different Internet services in China as well as what drastic changes in the number of search results for keywords might mean.

A full spreadsheet of the data mentioned in the report can be viewed in this Google Fusion Table or downloaded in .csv format for further analysis by all you researchers reading along at home. I look forward to sharing other relevant work my colleagues and I get done at the Lab during the rest of the summer.



<Where do Weibo users live? City and provincial breakdown of various Chinese Internet statistics>

They live in Guangdong (well, many of them do at least):

Some background: Now that I finally got around to playing with Weibo’s API, I’ve been collecting (you might call it hoarding…) a lot of fun data. I’m currently engrossed in this dataset I’ve developed of anti-Japanese comments and I’ve been doing a lot of spatial analysis—all of which is only possible because Weibo neatly provides a wealth of detailed location data included with every post/comment. Whereas Twitter offers whatever location a user supplies (“In your head”; “Your mom’s house”) along with a time zone (geo-coordinates and detailed location info are only available on a tiny percentage of tweets), Weibo’s API neatly gives you every user’s province, city code, and chosen location. The options are selected, not filled-in, so the data is super clean and crisp (well, outside of people who lie about their location).

Thus, seeing as it might be helpful for my other projects to know where Weibo users are blogging from (or at least say they are), I conducted a data expedition, grabbing the latest 200 posts from Weibo every five minutes for one full week. After discarding repeat messages (Weibo’s API doesn’t guarantee the posts are the absolute most recent, though the timestamps on most posts matched my download date-time), I came up with a sample of 283,109 unique users, 236,611 of whom live in mainland China. I used that mainland subset to generate the map above and chart below (this whole exercise was basically an excuse to show off some of Google’s super easy-to-use Fusion Tables and an unnecessary distraction from my thesis writing, sigh).
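For readers who want to try something similar, here is a minimal sketch of the bookkeeping described above: the upper bound on posts collected in a week, de-duplicating by user, and tallying users per province. It is only an illustration, not the script used for this post. The record layout (`post["user"]["id"]`, `post["user"]["province"]`), the sample records, and the province codes are assumptions made for the example; only the 200-posts, five-minute, one-week, 283,109, and 236,611 figures come from the text.

```python
from collections import Counter

# Sampling plan from the post: 200 posts per poll, one poll every
# five minutes, for one full week.
POSTS_PER_POLL = 200
POLLS_PER_HOUR = 60 // 5
HOURS_IN_WEEK = 24 * 7
max_posts = POSTS_PER_POLL * POLLS_PER_HOUR * HOURS_IN_WEEK


def unique_users_by_province(posts):
    """Keep one record per user id, then count users per province code."""
    first_seen = {}
    for post in posts:
        user = post["user"]  # hypothetical field names
        first_seen.setdefault(user["id"], user["province"])
    return Counter(first_seen.values())


# Made-up sample records; the province codes are illustrative only.
sample_posts = [
    {"id": 1, "user": {"id": 101, "province": "44"}},
    {"id": 2, "user": {"id": 102, "province": "11"}},
    {"id": 1, "user": {"id": 101, "province": "44"}},  # repeat message
    {"id": 3, "user": {"id": 103, "province": "44"}},
]
province_counts = unique_users_by_province(sample_posts)

# Share of the sampled users who listed a mainland location (from the post).
mainland_share = 236_611 / 283_109

print(max_posts)
print(province_counts.most_common())
print(f"{mainland_share:.1%}")
```

Keying on the user id rather than the post id matters here, since the goal is a count of people, not messages.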





<There are NOT millions of Twitter users in China: Supporting @ooof’s result and refuting GWI’s conclusion>

The question of how many Chinese Twitter users there are made headlines a few months back when the market research company GlobalWebIndex published results from a survey which claimed that 35 million people in China used Twitter. Media outlets ran with the story of how there was a huge secret upswell in “free” netizens in China who climbed the Great Firewall to access blocked sites like Twitter, with the seeming implication being that revolución! was just around the corner. Social/human rights progress may still indeed take place in China in the near future, but most smart social media watchers agree it won’t be because of Twitter: Chinese folks just aren’t on the service in the same numbers that they are on local social media sites like Sina Weibo, RenRen, and even upstart mobile apps like WeChat/Weixin. People (and even companies in advertisements) don’t pass around their Twitter handles with anything like the frequency with which they share their Weibo contact info.

Even if our eyes told us that Twitter seemed to have attracted an active but small group of activists in China (but not many others in the country), was there a possibility that we were all missing something? Was there really a secret group of Chinese Twitter users being overlooked? Fortunately, after this week, I hope we can finally dismiss GWI’s 35 million number once and for all. Inspired by an SCMP story detailing the findings of the Chinese Twitter user @ooof (h/t Steven Millward of Tech In Asia), who cleverly used data on the website Twiyia.com to conclude that roughly 18,000 people who posted a tweet in Chinese selected Beijing as their home timezone, this weekend I performed a similar test on publicly available tweets using Twitter’s API. According to the data I extracted, there are most likely tens of thousands of Twitter users in China, not millions as claimed by GWI, a result that confirms @ooof’s finding.[1a] The exact numbers @ooof and I come up with may differ, and only Twitter itself could reveal how many Chinese Twitter users there actually are, but our independent results are likely within an order of magnitude of the actual number of Twitter users in China, unlike GWI’s result, which is about 2,000 times greater than our calculations. The hard evidence backs up what our eyes are telling us.
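To make the comparison concrete, here is a rough sketch of the kind of filter @ooof’s approach implies (count distinct users who tweet in Chinese and list Beijing as their timezone), followed by the ratio between GWI’s figure and @ooof’s. This is not the actual script behind the numbers in this post. The field names (`lang`, `user.time_zone`, `user.id`) and the sample tweets are assumptions for illustration; only the 35 million and 18,000 figures come from the text.

```python
def count_candidate_users(tweets):
    """Count distinct users with a Chinese-language tweet and a Beijing timezone."""
    users = set()
    for tweet in tweets:
        user = tweet["user"]  # hypothetical field names
        if tweet["lang"] == "zh" and user["time_zone"] == "Beijing":
            users.add(user["id"])
    return len(users)


# Made-up sample tweets for illustration.
sample_tweets = [
    {"lang": "zh", "user": {"id": 1, "time_zone": "Beijing"}},
    {"lang": "zh", "user": {"id": 1, "time_zone": "Beijing"}},  # same user again
    {"lang": "zh", "user": {"id": 2, "time_zone": "Taipei"}},
    {"lang": "en", "user": {"id": 3, "time_zone": "Beijing"}},
]
candidates = count_candidate_users(sample_tweets)

# Figures quoted in the post.
gwi_estimate = 35_000_000
ooof_estimate = 18_000
ratio = gwi_estimate / ooof_estimate

print(candidates)
print(round(ratio))
```

A timezone setting is only a proxy for location, so a count like this is best read as an order-of-magnitude estimate rather than a census.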

If you’re interested in the technical details of how I performed this fairly rigorous (though certainly not at the level of an academic research paper) test, read on. (Apologies for the non-Weibo-related post; I hope it’s still relevant to those who read this blog.)




<Who in Wen Jiabao’s family is blocked on Weibo>

Note: 0 is the new blocked (results below are from Sina Weibo, Nov 4, 2012)

温家宝 (Wen Jiabao): 0 results
张蓓莉 (Zhang Beili, wife): 0 results
杨志云 (Yang Zhiyun, mother): 0 results
温家宏 (Wen Jiahong, younger brother): 0 results
温云松 (Wen Yunsong, son): 0 results
杨小萌 (Yang Xiaomeng, daughter-in-law): unblocked
温如春 (Wen Ruchun, daughter): unblocked
劉春航 (Liu Chunhang, son-in-law): unblocked
张建明 (Zhang Jianming, brother-in-law): unblocked
张剑鹍 (Zhang Jiankun, brother-in-law): 0 results
于剑鸣 (Yu Jianming, Wen Yunsong’s classmate and business partner): 0 results
段伟红 (Duan Weihong, investor): 0 results
郑裕彤 (Cheng Yu-tung, investor): unblocked
李嘉诚 (Li Ka-shing, investor): unblocked

image source: NY Times, The Wen Family Empire



<All sensitive terms on Sina Weibo now show 0 results>

As of the beginning of this month, Sina Weibo has made a number of changes to the way they handle their censorship of search results. I’ve previously tweeted about a rising number of searches that are “partially blocked” rather than blocked wholesale with the typical “According to relevant laws, search results are not displayed” message.
