Thursday, November 17, 2011

Innovative Features or Fast UI?

In its introduction to the Google+ development tools, High Scalability offers a guess at why the Google+ API has been so slow to arrive:
So the approach is: make the UI first, make it fast, and then wrap an API around whatever evolved. A controversial methodology, but given the imperative for making a responsive UI, it makes sense.

But the author then raises a concern:
However good this approach is for creating fast UIs, it's death for feature development and fast responsive innovation. With an API you can develop in parallel and release stuff faster and iterate faster.

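Several friends of mine with technical backgrounds take a dim view of the API's long-delayed opening and the resulting absence of a varied app ecosystem, so I suspect the concern above is widely shared.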

So is Google's approach right or wrong?

Thursday, August 25, 2011

The Ten Commandments of Information Security Management - The 10 deadly sins of information security management

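While looking up material on information security management recently, I kept running into the phrase "Ten Deadly Sins". This morning I was reading an old Business Management magazine interview with several C-level heads of the major vendors in the field my company now works in, and there at the end of the article were the ten sins again (the Chinese rendering "十宗罪" seems to be mainland usage?).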

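A quick Google search showed that these ten commandments come from an article by B. von Solms and R. von Solms, published in Computers & Security in July 2004. The list is concise and to the point, stressing that information security management is not the sole responsibility of the technical department. Well said, so I am taking to the keyboard to put it on record.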
  1. Not realizing that information security is a corporate governance responsibility (the buck stops right at the top)
  2. Not realizing that information security is a business issue and not a technical issue
  3. Not realizing the fact that information security governance is a multi-dimensional discipline (information security governance is a complex issue, and there is no silver bullet or single ‘off the shelf’ solution)
  4. Not realizing that an information security plan must be based on identified risks
  5. Not realizing (and leveraging) the important role of international best practices for information security management
  6. Not realizing that a corporate information security policy is absolutely essential
  7. Not realizing that information security compliance enforcement and monitoring is absolutely essential
  8. Not realizing that a proper information security governance structure (organization) is absolutely essential
  9. Not realizing the core importance of information security awareness amongst users
  10. Not empowering information security managers with the infrastructure, tools and supporting mechanisms to properly perform their responsibilities
[Bibliographic information]

B. von Solms and R. von Solms, "The 10 deadly sins of information security management," Computers & Security, vol. 23, no. 5, pp. 371-376, Jul. 2004. [Online]. Available: http://dx.doi.org/10.1016/j.cose.2004.05.002

Wednesday, August 24, 2011

Size Matters - Distributed and Parallel Processing Issues in Machine Learning

I saw this news on John Langford's personal blog: Scaling up Machine Learning, co-edited by Ron Bekkerman (LinkedIn), John Langford (Yahoo! Research), and Misha Bilenko (Microsoft Research), will be published at the end of this year. They will also present a tutorial at KDD 2011, Scaling Up Machine Learning, based on the book.

According to John Langford's introduction:
This tutorial focuses on providing an integrated overview of state-of-the-art platforms and algorithm choices. These span a range of hardware options (from FPGAs and GPUs to multi-core systems and commodity clusters), programming frameworks (including CUDA, MPI, MapReduce, and DryadLINQ), and learning settings (e.g., semi-supervised and online learning). The tutorial is example-driven, covering a number of popular algorithms (e.g., boosted trees, spectral clustering, belief propagation) and diverse applications (e.g., speech recognition and object recognition in vision).

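I was not there, so I had no chance to attend in person, but I skimmed the slides and found the book's topics quite interesting, for example the attempt to overcome privacy concerns mentioned in the first figure below, and the explanation of Tree Ensembles in the second.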

I look forward to the book's release, though the price does give one pause, ha!




Monday, August 22, 2011

Explaining Defense in Depth

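In information and communications security there is a term borrowed from the military: defense in depth (DID). In military terms, DID is a strategic concept: defense cannot rely on a single strong line (such as China's Great Wall or France's Maginot Line). It must be multi-layered, multi-angled, multi-point, and multi-faceted, raising the attacker's cost and difficulty and delaying the breach, so that defenders have enough time and space to work out a way to defeat the attack.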

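One writer interprets it this way: when establishing a defensive position, the deployment of the whole fortification must have sufficient depth to make an assault harder and to stretch out the enemy's attack timeline, so that the enemy cannot drive straight in. The other meaning of depth is checkpoint after checkpoint, wrapping the core airtight layer by layer, like an onion or a cabbage.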

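This account explains multi-layered defense and the physical meaning of "in depth" incisively, but it does not bring out the strategic significance of multiple points and multiple facets, and it equates DID with another term in the field, layered security. The two concepts overlap a great deal, but they are not the same thing: layered defense is only one part of a defense in depth strategy.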

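On TechRepublic, a short piece by Chad Perrin explains the difference between these two easily confused terms very clearly. The biggest obstacle to understanding them is taking the words at face value. Just as mathematical induction is not inductive reasoning, layered defense is more than the layers of a mille-feuille or a cabbage, and the depth in DID is more than physical distance.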

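The idea of layered security is simple. Every person (or tool) has strengths and weaknesses, and relying on a single tool or technique for defense is clearly not enough, so a variety of techniques are placed along the attack path to achieve the best defensive effect. The essence of the idea is that the tools cover one another's weaknesses ("be used to cover the gaps in the others' protective capabilities"), not that the attacker is worn down by wave after wave (though that is a side effect). For example, a home user going online might use tools such as:
  • Antivirus software
  • Firewall
  • Parental control
  • Privacy control software
Casting the security net as wide as possible, from the network to the content, is typical layered security.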

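DID is more ambitious. Defense must be considered not only from the technical side: the quality and mindset of personnel and standard operating procedures all fall within the strategy, and even means to fight back actively have to be considered. IAnewsletter, the electronic magazine published by the U.S. government's Information Assurance Technology Analysis Center (IATAC), put it this way when discussing Defense in Depth in 1999: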
The Defense in Depth approach employs and integrates the abilities of people, operations, and technology to establish multilayer, multidimensional protection, like the defenses of a castle. The approach employs successive layers, using a variety of methods at multiple, key locations, to prevent the potential breakdown of barriers and penetration to the innermost areas of the system.

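Using technology to set up successive barriers is only one component of the overall defensive strategy. There are also policies and processes, and since rules do not enforce themselves, personnel awareness and training cannot be neglected either. A white paper published by the U.S. National Security Agency (NSA) states clearly that Defense in Depth is a strategy for achieving Information Assurance, and that the strategy comprises three dimensions: people, technology, and operations.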



The NSA white paper also lists the minimum set of items each dimension must cover:
  • People
    • Policies and Procedures 
    • Training and Awareness
    • System Security Administration
    • Physical Security
    • Personnel Security
    • Facilities Countermeasures
  • Technology
    • IA Architecture
    • IA Criteria (Security, Interoperability & PKI, etc)
    • Acquisition/Integration of Evaluated Products
    • System Risk Assessment
  • Operations
    • Security Policy
    • Security Management
    • Certification and Accreditation
    • Key Management
    • Readiness Assessments
    • Attack Sensing, Warning and Response
    • Recovery and Reconstitution
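In short, the purpose of Defense in Depth is to protect the security and quality of data, and it is a comprehensive strategic guideline. As noted earlier, reading the term literally invites misunderstanding: "in depth" is not distance along one dimension but an in-depth treatment of every dimension of the space. Layered defense is one of the tactics within a DID strategy, not the whole of it.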

[Further reading]

Sunday, August 21, 2011

People's Morals Declined Long Ago!

The year (2011) had barely begun when Vodafone Australia disclosed a data breach, and news of one major security incident after another followed.

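First the Sony PlayStation Network was hacked, then Cyworld, South Korea's largest social network, leaked 35 million customer records, and let's not forget the notorious hacker group Anonymous announcing that it had breached Exxon Mobil, ConocoPhillips, Canadian Oil Sands Ltd., Imperial Oil, and the Royal Bank of Scotland. The first half of the year had just ended when news came of a data breach at the International Monetary Fund (IMF). By comparison, the Department of Health's fine of NT$50,000 against Wan Fang Hospital for leaking medical records is trivial kid stuff.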

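Every year the Ponemon Institute and Symantec analyze the cost of data breach. Their report, which samples U.S. companies, shows that data breach incidents increase year by year, as does the cost of handling them. In April this year, Verizon Business released its annual Data Breach Investigations Report (DBIR), whose subtitle, Breaches Increased Dramatically, put the finishing touch on this "boom" in security incidents.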

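But are data breaches really rising like the morning sun, more numerous every day? Adrian Lane, CTO of the well-known security consultancy Securosis LLC, does not think so. In a short piece in Dark Reading, "Data Breach On The Rise?", he compares his own experience advising companies against what the media report, and asserts that there is not enough statistical evidence that security incidents are more frequent now than before ("There is no statistical evidence that breaches are on the rise.").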

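Adrian Lane's argument is simple. Today's database technology and tools are indeed better than before, but human nature has not changed, corner-cutting processes inside companies have not improved, and cases of security (and industrial safety) incidents being hushed up for the sake of the share price are neither more nor fewer. To conclude that there are more incidents now than before, based solely on how famous the companies in the headlines are, does not hold up as an argument.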

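In other words, these days there are no fewer good people than before and no more bad ones. Human nature is what it is; greed and criminal intent remain. We are not living in a worse world than the past. Only the techniques and tools of crime have changed, and the style of media coverage has changed, that is all. People in the past were no nobler, the number of time-servers in companies is the same, and they behave as they always have. So "on the rise" is an overstatement: big and small failures were always plentiful, they just weren't reported to you.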

This piece reminded me of the old line that "people's morals declined long ago", and frankly, I rather agree with Adrian Lane.

These are my thoughts after reading, noted here for the record.

Friday, August 19, 2011

This Time, Singing It to My Future Self

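For a good while I have written few reading notes but turned out plenty of melancholy musings, so I have had to face the "attention" and "reminders" of teachers, friends, and family. In the face of everyone's concern, I naturally resolved to reflect humbly and talk less nonsense from now on. But today in Scientific American I found the best defense of melancholy musing as good for body and mind: psychologist James W. Pennebaker says he discovered in the 1980s that writing down one's emotional ups and downs benefits the writer's health.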

Aha. Thanks to friends and family for their concern; if I ramble here in future, it is purely for the sake of my health.

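Frankly, the illness that came out of nowhere and lasted from late March to early June this year troubled me greatly, and wore down both body and mind. As for the setbacks I met at work, apart from the unhappiness at the moment they happened, I do not take them so seriously. I only sigh that human nature is indeed like this, and that hoping for a miraculous exception is futile.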

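Oddly, my negative guesses about those people and events were all confirmed, directly or indirectly, soon after I had the feeling. I don't know whether to be glad that my eyes and mind have stayed clear, or that I see through the world better with age, or that I have a super sixth sense (grin).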

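In early April, after a Bob Dylan concert, I wrote a post titled "To My Future Self" (《給未來的自己》), reminding myself not to forget my original intentions (and not to be a pushover). This morning a classmate from my PhD program shared on FB the song of the same name sung by 梁靜茹. Either way I am borrowing another's wine cup to drown my own sorrows, and singing it works too.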


Tuesday, August 16, 2011

Another Bowl of Alphabet Soup

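Entering the information security field was not in my original plan, but since fate has led me down this village path, I should roam these hills properly, so as not to waste this unspeakable, unfathomable turn of fortune. To wander land I have never set foot on, I have to read the local gazetteer carefully, and to read the local maps, I have to learn the local dialect and slang. So, time for another bowl of alphabet soup: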

  • ACL - Access Control List
  • APWG - Anti-phishing Working Group
  • BASEL - Basel Accords
  • BSI - British Standards Institution
  • CA - Continuous Auditing
  • CA - Certificate Authority
  • CC - Common Criteria
  • CCM - Continuous Control Monitoring
  • CCM-AC - CCM for Application Configuration
  • CCM-MD - CCM for Master Data
  • CCM-T - CCM for Transaction
  • CEH - Certified Ethical Hacker
  • CERT - Computer Emergency Response Team
  • CM - Continuous Monitoring
  • CME - Common Malware Enumeration
  • COBIT - Control Objectives for Information and related Technology
  • DAD - Database Access Descriptors
  • DAM - Database Activity Monitoring
  • DMZ - Demilitarized Zone
  • DLP - Data Loss Prevention
  • DoS - Denial of Service
  • DDoS - Distributed Denial of Service
  • FERPA - Family Educational Rights and Privacy Act
  • FGAC - Fine-Grained Access Control
  • GLBA - Gramm–Leach–Bliley Act
  • GPG - GNU Privacy Guard
  • GRC - Governance, Risk and Compliance
  • GTAG - Global Technology Audit Guide
  • HIPAA - Health Insurance Portability and Accountability Act
  • IDS - Intrusion Detection System
  • IPS - Intrusion Prevention System
  • IIA - The Institute of Internal Auditors
  • ISACA - Information Systems Audit and Control Association
  • ISMS - Information Security Management System
  • ITIL - Information Technology Infrastructure Library
  • KRI - Key Risk Indicator
  • MLS - Multi Level Security
  • NIST - National Institute of Standards and Technology
  • PCI-DSS - Payment Card Industry Data Security Standard
  • PII - Personally Identifiable Information
  • SOAP - Simple Object Access Protocol
  • SOX - Sarbanes-Oxley Act
  • SSH - Secure Shell
  • SSO - Single Sign-On
  • VPD - Virtual Private Database
  • XSS - Cross Site Scripting


Monday, August 8, 2011

I can do the same.

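danah boyd, a researcher at Microsoft Research, posted a fairly long piece on Google Plus about her thoughts on reader reactions to her recent articles on real-name policies in social networks and related issues (this one and this one, for example). I admire it deeply. I am copying a few passages below because, as one reader commented under the post, I hope I can do the same: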
My arguments are not coming from a point of hatred towards any company or individual, but stemming from a determination to speak up for those who are voiceless in many of these discussions and to provide a different perspective with which to understand the issues.

And when I get pissed off about something, I rant. And that can be both good and bad. But I've found that my rants often make people think. That's what motivates me to keep ranting.

[Video] Are you ready for next thing

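Clifford Stoll is practically the prototype of the mad scientist in Hollywood movies: always in a coarse cotton shirt, casual trousers, and sneakers, with a nervous look in his eyes and sparse, fluffy, unruly hair that flies as he moves. He speaks with urgency, as if he always wants to make you understand "everything" he has to say in the fastest way possible.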

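On stage, because his speech often cannot keep up with his thoughts, he twists his fingers together and then leaps and runs from one end of the stage to the other, his fluffy, unruly hair flying all the more. When I show videos of his talks to children, they always insist that this is exactly what Einstein was (or should have been) like.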

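If you know Clifford Stoll only from Nick Bilton's book 《一位數位移民的告白》 (the Chinese edition of I Live in the Future & Here's How It Works), you might, because of the article Clifford published in Newsweek in 1995, Why Web Won't Be Nirvana - The Internet? Bah!, take him for a doddering, arrogant egghead who knows nothing about the internet yet insists on holding forth, and who has now thoroughly embarrassed himself. If that is what you think, you are badly mistaken.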

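In fact Clifford is hard to sum up in a few words. He is an astronomer and also one of the pioneers of computer security. In 1986 he matched wits over the network with a hacker working for the KGB, and he wrote up the story in The Cuckoo's Egg (Chinese title: 捍衛網路), which until quite recently was still on the assigned reading lists of many computer security instructors.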

In 1995 he published another book, Silicon Snake Oil, subtitled Second Thoughts on the Information Highway. It was around then that he wrote to Newsweek about his worries over the internet (The Internet? Bah!).

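He may not have correctly predicted how internet technology would develop, or how people would come to face and use it, but he has stuck to his position throughout. He once said: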
While I admire the insights of many of the people in the world of computing, I get this cold feeling that I speak a different language. 

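The video embedded below is his talk at TED 2006. True to form, he tells everyone once again that new things do not represent the future. He is neither a mad scientist nor an egghead who "learns nothing and forgets nothing"; he is Clifford Stoll, who has always been boundlessly passionate about exploring the next thing.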

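The TED Chinese site introduces the talk this way: Clifford Stoll captivates his audience with one wild, energetic anecdote, observation, aside, and even science experiment after another. After all, by his own description he is a scientist: "While I am working on one thing, I am already thinking about the next."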

He may predict wrongly, but he never gives up exploring. Only through this video can we begin to know the person: an exuberant explorer, a scientist.

Before watching the video, one more famous line from Clifford:
Why is it drug addicts and computer aficionados are both called users?


Saturday, August 6, 2011

[Video] A Rainy Weekend


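I do love weekends, I do love rainy days
I do love a rainy weekend
Because of that light rain, I get to
share one umbrella with you, so close, so close
Because of that weekend, I get to
talk with you so late, so late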

[In Love with Poetry] Our Love, Ah, Is Like Threads of Rain

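Thanks to +老貓 +Jyh-Ming Yang +Mulberry He; if not for them I would have all but forgotten the taste of loving poetry. In the small hours of Qixi, "Threads of Rain" (《雨絲》) by 鄭愁予 is just right.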

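Our love, ah, is like threads of rain
On the road between star and star
Our carriage is soundless
We once frolicked in a transparent great forest
We once bathed our feet in a waterless brook

That is
a riverbed crowded with lotus-leaf lanterns
where the story of the Cowherd and the Magpie Bridge
was left behind
was left behind

Our love, ah, is like threads of rain

Slanting, slanting, woven into faint memory
And will the faint memory
remain forever among the stars

Now it is shattered pearls, flowing all through the world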




Sunday, July 24, 2011

Tragic or Romantic? It depends...


( via Peanuts )

[Video] What Shall I Challenge Next?

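Hardly anyone in the SEO business can be unaware of Google's Matt Cutts. He is not only a "famous" computer scientist but also a novelist (watch the video and you'll see), and lately he is even taking on a marathon.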


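Riding 50 kilometers at night is something I had already achieved before this last serious illness; now I just need to recuperate and train back to my former fitness. What new goal should I take on for the next 30 days? It has to be something I have never done before. Hmm, let me think....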


Wednesday, June 8, 2011

Tuesday, April 26, 2011

Retrospect: Who's talking the Future of News

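In the preface to 《一位數位移民的告白》, author Nick Bilton recounts being interviewed by the Wired website in 2009 about the future of news. In the interview he mentioned that he no longer reads newspapers printed on paper (In fact, he doesn't even get the Sunday paper delivered to his house.), but said that in the future news will travel through all kinds of channels, devices, and media rather than being confined to news"paper", and that this will make a better world for the news industry and readers alike.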

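After the interview ran, even though it was, apart from the remark about no longer reading print newspapers, a thoroughly positive message of Nick's confidence in the future, he still came under heavy attack and criticism from colleagues and superiors. A picture is worth a thousand words: the culprit is the piece below, which ran in the Epicenter column.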



Postscript:

Wednesday, April 20, 2011

[Video] I do believe we shall overcome someday

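I have always loved Joan Baez's We Shall Overcome, but only when I read the Pete Seeger chapter of 《時代的噪音》 by @soundfury before bed did I truly understand the meaning of the song and the story of blood and tears behind it. I listened to different versions over and over before sleeping, reminding myself not to lose heart: compared with the blood and tears of those who came before, what do my difficulties amount to? I do believe I shall overcome!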


Pete Seeger - We shall overcome






Joan Baez - We shall overcome



Monday, April 18, 2011

[Video] MySQL's Happy Place ?

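The talk "Big and Small Data at @Twitter" by Jeremy Cole, DBA Lead at Twitter, at O'Reilly MySQL CE 2011 is interesting and well worth watching. At 9:41 he begins explaining what MySQL is good at, at 13:03 he covers the use cases MySQL is not good at, and from about 13:30 he discusses where MySQL fits (its Happy Place). That last part was an eye-opener for me, torn as I am between PostgreSQL and MySQL.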

Roland Bouman's summary goes well with the video and is worth consulting:

Sunday, April 17, 2011

What Is NewSQL?

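In the IT industry a fresh acronym appears every day. Whether old wine in a new bottle or something genuinely new, each new term seems able to stir up new buzz and market opportunities. In the database industry, after the endlessly controversial NoSQL, we now have NewSQL.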

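Matthew Aslett, an analyst at The 451 Group, coined the term NewSQL in a blog post. As he uses it, NewSQL is not a change to the Structured Query Language itself. It stands for a set of database vendors pursuing high scalability and performance, each choosing its own technical strategy and its own way of working with the community to get there. He also stresses that the name should not be taken too literally ("NewSQL is not to be taken too literally"): NewSQL refers to vendors ("NewSQL is used to describe a loosely-affiliated group of companies"), not to the language.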

In his April 6 post, the author listed the vendors he places in the NewSQL camp:
In the first group we would include (in no particular order) Clustrix, GenieDB, ScalArc, Schooner, VoltDB, RethinkDB, ScaleDB, Akiban, CodeFutures, ScaleBase, Translattice, and NimbusDB, as well as Drizzle, MySQL Cluster with NDB, and MySQL with HandlerSocket. The latter group includes Tokutek and JustOne DB. The associated “NewSQL-as-a-service” category includes Amazon Relational Database Service, Microsoft SQL Azure, Xeround, Database.com and FathomDB.

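A week later the author wrote again to explain why NoSQL and NewSQL are the future of relational databases, punningly using SPRAINed to describe the present, and future, state of the RDBMS.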



SPRAIN stands for the following six driving forces. Whether by coincidence or careful polishing, it is an ingenious bit of wordplay:

  • Scalability – hardware economics
  • Performance – MySQL limitations
  • Relaxed consistency – CAP theorem
  • Agility – polyglot persistence
  • Intricacy – big data, total data
  • Necessity – open source

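RWW and High Scalability both covered the topic. High Scalability's The NewSQL Market Breakdown divides the NewSQL vendors into three categories, New MySQL storage engines, New databases, and Transparent Sharding, each suited to different application needs, rather than lumping them all together. It is both deeper and easier to follow than RWW's brief report, which merely summarizes the original post.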
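
To make the Transparent Sharding category a little more concrete, here is a minimal Python sketch of the general idea: the application talks to what looks like a single store, while a router quietly spreads rows across several backends by hashing the key. Everything in it (the `ShardedStore` class, the shard names, the in-memory dicts standing in for database servers) is a hypothetical illustration, not how any vendor named above actually implements it.

```python
import hashlib


class ShardedStore:
    """Toy key-value store that hides several shards behind one interface."""

    def __init__(self, shard_names):
        if not shard_names:
            raise ValueError("at least one shard is required")
        # Each dict stands in for a separate backend database (assumption).
        self._shards = {name: {} for name in shard_names}
        self._names = list(shard_names)

    def shard_for(self, key):
        """Pick a shard deterministically from a stable hash of the key."""
        digest = hashlib.sha256(str(key).encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._names)
        return self._names[index]

    def put(self, key, row):
        self._shards[self.shard_for(key)][key] = row

    def get(self, key):
        return self._shards[self.shard_for(key)].get(key)


store = ShardedStore(["shard_a", "shard_b", "shard_c"])
for user_id in range(1, 7):
    store.put(user_id, {"user_id": user_id})
print(store.get(4), "lives on", store.shard_for(4))
```

The caller never names a shard, which is the "transparent" part. Plain modulo hashing like this moves most keys when a shard is added, so a real system would need something more careful; that is outside the scope of this sketch.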

Sunday, April 10, 2011

Does a Hair's-Breadth Error Lead a Thousand Miles Astray?

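Xavier Amatriain, a researcher at the research arm of the Spanish telecom company Telefonica, posted an article on his blog a few days ago, Recommender Systems: We're doing it (all) wrong, arguing that researchers and developers of recommender systems must pay attention to the nature of their data when they use it.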

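Many people base their ratings on a Likert scale; a scale such as "like very much, like, neutral, dislike, dislike very much" is extremely common. But Xavier reminds us that Likert-scale data is ordinal data: it expresses only an ordering, and adjacent ratings are not necessarily equidistant. If distances are computed from such data (and distance is the basis of similarity), the results may be distorted, and by the same logic RMSE, the metric used to measure a recommender system's accuracy, may lose its meaning as well.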

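From a mathematical standpoint, misusing a definition is of course a serious failure of fundamentals. In practice, though, it is hard to say how much treating Likert ratings as interval data actually affects a recommender system's results. Still, it seems quite a few researchers and developers have overlooked this point and mistaken one thing for another!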
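
A small Python sketch may help show why the ordinal-versus-interval question is more than pedantry. It scores two made-up prediction sets with RMSE under two numeric codings of the same five ordered labels. Both codings preserve the order, and the `STRETCHED` spacing is purely an assumption chosen for illustration, yet which model "wins" depends on the coding.

```python
import math

# Two ways of turning the same five ordered labels into numbers.
EQUIDISTANT = {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0, 5: 5.0}
# Hypothetical spacing in which the labels are NOT evenly spread.
STRETCHED = {1: 1.0, 2: 1.5, 3: 2.0, 4: 3.5, 5: 5.0}


def rmse(predicted, actual, coding):
    """Root mean squared error after mapping labels through `coding`."""
    squared = [(coding[p] - coding[a]) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


actual = [5, 5, 1, 1, 3]
model_a = [4, 4, 1, 1, 3]  # misses only at the top of the scale
model_b = [5, 5, 2, 2, 4]  # misses at the bottom and the middle

for name, coding in (("equidistant", EQUIDISTANT), ("stretched", STRETCHED)):
    a = rmse(model_a, actual, coding)
    b = rmse(model_b, actual, coding)
    better = "A" if a < b else "B"
    print(f"{name}: A={a:.3f} B={b:.3f} -> model {better} looks better")
```

Since ordinal data tells us nothing about which spacing is the true one, an RMSE ranking that flips between spacings is not something the data alone can settle.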

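Xavier Amatriain wrote his article after being inspired by Judy Robertson's piece on Blog@ACM, We're Doing It Wrong. Judy mentions that at the 2010 ACM Conference on Human Factors in Computing Systems (CHI), researchers presented a paper, "Powerful and consistent analysis of Likert-type rating scales", which combed through the data and statistical tools used in the papers from the previous year's conference and found something startling. In her words: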
Kaptein, Nass, & Markopoulos (2010) published a paper in CHI last year found that in the previous year's CHI proceedings, 45% of the papers reported on likert type data but only 8% used non-parametric stats to do the analysis. 95% reported on small sample sizes (under 50 people). This is statistically problematic even if it gets past reviewers!

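So nearly half of the papers report Likert-type data. In the second half of her article Judy offers her observations on the causes and some recommendations; being a complete layman in statistics, I can only nod along. But what caught my eye most was "95% reported on small sample sizes". It is rare for anyone in industry to believe that academia can really produce something "useful", and there is some truth to that; academia has no one else to blame.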
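
The "non-parametric stats" in the quote are tests that rely only on the order of the answers rather than on distances between them; the Mann-Whitney U statistic is one common example. Below is a minimal Python sketch of how U is counted. The ratings are made up, and the sketch stops at the statistic itself: it does not compute a p-value, for which a proper statistics library would be needed.

```python
def mann_whitney_u(group_a, group_b):
    """Count pairs where a rating in group_a beats one in group_b.

    Ties count as half a win, so only the ordering of ratings matters,
    never the numeric distance between them.
    """
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Made-up 5-point Likert answers from two small groups of users.
old_design = [2, 3, 3, 4, 2]
new_design = [4, 5, 3, 4, 5]

u_new = mann_whitney_u(new_design, old_design)
pairs = len(new_design) * len(old_design)
print(f"U = {u_new} out of {pairs} pairs (half of {pairs} means no tendency)")
```

Because the count only asks "which answer is higher", relabeling the scale with any other increasing numbers leaves U unchanged, which is exactly the property ordinal data calls for.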


[References]
Kaptein, M., Nass, C., Markopoulos, P. (2010) Powerful and consistent analysis of Likert-type rating scales. In Proceedings CHI 2010, ACM, New York, NY, 2391-2394. DOI= http://doi.acm.org/10.1145/1753326.1753686

Slugging It Out in the Cloud: How's the Air Up There?

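Last weekend the cloud channel of ReadWriteWeb featured an illustration of the cloud-services stratosphere produced by Horn Group. The cloud layers run from bottom to top in the order Infrastructure as a Service, Platform as a Service, Software as a Service, and Communications and Social Applications, and each layer is packed with the logos of the companies working in that area.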

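Going by the "Cloud Computing Industry Development Program" proposed by the Ministry of Economic Affairs and approved at the Executive Yuan's 3193rd meeting on April 29, 2010 (ROC year 99), what would the picture above Taiwan look like?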


Saturday, April 9, 2011

[Video] Put Your Head On My Shoulder

What I want to say is: when can we swap roles, so that I get a shoulder to lean on.

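Actually, to raise a cup to the east wind and "linger together at ease", as in 歐陽永叔's 《浪淘沙》, is the state of mind that truly delights; there is no need to brood over "who will be with me next year".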


Geeking and Murmuring

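A colleague at work has been writing a proposal. Some sections need a bit of "theory" to make the arguments more persuasive, and the sections on the proposed solution also need academic research findings as a foundation. Unsurprisingly, the colleague responsible for the document came to me for help, and I agreed without much thought.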

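During the long holiday in early April, apart from going to Uncle Dylan's concert in Taipei, I opened my browser and once again used Google Scholar, formerly my daily companion, to look for the literature the proposal needed. I fell back into the work routine that never varied over the past few years: searching repeatedly with different keywords for articles that might be useful, downloading them, skimming the abstract and the first section, and adding each one to my CiteULike library. The familiar rhythm came back, and so did all my bad habits from student days. Yet my mood, lately tossed about by all sorts of worries, unexpectedly grew calm, and my tangled thoughts fell into clear order, every thread in place.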

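I skimmed the introductions and literature reviews of a few articles, quickly found a thread I could follow, and jotted down a few notes for the next day's discussion. The process was a pleasure. Sadly, once the work was done and I returned to reality, my mood dropped straight back to its original anxious, helpless low.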

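A whimsical thought came to me as I packed the documents for work the next day: if I wrote a paper titled "On the Calming Effects of Google Scholar and CiteULike", which journal should I submit it to, and would it be accepted?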

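A while ago I was reading Greg Linden's occasional Latest Reading series with both admiration and envy. What happened over the holiday made me think that clumsy imitation might not be a bad idea: if I wrote a "Geeking and Murmuring" series and forced myself to do more deep reading, it might well do a great deal to settle my mood and heal my wounds.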

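Lately I have been at a terrible low, physically and mentally. A cold I thought my own immune system would see off has kept getting worse; after speaking even briefly I cough without stopping, and cannot stop even when my chest hurts and my head spins. I understand that the root of the symptom is still in the heart. The ailment, the cure, and the spirit are all of the heart, and the problem is fatigue and disappointment. So counting on Geeking and Murmuring to cure and heal is really not laughable at all.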

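The fatigue comes from overreaching, a mortal body playing Superman with his underwear on the outside. The work of leading a rough-and-ready organization toward being well-established is too much and too varied, its finer points subtle and hard to put into words, and not many fellow travelers appreciate it. I often joke to friends that at nine every morning I put on a mask of high spirits, and at eight every evening I turn into an expressionless, lifeless mineral.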

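Physical fatigue can be repaired by sleep, but mental wear does not recover so easily. Along the way I believed I was offering others wholehearted goodwill and unreserved trust, without knowing how they read this "self-righteous" goodwill of mine. In the process I saw ever more clearly my own weakness, overreach, and rashness. How could that not alarm me? It is myself I am disappointed in!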

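I think it is necessary to set aside time every so often to return to a familiar rhythm and let my mind regain its clarity. Melancholy has its place, but after indulging the emotions, the right way is to take in the world with a smile and keep moving forward!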

Sunday, April 3, 2011

Writing Once More to My Future Self


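Long ago I decided that before Bob Dylan and I both vanish from this time and space, I would hear him perform live once. Tonight, hearing the great man sing in person was for me not only witnessing what may be the last appearances of an icon of an era. Meditating during the encore was also a fitting moment to ruminate on the rash, naive first half of my life, and to annotate and bear witness to my overreaching and my doing things I knew could not be done.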

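In future I will go on doing all sorts of hopelessly foolish things, but so what? The highest sages forget emotion and leave the world behind; the lowest are not worth talking to; those who stumble through the web of the mundane world are precisely fake clever people like us. My heart is open and upright. However I may rise, sink, or be overturned, as long as my true heart remains, I will surely still see the sun tomorrow.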

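The audience at Taipei Arena tonight was smaller than I had imagined, yet the average age was lower than I had imagined, an interesting phenomenon. During the show I scolded myself for not doing my homework before coming, but even had I wanted to prepare, studying in advance is useless with Uncle Dylan, who never announces his set list beforehand and does not allow projection screens at the venue. Over nearly 100 minutes, the slurred diction and the harmonica that went straight to the heart (I really love Bob's harmonica) were equally thrilling, and after the fifty-minute mark the singer and band grew ever more frenzied. In short, whatever the media report and however the so-called professional critics rate it, I like this gift I gave myself.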

Sunday, March 13, 2011

No Going Back

In this day and age, does anyone still believe that the weight of words equals the weight of life? I do.

Wednesday, February 9, 2011

That's life


That's life, that's what all the people say.
You're riding high in April,
Shot down in May
But I know I'm gonna change that tune,
When I'm back on top, back on top in June.

I said that's life, and as funny as it may seem
Some people get their kicks,
Stompin' on a dream
But I don't let it, let it get me down,
'Cause this fine ol' world it keeps spinning around

I've been a puppet, a pauper, a pirate,
A poet, a pawn and a king.
I've been up and down and over and out
And I know one thing:
Each time I find myself, flat on my face,
I pick myself up and get back in the race.

That's life
I tell ya, I can't deny it,
I thought of quitting baby,
But my heart just ain't gonna buy it.
And if I didn't think it was worth one single try,
I'd jump right on a big bird and then I'd fly

I've been a puppet, a pauper, a pirate,
A poet, a pawn and a king.
I've been up and down and over and out
And I know one thing:
Each time I find myself laying flat on my face,
I just pick myself up and get back in the race

That's life
That's life and I can't deny it
Many times I thought of cutting out
But my heart won't buy it
But if there's nothing shakin' come this here july
I'm gonna roll myself up in a big ball and die
My, My!

What Google Learned in 2010

Sunday, February 6, 2011

[In Love with Poetry] Hot Blood Vying for Spring

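Reading the Xu Zhimo chapter of 傅國湧's 《从龚自珍到司徒雷登》 this morning, I learned that the lyrics of 梅雪爭春, a well-known song from the campus folk song era, were written by the poet in protest at the shooting of innocent petitioners in the March 18 Incident of 1926, carried out by Duan Qirui's troops. The poem first appeared on April 1, 1926, in the first issue of 《晨報副刊·詩鐫》, and its last line, "the plum blossoms are the hot blood of a thirteen-year-old child", written in blood as it were, is sorrowful in the extreme.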

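In the southern New Year, one day it snowed heavily
I went to Lingfeng to seek news of the spring plum blossoms
Fallen plum calyxes, petal by petal, lay buried in the snow
I laughed and said this color still lacks three parts of brilliance

Fate said: hurry back to the capital before the Flower Festival
I have prepared for you a truly brilliant spring scene
The white is still that cold, fluttering, flying snow
But the plum blossoms are the hot blood of a thirteen-year-old child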

Saturday, February 5, 2011

Friday, February 4, 2011

[In Love with Poetry] The Sound of Oars Drifts into Cloud and Water

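In the seventeenth year of the Jiaqing reign, Gong Zizhen (龔自珍) returned to his hometown of Hangzhou, went boating on West Lake, and composed the ci poem 《湘月》 to express his feelings. Commentators have written much about the imagery of the flute and the sword in Ding'an's (定盦) writing. I am borrowing another's wine cup to drown my own sorrows, and those who understand will surely laugh at my poor judgment. For all the thousand words, the sound of oars drifts into cloud and water; when will I ever forget this restless striving?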

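The wind of heaven blew me down to a corner of lake and hills, and it is lovely indeed

Once a young sojourner at Donghua, I look back on a boundless haze

A dog-butcher's fame, dragon-carving essays: how could these be my life's intent

Su Xiaoxiao, my fellow townswoman, would surely laugh at my poor judgment

At the first sight of a streak of setting sun and half a dike of fragrant grass, a clear sorrow stirs

Where can I seek the dust of her silk stockings; faint and far, my feelings are lodged alone

In resentment I play the flute, in wildness I speak of the sword: two soul-melting flavors

Two kinds of spring dream; the sound of oars drifts into cloud and water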

Greenery

Sunday, January 30, 2011

[In Love with Poetry] Borrowing Another's Wine Cup to Drown the Sorrows in My Breast

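Shopping at the market before the holiday, waiting for the fishmonger to clean my fish and shrimp, I hid in a dark corner reading, and watched 老貓 and 王佩 trade barbs across the ether with great relish, until the Empress Dowager's edict appeared: "Silence. Don't think that just because I haven't been online I don't know what your twisted minds are thinking. Borrowing another's wine cup to drown the sorrows in your breast makes you accomplices all the same." At which I urgently wanted to find a wine cup to drown my own.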

In the 25th year of Jiaqing (1820 CE), Gong Zizhen wrote "Another Poem of Repentance" (《又懺心一首》):
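The Buddha says the kalpa fire consumes all it meets; what is it that rages like the tide for a thousand years

Writings on statecraft grind away the daylight; dark radiance and wild wisdom return at midnight

When it comes surging, one must swing the sword; when it leaves, still lingering, entrust it to the flute

The heart's medicine and the heart's spirit are in the end the heart's disease; I resolve to burn my parables at the lamp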

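Since I am borrowing another's wine cup to drown the sorrows in my breast, there is no need to ask whether this heart's disease is the same as that one. Says the twisted mind: "A dog-butcher's fame, dragon-carving essays: how could these be my life's intent."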

How do recommender researchers test their algorithms and make their systems smarter?

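Today FastCompany reported that Darren Vengroff, chief scientist at RichRelevance, has come up with a way for researchers to test recommender-system algorithms. The idea is simple: package real-world data as a black box, let researchers upload their programs, and use this mechanism to test how well the algorithms perform.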
There are many holy grails in online commerce, but one that has frustrated C-level executives and engineers alike is how to produce better recommendation algorithms. Produce better recommendations, and you’ll sell more stuff.
Historically, however, there has been one major structural impediment to making significant breakthroughs on this front. But the Chief Scientist at RichRelevance, which provides personalization solutions for the likes of Walmart, Sears, and Overstock.com, may just have fixed that. 
First, however, the impediment: The people who are likely to produce breakthroughs--the really smart smarty-pants in the math departments of the world’s universities--don’t have access to large bodies of real-world data. And without real-world data, they can come up with as many hypotheses and new types of math as they like, but they’ll never really know if it actually works in the real world. It’s like trying to learn how to serve without tennis balls. You can swing as much as you like, but until you actually hit a real-live ball, you can never be sure if your swing would actually place a ball in the serve box. 
For their part, the people who have real-world data--the Amazons and eBays of the world--can’t share it with the researchers for reasons of customer privacy. “Even if we anonymize it, we’re handcuffed because we can’t give out data that can be reasonably be used to reconstruct who someone really is,” the Chief Scientist, Darren Vengroff, tells Fast Company.
Vengroff, however, has come up with a novel solution: He’s created a “black box” of sorts with real-world data that researchers can use to run experiments on. Researchers won’t be able to look at the data, but they will be able to dump their algorithms in and have the box spit out results, which the researchers can then use to refine their hypotheses. 
It’s a simple idea, but it wasn’t really possible to execute until the advent of the cloud. Now researchers from any part of the globe will be able to use the system to run experiments. (In principle, of course--in practice, a committee will vet proposals and choose which ones will actually run.) 
Vengroff, who once worked as a Principal Engineer at Amazon, says he got the idea for the project while attending a computing conference last fall. “In one of the sessions, there were three consecutive papers in a row where about two-thirds of the way through, I was really excited about what was being presented, and then they went down a different path than I thought they were going to go,” he says. “I realized if they only knew what the real-live data, that I look at every day, says, they wouldn’t have gone that way. They would have gone the right way and gotten to a much better solution.”
“Seeing these brilliant ideas get misapplied because of a very reasonable assumption about how shoppers might shop, but happens not to be true in the empirical data--I realized we’ve got to find a way to have this not happen anymore.” 
The system, which is in beta right now, will launch next month at conference in Palo Alto. Says Vengroff: “We’ve got a significant new path that we think is really going to change things.”
E.B. Boyd is FastCompany.com’s Silicon Valley reporter. Follow me on Twitter, or email me.
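
The "black box" described in the article can be pictured with a short Python sketch: the data owner keeps the purchase logs private, a researcher hands over an algorithm, and only an aggregate score comes back. This is a guess at the general shape under stated assumptions, not RichRelevance's actual system; the `BlackBoxEvaluator` class, the hit-rate metric, the `most_popular` baseline, and the tiny dataset are all hypothetical.

```python
from collections import Counter


class BlackBoxEvaluator:
    """Holds private data; callers only ever get an aggregate score back."""

    def __init__(self, purchase_history, held_out_purchase):
        self._history = purchase_history      # user -> items bought earlier
        self._held_out = held_out_purchase    # user -> item bought later

    def evaluate(self, algorithm, k=1):
        """Run `algorithm` inside the box and return its top-k hit rate."""
        hits = 0
        for user, target in self._held_out.items():
            # The algorithm sees the data, but its author sees only the score.
            recommendations = algorithm(self._history, user, k)
            if target in recommendations[:k]:
                hits += 1
        return hits / len(self._held_out)


def most_popular(history, user, k):
    """Baseline: recommend the most-bought items the user does not own yet."""
    counts = Counter(item for items in history.values() for item in items)
    owned = set(history.get(user, []))
    ranked = [item for item, _ in counts.most_common() if item not in owned]
    return ranked[:k]


# Tiny made-up dataset standing in for a retailer's private logs.
box = BlackBoxEvaluator(
    purchase_history={"u1": ["a", "b"], "u2": ["a", "c"], "u3": ["b"]},
    held_out_purchase={"u1": "c", "u2": "b", "u3": "c"},
)
print("hit rate:", box.evaluate(most_popular, k=1))
```

A real service would also have to stop an uploaded algorithm from smuggling raw data out through its results, which may be one reason the article mentions a committee vetting proposals; the sketch ignores that problem entirely.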

Thursday, January 27, 2011

Sunday, January 23, 2011

[Video] To the life that used to be.

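As early as last August, online friends pointed out that the 25th anniversary concert of the musical Les Misérables would be performed in London in early October. I had no chance to be there in person, but through the internet we still took part "indirectly".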


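Perhaps my expectations were too high, or perhaps the tenth anniversary concert had left me with a preconceived "bias", but frankly I was a little disappointed by this performance. The classic One Day More, for example, conveyed no passion at all. When it was sung again as the encore before the curtain call, the veteran singers did set the whole house alight, but the weaknesses of Nick Jonas, this production's Marius, who lacks the presence of a lead, were fully exposed.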


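In fairness, although Marius was a disappointment, the other roles were on the mark, and the arrangements, staging, and showpiece moments outdid those of fifteen years ago. Some numbers did have highlights: Grantaire's emotional outburst in Drink with Me, for instance, is truer to the spirit of the novel than the restrained slow version of the tenth anniversary, and more effective on stage. A pity that when Marius, at the end of the song, asks with deep sorrow "Will you weep, Cosette, for me?", Nick Jonas comes nowhere near the emotional range Michael Ball conveyed.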


Looking at the lyrics,
Drink with me to days gone by
To the life that used to be 

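I couldn't help recalling the patchwork of old lines I threw together a month ago: "Drinking alone with no one dear beside me; who is there to linger with at ease"!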



25th anniversary version: Drink with me


10th anniversary version: Drink with me

Saturday, January 15, 2011

Three Cups Lead to the Great Way, One Gallon Accords with Nature



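Another reason to raise a glass. A little wine not only lets men of letters "hold forth boldly in their cups, add wonder to the sword, and utter astonishing lines of verse", it can also boost research results and break through bottlenecks. "Only now do I know that one cup of wine beats the books of a hundred schools." The ancients did not deceive me!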

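At Japan's National Institute for Materials Science, a team studying superconductors was holding a small celebration when researchers who had drunk a bit too much decided, once the alcohol went to their heads, to put their research materials into "many, many" liquors (the original says "many many liquor").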

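Tests afterwards showed that the materials soaked in the party drinks conducted far better than in the usual experiments (question: what were they celebrating in the first place?). Further comparison found that shochu did 23% better than the usual material, while material soaked in red wine improved by 62%. But according to the report the party also served whisky and beer; could it be that those two did not perform so well!?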

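I would think rigorous researchers ought also to compare the effects of mixing different drinks in different proportions, and of different brands.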

Either way, plenty of people must love the report's conclusion: "So, a little sip of something turns out to make potential superconductors much better at their jobs. And, perhaps, scientists better at their jobs as well."

Three cups lead to the Great Way, one gallon accords with nature. If drinking a gallon could produce an SCI paper, ....

No Most Embarrassing, Only More Embarrassing

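I had thought Vodafone Australia's leak of four million customer records early this month was shocking enough, but the feats of the good people at Statistics Canada are what truly reach the outrageous summit.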

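In a report on the 10th of this month, the Toronto Sun compiled Stats Can's offenses over the past five years. In 2007 they sold off a filing cabinet containing sensitive information as surplus furniture, and on another occasion a Statistics Canada staffer left one company's survey data in the office of another survey respondent. The Sun politely calls these just "some examples of breaches".....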


OCT. 2010: Purolator envelope containing 11 unencrypted, non-password-protected CDs for the Vital Statistics Program in Alberta addressed to Ottawa head office sent July 9, 2010 is discovered missing. It contains more than 21,000 electronic images of confidential information about individual birth, death, stillbirth and marriage registrations. It is found Nov. 30, 2010 locked in a rarely-used filing cabinet.
SEPT. 2009: Stats Can library's password access protocol constitutes "major security breach."
DEC. 2008: A briefcase with documents and personal notes is stolen from the car of an interviewer from Quebec. Confidential addresses of respondents were included.
JULY 2008: An error in transmission meant e-mails of 108 subscribers of Health Reports notifications were "inadvertently revealed" to all recipients of message - constituting a breach of Privacy Act and Stats Can policy.
JUNE 2008: Stats Can is informed that on Feb. 12, 2008 Surrey RCMP and Canada Post recovered completed 2006 census questionnaires from a private residence in a bust of a major identity theft ring. Other items included equipment related to credit card/ID theft, drivers' licences, 3,000 pieces of stolen mail, government-issued cheques, fake currency and more than 100 CDs with thousands of personal data profiles. Census questionnaires were not in the hands of census staff - it is believed they were obtained by tipping mailboxes or break-ins to homes and cars.
AUG. 2007: A laptop containing personal information about individuals who participated in the Labour Force Survey or Canadian Community Health Survey is stolen from the residence of an employee in Abbotsford, BC. Password was written on a sticky note stored in laptop case. Police called, affected people are informed and interviewer receives verbal reprimand.
JUNE 2007: Laptop with three completed household spending surveys stolen in home break-in in Delta, B.C.
MARCH 2007: Edmonton regional office reports two laptop thefts from field interviewers' vehicles. Staff are reminded about protocol for securing material.
MARCH 2007: Privacy Commissioner's office advised of inadvertent disclosure and loss of personal info after surplus filing cabinets with Records of Employment about 66 2006 census workers were sold at a Crown Assets Auction in Edmonton. Affected individuals are contacted and Stats Can implements more stringent procedures to avoid a recurrence.
JULY 2006: Enumerator leaves completed questionnaire instead of blank at Scarborough, Ont. respondent's home.
APRIL 2005: Blank forms faxed to a business include additional pages of confidential information related to two other businesses. Staff receive retraining and posters/notices are displayed as reminders.
FEB. 2005: Marketing information collected for one user is reviewed by another user and possibly four other unknown individuals in a Corporations Returns Act survey.
FEB. 2005: Laptop being shipped from Williams Lake, B.C. to Edmonton containing 23 Survey of Household Spending cases - including 11 completed ones - goes missing. A flurry of e-mails ensues among senior managers at Stats Can and officials "pester" Canada Post to find the lost item. Confidential statistical info is encrypted. Laptop is found two weeks later.

Sunday, January 9, 2011

[Video] How to succeed? Get more sleep

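I went to bed early last night, and when I opened my eyes this morning and saw the clock, I was startled to find I had slept a full eleven hours. Still bleary-eyed, I opened Google Reader and saw the TED China fan group featuring a talk by Arianna Huffington, co-founder and editor-in-chief of The Huffington Post, on how she rethought and relearned the value of sleep after a health scare: How to succeed? Get more sleep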



If My Heart Were a Lotus Flower

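~ 林徽因 · 馬雁散文集 · 蓮燈 ~ In her essay 《高貴一種,有詩為證》, 馬雁 writes: "More than ten years ago, before I knew anything of Ms. Lin's gossip or achievements, I read 《蓮燈》 quoted by someone else in a journal" and liked it very much; compared with Bian Zhilin and Xu Zhimo it is not merely in no way inferior, it is frankly a cut above. The handling of the rhymes and tonal patterns in the first part is clearly superior to Dai...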