<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Oasis Feng &#187; Google</title>
	<atom:link href="http://blog.oasisfeng.com/tag/google/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.oasisfeng.com</link>
	<description>Challenge your imagination!</description>
	<lastBuildDate>Tue, 13 Jul 2010 16:56:15 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
	<atom:link rel="hub" href="http://pubsubhubbub.appspot.com"/>
	<atom:link rel="hub" href="http://superfeedr.com/hubbub"/>
		<item>
		<title>Google Wave Finally Supports Anonymous Viewing for Non-Wave Users</title>
		<link>http://blog.oasisfeng.com/2010/04/17/google-wave-finally-support-for-anonymous/</link>
		<comments>http://blog.oasisfeng.com/2010/04/17/google-wave-finally-support-for-anonymous/#comments</comments>
		<pubDate>Sat, 17 Apr 2010 05:56:03 +0000</pubDate>
		<dc:creator>oasisfeng</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[API]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[openid]]></category>
		<category><![CDATA[Wave]]></category>

		<guid isPermaLink="false">http://blog.oasisfeng.com/?p=884</guid>
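		<description><![CDATA[The embedded Wave below is the Google Wave team's official announcement Wave, and you can now view it without logging in to Wave. Anonymous users can only read for now; taking part in the conversation still requires logging in. Even so, this makes Google Wave far more usable and lets it deliver its value in many more areas of the Web. Combined with Proxying-for in the Google Wave API, we can also implement anonymous interaction ourselves, or integrate with other identity systems such as OpenID. When I have time, I will try to build an OpenID Proxy sample.]]></description>
			<content:encoded><![CDATA[<p>The embedded Wave below is the Google Wave team's official announcement Wave, and you can now view it without logging in to Wave. Anonymous users can only read for now; taking part in the conversation still requires logging in. Even so, this makes Google Wave far more usable and lets it deliver its value in many more areas of the Web.</p>
<p>Combined with <a href="http://code.google.com/apis/wave/extensions/robots/operations.html#Proxying">Proxying-for</a> in the Google Wave API, we can also implement anonymous interaction ourselves, or integrate with other identity systems such as OpenID. When I have time, I will try to build an OpenID Proxy sample.</p>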
<p><iframe width="480px" height="820px" src="http://www.oasisfeng.com/show/wave_embed.html" frameborder="0"></iframe></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.oasisfeng.com/2010/04/17/google-wave-finally-support-for-anonymous/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Common Misreadings of Google Buzz</title>
		<link>http://blog.oasisfeng.com/2010/02/11/misunderstandings-about-google-buzz/</link>
		<comments>http://blog.oasisfeng.com/2010/02/11/misunderstandings-about-google-buzz/#comments</comments>
		<pubDate>Thu, 11 Feb 2010 15:40:24 +0000</pubDate>
		<dc:creator>oasisfeng</dc:creator>
				<category><![CDATA[Misc]]></category>
		<category><![CDATA[Buzz]]></category>
		<category><![CDATA[FriendFeed]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Salmon]]></category>
		<category><![CDATA[Social]]></category>
		<category><![CDATA[Twitter]]></category>

		<guid isPermaLink="false">http://blog.oasisfeng.com/?p=863</guid>
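		<description><![CDATA[After Google launched Buzz, commentary flooded the Web: positive, negative, concept hype, bandwagon noise, and even worries about Gmail security. Quite a few of these takes misread Buzz, so here is my own understanding of it. "Google Buzz is a Twitter killer!" This is the media's favorite kind of hype: another killer app has appeared, editors get excited, and there are eyeballs to be won. In fact, Google Buzz and Twitter are, on the whole, not applications at the same level, so Buzz is not a killer in any real sense. Some calmer analyses see this clearly: Google Buzz is mainly aimed at FriendFeed, because both are aggregation platforms that bring together information from different sources. Buzz's biggest advance over FriendFeed is that, besides aggregating information, it makes creative use of the Social Graph to aggregate relationships. Of course, besides aggregation, Google Buzz also acts as a simple information source in its own right: you can post rich-media content on Buzz. But the choice is entirely yours. You can keep your existing habits, blogging on WordPress and chatting on Twitter, and all of it will be collected into Buzz automatically. "We don't need yet another social network" You rush to Buzz, shout a few excited lines, and then realize it is not much different from Twitter, except that here you don't get the Twitter thrill of raising your arm and having your followers rally. Before long you gradually forget about Buzz. That is because you treated Buzz as a social network like Twitter, Facebook, or MySpace. "Yet another new SNS, and once again I have to spend time building my network from scratch." That is not surprising; it is not just you, even Microsoft thinks so. At its core, Buzz does not want to build a new social network. Quite the opposite: its goal is to advance a set of open standards so that users no longer have to maintain separate relationship networks on one SNS after another, so that relationships can be reused and aggregated, and so that a decentralized Social Graph can take shape that does not depend on any particular SNS. Buzz advocates using XHTML Friends Network (XFN) and Friend of a Friend (FOAF) to discover and aggregate users' existing relationship networks and achieve interoperability among SNSs. If the open SNSs answer this call, we will no longer have to worry about our relationships being locked into one SNS, and we may even use a new SNS to find friends we missed on an old one. "Buzz fragments replies and comments even further" This is undeniably true at present. Whether you sync from Twitter to Buzz or publish from Buzz to Twitter, you face the same problem: replies and comments are not synchronized. You may miss comments made in Buzz because you only read messages on Twitter, or neglect your Twitter followers once you get used to Buzz. As more and more Web applications introduce aggregation, this problem has become increasingly prominent. Google Buzz does not solve it today, but only because Buzz is still incomplete. The Buzz API documentation has a "Coming Soon" section that mentions Buzz's future answer to this problem: Salmon. As for why Salmon support has not shipped yet, my guess is that, on the one hand, the specification is still in draft, and on the other, Google cannot implement it unilaterally: both the source and the aggregator must follow the Salmon protocol for comment synchronization to work completely. If any other Internet company pushed this, it would probably have little effect, but with Google Buzz championing it the influence is of a different order. So, before completing Salmon support, Google published the roadmap first, so that everyone can see Google's open attitude and firm commitment; only then does Salmon stand a chance of wide recognition and support. So, just as Wave seems unremarkable to most people, only when you look through the API at the intent behind it can you really understand the foresight and vision in each Google product. While most people are won over by Buzz's elegance and convenience, what I value more is the far-reaching effect it will have on the whole SNS ecosystem and its notable contribution to openness and standardization.]]></description>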
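			<content:encoded><![CDATA[<p>After Google launched Buzz, commentary flooded the Web: positive, negative, concept hype, bandwagon noise, and even worries about Gmail security. Quite a few of these takes misread Buzz, so here is my own understanding of it.</p>
<p><strong>"Google Buzz is a Twitter killer!"</strong></p>
<p>This is the media's favorite kind of hype: another killer app has appeared, editors get excited, and there are eyeballs to be won. In fact, Google Buzz and Twitter are, on the whole, not applications at the same level, so Buzz is not a killer in any real sense. Some calmer analyses see this clearly: Google Buzz is mainly aimed at <a href="http://friendfeed.com/">FriendFeed</a>, because both are aggregation platforms that bring together information from different sources. Buzz's biggest advance over FriendFeed is that, besides aggregating information, it makes creative use of the <a href="http://code.google.com/apis/socialgraph/">Social Graph</a> to aggregate relationships.</p>
<p>Of course, besides aggregation, Google Buzz also acts as a simple information source in its own right: you can post rich-media content on Buzz. But the choice is entirely yours. You can keep your existing habits, blogging on WordPress and chatting on Twitter, and all of it will be collected into Buzz automatically.</p>
<p><strong>"We don't need yet another social network"</strong></p>
<p>You rush to Buzz, shout a few excited lines, and then realize it is not much different from Twitter, except that here you don't get the Twitter thrill of raising your arm and having your followers rally. Before long you gradually forget about Buzz. That is because you treated Buzz as a social network like Twitter, Facebook, or MySpace. <a href="http://code.google.com/apis/socialgraph/">"Yet another new SNS, and once again I have to spend time building my network from scratch."</a> That is not surprising; it is not just you, even <a href="http://techcrunch.com/2010/02/09/microsoft-slams-google-buzz/">Microsoft thinks so</a>.</p>
<p>At its core, Buzz does not want to build a new social network. Quite the opposite: its goal is to advance a set of open standards so that users no longer have to maintain separate relationship networks on one SNS after another, so that relationships can be reused and aggregated, and so that a decentralized Social Graph can take shape that does not depend on any particular SNS.</p>
<p>Buzz advocates using <a href="http://gmpg.org/xfn/">XHTML Friends Network</a> (XFN) and <a href="http://www.foaf-project.org/">Friend of a Friend</a> (FOAF) to discover and aggregate users' existing relationship networks and achieve interoperability among SNSs. If the open SNSs answer this call, we will no longer have to worry about our relationships being locked into one SNS, and we may even use a new SNS to find friends we missed on an old one.</p>
<p><strong>"Buzz fragments replies and comments even further"</strong></p>
<p>This is undeniably true at present. Whether you sync from Twitter to Buzz or publish from Buzz to Twitter, you face the same problem: replies and comments are not synchronized. You may miss comments made in Buzz because you only read messages on Twitter, or neglect your Twitter followers once you get used to Buzz. As more and more Web applications introduce aggregation, this problem has become increasingly prominent. Google Buzz does not solve it today, but only because Buzz is still incomplete. <a href="http://code.google.com/apis/buzz/documentation/#coming-soon">The Buzz API documentation has a "Coming Soon" section</a> that mentions Buzz's future answer to this problem: <a href="http://www.salmon-protocol.org/">Salmon</a>.</p>
<p>As for why Salmon support has not shipped yet, my guess is that, on the one hand, the specification is still in draft, and on the other, Google cannot implement it unilaterally: both the source and the aggregator must follow the Salmon protocol for comment synchronization to work completely. If any other Internet company pushed this, it would probably have little effect, but with Google Buzz championing it the influence is of a different order. So, before completing Salmon support, Google published the roadmap first, so that everyone can see Google's open attitude and firm commitment; only then does Salmon stand a chance of wide recognition and support.</p>
<p>So, just as <a href="http://blog.oasisfeng.com/2009/10/12/the-strategy-vision-behind-google-wave/">Wave seems unremarkable to most people</a>, only when you look through the API at the intent behind it can you really understand the foresight and vision in each Google product. While most people are won over by Buzz's elegance and convenience, what I value more is the far-reaching effect it will have on the whole SNS ecosystem and its notable contribution to openness and standardization.</p>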
]]></content:encoded>
			<wfw:commentRss>http://blog.oasisfeng.com/2010/02/11/misunderstandings-about-google-buzz/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>[Idle Chatter] Taobao Electronics Mall, War of Internet Addiction, Xuan-Yuan Sword Online, GAE, tb.ly, the First Mover Series...</title>
		<link>http://blog.oasisfeng.com/2010/02/01/weekly-tweets/</link>
		<comments>http://blog.oasisfeng.com/2010/02/01/weekly-tweets/#comments</comments>
		<pubDate>Sun, 31 Jan 2010 16:00:37 +0000</pubDate>
		<dc:creator>oasisfeng</dc:creator>
				<category><![CDATA[Tweets]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[google app engine]]></category>
		<category><![CDATA[Taobao]]></category>
		<category><![CDATA[tb.ly]]></category>

		<guid isPermaLink="false">http://blog.oasisfeng.com/?p=849</guid>
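		<description><![CDATA[Ever since I got used to Twitter, I have been blogging less and less. Twitter is great, but compared with a blog it is poor at letting content settle, and because of "national conditions" many friends cannot reach it, so valuable information gets lost. So I am going to try compiling a weekly digest of my tweets, so that the ones that are not idle chatter get a chance to settle and more people can get useful information from them. And of course you are always welcome to follow me on Twitter. Monday, 25th of January. Uncovering the real-time censorship system behind search-engine keyword filtering http://gfwrev.blogspot.com/2010/01/blog-post.html Taobao Electronics Mall is following a dangerous model: by presenting everything in one centralized view, it erases each shop's individual brand and distinctive service, and ultimately steers competition into a naked price war. The shop brands on Taobao that were built up through word of mouth could be destroyed overnight. Tuesday, 26th of January. A new Google Reader feature, a feed for any page: http://googlereader.blogspot.com/2010/01/follow-changes-to-any-website.html I finished watching War of Internet Addiction, recommended by a friend. As a former World of Warcraft player, I can deeply feel this anguished cry of near despair at real-world society. At that moment I was moved to tears... Wednesday, 27th of January. Someone left a comment on my blog: "I have recently been replaying Xuan-Yuan Sword Online on the Taiwan server and have looked for 轩网·异眼 for a long time without finding it. Could you send it to me?" A game that still has loyal players years after its servers shut down; I think only Xuan-Yuan Sword Online and WoW can manage that... RT @oasisfeng I read online that someone says researching enhancements to Java's Hot Swap is a dead end. Am I, with no progress lately, about to get stuck in this dead end too...? // Reflecting once more on what I felt two months ago. Just when you think you have reached the summit, you see only a treacherous path swallowed by fog leading higher still... Still, on the climb up this endless peak I have picked up a lot along the way, experience that will be useful later whether I am crossing open plains or fording rapids. Thursday, 28th of January. The iPad is a big disappointment; looks like I won't need to sell off my eBook cheaply. It seems Google is well aware of the weaknesses of its own Public DNS and is actively pushing to improve them. http://googlecode.blogspot.com/2010/01/proposal-to-extend-dns-protocol.html So Google App Engine has quietly started supporting wildcard domains! Configure the domain in GAE, add the app URL as * in Apps, then change the DNS configuration to add a * CNAME record pointing to ghs.l.google.com., and you can reach the GAE app through any subdomain. Only one truly usable official Google ghs is left. If you get a so-called ghs IP from some channel, be very careful: it may well be just a reverse proxy (which carries some security risk). To verify it, run ping -a against the IP and check whether the hostname belongs to 1e100.net. Friday, 29th of January. On the newly opened Chromium-HTML5 forum I asked about Chrome's plans to support the Offline standard, and the reply was “It&#8217;s planned [...]]]></description>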
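			<content:encoded><![CDATA[<p>Ever since I got used to Twitter, I have been blogging less and less. Twitter is great, but compared with a blog it is poor at letting content settle, and because of "national conditions" many friends cannot reach it, so valuable information gets lost. So I am going to try compiling a weekly digest of my tweets, so that the ones that are not idle chatter get a chance to settle and more people can get useful information from them. And of course you are always welcome to <a href="http://twitter.com/oasisfeng">follow me on Twitter</a>.</p>
<p><span id="more-849"></span><strong><span style="text-decoration: underline;"><em>Monday, 25th of January.</em></span></strong></p>
<p>Uncovering the real-time censorship system behind search-engine keyword filtering <a href="http://gfwrev.blogspot.com/2010/01/blog-post.html">http://gfwrev.blogspot.com/2010/01/blog-post.html</a></p>
<p>Taobao Electronics Mall is following a dangerous model: by presenting everything in one centralized view, it erases each shop's individual brand and distinctive service, and ultimately steers competition into a naked price war. The shop brands on Taobao that were built up through word of mouth could be destroyed overnight.</p>
<p><strong><span style="text-decoration: underline;"><em>Tuesday, 26th of January.</em></span></strong></p>
<p>A new Google Reader feature, a feed for any page: <a href="http://googlereader.blogspot.com/2010/01/follow-changes-to-any-website.html">http://googlereader.blogspot.com/2010/01/follow-changes-to-any-website.html</a></p>
<p>I finished watching War of Internet Addiction, recommended by a friend. As a former World of Warcraft player, I can deeply feel this anguished cry of near despair at real-world society. At that moment I was moved to tears...</p>
<p><strong><span style="text-decoration: underline;"><em>Wednesday, 27th of January.</em></span></strong></p>
<p>Someone left a comment on my blog: "I have recently been replaying Xuan-Yuan Sword Online on the Taiwan server and have looked for 轩网·异眼 for a long time without finding it. Could you send it to me?" A game that still has loyal players years after its servers shut down; I think only Xuan-Yuan Sword Online and WoW can manage that...</p>
<p>RT @oasisfeng I read online that someone says researching enhancements to Java's Hot Swap is a dead end. Am I, with no progress lately, about to get stuck in this dead end too...? // Reflecting once more on what I felt two months ago. Just when you think you have reached the summit, you see only a treacherous path swallowed by fog leading higher still...</p>
<p>Still, on the climb up this endless peak I have picked up a lot along the way, experience that will be useful later whether I am crossing open plains or fording rapids.</p>
<p><strong><span style="text-decoration: underline;"><em>Thursday, 28th of January.</em></span></strong></p>
<p>The iPad is a big disappointment; looks like I won't need to sell off my eBook cheaply.</p>
<p>It seems Google is well aware of the weaknesses of its own Public DNS and is actively pushing to improve them. <a href="http://googlecode.blogspot.com/2010/01/proposal-to-extend-dns-protocol.html">http://googlecode.blogspot.com/2010/01/proposal-to-extend-dns-protocol.html</a></p>
<p>So Google App Engine has quietly started supporting wildcard domains! Configure the domain in GAE, add the app URL as * in Apps, then change the DNS configuration to add a * CNAME record pointing to ghs.l.google.com., and you can reach the GAE app through any subdomain.</p>
<p>Only one truly usable official Google ghs is left. If you get a so-called ghs IP from some channel, be very careful: it may well be just a reverse proxy (which carries some security risk). To verify it, run ping -a against the IP and check whether the hostname belongs to 1e100.net.</p>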
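<p>The ping -a check above can be scripted. What follows is only a minimal sketch: the function names are made up for illustration, and it assumes that a reverse (PTR) lookup ending in 1e100.net is enough of a signal. A PTR record is controlled by whoever owns the IP, so a stricter check would also resolve the returned name forward and confirm that it maps back to the same address.</p>

```python
import socket


def is_google_hostname(hostname: str) -> bool:
    """Return True if hostname is 1e100.net or one of its subdomains."""
    name = hostname.rstrip(".").lower()
    return name == "1e100.net" or name.endswith(".1e100.net")


def looks_like_official_ghs(ip: str) -> bool:
    """Reverse-resolve ip (what `ping -a` displays) and test the name."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    except OSError:
        # No PTR record or the lookup failed: treat the IP as unverified.
        return False
    return is_google_hostname(hostname)
```

<p>Matching on ".1e100.net" with the leading dot is deliberate, so that a look-alike name such as evil1e100.net is rejected.</p>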
<p><strong><span style="text-decoration: underline;"><em>Friday, 29th of January.</em></span></strong></p>
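<p>On the newly opened Chromium-HTML5 forum I asked about Chrome's plans to support the Offline standard, and the reply was “It&#8217;s planned for Chrome 5. I think it&#8217;s going to ship in the next dev channel as well.”</p>
<p>That should be just about when I can finish refactoring tb.ly's HTML5 offline feature.</p>
<p><strong><span style="text-decoration: underline;"><em>Saturday, 30th of January.</em></span></strong></p>
<p>tb.ly has started testing subdomain redirects: any Taobao shop subdomain automatically gets the same-named tb.ly subdomain, with no application needed. For example, entering nokia.tb.ly takes you straight to nokia.taobao.com. Short URLs for a shop's own items, based on the shop subdomain, will follow~</p>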
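<p>The mapping from nokia.tb.ly to nokia.taobao.com can be sketched as a pure function. This is an illustration only, not the actual tb.ly code: the function name, the rule that only a single label counts as a shop, and the http scheme are all assumptions. In a real deployment something like this would sit inside a request handler that answers with an HTTP redirect.</p>

```python
from typing import Optional

SHORT_DOMAIN = "tb.ly"        # short domain named in the post
TARGET_DOMAIN = "taobao.com"  # shop domain named in the post


def redirect_target(host: str) -> Optional[str]:
    """Map "shop.tb.ly" to "http://shop.taobao.com/", or return None."""
    # Normalize: lower-case, drop an optional port and a trailing dot.
    host = host.lower().split(":")[0].rstrip(".")
    suffix = "." + SHORT_DOMAIN
    if not host.endswith(suffix):
        return None
    shop = host[: -len(suffix)]
    # Assumption: only a single, non-empty label is a shop subdomain.
    if not shop or "." in shop:
        return None
    return "http://%s.%s/" % (shop, TARGET_DOMAIN)
```

<p>Returning None for the bare domain and for unrelated hosts leaves those requests to whatever handles the rest of the site.</p>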
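<p>Unlike the core of tb.ly, which is written in PHP, the current subdomain support is implemented on Google App Engine. I plan to write a blog post about the implementation details and my experience mixing GAE and PHP.</p>
<p>Finished writing "Using Wildcard Subdomains in Google App Engine"; time to go eat. <a href="http://blog.oasisfeng.com/2010/01/30/use-wildcard-domains-in-google-appengine/">http://blog.oasisfeng.com/2010/01/30/use-wildcard-domains-in-google-appengine/</a></p>
<p><a href="http://www.squish.net/dnscheck/">http://www.squish.net/dnscheck/</a> is the most powerful DNS analysis and checking tool I have seen so far. Its full-path resolution trace is very helpful for diagnosing problems with DNS records.</p>
<p><strong><span style="text-decoration: underline;"><em>Sunday, 31st of January.</em></span></strong></p>
<p>tb.ly gives you super-convenient Taobao search: just type "tb.ly?search keyword" in the browser address bar to go straight to the results! Try it now: tb.ly?诺基亚E71</p>
<p>After more than ten years, I have once again borrowed from the Zhejiang Library two books from the First Mover series: The Quark and the Jaguar and The Elegant Universe. Stroking the yellowed pages, I remembered my middle-school self reading for free in the bookstore every day after school, standing until my legs went numb without noticing.</p>
<p>It was this First Mover series that plunged me into the world of science back then. Of the classics in the first two sets, I scrimped and saved and bought only The Arrow of Time and The Emperor's New Mind; many of the others I read standing in the bookstore. I then lost track for a long time and only now learned that a third set came out in 2004.</p>
<p>Google is starting to drive IE6 out of Docs and Sites! Shedding the baggage and traveling light; only developers who have done front-end work can truly feel the pain behind this decision. <a href="http://is.gd/7pEQt">http://is.gd/7pEQt</a></p>
<p>Good news~ Google Apps Script is finally open to Standard Edition users! Free users can now write highly flexible extensions for Google Docs in server-side JavaScript. The possibilities are exciting~ <a href="http://is.gd/7pGBA">http://is.gd/7pGBA</a></p>
<p>The first thing that comes to mind is adding an overdue-payment reminder to our little lunch-club ledger (a Google Spreadsheet). Hahaha~</p>
<p>The landmark sync in the new S60 client for Google Maps is so convenient! Before going to Provence for dinner tonight I starred the place in advance, and after leaving the Zhejiang Library I just opened Maps and found it~ No more clumsily exporting landmarks from Google Maps to Nokia format and copying them to the phone.</p>
<p>I currently use Live Calendar as the desktop front end for Google Calendar (which is a bit ironic...). Do any of you on Twitter have a more convenient client to recommend?</p>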
]]></content:encoded>
			<wfw:commentRss>http://blog.oasisfeng.com/2010/02/01/weekly-tweets/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>From Glue to Canal: The Strategic Mission of Google Wave</title>
		<link>http://blog.oasisfeng.com/2009/10/12/the-strategy-vision-behind-google-wave/</link>
		<comments>http://blog.oasisfeng.com/2009/10/12/the-strategy-vision-behind-google-wave/#comments</comments>
		<pubDate>Mon, 12 Oct 2009 12:33:06 +0000</pubDate>
		<dc:creator>oasisfeng</dc:creator>
				<category><![CDATA[Internet]]></category>
		<category><![CDATA[Thinking]]></category>
		<category><![CDATA[API]]></category>
		<category><![CDATA[Gadget]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Robot]]></category>
		<category><![CDATA[Wave]]></category>
		<category><![CDATA[Web 2.0]]></category>

		<guid isPermaLink="false">http://blog.oasisfeng.com/?p=823</guid>
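		<description><![CDATA[Let us first look at Google's vision and the strategy it has pursued since its founding. Google's ultimate vision is clear and has hardly changed: "Organize the world's information and make it universally accessible and useful." The phrasing is artful. Organizing the world's information is not simply about letting you search and access it; "useful" presupposes that Google first extracts the value from that information before it can benefit the public. "Holding and controlling information" is the core of all of Google's subordinate strategies. The first generation of search engines embodied the vision of "organizing the Internet's static information." With its powerful search engine and massive storage, Google established its dominance in search. In that era the way to organize Internet information was fairly direct: "crawler + index + search". Most static content could be accessed directly, so Google only had to build one giant index to reach its strategic goal of organizing information. With the rapid growth of Web 2.0, the makeup of the Internet has shifted from static information toward user-contributed content. More and more sites are driven by what users post or upload, and Google responded with a series of targeted vertical searches such as Blog Search, Groups Search, Photo (Images) Search, and Code Search. But a defining trait of Web 2.0 is that it is social, so a considerable share of user-contributed information is not open to search-engine crawlers; the major SNS communities in particular nearly all block search engines. To protect user privacy, other kinds of information, such as photos and code, are often partly closed to search engines as well. At the same time, Web 2.0 offers more structured information, and crawling can hardly preserve that structure. This stream of new developments left Google feeling on the back foot, so it launched a massive "war for control of information": by offering users competitive storage services for all kinds of information free of charge, it aims to bring the world's information under its own control. Hence Gmail, Blogger, Picasa Web, and Google Code. Whenever a startup stood in the way of Google's strategy, Google simply swallowed it, as with YouTube and Writely. With this aggressive two-pronged strategy, Google barely held on to its lead in organizing information during the Web 2.0 era, but in the face of the rise of new carriers of information such as Facebook and Twitter, it has looked unsteady and overstretched. In the post-Web 2.0 era, what the Internet means to the public has gradually changed from a mere channel for obtaining information into an all-purpose service platform, which is especially evident in the recently proposed Gov 2.0. Interactive applications are replacing information media as the dominant force on the Internet and are dramatically changing its face. Although Google has long pushed for open data and standardized APIs, no single company calls the shots on the Internet, and as the long tail stretches out, Google finds its strategic vision harder and harder to carry out. The historical limitations of the search engine mean it can hardly go on bearing the task of organizing the world's information in the new Internet landscape. First, search controls only the earliest stage of a user's activity on the Internet; users leave it so quickly that a search engine can hardly gather user information as rich as an SNS does. Second, once interactive applications replace plain presentation of information, their value to users can no longer be conveyed simply through a search engine. Finally, mashups between applications make the topology of information on the Internet ever more complex, and the flat indexing of a search engine can hardly organize it effectively. Against this backdrop, Google began preparing a new weapon to regain the strategic initiative: Google Wave. On the surface it is what Google says it is: “Google Wave is an online communication and collaboration tool that makes real-time interactions more seamless &#8212; in one place, you can communicate and collaborate using richly formatted text, photos, videos, maps, and more.” Google will of course not tell you the strategic intent behind Wave directly, but in development resources, system complexity, and promotional push alike, Wave is unprecedented. Its invitation mechanism is even stricter than Gmail's was (and Gmail has since become the product Google is proudest of after search). All of this shows how much weight Google places on this product. In fact, the Wave API is the key to uncovering Google [...]]]></description>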
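			<content:encoded><![CDATA[<p>Let us first look at Google's vision and the strategy it has pursued since its founding. Google's ultimate vision is clear and has hardly changed: <a href="http://www.google.cn/intl/zh-CN/corporate/">"Organize the world's information and make it universally accessible and useful."</a> The phrasing is artful. Organizing the world's information is not simply about letting you search and access it; "useful" presupposes that Google first extracts the value from that information before it can benefit the public. <strong>"Holding and controlling information" is the core of all of Google's subordinate strategies.</strong></p>
<p><strong>The first generation of search engines embodied the vision of "organizing the Internet's static information."</strong> With its powerful search engine and massive storage, Google established its dominance in search. In that era the way to organize Internet information was fairly direct: "crawler + index + search". Most static content could be accessed directly, so Google only had to build one giant index to reach its strategic goal of organizing information.</p>
<p><span id="more-823"></span>With the rapid growth of Web 2.0, the makeup of the Internet has shifted from static information toward user-contributed content. More and more sites are driven by what users post or upload, and Google responded with a series of targeted vertical searches such as Blog Search, Groups Search, Photo (Images) Search, and Code Search. But a defining trait of Web 2.0 is that it is social, so a considerable share of user-contributed information is not open to search-engine crawlers; the major SNS communities in particular nearly all block search engines. To protect user privacy, other kinds of information, such as photos and code, are often partly closed to search engines as well. At the same time, Web 2.0 offers more structured information, and crawling can hardly preserve that structure. This stream of new developments left Google feeling on the back foot, so it launched a massive <strong>"war for control of information"</strong>: by offering users competitive storage services for all kinds of information free of charge, it aims to bring the world's information under its own control. Hence Gmail, Blogger, Picasa Web, and Google Code. Whenever a startup stood in the way of Google's strategy, Google simply swallowed it, as with YouTube and Writely. With this aggressive two-pronged strategy, Google barely held on to its lead in organizing information during the Web 2.0 era, but <strong>in the face of the rise of new carriers of information such as Facebook and Twitter, it has looked unsteady and overstretched</strong>.</p>
<p>In the post-Web 2.0 era, what the Internet means to the public has gradually changed from a mere channel for obtaining information into an all-purpose service platform, which is especially evident in the recently proposed Gov 2.0. Interactive applications are replacing information media as the dominant force on the Internet and are dramatically changing its face. Although Google has long pushed for open data and standardized APIs, no single company calls the shots on the Internet, and as the long tail stretches out, Google finds its strategic vision harder and harder to carry out. <strong>The historical limitations of the search engine mean it can hardly go on bearing the task of organizing the world's information in the new Internet landscape.</strong> First, search controls only the earliest stage of a user's activity on the Internet; users leave it so quickly that a search engine can hardly gather user information as rich as an SNS does. Second, once interactive applications replace plain presentation of information, their value to users can no longer be conveyed simply through a search engine. Finally, mashups between applications make the topology of information on the Internet ever more complex, and the flat indexing of a search engine can hardly organize it effectively.</p>
<p>Against this backdrop, Google began preparing a new weapon to regain the strategic initiative: Google Wave. On the surface it is what Google says it is: “Google Wave is an online communication and collaboration tool that makes real-time interactions more seamless &#8212; in one place, you can communicate and collaborate using richly formatted text, photos, videos, maps, and more.” Google will of course not tell you the strategic intent behind Wave directly, but in development resources, system complexity, and promotional push alike, Wave is unprecedented. Its invitation mechanism is even stricter than Gmail's was (and Gmail has since become the product Google is proudest of after search). All of this shows how much weight Google places on this product.</p>
<p><strong>In fact, the Wave API is the key to uncovering Google Wave's strategy.</strong> Why do Wave Preview invitations so clearly favor developers and partners? While Wave's main features were still unfinished, the APIs and SDKs were polished first. Clearly, Google is not courting third-party developers merely because it happens to offer an API. Let us dissect the Wave API and see what is really inside.</p>
<p>The Wave API currently falls into two broad categories: Extension and Embed. The former is like a plug-in and adds functionality to Wave; the latter is like a presentation wrapper and lets a Wave be embedded in other existing applications. Embed needs no further explanation, while <strong>Extension is the strategic core of Wave</strong>.</p>
<p>Extensions split into two branches, Robot and Gadget, which together with Wave form the classic MVC architecture of application development: the Model is the Wave framework itself, the View is the Gadget, and the Controller is the Robot. Wave's own three-level data structure (Wave-Wavelet-Blip) is dynamic, real-time, and interactive, which suits the Model design needs of Internet applications. The Gadget, as an extension point beyond standard media types such as rich text, lets Wave fit all sorts of special application scenarios. The Robot plays different concrete roles depending on the kind of application: in a closed Wave application it acts as the traditional Controller, while in a mashup it acts as glue and can be an Importer, an Exporter, or a Transformer.</p>
<p>Some design ideas in the Wave API also reflect Wave's strategic vision from different angles, for example:</p>
<ul>
<li><strong>A Robot can also create its own private Wavelet or share a Wavelet with other Robots.</strong> These Wavelets are invisible to users and can serve as the Robot's persistent storage or as a communication channel.</li>
<li><strong>An invisible Wavelet can be seen as a "communication channel" that supports event notification, with each Blip in it being a single "message".</strong> This gives application developers plenty of room: it can be used like a Unix pipe to bridge applications, as a task distribution queue in a Provider/Consumer model, for selective multicast, or even as a modifiable set of global parameters.</li>
<li><strong>A visible Wavelet can serve as a human-computer interface that lets users communicate with an application through something like conversation and interactive editing</strong>. With Gadgets, of course, it can be extended to any form of interaction needed.</li>
</ul>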
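<p>To make the "invisible Wavelet as a channel" idea concrete, here is a toy model in plain Python. It does not use the Wave Robots API at all: the class and method names are hypothetical, and it only illustrates the two properties described in the list above, namely event notification when a Blip is added and queue-style hand-off between a producer and a consumer.</p>

```python
from collections import deque
from typing import Callable, Deque, List


class WaveletChannel:
    """Toy stand-in for an invisible wavelet used as a message channel."""

    def __init__(self) -> None:
        self.blips: Deque[str] = deque()  # pending messages, oldest first
        self._listeners: List[Callable[[str], None]] = []

    def subscribe(self, listener: Callable[[str], None]) -> None:
        """Register a callback that fires whenever a blip is appended."""
        self._listeners.append(listener)

    def append_blip(self, text: str) -> None:
        """Producer side: store the blip, then notify every subscriber."""
        self.blips.append(text)
        for listener in self._listeners:
            listener(text)

    def take(self) -> str:
        """Consumer side: remove and return the oldest pending blip."""
        return self.blips.popleft()
```

<p>One robot would call append_blip to publish work and another would call take to claim it, while subscribers see every message, which roughly corresponds to the task-queue and multicast uses mentioned above.</p>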
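<p>Although integrating the many applications on today's Internet is not realistic, Google encourages developers to use Extensions to embed their applications' functions and interfaces into Wave, which is the "All in one place" Google envisions. On the surface, Wave is like glue that makes it easy to mash up applications that implement a Wave Importer and Exporter. However, <strong>once the glue becomes the de facto standard, Google will gradually level the competitive barriers between applications.</strong> At that point Wave will have risen to become a canal, and whether you are a ferryman or a merchant, you will be just another person toiling for a living on this waterway, constantly worried about being replaced by a competitor; only the canal itself is the monopolist that cannot be replaced.</p>
<hr />
<p>I remember reading a science-fiction story long ago that described an "Internet dreamland": it held all kinds of smart parts, and everyone could use their imagination to assemble useful or amusing devices to decorate their own cottage. The public facilities of the whole dreamland were built through the spontaneous collaboration of sorcerers (developers), and the geeks liked to exploit loopholes in the system to play with "black magic".</p>
<p>Through Google Wave, I seem to glimpse the outline of that "Internet dreamland"...</p>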
]]></content:encoded>
			<wfw:commentRss>http://blog.oasisfeng.com/2009/10/12/the-strategy-vision-behind-google-wave/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>GFS: Evolution on Fast-forward</title>
		<link>http://blog.oasisfeng.com/2009/08/16/gfs-evolution-on-fast-forward/</link>
		<comments>http://blog.oasisfeng.com/2009/08/16/gfs-evolution-on-fast-forward/#comments</comments>
		<pubDate>Sun, 16 Aug 2009 02:25:26 +0000</pubDate>
		<dc:creator>oasisfeng</dc:creator>
				<category><![CDATA[Architecture]]></category>
		<category><![CDATA[BigTable]]></category>
		<category><![CDATA[File System]]></category>
		<category><![CDATA[GFS]]></category>
		<category><![CDATA[Google]]></category>

		<guid isPermaLink="false">http://blog.oasisfeng.com/?p=789</guid>
		<description><![CDATA[Reposted from ACM Queue &#8211; GFS: Evolution on Fast-forward A discussion between Kirk McKusick (known for his work on BSD Unix, including the original design of the Berkeley Fast File System) and Sean Quinlan (served as the GFS tech leader for a couple of years and continues now as a principal engineer at Google) about the origin [...]]]></description>
			<content:encoded><![CDATA[<p>Reposted from ACM Queue &#8211; <a href="http://queue.acm.org/detail.cfm?id=1594206">GFS: Evolution on Fast-forward</a></p>
<p>A discussion between Kirk McKusick (known for his work on BSD Unix, including the original design of the Berkeley Fast File System) and Sean Quinlan (served as the GFS tech leader for a couple of years and continues now as a principal engineer at Google) about the origin and evolution of the Google File System.</p>
<p>The discussion starts, appropriately enough, at the beginning—with the unorthodox decision to base the initial GFS implementation on a single-master design. At first blush, the risk of a single centralized master becoming a bandwidth bottleneck—or, worse, a single point of failure—seems fairly obvious, but it turns out Google&#8217;s engineers had their reasons for making this choice.</p>
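<p>How did a decision come about that may seem completely at odds with the design principles we imagine for Google's distributed systems? The conversation begins with this intriguing topic.</p>
<p>The whole conversation takes place between two industry experts with a deep understanding of file systems. It covers the design approach of the distributed architecture and its evolution, the trade-off between throughput and latency, strategies for resolving performance bottlenecks, and the complementary relationship between GFS and BigTable. As far as I recall, this is the first time Google has publicly discussed so many details of how GFS operates and is implemented. Highly recommended for engineers working on distributed systems!</p>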
<hr />
<p>GFS: EVOLUTION ON FAST-FORWARD<br />
A DISCUSSION BETWEEN KIRK MCKUSICK AND SEAN QUINLAN ABOUT THE ORIGIN AND EVOLUTION OF THE GOOGLE FILE SYSTEM.</p>
<p>During the early stages of development at Google, the initial thinking did not include plans for building a new file system. While work was still being done on one of the earliest versions of the company&#8217;s crawl and indexing system, however, it became quite clear to the core engineers that they really had no other choice, and GFS (Google File System) was born.</p>
<p><span id="more-789"></span>First, given that Google&#8217;s goal was to build a vast storage network out of inexpensive commodity hardware, it had to be assumed that component failures would be the norm—meaning that constant monitoring, error detection, fault tolerance, and automatic recovery would have to be an integral part of the file system. Also, even by Google&#8217;s earliest estimates, the system&#8217;s throughput requirements were going to be daunting by anybody&#8217;s standards—featuring multi-gigabyte files and data sets containing terabytes of information and millions of objects. Clearly, this meant traditional assumptions about I/O operations and block sizes would have to be revisited. There was also the matter of scalability. This was a file system that would surely need to scale like no other. Of course, back in those earliest days, no one could have possibly imagined just how much scalability would be required. They would learn about that soon enough.</p>
<p>Still, nearly a decade later, most of Google&#8217;s mind-boggling store of data and its ever-growing array of applications continue to rely upon GFS. Many adjustments have been made to the file system along the way, and—together with a fair number of accommodations implemented within the applications that use GFS—they have made the journey possible.</p>
<p>To explore the reasoning behind a few of the more crucial initial design decisions as well as some of the incremental adaptations that have been made since then, ACM asked Sean Quinlan to pull back the covers on the changing file-system requirements and the evolving thinking at Google. Since Quinlan served as the GFS tech leader for a couple of years and continues now as a principal engineer at Google, he&#8217;s in a good position to offer that perspective. As a grounding point beyond the Googleplex, ACM asked Kirk McKusick to lead the discussion. He is best known for his work on BSD (Berkeley Software Distribution) Unix, including the original design of the Berkeley FFS (Fast File System).</p>
<p>The discussion starts, appropriately enough, at the beginning—with the unorthodox decision to base the initial GFS implementation on a single-master design. At first blush, the risk of a single centralized master becoming a bandwidth bottleneck—or, worse, a single point of failure—seems fairly obvious, but it turns out Google&#8217;s engineers had their reasons for making this choice.</p>
<p>MCKUSICK One of the more interesting—and significant—aspects of the original GFS architecture was the decision to base it on a single master. Can you walk us through what led to that decision?</p>
<p>QUINLAN The decision to go with a single master was actually one of the very first decisions, mostly just to simplify the overall design problem. That is, building a distributed master right from the outset was deemed too difficult and would take too much time. Also, by going with the single-master approach, the engineers were able to simplify a lot of problems. Having a central place to control replication and garbage collection and many other activities was definitely simpler than handling it all on a distributed basis. So the decision was made to centralize that in one machine.</p>
<p>MCKUSICK Was this mostly about being able to roll out something within a reasonably short time frame?</p>
<p>QUINLAN Yes. In fact, some of the engineers who were involved in that early effort later went on to build BigTable, a distributed storage system, but that effort took many years. The decision to build the original GFS around the single master really helped get something out into the hands of users much more rapidly than would have otherwise been possible.</p>
<p>Also, in sketching out the use cases they anticipated, it didn&#8217;t seem the single-master design would cause much of a problem. The scale they were thinking about back then was framed in terms of hundreds of terabytes and a few million files. In fact, the system worked just fine to start with.</p>
<p>MCKUSICK But then what?</p>
<p>QUINLAN Problems started to occur once the size of the underlying storage increased. Going from a few hundred terabytes up to petabytes, and then up to tens of petabytes... that really required a proportionate increase in the amount of metadata the master had to maintain. Also, operations such as scanning the metadata to look for recoveries all scaled linearly with the volume of data. So the amount of work required of the master grew substantially. The amount of storage needed to retain all that information grew as well.</p>
<p>In addition, this proved to be a bottleneck for the clients, even though the clients issue few metadata operations themselves—for example, a client talks to the master whenever it does an open. When you have thousands of clients all talking to the master at the same time, given that the master is capable of doing only a few thousand operations a second, the average client isn&#8217;t able to command all that many operations per second. Also bear in mind that there are applications such as MapReduce, where you might suddenly have a thousand tasks, each wanting to open a number of files. Obviously, it would take a long time to handle all those requests, and the master would be under a fair amount of duress.</p>
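<p>A quick back-of-the-envelope sketch of the bottleneck described above. The interview gives only orders of magnitude ("a few thousand operations a second", "thousands of clients", "a thousand tasks"), so the concrete numbers in the usage note are illustrative assumptions, as are the function names.</p>

```python
def per_client_ops(master_ops_per_sec: float, clients: int) -> float:
    """Average metadata operations per second available to each client."""
    return master_ops_per_sec / clients


def seconds_to_drain(tasks: int, opens_per_task: int,
                     master_ops_per_sec: float) -> float:
    """Time for the master to serve a burst of open() calls."""
    return tasks * opens_per_task / master_ops_per_sec
```

<p>With an assumed master capacity of 5,000 operations per second shared by 2,000 clients, per_client_ops gives 2.5 operations per second each. A burst of 1,000 MapReduce tasks opening an assumed 100 files apiece would keep that master busy for about 20 seconds, even with no other load.</p>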
<p>MCKUSICK Now, under the current schema for GFS, you have one master per cell, right?</p>
<p>QUINLAN That&#8217;s correct.</p>
<p>MCKUSICK And historically you&#8217;ve had one cell per data center, right?</p>
<p>QUINLAN That was initially the goal, but it didn&#8217;t work out like that to a large extent—partly because of the limitations of the single-master design and partly because isolation proved to be difficult. As a consequence, people generally ended up with more than one cell per data center. We also ended up doing what we call a &#8220;multi-cell&#8221; approach, which basically made it possible to put multiple GFS masters on top of a pool of chunkservers. That way, the chunkservers could be configured to have, say, eight GFS masters assigned to them, and that would give you at least one pool of underlying storage—with multiple master heads on it, if you will. Then the application was responsible for partitioning data across those different cells.</p>
<p>MCKUSICK Presumably each application would then essentially have its own master that would be responsible for managing its own little file system. Was that basically the idea?</p>
<p>QUINLAN Well, yes and no. Applications would tend to use either one master or a small set of the masters. We also have something we called Name Spaces, which are just a very static way of partitioning a namespace that people can use to hide all of this from the actual application. The Logs Processing System offers an example of this approach: once logs exhaust their ability to use just one cell, they move to multiple GFS cells; a namespace file describes how the log data is partitioned across those different cells and basically serves to hide the exact partitioning from the application. But this is all fairly static.</p>
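<p>The static partitioning described above might look something like the following sketch. The interview says only that a namespace file describes how data is partitioned across cells; the prefix-table format, the cell names, and the longest-prefix rule here are assumptions made for illustration.</p>

```python
from typing import Dict

# Hypothetical namespace table: path prefix -> GFS cell name.
NAMESPACE: Dict[str, str] = {
    "/logs/2009/": "cell-a",
    "/logs/2010/": "cell-b",
    "/logs/": "cell-c",
}


def resolve_cell(path: str, table: Dict[str, str]) -> str:
    """Pick the cell whose prefix is the longest match for path."""
    matches = [prefix for prefix in table if path.startswith(prefix)]
    if not matches:
        raise KeyError("no cell configured for " + path)
    return table[max(matches, key=len)]
```

<p>The application asks for a path and never sees which cell serves it, which is the hiding effect mentioned in the answer. Because the table is fixed, rebalancing means editing it by hand, which matches the remark that the scheme is fairly static.</p>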
<p>MCKUSICK What&#8217;s the performance like, in light of all that?</p>
<p>QUINLAN We ended up putting a fair amount of effort into tuning master performance, and it&#8217;s atypical of Google to put a lot of work into tuning any one particular binary. Generally, our approach is just to get things working reasonably well and then turn our focus to scalability—which usually works well in that you can generally get your performance back by scaling things. Because in this instance we had a single bottleneck that was starting to have an impact on operations, however, we felt that investing a bit of additional effort into making the master lighter weight would be really worthwhile. In the course of scaling from thousands of operations to tens of thousands and beyond, the single master had become somewhat less of a bottleneck. That was a case where paying more attention to the efficiency of that one binary definitely helped keep GFS going for quite a bit longer than would have otherwise been possible.</p>
<p>It could be argued that managing to get GFS ready for production in record time constituted a victory in its own right and that, by speeding Google to market, this ultimately contributed mightily to the company&#8217;s success. A team of three was responsible for all of that—for the core of GFS—and for the system being readied for deployment in less than a year.</p>
<p>But then came the price that so often befalls any successful system—that is, once the scale and use cases have had time to expand far beyond what anyone could have possibly imagined. In Google&#8217;s case, those pressures proved to be particularly intense.</p>
<p>Although organizations don&#8217;t make a habit of exchanging file-system statistics, it&#8217;s safe to assume that GFS is the largest file system in operation (in fact, that was probably true even before Google&#8217;s acquisition of YouTube). Hence, even though the original architects of GFS felt they had provided adequately for at least a couple of orders of magnitude of growth, Google quickly zoomed right past that.</p>
<p>In addition, the number of applications GFS was called upon to support soon ballooned. In an interview with one of the original GFS architects, Howard Gobioff (conducted just prior to his surprising death in early 2008), he recalled, &#8220;The original consumer of all our earliest GFS versions was basically this tremendously large crawling and indexing system. The second wave came when our quality team and research groups started using GFS rather aggressively—and basically, they were all looking to use GFS to store large data sets. And then, before long, we had 50 users, all of whom required a little support from time to time so they&#8217;d all keep playing nicely with each other.&#8221;</p>
<p>One thing that helped tremendously was that Google built not only the file system but also all of the applications running on top of it. While adjustments were continually made in GFS to make it more accommodating to all the new use cases, the applications themselves were also developed with the various strengths and weaknesses of GFS in mind. &#8220;Because we built everything, we were free to cheat whenever we wanted to,&#8221; Gobioff neatly summarized. &#8220;We could push problems back and forth between the application space and the file-system space, and then work out accommodations between the two.&#8221;</p>
<p>The matter of sheer scale, however, called for some more substantial adjustments. One coping strategy had to do with the use of multiple &#8220;cells&#8221; across the network, functioning essentially as related but distinct file systems. Besides helping to deal with the immediate problem of scale, this proved to be a more efficient arrangement for the operations of widely dispersed data centers.</p>
<p>Rapid growth also put pressure on another key parameter of the original GFS design: the choice to establish 64 MB as the standard chunk size. That, of course, was much larger than the typical file-system block size, but only because the files generated by Google&#8217;s crawling and indexing system were unusually large. As the application mix changed over time, however, ways had to be found to let the system deal efficiently with large numbers of files requiring far less than 64 MB (think in terms of Gmail, for example). The problem was not so much with the number of files itself, but rather with the memory demands all of those files made on the centralized master, thus exposing one of the bottleneck risks inherent in the original GFS design.</p>
<p>MCKUSICK I gather from the original GFS paper [Ghemawat, S., Gobioff, H., Leung, S-T. 2003. The Google File System. SOSP (ACM Symposium on Operating Systems Principles)] that file counts have been a significant issue for you right along. Can you go into that a little bit?</p>
<p>QUINLAN The file-count issue came up fairly early because of the way people ended up designing their systems around GFS. Let me cite a specific example. Early in my time at Google, I was involved in the design of the Logs Processing system. We initially had a model where a front-end server would write a log, which we would then basically copy into GFS for processing and archival. That was fine to start with, but then the number of front-end servers increased, each rolling logs every day. At the same time, the number of log types was going up, and then you&#8217;d have front-end servers that would go through crash loops and generate lots more logs. So we ended up with a lot more files than we had anticipated based on our initial back-of-the-envelope estimates.</p>
<p>This became an area we really had to keep an eye on. Finally, we just had to concede there was no way we were going to survive a continuation of the sort of file-count growth we had been experiencing.</p>
<p>MCKUSICK Let me make sure I&#8217;m following this correctly: your issue with file-count growth is a result of your needing to have a piece of metadata on the master for each file, and that metadata has to fit in the master&#8217;s memory.</p>
<p>QUINLAN That&#8217;s correct.</p>
<p>MCKUSICK And there are only a finite number of files you can accommodate before the master runs out of memory?</p>
<p>QUINLAN Exactly. And there are two bits of metadata. One identifies the file, and the other points out the chunks that back that file. If you had a chunk that contained only 1 MB, it would take up only 1 MB of disk space, but it still would require those two bits of metadata on the master. If your average file size ends up dipping below 64 MB, the ratio of the number of objects on your master to what you have in storage starts to go down. That&#8217;s where you run into problems.</p>
<p>Going back to that logs example, it quickly became apparent that the natural mapping we had thought of—and which seemed to make perfect sense back when we were doing our back-of-the-envelope estimates—turned out not to be acceptable at all. We needed to find a way to work around this by figuring out how we could combine some number of underlying objects into larger files. In the case of the logs, that wasn&#8217;t exactly rocket science, but it did require a lot of effort.</p>
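<p>A rough way to see why file counts, rather than bytes, exhaust the master: the sketch below estimates metadata memory for a fixed amount of stored data at two average file sizes. The 64-byte figures for per-file and per-chunk metadata are illustrative assumptions, not numbers from the interview; only the 64 MB chunk size comes from the text.</p>

```python
# Rough model of master memory use; the per-object sizes are assumptions.
CHUNK_SIZE = 64 * 2**20      # 64 MB chunk size, from the text
FILE_META_BYTES = 64         # assumed bytes of per-file metadata
CHUNK_META_BYTES = 64        # assumed bytes of per-chunk metadata


def master_bytes(total_data_bytes: int, avg_file_bytes: int) -> int:
    """Estimate master memory needed to index total_data_bytes of stored data."""
    n_files = total_data_bytes // avg_file_bytes
    chunks_per_file = -(-avg_file_bytes // CHUNK_SIZE)  # ceiling division
    return n_files * (FILE_META_BYTES + chunks_per_file * CHUNK_META_BYTES)


PETABYTE = 2**50
big = master_bytes(PETABYTE, 64 * 2**20)   # every file fills exactly one chunk
small = master_bytes(PETABYTE, 1 * 2**20)  # 1 MB files, one mostly empty chunk each

print(f"64 MB files: {big / 2**30:.0f} GiB of metadata")
print(f" 1 MB files: {small / 2**30:.0f} GiB of metadata")
print(f"ratio: {small // big}x")
```

<p>Under these assumptions the same petabyte needs 64 times as much master memory when it is stored as 1 MB files, which is why bundling small objects into larger files relieves the master.</p>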
<p>MCKUSICK That sounds like the old days when IBM had only a minimum disk allocation, so it provided you with a utility that let you pack a bunch of files together and then create a table of contents for that.</p>
<p>QUINLAN Exactly. For us, each application essentially ended up doing that to varying degrees. That proved to be less burdensome for some applications than for others. In the case of our logs, we hadn&#8217;t really been planning to delete individual log files. It was more likely that we would end up rewriting the logs to anonymize them or do something else along those lines. That way, you don&#8217;t get the garbage-collection problems that can come up if you delete only some of the files within a bundle.</p>
<p>For some other applications, however, the file-count problem was more acute. Many times, the most natural design for some application just wouldn&#8217;t fit into GFS—even though at first glance you would think the file count would be perfectly acceptable, it would turn out to be a problem. When we started using more shared cells, we put quotas on both file counts and storage space. The limit that people have ended up running into most has been, by far, the file-count quota. In comparison, the underlying storage quota rarely proves to be a problem.</p>
<p>MCKUSICK What longer-term strategy have you come up with for dealing with the file-count issue? Certainly, it doesn&#8217;t seem that a distributed master is really going to help with that—not if the master still has to keep all the metadata in memory, that is.</p>
<p>QUINLAN The distributed master certainly allows you to grow file counts, in line with the number of machines you&#8217;re willing to throw at it. That certainly helps. </p>
<p>One of the appeals of the distributed multimaster model is that if you scale everything up by two orders of magnitude, then getting down to a 1-MB average file size is going to be a lot different from having a 64-MB average file size. If you end up going below 1 MB, then you&#8217;re also going to run into other issues that you really need to be careful about. For example, if you end up having to read 10,000 10-KB files, you&#8217;re going to be doing a lot more seeking than if you&#8217;re just reading 100 1-MB files.</p>
<p>My gut feeling is that if you design for an average 1-MB file size, then that should provide for a much larger class of things than does a design that assumes a 64-MB average file size. Ideally, you would like to imagine a system that goes all the way down to much smaller file sizes, but 1 MB seems a reasonable compromise in our environment.</p>
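<p>The seek comparison Quinlan gives (10,000 files of 10 KB versus 100 files of 1 MB) can be put into rough numbers. This is a back-of-the-envelope sketch: the 10 ms seek time and 50 MB/s transfer rate are assumed values for a commodity disk, not figures from the interview.</p>

```python
SEEK_SECONDS = 0.010             # assumed average seek plus rotational delay
TRANSFER_BYTES_PER_SEC = 50e6    # assumed sequential disk throughput


def read_seconds(n_files: int, file_bytes: int) -> float:
    """One seek per file, followed by a sequential transfer of its bytes."""
    return n_files * (SEEK_SECONDS + file_bytes / TRANSFER_BYTES_PER_SEC)


small_files = read_seconds(10_000, 10_000)    # 10,000 files of 10 KB
large_files = read_seconds(100, 1_000_000)    # 100 files of 1 MB

print(f"10,000 x 10 KB: {small_files:.1f} s")
print(f"   100 x  1 MB: {large_files:.1f} s")
```

<p>Both cases read the same 100 MB, but with these assumed numbers the small-file case takes roughly 102 seconds against roughly 3 seconds, almost all of it spent seeking.</p>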
<p>MCKUSICK What have you been doing to design GFS to work with 1-MB files?</p>
<p>QUINLAN We haven&#8217;t been doing anything with the existing GFS design. Our distributed master system that will provide for 1-MB files is essentially a whole new design. That way, we can aim for something on the order of 100 million files per master. You can also have hundreds of masters.</p>
<p>MCKUSICK So, essentially no single master would have all this data on it?</p>
<p>QUINLAN That&#8217;s the idea.</p>
<p>With the recent emergence within Google of BigTable, a distributed storage system for managing structured data, one potential remedy for the file-count problem—albeit perhaps not the very best one—is now available.</p>
<p>The significance of BigTable goes far beyond file counts, however. Specifically, it was designed to scale into the petabyte range across hundreds or thousands of machines, as well as to make it easy to add more machines to the system and automatically start taking advantage of those resources without reconfiguration. For a company predicated on the notion of employing the collective power, potential redundancy, and economies of scale inherent in a massive deployment of commodity hardware, these rate as significant advantages indeed.</p>
<p>Accordingly, BigTable is now used in conjunction with a growing number of Google applications. Although it represents a departure of sorts from the past, it also must be said that BigTable was built on GFS, runs on GFS, and was consciously designed to remain consistent with most GFS principles. Consider it, therefore, as one of the major adaptations made along the way to help keep GFS viable in the face of rapid and widespread change.</p>
<p>MCKUSICK You now have this thing called BigTable. Do you view that as an application in its own right?</p>
<p>QUINLAN From the GFS point of view, it&#8217;s an application, but it&#8217;s clearly more of an infrastructure piece.</p>
<p>MCKUSICK If I understand this correctly, BigTable is essentially a lightweight relational database.</p>
<p>QUINLAN It&#8217;s not really a relational database. I mean, we&#8217;re not doing SQL and it doesn&#8217;t really support joins and such. But BigTable is a structured storage system that lets you have lots of key-value pairs and a schema.</p>
<p>MCKUSICK Who are the real clients of BigTable?</p>
<p>QUINLAN BigTable is increasingly being used within Google for crawling and indexing systems, and we use it a lot within many of our client-facing applications. The truth of the matter is that there are tons of BigTable clients. Basically, any app with lots of small data items tends to use BigTable. That&#8217;s especially true wherever there&#8217;s fairly structured data.</p>
<p>MCKUSICK I guess the question I&#8217;m really trying to pose here is: Did BigTable just get stuck into a lot of these applications as an attempt to deal with the small-file problem, basically by taking a whole bunch of small things and then aggregating them together?</p>
<p>QUINLAN That has certainly been one use case for BigTable, but it was actually intended for a much more general sort of problem. If you&#8217;re using BigTable in that way—that is, as a way of fighting the file-count problem where you might have otherwise used a file system to handle that—then you would not end up employing all of BigTable&#8217;s functionality by any means. BigTable isn&#8217;t really ideal for that purpose in that it requires resources for its own operations that are nontrivial. Also, it has a garbage-collection policy that&#8217;s not super-aggressive, so that might not be the most efficient way to use your space. I&#8217;d say that the people who have been using BigTable purely to deal with the file-count problem probably haven&#8217;t been terribly happy, but there&#8217;s no question that it is one way for people to handle that problem.</p>
<p>MCKUSICK What I&#8217;ve read about GFS seems to suggest that the idea was to have only two basic data structures: logs and SSTables (Sorted String Tables). Since I&#8217;m guessing the SSTables must be used to handle key-value pairs and that sort of thing, how is that different from BigTable?</p>
<p>QUINLAN The main difference is that SSTables are immutable, while BigTable provides mutable key-value storage, and a whole lot more. BigTable itself is actually built on top of logs and SSTables. Initially, it stores incoming data into transaction log files. Then it gets compacted, as we call it, into a series of SSTables, which in turn get compacted together over time. In some respects, it&#8217;s reminiscent of a log-structured file system. Anyway, as you&#8217;ve observed, logs and SSTables do seem to be the two data structures underlying the way we structure most of our data. We have log files for mutable stuff as it&#8217;s being recorded. Then, once you have enough of that, you sort it and put it into this structure that has an index.</p>
<p>Even though GFS does not provide a POSIX interface, it still has a pretty generic file-system interface, so people are essentially free to write any sort of data they like. It&#8217;s just that, over time, the majority of our users have ended up using these two data structures. We also have something called protocol buffers, which is our data description language. The majority of data ends up being protocol buffers in these two structures.</p>
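<p>The log-then-SSTable pattern described here can be sketched in a few lines. This is an in-memory toy under stated assumptions, not BigTable&#8217;s actual design: the class <code>TinyTable</code> and its method names are hypothetical, and real SSTables live on disk with a persistent index.</p>

```python
import bisect


class TinyTable:
    """Toy mutable store built from an append-only log plus immutable sorted tables."""

    def __init__(self):
        self.log = []        # mutable: (key, value) records in arrival order
        self.sstables = []   # immutable sorted tuples of (key, value), newest last

    def put(self, key, value):
        self.log.append((key, value))

    def flush(self):
        """Sort the current log into a new immutable table and start a fresh log."""
        if not self.log:
            return
        latest = dict(self.log)  # later records overwrite earlier ones
        self.sstables.append(tuple(sorted(latest.items())))
        self.log = []

    def compact(self):
        """Merge all tables into one, letting newer tables win."""
        merged = {}
        for table in self.sstables:  # oldest first, so newer values overwrite
            merged.update(table)
        self.sstables = [tuple(sorted(merged.items()))] if merged else []

    def get(self, key):
        for k, v in reversed(self.log):        # newest log record first
            if k == key:
                return v
        for table in reversed(self.sstables):  # then newest table first
            keys = [k for k, _ in table]       # stands in for a stored index
            i = bisect.bisect_left(keys, key)
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None


table = TinyTable()
table.put("row1", "v1")
table.put("row2", "v1")
table.flush()
table.put("row1", "v2")
table.flush()
table.compact()
print(table.get("row1"), table.sstables)
```

<p>The tables themselves never change after they are written; mutability comes only from appending to the log and from replacing old tables with merged ones.</p>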
<p>Both provide for compression and checksums. Even though there are some people internally who end up reinventing these things, most people are content just to use those two basic building blocks.</p>
<p>Because GFS was designed initially to enable a crawling and indexing system, throughput was everything. In fact, the original paper written about the system makes this quite explicit: &#8220;High sustained bandwidth is more important than low latency. Most of our target applications place a premium on processing data in bulk at a high rate, while few have stringent response-time requirements for an individual read and write.&#8221;</p>
<p>But then Google either developed or embraced many user-facing Internet services for which this is most definitely not the case.</p>
<p>One GFS shortcoming that this immediately exposed had to do with the original single-master design. A single point of failure may not have been a disaster for batch-oriented applications, but it was certainly unacceptable for latency-sensitive applications, such as video serving. The later addition of automated failover capabilities helped, but even then service could be out for up to a minute.</p>
<p>The other major challenge for GFS, of course, has revolved around finding ways to build latency-sensitive applications on top of a file system designed around an entirely different set of priorities.  </p>
<p>MCKUSICK It&#8217;s well documented that the initial emphasis in designing GFS was on batch efficiency as opposed to low latency. Now that has come back to cause you trouble, particularly in terms of handling things such as videos. How are you handling that?</p>
<p>QUINLAN The GFS design model from the get-go was all about achieving throughput, not about the latency at which that might be achieved. To give you a concrete example, if you&#8217;re writing a file, it will typically be written in triplicate—meaning you&#8217;ll actually be writing to three chunkservers. Should one of those chunkservers die or hiccup for a long period of time, the GFS master will notice the problem and schedule what we call a pullchunk, which means it will basically replicate one of those chunks. That will get you back up to three copies, and then the system will pass control back to the client, which will continue writing.</p>
<p>When we do a pullchunk we limit it to something on the order of 5-10 MB a second. So, for 64 MB, you&#8217;re talking about 10 seconds for this recovery to take place. There are lots of other things like this that might take 10 seconds to a minute, which works just fine for batch-type operations. If you&#8217;re doing a large MapReduce operation, you&#8217;re OK just so long as one of the items is not a real straggler, in which case you&#8217;ve got yourself a different sort of problem. Still, generally speaking, a hiccup on the order of a minute over the course of an hour-long batch job doesn&#8217;t really show up. If you are working on Gmail, however, and you&#8217;re trying to write a mutation that represents some user action, then getting stuck for a minute is really going to mess you up.</p>
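<p>The recovery time quoted above follows directly from the chunk size and the throttle. A minimal check, using only the 64 MB and 5-10 MB per second figures from the text (the function name is hypothetical):</p>

```python
CHUNK_MB = 64  # standard chunk size, from the text


def pullchunk_seconds(chunk_mb: float, rate_mb_per_s: float) -> float:
    """Time to re-replicate one chunk at a throttled copy rate."""
    return chunk_mb / rate_mb_per_s


for rate in (5, 10):
    print(f"{rate} MB/s -> {pullchunk_seconds(CHUNK_MB, rate):.1f} s")

# A one-minute stall as a fraction of an hour-long batch job.
stall_fraction = 60 / 3600
print(f"stall share of an hour-long job: {stall_fraction:.1%}")
```

<p>That gives roughly 6 to 13 seconds per chunk, and a one-minute stall is under 2 percent of an hour-long job, which is why batch workloads barely notice what an interactive write cannot tolerate.</p>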
<p>We&#8217;ve had similar issues with our master failover. Initially, GFS had no provision for automatic master failover. It was a manual process. Although it didn&#8217;t happen a lot, whenever it did, the cell might be down for an hour. Even our initial master-failover implementation required on the order of minutes. Over the past year, however, we&#8217;ve taken that down to something on the order of tens of seconds.</p>
<p>MCKUSICK Still, for user-facing applications, that&#8217;s not acceptable.</p>
<p>QUINLAN Right. While these instances—where you have to provide for failover and error recovery—may have been acceptable in the batch situation, they&#8217;re definitely not OK from a latency point of view for a user-facing application. Another issue here is that there are places in the design where we&#8217;ve tried to optimize for throughput by dumping thousands of operations into a queue and then just processing through them. That leads to fine throughput, but it&#8217;s not great for latency. You can easily get into situations where you might be stuck for seconds at a time in a queue just waiting to get to the head of the queue.</p>
<p>Our user base has definitely migrated from being a MapReduce-based world to more of an interactive world that relies on things such as BigTable. Gmail is an obvious example of that. Videos aren&#8217;t quite as bad where GFS is concerned because you get to stream data, meaning you can buffer. Still, trying to build an interactive database on top of a file system that was designed from the start to support more batch-oriented operations has certainly proved to be a pain point.</p>
<p>MCKUSICK How exactly have you managed to deal with that?</p>
<p>QUINLAN Within GFS, we&#8217;ve managed to improve things to a certain degree, mostly by designing the applications to deal with the problems that come up. Take BigTable as a good concrete example. The BigTable transaction log is actually the biggest bottleneck for getting a transaction logged. In effect, we decided, &#8220;Well, we&#8217;re going to see hiccups in these writes, so what we&#8217;ll do is to have two logs open at any one time. Then we&#8217;ll just basically merge the two. We&#8217;ll write to one and if that gets stuck, we&#8217;ll write to the other. We&#8217;ll merge those logs once we do a replay—if we need to do a replay, that is.&#8221; We tended to design our applications to function like that—which is to say they basically try to hide that latency since they know the system underneath isn&#8217;t really all that great.</p>
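<p>The two-log trick can be sketched as follows. This is a simplified illustration of the idea as described in the interview, not BigTable&#8217;s code: the class names, the <code>stuck</code> flag standing in for a slow chunkserver, and the use of sequence numbers to merge on replay are all assumptions.</p>

```python
class StuckError(Exception):
    """Raised when a log write does not complete in time (hypothetical)."""


class Log:
    def __init__(self, name):
        self.name = name
        self.records = []
        self.stuck = False  # test hook standing in for a hiccuping chunkserver

    def append(self, seq, payload):
        if self.stuck:
            raise StuckError(self.name)
        self.records.append((seq, payload))


class DualLogWriter:
    """Keep two logs open; switch to the other one whenever a write gets stuck."""

    def __init__(self):
        self.logs = [Log("log-a"), Log("log-b")]
        self.active = 0
        self.next_seq = 0

    def write(self, payload):
        seq = self.next_seq
        try:
            self.logs[self.active].append(seq, payload)
        except StuckError:
            self.active = 1 - self.active  # fail over to the other log
            self.logs[self.active].append(seq, payload)
        self.next_seq += 1
        return seq

    def replay(self):
        """Merge both logs by sequence number, keeping each record once."""
        merged = {}
        for log in self.logs:
            for seq, payload in log.records:
                merged.setdefault(seq, payload)
        return [merged[s] for s in sorted(merged)]


writer = DualLogWriter()
writer.write("mutation-0")
writer.logs[0].stuck = True   # simulate a hiccup on the active log
writer.write("mutation-1")    # lands in the second log instead
print(writer.replay())
```

<p>The writer never waits out the hiccup; the cost is paid only on the rare replay, when the two logs have to be merged back into one ordered stream.</p>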
<p>The guys who built Gmail went to a multihomed model, so if one instance of your Gmail account got stuck, you would basically just get moved to another data center. Actually, that capability was needed anyway just to ensure availability. Still, part of the motivation was that they wanted to hide the GFS problems.</p>
<p>MCKUSICK I think it&#8217;s fair to say that, by moving to a distributed-master file system, you&#8217;re definitely going to be able to attack some of those latency issues.</p>
<p>QUINLAN That was certainly one of our design goals. Also, BigTable itself is a very failure-aware system that tries to respond to failures far more rapidly than we were able to before. Using that as our metadata storage helps with some of those latency issues as well.</p>
<p>The engineers who worked on the earliest versions of GFS weren&#8217;t particularly shy about departing from traditional choices in file-system design whenever they felt the need to do so. It just so happens that the approach taken to consistency is one of the aspects of the system where this is particularly evident.</p>
<p>Part of this, of course, was driven by necessity. Since Google&#8217;s plans rested largely on massive deployments of commodity hardware, failures and hardware-related faults were a given. Beyond that, according to the original GFS paper, there were a few compatibility issues. &#8220;Many of our disks claimed to the Linux driver that they supported a range of IDE protocol versions but in fact responded reliably only to the more recent ones. Since the protocol versions are very similar, these drives mostly worked but occasionally the mismatches would cause the drive and the kernel to disagree about the drive&#8217;s state. This would corrupt data silently due to problems in the kernel. This problem motivated our use of checksums to detect data corruption.&#8221;</p>
<p>That didn&#8217;t mean just any checksumming, however, but instead rigorous end-to-end checksumming, with an eye to everything from disk corruption to TCP/IP corruption to machine backplane corruption.</p>
<p>Interestingly, for all that checksumming vigilance, the GFS engineering team also opted for an approach to consistency that&#8217;s relatively loose by file-system standards. Basically, GFS simply accepts that there will be times when people will end up reading slightly stale data. Since GFS is used mostly as an append-only system as opposed to an overwriting system, this generally means those people might end up missing something that was appended to the end of the file after they&#8217;d already opened it. To the GFS designers, this seemed an acceptable cost (although it turns out that there are applications for which this proves problematic).</p>
<p>Also, as Gobioff explained, &#8220;The risk of stale data in certain circumstances is just inherent to a highly distributed architecture that doesn&#8217;t ask the master to maintain all that much information. We definitely could have made things a lot tighter if we were willing to dump a lot more data into the master and then have it maintain more state. But that just really wasn&#8217;t all that critical to us.&#8221;</p>
<p>Perhaps an even more important issue here is that the engineers making this decision owned not just the file system but also the applications intended to run on the file system. According to Gobioff, &#8220;The thing is that we controlled both the horizontal and the vertical—the file system and the application. So we could be sure our applications would know what to expect from the file system. And we just decided to push some of the complexity out to the applications to let them deal with it.&#8221;</p>
<p>Still, there are some at Google who wonder whether that was the right call if only because people can sometimes obtain different data in the course of reading a given file multiple times, which tends to be so strongly at odds with their whole notion of how data storage is supposed to work.</p>
<p>MCKUSICK Let&#8217;s talk about consistency. The issue seems to be that it presumably takes some amount of time to get everything fully written to all the replicas. I think you said something earlier to the effect that GFS essentially requires that this all be fully written before you can continue.</p>
<p>QUINLAN That&#8217;s correct.</p>
<p>MCKUSICK If that&#8217;s the case, then how can you possibly end up with things that aren&#8217;t consistent?</p>
<p>QUINLAN Client failures have a way of fouling things up. Basically, the model in GFS is that the client just continues to push the write until it succeeds. If the client ends up crashing in the middle of an operation, things are left in a bit of an indeterminate state.</p>
<p>Early on, that was sort of considered to be OK, but over time, we tightened the window for how long that inconsistency could be tolerated, and then we slowly continued to reduce that. Otherwise, whenever the data is in that inconsistent state, you may get different lengths for the file. That can lead to some confusion. We had to have some backdoor interfaces for checking the consistency of the file data in those instances. We also have something called RecordAppend, which is an interface designed for multiple writers to append to a log concurrently. There the consistency was designed to be very loose. In retrospect, that turned out to be a lot more painful than anyone expected.</p>
<p>MCKUSICK What exactly was loose? If the primary replica picks what the offset is for each write and then makes sure that actually occurs, I don&#8217;t see where the inconsistencies are going to come up.</p>
<p>QUINLAN What happens is that the primary will try. It will pick an offset, it will do the writes, but then one of them won&#8217;t actually get written. Then the primary might change, at which point it can pick a different offset. RecordAppend does not offer any replay protection either. You could end up getting the data multiple times in the file.</p>
<p>There were even situations where you could get the data in a different order. It might appear multiple times in one chunk replica, but not necessarily in all of them. If you were reading the file, you could discover the data in different ways at different times. At the record level, you could discover the records in different orders depending on which chunks you happened to be reading.</p>
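<p>One common work-around for these semantics is for readers to tolerate duplicates and reordering themselves. The sketch below assumes each record carries a writer-assigned unique ID and that padding shows up as <code>None</code>; both conventions, and the function name, are illustrative assumptions rather than the real RecordAppend format.</p>

```python
def read_records(replica):
    """Yield each record once, in first-seen order, skipping padding and duplicates."""
    seen = set()
    for rec in replica:
        if rec is None:       # padding left behind by a failed append attempt
            continue
        rec_id, payload = rec
        if rec_id in seen:    # the same record was appended again on retry
            continue
        seen.add(rec_id)
        yield rec_id, payload


# Two replicas of the same chunk: same records, different duplicates and order.
replica_1 = [("w1-0", "a"), ("w2-0", "b"), ("w2-0", "b"), ("w1-1", "c")]
replica_2 = [("w2-0", "b"), None, ("w1-0", "a"), ("w1-1", "c")]

print(list(read_records(replica_1)))
print(list(read_records(replica_2)))
print(sorted(read_records(replica_1)) == sorted(read_records(replica_2)))
```

<p>After deduplication the two replicas agree on the set of records but still not on their order, so any application that needs ordering has to impose it itself, for example by sorting on the record ID.</p>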
<p>MCKUSICK Was this done by design?</p>
<p>QUINLAN At the time, it must have seemed like a good idea, but in retrospect I think the consensus is that it proved to be more painful than it was worth. It just doesn&#8217;t meet the expectations people have of a file system, so they end up getting surprised. Then they had to figure out work-arounds.</p>
<p>MCKUSICK In retrospect, how would you handle this differently?</p>
<p>QUINLAN I think it makes more sense to have a single writer per file.</p>
<p>MCKUSICK All right, but what happens when you have multiple people wanting to append to a log?</p>
<p>QUINLAN You serialize the writes through a single process that can ensure the replicas are consistent.</p>
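<p>Serializing appends through a single process might look like the sketch below: many producers enqueue records, and one writer thread assigns offsets and applies each record to every replica in the same order. The class name and the in-memory lists standing in for replicas are assumptions made for illustration.</p>

```python
import queue
import threading


class LogSerializer:
    """Single writer: producers enqueue records, one thread assigns offsets."""

    def __init__(self, n_replicas=3):
        self.replicas = [[] for _ in range(n_replicas)]
        self._inbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def append(self, record):
        """Called by any number of producers; record must not be None."""
        self._inbox.put(record)

    def close(self):
        self._inbox.put(None)  # sentinel: stop after draining earlier records
        self._thread.join()

    def _run(self):
        while True:
            record = self._inbox.get()
            if record is None:
                return
            offset = len(self.replicas[0])  # only this thread picks offsets
            for replica in self.replicas:
                replica.append((offset, record))


log = LogSerializer()
for name in ("frontend-1", "frontend-2"):
    log.append(f"{name}: request served")
log.close()
print(log.replicas[0])
```

<p>Because only one thread ever chooses an offset, every replica sees the same records at the same positions, at the cost of funnelling all appends through one place.</p>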
<p>MCKUSICK There&#8217;s also this business where you essentially snapshot a chunk. Presumably, that&#8217;s something you use when you&#8217;re essentially replacing a replica, or whenever some chunkserver goes down and you need to replace some of its files.</p>
<p>QUINLAN Actually, two things are going on there. One, as you suggest, is the recovery mechanism, which definitely involves copying around replicas of the file. The way that works in GFS is that we basically revoke the lock so that the client can&#8217;t write it anymore, and this is part of that latency issue we were talking about.</p>
<p>There&#8217;s also a separate issue, which is to support the snapshot feature of GFS. GFS has the most general-purpose snapshot capability you can imagine. You could snapshot any directory somewhere, and then both copies would be entirely equivalent. They would share the unchanged data. You could change either one and you could further snapshot either one. So it was really more of a clone than what most people think of as a snapshot. It&#8217;s an interesting thing, but it makes for difficulties—especially as you try to build more distributed systems and you want potentially to snapshot larger chunks of the file tree.</p>
<p>I also think it&#8217;s interesting that the snapshot feature hasn&#8217;t been used more since it&#8217;s actually a very powerful feature. That is, from a file-system point of view, it really offers a pretty nice piece of functionality. But putting snapshots into file systems, as I&#8217;m sure you know, is a real pain.</p>
<p>MCKUSICK I know. I&#8217;ve done it. It&#8217;s excruciating, especially in an overwriting file system.</p>
<p>QUINLAN Exactly. This is a case where we didn&#8217;t cheat, but from an implementation perspective, it&#8217;s hard to create true snapshots. Still, it seems that in this case, going the full deal was the right decision. Just the same, it&#8217;s an interesting contrast to some of the other decisions that were made early on in terms of the semantics.</p>
<p>All in all, the report card on GFS nearly 10 years later seems positive. There have been problems and shortcomings, to be sure, but there&#8217;s surely no arguing with Google&#8217;s success and GFS has without a doubt played an important role in that. What&#8217;s more, its staying power has been nothing short of remarkable given that Google&#8217;s operations have scaled orders of magnitude beyond anything the system had been designed to handle, while the application mix Google currently supports is not one that anyone could have possibly imagined back in the late &#8217;90s.</p>
<p>Still, there&#8217;s no question that GFS faces many challenges now. For one thing, the awkwardness of supporting an ever-growing fleet of user-facing, latency-sensitive applications on top of a system initially designed for batch-system throughput is something that&#8217;s obvious to all.</p>
<p>The advent of BigTable has helped somewhat in this regard. As it turns out, however, BigTable isn&#8217;t actually all that great a fit for GFS. In fact, it just makes the bottleneck limitations of the system&#8217;s single-master design more apparent than would otherwise be the case.</p>
<p>For these and other reasons, engineers at Google have been working for much of the past two years on a new distributed master system designed to take full advantage of BigTable to attack some of those problems that have proved particularly difficult for GFS.</p>
<p>Accordingly, it now seems that beyond all the adjustments made to ensure the continued survival of GFS, the newest branch on the evolutionary tree will continue to grow in significance over the years to come.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.oasisfeng.com/2009/08/16/gfs-evolution-on-fast-forward/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Exploring Google&#8217;s Iterative Web App Software Architecture</title>
		<link>http://blog.oasisfeng.com/2009/08/04/explore-the-google-iterative-web-app/</link>
		<comments>http://blog.oasisfeng.com/2009/08/04/explore-the-google-iterative-web-app/#comments</comments>
		<pubDate>Tue, 04 Aug 2009 15:00:42 +0000</pubDate>
		<dc:creator>oasisfeng</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Internet]]></category>
		<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Gmail]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Iterative]]></category>
		<category><![CDATA[Labs]]></category>

		<guid isPermaLink="false">http://blog.oasisfeng.com/?p=780</guid>
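		<description><![CDATA[Google&#8217;s software architecture has always drawn developers&#8217; attention and been a favorite topic of discussion, especially the architecture behind Google&#8217;s most successful services. Gmail, which Google launched on April Fools&#8217; Day in 2004, is arguably one of the company&#8217;s most outstanding services apart from search. Over the past five-plus years, Gmail has evolved continuously. New features and user-experience improvements get most of the attention, but changes to the underlying architecture are seldom something users notice directly. Yet it is precisely these ongoing architectural upgrades that have allowed so many new features to be developed and shipped more quickly. Back in October 2007, the official Gmail Blog published a post about this architectural change, &#8220;Code changes to prepare Gmail for the future,&#8221; which said: So recently the Gmail team has been working on a structural code change that we&#8217;ll be rolling out to Firefox 2 and IE 7 users over the coming weeks (with other browsers to follow). You won&#8217;t notice too many differences to start with, but we&#8217;re [...]]]></description>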
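			<content:encoded><![CDATA[<p>Google&#8217;s software architecture has always drawn developers&#8217; attention and been a favorite topic of discussion, especially the architecture behind Google&#8217;s most successful services.</p>
<p>Gmail, which Google launched on April Fools&#8217; Day in 2004, is arguably one of the company&#8217;s most outstanding services apart from search. Over the past five-plus years, Gmail has evolved continuously. New features and user-experience improvements get most of the attention, but changes to the underlying architecture are seldom something users notice directly. Yet it is precisely these ongoing architectural upgrades that have allowed so many new features to be developed and shipped more quickly.</p>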
<p>Back in October 2007, the official Gmail Blog published a post about this architectural change, &#8220;<a href="http://gmailblog.blogspot.com/2007/10/code-changes-to-prepare-gmail-for.html">Code changes to prepare Gmail for the future</a>,&#8221; which said:</p>
<span id="more-780"></span>
<blockquote><p>So recently the Gmail team has been working on a structural code change that we&#8217;ll be rolling out to Firefox 2 and IE 7 users over the coming weeks (with other browsers to follow). You won&#8217;t notice too many differences to start with, but <strong>we&#8217;re using a new model that enables us to <em>iterate</em> faster and share components&#8230;</strong></p></blockquote>
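<p>This was the first public mention of &#8220;iterate,&#8221; indicating that under the new architecture the Gmail team had begun maintaining and incrementally extending Gmail using <a href="http://en.wikipedia.org/wiki/Iterative_and_incremental_development">iterative, agile development</a>.</p>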
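<p>Then, in June 2008, Gmail officially launched a refreshing new feature or, more precisely, a gateway to a whole series of new features: <a href="http://gmailblog.blogspot.com/2008/06/introducing-gmail-labs.html">&#8220;Gmail Labs&#8221;</a>. Here you can selectively enable the new features you like and turn off the ones that are not much use to you or not much fun. This suggests that by then Gmail&#8217;s underlying architecture had matured into a modular design with a tightly integrated front end and back end. &#8220;Gmail Labs&#8221; can be seen as a &#8220;plug-in platform&#8221; built on that modular architecture: new features are developed as plug-ins, and users decide which combination they want.</p>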
<p>Later, in March 2009, <a href="http://gmailblog.blogspot.com/2009/03/gmail-labs-goes-global.html">a post on the official Gmail Blog</a> revealed more of the inner workings of &#8220;Gmail Labs&#8221;:</p>
<blockquote><p><strong>Every time a Gmail user signs in we create a custom version of JavaScript for them based on the Labs features they have enabled.</strong> Since we have 43 Labs right now, there are 2<sup>43</sup> (~8 trillion) possible versions of the Gmail JavaScript that a user could get. If you account for the 49 languages where Labs are now available, it gets even bigger &#8212; 49 x 2<sup>43</sup> (~430 trillion) versions. It would obviously be a challenge to actually test all of these versions. But we put a lot of effort into building an architecture that supports this type of modularity, and fortunately, it seems to be working pretty well so far. So we figured, why not, what&#8217;s another 422 trillion permutations?</p></blockquote>
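<p>From this we can pick up some clues. The Gmail Labs plugin platform is mainly responsible for integrating the changes that each Labs plugin makes to the system. That includes dynamically generating and combining the JavaScript that affects front-end rendering and (possibly) inserting each plugin's special handling logic at various stages of the basic processing flow, much like a Filter pattern. Given GWT, the open-source front-end framework Google has released, my guess is that Gmail's front-end rendering also uses a GWT-like mechanism of a “container” plus “pushed widgets”, so that developing a small feature does not require separate front-end and back-end designs. After all, most Labs features affect the front-end interface only within a very limited scope, mostly in simple forms such as “adjusting” or “embedding”.</p>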
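<p>To make the quoted figures and the idea of a per-user script concrete, here is a minimal Python sketch. The module names, the <code>compose_bundle</code> and <code>variant_count</code> helpers, and the choice to simply concatenate snippets are all assumptions made for illustration; nothing here reflects how Gmail is actually implemented.</p>

```python
# Hypothetical sketch: assembling a per-user script from enabled Labs features.
# Every name and snippet below is invented for illustration.

CORE = "/* core mail UI */"

# Each (hypothetical) Labs feature contributes one snippet.
LABS_MODULES = {
    "undo_send": "/* undo send */",
    "offline": "/* offline mode */",
    "tasks": "/* tasks panel */",
}


def compose_bundle(enabled):
    """Return the core script followed by the snippets of enabled features.

    Features are emitted in sorted order so that the same set of enabled
    features always yields the same bundle.
    """
    unknown = set(enabled) - set(LABS_MODULES)
    if unknown:
        raise KeyError(f"unknown Labs features: {sorted(unknown)}")
    parts = [CORE] + [LABS_MODULES[name] for name in sorted(enabled)]
    return "\n".join(parts)


def variant_count(num_features, num_languages=1):
    """Each feature is independently on or off, giving 2**n combinations."""
    return num_languages * 2 ** num_features


if __name__ == "__main__":
    print(compose_bundle({"tasks", "undo_send"}))
    print(variant_count(43))      # 8796093022208, roughly 8.8 trillion
    print(variant_count(43, 49))  # 431008558088192, roughly 431 trillion
```

<p>The difference between the two counts is about 422 trillion, which matches the closing figure in the quote.</p>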
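<p>Recently, Google Mobile also seems to be riding the crest of an underlying-architecture upgrade. <a href="http://www.google.com/search?q=site%3Agooglemobile.blogspot.com+iterative+web+app">Several recent posts on their official Blog</a> prominently mention the term “Iterative Webapp”, and every post in the series opens with the same passage:</p>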
<blockquote><p>On April 7th, we announced a new version of Gmail for mobile for iPhone and Android-powered devices. Among the improvements was a complete redesign of the web application&#8217;s underlying code which allows us to more rapidly develop and release new features that users have been asking for, as explained in our first post. We&#8217;d like to introduce <strong>The Iterative Webapp</strong>, a series where we will continue to release features for Gmail for mobile.</p></blockquote>
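<p>It appears that Gmail for Mobile has drawn on the experience of desktop Gmail and adopted an underlying architecture that supports iterative development. Since then, new features for Gmail for Mobile have indeed been released noticeably faster.</p>
<p>Putting these clues together, it is not hard to conclude that what Google calls the Iterative Web App is a software architecture with a strong affinity for agile, iterative development. On top of the traditional layered, distributed, service-oriented and easily governed web architecture, Google has further built project-management concerns into the software architecture itself. The result is a new model in which architectural strengths safeguard and drive agile development, avoiding the “castle in the air” problem that iterative-development theory can run into in real projects.</p>
<p>Finally, to summarize a few typical characteristics of the “Iterative Web App” architecture:</p>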
<ul>
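<li>A highly modular and layered system design that minimizes repeated work in incremental or plugin-based development, so that the coding phase of each iteration is genuinely “agile”.</li>
<li>A highly service-oriented system layout that lets core data and functionality evolve stably, independent of the many features built on top. New features keep shipping and existing ones keep improving while the core stays stable and reliable.</li>
<li>A flexible, controllable feature container (platform) that supports diverse feature development, fully isolates features from one another, and makes parallel agile development possible.</li>
<li>A testing framework that combines module-level testing with integration testing, ensuring both the completeness of each iteratively developed module and the overall availability of the system.</li>
<li>An automated deployment system that lets new features and enhancements reach production quickly and under control (in stages, with customization, and so on).</li>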
</ul>
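<p>The “feature container” and staged deployment items in the list above can be sketched roughly as follows. This is only an illustration under my own assumptions: the <code>FeatureContainer</code> class, its method names and the hash-based percentage rollout are hypothetical and are not taken from any Google system.</p>

```python
import hashlib


class FeatureContainer:
    """Hypothetical feature container: registration, isolation, staged rollout."""

    def __init__(self):
        self._features = {}  # name -> (handler, rollout_percent)

    def register(self, name, handler, rollout_percent=100):
        if not 0 <= rollout_percent <= 100:
            raise ValueError("rollout_percent must be between 0 and 100")
        self._features[name] = (handler, rollout_percent)

    @staticmethod
    def _bucket(name, user_id):
        """Map a (feature, user) pair to a stable bucket in the range 0..99."""
        digest = hashlib.sha256(f"{name}:{user_id}".encode("utf-8")).hexdigest()
        return int(digest, 16) % 100

    def is_enabled(self, name, user_id):
        _, percent = self._features[name]
        return self._bucket(name, user_id) < percent

    def run(self, user_id, message):
        """Apply every enabled feature in name order.

        A feature that raises is skipped and recorded, so one broken
        feature cannot take down the others or the core flow.
        """
        errors = []
        for name in sorted(self._features):
            handler, _ = self._features[name]
            if not self.is_enabled(name, user_id):
                continue
            try:
                message = handler(message)
            except Exception as exc:
                errors.append((name, repr(exc)))
        return message, errors


if __name__ == "__main__":
    container = FeatureContainer()
    container.register("shout", str.upper)
    container.register("broken", lambda text: 1 / 0)
    container.register("dark_launch", lambda text: text + "!", rollout_percent=0)
    print(container.run("alice", "hello"))
```

<p>In this sketch a feature registered at 0 percent is deployed but never runs, and raising the percentage exposes it to a stable, growing share of users.</p>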
]]></content:encoded>
			<wfw:commentRss>http://blog.oasisfeng.com/2009/08/04/explore-the-google-iterative-web-app/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Google AdSense Begins Targeting Ads by User Characteristics</title>
		<link>http://blog.oasisfeng.com/2009/03/13/google-adsense-delivering-ads-by-character-of-visitor/</link>
		<comments>http://blog.oasisfeng.com/2009/03/13/google-adsense-delivering-ads-by-character-of-visitor/#comments</comments>
		<pubDate>Fri, 13 Mar 2009 05:36:43 +0000</pubDate>
		<dc:creator>oasisfeng</dc:creator>
				<category><![CDATA[Internet]]></category>
		<category><![CDATA[Adsense]]></category>
		<category><![CDATA[FriendConnect]]></category>
		<category><![CDATA[Google]]></category>

		<guid isPermaLink="false">http://blog.oasisfeng.com/?p=670</guid>
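		<description><![CDATA[Today I received an email from Google AdSense announcing a major upgrade to the AdSense network: “interest-based advertising”. Until now, Google has decided how to target ads by crawling page content, that is, “ads chosen by content”. The new feature AdSense is about to launch takes targeting a step further, to “ads chosen by visitor”. This indirectly confirms a worry I have long had: Google has for years been collecting user characteristics through its network of services, from registered and unregistered users alike. With cookies and cross-site JavaScript, Google can link all of its services together and track in depth the habits and interests of users across them. The recently launched Google FriendConnect in particular extends its reach beyond Google's own services, into personal blogs and SNS sites. (So on this point, I am still somewhat wary of FriendConnect...) It looks as though Google now holds enough user characteristics to formally launch this user-targeted advertising on the AdSense network. Whether that is a blessing or a curse for us Internet users is hard to say... As the saying goes, don't put all your eggs in one basket, and the same applies to your privacy.]]></description>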
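			<content:encoded><![CDATA[<p>Today I received an email from Google AdSense announcing a major upgrade to the AdSense network: “interest-based advertising”. Until now, Google has decided how to target ads by crawling page content, that is, <strong>“ads chosen by content”</strong>. The new feature AdSense is about to launch takes targeting a step further, to <strong>“ads chosen by visitor”</strong>. This indirectly confirms a worry I have long had: Google has for years been collecting user characteristics through its network of services, from registered and unregistered users alike. With cookies and cross-site JavaScript, Google can link all of its services together and track in depth the habits and interests of users across them. The recently launched Google FriendConnect in particular extends its reach beyond Google's own services, into personal blogs and SNS sites. (So on this point, <a href="http://blog.oasisfeng.com/2008/12/17/friendconnect-a-divine-inspiration-by-google/">I am still somewhat wary of FriendConnect...</a>)</p>
<p>It looks as though Google now holds enough user characteristics to formally launch this user-targeted advertising on the AdSense network. Whether that is a blessing or a curse for us Internet users is hard to say... As the saying goes, don't put all your eggs in one basket, and the same applies to your privacy.</p>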
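<p>The cookie-based linking described above can be modeled with a toy Python simulation. It is a deliberately simplified sketch: the <code>ThirdPartyWidget</code> class, the <code>widget_id</code> cookie name and the example site names are all hypothetical, and it does not describe how AdSense or FriendConnect actually work.</p>

```python
import itertools


class ThirdPartyWidget:
    """Hypothetical widget host embedded on many unrelated sites.

    It stores one identifier in the browser's cookie jar for its own
    domain and records every site on which it gets loaded.
    """

    def __init__(self):
        self._ids = itertools.count(1)
        self.visits = {}  # cookie id -> list of embedding sites

    def serve(self, browser_cookies, embedding_site):
        """Handle one widget load from a page on embedding_site."""
        cookie = browser_cookies.get("widget_id")
        if cookie is None:
            # First contact with this browser: issue a new identifier.
            cookie = f"u{next(self._ids)}"
            browser_cookies["widget_id"] = cookie
        self.visits.setdefault(cookie, []).append(embedding_site)
        return cookie


if __name__ == "__main__":
    widget = ThirdPartyWidget()
    browser = {}  # cookie jar this browser keeps for the widget's domain
    for site in ["blog.example", "news.example", "shop.example"]:
        widget.serve(browser, site)
    # One identifier now ties together visits to three unrelated sites.
    print(widget.visits)
```

<p>Because the browser sends the same cookie back wherever the widget is embedded, the widget host can assemble a cross-site profile without the visitor ever logging in.</p>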
]]></content:encoded>
			<wfw:commentRss>http://blog.oasisfeng.com/2009/03/13/google-adsense-delivering-ads-by-character-of-visitor/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>FriendConnect: Google's Grand Vision</title>
		<link>http://blog.oasisfeng.com/2008/12/17/friendconnect-a-divine-inspiration-by-google/</link>
		<comments>http://blog.oasisfeng.com/2008/12/17/friendconnect-a-divine-inspiration-by-google/#comments</comments>
		<pubDate>Tue, 16 Dec 2008 17:58:05 +0000</pubDate>
		<dc:creator>oasisfeng</dc:creator>
				<category><![CDATA[Internet]]></category>
		<category><![CDATA[Thinking]]></category>
		<category><![CDATA[FriendConnect]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[SNS]]></category>
		<category><![CDATA[Web2.0]]></category>

		<guid isPermaLink="false">http://blog.oasisfeng.com/?p=555</guid>
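		<description><![CDATA[Google is a technology company and, like Microsoft, is not good at studying “mass behavior”, which is why it has not done well in either IM or SNS. It is not that Google fails to understand user experience; rather, typical technology companies share a common failing: they consider “certain things” beneath them, and those are exactly the things the online “masses” love. So it is no wonder that Tencent has made a fortune from IM, while the SNS space has been carved up by a bunch of young upstarts. In the past Google often cast itself as the technology pathfinder, pushing concepts that were too far ahead of their time, without caring whether others could understand them and while ignoring the unwritten rules of business. Unfortunately, nobody in the Internet age respects a godfather, and too many newcomers are eager to show what they can do. As a result OpenSocial met its Waterloo: hardly any SNS site of real size bought in. That is understandable. In this Warring States era of SNS, when every warlord is busy luring the others' users inside its own walls, you step forward to preach “universal love and non-aggression”... Now Google has finally learned the rules of the game. Technology is of course the foundation that must never be abandoned and the weapon for conquering the world, but a gleaming spearhead needs to be wrapped in cloth: strike where the opponent is unprepared, yet keep the edge hidden. Against this backdrop, FriendConnect made its low-key debut. From a technical standpoint, FriendConnect represents an unstoppable trend in online social relationships. Over the past few years, Web 2.0 freed the individual “person” from closed websites and forums in revolutionary fashion, giving rise to a wave of independent-minded “blogs” that sprang up like mushrooms after rain. Unfortunately the revolution went a little too far, and more and more personal blogs became lost on islands of their own. At that point the aristocrats behind the old forum powers trimmed their sails to the wind, put on a Web 2.0 coat and fought their way back under the new identity of “SNS”, waving the banners of “interaction” and “openness” while trying to pen the newly freed individuals inside the high walls they had built. Poor Google could only watch from the sidelines as the masses, taken in by sweet talk, blindly followed the crowd, and lament the hardship of revolution as the fruits of victory ended up in others' pockets. Of course, Google is not quite as noble as it appears; what really lit the fuse of this SNS war was the SNS sites themselves. Let us not forget that Google's core mission (or core interest) is to organize the world's information. Just as Google, having had an epiphany from “human flesh search”, began planning the strategic step of “moving from organizing published, static information to organizing the thoughts people want to express”, the tactless SNS sites jumped out and blocked the road. Not only that, they gave Google a hard slap in the face by starting to refuse search-engine crawlers access to user-published content. How could that not infuriate Google? After careful thought, Google stepped forward again. This time, under the FriendConnect banner, it once more stated clearly the importance of the “individual”, insisting that the Internet should center on “people” rather than on today's ever more walled-in SNS communities. Of course, if you insist on viewing it as an SNS too, then its network is the entire Internet. FriendConnect tries to change the rules of today's SNS game: around the current scattered, star-shaped SNS networks it wants to weave a larger, all-encompassing, invisible mesh SNS. Users no longer have to log in at an SNS site with a fixed center before they can interact there. In future, wherever you go on the Internet you may be within reach of this SNS network, and you can discuss what you are viewing with friends at any time, in a way that matches natural behavior, instead of having to paste a link into a community as you do now, which leaves the follow-up discussion completely detached from the source of the content. FriendConnect also neatly resolves the island effect left over from Web 2.0's “liberation of the individual”: it affirms the freedom and central position of the individual while merging the whole Internet and SNS into one, with no more forbidding walls. Tactically, this is also a very clever move by Google. It no longer foolishly fights existing SNS sites head-on for users. Instead it has grasped the essence of “encircling the cities from the countryside” in Mao Zedong's military thought: it woos the still-unattached personal blogs (with incoming traffic as the bait) and weaves its web in the empty territory outside mainstream SNS. One can guess that once FriendConnect has matured, Google's next step will probably be to plant FriendConnect directly inside existing SNS networks by means of custom Gadgets (for example, launching a Facebook service that merges friends from both sides), then unravel them thread by thread and, attacking from inside and out, gradually eat away at these self-centered SNS communities. As a supporter of open technology, I strongly support Google's FriendConnect. But when it comes to privacy, FriendConnect once again casts a cloud that blots out the sky, and this time you cannot even find a corner to hide from it. Technically, through cookies and cross-site scripts, every site that joins FriendConnect is in effect unwittingly used by Google as an informant, which greatly widens the range over which users can be tracked. Think of the picture painted in Eagle Eye; that may well be tomorrow's Google. 
Finally, for the sake of steering the Internet in the right direction and keeping power in balance, I still hope FriendConnect does well. But I would much rather see a peer-to-peer, fully open FriendConnect emerge in the future than one monopolized by Google...]]></description>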
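			<content:encoded><![CDATA[<p>Google is a technology company and, like Microsoft, is not good at studying “mass behavior”, which is why it has not done well in either IM or SNS. It is not that Google fails to understand user experience; rather, typical technology companies share a common failing: they consider “certain things” beneath them, and those are exactly the things the online “masses” love. So it is no wonder that Tencent has made a fortune from IM, while the SNS space has been carved up by a bunch of young upstarts.</p>
<p>In the past Google often cast itself as the technology pathfinder, pushing concepts that were too far ahead of their time, without caring whether others could understand them and while ignoring the unwritten rules of business. Unfortunately, nobody in the Internet age respects a godfather, and too many newcomers are eager to show what they can do. As a result OpenSocial met its Waterloo: hardly any SNS site of real size bought in. That is understandable. In this Warring States era of SNS, when every warlord is busy luring the others' users inside its own walls, you step forward to preach “universal love and non-aggression”...</p>
<p>Now Google has finally learned the rules of the game. Technology is of course the foundation that must never be abandoned and the weapon for conquering the world, but a gleaming spearhead needs to be wrapped in cloth: strike where the opponent is unprepared, yet keep the edge hidden. Against this backdrop, FriendConnect made its low-key debut.</p>
<p>From a technical standpoint, FriendConnect represents an unstoppable trend in online social relationships. Over the past few years, Web 2.0 freed the individual “person” from closed websites and forums in revolutionary fashion, giving rise to a wave of independent-minded “blogs” that sprang up like mushrooms after rain. Unfortunately the revolution went a little too far, and more and more personal blogs became lost on islands of their own. At that point the aristocrats behind the old forum powers trimmed their sails to the wind, put on a Web 2.0 coat and fought their way back under the new identity of “SNS”, waving the banners of “interaction” and “openness” while trying to pen the newly freed individuals inside the high walls they had built. Poor Google could only watch from the sidelines as the masses, taken in by sweet talk, blindly followed the crowd, and lament the hardship of revolution as the fruits of victory ended up in others' pockets.</p>
<p>Of course, Google is not quite as noble as it appears; what really lit the fuse of this SNS war was the SNS sites themselves. Let us not forget that Google's core mission (or core interest) is to organize the world's information. Just as Google, having had an epiphany from “human flesh search”, began planning the strategic step of “moving from organizing published, static information to organizing the thoughts people want to express”, the tactless SNS sites jumped out and blocked the road. Not only that, they gave Google a hard slap in the face by starting to refuse search-engine crawlers access to user-published content. How could that not infuriate Google?</p>
<p>After careful thought, Google stepped forward again. This time, under the FriendConnect banner, it once more stated clearly the importance of the “individual”, insisting that the Internet should center on “people” rather than on today's ever more walled-in SNS communities. Of course, if you insist on viewing it as an SNS too, then its network is the entire Internet. FriendConnect tries to change the rules of today's SNS game: around the current scattered, star-shaped SNS networks it wants to weave a larger, all-encompassing, invisible mesh SNS. Users no longer have to log in at an SNS site with a fixed center before they can interact there. In future, wherever you go on the Internet you may be within reach of this SNS network, and you can discuss what you are viewing with friends at any time, in a way that matches natural behavior, instead of having to paste a link into a community as you do now, which leaves the follow-up discussion completely detached from the source of the content. FriendConnect also neatly resolves the island effect left over from Web 2.0's “liberation of the individual”: it affirms the freedom and central position of the individual while merging the whole Internet and SNS into one, with no more forbidding walls.</p>
<p>Tactically, this is also a very clever move by Google. It no longer foolishly fights existing SNS sites head-on for users. Instead it has grasped the essence of “encircling the cities from the countryside” in Mao Zedong's military thought: it woos the still-unattached personal blogs (with incoming traffic as the bait) and weaves its web in the empty territory outside mainstream SNS. One can guess that once FriendConnect has matured, Google's next step will probably be to plant FriendConnect directly inside existing SNS networks by means of custom Gadgets (for example, launching a Facebook service that merges friends from both sides), then unravel them thread by thread and, attacking from inside and out, gradually eat away at these self-centered SNS communities.</p>
<p>As a supporter of open technology, I strongly support Google's FriendConnect. But when it comes to privacy, FriendConnect once again casts a cloud that blots out the sky, and this time you cannot even find a corner to hide from it. Technically, through cookies and cross-site scripts, every site that joins FriendConnect is in effect unwittingly used by Google as an informant, which greatly widens the range over which users can be tracked. Think of the picture painted in Eagle Eye; that may well be tomorrow's Google.</p>
<p>Finally, for the sake of steering the Internet in the right direction and keeping power in balance, I still hope FriendConnect does well. But I would much rather see a peer-to-peer, fully open FriendConnect emerge in the future than one monopolized by Google...</p>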
]]></content:encoded>
			<wfw:commentRss>http://blog.oasisfeng.com/2008/12/17/friendconnect-a-divine-inspiration-by-google/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Which Language Will Be Google App Engine's Next Favorite?</title>
		<link>http://blog.oasisfeng.com/2008/10/24/which-language-will-be-the-next-favor-of-google-app-engine/</link>
		<comments>http://blog.oasisfeng.com/2008/10/24/which-language-will-be-the-next-favor-of-google-app-engine/#comments</comments>
		<pubDate>Fri, 24 Oct 2008 02:01:11 +0000</pubDate>
		<dc:creator>oasisfeng</dc:creator>
				<category><![CDATA[Internet]]></category>
		<category><![CDATA[GAE]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Health]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Ruby]]></category>

		<guid isPermaLink="false">http://blog.oasisfeng.com/?p=519</guid>
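		<description><![CDATA[Google App Engine Roadmap 10/08 &#8211; 3/09 * Service for storing and serving large files * Datastore import and export utility for large datasets * Billing: developers can pay for more resource usage * Support for a new runtime language * Uptime monitoring site Incidentally, take a look at community sentiment: Java, PHP and Ruby take the top three spots! Technically, PHP and Ruby should be easier to implement than Java at this stage. In terms of industry support, though, Java dominates enterprise applications while PHP reflects the leaning of the open-source web community, so it looks like a hard choice. Judged purely on the language itself, Java should fit Google's strategic positioning better. Google has presumably settled on the language internally long ago and is already hard at work on it, so our speculation will not change anything. Although my heart leans toward Java, I still think PHP is the most likely.]]></description>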
			<content:encoded><![CDATA[<p><a href="http://code.google.com/appengine/docs/roadmap.html">Google App Engine Roadmap</a></p>
<blockquote><p>10/08 &#8211; 3/09</p>
<p>    * Service for storing and serving large files<br />
    * Datastore import and export utility for large datasets<br />
    * Billing: developers can pay for more resource usage<br />
    * <strong>Support for a new runtime language</strong><br />
    * Uptime monitoring site
</p></blockquote>
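<p>Incidentally, take a look at <a href="http://code.google.com/p/googleappengine/issues/list">community sentiment</a>: Java, PHP and Ruby take the top three spots!</p>
<p>Technically, PHP and Ruby should be easier to implement than Java at this stage. In terms of industry support, though, Java dominates enterprise applications while PHP reflects the leaning of the open-source web community, so it looks like a hard choice. Judged purely on the language itself, Java should fit Google's strategic positioning better.</p>
<p>Google has presumably settled on the language internally long ago and is already hard at work on it, so our speculation will not change anything. Although my heart leans toward Java, I still think PHP is the most likely.</p>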
]]></content:encoded>
			<wfw:commentRss>http://blog.oasisfeng.com/2008/10/24/which-language-will-be-the-next-favor-of-google-app-engine/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>CalSync: A Google Calendar Sync Tool for Symbian S60</title>
		<link>http://blog.oasisfeng.com/2008/10/22/sync-google-calendar-on-symbian-s60-by-calsync/</link>
		<comments>http://blog.oasisfeng.com/2008/10/22/sync-google-calendar-on-symbian-s60-by-calsync/#comments</comments>
		<pubDate>Tue, 21 Oct 2008 16:50:07 +0000</pubDate>
		<dc:creator>oasisfeng</dc:creator>
				<category><![CDATA[Internet]]></category>
		<category><![CDATA[Mobile]]></category>
		<category><![CDATA[Calendar]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[S60]]></category>
		<category><![CDATA[Symbian]]></category>

		<guid isPermaLink="false">http://blog.oasisfeng.com/?p=512</guid>
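		<description><![CDATA[http://s60addons.com/calsync/ Although still in beta, it is already much better than GooSync.com, which I used before; at least the Todo list syncs properly. It is also a little faster than syncing over the SyncML protocol. But in my testing there is still a bug: deleting a Todo item on the phone does not seem to delete the corresponding entry in Google Calendar, and on the next sync CalSync pulls it back down and turns it into an all-day memo.]]></description>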
			<content:encoded><![CDATA[<p><a href="http://s60addons.com/calsync/">http://s60addons.com/calsync/</a></p>
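<p>Although still in beta, it is already much better than GooSync.com, which I used before; at least the Todo list syncs properly. It is also a little faster than syncing over the SyncML protocol.</p>
<p>But in my testing there is still a bug: deleting a Todo item on the phone does not seem to delete the corresponding entry in Google Calendar, and on the next sync CalSync pulls it back down and turns it into an all-day memo.</p>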
]]></content:encoded>
			<wfw:commentRss>http://blog.oasisfeng.com/2008/10/22/sync-google-calendar-on-symbian-s60-by-calsync/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

