<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"><channel><atom:link rel="hub" href="http://tumblr.superfeedr.com/" xmlns:atom="http://www.w3.org/2005/Atom"/><description>Avid Pythonista with a secret love for Erlang.</description><title>Harish Mallipeddi</title><generator>Tumblr (3.0; @mallipeddi)</generator><link>http://blog.poundbang.in/</link><item><title>http://www.infoq.com/presentations/Facebook-Hive-Hadoop</title><description>&lt;a href="http://www.infoq.com/presentations/Facebook-Hive-Hadoop"&gt;http://www.infoq.com/presentations/Facebook-Hive-Hadoop&lt;/a&gt;: &lt;p&gt;Informative talk by Ashish Thusoo and Namit Jain from Facebook’s Hive team.&lt;/p&gt;
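
&lt;p&gt;As a rough illustration of the record-columnar idea behind Hive’s RCFile, here’s a toy Python sketch. It is not Hive’s code, and the row-group size and sample rows are made up. Rows are split into row groups, and within each group the values are laid out column by column. A query that reads one column can then skip the others, and similar values sit next to each other, which tends to help compression.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
# Toy model of a record-columnar layout (not Hive's implementation):
# rows are cut into row groups, and inside each group values are stored
# column by column.

def to_row_groups(rows, group_size):
    """Split equal-length row tuples into row groups stored column-major."""
    groups = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        # zip(*chunk) transposes the chunk: one tuple per column
        groups.append([list(column) for column in zip(*chunk)])
    return groups


def scan_column(groups, index):
    """Read a single column; the other columns' lists are never touched."""
    values = []
    for group in groups:
        values.extend(group[index])
    return values


rows = [
    ("2010-02-25", "IN", 3),
    ("2010-02-25", "IN", 5),
    ("2010-02-25", "US", 1),
    ("2010-02-26", "US", 4),
    ("2010-02-26", "IN", 2),
]

groups = to_row_groups(rows, group_size=2)
print(groups[0])               # [['2010-02-25', '2010-02-25'], ['IN', 'IN'], [3, 5]]
print(scan_column(groups, 2))  # [3, 5, 1, 4, 2]
&lt;/code&gt;&lt;/pre&gt;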

&lt;p&gt;Hive’s &lt;a href="http://hadoop.apache.org/hive/docs/r0.4.0/api/org/apache/hadoop/hive/ql/io/RCFile.html"&gt;RCFile&lt;/a&gt; is pretty interesting - it provides record-columnar storage on top of HDFS. Apparently it results in &lt;a href="http://www.docstoc.com/docs/24403843/DataTeam-Presentation_2009-12-11_-V02"&gt;very good compression&lt;/a&gt; and higher scan throughput.&lt;/p&gt;</description><link>http://blog.poundbang.in/post/410837147</link><guid>http://blog.poundbang.in/post/410837147</guid><pubDate>Thu, 25 Feb 2010 14:08:44 +0530</pubDate></item><item><title>http://www.dabeaz.com/GIL/</title><description>&lt;a href="http://www.dabeaz.com/GIL/"&gt;http://www.dabeaz.com/GIL/&lt;/a&gt;: &lt;p&gt;David Beazley continues his GIL investigation work from his ChiPy ‘09 talk. This time he also analyses the new GIL optimizations that are apparently present in the Python 3.2 svn branch. Look at his slides to see how the new GIL performs!&lt;/p&gt;
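
&lt;p&gt;The benchmark at the center of Beazley’s GIL talks is a CPU-bound countdown loop, run twice in a row and then in two threads at once. A minimal Python 3 version along those lines is below. The loop size is arbitrary, and the timings depend heavily on the machine and interpreter version, so no numbers are claimed here.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
import threading
import time


def count(n):
    # Pure-Python CPU-bound loop: it never blocks on I/O, so a thread running
    # it only gives up the GIL when the interpreter forces a switch.
    while n:
        n -= 1


def run_sequential(n):
    start = time.perf_counter()
    count(n)
    count(n)
    return time.perf_counter() - start


def run_threaded(n):
    start = time.perf_counter()
    threads = [threading.Thread(target=count, args=(n,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    N = 5_000_000
    print("sequential: %.2fs" % run_sequential(N))
    print("2 threads:  %.2fs" % run_threaded(N))
&lt;/code&gt;&lt;/pre&gt;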

&lt;p&gt;It took surprisingly few changes to the Python interpreter to do this kind of investigative work (&lt;a href="http://www.dabeaz.com/blog/2009/08/inside-inside-python-gil-presentation.html"&gt;Read more&lt;/a&gt; about his code changes). He maintains a set of counters for each time &lt;code&gt;t&lt;/code&gt; and logs them to a file at the end. He then renders this data as a timeline to make it easier to understand. The output of this step is a humongous PNG file, so he splits the image into tiles and uses a Google Maps-style interface to view the tiles and zoom in and out.&lt;/p&gt;</description><link>http://blog.poundbang.in/post/404665437</link><guid>http://blog.poundbang.in/post/404665437</guid><pubDate>Mon, 22 Feb 2010 15:31:02 +0530</pubDate></item><item><title>"The spot price of MicroSD cards is nearly identical to the spot price of the very same NAND FLASH..."</title><description>“The spot price of MicroSD cards is nearly identical to the spot price of the very same NAND FLASH chips used on the inside. In other words, the extra controller IC inside the microSD card is sold to you “for free”. Incorporating the controller into the package and having it test, manage and mark bad blocks more than offsets the cost of testing each memory chip individually. A full bad block scan can take a long time on a large FLASH IC, and chip testers cost millions of dollars. Therefore, the amortized cost per chip for test alone can be comparable to the cost of silicon itself.”&lt;br/&gt;&lt;br/&gt; - &lt;em&gt;&lt;p&gt;&lt;a href="http://www.bunniestudios.com/blog/?p=918"&gt;On MicroSD Problems «  bunnie’s blog&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Fascinating article by a Chumby engineer on the MicroSD industry and the economics of manufacturing the cards. He has written a whole series of articles about hardware manufacturing and China.&lt;/p&gt;&lt;/em&gt;</description><link>http://blog.poundbang.in/post/396969746</link><guid>http://blog.poundbang.in/post/396969746</guid><pubDate>Fri, 19 Feb 2010 00:30:32 +0530</pubDate></item><item><title>http://blogs.digitar.com/jjww/2009/01/rabbits-and-warrens/</title><description>&lt;a href="http://blogs.digitar.com/jjww/2009/01/rabbits-and-warrens/"&gt;http://blogs.digitar.com/jjww/2009/01/rabbits-and-warrens/&lt;/a&gt;: &lt;p&gt;Nice article providing a quick intro to RabbitMQ/AMQP terminology.&lt;/p&gt;

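&lt;p&gt;To make exchanges, bindings and routing keys concrete, here’s a toy in-memory model of a direct exchange in Python. It is only an illustration under simplifying assumptions: exact routing-key match, unroutable messages dropped, and no virtual hosts, acknowledgements or persistence. It is not a RabbitMQ client API, and fanout and topic exchanges route differently. The queue names and routing keys are made up.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
from collections import defaultdict, deque


class DirectExchange:
    """Toy direct exchange: copies each message into every queue bound with
    exactly the message's routing key."""

    def __init__(self):
        self.bindings = defaultdict(set)   # routing key -> names of bound queues
        self.queues = defaultdict(deque)   # queue name -> pending messages

    def bind(self, queue, routing_key):
        self.bindings[routing_key].add(queue)

    def publish(self, routing_key, body):
        # A key with no bindings matches nothing, so the message is dropped.
        for queue in sorted(self.bindings.get(routing_key, ())):
            self.queues[queue].append(body)

    def consume(self, queue):
        pending = self.queues[queue]
        return pending.popleft() if pending else None


exchange = DirectExchange()
exchange.bind("audit_log", "order.created")
exchange.bind("billing", "order.created")
exchange.bind("billing", "order.refunded")

exchange.publish("order.created", "order #42")   # matches two bindings
exchange.publish("order.refunded", "order #7")

print(exchange.consume("audit_log"))  # order #42
print(exchange.consume("billing"))    # order #42
print(exchange.consume("billing"))    # order #7
&lt;/code&gt;&lt;/pre&gt;
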
&lt;p&gt;Summary:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Virtual hosts - one RabbitMQ server can host multiple virtual hosts, each acting as a separate namespace for exchanges and queues. One virtual host can contain multiple exchanges.&lt;/li&gt;
&lt;li&gt;Queues - queues hold messages until consumers consume them.&lt;/li&gt;
&lt;li&gt;Routing keys - each message has a routing key.&lt;/li&gt;
&lt;li&gt;Exchanges - producers drop off messages at exchanges. Queues are bound to exchanges.&lt;/li&gt;
&lt;li&gt;Bindings - bindings are routes configured on an exchange that tell it which queues should receive messages with a given routing key. One routing key can match multiple bindings, so a message can be routed to multiple queues.&lt;/li&gt;
&lt;/ul&gt;</description><link>http://blog.poundbang.in/post/396214301</link><guid>http://blog.poundbang.in/post/396214301</guid><pubDate>Thu, 18 Feb 2010 12:41:00 +0530</pubDate></item><item><title>"REPLACE INTO Tickets64 (stub) VALUES ('a');
SELECT LAST_INSERT_ID();"</title><description>“REPLACE INTO Tickets64 (stub) VALUES ('a');
SELECT LAST_INSERT_ID();”&lt;br/&gt;&lt;br/&gt; - &lt;em&gt;&lt;p&gt;&lt;a href="http://laughingmeme.org/2010/02/08/ticket-servers-distributed-unique-primary-keys-on-the-cheap/"&gt;Ticket Servers: Distributed Unique Primary Keys on the Cheap - Laughing Meme&lt;/a&gt;&lt;/p&gt;

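&lt;p&gt;Here’s a runnable sketch of the same trick in Python, with SQLite standing in for MySQL so it needs no server. SQLite also supports &lt;code&gt;REPLACE INTO&lt;/code&gt;, and the cursor’s &lt;code&gt;lastrowid&lt;/code&gt; plays the role of &lt;code&gt;LAST_INSERT_ID()&lt;/code&gt;. The schema (an auto-incrementing id plus a unique stub column) is an assumption that fits the two statements quoted above. It is not necessarily Flickr’s exact table definition.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
import sqlite3

# SQLite stands in for MySQL so this runs without a server. AUTOINCREMENT
# makes SQLite hand out ever-increasing ids and never reuse one, even though
# REPLACE deletes the old row each time.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Tickets64 ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " stub TEXT NOT NULL UNIQUE)"
)


def next_ticket(conn):
    # The UNIQUE stub makes REPLACE delete the existing 'a' row and insert a
    # fresh one, so the table stays at a single row while the id grows.
    cur = conn.execute("REPLACE INTO Tickets64 (stub) VALUES ('a')")
    conn.commit()
    return cur.lastrowid  # plays the role of MySQL's LAST_INSERT_ID()


ids = [next_ticket(conn) for _ in range(3)]
print(ids)  # [1, 2, 3]
print(conn.execute("SELECT COUNT(*) FROM Tickets64").fetchone()[0])  # 1
&lt;/code&gt;&lt;/pre&gt;
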
&lt;p&gt;Kellan describes the hack Flickr uses to atomically generate globally unique, auto-incrementing IDs across shards. Redis, with its atomic INCR command, would be great for something like this.&lt;/p&gt;&lt;/em&gt;</description><link>http://blog.poundbang.in/post/380296091</link><guid>http://blog.poundbang.in/post/380296091</guid><pubDate>Wed, 10 Feb 2010 00:01:41 +0530</pubDate></item><item><title>http://www.techcrunch.com/2010/01/14/next-jump/</title><description>&lt;a href="http://www.techcrunch.com/2010/01/14/next-jump/"&gt;http://www.techcrunch.com/2010/01/14/next-jump/&lt;/a&gt;: &lt;p&gt;Interesting read on NextJump’s business model. Amidst an endless list of me-too comparison shopping sites, this one has a completely different approach. Because of the way their business works, they got their hands on 3 valuable data sources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;consumer demographic database - they’re considered a non-traditional benefits provider and hence get access to the employment status of 30 million workers (who also happen to be consumers). They get part of the employee record and sometimes even the job title or salary grade.&lt;/li&gt;
&lt;li&gt;transactional data - from merchants, credit card companies&lt;/li&gt;
&lt;li&gt;consumer preference data - because it is seen as an employee perk, HR departments inside companies and the individuals themselves are willing to engage in a level of preference data sharing that has not been seen in e-commerce before. The customer preference data allows for better targeting and ultimately superior conversions (around 10-11% vs 2% for the best commerce sites today)&lt;/li&gt;
&lt;/ul&gt;</description><link>http://blog.poundbang.in/post/337040692</link><guid>http://blog.poundbang.in/post/337040692</guid><pubDate>Sat, 16 Jan 2010 12:23:35 +0530</pubDate></item><item><title>http://maglevity.wordpress.com/2009/12/17/kd-trees-and-maglev/</title><description>&lt;a href="http://maglevity.wordpress.com/2009/12/17/kd-trees-and-maglev/"&gt;http://maglevity.wordpress.com/2009/12/17/kd-trees-and-maglev/&lt;/a&gt;: &lt;p&gt;Interesting post on building a KD-tree in MagLev to answer nearest neighbor searches.&lt;/p&gt;</description><link>http://blog.poundbang.in/post/336993764</link><guid>http://blog.poundbang.in/post/336993764</guid><pubDate>Sat, 16 Jan 2010 11:47:04 +0530</pubDate></item><item><title>Richard Hamming's "You and your research"</title><description>&lt;a href="http://www.cs.virginia.edu/~robins/YouAndYourResearch.html"&gt;Richard Hamming's "You and your research"&lt;/a&gt;: &lt;p&gt;Richard Hamming (of Bell Labs) gave a talk in 1986 on how to do great research as an individual. 
I particularly liked one of his comments during the Q&amp;A about &lt;em&gt;reading&lt;/em&gt;: reading is important, but read only as much as you need to understand the problems in the field, not to find the solutions.&lt;/p&gt;</description><link>http://blog.poundbang.in/post/314920614</link><guid>http://blog.poundbang.in/post/314920614</guid><pubDate>Sun, 03 Jan 2010 23:57:27 +0530</pubDate></item><item><title>Hard drives with 4K sectors instead of current 512B sectors</title><description>&lt;a href="http://www.anandtech.com/storage/showdoc.aspx?i=3691"&gt;Hard drives with 4K sectors instead of current 512B sectors&lt;/a&gt;: &lt;p&gt;AnandTech explains why Western Digital is manufacturing hard drives with larger sectors (4K) instead of the current standard 512B sectors.&lt;/p&gt;</description><link>http://blog.poundbang.in/post/291962644</link><guid>http://blog.poundbang.in/post/291962644</guid><pubDate>Sun, 20 Dec 2009 23:03:05 +0530</pubDate></item><item><title>Last.fm uses MogileFS with SSDs</title><description>&lt;a href="http://blog.last.fm/2009/12/14/launching-xbox-part-2-ssd-streaming"&gt;Last.fm uses MogileFS with SSDs&lt;/a&gt;: &lt;p&gt;Summary:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;They used SSDs to store &lt;em&gt;hot songs&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;They picked the simple FIFO noop scheduler in the Linux I/O subsystem. For more info on the different schedulers, read Red Hat’s &lt;a href="http://www.redhat.com/docs/wp/performancetuning/iotuning/index.html"&gt;IO Tuning Guide&lt;/a&gt;. They also set &lt;code&gt;read_ahead_kb&lt;/code&gt; to a very low value, since seek times on SSDs are so low anyway.&lt;/li&gt;
&lt;li&gt;They modified MogileFS to distinguish storage nodes with SSDs from nodes with regular HDDs.&lt;/li&gt;
&lt;li&gt;One SSD =&gt; 7000 requests vs. one 7200rpm SATA disk =&gt; 300 requests.&lt;/li&gt;
&lt;/ol&gt;</description><link>http://blog.poundbang.in/post/284535023</link><guid>http://blog.poundbang.in/post/284535023</guid><pubDate>Tue, 15 Dec 2009 16:22:00 +0530</pubDate></item><item><title>http://www.sriramkrishnan.com/blog/2009/12/stuff-ive-learned-at-microsoft.html</title><description>&lt;a href="http://www.sriramkrishnan.com/blog/2009/12/stuff-ive-learned-at-microsoft.html"&gt;http://www.sriramkrishnan.com/blog/2009/12/stuff-ive-learned-at-microsoft.html&lt;/a&gt;: &lt;p&gt;Do’s and don’ts for programmers (not the technical stuff but the soft skills, behavior, attitude at work) by Sriram who works for Microsoft’s Azure team. I’ve already experienced and observed several things from his list since I started working for this other big software company.&lt;/p&gt;</description><link>http://blog.poundbang.in/post/281585838</link><guid>http://blog.poundbang.in/post/281585838</guid><pubDate>Sun, 13 Dec 2009 17:26:08 +0530</pubDate></item><item><title>How node.js exposes traditional blocking I/O calls in a non-blocking manner</title><description>&lt;p&gt;If you’ve not heard about &lt;a href="http://nodejs.org"&gt;node.js&lt;/a&gt;, do watch the video from a talk given by Ryan Dahl (project lead) at JSConfEU 2009.&lt;/p&gt;

&lt;p&gt;&lt;embed src="http://blip.tv/play/AYGylE4C" type="application/x-shockwave-flash" width="480" height="300" allowscriptaccess="always" allowfullscreen="true"&gt;&lt;/embed&gt;&lt;/p&gt;

&lt;p&gt;What I found most interesting is how he handled the shortcomings of libraries like eventmachine. eventmachine is great for building highly scalable, single-threaded network servers using async network I/O. But while writing your servers, you have to be careful not to introduce blocking calls in your event callbacks. Since the server is single-threaded, if you block inside an event callback, you’re not handling any other incoming requests for as long as the thread is blocked. Why would you make blocking calls at all? Your MySQL client library will typically do blocking I/O, and POSIX filesystem calls are blocking. There are many other such cases, and each one reduces the usefulness of eventmachine. So in node.js, Ryan decided to wrap these blocking APIs and expose them to JavaScript in a non-blocking manner: basically a batteries-included arsenal of non-blocking APIs in one package, if you will. Take a look at the node.js &lt;a href="http://nodejs.org/api.html"&gt;API docs&lt;/a&gt; to get an idea of what’s supported.&lt;/p&gt;
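
&lt;p&gt;A toy illustration of that problem in Python: a single-threaded loop runs request handlers one after another, and one handler makes a blocking call. The blocking call is simulated with &lt;code&gt;time.sleep&lt;/code&gt;, standing in for a MySQL query. This isn’t eventmachine’s API, just the dispatch pattern reduced to a loop, and the handler names are made up.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
import time


def run_loop(handlers):
    """Run handlers one at a time on a single thread, the way an event loop
    dispatches ready events, and record when each handler got to start."""
    start = time.monotonic()
    started_at = []
    for handler in handlers:
        started_at.append(round(time.monotonic() - start, 1))
        handler()
    return started_at


def handle_fast_request():
    pass  # e.g. write a cached response to a socket


def handle_request_with_blocking_query():
    time.sleep(0.5)  # stands in for a blocking MySQL client call


print(run_loop([
    handle_fast_request,
    handle_request_with_blocking_query,
    handle_fast_request,   # has to wait for the blocking call to return
    handle_fast_request,
]))
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The last two requests start roughly half a second late, even though handling them takes almost no time.&lt;/p&gt;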

&lt;p&gt;Underneath, node.js offloads all the blocking calls to a thread pool. When the calls finish, the results are fetched by reading from a pipe that the worker threads write to. Conveniently, pipes are select()-able, which means these events can be handled in node.js’s main event loop exactly like other network socket events. I quickly implemented a barebones proof-of-concept of this idea in Python:&lt;/p&gt;

&lt;script src="http://gist.github.com/248184.js?file=gistfile1.py"&gt;&lt;/script&gt;&lt;p&gt;Note - if you’re reading this via a RSS reader, open this in a browser to see the gist code embed above.&lt;/p&gt;</description><link>http://blog.poundbang.in/post/267934877</link><guid>http://blog.poundbang.in/post/267934877</guid><pubDate>Fri, 04 Dec 2009 00:20:45 +0530</pubDate></item><item><title>Tumblr backup tool in 40 lines of Clojure</title><description>&lt;p&gt;The code doesn’t necessarily do a lot of error handling.&lt;/p&gt;

&lt;script src="http://gist.github.com/246242.js?file=tumblore.clj"&gt;&lt;/script&gt;&lt;p&gt;Note - if you’re reading this via a RSS reader, this blog post contains a Gist embed; open the page in your browser.&lt;/p&gt;</description><link>http://blog.poundbang.in/post/264841046</link><guid>http://blog.poundbang.in/post/264841046</guid><pubDate>Tue, 01 Dec 2009 18:04:05 +0530</pubDate></item><item><title>Last.fm interview with CNET</title><description>&lt;a href="http://crave.cnet.co.uk/digitalmusic/0,39029432,49304380-1,00.htm"&gt;Last.fm interview with CNET&lt;/a&gt;: &lt;p&gt;What I found interesting - they use SSDs for serving out popular radio streams and regular HDDs for the rest of their streams. Streaming is done via regular Linux boxes running MogileFS.&lt;/p&gt;</description><link>http://blog.poundbang.in/post/264734469</link><guid>http://blog.poundbang.in/post/264734469</guid><pubDate>Tue, 01 Dec 2009 15:26:14 +0530</pubDate></item><item><title>sniffer - Erlang NIF example</title><description>&lt;a href="http://jonsbraindump.blogspot.com/2009/11/using-nif-for-e-v-i-l.html"&gt;sniffer - Erlang NIF example&lt;/a&gt;: &lt;p&gt;Uses NIFs in otp-R13B03 to grab a binary at a specific memory location.&lt;/p&gt;</description><link>http://blog.poundbang.in/post/258100691</link><guid>http://blog.poundbang.in/post/258100691</guid><pubDate>Thu, 26 Nov 2009 15:21:53 +0530</pubDate></item><item><title>Riak's dets backend - too many files open error</title><description>&lt;p&gt;I was playing with Riak yesterday and ran into this error on my Macbook running Snow Leopard:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
{{badmatch,
  {error,
   {{badmatch,
     {error,
      {file_error,"./dets-store/1392993748081016843912887106182707253109560705024",
       emfile}}},
    [{riak_vnode,init,1},
     {gen_server2,init_it,6},
     {proc_lib,init_p_do_apply,3}]}}},
 [{riak_vnode_master,get_vnode,2},
  {riak_vnode_master,handle_cast,2},
  {gen_server,handle_msg,5},
  {proc_lib,init_p_do_apply,3}]}
&lt;/code&gt;&lt;/pre&gt;

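&lt;p&gt;The &lt;code&gt;emfile&lt;/code&gt; in the error above is the POSIX &lt;code&gt;EMFILE&lt;/code&gt; errno, meaning the process hit its limit on open file descriptors. The shell’s &lt;code&gt;ulimit -n&lt;/code&gt; adjusts this limit, but a process can also inspect and raise its own soft limit. Here’s a small Python sketch using the standard &lt;code&gt;resource&lt;/code&gt; module. It is POSIX only, and the helper name &lt;code&gt;raise_fd_limit&lt;/code&gt; is made up. Without extra privileges, the soft limit can only be raised up to the hard limit.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
import errno
import resource

# "emfile" in the Erlang error is this errno: too many open files.
print(errno.errorcode[errno.EMFILE])  # EMFILE


def raise_fd_limit(wanted):
    """Hypothetical helper: raise this process's soft RLIMIT_NOFILE to
    `wanted`, capped at the hard limit, and return the resulting limits."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard  # already unlimited
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)
    new_soft = max(soft, wanted)  # never lower an existing limit
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


print(raise_fd_limit(8192))
&lt;/code&gt;&lt;/pre&gt;
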
&lt;p&gt;When using the dets storage backend, Riak seems to create/open a dets database file for each vnode (partition) in your ring. When there’s only one node in your cluster, I’m guessing all the vnodes/partitions are owned by this node, which results in a whole bunch of files being opened (under the dets-store folder you configured). On Snow Leopard, I had to run &lt;code&gt;ulimit -n 8192&lt;/code&gt; to raise the limit on the number of file descriptors a process can have open. You probably won’t notice this normally. I had increased the number of partitions (the ring size) in the config file, which is why I ran into this problem.&lt;/p&gt;</description><link>http://blog.poundbang.in/post/250733521</link><guid>http://blog.poundbang.in/post/250733521</guid><pubDate>Fri, 20 Nov 2009 17:31:50 +0530</pubDate></item><item><title>When Linux Runs Out of Memory - O'Reilly Media</title><description>&lt;a href="http://linuxdevcenter.com/pub/a/linux/2006/11/30/linux-out-of-memory.html"&gt;When Linux Runs Out of Memory - O'Reilly Media&lt;/a&gt;: &lt;p&gt;Good article explaining Linux’s default memory overcommit behavior (/proc/sys/vm/overcommit_memory) and the OOM killer.&lt;/p&gt;</description><link>http://blog.poundbang.in/post/231698491</link><guid>http://blog.poundbang.in/post/231698491</guid><pubDate>Tue, 03 Nov 2009 14:57:44 +0530</pubDate></item><item><title>Hadoop at Yahoo!</title><description>&lt;a href="http://www.slideshare.net/yhadoop/hadoop-at-yahoo-university-talks"&gt;Hadoop at Yahoo!&lt;/a&gt;: &lt;p&gt;Slides from Eric14’s talk at UIUC.&lt;/p&gt;</description><link>http://blog.poundbang.in/post/227692543</link><guid>http://blog.poundbang.in/post/227692543</guid><pubDate>Fri, 30 Oct 2009 11:33:34 +0530</pubDate></item><item><title>GitHub's new architecture</title><description>&lt;a href="http://github.com/blog/530-how-we-made-github-fast"&gt;GitHub's new architecture&lt;/a&gt;: &lt;p&gt;Very well-written article highlighting the new architecture of GitHub (post-Rackspace move).
Every company or team should at least have a similar internal document describing its current architecture at a high level, but unfortunately that doesn’t happen in most companies.&lt;/p&gt;

&lt;p&gt;GitHub seems to be using a lot of familiar open source projects in its stack: Nginx, Ruby/Rails/Unicorn, HAProxy, MySQL, memcached, Redis and a bunch of RPC services speaking the new BERT/BERT-RPC protocol. The only piece I haven’t really played with yet is DRBD, and I’ve heard lots of good things about it from others.&lt;/p&gt;
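
&lt;p&gt;BERT (Binary ERlang Term) serializes data using Erlang’s external term format. To give a feel for that wire format, here’s a hypothetical minimal Python encoder. It covers only a few term types (atoms, integers, binaries, tuples and lists) and is neither GitHub’s implementation nor a complete BERT library. It uses the classic &lt;code&gt;ATOM_EXT&lt;/code&gt; tag for atoms and, as its own choice, sends plain Python strings as binaries.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
# Hypothetical minimal encoder for a subset of Erlang's external term format.

class Atom(str):
    """A str subclass marking values to encode as Erlang atoms."""


def encode_term(term):
    if isinstance(term, Atom):
        name = term.encode("latin-1")
        return bytes([100]) + len(name).to_bytes(2, "big") + name      # ATOM_EXT
    if isinstance(term, bool):
        raise TypeError("BERT encodes booleans as {bert, true/false}; not covered here")
    if isinstance(term, int):
        if term in range(256):
            return bytes([97, term])                                    # SMALL_INTEGER_EXT
        if term in range(-2**31, 2**31):
            return bytes([98]) + term.to_bytes(4, "big", signed=True)   # INTEGER_EXT
        raise ValueError("big integers are not covered by this sketch")
    if isinstance(term, str):
        term = term.encode("utf-8")  # this sketch sends plain strings as binaries
    if isinstance(term, bytes):
        return bytes([109]) + len(term).to_bytes(4, "big") + term       # BINARY_EXT
    if isinstance(term, tuple):
        if len(term) in range(256):
            header = bytes([104, len(term)])                            # SMALL_TUPLE_EXT
        else:
            header = bytes([105]) + len(term).to_bytes(4, "big")        # LARGE_TUPLE_EXT
        return header + b"".join(encode_term(t) for t in term)
    if isinstance(term, list):
        if not term:
            return bytes([106])                                         # NIL_EXT
        body = b"".join(encode_term(t) for t in term)
        # LIST_EXT: element count, elements, then NIL_EXT as the proper tail
        return bytes([108]) + len(term).to_bytes(4, "big") + body + bytes([106])
    raise TypeError("unsupported type: %r" % type(term).__name__)


def bert_encode(term):
    return bytes([131]) + encode_term(term)   # 131 = external term format version


print(bert_encode((Atom("ok"), 1)))  # b'\x83h\x02d\x00\x02oka\x01'
&lt;/code&gt;&lt;/pre&gt;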

&lt;p&gt;So it looks like you now have Facebook’s Thrift, Google’s Protocol Buffers, Hadoop/Doug Cutting’s Avro, plain old JSON/XML and GitHub’s BERT for your (de)serialization needs!&lt;/p&gt;</description><link>http://blog.poundbang.in/post/219335777</link><guid>http://blog.poundbang.in/post/219335777</guid><pubDate>Thu, 22 Oct 2009 02:10:31 +0530</pubDate></item><item><title>Speaking at FOSS.MY 2009</title><description>&lt;p&gt;I’ll be speaking about Hadoop at &lt;a href="http://foss.my/"&gt;FOSS.MY 2009&lt;/a&gt; in KL. They have an interesting bunch of speakers this year, including Brian Aker, David Axmark and RMS. This will be my first time at FOSS.MY!&lt;/p&gt;

&lt;p&gt;If you’re attending, do drop by my &lt;a href="http://foss.my/2009/schedule/"&gt;talk on Sunday afternoon&lt;/a&gt; :)&lt;/p&gt;</description><link>http://blog.poundbang.in/post/217412745</link><guid>http://blog.poundbang.in/post/217412745</guid><pubDate>Tue, 20 Oct 2009 01:45:25 +0530</pubDate></item></channel></rss>
