<?xml version="1.0" encoding="UTF-8" ?>

<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
   <title>RealJenius.com</title>
   <link>http://realjenius.com</link>
   <description>I'm a software developer in the game industry, and have been (for better or worse) coding on the Java platform for the last decade. I also do all my own stunts.</description>
   <language>en-us</language>
   <managingEditor>R.J. Lorimer</managingEditor>
   <atom:link href="rss" rel="self" type="application/rss+xml" />
      <item>
      <title>Micro Framework Plugin Architecture with Guice Multibindings</title>
      <link>http://www.realjenius.com/2012/01/03/guice-multibindings</link>
      <author>R.J. Lorimer</author>
      <pubDate>January 03, 2012</pubDate>
      <guid>http://www.realjenius.com/2012/01/03/guice-multibindings</guid>
      <description><![CDATA[        
<p>
    Google Guice is a snazzy way to avoid all of the declarative configuration file noise and cruft, while still getting 
    the modularity and de-coupling that you want from dependency-injection. Unlike Spring, Guice doesn't ship with the 
    kitchen sink. When I first picked Guice up, I wasn't sure how (or if it was even possible out of the box) to do what 
    I intended, which was to collect up all implementations of a particular type via injection. This is something that 
    can probably be done in thirty-five distinct ways in Spring (all of which require about fifteen XML files, if memory 
    serves), so I was struggling to find the answer.
</p>
<p>
    The answer, in fact, is straightforward: 
    <a href="http://code.google.com/p/google-guice/wiki/Multibindings"><strong>Multibindings</strong></a>.
</p>
<p>
    What is particularly neat about multibindings (aside from the fact that they can inject a whole collection of objects) is that they
    will accept bindings from multiple modules into the final aggregate collection that is injected. Both sets and maps can be injected in this way.
    This example will use the <code>MapBinder</code>, as the base Guice example for sets is fairly straightforward in its own right.
    Consider a music decoding application, for example. You might have a simple decoder API that looks like this:
</p>
<pre class="brush: java">
import java.io.InputStream;
import java.io.OutputStream;

public interface AudioDecoder {
    String getAudioTypeName();
    void decode(InputStream encodedIn, OutputStream pcmOut);
}

</pre><p>
    Individual modules can then contribute their own audio decoding implementation:
</p>
<pre class="brush: java">
public class Mp3AudioDecoder implements AudioDecoder {
  public String getAudioTypeName() { return &quot;MP3&quot;; }
  public void decode(InputStream mp3In, OutputStream pcmOut) {
    // Run through LAME (or similar Fraunhofer) decoding here.
  }
}

// ...

public class Mp3AudioModule extends AbstractModule {
  @Override
  protected void configure() {
    // A MapBinder is keyed: both the key type and the value type must be supplied.
    MapBinder&lt;String,AudioDecoder&gt; decoderBinder
      = MapBinder.newMapBinder(binder(), String.class, AudioDecoder.class);
    decoderBinder.addBinding(&quot;mp3&quot;).to(Mp3AudioDecoder.class);
  }
}

</pre><p>
    What this does is register a binding with the map binder from the key <code>"mp3"</code> to <code>Mp3AudioDecoder</code>;
    effectively registering that type as part of the total map of audio decoders. Consuming these via injection requires no special
    sauce; simply declaring that you want to receive the map is all that is required:
</p>
<pre class="brush: java">
public class RootModule extends AbstractModule {
  public void configure() {
    bind(AudioDecodingThingy.class);
  }
}
        
public class AudioDecodingThingy {

  private final Map&lt;String,AudioDecoder&gt; decoders;
    
  @Inject
  public AudioDecodingThingy(Map&lt;String,AudioDecoder&gt; decoders) {
    this.decoders = decoders;
  }

  public void run(String inputFile, String outputFile) throws IOException {
    // Basic error handling.
    File inFile = new File(inputFile);
    File outFile = new File(outputFile);
    String extension = getExtension(inputFile);

    if(!inFile.exists()) throw new IllegalArgumentException(&quot;File &quot; + inputFile + &quot; not found.&quot;);
    if(!decoders.containsKey(extension)) throw new IllegalArgumentException(&quot;No decoder found for extension: &quot; + extension);
    if(!outFile.exists() &amp;&amp; !outFile.createNewFile()) throw new IllegalArgumentException(&quot;Unable to create output file: &quot; + outputFile);

    AudioDecoder decoder = decoders.get(extension);

    // Do the decoding; both streams are closed automatically.
    try(
      InputStream in = new FileInputStream(inFile);
      OutputStream out = new FileOutputStream(outFile)) {
        decoder.decode(in, out);
    }
  }

  // Assumed helper (not part of the original listing): the lower-cased text after the last dot.
  private static String getExtension(String fileName) {
    int dot = fileName.lastIndexOf('.');
    return dot &lt; 0 ? &quot;&quot; : fileName.substring(dot + 1).toLowerCase();
  }
}
        
public class Main {
  public static void main(String[] args) throws IOException {
    // OggModule is assumed to be defined along the same lines as Mp3AudioModule.
    Injector i = Guice.createInjector(
      new RootModule(), new Mp3AudioModule(), new OggModule());

    AudioDecodingThingy thingy
      = i.getInstance(AudioDecodingThingy.class);

    thingy.run(args[0], args[1]);
  }
}

</pre><p>
    While this particular example (and the example on the Guice site) has the snap-in modules listed directly in the code,
    it is not a stretch of the imagination to envision the list of modules to install coming from:
</p>
<ul>
    <li>A configuration file</li>
    <li>The built-in JAR service-provider and <code>ServiceLoader</code> facilities</li>
    <li>A scan for types carrying a particular annotation, something like <code>@PluginModule</code></li>
</ul>
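<p>
    To make the <code>ServiceLoader</code> option a little more concrete, here is a minimal sketch of the discovery step.
    It is only an illustration under stated assumptions: <code>PluginDiscovery</code> and <code>DecoderPlugin</code> are
    hypothetical names invented for this example so that it runs with nothing but the JDK. In a real Guice application
    the service type would be <code>com.google.inject.Module</code>, and each plugin JAR would list its module class
    in a <code>META-INF/services</code> file.
</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

/** Sketch: discovering plugins with the JDK's ServiceLoader. */
final class PluginDiscovery {

    /**
     * Hypothetical plugin contract, used here so the sketch needs nothing beyond the JDK.
     * In a Guice application the service type would be com.google.inject.Module instead.
     */
    interface DecoderPlugin {
        String name();
    }

    /** Collects every provider registered under META-INF/services for the given type. */
    static <T> List<T> discover(Class<T> serviceType) {
        List<T> found = new ArrayList<>();
        for (T provider : ServiceLoader.load(serviceType)) {
            found.add(provider);
        }
        return found;
    }

    public static void main(String[] args) {
        // With no provider files on the classpath, nothing is discovered.
        List<DecoderPlugin> plugins = discover(DecoderPlugin.class);
        System.out.println("Discovered " + plugins.size() + " plugin(s)");
    }
}
```

<p>
    With Guice on the classpath, a list discovered this way could be handed to <code>Guice.createInjector</code>,
    which also accepts an <code>Iterable</code> of modules.
</p>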
<p>As a final disclaimer (one that is reiterated on the Guice site), multibindings are not a replacement for a full
modular architecture, like that which can be achieved with OSGi (in fact, Guice has
    <a href="http://code.google.com/p/google-guice/wiki/OSGi">support for OSGi</a> as well). However, 
    OSGi can be a power-drill when all you need is a plain ol' screwdriver.
</p>
<p>
    Multibindings, which are an extension to Guice, are shipped as a separate integration JAR file. All of the official 
    extensions are <a href="http://mvnrepository.com/artifact/com.google.inject.extensions">available in the core Maven 
    repositories</a> under <code>com.google.inject.extensions</code>.
</p>]]></description>
   </item>
   <item>
      <title>Google Guava and Multimaps</title>
      <link>http://www.realjenius.com/2011/12/22/guava-multimaps</link>
      <author>R.J. Lorimer</author>
      <pubDate>December 22, 2011</pubDate>
      <guid>http://www.realjenius.com/2011/12/22/guava-multimaps</guid>
      <description><![CDATA[
<p>
It's not uncommon in Java to build some sort of in-memory registry backed by a map that holds a list of items under each key. Often, these implementations look something like this (excluding concurrency details for brevity):
</p>

<pre class="brush: java">
private Map&lt;String,List&lt;Something&gt;&gt; stuff = new HashMap&lt;String,List&lt;Something&gt;&gt;();

// ...

public void add(String key, Something item) {
  if(!stuff.containsKey(key)) {
    stuff.put(key, new ArrayList&lt;Something&gt;());
  }
  stuff.get(key).add(item);
}

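public List&lt;Something&gt; get(String key) {
  List&lt;Something&gt; items = stuff.get(key);
  // Defensive copy; an unknown key yields an empty list instead of a NullPointerException.
  return items == null ? new ArrayList&lt;Something&gt;() : new ArrayList&lt;Something&gt;(items);
}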

</pre>
<p>
<a href="http://guava-libraries.googlecode.com">Google's Guava Libraries</a> provide a few collections to help with this common case: <code>com.google.common.collect.Multimap</code>, and the more specific variants <code>ListMultimap</code>, <code>SetMultimap</code>, and <code>SortedSetMultimap</code>.

The above code can be re-written with Guava like this:
</p>

<pre class="brush: java">
private ListMultimap&lt;String,Something&gt; stuff = ArrayListMultimap.create();

// ...

public void add(String key, Something item) {
  stuff.put(key, item);
}

public List&lt;Something&gt; get(String key) {
  // might as well use the Lists convenience API while we're at it.
  return Lists.newArrayList(stuff.get(key));
}

</pre>
<p>
The multi-map has a variety of fancy features that can be used as well. Here are just a few examples:
</p>

<pre class="brush: java">
// returns a composite of all values from all entries.
Collection&lt;Something&gt; allSomethings = stuff.values(); 

// A more traditional map view; it supports removal, but not put or putAll.
Map&lt;String, Collection&lt;Something&gt;&gt; mapView = stuff.asMap();

// remove an individual entry for a key.
boolean removed = stuff.remove(key, someVal);

// remove all for a key
List&lt;Something&gt; removedSomethings = stuff.removeAll(key);

// One for each value in the map. Removing from this collection removes from the multimap.
Collection&lt;Map.Entry&lt;String,Something&gt;&gt; allEntries = stuff.entries();

</pre>
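<p>
The difference between a live view and a defensive copy is not specific to Guava; the JDK's own <code>Map.keySet()</code> behaves the same way, so the idea can be sketched without any extra library. The class and method names below (<code>ViewVersusCopy</code>, <code>remainingKeys</code>) are made up for this illustration, and the registry is a plain <code>HashMap</code> rather than a multimap.
</p>

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Sketch: a live view writes through to its map, a defensive copy does not. */
final class ViewVersusCopy {

    static Set<String> remainingKeys() {
        Map<String, Integer> registry = new HashMap<>();
        registry.put("mp3", 1);
        registry.put("ogg", 2);

        Set<String> view = registry.keySet();    // live view backed by the map
        Set<String> copy = new HashSet<>(view);  // independent snapshot

        view.remove("mp3"); // writes through: the "mp3" entry leaves the map
        copy.remove("ogg"); // touches only the snapshot; the map keeps "ogg"

        // Sorted copy of whatever keys are still in the map.
        return new TreeSet<>(registry.keySet());
    }

    public static void main(String[] args) {
        System.out.println(remainingKeys());
    }
}
```

<p>
Only the removal made through the view reaches the map, which is why handing a view to outside callers is riskier than handing them a copy.
</p>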
<p>
The collections returned by <code>get</code>, <code>values</code>, <code>asMap</code>, and <code>entries</code> are views of the multimap: changes made through them write through to the multimap, and vice versa. This can make it particularly easy to work with the map in a variety of ways. It does mean you should probably 
perform defensive copying anywhere you might be exposing these collections (generally good practice in most cases, anyway).
</p>]]></description>
   </item>
   <item>
      <title>Spark: Sinatra Goes Verbose</title>
      <link>http://www.realjenius.com/2011/08/02/spark-sinatra-goes-verbose</link>
      <author>R.J. Lorimer</author>
      <pubDate>August 02, 2011</pubDate>
      <guid>http://www.realjenius.com/2011/08/02/spark-sinatra-goes-verbose</guid>
      <description><![CDATA[<p>It would seem that the prevailing wisdom in the upper echelon of alpha-geeks is that Java, as a language, is no longer generally viable for effective coding of... well frankly, anything. Whereas five years ago, everyone working on the JVM was looking for the next-best-Java-framework, it seems the focus has switched to finding the next-best-language instead. Scala, JRuby, Clojure, Groovy, and a whole host of other JVM-based languages (there are several new contenders like Kotlin, Ceylon, Stab, Gosu, and Mirah) have begun making news for their vastly superior language features that promise to dramatically improve the coding experience, and ideally, improve the code itself.</p> 

<p>It's no surprise Java developers are trying to branch out: Java 7, which was nearly five years in the making, barely evolves the language features at all; most of the promised game-changers for the language (lambdas, extension methods, modules) were deferred to Java 8 due to lack of consensus and time. Java has become stagnant, too verbose, and too crufty for developers that have seen the greener grass on the other side.</p>

<p>Unfortunately, most JVM developers in the wild are still using the Java language in some capacity, and it seems to be fairly common in companies that there is a distinct reason that an alternative VM language hasn't been adopted in its place. The reasons may not satiate the idealists of the JVM-based community, but they do exist. Just to name a few:</p>
<ul>
	<li>Corporate restrictions</li>
	<li>Concerns about code maintainability</li>
	<li>Team skill-sets and strengths</li>
	<li>Stability and performance of the various language platforms</li>
	<li>Lack of top-grade IDE support</li>
</ul>
<p>Frameworks, on the other hand, often seem to be an easier pill to swallow. The barrier to entry for developers is often lower, the scope of impact on your application can be less pervasive, and your tools and team skill-sets are not stretched nearly so far. In reality, some frameworks (*cough*Spring*cough*) actually add more complexity than any language shift would, but this is unfortunately a game of perception. As it pertains to frameworks in the post-Java JVM world, those of you that follow my blog or Twitter account know that I'm a big fan of the <a href="http://www.playframework.org/">Play! Framework</a>, as it re-imagined what it means to write a Java web application, and it also provides an easy gateway into Scala. It shows that while Java the language is falling behind, it isn't a complete wasteland for developers craving more.</p>

<p>A co-worker recently pointed out another intriguing Java framework that, while not being as full-featured or targeted for large applications as Play!, has a lot to offer to this neo-Java world: enter <a href="http://www.sparkjava.com/">Spark</a>.</p>

<p>Those of you who have worked with (or at least seen) Ruby's <a href="http://www.sinatrarb.com/">Sinatra framework</a> will instantly feel at home (and perhaps vaguely disgruntled) with Spark. Spark is effectively the Sinatra style of web-binding, using Java syntax. Here is an example from their home-page:</p>

<pre class="brush: java">
import static spark.Spark.*;
import spark.*;

public class HelloWorld {

   public static void main(String[] args) {
      
      get(new Route(&quot;/hello&quot;) {
         @Override
         public Object handle(Request request, Response response) {
            return &quot;Hello World!&quot;;
         }
      });

   }

}

</pre>
<p>The general idea behind Spark is to make the binding between a URL and the actual code being run as thin as possible - allowing you to focus on servicing the request. When compared to many frameworks, the list of features it <em>doesn't</em> have may be disconcerting, but there is a certain power and portability in the simplicity. The self-coined term "micro-web-framework" is really only true due to the sheer volume of complexity and features that Java web frameworks have decided to provide (or impose) in the last few years.</p>

<p>Like Play!, Spark focuses on using very human-readable API design. The central component of Spark is the callback which handles the request. As seen in the above example, this is provided by the developer via a subclass of "Route". What happens inside the callback to build the result is entirely up to you.</p>

<p>While Spark doesn't get involved in the "manipulating data" part, there are a handful of features and utilities available to help with the control of HTTP-level web-flow. Some of these include:</p>
<ul>
	<li>Filters - These are callbacks just like the routes that can be run based on certain URL patterns, allowing for functionality to be applied orthogonal to a set of requests.</li>
	<li>Request/Response Wrappers - The servlet request and response classes are well known (and often loathed) for their design. Spark, like many frameworks, wraps these to help conceal the suck.</li>
	<li>Halt Commands - This is an increasingly popular API design in web frameworks: methods that set an HTTP status code, and fail with an exception immediately.</li>
	<li>Redirects - Browser redirects are made particularly simple.</li>
</ul>
<p>All of these features are shown in more detail on the <a href="http://www.sparkjava.com/readme.html">Spark Readme Page</a>.</p>

<p>Spark's main facility for running is to start up an embedded Jetty server to automatically handle requests. This is right in line with how Sinatra functions by default, and provides a quick and easy way for developers to get their application running for testing and development. While not documented on the site, Spark does support a deployed mode where it can be run inside of an already-deployed application server as a WAR with a web.xml file. This is done via the spark.servlet.SparkFilter class, which is a servlet filter that can route requests to your application.</p>

<p>In summary: it's unlikely you would want to implement your entire enterprise on Spark, but that's not really its goal. Spark is really targeted for quick-to-live "scrapplications"; getting something together in a short amount of time, and in the hands of users, without the pomp and circumstance of importing 900 JARs, and creating 35 configuration files. It explicitly avoids imposing a particular application model, data model, or really any dependencies at all on the developer, instead offering a small expressive Java API (as expressive as Java gets anyway), that allows you to quickly map blocks of Java code to RESTful HTTP routes. Overall - definitely worth checking out.</p>]]></description>
   </item>
   <item>
      <title>Review: &quot;Apache Wicket Cookbook&quot;</title>
      <link>http://www.realjenius.com/2011/05/07/review-apache-wicket-cookbook</link>
      <author>R.J. Lorimer</author>
      <pubDate>May 07, 2011</pubDate>
      <guid>http://www.realjenius.com/2011/05/07/review-apache-wicket-cookbook</guid>
      <description><![CDATA[

	<img src="/public/images/articles//misc/Apache-Wicket.png" class="article_image right"/>

<p>Earlier this week, the folks over at <a href="http://www.packtpub.com/">Packt Publishing</a> were kind enough to forward me a copy of the <strong><a href="http://link.packtpub.com/D58AdD">Apache Wicket Cookbook</a></strong> for review, and given my previous positive experiences with <a href="http://wicket.apache.org/">Wicket</a>, I was excited to dive in, and see what the book has to offer.</p>

<h2>Overview</h2>

<p><a href="http://link.packtpub.com/D58AdD">Apache Wicket Cookbook</a> is a dense collection of recipes for working with the various Wicket features, and is written by Igor Vaynberg, one of the main Wicket committers, who is also a contributing author to <a href="http://wicketinaction.com/">Wicket in Action</a>. While "Wicket in Action" is targeted at understanding the "what and why" of Wicket, this new cookbook is all about the "how". To re-summarize from the preface of the book itself:</p>

<blockquote>A straightforward Cookbook with over 70 highly focused practical recipes to make your web application development easier with the Wicket web framework.</blockquote>

<p>The book is composed in a repeating format, showing different solutions to problems using the same rough outline for each recipe:</p>
<ul>
<li>Overview - To start each recipe, a short set of introductory paragraphs explains the problem at hand, and why it might be important to have a solution.</li>
<li>Getting Ready - This section sets up the example for the recipe, providing the necessary code snippets to try the recipe out for yourself.</li>
<li>How to do it - After prepping the example, this section jumps right in with highlighted code changes, showing how to solve this particular problem with Wicket.</li>
<li>How it works - Now that the reader has seen what to do to fix the problem, this section explains how Wicket uses the code provided to handle the particular problem, and provides more detail for people who like to understand what's going on under the covers.</li>
<li>There's more - Because these examples are so focused, they don't touch on all facets of a particular API. This section provides highlights of other features to explore on your own.</li>
<li>See also - A lot of the recipes are inter-related, solving similar, but different problems. This section points you to other recipes in the book that complement the one you just read.</li>
</ul>

<h2>Review</h2>

<p>Overall, I've been quite impressed with the "Apache Wicket Cookbook". Having been quite familiar with Wicket over the years from writing a number of tutorial articles about it, and applications with it, I found the previously released "Wicket in Action" to be a bit of a redundant read; most of the code and APIs were familiar enough to me in concept, that I didn't feel that book was the right target for me. This book, on the other hand, is very no-nonsense, and hits on a lot of detailed problems that even a seasoned "Wicketeer" would probably have to really dig to solve in a novel and clean way.</p>

<p>This book touches on many of the "hard" problems, including:</p>
<ul>
<li>The nuances of safe, attractive, and error-free form handling - the web is all about this, of course, but it seems like most people don't do it right. Wicket has the tools, but you've got to know how to apply them - and there are enough recipes here to train your brain!</li>
<li>Internationalization - Who hasn't struggled with this in one fashion or another?</li>
<li>Data Tables - Paging, Sorting, Filtering tables are another "hard to implement" cornerstone of the web, and an area where Wicket really shines.</li>
<li>Security - Wicket's security model is intense; that's both good and bad, and this section really hits on some sticky (get it? Sticky wicket?) topics, like cross-site request forgery protection.</li>
</ul>

<p>Other topics touched on by the book include AJAX and Rich UIs (another Wicket "wow" feature), middleware integration (e.g. Spring), and, interestingly enough, charting and graphing.</p>

<p>Being very familiar with Wicket, I didn't feel the need to try most of these examples for myself, and so despite the book being over 300 pages in ebook form, I found myself finishing my read-through in short order. I wouldn't say the book is short, but for me, it was over quickly.</p>

<p>Some of the sections, like the aforementioned chapter on charting, were new to me, and gave me a better feel of how this book would be received by someone less familiar. Others covered problems I have previously solved myself with Wicket, and I was happy to see solutions that I felt were both cleaner and more complete than the solutions I had "cooked up" myself.</p>

<p>Most of the examples are based in reality; not having the overly contrived feel of an author searching for a problem to solve; and many of the chapters focus on a particular example, and build upon it in several stages - providing a good picture on how to take a bare implementation and layer on features until it is "complete".</p>

<p>If you are already familiar with Wicket, but would like to better understand how to use it effectively to build robust, feature-rich, and pretty applications, then this book is for you.</p>

<p>On the other hand, if you've never used Wicket before, you will be lost the moment you hit the first chapter. It just so happens, that <a href="http://wicketinaction.com/book/">Wicket in Action</a> is already available for novices, as are a <a href="http://www.google.com/search?sourceid=chrome&ie=UTF-8&q=wicket+tutorial">number of online resources</a> (some by yours truly; albeit a little out of date).</p>

<p>For those of you that would like to try this new book out, I have been given access to a free chapter that you can download in PDF format: <strong><a href="http://www.packtpub.com/sites/default/files/1605OS-Chapter-5-Displaying-Data-Using-DataTable.pdf?utm_source=packtpub&utm_medium=free&utm_campaign=pdf">Chapter 5: Displaying Data Using DataTable</a></strong>.</p>]]></description>
   </item>
   <item>
      <title>The Folly of Try-With-Resources in Java 7</title>
      <link>http://www.realjenius.com/2011/01/23/the-folly-of-try-with-resources-in-java-7</link>
      <author>R.J. Lorimer</author>
      <pubDate>January 23, 2011</pubDate>
      <guid>http://www.realjenius.com/2011/01/23/the-folly-of-try-with-resources-in-java-7</guid>
      <description><![CDATA[<p>Java 7, which is slated for release later this year, is supposed to add a feature as part of the Project Coin initiative (http://openjdk.java.net/projects/coin/) called "try-with-resources". The idea is fairly simple, and is intended to tidy up a common development problem in Java: the safe life-cycle management of volatile resources.</p>
<p>In concept I have no problem with this sort of support. There are innumerable examples on the web of absurd Java try/catch/finally block hierarchies built to properly manage input streams and output streams; something that is the norm, not the exception to the rule.</p>

<pre class="brush: java">
InputStream in = null;
try {
  in = loadInput(...);
  OutputStream out = null;
  try {
    out = createOutput(...);
    copy(in, out);
  }
  finally {
    if(out != null) {
      try {
        out.close();
      }
      catch(IOException ioe) {
        // Problem closing the output stream.
      }
    }
  }
}
catch(IOException e) {
  // Problem reading or writing with streams.
  // Or problem opening one of them.
}
finally {
  if(in != null) {
    try {
      in.close();
    }
    catch(IOException ioe) {
      // Problem closing the input stream.
    }
  }
}

</pre>
<p>I'm not even showing here how you handle the situation where reading/writing caused an error, <strong>and</strong> closing one or both of the streams caused an error. You want a way to cleanly collect all of these up into one exception bundle that can be properly logged/handled, but short of rolling your own wrapper exception to potentially hold all conditions, there simply is no way.</p>

<p>This abhorrent pile of braces can be cleanly tightened up with the new facilities in Java 7 like so:</p>

<pre class="brush: java">
try (InputStream in = loadInput(...); 
     OutputStream out = createOutput(...) ){
  copy(in, out);
}
catch (Exception e) {
  // Problem reading and writing streams.
  // Or problem opening one of them.
  // If closing a stream also fails, that failure is recorded on this exception
  // as a suppressed exception (see Throwable.getSuppressed()).
}

</pre>
<p>Java knows how to close the items declared in the try parentheses because they implement the "AutoCloseable" interface. This is much like the "Iterable" interface that was added to support the enhanced-for loops in Java 5.</p>
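<p>To see the closing order and the suppressed-exception bookkeeping in action, here is a small self-contained sketch. The <code>SuppressedDemo</code> and <code>Resource</code> names are invented for this illustration, and the resource is rigged so that closing it always fails. Resources are closed in the reverse of the order in which they were declared, and each close failure is attached to the exception thrown by the body.</p>

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: close order and suppressed exceptions with try-with-resources. */
final class SuppressedDemo {

    /** Hypothetical resource whose close() always fails, to force the suppressed case. */
    static final class Resource implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Resource(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void close() {
            log.add("close " + name);
            throw new IllegalStateException("close failed: " + name);
        }
    }

    /** Returns the sequence of events recorded while the block runs and unwinds. */
    static List<String> run() {
        List<String> log = new ArrayList<>();
        try (Resource a = new Resource("a", log);
             Resource b = new Resource("b", log)) {
            log.add("body");
            throw new IllegalArgumentException("body failed");
        } catch (IllegalArgumentException primary) {
            // The body's exception wins; the close failures ride along on it.
            for (Throwable suppressed : primary.getSuppressed()) {
                log.add("suppressed: " + suppressed.getMessage());
            }
        }
        return log;
    }

    public static void main(String[] args) {
        for (String event : run()) {
            System.out.println(event);
        }
    }
}
```

<p>The recorded events show <code>b</code> being closed before <code>a</code>, and both close failures surfacing through <code>getSuppressed()</code> rather than replacing the original exception.</p>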
<p>Clearly the try-with-resources version is much cleaner, and automatic resource management is a facility that Java has needed (for a <em>lonnnnnngg</em> time) - so why am I miffed?</p>
<p>This feature, which is consuming a non-trivial amount of draft specification time and development time to get right, is being implemented as a language feature via the use of compiler changes - again, much in the same way as enhanced-for was implemented. For the record, I was miffed about enhanced-for as well (although I do use it regularly).</p>
<p>Adding this as a language feature, rather than a library feature, bloats the compiler specification and implementation, and completely hides the real implementation in generated code. It's functional to a point, but is not extensible by developers. Sadly, none of this would be necessary at all if Java 7 shipped with lambda/closure support, and I think the lambda-based syntax is more natural. Here is a naive Ruby file copy/transform algorithm:</p>

<pre class="brush: ruby">
begin
  File.open(&quot;myInput.txt&quot;, &quot;r&quot;) do |input|
    File.open(&quot;myOutput.txt&quot;, &quot;w&quot;) do |output|
      input.each_line do |line|
        # each_line keeps the trailing newline, so none is appended here.
        output &lt;&lt; line.upcase
      end
    end
  end
rescue =&gt; exc
  # Exception occurred reading/writing!
end

</pre>
<p>Now, admittedly the syntax is very different from the Java version - the most notable difference is that the Ruby version is actually three nested lambdas, whereas the Java version is sort-of two lambdas (one to open the resources, one to work with them). I say sort-of, because the first is this special quasi-code-block that also magically accumulates opened resources for later closure.</p>
<p>I find the lambda version to be much less hand-wavy - the syntax is naturally intuitive and more flexible. I should also note that the inner-most lambda, which iterates over lines in the file, shows that the enhanced for loop in Java is completely unnecessary as well if you have lambdas.</p>
<p>Using one of the proposed syntaxes for Java lambda support (as ugly as they can be), let's take a shot at this same idea with Java:</p>

<pre class="brush: java">
try {
  File.read(&quot;myInput.txt&quot;, { in -&gt;
    File.write(&quot;myOutput.txt&quot;, { out -&gt;
      in.eachLine( { line -&gt;
        out.write(line.toUpperCase() + &quot;\n&quot;);
      });
    });
  });
}
catch(IOException e) {
  // handle as appropriate. Can still have suppressed exceptions on it.
}

</pre>
<p>This is just one way this could be implemented; because it only relies on functions, how expressive the APIs are is up to the library designer, <em>not</em> the language designer. An important distinction. Why build special scenarios into the compiler when one language feature can provide the flexibility you need?</p>

<p>Now - is this as short and tab-friendly as the Java 7 feature? Admittedly no. However, the goal of the try-with-resources is not to save on typing, but rather, to ensure resource management is as accurate as possible, relying as little as possible on the developer to dial-in boilerplate, and error-prone exception management.</p>

<p>Here is a short run-down of the library changes that would be required to support this particular code:</p>

<ul>
<li>java.io.File needs a new method called "read" which takes a String file name, and also takes a function that accepts a java.io.FileReader, and returns nothing. The code block may throw an IOException. This method does all of the resource management for the FileReader it provides to the function. If the code block throws an exception and closing the file reader also throws an exception, the method would add the close exception to the main exception as a suppressedException (just like the language feature would).</li>
<li>java.io.File also needs a new method called "write" which takes a String file name, and also takes a function that accepts a java.io.FileWriter, and again, returns nothing. This can also throw an IOException, and just like the read method, it handles all of the minutiae of exception suppression and resource-closing.</li>
<li>java.io.FileReader needs a new method called "eachLine" that simply scans for line-terminators and lets a provided function see the string representation of each line. The function may throw IOException, and the method simply lets IOException propagate.</li>
</ul>
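<p>For the curious, here is a rough sketch of how library methods along these lines could be written once the language has lambdas (the syntax below is the one that eventually shipped in Java 8, not the proposal shown above). Everything here is an assumption made for illustration: <code>LambdaResources</code>, <code>IOConsumer</code>, <code>with</code>, and <code>eachLine</code> are hypothetical names rather than JDK APIs, and in-memory readers and writers stand in for files so the sketch can run anywhere.</p>

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Locale;

/** Sketch: resource management as a library feature built on lambdas. */
final class LambdaResources {

    /** Hypothetical callback type: a consumer that is allowed to throw IOException. */
    @FunctionalInterface
    interface IOConsumer<T> {
        void accept(T value) throws IOException;
    }

    /** Runs the body, always closes the resource, and records a close failure as suppressed. */
    static <R extends Closeable> void with(R resource, IOConsumer<R> body) throws IOException {
        Throwable primary = null;
        try {
            body.accept(resource);
        } catch (Throwable t) {
            primary = t;
            throw t;
        } finally {
            if (primary == null) {
                resource.close();
            } else {
                try {
                    resource.close();
                } catch (Throwable closeFailure) {
                    primary.addSuppressed(closeFailure);
                }
            }
        }
    }

    /** Hands each line (without its terminator) to the body; IOException simply propagates. */
    static void eachLine(BufferedReader in, IOConsumer<String> body) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            body.accept(line);
        }
    }

    /** The copy/transform example, with in-memory streams standing in for files. */
    static String upcase(String text) {
        StringWriter sink = new StringWriter();
        try {
            with(new BufferedReader(new StringReader(text)), in ->
                with(sink, out ->
                    eachLine(in, line -> out.write(line.toUpperCase(Locale.ROOT) + "\n"))));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toString();
    }

    public static void main(String[] args) {
        System.out.print(upcase("hello\nworld"));
    }
}
```

<p>The point of the sketch is that <code>with</code> is ordinary library code: anyone could write a variant with different clean-up rules without waiting on a compiler change.</p>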

<p>Just to circle back around - Java today could support the "eachLine" method on FileReader via enhanced-for loop, if the "eachLine" method returned an Iterable&lt;String&gt; (as opposed to accepting a block of code) - it could look something like this:</p>

<pre class="brush: java">
for(String line : in.eachLine()) {
  out.write(line.toUpperCase() + &quot;\n&quot;);
}

</pre>
<p>Sadly, this feature will be here with Java 7, and lambdas will not. As such, it's yet another way to buy time without lambdas, and look a little better next to the C# features list.</p>]]></description>
   </item>
   <item>
      <title>Distilling Mirah: Type Inference</title>
      <link>http://www.realjenius.com/2010/10/05/distilling-mirah-1</link>
      <author>R.J. Lorimer</author>
      <pubDate>October 05, 2010</pubDate>
      <guid>http://www.realjenius.com/2010/10/05/distilling-mirah-1</guid>
      <description><![CDATA[
<p>Recently, I've been watching the work of a handful of developers on a new programming language: <a href="http://www.mirah.org">Mirah</a>. As a fan of the <a href="http://www.ruby-lang.org">Ruby programming language</a> and a slave to the Java overlords, Mirah offers instant appeal to me, as it borrows the core Ruby syntax almost verbatim, but creates true Java class files runnable on any Java VM. This decision makes perfect sense, considering the originator of Mirah is <a href="http://blog.headius.com">Charlie Nutter</a>, one of the key developers of JRuby, a language heavily invested in both Ruby and Java. (Mirah actually reuses the JRuby parser at the time of this writing, if that gives you any indicator how similar the syntax is).</p>

<p>Because of my interest in the development of Mirah, I've decided to begin spelunking into the implementation as it stands today, sharing with you what is going on internally. Many of you are probably familiar with my "<a href="http://www.realjenius.com/category/distilling-jruby/">Distilling JRuby</a>" series, and while these articles will likely read similarly, I suspect they will be more brief and hand-wavy. This is partially out of a desire to cover more topics over a short period of time, but also because the implementation for Mirah is very fluid, and is likely to change, rendering these articles invalid or at least out-dated.</p>

<p>Without further ado - let's kick this pig. On to Mirah's type-inferencing!</p>

<div class="seriesNote">
<p>This is <strong>Part 1</strong> in a series on <a href="http://www.mirah.org">Mirah</a> I have started, called "Distilling Mirah" - the <a href="http://www.realjenius.com/category/distilling-mirah/">entire series is available here</a>.</p>
<p><em>As with all my JRuby articles, I will be showing a number of partial snippets - these are meant to imply the idea without covering all of the code that may be rather spooky when considering all of the corner cases. I've tried to reduce the code to its bare important essentials. Oh, and as with the JRuby articles, I reserve the right to temporarily lie for simplicity's sake.</em></p>
</div>

<h2>Mirah Overview</h2>
There are a few key concepts that need to be discussed regarding Mirah before we get started:
<ul>
	<li>Mirah is not Ruby! Mirah looks like Ruby at first glance, but that is only superficial in nature. We will see why over the next series of topics.</li>
	<li>Unlike JRuby, Mirah is not implemented in Java (well, mostly not). It is actually implemented in Ruby - this is going to make the way we traverse the code in these articles very different than the JRuby series.</li>
	<li>While I say that Mirah borrows the Ruby syntax, it has to modify and add certain structures to fit the mold which has been carved for it. So while it is possible to write some programs that are almost true Ruby syntax, most Mirah programs will have a few extra sprinkles.</li>
	<li>Mirah is statically typed, and compiles to standard Java bytecode. This is one of the key reasons that Mirah is not 100% Ruby-grammar compatible.</li>
	<li>Mirah is designed from the ground up to be a language specification that can be implemented on several platforms (.NET is a perfect example). This introduces indirection in the code that may, at first, seem confusing.</li>
	<li>One of the key principles of Mirah is to avoid runtime encumbrances if at all possible. What this means is that all features in Mirah as it currently stands are implemented either by a compiler plug-in, or by using existing core libraries of the underlying platform (or a combination, of course). The goal is to avoid the 3-5 MB ball-and-chain that many languages (e.g. Scala, Clojure, JRuby) hang around your neck to run deployed code. The idea is that, if you want runtime features, you can bring Mirah modules in per your own decision, but if you want to fit a Mirah program on a micro-controller that can run Java bytecode (or Dalvik <em>cough</em>), you should be able to by forgoing some of those features that require large runtime libraries.</li>
</ul>
<p>The Mirah site can be found at <a href="http://www.mirah.org">http://www.mirah.org</a>, and the official Mirah 'master' repo is available on GitHub: <a href="http://github.com/mirah/mirah">http://github.com/mirah/mirah</a>. Feel free to check it out and follow along, although one last disclaimer - the code is changing <em>quickly</em>, so my articles are bound to fall behind the times.</p>

<p>I'd suggest before proceeding you familiarize yourself with the language syntax - I don't plan to stop along the way.</p>

<h2>A Background on Type Inference</h2>

<p>Most JVM language replacements that are garnering attention right now avoid explicit typing in the syntax to some degree. Languages that are compiled to normal bytecode with some degree of implicit typing must involve some form of type inference. This is the process of statically analyzing code to determine the runtime types the code is using by <em>inferring</em> them from the use of variables and parameters. Statically compiled languages on the VM must do this, because Java bytecode (and the VM) expects to work with types - and if the compiler can't figure out the types, it can't compile the bytecode.</p>

<p>Consider this Java statement:</p>

<pre class="brush: java">
HashMap&lt;String,String&gt; myMap = new HashMap&lt;String,String&gt;();

</pre>
<p>There is really no reason you need to define the left-hand side (the declaration) so explicitly, considering that the right-hand side (the assignment) has already told you exactly what the variable is. Surely this should be sufficient:</p>

<pre class="brush: java">
var myMap = new HashMap&lt;String,String&gt;();

</pre>
<p>Anyone familiar with C# will likely recognize this syntax convenience. Of course, this is a simple example, because you only have to examine this one line to infer the variable type. Things get much more complex when there are control structures, methods, and other language features in the way.</p>

<p>That being said, type inferencing is a well-trodden path - it's certainly not unique to JVM languages; far from it. There are different levels of type inference, with the most complete often using something <a href="http://en.wikipedia.org/wiki/Type_inference">like Hindley-Milner</a> to deduce types recursively (excellent description <a href="http://www.codecommit.com/blog/scala/what-is-hindley-milner-and-why-is-it-cool">of Hindley-Milner by Daniel Spiewak on his blog</a>).</p>

<h2>Mirah's Type Inferencing</h2>
<p>As it stands today, Mirah implements a variant of type inference somewhere between true "local" type inference and fully recursive type inference like Hindley-Milner. Mirah's inference is a multi-pass process: the first pass does simple local (line-by-line) inference, and subsequent passes look for new type resolutions that have become possible because of what earlier passes resolved. For example, consider these two Mirah methods:</p>

<pre class="brush: ruby">
def method_a()
  return method_b(5) + method_b(6)
end

def method_b(x:int)
  return x * -1
end

</pre>
<p>In this case, 'method_a' is obviously dependent upon 'method_b' - but if 'method_a' is inferred first, it will have no way to know what its return type is, because 'method_b' hasn't been inferred yet. In this case, 'method_a' is 'deferred' for a later inference pass. Shortly thereafter, 'method_b' will be processed, and since it can be completely analyzed through local inference, it will resolve to return an int. At that point, 'method_a' can look at the two invocations that are involved in the return statement, and can in turn determine that it should also return an int.</p>
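<p>The deferral described here can be sketched as a tiny fixed-point loop. This is only an illustration and not Mirah's code: the 'METHODS' table and the 'infer_all' helper are hypothetical, each method is reduced to either a locally known return type or a list of the methods it calls, and the result of '+' is simply assumed to have the type of its operands.</p>

<pre class="brush: ruby">
# Hypothetical model: a method either has a locally known return type,
# or its return type depends on the methods it calls.
METHODS = {
  method_a: { calls: [:method_b, :method_b] },  # method_b(5) + method_b(6)
  method_b: { local_type: :int }                # x * -1 with x:int
}

def infer_all(methods)
  resolved = {}
  pending = methods.keys
  passes = 0
  until pending.empty?
    passes += 1
    still_pending = pending.reject do |name|
      info = methods[name]
      type = info[:local_type]
      if type.nil?
        # Resolvable only once every callee has a known return type.
        callee_types = info[:calls].map { |callee| resolved[callee] }
        type = callee_types.first unless callee_types.include?(nil)
      end
      resolved[name] = type unless type.nil?
      !type.nil?
    end
    # A full pass that resolves nothing means inference is stuck.
    raise "cannot infer: #{still_pending.inspect}" if still_pending == pending
    pending = still_pending
  end
  [resolved, passes]
end

types, passes = infer_all(METHODS)
puts types.inspect
puts passes

</pre>
<p>With the methods visited in this order, 'method_a' is skipped on the first pass and picked up on the second, once 'method_b' is known to return an int.</p>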

<h2>The Algorithm</h2>
<p>From an implementation standpoint, Mirah does this inference by utilizing the ASTs generated from the source. Each AST knows individually how to infer itself based on its recursive contents - this is something we'll investigate in more detail shortly.</p>

<p>Mirah defines a namespace and class called Typer that is used to orchestrate this entire process. The Typer is asked to analyze each AST tree parsed by Mirah individually, and then to iteratively resolve:</p>

<pre class="brush: ruby">
typer = Typer.new
asts.each { |ast| typer.infer(ast) }
typer.resolve

</pre>
<p>The infer method for an individual AST node is pretty straightforward:</p>

<pre class="brush: ruby">
class Typer
  def infer(node)
    node.infer(self)
    # error handling business
  end
end

</pre>
<p>Notice that the typer passes itself into the node - this allows the nodes to call back into the typer for a variety of reasons. For example, each node has to decide for itself whether or not it has enough information to infer. If it doesn't, it will tell the typer that it needs to be 'deferred'. All this effectively does is record the node for later:</p>

<pre class="brush: ruby">
class Typer
  def defer(node)
    @deferred_nodes &lt;&lt; node
  end
end

</pre>
<p>So the typer calls infer on the top level AST node, at which point the AST hierarchy will recurse, inferring and deferring nodes as appropriate. After the first recursive inference pass, the typer is then asked to resolve AST nodes iteratively until all nodes are inferred, or until no progress is made:</p>

<pre class="brush: ruby">
class Typer
  def resolve
    until @deferred_nodes.empty?
      retrying = @deferred_nodes.uniq
      @deferred_nodes = []
      # nodes that still can't be inferred will defer themselves again
      retrying.each { |node| infer(node) }
      @deferred_nodes.uniq!

      if @deferred_nodes.length == retrying.length
        raise # can't infer error! (a full pass made no progress)
      end
    end
  end
end

</pre>
<h2>AST Working Together</h2>

<p>Understanding the concept of the AST recursively inferring is the key component to understanding the typer. Consider, for example, the statement 'x = method_b(5)' - this is represented by a tree of AST nodes. For those of you with experience in parsers, or experience with my previous JRuby articles, it probably won't be too hard to derive the types of nodes involved - it's basically this:</p>

<pre class="brush: text">
LocalDeclaration
|
.-- LocalAssignment (type_node)
    |
    .-- FunctionalCall (value)
        |
        .-- Fixnum (parameters)
            |
            .-- &quot;5&quot; (literal)

</pre>
<p>The idea is that the declaration will ask the assignment, which will in turn ask the call being made with the parameter types in play, and will then return the type of the call return type. Here is a sketch of the various infer methods for these nodes:</p>

<pre class="brush: ruby">
class LocalDeclaration
  def infer(typer)
    type = @type_node.infer(typer)  #type_node is the local assignment
    if(!type)
      typer.defer(self)
    end
    return type
  end
end

class LocalAssignment
  def infer(typer)
    type = @value.infer(typer) #value is the &quot;functional&quot; call.
    if(!type)
      typer.defer(self)
    end
    return type
  end
end

class FunctionalCall
  def infer(typer)
    @parameters.each { |param| param.infer(typer) }
    if #all parameters inferred, and method with params and scope is known
      return typer.method_type(@method_name, method_scope, @parameters)
    else
       typer.defer(self)
       return nil
    end
  end
end

class Fixnum
  def infer(typer)
    return typer.fixnum_type(@literal) #literal is '5'
  end
end

</pre>
A few things to note here:
<ul>
<li>This is totally pseudo code - the actual code has all kinds of branches for caching and other good bits.</li>
<li>The one literal we have, Fixnum, calls back into the typer to get the actual fixnum type - we'll see this come in to play momentarily.</li>
<li>The typer has the ability to look up a method type by signature. When methods are scanned during type inference, they record themselves in the typer so that other nodes, like this one, can use them when inferring. This is one case of node "hopping", where one AST can be linked to another by reference.</li>
<li>We're dodging how the functional call determines things like 'method scope' for now.</li>
</ul>
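<p>To see the recursion and the deferral work together, here is a small self-contained sketch that can be run in plain Ruby. All of the class names ('ToyTyper', 'FixnumNode', 'CallNode', 'AssignmentNode') and the 'learn_method' helper are made up for this example, and the method lookup ignores scope and parameter types entirely, which the real typer does not.</p>

<pre class="brush: ruby">
# Toy typer (hypothetical API): remembers method return types and deferred nodes.
class ToyTyper
  attr_reader :deferred

  def initialize
    @method_types = {}
    @deferred = []
  end

  def learn_method(name, return_type)
    @method_types[name] = return_type
  end

  def method_type(name)
    @method_types[name]
  end

  def fixnum_type(_literal)
    :int
  end

  def defer(node)
    @deferred.push(node) unless @deferred.include?(node)
  end
end

class FixnumNode
  def initialize(literal)
    @literal = literal
  end

  def infer(typer)
    typer.fixnum_type(@literal)
  end
end

class CallNode
  def initialize(method_name, parameters)
    @method_name = method_name
    @parameters = parameters
  end

  def infer(typer)
    param_types = @parameters.map { |param| param.infer(typer) }
    type = typer.method_type(@method_name) if param_types.none? { |t| t.nil? }
    typer.defer(self) if type.nil?
    type
  end
end

class AssignmentNode
  def initialize(value)
    @value = value
  end

  def infer(typer)
    type = @value.infer(typer)
    typer.defer(self) if type.nil?
    type
  end
end

typer = ToyTyper.new
assignment = AssignmentNode.new(CallNode.new(:method_b, [FixnumNode.new(5)]))

first_try = assignment.infer(typer)    # nil: method_b is not known yet
typer.learn_method(:method_b, :int)    # method_b gets inferred elsewhere
second_try = assignment.infer(typer)   # now resolves through the call node

</pre>
<p>The first attempt leaves both the call and the assignment on the deferred list; once the return type of 'method_b' is recorded, the same tree infers cleanly.</p>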

<h2>Resolving Literals</h2>

<p>As noted above, the Fixnum node is asking the typer to give it back a fixnum type. This is done for all of the literal types. It's done this way so that the platform implementation (in this particular case, Java) can plug in a particular type. So in this particular case, the Java implementation, in the JVM::Types module, provides a FixnumLiteral that looks at the provided value, and determines where in the hierarchy it belongs (for you Java folks, that's byte, short, int, long, etc). When asked to actually compile, these AST nodes actually know how to generate the ultra-fast JVM bytecode-ops for primitives.</p>
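<p>As a rough illustration of "determining where in the hierarchy it belongs", here is one plausible way to pick the narrowest Java integer type for a literal. The 'JAVA_INTEGER_TYPES' table and 'narrowest_integer_type' method are my own names, and whether Mirah's FixnumLiteral chooses exactly this way is an assumption; the ranges themselves are the standard Java ones.</p>

<pre class="brush: ruby">
# Standard Java integer ranges, ordered from narrowest to widest.
JAVA_INTEGER_TYPES = {
  byte:  (-2**7..2**7 - 1),
  short: (-2**15..2**15 - 1),
  int:   (-2**31..2**31 - 1),
  long:  (-2**63..2**63 - 1)
}

# Returns the first (narrowest) type whose range contains the value.
def narrowest_integer_type(value)
  match = JAVA_INTEGER_TYPES.find { |_name, range| range.cover?(value) }
  raise ArgumentError, "#{value} does not fit in a Java long" if match.nil?
  match.first
end

puts narrowest_integer_type(5)
puts narrowest_integer_type(70_000)

</pre>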

<h2>Type Annotations</h2>

<p>As seen in one of the snippets above, Mirah supports type definitions for places where typing is either required (due to a lack of inference) or desired (widening a type, for example). Forgoing the fact this is a contrived implementation for a moment, consider this method:</p>

<pre class="brush: ruby">
import java.util.Map
import java.util.HashMap
class SomeClass 
  def singleton_map(a:string, b:string):Map
    map = HashMap.new
    map.put(a,b)
    return map  
  end
end

</pre>
<p>Here we are declaring both parameter types so we can control inputs, and then we are declaring the return type. The reason you might want to declare a return type like this is so that the compiled method doesn't expose too narrow an implementation. Remember, we're compiling to Java class files here - so if the compiler inferred that the method returned a HashMap, that is a constraint we may never be able to change in the future. By changing it to 'Map', we can adjust the API like we would in the Java world to avoid tying ourselves to an implementation. To see this in action, here's the output from mirahc when asked to generate Java code for this with and without the return type:</p>

<pre class="brush: java">
// With:
public class SomeClass extends java.lang.Object {
  public java.util.Map singleton_map(java.lang.String a, java.lang.String b) {
    java.util.HashMap map = new java.util.HashMap();
    map.put(a, b);
    return map;
  }
}

// Without:
public class SomeClass extends java.lang.Object {
  public java.util.HashMap singleton_map(java.lang.String a, java.lang.String b) {
    java.util.HashMap map = new java.util.HashMap();
    map.put(a, b);
    return map;
  }
}

</pre>
<p>Individual AST nodes know about these definitions (sometimes known as forced types), and will respect those over the corresponding inferred types. That's not to say that it will just take them for granted; the type inference still occurs. In the example above, the method body is still inferred to ensure it returns a type that can be widened to 'java.util.Map' - otherwise the code will cause runtime errors in the VM. Here's a snippet of the method definition AST analysis:</p>

<pre class="brush: ruby">
class MethodDefinition
  def infer(typer)
    forced_type = @return_type
    inferred_type = @body.infer(typer)
    actual_type = if forced_type.nil?
      inferred_type
    else
      forced_type
    end

    if !actual_type.is_parent(inferred_type)
      raise &quot;inference error&quot;
    end
    return actual_type
  end
end

</pre>
<p>The return_type field will be set by the parser if provided, and takes precedence so long as it can still be used in place of the actual inferred type of the method body.</p>
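<p>The precedence rule can be tried out in isolation with a small sketch. The 'WIDENS_TO' table below is a hand-written stand-in for the real Java type hierarchy, and 'assignable?' and 'method_return_type' are hypothetical helpers rather than anything from the Mirah codebase.</p>

<pre class="brush: ruby">
# Hypothetical widening table: each type lists the types it can be widened to.
WIDENS_TO = {
  "java.util.HashMap" => ["java.util.Map", "java.lang.Object"],
  "java.util.Map"     => ["java.lang.Object"],
  "java.lang.String"  => ["java.lang.Object"]
}

def assignable?(target, source)
  target == source || WIDENS_TO.fetch(source, []).include?(target)
end

# The declared (forced) type wins, but only if the inferred type fits into it.
def method_return_type(forced_type, inferred_type)
  return inferred_type if forced_type.nil?
  unless assignable?(forced_type, inferred_type)
    raise TypeError, "cannot widen #{inferred_type} to #{forced_type}"
  end
  forced_type
end

puts method_return_type("java.util.Map", "java.util.HashMap")
puts method_return_type(nil, "java.util.HashMap")

</pre>
<p>Declaring 'java.lang.String' as the return type of the same body would raise here, which mirrors the inference error in the snippet above.</p>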

<h2>Uncovered Topics</h2>

<p>So this was a quick spin through Mirah-land, but even for the inference engine, a lot was left on the table if you'd like to explore from here:</p>

<ul>
<li>Finding "native" types (in this case, calls into and returning Java types)</li>
<li>Tracking class/method scope when inferring</li>
<li>Inferring against intrinsics (such as '+', and '||')</li>
<li>Dealing with multi-node inference - several nodes, like 'MethodDefinition' are expected to infer several parts, including arguments, return type, throws types, etc. This increases the complexity of the implementation, but doesn't have much impact on concept.</li> 
<li>Superclasses, invocation of 'super', overriding, overloading, etc.</li>
<li>Framework vs. Implementation (i.e. JVM) Responsibilities</li>
</ul>

<p>Stay tuned as the Mirah story unfolds!</p>]]></description>
   </item>
   <item>
      <title>Distilling JRuby: Frames and Backtraces</title>
      <link>http://www.realjenius.com/2010/03/15/distilling-jruby-frames-and-backtraces</link>
      <author>R.J. Lorimer</author>
      <pubDate>March 15, 2010</pubDate>
      <guid>http://www.realjenius.com/2010/03/15/distilling-jruby-frames-and-backtraces</guid>
      <description><![CDATA[

	<img src="/public/images/articles/jruby/logo.png" class="article_image right"/>
<p>Welcome back JRuby fans. I took a poll on twitter about what distilling article to do next, and frames and backtraces was the clear winner - so here we are! (three months later).</p>
<p>In previous "distilling" articles, I discussed how methods are dispatched, and then how the scope of variables in each method and block is managed. The scope and dispatch rules are only part of the big picture, however. Ruby, as a programming language, must gather rich information about the execution of the program, and must be able to share this with the developer when errors occur. Furthermore, Ruby itself provides a number of kernel-level methods for accessing and manipulating the current invocation stack (such as Kernel.caller).</p>
<p>This article is all about how JRuby implements those concepts.</p>
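<p>If you haven't used it before, 'caller' simply hands the current stack back as an array of strings, nearest caller first. A minimal example (the method names are arbitrary, and the exact text of each entry varies a little between Ruby versions):</p>

<pre class="brush: ruby">
def innermost
  caller  # one "file:line:in ..." string per calling frame
end

def outer
  innermost
end

trace = outer
puts trace

</pre>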

<div class="seriesNote">
This article is <strong>Part 4</strong> in a 4-part series.
<ul>
                                <li>Part 1: <a href="/2009/09/16/distilling-jruby-method-dispatching-101">Distilling JRuby: Method Dispatching 101</a></li>
                                        <li>Part 2: <a href="/2009/09/25/distilling-jruby-tracking-scope">Distilling JRuby: Tracking Scope</a></li>
                                        <li>Part 3: <a href="/2009/10/06/distilling-jruby-the-jit-compiler">Distilling JRuby: The JIT Compiler</a></li>
                                <li>Part 4: This Article</li>
                    </ul>
</div>
<h2>Overview</h2>

<p>A frame in JRuby parlance is a representation of a method call, block call, eval, etc. kept for presentation to the developer. A backtrace is a representation of the active method stack at any point in time - in other words, it's a stack of frames. In Java, this would typically be referred to as a 'stack trace' - at least, that's the most direct counterpart.</p>

<p>It can be difficult when juggling a language implementation around in your head to realize that the trace we're talking about is specific to the method calls in Ruby itself. JRuby may execute a number of "native" methods (code written in Java) that do not show up as part of this backtrace - the code that must run in between steps of the Ruby code executing is implementation-specific to JRuby; the Ruby developer shouldn't care what internal magic JRuby had to do to get a method to invoke (nor would they know what to do with that knowledge if they did have it).</p>

<p>While it may not seem incredibly important initially, JRuby goes to great pains to be as compatible as possible with MRI in terms of what backtraces are generated (this 'compatibility mode' incurs a certain cost, and it may be preferable to turn it off to give JRuby an opportunity to bypass this internal bookkeeping if, as a developer, you don't need a backtrace to match MRI; but we'll get into those experimental optimizations later). Backtrace information turns out to be quite important, as it is the first set of information a developer typically uses to trace execution issues in their own code; if it isn't accurate (or at least traversable) it could easily make a small problem a big one.</p>

<h2>Tracking Frames</h2>

<p><em>I would like to mention at this point that it would be in your best interests to read the earlier Distilling JRuby articles, if you haven't already. Method dispatching, scope, and the JIT compiler are all intertwined with the concept of frames, and I will be talking about these various relationships throughout this article.</em></p>

<p>You may recall that in the article on tracking variable scope, I mentioned that the ThreadContext is consulted on a number of occasions to find data in the variable table. At the time, I was talking about how variables are managed; but that same context class acts as the main source for tracking the frames of method invocation. We saw previously that when a Ruby method is dispatched, a variant of a method named "preMethod{...}()" would be called on the ThreadContext class, and that would in turn tell the ThreadContext to create another DynamicScope object and put it on the top of the stack. It turns out this is exactly where the frame is managed as well. Here is a block of code I showed from the JRuby codebase in that previous article:</p>

<pre class="brush: java">
public void preMethodFrameAndScope(RubyModule clazz, String name, IRubyObject self, Block block, StaticScope staticScope) {
    RubyModule implementationClass = staticScope.getModule();
    pushCallFrame(clazz, name, self, block); // &lt;-- What we care about this time
    pushScope(DynamicScope.newDynamicScope(staticScope));
    pushRubyClass(implementationClass);
}

</pre>
<p>Note how this method not only creates a new scope to represent the method's static scope, but also calls 'pushCallFrame(...)'. This is where the new frame is created to represent the method that is being invoked. This frame is represented by a 'org.jruby.runtime.Frame' object, which is put on the top of the frame stack.</p>

<p>By most accounts, the Frame object in JRuby is a simple mutable Java bean. The class is relatively simple, and carries a few key pieces of information:</p>
<ul>
    <li>The object that owns the code being invoked</li>
    <li>The name of the method (or block or eval) being invoked</li>
    <li>The visibility of the method</li>
    <li>The name of the file <em>where the invocation of the frame occurred</em>.</li>
    <li>The line number in the calling file <em>where the invocation of the frame occurred</em>.</li>
</ul>
<p>... and that's it! This is basically all that is required to produce a single line in a backtrace. The entire stack of frames then, in turn, represents the entire backtrace.</p>

<h2>The Magic Line Number</h2>

<p>While the program is executing, the line number is constantly changing. The frame has some idea of this line number, but only in terms of when the method was called in the enclosing code - it's not a live representation. However, when you think about a running program, the number on the top of the trace is constantly changing - and on top of that, the frame that <em>is </em>the top of the trace is constantly changing as well - so who is keeping track of this magic number?</p>

<p>It turns out it's the ThreadContext again, where an integer is kept to keep track of the most current line number (and actually the most current file, as well). In the basic (interpreted) mode of JRuby, the various AST nodes (control statements like if and while, blocks, methods, etc) all have their line number baked into them. When they are invoking, they will update the line number on the thread context. For example, here is the top part of the 'interpret' method on org.jruby.ast.IfNode:</p>

<pre class="brush: java">
@Override
public IRubyObject interpret(Ruby runtime, ThreadContext context, IRubyObject self, Block aBlock) {
    ISourcePosition position = getPosition();
    context.setFile(position.getFile());
    context.setLine(position.getStartLine());
    // ...

</pre>
<p>For JIT compiler fans, note that it also manages the line number like the interpreted nodes, but as usual is a little more obscure. Two things are done for the JIT compiler: first - code is generated that will call into the ThreadContext to update the line number information like above (See ASTCompiler#compileNewLine). Additionally, however, the line numbers are also actually written into the generated Java bytecode using the standard label/line-number bytecode structures (This will provide distinct advantages in generating backtraces, as we will see later).</p>

<p>As the various code is invoked, this number is constantly being changed to represent the position from the original source. When a new method or block is invoked, that value is copied onto the frame and preserved. This allows the frame to keep track of when it lost control of the execution, while the thread context keeps track of the live line number.</p>

<p>Let's take a look at a sample backtrace for a specific example:</p>

<pre class="brush: text">
./another.rb:7:in `do_something_else': undefined method `call' for nil:NilClass (NoMethodError)
from ./another.rb:3:in `do_something'
from test.rb:5:in `run'
from test.rb:9

</pre>
<p>So what this tells us is that in the method 'do_something_else' in another.rb, on line 7, we had a NoMethodError trying to call the method 'call' on a nil variable. Additionally, we know the three method calls it took to get to this point. Here is a diagram that shows what the frame stack looks like in the runtime at the moment this occurs (as usual, I've done some hand-wavy magic here to simplify a few less-important details...):</p>

	
<div class="picture alignnone">
	<img src="/public/images/articles/distilling/jruby/traces1.png"/>
	<div class="caption">
		Note the line number mis-match
	</div>
</div>

<p>As you can see, the line number stored on the frame correlates to the position in the previous call where the invocation occurred. Also notice that the thread context carries the currently active file and line - but the method name is inferred from the top frame. This mixed relationship, while effective for the way that frames are recorded, can be confusing at first.</p>
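<p>The off-by-one relationship between frames and line numbers is easier to see in a toy model. The following is plain Ruby written for this article, not JRuby code: 'ToyContext' and 'ToyFrame' are invented names, and the model only tracks a method name, file, and line per frame.</p>

<pre class="brush: ruby">
# Toy model: the context owns the live file/line, and each frame stores the
# position of the call site it was invoked from.
ToyFrame = Struct.new(:name, :file, :line)

class ToyContext
  def initialize(file)
    @file = file
    @line = 0
    @frames = [ToyFrame.new(nil, nil, nil)]  # top-level frame, no method name
  end

  # What an AST node does just before it executes.
  def at(file, line)
    @file = file
    @line = line
  end

  # Push a frame that remembers the *caller's* current position.
  def call(name)
    @frames.push(ToyFrame.new(name, @file, @line))
  end

  def backtrace
    lines = []
    file, line = @file, @line
    @frames.reverse_each do |frame|
      lines.push(frame.name ? "#{file}:#{line}:in `#{frame.name}'" : "#{file}:#{line}")
      file, line = frame.file, frame.line
    end
    lines
  end
end

context = ToyContext.new("test.rb")
context.at("test.rb", 9);      context.call("run")
context.at("test.rb", 5);      context.call("do_something")
context.at("./another.rb", 3); context.call("do_something_else")
context.at("./another.rb", 7)
puts context.backtrace

</pre>
<p>Each printed line pairs a frame's method name with the position held one level "above" it: the live position for the top frame, and the stored call site of the frame above for every other frame.</p>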

<h2>Managing Frames at Runtime</h2>

<p>JRuby tries to avoid creating a huge volume of frame objects during execution; in general, the expectation is that a program is going to invoke a lot of methods during execution. If each method was represented by a frame, that would mean a lot of frame objects. To combat this, the frame objects are pre-allocated on the frame stack, and reused. Since programs are generally going to repeatedly traverse up and down the frame-stack, hovering around the same depth of execution, this is one place in Java code where pooling of objects probably makes good sense. Rather than JRuby creating thousands and thousands of frame objects, it will only create enough for the deepest level of execution per thread.</p>

<p>Internally speaking, the ThreadContext class keeps a growable Frame[], but in the process it also ensures that each slot is pre-filled with a ready-to-use frame object. If the allocation needs to grow, the array is increased by a capacity, the existing frame objects are moved to the new array, and the new empty slots are filled with additional Frame allocations.</p>
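<p>Here is a rough sketch of that pooling idea in Ruby. The 'FrameStack' and 'PooledFrame' names are hypothetical, and doubling the capacity when the stack fills up is an assumption on my part; the point is only that pushing a frame reuses a pre-allocated slot instead of creating a new object.</p>

<pre class="brush: ruby">
# Toy frame pool: slots are pre-filled and reused, so a push only allocates
# when the stack grows past the deepest point reached so far.
PooledFrame = Struct.new(:name, :file, :line)

class FrameStack
  attr_reader :allocations

  def initialize(capacity = 4)
    @frames = Array.new(capacity) { PooledFrame.new }
    @allocations = capacity
    @depth = 0
  end

  def push(name, file, line)
    grow if @depth == @frames.length
    frame = @frames[@depth]
    # Overwrite the fields of the existing object instead of allocating.
    frame.name, frame.file, frame.line = name, file, line
    @depth += 1
    frame
  end

  def pop
    @depth -= 1
    @frames[@depth]
  end

  private

  def grow
    extra = @frames.length  # assumed growth policy: double the capacity
    @frames.concat(Array.new(extra) { PooledFrame.new })
    @allocations += extra
  end
end

stack = FrameStack.new(2)
first = stack.push("run", "test.rb", 9)
stack.pop
again = stack.push("do_something", "test.rb", 5)  # same object as first
1000.times { stack.push("loop", "test.rb", 1); stack.pop }
puts stack.allocations

</pre>
<p>A thousand calls at the same depth cost no additional frame objects; only going deeper than ever before triggers an allocation.</p>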

<p>When a Frame object is set up for use, some variant of the #updateFrame method is called, which basically captures all of the invocation information - it effectively behaves as a constructor:</p>

<pre class="brush: java">
public void updateFrame(RubyModule klazz, IRubyObject self, String name,
Block block, String fileName, int line, int jumpTarget) {
    this.self = self;
    this.name = name;
    this.klazz = klazz;
    this.fileName = fileName;
    this.line = line;
    this.block = block;
    this.visibility = Visibility.PUBLIC;
    this.isBindingFrame = false;
    this.jumpTarget = jumpTarget;
}

</pre>
<h2>Dispatching Options</h2>
<p>In all of the past three articles, I have brushed by the CallConfiguration enumeration. This enum is a pretty significant linchpin in the dispatching and execution of program flow, as it decides a number of things about method and block invocation. Each method may be dispatched using a number of possible call configurations, based on the state of the running program, and the needs of the code block being executed. This CallConfiguration decides not only whether a Frame object is required for the call, but also whether or not a Scope is required. This abstraction is very useful as both the interpreter and the JIT-compiled code dispatch using this configuration strategy.</p>

<p>Just as certain methods may not require an explicit scope (no variables are mutated), some methods also don't require frames. With both scope and frame, the primary reasons for skipping their use is performance. In the case of scope, the code which manages this is entirely transparent; you don't care how your variables are managed, as long as they are managed.</p>

<p>However, with frames it's not that simple. We've already discussed how method invocations can be compiled into Java code in JRuby - effectively avoiding the overhead of making a series of reflection calls in favor of a generated block of Java code that properly preps the Ruby context, and invokes the method through the call site. In this process, a number of possible invocations can be generated - some that set up the frame constructs, and some that don't. If you tell JRuby through flags that it can optimize away as much as possible, it will generate these method connectors to be ultra-super-cool-fast.</p>

<p>Strictly speaking, to turn off frames in the compilation process, you simply need to set 'jruby.compile.frameless' to true - although to get the most speed, you could instead set 'jruby.compile.fastest' (which implies a number of other settings as well).</p>

<p>It should be noted that, by default, these settings are turned off, and both are marked as experimental. By leaving them disabled by default, it ensures that JRuby, out of the box, is compatible with MRI Ruby as it pertains to generated backtraces and frame manipulation, and is as stable as possible. Turning them on can easily break certain frameworks and libraries that expect the frame or backtrace to be consistent, and manageable. In many cases, however, your application won't need that kind of control over the frame, and you may not care that the backtrace be exact.</p>

<p>Here are some general rules followed when dispatching to a method or block:</p>

<ul>
    <li>A full frame will <em>always</em> be used if compatibility mode is enabled (jruby.compile.frameless is set to false).</li>
    <li>Certain system-level invocations (such as the 'eval' method) get a frame no matter what, as they are frame aware.</li>
    <li>In all other cases, at most a backtrace-only frame (a "lite" frame) will be used.</li>
    <li>If jruby.compile.fastest is set to true, then frames will not be used at all unless it is required by the running program to exist. This obviously has some impact on the readability of the execution.</li>
</ul>

<p>As mentioned above, readability of backtraces is an issue with jruby.compile.fastest -- less accurate information will be available. For example, the trace above that was used looks like this when run with fastest-compilation on:</p>

<pre class="brush: text">
./another.rb:7:in `do_something_else': undefined method `call' for nil:NilClass (NoMethodError)
from ./another.rb:3:in `do_something'
from :1:in `run'
from :1

</pre>
<p>Note that all frames but the top are completely unaware of the execution details - in many cases this may be sufficient information to fix a problem, but it is certainly <em>less</em> than what is available in compatibility mode.</p>

<p>The 'lite' backtrace-only frame I mentioned is basically a trimmed down representation that doesn't hold on to references to the owning object. While this can reduce the usability of the frame (particularly for methods like 'eval' that may need to interact with the caller), it's a significant optimization as it takes several objects off the object graph, preventing long-lived references to live objects from the program flow (such as the object that is being invoked against). This will allow the GC to handle these objects sooner than may otherwise be possible.</p>

<h2>Execution Flow</h2>
<p>When an error occurs, the execution needs to stop and unwind from the exception to the first point it is properly handled (with a rescue, or all the way out of the program). This can make following the JRuby code complicated, as Java exception flow is used as the back-bone for the Ruby exception flow, and so the two intermingle and must be kept separate in your mind.</p>

<p>The primary class that represents an exception in Ruby is org.jruby.RubyException, which is the JRuby native implementation of the Exception class in Ruby. There are a number of subclasses that are constructed (such as ArgumentError, as we will see below) that let Ruby code handle errors in a typed way, but effectively everything extends this 'Exception' class. Now, while this is called 'Exception', it's not actually a Java exception. It extends RubyObject (like all JRuby native peers), and is a representation of a Ruby exception for the runtime, but has no effect on Java as anything but a standard object.</p>

<p>However, RubyExceptions can be encountered during execution, and that should interrupt execution. Somehow this has to be handled in Java code. As an example, the 'to_sym' method on RubyString is implemented natively in Java, and that method, by contract, should throw an exception if the string is empty.</p>

<pre class="brush: text">
$ ruby -e '&quot;&quot;.to_sym'
-e:1:in `to_sym': interning empty string (ArgumentError)
from -e:1
$ jruby -e '&quot;&quot;.to_sym'
-e:1: interning empty string (ArgumentError)

</pre>
<p>As it turns out, the easiest way to interrupt Java code like this is to use a Java exception. For this, JRuby uses the class 'org.jruby.RaiseException', which is, in fact, a real Java exception. As the name hints, it represents the execution of 'raise' in Ruby (which is roughly analogous to a Java throw, but is actually a method on the Kernel module rather than a keyword). RaiseException contains the RubyException representing the error in Ruby code.</p>

<p>When Ruby code invokes 'raise', this method will delegate through org.jruby.RubyKernel#raise, which for the most part will end up throwing a new RaiseException. Now, this is where it gets tricky to distinguish the two. Keep in mind that the RaiseException simply exists so JRuby can back up the Java code to find the right Ruby code to handle the error. On the <em>other</em> side of the equation, the code in JRuby follows a pattern roughly like this:</p>

<pre class="brush: java">
public void interpret() {
    try {
        runBodyRubyCode(...);
    }
    catch(RaiseException e) {
        runRescueRubyCode(e.getRubyException());
    }
    finally {
        runEnsureRubyCode(...);
    }
}

</pre>
<p>This is pseudo-code mixed from the AST RescueNode and EnsureNode, but it captures the idea. First, the code is run - then, if a RaiseException occurs, the exception is sent into a rescue block of code. Keep in mind that when the rescue code is run, the Ruby exception is unboxed so it's directly accessible to that code block (as it always is in Ruby). The ensure code is actually handled by a separate AST node (since it may be included independently of rescue), but the concept is the same as seen above.</p>

<p>The JIT obviously changes how this code is actually invoked (via generated Java code), but the same general logic applies.</p>
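
<p>Seen from the Ruby side, the construct that the pseudo-code above is servicing looks roughly like the following. This is only an illustrative sketch (the method name is invented for the example); it records the order in which the body, rescue, and ensure sections run:</p>

<pre class="brush: ruby">
# Hypothetical example: records the order in which each section runs.
def run_with_handlers
  events = []
  begin
    events.push(:body)
    raise ArgumentError, 'interning empty string'
  rescue ArgumentError
    events.push(:rescue)
  ensure
    events.push(:ensure)
  end
  events
end

p run_with_handlers

</pre>
<p>The body runs first, the raise diverts control into the rescue section, and the ensure section runs last regardless of whether anything was raised.</p>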

<p>If an exception <em>isn't</em> handled via a mechanism like rescue, then the RaiseException itself is handled by the Java bootstrap (such as the executable). This has some special consequences when it comes to embedded code, and we'll get into that shortly.</p>

<p>Additionally, there is a special version of RaiseException called NativeException (this also exists in MRI) - this is a special wrapper for exceptions that occur in Java code called from JRuby code. When this happens, the stack trace for those native parts is actually <em>preserved</em> in the Ruby stack up to the point the Ruby code invoked the Java code. Here is an example of a backtrace that was created by an exception occurring in some Java code:</p>

<pre class="brush: text">
java/lang/NumberFormatException.java:48:in 'forInputString': java.lang.NumberFormatException: For input string: &quot;15123sdfs&quot; (NativeException)
from java/lang/Long.java:419:in 'parseLong'
from java/lang/Long.java:468:in 'parseLong'
from ./another.rb:7:in 'do_something_else'
from ./another.rb:5:in 'do_something'
from [script]:5:in 'run'
from [script]:9

</pre>
<h2>Constructing Backtraces</h2>

<p>Throughout this article, we've seen examples of backtraces that were (seemingly) generated off of the frame stack. To create a proper backtrace, the currently active frame stack must be copied and turned into a point-in-time snapshot of backtrace information. When an error occurs, the backtrace is captured with participation between the RaiseException, RubyException, and the ThreadContext.</p>

<p>When the RubyException is constructed, it asks the ThreadContext to create a backtrace, which then iterates over the current frame stack, creating a RubyStackTraceElement array. This array is then bound to the RubyException. Here is a sample of the loop that creates the backtrace array (I've trimmed some unnecessary details):</p>

<pre class="brush: java">
public static IRubyObject createBacktraceFromFrames(Ruby runtime, RubyStackTraceElement[] backtraceFrames) {
    RubyArray backtrace = runtime.newArray();
    if (backtraceFrames == null || backtraceFrames.length &lt;= 0) return backtrace;
    int traceSize = backtraceFrames.length;
    for (int i = 0; i &lt; traceSize - 1; i++) {
        RubyStackTraceElement frame = backtraceFrames[i];
        addBackTraceElement(runtime, backtrace, frame, backtraceFrames[i + 1]);
    }
    return backtrace;
}

</pre>
<p>In the normal "framed backtrace" workflow, that's all there is to it. That array can then be used to emit to the console, or whatever else needs to occur.</p>

<p>Interestingly, there are a number of other "super secret" ways the backtrace can be generated. As best as I can tell, these are entirely undocumented on the JRuby site - these are simply custom values for "jruby.backtrace.style" - these include:</p>

<ul>
    <li>"raw" - This version provides a very explicit output of what happened, including all of the internal JRuby stack - very useful for JRuby development:
        <pre class="brush: text">
        from java/lang/Long.java:419:in `parseLong'
        from java/lang/Long.java:468:in `parseLong'
        from Thread.java:1460:in `getStackTrace'
        from RubyException.java:143:in `setBacktraceFrames'
        from RaiseException.java:177:in `setException'
        from RaiseException.java:119:in `&lt;init&gt;'
        from RaiseException.java:101:in `createNativeRaiseException'
        from JavaSupport.java:188:in `createRaiseException'
        from JavaSupport.java:184:in `handleNativeException'
        from JavaCallable.java:170:in `handleInvocationTargetEx'
        (... removed the rest for brevity ...)
            
</pre></li>
    <li>"raw_filtered" - Just like 'raw', but it omits any Java classes starting with 'org.jruby'. This is handy if you have code-flows that go from Ruby -> Java -> Ruby -> Java, etc - and need to see the Java code intermixed. I've used this when coding in Swing and SWT where event hooks may go into Java, and back into Ruby.</li>
    <li>"ruby_framed" (the default) - This uses the internal Ruby frame stack to generate an MRI-friendly backtrace. "rubinius" is currently compatible with this version. Depending on the settings you have enabled, this can return different values (as described above).</li>
    <li>"ruby_compiled" - This uses the Java stack trace, and parses the compiled class names. When JRuby generates compiled invokers for methods, they will have mangled names that can be re-parsed (looking for sentinels like $RUBY$). Additionally, remember earlier how I said that the line numbers were actually compiled in to the Java code straight from the Ruby code? Well, that means the Java stack trace will automatically have the correct line numbers in it, so building the Ruby backtrace is truly just a matter of parsing the Java StackTraceElement[]. Because of the nature of the bytecode and the Java VM capturing this information, when running with jruby.compile.fastest set to true, this mode can actually return <em>more</em> accurate information than ruby_framed will. Note that if a method isn't compiled, it will not show up in the Java stack, and as such the stack will only contain Java methods that were invoked (of which there may be none).</li>
    <li>"ruby_hybrid" (currently disabled) - This version is meant to be able to munge compiled and interpreted information together into a mega-stack-trace, allowing for compiled and interpreted methods to show up in the same stack trace, using the Java stack to (ostensibly) improve performance where possible -- I'm assuming it's commented out due to some flaw in the implementation.</li>
</ul>

<h2>Embedding JRuby Programs in Java</h2>

<p>When embedding JRuby in Java programs, errors that occur can potentially leave the Ruby runtime altogether. When this happens, Java code is in total control. To make this transition as seamless as possible, JRuby performs some nifty tricks with traditional Java stack-traces.</p>

<p>Our old friend RaiseException actually generates the object-graph for a backtrace like above, and then creates a pseudo-Ruby stack in the Java code that lets a Java programmer see where in the Ruby code the error occurred. Here is the example from way up above as generated in Java code:</p>

<pre class="brush: text">
Exception in thread &quot;main&quot; javax.script.ScriptException: org.jruby.exceptions.RaiseException: undefined method 'call' for nil:NilClass
 at org.jruby.embed.jsr223.JRubyEngine.wrapException(JRubyEngine.java:112)
 at org.jruby.embed.jsr223.JRubyEngine.eval(JRubyEngine.java:173)
 at realjenius.SampleProgram.main(SampleProgram.java:13)
Caused by: org.jruby.exceptions.RaiseException: undefined method 'call' for nil:NilClass
 at Kernel.call(./another.rb:7)
 at Another.do_something_else(./another.rb:3)
 at Another.do_something([script]:5)
 at MyClass.run([script]:9)
 at (unknown).(unknown)(:1)

</pre>
<p>Other conversions for the backtrace (such as the fancy NativeException stuff) work naturally with this code as well, allowing for diversions in Ruby code to show up naturally in the Java stack.</p>

<h2>Frame Peeking with Ruby Programs</h2>

<p>I previously mentioned Kernel#caller, which is a method for peeking at the goings-on in the Ruby trace. Now that we understand the structure of the frames, it is probably pretty easy to see how they will be used. The implementation of org.jruby.RubyKernel#caller simply calls ThreadContext#createCallerBacktrace, which is much like all of the other code we looked at, but it creates a RubyArray containing strings representing the state of the frames in the context at that time.</p>

<pre class="brush: java">
public IRubyObject createCallerBacktrace(Ruby runtime, int level) {
    int traceSize = frameIndex - level + 1;
    RubyArray backtrace = runtime.newArray(traceSize);

    for (int i = traceSize - 1; i &gt; 0; i--) {
        addBackTraceElement(runtime, backtrace, frameStack[i], frameStack[i - 1]);
    }

    return backtrace;
}

</pre>
<p>It's probably also clear by now why optimizations like 'jruby.compile.fastest' can break these methods; the frames aren't there for the ThreadContext to report against.</p>
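
<p>As a quick illustration of what Kernel#caller hands back to Ruby code, here is a small sketch (the method names are invented for the example). The exact text of each entry varies between Ruby implementations and versions, so the only things I'm relying on are that the entries are strings and that they are ordered from the nearest caller outward:</p>

<pre class="brush: ruby">
# Hypothetical call chain: outermost calls middle, which calls innermost.
def innermost
  caller # an array of strings, nearest caller first
end

def middle
  innermost
end

def outermost
  middle
end

trace = outermost
puts trace.first(2)

</pre>
<p>The first entry describes the call site inside 'middle', and the second the one inside 'outermost'.</p>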

<h2>Conclusion</h2>

<p>While the frame concepts in JRuby in and of themselves aren't that complicated, you have to have a strong foundational knowledge of how Ruby works and how method dispatching in JRuby works to understand the code flows. I hope I've been able to condense the concepts into an easy enough walkthrough.</p>

<p>I'm by no means done with these JRuby articles -- I took a little hiatus for work and personal reasons, but hope to have more coming out of the gates real soon. Here is a peek at some possible subjects:</p>
<ul>
    <li>The Library Load Service</li>
    <li>Continuations (Kernel#callcc)</li>
    <li>Java Proxying and Support</li>
    <li>The New Kid on the Block: Duby</li>
</ul>

<p>As usual, votes are welcome: <a href="http://www.realjenius.com/contact">http://www.realjenius.com/contact</a>.</p>

<p>Stay Tuned!</p>]]></description>
   </item>
   <item>
      <title>Check Out the Play! Framework</title>
      <link>http://www.realjenius.com/2010/03/01/check-out-the-play-framework</link>
      <author>R.J. Lorimer</author>
      <pubDate>March 01, 2010</pubDate>
      <guid>http://www.realjenius.com/2010/03/01/check-out-the-play-framework</guid>
      <description><![CDATA[
	<img src="/public/images/articles//play/play.png" class="article_image right"/>
<p>I've spent a lot of time recently investigating a variety of languages other than Java, such as <a href="http://jruby.org">JRuby</a> and <a href="http://scala-lang.org">Scala</a>, and truly believe from these experiences that traditional Java MVC web frameworks are inherently flawed in design and implementation. The effort involved in implementing on a framework like Struts or Spring MVC is astronomical, especially if you are going to implement things "the right way".</p>
<p>It's amazing to me how much these platforms push "hello, world" examples that are simply not realistic web applications. After trying these short examples, developers turn around and start trying to implement a complete application, and this simple example balloons into a mess of code, and that's without any real functionality yet in the application. A co-worker of mine is a fan of saying "[These frameworks] make the simple things trivial, and the hard things impossible".</p>
<p>Historically, I've been known to say "If you are doing web-development in Java, use <a href="http://wicket.apache.org">Wicket</a>"; this was based on the fact that in my experience Wicket took the most advantage of the strongly-typed, and strongly IDE-supported, Java language, as opposed to trying to hide it behind anemic and broken templating languages that have horrid editors and basically trade one problem for another.</p>
<p>Recently, however, I spent some time doing some significant development with the <a href="http://playframework.org">Play Framework</a>. I have to say that I think the Play Framework has eclipsed my Wicket fever. That's not to say that I don't still think Wicket is very powerful, but I have been particularly impressed with the feedback loop provided by Play. It has, without a doubt, the most direct code-test-cycle I have seen in any platform for Java (it approaches the instant feedback of Rails), and also has the distinct advantage of being stateless out-of-the-box (something Wicket is definitely not).</p>
<p>Play manages this feedback loop problem in a rather novel way - embedded in the framework is the <a href="http://www.eclipse.org">Eclipse</a> compiler for Java (ECJ). This means that when you're coding for the Play framework, you're not sending it your class files, but rather your source files. This allows Play to recompile code in a running instance on the fly - I literally only restarted my application a handful of times while I was coding over <em>the course of several days</em>. It also integrates seamlessly with IDEs, and ships with an embedded HTTP runtime (no deployment is necessary during development).</p>
<p>There are a number of other benefits Play can provide by working with source files instead of class files. Much like Rails' ability to add functionality to your application at runtime, Play can (and does) pre-process certain Java classes to add functionality.</p>
<p>I was further heartened to see that the next release of Play is meant to fully support Scala, which would allow for other modern language features to be used with this highly interactive framework.</p>
<p>It's hard to describe all of the neat features Play provides in a few hundred words, but I would highly recommend <a href="http://www.playframework.org">you check it out</a> - they have a 10 minute screencast that sells it better than I can. While I'm still convinced Java (as a language) will be surpassed for an overwhelming majority of web-development as the language continues to stagnate, this is a compelling framework for the Java platform as a whole, even if Java isn't your language of choice.</p>
   </item>
   <item>
      <title>JRuby &quot;IO.foreach&quot; Performance</title>
      <link>http://www.realjenius.com/2009/11/03/jruby-io-foreach-performance</link>
      <author>R.J. Lorimer</author>
      <pubDate>November 03, 2009</pubDate>
      <guid>http://www.realjenius.com/2009/11/03/jruby-io-foreach-performance</guid>
      <description><![CDATA[

	<img src="/public/images/articles//jruby/logo.png" class="article_image right"/>

<p>I've been spending some time dipping my toes in patch contribution for JRuby recently. I started with a few easy, isolated, spec issues, and have since been working my way into more entrenched problems. The past few weeks I spent a good bit of time toying with solutions to <a href="http://jira.codehaus.org/browse/JRUBY-2810">JRUBY-2810</a>: "IO foreach performance is slower than MRI". The exercise was interesting enough that I thought it might be worth posting here. This isn't meant to be a study of the JRuby code in particular, but more a look at the thought process of diagnosing a performance problem in foreign code.</p>

<h3>Proof is in the Benchmark</h3>
<p>Performance is a very multi-faceted thing - there are so many measuring sticks (CPU, memory, I/O, startup time, scalability, 'warm up' time, etc). This makes quantifying a performance problem hard.</p>
<p>Furthermore, improvements for most performance problems typically involve making some kind of trade-off (unless you're just dealing with bad code). The goal is to trade a largely-available resource for a scarce one (cache more in memory to save the CPU, or use non-blocking IO to use more CPU rather than waiting on the disk, etc).</p>
<p>JRuby always has a few open, standing performance bugs. It's the nature of the beast that it is compared to MRI (the "reference" implementation), and anywhere it performs less favorably is going to be considered a bug (fast enough be damned). The performance measurement is up to the beholder, but CPU timings are generally the most popular.</p>
<p><a href="http://jira.codehaus.org/browse/JRUBY-2810">JRUBY-2810</a> is an interesting case. IO line looping was proving to be slower than MRI Ruby; in some cases <strong>much</strong> slower. In this particular case, CPU was the closely-watched resource.</p>
<p>The first step I took to analyzing the problem was reproducing it. With Ruby this is usually pretty easy, as arbitrary scripts can just be picked up and executed, as opposed to Java, where all too often you have to build a special harness or test class just to expose the problem. Scripts are very natural for this, and in this particular case, the user had already provided one in the benchmarks folder that ships with the JRuby source.</p>
<p>Having run that file, I quickly saw the performance discrepancy reported in the bug. At this point in my experimenting, I was running inside an Ubuntu VM through VirtualBox on my Windows machine, and I suspected that level of indirection exacerbated the numbers, so I checked my Macbook Pro as well. In both cases, the differences were significant: on Ubuntu, MRI Ruby was running the code in under <strong>10 seconds</strong>, where JRuby was taking <strong>30 seconds to a minute</strong>; the Macbook was still twice as slow in JRuby (<strong>12 seconds</strong>) as compared to MRI (<strong>6.5 seconds</strong>).</p>
<p>When faced with a big gap like this, I generally start by profiling. Running the entire process under analysis will generally grab some hotspots that need some tuning. I'm enamored with how low the barrier to entry on profiling has become on modern Java VMs (something that I think is actually a big selling point for JRuby as compared to other Ruby implementations; but I digress). To do my work here, I simply ran the benchmark, and popped open VisualVM. From there, I simply connected and performed CPU profiling (which automagically connects and injects profiling code into the running system).</p>
<p>In this particular case, the first problem was quickly uncovered:</p>


<div class="picture alignnone">
	<img src="/public/images/articles//jruby-foreach/jruby-2810-profile.png"/>
	<div class="caption">
		Great Odin's Raven!
	</div>
</div>

<p>Clearly, a very large amount of time is being spent in ByteList.grow. I felt fairly fortunate at this point, as rarely is it this straightforward; having a performance problem reported with this singular of a hot-spot. When nearly 80% of the processing time is spent in a single method, it brings up several questions: What is ByteList? Why does IO.foreach use it? Why must it keep 'growing'? Did I leave the iron on? To answer these questions (most of them, anyway) you simply have to get your feet wet in the code.</p>
<h3>Coding for Crackers</h3>
<p>At its heart, IO.foreach (and the close counterpart, each/each_line) is simply a line iterator that hands each line of text off to a receiving block - there are a number of caveats and subtleties built into that idea, but at its core, it allows developers to write code like this:</p>

<pre class="brush: ruby">
io = #...
io.each_line do |line|
  puts line
end

</pre>
<p>Deceptively simple, isn't it? It turns out that a lot of wrangling has to occur to make this so simple - much of it having to do with how files are encoded, and the variety of line separators that may exist. Thankfully, the good folks at JRuby have cemented this in the code fairly decently - for my part, I mostly had to draw boxes around the complex encoding and line termination algorithms, and focus on the loop and data-reading itself. Most of this was occurring in a single method (for the code-eager, this was in RubyIO#getline and its derivatives). This method is used in a number of scenarios: the traditional looping algorithms, the 1.9 inverted enumerator stuff (handing the ownership of "next" off to the caller) as well as basic calls to 'IO.gets'. Internally, each call to getline allocates a new ByteList and copies data from the input stream into it.</p>
<p>This is where the high-CPU numbers started. ByteList is simply an easier-to-use wrapper around a byte[]. It backs several JRuby data structures - the most notable probably being RubyString (the Java peer for String objects in JRuby). In fact, the ByteList allocated in this routine is eventually given to a String object, and returned at the end of the call. The 'grow' method on ByteList (the offending code-point) is the automatic capacity increase mechanism, and does this via an array-allocation and copy (much like ArrayList); this method uses a fairly standard 1.5x grow factor.</p>
<p>It's easy to see how ByteList would be central to the benchmark since it represents the primary data structure holding the bytes from the input source, but it seemed suspicious that 'grow' was the offending hotspot. I would expect it to be one of the copy methods, like 'append', which is really where the algorithm <em>should</em> be spending its time (that, and 'read' from the input source). To understand why 'grow' was so cranky, I had to look more closely at the code I was invoking: the benchmark.</p>
<h3>Understanding the Benchmark</h3>
<p>The original benchmark used to test the 'foreach' performance in JRuby when 2810 was first opened performed something like 10,000 line iterations on a file with relatively short lines. Halfway through the life of this bug, those values were adjusted in this original benchmark in a way that exposed a short-coming in the JRuby line-read routine - by generating only 10 lines that were very, very long instead.</p>
<p>For any Ruby implementation, reading a file with particularly long lines using foreach is prohibitively expensive, as the entire line has to be read into memory as a single string object that is then shared with the code block. Normally, you wouldn't want to read data this way if you knew that the file was structured so wide, and should probably consider a streamed-read instead. That being said, MRI Ruby performed much more admirably in this scenario, so it was something to be analyzed.</p>
<p>The root of the problem was this: JRuby was starting with an empty ByteList, and was then creating subsequently larger byte[]s indirectly (via ByteList.grow) - the 1.5x factor wasn't enough, as the chunks were being read 4k at a time, and these files were significantly wider than 4k. For that reason alone, the ByteList was having to grow a number of times for each line, and when we're talking about a byte[] several kilobytes in size, array copies are simply going to be expensive - all those together combine to make this an unfriendly performance proposition.</p>
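
<p>To get a feel for why this hurts, here is a rough model of the behavior described above. To be clear, this is a hypothetical sketch and not JRuby's ByteList: the method name, the exact growth rule, and the line sizes are all assumptions made for illustration. It counts how many times a buffer that starts empty has to be reallocated, and how many bytes get copied along the way, as one line arrives in fixed-size chunks:</p>

<pre class="brush: ruby">
# Hypothetical model (not JRuby's actual ByteList): a buffer that starts
# empty and grows by the given factor, or to the exact size required if
# that is larger, while one line arrives in fixed-size chunks.
def count_grows(line_bytes, chunk_bytes, factor)
  capacity = 0
  used = 0
  grows = 0
  copied = 0
  while used != line_bytes
    required = [used + chunk_bytes, line_bytes].min
    shortfall = required - capacity
    if shortfall.positive?
      capacity = [required, (capacity * factor).ceil].max
      grows += 1
      copied += used # bytes already read are copied into the new array
    end
    used = required
  end
  { grows: grows, copied: copied }
end

p count_grows(160, 4096, 1.5)     # a short line, as in the 'tall' file
p count_grows(180_000, 4096, 1.5) # a long line, as in the 'wide' file

</pre>
<p>In this model the short line needs a single allocation and no copying, while the long line is reallocated repeatedly, and each reallocation copies everything read so far.</p>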
<p>As I mentioned previously, the benchmark used to be a very different performance model. I decided at this point it was good to split the benchmark so that both could be run side by side, and I could see both the 'wide' scenario and the 'tall' scenario at the same time. It turned out via profiling that the tall file was experiencing pains from 'grow', but not nearly so badly. Even at 10,000 lines the amount of adverse memory allocation and churn was much smaller, as a single 4k allocation on each line was more than sufficient.</p>
<p>For reference, here is what the 'tall' benchmark looks like:</p>

<pre class="brush: ruby">
require 'benchmark'

MAX  = 1000
BLOCKSIZE = 16 * 1024
LINE_SIZE = 10
LINES = 10000
FILE = 'io_test_bench_file.txt'

File.open(FILE, 'w'){ |fh|
  LINES.times{ |n|
    LINE_SIZE.times { |t|
      fh.print &quot;This is time: #{t} &quot;
    }
    fh.puts
  }
}

stat = File.stat(FILE)
(ARGV[0] || 5).to_i.times do
  Benchmark.bm(30) do |x|
    x.report('IO.foreach(file)'){
      MAX.times{ IO.foreach(FILE){} }
    }
  end
end

File.delete(FILE) if File.exist?(FILE)

</pre>
<p>The only difference in the wide benchmark is the tuning parameters:</p>

<pre class="brush: ruby">
LINE_SIZE = 10000
LINES = 10

</pre>
<p>So <strong>'tall'</strong> can be read as <strong>'10000 lines, 10 sentences long'</strong>, and <strong>'wide'</strong> can be read as <strong>'10 lines, 10000 sentences long'</strong>.</p>

<p>Also for reference, here is what it looks like to run a benchmark using this framework - 5 iterations are run (as defined in the file), and the various aspects of CPU usage are measured. Generally, the most important number is the 'real' column when measuring performance between Ruby and JRuby, as the two report user/system CPU usage very differently.</p>

<pre class="brush: bash">
# Running with JRuby
realjenius:~/projects/jruby/bench$ jruby --server bench_io_foreach_wide.rb
                                     user     system      total         real
IO.foreach(file)                63.970000   0.000000  63.970000 ( 63.764000)
                                     user     system      total         real
IO.foreach(file)                30.212000   0.000000  30.212000 ( 30.212000)
                                     user     system      total         real
IO.foreach(file)                30.973000   0.000000  30.973000 ( 30.973000)
                                     user     system      total         real
IO.foreach(file)                30.768000   0.000000  30.768000 ( 30.767000)
                                     user     system      total         real
IO.foreach(file)                32.813000   0.000000  32.813000 ( 32.813000)

#Running with MRI Ruby
realjenius:~/projects/jruby/bench$ ruby bench_io_foreach_wide.rb
                                     user     system      total         real
IO.foreach(file)                 0.200000   9.500000   9.700000 (  9.982682)
                                     user     system      total         real
IO.foreach(file)                 0.230000   9.430000   9.660000 (  9.889992)
                                     user     system      total         real
IO.foreach(file)                 0.560000   9.340000   9.900000 ( 10.232858)
                                     user     system      total         real
IO.foreach(file)                 0.520000   9.270000   9.790000 ( 10.054699)
                                     user     system      total         real
IO.foreach(file)                 0.600000   9.350000   9.950000 ( 10.348258)

</pre>
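<p>If you want to pull those same columns out programmatically, the standard library's Benchmark.measure returns an object exposing them individually. Here is a minimal sketch; the workload inside the block is just a placeholder I made up, not part of the real benchmark:</p>

<pre class="brush: ruby">
require 'benchmark'

# Time an arbitrary block of work; the workload here is only a placeholder.
timing = Benchmark.measure do
  10_000.times { 'This is time: 0 '.upcase }
end

puts format('user=%.6f system=%.6f total=%.6f real=%.6f',
            timing.utime, timing.stime, timing.total, timing.real)

</pre>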
<p>After splitting the benchmarks, here is a breakdown of my two configurations:</p>
<table class="infoTable">
<tr><th>Environment</th><th>'wide' MRI</th><th>'wide' JRuby</th><th>'tall' MRI</th><th>'tall' JRuby</th></tr>
<tr><td>Ubuntu VM</td><td>10 seconds</td><td>30 seconds</td><td>6 seconds</td><td>11 seconds</td></tr>
<tr><td>Macbook Pro</td><td>6.5 seconds</td><td>12 seconds</td><td>8 seconds</td><td>15 seconds</td></tr>
</table>

<p>Keep in mind I'm just rounding here; not really trying to be exact for this blog post. Check the bugs for more exact numbers.</p>

<h3>A Solution Lurks</h3>

<p>So, we have performance problems on tall files, and a whole lot more performance problems on wide files, particularly depending on the environment. Because of the environmental discrepancies, I spent some more time comparing the two test environments. It turned out that the Macbook Pro was simply working with a more resource-rich environment, and as such wasn't hitting the wall as badly when allocating the new immense byte[]s. The implementation in JRuby was not degrading as well on older (or more restricted) hardware as MRI.</p>
<p><em>(It's probably good to note here the value of testing in multiple environments, and from multiple angles)</em></p>
<p>My first pass at a solution to this problem was to consider a byte[] loan algorithm. Basically, at the start of foreach, I effectively allocated a single ByteList (byte[] container), and for each iteration of the loop, I just reused the same ByteList -- eventually the byte[] being used internally would be sufficient to contain the data for each line, and would not need to grow any more (yay recycling!).</p>
<p>I encapsulated most of this 'unsafe' byte[] wrangling and copying into a small inner class called ByteListCache - at the start of the loop, the ByteListCache is created, and then it is shared for each iteration, being passed down into 'getline' as an optional parameter, the side effect being that the first call to 'getline' manually allocates a large byte[] (just like it did pre-patch), and each subsequent call can simply reuse the previously allocated byte[] that is already quite large. If the need arises to grow it more, it can, but it becomes increasingly less likely with each line.</p>
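
<p>To make the effect of that reuse concrete, here is a small model of it. Again, this is a hypothetical sketch rather than the actual ByteListCache code: the method name, the 1.5x growth rule as written, and the sizes are assumptions for illustration. It counts reallocations across several lines, either starting from an empty buffer on every line or carrying one buffer across all of them:</p>

<pre class="brush: ruby">
# Hypothetical model: counts buffer reallocations while reading several
# lines in fixed-size chunks, with or without reusing the buffer.
def total_grows(line_sizes, chunk_bytes, reuse:)
  capacity = 0
  grows = 0
  line_sizes.each do |line_bytes|
    capacity = 0 unless reuse # pre-patch behavior: a fresh buffer per line
    used = 0
    while used != line_bytes
      used = [used + chunk_bytes, line_bytes].min
      next unless (used - capacity).positive?

      capacity = [used, (capacity * 1.5).ceil].max
      grows += 1
    end
  end
  grows
end

lines = [16_384, 16_384, 16_384]
p total_grows(lines, 4096, reuse: false)
p total_grows(lines, 4096, reuse: true)

</pre>
<p>With reuse enabled, only the first line pays for growing the buffer; the later lines fit in what is already allocated.</p>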
<p>Once the iteration is completed, the ByteListCache is dropped out of scope, ready for garbage collection. The number of calls to 'grow' drops dramatically with this implementation, and the performance numbers improved accordingly:</p>

<table class="infoTable">
<tr><th>Environment</th><th>'wide' MRI</th><th>'wide' JRuby</th><th>'wide' JRuby (v1)</th><th>'tall' MRI</th><th>'tall' JRuby</th><th>'tall' JRuby (v1)</th></tr>
<tr><td>Ubuntu VM</td><td>10 seconds</td><td>30 seconds</td><td><strong>7 seconds</strong></td><td>6 seconds</td><td>11 seconds</td><td><strong>8 seconds</strong></td></tr>
<tr><td>Macbook Pro</td><td>6.5 seconds</td><td>12 seconds</td><td><strong>7 seconds</strong></td><td>8 seconds</td><td>15 seconds</td><td><strong>9 seconds</strong></td></tr>
</table>

<p>Unfortunately, they were only this fast because the implementation was now <strong>thoroughly broken</strong>.</p>

<h3>Stop Breaking Crap</h3>
<p>Okay, so I had amazing performance numbers. Except. Now over 50 ruby spec tests were failing. Oh yeah, that might be a problem. Needless to say the problem was obvious the minute I realized what I had done (I actually woke up at 6:00am realizing this, which if you know me, is a bad sign). Remember how earlier I said that the ByteList was used as a backing store for the String? Well, at the time I implemented this, that point had eluded me. I was (accidentally) creating strings with my shared bytelist, so you can probably see where that would end up creating some significant issues with data integrity.</p>
<p>To fix this, the solution was simple - create a perfectly-sized ByteList at the end of the line-read the exact size necessary for the String, copying into it from the shared bytelist, and then passing it in to the String constructor. Obviously this cut into my performance numbers by a percentage on each, but it also fixed the data corruption, which is nice.</p>

<table class="infoTable">
<tr><th>Environment</th><th>'wide' MRI</th><th>'wide' JRuby</th><th>'wide' JRuby (v2)</th><th>'tall' MRI</th><th>'tall' JRuby</th><th>'tall' JRuby (v2)</th></tr>
<tr><td>Ubuntu VM</td><td>10 seconds</td><td>30 seconds</td><td><strong>14 seconds</strong></td><td>6 seconds</td><td>11 seconds</td><td><strong>10 seconds</strong></td></tr>
<tr><td>Macbook Pro</td><td>6.5 seconds</td><td>12 seconds</td><td><strong>10 seconds</strong></td><td>8 seconds</td><td>15 seconds</td><td><strong>13 seconds</strong></td></tr>
</table>
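
<p>The aliasing bug itself is easy to reproduce in miniature. In this sketch a single mutable Ruby String stands in for the shared ByteList (the method and its 'copy' parameter are invented for the example, not anything in JRuby); handing the buffer itself to the caller corrupts every earlier line, while copying it out first keeps each line intact:</p>

<pre class="brush: ruby">
# Hypothetical stand-in for the shared ByteList: one mutable String that is
# refilled for every line.
def read_lines(lines, copy:)
  shared = String.new
  lines.map do |line|
    shared.replace(line)       # overwrite the shared buffer in place
    copy ? shared.dup : shared # hand back a copy, or the buffer itself
  end
end

source = %w[first second third]
p read_lines(source, copy: false)
p read_lines(source, copy: true)

</pre>
<p>Without the copy, every element of the result is the same object holding the last line read; with the copy, each line survives, at the cost of one extra allocation per line.</p>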

<p>The lesson learned here, obviously, is that you need to run a variety of tests (a full suite of specs if you have them) when considering bug fixes. For JRuby, that means (at a minimum) running the specs, which is easy with the Ant script:</p>

<pre class="brush: bash">
ant spec # or ant spec-short to just run interpreted tests

</pre>
<h3>A Word on Limited Application</h3>
<p>Note that I isolated the use of this construct to the foreach and each_line algorithms, as these had deterministic, single-threaded behavior, and would benefit enough to justify the overhead of dealing with this additional object. The new Ruby 1.9 enumerator stuff does not use it, as there is no guarantee of single-threaded usage of the enumerator, so we can't reuse a single byte list. Similarly, individual calls to 'gets' do not currently use it, for the same general reason.</p>
<p>Changes could be made to make this byte[] re-use more long-lasting/global - but the changes required felt a little too overwhelming for a first pass, even if they did offer potentially larger benefits.</p>

<h3>Rinse and Repeat</h3>
<p>Now that I had two tests, and I had seen some improvements (but not quite in the range of MRI), it was time to revisit. Re-running the benchmarks, it was fascinating to see a new prime offender - incrementlineno. It turns out that a global variable containing a fixnum that represents the current line number in the file was being updated through a very indirect routine, and all of this heavy-weight variable updating (going through call-sites and arg file lookups) was very expensive in comparison to the rest of the iteration.</p>
<p>At this point, I'd spend a lot of time explaining how I improved the performance of this little gem; however, truth be told, once I hit this, I simply had to inform the powers-that-be and back up. You see, I couldn't figure out (for the life of me) why this method was doing what it was doing; why it was so important for this line number to be set. This is one of the perils of consuming foreign code-bases that I have discussed with folks before. You can't assume secret sauce is a bad thing - I had to assume it was there for a reason, even if I didn't know what it was.</p>
<p>It turns out, the JRuby folks didn't know the reason either. Well, that's not exactly true; it didn't take long for Charles Nutter to figure out why it was there, but it was clear it was only for rare file execution scenarios, and not appropriate for the more general looping scenarios I was debugging. To follow his efforts on how he optimized that code path, you can reference his commit here: <a href="http://jira.codehaus.org/browse/JRUBY-4117">JRUBY-4117</a>.</p>
<p>After his optimizations, the numbers improved again:</p>

<table class="infoTable">
<tr><th>Environment</th><th>'wide' MRI</th><th>'wide' JRuby</th><th>'wide' JRuby (v3)</th><th>'tall' MRI</th><th>'tall' JRuby</th><th>'tall' JRuby (v3)</th></tr>
<tr><td>Ubuntu VM</td><td>10 seconds</td><td>30 seconds</td><td><strong>11 seconds</strong></td><td>6 seconds</td><td>11 seconds</td><td><strong>8.5 seconds</strong></td></tr>
<tr><td>Macbook Pro</td><td>6.5 seconds</td><td>12 seconds</td><td><strong>6.3 seconds</strong></td><td>8 seconds</td><td>15 seconds</td><td><strong>9.5 seconds</strong></td></tr>
</table>

<h3>Summary</h3>
<p>I think it's fascinating how varied the numbers are depending on the platform. This is a complicated benchmark, and as Charles Nutter mentioned to me, one problem we'll continue to face is that we have no control element in this benchmark. You can get consistency through repetition, but there are simply too many variables to predict exactly what the outcome will be on any given platform. I find it interesting how well the Macbook handles the wide files compared to the Ubuntu VM, which just dies a slow death in comparison - this has to be a side-effect of resource starvation in the VM; but whatever the case, it's an interesting dichotomy.</p>
<p>On average, the new numbers are much more competitive with MRI, even if they don't beat it in most cases. As I learned from working with others on this bug, your mileage may vary significantly, but it's clear from the implementation that we're causing a lot less resource churn for very little cost (the trade-off here is retained memory), and that's generally a good sign things are going in the right direction. Certainly, profiling has shown that the effort is now much more focused on reading from the input channel.</p>
<p>That being said, I'm sure there is more performance to be found - MRI is just a hop-skip-and-jump away!</p>]]></description>
   </item>
   <item>
      <title>Fistful of Awesome: IntelliJ Open-Sourced</title>
      <link>http://www.realjenius.com/2009/10/15/fistful-of-awesome-intellij-open-sourced</link>
      <author>R.J. Lorimer</author>
      <pubDate>October 15, 2009</pubDate>
      <guid>http://www.realjenius.com/2009/10/15/fistful-of-awesome-intellij-open-sourced</guid>
      <description><![CDATA[
<p>In a surprising move, the JetBrains team <a href="http://blogs.jetbrains.com/idea/2009/10/intellij-idea-open-sourced/">has decided to open-source</a> the Java SE portions of IntelliJ IDEA 9.0 and beyond
 under an Apache 2.0 license. They will begin offering two products: a community edition and an ultimate edition:</p>

<blockquote>Starting with the upcoming version 9.0, <a href="http://www.jetbrains.com/idea/nextversion/index.html?utm_source=IDEA_BLOG&amp;utm_media=Anouncement&amp;utm_campaign=IDEA9_CE">IntelliJ IDEA</a> will be offered in two editions: Community Edition and Ultimate Edition. The Community Edition focuses on Java SE technologies, Groovy and Scala development. It’s free of charge and open-sourced under the Apache 2.0 license. The Ultimate edition with full Java EE technology stack remains our standard commercial offering. See the <a rel="nofollow" href="http://www.jetbrains.com/idea/nextversion/editions_comparison_matrix.html?utm_source=IDEA_BLOG&amp;utm_media=Anouncement&amp;utm_campaign=IDEA9_CE">feature comparison matrix</a> for the differences.</blockquote>

<p>Very cool news! The competitive impact on other IDEs (namely <a href="http://www.eclipse.org">Eclipse</a> and <a href="http://www.netbeans.org">NetBeans</a>) remains to be seen, but this certainly brings another aggressive (and already well-liked) competitor to a broader market.</p>
   </item>
</channel>
</rss>
