First look at Pylab/Matplotlib

Since I’ve been getting into Machine Learning/Artificial Intelligence recently, I’ve been looking at various computing environments recently. Some of the contenders are:

  • MATLAB – The traditional software stack for doing machine learning and statistical analysis
  • GNU Octave – An open-source MATLAB clone.
  • R – An open source clone of a statistical computing environment called S.
  • Julia – A language for doing statistical analysis. The goals are to compete with Matlab and R.
  • Matplotlib/Pylab/SciPy/NumPy – see below

Of these, I’ve tried Octave and Matplotlib. Matplotlib/Pylab is basically the software stack consisting of:

  • iPython – an interactive REPL for Python with things like tab completion
  • Matplotlib – a graphical plotting library
  • NumPy – a matrix library
  • SciPy – a collection of scientific and mathematical algorithms

I’ve only played with Matplotlib/Pylab a little bit, but I like what I’ve seen so far. Here’s 2 quick examples:

Displaying a Histogram of a gaussian distribution:

(I love how there are bell-shaped curves in the array representation of the histogram) :)

Plotting a function (y=x^2+10):

Also, there is a really good video from PyCon 2012 on Matplotlib: Plotting with Matplotlib.

Reddit Facebook Plusone Twitter Digg Delicious Email

How to reference self in a block in Objective-C

When using ARC (automatic reference counting) in Objective-C, you need to be careful to avoid causing retain cycles. One place where a retain cycle can occur is where self is referenced in a block. To avoid a retain cycle, you can use the __unsafe_unretained modifier as such:

__unsafe_unretained id unretained_self = self;
reachability = [Reachability reachabilityWithHostname:@"maps.google.com"];
reachability.reachableBlock = ^(Reachability *r) {
    dispatch_async(dispatch_get_main_queue(), ^{
        SurveyFormController *myself = unretained_self;
        self.mapView.reachable = YES;
        [myself configureView];
    });
};
Reddit Facebook Plusone Twitter Digg Delicious Email

Linear Regression with Gradient Descent

I’m taking the Stanford Machine Learning class. The first algorithm we covered is Linear Regression using Gradient Descent. I implemented this algorithm is Python. Here’s what it looks like:

import subprocess
 
def create_hypothesis(slope, y0):
    return lambda x: slope*x + y0
 
def linear_regression(data_points, learning_rate=0.01, variance=0.001):
    """ Takes a set of data points in the form: [(1,1), (2,2), ...] and outputs (slope, y0). """
 
    slope_guess = 1.
    y0_guess = 1.
 
    last_slope = 1000.
    last_y0 = 1000.
    num_points = len(data_points)
 
    while (abs(slope_guess-last_slope) > variance or abs(y0_guess - last_y0) > variance):
        last_slope = slope_guess
        last_y0 = y0_guess
 
        hypothesis = create_hypothesis(slope_guess, y0_guess)
 
        y0_guess = y0_guess - learning_rate * (1./num_points) * sum([hypothesis(point[0]) - point[1] for point in data_points])
        slope_guess = slope_guess - learning_rate * (1./num_points) * sum([ (hypothesis(point[0]) - point[1]) * point[0] for point in data_points])
 
    return (slope_guess, y0_guess)
 
def plot_data_and_line(line, data_file):
    # Plot with gnuplot
    plot = subprocess.Popen(['gnuplot'], stdin=subprocess.PIPE)
    line_eq = "%f*x+%f" % line
    plot.communicate("plot %s, '%s' with points;" % (line_eq,data_file))
 
if __name__ == "__main__":
    data_file = "points.dat"
    points_str = [line.strip().split('\t') for line in open(data_file,'r').readlines()]
    points = [(float(x),float(y)) for (x,y) in points_str]
    line = linear_regression(points)
    print line
 
    plot_data_and_line(line, data_file)

This code takes in a data file with a data point on each line with the x and y separated by a TAB; Example:

0.000000	95.364693
1.000000	97.217205
2.000000	75.195834
3.000000	60.105519

This code also plots the data points and the resulting best-fit line in gnuplot. For example:

This code, along with helper files, is on my github repo.

Reddit Facebook Plusone Twitter Digg Delicious Email

Embedding an XSLT in an XML document

Today, I had this XML document which I was wanting to make viewable in a browser. My plan was to run the XML document through an XSLT engine to transform it into an HTML document (which could then be opened in a browser).

Instead, I found that I could just embed the XSLT directly in the XML document. Then, when this file is opened in a browser, the XSLT executes and spits out HTML, which is then rendered into the browser window. This is great because I now have the original XML file (which we wanted to keep further analysis purposes) which can also be rendered nicely in a browser.

Here’s an example:

<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xml" href="#stylesheet"?>
<!DOCTYPE catelog [
<!ATTLIST xsl:stylesheet
  id    ID  #REQUIRED>
]>
<catalog>
<xsl:stylesheet id="stylesheet" version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
  <html>
  <body>
  <h2>My CD Collection</h2>
    <table border="1">
      <tr bgcolor="#9acd32">
        <th>Title</th>
        <th>Artist</th>
      </tr>
      <xsl:for-each select="catalog/cd">
      <tr>
        <td><xsl:value-of select="title"/></td>
        <td><xsl:value-of select="artist"/></td>
      </tr>
      </xsl:for-each>
    </table>
  </body>
  </html>
</xsl:template>
</xsl:stylesheet>
 
    <cd>
        <title>Empire Burlesque</title>
        <artist>Bob Dylan</artist>
        <country>USA</country>
        <company>Columbia</company>
        <price>10.90</price>
        <year>1985</year>
    </cd>
    <cd>
        <title>Hide your heart</title>
        <artist>Bonnie Tyler</artist>
        <country>UK</country>
        <company>CBS Records</company>
        <price>9.90</price>
        <year>1988</year>
    </cd>
</catalog>

And this renders as such:

Please note that this doesn’t work for Internet Explorer. For IE, the XSL document must be referenced like this:

<?xml-stylesheet type="text/xml" href="report.xsl"?>
Reddit Facebook Plusone Twitter Digg Delicious Email

Resampling Wheel Algorithm

I’ve been taking the Udacity CS373 (Robotic Car) class, and this week the topic was Particle filters. Particle filters are basically a localization algorithm that accounts for error in measurements and sensors.

Anyway, part of the Particle Filter algorithm requires the generation of a new set of these things called “particles” based on the particles’ weights. So to accomplish this task, the Resample Wheel algorithm was presented in class. It is a particularly elegant method of generating a new set of particles by randomly drawing from an old set of particles (with replacement). The particle weight determines the likelihood of it being picked.

Here is the algorithm (with print statements – I used these to help me understand how the algorithm works):

import random
 
def generate_new_particles(old_particles, weights):
    N = len(old_particles)
    new_particles = []
    index = int(random.random() * N)
    beta = 0.0
    mw = max(weights)
    for i in range(N):
        beta += random.random() * 2.0 * mw
        print "beta =", beta
        while beta > weights[index]:
            beta -= weights[index]
            index = (index + 1) % N
            print "\tbeta= %f, index = %d, weight = %f" % (beta, index, weights[index])
        new_particles.append(old_particles[index])
    return new_particles
 
if __name__ == "__main__":
    old_particles = [1, 2, 3, 4]
    weights = [.3, 0, .4, .3]
    new_particles = generate_new_particles(old_particles, weights)
 
    print "old particles =", old_particles
    print "weights =", old_particles
    print "new particles =", new_particles

And here is a sample run (with my debug print statements):

beta = 0.536313558942
	beta= 0.236314, index = 0, weight = 0.300000
beta = 0.510275914962
	beta= 0.210276, index = 1, weight = 0.000000
	beta= 0.210276, index = 2, weight = 0.400000
beta = 0.954824882947
	beta= 0.554825, index = 3, weight = 0.300000
	beta= 0.254825, index = 0, weight = 0.300000
beta = 0.678700538951
	beta= 0.378701, index = 1, weight = 0.000000
	beta= 0.378701, index = 2, weight = 0.400000
old particles = [1, 2, 3, 4]
weights = [1, 2, 3, 4]
new particles = [1, 3, 1, 3]
Reddit Facebook Plusone Twitter Digg Delicious Email

JSONP for cross-origin AJAX

I was recently wanting to enable a webpage to utilize some server-side functionality which was on a different server than the one serving the original page. Traditionally, this has been problematic due to browsers’ Same origin policy. JSONP is a solution to this problem. Here is how to do it with jQuery…

This webpage can be saved locally to any machine, opened in a browser, and can then load data from my web server:

<html><head>
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js"></script>
<script type="text/javascript">
$(document).ready(function(){
    $("button").click(function(){
        $.ajax({
            url:"http://test.calebmadrigal.com/javascript/jsonp_test.json",
            dataType:'jsonp',
            jsonpCallback: "jsonp_load",
            success:function(result){
                $("#outputdiv").html(result);
            }
        });
    });
});
</script>
</head><body>
    <button>Load data</button>
    <div id="outputdiv"></div>
</body></html>

Here is what the server-side looks like:

jsonp_load("<h4>This text was loaded from http://test.calebmadrigal.com/javascript/jsonp_test.json</h4>");

And here is a live example:


Reddit Facebook Plusone Twitter Digg Delicious Email

Tail call optimization in Python

Since I’ve been getting into functional programming more recently, the fact that Python doesn’t do tail-call optimization has been bothering me. So I did a bit of searching, and found this gem: Tail Call Optimization Decorator.

Here is a snippet from it:

import sys
 
class TailRecurseException:
  def __init__(self, args, kwargs):
    self.args = args
    self.kwargs = kwargs
 
def tail_call_optimized(g):
  """
  This function decorates a function with tail call
  optimization. It does this by throwing an exception
  if it is it's own grandparent, and catching such
  exceptions to fake the tail call optimization.
 
  This function fails if the decorated
  function recurses in a non-tail context.
  """
  def func(*args, **kwargs):
    f = sys._getframe()
    if f.f_back and f.f_back.f_back \
        and f.f_back.f_back.f_code == f.f_code:
      raise TailRecurseException(args, kwargs)
    else:
      while 1:
        try:
          return g(*args, **kwargs)
        except TailRecurseException, e:
          args = e.args
          kwargs = e.kwargs
  func.__doc__ = g.__doc__
  return func
 
@tail_call_optimized
def factorial(n, acc=1):
  "calculate a factorial"
  if n == 0:
    return acc
  return factorial(n-1, n*acc)

So it is a decorator that you place on a tail-call recursive function. This basically lets the function continue executing rather than throwing the RuntimeError: maximum recursion depth exceeded exception.

Reddit Facebook Plusone Twitter Digg Delicious Email

Standard Deviation in Python

I just wanted to go through the process of calculating standard deviation today, and this is how I did it in Python. Python makes such a nice calculator :)

>>> s = [2,4,4,4,5,5,7,9]
>>> def average(s): return sum(s) * 1.0 / len(s)
... 
>>> avg = average(s)
>>> avg
5.0
>>> variance = map(lambda x: (x - avg)**2, s)
>>> variance
[9.0, 1.0, 1.0, 1.0, 0.0, 0.0, 4.0, 16.0]
>>> average(variance)
4.0
>>> import math
>>> standard_deviation = math.sqrt(average(variance))
>>> standard_deviation
2.0
>>>
Reddit Facebook Plusone Twitter Digg Delicious Email