Notes from the Ruby Manor 2008 (part 2)

So this is Ruby Manor part 2. After the Ruby Manor morning session we had a tasty pub lunch, rocked up a little late, and settled down to an interesting afternoon of Ruby talks...

Again, the errors and omissions are mine.

Jonathan Conway -- Neo4J

In the beginning we had flat files. They worked well. Then we got relational databases. Great for ad-hoc queries and they allow you to define relational constraints. Brilliant. However, there are problems...

When you try to map an object onto a relational table you can run into issues. Jonathan pointed out other issues too, but I think I missed them. My mistake (I'd just had lunch and was slightly snoozy)...

There are plenty of ORMs to choose from, such as ActiveRecord, Datamapper, Sequel and Og. When we use an ORM we move the constraints out of the database and into the model. It's a nice way to work, but you're no longer using the relational database in the way that it was designed to be used. You end up with semi-structured data (data that has very few mandatory fields but many optional fields) and sparse tables. You then find that you need to partition your tables sooner than you would do if you had properly structured data.

Maybe the relational database isn't a one-size-fits-all solution. There are other possibilities:

  • Column oriented databases (e.g. lucid).
  • Document oriented databases (such as CouchDB).
  • Graph oriented databases (e.g. Neo4J).

Neo4J uses a graph as the underlying data structure. It's based on Java, but has a small memory footprint. It's also very fast, a benefit of being embedded in the JVM.

There's a Ruby wrapper -- Neo4J.rb. It's available on GitHub.

Queries are written using Lucene. All nodes in the graph -- and the relationships between them -- can have properties defined on them. They're exposed as simple hashes in Ruby.
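
To make the "properties as hashes" idea concrete, here's a rough plain-Ruby sketch of a property graph. It is not the Neo4J.rb API and not code from the talk; Node, Relationship and their fields are hypothetical names invented for illustration.

```ruby
# Plain-Ruby sketch of a property graph. NOT the Neo4J.rb API:
# Node, Relationship and their fields are hypothetical names.
Node = Struct.new(:properties)
Relationship = Struct.new(:from, :to, :type, :properties)

alice = Node.new({ name: "Alice" })
bob   = Node.new({ name: "Bob" })

# Relationships carry their own properties, just as nodes do.
knows = Relationship.new(alice, bob, :knows, { since: 2008 })

puts alice.properties[:name]     # Alice
puts knows.properties[:since]    # 2008
puts knows.to.properties[:name]  # Bob
```

The point is only that both nodes and the edges between them are first-class things you can hang data on, which is what a join table tends to obscure.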

Jonathan doesn't think that Neo4J.rb is production ready yet, but Neo4J itself is. Also see http://neotechnology.com. It sounds pretty interesting -- well worth a look if you're interested in database technology.

Martin Loughran -- APICache

Martin got up and did a lightning talk on his APICache project; a useful little library for improving the resilience of sites that rely on (sometimes flaky) third party APIs.
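
The general idea of surviving a flaky API can be sketched as a cache that serves the last good response when a refresh fails. This is only an illustration of the technique; StaleCache and its methods are hypothetical names, not APICache's actual interface.

```ruby
# Hypothetical sketch; StaleCache is an invented name, not APICache's API.
class StaleCache
  def initialize(ttl)
    @ttl = ttl      # seconds a cached value counts as fresh
    @store = {}     # key => [value, fetched_at]
  end

  # Returns a fresh cached value if there is one; otherwise runs the block.
  # If the block raises and an older value exists, the older value is served.
  def fetch(key)
    value, fetched_at = @store[key]
    return value if fetched_at && Time.now - fetched_at < @ttl

    begin
      fresh = yield
      @store[key] = [fresh, Time.now]
      fresh
    rescue StandardError
      raise unless fetched_at   # nothing cached, so the error has to surface
      value
    end
  end
end

cache = StaleCache.new(0)   # ttl of 0: every call tries the API again
first  = cache.fetch(:feed) { "latest tweets" }
second = cache.fetch(:feed) { raise IOError, "API is down" }
puts second   # latest tweets
```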

Alex McCaw -- Acts as Recommendable

Alex works at Made By Many where he developed the Acts as Recommendable plugin. It allows you to add a recommendations system to your Rails app, and was largely developed using O'Reilly's Programming Collective Intelligence. There are three basic approaches to recommendations:

  • Content based (e.g. Bayesian filtering).
  • User based.
  • Item based.

Acts as Recommendable uses item based recommendations.
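
As a rough illustration of what "item based" means (this is not the plugin's code), the sketch below scores books against each other by how much their sets of owners overlap. The sample data, the method names and the choice of the Jaccard index are all assumptions made for the example.

```ruby
# user => books they own (invented sample data)
LIBRARY = {
  "ann"  => ["Dune", "Emma", "Ulysses"],
  "bob"  => ["Dune", "Emma"],
  "cath" => ["Dune", "Neuromancer"]
}

# Invert the data to book => list of owners.
def owners_by_book(library)
  owners = Hash.new { |hash, book| hash[book] = [] }
  library.each { |user, books| books.each { |book| owners[book] << user } }
  owners
end

# Jaccard index: shared owners divided by all owners of either book.
def jaccard(a, b)
  union = (a | b).size
  union.zero? ? 0.0 : (a & b).size.to_f / union
end

# Other books ranked by how similar their ownership is to the given book.
def similar_books(library, book)
  owners = owners_by_book(library)
  others = owners.keys - [book]
  others.map { |other| [other, jaccard(owners[book], owners[other])] }
        .select { |_, score| score > 0 }
        .sort_by { |title, score| [-score, title] }
end

p similar_books(LIBRARY, "Emma").map(&:first)   # ["Dune", "Ulysses"]
```

Because the similarities are computed between items rather than users, they can be worked out ahead of time and cached.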

To use the plugin you need to define a has_many :through relationship between your models, then call acts_as_recommendable:

class User < ActiveRecord::Base
  has_many :user_books
  has_many :books, :through => :user_books
  acts_as_recommendable :books, :through => :user_books
end

You will then have these extra methods on your models:

User#similar_users
User#recommended_books
Book#similar_books

In this example the second two methods are the really useful ones.

You can obviously store things on the join table, such as a score (if you're interested in making a rating system you can add a rating column to your join table).

You can choose from the following algorithms: Manhattan distance, Euclidean distance, Cosine, Pearson correlation coefficient, Jaccard, Levenshtein.
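
For a feel of what a few of these measures compute, here's a minimal sketch in plain Ruby using the textbook definitions. It isn't taken from the plugin, the ratings are invented, and Array#sum needs Ruby 2.4 or later.

```ruby
# Ratings two users gave to the same four items (invented data).
a = [5.0, 3.0, 4.0, 1.0]
b = [4.0, 2.0, 5.0, 1.0]

# Sum of absolute differences; smaller means more alike.
def manhattan(x, y)
  x.zip(y).sum { |u, v| (u - v).abs }
end

# Straight-line distance; smaller means more alike.
def euclidean(x, y)
  Math.sqrt(x.zip(y).sum { |u, v| (u - v)**2 })
end

# Pearson correlation: 1.0 is perfect agreement, -1.0 perfect disagreement.
def pearson(x, y)
  n = x.size
  mean_x = x.sum / n
  mean_y = y.sum / n
  cov   = x.zip(y).sum { |u, v| (u - mean_x) * (v - mean_y) }
  var_x = x.sum { |u| (u - mean_x)**2 }
  var_y = y.sum { |v| (v - mean_y)**2 }
  return 0.0 if var_x.zero? || var_y.zero?
  cov / Math.sqrt(var_x * var_y)
end

puts manhattan(a, b)   # 3.0
puts euclidean(a, b)
puts pearson(a, b)
```

Note that the first two are distances while Pearson is a correlation, so "most similar" means the smallest value for one and the largest for the other.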

Alex described some of the issues that they ran into whilst developing Acts as Recommendable. They had to work around scaling issues (such as queries that pulled every row from their tables), caching issues, problems with Ruby (it couldn't cope with data sets so large), and performance problems with the algorithm. They solved all the issues (though I didn't quite grasp the full details of how they went about it), split the cache up, and re-implemented slow code (the Pearson algorithm) in C.

Alex's advice when implementing acts_as_recommendable:

  • You don't want to have too many relationships.
  • You don't want to have too many items (but it's been tested with 100,000).
  • Set up the similarity relationships for items, not users.

To scale it further you can use K Means clustering or split your clusters by category (which is the way Amazon does it). You need to start splitting your clusters once you get to approximately 100,000 items. The plugin can probably cope with up to 200,000 items, but you can always partition your data and use Nanite or map reduce to scale it further.

This is clearly pretty clever stuff. I've missed many of the details from Alex's slides; hopefully he'll upload his slides to his blog. The code is on GitHub (for a change). There's also a Collective Intelligence video from Goruco 2008 that's worth watching.

Sean O'Halpin -- Unobtrusive Metaprogramming

What does unobtrusive mean? In this context it's "polite, considerate". About being nice. And metaprogramming? "Programs that write programs are the happiest programs in the world." (Andrew Hume)

Before you load ActiveSupport you have 75 methods on a Ruby object. After you load ActiveSupport you have 173. Sean's point is that Rails messes with stuff. By defining methods on Object, Rails can affect what you're doing in your own namespace. For example, Rails defines an inspect method on a class that hits the database. Would you expect that? More to the point, people who think that's okay frighten the crap out of me...

So (assuming you're not also slightly scared by these people) what's the problem?

  1. Namespace pollution: You can't use the names you want, your code is not safe, and you get clashes that can be hard to debug (we've all been there).
  2. Documentation pollution: rubydoc documents all methods on your object, so all this extra cruft ends up in your docs.
  3. Cognitive pollution: What does an object do? You can't tell by looking at its capabilities.

So it's about pollution then. Stinky.

Why does this happen? The obvious answer is that it's easy. But it's one thing to monkey patch classes/objects in your own application. It's another thing entirely to do it in a reusable library; you affect everybody.

How should we solve these problems?

  • Use functions that accept an object and return a modified version (i.e. you don't need to monkey patch another method onto an object -- functions are okay).
  • Use namespaces (i.e. modules).
  • Hide your mess.
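
The first two suggestions can be sketched together: a plain function that lives in your own module rather than a new method on String. This is not Sean's code, and TextTools and shout are hypothetical names.

```ruby
# Instead of reopening String to add String#shout, keep the helper
# in a module of our own. TextTools and shout are hypothetical names.
module TextTools
  module_function

  # Accepts a string and returns a modified copy; the argument is untouched.
  def shout(text)
    "#{text.upcase}!"
  end
end

greeting = "hello"
puts TextTools.shout(greeting)        # HELLO!
puts greeting                         # hello
puts String.method_defined?(:shout)   # false
```

Nobody else's String gains a method they didn't ask for, and the helper is easy to find because its namespace says where it came from.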

Sean then showed us some rather clever code for "hiding your mess". Hopefully he'll post the code online. If he does, check out the KittySafe module.

And as Sean pointed out, we each paid 23p (8 minutes out of 8 hours) to hear his talk. Bargain. It would have been even better value if somebody hadn't been heckling halfway through (cheers for your input Piers, but have you considered being slightly more unobtrusive yourself by saving your quibbles until the end of the presentation?).

James Darling -- Government Hack Day

James is organising a government hack day. He's realised that it's difficult to persuade the powers that be to open up information behind freely available APIs. Most people just don't appreciate the benefit of mashups (release the data and goodness is bound to follow) until you show them.

The plan is to get some data together (he'll be launching a wiki to collaborate on data gathering), get some interested coders together, then we can all spend 24 hours producing some incredible goodness. The results will (at the very least) help to demonstrate to those who hold the keys just how much could be done if the government adopted the same approach.

Interested parties can sign up on the UK Government Hack Day site (which was later renamed to rewiredstate.org). So sign up!

David Black -- Ruby 1.9

This is a talk about the changes in Ruby 1.9. Ruby 1.8.7 has a bunch of features that were backported from 1.9, but David is looking at the changes between 1.8.6 and 1.9.

Method and block argument semantics

Required arguments can now come after optional arguments. If you call the method below with two arguments they'll get bound to a and c (i.e. required arguments get serviced first):

def m(a, b = 1, c)
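
Filling in a body makes the binding order easy to check (the body and the argument values are just for illustration):

```ruby
def m(a, b = 1, c)
  [a, b, c]
end

p m(10, 20)      # [10, 1, 20]
p m(10, 20, 30)  # [10, 20, 30]
```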

Required args can now come after wildcard arguments:

def m(a, *b, c)
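
Again with an illustrative body, the first and last arguments are claimed by a and c, and whatever is left in the middle goes to b:

```ruby
def m(a, *b, c)
  [a, b, c]
end

p m(1, 2)        # [1, [], 2]
p m(1, 2, 3, 4)  # [1, [2, 3], 4]
```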

Arguments can be grouped with parentheses for dealing with arguments passed as arrays. If you call the method below with [1, 2], 3 then a will be 1, b 2 and c 3. If you call it with 1, 2 then a will be 1, b will be nil and c will be 2.

def m((a, b), c)
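
The two calls described above, with an illustrative body added so the bindings can be printed:

```ruby
def m((a, b), c)
  [a, b, c]
end

p m([1, 2], 3)  # [1, 2, 3]
p m(1, 2)       # [1, nil, 2]
```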

Block and method argument semantics have now effectively been merged, though they were different in previous versions of Ruby. This means you can pass optional arguments to blocks, use the parentheses above, etc.
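
A quick sketch of what that looks like on the block side (the names and values are made up for the example):

```ruby
# A block parameter with a default value.
greet = proc { |name, greeting = "Hello"| "#{greeting}, #{name}" }
puts greet.call("Matz")        # Hello, Matz
puts greet.call("Matz", "Hi")  # Hi, Matz

# Parenthesised grouping works in block parameters too.
pairs = [[[1, 2], 3]]
pairs.each { |(a, b), c| p [a, b, c] }  # [1, 2, 3]
```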

Block variable scoping rules

In Ruby 1.8:

a = 1
Proc.new { |a| }.call(30)
a #=> 30

In Ruby 1.9:

a = 1
Proc.new { |a| }.call(30)
a #=> 1

However if the variable you update isn't the block argument then it'll still get changed in Ruby 1.9:

a = 1; proc { a = 2 }.call  # a is now 2

Unless you specify it after a semicolon, which declares the variable to be a block local variable:

a = 1; proc { |x; a| a = 2 }.call  # a is still 1

Enumerators

Enumerator objects have all the methods you get in the Enumerable module, but they don't know how to find their next item. So you have to tell them. You can teach the object how to do it by passing a block when you instantiate it, or you can get it to borrow each from somewhere else.

Here's an example:

enum = Enumerator.new do |y|
  y << ["New York", "NY"]
  y << ["Florida", "FL"]
  y << ["Texas", "TX"]
end

enum.each do |state, abbr|
  puts "#{abbr} is short for #{state}"
end

Once your object knows how to handle each you can happily call all the other Enumerable methods such as map. Nice.
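
Here's a small sketch of both routes: the block-built enumerator from above being used with map, and an enumerator that borrows its iteration from an existing method via enum_for. The "abc" string is just sample data.

```ruby
enum = Enumerator.new do |y|
  y << ["New York", "NY"]
  y << ["Florida", "FL"]
  y << ["Texas", "TX"]
end

# map comes for free once each is defined.
abbrs = enum.map { |_state, abbr| abbr }
p abbrs  # ["NY", "FL", "TX"]

# Borrowing iteration from another object's method.
bytes = "abc".enum_for(:each_byte)
p bytes.map { |b| b + 1 }  # [98, 99, 100]
```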

David then showed us a neat bit of code for XOR'ing a string (that I think he wrote earlier today in response to Paul pointing out how much easier it is to do in Perl) that took advantage of these new enum capabilities. Sorry, I didn't capture the details...
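
I can't reproduce David's version, but one way to XOR a string with enumerators looks something like this. It's a guess at the general shape, not his code, and xor_string is a hypothetical name.

```ruby
# XOR every byte of text against a repeating key.
# Not David's code; xor_string is a hypothetical name.
def xor_string(text, key)
  key_bytes = key.each_byte.cycle   # endless enumerator over the key's bytes
  text.each_byte.map { |byte| (byte ^ key_bytes.next).chr }.join
end

scrambled = xor_string("Ruby Manor", "k")
puts xor_string(scrambled, "k")   # Ruby Manor
```

Calling each_byte without a block returns an enumerator, which is what lets map and cycle be chained straight onto it.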

Good talk. He's an entertaining chap.

Wrapping Up

It's been a great day. Excellent work James and Murray. Why aren't more conferences like this? I think I may actually have made money out of turning up. No really.

You see, in order to break even (having paid for the room, audio equipment, projector, etc.) the conference only needed 66 attendees. It was fully booked with 130 people. All the spare cash was put behind the bar downstairs. I don't know about you, but I've never been to a conference that cost 12 quid (quid = British pounds) and came with five free pints of beer. We didn't even manage to drink the bar dry (a poor show, but in our defence it was a subsidised university bar), so it stretched to half a curry for those of us that made it.

Genius, really.
