Taking Things Too Far: REST

Posted by Koz Monday, June 22, 2009 06:13:00 GMT

I’m going to put up a few posts based on a talk I gave at RailsConf ‘09 in Vegas and RailsWayCon in Berlin. Sorry for the delay in updating but I wanted to deliver the talks before posting here.

There’s a common pattern I see when working on code-reviews with ActionRails or my consulting work. People find a new technique, technology or idea and put it to work in their projects. At first they get a huge benefit, problems get solved quickly and things are good. Driven by this initial success they double down, they put their new tool to work in more areas of their application, they even go back over their old stuff and seek out more pure ways to apply it.

However as time passes they find the benefits aren’t quite what they used to be. Their nice new toy has turned into something which regularly gets in the way. Eventually they ‘throw that shit out’. Part of this is just the natural progression of technology: something better comes along and we adopt it. But another part of it is our tendency to overdo things. The technology we picked up isn’t shit, and the promise we saw was real. But we’ve taken it beyond its intended use, learned the wrong lessons and tied ourselves up as a result.

I’m going to cover a few techniques used in the Rails community which are great, but which turn on you if you take them too far. Starting with RESTful design.

RESTful Design

RESTful design really started catching on with Rails 1.2, and by the time 2.0 was released it had become something approaching Canon Law. Everyone who was anyone, and building a Rails application, was focussing on resources, CRUD and HTTP. There were two chief benefits of this change.

The first benefit, and the one everyone focussed on, was that you had a relatively straightforward way to add an API to your application. Back in the pre-REST dark ages everyone who was designing an API for their application had to make a bunch of decisions about how they wanted to build it.

  • Do you re-use your controllers, or have a separate ApiController?
  • How do you pass arguments around?
  • What should the URLs look like?
  • Perhaps XML-RPC or SOAP is the right way?

With REST you get answers to all those questions, and instead of worrying about that, you just get on with building your Application.

Even for applications without an API, REST gives you some benefits. You avoid discussions about what your controllers and actions should be called, and what your URLs should look like. It also makes it easier for new developers to get up to speed with your project. Almost every rails developer now knows that if you’re looking for the thing which creates posts, you’ll be looking at PostsController#create.

Taking it Further

If we look at a slightly more complicated example, we can see the beginnings of the friction that comes from taking things too far. Take an example of a site which lets people save bookmarks and write blog posts, and lets users comment on one another’s data. The most common way to approach this design would be:

  map.resources :bookmarks, :has_many => [:comments]
  map.resources :posts,     :has_many => [:comments]

The nice thing about this design is that the URLs will reflect the underlying structure of the data you’re managing. For example the URL for comments on post number 5 will be /posts/5/comments and for bookmark 3 will be /bookmarks/3/comments. However where it starts to get a /little/ annoying is when you want to do something generic to all comments, like providing a ‘mark as spam’ link alongside a comment. Because comments exist solely as a child of the Commentable we can’t generate the URLs without knowing the class and id of that object. So it’s just that little bit more difficult to deal with comments generically (e.g. in an admin interface). This tends to lead to you writing a helper something like this:

  def spam_comment_url(comment)
    # Pick the nested route helper that matches the comment's parent.
    case o = comment.commentable
    when Post
      spam_post_comment_url(o, comment)
    when Bookmark
      spam_bookmark_comment_url(o, comment)
    end
  end

Now this is a good indicator that you should probably also have a top-level resource for your comments, and thankfully there’s a feature for this case which gives you a nice pragmatic way out.

  # First define the top-level comment resource
  map.resources :comments, :member => {:spam=>:post}

  # then add a shallow collection under each of the commentables
  map.resources :bookmarks do |bookmarks|
    bookmarks.resources :comments, :shallow=>true
  end

  map.resources :posts do |posts|
    posts.resources :comments, :shallow=>true
  end
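
To make the difference concrete, here’s a rough plain-Ruby sketch (no Rails involved) of the two ways of building the spam URL. The Struct classes and the nested_spam_path and shallow_spam_path helpers are stand-ins I’ve made up for illustration; in a real application the route helpers generated by map.resources do this work for you.

```ruby
# Plain-Ruby stand-ins for the models; these are NOT the Rails classes.
Post     = Struct.new(:id)
Bookmark = Struct.new(:id)
Comment  = Struct.new(:id, :commentable)

# Nested-only routing: the path depends on the parent's class and id.
def nested_spam_path(comment)
  parent = comment.commentable
  prefix = case parent
           when Post     then "posts"
           when Bookmark then "bookmarks"
           else raise ArgumentError, "unknown commentable: #{parent.class}"
           end
  "/#{prefix}/#{parent.id}/comments/#{comment.id}/spam"
end

# Top-level comment resource: only the comment's own id is needed.
def shallow_spam_path(comment)
  "/comments/#{comment.id}/spam"
end

comment = Comment.new(2, Post.new(5))
nested_spam_path(comment)
shallow_spam_path(comment)
```

With the top-level resource, something like an admin interface can build the link from the comment alone, without caring what it hangs off.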

Taking it Too Far

Unfortunately people often get started with REST and love the way it simplifies their designs and gives them conventions to follow. They then take their new rose coloured glasses and start making sure everything in their app is “purely RESTful”. Every new design decision must be perfectly RESTful, anything which looks like RPC is instantly purged from the application.

Taking this more extreme approach to the problem of marking comments as spam they’ll say something like:

When you think about it, marking a comment as spam is really creating the SpamScore child resource of the comment with the value of spam set to true

And build something like this, so when they want to mark a comment as spam they ‘only’ have to construct a POST request to the bookmark_comment_spam_score_url of /bookmarks/1/comments/2/spam_score:

  map.resources :bookmarks do |bookmarks|
    bookmarks.resources :comments do |bookmark_comments|
      bookmark_comments.resource :spam_score
    end
  end

While this may be purely RESTful, it’s much more complicated than the ‘impure’ RPC approach taken above with a simple URL like /comments/1/spam. Plus if you want to get truly pure your routes should probably be more like this:

map.resources :users do |users|
  users.resources :bookmarks do |bookmarks|
    bookmarks.resources :comments do |bookmark_comments|
      bookmark_comments.resource :spam_score
    end
  end
end

The advice I typically give when I come across a “complex but pure” model like this is to go back to basics and remember why we originally started using REST. Does it help us make an API? Does it make things simpler for new developers to follow? Does it make it easier to work with some of the great plugins out there? If the answer to all those questions is no, you should probably dial back the purity and do the pragmatic thing.

Updates

Posted by Koz Monday, June 22, 2009 06:08:00 GMT

Just a few quick updates about what’s going on around here.

Sponsorship

New Relic have generously been sponsoring this site for a few months now. I’ve been a user of their RPM product for a while and recommend it to all my clients, so when they offered to sponsor the site I was happy to accept. In addition to the product, they also have some interesting stuff on their performance-focussed RailsLab site. I don’t intend to talk about anything related to their products or services here, but in case I secretly turn into a shill, this counts as full disclosure.

Comments Policy

A few comments have been less-than-friendly to either myself, or other commenters. So my new comment policy is Be civil. Feel free to disagree with any posts or comments here, but do so civilly. Anything which violates this new policy will just be deleted without comment. However life’s too short to spend time moderating comments, so let’s all just be nice.

Action Rails

I’ve teamed up with Pratik Naik and Mike Gunderloy to form Action Rails. We provide a number of services which could be of interest to readers here. If you want your application’s code and architecture reviewed by a team of experts, ongoing support for your development team or help with a one-off specific problem we can help. For more information see the details at ActionRails.com or email sales@actionrails.com

Uploading Files

Posted by Koz Thursday, April 23, 2009 23:46:00 GMT

Anyone who’s built a rails application that deals with large file uploads probably has a few horror stories to tell about it. While some people love to overstate the issues for their own purposes, it’s still something that can be quite challenging to do well.

What’s the Problem?

As I mentioned in the article on File Downloads, your rails processes are a scarce resource. You need them to be free to handle your applications’ requests, if they’re all busy, your users will be left waiting. When we optimised the download processes we made sure that we used our webservers instead of tying up a rails process to spoon feed the file out over the network to your users. Dealing with uploads has a similar problem.

When a browser uploads a file, it encodes the contents in a format called ‘multipart mime’ (it’s the same format that gets used when you send an email attachment). In order for your application to do something with that file, rails has to undo this encoding. To do this requires reading the huge request body, and matching each line against a few regular expressions. This can be incredibly slow and use a huge amount of CPU and memory.
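
To give a feel for the work involved, here’s a deliberately naive sketch of undoing that encoding in plain Ruby. It is not the parser rails uses; parse_multipart, the XYZ boundary and the sample body are all made up for illustration, and a real parser has to cope with streaming input, binary data and malformed requests.

```ruby
# Naive multipart/form-data parser, for illustration only.
# Returns an array of hashes with :headers, :filename and :content.
def parse_multipart(body, boundary)
  delimiter = "--#{boundary}"
  body.split(delimiter).map do |chunk|
    # Skip the empty preamble and the closing "--" marker.
    next if chunk.strip.empty? || chunk.start_with?("--")
    headers, content = chunk.sub(/\A\r\n/, "").split("\r\n\r\n", 2)
    {
      :headers  => headers.split("\r\n"),
      :filename => headers[/filename="([^"]*)"/, 1],
      :content  => content.to_s.chomp("\r\n")
    }
  end.compact
end

# A tiny hand-built request body with a single uploaded file.
body = [
  "--XYZ",
  'Content-Disposition: form-data; name="upload"; filename="notes.txt"',
  "Content-Type: text/plain",
  "",
  "hello world",
  "--XYZ--",
  ""
].join("\r\n")

parts = parse_multipart(body, "XYZ")
```

Even this toy version has to walk the entire body looking for the boundary, which is why a 100M upload hurts so much.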

While this parsing is happening your rails process is busy, and can’t handle other requests. Pity the poor user stuck behind a request which contains a 100M upload!

What’s not the problem?

Some people seem to think that the File Upload problem with rails is that the entire process is blocked while the browser sends the encoded body to you. This isn’t true, and hasn’t been for a long time. Whether you’re using nginx + mongrel, apache + mongrel or apache + passenger, your web server buffers the entire request before rails locks itself for processing. So no matter how slow a user’s connection is, your application isn’t locked while they upload their file.

What can you do?

There are a number of unattractive options to work around this slow multipart-parsing. The most common I’ve seen is to send uploads to a non-rails process such as a CGI script or a merb/mongrel/rack application. CGI scripts have the obvious disadvantage that you need to write a script simple enough to start up quickly and featured enough to process your uploads. Doing it in rack leaves you relying on ruby’s threading to handle parallelism. This is probably not what you want and your throughput is probably much lower than it would be without that upload being processed.

What else can you do?

Because neither of these options was acceptable, Pratik Naik and I have built Mod Porter, an Apache module that does the heavy lifting for your file uploads. All of the hard stuff is done by libapreq though, so you don’t have to worry about using C code written by two ruby programmers!

Porter is essentially the inverse of X-SendFile. It parses the multipart post in C inside your apache process and writes the files to disk. Once that work is done it changes the request to look like a regular form POST which contains pointers to the temp files on disk. To maintain system security it also signs the modified parameters so people can’t attack your system like those old PHP apps.
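
The signing idea can be sketched with a generic HMAC. To be clear, this is not Porter’s actual scheme or parameter format; the secret, the parameter string and the sign and valid_signature? helpers are all assumptions for illustration.

```ruby
require "openssl"

# Hypothetical shared secret; the web server module and the app would both know it.
SECRET = "a-long-random-shared-secret"

# Sign a parameter string so the app can tell it came from the trusted module.
def sign(params_string, secret = SECRET)
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("SHA256"), secret, params_string)
end

# Recompute the signature and compare. Production code should use a
# constant-time comparison rather than ==.
def valid_signature?(params_string, signature, secret = SECRET)
  sign(params_string, secret) == signature
end

# Made-up parameters pointing at a temp file written by the web server.
params    = "upload[path]=/tmp/apreq123&upload[filename]=avatar.png"
signature = sign(params)

valid_signature?(params, signature)
valid_signature?(params.sub("/tmp/apreq123", "/etc/passwd"), signature)
```

Without the secret, an attacker who rewrites the file path can’t produce a matching signature, so the application can refuse the request.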

This means that your rails processes don’t have to deal with anything more than a regular form post which is nice and fast. In addition to the apache module, Porter also includes a Rails Plugin which hides all of this from you. It makes an upload handled by Porter, look just like a regular Rails Upload.

How fast is it?

The speed of upload parsing isn’t particularly relevant, the reduced locking is far more important. Your user’s internet connection is much more important for the round-trip upload performance than your upload handler’s parser.

Having said all that, Porter runs significantly faster than the equivalent pure-ruby parsing code. Depending on the size and number of uploads we’ve seen response times between 30 and 200 times as fast. That’s not just compared to rails’ upload parser, it’s that much faster than every other ruby mime parser we tried.

Isn’t this just like the Nginx module?

Kinda. We’ve been thinking about this module ever since we started using lighttpd’s X-SendFile header. When I saw the nginx module get released I decided to start planning the Apache equivalent. Porter is completely transparent to your application, you don’t need a special form action, and you don’t need to tell Porter what form fields to pass through to the web application. This means you can use porter in production, and mongrel or thin in development, without any changes to your application.

The biggest improvement from this is that you don’t need to change your nginx config every time you add a new input to a form, or a new file upload to your application. This is extremely tedious and error prone, especially when making these changes involves a support ticket with your hosting provider. The major goal we have with Porter is to make sure it always ‘Just Works’, so you can put a file upload into any form without having to worry about your web server.

Getting Started

Porter is still beta software, so you’re strongly advised to test it first, but you already knew that. The porter website has the installation instructions. Once you’ve got that done you’ll need to add the rails plugin, and configure them to share a nice secure secret. Then, hopefully, your application will Just Work but your uploads will be much less painful.

If you have any issues getting it running, leave us a note on the GitHub issues page.

Controller Inheritance

Posted by Koz Monday, April 06, 2009 23:20:00 GMT

Just a brief interlude from the File Management series while I sort out some time to do some benchmarking.

A common pattern I see in submissions and client-applications is repetitive declarations in controllers. There’s a neat and simple solution for this, but given how often I see it, it seems worth covering. Things like this:

  class TodoController < ApplicationController
    before_filter :login_required
    before_filter :handle_iphone_support
    before_filter :fetch_todo
    around_filter :performance_logging

    def index
    ...
    end
  end

For a single controller this set of declarations wouldn’t be a problem; the trouble starts when all those declarations end up duplicated in several different controllers. Thankfully every controller inherits from ApplicationController so we can do something like this:

  class ApplicationController < ActionController::Base
    before_filter :login_required
    before_filter :handle_iphone_support
    around_filter :performance_logging

  end

  class TodoController < ApplicationController
    before_filter :fetch_todo

    def index
    ...
    end
  end

This is easy, and obvious and most people do that. However what do you do when you have several controllers which don’t need the login_required call? One option is to give up on inheritance and go back to copying-and-pasting the filter declarations. Another option is to selectively opt-out of the parent controller’s filters.

  class ApplicationController < ActionController::Base
    before_filter :login_required
    before_filter :handle_iphone_support
    around_filter :performance_logging
  end

  class SignupController < ApplicationController
    skip_before_filter :login_required
  end

However if you have several controllers which don’t require logins, you’ll find yourself duplicating the skip_before_filter declarations around. The same goes for any other filters and declarations which aren’t completely universal. An approach which solves this nicely is to introduce an abstract parent controller for all your authenticated controllers.

  class ApplicationController < ActionController::Base
    before_filter :handle_iphone_support
    around_filter :performance_logging
  end

  class AuthenticatedController < ApplicationController
    before_filter :login_required
  end 

  class TodoController < AuthenticatedController
    before_filter :fetch_todo

    def index
    ...
    end
  end

This technique doesn’t just work with filter declarations; most other declarations such as caching, session and csrf options work as expected too. In addition you can introduce several parent classes as needed, such as a PublicController with page caching declarations, an AdminController for your admin panels etc. Inheritance isn’t the solution to every case of duplicate declarations, yet it’s a simple technique that can simplify most uses.
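
If you want to see why this works without booting rails, here’s a toy model of class-level declarations being inherited. ToyController and its before_filter are made-up stand-ins, and this is not how ActionController implements filters internally; it just shows each subclass picking up its parents’ declarations.

```ruby
# A toy stand-in for ActionController::Base, just enough to show inheritance.
class ToyController
  def self.before_filter(name)
    own_filters << name
  end

  # Filters declared directly on this class (stored per class).
  def self.own_filters
    @own_filters ||= []
  end

  # Filters declared on parent classes come before the class's own filters.
  def self.filter_chain
    inherited = superclass.respond_to?(:filter_chain) ? superclass.filter_chain : []
    inherited + own_filters
  end
end

class ToyApplicationController < ToyController
  before_filter :handle_iphone_support
end

class ToyAuthenticatedController < ToyApplicationController
  before_filter :login_required
end

class ToyTodoController < ToyAuthenticatedController
  before_filter :fetch_todo
end

# Public controllers inherit from the application controller directly,
# so they never see login_required and need no skip declarations.
class ToySignupController < ToyApplicationController
end

ToyTodoController.filter_chain
ToySignupController.filter_chain
```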

Storing Your Files

Posted by Koz Monday, March 16, 2009 03:51:00 GMT

This is the second article in my series on file management, the third article will cover the challenges of handling uploads then we should be able to move on to some more advanced topics.

The second problem you’ll face when building an application to handle files is where and how to store them. Thankfully there are lots of well-supported options, each with their own pros and cons.

The local file system

If your application only runs on a single server, the simplest option is to store them on the local disk of your web/application server. This leaves you with very few moving parts, and you know that both your rails application and your webserver can see the same files, at the same location. But even though this is a simple option there are a few things that you need to be careful of.

A common mistake I see is to use a single directory to handle all of the users’ uploaded files. So your directory structure ends up looking something like this:

/home/railsway/uploads/koz_avatar.png
/home/railsway/uploads/dhh_avatar.png
/home/railsway/uploads/other_avatar.png

The first, and most obvious, problem with this structure is that unless you’re careful you could end up with users overwriting each other’s files. The second, and more painful problem is that you end up with too many files in a single directory which will cause you some pain when you try to do things like list the directory or start removing old files.

A better bet is to store the uploads in a directory which corresponds to the ID of the object which owns those files. But something like the following will also leave you with a huge directory:

/home/railsway/uploads/1/koz_avatar.png
/home/railsway/uploads/2/dhh_avatar.png
/home/railsway/uploads/3/other_avatar.png

The best bet is to partition that directory into a number of sub directories like this:

/home/railsway/uploads/000/000/001/koz_avatar.png
/home/railsway/uploads/000/000/002/dhh_avatar.png
/home/railsway/uploads/000/000/003/other_avatar.png

Thankfully both of the popular file management plugins have built-in support for partitioned storage: :id_partition in paperclip and :partition in attachment_fu.
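
If you’re rolling your own storage, the partitioning itself is only a couple of lines. This is a minimal sketch rather than either plugin’s code; id_partition and upload_path are names I’ve picked for illustration, and it assumes integer ids below one billion.

```ruby
# Turn a numeric id into a three-level partitioned path, e.g. 1 -> "000/000/001".
# Assumes integer ids below one billion (nine digits).
def id_partition(id)
  ("%09d" % id).scan(/\d{3}/).join("/")
end

# Build the full on-disk location for an uploaded file.
def upload_path(root, id, filename)
  File.join(root, id_partition(id), filename)
end

upload_path("/home/railsway/uploads", 2, "dhh_avatar.png")
```

With three digits per level, no directory ever holds more than a thousand entries.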

NFS, GFS and friends

Once you’ve grown beyond a single app / web server, using the file-system gets a little more complicated. In order to ensure that all your app and web servers can see the same files you have to use a shared file system of some sort. Setting up and running a shared file system is beyond the scope of this site, but a few words of caution.

It’s deceptively easy to set up a simple NFS server for your network and just run your application as you did when it was on a single disk, but some things which are cheap on local disk are slow and expensive over NFS and friends. Make sure you stress test your file server and pay an expert to help you tune the system. The bigger problem I’ve had with NFS and GFS is the impact of downtime or difficulties on your application. Your NFS server becomes a single point of failure for your whole site, and a minor network glitch can render your application completely useless as all the processes get tied up waiting on a blocking read from an NFS mount that’s gone away.

You can solve all those kinds of problems by hiring a good sysadmin and / or spending a large amount of money on serious storage hardware. It’s not a path that I personally choose, but it’s definitely an option you should consider.

Amazon S3

It’s not really possible to write about storage without touching on Amazon S3. In case you’ve been living under a rock for a few years S3 is a hugely scalable, incredibly cheap storage service. There are several good gems to use with your applications and the major file management plugins provide semi-transparent S3 support.

S3 isn’t a file system so there are several things which you have to do differently, however there are alternatives for most of those operations. For instance, instead of using X-Sendfile to stream the files to your user, you redirect them to a signed URL on Amazon’s own service. By way of example, our download action from the earlier article would look like this if using S3 and marcel’s s3 library:

def download
  redirect_to S3Object.url_for('download.zip',
                               'railswayexample',
                               :expires_in => 3.hours)
end

But there are a few things you have to be careful with when using S3. The first is that uploading to S3 is much slower than simply writing your file to local disk. Unless you want your rails processes to be tied up for ages, you’ll probably want to have a background job running which transfers the files from your server up to amazon’s. Another factor is that when S3 errors occur your users will be greeted by a very ugly error page.

Finally there’s always the risk of amazon having another bad day which takes your application down for a few hours. Amazon’s engineers are pretty amazing, but nothing’s perfect.

Other options

There are a few options I’ve not used before, but you could investigate:

BLOBs in your database

I’ve never been a fan of using BLOBs to store large files, however some people swear by them. If you’re aware of great tutorial resources for BLOBs and rails, let me know and I’ll link to them from here.

Rackspace’s Cloud Files

When it was first announced Cloud Files from rackspace seemed like it was going to be a great competitor to S3. However there’s currently no equivalent to S3’s signed-url authentication option which means downloads become much harder. To use Cloud Files would require you to build a streaming proxy in your application, and use it to stream files from rackspace back out to the user. You’d also have to pay for the bandwidth twice, once from rackspace, and once from your hosting provider.

This makes it much more complicated than S3 but hopefully this will be addressed in a future release.

MogileFS

MogileFS is a really interesting option. It has some similarities to S3 in that it’s a write-once file storage system which operates over HTTP. But unlike S3 it’s open source software you can run on your own servers. Unfortunately MogileFS is really thinly documented and quite difficult to get up and running. If you know of a really good getting-started tutorial for MogileFS, let me know and I’ll link to it from here.

It also would require you to use perlbal for your load balancer or find an apache module that can support X-Reproxy-Url.

Conclusion

There are a bunch of different options you should consider when picking the storage for your file uploads. Generally my advice would be to start with simple on-disk partitioned storage and grow from there. Don’t rush straight to S3 because all the blogs tell you to; stay as simple as possible for as long as you can.

File Downloads Done Right

Posted by Koz Sunday, February 22, 2009 02:44:00 GMT

Getting your file downloads right is one of the most important parts of your File Management functionality. A poorly implemented download function can make your application painful to use, not just for downloaders, but for everyone else too.

Thankfully it’s also one of the easiest things to get right.

The simple version

For the purposes of this article let’s assume that your application needs to provide access to a large zip file, but that access should be restricted to logged in users.

The first choice we have to make is where to store this file. In this case there’s really only one wrong answer, and that’s to store it in the public folder of your rails application. Every file stored in public will be served by our webserver without the involvement of our rails application. This makes it impossible for us to check that the user has logged in. Unless your files are completely public, you shouldn’t go anywhere near the public folder.

So let’s assume we’ve stored the zip file in:

/home/railsway/downloads/huge.zip

Next we need a simple download action to send the file to the user, thankfully rails has this built right in:

  before_filter :login_required
  def download
    send_file '/home/railsway/downloads/huge.zip', :type=>"application/zip" 
  end

Now when our users click the download link, they’ll be asked to choose a location and then be able to view the file. The bad news is, there’s a catch here. The good news is it’s easy to fix.

What’s the catch?

The problem here is one of scarce resources, and that resource is your rails processes. Whether you’re using mongrel, fastcgi or passenger you have a limited number of rails processes available to handle application requests. When one of your users makes a request, you want to know that you either have a process free to handle the request, or that one will become free in short order. If you don’t, users will face an agonizing wait for pages to load, or see their browser sessions timeout entirely.

When you use the default behaviour of send_file to send the file out to the user, your rails process will read through the entire file, copying the contents of the file to the output stream as it goes. For small files like images this probably isn’t that big of a deal, but for something enormous like a 200M zip file, using send_file will tie up a process for a long time. Users on slow connections will soak up a rails process for correspondingly longer.
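
A quick back-of-the-envelope calculation shows how bad this gets. The connection speed here is a hypothetical figure I’ve picked, and the sketch assumes the rails process stays busy for the whole transfer.

```ruby
# How long one download ties up a process when the app streams the file itself.
def seconds_tied_up(file_bytes, bytes_per_second)
  file_bytes.to_f / bytes_per_second
end

file_size = 200 * 1_000_000   # the 200M zip from the example, in bytes
slow_link = 125_000           # hypothetical 1 Mbit/s connection, in bytes per second

minutes = seconds_tied_up(file_size, slow_link) / 60.0
```

Under those assumptions a single download keeps one process busy for roughly 27 minutes, so a handful of them is enough to use up a small pool.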

If you get a large number of downloads running, you may find all your rails processes taken up by downloaders, with none left to serve any other users. For all intents and purposes your site is down: you’ve instituted a denial of service attack against yourself.

What about threads?

Unfortunately threads in ruby won’t save us. The combination of blocking IO and green threads means that even though you’re doing the work in a thread, it’s blocking the entire process most of the time anyway. JRuby users may get a performance improvement, but it’s still going to be a noticeable consumption of resources when compared to letting a web server stream the file.

Don’t believe everything you read on the internet, threads and ruby just won’t help you with most of this stuff.

So What’s the Solution?

Thankfully this problem was solved a long time ago by the guys at LiveJournal. They used perl instead of ruby, but had the same problems. Downloading files would block their application processes for too long, and cause other users to have to wait. Their solution was elegant and simple. Instead of making the application processes stream the file to the user, they simply tell the webserver what file to send, and let the web server bother with the details of streaming the file out to the client.

Their particular solution is quite cumbersome to set up and use, but there’s a very similar solution available called X-Sendfile. It’s supported out of the box with later versions of lighttpd, and available as a module for apache.

The way it works is instead of sending the file to our users, our rails application will simply check they’re allowed to download it (using our login_required filter) then write the name of the file into a special response header then render an empty response. Once apache sees that response it will read the file from disk and stream it out to the user. So your headers will look something like:

X-Sendfile: /home/railsway/downloads/huge.zip

The apache module has a slightly annoying default setting that prevents it from sending files outside the public folder, so you’ll need to add the following configuration option:

XSendFileAllowAbove on

Thankfully for rails users x-sendfile support is built right in to rails, allowing us to make a few minor changes and we’re done.

  
  before_filter :login_required
  def download
    send_file '/home/railsway/downloads/huge.zip', :type=>"application/zip", :x_sendfile=>true
  end

With that, we’re done. Our rails process just makes a quick authorization check and renders a short response, and apache uses its own optimised file streaming code to send the file down to our users. Meanwhile, the rails process is free to go on to the next request.
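
Stripped of rails, the response the application hands back is tiny. Here’s a rough Rack-style sketch of it; download_response and its logged_in argument are made-up stand-ins for the action and the login_required filter, not anything rails provides.

```ruby
DOWNLOAD_PATH = "/home/railsway/downloads/huge.zip"

# A Rack-style endpoint: returns [status, headers, body].
# logged_in is a stand-in for whatever login_required checks.
def download_response(logged_in)
  return [403, { "Content-Type" => "text/plain" }, ["Forbidden"]] unless logged_in

  headers = {
    "Content-Type" => "application/zip",
    "X-Sendfile"   => DOWNLOAD_PATH
  }
  [200, headers, []]   # empty body: the web server fills it in from disk
end

status, headers, body = download_response(true)
```

The application’s part is a few hundred bytes of headers, no matter how big the file is.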

Nginx users can use a similar header called X-Accel-Redirect. This is a little more fiddly to set up, and requires your application to write a special internal URL to the http response rather than the full path, but in terms of scalability and resource contention, it’s just as great. There’s an intro to the nginx module available if you’re an nginx user. If only uploads were this easy!
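
The difference on the application side is just which value goes in the header. In this sketch the /protected_downloads prefix is a hypothetical internal location you’d have to define in your own nginx config, and accel_redirect_headers is a name I’ve made up.

```ruby
DOWNLOAD_ROOT   = "/home/railsway/downloads"
INTERNAL_PREFIX = "/protected_downloads"   # hypothetical internal nginx location

# Map a path on disk to the internal URL nginx should serve instead.
def accel_redirect_headers(path)
  relative = path.sub(/\A#{Regexp.escape(DOWNLOAD_ROOT)}/, "")
  {
    "Content-Type"     => "application/zip",
    "X-Accel-Redirect" => "#{INTERNAL_PREFIX}#{relative}"
  }
end

accel_redirect_headers("/home/railsway/downloads/huge.zip")
```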

Up Next

The next article in the series will cover my experiences when dealing with the storage of your files. Should you use S3? What about blobs, NFS, GFS or MogileFS?

File Management

Posted by Koz Thursday, February 12, 2009 05:34:00 GMT

One of the most common features for web applications I’ve built over the last 4 years doing rails is file management. Users download file attachments in almost every web application I use. Thankfully rails has a really capable suite of file management tools, and there are several great plugins to handle some of the more mundane functionality you’d need.

Over the next month or so I’m going to cover the set of techniques I use when building file management solutions for my clients, and some really exciting up and coming solutions which solve the last of my annoyances.

The rough order of business will be:

  1. File Downloads Done Right
  2. File Management Plugins
  3. Painless File Uploads
  4. Storing your Files

If there’s anything in particular that you’d like to see covered, let me know in the comments.

Requests Per Second

Posted by Koz Tuesday, January 06, 2009 07:12:00 GMT

One of the most harmful things about people discussing the performance of web applications is the key metric that we use. Requests per second seems like the obvious metric to use, but it has several insidious characteristics that poison the discourse and lead developers down ever deeper rabbit holes chasing irrelevant gains. The metric prevents us from doing A/B comparisons, or discussing potential improvements without doing some mental arithmetic which appears beyond the capabilities of most of us.

Instead of talking about requests per second, we should always be focussed on the duration of a given request. It’s what our users notice, and it’s the only thing which gives us a meaningful basis for comparison.

I should prefix the remaining discussion here by saying that most of it does not apply to discussing performance problems at the scale of facebook, google or yahoo. The thing is, statistically speaking, none of you are building applications that will operate at that scale. Sorry if I’m the one who broke this to you, but you’re not building the next google :).

I should also state that requests per second is a really interesting metric when considering the throughput of a whole cluster. But throughput isn’t performance.

Diminishing marginal returns

The biggest problem I have with requests per second is the fact that developers seem incapable of knowing when to stop optimising their applications. As the requests per second get higher and higher, the improvements become less and less relevant. This lets us think we’ve defeated that pareto guy, while we waste ever-larger amounts of our employers’ time.

Let’s take two different performance improvements and compare them using both duration and req/s.

Patch   Before       After        Improvement
A       120 req/s    300 req/s    180 req/s
B       3000 req/s   4000 req/s   1000 req/s

As you can see, when you use req/s as your metric, change B seems like a MUCH bigger saving. It improves performance by 1000 requests a second instead of that measly 180, give that guy a raise! But let’s see what happens when we switch to using durations:

| Patch | Before | After | Improvement |
|-------|--------|-------|-------------|
| A | 8.33 ms | 3.33 ms | 5 ms |
| B | 0.33 ms | 0.25 ms | 0.08 ms |

You can see that the actual change in duration for B is vanishingly small: 8% of one millisecond! Odds are that the improvement will vanish into statistical noise when compared to the latency of your network, or your user’s internet connection.

But when we use requests per second, that 1000 is so big and enticing that developers will do almost anything to get it. If they used durations as their metric, they’d probably have spent that time implementing a neat new feature, or responding to customer feedback.
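The conversion behind the two tables is just 1000 divided by the request rate. A minimal Ruby sketch makes the comparison repeatable; the helper names `duration_ms` and `saving_ms` are hypothetical and not part of any library:

```ruby
# Hypothetical helpers for illustration; not part of Rails or any library.

# Average time spent on one request, in milliseconds, for a given throughput.
def duration_ms(requests_per_second)
  1000.0 / requests_per_second
end

# Milliseconds saved per request when throughput goes from before to after.
def saving_ms(before_rps, after_rps)
  duration_ms(before_rps) - duration_ms(after_rps)
end

{ "A" => [120, 300], "B" => [3000, 4000] }.each do |patch, (before, after)|
  puts format("Patch %s: %d req/s gained, %.2f ms saved per request",
              patch, after - before, saving_ms(before, after))
end
```

Run as-is, this reports 5.00 ms saved for patch A and 0.08 ms for patch B, the same figures as the duration table.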

Deltas become meaningless

A special case of my first complaint is that with requests per second the deltas aren’t meaningful without knowing the start and the finish points. As I showed above, a 1000 req/s change could be a tiny change, but it could also be an amazing performance coup. Take this next example:

| Before | After | Diff |
|--------|-------|------|
| 1 req/s | 1001 req/s | 1000 req/s |

When expressed as durations, you can see that it made a huge difference:

| Before | After | Diff |
|--------|-------|------|
| 1000 ms | 0.99 ms | 999.01 ms |

So 1000 requests per second could either be irrelevant, or fantastic. Durations don’t have this problem at all: 0.08 ms is obviously questionable, and 999.01 ms is an obvious improvement.

This problem most commonly expresses itself when people say “that changeset took 50 requests per second off my application”. Without the before and after numbers, we can’t tell if that’s a big deal, or if the guy needs to take a deep breath and get back to work.

The numbers don’t add up

Finally, requests per second don’t lend themselves nicely to arithmetic, and that leads developers to make silly decisions. The most common case I see is when people compare web servers to put in front of their Rails applications. The reasoning goes something like this:

Nginx does over 9000 requests per second, and apache only does 6000 requests per second!! I’d better use nginx unless I want to pay a 3000 requests per second tax.

When people do this comparison they seem to believe that by switching to nginx from apache their application will go from 100 req/s to 3100 req/s. As always, durations tell us a different story.

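| Metric | Apache | Nginx | Diff |
|--------|--------|-------|------|
| Throughput | 6000 req/s | 9000 req/s | 3000 req/s |
| Duration per request | 0.16 ms | 0.11 ms | 0.05 ms |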

So odds are you’ll gain only about 0.05 ms, 5% of a millisecond, by switching. Perhaps that improvement is worthwhile for your application, but is it worth the additional complexity?
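To see why the extra 3000 req/s never materialises, add the durations rather than the rates. The sketch below assumes each request passes through the web server and then the application one after the other, so the per-request times simply add up. `combined_rps` is a hypothetical helper, and the 100 req/s application is the figure used in the reasoning above.

```ruby
# Hypothetical helper: throughput of stages that each request passes through
# in sequence, assuming the per-request durations simply add up.
def combined_rps(*stage_rps)
  total_ms = stage_rps.sum { |rps| 1000.0 / rps }
  1000.0 / total_ms
end

app = 100 # req/s for the Rails application on its own

puts format("Behind Apache: %.1f req/s", combined_rps(app, 6000))
puts format("Behind nginx:  %.1f req/s", combined_rps(app, 9000))
```

Under those assumptions the switch moves the application from roughly 98.4 to 98.9 req/s, nowhere near 3100.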

Conclusion

Durations are a much more useful, and more honest, metric when comparing performance changes in your applications. Requests per second is too widespread for us to stop using it entirely, but please don’t use it when talking about the performance of your web applications or libraries.

@the_rails_way.awaken!

Posted by Koz Friday, January 02, 2009 03:52:00 GMT

After more than 15 months since the last post, and constant questions from users, I’m finally ready to bring The Rails Way back from hibernation.

The challenge I had here was the amount of time involved. Review articles are incredibly time consuming: scouring an application for code to improve can take hours, making the changes takes time, and all of that depends on getting that perfect submission.

So while I intend to continue to do review pieces here (keep those submissions coming in) I’m going to extend the format here to include a few different kinds of posts. I’ll be doing some focussed introductory pieces which cover the best practices for a few tricky areas that I see experienced rails programmers getting wrong. I’ll also be doing a few ‘soapboxy’ pieces where I can address misinformation about Rails and Ruby or just advocate a particular piece of technology or code that I think is really cool.

One of my primary goals with this relaunch will be to post regularly, but I’m not going to try and stick to a schedule that takes the fun out of it for me. Some weeks might see multiple posts, and others will see none at all. I’m just hoping that you guys will enjoy most of them.

Many Skinny Methods

Posted by Koz Thursday, October 04, 2007 08:12:00 GMT

This refactoring is based on a topic Marcel and I covered at RailsConf Europe.

Before

class Expense < ActiveRecord::Base
  belongs_to :payee
  protected

    # Nice and concise, but what happens as we add more rules
    # and how do we write test cases for the four different possible 
    # validation states?
    def validate
      errors.add("Not enough funds") if payee.balance - amount > 0
      errors.add("Charge is too great") if payee.account.maximum_allowable_charge > amount
    end
end

After

class Expense < ActiveRecord::Base
  belongs_to  :payee

  # Instead of one large validation method, break each individual
  # rule into methods, and declare them here.
  validate      :ensure_balance_is_sufficient_to_cover_amount
  validate      :amount_does_not_exceed_maximum_allowable_charge

  protected

    # These validation callbacks simply add error messages if a particular 
    # condition is met.   Each of them can be tested and understood on their own
    # without having to understand the entire body of the validate method.
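    # errors.add takes an attribute first; :base attaches the message to the
    # record as a whole instead of treating the message as an attribute name.
    def ensure_balance_is_sufficient_to_cover_amount
      errors.add(:base, "Not enough funds") if insufficient_funds?
    end

    def amount_does_not_exceed_maximum_allowable_charge
      errors.add(:base, "Charge is too great") if exceeds_maximum_allowable_charge?
    end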

    # By defining separate predicate methods we can test each of them individually
    # and new programmers can see the intent of the code, not just the implementation.


    # Instead of subtracting the amount from the balance and checking if the
    # value is greater than 0,  change the implementation to mirror the intent.
    # There was a bug in the before code,  this is more obvious.
    def insufficient_funds?
      amount > payee.balance
    end

    # The charge is too great when the amount is larger than the limit,
    # not the other way around.
    def exceeds_maximum_allowable_charge?
      amount > payee.account.maximum_allowable_charge
    end
end

The refactored version may have more lines of code, but don’t let that scare you. It’s far more important for code to be human readable than incredibly concise.
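To illustrate the testing benefit, here is a rough plain-Ruby sketch that exercises each predicate on its own. `Account`, `Payee` and `ExpenseRules` are hypothetical stand-ins for the ActiveRecord models, and the sketch assumes a charge is too great when the amount exceeds the account’s limit.

```ruby
# Plain-Ruby stand-ins (hypothetical) for the ActiveRecord models, so the
# predicates can be exercised without a database.
Account = Struct.new(:maximum_allowable_charge)
Payee   = Struct.new(:balance, :account)

class ExpenseRules
  def initialize(amount, payee)
    @amount = amount
    @payee  = payee
  end

  def insufficient_funds?
    @amount > @payee.balance
  end

  # Assumes a charge is "too great" when it is larger than the account limit.
  def exceeds_maximum_allowable_charge?
    @amount > @payee.account.maximum_allowable_charge
  end
end

payee = Payee.new(100, Account.new(50))

# Each rule can be checked independently of the other one.
[10, 75, 150].each do |amount|
  rules = ExpenseRules.new(amount, payee)
  puts "amount=#{amount} insufficient_funds=#{rules.insufficient_funds?} " \
       "over_limit=#{rules.exceeds_maximum_allowable_charge?}"
end
```

With one predicate per rule, a test for the funds check never has to set up the charge limit in a particular way, and vice versa.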

Using ActiveResource to consume web-services

Posted by Koz Monday, September 03, 2007 04:26:00 GMT

Today I’m reviewing Joe Van Dyk’s monkeycharger application, which is a web-service for storing and charging credit cards. I loved looking at this app, because its only interface is a RESTful web service: there is no HTML involved. (If you’ve never written an app that only exposes a web-service UI, you ought to. It’s a blast.)

In general, Joe has done a fantastic job with keeping the controllers slim and moving logic to models. The only significant gripe I had with the application is that it is not ActiveResource compatible.

For those of you that are late to the party, ActiveResource is the newest addition to the Rails family. It lets you declare and consume web-services using an ActiveRecord-like interface…BUT. It is opinionated software, just like the rest of Rails, and makes certain assumptions about the web-services being consumed.

  1. The service must understand Rails-style REST URLs. (e.g. “POST /credit_cards.xml” to create a credit card, etc.)
  2. The service must respond with a single XML-serialized object (Rails-style).
  3. The service must make appropriate use of HTTP status codes (404 if the requested record cannot be found, 422 if any validations fail, etc.).

It’s really not much to ask, and working with ActiveResource (or “ares” as we affectionately call it) is a real joy.

However, monkeycharger tends to do things like the following:

class AuthorizationsController < ApplicationController
  def create
    @credit_card   = Authorizer.prepare_credit_card_for_authorization(params)
    transaction_id = Authorizer::authorize!(:amount => params[:amount], :credit_card => @credit_card)
    response.headers['X-AuthorizationSuccess'] = true
    render :text => transaction_id
  rescue AuthorizationError => e
    render :text => e.message
  end
end

Three things: the request is not representing an “authorization” object, the response is not XML, and errors are not employing HTTP status codes to indicate failure.

Fortunately, this is all really, really easy to fix. First, you need (for this specific example) an Authorization model (to encapsulate both the XML serialization and the actual authorization).

class Authorization
  attr_reader :attributes

  def initialize(attributes)
    @attributes = attributes
  end

  def credit_card
    @credit_card ||= Authorizer.prepare_credit_card_for_authorization(attributes)
  end

  def authorize!
    @transaction_id = Authorizer.authorize!(:amount => attributes[:amount],
      :credit_card => credit_card)
  end

  def to_xml
    { :transaction_id => @transaction_id }.to_xml(:root => "authorization")
  end
end

Then, we rework the AuthorizationsController to use the model:

class AuthorizationsController < ApplicationController
  def create
    authorization = Authorization.new(params[:authorization])
    authorization.authorize!
    render :xml => authorization.to_xml, :status => :created
  rescue AuthorizationError => e
    render :xml => "<errors><error>#{e.message}</error></errors>", :status => :unprocessable_entity
  end
end

(Note the use of the “created” status, which is HTTP status code 201. Other verbs just use “ok”, status code 200, to indicate success. Also, with an error, we return an “unprocessable_entity” status, which is HTTP status code 422. ActiveResource will treat that as a failed validation.)
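As a rough illustration of why the status codes matter, here is a hypothetical mapping from the codes mentioned above to what a client would conclude. It is only a sketch of the idea, not ActiveResource’s actual implementation, and `outcome_for` is an invented name.

```ruby
# Illustrative only: a hypothetical mapping from the status codes mentioned
# above to what a client would conclude. This is not ActiveResource's code.
def outcome_for(status)
  case status
  when 200, 201 then :success            # ok / created
  when 404      then :record_not_found
  when 422      then :validation_failed  # unprocessable entity
  else               :unexpected
  end
end

[201, 200, 404, 422, 500].each do |status|
  puts "#{status} => #{outcome_for(status)}"
end
```

A service that always answers 200 with a plain-text body, as the original controller did, leaves a generic client with nothing to branch on.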

With that change, you could now use ActiveResource to authorize a credit card transaction:

class Authorization < ActiveResource::Base
  self.site = "http://my.monkeycharger.site"
end

auth = Authorization.new(:amount => 15, :credit_card_id => 1234,
  :remote_key => remote_key_for_card)

if auth.save
  puts "success: #{auth.transaction_id}"
else
  puts "error: #{auth.errors.full_messages.to_sentence}"
end

It should be mentioned, too, that making an app ActiveResource-compatible does nothing to harm compatibility with non-ActiveResource clients. Everything is XML, both ways, with HTTP status codes being used to report whether a request succeeded or not. Win-win!

Obviously, real, working code trumps theoretical whiteboard sketches every time, and Joe is to be congratulated on what’s done. Even though ActiveResource-compatibility can buy you a lot, you should always evaluate whether you really need it and implement accordingly.

Testing the Right Stuff

Posted by Koz Monday, August 20, 2007 02:24:00 GMT

I’m going to take a slightly different tack here, and review some of the unit tests in Rails itself. They show two common anti-patterns: spurious assertions, and coupling your tests to the implementation.

Perhaps the biggest benefit of a suite of unit tests is that they can provide a safety net, preventing you from accidentally adding new bugs or introducing regressions of old bugs. With a large codebase, the unit tests can also help new developers understand your intent, though they’re no substitute for comments. However if you’re not careful with what gets included in your test cases, you can end up with a liability.

Be careful what you assert

Whenever you add an assertion to your test suite you’re sending a signal to future developers that the behaviour you’re asserting is both intentional and desired. Future developers who try to refactor your code will see a failing test, and either give up, or waste time trying to figure out if the assertion is ‘real’ or whether it was merely added because that’s what the code happened to do at present.

For an example, take test_habtm_attribute_access_and_respond_to from associations_test.rb, especially the assertions that the project responds to access_level= and joined_on=. Because of the current implementation of respond_to?, those assertions pass. But should they?

In reality while those values will get stored in the object, they’ll never be written back to the database. This is a surprising result for some developers, and removing those accessor methods would go a long way to helping avoid some frustrating moments.

Mock and Stub with care

Mock object frameworks like flexmock and mocha make it really easy to test how your code interacts with another system or a third party library. However you should make sure that the thing that you’re mocking doesn’t merely reflect the current implementation of a method. To take a case from rails, take a look at setup_for_named_route in routing_test.rb.

It takes the seemingly sensible approach of building a stubbed-out implementation of url_for instead of trying to build a full implementation into the test cases. The stubbed version of url_for simply returns the arguments it was passed, which makes it extremely easy to work with and to test.

The problem is not with stubbing out the method, but in the way it is used in all the named route test cases. Take test_named_route_with_nested_controller.

def test_named_route_with_nested_controller
  rs.add_named_route :users, 'admin/user', :controller => '/admin/user', :action => 'index'
  x = setup_for_named_route.new
  assert_equal({:controller => '/admin/user', :action => 'index', :use_route => :users, :only_path => false},
  x.send(:users_url))
end

The strange hash value you see in the assertion is the result of the named route method calling url_for, and returning that. The current implementation of the named route helpers does this, but what if you wanted to implement a new version of named routes which completely avoids the costly call to url_for? Every single named route test fails, even though applications which use those methods will work fine.

In this situation you have two options. The first is to make your tests depend on the full implementation of url_for. This would probably slow down your test cases, and require a lot more setup code, but because the return values are correct you’re not likely to impede future refactoring.

The other option is to use different stubs for every test case, leaving you with something like this:

def test_named_route_with_nested_controller
  rs.add_named_route :users, 'admin/user', :controller => '/admin/user', :action => 'index'
  generated_url = "http://test.named.routes/admin/user"
  x = setup_for_named_route.new
  x.stubs(:url_for).returns(generated_url)
  assert_equal(generated_url,  x.send(:users_url))
end

Doing this for each and every test case is going to be quite time consuming and make your test cases extremely verbose. As with all things in software you’ll have to make a judgement call on this trade off and make a choice between coupling or verbosity.
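The coupling problem can be reproduced in a few lines of plain Ruby. In this sketch `RoutesViaUrlFor` and `RoutesDirect` are hypothetical stand-ins (neither exists in Rails): the first delegates to url_for, the second is an imagined rewrite that skips it.

```ruby
# Hypothetical stand-ins for illustration; these classes are not part of Rails.

# Current implementation: the named route helper delegates to url_for.
class RoutesViaUrlFor
  def url_for(options)
    "http://test.named.routes/#{options[:controller]}"
  end

  def users_url
    url_for(:controller => "admin/user", :action => "index")
  end
end

# A faster rewrite that builds the URL without calling url_for at all.
class RoutesDirect
  def users_url
    "http://test.named.routes/admin/user"
  end
end

# Coupled check: replace url_for with an echo stub, then expect the raw hash.
def coupled_check(routes)
  routes.define_singleton_method(:url_for) { |options| options }
  routes.users_url == { :controller => "admin/user", :action => "index" }
end

# Behavioural check: only look at the URL the caller actually receives.
def behavioural_check(routes)
  routes.users_url == "http://test.named.routes/admin/user"
end

[RoutesViaUrlFor, RoutesDirect].each do |klass|
  puts "#{klass}: coupled=#{coupled_check(klass.new)} " \
       "behavioural=#{behavioural_check(klass.new)}"
end
```

The coupled check passes only for the class that happens to call url_for, while the behavioural check passes for both, which is the property you want from a test that is meant to survive refactoring.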

Whatever approach you choose, just remember that misleading test ‘failures’ can slow down refactoring, and end up reducing your ability to respond to change. As satisfying as ‘100% coverage’ or 2:1 ratios may be, don’t blindly add assertions or mock objects just to satisfy a tool. Every line in your test cases should be there for a reason, and should be placed there with just as much care as you’d use for a line of application code.

Dangers of Cargo-culting

Posted by Koz Wednesday, August 01, 2007 16:10:00 GMT

“Cargo culting”, when used in a computer-programming context, refers to the practice of using techniques (or even entire blocks of code) seen elsewhere without wholly understanding how they work. (The term “cargo cult”, if you are unfamiliar with it, has its own fascinating etymology, which is covered nicely at wikipedia.) Cargo culting is a dangerous phenomenon, watering down the state of the art and encouraging cookie-cutter code shoved blindly into black boxes.

Consider the following snippet of code, taken from a project that was submitted to us some time ago. (Alas, I cannot find the original submitter—I apologize for that!)

def account_code?
  !! @account_code.nil?
end

To me, this looks cargo-culted, since it seems that the programmer did not understand what the ”!!” idiom was all about. They probably saw it used somewhere and “cargo culted” it, using it without knowledge, assuming that it was, for some reason, “necessary”.

Now, the way ”!!” works is this: take the value behind the ”!!”, negate it, and negate it again. It’s just double-negation: !(!(@account_code.nil?)). The ultimate effect is to take some value, and convert it into an honest-to-goodness “true” or “false”. (In my ever-so-humble opinion, the ”!!” idiom is an abomination: it’s far too clever for its own good. First of all, you rarely ever need a real boolean value, and for those times you do, it is better to be explicit in the conversion, by using a ternary operator or full-blown if statement, for instance.)
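A short sketch of the idiom in action; `to_boolean` is a hypothetical name used only to make the conversion explicit:

```ruby
# Hypothetical helper wrapping the "!!" idiom for illustration.
def to_boolean(value)
  !!value
end

p to_boolean(nil)     # false
p to_boolean(false)   # false
p to_boolean("abc")   # true
p to_boolean(0)       # true: in Ruby only nil and false are falsy

account_code = nil
p account_code.nil?       # true
p(!!account_code.nil?)    # true: double negation changes nothing here
p(!account_code.nil?)     # false: a single negation flips the answer
```

Since nil? already returns true or false, wrapping it in ”!!” is a no-op.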

In other words, the double-negation of nil? results in absolutely no difference from the use of nil? by itself, since nil? will return a true/false value. This, in turn, means the effect in the original is actually not what was intended for the account_code? predicate. It should have returned “true” if the account code existed (was “non-nil”), not “false”. The method should actually have been written like this:

def account_code?
  ! @account_code.nil?
end

In this case, cargo-culting resulted in the code being buggy. This is not an uncommon outcome of using techniques or code without understanding their purpose. If you ever find yourself copying something into your own code, with a justifying “I-don’t-know-what-it-does, but-it-appears-to-work”, stop immediately. Do some research. Figure it out. Learn what it means.

Further, note that unless you actually need a true boolean value from that, you can shorten the implementation of the account_code? predicate even further:

def account_code?
  @account_code
end

This works because Ruby treats nil and false as false, and everything else as true.

If there is one thing that Koz and I want you, our readers, to come away from this site with, it is an understanding of why you should do things one way and not another. Ultimately, it makes the difference between being a mediocre programmer, and becoming a great programmer.

Free-for-all: Tab Helper (Summary)

Posted by Koz Thursday, June 28, 2007 02:39:00 GMT

The first RailsWay free-for-all came off quite well. Many of you posted your favorite solutions to the problem of tab-based navigation, as posed by Nate Morse.

Jamis’ Take

Of all the solutions posted, my personal favorite was the pragmatic and simple CSS-based solution given by Mr. eel (Nate Morse came to the same solution independently):

I take a completely different approach. I ID the body of the page with the name of the current controller. Then I use a descendent CSS selector to highlight the current tab based on the body id and an id given to each link. I don’t bother with replacing the current tab link with a span. If the user wants to click that link again… then it’s the same as refreshing. Totally up to them.

With html like:

<body id="users">
  <ul>
    <li><a href="/users" id="usersNav">Users</a></li>
    <li><a href="/comments" id="commentsNav">Comments</a></li>
    <li><a href="/posts" id="postsNav">Posts</a></li>
  </ul>

I would use CSS like this

#users #usersNav,
#comments #commentsNav,
#posts #postsNav {
  background:red;
  font-weight:bold;
}

What a great approach. Although I would make the choice of the body ID explicit (rather than depending on the controller name), it is otherwise really nice. It shrugs off the whole issue of “should the current tab be a link” by saying it just doesn’t matter—every tab is always a link. Such pragmatism gets right to the heart of the Rails Way: implement just what matters, and nothing more.

Koz’s Take

A number of solutions relied on tightly coupling the controller and tabs. While this may seem like a time-saver at first, I believe that it’s unlikely to remain useful as your application grows. You’ll find yourself moving functionality into strange locations in order to make your tabs highlight correctly.

The problem is amplified with a RESTful application, where your choice of controllers is dictated by the resources that you’re managing. You may have a list of comments in several different sections of your application, but not want to highlight the ‘comment’ tab whenever you display them.

Personally, I prefer the really simple approach of a before filter and a navigation partial.

def set_current_tab
  @current_tab = :people
end
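The snippet above shows only the filter method. One way the navigation partial might consume @current_tab is through a small helper along these lines; the helper name `tab_item`, the “current” class and the markup are all assumptions made for illustration.

```ruby
# Hypothetical helper; the name and markup are assumptions for illustration.
def tab_item(label, path, tab, current_tab)
  css = tab == current_tab ? ' class="current"' : ""
  %(<li#{css}><a href="#{path}">#{label}</a></li>)
end

current_tab = :people # the value the before filter would have set

puts tab_item("People", "/people", :people, current_tab)
puts tab_item("Comments", "/comments", :comments, current_tab)
```

Because the highlighted tab is chosen explicitly, a controller that lists comments inside the people section can still set :people without any special casing.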

Thanks, everyone, for your submissions!

RailsConf Recap: Named Callbacks

Posted by Koz Thursday, June 07, 2007 02:49:00 GMT

Another topic we touched briefly on at RailsConf was the idea of named callbacks.

Consider this snippet (also from Brian Cooke’s expense tracking application):

class Expense < ActiveRecord::Base
  protected
    def before_create
      if self.created_at == Time.now.to_date.to_time
        self.created_at = Time.now
      end
    end
end

One thing to keep in mind here is that when a new Expense record is created, the created_at column is used to track when the expense originally occurred, not when the record was created. As a special case, if the timestamp is 00:00 of the current day, then it is assumed to actually be the current time.

Now, looking at that code, it’s definitely not immediately obvious what it is trying to do. In fact, it took me a few minutes of steady concentration (and cross-referencing other parts of the project) to understand it. The fact that it uses a generic “before_create” callback makes it hard to know the purpose of the method, and the use of “Time.now.to_date.to_time” (though effective) is pretty intention-obscuring.

Here’s a clearer, more self-documenting approach, using a named callback:

class Expense < ActiveRecord::Base
  before_create :make_created_now_if_created_today

  protected
    def make_created_now_if_created_today
      if self.created_at == Time.now.beginning_of_day
        self.created_at = Time.now
      end
    end
end

The named callback helps make it clearer what the purpose of the method is (though in this case, an additional comment would not be amiss). Also, ActiveSupport comes to the rescue, allowing us to convert the convoluted “Time.now.to_date.to_time” into the more self-documenting “Time.now.beginning_of_day”. (Alternatively, you might prefer “Time.now.midnight”, though I find “beginning_of_day” to be clearer, since it reveals the intention better.)
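For readers without ActiveSupport at hand, the equivalence can be checked in plain Ruby. This is only a sketch: `start_of_day` approximates beginning_of_day for local time, and `entered_without_time?` is a hypothetical predicate naming the special case.

```ruby
require "date" # adds Time#to_date and Date#to_time

# Plain-Ruby approximation of ActiveSupport's beginning_of_day (local time).
def start_of_day(time)
  Time.new(time.year, time.month, time.day)
end

# Hypothetical predicate naming the special case described above: the
# timestamp carries today's date but no time of day.
def entered_without_time?(created_at, now = Time.now)
  created_at == start_of_day(now)
end

now = Time.new(2007, 6, 7, 14, 49, 0)

p start_of_day(now) == now.to_date.to_time                 # same instant
p entered_without_time?(Time.new(2007, 6, 7), now)         # midnight today
p entered_without_time?(Time.new(2007, 6, 7, 9, 30), now)  # has a real time
```

Pulling the comparison into a named predicate is another small step in the same direction as the named callback: the condition says what it means.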

Always look for ways to make your code document itself. Ruby is one of the most readable programming languages I’ve ever used, and it’s a pity to not take advantage of that readability as often as you can.