Code Like Bozo (http://codelikebozo.com) and merchant of awesome at GeekRations.com

TDD for Business Value
Sat, 04 Feb 2012 10:14:00 -0800 (http://codelikebozo.com/tdd-for-business-value)
A New Hope

Software Craftsmanship, SOLID principles, eXtreme Programming: the list of "best practice" guides I've learned over the years goes on and on. They are all extremely valuable to me. They also need to make room for a new style of development ushered in by the ideas in The Lean Startup. The best label I have for it is "the way Bozo is coding for now".

What exactly am I talking about? Temporary code written fast and with little thought. Code meant to elicit learning and then promptly discarded and removed from production. Wait, what? Let me take a step back.
I've been using a new process for split testing features. It flows like this:
  1. Formulate a hypothesis AKA have a testable opinion. I am nothing if not opinionated.
  2. Design a test, i.e. let's move this thing here and that thing there. List independent variables, dependent variables, covariates, and expected benefits AKA $BLING$. Make an assumption that a certain segment of my customer base is representative (enough) of my full customer base.
  3. Code up something that only works for that segment... maybe it even only half works. Fuck any kind of automated testing unless it makes this step faster.
  4. Test what's been written and decide if it's "Good Enough" to get us reliable results.
  5. Release the test to production.
  6. Once the test is over, delete all the code from production.
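Steps 2 and 3 come down to deciding who sees the test and giving those people a consistent experience. Here's a minimal sketch of that. It assumes each user has an "id" and a "segment" field. The segment name, the test name, and the 50/50 split are all made up for illustration. Hashing the user id with MD5 is just one common way to get a stable bucket without storing anything.

```javascript
const crypto = require("crypto");

// Hypothetical helper: hash the test name plus user id so the same user lands
// in the same bucket on every visit, without storing any assignment.
function bucket_for(user_id, test_name) {
  const digest = crypto.createHash("md5").update(test_name + ":" + user_id).digest();
  const slot = digest.readUInt32BE(0) % 100; // first 4 bytes mapped to 0..99
  return slot < 50 ? "variant" : "control";
}

// Step 2's assumption: only one (hypothetical) customer segment is in the test.
function in_test_segment(user) {
  return user.segment === "returning_customers";
}

function page_version_for(user) {
  if (!in_test_segment(user)) {
    return "control";
  }
  return bucket_for(user.id, "move-signup-button");
}

console.log(page_version_for({ id: 42, segment: "new_customers" })); // control
```

When the test is over (step 6), all three functions and their call sites get deleted along with the rest of the test code.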
Why Is This Magical?

Step three is the one that's gonna make me sound like an unknowing ass hat. Allow me to explain:

If you're going to throw away code and won't need to maintain it, the rules of the Agile game change dramatically.

Why do we write automated tests? We accept that the slight cost increase now is worth it in the long run, because the code will be around for a while and our teammates will need to interact with it. tl;dr It's a way of keeping our costs low. Great. Seriously.

So once we accept that this code will only live in production for a day, and I can reasonably say no one else will have to understand it, why would I write automated tests? I don't.

Also, notice step two. If you severely limit the customer segments you're targeting it means you can take a bunch of shortcuts like:
Someone please ask me, "But what if there's A LOT of code for the test?" Two things:
  1. Your test is probably too damn big and testing more than isolated changes. (Notice the italics. #GuidelinesNotSteadfastRules)
  2. Remember step three? You wrote that code in a shitty unsustainable manner... BUT. You'll be able to rationalize taking the time to refactor after this. Lemme tell you why.
Know The Value of Your Work

Within two days you developed, shipped, and validated the value of code you're planning to put into production. You now know how much money the feature makes your business, which means you also know how much money your company loses every time the feature breaks. Suddenly you're not just pushing out features because you're bored or trying to keep your team from looking idle; you're pushing out features because they make your company more money.
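To make that concrete, here's the back-of-the-envelope math as a sketch. Every number below is hypothetical. It assumes revenue per visitor is the metric you split tested on, and that the difference you measured is real rather than noise.

```javascript
// Hypothetical split-test results (dollars); not real data.
const control = { visitors: 4000, revenue: 10000 };
const variant = { visitors: 4000, revenue: 11000 };
const daily_visitors = 20000; // hypothetical traffic

function revenue_per_visitor(group) {
  return group.revenue / group.visitors;
}

// What each visitor is worth with the feature, over and above without it.
const lift_per_visitor = revenue_per_visitor(variant) - revenue_per_visitor(control);

// Roughly what the feature earns per day, and therefore roughly what each day
// of it being broken costs.
const daily_value = lift_per_visitor * daily_visitors;

console.log(lift_per_visitor, daily_value); // 0.25 5000
```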

This is the new TDD, guys. This is Red, Green, Refactor for business value.
  1. Red- Think about what you're doing enough to understand how you can test it.
  2. Green- Find out it makes you money and delete the code.
  3. Refactor- Write the code well and in a disciplined manner knowing that you've proven it's worth the time.
Not Worth Testing

If a change is so small it isn't worth testing to see whether it affects anything, then WTF are we doing here? Who the hell is prioritizing that feature? Push back and bring this up. Branding might be a reason. I'm not sure how I feel about that. I can sympathize with design and... yeah. Separate blog post.

If a change is such a sure thing that you KNOW it will bring in major money then the cost to test it is probably insignificant given the long term value.

Have You Actually Tried This?

Yes.

http://files.posterous.com/user_profile_pics/94091/MyMugShot.jpg http://posterous.com/users/15YhR1s088x Justin Bozonier darkxanthos Justin Bozonier
Event Sourcing in Javascript
Sun, 01 Jan 2012 17:39:00 -0800 (http://codelikebozo.com/event-sourcing-in-javascript)

What is event sourcing (from a Bozo's perspective)?

Event sourcing is essentially the practice of storing a system's data in its most natural form: events. Rather than worrying about database tables and codifying my data model into a DB, I store all of the events that have happened. So in DB land we're talking about one table called Events, with some columns of metadata about the event (to be covered later) and a data column. The data column is a blob field that the business objects can interpret to replay the event happening to them.
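Here's a minimal sketch of that idea in Javascript. The row shape is assumed for illustration and is not the project's actual schema. It has a sequence number and timestamp as metadata, a type, and a JSON string standing in for the data blob. The "inventory item removed" event is hypothetical, while "inventory item added" matches the event shown later in this post.

```javascript
// Hypothetical rows from the single Events table: metadata columns plus a
// data column holding the serialized event (JSON text standing in for a blob).
const event_rows = [
  { sequence: 1, recorded_at: "2012-01-01T10:00:00Z", type: "inventory item added",
    data: JSON.stringify({ item_id: "a1", item_name: "Milk cartons" }) },
  { sequence: 2, recorded_at: "2012-01-01T10:05:00Z", type: "inventory item added",
    data: JSON.stringify({ item_id: "b2", item_name: "Napkins" }) },
  { sequence: 3, recorded_at: "2012-01-01T11:00:00Z", type: "inventory item removed",
    data: JSON.stringify({ item_id: "a1" }) },
];

// The business object interprets each event; its state is never stored directly.
function apply_event(inventory, row) {
  const data = JSON.parse(row.data);
  switch (row.type) {
    case "inventory item added":
      inventory[data.item_id] = data.item_name;
      break;
    case "inventory item removed":
      delete inventory[data.item_id];
      break;
    // Unknown event types are ignored.
  }
  return inventory;
}

// Current state = every event replayed in order (rows may come back unsorted).
function replay(rows) {
  return rows
    .slice()
    .sort((a, b) => a.sequence - b.sequence)
    .reduce(apply_event, {});
}

console.log(replay(event_rows)); // { b2: 'Napkins' }
```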

Why event sourcing?
There are a ton of benefits, as well as some costs, to choosing this architectural style. On this project I chose it to learn more about event sourcing through experience, and I don't consider myself enough of an expert to give guidance here. The reasons I'm interested in it are:
  • A data model that integrates logging- I can step through exactly what my user did.
  • Optimizations become views of data- Is calculating your user's state too slow? Need SQL? No problem. Run through all of the events and store the results in relational tables for fast querying. Think of it as a cache. When a new event arrives that invalidates the cache, play that event against your SQL tables as well and you're good to go.
  • Migrations may be easier(?)- Did your conceptual or data model change? No problem: blow away those SQL tables and build new ones from the events.
  • A data model that can handle multiple changes to the same data at the same time with no information loss. Because we're just collecting events, the UI may not show all of the data, but it's still in the system and can be retrieved if need be.
  • Undo supported by default- Make a change but want to take it back? Go back to a previous moment in time demarcated by... EVENTS. :)
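The "views as a cache" and "undo" points above can be sketched together. This is a minimal illustration, not code from the project. The stock events, their field names, and the quantity-on-hand view are all hypothetical. The view is updated one event at a time as events arrive. Rebuilding it, whether for a migration or to see an earlier moment, is just replaying some or all of the stream.

```javascript
// Hypothetical events for a cafeteria stock count.
const events = [
  { type: "stock received", item: "milk", quantity: 40 },
  { type: "stock used", item: "milk", quantity: 25 },
  { type: "stock received", item: "bread", quantity: 10 },
  { type: "stock used", item: "milk", quantity: 5 },
];

// A read model ("cache") of quantity on hand, updated one event at a time.
function apply_to_view(view, event) {
  const sign = event.type === "stock received" ? 1 : -1;
  view[event.item] = (view[event.item] || 0) + sign * event.quantity;
  return view;
}

// Rebuilding from scratch is just replaying events; a migration to a new view
// shape would work the same way with a different apply function.
function rebuild_view(event_list) {
  return event_list.reduce(apply_to_view, {});
}

// Undo / time travel: replay only the first n events.
function view_as_of(event_list, n) {
  return rebuild_view(event_list.slice(0, n));
}

// Keeping the cache current incrementally as a new event arrives.
const live_view = rebuild_view(events.slice(0, 3));
apply_to_view(live_view, events[3]);

console.log(live_view, view_as_of(events, 2)); // { milk: 10, bread: 10 } { milk: 15 }
```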
My Experiment
I'm building a small inventory system for a school cafeteria, and in an effort to stay engaged and actually finish it, I thought I'd add the wrinkle of making this my first pure event sourcing system. I've posted the code online here: https://github.com/jcbozonier/HeeHaw

Interesting results so far...
I'm building this as a Javascript RIA, and the server in my example will be nothing more than Yet Another Event Store. That's been a somewhat surprising result. How I'm going to store the data is just an implementation concern. Since the whole app can already be brought back to its current state using only the event stream, I know that all I need to do to provide persistence is send the events to the server... or the browser's local storage... or I can store them in gists via GitHub's API... the list goes on.
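Because the event stream is the only thing that has to persist, the storage backend can be swapped behind a tiny interface. Here's a minimal sketch under that assumption. The in-memory store copies the getItem/setItem shape of the browser's localStorage so the example runs under Node. The "make_event_log" helper and the storage key are hypothetical.

```javascript
// Stand-in with the same getItem/setItem methods as the browser's localStorage,
// so this runs under Node. A server- or gist-backed store would only need to
// offer the same two methods.
function make_memory_storage() {
  const data = new Map();
  return {
    getItem: (key) => (data.has(key) ? data.get(key) : null),
    setItem: (key, value) => { data.set(key, String(value)); },
  };
}

// Hypothetical persistence layer: the event stream is the only thing saved.
function make_event_log(storage, key) {
  return {
    load() {
      return JSON.parse(storage.getItem(key) || "[]");
    },
    append(event) {
      const events = this.load();
      events.push(event);
      storage.setItem(key, JSON.stringify(events));
    },
  };
}

const storage = make_memory_storage(); // in a browser: window.localStorage
const log = make_event_log(storage, "heehaw-events"); // hypothetical key
log.append({ type: "inventory item added", item_id: "a1", item_name: "Milk cartons" });
log.append({ type: "inventory item added", item_id: "b2", item_name: "Napkins" });

// A fresh "session" that shares only the storage gets the whole stream back,
// ready to replay.
const restored = make_event_log(storage, "heehaw-events").load();
console.log(restored.length); // 2
```

Rewriting the whole array on every append keeps the sketch short. A real store would append instead.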

In the meantime, I can run a fully functional version of the system without DB/server access, as long as I'm content with no cross-session state persistence. MongoDB/Heroku might be pretty damn quick to deploy on, but Dropbox is even faster.

A negative interesting result? It's taken me a long time to build a fairly CRUDy app. Bending my mind around storing these events and driving the system off of them has been painful, to say the least. Also, most of my events bubble up from my event store directly to my views without any real logic being applied. This will change as the app becomes more complex, but I still wanted to point it out.

You might be wondering, what does a javascript event store look like?

Like this: 
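var EventStore = function(){
  var self = this;
  var events = [];
  // Stores are dropped until the first hydrate() call finishes.
  var ignore_events = true;

  // Override this to apply each replayed event to the app.
  self.on_hydration = function(event){};

  self.store = function(event){
    // Events stored while hydrate() is replaying are dropped,
    // so replaying doesn't append duplicates.
    if(!ignore_events){
      events.push(event);
    }
  };

  self.hydrate = function(){
    ignore_events = true;
    // forEach rather than map: we only want the side effects.
    events.forEach(function(event){
      self.on_hydration(event);
    });
    ignore_events = false;
  };
};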

And what do the events look like? Here's an example:
{
  type: "inventory item added",
  item_id: generate_guid(),
  item_name: item_name
}

That's the event that fires when an inventory item is added to the system. One interesting aspect of this is the "generate_guid()" function call. I'm used to letting the DB handle that detail for me, but it always bugged me that my business objects couldn't. Now that I use GUIDs, they can: just generate a random ID and assign it to an object. There's a chance of a collision, but there are 3.4x10^38 different possibilities. There are articles on the efficacy of GUIDs/UUIDs; please Google for them at your leisure.
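The post doesn't show "generate_guid()" itself. Here's one possible implementation, assuming a runtime that has crypto.randomUUID() (Node 14.17+ and modern browsers), which returns a random version 4 UUID string. The item name is made up. The last line checks the arithmetic behind the "3.4x10^38" figure.

```javascript
const crypto = require("crypto");

// One possible generate_guid(): a random (version 4) UUID from the runtime.
function generate_guid() {
  return crypto.randomUUID();
}

// The business object assigns its own id; no database round trip needed.
const event = {
  type: "inventory item added",
  item_id: generate_guid(),
  item_name: "Milk cartons", // hypothetical item
};

// A UUID is 128 bits, so 2^128 (about 3.4e38) possible values in total; a
// version 4 UUID fixes 6 of those bits, leaving 2^122 random ones.
console.log(event.item_id, 2 ** 128, 2 ** 122);
```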

More coming shortly
As I make more progress on the app I'm building, I'll add another post. In the meantime I wanted to throw something out here to record my thoughts.

From Crazy Idea to Customer in a Weekend
Fri, 18 Nov 2011 08:08:00 -0800 (http://codelikebozo.com/from-crazy-idea-to-customer-in-a-weekend)

I launched a small business called http://GeekRations.com and got a paying customer in a weekend. Full disclosure: that was my only customer. I'm still learning, and I'm not completely sure how to go about finding my next one. This was still a huge milestone for me this year. A paying customer. That was my goal.

You'll see a lot of parallels to The Lean Startup in this, and that's totally cool. I'm not, however, trying to adhere to some methodology; experience has taught me that that's the Wrong Thing. The zen I've been able to pick out of The Lean Startup and Customer Development is really just that a solid business model is testable. It's not a black art. There's no reason or excuse to go months building a product or service without talking to potential customers. Developers, you think these ideas are just for pointy-haired business people? I bet you've tried to start your own OSS project and garner some community support but couldn't. This shit applies to you too. We all care whether our work is valuable, and we all want to know ASAP if it isn't so we don't waste our time.

Let's learn what we need to learn and iterate. Let's admit that no one cares and ask them what they do care about. Let's expect customers to buy now and when they don't let's ask them why.

Here's what I did:


  1. (Friday Night) Picked a market- I'm a programmer and a geek. All of my friends are. I have 400+ followers on Twitter who are as well. I also spend time on HackerNews which is mainly geeks.
  2. (Friday Night) Gauged interest- I put up the cheesiest, most generic Unbounce landing page possible: a couple of sentences about my idea, split tested across two pages. I wanted to know if my sense of humor would prevent people from signing up. It didn't. I announced it to every geeky community I'm a part of. I had about a 5% conversion rate.
  3. (Saturday) Built a single page website w/ 3 price points- I can design if I try **REALLY** hard... Fuck that. I hit http://themeforest.net/ like a baws and grabbed a template I could use. It was a bit too feminine for me, but I thought fuck it, we'll see if it works. Next I got a PayPal "Pay Now" button and threw it all onto Heroku to be hosted for free. I also added Facebook and Twitter buttons so I could get some idea of whether people were excited by the idea.
  4. (Sunday) Visitor Feedback- I began to see tweets from several people saying they didn't really understand what the random gifts might be. They had no idea what they were getting into. In response, I added a thin strip of images of things I could see myself sending to customers.
  5. (Sunday Night) Purchase of mid-price point.

One issue I have heard since then is that my product is more of a luxury item and a lot of the people who really like this idea don't have the money to spend so frivolously. A possible pivot might be to sell something like this to girlfriends who don't know what to give to their geeky boyfriends. Not sure though. I have a couple little businesses and next year's goals will require me to focus intently on one of them. I'm trying to understand which has the most likelihood of succeeding and being something I enjoy being immersed in.

Any advice (from experience) or other thoughts? Leave a comment and let's start a conversation.

Creating an Image Proxy server in Node.js
Sun, 25 Sep 2011 08:38:00 -0700 (http://codelikebozo.com/creating-an-image-proxy-server-in-nodejs)

This will be a short post. I'm writing it to document how I created a Node.js server that can act as an image proxy. I needed this to get around a limitation in HTML5's canvas implementation that prevents reading a loaded image's binary data if that image comes from a different web domain. Reading that data is very handy if you're building an image editor, so I had to find a workaround.

My solution is to create an image proxy on the web server in question. I pass the url of the image I want to a specific route on my server and then it downloads the image data and returns it to my javascript thus hiding its true origins.
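On the browser side, using the proxy means building the route URL and loading the image from your own domain. Here's a minimal sketch. The "proxied_image_path" helper, the example.com URL, and the canvas context "ctx" are hypothetical. The key detail is that the target URL contains ":" and "/", so it has to be encoded to fit into the single ":image_url" path segment. Express decodes it again into request.params.image_url.

```javascript
// Hypothetical client-side helper for the /proxied_image/:image_url route.
function proxied_image_path(image_url) {
  // encodeURIComponent escapes ":" and "/" so the whole URL is one path segment.
  return "/proxied_image/" + encodeURIComponent(image_url);
}

console.log(proxied_image_path("http://example.com/cats/grumpy.png"));
// /proxied_image/http%3A%2F%2Fexample.com%2Fcats%2Fgrumpy.png

// In the browser (not run here), an image loaded through the proxy is
// same-origin, so the canvas can read its pixels afterwards:
//   const img = new Image();
//   img.onload = () => {
//     ctx.drawImage(img, 0, 0);
//     const pixels = ctx.getImageData(0, 0, img.width, img.height);
//   };
//   img.src = proxied_image_path("http://example.com/cats/grumpy.png");
```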
Here is my complete code. I'll explain what all of the parts do afterwards:
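var express = require('express');
var http = require('http');
var url = require('url');

var app = express();

// The client must URL-encode the target so it fits in one path segment, e.g.
// '/proxied_image/' + encodeURIComponent('http://example.com/cat.png')
app.get('/proxied_image/:image_url', function(request_from_client, response_to_client){
  console.log("Starting proxy");
  var image_url = request_from_client.params.image_url;
  // Only used to pick a Content-Type, e.g. "cat.png" -> image/png
  var filename = url.parse(image_url).pathname.split("/").pop();

  // Plain HTTP only, like the original (port 80); https:// URLs need the https module
  var image_get_request = http.request(image_url);
  image_get_request.on('response', function(proxy_response){
    var current_byte_index = 0;
    // Node lower-cases header names, and every header value is a string
    var response_content_length = parseInt(proxy_response.headers['content-length'], 10);
    if(isNaN(response_content_length)){
      proxy_response.resume(); // drain and discard the body
      response_to_client.status(502).send("Upstream image had no Content-Length");
      return;
    }
    var response_body = Buffer.alloc(response_content_length);

    // No setEncoding() call, so each chunk arrives as a Buffer
    proxy_response.on('data', function(chunk){
      chunk.copy(response_body, current_byte_index);
      current_byte_index += chunk.length;
    });
    proxy_response.on('end', function(){
      response_to_client.type(filename);
      response_to_client.send(response_body);
    });
  });
  image_get_request.on('error', function(err){
    response_to_client.status(502).send(err.message);
  });
  // Nothing goes over the wire until end() is called
  image_get_request.end();
});

app.listen(3000); // port chosen for illustration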

Because Node is event oriented, downloading the image means creating a request and adding listeners for certain events. To start the download you need to call the "end" function, which signals to Node that you're done setting up the request so it can be sent. The two events you need to listen for on the response are "data" and "end". The "data" event fires each time Node downloads a chunk of data from the URL you requested (yes, it fires multiple times for a single request). As far as I know, Node won't aggregate the response automatically, which is why you see me adding the chunks of data to the buffer.
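If the upstream server doesn't send a Content-Length (chunked responses don't), a common alternative is to collect the chunks in an array and join them once at the end with Buffer.concat. Here's a minimal sketch. The "collect_body" helper is hypothetical. A plain EventEmitter stands in for the real response object so the example runs without a network.

```javascript
const { EventEmitter } = require("events");

// Hypothetical helper: gather every 'data' chunk and hand the joined Buffer
// to on_done when 'end' fires. No up-front size is needed.
function collect_body(response, on_done) {
  const chunks = [];
  response.on("data", (chunk) => chunks.push(chunk));
  response.on("end", () => on_done(Buffer.concat(chunks)));
}

// Simulated response: emits two chunks and then ends, like a real download.
const fake_response = new EventEmitter();
let body = null;
collect_body(fake_response, (buf) => { body = buf; });
fake_response.emit("data", Buffer.from("GIF8"));
fake_response.emit("data", Buffer.from("9a"));
fake_response.emit("end");

console.log(body.toString()); // GIF89a
```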

One thing threw me off for a while. To create a buffer of the correct size (it needs to be allocated up front), you need to find out how large the image you're downloading is. Just grab the Content-Length value from the HTTP response headers... BUT! The header value is a string, so you have to convert it to an integer before using it to allocate your buffer. If you don't, the buffer will be too small, the number of bytes you receive will be greater than the number of bytes in your buffer, and things will ASPLODE.
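Here's a small sketch of why the conversion matters, runnable in plain Node with no network. The header value "12345" is made up. In old Node, passing a string to new Buffer() copied the string's characters, which is what Buffer.from(string) does today. So the buffer ends up with 5 bytes instead of 12345. Current versions of Buffer.alloc reject a string size outright.

```javascript
// Header values arrive as strings, e.g. headers['content-length'] === "12345".
const content_length_header = "12345"; // hypothetical header value

// The old new Buffer(string), now spelled Buffer.from(string), copies the
// string's characters: a 5-byte buffer, not a 12345-byte one.
const wrong = Buffer.from(content_length_header);

// Converting first allocates the intended number of bytes.
const right = Buffer.alloc(parseInt(content_length_header, 10));

// Today's Buffer.alloc refuses a string size instead of guessing.
let rejects_strings = false;
try {
  Buffer.alloc(content_length_header);
} catch (err) {
  rejects_strings = err instanceof TypeError;
}

console.log(wrong.length, right.length, rejects_strings); // 5 12345 true
```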

Hopefully that helps someone. Enjoy!

Why I'm (Finally) Switching to CoffeeScript
Mon, 05 Sep 2011 08:43:00 -0700 (http://codelikebozo.com/why-im-switching-to-coffeescript)
CoffeeScript Misconceptions

You may have already heard about CoffeeScript and some of the hype surrounding it, but still have found several reasons not to make the switch. This blog post is for you. Here are some of the reasons I held out for so long:
  • I wanted to understand Javascript and just didn't see how using a "simpler version" (my own thoughts) would make my life easier in the long run.
  • If I DID use an intermediate language, I wanted to be able to dump it at any time and not feel like I was forced to continue using it.
  • Putting one more buggy thing in between myself and my code seemed foolhardy.
So here are the reasons I finally switched:
  • It's less verbose Javascript, not a different or simplified language.
  • A couple of shortcuts that let you use list comprehensions rather than error-prone for statements.
  • CoffeeScript compiles to pretty awesome Javascript. I wouldn't have any concern dumping CoffeeScript at any time because of this. It would also have put some great conventions in my Javascript that I could follow.
  • Eli Thompson is always right. (You should read his blog. He's smart: http://eli.eliandlyndi.com/)
For-Loop Boiler Plate Banished

Here's an example of several Javascript for-loops embedded in a switch statement:
    document.onkeydown = function(event) {
      var game_piece, _i, _j, _k, _l, _len, _len2, _len3, _len4, _len5, _len6, _m, _n;
      switch (event.keyCode) {
        case 37:
          for (_i = 0, _len = game_pieces.length; _i < _len; _i++) {
            game_piece = game_pieces[_i];
            game_piece.pan_left();
          }
          break;
        case 38:
          for (_j = 0, _len2 = game_pieces.length; _j < _len2; _j++) {
            game_piece = game_pieces[_j];
            game_piece.pan_up();
          }
          break;
        case 39:
          for (_k = 0, _len3 = game_pieces.length; _k < _len3; _k++) {
            game_piece = game_pieces[_k];
            game_piece.pan_right();
          }
          break;
        case 40:
          for (_l = 0, _len4 = game_pieces.length; _l < _len4; _l++) {
            game_piece = game_pieces[_l];
            game_piece.pan_down();
          }
          break;
        case 189:
          for (_m = 0, _len5 = game_pieces.length; _m < _len5; _m++) {
            game_piece = game_pieces[_m];
            game_piece.zoom_out();
          }
          break;
        case 187:
          for (_n = 0, _len6 = game_pieces.length; _n < _len6; _n++) {
            game_piece = game_pieces[_n];
            game_piece.zoom_in();
          }
      }
      return the_screen.refresh(game_pieces);
    };

GNARLY! Now obviously we could clean this code up a bit... but seriously... 

Well let's just see that same code in CoffeeScript and how much better it can be:
  document.onkeydown = (event)->
    switch event.keyCode
      when 37 then game_piece.pan_left() for game_piece in game_pieces
      when 38 then game_piece.pan_up() for game_piece in game_pieces
      when 39 then game_piece.pan_right() for game_piece in game_pieces
      when 40 then game_piece.pan_down() for game_piece in game_pieces
      when 189 then game_piece.zoom_out() for game_piece in game_pieces
      when 187 then game_piece.zoom_in() for game_piece in game_pieces
    the_screen.refresh(game_pieces)
That's the power of expressions.
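For comparison, here's one way to do the by-hand cleanup of the plain Javascript. This is a sketch, not code from the project. It keeps the original key codes and method names and swaps the switch and its six loops for a lookup table. It's much shorter than the compiled output, though you still write the iteration and the wiring yourself.

```javascript
// Hypothetical hand-cleaned Javascript: one lookup table instead of six loops.
var key_actions = {
  37: "pan_left",
  38: "pan_up",
  39: "pan_right",
  40: "pan_down",
  189: "zoom_out",
  187: "zoom_in"
};

function handle_key(key_code, game_pieces, the_screen) {
  var action = key_actions[key_code];
  if (action) {
    game_pieces.forEach(function(game_piece) {
      game_piece[action]();
    });
  }
  return the_screen.refresh(game_pieces);
}

// Wiring it up in the browser (not run here):
// document.onkeydown = function(event) {
//   return handle_key(event.keyCode, game_pieces, the_screen);
// };
```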

Less Complex OO

Now how about classes? These are the bane of Javascript programmers everywhere. There are few right ways to write them and a billion wrong ways. Classes are the biggest simplification CoffeeScript makes to Javascript, and they were a big reason I held out for so long. Really though, they're just shorthand that removes a bunch of boilerplate, so I have less opportunity to introduce bugs. Here's a simple example:

Simple Javascript class:
  var foo = (function() {
    function foo(param1, param2) {
      alert("I've been constructed!");
    }
    foo.prototype.bar = function() {
      return it.shake_like(a_polaroid_picture);
    };
    return foo;
  })();

Same class in CoffeeScript:
class foo
  constructor: (param1, param2)->
    alert "I've been constructed!"
  bar: ->
    it.shake_like(a_polaroid_picture)

Get Off My Lawn

In the past, people tried compiling to Javascript simply because they didn't get it. This is different. It's been over a decade since Javascript began to see wide use, and enough of us get it now that we're starting to see tools that don't try to cover it up as a broken language. Instead, we're seeing improvements made to a programming language we love and think can be even better. The only reasons I see for sticking with Plain Old Javascript are nostalgia and fear of change.

I'll leave you with one last example: the complete class that handles drag/drop in my canvas drag/drop and zoom/pan code. You can decide for yourself which version of the code you'd rather work on.

Before CoffeeScript:
  hand = (function() {
    function hand() {
      this.held_item = null;
    }
    hand.prototype.interact_at = function(x, y) {
      if (this.held_item !== null) {
        return this.drop_tile_if_over_it(x, y);
      } else {
        return this.pick_up_tile_if_over_it(x, y);
      }
    };
    hand.prototype.move_to = function(x, y) {
      if (this.held_item !== null) {
        return this.held_item.drag_to(x, y);
      }
    };
    hand.prototype.can_interact_with = function(touchable_game_pieces) {
      return this.game_pieces = touchable_game_pieces;
    };
    hand.prototype.drop_tile_if_over_it = function(x, y) {
      var game_piece, _i, _len, _ref, _results;
      _ref = this.game_pieces;
      _results = [];
      for (_i = 0, _len = _ref.length; _i < _len; _i++) {
        game_piece = _ref[_i];
        if (game_piece.is_at(x, y)) {
          this.held_item = null;
          break;
        }
      }
      return _results;
    };
    hand.prototype.pick_up_tile_if_over_it = function(x, y) {
      var game_piece, _i, _len, _ref, _results;
      _ref = this.game_pieces;
      _results = [];
      for (_i = 0, _len = _ref.length; _i < _len; _i++) {
        game_piece = _ref[_i];
        if (game_piece.is_at(x, y)) {
          this.held_item = game_piece;
          this.held_item.start_dragging(x, y);
          break;
        }
      }
      return _results;
    };
    return hand;
  })();

After CoffeeScript:
class hand
  constructor: ->
    @held_item = null
  interact_at: (x,y)->
    if @held_item != null
      @drop_tile_if_over_it(x, y)
    else
      @pick_up_tile_if_over_it(x, y)
  move_to: (x,y)->
    if @held_item != null
      @held_item.drag_to(x, y)
  can_interact_with: (touchable_game_pieces)->
    @game_pieces = touchable_game_pieces
  drop_tile_if_over_it: (x,y)->
    for game_piece in @game_pieces
      if game_piece.is_at x, y
        @held_item = null
        break
  pick_up_tile_if_over_it: (x,y)->
    for game_piece in @game_pieces
      if game_piece.is_at x, y
        @held_item = game_piece
        @held_item.start_dragging x, y
        break

Introducing GeekRations
Sat, 23 Jul 2011 00:19:00 -0700 (http://codelikebozo.com/introducing-geekrations)

What's GeekRations?

Tonight I launched my latest project, GeekRations (check it out at http://www.geekrations.com). It's a gift-of-the-month club for geeks that pulls weird, off-the-wall gifts from the hidden nooks and crannies of the internet and delivers them to you monthly. I originally envisioned it for people like myself who love receiving packages in the mail just for the surprise of what's inside. It also makes an awesome gift for that geek in your life you don't know how to buy for.

Where We're At Right Now

Currently, GeekRations is collecting email addresses from interested prospective customers. As soon as we're ready to start shipping gifts, you'll be notified of where you can sign up for the service. Visit http://www.geekrations.com and sign up to be notified once we're taking orders!

Geeky Details

GeekRations is a lean startup in the purest sense of the word. The purpose of the landing page was to see if anyone even cared about this business idea. Apparently people do, so the business idea will be moving forward. GeekRations also has an A/B test running on the splash page wording. One version is straight-faced and very plain in describing our service, while the other tries to be a little looser and sillier. I'll reveal which one wins once I've gathered enough data to tell which is the clear winner.
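One common way to decide when there's a "clear winner" is a two-proportion z-test on the signup rates of the two pages. Here's a minimal sketch. The visitor and signup counts are made up. The 1.96 cutoff (about 95% confidence, two-sided) is a conventional choice, not something GeekRations is known to use.

```javascript
// Pooled two-proportion z-test: how many standard errors apart are the two
// signup rates, assuming there's really no difference between the pages?
function two_proportion_z(signups_a, visitors_a, signups_b, visitors_b) {
  const rate_a = signups_a / visitors_a;
  const rate_b = signups_b / visitors_b;
  const pooled = (signups_a + signups_b) / (visitors_a + visitors_b);
  const std_err = Math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b));
  return (rate_a - rate_b) / std_err;
}

// |z| > 1.96 corresponds to roughly 95% confidence (two-sided).
function is_clear_winner(z) {
  return Math.abs(z) > 1.96;
}

// Hypothetical counts: silly page 50/1000 signups, straight-faced page 30/1000.
const z = two_proportion_z(50, 1000, 30, 1000);
console.log(z.toFixed(2), is_clear_winner(z)); // 2.28 true
```

Checking this repeatedly as data trickles in, and stopping at the first significant result, inflates the false-positive rate. So it's worth picking the sample size before the test starts.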

Unit Testing the DOM
Mon, 11 Jul 2011 10:59:00 -0700 (http://codelikebozo.com/60550971)
How I Unit Test in jQuery

I created a function that adds arbitrary HTML to the DOM and removes it immediately after my test has run. This is what it looks like in use:
test("Given a view with an email address of the logged in user.", function()
{
    using(
    {
        html:"<form><input type='hidden' id='LoggedInUsersEmailAddress' value='justin@cheezburger.com' /></form>",
        do: function()
        {
            var found_friends_view = new $.mine.found_friends_view();
            var loggedInUsersEmailAddress = found_friends_view.getLoggedInUsersEmailAddress();
            equal(loggedInUsersEmailAddress, "justin@cheezburger.com");
        }
    });
});

And this is what comprises the function:
function using(params)
{
  $("body").append("<div id='using_container'></div>");
  $("body #using_container").append(params.html);
  params.do();
  $("body #using_container").remove();
}

It is important to note that errors leave garbage divs behind. This is definitely a work in progress. :)
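One way to stop a failing test from leaving garbage behind is to wrap the test body in try/finally. Here's a minimal sketch. The "using_fixture" helper and its setup/teardown parameters are hypothetical, and they are passed in so the example runs without a DOM. With jQuery, setup would append the "using_container" div plus params.html, and teardown would call $("body #using_container").remove().

```javascript
// Hypothetical variant of using() with cleanup guaranteed by try/finally.
function using_fixture(params) {
  params.setup();
  try {
    params.do();
  } finally {
    // Runs whether do() returned normally or threw, so nothing is left behind.
    params.teardown();
  }
}

// Simulated fixture: a counter stands in for "containers currently in the DOM".
var live_containers = 0;

try {
  using_fixture({
    setup: function() { live_containers++; },
    teardown: function() { live_containers--; },
    do: function() { throw new Error("assertion blew up"); }
  });
} catch (err) {
  // The error still reaches the test runner...
}

console.log(live_containers); // 0 (...but the container was cleaned up)
```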

If you're looking for the short and quick, that was it. If you're wondering why I'm doing this, read on. (You can also view the live typed version here: http://ietherpad.com/ep/pad/view/Ek6pNOcyjv/latest)
How Brittle Is Your jQuery?

When you need to test your HTML DOM manipulations which of the following best describes your approach:
  • Just don't test it. You use Javascript templating and keep the interactions simple enough that it's low risk and has never proven to be a huge issue.
  • Write a Javascript unit test, write your jQuery code, then verify your jQuery interactions by using jQuery to inspect the DOM
  • Write some jQuery, load the web page, and manually test it each time you make a change
  • Write your web page and test it using Selenium after the fact
  • Just don't test it. It would be valuable for you but you just don't have the time.
These are the most common strategies in my experience. The top two are strategies I have been known to employ. On one project I've been very successful leveraging a very event-oriented, MVC-like templating technique that hasn't bitten me yet for not testing my code. At Cheezburger, however, I've been going with the technique of writing more QUnit tests.

I have found not testing, or relying on Selenium, to be the wrong approach.

Why Test Javascript At All?

Realistically, if we're professionals we always test our code. The controversy is usually about whether or not we automate it. Why automate anything? Everything is so easy once you understand how to do it. If you just take the time to understand the code (by grabbing your nearest warm body), you won't need tests, because it is just that easy.

Because I know that I am fallible. Because I miscommunicate even when I say things exactly as they are and mean them (ain't human perception a bitch?). Because I don't want my team to have to take the time to ask for my opinion; that feels great on the ego, and that's a huge behaviour smell right there.

Why Avoid Selenium?

It's an issue of short, tight feedback cycles. To be perfectly fair, I'm sure there are ways of using Selenium that give very tight feedback loops. That's not how I use it, though, and I've not yet met anyone else who uses the tool that way either... just sayin'. When I'm delving into unknown code, it's much too expensive to wait a couple of seconds for the server to start up, and then several more seconds for the other tests to run, for every change I make.

Having said that, I love Selenium for code that is too difficult to get under unit tests. It's great to have a way to give me a pretty high degree of confidence that I haven't broken anything.

 

Find/Replace on a JSON Object Graph
Sat, 23 Apr 2011 01:17:00 -0700 (http://codelikebozo.com/findreplace-on-a-json-object-graph)
Today I had cause to implement a method for finding and replacing a value that appears at the end of a certain JSON path in an object graph. I couldn't find a preexisting tool to do the dirty work, so I wrote it myself (and then this article). :)

Here's a concrete example. Imagine you have the following JSON, already parsed into a Ruby hash:
companies = {
  "CompanyData" =>
  [
    {
      "Name"=>"Company A",
      "Employees"=>
      [
        {
          "Status"=>"Awesome",
          "Name" => "Employee 42",
          "ShirtColor" => "Purple",
          "FavoriteFood" => "Pizza"
        },
        {
          "Name" => "Employee 1",
          "ShirtColor" => "Green",
          "FavoriteFood" => "Rocks",
          "Status"=>"BAD",
        },
      ]
    },
    {
      "Name"=>"Company B",
      "Employees"=>
      [
        {
          "Status"=>"Awesome",
          "Name" => "Another Employee",
          "ShirtColor" => "Maroon",
          "FavoriteFood" => "Your Mom",
          "Manages" =>
          [
            {
              "Name" => "Mofo",
              "ShirtColor" => "Teal",
              "FavoriteFood" => "Blood",
              "Status"=>"Awesome",
             },
          ],
          "Wife" =>
          {
            "ShirtColor"=>"Purple"
          }
        },
        {
          "Name" => "THE IMP",
          "ShirtColor" => "Mauve",
          "Status"=>"BAD",
        },
      ]
    },
  ]
}

Now imagine that you want to make the shirt color orange for every employee whose status is "Awesome". Why? No clue. Work with me here. How would you accomplish that?

After looking for an existing solution, I set out to do it myself and was surprised at how simple this was in Ruby. The technique I thought of was to search through every node in the object graph and call a special replace function on each one. If the given node matched the criteria, then it or its children would be updated accordingly.

The following code amounts to a depth-first search of the object graph:

# Depth-first walk: hand every value below node to the block,
# then recurse into that value's own children.
def search node, &replacement_action
  case node
  when Hash
    node.each_value do |item|
      yield item
      search item, &replacement_action
    end
  when Array
    node.each do |item|
      yield item
      search item, &replacement_action
    end
  end
end

# Turn every "Awesome" employee's shirt orange.
search companies do |node|
  if node.is_a?(Hash) && node["Status"] == "Awesome"
    node["ShirtColor"] = "Orange"
  end
end

Really, all of the "magic" is in the search method. It just knows how to enumerate either a Hash or an Array, pass each child to the replacement block, and then recursively search that child. If a child fails some part of the replacement criteria, nothing happens. The search simply moves on into that child's own Hash or Array children, and so on until the whole graph has been visited.
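The same walk can also be written as an Enumerator, which lets ordinary Enumerable methods like select and count do the matching. This is a hypothetical variant, not the code from this post. The name each_node is invented, it visits the root as well as every child, and the sample graph is a trimmed-down version of the one above.

```ruby
# Hypothetical alternative to search: each_node walks the graph depth-first
# (root included) and, when called without a block, returns an Enumerator.
def each_node(node, &block)
  return enum_for(:each_node, node) unless block

  block.call(node)
  children = case node
             when Hash then node.values
             when Array then node
             else []
             end
  children.each { |child| each_node(child, &block) }
end

graph = {
  "Employees" => [
    { "Status" => "Awesome", "ShirtColor" => "Purple" },
    { "Status" => "BAD", "ShirtColor" => "Green" }
  ]
}

# Collect the matches first, then mutate them.
awesome = each_node(graph).select { |n| n.is_a?(Hash) && n["Status"] == "Awesome" }
awesome.each { |n| n["ShirtColor"] = "Orange" }
```

Collecting the matches before changing them keeps the mutation out of the traversal itself. That way no Hash is ever modified while it is being iterated.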

What's most surprising to me is the simplicity and elegance of the solution. I probably spent more time looking for an alternative than it took me to write that code.

Now, before you think that this will only work in the simplest of cases, here is the actual replacement code I needed for my real-world scenario:

# Replace a value buried under KeyOverride -> _Data -> each item, but only
# for items whose <Id>k__BackingField is 0.
def replace_on_node node, value_to_find, replacement_value
  return unless node.is_a?(Hash) && node.has_key?("KeyOverride")

  key_override_object = node["KeyOverride"]
  return unless key_override_object.is_a?(Hash) && key_override_object.has_key?("_Data")

  data_object = key_override_object["_Data"]
  return unless data_object.is_a?(Array)

  data_object.each do |item|
    next unless item.is_a?(Hash)
    value_object = item["<Value>k__BackingField"]
    next unless value_object.is_a?(Hash)

    if item["<Id>k__BackingField"] == 0 && value_object["<Value>k__BackingField"] == value_to_find
      puts "Replaced!"
      value_object["<Value>k__BackingField"] = replacement_value
    end
  end
end
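replace_on_node hard-codes one particular path (KeyOverride, then _Data, then each item's <Value>k__BackingField). Here is a hypothetical generalization that takes the path as data, closer to the "value at the end of a certain JSON path" idea this post started with. It is a sketch only. The :each marker for "every array element" is an invented convention, and it does not reproduce the extra <Id>k__BackingField == 0 check.

```ruby
# Hypothetical generalization of replace_on_node: the path to the value is
# passed in. Each path element is a Hash key, or the symbol :each, meaning
# "every element of this Array". Returns how many values were replaced.
def replace_at_path(node, path, value_to_find, replacement_value)
  key, *rest = path
  targets =
    if key == :each
      node.is_a?(Array) ? node.each_index.to_a : []
    else
      node.is_a?(Hash) && node.key?(key) ? [key] : []
    end

  targets.sum do |k|
    if rest.empty?
      # End of the path: swap the value only if it matches.
      next 0 unless node[k] == value_to_find
      node[k] = replacement_value
      1
    else
      replace_at_path(node[k], rest, value_to_find, replacement_value)
    end
  end
end

data = {
  "CompanyData" => [
    { "Employees" => [{ "ShirtColor" => "Purple" }, { "ShirtColor" => "Green" }] },
    { "Employees" => [{ "ShirtColor" => "Purple" }] }
  ]
}

path = ["CompanyData", :each, "Employees", :each, "ShirtColor"]
replaced = replace_at_path(data, path, "Purple", "Orange")
```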

Thu, 21 Apr 2011 20:32:00 -0700 String Calculator Kata with No If's http://codelikebozo.com/string-calculator-kata-with-no-ifs

A friend (James Thigpen) issued a challenge to me today... Let's try to do the String Calculator Kata without a single if statement. My last blog post was about wanking code (aka code cuddling) so this seems an appropriate balance. ;D

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using NUnit.Framework;
using System.Text.RegularExpressions;

namespace StringCalculatorKata
{
    [TestFixture]
    public class StringCalculator
    {
        [Test]
        public void EmptyStringShouldReturnZero()
        {
            AssertCalculationResult(string.Empty, 0);
        }

        [Test]
        public void SingleNumberShouldReturnItself()
        {
            AssertCalculationResult("34", 34);
        }

        [Test]
        public void MultipleNumbersSeparatedByDelimiterShouldBeSummed()
        {
            AssertCalculationResult("3,4,55", 62);
        }

        [Test]
        public void NewLineIsAlsoAValidDefaultDelimiter()
        {
            AssertCalculationResult("3\n4,55", 62);
        }

        [Test]
        public void CustomDelimiters()
        {
            AssertCalculationResult("//;5;3\n4,7", 19);
        }

        [Test]
        public void ExtractDelimitersFromInputString()
        {
            var actualDelimiters = Calculator.GetDelimiters("//;//!1,2,3", new []{ "//,", "//\n"});
            Assert.That(actualDelimiters, Is.EquivalentTo(new[] { "//,", "//\n", "//;", "//!" }));
        }

        public void AssertCalculationResult(string inputString, int expectedValue)
        {
            var calculator = new Calculator();
            var result = calculator.Add(inputString);

            // NUnit's Assert.That takes the actual value first, then the constraint.
            Assert.That(result, Is.EqualTo(expectedValue));
        }
    }

    public class Calculator
    {
        public int Add(string inputString)
        {
            var customDelimiterLength = 3;
            var defaultDelimiters = new[] { "//,", "//\n" };
            var delimiterStrings = GetDelimiters(inputString, defaultDelimiters).ToArray();
            var inputStringStart = customDelimiterLength * (delimiterStrings.Length - defaultDelimiters.Length);
            var delimiters = delimiterStrings
                .Select(delimiter => delimiter[2])
                .ToArray();
            var numberStrings = inputString.Substring(inputStringStart).Split(delimiters);

            return numberStrings.Sum(numberString =>
            {
                var number = 0;
                int.TryParse(numberString, out number);
                return number;
            });
        }

        public static IEnumerable<string> GetDelimiters(string inputString, IEnumerable<string> defaultDelimiters)
        {
            var delimiters = new List<string>();
            delimiters.AddRange(defaultDelimiters);

            var regex = new Regex("(//.)");
            var matches = regex.Matches(inputString, 0);
            var matchCount = matches.Count;
            for (var index = 0; index < matchCount; index++)
            {
                var capturedValue = matches[index].Value;
                delimiters.Add(capturedValue);
            }

            return delimiters;
        }
    }
}

You can get more information on the kata here: http://katas.softwarecraftsmanship.org/?p=80

Refactorings would be awesome; I have a nagging feeling it doesn't need to look this ugly.
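For comparison, here is a hypothetical if-free Ruby sketch of the same rules the NUnit tests above exercise. It is not a port of the C# code. It reads "//x" custom delimiter declarations only from the start of the input, whereas GetDelimiters scans the whole string. It relies on String#to_i turning an empty piece into 0, much like the int.TryParse fallback.

```ruby
# Hypothetical if-free String Calculator sketch (not the C# version above).
DEFAULT_DELIMITERS = [",", "\n"].freeze

def add(input)
  # Zero or more leading "//x" declarations, each naming one custom delimiter.
  header = input[%r{\A(?://.)*}]
  custom = header.scan(%r{//(.)}).flatten
  body = input[header.length..-1]
  # Regexp.union escapes each delimiter, so characters like "." are safe.
  body.split(Regexp.union(DEFAULT_DELIMITERS + custom)).sum(&:to_i)
end
```

The branching moves into data. The regex match is simply empty when there is no header, and splitting an empty string yields an empty array, which sums to 0.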

Mon, 10 Jan 2011 08:29:00 -0800 Selling Software or Wanking Code http://codelikebozo.com/selling-software-or-wanking-code

First off, I love beautiful code and have been known to fixate on it so this article is a formalization of what I think to myself every time I start to get religious over coding quality.

Code quality is an oft talked about yet poorly defined topic amongst programmers. Ask 100 different developers what "quality" means to them and you'll receive 100 different answers. Responses will range from "Quality code is code that can be easily changed and understood" to "Quality code is hard to define but following the SOLID principles is a good start" to "it's more of an art that's hard to define." Ok, but maybe you're thinking that these are too abstract and should refer to reducing costs and reducing bugs. Sure. Maybe. Ultimately, however, all of these definitions of "quality" skirt the elephant in the room.

On commercial projects, high quality code will help enable my company to maximize its profits.

When I get in a heavy debate over whether or not someone is really "unit" testing or just "integration" testing, nowadays I ask myself (and then my sparring partner) "Is this why we can't deliver software?" Put another way, "Is this the most critical obstacle in the path of my company making more money over the short and long term?" The answer is usually no.

When it is no, I have to suck up my ego and walk away from the discussion, since I've admitted there's limited value to be had. Note that there is still some value, especially if my purpose is getting on the same page as my team.

When the answer is yes, as it oh so very rarely is, I can make a bold statement, provided I can concretely explain WHY this issue hurts the company's performance more than every other concern. Here's an example:

Imaginary dev: "I don't have time for automated testing so get off my back about it."

Me: "Automated testing is the single most critical thing we can be doing to drive our company's profit because every bug we miss is a bug our customers have to catch and their time is so limited that they can't possibly catch them all. That means the bugs will make it to our customers who will slowly lose faith in our product with every issue they find. We can't afford to manually test so we absolutely have to run automated tests."

While perhaps not bulletproof, that's a strong argument. What would an even stronger counterargument look like?

Imaginary dev: "If I can do what's worked for me in the past and just get this feature done by this hard deadline our customer will pay us a $3 million bonus. Our business customers have already decided and agree that even if we do nothing but review the code I have written for two weeks after the deadline, it will still have been extremely profitable for us to take this measured risk."

Do you believe in "quality" so much that you would ask your team to stop a developer who has consistently delivered results from bringing in a $3 million payday? If you do, what's your number? If you don't have one, then you're not in this business for the business.

Hi, my name is Justin. I'm a recovering code wanker.

Sat, 04 Dec 2010 10:04:42 -0800 Introducing STFU and Code http://codelikebozo.com/introducing-stfu-and-code

My recent foray into the Ruby world with Sinatra and Heroku has taught me a lot about what we could be doing better in .NET. STFU and Code is my **first** response.

I formed this project with Tim Erickson, a great friend, to reduce the friction of getting down to brass tacks and working on a small .NET project. We're using AvalonEdit (http://wiki.sharpdevelop.net/AvalonEdit.ashx) to provide syntax highlighting and to suss out whether or not the code parses.

It's really more of a code thought, but it's usable and your comments are welcome!

Sat, 16 Oct 2010 07:24:00 -0700 Learning Data Visualization From A Data Scientist http://codelikebozo.com/learning-data-visualization-from-a-data-scien

[Photo: Twitter hacking at Strange Loop]

How I Came Across A Real Live Data Scientist!

I was fortunate enough to be able to attend this year's Strangeloop conference (http://thestrangeloop.com/). Hilary Mason, data scientist extraordinaire, gave the opening keynote entitled "Machine Learning: A Love Story". As soon as she said we'd need a little bit of math to get through the presentation, I knew it was gonna be good. After a healthy dose of background on failed attempts at machine learning across the twentieth century, she got into Bayesian statistics and then related this back to her work at bit.ly.

That's when I decided it was my weekend's goal to get her to hack on something, anything, related to data mining with me. Check her out on Twitter @hmason or her website @ http://www.hilarymason.com/

Graciously, she agreed and we set up the time and place. We ended up with around ten people in total hacking for about an hour in a small cafe here in St. Louis. I published the final product here: http://github.com/jcbozonier/Strangeloop-Data-Visualization

and Hilary is hosting the visualization here:

That's the background and this is what came of it for me.

Answers Are Easy, Asking The Right Questions Is Hard

I've been self-studying data analysis for a few months in my spare time and it can be so confusing knowing what I'm doing right or wrong. It's not like programming where I can tell if I have a right answer... it's more or less just me thinking the answer feels right. That's really hard for me.

By grouping up with Hilary I was hoping to get some insight into her professional workflow, what tools she uses, and also I wanted to get a feel for her general approach and mindset for answering a given question with her data-fu.

The question we ultimately decided to work on was "What does the Strangeloop social network look like on Twitter?" In other words, who's talking to whom and how much? Our shared mental model for the problem was essentially a graph of nodes connected by undirected edges, each edge indicating that those two people had communicated via Twitter. Hilary had already grabbed Protovis along with a sample of using it to create a force-directed layout, so it was a perfect fit for answering that question.

Three Steps

Today I learned to think about data analysis as three main steps or phases (since the steps can get a little large). 

1. Get Data- Get the data. In whatever form is easiest, just gather all of the data you'll need and get it on disk. Don't worry about how nice and neat it is.

2. Prune it- Now you can take that mass of data and start to think about what portions of it you can use. The pruning phase is your chance to trim your data down and focus it a bit. This is where you eliminate all aspects of the data except for the ones you'll want to visualize.

3. Glam it up- Here's where you figure out what you'll need to do to get your data into a visualizable form. 

1. Getting Data From Twitter

To get our data I wrote a script that used Twitter's search API to download all tweets that contained the hashtag #strangeloop. Since the results are paged, my code had to loop through about 15 pages until it had exhausted Twitter's records.

This is the code. It's pretty simple but effective.
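require 'net/http'

# Page through Twitter's v1 search API (search.twitter.com, since retired) and
# append each raw JSON page to a file until Twitter says we've run out of pages.
pages_remain = true
number = 1
file_containing_tweets = 'strangeloop_tweets.json'

while pages_remain
  File.open(file_containing_tweets, 'a') { |f|
    Net::HTTP.start("search.twitter.com") { |http|
      response = http.get("/search.json?q=%23strangeloop&rpp=100&page=#{number}")

      if response.body == '{"error":"page parameter out of range"}'
        pages_remain = false
      else
        f.puts response.body   # write the JSON payload, not the response object
        number += 1
      end
    }
  }
end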

There may be errors or corner cases and that's fine. None of this is code I would unit test until it became apparent that I should. The main task at hand here is to get data and in this case at least that's a binary result. It's easy to know if some part of that code went wrong. Also, I need to be able to work quickly enough that I can stay in the flow of the problem at hand. I'm really just hacking at Twitter trying to get the data I want to a file on disk. If I have to do it by hand that's fine.

2. Pruning The Data To Fit My Mental Model

I chose to download the data as JSON because I assumed that would be a pretty simple format to integrate with. Now that Ruby 1.9 comes with a JSON module out of the box, it totally was! Well... pretty much.

Once I had downloaded all of the data I manually massaged each of the 15 JSON result objects to leave behind only their tweets and none of the meta-data surrounding the search. Once I had that completed I had a file containing 1400-1500 JSON tweet objects in a JSON array. 
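That manual massaging could also be scripted. What follows is a hypothetical helper, not what I actually did. It assumes each page was saved as one JSON document per line and that each page keeps its tweets under a "results" key, as Twitter's old v1 search responses did. Treat both as assumptions to check against your own files.

```ruby
require 'json'

# Hypothetical helper: merge raw search pages (one JSON document per line)
# into a single array of tweets. Assumes tweets live under "results".
def merge_search_pages(raw_lines)
  raw_lines
    .map(&:strip)
    .reject(&:empty?)
    .flat_map { |line| JSON.parse(line).fetch("results", []) }
end

# Example usage:
#   tweets = merge_search_pages(File.readlines('strangeloop_tweets.json'))
#   File.write('formatted_tweets.json', JSON.generate(tweets))
```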

Now, during our group session I didn't actually write this portion of the solution. David Joyner (follow him on Twitter as @djoyner) did, and he delivered the end result to Hilary in CSV format via Python. I've recoded it here because there was a bug in the code we wrote to create the data we visualized, and I needed a way to regenerate the data once the bug was fixed. Since I didn't have his Python script, I just opted to rewrite what he had done.

From here I just tried to get the data loaded up into Ruby via the JSON module. I load the saved JSON from disk with the following code:
require 'json'

def get_file_as_string(filename)
  # File.read opens the file, reads it whole, and closes it again
  File.read(filename)
end

def get_strangeloop_tweets
  text_file_containing_tweets = 'formatted_tweets.json'
  raw_json_text = get_file_as_string text_file_containing_tweets
  tweets = JSON.parse(raw_json_text)

  return tweets
end

My approach once again was very hack-oriented: write a little bit of Ruby in such a way that I can verify it worked via the command line, then iterate by adding another step or two and repeating. It's like TDD but with much less forethought, just hacking and feeling my way around the problem space.

3. Glamming It Up For Protovis

To recap: I downloaded the data into a parseable form. David loaded it from disk and did the original work of pulling the data into a set of undirected edges between people talking to one another. I rewrote his step, since I didn't have his code, and also Hilary's step of converting his data into something Protovis could use. To make the graph more interesting we also decided to count the number of times each edge was used, which you'll see being computed in this:
Edge = Struct.new(:from, :to)

def get_tweep_connections_from tweets
  tweep_edges = {}
  tweets.each{ |tweet|
    tweep = tweet['from_user']
    to_nodes = extract_all_tweeps_from tweet
    
    if to_nodes.length > 0
      to_nodes.each{ |node|
        raise "node is blank!!" if node == ''
        edge_a = Edge.new(tweep, node)
        edge_b = Edge.new(node, tweep)

        if tweep_edges.has_key? edge_a
          tweep_edges[edge_a] += 1
        elsif tweep_edges.has_key? edge_b
          tweep_edges[edge_b] += 1
        else
          tweep_edges[edge_a] = 1
        end
      }
    end
  }
  
  return tweep_edges
end
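The post doesn't show extract_all_tweeps_from. Here is a minimal stand-in. It assumes the method simply pulls every @mention out of the tweet's text, using the same @(\w+) pattern as David's Python version. The body is a guess, not my original code.

```ruby
# Hypothetical stand-in for extract_all_tweeps_from: return every
# @-mentioned screen name in the tweet text, without the leading "@".
def extract_all_tweeps_from(tweet)
  tweet['text'].to_s.scan(/@(\w+)/).flatten
end
```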

David Joyner was also kind enough to send me his original Python code that essentially does the same thing:
import json, re

RE_MENTION = re.compile(r'@(\w+)')

f = open('formatted_tweets.json')
tweets = json.load(f)
f.close()

graph = {}

for tweet in tweets:
    from_user = tweet['from_user']
    for m in RE_MENTION.finditer(tweet['text']):
        to_user = m.group(0)[1:]

        pair1 = (from_user, to_user)
        pair2 = (to_user, from_user)

        if pair1 in graph:
            graph[pair1] += 1
        elif pair2 in graph:
            graph[pair2] += 1
        else:
            graph[pair1] = 1

for key, value in graph.items():
    print("%s, %s, %d" % (key[0], key[1], value))

The thought was that the more active a person was on Twitter, the more they influenced the network. This could cause someone who was really chatty to get over-emphasized in the visualization but in our case it worked out well.
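Both versions check an edge in both orientations before counting it. An alternative, which is not what we wrote at the time, gives each pair one canonical key by sorting the two names. That way [a, b] and [b, a] land on the same count. This sketch needs Ruby 2.7+ for Enumerable#tally.

```ruby
# Sketch: count undirected mention edges by sorting each (from, to) pair,
# so both directions share one key. Uses the same @(\w+) mention pattern.
def count_undirected_edges(tweets)
  tweets.flat_map { |tweet|
    tweet['text'].scan(/@(\w+)/).flatten.map { |to| [tweet['from_user'], to].sort }
  }.tally
end
```

The trade-off is that the stored orientation no longer reflects who spoke first. That doesn't matter for an undirected, force-directed layout.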

So, OK, we had all of this data, but it wasn't in the form Protovis needed to show our awesome visualization. Hilary figured out the required format by downloading a sample project from the Protovis website. The data needed to be put in this form:
// This file contains the weighted network of coappearances of characters in
// Victor Hugo's novel "Les Miserables". Nodes represent characters as indicated
// by the labels, and edges connect any pair of characters that appear in the
// same chapter of the book. The values on the edges are the number of such
// coappearances. The data on coappearances were taken from D. E. Knuth, The
// Stanford GraphBase: A Platform for Combinatorial Computing, Addison-Wesley,
// Reading, MA (1993).
//
// The group labels were transcribed from "Finding and evaluating community
// structure in networks" by M. E. J. Newman and M. Girvan.

var miserables = {
  nodes:[
    {nodeName:"Myriel", group:1},
    {nodeName:"Napoleon", group:1},
    {nodeName:"Mlle. Baptistine", group:1},
    {nodeName:"Mme. Magloire", group:1},
    {nodeName:"Countess de Lo", group:1},
    {nodeName:"Geborand", group:1},
    {nodeName:"Champtercier", group:1},
    {nodeName:"Cravatte", group:1},
    {nodeName:"Count", group:1},
    {nodeName:"Old Man", group:1}
  ],
  links:[
    {source:1, target:0, value:1},
    {source:2, target:0, value:8},
    {source:3, target:0, value:10},
    {source:3, target:2, value:6},
    {source:4, target:0, value:1},
    {source:5, target:0, value:1},
    {source:6, target:0, value:1},
    {source:7, target:0, value:1},
    {source:8, target:0, value:2},
    {source:9, target:0, value:1}
  ]
};

The full sample file is much longer than the excerpt above. If you scroll through it a ways, you'll eventually see links that look like this:
{source:72, target:27, value:1},

Nice, eh? Those numbers are basically saying: draw a line from the node at index 72 of our list of nodes to the node at index 27. That complicated things a bit, but Hilary got through it with some code I imagine wasn't too dramatically different from this:
def create_protovis_data_from tweeps, tweep_edges
  counter = 0
  tweep_index_lookup = {}

  File.open('strangeloop_words.js', 'w'){|file|
    file.puts 'var miserables = {'
    file.puts 'nodes:['
    
    tweeps.each{|tweep|
      tweep_index_lookup[tweep] = counter
      file.puts "{nodeName:\"#{tweep}\", group:1}, //#{tweep_index_lookup[tweep]}"
      counter += 1
    }
    
    file.puts '],'
    file.puts 'links:['
    
    tweep_edges.each{ |edge, strength|
      from_tweep = edge[:from]
      to_tweep = edge[:to]
      
      raise "bad to tweep!!" unless tweep_index_lookup.include? to_tweep
      raise "bad from tweep!!" unless tweep_index_lookup.include? from_tweep
      
      from_index = tweep_index_lookup[from_tweep]
      to_index = tweep_index_lookup[to_tweep]
      
      file.puts "{source:#{from_index}, target:#{to_index}, value: #{(2)**strength}},"
    }
    
    file.puts ']};'
  }
end

I just basically create a hash where I store the index number for each Twitter user's name and then look it up when I'm generating that portion of the file.
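Another option is to build the nodes/links structure as plain Ruby data and let the JSON library write it, rather than printing each line by hand. This is a hypothetical sketch. It takes edge counts keyed by [from, to] arrays instead of the Edge struct, derives the node list from the edges themselves, and keeps the 2**count weighting used above.

```ruby
require 'json'

# Sketch: turn {[from, to] => count} into Protovis-style nodes and links.
# Node indices come from each name's first appearance in the edge list.
def protovis_data_from(edge_counts)
  names = edge_counts.keys.flatten.uniq
  index_of = names.each_with_index.to_h
  {
    nodes: names.map { |name| { nodeName: name, group: 1 } },
    links: edge_counts.map { |(from, to), count|
      { source: index_of[from], target: index_of[to], value: 2**count }
    }
  }
end

# Example usage:
#   data = protovis_data_from(edge_counts)
#   File.write('strangeloop_words.js', "var miserables = #{JSON.generate(data)};")
```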

Biggest Take Away: Baby Steps

There was definitely a fair amount of work here, and without all of the teamwork we wouldn't have been able to get this done in the 45 minutes it took us. Part of the teamwork was just figuring out what components of work we had in front of us. The three steps I laid out in this article are how I saw us tackling the problem, and there were many other, much more iterative steps I left out.

When I do more data analysis in the future I plan to just work it through piece by piece and not get overwhelmed by all of the different components that will need to come together in the end. 

The Other Biggest Take Away: Get Data At Any Cost Necessary

It's easy as a programmer for me to get bogged down in thoughts of "quality". Even Hilary was apologizing for the extremely hacked-together code she had written. Ultimately, though, it really doesn't matter here. The code will not be run continuously and hell, it may never even be run again! If the code falls apart and blows up, I can quickly rewrite it. I'm my own customer in this sense. I can tolerate errors and I can fix them on the fly. When I'm exploring a problem space the most important thing for me is to reduce the friction of my thought process. If I think best hacking together code then awesome. Once I can get my data I'm done. I don't care about robustness... I just need it to work right now.

I'm harping on this point because it's such a dramatic shift from the way I see production code for my day job. Code I write for work needs to be understood by a whole team, solid against unconsidered use cases, reliable, etc. Code I write to get me data really quick, I just need the data.

While Hilary is a Pythonista, at one point I remember her commenting on programming language choice and saying something to the effect of "It doesn't matter, they all work well." She was so calm about it... it was almost Zen-like. After having so many passionate talks regarding programming languages with other programmers, it was very refreshing to interact with someone who had a definite preference but was able to keep her eye on the prize... the data and, more importantly, the answers that the data held.

Next Steps

I'd like to work on a way to tell which of the people I follow on Twitter are valuable and which I should stop following. Essentially a classifier I guess. On top of that I'd like to write another one to recommend people I should follow based on their similarity to other people I do follow (and who are valuable)... We'll see. I've got another project that desperately needs my time right now. If you happen to write this though or know of anyone who has, let me know!

Mon, 20 Sep 2010 20:15:00 -0700 Monte Carlo Analysis of the Zero Defect Mentality of TDD http://codelikebozo.com/monte-carlo-analysis-of-the-zero-defect-menta

Challenges that led me here

  • Heated arguments at work regarding how much TDD is enough and how little is too little. How do we find common ground?
  • An acknowledgement of technical debt and a confusion about how to leverage it. How much debt is too much?
  • Being labeled as pedantic and a zealot. Is a Zero-Defect Mindset ever worthwhile? When?
  • A learning exercise in how we can gain concrete insights by using our intuition in a methodical fashion. How can I communicate abstract ideas rigorously without concrete evidence?

This article represents my lessons learned from this exploration.

Making the Abstract Concrete

It was a normal day at work. Another co-worker and I were strongly and passionately arguing for the benefits of strict, pure, clean TDD against a couple of other equally passionate co-workers who were sold on the idea of everything in moderation. I had just completed a four-month, full-time Agile immersion with an amazing, albeit very idealistic, consultant, and his ideas about a zero-defect mindset, along with his belief that it was practically achievable, were seductive. I had entertained my own idealistic fantasies for a while, never really thinking they could or should be taken so seriously.

It was liberating. 

Also, it was isolating. Those thoughts and that excitement placed me at one extreme of a continuum, with many of my teammates at the other end or somewhere in the middle, nearer to the side of limiting TDD in the name of practicality. Conversation after conversation, debate after debate, we ended in the same place, perhaps even galvanized a bit by the disagreement and a bit further from finding common ground.

I finally came to understand that regardless of what I knew to be right, everyone on my team had their own perception and their own knowledge of what was right as well. That's not sarcasm. In social interactions there are multiple realities and all of them need to be appreciated and considered valid enough to be worth understanding. 

How could I model my perception of reality in some sort of concrete way that would enable me to make rigorous (albeit somewhat subjective) predictions? How could I ensure my mental model was at least self-consistent and workable? Like any self-respecting geek, I decided the best way to model uncertainty was to run thousands of simulations and projections of reality to see what lessons could be gleaned.

Finding Common Ground in a Common Purpose

The first decision I had to make was figuring out the underlying metric I would use to compare the two development methodologies. Having been just recently introduced to systems thinking and the Theory of Constraints, I thought a great start would be to use the value throughput of the companies. 

But what is value? When we speak of delivering value to our business customers what is it we are actually delivering? In discussions with my team, we decided that business value is best seen as the present day value of your company were it to be valued by an external party. For the purposes of the simulation, I assume the value delivered by completed stories to be equivalent to some randomly assigned numbers provided by a value distribution and assigned without regard for feature size. That's right, it means a feature that takes next to nothing to develop may create an enormous amount of value for the company.

For further assumptions and specifics of my model, read on.

My Model for Thinking About This (AKA My Domain Model)

Concepts and their role in the simulation:
  • User Story- In this simulation, a User Story is the smallest unit of work that the Product Development Team can work on that provides the slightest bit of business value. Each story also has an associated size.
  • Business Customers- Generate a random set of stories each iteration, with sizes drawn from a discrete distribution. Each story's value is also randomly assigned upon creation.
  • Product Backlog- Repository for all stories. New stories are all added as a top priority in the order delivered. Bugs are randomly dispersed into the Product Backlog when they are received. 
  • Product Development Team- Anyone and everyone responsible for getting the release out the door. This includes programmers, testers, technical writers, etc. The Product Development Team iterates over the Product Backlog and works to complete stories. They also decide the cost of the various stories. Over time the team's speed can increase if the maximum velocity specified on construction is greater than the minimum velocity. The function which controls the team's performance improvement is an "Experience Curve" (http://en.wikipedia.org/wiki/Experience_curve_effects). Without getting too deep into it, this experience curve essentially models the decreasing cost of development over time.
  • End Users- Who the Product Development Team releases to. Because the Product Development Team includes *everyone* needed to release the software, the End User may receive the software immediately afterwards. End Users discover bugs in the software. This is currently set to a constant rate per story per iteration. So if the defect rate is 1%, then a team with a hundred stories complete can expect to have, give or take, one story per iteration reenter the Product Backlog as a new Bug Story. The size of the Bug Story is randomly determined based upon a discrete bug size distribution. 
  • Bug Story- A Bug Story is a story that is focused on fixing a defect in the software. These stories are unlike normal stories in that they have no real value for the team and thus don't improve throughput. A Bug Story actually represents more of an opportunity cost as valuable work could be done in its place if the Bug Story hadn't needed to be written.
  • Support Team- Who the bugs are reported to. Currently only really used to track the total bug count. Could be used in the future to eliminate bugs due to "user error".
Each of these concepts maps directly to an object in the JavaScript simulation.
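The post doesn't show the experience-curve function itself, so here is a minimal sketch of how velocity might grow along one, assuming Wright's law (each doubling of cumulative output cuts unit cost by a fixed percentage). The function name and the `learningRate` parameter are hypothetical, not the simulation's.

```javascript
// Sketch only: velocity grows along an experience curve (Wright's law) and is
// capped at the team's maximum. The post doesn't show its actual function.
// learningRate is a hypothetical parameter in (0, 1]: 0.8 means every doubling
// of iterations completed cuts the cost of a story point to 80% of what it was.
function experienceCurveVelocity(minVelocity, maxVelocity, iteration, learningRate) {
  // Wright's law: cost(n) = cost(1) * n^(-b), where b = -log2(learningRate).
  var b = -Math.log(learningRate) / Math.LN2;
  // Velocity is the inverse of cost per point, so it grows as n^b.
  // Iterations are counted from 1 (iteration 1 runs at minVelocity).
  var velocity = minVelocity * Math.pow(iteration, b);
  return Math.min(velocity, maxVelocity);
}
```

With the TDD settings in run_simulation() (1 to 16) velocity climbs over the year; with the standard settings (20 to 20) the cap equals the starting point, so velocity stays flat.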

How The Simulation Works

First, here is how the topmost level of the simulation runs:
function run_simulation()
{
  var simulation_settings =
  {
    'teams_to_simulate' : 10,
    'weeks_to_project' : 1 * 52,
    'story_size_distribution' : [0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4],
    'bug_size_distribution' : [0,0,0,1,1,1,1,1,1,1,1,1,2,2,2,2,2,4,4],
    'tdd_defect_ratio' : .07,
    'min_tdd_velocity' : 1,
    'max_tdd_velocity' : 16,
    'std_defect_ratio' : .14,
    'min_std_velocity' : 20,
    'max_std_velocity' : 20,
  };
  
  var tdd_std_comparison_chart = new Chart();
  var developmentProcessFactory = new DevelopmentProcessFactory(tdd_std_comparison_chart);
  
  var simulated_development_teams = new SimulatedDevelopmentTeams(simulation_settings);
  simulated_development_teams.create_tdd_development_teams_using(developmentProcessFactory);
  simulated_development_teams.create_standard_development_teams_using(developmentProcessFactory);
  
  for(var i = 0; i < simulation_settings.weeks_to_project; i++)
  {
    simulated_development_teams.iterate();
  }
  
  tdd_std_comparison_chart.draw_chart_to('value_delivered_chart');
}

The distributions you see are assumed to be valid randomly chosen samples of real data. There's one set of data for story sizes and another set of data for bug sizes. In the interest of full disclosure, this is NOT real data. That's for my co-workers only. :) 
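Assuming DiscreteDistribution simply draws uniformly from its list of samples (the post doesn't show its internals), a value that appears more often in the list comes up proportionally more often. Here's a minimal sketch; `next_value()`, `mean()`, and the injectable random function are hypothetical additions for illustration and testing.

```javascript
// Sketch of an empirical discrete distribution. Assumes (not confirmed by the
// post) uniform sampling from the sample list; next_value() and mean() are
// hypothetical method names.
function DiscreteDistribution(random) {
  this._samples = [];
  this._random = random || Math.random; // must return a number in [0, 1)
}
DiscreteDistribution.prototype.use_these_as_samples = function (samples) {
  this._samples = samples.slice(); // copy so later edits to the settings don't leak in
};
DiscreteDistribution.prototype.next_value = function () {
  // Uniform index => each value appears with probability count / samples.length
  var index = Math.floor(this._random() * this._samples.length);
  return this._samples[index];
};
// Expected (mean) value, handy for back-of-the-envelope checks.
DiscreteDistribution.prototype.mean = function () {
  var total = 0;
  for (var i = 0; i < this._samples.length; i++) total += this._samples[i];
  return total / this._samples.length;
};
```

For example, the bug_size_distribution above has 19 samples summing to 27, so the average bug costs 27/19 ≈ 1.42 points.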

The overall process is shown in the following code:
DevelopmentProcess.prototype.iterate = function()
{
  this._report.next_iteration();

  this._customer.deliver_new_stories_to(this._product_backlog);
  this._development_team.work_from(this._product_backlog);
  this._development_team.move_finished_stories_to(this._end_users);
  this._end_users.test_stories_and_report_failures_to(this._bug_queue);
  this._bug_queue.prioritize_and_move_bugs_to(this._product_backlog);
  
  this._product_backlog.report_to(this._report);
  this._end_users.report_to(this._report);
  this._bug_queue.report_to(this._report);
};

That code also shows why I have disdain for the fixation many developers have on instantiating objects within other objects to "hide worthless noise." It reads pretty damn well.
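To see why the defect rate matters so much, here's a deterministic toy version of one iteration. It's my simplification, not the author's simulation: random story and bug sizes are replaced by their means, and (unlike the real backlog, where bugs are dispersed randomly) the team works bugs first. `iterateExpected` and its settings names are hypothetical.

```javascript
// Toy expected-value model of one iteration; NOT the post's simulation.
// settings: { velocity, defectRatio, meanBugSize, meanStorySize } (hypothetical names)
// state:    { storiesInProduction, bugPointsInBacklog, valueDelivered }
function iterateExpected(state, settings) {
  // 1. The team spends its velocity on known bugs first (an assumption),
  //    then on new stories.
  var bugWork = Math.min(settings.velocity, state.bugPointsInBacklog);
  state.bugPointsInBacklog -= bugWork;
  var storyPoints = settings.velocity - bugWork;

  // 2. Finished stories ship to end users; only story work counts as value.
  state.storiesInProduction += storyPoints / settings.meanStorySize;
  state.valueDelivered += storyPoints;

  // 3. End users find bugs at a constant rate per story in production.
  state.bugPointsInBacklog +=
    settings.defectRatio * state.storiesInProduction * settings.meanBugSize;
  return state;
}
```

With a velocity of 10, a 10% defect ratio, and unit-sized stories and bugs, stories in production creep toward 100, which is velocity / (defectRatio × meanBugSize). At that point end users report bugs as fast as the team can fix them and the value added per iteration drops toward zero, at least in this toy model.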

Constructing the simulated development teams is handled by the DevelopmentTeamFactory which constructs the teams using the TDD methodology as well as the standard development teams:
DevelopmentProcessFactory.prototype.create_tdd_development = function (simulation_settings)
{
  var tdd_development_team_report = new Report();
  tdd_development_team_report.plot_results_in_color('green');
  tdd_development_team_report.plot_results_on(this._tdd_comparison_chart);
  
  var business_customers = new BusinessCustomers();
  
  var product_backlog = new ProductBacklog();
  
  var development_team = new DevelopmentTeam(simulation_settings.story_size_distribution);
  development_team.velocity_begins_at(simulation_settings.min_tdd_velocity);
  development_team.maximum_possible_velocity_is(simulation_settings.max_tdd_velocity);
  
  var random_bug_size_generator = new DiscreteDistribution();
  random_bug_size_generator.use_these_as_samples(simulation_settings.bug_size_distribution);
  
  var end_users = new EndUsers(random_bug_size_generator);
  end_users.find_this_many_bugs_per_story_per_iteration(simulation_settings.tdd_defect_ratio);
  
  var support_team = new SupportTeam();
  
  // This is an ugly construction... How to improve it??
  var development_process = new DevelopmentProcess(business_customers, product_backlog, development_team, end_users, support_team, tdd_development_team_report);
  
  return development_process;
};

That code does instantiate objects within a method call. In this case I rationalize it as being part of a factory, where construction is the factory's concern. I'm not certain that makes for the cleanest code, though. I'll still be iterating over this after I publish this article. :)
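One possible answer to the "How to improve it??" comment in the factory is to pass the collaborators as a single object with named properties. This is a hypothetical refactoring sketch, not the post's code; the property names are mine, and mapping support_team onto `_bug_queue` is inferred from how iterate() uses it.

```javascript
// Hypothetical sketch: DevelopmentProcess takes one object of named
// collaborators instead of six positional arguments, so the call site
// documents itself and two arguments can't be silently swapped.
function DevelopmentProcess(collaborators) {
  var required = ['customer', 'product_backlog', 'development_team',
                  'end_users', 'support_team', 'report'];
  required.forEach(function (name) {
    if (!collaborators[name]) {
      throw new Error('DevelopmentProcess is missing a collaborator: ' + name);
    }
  });
  this._customer = collaborators.customer;
  this._product_backlog = collaborators.product_backlog;
  this._development_team = collaborators.development_team;
  this._end_users = collaborators.end_users;
  // iterate() talks to the support team through this._bug_queue.
  this._bug_queue = collaborators.support_team;
  this._report = collaborators.report;
}
```

In create_tdd_development the last line would then read `new DevelopmentProcess({ customer: business_customers, product_backlog: product_backlog, development_team: development_team, end_users: end_users, support_team: support_team, report: tdd_development_team_report })`, trading a little verbosity for a call site that can't silently swap two arguments.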

Everything should work in the latest versions of Firefox and Chrome. It requires HTML5 in order to use the canvas for charting.

Here's a sample chart with 50 TDD teams (green tick marks) and 50 standard teams (red tick marks) each running over two years:
[Chart: 50 TDD teams (green) vs. 50 standard teams (red), two-year simulation]

Feel free to open the code, modify the JSON simulation settings yourself, and run your own simulations. Note that as soon as you press the Run button your browser will freeze for a bit. I recommend Chrome for its amazing speed. Don't kill the process! I promise it will finish eventually. ;)

[Insert lesson about how important User Experience design is here]

How It Was Built

Baby steps. I started with one team that could be modeled and just showed final iteration stats on the web page. Then I moved on to comparing that team side by side with a team using a standard development model, using a table of data. Next up I found a pretty good HTML5 charting API and got the key data visualized (Total Value to Date vs. Time Passed).

Lessons Reinforced
  • Lowering the defect rate, even at the cost of reduced performance, results in higher value throughput in the long term. In the short term, however, lowering the defect rate is hardly ever optimal.
  • A higher defect rate results in a much higher spread of possible value throughput... in other words there's a higher variance in what you can expect in terms of value output from a product development team.
  • Every development team has a point where the highest possible testing and quality rigor begins to outperform the less rigorous teams. The trick is identifying where this begins to happen for your particular company or project.
All of these discussions say you should always TDD, always bake quality in, always etc., etc. Yet for some company, the opposing viewpoint is just as accurate. Where is the tipping point for your company? You'll need some answer to this before you can make any sort of real argument, at least one based upon the ideal of striving for zero defects.
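Finding where rigor starts to pay off can be made mechanical once you have the weekly "Total Value to Date" series for two teams. This is a hypothetical helper, not part of the simulation: it reports the first week from which the more rigorous team stays ahead for the rest of the run, ignoring brief leads that are later lost.

```javascript
// Hypothetical helper: given two cumulative value-per-week series, return the
// first week (0-based) from which the rigorous team stays strictly ahead
// through the end of the data, or -1 if it is not ahead at the end.
function findCrossoverWeek(rigorousCumulative, standardCumulative) {
  var weeks = Math.min(rigorousCumulative.length, standardCumulative.length);
  var crossover = -1;
  for (var week = 0; week < weeks; week++) {
    if (rigorousCumulative[week] > standardCumulative[week]) {
      if (crossover === -1) crossover = week; // a lead starts here
    } else {
      crossover = -1; // lead lost (or never held): reset
    }
  }
  return crossover;
}
```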

My Hacker Mentality

Given that it's all about where we decide that testing becomes the highest value decision, I realized why I never test personal projects. For me, I set the expected life of my projects to be practically zero. Consequently, I end up unable to reasonably justify any testing. By my estimates, testing won't produce any real value. The real problem with my hacker mentality is that I tend to underestimate how long I'll be working on something. Take this simulation as an example. I began it with no tests because I figured I'd slam something together in a night and be done with it. However, I enjoyed it much more than I expected and ended up wanting to explore the nooks and crannies of my model. 

Maybe some day I'll learn?

Conclusion

Assuming a business lifetime of T years, where T is far enough out in the future, the more testing the better, and the more defects you can prevent the better. This holds regardless of the up-front costs, because in the long run, if we don't test, our primary concern will become just preventing errors in pre-existing features, which crowds out work on new features that add value.

However, if you know your lifetime T is limited, it may actually be in your best interest not to have a zero-defect mindset. Just don't think that you can switch to a 20-year lifespan at the end of year five and see an instant turnaround in value throughput.

If you've read this far, you deserve a reward. Here are my conclusions on the questions I asked in the beginning:
  • How do we find common ground? Share our assumptions and make them explicit. Codify them so that they can't be conveniently shifted when the arguments get uncomfortable.
  • How much debt is too much? I didn't model technical debt in terms of needed refactoring... just in terms of defect likelihood. Too much debt is when you spend more time paying maintenance costs than delivering value.
  • Is a Zero-Defect Mindset ever worthwhile? When? Yes it is, when you have set a goal of a sufficiently long lifetime for your product.
  • How can I communicate abstract ideas without concrete evidence in a rigorous manner? Hopefully I just did.

 


]]>
http://files.posterous.com/user_profile_pics/94091/MyMugShot.jpg http://posterous.com/users/15YhR1s088x Justin Bozonier darkxanthos Justin Bozonier
Sun, 18 Jul 2010 19:45:00 -0700 Breaking Up with IoC Containers http://codelikebozo.com/breaking-up-with-ioc http://codelikebozo.com/breaking-up-with-ioc
Straight Up
 
I've stopped using or caring about IoC containers. I used to use them because they were so quick and easy and they kept my code looking pristine and beautiful. Now I do manual dependency injection and the results on non-trivial systems are very interesting and look even more beautiful. At the end of the post we'll examine a little bit more of why I left them behind. In the meantime, I've probably left you wondering what the heck I do to keep things from getting out of hand. "What about the times when you have to inject a dependency through five other objects before it gets to where you need it??" Yeah, we'll get to that.
 

Focusing the Discussion
 
The statements I make in this article assume the following:
  1. You know what an IoC container is.
  2. You only use an IoC container to clean up your dependency injection.
  3. You aren't writing prototypical, throw away code.
  4. The system under question is not an ideal of perfection. It's just a realistic system where I can concretely show the product of applying this abstract idea.
Now, the problem domain: The code in this post was for a Twitter/IM client I was building for a while. Its point was to unify all of your messaging clients into one in a way that made the different clients a non-concern to the user. It was about unifying and simplifying. It had been about a year since I'd touched this code, so when I came back and looked at the IoC declaration to re-learn the lay of the land, I was underwhelmed.
 

Here's my original IoC container declaration:

 
The problem I have that needs solving is how to make my code into living documentation that describes itself long after I've forgotten about it.
 
The Hot New Thang I Replaced IoC With
 
That code really doesn't tell me anything useful about my system. "Wha- HUH?!" you say. Seriously, I get an idea of what the objects are in my system and how they correlate with my interfaces, but what about how the objects are used by one another? The power of OOP lies in graphs of objects. A graph is nothing if not a precise way of storing the interrelationships between individual elements. 
 
So what's the alternative? Ditching IoC and wiring everything together by hand. Ok, ok. I know. It sounds extreme and it sounds painful. Let's address some of what may seem to be pain. This also answers the earlier question of what to do when you need to inject an object through several layers: you won't need to. The reason you've had to do this in the past is that you instantiated objects within other objects. By giving just one class the responsibility of instantiating everything, you prevent that from ever happening again. Just pulling all of the dependencies up to the topmost level isn't what this is all about, though. There's still pain and, as you can see from this example, it does very little to add clarity:
namespace Unite.UI
{
    /// <summary>
    /// Interaction logic for App.xaml
    /// </summary>
    public partial class App : Application
    {
        public App()
        {
            Startup += App_Startup;
        }

        void App_Startup(object sender, StartupEventArgs e)
        {
            var pluginFinder = new PluginFinder();
            var serviceProvider = new ServiceProvider(pluginFinder);
            var credentialCache = new CredentialCache(serviceProvider);
            var jobRunner = new AsyncJobRunner(this.Dispatcher);
            var gui = new GuiInteractionContext(credentialCache, jobRunner);
            var messagingService = new ServicesManager(serviceProvider);
            var contactRepository = new ContactRepository();
            var contactManager = new ContactManager(messagingService, contactRepository);
            var messageRepository = new MessageRepository();
            var codePaste = new CodePaste();
            var messageFormatter = new MessageFormatter(codePaste);

            var credentialManager = new CredentialManager(messagingService, gui);

            var messageManager = new MessageManager(messagingService, messageRepository, messageFormatter, jobRunner);

            var viewModel = new ViewModels.MainView(gui, contactManager, messageManager);
            var view = new Views.MainView(viewModel);
            view.Show();
        }
    }
}
 
So after looking at the wire-up of the entire system and revisiting the different classes, I came up with some more appropriate names:
namespace Unite.UI
{
    /// <summary>
    /// Interaction logic for App.xaml
    /// </summary>
    public partial class App : Application
    {
        public App()
        {
            Startup += App_Startup;
        }

        void App_Startup(object sender, StartupEventArgs e)
        {
            var messagingPluginFinder = new MessagingPluginFinder();
            var messagingPlugInRepository = new MessagingPlugInRepository(messagingPluginFinder);
            var appropriatePlugInDetection = new DetectPlugInToUseBasedOnRecipientAddress(messagingPlugInRepository);
            var unifiedMessenger = new UnifiedMessenger(messagingPlugInRepository, appropriatePlugInDetection);
            
            var contactRepository = new ContactRepository();
            var contactQuery = new ContactQuery(unifiedMessenger, contactRepository);

            var messagingFiber = new AsyncFiber(this.Dispatcher);
            var credentialRepository = new MessagingAccountCredentialRepository(messagingPlugInRepository);
            var securityDialogService = new SecurityDialogService(credentialRepository, messagingFiber);

            var credentialManager = new CredentialAuthorizationController(unifiedMessenger, securityDialogService);

            var codePasteToUrlService = new CodePasteToUrlService();
            var automaticMessageFormatting = new AutoFormatCodePastesAsUrls(codePasteToUrlService);

            var messageRepository = new MessageRepository();
            var unifiedMessagingController = new UnifiedMessagingController(unifiedMessenger, messageRepository, automaticMessageFormatting, messagingFiber);

            var messagingViewModel = new MessagingViewModel(securityDialogService, contactQuery, unifiedMessagingController);
            var messagingWindow = new MessagingWindow(messagingViewModel);
           
            messagingWindow.Show();
        }
    }
}

 
That's the new hawtness. Sitting down, thinking of the code and how the different objects work together and naming them in a way that binds them into a cohesive overarching vision.
 
One of the key principles of OO design is cohesion, after all. Prior to getting all of these objects together and seeing how they were interconnected, I didn't really see that they weren't cohesive. The various objects weren't named in a way that illustrated their cohesion with the rest of the system, and I didn't have an easy way of seeing how they all related to each other. 
 
A key concept that comes out of this is that code is a form of literature. Donald Knuth says,
 
"I believe that the time is ripe for significantly better documentation of programs, and that we can best achieve this by considering programs to be works of literature...

Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.

The practitioner of literate programming can be regarded as an essayist, whose main concern is with exposition and excellence of style. Such an author, with thesaurus in hand, chooses the names of variables carefully and explains what each variable means. He or she strives for a program that is comprehensible because its concepts have been introduced in an order that is best for human understanding, using a mixture of formal and informal methods that reinforce each other."

 Donald Knuth. "Literate Programming (1984)" in Literate Programming. CSLI, 1992, pg. 99
 
By declaring my object graph in a single spot where a human can read it and see how the different objects depend on and relate to one another, I have a great place to introduce a new programmer to my system. It isn't necessarily easy, but it will be a more thorough and thoughtful treatment of the system than the original IoC declaration. If I had done a better job of adhering to DDD principles in this system, I'd like to think this wire-up would be even more valuable.
 
Before you start thinking that I could have fixed these issues by spending the same amount of time on my IoC container, here's a sample of what it looks like AFTER renaming the classes:
public static void BootstrapStructureMap()
{
    // Initialize the static ObjectFactory container
    ObjectFactory.Initialize(x =>
    {
        var fiber = new AsyncFiber(Dispatcher.CurrentDispatcher);

        x.ForRequestedType<Views.MessagingWindow>().TheDefaultIsConcreteType<Views.MessagingWindow>();
        x.ForRequestedType<IUnifiedMessagingService>().CacheBy(InstanceScope.Singleton).TheDefaultIsConcreteType<UnifiedMessenger>();
        x.ForRequestedType<IContactService>().CacheBy(InstanceScope.Singleton).TheDefaultIsConcreteType<UnifiedMessenger>();
        x.ForRequestedType<IServiceProvider>().TheDefaultIsConcreteType<ServiceProvider>();
        x.ForRequestedType<IPluginFinder>().TheDefaultIsConcreteType<PluginFinder>();
        x.ForRequestedType<ICodePaste>().TheDefaultIsConcreteType<CodePasteToUrlService>();
        x.ForRequestedType<ICredentialCache>().CacheBy(InstanceScope.Singleton).TheDefaultIsConcreteType<MessagingAccountCredentialRepository>();
        x.ForRequestedType<IInteractionContext>().TheDefaultIsConcreteType<SecurityDialogService>();
        x.ForRequestedType<IMessageFormatter>().TheDefaultIsConcreteType<AutoFormatCodePastesAsUrls>();
        x.ForRequestedType<IFiber>().TheDefault.IsThis(fiber);
    });
}
 
On top of that, even if it did read exactly the same: if I am only using the container for dependency injection, and it adds nothing further to the understanding of the system (and I can't believe you would argue otherwise), then it's adding superfluous complexity to my project.
 
Having a Conversation with Your Code
 
There may come a time when you try to employ these ideas on a system you're building and it seems too difficult to get this working. Like writing a book, technical manual, blog post, etc., this technique is an art. The technique itself is not the problem, except of course when it is. To know for sure you need to have a dialog with yourself. Look for the root cause of your difficulties; they'll usually align with places where you've bucked the cornerstone OO principles (encapsulation, cohesion, and polymorphism). If, in reading this article, you wonder about those times when you have an object with 12 parameters that is instantiated inside some other object... ask yourself why you're doing this first, and don't just answer with "Because it was the simplest, easiest thing to do." Simple is not equivalent to the least amount of work. Think in terms of root cause analysis and solve, or at least move towards solving, the root cause.
 
In this case, IoC containers never solved the root cause of why my objects were so difficult to instantiate and use. IoC containers never helped me to make a system that better communicated my intent. IoC containers did however help me create a composite application. So I will continue using them to that aim and cease using them for most others.
 
Good luck with this! Tell me your thoughts and if you think I'm stark raving mad by leaving a comment or shooting me an email. Thanks!


]]>
Tue, 25 May 2010 23:38:00 -0700 Message Oriented Object Design and James Shore's Challenge http://codelikebozo.com/message-oriented-object-design-and-james-shor http://codelikebozo.com/message-oriented-object-design-and-james-shor

James Shore posted an architectural challenge this week on his blog and personally threw his gauntlet in my face to answer the challenge using this message oriented design stuff I've been ranting and raving about. Of course when I say "threw his gauntlet in my face" I really mean he said it might be interesting to see... BUT STILL! A man can't back down from that! ;)

 

You can read about his challenge in detail here: http://jamesshore.com/Blog/Architectural-Design-Challenge.html
 
To sum it up, the idea is to build a ROT13 file encoder, TDD'd and beautiful. There were two parts to his challenge. The first was just to get everything done by reading the whole file into memory, and then to refine our design around this idea. Once that was done we could move on to part two of the challenge, which required us to process the file as we load it off of disk and save it back to disk incrementally.
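For readers who haven't seen the challenge, here is a minimal JavaScript sketch of its core in the message-oriented style this post is about (my own solution below is C#). A source pushes one line at a time to an encoder, which pushes the encoded line on to whoever subscribed, so nothing asks another object for data and the whole file never needs to sit in memory. All names are hypothetical; `LineSource` stands in for a real file reader.

```javascript
// ROT13: rotate each ASCII letter 13 places, preserving case; leave the rest alone.
function rot13(text) {
  return text.replace(/[a-zA-Z]/g, function (ch) {
    var base = ch <= 'Z' ? 65 : 97; // char code of 'A' or 'a'
    return String.fromCharCode((ch.charCodeAt(0) - base + 13) % 26 + base);
  });
}

// Encoder actor: told about new text, it tells its subscriber about encoded text.
function Rot13Encoder() { this._subscriber = null; }
Rot13Encoder.prototype.on_new_encoded_text_available_notify = function (subscriber) {
  this._subscriber = subscriber;
};
Rot13Encoder.prototype.new_text_available = function (line) {
  this._subscriber.new_text_available(rot13(line));
};

// Hypothetical stand-in for a line-at-a-time file reader: pushes lines from an array.
function LineSource(lines) { this._lines = lines; this._subscriber = null; }
LineSource.prototype.on_new_text_available_notify = function (subscriber) {
  this._subscriber = subscriber;
};
LineSource.prototype.read = function () {
  for (var i = 0; i < this._lines.length; i++) {
    this._subscriber.new_text_available(this._lines[i]);
  }
};
```

Wiring is then just a few "notify" calls followed by `source.read()`, which is where the system comes to life.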
 
What did I learn from this experience? I'm extremely happy with the flexibility and robustness of the designs I get when I approach things from a message oriented point of view. There are times where it's too much (there are no absolute laws, right?), but for any system of real complexity that I work on, it is a great guiding hand for me.
 
 
To get a general idea of what I did I've provided the way I connected my objects together below:
public class ROT13EncodingFileWriter
{
    public static void Do(ITextHandOff guiWriter, string fromFile, string toFile)
    {
        var configuration = new FileSystemConfiguration();
        var encoder = new ROT13Encoding();
        var fileReader = new OneLineAtATimeFileReader();
        var fileWriter = new OneLineAtATimeFileWriter();
        var encodedTextSubscribers = new[]
                                         {
                                             guiWriter,
                                             fileWriter
                                         }.CreateMultiObserver();

        configuration.SetFileReaderToConfigure(fileReader);
        configuration.SetFileWriterToConfigure(fileWriter);
        fileReader.OnNewTextAvailableNotify(encoder);
        encoder.OnNewEncodedTextAvailableNotify(encodedTextSubscribers);

        configuration.Configure();

        fileReader.SetFilePath(fromFile);
        fileWriter.SetFilePath(toFile);

        fileReader.Read();
    }
}
 
First my mistakes:
 
1) This part is confusing. I'm basically just creating an object that will forward every message it receives to both other objects but it isn't executed well:
var encodedTextSubscribers = new[]
{
   guiWriter,
   fileWriter
}.CreateMultiObserver();
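One way to make that forwarding object read more plainly is to give it a name that says what it does. Here is a hypothetical JavaScript sketch; the post doesn't show how the C# CreateMultiObserver() is implemented.

```javascript
// Hypothetical analogue of CreateMultiObserver(): an object whose name says
// it forwards every message it receives to each of its subscribers.
function ForwardToAll(subscribers) {
  this._subscribers = subscribers.slice(); // copy so later changes don't leak in
}
ForwardToAll.prototype.new_text_available = function (text) {
  this._subscribers.forEach(function (subscriber) {
    subscriber.new_text_available(text);
  });
};
```

The wire-up would then read something like `new ForwardToAll([guiWriter, fileWriter])`, which says exactly what happens to each message.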
 
2) Instead of having a separate configuration command, the things I wanted configured should have just been configured on the fly. Setting them up to be configured and then calling for configuration to occur seems way too meh.
configuration.SetFileReaderToConfigure(fileReader);
configuration.SetFileWriterToConfigure(fileWriter);
...
configuration.Configure();
 
3) The line where I call fileReader.Read(); is where the whole system comes to life, but I fear that's not obvious.
 
Now what I like:
 
1) Whenever I create a message oriented design, I can discuss the whole system by pointing to the place where I configure my dependencies. The overall system flow may not be perfectly digestible, but if one were to try to create a flow chart from this configuration they would find it very easy (I have, and it lends itself well to presentations ;) ). Also, in my experience, instead of needing a call graph that shows how objects talk to one another, the same ideas fall out of the view of how the objects depend on one another.
 
2) Whenever I run into too much pain with this approach it's a smell I did something wrong. Case in point: While working on part I of the challenge, I had begun to write and test a class that was essentially going to orchestrate all of the other classes together on top of the class which configures which objects talk to one another. Essentially I was building a router. The pain for me was that I was creating WAY too many fakes and needing to care WAY too much about what they were doing. So I took a step back and drew my objects on a piece of paper and then reconnected them per the new design I drew. There were hardly any code changes necessary and it was pretty short work.
 
3) I tend to write tiny objects. Some people hate having too many objects or objects that don't do much so your tastes may vary. I've found that smaller more focused classes help me however. When they encompass literally only a single responsibility I find them to be easier to replace/modify when they no longer fit my needs and I only need to mock when I absolutely need to.
 
If you haven't been talking with me or reading what I've been writing about Message Oriented Object Design, here's a brief summary:
 
Message Oriented Object Design is an object oriented design philosophy wherein we view objects as sending immutable messages/publishing events on channels. MOOD systems also rely on the configuration of object networks to enable collaboration between them. A core tenet is the lack of inter-object getters (be it method or property calls).
 
4) I like how little code there is in my console application.
namespace ConsoleGUI
{
    public class Program
    {
        public static void Main(string[] args)
        {
            var guiWriter = new GuiWriter();

            ROT13EncodingFileWriter.Do(guiWriter, args[0], args[1]);
        }
    }
}
 
That's it! Leave me a comment if you want to lend your own critique of what I've done. I also encourage you to head over to James' site and throw your own hat in the ring and critique other people's designs (be harder on the other designs though of course!).
 
Till next time.


]]>
Wed, 12 May 2010 22:50:00 -0700 Message Oriented Object Design and Machine Learning in Javascript http://codelikebozo.com/message-oriented-programming-and-machine-lear http://codelikebozo.com/message-oriented-programming-and-machine-lear
This article will show how to use Message Oriented Object Design (not unlike Message Oriented Programming aka MOP or Actor Model) to model your user interface as an actor and handle some more complex processing while updating the user interface. Specifically, the sample code implements a simple machine learning exercise wherein you enter any character on your keyboard and the program attempts to guess what you chose (without cheating ;).
 
First what is Message Oriented Object Design (henceforth referred to as MOOD)? Message Oriented Object Design is an object oriented design philosophy wherein we view objects as sending immutable messages/publishing events on channels. MOOD systems also rely on the configuration of object networks to enable collaboration between them. A core tenet is the lack of inter-object getters (be it method or property calls). Since I wrote this example in Javascript and it has no inherent support for this concept all of the ideals of MOOD will need to be enforced by convention. Message Oriented Object Design is a term I made up. I'm not sure that it's sufficiently different from Message Oriented Programming or Message Based Programming to warrant existence but I also don't want to sully those terms with my own ideas if there are important subtleties I'm missing.
 
I'm in the process of writing a very in depth article on Message Oriented Object Design so if you want to know more just let me know and I'll contact you when it's available. For now, suffice it to say that the words object and actor are interchangeable as are the words message, method, or event.
 
The problem we're going to solve is this: given a text box where a user can enter any character literal, we will create a system that uses that information to predict the user's next entry and also updates the web page with our stats on how we're doing. Because we're using the MOOD philosophy, there will be no getters between objects (using them on private methods is perfectly acceptable though).
 
To get started I wrote the following very simple javascript object to represent the user interface:
 
var user_actor = function(guess_dom_target, accuracy_dom_target)
{
  this._target_element = guess_dom_target;
  this._accuracy_target_element = accuracy_dom_target;
};
user_actor.prototype.send_guesses_to = function(channel)
{
  this._guess_channel = channel;
};
user_actor.prototype.value_entered = function(value)
{
  this._guess_channel.next(value);
};
user_actor.prototype.previously_guessed_value_updated = function(guess_value)
{
  this._target_element.html(guess_value);
};
user_actor.prototype.accuracy_updated = function(accuracy_value)
{
  this._accuracy_target_element.html(accuracy_value * 100);
};
 
One of the key ideas that makes MOOD so powerful is that it views your user interface as just another MOOD object (basically as an actor). This means that all of the UI eventing that can be so troublesome finds a home here. The idea of asynchronous actions will be built into all of our objects, so even as we switch contexts to work on the machine learning portion of the system, the overall object design will look very familiar.
 
Here you can also see the concept of a channel in my objects. In MOOD (and Message Oriented Programming) we always assume that we're using a channel which will pass our message along to the correct object. So while we will end up passing an object reference as the channel, this assumption forces us to view our code as an isolated object unaware of how its method calls will affect others. This enables an extremely clean separation of concerns (SRP) and makes it easier for us to spot when we violate SRP. How? Look at the semantic meaning of the code in the object. Does any of it seem out of place for an object that's managing the type of UI we are? Why isn't there any knowledge of the learning that this program will be doing? Think about this as we continue.
 
Once I had this code written, I wrote some quick test code just to make sure it was outputting the correct values to the correct spots in the HTML. I'll leave writing that code as an exercise to the reader as it is fairly trivial.
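If you'd rather not do the exercise from scratch, here's one minimal way to check user_actor without a browser. FakeElement, RecordingChannel, and verify_user_actor are hypothetical test helpers; the fake element only implements the html() setter that user_actor calls.

```javascript
// Fake stand-in for a jQuery-wrapped element: records what html() was given.
function FakeElement() { this.contents = null; }
FakeElement.prototype.html = function (value) { this.contents = value; };

// Channel double: records messages instead of delivering them.
function RecordingChannel() { this.received = []; }
RecordingChannel.prototype.next = function (value) { this.received.push(value); };

// Exercises any constructor shaped like user_actor; throws on the first failure.
function verify_user_actor(UserActor) {
  var guess_element = new FakeElement();
  var accuracy_element = new FakeElement();
  var channel = new RecordingChannel();
  var actor = new UserActor(guess_element, accuracy_element);
  actor.send_guesses_to(channel);

  actor.value_entered('q');
  if (channel.received.length !== 1 || channel.received[0] !== 'q')
    throw new Error('value_entered should forward the value on the guess channel');

  actor.previously_guessed_value_updated('x');
  if (guess_element.contents !== 'x')
    throw new Error('the previous guess should be written to the guess element');

  actor.accuracy_updated(0.5);
  if (accuracy_element.contents !== 50)
    throw new Error('accuracy should be displayed as a percentage');
  return true;
}
```

Calling `verify_user_actor(user_actor)` with the user_actor defined above should return true.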
 
Next up, I iterated on the actor that manages the learning task. While the latest code uses a Markov chain to learn the user's patterns, I started incrementally by just having it guess "yesterday's weather" (i.e. use the current input as our prediction of the next input). This is the completed implementation:
 
var learning_actor = function()
{
  this._markov_chain = {};
  this._guessed_value = "";
  this._previous_value = "";
};
learning_actor.prototype.set_guess_channel_to = function(channel)
{
  this._guess_channel = channel;
};
learning_actor.prototype._make_best_guess = function(current_value)
{
  var value_to_guess = current_value;
  var score_to_beat = -1;
  
  var guess_list = this._markov_chain[current_value];
  for(var previously_guessed_value in guess_list)
  {
    var score = guess_list[previously_guessed_value];
    if(score > score_to_beat)
    {
      value_to_guess = previously_guessed_value;
      score_to_beat = score;
    }
  }
  
  return value_to_guess;
};
learning_actor.prototype._learn_from_new_information = function(previous_value, current_value)
{
  if(this._markov_chain[previous_value] == null)
  {
    this._markov_chain[previous_value] = {};
    this._markov_chain[previous_value][current_value] = 0;
  }
  
  if(isNaN(this._markov_chain[previous_value][current_value]))
    this._markov_chain[previous_value][current_value] = 0;
    
  this._markov_chain[previous_value][current_value] = this._markov_chain[previous_value][current_value] + 1;
};
learning_actor.prototype.next = function(value)
{
  this._guess_channel.guessed(this._guessed_value, value);

  this._learn_from_new_information(this._previous_value, value);
  this._guessed_value = this._make_best_guess(value);
  this._previous_value = value;
};
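For comparison, the "yesterday's weather" starting point might have looked something like the sketch below. This is a hypothetical reconstruction (yesterdays_weather_actor is an illustrative name, not from the original code). It answers the same next and set_guess_channel_to messages, so under that assumption swapping in the Markov chain version wouldn't require changing anything else in the network.

```javascript
// Hypothetical first iteration of the learner: it predicts that the next
// value will simply repeat the current one ("yesterday's weather").
var yesterdays_weather_actor = function()
{
  this._guessed_value = "";
};
yesterdays_weather_actor.prototype.set_guess_channel_to = function(channel)
{
  this._guess_channel = channel;
};
yesterdays_weather_actor.prototype.next = function(value)
{
  // Report the previous guess alongside the value that actually arrived...
  this._guess_channel.guessed(this._guessed_value, value);
  // ...then guess that the next value will be the same as this one.
  this._guessed_value = value;
};

// Usage with a stand-in channel that records each (guess, actual) pair.
var results = [];
var learner = new yesterdays_weather_actor();
learner.set_guess_channel_to({
  guessed: function(my_guess, correct_guess) { results.push([my_guess, correct_guess]); }
});
["a", "a", "b"].forEach(function(value) { learner.next(value); });
// results: [["", "a"], ["a", "a"], ["a", "b"]]
```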
 
As a refresher, the Markov chain as I've implemented it tells us which value is most likely to be entered next given the previous value. I won't go into the implementation details, but the code is fairly concise and hopefully legible enough to decipher.
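To make that concrete outside the actor, here is a standalone sketch of the same counting scheme: for each value, count which values have followed it, then guess the most frequent follower. The function names are hypothetical; the fallback of repeating a value that has never been seen mirrors _make_best_guess above.

```javascript
// counts[previous][current] = how many times `current` followed `previous`.
// (Hypothetical helper, not part of the original page.)
function build_transition_counts(sequence)
{
  var counts = {};
  for (var i = 1; i < sequence.length; i++)
  {
    var previous = sequence[i - 1];
    var current = sequence[i];
    counts[previous] = counts[previous] || {};
    counts[previous][current] = (counts[previous][current] || 0) + 1;
  }
  return counts;
}

// Guess the most frequent follower of `current`; if `current` has never
// been seen, guess that it repeats (same fallback as _make_best_guess).
function most_likely_next(counts, current)
{
  var best = current;
  var best_count = -1;
  var followers = counts[current] || {};
  for (var candidate in followers)
  {
    if (followers[candidate] > best_count)
    {
      best = candidate;
      best_count = followers[candidate];
    }
  }
  return best;
}

var counts = build_transition_counts("abcabca");
// counts: { a: { b: 2 }, b: { c: 2 }, c: { a: 2 } }
console.log(most_likely_next(counts, "c")); // a
console.log(most_likely_next(counts, "z")); // z (never seen, so repeat it)
```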
 
The learning actor has just a few main parts:
  • The next(value) message, which is passed the value that the user entered.
  • The _learn_from_new_information(previous_value, current_value) method, which trains our Markov chain.
  • The _make_best_guess(current_value) method, which uses our trained Markov chain to make an educated guess about the user's next entry.
  • Last but not least, a simple set_guess_channel_to(channel) message that tells the actor where to publish what we guessed and what the right answer actually was.
Initially, I had written the code that now lives in the scoreboard actor as part of the learning actor. Here's that code, now in its own actor:
 
var scoreboard_actor = function()
{
  this._values_entered_count = 0;
  this._correctly_guessed_value_count = 0;
};
scoreboard_actor.prototype.set_display_channel_to = function(channel)
{
  this._display_channel = channel;
};
scoreboard_actor.prototype.guessed = function(my_guess, correct_guess)
{
  this._values_entered_count = this._values_entered_count + 1;
  
  if(my_guess == correct_guess)
  {
    this._correctly_guessed_value_count = this._correctly_guessed_value_count + 1;
  }
  
  this._display_channel.accuracy_updated(this._correctly_guessed_value_count / this._values_entered_count);
  this._display_channel.previously_guessed_value_updated(my_guess);
};
 
You can see it's fairly simple, and for that reason I was hesitant to move it to a new class. As you get started with this style, you will feel this hesitation quite often. I recommend fighting through the pain until you come upon the first "major" refactoring you need to do. I guarantee the ease with which you'll be able to make that change will astound you, and you'll be hooked. Another reason I hesitated to move this out of my learning actor is that I assumed I would be duplicating the concept of "the previous value must equal the last". Since MOOD doesn't allow for getters, I knew that the only way I could have shared that logic would be copy 'n' paste reuse (read: ewww). Look at the algorithm left over in the learning actor, though. It never cares whether or not we guessed right. It only tracks the values entered and makes a hypothesis about them. So if guess checking wasn't a concern of the learning algorithm, why did I have it there to begin with? I simply wanted to display a scoreboard. Hence the creation of my scoreboard actor.
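Because the scoreboard only ever talks to its display channel, it can be exercised with no browser at all. A minimal sketch, using a hypothetical recording_display object in place of the user actor (the scoreboard_actor code is repeated from above so the snippet runs on its own):

```javascript
// scoreboard_actor, as listed above.
var scoreboard_actor = function()
{
  this._values_entered_count = 0;
  this._correctly_guessed_value_count = 0;
};
scoreboard_actor.prototype.set_display_channel_to = function(channel)
{
  this._display_channel = channel;
};
scoreboard_actor.prototype.guessed = function(my_guess, correct_guess)
{
  this._values_entered_count = this._values_entered_count + 1;
  if(my_guess == correct_guess)
  {
    this._correctly_guessed_value_count = this._correctly_guessed_value_count + 1;
  }
  this._display_channel.accuracy_updated(this._correctly_guessed_value_count / this._values_entered_count);
  this._display_channel.previously_guessed_value_updated(my_guess);
};

// Hypothetical display channel that records messages instead of touching the DOM.
var recording_display = {
  accuracies: [],
  guesses: [],
  accuracy_updated: function(accuracy) { this.accuracies.push(accuracy); },
  previously_guessed_value_updated: function(guess) { this.guesses.push(guess); }
};

var scoreboard = new scoreboard_actor();
scoreboard.set_display_channel_to(recording_display);
scoreboard.guessed("a", "a"); // a correct guess
scoreboard.guessed("a", "b"); // a wrong guess

console.log(recording_display.accuracies); // [ 1, 0.5 ]
console.log(recording_display.guesses);    // [ 'a', 'a' ]
```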
 
We've got all of these objects, but what do we do with them? The configuration of our objects is referred to as the network configuration. This is essentially just a different flavor of dependency injection. The difference is that your configuration can be factored away from the rest of your code and isolated if you so choose. Here's the object network configuration for this code:
 
var computer_guess_display = $('#computer_guess');
var computer_guess_accuracy_display = $('#computer_guess_accuracy');

my_user = new user_actor(computer_guess_display, computer_guess_accuracy_display);
var my_learner = new learning_actor();
var my_scoreboard = new scoreboard_actor();

my_user.send_guesses_to(my_learner);
my_learner.set_guess_channel_to(my_scoreboard);
my_scoreboard.set_display_channel_to(my_user);
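Since the configuration is just a few message sends, it can be pulled into its own function and checked with stand-ins. A minimal sketch; configure_guessing_network and the stub objects are hypothetical names, not part of the original page:

```javascript
// Hypothetical: the network configuration isolated in one function.
function configure_guessing_network(user, learner, scoreboard)
{
  user.send_guesses_to(learner);            // user input flows to the learner
  learner.set_guess_channel_to(scoreboard); // guesses flow to the scoreboard
  scoreboard.set_display_channel_to(user);  // results flow back to the UI
}

// Stand-ins that only remember which channel they were handed.
var user = { send_guesses_to: function(channel) { this.guesses_go_to = channel; } };
var learner = { set_guess_channel_to: function(channel) { this.guesses_go_to = channel; } };
var scoreboard = { set_display_channel_to: function(channel) { this.display_goes_to = channel; } };

configure_guessing_network(user, learner, scoreboard);
console.log(user.guesses_go_to === learner,
            learner.guesses_go_to === scoreboard,
            scoreboard.display_goes_to === user); // true true true
```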
 
The first thing that should stand out to you is that we are making no attempt to make our objects immutable. In MOOD, just like in the Actor Model, we are guaranteed that an actor will only ever be used from the context of a single thread throughout its lifetime. This might seem like a poor constraint, so here is why it's not. Imagine the learning actor gets some VERY complex logic. That isn't a stretch, depending on how accurate you want the guesses to be. If you had written this code without using this style and hadn't explicitly designed for asynchronicity, what might happen? The first time that learning actor needs to really think, your UI will freeze up. Because we wrote this using Message Oriented Object Design, however, we can throw that logic _anywhere_ and it won't block our UI. What do I mean by anywhere? I mean we could literally host it on a web service, and instead of implementing our actor in the HTML page we could have an actor that was responsible for interacting with the web service. Someday, if JavaScript gets threads, we could even throw the extra work onto a thread and create a channel object to manage the threading context on the passed messages. The rest of our code wouldn't change in either case. If you'd like an actual example, leave me a comment to that effect; for now this seems easy enough to see, especially once someone has pointed it out. In the meantime, if you've been thinking that Message Oriented Object Design is a lot of extra work for pedantic, self-indulgent programmers, think about whether or not your code could do that.
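As a rough sketch of the channel idea in single-threaded JavaScript, a channel object can queue messages instead of delivering them synchronously. The name deferred_channel, its constructor arguments, and the use of setTimeout as the default scheduler are all assumptions for illustration. Deferring only changes when the receiver runs, not which thread it runs on, so truly heavy work would still need to move somewhere else (such as the web service mentioned above); the point is that the senders don't change.

```javascript
// Hypothetical channel that schedules delivery of the named messages
// instead of calling the target immediately. By default it uses
// setTimeout(fn, 0), so the sender returns right away and the receiver
// runs in a later turn of the event loop (still on the same thread).
var deferred_channel = function(target, message_names, schedule)
{
  var run_later = schedule || function(fn) { setTimeout(fn, 0); };
  var self = this;
  message_names.forEach(function(name)
  {
    self[name] = function()
    {
      var args = Array.prototype.slice.call(arguments);
      run_later(function() { target[name].apply(target, args); });
    };
  });
};

// Usage: a stand-in learner plus a manual scheduler, so we can observe
// that sending a message does not run the receiver synchronously.
var received = [];
var stand_in_learner = { next: function(value) { received.push(value); } };
var pending = [];
var channel = new deferred_channel(stand_in_learner, ["next"], function(fn) { pending.push(fn); });

channel.next("a");                           // returns immediately
var delivered_before_run = received.length;  // nothing delivered yet
pending.shift()();                           // the scheduler delivers it later
console.log(delivered_before_run, received); // 0 [ 'a' ]
```

Wiring it into the page would be a one-line change to the network configuration, e.g. my_user.send_guesses_to(new deferred_channel(my_learner, ["next"])); none of the actors would need to know.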
 
Oh yeah. Also, notice that there is only one VERY thin object in all of that code that has anything to do with the DOM. The rest is trivially unit testable. And not just testable in a small way, but testable as in only the object under test will be exercised. I didn't TDD this code. That's just the way the MOOD pulls me.
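As an illustration of that isolation, here is a sketch of a test for learning_actor on its own. The actor is copied from the full page source below; the recording channel standing in for the scoreboard and the "abcabcabc" input are illustrative assumptions.

```javascript
// learning_actor, copied from the full page source.
var learning_actor = function()
{
  this._markov_chain = {};
  this._guessed_value = "";
  this._previous_value = "";
};
learning_actor.prototype.set_guess_channel_to = function(channel)
{
  this._guess_channel = channel;
};
learning_actor.prototype._make_best_guess = function(current_value)
{
  var value_to_guess = current_value;
  var score_to_beat = -1;
  var guess_list = this._markov_chain[current_value];
  for(var previously_guessed_value in guess_list)
  {
    var score = guess_list[previously_guessed_value];
    if(score > score_to_beat)
    {
      value_to_guess = previously_guessed_value;
      score_to_beat = score;
    }
  }
  return value_to_guess;
};
learning_actor.prototype._learn_from_new_information = function(previous_value, current_value)
{
  if(this._markov_chain[previous_value] == null)
  {
    this._markov_chain[previous_value] = {};
    this._markov_chain[previous_value][current_value] = 0;
  }
  var likely_values_based_on_prev_value = this._markov_chain[previous_value];
  if(isNaN(likely_values_based_on_prev_value[current_value]))
    likely_values_based_on_prev_value[current_value] = 0;
  likely_values_based_on_prev_value[current_value] = likely_values_based_on_prev_value[current_value] + 1;
};
learning_actor.prototype.next = function(value)
{
  this._guess_channel.guessed(this._guessed_value, value);
  this._learn_from_new_information(this._previous_value, value);
  this._guessed_value = this._make_best_guess(value);
  this._previous_value = value;
};

// Hypothetical recording channel in place of the scoreboard: it only
// notes whether each guess matched the value actually entered.
var hits = [];
var learner = new learning_actor();
learner.set_guess_channel_to({
  guessed: function(my_guess, correct_guess) { hits.push(my_guess === correct_guess); }
});
"abcabcabc".split("").forEach(function(value) { learner.next(value); });

// The first four guesses miss while the chain is learning; every guess
// after that is right.
console.log(hits); // [ false, false, false, false, true, true, true, true, true ]
```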
 
Also, I apologize in advance for my horrible naming of methods and objects. Hopefully it still gets the point across.
 
That's it! Go ahead and try it; it's pretty neat. When I just "randomly" pressed keys on the keyboard, the code guessed correctly 40% of the time or so. Not bad at all! It also picks up regular patterns like "abcabcabc" pretty quickly, and you'll see the code try to follow you if you do something like "aaaabababaaaababababab". Of course, like all learning agents, the more random the string you enter, the worse the agent will perform.
 
The full HTML source code is here for you to download and try. It does require you to include jQuery for it to work. Leave a comment if you have any questions! :)
 

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
  <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
  <title>LearningU.js</title>
  <script type="text/javascript" src="jquery.js"></script>
  <script type="text/javascript">
    var my_user = null;
    $(document).ready(function(){
    // Start user_actor
      var user_actor = function(guess_dom_target, accuracy_dom_target)
      {
        this._target_element = guess_dom_target;
        this._accuracy_target_element = accuracy_dom_target;
      };
      user_actor.prototype.send_guesses_to = function(channel)
      {
        this._guess_channel = channel;
      }
      user_actor.prototype.value_entered = function(value)
      {
        this._guess_channel.next(value);
      };
      user_actor.prototype.previously_guessed_value_updated = function(guess_value)
      {
        this._target_element.html(guess_value);
      };
      user_actor.prototype.accuracy_updated = function(accuracy_value)
      {
        this._accuracy_target_element.html(accuracy_value * 100);
      };
      
    // Start learning_actor
      var learning_actor = function()
      {
        this._markov_chain = {};
        this._guessed_value = "";
        this._previous_value = "";
      };
      learning_actor.prototype.set_guess_channel_to = function(channel)
      {
        this._guess_channel = channel;
      };
      learning_actor.prototype._make_best_guess = function(current_value)
      {
        var value_to_guess = current_value;
        var score_to_beat = -1;
        
        var guess_list = this._markov_chain[current_value];
        for(var previously_guessed_value in guess_list)
        {
          var score = guess_list[previously_guessed_value];
          if(score > score_to_beat)
          {
            value_to_guess = previously_guessed_value;
            score_to_beat = score;
          }
        }
        
        return value_to_guess;
      };
      learning_actor.prototype._learn_from_new_information = function(previous_value, current_value)
      {
        if(this._markov_chain[previous_value] == null)
        {
          this._markov_chain[previous_value] = {};
          this._markov_chain[previous_value][current_value] = 0;
        }
        
        var likely_values_based_on_prev_value = this._markov_chain[previous_value];
        
        if(isNaN(likely_values_based_on_prev_value[current_value]))
          likely_values_based_on_prev_value[current_value] = 0;
          
        likely_values_based_on_prev_value[current_value] = likely_values_based_on_prev_value[current_value] + 1;
      };
      learning_actor.prototype.next = function(value)
      {
        this._guess_channel.guessed(this._guessed_value, value);
      
        this._learn_from_new_information(this._previous_value, value);
        this._guessed_value = this._make_best_guess(value);
        this._previous_value = value;
      };
      
    // Start scoreboard_actor
      var scoreboard_actor = function()
      {
        this._values_entered_count = 0;
        this._correctly_guessed_value_count = 0;
      };
      scoreboard_actor.prototype.set_display_channel_to = function(channel)
      {
        this._display_channel = channel;
      };
      scoreboard_actor.prototype.guessed = function(my_guess, correct_guess)
      {
        this._values_entered_count = this._values_entered_count + 1;
        
        if(my_guess == correct_guess)
        {
          this._correctly_guessed_value_count = this._correctly_guessed_value_count + 1;
        }
        
        var guess_accuracy = this._correctly_guessed_value_count / this._values_entered_count;
        this._display_channel.accuracy_updated(guess_accuracy);
        this._display_channel.previously_guessed_value_updated(my_guess);
      };
      
      //Create actors
      // my_user needs to be global so UI can use it.
      var computer_guess_display = $('#computer_guess');
      var computer_guess_accuracy_display = $('#computer_guess_accuracy');
      
      my_user = new user_actor(computer_guess_display, computer_guess_accuracy_display);
      var my_learner = new learning_actor();
      var my_scoreboard = new scoreboard_actor();
      
      my_user.send_guesses_to(my_learner);
      my_learner.set_guess_channel_to(my_scoreboard);
      my_scoreboard.set_display_channel_to(my_user);
    });
  </script>
</head>
<body>
  <div>
    You: <input name="human_value" value="" onclick="this.select();" onfocus="this.select();" maxlength="1" onkeyup="my_user.value_entered(this.value); this.select();" />
  </div>
  <div>
    My Guess: <span id="computer_guess"></span>
  </div>
  <div>
    My Accuracy: <span id="computer_guess_accuracy">0</span>%
  </div>
</body>
</html>


http://files.posterous.com/user_profile_pics/94091/MyMugShot.jpg http://posterous.com/users/15YhR1s088x Justin Bozonier darkxanthos Justin Bozonier
Thu, 01 Apr 2010 19:47:00 -0700 Katas for Practicing Refactoring http://codelikebozo.com/katas-for-practicing-refactoring http://codelikebozo.com/katas-for-practicing-refactoring

Problem

I've tried to do the various TDD katas found lying around on the web (not dissimilar to those found here: http://codekata.pragprog.com/) and found I have a couple of issues with most of them. The first has to do with what I perceive a kata to be.

A kata is a choreographed set of movements that are practiced ad-tedium so that they come almost instinctively, one after another. They are very narrow and focused. There is no real problem solving to be found in them.

In completing some of the various katas I've found on the net, I've noticed they are more complex than I'd like, but also, in the end, I don't really know that I've done it right. It's extremely easy for me to get off track. I also never know why I'm doing it. What am I learning? Am I learning?

It's important to note that I'm not saying the TDD katas out there are broken. If they work for you, I don't recommend you stop doing them, although you may still want to give these a try. I'm really just saying that they are broken for me.

Solution

In response to my experiences I came up with the idea of practicing kata forms based on Martin Fowler's book, Refactoring. Almost all of the refactorings in this book are extremely simple and the kata itself implies the "solution" you should arrive at at the end. We do refactorings numerous times daily. A lot of times it's with the help of tools, but there are plenty the tools don't cover. Simultaneously, there are times when your favorite tool is broken (COUGH Resharper 5 Beta COUGH) and you may need to whip one out by hand. Also, there is no need to ponder how to model the domain of the kata. The sample has been TDD'd and all that's left is to refactor the tests and the sample code together into the direction described by Fowler's book. I have decided to keep my samples as close to Fowler's as possible (sometimes steering a little away to maintain my personal coding standards).

That gives us a bonus for free. The refactoring katas also serve as an excellent introduction to what makes TDD addictive: the safety net. While the kata is being refactored, the tests may require refactoring, but they shouldn't require massive changes. Everything a practitioner might do should already be covered. This means that if they make a misstep, the tests will usually catch it and guide them back onto the correct path.
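To give a feel for the shape of one of these katas, here is a small made-up example for Extract Method (it is not one of Fowler's samples and not taken from the GitHub project): a function that does two jobs, the same behaviour after extracting each job into a named function, and a tiny check that has to keep passing before and after the change.

```javascript
// Hypothetical kata starting point: one function that both sums the
// line items and applies a discount.
function invoice_total_before(line_items)
{
  var total = 0;
  for (var i = 0; i < line_items.length; i++)
  {
    total += line_items[i].price * line_items[i].quantity;
  }
  // orders over 100 get 10 off
  if (total > 100)
  {
    total = total - 10;
  }
  return total;
}

// After Extract Method: each step has a name; behaviour is unchanged.
function subtotal(line_items)
{
  var total = 0;
  for (var i = 0; i < line_items.length; i++)
  {
    total += line_items[i].price * line_items[i].quantity;
  }
  return total;
}
function apply_bulk_discount(amount)
{
  return amount > 100 ? amount - 10 : amount;
}
function invoice_total_after(line_items)
{
  return apply_bulk_discount(subtotal(line_items));
}

// The safety net: the same checks must keep passing at every step.
function passes_checks(invoice_total)
{
  var small_order = [{ price: 10, quantity: 2 }]; // 20, no discount
  var large_order = [{ price: 50, quantity: 3 }]; // 150, minus 10
  return invoice_total(small_order) === 20 && invoice_total(large_order) === 140;
}
console.log(passes_checks(invoice_total_before), passes_checks(invoice_total_after)); // true true
```

In a real kata the practitioner would make the change in small steps, re-running the check after each one; the two versions sit side by side here only so both can be checked.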

Follow Along

I recommend using Refactoring as a guide to follow along with the refactorings and practicing them until they are a muscle reflex. This is the book to buy in case you're too lazy to Google it: http://rcm.amazon.com/e/cm?lt1=_blank&bc1=000000&IS2=1&bg1=FFFFFF&fc1=000000&lc1=0000FF&t=justibozon-20&o=1&p=8&l=as1&m=amazon&f=ifr&md=10FE9736YVPPT7A0FBG2&asins=0201485672

Contribute!

I have completed one refactoring and have half of another, larger one posted to my GitHub account. There are a lot of refactorings to cover and I could really use some help. I've done the hard part: getting the project started and adding a couple of my favorite refactorings. If you like this whole concept, do the same. :)

Where are they Hosted?


Sun, 14 Mar 2010 19:34:00 -0700 TDD-ing Concurrent Code http://codelikebozo.com/test-driven-development-of-a-message-based-sy http://codelikebozo.com/test-driven-development-of-a-message-based-sy
A Method for Modelling Concurrency
 

I'm prepping code for Code Camp Boise and Seattle and I thought I'd share some of the simple stuff I'm writing as I'm writing it to act as an introduction of sorts to the concepts.

 
I hear a lot of people say things like "Well, we made this process concurrent so now we can't test it." That always felt wrong to me. Over the past year or two, as I've been reading about threading, I've kept this in mind. Like any concern, concurrency is difficult to test if it isn't abstracted away from the code under test.
 

Testing concurrent software can be extremely difficult. While debugging, breakpoints can be seemingly randomly tripped by other threads that you don't care about, your data can change right under your nose whether or not you're paused, etc.
 
Another issue I hear is that synchronizing across threads is a pain. What happens if, after you verify that the object you want to use is in the appropriate state, some other thread changes it, and then it throws an exception when you use it? Race conditions like this can be extremely difficult to manage.
 
One way to handle this is to use a message-based, data-flow-oriented model. Why? First and foremost, because it lets you model your data dependencies and let the abstraction suss out the details. You gain this ability just by declaring a network of processes (which are essentially objects) as a directed graph. Now you've explicitly declared how these different processes will interact with one another, and since data flow programming passes immutable messages between them, you won't have to worry about any processes interfering with each other.
 
Yet another great thing about developing a data flow network is that you can test each process in isolation without it even having any knowledge of threading. That's what I will be talking about for the remainder of this post.
 
Some Context
 
A friend of mine needed a computer program that would go through a text file of over 10,000 lines of text (sometimes more) and find all of the valid email addresses. First, I came up with a quick description of the process as I saw it:
 
FileReadingAgent
Reads the file line by line and passes each line along to the ObviousEmailExtractionAgent, skipping blank lines.

ObviousEmailExtractionAgent
Extracts obviously good email addresses from each line and passes them on to the GoodEmailCollectionAgent.
Lines without obviously good email addresses (or with none at all) are passed to the NonObviousEmailExtractionAgent for further processing.

NonObviousEmailExtractionAgent
Uses more intelligent email extraction rules to find less obvious email addresses.
Passes any email addresses it finds on to the GoodEmailCollectionAgent.

GoodEmailCollectionAgent
Aggregates known good email addresses.
 
So to reiterate, all of these agents should be assumed to be running in their own threads. Also, they only communicate to one another via immutable messages. 
 
The FileReadingAgent would have a connection to the ObviousEmailExtractionAgent. The ObviousEmailExtractionAgent would have *two* connections. One to the GoodEmailCollectionAgent and another to the NonObviousEmailExtractionAgent.
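A rough, synchronous sketch of that network may help make the wiring concrete. It's written in JavaScript rather than C#, and everything in it is an assumption for illustration: the factory function names, passing an array of lines instead of a file path, and both email-matching rules (a simple user@host.tld regular expression for the "obvious" case and a spelled-out "name at host dot tld" pattern for the "non-obvious" case). The onNext message name mirrors the IObserver-style OnNext used in the C# code later in this post. In the real design each agent would run on its own thread; here every call is direct.

```javascript
// FileReadingAgent: forwards each non-blank line (an array of lines
// stands in for reading a file from a path).
function make_file_reading_agent(lines_channel)
{
  return {
    onNext: function(lines)
    {
      lines.filter(function(line) { return line.trim() !== ""; })
           .forEach(function(line) { lines_channel.onNext(line); });
    }
  };
}

// ObviousEmailExtractionAgent: sends plain user@host.tld matches on and
// hands lines with no obvious match to the non-obvious agent.
function make_obvious_email_extraction_agent(good_emails_channel, unclear_lines_channel)
{
  var obvious_pattern = /[\w.+-]+@[\w-]+(\.[\w-]+)+/g; // assumed rule
  return {
    onNext: function(line)
    {
      var found = line.match(obvious_pattern);
      if (found)
      {
        found.forEach(function(email) { good_emails_channel.onNext(email); });
      }
      else
      {
        unclear_lines_channel.onNext(line);
      }
    }
  };
}

// NonObviousEmailExtractionAgent: one made-up "smarter" rule for
// spelled-out addresses such as "name at host dot tld".
function make_non_obvious_email_extraction_agent(good_emails_channel)
{
  var spelled_out_pattern = /(\S+) at (\S+) dot (\S+)/; // assumed rule
  return {
    onNext: function(line)
    {
      var match = spelled_out_pattern.exec(line);
      if (match)
      {
        good_emails_channel.onNext(match[1] + "@" + match[2] + "." + match[3]);
      }
    }
  };
}

// GoodEmailCollectionAgent: aggregates every address it is sent.
function make_good_email_collection_agent()
{
  var emails = [];
  return {
    emails: emails,
    onNext: function(email) { emails.push(email); }
  };
}

// Wire the directed graph, from the end of the pipeline backwards.
var collector = make_good_email_collection_agent();
var non_obvious_agent = make_non_obvious_email_extraction_agent(collector);
var obvious_agent = make_obvious_email_extraction_agent(collector, non_obvious_agent);
var reader = make_file_reading_agent(obvious_agent);

reader.onNext(["contact: jane@example.com", "", "   ", "bob at example dot org", "no email here"]);
console.log(collector.emails); // [ 'jane@example.com', 'bob@example.org' ]
```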
 
In this post I'd like to share the tests that went into creating the FileReadingAgent. 
 
TDD-ing a Process
 
The first context I worked on assumed there was only one line of text in a file. This simple context just helps ensure that the plumbing for my agent is all hooked up. This is how I tested this context:
 
using System.Linq;
using EmailScraperAgentBehaviours.Agents.Fakes;
using EmailScraperNetwork.Actors;
using NUnit.Framework;

namespace Given_a_file.with_lines_of_text
{
    [TestFixture]
    public class When_the_agent_receives_the_file_path
    {
        [Test]
        public void It_should_send_out_a_message_for_each_line()
        {
            Assert.That(LinesOfTextChannel.ReceivedMessagesCount, Is.EqualTo(FileReaderWithOneNonBlankLineOfText.NonblankLineCount));
        }

        [Test]
        public void It_should_read_from_the_correct_file()
        {
            Assert.That(FileReader.FilePath, Is.EqualTo(ProvidedFilePath));
        }

        private void Context()
        {
            FileReader = new FileReaderWithOneNonBlankLineOfText();
        }

        private void Because()
        {
            It.OnNext(ProvidedFilePath);
        }

        [TestFixtureSetUp]
        public void Setup()
        {
            LinesOfTextChannel = new MessageCollectionChannel<string>();

            Context();

            It = new LineByLineFileReadingAgent(FileReader);
            It.ShouldSendLinesOfTextTo(LinesOfTextChannel);

            ProvidedFilePath = "c:/file_path";

            Because();
        }

        private LineByLineFileReadingAgent It;
        private MessageCollectionChannel<string> LinesOfTextChannel;
        private FileReaderWithOneNonBlankLineOfText FileReader;
        private string ProvidedFilePath;
    }
}

 
The next context I worked on contained blank text lines. I wanted to ensure that those lines of text didn't get passed on to my email finding agents.
 
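using System.Linq;
using EmailScraperAgentBehaviours.Agents.Fakes;
using EmailScraperNetwork.Actors;
using NUnit.Framework;

namespace Given_a_file.with_lines_of_text.and_some_blank_lines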
{
    [TestFixture]
    public class When_the_agent_receives_the_file_path
    {
        [Test]
        public void It_should_NOT_send_out_a_message_for_any_blank_line()
        {
            Assert.That(MessageChannel.ReceivedMessages.Any(x => x == ""), Is.False);
        }

        [Test]
        public void It_should_use_the_provided_file_path()
        {
            Assert.That(FileReader.ProvidedFilePath, Is.EqualTo(ProvidedFilePath));
        }

        [TestFixtureSetUp]
        public void Setup()
        {
            ProvidedFilePath = "filePath";
            MessageChannel = new MessageCollectionChannel<string>();
            FileReader = new FileReaderWithOneNonBlankLineAndMultipleBlankLines();
            It = new LineByLineFileReadingAgent(FileReader);
            It.ShouldSendLinesOfTextTo(MessageChannel);

            It.OnNext(ProvidedFilePath);
        }

        private FileReaderWithOneNonBlankLineAndMultipleBlankLines FileReader;
        private LineByLineFileReadingAgent It;
        private MessageCollectionChannel<string> MessageChannel;
        private string ProvidedFilePath;
    }
}
 
The final context has lines of text containing only whitespace characters and one line of text that is an email address. I wanted to ensure that only lines with some kind of text moved on to the agents that would actually try to parse out email addresses. In hindsight, this probably should have gone in the ObviousEmailExtractionAgent. It seems like the FileReadingAgent shouldn't really be concerned with this. I could probably just change the name of my FileReadingAgent to NonBlankLineReadingAgent and get by that way. ;)
 
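using System.Linq;
using EmailScraperAgentBehaviours.Agents.Fakes;
using EmailScraperNetwork.Actors;
using NUnit.Framework;

namespace Given_a_file.with_lines_of_text.and_some_lines_with_whitespace_characters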
{
    [TestFixture]
    public class When_the_agent_receives_the_file_path
    {
        [Test]
        public void It_should_send_a_message_for_only_the_line_with_text()
        {
            Assert.That(MessageChannel.ReceivedMessagesCount, Is.EqualTo(1));
        }

        [Test]
        public void It_should_NOT_send_out_a_message_for_any_blank_line()
        {
            Assert.That(MessageChannel.ReceivedMessages.Any(x => x.Trim() == ""), Is.False);
        }

        [Test]
        public void It_should_use_the_provided_file_path()
        {
            Assert.That(FileReader.ProvidedFilePath, Is.EqualTo(ProvidedFilePath));
        }

        [TestFixtureSetUp]
        public void Setup()
        {
            ProvidedFilePath = "filePath";
            MessageChannel = new MessageCollectionChannel<string>();
            FileReader = new FileReaderWithOneNonBlankLineAndMultipleWhitespaceLines();
            It = new LineByLineFileReadingAgent(FileReader);
            It.ShouldSendLinesOfTextTo(MessageChannel);

            It.OnNext(ProvidedFilePath);
        }

        private FileReaderWithOneNonBlankLineAndMultipleWhitespaceLines FileReader;
        private LineByLineFileReadingAgent It;
        private MessageCollectionChannel<string> MessageChannel;
        private string ProvidedFilePath;

    }
}
 
My "final" code doesn't handle disposal or anything, and it definitely should! That's an oversight on my part. Aside from that, this code should be pretty much complete:
 
using System.Linq;
using EmailScraperNetwork.BaseFramework;

namespace EmailScraperNetwork.Actors
{
    public class LineByLineFileReadingAgent : IObserver<string>
    {
        private readonly IEachableFile<string> FileReader;
        private IObserver<string> ChannelToSendNonBlankLinesOfTextTo;

        public LineByLineFileReadingAgent(IEachableFile<string> fileReader)
        {
            FileReader = fileReader;
        }

        public void ShouldSendLinesOfTextTo(IObserver<string> channel)
        {
            ChannelToSendNonBlankLinesOfTextTo = channel;
        }

        public void OnNext(string filePath)
        {
            var lines = FileReader.ReadFrom(filePath);

            foreach(var line in lines.Where(x=> x.Trim() != ""))
            {
                ChannelToSendNonBlankLinesOfTextTo.OnNext(line);
            }
        }
    }
}
 
Notice the use of the IObserver interface? I'm stealing a bit from the new .NET Reactive framework (an idea that I got from Robert Ream). By using the OnNext method I can make my network of agents push oriented rather than pull oriented. The benefits of this can be enumerated in another blog post. :)
 
How could I connect these to run synchronously? Super easy. This is how I could link the LineByLineFileReadingAgent to the ObviousEmailExtractionAgent:
 
lineByLineFileReader.ShouldSendLinesOfTextTo(obviousGoodEmailExtractionAgent);
 
Then to start I'd send the filepath I wanted to be processed to the lineByLineFileReader like so: 
 
lineByLineFileReader.OnNext("c:/myfile.txt");
 
Next time I'll show an overview of the whole application and how it works concurrently with a WPF UI. 


Sun, 28 Feb 2010 20:56:24 -0800 Alt.NET 2010 Registration http://codelikebozo.com/altnet-2010-registration http://codelikebozo.com/altnet-2010-registration The Alt.NET Annual Conference is on its way to Seattle again this
year! It's set for April 9th-11th. Attendance is limited so please
register and secure your spot.

If you haven't done so yet, registration can be handled online at the
following site: http://altnetseattle2010.eventbrite.com/

Thanks for your support and hope to see you there!


Sun, 28 Feb 2010 20:49:00 -0800 A TDD Practitioner's Pragmatic Argument Against 100% TDD http://codelikebozo.com/a-tdd-practitioners-pragmatic-argument-agains http://codelikebozo.com/a-tdd-practitioners-pragmatic-argument-agains

Not writing unit tests can drive more value than writing them if one
makes a good gamble. More often than not TDD pundits argue that if you
don't have tests you can't easily and rapidly discern a buggy system
from a solid one. They claim you can't effectively explore your code
base by utilizing the tests for hypothesis testing. They claim that
it's risky and wasteful.

They're half right.

It's all just economics. Pure test driven development provides us with
a solid risk mitigation technique that, if followed to a T, can
ensure your code will only increase in quality and robustness. Most
self-respecting developers accept that. The controversy which
surrounds this subject usually revolves around whether or not the time
invested can be rationalized as driving a sufficient amount of value
to make it worthwhile. Most controversial is the idea that everything
you program must have a test rather than just most of what you
program. Not having a test to cover a change to the program means
there's more of a risk that a bug could creep in.

How do we leverage that risk though? The case can be made that unit
testing code can take longer than just programming what you want.
Those of us who feel very strongly about TDD would say that might
happen, but you'd have to be lucky to get code without many bugs.
Others would argue against that, but I say OK, sure! Let us assume that
you'll only get that code done correctly and faster if you're lucky.
But... how lucky? How much do you stand to gain if you win? Do you see
where I'm going here?
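To put rough numbers on "how lucky?", one option is a plain expected-value comparison: the sum of probability times payoff over the possible outcomes. Every figure below is invented purely to show the shape of the calculation (payoffs are in arbitrary units of delivered value, negative meaning value lost); real inputs would have to come from your own company's data.

```javascript
// Expected value of a gamble: sum of probability * payoff over outcomes.
function expected_value(outcomes)
{
  return outcomes.reduce(function(sum, outcome) {
    return sum + outcome.probability * outcome.payoff;
  }, 0);
}

// Made-up numbers, for illustration only.
var write_tests = [{ probability: 1, payoff: 5 }];  // slower but predictable
var skip_tests_lucky = [
  { probability: 0.75, payoff: 10 },  // shipped sooner, bugs never mattered
  { probability: 0.25, payoff: -4 }   // bugs surfaced, time spent fixing them
];
var skip_tests_unlucky = [
  { probability: 0.25, payoff: 10 },
  { probability: 0.75, payoff: -4 }
];

console.log(expected_value(write_tests));        // 5
console.log(expected_value(skip_tests_lucky));   // 6.5
console.log(expected_value(skip_tests_unlucky)); // -0.5
```

With these invented inputs, skipping tests wins when the odds are favorable and loses when they aren't; the calculation doesn't settle anything by itself, it just makes the gamble explicit.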

Taking on risk is what makes an outsized return on your investment
possible. That's an economic fact. If you accept that, as well as the
idea that not unit testing is risky (as opposed to impossible), then you
hopefully agree with me on this.

I don't care what your stance is on this subject, this is a rational
argument for strategically not testing aspects of your code.

Now if we wish to argue this subject in the future, we need to argue
the specific contextual situation of each developer. For some it is a
potentially profitable venture and for others it would be enormously
expensive. This also explains why the whole debate is so hard to
settle definitively: we'd need each company's unique data to
rationalize it. This is nothing new, as many of us
who've been arguing for unit testing have given up arguing with some
people. Both sides end up huffing while the TDD person says you really just
need to try it for yourself. What they're really saying is you'll need
to run your own study at your own company to see if the practice is
right for you.

Some of you will say that's fine but the developers who do this will
be accruing a hellish technical debt over time that they'll eventually
have to pay off.

Technical debt is really the wrong model for what we're discussing.
Why do we have to pay it back? I'll wait for the laughing to stop, but
once you've caught your breath please really think about this. If not
"improving" that code is driving a hefty increase in value, then we
don't really have to, because what we're actually doing is leveraging
risk. That's why the debt analogy works at all in the
first place: debts are assumed risk, the difference being that
the investor needs to be paid back. Instead, if we view them as
gambles, we understand that it's possible to get blackjack, or a
royal flush, or hit the jackpot.

Remember every decision we make every day is a multivariate
optimization problem. "Experts" give us their recommendations based on
their own personal experience, but every company is different. If you
don't have the data you need to support the cases you wish to make
start collecting it now.

