Monday, June 8, 2009

Names

Two cool, and unexpected, online applications for researching and comparing given names and surnames:
  • WhitePages.com: names.whitepages.com - Shows popularity, given/surname combinations, and geographical distribution.
  • WolframAlpha: woflramalpha.com - Shows a wide variety of information about (most) names, and allows for name comparisons, for example: Michael vs. Bruno

Wednesday, May 27, 2009

Soft Oatmeal Cookies

* 1 cup butter, softened * 1 cup white sugar * 1 cup packed brown sugar * 2 eggs * 1 teaspoon vanilla extract * 2 cups all-purpose flour * 1 teaspoon baking soda * 1 teaspoon salt * 1 1/2 teaspoons ground cinnamon * 3 cups quick cooking oats 1. In a medium bowl, cream together butter, white sugar, and brown sugar. Beat in eggs one at a time, then stir in vanilla. Combine flour, baking soda, salt, and cinnamon; stir into the creamed mixture. Mix in oats. Cover, and chill dough for at least one hour. 2. Preheat the oven to 375 degrees F (190 degrees C). Grease cookie sheets. Roll the dough into walnut sized balls, and place 2 inches apart on cookie sheets. OPTIONAL: Flatten each cookie with a large fork dipped in sugar. 3. Bake for 8 to 10 minutes in preheated oven. Allow cookies to cool on baking sheet for 5 minutes before transferring to a wire rack to cool completely.

Thursday, January 29, 2009

Hazelnut-Blueberry Muffins

1 egg
2/3 cup sugar
1/4 cup melted shortening
2/3 cup milk
1 1/2 cups all-purpose flour
1/2 cup hazelnut flour (meal)
1/2 teaspoon salt
2 1/2 teaspoons baking powder
1 teaspoon vanilla
1 teaspoon fresh lemon zest
1 cup blueberries
1/4 cup chopped hazelnuts

Mix flours, salt, baking powder and sugar until combined. Add shortening, egg, milk, vanilla, and lemon zest. Mix until ingredients are just well-mixed, then fold in blueberries.

Fill muffin tins about 3/4 full with the mixture. Sprinkle each muffin with chopped hazelnuts. Bake at 350 degrees for 20-25 minutes. Allow to cool before removing from muffin tin. Makes 12 muffins.

Thursday, January 15, 2009

Database Design: UUID vs. Integer Auto-Increment

We're doing some major database design at work and I raised the possibility of using UUIDs for all our surrogate keys as opposed to the traditional per-table auto-incremented integer or sequence. I personally prefer them, and we also have some special setup and needs that might benefit from them. We have distinct databases per client, where some "partners" span multiple clients (databases), I see this aiding in reporting across databases. We also have a need for replication, possibly multi-master replication as we're thinking about segregating bulk data-processing (read+write) and daily usage (also read+write).

In the last few years, for my personal projects and some work projects, I've switched to using official and ghetto mini-UUIDs wherever auto-generated keys are used. For me, there wasn't a single revelation, or one amazing article or book that changed my mind; it was more of a gradual annoyance with simplistic database-generated keys.

My reasons have primarily been:

  1. Less Database Dependency - When generated identifiers don't (have to) come from the database, business logic has less work to do when creating new data sets, especially when the new data is heavily interrelated, application code can simply declare a new key and reference it in other new, yet-to-be-saved objects. Also, having one authoritative source for generating keys - be it a lock on a table, or a global sequence provider - has a much lower threshold for scalability, and can invoke a sizeable performance penalty, especially in write-intensive applications. Modern UUID algorithms seem to be strong enough that a central "UUID Generation Service" per-host wouldn't be necessary.
  2. Uniqueness - An object's or row's id will be unique across the table, database, server, and thankfully, most-likely, the entire universe. This truly means that for a given set of values, (hopefully guaranteed to be unique by database constraints), the id will never be used by another set of data. This makes error tracing, log analysis, and searching for old data trivial. Also, how many times have you typed up a query "DELETE FROM foo WHERE id=178", stared at it to make sure 178 is the right number, hit enter, and then realized that "foo" should have been "bar"? Say goodbye to whatever had #178 in "foo"!
  3. Mocking/Testing - When id generation can happen in the application layer, creating large, complex data sets for testing/integration purposes is trivial and requires no back and forth to the database to associate objects. If you're using a full-featured ORM (e.g. Hibernate), this seems of no benefit, as you can create large sets of interrelated objects and tell the ORM to just save them all. But when these interrelated objects span databases, or application environments (all in a transactional environment, of course), it seems beneficial to let the application layer - the actual point where the new data set is created - specify the ID.
  4. Data Privacy - Contemporary web applications often expose internal state through simplistic sequential integer IDs. How many times have you been at /foo.php?id=17443 and found interesting things when you go to /foo.php?id=17444 ? Though this should be mitigated by proper design and security, if UUIDs are used, the attack vector is nonexistent.
I initially started using a simplistic random 16-digit alphabetical string to get the benefits described above, without having to deal with full-on 32 or 36 character UUIDs whilst programming and debugging. These are currently in use on my web site, for example:
http://napkindrawing.com/photo_view/rfenrcfouhnweeoo

These are generated in application code by a function that weights letter choices according to their frequency in the English language, so you end up with almost pronounceable keys. In the database all surrogate key columns have a default which produces hexadecimal digits if the surrogate key is not specified. This has been helpful for debugging and development, though given the weighting of the letters in English and the Birthday Paradox, I wouldn't feel comfortable using this scheme on any large (billions of records) databases. Plus it's completely non-standard and rather silly - you could have a "random" key like "ohhhbuttbutthaha".

My last project at work used actual UUIDs as generated by Hibernate's UUIDHexGenerator, and the development overhead of - GOSH! - 32 character IDs was nonexistant. Even dealing with handwritten queries whilst testing, debugging and data analysis were much simpler when you don't have to remember which "17" you're looking for in a fresh database.

The main complaints googling for "uuid database" seem to be around adversely affected indexes where uuids are added in random numerical order to clustered indexes - which would seem to be addressable by using a timestamp-prefixed value, a simple "reverse(uuid())" in MySQL, for example.

I'm of the opinion that the added hardware+software burden of supporting 32 byte keys is moderate to trivial given modern hardware, and the benefits described above are enough of a reason to switch all instances of generated keys to be UUIDs

Sunday, January 4, 2009

HTML5 Sightings

From Eric Meyer's Blog: A recent post "An Event Apart and HTMl 5" describes how the new An Event Apart web site being written entirely in HTML 5. Pretty exciting!