Category Archives: SQL

Why I’ve come to love functional programming

Recently I have been starting to learn Node.js and inadvertently changed my opinion of functional programming VS Object Oriented programming. During my work on supporting legacy Coldfusion sites I stumbled across some code that forms the basis of this comparison of functional programming VS Object Oriented programming. (PS if you want a good introduction to the differences look on over to: http://steve-yegge.blogspot.com/2006/03/execution-in-kingdom-of-nouns.html).

This particular example will be Coldfusion and Pseudo-javascript (or coffeescript) and focuses around a multi-keyword search on 3 columns on 2 tables. Our tables are as follows:

Table Name: Jedi
With Columns: Name, Planet, Lightsaber_Colour.

Table Name: Sith
With Columns: Name, Catch_Phrase, Teacher.

First of all we will define a master function, and a convenience function for searching each of the tables, we do this for an attempt at reusability. We’ll call this main search function “galacticSearch” which will return a SQL string and take the arguments: Keyword (required, string), table (required, string) and andFlag (optional, boolean, defaults to true). The andFlag determines whether our search will return with all the keywords or any of the keywords (AND and OR searches respectively). The SQL returned will be less than perfect as really we should use something more forgiving and inclusive than the ‘=’ operator, for example in MSSQL name LIKE “%#ListGetAt(arguments.keywords, i, ‘ ‘)#%”. Also the params should be using cfqueryparam etc etc, it’s not perfect code but it doesn’t have to be production code for illustration purposes!

Here’s my implementation in Coldfusion, standing up for the Object Oriented languages:

<cffunction name="jediSearch" returnType="string">
	<cfargument name="keywords" required="true" required="false" type="string">
	<cfargument name="andFlag" required="false" type="boolean" default="true">
	<cfreturn galacticSearch(arguments.keywords,'jedi',arguments.andFlag)>
</cffunction>

<cffunction name="sithSearch" returnType="string">
	<cfargument name="keywords" required="true" required="false" type="string">
	<cfargument name="andFlag" required="false" type="boolean" default="true">
	<cfreturn galacticSearch(arguments.keywords,'sith',arguments.andFlag)>
</cffunction>

<cffunction name="galacticSearch" returnType="string">
	<cfargument name="keywords" required="true" required="false" type="string">
	<cfargument name="table" required="true" type="string">
	<cfargument name="andFlag" required="false" type="boolean" default="true">
	<cfscript>
		var sql = 'SELECT * FROM #arguments.table# WHERE ';
		//loop through the keywords
		for(var i = 1; i LTE ListLen(arguments.keywords, ' '); i++){
			//change the columns for each table
			if(arguments.table is 'jedi'){
				sql &= "(name = '#ListGetAt(arguments.keywords, i, ' ')#'";
				sql &= "OR planet = '#ListGetAt(arguments.keywords, i, ' ')#'";
				sql &= "OR lightsaber_colour = '#ListGetAt(arguments.keywords, i, ' ')#')";
			}else{
				sql &= "(name = '#ListGetAt(arguments.keywords, i, ' ')#'";
				sql &= "OR catch_phrase = '#ListGetAt(arguments.keywords, i, ' ')#'";
				sql &= "OR teacher = '#ListGetAt(arguments.keywords, i, ' ')#')";
			}
			if(i LT ListLen(arguments.keywords, ' ')){
				sql &= (arguments.andFlag)?' AND ':' OR ';
			}
		}
		return sql;
	</cfscript>
</cffunction>

Now if your like me, you look at this code and think “How on earth do I add another table easily and safely?”. Well you have to add a new function like “jediSearch” and then add an extra if/else into the loop in galacticSearch. Personally I’d rather not do that as it’s kind of messy and galacticSearch isn’t really the generic function I’d like it to be. Also when you edit like this you risk breaking something already working and increasing development time.

So next we move onto my solution in Pseudo-Javascript. Things of note are that we have changed the galacticSearch function to also take a columnsFunction that takes a keyword and returns the WHERE clause SQL for searching a tables columns for one keyword.

searchJedi = (keywords, andFlag = true) ->
	return galacticSearch keywords, 'jedi', andFlag, (keyword) ->
		return "(name = '#{ keyword }' OR planet = '#{ keyword }' OR lightsaber_colour = '#{ keyword }')"

searchSith = (keywords, andFlag = true) ->
	return galacticSearch keywords, 'sith', andFlag, (keyword) ->
		return "(name = '#{ keyword }' OR catch_phrase = '#{ keyword }' OR teacher = '#{ keyword }')"

galacticSearch = (keywords, andFlag = true, table, columnsFunction) ->
	sql = 'SELECT * FROM #{ table } WHERE '
	keys = keywords.split(' ')
	for i, key in keys
		sql += (if i gt 0 then (andFlag)?' AND ':' OR ' ELSE '') + columnsFunction(key)
	return sql

I think that the code is organised much tidier in the pseudo-javascript version. All the code referring to the interpretation of the keywords is in one place and all the code about searching the tables is held in another function. This in turn means that adding another table or more complexity to your keywords is not hard and you don’t risk messing up your previous code by doing it! That I think is very cool, especially if you have to work with other people/s code.

Admittedly you *could* do something similar in Object Oriented world, if you created a “Jedi” object and a “Sith” object and then coded the galacticSearch to take an object as an argument instead of a table name, but this seems like overly complicating things. Of course there is a case somewhere where passing the object is a better approach, but for most of my coding I’d rather the anonymous function approach I used in the functional programming version.

Chime in the comments if you have an opinion – I’m fairly new to functional programming so hearing from other people who know more than me is always nice!

Contestate Site

A while back I had launched a site called contestate.net that I wrote myself with the help of a themeforest template. The site is built with the idea of a mobile/desktop app to track competitions between friends. A few of the things that I have posted in the angularjs and coldfusion sections of this blog originated in problems I had in building the site. I had previously only really opened it up to a few friends, but now I am opening it up to the world at large – it’s free (my company kindly gave me free hosting, lume.co.nz). So go check it out: contestate.net (and don’t worry about the Open Alpha tag I have on there, the security is the same I use for my live apps and we have ironed most of the bugs out.) I would appreciate any criticisms/bugs/suggestions or anything on the system, just leave them here in the comments, on the Contestate page up top or email them through to the contestate email address.

 

EDIT This has been taken offline as the hosting has been discontinued and I don’t feel the need to continue paying for hosting for something that isn’t used!

Hamming distance on database records

Today I needed to quickly compare multiple records in one table with one I already had and find the most similar one, specifically in MSSQL but the query I used can be modified for most other DB systems. I am going to write this up a little formally to make sure that it is clear. Note: that this implementation can be modified to be a dynamically created query or a stored procedure.

Problem

For a given record in sourceTable (id = 12), rank records in table, based on how many values of each record match the corresponding column:value pair.

Solution

Use a modified hamming distance algorithm in the SQL as follows:

SELECT table.id,

CASE WHEN sourceTable.columnName1 = table.columnName1 THEN 0 ELSE 1 END +
CASE WHEN sourceTable.columnName2 = table.columnName2 THEN 0 ELSE 1 END +
...
CASE WHEN sourceTable.columnNameN = table.columnNameN THEN 0 ELSE 1 END AS hammingDistance
FROM table, sourceTable
WHERE sourceTable.id = 12
 ORDER BY hammingDistance ASC

This implementation of the hamming code algorithm ranks all the records in the [table] table based on how many differences there are between the column values with the same column name in each table. So as output from this query we get the id’s of the table ranked with the most similar first (a hamming distance value of 0 being identical) and the least similar last. I am not sure that this is the most efficient way of doing this but at this point in time it works and can be modified for more complicated comparisons.