Category Archives: Algorithms

Why I’ve come to love functional programming

Recently I have been starting to learn Node.js and inadvertently changed my opinion of functional programming VS Object Oriented programming. During my work on supporting legacy Coldfusion sites I stumbled across some code that forms the basis of this comparison of functional programming VS Object Oriented programming. (PS if you want a good introduction to the differences look on over to: http://steve-yegge.blogspot.com/2006/03/execution-in-kingdom-of-nouns.html).

This particular example will be Coldfusion and Pseudo-javascript (or coffeescript) and focuses around a multi-keyword search on 3 columns on 2 tables. Our tables are as follows:

Table Name: Jedi
With Columns: Name, Planet, Lightsaber_Colour.

Table Name: Sith
With Columns: Name, Catch_Phrase, Teacher.

First of all we will define a master function, and a convenience function for searching each of the tables, we do this for an attempt at reusability. We’ll call this main search function “galacticSearch” which will return a SQL string and take the arguments: Keyword (required, string), table (required, string) and andFlag (optional, boolean, defaults to true). The andFlag determines whether our search will return with all the keywords or any of the keywords (AND and OR searches respectively). The SQL returned will be less than perfect as really we should use something more forgiving and inclusive than the ‘=’ operator, for example in MSSQL name LIKE “%#ListGetAt(arguments.keywords, i, ‘ ‘)#%”. Also the params should be using cfqueryparam etc etc, it’s not perfect code but it doesn’t have to be production code for illustration purposes!

Here’s my implementation in Coldfusion, standing up for the Object Oriented languages:

<cffunction name="jediSearch" returnType="string">
	<cfargument name="keywords" required="true" required="false" type="string">
	<cfargument name="andFlag" required="false" type="boolean" default="true">
	<cfreturn galacticSearch(arguments.keywords,'jedi',arguments.andFlag)>
</cffunction>

<cffunction name="sithSearch" returnType="string">
	<cfargument name="keywords" required="true" required="false" type="string">
	<cfargument name="andFlag" required="false" type="boolean" default="true">
	<cfreturn galacticSearch(arguments.keywords,'sith',arguments.andFlag)>
</cffunction>

<cffunction name="galacticSearch" returnType="string">
	<cfargument name="keywords" required="true" required="false" type="string">
	<cfargument name="table" required="true" type="string">
	<cfargument name="andFlag" required="false" type="boolean" default="true">
	<cfscript>
		var sql = 'SELECT * FROM #arguments.table# WHERE ';
		//loop through the keywords
		for(var i = 1; i LTE ListLen(arguments.keywords, ' '); i++){
			//change the columns for each table
			if(arguments.table is 'jedi'){
				sql &= "(name = '#ListGetAt(arguments.keywords, i, ' ')#'";
				sql &= "OR planet = '#ListGetAt(arguments.keywords, i, ' ')#'";
				sql &= "OR lightsaber_colour = '#ListGetAt(arguments.keywords, i, ' ')#')";
			}else{
				sql &= "(name = '#ListGetAt(arguments.keywords, i, ' ')#'";
				sql &= "OR catch_phrase = '#ListGetAt(arguments.keywords, i, ' ')#'";
				sql &= "OR teacher = '#ListGetAt(arguments.keywords, i, ' ')#')";
			}
			if(i LT ListLen(arguments.keywords, ' ')){
				sql &= (arguments.andFlag)?' AND ':' OR ';
			}
		}
		return sql;
	</cfscript>
</cffunction>

Now if your like me, you look at this code and think “How on earth do I add another table easily and safely?”. Well you have to add a new function like “jediSearch” and then add an extra if/else into the loop in galacticSearch. Personally I’d rather not do that as it’s kind of messy and galacticSearch isn’t really the generic function I’d like it to be. Also when you edit like this you risk breaking something already working and increasing development time.

So next we move onto my solution in Pseudo-Javascript. Things of note are that we have changed the galacticSearch function to also take a columnsFunction that takes a keyword and returns the WHERE clause SQL for searching a tables columns for one keyword.

searchJedi = (keywords, andFlag = true) ->
	return galacticSearch keywords, 'jedi', andFlag, (keyword) ->
		return "(name = '#{ keyword }' OR planet = '#{ keyword }' OR lightsaber_colour = '#{ keyword }')"

searchSith = (keywords, andFlag = true) ->
	return galacticSearch keywords, 'sith', andFlag, (keyword) ->
		return "(name = '#{ keyword }' OR catch_phrase = '#{ keyword }' OR teacher = '#{ keyword }')"

galacticSearch = (keywords, andFlag = true, table, columnsFunction) ->
	sql = 'SELECT * FROM #{ table } WHERE '
	keys = keywords.split(' ')
	for i, key in keys
		sql += (if i gt 0 then (andFlag)?' AND ':' OR ' ELSE '') + columnsFunction(key)
	return sql

I think that the code is organised much tidier in the pseudo-javascript version. All the code referring to the interpretation of the keywords is in one place and all the code about searching the tables is held in another function. This in turn means that adding another table or more complexity to your keywords is not hard and you don’t risk messing up your previous code by doing it! That I think is very cool, especially if you have to work with other people/s code.

Admittedly you *could* do something similar in Object Oriented world, if you created a “Jedi” object and a “Sith” object and then coded the galacticSearch to take an object as an argument instead of a table name, but this seems like overly complicating things. Of course there is a case somewhere where passing the object is a better approach, but for most of my coding I’d rather the anonymous function approach I used in the functional programming version.

Chime in the comments if you have an opinion – I’m fairly new to functional programming so hearing from other people who know more than me is always nice!

Hamming distance on database records

Today I needed to quickly compare multiple records in one table with one I already had and find the most similar one, specifically in MSSQL but the query I used can be modified for most other DB systems. I am going to write this up a little formally to make sure that it is clear. Note: that this implementation can be modified to be a dynamically created query or a stored procedure.

Problem

For a given record in sourceTable (id = 12), rank records in table, based on how many values of each record match the corresponding column:value pair.

Solution

Use a modified hamming distance algorithm in the SQL as follows:

SELECT table.id,

CASE WHEN sourceTable.columnName1 = table.columnName1 THEN 0 ELSE 1 END +
CASE WHEN sourceTable.columnName2 = table.columnName2 THEN 0 ELSE 1 END +
...
CASE WHEN sourceTable.columnNameN = table.columnNameN THEN 0 ELSE 1 END AS hammingDistance
FROM table, sourceTable
WHERE sourceTable.id = 12
 ORDER BY hammingDistance ASC

This implementation of the hamming code algorithm ranks all the records in the [table] table based on how many differences there are between the column values with the same column name in each table. So as output from this query we get the id’s of the table ranked with the most similar first (a hamming distance value of 0 being identical) and the least similar last. I am not sure that this is the most efficient way of doing this but at this point in time it works and can be modified for more complicated comparisons.

Index Combiner Algorithm

Earlier in my work at Mobile Strategy I had a nice algorithmic question which was phrased as follows (I am posting it now as I just had to fix it):

There are N sets of indices (essentially a 2d array) that have to be combined together (in a list of all combinations, order does not matter) in such a way that no combination contains more than one index from one set in N. Our example will look like:

N = [[0], [1], [2,3]]

and our desired output will be:

["0", "0,1", "0,1,2", "0,1,3", "0,2", "0,3", "1", "1,2", "1,3", "2", "3"]

As far as I know the only way of getting this output is recursively (anyone who can find a non-recursive solution, leave a comment – I’m all ears for more efficient solutions), so here is the code I used for this:

ArrayList<String> strings = new ArrayList<String>();
private void recursive(int[][] indices, int array, String currentString) {
      if (array < indices.length) {
            for (int i = 0; i < indices[array].length; i++) {
                  // At this level recurse down for all items in it.
                  String fullString = ((currentString.length() > 0) ? currentString + "," : "") + indices[array][i];
                  strings.add(fullString);
                  recursive(indices, array + 1, fullString);
            }
            recursive(indices, array + 1, currentString);
      }
}

The code in itself is rather simple, the trick is the recursive part, the initial recursing call for our example would be (call 1):

recursive(new int[new int[0], new int[1], new int[2,3]], 0, "")

If we step through the code; the first call of recursive starts the loop placing “0” into strings, then recurses down passing “0” as fullString. The second call of recursive (call 2):

recursive(indices, 1, "0")

would then in the loop put “0,1” into strings and recurse with the following parameters (call 3):

recursive(indices, 2, "0,1")

Again in the loop “0,1,2” would be put into strings and then we would call recurse again (call 4):

recursive(indices, 3, "0,1,2")

As at this point the variable “array” (value 3) is greater than the length of indices – see line 3 of the code, we simply return up again at this point to call 3, continuing with the for loop we insert “0,1,3” into strings and call recurse again (call 5):

recursive(indices, 3, "0,1,3")[/code</span></span>]

This call functions as <span style="color: #ff0000;">call 4</span> did, so I will not recover the ground already covered and return into the evaluation of <span style="color: #00ff00;">call 3</span>. When we return from <span style="color: #ff0000;">call 5</span> the loop in <span style="color: #00ff00;">call 3</span> will finish and we will call recurse as on line 10 (note the use of <span style="text-decoration: underline;">currentString</span> not <span style="text-decoration: underline;">fullString</span> in this case, important in later calls) <span style="color: #ff0000;">(call 6)</span>:

recursive(indices, 3, "0,1")

Again this call functions as call 4 and 5 did so we return, finishing the execution of call 3 returning back to call 2. As this loop only has 1 element in it, the loop finishes and calls recurse on line 10 (call 7):

recursive(indices, 2, "0")

This call adds the strings "0,2" and "0,3" - I won't bother filling in the rest, I think I have shown enough for you all to get the idea of how the execution goes. You can trace out the rest from these examples I hope!

Note on colouration choices: Blue has been used for the second level calls of the function, green for third and red for the fourth level calls - in this case all red calls trigger the 'base case' ending the recursion.