Category Archives: Database

Database Theory, SQL coding.

Why I’ve come to love functional programming

Recently I have been starting to learn Node.js and inadvertently changed my opinion of functional programming VS Object Oriented programming. During my work on supporting legacy Coldfusion sites I stumbled across some code that forms the basis of this comparison of functional programming VS Object Oriented programming. (PS if you want a good introduction to the differences look on over to: http://steve-yegge.blogspot.com/2006/03/execution-in-kingdom-of-nouns.html).

This particular example will be Coldfusion and Pseudo-javascript (or coffeescript) and focuses around a multi-keyword search on 3 columns on 2 tables. Our tables are as follows:

Table Name: Jedi
With Columns: Name, Planet, Lightsaber_Colour.

Table Name: Sith
With Columns: Name, Catch_Phrase, Teacher.

First of all we will define a master function, and a convenience function for searching each of the tables, we do this for an attempt at reusability. We’ll call this main search function “galacticSearch” which will return a SQL string and take the arguments: Keyword (required, string), table (required, string) and andFlag (optional, boolean, defaults to true). The andFlag determines whether our search will return with all the keywords or any of the keywords (AND and OR searches respectively). The SQL returned will be less than perfect as really we should use something more forgiving and inclusive than the ‘=’ operator, for example in MSSQL name LIKE “%#ListGetAt(arguments.keywords, i, ‘ ‘)#%”. Also the params should be using cfqueryparam etc etc, it’s not perfect code but it doesn’t have to be production code for illustration purposes!

Here’s my implementation in Coldfusion, standing up for the Object Oriented languages:

<cffunction name="jediSearch" returnType="string">
	<cfargument name="keywords" required="true" required="false" type="string">
	<cfargument name="andFlag" required="false" type="boolean" default="true">
	<cfreturn galacticSearch(arguments.keywords,'jedi',arguments.andFlag)>
</cffunction>

<cffunction name="sithSearch" returnType="string">
	<cfargument name="keywords" required="true" required="false" type="string">
	<cfargument name="andFlag" required="false" type="boolean" default="true">
	<cfreturn galacticSearch(arguments.keywords,'sith',arguments.andFlag)>
</cffunction>

<cffunction name="galacticSearch" returnType="string">
	<cfargument name="keywords" required="true" required="false" type="string">
	<cfargument name="table" required="true" type="string">
	<cfargument name="andFlag" required="false" type="boolean" default="true">
	<cfscript>
		var sql = 'SELECT * FROM #arguments.table# WHERE ';
		//loop through the keywords
		for(var i = 1; i LTE ListLen(arguments.keywords, ' '); i++){
			//change the columns for each table
			if(arguments.table is 'jedi'){
				sql &= "(name = '#ListGetAt(arguments.keywords, i, ' ')#'";
				sql &= "OR planet = '#ListGetAt(arguments.keywords, i, ' ')#'";
				sql &= "OR lightsaber_colour = '#ListGetAt(arguments.keywords, i, ' ')#')";
			}else{
				sql &= "(name = '#ListGetAt(arguments.keywords, i, ' ')#'";
				sql &= "OR catch_phrase = '#ListGetAt(arguments.keywords, i, ' ')#'";
				sql &= "OR teacher = '#ListGetAt(arguments.keywords, i, ' ')#')";
			}
			if(i LT ListLen(arguments.keywords, ' ')){
				sql &= (arguments.andFlag)?' AND ':' OR ';
			}
		}
		return sql;
	</cfscript>
</cffunction>

Now if your like me, you look at this code and think “How on earth do I add another table easily and safely?”. Well you have to add a new function like “jediSearch” and then add an extra if/else into the loop in galacticSearch. Personally I’d rather not do that as it’s kind of messy and galacticSearch isn’t really the generic function I’d like it to be. Also when you edit like this you risk breaking something already working and increasing development time.

So next we move onto my solution in Pseudo-Javascript. Things of note are that we have changed the galacticSearch function to also take a columnsFunction that takes a keyword and returns the WHERE clause SQL for searching a tables columns for one keyword.

searchJedi = (keywords, andFlag = true) ->
	return galacticSearch keywords, 'jedi', andFlag, (keyword) ->
		return "(name = '#{ keyword }' OR planet = '#{ keyword }' OR lightsaber_colour = '#{ keyword }')"

searchSith = (keywords, andFlag = true) ->
	return galacticSearch keywords, 'sith', andFlag, (keyword) ->
		return "(name = '#{ keyword }' OR catch_phrase = '#{ keyword }' OR teacher = '#{ keyword }')"

galacticSearch = (keywords, andFlag = true, table, columnsFunction) ->
	sql = 'SELECT * FROM #{ table } WHERE '
	keys = keywords.split(' ')
	for i, key in keys
		sql += (if i gt 0 then (andFlag)?' AND ':' OR ' ELSE '') + columnsFunction(key)
	return sql

I think that the code is organised much tidier in the pseudo-javascript version. All the code referring to the interpretation of the keywords is in one place and all the code about searching the tables is held in another function. This in turn means that adding another table or more complexity to your keywords is not hard and you don’t risk messing up your previous code by doing it! That I think is very cool, especially if you have to work with other people/s code.

Admittedly you *could* do something similar in Object Oriented world, if you created a “Jedi” object and a “Sith” object and then coded the galacticSearch to take an object as an argument instead of a table name, but this seems like overly complicating things. Of course there is a case somewhere where passing the object is a better approach, but for most of my coding I’d rather the anonymous function approach I used in the functional programming version.

Chime in the comments if you have an opinion – I’m fairly new to functional programming so hearing from other people who know more than me is always nice!

Contestate Site

A while back I had launched a site called contestate.net that I wrote myself with the help of a themeforest template. The site is built with the idea of a mobile/desktop app to track competitions between friends. A few of the things that I have posted in the angularjs and coldfusion sections of this blog originated in problems I had in building the site. I had previously only really opened it up to a few friends, but now I am opening it up to the world at large – it’s free (my company kindly gave me free hosting, lume.co.nz). So go check it out: contestate.net (and don’t worry about the Open Alpha tag I have on there, the security is the same I use for my live apps and we have ironed most of the bugs out.) I would appreciate any criticisms/bugs/suggestions or anything on the system, just leave them here in the comments, on the Contestate page up top or email them through to the contestate email address.

 

EDIT This has been taken offline as the hosting has been discontinued and I don’t feel the need to continue paying for hosting for something that isn’t used!

Hamming distance on database records

Today I needed to quickly compare multiple records in one table with one I already had and find the most similar one, specifically in MSSQL but the query I used can be modified for most other DB systems. I am going to write this up a little formally to make sure that it is clear. Note: that this implementation can be modified to be a dynamically created query or a stored procedure.

Problem

For a given record in sourceTable (id = 12), rank records in table, based on how many values of each record match the corresponding column:value pair.

Solution

Use a modified hamming distance algorithm in the SQL as follows:

SELECT table.id,

CASE WHEN sourceTable.columnName1 = table.columnName1 THEN 0 ELSE 1 END +
CASE WHEN sourceTable.columnName2 = table.columnName2 THEN 0 ELSE 1 END +
...
CASE WHEN sourceTable.columnNameN = table.columnNameN THEN 0 ELSE 1 END AS hammingDistance
FROM table, sourceTable
WHERE sourceTable.id = 12
 ORDER BY hammingDistance ASC

This implementation of the hamming code algorithm ranks all the records in the [table] table based on how many differences there are between the column values with the same column name in each table. So as output from this query we get the id’s of the table ranked with the most similar first (a hamming distance value of 0 being identical) and the least similar last. I am not sure that this is the most efficient way of doing this but at this point in time it works and can be modified for more complicated comparisons.

MSSQL Update Trick

So today I found a neat trick (that was intuitive but not obvious, at least to me and my google search for “SQL update set column = column”) was that you can set one column to be relative to anther column in the same table.
So in this example my table has 2 columns: buy and sell, initially I have set both to the same value but now I want to add 15% margin onto this. The solution is very simple:

UPDATE table SET sell = buy * 1.15

And like magic all the sell values in the table are 15% higher than the buy price.

Useful SQL Functions.

This page is really a cheat sheet for functions I use regularly and forget just as regularly.

MSSQL

mm/dd/yyyy format for dates: CONVERT(VARCHAR(10), GETDATE(), 103) AS [DD/MM/YYYY] (thanks to http://www.sql-server-helper.com/tips/date-formats.aspx – also has many other formats)

Decimal format for currency: convert(decimal(18,2), 23.234,0), though if possible it is easier to get the format correct with use of proper declaration of variables in the first place.

A quick summary of Database Normal Forms

So today in my study for exams, I worked out some simple ways of remembering up to Boyce-Codd normal form (BCNF).

  • 1st Normal Form: All rows/tuples must have the same number of columns/attributes.
  • 2nd Normal Form: Only applies to tables or schemas with composite keys, if it only has one key it is already in 2NF.
    A non-key column value must be relevant to ALL column values of the composite key. For example: on the table (keys are bolded) [Part Warehouse | Quantity | Warehouse Address] 2NF is violated as Warehouse Address is only related to Warehouse and not Part.
  • 3rd Normal Form: A non-key column can only be related to a key column. Modifying the earlier example to; [Part | Warehouse | Warehouse Address] this violates 3NF as Warehouse Address is related to Warehouse, not Part.
  • BCNF: