Getting Rich with SQL Server: Finding moving average crossovers with T-SQL Window Functions

If you’re a fan of Jim Kramer and friends over on CNBC, you’ll no doubt be aware that speculators place great importance on moving averages as a tool for analysing and indeed predicting movements in the prices of stocks, currencies, commodities, and heaven knows what else.

Some T-SQL developers, earlier.

For the uninitiated, here’s a quick primer. For a given time-series of data, a moving average is the arithmetic mean of the last n observations, where n is chosen according to the needs of the application. The idea of this is to smooth out short term fluctuations in order to highlight longer term trends. The smaller the value of n, the more closely the moving average will track the observations themselves.

Now, the theory as it relates to financial markets is that if we take the 50-day and 200-day (n = 50 and n = 200) moving averages of the price of the instrument (meaning stock, bond, whatever) we’re interested in, we can attain great wealth by waiting until these lines cross one another and taking action accordingly. The action we take is that we “buy” or “go long” the instrument when the shorter average crosses above the longer, and “sell” or “go short” when the shorter average crosses below the longer. The idea of this is to profit from the “trend” in the price data; so many words have been written about this already that I won’t add to the tally here.

This kind of calculation has traditionally been the domain of Excel, or Excel’s “big brother” SQL Server Analysis Services, but since the introduction of the T-SQL window functions in SQL Server 2012, we can get rich using just the facilities built into the database engine.

The instrument we’re going to be looking at is the Euro/US dollar exchange rate. The reason I’ve chosen this is that the historical data we need is freely available from the Federal Reserve website ; data about stocks and commodities tends to come with all kinds of licensing complications on redistribution, which means you can use it to get rich in the comfort of your home, but using it anywhere else is rather more risky. In the roundabout way of many things in finance, EUR/USD looks like it might mean Euros per USD, but in fact it means the exact opposite, USD per Euro.

The Setup

I’ve downloaded a list of historic rates from the Federal Reserve, this comes as a csv file which is easy to import with `BULK INSERT`. For the time being, this url should take you directly to the right place. There are a few rows of header information at the start of the file which we need to skip, as well as some rows that have no exchange rate recorded – these have “ND” in place of an exchange rate. The reason there are no rates recorded for these dates is that they are public holidays, so rather than take any special action to code around the “ND” entries, we’ll just permit a few errors to allow the rest of the rows to be inserted.

You’ll see a few errors of the form

in SQL Server Management Studio, but if you issue a

you should see some data returned.

Calculating the moving averages

To calculate a moving average, we need to take the arithmetic mean of the last n operations. This can be easily accomplished with the windowed `AVG()` function in SQL Server 2012.

We’ve selected the existing columns from our table, as well as calculating two new ones named SMA50 and SMA200. (The “S” stands for “simple”, in order to differentiate this from other kinds of moving average, which are off-topic here). The syntax illustrates one of the main differences between the relational engine and SSAS/Excel, which is that in both SSAS and Excel, the data is intrinsically ordered, whereas in the relational engine we have to supply the ordering at query time for a predicate like “ROWS BETWEEN 49 PRECEDING AND CURRENT ROW” to make sense.

The good old days?

Before moving on, let’s take a quick look at how we would have done this moving average calculation in SQL Server 2008 R2. A common approach would be to assign row numbers using the `ROW_NUMBER()` function in a CTE, then self-join the CTE to find the last n values, where n = 50, 200, etc.

The clue, of course, is in the correlation names.

If we do a quick performance comparison, remembering to issue

before each query, the results are as follows:

CTE version

As you can see, there’s no contest. You can check out the reasons why not by looking in the execution plans for the two queries; I won’t repeat this comparison for the rest of the queries in this article.

Finding the crossovers

Having generated the moving averages, the next step on the road to riches is to find the points where the two averages cross over. If we define “crossing over” to mean “yesterday the SMA50 was less than the SMA200 and today it isn’t” (or vice versa), it’s obvious that the `LAG()` function can help us here.

Now, it would be great if we could just wrap our existing calculations in `LAG()` like this

but sadly this just gives us the error

It seems that window functions, like the cuckoo, don’t like nesting.

To work around this constraint, we’ll just wrap our existing query in a CTE and append the new columns.

We now have an “Action” column with a large number of nulls. If you scroll through you should see a few signals indicated, on 30 January 2001, 18 May 2001, 25 September 2001 and so on. If we try to move the “Signal” expression into the `WHERE` clause to filter out the nulls:

we get another error message:

The error message suggests the underlying reason for this restriction; since windowing functions operate on the result set of the rest of the query, they are not (logically) evaluated until near the end of the query, after the where and group by clauses; the final set of rows needs to be known before an expression like `LAG(column, 1)` makes sense.

So, we’ll need to wrap this query in yet another CTE when we go on to calculate the next step:

PROFIT!

More Background Information

Profit and Loss in Retail FX trades are generally reckoned in “Pips”. A “pip” is the smallest price change that can be recorded for a given FX pair. Since our EUR/USD rates are quoted to 4 decimal places, this is equivalent to 1/100 of a cent (or 1/10000 of a dollar) per Euro.

We will assume for the sake of this example that whenever we get a “Buy! Buy! Buy!” signal we will go long EUR/USD – meaning we sell USD and buy Euros, and whenever we get a “Sell! Sell! Sell! signal we will close that trade and immediately enter a short trade – meaning sell Euros and buy USD.

We can measure the pips “profit” by subtracting the exchange rate at the time of the trade from the exchange rate at the time of the next trade. This sounds like a job for the `LEAD()` function. Finally, we can use a windowed `SUM()` function to calculate our cumulative profit in pips.

The complete query

As you can see, between 1999 and 2013 this strategy didn’t quite live up to its promise!

This is quite a long posting already so I’ll leave it there for today; I intend return to this example in future posts to add a few more calculations.

Some thoughts on “I Am Better Than You”, or How ‘DevOps’ is [not] Killing the Developer

I was going to write this a as a comment on Grant’s blog post, but it ended up being way too long.

In Grant’s post, he makes the indisputable point that within an organisation, regardless of each individual’s specialisation, be it sysadmin, dba, developer, SAN admin, or even “devops”, each of them share a common focus on helping the business serve its customers, defined in the widest sense.

Taking a step back, there’s another way in which all of these technical specialisations have something in common.

It’s easy enough for “real developers” to be dismissive of the “point-and-click” brigade, whether this refers to the system administrators, the SharePoint consultants, or the Access developers. Thinking more abstractly though, if a system admininstrator uses the “metro” Server Manager to add a server to a domain, all that is really going on is that he is instructing the “system” – considered abstractly – to behave in a certain way. Unlike in a “real” programming language, these instructions are expressed through a sequence of clicks, rather than through a sequence of statements, expressions, and operators, compiled into a file to be placed in position (by “devops”?)

Image from Technet

Consider the Scratch tool for creating simple games.  Massively simplifying, this allows a child (or adult!)  to drag and drop pre-defined elements to assemble a simple computer game. Now, the server that hosts Scratch has a way of persisting these clicks so that the game can be played in a repeatable fashion, just like Active Directory has a way of persisting the clicks in Server Manager so that the “system” responds in a repeatable fashion. No-one would deny that Scratch is “programming”, but somehow this conceptual leap is further in the case of  Server Manager.

What Scratch and Server Manager have in common is that they are both high-level abstractions, and like many high level abstractions, are designed to suit a fairly specific purpose. They suffer from many of the problems of high-level abstractions, such as the difficulty of collaborative “development”, not least because of the problems with source control (VBA, anyone?), and their lack of adaptability to tasks outside a specific purpose.

Another example of such an abstraction is Transact-SQL, of course, which is great for database access but has far fewer capabilities outside the database than COBOL, for instance, which is a much lower level of abstraction.

In the case of Server Manager, for example, the tool can’t be adapted to configure a Sharepoint site, you need to use a different tool:

Image from Technet

Again, we create our “program” for Sharepoint through a series of clicks, and the result of these clicks is magically persisted somehow.

If we switch to a lower level of abstraction, such as Powershell, then we can use this single tool to do both of the tasks above, and probably do them quicker to boot. Instead of clicking, we create a “special” file, (our .ps1 file) and put it in a folder somewhere and ask the system to do something with it. Again, this result is magically persisted. From the perspective of the user, of course, a Sharepoint site configured using Central Admin is exactly the same as one configured using Powershell. We could equally use C# to do any of the tasks above, as well as a bunch of other stuff that is beyond the remit of Powershell.

Powershell is of course an abstraction itself, of the .NET framework, which itself abstracts the Windows API, etc. etc. The difference is that real programmers do their pointing and clicking by placing special files that they make with “compilers”, or placing scripts in special places so they can be converted into such special files.

So, in terms of the end result, all of the above activities can be considered “programming”, it’s just that they operate at different levels of abstraction; it’s not until you meet a guy with a soldering iron that you have got to the end of the abstractions. All the activites are directed at making a “system” respond to “input” in a certain way, and presumably this behaviour is defined by a “customer” of some description.

So what is this if it’s not “programming”?

xkcd: Real Programmers

Behind the Green Door: Granting non-privileged users permissions to read execution status from the SSIS catalog

The SSIS catalog, introduced in SQL Server 2012, is a feature that allows us to manage projects and parameters, execute SSIS packages, and generate reports on our SSIS environment, all within a single SQL Server database, rather than spread across the range of .dtsconfig files, dtexec.exe batch files (did someone say 32bit?) , and log locations provided in previous versions of SQL Server Integration Services. You can read more about it here.

The SSIS catalog is contained in a single database called SSISDB. This is a user database, rather than a system database, and can be managed just like any other user database.   In this respect, you can think of SSIS as being just like any other vendor application which uses SQL Server to store its data, except that in this case the vendor in question is Microsoft.

I recently came across a requirement to allow the end users access to view the execution status of their SSIS packages without recourse to the IT Department. Naturally, this information was to be displayed in a “traffic light” indicator, since I heard somewhere that this is the only language end users understand. The users in question don’t need permission to run the SSIS packages, only to see the last execution time and execution status of each SSIS package – all the packages are run by SQL Server Agent scheduled jobs. I won’t be discussing the traffic light further here, we’ll content ourselves with creating a select statement to return the relevant information from the SSIS catalog.

I’ve created a dummy package, which you can download from here if you like, and a SQL Server Agent Job to execute the package.

The job has a single step, which uses the SSIS subsystem to execute our package:

All of the information we require for our query is contained in a view named catalog.executions, inside the SSISDB database:

I’ve run the package a few times here.

Now, in the normal course of things, our users naturally don’t have any permissions in the SSIS catalog.  At installation there is a single database role created in the SSISDB database, called SSIS_Admin, whose members can read and execute all packages and other objects  in the SSIS catalog. This is rather more access than our users require, so the next step is to create our own database role (SSISDB is a user database rather than a system database, after all) and grant the required permissions to this role.

The following script creates a new login for a windows user “englebert”, and a user for this login in the SSISDB database. Next it creates a role “SSIS_Reader” and adds our user to this role. The role is granted “read” and “read objects” permissions on our folder, equivalent to the action in this dialog:

Sadly this query returns no rows.

This is because we’ve granted permissions on the objects in our folder, but this permission doesn’t extend to the executions of our packages.

To work around this, we need to change the SQL Server Agent job to use a TSQL step type.

We call the stored procedure catalog.create_execution , which gives us back the  execution_id  of this run of the package. Next, before running the package with catalog.start_execution , we call catalog.grant_permission , but with  @object_type  of 4, which means “operation”, and pass in the execution _id we just retrieved.

Having run this a couple of times, “Englebert” is now able to see all the executions that were started through this mechanism.

Changing the database owner during SSDT Database Project Deployment

Slightly dis-spiriting that this is still required in 2014, given the blog entry from Gert Drapers in 2007 here and associated connect item here, but I was reminded of this recently when I noticed some stuff running as a named user account – the application in question made use of “execute as owner” in a couple of places.

The suggestion in the original blog entry is to use sp_changedbowner  in a post-deployment script to set the database owner, but nowadays we can do slightly better than that, not least because sp_changedbowner has been deprecated since SQL Server 2008.

The other drawback with sp_changedbowner is that like many of the “legacy” administrative stored procedures, it requires you to be in the context (using use …) of the database you wish to alter. Now, this isn’t generally a problem when using SSDT, as the generated deployment script issues a USE \$(DatabaseName)  at the top, but you can never be too careful, particularly as you don’t know what else has gone on in the other post-deployment scripts – unless, of course, you can be bothered to read them.

ALTER AUTHORIZATION  is the current way to change the owner of a database, and indeed of any other securable.

So, what I do now is to include a file AlterAuthorisationDB.sql in every project, and call it from the post-deployment script. (Remember, you can only have one script per project marked as the post-deployment script).

This file consists of exactly one line:

This will work the same way in any project, since the \$(DatabaseName)  variable is pre-defined.

I then call this from the main post-deployment script:

You can see I generally organise post-deployment scripts into folders, I use “Misc” for stuff like this.

Database refactoring and SSDT Deployment

This topic came up on Twitter last night; in particular the question of how to split a table into two during a deployment without losing the data. The articles – and Connect item – linked in that conversation were all a few years old, so I thought I’d revisit this topic using SSDT in Visual Studio 2013.

Splitting a table is a fairly common operation to have to undertake during an application upgrade – in particular when you realise that a relationship you thought was one-to-one should really be one-to-many, or even many-to-many, for instance Customer to Address. This operation is so common, in fact, that it’s addressed in the Refactoring Databases book.

I’m using the Chinook database from chinookdatabase.codeplex.com, and we’ll make precisely this change, splitting the Customer table into Customer and Address. I’ve already imported the database into a Visual Studio 2013 Database Project. In this instance, we’re going to allow one customer to have many addresses, so we will need a foreign key from the new Address table back to our existing Customer table.

Here’s the DDL for the existing Customer Table:

Of these columns, we want to move Address, City, State, Country and PostalCode to our new Address table, and add a foreign key back to the Customer table.

The first step is to create our new table, and make the changes we need to our existing Customer table. The Primary Key of the new table will be CustomerID and AddressType.

Now for the fun part. We need to add pre-deployment and post deployment scripts to our project. Pre-deployment scripts are executed after the schema comparison has taken place, so any objects we create here won’t be dropped by the dacpac upgrade operation. We can use this feature to create a temporary table, a table variable, or a “temporary” table (meaning a table called “tmp<table_name>” ) to hold the contents of our customer table whilst the rest of the deployment runs.  For no particular reason, I generally do the third of these.

There’s another consideration at this point, which is that we only want to do all of this if we’re changing from a version of the database with the address data in the Customer table to one where the addresses are held separately. There are a few approaches to this involving maintaining database version numbers in a table (e.g. AdventureWorks), or even in extended properties.  The approach I like to use is simply to test whether the change we’re making has already been made – I realise this isn’t as robust as the other methods, but it has suited me up until now. You can see this at the start of my pre-deployment script.

Finally, SSDT is going to complain that “Rows were detected. The schema update is terminating because data loss might occur.”. We have a couple of options to deal with this, the first being to uncheck the “Block Incremental Deployment If Data Loss Might occur” box in the publish profile advanced properties dialog. However, given that we quite like having this box checked, we’re going to take a different approach, namely deleting all the rows from the customer table before proceeding (remember we saved these rows in the “temporary” table).

The final gotcha is that if we inspect the Chinook data model, we can see that the Customer table is referenced by two other foreign keys, from the Invoice and Employee tables.

Chinook Data Model

We need to disable both of these in our pre-deployment prior to deleting the rows.

All that remains is to populate the Customer and Address tables from our post-deployment script. Again, we only want to do this if it’s necessary, so we’ll test for the existence of our “temporary” table before doing anything. We’ll assume that we can use Address Type “1″ (it is, after all the first type!). There is a slight annoyance here, which is that Customer has an identity column, but this is easily overcome.

Finally, before heading to the pub, we need to re-enable those constraints we disabled in the pre-deployment step.

Intelligent Design, or the Evolution of the dacpac?

As the title suggests, what follows is a speculative history, prompted by Jamie Thomson’s “Dacpac Braindump” post, and thoroughly uninformed by any inside information on the development of SQL Server Data Tools. Nonetheless, speculating on why something works the way it does can sometimes be useful in understanding how something works the way it does.

In the beginning there was something called DataDude, or Visual Studio Team System for Database Professionals, and eventually something called Visual Studio 2010 database projects. These tools enabled users of Visual Studio to develop database objects in a declarative manner, build their projects into .dbschema files, and deploy these .dbschema files with vsdbcmd.exe. The Visual Studio developers looked at this, and saw that it was good.

Some things were still a bit of a pain though, not least the fact that one had to have a Visual Studio license in order to take advantage of these tools, and you needed to redistribute, or at least copy to your production servers, the deployment utility vsdbcmd.exe.

In parallel to this, a feature known as “Data Tier Applications” was introduced with SQL Server 2008 R2, and subsequently back-ported to SQL Server 2008 Sp2. This used the DacFx framework version 1.0 to create data-tier applications, which were a special kind of database that could be deployed using the “Data Tier Application Wizard” and “monitored” using something called the “SQL Server Utility”. Visual Studio 2010 also contained a “Data Tier Application” project type, which could be used to create such “applications”.

One major impediment to the widespread adoption of this technology was that in-place upgrades of the deployed schema weren’t possible; the “upgrade” process (from msdn) was as follows:

• Verifies that the deployed DAC and the DAC package both have the same application name (for example, both names are set to Finance). It also evaluates whether the instance of the Database Engine meets the requirements specified in the server selection policy (if defined) in the new version of the DAC, and whether the existing database contains objects not supported in a DAC.
• Deploys the new version of the DAC from the DAC package. This creates a new database with a temporary name.
• Sets the original database to read-only if it is not already in read-only mode, and copies the data to the new database.
• If the original database was in read-only mode, the new database is set to read-only.
• The original database is renamed by appending a string to the end of the database name.
• The new database is assigned the original database name.
• After the database administrator has confirmed that the new database is operating correctly, the original database can be archived.

In short, this feature was as much use as a chocolate teapot.

These limitations were addressed with the release of DacFx 1.1, available in SQL Server 2008 R2 Sp1 and Visual Studio 2010 Sp1. However, since this new functionality was introduced in a service pack (or a service pack to a service pack?), nobody paid any attention to it (Hello, DAXMD!).

Enter SQL Server 2012, with its “extended” beta period. At this time, the tools now known as SSDT were referred to as project “Juneau”, which combined the development environment from Visual Studio Database Projects with the deployment capabilities of the DacFx framework. This, in my uninformed opinion, is the reason why SSDT was released with so many “missing” features – unit testing, schema view, data generation, etc, etc, as all the effort was going into re-engineering the project system to generate .dacpacs instead of .dbschemas.

So now, in early 2014, where do we find ourselves? Many of the “missing” features have been added back into the SSDT project system, and we’re able to deploy our projects idempotently using just the facilities provided in a base SQL Server installation, without having to redistribute vsdbcmd. There are alternative tools, most notably from RedGate, but it’s rare that I do any new database work without using SSDT.

Testing SSIS Send Mail Tasks without a mail server

Papercut is a handy utility for testing SSIS Send Mail tasks, or anything else  – Database Mail, Powershell, etc. – that needs to send mails, in environments where it’s either inconvenient or impossible to set up a “real” mail server. You can download it from Codeplex here.

We can minimise this window and it will disappear into the system tray.

Now we create an SSIS package with an SMTP Connection Manager, and just set the “SMTP Server” parameter to “localhost”:

And we’re done. When we run the SSIS package, we get a little pop-up in the system tray:

Clicking on the envelope opens the Papercut window so we can view the message:

In case you need it, the SSIS project for this post can be downloaded here. To import an .ispac file into SSDT-BI, follow the instructions here.

Dragging and dropping from SQL Server Object Explorer to an SSDT Project

Am I the last person on earth to have learned about this feature? I went so far as to raise a connect item requesting something similar.

Here’s the scenario:

I have an SSDT project (containing the Northwind.dacpac from this post), and as you might expect the objects in the SQL Server Object Explorer (SSOX) window correspond exactly to the ones in the solution explorer window.

However, on this project, we’re using a shared development database, and look what we’ve found:

Needless to say, this “temp” table is nowhere to be found in our SSDT project, nor in our source control.

What I learned today from Mr “Microsoft” on Connect is that you can drag and drop the “table” from SSOX right into the appropriate folder in Solution Explorer:

And here is Barry’s table, installed in our project, and ready to be added to source control:

Changing the Master Data Services Web Service Account

Not too long ago, I had to change the service account, or rather the “Application Pool Identity” for a customer installation of SQL Server 2012 Master Data Services.

Searching revealed  one result from the msdn forums, which seems to only tell part of the story.

There are three steps to this process; the first is to create a new SQL Server login and add it to the appropriate roles. I already have a domain user called SQLSOPHISTRY\MyNewMdsAccount, so now we need to create the SQL Server login:

and add our new login to the mds_exec group in our MDS database:

Adding the new login to the mds_exec role in the MDS database

The next step is to add the new login to the “MDS_ServiceAccounts” domain group:

Adding the the MDS app pool account to the MDS_ServiceAccounts domain group.

The next step is to change the application pool identity using Internet Information Services Manager, or “inetmgr”, as you may know it.

The application pool account is in the “Advanced” settings of the application pool:

We need to click on the ellipsis to change the “Identity” property of the Application Pool:

The final step is to recycle the MDS Application Pool:

After this, you should be able to see the MDS Service running as the new account.

Installing SQL Server Master Data Services Web Service Prerequisites with Powershell

This script will “switch on” all the required Windows Roles and Features listed at Web Application Requirements (Master Data Services) on a new installation of Windows Server 2012. Depending on what you selected when you were installing Windows and/or SQL Server 2012, some of these items may be present already, but the Powershell cmdlet Add-WindowsFeature seems to tolerate this.

If you’re running the web client on the same machine, you’ll need Silverlight 5 as well.