Data Quality Problems and RedGate Solutions

I need to reverse engineer a database, but can’t connect my computer to the database server

Data modeling tools can reverse engineer the physical model directly from the database, but often direct access to a database instance is restricted.  This is a perfect case for SQL Clone, where you can quickly clone a database, mask any sensitive data and generate your model from the clone.  SQL Source Control or SQL Compare can also be used to generate a blank database which can be reverse engineered.

Most modeling tools can also reverse engineer from CREATE scripts.  Both SQL Source Control and SQL Compare can be used to generate the scripts needed for this.  SQL Compare can be used in an ad-hoc or ongoing basis to produce scripts in a folder, and SQL Source Control should be updated with every deployment so its scripts are current/

I need to profile my data

Data profiling can be a resource intensive process, and it’s not advisable to run in-depth profiling against a highly transactional database during business hours.  One of the more basic approaches is to use a traditional backup/restore to a location more suitable for profiling.  It works, but this can be time consuming and lacks the ability to hide sensitive data.  SQL Backup Pro might be another option if you need to orchestrate backups and restores for profiling. 

SQL Clone can be used to automate large backups and restores with less space and shorter times.  When used in conjunction with SQL Data Generator can mask sensitive data, making profiling less worrisome. 

Database Documentation

Database documentation is an essential component in data governance, especially if there are a number of disparate systems which suddenly must exchange data.  Systems can use the same term but with different definitions.  SQL Doc can be used to generate  documentation in a number of formats (including GitHub flavored Markdown).  SQL Doc can also be used to add extended properties to tables and columns.  These can be definitions, or identifiers which can be used to map back to an enterprise data dictionary.

sql_variant and unique constraints

One of my database engineer teammates put this little demo together, but she doesn’t blog so she let me most it here (thanks Y.L.!).

Sql_variant is a data type introduced in SQL Server 2008 which allows data of varying types to be stored in the same column in their native types.  Despite its flexibility, sql_variant suffers from some misunderstanding and doesn’t find wide use.

One of the nuances of sql_variant is with unique constraints.  Usually when we apply a unique constraint to a column which is of a single type so we don’t think about the comparison other than the value.  However, when we use a sql_variant, uniqueness is determined by both value and type.

To put this to the test, you can run this little experiment in any version of SQL Server since 2008:

CREATE table #test (GCId SQL_VARIANT, name VARCHAR(10))

CREATE UNIQUE NONCLUSTERED INDEX [UK_test ] ON #test
(
[GCId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 70)
GO

insert into #test (GCId, name)
values( ‘2009728366’, ‘varchar’)

insert into #test (GCId, name)
values( 2009728366, ‘bigint’)

SELECT *
FROM #test

–drop table #test

Even with the implicit conversion between bigint and varchar, you can still insert the same value as long as its a different type.  Try re-running the insert and select statements again and see them both fail because the second attempt violates the unique constraint.

This is in no way meant to dissuade the use of sql_variant, but merely to point out one consideration regarding its use.  When it comes to storing data, we need to store data in their correct type.

10 Reasons You Need SQL Prompt 7

I put a lot of thought into doing the least amount of work possible, and you should too.  That’s not to say I’m lazy–quite the opposite–it’s to say we all need to put some time into working smarter.  Working smarter is one of the reasons I’m such a fanboy of RedGate’s products, both for my .NET as well as my SQL Server and MySQL work.  One of my favorite tools is SQL Prompt, an SSMS plugin that adds “missing” functionality, or has similar features which work better.  Either way, if you use SSMS, SQL Prompt will make your life better.  Here are 10 reasons how:

1. Better snippets.  How many times do you start with “Select * From” or “Select Top 100 * From”?  How often do you love typing all of that, each time, over and over and over again?  Work smart with SQL Prompt, and type “ssf” or “st100” followed by any key, and have those commands auto-expanded for you.  You can even easily make your own snippets, or grab one from the community repository at https://github.com/gvohra/sqlpromptsnippets.

ssf

It’s true that since 2012, SSMS has also had snippets, but working with them is clunky.  Here’s how to insert a snippet, and here’s how to create one.  Ugh.

Pro tip: Try the “yell” snippet.

2. More intelligent IntelliSense.  As with snippets, SQL Prompt has a better implementation of an existing SSMS feature.  IntelliSense is Microsoft’s trademark for the “what we think you mean to type next” feature, and in .NET it’s a great implementation.  In SSMS, the implementation isn’t as smooth.

SSMS’ Intellisense orders everything alphabetically–user tables and views are mixed in with system objects.  This will get annoying fast if you’re a down-arrow user, or if your tables start with the same letters as system objects.

2016-08-16_17-12-49

In contrast, SQL Prompt groups objects by type, then alphabetically.  All user tables are listed first, arranged alphabetically.  Then user views, system objects, and so on.  The suggestions are filtered in the same way as you type the object name, making it very quick to select what you need with only a few keystrokes.

2016-08-16_17-11-53

SQL Prompt is also alias-aware, and will make suggestions for temporary tables and procedure variables, including table variables.

2016-08-16_23-23-08

SQL Prompt gives you more options for its behavior that SSMS’ Intellisense, also.

2016-08-16_17-14-21 2016-08-16_17-14-40

3. Query reformatting.  When writing .NET code, Ctrl+K, Ctrl+D “prettifies” the code–fixing indents, line breaks, and so on to improve readability.  The same key combination in SSMS is reserved, but doesn’t reformat a query (apparently it’s a feature in the “text editor” only).

SQL Prompt makes it happen in the query editor with Ctrl+K, Ctrl+Y.  There are all kinds of options you can turn on, but one of my favorites is “Expand Wildcards” (off by default, enable at SQL Prompt >> Options >> Format/Styles/Actions).  Undo (Ctrl+Z) reverses the formatting, like it should.

format

4. Copy as IN clause.  Something else we do when debugging–copy a set of values, reformat it somehow to make a list, then paste it in another query.  We all have our ways of reformatting, from query magic to macros in text editors.  None of that is needed anymore–SQL Prompt adds a menu extension for “Copy as IN clause”, where a selected column of values can be copied and pasted into a query.

5. Open in Excel.  Excel is the world’s #1 BI tool, and is commonly used for data debugging and profiling.  Rather than “Copy with Headers” on your resultset, you can now just open he results directly in Excel.

6. Script as INSERT.  When testing or debugging, I’ll often take the results of a query and reformat them so I can insert them into an temp table, then use that temp table as part of the next validation step, and so on.  This feature greatly simplifies my process by creating an INSERT statement with a temp table and all the selected values in a new tab as soon as you choose this option.  This is a very intelligent feature–you can select a single value, multiple values, or the entire resultset and the ensuing INSERT statement will be accurate.

7. Colored tabs per connection.  SSMS lets you color the status bar for each connection (server and database), which means you know which tab you need just by the color (unless you’re the kind of person who has eight bazillion tabs open at once, not much anyone can do there).  For me, I color code by server, and local=blue, test=green, beta=yellow, and prod=red.  Tab colors FTW!

8. Execution warnings.  I’m sure you’ve never forgotten a WHERE clause in an UPDATE or DELETE statement, just the same as I never have.  Yeah, I’ve never done that…  Because I’ve never done that, I actually write these statements backwards, starting with the WHERE clause, just to make sure I put one in.

Here’s the feature which may save your butt, big-time.  Would have been great to have when I executed a few queries of notoriety.

2016-07-28_23-20-59

9. Matching object highlighting. Want to see everywhere an identifier (column name, alias, etc) is used? Select any instance, and all the other instances of that identifier are highlighted. Click on whitespace to unhighlight.

sql-prompt-highlight-matching-objects

(I stole this picture from the release notes at http://www.red-gate.com/blog/building/sql-prompt-7-2-released-sql-server-management-studio-2016-support-new-features).

10. Development as a stand-alone product.  SSMS features are added with new versions of SSMS, every 2-ish years.  SQL Prompt is a stand-alone product and adds dozens of useful features during those intervals.  You can see teh SQL prompt release notes at http://documentation.red-gate.com/display/SP7/Release+notes+and+other+versions to get an idea of what a “point release” can include.  Beta features are public months in advance–see http://www.red-gate.com/blog/redgate-products/why-support-and-upgrades-matter for an example of how much is going on at all times.

In conclusion: work smart, my friends.  Some of these features are in the SSMS base product, but SQL Prompt does a better job, and some of these features are unique to SQL Prompt.  If SQL Prompt can do this (and more) for just writing queries, imagine what the whole SQL Toolbelt can do for your entire database development process–writing, source control, testing and deployment.  There is no reason to do any of that half-assed.

A Simple Unit Test Script for SQL Server

As we develop applications, it often makes sense to put some functions in a SQL CLR assembly.  As these assemblies are deployed through test, beta and production environments we need an easy way to ensure the assemblies have been updated correctly and are functioning as designed.  Many things can go wrong—incorrect permissions, deploying the wrong version of an assembly, even assumptions we made in the business logic.  I’m a big fan of RedGate’s SQL Test plugin—it’s an attractive SSMS plugin which organizes and runs tests via an easy to read panel—and the tSQLt (a unit testing framework for SQL Serve) which SQL Test is based on.  My next post will show recreate the testing strategy shown here using SQL Test.

One downside to tSQt is that you have to install a number of stored procedures and functions into your database.  Depending on your choice of database, permissions in your database and willingness to add additional objects to your database which aren’t drectly related to the data, using tSQLt may not be an option.  Although we’re testing a user defined function, we could also test stored procedures and SQLCLR assemblies.  This technique is also not limited to SQL Server—I use this in both SQL Server and VistaDB.  As you build out your application, you may want to consider adding diagnostic pages (or screens) where you can execute these testing procedures should your application start throwing errors.

The function we’ll be testing will be simple—we’ll pass in an nvarchar, and return the same nvarchar with the word “wombat” added to the end.

Start by creating a custom function:

CREATE FUNCTION [AddWombat]
( @phrase NVarChar(50)
)
RETURNS NVarChar
AS
BEGIN
return @phrase + ' wombat'
END

Now it’s time for the test harness.  I’m using VistaDB, creating a stored procedure and using table variables, syntax supported by more recent versions of SQL Server.  You may need to optimize for the database you’re using.  The first table holds our test cases—the test name, the value we want to pass into our function, and the expected result.  We then loop through all the tests, calling the function we want to test, and updating the result.  Our result comparison is very simple, similar to an Assert.AreEqual.  With a little more work, additional test types could be added, and the test indicated in the test case.

create procedure TestAddWombat as
begin

DECLARE @testcases TABLE(TestNumber INT IDENTITY(1,1) PRIMARY KEY,
TestName NVARCHAR(100),
TestCase NVARCHAR(100),
Expected NVARCHAR(100),
Result NVARCHAR(20));

DECLARE @testname VARCHAR(100);
declare @testcase nvarchar(MAX);
DECLARE @expected NVARCHAR(MAX);
DECLARE @result NVARCHAR(max);

DECLARE @testCaseCount INT;
DECLARE @counter INT;

SET @counter = 1;

INSERT INTO @testcases
( TestName ,
TestCase ,
Expected
)

VALUES  ( N'should_return_hello_wombat' , -- TestName - nvarchar(100)
N'Hello, ' , -- TestCase - nvarchar(100)
N'Hello, wombat'  -- Expected - nvarchar(100)
)

INSERT INTO @testcases
( TestName ,
TestCase ,
Expected
)

VALUES  ( N'should_return_null' , -- TestName - nvarchar(100)
NULL , -- TestCase - nvarchar(100)
NULL -- Expected - nvarchar(100)
)

SELECT @testCaseCount = count(*) FROM @testcases

WHILE @counter <= @testCaseCount

BEGIN

SELECT @testname = TestName,
@expected = Expected,
@testcase = TestCase
FROM @testcases
WHERE TestNumber = @counter

SELECT @result = dbo.AddWombat(@testcase)

IF(@expected is NULL)
BEGIN
IF(@result is NULL)
UPDATE @testcases SET Result = 'True' WHERE TestNumber = @counter
ELSE
UPDATE @testcases SET Result = 'False' WHERE TestNumber = @counter
END
ELSE
BEGIN
IF(@expected = @result)
UPDATE @testcases SET Result = 'True' WHERE TestNumber = @counter
ELSE
UPDATE @testcases SET Result = 'False' WHERE TestNumber = @counter
END
SET @counter = @counter + 1;

END

SELECT * FROM @testcases

end

 

 

Running this sproc returns the following results:

 

image

 

From here, we have a very basic test harness which we can easily add additional test cases, and easily extend.  TDD is a best practice in any aspect of application development, but database development is often not treated as application development and is far behind adopting practices common to C# developers.  This script is a simple way to introduce some TDD into database development.

Connecting SQL Source Control to Git in Team Foundation Service

Team Foundation Service is Microsoft’s hosted TFS, and supports both Git and TFS source control.  TFS(ervice) is free for teams up to five, and all repos are private.  If you need a Team Foundation Service account, go to http://tfs.visualstudio.com and log in with your Microsoft ID.  If you’re using Git, you’ll also need to create an alternate set of credentials to connect to the repo.

In SQL Source Control, Git is now a top-level provider, but full Git support hasn’t been implemented yet (Subversion and TFS have much better support in the current version, 3.4 at the time of this post).  For the current version of SQL Source Control, you’ll still need to switch to your favorite Git tool for add/commit/push.  Upcoming features in SQL Source Control for Git include better branching support and support for migrations.  Migrations allow you to alter schema objects without dropping them first—like changing the name of a table without dropping it first—as well as seeding initial data.

1. Create project in TFS

image

 

2. Navigate to Code tab, clone the repo in your favorite Git tool using the URL provided.  TFS only supports HTTPS right now, so you need a set of alternate credentials to use Git.

image

image

 

3. In SSMS, right-click the database you want to put under source control and select “Link database to source control”.

image

 

4. Then, browse to your working folder and select whether each developer will have their own copy of the database, or everyone will work from one central database.

image

 

5. Once the database has been linked, click on the “Commit Changes” tab and choose “Save Changes”.  For a Git repo, this just saves the script files—you still need to do an add, commit and push in your favorite tool.

image

 

6. Once you’ve done an add/commit/push, you can log into TFS again and use all the functionality of diffs, history tracking, etc.  Your other team members can pull from this repo and use SQL Source Control to easily keep their instances up to date.

A SQL Script to generate a 4-4-5 Mon-Sun calendar table

In a previous life, I did A LOT of BI reporting, and everything we did was based on the company’s 4-4-5 fiscal calendar, and a Mon-Sun work week.  Adding to the fun was the mutant Julian style date JD Edwards uses internally.  This system violated every known calendaring convention, but “that’s how it’s set up in JDE”, and if you’re a JD Edwards shop, you know that argument renders all others invalid.  Oh yeah, the fiscal year started on a different day every year.

To make our lives easier and our apps and reports more performant, we utilized calendar tables.  Problem is, every couple years, we had to add more dates to the tables.  When this duty fell onto me, rather than being The Last Person, and spend all day fiddling with Excel and a subsequent import, I spent part of a day writing a simple script that can be run whenever necessary to add more dates.

Because we were integrating information from a number of systems (WMS, TMS, LMS and payroll, notably), all with their own equally mutant calendaring systems, we have a fairly wide table with all kinds of date indicators in it.  This script is kind of a recursion hell, but it needed to be in order to iterate all of the output columns correctly.

Below are the comments from the SQL file; you can download the entire script from this link: 4-4-5 Calendar

/*

This script is meant for a 4-4-5 calendar, Mon-Sun week. Every leap year introduces an extra week, which we add in November.

User Variables

FiscalCalendarStart = The date on which a fiscal year starts. This is used as the base date for all calculations

EndOfCalendar = The date on which the calendar should end. This does not have to be the end of a fiscal year, but if it’s not, you might have to run the script again to get to the end of the fiscal year.

RunningDaySeed = Usually 1, this is used to measure the number of days since the calendar began, often used for depreciation

RunningPeriodSeed = Usually 1, the number of fiscal months since the original calendar began

RunningWeekSeed = The number of fiscal weeks since the original calendar began

FiscalYearSeed = The starting fiscal year

Iteration Variables–don’t mess with these

JdeJulian = the date expressed in JDE’s Julian format

CurrentDate = The calendar date being calculated against

WorkWeekSeed = Fiscal Week

WorkPeriodSeed = Fiscal Month

WorkQuarterSeed = Fiscal Quarter

WeekOfMonth = Rolling week of month

FiscalWeekEnding = Last day of the fiscal week

WorkPeriodSeed = Some legacy thing we must have or the world will end.

But, used to assign where the extra “leap week” goes. Based on the 4-4-5 calendar.

IsLeapYear = 29 days in February extra week in November?

 

Output Columns (most of these exist to make reports easier)

DateSID = JDE’s Julian Date

CalendarDate = the date, duh

Sysdate = YYYYMMDD, based on calendar date

RunningDay = the number of days since we installed JDE

WorkPeriod = roughly correlates to the fiscal month

RunningPeriod = the number of fiscal months since we installed JDE

WorkWeek = The fiscal (Mon-Sun) week of the year

RunningWeek = the number of fiscal weeks since we installed JDE

FiscalYear = The numeric fiscal year

FiscalYearLabel = the pretty field used on a report

WorkQuarter = The fiscal quarter

FiscalQuarter = Another representation of the fiscal quaruer

FiscalQuarterLabel = used on reports

FiscalPeriod = YYYY + WorkQuarter + WorkPeriod (zero-padded)

FiscalPeriodLabel = used on reports

FiscalWeek = YYYY + WorkQuarter + WorkPeriod (zero-padded) + week of the fiscal month

FiscalWeekLabel = used on reports

CalendarYear = calendar year, duh

CalendarQuarter = the traditional definition of a quarter

CalendarQuarterLabel = used on reports

CalendarMonths = the traditional calendar month

CalendarMonthLabel = used on some reports

WeekEnding = the last day of the calendar week (Saturday)

FiscalWeekEnding = the last day of the fiscal week (Sunday)

FiscalMonth = Based on the fiscal calendar, relates to the WorkPeriod somehow

FiscalMonthLabel = used on some reports, based on the FiscalMonth

*/

Sept 2011 Is Microsoft Month at Packt Publishing

Apologies to my SQL Server friends for not getting this posted sooner, the SQL Server book specials have expired.  However, there is still time for the SharePoint and Silverlight folks to save on great books from Packt Publishing.

From 9/11 through 9/20, the featured subject is SharePoint.  Take 20% off all print books, and 30% off all e-books.  Included are titles on development, administration and books for end users!

From 9/21 through 9/30, take the same discounts off all Silverlight books.  This includes books on MVVM, Silverlight 5 LOB apps and a Windows Phone 7 Cookbook!

For full details, including all titles, check out http://www.packtpub.com/packt-10-days-of-sql-server-silverlight-sharepoint.

Speaking at Aug 10 2010 Pgh .NET, with swag!

Five great sessions this time (#1 is sure to be the best, though).  This will be my first talk as a Technical Evangelist for ComponentOne.  And, I'm bringing C1 swag.  

 August PGHDOTNET Meeting – 5 Guys with Code (plus Swag-a-palooza)

 Tuesday, August 10th, 2010
 5:30 PM – 7:30 PM
 Microsoft Offices, 30 Isabella St., Pittsburgh, PA 15212
 Registration details :
http://www.pghdotnet.com/201008.aspx
 
Join us for a fun-filled evening as 5 developers present for 15 minutes on a topic of their choice.  In between each session we'll be giving away all sorts of swag.  This is the can't-miss event of the year!!
 
Session 1
  A Quick Look at the new SQL CE Engine
  Rich Dudley
 
Session 2
  Layout Controls for XAML
  John Hidey
 
Session 3
  jQuery Code Snippets in Visual Studio 2010
  David Hoerster
 
Session 4
  Bowling for TDD
  Eric Kepes
 
Session 5
  Table Value Parameters with SQL Server 2008 and Microsoft .NET

  John Sterrett 

SQL Source Control Release Candidate Drops!

RedGate’s SQL Source Control has been a great tool, and it’s now hit v1.0!  Pricing has been announced, too.  Here’s the email:

Great news! The Release Candidate for SQL
Source Control
is now available.

Recommendations:

·        Use your own local dedicated db for
development.  If you require a shared db, see
our forum.

·        Do not use SQL Source Control on your
production database server.

Price:

SQL Source Control is licensed per
user.  It will be included in the SQL Toolbelt and SQL Developer
Bundle.  If you have one of these with support and upgrades, then you will
get SQL Source Control at no extra cost.  SQL Source Control is also
available as a standalone tool:

SQL Source Control
Pricing

(without support &
upgrades)

(with support and
upgrades)

 

1 user

5 user

10 user

 

1 user

5 user

10 user

$

 $295  

 $1,195

 $1,995 

$

 $369

 $1,494

 $2,494

$/user

 $295 

 $239

 $199

$/user

 $369

 $299

 $249

For more information, please contact sales@red-gate.com.

RedGate’s SQL Source Control is now feature complete!

Big news for this cool SSMS add-in–RedGate's SQL Source Control is now feature complete.  A full list of v1 features is listed at http://www.red-gate.com/MessageBoard/viewtopic.php?t=11129.

The current beta version is 0.4.0.4, and I've been working with it since somewhere in the late 0.1 early 0.2 timeframe.  It's been a very stable and well done tool the whole time I've been evaluating it.

Interestingly, each beta version expires two weeks from release, so it looks like RedGate is using Scrum to develop the product. Cool!

Easter egg alert: If you leave the Help >> About screen open for 30 seconds or so, one of my favorite old arcade games opens up.