SQL Server 2008 - Decipher Information Systems


SQL Server 2008


All Blog Posts till 04/30/2008















© Copyright Decipher Information Systems, 2008. All rights reserved.

http://decipherinfosys.wordpress.com

Last revised: Jan. 2008





Table of Contents

Server Consolidation - things to look for in SQL Server 2008
GROUP BY and CUBE(), ROLLUP(), GROUPING() and GROUPING_ID() functions
GROUPING SETS in SQL Server 2008
SQL Server 2008 - Table Valued Parameters
New Data types in SQL Server 2008 - I
Row Value Constructor Support in SQL Server 2008
New Features in SQL Server 2008 - I
MSFT learning portal for SQL Server 2008
Intellisense in SQL Server 2008
SQL Server 2008: INSERT over a DML statement
MERGE command in SQL Server 2008
Change Data Capture Feature in SQL Server 2008
Data Collector in SQL Server 2008
Compound Assignment Operators in SQL Server 2008
Back Up Compression Option in SQL Server 2008
FORCESEEK Table Hint in SQL Server 2008
HierarchyID Data Type in SQL Server 2008
FILESTREAM Data in SQL Server 2008
XML DML Enhancement in SQL Server 2008
Lock Escalation Changes in SQL Server 2008
SQL Server 2008 Deployment Guides from Dell
Filtered Indexes and Statistics in SQL Server 2008
Virtual Labs


Server Consolidation - things to look for in SQL Server 2008

We had talked about virtualization in the database world in a previous post. In SQL Server 2008, there are a lot more features available for doing server consolidation - here is a link to the MSFT whitepaper that breaks this down into the sections of flexibility, manageability and scalability & performance:

http://download.microsoft.com/download/a/c/d/acd8e043-d69b-4f09-bc9e-4168b65aaa71/SQL2008SrvConsol.doc

If you have downloaded the CTP 4, we would recommend that you get a head start with these features by playing with them in one of your VM environments. In case you have not upgraded to SQL Server 2005, chances are that you would be upgrading directly to SQL Server 2008 by the end of next year once the product is released and has been tested in real world implementations.

GROUP BY and CUBE(), ROLLUP(), GROUPING() and GROUPING_ID() functions

You must already be aware of the GROUP BY clause. It is used for grouping the rows by a selected set of attributes and is typically done to get some aggregated information for that group of columns.

/*****************************************************************************
Creation of a TEST table and population of some dummy data
*****************************************************************************/
CREATE TABLE TEST_TBL (COL1 INT, COL2 VARCHAR(10), COL3 INT)
GO

INSERT INTO TEST_TBL VALUES (1, 'A', 10)
INSERT INTO TEST_TBL VALUES (1, 'A', 20)
INSERT INTO TEST_TBL VALUES (1, 'A', 30)
INSERT INTO TEST_TBL VALUES (1, 'B', 90)
INSERT INTO TEST_TBL VALUES (2, 'A', 30)
INSERT INTO TEST_TBL VALUES (2, 'A', 100)
INSERT INTO TEST_TBL VALUES (3, 'C', 110)
INSERT INTO TEST_TBL VALUES (3, 'C', 120)
GO

/*****************************************************************************
Before starting, let's take a look at the data that exists in the table
*****************************************************************************/
SELECT * FROM TEST_TBL
GO

COL1 COL2 COL3
----------- ---------- -----------
1 A 10
1 A 20
1 A 30
1 B 90
2 A 30
2 A 100
3 C 110
3 C 120

/*****************************************************************************
Simple Example of a GROUP BY Clause. The grouping is being done over COL1 and COL2
and the aggregate functions are used to display the total, the average, the max
and min values, etc.
*****************************************************************************/
SELECT COL1, COL2, COUNT(*) AS CNT_RECORD, SUM(COL3) TOTAL_VAL,
AVG(COL3) AVG_VAL, MAX(COL3) MAX_VAL, MIN(COL3) MIN_VAL
FROM TEST_TBL
GROUP BY COL1, COL2
GO

COL1 COL2 CNT_RECORD TOTAL_VAL AVG_VAL MAX_VAL MIN_VAL
----------- ---------- ----------- ----------- ----------- ----------- -----------
1 A 3 60 20 30 10
2 A 2 130 65 100 30
1 B 1 90 90 90 90
3 C 2 230 115 120 110

/*****************************************************************************
Now, let us take the same SQL and use the ROLLUP() function in addition to the
GROUP BY clause:
*****************************************************************************/
SELECT COL1, COL2, SUM(COL3) AS TOTAL_VAL
FROM TEST_TBL
GROUP BY ROLLUP (COL1, COL2)
GO

Usage of the ROLLUP() function generates the GROUP BY aggregate rows PLUS super-aggregate (cumulative) rows and a final grand total row as well. If you see below, one row with a sub-total is generated for each unique combination of values of (COL1, COL2), and (COL1). As should be clear from the preceding statement, the order of the columns in the ROLLUP() function can change the output as well as the number of rows in the final result set.

COL1 COL2 TOTAL_VAL
----------- ---------- -----------
1 A 60          ==> Same as in the representation above
1 B 90          ==> Same as in the representation above
1 NULL 150      ==> Aggregation of the records from above
2 A 130         ==> Same as in the representation above
2 NULL 130      ==> Aggregation of the record from above
3 C 230         ==> Same as in the representation above
3 NULL 230      ==> Aggregation of the record from above
NULL NULL 510   ==> The grand total

The above SQL can also be written as:

SELECT COL1, COL2, SUM(COL3) AS TOTAL_VAL
FROM TEST_TBL
GROUP BY COL1, COL2 WITH ROLLUP

/*****************************************************************************
Now, let us take the same SQL and use the CUBE() function in addition to the
GROUP BY clause:
*****************************************************************************/
SELECT COL1, COL2, SUM(COL3) AS TOTAL_VAL
FROM TEST_TBL
GROUP BY CUBE (COL1, COL2)
GO

A CUBE(), as the name suggests, generates data for the grouping of all combinations of the expressions, i.e. since we are using the CUBE() function on 2 columns in our example, we have 2^2 which means 4 grouping sets:

(COL1, COL2)
(COL1)
(COL2)
() - the grand total

Thus, one row will be produced for each unique combination of values within each grouping set, and in addition an aggregated grand total row is produced with NULL values in all the other columns. You can see the output below:

COL1 COL2 TOTAL_VAL
----------- ---------- -----------
1 A 60
2 A 130
NULL A 190
1 B 90
NULL B 90
3 C 230
NULL C 230
NULL NULL 510
1 NULL 150
2 NULL 130
3 NULL 230

And as is obvious from the output, the order of the columns has no bearing on the output in the case of the CUBE() function.

The above SQL can also be written as:

SELECT COL1, COL2, SUM(COL3) AS TOTAL_VAL
FROM TEST_TBL
GROUP BY COL1, COL2 WITH CUBE

In order to distinguish the NULL values that are returned because of the usage of the ROLLUP() and/or CUBE() functions versus actual NULL values in the tables, one can make use of the GROUPING() function. Example:

SELECT COL1, COL2, SUM(COL3) AS TOTAL_VAL, GROUPING(COL2) AS GRP_VALUE
FROM TEST_TBL
GROUP BY ROLLUP (COL1, COL2)
GO

COL1 COL2 TOTAL_VAL GRP_VALUE
----------- ---------- ----------- ---------
1 A 60 0
1 B 90 0
1 NULL 150 1
2 A 130 0
2 NULL 130 1
3 C 230 0
3 NULL 230 1
NULL NULL 510 1

As seen from above, wherever the GRP_VALUE is marked as 1, those are the records that were generated because of the ROLLUP() function usage. Since the GROUPING() function takes in only one argument, we made use of the outermost column in order to filter out all those records where the NULL value was being generated because of the usage of the function. In the case of the CUBE() function, you can use the GROUPING() function multiple times to filter out those records.
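For instance, here is a minimal sketch (this exact statement is not part of the original example set) of using GROUPING() on both columns in a HAVING clause against the CUBE() query from above, so that only the detail rows are kept and all of the generated super-aggregate rows are filtered out:

SELECT COL1, COL2, SUM(COL3) AS TOTAL_VAL
FROM TEST_TBL
GROUP BY CUBE (COL1, COL2)
HAVING GROUPING(COL1) = 0 AND GROUPING(COL2) = 0
GO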

Another function to be aware of is the GROUPING_ID() function. This function can be used to compute the level of grouping. So, if there are two columns like we have in our example in this post, the GROUPING_ID() will be computed from the GROUPING() bits of the two columns:

Column(s)      GROUPING(COL1), GROUPING(COL2) bits      GROUPING_ID() output
COL1           10                                       2
COL2           01                                       1
COL1, COL2     11                                       3
COL2, COL1     11                                       3

Now, let us see this with the actual SQL execution:

SELECT COL1, COL2, SUM(COL3) AS TOTAL_VAL, GROUPING(COL1) AS C1,
GROUPING(COL2) AS C2, GROUPING_ID(COL1, COL2) AS GRP_ID_VALUE
FROM TEST_TBL
GROUP BY ROLLUP (COL1, COL2)
GO

COL1 COL2 TOTAL_VAL C1 C2 GRP_ID_VALUE
----------- ---------- ----------- ---- ---- ------------
1 A 60 0 0 0
1 B 90 0 0 0
1 NULL 150 0 1 1
2 A 130 0 0 0
2 NULL 130 0 1 1
3 C 230 0 0 0
3 NULL 230 0 1 1
NULL NULL 510 1 1 3

In the next post, we will talk about the new feature of GROUPING SETS that has been introduced in SQL Server 2008. Using GROUPING SETS, one can aggregate only the specified groups instead of the full set of aggregations as generated by the CUBE() or ROLLUP() functions.

GROUPING SETS in SQL Server 2008

Yesterday, we had covered the GROUP BY clause and the usage of the ROLLUP(), CUBE(), GROUPING() and GROUPING_ID() functions within the GROUP BY clause. In today's post, we will look at one of the enhancements in this area in SQL Server 2008. The enhancement is called GROUPING SETS. This clause lets you specify multiple grouping sets in the same query. You can achieve the same functionality in prior versions using multiple queries and a UNION ALL clause to club the result sets together. With GROUPING SETS though, you can accomplish the same thing with much less code and the query will be more efficient as well. This can be very useful for reporting purposes when you want to visualize the data distribution based on different sets of groups.

Considering the same example as we did in yesterday's post (see yesterday's post to get the DDLs for the example), if we want to see the effect on the totals for these groups:

COL1, COL2
COL2
COL1
No Grouping

then we can re-write the SQL to be:

SELECT COL1, COL2, COUNT(*) AS CNT_RECORD, SUM(COL3) TOTAL_VAL FROM TEST_TBL GROUP BY COL1, COL2
UNION ALL
SELECT NULL AS COL1, COL2, COUNT(*) AS CNT_RECORD, SUM(COL3) TOTAL_VAL FROM TEST_TBL GROUP BY COL2
UNION ALL
SELECT COL1, NULL AS COL2, COUNT(*) AS CNT_RECORD, SUM(COL3) TOTAL_VAL FROM TEST_TBL GROUP BY COL1
UNION ALL
SELECT NULL AS COL1, NULL AS COL2, COUNT(*) AS CNT_RECORD, SUM(COL3) TOTAL_VAL FROM TEST_TBL
GO

This will yield:

COL1 COL2 CNT_RECORD TOTAL_VAL
----------- ---------- ----------- -----------
1 A 3 60
2 A 2 130
1 B 1 90
3 C 2 230
NULL A 5 190
NULL B 1 90
NULL C 2 230
1 NULL 4 150
2 NULL 2 130
3 NULL 2 230
NULL NULL 8 510

In SQL Server 2008, you can achieve this very easily by using the GROUPING SETS functionality:

SELECT COL1, COL2, COUNT(*) AS CNT_RECORD, SUM(COL3) TOTAL_VAL
FROM TEST_TBL
GROUP BY GROUPING SETS
(
(COL1, COL2),
(COL2),
(COL1),
()
)
GO

COL1 COL2 CNT_RECORD TOTAL_VAL
----------- ---------- ----------- -----------
1 A 3 60
2 A 2 130
NULL A 5 190
1 B 1 90
NULL B 1 90
3 C 2 230
NULL C 2 230
NULL NULL 8 510
1 NULL 4 150
2 NULL 2 130
3 NULL 2 230

It is the same data, just ordered differently because of the grouping. The different grouping sets are separated by a comma and the different elements of a grouping set are also separated by commas. In addition, one can have multiple GROUPING SETS lists in the same GROUP BY clause, as sketched below.
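As a quick sketch (this exact statement is not from the original post), two GROUPING SETS lists specified in the same GROUP BY clause are combined as a cross product of their sets, so the query below should be equivalent to GROUP BY CUBE (COL1, COL2) against the same TEST_TBL:

SELECT COL1, COL2, COUNT(*) AS CNT_RECORD, SUM(COL3) TOTAL_VAL
FROM TEST_TBL
GROUP BY GROUPING SETS ((COL1), ()), GROUPING SETS ((COL2), ())
GO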

SQL Server 2008 - Table Valued Parameters

We have blogged on the upcoming features in SQL Server 2008 (Code Named: Katmai) in some of our previous posts. You can search for "SQL Server 2008" on the blog site to see other posts. We will keep on posting new articles as and when we play with those new features (likewise for Oracle 11g as well). In this post, we will talk about another good T-SQL feature in SQL Server 2008: table valued parameters. In order to pass an array into a stored procedure (see the previous post here), one has to use XML in SQL Server 2005 and then de-construct it using XQUERY features or OPENXML. This, though functional, is not as simple. SQL Server 2008 now introduces table valued parameters which can be passed into stored procedures. One does not need to de-construct and parse through the XML anymore. One can just pass a table full of data, i.e. an array, into a stored procedure and the code can simply use it for its processing logic. Let's see this feature with an example to make it a bit more clear:

CREATE TYPE TestType AS TABLE
(
EMP_NAME VARCHAR(30) NULL,
HIRE_DATE DATETIME NULL,
MGR_NO INT NULL
)
GO



And then, I can simply pass this into my procedure:

create proc my_test_proc
(
@tbl TESTTYPE READONLY
)
as
begin
    /* your logic will go over here.
       you can use the TVP (Table Valued Parameter)
       as a regular table and do the READ-ONLY operations on it */
    select * from @tbl  -- placeholder so the procedure compiles; it simply returns the rows passed in
end
go

In order to use this stored procedure, I can declare a variable of that type, populate it with the data and then pass it in (an example follows below). All these types are visible under the sys.table_types system catalog.
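For example, a quick way to list them (a sketch, not from the original post; the column names are the ones we would expect in that catalog view):

SELECT name, type_table_object_id
FROM sys.table_types
GO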

Please remember that this data set is READONLY and hence the READONLY clause is mentioned in the procedure as well. In addition, these table variables cannot be used as OUTPUT parameters - they can only be used as input parameters.

DECLARE @TestTbl AS TestType

Now, I can insert the data into this table variable:

insert into @TestTbl select emp_name, hire_date, mgr_no from emp_master where emp_id < 1000

And once I am done, I can then pass this into the stored procedure:

exec my_test_proc @tbl = @TestTbl;

When table variables are passed in as parameters, the table resides in tempdb and not in memory... this means that we can use larger result sets also with relative ease. We will benchmark this in our test environments and post the results on it in a later blog post.

This type of functionality has existed in Oracle for quite some time now, so the Oracle professionals who work in SQL Server will also find this to be a welcome addition.

New Data types in SQL Server 2008 - I

SQL Server 2008 has quite a few new data-types. We will cover those in 2 parts as we start playing with those in our test environments.

1) DATE and TIME data types: Currently, SQL Server supports datetime and smalldatetime, but there is no data-type that supports just the DATE part or just the TIME part, and that is considered to be de-normalized by some data modelers since the same field is being used to represent both the date and the time piece. In addition, there are situations where one needs to record the different times related to a given date. Example: the trucks can take off from the warehouse at 8:00 A.M., 10:00 A.M., 12:00 P.M. and so on for a given day, and one needed to store the date repeatedly just for capturing this time specific information. That is no longer the case with the introduction of these 2 new data types in SQL Server 2008. In prior versions, one also needed to apply the CONVERT() function in order to retrieve only what one needed. That won't be necessary anymore. The TIME data-type supports a precision of up to 100 nanoseconds.

2) Datetime2 data type: The name is funny. MSFT could have come up with a better name (this does remind me of the Oracle VARCHAR and VARCHAR2 data-types :-)). This data type extends the granularity of the time. As you might already know, the granularity of the time in the datetime data type is 1/300th of a second, i.e. you can store only .000, .003 or .007 seconds. This new data type extends that to 100 nanoseconds just like the TIME data type. The fractional part can be specified by declaring the variable or a column as datetime2(n) where n is the precision with valid values from 0 to 7.

3) DATETIMEOFFSET(n): In order to support timezones in a better way, MSFT has introduced the DATETIMEOFFSET(n) data type. This support is provided in Oracle using the TIMEZONE family of data-types. In SQL Server 2008, one can use it to provide better support for applications that require timezone support. Example: a warehouse in the Eastern timezone that has its data center in the Central timezone and also has people entering data from the Pacific timezone, and this data is timezone sensitive due to shipping dates and times. Likewise for a voting system with cut off timelines for the voters, etc.

There are more new data types: better user defined types support, HIERARCHYID (for better support for hierarchical data), spatial data types (Geometry, Geography) and support of TYPES (this feature has been available in Oracle for ages now and will help make array processing and nesting of procedures a lot easier), FILESTREAM (for better binary data storage), and there have been changes to the NTEXT, TEXT and IMAGE data-types as well, which we will cover in the next part. A short declaration sketch of the new date/time types follows below.
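As a small illustration (a sketch only - the literal values and precisions below are made up for the example, they are not from the original post):

DECLARE @d DATE, @t TIME(7), @dt2 DATETIME2(7), @dto DATETIMEOFFSET(7)
SET @d = '2008-01-15'                              -- date only
SET @t = '08:00:00.1234567'                        -- time only, up to 100 nanosecond precision
SET @dt2 = '2008-01-15 08:00:00.1234567'           -- extended precision date + time
SET @dto = '2008-01-15 08:00:00.1234567 -05:00'    -- date + time + time zone offset
SELECT @d AS DT_ONLY, @t AS TM_ONLY, @dt2 AS DT2_VAL, @dto AS DTO_VAL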

Row Value Constructor Support in SQL Server 2008

One of the new additions that is being made to SQL Server 2008 is the addition of row value constructors. I have not tested it completely for all the scenarios but here are some examples:

Insert into dbo.decipher_test (col1, col2) values (1, 10), (2, 20), (3, 30)

Likewise, in an update statement:

update decipher_test
set (col1, col2) = (select col3, col4 from dbo.test where col3 = decipher_test.col1)
where colx > 10

Similarly, in a select statement:

select * from dbo.decipher_test where (col1, col2) in (select col3, col4 from dbo.test)

Update (SET and WHERE clause) and SELECT statement (WHERE clause) support for row value constructors has existed in other RDBMS like Oracle. DB2 LUW, MySQL as well as PostgreSQL support the row value constructors for the insert statement as well. It is good to see that SQL Server is moving in that direction as well. This is in fact an ANSI SQL standard. There are other good T-SQL features also that are coming our way in the next release (like the much awaited MERGE command, better support for the analytic functions - better support for the OVER clause for ranking functions and aggregates etc.) that we will be covering in our future blog posts.

New Features in SQL Server 2008 - I

We have blogged before about the new upcoming release of SQL Server. You can search for "Katmai" or "SQL Server 2008" on this blog for more information. There are a lot of new features that are being introduced in this release, as well as improvements to the existing features like the SSIS pipeline improvements and SSIS persistent look-ups. In our blog posts, we will start covering some of those new features as we start playing with the CTPs. One of the new features that is going to be introduced is the Resource Governor. The Resource Governor allows the DBA to define resource limits and priorities for different workloads, which will help in enabling concurrent workloads to provide consistent performance to the applications. This feature has been available in Oracle for quite some time now and has been available in IIS 6.0 as well. This is a very nice feature to have since you will now be able to allocate and limit the usage of SQL Server resources based on priorities and the workload needs of different applications.

Applications that allow ad hoc query formation against the databases (decision support systems or ad hoc end user reporting, for example) will benefit from it, and at the same time, if the instance is being used for several applications, one can throttle the usage by creating policies for usage. You should be able to do this for an application or a user. More details from MSFT are still to come, but if this feature is going to be anything like Oracle's database resource manager, then one should be able to get these benefits (this is just a wish list; a configuration sketch follows the list):

1) Should be able to set policies based on different criteria like the machine from where the connections are coming, the type of application, the user name, a group of users etc.

2) After policy configuration, one should be able to guarantee specific applications and users a minimum amount of processing resources regardless of the load on the system and the number of users on the system.

3) One should be able to set up policies to be able to distribute available processing resources by allocating percentages of CPU time to different applications and/or different users. For example: in a data warehousing application, a higher percentage of the processing power can be given to the ROLAP (Real Time OLAP) application as compared to the batch jobs that happen only during the night.

4) One should be able to decrease the degree of parallelism in the execution plans generated for any operation performed by a user or members of a group of users.

5) Dynamic management of changing the allocation of resources without requiring a shutdown or re-start.

6) Throttle and log the execution of certain operations that violate the policy settings. Proper error reporting and logging is necessary for troubleshooting the issues.

7) Should be able to allow for automatic switching of users from one group to a fallback group based on the defined criteria in the policies. Say a higher priority is assigned to a particular application or user and a session executed by that application runs for more than the specified threshold; that particular session should be allowed to automatically switch to another group of users with different resource requirements. This will ensure that the higher priority tasks get to completion even though they reached their threshold.

8) Allowing for creation of thresholds on connections per application. Any additional connections should be allowed to queue with a timeout limit that should be configurable.
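Purely as a sketch of what such a configuration might look like - the final DDL was not public when this was written, so treat the statements, option names and the pool/group/function names below as assumptions rather than the definitive syntax:

/* create a resource pool and a workload group that uses it */
CREATE RESOURCE POOL ReportingPool
WITH (MIN_CPU_PERCENT = 10, MAX_CPU_PERCENT = 40)
GO
CREATE WORKLOAD GROUP ReportingGroup
USING ReportingPool
GO
/* classifier function (created in master) that routes sessions to a group */
CREATE FUNCTION dbo.fn_classify_workload() RETURNS SYSNAME
WITH SCHEMABINDING
AS
BEGIN
    IF APP_NAME() = 'AdhocReporting'   -- hypothetical application name
        RETURN 'ReportingGroup'
    RETURN 'default'
END
GO
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.fn_classify_workload)
GO
ALTER RESOURCE GOVERNOR RECONFIGURE
GO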

MSFT learning portal for SQL Server 2008

SQL Server 2008 is slated to be released early next year. If you search on SQL Server 2008 on our blog, you will get a good list of blog posts and we keep on adding new ones as we play around with those feature sets. The latest CTP for that version is CTP5.

MSFT also has put together a learning portal for that release of SQL Server and you can access it here:

http://www.microsoft.com/learning/sql/2008/default.mspx

Intellisense in SQL Server 2008

With the CTP5 release comes Intellisense in SSMS (SQL Server Management Studio). Here is an image depicting the basic functionality:

It is a really neat feature and can also help in learning the new objects in the system. A colleague of mine though absolutely hates it and wanted to turn this off. So, he went to Tools/Options and under the Text Editor, he set those values to off. Well, when he came in the next day and re-started SSMS, there it was again. All his changes were reverted back. He then started toying with the registry and went directly to the registry setting that is toggled by those GUI options:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\100\Tools\Shell\Languages\Language Services\SQL

and set the CodeSense field to 0. If this setting is 1, it turns on Intellisense by default.

Again, it is a personal preference thing. I believe it is a great feature and will help everyone a great deal - I always loved this feature when doing development in Visual Studio or doing PL/SQL development in PL/SQL Developer.

SQL Server 2008: INSERT over a DML statement

In this post, we are going to cover another T-SQL enhancement in SQL Server 2008. We will be using the same set of tables that we did in one of our previous blog posts in which we had talked about the MERGE command in SQL Server 2008.

Using the same set of tables, here is what the setup looks like:

CREATE TABLE PRODUCT_CATALOG (PRODUCT_NAME NVARCHAR(100) NOT NULL, QTY INT NOT NULL CHECK (QTY > 0));

CREATE TABLE PRODUCT_SALE (PRODUCT_NAME NVARCHAR(100) NOT NULL, SALE_QTY INT NOT NULL);

INSERT INTO PRODUCT_CATALOG (PRODUCT_NAME, QTY) VALUES ('TOY1', 100);
INSERT INTO PRODUCT_CATALOG (PRODUCT_NAME, QTY) VALUES ('TOY2', 50);
INSERT INTO PRODUCT_SALE (PRODUCT_NAME, SALE_QTY) VALUES ('TOY1', 100);
INSERT INTO PRODUCT_SALE (PRODUCT_NAME, SALE_QTY) VALUES ('TOY2', 10);
INSERT INTO PRODUCT_SALE (PRODUCT_NAME, SALE_QTY) VALUES ('TOY3', 500);
GO

If you read that post, you will see that we had used the MERGE command to do the DML (Delete, Insert, Update) logic on the PRODUCT_CATALOG table. This was the first example of the MERGE command that we had used in that post:

MERGE PRODUCT_CATALOG PC /* target table */
USING PRODUCT_SALE PS /* source table */
ON PC.PRODUCT_NAME = PS.PRODUCT_NAME
WHEN MATCHED AND (Qty - SALE_QTY = 0) THEN
DELETE
WHEN MATCHED THEN
/* update stock if you still hold some stock */
UPDATE SET Qty = Qty - SALE_QTY
WHEN NOT MATCHED THEN
/* insert a row if the stock is newly acquired */
INSERT VALUES (PRODUCT_NAME, SALE_QTY)
/* output details of INSERT/UPDATE/DELETE operations made on the target table */
OUTPUT $action, inserted.PRODUCT_NAME, inserted.QTY, deleted.PRODUCT_NAME, deleted.QTY;

Now, suppose that you want the output of this to be put into a separate audit or processing table but only when the update is done or when a certain other condition is met. It is now possible to do a SELECT atop the MERGE command. We will first need to create a table for storing that data.

CREATE TABLE TEST_INSERT_OVER_DML (ACTION_CODE NVARCHAR(50), I_PRODUCT_NAME NVARCHAR(100), I_QTY INT, D_PRODUCT_NAME NVARCHAR(100), D_QTY INT)
GO

/*************************************************
And now the insert over DML statement
**************************************************/
INSERT INTO TEST_INSERT_OVER_DML (ACTION_CODE, I_PRODUCT_NAME, I_QTY, D_PRODUCT_NAME, D_QTY)
select Action_Code, Inserted_Product_Name, Inserted_QTY, Deleted_Product_Name, Deleted_Qty
from
(
MERGE PRODUCT_CATALOG PC /* target table */
USING PRODUCT_SALE PS /* source table */
ON PC.PRODUCT_NAME = PS.PRODUCT_NAME
WHEN MATCHED AND (Qty - SALE_QTY = 0) THEN
DELETE
WHEN MATCHED THEN
/* update QTY if you still hold some QTY for that PRODUCT */
UPDATE SET Qty = Qty - SALE_QTY
WHEN NOT MATCHED THEN
/* insert a row if the PRODUCT is newly acquired */
INSERT VALUES (PRODUCT_NAME, SALE_QTY)
/* output details of INSERT/UPDATE/DELETE operations made on the target table */
OUTPUT $action, inserted.PRODUCT_NAME, inserted.QTY, deleted.PRODUCT_NAME, deleted.QTY
) Change (Action_Code, Inserted_Product_Name, Inserted_QTY, Deleted_Product_Name, Deleted_Qty);

And after executing it, let's take a look at the data in the PRODUCT_CATALOG and the TEST_INSERT_OVER_DML tables:

/*************************
PRODUCT_CATALOG
**************************/
PRODUCT_NAME QTY
------------ -----------
TOY2 40
TOY3 500

/*************************
TEST_INSERT_OVER_DML
**************************/
ACTION_CODE I_PRODUCT_NAME I_QTY D_PRODUCT_NAME D_QTY
----------- -------------- ----------- --------------- -----------
INSERT TOY3 500 NULL NULL
DELETE NULL NULL TOY1 100
UPDATE TOY2 40 TOY2 50

As you can see from above, the action history log has been captured and the DELETE, INSERT and UPDATE actions took place on the PRODUCT_CATALOG table since the product TOY1 got deleted, the QTY for TOY2 got decremented by 10 and the TOY3 record got added.

This new feature will help a lot in data warehousing ETL operations and also for preparing audit logs for DML operations, though CDC (Change Data Capture) would be a better option for auditing.

MERGE command in SQL Server 2008

The MERGE command (also known as UPSERT in Oracle circles) is nothing new to Oracle. It has existed for quite some time. We have also blogged about it on our site - including the enhancements to that command that were made in Oracle 10g. It gets introduced in SQL Server 2008 as well, which is a welcome addition for all the data-warehousing/data-mining community as well as those writing complicated feeds to their OLTP systems. This command is way more powerful than just thinking of it as an UPDATE/DELETE/INSERT combined into one single statement. Let's set up two tables and then start going through the features of this command to see how one can utilize it.

/*****************************************************************************
We are setting up two tables - one for the Product Catalog and the other one
for the sales of the products
*****************************************************************************/
SET NOCOUNT ON
GO

CREATE TABLE PRODUCT_CATALOG (PRODUCT_NAME NVARCHAR(100) NOT NULL, QTY INT NOT NULL CHECK (QTY > 0));

CREATE TABLE PRODUCT_SALE (PRODUCT_NAME NVARCHAR(100) NOT NULL, SALE_QTY INT NOT NULL);

INSERT INTO PRODUCT_CATALOG (PRODUCT_NAME, QTY) VALUES ('TOY1', 100);
INSERT INTO PRODUCT_CATALOG (PRODUCT_NAME, QTY) VALUES ('TOY2', 50);
INSERT INTO PRODUCT_SALE (PRODUCT_NAME, SALE_QTY) VALUES ('TOY1', 100);
INSERT INTO PRODUCT_SALE (PRODUCT_NAME, SALE_QTY) VALUES ('TOY2', 10);
INSERT INTO PRODUCT_SALE (PRODUCT_NAME, SALE_QTY) VALUES ('TOY3', 500);
GO

/*****************************************************************************
Apply changes to the PRODUCT_CATALOG table based on daily sales
that are tracked in the incoming PRODUCT_SALE table from different stores.
Delete the record from the PRODUCT_CATALOG table if all the items for that
Product_Name have been sold.
Update and decrement the quantity if the quantity after the sale is not 0, and
insert a new record if there was a new item introduced at one of the regional stores.
*****************************************************************************/
MERGE PRODUCT_CATALOG PC /* target table */
USING PRODUCT_SALE PS /* source table */
ON PC.PRODUCT_NAME = PS.PRODUCT_NAME
WHEN MATCHED AND (Qty - SALE_QTY = 0) THEN
DELETE
WHEN MATCHED THEN
/* update QTY if you still hold some QTY for that product */
UPDATE SET Qty = Qty - SALE_QTY
WHEN NOT MATCHED THEN
/* insert a row if the PRODUCT is newly acquired */
INSERT VALUES (PRODUCT_NAME, SALE_QTY)
/* output details of INSERT/UPDATE/DELETE operations made on the target table */
OUTPUT $action, inserted.PRODUCT_NAME, inserted.QTY, deleted.PRODUCT_NAME, deleted.QTY;

$action PRODUCT_NAME QTY PRODUCT_NAME QTY
---------- ------------- ------ ------------- ------
INSERT TOY3 500 NULL NULL
DELETE NULL NULL TOY1 100
UPDATE TOY2 40 TOY2 50

SELECT * FROM PRODUCT_CATALOG;
GO

PRODUCT_NAME QTY
------------ -----------
TOY2 40
TOY3 500

This is a very simple example to illustrate the functionality of the MERGE command. The comments given above are self explanatory. Let's move on to other advanced features of this wonderful enhancement. In the example above, you saw that we are joining between two tables. There is no reason why we cannot join the PRODUCT_CATALOG table with another sub-query. Suppose we had another table called SALE_ORDER in which we kept track of the sales of all the products per store and the quantity that was sold. In that scenario, we will get the total sale quantity by summing it up over the PRODUCT_NAME for a given date.

CREATE TABLE SALE_ORDER (STORE_NBR INT NOT NULL, PRODUCT_NAME NVARCHAR(100) NOT NULL, SALE_QTY INT NOT NULL, ORDER_DATE DATETIME NOT NULL DEFAULT GETDATE());

INSERT INTO SALE_ORDER (STORE_NBR, PRODUCT_NAME, SALE_QTY) VALUES (1, 'TOY1', 10);
INSERT INTO SALE_ORDER (STORE_NBR, PRODUCT_NAME, SALE_QTY) VALUES (2, 'TOY1', 20);
INSERT INTO SALE_ORDER (STORE_NBR, PRODUCT_NAME, SALE_QTY) VALUES (3, 'TOY1', 50);
GO

And let us drop and re-create the two tables from above.

Please note that in the query below, we are using a sub-query (a derived table that sums up the day's sales per product) as the source for the MERGE:

MERGE PRODUCT_CATALOG PC
USING (SELECT PS.PRODUCT_NAME, SUM(SO.SALE_QTY) AS SALE_QTY
FROM PRODUCT_SALE PS
INNER JOIN SALE_ORDER SO
ON SO.PRODUCT_NAME = PS.PRODUCT_NAME
WHERE SO.ORDER_DATE >= CONVERT(VARCHAR(10), GETDATE(), 101)
AND SO.ORDER_DATE < CONVERT(VARCHAR(10), GETDATE() + 1, 101)
GROUP BY PS.PRODUCT_NAME) AS IV (PRODUCT_NAME, SALE_QTY)
ON PC.PRODUCT_NAME = IV.PRODUCT_NAME
WHEN MATCHED AND (Qty - SALE_QTY = 0) THEN
DELETE
WHEN MATCHED THEN
/* update QTY if you still hold some QTY for that product */
UPDATE SET Qty = Qty - SALE_QTY
WHEN NOT MATCHED THEN
/* insert a row if the PRODUCT is newly acquired */
INSERT VALUES (PRODUCT_NAME, SALE_QTY)
/* output details of INSERT/UPDATE/DELETE operations made on the target table */
OUTPUT $action, inserted.PRODUCT_NAME, inserted.QTY, deleted.PRODUCT_NAME, deleted.QTY;

$action PRODUCT_NAME QTY PRODUCT_NAME QTY
---------- ------------- ----------- ------------- -----
UPDATE TOY1 20 TOY1 100

SELECT * FROM PRODUCT_CATALOG;
GO

PRODUCT_NAME QTY
------------ -----------
TOY1 20
TOY2 50

In the next post, we will go over another T-SQL enhancement in SQL Server 2008 - in that one, we will cover the enhancement to the INSERT statement - one can insert over a DML statement and one can also have minimally logged insert operations (same as direct path inserts in the case of Oracle).
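Since minimally logged inserts were just mentioned, here is a small sketch (not from the original post) of what that looks like in SQL Server 2008: an INSERT...SELECT into a heap with the TABLOCK hint can be minimally logged, subject to the recovery model and the other documented pre-requisites. The tables are the ones from the examples above:

INSERT INTO PRODUCT_CATALOG WITH (TABLOCK) (PRODUCT_NAME, QTY)
SELECT PRODUCT_NAME, SALE_QTY
FROM PRODUCT_SALE
GO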

Change Data Capture Feature in SQL Server 2008

We have been blogging for quite some time now on the new feature sets of the upcoming release of SQL Server - SQL Server 2008. You can search for SQL Server 2008 on this blog and get to those posts. In this post, we will cover another new feature called "Change Data Capture". Using this feature, one can capture the DML activities (insert, update and delete) that take place against the tables. There are two steps that need to happen in order for this to take place: a login that has sysadmin rights first needs to enable the database for change data capture, and then a member of the db_owner database role needs to enable the tables for which the DML activities need to be captured.

Let's first enable the database and then a table for demonstrating this, and then we will go over how the capture process works internally.

On our instance, currently none of the databases is configured for CDC (Change Data Capture). You can see that from the IS_CDC_ENABLED column of SYS.DATABASES:

USE MASTER
GO
SELECT IS_CDC_ENABLED, NAME FROM SYS.DATABASES
GO

IS_CDC_ENABLED NAME
-------------- -------------------
0 master
0 tempdb
0 model
0 msdb
0 MDW
0 ReportServer
0 ReportServerTempDB
0 AdventureWorks
0 DECIPHER_TEST

Now, in order to enable it for the database DECIPHER_TEST, we will have to run the stored procedure "sys.sp_cdc_enable_db_change_data_capture" in the context of the DECIPHER_TEST database:

USE DECIPHER_TEST
GO
EXEC sys.sp_cdc_enable_db_change_data_capture
GO

Now, if we execute the first query, we will see that DECIPHER_TEST has been enabled for CDC:

IS_CDC_ENABLED NAME
-------------- -------------------
0 master
0 tempdb
0 model
0 msdb
0 MDW
0 ReportServer
0 ReportServerTempDB
0 AdventureWorks
1 DECIPHER_TEST

So, what exactly did SQL Server do when we enabled the CDC option? It went ahead and created the cdc schema, the cdc user, the meta-data tables and other system objects that will be used for this process. You can look at the image below to see some of those objects that were created.



Now that we have enabled the database, we will enable the table TEST_TBL in the dbo schema for the CDC process. In order to do so, a member of the db_owner fixed database role needs to create a capture instance for the TEST_TBL table. We need to first ensure that the SQL Server Agent is running and then use the "sys.sp_cdc_enable_table_change_data_capture" procedure:

USE DECIPHER_TEST
GO
EXECUTE sys.sp_cdc_enable_table_change_data_capture
  @source_schema = N'dbo'
, @source_name = N'TEST_TBL'
, @role_name = N'cdc_Admin'
GO

This execution will then create two jobs that you will see in the informational messages as part of this execution:

Job 'cdc.DECIPHER_TEST_capture' started successfully.
Job 'cdc.DECIPHER_TEST_cleanup' started successfully.

In addition, if you want to track which tables have been marked for the CDC process, you can use this SQL:

select is_tracked_by_cdc, name, type from sys.tables where type = 'u'
go

The ones marked with 1 are the ones that are tracked.

is_tracked_by_cdc name type
----------------- ----------------- ----
0 ddl_history U
0 lsn_time_mapping U
0 captured_columns U
0 index_columns U
0 dbo_TEST_TBL_CT U
1 TEST_TBL U
0 systranschemas U
0 change_tables U

By creating a capture instance, a change table is also created by this process. There are a couple of options that can be utilized when specifying the capture process for a table:

a) You can specify which columns you want to capture,
b) You can specify an index to help uniquely identify the rows in the change table,
c) You can specify the name of a filegroup in which that change table should be created,
d) There is a concept of net change tracking with CDC. By default, only one UDF is created to access the data in the change table: cdc.fn_cdc_get_all_changes_<name of the capture instance>. In our example from above, the name of the UDF is cdc.fn_cdc_get_all_changes_dbo_TEST_TBL. You can also use the parameter @supports_net_changes and assign it a value of 1 to create another UDF which will return only one change for each row that was changed in a given interval (a sketch follows below).
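Here is a small sketch of the net change option. It is not from the original post - it assumes the procedure used above accepts the @supports_net_changes parameter, and net changes also require a primary key or a unique index on the source table (which our TEST_TBL does not have), so treat the values and generated function names below as illustrative only:

EXECUTE sys.sp_cdc_enable_table_change_data_capture
  @source_schema = N'dbo'
, @source_name = N'TEST_TBL'
, @role_name = N'cdc_Admin'
, @supports_net_changes = 1
GO

/* query only the net effect per row over an LSN range */
DECLARE @from_lsn binary(10), @to_lsn binary(10)
SELECT @from_lsn = sys.fn_cdc_get_min_lsn('dbo_TEST_TBL'), @to_lsn = sys.fn_cdc_get_max_lsn()
SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_TEST_TBL (@from_lsn, @to_lsn, 'all')
GO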

So, essentially, when the DML operations are made against the source tables, these are then recorded into the transaction log of the database. The CDC process that we just set up then reads the log records and inserts modifications made to captured columns into the corresponding change table(s) in the cdc schema.


Now, let us follow it up with an example of DML operations against the TEST_TBL table. If you recall from previous posts in which we had covered some other T-SQL features of SQL 2008 (here and here), this table has three columns: COL1, COL2 and COL3 and this set of data:

SELECT * FROM TEST_TBL
GO

COL1 COL2 COL3
----------- ---------- -----------
1 A 10
1 A 20
1 A 30
1 B 90
2 A 30
2 A 100
3 C 110
3 C 120

Now, we will execute these DML statements against this table:

delete from TEST_TBL;
insert into TEST_TBL values (1, 'A', 1000);
Update dbo.TEST_TBL
set COL2 = 'X'
where col1 = 1;

And this should have created the log entries in the change table in the cdc schema. Let's query that using the UDF that was created for us when we had set up the CDC process:

DECLARE @start_time datetime, @end_time datetime, @from_lsn binary(10), @to_lsn binary(10);

SET @start_time = GETDATE() - 1
SET @end_time = GETDATE();

SELECT @from_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than or equal', @start_time);
SELECT @to_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal', @end_time);

/****************************************************************
Show the changes that happened during that query window
*****************************************************************/
SELECT __$start_lsn, __$seqval,
case __$operation
when 1 then 'Delete'
when 2 then 'Insert'
when 3 then 'Update (Before)'
when 4 then 'Update (After)'
end as Operation,
Col1, Col2, Col3
FROM cdc.fn_cdc_get_all_changes_dbo_test_tbl (@from_lsn, @to_lsn, 'all update old')
order by 1, 2;
GO

__$start_lsn __$seqval Operation Col1 Col2 Col3
---------------------- ---------------------- --------------- ----------- ---------- -----------
0x00000019000001CF000B 0x00000019000001CF0002 Delete 1 A 10
0x00000019000001CF000B 0x00000019000001CF0003 Delete 1 A 20
0x00000019000001CF000B 0x00000019000001CF0004 Delete 1 A 30
0x00000019000001CF000B 0x00000019000001CF0005 Delete 1 B 90
0x00000019000001CF000B 0x00000019000001CF0006 Delete 2 A 30
0x00000019000001CF000B 0x00000019000001CF0007 Delete 2 A 100
0x00000019000001CF000B 0x00000019000001CF0008 Delete 3 C 110
0x00000019000001CF000B 0x00000019000001CF0009 Delete 3 C 120
0x00000019000001D20004 0x00000019000001D20002 Insert 1 A 1000
0x00000019000001D30006 0x00000019000001D30002 Update (Before) 1 A 1000
0x00000019000001D30006 0x00000019000001D30002 Update (After) 1 X 1000

You can imagine the benefits of the CDC feature - fine grained auditing, enhanced functionality for an ETL application that needs to incrementally populate a data mart. There are of course a lot of other things that need to be discussed pertaining to CDC - what happens when the source table gets modified by a DDL command (change of data-type, addition/deletion of a column etc.), the overhead of having a CDC process - is it any different than using a DML trigger (the answer is yes - we will cover it in a future blog post).

Data Collector in SQL Server 2008

This is a wonderful new feature in SQL Server 2008. The Data Collector in SQL Server 2008 is a component that is installed in a SQL instance and which can either be configured to run on your defined schedule or can run all the time and collect different sets of data pertaining to performance diagnostics, historical data for baseline comparisons, policy based data etc. It can then be used to store this data into a relational database that is called the management data warehouse. So, how is it different than just using SQL Trace and perfmon? It actually provides a central point for data collection across servers and across applications and allows you to control what gets collected, allows you to report over it, allows you to create a data warehouse of this data for baseline and trend analysis, and it allows you to extend it further using an API.

Let's first see where it appears in SSMS and then how we configure it. After that, in subsequent posts, we will cover the different terminologies and different usage bits for this wonderful feature. In the image below, you can see that once you connect to the instance, you will see the "Data Collection" icon under the Management tree.

Since this is the first time we are doing it, we will need to configure the management data warehouse. So, right click and select "Configure Management Data Warehouse". It will bring up the wizard, and on the second image below you will see that you will need to mention the instance, the new database (we are calling it CMDB) and then a cache directory location.

The third image below shows the place where you will need to map an existing login (you can create a new one from this screen) to the roles within the database. You can see the descriptions of those roles in the image below.

Once that is done, the wizard will do the configuration and also provide you with the output of the work done. In subsequent posts, we will start looking at the data collection sets, reporting and other features of Data Collector.

Compound Assignment Operators in SQL Server 2008

Continuing in the series on the new features in SQL Server 2008, here is another small little tid-bit. SQL Server 2008 (tested in CTP5) supports the compound assignment operators. Here is a working example of those operators:

declare @i int
set @i = 100

/**************************
Addition and assignment
***************************/
set @i += 1
select @i

-----------
101

/**************************
Subtraction and assignment
***************************/
set @i -= 1
select @i

-----------
100

/**************************
Multiplication and assignment
***************************/
set @i *= 2
select @i

-----------
200

/**************************
Division and assignment
***************************/
set @i /= 2
select @i

-----------
100

/**************************
Modulo and assignment
***************************/
set @i %= 3
select @i

-----------
1

/**************************
XOR operation and assignment
***************************/
set @i ^= 2
select @i

-----------
3

/**************************
Bitwise & operation and assignment
***************************/
set @i &= 2
select @i

-----------
2

/**************************
Bitwise | operation and assignment
***************************/
set @i |= 2
select @i

-----------
2

Back Up Compression Option in SQL Server 2008

We have been covering the new features of the upcoming release of the next version of SQL Server. You can search on "SQL Server 2008" on this blog and get to those posts. In this post, we will briefly cover the back up compression option that has been introduced in SQL Server 2008. All this code is valid as of CTP5.

First, let's check the size of the databases.

/******************************
Get the size of the databases
*******************************/
sp_helpdb
go

name            db_size
--------------- ----------
AdventureWorks  174.63 MB
DEC_TEST_FS     46.44 MB
DECIPHER_TEST   678.31 MB

We will pick up the DECIPHER_TEST database for this post. We will run the back-up command first without specifying any compression and then with compression.

/******************************
Do the back-up - Uncompressed
*******************************/
BACKUP DATABASE DECIPHER_TEST
TO DISK = 'C:\Back_UP\DT.bak'
WITH INIT

/******************************
Do the back-up - Compressed
*******************************/
BACKUP DATABASE DECIPHER_TEST
TO DISK = 'C:\Back_UP\DT_COMP.bak'
WITH INIT, COMPRESSION

For the un-compressed back-up, we got:

BACKUP DATABASE successfully processed 65985 pages in 118.665 seconds (4.555 MB/sec).

For the compressed back-up, we got:

BACKUP DATABASE successfully processed 65985 pages in 129.773 seconds (4.165 MB/sec).

The difference in size was 528MB for the un-compressed vs 104MB for the compressed. All this was done on a very small VM, but the difference in timings and the size of the back-ups should give you an idea of the difference between un-compressed and compressed back-ups. The compression ratio of course is dependent upon what kind of data we have in the database. For example, string data compresses better than other data. Now that we have backed up the database, how can we tell whether a particular back-up was done using the compression option or not? For that, we can use the RESTORE HEADERONLY command and check the value under the COMPRESSED column. If that value is 1 then it is a compressed back-up, and if it is 0, then it is an un-compressed back-up. Example:

restore headeronly
from disk = 'C:\Back_Up\DT_COMP.bak'
go

/* Abbreviated: Showing only the column we are interested in */

Compressed
----------
1

restore headeronly
from disk = 'C:\Back_Up\DT.bak'
go

Compressed
----------
0
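Another way to look at the numbers (a sketch, not from the original post) is to query the backup history in msdb - the backupset table has a new compressed_backup_size column in SQL Server 2008, so the effective compression ratio can be computed from there:

SELECT database_name, backup_finish_date, backup_size, compressed_backup_size,
       CAST(backup_size AS FLOAT) / NULLIF(compressed_backup_size, 0) AS compression_ratio
FROM msdb.dbo.backupset
WHERE database_name = 'DECIPHER_TEST'
ORDER BY backup_finish_date DESC
GO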

And the compressed and un-compressed backups cannot be mixed up in the same file. Example: trying to do a compressed back-up to a file that already has a non-compressed back-up will give us this error:

Msg 3098, Level 16, State 2, Line 1
The backup cannot be performed because 'COMPRESSION' was requested after the media was formatted with an incompatible structure. To append to this media set, either omit 'COMPRESSION' or specify 'NO_COMPRESSION'. Alternatively, you can create a new media set by using WITH FORMAT in your BACKUP statement. If you use WITH FORMAT on an existing media set, all its backup sets will be overwritten.

Another thing to note is that you can also change the default behavior of the back-ups to be always compressed back-ups. You can do so by changing the instance level setting using the sp_configure command. Example:

EXEC sp_configure 'backup compression default', 1
RECONFIGURE
GO

You can choose to do so or do it only at the individual command level. We will be doing some testing with the RTM version (whenever it is released) on some real world databases to see the CPU overhead and the back-up time and will post the results on the blog. While restoring these back-up sets, there is nothing that needs to be specified to indicate that we are restoring a compressed back-up set. Now, while doing a restore, we saw that the restore from the compressed back-up took less time as compared to the restore from the un-compressed back-up. One would have thought that it would be the reverse since some cost will be associated with the un-compressing. This is something that we will research and post in a future blog post. It looks like the smaller size of the back-up set in the case of a compressed back-up offsets some of that time.

FORCESEEK Table Hint in SQL Server 2008

In prior versions of SQL Server, there was a way to force the optimizer to consider an index in the execution plan - however, there was no way to always force it to do a seek operation. That has now been added in the next release (tested against CTP5). A point of caution here - don't be trigger happy and start using the hints. In the majority of the scenarios, the optimizer makes the right decision for the execution plan - you just need to make sure that it has statistics available to form the histograms correctly for the data distribution and it will do its job. However, there are always corner cases and that is why SQL Server (as well as Oracle and DB2 LUW) has table level, query level and join related hints available. The location of the hints and their syntax and usage is of course different in the three RDBMS. We have discussed some of these hints before, like the ORDERED hint in Oracle (same as OPTION(FORCE ORDER) in SQL Server).

The FORCESEEK hint is a table level hint. It can also be applied to views and applies to both the clustered as well as the non-clustered indexes. It can also be applied in conjunction with the index hint, i.e. you can force the optimizer to use a specific index as well as force it to use a seek operation on it (a combined example appears at the end of this section). Let's create an artificial situation here to demonstrate the usage. We will create a table with a million records in it which has a column that has only 3 distinct values. The data will be heavily skewed for this column. Yes - we are trying to re-create the bind variable peeking (Oracle) or parameter sniffing (SQL Server) scenario to help demonstrate the usage of this new hint. We had covered that in some of our posts before - you can read them here and here.

Here goes:

SET NOCOUNT ON
GO

CREATE TABLE BIG_TABLE
(
COL1 INT IDENTITY NOT NULL,
COL2 TINYINT NOT NULL
)
GO

DECLARE @I INT
SET @I = 1
WHILE (@I <= 1000000)
BEGIN
IF (@I <= 10)
INSERT INTO BIG_TABLE (COL2) VALUES (0)
IF (@I > 10 AND @I <= 100000)
INSERT INTO BIG_TABLE (COL2) VALUES (10)
IF (@I > 100000 AND @I <= 1000000)
INSERT INTO BIG_TABLE (COL2) VALUES (90)
SET @I = @I + 1
END
GO

Here is the data distribution that was created by this small script:

SELECT COL2, COUNT(*) AS REC_CNT
FROM BIG_TABLE
GROUP BY COL2
ORDER BY COL2;

COL2 REC_CNT
---- -----------
0 10
10 99990
90 900000

Now, let us create a non-clustered index on it:

CREATE NONCLUSTERED INDEX BIG_TABLE_IND_1 ON BIG_TABLE (COL2)
/* FILEGROUP CLAUSE */
GO

And now, let's fire off a parameterized query against this using the data value of 90, which we know from above qualifies for 900,000 records, i.e. 90% of the records in the table.

SET SHOWPLAN_TEXT ON
GO

DECLARE @I INT
SET @I = 90
SELECT * FROM BIG_TABLE WHERE COL2 = @I
GO

Here is the execution plan for this statement:

StmtText
--------------------------------------------------------------------------------
|--Table Scan(OBJECT:([DEC_TEST_FS].[dbo].[BIG_TABLE]), WHERE:([DEC_TEST_FS].[dbo].[BIG_TABLE].[COL2]=[@I]))

And now, let us change the value that is being passed in to be 0. As we can see from the data distribution from above, the value of 0 qualifies for only 10 records in the table and that has a very good selectivity (0.001%). The index should jump right at it and use it.

DECLARE @I INT
SET @I = 0
SELECT * FROM BIG_TABLE WHERE COL2 = @I
GO

StmtText
--------------------------------------------------------------------------------
|--Table Scan(OBJECT:([DEC_TEST_FS].[dbo].[BIG_TABLE]), WHERE:([DEC_TEST_FS].[dbo].[BIG_TABLE].[COL2]=[@I]))

As we know from previous posts on the topic, since the execution plan is already in the cache (based on the prior execution), the same execution plan gets used again this time even though the parameter has very good selectivity. There are ways to circumvent this issue even in prior versions of SQL Server and we have discussed those on our blog site. Another way to solve this would be to always re-compile this query regardless of the variable, or to have a constant in this case instead of a parameter. Usage of plan guides is another way of fixing this.
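For completeness, here is a sketch (not from the original post) of the re-compile approach mentioned above - forcing a fresh plan on every execution so that the plan built for the previously sniffed value does not get reused:

DECLARE @I INT
SET @I = 0
SELECT * FROM BIG_TABLE WHERE COL2 = @I OPTION (RECOMPILE)
GO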

Yet another way would be to use the FORCESEEK hint in places where we know the code will be getting fired only for the value of 0. Let's take a look at the execution using this hint:

DECLARE @I INT
SET @I = 0
SELECT * FROM BIG_TABLE WITH (FORCESEEK) WHERE COL2 = @I
GO

StmtText
--------------------------------------------------------------------------------
|--Parallelism(Gather Streams)
     |--Nested Loops(Inner Join, OUTER REFERENCES:([Bmk1000], [Expr1005]) WITH UNORDERED PREFETCH)
          |--Index Seek(OBJECT:([DEC_TEST_FS].[dbo].[BIG_TABLE].[BIG_TABLE_IND_1]), SEEK:([DEC_TEST_FS].[dbo].[BIG_TABLE].[COL2]=[@I]) ORDERED FORWARD)
          |--RID Lookup(OBJECT:([DEC_TEST_FS].[dbo].[BIG_TABLE]), SEEK:([Bmk1000]=[Bmk1000]) LOOKUP ORDERED FORWARD)
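And, as mentioned at the start of this section, the seek can also be forced on a specific index by combining the hints - a sketch (not from the original post), using the index we created above:

DECLARE @I INT
SET @I = 0
SELECT * FROM BIG_TABLE WITH (FORCESEEK, INDEX (BIG_TABLE_IND_1)) WHERE COL2 = @I
GO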

HierarchyID Data Type in SQL Server 2008

This is another blog entry in the series of SQL Server 2008 posts. In this post, we will cover the basics of one more data-type: hierarchyid. This data-type, as the name suggests, can be used to store hierarchical data structures like organizational trees, file system structures or any other type of hierarchical tree structures. In SQL Server 2008, you can now store that data and very easily query it as well. In this post, we will just look at a basic example of putting in the hierarchy data and querying it. There's a whole lot more to it, including updating and re-arranging the hierarchy, re-ordering of the data, creation of indexes to help optimize the searches on hierarchical data etc.

Let's look at this briefly using an example:

Let us create an EMP_MASTER table which will house only the Employee data, and then two table variables - one to act as the source of the data for the hierarchies and the second where we recursively populate the data in a hierarchical fashion.

/*******************************************************************
Creation of the Employee Master table
********************************************************************/
CREATE TABLE EMP_MASTER
(
EMP_NBR INT NOT NULL PRIMARY KEY,
EMP_NAME NVARCHAR(50),
MGR_NBR INT NULL
)

/*******************************************************************
Populating the table with some sample data
The MGR_NBR column reflects the immediate manager of the Employee
********************************************************************/

INSERT INTO EMP_MASTER VALUES (1, 'DON', 5);
INSERT INTO EMP_MASTER VALUES (2, 'HARI', 5);
INSERT INTO EMP_MASTER VALUES (3, 'RAMESH', 2);
INSERT INTO EMP_MASTER VALUES (4, 'JOE', 5);
INSERT INTO EMP_MASTER VALUES (5, 'DENNIS', NULL);
INSERT INTO EMP_MASTER VALUES (6, 'NIMISH', 3);
INSERT INTO EMP_MASTER VALUES (7, 'JESSIE', 3);
INSERT INTO EMP_MASTER VALUES (8, 'KEN', 5);
INSERT INTO EMP_MASTER VALUES (9, 'AMBER', 4);
INSERT INTO EMP_MASTER VALUES (10, 'JIM', 6);

/*******************************************************************
Declaration of the two table variables
@ORG_HIERARCHY will hold the actual hierarchical data after
the recursion query populates it
@HIERARCHY_TREE will hold the EMP_MASTER data with actual
numbering of the data partitioned by their manager's number
********************************************************************/

DECLARE @ORG_HIERARCHY TABLE (HIERARCHY_NODE HIERARCHYID,
EMP_NBR INT, MGR_NBR INT, EMP_NAME NVARCHAR(50));

DECLARE @HIERARCHY_TREE TABLE (EMP_NBR INT, MGR_NBR INT,
NBR_REC_IN_NODE INT)

INSERT INTO @HIERARCHY_TREE
SELECT EMP_NBR, MGR_NBR, ROW_NUMBER() OVER (PARTITION BY MGR_NBR ORDER BY MGR_NBR)
FROM EMP_MASTER;

/*********************************************************************
Recursive Query to take the data and populate the @ORG_HIERARCHY
table variable. This stores the Node and the path information
which we will then query using hierarchical query methods.
First we get the root data (using the GetRoot() method)
and then we recurse through the tree and populate the
hierarchy_Node column. Note that you would need to do a cast
to a string and use the ToString() method to get the hierarchyid.
**********************************************************************/

WITH ORG_PATH (path, EMP_NBR)
AS (
SELECT hierarchyid::GetRoot() AS Node, EMP_NBR
FROM @HIERARCHY_TREE AS A
WHERE MGR_NBR IS NULL

UNION ALL

SELECT
CAST(p.path.ToString() + CAST(NBR_REC_IN_NODE AS nvarchar(50)) + '/' AS hierarchyid),
A.EMP_NBR
FROM @HIERARCHY_TREE AS A
JOIN ORG_PATH AS P
ON A.MGR_NBR = P.EMP_NBR
)
insert into @ORG_HIERARCHY
select PATH, A.EMP_NBR, A.MGR_NBR, A.EMP_NAME
from EMP_MASTER AS A INNER JOIN ORG_PATH AS B ON A.EMP_NBR = B.EMP_NBR;

/*********************************************************************
Now, let's query the data using the ToString() and GetLevel()
methods and then order by the Level field. Output is shown below
**********************************************************************/

SELECT Hierarchy_Node.ToString() AS Node, Hierarchy_Node.GetLevel() AS Level, *
FROM @ORG_HIERARCHY
ORDER BY Level;

Node       Level  HIERARCHY_NODE  EMP_NBR  MGR_NBR  EMP_NAME
---------- ------ --------------- -------- -------- --------
/          0      0x              5        NULL     DENNIS
/1/        1      0x58            8        5        KEN
/2/        1      0x68            4        5        JOE
/3/        1      0x78            1        5        DON
/4/        1      0x84            2        5        HARI
/4/1/      2      0x8560          3        2        RAMESH
/2/1/      2      0x6AC0          9        4        AMBER
/4/1/1/    3      0x856B          6        3        NIMISH
/4/1/2/    3      0x856D          7        3        JESSIE
/4/1/1/1/  4      0x856B58        10       6        JIM

As you can see from above, the different levels are defined and the Nodes are defined as well. So, what exactly does this replace from the previous system? i.e. what are the alternate ways of doing this hierarchical structure? One can use the normalized structures of a parent/child relationship, one can even make use of the XML data-type to store un-structured data. We will do some benchmarks and will post the performance results on our blog for the different methods. In our future blog posts, we will cover the indexing schemes, the updates and re-arranging of the data in the hierarchyID data-type columns.
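
As a quick illustration of the querying side, here is a minimal sketch using the IsDescendantOf() method to pull everyone who rolls up to HARI; it assumes it is run in the same batch as the code above, since @ORG_HIERARCHY is a table variable:

DECLARE @MGR_NODE HIERARCHYID;

SELECT @MGR_NODE = HIERARCHY_NODE
FROM @ORG_HIERARCHY
WHERE EMP_NAME = 'HARI';

/* IsDescendantOf() returns 1 for the node itself and for all nodes beneath it */
SELECT HIERARCHY_NODE.ToString() AS Node, EMP_NAME
FROM @ORG_HIERARCHY
WHERE HIERARCHY_NODE.IsDescendantOf(@MGR_NODE) = 1;

Based on the data above, this would return HARI along with RAMESH, NIMISH, JESSIE and JIM.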

44


FILESTREAM Data in SQL Server 2008

We have been covering the new features in SQL Server 2008 in some of our blog posts. In some of those, we had also discussed the new data-types in that release. You can search for SQL Server 2008 on this blog site to get to all those posts. In today's blog post, we will discuss what can be done for designing and implementing FILESTREAM storage in SQL Server 2008.

Prior to SQL Server 2008, storage of items like documents and images was typically done by storing a link in the table column and storing the documents and images in a particular mapped out location on the SAN or other storage device. For images, some people also used to store them in the image data-type but that was not always the most performant solution.

In SQL Server 2008, Filestream enables the application to store such un-structured data on the file system. So, the application can now take advantage of the streaming APIs and performance of the file-system. In addition, one can now co-relate the un-structured and structured data easily. The Win32 filesystem interfaces provide the streaming access to the data, and a good thing about filestream is that it does not use the SQL Server buffer cache - it uses the Windows system cache for the caching of the file data.

In order to make use of FILESTREAM, one needs to enable filestream for an instance of SQL Server and then one needs to create a database that supports filestream. Once that is done, we can use the FILESTREAM clause next to the VARBINARY(MAX) data-type to take advantage of this new feature. Let's follow this up with an example:

1) Let us first enable the instance for filestream:

USE MASTER

GO

EXEC sp_filestream_configure @enable_level = 3

This can also be changed from the GUI from the Advanced level properties for the instance. Below is the jpg image for it. You can see from the drop-down the different levels that are available for FILESTREAM. We have chosen the highest level - Transact SQL and filesystem (local and remote client access).



2) Now that we have enabled the filestream option at the instance level, let us go ahead and create a new database. We would need to create a filegroup that has the CONTAINS FILESTREAM clause so that we can designate that filegroup (and its files) to contain the filestream data.

USE MASTER
GO

CREATE DATABASE DEC_TEST_FS ON PRIMARY
( NAME = DEC_TEST_FS_data,
  FILENAME = N'C:\DEC_TEST_FS_data.mdf'),
FILEGROUP DEC_TEST_FS_FG_1
( NAME = DEC_TEST_FS_REGULAR,
  FILENAME = N'C:\DEC_TEST_FS_data_1.ndf'),
FILEGROUP DEC_TEST_FS_FG_2 CONTAINS FILESTREAM
( NAME = DEC_TEST_FS_FILESTREAM,
  FILENAME = N'C:\DEC_TEST_FS')
LOG ON
( NAME = DEC_TEST_FS_LOG,
  FILENAME = N'C:\DEC_TEST_FS_log.ldf');
GO

Please note that if you try to create this database by specifying a path for the filestream files that is not on NTFS, you will get the error message:

"The path specified by 'd:\DEC_TEST_FS' cannot be used for FILESTREAM files because it is not on NTFS."

For the Filestream filegroup, the FILENAME refers to the path and not to the actual file name. It creates that particular folder - from the example above, it created the C:\DEC_TEST_FS folder on the filesystem. And that folder now contains a filestream.hdr file and also a $FSLOG folder.

If you already have a database, you can add a FILESTREAM filegroup to it using the ALTER DATABASE command, as shown below.
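
Here is a minimal sketch of that route; the database name, filegroup name and path below are just placeholders for illustration:

/* Add a FILESTREAM filegroup to an existing database */
ALTER DATABASE MY_EXISTING_DB
ADD FILEGROUP MY_FS_FG CONTAINS FILESTREAM;
GO

/* Point the new filegroup at a folder on an NTFS volume */
ALTER DATABASE MY_EXISTING_DB
ADD FILE ( NAME = MY_FS_DATA, FILENAME = N'C:\MY_FS_DATA' )
TO FILEGROUP MY_FS_FG;
GO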

3) Now that we have the instance and the database taken care of, let us create a table to take advantage of this new feature:

USE DEC_TEST_FS
GO

CREATE TABLE DEC_FS_TEST
(
COL1 INT IDENTITY,
COL2 UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE,
COL3 VARBINARY(MAX) FILESTREAM
);

The table definition needs a ROWGUIDCOL column - this is required by FILESTREAM. The actual data is stored in the column COL3.

Now, let’s insert data into this table:

INSERT INTO DEC_FS_TEST (COL2, COL3) VALUES (NEWID(), NULL);

INSERT INTO DEC_FS_TEST (COL2, COL3) VALUES (NEWID(), CAST('MY DUMMY TEST' AS VARBINARY(MAX)));

And now we can select out of this table and see how this data is represented:

COL1  COL2                                  COL3
----- ------------------------------------- ------------------------------
1     78909DBF-7B26-4CA9-A840-4D45930F7523  NULL
2     0B0F5833-1997-4C9C-A9A7-F2536D68CFED  0x4D592044554D4D592054455354

If you see on the filesystem, you will see additional folders have been created under the DEC_TEST_FS folder. That is shown in the jpg image below. One can also use T-SQL to update the filestream data if one needs to. The deletes will also work - the delete also marks the underlying file system files for deletion. We will be doing some benchmarks on this new feature and will post our findings on the blog.
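
For completeness, here is a minimal sketch of what those update and delete operations could look like against the sample table above; the literal value is just a placeholder:

/* Replace the filestream contents of one row through T-SQL */
UPDATE DEC_FS_TEST
SET COL3 = CAST('MY UPDATED TEST' AS VARBINARY(MAX))
WHERE COL1 = 2;

/* Deleting the row marks the underlying file system file for deletion as well */
DELETE FROM DEC_FS_TEST
WHERE COL1 = 1;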

XML DML Enhancement in SQL Server 2008

This is another post to mention one more enhancement made in the next version of SQL Server. In SQL Server 2005 itself, the XML data type supports several methods such as value, exist, modify, query and nodes. These methods allow querying as well as modifying an XML instance. The modify method in SQL Server 2008 now also allows DML on the XML instances. Here is an example:

/********************************************************************
Do the declarations and the assignments
*********************************************************************/

declare @a xml = '<root><sku ID="10"/><sku ID="20"/></root>';
declare @b xml = '<sku ID="1"/>';

/********************************************************************
Do the set and the modify
*********************************************************************/

SET @a.modify
('insert sql:variable("@b") as first into (/root)[1]');

select @a;

<root>
  <sku ID="1" />
  <sku ID="10" />
  <sku ID="20" />
</root>

Lock Escalation Changes in SQL Server 2008

We have talked about lock escalation in some of our previous posts - you can read more on it here and here. In SQL Server 2008, there is an enhancement to the ALTER TABLE statement that allows a new LOCK_ESCALATION option that controls the escalation options on a table. One can thus disable table level lock escalations. If you have implemented partitioning in your system, then this option can be applied at the partition level as well.

This option has three possible values: TABLE, DISABLE and AUTO. TABLE is the default value and the behavior remains the same as SQL Server 2005, i.e. the lock escalation (when it happens) is done at the table level regardless of whether the table is partitioned or not. The DISABLE option prevents the table level lock escalation, and the AUTO option lets the SQL Server engine decide the lock escalation granularity - if the table is partitioned, then the escalation is allowed at the partition level and if it is not partitioned, then the escalation is done at the table level.
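
The syntax itself is a simple ALTER TABLE; here is a minimal sketch where the table name is just a placeholder:

/* Let the engine choose table- or partition-level escalation for a partitioned table */
ALTER TABLE dbo.MY_PARTITIONED_TABLE SET (LOCK_ESCALATION = AUTO);

/* Or prevent escalation to the table level altogether */
ALTER TABLE dbo.MY_PARTITIONED_TABLE SET (LOCK_ESCALATION = DISABLE);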

SQL Server 2008 Deployment Guides from Dell

I was having a discussion with one of our client's IT directors and he pointed out some deployment guides from Dell for SQL Server 2008 deployments. This client is one of the early adopters of SQL Server 2008 and is evaluating going live on it by the end of the year. These guides have a wealth of information in them and I would encourage everyone to read through them: SQL Server 2008 CTP5, SQL Server 2008 on Windows 2003 and SQL Server 2008 on Windows 2008.

Filtered Indexes and Statistics in SQL Server 2008

Filtered indexes have been introduced in SQL Server 2008 as a means of optimizing the non-clustered indexes. Suppose that you have a column which has a range of status values ranging from 0 to 99. However, the application only queries the data that lies between 20 and 30. A filtered index will be suited for this scenario since it will have the data in a well defined subset. So, what would be the benefit of having this index on a sub-set of the data? Storage definitely is the first thing that comes to mind, i.e. the storage needs for the index will be reduced, which goes hand in hand with lower maintenance overhead since the rebuild or update of the stats will take lesser time in this case. In addition to these, it is good for performance as well since it is smaller in size as compared to the regular non-clustered index.
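
A minimal sketch of such an index, assuming a hypothetical ORDERS table with a STATUS column like the one described above:

/* Index only the status range that the application actually queries */
CREATE NONCLUSTERED INDEX ORDERS_STATUS_FILT_IND
ON dbo.ORDERS (STATUS)
WHERE STATUS >= 20 AND STATUS <= 30;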

And there are other creative ways of using it as well. If you have read one of our previous blog posts on ways to allow a unique index in SQL Server and DB2 LUW to permit more than one NULL value, what we had provided was a work-around. One can make use of filtered indexes to create such a constraint. Let's use the same example that we had used in that post:

CREATE TABLE TEST_UQ (COL1 INT IDENTITY(1,1) PRIMARY KEY, COL2 NVARCHAR(10) NULL)
GO

insert into test_uq (col2) values ('abc');
insert into test_uq (col2) values ('xyz');
insert into test_uq (col2) values (Null);
insert into test_uq (col2) values (Null);

Now, let's create the Unique index as a filtered index:

create unique index TEST_UQ_IND_1 on test_uq (col2) where col2 is not null;

By making use of the filtered index, we have removed the NULL values while enforcing the constraint. Another thing to keep in mind is that filtered statistics have also been introduced in SQL Server 2008 (to see the difference between indexes and statistics, see our previous blog post). Filtered Statistics are essentially created on a sub-set of the data to help improve the performance of queries that select from a defined sub-set (say only specific status codes, only open orders, only specific billing codes etc.).
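
A minimal sketch of a filtered statistics object, again using the hypothetical ORDERS table from the index example above:

/* Build statistics only over the sub-set of the data that the queries care about */
CREATE STATISTICS ORDERS_STATUS_FILT_STATS
ON dbo.ORDERS (STATUS)
WHERE STATUS >= 20 AND STATUS <= 30;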

You can read up more in the online BOL version.

Virtual Labs

MSFT virtual labs are a good way to try out new technology through guided hands-on labs in their virtual online environment. They have virtual labs available for Business Intelligence, SQL Server (including SQL Server 2008), Windows (including the latest 2008 OS), Exchange and many others. This is a very good way to get your hands dirty and gain some knowledge without investing in the software, hardware or training. Check out this link on TechNet for the virtual labs:

http://technet.microsoft.com/en-us/bb467605.aspx?wt.svl=leftnavvirtuallabs

Also, here is the link for Windows Server 2008 virtual labs:

http://www.microsoft.com/events/series/windowsserver2008.aspx?tab=virtuallabs