SQL CLR Demystified

bubblesradiographerΔιακομιστές

4 Δεκ 2013 (πριν από 3 χρόνια και 4 μήνες)

86 εμφανίσεις

SQL CLR Demystified

A look at the integration of the CLR into SQL Server


Matt Whitfield

Atlantis Interactive UK Ltd

What will be covered


Overview of CLR
integration


SqlContext


Stored Procedures


Scalar & Table
Functions


CLR DML Triggers


CLR DDL Triggers


Aggregate Functions



CLR Types


Visual Studio
Database Projects


A look at permission
sets


Examples:


Data maniplation for
performance


Environment
manipulation for
maintenance

Overview


CLR integration into SQL Server allows us to run
CLR code that conforms to certain restrictions


CLR integration means that we can use common
business rule validation code in the middle tier
and the database


For some operations, CLR integration allows us
to extract maximum performance


CLR integration allows us to perform
maintenance tasks that would have been very
difficult or impossible before

CLR Disabled by Default


CLR integration is disabled by default in
SQL Server, and therefore must be
enabled:

EXEC

sp_configure

‘clr enabled’
, 1;

RECONFIGURE WITH

OVERRIDE;

GO


This can also be done using the
Surface Area Configuration tool.

What Framework version?


The CLR in SQL Server is always loaded as the
2.0 runtime


Until .NET 4.0, a single CLR host process could
only load a single version of the .NET runtime


It is possible future versions of SQL Server may
allow more up
-
to
-
date runtime versions to be
loaded


We can confirm the loaded version with:

select

*

from

sys.dm_clr_properties


However, we
can

use types from .NET 3.0 and
3.5, as these releases did not update mscorlib

Some Things To Note


When SQL Server loads assemblies, they are cached in
memory. When the O/S signals memory pressure to
SQL Server, explicit garbage collection may be run, and
the assemblies may be unloaded. This can cause
performance issues if it happens frequently.


SQL CLR code cannot be costed accurately by the query
optimiser, as it does not look at what the code actually
does


this can affect execution plans.


SQL CLR code can sometimes prevent parallelism, and
SQL CLR procedures are usually single threaded.
Sometimes this can hurt performance, other times you
may find you achieve the same amount of work in the
same time using 1 thread instead of 8.

SqlContext


SqlContext is a class that gives us access
to four members:


IsAvailable


Pipe


TriggerContext


WindowsIdentity


IsAvailable indicates whether the other
members can be used

SqlContext.Pipe


Pipe is the main point of interaction with
the SQL Server when returning result sets
or messages


Messages can be sent to the client using
the Send(string) function:

SqlContext
.Pipe.Send(
“Hello World”
);


Single row result sets can be sent using
the Send(SqlDataRecord) function.

Multi
-
row Result Sets


SendResultsStart is used
to specify the schema of
the result set


No data is sent by the
SendResultsStart call


0 or more calls to
SendResultsRow are
then sent with data


Empty result sets can be
sent by making no calls to
SendResultsRow


SendResultsEnd marks
the end of the result set


SendResultsStart
(
SqlDataRecord
)
SendResultsRow
(
SqlDataRecord
)
SendResultsEnd
()
SqlContext.WindowsIdentity


This returns a standard
System.Security.Principal.WindowsIdentity
object


The property can only be used under
EXTERNAL_ACCESS or UNSAFE
permission sets


For 99% of CLR work, using this property
is not required

SqlContext.TriggerContext


TriggerContext gives you access to:


ColumnCount


number of columns in the target table


EventData


SqlXml that returns the same as the
EVENTDATA T
-
SQL function


TriggerAction


a TriggerAction enumeration value
that specifies which action caused the trigger to fire


IsUpdatedColumn


method that determines if a
column was updated


same as the T
-
SQL UPDATE()
function


…but what about the inserted and deleted virtual
tables that are available in normal triggers?

Context Connection


To access data from within CLR code, the
special ‘context connection’ connection string is
used…

using (
SqlConnection

connection =


new
SqlConnection
(
"context connection=true"
))

{


connection.Open();


...

}


We can then run any SQL statement as we
would normally from .NET


The inserted and deleted virtual tables are
available through this method when in a DML
trigger

Stored Procedures


What is a stored procedure?


A procedure that contains a sequence of
operations, which may affect existing data,
return result sets or both


CLR Stored Procedures are no different


CLR Stored Procedures are implemented as
static members of a class


The containing class is irrelevant to the
execution of the procedure, and is an
organisational unit only

Hello World

public partial class

StoredProcedures

{


[Microsoft.SqlServer.Server.
SqlProcedure
]


public static void

HelloWorld()


{


SqlContext
.Pipe.Send(
"Hello World!"
);


}

}



Procedure is just a method, decorated with the SqlProcedure attribute.


Procedure is registered in SQL Server as follows:


CREATE PROCEDURE

[dbo].[HelloWorld]

AS EXTERNAL NAME

[SQLAssembly].[StoredProcedures].[HelloWorld]

GO

Simple Data Access


In part 1:


we will open a context connection


we will execute a command


we will read the results, and use those to populate a Dictionary
object


In part 2:


We will create record metadata


We will create a record set


We will populate the record set from the Dictionary


While a very mundane example, and not something you
would usually use the CLR for, it shows the basic
mechanics of moving data into and out of the CLR

Simple Data Access (1/2)

public static void

RowCounter()

{


Dictionary
<
string
,
int
> typeDictionary =
new

Dictionary
<
string
,
int
>();


using

(
SqlConnection

connection =
new

SqlConnection
(
"context connection=true"
))


{


connection.Open();


using

(
SqlCommand

command =
new

SqlCommand
(
"SELECT [type] from [sys].[objects];"
, connection))


{


using

(
SqlDataReader

reader = command.ExecuteReader())


{


while

(reader.Read())


{


string

type = reader.GetString(0);


if

(typeDictionary.ContainsKey(type))


{


typeDictionary[type] = typeDictionary[type] + 1;


}


else


{


typeDictionary.Add(type, 1);


}


}


}


}


}

...

Simple Data Access (2/2)

...


SqlDataRecord

record =


new

SqlDataRecord
(
new

SqlMetaData
[] {


new

SqlMetaData
("Type", SqlDbType.NChar, 2),


new

SqlMetaData
("Count", SqlDbType.Int)


});


SqlContext
.Pipe.SendResultsStart(record);


foreach

(
KeyValuePair
<
string
,
int
> kvp
in

typeDictionary)


{




record.SetString(0, kvp.Key);


record.SetInt32(1, kvp.Value);


SqlContext
.Pipe.SendResultsRow(record);


}


SqlContext
.Pipe.SendResultsEnd();

}

Things to note


That we did not create a new
SqlDataRecord object for every row, we
just populated the existing object


That the name exposed to SQL Server
does not have to match the name of the
implementing method


The return type of the stored procedure
can be int to return a value, as with T
-
SQL
stored procedures


Scalar Functions


What is a scalar function?


A statement or set of statements that return a
single value of a specific type based on zero
or more parameters


CLR Scalar Functions are no different


CLR Scalar Functions are implemented much
the same as stored procedures


a single
static method of a class

String Length Scalar Function

[Microsoft.SqlServer.Server.
SqlFunction
]

public static long

fn_LengthTestCLR(
string

input)

{


return

input.Length;

}


CREATE FUNCTION

[dbo].[fn_LengthTestCLR]


(@input nvarchar (
MAX
))

RETURNS

bigint

AS EXTERNAL NAME

[SQLAssembly].[UserDefinedFunctions].[fn_LengthTestCLR]

GO

Performance Comparison


Consider the equivalent T
-
SQL function, and the following SQL
Statement:


CREATE FUNCTION

dbo.fn_LengthTest(@input varchar(MAX))

RETURNS

[bigint]

AS

BEGIN


RETURN
LEN(@input)

END



SELECT

SUM
(dbo.fn_LengthTest(o.[name]))


FROM

[sys].[objects] [o]
CROSS JOIN



[sys].[all_columns] [ac]
CROSS JOIN



[sys].[all_columns] [ac1]

CLR would be slower, right?


No!



On a dual core 2GHz CPU



SQL Function


9.036 seconds


SQL CLR Function


3.918 seconds



Overhead of calling a SQL CLR function is
lower than that of calling a T
-
SQL function


Table Functions


What is a table function?


A statement or set of statements that return a single
result set with a defined schema based on zero or
more parameters


CLR Table Functions can only be multi
-
statement
equivalents, there is no concept of an ‘inline CLR
table function’


Implemented as a pair of methods


one that returns
an IEnumerable, and one that sets outbound
parameters for each object


both are static members



Regular Expression Function


In Part 1:


We will implement an object that we can return to
hold the data required to fill each row


In Part 2:


We will create a list of our storage object


We will run a regular expression match based on the
parameters


We will populate our list with the results of the regular
expression


We will return the list


In Part 3:


We will set the values to be stored in each row

A Regular Expression Function (1/3)

private struct

_match

{


public readonly int

MatchNumber;


public readonly int

GroupNumber;


public readonly string

CaptureValue;


public

_match(
int

matchNumber,


int

groupNumber,


string

captureValue)


{


MatchNumber = matchNumber;


GroupNumber = groupNumber;


CaptureValue = captureValue;


}

}

A Regular Expression Function (2/3)

[
SqlFunction
(FillRowMethodName =
"FillRow"
,


TableDefinition =
"MatchNumber int, "

+


"GroupNumber int, "

+


"CaptureValue nvarchar(MAX)"
)]

public static

IEnumerable

GetCaptureGroupValues(
string
input,


string
pattern)

{


List
<
_match
> matchList =
new

List
<
_match
>();


int

matchIndex = 0;


foreach

(
Match

m
in

Regex
.Matches(input, pattern))


{


int

groupIndex = 0;


foreach

(
Group

g
in

m.Groups)


{


matchList.Add(
new

_match
(matchIndex, groupIndex++, g.Value));


}


matchIndex++;


}


return

matchList;

}

A Regular Expression Function (3/3)

public static void

FillRow(
Object

obj,


out int

MatchNumber,


out int

GroupNumber,


out

SqlString

CaptureValue)

{


_match

match = (
_match
)obj;


MatchNumber = match.MatchNumber;


GroupNumber = match.GroupNumber;


CaptureValue = match.CaptureValue;

}

Creation SQL

CREATE FUNCTION

[dbo].[GetCaptureGroupValues]


(@input [nvarchar] (
MAX
),


@pattern [nvarchar] (
MAX
))


RETURNS TABLE



([MatchNumber] [int]
NULL
,


[GroupNumber] [int]
NULL
,


[CaptureValue] [nvarchar] (
MAX
)
NULL
)

AS EXTERNAL NAME

[SQLAssembly].[UserDefinedFunctions].[GetCaptureGroupValues]

GO



Again, the name exposed to SQL Server does not have to match the
name of the main body function


The SQL definition can specify different column names and
parameter names to the main method and fill row method




Things to note


That we used a struct for the storage object,
rather than a class, because of reduced
instantiation costs


That our function was decorated with a
SqlFunction attribute that named the fill row
method and specified the shape of the output
table


That the fill row method took an object as it’s first
parameter, and had an outbound parameter for
each of the columns we wanted to fill

CLR Triggers


Known bug in Visual Studio 2005 and
2008 that means SqlTrigger attribute does
not allow you to specify the schema of the
target table


Fixed in Visual Studio 2010


Workaround is to comment out the
SqlTrigger attribute, and deploy manually

DML Triggers


What is a DML Trigger?


A statement or set of statements run when
data in a target table is modified


CLR DML Triggers are no different


CLR DML Triggers still have access to the
inserted and deleted virtual tables


CLR DML Triggers are implemented as static
methods in a class

A Simple DML Trigger


We will specify the event type and target
object using the SqlTrigger attribute


We will connect to the database using the
context connection


We will select the rows from the inserted
virtual table, and count them


We will send a message to the SqlContext
pipe stating the number of rows

A Simple DML Trigger

[Microsoft.SqlServer.Server.
SqlTrigger
(Target=
"CallLog"
,


Event=
"FOR UPDATE"
)]

public static void

TestDMLTrigger()

{


int

i = 0;


using
(
SqlConnection

connection =
new

SqlConnection
(
"context connection=true"
))


{


connection.Open();


using

(
SqlCommand

command =
new

SqlCommand
(
"SELECT * from inserted;"
, connection))


{


using

(
SqlDataReader

reader = command.ExecuteReader())


{


while

(reader.Read())


{


i++;


}


}


}


}


SqlContext
.Pipe.Send(i.ToString() +
" rows in inserted"
);

}

DDL Triggers


What is a DDL Trigger?


A statement or set of statements run when a
schema modification event occurs


CLR DDL Triggers are no different


CLR DDL Triggers have access to the event
data via SqlContext.TriggerContext


CLR DDL Triggers are implemented as static
methods in a class

A Simple DDL Trigger


We will specify the target and the event
type using the SqlTrigger attribute


We will create an XmlDocument based on
the TriggerContext object from the
SqlContext object


We will then use XPath to find the created
object’s schema and name


We will send a message to the SqlContext
pipe stating the name of the created object

A Simple DDL Trigger

[Microsoft.SqlServer.Server.
SqlTrigger
(Target=
"DATABASE"
,


Event=
"FOR CREATE_TABLE"
)]

public static void

TestDDLTrigger()

{


XmlDocument

document =
new

XmlDocument
();


document.LoadXml(
SqlContext
.TriggerContext.EventData.Value);



string

objectName = document.SelectSingleNode(
"//SchemaName"
).InnerText +


"."

+


document.SelectSingleNode(
"//ObjectName"
).InnerText;



SqlContext
.Pipe.Send(
"Table "

+ objectName +

" was created."
);

}

Aggregate Functions


What is an aggregate function?


A function that takes a series of values and
accumulates them into a single, aggregated result


CLR Aggregate functions are no different


Implemented as a class with four methods:


Init


Accumulate


Merge


Terminate


Aggregate functions in SQL Server 2005 can take
only one parameter, in SQL Server 2008 this
restriction is lifted

Aggregate Function Methods


Init method is called first, here we initialise
our member fields


Accumulate is called to add a value to the
aggregate


Merge is called to add together two
aggregates (for example, as a result of
parallelism)


Terminate is called to return our value


A Simple Aggregate


We will use the SqlUserDefinedAggregate
attribute to specify the serialisation format of our
aggregate


We will track the minimum value in Accumulate,
and also track that we have seen values


In Merge, we will check if the other aggregate
class has seen values
-

if it has, then we take
the minimum value seen


In Terminate we will return the minimum value
we have seen if we have seen any values, or
NULL if we have not seen any values

A Simple Aggregate (1/2)

[
Serializable
]

[Microsoft.SqlServer.Server.
SqlUserDefinedAggregate
(
Format
.Native)]

public struct

MinValue

{


int

minValue;


bool

hasValues;



public void

Init()


{


minValue =
Int32
.MaxValue;


hasValues = false;


}



public void

Accumulate(
SqlInt32

Value)


{


hasValues =
true
;


if

(Value.Value < minValue)


{


minValue = Value.Value;


}


}

...

A Simple Aggregate (2/2)

...


public void

Merge(
MinValue

Group)


{


if

(Group.hasValues)


{


if

(Group.minValue < minValue)


{


minValue = Group.minValue;


}


hasValues =
true
;


}


}



public

SqlInt32

Terminate()


{


if

(hasValues)


{


return

minValue;


}


else


{


return

SqlInt32
.Null;


}


}

}

SqlUserDefinedAggregate


This attribute contains parameters which can
affect the query optimiser, and can cause
incorrect results if set wrong:


IsInvariantToDuplicates


should only be true if
aggregating the same value many times does not
affect the result


IsInvariantToNulls


should only be true if aggregating
NULL values does not affect the result


IsInvariantToOrder


should only be true if the result
is not dependent on the order of the aggregated
values


IsNullIfEmpty


should only be true if the aggregate of
0 values is NULL

Native and UserDefined
Serialisation


If your aggregate contains reference types then you
must implement custom serialisation, and specify
Format.UserDefined in the SqlUserDefinedAggregate
attribute


When specifying UserDefined format, your aggregate
must implement the IBinarySerialize interface, and
specify it’s maximum size in the
SqlUserDefinedAggregate attribute


Maximum byte size can be 1 to 8000 in SQL Server
2005, or
-
1 for any value between 8001 bytes and 2 GB
in SQL Server 2008.


The same serialisation interface applies to both
Aggregates and User Defined Types

CLR Types


What is a user defined type?


A type based on a system type, which may be
bound to old
-
style rules and defaults


CLR Types
are

different


CLR Types can be complex, and can have
properties and methods


CLR Types can have static methods (and the
built in CLR Types in SQL Server 2008 have
some)


A Simple CLR Type


Part 1:


We will define the serialisation format of the type


We will provide the IsNull property and the Null static property to
return a NULL instance


Part 2:


We will define methods to convert the type to and from strings


Part 3:


We will define a property for the symbol, a method to change the
symbol and a static method to create a new currency with a
defined symbol and value


Part 4:


We will declare our member fields and provide implementation of
the IBinarySerialize interface to load and store our type

A Simple CLR Type (1/4)

[
Serializable
]

[Microsoft.SqlServer.Server.
SqlUserDefinedType
(
Format
.UserDefined,


MaxByteSize=24)]

public struct

Currency

:
INullable
,
IBinarySerialize

{


public bool

IsNull


{


get


{


return

_null;


}


}



public static

Currency

Null


{


get


{


Currency

h = new
Currency
();


h._null = true;


return

h;


}


}

A Simple CLR Type (2/4)


public override string

ToString()


{


return

_symbol + _value.ToString();


}



public static

Currency

Parse(
SqlString

s)


{


if

(s.IsNull ||


string
.IsNullOrEmpty(s.Value) ||


s.Value.Length < 2 ||


string
.Equals(s.Value, "NULL",


StringComparison
.OrdinalIgnoreCase))


{


return

Currency.Null;


}


Currency

u =
new

Currency
();


u._symbol = s.Value[0];


u._value =
Decimal
.Parse(s.Value.Substring(1));


return

u;


}

A Simple CLR Type (3/4)


public char

Symbol


{


get


{


return _symbol;


}


}



[
SqlMethod
(IsMutator=
true
)]


public void

ChangeSymbol(
char

newSymbol)


{


_symbol = newSymbol;


}



public static

Currency

CreateCurrency(
char

symbol,
Decimal

value)


{


Currency

c =
new

Currency
();


c._symbol = symbol;


c._value = value;


return

c;


}

A Simple CLR Type (4/4)


private char

_symbol;


private

Decimal

_value;


private bool

_null;



public void

Read(
BinaryReader

r)


{


_null = r.ReadBoolean();


_symbol = r.ReadChar();


_value = r.ReadDecimal();


}



public void

Write(
BinaryWriter

w)


{


w.Write(_null);


w.Write(_symbol);


w.Write(_value);


}

}

The CLR Type in use


Using our static method to create an instance:

DECLARE

@i [Currency]

SET

@i = [Currency]::[CreateCurrency](
'$'
, 0.53)

PRINT

CONVERT
([varchar], @i)



Using our instance mutator method to change the symbol:

DECLARE

@i [Currency]

SET

@i = [Currency]::[CreateCurrency](
'$'
, 0.53)

SET

@i.ChangeSymbol(
'£'
)

PRINT

CONVERT
([varchar], @i)



Retrieving the value of a property:

DECLARE

@i [Currency]

SET

@i = [Currency]::[CreateCurrency](
'$'
, 0.53)

PRINT

@i.Symbol


Things to note


We can call static methods and properties on CLR types
with [type
-
name]::[member
-
name]


IBinarySerialize must be used to serialise types that
cannot be serialised natively


A CLR type should always be able to parse it’s own
string representation


Methods that change the state of the type must be
marked as mutator methods


Although in .NET we can define operator overloads to
specify how operators (e.g. +,
-
, *, /) are applied to our
classes, SQL Server does not use these.

Permission Sets


Each assembly in SQL Server has an
associated ‘permission set’


one of:


SAFE


EXTENAL_ACCESS


UNSAFE


The permission set assigned to the assembly
determines how limited the CLR functionality is


This, in turn, determines how much damage we
can do to the stability of the host process (i.e.
SQL Server itself)

Allowed operations

SAFE

EXTERNAL_ACCESS

UNSAFE

Code access security
permissions

Execute only

Execute + access to
external resources

Unrestricted

Programming model
restrictions

Yes

Yes

No

Verifiability
requirement

Yes

Yes

No

Local data access

Yes

Yes

Yes

Ability to call native
code

No

No

Yes

Programming model restrictions


A few of the programming model restrictions for SAFE
and EXTERNAL_ACCESS assemblies are as follows:


No static fields may be used to store information


PEVerify type safety checking test is passed


No synchronization may be used


Finalizer methods may not be used


No threading may be used


No self
-
affecting code may be used


There are more requirements that must be met under
the above two permission sets


the above is just
some of the more common restrictions.

Visual Studio Database Projects


Visual Studio offers us the ability to create
a ‘CLR Database Project’ which enables
simplified deployment and debugging


Some issues present, though these are
mostly solved in VS2010


Database projects handle the deployment
of the objects that we create simply, which
can be really helpful during
implementation

Performance Example

Phil Factor Speed Phreak Competition 4

The Log Parsing Problem


Producing aggregated results from an IIS log file


Made difficult by the need to determine
‘sessions’


i.e. groups of rows from the same IP
within a set time
-
span


Requirement to produce three aggregated result
sets


visitor summary by day, visitor summary
and page summary by day

How did the CLR help?


CLR methods can yield excellent results when
doing forward
-
only navigation through result
sets, performing aggregation and calculation
along the way


Iteration over the log allowed the data to be
parsed into ‘visits’ by applying the session
timeout


Iteration over collection of visits allowed the data
to be aggregated into country and page
summaries

What was the result?


Execution in 301ms, as opposed to the
closest T
-
SQL approach which took
1779ms.


Efficiency gains through the fact that the
CLR procedure would only occupy 1
thread


Maintenance benefits in readability and
simplicity


Maintenance Example


CLR Procedure to retrieve the free disk space
on any specified server


by Tara Kizer


Uses System.Diagnostics namespace to
interface with WMI to retrieve results


Requires UNSAFE assembly permission set
because
PerformanceCounterCategory

is
synchronised


Simple use of framework classes to retrieve the
information required


When should we use the CLR?


To improve performance:


In situations where a forward
-
only scan over a result set requires
row by row processing to produce an aggregated result set


In situations where complex logic cannot simply be expressed in
T
-
SQL


To improve maintenance:


In situations where information required can be simply obtained
from any .NET process


In situations where interaction with the O/S is required (will
require elevated permission set)


For functionality:


In situations where commonly available .NET Framework
classes provide desired functionality (classic example is Regular
Expression handling


To bring business logic from the middle tier into the database

And when should we not?


When the functionality presented offers little over
it’s T
-
SQL equivalent


Instead of regular DML statements (i.e. a CLR
procedure to insert a row is not the way to go!)


To replicate row by row update logic from a
cursor


conversion to set
-
based T
-
SQL yields
far higher performance


When the desired operation really does not
belong in the database


When the database has to be portable to other
types of database server

Summary


We learned:


How to implement each type of CLR object


How to get data out of and back into SQL Server


That the only SQL CLR object type which differs
significantly from it’s T
-
SQL equivalent is the CLR
User Defined Type


That the CLR can offer significant performance
improvements, particularly for forward
-
only scans and
scalar functions


That we are restricted as to what we can achieve in a
CLR method based on the permission set of the
assembly