Data Access Methods (ODBC, ADO, OLEDB and ADO.NET)


ODBC was, and is, one of the oldest technologies. It was developed by Microsoft and is probably one of the few Microsoft technologies to become popular without being questioned; it is widely seen as an "open" standard. ODBC was limited in its functionality and provided low-level database access. It worked in a two-layer manner, with an ODBC manager and ODBC drivers for the various data sources. Any database vendor could write an ODBC driver for its product and rest assured it would work the same way as every other driver. It was one of Microsoft's biggest hits and is still regarded as the de facto standard data access mechanism.

ADO came later, added more features, and made accessing data much easier by implementing a mature object model. ADO was a mature data access method, built on the experience gained from the DAO and RDO implementations.

OLEDB came even later. It was fully COM based (COM+ to be precise). DAO had become rather complex, and data access object models came full circle when OLE DB arrived: it implemented a rather simple object model (one that looked more like JDBC).

ADO.NET is a later version of ADO, developed with the .NET computing framework in mind. ADO.NET is very different (and supposedly much more efficient).

I guess you could say ODBC was version 1 of MS's quest for universal data access. It involves a standard API that uses "drivers" to access data. Depending on the driver used, it could be just a wrapper to a SQL DBMS client library, or it could actually have data parser functions built in (thus making it more of a data engine itself). If the database is just a file, then the driver was likely more of an engine, whereas if the database was a server like Oracle, the driver was more or less a wrapper for the DBMS client library.

OLEDB was like version 2. OLEDB was a rethink that included more features, better performance, and a more modular organization of the API. You could access data sources using ODBC, using OLEDB directly, or using ODBC wrapped in the OLEDB API. MS started to obsolete ODBC by coding the MS SQL management tools to use only the OLEDB interface instead of ODBC.

ADO is an API for accessing and manipulating data gathered through an OLEDB or ODBC connection. It doesn't directly access data (well, maybe text/xml).

ADO.NET is like version 3 and is quite different from ADO v2.x. ADO.NET Data Providers are, again, a new take on universal data access; they access data sources directly to gain performance advantages. ADO.NET also makes good use of XML standards to transmit and cache data more efficiently. It uses a connectionless metaphor, so ADO.NET providers use sessionless connections: they get the data and don't maintain a session on the server.
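A minimal sketch of that connect-query-disconnect pattern in C#, assuming a local SQL Server and a Northwind-style Customers table (the connection string and column names are illustrative):

using System;
using System.Data.SqlClient;

class QuickQuery
{
    static void Main()
    {
        // Open late, close early: no session is kept on the server between calls.
        string connString = "Server=(local);Database=Northwind;Integrated Security=SSPI";
        using (SqlConnection conn = new SqlConnection(connString))
        using (SqlCommand cmd = new SqlCommand("SELECT CustomerID, CompanyName FROM Customers", conn))
        {
            conn.Open();
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                    Console.WriteLine("{0}: {1}", reader["CustomerID"], reader["CompanyName"]);
            }
        }   // the connection is closed and released back to the pool here
    }
}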

Data Access API of the Day, Part I

Why does it seem like Microsoft is coming up with a new Data Access API every time you turn around? First there was DbLib, then ODBC, then DAO, RDO, ODBCDirect, OLEDB, ADO, ADO.NET -- when is Microsoft going to run out of letters? And now what's this LINQ thing? And what about these "Entities" someone mentioned the other day? And who is the person writing this BLOG?

I'll answer the last question first. My name is Mike Pizzo, and I'm an Architect in the Data Programmability team at Microsoft -- yes, the same team which brought you that "alphabet soup" of Data Access interfaces. In fact, I've been in the Data Programmability Team (under various names) for better than 15 years, and personally contributed to each of those APIs (with the exception of DbLib).

Following is a short history of the evolution of Microsoft Data Access APIs, in four parts (so far…):

Part I -- The Early Years (ODBC, DAO, RDO, ODBCDirect)
Part II -- Componentizing Data Access (OLE DB)
Part III -- Disconnected Programming in a Managed Environment (ADO.NET)
Part IV -- Programming at the Conceptual Model (ADO.NET Entities and LINQ)

My goal is to describe the industry trends, platform shifts, and usage scenarios that shaped our API landscape over the past 15 years (and, in the retelling, perhaps receive some absolution for my sins…)

So, without further "ado"…

Part I -- The Early Years…

First, let me apologize. I know -- Microsoft knows -- that changing something as fundamental as the way your program accesses and works with data imposes a tremendous cost in terms of existing codebase, tools, components, and expertise. Every new option increases the decision matrix for writing new programs. I hate the very thought of mentioning a new data access API to customers. Some days I wish we could have stayed with just ODBC all of these years.

But what if we had? What if we had simply extended ODBC to support building a federated database out of individually reusable components? What if we had extended ODBC to support a disconnected programming model? What if we had extended ODBC to work over XML? What if we had made ODBC work with CLR objects?

The answer is: it wouldn't be ODBC. ODBC (which stands for Open DataBase Connectivity) was designed for a specific purpose: to be a common Call Level Interface (CLI) to a relational database. Through several years, many standards meetings, and a good deal of excellent Indian food (a favorite of Jim Melton's, the editor of the ANSI SQL specification) we even made ODBC into a SQL/CLI addendum to the SQL-92 specification.

In our ODBC 2.0 design preview back in the early '90s I presented a set of proposed ODBC ISAM (Indexed Sequential Access Method) extensions for working with sets of indexed data that didn't support query processing. The idea was that we could provide a common query processor over non-relational sources like text, Excel, dBase, and Paradox files. The feedback we got was positive -- that we should support ISAMs -- but unequivocal that it NOT be part of ODBC. ODBC's strength, we were told, was its direct tie to relational stores. This feedback totally surprised us, but in retrospect they were absolutely right, and I believe that's why ODBC is still popular today.

And then something happened. Visual Basic became popular as a scriptable "automation language". ODBC, being a C-style interface, was not directly consumable from VB. However, some of you clever folks figured out that Microsoft Access supported executing queries against ODBC data sources, and that Access did support scriptable automation through its Data Access Object (DAO) API. Voilà! Now you could write applications against ODBC sources using VB.

However, DAO went through Access's internal "Jet" (Joint Engine Technology) database engine, which defaulted to building local keysets for each result in order to do advanced query processing and cursoring against the remote data. This was fine if you needed that functionality, but it added significant performance overhead and extra round trips when you didn't.

Enter the Visual Basic team who, responding to customer demand for better performance against ODBC sources, came up with something called Remote Data Objects (RDO). RDO implemented the same DAO programming patterns directly against ODBC, rather than going through Jet. RDO was extremely popular among VB developers, but the fact that we had two different sets of automation objects for accessing ODBC sources caused confusion.

But apparently not enough confusion, because our solution was to introduce "ODBCDirect". Despite its name, ODBCDirect was not a new API; it was just a mode we added to DAO that set defaults in such a way as to avoid the overhead of building keysets and such.

And then something happened. The industry made a big push toward object-oriented, distributed, componentized software. With the Object Management Group (OMG) pushing their Common Object Request Broker Architecture (CORBA), Microsoft needed first-class support for data access for its Component Object Model (COM)…

Next: Part II -- Componentizing Data Access (OLE DB)

Welcome to Part II of Data Access API of the Day, a brief history of the evolution of Microsoft's Data Access APIs.

In Part I we looked at ODBC as Microsoft's C-based relational API, and at the DAO and RDO automation interfaces that made relational databases exposed through ODBC available to languages like VB.

After discarding the idea of compromising ODBC's relational heritage by grafting on support for non-relational stores and components, we looked to Microsoft's Component Object Model (COM). Our goals for building reusable data access components seemed to fit in well with COM's componentized architecture, and OLE DB was born.



[In retrospect, "OLE DB" -- which stood for "OLE DataBase" -- wasn't that great of a name; it didn't really have anything to do with "OLE" (which originally stood for "Object Linking and Embedding", but became a generic term for COM components), and its main design premise was to work with all types of data, not just data from a database. I used to joke that, other than the "OLE" and the "DB", the name was just fine. There was also the fact that, at the time, the Microsoft Word spellchecker suggested "OddBall" as a replacement for "OLEDB", but even this didn't seem so bad given that the suggested alternative for our competitor "IDAPI" was "Diaper Pail"… But I digress…]

In addition to being a first-class COM-based interface for working with data, OLE DB was designed to support the concept of a "federated" (or componentized) database. The idea was to define a common base "Rowset" interface that could be used to represent a set of "tuples" (rows) regardless of where they came from or what they represented -- a query result, a base table or view, an ISAM, text, or Excel file, the registry, email, directory services, etc. OLE DB was even intended to be a part of "Cairo OFS" -- Microsoft's first foray into an object file system. On top of this base Rowset interface, you could use COM's QueryInterface to detect whether additional functionality (for example, scrolling, updating, or index-based navigation) was supported.



Given this common data representation, we could build common components to add functionality against any type of data, such as a common indexing engine, a common cursoring component, or a common query processor that could provide database functionality on top of a multitude of both relational and non-relational data sources. The fact that the representation of the source data matched the representation of the final results provided performance benefits, as well as a common model for code and tools to work with sets of data.

OLE DB's ability to expose common interfaces over a multitude of data source types made it particularly appealing to distributed services -- in fact, Microsoft SQL Server today uses an OLE DB Rowset representation as its internal interface between the Relational Engine and the Storage Engine, as well as to support distributed queries among SQL Server and other heterogeneous data sources, and to support full-text indexing of SQL Server content through the Microsoft Search engine.

Like ODBC, OLE DB was a "low level" interface that used pointers, explicit memory management, and explicit lifetime control. And like ODBC, it required a separate wrapper API to make it available (and suitable) to automation languages like Visual Basic. So we rearranged the letters of DAO, quickly rejected "DOA", and came up with "ADO". "ADO" initially stood for "ActiveX Data Objects" (the "X" was silent, but had to be there due to a copyright issue) but eventually we dropped the ActiveX altogether, and it just became ADO.

ADO followed the same Connection/Command/Recordset model as DAO, but integrated technology from the acquired Fox database team for an advanced local cursor engine. This "Disconnected Recordset" exposed the same ADO Recordset model as a forward-only cursor over a stream of results from the database, and properties on the Recordset were used to determine whether the results were scrollable or updatable, and what type of isolation the results had from changes made to the store. The Disconnected Recordset used metadata from the original query to generate insert, update, and delete statements to propagate local changes back to the store.

Together, ADO and OLE DB made up what we called "Microsoft's Universal Data Access".

And then something happened. Recognizing that we were losing developer mindshare to Java (COM in its original form was complex, and things like reference counting, memory handling, etc. were hard to get right), Microsoft embarked upon an ambitious project to develop a new, language-independent, managed framework for writing applications (the .NET Framework) along with a new programming language (C#). One of the strengths of the new platform was that the various components and APIs of the Framework were designed and reviewed as a cohesive unit, rather than as a semi-related set of technologies shipped by different isolated teams.

Welcome to Part III of Data Access API of the Day, a brief history of the evolution of Microsoft's Data Access APIs.

In Part I we looked at ODBC as Microsoft's C-based relational API, and at the DAO and RDO automation interfaces that made relational databases exposed through ODBC available to languages like VB. In Part II we looked at OLE DB as Microsoft's first-class data access API for componentized data access within Microsoft's Component Object Model (COM) environment.

With the introduction of a common, coherent, language-independent, managed framework for writing applications (the .NET Framework), Microsoft (again) asked the question: "How do we provide first-class support for data?"

In August of 1998 a group of architects from Microsoft met in Semiahmoo, WA, for a "Beachside Offsite for Lightning Technology" ("BOLT" -- pretty clever, huh?). "Lightning" was the code name for what at the time was called "COM3" (until we realized that naming a directory "COM3" caused strange things to happen in DOS) and eventually became the .NET Framework. I attended the offsite as a representative for data access, with the hope of re-using much, if not all, of the existing ADO/OLE DB architecture, and spent the next year pushing us not to reinvent the wheel.

The truth was, though, that the world had changed since we did ADO and OLE DB. With the growing popularity of the Internet (thanks, Al…), disconnected programming was becoming more important and prevalent in the industry, and XML was becoming an increasingly popular way to work with data. ADO.NET addressed these new challenges by making an explicit separation between connected data access (through provider-specific "Data Providers") and disconnected data access (through a common in-memory "DataSet"), with an explicit, extensible "DataAdapter" mechanism for bridging the two.
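A minimal sketch of that separation in C#, assuming the SQL Server data provider and an illustrative Customers table: the SqlDataAdapter is the bridge between the connected provider and the disconnected DataSet.

using System;
using System.Data;
using System.Data.SqlClient;

class DataSetBridge
{
    static void Main()
    {
        string connString = "Server=(local);Database=Northwind;Integrated Security=SSPI";

        // Connected side: a provider-specific adapter built around a SELECT command.
        SqlDataAdapter adapter = new SqlDataAdapter(
            "SELECT CustomerID, CompanyName FROM Customers", connString);

        // Disconnected side: Fill opens the connection, runs the query,
        // loads the rows into the DataSet, and closes the connection again.
        DataSet ds = new DataSet();
        adapter.Fill(ds, "Customers");

        // Work offline against the in-memory copy.
        foreach (DataRow row in ds.Tables["Customers"].Rows)
            row["CompanyName"] = ((string)row["CompanyName"]).Trim();

        // Push accumulated changes back; the command builder derives the
        // INSERT/UPDATE/DELETE statements from the original SELECT.
        SqlCommandBuilder builder = new SqlCommandBuilder(adapter);
        adapter.Update(ds, "Customers");
    }
}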

The separation between connected and disconnected data access was in sharp contrast to ADO/OLE DB, which attempted to abstract away remote access and data source functionality differences. While ADO's model of having a single Recordset object that could be either forward-only or scrollable, and could represent local or remote data, seemed like a nice simplification, in practice differences in functionality, latency, memory usage, and where errors could occur made it difficult to hide the details of such diverse implementations under a common façade. One of the lessons we learned from OLE DB, and DCOM in general, was that even if you made the interfaces look the same, such differences fundamentally affected the way applications behaved. To build reliable, responsive applications, remote access needed to be done asynchronously and in a non-blocking fashion, which meant knowing when an operation was (potentially) remote.

Although we attempted to preserve as much of the ADO programming paradigm as possible in ADO.NET (for example, the connection/command/result model for providers), the move to ADO.NET was dramatic for most programmers. The first response from beta testers was almost universally "What happened to my ADO?" And then, as they started to use the new ADO.NET, there was a gradual realization of the power that the explicit separation between connected and disconnected provided, and with it a feeling of "I could never go back…"

Each of these evolutions was driven by a major platform shift: a client interface for talking to relational stores; scripting and RAD (rapid application development) scenarios; COM/DCOM; a cohesive .NET Framework. And in each case, we strove to provide the right API for the new platform while balancing retaining concepts from previous APIs against taking into account lessons learned and changes in the industry, including federation/componentization of the store, XML, disconnected programming paradigms, etc.

So what's ADO.NET Entities and LINQ?

Next: Part IV -- Programming to the Conceptual Model

Welcome to Part IV of Data Access API of the Day, a brief history of the evolution of Microsoft's Data Access APIs.

In Part I we looked at ODBC as Microsoft's C-based relational API, and at the DAO and RDO automation interfaces that made relational databases exposed through ODBC available to languages like VB. In Part II we looked at OLE DB as Microsoft's first-class data access API for componentized data access within Microsoft's Component Object Model (COM) environment. In Part III we looked at the introduction of ADO.NET, a managed API for the .NET Framework that revolutionized the relationship between connected data access and working with disconnected sets of data, and started to rationalize the relationship between relational data and XML.

ADO.NET Entities builds upon our mutual investment in ADO.NET by adding the ability to write applications against a rich conceptual "Entity Data Model" schema, rather than a flat relational database schema. The Entity Data Model (EDM) extends the relational data model with Entity-Relationship (ER) constructs for modeling real-world concepts such as Inheritance (cars and trucks are vehicles), Relationships (customers have orders), and complex members (street, city, region, and postal code composed as a single "address" property within a customer).


An extended-SQL grammar called "Entity SQL" allows you to query your conceptual schema directly, leveraging inheritance, accessing complex members, and navigating relationships. In many cases, building these concepts into the conceptual schema and query language removes the need for complex joins, unions, and subqueries to do conceptually simple operations.

These rich conceptual schemas are exposed and queried through an "Entity Client". The Entity Client is an ADO.NET Data Provider that builds queries against storage-specific providers using client-side read/write "views". Queries and updates written against these conceptual views are expanded by the Entity Client and executed as queries against the underlying storage-specific providers. All the actual query execution is done in the store (not on the client), and the results are assembled into possibly hierarchical, polymorphic results with nesting and composite members. This separation between the conceptual model that the application targets and the storage schema of the database is an extremely powerful concept that we believe will greatly simplify the authoring and maintenance of database applications.
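A hedged sketch of what an Entity SQL query through the Entity Client looks like, assuming a named connection string "NorthwindEntities" for an illustrative model with a Customers entity set (all names here are illustrative; the EntityClient provider requires a sequential-access reader):

using System;
using System.Data;
using System.Data.EntityClient;

class EntitySqlSketch
{
    static void Main()
    {
        // "name=" points at a connection string defined in the application config (illustrative).
        using (EntityConnection conn = new EntityConnection("name=NorthwindEntities"))
        {
            conn.Open();

            // Entity SQL targets the conceptual schema, not the storage schema.
            string esql =
                "SELECT c.CustomerID, c.CompanyName " +
                "FROM NorthwindEntities.Customers AS c WHERE c.City = 'London'";

            EntityCommand cmd = new EntityCommand(esql, conn);
            using (EntityDataReader reader = cmd.ExecuteReader(CommandBehavior.SequentialAccess))
            {
                while (reader.Read())
                    Console.WriteLine(reader["CompanyName"]);
            }
        }
    }
}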

Exposing client views through an ADO.NET Data Provider allows us to retain the familiar ADO.NET programming model, leveraging investments in code, tools, and knowledge built around ADO.NET. The fact that the Entity Client consumes existing ADO.NET Data Providers (extended to support a new canonical query tree representation) builds on the growing community of popular, as well as custom, ADO.NET Data Providers.

Many of the constructs we added to the Entity Data Model also exist in popular object-oriented programming environments, including the .NET Framework. This is not by accident.

Since version 1.0, customers have sought an Object/Relational solution within the .NET Framework. Microsoft has made several attempts at an O/R solution for .NET, most notably "ObjectSpaces" which, though never released, was premiered as a "technical preview" at the PDC where we launched the .NET Framework, and at pretty much every conference thereafter.

Like most O/R solutions today, ObjectSpaces attempted to support a rich set of mappings and scenarios through custom query generation. Adding support for a new type of inheritance mapping, for example, meant adding code to a query generator to insert the necessary join conditions in all the right places. Understanding how this new construct composed with other joins, projections, unions, and predicates added throughout the query to model other object-like concepts made the code complex and somewhat brittle. Trying to understand how to generate updates against such complex queries, or whether such updates were even possible, was even more difficult. Worst of all was trying to verify that all possible combinations of the constructs composed into a query were handled correctly.

ADO.NET Entities takes a different approach. By modeling a rich conceptual schema through client-side query and update views, the Entity Client leverages the significant investment and research that has gone into relational database view theory. Updating, for example, is done by applying well-defined view maintenance techniques to the update views in order to produce a set of delta expressions that are combined with the query views to produce update expressions. The resulting query views and update processing are both composable and verifiable.

For those that prefer to work with data as strongly typed CLR objects rather than untyped result records, ADO.NET Entities includes "Object Services", which builds on top of the Entity Client and allows you to query and retrieve results in terms of application "Data Classes" whose identity and changes are managed for you.

So what about LINQ?

"LINQ" stands for "Language INtegrated Query". As the name implies, LINQ integrates query concepts directly into the programming languages, enabling data access code to be type-checked by the language compiler, and developer tools like IntelliSense to make it easier for developers to write queries. This, along with higher-level conceptual models (like Entities), contributes to reducing the impedance mismatch between applications and data.
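For illustration, a minimal LINQ query over an in-memory collection; the same query syntax is checked by the C# compiler whether the source is objects, XML, or (through LINQ to Entities) a database:

using System;
using System.Linq;

class LinqSketch
{
    static void Main()
    {
        string[] cities = { "London", "Seattle", "Madrid", "Sao Paulo" };

        // The compiler type-checks the range variable, the predicate, and the projection.
        var longNames = from city in cities
                        where city.Length > 6
                        orderby city
                        select city.ToUpper();

        foreach (string name in longNames)
            Console.WriteLine(name);
    }
}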

LINQ is supported as a first-class citizen within the ADO.NET Entity Framework through "LINQ to Entities". LINQ to Entities is part of the Object Services layer, which enables you to build queries through strongly typed language expressions and built-in query comprehensions (in C# and VB) as well as textual Entity SQL statements. This means that the same conceptual client views are available through existing ADO.NET provider programming patterns or through consuming Object Services using either ad-hoc textual queries or integrated language queries.
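A hedged sketch of both Object Services styles, assuming a generated object context named NorthwindEntities with a Customers entity set and a Customer data class (these names are illustrative, not from the article):

using System;
using System.Data.Objects;
using System.Linq;

class ObjectServicesSketch
{
    static void Main()
    {
        using (NorthwindEntities context = new NorthwindEntities())
        {
            // Ad-hoc textual query: Entity SQL executed through Object Services.
            ObjectQuery<Customer> byEsql = context.CreateQuery<Customer>(
                "SELECT VALUE c FROM NorthwindEntities.Customers AS c WHERE c.Country = 'UK'");

            // Integrated language query: LINQ to Entities over the same conceptual model.
            var byLinq = from c in context.Customers
                         where c.Country == "UK"
                         select c;

            // Both queries execute only when enumerated; returned objects are
            // identity-managed and change-tracked by the context.
            foreach (Customer c in byLinq)
                Console.WriteLine(c.CompanyName);
        }
    }
}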

ADO.NET Entities, along with LINQ, will be featured in the upcoming February Orcas CTP. In the meantime, more on the ADO.NET Entity Framework can be found here, as well as in this Channel 9 video.

It's been interesting for me to see our data access APIs evolve over the years, from ODBC, a C-level interface for accessing data in a SQL database that is still popular today, to a broad Entity Framework that supports modeling data as rich conceptual objects, querying through a common extended SQL grammar or query constructs embedded within the language, and interacting with the data as business objects with identity management and change tracking.

What's next for Microsoft in database APIs? Time will tell, but the bet on the Entity Framework, and the Entity Data Model in particular, is big. You can expect to see more and more services within SQL Server, as well as technologies throughout the company, embrace and leverage the Entity Data Model as the natural way to describe data in terms of real-world concepts.


Although I don’t admit it to many people, after almost 20 years at Micr
osoft I still find
working with data interesting, and I look forward to continuing the journey with you, wherever it
may take us.

Mike Pizzo

Architect, Data Programmabilty


Evolution of the SQL Server Data Access Model

Applies to:
Microsoft® SQL Server™
Microsoft® ADO.NET

Summary: Get an annotated overview of the data access layers available today that work with SQL Server, including ADO, the XML interface of SQL Server 2000 (SQLXML), ADO.NET, and the upcoming ADO.NET 2.0.

Contents

The Road to OLE DB and ADO
With Further ADO
ADO.NET
Working with SQL Server Through XML
The SQLXML Managed Library
What Will Be New (and Hot) With Visual Studio 2005
Conclusion

In the past few years, several programming models for Microsoft® SQL Server™ databases have followed one upon another without a break. The features provided by the underlying database management system (DBMS) changed over time, catching up with users' requirements and the overall goal of designing a more powerful and successful product. Likewise, the associated data access models evolved to incorporate newly released features, while still providing an easy and increasingly abstract way for developers and architects to build an effective data access layer in end applications.

I worked on my first Windows® project in the early 1990s. Our application needed some sort of database support. In itself, the database wasn't a critical part of the application, and there was no reason for choosing, say, Microsoft® Access over Paradox, or SQL Server™ instead of Oracle. We just needed a common programming interface and the ability to read, write, and filter a few records. We ended up adopting Open Database Connectivity (ODBC) as the emerging standard for open database connectivity. It wasn't an obvious choice, though. It was perfect for our purposes because we could afford it. The ODBC API, in fact, was designed for C/C++ programmers. It was great for us, just not for everybody.

ODBC was the first attempt to normalize access to a variety of DBMS systems through a common API. Before that, programmers were unfairly expected to have an intimate knowledge of the database's nuts and bolts. Programmers had to be in sync with the specialties and idiosyncrasies of the particular SQL dialect and the underlying architecture design. Although not perfect on its own, ODBC showed the right way to go.

In this article, I'll provide an annotated overview of the data access layers available today to work with SQL Server. I'll focus on ActiveX Data Objects (ADO), the XML interface of SQL Server 2000, and ADO.NET, but I won't forget to add a word or two on the future, specifically ADO.NET 2.0.

The Road to OLE DB and ADO

ODBC was designed as a C-oriented application programming interface (API), but the advent of the Component Object Model (COM) and rapid application development (RAD) tools like Microsoft® Visual Basic® urgently posed the problem of finding a COM-oriented alternative. More or less at the same time, Access and Visual Basic developers were enjoying the Data Access Object (DAO) object model bundled with Microsoft Access, and leveraging the Jet database engine. DAO was first enhanced to call into DBMS systems such as SQL Server and Oracle through ODBC. Basically, DAO acted as a proxy between code and the ODBC API. This model introduced some overhead, though, because the resultset had to be converted to the DAO object model to really become usable. To work around the issue, Microsoft first introduced RDO and then ODBCDirect. Remote Data Objects (RDO) is a COM object model that exposes the functionalities of the ODBC model. It is not richer or more powerful than ODBC; it is simply a COM wrapper for ODBC designed to make ODBC accessible from within script, Web, and Visual Basic applications. ODBCDirect, instead, was created to speed up existing DAO applications calling into SQL Server. ODBCDirect is a sort of DAO working mode; when enabled, it uses RDO and ODBC instead of Jet to get and set data on SQL Server databases.

However, the first significant innovation in data access after ODBC was OLE DB. OLE DB was introduced in 1998 as part of the Visual Studio® 6.0 platform. OLE DB is the programming side of what was then called the Universal Data Access (UDA) vision. The inspiring principle of UDA was making any data storage accessible through a common set of COM interfaces. Basically, it is the same abstract idea behind ODBC, but reworked and refined to fit into the most popular and successful programming platform of that time, the Component Object Model.

Once again, the model came with a few imperfections at first. Being fully COM-based, the consumer/provider model of OLE DB was relatively easy to implement in C++ applications, but proved impossible to follow in Visual Basic and Active Server Pages (ASP) applications. Both Visual Basic and ASP components are unable to work with low-level COM interfaces, not to mention that the overall programming model is extremely sophisticated and requires a good deal of work even for simple operations.

ADO is the COM library created to make the OLE DB model accessible to all programming environments. Like RDO for ODBC, it sits between the application and the low-level data access API and mediates operations, ensuring that the stream of data retrieved from the DBMS is exposed to callers in an appropriate and convenient format.

With Further ADO

The OLE DB programming model consists of two interacting entities, the consumer and the provider. Both are implemented as COM objects and are required to expose a fixed number of interfaces to guarantee interoperability. The provider wraps the functionalities of a given data storage, and makes those functionalities externally available through the suite of interfaces designed for OLE DB providers. The consumer must know, and be able to invoke, the provider's interfaces. Next, it is responsible for managing data in input and output.

When the client reads data out of the data source, the consumer receives a pointer to an object that implements the IRowset interface. The data is available as a stream of bytes, and mapping it to more manageable containers like arrays or Recordset objects is up to the consumer.

ADO deals with the underlying OLE DB machinery for you. When used, ADO is the real OLE DB consumer, not your application. In terms of raw performance, this inevitably adds some overhead due to both data packaging and some housekeeping code. It is remarkable, in fact, that ADO cross-checks the value of any object properties to ensure they do not contradict one another. You might be surprised to see that ADO is smart enough to fix some incompatible settings, such as the cursor type and location. If the two are in direct disagreement, ADO transparently adjusts their values. Here's a quick example.

rs.CursorLocation = CursorLocationEnum.adUseClient
rs.Open(cmd, connString, CursorTypeEnum.adOpenKeyset)

First, the Recordset object is configured to use a client-side cursor location; then, when the Open method is invoked, the cursor type parameter is set to a server-side cursor. No exception is thrown, but the cursor type parameter is silently adjusted to adOpenStatic to reflect the client-side location.

Table 1 lists the key objects that form the ADO object model.

Table 1. Main objects in the ADO object model

Connection: Represents a connection to the specified data source.
Command: Represents a command to execute.
Field: Represents a column in a returned record.
Parameter: Represents a parameter to add to a command.
Recordset: Represents a block of records generated by a query.

In order to execute a command with ADO, first create a Command object, then bind it to an active connection, and run it. If the command is expected to return a block of records, you can opt for a Recordset object. The Open method of the Recordset object does everything: it opens the connection, executes the command, and fills the Recordset with the resultset. Here's a quick example.

Dim rs As New Recordset
rs.Open("SELECT * FROM employees", _
    "PROVIDER=sqloledb;DATABASE=northwind;SERVER=(local);" & _
    "Integrated Security=SSPI", _
    CursorTypeEnum.adOpenForwardOnly, _
    LockTypeEnum.adLockReadOnly)

ADO is the primary choice (often the only reasonable choice) for building a SQL Server data access layer in COM, ASP, and Visual Basic 6.0 applications. It has also been incorporated into, and is easy to call from, other non-Microsoft RAD tools such as Delphi and PowerBuilder.

What if You're Building .NET Applications Instead?

ADO is a COM object, so there is no architectural or syntactic contraindication to calling it from within a .NET application. That said, using ADO in .NET is not generally a good idea.

In the .NET Framework, a large part of the ADO functionality has been incorporated into the ADO.NET framework. The OLE DB provider is no longer the preferred way to access data; it has been replaced by the managed data provider. In addition, in the .NET Framework you can call a fair number of OLE DB providers directly, without the intermediation of ADO. In summary, ADO is not a good or recommended choice for .NET applications, except in a couple of situations.

ADO.NET doesn't support server cursors and doesn't supply a rich object model for schema manipulation like ADOX. If your application needs these features, ADO is the preferred choice, even if you're building a .NET-managed system.

ADO.NET

ADO.NET is a set of classes that expose data access services to .NET applications. Several syntax differences exist between the object models of ADO and ADO.NET. In spite of this, the functionalities of ADO and ADO.NET look much the same, because Microsoft strove to align as many programming aspects of the ADO.NET object model with ADO as possible. In this way, data developers don't need to get familiar with too many new concepts in order to use ADO.NET, and can migrate along a relatively short learning curve. With ADO.NET, you probably won't be able to reuse much of your existing code; you will certainly be able to reuse all of your ADO skills, though.

ADO.NET consists of two high-level blocks, data containers and data providers. Data container classes form a sort of in-memory database model. Classes like DataSet, DataTable, and DataView are array-like classes and can be filled with any data, including data retrieved from a database. In addition, these classes provide a disconnected and memory-based database model that supports advanced features such as tables, relations, constraints, and primary keys.
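A small sketch of that in-memory model, with illustrative table and column names, showing tables, a primary key, and a relation held entirely in the DataSet:

using System;
using System.Data;

class ContainerSketch
{
    static void Main()
    {
        DataSet ds = new DataSet("Shop");

        // A disconnected table with a primary key constraint.
        DataTable customers = ds.Tables.Add("Customers");
        customers.Columns.Add("CustomerID", typeof(string));
        customers.Columns.Add("CompanyName", typeof(string));
        customers.PrimaryKey = new[] { customers.Columns["CustomerID"] };

        DataTable orders = ds.Tables.Add("Orders");
        orders.Columns.Add("OrderID", typeof(int));
        orders.Columns.Add("CustomerID", typeof(string));

        // The relation (and its foreign-key constraint) lives in memory, not in any database.
        ds.Relations.Add("CustomerOrders",
            customers.Columns["CustomerID"], orders.Columns["CustomerID"]);

        customers.Rows.Add("ALFKI", "Alfreds Futterkiste");
        orders.Rows.Add(10643, "ALFKI");

        foreach (DataRow order in customers.Rows[0].GetChildRows("CustomerOrders"))
            Console.WriteLine("Order {0}", order["OrderID"]);
    }
}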

Managed providers are the second group of logical components in the ADO.NET architecture. They are the .NET counterpart to OLE DB providers. Managed providers wrap DBMS systems, and data stores in general, and expose their functionalities through common programming interfaces and data containers.

Managed providers differ from OLE DB providers in at least two key areas: they're managed objects (as opposed to COM objects), and they're simpler objects (implementing a more compact suite of interfaces). Another big benefit of managed providers is that they return data using high-level, framework-specific objects, making any conversion to manageable containers (like the Recordset in ADO) totally unnecessary.

A .NET application that needs to work with SQL Server will use the classes of the SQL Server managed provider to open and close a connection, prepare and run commands, and parse the results of a query. At the same time, ADO.NET container classes (specifically, the DataSet class) will be used in all those situations in which the results of a query must be cached and used in a disconnected manner. The DataSet is also a fundamental tool for updating a database in batch mode.

The ADO.NET object model revolves around the objects listed in Table 2.

Table 2. Logical components of a managed provider

Connection: Creates a connection with the specified data source, be it SQL Server, Oracle, or any data source for which you can indicate either an OLE DB provider or an ODBC driver.
Transaction: Represents a transaction in the underlying database server.
Command: Represents a command that hits the underlying database server.
Parameter: Represents a parameter for the command object.
DataAdapter: Represents a database command based on a disconnected set of records. The DataAdapter can be used to get a collection of records or to batch-update the database with the values in the current collection. The collection of records is represented with a DataSet or DataTable class.
DataReader: Represents a read-only, forward-only cursor created on the underlying database server.

The .NET Framework supplies a managed provider for SQL Server 7.0 and newer versions, which is by far the most effective way to access SQL Server from within .NET applications. In general, a .NET application can access a SQL Server database in two ways: it can use the SQL Server managed provider (recommended), or the OLE DB managed provider. In the latter case, the OLE DB provider passes through the COM-based OLE DB provider used by ADO. The rub is that the COM-based OLE DB provider component (called SQLOLEDB) passes through the COM Interop layer, which seamlessly provides for data and type marshaling, with the accompanying overhead that this requires. (Using the SQLOLEDB component is necessary if, for some reason, you have to connect to SQL Server 6.5.)

The ADO.NET programming model is based on a relatively standard sequence of steps: first create a connection, then prepare and execute a command, and finally process the data retrieved. This simple model is broadly equivalent to ADO's (the names of the objects involved are also similar), and doesn't significantly change if you switch to another database.

Dim conn As New SqlConnection(connString)
Dim cmd As New SqlCommand(cmdText, conn)
cmd.Connection.Open()
Dim cursor As SqlDataReader = cmd.ExecuteReader()
' Process the data
cmd.Connection.Close()

The code above shows how to run a query. To execute a non-query statement like an UPDATE, change it as follows.

Dim conn As New SqlConnection(connString)
Dim cmd As New SqlCommand(cmdText, conn)
cmd.Connection.Open()
cmd.ExecuteNonQuery()
cmd.Connection.Close()

If you're going to execute a stored procedure, set the command text to the stored procedure's name and add the following declaration to the command object.

cmd.CommandType = CommandType.StoredProcedure
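For completeness, a hedged C# sketch of the stored-procedure case, assuming an illustrative procedure named GetCustomersByCountry with a single @Country parameter:

using System;
using System.Data;
using System.Data.SqlClient;

class StoredProcSketch
{
    static void Main()
    {
        string connString = "Server=(local);Database=Northwind;Integrated Security=SSPI";
        using (SqlConnection conn = new SqlConnection(connString))
        using (SqlCommand cmd = new SqlCommand("GetCustomersByCountry", conn))
        {
            // The command text is the procedure name, not a SQL statement.
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.Add("@Country", SqlDbType.NVarChar, 15).Value = "UK";

            conn.Open();
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                    Console.WriteLine(reader[0]);
            }
        }
    }
}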

A key difference between ADO and ADO.NET is that ADO is a general-purpose object model and a direct result of the OLE DB and UDA vision. Simply put, it is one programming model that fits all data stores (as long as the data store supplies an OLE DB provider). The same code works unchanged (or requires limited changes) if you switch, say, from Access to SQL Server.

All .NET managed providers are optimized for the needs and the features of a particular data source. Unlike OLE DB providers, they have a smaller number of interfaces to implement and can offer a mechanism closer to the real workings of the DBMS, with no abstraction or unnecessary complexity. This increased speed comes at a price: writing database-agnostic code is a bit harder, although perfectly possible.

Each target database has its own working set of classes to carry out basic functions: connections, transactions, commands, and the like. To open a connection with SQL Server, you use the SqlConnection class. To run a command, the SqlCommand class is needed. To process a read-only cursor, the SqlDataReader object is necessary. Different classes are required to target Oracle or Access. For example, you use the OracleConnection and OracleCommand classes to work with the Oracle database. The naming convention in use guarantees that names are similar, but the actual behavior may differ quite a bit. Differences don't show up that much with basic operations like executing a query or inserting a new record; they are likely to appear, instead, with more specific tasks like managing BLOB fields or server cursors.

Note
Good news is expected with ADO.NET Whidbey on the subject of database-independent code. The next version of ADO.NET features a brand new factory model for .NET data providers. Stay tuned to this Developer Center to learn more in the upcoming months.

Working with SQL Server Through XML

In the past few years, the evolution of the computer industry has raised the need for total software integration and communication. As a result, any available data must be transformable into another model in order to be consumed as required by the context. The capability to accept and return data through XML streams is a key factor for a modern and up-to-date database.

Microsoft SQL Server 2000 comes with an embedded engine capable of manipulating data as XML. The syntax of some T-SQL commands has been modified to incorporate these new features, and a new set of commands has been added. All in all, a new XML-based API exists for SQL Server 2000 which lets users send and receive XML streams. The low-level changes in the SQL Server programming interface also have effects on the higher-level interface modules like ADO and ADO.NET. The SQLXML managed library in particular provides many new .NET classes specifically designed to support the XML capabilities of SQL Server. Let's review these capabilities first and the SQLXML API next.

In SQL Server 2000 there are two basic ways to retrieve XML data. You can use the XML extensions to the SELECT statement or, alternatively, execute a query on a particular text or BLOB field that contains text formatted as XML data. SQL Server 2000 doesn't mark those fields with a special attribute or data type to indicate they contain XML data, though. Only with SQL Server 2005 will the XML data type allow you to create table columns made of XML data.

If the SELECT statement contains a trailing FOR XML clause, the resultset is transformed into a string of XML text. Let's consider the following statement.

SELECT TOP 3 customerid, companyname FROM Customers FOR XML AUTO

The generated output is an XML fragment, as shown below.

<Customers customerid="ALFKI" companyname="Alfreds Futterkiste"/>
<Customers customerid="ANATR" companyname="Ana Trujillo Emp ..."/>
<Customers customerid="ANTON" companyname="Antonio Moreno Taquería"/>

Within the FOR XML clause, you can specify a working mode. Feasible values are listed in Table 3.

Table 3. Modes of the FOR XML extension

AUTO: Returns query results as a sequence of XML nodes named after the table. Columns are rendered as attributes. If the additional ELEMENTS clause is specified, columns are rendered as child elements instead.
RAW: Returns query results as a sequence of generic <row> nodes with as many attributes as the selected fields.
EXPLICIT: Lets you define the schema of the XML document being returned.

Note that the XML data may optionally include schema information if you append the XMLDATA attribute to the FOR XML mode of choice, as shown below.

SELECT * FROM Employees FOR XML AUTO, XMLDATA

Schema information is incorporated in a <Schema> node prepended to the rest of the XML fragment. As you may have noticed already, what SQL Server returns is not a whole, well-formed XML document. It fulfills all syntax requirements for a well-formed document, but lacks a unique root node. For this reason, it is called an XML fragment.

If the query joins two tables on the value of a column, then the resulting XML schema provides nested elements. For example, consider the following query.

SELECT Customers.CustomerID, Customers.ContactName, Orders.OrderID
FROM Customers INNER JOIN Orders
   ON Customers.CustomerID = Orders.CustomerID
FOR XML AUTO

Here's the XML output. As you can see, it automatically groups child records below the parent.

<Customers CustomerID="ALFKI" ContactName="Maria Anders">
  <Orders OrderID="10643"/>
  <Orders OrderID="10692"/>
  <Orders OrderID="10783"/>
  :
</Customers>
<Customers CustomerID="ANATR" ContactName="Ana Trujillo">
  <Orders OrderID="11459"/>
  <Orders OrderID="10987"/>
  :
</Customers>

SQL Server 2000 also supports XML as an input format for write operations. In particular, it accepts XML input through the following channels: the OPENXML statement, XML bulk loading, and updategrams.

OPENXML is a T-SQL function that takes care of inserting data represented as an XML document into a table. OPENXML parses the contents of the XML document and exposes it as a rowset. Bear in mind that OPENXML has been designed and optimized to handle relatively small documents, up to about 50 KB. For documents over that threshold, you should constantly monitor the response time to decide whether you should stick with OPENXML or whether you need something different, like XML bulk loading.

XML bulk loading is another technique that lets you load XML data into SQL Server tables. Functionally similar to OPENXML, bulk loading is implemented through a COM object and provides higher performance when large amounts of XML data are processed. The bulk loader reads the XML data and tries to use SQL Server's BULK INSERT statement, meaning it attempts to batch all the records in a single shot. By doing this, it achieves a higher throughput. In other situations -- for example, when you have an identity column that needs to propagate an ID from a parent to a child row -- the bulk loader processes records individually. It identifies the database tables and columns involved, and then prepares and executes SQL statements. When the bulk loader encounters an XML element, it utilizes schema information to associate it with a record in a table. The record is actually written when the closing tag for that element is found. This algorithm ensures that, in cases of parent-child relationships, all the children are always processed before the parent row.

As mentioned earlier, XML bulk loading is implemented through a COM object. The object's progID is SQLXMLBulkLoad. The following Visual Basic 6.0 code shows how to use the object.

conn = "PROVIDER=sqloledb;SERVER=(local);DATABASE=" & _
    "northwind;Integrated Security=SSPI;"
Set bulk = CreateObject("SQLXMLBulkLoad.SQLXMLBulkload.3.0")
bulk.ConnectionString = conn
bulk.Execute "schema.xml", "data.xml"

If run within a transaction, the XML bulk loader uses a temporary file for each table involved in the operation. Each file gathers all the changes for the specified table. When the commit occurs, the contents of the various files are flushed into the corresponding SQL Server tables using the BULK INSERT statement.

Finally, updategrams are an XML description of the changes (insertions, deletions, and updates) that must be applied to the database. Updategrams can be executed in various ways. For example, you can send the updategram text to SQL Server over HTTP, you can write the XML content out to a file and then point the browser (or any other HTTP-enabled software) to that URL so that the content gets executed, or you can send an updategram out using an ADO command stream.

The SQLXML Managed Library

SQLXML 3.0 (Service Pack 2 is now available) is an extension to SQL Server 2000 aimed at keeping it current with evolving W3C standards for XML. Available as a free download, SQLXML 3.0 also includes many managed classes in order to expose some of the XML-based functionalities to .NET applications. SQLXML 3.0 gives SQL Server 2000 the ability to expose stored procedures as a Web service via SOAP, and also adds support for ADO.NET diffgrams and client-side XML transformations. To learn more about using SQLXML 3.0, check out the Data Access and Storage Developer Center's SQLXML content [http://msdn.microsoft.com/data/sqlsolutions/sqlreldata/default.aspx#xml].

When you install SQLXML 3.0, you also get a few managed classes for .NET programming. The set of SQLXML managed classes consists of two main classes -- SqlXmlCommand and SqlXmlAdapter -- plus a few ancillary classes. SqlXmlCommand is the fundamental class used to execute an XML-driven command against SQL Server. The SqlXmlAdapter class is actually a wrapper for the command that just exposes the results through a DataSet object. A good introduction to the SQLXML managed classes can be found here [http://msdn2.microsoft.com/en-us/library/aa902660(SQL.80).aspx].
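A hedged sketch of SqlXmlCommand in use; the class ships in the Microsoft.Data.SqlXml namespace installed with SQLXML 3.0, but the exact member names and the query below are illustrative and should be checked against the SQLXML documentation:

using System;
using System.Xml;
using Microsoft.Data.SqlXml;

class SqlXmlSketch
{
    static void Main()
    {
        // SQLXML operations go through the SQLOLEDB provider underneath.
        string connString =
            "Provider=SQLOLEDB;Server=(local);Database=Northwind;Integrated Security=SSPI";

        SqlXmlCommand cmd = new SqlXmlCommand(connString);
        cmd.CommandText = "SELECT TOP 3 CustomerID, CompanyName FROM Customers FOR XML AUTO";
        cmd.RootTag = "Customers";   // wraps the XML fragment in a single root node

        // The command returns XML, not a rowset.
        XmlReader reader = cmd.ExecuteXmlReader();
        XmlDocument doc = new XmlDocument();
        doc.Load(reader);
        Console.WriteLine(doc.OuterXml);
    }
}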

The key thing to note about SQLXML 3.0 is that any operations directed at SQL Server pass through the OLE DB provider for SQL Server (SQLOLEDB). This situation will change in Visual Studio 2005, where all the XML-oriented features of SQL Server (e.g., bulk loading, XML data readers) will be fully integrated with the .NET Framework.

What Will Be New (and Hot) With Visual Studio 2005

A lot of new features are ready for SQL Server programmers in ADO.NET 2.0. First and foremost, the excellent provider factory model makes it possible to create a connection object in an indirect way, that is, using a factory object instead of the usual new operator. Batch update enhancements are also available for all data providers. Specific to the SQL Server managed provider are other features, like bulk copy and asynchronous execution of commands.

In ADO.NET 2.0, each .NET data provider defines a factory class. Among other things, this class is responsible for returning a connection object for the provider. You pass the factory the name of the provider you want to use, and it returns a connection object of the correct type, as shown by the following code snippet.

DbProviderFactory fact;
fact = DbProviderFactories.GetFactory("System.Data.SqlClient");
DbConnection conn = fact.CreateConnection();

The static GetFactory method takes the invariant name of the provider with which you want to work, and returns the corresponding factory class. Once you've got it, you call the CreateConnection method to obtain the connection object in an indirect way. The introduction of this model makes generic database programming much easier to code.

Batch update is a feature introduced with ADO 2.1 and significantly enhanced in ADO.NET 1.x. The biggest limitation of ADO.NET batch update is that records are always submitted one at a time. This means that, for example, if one hundred rows have been updated, inserted, or deleted, then one hundred round trips to SQL Server occur to complete the operation. ADO.NET 2.0 introduces a new property on the data adapter object that lets you control the number of records grouped together and sent to the DBMS in a single shot. The property is named BatchUpdateSize, and is set to 1 by default. You can increase that number at will, but note that an overly large number may clog the network and result in a loss of performance instead.
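A hedged sketch of the batching knob; the article uses the pre-release name BatchUpdateSize, while the property shipped in ADO.NET 2.0 as UpdateBatchSize on the data adapter (table and column names below are illustrative):

using System.Data;
using System.Data.SqlClient;

class BatchUpdateSketch
{
    static void Main()
    {
        string connString = "Server=(local);Database=Northwind;Integrated Security=SSPI";

        SqlDataAdapter adapter = new SqlDataAdapter(
            "SELECT CustomerID, CompanyName FROM Customers", connString);
        SqlCommandBuilder builder = new SqlCommandBuilder(adapter);

        DataSet ds = new DataSet();
        adapter.Fill(ds, "Customers");

        foreach (DataRow row in ds.Tables["Customers"].Rows)
            row["CompanyName"] = ((string)row["CompanyName"]).ToUpper();

        // Batched commands must not try to refresh each row from the server.
        adapter.UpdateCommand = builder.GetUpdateCommand();
        adapter.UpdateCommand.UpdatedRowSource = UpdateRowSource.None;

        // Group up to 50 changed rows per round trip instead of one at a time.
        adapter.UpdateBatchSize = 50;
        adapter.Update(ds, "Customers");
    }
}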

The bulk copy functionality provides a much faster way to transfer large amounts of data into a SQL Server table. The performance you get with a specialized operation such as bulk copy is typically remarkably better than using INSERT statements. In SQL Server 7.0 and newer, the BULK INSERT statement is used to copy formatted data, stored in an ASCII file, into a SQL Server table. You can use this statement from within any .NET Framework 1.1 application that uses an appropriate SqlCommand object. In ADO.NET 2.0, a new class named SqlBulkCopyOperation provides bulk copy facilities directly at the application level, without the need to know about SQL Server internal utilities and T-SQL commands.
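A hedged sketch of the feature as it eventually shipped; the class was released under the name SqlBulkCopy rather than SqlBulkCopyOperation, and the source and destination table names below are illustrative:

using System.Data;
using System.Data.SqlClient;

class BulkCopySketch
{
    static void Main()
    {
        string connString = "Server=(local);Database=Northwind;Integrated Security=SSPI";

        // Any DataTable (or IDataReader) can act as the source of the bulk copy.
        DataTable source = new DataTable();
        using (SqlDataAdapter adapter = new SqlDataAdapter("SELECT * FROM Customers", connString))
            adapter.Fill(source);

        using (SqlConnection conn = new SqlConnection(connString))
        {
            conn.Open();
            using (SqlBulkCopy bulkCopy = new SqlBulkCopy(conn))
            {
                bulkCopy.DestinationTableName = "CustomersArchive";
                bulkCopy.WriteToServer(source);   // streams all rows in a single bulk operation
            }
        }
    }
}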

A database operation is normally a synchronous operation, meaning that the caller regains control of the application only when the interaction with the database has completed. This way of working may pose performance and scalability issues in the case of lengthy operations. ADO.NET 2.0 provides true asynchronous support for two specific scenarios: opening connections and executing commands. Leveraging these features, you can open a connection and populate the command object while the connection is physically established. This is a clear performance advantage, because it really gives you a bit of parallelism if SQL Server lives on a remote machine.

In ADO.NET 2.0, you find an additional pair of methods to open a connection, BeginOpen and EndOpen. A connection can only be closed synchronously, though.

string connStr = "SERVER=...;DATABASE=...;... ";
SqlConnection conn = new SqlConnection(connStr);
SqlCommand cmd = new SqlCommand(query, conn);

// Begin connecting
IAsyncResult ar = conn.BeginOpen(null, null);

// Poll until the connection is opened
while (!ar.IsCompleted)
{
    // Do some work in the meantime
}

// End connecting
conn.EndOpen(ar);


The asynchronous pattern is even more interesting if applied to the execution of commands. The support for
asynchronous operations is built into the
SqlCommand

class and is limited to executing non
-
query
commands, getting a reader, and an XML reader. Let
's briefly review the case of readers.

The first step is calling the BeginExecuteReader method, to which you pass a callback function and an object that represents the state of the particular call. The state object is any object that contains information useful to the callback. In this case, I simply pass a reference to the command object.

IAsyncResult ar = cmd.BeginExecuteReader(MyCallback, cmd);

After initiating the asynchronous operation, you can forget about it and do some other work. If there's a place in your code from which you can't move on without the results of the query, you place a synchronizer so that the code will automatically be stopped until the other thread invokes the callback.

ar.AsyncWaitHandle.WaitOne();

The callback follows this scheme.

public void MyCallback(IAsyncResult ar)
{
    // Retrieve the context of the call (the command object)
    SqlCommand cmd = (SqlCommand) ar.AsyncState;

    // Terminate the async operation
    SqlDataReader reader = cmd.EndExecuteReader(ar);

    // Process the results
    // ...
}

The context of the call that you specified as the second argument to BeginExecuteReader is packed into the AsyncState property of the IAsyncResult object. Typically, the callback will perform any user interface refresh that is needed after completing the operation.

Note
Asynchronous calls are implemented only within the Whidbey .NET data provider for SQL Server. Asynchronous calls require a network library with true asynchronous support, which is supplied with MDAC 9.0; that ships with SQL Server 2005 Beta 1 and the more recent Community Technology Preview of Visual Studio .NET 2005.

Conclusion

ADO (along with some made-to-measure tools like Remote Data Services) and ADO.NET are the two main data access technologies for SQL Server. Choosing one depends chiefly on the Windows platform you're targeting: Win32® and COM, or .NET. In addition to ADO and ADO.NET, and spanning the worlds of Win32 and the bright lights of .NET, is SQLXML, an object model that fully exploits the XML capabilities of SQL Server 2000.

In this article, I've discussed the main data access layers available to work with SQL Server. I've tried to put each into perspective to give a sense of the history, the current situation, and the future evolution.

In the end, ADO is the most reasonable (sometimes the only) choice if you have to write COM, ASP, or Win32 applications. If you are instead writing an application based on the .NET Framework, then ADO.NET is a no-brainer. If you need to perform particular operations (e.g., server cursors, schema manipulation, bulk copy), ADO.NET might not offer the support you expect, but the trend is highly encouraging. ADO.NET 2.0, slated for Beta 1 in mid-2004, promises to integrate the missing functions and facilities into the existing Framework, thus delivering a powerful, consistent, and self-sufficient data access layer.

Dino Esposito is a trainer and consultant based in Rome, Italy. A member of the Wintellect [http://www.wintellect.com/] and VB2TheMax [http://www.vb2themax.com/] teams, Dino specializes in ASP.NET and manages the ADO.NET and .NET Framework courseware for Wintellect. Dino spends most of his time teaching and consulting across Europe and the United States, but this didn't prevent him from writing the Microsoft Press guide Programming Microsoft ASP.NET [http://www.microsoft.com/mspress/books/6667.asp] and the monthly "Cutting Edge" column for MSDN Magazine [http://msdn.microsoft.com/msdnmag].