Migrating From C++ To C#

parkmooseupvalleySoftware and s/w Development

Jul 5, 2012 (5 years and 1 month ago)

Introduction
Since its beginning in the 1980s, C++ has come a long way. It has a large

established user base, tested software, its own tools (compilers, etc), and lots of

experienced programmers. It has also developed its own idioms and techniques

for programmers to write effective software. C++ programmers are comfortable

in getting things done with the facilities that are provided with it in an efficient

manner.
.NET is a powerful new platform with a great deal of promise. C# is designed

from the ground up to harness the power of this new framework. It provides a

whole host of features and is strongly based on C++. C# is an object oriented

language and is the first component-oriented language in the C family. It also

makes writing Windows and Web applications faster and easier. C# is gaining

wide acceptance and it is clear that it is here to stay for a long time.
C# is not a replacement for C++, and it is more than likely that both will be used

widely for the foreseeable future. However, there are many practical cases where

there is a necessity to migrate from C++ to C#. For instance, your company's

policy may be to change all existing code to .NET, or perhaps you wish to take

advantage of some of the facilities made available in .NET.
The question is, how do we make the transition as smooth as possible while

getting the best results? Adopting a new language doesn't just mean converting

the existing code from C++ to C#. By just knowing the syntax, a C++

programmer cannot straightaway start programming in C#. These two

languages differ largely by their design and approach towards problem solving,

which makes the language transition harder.
System Requirements
It is preferable that the reader has access to the C# compiler available in

Microsoft Visual C#.NET. This case study is for those programmers coming from

a C++ background, who are new to C# or have just started programming in it.

The programmers with a good understanding of C# are in a better position to

understand the approaches taken in the conversion process.
Case Study Structure
The case study consists of three main sections:

The approach
In this section, we briefly cover the basic theory that is necessary for

understanding the issues in conversion. It is possible that you may not be clear about a few of the C# features mentioned here; they are covered in the following section.

Comparing C++ and C# features:
In this section we look at the different features of the two languages which are

necessary to make the conversion possible.

Steps in converting existing code:
The steps that are required for converting the existing code from C++ to C# are

covered in this section. An example of converting class hierarchies from C++ to

C# is also covered.
The Approach
What is the best approach for getting equipped for a smooth transition from C++

to C#? Understanding! Migrating from one language to another involves a

considerable effort. This is not because of a change in syntax, rather because of

changes in methodology - design approach, underlying technology, and the

approach towards problem solving. Understanding that there is such a

fundamental shift, and having the knowledge of where the major differences lie,

will help a lot.
The underlying translation models for C++ and C# are quite different. C++

follows a static linkage model, meaning that the source code is compiled by the

compiler to result in object code. The object files are linked to result in an

executable file. The operating system loads and controls the execution of the

program. The language features are designed with this approach in mind. For

example, there is no support for reflection. Moreover, the code is only source

code portable and not much runtime support is available.
C# follows an entirely different translation model. The source code is compiled to an intermediate format known as MSIL (Microsoft Intermediate Language). A virtual machine, referred to as the CLR (Common Language Runtime), then takes over: it translates the MSIL to native code just in time and executes it. The execution is thus in the hands of the CLR, and the code executed is referred to as managed code. This change in translation model is reflected in the language features as well.
A very important difference is in the area of memory management: the

programmer no longer has the complete control of the lifetime of the objects in

the heap. The garbage collector takes care of deleting objects whose lifetime is

over. So there is no need for the keyword "delete". However, there are

destructors in C#. If the "delete" keyword is not available in C#, then what is the

use of destructors? In reality, the destructor syntax in C# is very misleading,

especially for programmers from a C++ background. They are actually finalizers

that are called before an object is garbage collected.
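For contrast, here is a minimal C++ sketch of the deterministic destruction that C++ programmers rely on and that a C# finalizer does not give you. The names Resource, liveCount and countInsideScope are hypothetical and exist only for this illustration.

```cpp
#include <cassert>

// Hypothetical resource wrapper, used only for illustration.
struct Resource {
    static int liveCount;             // number of Resource objects currently alive
    Resource()  { ++liveCount; }
    ~Resource() { --liveCount; }      // runs at a precisely known point
    Resource(const Resource&) = delete;
    Resource& operator=(const Resource&) = delete;
};
int Resource::liveCount = 0;

// Returns the live count observed while the local object exists.
int countInsideScope() {
    Resource r;                       // constructed here
    return Resource::liveCount;       // value is read while r is still alive
}                                     // r is destroyed here, deterministically
```

Calling countInsideScope() from main yields 1, and Resource::liveCount is back to 0 as soon as the call returns. With a C# finalizer there is no such guarantee about when the cleanup code runs.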
Another issue to understand is the change in design criteria. C++ is designed with experienced programmers in mind and 'trusts the programmer' (in the C tradition). So no extensive runtime checking is done, and there are implicit casts and promotions in function calls. These features have proven to be very useful, but also very bug-prone, so they are best left to experienced programmers. C#, by contrast, is designed so that even novice users can learn it fairly easily, and it is also designed with robust software in mind. It performs extensive runtime checking, allows very few implicit conversions, and tries to make the life of the programmer easier.
How can this understanding of the change in language design help in the transition from C++ to C#? Let us use an example. In C++, a single-argument constructor also serves as a conversion operator. When a conversion is required, that constructor is called implicitly because the language 'trusts the programmer': it is assumed that the C++ programmer is aware of it. Such implicit calls may lead to subtle bugs, like:
class Stack{
public:
    Stack(int initialCapacity);
    // constructor that takes int as an argument
    // other members
};
// now consider the code
Stack s(10);
s = 25;
// implicit conversion: a temporary Stack is created with 25 as the
// argument and assigned to s, as if the programmer had written
// s = Stack(25);
// beware! the programmer may not be aware that the constructor with an
// int argument is called for the conversion from int to Stack
To avoid such problems, C# never uses a single-argument constructor as a conversion operator. If you want the conversion, you have to define a conversion operator yourself, and the implicit and explicit keywords of C# state whether it may be applied silently or only with a cast. So when you write the equivalent C# code for a C++ class that has single-argument constructors, examine whether a conversion operator needs to be implemented at all, and if so, decide whether it should be declared explicit or implicit.
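C++ itself offers a way to opt out of the silent conversion: the explicit keyword on the constructor. The sketch below compares the two choices; the class names LooseStack and StrictStack are made up for this example.

```cpp
#include <cassert>
#include <type_traits>

// Constructor without 'explicit': int converts to LooseStack silently.
class LooseStack {
public:
    LooseStack(int initialCapacity) : capacity_(initialCapacity) {}
    int capacity() const { return capacity_; }
private:
    int capacity_;
};

// Constructor with 'explicit': the conversion must be written out.
class StrictStack {
public:
    explicit StrictStack(int initialCapacity) : capacity_(initialCapacity) {}
    int capacity() const { return capacity_; }
private:
    int capacity_;
};

static_assert(std::is_convertible<int, LooseStack>::value,
              "int silently converts to LooseStack");
static_assert(!std::is_convertible<int, StrictStack>::value,
              "explicit blocks the silent conversion");

int looseCapacityAfterAssign() {
    LooseStack s(10);
    s = 25;               // compiles: a temporary LooseStack(25) replaces s
    return s.capacity();  // the original capacity of 10 is gone
}
```

With StrictStack, the line s = 25; would be rejected by the compiler, and the programmer would have to write s = StrictStack(25); to get the same effect. This is close to what a C# conversion operator marked explicit demands of the caller.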
The problem solving approach also differs considerably in these two languages.

Consider writing a simple calculator program. You require a postfix expression

evaluator, and for that you may prefer to have your own reusable version of

Stack. The interface for Stack is well-known and the logic is pretty straight

forward. Still, your approach towards solving such problems may be entirely

different depending on the language you use.
In C++, you would write a template class for the stack. If you want to evaluate an

integral expression then you will instantiate an integer version from that Stack

template class. It has its own benefits like static type checking. You can use this

same implementation for any type of expression, for example floating point

expression, without any changes. It is also extensible.
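As a rough sketch of this approach, the code below defines a small Stack template and a postfix evaluator built on it. The names and the token format (space-separated operands and the operators + - * /) are assumptions made for the example, not part of the original case study.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Minimal reusable stack; the element type is fixed at compile time.
template <typename T>
class Stack {
public:
    void push(const T& value) { items_.push_back(value); }
    T pop() {
        if (items_.empty()) throw std::runtime_error("pop on empty stack");
        T top = items_.back();
        items_.pop_back();
        return top;
    }
    bool empty() const { return items_.empty(); }
private:
    std::vector<T> items_;
};

// Evaluates a space-separated postfix expression such as "3 4 + 2 *".
template <typename T>
T evaluatePostfix(const std::string& expression) {
    Stack<T> operands;
    std::istringstream tokens(expression);
    std::string token;
    while (tokens >> token) {
        if (token == "+" || token == "-" || token == "*" || token == "/") {
            T rhs = operands.pop();   // right operand was pushed last
            T lhs = operands.pop();
            if (token == "+") operands.push(lhs + rhs);
            else if (token == "-") operands.push(lhs - rhs);
            else if (token == "*") operands.push(lhs * rhs);
            else operands.push(lhs / rhs);
        } else {
            std::istringstream number(token);
            T value;
            if (!(number >> value)) throw std::runtime_error("bad token: " + token);
            operands.push(value);
        }
    }
    T result = operands.pop();
    if (!operands.empty()) throw std::runtime_error("malformed expression");
    return result;
}
```

The same template serves evaluatePostfix<int>("3 4 + 2 *") and evaluatePostfix<double>("1.5 2 *") without any change, and an attempt to push the wrong type is caught at compile time.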
In C#, all objects come from the common base class 'object', and so you can write

a Stack class which stores 'objects'. Since all the objects inherit from this class, you

can store virtually any object in that Stack. When you retrieve the elements, you

have to employ dynamic type checking to make sure that the types don't mix-up.

As you can see, even for the same well-defined problem, the problem solving

approach differs considerably and you make a different set of decisions and an

entirely different implementation depending on the language you are using!
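To make the trade-off concrete for a C++ reader, here is a sketch of what the object-based design would look like if written in C++. The Object, IntBox and TextBox types are hypothetical stand-ins for the C# common base class and its derived types; the point is only that the type check moves from compile time to run time.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical common base class playing the role of C#'s 'object'.
struct Object {
    virtual ~Object() {}
};
struct IntBox : Object {
    explicit IntBox(int v) : value(v) {}
    int value;
};
struct TextBox : Object {
    explicit TextBox(const std::string& t) : text(t) {}
    std::string text;
};

// A stack that accepts anything derived from Object.
class ObjectStack {
public:
    void push(std::unique_ptr<Object> item) { items_.push_back(std::move(item)); }
    std::unique_ptr<Object> pop() {
        if (items_.empty()) throw std::runtime_error("pop on empty stack");
        std::unique_ptr<Object> top = std::move(items_.back());
        items_.pop_back();
        return top;
    }
private:
    std::vector<std::unique_ptr<Object>> items_;
};

// Pops one element and reports whether it really was an IntBox.
bool popAsInt(ObjectStack& stack, int& out) {
    std::unique_ptr<Object> top = stack.pop();
    IntBox* asInt = dynamic_cast<IntBox*>(top.get());  // run-time type check
    if (asInt == nullptr) return false;
    out = asInt->value;
    return true;
}
```

Nothing stops a caller from pushing a TextBox onto a stack that is meant to hold numbers; the mistake surfaces only when popAsInt returns false. In the template version the same mistake does not compile.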
Another important factor in the transition from C++ to C# is that it is a transition

from an unmanaged environment to a managed environment. In C++ there is

only trivial support from the runtime available, whereas C# has the sophisticated

.NET runtime environment. C++ programmers need to make special efforts to

understand the advantages with the managed environment. For example,

reflection is a powerful feature which can be used to generate and execute

assemblies dynamically. Runtime checks ensure that the security privileges are

available for providing access to resources. You have array bounds checking,

versioning support and most important of all - components that are created from

any language can interact freely. However, it should be noted that the managed

environment also comes with restrictions: you can no longer allocate objects anywhere you wish, because instances of classes always live on the garbage-collected heap. Also, the first versions of C# and .NET offered nothing like C++ templates; generics arrived only in C# 2.0, and they are less flexible than templates. The concept and benefits of a managed environment are new to

C++ programmers, and hence exposure to the facilities with the underlying

framework is essential to get the most out of C#.
In essence, having a broad picture of these two languages and understanding the

differences in the underlying technology and approaches to design and problem

solving are essential for migrating from C++ to C#.
Comparing C++ and C# Features
The first requirement that is needed to move from C++ to C# is a shift in your

mindset. C++ is a language which trusts the programmer. This provides the

programmer with the ability to do whatever he wants. This power does have

drawbacks though - it can be misused and can end up causing major headaches.

C# on the other hand, doesn't trust the programmer as much. It takes many of

the responsibilities from the programmer and enables him to concentrate on the

bigger picture. It removes a few features that were error prone, and introduces

new ones that simplify programming.
Let us now compare the features available in the two languages.
Data Types
The types in C++ can be subdivided into three categories: primitive types, aggregate types, and pointer types. The primitive types are bool, char, int, float, double and wchar_t. The aggregate types are those that are composed of other types; these include arrays, structures, unions and enums. Both pointers and references are classed as pointer types. In C#, things are a little different, as it has only value types and reference types. A value type stores its data directly, whereas a reference type stores a reference that points to the actual data. The value types can be thought of as the equivalent of the primitive types in C++. They are derived from the class System.ValueType. Values of these types can be stored in the stack frame of a method. Objects of reference types cannot be stored in the stack frame, only in the heap.
A further difference between C++ and C# data types is their size. While the size of most types is implementation-dependent in C++, sizes are fixed in C#. We need to be cautious when converting between the available types. For example, C++ has long double, which occupies 10 bytes on some implementations. There is no long double type in C#, and double occupies 8 bytes. C# has a new type, decimal, that occupies 16 bytes. Since decimal is 6 bytes larger than such a long double, you might expect it to hold any value that long double can. However, decimal does not offer a wider range; it offers more precise decimal arithmetic, as needed for currency values. If you used long double for higher precision, decimal is a reasonable replacement; if you used it for a wider range, you may have trouble.
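Before converting, it can help to pin down what the C++ types actually are on your compiler. The sketch below is one way to do that, assuming a platform that provides the optional fixed-width typedefs from <cstdint>; the helper name longDoubleIsWider is made up for the example.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <limits>

// These hold on mainstream platforms; a failure here means the port
// needs a closer look at integer sizes.
static_assert(CHAR_BIT == 8, "sketch assumes 8-bit bytes");
static_assert(sizeof(std::int16_t) == 2, "same size as C# short");
static_assert(sizeof(std::int32_t) == 4, "same size as C# int");
static_assert(sizeof(std::int64_t) == 8, "same size as C# long");

// True when long double carries more mantissa bits than double here.
// The answer differs between compilers, which is exactly the problem.
bool longDoubleIsWider() {
    return std::numeric_limits<long double>::digits >
           std::numeric_limits<double>::digits;
}
```

If longDoubleIsWider() returns true for your build, code that depends on the extra precision or range of long double needs individual attention, because C# double will not preserve it.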
Unsigned types are supported in C#, but they are best avoided in public interfaces because using them there makes the code non-CLS-compliant (CLS stands for Common Language Specification).
References
We have seen many C# programmers treat C++ references as equivalent to C# references. This is wrong! C# references are actually closer to C++ pointers. Remember that a reference in C++ serves as an alias for a name. It is sure to refer to an object, and to the same object throughout the scope of the reference. It is different in C#. Just like pointers, C# references can be defined without initializers, they can point to different objects at different times, and they can even point to nothing: null. (Attempting to use a null reference throws a NullReferenceException; NullPointerException is the Java name.) So:
//C++
MyClass &ref = NULL; // error: a reference cannot be null
MyClass &ref;        // error: a reference needs an initializer
MyClass &ref = obj;  // OK
ref = anotherObj;    // does not rebind: copies anotherObj into obj
//C#
MyClass r;           // OK, initializer not needed ('ref' is a keyword in C#)
r = anObj;
r = anotherObj;      // OK, r now refers to another object
r = null;            // allowed
You can think of C# references as 'restricted and safe C++ pointers':
// C++
string * s;
s = new string;
// C#
string s;
s = new string('x', 3); // System.String has no parameterless constructor
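The difference between aliasing and reseating is easy to check in C++ itself. In the sketch below, Cell is a hypothetical one-field type; the first function assigns through a reference, the second reseats a pointer, which is the behaviour a C# reference has.

```cpp
#include <cassert>

struct Cell { int value; };  // hypothetical type for the illustration

// Assigning through a C++ reference writes to the object it aliases.
int assignThroughReference() {
    Cell a = {1};
    Cell b = {2};
    Cell& alias = a;     // bound to a for its whole lifetime
    alias = b;           // copies b into a; alias still refers to a
    b.value = 99;        // has no effect on a
    return a.value;
}

// A pointer can be reseated, like a C# reference.
int reseatPointer() {
    Cell a = {1};
    Cell b = {2};
    Cell* p = &a;
    p = &b;              // p now points to b; a is untouched
    p->value = 99;
    return a.value + b.value;
}
```

assignThroughReference() returns 2 because a received a copy of b, while reseatPointer() returns 100 because a kept its value of 1 and only b was changed through the pointer.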
Declarations and Definitions
With the discussion of C# references, you may have noticed one drastic

difference in the semantics of the following statement:
string str;
This same statement is interpreted in different ways by the two languages. A C++ compiler sees it as the definition of a variable called str. It allocates a new stack object (or an object in the data area if declared globally) and calls the default constructor on the allocated object (a string object in this case). A C# compiler sees the same statement as a declaration of the reference variable str. It allocates space for the reference alone. It neither allocates space for an object nor calls a constructor. This has to be done explicitly by the programmer:
str = new string('x', 3); // System.String has no parameterless constructor
This statement now allocates memory for the object in the heap and then calls the constructor.
C# combines declarations and definitions together, whereas C++ clearly

distinguishes between the two. For this reason, there are no function prototypes

and forward declarations. The C# compiler carefully checks for definite

assignment - you cannot use a variable without initializing it. Such facilities help

avoid bugs, and greatly simplify the life of the programmer.
Structs
Except for the default access specifier, C++ does not differentiate between structs and classes; the two are functionally the same. A struct can contain methods and can be inherited by a class. C# takes a different path: structs are value types. They can still contain methods, properties and constructors, and they can implement interfaces, but they cannot inherit from another struct or class, and no class can inherit from them. The advantage is that, as value types, they can be stored directly in the stack frame. They do not require any indirection and so can be more efficient than classes for small types.
When you want to group some related data with little or no behavior attached, structs are the best solution. When we want to model a real world entity with both data and methods, classes should be used. For example, 'Point' in a graph is a simple aggregate type, and for that a struct can be used. Implementing a 'Vehicle' type may require encapsulating lots of data and the methods operating on it, and for that, classes are better suited. There are no hard-and-fast rules for deciding between structs and classes. A good rule of thumb is to use structs for the simplest aggregate types and classes for any non-trivial types.
One notable advantage of structs is that they are stored inline, on the stack for local variables or directly inside the containing array or object, without a per-object header. A lot of memory is saved when hundreds of objects are created, for example in a big array of a struct type. Objects of a class type are allocated individually on the heap, and each one carries some memory overhead (in the .NET version current at the time of writing, about 10 more bytes per heap object than for an equivalent stack object). So, using structs for small types can save a significant amount of memory.
MyStruct [] sArr = new MyStruct[10];
Whereas for the class type:
MyClass [] oArr = new MyClass[10];
for(int i = 0; i < 10; i++)
oArr[i] = new MyClass();
Arrays
Arrays are the simplest data structures and are widely used in programming. In C++, an array is treated as a contiguous block of memory. This low-level nature of arrays creates problems for object-oriented programming. A base class pointer cannot be used to iterate through an array of derived class objects:
class Base{
public:
// Base class data members
virtual void boo();
};
class Derived: public Base{
public:
// Derived class data members
virtual void boo();
};
void foo(){
    Derived dArr[10];
    Base * bPtr = dArr;
    for(int i = 0; i < 10; i++)
        bPtr[i].boo();
    // Will not work properly: bPtr[i] steps by sizeof(Base),
    // but the elements are sizeof(Derived) apart
}
This is because the size of a base class object may not equal the size of a derived class object, and indexing through a Base pointer uses the size of Base. The compiler cannot identify the proper object at compile time. In C#, an array of a class type holds references to the objects rather than the objects themselves, so every element has the same size, and with the support of the runtime such operations are legal and safe.
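The same effect can be had in C++ by iterating over an array of base class pointers instead of an array of objects, which is roughly what a C# array of a class type does behind the scenes. The sketch below uses its own small Base and Derived classes; the extra member exists only to make the two sizes differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

class Base {
public:
    virtual ~Base() {}
    virtual std::string boo() const { return "Base"; }
};

class Derived : public Base {
public:
    std::string boo() const override { return "Derived"; }
    int extra[4] = {0, 0, 0, 0};   // makes sizeof(Derived) > sizeof(Base)
};

static_assert(sizeof(Derived) > sizeof(Base), "the sizes differ, so Base* indexing would misstep");

// Every element is a pointer, so the stride is the same whatever the
// dynamic type of the pointed-to object is.
std::string callAll(Base* const* items, std::size_t count) {
    std::string result;
    for (std::size_t i = 0; i < count; ++i)
        result += items[i]->boo() + ";";
    return result;
}

std::string demo() {
    Derived objects[3];
    Base* pointers[3] = { &objects[0], &objects[1], &objects[2] };
    return callAll(pointers, 3);
}
```

demo() returns "Derived;Derived;Derived;": each call is dispatched virtually on a correctly located object, at the cost of one extra indirection per element.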
C# does not treat arrays as mere contiguous memory locations. It adds object-oriented characteristics through the class System.Array, from which all arrays inherit. This class abstracts the operations on an array, and a System.Array reference can be cast to and from any specific array type. As arrays are instances of a class, they are always reference types, and this holds even for arrays of value types. This makes bounds checking possible on every access, but it also means that an array has to be allocated on the heap.
Both languages support rectangular and jagged arrays. For a rectangular array, a single chunk of memory is allocated and indexing is done on it. In C++, jagged arrays can be implemented by having an array of pointers and allocating memory dynamically for each sub-array. The same idea is followed in C#, but with references instead of pointers. This makes optimal use of space, since the sub-arrays may be of varying length. The compromise is that an additional indirection is needed to reach each sub-array. This access overhead is absent in a rectangular array, since all the rows are of the same size.
// C++ language example for 'rectangular arrays'
float rectArr[5][20];
// C# rectangular arrays, note the difference in syntax
float [,] rect = new float [5,20];
// C++ language example for 'jagged arrays'
float **ptr;
ptr = new float *[5];
for (int i=0; i< 5; i++)
ptr[i] = new float [20];
// C# example for 'jagged arrays'
float [] [] ptr;
ptr = new float[5][];
for(int i=0; i<5; i++)
ptr[i] = new float[20];
When more than one representation is supported, at some point the user will need to switch from one to the other. C# has no built-in conversion between rectangular and jagged arrays; the elements have to be copied into a newly allocated array of the other kind. It should also be noted that C# supports 'Indexer' members that allow array-like access to data structures.
Enums
Enumerations are of type int in C; in C++ the underlying type depends on the range of the enumeration constants declared. C#, as an improvement over the old enumeration, allows you to specify the underlying type of the enumeration:
enum holidays : byte{
Sunday = 0,
Saturday = 1
}
C# enums differ from C/C++ enums in that the enumerated constants need to be

qualified by the name of the enumeration when they are used.
enum workingDay { mon,tue,wed,thur,fri };
workingDay today;
today = workingDay.mon;
//note that mon is qualified by workingDay
This name.member syntax helps the enumeration constants to remain in a

separate namespace, thus preventing them from polluting the global namespace.

Furthermore, it prevents name clashes between two different enums:
// C#: no name clashes with other enum members
enum Days { mon, tue, wed, thur, fri, sat, sun };
enum CosmicObjs { earth, mars, jupiter, sun, moon};
enum Companies {sun, microsoft, dell, digital, compaq};
myDay = Days.sun;
computer = Companies.sun;
cosmicObject = CosmicObjs.sun;
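C++ programmers can get much the same behaviour since C++11 with scoped enumerations (enum class), which also accept an explicit underlying type. The sketch below reuses the enumeration names from the C# example; dayIndex is a helper invented for the illustration.

```cpp
#include <cassert>

// Scoped enumerations: the constants live inside the enum's own scope,
// so three different 'sun' constants can coexist.
enum class Days : unsigned char { mon, tue, wed, thur, fri, sat, sun };
enum class CosmicObjs { earth, mars, jupiter, sun, moon };
enum class Companies { sun, microsoft, dell, digital, compaq };

// There is no implicit conversion to int; the cast must be written out.
int dayIndex(Days d) { return static_cast<int>(d); }
```

Days::sun, CosmicObjs::sun and Companies::sun are distinct values of distinct types, and writing plain sun without a qualifier does not compile, which mirrors the qualification rule in C#.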
Variable Length Argument Lists
Experience has shown that programmers prefer C style printf format, because it

is convenient for exact format specification and is easy to use. C# provides

'params' for the support of variable length argument lists. So you can write your

functions using this facility as in:
int MyPrintf(string format, params object [] args);
For printing, C follows the format string with a variable length argument list; C++ uses << with cout; Java has overloaded the + operator. In C#, the format string refers to the arguments by position, with the numbering starting at zero:
Console.WriteLine("{0} {1} {2}", i, obj, "someString");
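For readers who want to see how positional placeholders and a variable length argument list fit together, here is a small sketch in C++11 using variadic templates. The function formatIndexed and its helpers are hypothetical and far simpler than the real .NET formatting rules: they only replace "{N}" with the N-th argument and leave anything else untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Converts one argument to text with operator<<.
template <typename T>
std::string toText(const T& value) {
    std::ostringstream out;
    out << value;
    return out.str();
}

// Base case: no more arguments to collect.
inline void collect(std::vector<std::string>&) {}

// Recursive case: convert the first argument, then handle the rest.
template <typename First, typename... Rest>
void collect(std::vector<std::string>& texts, const First& first, const Rest&... rest) {
    texts.push_back(toText(first));
    collect(texts, rest...);
}

// Replaces each "{N}" in format with the N-th argument (zero-based).
template <typename... Args>
std::string formatIndexed(const std::string& format, const Args&... args) {
    std::vector<std::string> texts;
    collect(texts, args...);
    std::string result;
    for (std::size_t i = 0; i < format.size(); ++i) {
        std::size_t close = (format[i] == '{') ? format.find('}', i) : std::string::npos;
        if (close == std::string::npos) { result += format[i]; continue; }
        std::size_t index = 0;
        bool digitsOnly = close > i + 1;
        for (std::size_t j = i + 1; j < close && digitsOnly; ++j) {
            if (format[j] < '0' || format[j] > '9') digitsOnly = false;
            else index = index * 10 + static_cast<std::size_t>(format[j] - '0');
        }
        if (digitsOnly && index < texts.size()) { result += texts[index]; i = close; }
        else result += format[i];   // not a valid placeholder: copy as is
    }
    return result;
}
```

A call such as formatIndexed("{0} {1} {2}", 1, 2.5, "s") produces "1 2.5 s". Unlike a params object[] array in C#, the argument types here are known at compile time, so no boxing or run-time type inspection is involved.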
Writing 'Unsafe' Code
C++ is good for writing low-level code, with features like pointer arithmetic that are useful in systems programming. C# recognizes the importance of this and allows 'unsafe casts', pointers, and pointer arithmetic in code segments that are explicitly labeled as unsafe. Note that the keyword 'unsafe' may be misleading: the code still runs under the CLR, but it may perform low-level operations whose safety the runtime cannot verify. Also, it is neither as easy nor as powerful as in C++.
Argument Passing
When we pass a variable to a method by reference in C++, we cannot tell from the call whether it will be modified. To promise that it will not be, the programmer uses the const qualifier for that argument in the method's declaration. C# has no const qualifier for parameters. Instead, it makes the opposite case explicit: a method that hands results back through its arguments, in effect returning multiple values, must mark those arguments with the ref or out keyword.
These are two new kinds of argument. The ref keyword indicates that the method may modify the caller's variable. It is written in the method invocation as well as in the method definition, so the caller is always aware of it:
//C++
void foo(MyClass & arg1, MyClass & arg2){
// other code;
arg1 = newValue1;
arg2 = newValue2;
}
foo(obj1, obj2);
// Note: the caller may not expect obj1 and obj2 will change
//C#
void foo(ref MyClass arg1, ref MyClass arg2){
arg1 = newValue1;
arg2 = newValue2;
}
foo(ref obj1, ref obj2);
// Now the programmer is aware that obj1 & obj2 may be changed
In a few cases, we may want the method itself to give the argument its first value. Passing an unassigned variable with the ref keyword is flagged as an error by the compiler, because a variable must be definitely assigned before its first use. One elementary way to avoid the error is to initialize the variable with a default value before passing it to the method. C# introduces a separate keyword for this situation. Instead of ref, we can use out, which doesn't force the caller to initialize the variable. However, it is then mandatory for the method to assign a value to it.
//C#
void foo(ref MyClass arg1, out MyClass arg2){
// other code;
arg1 = someValue; // optional
arg2 = someValue; // need to assign some value
}
MyClass obj1, obj2;
obj1 = aValue; // need to initialize
foo(ref obj1, out obj2); // note obj2 is not initialized
Class Abstraction
Just like C++, the basic unit of abstraction is a class. The access specifiers public, protected and private have the same meaning in both languages. In addition, C# provides the internal and protected internal access specifiers. Internal members are available to the whole assembly, and protected internal members to the assembly and to derived classes. Why would you ever need these access specifiers? There are cases where members need to be accessible to other classes in the same assembly but shouldn't be exposed to external classes. Since C# has no friend access, this can be a useful feature, particularly when you are designing libraries.
Inheritance
C# doesn't support multiple class inheritance. It only supports single inheritance,

but you can still inherit from multiple interfaces. Pure abstract classes in C++ can

be treated as interfaces in C#. There are many restrictions in using interfaces for

inheritance. You can only have public abstract methods, and no fields are

allowed (not even const fields). However, one interface can inherit from another

interface.
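When reading existing C++ code for conversion, the classes that map cleanly onto C# interfaces are those with only pure virtual functions and no data. The sketch below shows such a hierarchy; IPrintable, IResettable and Tally are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <string>

// Pure abstract classes: no fields, only pure virtual functions.
// These are the C++ classes that translate directly to C# interfaces.
struct IPrintable {
    virtual ~IPrintable() {}
    virtual std::string print() const = 0;
};
struct IResettable {
    virtual ~IResettable() {}
    virtual void reset() = 0;
};

// Multiple inheritance from pure abstract classes only; in C# this
// becomes a class implementing two interfaces.
class Tally : public IPrintable, public IResettable {
public:
    void increment() { ++count_; }
    std::string print() const override { return "count=" + std::to_string(count_); }
    void reset() override { count_ = 0; }
private:
    int count_ = 0;
};

// Code written against the interface works with any implementation.
std::string describe(const IPrintable& p) { return p.print(); }
```

A base class that carries data members or non-virtual behaviour does not fit this pattern; if a C++ class inherits from more than one such base, the design has to be reworked before it can be expressed in C#.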
C# only supports public inheritance. Not having private or protected inheritance doesn't affect the functionality as such, but it brings a few inconveniences. For example, once a class implements ICloneable, all the classes that inherit from it automatically become cloneable, as only public inheritance is available.
The Object Base Class
The first versions of C# did not support templates, as .NET had no equivalent (generics were added in C# 2.0). However, a weaker form of generic programming is possible through the System.Object base class. This is the apex class for all types, including value types such as structs and ints and reference types such as arrays and strings. This property is exploited by the collections provided in the framework, which work in terms of Objects.
The standard libraries of both C++ and C# provide support for the container

classes. Consider this example of using the vector class:
MyClass obj;
string str = "string object";
const int size = 5;
vector<MyClass> vect(size);
vect[0] = obj;
vect[1] = str;
// Compiler Error: vect can store only MyClass and not others
// insert more elements
// iterator provides a pointer-like syntax for
//traversing the container
cout<<vect[0]<<vect[1]<<endl;
vector<MyClass>::iterator iter = vect.begin();
while(iter != vect.end()){
cout << *iter;
// calls overloaded << operator of MyClass
iter++;
}
Thus, you can have elements of only one type, and the traversing and accessing

is done through iterators. With C#, .NET provides an equivalent container class

for vector - the ArrayList container:
// Creates and initializes a new ArrayList
MyClass obj = new MyClass();
string str = "string object";
ArrayList arrLst = new ArrayList();
arrLst.Add(obj);
arrLst.Add(str);

// can simply use the foreach statement for traversing the collection
foreach(MyClass elem in arrLst){
    Console.WriteLine(" {0} ", elem);
}
// throws 'InvalidCastException' as the second element is a string
Operator Overloading
In C++ almost all the operators can be overloaded; only a few, such as the conditional operator (?:), the member access operators . and .*, and the scope resolution operator ::, cannot. C# provides support for operator overloading, but to a limited extent. The syntax for overloading the operators is:
// C++
<return type> ClassName::operator <the operator> (arguments)
// usage example
class MyClass{
public:
MyClass operator + (MyClass &rhs);
};
// C#
public static <return type> operator <the operator>(arguments)
// usage example
class MyClass{
    public static MyClass operator + (MyClass lhs, MyClass rhs){
        return new MyClass(); // placeholder result
    }
}
The main difference is that while you can have member or global (mostly friend)

functions in C++, you have static methods for overloading in C#.
Although the syntax looks similar there are a few constraints imposed by C# for

operator overloading. The most important are:

The methods should be declared as public and static.

Many of the operators are required to be overloaded in pairs. For

example, if you define == you should overload the != operator also.

If you define the + operator, the compiler defines the += operator for you

to make things easier.
//C++
class CPPClass{
protected: // can be public or protected or private
    bool operator ==(CPPClass &rhs);
    // the other argument is passed implicitly by the 'this' pointer
    // note no != operator defined

    bool operator ++(); // type of the return value is not forced
    bool operator +(CPPClass &rhs);
    bool operator +=(CPPClass &rhs);
    // += is not implicitly defined
    friend int operator-(CPPClass &lhs, CPPClass &rhs);
    // an operator can be a member or a non-member (often friend)
    // function, but not a static member
};
//C#
class CSharpClass{
    // note that all the operators are public and static;
    // the bodies below are placeholders
    public static bool operator ==(CSharpClass lhs, CSharpClass rhs){ return ReferenceEquals(lhs, rhs); }
    public static bool operator !=(CSharpClass lhs, CSharpClass rhs){ return !(lhs == rhs); }
    // relational operators should be overloaded in pairs
    public static CSharpClass operator ++(CSharpClass arg){ return arg; }
    // return type and argument types are forced for a few operators
    public static CSharpClass operator +(CSharpClass lhs, CSharpClass rhs){ return lhs; }
    // += is implicitly defined by the compiler when
    // binary + is defined
}
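On the C++ side, a common idiom is to write the compound assignment as a member and build the binary operator on top of it as a non-member. The sketch below follows that idiom for a hypothetical Money class; it is one conventional arrangement, not the only valid one.

```cpp
#include <cassert>

class Money {
public:
    explicit Money(long cents) : cents_(cents) {}
    long cents() const { return cents_; }

    // In C++ the compound assignment has to be written by hand.
    Money& operator+=(const Money& rhs) { cents_ += rhs.cents_; return *this; }

    // Nothing in the language forces == and != to come as a pair,
    // but providing both avoids surprises for callers.
    friend bool operator==(const Money& lhs, const Money& rhs) { return lhs.cents_ == rhs.cents_; }
    friend bool operator!=(const Money& lhs, const Money& rhs) { return !(lhs == rhs); }
private:
    long cents_;
};

// Binary + as a non-member, implemented in terms of +=.
inline Money operator+(Money lhs, const Money& rhs) {
    lhs += rhs;
    return lhs;
}
```

When such a class is converted, the non-member + becomes a public static operator in C#, the hand-written += is dropped because the compiler derives it, and == and != must both be present.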
Exceptions
Exception handling in C# is similar to C++ in its basic try, catch and throw syntax. In C++, the exception specification of a method lists all the possible exceptions that the method might throw. When a method doesn't list any exceptions, beware that it is then allowed to throw any exception, and there is no requirement for a caller to catch the exceptions thrown. Further, exceptions can be thrown not only in the form of class objects, but also in the form of primitive types.
The C# exception handling mechanism is simpler. There are no exception specifications: unlike Java, C# has no throws clause, and any method may throw any exception. Catching exceptions is not mandatory, and only objects of type Exception (or of types derived from it) can be thrown.
//C++
void foo(){
    throw 10;
    throw MyException();
    throw "This is an Error";
}
void boo() throw (int, Exception){
    throw "Something is wrong";
    // compiles, but violates the specification:
    // std::unexpected() is called at run time
}
void doo() throw (){
    // promises that no exceptions will escape
    // (dynamic exception specifications were removed in C++17)
}
//C#
void foo(){
    // no exception specification: may throw any exception
}
void boo(){
    throw new IOException(); // OK
    // throw new MyException(); // also OK if MyException derives from Exception
    // throw 10; // Error: only Exception objects can be thrown
}
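C++ code that is easiest to carry over to C# already throws only class types derived from a common base and catches them by reference. The sketch below shows that style; ParseError, parseDigit and parseDigitOr are hypothetical names for the illustration, and noexcept is the C++11 replacement for the empty throw() specification.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Exception class derived from the standard hierarchy, which plays the
// same role as System.Exception in C#.
class ParseError : public std::runtime_error {
public:
    explicit ParseError(const std::string& what) : std::runtime_error(what) {}
};

// May throw: nothing in the signature says so.
int parseDigit(char c) {
    if (c < '0' || c > '9') throw ParseError(std::string("not a digit: ") + c);
    return c - '0';
}

// Promises not to let any exception escape.
int parseDigitOr(char c, int fallback) noexcept {
    try {
        return parseDigit(c);
    } catch (const std::exception&) {   // catch by const reference
        return fallback;
    }
}
```

Code that throws ints or string literals, as in the C++ example above, has to be redesigned around exception classes before conversion, because C# can only throw objects derived from Exception.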
Namespaces
Namespaces are supported in C++ for better organizing the code and are

valuable in large-scale programming. In C#, the syntax for declaring and

organizing classes in a namespace is similar to that of C++. There is no concept of

header files (C# design is such that there is no need for header files, for example,

it combines declarations and definitions) and you have to use the using directive

to open up the members in the namespace for access in the code. You can also

have aliases:
using alias_name = namespace_or_type;
Just like in C++, you can have nested namespaces. The syntax is a bit different:
namespace outer.inner{
// some members
}
Note that before C++17 you have to nest one namespace within another for a similar goal in C++:
namespace outer{
namespace inner{
// some members
}
}
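C++ also has counterparts to the C# using directive and alias. The sketch below puts a function in the nested namespace and reaches it both ways; the function answer and the alias oi are invented for the example.

```cpp
#include <cassert>

namespace outer {
    namespace inner {
        inline int answer() { return 42; }
    }
}

// Namespace alias: the closest counterpart of the C# form
// "using alias_name = namespace_or_type;" for a namespace.
namespace oi = outer::inner;

int viaAlias() { return oi::answer(); }

int viaUsing() {
    using namespace outer::inner;   // opens the namespace for this scope
    return answer();
}
```

Both functions call the same outer::inner::answer. The difference from C# is one of scope: a C++ using directive can appear inside a function body, as here, whereas a C# using directive applies to a file or namespace declaration.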
There is an important difference in how code is packaged around namespaces in C++ and C#. In C++, namespaces are purely logical entities with no physical counterpart. In C#, namespaces are logical too, but the compiled code is packaged physically into assemblies, which form the boundary for versioning and for internal access. A namespace can span several assemblies, and an assembly can contain several namespaces.
Properties
It is common for a C++ programmer to provide get and set methods for data members. Not only does this help in abstracting the details, it also brings a few advantages: the user cannot assign illegal values to a field (such as 500 to a field called age), and the programmer can offer a read-only version of a member, such as the size of a container.
class MyClass{
private:
int someInt;
int length;
public:
inline int getLength(){
return length;
}
inline int getSomeInt(){
return someInt;
}
inline void setSomeInt(int arg){
if(arg >= minValue && arg <= maxValue)
someInt = arg;
else
error("illegal value");
}
};
//usage:
MyClass anObj;
anObj.setSomeInt(100);
int len = anObj.getLength();
As most of these methods are inlined, performance isn't affected. However, there are two problems with such functions. The first is that the syntax for accessing them is a bit unwieldy. The second is that the approach sits uneasily with object-oriented guidelines: an object is supposed to expose behavior, not implementation, and these methods plainly reveal which private fields the object holds. C# provides a whole new way to handle this situation through properties.
Properties are very much like the get-set methods, but syntactically different.

Consider this example written with properties in C#:
class MyClass{
private int someInt;
private int length;
public int Length{
get{
return length;
}
}
public int SomeInt{
get{
return someInt;
}
set{
if(value >= minValue && value <= maxValue)
someInt = value;
else
error("illegal value");
}
}
}
//usage:
MyClass anObj = new MyClass();
anObj.SomeInt = 100;
// set the value of the field through mutator property
int len = anObj.Length;
// get the value of the field through accessor property
Note that a variable named value is used in the set accessor. It is an implicit parameter supplied by the compiler, and its type is the same as that of the property. As we can see, the syntax is more intuitive to use.
Indexers
We tend to have many container classes that are used to hold a set of objects. Stacks, queues, maps, and hash tables are just a few such important containers. Many other objects can also be viewed as containers; for example, a menu can be thought of as a container of menu items. In most cases we need to access the objects in a container through an index. In C++ this can be done by overloading the array subscript operator []. We can overload it not only for integers but for any type we want, which sometimes makes the subscripting more meaningful:
class EmployeeContainer{
private:
Employee emp[100];
public:
Employee& operator[](int empNo){
//return the employee with the empNo
}
Employee& operator[](string name){
// return the employee with the name
}
};
void foo(){
EmployeeContainer empCont;
// add the employees to the container
Employee emp1 = empCont[5];
empCont["Pranni"].age = 24;
}
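The operator bodies above are left as comments. A complete version might look like the sketch below. It assumes a hypothetical Employee struct with empNo, name, and age fields, stores the employees in a std::vector rather than a fixed array, and throws std::out_of_range when nothing matches; none of these choices come from the original example.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical Employee type; the fields are assumptions for this sketch.
struct Employee {
    int empNo;
    std::string name;
    int age;
};

class EmployeeContainer {
public:
    void add(const Employee& e) { emps_.push_back(e); }

    // Subscript by employee number.
    Employee& operator[](int empNo) {
        for (Employee& e : emps_)
            if (e.empNo == empNo) return e;
        throw std::out_of_range("no employee with that number");
    }

    // Subscript by name: a second overload of the same operator.
    Employee& operator[](const std::string& name) {
        for (Employee& e : emps_)
            if (e.name == name) return e;
        throw std::out_of_range("no employee with that name");
    }

private:
    std::vector<Employee> emps_;
};

// Small self-check; call it from main() to try the sketch.
void demoIndexers() {
    EmployeeContainer empCont;
    empCont.add({5, "Ganni", 30});
    empCont.add({7, "Pranni", 23});
    empCont["Pranni"].age = 24;  // a reference is returned, so this writes through
    assert(empCont[7].age == 24);
}
```

Because both overloads return a reference, the same expression works for reading and for writing, which is the role the get and set accessors play in a C# indexer.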
C# introduces indexers to address this problem of indexing a container. The equivalent EmployeeContainer class can be written in C# as:
class EmployeeContainer{
private Employee[] emp = new Employee[100];
public Employee this[int empNo]{
// implement it like a property
get{
return emp[empNo];
}
set{
emp[empNo] = value;
}
}
public Employee this[string name] {
// implement it like a property
get{
// getting Employee index mapped by string info
}
set{
// code for setting Employee detail at index position
}
}
}
void foo(){
EmployeeContainer empCont = new EmployeeContainer();
// add the employees to the container
Employee emp1 = empCont[5];
empCont["Pranni"].age = 24;
}
Attributes
Attributes are a significant addition to C#. When you are creating your own

types or components, there is a necessity to associate related details of the

components and their elements. In COM you used type libraries to achieve such

functionality. Traditionally, comments and macros are used in C++

programming for storing the metadata about the class and/or its members. C#'s

attributes are far more powerful and you can give meta-information for many

language elements: fields, methods, events, etc. You can retrieve and examine

such meta-information at runtime using reflection (discussed later). There are

two types of attributes: intrinsic (predefined) and custom attributes.
C# supports a preprocessing facility, but there is no separate tool; it is handled by the compiler itself. Preprocessor support is restricted, though: for example, you cannot define macros. One common use of the preprocessor, conditional compilation of methods, is achieved in C# through the Conditional attribute. It is an intrinsic attribute: calls to a method marked with it are compiled in only when the named symbol is defined. In C++ you use the preprocessor facilities directly.
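//C# code
#define DEBUG
// such definitions must appear at the top of the file, before any other code
using System;
using System.Diagnostics; // ConditionalAttribute lives here
class MyClass{
[Conditional("DEBUG")]
public static void debugFunction(string message){
Console.WriteLine(message);
}
// other members
}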
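For comparison, the direct use of the preprocessor in C++ mentioned above might look like the following sketch. The symbol APP_DEBUG, the macro DEBUG_LOG, and the helper debugCallCount are hypothetical names chosen for this illustration; the call counter exists only so that the effect can be observed.

```cpp
#include <cassert>
#include <iostream>
#include <string>

// Hypothetical build symbol; remove this line to compile the logging calls away.
#define APP_DEBUG

#ifdef APP_DEBUG
// Debug build: DEBUG_LOG forwards to a real function.
inline int& debugCallCount() {
    static int count = 0;
    return count;
}
inline void debugFunction(const std::string& message) {
    assert(!message.empty());  // an empty debug message is treated as a mistake
    ++debugCallCount();
    std::clog << message << std::endl;
}
#define DEBUG_LOG(msg) debugFunction(msg)
#else
// Release build: every DEBUG_LOG(...) expands to a no-op expression.
#define DEBUG_LOG(msg) ((void)0)
#endif

int compute(int x) {
    DEBUG_LOG("compute called");
    return x * 2;
}
```

In the C++ version the programmer writes both branches of the #ifdef by hand, whereas the Conditional attribute lets the C# compiler drop the calls automatically.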
C#'s conditional methods are very powerful when used with the Debug and Trace classes available in the System.Diagnostics namespace. There are many such useful attributes; one is Serializable, which is discussed later.
You can define your own custom attributes. You have to derive your class from the System.Attribute class, and you can mark it with the AttributeUsage attribute to state where it may be applied. Here is a simple example that records comments from the author of the code:
using System;
[AttributeUsage(AttributeTargets.All, AllowMultiple=true)]
// tells that this attribute can be used on any program element
// and there can be multiple entries for each use of attribute
public class CommentAttribute : Attribute{
public CommentAttribute(string comment){
this.commentText = comment;
}
private string commentText;
public string CommentText{
get{
return commentText;
}
}
}
[Comment("Written by Ganni and Pranni")]
class GuineaPig {
// ...
}
class Test {
public static void Main(){
Attribute[] attributes = Attribute.GetCustomAttributes(typeof(GuineaPig));
//This static method GetCustomAttributes
//is used to retrieve the attribute info
foreach(CommentAttribute attribute in attributes)
Console.WriteLine(attribute.CommentText);
}
}
You can use custom attributes with the same special syntax as intrinsic attributes. There is no need to call the constructor explicitly; the arguments written inside the square brackets are passed to it. The static method GetCustomAttributes of the Attribute class retrieves the attributes of the type passed to it.
Callback Functions
Function pointers are a useful facility in C/C++. The following is a real-world example of using them. Say you want to write a menu program whose aim is to call, at runtime, the function corresponding to the item selected in the menu. To do that, we declare a function pointer whose signature matches the functions written for the menu:
void (*menuSelector)( );
// get the input from the user - selection of the menu item
switch(select){
case NEW : menuSelector = &New; break;
case OPEN : menuSelector = &Open; break;
// assign the address of the corresponding function to menuSelector
}
menuSelector( );
// now call the selected functionality
Calling functions through such function pointers, whose value is determined at runtime, is known as a 'callback'. C# supports callback functions through 'delegates' (you can also consider them an improved version of the 'function objects' in C++).
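The function-object idea mentioned above can be sketched in C++ as follows. This is only an illustration built on std::function from the C++11 standard library, which postdates the function-pointer style shown earlier; the Recent functor, the Selector alias, and the dispatch function are names assumed for this example, and the handlers return text instead of printing so that the result is easy to check.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical menu handlers.
std::string New()  { return "You selected 'New' option"; }
std::string Open() { return "You selected 'Open' option"; }

// A function object: unlike a plain function, it can carry state.
struct Recent {
    int count;
    std::string operator()() const {
        return "Showing " + std::to_string(count) + " recent files";
    }
};

// One slot type that accepts plain functions and function objects alike.
using Selector = std::function<std::string()>;

// Looks up the handler registered for a menu choice and calls it back.
std::string dispatch(const std::string& choice) {
    static const std::map<std::string, Selector> menu = {
        {"New", &New},         // address of the function, no call parentheses
        {"Open", &Open},
        {"Recent", Recent{3}}  // function object stored in the same slot type
    };
    auto it = menu.find(choice);
    if (it == menu.end()) return "Unknown option";
    assert(it->second);  // every registered slot holds a callable
    return it->second();
}
```

A table of callables replaces the switch statement here; a C# delegate fills the same role as the Selector slot.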
Delegates closely resemble function pointers, and C# promises that delegates are type-safe, secure, and object-oriented. A delegate is capable of holding a reference to a method so that the method can be called later. Multiple methods can even be registered with a single delegate. Callbacks are valuable for event handling, and C# also supports events, which are useful in event-driven programming such as Windows Forms:
public delegate void Selector();
// Selector is the type that can be used to instantiate
// delegates that take no arguments and return nothing
public Selector menuSelector;
public void New(){
Console.WriteLine("You selected 'New' option");
}
public void Open(){
Console.WriteLine("You selected 'Open' option");
}
string select;
// get the value of select from calling the menu...
Test t = new Test();
switch(select){
case "New" : t.menuSelector = new Selector(t.New);
break;
case "Open" : t.menuSelector = new Selector(t.Open);
break;
// ...
// register the selected method to menuSelector
}
t.menuSelector( );
// call the delegate and it will in turn call the registered method
Reflection and RTTI
When doing object-oriented programming, we often treat an object as if it were of a more general type. For example, we can view a Dog as a mammal, an animal, or even simply a living thing. When we hold such a generalized reference, we sometimes want to know what the exact type is and act accordingly. Given a living thing, we would perform some operations on a mammal that we wouldn't on an amphibian, and even more specific operations if it were a Dog. In such cases, RTTI (Run Time Type Identification) comes into the picture. C++ provides the typeid operator and a set of classes that enable querying the type of an object at runtime. This operator returns the exact (dynamic) type of the object only if the static type of the expression is polymorphic, that is, if it has at least one virtual function:
class Base{
// no virtual methods
void Base1();
void Base2();
};
class Derived1 : public Base{
virtual void vMethod();
};
class Derived2: public Derived1{
};
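void foo(){
Derived2 d2Obj;
Base* bPtr;
bPtr = &d2Obj;
cout<<typeid(*bPtr).name()<<endl; // Base is not polymorphic: static type
Derived1 *dPtr;
dPtr = &d2Obj;
cout<<typeid(*dPtr).name()<<endl; // Derived1 is polymorphic: dynamic type
}
//output (the exact text is implementation-defined; this is Visual C++ style):
// class Base
// class Derived2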
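Acting on the exact type, as described for the Dog and mammal case above, is usually done with typeid comparisons or dynamic_cast rather than by printing type names. The following is a minimal sketch; the LivingThing, Mammal, Amphibian, Dog, and Cat classes and the describe function are hypothetical names introduced for this illustration.

```cpp
#include <cassert>
#include <string>
#include <typeinfo>

struct LivingThing {
    virtual ~LivingThing() {}  // a virtual member makes the hierarchy polymorphic
};
struct Mammal : LivingThing {};
struct Amphibian : LivingThing {};
struct Dog : Mammal {};
struct Cat : Mammal {};

// Chooses an action based on the run-time type of the object.
std::string describe(const LivingThing& thing) {
    if (typeid(thing) == typeid(Dog))         // matches the exact type only
        return "dog";
    if (dynamic_cast<const Mammal*>(&thing))  // Mammal or anything derived from it
        return "mammal";
    return "other living thing";
}

// Small self-check; call it from main() to try the sketch.
void demoRtti() {
    Dog dog;
    Cat cat;
    Amphibian frog;
    assert(describe(dog) == "dog");
    assert(describe(cat) == "mammal");
    assert(describe(frog) == "other living thing");
}
```

Comparing type_info objects avoids any dependence on the implementation-defined strings returned by name().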
Reflection is a feature traditionally associated with dynamic (interpreted) languages. It is a powerful facility, as we can dynamically load classes, create objects, change their properties, and invoke methods on them. Although fully exploiting the power of reflection will not be explored in this case study (see http://www.csharptoday.com/content.asp?id=1852&WROXEMPTOKEN=1518115ZIn19JBRkpiV5wX71qk for a whole piece on the topic), here is a sample that looks up a type by name, creates an instance, and invokes one of its methods dynamically:
using System;
using System.Reflection;
class ReflectionTest{
// this method will be called dynamically
public void InvokeDynamic(){
Console.WriteLine("Hello, dynamic world!");
}
public static void Main(){
Type t = Type.GetType("ReflectionTest");
// get the type by passing the name of this class
MethodInfo m = t.GetMethod("InvokeDynamic");
object o = Activator.CreateInstance(t);
// Activator is a class defined in System namespace
// you can use it to create objects (remote or local)
m.Invoke(o, null);
// the second argument is the list of arguments passed
// to Invoke - null in this case
}
}
// output:
// Hello, dynamic world!
Memory Management
Moving from C++ to C# takes away a lot of the programmer's freedom. C++ allows you to decide whether to create an object on the stack or on the heap, whereas C# doesn't. The change from an unmanaged to a managed environment has drawbacks too. In C#, every instance of a class (a reference type) is allocated on the managed heap; only value types, such as structs and the built-in numeric types, can be allocated on the stack. So you have to create all class objects on the heap with 'new', even those you used to allocate statically or on the stack in C++. In C#, in addition to using 'new' for allocating heap objects, you can use it for value types (structs) simply to call their constructors.
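The freedom that C++ gives here can be seen in a short sketch. The Tracked type below is hypothetical; it merely counts live instances so that the lifetime of each object can be observed. std::unique_ptr is a C++11 facility.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type that counts how many instances are currently alive.
struct Tracked {
    static int alive;
    Tracked()  { ++alive; }
    ~Tracked() { --alive; }
    Tracked(const Tracked&) = delete;
    Tracked& operator=(const Tracked&) = delete;
};
int Tracked::alive = 0;

// Returns the number of live objects left at the end (expected to be zero).
int demoLifetimes() {
    {
        Tracked onStack;  // automatic storage: destroyed at the closing brace
        assert(Tracked::alive == 1);
    }
    assert(Tracked::alive == 0);

    Tracked* onHeap = new Tracked;  // dynamic storage: lives until delete
    assert(Tracked::alive == 1);
    delete onHeap;
    assert(Tracked::alive == 0);

    {
        std::unique_ptr<Tracked> owned(new Tracked);  // heap object tied to a scope
        assert(Tracked::alive == 1);
    }
    return Tracked::alive;
}
```

In C#, only the second form has a direct equivalent for class types, and the delete step is taken over by the garbage collector.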
The difference in where objects are allocated is significant. For example, when a value type is converted to a reference type, memory has to be allocated on the heap and initialized with a copy of the value. This process is referred to as 'boxing':
int i = 10;
object o = i;
Note that you don't need an explicit cast here, as it is an 'upcast'. When the

conversion is done from reference type to value type, it is referred to as

'unboxing'. However you need explicit casting to do that as it is a 'downcast':
int i = 10;
object iRef = i;
int j = iRef + 100; // doesn't compile, needs explicit cast
int k = (int)iRef +100; // now OK
Such conversions are not possible in C++ as there is no common base class. Boxing and unboxing are costly operations and should be avoided whenever possible, as they involve the creation and destruction of objects.
Garbage Collection
The burden of managing memory is greatly reduced in C#, as the garbage collector automatically reclaims unused (unreferenced) objects. With garbage collection, most memory management problems, such as dangling pointers and memory leaks, are gone. Garbage collection deals only with memory, however; other resources, such as network connections, still need to be released when the object is reclaimed. This is done in the Finalize method. C# still supports C++'s destructor syntax, but C# destructors are 'syntactic sugar' for finalizers.
~MyClass(){
// release resources like database connections
}
is equivalent to:
protected override void Finalize(){
try{
// release resources like database connections
}
finally{
base.Finalize();
}
}
which is a little tedious to type, and hence the destructor syntax is convenient. The meaning of destructors is not the same in the two languages even though the syntax is the same. There is no assurance that the object will be garbage collected, or its finalizer called, immediately when there are no more references to the object. If important resources such as file handles or database connections are released in C++ destructor code, you shouldn't rely on Finalize in C#. Rather, you have to implement the IDisposable interface, provide the Dispose method, and write the code for releasing such connections or handles there.
using System; // IDisposable and GC are defined in the System namespace
class MyClass : IDisposable{
MyClass(){
// get resources
}
public void Deallocate(){
// code for releasing resources here
}
public void Dispose(){
Deallocate();
GC.SuppressFinalize(this);
// since Dispose is called, the Finalize method should
// not be called... so tell GC to suppress call to
// Finalizer method
}
~MyClass(){
Deallocate();
}

public static void Main(String []args){
MyClass obj = new MyClass();
// use obj;
obj.Dispose();
}
}
To be more precise: it is not possible to determine exactly when the garbage collector will run, and so C# doesn't have deterministic finalization. To overcome this, you implement the IDisposable interface and provide the implementation of the Dispose method. After you have finished using the object, you can release its resources by calling Dispose explicitly. Who is responsible for calling this method for objects that come from various sources? The time-honored C++ principle for disposing of heap objects applies here too: 'whoever allocated the memory has to release it'.
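To make the contrast concrete, here is a minimal C++ sketch of the deterministic behavior that a C# programmer has to recreate with Dispose. The Connection class stands in for a file handle or database connection, and releaseLog is a hypothetical helper that records the order in which resources are released; neither comes from the original text.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records release events so the order can be inspected afterwards.
std::vector<std::string>& releaseLog() {
    static std::vector<std::string> log;
    return log;
}

// Hypothetical resource wrapper standing in for a file handle or connection.
class Connection {
public:
    explicit Connection(const std::string& name) : name_(name) {}
    ~Connection() { releaseLog().push_back(name_); }  // runs at scope exit
    Connection(const Connection&) = delete;
    Connection& operator=(const Connection&) = delete;
private:
    std::string name_;
};

void useConnections() {
    Connection db("database");
    Connection file("file");
    // ... work with both resources ...
}  // destructors run here, in reverse order of construction

// Small self-check; call it from main() to try the sketch.
void demoRaii() {
    releaseLog().clear();
    useConnections();
    assert(releaseLog().size() == 2);
    assert(releaseLog()[0] == "file");
    assert(releaseLog()[1] == "database");
}
```

The release happens at a known point without any explicit call, which is exactly the guarantee that a C# finalizer does not give.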
Steps in Converting Existing Code
There are cases where systems that are written in C++ need to be ported to C#.

The .NET environment can use C++ code directly in two cases:

When the classes are written in Managed Extensions to C++

If they are COM components
If the application is written as COM components, then the components can be used directly in .NET. In this case, you can use the Type Library Importer (tlbimp.exe) utility. It reads the COM type library information and converts it into an equivalent .NET assembly, a proxy that contains the necessary metadata. However, it should be noted that the code is still unmanaged.
'Managed Extensions to C++' (MEC) is a set of extensions to the C++ language provided by Microsoft that can be compiled to code targeting the .NET environment. Most existing C++ code is not written for component programming, so it cannot be used directly from C#. MEC is also new to the programming world, so legacy code will not have been written in it.
C# provides support for low-level programming and has facilities to make use of

legacy code. For example, the methods that are available in the DLLs can be

accessed by declaring such methods with the DllImport attribute. You have to

declare such methods as extern - it has a similar use as in C++ for accessing

methods from other languages. It can be applied only to methods implemented

externally. Say, you want to use your favorite MessageBox in traditional

Windows programming:
using System.Runtime.InteropServices; // DllImport is defined here
[DllImport("User32.dll")]
public static extern int MessageBox
(int h, string m, string c, int type);
// now you can use it in your C# code
This feature is of great use if your code is a library or framework rather than a full-fledged application. As long as the functions are available in DLLs, you just need to declare them in your C# code to make use of them.
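On the C++ side, a function intended for the DLL route described above is typically exported with C linkage so that its name is not mangled. The sketch below assumes a hypothetical legacy routine called CountVowels and a hypothetical LEGACY_API macro. A matching C# declaration would carry the DllImport attribute with the name of the DLL, with the calling convention and string marshaling chosen to agree with the C++ build; those settings are not shown here.

```cpp
#include <cassert>
#include <cstring>

// Export macro: __declspec(dllexport) is only meaningful to Windows compilers.
#if defined(_WIN32)
#define LEGACY_API extern "C" __declspec(dllexport)
#else
#define LEGACY_API extern "C"
#endif

// Hypothetical legacy routine. extern "C" switches off C++ name mangling,
// so the exported symbol can be located by its plain name.
LEGACY_API int CountVowels(const char* text) {
    if (text == nullptr) return 0;
    int count = 0;
    for (const char* p = text; *p != '\0'; ++p)
        if (std::strchr("aeiouAEIOU", *p) != nullptr) ++count;
    return count;
}

// Small self-check; call it from main() to try the sketch.
void demoExport() {
    assert(CountVowels("Migrating") == 3);
}
```

Keeping the exported interface to plain C types such as int and const char* makes the function straightforward to declare from other languages.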
When you want to convert existing C++ code to run on the .NET platform, the following decisions need to be made. If the code is simple enough to be rewritten without much effort, then you can go for C#. In practice, C++ code may involve low-level programming such as accessing hardware features. Such functionality can be handled in C# itself to some extent, thanks to its support for C-like structures and the restricted use of native pointers. Where full control over resources is required, you can do explicit memory management as well. Such code has to be placed in 'unsafe' blocks. If the code is too complex to be handled with the facilities available in 'unsafe' code, it can be converted directly from C++ to Managed Extensions to C++. Code written that way is accessible from C#. All this means that tested, legacy C++ code need not be discarded; you can still use it in the .NET environment, albeit as unmanaged code.
Thinking in terms of a one-to-one correspondence of functionality leads to poor design and fragile code. Translating C++ code line by line is not feasible, as the two languages differ considerably in their functionality and support. For instance, in C# all functions have to be placed inside classes, because no global functions or data are supported; C# strictly enforces the class as the basic abstraction mechanism. So, when you are moving to C#, it is better to adopt the C# mindset and not think in terms of C++. To illustrate how these ideas materialize, consider the following example of converting class hierarchies.
Converting the class hierarchies
Designing class hierarchies differs drastically between C++ and C#. This is because C# supports neither multiple inheritance of classes nor private or protected inheritance; only single, public class inheritance is available. Consider the following hierarchy in C++:
class Base1{
// pure abstract base class
};
class Base2{
// abstract base class
};
class Base3{
// concrete class
};
class Derived: public Base1, protected Base2, private Base3{
};
Base1 is a pure abstract base class in C++, which is equivalent to an interface in C#, so it can be represented as an interface. Base2 can be an abstract class in C#. The problem arises because multiple class inheritance is involved, and there can be only one base class in C#. If possible, try to convert Base2 into an interface as well. An abstract class typically provides implementations for only a few of its methods; those implementations can be moved into the concrete class, which makes turning Base2 into an interface feasible. The difficulty comes when Base2 has data members. In that case an interface is not feasible, since interfaces cannot hold data and moving the data members elsewhere is not advisable.
In general, this can be solved by having Base3 inherit from Base2. Since Base2 is an abstract class, it serves better as the base of Base3 than the other way round. The C++ code uses private, protected, and public inheritance. How can these be handled in C#? C# supports only public inheritance, so you are forced to use it in place of all three kinds of inheritance available in C++. Using public inheritance doesn't affect the functionality; the real difference lies in abstraction. In the C# solution, all the inherited members are exposed, and the hierarchy looks like this:
interface IBase1{
}
// the naming convention in C# suggests that interface names use an I prefix
abstract class Base2{
}
class Base3 : Base2 {
}
class Derived: Base3, IBase1{
// the base class must be listed before any interfaces
}
Having the exact C++ hierarchy in C# is not possible. However, this can be

achieved to some extent by understanding the inheritance model supported in

these languages.
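One way to test the restructuring before porting is to reshape the C++ hierarchy itself into the single-base form, so that each class maps directly onto its C# counterpart. The sketch below is only an illustration of that idea; the member functions describe, scale, and offset and the stored offset value are hypothetical, added so that the classes have something to do.

```cpp
#include <cassert>
#include <string>

// Pure abstract base: maps to a C# interface (IBase1).
class IBase1 {
public:
    virtual ~IBase1() {}
    virtual std::string describe() const = 0;
};

// Abstract base with data and a partial implementation: stays an abstract class.
class Base2 {
public:
    virtual ~Base2() {}
    virtual int scale(int x) const = 0;
    int offset() const { return offset_; }
protected:
    explicit Base2(int offset) : offset_(offset) {}
private:
    int offset_;
};

// Concrete class, now chained under Base2 instead of sitting beside it.
class Base3 : public Base2 {
public:
    Base3() : Base2(1) {}
    int scale(int x) const override { return 2 * x + offset(); }
};

// One class base plus one interface-like base: the shape C# accepts.
class Derived : public Base3, public IBase1 {
public:
    std::string describe() const override { return "Derived"; }
};

// Small self-check; call it from main() to try the sketch.
void demoHierarchy() {
    Derived d;
    const IBase1& asInterface = d;
    assert(d.scale(5) == 11);
    assert(asInterface.describe() == "Derived");
}
```

If the existing C++ code still compiles and passes its tests in this shape, the subsequent translation to C# becomes largely mechanical.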
Case Study Review
Migrating from C++ to C# is not as easy as it may seem. C# is strongly based on C++, but the two languages differ in their design. The syntactic similarities between the two languages can be misleading, as there are many semantic and pragmatic differences. There are many places where C++ programmers can truly get lost when they start programming in C#.
A C++ programmer needs to have a good understanding of the migration

process and should be clear in his/her approach to get best results from such a

transition. The two languages differ in many fundamental ways: design

approach, memory management, problem solving approach, and the underlying

translation technology are just a few differences. To get the best results, it is

essential that the programmer has an overall view of such issues.
The second section of the case study is not just looking at the differences in

features. Rather, it's a discussion of how the transition can be done from C++ to

C# by analyzing its features. Naturally, a clear picture emerges of what to expect

and what not to expect in such a transition.
When there is a need to convert existing code from C++ to C#, a set of decisions has to be made. If the code is available as COM components, it can be used directly instead of being converted manually. If the code is a library/framework available as DLLs, then no conversion needs to be done and it can be used directly from C#. Managed Extensions to C++ can be used to make the application available in the .NET environment with minimal changes to the code. Finally, a decision needs to be made on whether it is necessary to rewrite the whole code in C#. In that case, line-by-line conversion is not feasible, and the transition will need significant effort on the programmer's part. It will also necessitate a change in design approach and new strategies.
All rights reserved. Copyright Jan 2004.