Josh's Object Oriented Scripting Language - Babel18


Nov 4, 2013 (3 years and 9 months ago)



Chapter 1. Resources

In the beginning there was the resource.


When developing games these days, the largest part of the application is the assets. Assets include art, sound,
maps, and all the other data which is created before the game is shipped, as opposed to the data which is
created as the game is played. Assets are never changed when the program executes; they could be left on the CD
(though they may be copied to disk since CD access is slow). Assets need not all be in memory at the same time; in
fact it is pretty likely that the assets won’t even fit in memory. Instead, assets are loaded from disk and then
cached. Since assets are immutable, those which are not being referenced may be safely discarded when memory gets
tight. This should be a big win over virtual memory systems, which have to write memory back to disk. Clearly,
assets don’t need to be saved with saved games.


Actually, I should be more precise. Assets really are all the data (‘content’) produced by the artists, level
designers, writers, and sound guys. Assets rightfully deserve version control, mechanisms for delegating work, and a
bunch of other tools for asset management. Think of resources as the end product which gets included with the game.
My game framework has a module called the resource manager for accessing those resources. The resource manager
assumes that all the assets have been compiled into resource files using game-specific formats.


The resource file begins with an index which maps a numeric resource id to a position and length in the file.
Actually it stores a little more to support compressed resources, but that is transparent to anything outside of
the resource manager and resource compiler. While I’m mentioning technicalities, the index may also indicate that
the resource is in another file (‘linked’ rather than ‘embedded’ -- useful during development). The resource id
contains the type of resource and which module it was compiled in, as well as a number to make the id unique (well,
unique with a very minor caveat). Storing which module the resource was compiled from helps reduce dependencies
between modules -- the compiler need only ensure the id is unique within the module to guarantee global uniqueness.
The resource manager reads in this index when the resource file is opened; the rest is loaded on demand.
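
As a rough C++ sketch of how such an id and index entry could be laid out: the bit widths, field order, and every
name below are assumptions made for illustration, not the framework's actual format. The only detail taken from
the text is that the high bit of a resource id stays 0 (Chapter 2 reserves it for object ids).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout, chosen only for this sketch:
//   bit  31     : always 0 for resource ids
//   bits 24..30 : type id (7 bits)
//   bits 16..23 : module the resource was compiled in (8 bits)
//   bits  0..15 : serial number, unique within that module
using ResId = std::uint32_t;

constexpr ResId MakeResId(std::uint32_t type, std::uint32_t module, std::uint32_t serial) {
    return ((type & 0x7Fu) << 24) | ((module & 0xFFu) << 16) | (serial & 0xFFFFu);
}

constexpr std::uint32_t ResType(ResId id)   { return (id >> 24) & 0x7Fu; }
constexpr std::uint32_t ResModule(ResId id) { return (id >> 16) & 0xFFu; }
constexpr std::uint32_t ResSerial(ResId id) { return id & 0xFFFFu; }

// One entry of the index at the start of a resource file (hypothetical fields):
// where the resource's bytes live in the file.
struct IndexEntry {
    ResId         id;
    std::uint32_t offset;  // byte position in the file
    std::uint32_t length;  // byte length of the stored data
};
```

Because the module number is part of the id, two modules can both hand out serial number 42 without colliding,
which is the dependency reduction described above.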


Every resource has an associated type given by a number called the ‘type id’. Classes can be made loadable as
resources by:

1) Assigning the class a type id unique to that class with the member declaration:

    enum { TypeID=1 };

2) Giving the class a member function for loading the data:

    void Load(ResourceLoader &rl);

3) Having a default constructor

Note that there is no global list keeping track of which classes are assigned to particular ids. The id is only used as
a type check for safety reasons. Also note that the ‘Load’ function takes an abstract ‘ResourceLoader’ base
class as its only parameter. This hides the specifics of how the file is stored: text vs. binary, endianness,
compression, etc.
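
A minimal sketch of a class meeting the three requirements above. TypeID, Load, and ResourceLoader come from the
text; the ReadInt method, the MemoryLoader class, and the Palette class are hypothetical stand-ins, since the
real ResourceLoader interface is not described here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Abstract loader: hides text vs. binary, endianness, compression, etc.
class ResourceLoader {
public:
    virtual ~ResourceLoader() = default;
    virtual std::int32_t ReadInt() = 0;  // hypothetical primitive read
};

// Hypothetical loader that serves values from an in-memory list.
class MemoryLoader : public ResourceLoader {
public:
    explicit MemoryLoader(std::vector<std::int32_t> data) : data_(std::move(data)) {}
    std::int32_t ReadInt() override { return data_.at(pos_++); }
private:
    std::vector<std::int32_t> data_;
    std::size_t pos_ = 0;
};

// A loadable class: type id, Load function, default constructor.
class Palette {
public:
    enum { TypeID = 1 };
    Palette() = default;

    // Reads a count followed by that many color values.
    void Load(ResourceLoader &rl) {
        const std::int32_t count = rl.ReadInt();
        colors_.clear();
        for (std::int32_t i = 0; i < count; ++i) colors_.push_back(rl.ReadInt());
    }

    const std::vector<std::int32_t> &Colors() const { return colors_; }
private:
    std::vector<std::int32_t> colors_;
};
```

Palette never learns whether the values came from a text file, a compressed stream, or memory; that is the
point of passing the abstract base class.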


In order to access a resource, you declare a variable of type Res<T>, where T is some class meeting the
requirements of the above paragraph. The parameter to the constructor is the numeric resource id. The constructor
of the Res<T> class then asks the resource manager if the resource has already been loaded.


1) If the resource was not loaded:

    a) Res<T> asks the resource manager for a ResourceLoader appropriate for loading the data.

    b) Res<T> then creates a ResWrap<T> (which has an instance of T as well as a reference count)

    c) calls T’s Load function

    d) adds the resulting object to the ResourceManager’s database under the id number

2) If the resource is already loaded:

    a) The reference count for the data is increased.

    b) If the data was not already in use, it is removed from the ‘not in use’ list.


Once constructed, the Res<T> acts as a ‘smart pointer,’ emulating C++’s pointer semantics using operator
overloading. Since resources are semantically immutable data, the access functions of Res<T> only return
const pointers/references. This implies that Res<T>’s may be copied (which increments the reference count)
and the data may be safely shared. When a Res<T> is destroyed by going out of scope, the reference count is
decremented. If the reference count ever falls to 0, the data is moved to the ‘most recently used’ end of the ‘not in
use’ list.
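
The lookup and reference-counting steps above can be sketched roughly as follows. Res<T>, ResWrap<T>,
ResWrapBase, and the ‘not in use’ list are from the text; the member names, passing the manager explicitly
(the text passes only the id), the trivial ResourceLoader stand-in, and the Greeting class are assumptions of
this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <map>
#include <memory>
#include <utility>

using ResId = std::uint32_t;
struct ResourceLoader { ResId id; };      // stand-in for the abstract loader

struct ResWrapBase {                      // all the manager ever sees
    virtual ~ResWrapBase() = default;
    int typeId = 0;
    int refCount = 0;
};
template <class T> struct ResWrap : ResWrapBase { T data; };

class ResourceManager {
public:
    ResWrapBase *Find(ResId id) {
        auto it = loaded_.find(id);
        return it == loaded_.end() ? nullptr : it->second.get();
    }
    void Add(ResId id, std::unique_ptr<ResWrapBase> w) { loaded_[id] = std::move(w); }
    void AddRef(ResId id, ResWrapBase *w) {
        if (w->refCount++ == 0) notInUse_.remove(id);     // back in use
    }
    void Release(ResId id, ResWrapBase *w) {
        if (--w->refCount == 0) notInUse_.push_back(id);  // back = most recently used
    }
    std::size_t NotInUseCount() const { return notInUse_.size(); }
private:
    std::map<ResId, std::unique_ptr<ResWrapBase>> loaded_;
    std::list<ResId> notInUse_;           // candidates for discarding when memory is tight
};

template <class T> class Res {
public:
    Res(ResourceManager &rm, ResId id) : rm_(&rm), id_(id) {
        ResWrapBase *base = rm.Find(id);
        if (!base) {                      // case 1: not loaded yet
            auto w = std::make_unique<ResWrap<T>>();
            w->typeId = T::TypeID;
            ResourceLoader rl{id};
            w->data.Load(rl);
            base = w.get();
            rm.Add(id, std::move(w));
        }
        assert(base->typeId == T::TypeID);  // the type id is only a safety check
        wrap_ = static_cast<ResWrap<T> *>(base);
        rm_->AddRef(id_, wrap_);            // case 2 (and the tail of case 1)
    }
    Res(const Res &o) : rm_(o.rm_), id_(o.id_), wrap_(o.wrap_) { rm_->AddRef(id_, wrap_); }
    Res &operator=(const Res &) = delete;   // omitted to keep the sketch short
    ~Res() { rm_->Release(id_, wrap_); }

    const T *operator->() const { return &wrap_->data; }  // const access only
    const T &operator*() const { return wrap_->data; }
private:
    ResourceManager *rm_;
    ResId id_;
    ResWrap<T> *wrap_;
};

// Hypothetical loadable class used to exercise the sketch.
struct Greeting {
    enum { TypeID = 7 };
    int value = 0;
    void Load(ResourceLoader &rl) { value = static_cast<int>(rl.id) * 2; }
};
```

Copies of a Res<Greeting> share one ResWrap<Greeting>; when the last copy goes out of scope the data stays
cached on the ‘not in use’ list, and constructing a new Res<Greeting> with the same id revives it without
calling Load again.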


The system described above has virtually no dynamic dispatch. The only virtual functions are the destructors and
those in ResourceLoader. Instead, the template mechanism of the compiler figures out which ‘Load’ function to call.
Note that the resource manager only manipulates pointers to ResWrapBase, from which the ResWrap<T> is
descended.


The resource manager also provides a preload mechanism which makes sure the data has been fetched from disk.
This function is not aware of the type of the data so no ‘Load’ function is called if the object is not already in
memory. This function should take advantage of nonblocking file reads if the platform supports it.


In the code, if you want to refer to a resource without forcing it to be loaded, store the numeric resource id rather
than a Res<T>.


My asset system can deal with a few version issues which crop up. It supports loading multiple resource files and
varying which files are loaded. The resource files that are loaded might depend on if you want the strings in
French, German, English, or Japanese. Some of the art depends upon screen resolution. This slightly relaxes the
uniqueness of resource ids. Still, only a single version of a resource is available given a resource id at runtime.
This is a handy feature, but it is mostly transparent. I only mention it here for completeness, and because it raises
the issue of managing these different versions at the resource creation stage.

Chapter 2. The Object Model

Script files started out as just a description of what was supposed to be compiled into a resource file. Really they
should have asset management features such as the ability to convert many different source file formats (.jpg, .gif,
.tga, .png, etc. for images), detecting which files have been updated, etc.


The idea of the scripting language is to introduce another concept, called an object. Objects are going to contain all
of the dynamic data of the script engine, but should be accessed in a way much like resources. So:

* Objects have an id which uniquely identifies the object. You can think of an id as a segment selector.

* The high bit of object ids is always 1, while the high bit of resource ids is always 0. Many primitive
  commands work with either; they check the high bit and do the right thing.

* Objects have a type. Due to the dynamic nature of objects, the type is not embedded in the object id like for
  resources. However, there is a primitive function for getting the type associated with an id which works for
  both object ids and resource ids.

* Objects have a data area consisting of a sequence of primitive types. All primitive types take four bytes. This
  data area may be resized.

* To access the data area of an object, you need the object id and the offset into the type.

* The only special primitive type is an object reference. Part of the type information indicates which offsets in
  the object have object references. Note that if the reference is known at compile time to have a resource id,
  then it is optional for the type to indicate the object reference. This is most important for the ‘type’ resource,
  below, whose object references only refer to resource ids.

* The other data types (int, float, bool, enum, ...) are not interpreted by the runtime engine at all.
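
A minimal C++ sketch of the object properties listed above: the high-bit test, a type per object, and a
resizable data area of four-byte cells. The ObjectHeap class, its method names, and the choice to use the low
31 bits of an object id as a table index are assumptions for illustration; the text does not say how ids map
to storage.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Id   = std::uint32_t;
using Cell = std::uint32_t;  // every primitive value occupies four bytes

constexpr Id kObjectBit = 0x80000000u;
constexpr bool IsObjectId(Id id)   { return (id & kObjectBit) != 0; }
constexpr bool IsResourceId(Id id) { return !IsObjectId(id); }

struct Object {
    Id type = 0;             // resource id of the object's type
    std::vector<Cell> data;  // resizable data area
};

// Hypothetical object table: the low 31 bits of an object id index into it.
class ObjectHeap {
public:
    Id Create(Id type, std::size_t cells) {
        objects_.push_back(Object{type, std::vector<Cell>(cells, 0)});
        return kObjectBit | static_cast<Id>(objects_.size() - 1);
    }
    Cell Get(Id id, std::size_t offset) const  { return Lookup(id).data.at(offset); }
    void Set(Id id, std::size_t offset, Cell v) { Lookup(id).data.at(offset) = v; }
    std::size_t Size(Id id) const               { return Lookup(id).data.size(); }
    void Resize(Id id, std::size_t cells)       { Lookup(id).data.resize(cells, 0); }
    Id TypeOf(Id id) const                      { return Lookup(id).type; }
private:
    const Object &Lookup(Id id) const {
        assert(IsObjectId(id));  // resources would be routed to the resource manager
        return objects_.at(id & ~kObjectBit);
    }
    Object &Lookup(Id id) {
        assert(IsObjectId(id));
        return objects_.at(id & ~kObjectBit);
    }
    std::vector<Object> objects_;
};
```

A full engine would dispatch on IsObjectId inside each primitive so that the same Get instruction also works
on resource ids; this sketch only covers the object half.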


A consequence of this is that even a thread’s stack and global variables are objects, since they need to contain
dynamic data. The stack object will use all of the object features above, including the ability to resize. The first
few data members of the stack contain the current function resid, the current instruction pointer, a base pointer, and
a bit array storing which stack items are object references. There will be one global data object for each compiled
module, and it will have an object id that is fixed at compile time. For enumeration purposes, global functions are
considered members of the global data object. Really though, functions are only associated with objects by
convention.


Now the clever bit is that both types and functions are represented with resources, and so can be represented by their
unique resource id. In order to make the language more uniform, a type resource is also associated with every type
of resource (this is recursive, but it terminates: there is a single resource id representing the type of type
resources, and it is its own type). The type of a resource is fixed, however, so certain optimizations are possible.
Note also that some resources are opaque in that no interface is provided for accessing their internal structure;
others (such as the type resource) overload such things as operator[] and can even overload the primitive
instructions of the virtual machine.


Function resources contain their type (a resource id) and a sequence of instructions. The type of a function contains
the return type, the number of parameters, the type of each parameter, and possibly a bunch of debug information.
For uniformity, all instructions take some multiple of four bytes.


Type resources conceptually store a mapping of enums to integers, though some of those integers represent resource
ids. These values are queried using the same instruction as to get the data at an offset of an object. The only
difference is that the valid queries are not contiguous. Some enum subranges are reserved:

* for ‘operator()’: describing the return type and parameters. This is the only range set in the type object for
  functions.

* for objects: initialization data

* the ‘parent type’

Otherwise the type typically returns the resid of a function given an enum. Some enums are predefined:

* a mechanism for storing which data offsets correspond to object references

* standard functions such as constructor, destructor, addref, decref, etc.

* a mechanism for seeing which types are compatible with this type

It is also common to store constants (which may be overridden in subtypes), and other types (example: the type of
the iterator associated with this container, or the type of the value contained in this container). Another possibility
is to store data offsets of data members, but this capability is not currently provided. One could imagine specific
cases where this would be useful; for example, many dialogs have an ‘OK’ control.


Once you’ve queried the resid of a particular member function, the function may be called directly or its type may
be queried. There is a primitive instruction for calling a member function given an enum and either object id or
type id. Note that if given an object id, it passes that object id as the first parameter to the function. Note that
subtyping is accomplished by having the subtype map a given enum to a function with the same prototype as in the
parent type.


Primitive Instructions mentioned so far:

* Get/Set value at offset X of object Y (can’t set the value of resources)

* Get/Set size of data area of object X (can’t resize resources)

* Get type of object X (I may allow object type to be modified, but certainly not for resources)

* Call function X (exactly equivalent to applying operator() to X)

* Call member X of Y

* Get stack object

For an object Y, the difference between

    T=Type of object Y
    Call member X of T

and

    Call member X of Y

is that the second passes object Y as the first parameter to the function.
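
The difference between the two call forms can be sketched in C++ as below. Representing a type as a map from
enum value to a std::function (rather than to a function resid holding byte code), and all names here, are
assumptions made to keep the example self-contained.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <vector>

using Id       = std::uint32_t;
using Args     = std::vector<std::uint32_t>;  // four-byte cells
using Function = std::function<std::uint32_t(const Args &)>;

// A type maps member enums to functions (to function resids in the real design).
struct Type {
    std::map<int, Function> members;
};

struct Object {
    Id id;
    const Type *type;
};

// "Call member X of T": the arguments are passed through unchanged.
std::uint32_t CallMemberOfType(const Type &t, int member, const Args &args) {
    auto it = t.members.find(member);
    assert(it != t.members.end());
    return it->second(args);
}

// "Call member X of Y": Y's object id is passed as the first parameter.
std::uint32_t CallMemberOfObject(const Object &y, int member, Args args) {
    args.insert(args.begin(), y.id);
    return CallMemberOfType(*y.type, member, args);
}

// Subtyping in this sketch: copy the parent's map, then remap an enum to a
// function with the same prototype.
Type MakeSubtype(const Type &parent, int member, Function replacement) {
    Type sub = parent;
    sub.members[member] = std::move(replacement);
    return sub;
}
```

A caller that holds only an object id never needs to know which concrete function it reaches; the enum plus
the object's type decide that at call time.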


Just to stretch your brain: due to the essential similarity between objects and resources, it is possible to make objects
which behave very similarly to resources. Remember the runtime engine / virtual machine can overload the
primitive instructions based on whether the high bit indicates it’s a resource or object. The idea here is to allow
string resources to be completely compatible (even down to the virtual machine instructions used) with dynamic
strings. This extends to allowing objects with operator()’s to be passed as functions, and objects with a specific list
of required functions (see above) can act as types. This is convenient at various points: parameters of type
‘function’ are sufficient in many cases where you would otherwise have to pass an object, the compiler may at times
need to create type objects to handle template features not supported by the underlying type system, you could make
a type object which delegates most of its functions to another type but provides different initialization in the default
constructor, etc. The key is that the [] and () operators are fundamental: they parallel commands in the byte code,
and making them work right makes almost everything else work right.


It is worth mentioning that this is the second script language. The first only had a concept of functions.
Unfortunately, it did not have much heap management to speak of, and no chance of incorporating garbage
collection. I was having a hard time building in support for dialog boxes since the language had no features to
support a nice syntax without special case code. A design goal for the new scripting language is to not only support
objects, but also a syntax which makes declaring a dialog box clean.


I should say exactly what type checking is done at run-time and what type checking is done at compile-time. The
facilities for both are in place: dynamic casting with run-time checks is supported in addition to a parameterized type
mechanism with compile-time error checks. The standard library will have a preference for compile-time checks,
but can be used with run-time checks as well. In practice there are a lot of abstract classes followed by a layer of
concrete classes. Any layers beyond that may require dynamic typing. However, for the safety of the virtual
machine, some crucial checks are performed at run-time even though the compiler would never produce code which
would cause a problem. If I become extremely ambitious, I’d consider proof carrying code, but almost certainly
not.


----- Left off here -----

Should still talk about

* type resources

* function resources

* function type tells number of parameters, return type, type of each parameter, whether each parameter is in, out,
  or inout

* resources should be type compatible with corresponding objects: if you apply the abstract machine instruction
  for accessing an array, you should get the same result applying it to a resource.

Chapter 3. Script Language Features

Inheritance


How the compiler assigns enums to member functions: optional functions, extern functions, also optional/extern data
members.


Restricted multiple inheritance


<rules>


Operator overloading


.get .set


Types (actually just those type references which may be resolved at compile-time) may be cast to a function which
takes the same parameters as the constructor, returning a new instance of that type. Would be nice if I can make the
beginning of the type record for a class look like the type record of this function.


clone (shallow copy) is a standard member function. Maybe also deepcopy?


Several standard member functions are automatically generated. Ex: clone, constructor glue, does_extend,
next_obj_ref, (next_weak_ref,) reflection functions.


Strings ‘compatible’ with string resources, even with eventual ‘final inline’ support (see below)


Bool arrays, like strings, are stored in a space efficient manner. Operator overloading (with .get and .set) handles
the translation to and from the compact form.



‘Importing member functions’: Setting member functions to any compatible member function (requires the compiler
to store which member functions are referred to by each member function, as well as data layout referenced). This
is needed to resolve member conflicts when using multiple inheritance.


Dynamically sized arrays


Automatic generation of wrappers around primitive types (when an object is expected)


Function objects: any object with an operator() may be passed as a function parameter.


The type type! IsClass primitive function. Future: Has a bounding parameterization (i.e. ‘types extending
container<int>’).


Derivation of a class at variable declaration time.

Example:

    // Standard Displayable with a title bar
    class Window(string title) extends Displayable { ... }

    // OKButton is like a button but:
    // * sets itself as default
    // * has text 'OK'
    // * defaults to reasonable position in lower right
    class OKButton() extends ButtonControl { ... }

    class StatusMessage(string message="") extends Window("Status Update")
    {
        override mBackgroundColor=rgba(1,1,1,1); // override default value
        // How do we set modal? mModal=true? something else?
        var mMessage=message;      // new member
        const helperclass=Window;  // define a type alias

        override function OnCreate(creator: object): void
        {
            StaticTextControl(mMessage) { y=10; x=CenteredX(); }
            OKButton() // OKButton's creator is 'StatusMessage'
                { void OnPress() { global.log.add(creator.mMessage); } }
        }

        var mStatusImage = ImageControl(0) { x=20; y=20 }
    }


Localization support. Still need to work out some issues here. Would be best if had a special tool to support this.
What do we need besides alternate versions of strings and other resources?


‘inout’ and ‘out’ parameters of functions use copy-in-copy-out protocol. Less indirection, and easy to make
compatible with ‘emulated variables’, i.e. functions with .get and .set modifiers. This protocol fails gracefully if,
for example, no function is defined.
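
A rough C++ sketch of copy-in-copy-out applied to an emulated variable. The IntProperty struct, the CallInOut
helper, and the reading of ‘fails gracefully’ as ‘skip the missing .get or .set’ are all assumptions of this
sketch; the text does not spell out the failure behavior.

```cpp
#include <cassert>
#include <functional>

// An 'emulated variable': a pair of .get/.set functions, either of which may be absent.
struct IntProperty {
    std::function<int()>     get;
    std::function<void(int)> set;
};

// A callee with an 'inout' parameter, written against a plain local.
void AddTen(int &inout) { inout += 10; }

// Calls f using the copy-in-copy-out protocol.
void CallInOut(const IntProperty &p, const std::function<void(int &)> &f) {
    int temp = p.get ? p.get() : 0;  // copy in (0 is an arbitrary default for this sketch)
    f(temp);                         // callee works on the temporary, no indirection
    if (p.set) p.set(temp);          // copy out; skipped when no setter is defined
}
```

The callee never sees the property machinery, so the same compiled function body serves ordinary variables and
emulated ones alike.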


Use smart pointers to maintain all references from the C++ code into scripts (primarily callbacks and tying of game
objects to script objects). These will be root nodes for the garbage collector (in addition to global objects and stack
objects inside the script engine).


Future: Reflection/introspection!


Future: ‘final inline’ functions (efficient, but less flexible). Normally just prevents overrides, but to use resource
strings and dynamic strings interchangeably, requires stricter type compatibility.


Future: templates for the automatic generation of types, though only those that share an implementation. Template
parameters may only represent object types. Should support member templates, bounded recursive specialization,
bridges.


Future: float support


Future: Unrealscript ‘state’ support via switching between types. ‘Mode-switching’ from ‘A Theory of Objects’.
Restriction: only allowed to switch between parent/child with the same data format. Call optional member
functions OnLeaveState(type to) and OnEnterState(type from).


Future: garbage collection. Note explicit delete operation supported. Only re-uses addresses if at garbage
collection time it decides it needs to compact (should only compact at garbage collection time since that is
when all references are known and may be renumbered). Makes it easy to detect when dangling references are used and
handle them safely. Addresses in age order makes it easy to keep track of generations and maintain a generational
write barrier without extra per-object storage. Unfortunately still need extra storage to get an incremental garbage
collector.


Thought: can I implement garbage collection in the language itself? Need hooks for write barrier, but there should
be plenty of reflection/introspection information.


Future: weak references. May be able to get away with only having them in the “containedby<T>” base class.
Should be easy to implement in the virtual machine, but need a little extra support for compaction. Unfortunately
doesn’t solve the “dialogs with an array of dialog controls each which wants a back pointer to the dialog not the
array” problem.


Future: ‘thread’ return type for functions. For functions which run in their own thread. Maybe they (immediately)
return a thread id which can later be used by thread functions (suspend/kill/query/getstackof thread)


Sather’s rules for function parameters? Yes. In several places we need the concept of ‘this function/class fulfills
the requirements promised by a given declaration.’ For example: forward declarations. Would be nice if forward
declarations could be extern, but the function/class itself (which satisfies the protocol but may vary on details) is
not.


Question:

Principle: When given a programmer declaration and a declaration which could be deduced (inheritance, whatever),
use the ‘tighter’ or ‘stricter’. When does this arise?


Rules:

Can’t use a type variable as a parameter or return type; have to use a template parameter in that case, or do dynamic
typing.


Can’t call constructor of type variable. Can only call constructor on types that may be determined statically (i.e. at
compile-time). Instead: can cast a type to a function with the same parameters as the type’s constructor (see next
item).


Those types which may be determined statically (i.e. at compile time) may be cast to the function type matching the
signature of the constructor.


Different kinds of types:

* static/compile-time/const type

* template type

* type variable


Note that a type variable corresponding to the actual type of a template parameter is (secretly) passed during
construction.


Future: Expose the object model concept of an offset within the type. Should approximate C++’s member function
pointers.


Chapter 4. Parameterized Types and Functions

Also known as ‘templates’ or ‘generics’. Creates a family of classes parameterized by one or more ‘type
parameters.’


Type parameters occur inside angle brackets (‘<’ ... ‘>’).


Classes, global functions, and member functions may be parameterized.


Supports bounded parameterization: type parameter must extend a particular class. Useful since can only access
functions or members declared in the bound. Still useful without type bound, for example: containers that can
contain any object, and don’t require assumptions about what type of object they hold.


Example:

    class Comparable
    {
        require function IsSmaller(Comparable):bool;
    }

    class KeepBiggest<T extends Comparable>
    {
        var m : T;

        function Add(p:T):T
        {
            var ret:T;

            if (m.IsSmaller(p))
            {
                ret=m;
                m=p;
            }
            else
                ret=p;
            return ret;
        }
    }

    class IntLike(member i:int) extends Comparable, GarbageCollected
    {
        override function IsSmaller(p:Comparable):bool
        { // use a dynamic cast in this example to keep things simple
            var p2 = Cast<IntLike>(p);
            if (p2)
                return i<p2.i;
            else
                return false;
        }

        /* Better way:
        bridge function IsSmaller(IntLike p2):bool
        {   if (p2)
                return i<p2.i;
            else
                return false;
        }
        */
        // an even better way uses recursive bounded parameterization

        function GetValue():int
        {
            return i;
        }
    }

    function test():void
    {
        var i2=IntLike(2);
        var i3=IntLike(3);
        var i4=IntLike(4);
        var k=KeepBiggest<IntLike>();
        k.Add(i3); // returns null object, sets k.m to i3
        k.Add(i2); // returns i2, k.m still i3
        k.Add(i4); // returns i3, k.m now i4
    }


Complicated, terse example:

    class A<T1 extends B, T2 extends C<T1>> extends D, E<T1>
    { //...
        function memberfunc(T2):T1;
        function templatedmember<T3, T4>(T4):T3;
    }

    function globalfunc<T5 extends F, T6 extends G>(V<T5>, F):T6;

    var a=A<H,I>();
    var i=I();
    var h:H=a.memberfunc(i);
    var j=J();
    var k:K=a.templatedmember<K>(j);

    class L extends F;
    var v:V<L>;
    var l=L();
    var g:G=globalfunc<G>(v, l);


Concept of erased types. The type safety afforded by templates comes strictly from static analysis done at
compile-time. Most template information is lost at runtime. On the other hand, runtime type querying of objects is
more powerful, but slower.


Single bytecode of function works for all possible type parameters.


Can’t use primitive types as type parameters, only descendants of ‘object’.
For many reasons: garbage collection, compaction, reference counting, function dispatch, etc.


Careful! ‘type’ type and ‘function’ type not concrete/primitive! Usually stored as a resource (so safe to
incref/decref but not necessary) but sometimes stored as objects.


Use wrappers for primitive types, e.g. Int, Bool, Float, and Enum<T>.
(Maybe should support automatic usage of the wrapper when a primitive type is specified as a type parameter)


Actual type passed as hidden parameter to constructor.


Recursive parameterization allowed (and encouraged!)


Example:

    class LessThanComparable<T extends LessThanComparable<T>>
    { require function operator<(T):bool; }

    class String extends LessThanComparable<String>;

If need dynamic dispatch, can then use descendants of String.
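
For readers who know C++, the closest analogue of this recursive bound is the curiously recurring template
pattern (CRTP). The sketch below is only an analogy, not how the script compiler works: C++ does not check the
bound the way ‘T extends LessThanComparable<T>’ does, and the Name class and Smaller function are
hypothetical.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Base parameterized by its own descendant. It assumes T supplies operator<
// and derives the remaining comparisons from it, with no virtual calls.
template <class T>
struct LessThanComparable {
    friend bool operator>(const T &a, const T &b)  { return b < a; }
    friend bool operator<=(const T &a, const T &b) { return !(b < a); }
    friend bool operator>=(const T &a, const T &b) { return !(a < b); }
};

// Plays the role of 'class String extends LessThanComparable<String>'.
struct Name : LessThanComparable<Name> {
    std::string text;
    explicit Name(std::string s) : text(std::move(s)) {}
    friend bool operator<(const Name &a, const Name &b) { return a.text < b.text; }
};

// Generic code sees the exact type T, so comparisons resolve statically.
template <class T>
const T &Smaller(const T &a, const T &b) { return b < a ? b : a; }
```

As in the script version, the comparison takes a T rather than a generic base, so no dynamic cast is needed
inside operator<.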


Given: “class A<T extends B>”

Restrictions: A can’t call T’s constructor -- don’t know number and type of parameters. Can’t use “T.member”
except during construction or if member is ‘final’ in B.