Schweizerische Akademie der Technischen Wissenschaften
Seidengasse 16, CH-8001 Zürich
Tel. +41 44 226 50 11, Fax +41 44 226 50 19
info@satw.ch, www.satw.ch
Member of the Akademien der Wissenschaften Schweiz

Synthetic Biology – Engineering in Biotechnology




A report written by Sven Panke, ETH Zurich

On behalf of the Committee on applied Bioscience, Swiss Academy of Engineering Sciences


„What I cannot build, I cannot understand“
Attributed to Richard Feynman
Executive Summary

Synthetic Biology summarizes efforts directed at the synthesis of complex biological systems to
obtain useful novel phenotypes based on the exploitation of well-characterized, orthogonal, and
reusable building blocks. It aims at recruiting design structures well established in classic
engineering disciplines, such as a hierarchy of abstractions, system boundaries, standardized
interfaces and protocols, and separation of manufacturing and design, for biotechnology. Synthetic
Biology generally entertains the notion that successful design of a biological system from scratch – as
opposed to the current practice of adapting a potentially poorly understood system over and over
again towards a specific goal – is the ultimate proof of fundamental understanding and concomitantly
the most powerful way to advance biotechnology to the level required to deal with today’s challenging
problems.
First achievements include the design and implementation of synthetic genetic circuits, the
design of novel biochemical pathways, and the de novo synthesis of a bacterial genome. However, it is
clearly the ultimate aspiration of the field to extend the mastery of biological engineering to systems
complex enough to deal with problems such as the design of novel therapeutic treatments, the
production of liquid transportation fuels, or the efficient manufacturing of biopharmaceuticals and
fine, bulk, or fuel chemicals.
Synthetic Biology is a field in its infancy, but as it seeks to adopt structures that have
proved extremely successful in industrial history, the potential benefits of the field are enormous.
The success of Synthetic Biology will depend on a number of challenging scientific and technical
questions, but two points stand out: [i] Synthetic Biology needs to prove that the concept of
orthogonal parts and subsystems is viable beyond the few domains for which it has been established
at the moment (in particular the field of genetic circuits and the “Registry of Standard Biological
Parts” at MIT); and [ii] Synthetic Biology is in need of a technological infrastructure that a)
provides access to cheap but accurate non template-driven (de novo) DNA synthesis that can cope
with the requirements of system-level engineering and b) allows installing and maintaining an
“Institute of Biological Standards” that provides the base for implementing the Synthetic Biology
principles across the biotechnological community.


ETH Zürich, April 2008
Contents

Executive summary
1. Synthetic Biology in context
2. The challenges of Synthetic Biology
2.1. Orthogonality in Synthetic Biology
2.1.1. Integration of unnatural amino acids into proteins
2.1.2. Orthogonal RNA-RNA interactions
2.1.3. Orthogonality in single molecules
2.1.4. Orthogonality by alternative chemistries
2.2. Evolution and Synthetic Biology
2.3. The technology to implement synthetic systems
2.3.1. DNA synthesis and assembly
2.3.1.1. Towards large-scale, non template-driven DNA synthesis
2.3.1.2. Oligonucleotide synthesis
2.3.1.3. Oligonucleotide assembly
2.3.1.4. Assembly of DNA fragments
2.3.1.5. Assembling DNA at genomic scale
2.3.1.6. Transferring artificial genomes into novel cellular environments
2.3.1.7. Practical considerations
2.3.2. Miniaturizing and automating laboratory protocols and analysis
2.3.2.1. Microfluidic components
2.3.2.2. Fabrication methods
2.3.2.3. Applications of microfluidics
2.4. The tool collection and modularity in Synthetic Biology
2.4.1. Modularity in DNA-based parts and devices – the Registry of Standard Biological Parts
2.4.2. Devices: Genetic circuits
2.4.3. Fine tuning of parameters: the assembly of pathways
2.4.4. Modularity in other classes of molecules
2.4.5. The chassis for system assembly: Minimal genomes
2.5. Organizational challenges in Synthetic Biology
2.6. Synthetic Biology in society
2.6.1. Synthetic Biology and biosafety
2.6.2. Synthetic Biology and biosecurity
2.6.3. Synthetic Biology and ethics
3. Summary
4. Acknowledgements
5. Useful websites
6. Abbreviations
7. References
1. Synthetic Biology in context


Since the advent of genetic engineering with the production of recombinant somatostatin and
insulin by Genentech in the late 1970s, biotechnology has tried to exploit cells and cellular
components rationally to achieve ever more complex tasks, primarily in the areas of health and
chemistry, and more recently also in the energy sector. The manufacturing of biopharmaceuticals has
advanced from simple peptide hormones (the disulfide-bridges of which had to be introduced
afterwards) to the manufacturing of monoclonal antibodies complete with specific glycosylation
patterns. The manufacturing of small pharmaceutical molecules has advanced from an increase in
overproduction of natural products (such as the penicillins) to the assembly of entire novel pathways
(e.g. for artemisinic acid production [1]), and so has the manufacturing of chemicals (e.g. from
overproducing intermediates of the citric acid cycle such as citrate to compounds alien to most
microbial metabolisms such as 1,3-propanediol and indigo [2, 3]).
The same can be said for other sectors that apply the tools of molecular biology, such as gene
therapy [4], tissue engineering [5, 6], diagnostics [7], and many others.
However, despite the numerous successes of the biotechnology industry, major challenges
remain. The chemical industry might need to replace a substantial part of their raw material base for
a substantial part of its products – can biotechnology come up reliably with robust solutions for novel
production routes in a suitable timeframe and with reasonable development costs? The same goes for
our energy industry – can biotechnology play a significant part in the de-carbonization of our energy
systems, preferably without provoking major opposition on environmental or ethical issues? The list
of questions could easily be extended, for example into the field of gene therapy or any other field in
which biotechnology already plays, or could in the future play, a major role. In fact, measured against
these challenges, the successes of biotechnology appear as little more than the first steps of an industry
that has managed to mount a couple of convincing cases but needs to do much more to assume the
potentially overwhelming role it might theoretically be well placed to play.
The reason for this lies of course in the complexity of biological systems, which makes it
difficult to reliably engineer them and essentially converts every industrial development project into a
research project that needs to cope with unexpected fundamental hurdles or completely new insights
into the biological system. Still, the challenges we face will need to be addressed to a large extent in
these complex biological systems – cells. Cells represent a multitude of different organic molecules
with a multitude of interactions, most of them highly non-linear, and thus prone to emergent
properties. Of course, advances in Systems Biology allow us to delineate an ever larger number of
these interactions in considerable detail (see below), but it is sobering to realize that even the
dynamics of arguably the best understood single pathway in metabolism, glycolysis, in one of the best
studied model organisms, Saccharomyces cerevisiae, can still not be satisfactorily explained in terms
of its constituting members [8]. Overexpressing one recombinant gene in Escherichia coli triggers
hundreds of other genes to change their transcriptional status, and only for some of them is it clear
what might motivate the cell to do so [9]. Clearly, our understanding of biological systems is still far
from complete.

This prompts two questions: [i] how is it possible that against such a complex background it is
possible at all to successfully introduce complex artificial functions (such as assembling a novel
pathway for the production of a complex small molecule) and [ii] are we using the right concepts to
deal with complexity in biotechnology?

For the first question, it seems safe to argue that the successes in recombinant
biotechnology are primarily the result of a non-straightforward process in which a basic idea was put
to the test and then modified and expanded until every (unexpected) difficulty along the way
was taken care of to such an extent that the resulting process was still economical (I restrict the
argument to biotechnological manufacturing processes here, but it is easy to make the equivalent
argument for other areas as well). Or, to put it differently, one can argue that
today’s successfully operating fermentation processes employing cells that had to undergo
substantial genetic engineering are not the result of a targeted planning effort of a group of people
that had a clear idea of the problems they were facing and in addition a clear idea of the tools that
they had at their disposal to work around such problems in a reasonable amount of time. Or, to
paraphrase the key point once more, one could argue that the implementation of essentially any novel
biotechnological process today primarily remains a research project, as pointed out above. The
existing examples illustrate that it is possible to be successful with such research projects, but the
effectively small number of examples (relative to the number of, e.g. novel chemical processes that
have been introduced over the same time) suggests also that such research endeavors remain difficult
and costly.
For the second question, it might be helpful to compare the current “design processes” in
biotechnology with design processes in more mature fields, such as classical engineering disciplines
(e.g. mechanical engineering or electrical engineering) in which the success rate of design efforts is by
orders of magnitude better than in biotechnology, and then to try to understand whether there is a
chance that we could recruit some of the key practices to biotechnology in the future.
This comparison has been at the heart of the emerging field of Synthetic Biology, and
consequently it has been treated in a number of very instructive papers over the last years [10-18]. In
summary, five points have been identified that are crucial in engineering but by and large absent
from biotechnology:

1) Comprehensiveness of available relevant knowledge
2) Orthogonality
3) Hierarchy of abstraction
4) Standardization
5) Separation of design and manufacturing

Knowledge of relevant phenomena in classical engineering disciplines is rather comprehensive.
Mechanics is a mature science from which mechanical engineers can select the proper elements that
are relevant to their specific tasks, and so is electrophysics (for electrical engineering), while chemical
engineering can at least draw on an – at least theoretically – comprehensive formalism to describe the
effects of mass, heat, and momentum transfer on chemical reactions. In all of these examples, rather
elaborate mathematical formalisms are in place that allow one to describe the behavior of such
systems over time, as are techniques that provide reasonable estimates for the unknown parameters (if
any) required for these descriptions.
With respect to the five points discussed above, biotechnology works under rather different circumstances:
First of all, our knowledge of the molecular events in a cell is far from complete. Even though the
cataloguing of biological molecules (genomics, transcriptomics, proteomics, metabolomics) has made
tremendous progress since the first complete genome sequences started to become available [19], the
interactions between these molecules and their dynamics remain to the largest extent unknown, and
in many cases we do not even know about basic functions (e.g. 24% of the genes in E. coli remain
without proper functional characterization [20]).
Next, the issue of orthogonality (“independence”): This concept has been adopted from
computer science, where it refers to the important system design property that modifying the
technical effect produced by a component of a system neither creates nor propagates side effects to
other components of the system. In other words, orthogonality describes the fact that adjusting a
car’s rear-view mirror does not affect its steering, and that accelerating does not affect listening
to the car’s radio. It is the prerequisite for dividing a system into subsystems that can
be developed independently. As such, it is also the prerequisite for modularity: a system property that
allows various (orthogonal) subsystems to be combined at will, i.e. ensures that they have the proper
connections to fit the other modules. Various elements contribute to orthogonality in non-
biological designs: for example, electrical circuits and thermal elements can be insulated, chemical
reactions can be confined to separate compartments, and mechanical elements can be placed
spatially separated so that they do not interact.
In contrast, cells are complex, and orthogonality is mostly absent and difficult to implement.
In bacterial cells, for example, only one intracellular reaction space is available, the cytoplasm, and it
hosts hundreds of different simultaneous chemical reactions while at the same time working on the
duplication of its information store. Introducing and expressing a single recombinant gene in E. coli
changes the expression patterns of hundreds of genes [9], indicating the degree of connection
between various cellular functions. On the other hand, nature provides ample examples of
orthogonality, for example in eukaryotic cells that have organelles available to carry out specialized
tasks or in higher organisms that have organs dedicated to specific functions. However, it is probably
safe to say that consciously engineering orthogonality into biological systems to help the design
process has so far played no role in biotechnology.
The third point, the hierarchy of abstraction, reflects the assembly of complex non-biological
systems from orthogonal sub-systems. If it is possible to separate the overall system into meaningful
subsystems, which in turn might be once more separated into meaningful subsystems, and so on,
then the design task can be distributed over several levels of detail at the same time (provided that
the transitions from one level to the next/one sub-system to the next are defined in advance). This
has two major implications – the development time will be much reduced by parallel advances and
every level of system development can be addressed by individuals who are specialists for this specific
level of detail, so the overall quality of the design improves. To adopt an example from the design of
integrated electronic circuits: diodes, transistors, resistors, and capacitors are required to fabricate
electronic logic gates (AND, OR, NOT, NAND, NOR), which in turn are the elements to make
processors (e.g. for a printer). The term “transistor” describes a function (controlling a large electrical
current/voltage with a small electrical current/voltage), but it also hides detailed information which
is in most cases not relevant for the person who uses the transistor to build an AND gate. The AND
gate in turn describes a function (an output signal is obtained if two input signals are received
concomitantly) and can be implemented in various forms composed of various circuit elements,
which influence its performance, but the details of how this was achieved are not relevant for the
designer of the processor. In other words, at every level of the design process, detailed information is
hidden under abstract descriptions. The higher the level, the more information is hidden.
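
To make the layering concrete, the same idea can be sketched in a few lines of Python: each level is defined purely in terms of the interface of the level below it, and a user of the top level never needs to know how the bottom level works. The decomposition is the textbook one (NAND as the primitive); none of it is specific to this report.

```python
# A minimal sketch of an abstraction hierarchy: each level is defined purely
# in terms of the level below it, and a user of the top level never needs to
# know how the bottom level works.

def nand(a: bool, b: bool) -> bool:
    # Lowest level: imagine this implemented with transistors.
    return not (a and b)

def not_gate(a: bool) -> bool:
    # Next level up: defined only in terms of NAND, hiding the transistor level.
    return nand(a, a)

def and_gate(a: bool, b: bool) -> bool:
    # AND = NOT(NAND): again, only the interface of the level below is used.
    return not_gate(nand(a, b))

def or_gate(a: bool, b: bool) -> bool:
    # De Morgan: a OR b = NAND(NOT a, NOT b).
    return nand(not_gate(a), not_gate(b))

# A "processor designer" combines gates without ever touching transistors:
assert and_gate(True, True) and not and_gate(True, False)
assert or_gate(False, True) and not or_gate(False, False)
```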
Again in contrast, the description of biological systems is usually strongly focused on the
molecular level, and formalized, abstract or functionalized descriptions in the sense discussed above
are rare. A frequently used analogy here is that the equivalent to programming cells today (writing
down a specific recombinant DNA sequence) is programming a computer in binary code instead of
using one or the other very powerful specialized user-friendly software. The advent of Systems Biology
is about to establish the idea of functional modules in the cell [21], but clearly we are nowhere near
the sophistication with which we are familiar in classical engineering disciplines when it comes to
structuring a biological engineering approach. Even worse, in essentially all instances our adoption of
functional terms (e.g. “promoter”) reflects a qualitative understanding of a higher-level function, but it
hides the fact that detailed quantitative information on the specifics of this function is not or only
partly available, should we be interested in the details of the next level of the hierarchy (which
proteins bind to regulate activity – how strong is the binding – what is the importance of the
nucleotides between the -35 and the -10 region – what is the promoter clearance rate under which
circumstances?).
As already mentioned, the hierarchy of abstraction requires that the transitions between the
different levels are properly specified at the beginning, so that the various levels can integrate
seamlessly in the end. To continue the example from above, to be able to assemble a processor from
the various logic gates, it will be helpful if the gates are cascadable – the electronic currents leaving
one gate have to be in the proper range to be recognized by the next gate, and the two gates need to
be physically connected to each other, for which it is helpful if the number and size of the output pins
of the first gate are compatible with the number and size of the receiving sockets of the next gate. In
addition, if these transitions are made according to more general specifications, it should be possible
to integrate (potentially superior) parts from other manufacturers. In other words, the transitions
should be standardized. In fact, standardization needs to encompass many more elements of the
design process: For example, measurement protocols need to be standardized so that practices
become comparable and can be made available to everybody involved in the design process. The same
is true for the documentation of system components and the storage of this information in databases.
Only sufficient compliance with these standards ensures that a designed element of the system has a
high chance of re-utilization.
Biotechnology is characterized by the absence of standards. As orthogonality in cells is at best
an emerging concept, no clear ideas on how to standardize information exchange between such
elements exist. In terms of standardizing experimental protocols, it is only now that with the advent
of Systems Biology certain standards on data quality and measurement techniques start to produce
an impact [22] in the face of the need to handle ever larger sets of experimental data delivered by the
experimental “-omics” technologies. But the impact of the lack of standards in experimental
techniques goes much further: a simple discussion about the relative strength of two different
promoters between two scientists from two different labs will quickly reveal that the discussion is
pointless, because in the corresponding experiments they have used different E. coli strains on
different media after having placed the promoter on replicons with different copy numbers per cell to
drive expression of two different reporter genes, of which one encoded a protein tagged for accelerated
turnover. In summary, it is very difficult to answer basic questions that can be expected to be
important for designing biological systems, and even if comparable experiments exist, it almost
always requires an extensive search through the available scientific literature to locate the
information.
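
As an illustration of what such standardization would have to capture, the following sketch shows the minimal experimental context a promoter-strength record would need before two labs' numbers become comparable. All field names and the unit are hypothetical illustrations, not an existing standard.

```python
from dataclasses import dataclass

# A sketch of the context a standardized promoter-strength record would need.
# All field names are hypothetical illustrations, not an existing standard.

@dataclass(frozen=True)
class PromoterMeasurement:
    promoter_id: str          # e.g. an identifier in a shared registry
    host_strain: str          # E. coli strain used for the measurement
    growth_medium: str        # medium composition or a reference to it
    replicon: str             # vector backbone, fixing copy number per cell
    reporter_gene: str        # reporter used to read out expression
    reporter_tagged: bool     # e.g. a degradation tag altering protein half-life
    activity: float           # measured strength, in agreed units
    unit: str                 # e.g. "RPU" (relative promoter units)

def comparable(a: PromoterMeasurement, b: PromoterMeasurement) -> bool:
    # Two measurements can be compared directly only if the entire
    # experimental context matches -- exactly the condition the two
    # hypothetical scientists in the text fail to meet.
    context = ("host_strain", "growth_medium", "replicon",
               "reporter_gene", "reporter_tagged", "unit")
    return all(getattr(a, f) == getattr(b, f) for f in context)
```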
The fifth key difference between current engineering practice and “biological engineering” that
was identified is the separation of design and manufacturing. Put differently, the design of a car is
done by a different group of people than the car’s assembly at the assembly line, and the assembly
actually requires comparatively little effort. The different groups of people also have different
qualifications and have received different training – they are specialists, and can afford to be because
the design was so detailed that the manufacturing can be confidently expected to pose no problems.
On the other hand, the major part of the time an average PhD student in molecular
biotechnology spends at the bench is still spent on manipulating DNA – isolating a gene from the
genomic DNA of a specific strain, pruning it of specific internal restriction sites, adapting the
ends for cloning into a specific expression vector, adapting codon usage, playing with RNA and protein
half-life, etc. In other words, the manufacturing of the system is still a major part of the research
project, and in fact in many cases it is still a research project on its own.

Against this background, Synthetic Biology, in the definition adopted for this report, aims at
providing the conceptual and technological fundamentals to implement orthogonality, a hierarchy of
abstraction, standardization, and separation of design and manufacturing, into biotechnology. In other
words, it aims at providing the prerequisites of operating biotechnology as a “true” engineering
discipline. This implies aiming [i] at making biological systems amenable to large-scale rational
manipulations and [ii] at providing the technological infrastructure to implement such large-scale
changes. In this effort, Synthetic Biology depends on a substantial increase of the knowledge base in
biotechnology, which is expected from Systems Biology.


2. The challenges of Synthetic Biology


From the points discussed above, six main challenges arise – two scientific, two technological, one
organizational, and one societal.
One key scientific challenge of Synthetic Biology will be to implement the concept of
orthogonality in complex biological systems. Rational, forward-oriented engineering on a considerable
scale, for example in cells, will only be possible if we can limit the impact of our modifications on the
cellular background into which we want to implement our instructions. Failing that, success in large-
scale cellular manipulations will necessarily remain a matter of trial and error, because it appears
presumptuous to assume that we will understand cellular complexity to such an extent that we can
comprehensively integrate it into the design process.
In addition, orthogonality will be a fundamental prerequisite for the application of the hierarchy
of abstraction. The latter depends on our ability to define meaningful sub-systems that can be
worked on and later combined. This only makes sense if the number of interactions between the
different subsystems is limited, ideally to the few interactions that are desired and eventually
standardized.
The second key scientific challenge will be how to deal with evolution. Though this question has
so far hardly played a role in the Synthetic Biology discussion, it is intrinsically connected to the
long-term vision of Synthetic Biology: because evolution is a permanent source of change in living
systems, it will always represent an obstacle when it comes to preserving the integrity of artificial
designs in the long run. In the short term, the problem might not be particularly dramatic – after all,
we have learned how to deal with unstable but highly efficient production strains over the
approximately 70 generations it takes to fill the large scale reactor starting from a single cell from the
cell bank. However, if the ideas of standardization, abstraction, and orthogonality can be effectively
implemented, it is easy to foresee that strains will undergo various generations of modifications, and it
will be important to be sure that the strain that receives the final modification is still the one that
received the first generation of modifications.
One technological challenge of Synthetic Biology is clearly to adapt our current protocols for
strain manipulation to the drastic change in synthetic scope that will be required. The ambition to
use ever more complex catalysts for biotechnology will inevitably lead to the requirement to realize
ever more voluminous synthetic tasks: on the one hand, our ambition to adapt cellular systems to
concepts such as orthogonality will require large-scale adaptations of existing cells, which is
currently only possible by working through complex lengthy laboratory protocols. On the other hand,
the DNA-encoding of ever larger sets of instructions will require the synthesis of ever longer DNA
fragments. At the same time, we will need to adapt our data acquisition technologies to the
requirements of an engineering discipline with respect to accuracy, reproducibility, and amount of
labor to invest per measurement. It should be clear that it will be impossible to simply linearly scale
up our current efforts in the construction of strains for biotechnology if the vision is that at least
dozens of gene adaptations are required for any given new strain generation. It should be as obvious
that by transforming biotechnology from a research-driven discovery science into an engineering
activity, the reliable determination of fundamental data under pre-defined circumstances will become
much more important and has to become much more routine, and we will require the corresponding
technology to accomplish this.
A second technological challenge is to make available the tools of the trade – the “parts” and
“devices” from which complex systems can be assembled rationally and the strains and cell lines into
which these assemblies can be implemented. This is of course intimately connected to solving the
problem of orthogonality. An additional aspect here is to provide suitable modes of assembly – it
would be helpful if we could develop parts in such a way that they can be easily combined – in fact,
behave modularly.
Organizationally, much of the impact of Synthetic Biology will depend on whether it is possible
to gather a critical mass of scientists to adopt engineering-inspired standards and standard operating
procedures, for example for the measurement of variables of central importance in Synthetic Biology
(e.g., the promoter clearance rate of recombinant promoters) and their suitable subsequent documentation,
for example in a centralized data repository. Beyond the experimental arena, this request for a critical
mass needs to be extended to the adoption of certain strategies to deal with questions of ownership –
it will be difficult to motivate lively use of central repositories, for example, if the user has to go
through several layers of patent research before using them.
Finally, Synthetic Biology will inevitably touch on questions regarding its ethical boundaries
and aspects of biosafety (the safety of scientists and the public from the application of Synthetic
Biology in the laboratory) and biosecurity (the security of people against military or terrorist
exploitation of the technology). The notion of making life “engineer-able at the discretion of the
engineer” will undoubtedly trigger the need for communicating at least the broader aims of this
scientific community. Efforts to define a “minimal-life chassis” (see below) will most probably trigger
ethical concerns about who has the authority to make such definitions. Certain aspects of the debate
on the safety of genetic engineering from the 1980s and 90s will be re-visited once large-scale
manipulation of living systems becomes a reality and most probably there will be a need to develop
strategies on how to use the novel technologies of Synthetic Biology to increase biosafety. And finally
the potential for military abuse of specific technological developments needs to be assessed.

In the following, I will discuss work that has addressed the challenges for Synthetic
Biology mentioned above. It is important to point out that much of the work described here
was not carried out under the label of Synthetic Biology. Rather – next to some of the “hallmark-
experiments” of Synthetic Biology - I have collected work that indicates that the ambitious goals of
Synthetic Biology might indeed be reached at some point in the future.

2.1. Orthogonality in Synthetic Biology

Orthogonality is a concept that in a way is intrinsically alien to many current notions of biology.
In particular the various “-omics” disciplines convey the image that biology is above all about complex
interactions that might even be too complex to ever fathom completely, let alone be mastered to a
biotechnologically relevant extent. While the complexity of, for example, living cells is certainly a
daunting obstacle in the way of implementing Synthetic Biology, a number of experiments have
shown that orthogonality can be “engineered” into a cell provided a suitable experimental strategy is
chosen. Specifically, directed evolution coupled to a proper selection strategy can be used to
identify system variants that comply, at least to a considerable extent, with the requirements of
orthogonality, as can be seen from the examples discussed below.

2.1.1. Integration of unnatural amino acids into proteins (Schultz-lab, Scripps, San Diego)

Protein synthesis at the ribosome is usually limited to a set of twenty amino acids, the
proteinogenic amino acids. Even though the cell has developed a variety of strategies to extend the
diversity of protein composition post-translationally (e.g. by glycosylation, phosphorylation, or
chemical modification of selected amino acids), there is considerable interest in extending the canon
of proteinogenic amino acids beyond twenty, for example to overcome limitations in structural
analysis of proteins or provide precise spots for posttranslational modifications to facilitate any
chemical manipulation of biologically produced proteins [23].
The specificity in protein synthesis is maintained at various points throughout the process:
First, an amino acid is attached by a synthetase to a corresponding empty tRNA. Each amino acid is
recognized by a specific synthetase (interface 1) and the resulting aa-enzyme (aa for aminoacyl)
complex discharges its amino acid onto a specific (set of) tRNAs (interface 2). Finally, the anticodon of
the charged tRNA interacts with the mRNA at the ribosome (interface 3). Consequently, to introduce a
novel, unnatural amino acid into cellular protein synthesis, there are at least the interactions at three
different interfaces to consider. Each of these interfaces requires consideration of two times two
specific questions:
Interface 1:
a1) The unnatural amino acid must be recognized by one empty synthetase …
a2) … but it must not fit into any other synthetase.
b1) The synthetase must recognize the new amino acid, but….
b2) … it must not recognize any other amino acid.
Interface 2:
c1) The empty tRNA must be recognized and charged by the charged aa-synthetase complex …
c2) … but it must not be recognized nor charged by any other charged aa-synthetase complex.
d1) The charged aa-synthetase complex must recognize and charge the empty tRNA….
d2) … but it must not recognize nor charge any other empty tRNA.
Interface 3:
e1) The codon on the mRNA must recognize the anticodon on the charged aa-tRNA…
e2) … but it must not recognize any other aa-tRNA.
f1) The charged aa-tRNA must recognize the codon on the mRNA…
f2) … but it must not recognize any other codon.
Clearly, not all 12 requirements really need to be fulfilled for a system to work efficiently – even in
the wild-type system a tRNA can recognize several codons in some cases, all of which code for the
same amino acid (“wobble”), and some requirements are equivalent (a1/b1, c1/d1, e1/f1). But for the
introduction of an unnatural amino acid into the process of protein synthesis, ideally there should be
a completely orthogonal system where each of the three interfaces is specific enough to ensure
meeting all of the 9 unique requirements. As it becomes clear from the structure of each pair of
requirements, each requirement in a pair can be enforced by a combination of positive (“must
recognize”) and negative selection (“must not recognize”). Consequently, by undergoing at most 6
experimental pairs of positive and negative selection, it should be possible to select a system that can
integrate an unnatural amino acid without interfering with the remaining protein synthesis
machinery.
This strategy has been realized by the group of P. Schultz in San Diego [23]. They used a
tyrosine-transferring tRNA from Methanococcus jannaschii and changed the anticodon such that it
would recognize the amber STOP codon UAG on an E. coli mRNA (interface 3). Next, they established
two selection strategies:
To engineer interface 2, they put a library of the engineered tRNA into an E. coli strain
harboring a gene encoding a toxic gene product (barnase). The gene contained two amber STOP
codons. Expressing a library of variants of the engineered tRNA in this strain eliminated all those
tRNA variants that could be charged by any of the E. coli synthetases (the charged tRNA variants
would suppress the amber STOP-codons in the barnase gene, leading to cell death – negative
selection). In a second step, those library members in strains that survived step 1 (= could not be
charged by the E. coli machinery), were recovered and transferred into an E. coli strain that contained
the recombinant Tyr-tRNA-synthetase from M. jannaschii together with a β-lactamase resistance
gene inactivated by another amber STOP codon. Expressing the variants of the tRNA library in the
presence of ampicillin allowed only those strains to survive that harbored variants that could be
charged by the recombinant synthetase, because only those were able to suppress the STOP codon in
the resistance gene (positive selection).
In a next step, the amino acid-specificity of the recombinant synthetase was engineered
(interface 1), again by a combination of negative and positive selection. A library of synthetase-
variants was transferred into an E. coli strain that contained an antibiotic resistance gene with an
amber mutation. The library was then grown in the presence of the antibiotic to select for those
variants that could suppress the mutation, presumably because a synthetase variant was able to
charge the cognate tRNA, either with the unnatural amino acid from the medium or with one of the
natural amino acids. To eliminate variants that suppressed with natural amino acids, the survivors of
the positive selection were transferred into a strain with a mutated barnase gene (see above).
Expressing this library in the absence of the unnatural amino acid eliminated all synthetase variants
that charged natural amino acids onto the cognate tRNA.
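
The logic of these alternating rounds can be sketched abstractly: a negative round removes every variant the host machinery can still act on, and a positive round keeps only those the new machinery accepts. The predicates below are stand-ins for the barnase and antibiotic-resistance screens just described; nothing in the sketch models real biochemistry.

```python
# A toy sketch of the alternating selection just described. The two
# predicates are stand-ins for the barnase (negative) and antibiotic
# resistance (positive) screens; nothing here models real biochemistry.

def select(library, charged_by_host, charged_by_new_synthetase):
    # Negative selection: variants charged by any host synthetase would
    # suppress the STOP codons in the toxic barnase gene -> cell dies.
    survivors = [v for v in library if not charged_by_host(v)]
    # Positive selection: only variants charged by the recombinant
    # synthetase suppress the STOP codon in the resistance gene -> cell lives.
    return [v for v in survivors if charged_by_new_synthetase(v)]

# Hypothetical usage: variants labeled by which machinery accepts them.
library = ["host_only", "both", "new_only", "neither"]
result = select(library,
                charged_by_host=lambda v: v in ("host_only", "both"),
                charged_by_new_synthetase=lambda v: v == "new_only")
assert result == ["new_only"]  # only orthogonal variants survive both rounds
```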
With this experimental system, requirements a1, b1/b2, c1/c2, d1, e1, and f1 could be covered.
Assuming further that requirement f2 is guaranteed by the selection of a “wobble”-free tRNA,
insertion of unnatural amino acids could be engineered to meet 6 of the 9 unique requirements with
this system. In this specific case, requirement e2 is also covered in a manner of speaking – a
suppressed STOP codon is by definition only recognized by the suppressor tRNA. However, the STOP
codon appears in a variety of other genes as well, where it should preferably maintain its old
function, so the system does not allow proper selection against the “natural” function of the amber
STOP codon.
Nevertheless, the system has been used very successfully to engineer the insertion of unnatural
amino acids into proteins [23]. This gives a clear indication that the power of directed evolution can be
recruited to help the development of orthogonal systems once it is possible to design the proper
selection systems.

2.1.2. Orthogonal RNA-RNA interactions (Chin-lab, MRC Cambridge, and others)

Another very promising experimental system has been implemented by the group of J. Chin in
Cambridge: They used the specificity of the interaction between ribosome binding site (RBS) on the
mRNA and the 16S rRNA on the ribosome to create orthogonal ribosome populations that do not
translate wild-type mRNAs but only those mRNAs whose gene is preceded by an orthogonal RBS [24].
The experimental strategy followed is very similar to the one discussed above: The authors used a
gene fusion connecting an antibiotic resistance gene (for positive selection) and a gene for a uracil
phosphoribosyl transferase, whose gene product catalyzes the formation of a toxic product from the
precursor 5-fluorouracil (for negative selection). In a first round, RBSs that do not give rise to
translation in a normal E. coli background were selected from a library of all possible variants of the
natural RBS by negative selection. In the second round, the survivors were used to complement a
library of mutated 16S rRNA molecules, and mRNA/ribosome pairs that could produce antibiotic
resistance were positively selected [25].
In an alternative approach from the laboratory of J. Collins in Boston, the interaction between
RNA molecules was used to either repress or induce gene translation. Translation was repressed
when an mRNA was equipped at the 5’-end with a sequence that would produce an RBS-sequestering
secondary structure. The repression could be relieved by producing a trans-activating (ta) RNA that
forced the 5’-end into a different secondary structure which made the RBS-sequence available. The
authors produced 4 pairs of 5’-RNA sequences and corresponding taRNA sequences. When they
checked for interactions between non-corresponding 5’-mRNA and taRNA sequences, they did not
find any, suggesting that these combinations of RNA sequences were orthogonal to each other and
thus might be a general tool to be used with a variety of different promoters [26].
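
The orthogonality test reported there amounts to measuring a crosstalk matrix: every taRNA against every 5'-UTR, with activation required on the diagonal (cognate pairs) and forbidden off it. A minimal sketch, with invented numbers purely to illustrate the test:

```python
# A sketch of the orthogonality check described above: measure every taRNA
# against every 5'-UTR and require that only cognate pairs (the diagonal of
# the matrix) activate translation. The data are invented for illustration,
# not taken from the cited study.

PAIRS = ["R1", "R2", "R3", "R4"]

# activation[(utr, taRNA)] -> normalized reporter output (hypothetical)
activation = {(u, t): (1.0 if u == t else 0.02) for u in PAIRS for t in PAIRS}

def is_orthogonal(act, on_threshold=0.5, off_threshold=0.1):
    on_ok = all(act[(p, p)] >= on_threshold for p in PAIRS)
    off_ok = all(act[(u, t)] <= off_threshold
                 for u in PAIRS for t in PAIRS if u != t)
    return on_ok and off_ok

assert is_orthogonal(activation)
```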

2.1.3. Orthogonality in single molecules

Orthogonality can also be engineered into the elements that make up single molecules. So far,
three groups of molecules have been intensively discussed: DNA, RNA, and proteins. The notion that
promoters, ribosome binding sites, genes, and transcriptional terminators can be combined freely to
a large extent is one of the pillars of genetic engineering. Even in systems where DNA sequences have
multiple uses, this can usually be changed: For example, in phage T7 certain parts of the DNA
sequences are used to encode the C-terminus of one protein and simultaneously the regulatory
region and the N-terminus of a second protein, and the implications of this “double usage” are
quantitatively unclear. “Rectifying” this situation by expanding the DNA and making the
corresponding elements “mono-functional” led to viable viruses – implying that “orthogonalization” of
DNA sequences can work – that produced, however, much smaller plaques – implying that the
orthogonalization had a direct impact on the fitness of the virus [27].
However, as DNA is only the store of information and the information is extracted via RNA
molecules and then in many cases executed by proteins, the impact of this orthogonality is limited to
the orthogonality of the subsequent operators. As a consequence, it is often difficult to attach a
specific quantitative measure to a specific DNA sequence. For example, the efficiency of a RBS might
be a strong function of the surrounding sequence context – even though an RBS might actually be very
efficient in initiating translation, being sequestered by the surrounding mRNA can prevent it from
functioning at all. Consequently, to make use of quantitative descriptions of the effect of specific
sequences (promoter clearance rate connected to a specific DNA promoter sequence – rate of
translation initiation as a function of RBS sequence – percentage of terminated mRNA syntheses as a
function of transcriptional terminator sequence) we most probably will need to define specific DNA
contexts that allow a direct use of the information because we can be sure that, for example, there is
no sequestering of the RBS involved.
Again, such sequences can be engineered given the right selection system: orthogonality was, for
example, engineered into artificial regulatory RNA elements, as discussed above for interactions
between separate RNA molecules, but also within single RNA molecules, as recently
demonstrated in [28]. The authors designed RNA molecules that contained [i] an aptamer sequence
that was responsive to a small molecule and [ii] a segment that was complementary to a part of the
mRNA sequence that contained the start codon of a recombinant gene. The presence of a small
molecule in the cytoplasm induced a change in the alignment of complementary sequences in the
synthetic riboregulator and either sequestered or exposed the part of the riboregulator that could
sequester the start codon on the mRNA. These synthetic riboregulators were orthogonal in that the
aptamer parts could be interchanged without significant change in regulatory behavior and in that
the regulated gene could be exchanged as well (of course within the limits of the tested set of
combinations).
Various degrees of orthogonality across protein domains are in fact a well established fact, in
particular in regulatory proteins that can be frequently separated at least into effector domains and
protein-DNA interaction domains. One important domain-class in DNA-protein interactions is the
zinc-finger protein (ZFP) domain [29]. Here, one such domain, stabilized by a zinc-ion, recognizes
typically a DNA base-triplet. When more than 3 (or in some cases, 4) base pairs are needed for
recognition, several ZFP domains can be connected with a small linker without interfering with the
interactions of the first ZFP domain. Ideally, orthogonal ZFP domains would be available for all 64
possible DNA base-triplets and by simply selecting the proper set of ZFP domains it would be possible
to create highly selective DNA-binders to arbitrary DNA sequences. In fact, multiple applications can
be envisioned around such a technology [30-35]. Even though this perfect scenario is not yet
possible, considerable advance has been made in designing orthogonal ZFP domains for 48 of the 64
possible triplets by rational and phage display methods and up to 6 of these ZFP domains have been
combined to produce selective binders with a recognition sequence of 18 bp [36].
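
The design logic behind this is straightforward to sketch: if a validated, orthogonal domain existed for every triplet, a binder for an arbitrary site would reduce to a table lookup. The domain names below are placeholders, and in reality validated domains cover only 48 of the 64 triplets, as noted above.

```python
# A sketch of the design logic: with an orthogonal zinc-finger domain for
# every DNA base-triplet, a binder for an arbitrary site could be assembled
# by simple lookup. The domain names are placeholders; validated domains
# currently cover only part of the 64 triplets.

DOMAIN_FOR_TRIPLET = {"GGA": "ZF_GGA", "CTG": "ZF_CTG", "TAC": "ZF_TAC",
                      "GCC": "ZF_GCC", "AAT": "ZF_AAT", "CGT": "ZF_CGT"}

def design_binder(target: str):
    if len(target) % 3 != 0:
        raise ValueError("target length must be a multiple of 3")
    triplets = [target[i:i + 3] for i in range(0, len(target), 3)]
    try:
        return [DOMAIN_FOR_TRIPLET[t] for t in triplets]  # one domain per triplet
    except KeyError as missing:
        raise ValueError(f"no validated domain for triplet {missing}")

# An 18-bp site needs 6 linked domains, as in the work cited above:
print(design_binder("GGACTGTACGCCAATCGT"))
```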
However, pushing orthogonality to such extremes also revealed current limitations: for example,
while designed binders did bind to their designed sequences, they also bound with lower affinity to
similar sequences [37]. Such limitations notwithstanding, ZFPs have found widespread use [32], in
particular in the area of genome editing, where the ZFP domains are coupled to the DNA-cleaving
domain of specific restriction endonucleases and thus can be used to produce targeted DNA double-
strand cuts. These cuts in turn trigger the exchange of chromosomal genes for externally provided
genes, an attractive proposition for gene therapy [38].
Next to the ZFPs, the frequent orthogonality between protein domains has also been exploited
in a number of alternative systems, preferably to re-program signaling pathways in cells [39, 40].

2.1.4. Orthogonality by alternative chemistries

An alternative approach to orthogonality could be to operate with molecules that do not occur
in typical cells and for which therefore no interactions have been designed naturally. Of course, there
might be ample unintended interactions, so such strategies need to be carefully controlled. Still, this
concept could be applied on various levels. For example, while central carbon metabolism defines a
set of standard ways from glucose to the set of standard starting metabolites for anabolism
(glycolysis, pentose phosphate pathway, etc.), it is possible to devise alternative routes on paper
relying on novel intermediates which for example might not have the regulatory effects that glycolytic
intermediates have. In fact, work is being done on the establishment of such routes, in which pathways are
built up by going backward from an intermediate (the end of the pathway) and successively evolving
the enzyme to catalyze the step before [41].
Alternatively, one could propose a different set of molecules to encode genetic information that
is replicated by dedicated enzymes and thus represents an orthogonal store of information in the cell
[12]. First steps towards implementing this strategy were successful: for example, it was possible
to generate DNA-polymerase variants that could incorporate alternative nucleotides with novel
hydrogen bond patterns in PCR reactions [42].
In fact, many other examples can be quoted in which the reengineering of the interface of
molecular interaction has allowed the introduction of novel small molecules into the set of cellular
interactions, and these novel molecules could potentially all behave orthogonally [43, 44]. However,
the corresponding studies have usually been performed with a rather narrow experimental focus
and thus all lack the proper controls to determine whether the novel engineered interaction is indeed
orthogonal beyond the immediate scope of the experiment and does not lead to additional,
unanticipated interferences with different cellular functions.

As discussed, a variety of concepts exist for realizing orthogonality in biological systems, and
from my point of view it is likely that some of these strategies will be successful in the end – in
particular, when orthogonality becomes selectable, as suggested in the example of in vivo
introduction of unnatural amino acids. This particular experimental system has also been shown to
be rather robust, as many different unnatural amino acids have been inserted into proteins following
this strategy [23]. But even here it is clear that the system is not fully orthogonal – most importantly,
it would be necessary to prune the genome of the exploited cells of all other uses of the codon that is
used to encode the novel amino acid. It is also clear that many interactions might simply remain
undetected, because we do not bother to look or simply would not know how to look. These hidden
interactions might or might not turn out to be important in the long run, when an existing orthogonal
system is made part of a more complex design. After all, it is completely unclear to what degree of
completeness we have to insist on orthogonality for the various schemes of cellular reorganization. In
summary, the field is only at its beginning, but at an important beginning. In the words of Sismour
and Benner: “Ultimately, Synthetic Biology succeeds or fails as an engineering discipline depending
on where the independence approximations become useful in the continuum between the atomic and
macroscopic worlds.” [12]

2.2. Evolution and Synthetic Biology

As pointed out above, the long-term implications of evolution have not been discussed in the
relevant literature so far, even though their effects have already been visible (see the discussion
of the partial re-design of phage genomes above [27]). Clearly, orthogonality, for example, might make
cellular behavior more predictable and easier to manipulate rationally, but in many instances this
will be connected to expanding the amount of genomic information. By this, I mean that in order to
achieve the same functionality, a thoroughly orthogonal system might need substantially more DNA
to encode the information and more proteins, enzymes, and small molecules to implement the
functionality. Obviously, these are bad starting conditions when competing for a limited pool of
resources, and left to their own devices, it is probably safe to argue that from an evolutionary point of
view synthetic biological systems are poised to lose out.
How to deal with this evolutionary pressure? Two layers of action can be envisioned in theory:
a) interfering with the evolutionary machinery itself; and
b) making control and repair of biological systems easier.
Point a) refers to manipulations of the enzymes involved in replicating and repairing DNA in
cells. Even though it might appear unlikely at the moment, it might be possible to improve these
molecules in terms of accurate DNA propagation. Alternatively, all strategies that enable robust and
conditional synchronization of replication would contribute to reducing the impact of evolutionary
pressure. Point b) refers to technological solutions that first identify the modified section of DNA and
then provide means to rapidly return to the previous state (essentially, DNA sequencing and DNA
synthesis – see below in the technical sections).
Available Synthetic Biology prototypes are so small that they do not drastically interfere with
the fitness of the cell, and none of the applications has been so rewarding that long-term experiments
regarding phenotype-stability were performed. Only one experiment has addressed this problem to
some extent, however in an exceptional experimental context: a population-density regulation was
shown to be stably maintained in a chemostat over appr. 50 generations, but the reactor volume (and
therefore the population pool to generate critical mutations) was only 16 nL (roughly 6 orders of
magnitude less than typical small-scale chemostat experiments), which makes comparison to
traditional data difficult [45].
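
A rough calculation illustrates the scale of that difference, assuming (my assumption) a typical chemostat cell density of about 10⁹ cells per mL and a 100 mL bench-scale reactor for comparison:

```python
# Rough scale of the mutation-supply difference implied above, assuming
# (my assumption) a typical chemostat cell density of ~1e9 cells/mL.

cells_per_mL = 1e9
reactor_nL = 16                                       # volume in the cited experiment
population_small = cells_per_mL * reactor_nL * 1e-6   # nL -> mL
population_typical = cells_per_mL * 100               # a 100 mL bench chemostat
print(f"{population_small:.1e} vs {population_typical:.1e} cells "
      f"(~{population_typical / population_small:.0e}-fold difference)")
# ~1.6e4 cells vs ~1e11 -- a far smaller pool in which critical mutations can arise
```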
However, as will be discussed below, the ambition of synthetic biologists regarding the scope of
their synthetic systems is rapidly changing – synthetic genomes are becoming available [46, 47], and
it is easy to predict that evolutionary pressure will become an important point on Synthetic Biology’s
agenda.

2.3. The technology to implement synthetic systems

As much as a conceptual problem, the progress of Synthetic Biology is a technological problem.
The extent of changes in cellular functions that are required to implement Synthetic Biology is of a
different order of magnitude than biotechnology has traditionally addressed. Just to name two
examples: merely eliminating all amber stop codons from the genome of E. coli, so that it could be used
to insert unnatural amino acids only at the desired places, would require around 300 mutations; and
re-synthesizing the genome of Mycoplasma genitalium required the synthesis and assembly of 580
kbp [48].
Such order-of-magnitude differences require a substantial leap in technological proficiency,
but also in providing “content” to synthesize. First, the technological side: Just as the “-omics”
technologies have changed the analytical side of biology from single gene to genomes, so does
Synthetic Biology need to implement the methods to move from the manipulation of single genes to
that of suites of genes and eventually even genomes. Secondly, we need to adapt our measurement
tools to the fact that the accurate analysis of dynamic systems will become of crucial importance,
recognizing that this will require covering our measurements much better statistically (=more
measurements per data point).
There are two broad lines of technological advances that in my view will determine the rate with
which Synthetic Biology can advance: [i] the advance in our capacity to synthesize de novo (non-
template driven) and error-free large (> 5 kbp) segments of DNA and [ii] the miniaturization and
automation of current laboratory protocols for the manipulation and analysis of biological systems.

2.3.1. DNA synthesis and assembly

2.3.1.1. Towards large-scale, non template-driven DNA synthesis
Biological systems store the instructions they require for maintenance, growth, replication, and
differentiation in DNA-molecules. Consequently, the ability to write new DNA is central to efforts in
biological systems design. However, so far our ability to manipulate DNA has been rather limited, as
will be argued in the following:
Cells need to produce DNA whenever they divide, in order to provide each daughter cell with
the same set of DNA-encoded information. To do so, cells copy existing DNA-molecules. To be more
precise, they produce the complementary version of an existing DNA-strand. This copying process is
of exceptional quality – typical error rates in in vivo DNA replication, for example when the bacterium
E. coli divides, are on the order of 10⁻⁷ to 10⁻⁸ (i.e. roughly one base substitution every 5 genome
duplications [49]). The (few) errors made in copying are an important source of the genetic variation
that is required for biological systems to evolve. Still, DNA is naturally propagated by copying only.
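
A quick back-of-the-envelope check of these figures, assuming (my assumption, not stated in the text) an E. coli genome of roughly 4.6 Mbp:

```python
# A back-of-the-envelope check of the error-rate figures above, assuming
# (my assumption, not stated in the text) an E. coli genome of ~4.6 Mbp.

genome_size_bp = 4.6e6           # assumed E. coli genome size
for error_rate in (1e-7, 1e-8):  # substitutions per base per duplication
    subs_per_duplication = error_rate * genome_size_bp
    duplications_per_sub = 1 / subs_per_duplication
    print(f"rate {error_rate:.0e}: ~1 substitution every "
          f"{duplications_per_sub:.0f} genome duplication(s)")
# rate 1e-07: ~1 substitution every 2 genome duplication(s)
# rate 1e-08: ~1 substitution every 22 genome duplication(s)
# The text's "one every 5 duplications" sits between these two bounds.
```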
This is also reflected in the laboratory tools that we have developed to synthesize DNA. For
example, the polymerase chain reaction (PCR) exploits thermostable DNA-polymerases to duplicate
the two strands of an existing template-DNA double strand. Repeating this duplication over and over
again allows exponential amplification of the template until enough material for further experiments
or analysis is produced. The same is true for example for our current protocols that are used to
determine DNA sequences. Irrespective of the specific protocol used, all methods rely on the
reconstruction of a second strand of DNA along a template-strand. This reconstruction is then
exploited analytically in various ways (for example by inserting nucleotides that cause the extension
reaction to stop (Sanger sequencing [50]) or by controlling the availability of the next nucleotide for
strand extension and recording any chemical reaction that might or might not have taken place
(pyrosequencing [51])), but at the heart of all the processes is the ability to synthesize DNA while
concomitantly evaluating the information encoded on the complementary strand.
The process of copying can be adapted to introduce modifications – the conditions in which a PCR
reaction takes place can be adjusted so that the error rate of the synthesizing enzyme increases (the
principle of directed evolution). We can also introduce specific sequence modifications into a DNA
template that is supposed to be amplified by selecting adequate, chemically synthesized
oligonucleotides to start the PCR process. But in both cases, the introduced modifications are minor
compared to the sequence of the original template and in the latter case limited to the sequence of
the oligonucleotides (as the rest of the molecule is again produced by copying from the template).
In order to demonstrate how inadequate these procedures are for addressing the task of
(re-)designing large sections of chromosomal DNA, let us examine an example: the recruitment of
a cytochrome P450 monooxygenase from Artemisia annua (sweet wormwood), which catalyzes the
3-step oxidation of amorphadiene to artemisinic acid as part of a novel pathway from glucose to
artemisinic acid in E. coli and S. cerevisiae [52]. First of all, the codon usage of the novel gene needs
to be changed from the plant A. annua to that of e.g. the Gram-negative bacterium E. coli. Next, the
gene needs to be integrated into the regulatory structure of the new pathway – it might for example
be part of an operon requiring tight transcriptional and translational coupling to the genes upstream
and downstream of it. Furthermore, we might want to fine-tune the amount of enzyme available relative to
the other pathway-members by influencing e.g. the efficiency of the ribosome binding site, the half-
life of the corresponding section of the mRNA (by introducing specific secondary structures, see
below) or of the protein itself (by adding specific tags to the protein, also see below). Alternatively, the
gene might need to receive its own regulatory structure, including promoter and transcriptional
terminator. Finally, in order to allow the rapid insertion of improved variants of the gene into the
operon, it might be desirable to have the gene flanked by unique restriction sites, while at the same
time internal restriction sites might have to be eliminated.
Going comprehensively through all these modifications, step by step, with the methods
described above is so laborious that it simply cannot be done for more than a few
genes in any given project. To be truly able to modify large stretches of DNA and adapt them to our
specific requirements, we need to switch completely to de novo, non-template-dependent DNA
synthesis methods (Fig. 1). Only this change can provide the power to implement comprehensively
engineered DNA sequences into novel biological systems on a significant scale.
Even though this switch is desirable and absolutely vital for Synthetic Biology, it is not easy. In
order to design DNA sequences de novo, we need to rely on our capability to synthesize DNA
chemically – without the requirement for a template. In this process, a single-stranded DNA molecule
is built up nucleotide by nucleotide, and the nucleotide sequence is determined solely by the sequence
of reactions the operator chooses (Fig. 2). Such chemically produced stretches of DNA
(oligonucleotides or “oligos”) are typically between 20 and 100 bp long. The technology as such is
established and is an instrumental requirement, for example, for PCR reactions. But, as already
indicated above, synthetic oligonucleotides are short, and to make up meaningful novel DNA
sequences they need to be assembled into larger and larger molecules. The assembly of these short
oligonucleotides into (ultimately) DNA sequences of genome size is one of the technologies at the
heart of Synthetic Biology.
2.3.1.2. Oligonucleotide synthesis
Before discussing the assembly of ever larger DNA sequences from oligonucleotides, it is
important to point out the reason why oligonucleotides used for assembling
longer DNA sequences tend to be no longer than 100 bp: the manufacturing of oligonucleotides is
error-prone, and the likelihood of sequence errors increases with increasing length (Fig. 3).
Errors are introduced on various levels: [i] On the level of chemical synthesis of the
oligonucleotides; [ii] on the level of assembling the oligonucleotides into larger fragments of DNA; and
[iii] during storage of fragments in living cells, such as E. coli.
A) Single steps:
1. Bioinformatics-supported design of 1’600 oligonucleotides
2. Production of 64 appr. 500 bp “synthons” from oligos by PCA
3. Amplification of synthons by PCR with uracil-containing primers
4. Uracil-DNA glycosylase-supported cloning of PCR fragments in vectors
5. Ligation-by-selection-supported assembly of 6 fragments of appr. 5 kbp
6. Assembly of the 32 kbp cluster from the 5 kbp fragments with the help of unique restriction sites
B) [schematic of ligation-by-selection cloning]
Fig. 1: Construction of a 32 kbp polyketide synthase cluster from 40-meric chemically
synthesized oligonucleotides. A) Single steps. B) Ligation-by-selection-supported cloning. BsaI and BbsI
are type IIS restriction enzymes whose recognition sequences lie outside the relevant gene
fragment. Selection for a successful cloning step occurs via unique combinations of antibiotic
resistances (in this case, kanamycin (Km) and tetracycline (Tet)). PCA: polymerase cycling assembly.
Data taken from [53].
In order to understand the sources of errors in oligonucleotide production, it might be helpful
to briefly recapitulate the fundamental steps of oligo synthesis. The corresponding chemistry is well
established. Currently, the bulk of syntheses is carried out by the “classical” phosphoramidite protocol
as a solid-phase synthesis. Briefly, it operates as follows: a first nucleotide, with its 5’-OH function
protected by a DMT group, is coupled to polystyrene beads as the solid phase. Next, the DMT group
is removed by acid treatment (e.g. TCA), generating a free 5’-OH group. Then, the phosphoramidite of
choice (A in Fig. 2) is added, converted to a reactive intermediate (B) under weakly acidic conditions, and
coupled to the free 5’-OH (C) to produce a new phosphite linkage. These reactions take place in THF
or DMSO. As the 5’-OH of the added nucleotide is still protected, only one nucleotide is added to the
growing chain. The 5’-OH groups that do not react need to be capped so that they cannot continue to
take part in the synthesis process and generate oligonucleotides with deletions. This is achieved by
acetylation upon treatment with acetic anhydride and 1-methylimidazole (not shown in Fig. 2). Finally,
water and iodine are added to oxidize the phosphite linkage to a phosphate linkage. In between
steps, the system is conditioned by washing with a suitable solvent. After repeating this sequence of
steps for the required number of times, the oligonucleotide is finally cleaved off the column and
treated with ammonium hydroxide at high temperature to remove all remaining protecting groups.
In order to make this process amenable to miniaturization and to produce many different
nucleotide sequences in a confined space, the deprotection of the 5’-OH group was made sensitive to
light. By producing suitable masks that direct the light only to certain parts of a solid-phase
synthesis array (photolithography), only these parts of the array are prepared for extension in the
next round of adding phosphoramidites. This can be achieved by replacing the acid-labile DMT group
with the photo-labile α-methyl-6-nitropiperonyloxycarbonyl (MeNPoc) protecting group. This technology
allows the concomitant preparation of several thousand oligonucleotides on one solid support (in
hybridization arrays, up to 1 million features per cm^2 are possible).
The photolithography approach has been developed further: in order to eliminate the time-
consuming and expensive use of photolithographic masks, systems with “micro-mirrors” have
been developed (NimbleGen, Febit). Here, the light pattern is produced by rapidly adjustable high-
contrast video displays. The German company Febit claims that with this procedure it can
produce up to 500’000 oligonucleotides per chip per day.
A couple of properties are common to all three technologies. One crucial parameter in oligo
synthesis is the coupling efficiency, which is a function of the completeness with which deblocking
and chain extension proceed. If in every step 99% of all started oligonucleotide chains are extended
(see Fig. 2, B to C), only 60% of all chains contain all nucleotides after 50 steps. This has
considerable implications for the scale at which the synthesis needs to be started. Furthermore,
capping also does not proceed with total efficiency. Taken together, a significant fraction of the
molecules on the solid phase ends up with a length different from that of the intended oligo.
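The arithmetic behind this statement is simply the compounding of per-step losses; the following minimal Python sketch reproduces the 99%/50-step figure and the general relationship plotted in Fig. 3 (OY = AY^n).

# Overall yield (OY) of full-length chains after n coupling steps,
# given the average per-step extension yield (AY); cf. Fig. 3.
def overall_yield(avg_yield: float, steps: int) -> float:
    return avg_yield ** steps

for ay in (0.98, 0.99, 0.995):
    for n in (25, 50, 100):
        print(f"AY = {ay:.3f}, {n:3d} steps -> OY = {overall_yield(ay, n):5.1%}")

With AY = 0.99 and 50 steps, OY comes out at about 60%, as stated above.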
Besides deletions, chemical modifications also play a role: for example, the phosphoramidites
that are used for chain extension are not completely pure. In addition, oligonucleotide syntheses are prone
to depurination (in particular under the acidic conditions during DMT removal), during which an
adenine or a guanine can be hydrolysed off the sugar-phosphate backbone, leaving a free hydroxyl
group.
These two classes of errors lead to a considerable percentage of incorrect oligonucleotides in any
given oligonucleotide mixture coming out of a synthesizer. The exact percentage is difficult to
estimate and is also a function of the specific supplier used and of the implemented quality
control criteria. It certainly is a function of the required oligonucleotide length, as should be
obvious from the various error sources mentioned above. The various reports in the literature that
describe the assembly of larger DNA fragments from oligonucleotides typically use oligonucleotides of
around 50 bp [47, 53] as a compromise between the desire for long oligonucleotides, which
facilitate assembly, and for short oligonucleotides, which minimize the number of errors due to chemical
synthesis.
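The effect of oligonucleotide length on quality can be illustrated with the same kind of compounding argument; in the sketch below, the per-nucleotide error probability p is an illustrative, made-up round number, not a value taken from the report.

# Expected fraction of entirely error-free oligos as a function of length,
# assuming independent errors with an illustrative per-position probability.
p_error_per_nt = 0.005  # hypothetical value, for illustration only
for length_nt in (20, 50, 100, 200):
    error_free = (1 - p_error_per_nt) ** length_nt
    print(f"{length_nt:3d} nt: {error_free:5.1%} of molecules error-free")

Under this toy assumption, roughly three quarters of 50-mers but fewer than 40% of 200-mers would be entirely correct, which is the quantitative core of the length compromise just described.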
Besides ongoing efforts to improve the synthetic procedures and thus reduce the frequency of
error introduction, several methods have been developed to identify errors introduced during synthesis
and to eliminate the corresponding DNA molecules. They are based on enzymatic and physical principles.
The physical methods exploit size differences and the disturbances that errors cause in the
hybridization of complementary oligonucleotides. Polyacrylamide gel electrophoresis
(PAGE), for example, is easily sensitive enough to separate oligonucleotides differing by only one
nucleotide. Therefore, subjecting oligonucleotides to a PAGE purification can substantially
reduce the number of erroneous oligonucleotides at the start of the experiment [47]. The same can be
achieved by preparative HPLC, which is a standard technology if high-quality oligonucleotides are
required.
Alternatively, hybridization under stringent conditions can be used to identify mismatches.
Perfect complementarity between two DNA strands allows the maximum number of hydrogen
bonds between them, so a higher temperature is required to separate the molecules again (the
“melting temperature”). This has been applied to reduce the error rate in light-
directed, chip-based oligonucleotide synthesis [54]. A first chip was used to produce the
oligonucleotides for DNA-fragment assembly (“construction oligos”). These oligonucleotides were
eventually released from the first chip and then hybridized under stringent conditions to sets of
complementary oligonucleotides that had been synthesized on a second chip. Ideally, those
oligonucleotides with errors in their sequence find no perfect match in the set of correction
oligonucleotides and are lost in a washing step because of the resulting decreased melting
temperature. Obviously, this procedure requires that all construction oligonucleotides are
carefully designed beforehand to have approximately the same melting temperature. The result
was encouraging: the authors reported 1 error in a sequence of 1’400 bp. Though this is still far too
high for any large-scale synthesis approach, it is better by a factor of 3 than standard solid-phase
approaches [53].
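Because this correction scheme hinges on all construction oligonucleotides having similar melting temperatures, a design pipeline needs some Tm estimate to screen candidates. The sketch below uses the crude textbook “Wallace rule” (Tm ≈ 2 °C per A/T plus 4 °C per G/C); this only illustrates the screening idea, it is not the design method actually used in [54], and the two oligo sequences are made up.

# Rough Tm screen for construction oligos using the Wallace rule
# (Tm ~ 2*(A+T) + 4*(G+C) degC) -- a crude estimate for short oligos.
def wallace_tm(seq: str) -> int:
    s = seq.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))

oligos = ["ATGCGTACGTTAGCCTAGGA", "GGGCGCGCCTTAAGGCCGCG"]  # hypothetical
tms = [wallace_tm(o) for o in oligos]
mean_tm = sum(tms) / len(tms)
for o, tm in zip(oligos, tms):
    verdict = "ok" if abs(tm - mean_tm) <= 4 else "redesign"
    print(f"{o}  Tm ~ {tm} degC  ({verdict})")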
The enzymatic methods employ enzymes that can detect DNA structures typical of hybrids
involving erroneous DNA molecules. For example, endonuclease VII of phage T4 can identify apurinic
sites in DNA and then cleaves both strands of the DNA molecule close to the lesion site [55]. This leads
to shorter fragments, which can again be separated from the correct oligonucleotides (see above).
Alternatively, E. coli’s MutHLS proteins can be used to detect mismatches and insertions/deletions in
double stranded DNA. If such errors are present, fragments are cleaved at GATC sites [56] and the
remaining correct fragments can again be isolated by size selection. This way, the error rate in DNA
fragments produced from chemically synthesized oligonucleotides could be reduced by an order of
magnitude [56].
Even when the error rate in the produced oligonucleotides can be reduced by (a combination of)
the methods mentioned above, the produced larger DNA fragments require sequencing after assembly
[57] to confirm that the desired sequence has been obtained. Of course, this is a very laborious and
time-consuming way of error correction.
[Figure 2: reaction scheme of phosphoramidite DNA synthesis showing species A-D, the spontaneous (~99%) coupling step, and the I2/H2O oxidation step.]
Fig. 2: Phosphoramidite procedure for DNA synthesis in the 3’- to 5’-direction: commercially available
nucleoside 3’-phosphoramidites (A) are added to a growing oligonucleotide chain and exposed to weak
acid. This leads immediately to the formation of the reactive intermediate B. Within 1 min, B has formed
a new phosphite linkage with the oligonucleotide chain attached to the solid support (C). C can be
oxidized with iodine/water to D. CE: 2-cyanoethyl; DMT: dimethoxytrityl. Amine groups of the bases are
also protected (e.g. benzoyl).
Fig. 3: Dependence of the overall yield (OY) of an oligonucleotide synthesis on the number of steps
and on the average yield (AY), i.e. the percentage of extended oligonucleotide chains per step.
2.3.1.3. Oligonucleotide assembly
Next, the oligonucleotides have to be assembled into larger DNA fragments, usually to a size of
around 500 bp. This is typically achieved by one of a variety of enzyme-assisted methods. The
corresponding oligonucleotides are mixed, hybridized, and then converted to larger assemblies by
polymerase cycling assembly (PCA, Fig. 4). In a PCA reaction, all oligonucleotides that together
represent the targeted double-stranded DNA fragment are present. By repeated melting and re-
hybridization, the oligonucleotides are extended step by step into longer sections until a certain
population reaches the desired length. Note that this reaction is carried out without terminal
oligonucleotides in excess, so it is not an amplification reaction. Rather, every full-length fragment
consists of oligonucleotides and their extensions, thereby reducing the chance of introducing errors
by polymerase action. Indeed, a detailed study found polymerase action to make a negligible
contribution to the overall error rate [53]. Remarkably, the error rate observed in the assembled
500 bp fragments was even lower than that expected from the presumed error frequency associated
with the oligonucleotides, suggesting that the error rate in these oligonucleotides was either
overestimated or that the PCA reaction contributed to error correction in some unappreciated way.
However, extending the number of PCA cycles beyond 25 increased the error rate substantially.
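The principle of PCA can be captured in a few lines of string manipulation; the following toy Python sketch (purely illustrative, with made-up sequences, and ignoring all hybridization chemistry) joins single-stranded pieces whenever their ends overlap, which is essentially what the repeated melting/extension cycles accomplish.

# Toy illustration of overlap-based assembly as in PCA. Pieces that share
# a terminal overlap are merged into one longer product.
def join_on_overlap(a: str, b: str, min_overlap: int = 6) -> str:
    # Try the longest possible overlap first, down to min_overlap.
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:k]):
            return a + b[k:]
    raise ValueError("no sufficient overlap found")

pieces = ["ATGGCTAGCTAGGACT", "AGGACTTTCCGGAATG", "GGAATGCCTTAGGCAA"]
product = pieces[0]
for nxt in pieces[1:]:
    product = join_on_overlap(product, nxt)
print(product)  # one contiguous "fragment" built from overlapping pieces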
A specific feature of the light-directed synthesis technologies is the rather low amount of
oligonucleotide delivered. While a solid-phase synthesis can be scaled to the amount of
oligonucleotide required, the chip-based technologies can only produce the amount that is allowed by
the feature size on the chip (typically, a chip-derived single sequence is made at around 10^5 to 10^8
molecules per feature, translating into picomolar concentrations or lower after release into solution
[54]). This is typically not enough for the subsequent stages of assembling oligonucleotides, so a
DNA amplification step needs to be introduced. This amplification step needs to be performed on all
oligonucleotides at the same time, putting high requirements on the design of the oligonucleotides
to be amplified. Essentially, the oligonucleotides obtain standard linkers on both ends that
serve as hybridization targets for the PCR. However, PCR is not error-free itself and can thus
contribute to the error rate in the oligonucleotide set. Furthermore, it is unlikely that all
oligonucleotides can indeed be faithfully amplified with comparable efficiency, leading to potentially
pronounced imbalances between the amounts of the oligonucleotides in a sample (or even the absence of
specific oligonucleotides).

[Figure 4: oligonucleotide design, hybridization, and the first two extension cycles shown schematically.]
Fig. 4: Polymerase cycling assembly (PCA): extension of oligonucleotides (40-70 bp) to larger
fragments (appr. 600 bp) by repeated melting and polymerase-based extension. Only the first two
cycles are shown.
After such an amplification step, the assembly of oligonucleotides into larger fragments was
performed by a variant of the PCA reaction mentioned above, the polymerase assembly multiplexing
(PAM) reaction. It was applied to a pool of oligonucleotides representing the genes for 21
ribosomal proteins. Rather than combining these 21 genes into one large DNA fragment
immediately, however, the authors added terminal primers that allowed amplifying only a specific sub-set of
oligonucleotides. Repeating this process, they obtained each of the 21 genes, each from a
different reaction. In a second round of PAM reactions with a novel set of primers, they could then
recombine the 21 genes into one 14.6 kbp DNA fragment containing all the genes.
2.3.1.4. Assembly of DNA fragments
Once the oligonucleotides have been assembled into DNA fragments of still relatively
modest length (typically around 0.5 kbp), these fragments need to be assembled into larger
fragments or even genomes. The methods used for this part are still very traditional, even if
they are rather ingeniously applied. Essentially, fragments are combined by traditional cutting and
pasting of DNA. This can be considerably facilitated by rigorously applying intelligent working routines.
For example, the assembly of 5 kbp fragments from 500 bp synthons was achieved by simplifying the
cloning steps through smart selection (Fig. 1). Briefly, a synthon was excised together with an antibiotic
resistance gene and inserted into a vector that had been prepared by eliminating a different resistance
gene while retaining a second resistance gene different from the other two. Selection for
successful ligation can then be performed via the unique combination of resistance genes,
effectively reducing the time that has to be invested to identify correct clones. Other
methods, such as the PAM variants, have been mentioned above.
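The appeal of such hierarchical schemes is how quickly they scale: joining pieces pairwise (or in small groups), the number of required rounds grows only logarithmically with the number of starting pieces. The Python sketch below reproduces the hierarchy of Fig. 1 and the general round count; the numbers are taken from [53], and the code is only an illustration of the bookkeeping.

import math

# Hierarchy of the assembly shown in Fig. 1 (numbers from [53]).
levels = [
    ("oligonucleotides", 1600, 40),      # ~40-mers from chemical synthesis
    ("synthons",           64, 500),     # built by PCA
    ("fragments",           6, 5000),    # ligation-by-selection
    ("final cluster",       1, 31656),   # joined via unique restriction sites
]
for name, count, size_bp in levels:
    print(f"{count:5d} {name:16s} of ~{size_bp} bp")

# Joining pieces two at a time, n pieces need ceil(log2(n)) rounds:
for n in (6, 64, 1600):
    print(f"{n:5d} pieces -> {math.ceil(math.log2(n))} pairwise rounds")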

Synthesized or
assembled DNA
Length
(bp)
Year Ref.

Comments
Ala-tRNA gene
(yeast)
77 1970 [58]

None
somatostatin gene 56 1977 [59]

None
Tyr-tRNA gene (E.
coli)
207 1979 [60]

None
poliovirus genome 7’558 2002 [57]

Overlapping fragments of 400-600bp assembled from
69bp oligonucleotides, fragments confirmed by
sequencing and then combined to roughly 2.5 kbp
fragments
genome of 174
bacteriophage
5’386 2003 [47]

Gel purification of oligonucleotides (to eliminate
oligonucleotides of wrong length), PCA and PCR
assembly, final test by transforming E. coli and
selecting for functional phage, then sequencing to
identify one correct genome
polyketide synthase
gene cluster
31’656 2004 [53]

1’600 oligonucleotides assembled to synthons of 500bp
and then in fragments of 5 kbp.
genes encoding
proteins the 30S
ribosome subunit
of E. coli
14’600 2004 [54]

Light-directed oligonucleotide synthesis on a chip,
additional error correction by hybridization on control
chip, then assembly in steps
genome of the 1918
influenza virus
13’500 [61]

[62]

Separate genes assembled from oligonucleotides and
later combined
Genome of
M. genitalium JCVI-
1.0
582’970 2008 [48]

Four cassettes of 5-7kbp, acquired from commercial
suppliers and verified by sequencing, were assembled
by in vitro recombination to 25 kbp A-assemblies, then
to ~72kbp B-assemblies, and then to ~144kbp
C-assemblies, all stored in E. coli BACs. Subsequent
steps D and E (580 kbp) in S. cerevisiae by TAR
cloning. Each step verified by sequencing
Tab. 1: Milestones in DNA synthesis and assembly
2.3.1.5. Assembling DNA at genomic scale
One prominent example where the assembly of synthetic DNA was extended by 2 orders of
magnitude beyond the 5 kbp status is the assembly of the 580 kbp M. genitalium JCVI-1.0 genome.
The authors obtained 101 DNA cassettes of 5 to 7 kbp (already checked for the correct sequence)
from commercial providers. These cassettes overlapped by between 80 and 360 bp. Four such
cassettes were excised from their vector, treated with T4 polymerase to make the overlapping ends
single-stranded, hybridized, and the ends were repaired and ligated (A-assembly). Each set of 4
cassettes was then inserted into a BAC by PCR with primers that covered part of the specific A-
assembly and part of the BAC. This led to 25 inserts of appr. 24 kbp. In the next stages, typically 3 of
these A-inserts were combined into 8 B-assemblies of approximately 72 kbp, and then 2 B-assemblies
were combined into a C-assembly (~144 kbp) by similar methods. The different assemblies were again
stored in E. coli BAC vectors. However, the assemblies of the next stages (half and entire genome)
could not be stored in E. coli (for unknown reasons) and were performed in S. cerevisiae instead. As
S. cerevisiae is very proficient in homologous recombination, transformation into S. cerevisiae of even
long sequences of interest together with a yeast-centromere-containing sequence can lead to intact
circular recombinant constructs if sufficiently large regions of overlap are available (transformation-
associated recombination, TAR) [63]. This TAR-cloning approach was used to produce the D and E
assemblies, the latter representing the complete M. genitalium genome reassembled in a YAC together
with the required yeast-specific sequences [48]. It should be noted that the E-assembly was done
directly with the 4 C-assemblies.
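A simple consistency check of the reported stages can be written in a few lines; in the Python sketch below, the counts and sizes are those quoted above and in [48], and the small excesses over the 580 kbp genome length reflect the 80-360 bp overlaps between pieces.

# Stage-by-stage bookkeeping of the M. genitalium assembly [48].
stages = [
    ("cassettes",   101,   5800),   # 5-7 kbp, commercial synthesis
    ("A-assemblies", 25,  24000),
    ("B-assemblies",  8,  72000),
    ("C-assemblies",  4, 144000),
    ("E (genome)",    1, 582970),
]
for name, count, size_bp in stages:
    total_kbp = count * size_bp / 1000
    print(f"{name:13s}: {count:3d} x ~{size_bp / 1000:5.1f} kbp = ~{total_kbp:6.0f} kbp")

Each stage sums to roughly the 580 kbp genome length, confirming that the figures quoted above are internally consistent.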
It is remarkable that the number of errors introduced during the assembly of the 5 – 7 kbp
fragments into a 580 kbp genome using the various in vitro and in vivo methods pointed out above
seemed to be rather small, even though the exact error rate remains unclear. However, it is clear that
the errors could be traced back to various sources – errors due to providing wrong sequences to the
commercial suppliers, errors in the cassettes obtained in return, and errors acquired during
propagation of assemblies in E. coli. Crucial here is the error management that has to be applied –
the authors of the M. genitalium genome assembly study went through a series of comprehensive
sequence determinations – in total, the genome was sequenced about 4 times at various stages of the
assembly process, excluding oversampling and the sequencing of the 5 – 7 kbp cassettes that was
done at the commercial suppliers [48]. Whenever errors in the sequence were encountered, the
corresponding assembly had to be carefully repaired or re-made.
The M. genitalium genome is currently the largest synthetic genome that has been assembled
from entirely chemically produced oligonucleotides. However, other approaches were followed to
assemble large segments of natural DNA up to entire genomes in bacterial hosts along with the
original chromosome of the host. These attempts have focused on Bacillus subtilis as host because
this bacterium is well-known for its ability to acquire DNA from the environment. In a process called
“inchworm cloning”, the entire genome of the photosynthetic bacterium Synechocystis PCC6803 was
transferred piece by piece into various sites on the chromosome of B. subtilis 168 [64]. The process
relied on double homologous recombination for transfer of the Synechocystis genome to predesigned
points in the B. subtilis genome. At 4 different positions on the B. subtilis genome, landing pads of
about two times 5 kbp of Synechocystis DNA were placed into which about 30 kbp of Synechocystis
DNA could be recombined. Then, another 5 kbp of Synechocystis DNA were placed in front of the
growing section (again by double homologous recombination) and the insertion of another 30 kbp
could take place, and so on. This way, the entire Synechocystis genome was transferred to the 4
different locations on the B. subtilis genome. This rather laborious method has meanwhile been
streamlined somewhat to allow the assembly of large pieces of DNA from rather short fragments, generated for
example by PCR, on dedicated B. subtilis genome vectors [65]. This way, small genomes of up to 134.5
kbp (rice chloroplast genome) could be successfully cloned and maintained.
2.3.1.6. Transferring artificial genomes into novel cellular environments
Once the genomes of choice have been assembled, they need to be introduced into a cytoplasm
so they can start to operate. Although this approach has so far not been reported for a synthetic
genome, it has been accomplished for a natural genome [46]. The authors developed a method to
isolate the chromosome of a Mycoplasma mycoides strain in essentially intact form and to transplant
this into a Mycoplasma capricolum bacterium. The last step was achieved by incubating the
chromosome, in the presence of an excess of yeast tRNA and polyethylene glycol, with intact
M. capricolum cells that had been made “competent” by starvation and CaCl2 treatment. The authors
could convincingly demonstrate that by this procedure a number of cells were produced that had
exclusively the M. mycoides chromosome in their cytoplasm, suggesting that the chromosome had
been successfully transplanted. Even though the exact mechanism of this method remains unclear
and it has to be acknowledged that the chromosome transplantation took place between two closely
related microorganisms, it is clear that with this experiment all technological elements are available
to synthesize and assemble a completely synthetic genome and then to put it into a cellular
environment that allows its functioning. In other words, the technological chain from chemical
oligonucleotide synthesis to genome functioning has been established once. Clearly, this technology
is not yet at a state where it can be repeated at will for any other (for the time being, bacterial) genome
– but it was done once, and it is highly likely that it will soon be repeated for other bacteria or even
S. cerevisiae. Evidently, the technology of synthetic genomics is about to become available.
2.3.1.7. Practical considerations
One important issue in assessing whether de novo DNA synthesis will become a major driving
force in everyday laboratory procedures will be the associated costs. For example, let us assume a
PhD student (60% position) is supposed to assemble a 10 kbp construct of 10 genes encoding a
synthetic pathway towards a novel molecule. Let us further assume the following details: every gene
requires on average 2 modifications (one involving the regulatory region in front of the gene and one
inside the gene, such as the removal of a critical restriction site), each requiring 2
oligonucleotides of 60 bp; the student can handle 3 genes in parallel; and one modification takes one
week. This means that the student requires about 6 weeks for adapting all the single elements and
then another 5 weeks for the assembly, a total of 11 weeks. In summary, this is an expense of roughly
14 kCHF (9.7 kCHF salary, 3.5 kCHF consumables, 1 kCHF oligonucleotides). The de novo synthesis
of 10 kbp of DNA in 10 pieces of 1 kbp will cost approximately 15 kCHF plus an additional 5 weeks of
assembly work (1.6 kCHF for consumables and 4.4 kCHF of salary), a total of 21 kCHF (already,
some vendors offer gene-synthesis prices as low as 55 US cents per bp, reducing the most important entry
to 5.5 kCHF at current exchange rates). Clearly, both approaches are in the same range, with the
total synthesis having the added advantage of allowing a much more radical redesign of the sequence.
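The comparison can be made explicit with a few lines of arithmetic; in the Python sketch below, the per-week salary and consumables rates are back-calculated from the totals quoted above, so the script merely reproduces the report's own numbers.

# Cost comparison for the hypothetical 10 kbp pathway project (in kCHF).
# Per-week rates back-derived from the totals quoted in the text.
salary_per_week      = 9.7 / 11   # 60% PhD position
consumables_per_week = 3.5 / 11

# Conventional route: 6 weeks adaptation + 5 weeks assembly + oligos.
conventional = 11 * (salary_per_week + consumables_per_week) + 1.0
# De novo route: 15 kCHF of synthesized 1 kbp fragments + 5 weeks assembly.
de_novo = 15.0 + 5 * (salary_per_week + consumables_per_week)

print(f"conventional: ~{conventional:.0f} kCHF over 11 weeks")
print(f"de novo     : ~{de_novo:.0f} kCHF over  5 weeks")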
The major cost item at the moment is the acquisition of the 1 kbp fragments. Here, it might be
informative to emphasize the exponential pace at which DNA synthesis capacity has developed over the
last years (Fig. 5). It is safe to assume that the costs of de novo synthesized DNA fragments will
drop dramatically over the next couple of years, so DNA synthesis is likely to transform the working
routines of the standard molecular biology laboratory in a fashion similar to the introduction of PCR.

Fig. 5: Development over time of the number of transistors per chip (“Moore’s law”), DNA
sequencing capacity, and DNA synthesis capacity, set against the synthesis rate of E. coli’s DNA
polymerase III. Taken from [66].
2.3.2. Miniaturizing and automating laboratory protocols and analysis
It should have become clear from the previous discussion that, already in the near future, the
bottleneck in the de novo synthesis of large DNA fragments will lie in the assembly of elements rather
than in the production of small DNA elements. At the moment, this clearly remains a laborious, step-
by-step procedure open to substantial improvement, for example by miniaturization and automation.
At the same time, system-scale adaptation of bacterial genomes (knock-outs, mutations, etc.) is
currently also based on laborious laboratory procedures, limiting highly trained personnel to
achieving maybe a couple of knock-outs every other week. And while the “-omics” technologies allow
comprehensive snapshots of the system state at a given time point, their associated costs and effort
make repeating an experiment more than once the exception rather than the rule – which is
clearly not enough to provide reliable and accurate data for systems design.
In order to change this situation fundamentally, it appears necessary to me to reconsider the
way, and in particular the scale, in which we carry out biological experiments. There is substantial
evidence in the literature that by replacing current protocols with microfluidics-based ones,
many of these protocols can be substantially reduced in volume, shortened, and parallelized, addressing
the key issues of the postulated paradigm change.
Since the integration of capillary electrophoresis on a microchip in the 1990s, microfluidics
has played an increasingly important role as an enabling technology in the miniaturization of laboratory-
based processes, and it is the decisive technology in modern “lab-on-a-chip” systems. There are two
main arguments that motivate the introduction of microfluidic systems into the modern
biotechnology laboratory:
[i] Microfluidics allows the introduction of new, and the improved application of established,
physical phenomena to implement experimental processes at high speed: small
volumes allow short diffusion times, temperature changes can occur rapidly due to low heat
capacities, and rapid electrokinetic separations of biomolecules are possible due to high field strengths.
[ii] Microfluidics allows miniaturization and thus parallelization of biological experimental
processes, and therefore coping with the new order of magnitude in the numbers of required parallel