JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 28, 1059-1071 (2012)

Design of a Lightweight TCP/IP Protocol Stack
with an Event-Driven Scheduler*


Joonhyouk Jang¹, Jinman Jung¹, Yookun Cho¹, Sanghoon Choi² and Sung Y. Shin³

¹School of Computer Science and Engineering
Seoul National University
Seoul, 151-744 Korea

²School of Computing
Soongsil University
Seoul, 156-743 Korea

³Electrical Engineering and Computer Science Department
South Dakota State University
Brookings, SD 57007 USA

The traditional TCP/IP protocol stack suffers from context-switching overhead and
redundant data copying. The software-based TOE (TCP/IP Offload Engine), also known
as lightweight TCP/IP, was developed to optimize the
TCP/IP protocol stack to run on an embedded system. In this paper, we propose the de-
sign of a lightweight TCP/IP protocol stack that runs on an event-driven scheduler. An
event-driven scheduler is one of the main components of a real-time operating system
that provides essential functionalities for an embedded system in network communica-
tion. We discuss the problems involved in designing a lightweight TCP/IP with an event-
driven scheduler, especially for the issues of TCP transmission and TCP retransmission.
We implemented and evaluated the proposed TCP/IP stack on an embedded networking
device and verified that the proposed TCP/IP stack is well suited for high-performance
networking in embedded systems.

Keywords: TCP/IP, TCP/IP offload engine, embedded system


1. INTRODUCTION

With the rapid growth of wired/wireless networks, embedded systems are required
to have a high capability for network communications. Several studies have been done to
accelerate the speed of packet processing in embedded devices that are uniquely applica-
ble to network communications. Considering the communication environments of the
embedded devices, these studies focus on enhancing the TCP protocol [1] and optimizing
the TCP/IP implementation [2].
TOE (TCP/IP Offload Engine) [3, 4] refers to a type of implementation of a protocol
stack that is optimized for embedded devices. Although general TOEs are implemented
in hardware [2, 5], in some cases TOEs are implemented in software [6, 7]. Software-
based TOEs have high degrees of flexibility and a low cost compared to hardware-based
TOEs. A protocol stack implemented in a TOE is designed to support more efficient
packet processing by overcoming the shortcomings of the traditional TCP/IP protocol.
Received May 31, 2011; accepted March 31, 2012.
Communicated by Junyoung Heo and Tei-Wei Kuo.
*
This research was supported by Basic Science Research Program through the National Research Foundation
of Korea (NRF) funded by the Ministry of Education, Science and Technology (2011-0027454) and by Min-
istry of Culture, Sports and Tourism (MCST) and from Korea Copyright Commission in 2011.

Software-based TOEs, sometimes referred to as lightweight TCP/IPs, run on real-time operating systems. Because real-time operating systems do not support the full functionality of general operating systems, the software-based TOE architecture is customized to the real-time operating system in which it runs. This paper proposes the design of a lightweight TCP/IP protocol stack which runs on an event-driven scheduler in a real-time operating system. We also define the problems involved in TCP transmission and retransmission on an event-driven scheduler.
The rest of this paper is organized as follows. In section 2, we introduce previous work on software-based TOEs and lightweight TCP/IP protocols. In section 3, we define the execution environments associated with the problem. In section 4, we discuss the problems and their solutions regarding TCP transmission and retransmission in the execution environments described in section 3. In section 5, we evaluate our implementation, and in section 6, we conclude this paper.
2. SOFTWARE-BASED TCP/IP OFFLOAD ENGINE
The traditional TCP/IP protocol is designed as a layered architecture. This design concept is not suitable for running on an embedded device because the resource limitations of an embedded device (the size of the binary executable and the memory space) are not considered. In addition, the overhead of context switching slows down packet processing, as the layers or the network protocols are executed in separate processes. Each of the software layers handles a data unit of its own form as used in the specific protocol and manages the data structure to maintain the states of the data unit. For example, a TCP connection processes a bit stream, which is divided into segments, while the IP protocol processes datagrams. In an actual implementation, there are many more software layers, and they all maintain a specific form of data structure, consuming memory space and processing time while dealing with data packets. In particular, data copying between the layers is the most time-consuming process in the network protocol stack. When transmitting data packets, each layer copies data from the packet buffer in the upper layer to its own packet buffer and transfers the data to the lower layer after processing the data packets. This occurs in reverse order when receiving data packets.
The most remarkable feature of a software-based TOE, aka lightweight TCP/IP, is
the elimination of the overhead of managing complex data structures and copying re-
dundant data packets between software layers in the traditional design of the network
protocol stack [8-11]. It provides the layers with unified methods of accessing a shared
buffer space to allocate and de-allocate packet buffers. In this way, data structures can be
simplified and the number of redundant, time-consuming data copying events is reduced
to zero, one, or two. Zero-copy packet transmission and reception can be achieved by
integrating applications and network drivers together into a unified buffer management
scheme. Otherwise, one or two instances of copying can be allowed to preserve the in-
dependence of the applications and network drivers. Another feature of lightweight TCP/
IP is its software architecture, which is designed to integrate software layers. Because the
layers are not separated strictly, they can stick together or flexibly take charge of the
functionalities of the network protocols. With this feature, the program binaries occupy
less storage space and the communication between the layers can be simplified. Moreover, the software can be executed in only one or two processes, resulting in more efficient network communication than in the traditional design of the protocol stack.
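The unified buffer management described above can be sketched in C. This is a minimal illustration under assumed names (`pkt_buf`, `pkt_alloc`, `push_header` are not from the paper): every layer operates on the same buffer through an offset, so transmission prepends headers in place instead of copying payloads between per-layer buffers.

```c
/* Minimal sketch of a unified packet-buffer pool shared by all layers.
 * Names and sizes are illustrative assumptions. */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PKT_POOL_SIZE   8
#define PKT_BUF_LEN     1536  /* room for an Ethernet MTU plus headers */
#define PKT_HDR_RESERVE 64    /* space reserved for TCP/IP/Ethernet headers */

typedef struct {
    uint8_t  data[PKT_BUF_LEN];
    uint16_t offset;   /* start of the current layer's view of the packet */
    uint16_t len;      /* valid bytes from offset */
    uint8_t  in_use;
} pkt_buf;

static pkt_buf pool[PKT_POOL_SIZE];

pkt_buf *pkt_alloc(void) {
    for (int i = 0; i < PKT_POOL_SIZE; i++)
        if (!pool[i].in_use) {
            pool[i].in_use = 1;
            pool[i].offset = PKT_HDR_RESERVE;  /* leave room for headers */
            pool[i].len = 0;
            return &pool[i];
        }
    return NULL;  /* pool exhausted */
}

void pkt_free(pkt_buf *p) { p->in_use = 0; }

/* The application writes its payload once... */
void app_write(pkt_buf *p, const void *payload, uint16_t n) {
    memcpy(p->data + p->offset, payload, n);
    p->len = n;
}

/* ...and each protocol layer prepends its header in place,
 * so no payload is ever copied between layers. */
void push_header(pkt_buf *p, const void *hdr, uint16_t n) {
    p->offset -= n;
    p->len += n;
    memcpy(p->data + p->offset, hdr, n);
}
```

With this scheme, the "zero, one, or two" copies mentioned above correspond to how many of the application and driver boundaries still copy into or out of the shared pool.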
Previous work in this research area includes micro-IP [12], lwIP [13], tinyTCP [14], NexGenIP [15] and NETX [16]. In particular, lwIP, open-source software developed by SICS, is one of the most well-known implementations of lightweight TCP/IP. It provides IP, ICMP, UDP, TCP, and DHCP and is designed to support various real-time operating systems. lwIP implements the pbuf structure, which dynamically allocates and de-allocates packet buffers to increase packet processing performance. Several studies have been done to port lwIP onto real-time operating systems or to optimize a lightweight TCP/IP for an embedded platform [17-19].
3. OVERALL SOFTWARE ARCHITECTURE
3.1 Execution Environments

The embedded system described in this paper is assumed to be a device dedicated to network communication. The program is compiled into one binary file, which is embedded in the system. This implies that the software components included in the binary file are executed in one control flow with a main function which calls them. The software embedded in the device consists of the main program, applications, a TCP/IP protocol stack, and a real-time operating system (Fig. 1). The operating system is simplified, including an event-driven scheduler, a network device driver, and a timer. Upon execution, the main program calls an initialization routine and the scheduler. The scheduler executes tasks which are in the ready state. Putting a task into a wait queue in the scheduler is implemented as registering a function pointer in it. The initialization routine registers the main functions of the applications and the TCP/IP protocol stack; when the scheduler executes them, the applications and the TCP/IP protocol stack process their jobs by registering their functions as tasks with the scheduler.
Fig. 1. Software architecture of the system with a representation of network layers and transmission/reception flows.

The event-driven scheduler uses a non-preemptive prioritized scheduling policy. It
contains two wait queues, and one of two priorities is assigned to each. A task is assigned
a priority at the time of creation and is put into the corresponding wait queue. Task crea-
tion is done in three ways. A task can be created by an application task, by a network
protocol task, or by the timer in the operating system (Fig. 2). The timer provides four
functions: it reports the current time, creates a task at a reserved time, creates tasks periodically, and deletes a reserved task.
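The two-queue, non-preemptive scheduler described above can be sketched as follows. The names (`wait_queue`, `task_create`, `schedule_once`) are assumptions, not the authors' code: a task is a plain function pointer, and "putting a task into a wait queue" is simply registering such a pointer.

```c
/* Minimal sketch of an event-driven scheduler with two priority queues. */
#include <stddef.h>

#define QUEUE_CAP 64

typedef void (*task_fn)(void);

typedef struct {
    task_fn tasks[QUEUE_CAP];  /* ring buffer of registered function pointers */
    int head, tail, count;
} wait_queue;

static wait_queue high_q, low_q;   /* one wait queue per priority */

/* Registering a function pointer == creating a task. Fails when full. */
int task_create(wait_queue *q, task_fn fn) {
    if (q->count == QUEUE_CAP) return -1;
    q->tasks[q->tail] = fn;
    q->tail = (q->tail + 1) % QUEUE_CAP;
    q->count++;
    return 0;
}

static task_fn dequeue(wait_queue *q) {
    task_fn fn = q->tasks[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return fn;
}

/* One scheduling step: high priority always first, and a running task is
 * never preempted (it runs to completion). Returns 0 when both are empty. */
int schedule_once(void) {
    if (high_q.count) { dequeue(&high_q)(); return 1; }
    if (low_q.count)  { dequeue(&low_q)();  return 1; }
    return 0;
}

/* demo tasks used for illustration */
static int run_log[4], run_n;
static void hi_task(void) { run_log[run_n++] = 1; }
static void lo_task(void) { run_log[run_n++] = 2; }
```

The fixed queue capacity matters later: in section 4.1, task creation failure under a full wait queue is exactly what degrades throughput at fine granularity.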
Fig. 2. Task creation and execution through the scheduler and timer.

3.2 TCP/IP Protocol Stack

We implemented TCP, UDP, IP, ICMP, ARP, and the Ethernet protocol in the TCP/
IP protocol stack. These protocols are the minimum requirements necessary to commu-
nicate with other entities in an IP network [20]. In addition, the TCP/IP protocol stack
does not provide the full functionality of the network protocols; only their essential
parts are implemented. We briefly describe the
functional specification of the lightweight TCP/IP protocol below.

 Packet transmission and reception
 Multiple applications (ports)
 TCP window management for flow control
 TCP connection establishment, management, and destruction
 Checksum calculation
 No TCP options except the MSS (Maximum Segment Size) configuration
 Reordering of out-of-order segments in packet reception is not supported
 Congestion control is not supported
 Calculations such as the RTO (Retransmission Time Out) are replaced with constant values
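The checksum item above refers to the standard Internet checksum used by TCP, UDP, and IP (RFC 1071): the one's complement of the one's-complement sum of all 16-bit words. A minimal sketch:

```c
/* Internet checksum (RFC 1071) over a byte buffer, treating the data as
 * big-endian 16-bit words and padding a trailing odd byte with zero. */
#include <stddef.h>
#include <stdint.h>

uint16_t inet_checksum(const uint8_t *data, size_t len) {
    uint32_t sum = 0;
    while (len > 1) {              /* sum 16-bit big-endian words */
        sum += (uint32_t)data[0] << 8 | data[1];
        data += 2;
        len -= 2;
    }
    if (len == 1)                  /* odd trailing byte, zero-padded */
        sum += (uint32_t)data[0] << 8;
    while (sum >> 16)              /* fold carries back into low 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;         /* one's complement of the sum */
}
```

A receiver can verify a packet by summing the data including the transmitted checksum; the result is 0 for an intact packet.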




4. PACKET TRANSMISSION AND RECEPTION
4.1 Task Granularity

Tasks that are executed on an event-driven scheduler are classified as application tasks, TCP/IP tasks, packet transmission tasks, or packet reception tasks. Among these, it is important to define the granularity of the packet transmission task, because this directly affects the implementation methods and system performance. For an event-driven scheduler, it is difficult to guarantee fairness in distributing the execution time in the manner that schedulers in general operating systems do. Hence, it is necessary to balance the execution time of application tasks, packet transmission tasks, and packet reception tasks. The granularity of the packet transmission tasks is the most important factor in balancing the tasks to improve the performance of the system.
Fig. 3. Three levels of task granularity.

Fig. 3 shows three levels of task granularity. At the first level of task granularity, a packet transmission task can transmit only one packet. In this case, every time packet transmission is required, a packet transmission task is created and put into the wait queue in the scheduler. This is simple to implement, and the synchronization problem (discussed later) is not serious. Because the first-level granularity minimizes the response time of the system, it is the most appropriate design for interactive communications. However, as the size of the data increases, the number of tasks waiting in the queue increases
rapidly. If the size of the wait queues is not sufficiently large, task creation fails repeatedly, and the failures greatly reduce the system performance. With the second-level granularity, a packet transmission task transmits multiple packets when executed. In this case, the number of tasks increases much more slowly than in a system with first-level granularity. A system with second-level granularity can cope with the problem of task creation failure. Moreover, by controlling the number of packets assigned to a single task for transmission, the system can transmit a large number of data packets without degrading the performance. A disadvantage of second-level granularity is synchronization. Given that the event-driven scheduler is non-preemptive, the execution time of a packet transmission task increases linearly as the number of packets assigned to it increases. Packet reception tasks cannot be executed promptly while a transmission task is running; thus, the response time of the system increases. Though the effect of this problem is not great in the absence of packet losses, the system performance is greatly decreased when a packet is lost.
With the first and the second types of granularity, tasks are created when an application task requests transmission or when data packets remain in the packet buffer. With third-level granularity, a task managing packet transmission runs continuously, like a daemon process, by creating a task identical to itself before it terminates. With the third type of granularity, a packet transmission task can be run as a process in a system with a general operating system. However, in the event-driven scheduler described above, a packet transmission task is executed so frequently that it breaks the balance with the other types of tasks.
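The second-level granularity can be sketched as follows; all names (`tcp_send_task`, `scheduler_register`) and the bookkeeping counters are illustrative assumptions, not the paper's code. Each execution transmits at most a fixed number of packets and re-registers itself only while data remains, so the wait queue grows with the number of task executions rather than with the number of packets.

```c
/* Sketch of a second-level-granularity packet transmission task. */
#include <stddef.h>

#define PKTS_PER_TASK 5   /* 5-20 packets per task worked best (section 5.2) */

static int pending_pkts;   /* segments waiting in the send buffer */
static int sent_pkts;      /* segments handed to the EMAC driver */
static int tasks_created;  /* how many transmission tasks were scheduled */

static void emac_send_one(void) { pending_pkts--; sent_pkts++; }

void tcp_send_task(void);

/* stand-in for registering a function pointer with the scheduler */
static void scheduler_register(void (*fn)(void)) { tasks_created++; (void)fn; }

/* One transmission task: send up to PKTS_PER_TASK packets, then
 * re-register itself only while data remains to be sent. */
void tcp_send_task(void) {
    for (int i = 0; i < PKTS_PER_TASK && pending_pkts > 0; i++)
        emac_send_one();
    if (pending_pkts > 0)
        scheduler_register(tcp_send_task);
}

/* Called by an application requesting transmission of n segments. */
void tcp_request_send(int n) {
    pending_pkts += n;
    scheduler_register(tcp_send_task);
}
```

With first-level granularity every packet would cost one queue slot; here 12 pending packets cost only ceil(12/5) = 3 slots over time.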

4.2 Fast Retransmission and Synchronization

In TCP, retransmission is specified to be triggered by the retransmission timeout (RTO). In reality, most retransmissions are triggered by the fast retransmission mechanism when three duplicate acknowledgements are received from a recipient [20]. To support fast retransmission, a TCP connection is required to maintain additional states. The state transition occurs when duplicate acknowledgements are received. However, as described in section 4.1, it is challenging to execute packet transmission tasks and packet reception tasks at exact times on an event-driven scheduler.
During the time between the arrival of a packet and the execution of the packet re-
ception task processing the packet, there can be multiple packet transmission tasks that
are created (Fig. 4).

Fig. 4. Fast retransmission and synchronization.

In other words, when a TCP connection decides to retransmit a data packet by fast retransmission, out-of-order packets are waiting to be sent in the wait
queue. If the transmission speed is low, only a few such packets are waiting, but progressively more out-of-order tasks remain in the wait queue as the transmission speed increases. These tasks significantly disturb the synchronization and consequently degrade the system performance.
To receive the acknowledgement of the retransmitted packet from the recipient, a mechanism is required to cancel or delay the execution of out-of-order tasks. There are two methods of task cancelation. The first method is to store the IDs of the packet transmission tasks in the connection state and destroy them when fast retransmission is triggered. This method reduces the response time of the system. However, the additional state increases the size of the data structure for the connection, and the scheduler must support an operation that destroys numerous tasks at once. The second method is to determine whether or not a task is out of order during the execution of the task itself. If the currently running task is out of order, it terminates immediately. This method does not require much structural modification, but the improvement in the response time is smaller than with the first method because the decision is made only after the task has been dispatched by the scheduler.
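The second cancelation method can be sketched as follows; the struct fields and function names are assumptions for illustration, not the authors' data structures. The transmission task checks, at the start of its own execution, whether the segment it would send is now out of order (beyond the retransmission point during recovery) and terminates immediately if so.

```c
/* Sketch of self-canceling out-of-order transmission tasks. */
#include <stdint.h>

typedef struct {
    uint32_t snd_una;     /* oldest unacknowledged sequence number */
    uint32_t retx_seq;    /* sequence number being fast-retransmitted */
    int      in_recovery; /* set once three duplicate ACKs are processed */
} tcp_conn;

/* Returns 1 if the task actually transmitted, 0 if it canceled itself. */
int tx_task_run(tcp_conn *c, uint32_t task_seq) {
    if (c->in_recovery && task_seq > c->retx_seq)
        return 0;   /* out-of-order task: terminate without sending */
    /* ... build the segment and hand it to the EMAC driver here ... */
    return 1;
}
```

The cost of this design, as noted above, is that the canceled task still consumed a scheduler dispatch before deciding to do nothing.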

4.3 Retransmission Timer

According to the TCP standard [20], a TCP connection maintains a retransmission timer for each packet and retransmits the packet when a timeout occurs. However, the vast majority of lost packets are detected and retransmitted by the fast retransmission mechanism, not by the retransmission timeout. Thus, it is not cost-effective, in terms of time and memory space, to manage a timer for each data packet for timeout events that rarely occur. We manage only one retransmission timer per TCP connection; the timer stores the transmission time of the first unacknowledged packet in the sliding window, i.e., the data packet with the lowest sequence number among all unacknowledged data packets. When a timeout occurs, the first packet in the sliding window is retransmitted and the subsequent data packets are processed in the same manner as in fast retransmission.
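The single per-connection timer can be sketched as follows. All names, the millisecond clock, and the constant RTO value are assumptions (the paper replaces RTO calculation with a constant, section 3.2, but does not state the value): only the transmission time of the lowest unacknowledged segment is tracked, and an expiry retransmits that segment.

```c
/* Sketch of a single per-connection retransmission timer. */
#include <stdint.h>

#define RTO_MS 200   /* assumed constant retransmission timeout */

typedef struct {
    uint32_t una_seq;     /* lowest unacknowledged sequence number */
    uint32_t una_sent_ms; /* when that segment was (re)transmitted */
    int      armed;
} retx_timer;

/* Arm the timer only for the first in-flight segment. */
void timer_on_send(retx_timer *t, uint32_t seq, uint32_t now_ms) {
    if (!t->armed) {
        t->una_seq = seq;
        t->una_sent_ms = now_ms;
        t->armed = 1;
    }
}

/* On a new ACK, slide the timer to the new oldest segment or disarm it. */
void timer_on_ack(retx_timer *t, uint32_t ack_seq, uint32_t next_unacked,
                  uint32_t now_ms, int more_in_flight) {
    if (!t->armed || ack_seq <= t->una_seq) return;
    if (more_in_flight) {
        t->una_seq = next_unacked;
        t->una_sent_ms = now_ms;
    } else {
        t->armed = 0;
    }
}

/* Returns the sequence number to retransmit, or 0 if no timeout occurred. */
uint32_t timer_check(const retx_timer *t, uint32_t now_ms) {
    if (t->armed && now_ms - t->una_sent_ms >= RTO_MS)
        return t->una_seq;
    return 0;
}
```

One timer per connection replaces N timers for N in-flight packets, at the cost of a coarser expiry for segments after the first, which the text accepts because timeouts are rare.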

4.4 Dynamic Adaptation

Because the event-driven scheduler does not guarantee fairness of the execution time among tasks, much of the system performance depends on how we balance the execution frequency of the three types of tasks: application tasks, packet transmission tasks, and packet reception tasks. Packet reception tasks are created periodically with a default period of 1 millisecond. Considering the purpose of a target device, this period and the task granularity described in section 4.1 can be tuned to maximize the performance of the system. If we increase the frequency of packet reception tasks, the packet reception throughput and the interactivity of the communication increase. Conversely, the packet transmission throughput can be maximized if we decrease the frequency. In addition, we dynamically adapt the execution frequency of packet transmission tasks and packet reception tasks at runtime by measuring the number of transmitted and received packets and observing the state transitions of the TCP connection.
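One way to sketch this adaptation, with all thresholds and bounds being assumptions (the paper does not give its adaptation rule): shorten the reception-task period when reception dominates the measured traffic, and lengthen it when transmission dominates.

```c
/* Sketch of runtime adaptation of the packet-reception task period,
 * based on per-interval packet counters. Thresholds are illustrative. */
#include <stdint.h>

#define RX_PERIOD_MIN_MS 1    /* the default 1 ms period from the text */
#define RX_PERIOD_MAX_MS 8    /* assumed upper bound */

uint32_t adapt_rx_period(uint32_t period_ms,
                         uint32_t tx_pkts, uint32_t rx_pkts) {
    if (rx_pkts > 2 * tx_pkts && period_ms > RX_PERIOD_MIN_MS)
        return period_ms / 2;   /* reception-heavy: favor interactivity */
    if (tx_pkts > 2 * rx_pkts && period_ms < RX_PERIOD_MAX_MS)
        return period_ms * 2;   /* transmission-heavy: favor throughput */
    return period_ms;           /* balanced traffic: keep the period */
}
```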

5. PERFORMANCE EVALUATION
5.1 Throughput

We implemented the proposed lightweight TCP/IP protocol stack on a board with TI's TMS320C6455 DSP. Table 1 shows the experimental environments. To ensure the correctness of the measurements, we used three different tools for network analysis: Wireshark, iPerf, and our own code inserted into the program.
Table 1. Experimental environments.
Host
  Operating system: Microsoft Windows XP
  Processor: Intel® Core™2 T7400 @ 2.16 GHz
  Memory: 1 GB
  Ethernet: Intel® PRO/1000 PM Network
Embedded device
  Operating system: None
  Processor: 1 GHz TMS320C6455
  Memory: DDR2 512 MB
  Ethernet: EMAC

First, we measured the maximum throughput at the EMAC driver level without con-
sidering the receiver’s reaction. This represented the upper bound in this experimental
environment (Fig. 5). We changed the packet size from 128 to 1500 bytes with 100,000
packets and changed the number of packets from 1,000 to 20,000. The maximum through-
put at the EMAC driver level was 426 Mbps. In an identical environment, we compared
the transmission throughput of TCP/IP with that of UDP/IP. Each packet transmission
task transmits five packets, and UDP/IP transmission uses a protocol stack identical to
that of TCP/IP transmission apart from the TCP layer. Transmitting 1,000,000 packets,
TCP/IP transmission reached 296 Mbps and UDP/IP transmission reached 326 Mbps.


Fig. 5. EMAC transmission and reception throughput according to the packet size (a), (c) and the number of packets (b), (d).


Fig. 6. Comparing TCP/IP transmission throughput (a) with UDP/IP transmission throughput (b) according to the number of packets.

The transmission speed of the proposed lightweight TCP/IP protocol stack reached 69% of the upper bound measured at the EMAC driver and 90% of the UDP/IP transmission throughput (Fig. 6). To compare the proposed lightweight TCP/IP protocol stack with the traditional TCP/IP protocol stack, we ported our implementation to a Linux machine using a raw socket. As the size of the data increased, the transmission throughput using the socket interface reached about 80 Mbps, while our implementation reached 95 Mbps. The transmission throughput of the proposed lightweight TCP/IP protocol stack was 1.2 to 1.5 times that of the TCP/IP protocol stack on Linux.

5.2 Task Granularity and Task Cancelation

We measured the impact of the task granularity described in section 4.1 on the
transmission throughput of the system (Fig. 7). With the first and the second levels of
granularity, the transmission throughput changed from 10 to 296 Mbps. When the num-
ber of data packets for a task was less than 5, the transmission throughput was less than
250 Mbps. In this case, the transmission throughput changed according to the size of the
wait queue in the scheduler as too many tasks were created. The size of the wait queue was varied from 256 to 4096.

Fig. 7. Transmission throughput according to the number of data packets for a task.

When the number of data packets for a task exceeded 10,
the transmission throughput reached its maximum. When it exceeded 50, the transmission throughput was stable, but buffer overflows occurred frequently and the throughput decreased sharply in the event of a packet loss or when the transmission was delayed. Through repeated experiments, we found that transmitting 5 to 20 packets per task was best. However, the optimal point can differ according to the execution environment because the available size of the packet buffer and the wait queues are affected by the memory space of the device.
With the third level of granularity, in which a task re-creates itself before termination, the system did not show the expected level of performance. Because the packet transmission tasks were executed too frequently, the packet reception tasks and application tasks could not easily be executed. Finally, we measured the difference in delay between execution with task cancelation and execution without it. As it was not easy to measure this at runtime, we captured the packet sequence and traced the retransmission process after a packet loss occurred. With the task cancelation mechanism, recovery was 5 to 10 times faster than without it.
6. CONCLUSION
With the software-based TCP/IP Offload Engine (TOE), or lightweight TCP/IP, researchers have improved the design of the traditional TCP/IP protocol stack to optimize system performance for embedded systems. In this paper, we proposed the design of a lightweight TCP/IP protocol stack that runs on an event-driven scheduler of the type used in real-time operating systems and analyzed its characteristics. We introduced three levels of task granularity for an event-driven scheduler and discussed the synchronization problem associated with packet transmission and reception which can occur in the TCP retransmission process. We implemented the proposed lightweight TCP/IP protocol stack on an embedded system and evaluated it to confirm that the proposed design can process network communication efficiently on an embedded system. Lightweight TCP/IPs tend to be designed for platform independence so as to be operable on various platforms. However, designs and implementations that are integrated with the software architecture, considering its platform and operating system, are required to optimize network communication performance on an embedded system.

REFERENCES
1. M. C. Chan and R. Ramjee, “Improving TCP/IP performance over third generation
wireless networks,” in Proceedings of IEEE INFOCOM, Vol. 7, 2008, pp. 430-441.
2. Z. Z. Wu and H. C. Chen, “Design and implementation of TCP/IP offload engine
system over Gigabit Ethernet,” in Proceedings of the 15th International Conference
on Computer Communications and Networks, 2006, pp. 245-250.
3. E. Yeh, H. Chao, V. Mannem, J. Gervais, and B. Booth, “Introduction to TCP/IP of-
fload Engine (TOE),” http://www.10gea.org/tcp-ip-offload-engine-toe/, 2002.
4. P. Balaji, H. V. Shah, and D. K. Panda, “Sockets vs. RDMA interface over 10-Gigabit
networks: An in-depth analysis of the memory traffic bottleneck,” in Proceedings of
the Workshop on Remote Direct Memory Access: Applications, Implementations, and
Technologies, 2004.
5. H. W. Jin, P. Balaji, C. Yoo, J. Y. Choi, and D. K. Panda, “Exploiting NIC architec-
tural support for enhancing IP-based protocols on high-performance networks,”
Journal of Parallel and Distributed Computing, Vol. 65, 2005, pp. 1348-1365.
6. X. Q. Zheng and Y. Yuan, “Design and implementation of embedded network based
on lwIP,” Journal of Measurement Science and Instrumentation, Vol. 1, 2010, pp.
30-33.
7. D. Schweikert, “A lightweight and high-performance TCP/IP stack for Topsy,”
Computer Engineering and Network Laboratory, ETH Zürich, 1998.
8. P. Steenkiste, “Design, implementation, and evaluation of a single-copy protocol
stack,” Software: Practice and Experience, Vol. 28, 1998, pp. 749-772.
9. I. S. Yoon, S. H. Chung, and Y. G. Kwon, “Implementation of a software-based
TCP/IP offload engine using standalone TCP/IP without an embedded OS,” Journal
of Information Science and Engineering, Vol. 27, 2011, pp. 1871-1883.
10. M. L. Chiang and Y. C. Li, “LyraNET: A zero-copy TCP/IP protocol stack for em-
bedded systems,” Journal of Real-Time Systems, Vol. 34, 2006, pp. 5-18.
11. L. Tianhua, Z. Hongfeng, L. Jie, and Z. Chuansheng, “The design and implementa-
tion of zero-copy for Linux,” International Journal of Computer Information Systems
and Industrial Management Applications, Vol. 3, 2011, pp. 9-18.
12. “micro-IP,” http://www.pjort.com/micro-ip/.
13. A. Dunkels, “Design and implementation of the lwIP TCP/IP stack,” Swedish Insti-
tute of Computer Science, 2001.
14. “tinyTCP,” http://www.unusualresearch.com/tinytcp/tinytcp.htm.
15. NexGenIP, http://www.nexgen-software.com.
16. NETX, http://jnlp.sourceforge.net/netx/.
17. P. Agrawal, T. S. Teck, and A. L. Ananda, “A lightweight protocol for wireless sen-
sor networks,” in Proceedings of IEEE Wireless Communications and Networking,
2003, pp. 1280-1285.
18. K. Salah, K. El-Badawi, and F. Khalid, “Performance analysis and comparison of
interrupt-handling schemes in gigabit network,” Computer Communications, Vol. 30,
2007, pp. 3425-3441.
19. H. J. Kim, W. J. Song, and S. H. Kim, “Light-weighted internet protocol version 6
for low-power wireless personal area networks,” in Proceedings of IEEE Interna-
tional Symposium on Consumer Electronics, 2008, pp. 1-4.

20. J. Postel, “Transmission control protocol,” RFC 793, http://www.faqs.org/rfcs/rfc793.
html, 1981.


Joonhyouk Jang received his B.S. degree in Computer Science from Seoul National University, Seoul, Korea, in 2009. He is currently a Ph.D. student in the School of Computer Science and Engineering, Seoul National University, Seoul, Korea. His current research interests include operating systems, embedded systems, and computer security.






Jinman Jung received his B.S. degree in Computer Science from Seoul National University, Seoul, Korea, in 2008. He is currently a Ph.D. student in the School of Computer Science and Engineering, Seoul National University, Seoul, Korea. His current research interests include operating systems, embedded systems, mobile communications, and fault-tolerant computing systems.





Yookun Cho received the B.E. degree from Seoul National
University, Seoul, Korea, in 1971 and the Ph.D. degree in com-
puter science from the University of Minnesota, Minneapolis, in
1978. Since 1979, he has been with the School of Computer Sci-
ence and Engineering, Seoul National University, where he is cur-
rently a Professor. In 1985, he was a Visiting Assistant Professor
with the University of Minnesota, and from 2001 to 2002, he was
the President of the Korea Information Science Society. He also
served as the honorary conference chair for ACM SAC 2007. His
research interests include operating systems, algorithms, system
security, and fault-tolerant computing systems.







Sanghoon Choi received his B.S. degree in Computer Science from Soongsil University, Seoul, Korea, in 2011. He is currently a Ph.D. student in the School of Computing, Soongsil University, Seoul, Korea. His current research interests include operating systems, embedded systems, and system software.







Sung Y. Shin received the M.S. and Ph.D. degrees in Computer Science from the University of Wyoming, Laramie, WY, in 1984 and 1991, respectively. He has been a professor and Graduate Coordinator of Computer Science at South Dakota State University since 1991. He has authored or coauthored over 130 peer-reviewed technical papers in software engineering, software fault tolerance, telemedicine, medical image processing, and GIS. He worked as a visiting scientist for the Space and Life Science Division at NASA Johnson Space Center in Houston, TX, from 1999 to 2002. He was a Conference Chair of ACM SAC 2007, 2009, and 2010. He served as a Vice Chair of ACM SIGAPP from 2005 to 2009, and he is currently serving as the chair of ACM SIGAPP.