IPv6 Tunneling Over an IPv4 Network
James M. Moscola, David Lim, Alan Tetley
Department of Computer Science
Campus Box 1045
One Brookings Drive
Saint Louis, MO 63130
Due to the growth of the internet, the address space provided by IPv4, with only 4,294,967,296
addresses, has proven to be inadequate. Because of IPv4’s shortcomings, a new protocol, IPv6, has
been created to take its place. This new protocol, with its 128-bit address scheme (enough for a
vast number of addresses per square meter of the earth!), should provide enough addresses for
everyone’s computer, refrigerator, and toaster to have a connection to the internet. To help
facilitate the movement from an IPv4 internet to an IPv6 internet, we have created a module for the
Field Programmable Port Extender (FPX) in accordance with RFC1933. This module allows IPv6 packets
coming from an IPv6 network to be packed into IPv4 packets, tunneled through an IPv4 network, and
then unpacked at the other end of the tunnel before reentering an IPv6 network. This approach to
incorporating the new IPv6 specification allows a progressive changeover of networks from IPv4 to
the newer IPv6. The current implementation runs at 80 MHz.
1 Introduction
Due to the growth of the internet, the current address space provided by IPv4, with only
4,294,967,296 addresses, has proven to be inadequate. A new protocol, IPv6, has been developed and
promises to facilitate the continued growth of the internet community. IPv6 is capable of offering
2^128 addresses, which amounts to approximately 340 trillion trillion trillion addresses (no, that
is not a typo; it is truly 340 trillion trillion trillion).
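As a quick check of the figures quoted above, the sizes of the two address spaces follow directly
from the address widths. The short Python sketch below is only illustrative arithmetic; the
variable names are made up for this example and are not part of the module.

```python
# Address-space sizes for 32-bit (IPv4) and 128-bit (IPv6) addresses.
ipv4_addresses = 2 ** 32
ipv6_addresses = 2 ** 128

print(f"IPv4: {ipv4_addresses:,} addresses")
print(f"IPv6: {ipv6_addresses:.3e} addresses")

# "Trillion trillion trillion" is (10**12)**3 = 10**36.
trillion = 10 ** 12
ipv6_in_trillion_cubed = ipv6_addresses // trillion ** 3
print(f"IPv6: about {ipv6_in_trillion_cubed} trillion trillion trillion")
```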
There are several ways to make the transition from the current IPv4 internet implementation to the
newer IPv6 internet implementation. The first option, and also the least likely to happen, is to
choose a day and have everyone with network hardware and software change their implementation. This
approach is highly unlikely to happen, and some might say even impossible. Another approach is to
have new hosts and routers support both IPv4 and IPv6. This is a much more reasonable approach but
still has some problems. Consider the situation where an IPv6 host is sending data to another IPv6
host. The host cannot predetermine the route the data takes along the way and therefore cannot
guarantee that all networks the data passes through will support IPv6. A third approach is to allow
both IPv4 and IPv6 networks to reside on the internet and tunnel IPv6 packets through IPv4
networks. In other words, when an IPv6 packet is leaving an IPv6 domain and entering an IPv4
domain, the packet is encapsulated in an IPv4 packet and transmitted through the network. When the
packet reaches the other end of the IPv4 network, the IPv4 headers are removed from the IPv6 packet
and the IPv6 packet can continue on to an IPv6 domain.
A module for the Field Programmable Port Extender (FPX) has been created that implements the third
method described above. The module contains support for both ends of the tunneling protocol and can
both pack and unpack IPv6 packets into and from IPv4 packets. Figure 1 shows the layout of the
tunneling modules between IPv6 and IPv4 networks.
Figure 1: Layout of Tunneling modules between IPv6 and IPv4 networks
2 Field Programmable Port Extender (FPX)
The FPX is a reprogrammable logic device that provides a hardware platform for the user to deploy
packet processing network modules. It acts as an interface between the line cards and the WUGS
(Washington University Gigabit Switch), and can be inserted between these devices as shown in
Figure 2. The FPX is composed of two FPGAs: the Network Interface Device (NID) and the
Reprogrammable Application Device (RAD).
2.1 Network Interface Device (NID)
The NID controls how packet flows are routed to and from modules. It also provides mechanisms to
dynamically load router hardware modules over the network. The combination of these features allows
these modules to be dynamically loaded and unloaded without affecting the switching of other
traffic flows or the processing of packets by other modules in the system. As shown in Figure 3,
the NID has several components, all of which are implemented in FPGA hardware. It contains a
four-port switch to transfer data between ports; Virtual Circuit lookup tables (VC) on each port in
order to selectively route flows; a Control Cell Processor (CCP), which is used to process control
cells that are transmitted and received over the network; logic to reprogram the FPGA hardware on
the RAD; and synchronous and asynchronous interfaces to the four network ports that surround the
NID.
Figure 2: Configuration for the WUGS, FPX, and the Line Cards
Figure 3: Major Components of the FPX
2.2 FPX Reprogrammability
The RAD can be programmed and reprogrammed to hold user-defined network modules, and is connected
to two SRAM and two SDRAM components (Figure 3). In order to reprogram the RAD over the network,
the NID implements a reliable protocol that fills the contents of the on-board RAM with
configuration data that are transmitted over the network. As each cell arrives, the NID uses the
data and the sequence number in the cell to write data into the RAD Program SRAM. Once the last
cell has been correctly received, the FPX holds an image of the reconfiguration bytestream that is
needed to reprogram the RAD. At that time, another control cell can be sent to the NID to initiate
the reprogramming of the RAD using the contents of the RAD Program SRAM.
The FPX supports partial reprogramming of the RAD by allowing configuration streams to contain
commands that only program a portion of the logic on the RAD. Rather than issue a command to
reinitialize the device, the NID writes the frames of reconfiguration data to the RAD’s
reprogramming port. This feature enables the other modules on the RAD to continue processing
packets during the partial reconfiguration. Similar techniques have been implemented in other
systems using software-based controllers.
3 Protocol Wrappers
Protocol Wrappers are used in this module to streamline and simplify the networking functions
needed to process ATM cells and AAL5 frames directly in hardware. They use a layered design and
consist of different processing circuits within each layer. The block diagram of the Protocol
Wrappers is shown in Figure 4. At the lowest level, the Cell Processor processes raw ATM cells
between network interfaces. At the higher levels, the Frame Processor processes variable-length
AAL5 frames. Different layers of abstraction are important for structuring a network because they
allow applications to be implemented at specific levels where important details are exposed and
irrelevant details are hidden. In this manner, an application that interacts with AAL5 frames can
effectively use the Protocol Wrappers.
Figure 4: Block Diagram of IP Tunneling module in the Protocol Wrappers.
4 Implementation Details
Several processing components have been combined with the Frame Wrappers to implement an IPv6 over
IPv4 tunneler for the FPX. An overview of the design is shown in Figure 5. The Frame Wrappers
process incoming ATM cells to provide the interior components with full AAL5 ATM frames. These
internal components then check the frame to see if they should process the packet or pass it on.
The Control Cell Processor (CCP) sits on the back end of the Frame Wrappers and receives AAL0
control cells, which are used to configure tunnels. It passes the information from these cells to
the Address Lookup, where it is stored. When the IPv6 component gets an IPv6 packet, it checks the
Address Lookup to see if the packet should be packed. This is determined by checking whether the
destination IP is part of a known subnet. If a match is found, the packet must be packed since its
next hop is part of an IPv4 network. Otherwise the packet is routed like a normal IPv6 packet. The
other end of the tunnel is handled by the IPv4 component. This component processes incoming IPv4
packets and checks the destination IP and the IP protocol field to determine whether the packet
should be unpacked, and unpacks it if so. If not, the packet is routed like a normal IPv4 packet.
Even though both ends of the tunnel have been implemented, the two ends of a tunnel need not both
use this implementation. Since RFC1933 has been followed, either end of the tunnel can be any other
router or host that also follows this specification.
Figure 5: Flow Diagram for IP Tunneling Module
4.1 IPv6 Processor
The general design of the IPv6 component is shown in Figure 6. As frames enter the IPv6 component,
they are first buffered into a FIFO (1). This is necessary since the Address Lookup component can
take an indeterminate and variable amount of time to respond to lookup requests. An FSM (2) has
been implemented to take care of this task. The machine looks at SOF, EOF, and DataEn to buffer all
valid words of data. In the case where part of a frame is dropped, the FSM sees consecutive SOFs
without an EOF, and clears the entire FIFO. This will end up dropping any previous packets already
in the buffer. However, this design decision greatly reduces logic complexity, and the case should
not occur often anyway. The FSM also keeps track of the frame length using the input control
signals and places it in a FIFO (3). This is the only way to know the length of non-IPv6 packets.
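The buffering behaviour described above can be modelled in software as a small state machine. The
sketch below is a behavioural toy model in Python, not the hardware design: the class and method
names are hypothetical, one call to clock() stands for one clock cycle, and it assumes that the
length FIFO is flushed together with the data FIFO, which the text does not state explicitly.

```python
from collections import deque

class InputBufferFSM:
    """Toy model of the input-side FSM; all names are illustrative."""

    def __init__(self):
        self.data_fifo = deque()     # buffered data words
        self.length_fifo = deque()   # one length (in words) per complete frame
        self.in_frame = False
        self.word_count = 0

    def clock(self, sof, eof, data_en, word):
        if sof:
            if self.in_frame:
                # Consecutive SOFs without an EOF: part of a frame was lost.
                # Clear the whole buffer, including earlier complete frames.
                self.data_fifo.clear()
                self.length_fifo.clear()   # assumption: lengths are flushed too
            self.in_frame = True
            self.word_count = 0
        if self.in_frame and data_en:
            self.data_fifo.append(word)
            self.word_count += 1
        if self.in_frame and eof:
            self.length_fifo.append(self.word_count)
            self.in_frame = False

fsm = InputBufferFSM()
# A complete three-word frame.
fsm.clock(1, 0, 1, 0xA0)
fsm.clock(0, 0, 1, 0xA1)
fsm.clock(0, 1, 1, 0xA2)
# A frame that loses its tail, followed by a fresh frame.
fsm.clock(1, 0, 1, 0xB0)
fsm.clock(1, 0, 1, 0xC0)   # second SOF with no EOF in between: flush
fsm.clock(0, 1, 1, 0xC1)
```

After this sequence only the last frame remains buffered, which illustrates the trade-off noted in
the text: the first, complete frame is lost along with the damaged one.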
A second FSM (4) controls the output of data. It is responsible for generating the output SOF, EOF,
and DataEn signals as well as the appropriate data, depending on the type of packet being passed
out. When the machine senses that there is data in the input buffer, it moves the ATM header into
another FIFO (5) and checks whether the packet is an IPv6 packet that should be processed. If not,
the data is output immediately from the input buffer. Otherwise, the IPv6 header is moved into the
header buffer. During this process, the component decrements the Hop Limit by 1 and records the
payload length. If the Hop Limit is reduced to 0, the packet is dropped. In addition, as the
destination IP is being moved, the FSM passes it out to the Address Lookup component. As soon as
the Address Lookup responds, the FSM starts outputting data, starting with the ATM header. If a
match was found, the component must pack this IPv6 packet into an IPv4 packet.
Packing is done by inserting a valid IPv4 header (6) into the output before the IPv6 header and
payload. The makeup of this header is shown in Figure 6. The length is calculated by adding the
payload length of the IPv6 packet to a constant 60 bytes for the IPv4 and IPv6 headers (the IPv4
length includes the header, whereas the IPv6 length does not). The source IP is the address of the
FPX module, and the destination IP is the IPv4 address returned by the Address Lookup. The checksum
is calculated over the whole IPv4 header, including these values. However, it must be output in the
third word. To achieve this, the checksum of the first three words and the source IP is calculated
while the IPv6 header is being moved between the FIFOs. This is possible since all of these values
are known except the length, which can be determined in the third word of the IPv6 header. The
checksum is then completed by adding in the destination address as soon as it is output from the
Address Lookup, in parallel with outputting the first word of the IPv4 header.
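The two-stage checksum calculation described above can be illustrated in software. The following
Python sketch builds a 20-byte IPv4 header for an encapsulated IPv6 packet and splits the
one's-complement sum into an early part (everything except the destination) and a late part (the
destination returned by the lookup). The function names are hypothetical, and the TTL of 64, the
zero identification field, and the zero type-of-service and fragment fields are assumptions made
for the example; the text does not say which values the module uses.

```python
import struct

IPV4_HEADER_LEN = 20   # bytes, no options (header length field = 5)
IPV6_HEADER_LEN = 40   # bytes, fixed IPv6 header
PROTO_IPV6 = 0x29      # protocol 41: the payload is an IPv6 packet

def ones_complement_sum(words, acc=0):
    """Fold 16-bit words into a running one's-complement sum."""
    for w in words:
        acc += w
        acc = (acc & 0xFFFF) + (acc >> 16)
    return acc

def build_tunnel_header(ipv6_payload_len, src_ip, dst_ip, ttl=64, ident=0):
    # IPv4 total length = IPv6 payload + 60 bytes of IPv4 and IPv6 headers.
    total_len = ipv6_payload_len + IPV4_HEADER_LEN + IPV6_HEADER_LEN
    # Stage 1: the first three 32-bit words and the source address are known
    # before the lookup answers (checksum field counted as zero).
    early_words = [
        0x4500, total_len,            # version 4, header length 5, TOS 0
        ident, 0x0000,                # identification, flags/fragment offset
        (ttl << 8) | PROTO_IPV6, 0x0000,
        src_ip >> 16, src_ip & 0xFFFF,
    ]
    partial = ones_complement_sum(early_words)
    # Stage 2: add the destination once the Address Lookup has supplied it.
    final = ones_complement_sum([dst_ip >> 16, dst_ip & 0xFFFF], partial)
    checksum = ~final & 0xFFFF
    return struct.pack("!BBHHHBBHII", 0x45, 0, total_len, ident, 0,
                       ttl, PROTO_IPV6, checksum, src_ip, dst_ip)

header = build_tunnel_header(40, 0xADD0ADD0, 0x7F000001)
```

A receiver can validate the result by summing all ten 16-bit words of the header, checksum
included; a correct header sums to 0xFFFF.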
Once the IPv4 header is output, the process is the same for both packed packets and IPv6 packets
that are just being passed through. First, the IPv6 header is output from its FIFO. Then, the
payload is output from the input buffer. The payload length determines the amount of data read,
rather than simply emptying the FIFO, because there could be another packet in the buffer waiting
to be processed.
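The header handling performed while the IPv6 header moves between the FIFOs (decrementing the Hop
Limit, dropping expired packets, and recording the payload length) can be sketched as follows.
This is a software illustration with a hypothetical function name; treating an incoming Hop Limit
of 0 the same as a Hop Limit of 1 is an assumption, since the text only describes the case where
the decrement reaches 0.

```python
import struct

IPV6_HEADER_LEN = 40   # bytes in the fixed IPv6 header

def prepare_ipv6_header(header):
    """Return (updated_header, payload_len), or None if the packet is dropped."""
    if len(header) != IPV6_HEADER_LEN or header[0] >> 4 != 6:
        raise ValueError("expected a 40-byte IPv6 header")
    # Payload length is bytes 4-5; Hop Limit is byte 7.
    (payload_len,) = struct.unpack("!H", header[4:6])
    hop_limit = header[7]
    if hop_limit <= 1:
        return None    # decrementing would reach 0 (or underflow): drop
    updated = bytearray(header)
    updated[7] = hop_limit - 1
    return bytes(updated), payload_len

# Example: version 6, payload length 40, next header 59, Hop Limit 2.
example = bytes([0x60, 0, 0, 0, 0, 40, 59, 2]) + bytes(32)
forwarded = prepare_ipv6_header(example)
```

The recorded payload length is what later bounds how many bytes are read from the input buffer,
so that a following packet in the same FIFO is left untouched.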
Figure 6: IPv6 Component Design (blocks: Input Buffer, IPv6 Header Buffer; control signals: SOF,
EOF, DataEn)
4.2 IPv4 Processor
The IPv4 Processor resides on the back side of the IPv6 Processor in the current implementation
(Figure 5). Frames can enter the IPv4 Processor as IPv4 frames, IPv6 frames, or any other type of
data that may be passing through the switch. The first thing the IPv4 Processor does when receiving
data is check the version and the IP header length to decide whether the frame is IPv4. If the
frame is not IPv4, all data passes through the module without modification. Otherwise, a series of
actions is taken; these can be followed in Figure 7. First, the time-to-live (TTL) field is checked
for validity. If the TTL field is equal to zero, the packet’s lifetime has expired and the packet
is dropped. If the TTL field is not zero, it is decremented and a new IPv4 header checksum is
calculated for the checksum field. The header checksum of the incoming IPv4 packet is also
validated; if it is invalid, the packet is dropped. Following this, the IPv4 Processor checks both
the protocol field and the destination address field of the packet. If the protocol field is not
equal to 0x29 (the encapsulated protocol is IPv6) or the destination address of the IPv4 packet is
not equal to the address of the switch on which the module resides, the rest of the IPv4 packet is
sent to the Frame Processor without modification. However, if both the protocol field and the
destination address match, the packet needs to be unpacked. To unpack the IPv6 packet from the IPv4
packet, the IPv4 header is simply removed and the IPv6 hop limit is decremented. The IPv6 packet is
then sent to the Frame Processor with no further modification.
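The decision sequence of the IPv4 Processor can be summarised in a short software sketch. The
Python function below classifies one frame as pass, drop, forward, or unpack; the function names
and return strings are hypothetical, the header length of exactly five words required for
unpacking is taken from the transition labels of Figure 7, and the order of the checksum and TTL
tests is chosen for convenience since both lead to a drop.

```python
import struct

PROTO_IPV6 = 0x29   # IPv4 protocol number 41: encapsulated IPv6

def header_checksum_ok(header):
    """A valid IPv4 header sums (one's complement) to 0xFFFF."""
    total = 0
    for (word,) in struct.iter_unpack("!H", header):
        total += word
        total = (total & 0xFFFF) + (total >> 16)
    return total == 0xFFFF

def classify_ipv4(frame, local_addr):
    """Return 'pass', 'drop', 'forward', or 'unpack' for one frame (bytes)."""
    if len(frame) < 20:
        return "pass"
    version, ihl = frame[0] >> 4, frame[0] & 0x0F
    if version != 4 or ihl < 5:
        return "pass"                 # not IPv4: leave the data untouched
    if len(frame) < ihl * 4:
        return "drop"                 # truncated header
    if not header_checksum_ok(frame[:ihl * 4]):
        return "drop"
    ttl, protocol = frame[8], frame[9]
    if ttl == 0:
        return "drop"                 # lifetime expired
    (dest,) = struct.unpack("!I", frame[16:20])
    if ihl == 5 and protocol == PROTO_IPV6 and dest == local_addr:
        return "unpack"               # strip the IPv4 header, emit the IPv6 packet
    return "forward"                  # ordinary IPv4: decrement TTL, new checksum
```

A frame whose first nibble is 6, for instance, is returned as 'pass' without any further
inspection, which matches the pass-through behaviour described for non-IPv4 data.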
Figure 7: IPv4 State Machine (transition conditions: sof_in = '1'; version = 4 and iphl >= 5;
version /= 4 or iphl < 5; ttl = 0 (drop); checksum /= x"FFFF"; iphl = 5 with protocol = 0x29 and
destination address = local address (unpack <= '1'); eof_in = '1')
4.3 Control Cell Processor
The Control Cell Processor (CCP) is responsible for two things: (1) adding up to four IPv4 tunnels
that are later used by the IPv6 processor for packing incoming IPv6 packets, and (2) updating the
FPX IP address. The IPv4 processor compares this address with the destination address of each
incoming IPv4 packet to see whether our module is the end of the tunnel for that packet.
The CCP is the first module that gets an incoming cell. It is therefore responsible for checking
whether the cell is a control cell (VCI = 35). If the incoming cell is not a control cell, the CCP
passes the cell through so that the IPv6 processor and the IPv4 processor will get it. If the
incoming cell is a control cell, the CCP checks the opcode of the cell. When it sees opcode 0x10,
the CCP adds an IPv4 tunnel; when it sees opcode 0x12, the CCP updates the FPX IP address. If the
CCP sees an opcode other than 0x10 or 0x12, it simply passes the cell through. The finite state
machine for the CCP is shown in Figure 8.
When the CCP sees opcode 0x10, it saves the incoming subnet, mask, and destination address for an
IPv4 tunnel. For now, everything is stored in registers. The CCP enables 32-bit registers at the
right time, latching first the subnet, then the mask, then the destination address of the IPv4
tunnel. Note that both the subnet and the mask are 128 bits long; each is therefore latched on four
consecutive clocks, starting with the highest 32 bits and continuing with the next 32 bits, and so
on.
When the CCP sees opcode 0x12, it saves the incoming IP address in a register by enabling the FPX
IP address register.
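The opcode handling described above can be modelled with a small dispatcher. The Python sketch
below is a behavioural illustration only: the class name, the representation of a cell as a VCI,
an opcode, and a list of 32-bit payload words, and the word order inside the payload are all
assumptions, as is the choice to silently ignore a fifth tunnel, which the text does not cover.

```python
CONTROL_VCI = 35
OP_ADD_TUNNEL = 0x10
OP_SET_LOCAL_IP = 0x12
MAX_TUNNELS = 4

def join128(words):
    """Combine four 32-bit words, most significant first, into one integer."""
    value = 0
    for w in words:
        value = (value << 32) | (w & 0xFFFFFFFF)
    return value

class ControlCellProcessor:
    def __init__(self):
        self.tunnels = []     # list of (subnet, mask, ipv4_destination)
        self.local_ip = 0     # FPX IP address used by the IPv4 processor

    def handle(self, vci, opcode, words):
        """Return True if the cell was consumed, False if it is passed through."""
        if vci != CONTROL_VCI:
            return False
        if opcode == OP_ADD_TUNNEL and len(words) >= 9:
            if len(self.tunnels) < MAX_TUNNELS:   # assumption: extras are ignored
                subnet = join128(words[0:4])      # latched over four clocks
                mask = join128(words[4:8])        # latched over four clocks
                self.tunnels.append((subnet, mask, words[8]))
            return True
        if opcode == OP_SET_LOCAL_IP and words:
            self.local_ip = words[0]
            return True
        return False          # unknown opcode: pass the cell through

ccp = ControlCellProcessor()
ccp.handle(35, 0x12, [0x7F000001])
ccp.handle(35, 0x10, [0x20010DB8, 0, 0, 0, 0xFFFFFFFF, 0, 0, 0, 0xC0000201])
```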
4.4 Lookup Engine
The address lookup module is responsible for returning the IPv4 address for the end of an IPv4
tunnel when given an IPv6 address by the IPv6 processor. It does so by going through each available
tunnel. For each tunnel, the address lookup module first masks the IPv6 address sent to it with the
IPv6 address mask for that tunnel, then compares the masked address with the subnet for that
tunnel. If there is a match, it returns the corresponding IPv4 address.
Figure 8: Finite state machine for Control Cell Processor
The address lookup module sits idle until it gets an address request from the IPv6 processor. It
then proceeds to latch in the 128-bit IPv6 address. Note that this takes four clock cycles because
the address comes in 32 bits at a time. After it has latched the IPv6 address, the address lookup
module masks the IPv6 address and compares it to the available subnets. This could have been done
in one clock cycle. However, in an attempt to meet a 100 MHz clock rate, the masking is done in the
first clock cycle, followed by a two-cycle compare. If there is a match, the address lookup simply
returns the corresponding IPv4 address to the IPv6 processor. If there is no match, the address
lookup moves on to the next tunnel. The finite state machine of the address lookup is shown in
Figure 9.
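The mask-and-compare step is easy to express in software, even though the hardware spreads it over
several clock cycles. The Python sketch below walks a list of tunnels in order and returns the
IPv4 endpoint of the first match; the function name and the example addresses (taken from the
documentation prefixes 2001:db8::/32 and 192.0.2.0/24) are illustrative assumptions.

```python
import ipaddress

def lookup_tunnel(ipv6_addr, tunnels):
    """tunnels: iterable of (subnet, mask, ipv4_destination), all integers.

    Returns the IPv4 endpoint of the first matching tunnel, or None.
    """
    for subnet, mask, ipv4_dest in tunnels:
        if (ipv6_addr & mask) == subnet:
            return ipv4_dest
    return None

subnet = int(ipaddress.IPv6Address("2001:db8::"))
mask = int(ipaddress.IPv6Network("2001:db8::/32").netmask)
endpoint = int(ipaddress.IPv4Address("192.0.2.1"))
tunnels = [(subnet, mask, endpoint)]

hit = lookup_tunnel(int(ipaddress.IPv6Address("2001:db8::42")), tunnels)
miss = lookup_tunnel(int(ipaddress.IPv6Address("2001:db9::42")), tunnels)
```

A miss corresponds to the case in which the IPv6 processor forwards the packet as ordinary IPv6
traffic without packing it.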
The following sections go through both the simulation and the synthesis results.
5.1 Simulation Results
A simulation testbench has been set up to test the functionality of our tunneling module. For
creating ATM cells, we use the IPTestBench. Details on using the IPTestBench can be found in
Section 5 of the paper entitled Layered Protocol Wrappers for Internet Packet Processing in
Reconfigurable Hardware. ModelSim was used to send these cells through our module. Below you can
see several output waveforms that show the module running in simulation.
The waveform in Figure 10 shows several events. First, a control cell arrives to set the local IP
address for the switch. In this example the IP address for the switch is set to 0xADD0ADD0, or
173.208.173.208. The next two incoming cells contain an IPv6 packet. Because there are currently no
entries in the lookup tables, the IPv6 lookup fails and the packet comes out without being packed.
Following this, another control cell is sent in to add an entry to the lookup table. Then the same
IPv6 packet is sent through the module again. This time the lookup succeeds and the IPv6 packet is
encapsulated in an IPv4 packet. With the addition of the five IPv4 header words, the outgoing
packet now consists of three ATM cells.
Figure 9: Finite state machine for Address Lookup
Figure 10: An IPv6 packet passing through the module before and after the destination address has
been added as a route to the lookup table
The next waveform, Figure 11, shows the output of the IPv6 module after it has encapsulated an IPv6
packet into an IPv4 packet. Notice that the IPv6 header is still intact as part of the payload of
the IPv4 packet. The destination address for this new IPv4 packet has been determined using the
lookup tables and inserted into the packet. In this example, the lookup table has returned a value
of 0x7F000001, or 127.0.0.1. In a real deployment the destination address would be a valid internet
address rather than the localhost address; this value was chosen only for simulation.
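The hexadecimal register values seen in the waveforms map to dotted-decimal notation one byte at a
time. A minimal Python check using the standard ipaddress module (the loop and variable names are
only illustrative):

```python
import ipaddress

for value in (0xADD0ADD0, 0x7F000001):
    address = ipaddress.IPv4Address(value)
    print(f"0x{value:08X} -> {str(address)} (loopback: {address.is_loopback})")
```

Each byte of the 32-bit value becomes one decimal field, so 0xAD, 0xD0, 0xAD, 0xD0 reads as
173.208.173.208, and 0x7F000001 is the loopback address 127.0.0.1.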
Figure 11: A closeup of an IPv6 packet encapsulated in an IPv4 packet
The final waveform, Figure 12, shows the following sequence of events. The first cell that arrives
at the module is a control cell to set the local IP address for the switch. In this example the IP
address for the switch is set to 0x7F000001, or 127.0.0.1. Once again, the localhost address would
not be used in a real environment. Following the control cell, three ATM cells containing an IPv4
packet arrive at the module. Because the destination address of the IPv4 packet is equal to the IP
address of the switch (currently set to 127.0.0.1) and the protocol field is equal to 0x29, the
module decides it needs to unpack the data from the IPv4 packet. Notice that the incoming IPv4
packet consisted of three ATM cells and the outgoing IPv6 packet consists of only two ATM cells.
This is because the five IPv4 header words are stripped away, shrinking the frame to only two ATM
cells.
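The change in cell count can be reproduced with simple AAL5 arithmetic. The Python sketch below
assumes the standard AAL5 framing of 48 payload bytes per cell with an 8-byte trailer in the last
cell, no additional encapsulation header, and a hypothetical IPv6 packet with a 40-byte payload;
the actual packet sizes used in the simulation are not given in the text.

```python
CELL_PAYLOAD = 48    # payload bytes carried by one ATM cell
AAL5_TRAILER = 8     # AAL5 trailer bytes, placed at the end of the last cell

def aal5_cells(packet_len):
    """Number of ATM cells needed for a packet of packet_len bytes."""
    return -(-(packet_len + AAL5_TRAILER) // CELL_PAYLOAD)   # ceiling division

ipv6_packet = 40 + 40            # hypothetical: IPv6 header plus 40-byte payload
tunneled = ipv6_packet + 20      # plus the five-word (20-byte) IPv4 header

print(aal5_cells(ipv6_packet), "cells without the IPv4 header")
print(aal5_cells(tunneled), "cells with the IPv4 header")
```

Under these assumptions any IPv6 packet between 69 and 88 bytes fits in two cells on its own and
needs three once the 20-byte IPv4 header is added.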
Figure 12: An IPv4 packet goes into the module; the encapsulated IPv6 packet is unpacked from the
IPv4 packet and sent onto the network
5.2 Synthesis Results
The current hardware implementation is capable of running at 80 MHz. This amounts to approximately
2.5 Gb/s of data that can pass through our module (OC-48 speeds). The placement of the circuit on a
Xilinx Virtex XCVE-1000E yields the following chip statistics:
Maximum Frequency: 80 MHz
Number of Slice Flip Flops: 5,049 out of 24,576 (20%)
Total Number of LUTs: 4,430 out of 24,576 (18%)
Number of Block RAMs: 15 out of 96 (15%)
Total Equivalent Gate Count: 321,724
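The throughput figure follows from the clock rate and the width of the data path. The Python
sketch below assumes that one 32-bit word is processed per clock cycle, which is consistent with
the 32-bit transfers described for the address lookup but is not stated explicitly for the whole
data path; the function name is illustrative.

```python
BUS_WIDTH_BITS = 32          # assumption: one 32-bit word per clock cycle
OC48_GBPS = 2.48832          # OC-48 line rate in Gb/s

def throughput_gbps(clock_mhz, width_bits=BUS_WIDTH_BITS):
    """Peak throughput in Gb/s for a given clock rate in MHz."""
    return clock_mhz * 1e6 * width_bits / 1e9

for clock_mhz in (80, 100):
    rate = throughput_gbps(clock_mhz)
    print(f"{clock_mhz} MHz -> {rate:.2f} Gb/s (OC-48 is {OC48_GBPS:.2f} Gb/s)")
```

At 80 MHz this gives roughly 2.56 Gb/s, just above the OC-48 line rate, and at 100 MHz it gives
3.2 Gb/s.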
6 Future Enhancements
The current tunneling module was designed with a placeholder for the IPv6 address lookup and
routing tables. To make it a truly useful module, we have designed it such that a new address
lookup/routing table can be dropped in place of the current placeholder. With a real address lookup
and real routing tables, this module could be extremely useful to anyone with an FPX looking to
support tunneling.
Another enhancement to the module would be support for ICMP packets. Currently, when IP packets are
dropped in the IPv4 and IPv6 Processors, no ICMP messages are returned to the sender. ICMP is not a
required part of either the IPv4 or the IPv6 protocol; however, including this functionality would
make for a more robust switch.
Finally, if we had more time to work on the design, we could definitely achieve the 100 MHz goal of
the project. This would allow us to process data at a whopping 3.2 Gb/s, well above OC-48 speeds.
References
“Internet Protocol, Version 6 (IPv6) Specification.” Online: http://www.faqs.org/rfcs/-
“Transition Mechanisms for IPv6 Hosts and Routers.” Online: http://www.faqs.org/rfcs/-
J. W. Lockwood, J. S. Turner, and D. E. Taylor, “Field programmable port extender (FPX) for distributed routing and queuing,” in ACM International Symposium on Field Programmable Gate Arrays
J. W. Lockwood, N. Naufel, J. S. Turner, and D. E. Taylor, “Reprogrammable Network Packet Processing on the Field Programmable Port Extender (FPX),” in ACM International Symposium on Field Programmable Gate Arrays (FPGA’2001), (Monterey, CA, USA), pp. 87–93, Feb. 2001.
T. Chaney, J. A. Fingerhut, M. Flucke, and J. S. Turner, “Design of a gigabit ATM switch,” Tech. Rep. WU-CS-96-07, Washington University in Saint Louis, 1996.
D. E. Taylor, J. W. Lockwood, and N. Naufel, “RAD Module Infrastructure of the Field-programmable Port eXtender (FPX),” Tech. Rep. WUCS-01-16, Washington University, Department of Computer
W. Westfeldt, “Internet reconfigurable logic for creating web-enabled devices.” Xilinx Xcell, Q1 1999.
S. Kelem, “Virtex configuration architecture advanced user’s guide.” Xilinx XAPP151, Sept. 1999.
F. Braun, J. W. Lockwood, and M. Waldvogel, “Layered protocol wrappers for internet packet processing in reconfigurable hardware,” Tech. Rep. WU-CS-01-10, Washington University in Saint Louis, Department of Computer Science, June 2001.
F. Braun, J. Lockwood, and M. Waldvogel, “Reconfigurable router modules using network protocol wrappers,” in to appear: Proceedings of Field-Programmable Logic and Applications, (Belfast, Northern