Accelerating Coherent Pulsar De-dispersion on Graphics Processing Units

by Arjun Radhakrishnan

supervised by Prof. Michael Inggs

Outline


Graphics Processing Units (GPUs)


Pulsars


Pulsar De-dispersion


Motivation


Implementation


Results


Conclusion & Future Work

Graphics Processing Units


GPUs are massively parallel processors that are present on consumer graphics cards

Generally used to render 3D objects on screen and calculate the colour of each pixel to display

Are mass-market products due to the video game industry

Performance tracks Moore's Law since the majority of on-chip space is devoted to compute units, as opposed to cache on CPUs

*Source: [7]

Why Use GPUs?

Figure 1: Peak floating point performance of NVIDIA GPUs vs Intel CPUs [2]

Pulsars


Highly magnetised, rapidly rotating neutron stars formed after a supernova

Pulsars emit beams of electromagnetic radiation from their magnetic poles

As the star rotates, the beams sweep in a circular path, known as the “lighthouse effect”

A periodic pulse is observed each time a beam sweeps across Earth

Figure 2: Pulsar Model [3]

Pulsar Dispersion


Pulsar emissions are distorted upon passing through the ionised Interstellar Medium (ISM)

Lower frequency components of the pulse are delayed more than higher frequencies
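
The size of this delay can be estimated with the standard cold-plasma dispersion relation. The sketch below is illustrative only: the function name is hypothetical, the constant is the commonly quoted value of the dispersion constant, and the example reuses the DM and band of the test data described later in this talk.

```python
# Dispersion constant in MHz^2 pc^-1 cm^3 s (commonly quoted value).
DISPERSION_CONSTANT = 4.148808e3


def dispersion_delay_s(dm, f_low_mhz, f_high_mhz):
    """Delay in seconds of f_low relative to f_high for a DM in pc/cm^3."""
    return DISPERSION_CONSTANT * dm * (f_low_mhz ** -2 - f_high_mhz ** -2)


# Example: DM = 50 pc/cm^3 across a 100 MHz band centred on 1450 MHz.
delay = dispersion_delay_s(50.0, 1400.0, 1500.0)
print(f"{delay * 1e3:.2f} ms")
```

Under these assumptions the bottom of the band arrives roughly 13.6 ms after the top, which is the smearing that de-dispersion has to undo.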

Pulsar De-dispersion

Correct for the dispersion by shifting each frequency component of the received signal in time to cancel its delay

Figure 3: Pulsar De-dispersion [4]

Coherent De-dispersion

Coherent de-dispersion is the most accurate method of removing the dispersion effects of the Interstellar Medium (ISM)

Preserves the amplitude and phase information of the received signal

Convolve the voltage signal with the inverse transfer function of the ISM

This transfer function depends on the Dispersion Measure (DM) of the signal, which is obtained from models of the galactic electron density

In practice we use the Fast Fourier Transform (FFT) to turn the convolution into a multiplication in the frequency domain, and then apply an inverse FFT
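
The FFT, multiply, inverse-FFT recipe above can be sketched in plain Python. This is a toy illustration rather than the implementation from this work: the helper names are hypothetical, the chirp uses the form of the ISM transfer function commonly quoted in pulsar texts (the sign of the phase depends on the FFT and sideband conventions), and the 64-point FFT is far too short for real data, where the transform must span the dispersion sweep.

```python
import cmath
import math

D = 4.148808e3  # dispersion constant, MHz^2 pc^-1 cm^3 s


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(spectrum):
    """Inverse FFT computed with the conjugate trick."""
    n = len(spectrum)
    y = fft([v.conjugate() for v in spectrum])
    return [v.conjugate() / n for v in y]


def ism_transfer_function(n, dm, f0_mhz, bw_mhz):
    """ISM chirp H(f0 + f) sampled at the n FFT bin offsets f (in MHz)."""
    h = []
    for k in range(n):
        # FFT bin order: non-negative offsets first, then negative ones.
        f = (k if k < n // 2 else k - n) * bw_mhz / n
        # The factor 1e6 converts MHz * s into a dimensionless cycle count.
        cycles = 1e6 * D * dm * f * f / (f0_mhz ** 2 * (f0_mhz + f))
        h.append(cmath.exp(2j * math.pi * cycles))
    return h


def apply_in_frequency_domain(voltage, response):
    """Convolve by multiplying spectra: FFT, multiply, inverse FFT."""
    spectrum = fft(voltage)
    return ifft([v * r for v, r in zip(spectrum, response)])


n = 64
h = ism_transfer_function(n, dm=50.0, f0_mhz=1450.0, bw_mhz=100.0)
h_inv = [v.conjugate() for v in h]  # |H| = 1, so the inverse is the conjugate

pulse = [0j] * n
pulse[n // 2] = 1 + 0j  # a single sharp pulse in the voltage stream
dispersed = apply_in_frequency_domain(pulse, h)
recovered = apply_in_frequency_domain(dispersed, h_inv)
max_error = max(abs(a - b) for a, b in zip(recovered, pulse))
```

Because the chirp only rotates phases, multiplying by its conjugate restores the original pulse to within floating-point rounding, which is why the method preserves both amplitude and phase.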

Motivation


Why study pulsars?

A major SKA science driver: detection of gravitational waves, tests of strong-field relativity and analysis of black holes

GPU acceleration for MeerKAT

Large frequency range (Low: 0.5-2.5 GHz, High: 8-14.5 GHz)

High bandwidth per polarisation (4 GHz final)

Large number of channels (16384)

>10 GB of data per second

Even more important for SKA, since precision will be a high priority and data storage is not feasible

Implementation Considerations


Both CPU and GPU were tested with single-precision floating point

A bottleneck for GPU computing is the time taken to send data to the GPU from main memory; minimise these transfers as much as possible

Use asynchronous data transfers to hide the latency

Re-calculate data on the GPU rather than copying it across

Use shared memory on the GPU for calculations and store to global memory at the end

The source data file is simulated dual-polarisation data generated with a DM of 50 pc/cm³ and a 100 MHz bandwidth centred on 1450 MHz
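
To see why asynchronous transfers help, consider a simple two-stage pipeline model in which the copy of the next chunk overlaps the kernel working on the current one. The sketch below is a back-of-the-envelope model, not a measurement: the function names and timings are hypothetical, and the copy back to the host is ignored.

```python
def serial_time(n_chunks, t_copy, t_kernel):
    """Each chunk is copied and then processed, with no overlap."""
    return n_chunks * (t_copy + t_kernel)


def overlapped_time(n_chunks, t_copy, t_kernel):
    """The copy of chunk i+1 overlaps the kernel for chunk i."""
    if n_chunks == 0:
        return 0.0
    # First copy, then a steady state limited by the slower stage,
    # then the final kernel.
    return t_copy + (n_chunks - 1) * max(t_copy, t_kernel) + t_kernel


# Hypothetical timings in milliseconds, chosen only for illustration.
print(serial_time(10, 2.0, 3.0))      # 50.0
print(overlapped_time(10, 2.0, 3.0))  # 32.0
```

In this model the transfer cost disappears almost entirely once the kernel time per chunk exceeds the copy time per chunk.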


Basic Program Flow

Figure 4: Program flow

HOST: Read in data → Allocate memory on GPU → Copy to GPU memory → Initiate GPU kernel → Receive de-dispersed signal → Free memory

DEVICE: Begin de-dispersion → Parallel FFT → multiply $V(f_0) \cdot H^{-1}(f_0)$, $V(f_1) \cdot H^{-1}(f_1)$, ..., $V(f_n) \cdot H^{-1}(f_n)$ → Inverse FFT → combine (+) into output array → Send data back to host

Results

Figure 5: Left: overall speedup (5x). Right: kernel speedup (12x).

Results


Coherently de-dispersed 50 MHz of bandwidth on a single GPU

Used 2 GPUs for the full 100 MHz

Scaling across multiple GPUs was linear

Using larger transfer functions was found to increase performance, since memory access overhead was lower
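
One way to divide the work between GPUs is to give each an equal, contiguous slice of the band. The slides do not say how the 100 MHz was actually partitioned, so the sketch below is an assumption, and the function name is hypothetical.

```python
def split_band(centre_mhz, bandwidth_mhz, n_gpus):
    """Return (centre, bandwidth) in MHz for n_gpus equal, contiguous sub-bands."""
    sub_bw = bandwidth_mhz / n_gpus
    low_edge = centre_mhz - bandwidth_mhz / 2
    return [(low_edge + (i + 0.5) * sub_bw, sub_bw) for i in range(n_gpus)]


# The 100 MHz band centred on 1450 MHz, shared between two GPUs.
print(split_band(1450.0, 100.0, 2))  # [(1425.0, 50.0), (1475.0, 50.0)]
```

Each sub-band can then be de-dispersed independently with its own centre frequency, which is consistent with the linear scaling reported above.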

Conclusion


GPUs are significantly faster than CPUs for de-dispersion

Enabled real-time coherent de-dispersion for the dataset used

Coherent de-dispersion of a 100 MHz bandwidth signal requires multiple GPUs at present

Faster memory access would greatly improve overall speedup

Currently testing with real undetected pulsar data

Thank You!

Questions?

References

1. D. R. Lorimer and M. Kramer, Handbook of Pulsar Astronomy, Cambridge University Press, 2005

2. NVIDIA CUDA Programming Guide

3. D. Manchester, “CSIRO ATNF Pulsar Education Page”

4. Jim Cordes, “The SKA as a Radio Synoptic Survey Telescope: Widefield Surveys for Transients, Pulsars and ETI”, SKA Memo 97

5. John Rowe Animation/Australia Telescope National Facility, CSIRO [Online]. http://www.atnf.csiro.au/research/pulsar/array/gallery.html

6. Cornell University Dept. of Astronomy, “Legacy Pulsars: Homepage” [Online]. http://arecibo.tc.cornell.edu/legacypulsardata/Default.aspx

7. VR-Zone, “The NVIDIA GeForce GTX 280 1GB bare,” [Online]. http://vr-zone.com/articles/nvidia-geforce-gtx-280-preview/5872.html?doc=5872