SSD TERM PAPER TITLES


Dec 3, 2013 (3 years and 10 months ago)




TUTOR

TITLE

ABSTRACT

1

Aniello Leonardo

aniello@dis.uniroma1.it


Fault Tolerance in Storm for Stateful Bolts

Storm [1] is a distributed event processing framework employed by major companies such as Twitter
and Groupon. In Storm, a computation is modeled as a topology, that is, a directed graph of
processing components (spouts and bolts) that communicate with each other by sending and receiving
events. These components are usually deployed over a cluster of machines in order to increase
parallelism and improve performance. A relevant issue concerns fault tolerance for stateful
components: how can their state be preserved in case of failures? Possible solutions include
storing the state in stable storage [2, 3] and replicating the stateful components.

Provide the detailed design of a realizable strategy to achieve fault tolerance for stateful components in
Storm.
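
As a starting point for the design, the "state in stable storage" option can be illustrated with a
minimal Python sketch. It is not Storm code and does not use the Storm API: the class name
CheckpointedCounter, the JSON file used as stable storage and the checkpoint_every parameter are all
assumptions made for illustration. The component counts events per key, persists its state every few
events with an atomic rename, and reloads the last checkpoint when it is re-created after a failure.

```python
import json
import os
import tempfile


class CheckpointedCounter:
    """Hypothetical stateful component: counts events per key, checkpoints to a file."""

    def __init__(self, path, checkpoint_every=2):
        self.path = path                      # location of the checkpoint (assumed stable)
        self.checkpoint_every = checkpoint_every
        self.counts = {}
        self.since_checkpoint = 0
        self.recover()

    def recover(self):
        # Reload the last persisted state, if any (runs after a restart).
        if os.path.exists(self.path):
            with open(self.path) as f:
                self.counts = json.load(f)

    def checkpoint(self):
        # Write to a temporary file, then rename it atomically, so that a crash
        # never leaves a half-written checkpoint behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(self.counts, f)
        os.replace(tmp, self.path)
        self.since_checkpoint = 0

    def process(self, key):
        # Update the in-memory state and checkpoint periodically.
        self.counts[key] = self.counts.get(key, 0) + 1
        self.since_checkpoint += 1
        if self.since_checkpoint >= self.checkpoint_every:
            self.checkpoint()
```

In this sketch, events processed after the last checkpoint are lost on a crash unless the source
replays them, so a realistic design would also have to say how checkpoints and event replay are
coordinated.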


References

[1] Storm, https://github.com/nathanmarz/storm/wiki

[2] Storm State, https://github.com/stormprocessor/storm-state

[3] Trident State, https://github.com/nathanmarz/storm/wiki/Trident-state

2

Aniello Leonardo

aniello@dis.uniroma1.it

Cloudera Impala

In May 2013, version 1.0 of Cloudera Impala was released [1]. Impala [2] is an open source,
distributed SQL query engine for Apache Hadoop that bypasses MapReduce to access the data directly,
so as to drastically improve performance. It was inspired by Google Dremel [3].

Provide a detailed discussion of the key features and technologies employed by Impala, together
with an analysis of its pros and cons with respect to other Hadoop query engines, such as Hive.


References

[1] Cloudera Impala 1.0: It’s Here, It’s Real, It’s Already the Standard for SQL on Hadoop,
http://blog.cloudera.com/blog/2013/05/cloudera-impala-1-0-its-here-its-real-its-already-the-standard-for-sql-on-hadoop/

[2] Cloudera Impala 1.0 Documentation,
http://www.cloudera.com/content/support/en/documentation/cloudera-impala/cloudera-impala-documentation-v1-latest.html

[3] Sergey Melnik, Andrey Gubarev, Jing Jing Long, Geoffrey Romer, Shiva Shivakumar, Matt Tolton,
Theo Vassilakis, "Dremel: Interactive Analysis of Web-Scale Datasets", 2010

3

Bonomi Silvia

bonomi@dis.uniroma1.it


Opportunism in Social Networking

In recent years we have witnessed a massive proliferation of mobile devices with ever-increasing
computational and communication capabilities, such as smartphones, netbooks, tablets and
heterogeneous sensors. This new technological context has opened new research directions, and new
communication paradigms have emerged. Among them, opportunistic networks and opportunistic
computing are some of the most promising and interesting.

In particular, opportunistic behaviors in social networks are interesting because they require
adapting classical dissemination models to account for selfish behaviors.

The student is required to study and report on dissemination models in social networks (both
opportunistic and not). The report must also analyze the challenges and/or benefits arising from
the presence of opportunistic agents in social networks.


- Anna-Kaisa Pietiläinen and Christophe Diot. 2012. Dissemination in opportunistic social networks:
  the role of temporal communities. In Proceedings of the thirteenth ACM international symposium on
  Mobile Ad Hoc Networking and Computing (MobiHoc '12).
  http://dl.acm.org/citation.cfm?id=2248396

- Yahui Wu, Su Deng, Hongbin Huang: Information Propagation through Opportunistic Communication in
  Mobile Social Networks. MONET 17(6): 773-781 (2012)
  http://link.springer.com/article/10.1007/s11036-012-0401-3

- Mary R. Schurgot, Cristina Comaniciu, Katia Jaffrès-Runser: Beyond traditional DTN routing: social
  networks for opportunistic communication. IEEE Communications Magazine 50(7): 155-162 (2012)
  http://arxiv.org/pdf/1110.2480.pdf

- Ceren Budak, Divyakant Agrawal, and Amr El Abbadi. 2011. Limiting the spread of misinformation in
  social networks. In Proceedings of the 20th international conference on World wide web (WWW '11).
  http://dl.acm.org/citation.cfm?id=1963499

4

Bonomi Silvia

bonomi@dis.uniroma1.it


Dynamic Networks: Models and Distributed Agreement Abstractions

Modern distributed systems are characterized by the continuous evolution of the entities belonging
to the system itself. Traditional models designed for "static" systems must be extended to deal
with network topology changes (both in terms of members and communication links).

In addition, protocols implementing common distributed abstractions (e.g. consensus) must be
adapted to cope with the dynamicity of the network.

This assignment can be split into two related sub-tasks:

- one student is required to study and report on recent models for dynamic networks;

- the second student is required to study and report on new computation abstractions (e.g.
  consensus protocols, leader election protocols, register protocols, etc.) designed for dynamic
  distributed systems.



- Casteigts, A., Flocchini, P., Quattrociocchi, W., & Santoro, N. (2012). Time-varying graphs and
  dynamic networks. International Journal of Parallel, Emergent and Distributed Systems, 27(5),
  387-408.
  http://www.tandfonline.com/doi/pdf/10.1080/17445760.2012.668546

- Martin Biely, Peter Robinson, and Ulrich Schmid. 2012. Agreement in directed dynamic networks. In
  Proceedings of the 19th international conference on Structural Information and Communication
  Complexity (SIROCCO'12)
  http://arxiv.org/abs/1204.0641

- Fabian Kuhn, Rotem Oshman, and Yoram Moses. 2011. Coordinated consensus in dynamic networks. In
  Proceedings of the 30th annual ACM SIGACT-SIGOPS symposium on Principles of distributed computing
  (PODC '11)
  http://dl.acm.org/citation.cfm?id=1993808&CFID=332219577&CFTOKEN=96580848

- F. Harary and G. Gupta. 1997. Dynamic graph models. Math. Comput. Model. 25, 7 (April 1997)

- Fabian Kuhn, Nancy Lynch, and Rotem Oshman. 2010. Distributed computation in dynamic networks. In
  Proceedings of the 42nd ACM symposium on Theory of computing (STOC '10)
  http://dl.acm.org/citation.cfm?id=1806760&CFID=332219577&CFTOKEN=96580848

- R. Baldoni, S. Bonomi, M. Raynal. Implementing a Regular Register in an Eventually Synchronous
  Distributed System Prone to Continuous Churn. IEEE Transactions on Parallel and Distributed
  Systems, volume 23, num. 1, pages 102-109, 2012
  http://midlab.dis.uniroma1.it/articoli/BBR_TPDS12.pdf

5

Querzoni Leonardo

querzoni@dis.uniroma1.it


How a large-scale system evolves


Please refer to the set of slides available on the main page of this course for detailed
information on exam rules. Note that the following two lists contain only suggestions. Students are
strongly encouraged to propose the topics they are most interested in. Struck-through topics have
already been assigned and are no longer available. For each topic, a title and a link to one or
more suggested readings are provided. The list of suggested readings is not exhaustive, as
reviewing the state of the art is part of your work.

Many of these papers are freely available. Those that require an active subscription can be
downloaded from computers connected through the proxy installed at La Sapienza. Check the BIXY
service (in Italian), or contact me for further details.

Suggested topics:

Byzantine failures, altruistic processes and rational behaviors.
A. S. Aiyer, L. Alvisi, A. Clement, M. Dahlin, J. P. Martin, and C. Porth. BAR Fault Tolerance for
Cooperative Services. SOSP 2005.

The hurdles of security in cloud computing platforms.
Craig Gentry. Fully homomorphic encryption using ideal lattices. STOC, 2009.

Dependable and secure storage
A. Bessani, M. Correia, B. Quaresma, F. Andre and Paulo Sousa. DepSky: Dependable and Secure
Storage in a Cloud-of-Clouds. EuroSys 2011.

Consistency and performance of large scale systems
W. Lloyd, M. J. Freedman, M. Kaminsky, and D. G. Andersen. Don’t settle for eventual: scalable
causal consistency for wide-area storage with COPS. SOSP 2011.
C. Li, D. Porto, A. Clement, J. Gehrke, N. Preguica and R. Rodrigues. Making Geo-Replicated Systems
Fast as Possible, Consistent when Necessary. OSDI 2012.

Read/Write atomic storage in dynamic environments
M. K. Aguilera, I. Keidar, D. Malkhi and A. Shraer. Dynamic atomic storage without consensus.
Journal of the ACM (JACM), Volume 58 Issue 2, 2011.

From disk- to flash-based storage
M. Balakrishnan, D. Malkhi, V. Prabhakaran and T. Wobber. Going beyond Paxos. Microsoft Research
Technical Report, 2011.
M. Balakrishnan, D. Malkhi, V. Prabhakaran, and T. Wobber. CORFU: A Shared Log Design for Flash
Clusters. NSDI 2012.

Array-based data storage
P. Brown. Overview of SciDB: large scale array storage, processing and analysis. SIGMOD 2010.
V.-T. Tran, B. Nicolae, G. Antoniu and L. Bouge. Pyramid: A large-scale array-oriented active
storage system. LADIS 2011.

Speculative execution in replicated services
V. G. Bortnikov, G. Chockler, D. Perelman, A. Roytman, S. Shachor and I. Shnayderman. FRAPPÉ: Fast
Replication Platform for Elastic Services. LADIS 2011.

Key/Value storage systems
H. Lim, B. Fan, D. G. Andersen and M. Kaminsky. SILT: A Memory-Efficient, High-Performance
Key-Value Store. SOSP 2011.
D. Beaver, S. Kumar, H. C. Li, J. Sobel and P. Vajgel. Finding a needle in Haystack: Facebook’s
photo storage. OSDI 2010.
L. Glendenning, I. Beschastnikh, A. Krishnamurthy and T. Anderson. Scalable Consistency in Scatter.
SOSP 2011.

Consistent data storage
B. Calder et al. Windows Azure Storage: A Highly Available Cloud Storage Service with Strong
Consistency. SOSP 2011.
J. C. Corbett et al. Spanner: Google’s Globally-Distributed Database. OSDI 2012.

Locality and performance
E. B. Nightingale, J. Elson, J. Fan, O. Hofmann, J. Howell and Y. Suzue. Flat Datacenter Storage.
OSDI 2012.
P. Costa, A. Donnelly, A. Rowstron, and G. O'Shea. Camdoop: Exploiting In-network Aggregation for
Big Data Applications. NSDI 2012.

Check the following conference proceedings for more topics:

- OSDI 2012
- NSDI 2012
- USENIX ATC 2012
- SOSP 2011
- NSDI 2011
- USENIX ATC 2011
- OSDI 2010
- NSDI 2010
- USENIX ATC 2010

6

Querzoni Leonardo

querzoni@dis.uniroma1.it


How a large-scale system evolves 2


Advanced Topics In Security Of Complex Systems - Suggested Topics

Please refer to the set of slides available on the main page of this course for detailed
information on exam rules. Note that the following two lists contain only suggestions. Students are
strongly encouraged to propose the topics they are most interested in. Struck-through topics are
not available. For each topic, a title and a link to a publication where more information can be
found are provided.

Suggested topics:

The impact of network topologies on intrusion possibilities and detection probability.
The effect of network topology on the spread of epidemics

State of the art for digital certificates.
http://www.dis.uniroma1.it/~querzoni/teaching/1213/AdvancedTopicsInSecurityOfComplexSystems/SuggestedTopics

Beyond RSA: elliptic curve cryptography and other methods in the state-of-the-art for public-key
cryptography.
SEC 1: Elliptic Curve Cryptography

Confidential search: how to search encrypted data
Practical Techniques for Searches on Encrypted Data

The hurdles of security in cloud computing platforms.
Craig Gentry. Fully homomorphic encryption using ideal lattices. STOC, 2009.

Secure hash functions: from SHA-1 to more secure message digest algorithms.
NIST CRYPTOGRAPHIC HASH ALGORITHM COMPETITION

Cryptographic modules: current standards and their implementation in real-world products.
FIPS PUB 140-2

Secure mail protocols and legal aspects.
Technical rules for the Italian PEC (ONLY IN ITALIAN)

The hurdles of security in cloud computing platforms.
Fully homomorphic encryption using ideal lattices

Security for Virtual Currency
Bitcoin: A Peer-to-Peer Electronic Cash System

GPU computing vs security: how to make a strong password weak
Update: New 25 GPU Monster Devours Passwords In Seconds

Platforms for federated identity management.
Build a running demo to test identity federation among several providers: Google, Facebook, Windows
Live ID, Shibboleth, Microsoft ADFS. Provide insights from your implementation activity.


Petroni Fabio

petroni@dis.uniroma1.it

Distributed Collaborative Filtering Techniques

Collaborative Filtering (CF) is one of the most successful approaches to building recommender
systems. It uses the known preferences of a group of users to make recommendations or predictions
of the unknown preferences of other users. Several surveys in the literature present an overview of
this field [1, 2]. However, most existing CF-based recommender systems work in a centralized way.
Only a few works [3, 4] have tackled the problem of implementing CF in a distributed fashion. The
scope of the project is to investigate the state of the art of distributed CF solutions in order to
list the weaknesses and strengths of each approach.


References

[1] Gediminas Adomavicius and Alexander Tuzhilin. Toward the next generation of recommender
systems: A survey of the state-of-the-art and possible extensions. Knowledge and Data Engineering,
IEEE Transactions on, 17(6):734-749, 2005.

[2] Xiaoyuan Su and Taghi M Khoshgoftaar. A survey of collaborative filtering techniques. Advances
in Artificial Intelligence, 2009:4, 2009.

[3] Peng Han, Bo Xie, Fan Yang, and Ruimin Shen. A scalable p2p recommender system based on
distributed collaborative filtering. Expert systems with applications, 27(2):203-210, 2004.

[4] Fady Draidi, Esther Pacitti, and Bettina Kemme. P2prec: a p2p recommendation system for
large-scale data sharing. In Transactions on large-scale data- and knowledge-centered systems III,
pages 87-116. Springer, 2011.


7

Petroni Fabio

petroni@dis.uniroma1.it


Collaborative Filtering with MapReduce

Collaborative Filtering (CF) is the process of identifying similar users and recommending what
similar users like. Examples of CF applications include recommending books, CDs and other products
at Amazon.com [1], movies by MovieLens [2], and news at Google News [3]. The scope of the project
is to implement a CF algorithm using the MapReduce programming model and evaluate its performance
on a real dataset. The student is invited to use:

- framework: Apache Hadoop [4];
- algorithm: Probabilistic Latent Semantic Indexing (PLSI) [5];
- dataset: MovieLens dataset [6].

Any different proposal (e.g. a different algorithm) is welcome and has to be discussed with the
tutor.
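
To make the idea of "identifying similar users and recommending what similar users like" concrete,
here is a minimal single-machine Python sketch. It is neither PLSI nor a MapReduce job: it uses
plain user-based CF with cosine similarity, chosen only because it is short. The function names
(cosine, recommend) and the toy ratings are assumptions made for illustration.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse rating dicts (item -> rating)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(ratings, user, top_n=3):
    """Rank items the user has not rated by similarity-weighted ratings of other users."""
    scores = {}
    for other, items in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], items)
        if sim <= 0:
            continue  # ignore users with nothing in common
        for item, rating in items.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


# Toy data (hypothetical): user -> {item: rating}
ratings = {
    "alice": {"Matrix": 5, "Alien": 4},
    "bob": {"Matrix": 5, "Alien": 5, "Blade Runner": 4},
    "carol": {"Titanic": 5, "Notting Hill": 4},
}
print(recommend(ratings, "alice"))
```

In a MapReduce setting the pairwise similarity computation is usually the expensive step, which is
one reason to compare it against a model-based algorithm such as PLSI in the experiments.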


References

[1] Greg Linden, Brent Smith, and Jeremy York. Amazon.com recommendations: Item-to-item
collaborative filtering. Internet Computing, IEEE, 7(1):76-80, 2003.

[2] Bradley N Miller, Istvan Albert, Shyong K Lam, Joseph A Konstan, and John Riedl. Movielens
unplugged: experiences with an occasionally connected recommender system. In Proceedings of the 8th
international conference on Intelligent user interfaces, pages 263-266. ACM, 2003.

[3] Abhinandan S Das, Mayur Datar, Ashutosh Garg, and Shyam Rajaram. Google news personalization:
scalable online collaborative filtering. In Proceedings of the 16th international conference on
World Wide Web, pages 271-280. ACM, 2007.

[4] http://hadoop.apache.org/

[5] Thomas Hofmann. Latent semantic models for collaborative filtering. ACM Transactions on
Information Systems (TOIS), 22(1):89-115, 2004.

[6] http://www.grouplens.org/node/73

8

Virgillito Antonino

virgilli@istat.it


Advertisement Platform

A social network must provide its users with a report of the revenue obtained from publishing
advertisements on their pages.


Three input data sets must be considered:

1. the log of user activity for a campaign in a month. Each row in the log corresponds to one user
click and includes the following attributes:

- Advertisement identifier
- Type of product advertised (rough textual description such as electronics, outfit, finance, etc.)
- Advertiser Id (the identifier of the owner of the product)
- Publisher Id (the identifier of the user on whose page the ad was published)
- Viewer Id (can be anonymous if the click comes from outside the social network)

2. the global number of page views per publisher in the month.

3. the list of user-to-user bidirectional connections.


Compute for each publisher the total revenue obtained in the month. The revenue is computed as the
CPC (cost-per-click) multiplied by the number of unique clicks per user. The CPC is computed as the
number of clicks divided by the number of page views, multiplied by 0.5 if the viewer is anonymous
and by 0.3 if the viewer is a neighbor of the publisher in the network.

- Implement a program that randomly generates reasonably “big” data sets

- Implement the MapReduce job(s) for computing the report. This might involve building intermediate
  data structures, such as adjacency lists.

- Perform a set of experiments regarding performance analysis of the implemented jobs. The report
  must present the results of the experiments.

- (Optional) Implement a simple web application with a simple search form where, by typing the ID
  of a publisher, the list of actual and “possible” friends is shown.
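
The revenue rule above leaves some room for interpretation, so the following single-process Python
sketch fixes one possible reading before any MapReduce job is written. The assumptions, none of
which come from the assignment text, are: a "unique click" is a distinct (advertisement, viewer)
pair; anonymous clicks cannot be deduplicated and are all counted; viewers that are neither
anonymous nor neighbors get a multiplier of 1.0; the function name publisher_revenue and the tuple
layout of the log are hypothetical.

```python
from collections import defaultdict

ANON = None  # marker for an anonymous viewer (assumption)


def publisher_revenue(clicks, page_views, neighbors):
    """clicks: iterable of (ad_id, publisher_id, viewer_id);
    page_views: publisher_id -> monthly page views;
    neighbors: user_id -> set of connected user ids."""
    total_clicks = defaultdict(int)
    unique = defaultdict(set)        # publisher -> {(ad_id, viewer_id)} for known viewers
    anon_clicks = defaultdict(int)   # anonymous clicks, counted one by one
    for ad_id, pub, viewer in clicks:
        total_clicks[pub] += 1
        if viewer is ANON:
            anon_clicks[pub] += 1
        else:
            unique[pub].add((ad_id, viewer))

    revenue = {}
    for pub, n_clicks in total_clicks.items():
        base_cpc = n_clicks / page_views[pub]          # clicks divided by page views
        weighted = 0.5 * anon_clicks[pub]              # anonymous viewers: x0.5
        for _, viewer in unique[pub]:
            is_neighbor = viewer in neighbors.get(pub, set())
            weighted += 0.3 if is_neighbor else 1.0    # neighbors: x0.3, others: x1.0 (assumed)
        revenue[pub] = base_cpc * weighted
    return revenue
```

A small reference implementation like this is also handy for checking the output of the MapReduce
jobs on generated data sets that are small enough to fit in memory.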

9

Virgillito Antonino

virgilli@istat.it


People You May Know

Implement MapReduce job(s) for computing the "people you may know" functionality in a social
network. For each user, the problem is to find the users that are not directly connected to the
user in the social network but are likely to be connected in the real world, according to the
relationships between the user's direct neighbors.


Two input data sets must be considered:

1. the list of users, including the following attributes:

- User id
- User name

2. the list of user-to-user bidirectional connections.



- Devise a reasonable "may know" criterion (e.g. the first X users that are reachable from a user
  in two-step connections)

- Implement a program that randomly generates reasonably “big” data sets

- Implement the MapReduce job(s) to determine the “may know” list for each user. This might involve
  building intermediate data structures, such as adjacency lists.

- Perform a set of experiments regarding performance analysis of the implemented jobs. The report
  must present the results of the experiments.

- (Optional) Implement a simple web application with a simple search form where, by typing the name
  of a user, the list of actual and “possible” friends is shown.
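
The two-step criterion mentioned in the first task can be sketched in plain Python as two phases
that mirror a possible pair of MapReduce jobs (adjacency lists first, then candidate pairs). This
is an in-memory illustration only, not Hadoop code; the ranking by number of mutual friends, the
function name people_you_may_know and the parameter top_x are assumptions.

```python
from collections import defaultdict
from itertools import permutations


def people_you_may_know(edges, top_x=3):
    # Phase 1: build adjacency lists from the bidirectional connections.
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    # Phase 2 ("map"): each user is a mutual friend for every ordered pair
    # of their neighbors that is not already directly connected.
    mutual = defaultdict(int)
    for user, friends in adj.items():
        for a, b in permutations(sorted(friends), 2):
            if b not in adj[a]:
                mutual[(a, b)] += 1

    # Phase 2 ("reduce"): per user, rank candidates by number of mutual friends.
    candidates = defaultdict(list)
    for (a, b), count in mutual.items():
        candidates[a].append((count, b))
    return {
        user: [b for _, b in sorted(c, key=lambda t: (-t[0], t[1]))[:top_x]]
        for user, c in candidates.items()
    }
```

With the edges (1,2), (1,3), (2,4), (3,4), (4,5), user 1 gets user 4 as the only suggestion, since
they share two mutual friends (2 and 3) and are not directly connected.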


Suggestions for experiments


Experiments mainly (though not necessarily) regard performance analysis of the realized job.
Creativity is encouraged.

In the following I suggest some parameters that can be varied in the experiments, though students
are expected to focus on those that are most significant with respect to the specific problem:

- Cluster size. (Baseline: execution on 1 node.)
- Input size. As big as it gets.
- Number of maps and reduces.
- Input split size.
- Specific optimizations on the job and alternative implementations (suggested for single-node
  implementations).
- Alternative input parsing. Example: use a custom RecordReader vs. parsing the input in the Map.
- Alternative partitioning solutions.

Example measures:

- Overall execution time.
- Average time for maps and reduces.
- Number of map executions.

10

Cerocchi Adriano

cerocchi@dis.uniroma1.it

INVESTIGATION ON ENERGY STORAGE TRENDS

Energy efficiency is a key point in a smart grid scenario. Several methods should be used in
combination to achieve the goal of decreasing waste and maximizing renewable energy usage for a
given grid. Introducing energy storage within such systems opens new perspectives on efficiency,
with results depending on the technologies applied in different scenarios.

New technological trends in energy storage (e.g. ultracapacitors) should be analyzed, indicating
the added value they can give. The focus should be on how efficiency can increase and which
problematic issues (phase jumps, costs) may arise in any considered implementation. A comparison of
the merits of the different trends will indicate the most suitable implementations according to the
functionalities that need to be achieved.

11

Cerocchi Adriano

cerocchi@dis.uniroma1.it

ELECTRICAL SWITCHING IMPLICATIONS

The smart grid concept can use dynamic energy routing to adapt the current flow to network
conditions whenever an advantage can be achieved by doing so.

Different areas and loads within the network should be supplied by the lines that can ensure the
best performance in terms of efficiency, maximization of local power generation and cost reduction.

In the common switching operations, which change a previous configuration of plugged lines to
create a new mapping of those lines, close attention should be paid to the electrical effects of
the switch. Currents, voltages and phases can vary significantly because of the different features
of the connected lines (inherent electrical characteristics, kinds of loads, etc.). For example,
capacitors should be dynamically assigned to some lines to avoid phase shifts (which decrease
efficiency).

Investigate the best trends and possibilities for performing “efficient” switching operations,
motivating the requirements for a switch that can really be considered convenient.

12

Cerocchi Adriano

cerocchi@dis.uniroma1.it

COMMUNICATION PROFILING FOR A SMART ENVIRONMENT (Practical)

Smart networks are made of heterogeneous devices which can be viewed as distributed intelligent
agents communicating with each other and performing “smart” tasks. The communication model of the
network needs to take latency into account because it can have a significant impact on the
higher-level capabilities of a smart scenario.

Latency is 1) not zero and 2) not constant, which makes it an inherent parameter affecting the
performance and reliability of the smart network.

Using a smart infrastructure (provided by us) as a testbed, profile the whole communication model,
characterizing the number of exchanged messages and their delays, and highlighting strengths and
weaknesses (e.g. bottlenecks) that could significantly affect network functionality.
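
As an illustration of what such a profile might contain, the Python sketch below aggregates a
message trace per directed link and reports message count, mean latency, latency variation
(population standard deviation, used here as a simple jitter measure) and worst-case latency. The
trace format (sender, receiver, sent_at, received_at) and the function name profile are
assumptions; the real infrastructure may expose different data, and clocks are assumed to be
synchronized.

```python
from statistics import mean, pstdev


def profile(messages):
    """messages: list of (sender, receiver, sent_at, received_at), times in seconds."""
    per_link = {}
    for sender, receiver, sent_at, received_at in messages:
        # Group one-way delays by directed link.
        per_link.setdefault((sender, receiver), []).append(received_at - sent_at)
    return {
        link: {
            "count": len(delays),       # number of exchanged messages
            "mean": mean(delays),       # latency is not zero...
            "jitter": pstdev(delays),   # ...and not constant
            "max": max(delays),         # worst case, useful to spot bottlenecks
        }
        for link, delays in per_link.items()
    }
```

Links with a high message count combined with a high mean or maximum delay are natural candidates
when looking for bottlenecks.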

13

Montanari Luca/Cerocchi Adriano

montanari@dis.uniroma1.it

Monitoring of power consumption data

This work requires monitoring and learning trends in power-related data in order to promptly
recognize variations with respect to normal behavior. The student should use a known Java library
that implements Artificial Neural Networks to learn and recognize situations.

In order to present the work, the application must also work off-line using log files.

14

Montanari Luca

montanari@dis.uniroma1.it


Monitoring of a preexistent Application

A student who has an existing application can discuss with Luca Montanari how to monitor it in
order to discover critical patterns or interesting situations during the normal lifetime of the
system.

In order to present the work, the application must also work off-line using log files.

15

Vignola Jacopo

Jacopo.Vignola.1@city.ac.uk


Similarities between Power and Computational Grids

The implementation of smart meters and grids will offer network operators and energy suppliers the
ability to observe generation, transportation and consumption flows in real time: this will be
possible by overlaying a new data network onto the existing physical grid. The architecture of such
a data network will represent a key element for enhancing the implementation of smart technologies.

The student is expected to provide a detailed analysis of the similarities between the
characteristics (and objectives) of power grids and those of computational grids (and/or other
distributed network applications), demonstrating (optionally) that a peer-to-peer distributional
structure is suitable for exploiting smart technologies.

Ref 1: Chetty M. and Buyya R. (2002), “Weaving Computational grids: how analogous are they with
electrical grids?”, IEEE

Ref 2: Irving, M.; Taylor, G.; Hobson, P. (2004), "Plug in to grid computing," Power and Energy
Magazine, IEEE, vol. 2, no. 2, pp. 40-44

Ref 3: Massoud Amin, S.; Wollenberg, B.F. (2005), "Toward a smart grid: power delivery for the 21st
century," Power and Energy Magazine, IEEE, vol. 3, no. 5, pp. 34-41

16

Vignola Jacopo

Jacopo.Vignola.1@city.ac.uk


Simulation tools for distributional network architectures

Simulation has been used extensively for modelling and evaluating real-world systems, from business
processes and factory assembly lines to computer systems design. While there exists a large body of
knowledge and tools, few projects have been developed specifically for simulating communications
(e.g. SimJava, NS-2, Parsec, P2Psim, PlanetSim, PeerSim) or application scheduling (e.g. Bricks,
MicroGrid, Simgrid, GridSim) in grid computing environments. Other projects focus on agent-based
simulation in different interaction designs (e.g. SWARM, RePast, JAS).

The student is expected to provide a comparative analysis of existing simulation tools in grid
computing environments, suggesting (optionally) the most suitable one for simulating agent-based
economic activities (e.g. trading of a commodity) in a distributional network environment. This
coursework may evolve into a dissertation project by using the identified tool to run the
simulation of trading a homogeneous commodity in a P2P network environment.

Ref 1: Naicken, Stephen, Anirban Basu, Barnaby Livingston, and Sethalat Rodhetbhai. "A survey of
peer-to-peer network simulators." In Proceedings of The Seventh Annual Postgraduate Symposium,
Liverpool, UK, vol. 2. 2006.

Ref 2: Niazi, Muaz, and Amir Hussain. "Agent-based tools for modeling and simulation of
self-organization in peer-to-peer, ad hoc, and other complex networks." Communications Magazine,
IEEE 47, no. 3 (2009): 166-173.

Ref 3: Buyya, Rajkumar. "Economic-based distributed resource management and scheduling for grid
computing." arXiv preprint cs/0204048, Monash University, Australia (2002)

17-18

Di Luna Giuseppe

diluna@dis.uniroma1.it


Understanding a real world scale attack: The Carna botnet.


The paper [1] "Port scanning /0 using insecure embedded devices"
(http://internetcensus2012.bitbucket.org/paper.html) shows how easy it is to build a botnet of 420k
devices: specifically, a cross-platform botnet that spreads itself by exploiting a very old, and
extremely effective, trick. This botnet has been used to run tests against the whole Internet
address space.

These tests include ICMP ping, reverse DNS, SYN scan and traceroute. The gathered data and,
partially, the source code have been released by the author of the botnet
(http://internetcensus2012.bitbucket.org/download.html).

The data released amount to about 9 terabytes, containing information about the subnet 0/32!

An analysis of this information that goes beyond the results obtained in [1] poses a challenge in
handling this extremely large dataset. Moreover, the release of the source code gives those
interested the amazing possibility of studying a piece of software that has done what no other
human or machine had done before: taking an (inconsistent) snapshot of the Internet.


Two assignments on this subject could be available:

(A) (Practical):

An interested student could analyze the source code of the Carna botnet in order to:

- Obtain a precise idea of the botnet internals, specifically the spreading and control mechanisms
- Update the botnet source code in order to increase its virulence
- Run a simulated botnet over a (not necessarily) simulated target

(B) (Practical):

An interested student could analyze the data gathered by the Carna botnet (don't worry, I have
already downloaded the 9 terabytes) in order to:

- Assess whether the data are real by means of replicated measurements on sampled IPv4 addresses,
  for example using traceroute and reverse DNS
- Study the data to obtain aggregated information on our university. How many devices are
  accessible from the outside? Which kinds of services are used? When was our university scanned?




19

Di Luna Giuseppe

diluna@dis.uniroma1.it


The hidden society of hackers: Having fun with IRC, Tor and I2P


IRC (Internet Relay Chat) is a chat system mainly used between 1998 and 2005. The usage of IRC
among the general public has been decreasing since the advent of instant messaging, and now, thanks
to Facebook, IRC is almost dead. Almost, because IRC is still widely used by a narrow and
interesting subculture: that of hackers. Using hidden networks (Tor, I2P) to hide their identities,
and sometimes the location of IRC servers, hackers still use IRC to coordinate their moves and to
have "cheap talk" on public channels.

So it is possible for anyone (anyone who knows where to look) to monitor these conversations. At
first glance "cheap talk" may not seem so interesting, but to the best of my knowledge some of the
major FBI targets have been busted thanks to leaks of personal information on IRC. Some examples
are:

- Sabu of LulzSec was taken down because he once forgot to join IRC using Tor
- sup9, the hacker behind the Stratfor hack, was arrested thanks to the personal info that he
  leaked on IRC.

Moreover, in my personal experience, I have seen a lot of crazy things on IRC: people ratting each
other out by posting personal info on public channels, IP addresses of targets posted in clear
view, information about 0-day exploits, links to various online identities, leaks of home IP
addresses or time zones.



An assignment on this subject could be available:

(A) (Practical):

An interested student could write an IRC bot (a piece of software that mimics a real user) in order
to:

- collect public references to links that contain "hot content": passwords, dumps, CCVs
- record a history of nick changes in order to link multiple identities
- collect leaks of personal info such as time zone, IP address, tracking information in pasted URLs
  and similar
- draw a relationship graph among user