p2pWeb project - w3architect.com


p2pWeb

Slide 1

Peer-To-Peer : Concept, Tools and Applications

The p2pweb Project


Low cost Peer to Peer solutions for high
availability web hosting



19 May 2005

Seminar « Peer-To-Peer : Concept, Tools and Applications »

Ecole d’ingénieurs de Genève

Slide 2

Agenda

1. The Project goals
2. Web hosting solutions and architecture
3. The p2pweb solution
4. Project constraints and key technologies
5. Related projects
6. The project components
   Global server load balancing system
   Distributed set of web servers
   Monitoring system
   Node architecture and hardware
7. Conclusion


Slide 3

To explore and implement low cost solutions
for high availability web hosting


Do More with Less



Our targets are :


small or medium structures (associations, NGO, etc …)


with limited resources (money, IT people)


with important web hosting needs (bandwidth available)


rich and complex web site


medium to high web traffic


high availability and visibility needs


It may fit the needs of many projects in Least Developed Countries very well :



TeleCentres Networks, Rural Organisations, Universities, Cultural Centres, Public
Libraries, Community Multimedia Centres, Health Networks, etc ...

The Project goals

Slide 4

Afromix.org
(personal web site)

A portal of African and Caribbean Cultures since 1993

A complex web site using multiple technologies


in house Perl Content Management System (CMS)


an extensive discographic database (1600 artists, more
than 50 styles from all over Africa and the French West Indies)


multilingual (French, English, Spanish) site running on
a JAVA application server (Tomcat)


about 25 000 files, 400 000 pages/month, 2 million
hits/month, 60 000 unique visitors/month


Mediaport.net
(community web site)

One of the first French web pioneers, first developed at INA


mostly static content (near 10 000 files)


multilingual (French, English) site running on a PHP
CMS (ezpublish)


it’s the main p2pweb test platform and it will
evolve to an open web hosting solution for
artistic and cultural web projects (an editorial
committee is forming)

Example of hosted web site

Slide 5

The web hosting market


Free web hosting


Very limited


static html or small PHP site (limited computing resources)


can’t use your own domain name


Professional web hosting


A broad range of services


private virtual server


dedicated server


Colocation


But price is quite high


100-200 /month for one dedicated server


and maintenance can be complex

Slide 6

Centralized architecture

Server in one location :

The server and its Internet link are single points
of failure (SPOF)

Slide 7

Centralized architecture (cont.)

Database cluster

SAN Storage

Application Servers

Load Balancers

Web servers

Reverse Proxy / Cache / SSL accelerators

Load Balancers

Multi-homing with BGP routing

High availability architecture

Datacenter hosting
- BGP routing
- hardware load balancing
- SAN storage

In theory, no SPOF

but very complex architecture

very high cost

Slide 8

CDN Architecture

Content Delivery Network

Service delivered by companies like Akamai, Speedera, and others.

Edge servers provide caching and data replication for fast delivery to clients worldwide.

A solution for very high traffic web sites.

Very expensive solution.


Slide 9

alternative web hosting


Community based web hosting


Initiatives from various associations


ouvaton.coop, globenet.net, autre.net, altern.net, ...


Most of the time, people pool their money and knowledge to
buy and administer one or two dedicated servers.



Home server


We now have sufficient bandwidth (ADSL), computing power
(PCs) and good software (apache, linux …)



We lack reliability !



Slide 10

First idea : big home server

Slide 11

Second idea (better one)

Lots of people (family, friends, co-workers, …) already have :



An ADSL Internet access or Permanent High Speed Connection



One or more PCs (with a lot of unused disk space)


So, what about sharing those resources to
build a more powerful and resilient network
of web servers ?

Slide 12

Web Hosting : the p2pweb way

ADSL
ISP 1

ADSL
ISP 2

ADSL
ISP 3

Each member of the p2pweb network shares a portion of his Internet bandwidth (most of the
time an ADSL line) and hosts a small server.

The result is a powerful network that is the sum of the bandwidth and computing resources of
all the members.

Slide 13

A peer to peer solution


Somehow, it’s a return to the very fundamental
principles of the Internet:


a cooperative solution (network of servers)


a distributed solution (no central control)


a fault tolerant solution (resilience)


But with all the power of existing internet and
open source technologies


consumer computers and internet access


overlay network and services over the Internet


It is a peer to peer solution !


Slide 14

The project constraints


Unreliable components


Node failure is not an exception, it’s the rule.


Internet link failure, power outage, server crash …


Automatic operation


Murphy’s law : servers will always crash when there
is nobody to fix the problem (at night, when you are
on vacation …)


Pragmatic approach


Build from existing components


Simple and efficient solutions are priority choices

Slide 15

Key technologies


Mass market products are available at low cost now !

ADSL lines

1 Mb/s Up - 15 Mb/s Down for 30 / month (free.fr)

ADSL router / firewall / ethernet or wifi

D-LINK, NetGear, LINKSYS from 75 to 150

Small Servers

PC barebones (Asus, Biostar, Shuttle …) from 300 to 500

Mac mini (Apple) 499

Open Source Software

BSD, Linux, apache, tomcat, etc …


Slide 16

Related projects

YouServ (IBM)
http://www.almaden.ibm.com/cs/people/bayardo/userv/


YouServ is software that forms a webserving "grid" by allowing its users
to pool their desktop computing resources to create one large, virtual web-space.

An intranet project, more oriented towards desktop file sharing.

Unfortunately not open source

Vergenet (Simon Horman)
http://www.vergenet.net/

Vergenet has servers located in Sydney, Amsterdam, London, Tokyo and
Indiana. These servers are all running Linux and a variant of Super
Sparrow to load balance traffic between them.

Super Sparrow enables users to load balance traffic between
geographically separated points of presence by finding the site network-wise
closest to clients. This is done by accessing BGP routing information
(but it requires direct access to a BGP router)

Slide 17

Related projects (cont.)

Coral (New York University)
http://www.coralcdn.org/


Coral is a peer-to-peer content distribution network, composed of a
world-wide network of web proxies and name servers

Publishing through Coral is as simple as appending a short string to the
hostname of objects' URLs; a peer-to-peer DNS layer transparently
redirects browsers to participating caching proxies

a URL like www.myserver.com/some/path.html becomes
www.myserver.com.nyud.net:8090/some/path.html

Coral is in fact running on top of the planet-lab network (a grid
computing research network : http://www.planet-lab.org/)
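
The URL rewriting described for Coral can be sketched in a few lines of Python. This is only an illustration of the rule quoted on the slide (append `.nyud.net:8090` to the hostname); the function name `coralize` is hypothetical and the sketch deliberately ignores URLs that already carry a port.

```python
from urllib.parse import urlsplit, urlunsplit

# Suffix quoted on the slide; appended to the original hostname.
CORAL_SUFFIX = ".nyud.net:8090"


def coralize(url: str) -> str:
    """Return the Coral form of a URL by appending the suffix to its hostname."""
    # Accept bare "host/path" strings as well as full http URLs.
    parts = urlsplit(url if "://" in url else "http://" + url)
    if parts.port is not None:
        raise ValueError("this sketch only handles URLs without an explicit port")
    host = parts.hostname + CORAL_SUFFIX
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


print(coralize("www.myserver.com/some/path.html"))
```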

Globule (Vrije University Amsterdam)
http://www.globule.org/


Globule is a module for the Apache Web server that allows a given
server to replicate its documents to other Globule servers. Clients are
automatically redirected to one of the available replicas.


The project provides both content replication and HTTP or DNS based
redirection mechanisms

Slide 18

P2PWeb - Project Components


A global server load balancing system


Two main functions


Load balance the traffic on the web servers


Provide failover = only send traffic to web servers that are alive


A distributed set of web servers


And a set of tools to :


Publish content on the servers


Keep all servers in sync (replication mechanism)


Monitoring services



Slide 19

Global server load balancing


Load balancing


achieved using Round Robin DNS


simple system, with well known limits
(http://www.tenereillo.com/GSLBPageOfShame.htm)


Failover


achieved by coupling a monitoring system (NAGIOS) with the DNS

DNS entries have a short TTL (time to live)

NAGIOS monitors each web server

When a server changes state (for example DOWN), a special handler is
called that updates the DNS entry and reloads the DNS

The failed server is no longer announced by the DNS

To have a fully redundant system, we use 3 independent
DNS servers (all primary), each running its own NAGIOS
instance
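
The zone-editing part of such a handler could look like the sketch below. It is not the project's actual handler: the function name `set_server_state` is hypothetical, and the sketch only comments out or restores the matching A record in the zone text. Writing the file back and reloading the name server are left out.

```python
def set_server_state(zone_text: str, ip: str, alive: bool) -> str:
    """Comment out (server down) or restore (server up) the A records for ip."""
    out = []
    for line in zone_text.splitlines():
        # Strip an existing leading ';' so the function can be applied repeatedly.
        bare = line.lstrip(";").strip()
        fields = bare.split()
        is_target = len(fields) >= 2 and fields[-2:] == ["A", ip]
        if is_target:
            line = bare if alive else ";" + bare
        out.append(line)
    return "\n".join(out) + "\n"


zone = (
    "www 300 IN A 82.66.103.28\n"
    "www 300 IN A 195.101.152.113\n"
)
# Server 195.101.152.113 goes DOWN: its record is commented out.
print(set_server_state(zone, "195.101.152.113", alive=False))
```

Commenting the record out rather than deleting it keeps the zone file readable and makes the reverse operation (server back UP) trivial.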

Slide 20

GSLB : Failover illustrated

Initial DNS entries : all servers are up

www 300 IN A 82.66.103.28
www 300 IN A 195.101.152.113
www 300 IN A 82.232.203.167
www 300 IN A 66.35.250.210

Server 195.101.152.113 fails

In the syslog trace, we can see :

22:22:46 nagios: SERVICE ALERT: ns1;HTTP-P2PWEB;CRITICAL;SOFT;1;Connection refused by host
22:23:47 nagios: SERVICE ALERT: ns1;HTTP-P2PWEB;CRITICAL;SOFT;2;Connection refused by host
22:24:46 nagios: SERVICE ALERT: ns1;HTTP-P2PWEB;CRITICAL;HARD;3;Connection refused by host

After 3 unsuccessful tries, a notification is sent by email to the admin

22:24:46 nagios: SERVICE NOTIFICATION: nagios;ns1;HTTP-P2PWEB;CRITICAL;notify-by-email;Connection refused by host

The specific handler is called

22:24:47 nagios: SERVICE EVENT HANDLER: ns1;HTTP-P2PWEB;CRITICAL;HARD;3;http_p2pweb_handler

And the DNS is reloaded

22:24:47 named[17379]: master/p2pweb.net.zone:1: no TTL specified; using SOA MINTTL instead

And now we can verify that the DNS entries are

www 300 IN A 82.66.103.28
;www 300 IN A 195.101.152.113
www 300 IN A 82.232.203.167
www 300 IN A 66.35.250.210

Failover time is : 2 or 3 minutes (NAGIOS) + DNS max TTL (here 5 minutes) = less than 10 minutes
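
The failover estimate can be checked with a small calculation. The one-minute check interval and the three attempts are read from the syslog trace above, and the 300 second TTL from the DNS entries; the function name is hypothetical, and the model (detection time plus one full TTL) is a simplifying assumption.

```python
def worst_case_failover_seconds(check_interval_s: int, max_attempts: int, ttl_s: int) -> int:
    """Worst case: the server dies just after a successful check, then caches expire."""
    # Up to one interval before the first failed check, then (max_attempts - 1)
    # further intervals until the state becomes HARD.
    detection = check_interval_s * max_attempts
    # Resolvers may keep the stale record for one full TTL after the zone change.
    return detection + ttl_s


seconds = worst_case_failover_seconds(check_interval_s=60, max_attempts=3, ttl_s=300)
print(seconds, "seconds =", seconds / 60, "minutes")  # 480 seconds = 8.0 minutes
```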

Slide 21

GSLB : next steps

Improvements :


Better service provisioning (manual process for now)


Better support for “long downtime”


When a server crashes for a long period of time and then recovers, its content may
be outdated

We must not announce it back until it has re-synchronized itself

Proximity load balancing

The goal is to load balance traffic between geographically distributed servers by
finding the site network-wise closest to clients.

A technology used in the CDN (Content Delivery Network) world

We can use part of the Globule project, as Globule supports DNS redirection
based on an 'AS-path length' policy (used in BGP routing), which tries to redirect
clients to a server close to them.

This BGP information can be collected through routeviews.org (no direct access to a
BGP router needed)

Slide 22

Web server content management

We have a set of web servers and we need tools to :


Publish content on all servers


Keep them in sync (content replication)


Two main replication strategies


primary backup : one master server feeds the replicas


active replication : when content changes on one replica, that replica propagates the changes to all the others

ADSL
ISP 1

ADSL
ISP 2

ADSL
ISP 3

Slide 23

static content replication

One server plays the master’s role

Content is published first on the master (for example via FTP)

Then the content is either pushed to or pulled by the replicas

The easiest way is to use rsync (rsync.samba.org)

Content can be pulled via anonymous rsync from the master

Content can be pushed via rsync over ssh (using a private/public key pair for
security)

ADSL
ISP 1

ADSL
ISP 2

ADSL
ISP 3

Master

Replica

Replica

Replica

Slide 24

Content replication : rsync


rsync is a file transfer program for Unix systems. rsync provides a very fast method for
bringing remote files into sync. It does this by sending just the differences in the files across
the link, without requiring that both sets of files are present at one of the ends of the link
beforehand.


Anonymous rsync server (pull mode)


Runs as a standalone daemon or can be launched by inetd

Advanced security options (read-only, chroot, IP access list)

Use : run from crontab on each mirror

rsync -a master.mydomain.com::www/ /data/www/

Rsync over SSH (push mode)

Needs ssh access on each mirror

And an exchange of ssh cryptographic keys for unattended operation

Use : run on demand or from crontab on master

rsync -a /data/www/ user@mirror.mydomain.com:/data/www/

Useful options

--compress       compress file data during the transfer
--bwlimit=KBPS   limit I/O bandwidth; KBytes per second
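
As an illustration, the two modes can be wrapped in small Python helpers that only build the command lines (they do not run rsync). The helper names are hypothetical; the host names and paths are the placeholders used on the slide. The pull form uses the double-colon `host::module` syntax of the rsync daemon, and the push form uses the single-colon `host:path` syntax that rsync uses with a remote shell.

```python
import shlex
from typing import List, Optional


def pull_command(master: str, module: str, dest: str,
                 bwlimit_kbps: Optional[int] = None) -> List[str]:
    """Command a mirror would run to pull from the anonymous rsync server."""
    cmd = ["rsync", "-a", "--compress"]
    if bwlimit_kbps is not None:
        cmd.append(f"--bwlimit={bwlimit_kbps}")
    cmd += [f"{master}::{module}/", dest]
    return cmd


def push_command(src: str, user: str, mirror: str, dest: str,
                 bwlimit_kbps: Optional[int] = None) -> List[str]:
    """Command the master would run to push to one mirror over ssh."""
    cmd = ["rsync", "-a", "--compress", "-e", "ssh"]
    if bwlimit_kbps is not None:
        cmd.append(f"--bwlimit={bwlimit_kbps}")
    cmd += [src, f"{user}@{mirror}:{dest}"]
    return cmd


print(shlex.join(pull_command("master.mydomain.com", "www", "/data/www/")))
print(shlex.join(push_command("/data/www/", "user", "mirror.mydomain.com",
                              "/data/www/", bwlimit_kbps=64)))
```

The resulting lists can be passed to `subprocess.run` from a cron-driven script. Limiting bandwidth matters on an ADSL uplink shared with visitor traffic.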


Slide 25

Content distribution : Satellite

For a lot of geographically distributed mirrors, an interesting
solution can be Datacasting over satellite

Technology used by some CDN vendors

Skycache, cidera, Skystream.com, panamsat.com

Now available at lower cost from worldspace.fr (SatPost Solution)

Slide 26

Use of CMS

Nowadays most webmasters use CMS (Content
Management System) tools for publishing



A lot of open source and commercial tools


Spip, mambo, typo3, phpnuke, … (php)


Bricolage, metadot, slashcode, … (perl)


Cofax, opencms, magnolia, jahia, … (java)


Plone, cps, zwook, … (python)



But none of them has direct support for a distributed
architecture


Most use a database as a backing store


Distributed database transactions and replication are hard problems


Slide 27

CMS : a pragmatic solution

The webmaster publishes using the CMS as usual


The content is exported as static html files


Then distributed to the replicas using rsync

Constraint : the CMS must support export with “static like URLs”

Either directly or through URL rewriting

/article/sport/2005/4/13/football.html (good)

/article.php?id_category=3&id_article=25 (bad for mirroring)
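
A minimal sketch of how an export step might derive a "static like" path from article metadata. The function name and the field choices (category, publication date, title) are assumptions made for illustration; a real CMS export would use its own schema.

```python
import re
from datetime import date


def static_path(category: str, published: date, title: str) -> str:
    """Build a mirror-friendly path such as /article/sport/2005/4/13/football.html."""
    # Reduce the title to lowercase letters and digits separated by hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return (f"/article/{category}/{published.year}/"
            f"{published.month}/{published.day}/{slug}.html")


print(static_path("sport", date(2005, 4, 13), "Football"))
# /article/sport/2005/4/13/football.html
```

Because the path carries no query string, the exported file can be copied as-is by rsync and served by any replica without the CMS or its database.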

ADSL
ISP 1

ADSL
ISP 2

ADSL
ISP 3

webmaster

Master : static html files

Replica

CMS


Back office

html export

Replica

Replica

Slide 28

CMS : distributed architecture (1)

Example : a non-governmental organization is active in 4 countries and wants to provide a global
web presence. The same global web design and tools are used on all servers.

Local publishing

Each local webmaster publishes news about his country using the CMS on the local server

Content exchange using web services

Each local web server “collects” (pulls) new articles from the other servers using RSS (Really Simple Syndication) web services

Global web presence

Global content is (re)constructed on each server (from the data of all the others) and served on the Internet

Such a solution may be constructed by hacking/customizing an existing CMS
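
The "collect and reconstruct" step can be sketched with the Python standard library. The feeds below are inline, hypothetical samples (the `example.org` URLs and titles are invented for illustration); a real node would fetch each feed over HTTP from the other servers before merging.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime
from typing import Dict, List

# Hypothetical RSS 2.0 feeds, one per country server.
SENEGAL_FEED = """<rss version="2.0"><channel><title>Senegal</title>
<item><title>News from Dakar</title><link>http://sn.example.org/a/1.html</link>
<pubDate>Wed, 13 Apr 2005 10:00:00 +0000</pubDate></item>
</channel></rss>"""

MALI_FEED = """<rss version="2.0"><channel><title>Mali</title>
<item><title>News from Bamako</title><link>http://ml.example.org/a/7.html</link>
<pubDate>Thu, 14 Apr 2005 08:30:00 +0000</pubDate></item>
</channel></rss>"""


def parse_items(rss_xml: str, origin: str) -> List[dict]:
    """Extract title, link and publication date from every <item> of a feed."""
    root = ET.fromstring(rss_xml)
    return [{
        "origin": origin,
        "title": item.findtext("title", default=""),
        "link": item.findtext("link", default=""),
        "published": parsedate_to_datetime(item.findtext("pubDate")),
    } for item in root.iter("item")]


def merge_feeds(feeds: Dict[str, str]) -> List[dict]:
    """Merge all feeds, de-duplicate by link, newest article first."""
    merged = {}
    for origin, xml_text in feeds.items():
        for entry in parse_items(xml_text, origin):
            merged[entry["link"]] = entry
    return sorted(merged.values(), key=lambda e: e["published"], reverse=True)


for entry in merge_feeds({"senegal": SENEGAL_FEED, "mali": MALI_FEED}):
    print(entry["published"].date(), entry["origin"], entry["title"])
```

Each server runs the same merge, so every node ends up serving the same global view without a central database.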



ADSL
ISP 1

ADSL
ISP 2

ADSL
ISP 3

Ivory Coast

Senegal

Burkina Faso

Mali

XML content exchange

Slide 29

CMS : distributed architecture (2)

CMS + Message-oriented middleware (MOM)

A MOM is a client/server infrastructure that increases the interoperability, portability and
flexibility of an application by allowing the application to be distributed over multiple
heterogeneous platforms.

Through the use of a queue system, a MOM can provide
asynchronous reliable data exchange.

MOM is typically asynchronous and peer-to-peer and supports

Point to point communication

Publish and subscribe communication

There is a standardized interface in Java : JMS (Java Message Service) API

Various open source implementations in the Java world

ActiveMQ (activemq.codehaus.org)

OpenJMS (openjms.sourceforge.net)

Joram (joram.objectweb.org)

MantaRay (mantamq.org)

No CMS uses it yet (as far as I know), but it may be a very good solution
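
To make the publish and subscribe idea concrete, here is a toy in-process broker written in Python. It is not JMS and not any of the products listed above: the `Broker` class is hypothetical and has no persistence or network transport. It only shows how queues decouple the publisher from the subscribers.

```python
import queue
from collections import defaultdict


class Broker:
    """Toy publish/subscribe broker: one queue per subscriber and per topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str) -> queue.Queue:
        """Register a new subscriber and return the queue it will read from."""
        q = queue.Queue()
        self._subscribers[topic].append(q)
        return q

    def publish(self, topic: str, message: dict) -> int:
        """Queue the message for every subscriber; return how many received it."""
        for q in self._subscribers[topic]:
            q.put(message)
        return len(self._subscribers[topic])


mom = Broker()
replica_1 = mom.subscribe("articles")
replica_2 = mom.subscribe("articles")

# The CMS publishes once; each replica reads its own copy later, at its own pace.
mom.publish("articles", {"id": 25, "title": "Football"})
print(replica_1.get_nowait())
print(replica_2.get_nowait())
```

In a real MOM the queues would be durable, so a replica that was down during publication would still receive the article once it comes back.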

Slide 30

Performance monitoring

We collaborate with the webperf.org project


WebPerf is a system for measuring response time of specified URLs
from multiple locations on the internet.

The project is founded on the premise that there are a lot of other
companies who also require such a monitoring service. If the other
companies are willing to monitor our URLs, we will monitor theirs (a
free co-peering arrangement).

Perl scripts installed on each local node collect data from the other web sites, then
the data are pushed to a central repository for further analysis.

A web interface allows members to display various statistics.

A view of one’s web site as seen from all over the world.


Slide 31

Webperf.org : sample graph (1)

Slide 32

Webperf.org : sample graph (2)

Slide 33

Webperf.org : sample graph (3)

Slide 34

Node architecture and security

ADSL or Cable modem

Ethernet router/firewall

Optional Wifi access point

Private Ethernet LAN

Ethernet link

Internet

Security

Mandatory


Hardware router/firewall with NAT capabilities


Internal private network using RFC 1918 IP
addresses (192.168.x.y)

No incoming traffic from the outside other
than what is required

Controlled via port redirection on the firewall


http (port 80)


ssh (port 22, optional)

Web server

P2pweb traffic

Slide 35

Node hardware (example)

Runs on the corner of a desk

An ethernet and wifi switch

Connects other computers (not shown here)

A web and application server

Mac mini (Apple) running apache2 and tomcat

A firewall

Embedded PC (www.pcengines.ch) running pf
(packet filter) on OpenBSD from a compact flash card

No noise, and low electric power
consumption (near 50W)

Slide 36

Conclusion


It can be done (at low cost)


It runs, with good results


(service uptime measured by siteuptime.com)

www.p2pweb.net

hosted by the p2pweb network

monitored Since: 9/23/2004

Outages: 40

Total Uptime: 99.560%

Downtime/year: 38.5 hours

www.afromix.org

hosted on a single node

monitored Since: 9/23/2004

Outages: 37

Total Uptime: 97.634%

Downtime/year: 207.3 hours
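
The downtime figures follow directly from the uptime percentages, assuming a year of 365 days (8760 hours). A short check, with a hypothetical function name:

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Convert an uptime percentage into hours of downtime per year."""
    return (100.0 - uptime_percent) / 100.0 * HOURS_PER_YEAR


print(round(downtime_hours_per_year(99.560), 1))  # www.p2pweb.net  -> 38.5
print(round(downtime_hours_per_year(97.634), 1))  # www.afromix.org -> 207.3
```

With a similar number of outages (40 versus 37), the distributed site accumulates roughly five times less downtime than the single node.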


Still a lot of improvements to make


Not yet an easy to use solution : node administration still requires good Unix knowledge


Most important : a new way to design web
applications

Slide 37

The Future

What we can provide right now

P2pweb.net : a global load balancing solution for any distributed web project


Just provide the servers' IP addresses and a health check URL

Mediaport.net : a Community web hosting solution


We can host various web projects


We are looking for Partnerships in the following domains :

Packaging an easy and ready to use solution for deploying web mirrors

(industrializing the solution)


dedicated LINUX or BSD Distro with preinstalled packages


“all in one” solution : Java CMS + MOM in one webapp application

Helping to deploy such a solution in Least Developed Countries

The P2PWeb solution is a very good fit for Least Developed Countries with weak
bandwidth and low connectivity.

Slide 38

Contacts

P2pweb is a SourceForge project (BSD license)

www.p2pweb.net or mediaport.sourceforge.net


Contacts :

about the project :

fgaillard@w3architect.com


if you want to be hosted on mediaport.net :

fabrice.gaillard@mediaport.net

pierre.genillon@mediaport.net

Slide 39

Questions

Thank you



Questions ?