Slide 1 - 123seminarsonly.com

blabbedharbor, AI and Robotics

23 Feb. 2014


CAPTCHA

A CAPTCHA is a type of challenge-response test used in computing to ensure that the response is not generated by a computer.


A typical CAPTCHA requires the user to type the letters or digits shown in a distorted image that appears on the screen.














A CAPTCHA is a means of automatically generating new challenges that:

Current software is unable to solve accurately.

Most humans can solve.

Do not rely on the type of CAPTCHA being new to the attacker.


CAPTCHAs rely on difficult problems in artificial intelligence.






First developed by AltaVista in 1997.


The term was coined in 2000 by Luis von Ahn, Manuel Blum and Nicholas J. Hopper of Carnegie Mellon University and John Langford of IBM.


Primitive CAPTCHAs seem to have been developed in 1997 by Andrei Broder, Martin Abadi, Krishna Bharat, and Mark Lillibridge to prevent bots from adding URLs to their search engine.








The Turing test was proposed by Alan Turing.



To test a machine’s level of intelligence, a human judge puts questions to two participants, one of which is a machine, without knowing which is which. If the judge cannot tell which is the machine, the machine passes the test.


A CAPTCHA employs a reverse Turing test:

judge = CAPTCHA program

participant = user

If the user passes the CAPTCHA, the user is taken to be human.

If the user fails, the user is taken to be a machine.

1. Text-based CAPTCHAs

2. Graphics-based CAPTCHAs

3. Audio or sound-based CAPTCHAs

Text-based CAPTCHAs typically rely on sophisticated distortion of text images, rendering them unrecognizable to state-of-the-art pattern recognition programs but recognizable by humans.


Examples:



Simple, plain-language questions:

What is the sum of three and thirty-five?

If today is Saturday, what is the day after tomorrow?

Very effective, but needs a large question bank.

Users with cognitive disabilities may find them hard.
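The question-based approach above can be sketched in a few lines of Python. This is only an illustration: the `QUESTION_BANK` contents and the names `pick_question` and `check_answer` are assumptions made for the example, and a real deployment would need a far larger bank.

```python
import random

# Hypothetical question bank: each entry pairs a question with its accepted answers.
QUESTION_BANK = [
    ("What is the sum of three and thirty-five?",
     {"38", "thirty-eight", "thirty eight"}),
    ("If today is Saturday, what is the day after tomorrow?",
     {"monday"}),
]

def pick_question(rng=random):
    """Return a random (question, accepted_answers) pair from the bank."""
    return rng.choice(QUESTION_BANK)

def check_answer(accepted_answers, response):
    """Compare case-insensitively, ignoring surrounding whitespace."""
    return response.strip().lower() in accepted_answers
```

With only two entries an attacker could simply memorize the answers, which is why the slide stresses the need for a large question bank.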





Gimpy:



Originally designed by Yahoo and CMU.


Based on the human ability to read heavily distorted and corrupted text.

Works by choosing a certain number of words from a dictionary and displaying them corrupted and distorted in an image; Gimpy then asks the user to type the words displayed in that image.




EZ-Gimpy:



A modified version of Gimpy.


Used in Yahoo Messenger Service.


It contains only one random character string.


The word is random and not picked from the dictionary.


It is not a good implementation of CAPTCHA and has already been broken by OCR programs.


MSN Passport service CAPTCHAs:

Provided for Microsoft MSN services.

Uses 8 characters.

Warping is used to distort the characters.

It is very strongly implemented and hasn’t been broken.





Graphics-based CAPTCHAs require the user to perform an image recognition test.



IMAGINATION:



A CAPTCHA that requires two steps to be passed.

In the first step, the visitor clicks somewhere on a picture composed of several images and in this way selects a single image.

In the second step, the selected image is loaded, enlarged but heavily distorted. Candidate answers are also loaded on the client side, and the visitor must select the correct answer from the set of proposed words.



BONGO:



Named after M. M. Bongard, a pattern recognition expert.

The user has to solve a pattern recognition problem.




ASIRRA:

Animal Species Image Recognition for Restricting Access.

It’s a HIP (Human Interactive Proof) that works by asking users to identify photographs of cats and dogs.


Difficult for computers but humans can
accomplish it very quickly and accurately.




Audio CAPTCHAs require the user to solve a speech recognition test.

In this version of CAPTCHA, letters are read aloud instead of being displayed in an image.


Helps visually disabled users


Example: Google’s audio-enabled CAPTCHA.








3DCaptcha is the "captcha nice to humans, bad to machines".

It is written in PHP.

A new approach to captchas, using humans' spatial cognition abilities to differentiate humans from machines.

It uses a Markov chain to generate words that resemble human language and are easy to type, yet avoid dictionary lookups.


It filters profane language.


It's easy to deploy.
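The Markov-chain idea mentioned above can be illustrated with a minimal letter-level sketch. It is written in Python for brevity and is not 3DCaptcha's actual PHP code; the function names, the chain order of 2, and the tiny training list in the usage note are all assumptions made for the example.

```python
import random
from collections import defaultdict

def build_chain(words, order=2):
    """Map each `order`-letter context to the letters observed after it.
    '^' pads the start of a word and '$' marks its end."""
    chain = defaultdict(list)
    for word in words:
        padded = "^" * order + word.lower() + "$"
        for i in range(len(padded) - order):
            chain[padded[i:i + order]].append(padded[i + order])
    return chain

def generate_word(chain, order=2, max_len=8, rng=random):
    """Walk the chain from the start context until '$' or max_len letters."""
    context, letters = "^" * order, []
    while len(letters) < max_len:
        nxt = rng.choice(chain[context])
        if nxt == "$":
            break
        letters.append(nxt)
        context = context[1:] + nxt
    return "".join(letters)
```

For example, `generate_word(build_chain(["harbor", "garden", "market", "border"]))` yields pronounceable strings built from the letter patterns of the training words. The sketch does not reject outputs that happen to be real dictionary words and does no profanity filtering; both checks would have to be added on top.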




reCAPTCHA is a free CAPTCHA service that helps to digitize books, newspapers and old-time radio shows.

reCAPTCHA improves the process of digitizing books by sending words that cannot be read by computers to the Web in the form of CAPTCHAs for humans to decipher.

Each word that cannot be read correctly by OCR is placed on an image and used as a CAPTCHA.


This is possible because most OCR programs alert
you when a word cannot be read correctly.



Working of reCAPTCHA:

Two words are shown: one is known as the control word, and the other is known as the questionable word.

The system assumes that if the human types the control word correctly, the questionable word is also correct.

The identification performed by each OCR program is given a value of 0.5 points, and each interpretation by a human is given a full point.

Once a given identification reaches 2.5 votes, the word is considered called (that is, its reading is accepted).
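The scoring rule just described can be written as a small tally. This is a sketch under the weights stated above (0.5 per OCR reading, 1 per human reading, 2.5 to accept); the function name `called_reading` and the sample readings are hypothetical.

```python
from collections import defaultdict

OCR_WEIGHT = 0.5     # value of each OCR program's reading
HUMAN_WEIGHT = 1.0   # value of each human reading
THRESHOLD = 2.5      # votes needed before a reading is accepted

def called_reading(ocr_readings, human_readings):
    """Return the top-scoring reading if it has reached THRESHOLD, else None."""
    votes = defaultdict(float)
    for reading in ocr_readings:
        votes[reading] += OCR_WEIGHT
    for reading in human_readings:
        votes[reading] += HUMAN_WEIGHT
    best = max(votes, key=votes.get, default=None)
    if best is not None and votes[best] >= THRESHOLD:
        return best
    return None
```

With OCR readings `["morning", "moming"]`, one human typing "morning" gives 1.5 votes, which is not enough; a second matching human brings the total to 2.5 and the word is called.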




1. Preventing Comment Spam in Blogs

2. Protecting Website Registration

3. Protecting Email Addresses From Scrapers

4. Online Polls

5. Preventing Dictionary Attacks

6. Search Engine Bots

7. Worms and Spam




CAPTCHA tests are based on open problems in artificial intelligence (AI), called hard AI problems.



A win-win scenario:

Either a CAPTCHA is not broken and there is a way to differentiate humans from computers,

or the CAPTCHA is broken and an AI problem is solved.

Thus AI knowledge is advanced if CAPTCHAs are
broken.



Things to keep in mind:



Don’t store the CAPTCHA solution in the Web page’s metadata

A text CAPTCHA is no good if it doesn’t distort the text

Need a large database of different CAPTCHA questions



Avoid repetition of questions



CAPTCHA Logic:




Generate the question



Persist the correct answer



Present the question to user



Evaluate the answer; if incorrect, start again and generate a different CAPTCHA



If correct, allow access to user
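The steps above can be sketched as a simple loop. The arithmetic question generator, the in-memory `session` dictionary, and the `max_attempts` limit are all assumptions made for illustration; the slide itself does not specify a question type or an attempt limit.

```python
import random

def generate_challenge(rng):
    """Stand-in generator: a small arithmetic question and its answer."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    return f"What is {a} + {b}?", str(a + b)

def run_captcha(ask, max_attempts=3, rng=random):
    """Run the challenge loop. `ask` shows a question and returns the reply.
    Returns True when access should be allowed."""
    session = {}       # stand-in for server-side session storage
    previous = None
    for _ in range(max_attempts):
        question, answer = generate_challenge(rng)     # generate the question
        while question == previous:                    # never repeat the last one
            question, answer = generate_challenge(rng)
        previous = question
        session["captcha_answer"] = answer             # persist the correct answer
        reply = ask(question)                          # present the question
        if reply.strip() == session.pop("captcha_answer"):  # evaluate the answer
            return True                                # correct: allow access
    return False                                       # attempts exhausted
```

Calling `run_captcha(input)` runs it interactively on a console. The answer lives only in the server-side session and is removed after a single comparison, so a failed attempt always faces a fresh question.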



GUIDELINES:




Accessibility



Image security



Script security



Security after widespread adoption


Custom implementation or a general CAPTCHA?



Cracking CAPTCHAs through programs




Convert CAPTCHA into greyscale



Detect patterns in the image corresponding to
characters



Or read that user’s session files to learn the CAPTCHA word



Solution: Only store a hash of the CAPTCHA word
in session files
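One way to follow this advice is to persist a salted hash rather than the word itself. The sketch below uses Python's standard `hashlib`, `hmac` and `secrets` modules; the record layout and the function names `make_session_record` and `verify` are assumptions made for the example.

```python
import hashlib
import hmac
import secrets

def make_session_record(captcha_word):
    """Return what gets persisted: a random salt and the salted hash,
    never the CAPTCHA word itself."""
    salt = secrets.token_bytes(16)
    data = salt + captcha_word.lower().encode("utf-8")
    return {"salt": salt.hex(), "digest": hashlib.sha256(data).hexdigest()}

def verify(record, response):
    """Hash the user's response with the stored salt and compare digests."""
    salt = bytes.fromhex(record["salt"])
    data = salt + response.strip().lower().encode("utf-8")
    candidate = hashlib.sha256(data).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(candidate, record["digest"])
```

Because CAPTCHA words are short, an attacker who can read the session file could still try every candidate word against the salted hash. Keying the hash with a server-side secret (an HMAC) would be a stronger variant of the same idea.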



Usability issues:



W3C guidelines call for the Web to be accessible to all people

Some CAPTCHAs are inaccessible to visually impaired or cognitively challenged people




Compatibility issues:



JavaScript may need to be activated in browsers



Some may need the Adobe Flash plugin installed



CAPTCHAs are an effective way to counter bots and
reduce spam



They serve a dual purpose: they also help advance AI knowledge

Applications are varied, from stopping bots to character recognition and pattern matching



Some issues with current implementations represent
challenges for future improvements