What is Google API? - University of Minnesota Duluth

Dec 14, 2013

Introduction to Google API…



By Pratheepan Raveendranathan




What is Google API?

The Google Web APIs service is a beta web program that enables developers to
easily find and manipulate information on the web.

Google Web APIs are for developers and researchers interested in using
Google as a resource in their applications.

The Google Web APIs service allows software developers to query more
than 3 billion web documents directly from their own computer programs.

Google uses the SOAP and WSDL standards as the interface between
the user's program and the Google API.

Programming environments such as Java, Perl, and Visual Studio .NET are
compatible with the Google API.

Definitions from http://www.google.com/apis/


What can you do with the API


Developers can:

issue search requests to Google's index of more than 3 billion web pages and receive results as structured data (estimated number of results, URLs, snippets, query time, etc.),

access information in the Google cache,

and check the spelling of words.

To start using the API


You need to:

Download the API package from http://www.google.com/apis/

Create an account and get your license key

Install the kit in your UMD account

You also need SOAP::Lite. However, it is installed on all the csdev machines, so you don't
need to get it. It is not on UB or Bulldog.

Contents of this package:

googleapi.jar - Java library for accessing the Google Web APIs service.

GoogleAPIDemo.java - Example program that uses googleapi.jar.

dotnet/ - Example .NET programs that use Google Web APIs.

APIs_Reference.html - Reference doc for the API. Describes semantics of all calls and fields.

Javadoc - Documentation for the example Java libraries.

Licenses - Licenses for Java code that is redistributed in this package.

GoogleSearch.wsdl - WSDL description for Google SOAP API.

soap-samples/

WSDL

Web Services Description Language



The standard format for describing a web
service.


Expressed in XML, a WSDL definition
describes how to access a web service and
what operations it will perform.


The WSDL file is the most important file in the package, and the only one
you need in order to use the API with Perl.

SOAP


Simple Object Access Protocol




SOAP stands for Simple Object Access Protocol


SOAP is a communication protocol


SOAP is for communication between applications


SOAP is a format for sending messages


SOAP is designed to communicate via Internet


SOAP is platform independent


SOAP is language independent


SOAP is based on XML


SOAP will be developed as a W3C standard


Google API for Perl

SOAP::Lite

SOAP::Lite for Perl is a collection of Perl
modules which provides a simple and
lightweight interface to SOAP on both the
client and server side.




So How do I Query Google?

#!/usr/local/bin/perl -w

use SOAP::Lite;

# Configuration
$key = "Your Key Goes Here";

# Initialize with the local WSDL file
$service = SOAP::Lite->service('file:GoogleSearch.wsdl');

$query = "duluth";


Search Contd…

$result = $service->doGoogleSearch(
    $key,     # key
    $query,   # search query
    0,        # start results
    10,       # max results
    "false",  # filter: boolean
    "",       # restrict (string)
    "false",  # safeSearch: boolean
    "",       # lr
    "",       # ie
    ""        # oe
);
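Ten positional arguments are easy to get out of order. As a minimal sketch, the list can be built from named options instead. The helper name search_args and its defaults are assumptions made for illustration; the parameter names and their order follow the call above and the parameter table in these slides. Nothing here contacts Google.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Positional order used in the doGoogleSearch call shown in the slides.
my @PARAM_ORDER = qw(key q start maxResults filter restrict safeSearch lr ie oe);

# Defaults mirror the values in the slides' example call (an assumption,
# not something the API itself mandates).
my %DEFAULTS = (
    start      => 0,
    maxResults => 10,
    filter     => 'false',
    restrict   => '',
    safeSearch => 'false',
    lr         => '',
    ie         => '',
    oe         => '',
);

# Hypothetical helper: turn named options into the positional argument list.
sub search_args {
    my (%opt) = @_;
    for my $required (qw(key q)) {
        die "missing required option '$required'\n"
            unless defined $opt{$required};
    }
    my %merged = (%DEFAULTS, %opt);
    return map { $merged{$_} } @PARAM_ORDER;
}

my @args = search_args(
    key      => 'Your Key Goes Here',
    q        => 'duluth',
    restrict => 'linux',
);
print join(' | ', map { "'$_'" } @args), "\n";

# With a service object created as in the slides, the list would be passed as:
#   $result = $service->doGoogleSearch(@args);
```

Keeping the order in one array means a call site only names the options it changes.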


Name - Description

key - Provided by Google, this is required for you to access the Google service. Google uses the key for authentication and logging.

q - Query phrase.

start - Zero-based index of the first desired result.

maxResults - Number of results desired per query. The maximum value per query is 10. Note: If you do a query that doesn't have many matches, the actual number of results you get may be smaller than what you request.

filter - Activates or deactivates automatic results filtering, which hides very similar results and results that all come from the same Web host.

restrict - Restricts the search to a subset of the Google Web index, such as a country like "Ukraine" or a topic like "Linux."

safeSearch - A Boolean value which enables filtering of adult content in the search results.

lr - Language Restrict. Restricts the search to documents within one or more languages.

ie - Input Encoding. This parameter has been deprecated and is ignored. All requests to the APIs should be made with UTF-8 encoding.

oe - Output Encoding. This parameter has been deprecated and is ignored. All requests to the APIs should be made with UTF-8 encoding.
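Because maxResults is capped at 10 and start is zero-based, fetching more than ten results means issuing several queries with increasing start values. The sketch below only computes the (start, maxResults) pairs; the helper name page_plan is hypothetical, and no request is sent.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a request for $wanted results into
# (start, maxResults) pairs, never asking for more than 10 per query.
sub page_plan {
    my ($wanted) = @_;
    my $per_query = 10;    # maximum maxResults stated in the parameter table
    my @plan;
    for (my $start = 0; $start < $wanted; $start += $per_query) {
        my $remaining = $wanted - $start;
        my $max = $remaining < $per_query ? $remaining : $per_query;
        push @plan, [ $start, $max ];
    }
    return @plan;
}

for my $page (page_plan(25)) {
    my ($start, $max) = @$page;
    print "start=$start maxResults=$max\n";
    # Each pair would feed one doGoogleSearch call. A real loop should stop
    # early when a query returns fewer results than requested.
}
```

For 25 results this plans three queries, starting at 0, 10, and 20, with the last one asking for only 5.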

Now to Retrieve the Search Results

if (defined($result->{resultElements})) {
    print join "\n",
        "Found:",
        $result->{resultElements}->[0]->{title},
        $result->{resultElements}->[0]->{URL},
        $result->{resultElements}->[0]->{snippet} . "\n";
}

print "\n The search took ";
print $result->{searchTime};
print "\n\n";

print "The estimated Number of results for your query is: ";
print $result->{estimatedTotalResultsCount};
print "\n\n";

What you need for your program

Search.pl Output


Found:
University of Minnesota <b>Duluth</b> Welcomes You
http://www.d.umn.edu/
The University of Minnesota <b>Duluth</b> Homepage: an overview of academic programs, campus<br> life, resources, news and events, with extensive links to other web sites <b>...</b>

 The search took 0.159791

The estimated Number of results for your query is: 881000





Or, to get all elements:


foreach $temp (@{$result->{resultElements}}) {
    print $temp->{snippet};
}

foreach $temp (@{$result->{resultElements}}) {
    print $temp->{URL};
}

foreach $temp (@{$result->{resultElements}}) {
    $title_array[$count++] = $temp->{title};
}
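The sample output shows that titles and snippets come back with HTML markup such as <b> and <br>. A rough sketch of cleaning them while looping over the elements follows. The helper strip_tags is hypothetical, the data is a hand-written mock shaped like $result->{resultElements}, and the regular expression is a simplification that does not decode character entities.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: drop HTML tags such as <b> and <br> from a result field.
sub strip_tags {
    my ($text) = @_;
    return '' unless defined $text;
    $text =~ s/<[^>]+>//g;      # remove anything that looks like a tag
    $text =~ s/\s+/ /g;         # collapse runs of whitespace
    $text =~ s/^\s+|\s+$//g;    # trim both ends
    return $text;
}

# Mock data standing in for the structure a live search would return.
my $result = {
    resultElements => [
        {
            title   => 'University of Minnesota <b>Duluth</b> Welcomes You',
            URL     => 'http://www.d.umn.edu/',
            snippet => 'an overview of academic programs, campus<br> life, resources',
        },
    ],
};

foreach my $element (@{ $result->{resultElements} }) {
    print strip_tags($element->{title}),   "\n";
    print $element->{URL},                 "\n";
    print strip_tags($element->{snippet}), "\n\n";
}
```

For text processing this keeps only the words, which is usually what a downstream program wants from a snippet.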

How to get a spelling suggestion?

#!/usr/local/bin/perl -w

use SOAP::Lite;

# Configuration
$key = "Your Key Goes Here";

# Initialize with the local WSDL file
$service = SOAP::Lite->service('file:GoogleSearch.wsdl');

# Read the word to check (the prompt matches the sample output)
print "Enter a word\n";
chomp($searchString = <STDIN>);

$correction = $service->doSpellingSuggestion($key, $searchString);

How do I get the results?

Easy:

The variable $correction will contain the spelling
suggestion if Google has one, or it will be
empty if there is no suggestion.

So retrieving the result is as easy as:

print "\n The suggested spelling for $searchString is $correction\n\n";

Spelling output

Enter a word

dulut



The suggested spelling for “Duluth” is:


duluth
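Since the suggestion can be empty, it may be worth checking it before printing. A small sketch of that check follows. The helper name describe_suggestion is hypothetical, and the values passed in stand in for what doSpellingSuggestion might return.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: format the outcome of a doSpellingSuggestion call.
# $correction is whatever the call returned (possibly undef or '').
sub describe_suggestion {
    my ($word, $correction) = @_;
    if (defined $correction && length $correction) {
        return "The suggested spelling for $word is $correction";
    }
    return "No suggestion for $word";
}

print describe_suggestion('dulut', 'duluth'), "\n";
print describe_suggestion('duluth', undef),   "\n";
```

Testing with defined and length covers both an undefined return value and an empty string, and it avoids an "uninitialized value" warning under -w.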


How do I get a cached web page?


Given a URL, Google will try to retrieve the
web page from its “cache”.

The contents of the page may therefore be
somewhat old: they reflect the last time Google's
web crawlers updated their copy of the site.


Example,

Example Contd…


#!/usr/local/bin/perl -w

use SOAP::Lite;

# Configuration
$key = "Your Key Goes Here";

# Initialize with the local WSDL file
$service = SOAP::Lite->service('file:GoogleSearch.wsdl');

$url = "http://www.d.umn.edu";

$cachedPage = $service->doGetCachedPage($key, $url);

How do I retrieve the results?


This works the same way as the spelling
suggestion.

If the web page exists in the cache, you will have the
whole HTML of the web page in the $cachedPage
variable.

Otherwise, you get a message from Google
which says

“This web page has not been
updated…blah…blah…blah”



Search with other options:

Google has four topic restricts:

Topic              <restrict> value
U.S. Government    unclesam
Linux              linux
Macintosh          mac
FreeBSD            bsd
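One way to avoid mistyping a restrict value is to keep the table above in a hash and look values up by topic name. This is only an illustrative sketch. The helper restrict_for is hypothetical, and the hash holds just the four values listed in these slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The four topic restricts listed in the slides.
my %TOPIC_RESTRICT = (
    'U.S. Government' => 'unclesam',
    'Linux'           => 'linux',
    'Macintosh'       => 'mac',
    'FreeBSD'         => 'bsd',
);

# Hypothetical helper: return the restrict string for a topic name,
# or '' (no restriction) when no topic is given.
sub restrict_for {
    my ($topic) = @_;
    return '' unless defined $topic && length $topic;
    die "unknown topic '$topic'\n" unless exists $TOPIC_RESTRICT{$topic};
    return $TOPIC_RESTRICT{$topic};
}

for my $topic (sort keys %TOPIC_RESTRICT) {
    printf "%-16s %s\n", $topic, restrict_for($topic);
}
```

The returned string would go in the sixth argument of doGoogleSearch, where the slides' own example passes "linux".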



Search with Restrictions:

$result = $service->doGoogleSearch(
    $key,     # key
    $query,   # search query
    0,        # start results
    10,       # max results
    "false",  # filter: boolean
    "linux",  # restrict (string)
    "false",  # safeSearch: boolean
    "",       # lr
    "",       # ie
    ""        # oe
);


Search with Language Restrictions

$result = $service->doGoogleSearch(
    $key,       # key
    $query,     # search query
    0,          # start results
    10,         # max results
    "false",    # filter: boolean
    "",         # restrict (string)
    "false",    # safeSearch: boolean
    "lang_de",  # lr
    "",         # ie
    ""          # oe
);

print "\n The search took ";
print $result->{searchTime};
print "\n\n";

print "The estimated Number of results for your query is: ";
print $result->{estimatedTotalResultsCount};
print "\n\n";

if (defined($result->{resultElements})) {
    print join "\n",
        "Found:",
        $result->{resultElements}->[0]->{title},
        $result->{resultElements}->[0]->{URL},
        $result->{resultElements}->[0]->{snippet} . "\n";
}

lang_de = German

Search with Language Restrictions Contd…

Please Enter Search Item
der sturm

 The search took 0.309039

The estimated Number of results for your query is: 206000

Found:
SK <b>STURM</b> GRAZ - Willkommen beim Sk <b>Sturm</b>
http://www.sksturm.at/
Eintreten. Puntigamer das bierige Bier, Steiermark.com, Puma, Tipp3,<br> Autohaus Jakob Prügger, Graz - Hausmannstätten. © 2003 SkSturm <b>...</b>


Tips on Querying Google



Default Search


By default, Google only returns pages that include all of the terms in the query string.





Stop Words

Google ignores common words and characters such as "where" and "how," as well as certain single digits and single
letters. Common words that are ignored are known as stop words.

However, you can prevent Google from ignoring stop words by enclosing them in quotes, such as in the phrase "to be
or not to be".

Special Characters

By default, all non-alphanumeric characters that are included in a search query are treated as word separators.

The only exceptions are the following: double quote mark ("), plus sign (+), minus sign or hyphen (-), and ampersand
(&).

The ampersand character (&) is treated as another character in the query term in which it is included, while the
remaining exception characters correspond to search features listed in the section below.

Special Query Terms

Google supports the use of several special query terms that allow the user or search administrator to access additional
capabilities of the Google search engine.

(The same explanations, and the following Special Query table, can be found in the API Reference section of your Google API download.)


Special Query Capability / Example Query / Description

Capability: Include Query Term
Example query: Star Wars Episode +I
Description: If a common word is essential to getting the results you want, you can include it by putting a "+" sign in front of it.

Capability: Exclude Query Term
Example query: bass -music
Description: You can exclude a word from your search by putting a minus sign ("-") immediately in front of the term you want to exclude from the search results.

Capability: Phrase Search
Example query: "yellow pages"
Description: Search for complete phrases by enclosing them in quotation marks or connecting them with hyphens. Words marked in this way will appear together in all results exactly as entered. Note: You may need to use a "+" to force inclusion of common words in a phrase.

Capability: Boolean OR Search
Example query: vacation london OR paris
Description: Google search supports the Boolean "OR" operator. To retrieve pages that include either word A or word B, use an uppercase OR between terms.

Capability: Site Restricted Search
Example query: admission site:www.stanford.edu
Description: If you know the specific web site you want to search but aren't sure where the information is located within that site, you can use Google to search only within a specific web site. Do this by entering your query followed by the string "site:" followed by the host name.

Capability: Date Restricted Search
Example query: Star Wars daterange:2452122-2452234
Description: If you want to limit your results to documents that were published within a specific date range, then you can use the "daterange:" query term to accomplish this. The "daterange:" query term must be in the following format: daterange:<start_date>-<end date>

Capability: Title Search (term)
Example query: intitle:Google search
Description: If you prepend "intitle:" to a query term, Google search restricts the results to documents containing that word in the title. Note there can be no space between the "intitle:" and the following word.

Capability: Title Search (all)
Example query: allintitle: Google search
Description: Starting a query with the term "allintitle:" restricts the results to those with all of the query words in the title.

Capability: URL Search (term)
Example query: inurl:Google search
Description: If you prepend "inurl:" to a query term, Google search restricts the results to documents containing that word in the result URL. Note there can be no space between the "inurl:" and the following word. To find multiple words in a result URL, use the "inurl:" operator for each word. Note: Putting "inurl:" in front of every word in your query is equivalent to putting "allinurl:" at the front of your query.

Capability: URL Search (all)
Example query: allinurl: Google search
Description: Starting a query with the term "allinurl:" restricts the results to those with all of the query words in the result URL.

Capability: Text Only Search (all)
Example query: allintext: Google search
Description: Starting a query with the term "allintext:" restricts the results to those with all of the query words in only the body text, ignoring link, URL, and title matches.

Capability: Links Only Search (all)
Example query: allinlinks: Google search
Description: Starting a query with the term "allinlinks:" restricts the results to those with all of the query words in the URL links on the page.

Capability: File Type Filtering
Example query: Google filetype:doc OR filetype:pdf
Description: The query prefix "filetype:" filters the results returned to include only documents with the extension specified immediately after. Note there can be no space between "filetype:" and the specified extension.

Capability: File Type Exclusion
Example query: Google -filetype:doc -filetype:pdf
Description: The query prefix "-filetype:" filters the results to exclude documents with the extension specified immediately after. Note there can be no space between "-filetype:" and the specified extension.

Capability: Web Document Info
Example query: info:www.google.com
Description: The query prefix "info:" returns a single result for the specified URL if it exists in the index.

Capability: Back Links
Example query: link:www.google.com
Description: The query prefix "link:" lists web pages that have links to the specified web page. Note there can be no space between "link:" and the web page URL.

Capability: Related Links
Example query: related:www.google.com
Description: The query prefix "related:" lists web pages that are similar to the specified web page. Note there can be no space between "related:" and the web page URL.

Capability: Cached Results Page
Example query: cache:www.google.com web
Description: The query prefix "cache:" returns the cached HTML version of the specified web document that the Google search crawled. Note there can be no space between "cache:" and the web page URL.
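Several of these query terms can be assembled mechanically. The sketch below builds query strings for the exclude, site, and daterange examples from the table. The helper names build_query and julian_day are hypothetical. The slides do not say what the daterange numbers mean; treating them as Julian day numbers is an assumption, based on the example value 2452122 matching 31 July 2001 under the standard conversion formula.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Julian day number for a Gregorian calendar date (standard integer formula).
sub julian_day {
    my ($year, $month, $day) = @_;
    my $adj = int((14 - $month) / 12);    # 1 for January/February, else 0
    my $y   = $year + 4800 - $adj;
    my $m   = $month + 12 * $adj - 3;
    return $day + int((153 * $m + 2) / 5) + 365 * $y
         + int($y / 4) - int($y / 100) + int($y / 400) - 32045;
}

# Hypothetical helper: assemble a query string from optional parts.
sub build_query {
    my (%opt) = @_;
    my @parts;
    push @parts, $opt{terms}          if defined $opt{terms};
    push @parts, qq{"$opt{phrase}"}   if defined $opt{phrase};
    push @parts, map { "-$_" } @{ $opt{exclude} || [] };
    push @parts, "site:$opt{site}"    if defined $opt{site};
    if ($opt{from} && $opt{to}) {
        # Assumption: daterange takes Julian day numbers.
        push @parts, sprintf('daterange:%d-%d',
            julian_day(@{ $opt{from} }), julian_day(@{ $opt{to} }));
    }
    return join ' ', @parts;
}

print build_query(terms => 'Star Wars',
                  from  => [2001, 7, 31],
                  to    => [2001, 11, 20]), "\n";
print build_query(terms => 'bass', exclude => ['music']), "\n";
print build_query(terms => 'admission', site => 'www.stanford.edu'), "\n";
```

The three calls reproduce the Date Restricted, Exclude Query Term, and Site Restricted examples from the table. The resulting string is what would be passed as the search query argument.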

Other Interesting Issues


Search for, say, “yahoo”, and look at the
estimated number of results.

Wait a minute or so.

Search again for “yahoo” and look at the
estimated number of results.

About 5 times out of 10, the estimate will be different.


Conclusion…

The API can be used as a means of retrieving “information” and “text” from the web.


Some interesting examples:


http://www.googleduel.com/original.php


http://douweosinga.com/projects/googlehacks


http://www.researchbuzz.org/archives/001418.shtml


http://cgi.sfu.ca/~gpeters/cgi-bin/pear/gender.php