Computational Linguistics II

Python Programming
Jason Baldridge
Department of Linguistics
University of Texas at Austin
September 6, 2005
Outline
1 Programming Overview
2 Python
Programming in NLP
Not an option – you must be able to program.
Must balance efficiency with productivity.
Code for functionality first, efficiency later.
Software engineering can be quite important, but should not get in the way.
Use regression tests.
Use revision control tools like CVS.
Picking the best algorithm is usually more important than picking the best language.
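The advice to use regression tests can be sketched with Python's standard unittest module. This is only a minimal illustration: the function words_ending_in and its test cases are hypothetical, not part of the original slides, and the code is written to run under both the Python 2 of these slides and Python 3.

```python
import unittest


def words_ending_in(text, suffix):
    """Return the whitespace-separated words in text that end with suffix.

    Hypothetical helper introduced for illustration only.
    """
    return [word for word in text.split() if word.endswith(suffix)]


class WordsEndingInRegressionTest(unittest.TestCase):
    """Pins down current behaviour so that later changes cannot silently alter it."""

    def test_finds_matching_words(self):
        self.assertEqual(words_ending_in("singing is fun", "ing"), ["singing"])

    def test_keeps_input_order(self):
        self.assertEqual(words_ending_in("the king is walking", "ing"),
                         ["king", "walking"])

    def test_empty_input_gives_empty_list(self):
        self.assertEqual(words_ending_in("", "ing"), [])
```

Saved under a name such as test_words.py (an assumed filename), the tests could be run with python -m unittest test_words. The idea is to rerun them after every optimization, in keeping with "functionality first, efficiency later".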
NLP Software Resources
There are many implementations of various kinds of NLP software available.
Those with open source licenses are particularly attractive since you can see and modify the source code.
See opennlp.sf.net for many open source NLP projects and links.
When considering the use of an existing implementation, pay attention to:
its maturity: bugs and features
the documentation available for it
its platform
recent activity
modularity
Programming Languages
C
C++
Java
Lisp
Perl
Prolog
Python
many obscure and surely interesting languages I’ve never looked at but that have passionate advocates
etc...
Programming Paradigms
Imperative: C/C++, Java, Python, Perl
Declarative: Prolog
Functional: Lisp, Scheme, ML
Interactive: Lisp, Prolog, Python
Object-oriented: C++, Java, Python, (Perl)
Scripting: Python, Perl
Language Benchmarks
The Great Computer Language Shootout
Speed normalized against C++, 2001
Test                    Lisp            Java            Python          Perl            C++
hash access             1.06            3.23            4.01            1.85            1.00
exception handling      0.01            0.90            1.54            1.73            1.00
sum numbers from file   7.54            2.63            8.34            2.49            1.00
reverse lines           1.61            1.22            1.38            1.25            1.00
matrix multiplication   3.30            8.90            278.00          226.00          1.00
heapsort                1.67            7.00            84.42           75.67           1.00
array access            1.75            6.83            141.08          127.25          1.00
list processing         0.93            20.47           20.33           11.27           1.00
object instantiation    1.32            2.39            49.11           89.21           1.00
word count              0.73            4.61            2.57            1.64            1.00
Median                  1.67            4.61            20.33           11.27           1.00
25% to 75%              0.93 to 1.67    2.63 to 7.00    2.57 to 84.42   1.73 to 89.21   1.00 to 1.00
Range                   0.01 to 7.54    0.90 to 20.47   1.38 to 278     1.25 to 226     1.00 to 1.00
(Just for a rough illustration of speed!)
Picking a Language
The language you choose depends on:
your task
existing tools and implementations
the languages you know
Don’t forget about low-level Unix utilities like grep, sed, awk, wc, sort.
...or go learn about them: http://ling.osu.edu/~cbrew/dilbook.ps
Python
Relatively new language
General-purpose language, with scripting orientation
Good for prototyping.
Can be used to glue code in different languages together.
Cross-platform
Minimalist
Easy to learn, easy to use
Not as fast as compiled languages, e.g. C/C++
...but Python code can be migrated to Java or C/C++ incrementally (and partially).
Rapidly gaining in popularity
Example Program
Find words ending with “ing” (from NLTK tutorial):
    import sys                          # load the system library
    for line in sys.stdin.readlines():  # for each line of input
        for word in line.split():       # for each word in the line
            if word.endswith('ing'):    # does the word end in 'ing'?
                print word              # if so, print the word
Example Program Comparison (from NLTK)
Python
    import sys
    for line in sys.stdin.readlines():
        for word in line.split():
            if word.endswith('ing'):
                print word
Perl
    while (<>) {
        foreach my $word (split) {
            if ($word =~ /ing$/) {
                print "$word\n";
            }
        }
    }
Java
    import java.io.*;
    public class IngWords {
        public static void main(String[] args) throws IOException {
            BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
            String line = in.readLine();
            while (line != null) {
                for (String word : line.split(" ")) {
                    if (word.endsWith("ing"))
                        System.out.println(word);
                }
                line = in.readLine();
            }
        }
    }
Example Program Comparison (from NLTK)
Lisp
    (defpackage "REGEXP-TEST" (:use "LISP" "REGEXP"))
    (in-package "REGEXP-TEST")
    (defun has-suffix (string suffix)
      "Open a file and look for words ending in _ing."
      (with-open-file (f string)
        (with-loop-split (s f " ")
          (mapcar #'(lambda (x) (has_suffix suffix x)) s))))
    (defun has_suffix (suffix string)
      (let* ((suffix_len (length suffix))
             (string_len (length string))
             (base_len (- string_len suffix_len)))
        (if (string-equal suffix string :start1 0 :end1 NIL :start2 base_len :end2 NIL)
            (print string))))
    (has-suffix "test.txt" "ing")
Example Program Comparison (from NLTK)
C++
    #include <sys/types.h>
    #include <regex.h>
    #include <stdio.h>
    #define BUFFER_SIZE 1024
    int main(int argc, char **argv) {
        regex_t space_pat, ing_pat;
        char buffer[BUFFER_SIZE];
        regcomp(&space_pat, "[, \t\n]+", REG_EXTENDED);
        regcomp(&ing_pat, "ing$", REG_EXTENDED | REG_ICASE);
        while (fgets(buffer, BUFFER_SIZE, stdin) != NULL) {
            char *start = buffer;
            regmatch_t space_match;
            while (regexec(&space_pat, start, 1, &space_match, 0) == 0) {
                if (space_match.rm_so > 0) {
                    regmatch_t ing_match;
                    start[space_match.rm_so] = '\0';
                    if (regexec(&ing_pat, start, 1, &ing_match, 0) == 0)
                        printf("%s\n", start);
                }
                start += space_match.rm_eo;
            }
        }
        regfree(&space_pat);
        regfree(&ing_pat);
        return 0;
    }
Example Program Comparison (from NLTK)
Prolog
    main :-
        current_input(InputStream),
        read_stream_to_codes(InputStream, Codes),
        codesToWords(Codes, Words),
        maplist(string_to_list, Words, Strings),
        filter(endsWithIng, Strings, MatchingStrings),
        writeMany(MatchingStrings),
        halt.
    codesToWords([], []).
    codesToWords([Head | Tail], Words) :-
        (   char_type(Head, space) ->
            codesToWords(Tail, Words)
        ;   getWord([Head | Tail], Word, Rest),
            codesToWords(Rest, Words0),
            Words = [Word | Words0]
        ).
    getWord([], [], []).
    getWord([Head | Tail], Word, Rest) :-
        (   ( char_type(Head, space) ; char_type(Head, punct) )
        ->  Word = [], Tail = Rest
        ;   getWord(Tail, Word0, Rest), Word = [Head | Word0]
        ).
Example Program Comparison (from NLTK)
Prolog (continued)
    filter(Predicate, List0, List) :-
        (   List0 = [] -> List = []
        ;   List0 = [Head | Tail],
            (   apply(Predicate, [Head]) ->
                filter(Predicate, Tail, List1),
                List = [Head | List1]
            ;   filter(Predicate, Tail, List)
            )
        ).
    endsWithIng(String) :- sub_string(String, _Start, _Len, 0, 'ing').
    writeMany([]).
    writeMany([Head | Tail]) :- write(Head), nl, writeMany(Tail).
Running Python
Python executable is usually in the path already.
Usual location: /usr/bin/python
How to find out:
    > which python
Editors: Emacs, JEdit, Vim, others on www.python.org/editors (use one that understands Python indentation!)
Example interactive session.
First Python Program
Edit a file:
hello.py
    print "Hello, world!"
Run it from Linux shell:
    > python hello.py
    Hello, world!
First Python Program
Shebang: making it executable
“#” normally indicates a comment
“#!” tells the OS where to find python
hello.py
    #!/usr/bin/python
    print "Hello, world!"
Make the file executable:
    > chmod u+x hello.py
Run it:
    > ./hello.py
Core Python Types
Boolean: True, False
Integer: 3
Long: 99999999999999999999999999L
Float: 3.14
Complex: 2+3j
String: "Hello", "3.14", "John's", 'John\'s'
Lists: L1 = [1, 6, 3, 9]
Tuples: T1 = (1, 5)
Dictionaries: D1 = {"a": 1, "b": 2}
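The types above can be tried out directly. The following is a small sketch with illustrative variable names (only L1, T1 and D1 come from the slide). The L suffix is left off the large integer so that the code runs under both Python 2, where the value becomes a long automatically, and Python 3, which has a single int type.

```python
# Literals for the core types; names other than L1, T1, D1 are illustrative.
flag = True
count = 3
big = 99999999999999999999999999   # a long in Python 2, a plain int in Python 3
ratio = 3.14
z = 2 + 3j
name = "John's"                    # the same string as 'John\'s'
L1 = [1, 6, 3, 9]                  # list: mutable sequence
T1 = (1, 5)                        # tuple: immutable sequence
D1 = {"a": 1, "b": 2}              # dictionary: maps keys to values

L1.append(4)                       # lists can grow in place
D1["c"] = 3                        # dictionaries accept new keys

# Print the type name of each value.
for value in (flag, count, big, ratio, z, name, L1, T1, D1):
    print(type(value).__name__)
```

Trying T1[0] = 7 raises a TypeError, which is the practical difference between a tuple and a list.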
Operators
Arithmetic: +, -, *, /, **, %
Comparison: ==, !=, >, <, is, is not, in, not in
Logical: not, and, or
Bitwise operators
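A short sketch of the operators listed above; the values are arbitrary. One point worth noting as an assumption about the reader's interpreter: with two integers, / truncates in the Python 2 used in these slides but gives a float in Python 3, so the example uses // and a float operand to behave the same in both.

```python
x, y = 7, 2

# Arithmetic
print(x + y)      # 9
print(x - y)      # 5
print(x * y)      # 14
print(x ** y)     # 49
print(x % y)      # 1
print(x // y)     # 3   (floor division)
print(x / 2.0)    # 3.5 (a float operand gives true division)

# Comparison: == compares values, "is" compares object identity
a = [1, 2]
b = [1, 2]
print(a == b)       # True: equal contents
print(a is b)       # False: two distinct list objects
print(2 in a)       # True
print(3 not in a)   # True

# Logical
print(not (x > y) or (x != y and y < 10))   # True

# Bitwise
print(6 & 3)    # 2
print(6 | 3)    # 7
print(6 ^ 3)    # 5
print(1 << 4)   # 16
print(~0)       # -1
```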
Statements
Assignment: y = x ** 2
Function calls: writeStringToFile("myfile.txt", "Hello!")
If tests: if x > y: print "x is greater"
Iteration: for item in mylist: print item
Rather than for (int i=0; i<10; i++), Python uses for i in range(10).
Example interactive session.
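In place of the live session, here is a minimal sketch of the statement types above written as a script. The data in mylist is made up, and print is called with parentheses and a single argument so that the code runs under both Python 2 and Python 3.

```python
x = 4
y = x ** 2                      # assignment: y is now 16

if x > y:                       # if test, with an else branch added
    print("x is greater")
else:
    print("y is at least as large")

mylist = ["walking", "talked", "singing"]   # made-up example data
for item in mylist:             # iterate directly over the items
    print(item)

squares = []
for i in range(10):             # i takes the values 0, 1, ..., 9
    squares.append(i ** 2)
print(squares)
```

Note that indentation alone marks the body of each if and for; there are no braces or end keywords.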
Functions
Define with def statement:
    def printAdd(x, y):
        val = x + y
        print "x + y =", val
        return val
Use with function call:
    >>> y = printAdd(2, 5)
    x + y = 7
    >>> y
    7
Objects
Data structures with built-in functionality.
Contrast: reverse(mylist) with mylist.reverse()
Python strings are objects: mystring.isdigit()
Inheritance: customization and reuse of data structures.
Encapsulation: hiding implementation details.
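The ideas above can be illustrated with a small sketch. The classes Token and LowercaseToken are hypothetical, invented here for illustration; the code is written to run under both Python 2 and Python 3.

```python
class Token(object):
    """A word together with its part-of-speech tag (hypothetical example class)."""

    def __init__(self, text, tag):
        self._text = text       # leading underscore: internal by convention
        self._tag = tag

    def text(self):
        return self._text

    def is_noun(self):
        # Encapsulation: callers ask a question; how tags are stored stays hidden.
        return self._tag.startswith("N")


class LowercaseToken(Token):
    """Inheritance: reuses everything from Token and customizes only text()."""

    def text(self):
        return Token.text(self).lower()


mylist = [3, 1, 2]
mylist.reverse()                # method call: the list reverses itself in place
print(mylist)                   # [2, 1, 3]
print("2005".isdigit())         # True: strings are objects with methods

tok = LowercaseToken("Austin", "NNP")
print(tok.text())               # austin
print(tok.is_noun())            # True
```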
Demo: Useful Unix Command Line Utilities
Word count: wc
Sorting: sort
Reading a file: cat
Getting parts of files: head, tail
Filtering: grep
Getting unique elements: uniq
Translating characters: tr
Pasting lists together: paste
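For comparison, the kind of word-frequency count one might build by chaining tr, sort, uniq and head can also be sketched in Python. The function names and the sample text below are assumptions made for illustration; a plain dictionary is used so the code runs under both Python 2 and Python 3.

```python
def count_words(text):
    """Map each lowercased, whitespace-separated word to its frequency."""
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts


def top_words(text, n=3):
    """Return the n most frequent words as (word, count) pairs."""
    counts = count_words(text)
    # Sort by descending count, then alphabetically, so ties come out in a fixed order.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:n]


sample = "the cat saw the dog and the dog saw the cat"
for word, count in top_words(sample):
    print("%d %s" % (count, word))
# prints:
# 4 the
# 2 cat
# 2 dog
```

The shell tools are usually quicker for a one-off question; the Python version becomes worthwhile once the counts feed into further processing.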
Readings
M&S, Chapter 1, pp. 19-35. Chapter 4.