Nov 7, 2013

AC-IPSyn

The AC-IPSyn software automatically computes the Index of Productive Syntax (IPSyn) metric. It is a command-line system that runs on Linux.


SYSTEM REQUIREMENTS


Operating System:

The current version of the software runs on Linux/UNIX systems. If Linux/UNIX is not installed, you can do one of the following:

Install the Linux operating system: You can download Ubuntu 12.10 from http://www.ubuntu.com/download/desktop

Run the Linux system from a DVD or USB stick: You can run Ubuntu 12.10 from a DVD or USB stick. Please visit http://www.ubuntu.com/download/help/try-ubuntu-before-you-install for more details.

Install a virtual Linux machine on Windows using VMware: You can download the VMware Player for Windows at http://www.vmware.com/products/player/5_0. Download the 32-bit player for a 32-bit machine and the 64-bit player for a 64-bit machine. You will also need to download the Ubuntu 12.10 VMware image at http://www.thoughtpolice.co.uk/vmware/#ubuntu12.10



Programming Language:

1) Python

To check if Python is installed, type the following at the command prompt:

python --version

The version must be lower than 3. The system does not run with Python 3 or later.
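
The version rule above can be checked mechanically. The following is a minimal sketch, not part of the AC-IPSyn package: the helper name is_supported_python is hypothetical, and it assumes the version text has the usual "Python X.Y.Z" form printed by python --version.

```python
import re

def is_supported_python(version_output):
    """Return True if the reported major version is lower than 3.

    version_output is assumed to look like "Python 2.7.3"
    (hypothetical helper, not part of AC-IPSyn).
    """
    match = re.search(r"Python\s+(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version_output)
    return int(match.group(1)) < 3

print(is_supported_python("Python 2.7.3"))   # a 2.x interpreter is accepted
print(is_supported_python("Python 3.11.4"))  # a 3.x interpreter is rejected
```

The check only looks at the major version number, which is all the requirement above depends on.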

If python is not installed:

Download Python 2.7.3 from http://www.python.org/download/

Choose the Python 2.7.3 compressed source tarball (for Linux, Unix or Mac OS X).

2) Perl

To check if Perl is installed, type the following at the command prompt:

perl --version

If Perl is not installed, download and install the distribution available at http://www.perl.org/get.html#unix_like


TreeTagger


The AC-IPSyn package uses the TreeTagger software for determining unbound morphemes. TreeTagger is provided within the AC-IPSyn software package.


Charniak Parser:

The AC-IPSyn software uses the Charniak Parser to generate parses for child utterances. The AC-IPSyn package contains the source package for the Charniak parser.


INSTALLATION OF THE AC-IPSYN PACKAGE


Extract the package:

Execute the following command at the prompt:

tar -zvxf AC-IPSyn.tar.gz

This should create a directory named AC-IPSyn.

Navigate to this directory with the following command:

cd AC-IPSyn


Charniak Parser Compatibility:


To ensure that the Charniak parser binary is compatible with your system, run the following command from the AC-IPSyn directory:

./CharniakParser/parse.sh test.txt test.txt_parse


test.txt_parse should contain the parse of the sentence as follows:


(S1 (S (NP (DT This)) (VP (AUX is) (NP (DT a) (NN test) (NN sentence))) (. .)))

If there is an error, you may need to recompile the Charniak parser on your system.


Navigate to the CharniakParser directory as follows:

cd CharniakParser


Extract the source files from the source package as follows:


tar -zvxf CharniakParserPackage.tar.gz


Navigate to the CharniakParserPackage directory:

cd CharniakParserPackage


Execute the following commands:

rm *.o

make parseIt


There should now be a file called parseIt in the directory. To confirm, type the command:

ls parseIt

It should list the file "parseIt".


Copy this file to the CharniakParser directory using the command:

cp parseIt ..


Navigate back to the AC-IPSyn directory, which is two levels up from the CharniakParserPackage directory, using the command:

cd ../..


Run the following command from the AC-IPSyn directory:

./CharniakParser/parse.sh test.txt test.txt_parse


test.txt_parse should contain the parse of the sentence as follows:


(S1 (S (NP (DT This)) (VP (AUX is) (NP (DT a) (NN test) (NN sentence))) (. .)))
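
The comparison between test.txt_parse and the expected parse can be done by eye, or with a small script. The following is a sketch under the assumption that differences in whitespace and line breaks do not matter; the function names normalize and parse_matches are hypothetical and not part of the AC-IPSyn package.

```python
# Expected parse of the test sentence, as given in this manual.
EXPECTED_PARSE = (
    "(S1 (S (NP (DT This)) (VP (AUX is) "
    "(NP (DT a) (NN test) (NN sentence))) (. .)))"
)

def normalize(parse):
    """Collapse every run of whitespace (spaces, tabs, newlines) to one space."""
    return " ".join(parse.split())

def parse_matches(actual, expected=EXPECTED_PARSE):
    """Return True if the two parses are equal after whitespace normalization."""
    return normalize(actual) == normalize(expected)

# Example: the same parse wrapped over two lines still matches.
wrapped = "(S1 (S (NP (DT This))\n (VP (AUX is) (NP (DT a) (NN test) (NN sentence))) (. .)))\n"
print(parse_matches(wrapped))
```

To check a real run, the contents of test.txt_parse could be read with open("test.txt_parse").read() and passed to parse_matches.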


INPUT FORMAT


The AC-IPSyn software takes as input transcripts in the SALT or CHAT transcription format.

For a description of the SALT transcription conventions, please refer to the manual at http://www.saltsoftware.com/salt/TranConvSummary.pdf. For a description of the CHAT transcription conventions, please refer to the manual at http://childes.psy.cmu.edu/manuals/CHAT.pdf. Sample SALT transcripts are present in the sample_SALT_dir/input directory. Sample CHAT transcripts are present in the sample_CHAT_dir/input directory.


OUTPUT FORMAT

For each input transcript named transcript.EXT, where EXT is the extension (the norm is to use the .slt extension for SALT transcripts and the .cha extension for CHAT transcripts), the AC-IPSyn system produces the following files:


transcript.IPS

This file has the IPSyn computation results. It has the following:

o  A list of all the structures that were identified by the IPSyn system
o  A score for each of the structures, which can take a value between 0 and 2
o  A summary chart of the scores across the noun, verb, question/negation and sentence categories
o  IPSyn score
o  Sentence listing


transcript.RAW

This file has the modified IPSyn score results. Here, the score for each structure is not capped at 2. Please note that all other restrictions, such as those of uniqueness and exceptions, are taken into account while computing the modified IPSyn score. The result file has the following:

o  A list of all the structures that were identified by the IPSyn system
o  A score for each of the structures, which can take a value greater than or equal to zero
o  A summary chart of the scores across the noun, verb, question/negation and sentence categories
o  Modified IPSyn score
o  Sentence listing


transcript.PRP

This file contains the preprocessed transcript and is an intermediate file used by the AC-IPSyn system.


transcript.PARSE

This file contains the parses of the transcript and is an intermediate file used by the AC-IPSyn system.


The output files will be present in the output directories that are described next.
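
The difference between the per-structure scores in the .IPS and .RAW files can be illustrated with a short sketch. It assumes that each total is the sum of the per-structure scores and that the counts already reflect the uniqueness and exception rules; the structure labels and counts below are invented for illustration, and ipsyn_scores is a hypothetical name, not a function of the AC-IPSyn package.

```python
def ipsyn_scores(structure_counts, cap=2):
    """Return (capped_total, uncapped_total) for a dict of per-structure counts.

    capped_total limits each structure to `cap` points (as in the .IPS file);
    uncapped_total adds the counts without a limit (as in the .RAW file).
    """
    capped_total = sum(min(count, cap) for count in structure_counts.values())
    uncapped_total = sum(structure_counts.values())
    return capped_total, uncapped_total

# Invented example counts for four structures.
counts = {"N1": 3, "N2": 1, "V1": 0, "Q1": 5}
ips_total, raw_total = ipsyn_scores(counts)
print(ips_total)  # 2 + 1 + 0 + 2 = 5
print(raw_total)  # 3 + 1 + 0 + 5 = 9
```

A structure seen five times contributes 2 to the first total and 5 to the second, which is why the modified score can exceed the standard one.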


OUTPUT DIRECTORIES


The following directories are created in the output directory specified by the user:

1. preprocessed

This directory contains the preprocessed transcripts. These are intermediate files generated by the AC-IPSyn system.

2. parses

This directory contains the parses of the transcripts. These are intermediate files generated by the AC-IPSyn system.

3. results

This directory contains the IPSyn score for each transcript.

4. raw

This directory contains the modified IPSyn score, where each structure can have a score of more than 2.
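
The relationship between an input transcript, the four output directories and the four file extensions can be sketched as follows. This is an illustration based on the directory and file names described in this manual; expected_outputs is a hypothetical helper, not part of the AC-IPSyn package.

```python
import os

# Output extension -> subdirectory of the user-specified output directory.
SUBDIR_FOR_EXTENSION = {
    "PRP": "preprocessed",
    "PARSE": "parses",
    "IPS": "results",
    "RAW": "raw",
}

def expected_outputs(transcript_name, output_dir):
    """Return a dict mapping each output extension to its expected file path."""
    # Drop any leading directory and the input extension (.slt, .cha, ...).
    base = os.path.splitext(os.path.basename(transcript_name))[0]
    return {
        ext: os.path.join(output_dir, subdir, base + "." + ext)
        for ext, subdir in SUBDIR_FOR_EXTENSION.items()
    }

paths = expected_outputs("sample_CHAT_dir/input/fssli009.cha", "CHAT_output")
for ext in sorted(paths):
    print(paths[ext])
```

Such a mapping can be useful after a batch run to confirm that every transcript produced all four files.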


RUNNING THE AC-IPSYN SYSTEM


Run the command python source/generateIPSyn.py at the shell prompt:

-bash-4.0$ python source/generateIPSyn.py


The following explains the prompts one by one.


==Prompt:


1. Choose 1 if input is a SALT transcript file

2. Choose 2 if input is a CHAT transcript file

==Explanation:

Choose 1 to process a SALT transcript, 2 to process a CHAT transcript


==Prompt:

Enter the speaker ID of the child (case sensitive)

==Explanation:

The speaker ID identifies the speakers in the transcript and is case sensitive. The AC-IPSyn software extracts the utterances with this speaker ID.

NOTE: When batch processing multiple transcripts, the AC-IPSyn system assumes that all the transcripts use the same child label.

For the example files provided, enter C for the SALT transcripts and CHI for the CHAT transcripts.
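
To show why the case of the speaker ID matters, here is a simplified sketch of selecting a child's utterances from CHAT-style lines, where main-tier lines begin with an asterisk, the speaker ID and a colon (for example *CHI:). This is not the AC-IPSyn preprocessing code: child_utterances is a hypothetical helper, the sample lines are invented, and continuation lines and dependent tiers are ignored.

```python
def child_utterances(chat_lines, speaker_id):
    """Return the text of main-tier lines spoken by `speaker_id`.

    The comparison is case sensitive, so "CHI" does not match "*chi:".
    """
    prefix = "*" + speaker_id + ":"
    return [
        line[len(prefix):].strip()
        for line in chat_lines
        if line.startswith(prefix)
    ]

# Invented sample transcript lines.
sample = [
    "@Begin",
    "*MOT:\twhat is that ?",
    "*CHI:\ta doggie .",
    "*chi:\tlowercase label .",
    "*CHI:\tdoggie run .",
    "@End",
]
print(child_utterances(sample, "CHI"))
```

With the speaker ID CHI, the line labelled *chi: is skipped, which mirrors the case-sensitive behaviour described above.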


==Prompt:

1. Choose 1 if input is a single file

2: Choose 2 if input is a directory

==Explanation:

Choose 1 if you wish to compute the IPSyn score for a single transcript, 2 if you wish to compute the IPSyn score for all the transcripts in a directory.


==Prompt:

Enter the name of the transcript:

==Explanation:

If you choose to process a single transcript, enter the name of the transcript file. Please note that the name of the transcript is case sensitive and includes the extension.


==Prompt:

Enter the name of directory containing transcripts:

==Explanation:

If you choose to process all the transcripts in a directory, enter the name of the directory
containing the transcripts.


==Prompt:

Enter the name of the directory where output files will be stored:

==Explanation:

Give the name of the directory where you want to store the output. If this directory does not exist, a new one will be created. If the directory exists, all the contents of the directory will be overwritten with the new results.


==Prompt:

1. Generate IPSyn Score on first 100 utterances

2. Generate IPSyn Score on entire transcript

3. Generate IPSyn Score on a range of utterances

==Explanation:

The IPSyn score is computed based on the utterances in the preprocessed file.

Select '1' if you wish to compute the IPSyn score for the first 100 utterances.

Select '2' if you wish to compute the IPSyn score for the entire transcript.

Select '3' if you wish to compute the IPSyn score for a range of utterances.

If you select '3', the range is calculated based on the utterances in the preprocessed transcript. If the beginning of the given range is larger than the number of utterances in the preprocessed transcript, the IPSyn score will not be computed. However, the preprocessed transcript and parsed transcript will still be stored in the output directory.
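
The three options can be summarized in a short sketch. It is an illustration only: select_utterances is a hypothetical helper, and it assumes that the range in option 3 is 1-based and inclusive, which this manual does not state.

```python
def select_utterances(utterances, option, start=None, end=None):
    """Return the utterances to score, or None if no score can be computed.

    option 1: first 100 utterances
    option 2: entire transcript
    option 3: utterances start..end (assumed 1-based and inclusive)
    """
    if option == 1:
        return utterances[:100]
    if option == 2:
        return list(utterances)
    if option == 3:
        # A range that begins past the end of the transcript yields no score.
        if start > len(utterances):
            return None
        return utterances[start - 1:end]
    raise ValueError("option must be 1, 2 or 3")

# Invented transcript of 150 preprocessed utterances.
utterances = ["u%d" % i for i in range(1, 151)]
print(len(select_utterances(utterances, 1)))
print(select_utterances(utterances, 3, 10, 12))
print(select_utterances(utterances, 3, 200, 210))
```

In the last call the range starts at 200 but only 150 utterances exist, so the sketch returns None, corresponding to the case where no IPSyn score is computed.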



TESTING THE AC-IPSYN SYSTEM

The directories sample_SALT_dir and sample_CHAT_dir have sample SALT and CHAT transcripts along with the expected output.


sample_SALT_dir/input contains the input SALT transcripts

sample_SALT_dir/preprocessed contains the preprocessed SALT transcripts

sample_SALT_dir/parses contains the parses of the SALT transcripts

sample_SALT_dir/results contains the IPSyn score for the SALT transcripts

sample_SALT_dir/raw contains the results when the IPSyn score for each structure is not capped at 2


sample_CHAT_dir/input contains the input CHAT transcripts

sample_CHAT_dir/preprocessed contains the preprocessed CHAT transcripts

sample_CHAT_dir/parses contains the parses of the CHAT transcripts

sample_CHAT_dir/results contains the IPSyn score for the CHAT transcripts

sample_CHAT_dir/raw contains the results when the IPSyn score for each structure is not capped at 2

TEST THE SYSTEM WITH SALT TRANSCRIPTS

Type the following command at the AC-IPSyn directory:

python source/generateIPSyn.py


When prompted:

1. Choose 1 if input is a SALT transcript file

2. Choose 2 if input is a CHAT transcript file

Choose 1


When prompted:

Enter the speaker ID of the child (case sensitive)

Enter C


When prompted:

1. Choose 1 if input is a single file

2: Choose 2 if input is a directory

Enter 2


When prompted:

Enter the name of directory containing transcripts:

Enter sample_SALT_dir/input


When prompted:

Enter the name of the directory where output files will be stored:

Enter the directory name where you want to store the output. Four subdirectories will be created: preprocessed, parses, results and raw.


If this directory exists and contains any of the four subdirectories (preprocessed, parses, results and raw), you will be prompted as follows:

One of the directories preprocessed, parses, results or raw is present in the output directory

Do you want to overwrite the contents of these directories (Y/N)?

Enter Y if you want to overwrite the contents. The contents of the directories preprocessed, parses, results and raw will be overwritten.

Enter N if you want to give a different output directory name. You will see the prompt for entering the output directory. Specify a different output directory name where you want to store the output.
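
Before starting a run, you may want to know in advance whether the overwrite prompt will appear. The sketch below lists which of the four subdirectories already exist in a chosen output directory. It is a hypothetical helper (existing_output_subdirs is not part of the AC-IPSyn package), and the temporary directory is used only to make the example self-contained.

```python
import os
import tempfile

# The four subdirectories that AC-IPSyn creates in the output directory.
OUTPUT_SUBDIRS = ("preprocessed", "parses", "results", "raw")

def existing_output_subdirs(output_dir):
    """Return the names of the output subdirectories already present."""
    return [
        name for name in OUTPUT_SUBDIRS
        if os.path.isdir(os.path.join(output_dir, name))
    ]

# Demonstration with a throwaway directory that contains only "parses".
demo_dir = tempfile.mkdtemp()
os.mkdir(os.path.join(demo_dir, "parses"))
print(existing_output_subdirs(demo_dir))
```

An empty list suggests the directory can be used without triggering the overwrite question; a non-empty list names the subdirectories whose contents would be replaced if you answer Y.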


When prompted:

1. Generate IPSyn Score on first 100 utterances

2. Generate IPSyn Score on entire transcript

3. Generate IPSyn Score on a range of utterances

Enter 1
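
The answers given in the steps above could also be prepared as a single block of text, one answer per line, in the order in which the prompts appear. The sketch below only builds that text. Whether generateIPSyn.py accepts answers piped through standard input is an assumption that this manual does not confirm, and the sketch also assumes that the output directory does not exist yet, so the overwrite prompt does not appear. The name build_answers is hypothetical.

```python
def build_answers(transcript_format, speaker_id, input_dir, output_dir,
                  utterance_option=1):
    """Return the prompt answers for a directory run, one per line.

    Order: transcript format, speaker ID, "2" (input is a directory),
    input directory, output directory, utterance option.
    """
    format_choice = {"SALT": "1", "CHAT": "2"}[transcript_format]
    answers = [
        format_choice,
        speaker_id,
        "2",
        input_dir,
        output_dir,
        str(utterance_option),
    ]
    return "\n".join(answers) + "\n"

salt_answers = build_answers("SALT", "C", "sample_SALT_dir/input", "SALT_output")
print(salt_answers)

# Assumption, untested: the text could then be fed to the script, e.g.
#   python source/generateIPSyn.py < answers.txt
```

If the script does not read from standard input in this way, the answers must be typed at the prompts as described above.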


To display the contents of a directory, use the command ls.

Screen Shot


-bash-4.0$ python source/generateIPSyn.py
1. Choose 1 if input is a SALT transcript file
2. Choose 2 if input is a CHAT transcript file
1
Enter the speaker ID of the child (case sensitive)
C
1. Choose 1 if input is a single file
2: Choose 2 if input is a directory
2
Enter the name of directory containing transcripts:
sample_SALT_dir/input
Enter the name of the directory where output files will be stored:
SALT_output
1. Generate IPSyn Score on first 100 utterances
2. Generate IPSyn Score on entire transcript
3. Generate IPSyn Score on a range of utterances
1

Preprocessing the transcripts
Andy-Nar-SSS
Kelina-Con
Laura-APNF
Maria-FWAY-English
Parsing the transcripts
Andy-Nar-SSS
16467: old priority 0, new priority 5
Kelina-Con
16484: old priority 0, new priority 5
Laura-APNF
16501: old priority 0, new priority 5
Maria-FWAY-English
16518: old priority 0, new priority 5

Computing IPSyn score
Andy-Nar-SSS
using tree-tagger
reading parameters ...
tagging ...
finished.
Kelina-Con
Laura-APNF
using tree-tagger
reading parameters ...
tagging ...
finished.
using tree-tagger
reading parameters ...
tagging ...
finished.
using tree-tagger
reading parameters ...
tagging ...
finished.
using tree-tagger
reading parameters ...
tagging ...
finished.
Maria-FWAY-English


-bash-4.0$ ls SALT_output/*

SALT_output/parses:
Andy-Nar-SSS.PARSE      Kelina-Con.PARSE.bak  Maria-FWAY-English.PARSE
Andy-Nar-SSS.PARSE.bak  Laura-APNF.PARSE      Maria-FWAY-English.PARSE.bak
Kelina-Con.PARSE        Laura-APNF.PARSE.bak

SALT_output/preprocessed:
Andy-Nar-SSS.PRP  Kelina-Con.PRP  Laura-APNF.PRP  Maria-FWAY-English.PRP

SALT_output/raw:
Andy-Nar-SSS.RAW  Kelina-Con.RAW  Laura-APNF.RAW  Maria-FWAY-English.RAW

SALT_output/results:
Andy-Nar-SSS.IPS  Kelina-Con.IPS  Laura-APNF.IPS  Maria-FWAY-English.IPS


TEST THE SYSTEM WITH CHAT TRANSCRIPTS


Type the following command at the AC-IPSyn directory:


python source/generateIPSyn.py


When prompted:

1. Choose 1 if input is a SALT transcript file

2. Choose 2 if input is a CHAT transcript file

Choose 2


When prompted:

Enter the speaker ID of the child (case sensitive)

Enter CHI


When prompted:

1. Choose 1 if input is a single file

2: Choose 2 if input is a directory

Enter 2


When prompted:

Enter the name of directory containing transcripts:

Enter sample_CHAT_dir/input


When prompted:

Enter the name of the directory where output files will be stored:

Enter the directory name where you want to store the output. If this directory exists, the contents of the directory will be overwritten.

Four subdirectories will be created: preprocessed, parses, results and raw.


When prompted:

1. Generate IPSyn Score on first 100 utterances

2. Generate IPSyn Score on entire transcript

3. Generate IPSyn Score on a range of utterances

Enter 1


To display the contents of a directory, use the command ls.


Screen Shot


-bash-4.0$ python source/generateIPSyn.py
1. Choose 1 if input is a SALT transcript file
2. Choose 2 if input is a CHAT transcript file
2
Enter the speaker ID of the child (case sensitive)
CHI
1. Choose 1 if input is a single file
2: Choose 2 if input is a directory
2
Enter the name of directory containing transcripts:
sample_CHAT_dir/input
Enter the name of the directory where output files will be stored:
CHAT_output
1. Generate IPSyn Score on first 100 utterances
2. Generate IPSyn Score on entire transcript
3. Generate IPSyn Score on a range of utterances
1

Preprocessing the transcripts
fssli009
fssli058
fssli062
fssli066
fssli108
fssli113
fssli501
fssli519
fssli526
fssli528
fssli536
fssli568
fssli576
fssli589
fssli591
fssli592
fssli599
fssli608
fssli613
Parsing the transcripts
fssli009
16670: old priority 0, new priority 5
fssli058
16687: old priority 0, new priority 5
fssli062
16704: old priority 0, new priority 5
fssli066
16721: old priority 0, new priority 5
fssli108
16738: old priority 0, new priority 5
fssli113
16755: old priority 0, new priority 5
fssli501
16772: old priority 0, new priority 5
fssli519
16789: old priority 0, new priority 5
fssli526
16848: old priority 0, new priority 5
fssli528
16865: old priority 0, new priority 5
fssli536
16882: old priority 0, new priority 5
fssli568
16899: old priority 0, new priority 5
fssli576
16916: old priority 0, new priority 5
fssli589
16933: old priority 0, new priority 5
fssli591
16950: old priority 0, new priority 5
fssli592
16967: old priority 0, new priority 5
fssli599
16984: old priority 0, new priority 5
fssli608
17001: old priority 0, new priority 5
fssli613
17018: old priority 0, new priority 5

Computing IPSyn score
fssli009
fssli058
fssli062
using tree-tagger
reading parameters ...
tagging ...
finished.
using tree-tagger
reading parameters ...
tagging ...
finished.
using tree-tagger
reading parameters ...
tagging ...
finished.
using tree-tagger
reading parameters ...
tagging ...
finished.
using tree-tagger
reading parameters ...
tagging ...
finished.
using tree-tagger
reading parameters ...
tagging ...
finished.
fssli066
using tree-tagger
reading parameters ...
tagging ...
finished.
fssli108
using tree-tagger
reading parameters ...
tagging ...
finished.
using tree-tagger
reading parameters ...
tagging ...
finished.
fssli113
fssli501
using tree-tagger
reading parameters ...
tagging ...
finished.
using tree-tagger
reading parameters ...
tagging ...
finished.
using tree-tagger
reading parameters ...
tagging ...
finished.
using tree-tagger
reading parameters ...
tagging ...
finished.
fssli519
using tree-tagger
reading parameters ...
tagging ...
finished.
using tree-tagger
reading parameters ...
tagging ...
finished.
using tree-tagger
reading parameters ...
tagging ...
finished.
using tree-tagger
reading parameters ...
tagging ...
finished.
using tree-tagger
reading parameters ...
tagging ...
finished.
using tree-tagger
reading parameters ...
tagging ...
finished.
fssli526
using tree-tagger
reading parameters ...
tagging ...
finished.
using tree-tagger
reading parameters ...
tagging ...
finished.
using tree-tagger
reading parameters ...
tagging ...
finished.
fssli528
using tree-tagger
reading parameters ...
tagging ...
finished.
using tree-tagger
reading parameters ...
tagging ...
finished.
fssli536
using tree-tagger
reading parameters ...
tagging ...
finished.
using tree-tagger
reading parameters ...
tagging ...
finished.
fssli568
using tree-tagger
reading parameters ...
tagging ...
finished.
fssli576
using tree-tagger
reading parameters ...
tagging ...
finished.
using tree-tagger
reading parameters ...
tagging ...
finished.
fssli589
fssli591
using tree-tagger
reading parameters ...
tagging ...
finished.
fssli592
using tree-tagger
reading parameters ...
tagging ...
finished.
using tree-tagger
reading parameters ...
tagging ...
finished.
fssli599
using tree-tagger
reading parameters ...
tagging ...
finished.
fssli608
fssli613
using tree-tagger
reading parameters ...
tagging ...
finished.
using tree-tagger
reading parameters ...
tagging ...
finished.
using tree-tagger
reading parameters ...
tagging ...
finished.
using tree-tagger
reading parameters ...
tagging ...
finished.
using tree-tagger
reading parameters ...
tagging ...
finished.

-bash-4.0$ ls CHAT_output/*

CHAT_output/parses:
fssli009.PARSE      fssli113.PARSE      fssli536.PARSE      fssli592.PARSE
fssli009.PARSE.bak  fssli113.PARSE.bak  fssli536.PARSE.bak  fssli592.PARSE.bak
fssli058.PARSE      fssli501.PARSE      fssli568.PARSE      fssli599.PARSE
fssli058.PARSE.bak  fssli501.PARSE.bak  fssli568.PARSE.bak  fssli599.PARSE.bak
fssli062.PARSE      fssli519.PARSE      fssli576.PARSE      fssli608.PARSE
fssli062.PARSE.bak  fssli519.PARSE.bak  fssli576.PARSE.bak  fssli608.PARSE.bak
fssli066.PARSE      fssli526.PARSE      fssli589.PARSE      fssli613.PARSE
fssli066.PARSE.bak  fssli526.PARSE.bak  fssli589.PARSE.bak  fssli613.PARSE.bak
fssli108.PARSE      fssli528.PARSE      fssli591.PARSE
fssli108.PARSE.bak  fssli528.PARSE.bak  fssli591.PARSE.bak

CHAT_output/preprocessed:
fssli009.PRP  fssli108.PRP  fssli526.PRP  fssli576.PRP  fssli599.PRP
fssli058.PRP  fssli113.PRP  fssli528.PRP  fssli589.PRP  fssli608.PRP
fssli062.PRP  fssli501.PRP  fssli536.PRP  fssli591.PRP  fssli613.PRP
fssli066.PRP  fssli519.PRP  fssli568.PRP  fssli592.PRP

CHAT_output/raw:
fssli009.RAW  fssli108.RAW  fssli526.RAW  fssli576.RAW  fssli599.RAW
fssli058.RAW  fssli113.RAW  fssli528.RAW  fssli589.RAW  fssli608.RAW
fssli062.RAW  fssli501.RAW  fssli536.RAW  fssli591.RAW  fssli613.RAW
fssli066.RAW  fssli519.RAW  fssli568.RAW  fssli592.RAW

CHAT_output/results:
fssli009.IPS  fssli108.IPS  fssli526.IPS  fssli576.IPS  fssli599.IPS
fssli058.IPS  fssli113.IPS  fssli528.IPS  fssli589.IPS  fssli608.IPS
fssli062.IPS  fssli501.IPS  fssli536.IPS  fssli591.IPS  fssli613.IPS
fssli066.IPS  fssli519.IPS  fssli568.IPS  fssli592.IPS