Brief Evaluation of Optical Character Recognition in Relation to the ELN

Nov 20, 2013

Introduction

Some scientists in Steve Ley’s group have been using a digital pen to capture hand-written notes whilst working in the laboratory, which they later transfer into their experiments in the Electronic Lab Notebook (ELN). The pen they have used was purchased inexpensively and came with software that captured the image in .pegvf file format, which can be converted to graphics file formats, e.g. EMF.

The CamELS project purchased an “e-pens Mobile Notes” digital pen. This device uses the same proprietary file format (pegvf) as the Ley Group’s pen, but was supplied with the MyScript Notes software for converting handwriting into digital text, editing converted text and exporting results to Microsoft Word or other applications.

Conversion of handwriting to text before adding it to the ELN should make the information easier and faster to read, particularly for someone other than the original author. It also makes the information available for conventional searching and for potential future data-mining. However, with current software, there is always at least some time overhead. At the very least, the conversion to text adds one extra step to the process of transferring data into the ELN; there is an additional overhead of checking that the conversion to text has recognised characters as intended and making any necessary corrections.


Description

Ben Deadman provided four files in the .pegvf format generated by his digital pen.


I imported the files into Note Manager. With MyScript Notes running, I used Note Manager’s Convert to Text functionality to transfer the files into MyScript Notes ready for conversion. Convert to Text behaved inconsistently: sometimes it attempted the conversion itself and opened Notepad instead of transferring to MyScript Notes, but I have not yet been able to fully characterise this behaviour.


Within MyScript Notes, I originally used the Global conversion settings. Ben’s username, BJD0489, was not correctly converted until I added it to the dictionary. Once in the dictionary, it was recognised on two of the three occasions on which it appeared; on the third occasion an extra mark near the “B” resulted in the username being treated as a number.



For the notes with chemical structures, I highlighted the structures as “special content areas” of type “Freeform drawing”, which meant they transferred over without any conversion. I experimented with including “Text” in the content type, but this was generally unsuccessful, with most of the lines representing chemical bonds being removed.


Ben’s notes were not written with text recognition in mind, and he had not been briefed on any of the recommended practice for MyScript Notes.


The MyScript Notes software can set up individual profiles for different users, based on writing about two A4 sheets’ worth of training text, which includes numbers and special characters. However, creation of a profile is recommended only for those who “write in an unusual style”, so would be expected to have limited impact for Ben’s writing.


Time savings may be possible using Auto replacements, which allow the user to define abbreviations, initials or acronyms that will be automatically replaced with a full expression or name during the conversion to text.
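The idea behind Auto replacements can be illustrated with a minimal Python sketch. This is not how MyScript Notes implements the feature; the function name `expand_abbreviations` and the replacement table are hypothetical, and the expansions shown are assumed from common chemical usage rather than taken from Ben’s notes.

```python
import re

def expand_abbreviations(text, replacements):
    """Replace whole-word abbreviations with their full expressions."""
    if not replacements:
        return text
    # Try longer abbreviations first so overlapping keys match greedily.
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: replacements[m.group(1)], text)

# Hypothetical user-defined table (assumed expansions, for illustration only).
replacements = {
    "DVB": "divinylbenzene",
    "AIBN": "azobisisobutyronitrile",
    "THF": "tetrahydrofuran",
}

print(expand_abbreviations("27.9mg AIBN, flush with THF", replacements))
```

Matching on word boundaries means an abbreviation embedded in a longer token is left alone, which matters when notes mix reagent names with other short strings.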


Conclusion

In my opinion, the conversion was most successful where the note comprised words found in the dictionary. The conversion seems to struggle where letters and numbers are interspersed, where the text has been altered after the initial writing, or where extraneous pen strokes have been recorded.
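Since mixed letter-and-number strings appear to be the weakest point, the manual check of converted text could concentrate on them. The following is a rough Python sketch of such a check; `flag_mixed_tokens` is a hypothetical helper, not a feature of MyScript Notes, and it assumes the converted text is available as a plain string.

```python
import re

def flag_mixed_tokens(text):
    """Return whitespace-separated tokens containing both a letter and a digit."""
    flagged = []
    for token in text.split():
        has_letter = re.search(r"[A-Za-z]", token) is not None
        has_digit = re.search(r"\d", token) is not None
        if has_letter and has_digit:
            flagged.append(token)
    return flagged

# Example line taken from the converted output of Note 2.
print(flag_mixed_tokens("BJD0489 started heating to 80C at 14:00"))
```

A check like this only points the reader at likely trouble spots; it cannot tell whether a flagged token was actually converted correctly.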



Conversion rates could probably be increased by:

- Adding words used to the dictionary
- Marking errors by putting an “X” through them
- Avoiding making extraneous marks


For the use of character recognition to be time efficient, the one-off overhead of conversion and checking the output must be outweighed by the potentially repeated overhead involved when readers struggle to decipher handwriting. However, the longer-term implications of information being unavailable to text-mining should also be considered.
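The trade-off can be made concrete with a simple break-even calculation, sketched here in Python. The function `break_even_reads` and both input figures are hypothetical; no timings were measured in this evaluation, and the model deliberately ignores the text-mining benefit.

```python
import math

def break_even_reads(conversion_minutes, extra_minutes_per_read):
    """Number of reads at which the one-off conversion cost is recovered."""
    if extra_minutes_per_read <= 0:
        raise ValueError("extra_minutes_per_read must be positive")
    return math.ceil(conversion_minutes / extra_minutes_per_read)

# Illustrative figures only: 10 minutes to convert and check a note,
# 2 extra minutes each time someone has to decipher the handwriting.
print(break_even_reads(conversion_minutes=10, extra_minutes_per_read=2))
```

Under these assumed figures, conversion pays for itself once the note has been read five times; a note read less often than that would be cheaper to leave as handwriting, unless it needs to be searchable.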

Perhaps the best compromise would be to convert to text only those notes that contain key information, especially results such as yield, which may be important for future searching and data-mining.







Note 1

BJD0489
1.100 ml DVB
11.73649mL VBC
2.749 mL I-dole carol
27.9mg AIBN
Need to make more than the column volume.
=> Aim for 5.8mL
20T. DUB = 5.8mL +0.2 = 1.16 ml
got. VBC = 1.74mL
50% duodecimal = 2.9mL
Combined monomer amass. == f<0609+1.8849 = 2--9449
=> l%1. = 29.4mg 31.3 mg used
for a 20 h.

Note 2

BJD0489
started heating to 80C at 14:00 leave at 80C for 20k (10am Wed).
Cool to it, recoding ends, flush with THF., Look at backpressure.
Take sample for ma elemental analysis.

Note 3

735170528
e, n,
.
t-419, ma too, ETOH
1.1996, and
-
901.0 mg
1.37mL d: Et oxalate
20 ml ETOH

Note 4

meta-selectivity
.
.
JACS 22011, 133, 6964
JACS 2011 (133, 19890
Native 2012 in press Leon, P., Li, G.