Magerman - APE-INV

In search of anti-commons: Academic patenting and patent-paper pairs in biotechnology. An analysis of citation flows.

Tom Magerman, Bart Van Looy, Koenraad Debackere

(tom.magerman@econ.kuleuven.be)


INCENTIM (International Centre for Studies in Entrepreneurship and Innovation Management)

K.U.Leuven Managerial Economics, Strategy & Innovation

ECOOM (Centre for R&D Monitoring)


ESF-APE-INV workshop Scientists & Inventors
10-11/5/2012

1957

University-Industry linkages

TECHNOLOGY <-> SCIENCE
Scientification of technology
Commercialization of science (Entrepreneurial University)


University-Industry linkages

+
Complementarities
Generation of new research ideas
Additional funding
Create a market of ideas

-
Crowding out
Quality
Research orientation
Anti-commons and the end of open science


Anti-commons and the end of open science

“If I have seen a little further [than you and Descartes] it is by standing on the shoulders of Giants.”

Isaac Newton, letter to Robert Hooke (originated from John of Salisbury)

Anti-commons and the end of open science

Tragedy of the anticommons: underuse of scarce resources because too many owners can block each other
=> more intellectual property rights may lead paradoxically to fewer useful products
- On the one hand, incentive to undertake risky research
- On the other hand, too many owners hold rights in previous discoveries that constitute obstacles to future research
=> high transaction costs lead to inefficiencies

Biomedical research has been moving from a commons model toward a privatization model
=> risk of anticommons tragedy
- Influenced by patent system: what is patentable (e.g. patents on gene fragments)
- Influenced by patent owner: licensing behavior (e.g. use of reach-through license agreements)

Transition or tragedy? Find ways to lower transaction costs of bundling rights (intermediate organizations; patent pools; cross-licensing)


Anti-commons and the end of open science

Expansion of IPR is privatizing the scientific commons and limiting scientific progress
- Heller and Eisenberg (1998); Argyres and Liebeskind (1998); David (2000); Lessig (2002); Etzkowitz (1998); Krimsky (2003)

Murray and Stern (2007): “Do formal intellectual property rights hinder the free flow of scientific knowledge? An empirical test of the anti-commons hypothesis”
- How do IPRs affect the propensity of future researchers to build upon knowledge?
- Compare citation patterns of publications in the pre-grant period and after grant
- 169 patent-paper pairs (Nature Biotechnology)
- Modest anti-commons effect: decline in citation rate by 10 to 20%

Detection of patent-publication pairs

Text Mining

Text mining refers to the automated extraction of knowledge and information from text by means of revealing relationships and patterns present, but not obvious, in a document collection.

Related to data mining, but with additional issues:
- other scale of dimensionality (100,000+ ‘variables’)
- different kind of variables (not really independent, and very, very sparse: 99.99%)
- language issues (homonymy/polysemy and synonymy)


Latent Semantic Analysis (LSA)

LSA was developed in the late 1980s at BellCore/Bell Laboratories by Landauer and his Cognitive Science Research team:

“Latent Semantic Analysis (LSA) is a theory and method for extracting and representing the meaning of words. Meaning is estimated using statistical computations applied to a large corpus of text. The corpus embodies a set of mutual constraints that largely determine the semantic similarity of words and sets of words. These constraints can be solved using linear algebra methods, in particular, singular value decomposition.”

- LSA is a technique for analyzing text: extract (underlying or latent) meaning from text
- LSA is a theory of meaning: meaning is acquired by solving an enormous set of simultaneous equations that capture the contextual usage of words
- LSA is a new approach to cognitive science: use large text corpora to test cognitive theories

Linear algebra problem

- The meaning of a passage of text must be the sum of the meanings of its words.
- LSA models a large corpus of text as a large set of simultaneous equations.
- The solution is in the form of a set of vectors, one for each word and passage, in a semantic space.
- Similarity of meaning of two words is measured by the cosine between their vectors, and the similarity of two passages by the same measure on the sum or average of all their contained words.
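The cosine measure described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the 3-dimensional word vectors are invented for the example, and `add_vectors` and `cosine` are hypothetical helper names.

```python
import math

def add_vectors(vectors):
    """Sum word vectors component-wise to obtain a passage vector."""
    return [sum(components) for components in zip(*vectors)]

def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical word vectors in a 3-dimensional semantic space (illustration only).
word_vectors = {
    "gene": [0.9, 0.1, 0.0],
    "protein": [0.8, 0.3, 0.1],
    "patent": [0.0, 0.2, 0.9],
    "license": [0.1, 0.1, 0.8],
}

# A passage is represented by the sum of the vectors of its words.
passage_a = add_vectors([word_vectors["gene"], word_vectors["protein"]])
passage_b = add_vectors([word_vectors["patent"], word_vectors["license"]])

word_similarity = cosine(word_vectors["gene"], word_vectors["protein"])
passage_similarity = cosine(passage_a, passage_b)
print(word_similarity, passage_similarity)
```

With these made-up vectors the two biology words are far more similar to each other than the two passages are, which is the behaviour the cosine measure is meant to capture.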

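SVD dimensionality reduction

Singular Value Decomposition:

$$A_{m \times n} = U \, \Sigma \, V^{T}$$

with $\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_n)$ a diagonal matrix of singular values.

Dimensionality reduction by taking the first k singular values (rank-k approximation):

$$A_{m \times n} \approx A_k = U_{m \times k} \, \Sigma_{k \times k} \, V^{T}_{k \times n}$$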
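A rank-k approximation of this kind can be sketched with NumPy. This is an illustrative example under stated assumptions, not the code used in the study (the slides mention Matlab): the 4 x 3 matrix is made up, and `truncated_svd` is a hypothetical helper name.

```python
import numpy as np

def truncated_svd(a, k):
    """Return U_k, the k largest singular values, and V_k^T of matrix a."""
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :]

# Hypothetical 4 x 3 matrix (4 documents, 3 terms), for illustration only.
a = np.array([
    [2.0, 0.0, 1.0],
    [1.0, 0.0, 1.0],
    [0.0, 3.0, 0.0],
    [0.0, 1.0, 0.0],
])

u_k, s_k, vt_k = truncated_svd(a, k=2)
a_k = u_k @ np.diag(s_k) @ vt_k      # rank-2 approximation of a
error = np.linalg.norm(a - a_k)      # Frobenius norm of the residual
print(a_k.round(3))
print(error)
```

Because only one singular value is discarded here, the Frobenius norm of the residual equals that discarded singular value.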

Practical application?

Even when using LSA/SVD as text mining method, many options remain!
- Pre-processing
- Term weighting
- SVD truncation

Assessment of 40 measure variants

4 weighting methods x (9 SVD truncation levels + no SVD) = 40 similarity measures based on SVD and cosine

Full process

1. Construct DbT matrix
   - Create full text index with stop word removal and stemming (Lucene)
   - Convert full text index to document-by-term matrix (Matlab)
   - Weight DbT matrix (4 variants)

2. SVD truncation
   - Decompose weighted DbT matrix into U Σ V using the 1,000 largest singular values
   - Generate document-by-concept matrix V
   - Truncate document-by-concept matrix (take first 1000, 500, …, 5 concepts)

3. Similarity calculation
   - Normalise DbT and DbC matrices
   - Calculate distance matrix (all patents to all publications) by calculating inner product of vectors
   - Retain closest publication for every patent for all of the 43 variants
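The weighting and similarity steps of this process can be sketched as follows. This is a simplified illustration that skips the SVD step, and the definitions of the four weighting variants are assumptions (RAW = term counts, BIN = term presence, IDF = presence times idf, TF-IDF = counts times idf, with idf = log(N / document frequency)); the slides do not spell them out. The function names and the toy counts are hypothetical.

```python
import math

def weight(matrix, scheme):
    """Weight a document-by-term count matrix given as a list of rows."""
    n_docs = len(matrix)
    n_terms = len(matrix[0])
    # Document frequency: number of documents containing each term.
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    idf = [math.log(n_docs / d) if d else 0.0 for d in df]
    weighted = []
    for row in matrix:
        if scheme == "RAW":
            weighted.append([float(c) for c in row])
        elif scheme == "BIN":
            weighted.append([1.0 if c > 0 else 0.0 for c in row])
        elif scheme == "IDF":
            weighted.append([idf[j] if c > 0 else 0.0 for j, c in enumerate(row)])
        elif scheme == "TF-IDF":
            weighted.append([c * idf[j] for j, c in enumerate(row)])
        else:
            raise ValueError(f"unknown weighting scheme: {scheme}")
    return weighted

def normalise(row):
    """Scale a vector to unit length so that inner products become cosines."""
    length = math.sqrt(sum(x * x for x in row))
    return [x / length for x in row] if length else list(row)

def closest_publication(patent_rows, publication_rows):
    """For every patent, return (index of the most similar publication, similarity)."""
    publications = [normalise(r) for r in publication_rows]
    result = []
    for row in patent_rows:
        patent = normalise(row)
        sims = [sum(a * b for a, b in zip(patent, pub)) for pub in publications]
        best = max(range(len(sims)), key=sims.__getitem__)
        result.append((best, sims[best]))
    return result

# Toy counts: rows 0-1 are patents, rows 2-4 are publications (illustration only).
counts = [
    [2, 1, 0, 0],
    [0, 0, 3, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
]

for scheme in ("RAW", "BIN", "IDF", "TF-IDF"):
    weighted = weight(counts, scheme)
    print(scheme, closest_publication(weighted[:2], weighted[2:]))
```

Weighting is applied to the whole collection so that document frequencies cover patents and publications alike; only then are the rows split into the two groups for the all-to-all comparison.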

Expert validation

Measure      RAW     TF-IDF   BIN     IDF
No SVD       0.61    0.71     0.77    0.80
SVD 1000     0.34    0.45     0.65    0.63
SVD 500      0.31    0.34     0.63    0.57
SVD 300      0.30    0.26     0.58    0.54
SVD 200      0.31    0.21     0.51    0.51
SVD 100      0.30    0.17     0.45    0.49
SVD 25       0.22    0.14     0.38    0.46
SVD 5        0.11    0.11     0.20    0.21

Common terms (weighted by min number of terms)   0.82
Common terms (weighted by max number of terms)   0.68
Common terms (weighted by avg number of terms)   0.75
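The three common-terms measures can be illustrated with a short sketch. The exact definition is not given on the slides, so this assumes that each measure is the number of shared distinct terms divided by the minimum, maximum or average number of distinct terms in the two documents; `common_terms_scores` and the stemmed example terms are hypothetical.

```python
def common_terms_scores(terms_a, terms_b):
    """Share of common terms, scaled by the min, max and average document size.

    Assumed definition: shared distinct terms divided by the min, max or
    average number of distinct terms of the two documents.
    """
    a, b = set(terms_a), set(terms_b)
    if not a or not b:
        return {"min": 0.0, "max": 0.0, "avg": 0.0}
    shared = len(a & b)
    return {
        "min": shared / min(len(a), len(b)),
        "max": shared / max(len(a), len(b)),
        "avg": shared / ((len(a) + len(b)) / 2),
    }

# Hypothetical stemmed terms of one patent and one publication.
patent_terms = ["gene", "sequenc", "express", "vector", "cell"]
publication_terms = ["gene", "sequenc", "express", "mous"]

scores = common_terms_scores(patent_terms, publication_terms)
print(scores)
```

Under this reading, the min-weighted score is high when the shorter document is largely contained in the longer one, while the max-weighted score also requires the two documents to be of similar size.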




Methodology and data

Publication data

Selection of biotechnology publications from the Web of Science based on the subject classification (1991-2008):
- Core set of 243,361 publications: subject category Biotechnology & Applied Microbiology
- Extended set of 683,674 publications: publications of the following subject categories citing or cited by a publication of the core set: Biochemical Research Methods; Biochemistry & Molecular Biology; Biophysics; Plant sciences; Cell Biology; Developmental Biology; Food sciences & Technology; Genetics & Heredity; Microbiology Materials
- Multidisciplinary set of 97,970 publications: publications from the multidisciplinary journals Nature; Science; and Proceedings of the National Academy of Sciences of the United States of America

1,025,005 publications in total (948,432 suited for text mining)

478,361 publications published between 1991 and 2000

Methodology and data

Patent data

Selection of all granted EPO and USPTO biotechnology patents, applied for between 1991 and 2008, from PATSTAT using IPC-codes as listed in the OECD definition of biotechnology (‘A Framework for Biotechnology Statistics’, OECD, Paris, 2005)
- 27,241 EPO patents and 91,775 USPTO patents
- 119,016 patents in total (88,248 suited for text mining)

Methodology and data

Matching

Original document combinations: 83,697,227,136 patent-publication combinations
CommonTermsMin ≥ 0.60: 27,250 patent-publication combinations
And CommonTermsMax ≥ 0.30: 645 patent-publication combinations
And at least one shared inventor/author: 584 patent-publication pairs
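The size of the starting set follows from the counts of documents suited for text mining (948,432 publications and 88,248 patents), and the three filters can be written as one predicate. The sketch below is illustrative only: the `Candidate` class and `is_pair` function are hypothetical, and shared inventors/authors are detected by exact string match, which is a simplification.

```python
from dataclasses import dataclass

PUBLICATIONS_FOR_TEXT_MINING = 948_432
PATENTS_FOR_TEXT_MINING = 88_248

# Every patent is compared with every publication.
total_combinations = PUBLICATIONS_FOR_TEXT_MINING * PATENTS_FOR_TEXT_MINING

@dataclass(frozen=True)
class Candidate:
    """One patent-publication combination (hypothetical structure)."""
    common_terms_min: float
    common_terms_max: float
    patent_inventors: frozenset
    publication_authors: frozenset

def is_pair(candidate, min_threshold=0.60, max_threshold=0.30):
    """Apply the three filters: both term thresholds and a shared name."""
    return (
        candidate.common_terms_min >= min_threshold
        and candidate.common_terms_max >= max_threshold
        and bool(candidate.patent_inventors & candidate.publication_authors)
    )

example = Candidate(0.72, 0.35, frozenset({"smith j"}), frozenset({"smith j", "lee k"}))
print(f"{total_combinations:,}", is_pair(example))
```

In practice, name matching between inventors and authors needs disambiguation (initials, spelling variants), which this sketch does not attempt.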

Methodology and data

Pairs

584 patent-publication pairs identified
- 17 patents linked to multiple publications (up to 3)
- 115 publications linked to multiple patents (up to 7) (patent families)
- 566 distinct patents paired with a publication
- 400 distinct publications paired with a patent

Patentee type
- 292 University
- 128 Government / Non-profit
- 126 Company
- 38 Hospital
- 21 Individual
(42 patents have multiple patentees from different sectors)

Publication and citation numbers

Citation analysis

Match publications to deal with quality differences

Paired and non-paired publications matched by year and journal (1991-2000)

                                                 PAIRS                      NONPAIRS
VY     SO                                        PUB   AVG_AU   AVG_CIT     PUB       AVG_AU   AVG_CIT
1991   BIOCHEMISTRY                              1     5.00     65.00       625       4.03     57.20
1991   BIOTECHNIQUES                             1     2.00     64.00       125       3.24     40.27
...
1992   BIOSCIENCE BIOTECH AND BIOCHEMISTRY       1     2.00     4.00        543       4.24     8.07
1992   BIOTECHNIQUES                             1     4.00     147.00      144       3.07     26.17
...
Total                                            328   5.18     130.47      117,909   4.42     67.03

328 paired publications versus 106,027 biotechnology publications

Before and after publication and grant

Variable                                   Class        N     Lower CL mean   Mean    Upper CL mean
Ratio average citations pairs/non-pairs    Pre-grant    288   1.42            1.71    2.00
Ratio average citations pairs/non-pairs    Post-grant   288   1.48            1.74    2.00
Diff (1-2)                                                    -0.43           -0.03   0.36

T-TESTS

Variable                                   Method          Variances   DF    t value   Pr > |t|
Ratio average citations pairs/non-pairs    Pooled          Equal       574   -0.17     0.8666
Ratio average citations pairs/non-pairs    Satterthwaite   Unequal     565   -0.17     0.8666

EQUALITY OF VARIANCES

Variable                                   Method     Num DF   Den DF   F value   Pr > F
Ratio average citations pairs/non-pairs    Folded F   287      287      1.29      0.0299
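The degrees of freedom reported for the two t-tests can be checked from the sample sizes and the folded F value. The sketch below assumes the standard pooled and Satterthwaite (Welch) formulas and uses the fact that only the ratio of the two variances matters for the Satterthwaite approximation; the function names are hypothetical.

```python
def pooled_df(n1, n2):
    """Degrees of freedom of the pooled (equal variances) t-test."""
    return n1 + n2 - 2

def satterthwaite_df(var1, n1, var2, n2):
    """Welch-Satterthwaite approximation of the degrees of freedom."""
    a, b = var1 / n1, var2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

n_pre = n_post = 288
variance_ratio = 1.29  # folded F value: larger variance / smaller variance

df_pooled = pooled_df(n_pre, n_post)
# Scaling both variances by the same constant leaves the result unchanged,
# so the smaller variance can be set to 1.0.
df_unequal = satterthwaite_df(variance_ratio, n_pre, 1.0, n_post)
print(df_pooled, round(df_unequal))
```

With equal group sizes the pooled and Satterthwaite t statistics coincide, which is consistent with both rows of the table reporting the same t value; only the degrees of freedom differ.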


Paired sample t-tests

Test                                                           N     Mean 1   Mean 2   Difference   t value   Pr > |t|

Paired vs non-paired
  Forward citations                                            190   130.47   74.24    56.23        4.33      0.0001
  Without self citations                                       190   116.01   65.02    50.99        4.07      0.0001

Paired vs non-paired (at least 2 paired publications)
  Forward citations                                            59    224.97   131.63   93.34        3.12      0.0028
  Without self citations                                       59    202.7    117.88   84.82        2.97      0.0043

Paired and grey zone vs all others
  Forward citations                                            764   60.57    42.69    17.88        5.72      0.0001
  Without self citations                                       764   53.09    36.48    16.61        5.59      0.0001

Paired and grey zone vs all others (at least 2 paired or grey zone publications)
  Forward citations                                            281   96.41    59.64    36.77        5.57      0.0001
  Without self citations                                       281   85.85    51.76    34.09        5.43      0.0001


Multivariate analysis (negative binomial)

                                                               95% Wald Confidence Interval   Hypothesis Test
Parameter                                  B       Std. Error   Lower     Upper               Wald Chi-Square   df   Sig.
(Intercept)                                2.966   .1258        2.719     3.213               555.643           1    .000
Pair (Y/N)                                 .450    .0506        .350      .549                78.945            1    .000
Document type:
  Article                                  -.574   .0113        -.596     -.552               2589.688          1    .000
  Letter                                   -.774   .0590        -.890     -.659               172.469           1    .000
  Note                                     -.567   .0175        -.601     -.533               1051.989          1    .000
  Review                                   0       .            .         .                   .                 .    .
Number of backward publication citations   .013    .0001        .013      .014                10416.453         1    .000
Number of authors                          .033    .0005        .032      .034                4613.407          1    .000
Time                                       .125    .0015        .122      .128                7191.199          1    .000
Time²                                      -.012   .0001        -.013     -.012               29450.994         1    .000
Journal dummies (n=104)                    Included
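To read the Pair (Y/N) coefficient as an effect size, it can be exponentiated. This assumes the usual log link of a negative binomial regression, which the slides do not state explicitly; `rate_ratio` is a hypothetical helper name.

```python
import math

def rate_ratio(coefficient):
    """Multiplicative effect on the expected count under a log link."""
    return math.exp(coefficient)

# Pair (Y/N): B = .450 with 95% Wald confidence interval .350 to .549.
pair_effect = rate_ratio(0.450)
pair_lower = rate_ratio(0.350)
pair_upper = rate_ratio(0.549)

print(f"rate ratio {pair_effect:.2f} (95% CI {pair_lower:.2f} to {pair_upper:.2f})")
```

Under that assumption, paired publications are expected to receive roughly 57% more citations than non-paired ones, holding the other covariates constant.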


Sector analysis

Pub sector   Pat sector   N     Mean    Median   Var         SD
COM          COM          21    71.6    34.0     5,999.6     77.5
KGI          COM          25    70.5    49.0     3,212.6     56.7
KGI+COM      COM          15    106.7   80.0     18,605.8    136.4
KGI          KGI          227   179.2   67.0     95,544.4    309.1
KGI+COM      KGI          16    282.0   131.5    231,467.6   481.1
KGI          KGI+COM      6     219.2   93.5     66,633.4    258.1
KGI+COM      KGI+COM      5     85.0    67.0     3,546.5     59.6
Total                     315   164.4   66.0     84,846.9    291.3


Parameter                                  B        Std. Error   z        P>z     [95% Conf. Interval]
(Intercept)                                4.326    0.292        14.800   0.000   3.753     4.899
Document type:
  Article
  Note                                     0.114    0.524        0.220    0.827   -0.913    1.141
  Review                                   0.309    1.130        0.270    0.784   -1.905    2.523
Number of backward publication citations   0.046    0.008        5.990    0.000   0.031     0.061
Number of authors                          0.141    0.019        7.350    0.000   0.103     0.179
Pat sector:
  KGI                                      0.000    .            .        .       .         .
  COM                                      -0.627   0.206        -3.050   0.002   -1.030    -0.223
  KGI+COM                                  -0.917   0.355        -2.590   0.010   -1.612    -0.222
Aff sector:
  KGI                                      0.000    .            .        .       .         .
  COM                                      0.051    0.314        0.160    0.870   -0.563    0.666
  KGI+COM                                  0.176    0.214        0.820    0.413   -0.245    0.596
Time                                       -0.301   0.122        -2.470   0.013   -0.539    -0.063
Time²                                      0.015    0.010        1.420    0.156   -0.006    0.035

Sector analysis

Patentee                                                                       Country   Count
THE REGENTS OF THE UNIVERSITY OF CALIFORNIA                                    US        26
THE JOHNS HOPKINS UNIVERSITY                                                   US        26
THE SALK INSTITUTE FOR BIOLOGICAL STUDIES                                      US        15
BOARD OF REGENTS, THE UNIVERSITY OF TEXAS SYSTEM                               US        12
THE SCRIPPS RESEARCH INSTITUTE                                                 US        10
THE BOARD OF TRUSTEES OF THE LELAND STANFORD JUNIOR UNIVERSITY                 US        9
JOHNS HOPKINS UNIVERSITY                                                       US        9
CITY OF HOPE                                                                   US        8
PRESIDENT AND FELLOWS OF HARVARD COLLEGE                                       US        8
WASHINGTON UNIVERSITY                                                          US        8
INSTITUT PASTEUR                                                               FR        8
THE ROCKEFELLER UNIVERSITY                                                     US        7
THE UNITED STATES OF AMERICA AS REPRESENTED BY THE SECRETARY OF AGRICULTURE    US        7
THE UNITED STATES OF AMERICA AS REPRESENTED BY THE DEPARTMENT OF HEALTH        US        7
UNIVERSITY OF UTAH RESEARCH FOUNDATION                                         US        7
OKLAHOMA MEDICAL RESEARCH FOUNDATION                                           US        6
MASSACHUSETTS INSTITUTE OF TECHNOLOGY                                          US        6
THE UNITED STATES OF AMERICA AS REPRESENTED BY THE SECRETARY OF                US        6
THE JOHNS HOPKINS UNIVERSITY SCHOOL OF MEDICINE                                US        6
ST. JUDE CHILDREN'S RESEARCH HOSPITAL                                          US        6

Conclusions

Science-technology interactions
- We do not observe lower citation rates for publications that are part of a patent application (neither before and after grant, nor matched by journal, nor matched by author)
- Significant impact of KGIs at the patent side

- We miss patent-publication pairs
- Dig deeper into the sector dynamics
- Citation patterns are only one aspect of the diffusion of knowledge
