yakub sebastian_visit_ntu_201405016v4
Information Technology
Monash University Malaysia is jointly owned by Monash University and the Jeffrey Cheah Foundation
Uncovering hidden connections in scientific literature: From the informatics and complexity science perspectives
Yakub Sebastian
16th May 2014
16th May 2014 Research presentation at Nanyang Technological University
Agenda
1 Hidden connections
2 Literature based discovery
3 Cluster link prediction
4 Collaboration and feedback
‘The return of the prodigal son’ by Rembrandt (Wood, J. 2012. Euresis Journal. 2, 5-7)
Source: http://uploads6.wikipaintings.org/images/rembrandt/the-return-of-the-prodigal-son-1669.jpg
Look more closely
Source: http://upload.wikimedia.org/wikipedia/commons/8/8d/Rembrandt_Harmensz._van_Rijn_-_The_Return_of_the_Prodigal_Son_-_detail_son.jpg
Female hand (mercy)
Male hand (justice)
Hidden Connections
‘The whole is greater than the sum of its parts’ – Aristotle (?)
Hidden Connections
Conference: Hidden Connections 3 - 5 March 2014, Complexity Program, NTU
Brian Uzzi
“… the highest-impact science is primarily grounded in exceptionally conventional combinations of prior work yet simultaneously features an intrusion of unusual combinations.”
(Uzzi, B. et al. 2013. Science. 342, 6157, 468-472)
Novelty → the pairing of two conventional ideas that have never been put together before.
(Uzzi, B. 2014. Complexity Program Annual Conference: Hidden Connections. 3-5 Mar, Nanyang Technological University, Singapore)
Hidden Connections
Association: ‘The forgotten half of scientific thinking’ – Marten Scheffer
“… thinking has two complementary modes: roughly, association versus reasoning … . We systematically underestimate the role of the first …”
“How can we feed the associative machine in our brain with potential elements for such unexpected links? This is a tantalizing problem, because if the connection should be unexpected one cannot plan for it.”
(Scheffer, M. 2014. PNAS. 111, 17, 6119)
Literature Based Discovery
Literature Based Discovery (LBD) uses computational algorithms to discover potential hidden connections between previously disconnected sets of literature.
(Smalheiser, N. 2012. JASIST. 63, 2, 218-224)
Literature Based Discovery
A finding very similar to Uzzi’s result was reported by a scientometrician in 2012.
Novel connections established by [Watts, D.J. and Strogatz, S.H. 1998. Nature. 393, 6684, 440-443].
“An article that introduces novel connections between clusters of co-cited references is likely to subsequently become highly cited.” (Chen, C. 2012. JASIST. 63, 3, 431-449)
Brian’s unawareness of Chen’s work is itself an example of a hidden connection! (personal communication)
Literature Based Discovery
Example 1.
Fish Oil and Raynaud’s Syndrome hidden connection
(Swanson, D.R. 1986. Perspectives in Biology and Medicine. 30, 1, 7-18)
These literatures are:
A. Non-interactive
B. Complementary
{A} → Fish oil disrupts blood viscosity
{C} → Blood viscosity causes Raynaud’s Syndrome
Illustration: Torvik, V.I. and Smalheiser, N. 2007. Bioinformatics. 23, 13, 1658-1665.
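The fish oil example follows a pattern often summarised as A → B → C: one literature links a starting term to an intermediate term, a second literature links that intermediate term to a target, and no single paper states the A → C link. The following is a minimal sketch in Python, assuming each literature has already been reduced to (term, term) co-occurrence pairs. The function name `abc_candidates` and the toy pairs are illustrative assumptions, not data or code from the talk.

```python
from collections import defaultdict


def abc_candidates(literature_a, literature_c, start_term):
    """Return {target term: sorted bridging terms} reachable from start_term.

    literature_a: (start, bridge) pairs found in the first literature.
    literature_c: (bridge, target) pairs found in the second literature.
    """
    # Intermediate (B) terms that co-occur with the start term.
    bridges = {b for a, b in literature_a if a == start_term}

    candidates = defaultdict(set)
    for bridge, target in literature_c:
        if bridge in bridges and target != start_term:
            candidates[target].add(bridge)

    # Drop targets already linked directly to the start term.
    return {t: sorted(bs) for t, bs in candidates.items() if t not in bridges}


# Toy, hand-written pairs for illustration only.
literature_a = [
    ("fish oil", "blood viscosity"),
    ("fish oil", "platelet aggregation"),
]
literature_c = [
    ("blood viscosity", "Raynaud's syndrome"),
    ("platelet aggregation", "Raynaud's syndrome"),
    ("vasoconstriction", "Raynaud's syndrome"),
]

result = abc_candidates(literature_a, literature_c, "fish oil")
print(result)
```

In this toy input "vasoconstriction" is ignored because the first literature never pairs it with the start term, so only the two shared intermediate terms support the candidate link.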
Literature Based Discovery
Example 2.
(Swanson, D.R. and Smalheiser, N. 1997. Artificial Intelligence. 91, 2, 183-203)
Literature Based Discovery
Content
Structure
Cluster Link Prediction
pre-discovery (1900 – 1985)
Cluster Link Prediction
post-discovery (1900 – 1986)
Novel inter-cluster links
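The comparison between the pre-discovery and post-discovery snapshots can be expressed mechanically. The following is a minimal sketch in Python, assuming each snapshot is a cumulative list of undirected links between paper identifiers and that every paper already carries a cluster label. The function name, identifiers and toy data are hypothetical and not taken from the talk.

```python
def novel_inter_cluster_links(edges_before, edges_after, cluster_of):
    """Links that appear only in the later snapshot and cross a cluster boundary."""

    def normalise(edges):
        # Treat (u, v) and (v, u) as the same undirected link.
        return {tuple(sorted(edge)) for edge in edges}

    new_edges = normalise(edges_after) - normalise(edges_before)
    return sorted((u, v) for u, v in new_edges if cluster_of[u] != cluster_of[v])


# Toy data: two clusters with no link between them in the earlier snapshot.
cluster_of = {"f1": "fish_oil", "f2": "fish_oil", "r1": "raynaud", "r2": "raynaud"}
edges_before = [("f1", "f2"), ("r1", "r2")]
# The later snapshot is cumulative; ("f2", "f1") repeats an old link in reverse order.
edges_after = [("f2", "f1"), ("r1", "r2"), ("f2", "r1")]

novel = novel_inter_cluster_links(edges_before, edges_after, cluster_of)
print(novel)
```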
Cluster Link Prediction
Observations
A. Potential hidden connections between disparate scientific fields might be found among non-overlapping clusters that:
do not have existing links, but
whose member nodes exhibit a high propensity to converge.
B. The linking of these clusters involves the novel pairing of conventional ideas that have never been put together before.
C. As demonstrated in Swanson’s case, such novel pairing can result in a scientific breakthrough.
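Observation A can be turned into a simple candidate search. The following is a minimal sketch in Python, assuming an undirected network and one cluster label per node. The talk does not define "propensity to converge", so the score used here (the number of outside nodes that neighbour both clusters) is only a stand-in assumption, and the function name and toy data are hypothetical.

```python
from itertools import combinations


def unlinked_cluster_pairs(edges, cluster_of):
    """Score each pair of clusters that has no direct link between its members.

    The score is the number of third-party nodes adjacent to both clusters,
    used here as an assumed proxy for a propensity to converge.
    """
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    members = {}
    for node, cluster in cluster_of.items():
        members.setdefault(cluster, set()).add(node)

    scores = {}
    for c1, c2 in combinations(sorted(members), 2):
        nodes1, nodes2 = members[c1], members[c2]
        nbrs1 = set().union(*(adjacency.get(n, set()) for n in nodes1))
        nbrs2 = set().union(*(adjacency.get(n, set()) for n in nodes2))
        if nbrs1 & nodes2:
            continue  # the two clusters are already linked
        shared = (nbrs1 & nbrs2) - nodes1 - nodes2
        scores[(c1, c2)] = len(shared)
    return scores


# Toy network: clusters A and C are not linked, but both touch node b1.
cluster_of = {"a1": "A", "a2": "A", "b1": "B", "c1": "C", "c2": "C"}
edges = [("a1", "a2"), ("c1", "c2"), ("a1", "b1"), ("b1", "c2")]

scores = unlinked_cluster_pairs(edges, cluster_of)
print(scores)
```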
Cluster Link Prediction
Conjectures
A. A search for hidden connections in literature can be re-formulated as a cluster link prediction problem.
B. One may better predict inter-cluster link formation using a combination of (a) content-based analysis (semantic) and (b) structural analysis.
C. Inter-cluster links may emerge as a result of the dynamics in the complex systems of citation networks. This exposes the cluster link prediction problem to a whole range of methods and tools in complexity science.
(Newman, M.E.J. 2001. PNAS. 98, 2, 404-409)
Cluster Link Prediction
Bibliographic coupling (shared references) network
during pre-discovery (1900 – 1985) period.
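In a bibliographic coupling network, two papers are linked when their reference lists overlap, and the link weight is the number of shared references. The following is a minimal sketch in Python, assuming each paper's reference list is available as a list of identifiers. The paper and reference names are invented for illustration.

```python
from itertools import combinations


def bibliographic_coupling(references):
    """Map each pair of papers to the number of references they share.

    Pairs with no shared reference are omitted, so they have no link.
    """
    weights = {}
    for p, q in combinations(sorted(references), 2):
        shared = len(set(references[p]) & set(references[q]))
        if shared:
            weights[(p, q)] = shared
    return weights


# Toy reference lists (hypothetical identifiers).
references = {
    "paper_1": ["ref_a", "ref_b", "ref_c"],
    "paper_2": ["ref_b", "ref_c", "ref_d"],
    "paper_3": ["ref_e"],
}

coupling = bibliographic_coupling(references)
print(coupling)
```

The all-pairs loop is quadratic in the number of papers, which is fine for a sketch but would need an inverted index (reference → citing papers) for a collection of realistic size.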
Cluster Link Prediction
Research questions
RQ1: How do we group scientific papers into clusters of distinct research areas?
a. Many existing algorithms
b. Performance
c. Ground-truth
RQ2: How do we predict the future formation of links between nodes in previously disconnected clusters?
a. Features
b. Algorithm
c. Interestingness (not every inter-cluster link means a scientific breakthrough)
Cluster Link Prediction
Research questions
RQ1: How do we group scientific papers into clusters of distinct research areas?
Earlier works:
Chen, P. and Redner, S. 2010. Journal of Informetrics. 4, 3, 278-290. (physics)
Waltman, L. and van Eck, N.J. 2012. JASIST. 63, 12, 2378-2392.
Chen, C. 2012. JASIST. 63, 3, 431-449. (scientometrics)
Boyack, K.W. and Klavans, R. 2014. JASIST. 65, 4, 670-685.
Community detection algorithms. (Fortunato, S. 2010. Physics Reports. 486, 3, 75-174)
Evaluation. (Lancichinetti, A. and Fortunato, S. 2009. Phys. Rev. E. 80, 5, 056117)
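Comparing a detected clustering against a known grouping needs a score that ignores how the clusters happen to be labelled. Normalized mutual information (NMI) is one measure widely used for this in the community detection literature. The following is a minimal sketch in Python of its standard form for non-overlapping clusters; it is offered as an assumption about a suitable metric, not as the exact procedure behind the results in this talk.

```python
import math
from collections import Counter


def normalized_mutual_information(labels_true, labels_found):
    """NMI = 2 * I(X; Y) / (H(X) + H(Y)) for two labelings of the same nodes."""
    n = len(labels_true)
    count_true = Counter(labels_true)
    count_found = Counter(labels_found)
    joint = Counter(zip(labels_true, labels_found))

    mutual = sum(
        (n_xy / n) * math.log(n_xy * n / (count_true[x] * count_found[y]))
        for (x, y), n_xy in joint.items()
    )
    h_true = -sum((c / n) * math.log(c / n) for c in count_true.values())
    h_found = -sum((c / n) * math.log(c / n) for c in count_found.values())

    if h_true + h_found == 0:
        return 1.0  # both labelings place every node in one cluster
    return 2 * mutual / (h_true + h_found)


# Same grouping under different label names, then an unrelated grouping.
same_grouping = normalized_mutual_information([0, 0, 1, 1], [1, 1, 0, 0])
unrelated = normalized_mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
print(same_grouping, unrelated)
```

A score near 1 means the two groupings agree up to relabelling, and a score near 0 means that knowing one grouping says nothing about the other.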
(Newman, M.E.J. and Girvan, M. 2004. Phys. Rev. E. 69, 2, 026113)
(1) (2) (3) (4)
Algorithm 1, 2:
Lancichinetti, A. and Fortunato, S. 2009. Phys. Rev. E. 80, 5, 056117
Algorithm 3:
Waltman, L. and van Eck, N.J. 2012. JASIST. 63, 12, 2378-2392
Algorithm 4:
Traag, V.A., Van Dooren, P. and Nesterov, Y. 2011. Phys. Rev. E. 84, 1, 016114
Consistent with the result reported in (Lancichinetti, A. and Fortunato, S. 2009. Phys. Rev. E. 80, 5, 056117).
(Park, J. and Newman, M.E.J. 2005. J. Stat. Mech. Theory. Exp. 10, P10014)
(1) (2) (3) (4)
Again, consistent with the result reported in (Lancichinetti, A. and Fortunato, S. 2009. Phys. Rev. E. 80, 5, 056117).
Cluster Link Prediction
Future work
A. Apply INFOMAP on citation data sets of the American Physical Society [1]
(Rosvall, M. and Bergstrom, C.T. 2008. PNAS. 105, 4, 1118-1123)
Size: > 450,000 articles
Years: 1893 – 2010
Coverage: Physical Review Letters, Physical Review, Reviews of Modern Physics
[1] https://publish.aps.org/datasets
B. Evaluate cluster quality.
Ground truth? Suitable metrics?
Cluster Link Prediction
Future work
C. RQ2: How do we predict the future formation of links between nodes in previously disconnected clusters?
Latent Domain Similarity (LDS)
Assumption: Different literatures may have been published separately in seemingly unrelated fields. It is possible that they share many similar domains previously unknown to researchers in each field (i.e. latent).
Goal: To explore whether these shared latent domains correlate with the probability that previously disconnected clusters form future citation links with each other.
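The talk does not specify how LDS would be computed, so the following is only one possible reading, sketched in Python. It assumes every paper already has a vector of topic proportions (for example from a topic model), averages those vectors per cluster, and compares two clusters by cosine similarity. All function names and numbers are hypothetical.

```python
import math


def cluster_topic_profile(paper_topics):
    """Average the per-paper topic proportions of one cluster."""
    n_topics = len(paper_topics[0])
    return [
        sum(paper[k] for paper in paper_topics) / len(paper_topics)
        for k in range(n_topics)
    ]


def cosine_similarity(u, v):
    """Cosine of the angle between two topic profiles (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy topic proportions over three assumed latent domains.
cluster_x = [[0.8, 0.2, 0.0], [0.6, 0.4, 0.0]]
cluster_y = [[0.7, 0.3, 0.0]]
cluster_z = [[0.0, 0.0, 1.0]]

sim_xy = cosine_similarity(cluster_topic_profile(cluster_x), cluster_topic_profile(cluster_y))
sim_xz = cosine_similarity(cluster_topic_profile(cluster_x), cluster_topic_profile(cluster_z))
print(sim_xy, sim_xz)
```

Under this reading, the stated goal amounts to checking whether cluster pairs with a higher similarity score are more likely to gain citation links in a later time window.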
Cluster Link Prediction
Future work
Topic modeling (Blei, D.M., Ng, A.Y. and Jordan, M.I. 2003. J. Mach. Learn. Res. 3, 993-1022)
Approach: content analysis (conventional) + structural analysis
Recent example: Lancichinetti, A., Sirer, M.I., Wang, J.X., Acuna, D., Körding, K. and Amaral, L.A.N. 2014. arXiv:1402.0422v1
Evaluation benchmark (?)
PRL Milestone papers (1958-2008), including 40 Nobel Prize papers.
Collaboration and feedback
The main purpose of the current visit.
Contributions to the Complexity Program @ NTU:
A new dimension to the current complexity studies
Potential shared publications
Benefits to my research:
Developing a new complexity science-oriented LBD method
Access to expertise and resources in complexity science and physics
Thank you
Yakub Sebastian
PhD Candidate
School of Information Technology
Monash University Malaysia
Jalan Lagoon Selatan
46150 Bandar Sunway
Petaling Jaya, Selangor, Malaysia
Email: yakub.sebastian@monash.edu