Efficient data retrieval is becoming increasingly crucial in the era of Big Data. Roughly 80% to 90% of potentially useful business information exists in unstructured form, such as text and images (C. Shilakes, J. Tylman, Enterprise Information Portals, Merrill Lynch, 1998). In the biological domain, MEDLINE exceeded 22 million records in 2013. However, text mining, an interdisciplinary field that draws on Natural Language Processing (NLP), statistics, and data mining, has not yet realized its full potential. Identifying entities in the scientific literature is a relatively mature technology. Dissecting the relationships between those entities at different levels (from molecule to organism, from causative gene to disease) and transforming the resulting relation network into a semantic representation is in greater demand, and is more informative for answering questions in biology, pathology, and pharmacology.

I currently work at the Massachusetts Institute of Technology. I received my PhD from the University of Cambridge and EMBL-EBI. My research interest is teaching computers to read the scientific literature in order to help biologists. Specifically, I focus on using text mining to collect evidential statements of disease-related effects, with the ultimate goal of resolving gene-disease associations.

chnli AT mit DOT edu