Blue 3D cube with abstract grid background symbolizing big data

Data Science in Precision Medicine

Data science in precision medicine is a paradigm shift that aims to advance patient health by combining the specializations of mathematics and statistics, computer science, advanced analytics, artificial intelligence (AI), and machine learning (ML) with specific subject matter expertise to uncover actionable insights buried within clinical, public health and research data.

Health information is constantly generated from sources as diverse as patient health records, genomic sequencing, and medical research to environmental events, smart devices, and wearables. Data science in precision medicine involves the art of tracking and analyzing these data points that can then be used to establish a foundation for new breakthroughs in healthcare. 

Our team categorizes data science projects into one or more analytical buckets:

  • Descriptive Analytics - Identify patterns and relationships among variables within a range of data
  • Predictive Analytics - Project future performance using machine learning methods by examining current and historical data
  • Prescriptive Analytics - Determine an optimal course of action by combining predictive analytics with current data 

Our Model

Our model demonstrates the service we provide in creating an integrated data science platform. For every project, we first combine data from a variety of sources such as the electronic medical record, research registries, waveform or other data and text files into a data hub. Then, this data is merged and enriched within the Precision Medicine data hub. The hub gives researchers a warehouse for their data that is ready to be analyzed in any format they want. Upon request, data can also be summarized based on the required parameters of the project. Data from the hub can then be pulled in the form of Tableau reports or raw data to be used for machine learning or biostatistical analysis.

Future Projects

Most clinicians and health informaticians would agree that doctors’ written notes are one of the most important datasets in patient care. As such notes are recorded in natural language, traditional analytics cannot systematically derive meaning from them. Our goal is to apply natural language processing (NLP), a branch of machine learning, and NLP models such Bidirectional Encoder Representations from Transformers (BERT) to distill such notes into pertinent insights in a de-identified fashion that would be instrumental in promoting the next wave of clinical innovation. 

Our Toolkit

Our team’s expertise spans various fields such as data science, data visualization, machine learning, biostatistics, process improvement, nursing informatics, and medical informatics. We use various tools including but not limited to:


DevOps and project management: