When I was teaching introductory Python to scientists and engineers getting their start in data science, I would often get the question: “Why aren’t you teaching this course using R?” At the time, I didn’t know any R and had no satisfactory answer, so I gave a hand-wavy response: “Python is more comfortable to integrate into a larger software ecosystem.” Then I would proceed to teach in Python.

Molecular dynamics models the motion of atoms within molecules using classical mechanics. Many resources exist online and in print on molecular mechanics. I wanted to learn more about molecular mechanics by implementing it in Python code. Yet when I searched for resources to guide me in writing my code, I found them scattered online, and pulling them together into a cohesive whole was difficult. In this post, I build a simple molecular dynamics simulation using velocity Verlet integration in Python and compare its results to empirical and analytical values.
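
To give a feel for what velocity Verlet integration looks like, here is a minimal sketch in plain Python. It is not the code from the post: the test system (a single particle on a one-dimensional harmonic spring), the function name `velocity_verlet`, and the values of `k`, `m`, `x0`, `dt`, and `n_steps` are all assumptions chosen for illustration. A harmonic oscillator is convenient because its analytical solution, x(t) = x0 cos(ωt) with ω = sqrt(k/m), gives something to compare against.

```python
import math


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate one particle in 1D with velocity Verlet.

    Returns lists of positions and velocities, including the initial state.
    """
    xs, vs = [x], [v]
    a = force(x) / mass  # acceleration at the starting position
    for _ in range(n_steps):
        # 1. Advance the position using the current velocity and acceleration.
        x = x + v * dt + 0.5 * a * dt ** 2
        # 2. Evaluate the acceleration at the new position.
        a_new = force(x) / mass
        # 3. Advance the velocity using the average of old and new accelerations.
        v = v + 0.5 * (a + a_new) * dt
        a = a_new
        xs.append(x)
        vs.append(v)
    return xs, vs


# Hypothetical test system in arbitrary units (not taken from the post).
k = 1.0         # spring constant
m = 1.0         # particle mass
x0 = 1.0        # initial displacement; the particle starts at rest
dt = 0.01       # time step
n_steps = 1000  # number of integration steps

xs, vs = velocity_verlet(x0, 0.0, lambda x: -k * x, m, dt, n_steps)

# Analytical position of the undamped oscillator at the final time.
omega = math.sqrt(k / m)
analytic = x0 * math.cos(omega * n_steps * dt)

# Total energy at the end; the exact value is 0.5 * k * x0**2.
energy = 0.5 * m * vs[-1] ** 2 + 0.5 * k * xs[-1] ** 2

print(f"simulated x = {xs[-1]:.5f}, analytical x = {analytic:.5f}")
print(f"final energy = {energy:.5f}")
```

For a molecule, the same loop would apply to every atom in three dimensions, with forces derived from the bond, angle, and nonbonded terms of a force field rather than from a single spring.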

Convolutional neural networks (CNNs) are a deep learning technique for classifying images. For this demonstration, I used images from an introductory organic chemistry class. My problem was one of binary classification: could the CNN distinguish images containing a structure called a benzene ring from images without one? Although working with a small dataset (205 images in each class) posed challenges, I trained the CNN to 73% accuracy on test data.

Aqueous solubility (the ability to dissolve in water) is an important property of a chemical compound in the laboratory. While it is possible to determine solubilities through physical experiments, let’s assume for this tiny project that such experiments are prohibitively expensive. This presents an interesting predictive modeling problem: given a known chemical structure, can the aqueous solubility of a compound be predicted without physical experiments? This was the question posed in a study (Delaney, 2004) that took SMILES strings (a data format for storing the structure of chemical compounds), extracted features from them, and built a simple regression model to predict solubility.

The animation above shows a molecular dynamics trajectory of hemoglobin. See https://www.rcsb.org/structure/5EUI for the original PDB file.

Alicia Key

I am passionate about data and science.

PhD student

Aurora, Colorado, USA