What is fftpipe? fftpipe is a package of functions that wrap the base R fft() function so that FFT workflows can be written with the pipe (%>%) operator. I took inspiration for the fftpipe interface from the Tidyverse and tidymodels packages. Specifically, fftpipe offers waveform generation, FFT and inverse FFT transformation, and plotting of these waveforms and FFTs. Installation: install fftpipe from GitHub with devtools.

Continue reading

Lennard-Jones Potential Equation

$$ V(r) = 4 \epsilon \biggl[ \biggl(\frac{\sigma}{r}\biggr)^{12} - \biggl(\frac{\sigma}{r}\biggr)^6 \biggr] $$

$$ \sigma = \frac{r_m}{2^{1/6}} $$

Here, epsilon is the depth of the energy minimum and r_m is the distance at which that minimum occurs. Note the part of the equation inside the square brackets. Recall that negative energies represent a more favorable interaction. The attractive term (raised to the power of 6) is subtracted from the repulsive term (raised to the power of 12).
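As a quick check on the two equations above, here is a worked evaluation at the two distances they single out. It is only a sketch and uses the symbols already defined, with no new quantities assumed. At r = sigma the two bracketed terms are equal, so the potential crosses zero:

$$ V(\sigma) = 4 \epsilon \bigl[ 1^{12} - 1^{6} \bigr] = 0 $$

At r = r_m, the definition of sigma gives sigma / r_m = 2^{-1/6}, so the sixth power is 1/2 and the twelfth power is 1/4:

$$ V(r_m) = 4 \epsilon \biggl[ \frac{1}{4} - \frac{1}{2} \biggr] = -\epsilon $$

Setting the derivative to zero confirms that this is where the minimum sits:

$$ \frac{dV}{dr} = 4 \epsilon \biggl[ -\frac{12 \sigma^{12}}{r^{13}} + \frac{6 \sigma^{6}}{r^{7}} \biggr] = 0 \quad \Longrightarrow \quad r^{6} = 2 \sigma^{6} \quad \Longrightarrow \quad r = 2^{1/6} \sigma = r_m $$

So the lowest energy the pair can reach is -epsilon, at a separation of r_m.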

Continue reading

R: PDB Data Exploration

PDB Data Exploration The Protein Data Bank (PDB) stores files that contain the structures of “proteins, nucleic acids, and complex assemblies.” These structures are essential tools for research in structural biology, biochemistry, and related fields. I was recently browsing Kaggle for datasets and found a scrape of PDB data up to the year 2018 by Shahir. The data included sequence information as well as metadata about those sequences. Kaggle suggests these data as a multi-class classification exercise.

Continue reading

Introduction In my post “R: Deep Learning Organic Chemistry Again,” I trained a convolutional neural network based on VGG16 to recognize a benzene ring diagram, a crucial structure in many organic chemistry molecules. The classification problem I posed to the convnet was a binary classification to separate diagrams of molecules that contain a benzene ring from those that do not. However, near the end of that post, I found images I had mistakenly put in the wrong training and validation folders.

Continue reading

R: Water Potability

Water Potability In this post, I explore a dataset with observations of water sample properties and their corresponding drinkability. The dataset is from Aditya Kadiwal on Kaggle. I compare the performance of logistic regression, decision tree, and xgboost classification models by tracking each model’s ROC AUC metric. The xgboost model wins. The analysis uses the R tidymodels framework, which aims to unify models and modeling engines to streamline machine learning workflows under a consistent interface.

Continue reading

Overview I was looking for a dataset on which to train a binary classifier, and I found data for dairy milk quality prediction on Kaggle. Posted by Shrijayan Rajendran and located at https://www.kaggle.com/datasets/cpluzshrijayan/milkquality, it has a single outcome variable that classifies milk into three qualities: “low,” “medium,” and “high.” Seven predictor variables accompany these quality ratings. Conveniently, there are no missing values in the dataset. Ultimately, I converted the three qualities into new categories with just two types.

Continue reading

Introduction In my post “Python: Deep Learning Organic Chemistry,” I trained a convolutional neural network to recognize a diagram of a benzene ring, which is a crucial structure in many organic chemistry molecules. The classification problem I posed to the convnet was a binary classification to separate diagrams of molecules that contain a benzene ring from those that do not. Using Python, TensorFlow, and Keras, my experiment proceeded in three steps:

Continue reading

Alicia Key

I am passionate about data and science.

PhD student

Aurora, Colorado, USA