R: Fast Fourier Transforms with fftpipe

December 31, 2022 in R

What is fftpipe? fftpipe is a package of functions that wrap the base R fft() function. It enables workflows around fft() that use the pipe (%>%) operator. I took inspiration for the fftpipe interface from the Tidyverse and tidymodels packages. Specifically, fftpipe offers waveform generation, FFT and inverse FFT transformation, and plotting of these waveforms and FFTs. Installation: Install fftpipe from GitHub with devtools.

R: PDB Data Exploration

August 19, 2022 in R

The Protein Data Bank (PDB) stores files that contain the structures of “proteins, nucleic acids, and complex assemblies.” These structures are essential tools for research in structural biology, biochemistry, and related fields. I was recently browsing Kaggle for datasets and found a scrape of PDB data up to the year 2018 by Shahir. The data included sequence information as well as metadata about those sequences. Kaggle suggests these data as a multi-class classification exercise.

R: Deep Learning Organic Chemistry Part 3

August 15, 2022 in R

Introduction: In my post “R: Deep Learning Organic Chemistry Again,” I trained a convolutional neural network based on VGG16 to recognize a benzene ring diagram, a crucial structure in many organic chemistry molecules. The classification problem I posed to the convnet was a binary classification to separate diagrams of molecules that contain a benzene ring from those that do not. However, near the end of that post, I found images I had mistakenly put in the wrong training and validation folders.

R: Water Potability

August 12, 2022 in R

In this post, I explore a dataset with observations of water sample properties and their corresponding drinkability. The dataset is from Aditya Kadiwal on Kaggle. In this analysis, I compare the performance of logistic regression, decision tree, and xgboost classification models by tracking each model’s ROC AUC metric. The xgboost model wins. I use the R tidymodels framework, which aims to unify models and modeling engines to streamline machine learning workflows under a consistent interface.

R: Milk quality binary classification

August 10, 2022 in R

Overview: I was looking for a dataset on which to train a binary classifier, and I found data for dairy milk quality prediction on Kaggle. Posted by Shrijayan Rajendran and located at https://www.kaggle.com/datasets/cpluzshrijayan/milkquality, it has a single outcome variable that classifies milk into three qualities: “low,” “medium,” and “high.” Seven predictor variables accompany these quality ratings. Conveniently, there are no missing values in the dataset. Ultimately, I converted the three qualities into new categories with just two types.

R: Deep Learning Organic Chemistry Again

January 1, 2022 in R

Introduction: In my post “Python: Deep Learning Organic Chemistry,” I trained a convolutional neural network to recognize a diagram of a benzene ring, which is a crucial structure in many organic chemistry molecules. The classification problem I posed to the convnet was a binary classification to separate diagrams of molecules that contain a benzene ring from those that do not. Using Python, TensorFlow, and Keras, my experiment proceeded in three steps:

R: Solubility Clustering

April 25, 2021 in R

In my prior aqueous solubility regression study, I did an exploratory data visualization and found intriguing plots of solubility versus other variables in the study. I didn’t model those relationships in that study. Here, I follow up by performing a cluster analysis of solubility relationships to help future regression modeling efforts. My question is: do clusters within each of these relationships explain each feature’s effect on solubility?

Alicia Key