As someone who's lived with sleep apnea my whole life, I've always wondered how my sleep patterns have changed over time. So I collected two years of CPAP machine data and analyzed it with a Random Forest regression model and a forecasting model. Check out what I found!
Pictured: the dashboard visualization.
What Tools?
I developed in Python and JupyterLab within the AWS SageMaker Studio IDE, using Pandas and AWS Glue for data preparation and AWS S3 for storage. I also used scikit-learn models and Meta's Prophet model as jumping-off points.
Why This Project?
After completing my AWS AI Practitioner and Associate ML Engineer certifications, I wanted to create a simple machine learning project leveraging AWS services to put my skills into practice. Around the same time, I found out that my CPAP (Continuous Positive Airway Pressure) machine had been tracking and storing my sleep data for the past two years, and I was able to download the bulk of it as a CSV file. So, I decided to use machine learning models to find out (a) what factors contributed to the CPAP company's proprietary “Sleep Score” metric and (b) what my predicted sleep metrics might look like a week into the future.
What is it?
First, I had to clean and normalize the data, as the 600+ nights of data and 40+ features in the CSV file contained a lot of unnecessary information (like the company's DynamoDB specifications). This included dropping rows with missing entries and choosing the features that would produce the best model predictions.
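To give a rough idea of what that cleaning step can look like in Pandas, here is a minimal sketch. The column names (`usage_hours`, `mask_seal`, `events_per_hour`, `sleep_score`, `db_item_version`) and the helper `clean_sleep_data` are made up for illustration; the real CPAP export uses different names, and the actual feature selection was more involved than this.

```python
import numpy as np
import pandas as pd


def clean_sleep_data(df, metadata_cols, feature_cols, target_col):
    """Drop metadata columns, keep the chosen features, remove incomplete nights."""
    # Remove export metadata that carries no sleep information
    df = df.drop(columns=[c for c in metadata_cols if c in df.columns])
    # Keep only the selected features plus the target
    df = df[feature_cols + [target_col]]
    # Drop any night with a missing value in the kept columns
    return df.dropna().reset_index(drop=True)


# Tiny made-up example (hypothetical columns, not the real export)
raw = pd.DataFrame({
    "usage_hours": [7.5, 6.0, np.nan, 8.1],
    "mask_seal": [18.0, 20.0, 19.0, np.nan],
    "events_per_hour": [1.2, 0.8, 2.5, 0.4],
    "sleep_score": [88, 79, 60, 95],
    "db_item_version": ["1", "1", "1", "1"],  # stand-in for storage metadata
})

clean = clean_sleep_data(
    raw,
    metadata_cols=["db_item_version"],
    feature_cols=["usage_hours", "mask_seal", "events_per_hour"],
    target_col="sleep_score",
)
print(clean)
```

Selecting the columns before calling `dropna()` means a night is only thrown away if one of the columns I actually care about is missing, not because of a gap in a column I was going to discard anyway.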
For the regression model predicting my “Sleep Score”, I tested Linear Regression, Random Forest, XGBoost, and Neural Network Regressor models on the data to find the best fit for the task, with scikit-learn's Random Forest model performing the best. Check out the GitHub for a more detailed analysis. I then applied Bayesian Optimization to further fine-tune the model on my data, resulting in an R² value of 0.99 and RMSE of 0.28.
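For anyone curious how a comparison like this can be set up, here is a simplified sketch with scikit-learn. It runs on synthetic data (not my CPAP data), only compares two of the four model families, and skips the Bayesian Optimization step entirely, so the numbers it prints have nothing to do with the results above. The feature count, target formula, and split size are all assumptions for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 600 "nights", 5 features, a nonlinear target
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(600, 5))
y = 50 + 4 * X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(0, 1, size=600)

# Hold out 20% of the nights for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

candidates = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

results = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    # RMSE is the square root of the mean squared error
    rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
    results[name] = {"r2": float(r2_score(y_test, pred)), "rmse": rmse}
    print(f"{name}: R2={results[name]['r2']:.3f}, RMSE={rmse:.3f}")
```

Scoring every candidate on the same held-out split keeps the comparison fair, since each model is judged on nights it never saw during training.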
For the forecast model, I selected five of the most important sleep metrics (e.g. usage hours and sleep score) and created a time series model for each with Meta's Prophet. I also analyzed weekly seasonality trends within the data, leading to interesting discoveries like my usage hours being, on average, lowest on Sundays and highest on Saturdays. Finally, I built a dashboard visualization with Streamlit, with interactive graphs and tables of the models' predictions.
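A quick way to sanity-check a weekly pattern like that, separate from Prophet, is a plain day-of-week average in Pandas. The sketch below is not Prophet's seasonality component, just a simple groupby, and it uses randomly generated usage hours (the series name `usage_hours` and the eight-week date range are assumptions), so its lowest and highest days are arbitrary.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: 8 weeks of nightly usage hours
dates = pd.date_range("2024-01-01", periods=56, freq="D")
rng = np.random.default_rng(1)
usage = pd.Series(7 + rng.normal(0, 0.3, size=56), index=dates, name="usage_hours")

day_order = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]

# Average usage per weekday, reordered from Monday to Sunday
weekly_profile = usage.groupby(usage.index.day_name()).mean().reindex(day_order)

print(weekly_profile.round(2))
print("Lowest:", weekly_profile.idxmin(), "| Highest:", weekly_profile.idxmax())
```

If the raw averages and the fitted weekly seasonality point to the same days, that is a decent sign the pattern is in the data and not just a quirk of the model.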
What did I learn?
This project allowed me to apply machine learning techniques such as feature engineering and fine-tuning on a dataset that was personal to me, as well as learn the AWS ecosystem with hands-on experience. Overall, it was rewarding to take an end-to-end approach with a project close to my heart, and I hope that all this data might finally convince me to sleep earlier :).