Week 22: How to fit a model
Last week, I tackled Isotonic Regression in Python.
I used an amalgamated dataset including US News Top Colleges and College Scorecard that Caitlin cleaned for me :). Briefly, I'll discuss the process I ended up using in the script and post a pretty error picture.
The reason this post is brief is that I spent a lot of time getting acquainted with Python and not much time doing anything effective. I now have a working script, but I spent most of the week staring at code and asking Caitlin to explain everything to me. I even went so far as to pester my Probability professor to explain isotonic regression and ranking to me. Fortunately, he happened to have done his thesis in a related field, and he is sending me some articles to help me out.
The first thing I did after loading the cleaned data was split it into training, calibration, and testing subsets. I trained a linear regression model on the training data, then used that trained model to get predicted outcomes on the test data. I also used the model to get predicted outcomes on the calibration data. I then used the predicted outcomes and the true outcomes from the calibration data to train an isotonic regressor. With the trained isotonic regressor, I transformed my predicted outcomes on the test data to get calibrated predicted outcomes. Here's the resulting plot of mean squared error for the calibrated model.
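The steps above can be sketched in Python with scikit-learn. This is a minimal sketch, not my actual script: it uses synthetic data in place of the college dataset, and the split sizes and random seeds are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.isotonic import IsotonicRegression
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for the cleaned college data.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.5, size=600)

# Split into training, calibration, and testing subsets.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.5, random_state=0)
X_cal, X_test, y_cal, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0)

# Train a linear regression model on the training subset.
lin = LinearRegression().fit(X_train, y_train)

# Get predicted outcomes on the calibration and test subsets.
pred_cal = lin.predict(X_cal)
pred_test = lin.predict(X_test)

# Train an isotonic regressor on (predicted, true) pairs
# from the calibration subset.
iso = IsotonicRegression(out_of_bounds="clip").fit(pred_cal, y_cal)

# Transform the test predictions to get calibrated predictions,
# then compute the mean squared error.
calibrated = iso.predict(pred_test)
mse = mean_squared_error(y_test, calibrated)
print(mse)
```

The isotonic step learns a monotone (order-preserving) mapping from raw predictions to calibrated ones, which is why it needs its own held-out calibration subset rather than reusing the training data.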
Since there's no other plot in this post to compare it to, it doesn't mean much on its own. However, we will be able to compare this plot against the uncalibrated predictions, as well as against predictions calibrated with other methods, to see which approach fits best.
Results!!