Split data set
-
train_test_split
-
In supervised learning you receive a dataset of N elements (N rows). Each row has X features (columns) plus one or more results y (also columns).
-
You can divide the rows into two parts: training and testing.
-
You use the training part to train your model, and you use the testing part to check how well your model predicts values it has not seen during training.
-
train_test_split() of scikit-learn can do this.
-
examples/ml/basic_linear_regression_more_data.ipynb
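A minimal sketch of such a split, not the content of the notebook above. The data is made up for illustration, and the 20% test size is an assumed value, not a recommendation from these notes.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 rows, 2 feature columns (X) and 1 result column (y)
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Hold out 20% of the rows for testing; the rows are shuffled by default
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

print(X_train.shape, X_test.shape)
print(y_train.shape, y_test.shape)
```

The features and the results are split along the same rows, so each row of X_train still belongs to the matching entry of y_train.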
-
Fix the seed by setting random_state to any fixed non-negative integer.
-
Use stratify when splitting imbalanced datasets for classification, so that both parts keep the class proportions of the full dataset.
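The two options can be sketched together as follows. The labels (80 rows of one class, 20 of the other), the seed 42 and the 25% test size are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced dataset: 80 rows of class 0 and 20 rows of class 1
X = np.arange(100).reshape(100, 1)
y = np.array([0] * 80 + [1] * 20)

# The same random_state gives the same split on every call
X_train_a, X_test_a, y_train_a, y_test_a = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)
X_train_b, X_test_b, y_train_b, y_test_b = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)
print(np.array_equal(X_test_a, X_test_b))

# stratify=y keeps the 80/20 class ratio in both the training and the testing part
print(np.bincount(y_train_a))
print(np.bincount(y_test_a))
```

Without stratify, a random split of a small imbalanced dataset can leave very few rows of the rare class in the testing part, which makes the evaluation unreliable.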