Update notebooks

Roope Astala
2018-10-12 14:39:33 -04:00
parent 216aa8b6a1
commit a4792d95ac
67 changed files with 6470 additions and 1610 deletions

@@ -137,17 +137,17 @@
" shuffle = True, random_state = 42,\n",
" remove = remove)\n",
"\n",
"X_train, X_validation, y_train, y_validation = train_test_split(data_train.data, data_train.target, test_size = 0.33, random_state = 42)\n",
"X_train, X_valid, y_train, y_valid = train_test_split(data_train.data, data_train.target, test_size = 0.33, random_state = 42)\n",
"\n",
"\n",
"vectorizer = HashingVectorizer(stop_words = 'english', alternate_sign = False,\n",
" n_features = 2**16)\n",
"X_train = vectorizer.transform(X_train)\n",
"X_validation = vectorizer.transform(X_validation)\n",
"X_valid = vectorizer.transform(X_valid)\n",
"\n",
"summary_df = pd.DataFrame(index = ['No of Samples', 'No of Features'])\n",
"summary_df['Train Set'] = [X_train.shape[0], X_train.shape[1]]\n",
"summary_df['Validation Set'] = [X_validation.shape[0], X_validation.shape[1]]\n",
"summary_df['Validation Set'] = [X_valid.shape[0], X_valid.shape[1]]\n",
"summary_df"
]
},
@@ -188,8 +188,8 @@
" verbosity = logging.INFO,\n",
" X = X_train, \n",
" y = y_train,\n",
" X_valid = X_validation, \n",
" y_valid = y_validation, \n",
" X_valid = X_valid, \n",
" y_valid = y_valid, \n",
" path = project_folder)"
]
},
@@ -197,7 +197,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train the Model\n",
"## Train the Models\n",
"\n",
"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
"In this example, we specify `show_output = True` to print currently running iterations to the console."
@@ -266,20 +266,13 @@
"rundata"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Retrieve the Best Model\n",
"\n",
"Below we select the best pipeline from our iterations. The `get_output` method on `automl_classifier` returns the best run and the fitted model for the last invocation. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
]
},
{
@@ -331,26 +324,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Register the Fitted Model for Deployment"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"description = 'AutoML Model'\n",
"tags = None\n",
"local_run.register_model(description = description, tags = tags)\n",
"local_run.model_id # Use this id to deploy the model as a web service in Azure."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Testing the Fitted Model"
"### Testing the Best Fitted Model"
]
},
{
@@ -360,25 +334,12 @@
"outputs": [],
"source": [
"# Load test data.\n",
"import sklearn\n",
"from pandas_ml import ConfusionMatrix\n",
"\n",
"remove = ('headers', 'footers', 'quotes')\n",
"categories = [\n",
" 'alt.atheism',\n",
" 'talk.religion.misc',\n",
" 'comp.graphics',\n",
" 'sci.space',\n",
"]\n",
"\n",
"\n",
"data_test = fetch_20newsgroups(subset = 'test', categories = categories,\n",
" shuffle = True, random_state = 42,\n",
" remove = remove)\n",
"\n",
"vectorizer = HashingVectorizer(stop_words = 'english', alternate_sign = False,\n",
" n_features = 2**16)\n",
"\n",
"X_test = vectorizer.transform(data_test.data)\n",
"y_test = data_test.target\n",
"\n",
@@ -395,6 +356,11 @@
}
],
"metadata": {
"authors": [
{
"name": "savitam"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",