{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    ""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Copyright (c) Microsoft Corporation. All rights reserved.\n",
    "\n",
    "Licensed under the MIT License."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Train a model using a custom Docker image"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this tutorial, learn how to use a custom Docker image when training models with Azure Machine Learning.\n",
    "\n",
    "The example script in this tutorial classifies pet images by training a convolutional neural network."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Set up the experiment\n",
    "This section sets up the training experiment by initializing a workspace, creating an experiment, and uploading the training data and training scripts."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Initialize a workspace\n",
    "The Azure Machine Learning workspace is the top-level resource for the service. It provides you with a centralized place to work with all the artifacts you create. In the Python SDK, you can access the workspace artifacts by creating a `Workspace` object.\n",
    "\n",
    "Create a `Workspace` object from the `config.json` file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from azureml.core import Workspace\n",
    "\n",
    "ws = Workspace.from_config()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Prepare scripts\n",
    "Create a directory titled `fastai-example`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "os.makedirs('fastai-example', exist_ok=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then run the cell below to create the training script `train.py` in that directory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false,
     "source_hidden": false
    },
    "nteract": {
     "transient": {
      "deleting": false
     }
    }
   },
   "outputs": [],
   "source": [
    "%%writefile fastai-example/train.py\n",
    "\n",
    "from fastai.vision.all import *\n",
    "\n",
    "path = untar_data(URLs.PETS)\n",
    "path.ls()\n",
    "\n",
    "files = get_image_files(path/\"images\")\n",
    "len(files)\n",
    "\n",
    "# files contains image paths such as .../oxford-iiit-pet/images/yorkshire_terrier_102.jpg\n",
    "\n",
    "def label_func(f): return f[0].isupper()\n",
    "\n",
    "# To get our data ready for a model, we need to put it in a DataLoaders object. Here we have a function\n",
    "# that labels using the file names, so we will use ImageDataLoaders.from_name_func. There are other\n",
    "# factory methods of ImageDataLoaders that could be more suitable for your problem, so make sure to\n",
    "# check them all in vision.data.\n",
    "\n",
    "dls = ImageDataLoaders.from_name_func(path, files, label_func, item_tfms=Resize(224))\n",
    "\n",
    "# We have passed to this function the directory we're working in, the files we grabbed, our label_func,\n",
    "# and one last piece as item_tfms: this is a Transform applied on all items of our dataset that will\n",
    "# resize each image to 224 by 224, by using a random crop on the largest dimension to make it a square,\n",
    "# then resizing to 224 by 224. If we didn't pass this, we would get an error later as it would be\n",
    "# impossible to batch the items together.\n",
    "\n",
    "dls.show_batch()\n",
    "\n",
    "learn = cnn_learner(dls, resnet34, metrics=error_rate)\n",
    "learn.fine_tune(1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define your environment\n",
    "Create an environment object and enable Docker."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from azureml.core import Environment\n",
    "\n",
    "fastai_env = Environment(\"fastai\")\n",
    "fastai_env.docker.enabled = True"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The base image specified below supports the fast.ai library, which allows for distributed deep learning capabilities. For more information, see the [fast.ai DockerHub](https://hub.docker.com/u/fastdotai).\n",
    "\n",
    "When you are using your custom Docker image, you might already have your Python environment properly set up. In that case, set the `user_managed_dependencies` flag to `True` in order to use your custom image's built-in Python environment."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fastai_env.docker.base_image = \"fastdotai/fastai:2021-02-11\"\n",
    "fastai_env.python.user_managed_dependencies = True"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To use an image from a private container registry that is not in your workspace, you must use `docker.base_image_registry` to specify the address of the repository as well as a username and password."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```python\n",
    "fastai_env.docker.base_image_registry.address = \"myregistry.azurecr.io\"\n",
    "fastai_env.docker.base_image_registry.username = \"username\"\n",
    "fastai_env.docker.base_image_registry.password = \"password\"\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It is also possible to use a custom Dockerfile. Use this approach if you need to install non-Python packages as dependencies, and remember to set the base image to `None`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Specify the Docker steps as a string:\n",
    "```python\n",
    "dockerfile = r\"\"\"\n",
    "FROM mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04\n",
    "RUN echo \"Hello from custom container!\"\n",
    "\"\"\"\n",
    "```\n",
    "Set the base image to `None`, because the image is defined by the Dockerfile:\n",
    "```python\n",
    "fastai_env.docker.base_image = None\n",
    "fastai_env.docker.base_dockerfile = dockerfile\n",
    "```\n",
    "Alternatively, load the Dockerfile contents from a file:\n",
    "```python\n",
    "fastai_env.docker.base_image = None\n",
    "fastai_env.docker.base_dockerfile = \"./Dockerfile\"\n",
    "```"
   ]
  },
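  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Optionally, you can register the environment and build its Docker image before submitting a run, which surfaces base-image or Dockerfile errors early. The snippet below is a minimal sketch, assuming the `fastai_env` and `ws` objects defined above; it uses the azureml-core `Environment.register` and `Environment.build` methods.\n",
    "```python\n",
    "# Register the environment so it is versioned and reusable in the workspace (optional)\n",
    "fastai_env.register(workspace=ws)\n",
    "\n",
    "# Kick off an image build in the workspace and wait for it to finish\n",
    "build = fastai_env.build(workspace=ws)\n",
    "build.wait_for_completion(show_output=True)\n",
    "```"
   ]
  },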
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create or attach existing AmlCompute\n",
    "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource.\n",
    "\n",
    "> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist.\n",
    "\n",
    "**Creation of AmlCompute takes approximately 5 minutes.** If an AmlCompute cluster with that name is already in your workspace, this code will skip the creation process.\n",
    "\n",
    "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from azureml.core.compute import ComputeTarget, AmlCompute\n",
    "from azureml.core.compute_target import ComputeTargetException\n",
    "\n",
    "# choose a name for your cluster\n",
    "cluster_name = \"gpu-cluster\"\n",
    "\n",
    "try:\n",
    "    compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
    "    print('Found existing compute target.')\n",
    "except ComputeTargetException:\n",
    "    print('Creating a new compute target...')\n",
    "    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',\n",
    "                                                           max_nodes=4)\n",
    "\n",
    "    # create the cluster\n",
    "    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
    "\n",
    "    compute_target.wait_for_completion(show_output=True)\n",
    "\n",
    "# use get_status() to get a detailed status for the current AmlCompute\n",
    "print(compute_target.get_status().serialize())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create a ScriptRunConfig\n",
    "This ScriptRunConfig will configure your job for execution on the desired compute target."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false,
     "source_hidden": false
    },
    "nteract": {
     "transient": {
      "deleting": false
     }
    }
   },
   "outputs": [],
   "source": [
    "from azureml.core import ScriptRunConfig\n",
    "\n",
    "fastai_config = ScriptRunConfig(source_directory='fastai-example',\n",
    "                                script='train.py',\n",
    "                                compute_target=compute_target,\n",
    "                                environment=fastai_env)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Submit your run\n",
    "When a training run is submitted using a `ScriptRunConfig` object, the submit method returns an object of type `ScriptRun`. The returned `ScriptRun` object gives you programmatic access to information about the training run."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false,
     "source_hidden": false
    },
    "nteract": {
     "transient": {
      "deleting": false
     }
    }
   },
   "outputs": [],
   "source": [
    "from azureml.core import Experiment\n",
    "\n",
    "run = Experiment(ws, 'fastai-custom-image').submit(fastai_config)\n",
    "run.wait_for_completion(show_output=True)"
   ]
  },
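  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The snippet below is a minimal sketch of how you might inspect the submitted run programmatically; it assumes the `run` object returned above and uses standard `Run` methods from azureml-core (`get_portal_url`, `get_details`, `get_metrics`).\n",
    "```python\n",
    "# Link to the run in Azure Machine Learning studio\n",
    "print(run.get_portal_url())\n",
    "\n",
    "# Detailed information about the run, including its status\n",
    "print(run.get_details()['status'])\n",
    "\n",
    "# Metrics logged during training (empty for this script, which does not log any)\n",
    "print(run.get_metrics())\n",
    "```"
   ]
  }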
 ],
 "metadata": {
  "authors": [
   {
    "name": "sagopal"
   }
  ],
  "category": "training",
  "compute": [
   "AML Compute"
  ],
  "datasets": [
   "Oxford IIIT Pet"
  ],
  "deployment": [
   "None"
  ],
  "exclude_from_index": false,
  "framework": [
   "Pytorch"
  ],
  "friendly_name": "Train a model with a custom Docker image",
  "index_order": 1,
  "kernelspec": {
   "display_name": "Python 3.8 - AzureML",
   "language": "python",
   "name": "python38-azureml"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  },
  "nteract": {
   "version": "nteract-front-end@1.0.0"
  },
  "tags": [
   "None"
  ],
  "task": "Train with custom Docker image"
 },
 "nbformat": 4,
 "nbformat_minor": 2
}