Compare commits

57 Commits

Author SHA1 Message Date
amlrelsa-ms
f9892966fd update samples from Release-108 as a part of 1.35.0 SDK stable release 2021-10-11 15:36:08 +00:00
Harneet Virk
e2dddfde85 Merge pull request #1601 from Azure/release_update/Release-114
update samples from Release-114 as a part of  SDK release
2021-09-29 14:21:59 -07:00
amlrelsa-ms
36d96f96ec update samples from Release-114 as a part of SDK release 2021-09-29 20:16:51 +00:00
Harneet Virk
7ebcfea5a3 Merge pull request #1600 from Azure/release_update/Release-113
update samples from Release-113 as a part of  SDK release
2021-09-28 12:53:57 -07:00
amlrelsa-ms
b20bfed33a update samples from Release-113 as a part of SDK release 2021-09-28 19:44:58 +00:00
Harneet Virk
a66a92e338 Merge pull request #1597 from Azure/release_update/Release-112
update samples from Release-112 as a part of  SDK release
2021-09-24 14:44:53 -07:00
amlrelsa-ms
c56c2c3525 update samples from Release-112 as a part of SDK release 2021-09-24 21:40:44 +00:00
Harneet Virk
4cac072fa4 Merge pull request #1588 from Azure/release_update/Release-111
Update samples from Release-111 as a part of SDK 1.34.0 release
2021-09-09 09:02:38 -07:00
amlrelsa-ms
aeab6b3e28 update samples from Release-111 as a part of SDK release 2021-09-07 17:32:15 +00:00
Harneet Virk
015e261f29 Merge pull request #1581 from Azure/release_update/Release-110
update samples from Release-110 as a part of  SDK release
2021-08-20 09:21:08 -07:00
amlrelsa-ms
d2a423dde9 update samples from Release-110 as a part of SDK release 2021-08-20 00:28:42 +00:00
Harneet Virk
3ecbfd6532 Merge pull request #1578 from Azure/release_update/Release-109
update samples from Release-109 as a part of  SDK release
2021-08-18 18:16:31 -07:00
amlrelsa-ms
02ecb2d755 update samples from Release-109 as a part of SDK release 2021-08-18 22:07:12 +00:00
Harneet Virk
122df6e846 Merge pull request #1576 from Azure/release_update/Release-108
update samples from Release-108 as a part of  SDK release
2021-08-18 09:47:34 -07:00
amlrelsa-ms
7d6a0a2051 update samples from Release-108 as a part of SDK release 2021-08-18 00:33:54 +00:00
Harneet Virk
6cc8af80a2 Merge pull request #1565 from Azure/release_update/Release-107
update samples from Release-107 as a part of  SDK release 1.33
2021-08-02 13:14:30 -07:00
amlrelsa-ms
f61898f718 update samples from Release-107 as a part of SDK release 2021-08-02 18:01:38 +00:00
Harneet Virk
5cb465171e Merge pull request #1556 from Azure/update-spark-notebook
updating spark notebook
2021-07-26 17:09:42 -07:00
Shivani Santosh Sambare
0ce37dd18f updating spark notebook 2021-07-26 15:51:54 -07:00
Cody
d835b183a5 update README.md (#1552) 2021-07-15 10:43:22 -07:00
Cody
d3cafebff9 add code of conduct (#1551) 2021-07-15 08:08:44 -07:00
Harneet Virk
354b194a25 Merge pull request #1543 from Azure/release_update/Release-106
update samples from Release-106 as a part of  SDK release
2021-07-06 11:05:55 -07:00
amlrelsa-ms
a52d67bb84 update samples from Release-106 as a part of SDK release 2021-07-06 17:17:27 +00:00
Harneet Virk
421ea3d920 Merge pull request #1530 from Azure/release_update/Release-105
update samples from Release-105 as a part of  SDK release
2021-06-25 09:58:05 -07:00
amlrelsa-ms
24f53f1aa1 update samples from Release-105 as a part of SDK release 2021-06-24 23:00:13 +00:00
Harneet Virk
6fc5d11de2 Merge pull request #1518 from Azure/release_update/Release-104
update samples from Release-104 as a part of  SDK release
2021-06-21 10:29:53 -07:00
amlrelsa-ms
d17547d890 update samples from Release-104 as a part of SDK release 2021-06-21 17:16:09 +00:00
Harneet Virk
928e0d4327 Merge pull request #1510 from Azure/release_update/Release-103
update samples from Release-103 as a part of  SDK release
2021-06-14 10:33:34 -07:00
amlrelsa-ms
05327cfbb9 update samples from Release-103 as a part of SDK release 2021-06-14 17:30:30 +00:00
Harneet Virk
8f7717014b Merge pull request #1506 from Azure/release_update/Release-102
update samples from Release-102 as a part of  SDK release 1.30.0
2021-06-07 11:14:02 -07:00
amlrelsa-ms
a47e50b79a update samples from Release-102 as a part of SDK release 2021-06-07 17:34:51 +00:00
Harneet Virk
8f89d88def Merge pull request #1505 from Azure/release_update/Release-101
update samples from Release-101 as a part of  SDK release
2021-06-04 19:54:53 -07:00
amlrelsa-ms
ec97207bb1 update samples from Release-101 as a part of SDK release 2021-06-05 02:54:13 +00:00
Harneet Virk
a2d20b0f47 Merge pull request #1493 from Azure/release_update/Release-98
update samples from Release-98 as a part of  SDK release
2021-05-28 08:04:58 -07:00
amlrelsa-ms
8180cebd75 update samples from Release-98 as a part of SDK release 2021-05-28 03:44:25 +00:00
Harneet Virk
700ab2d782 Merge pull request #1489 from Azure/release_update/Release-97
update samples from Release-97 as a part of  SDK  1.29.0 release
2021-05-25 07:43:14 -07:00
amlrelsa-ms
ec9a5a061d update samples from Release-97 as a part of SDK release 2021-05-24 17:39:23 +00:00
Harneet Virk
467630f955 Merge pull request #1466 from Azure/release_update/Release-96
update samples from Release-96 as a part of  SDK release 1.28.0
2021-05-10 22:48:19 -07:00
amlrelsa-ms
eac6b69bae update samples from Release-96 as a part of SDK release 2021-05-10 18:38:34 +00:00
Harneet Virk
441a5b0141 Merge pull request #1440 from Azure/release_update/Release-95
update samples from Release-95 as a part of  SDK 1.27 release
2021-04-19 11:51:21 -07:00
amlrelsa-ms
70902df6da update samples from Release-95 as a part of SDK release 2021-04-19 18:42:58 +00:00
nikAI77
6f893ff0b4 update samples from Release-94 as a part of SDK release (#1418)
Co-authored-by: amlrelsa-ms <amlrelsa@microsoft.com>
2021-04-06 12:36:12 -04:00
Harneet Virk
bda592a236 Merge pull request #1406 from Azure/release_update/Release-93
update samples from Release-93 as a part of  SDK release
2021-03-24 11:25:00 -07:00
amlrelsa-ms
8b32e8d5ad update samples from Release-93 as a part of SDK release 2021-03-24 16:45:36 +00:00
Harneet Virk
54a065c698 Merge pull request #1386 from yunjie-hub/master
Add synapse sample notebooks
2021-03-09 18:05:10 -08:00
yunjie-hub
b9718678b3 Add files via upload 2021-03-09 18:02:27 -08:00
Harneet Virk
3fa40d2c6d Merge pull request #1385 from Azure/release_update/Release-92
update samples from Release-92 as a part of  SDK release
2021-03-09 17:51:27 -08:00
amlrelsa-ms
883e4a4c59 update samples from Release-92 as a part of SDK release 2021-03-10 01:48:54 +00:00
Harneet Virk
e90826b331 Merge pull request #1384 from yunjie-hub/master
Add synapse sample notebooks
2021-03-09 12:40:33 -08:00
yunjie-hub
ac04172f6d Add files via upload 2021-03-09 12:38:23 -08:00
Harneet Virk
8c0000beb4 Merge pull request #1382 from Azure/release_update/Release-91
update samples from Release-91 as a part of  SDK release
2021-03-08 21:43:10 -08:00
amlrelsa-ms
35287ab0d8 update samples from Release-91 as a part of SDK release 2021-03-09 05:36:08 +00:00
Harneet Virk
3fe4f8b038 Merge pull request #1375 from Azure/release_update/Release-90
update samples from Release-90 as a part of  SDK release
2021-03-01 09:15:14 -08:00
amlrelsa-ms
1722678469 update samples from Release-90 as a part of SDK release 2021-03-01 17:13:25 +00:00
Harneet Virk
17da7e8706 Merge pull request #1364 from Azure/release_update/Release-89
update samples from Release-89 as a part of  SDK release
2021-02-23 17:27:27 -08:00
amlrelsa-ms
d2e7213ff3 update samples from Release-89 as a part of SDK release 2021-02-24 01:26:17 +00:00
mx-iao
882cb76e8a Merge pull request #1361 from Azure/minxia/distr-pytorch
Update distributed pytorch example
2021-02-23 12:07:20 -08:00
301 changed files with 38514 additions and 13476 deletions

9
CODE_OF_CONDUCT.md Normal file
@@ -0,0 +1,9 @@
# Microsoft Open Source Code of Conduct
This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
Resources:
- [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/)
- [Microsoft Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/)
- Contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with questions or concerns

@@ -1,77 +1,43 @@
# Azure Machine Learning service example notebooks
# Azure Machine Learning Python SDK notebooks
> a community-driven repository of examples using mlflow for tracking can be found at https://github.com/Azure/azureml-examples
This repository contains example notebooks demonstrating the [Azure Machine Learning](https://azure.microsoft.com/services/machine-learning-service/) Python SDK which allows you to build, train, deploy and manage machine learning solutions using Azure. The AML SDK allows you the choice of using local or cloud compute resources, while managing and maintaining the complete data science workflow from the cloud.
Welcome to the Azure Machine Learning Python SDK notebooks repository!
![Azure ML Workflow](https://raw.githubusercontent.com/MicrosoftDocs/azure-docs/master/articles/machine-learning/media/concept-azure-machine-learning-architecture/workflow.png)
## Getting started
These notebooks are recommended for use in an Azure Machine Learning [Compute Instance](https://docs.microsoft.com/azure/machine-learning/concept-compute-instance), where you can run them without any additional set up.
## Quick installation
```sh
pip install azureml-sdk
```
Read more detailed instructions on [how to set up your environment](./NBSETUP.md) using Azure Notebook service, your own Jupyter notebook server, or Docker.
However, the notebooks can be run in any development environment with the correct `azureml` packages installed.
## How to navigate and use the example notebooks?
If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, you should always run the [Configuration](./configuration.ipynb) notebook first when setting up a notebook library on a new machine or in a new environment. It configures your notebook library to connect to an Azure Machine Learning workspace, and sets up your workspace and compute to be used by many of the other examples.
This [index](./index.md) should assist in navigating the Azure Machine Learning notebook samples and encourage efficient retrieval of topics and content.
If you want to...
* ...try out and explore Azure ML, start with image classification tutorials: [Part 1 (Training)](./tutorials/image-classification-mnist-data/img-classification-part1-training.ipynb) and [Part 2 (Deployment)](./tutorials/image-classification-mnist-data/img-classification-part2-deploy.ipynb).
* ...learn about experimentation and tracking run history: [track and monitor experiments](./how-to-use-azureml/track-and-monitor-experiments).
* ...train deep learning models at scale, first learn about [Machine Learning Compute](./how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb), and then try [distributed hyperparameter tuning](./how-to-use-azureml/ml-frameworks/pytorch/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb) and [distributed training](./how-to-use-azureml/ml-frameworks/pytorch/distributed-pytorch-with-horovod/distributed-pytorch-with-horovod.ipynb).
* ...deploy models as a realtime scoring service, first learn the basics by [deploying to Azure Container Instance](./how-to-use-azureml/deployment/deploy-to-cloud/model-register-and-deploy.ipynb), then learn how to [production deploy models on Azure Kubernetes Cluster](./how-to-use-azureml/deployment/production-deploy-to-aks/production-deploy-to-aks.ipynb).
* ...deploy models as a batch scoring service: [create Machine Learning Compute for scoring compute](./how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb) and [use Machine Learning Pipelines to deploy your model](https://aka.ms/pl-batch-scoring).
* ...monitor your deployed models, learn about using [App Insights](./how-to-use-azureml/deployment/enable-app-insights-in-production-service/enable-app-insights-in-production-service.ipynb).
## Tutorials
The [Tutorials](./tutorials) folder contains notebooks for the tutorials described in the [Azure Machine Learning documentation](https://aka.ms/aml-docs).
## How to use Azure ML
The [How to use Azure ML](./how-to-use-azureml) folder contains specific examples demonstrating the features of the Azure Machine Learning SDK
- [Training](./how-to-use-azureml/training) - Examples of how to build models using Azure ML's logging and execution capabilities on local and remote compute targets
- [Training with ML and DL frameworks](./how-to-use-azureml/ml-frameworks) - Examples demonstrating how to build and train machine learning models at scale on Azure ML and perform hyperparameter tuning.
- [Manage Azure ML Service](./how-to-use-azureml/manage-azureml-service) - Examples of how to perform tasks, such as authenticating against the Azure ML service in different ways.
- [Automated Machine Learning](./how-to-use-azureml/automated-machine-learning) - Examples using Automated Machine Learning to automatically generate optimal machine learning pipelines and models
- [Machine Learning Pipelines](./how-to-use-azureml/machine-learning-pipelines) - Examples showing how to create and use reusable pipelines for training and batch scoring
- [Deployment](./how-to-use-azureml/deployment) - Examples showing how to deploy and manage machine learning models and solutions
- [Azure Databricks](./how-to-use-azureml/azure-databricks) - Examples showing how to use Azure ML with Azure Databricks
- [Reinforcement Learning](./how-to-use-azureml/reinforcement-learning) - Examples showing how to train reinforcement learning agents
---
## Documentation
* Quickstarts, end-to-end tutorials, and how-tos on the [official documentation site for Azure Machine Learning service](https://docs.microsoft.com/en-us/azure/machine-learning/service/).
* [Python SDK reference](https://docs.microsoft.com/en-us/python/api/overview/azure/ml/intro?view=azure-ml-py)
* Azure ML Data Prep SDK [overview](https://aka.ms/data-prep-sdk), [Python SDK reference](https://aka.ms/aml-data-prep-apiref), and [tutorials and how-tos](https://aka.ms/aml-data-prep-notebooks).
---
## Community Repository
Visit this [community repository](https://github.com/microsoft/MLOps/tree/master/examples) to find useful end-to-end sample notebooks. Also, please follow these [contribution guidelines](https://github.com/microsoft/MLOps/blob/master/contributing.md) when contributing to this repository.
## Projects using Azure Machine Learning
Visit the following repos to see projects contributed by Azure ML users:
- [Learn about Natural Language Processing best practices using Azure Machine Learning service](https://github.com/microsoft/nlp)
- [Pre-Train BERT models using Azure Machine Learning service](https://github.com/Microsoft/AzureML-BERT)
- [Fashion MNIST with Azure ML SDK](https://github.com/amynic/azureml-sdk-fashion)
- [UMass Amherst Student Samples](https://github.com/katiehouse3/microsoft-azure-ml-notebooks) - A number of end-to-end machine learning notebooks, including machine translation, image classification, and customer churn, created by students in the 696DS course at UMass Amherst.
## Data/Telemetry
This repository collects usage data and sends it to Microsoft to help improve our products and services. Read Microsoft's [privacy statement to learn more](https://privacy.microsoft.com/en-US/privacystatement)
To opt out of tracking, please go to the raw markdown or .ipynb files and remove the following line of code:
Install the `azureml.core` Python package:
```sh
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/README.png)"
pip install azureml-core
```
This URL will be slightly different depending on the file.
![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/README.png)
Install additional packages as needed:
```sh
pip install azureml-mlflow
pip install azureml-dataset-runtime
pip install azureml-automl-runtime
pip install azureml-pipeline
pip install azureml-pipeline-steps
...
```
We recommend starting with one of the [quickstarts](tutorials/compute-instance-quickstarts).
## Contributing
This repository is a push-only mirror. Pull requests are ignored.
## Code of Conduct
This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). Please see the [code of conduct](CODE_OF_CONDUCT.md) for details.
## Reference
- [Documentation](https://docs.microsoft.com/azure/machine-learning)

@@ -103,7 +103,7 @@
"source": [
"import azureml.core\n",
"\n",
"print(\"This notebook was created using version 1.23.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.35.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
@@ -254,6 +254,8 @@
"\n",
"Many of the sample notebooks use Azure ML managed compute (AmlCompute) to train models using a dynamically scalable pool of compute. In this section you will create default compute clusters for use by the other notebooks and any other operations you choose.\n",
"\n",
"> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist.\n",
"\n",
"To create a cluster, you need to specify a compute configuration that specifies the type of machine to be used and the scalability behaviors. Then you choose a name for the cluster that is unique within the workspace that can be used to address the cluster later.\n",
"\n",
"The cluster parameters are:\n",

@@ -36,9 +36,9 @@
"\n",
"<a id=\"Introduction\"></a>\n",
"## Introduction\n",
"This notebook shows how to use [Fairlearn (an open source fairness assessment and unfairness mitigation package)](http://fairlearn.github.io) and Azure Machine Learning Studio for a binary classification problem. This example uses the well-known adult census dataset. For the purposes of this notebook, we shall treat this as a loan decision problem. We will pretend that the label indicates whether or not each individual repaid a loan in the past. We will use the data to train a predictor to predict whether previously unseen individuals will repay a loan or not. The assumption is that the model predictions are used to decide whether an individual should be offered a loan. Its purpose is purely illustrative of a workflow including a fairness dashboard - in particular, we do **not** include a full discussion of the detailed issues which arise when considering fairness in machine learning. For such discussions, please [refer to the Fairlearn website](http://fairlearn.github.io/).\n",
"This notebook shows how to use [Fairlearn (an open source fairness assessment and unfairness mitigation package)](http://fairlearn.org) and Azure Machine Learning Studio for a binary classification problem. This example uses the well-known adult census dataset. For the purposes of this notebook, we shall treat this as a loan decision problem. We will pretend that the label indicates whether or not each individual repaid a loan in the past. We will use the data to train a predictor to predict whether previously unseen individuals will repay a loan or not. The assumption is that the model predictions are used to decide whether an individual should be offered a loan. Its purpose is purely illustrative of a workflow including a fairness dashboard - in particular, we do **not** include a full discussion of the detailed issues which arise when considering fairness in machine learning. For such discussions, please [refer to the Fairlearn website](http://fairlearn.org/).\n",
"\n",
"We will apply the [grid search algorithm](https://fairlearn.github.io/master/api_reference/fairlearn.reductions.html#fairlearn.reductions.GridSearch) from the Fairlearn package using a specific notion of fairness called Demographic Parity. This produces a set of models, and we will view these in a dashboard both locally and in the Azure Machine Learning Studio.\n",
"We will apply the [grid search algorithm](https://fairlearn.org/v0.4.6/api_reference/fairlearn.reductions.html#fairlearn.reductions.GridSearch) from the Fairlearn package using a specific notion of fairness called Demographic Parity. This produces a set of models, and we will view these in a dashboard both locally and in the Azure Machine Learning Studio.\n",
"\n",
"### Setup\n",
"\n",
@@ -46,9 +46,10 @@
"Please see the [configuration notebook](../../configuration.ipynb) for information about creating one, if required.\n",
"This notebook also requires the following packages:\n",
"* `azureml-contrib-fairness`\n",
"* `fairlearn==0.4.6` (v0.5.0 will work with minor modifications)\n",
"* `fairlearn>=0.6.2` (pre-v0.5.0 will work with minor modifications)\n",
"* `joblib`\n",
"* `shap`\n",
"* `liac-arff`\n",
"* `raiwidgets~=0.7.0`\n",
"\n",
"Fairlearn relies on features introduced in v0.22.1 of `scikit-learn`. If you have an older version already installed, please uncomment and run the following cell:"
]
@@ -85,10 +86,9 @@
"outputs": [],
"source": [
"from fairlearn.reductions import GridSearch, DemographicParity, ErrorRate\n",
"from fairlearn.widget import FairlearnDashboard\n",
"from raiwidgets import FairnessDashboard\n",
"\n",
"from sklearn.compose import ColumnTransformer\n",
"from sklearn.datasets import fetch_openml\n",
"from sklearn.impute import SimpleImputer\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.model_selection import train_test_split\n",
@@ -112,9 +112,9 @@
"metadata": {},
"outputs": [],
"source": [
"from fairness_nb_utils import fetch_openml_with_retries\n",
"from fairness_nb_utils import fetch_census_dataset\n",
"\n",
"data = fetch_openml_with_retries(data_id=1590)\n",
"data = fetch_census_dataset()\n",
" \n",
"# Extract the items we want\n",
"X_raw = data.data\n",
@@ -137,7 +137,7 @@
"outputs": [],
"source": [
"A = X_raw[['sex','race']]\n",
"X_raw = X_raw.drop(labels=['sex', 'race'],axis = 1)"
"X_raw = X_raw.drop(labels=['sex', 'race'], axis = 1)"
]
},
{
@@ -257,9 +257,9 @@
"metadata": {},
"outputs": [],
"source": [
"FairlearnDashboard(sensitive_features=A_test, sensitive_feature_names=['Sex', 'Race'],\n",
" y_true=y_test,\n",
" y_pred={\"unmitigated\": unmitigated_predictor.predict(X_test)})"
"FairnessDashboard(sensitive_features=A_test,\n",
" y_true=y_test,\n",
" y_pred={\"unmitigated\": unmitigated_predictor.predict(X_test)})"
]
},
{
@@ -312,8 +312,8 @@
"sweep.fit(X_train, y_train,\n",
" sensitive_features=A_train.sex)\n",
"\n",
"# For Fairlearn v0.5.0, need sweep.predictors_\n",
"predictors = sweep._predictors"
"# For Fairlearn pre-v0.5.0, need sweep._predictors\n",
"predictors = sweep.predictors_"
]
},
{
@@ -330,16 +330,14 @@
"outputs": [],
"source": [
"errors, disparities = [], []\n",
"for m in predictors:\n",
" classifier = lambda X: m.predict(X)\n",
" \n",
"for predictor in predictors:\n",
" error = ErrorRate()\n",
" error.load_data(X_train, pd.Series(y_train), sensitive_features=A_train.sex)\n",
" disparity = DemographicParity()\n",
" disparity.load_data(X_train, pd.Series(y_train), sensitive_features=A_train.sex)\n",
" \n",
" errors.append(error.gamma(classifier)[0])\n",
" disparities.append(disparity.gamma(classifier).max())\n",
" errors.append(error.gamma(predictor.predict)[0])\n",
" disparities.append(disparity.gamma(predictor.predict).max())\n",
" \n",
"all_results = pd.DataFrame( {\"predictor\": predictors, \"error\": errors, \"disparity\": disparities})\n",
"\n",
@@ -388,10 +386,9 @@
"metadata": {},
"outputs": [],
"source": [
"FairlearnDashboard(sensitive_features=A_test, \n",
" sensitive_feature_names=['Sex', 'Race'],\n",
" y_true=y_test.tolist(),\n",
" y_pred=predictions_dominant)"
"FairnessDashboard(sensitive_features=A_test, \n",
" y_true=y_test.tolist(),\n",
" y_pred=predictions_dominant)"
]
},
{
@@ -410,7 +407,7 @@
"<a id=\"AzureUpload\"></a>\n",
"## Uploading a Fairness Dashboard to Azure\n",
"\n",
"Uploading a fairness dashboard to Azure is a two stage process. The `FairlearnDashboard` invoked in the previous section relies on the underlying Python kernel to compute metrics on demand. This is obviously not available when the fairness dashboard is rendered in AzureML Studio. By default, the dashboard in Azure Machine Learning Studio also requires the models to be registered. The required stages are therefore:\n",
"Uploading a fairness dashboard to Azure is a two stage process. The `FairnessDashboard` invoked in the previous section relies on the underlying Python kernel to compute metrics on demand. This is obviously not available when the fairness dashboard is rendered in AzureML Studio. By default, the dashboard in Azure Machine Learning Studio also requires the models to be registered. The required stages are therefore:\n",
"1. Register the dominant models\n",
"1. Precompute all the required metrics\n",
"1. Upload to Azure\n",
@@ -584,7 +581,7 @@
"<a id=\"Conclusion\"></a>\n",
"## Conclusion\n",
"\n",
"In this notebook we have demonstrated how to use the `GridSearch` algorithm from Fairlearn to generate a collection of models, and then present them in the fairness dashboard in Azure Machine Learning Studio. Please remember that this notebook has not attempted to discuss the many considerations which should be part of any approach to unfairness mitigation. The [Fairlearn website](http://fairlearn.github.io/) provides that discussion"
"In this notebook we have demonstrated how to use the `GridSearch` algorithm from Fairlearn to generate a collection of models, and then present them in the fairness dashboard in Azure Machine Learning Studio. Please remember that this notebook has not attempted to discuss the many considerations which should be part of any approach to unfairness mitigation. The [Fairlearn website](http://fairlearn.org/) provides that discussion"
]
},
{
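
The notebook comment in this diff notes that Fairlearn releases before v0.5.0 exposed the fitted models as `sweep._predictors`, while later releases use `sweep.predictors_`. A minimal sketch of a helper that tolerates both names is shown here. It assumes only these two attribute names, and `get_predictors` and the two stand-in classes are hypothetical illustrations, not part of Fairlearn.

```python
def get_predictors(sweep):
    """Return the fitted predictors from a GridSearch-like object.

    Tries the newer attribute name (predictors_) first, then falls back
    to the pre-v0.5.0 name (_predictors) mentioned in the notebook comment.
    """
    for name in ("predictors_", "_predictors"):
        if hasattr(sweep, name):
            return getattr(sweep, name)
    raise AttributeError("object has neither predictors_ nor _predictors")


# Hypothetical stand-ins for fitted sweeps from the two API generations.
class NewStyleSweep:
    predictors_ = ["model_a", "model_b"]


class OldStyleSweep:
    _predictors = ["model_c"]


print(get_predictors(NewStyleSweep()))
print(get_predictors(OldStyleSweep()))
```

With a helper like this, the notebook cell could read `predictors = get_predictors(sweep)` regardless of the installed version, at the cost of hiding which API generation is in use.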

@@ -3,5 +3,7 @@ dependencies:
- pip:
- azureml-sdk
- azureml-contrib-fairness
- fairlearn==0.4.6
- fairlearn>=0.6.2
- joblib
- liac-arff
- raiwidgets~=0.11.0
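
The environment file above pins `raiwidgets~=0.11.0`. Under PEP 440, `~=` is the compatible release operator, so `~=0.11.0` is equivalent to `>=0.11.0, ==0.11.*`. The following is a simplified sketch of that rule for plain dotted numeric versions only (no pre-releases, epochs, or local versions). The helper names are hypothetical, and real tooling should rely on pip or the `packaging` library instead.

```python
def parse_version(text):
    """Parse a plain dotted numeric version such as '0.11.0' into a tuple."""
    return tuple(int(part) for part in text.split("."))


def matches_compatible_release(candidate, spec):
    """Check `candidate ~= spec` for simple numeric versions.

    `~=X.Y.Z` means: at least X.Y.Z, and the same X.Y prefix.
    """
    cand, base = parse_version(candidate), parse_version(spec)
    if len(base) < 2:
        raise ValueError("~= needs at least two version components")
    prefix = base[:-1]
    # Pad with zeros so that '0.11' compares equal to '0.11.0'.
    width = max(len(cand), len(base))
    cand_padded = cand + (0,) * (width - len(cand))
    base_padded = base + (0,) * (width - len(base))
    return cand_padded >= base_padded and cand_padded[:len(prefix)] == prefix


print(matches_compatible_release("0.11.3", "0.11.0"))
print(matches_compatible_release("0.12.0", "0.11.0"))
```

By contrast, `fairlearn>=0.6.2` in the same file sets only a lower bound, so any later release satisfies it.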

@@ -4,7 +4,13 @@
"""Utilities for azureml-contrib-fairness notebooks."""
import arff
from collections import OrderedDict
from contextlib import closing
import gzip
import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.utils import Bunch
import time
@@ -15,7 +21,7 @@ def fetch_openml_with_retries(data_id, max_retries=4, retry_delay=60):
            print("Download attempt {0} of {1}".format(i + 1, max_retries))
            data = fetch_openml(data_id=data_id, as_frame=True)
            break
        except Exception as e:
        except Exception as e:  # noqa: B902
            print("Download attempt failed with exception:")
            print(e)
            if i + 1 != max_retries:
@@ -26,3 +32,80 @@ def fetch_openml_with_retries(data_id, max_retries=4, retry_delay=60):
        raise RuntimeError("Unable to download dataset from OpenML")
    return data
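
The `fetch_openml_with_retries` helper shown in these hunks retries a download and raises `RuntimeError` if every attempt fails. The sketch below isolates that retry pattern with a doubling delay and a `for`/`else` loop. It assumes a generic callable in place of the OpenML download, and the names `retry_with_backoff` and `flaky` are hypothetical. The `sleep` parameter is injected so the demo runs without waiting.

```python
import time


def retry_with_backoff(action, max_retries=4, retry_delay=60, sleep=time.sleep):
    """Call `action` until it succeeds, doubling the delay after each failure."""
    for attempt in range(max_retries):
        try:
            result = action()
            break
        except Exception as exc:  # broad on purpose: any failure triggers a retry
            print("Attempt {0} of {1} failed: {2}".format(
                attempt + 1, max_retries, exc))
            if attempt + 1 != max_retries:
                sleep(retry_delay)
                retry_delay *= 2
    else:
        # The loop ended without `break`, so every attempt failed.
        raise RuntimeError("All {0} attempts failed".format(max_retries))
    return result


# Demo: a callable that fails twice, then succeeds. Delays are recorded
# in a list instead of actually sleeping.
calls = []


def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise IOError("temporary failure")
    return "ok"


delays = []
outcome = retry_with_backoff(flaky, sleep=delays.append)
print(outcome, delays)
```

The `else` branch of the loop runs only when no `break` occurred, which is what lets the function distinguish "succeeded on some attempt" from "ran out of attempts" without a separate flag.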
_categorical_columns = [
    'workclass',
    'education',
    'marital-status',
    'occupation',
    'relationship',
    'race',
    'sex',
    'native-country'
]


def fetch_census_dataset():
    """Fetch the Adult Census Dataset.

    This uses a particular URL for the Adult Census dataset. The code
    is a simplified version of fetch_openml() in sklearn.

    The data are copied from:
    https://openml.org/data/v1/download/1595261.gz
    (as of 2021-03-31)
    """
    try:
        from urllib import urlretrieve
    except ImportError:
        from urllib.request import urlretrieve

    filename = "1595261.gz"
    data_url = "https://rainotebookscdn.blob.core.windows.net/datasets/"

    remaining_attempts = 5
    sleep_duration = 10
    while remaining_attempts > 0:
        try:
            urlretrieve(data_url + filename, filename)

            http_stream = gzip.GzipFile(filename=filename, mode='rb')

            with closing(http_stream):
                def _stream_generator(response):
                    for line in response:
                        yield line.decode('utf-8')

                stream = _stream_generator(http_stream)
                data = arff.load(stream)
        except Exception as exc:  # noqa: B902
            remaining_attempts -= 1
            print("Error downloading dataset from {} ({} attempt(s) remaining)"
                  .format(data_url, remaining_attempts))
            print(exc)
            time.sleep(sleep_duration)
            sleep_duration *= 2
            continue
        else:
            # dataset successfully downloaded
            break
    else:
        raise Exception("Could not retrieve dataset from {}.".format(data_url))

    attributes = OrderedDict(data['attributes'])
    arff_columns = list(attributes)

    raw_df = pd.DataFrame(data=data['data'], columns=arff_columns)

    target_column_name = 'class'
    target = raw_df.pop(target_column_name)
    for col_name in _categorical_columns:
        dtype = pd.api.types.CategoricalDtype(attributes[col_name])
        raw_df[col_name] = raw_df[col_name].astype(dtype, copy=False)

    result = Bunch()
    result.data = raw_df
    result.target = target

    return result
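
`fetch_census_dataset` reads the downloaded `.gz` file line by line and decodes each line to text before handing the stream to the ARFF parser. The sketch below shows that decoding step on its own, using only the standard library. ARFF parsing is deliberately left out, the payload is a made-up four-line sample, and `iter_decoded_lines` and `demo` are hypothetical names.

```python
import gzip
import os
import tempfile
from contextlib import closing


def iter_decoded_lines(path, encoding="utf-8"):
    """Yield the lines of a gzip-compressed text file as decoded strings."""
    with closing(gzip.GzipFile(filename=path, mode="rb")) as stream:
        for raw_line in stream:
            yield raw_line.decode(encoding)


def demo():
    """Round-trip a small ARFF-like payload through a temporary .gz file."""
    payload = "@relation demo\n@attribute age numeric\n@data\n39\n"
    fd, path = tempfile.mkstemp(suffix=".gz")
    os.close(fd)
    try:
        with gzip.open(path, "wb") as handle:
            handle.write(payload.encode("utf-8"))
        return list(iter_decoded_lines(path))
    finally:
        os.remove(path)


print(demo())
```

Decoding lazily in a generator means the whole file never has to be held in memory as text, which is presumably why the original wraps the stream the same way.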

@@ -30,7 +30,7 @@
"1. [Training Models](#TrainingModels)\n",
"1. [Logging in to AzureML](#LoginAzureML)\n",
"1. [Registering the Models](#RegisterModels)\n",
"1. [Using the Fairlearn Dashboard](#LocalDashboard)\n",
"1. [Using the Fairness Dashboard](#LocalDashboard)\n",
"1. [Uploading a Fairness Dashboard to Azure](#AzureUpload)\n",
" 1. Computing Fairness Metrics\n",
" 1. Uploading to Azure\n",
@@ -48,9 +48,10 @@
"Please see the [configuration notebook](../../configuration.ipynb) for information about creating one, if required.\n",
"This notebook also requires the following packages:\n",
"* `azureml-contrib-fairness`\n",
"* `fairlearn==0.4.6` (should also work with v0.5.0)\n",
"* `fairlearn>=0.6.2` (also works for pre-v0.5.0 with slight modifications)\n",
"* `joblib`\n",
"* `shap`\n",
"* `liac-arff`\n",
"* `raiwidgets~=0.7.0`\n",
"\n",
"Fairlearn relies on features introduced in v0.22.1 of `scikit-learn`. If you have an older version already installed, please uncomment and run the following cell:"
]
@@ -88,7 +89,6 @@
"source": [
"from sklearn import svm\n",
"from sklearn.compose import ColumnTransformer\n",
"from sklearn.datasets import fetch_openml\n",
"from sklearn.impute import SimpleImputer\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.model_selection import train_test_split\n",
@@ -110,9 +110,9 @@
"metadata": {},
"outputs": [],
"source": [
"from fairness_nb_utils import fetch_openml_with_retries\n",
"from fairness_nb_utils import fetch_census_dataset\n",
"\n",
"data = fetch_openml_with_retries(data_id=1590)\n",
"data = fetch_census_dataset()\n",
" \n",
"# Extract the items we want\n",
"X_raw = data.data\n",
@@ -389,12 +389,11 @@
"metadata": {},
"outputs": [],
"source": [
"from fairlearn.widget import FairlearnDashboard\n",
"from raiwidgets import FairnessDashboard\n",
"\n",
"FairlearnDashboard(sensitive_features=A_test, \n",
" sensitive_feature_names=['Sex', 'Race'],\n",
" y_true=y_test.tolist(),\n",
" y_pred=ys_pred)"
"FairnessDashboard(sensitive_features=A_test, \n",
" y_true=y_test.tolist(),\n",
" y_pred=ys_pred)"
]
},
{
@@ -404,7 +403,7 @@
"<a id=\"AzureUpload\"></a>\n",
"## Uploading a Fairness Dashboard to Azure\n",
"\n",
"Uploading a fairness dashboard to Azure is a two stage process. The `FairlearnDashboard` invoked in the previous section relies on the underlying Python kernel to compute metrics on demand. This is obviously not available when the fairness dashboard is rendered in AzureML Studio. The required stages are therefore:\n",
"Uploading a fairness dashboard to Azure is a two stage process. The `FairnessDashboard` invoked in the previous section relies on the underlying Python kernel to compute metrics on demand. This is obviously not available when the fairness dashboard is rendered in AzureML Studio. The required stages are therefore:\n",
"1. Precompute all the required metrics\n",
"1. Upload to Azure\n",
"\n",

@@ -3,5 +3,7 @@ dependencies:
- pip:
- azureml-sdk
- azureml-contrib-fairness
- fairlearn==0.4.6
- fairlearn>=0.6.2
- joblib
- liac-arff
- raiwidgets~=0.11.0

@@ -2,7 +2,7 @@ name: azure_automl
dependencies:
# The python interpreter version.
# Currently Azure ML only supports 3.5.2 and later.
- pip==20.2.4
- pip==21.1.2
- python>=3.5.2,<3.8
- nb_conda
- boto3==1.15.18
@@ -18,12 +18,13 @@ dependencies:
- holidays==0.9.11
- pytorch::pytorch=1.4.0
- cudatoolkit=10.1.243
- tornado==6.1.0
- pip:
# Required packages for AzureML execution, history, and data preparation.
- azureml-widgets~=1.23.0
- azureml-widgets~=1.35.0
- pytorch-transformers==1.0.0
- spacy==2.1.8
- https://aka.ms/automl-resources/packages/en_core_web_sm-2.1.0.tar.gz
- -r https://automlcesdkdataresources.blob.core.windows.net/validated-requirements/1.23.0/validated_win32_requirements.txt [--no-deps]
- PyJWT < 2.0.0
- -r https://automlresources-prod.azureedge.net/validated-requirements/1.35.0/validated_win32_requirements.txt [--no-deps]
- arch==4.14

View File

@@ -2,7 +2,7 @@ name: azure_automl
dependencies:
# The python interpreter version.
# Currently Azure ML only supports 3.5.2 and later.
- pip==20.2.4
- pip==21.1.2
- python>=3.5.2,<3.8
- nb_conda
- boto3==1.15.18
@@ -18,13 +18,13 @@ dependencies:
- holidays==0.9.11
- pytorch::pytorch=1.4.0
- cudatoolkit=10.1.243
- tornado==6.1.0
- pip:
# Required packages for AzureML execution, history, and data preparation.
- azureml-widgets~=1.23.0
- azureml-widgets~=1.35.0
- pytorch-transformers==1.0.0
- spacy==2.1.8
- https://aka.ms/automl-resources/packages/en_core_web_sm-2.1.0.tar.gz
- -r https://automlcesdkdataresources.blob.core.windows.net/validated-requirements/1.23.0/validated_linux_requirements.txt [--no-deps]
- PyJWT < 2.0.0
- -r https://automlresources-prod.azureedge.net/validated-requirements/1.35.0/validated_linux_requirements.txt [--no-deps]
- arch==4.14

View File

@@ -2,7 +2,7 @@ name: azure_automl
dependencies:
# The python interpreter version.
# Currently Azure ML only supports 3.5.2 and later.
- pip==20.2.4
- pip==21.1.2
- nomkl
- python>=3.5.2,<3.8
- nb_conda
@@ -19,12 +19,13 @@ dependencies:
- holidays==0.9.11
- pytorch::pytorch=1.4.0
- cudatoolkit=9.0
- tornado==6.1.0
- pip:
# Required packages for AzureML execution, history, and data preparation.
- azureml-widgets~=1.23.0
- azureml-widgets~=1.35.0
- pytorch-transformers==1.0.0
- spacy==2.1.8
- https://aka.ms/automl-resources/packages/en_core_web_sm-2.1.0.tar.gz
- -r https://automlcesdkdataresources.blob.core.windows.net/validated-requirements/1.23.0/validated_darwin_requirements.txt [--no-deps]
- PyJWT < 2.0.0
- https://aka.ms/automl-resources/packages/en_core_web_sm-2.1.0.tar.gz
- -r https://automlresources-prod.azureedge.net/validated-requirements/1.35.0/validated_darwin_requirements.txt [--no-deps]
- arch==4.14

View File

@@ -32,6 +32,7 @@ if [ $? -ne 0 ]; then
fi
sed -i '' 's/AZUREML-SDK-VERSION/latest/' $AUTOML_ENV_FILE
brew install libomp
if source activate $CONDA_ENV_NAME 2> /dev/null
then

View File

@@ -3,7 +3,7 @@ import platform
try:
import conda
except:
except Exception:
print('Failed to import conda.')
print('This setup is usually run from the base conda environment.')
print('You can activate the base environment using the command "conda activate base"')
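
The change from a bare `except:` to `except Exception:` narrows what the handler swallows. A bare clause also catches `KeyboardInterrupt` and `SystemExit`, which derive from `BaseException` rather than `Exception`. A minimal sketch of the difference, using a hypothetical helper name:

```python
def caught_by_except_exception(exc):
    """Return True if `except Exception:` would catch `exc`."""
    try:
        raise exc
    except Exception:
        return True
    except BaseException:
        # Only a bare `except:` (or `except BaseException:`) reaches these.
        return False


print(caught_by_except_exception(ImportError("no conda")))  # True
print(caught_by_except_exception(KeyboardInterrupt()))       # False
print(caught_by_except_exception(SystemExit(1)))             # False
```

With the narrower clause, pressing Ctrl+C during the `import conda` attempt interrupts the script instead of being reported as a failed import.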

View File

@@ -86,7 +86,6 @@
"import azureml.core\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.core.workspace import Workspace\n",
"from azureml.automl.core.featurization import FeaturizationConfig\n",
"from azureml.core.dataset import Dataset\n",
"from azureml.train.automl import AutoMLConfig\n",
"from azureml.interpret import ExplanationClient"
@@ -105,7 +104,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.23.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.35.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
@@ -165,6 +164,9 @@
"source": [
"## Create or Attach existing AmlCompute\n",
"You will need to create a compute target for your AutoML run. In this tutorial, you create AmlCompute as your training compute resource.\n",
"\n",
"> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist.\n",
"\n",
"#### Creation of AmlCompute takes approximately 5 minutes. \n",
"If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
@@ -187,7 +189,7 @@
" compute_target = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n",
" print('Found existing cluster, use it.')\n",
"except ComputeTargetException:\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS12_V2',\n",
" max_nodes=6)\n",
" compute_target = ComputeTarget.create(ws, cpu_cluster_name, compute_config)\n",
"\n",
@@ -374,15 +376,6 @@
"remote_run = experiment.submit(automl_config, show_output = False)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"remote_run"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -605,27 +598,21 @@
"from azureml.automl.core.onnx_convert import OnnxConvertConstants\n",
"from azureml.train.automl import constants\n",
"\n",
"if sys.version_info < OnnxConvertConstants.OnnxIncompatiblePythonVersion:\n",
" python_version_compatible = True\n",
"else:\n",
" python_version_compatible = False\n",
"\n",
"import onnxruntime\n",
"from azureml.automl.runtime.onnx_convert import OnnxInferenceHelper\n",
"\n",
"def get_onnx_res(run):\n",
" res_path = 'onnx_resource.json'\n",
" run.download_file(name=constants.MODEL_RESOURCE_PATH_ONNX, output_file_path=res_path)\n",
" with open(res_path) as f:\n",
" onnx_res = json.load(f)\n",
" return onnx_res\n",
" result = json.load(f)\n",
" return result\n",
"\n",
"if python_version_compatible:\n",
"if sys.version_info < OnnxConvertConstants.OnnxIncompatiblePythonVersion:\n",
" test_df = test_dataset.to_pandas_dataframe()\n",
" mdl_bytes = onnx_mdl.SerializeToString()\n",
" onnx_res = get_onnx_res(best_run)\n",
" onnx_result = get_onnx_res(best_run)\n",
"\n",
" onnxrt_helper = OnnxInferenceHelper(mdl_bytes, onnx_res)\n",
" onnxrt_helper = OnnxInferenceHelper(mdl_bytes, onnx_result)\n",
" pred_onnx, pred_prob_onnx = onnxrt_helper.predict(test_df)\n",
"\n",
" print(pred_onnx)\n",
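
The hunk above gates ONNX inference directly on `sys.version_info < OnnxConvertConstants.OnnxIncompatiblePythonVersion` instead of going through an intermediate flag. The comparison relies on Python's element-wise tuple ordering. A small sketch, assuming a made-up threshold of `(3, 8)`; the real value is whatever the SDK constant holds:

```python
import sys

# Hypothetical threshold for illustration only; the real value lives in
# OnnxConvertConstants.OnnxIncompatiblePythonVersion.
INCOMPATIBLE_FROM = (3, 8)


def onnx_inference_supported(version_info=sys.version_info):
    """True when the interpreter is older than the first incompatible version."""
    # Tuples compare element by element: (3, 6, 7) < (3, 8) because 6 < 8.
    return tuple(version_info) < INCOMPATIBLE_FROM


print(onnx_inference_supported((3, 6, 7)))  # True
print(onnx_inference_supported((3, 8, 0)))  # False
```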
@@ -714,14 +701,12 @@
"source": [
"from azureml.core.model import InferenceConfig\n",
"from azureml.core.webservice import AciWebservice\n",
"from azureml.core.webservice import Webservice\n",
"from azureml.core.model import Model\n",
"from azureml.core.environment import Environment\n",
"\n",
"inference_config = InferenceConfig(entry_script=script_file_name)\n",
"inference_config = InferenceConfig(environment = best_run.get_environment(), entry_script=script_file_name)\n",
"\n",
"aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n",
" memory_gb = 1, \n",
"aciconfig = AciWebservice.deploy_configuration(cpu_cores = 2, \n",
" memory_gb = 2, \n",
" tags = {'area': \"bmData\", 'type': \"automl_classification\"}, \n",
" description = 'sample service for Automl Classification')\n",
"\n",
@@ -798,7 +783,6 @@
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"import requests\n",
"\n",
"X_test_json = X_test.to_json(orient='records')\n",
@@ -838,7 +822,6 @@
"source": [
"%matplotlib notebook\n",
"from sklearn.metrics import confusion_matrix\n",
"import numpy as np\n",
"import itertools\n",
"\n",
"cf =confusion_matrix(actual,y_pred)\n",

View File

@@ -0,0 +1,4 @@
name: auto-ml-classification-bank-marketing-all-features
dependencies:
- pip:
- azureml-sdk

View File

@@ -93,7 +93,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.23.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.35.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
@@ -127,6 +127,9 @@
"source": [
"## Create or Attach existing AmlCompute\n",
"A compute target is required to execute the Automated ML run. In this tutorial, you create AmlCompute as your training compute resource.\n",
"\n",
"> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist.\n",
"\n",
"#### Creation of AmlCompute takes approximately 5 minutes. \n",
"If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
@@ -212,7 +215,7 @@
"source": [
"automl_settings = {\n",
" \"n_cross_validations\": 3,\n",
" \"primary_metric\": 'average_precision_score_weighted',\n",
" \"primary_metric\": 'AUC_weighted',\n",
" \"enable_early_stopping\": True,\n",
" \"max_concurrent_iterations\": 2, # This is a limit for testing purpose, please increase it as per cluster size\n",
" \"experiment_timeout_hours\": 0.25, # This is a time limit for testing purposes; remove it for real use cases, since it drastically limits the ability to find the best possible model\n",
@@ -255,15 +258,6 @@
"#remote_run = AutoMLRun(experiment = experiment, run_id = '<replace with your run id>')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"remote_run"
]
},
{
"cell_type": "markdown",
"metadata": {},

View File

@@ -0,0 +1,4 @@
name: auto-ml-classification-credit-card-fraud
dependencies:
- pip:
- azureml-sdk

View File

@@ -96,7 +96,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.23.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.35.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
@@ -138,6 +138,8 @@
"## Set up a compute cluster\n",
"This section uses a user-provided compute cluster (named \"dnntext-cluster\" in this example). If a cluster with this name does not exist in the user's workspace, the below code will create a new cluster. You can choose the parameters of the cluster as mentioned in the comments.\n",
"\n",
"> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist.\n",
"\n",
"Whether you provide/select a CPU or GPU cluster, AutoML will choose the appropriate DNN for that setup - BiLSTM or BERT text featurizer will be included in the candidate featurizers on CPU and GPU respectively. If your goal is to obtain the most accurate model, we recommend you use GPU clusters since BERT featurizers usually outperform BiLSTM featurizers."
]
},
@@ -160,7 +162,7 @@
" compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)\n",
" print('Found existing cluster, use it.')\n",
"except ComputeTargetException:\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_NC6\", # CPU for BiLSTM, such as \"STANDARD_D2_V2\" \n",
" compute_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_NC6\", # CPU for BiLSTM, such as \"STANDARD_DS12_V2\" \n",
" # To use BERT (this is recommended for best performance), select a GPU such as \"STANDARD_NC6\" \n",
" # or similar GPU option\n",
" # available in your workspace\n",
@@ -281,8 +283,8 @@
"outputs": [],
"source": [
"automl_settings = {\n",
" \"experiment_timeout_minutes\": 20,\n",
" \"primary_metric\": 'accuracy',\n",
" \"experiment_timeout_minutes\": 30,\n",
" \"primary_metric\": 'AUC_weighted',\n",
" \"max_concurrent_iterations\": num_nodes, \n",
" \"max_cores_per_iteration\": -1,\n",
" \"enable_dnn\": True,\n",
@@ -319,15 +321,6 @@
"automl_run = experiment.submit(automl_config, show_output=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"automl_run"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -494,7 +487,7 @@
"outputs": [],
"source": [
"test_run = run_inference(test_experiment, compute_target, script_folder, best_dnn_run,\n",
" train_dataset, test_dataset, target_column_name, model_name)"
" test_dataset, target_column_name, model_name)"
]
},
{

View File

@@ -0,0 +1,4 @@
name: auto-ml-classification-text-dnn
dependencies:
- pip:
- azureml-sdk

View File

@@ -5,7 +5,7 @@ from azureml.core.run import Run
def run_inference(test_experiment, compute_target, script_folder, train_run,
train_dataset, test_dataset, target_column_name, model_name):
test_dataset, target_column_name, model_name):
inference_env = train_run.get_environment()
@@ -16,7 +16,6 @@ def run_inference(test_experiment, compute_target, script_folder, train_run,
'--model_name': model_name
},
inputs=[
train_dataset.as_named_input('train_data'),
test_dataset.as_named_input('test_data')
],
compute_target=compute_target,

View File

@@ -1,5 +1,6 @@
import argparse
import pandas as pd
import numpy as np
from sklearn.externals import joblib
@@ -32,22 +33,21 @@ model = joblib.load(model_path)
run = Run.get_context()
# get input dataset by name
test_dataset = run.input_datasets['test_data']
train_dataset = run.input_datasets['train_data']
X_test_df = test_dataset.drop_columns(columns=[target_column_name]) \
.to_pandas_dataframe()
y_test_df = test_dataset.with_timestamp_columns(None) \
.keep_columns(columns=[target_column_name]) \
.to_pandas_dataframe()
y_train_df = test_dataset.with_timestamp_columns(None) \
.keep_columns(columns=[target_column_name]) \
.to_pandas_dataframe()
predicted = model.predict_proba(X_test_df)
if isinstance(predicted, pd.DataFrame):
predicted = predicted.values
# Use the AutoML scoring module
class_labels = np.unique(np.concatenate((y_train_df.values, y_test_df.values)))
train_labels = model.classes_
class_labels = np.unique(np.concatenate((y_test_df.values, np.reshape(train_labels, (-1, 1)))))
classification_metrics = list(constants.CLASSIFICATION_SCALAR_SET)
scores = scoring.score_classification(y_test_df.values, predicted,
classification_metrics,
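
The scoring script now builds `class_labels` from the labels the fitted model already knows (`model.classes_`) together with the labels in the test split, so the training dataset no longer has to be passed in. The same union can be sketched in plain Python; `union_of_labels` and the sample label lists are hypothetical, and `sorted(set(...))` stands in for `np.unique`, which also returns sorted unique values.

```python
def union_of_labels(train_labels, test_labels):
    """Sorted, de-duplicated union of the labels seen in training and test."""
    return sorted(set(train_labels) | set(test_labels))


# Labels the fitted model knows about (what `model.classes_` would hold)
train_labels = [0, 1]
# Labels in the test split, including one the model never saw
test_labels = [1, 1, 2]

print(union_of_labels(train_labels, test_labels))  # [0, 1, 2]
```

Including the test labels keeps a class that appears only in the test data from being dropped from the metric computation.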

View File

@@ -81,7 +81,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.23.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.35.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
@@ -141,6 +141,9 @@
"#### Create or Attach existing AmlCompute\n",
"\n",
"You will need to create a compute target for your AutoML run. In this tutorial, you create AmlCompute as your training compute resource.\n",
"\n",
"> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist.\n",
"\n",
"#### Creation of AmlCompute takes approximately 5 minutes. \n",
"If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
@@ -163,7 +166,7 @@
" compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)\n",
" print('Found existing cluster, use it.')\n",
"except ComputeTargetException:\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS12_V2',\n",
" max_nodes=4)\n",
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)\n",
"\n",
@@ -345,7 +348,7 @@
" \"iteration_timeout_minutes\": 10,\n",
" \"experiment_timeout_hours\": 0.25,\n",
" \"n_cross_validations\": 3,\n",
" \"primary_metric\": 'r2_score',\n",
" \"primary_metric\": 'normalized_root_mean_squared_error',\n",
" \"max_concurrent_iterations\": 3,\n",
" \"max_cores_per_iteration\": -1,\n",
" \"verbosity\": logging.INFO,\n",
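
The hunk above switches the primary metric from `r2_score` to `normalized_root_mean_squared_error`. As a rough sketch of what that metric measures, assuming the common definition of RMSE divided by the range of the true values (the AutoML implementation may differ in detail, and `normalized_rmse` is a hypothetical helper):

```python
import math


def normalized_rmse(y_true, y_pred):
    """RMSE divided by the range (max - min) of the true values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    value_range = max(y_true) - min(y_true)
    if value_range == 0:
        raise ValueError("range of y_true is zero; NRMSE is undefined")
    return math.sqrt(mse) / value_range


# Made-up values for illustration
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 37.0]
print(round(normalized_rmse(y_true, y_pred), 4))
```

Unlike `r2_score`, lower values are better here, and the normalization makes scores comparable across targets with different scales.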

View File

@@ -0,0 +1,4 @@
name: auto-ml-continuous-retraining
dependencies:
- pip:
- azureml-sdk

View File

@@ -31,7 +31,7 @@ try:
model = Model(ws, args.model_name)
last_train_time = model.created_time
print("Model was last trained on {0}.".format(last_train_time))
except Exception as e:
except Exception:
print("Could not get last model train time.")
last_train_time = datetime.min.replace(tzinfo=pytz.UTC)

View File

@@ -49,22 +49,24 @@ print("Argument 1(ds_name): %s" % args.ds_name)
dstor = ws.get_default_datastore()
register_dataset = False
end_time = datetime.utcnow()
try:
ds = Dataset.get_by_name(ws, args.ds_name)
end_time_last_slice = ds.data_changed_time.replace(tzinfo=None)
print("Dataset {0} last updated on {1}".format(args.ds_name,
end_time_last_slice))
except Exception as e:
except Exception:
print(traceback.format_exc())
print("Dataset with name {0} not found, registering new dataset.".format(args.ds_name))
register_dataset = True
end_time_last_slice = datetime.today() - relativedelta(weeks=2)
end_time = datetime(2021, 5, 1, 0, 0)
end_time_last_slice = end_time - relativedelta(weeks=2)
end_time = datetime.utcnow()
train_df = get_noaa_data(end_time_last_slice, end_time)
if train_df.size > 0:
print("Received {0} rows of new data after {0}.".format(
print("Received {0} rows of new data after {1}.".format(
train_df.shape[0], end_time_last_slice))
folder_name = "{}/{:04d}/{:02d}/{:02d}/{:02d}/{:02d}/{:02d}".format(args.ds_name, end_time.year,
end_time.month, end_time.day,
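
Two things change in the hunk above: the log message placeholder is corrected from `{0}` to `{1}`, and the fallback window for a new dataset is anchored to a fixed end time. The sketch below illustrates both with made-up values; `timedelta(weeks=2)` stands in for `relativedelta(weeks=2)` and the row count is hypothetical.

```python
from datetime import datetime, timedelta

end_time = datetime(2021, 5, 1, 0, 0)
# timedelta stands in for dateutil's relativedelta(weeks=2) here
end_time_last_slice = end_time - timedelta(weeks=2)
rows = 336  # hypothetical row count

# Old message: both placeholders are {0}, so the row count is printed twice
old = "Received {0} rows of new data after {0}.".format(rows, end_time_last_slice)
# Fixed message: {1} picks up the second argument, the slice start time
new = "Received {0} rows of new data after {1}.".format(rows, end_time_last_slice)

print(old)  # Received 336 rows of new data after 336.
print(new)  # Received 336 rows of new data after 2021-04-17 00:00:00.
```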

View File

@@ -5,7 +5,7 @@ set options=%3
set PIP_NO_WARN_SCRIPT_LOCATION=0
IF "%conda_env_name%"=="" SET conda_env_name="azure_automl_experimental"
IF "%automl_env_file%"=="" SET automl_env_file="automl_env.yml"
IF "%automl_env_file%"=="" SET automl_env_file="automl_thin_client_env.yml"
IF NOT EXIST %automl_env_file% GOTO YmlMissing

View File

@@ -12,7 +12,7 @@ fi
if [ "$AUTOML_ENV_FILE" == "" ]
then
AUTOML_ENV_FILE="automl_env.yml"
AUTOML_ENV_FILE="automl_thin_client_env.yml"
fi
if [ ! -f $AUTOML_ENV_FILE ]; then

View File

@@ -12,7 +12,7 @@ fi
if [ "$AUTOML_ENV_FILE" == "" ]
then
AUTOML_ENV_FILE="automl_env.yml"
AUTOML_ENV_FILE="automl_thin_client_env_mac.yml"
fi
if [ ! -f $AUTOML_ENV_FILE ]; then

View File

@@ -7,6 +7,8 @@ dependencies:
- nb_conda
- cython
- urllib3<1.24
- PyJWT < 2.0.0
- numpy==1.18.5
- pip:
# Required packages for AzureML execution, history, and data preparation.
@@ -14,4 +16,3 @@ dependencies:
- azureml-sdk
- azureml-widgets
- pandas
- PyJWT < 2.0.0

View File

@@ -8,6 +8,8 @@ dependencies:
- nb_conda
- cython
- urllib3<1.24
- PyJWT < 2.0.0
- numpy==1.18.5
- pip:
# Required packages for AzureML execution, history, and data preparation.
@@ -15,4 +17,3 @@ dependencies:
- azureml-sdk
- azureml-widgets
- pandas
- PyJWT < 2.0.0

View File

@@ -0,0 +1,420 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/experimental/classification-credit-card-fraud/auto-ml-classification-credit-card-fraud.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Automated Machine Learning\n",
"_**Classification of credit card fraudulent transactions on local managed compute**_\n",
"\n",
"## Contents\n",
"1. [Introduction](#Introduction)\n",
"1. [Setup](#Setup)\n",
"1. [Train](#Train)\n",
"1. [Results](#Results)\n",
"1. [Test](#Test)\n",
"1. [Acknowledgements](#Acknowledgements)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"\n",
"In this example we use the associated credit card dataset to showcase how you can use AutoML for a simple classification problem. The goal is to predict if a credit card transaction is considered a fraudulent charge.\n",
"\n",
"This notebook is using local managed compute to train the model.\n",
"\n",
"If you are using an Azure Machine Learning Compute Instance, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) notebook first if you haven't already to establish your connection to the AzureML Workspace. \n",
"\n",
"In this notebook you will learn how to:\n",
"1. Create an experiment using an existing workspace.\n",
"2. Configure AutoML using `AutoMLConfig`.\n",
"3. Train the model using local managed compute.\n",
"4. Explore the results.\n",
"5. Test the fitted model."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"As part of the setup you have already created an Azure ML `Workspace` object. For Automated ML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"\n",
"import pandas as pd\n",
"\n",
"import azureml.core\n",
"from azureml.core.compute_target import LocalTarget\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.core.workspace import Workspace\n",
"from azureml.core.dataset import Dataset\n",
"from azureml.train.automl import AutoMLConfig"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This sample notebook may use features that are not available in previous versions of the Azure ML SDK."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.35.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"\n",
"# choose a name for experiment\n",
"experiment_name = 'automl-local-managed'\n",
"\n",
"experiment=Experiment(ws, experiment_name)\n",
"\n",
"output = {}\n",
"output['Subscription ID'] = ws.subscription_id\n",
"output['Workspace'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n",
"output['Experiment Name'] = experiment.name\n",
"pd.set_option('display.max_colwidth', -1)\n",
"outputDf = pd.DataFrame(data = output, index = [''])\n",
"outputDf.T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Determine if local docker is configured for Linux images\n",
"\n",
"Local managed runs submit the run to a Linux Docker container, so Docker must be configured to use Linux containers."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Check if Docker is installed and Linux containers are enabled\n",
"import subprocess\n",
"from subprocess import CalledProcessError\n",
"try:\n",
" assert subprocess.run(\"docker -v\", shell=True).returncode == 0, 'Local Managed runs require docker to be installed.'\n",
" out = subprocess.check_output(\"docker system info\", shell=True).decode('ascii')\n",
" assert \"OSType: linux\" in out, 'Docker engine needs to be configured to use Linux containers.' \\\n",
" 'https://docs.docker.com/docker-for-windows/#switch-between-windows-and-linux-containers'\n",
"except CalledProcessError as ex:\n",
" raise Exception('Local Managed runs require docker to be installed.') from ex"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load Data\n",
"\n",
"Load the credit card dataset from a csv file containing both training features and labels. The features are inputs to the model, while the training labels represent the expected output of the model. Next, we'll split the data using random_split and extract the training data for the model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/creditcard.csv\"\n",
"dataset = Dataset.Tabular.from_delimited_files(data)\n",
"training_data, validation_data = dataset.random_split(percentage=0.8, seed=223)\n",
"label_column_name = 'Class'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train\n",
"\n",
"Instantiate an `AutoMLConfig` object. This defines the settings and data used to run the experiment.\n",
"\n",
"|Property|Description|\n",
"|-|-|\n",
"|**task**|classification or regression|\n",
"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i>|\n",
"|**enable_early_stopping**|Stop the run if the metric score is not showing improvement.|\n",
"|**n_cross_validations**|Number of cross validation splits.|\n",
"|**training_data**|Input dataset, containing both features and label column.|\n",
"|**label_column_name**|The name of the label column.|\n",
"|**enable_local_managed**|Enable the experimental local-managed scenario.|\n",
"\n",
"**_You can find more information about primary metrics_** [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#primary-metric)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"automl_settings = {\n",
" \"n_cross_validations\": 3,\n",
" \"primary_metric\": 'average_precision_score_weighted',\n",
" \"enable_early_stopping\": True,\n",
" \"experiment_timeout_hours\": 0.3, #for real scenarios we recommend a timeout of at least one hour \n",
" \"verbosity\": logging.INFO,\n",
"}\n",
"\n",
"automl_config = AutoMLConfig(task = 'classification',\n",
" debug_log = 'automl_errors.log',\n",
" compute_target = LocalTarget(),\n",
" enable_local_managed = True,\n",
" training_data = training_data,\n",
" label_column_name = label_column_name,\n",
" **automl_settings\n",
" )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Call the `submit` method on the experiment object and pass the run configuration. Depending on the data and the number of iterations, this can run for a while. With `show_output=True`, validation errors and the current status are shown and execution is synchronous."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"parent_run = experiment.submit(automl_config, show_output = True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# If you need to retrieve a run that already started, use the following code\n",
"#from azureml.train.automl.run import AutoMLRun\n",
"#parent_run = AutoMLRun(experiment = experiment, run_id = '<replace with your run id>')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"parent_run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Results"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Explain model\n",
"\n",
"Automated ML models can be explained and visualized using the SDK Explainability library. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Analyze results\n",
"\n",
"### Retrieve the Best Child Run\n",
"\n",
"Below we select the best pipeline from our iterations. The `get_best_child` method returns the best run. Overloads on `get_best_child` allow you to retrieve the best run for *any* logged metric."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"best_run = parent_run.get_best_child()\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Test the fitted model\n",
"\n",
"Now that the model is trained, split the data in the same way it was split for training (the difference is that here the data is split locally), then run the test data through the trained model to get the predicted values."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X_test_df = validation_data.drop_columns(columns=[label_column_name])\n",
"y_test_df = validation_data.keep_columns(columns=[label_column_name], validate=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Creating ModelProxy for submitting prediction runs to the training environment.\n",
"We will create a ModelProxy for the best child run, which will allow us to submit a run that does the prediction in the training environment. Unlike the local client, which can have different versions of some libraries, the training environment will have all the compatible libraries for the model already."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.automl.model_proxy import ModelProxy\n",
"best_model_proxy = ModelProxy(best_run)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# call the predict functions on the model proxy\n",
"y_pred = best_model_proxy.predict(X_test_df).to_pandas_dataframe()\n",
"y_pred"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Acknowledgements"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This Credit Card Fraud Detection dataset is made available under the Open Database License: http://opendatacommons.org/licenses/odbl/1.0/. Any rights in individual contents of the database are licensed under the Database Contents License: http://opendatacommons.org/licenses/dbcl/1.0/. The dataset is available at: https://www.kaggle.com/mlg-ulb/creditcardfraud\n",
"\n",
"\n",
"The dataset has been collected and analysed during a research collaboration of Worldline and the Machine Learning Group (http://mlg.ulb.ac.be) of ULB (Universit\u00e9 Libre de Bruxelles) on big data mining and fraud detection. More details on current and past projects on related topics are available on https://www.researchgate.net/project/Fraud-detection-5 and the page of the DefeatFraud project\n",
"Please cite the following works: \n",
"\u2022\tAndrea Dal Pozzolo, Olivier Caelen, Reid A. Johnson and Gianluca Bontempi. Calibrating Probability with Undersampling for Unbalanced Classification. In Symposium on Computational Intelligence and Data Mining (CIDM), IEEE, 2015\n",
"\u2022\tDal Pozzolo, Andrea; Caelen, Olivier; Le Borgne, Yann-Ael; Waterschoot, Serge; Bontempi, Gianluca. Learned lessons in credit card fraud detection from a practitioner perspective, Expert systems with applications,41,10,4915-4928,2014, Pergamon\n",
"\u2022\tDal Pozzolo, Andrea; Boracchi, Giacomo; Caelen, Olivier; Alippi, Cesare; Bontempi, Gianluca. Credit card fraud detection: a realistic modeling and a novel learning strategy, IEEE transactions on neural networks and learning systems,29,8,3784-3797,2018,IEEE\n",
"o\tDal Pozzolo, Andrea Adaptive Machine learning for credit card fraud detection ULB MLG PhD thesis (supervised by G. Bontempi)\n",
"\u2022\tCarcillo, Fabrizio; Dal Pozzolo, Andrea; Le Borgne, Yann-A\u00ebl; Caelen, Olivier; Mazzer, Yannis; Bontempi, Gianluca. Scarff: a scalable framework for streaming credit card fraud detection with Spark, Information fusion,41, 182-194,2018,Elsevier\n",
"\u2022\tCarcillo, Fabrizio; Le Borgne, Yann-A\u00ebl; Caelen, Olivier; Bontempi, Gianluca. Streaming active learning strategies for real-life credit card fraud detection: assessment and visualization, International Journal of Data Science and Analytics, 5,4,285-300,2018,Springer International Publishing"
]
}
],
"metadata": {
"authors": [
{
"name": "sekrupa"
}
],
"category": "tutorial",
"compute": [
"AML Compute"
],
"datasets": [
"Creditcard"
],
"deployment": [
"None"
],
"exclude_from_index": false,
"file_extension": ".py",
"framework": [
"None"
],
"friendly_name": "Classification of credit card fraudulent transactions using Automated ML",
"index_order": 5,
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
},
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"tags": [
"AutomatedML"
],
"task": "Classification",
"version": "3.6.7"
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -0,0 +1,4 @@
name: auto-ml-classification-credit-card-fraud-local-managed
dependencies:
- pip:
- azureml-sdk

View File

@@ -39,6 +39,7 @@
"source": [
"## Introduction\n",
"In this example we use an experimental feature, Model Proxy, to do a predict on the best generated model without downloading the model locally. The prediction will happen on same compute and environment that was used to train the model. This feature is currently in the experimental state, which means that the API is prone to changing, please make sure to run on the latest version of this notebook if you face any issues.\n",
"This notebook will also leverage MLFlow for saving models, allowing for more portability of the resulting models. See https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-mlflow for more details around MLFlow is AzureML.\n",
"\n",
"If you are using an Azure Machine Learning Compute Instance, you are all set. Otherwise, go through the [configuration](../../../../configuration.ipynb) notebook first if you haven't already to establish your connection to the AzureML Workspace. \n",
"\n",
@@ -90,7 +91,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.23.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.35.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
@@ -142,7 +143,7 @@
" compute_target = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n",
" print('Found existing cluster, use it.')\n",
"except ComputeTargetException:\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS12_V2',\n",
" max_nodes=4)\n",
" compute_target = ComputeTarget.create(ws, cpu_cluster_name, compute_config)\n",
"\n",
@@ -194,7 +195,6 @@
"|**n_cross_validations**|Number of cross validation splits.|\n",
"|**training_data**|(sparse) array-like, shape = [n_samples, n_features]|\n",
"|**label_column_name**|(sparse) array-like, shape = [n_samples, ], targets values.|\n",
"|**scenario**|We need to set this parameter to 'Latest' to enable some experimental features. This parameter should not be set outside of this experimental notebook.|\n",
"\n",
"**_You can find more information about primary metrics_** [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#primary-metric)"
]
@@ -213,17 +213,17 @@
" \"n_cross_validations\": 3,\n",
" \"primary_metric\": 'r2_score',\n",
" \"enable_early_stopping\": True, \n",
" \"experiment_timeout_hours\": 0.3, #for real scenarios we reccommend a timeout of at least one hour \n",
" \"experiment_timeout_hours\": 0.3, #for real scenarios we recommend a timeout of at least one hour \n",
" \"max_concurrent_iterations\": 4,\n",
" \"max_cores_per_iteration\": -1,\n",
" \"verbosity\": logging.INFO,\n",
" \"save_mlflow\": True,\n",
"}\n",
"\n",
"automl_config = AutoMLConfig(task = 'regression',\n",
" compute_target = compute_target,\n",
" training_data = train_data,\n",
" label_column_name = label,\n",
" scenario='Latest',\n",
" **automl_settings\n",
" )"
]

View File

@@ -0,0 +1,4 @@
name: auto-ml-regression-model-proxy
dependencies:
- pip:
- azureml-sdk

View File

@@ -113,7 +113,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.23.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.35.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
@@ -162,7 +162,9 @@
},
"source": [
"### Using AmlCompute\n",
"You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for your AutoML run. In this tutorial, you use `AmlCompute` as your training compute resource."
"You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for your AutoML run. In this tutorial, you use `AmlCompute` as your training compute resource.\n",
"\n",
"> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist."
]
},
{
@@ -185,7 +187,7 @@
" compute_target = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n",
" print('Found existing cluster, use it.')\n",
"except ComputeTargetException:\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS12_V2',\n",
" max_nodes=4)\n",
" compute_target = ComputeTarget.create(ws, cpu_cluster_name, compute_config)\n",
"\n",
@@ -365,10 +367,13 @@
"source": [
"from azureml.automl.core.forecasting_parameters import ForecastingParameters\n",
"forecasting_parameters = ForecastingParameters(\n",
" time_column_name=time_column_name, forecast_horizon=forecast_horizon\n",
" time_column_name=time_column_name,\n",
" forecast_horizon=forecast_horizon,\n",
" freq='MS' # Set the forecast frequency to be monthly (start of the month)\n",
")\n",
"\n",
"automl_config = AutoMLConfig(task='forecasting', \n",
"# We will disable the enable_early_stopping flag to ensure the DNN model is recommended for demonstration purpose.\n",
"automl_config = AutoMLConfig(task='forecasting',\n",
" primary_metric='normalized_root_mean_squared_error',\n",
" experiment_timeout_hours = 1,\n",
" training_data=train_dataset,\n",
@@ -379,6 +384,7 @@
" max_concurrent_iterations=4,\n",
" max_cores_per_iteration=-1,\n",
" enable_dnn=True,\n",
" enable_early_stopping=False,\n",
" forecasting_parameters=forecasting_parameters)"
]
},
@@ -401,8 +407,7 @@
},
"outputs": [],
"source": [
"remote_run = experiment.submit(automl_config, show_output= False)\n",
"remote_run"
"remote_run = experiment.submit(automl_config, show_output= True)"
]
},
{
@@ -419,15 +424,6 @@
"# remote_run = AutoMLRun(experiment = experiment, run_id = '<replace with your run id>')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"remote_run.wait_for_completion()"
]
},
{
"cell_type": "markdown",
"metadata": {
@@ -668,7 +664,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
"version": "3.6.9"
}
},
"nbformat": 4,

View File

@@ -0,0 +1,4 @@
name: auto-ml-forecasting-beer-remote
dependencies:
- pip:
- azureml-sdk

View File

@@ -71,7 +71,8 @@
"\n",
"from azureml.core import Workspace, Experiment, Dataset\n",
"from azureml.train.automl import AutoMLConfig\n",
"from datetime import datetime"
"from datetime import datetime\n",
"from azureml.automl.core.featurization import FeaturizationConfig"
]
},
{
@@ -87,7 +88,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.23.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.35.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
@@ -129,6 +130,9 @@
"source": [
"## Compute\n",
"You will need to create a [compute target](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute) for your AutoML run. In this tutorial, you create AmlCompute as your training compute resource.\n",
"\n",
"> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist.\n",
"\n",
"#### Creation of AmlCompute takes approximately 5 minutes. \n",
"If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
@@ -151,7 +155,7 @@
" compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)\n",
" print('Found existing cluster, use it.')\n",
"except ComputeTargetException:\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS12_V2',\n",
" max_nodes=4)\n",
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)\n",
"\n",
@@ -300,6 +304,25 @@
"forecast_horizon = 14"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Convert prediction type to integer\n",
"The featurization configuration can be used to change the default prediction type from decimal numbers to integer. This customization can be used in the scenario when the target column is expected to contain whole values as the number of rented bikes per day."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"featurization_config = FeaturizationConfig()\n",
"# Force the target column, to be integer type.\n",
"featurization_config.add_prediction_transform_type('Integer')"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -318,11 +341,13 @@
" time_column_name=time_column_name,\n",
" forecast_horizon=forecast_horizon,\n",
" country_or_region_for_holidays='US', # set country_or_region will trigger holiday featurizer\n",
" target_lags='auto' # use heuristic based lag setting \n",
" target_lags='auto', # use heuristic based lag setting\n",
" freq='D' # Set the forecast frequency to be daily\n",
")\n",
"\n",
"automl_config = AutoMLConfig(task='forecasting', \n",
" primary_metric='normalized_root_mean_squared_error',\n",
" featurization=featurization_config,\n",
" blocked_models = ['ExtremeRandomTrees'], \n",
" experiment_timeout_hours=0.3,\n",
" training_data=train,\n",
@@ -349,8 +374,7 @@
"metadata": {},
"outputs": [],
"source": [
"remote_run = experiment.submit(automl_config, show_output=False)\n",
"remote_run"
"remote_run = experiment.submit(automl_config, show_output=False)"
]
},
{
@@ -504,7 +528,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Download the prediction result for metrics calcuation\n",
"### Download the prediction result for metrics calculation\n",
"The test data with predictions are saved in artifact outputs/predictions.csv. You can download it and calculation some error metrics for the forecasts and vizualize the predictions vs. the actuals."
]
},

View File

@@ -0,0 +1,4 @@
name: auto-ml-forecasting-bike-share
dependencies:
- pip:
- azureml-sdk

View File

@@ -24,10 +24,11 @@
"_**Forecasting using the Energy Demand Dataset**_\n",
"\n",
"## Contents\n",
"1. [Introduction](#Introduction)\n",
"1. [Setup](#Setup)\n",
"1. [Data and Forecasting Configurations](#Data)\n",
"1. [Train](#Train)\n",
"1. [Introduction](#introduction)\n",
"1. [Setup](#setup)\n",
"1. [Data and Forecasting Configurations](#data)\n",
"1. [Train](#train)\n",
"1. [Generate and Evaluate the Forecast](#forecast)\n",
"\n",
"Advanced Forecasting\n",
"1. [Advanced Training](#advanced_training)\n",
@@ -38,7 +39,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"# Introduction<a id=\"introduction\"></a>\n",
"\n",
"In this example we use the associated New York City energy demand dataset to showcase how you can use AutoML for a simple forecasting problem and explore the results. The goal is predict the energy demand for the next 48 hours based on historic time-series data.\n",
"\n",
@@ -49,15 +50,16 @@
"1. Configure AutoML using 'AutoMLConfig'\n",
"1. Train the model using AmlCompute\n",
"1. Explore the engineered features and results\n",
"1. Generate the forecast and compute the out-of-sample accuracy metrics\n",
"1. Configuration and remote run of AutoML for a time-series model with lag and rolling window features\n",
"1. Run and explore the forecast"
"1. Run and explore the forecast with lagging features"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup"
"# Setup<a id=\"setup\"></a>"
]
},
{
@@ -97,7 +99,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.23.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.35.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
@@ -177,7 +179,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Data\n",
"# Data<a id=\"data\"></a>\n",
"\n",
"We will use energy consumption [data from New York City](http://mis.nyiso.com/public/P-58Blist.htm) for model training. The data is stored in a tabular format and includes energy demand and basic weather data at an hourly frequency. \n",
"\n",
@@ -309,7 +311,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train\n",
"# Train<a id=\"train\"></a>\n",
"\n",
"Instantiate an AutoMLConfig object. This config defines the settings and data used to run the experiment. We can provide extra configurations within 'automl_settings', for this forecasting task we add the forecasting parameters to hold all the additional forecasting parameters.\n",
"\n",
@@ -342,7 +344,9 @@
"source": [
"from azureml.automl.core.forecasting_parameters import ForecastingParameters\n",
"forecasting_parameters = ForecastingParameters(\n",
" time_column_name=time_column_name, forecast_horizon=forecast_horizon\n",
" time_column_name=time_column_name,\n",
" forecast_horizon=forecast_horizon,\n",
" freq='H' # Set the forecast frequency to be hourly\n",
")\n",
"\n",
"automl_config = AutoMLConfig(task='forecasting', \n",
@@ -375,15 +379,6 @@
"remote_run = experiment.submit(automl_config, show_output=False)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"remote_run"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -458,9 +453,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Forecasting\n",
"# Forecasting<a id=\"forecast\"></a>\n",
"\n",
"Now that we have retrieved the best pipeline/model, it can be used to make predictions on test data. First, we remove the target values from the test set:"
"Now that we have retrieved the best pipeline/model, it can be used to make predictions on test data. We will do batch scoring on the test dataset which should have the same schema as training dataset.\n",
"\n",
"The inference will run on a remote compute. In this example, it will re-use the training compute."
]
},
{
@@ -469,16 +466,15 @@
"metadata": {},
"outputs": [],
"source": [
"X_test = test.to_pandas_dataframe().reset_index(drop=True)\n",
"y_test = X_test.pop(target_column_name).values"
"test_experiment = Experiment(ws, experiment_name + \"_inference\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Forecast Function\n",
"For forecasting, we will use the forecast function instead of the predict function. Using the predict method would result in getting predictions for EVERY horizon the forecaster can predict at. This is useful when training and evaluating the performance of the forecaster at various horizons, but the level of detail is excessive for normal use. Forecast function also can handle more complicated scenarios, see the [forecast function notebook](../forecasting-forecast-function/auto-ml-forecasting-function.ipynb)."
"### Retreiving forecasts from the model\n",
"We have created a function called `run_forecast` that submits the test data to the best model determined during the training run and retrieves forecasts. This function uses a helper script `forecasting_script` which is uploaded and expecuted on the remote compute."
]
},
{
@@ -487,10 +483,16 @@
"metadata": {},
"outputs": [],
"source": [
"# The featurized data, aligned to y, will also be returned.\n",
"# This contains the assumptions that were made in the forecast\n",
"# and helps align the forecast to the original data\n",
"y_predictions, X_trans = fitted_model.forecast(X_test)"
"from run_forecast import run_remote_inference\n",
"remote_run_infer = run_remote_inference(test_experiment=test_experiment,\n",
" compute_target=compute_target,\n",
" train_run=best_run,\n",
" test_dataset=test,\n",
" target_column_name=target_column_name)\n",
"remote_run_infer.wait_for_completion(show_output=False)\n",
"\n",
"# download the inference output file to the local machine\n",
"remote_run_infer.download_file('outputs/predictions.csv', 'predictions.csv')"
]
},
{
@@ -498,9 +500,7 @@
"metadata": {},
"source": [
"### Evaluate\n",
"To evaluate the accuracy of the forecast, we'll compare against the actual sales quantities for some select metrics, included the mean absolute percentage error (MAPE). For more metrics that can be used for evaluation after training, please see [supported metrics](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-understand-automated-ml#regressionforecasting-metrics), and [how to calculate residuals](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-understand-automated-ml#residuals).\n",
"\n",
"It is a good practice to always align the output explicitly to the input, as the count and order of the rows may have changed during transformations that span multiple rows."
"To evaluate the accuracy of the forecast, we'll compare against the actual sales quantities for some select metrics, included the mean absolute percentage error (MAPE). For more metrics that can be used for evaluation after training, please see [supported metrics](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-understand-automated-ml#regressionforecasting-metrics), and [how to calculate residuals](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-understand-automated-ml#residuals)."
]
},
{
@@ -509,9 +509,9 @@
"metadata": {},
"outputs": [],
"source": [
"from forecasting_helper import align_outputs\n",
"\n",
"df_all = align_outputs(y_predictions, X_trans, X_test, y_test, target_column_name)"
"# load forecast data frame\n",
"fcst_df = pd.read_csv('predictions.csv', parse_dates=[time_column_name])\n",
"fcst_df.head()"
]
},
{
@@ -526,8 +526,8 @@
"\n",
"# use automl metrics module\n",
"scores = scoring.score_regression(\n",
" y_test=df_all[target_column_name],\n",
" y_pred=df_all['predicted'],\n",
" y_test=fcst_df[target_column_name],\n",
" y_pred=fcst_df['predicted'],\n",
" metrics=list(constants.Metric.SCALAR_REGRESSION_SET))\n",
"\n",
"print(\"[Test data scores]\\n\")\n",
@@ -536,8 +536,8 @@
" \n",
"# Plot outputs\n",
"%matplotlib inline\n",
"test_pred = plt.scatter(df_all[target_column_name], df_all['predicted'], color='b')\n",
"test_test = plt.scatter(df_all[target_column_name], df_all[target_column_name], color='g')\n",
"test_pred = plt.scatter(fcst_df[target_column_name], fcst_df['predicted'], color='b')\n",
"test_test = plt.scatter(fcst_df[target_column_name], fcst_df[target_column_name], color='g')\n",
"plt.legend((test_pred, test_test), ('prediction', 'truth'), loc='upper left', fontsize=8)\n",
"plt.show()"
]
@@ -546,23 +546,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Looking at `X_trans` is also useful to see what featurization happened to the data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X_trans"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Advanced Training <a id=\"advanced_training\"></a>\n",
"# Advanced Training <a id=\"advanced_training\"></a>\n",
"We did not use lags in the previous model specification. In effect, the prediction was the result of a simple regression on date, time series identifier columns and any additional features. This is often a very good prediction as common time series patterns like seasonality and trends can be captured in this manner. Such simple regression is horizon-less: it doesn't matter how far into the future we are predicting, because we are not using past data. In the previous example, the horizon was only used to split the data for cross-validation."
]
},
@@ -645,7 +629,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Advanced Results<a id=\"advanced_results\"></a>\n",
"# Advanced Results<a id=\"advanced_results\"></a>\n",
"We did not use lags in the previous model specification. In effect, the prediction was the result of a simple regression on date, time series identifier columns and any additional features. This is often a very good prediction as common time series patterns like seasonality and trends can be captured in this manner. Such simple regression is horizon-less: it doesn't matter how far into the future we are predicting, because we are not using past data. In the previous example, the horizon was only used to split the data for cross-validation."
]
},
@@ -655,10 +639,17 @@
"metadata": {},
"outputs": [],
"source": [
"# The featurized data, aligned to y, will also be returned.\n",
"# This contains the assumptions that were made in the forecast\n",
"# and helps align the forecast to the original data\n",
"y_predictions, X_trans = fitted_model_lags.forecast(X_test)"
"test_experiment_advanced = Experiment(ws, experiment_name + \"_inference_advanced\")\n",
"advanced_remote_run_infer = run_remote_inference(test_experiment=test_experiment_advanced,\n",
" compute_target=compute_target,\n",
" train_run=best_run_lags,\n",
" test_dataset=test,\n",
" target_column_name=target_column_name,\n",
" inference_folder='./forecast_advanced')\n",
"advanced_remote_run_infer.wait_for_completion(show_output=False)\n",
"\n",
"# download the inference output file to the local machine\n",
"advanced_remote_run_infer.download_file('outputs/predictions.csv', 'predictions_advanced.csv')"
]
},
{
@@ -667,9 +658,8 @@
"metadata": {},
"outputs": [],
"source": [
"from forecasting_helper import align_outputs\n",
"\n",
"df_all = align_outputs(y_predictions, X_trans, X_test, y_test, target_column_name)"
"fcst_adv_df = pd.read_csv('predictions_advanced.csv', parse_dates=[time_column_name])\n",
"fcst_adv_df.head()"
]
},
{
@@ -684,8 +674,8 @@
"\n",
"# use automl metrics module\n",
"scores = scoring.score_regression(\n",
" y_test=df_all[target_column_name],\n",
" y_pred=df_all['predicted'],\n",
" y_test=fcst_adv_df[target_column_name],\n",
" y_pred=fcst_adv_df['predicted'],\n",
" metrics=list(constants.Metric.SCALAR_REGRESSION_SET))\n",
"\n",
"print(\"[Test data scores]\\n\")\n",
@@ -694,8 +684,8 @@
" \n",
"# Plot outputs\n",
"%matplotlib inline\n",
"test_pred = plt.scatter(df_all[target_column_name], df_all['predicted'], color='b')\n",
"test_test = plt.scatter(df_all[target_column_name], df_all[target_column_name], color='g')\n",
"test_pred = plt.scatter(fcst_adv_df[target_column_name], fcst_adv_df['predicted'], color='b')\n",
"test_test = plt.scatter(fcst_adv_df[target_column_name], fcst_adv_df[target_column_name], color='g')\n",
"plt.legend((test_pred, test_test), ('prediction', 'truth'), loc='upper left', fontsize=8)\n",
"plt.show()"
]
@@ -726,7 +716,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
"version": "3.6.9"
}
},
"nbformat": 4,

View File

@@ -0,0 +1,4 @@
name: auto-ml-forecasting-energy-demand
dependencies:
- pip:
- azureml-sdk

View File

@@ -1,44 +0,0 @@
import pandas as pd
import numpy as np
from pandas.tseries.frequencies import to_offset
def align_outputs(y_predicted, X_trans, X_test, y_test, target_column_name,
                  predicted_column_name='predicted',
                  horizon_colname='horizon_origin'):
    """
    Demonstrates how to get the output aligned to the inputs
    using pandas indexes. Helps understand what happened if
    the output's shape differs from the input shape, or if
    the data got re-sorted by time and grain during forecasting.
    Typical causes of misalignment are:
    * we predicted some periods that were missing in actuals -> drop from eval
    * model was asked to predict past max_horizon -> increase max horizon
    * data at start of X_test was needed for lags -> provide previous periods
    """
    if (horizon_colname in X_trans):
        df_fcst = pd.DataFrame({predicted_column_name: y_predicted,
                                horizon_colname: X_trans[horizon_colname]})
    else:
        df_fcst = pd.DataFrame({predicted_column_name: y_predicted})
    # y and X outputs are aligned by forecast() function contract
    df_fcst.index = X_trans.index
    # align original X_test to y_test
    X_test_full = X_test.copy()
    X_test_full[target_column_name] = y_test
    # X_test_full's index does not include origin, so reset for merge
    df_fcst.reset_index(inplace=True)
    X_test_full = X_test_full.reset_index().drop(columns='index')
    together = df_fcst.merge(X_test_full, how='right')
    # drop rows where prediction or actuals are nan
    # happens because of missing actuals
    # or at edges of time due to lags/rolling windows
    clean = together[together[[target_column_name,
                               predicted_column_name]].notnull().all(axis=1)]
    return(clean)

View File

@@ -0,0 +1,56 @@
"""
This is the script that is executed on the compute instance. It relies
on the model.pkl file which is uploaded along with this script to the
compute instance.
"""
import argparse
from azureml.core import Dataset, Run
import joblib
from pandas.tseries.frequencies import to_offset
parser = argparse.ArgumentParser()
parser.add_argument(
'--target_column_name', type=str, dest='target_column_name',
help='Target Column Name')
parser.add_argument(
'--test_dataset', type=str, dest='test_dataset',
help='Test Dataset')
args = parser.parse_args()
target_column_name = args.target_column_name
test_dataset_id = args.test_dataset
run = Run.get_context()
ws = run.experiment.workspace
# get the input dataset by id
test_dataset = Dataset.get_by_id(ws, id=test_dataset_id)
X_test = test_dataset.to_pandas_dataframe().reset_index(drop=True)
y_test = X_test.pop(target_column_name).values
# generate forecast
fitted_model = joblib.load('model.pkl')
# Default quantiles: the median plus the bounds of a 95% prediction interval
quantiles = [0.025, 0.5, 0.975]
predicted_column_name = 'predicted'
PI = 'prediction_interval'
fitted_model.quantiles = quantiles
pred_quantiles = fitted_model.forecast_quantiles(X_test)
pred_quantiles[PI] = pred_quantiles[[min(quantiles), max(quantiles)]].apply(
    lambda x: '[{}, {}]'.format(x.iloc[0], x.iloc[1]), axis=1)
X_test[target_column_name] = y_test
X_test[PI] = pred_quantiles[PI]
X_test[predicted_column_name] = pred_quantiles[0.5]
# drop rows where prediction or actuals are nan
# happens because of missing actuals
# or at edges of time due to lags/rolling windows
clean = X_test[X_test[[target_column_name,
predicted_column_name]].notnull().all(axis=1)]
file_name = 'outputs/predictions.csv'
clean.to_csv(file_name, header=True, index=False)  # write without the index column
# Upload the predictions into artifacts
run.upload_file(name=file_name, path_or_stream=file_name)
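
Reviewer note: the script above turns the lower and upper quantile columns into a single `prediction_interval` string column and uses the 0.5 quantile as the point forecast. The sketch below reproduces that step on made-up numbers with plain pandas. The data frame contents are invented for illustration, and the real `forecast_quantiles` output may carry additional columns (for example time or series identifiers) that are not modelled here.

```python
import pandas as pd

quantiles = [0.025, 0.5, 0.975]

# Made-up quantile forecasts: one row per forecasted period,
# one column per requested quantile.
pred_quantiles = pd.DataFrame(
    [[90.0, 100.0, 110.0], [180.0, 200.0, 220.0]],
    columns=quantiles,
)

# The median (0.5 quantile) serves as the point forecast.
predicted = pred_quantiles[0.5].tolist()

# Combine the outermost quantiles into a readable interval string.
lower = pred_quantiles[min(quantiles)].tolist()
upper = pred_quantiles[max(quantiles)].tolist()
intervals = ["[{}, {}]".format(lo, hi) for lo, hi in zip(lower, upper)]

print(predicted)
print(intervals)
```

Selecting the columns by label and zipping them avoids relying on how a row-wise `apply` resolves integer keys, which can differ between pandas versions.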

View File

@@ -1,22 +0,0 @@
import pandas as pd
import numpy as np
def APE(actual, pred):
    """
    Calculate absolute percentage error.
    Returns a vector of APE values with same length as actual/pred.
    """
    return 100 * np.abs((actual - pred) / actual)
def MAPE(actual, pred):
    """
    Calculate mean absolute percentage error.
    Remove NA and values where actual is close to zero
    """
    not_na = ~(np.isnan(actual) | np.isnan(pred))
    not_zero = ~np.isclose(actual, 0.0)
    actual_safe = actual[not_na & not_zero]
    pred_safe = pred[not_na & not_zero]
    return np.mean(APE(actual_safe, pred_safe))
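
Reviewer note: this comparison removes the MAPE helper above in favour of the AutoML scoring module. For readers who want to sanity-check a MAPE value by hand, here is a minimal standard-library sketch of the same filtering idea (skip missing values and near-zero actuals). The function name `mape` and the `zero_tol` parameter are illustrative, not part of the repository.

```python
import math


def mape(actual, pred, zero_tol=1e-8):
    """Mean absolute percentage error over usable (actual, pred) pairs."""
    errors = []
    for a, p in zip(actual, pred):
        if math.isnan(a) or math.isnan(p):
            continue  # skip missing values
        if math.isclose(a, 0.0, abs_tol=zero_tol):
            continue  # skip near-zero actuals, which would blow up the ratio
        errors.append(100 * abs((a - p) / a))
    # No usable pairs: report NaN rather than dividing by zero.
    return sum(errors) / len(errors) if errors else float("nan")


actual = [100.0, 200.0, 0.0, float("nan")]
pred = [110.0, 180.0, 5.0, 1.0]
result = mape(actual, pred)
print(result)
```

Only the first two pairs survive the filtering here, each with a 10% error, so the zero actual and the missing actual do not distort the average.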

View File

@@ -0,0 +1,38 @@
import os
import shutil
from azureml.core import ScriptRunConfig
def run_remote_inference(test_experiment, compute_target, train_run,
                         test_dataset, target_column_name, inference_folder='./forecast'):
    # Create local directory to copy the model.pkl and forecasting_script.py files into.
    # These files will be uploaded to and executed on the compute instance.
    os.makedirs(inference_folder, exist_ok=True)
    shutil.copy('forecasting_script.py', inference_folder)
    train_run.download_file('outputs/model.pkl',
                            os.path.join(inference_folder, 'model.pkl'))
    inference_env = train_run.get_environment()
    config = ScriptRunConfig(source_directory=inference_folder,
                             script='forecasting_script.py',
                             arguments=['--target_column_name',
                                        target_column_name,
                                        '--test_dataset',
                                        test_dataset.as_named_input(test_dataset.name)],
                             compute_target=compute_target,
                             environment=inference_env)
    run = test_experiment.submit(config,
                                 tags={'training_run_id':
                                       train_run.id,
                                       'run_algorithm':
                                       train_run.properties['run_algorithm'],
                                       'valid_score':
                                       train_run.properties['score'],
                                       'primary_metric':
                                       train_run.properties['primary_metric']})
    run.log("run_algorithm", run.tags['run_algorithm'])
    return run
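
Reviewer note: before submitting, the function above stages the scoring script (and the downloaded model) in `inference_folder`, which then becomes the run's source directory. The sketch below isolates only the local staging step using the standard library, so it can be tried without a workspace. `stage_inference_folder` is a hypothetical helper written for this note, and the script content is a placeholder.

```python
import os
import shutil
import tempfile


def stage_inference_folder(script_path, inference_folder):
    """Copy the scoring script into the folder that will be uploaded."""
    os.makedirs(inference_folder, exist_ok=True)
    # shutil.copy returns the path of the copied file inside the folder.
    return shutil.copy(script_path, inference_folder)


with tempfile.TemporaryDirectory() as workdir:
    script = os.path.join(workdir, "forecasting_script.py")
    with open(script, "w") as handle:
        handle.write("print('placeholder scoring script')\n")

    staged = stage_inference_folder(script, os.path.join(workdir, "forecast_advanced"))
    staged_exists = os.path.isfile(staged)
    staged_name = os.path.basename(staged)

print(staged_exists, staged_name)
```

Using a separate folder per inference run, as the notebook does with `./forecast` and `./forecast_advanced`, keeps each run's `model.pkl` from overwriting the other.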

View File

@@ -94,7 +94,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.23.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.35.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
@@ -263,7 +263,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"You will need to create a [compute target](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute) for your AutoML run. In this tutorial, you create AmlCompute as your training compute resource."
"You will need to create a [compute target](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute) for your AutoML run. In this tutorial, you create AmlCompute as your training compute resource.\n",
"\n",
"> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist."
]
},
{
@@ -283,7 +285,7 @@
" compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)\n",
" print('Found existing cluster, use it.')\n",
"except ComputeTargetException:\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS12_V2',\n",
" max_nodes=6)\n",
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)\n",
"\n",
@@ -319,7 +321,8 @@
" time_column_name=TIME_COLUMN_NAME,\n",
" forecast_horizon=forecast_horizon,\n",
" time_series_id_column_names=[ TIME_SERIES_ID_COLUMN_NAME ],\n",
" target_lags=lags\n",
" target_lags=lags,\n",
" freq='H' # Set the forecast frequency to be hourly\n",
")"
]
},

View File

@@ -0,0 +1,4 @@
name: auto-ml-forecasting-function
dependencies:
- pip:
- azureml-sdk

View File

@@ -0,0 +1,648 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/forecasting-hierarchical-timeseries/auto-ml-forecasting-hierarchical-timeseries.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Hierarchical Time Series - Automated ML\n",
"**_Generate hierarchical time series forecasts with Automated Machine Learning_**\n",
"\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For this notebook we are using a synthetic dataset portraying sales data to predict the the quantity of a vartiety of product skus across several states, stores, and product categories.\n",
"\n",
"**NOTE: There are limits on how many runs we can do in parallel per workspace, and we currently recommend to set the parallelism to maximum of 320 runs per experiment per workspace. If users want to have more parallelism and increase this limit they might encounter Too Many Requests errors (HTTP 429).**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prerequisites\n",
"You'll need to create a compute Instance by following the instructions in the [EnvironmentSetup.md](../Setup_Resources/EnvironmentSetup.md)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1.0 Set up workspace, datastore, experiment"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1613003526897
}
},
"outputs": [],
"source": [
"import azureml.core\n",
"from azureml.core import Workspace, Datastore\n",
"import pandas as pd\n",
"\n",
"# Set up your workspace\n",
"ws = Workspace.from_config()\n",
"ws.get_details()\n",
"\n",
"# Set up your datastores\n",
"dstore = ws.get_default_datastore()\n",
"\n",
"output = {}\n",
"output['SDK version'] = azureml.core.VERSION\n",
"output['Subscription ID'] = ws.subscription_id\n",
"output['Workspace'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n",
"output['Default datastore name'] = dstore.name\n",
"pd.set_option('display.max_colwidth', -1)\n",
"outputDf = pd.DataFrame(data = output, index = [''])\n",
"outputDf.T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Choose an experiment"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1613003540729
}
},
"outputs": [],
"source": [
"from azureml.core import Experiment\n",
"\n",
"experiment = Experiment(ws, 'automl-hts')\n",
"\n",
"print('Experiment name: ' + experiment.name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.0 Data\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"### Upload local csv files to datastore\n",
"You can upload your train and inference csv files to the default datastore in your workspace. \n",
"\n",
"A Datastore is a place where data can be stored that is then made accessible to a compute either by means of mounting or copying the data to the compute target.\n",
"Please refer to [Datastore](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.datastore.datastore?view=azure-ml-py) documentation on how to access data from Datastore."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"datastore_path = \"hts-sample\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"datastore = ws.get_default_datastore()\n",
"datastore"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1613005886349
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"datastore.upload(src_dir='./Data/', target_path=datastore_path, overwrite=True, show_progress=True) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create the TabularDatasets \n",
"\n",
"Datasets in Azure Machine Learning are references to specific data in a Datastore. The data can be retrieved as a [TabularDatasets](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1613007017296
}
},
"outputs": [],
"source": [
"from azureml.core.dataset import Dataset\n",
"train_ds = Dataset.Tabular.from_delimited_files(path=datastore.path(\"hts-sample/hts-sample-train.csv\"), validate=False) \n",
"inference_ds = Dataset.Tabular.from_delimited_files(path=datastore.path(\"hts-sample/hts-sample-test.csv\"), validate=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Register the TabularDatasets to the Workspace \n",
"Finally, register the dataset to your Workspace so it can be called as an input into the training pipeline in the next notebook. We will use the inference dataset as part of the forecasting pipeline. The step need only be completed once."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"registered_train = train_ds.register(ws, \"hts-sales-train\")\n",
"registered_inference = inference_ds.register(ws, \"hts-sales-test\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3.0 Build the training pipeline\n",
"Now that the dataset, WorkSpace, and datastore are set up, we can put together a pipeline for training.\n",
"\n",
"> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Choose a compute target\n",
"\n",
"You will need to create a [compute target](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-set-up-training-targets#amlcompute) for your AutoML run. In this tutorial, you create AmlCompute as your training compute resource.\n",
"\n",
"\\*\\*Creation of AmlCompute takes approximately 5 minutes.**\n",
"\n",
"If the AmlCompute with that name is already in your workspace this code will skip the creation process. As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read this [article](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-manage-quotas) on the default limits and how to request more quota."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1613007037308
}
},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"\n",
"# Name your cluster\n",
"compute_name = \"hts-compute\"\n",
"\n",
"\n",
"if compute_name in ws.compute_targets:\n",
" compute_target = ws.compute_targets[compute_name]\n",
" if compute_target and type(compute_target) is AmlCompute:\n",
" print('Found compute target: ' + compute_name)\n",
"else:\n",
" print('Creating a new compute target...')\n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size= \"STANDARD_D16S_V3\",\n",
" max_nodes=20)\n",
" # Create the compute target\n",
" compute_target = ComputeTarget.create(\n",
" ws, compute_name, provisioning_config)\n",
"\n",
" # Can poll for a minimum number of nodes and for a specific timeout.\n",
" # If no min node count is provided it will use the scale settings for the cluster\n",
" compute_target.wait_for_completion(\n",
" show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
"\n",
" # For a more detailed view of current cluster status, use the 'status' property\n",
" print(compute_target.status.serialize())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set up training parameters\n",
"\n",
"This dictionary defines the AutoML and hierarchy settings. For this forecasting task we need to define several settings inncluding the name of the time column, the maximum forecast horizon, the hierarchy definition, and the level of the hierarchy at which to train.\n",
"\n",
"| Property | Description|\n",
"| :--------------- | :------------------- |\n",
"| **task** | forecasting |\n",
"| **primary_metric** | This is the metric that you want to optimize.<br> Forecasting supports the following primary metrics <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i> |\n",
"| **blocked_models** | Blocked models won't be used by AutoML. |\n",
"| **iteration_timeout_minutes** | Maximum amount of time in minutes that the model can train. This is optional but provides customers with greater control on exit criteria. |\n",
"| **iterations** | Number of models to train. This is optional but provides customers with greater control on exit criteria. |\n",
"| **experiment_timeout_hours** | Maximum amount of time in hours that the experiment can take before it terminates. This is optional but provides customers with greater control on exit criteria. |\n",
"| **label_column_name** | The name of the label column. |\n",
"| **forecast_horizon** | The forecast horizon is how many periods forward you would like to forecast. This integer horizon is in units of the timeseries frequency (e.g. daily, weekly). Periods are inferred from your data. |\n",
"| **n_cross_validations** | Number of cross validation splits. Rolling Origin Validation is used to split time-series in a temporally consistent way. |\n",
"| **enable_early_stopping** | Flag to enable early termination if the score is not improving in the short term. |\n",
"| **time_column_name** | The name of your time column. |\n",
"| **hierarchy_column_names** | The names of columns that define the hierarchical structure of the data from highest level to most granular. |\n",
"| **training_level** | The level of the hierarchy to be used for training models. |\n",
"| **enable_engineered_explanations** | Engineered feature explanations will be downloaded if enable_engineered_explanations flag is set to True. By default it is set to False to save storage space. |\n",
"| **time_series_id_column_name** | The column names used to uniquely identify timeseries in data that has multiple rows with the same timestamp. |\n",
"| **track_child_runs** | Flag to disable tracking of child runs. Only best run is tracked if the flag is set to False (this includes the model and metrics of the run). |\n",
"| **pipeline_fetch_max_batch_size** | Determines how many pipelines (training algorithms) to fetch at a time for training, this helps reduce throttling when training at large scale. |\n",
"| **model_explainability** | Flag to disable explaining the best automated ML model at the end of all training iterations. The default is True and will block non-explainable models which may impact the forecast accuracy. For more information, see [Interpretability: model explanations in automated machine learning](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-machine-learning-interpretability-automl). |"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1613007061544
}
},
"outputs": [],
"source": [
"from azureml.train.automl.runtime._hts.hts_parameters import HTSTrainParameters\n",
"\n",
"model_explainability = True\n",
"\n",
"engineered_explanations = False\n",
"# Define your hierarchy. Adjust the settings below based on your dataset.\n",
"hierarchy = [\"state\", \"store_id\", \"product_category\", \"SKU\"]\n",
"training_level = \"SKU\"\n",
"\n",
"# Set your forecast parameters. Adjust the settings below based on your dataset.\n",
"time_column_name = \"date\"\n",
"label_column_name = \"quantity\"\n",
"forecast_horizon = 7\n",
"\n",
"\n",
"automl_settings = {\n",
" \"task\" : \"forecasting\",\n",
" \"primary_metric\" : \"normalized_root_mean_squared_error\",\n",
" \"label_column_name\": label_column_name,\n",
" \"time_column_name\": time_column_name,\n",
" \"forecast_horizon\": forecast_horizon,\n",
" \"hierarchy_column_names\": hierarchy,\n",
" \"hierarchy_training_level\": training_level,\n",
" \"track_child_runs\": False,\n",
" \"pipeline_fetch_max_batch_size\": 15,\n",
" \"model_explainability\": model_explainability,\n",
" # The following settings are specific to this sample and should be adjusted according to your own needs.\n",
" \"iteration_timeout_minutes\" : 10,\n",
" \"iterations\" : 10,\n",
" \"n_cross_validations\": 2\n",
"}\n",
"\n",
"hts_parameters = HTSTrainParameters(\n",
" automl_settings=automl_settings,\n",
" hierarchy_column_names=hierarchy,\n",
" training_level=training_level,\n",
" enable_engineered_explanations=engineered_explanations\n",
")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set up hierarchy training pipeline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Parallel run step is leveraged to train the hierarchy. To configure the ParallelRunConfig you will need to determine the appropriate number of workers and nodes for your use case. The `process_count_per_node` is based off the number of cores of the compute VM. The node_count will determine the number of master nodes to use, increasing the node count will speed up the training process.\n",
"\n",
"* **experiment:** The experiment used for training.\n",
"* **train_data:** The tabular dataset to be used as input to the training run.\n",
"* **node_count:** The number of compute nodes to be used for running the user script. We recommend to start with 3 and increase the node_count if the training time is taking too long.\n",
"* **process_count_per_node:** Process count per node, we recommend 2:1 ratio for number of cores: number of processes per node. eg. If node has 16 cores then configure 8 or less process count per node or optimal performance.\n",
"* **train_pipeline_parameters:** The set of configuration parameters defined in the previous section. \n",
"\n",
"Calling this method will create a new aggregated dataset which is generated dynamically on pipeline execution."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.contrib.automl.pipeline.steps import AutoMLPipelineBuilder\n",
"\n",
"\n",
"training_pipeline_steps = AutoMLPipelineBuilder.get_many_models_train_steps(\n",
" experiment=experiment,\n",
" train_data=registered_train,\n",
" compute_target=compute_target,\n",
" node_count=2,\n",
" process_count_per_node=8,\n",
" train_pipeline_parameters=hts_parameters,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.pipeline.core import Pipeline\n",
"\n",
"training_pipeline = Pipeline(ws, steps=training_pipeline_steps)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Submit the pipeline to run\n",
"Next we submit our pipeline to run. The whole training pipeline takes about 1h 11m using a Standard_D12_V2 VM with our current ParallelRunConfig setting."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"training_run = experiment.submit(training_pipeline)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"training_run.wait_for_completion(show_output=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Check the run status, if training_run is in completed state, continue to forecasting. If training_run is in another state, check the portal for failures."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### [Optional] Get the explanations\n",
"First we need to download the explanations to the local disk."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if model_explainability:\n",
" expl_output = training_run.get_pipeline_output(\"explanations\")\n",
" expl_output.download(\"training_explanations\")\n",
"else:\n",
" print(\"Model explanations are available only if model_explainability is set to True.\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The explanations are downloaded to the \"training_explanations/azureml\" directory."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"if model_explainability:\n",
" explanations_dirrectory = os.listdir(os.path.join('training_explanations', 'azureml'))\n",
" if len(explanations_dirrectory) > 1:\n",
" print(\"Warning! The directory contains multiple explanations, only the first one will be displayed.\")\n",
" print('The explanations are located at {}.'.format(explanations_dirrectory[0]))\n",
" # Now we will list all the explanations.\n",
" explanation_path = os.path.join('training_explanations', 'azureml', explanations_dirrectory[0], 'training_explanations')\n",
" print(\"Available explanations\")\n",
" print(\"==============================\")\n",
" print(\"\\n\".join(os.listdir(explanation_path)))\n",
"else:\n",
" print(\"Model explanations are available only if model_explainability is set to True.\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"View the explanations on \"state\" level."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from IPython.display import display\n",
"\n",
"explanation_type = 'raw'\n",
"level = 'state'\n",
"\n",
"if model_explainability:\n",
" display(pd.read_csv(os.path.join(explanation_path, \"{}_explanations_{}.csv\").format(explanation_type, level)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5.0 Forecasting\n",
"For hierarchical forecasting we need to provide the HTSInferenceParameters object.\n",
"#### HTSInferenceParameters arguments\n",
"* **hierarchy_forecast_level:** The default level of the hierarchy to produce prediction/forecast on.\n",
"* **allocation_method:** \\[Optional] The disaggregation method to use if the hierarchy forecast level specified is below the define hierarchy training level. <br><i>(average historical proportions) 'average_historical_proportions'</i><br><i>(proportions of the historical averages) 'proportions_of_historical_average'</i>\n",
"\n",
"#### get_many_models_batch_inference_steps arguments\n",
"* **experiment:** The experiment used for inference run.\n",
"* **inference_data:** The data to use for inferencing. It should be the same schema as used for training.\n",
"* **compute_target:** The compute target that runs the inference pipeline.\n",
"* **node_count:** The number of compute nodes to be used for running the user script. We recommend to start with the number of cores per node (varies by compute sku).\n",
"* **process_count_per_node:** The number of processes per node.\n",
"* **train_run_id:** \\[Optional] The run id of the hierarchy training, by default it is the latest successful training hts run in the experiment.\n",
"* **train_experiment_name:** \\[Optional] The train experiment that contains the train pipeline. This one is only needed when the train pipeline is not in the same experiement as the inference pipeline.\n",
"* **process_count_per_node:** \\[Optional] The number of processes per node, by default it's 4."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.automl.runtime._hts.hts_parameters import HTSInferenceParameters\n",
"\n",
"inference_parameters = HTSInferenceParameters(\n",
" hierarchy_forecast_level=\"store_id\", # The setting is specific to this dataset and should be changed based on your dataset.\n",
" allocation_method=\"proportions_of_historical_average\"\n",
")\n",
"\n",
"steps = AutoMLPipelineBuilder.get_many_models_batch_inference_steps(\n",
" experiment=experiment,\n",
" inference_data=registered_inference,\n",
" compute_target=compute_target,\n",
" inference_pipeline_parameters=inference_parameters,\n",
" node_count=2,\n",
" process_count_per_node=8\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.pipeline.core import Pipeline\n",
"\n",
"inference_pipeline = Pipeline(ws, steps=steps)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"inference_run = experiment.submit(inference_pipeline)\n",
"inference_run.wait_for_completion(show_output=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Retrieve results\n",
"\n",
"Forecast results can be retrieved through the following code. The prediction results summary and the actual predictions are downloaded the \"forecast_results\" folder"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"forecasts = inference_run.get_pipeline_output(\"forecasts\")\n",
"forecasts.download(\"forecast_results\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Resbumit the Pipeline\n",
"\n",
"The inference pipeline can be submitted with different configurations."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"inference_run = experiment.submit(inference_pipeline, pipeline_parameters={\"hierarchy_forecast_level\": \"state\"})\n",
"inference_run.wait_for_completion(show_output=False)"
]
}
],
"metadata": {
"authors": [
{
"name": "jialiu"
}
],
"categories": [
"how-to-use-azureml",
"automated-machine-learning"
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -0,0 +1,4 @@
name: auto-ml-forecasting-hierarchical-timeseries
dependencies:
- pip:
- azureml-sdk

View File

@@ -0,0 +1,717 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/forecasting-hierarchical-timeseries/auto-ml-forecasting-hierarchical-timeseries.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Many Models - Automated ML\n",
"**_Generate many models time series forecasts with Automated Machine Learning_**\n",
"\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For this notebook we are using a synthetic dataset portraying sales data to predict the the quantity of a vartiety of product skus across several states, stores, and product categories.\n",
"\n",
"**NOTE: There are limits on how many runs we can do in parallel per workspace, and we currently recommend to set the parallelism to maximum of 320 runs per experiment per workspace. If users want to have more parallelism and increase this limit they might encounter Too Many Requests errors (HTTP 429).**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prerequisites\n",
"You'll need to create a compute Instance by following the instructions in the [EnvironmentSetup.md](../Setup_Resources/EnvironmentSetup.md)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1.0 Set up workspace, datastore, experiment"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1613003526897
}
},
"outputs": [],
"source": [
"import azureml.core\n",
"from azureml.core import Workspace, Datastore\n",
"import pandas as pd\n",
"\n",
"# Set up your workspace\n",
"ws = Workspace.from_config()\n",
"ws.get_details()\n",
"\n",
"# Set up your datastores\n",
"dstore = ws.get_default_datastore()\n",
"\n",
"output = {}\n",
"output['SDK version'] = azureml.core.VERSION\n",
"output['Subscription ID'] = ws.subscription_id\n",
"output['Workspace'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n",
"output['Default datastore name'] = dstore.name\n",
"pd.set_option('display.max_colwidth', -1)\n",
"outputDf = pd.DataFrame(data = output, index = [''])\n",
"outputDf.T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Choose an experiment"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1613003540729
}
},
"outputs": [],
"source": [
"from azureml.core import Experiment\n",
"\n",
"experiment = Experiment(ws, 'automl-many-models')\n",
"\n",
"print('Experiment name: ' + experiment.name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.0 Data\n",
"\n",
"This notebook uses simulated orange juice sales data to walk you through the process of training many models on Azure Machine Learning using Automated ML. \n",
"\n",
"The time series data used in this example was simulated based on the University of Chicago's Dominick's Finer Foods dataset which featured two years of sales of 3 different orange juice brands for individual stores. The full simulated dataset includes 3,991 stores with 3 orange juice brands each thus allowing 11,973 models to be trained to showcase the power of the many models pattern.\n",
"\n",
" \n",
"In this notebook, two datasets will be created: one with all 11,973 files and one with only 10 files that can be used to quickly test and debug. For each dataset, you'll be walked through the process of:\n",
"\n",
"1. Registering the blob container as a Datastore to the Workspace\n",
"2. Registering a tabular dataset to the Workspace"
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"### 2.1 Data Preparation\n",
"The OJ data is available in the public blob container. The data is split to be used for training and for inferencing. For the current dataset, the data was split on time column ('WeekStarting') before and after '1992-5-28' .\n",
"\n",
"The container has\n",
"<ol>\n",
" <li><b>'oj-data-tabular'</b> and <b>'oj-inference-tabular'</b> folders that contains training and inference data respectively for the 11,973 models. </li>\n",
" <li>It also has <b>'oj-data-small-tabular'</b> and <b>'oj-inference-small-tabular'</b> folders that has training and inference data for 10 models.</li>\n",
"</ol>\n",
"\n",
"To create the [TabularDataset](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.tabular_dataset.tabulardataset?view=azure-ml-py) needed for the ParallelRunStep, you first need to register the blob container to the workspace."
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"<b> To use your own data, put your own data in a blobstore folder. As shown it can be one file or multiple files. We can then register datastore using that blob as shown below.\n",
" \n",
"<h3> How sample data in blob store looks like</h3>\n",
"\n",
"['oj-data-tabular'](https://ms.portal.azure.com/#blade/Microsoft_Azure_Storage/ContainerMenuBlade/overview/storageAccountId/%2Fsubscriptions%2F102a16c3-37d3-48a8-9237-4c9b1e8e80e0%2FresourceGroups%2FAutoMLSampleNotebooksData%2Fproviders%2FMicrosoft.Storage%2FstorageAccounts%2Fautomlsamplenotebookdata/path/automl-sample-notebook-data/etag/%220x8D84EAA65DE50B7%22/defaultEncryptionScope/%24account-encryption-key/denyEncryptionScopeOverride//defaultId//publicAccessVal/Container)</b>\n",
"![image-4.png](mm-1.png)\n",
"\n",
"['oj-inference-tabular'](https://ms.portal.azure.com/#blade/Microsoft_Azure_Storage/ContainerMenuBlade/overview/storageAccountId/%2Fsubscriptions%2F102a16c3-37d3-48a8-9237-4c9b1e8e80e0%2FresourceGroups%2FAutoMLSampleNotebooksData%2Fproviders%2FMicrosoft.Storage%2FstorageAccounts%2Fautomlsamplenotebookdata/path/automl-sample-notebook-data/etag/%220x8D84EAA65DE50B7%22/defaultEncryptionScope/%24account-encryption-key/denyEncryptionScopeOverride//defaultId//publicAccessVal/Container)\n",
"![image-3.png](mm-2.png)\n",
"\n",
"['oj-data-small-tabular'](https://ms.portal.azure.com/#blade/Microsoft_Azure_Storage/ContainerMenuBlade/overview/storageAccountId/%2Fsubscriptions%2F102a16c3-37d3-48a8-9237-4c9b1e8e80e0%2FresourceGroups%2FAutoMLSampleNotebooksData%2Fproviders%2FMicrosoft.Storage%2FstorageAccounts%2Fautomlsamplenotebookdata/path/automl-sample-notebook-data/etag/%220x8D84EAA65DE50B7%22/defaultEncryptionScope/%24account-encryption-key/denyEncryptionScopeOverride//defaultId//publicAccessVal/Container)\n",
"\n",
"![image-5.png](mm-3.png)\n",
"\n",
"['oj-inference-small-tabular'](https://ms.portal.azure.com/#blade/Microsoft_Azure_Storage/ContainerMenuBlade/overview/storageAccountId/%2Fsubscriptions%2F102a16c3-37d3-48a8-9237-4c9b1e8e80e0%2FresourceGroups%2FAutoMLSampleNotebooksData%2Fproviders%2FMicrosoft.Storage%2FstorageAccounts%2Fautomlsamplenotebookdata/path/automl-sample-notebook-data/etag/%220x8D84EAA65DE50B7%22/defaultEncryptionScope/%24account-encryption-key/denyEncryptionScopeOverride//defaultId//publicAccessVal/Container)\n",
"![image-6.png](mm-4.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 2.2 Register the blob container as DataStore\n",
"\n",
"A Datastore is a place where data can be stored that is then made accessible to a compute either by means of mounting or copying the data to the compute target.\n",
"\n",
"Please refer to [Datastore](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.datastore(class)?view=azure-ml-py) documentation on how to access data from Datastore.\n",
"\n",
"In this next step, we will be registering blob storage as datastore to the Workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Datastore\n",
"\n",
"# Please change the following to point to your own blob container and pass in account_key\n",
"blob_datastore_name = \"automl_many_models\"\n",
"container_name = \"automl-sample-notebook-data\"\n",
"account_name = \"automlsamplenotebookdata\"\n",
"\n",
"oj_datastore = Datastore.register_azure_blob_container(workspace=ws, \n",
" datastore_name=blob_datastore_name, \n",
" container_name=container_name,\n",
" account_name=account_name,\n",
" create_if_not_exists=True) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 2.3 Using tabular datasets \n",
"\n",
"Now that the datastore is available from the Workspace, [TabularDataset](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.tabular_dataset.tabulardataset?view=azure-ml-py) can be created. Datasets in Azure Machine Learning are references to specific data in a Datastore. We are using TabularDataset, so that users who have their data which can be in one or many files (*.parquet or *.csv) and have not split up data according to group columns needed for training, can do so using out of box support for 'partiion_by' feature of TabularDataset shown in section 5.0 below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1613007017296
}
},
"outputs": [],
"source": [
"from azureml.core import Dataset\n",
"\n",
"ds_name_small = 'oj-data-small-tabular'\n",
"input_ds_small = Dataset.Tabular.from_delimited_files(path=oj_datastore.path(ds_name_small + '/'), validate=False)\n",
"\n",
"inference_name_small = 'oj-inference-small-tabular'\n",
"inference_ds_small = Dataset.Tabular.from_delimited_files(path=oj_datastore.path(inference_name_small + '/'), validate=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3.0 Build the training pipeline\n",
"Now that the dataset, WorkSpace, and datastore are set up, we can put together a pipeline for training.\n",
"\n",
"> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Choose a compute target\n",
"\n",
"You will need to create a [compute target](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-set-up-training-targets#amlcompute) for your AutoML run. In this tutorial, you create AmlCompute as your training compute resource.\n",
"\n",
"\\*\\*Creation of AmlCompute takes approximately 5 minutes.**\n",
"\n",
"If the AmlCompute with that name is already in your workspace this code will skip the creation process. As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read this [article](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-manage-quotas) on the default limits and how to request more quota."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1613007037308
}
},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"\n",
"# Name your cluster\n",
"compute_name = \"mm-compute\"\n",
"\n",
"\n",
"if compute_name in ws.compute_targets:\n",
" compute_target = ws.compute_targets[compute_name]\n",
" if compute_target and type(compute_target) is AmlCompute:\n",
" print('Found compute target: ' + compute_name)\n",
"else:\n",
" print('Creating a new compute target...')\n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size= \"STANDARD_D16S_V3\",\n",
" max_nodes=20)\n",
" # Create the compute target\n",
" compute_target = ComputeTarget.create(\n",
" ws, compute_name, provisioning_config)\n",
"\n",
" # Can poll for a minimum number of nodes and for a specific timeout.\n",
" # If no min node count is provided it will use the scale settings for the cluster\n",
" compute_target.wait_for_completion(\n",
" show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
"\n",
" # For a more detailed view of current cluster status, use the 'status' property\n",
" print(compute_target.status.serialize())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set up training parameters\n",
"\n",
"This dictionary defines the AutoML and many models settings. For this forecasting task we need to define several settings inncluding the name of the time column, the maximum forecast horizon, and the partition column name definition.\n",
"\n",
"| Property | Description|\n",
"| :--------------- | :------------------- |\n",
"| **task** | forecasting |\n",
"| **primary_metric** | This is the metric that you want to optimize.<br> Forecasting supports the following primary metrics <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i> |\n",
"| **blocked_models** | Blocked models won't be used by AutoML. |\n",
"| **iteration_timeout_minutes** | Maximum amount of time in minutes that the model can train. This is optional but provides customers with greater control on exit criteria. |\n",
"| **iterations** | Number of models to train. This is optional but provides customers with greater control on exit criteria. |\n",
"| **experiment_timeout_hours** | Maximum amount of time in hours that the experiment can take before it terminates. This is optional but provides customers with greater control on exit criteria. |\n",
"| **label_column_name** | The name of the label column. |\n",
"| **forecast_horizon** | The forecast horizon is how many periods forward you would like to forecast. This integer horizon is in units of the timeseries frequency (e.g. daily, weekly). Periods are inferred from your data. |\n",
"| **n_cross_validations** | Number of cross validation splits. Rolling Origin Validation is used to split time-series in a temporally consistent way. |\n",
"| **enable_early_stopping** | Flag to enable early termination if the score is not improving in the short term. |\n",
"| **time_column_name** | The name of your time column. |\n",
"| **enable_engineered_explanations** | Engineered feature explanations will be downloaded if enable_engineered_explanations flag is set to True. By default it is set to False to save storage space. |\n",
"| **time_series_id_column_name** | The column names used to uniquely identify timeseries in data that has multiple rows with the same timestamp. |\n",
"| **track_child_runs** | Flag to disable tracking of child runs. Only best run is tracked if the flag is set to False (this includes the model and metrics of the run). |\n",
"| **pipeline_fetch_max_batch_size** | Determines how many pipelines (training algorithms) to fetch at a time for training, this helps reduce throttling when training at large scale. |\n",
"| **partition_column_names** | The names of columns used to group your models. For timeseries, the groups must not split up individual time-series. That is, each group must contain one or more whole time-series. |"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1613007061544
}
},
"outputs": [],
"source": [
"from azureml.train.automl.runtime._many_models.many_models_parameters import ManyModelsTrainParameters\n",
"\n",
"partition_column_names = ['Store', 'Brand']\n",
"automl_settings = {\n",
" \"task\" : 'forecasting',\n",
" \"primary_metric\" : 'normalized_root_mean_squared_error',\n",
" \"iteration_timeout_minutes\" : 10, # This needs to be changed based on the dataset. We ask customer to explore how long training is taking before settings this value\n",
" \"iterations\" : 15,\n",
" \"experiment_timeout_hours\" : 0.25,\n",
" \"label_column_name\" : 'Quantity',\n",
" \"n_cross_validations\" : 3,\n",
" \"time_column_name\": 'WeekStarting',\n",
" \"drop_column_names\": 'Revenue',\n",
" \"max_horizon\" : 6,\n",
" \"grain_column_names\": partition_column_names,\n",
" \"track_child_runs\": False,\n",
"}\n",
"\n",
"mm_paramters = ManyModelsTrainParameters(automl_settings=automl_settings, partition_column_names=partition_column_names)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set up many models pipeline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Parallel run step is leveraged to train multiple models at once. To configure the ParallelRunConfig you will need to determine the appropriate number of workers and nodes for your use case. The process_count_per_node is based off the number of cores of the compute VM. The node_count will determine the number of master nodes to use, increasing the node count will speed up the training process.\n",
"\n",
"| Property | Description|\n",
"| :--------------- | :------------------- |\n",
"| **experiment** | The experiment used for training. |\n",
"| **train_data** | The file dataset to be used as input to the training run. |\n",
"| **node_count** | The number of compute nodes to be used for running the user script. We recommend to start with 3 and increase the node_count if the training time is taking too long. |\n",
"| **process_count_per_node** | Process count per node, we recommend 2:1 ratio for number of cores: number of processes per node. eg. If node has 16 cores then configure 8 or less process count per node or optimal performance. |\n",
"| **train_pipeline_parameters** | The set of configuration parameters defined in the previous section. |\n",
"\n",
"Calling this method will create a new aggregated dataset which is generated dynamically on pipeline execution."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.contrib.automl.pipeline.steps import AutoMLPipelineBuilder\n",
"\n",
"\n",
"training_pipeline_steps = AutoMLPipelineBuilder.get_many_models_train_steps(\n",
" experiment=experiment,\n",
" train_data=input_ds_small,\n",
" compute_target=compute_target,\n",
" node_count=2,\n",
" process_count_per_node=8,\n",
" run_invocation_timeout=920,\n",
" train_pipeline_parameters=mm_paramters,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.pipeline.core import Pipeline\n",
"\n",
"training_pipeline = Pipeline(ws, steps=training_pipeline_steps)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Submit the pipeline to run\n",
"Next we submit our pipeline to run. The whole training pipeline takes about 40m using a STANDARD_D16S_V3 VM with our current ParallelRunConfig setting."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"training_run = experiment.submit(training_pipeline)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"training_run.wait_for_completion(show_output=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Check the run status, if training_run is in completed state, continue to forecasting. If training_run is in another state, check the portal for failures."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5.0 Publish and schedule the train pipeline (Optional)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5.1 Publish the pipeline\n",
"\n",
"Once you have a pipeline you're happy with, you can publish a pipeline so you can call it programmatically later on. See this [tutorial](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-your-first-pipeline#publish-a-pipeline) for additional information on publishing and calling pipelines."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# published_pipeline = training_pipeline.publish(name = 'automl_train_many_models',\n",
"# description = 'train many models',\n",
"# version = '1',\n",
"# continue_on_step_failure = False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 7.2 Schedule the pipeline\n",
"You can also [schedule the pipeline](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-schedule-pipelines) to run on a time-based or change-based schedule. This could be used to automatically retrain models every month or based on another trigger such as data drift."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# from azureml.pipeline.core import Schedule, ScheduleRecurrence\n",
" \n",
"# training_pipeline_id = published_pipeline.id\n",
"\n",
"# recurrence = ScheduleRecurrence(frequency=\"Month\", interval=1, start_time=\"2020-01-01T09:00:00\")\n",
"# recurring_schedule = Schedule.create(ws, name=\"automl_training_recurring_schedule\", \n",
"# description=\"Schedule Training Pipeline to run on the first day of every month\",\n",
"# pipeline_id=training_pipeline_id, \n",
"# experiment_name=experiment.name, \n",
"# recurrence=recurrence)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6.0 Forecasting"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set up output dataset for inference data\n",
"Output of inference can be represented as [OutputFileDatasetConfig](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.output_dataset_config.outputdatasetconfig?view=azure-ml-py) object and OutputFileDatasetConfig can be registered as a dataset. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.data import OutputFileDatasetConfig\n",
"output_inference_data_ds = OutputFileDatasetConfig(name='many_models_inference_output', destination=(dstore, 'oj/inference_data/')).register_on_complete(name='oj_inference_data_ds')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For many models we need to provide the ManyModelsInferenceParameters object.\n",
"\n",
"#### ManyModelsInferenceParameters arguments\n",
"| Property | Description|\n",
"| :--------------- | :------------------- |\n",
"| **partition_column_names** | List of column names that identifies groups. |\n",
"| **target_column_name** | \\[Optional] Column name only if the inference dataset has the target. |\n",
"| **time_column_name** | \\[Optional] Column name only if it is timeseries. |\n",
"| **many_models_run_id** | \\[Optional] Many models run id where models were trained. |\n",
"\n",
"#### get_many_models_batch_inference_steps arguments\n",
"| Property | Description|\n",
"| :--------------- | :------------------- |\n",
"| **experiment** | The experiment used for inference run. |\n",
"| **inference_data** | The data to use for inferencing. It should be the same schema as used for training.\n",
"| **compute_target** The compute target that runs the inference pipeline.|\n",
"| **node_count** | The number of compute nodes to be used for running the user script. We recommend to start with the number of cores per node (varies by compute sku). |\n",
"| **process_count_per_node** The number of processes per node.\n",
"| **train_run_id** | \\[Optional] The run id of the hierarchy training, by default it is the latest successful training many model run in the experiment. |\n",
"| **train_experiment_name** | \\[Optional] The train experiment that contains the train pipeline. This one is only needed when the train pipeline is not in the same experiement as the inference pipeline. |\n",
"| **process_count_per_node** | \\[Optional] The number of processes per node, by default it's 4. |"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.contrib.automl.pipeline.steps import AutoMLPipelineBuilder\n",
"from azureml.train.automl.runtime._many_models.many_models_parameters import ManyModelsInferenceParameters\n",
"\n",
"mm_parameters = ManyModelsInferenceParameters(\n",
" partition_column_names=['Store', 'Brand'],\n",
" time_column_name=\"WeekStarting\",\n",
" target_column_name=\"Quantity\"\n",
")\n",
"\n",
"inference_steps = AutoMLPipelineBuilder.get_many_models_batch_inference_steps(\n",
" experiment=experiment,\n",
" inference_data=inference_ds_small,\n",
" node_count=2,\n",
" process_count_per_node=8,\n",
" compute_target=compute_target,\n",
" run_invocation_timeout=300,\n",
" output_datastore=output_inference_data_ds,\n",
" train_run_id=training_run.id,\n",
" train_experiment_name=training_run.experiment.name,\n",
" inference_pipeline_parameters=mm_parameters,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.pipeline.core import Pipeline\n",
"\n",
"inference_pipeline = Pipeline(ws, steps=inference_steps)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"inference_run = experiment.submit(inference_pipeline)\n",
"inference_run.wait_for_completion(show_output=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Retrieve results\n",
"\n",
"The forecasting pipeline forecasts the orange juice quantity for a Store by Brand. The pipeline returns one file with the predictions for each store and outputs the result to the forecasting_output Blob container. The details of the blob container is listed in 'forecasting_output.txt' under Outputs+logs. \n",
"\n",
"The following code snippet:\n",
"1. Downloads the contents of the output folder that is passed in the parallel run step \n",
"2. Reads the parallel_run_step.txt file that has the predictions as pandas dataframe and \n",
"3. Displays the top 10 rows of the predictions"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.contrib.automl.pipeline.steps.utilities import get_output_from_mm_pipeline\n",
"\n",
"forecasting_results_name = \"forecasting_results\"\n",
"forecasting_output_name = \"many_models_inference_output\"\n",
"forecast_file = get_output_from_mm_pipeline(inference_run, forecasting_results_name, forecasting_output_name)\n",
"df = pd.read_csv(forecast_file, delimiter=\" \", header=None)\n",
"df.columns = [\"Week Starting\", \"Store\", \"Brand\", \"Quantity\", \"Advert\", \"Price\" , \"Revenue\", \"Predicted\" ]\n",
"print(\"Prediction has \", df.shape[0], \" rows. Here the first 10 rows are being displayed.\")\n",
"df.head(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 7.0 Publish and schedule the inference pipeline (Optional)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 7.1 Publish the pipeline\n",
"\n",
"Once you have a pipeline you're happy with, you can publish a pipeline so you can call it programmatically later on. See this [tutorial](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-your-first-pipeline#publish-a-pipeline) for additional information on publishing and calling pipelines."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# published_pipeline_inf = inference_pipeline.publish(name = 'automl_forecast_many_models',\n",
"# description = 'forecast many models',\n",
"# version = '1',\n",
"# continue_on_step_failure = False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 7.2 Schedule the pipeline\n",
"You can also [schedule the pipeline](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-schedule-pipelines) to run on a time-based or change-based schedule. This could be used to automatically retrain or forecast models every month or based on another trigger such as data drift."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# from azureml.pipeline.core import Schedule, ScheduleRecurrence\n",
" \n",
"# forecasting_pipeline_id = published_pipeline.id\n",
"\n",
"# recurrence = ScheduleRecurrence(frequency=\"Month\", interval=1, start_time=\"2020-01-01T09:00:00\")\n",
"# recurring_schedule = Schedule.create(ws, name=\"automl_forecasting_recurring_schedule\", \n",
"# description=\"Schedule Forecasting Pipeline to run on the first day of every week\",\n",
"# pipeline_id=forecasting_pipeline_id, \n",
"# experiment_name=experiment.name, \n",
"# recurrence=recurrence)"
]
}
],
"metadata": {
"authors": [
{
"name": "jialiu"
}
],
"categories": [
"how-to-use-azureml",
"automated-machine-learning"
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
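
The "Retrieve results" cell in the notebook above loads a space-delimited, headerless predictions file and assigns column names by position. A minimal sketch of that loading step on an in-memory stand-in is shown here; the two sample rows and their values are hypothetical and only illustrate the assumed layout (one row per prediction, eight space-separated fields).

```python
import io

import pandas as pd

# Hypothetical stand-in for the downloaded predictions file:
# space-delimited, no header row, eight fields per line.
raw = io.StringIO(
    "1992-08-20 2 dominicks 8256 1 1.59 13127.04 8100.5\n"
    "1992-08-27 2 dominicks 6144 0 1.69 10383.36 6300.0\n"
)

# header=None stops pandas from treating the first data row as column names.
df = pd.read_csv(raw, delimiter=" ", header=None)

# Column names are assigned by position, as in the notebook cell.
df.columns = ["Week Starting", "Store", "Brand", "Quantity",
              "Advert", "Price", "Revenue", "Predicted"]

print("Prediction has", df.shape[0], "rows.")
print(df.head(10))
```

Because the names are positional, the list must match the field order in the file exactly; a mismatch would silently mislabel columns rather than raise an error.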

View File

@@ -0,0 +1,4 @@
name: auto-ml-forecasting-many-models
dependencies:
- pip:
- azureml-sdk

Binary file not shown.


Binary file not shown.


Binary file not shown.


Binary file not shown.


View File

@@ -24,20 +24,20 @@
"_**Orange Juice Sales Forecasting**_\n",
"\n",
"## Contents\n",
"1. [Introduction](#Introduction)\n",
"1. [Setup](#Setup)\n",
"1. [Compute](#Compute)\n",
"1. [Data](#Data)\n",
"1. [Train](#Train)\n",
"1. [Predict](#Predict)\n",
"1. [Operationalize](#Operationalize)"
"1. [Introduction](#introduction)\n",
"1. [Setup](#setup)\n",
"1. [Compute](#compute)\n",
"1. [Data](#data)\n",
"1. [Train](#train)\n",
"1. [Forecast](#forecast)\n",
"1. [Operationalize](#operationalize)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"## Introduction<a id=\"introduction\"></a>\n",
"In this example, we use AutoML to train, select, and operationalize a time-series forecasting model for multiple time-series.\n",
"\n",
"Make sure you have executed the [configuration notebook](../../../configuration.ipynb) before running this notebook.\n",
@@ -49,7 +49,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup"
"## Setup<a id=\"setup\"></a>"
]
},
{
@@ -60,7 +60,6 @@
"source": [
"import azureml.core\n",
"import pandas as pd\n",
"import numpy as np\n",
"import logging\n",
"\n",
"from azureml.core.workspace import Workspace\n",
@@ -82,7 +81,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.23.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.35.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
@@ -122,8 +121,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Compute\n",
"## Compute<a id=\"compute\"></a>\n",
"You will need to create a [compute target](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute) for your AutoML run. In this tutorial, you create AmlCompute as your training compute resource.\n",
"\n",
"> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist.\n",
"\n",
"#### Creation of AmlCompute takes approximately 5 minutes. \n",
"If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
@@ -146,7 +148,7 @@
" compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)\n",
" print('Found existing cluster, use it.')\n",
"except ComputeTargetException:\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D12_V2',\n",
" max_nodes=6)\n",
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)\n",
"\n",
@@ -157,7 +159,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data\n",
"## Data<a id=\"data\"></a>\n",
"You are now ready to load the historical orange juice sales data. We will load the CSV file into a plain pandas DataFrame; the time column in the CSV is called _WeekStarting_, so it will be specially parsed into the datetime type."
]
},
@@ -284,7 +286,8 @@
"outputs": [],
"source": [
"from azureml.core.dataset import Dataset\n",
"train_dataset = Dataset.Tabular.from_delimited_files(path=datastore.path('dataset/dominicks_OJ_train.csv'))"
"train_dataset = Dataset.Tabular.from_delimited_files(path=datastore.path('dataset/dominicks_OJ_train.csv'))\n",
"test_dataset = Dataset.Tabular.from_delimited_files(path=datastore.path('dataset/dominicks_OJ_test.csv'))"
]
},
{
@@ -377,7 +380,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train\n",
"## Train<a id=\"train\"></a>\n",
"\n",
"The [AutoMLConfig](https://docs.microsoft.com/en-us/python/api/azureml-train-automl-client/azureml.train.automl.automlconfig.automlconfig?view=azure-ml-py) object defines the settings and data for an AutoML training job. Here, we set necessary inputs like the task type, the number of AutoML iterations to try, the training data, and cross-validation parameters.\n",
"\n",
@@ -423,7 +426,8 @@
"forecasting_parameters = ForecastingParameters(\n",
" time_column_name=time_column_name,\n",
" forecast_horizon=n_test_periods,\n",
" time_series_id_column_names=time_series_id_column_names\n",
" time_series_id_column_names=time_series_id_column_names,\n",
" freq='W-THU' # Set the forecast frequency to be weekly (start on each Thursday)\n",
")\n",
"\n",
"automl_config = AutoMLConfig(task='forecasting',\n",
@@ -455,8 +459,7 @@
"metadata": {},
"outputs": [],
"source": [
"remote_run = experiment.submit(automl_config, show_output=False)\n",
"remote_run"
"remote_run = experiment.submit(automl_config, show_output=False)"
]
},
{
@@ -518,9 +521,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Forecasting\n",
"# Forecast<a id=\"forecast\"></a>\n",
"\n",
"Now that we have retrieved the best pipeline/model, it can be used to make predictions on test data. First, we remove the target values from the test set:"
"Now that we have retrieved the best pipeline/model, it can be used to make predictions on test data. We will do batch scoring on the test dataset which should have the same schema as training dataset.\n",
"\n",
"The inference will run on a remote compute. In this example, it will re-use the training compute."
]
},
{
@@ -529,17 +534,15 @@
"metadata": {},
"outputs": [],
"source": [
"X_test = test\n",
"y_test = X_test.pop(target_column_name).values"
"test_experiment = Experiment(ws, experiment_name + \"_inference\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"cell_type": "markdown",
"metadata": {},
"outputs": [],
"source": [
"X_test.head()"
"### Retreiving forecasts from the model\n",
"We have created a function called `run_forecast` that submits the test data to the best model determined during the training run and retrieves forecasts. This function uses a helper script `forecasting_script` which is uploaded and expecuted on the remote compute."
]
},
{
@@ -555,18 +558,16 @@
"metadata": {},
"outputs": [],
"source": [
"# forecast returns the predictions and the featurized data, aligned to X_test.\n",
"# This contains the assumptions that were made in the forecast\n",
"y_predictions, X_trans = fitted_model.forecast(X_test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you are used to scikit pipelines, perhaps you expected `predict(X_test)`. However, forecasting requires a more general interface that also supplies the past target `y` values. Please use `forecast(X,y)` as `predict(X)` is reserved for internal purposes on forecasting models.\n",
"from run_forecast import run_remote_inference\n",
"remote_run_infer = run_remote_inference(test_experiment=test_experiment, \n",
" compute_target=compute_target,\n",
" train_run=best_run,\n",
" test_dataset=test_dataset,\n",
" target_column_name=target_column_name)\n",
"remote_run_infer.wait_for_completion(show_output=False)\n",
"\n",
"The [forecast function notebook](../forecasting-forecast-function/auto-ml-forecasting-function.ipynb)."
"# download the forecast file to the local machine\n",
"remote_run_infer.download_file('outputs/predictions.csv', 'predictions.csv')"
]
},
{
@@ -586,8 +587,9 @@
"metadata": {},
"outputs": [],
"source": [
"assign_dict = {'predicted': y_predictions, target_column_name: y_test}\n",
"df_all = X_test.assign(**assign_dict)"
"# load forecast data frame\n",
"fcst_df = pd.read_csv('predictions.csv', parse_dates=[time_column_name])\n",
"fcst_df.head()"
]
},
{
@@ -602,8 +604,8 @@
"\n",
"# use automl scoring module\n",
"scores = scoring.score_regression(\n",
" y_test=df_all[target_column_name],\n",
" y_pred=df_all['predicted'],\n",
" y_test=fcst_df[target_column_name],\n",
" y_pred=fcst_df['predicted'],\n",
" metrics=list(constants.Metric.SCALAR_REGRESSION_SET))\n",
"\n",
"print(\"[Test data scores]\\n\")\n",
@@ -612,8 +614,8 @@
" \n",
"# Plot outputs\n",
"%matplotlib inline\n",
"test_pred = plt.scatter(df_all[target_column_name], df_all['predicted'], color='b')\n",
"test_test = plt.scatter(df_all[target_column_name], df_all[target_column_name], color='g')\n",
"test_pred = plt.scatter(fcst_df[target_column_name], fcst_df['predicted'], color='b')\n",
"test_test = plt.scatter(fcst_df[target_column_name], fcst_df[target_column_name], color='g')\n",
"plt.legend((test_pred, test_test), ('prediction', 'truth'), loc='upper left', fontsize=8)\n",
"plt.show()"
]
@@ -622,7 +624,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Operationalize"
"# Operationalize<a id=\"operationalize\"></a>"
]
},
{
@@ -685,8 +687,8 @@
"inference_config = InferenceConfig(environment = best_run.get_environment(), \n",
" entry_script = script_file_name)\n",
"\n",
"aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n",
" memory_gb = 2, \n",
"aciconfig = AciWebservice.deploy_configuration(cpu_cores = 2, \n",
" memory_gb = 4, \n",
" tags = {'type': \"automl-forecasting\"},\n",
" description = \"Automl forecasting sample service\")\n",
"\n",
@@ -720,19 +722,22 @@
"outputs": [],
"source": [
"import json\n",
"X_query = X_test.copy()\n",
"X_query = test.copy()\n",
"X_query.pop(target_column_name)\n",
"# We have to convert datetime to string, because Timestamps cannot be serialized to JSON.\n",
"X_query[time_column_name] = X_query[time_column_name].astype(str)\n",
"# The Service object accept the complex dictionary, which is internally converted to JSON string.\n",
"# The section 'data' contains the data frame in the form of dictionary.\n",
"test_sample = json.dumps({'data': X_query.to_dict(orient='records')})\n",
"sample_quantiles=[0.025,0.975]\n",
"test_sample = json.dumps({'data': X_query.to_dict(orient='records'), 'quantiles': sample_quantiles})\n",
"response = aci_service.run(input_data = test_sample)\n",
"# translate from networkese to datascientese\n",
"try: \n",
" res_dict = json.loads(response)\n",
" y_fcst_all = pd.DataFrame(res_dict['index'])\n",
" y_fcst_all[time_column_name] = pd.to_datetime(y_fcst_all[time_column_name], unit = 'ms')\n",
" y_fcst_all['forecast'] = res_dict['forecast'] \n",
" y_fcst_all['forecast'] = res_dict['forecast']\n",
" y_fcst_all['prediction_interval'] = res_dict['prediction_interval']\n",
"except:\n",
" print(res_dict)"
]
@@ -802,7 +807,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
"version": "3.6.9"
},
"tags": [
"None"

View File

@@ -0,0 +1,4 @@
name: auto-ml-forecasting-orange-juice-sales
dependencies:
- pip:
- azureml-sdk

View File

@@ -0,0 +1,56 @@
"""
This is the script that is executed on the compute instance. It relies
on the model.pkl file which is uploaded along with this script to the
compute instance.
"""
import argparse
from azureml.core import Dataset, Run
from sklearn.externals import joblib
from pandas.tseries.frequencies import to_offset
parser = argparse.ArgumentParser()
parser.add_argument(
'--target_column_name', type=str, dest='target_column_name',
help='Target Column Name')
parser.add_argument(
'--test_dataset', type=str, dest='test_dataset',
help='Test Dataset')
args = parser.parse_args()
target_column_name = args.target_column_name
test_dataset_id = args.test_dataset
run = Run.get_context()
ws = run.experiment.workspace
# get the input dataset by id
test_dataset = Dataset.get_by_id(ws, id=test_dataset_id)
X_test = test_dataset.to_pandas_dataframe().reset_index(drop=True)
y_test = X_test.pop(target_column_name).values
# generate forecast
fitted_model = joblib.load('model.pkl')
# Default quantiles: 0.025 and 0.975 bound a 95% prediction interval, 0.5 is the median
quantiles = [0.025, 0.5, 0.975]
predicted_column_name = 'predicted'
PI = 'prediction_interval'
fitted_model.quantiles = quantiles
pred_quantiles = fitted_model.forecast_quantiles(X_test)
pred_quantiles[PI] = pred_quantiles[[min(quantiles), max(quantiles)]].apply(lambda x: '[{}, {}]'.format(x[0],
x[1]), axis=1)
X_test[target_column_name] = y_test
X_test[PI] = pred_quantiles[PI]
X_test[predicted_column_name] = pred_quantiles[0.5]
# drop rows where prediction or actuals are nan
# happens because of missing actuals
# or at edges of time due to lags/rolling windows
clean = X_test[X_test[[target_column_name,
predicted_column_name]].notnull().all(axis=1)]
file_name = 'outputs/predictions.csv'
export_csv = clean.to_csv(file_name, header=True, index=False) # added Index
# Upload the predictions into artifacts
run.upload_file(name=file_name, path_or_stream=file_name)
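
The interval-building and row-filtering steps of the script above can be tried without a workspace. The following is a minimal sketch, assuming a hypothetical `pred_quantiles` frame with made-up numbers that stands in for the output of `forecast_quantiles`; the column name `y` and all values are illustrative assumptions, not data from the sample.

```python
import numpy as np
import pandas as pd

quantiles = [0.025, 0.5, 0.975]

# Hypothetical quantile forecasts. In the script these come from
# fitted_model.forecast_quantiles(X_test); the numbers here are made up.
pred_quantiles = pd.DataFrame({
    0.025: [90.0, 95.0, 97.0],
    0.5: [100.0, 105.0, 108.0],
    0.975: [110.0, 115.0, 119.0],
})

# Hypothetical actuals; the last one is missing on purpose.
actuals = pd.DataFrame({'y': [101.0, 104.0, np.nan]})

lower, upper = min(quantiles), max(quantiles)

# Format the outer quantiles as an interval string, row by row.
actuals['prediction_interval'] = [
    '[{}, {}]'.format(lo, hi)
    for lo, hi in zip(pred_quantiles[lower], pred_quantiles[upper])
]
actuals['predicted'] = pred_quantiles[0.5]

# Keep only rows where both the actual and the prediction are present.
clean = actuals[actuals[['y', 'predicted']].notnull().all(axis=1)]
print(clean)
```

Building the interval with `zip` over the two columns avoids positional indexing into a row whose labels are floats, which is easy to get wrong.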

View File

@@ -0,0 +1,38 @@
import os
import shutil
from azureml.core import ScriptRunConfig
def run_remote_inference(test_experiment, compute_target, train_run,
                         test_dataset, target_column_name, inference_folder='./forecast'):
    # Create a local directory to copy the model.pkl and forecasting_script.py files into.
    # These files will be uploaded to and executed on the compute instance.
    os.makedirs(inference_folder, exist_ok=True)
    shutil.copy('forecasting_script.py', inference_folder)
    train_run.download_file('outputs/model.pkl',
                            os.path.join(inference_folder, 'model.pkl'))
    inference_env = train_run.get_environment()
    config = ScriptRunConfig(source_directory=inference_folder,
                             script='forecasting_script.py',
                             arguments=['--target_column_name',
                                        target_column_name,
                                        '--test_dataset',
                                        test_dataset.as_named_input(test_dataset.name)],
                             compute_target=compute_target,
                             environment=inference_env)
    run = test_experiment.submit(config,
                                 tags={'training_run_id':
                                       train_run.id,
                                       'run_algorithm':
                                       train_run.properties['run_algorithm'],
                                       'valid_score':
                                       train_run.properties['score'],
                                       'primary_metric':
                                       train_run.properties['primary_metric']})
    run.log("run_algorithm", run.tags['run_algorithm'])
    return run

View File

@@ -0,0 +1,492 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/forecasting-recipes-univariate/1_determine_experiment_settings.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this notebook we will explore the univaraite time-series data to determine the settings for an automated ML experiment. We will follow the thought process depicted in the following diagram:<br/>\n",
"![Forecasting after training](figures/univariate_settings_map_20210408.jpg)\n",
"\n",
"The objective is to answer the following questions:\n",
"\n",
"<ol>\n",
" <li>Is there a seasonal pattern in the data? </li>\n",
" <ul style=\"margin-top:-1px; list-style-type:none\"> \n",
" <li> Importance: If we are able to detect regular seasonal patterns, the forecast accuracy may be improved by extracting these patterns and including them as features into the model. </li>\n",
" </ul>\n",
" <li>Is the data stationary? </li>\n",
" <ul style=\"margin-top:-1px; list-style-type:none\"> \n",
" <li> Importance: In the absense of features that capture trend behavior, ML models (regression and tree based) are not well equiped to predict stochastic trends. Working with stationary data solves this problem. </li>\n",
" </ul>\n",
" <li>Is there a detectable auto-regressive pattern in the stationary data? </li>\n",
" <ul style=\"margin-top:-1px; list-style-type:none\"> \n",
" <li> Importance: The accuracy of ML models can be improved if serial correlation is modeled by including lags of the dependent/target varaible as features. Including target lags in every experiment by default will result in a regression in accuracy scores if such setting is not warranted. </li>\n",
" </ul>\n",
"</ol>\n",
"\n",
"The answers to these questions will help determine the appropriate settings for the automated ML experiment.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import warnings\n",
"import pandas as pd\n",
"\n",
"from statsmodels.graphics.tsaplots import plot_acf, plot_pacf\n",
"import matplotlib.pyplot as plt\n",
"from pandas.plotting import register_matplotlib_converters\n",
"register_matplotlib_converters() # fixes the future warning issue\n",
"\n",
"from helper_functions import unit_root_test_wrapper\n",
"from statsmodels.tools.sm_exceptions import InterpolationWarning\n",
"warnings.simplefilter('ignore', InterpolationWarning)\n",
"\n",
"\n",
"# set printing options\n",
"pd.set_option('display.max_columns', 500)\n",
"pd.set_option('display.width', 1000)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# load data\n",
"main_data_loc = 'data'\n",
"train_file_name = 'S4248SM144SCEN.csv'\n",
"\n",
"TARGET_COLNAME = 'S4248SM144SCEN'\n",
"TIME_COLNAME = 'observation_date'\n",
"COVID_PERIOD_START = '2020-03-01'\n",
"\n",
"df = pd.read_csv(os.path.join(main_data_loc, train_file_name))\n",
"df[TIME_COLNAME] = pd.to_datetime(df[TIME_COLNAME], format='%Y-%m-%d')\n",
"df.sort_values(by=TIME_COLNAME, inplace=True)\n",
"df.set_index(TIME_COLNAME, inplace=True)\n",
"df.head(2)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# plot the entire dataset\n",
"fig, ax = plt.subplots(figsize=(6,2), dpi=180)\n",
"ax.plot(df)\n",
"ax.title.set_text('Original Data Series')\n",
"locs, labels = plt.xticks()\n",
"plt.xticks(rotation=45)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The graph plots the alcohol sales in the United States. Because the data is trending, it can be difficult to see cycles, seasonality or other interestng behaviors due to the scaling issues. For example, if there is a seasonal pattern, which we will discuss later, we cannot see them on the trending data. In such case, it is worth plotting the same data in first differences."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# plot the entire dataset in first differences\n",
"fig, ax = plt.subplots(figsize=(6,2), dpi=180)\n",
"ax.plot(df.diff().dropna())\n",
"ax.title.set_text('Data in first differences')\n",
"locs, labels = plt.xticks()\n",
"plt.xticks(rotation=45)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the previous plot we observe that the data is more volatile towards the end of the series. This period coincides with the Covid-19 period, so we will exclude it from our experiment. Since in this example there are no user-provided features it is hard to make an argument that a model trained on the less volatile pre-covid data will be able to accurately predict the covid period."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 1. Seasonality\n",
"\n",
"#### Questions that need to be answered in this section:\n",
"1. Is there a seasonality?\n",
"2. If it's seasonal, does the data exhibit a trend (up or down)?\n",
"\n",
"It is hard to visually detect seasonality when the data is trending. The reason being is scale of seasonal fluctuations is dwarfed by the range of the trend in the data. One way to deal with this is to de-trend the data by taking the first differences. We will discuss this in more detail in the next section."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# plot the entire dataset in first differences\n",
"fig, ax = plt.subplots(figsize=(6,2), dpi=180)\n",
"ax.plot(df.diff().dropna())\n",
"ax.title.set_text('Data in first differences')\n",
"locs, labels = plt.xticks()\n",
"plt.xticks(rotation=45)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For the next plot, we will exclude the Covid period again. We will also shorten the length of data because plotting a very long time series may prevent us from seeing seasonal patterns, if there are any, because the plot may look like a random walk."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# remove COVID period\n",
"df = df[:COVID_PERIOD_START]\n",
"\n",
"# plot the entire dataset in first differences\n",
"fig, ax = plt.subplots(figsize=(6,2), dpi=180)\n",
"ax.plot(df['2015-01-01':].diff().dropna())\n",
"ax.title.set_text('Data in first differences')\n",
"locs, labels = plt.xticks()\n",
"plt.xticks(rotation=45)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<p style=\"font-size:150%; color:blue\"> Conclusion </p>\n",
"\n",
"Visual examination does not suggest clear seasonal patterns. We will set the STL_TYPE = None, and we will move to the next section that examines stationarity. \n",
"\n",
"\n",
"Say, we are working with a different data set that shows clear patterns of seasonality, we have several options for setting the settings:is hard to say which option will work best in your case, hence you will need to run both options to see which one results in more accurate forecasts. </li>\n",
"<ol>\n",
" <li> If the data does not appear to be trending, set DIFFERENCE_SERIES=False, TARGET_LAGS=None and STL_TYPE = \"season\" </li>\n",
" <li> If the data appears to be trending, consider one of the following two settings:\n",
" <ul>\n",
" <ol type=\"a\">\n",
" <li> DIFFERENCE_SERIES=True, TARGET_LAGS=None and STL_TYPE = \"season\", or </li>\n",
" <li> DIFFERENCE_SERIES=False, TARGET_LAGS=None and STL_TYPE = \"trend_season\" </li>\n",
" </ol>\n",
" <li> In the first case, by taking first differences we are removing stochastic trend, but we do not remove seasonal patterns. In the second case, we do not remove the stochastic trend and it can be captured by the trend component of the STL decomposition. It is hard to say which option will work best in your case, hence you will need to run both options to see which one results in more accurate forecasts. </li>\n",
" </ul>\n",
"</ol>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 2. Stationarity\n",
"If the data does not exhibit seasonal patterns, we would like to see if the data is non-stationary. Particularly, we want to see if there is a clear trending behavior. If such behavior is observed, we would like to first difference the data and examine the plot of an auto-correlation function (ACF) known as correlogram. If the data is seasonal, differencing it will not get rid off the seasonality and this will be shown on the correlogram as well.\n",
"\n",
"<ul>\n",
" <li> Question: What is stationarity and how to we detect it? </li>\n",
" <ul>\n",
" <li> This is a fairly complex topic. Please read the following <a href=\"https://otexts.com/fpp2/stationarity.html\"> link </a> for a high level discussion on this subject. </li>\n",
" <li> Simply put, we are looking for scenario when examining the time series plots the mean of the series is roughly the same, regardless which time interval you pick to compute it. Thus, trending and seasonal data are examples of non-stationary series. </li>\n",
" </ul>\n",
"</ul>\n",
"\n",
"\n",
"<ul>\n",
" <li> Question: Why do want to work with stationary data?</li>\n",
" <ul> \n",
" <li> In the absence of features that capture stochastic trends, the ML models that use (deterministic) time based features (hour of the day, day of the week, month of the year, etc) cannot capture such trends, and will over or under predict depending on the behavior of the time series. By working with stationary data, we eliminate the need to predict such trends, which improves the forecast accuracy. Classical time series models such as Arima and Exponential Smoothing handle non-stationary series by design and do not need such transformations. By differencing the data we are still able to run the same family of models. </li>\n",
" </ul>\n",
"</ul>\n",
"\n",
"#### Questions that need to be answered in this section:\n",
"<ol> \n",
" <li> Is the data stationary? </li>\n",
" <li> Does the stationarized data (either the original or the differenced series) exhibit a clear auto-regressive pattern?</li>\n",
"</ol>\n",
"\n",
"To answer the first question, we run a series of tests (we call them unit root tests)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# unit root tests\n",
"test = unit_root_test_wrapper(df[TARGET_COLNAME])\n",
"print('---------------', '\\n')\n",
"print('Summary table', '\\n', test['summary'], '\\n')\n",
"print('Is the {} series stationary?: {}'.format(TARGET_COLNAME, test['stationary']))\n",
"print('---------------', '\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the previous cell, we ran a series of unit root tests. The summary table contains the following columns:\n",
"<ul> \n",
" <li> test_name is the name of the test.\n",
" <ul> \n",
" <li> ADF: Augmented Dickey-Fuller test </li>\n",
" <li> KPSS: Kwiatkowski-Phillips\u00e2\u20ac\u201cSchmidt\u00e2\u20ac\u201cShin test </li>\n",
" <li> PP: Phillips-Perron test\n",
" <li> ADF GLS: Augmented Dickey-Fuller using generalized least squares method </li>\n",
" <li> AZ: Andrews-Zivot test </li>\n",
" </ul>\n",
" <li> statistic: test statistic </li>\n",
" <li> crit_val: critical value of the test statistic </li>\n",
" <li> p_val: p-value of the test statistic. If the p-val is less than 0.05, the null hypothesis is rejected. </li>\n",
" <li> stationary: is the series stationary based on the test result? </li>\n",
" <li> Null hypothesis: what is being tested. Notice, some test such as ADF and PP assume the process has a unit root and looks for evidence to reject this hypothesis. Other tests, ex.g: KPSS, assumes the process is stationary and looks for evidence to reject such claim.\n",
"</ul>\n",
"\n",
"Each of the tests shows that the original time series is non-stationary. The final decision is based on the majority rule. If, there is a split decision, the algorithm will claim it is stationary. We run a series of tests because each test by itself may not be accurate. In many cases when there are conflicting test results, the user needs to make determination if the series is stationary or not.\n",
"\n",
"Since we found the series to be non-stationary, we will difference it and then test if the differenced series is stationary."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# unit root tests\n",
"test = unit_root_test_wrapper(df[TARGET_COLNAME].diff().dropna())\n",
"print('---------------', '\\n')\n",
"print('Summary table', '\\n', test['summary'], '\\n')\n",
"print('Is the {} series stationary?: {}'.format(TARGET_COLNAME, test['stationary']))\n",
"print('---------------', '\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Four out of five tests show that the series in first differences is stationary. Notice that this decision is not unanimous. Next, let's plot the original series in first-differences to illustrate the difference between non-stationary (unit root) process vs the stationary one."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# plot original and stationary data\n",
"fig = plt.figure(figsize=(10,10))\n",
"ax1 = fig.add_subplot(211)\n",
"ax1.plot(df[TARGET_COLNAME], '-b')\n",
"ax2 = fig.add_subplot(212)\n",
"ax2.plot(df[TARGET_COLNAME].diff().dropna(), '-b')\n",
"ax1.title.set_text('Original data')\n",
"ax2.title.set_text('Data in first differences')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you were asked a question \"What is the mean of the series before and after 2008?\", for the series titled \"Original data\" the mean values will be significantly different. This implies that the first moment of the series (in this case, it is the mean) is time dependent, i.e., mean changes depending on the interval one is looking at. Thus, the series is deemed to be non-stationary. On the other hand, for the series titled \"Data in first differences\" the means for both periods are roughly the same. Hence, the first moment is time invariant; meaning it does not depend on the interval of time one is looking at. In this example it is easy to visually distinguish between stationary and non-stationary data. Often this distinction is not easy to make, therefore we rely on the statistical tests described above to help us make an informed decision. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<p style=\"font-size:150%; color:blue\"> Conclusion </p>\n",
"Since we found the original process to be non-stationary (contains unit root), we will have to model the data in first differences. As a result, we will set the DIFFERENCE_SERIES parameter to True."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 3 Check if there is a clear autoregressive pattern\n",
"We need to determine if we should include lags of the target variable as features in order to improve forecast accuracy. To do this, we will examine the ACF and partial ACF (PACF) plots of the stationary series. In our case, it is a series in first diffrences.\n",
"\n",
"<ul>\n",
" <li> Question: What is an Auto-regressive pattern? What are we looking for? </li>\n",
" <ul style=\"list-style-type:none;\">\n",
" <li> We are looking for a classical profiles for an AR(p) process such as an exponential decay of an ACF and a the first $p$ significant lags of the PACF. For a more detailed explanation of ACF and PACF please refer to the appendix at the end of this notebook. For illustration purposes, let's examine the ACF/PACF profiles of the simulated data that follows a second order auto-regressive process, abbreviated as an AR(2). <li/>\n",
" <li><img src=\"figures/ACF_PACF_for_AR2.png\" class=\"img_class\">\n",
" <br/>\n",
" The lag order is on the x-axis while the auto- and partial-correlation coefficients are on the y-axis. Vertical lines that are outside the shaded area represent statistically significant lags. Notice, the ACF function decays to zero and the PACF shows 2 significant spikes (we ignore the first spike for lag 0 in both plots since the linear relationship of any series with itself is always 1). <li/>\n",
" </ul>\n",
"<ul/>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<ul>\n",
" <li> Question: What do I do if I observe an auto-regressive behavior? </li>\n",
" <ul style=\"list-style-type:none;\">\n",
" <li> If such behavior is observed, we might improve the forecast accuracy by enabling the target lags feature in AutoML. There are a few options of doing this </li>\n",
" <ol>\n",
" <li> Set the target lags parameter to 'auto', or </li>\n",
" <li> Specify the list of lags you want to include. Ex.g: target_lags = [1,2,5] </li>\n",
" </ol>\n",
" </ul>\n",
" <br/>\n",
" <li> Next, let's examine the ACF and PACF plots of the stationary target variable (depicted below). Here, we do not see a decay in the ACF, instead we see a decay in PACF. It is hard to make an argument the the target variable exhibits auto-regressive behavior. </li>\n",
" </ul>"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Plot the ACF/PACF for the series in differences\n",
"fig, ax = plt.subplots(1,2,figsize=(10,5))\n",
"plot_acf(df[TARGET_COLNAME].diff().dropna().values.squeeze(), ax=ax[0])\n",
"plot_pacf(df[TARGET_COLNAME].diff().dropna().values.squeeze(), ax=ax[1])\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<p style=\"font-size:150%; color:blue\"> Conclusion </p>\n",
"Since we do not see a clear indication of an AR(p) process, we will not be using target lags and will set the TARGET_LAGS parameter to None."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<p style=\"font-size:150%; color:blue; font-weight: bold\"> AutoML Experiment Settings </p>\n",
"Based on the analysis performed, we should try the following settings for the AutoML experiment and use them in the \"2_run_experiment\" notebook.\n",
"<ul>\n",
" <li> STL_TYPE=None </li>\n",
" <li> DIFFERENCE_SERIES=True </li>\n",
" <li> TARGET_LAGS=None </li>\n",
"</ul>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Appendix: ACF, PACF and Lag Selection\n",
"To do this, we will examine the ACF and partial ACF (PACF) plots of the differenced series. \n",
"\n",
"<ul>\n",
" <li> Question: What is the ACF? </li>\n",
" <ul style=\"list-style-type:none;\">\n",
" <li> To understand the ACF, first let's look at the correlation coefficient $\\rho_{xz}$\n",
" \\begin{equation}\n",
" \\rho_{xz} = \\frac{\\sigma_{xz}}{\\sigma_{x} \\sigma_{zy}}\n",
" \\end{equation}\n",
" </li>\n",
" where $\\sigma_{xzy}$ is the covariance between two random variables $X$ and $Z$; $\\sigma_x$ and $\\sigma_z$ is the variance for $X$ and $Z$, respectively. The correlation coefficient measures the strength of linear relationship between two random variables. This metric can take any value from -1 to 1. <li/>\n",
" <br/>\n",
" <li> The auto-correlation coefficient $\\rho_{Y_{t} Y_{t-k}}$ is the time series equivalent of the correlation coefficient, except instead of measuring linear association between two random variables $X$ and $Z$, it measures the strength of a linear relationship between a random variable $Y_t$ and its lag $Y_{t-k}$ for any positive interger value of $k$. </li> \n",
" <br />\n",
" <li> To visualize the ACF for a particular lag, say lag 2, plot the second lag of a series $y_{t-2}$ on the x-axis, and plot the series itself $y_t$ on the y-axis. The autocorrelation coefficient is the slope of the best fitted regression line and can be interpreted as follows. A one unit increase in the lag of a variable one period ago leads to a $\\rho_{Y_{t} Y_{t-2}}$ units change in the variable in the current period. This interpreation can be applied to any lag. </li> \n",
" <br />\n",
" <li> In the interpretation posted above we need to be careful not to confuse the word \"leads\" with \"causes\" since these are not the same thing. We do not know the lagged value of the varaible causes it to change. Afterall, there are probably many other features that may explain the movement in $Y_t$. All we are trying to do in this section is to identify situations when the variable contains the strong auto-regressive components that needs to be included in the model to improve forecast accuracy. </li>\n",
" </ul>\n",
"</ul>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<ul>\n",
" <li> Question: What is the PACF? </li>\n",
" <ul style=\"list-style-type:none;\">\n",
" <li> When describing the ACF we essentially running a regression between a partigular lag of a series, say, lag 4, and the series itself. What this implies is the regression coefficient for lag 4 captures the impact of everything that happens in lags 1, 2 and 3. In other words, if lag 1 is the most important lag and we exclude it from the regression, naturally, the regression model will assign the importance of the 1st lag to the 4th one. Partial auto-correlation function fixes this problem since it measures the contribution of each lag accounting for the information added by the intermediary lags. If we were to illustrate ACF and PACF for the fourth lag using the regression analogy, the difference is a follows: \n",
" \\begin{align}\n",
" Y_{t} &= a_{0} + a_{4} Y_{t-4} + e_{t} \\\\\n",
" Y_{t} &= b_{0} + b_{1} Y_{t-1} + b_{2} Y_{t-2} + b_{3} Y_{t-3} + b_{4} Y_{t-4} + \\varepsilon_{t} \\\\\n",
" \\end{align}\n",
" </li>\n",
" <br/>\n",
" <li>\n",
" Here, you can think of $a_4$ and $b_{4}$ as the auto- and partial auto-correlation coefficients for lag 4. Notice, in the second equation we explicitely accounting for the intermediate lags by adding them as regrerssors.\n",
" </li>\n",
" </ul>\n",
"</ul>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<ul>\n",
" <li> Question: Auto-regressive pattern? What are we looking for? </li>\n",
" <ul style=\"list-style-type:none;\">\n",
" <li> We are looking for a classical profiles for an AR(p) process such as an exponential decay of an ACF and a the first $p$ significant lags of the PACF. Let's examine the ACF/PACF profiles of the same simulated AR(2) shown in Section 3, and check if the ACF/PACF explanation are refelcted in these plots. <li/>\n",
" <li><img src=\"figures/ACF_PACF_for_AR2.png\" class=\"img_class\">\n",
" <li> The autocorrelation coefficient for the 3rd lag is 0.6, which can be interpreted that a one unit increase in the value of the target varaible three periods ago leads to 0.6 units increase in the current period. However, the PACF plot shows that the partial autocorrealtion coefficient is zero (from a statistical point of view since it lies within the shaded region). This is happening because the 1st and 2nd lags are good predictors of the target variable. Ommiting these two lags from the regression results in the misleading conclusion that the third lag is a good prediciton. <li/>\n",
" <br/>\n",
" <li> This is why it is important to examine both the ACF and the PACF plots when tring to determine the auto regressive order for the variable in question. <li/>\n",
" </ul>\n",
"</ul> "
]
}
],
"metadata": {
"authors": [
{
"name": "vlbejan"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.9"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
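
The regression analogy in the notebook's appendix (lag 4 alone versus lags 1 through 4 together) can be checked numerically. The following is a minimal sketch using only NumPy, assuming a simulated AR(2) with illustrative coefficients 0.5 and 0.3; these are not necessarily the coefficients behind `figures/ACF_PACF_for_AR2.png`, and the helper names `acf_coef` and `pacf_coef` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a stationary AR(2): y_t = 0.5*y_{t-1} + 0.3*y_{t-2} + e_t.
# The coefficients are chosen for illustration only.
n = 5000
y = np.zeros(n)
e = rng.standard_normal(n)
for t in range(2, n):
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + e[t]
y = y[500:]  # drop the burn-in period


def acf_coef(series, k):
    """Slope from regressing y_t on y_{t-k} alone (with an intercept)."""
    X = np.column_stack([np.ones(len(series) - k), series[:-k]])
    beta, *_ = np.linalg.lstsq(X, series[k:], rcond=None)
    return beta[1]


def pacf_coef(series, k):
    """Coefficient on y_{t-k} when lags 1..k are all in the regression."""
    target = series[k:]
    # Lag j of the series, aligned with target: indices k-j .. len-j-1.
    lags = [series[k - j:len(series) - j] for j in range(1, k + 1)]
    X = np.column_stack([np.ones(len(target))] + lags)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return beta[-1]


for k in range(1, 5):
    print('lag {}: ACF-style slope = {:.3f}, PACF-style coefficient = {:.3f}'.format(
        k, acf_coef(y, k), pacf_coef(y, k)))
```

Under these assumptions the single-lag slope at lag 3 stays well away from zero, while the lag-3 coefficient shrinks toward zero once lags 1 and 2 are included, which is the pattern the appendix describes.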

View File

@@ -0,0 +1,4 @@
name: auto-ml-forecasting-univariate-recipe-experiment-settings
dependencies:
- pip:
- azureml-sdk

View File

@@ -0,0 +1,560 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/forecasting-recipes-univariate/2_run_experiment.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Running AutoML experiments\n",
"\n",
"See the `auto-ml-forecasting-univariate-recipe-experiment-settings` notebook on how to determine settings for seasonal features, target lags and whether the series needs to be differenced or not. To make experimentation user-friendly, the user has to specify several parameters: DIFFERENCE_SERIES, TARGET_LAGS and STL_TYPE. Once these parameters are set, the notebook will generate correct transformations and settings to run experiments, generate forecasts, compute inference set metrics and plot forecast vs actuals. It will also convert the forecast from first differences to levels (original units of measurement) if the DIFFERENCE_SERIES parameter is set to True before calculating inference set metrics.\n",
"\n",
"<br/>\n",
"\n",
"The output generated by this notebook is saved in the `experiment_output`folder."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Setup"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import logging\n",
"import pandas as pd\n",
"import numpy as np\n",
"\n",
"import azureml.automl.runtime\n",
"from azureml.core.compute import AmlCompute\n",
"from azureml.core.compute import ComputeTarget\n",
"import matplotlib.pyplot as plt\n",
"from helper_functions import (ts_train_test_split, compute_metrics)\n",
"\n",
"import azureml.core\n",
"from azureml.core.workspace import Workspace\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.train.automl import AutoMLConfig\n",
"\n",
"\n",
"# set printing options\n",
"np.set_printoptions(precision=4, suppress=True, linewidth=100)\n",
"pd.set_option('display.max_columns', 500)\n",
"pd.set_option('display.width', 1000)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As part of the setup you have already created a **Workspace**. You will also need to create a [compute target](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute) for your AutoML run. In this tutorial, you create AmlCompute as your training compute resource.\n",
"> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"amlcompute_cluster_name = \"recipe-cluster\"\n",
" \n",
"found = False\n",
"# Check if this compute target already exists in the workspace.\n",
"cts = ws.compute_targets\n",
"if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n",
" found = True\n",
" print('Found existing compute target.')\n",
" compute_target = cts[amlcompute_cluster_name]\n",
"\n",
"if not found:\n",
" print('Creating a new compute target...')\n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\",\n",
" max_nodes = 6)\n",
"\n",
" # Create the cluster.\\n\",\n",
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
"\n",
"print('Checking cluster status...')\n",
"# Can poll for a minimum number of nodes and for a specific timeout.\n",
"# If no min_node_count is provided, it will use the scale settings for the cluster.\n",
"compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Data\n",
"\n",
"Here, we will load the data from the csv file and drop the Covid period."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"main_data_loc = 'data'\n",
"train_file_name = 'S4248SM144SCEN.csv'\n",
"\n",
"TARGET_COLNAME = \"S4248SM144SCEN\"\n",
"TIME_COLNAME = \"observation_date\"\n",
"COVID_PERIOD_START = '2020-03-01' # start of the covid period. To be excluded from evaluation.\n",
"\n",
"# load data\n",
"df = pd.read_csv(os.path.join(main_data_loc, train_file_name))\n",
"df[TIME_COLNAME] = pd.to_datetime(df[TIME_COLNAME], format='%Y-%m-%d')\n",
"df.sort_values(by=TIME_COLNAME, inplace=True)\n",
"\n",
"# remove the Covid period\n",
"df = df.query('{} <= \"{}\"'.format(TIME_COLNAME, COVID_PERIOD_START))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set parameters\n",
"\n",
"The first set of parameters is based on the analysis performed in the `auto-ml-forecasting-univariate-recipe-experiment-settings` notebook. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# set parameters based on the settings notebook analysis\n",
"DIFFERENCE_SERIES = True\n",
"TARGET_LAGS = None\n",
"STL_TYPE = None"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, define additional parameters to be used in the <a href=\"https://docs.microsoft.com/en-us/python/api/azureml-train-automl-client/azureml.train.automl.automlconfig?view=azure-ml-py\"> AutoML config </a> class.\n",
"\n",
"<ul> \n",
" <li> FORECAST_HORIZON: The forecast horizon is the number of periods into the future that the model should predict. Here, we set the horizon to 12 periods (i.e. 12 quarters). For more discussion of forecast horizons and guiding principles for setting them, please see the <a href=\"https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand\"> energy demand notebook </a>. \n",
" </li>\n",
" <li> TIME_SERIES_ID_COLNAMES: The names of columns used to group a timeseries. It can be used to create multiple series. If time series identifier is not defined, the data set is assumed to be one time-series. This parameter is used with task type forecasting. Since we are working with a single series, this list is empty.\n",
" </li>\n",
" <li> BLOCKED_MODELS: Optional list of models to be blocked from consideration during model selection stage. At this point we want to consider all ML and Time Series models.\n",
" <ul>\n",
" <li> See the following <a href=\"https://docs.microsoft.com/en-us/python/api/azureml-train-automl-client/azureml.train.automl.constants.supportedmodels.forecasting?view=azure-ml-py\"> link </a> for a list of supported Forecasting models</li>\n",
" </ul>\n",
" </li>\n",
"</ul>\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# set other parameters\n",
"FORECAST_HORIZON = 12\n",
"TIME_SERIES_ID_COLNAMES = []\n",
"BLOCKED_MODELS = []"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To run AutoML, you also need to create an **Experiment**. An Experiment corresponds to a prediction problem you are trying to solve, while a Run corresponds to a specific approach to the problem."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# choose a name for the run history container in the workspace\n",
"if isinstance(TARGET_LAGS, list):\n",
" TARGET_LAGS_STR = '-'.join(map(str, TARGET_LAGS)) if (len(TARGET_LAGS) > 0) else None\n",
"else:\n",
" TARGET_LAGS_STR = TARGET_LAGS\n",
"\n",
"experiment_desc = 'diff-{}_lags-{}_STL-{}'.format(DIFFERENCE_SERIES, TARGET_LAGS_STR, STL_TYPE)\n",
"experiment_name = 'alcohol_{}'.format(experiment_desc)\n",
"experiment = Experiment(ws, experiment_name)\n",
"\n",
"output = {}\n",
"output['SDK version'] = azureml.core.VERSION\n",
"output['Subscription ID'] = ws.subscription_id\n",
"output['Workspace'] = ws.name\n",
"output['SKU'] = ws.sku\n",
"output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n",
"output['Run History Name'] = experiment_name\n",
"pd.set_option('display.max_colwidth', -1)\n",
"outputDf = pd.DataFrame(data = output, index = [''])\n",
"print(outputDf.T)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# create output directory\n",
"output_dir = 'experiment_output/{}'.format(experiment_desc)\n",
"if not os.path.exists(output_dir):\n",
" os.makedirs(output_dir) "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# difference data and test for unit root\n",
"if DIFFERENCE_SERIES:\n",
" df_delta = df.copy()\n",
" df_delta[TARGET_COLNAME] = df[TARGET_COLNAME].diff()\n",
" df_delta.dropna(axis=0, inplace=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# split the data into train and test set\n",
"if DIFFERENCE_SERIES: \n",
" # generate train/inference sets using data in first differences\n",
" df_train, df_test = ts_train_test_split(df_input=df_delta,\n",
" n=FORECAST_HORIZON,\n",
" time_colname=TIME_COLNAME,\n",
" ts_id_colnames=TIME_SERIES_ID_COLNAMES)\n",
"else:\n",
" df_train, df_test = ts_train_test_split(df_input=df,\n",
" n=FORECAST_HORIZON,\n",
" time_colname=TIME_COLNAME,\n",
" ts_id_colnames=TIME_SERIES_ID_COLNAMES)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Upload files to the Datastore\n",
"The [Machine Learning service workspace](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-workspace) is paired with the storage account, which contains the default data store. We will use it to upload the bike share data and create [tabular dataset](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py) for training. A tabular dataset defines a series of lazily-evaluated, immutable operations to load data from the data source into tabular representation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df_train.to_csv(\"train.csv\", index=False)\n",
"df_test.to_csv(\"test.csv\", index=False)\n",
"\n",
"datastore = ws.get_default_datastore()\n",
"datastore.upload_files(files = ['./train.csv'], target_path = 'uni-recipe-dataset/tabular/', overwrite = True,show_progress = True)\n",
"datastore.upload_files(files = ['./test.csv'], target_path = 'uni-recipe-dataset/tabular/', overwrite = True,show_progress = True)\n",
"\n",
"from azureml.core import Dataset\n",
"train_dataset = Dataset.Tabular.from_delimited_files(path = [(datastore, 'uni-recipe-dataset/tabular/train.csv')])\n",
"test_dataset = Dataset.Tabular.from_delimited_files(path = [(datastore, 'uni-recipe-dataset/tabular/test.csv')])\n",
"\n",
"# print the first 5 rows of the Dataset\n",
"train_dataset.to_pandas_dataframe().reset_index(drop=True).head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Config AutoML"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"time_series_settings = {\n",
" 'time_column_name': TIME_COLNAME,\n",
" 'forecast_horizon': FORECAST_HORIZON,\n",
" 'target_lags': TARGET_LAGS,\n",
" 'use_stl': STL_TYPE,\n",
" 'blocked_models': BLOCKED_MODELS,\n",
" 'time_series_id_column_names': TIME_SERIES_ID_COLNAMES\n",
"}\n",
"\n",
"automl_config = AutoMLConfig(task='forecasting',\n",
" debug_log='sample_experiment.log',\n",
" primary_metric='normalized_root_mean_squared_error',\n",
" experiment_timeout_minutes=20,\n",
" iteration_timeout_minutes=5,\n",
" enable_early_stopping=True,\n",
" training_data=train_dataset,\n",
" label_column_name=TARGET_COLNAME,\n",
" n_cross_validations=5,\n",
" verbosity=logging.INFO,\n",
" max_cores_per_iteration=-1,\n",
" compute_target=compute_target,\n",
" **time_series_settings)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will now run the experiment, you can go to Azure ML portal to view the run details."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"remote_run = experiment.submit(automl_config, show_output=False)\n",
"remote_run.wait_for_completion()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Retrieve the best model\n",
"Below we select the best model from all the training iterations using get_output method."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"best_run, fitted_model = remote_run.get_output()\n",
"fitted_model.steps"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Inference\n",
"\n",
"We now use the best fitted model from the AutoML Run to make forecasts for the test set. We will do batch scoring on the test dataset which should have the same schema as training dataset.\n",
"\n",
"The inference will run on a remote compute. In this example, it will re-use the training compute."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test_experiment = Experiment(ws, experiment_name + \"_inference\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Retreiving forecasts from the model\n",
"We have created a function called `run_forecast` that submits the test data to the best model determined during the training run and retrieves forecasts. This function uses a helper script `forecasting_script` which is uploaded and expecuted on the remote compute."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from run_forecast import run_remote_inference\n",
"remote_run = run_remote_inference(test_experiment=test_experiment, \n",
" compute_target=compute_target,\n",
" train_run=best_run,\n",
" test_dataset=test_dataset,\n",
" target_column_name=TARGET_COLNAME)\n",
"remote_run.wait_for_completion(show_output=False)\n",
"\n",
"remote_run.download_file('outputs/predictions.csv', f'{output_dir}/predictions.csv')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Download the prediction result for metrics calcuation\n",
"The test data with predictions are saved in artifact `outputs/predictions.csv`. We will use it to calculate accuracy metrics and vizualize predictions versus actuals."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X_trans = pd.read_csv(f'{output_dir}/predictions.csv', parse_dates=[TIME_COLNAME])\n",
"X_trans.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# convert forecast in differences to levels\n",
"def convert_fcst_diff_to_levels(fcst, yt, df_orig):\n",
" \"\"\" Convert forecast from first differences to levels. \"\"\"\n",
" fcst = fcst.reset_index(drop=False, inplace=False)\n",
" fcst['predicted_level'] = fcst['predicted'].cumsum()\n",
" fcst['predicted_level'] = fcst['predicted_level'].astype(float) + float(yt)\n",
" # merge actuals\n",
" out = pd.merge(fcst,\n",
" df_orig[[TIME_COLNAME, TARGET_COLNAME]], \n",
" on=[TIME_COLNAME], how='inner')\n",
" out.rename(columns={TARGET_COLNAME: 'actual_level'}, inplace=True)\n",
" return out"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if DIFFERENCE_SERIES: \n",
" # convert forecast in differences to the levels\n",
" INFORMATION_SET_DATE = max(df_train[TIME_COLNAME])\n",
" YT = df.query('{} == @INFORMATION_SET_DATE'.format(TIME_COLNAME))[TARGET_COLNAME]\n",
"\n",
" fcst_df = convert_fcst_diff_to_levels(fcst=X_trans, yt=YT, df_orig=df)\n",
"else:\n",
" fcst_df = X_trans.copy()\n",
" fcst_df['actual_level'] = y_test\n",
" fcst_df['predicted_level'] = y_predictions\n",
"\n",
"del X_trans"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Calculate metrics and save output"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# compute metrics\n",
"metrics_df = compute_metrics(fcst_df=fcst_df,\n",
" metric_name=None,\n",
" ts_id_colnames=None)\n",
"# save output\n",
"metrics_file_name = '{}_metrics.csv'.format(experiment_name)\n",
"fcst_file_name = '{}_forecst.csv'.format(experiment_name)\n",
"plot_file_name = '{}_plot.pdf'.format(experiment_name)\n",
"\n",
"metrics_df.to_csv(os.path.join(output_dir, metrics_file_name), index=True)\n",
"fcst_df.to_csv(os.path.join(output_dir, fcst_file_name), index=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Generate and save visuals"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plot_df = df.query('{} > \"2010-01-01\"'.format(TIME_COLNAME))\n",
"plot_df.set_index(TIME_COLNAME, inplace=True)\n",
"fcst_df.set_index(TIME_COLNAME, inplace=True)\n",
"\n",
"# generate and save plots\n",
"fig, ax = plt.subplots(dpi=180)\n",
"ax.plot(plot_df[TARGET_COLNAME], '-g', label='Historical')\n",
"ax.plot(fcst_df['actual_level'], '-b', label='Actual')\n",
"ax.plot(fcst_df['predicted_level'], '-r', label='Forecast')\n",
"ax.legend()\n",
"ax.set_title(\"Forecast vs Actuals\")\n",
"ax.set_xlabel(TIME_COLNAME)\n",
"ax.set_ylabel(TARGET_COLNAME)\n",
"locs, labels = plt.xticks()\n",
"\n",
"plt.setp(labels, rotation=45)\n",
"plt.savefig(os.path.join(output_dir, plot_file_name))"
]
}
],
"metadata": {
"authors": [
{
"name": "vlbejan"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.9"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
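
As a sanity check on the differencing logic in the notebook above, here is a minimal, self-contained sketch (not part of the notebook) of the round trip from levels to first differences and back. The toy values are the first six observations of the sales series; the variable names and the three-period holdout are assumptions chosen for illustration.

```python
import pandas as pd

# Toy series: first six monthly observations of the sales data.
levels = pd.Series([4302.0, 4323.0, 4199.0, 4397.0, 4159.0, 4091.0])

# Difference the series and hold out the last `horizon` differences,
# mirroring what the notebook does when DIFFERENCE_SERIES is True.
horizon = 3  # hypothetical; the notebook uses FORECAST_HORIZON = 12
diffs = levels.diff().dropna()
train_diffs, test_diffs = diffs.iloc[:-horizon], diffs.iloc[-horizon:]

# Last observed level inside the training window (the "information set").
y_t = levels.loc[train_diffs.index[-1]]

# Cumulating the (here: perfectly forecast) differences and adding y_t
# converts them back to levels.
recovered = test_diffs.cumsum() + y_t
print(recovered)
```

With a perfect forecast of the differences, `recovered` coincides with the held-out levels. With a real forecast, any error in an early difference carries forward into every later level because of the cumulative sum.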

@@ -0,0 +1,4 @@
name: auto-ml-forecasting-univariate-recipe-run-experiment
dependencies:
- pip:
  - azureml-sdk

@@ -0,0 +1,350 @@
observation_date,S4248SM144SCEN
1992-01-01,4302
1992-02-01,4323
1992-03-01,4199
1992-04-01,4397
1992-05-01,4159
1992-06-01,4091
1992-07-01,4109
1992-08-01,4116
1992-09-01,4093
1992-10-01,4095
1992-11-01,4169
1992-12-01,4169
1993-01-01,4124
1993-02-01,4107
1993-03-01,4168
1993-04-01,4254
1993-05-01,4290
1993-06-01,4163
1993-07-01,4274
1993-08-01,4253
1993-09-01,4312
1993-10-01,4296
1993-11-01,4221
1993-12-01,4233
1994-01-01,4218
1994-02-01,4237
1994-03-01,4343
1994-04-01,4357
1994-05-01,4264
1994-06-01,4392
1994-07-01,4381
1994-08-01,4290
1994-09-01,4348
1994-10-01,4357
1994-11-01,4417
1994-12-01,4411
1995-01-01,4417
1995-02-01,4339
1995-03-01,4256
1995-04-01,4276
1995-05-01,4290
1995-06-01,4413
1995-07-01,4305
1995-08-01,4476
1995-09-01,4393
1995-10-01,4447
1995-11-01,4492
1995-12-01,4489
1996-01-01,4635
1996-02-01,4697
1996-03-01,4588
1996-04-01,4633
1996-05-01,4685
1996-06-01,4672
1996-07-01,4666
1996-08-01,4726
1996-09-01,4571
1996-10-01,4624
1996-11-01,4691
1996-12-01,4604
1997-01-01,4657
1997-02-01,4711
1997-03-01,4810
1997-04-01,4626
1997-05-01,4860
1997-06-01,4757
1997-07-01,4916
1997-08-01,4921
1997-09-01,4985
1997-10-01,4905
1997-11-01,4880
1997-12-01,5165
1998-01-01,4885
1998-02-01,4925
1998-03-01,5049
1998-04-01,5090
1998-05-01,5094
1998-06-01,4929
1998-07-01,5132
1998-08-01,5061
1998-09-01,5471
1998-10-01,5327
1998-11-01,5257
1998-12-01,5354
1999-01-01,5427
1999-02-01,5415
1999-03-01,5387
1999-04-01,5483
1999-05-01,5510
1999-06-01,5539
1999-07-01,5532
1999-08-01,5625
1999-09-01,5799
1999-10-01,5843
1999-11-01,5836
1999-12-01,5724
2000-01-01,5757
2000-02-01,5731
2000-03-01,5839
2000-04-01,5825
2000-05-01,5877
2000-06-01,5979
2000-07-01,5828
2000-08-01,6016
2000-09-01,6113
2000-10-01,6150
2000-11-01,6111
2000-12-01,6088
2001-01-01,6360
2001-02-01,6300
2001-03-01,5935
2001-04-01,6204
2001-05-01,6164
2001-06-01,6231
2001-07-01,6336
2001-08-01,6179
2001-09-01,6120
2001-10-01,6134
2001-11-01,6381
2001-12-01,6521
2002-01-01,6333
2002-02-01,6541
2002-03-01,6692
2002-04-01,6591
2002-05-01,6554
2002-06-01,6596
2002-07-01,6620
2002-08-01,6577
2002-09-01,6625
2002-10-01,6441
2002-11-01,6584
2002-12-01,6923
2003-01-01,6600
2003-02-01,6742
2003-03-01,6831
2003-04-01,6782
2003-05-01,6714
2003-06-01,6736
2003-07-01,7146
2003-08-01,7027
2003-09-01,6896
2003-10-01,7107
2003-11-01,6997
2003-12-01,7075
2004-01-01,7235
2004-02-01,7072
2004-03-01,6968
2004-04-01,7144
2004-05-01,7232
2004-06-01,7095
2004-07-01,7181
2004-08-01,7146
2004-09-01,7230
2004-10-01,7327
2004-11-01,7328
2004-12-01,7425
2005-01-01,7520
2005-02-01,7551
2005-03-01,7572
2005-04-01,7701
2005-05-01,7819
2005-06-01,7770
2005-07-01,7627
2005-08-01,7816
2005-09-01,7718
2005-10-01,7772
2005-11-01,7788
2005-12-01,7576
2006-01-01,7940
2006-02-01,8027
2006-03-01,7884
2006-04-01,8043
2006-05-01,7995
2006-06-01,8218
2006-07-01,8159
2006-08-01,8331
2006-09-01,8320
2006-10-01,8397
2006-11-01,8603
2006-12-01,8515
2007-01-01,8336
2007-02-01,8233
2007-03-01,8475
2007-04-01,8310
2007-05-01,8583
2007-06-01,8645
2007-07-01,8713
2007-08-01,8636
2007-09-01,8791
2007-10-01,8984
2007-11-01,8867
2007-12-01,9059
2008-01-01,8911
2008-02-01,8701
2008-03-01,8956
2008-04-01,9095
2008-05-01,9102
2008-06-01,9170
2008-07-01,9194
2008-08-01,9164
2008-09-01,9337
2008-10-01,9186
2008-11-01,9029
2008-12-01,9025
2009-01-01,9486
2009-02-01,9219
2009-03-01,9059
2009-04-01,9171
2009-05-01,9114
2009-06-01,8926
2009-07-01,9150
2009-08-01,9105
2009-09-01,9011
2009-10-01,8743
2009-11-01,8958
2009-12-01,8969
2010-01-01,8984
2010-02-01,9068
2010-03-01,9335
2010-04-01,9481
2010-05-01,9132
2010-06-01,9192
2010-07-01,9123
2010-08-01,9091
2010-09-01,9155
2010-10-01,9556
2010-11-01,9477
2010-12-01,9436
2011-01-01,9519
2011-02-01,9667
2011-03-01,9668
2011-04-01,9628
2011-05-01,9376
2011-06-01,9830
2011-07-01,9626
2011-08-01,9802
2011-09-01,9858
2011-10-01,9838
2011-11-01,9846
2011-12-01,9789
2012-01-01,9955
2012-02-01,9909
2012-03-01,9897
2012-04-01,9909
2012-05-01,10127
2012-06-01,10175
2012-07-01,10129
2012-08-01,10251
2012-09-01,10227
2012-10-01,10174
2012-11-01,10402
2012-12-01,10664
2013-01-01,10585
2013-02-01,10661
2013-03-01,10649
2013-04-01,10676
2013-05-01,10863
2013-06-01,10690
2013-07-01,11007
2013-08-01,10835
2013-09-01,10900
2013-10-01,10749
2013-11-01,10946
2013-12-01,10864
2014-01-01,10726
2014-02-01,10821
2014-03-01,10789
2014-04-01,10892
2014-05-01,10892
2014-06-01,10789
2014-07-01,10662
2014-08-01,10767
2014-09-01,10779
2014-10-01,10922
2014-11-01,10662
2014-12-01,10808
2015-01-01,10865
2015-02-01,10740
2015-03-01,10917
2015-04-01,10933
2015-05-01,11074
2015-06-01,11108
2015-07-01,11493
2015-08-01,11386
2015-09-01,11502
2015-10-01,11487
2015-11-01,11375
2015-12-01,11445
2016-01-01,11787
2016-02-01,11792
2016-03-01,11649
2016-04-01,11810
2016-05-01,11496
2016-06-01,11600
2016-07-01,11503
2016-08-01,11715
2016-09-01,11732
2016-10-01,11885
2016-11-01,12092
2016-12-01,11857
2017-01-01,11881
2017-02-01,12355
2017-03-01,12027
2017-04-01,12183
2017-05-01,12170
2017-06-01,12387
2017-07-01,12041
2017-08-01,12139
2017-09-01,11861
2017-10-01,12202
2017-11-01,12178
2017-12-01,12126
2018-01-01,11942
2018-02-01,12206
2018-03-01,12362
2018-04-01,12287
2018-05-01,12497
2018-06-01,12621
2018-07-01,12729
2018-08-01,12689
2018-09-01,12874
2018-10-01,12776
2018-11-01,12995
2018-12-01,13291
2019-01-01,13364
2019-02-01,13135
2019-03-01,13123
2019-04-01,13110
2019-05-01,13152
2019-06-01,13201
2019-07-01,13354
2019-08-01,13427
2019-09-01,13472
2019-10-01,13436
2019-11-01,13430
2019-12-01,13588
2020-01-01,13533
2020-02-01,13575
2020-03-01,13867
2020-04-01,12319
2020-05-01,13909
2020-06-01,13982
2020-07-01,15384
2020-08-01,15701
2020-09-01,15006
2020-10-01,15741
2020-11-01,14934
2020-12-01,13061
2021-01-01,15743
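
To illustrate how the last observations of this monthly series can be held out for testing, the following minimal sketch loads a few rows in the same CSV layout and splits off the final `n` rows. It is a simplified, single-series stand-in for the repository's `ts_train_test_split` helper, not its implementation; the inline sample and `n = 2` are assumptions for illustration.

```python
import io

import pandas as pd

# Inline sample in the same layout as the CSV above (first six rows only).
csv_text = """observation_date,S4248SM144SCEN
1992-01-01,4302
1992-02-01,4323
1992-03-01,4199
1992-04-01,4397
1992-05-01,4159
1992-06-01,4091
"""

TIME_COLNAME = 'observation_date'
TARGET_COLNAME = 'S4248SM144SCEN'
df = pd.read_csv(io.StringIO(csv_text), parse_dates=[TIME_COLNAME])

# Sort by time, then hold out the last n observations as the test set.
n = 2  # hypothetical; the notebook uses FORECAST_HORIZON = 12 on the full series
df = df.sort_values(TIME_COLNAME).reset_index(drop=True)
df_train, df_test = df.iloc[:-n], df.iloc[-n:]
print(df_train.shape, df_test.shape)
```

Splitting by position after sorting keeps the test set strictly later in time than the training set, which is what a forecasting evaluation needs.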

Binary file not shown.


@@ -0,0 +1,57 @@
"""
This is the script that is executed on the compute instance. It relies
on the model.pkl file which is uploaded along with this script to the
compute instance.
"""
import argparse
from azureml.core import Dataset, Run
from azureml.automl.core.shared.constants import TimeSeriesInternal
import joblib  # sklearn.externals.joblib is deprecated; use the standalone package
parser = argparse.ArgumentParser()
parser.add_argument(
    '--target_column_name', type=str, dest='target_column_name',
    help='Target Column Name')
parser.add_argument(
    '--test_dataset', type=str, dest='test_dataset',
    help='Test Dataset')
args = parser.parse_args()
target_column_name = args.target_column_name
test_dataset_id = args.test_dataset
run = Run.get_context()
ws = run.experiment.workspace
# get the input dataset by id
test_dataset = Dataset.get_by_id(ws, id=test_dataset_id)
X_test = test_dataset.drop_columns(columns=[target_column_name]).to_pandas_dataframe().reset_index(drop=True)
y_test_df = test_dataset.with_timestamp_columns(None).keep_columns(columns=[target_column_name]).to_pandas_dataframe()
# generate forecast
fitted_model = joblib.load('model.pkl')
# Default quantiles: the median plus the bounds of a 95% prediction interval
quantiles = [0.025, 0.5, 0.975]
predicted_column_name = 'predicted'
PI = 'prediction_interval'
fitted_model.quantiles = quantiles
pred_quantiles = fitted_model.forecast_quantiles(X_test)
pred_quantiles[PI] = pred_quantiles[[min(quantiles), max(quantiles)]].apply(
    lambda x: '[{}, {}]'.format(x[0], x[1]), axis=1)
X_test[target_column_name] = y_test_df[target_column_name]
X_test[PI] = pred_quantiles[PI]
X_test[predicted_column_name] = pred_quantiles[0.5]
# drop rows where prediction or actuals are nan
# happens because of missing actuals
# or at edges of time due to lags/rolling windows
clean = X_test[X_test[[target_column_name,
                       predicted_column_name]].notnull().all(axis=1)]
clean.rename(columns={target_column_name: 'actual'}, inplace=True)
file_name = 'outputs/predictions.csv'
export_csv = clean.to_csv(file_name, header=True, index=False)  # write without the index column
# Upload the predictions into artifacts
run.upload_file(name=file_name, path_or_stream=file_name)
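
The interval construction in the script above can be tried without a fitted model. The sketch below uses a small hypothetical quantile table in place of the output of `forecast_quantiles` and builds the same `[lower, upper]` strings; the numbers are made up for illustration.

```python
import pandas as pd

# Hypothetical quantile forecasts; in the script these come from
# fitted_model.forecast_quantiles(X_test).
quantiles = [0.025, 0.5, 0.975]
pred_quantiles = pd.DataFrame({
    0.025: [90.0, 95.0],
    0.5: [100.0, 104.0],
    0.975: [110.0, 113.0],
})

# The outer quantiles bound the interval; the median is the point forecast.
lower, upper = min(quantiles), max(quantiles)
pred_quantiles['prediction_interval'] = [
    '[{}, {}]'.format(lo, hi)
    for lo, hi in zip(pred_quantiles[lower], pred_quantiles[upper])
]
point_forecast = pred_quantiles[0.5]

# Nominal coverage of the interval implied by the chosen quantiles.
coverage = round(upper - lower, 3)
print(pred_quantiles['prediction_interval'].tolist(), coverage)
```

The quantiles 0.025 and 0.975 leave 2.5% in each tail, so the resulting interval has a nominal coverage of 95%.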

@@ -0,0 +1,250 @@
"""
Helper functions to determine AutoML experiment settings for forecasting.
"""
import pandas as pd
import statsmodels.tsa.stattools as stattools
from arch import unitroot
from azureml.automl.core.shared import constants
from azureml.automl.runtime.shared.score import scoring


def adf_test(series, **kw):
    """
    Wrapper for the augmented Dickey-Fuller test. Allows users to set the lag order.

    :param series: series to test
    :return: dictionary of results
    """
    if 'lags' in kw.keys():
        msg = 'Lag order of {} detected. Running the ADF test...'.format(str(kw['lags']))
        print(msg)
        statistic, pval, critval, resstore = stattools.adfuller(
            series,
            maxlag=kw['lags'],
            autolag=kw['autolag'],
            store=kw['store'])
    else:
        statistic, pval, critval, resstore = stattools.adfuller(
            series,
            autolag=kw['IC'],
            store=kw['store'])

    output = {'statistic': statistic,
              'pval': pval,
              'critical': critval,
              'resstore': resstore}
    return output


def kpss_test(series, **kw):
    """
    Wrapper for the KPSS test. Allows users to set the lag order.

    :param series: series to test
    :return: dictionary of results
    """
    if kw['store']:
        statistic, p_value, critical_values, rstore = stattools.kpss(
            series,
            regression=kw['reg_type'],
            lags=kw['lags'],
            store=kw['store'])
    else:
        statistic, p_value, lags, critical_values = stattools.kpss(
            series,
            regression=kw['reg_type'],
            lags=kw['lags'])

    output = {'statistic': statistic,
              'pval': p_value,
              'critical': critical_values,
              'lags': rstore.lags if kw['store'] else lags}

    if kw['store']:
        output.update({'resstore': rstore})
    return output


def format_test_output(test_name, test_res, H0_unit_root=True):
    """
    Helper function to format output. Return a dictionary with specific keys. Will be used to
    construct the summary data frame for all unit root tests.

    TODO: Add functionality of choosing based on the max lag order specified by user.

    :param test_name: name of the test
    :param test_res: object that contains corresponding test information. Can be None if test failed.
    :param H0_unit_root: does the null hypothesis of the test assume a unit root process? Some tests do (ADF),
                         some don't (KPSS).
    :return: dictionary of summary table for all tests and final decision on stationary vs non-stationary.
             If test failed (test_res is None), return empty dictionary.
    """
    # Check if the test failed by trying to extract the test statistic
    if test_name in ('ADF', 'KPSS'):
        try:
            test_res['statistic']
        except BaseException:
            test_res = None
    else:
        try:
            test_res.stat
        except BaseException:
            test_res = None

    if test_res is None:
        return {}

    # extract necessary information
    if test_name in ('ADF', 'KPSS'):
        statistic = test_res['statistic']
        crit_val = test_res['critical']['5%']
        p_val = test_res['pval']
        lags = test_res['resstore'].usedlag if test_name == 'ADF' else test_res['lags']
    else:
        statistic = test_res.stat
        crit_val = test_res.critical_values['5%']
        p_val = test_res.pvalue
        lags = test_res.lags

    if H0_unit_root:
        H0 = 'The process is non-stationary'
        stationary = "yes" if p_val < 0.05 else "not"
    else:
        H0 = 'The process is stationary'
        stationary = "yes" if p_val > 0.05 else "not"

    out = {
        'test_name': test_name,
        'statistic': statistic,
        'crit_val': crit_val,
        'p_val': p_val,
        'lags': int(lags),
        'stationary': stationary,
        'Null Hypothesis': H0
    }
    return out


def unit_root_test_wrapper(series, lags=None):
    """
    Main function to run multiple stationarity tests. Runs five tests and returns a summary table + decision
    based on the majority rule. If the number of tests that determine a series is stationary equals the
    number of tests that deem it non-stationary, we assume the series is non-stationary.
        * Augmented Dickey-Fuller (ADF),
        * KPSS,
        * ADF using GLS,
        * Phillips-Perron (PP),
        * Zivot-Andrews (ZA)

    :param lags: (optional) parameter that allows user to run a series of tests for a specific lag value.
    :param series: series to test
    :return: dictionary of summary table for all tests and final decision on stationary vs non-stationary
    """
    # setting for ADF and KPSS tests
    adf_settings = {
        'IC': 'AIC',
        'store': True
    }

    kpss_settings = {
        'reg_type': 'c',
        'lags': 'auto',
        'store': True
    }

    arch_test_settings = {}  # settings for PP, ADF GLS and ZA tests
    if lags is not None:
        adf_settings.update({'lags': lags, 'autolag': None})
        # the key must be 'lags' (no trailing colon) to override the default above
        kpss_settings.update({'lags': lags})
        arch_test_settings = {'lags': lags}

    # Run individual tests
    adf = adf_test(series, **adf_settings)  # ADF test
    kpss = kpss_test(series, **kpss_settings)  # KPSS test
    pp = unitroot.PhillipsPerron(series, **arch_test_settings)  # Phillips-Perron test
    adfgls = unitroot.DFGLS(series, **arch_test_settings)  # ADF using GLS test
    za = unitroot.ZivotAndrews(series, **arch_test_settings)  # Zivot-Andrews test

    # generate output table
    adf_dict = format_test_output(test_name='ADF', test_res=adf, H0_unit_root=True)
    kpss_dict = format_test_output(test_name='KPSS', test_res=kpss, H0_unit_root=False)
    pp_dict = format_test_output(test_name='Philips Perron', test_res=pp, H0_unit_root=True)
    adfgls_dict = format_test_output(test_name='ADF GLS', test_res=adfgls, H0_unit_root=True)
    za_dict = format_test_output(test_name='Zivot-Andrews', test_res=za, H0_unit_root=True)

    test_dict = {'ADF': adf_dict, 'KPSS': kpss_dict, 'PP': pp_dict, 'ADF GLS': adfgls_dict, 'ZA': za_dict}
    test_sum = pd.DataFrame.from_dict(test_dict, orient='index').reset_index(drop=True)

    # decision based on the majority rule
    if test_sum.shape[0] > 0:
        ratio = test_sum[test_sum["stationary"] == "yes"].shape[0] / test_sum.shape[0]
    else:
        ratio = 1  # all tests fail, assume the series is stationary

    # Majority rule. If the ratio is exactly 0.5, assume the series is non-stationary.
    stationary = 'YES' if (ratio > 0.5) else 'NO'

    out = {'summary': test_sum, 'stationary': stationary}
    return out


def ts_train_test_split(df_input, n, time_colname, ts_id_colnames=None):
    """
    Group data frame by time series ID and split on last n rows for each group.

    :param df_input: input data frame
    :param n: number of observations in the test set
    :param time_colname: time column
    :param ts_id_colnames: (optional) list of grain column names
    :return train and test data frames
    """
    if ts_id_colnames is None:
        ts_id_colnames = []
    ts_id_colnames_original = ts_id_colnames.copy()
    if len(ts_id_colnames) == 0:
        ts_id_colnames = ['Grain']
        df_input[ts_id_colnames[0]] = 'dummy'
    # Sort by ascending time
    df_grouped = (df_input.sort_values(time_colname).groupby(ts_id_colnames, group_keys=False))
    df_head = df_grouped.apply(lambda dfg: dfg.iloc[:-n])
    df_tail = df_grouped.apply(lambda dfg: dfg.iloc[-n:])
    # drop group column name if it was not originally provided
    if len(ts_id_colnames_original) == 0:
        df_head.drop(ts_id_colnames, axis=1, inplace=True)
        df_tail.drop(ts_id_colnames, axis=1, inplace=True)
    return df_head, df_tail


def compute_metrics(fcst_df, metric_name=None, ts_id_colnames=None):
    """
    Calculate metrics per grain.
    :param fcst_df: forecast data frame. Must contain 2 columns: 'actual_level' and 'predicted_level'
    :param metric_name: (optional) name of the metric to return
    :param ts_id_colnames: (optional) list of grain column names
    :return: data frame of metrics per grain, restricted to metric_name if it is given
    """
    if ts_id_colnames is None:
        ts_id_colnames = []
    if len(ts_id_colnames) == 0:
        ts_id_colnames = ['TS_ID']
        fcst_df[ts_id_colnames[0]] = 'dummy'
    metrics_list = []
    for grain, df in fcst_df.groupby(ts_id_colnames):
        try:
            scores = scoring.score_regression(
                y_test=df['actual_level'],
                y_pred=df['predicted_level'],
                metrics=list(constants.Metric.SCALAR_REGRESSION_SET))
        except BaseException:
            msg = '{}: metrics calculation failed.'.format(grain)
            print(msg)
            scores = {}
        one_grain_metrics_df = pd.DataFrame(list(scores.items()), columns=['metric_name', 'metric']).\
            sort_values(['metric_name'])
        one_grain_metrics_df.reset_index(inplace=True, drop=True)
        if len(ts_id_colnames) < 2:
            one_grain_metrics_df['grain'] = ts_id_colnames[0]
        else:
            one_grain_metrics_df['grain'] = "|".join(list(grain))
        metrics_list.append(one_grain_metrics_df)
    # collect into a data frame
    grain_metrics = pd.concat(metrics_list)
    if metric_name is not None:
        grain_metrics = grain_metrics.query('metric_name == @metric_name')
    return grain_metrics

View File

@@ -0,0 +1,38 @@
import os
import shutil
from azureml.core import ScriptRunConfig


def run_remote_inference(test_experiment, compute_target, train_run,
                         test_dataset, target_column_name, inference_folder='./forecast'):
    # Create local directory to copy the model.pkl and forecasting_script.py files into.
    # These files will be uploaded to and executed on the compute instance.
    os.makedirs(inference_folder, exist_ok=True)
    shutil.copy('forecasting_script.py', inference_folder)
    train_run.download_file('outputs/model.pkl',
                            os.path.join(inference_folder, 'model.pkl'))
    inference_env = train_run.get_environment()
    config = ScriptRunConfig(source_directory=inference_folder,
                             script='forecasting_script.py',
                             arguments=['--target_column_name',
                                        target_column_name,
                                        '--test_dataset',
                                        test_dataset.as_named_input(test_dataset.name)],
                             compute_target=compute_target,
                             environment=inference_env)
    run = test_experiment.submit(config,
                                 tags={'training_run_id':
                                       train_run.id,
                                       'run_algorithm':
                                       train_run.properties['run_algorithm'],
                                       'valid_score':
                                       train_run.properties['score'],
                                       'primary_metric':
                                       train_run.properties['primary_metric']})
    run.log("run_algorithm", run.tags['run_algorithm'])
    return run

View File

@@ -96,7 +96,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.23.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.35.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
@@ -173,7 +173,7 @@
"source": [
"automl_settings = {\n",
" \"n_cross_validations\": 3,\n",
" \"primary_metric\": 'average_precision_score_weighted',\n",
" \"primary_metric\": 'AUC_weighted',\n",
" \"experiment_timeout_hours\": 0.25, # This is a time limit for testing purposes, remove it for real use cases, this will drastically limit ability to find the best model possible\n",
" \"verbosity\": logging.INFO,\n",
" \"enable_stack_ensemble\": False\n",
@@ -215,15 +215,6 @@
"#local_run = AutoMLRun(experiment = experiment, run_id = '<replace with your run id>')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"local_run"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -445,7 +436,8 @@
"\n",
"automl_explainer_setup_obj = automl_setup_model_explanations(fitted_model, X=X_train, \n",
" X_test=X_test, y=y_train, \n",
" task='classification')"
" task='classification',\n",
" automl_run=automl_run)"
]
},
{
@@ -462,11 +454,10 @@
"metadata": {},
"outputs": [],
"source": [
"from interpret.ext.glassbox import LGBMExplainableModel\n",
"from azureml.interpret.mimic_wrapper import MimicWrapper\n",
"explainer = MimicWrapper(ws, automl_explainer_setup_obj.automl_estimator,\n",
" explainable_model=automl_explainer_setup_obj.surrogate_model, \n",
" init_dataset=automl_explainer_setup_obj.X_transform, run=automl_run,\n",
" init_dataset=automl_explainer_setup_obj.X_transform, run=automl_explainer_setup_obj.automl_run,\n",
" features=automl_explainer_setup_obj.engineered_feature_names, \n",
" feature_maps=[automl_explainer_setup_obj.feature_map],\n",
" classes=automl_explainer_setup_obj.classes,\n",

View File

@@ -0,0 +1,4 @@
name: auto-ml-classification-credit-card-fraud-local
dependencies:
- pip:
- azureml-sdk

View File

@@ -77,7 +77,6 @@
"import azureml.core\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.core.workspace import Workspace\n",
"import azureml.dataprep as dprep\n",
"from azureml.automl.core.featurization import FeaturizationConfig\n",
"from azureml.train.automl import AutoMLConfig\n",
"from azureml.core.dataset import Dataset"
@@ -96,7 +95,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.23.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.35.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
@@ -130,6 +129,8 @@
"### Create or Attach existing AmlCompute\n",
"You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for your AutoML run. In this tutorial, you create `AmlCompute` as your training compute resource.\n",
"\n",
"> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist.\n",
"\n",
"**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
"\n",
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
@@ -152,7 +153,7 @@
" compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)\n",
" print('Found existing cluster, use it.')\n",
"except ComputeTargetException:\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS12_V2',\n",
" max_nodes=4)\n",
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)\n",
"\n",
@@ -305,15 +306,6 @@
"remote_run = experiment.submit(automl_config, show_output = False)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"remote_run"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -448,7 +440,7 @@
"\n",
"### Retrieve any AutoML Model for explanations\n",
"\n",
"Below we select the some AutoML pipeline from our iterations. The `get_output` method returns the a AutoML run and the fitted model for the last invocation. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
"Below we select an AutoML pipeline from our iterations. The `get_output` method returns the a AutoML run and the fitted model for the last invocation. Overloads on `get_output` allow you to retrieve the best run and fitted model for any logged `metric` or for a particular `iteration`."
]
},
{
@@ -457,7 +449,8 @@
"metadata": {},
"outputs": [],
"source": [
"automl_run, fitted_model = remote_run.get_output(metric='r2_score')"
"#automl_run, fitted_model = remote_run.get_output(metric='r2_score')\n",
"automl_run, fitted_model = remote_run.get_output(iteration=2)"
]
},
{
@@ -547,8 +540,6 @@
"outputs": [],
"source": [
"from azureml.core.runconfig import RunConfiguration\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"import pkg_resources\n",
"\n",
"# create a new RunConfig object\n",
"conda_run_config = RunConfiguration(framework=\"python\")\n",
@@ -726,14 +717,13 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.webservice import Webservice\n",
"from azureml.core.model import InferenceConfig\n",
"from azureml.core.webservice import AciWebservice\n",
"from azureml.core.model import Model\n",
"from azureml.core.environment import Environment\n",
"\n",
"aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n",
" memory_gb=1, \n",
"aciconfig = AciWebservice.deploy_configuration(cpu_cores=2, \n",
" memory_gb=2, \n",
" tags={\"data\": \"Machine Data\", \n",
" \"method\" : \"local_explanation\"}, \n",
" description='Get local explanations for Machine test data')\n",

View File

@@ -0,0 +1,4 @@
name: auto-ml-regression-explanation-featurization
dependencies:
- pip:
- azureml-sdk

View File

@@ -27,7 +27,7 @@ automl_run = Run(experiment=experiment, run_id='<<run_id>>')
# Check if this AutoML model is explainable
if not automl_check_model_if_explainable(automl_run):
raise Exception("Model explanations is currently not supported for " + automl_run.get_properties().get(
raise Exception("Model explanations are currently not supported for " + automl_run.get_properties().get(
'run_algorithm'))
# Download the best model from the artifact store
@@ -38,23 +38,25 @@ fitted_model = joblib.load('model.pkl')
# Get the train dataset from the workspace
train_dataset = Dataset.get_by_name(workspace=ws, name='<<train_dataset_name>>')
# Drop the lablled column to get the training set.
# Drop the labeled column to get the training set.
X_train = train_dataset.drop_columns(columns=['<<target_column_name>>'])
y_train = train_dataset.keep_columns(columns=['<<target_column_name>>'], validate=True)
# Get the train dataset from the workspace
# Get the test dataset from the workspace
test_dataset = Dataset.get_by_name(workspace=ws, name='<<test_dataset_name>>')
# Drop the lablled column to get the testing set.
# Drop the labeled column to get the testing set.
X_test = test_dataset.drop_columns(columns=['<<target_column_name>>'])
# Setup the class for explaining the AtuoML models
# Setup the class for explaining the AutoML models
automl_explainer_setup_obj = automl_setup_model_explanations(fitted_model, '<<task>>',
X=X_train, X_test=X_test,
y=y_train)
y=y_train,
automl_run=automl_run)
# Initialize the Mimic Explainer
explainer = MimicWrapper(ws, automl_explainer_setup_obj.automl_estimator, LGBMExplainableModel,
init_dataset=automl_explainer_setup_obj.X_transform, run=automl_run,
init_dataset=automl_explainer_setup_obj.X_transform,
run=automl_explainer_setup_obj.automl_run,
features=automl_explainer_setup_obj.engineered_feature_names,
feature_maps=[automl_explainer_setup_obj.feature_map],
classes=automl_explainer_setup_obj.classes)

View File

@@ -92,7 +92,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.23.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.35.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
@@ -145,7 +145,7 @@
" compute_target = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n",
" print('Found existing cluster, use it.')\n",
"except ComputeTargetException:\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS12_V2',\n",
" max_nodes=4)\n",
" compute_target = ComputeTarget.create(ws, cpu_cluster_name, compute_config)\n",
"\n",
@@ -213,7 +213,7 @@
"source": [
"automl_settings = {\n",
" \"n_cross_validations\": 3,\n",
" \"primary_metric\": 'r2_score',\n",
" \"primary_metric\": 'normalized_root_mean_squared_error',\n",
" \"enable_early_stopping\": True, \n",
" \"experiment_timeout_hours\": 0.3, #for real scenarios we reccommend a timeout of at least one hour \n",
" \"max_concurrent_iterations\": 4,\n",
@@ -256,15 +256,6 @@
"#remote_run = AutoMLRun(experiment = experiment, run_id = '<replace with your run id>')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"remote_run"
]
},
{
"cell_type": "markdown",
"metadata": {},

View File

@@ -0,0 +1,4 @@
name: auto-ml-regression
dependencies:
- pip:
- azureml-sdk

View File

@@ -350,32 +350,6 @@
"displayHTML(\"<a href={} target='_blank'>Azure Portal: {}</a>\".format(local_run.get_portal_url(), local_run.id))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Retrieve All Child Runs after the experiment is completed (in portal)\n",
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"children = list(local_run.get_children())\n",
"metricslist = {}\n",
"for run in children:\n",
" properties = run.get_properties()\n",
" #print(properties)\n",
" metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)} \n",
" metricslist[int(properties['iteration'])] = metrics\n",
"\n",
"rundata = pd.DataFrame(metricslist).sort_index(1)\n",
"rundata"
]
},
{
"cell_type": "markdown",
"metadata": {},

View File

@@ -352,32 +352,6 @@
"displayHTML(\"<a href={} target='_blank'>Azure Portal: {}</a>\".format(local_run.get_portal_url(), local_run.id))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Retrieve All Child Runs after the experiment is completed (in portal)\n",
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"children = list(local_run.get_children())\n",
"metricslist = {}\n",
"for run in children:\n",
" properties = run.get_properties()\n",
" #print(properties)\n",
" metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)} \n",
" metricslist[int(properties['iteration'])] = metrics\n",
"\n",
"rundata = pd.DataFrame(metricslist).sort_index(1)\n",
"rundata"
]
},
{
"cell_type": "markdown",
"metadata": {},

View File

@@ -0,0 +1,84 @@
Azure Synapse Analytics is a limitless analytics service that brings together data integration, enterprise data warehousing, and big data analytics. It gives you the freedom to query data on your terms, using either serverless or dedicated resources at scale. Azure Synapse brings these worlds together with a unified experience to ingest, explore, prepare, manage, and serve data for immediate BI and machine learning needs. A core offering within Azure Synapse Analytics is its serverless Apache Spark pools, enhanced for big data workloads.
The Synapse integration in Azure ML is for customers who want to use Apache Spark in Azure Synapse Analytics to prepare data at scale in Azure ML before training their ML models. It lets customers work through their end-to-end ML lifecycle, including large-scale data preparation, model training, and deployment, within an Azure ML workspace, without having to use suboptimal tools for machine learning or switch between multiple tools for data preparation and model training. The ability to perform all ML tasks within Azure ML reduces the time customers need to iterate on a machine learning project, which typically includes multiple rounds of data preparation and training.
The public preview provides the following capabilities:
- Link Azure Synapse Analytics workspace to Azure Machine Learning workspace (via ARM, UI or SDK)
- Attach Apache Spark pools powered by Azure Synapse Analytics as Azure Machine Learning compute targets (via ARM, UI or SDK)
- Launch Apache Spark sessions in notebooks and perform interactive data exploration and preparation. This interactive experience leverages Apache Spark magic and customers will have session-level Conda support to install packages.
- Productionize ML pipelines by leveraging Apache Spark pools to pre-process big data
# Using Synapse in Azure Machine Learning
## Create Synapse resources
Follow the documents below to create a Synapse workspace. The resource-setup.sh script is also available to create the resources for you.
- Create from [Portal](https://docs.microsoft.com/en-us/azure/synapse-analytics/quickstart-create-workspace)
- Create from [Cli](https://docs.microsoft.com/en-us/azure/synapse-analytics/quickstart-create-workspace-cli)
Follow the documents below to create a Synapse Spark pool.
- Create from [Portal](https://docs.microsoft.com/en-us/azure/synapse-analytics/quickstart-create-apache-spark-pool-portal)
- Create from [Cli](https://docs.microsoft.com/en-us/cli/azure/ext/synapse/synapse/spark/pool?view=azure-cli-latest)
## Link Synapse Workspace
Make sure you are the owner of the Synapse workspace so that you can link it to your AML workspace.
You can run resource-setup.py to link the Synapse workspace and attach the compute.
```python
from azureml.core import Workspace
ws = Workspace.from_config()
from azureml.core import LinkedService, SynapseWorkspaceLinkedServiceConfiguration
synapse_link_config = SynapseWorkspaceLinkedServiceConfiguration(
    subscription_id="<subscription id>",
    resource_group="<resource group>",
    name="<synapse workspace name>"
)
linked_service = LinkedService.register(
    workspace=ws,
    name='<link name>',
    linked_service_config=synapse_link_config)
```
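Before registering the link, it can help to confirm that every placeholder in the snippet above has been replaced with a real value. The following is a minimal sketch and not part of the Azure ML SDK: `find_unfilled_placeholders` is a hypothetical helper, the sample values are made up, and it assumes placeholders are written in angle brackets as they are in this README.

```python
import re

# Hypothetical helper (not part of the Azure ML SDK): flags settings whose
# values still look like the "<...>" placeholders used in this README.
# A value that opens with "<" but never closes it is treated as a placeholder too.
_PLACEHOLDER = re.compile(r"^\s*<[^<>]*>?\s*$")


def find_unfilled_placeholders(settings):
    """Return the sorted names of settings that are empty or still placeholders."""
    unfilled = []
    for key, value in settings.items():
        text = "" if value is None else str(value)
        if not text.strip() or _PLACEHOLDER.match(text):
            unfilled.append(key)
    return sorted(unfilled)


if __name__ == "__main__":
    # Illustrative values only; replace them with your own identifiers.
    link_settings = {
        "subscription_id": "<subscription id>",
        "resource_group": "my-resource-group",
        "name": "<synapse workspace name>",
    }
    missing = find_unfilled_placeholders(link_settings)
    if missing:
        print("Fill in these settings before calling LinkedService.register:", missing)
```

Running the check first keeps a forgotten placeholder from surfacing later as a less obvious error from the service.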
## Attach Synapse Spark pool as AzureML compute
```python
from azureml.core.compute import SynapseCompute, ComputeTarget
spark_pool_name = "<spark pool name>"
attached_synapse_name = "<attached compute name>"
attach_config = SynapseCompute.attach_configuration(
    linked_service,
    type="SynapseSpark",
    pool_name=spark_pool_name)
synapse_compute = ComputeTarget.attach(
    workspace=ws,
    name=attached_synapse_name,
    attach_configuration=attach_config)
synapse_compute.wait_for_completion()
```
## Set up permissions
Grant the Spark admin role to the system-assigned identity of the linked service so that users can submit experiment runs or pipeline runs from the AML workspace to the Synapse Spark pool.
Grant the Spark admin role to the specific user so that the user can start a Spark session on the Synapse Spark pool.
You can get the system-assigned identity information by running:
```python
print(linked_service.system_assigned_identity_principal_id)
```
- Launch Synapse Studio for the Synapse workspace and grant the linked service MSI the "Synapse Apache Spark administrator" role.
- In the Azure portal, grant the linked service MSI the "Storage Blob Data Contributor" role on the primary ADLS Gen2 account of the Synapse workspace so that it can use the library management feature.
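
The storage role in the last step can also be granted from the command line instead of the portal. The sketch below only assembles an `az role assignment create` command for the linked service's managed identity and does not call Azure. The helper name and the sample identifiers are hypothetical, and the storage account is assumed to live in the subscription and resource group you pass in. The Synapse Studio role from the first bullet is not covered here.

```python
import shlex


def storage_role_assignment_command(principal_id, subscription_id,
                                    resource_group, storage_account):
    """Build the Azure CLI arguments that grant 'Storage Blob Data Contributor'.

    Hypothetical helper for illustration: it only builds the argument list
    and never contacts Azure.
    """
    # ARM resource ID of the storage account, used as the assignment scope.
    scope = (
        "/subscriptions/{}/resourceGroups/{}"
        "/providers/Microsoft.Storage/storageAccounts/{}"
    ).format(subscription_id, resource_group, storage_account)
    return [
        "az", "role", "assignment", "create",
        "--assignee", principal_id,
        "--role", "Storage Blob Data Contributor",
        "--scope", scope,
    ]


if __name__ == "__main__":
    # Placeholder values; principal_id would come from
    # linked_service.system_assigned_identity_principal_id.
    cmd = storage_role_assignment_command(
        "00000000-0000-0000-0000-000000000000",
        "11111111-1111-1111-1111-111111111111",
        "my-resource-group",
        "mysynapsestorage",
    )
    # Print a shell-safe command line to review before running it yourself.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Returning a list rather than a single string keeps the role name, which contains spaces, as one argument if the command is later passed to `subprocess.run`.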

View File

@@ -0,0 +1,186 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved. \n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/azure-arcadia/Synapse_Job_Scala_Support.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Get AML workspace which has synapse spark pool attached"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace, Experiment, Dataset, Environment\n",
"\n",
"ws = Workspace.from_config()\n",
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Leverage ScriptRunConfig to submit scala job to an attached synapse spark cluster"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prepare data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.datastore import Datastore\n",
"# Use the default blob storage\n",
"def_blob_store = Datastore(ws, \"workspaceblobstore\")\n",
"\n",
"# We are uploading a sample file in the local directory to be used as a datasource\n",
"file_name = \"shakespeare.txt\"\n",
"def_blob_store.upload_files(files=[\"./{}\".format(file_name)], overwrite=False)\n",
"\n",
"# Create file dataset\n",
"file_dataset = Dataset.File.from_files(path=[(def_blob_store, file_name)])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.runconfig import RunConfiguration\n",
"from azureml.data import HDFSOutputDatasetConfig\n",
"import uuid\n",
"\n",
"run_config = RunConfiguration(framework=\"pyspark\")\n",
"run_config.target = \"link-pool\"\n",
"run_config.spark.configuration[\"spark.driver.memory\"] = \"2g\"\n",
"run_config.spark.configuration[\"spark.driver.cores\"] = 2\n",
"run_config.spark.configuration[\"spark.executor.memory\"] = \"2g\"\n",
"run_config.spark.configuration[\"spark.executor.cores\"] = 1\n",
"run_config.spark.configuration[\"spark.executor.instances\"] = 1\n",
"# This can be removed if you are using local jars in source folder\n",
"run_config.spark.configuration[\"spark.yarn.dist.jars\"]=\"wasbs://synapse@azuremlexamples.blob.core.windows.net/shared/wordcount.jar\"\n",
"\n",
"dir_name = \"wordcount-{}\".format(str(uuid.uuid4()))\n",
"input = file_dataset.as_named_input(\"input\").as_hdfs()\n",
"output = HDFSOutputDatasetConfig(destination=(ws.get_default_datastore(), \"{}/result\".format(dir_name)))\n",
"\n",
"from azureml.core import ScriptRunConfig\n",
"args = ['--input', input, '--output', output]\n",
"script_run_config = ScriptRunConfig(source_directory = '.',\n",
" script= 'start_script.py',\n",
" arguments= args,\n",
" run_config = run_config)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Experiment\n",
"exp = Experiment(workspace=ws, name='synapse-spark')\n",
"run = exp.submit(config=script_run_config)\n",
"run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Leverage SynapseSparkStep in an AML pipeline to add dataprep step on synapse spark cluster"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.pipeline.core import Pipeline\n",
"from azureml.pipeline.steps import SynapseSparkStep\n",
"\n",
"configs = {}\n",
"#configs[\"spark.yarn.dist.jars\"] = \"wasbs://synapse@azuremlexamples.blob.core.windows.net/shared/wordcount.jar\"\n",
"step_1 = SynapseSparkStep(name = 'synapse-spark',\n",
" file = 'start_script.py',\n",
" jars = \"wasbs://synapse@azuremlexamples.blob.core.windows.net/shared/wordcount.jar\",\n",
" source_directory=\".\",\n",
" arguments = args,\n",
" compute_target = 'link-pool',\n",
" driver_memory = \"2g\",\n",
" driver_cores = 2,\n",
" executor_memory = \"2g\",\n",
" executor_cores = 1,\n",
" num_executors = 1,\n",
" conf = configs)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pipeline = Pipeline(workspace=ws, steps=[step_1])\n",
"pipeline_run = pipeline.submit('synapse-pipeline', regenerate_outputs=True)"
]
}
],
"metadata": {
"authors": [
{
"name": "feli1"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
},
"nteract": {
"version": "0.28.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -0,0 +1,240 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved. \n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/azure-arcadia/Synapse_Session_Scala_Support.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Interactive Spark Session on Synapse Spark Pool"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install -U \"azureml-synapse\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For JupyterLab, please additionally run:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!jupyter lab build --minimize=False"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## PLEASE restart kernel and then refresh web page before starting spark session."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 0. Magic Usage"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {
"iopub.execute_input": "2020-06-05T03:22:14.965395Z",
"iopub.status.busy": "2020-06-05T03:22:14.965395Z",
"iopub.status.idle": "2020-06-05T03:22:14.970398Z",
"shell.execute_reply": "2020-06-05T03:22:14.969397Z",
"shell.execute_reply.started": "2020-06-05T03:22:14.965395Z"
},
"gather": {
"logged": 1615594584642
}
},
"outputs": [],
"source": [
"# show help\n",
"%synapse ?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 1. Start Synapse Session"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1615577715289
}
},
"outputs": [],
"source": [
"%synapse start -c linktestpool --start-timeout 1000"
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"# 2. Use Scala"
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"## (1) Read Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"%%synapse scala\n",
"\n",
"var df = spark.read.option(\"header\", \"true\").csv(\"wasbs://demo@dprepdata.blob.core.windows.net/Titanic.csv\")\n",
"df.show(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## (2) Use Scala Sql"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"%%synapse scala\n",
"\n",
"df.createOrReplaceTempView(\"titanic\")\n",
"var sqlDF = spark.sql(\"SELECT Name, Fare from titanic\")\n",
"sqlDF.show(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Stop Session"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"%synapse stop"
]
}
],
"metadata": {
"authors": [
{
"name": "feli1"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
},
"nteract": {
"version": "0.28.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -0,0 +1,892 @@
PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S
5,0,3,"Allen, Mr. William Henry",male,35,0,0,373450,8.05,,S
6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q
7,0,1,"McCarthy, Mr. Timothy J",male,54,0,0,17463,51.8625,E46,S
8,0,3,"Palsson, Master. Gosta Leonard",male,2,3,1,349909,21.075,,S
9,1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27,0,2,347742,11.1333,,S
10,1,2,"Nasser, Mrs. Nicholas (Adele Achem)",female,14,1,0,237736,30.0708,,C
11,1,3,"Sandstrom, Miss. Marguerite Rut",female,4,1,1,PP 9549,16.7,G6,S
12,1,1,"Bonnell, Miss. Elizabeth",female,58,0,0,113783,26.55,C103,S
13,0,3,"Saundercock, Mr. William Henry",male,20,0,0,A/5. 2151,8.05,,S
14,0,3,"Andersson, Mr. Anders Johan",male,39,1,5,347082,31.275,,S
15,0,3,"Vestrom, Miss. Hulda Amanda Adolfina",female,14,0,0,350406,7.8542,,S
16,1,2,"Hewlett, Mrs. (Mary D Kingcome) ",female,55,0,0,248706,16,,S
17,0,3,"Rice, Master. Eugene",male,2,4,1,382652,29.125,,Q
18,1,2,"Williams, Mr. Charles Eugene",male,,0,0,244373,13,,S
19,0,3,"Vander Planke, Mrs. Julius (Emelia Maria Vandemoortele)",female,31,1,0,345763,18,,S
20,1,3,"Masselmani, Mrs. Fatima",female,,0,0,2649,7.225,,C
21,0,2,"Fynney, Mr. Joseph J",male,35,0,0,239865,26,,S
22,1,2,"Beesley, Mr. Lawrence",male,34,0,0,248698,13,D56,S
23,1,3,"McGowan, Miss. Anna ""Annie""",female,15,0,0,330923,8.0292,,Q
24,1,1,"Sloper, Mr. William Thompson",male,28,0,0,113788,35.5,A6,S
25,0,3,"Palsson, Miss. Torborg Danira",female,8,3,1,349909,21.075,,S
26,1,3,"Asplund, Mrs. Carl Oscar (Selma Augusta Emilia Johansson)",female,38,1,5,347077,31.3875,,S
27,0,3,"Emir, Mr. Farred Chehab",male,,0,0,2631,7.225,,C
28,0,1,"Fortune, Mr. Charles Alexander",male,19,3,2,19950,263,C23 C25 C27,S
29,1,3,"O'Dwyer, Miss. Ellen ""Nellie""",female,,0,0,330959,7.8792,,Q
30,0,3,"Todoroff, Mr. Lalio",male,,0,0,349216,7.8958,,S
31,0,1,"Uruchurtu, Don. Manuel E",male,40,0,0,PC 17601,27.7208,,C
32,1,1,"Spencer, Mrs. William Augustus (Marie Eugenie)",female,,1,0,PC 17569,146.5208,B78,C
33,1,3,"Glynn, Miss. Mary Agatha",female,,0,0,335677,7.75,,Q
34,0,2,"Wheadon, Mr. Edward H",male,66,0,0,C.A. 24579,10.5,,S
35,0,1,"Meyer, Mr. Edgar Joseph",male,28,1,0,PC 17604,82.1708,,C
36,0,1,"Holverson, Mr. Alexander Oskar",male,42,1,0,113789,52,,S
37,1,3,"Mamee, Mr. Hanna",male,,0,0,2677,7.2292,,C
38,0,3,"Cann, Mr. Ernest Charles",male,21,0,0,A./5. 2152,8.05,,S
39,0,3,"Vander Planke, Miss. Augusta Maria",female,18,2,0,345764,18,,S
40,1,3,"Nicola-Yarred, Miss. Jamila",female,14,1,0,2651,11.2417,,C
41,0,3,"Ahlin, Mrs. Johan (Johanna Persdotter Larsson)",female,40,1,0,7546,9.475,,S
42,0,2,"Turpin, Mrs. William John Robert (Dorothy Ann Wonnacott)",female,27,1,0,11668,21,,S
43,0,3,"Kraeff, Mr. Theodor",male,,0,0,349253,7.8958,,C
44,1,2,"Laroche, Miss. Simonne Marie Anne Andree",female,3,1,2,SC/Paris 2123,41.5792,,C
45,1,3,"Devaney, Miss. Margaret Delia",female,19,0,0,330958,7.8792,,Q
46,0,3,"Rogers, Mr. William John",male,,0,0,S.C./A.4. 23567,8.05,,S
47,0,3,"Lennon, Mr. Denis",male,,1,0,370371,15.5,,Q
48,1,3,"O'Driscoll, Miss. Bridget",female,,0,0,14311,7.75,,Q
49,0,3,"Samaan, Mr. Youssef",male,,2,0,2662,21.6792,,C
50,0,3,"Arnold-Franchi, Mrs. Josef (Josefine Franchi)",female,18,1,0,349237,17.8,,S
51,0,3,"Panula, Master. Juha Niilo",male,7,4,1,3101295,39.6875,,S
52,0,3,"Nosworthy, Mr. Richard Cater",male,21,0,0,A/4. 39886,7.8,,S
53,1,1,"Harper, Mrs. Henry Sleeper (Myna Haxtun)",female,49,1,0,PC 17572,76.7292,D33,C
54,1,2,"Faunthorpe, Mrs. Lizzie (Elizabeth Anne Wilkinson)",female,29,1,0,2926,26,,S
55,0,1,"Ostby, Mr. Engelhart Cornelius",male,65,0,1,113509,61.9792,B30,C
56,1,1,"Woolner, Mr. Hugh",male,,0,0,19947,35.5,C52,S
57,1,2,"Rugg, Miss. Emily",female,21,0,0,C.A. 31026,10.5,,S
58,0,3,"Novel, Mr. Mansouer",male,28.5,0,0,2697,7.2292,,C
59,1,2,"West, Miss. Constance Mirium",female,5,1,2,C.A. 34651,27.75,,S
60,0,3,"Goodwin, Master. William Frederick",male,11,5,2,CA 2144,46.9,,S
61,0,3,"Sirayanian, Mr. Orsen",male,22,0,0,2669,7.2292,,C
62,1,1,"Icard, Miss. Amelie",female,38,0,0,113572,80,B28,
63,0,1,"Harris, Mr. Henry Birkhardt",male,45,1,0,36973,83.475,C83,S
64,0,3,"Skoog, Master. Harald",male,4,3,2,347088,27.9,,S
65,0,1,"Stewart, Mr. Albert A",male,,0,0,PC 17605,27.7208,,C
66,1,3,"Moubarek, Master. Gerios",male,,1,1,2661,15.2458,,C
67,1,2,"Nye, Mrs. (Elizabeth Ramell)",female,29,0,0,C.A. 29395,10.5,F33,S
68,0,3,"Crease, Mr. Ernest James",male,19,0,0,S.P. 3464,8.1583,,S
69,1,3,"Andersson, Miss. Erna Alexandra",female,17,4,2,3101281,7.925,,S
70,0,3,"Kink, Mr. Vincenz",male,26,2,0,315151,8.6625,,S
71,0,2,"Jenkin, Mr. Stephen Curnow",male,32,0,0,C.A. 33111,10.5,,S
72,0,3,"Goodwin, Miss. Lillian Amy",female,16,5,2,CA 2144,46.9,,S
73,0,2,"Hood, Mr. Ambrose Jr",male,21,0,0,S.O.C. 14879,73.5,,S
74,0,3,"Chronopoulos, Mr. Apostolos",male,26,1,0,2680,14.4542,,C
75,1,3,"Bing, Mr. Lee",male,32,0,0,1601,56.4958,,S
76,0,3,"Moen, Mr. Sigurd Hansen",male,25,0,0,348123,7.65,F G73,S
77,0,3,"Staneff, Mr. Ivan",male,,0,0,349208,7.8958,,S
78,0,3,"Moutal, Mr. Rahamin Haim",male,,0,0,374746,8.05,,S
79,1,2,"Caldwell, Master. Alden Gates",male,0.83,0,2,248738,29,,S
80,1,3,"Dowdell, Miss. Elizabeth",female,30,0,0,364516,12.475,,S
81,0,3,"Waelens, Mr. Achille",male,22,0,0,345767,9,,S
82,1,3,"Sheerlinck, Mr. Jan Baptist",male,29,0,0,345779,9.5,,S
83,1,3,"McDermott, Miss. Brigdet Delia",female,,0,0,330932,7.7875,,Q
84,0,1,"Carrau, Mr. Francisco M",male,28,0,0,113059,47.1,,S
85,1,2,"Ilett, Miss. Bertha",female,17,0,0,SO/C 14885,10.5,,S
86,1,3,"Backstrom, Mrs. Karl Alfred (Maria Mathilda Gustafsson)",female,33,3,0,3101278,15.85,,S
87,0,3,"Ford, Mr. William Neal",male,16,1,3,W./C. 6608,34.375,,S
88,0,3,"Slocovski, Mr. Selman Francis",male,,0,0,SOTON/OQ 392086,8.05,,S
89,1,1,"Fortune, Miss. Mabel Helen",female,23,3,2,19950,263,C23 C25 C27,S
90,0,3,"Celotti, Mr. Francesco",male,24,0,0,343275,8.05,,S
91,0,3,"Christmann, Mr. Emil",male,29,0,0,343276,8.05,,S
92,0,3,"Andreasson, Mr. Paul Edvin",male,20,0,0,347466,7.8542,,S
93,0,1,"Chaffee, Mr. Herbert Fuller",male,46,1,0,W.E.P. 5734,61.175,E31,S
94,0,3,"Dean, Mr. Bertram Frank",male,26,1,2,C.A. 2315,20.575,,S
95,0,3,"Coxon, Mr. Daniel",male,59,0,0,364500,7.25,,S
96,0,3,"Shorney, Mr. Charles Joseph",male,,0,0,374910,8.05,,S
97,0,1,"Goldschmidt, Mr. George B",male,71,0,0,PC 17754,34.6542,A5,C
98,1,1,"Greenfield, Mr. William Bertram",male,23,0,1,PC 17759,63.3583,D10 D12,C
99,1,2,"Doling, Mrs. John T (Ada Julia Bone)",female,34,0,1,231919,23,,S
100,0,2,"Kantor, Mr. Sinai",male,34,1,0,244367,26,,S
101,0,3,"Petranec, Miss. Matilda",female,28,0,0,349245,7.8958,,S
102,0,3,"Petroff, Mr. Pastcho (""Pentcho"")",male,,0,0,349215,7.8958,,S
103,0,1,"White, Mr. Richard Frasar",male,21,0,1,35281,77.2875,D26,S
104,0,3,"Johansson, Mr. Gustaf Joel",male,33,0,0,7540,8.6542,,S
105,0,3,"Gustafsson, Mr. Anders Vilhelm",male,37,2,0,3101276,7.925,,S
106,0,3,"Mionoff, Mr. Stoytcho",male,28,0,0,349207,7.8958,,S
107,1,3,"Salkjelsvik, Miss. Anna Kristine",female,21,0,0,343120,7.65,,S
108,1,3,"Moss, Mr. Albert Johan",male,,0,0,312991,7.775,,S
109,0,3,"Rekic, Mr. Tido",male,38,0,0,349249,7.8958,,S
110,1,3,"Moran, Miss. Bertha",female,,1,0,371110,24.15,,Q
111,0,1,"Porter, Mr. Walter Chamberlain",male,47,0,0,110465,52,C110,S
112,0,3,"Zabour, Miss. Hileni",female,14.5,1,0,2665,14.4542,,C
113,0,3,"Barton, Mr. David John",male,22,0,0,324669,8.05,,S
114,0,3,"Jussila, Miss. Katriina",female,20,1,0,4136,9.825,,S
115,0,3,"Attalah, Miss. Malake",female,17,0,0,2627,14.4583,,C
116,0,3,"Pekoniemi, Mr. Edvard",male,21,0,0,STON/O 2. 3101294,7.925,,S
117,0,3,"Connors, Mr. Patrick",male,70.5,0,0,370369,7.75,,Q
118,0,2,"Turpin, Mr. William John Robert",male,29,1,0,11668,21,,S
119,0,1,"Baxter, Mr. Quigg Edmond",male,24,0,1,PC 17558,247.5208,B58 B60,C
120,0,3,"Andersson, Miss. Ellis Anna Maria",female,2,4,2,347082,31.275,,S
121,0,2,"Hickman, Mr. Stanley George",male,21,2,0,S.O.C. 14879,73.5,,S
122,0,3,"Moore, Mr. Leonard Charles",male,,0,0,A4. 54510,8.05,,S
123,0,2,"Nasser, Mr. Nicholas",male,32.5,1,0,237736,30.0708,,C
124,1,2,"Webber, Miss. Susan",female,32.5,0,0,27267,13,E101,S
125,0,1,"White, Mr. Percival Wayland",male,54,0,1,35281,77.2875,D26,S
126,1,3,"Nicola-Yarred, Master. Elias",male,12,1,0,2651,11.2417,,C
127,0,3,"McMahon, Mr. Martin",male,,0,0,370372,7.75,,Q
128,1,3,"Madsen, Mr. Fridtjof Arne",male,24,0,0,C 17369,7.1417,,S
129,1,3,"Peter, Miss. Anna",female,,1,1,2668,22.3583,F E69,C
130,0,3,"Ekstrom, Mr. Johan",male,45,0,0,347061,6.975,,S
131,0,3,"Drazenoic, Mr. Jozef",male,33,0,0,349241,7.8958,,C
132,0,3,"Coelho, Mr. Domingos Fernandeo",male,20,0,0,SOTON/O.Q. 3101307,7.05,,S
133,0,3,"Robins, Mrs. Alexander A (Grace Charity Laury)",female,47,1,0,A/5. 3337,14.5,,S
134,1,2,"Weisz, Mrs. Leopold (Mathilde Francoise Pede)",female,29,1,0,228414,26,,S
135,0,2,"Sobey, Mr. Samuel James Hayden",male,25,0,0,C.A. 29178,13,,S
136,0,2,"Richard, Mr. Emile",male,23,0,0,SC/PARIS 2133,15.0458,,C
137,1,1,"Newsom, Miss. Helen Monypeny",female,19,0,2,11752,26.2833,D47,S
138,0,1,"Futrelle, Mr. Jacques Heath",male,37,1,0,113803,53.1,C123,S
139,0,3,"Osen, Mr. Olaf Elon",male,16,0,0,7534,9.2167,,S
140,0,1,"Giglio, Mr. Victor",male,24,0,0,PC 17593,79.2,B86,C
141,0,3,"Boulos, Mrs. Joseph (Sultana)",female,,0,2,2678,15.2458,,C
142,1,3,"Nysten, Miss. Anna Sofia",female,22,0,0,347081,7.75,,S
143,1,3,"Hakkarainen, Mrs. Pekka Pietari (Elin Matilda Dolck)",female,24,1,0,STON/O2. 3101279,15.85,,S
144,0,3,"Burke, Mr. Jeremiah",male,19,0,0,365222,6.75,,Q
145,0,2,"Andrew, Mr. Edgardo Samuel",male,18,0,0,231945,11.5,,S
146,0,2,"Nicholls, Mr. Joseph Charles",male,19,1,1,C.A. 33112,36.75,,S
147,1,3,"Andersson, Mr. August Edvard (""Wennerstrom"")",male,27,0,0,350043,7.7958,,S
148,0,3,"Ford, Miss. Robina Maggie ""Ruby""",female,9,2,2,W./C. 6608,34.375,,S
149,0,2,"Navratil, Mr. Michel (""Louis M Hoffman"")",male,36.5,0,2,230080,26,F2,S
150,0,2,"Byles, Rev. Thomas Roussel Davids",male,42,0,0,244310,13,,S
151,0,2,"Bateman, Rev. Robert James",male,51,0,0,S.O.P. 1166,12.525,,S
152,1,1,"Pears, Mrs. Thomas (Edith Wearne)",female,22,1,0,113776,66.6,C2,S
153,0,3,"Meo, Mr. Alfonzo",male,55.5,0,0,A.5. 11206,8.05,,S
154,0,3,"van Billiard, Mr. Austin Blyler",male,40.5,0,2,A/5. 851,14.5,,S
155,0,3,"Olsen, Mr. Ole Martin",male,,0,0,Fa 265302,7.3125,,S
156,0,1,"Williams, Mr. Charles Duane",male,51,0,1,PC 17597,61.3792,,C
157,1,3,"Gilnagh, Miss. Katherine ""Katie""",female,16,0,0,35851,7.7333,,Q
158,0,3,"Corn, Mr. Harry",male,30,0,0,SOTON/OQ 392090,8.05,,S
159,0,3,"Smiljanic, Mr. Mile",male,,0,0,315037,8.6625,,S
160,0,3,"Sage, Master. Thomas Henry",male,,8,2,CA. 2343,69.55,,S
161,0,3,"Cribb, Mr. John Hatfield",male,44,0,1,371362,16.1,,S
162,1,2,"Watt, Mrs. James (Elizabeth ""Bessie"" Inglis Milne)",female,40,0,0,C.A. 33595,15.75,,S
163,0,3,"Bengtsson, Mr. John Viktor",male,26,0,0,347068,7.775,,S
164,0,3,"Calic, Mr. Jovo",male,17,0,0,315093,8.6625,,S
165,0,3,"Panula, Master. Eino Viljami",male,1,4,1,3101295,39.6875,,S
166,1,3,"Goldsmith, Master. Frank John William ""Frankie""",male,9,0,2,363291,20.525,,S
167,1,1,"Chibnall, Mrs. (Edith Martha Bowerman)",female,,0,1,113505,55,E33,S
168,0,3,"Skoog, Mrs. William (Anna Bernhardina Karlsson)",female,45,1,4,347088,27.9,,S
169,0,1,"Baumann, Mr. John D",male,,0,0,PC 17318,25.925,,S
170,0,3,"Ling, Mr. Lee",male,28,0,0,1601,56.4958,,S
171,0,1,"Van der hoef, Mr. Wyckoff",male,61,0,0,111240,33.5,B19,S
172,0,3,"Rice, Master. Arthur",male,4,4,1,382652,29.125,,Q
173,1,3,"Johnson, Miss. Eleanor Ileen",female,1,1,1,347742,11.1333,,S
174,0,3,"Sivola, Mr. Antti Wilhelm",male,21,0,0,STON/O 2. 3101280,7.925,,S
175,0,1,"Smith, Mr. James Clinch",male,56,0,0,17764,30.6958,A7,C
176,0,3,"Klasen, Mr. Klas Albin",male,18,1,1,350404,7.8542,,S
177,0,3,"Lefebre, Master. Henry Forbes",male,,3,1,4133,25.4667,,S
178,0,1,"Isham, Miss. Ann Elizabeth",female,50,0,0,PC 17595,28.7125,C49,C
179,0,2,"Hale, Mr. Reginald",male,30,0,0,250653,13,,S
180,0,3,"Leonard, Mr. Lionel",male,36,0,0,LINE,0,,S
181,0,3,"Sage, Miss. Constance Gladys",female,,8,2,CA. 2343,69.55,,S
182,0,2,"Pernot, Mr. Rene",male,,0,0,SC/PARIS 2131,15.05,,C
183,0,3,"Asplund, Master. Clarence Gustaf Hugo",male,9,4,2,347077,31.3875,,S
184,1,2,"Becker, Master. Richard F",male,1,2,1,230136,39,F4,S
185,1,3,"Kink-Heilmann, Miss. Luise Gretchen",female,4,0,2,315153,22.025,,S
186,0,1,"Rood, Mr. Hugh Roscoe",male,,0,0,113767,50,A32,S
187,1,3,"O'Brien, Mrs. Thomas (Johanna ""Hannah"" Godfrey)",female,,1,0,370365,15.5,,Q
188,1,1,"Romaine, Mr. Charles Hallace (""Mr C Rolmane"")",male,45,0,0,111428,26.55,,S
189,0,3,"Bourke, Mr. John",male,40,1,1,364849,15.5,,Q
190,0,3,"Turcin, Mr. Stjepan",male,36,0,0,349247,7.8958,,S
191,1,2,"Pinsky, Mrs. (Rosa)",female,32,0,0,234604,13,,S
192,0,2,"Carbines, Mr. William",male,19,0,0,28424,13,,S
193,1,3,"Andersen-Jensen, Miss. Carla Christine Nielsine",female,19,1,0,350046,7.8542,,S
194,1,2,"Navratil, Master. Michel M",male,3,1,1,230080,26,F2,S
195,1,1,"Brown, Mrs. James Joseph (Margaret Tobin)",female,44,0,0,PC 17610,27.7208,B4,C
196,1,1,"Lurette, Miss. Elise",female,58,0,0,PC 17569,146.5208,B80,C
197,0,3,"Mernagh, Mr. Robert",male,,0,0,368703,7.75,,Q
198,0,3,"Olsen, Mr. Karl Siegwart Andreas",male,42,0,1,4579,8.4042,,S
199,1,3,"Madigan, Miss. Margaret ""Maggie""",female,,0,0,370370,7.75,,Q
200,0,2,"Yrois, Miss. Henriette (""Mrs Harbeck"")",female,24,0,0,248747,13,,S
201,0,3,"Vande Walle, Mr. Nestor Cyriel",male,28,0,0,345770,9.5,,S
202,0,3,"Sage, Mr. Frederick",male,,8,2,CA. 2343,69.55,,S
203,0,3,"Johanson, Mr. Jakob Alfred",male,34,0,0,3101264,6.4958,,S
204,0,3,"Youseff, Mr. Gerious",male,45.5,0,0,2628,7.225,,C
205,1,3,"Cohen, Mr. Gurshon ""Gus""",male,18,0,0,A/5 3540,8.05,,S
206,0,3,"Strom, Miss. Telma Matilda",female,2,0,1,347054,10.4625,G6,S
207,0,3,"Backstrom, Mr. Karl Alfred",male,32,1,0,3101278,15.85,,S
208,1,3,"Albimona, Mr. Nassef Cassem",male,26,0,0,2699,18.7875,,C
209,1,3,"Carr, Miss. Helen ""Ellen""",female,16,0,0,367231,7.75,,Q
210,1,1,"Blank, Mr. Henry",male,40,0,0,112277,31,A31,C
211,0,3,"Ali, Mr. Ahmed",male,24,0,0,SOTON/O.Q. 3101311,7.05,,S
212,1,2,"Cameron, Miss. Clear Annie",female,35,0,0,F.C.C. 13528,21,,S
213,0,3,"Perkin, Mr. John Henry",male,22,0,0,A/5 21174,7.25,,S
214,0,2,"Givard, Mr. Hans Kristensen",male,30,0,0,250646,13,,S
215,0,3,"Kiernan, Mr. Philip",male,,1,0,367229,7.75,,Q
216,1,1,"Newell, Miss. Madeleine",female,31,1,0,35273,113.275,D36,C
217,1,3,"Honkanen, Miss. Eliina",female,27,0,0,STON/O2. 3101283,7.925,,S
218,0,2,"Jacobsohn, Mr. Sidney Samuel",male,42,1,0,243847,27,,S
219,1,1,"Bazzani, Miss. Albina",female,32,0,0,11813,76.2917,D15,C
220,0,2,"Harris, Mr. Walter",male,30,0,0,W/C 14208,10.5,,S
221,1,3,"Sunderland, Mr. Victor Francis",male,16,0,0,SOTON/OQ 392089,8.05,,S
222,0,2,"Bracken, Mr. James H",male,27,0,0,220367,13,,S
223,0,3,"Green, Mr. George Henry",male,51,0,0,21440,8.05,,S
224,0,3,"Nenkoff, Mr. Christo",male,,0,0,349234,7.8958,,S
225,1,1,"Hoyt, Mr. Frederick Maxfield",male,38,1,0,19943,90,C93,S
226,0,3,"Berglund, Mr. Karl Ivar Sven",male,22,0,0,PP 4348,9.35,,S
227,1,2,"Mellors, Mr. William John",male,19,0,0,SW/PP 751,10.5,,S
228,0,3,"Lovell, Mr. John Hall (""Henry"")",male,20.5,0,0,A/5 21173,7.25,,S
229,0,2,"Fahlstrom, Mr. Arne Jonas",male,18,0,0,236171,13,,S
230,0,3,"Lefebre, Miss. Mathilde",female,,3,1,4133,25.4667,,S
231,1,1,"Harris, Mrs. Henry Birkhardt (Irene Wallach)",female,35,1,0,36973,83.475,C83,S
232,0,3,"Larsson, Mr. Bengt Edvin",male,29,0,0,347067,7.775,,S
233,0,2,"Sjostedt, Mr. Ernst Adolf",male,59,0,0,237442,13.5,,S
234,1,3,"Asplund, Miss. Lillian Gertrud",female,5,4,2,347077,31.3875,,S
235,0,2,"Leyson, Mr. Robert William Norman",male,24,0,0,C.A. 29566,10.5,,S
236,0,3,"Harknett, Miss. Alice Phoebe",female,,0,0,W./C. 6609,7.55,,S
237,0,2,"Hold, Mr. Stephen",male,44,1,0,26707,26,,S
238,1,2,"Collyer, Miss. Marjorie ""Lottie""",female,8,0,2,C.A. 31921,26.25,,S
239,0,2,"Pengelly, Mr. Frederick William",male,19,0,0,28665,10.5,,S
240,0,2,"Hunt, Mr. George Henry",male,33,0,0,SCO/W 1585,12.275,,S
241,0,3,"Zabour, Miss. Thamine",female,,1,0,2665,14.4542,,C
242,1,3,"Murphy, Miss. Katherine ""Kate""",female,,1,0,367230,15.5,,Q
243,0,2,"Coleridge, Mr. Reginald Charles",male,29,0,0,W./C. 14263,10.5,,S
244,0,3,"Maenpaa, Mr. Matti Alexanteri",male,22,0,0,STON/O 2. 3101275,7.125,,S
245,0,3,"Attalah, Mr. Sleiman",male,30,0,0,2694,7.225,,C
246,0,1,"Minahan, Dr. William Edward",male,44,2,0,19928,90,C78,Q
247,0,3,"Lindahl, Miss. Agda Thorilda Viktoria",female,25,0,0,347071,7.775,,S
248,1,2,"Hamalainen, Mrs. William (Anna)",female,24,0,2,250649,14.5,,S
249,1,1,"Beckwith, Mr. Richard Leonard",male,37,1,1,11751,52.5542,D35,S
250,0,2,"Carter, Rev. Ernest Courtenay",male,54,1,0,244252,26,,S
251,0,3,"Reed, Mr. James George",male,,0,0,362316,7.25,,S
252,0,3,"Strom, Mrs. Wilhelm (Elna Matilda Persson)",female,29,1,1,347054,10.4625,G6,S
253,0,1,"Stead, Mr. William Thomas",male,62,0,0,113514,26.55,C87,S
254,0,3,"Lobb, Mr. William Arthur",male,30,1,0,A/5. 3336,16.1,,S
255,0,3,"Rosblom, Mrs. Viktor (Helena Wilhelmina)",female,41,0,2,370129,20.2125,,S
256,1,3,"Touma, Mrs. Darwis (Hanne Youssef Razi)",female,29,0,2,2650,15.2458,,C
257,1,1,"Thorne, Mrs. Gertrude Maybelle",female,,0,0,PC 17585,79.2,,C
258,1,1,"Cherry, Miss. Gladys",female,30,0,0,110152,86.5,B77,S
259,1,1,"Ward, Miss. Anna",female,35,0,0,PC 17755,512.3292,,C
260,1,2,"Parrish, Mrs. (Lutie Davis)",female,50,0,1,230433,26,,S
261,0,3,"Smith, Mr. Thomas",male,,0,0,384461,7.75,,Q
262,1,3,"Asplund, Master. Edvin Rojj Felix",male,3,4,2,347077,31.3875,,S
263,0,1,"Taussig, Mr. Emil",male,52,1,1,110413,79.65,E67,S
264,0,1,"Harrison, Mr. William",male,40,0,0,112059,0,B94,S
265,0,3,"Henry, Miss. Delia",female,,0,0,382649,7.75,,Q
266,0,2,"Reeves, Mr. David",male,36,0,0,C.A. 17248,10.5,,S
267,0,3,"Panula, Mr. Ernesti Arvid",male,16,4,1,3101295,39.6875,,S
268,1,3,"Persson, Mr. Ernst Ulrik",male,25,1,0,347083,7.775,,S
269,1,1,"Graham, Mrs. William Thompson (Edith Junkins)",female,58,0,1,PC 17582,153.4625,C125,S
270,1,1,"Bissette, Miss. Amelia",female,35,0,0,PC 17760,135.6333,C99,S
271,0,1,"Cairns, Mr. Alexander",male,,0,0,113798,31,,S
272,1,3,"Tornquist, Mr. William Henry",male,25,0,0,LINE,0,,S
273,1,2,"Mellinger, Mrs. (Elizabeth Anne Maidment)",female,41,0,1,250644,19.5,,S
274,0,1,"Natsch, Mr. Charles H",male,37,0,1,PC 17596,29.7,C118,C
275,1,3,"Healy, Miss. Hanora ""Nora""",female,,0,0,370375,7.75,,Q
276,1,1,"Andrews, Miss. Kornelia Theodosia",female,63,1,0,13502,77.9583,D7,S
277,0,3,"Lindblom, Miss. Augusta Charlotta",female,45,0,0,347073,7.75,,S
278,0,2,"Parkes, Mr. Francis ""Frank""",male,,0,0,239853,0,,S
279,0,3,"Rice, Master. Eric",male,7,4,1,382652,29.125,,Q
280,1,3,"Abbott, Mrs. Stanton (Rosa Hunt)",female,35,1,1,C.A. 2673,20.25,,S
281,0,3,"Duane, Mr. Frank",male,65,0,0,336439,7.75,,Q
282,0,3,"Olsson, Mr. Nils Johan Goransson",male,28,0,0,347464,7.8542,,S
283,0,3,"de Pelsmaeker, Mr. Alfons",male,16,0,0,345778,9.5,,S
284,1,3,"Dorking, Mr. Edward Arthur",male,19,0,0,A/5. 10482,8.05,,S
285,0,1,"Smith, Mr. Richard William",male,,0,0,113056,26,A19,S
286,0,3,"Stankovic, Mr. Ivan",male,33,0,0,349239,8.6625,,C
287,1,3,"de Mulder, Mr. Theodore",male,30,0,0,345774,9.5,,S
288,0,3,"Naidenoff, Mr. Penko",male,22,0,0,349206,7.8958,,S
289,1,2,"Hosono, Mr. Masabumi",male,42,0,0,237798,13,,S
290,1,3,"Connolly, Miss. Kate",female,22,0,0,370373,7.75,,Q
291,1,1,"Barber, Miss. Ellen ""Nellie""",female,26,0,0,19877,78.85,,S
292,1,1,"Bishop, Mrs. Dickinson H (Helen Walton)",female,19,1,0,11967,91.0792,B49,C
293,0,2,"Levy, Mr. Rene Jacques",male,36,0,0,SC/Paris 2163,12.875,D,C
294,0,3,"Haas, Miss. Aloisia",female,24,0,0,349236,8.85,,S
295,0,3,"Mineff, Mr. Ivan",male,24,0,0,349233,7.8958,,S
296,0,1,"Lewy, Mr. Ervin G",male,,0,0,PC 17612,27.7208,,C
297,0,3,"Hanna, Mr. Mansour",male,23.5,0,0,2693,7.2292,,C
298,0,1,"Allison, Miss. Helen Loraine",female,2,1,2,113781,151.55,C22 C26,S
299,1,1,"Saalfeld, Mr. Adolphe",male,,0,0,19988,30.5,C106,S
300,1,1,"Baxter, Mrs. James (Helene DeLaudeniere Chaput)",female,50,0,1,PC 17558,247.5208,B58 B60,C
301,1,3,"Kelly, Miss. Anna Katherine ""Annie Kate""",female,,0,0,9234,7.75,,Q
302,1,3,"McCoy, Mr. Bernard",male,,2,0,367226,23.25,,Q
303,0,3,"Johnson, Mr. William Cahoone Jr",male,19,0,0,LINE,0,,S
304,1,2,"Keane, Miss. Nora A",female,,0,0,226593,12.35,E101,Q
305,0,3,"Williams, Mr. Howard Hugh ""Harry""",male,,0,0,A/5 2466,8.05,,S
306,1,1,"Allison, Master. Hudson Trevor",male,0.92,1,2,113781,151.55,C22 C26,S
307,1,1,"Fleming, Miss. Margaret",female,,0,0,17421,110.8833,,C
308,1,1,"Penasco y Castellana, Mrs. Victor de Satode (Maria Josefa Perez de Soto y Vallejo)",female,17,1,0,PC 17758,108.9,C65,C
309,0,2,"Abelson, Mr. Samuel",male,30,1,0,P/PP 3381,24,,C
310,1,1,"Francatelli, Miss. Laura Mabel",female,30,0,0,PC 17485,56.9292,E36,C
311,1,1,"Hays, Miss. Margaret Bechstein",female,24,0,0,11767,83.1583,C54,C
312,1,1,"Ryerson, Miss. Emily Borie",female,18,2,2,PC 17608,262.375,B57 B59 B63 B66,C
313,0,2,"Lahtinen, Mrs. William (Anna Sylfven)",female,26,1,1,250651,26,,S
314,0,3,"Hendekovic, Mr. Ignjac",male,28,0,0,349243,7.8958,,S
315,0,2,"Hart, Mr. Benjamin",male,43,1,1,F.C.C. 13529,26.25,,S
316,1,3,"Nilsson, Miss. Helmina Josefina",female,26,0,0,347470,7.8542,,S
317,1,2,"Kantor, Mrs. Sinai (Miriam Sternin)",female,24,1,0,244367,26,,S
318,0,2,"Moraweck, Dr. Ernest",male,54,0,0,29011,14,,S
319,1,1,"Wick, Miss. Mary Natalie",female,31,0,2,36928,164.8667,C7,S
320,1,1,"Spedden, Mrs. Frederic Oakley (Margaretta Corning Stone)",female,40,1,1,16966,134.5,E34,C
321,0,3,"Dennis, Mr. Samuel",male,22,0,0,A/5 21172,7.25,,S
322,0,3,"Danoff, Mr. Yoto",male,27,0,0,349219,7.8958,,S
323,1,2,"Slayter, Miss. Hilda Mary",female,30,0,0,234818,12.35,,Q
324,1,2,"Caldwell, Mrs. Albert Francis (Sylvia Mae Harbaugh)",female,22,1,1,248738,29,,S
325,0,3,"Sage, Mr. George John Jr",male,,8,2,CA. 2343,69.55,,S
326,1,1,"Young, Miss. Marie Grice",female,36,0,0,PC 17760,135.6333,C32,C
327,0,3,"Nysveen, Mr. Johan Hansen",male,61,0,0,345364,6.2375,,S
328,1,2,"Ball, Mrs. (Ada E Hall)",female,36,0,0,28551,13,D,S
329,1,3,"Goldsmith, Mrs. Frank John (Emily Alice Brown)",female,31,1,1,363291,20.525,,S
330,1,1,"Hippach, Miss. Jean Gertrude",female,16,0,1,111361,57.9792,B18,C
331,1,3,"McCoy, Miss. Agnes",female,,2,0,367226,23.25,,Q
332,0,1,"Partner, Mr. Austen",male,45.5,0,0,113043,28.5,C124,S
333,0,1,"Graham, Mr. George Edward",male,38,0,1,PC 17582,153.4625,C91,S
334,0,3,"Vander Planke, Mr. Leo Edmondus",male,16,2,0,345764,18,,S
335,1,1,"Frauenthal, Mrs. Henry William (Clara Heinsheimer)",female,,1,0,PC 17611,133.65,,S
336,0,3,"Denkoff, Mr. Mitto",male,,0,0,349225,7.8958,,S
337,0,1,"Pears, Mr. Thomas Clinton",male,29,1,0,113776,66.6,C2,S
338,1,1,"Burns, Miss. Elizabeth Margaret",female,41,0,0,16966,134.5,E40,C
339,1,3,"Dahl, Mr. Karl Edwart",male,45,0,0,7598,8.05,,S
340,0,1,"Blackwell, Mr. Stephen Weart",male,45,0,0,113784,35.5,T,S
341,1,2,"Navratil, Master. Edmond Roger",male,2,1,1,230080,26,F2,S
342,1,1,"Fortune, Miss. Alice Elizabeth",female,24,3,2,19950,263,C23 C25 C27,S
343,0,2,"Collander, Mr. Erik Gustaf",male,28,0,0,248740,13,,S
344,0,2,"Sedgwick, Mr. Charles Frederick Waddington",male,25,0,0,244361,13,,S
345,0,2,"Fox, Mr. Stanley Hubert",male,36,0,0,229236,13,,S
346,1,2,"Brown, Miss. Amelia ""Mildred""",female,24,0,0,248733,13,F33,S
347,1,2,"Smith, Miss. Marion Elsie",female,40,0,0,31418,13,,S
348,1,3,"Davison, Mrs. Thomas Henry (Mary E Finck)",female,,1,0,386525,16.1,,S
349,1,3,"Coutts, Master. William Loch ""William""",male,3,1,1,C.A. 37671,15.9,,S
350,0,3,"Dimic, Mr. Jovan",male,42,0,0,315088,8.6625,,S
351,0,3,"Odahl, Mr. Nils Martin",male,23,0,0,7267,9.225,,S
352,0,1,"Williams-Lambert, Mr. Fletcher Fellows",male,,0,0,113510,35,C128,S
353,0,3,"Elias, Mr. Tannous",male,15,1,1,2695,7.2292,,C
354,0,3,"Arnold-Franchi, Mr. Josef",male,25,1,0,349237,17.8,,S
355,0,3,"Yousif, Mr. Wazli",male,,0,0,2647,7.225,,C
356,0,3,"Vanden Steen, Mr. Leo Peter",male,28,0,0,345783,9.5,,S
357,1,1,"Bowerman, Miss. Elsie Edith",female,22,0,1,113505,55,E33,S
358,0,2,"Funk, Miss. Annie Clemmer",female,38,0,0,237671,13,,S
359,1,3,"McGovern, Miss. Mary",female,,0,0,330931,7.8792,,Q
360,1,3,"Mockler, Miss. Helen Mary ""Ellie""",female,,0,0,330980,7.8792,,Q
361,0,3,"Skoog, Mr. Wilhelm",male,40,1,4,347088,27.9,,S
362,0,2,"del Carlo, Mr. Sebastiano",male,29,1,0,SC/PARIS 2167,27.7208,,C
363,0,3,"Barbara, Mrs. (Catherine David)",female,45,0,1,2691,14.4542,,C
364,0,3,"Asim, Mr. Adola",male,35,0,0,SOTON/O.Q. 3101310,7.05,,S
365,0,3,"O'Brien, Mr. Thomas",male,,1,0,370365,15.5,,Q
366,0,3,"Adahl, Mr. Mauritz Nils Martin",male,30,0,0,C 7076,7.25,,S
367,1,1,"Warren, Mrs. Frank Manley (Anna Sophia Atkinson)",female,60,1,0,110813,75.25,D37,C
368,1,3,"Moussa, Mrs. (Mantoura Boulos)",female,,0,0,2626,7.2292,,C
369,1,3,"Jermyn, Miss. Annie",female,,0,0,14313,7.75,,Q
370,1,1,"Aubart, Mme. Leontine Pauline",female,24,0,0,PC 17477,69.3,B35,C
371,1,1,"Harder, Mr. George Achilles",male,25,1,0,11765,55.4417,E50,C
372,0,3,"Wiklund, Mr. Jakob Alfred",male,18,1,0,3101267,6.4958,,S
373,0,3,"Beavan, Mr. William Thomas",male,19,0,0,323951,8.05,,S
374,0,1,"Ringhini, Mr. Sante",male,22,0,0,PC 17760,135.6333,,C
375,0,3,"Palsson, Miss. Stina Viola",female,3,3,1,349909,21.075,,S
376,1,1,"Meyer, Mrs. Edgar Joseph (Leila Saks)",female,,1,0,PC 17604,82.1708,,C
377,1,3,"Landergren, Miss. Aurora Adelia",female,22,0,0,C 7077,7.25,,S
378,0,1,"Widener, Mr. Harry Elkins",male,27,0,2,113503,211.5,C82,C
379,0,3,"Betros, Mr. Tannous",male,20,0,0,2648,4.0125,,C
380,0,3,"Gustafsson, Mr. Karl Gideon",male,19,0,0,347069,7.775,,S
381,1,1,"Bidois, Miss. Rosalie",female,42,0,0,PC 17757,227.525,,C
382,1,3,"Nakid, Miss. Maria (""Mary"")",female,1,0,2,2653,15.7417,,C
383,0,3,"Tikkanen, Mr. Juho",male,32,0,0,STON/O 2. 3101293,7.925,,S
384,1,1,"Holverson, Mrs. Alexander Oskar (Mary Aline Towner)",female,35,1,0,113789,52,,S
385,0,3,"Plotcharsky, Mr. Vasil",male,,0,0,349227,7.8958,,S
386,0,2,"Davies, Mr. Charles Henry",male,18,0,0,S.O.C. 14879,73.5,,S
387,0,3,"Goodwin, Master. Sidney Leonard",male,1,5,2,CA 2144,46.9,,S
388,1,2,"Buss, Miss. Kate",female,36,0,0,27849,13,,S
389,0,3,"Sadlier, Mr. Matthew",male,,0,0,367655,7.7292,,Q
390,1,2,"Lehmann, Miss. Bertha",female,17,0,0,SC 1748,12,,C
391,1,1,"Carter, Mr. William Ernest",male,36,1,2,113760,120,B96 B98,S
392,1,3,"Jansson, Mr. Carl Olof",male,21,0,0,350034,7.7958,,S
393,0,3,"Gustafsson, Mr. Johan Birger",male,28,2,0,3101277,7.925,,S
394,1,1,"Newell, Miss. Marjorie",female,23,1,0,35273,113.275,D36,C
395,1,3,"Sandstrom, Mrs. Hjalmar (Agnes Charlotta Bengtsson)",female,24,0,2,PP 9549,16.7,G6,S
396,0,3,"Johansson, Mr. Erik",male,22,0,0,350052,7.7958,,S
397,0,3,"Olsson, Miss. Elina",female,31,0,0,350407,7.8542,,S
398,0,2,"McKane, Mr. Peter David",male,46,0,0,28403,26,,S
399,0,2,"Pain, Dr. Alfred",male,23,0,0,244278,10.5,,S
400,1,2,"Trout, Mrs. William H (Jessie L)",female,28,0,0,240929,12.65,,S
401,1,3,"Niskanen, Mr. Juha",male,39,0,0,STON/O 2. 3101289,7.925,,S
402,0,3,"Adams, Mr. John",male,26,0,0,341826,8.05,,S
403,0,3,"Jussila, Miss. Mari Aina",female,21,1,0,4137,9.825,,S
404,0,3,"Hakkarainen, Mr. Pekka Pietari",male,28,1,0,STON/O2. 3101279,15.85,,S
405,0,3,"Oreskovic, Miss. Marija",female,20,0,0,315096,8.6625,,S
406,0,2,"Gale, Mr. Shadrach",male,34,1,0,28664,21,,S
407,0,3,"Widegren, Mr. Carl/Charles Peter",male,51,0,0,347064,7.75,,S
408,1,2,"Richards, Master. William Rowe",male,3,1,1,29106,18.75,,S
409,0,3,"Birkeland, Mr. Hans Martin Monsen",male,21,0,0,312992,7.775,,S
410,0,3,"Lefebre, Miss. Ida",female,,3,1,4133,25.4667,,S
411,0,3,"Sdycoff, Mr. Todor",male,,0,0,349222,7.8958,,S
412,0,3,"Hart, Mr. Henry",male,,0,0,394140,6.8583,,Q
413,1,1,"Minahan, Miss. Daisy E",female,33,1,0,19928,90,C78,Q
414,0,2,"Cunningham, Mr. Alfred Fleming",male,,0,0,239853,0,,S
415,1,3,"Sundman, Mr. Johan Julian",male,44,0,0,STON/O 2. 3101269,7.925,,S
416,0,3,"Meek, Mrs. Thomas (Annie Louise Rowley)",female,,0,0,343095,8.05,,S
417,1,2,"Drew, Mrs. James Vivian (Lulu Thorne Christian)",female,34,1,1,28220,32.5,,S
418,1,2,"Silven, Miss. Lyyli Karoliina",female,18,0,2,250652,13,,S
419,0,2,"Matthews, Mr. William John",male,30,0,0,28228,13,,S
420,0,3,"Van Impe, Miss. Catharina",female,10,0,2,345773,24.15,,S
421,0,3,"Gheorgheff, Mr. Stanio",male,,0,0,349254,7.8958,,C
422,0,3,"Charters, Mr. David",male,21,0,0,A/5. 13032,7.7333,,Q
423,0,3,"Zimmerman, Mr. Leo",male,29,0,0,315082,7.875,,S
424,0,3,"Danbom, Mrs. Ernst Gilbert (Anna Sigrid Maria Brogren)",female,28,1,1,347080,14.4,,S
425,0,3,"Rosblom, Mr. Viktor Richard",male,18,1,1,370129,20.2125,,S
426,0,3,"Wiseman, Mr. Phillippe",male,,0,0,A/4. 34244,7.25,,S
427,1,2,"Clarke, Mrs. Charles V (Ada Maria Winfield)",female,28,1,0,2003,26,,S
428,1,2,"Phillips, Miss. Kate Florence (""Mrs Kate Louise Phillips Marshall"")",female,19,0,0,250655,26,,S
429,0,3,"Flynn, Mr. James",male,,0,0,364851,7.75,,Q
430,1,3,"Pickard, Mr. Berk (Berk Trembisky)",male,32,0,0,SOTON/O.Q. 392078,8.05,E10,S
431,1,1,"Bjornstrom-Steffansson, Mr. Mauritz Hakan",male,28,0,0,110564,26.55,C52,S
432,1,3,"Thorneycroft, Mrs. Percival (Florence Kate White)",female,,1,0,376564,16.1,,S
433,1,2,"Louch, Mrs. Charles Alexander (Alice Adelaide Slow)",female,42,1,0,SC/AH 3085,26,,S
434,0,3,"Kallio, Mr. Nikolai Erland",male,17,0,0,STON/O 2. 3101274,7.125,,S
435,0,1,"Silvey, Mr. William Baird",male,50,1,0,13507,55.9,E44,S
436,1,1,"Carter, Miss. Lucile Polk",female,14,1,2,113760,120,B96 B98,S
437,0,3,"Ford, Miss. Doolina Margaret ""Daisy""",female,21,2,2,W./C. 6608,34.375,,S
438,1,2,"Richards, Mrs. Sidney (Emily Hocking)",female,24,2,3,29106,18.75,,S
439,0,1,"Fortune, Mr. Mark",male,64,1,4,19950,263,C23 C25 C27,S
440,0,2,"Kvillner, Mr. Johan Henrik Johannesson",male,31,0,0,C.A. 18723,10.5,,S
441,1,2,"Hart, Mrs. Benjamin (Esther Ada Bloomfield)",female,45,1,1,F.C.C. 13529,26.25,,S
442,0,3,"Hampe, Mr. Leon",male,20,0,0,345769,9.5,,S
443,0,3,"Petterson, Mr. Johan Emil",male,25,1,0,347076,7.775,,S
444,1,2,"Reynaldo, Ms. Encarnacion",female,28,0,0,230434,13,,S
445,1,3,"Johannesen-Bratthammer, Mr. Bernt",male,,0,0,65306,8.1125,,S
446,1,1,"Dodge, Master. Washington",male,4,0,2,33638,81.8583,A34,S
447,1,2,"Mellinger, Miss. Madeleine Violet",female,13,0,1,250644,19.5,,S
448,1,1,"Seward, Mr. Frederic Kimber",male,34,0,0,113794,26.55,,S
449,1,3,"Baclini, Miss. Marie Catherine",female,5,2,1,2666,19.2583,,C
450,1,1,"Peuchen, Major. Arthur Godfrey",male,52,0,0,113786,30.5,C104,S
451,0,2,"West, Mr. Edwy Arthur",male,36,1,2,C.A. 34651,27.75,,S
452,0,3,"Hagland, Mr. Ingvald Olai Olsen",male,,1,0,65303,19.9667,,S
453,0,1,"Foreman, Mr. Benjamin Laventall",male,30,0,0,113051,27.75,C111,C
454,1,1,"Goldenberg, Mr. Samuel L",male,49,1,0,17453,89.1042,C92,C
455,0,3,"Peduzzi, Mr. Joseph",male,,0,0,A/5 2817,8.05,,S
456,1,3,"Jalsevac, Mr. Ivan",male,29,0,0,349240,7.8958,,C
457,0,1,"Millet, Mr. Francis Davis",male,65,0,0,13509,26.55,E38,S
458,1,1,"Kenyon, Mrs. Frederick R (Marion)",female,,1,0,17464,51.8625,D21,S
459,1,2,"Toomey, Miss. Ellen",female,50,0,0,F.C.C. 13531,10.5,,S
460,0,3,"O'Connor, Mr. Maurice",male,,0,0,371060,7.75,,Q
461,1,1,"Anderson, Mr. Harry",male,48,0,0,19952,26.55,E12,S
462,0,3,"Morley, Mr. William",male,34,0,0,364506,8.05,,S
463,0,1,"Gee, Mr. Arthur H",male,47,0,0,111320,38.5,E63,S
464,0,2,"Milling, Mr. Jacob Christian",male,48,0,0,234360,13,,S
465,0,3,"Maisner, Mr. Simon",male,,0,0,A/S 2816,8.05,,S
466,0,3,"Goncalves, Mr. Manuel Estanslas",male,38,0,0,SOTON/O.Q. 3101306,7.05,,S
467,0,2,"Campbell, Mr. William",male,,0,0,239853,0,,S
468,0,1,"Smart, Mr. John Montgomery",male,56,0,0,113792,26.55,,S
469,0,3,"Scanlan, Mr. James",male,,0,0,36209,7.725,,Q
470,1,3,"Baclini, Miss. Helene Barbara",female,0.75,2,1,2666,19.2583,,C
471,0,3,"Keefe, Mr. Arthur",male,,0,0,323592,7.25,,S
472,0,3,"Cacic, Mr. Luka",male,38,0,0,315089,8.6625,,S
473,1,2,"West, Mrs. Edwy Arthur (Ada Mary Worth)",female,33,1,2,C.A. 34651,27.75,,S
474,1,2,"Jerwan, Mrs. Amin S (Marie Marthe Thuillard)",female,23,0,0,SC/AH Basle 541,13.7917,D,C
475,0,3,"Strandberg, Miss. Ida Sofia",female,22,0,0,7553,9.8375,,S
476,0,1,"Clifford, Mr. George Quincy",male,,0,0,110465,52,A14,S
477,0,2,"Renouf, Mr. Peter Henry",male,34,1,0,31027,21,,S
478,0,3,"Braund, Mr. Lewis Richard",male,29,1,0,3460,7.0458,,S
479,0,3,"Karlsson, Mr. Nils August",male,22,0,0,350060,7.5208,,S
480,1,3,"Hirvonen, Miss. Hildur E",female,2,0,1,3101298,12.2875,,S
481,0,3,"Goodwin, Master. Harold Victor",male,9,5,2,CA 2144,46.9,,S
482,0,2,"Frost, Mr. Anthony Wood ""Archie""",male,,0,0,239854,0,,S
483,0,3,"Rouse, Mr. Richard Henry",male,50,0,0,A/5 3594,8.05,,S
484,1,3,"Turkula, Mrs. (Hedwig)",female,63,0,0,4134,9.5875,,S
485,1,1,"Bishop, Mr. Dickinson H",male,25,1,0,11967,91.0792,B49,C
486,0,3,"Lefebre, Miss. Jeannie",female,,3,1,4133,25.4667,,S
487,1,1,"Hoyt, Mrs. Frederick Maxfield (Jane Anne Forby)",female,35,1,0,19943,90,C93,S
488,0,1,"Kent, Mr. Edward Austin",male,58,0,0,11771,29.7,B37,C
489,0,3,"Somerton, Mr. Francis William",male,30,0,0,A.5. 18509,8.05,,S
490,1,3,"Coutts, Master. Eden Leslie ""Neville""",male,9,1,1,C.A. 37671,15.9,,S
491,0,3,"Hagland, Mr. Konrad Mathias Reiersen",male,,1,0,65304,19.9667,,S
492,0,3,"Windelov, Mr. Einar",male,21,0,0,SOTON/OQ 3101317,7.25,,S
493,0,1,"Molson, Mr. Harry Markland",male,55,0,0,113787,30.5,C30,S
494,0,1,"Artagaveytia, Mr. Ramon",male,71,0,0,PC 17609,49.5042,,C
495,0,3,"Stanley, Mr. Edward Roland",male,21,0,0,A/4 45380,8.05,,S
496,0,3,"Yousseff, Mr. Gerious",male,,0,0,2627,14.4583,,C
497,1,1,"Eustis, Miss. Elizabeth Mussey",female,54,1,0,36947,78.2667,D20,C
498,0,3,"Shellard, Mr. Frederick William",male,,0,0,C.A. 6212,15.1,,S
499,0,1,"Allison, Mrs. Hudson J C (Bessie Waldo Daniels)",female,25,1,2,113781,151.55,C22 C26,S
500,0,3,"Svensson, Mr. Olof",male,24,0,0,350035,7.7958,,S
501,0,3,"Calic, Mr. Petar",male,17,0,0,315086,8.6625,,S
502,0,3,"Canavan, Miss. Mary",female,21,0,0,364846,7.75,,Q
503,0,3,"O'Sullivan, Miss. Bridget Mary",female,,0,0,330909,7.6292,,Q
504,0,3,"Laitinen, Miss. Kristina Sofia",female,37,0,0,4135,9.5875,,S
505,1,1,"Maioni, Miss. Roberta",female,16,0,0,110152,86.5,B79,S
506,0,1,"Penasco y Castellana, Mr. Victor de Satode",male,18,1,0,PC 17758,108.9,C65,C
507,1,2,"Quick, Mrs. Frederick Charles (Jane Richards)",female,33,0,2,26360,26,,S
508,1,1,"Bradley, Mr. George (""George Arthur Brayton"")",male,,0,0,111427,26.55,,S
509,0,3,"Olsen, Mr. Henry Margido",male,28,0,0,C 4001,22.525,,S
510,1,3,"Lang, Mr. Fang",male,26,0,0,1601,56.4958,,S
511,1,3,"Daly, Mr. Eugene Patrick",male,29,0,0,382651,7.75,,Q
512,0,3,"Webber, Mr. James",male,,0,0,SOTON/OQ 3101316,8.05,,S
513,1,1,"McGough, Mr. James Robert",male,36,0,0,PC 17473,26.2875,E25,S
514,1,1,"Rothschild, Mrs. Martin (Elizabeth L. Barrett)",female,54,1,0,PC 17603,59.4,,C
515,0,3,"Coleff, Mr. Satio",male,24,0,0,349209,7.4958,,S
516,0,1,"Walker, Mr. William Anderson",male,47,0,0,36967,34.0208,D46,S
517,1,2,"Lemore, Mrs. (Amelia Milley)",female,34,0,0,C.A. 34260,10.5,F33,S
518,0,3,"Ryan, Mr. Patrick",male,,0,0,371110,24.15,,Q
519,1,2,"Angle, Mrs. William A (Florence ""Mary"" Agnes Hughes)",female,36,1,0,226875,26,,S
520,0,3,"Pavlovic, Mr. Stefo",male,32,0,0,349242,7.8958,,S
521,1,1,"Perreault, Miss. Anne",female,30,0,0,12749,93.5,B73,S
522,0,3,"Vovk, Mr. Janko",male,22,0,0,349252,7.8958,,S
523,0,3,"Lahoud, Mr. Sarkis",male,,0,0,2624,7.225,,C
524,1,1,"Hippach, Mrs. Louis Albert (Ida Sophia Fischer)",female,44,0,1,111361,57.9792,B18,C
525,0,3,"Kassem, Mr. Fared",male,,0,0,2700,7.2292,,C
526,0,3,"Farrell, Mr. James",male,40.5,0,0,367232,7.75,,Q
527,1,2,"Ridsdale, Miss. Lucy",female,50,0,0,W./C. 14258,10.5,,S
528,0,1,"Farthing, Mr. John",male,,0,0,PC 17483,221.7792,C95,S
529,0,3,"Salonen, Mr. Johan Werner",male,39,0,0,3101296,7.925,,S
530,0,2,"Hocking, Mr. Richard George",male,23,2,1,29104,11.5,,S
531,1,2,"Quick, Miss. Phyllis May",female,2,1,1,26360,26,,S
532,0,3,"Toufik, Mr. Nakli",male,,0,0,2641,7.2292,,C
533,0,3,"Elias, Mr. Joseph Jr",male,17,1,1,2690,7.2292,,C
534,1,3,"Peter, Mrs. Catherine (Catherine Rizk)",female,,0,2,2668,22.3583,,C
535,0,3,"Cacic, Miss. Marija",female,30,0,0,315084,8.6625,,S
536,1,2,"Hart, Miss. Eva Miriam",female,7,0,2,F.C.C. 13529,26.25,,S
537,0,1,"Butt, Major. Archibald Willingham",male,45,0,0,113050,26.55,B38,S
538,1,1,"LeRoy, Miss. Bertha",female,30,0,0,PC 17761,106.425,,C
539,0,3,"Risien, Mr. Samuel Beard",male,,0,0,364498,14.5,,S
540,1,1,"Frolicher, Miss. Hedwig Margaritha",female,22,0,2,13568,49.5,B39,C
541,1,1,"Crosby, Miss. Harriet R",female,36,0,2,WE/P 5735,71,B22,S
542,0,3,"Andersson, Miss. Ingeborg Constanzia",female,9,4,2,347082,31.275,,S
543,0,3,"Andersson, Miss. Sigrid Elisabeth",female,11,4,2,347082,31.275,,S
544,1,2,"Beane, Mr. Edward",male,32,1,0,2908,26,,S
545,0,1,"Douglas, Mr. Walter Donald",male,50,1,0,PC 17761,106.425,C86,C
546,0,1,"Nicholson, Mr. Arthur Ernest",male,64,0,0,693,26,,S
547,1,2,"Beane, Mrs. Edward (Ethel Clarke)",female,19,1,0,2908,26,,S
548,1,2,"Padro y Manent, Mr. Julian",male,,0,0,SC/PARIS 2146,13.8625,,C
549,0,3,"Goldsmith, Mr. Frank John",male,33,1,1,363291,20.525,,S
550,1,2,"Davies, Master. John Morgan Jr",male,8,1,1,C.A. 33112,36.75,,S
551,1,1,"Thayer, Mr. John Borland Jr",male,17,0,2,17421,110.8833,C70,C
552,0,2,"Sharp, Mr. Percival James R",male,27,0,0,244358,26,,S
553,0,3,"O'Brien, Mr. Timothy",male,,0,0,330979,7.8292,,Q
554,1,3,"Leeni, Mr. Fahim (""Philip Zenni"")",male,22,0,0,2620,7.225,,C
555,1,3,"Ohman, Miss. Velin",female,22,0,0,347085,7.775,,S
556,0,1,"Wright, Mr. George",male,62,0,0,113807,26.55,,S
557,1,1,"Duff Gordon, Lady. (Lucille Christiana Sutherland) (""Mrs Morgan"")",female,48,1,0,11755,39.6,A16,C
558,0,1,"Robbins, Mr. Victor",male,,0,0,PC 17757,227.525,,C
559,1,1,"Taussig, Mrs. Emil (Tillie Mandelbaum)",female,39,1,1,110413,79.65,E67,S
560,1,3,"de Messemaeker, Mrs. Guillaume Joseph (Emma)",female,36,1,0,345572,17.4,,S
561,0,3,"Morrow, Mr. Thomas Rowan",male,,0,0,372622,7.75,,Q
562,0,3,"Sivic, Mr. Husein",male,40,0,0,349251,7.8958,,S
563,0,2,"Norman, Mr. Robert Douglas",male,28,0,0,218629,13.5,,S
564,0,3,"Simmons, Mr. John",male,,0,0,SOTON/OQ 392082,8.05,,S
565,0,3,"Meanwell, Miss. (Marion Ogden)",female,,0,0,SOTON/O.Q. 392087,8.05,,S
566,0,3,"Davies, Mr. Alfred J",male,24,2,0,A/4 48871,24.15,,S
567,0,3,"Stoytcheff, Mr. Ilia",male,19,0,0,349205,7.8958,,S
568,0,3,"Palsson, Mrs. Nils (Alma Cornelia Berglund)",female,29,0,4,349909,21.075,,S
569,0,3,"Doharr, Mr. Tannous",male,,0,0,2686,7.2292,,C
570,1,3,"Jonsson, Mr. Carl",male,32,0,0,350417,7.8542,,S
571,1,2,"Harris, Mr. George",male,62,0,0,S.W./PP 752,10.5,,S
572,1,1,"Appleton, Mrs. Edward Dale (Charlotte Lamson)",female,53,2,0,11769,51.4792,C101,S
573,1,1,"Flynn, Mr. John Irwin (""Irving"")",male,36,0,0,PC 17474,26.3875,E25,S
574,1,3,"Kelly, Miss. Mary",female,,0,0,14312,7.75,,Q
575,0,3,"Rush, Mr. Alfred George John",male,16,0,0,A/4. 20589,8.05,,S
576,0,3,"Patchett, Mr. George",male,19,0,0,358585,14.5,,S
577,1,2,"Garside, Miss. Ethel",female,34,0,0,243880,13,,S
578,1,1,"Silvey, Mrs. William Baird (Alice Munger)",female,39,1,0,13507,55.9,E44,S
579,0,3,"Caram, Mrs. Joseph (Maria Elias)",female,,1,0,2689,14.4583,,C
580,1,3,"Jussila, Mr. Eiriik",male,32,0,0,STON/O 2. 3101286,7.925,,S
581,1,2,"Christy, Miss. Julie Rachel",female,25,1,1,237789,30,,S
582,1,1,"Thayer, Mrs. John Borland (Marian Longstreth Morris)",female,39,1,1,17421,110.8833,C68,C
583,0,2,"Downton, Mr. William James",male,54,0,0,28403,26,,S
584,0,1,"Ross, Mr. John Hugo",male,36,0,0,13049,40.125,A10,C
585,0,3,"Paulner, Mr. Uscher",male,,0,0,3411,8.7125,,C
586,1,1,"Taussig, Miss. Ruth",female,18,0,2,110413,79.65,E68,S
587,0,2,"Jarvis, Mr. John Denzil",male,47,0,0,237565,15,,S
588,1,1,"Frolicher-Stehli, Mr. Maxmillian",male,60,1,1,13567,79.2,B41,C
589,0,3,"Gilinski, Mr. Eliezer",male,22,0,0,14973,8.05,,S
590,0,3,"Murdlin, Mr. Joseph",male,,0,0,A./5. 3235,8.05,,S
591,0,3,"Rintamaki, Mr. Matti",male,35,0,0,STON/O 2. 3101273,7.125,,S
592,1,1,"Stephenson, Mrs. Walter Bertram (Martha Eustis)",female,52,1,0,36947,78.2667,D20,C
593,0,3,"Elsbury, Mr. William James",male,47,0,0,A/5 3902,7.25,,S
594,0,3,"Bourke, Miss. Mary",female,,0,2,364848,7.75,,Q
595,0,2,"Chapman, Mr. John Henry",male,37,1,0,SC/AH 29037,26,,S
596,0,3,"Van Impe, Mr. Jean Baptiste",male,36,1,1,345773,24.15,,S
597,1,2,"Leitch, Miss. Jessie Wills",female,,0,0,248727,33,,S
598,0,3,"Johnson, Mr. Alfred",male,49,0,0,LINE,0,,S
599,0,3,"Boulos, Mr. Hanna",male,,0,0,2664,7.225,,C
600,1,1,"Duff Gordon, Sir. Cosmo Edmund (""Mr Morgan"")",male,49,1,0,PC 17485,56.9292,A20,C
601,1,2,"Jacobsohn, Mrs. Sidney Samuel (Amy Frances Christy)",female,24,2,1,243847,27,,S
602,0,3,"Slabenoff, Mr. Petco",male,,0,0,349214,7.8958,,S
603,0,1,"Harrington, Mr. Charles H",male,,0,0,113796,42.4,,S
604,0,3,"Torber, Mr. Ernst William",male,44,0,0,364511,8.05,,S
605,1,1,"Homer, Mr. Harry (""Mr E Haven"")",male,35,0,0,111426,26.55,,C
606,0,3,"Lindell, Mr. Edvard Bengtsson",male,36,1,0,349910,15.55,,S
607,0,3,"Karaic, Mr. Milan",male,30,0,0,349246,7.8958,,S
608,1,1,"Daniel, Mr. Robert Williams",male,27,0,0,113804,30.5,,S
609,1,2,"Laroche, Mrs. Joseph (Juliette Marie Louise Lafargue)",female,22,1,2,SC/Paris 2123,41.5792,,C
610,1,1,"Shutes, Miss. Elizabeth W",female,40,0,0,PC 17582,153.4625,C125,S
611,0,3,"Andersson, Mrs. Anders Johan (Alfrida Konstantia Brogren)",female,39,1,5,347082,31.275,,S
612,0,3,"Jardin, Mr. Jose Neto",male,,0,0,SOTON/O.Q. 3101305,7.05,,S
613,1,3,"Murphy, Miss. Margaret Jane",female,,1,0,367230,15.5,,Q
614,0,3,"Horgan, Mr. John",male,,0,0,370377,7.75,,Q
615,0,3,"Brocklebank, Mr. William Alfred",male,35,0,0,364512,8.05,,S
616,1,2,"Herman, Miss. Alice",female,24,1,2,220845,65,,S
617,0,3,"Danbom, Mr. Ernst Gilbert",male,34,1,1,347080,14.4,,S
618,0,3,"Lobb, Mrs. William Arthur (Cordelia K Stanlick)",female,26,1,0,A/5. 3336,16.1,,S
619,1,2,"Becker, Miss. Marion Louise",female,4,2,1,230136,39,F4,S
620,0,2,"Gavey, Mr. Lawrence",male,26,0,0,31028,10.5,,S
621,0,3,"Yasbeck, Mr. Antoni",male,27,1,0,2659,14.4542,,C
622,1,1,"Kimball, Mr. Edwin Nelson Jr",male,42,1,0,11753,52.5542,D19,S
623,1,3,"Nakid, Mr. Sahid",male,20,1,1,2653,15.7417,,C
624,0,3,"Hansen, Mr. Henry Damsgaard",male,21,0,0,350029,7.8542,,S
625,0,3,"Bowen, Mr. David John ""Dai""",male,21,0,0,54636,16.1,,S
626,0,1,"Sutton, Mr. Frederick",male,61,0,0,36963,32.3208,D50,S
627,0,2,"Kirkland, Rev. Charles Leonard",male,57,0,0,219533,12.35,,Q
628,1,1,"Longley, Miss. Gretchen Fiske",female,21,0,0,13502,77.9583,D9,S
629,0,3,"Bostandyeff, Mr. Guentcho",male,26,0,0,349224,7.8958,,S
630,0,3,"O'Connell, Mr. Patrick D",male,,0,0,334912,7.7333,,Q
631,1,1,"Barkworth, Mr. Algernon Henry Wilson",male,80,0,0,27042,30,A23,S
632,0,3,"Lundahl, Mr. Johan Svensson",male,51,0,0,347743,7.0542,,S
633,1,1,"Stahelin-Maeglin, Dr. Max",male,32,0,0,13214,30.5,B50,C
634,0,1,"Parr, Mr. William Henry Marsh",male,,0,0,112052,0,,S
635,0,3,"Skoog, Miss. Mabel",female,9,3,2,347088,27.9,,S
636,1,2,"Davis, Miss. Mary",female,28,0,0,237668,13,,S
637,0,3,"Leinonen, Mr. Antti Gustaf",male,32,0,0,STON/O 2. 3101292,7.925,,S
638,0,2,"Collyer, Mr. Harvey",male,31,1,1,C.A. 31921,26.25,,S
639,0,3,"Panula, Mrs. Juha (Maria Emilia Ojala)",female,41,0,5,3101295,39.6875,,S
640,0,3,"Thorneycroft, Mr. Percival",male,,1,0,376564,16.1,,S
641,0,3,"Jensen, Mr. Hans Peder",male,20,0,0,350050,7.8542,,S
642,1,1,"Sagesser, Mlle. Emma",female,24,0,0,PC 17477,69.3,B35,C
643,0,3,"Skoog, Miss. Margit Elizabeth",female,2,3,2,347088,27.9,,S
644,1,3,"Foo, Mr. Choong",male,,0,0,1601,56.4958,,S
645,1,3,"Baclini, Miss. Eugenie",female,0.75,2,1,2666,19.2583,,C
646,1,1,"Harper, Mr. Henry Sleeper",male,48,1,0,PC 17572,76.7292,D33,C
647,0,3,"Cor, Mr. Liudevit",male,19,0,0,349231,7.8958,,S
648,1,1,"Simonius-Blumer, Col. Oberst Alfons",male,56,0,0,13213,35.5,A26,C
649,0,3,"Willey, Mr. Edward",male,,0,0,S.O./P.P. 751,7.55,,S
650,1,3,"Stanley, Miss. Amy Zillah Elsie",female,23,0,0,CA. 2314,7.55,,S
651,0,3,"Mitkoff, Mr. Mito",male,,0,0,349221,7.8958,,S
652,1,2,"Doling, Miss. Elsie",female,18,0,1,231919,23,,S
653,0,3,"Kalvik, Mr. Johannes Halvorsen",male,21,0,0,8475,8.4333,,S
654,1,3,"O'Leary, Miss. Hanora ""Norah""",female,,0,0,330919,7.8292,,Q
655,0,3,"Hegarty, Miss. Hanora ""Nora""",female,18,0,0,365226,6.75,,Q
656,0,2,"Hickman, Mr. Leonard Mark",male,24,2,0,S.O.C. 14879,73.5,,S
657,0,3,"Radeff, Mr. Alexander",male,,0,0,349223,7.8958,,S
658,0,3,"Bourke, Mrs. John (Catherine)",female,32,1,1,364849,15.5,,Q
659,0,2,"Eitemiller, Mr. George Floyd",male,23,0,0,29751,13,,S
660,0,1,"Newell, Mr. Arthur Webster",male,58,0,2,35273,113.275,D48,C
661,1,1,"Frauenthal, Dr. Henry William",male,50,2,0,PC 17611,133.65,,S
662,0,3,"Badt, Mr. Mohamed",male,40,0,0,2623,7.225,,C
663,0,1,"Colley, Mr. Edward Pomeroy",male,47,0,0,5727,25.5875,E58,S
664,0,3,"Coleff, Mr. Peju",male,36,0,0,349210,7.4958,,S
665,1,3,"Lindqvist, Mr. Eino William",male,20,1,0,STON/O 2. 3101285,7.925,,S
666,0,2,"Hickman, Mr. Lewis",male,32,2,0,S.O.C. 14879,73.5,,S
667,0,2,"Butler, Mr. Reginald Fenton",male,25,0,0,234686,13,,S
668,0,3,"Rommetvedt, Mr. Knud Paust",male,,0,0,312993,7.775,,S
669,0,3,"Cook, Mr. Jacob",male,43,0,0,A/5 3536,8.05,,S
670,1,1,"Taylor, Mrs. Elmer Zebley (Juliet Cummins Wright)",female,,1,0,19996,52,C126,S
671,1,2,"Brown, Mrs. Thomas William Solomon (Elizabeth Catherine Ford)",female,40,1,1,29750,39,,S
672,0,1,"Davidson, Mr. Thornton",male,31,1,0,F.C. 12750,52,B71,S
673,0,2,"Mitchell, Mr. Henry Michael",male,70,0,0,C.A. 24580,10.5,,S
674,1,2,"Wilhelms, Mr. Charles",male,31,0,0,244270,13,,S
675,0,2,"Watson, Mr. Ennis Hastings",male,,0,0,239856,0,,S
676,0,3,"Edvardsson, Mr. Gustaf Hjalmar",male,18,0,0,349912,7.775,,S
677,0,3,"Sawyer, Mr. Frederick Charles",male,24.5,0,0,342826,8.05,,S
678,1,3,"Turja, Miss. Anna Sofia",female,18,0,0,4138,9.8417,,S
679,0,3,"Goodwin, Mrs. Frederick (Augusta Tyler)",female,43,1,6,CA 2144,46.9,,S
680,1,1,"Cardeza, Mr. Thomas Drake Martinez",male,36,0,1,PC 17755,512.3292,B51 B53 B55,C
681,0,3,"Peters, Miss. Katie",female,,0,0,330935,8.1375,,Q
682,1,1,"Hassab, Mr. Hammad",male,27,0,0,PC 17572,76.7292,D49,C
683,0,3,"Olsvigen, Mr. Thor Anderson",male,20,0,0,6563,9.225,,S
684,0,3,"Goodwin, Mr. Charles Edward",male,14,5,2,CA 2144,46.9,,S
685,0,2,"Brown, Mr. Thomas William Solomon",male,60,1,1,29750,39,,S
686,0,2,"Laroche, Mr. Joseph Philippe Lemercier",male,25,1,2,SC/Paris 2123,41.5792,,C
687,0,3,"Panula, Mr. Jaako Arnold",male,14,4,1,3101295,39.6875,,S
688,0,3,"Dakic, Mr. Branko",male,19,0,0,349228,10.1708,,S
689,0,3,"Fischer, Mr. Eberhard Thelander",male,18,0,0,350036,7.7958,,S
690,1,1,"Madill, Miss. Georgette Alexandra",female,15,0,1,24160,211.3375,B5,S
691,1,1,"Dick, Mr. Albert Adrian",male,31,1,0,17474,57,B20,S
692,1,3,"Karun, Miss. Manca",female,4,0,1,349256,13.4167,,C
693,1,3,"Lam, Mr. Ali",male,,0,0,1601,56.4958,,S
694,0,3,"Saad, Mr. Khalil",male,25,0,0,2672,7.225,,C
695,0,1,"Weir, Col. John",male,60,0,0,113800,26.55,,S
696,0,2,"Chapman, Mr. Charles Henry",male,52,0,0,248731,13.5,,S
697,0,3,"Kelly, Mr. James",male,44,0,0,363592,8.05,,S
698,1,3,"Mullens, Miss. Katherine ""Katie""",female,,0,0,35852,7.7333,,Q
699,0,1,"Thayer, Mr. John Borland",male,49,1,1,17421,110.8833,C68,C
700,0,3,"Humblen, Mr. Adolf Mathias Nicolai Olsen",male,42,0,0,348121,7.65,F G63,S
701,1,1,"Astor, Mrs. John Jacob (Madeleine Talmadge Force)",female,18,1,0,PC 17757,227.525,C62 C64,C
702,1,1,"Silverthorne, Mr. Spencer Victor",male,35,0,0,PC 17475,26.2875,E24,S
703,0,3,"Barbara, Miss. Saiide",female,18,0,1,2691,14.4542,,C
704,0,3,"Gallagher, Mr. Martin",male,25,0,0,36864,7.7417,,Q
705,0,3,"Hansen, Mr. Henrik Juul",male,26,1,0,350025,7.8542,,S
706,0,2,"Morley, Mr. Henry Samuel (""Mr Henry Marshall"")",male,39,0,0,250655,26,,S
707,1,2,"Kelly, Mrs. Florence ""Fannie""",female,45,0,0,223596,13.5,,S
708,1,1,"Calderhead, Mr. Edward Pennington",male,42,0,0,PC 17476,26.2875,E24,S
709,1,1,"Cleaver, Miss. Alice",female,22,0,0,113781,151.55,,S
710,1,3,"Moubarek, Master. Halim Gonios (""William George"")",male,,1,1,2661,15.2458,,C
711,1,1,"Mayne, Mlle. Berthe Antonine (""Mrs de Villiers"")",female,24,0,0,PC 17482,49.5042,C90,C
712,0,1,"Klaber, Mr. Herman",male,,0,0,113028,26.55,C124,S
713,1,1,"Taylor, Mr. Elmer Zebley",male,48,1,0,19996,52,C126,S
714,0,3,"Larsson, Mr. August Viktor",male,29,0,0,7545,9.4833,,S
715,0,2,"Greenberg, Mr. Samuel",male,52,0,0,250647,13,,S
716,0,3,"Soholt, Mr. Peter Andreas Lauritz Andersen",male,19,0,0,348124,7.65,F G73,S
717,1,1,"Endres, Miss. Caroline Louise",female,38,0,0,PC 17757,227.525,C45,C
718,1,2,"Troutt, Miss. Edwina Celia ""Winnie""",female,27,0,0,34218,10.5,E101,S
719,0,3,"McEvoy, Mr. Michael",male,,0,0,36568,15.5,,Q
720,0,3,"Johnson, Mr. Malkolm Joackim",male,33,0,0,347062,7.775,,S
721,1,2,"Harper, Miss. Annie Jessie ""Nina""",female,6,0,1,248727,33,,S
722,0,3,"Jensen, Mr. Svend Lauritz",male,17,1,0,350048,7.0542,,S
723,0,2,"Gillespie, Mr. William Henry",male,34,0,0,12233,13,,S
724,0,2,"Hodges, Mr. Henry Price",male,50,0,0,250643,13,,S
725,1,1,"Chambers, Mr. Norman Campbell",male,27,1,0,113806,53.1,E8,S
726,0,3,"Oreskovic, Mr. Luka",male,20,0,0,315094,8.6625,,S
727,1,2,"Renouf, Mrs. Peter Henry (Lillian Jefferys)",female,30,3,0,31027,21,,S
728,1,3,"Mannion, Miss. Margareth",female,,0,0,36866,7.7375,,Q
729,0,2,"Bryhl, Mr. Kurt Arnold Gottfrid",male,25,1,0,236853,26,,S
730,0,3,"Ilmakangas, Miss. Pieta Sofia",female,25,1,0,STON/O2. 3101271,7.925,,S
731,1,1,"Allen, Miss. Elisabeth Walton",female,29,0,0,24160,211.3375,B5,S
732,0,3,"Hassan, Mr. Houssein G N",male,11,0,0,2699,18.7875,,C
733,0,2,"Knight, Mr. Robert J",male,,0,0,239855,0,,S
734,0,2,"Berriman, Mr. William John",male,23,0,0,28425,13,,S
735,0,2,"Troupiansky, Mr. Moses Aaron",male,23,0,0,233639,13,,S
736,0,3,"Williams, Mr. Leslie",male,28.5,0,0,54636,16.1,,S
737,0,3,"Ford, Mrs. Edward (Margaret Ann Watson)",female,48,1,3,W./C. 6608,34.375,,S
738,1,1,"Lesurer, Mr. Gustave J",male,35,0,0,PC 17755,512.3292,B101,C
739,0,3,"Ivanoff, Mr. Kanio",male,,0,0,349201,7.8958,,S
740,0,3,"Nankoff, Mr. Minko",male,,0,0,349218,7.8958,,S
741,1,1,"Hawksford, Mr. Walter James",male,,0,0,16988,30,D45,S
742,0,1,"Cavendish, Mr. Tyrell William",male,36,1,0,19877,78.85,C46,S
743,1,1,"Ryerson, Miss. Susan Parker ""Suzette""",female,21,2,2,PC 17608,262.375,B57 B59 B63 B66,C
744,0,3,"McNamee, Mr. Neal",male,24,1,0,376566,16.1,,S
745,1,3,"Stranden, Mr. Juho",male,31,0,0,STON/O 2. 3101288,7.925,,S
746,0,1,"Crosby, Capt. Edward Gifford",male,70,1,1,WE/P 5735,71,B22,S
747,0,3,"Abbott, Mr. Rossmore Edward",male,16,1,1,C.A. 2673,20.25,,S
748,1,2,"Sinkkonen, Miss. Anna",female,30,0,0,250648,13,,S
749,0,1,"Marvin, Mr. Daniel Warner",male,19,1,0,113773,53.1,D30,S
750,0,3,"Connaghton, Mr. Michael",male,31,0,0,335097,7.75,,Q
751,1,2,"Wells, Miss. Joan",female,4,1,1,29103,23,,S
752,1,3,"Moor, Master. Meier",male,6,0,1,392096,12.475,E121,S
753,0,3,"Vande Velde, Mr. Johannes Joseph",male,33,0,0,345780,9.5,,S
754,0,3,"Jonkoff, Mr. Lalio",male,23,0,0,349204,7.8958,,S
755,1,2,"Herman, Mrs. Samuel (Jane Laver)",female,48,1,2,220845,65,,S
756,1,2,"Hamalainen, Master. Viljo",male,0.67,1,1,250649,14.5,,S
757,0,3,"Carlsson, Mr. August Sigfrid",male,28,0,0,350042,7.7958,,S
758,0,2,"Bailey, Mr. Percy Andrew",male,18,0,0,29108,11.5,,S
759,0,3,"Theobald, Mr. Thomas Leonard",male,34,0,0,363294,8.05,,S
760,1,1,"Rothes, the Countess. of (Lucy Noel Martha Dyer-Edwards)",female,33,0,0,110152,86.5,B77,S
761,0,3,"Garfirth, Mr. John",male,,0,0,358585,14.5,,S
762,0,3,"Nirva, Mr. Iisakki Antino Aijo",male,41,0,0,SOTON/O2 3101272,7.125,,S
763,1,3,"Barah, Mr. Hanna Assi",male,20,0,0,2663,7.2292,,C
764,1,1,"Carter, Mrs. William Ernest (Lucile Polk)",female,36,1,2,113760,120,B96 B98,S
765,0,3,"Eklund, Mr. Hans Linus",male,16,0,0,347074,7.775,,S
766,1,1,"Hogeboom, Mrs. John C (Anna Andrews)",female,51,1,0,13502,77.9583,D11,S
767,0,1,"Brewe, Dr. Arthur Jackson",male,,0,0,112379,39.6,,C
768,0,3,"Mangan, Miss. Mary",female,30.5,0,0,364850,7.75,,Q
769,0,3,"Moran, Mr. Daniel J",male,,1,0,371110,24.15,,Q
770,0,3,"Gronnestad, Mr. Daniel Danielsen",male,32,0,0,8471,8.3625,,S
771,0,3,"Lievens, Mr. Rene Aime",male,24,0,0,345781,9.5,,S
772,0,3,"Jensen, Mr. Niels Peder",male,48,0,0,350047,7.8542,,S
773,0,2,"Mack, Mrs. (Mary)",female,57,0,0,S.O./P.P. 3,10.5,E77,S
774,0,3,"Elias, Mr. Dibo",male,,0,0,2674,7.225,,C
775,1,2,"Hocking, Mrs. Elizabeth (Eliza Needs)",female,54,1,3,29105,23,,S
776,0,3,"Myhrman, Mr. Pehr Fabian Oliver Malkolm",male,18,0,0,347078,7.75,,S
777,0,3,"Tobin, Mr. Roger",male,,0,0,383121,7.75,F38,Q
778,1,3,"Emanuel, Miss. Virginia Ethel",female,5,0,0,364516,12.475,,S
779,0,3,"Kilgannon, Mr. Thomas J",male,,0,0,36865,7.7375,,Q
780,1,1,"Robert, Mrs. Edward Scott (Elisabeth Walton McMillan)",female,43,0,1,24160,211.3375,B3,S
781,1,3,"Ayoub, Miss. Banoura",female,13,0,0,2687,7.2292,,C
782,1,1,"Dick, Mrs. Albert Adrian (Vera Gillespie)",female,17,1,0,17474,57,B20,S
783,0,1,"Long, Mr. Milton Clyde",male,29,0,0,113501,30,D6,S
784,0,3,"Johnston, Mr. Andrew G",male,,1,2,W./C. 6607,23.45,,S
785,0,3,"Ali, Mr. William",male,25,0,0,SOTON/O.Q. 3101312,7.05,,S
786,0,3,"Harmer, Mr. Abraham (David Lishin)",male,25,0,0,374887,7.25,,S
787,1,3,"Sjoblom, Miss. Anna Sofia",female,18,0,0,3101265,7.4958,,S
788,0,3,"Rice, Master. George Hugh",male,8,4,1,382652,29.125,,Q
789,1,3,"Dean, Master. Bertram Vere",male,1,1,2,C.A. 2315,20.575,,S
790,0,1,"Guggenheim, Mr. Benjamin",male,46,0,0,PC 17593,79.2,B82 B84,C
791,0,3,"Keane, Mr. Andrew ""Andy""",male,,0,0,12460,7.75,,Q
792,0,2,"Gaskell, Mr. Alfred",male,16,0,0,239865,26,,S
793,0,3,"Sage, Miss. Stella Anna",female,,8,2,CA. 2343,69.55,,S
794,0,1,"Hoyt, Mr. William Fisher",male,,0,0,PC 17600,30.6958,,C
795,0,3,"Dantcheff, Mr. Ristiu",male,25,0,0,349203,7.8958,,S
796,0,2,"Otter, Mr. Richard",male,39,0,0,28213,13,,S
797,1,1,"Leader, Dr. Alice (Farnham)",female,49,0,0,17465,25.9292,D17,S
798,1,3,"Osman, Mrs. Mara",female,31,0,0,349244,8.6833,,S
799,0,3,"Ibrahim Shawah, Mr. Yousseff",male,30,0,0,2685,7.2292,,C
800,0,3,"Van Impe, Mrs. Jean Baptiste (Rosalie Paula Govaert)",female,30,1,1,345773,24.15,,S
801,0,2,"Ponesell, Mr. Martin",male,34,0,0,250647,13,,S
802,1,2,"Collyer, Mrs. Harvey (Charlotte Annie Tate)",female,31,1,1,C.A. 31921,26.25,,S
803,1,1,"Carter, Master. William Thornton II",male,11,1,2,113760,120,B96 B98,S
804,1,3,"Thomas, Master. Assad Alexander",male,0.42,0,1,2625,8.5167,,C
805,1,3,"Hedman, Mr. Oskar Arvid",male,27,0,0,347089,6.975,,S
806,0,3,"Johansson, Mr. Karl Johan",male,31,0,0,347063,7.775,,S
807,0,1,"Andrews, Mr. Thomas Jr",male,39,0,0,112050,0,A36,S
808,0,3,"Pettersson, Miss. Ellen Natalia",female,18,0,0,347087,7.775,,S
809,0,2,"Meyer, Mr. August",male,39,0,0,248723,13,,S
810,1,1,"Chambers, Mrs. Norman Campbell (Bertha Griggs)",female,33,1,0,113806,53.1,E8,S
811,0,3,"Alexander, Mr. William",male,26,0,0,3474,7.8875,,S
812,0,3,"Lester, Mr. James",male,39,0,0,A/4 48871,24.15,,S
813,0,2,"Slemen, Mr. Richard James",male,35,0,0,28206,10.5,,S
814,0,3,"Andersson, Miss. Ebba Iris Alfrida",female,6,4,2,347082,31.275,,S
815,0,3,"Tomlin, Mr. Ernest Portage",male,30.5,0,0,364499,8.05,,S
816,0,1,"Fry, Mr. Richard",male,,0,0,112058,0,B102,S
817,0,3,"Heininen, Miss. Wendla Maria",female,23,0,0,STON/O2. 3101290,7.925,,S
818,0,2,"Mallet, Mr. Albert",male,31,1,1,S.C./PARIS 2079,37.0042,,C
819,0,3,"Holm, Mr. John Fredrik Alexander",male,43,0,0,C 7075,6.45,,S
820,0,3,"Skoog, Master. Karl Thorsten",male,10,3,2,347088,27.9,,S
821,1,1,"Hays, Mrs. Charles Melville (Clara Jennings Gregg)",female,52,1,1,12749,93.5,B69,S
822,1,3,"Lulic, Mr. Nikola",male,27,0,0,315098,8.6625,,S
823,0,1,"Reuchlin, Jonkheer. John George",male,38,0,0,19972,0,,S
824,1,3,"Moor, Mrs. (Beila)",female,27,0,1,392096,12.475,E121,S
825,0,3,"Panula, Master. Urho Abraham",male,2,4,1,3101295,39.6875,,S
826,0,3,"Flynn, Mr. John",male,,0,0,368323,6.95,,Q
827,0,3,"Lam, Mr. Len",male,,0,0,1601,56.4958,,S
828,1,2,"Mallet, Master. Andre",male,1,0,2,S.C./PARIS 2079,37.0042,,C
829,1,3,"McCormack, Mr. Thomas Joseph",male,,0,0,367228,7.75,,Q
830,1,1,"Stone, Mrs. George Nelson (Martha Evelyn)",female,62,0,0,113572,80,B28,
831,1,3,"Yasbeck, Mrs. Antoni (Selini Alexander)",female,15,1,0,2659,14.4542,,C
832,1,2,"Richards, Master. George Sibley",male,0.83,1,1,29106,18.75,,S
833,0,3,"Saad, Mr. Amin",male,,0,0,2671,7.2292,,C
834,0,3,"Augustsson, Mr. Albert",male,23,0,0,347468,7.8542,,S
835,0,3,"Allum, Mr. Owen George",male,18,0,0,2223,8.3,,S
836,1,1,"Compton, Miss. Sara Rebecca",female,39,1,1,PC 17756,83.1583,E49,C
837,0,3,"Pasic, Mr. Jakob",male,21,0,0,315097,8.6625,,S
838,0,3,"Sirota, Mr. Maurice",male,,0,0,392092,8.05,,S
839,1,3,"Chip, Mr. Chang",male,32,0,0,1601,56.4958,,S
840,1,1,"Marechal, Mr. Pierre",male,,0,0,11774,29.7,C47,C
841,0,3,"Alhomaki, Mr. Ilmari Rudolf",male,20,0,0,SOTON/O2 3101287,7.925,,S
842,0,2,"Mudd, Mr. Thomas Charles",male,16,0,0,S.O./P.P. 3,10.5,,S
843,1,1,"Serepeca, Miss. Augusta",female,30,0,0,113798,31,,C
844,0,3,"Lemberopolous, Mr. Peter L",male,34.5,0,0,2683,6.4375,,C
845,0,3,"Culumovic, Mr. Jeso",male,17,0,0,315090,8.6625,,S
846,0,3,"Abbing, Mr. Anthony",male,42,0,0,C.A. 5547,7.55,,S
847,0,3,"Sage, Mr. Douglas Bullen",male,,8,2,CA. 2343,69.55,,S
848,0,3,"Markoff, Mr. Marin",male,35,0,0,349213,7.8958,,C
849,0,2,"Harper, Rev. John",male,28,0,1,248727,33,,S
850,1,1,"Goldenberg, Mrs. Samuel L (Edwiga Grabowska)",female,,1,0,17453,89.1042,C92,C
851,0,3,"Andersson, Master. Sigvard Harald Elias",male,4,4,2,347082,31.275,,S
852,0,3,"Svensson, Mr. Johan",male,74,0,0,347060,7.775,,S
853,0,3,"Boulos, Miss. Nourelain",female,9,1,1,2678,15.2458,,C
854,1,1,"Lines, Miss. Mary Conover",female,16,0,1,PC 17592,39.4,D28,S
855,0,2,"Carter, Mrs. Ernest Courtenay (Lilian Hughes)",female,44,1,0,244252,26,,S
856,1,3,"Aks, Mrs. Sam (Leah Rosen)",female,18,0,1,392091,9.35,,S
857,1,1,"Wick, Mrs. George Dennick (Mary Hitchcock)",female,45,1,1,36928,164.8667,,S
858,1,1,"Daly, Mr. Peter Denis ",male,51,0,0,113055,26.55,E17,S
859,1,3,"Baclini, Mrs. Solomon (Latifa Qurban)",female,24,0,3,2666,19.2583,,C
860,0,3,"Razi, Mr. Raihed",male,,0,0,2629,7.2292,,C
861,0,3,"Hansen, Mr. Claus Peter",male,41,2,0,350026,14.1083,,S
862,0,2,"Giles, Mr. Frederick Edward",male,21,1,0,28134,11.5,,S
863,1,1,"Swift, Mrs. Frederick Joel (Margaret Welles Barron)",female,48,0,0,17466,25.9292,D17,S
864,0,3,"Sage, Miss. Dorothy Edith ""Dolly""",female,,8,2,CA. 2343,69.55,,S
865,0,2,"Gill, Mr. John William",male,24,0,0,233866,13,,S
866,1,2,"Bystrom, Mrs. (Karolina)",female,42,0,0,236852,13,,S
867,1,2,"Duran y More, Miss. Asuncion",female,27,1,0,SC/PARIS 2149,13.8583,,C
868,0,1,"Roebling, Mr. Washington Augustus II",male,31,0,0,PC 17590,50.4958,A24,S
869,0,3,"van Melkebeke, Mr. Philemon",male,,0,0,345777,9.5,,S
870,1,3,"Johnson, Master. Harold Theodor",male,4,1,1,347742,11.1333,,S
871,0,3,"Balkic, Mr. Cerin",male,26,0,0,349248,7.8958,,S
872,1,1,"Beckwith, Mrs. Richard Leonard (Sallie Monypeny)",female,47,1,1,11751,52.5542,D35,S
873,0,1,"Carlsson, Mr. Frans Olof",male,33,0,0,695,5,B51 B53 B55,S
874,0,3,"Vander Cruyssen, Mr. Victor",male,47,0,0,345765,9,,S
875,1,2,"Abelson, Mrs. Samuel (Hannah Wizosky)",female,28,1,0,P/PP 3381,24,,C
876,1,3,"Najib, Miss. Adele Kiamie ""Jane""",female,15,0,0,2667,7.225,,C
877,0,3,"Gustafsson, Mr. Alfred Ossian",male,20,0,0,7534,9.8458,,S
878,0,3,"Petroff, Mr. Nedelio",male,19,0,0,349212,7.8958,,S
879,0,3,"Laleff, Mr. Kristo",male,,0,0,349217,7.8958,,S
880,1,1,"Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)",female,56,0,1,11767,83.1583,C50,C
881,1,2,"Shelley, Mrs. William (Imanita Parrish Hall)",female,25,0,1,230433,26,,S
882,0,3,"Markun, Mr. Johann",male,33,0,0,349257,7.8958,,S
883,0,3,"Dahlberg, Miss. Gerda Ulrika",female,22,0,0,7552,10.5167,,S
884,0,2,"Banfield, Mr. Frederick James",male,28,0,0,C.A./SOTON 34068,10.5,,S
885,0,3,"Sutehall, Mr. Henry Jr",male,25,0,0,SOTON/OQ 392076,7.05,,S
886,0,3,"Rice, Mrs. William (Margaret Norton)",female,39,0,5,382652,29.125,,Q
887,0,2,"Montvila, Rev. Juozas",male,27,0,0,211536,13,,S
888,1,1,"Graham, Miss. Margaret Edith",female,19,0,0,112053,30,B42,S
889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.45,,S
890,1,1,"Behr, Mr. Karl Howell",male,26,0,0,111369,30,C148,C
891,0,3,"Dooley, Mr. Patrick",male,32,0,0,370376,7.75,,Q
PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S
5,0,3,"Allen, Mr. William Henry",male,35,0,0,373450,8.05,,S
6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q
7,0,1,"McCarthy, Mr. Timothy J",male,54,0,0,17463,51.8625,E46,S
8,0,3,"Palsson, Master. Gosta Leonard",male,2,3,1,349909,21.075,,S
9,1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27,0,2,347742,11.1333,,S
10,1,2,"Nasser, Mrs. Nicholas (Adele Achem)",female,14,1,0,237736,30.0708,,C
11,1,3,"Sandstrom, Miss. Marguerite Rut",female,4,1,1,PP 9549,16.7,G6,S
12,1,1,"Bonnell, Miss. Elizabeth",female,58,0,0,113783,26.55,C103,S
13,0,3,"Saundercock, Mr. William Henry",male,20,0,0,A/5. 2151,8.05,,S
14,0,3,"Andersson, Mr. Anders Johan",male,39,1,5,347082,31.275,,S
15,0,3,"Vestrom, Miss. Hulda Amanda Adolfina",female,14,0,0,350406,7.8542,,S
16,1,2,"Hewlett, Mrs. (Mary D Kingcome)",female,55,0,0,248706,16,,S
17,0,3,"Rice, Master. Eugene",male,2,4,1,382652,29.125,,Q
18,1,2,"Williams, Mr. Charles Eugene",male,,0,0,244373,13,,S
19,0,3,"Vander Planke, Mrs. Julius (Emelia Maria Vandemoortele)",female,31,1,0,345763,18,,S
20,1,3,"Masselmani, Mrs. Fatima",female,,0,0,2649,7.225,,C
21,0,2,"Fynney, Mr. Joseph J",male,35,0,0,239865,26,,S
22,1,2,"Beesley, Mr. Lawrence",male,34,0,0,248698,13,D56,S
23,1,3,"McGowan, Miss. Anna ""Annie""",female,15,0,0,330923,8.0292,,Q
24,1,1,"Sloper, Mr. William Thompson",male,28,0,0,113788,35.5,A6,S
25,0,3,"Palsson, Miss. Torborg Danira",female,8,3,1,349909,21.075,,S
26,1,3,"Asplund, Mrs. Carl Oscar (Selma Augusta Emilia Johansson)",female,38,1,5,347077,31.3875,,S
27,0,3,"Emir, Mr. Farred Chehab",male,,0,0,2631,7.225,,C
28,0,1,"Fortune, Mr. Charles Alexander",male,19,3,2,19950,263,C23 C25 C27,S
29,1,3,"O'Dwyer, Miss. Ellen ""Nellie""",female,,0,0,330959,7.8792,,Q
30,0,3,"Todoroff, Mr. Lalio",male,,0,0,349216,7.8958,,S
31,0,1,"Uruchurtu, Don. Manuel E",male,40,0,0,PC 17601,27.7208,,C
32,1,1,"Spencer, Mrs. William Augustus (Marie Eugenie)",female,,1,0,PC 17569,146.5208,B78,C
33,1,3,"Glynn, Miss. Mary Agatha",female,,0,0,335677,7.75,,Q
34,0,2,"Wheadon, Mr. Edward H",male,66,0,0,C.A. 24579,10.5,,S
35,0,1,"Meyer, Mr. Edgar Joseph",male,28,1,0,PC 17604,82.1708,,C
36,0,1,"Holverson, Mr. Alexander Oskar",male,42,1,0,113789,52,,S
37,1,3,"Mamee, Mr. Hanna",male,,0,0,2677,7.2292,,C
38,0,3,"Cann, Mr. Ernest Charles",male,21,0,0,A./5. 2152,8.05,,S
39,0,3,"Vander Planke, Miss. Augusta Maria",female,18,2,0,345764,18,,S
40,1,3,"Nicola-Yarred, Miss. Jamila",female,14,1,0,2651,11.2417,,C
41,0,3,"Ahlin, Mrs. Johan (Johanna Persdotter Larsson)",female,40,1,0,7546,9.475,,S
42,0,2,"Turpin, Mrs. William John Robert (Dorothy Ann Wonnacott)",female,27,1,0,11668,21,,S
43,0,3,"Kraeff, Mr. Theodor",male,,0,0,349253,7.8958,,C
44,1,2,"Laroche, Miss. Simonne Marie Anne Andree",female,3,1,2,SC/Paris 2123,41.5792,,C
45,1,3,"Devaney, Miss. Margaret Delia",female,19,0,0,330958,7.8792,,Q
46,0,3,"Rogers, Mr. William John",male,,0,0,S.C./A.4. 23567,8.05,,S
47,0,3,"Lennon, Mr. Denis",male,,1,0,370371,15.5,,Q
48,1,3,"O'Driscoll, Miss. Bridget",female,,0,0,14311,7.75,,Q
49,0,3,"Samaan, Mr. Youssef",male,,2,0,2662,21.6792,,C
50,0,3,"Arnold-Franchi, Mrs. Josef (Josefine Franchi)",female,18,1,0,349237,17.8,,S
51,0,3,"Panula, Master. Juha Niilo",male,7,4,1,3101295,39.6875,,S
52,0,3,"Nosworthy, Mr. Richard Cater",male,21,0,0,A/4. 39886,7.8,,S
53,1,1,"Harper, Mrs. Henry Sleeper (Myna Haxtun)",female,49,1,0,PC 17572,76.7292,D33,C
54,1,2,"Faunthorpe, Mrs. Lizzie (Elizabeth Anne Wilkinson)",female,29,1,0,2926,26,,S
55,0,1,"Ostby, Mr. Engelhart Cornelius",male,65,0,1,113509,61.9792,B30,C
56,1,1,"Woolner, Mr. Hugh",male,,0,0,19947,35.5,C52,S
57,1,2,"Rugg, Miss. Emily",female,21,0,0,C.A. 31026,10.5,,S
58,0,3,"Novel, Mr. Mansouer",male,28.5,0,0,2697,7.2292,,C
59,1,2,"West, Miss. Constance Mirium",female,5,1,2,C.A. 34651,27.75,,S
60,0,3,"Goodwin, Master. William Frederick",male,11,5,2,CA 2144,46.9,,S
61,0,3,"Sirayanian, Mr. Orsen",male,22,0,0,2669,7.2292,,C
62,1,1,"Icard, Miss. Amelie",female,38,0,0,113572,80,B28,
63,0,1,"Harris, Mr. Henry Birkhardt",male,45,1,0,36973,83.475,C83,S
64,0,3,"Skoog, Master. Harald",male,4,3,2,347088,27.9,,S
65,0,1,"Stewart, Mr. Albert A",male,,0,0,PC 17605,27.7208,,C
66,1,3,"Moubarek, Master. Gerios",male,,1,1,2661,15.2458,,C
67,1,2,"Nye, Mrs. (Elizabeth Ramell)",female,29,0,0,C.A. 29395,10.5,F33,S
68,0,3,"Crease, Mr. Ernest James",male,19,0,0,S.P. 3464,8.1583,,S
69,1,3,"Andersson, Miss. Erna Alexandra",female,17,4,2,3101281,7.925,,S
70,0,3,"Kink, Mr. Vincenz",male,26,2,0,315151,8.6625,,S
71,0,2,"Jenkin, Mr. Stephen Curnow",male,32,0,0,C.A. 33111,10.5,,S
72,0,3,"Goodwin, Miss. Lillian Amy",female,16,5,2,CA 2144,46.9,,S
73,0,2,"Hood, Mr. Ambrose Jr",male,21,0,0,S.O.C. 14879,73.5,,S
74,0,3,"Chronopoulos, Mr. Apostolos",male,26,1,0,2680,14.4542,,C
75,1,3,"Bing, Mr. Lee",male,32,0,0,1601,56.4958,,S
76,0,3,"Moen, Mr. Sigurd Hansen",male,25,0,0,348123,7.65,F G73,S
77,0,3,"Staneff, Mr. Ivan",male,,0,0,349208,7.8958,,S
78,0,3,"Moutal, Mr. Rahamin Haim",male,,0,0,374746,8.05,,S
79,1,2,"Caldwell, Master. Alden Gates",male,0.83,0,2,248738,29,,S
80,1,3,"Dowdell, Miss. Elizabeth",female,30,0,0,364516,12.475,,S
81,0,3,"Waelens, Mr. Achille",male,22,0,0,345767,9,,S
82,1,3,"Sheerlinck, Mr. Jan Baptist",male,29,0,0,345779,9.5,,S
83,1,3,"McDermott, Miss. Brigdet Delia",female,,0,0,330932,7.7875,,Q
84,0,1,"Carrau, Mr. Francisco M",male,28,0,0,113059,47.1,,S
85,1,2,"Ilett, Miss. Bertha",female,17,0,0,SO/C 14885,10.5,,S
86,1,3,"Backstrom, Mrs. Karl Alfred (Maria Mathilda Gustafsson)",female,33,3,0,3101278,15.85,,S
87,0,3,"Ford, Mr. William Neal",male,16,1,3,W./C. 6608,34.375,,S
88,0,3,"Slocovski, Mr. Selman Francis",male,,0,0,SOTON/OQ 392086,8.05,,S
89,1,1,"Fortune, Miss. Mabel Helen",female,23,3,2,19950,263,C23 C25 C27,S
90,0,3,"Celotti, Mr. Francesco",male,24,0,0,343275,8.05,,S
91,0,3,"Christmann, Mr. Emil",male,29,0,0,343276,8.05,,S
92,0,3,"Andreasson, Mr. Paul Edvin",male,20,0,0,347466,7.8542,,S
93,0,1,"Chaffee, Mr. Herbert Fuller",male,46,1,0,W.E.P. 5734,61.175,E31,S
94,0,3,"Dean, Mr. Bertram Frank",male,26,1,2,C.A. 2315,20.575,,S
95,0,3,"Coxon, Mr. Daniel",male,59,0,0,364500,7.25,,S
96,0,3,"Shorney, Mr. Charles Joseph",male,,0,0,374910,8.05,,S
97,0,1,"Goldschmidt, Mr. George B",male,71,0,0,PC 17754,34.6542,A5,C
98,1,1,"Greenfield, Mr. William Bertram",male,23,0,1,PC 17759,63.3583,D10 D12,C
99,1,2,"Doling, Mrs. John T (Ada Julia Bone)",female,34,0,1,231919,23,,S
100,0,2,"Kantor, Mr. Sinai",male,34,1,0,244367,26,,S
101,0,3,"Petranec, Miss. Matilda",female,28,0,0,349245,7.8958,,S
102,0,3,"Petroff, Mr. Pastcho (""Pentcho"")",male,,0,0,349215,7.8958,,S
103,0,1,"White, Mr. Richard Frasar",male,21,0,1,35281,77.2875,D26,S
104,0,3,"Johansson, Mr. Gustaf Joel",male,33,0,0,7540,8.6542,,S
105,0,3,"Gustafsson, Mr. Anders Vilhelm",male,37,2,0,3101276,7.925,,S
106,0,3,"Mionoff, Mr. Stoytcho",male,28,0,0,349207,7.8958,,S
107,1,3,"Salkjelsvik, Miss. Anna Kristine",female,21,0,0,343120,7.65,,S
108,1,3,"Moss, Mr. Albert Johan",male,,0,0,312991,7.775,,S
109,0,3,"Rekic, Mr. Tido",male,38,0,0,349249,7.8958,,S
110,1,3,"Moran, Miss. Bertha",female,,1,0,371110,24.15,,Q
111,0,1,"Porter, Mr. Walter Chamberlain",male,47,0,0,110465,52,C110,S
112,0,3,"Zabour, Miss. Hileni",female,14.5,1,0,2665,14.4542,,C
113,0,3,"Barton, Mr. David John",male,22,0,0,324669,8.05,,S
114,0,3,"Jussila, Miss. Katriina",female,20,1,0,4136,9.825,,S
115,0,3,"Attalah, Miss. Malake",female,17,0,0,2627,14.4583,,C
116,0,3,"Pekoniemi, Mr. Edvard",male,21,0,0,STON/O 2. 3101294,7.925,,S
117,0,3,"Connors, Mr. Patrick",male,70.5,0,0,370369,7.75,,Q
118,0,2,"Turpin, Mr. William John Robert",male,29,1,0,11668,21,,S
119,0,1,"Baxter, Mr. Quigg Edmond",male,24,0,1,PC 17558,247.5208,B58 B60,C
120,0,3,"Andersson, Miss. Ellis Anna Maria",female,2,4,2,347082,31.275,,S
121,0,2,"Hickman, Mr. Stanley George",male,21,2,0,S.O.C. 14879,73.5,,S
122,0,3,"Moore, Mr. Leonard Charles",male,,0,0,A4. 54510,8.05,,S
123,0,2,"Nasser, Mr. Nicholas",male,32.5,1,0,237736,30.0708,,C
124,1,2,"Webber, Miss. Susan",female,32.5,0,0,27267,13,E101,S
125,0,1,"White, Mr. Percival Wayland",male,54,0,1,35281,77.2875,D26,S
126,1,3,"Nicola-Yarred, Master. Elias",male,12,1,0,2651,11.2417,,C
127,0,3,"McMahon, Mr. Martin",male,,0,0,370372,7.75,,Q
128,1,3,"Madsen, Mr. Fridtjof Arne",male,24,0,0,C 17369,7.1417,,S
129,1,3,"Peter, Miss. Anna",female,,1,1,2668,22.3583,F E69,C
130,0,3,"Ekstrom, Mr. Johan",male,45,0,0,347061,6.975,,S
131,0,3,"Drazenoic, Mr. Jozef",male,33,0,0,349241,7.8958,,C
132,0,3,"Coelho, Mr. Domingos Fernandeo",male,20,0,0,SOTON/O.Q. 3101307,7.05,,S
133,0,3,"Robins, Mrs. Alexander A (Grace Charity Laury)",female,47,1,0,A/5. 3337,14.5,,S
134,1,2,"Weisz, Mrs. Leopold (Mathilde Francoise Pede)",female,29,1,0,228414,26,,S
135,0,2,"Sobey, Mr. Samuel James Hayden",male,25,0,0,C.A. 29178,13,,S
136,0,2,"Richard, Mr. Emile",male,23,0,0,SC/PARIS 2133,15.0458,,C
137,1,1,"Newsom, Miss. Helen Monypeny",female,19,0,2,11752,26.2833,D47,S
138,0,1,"Futrelle, Mr. Jacques Heath",male,37,1,0,113803,53.1,C123,S
139,0,3,"Osen, Mr. Olaf Elon",male,16,0,0,7534,9.2167,,S
140,0,1,"Giglio, Mr. Victor",male,24,0,0,PC 17593,79.2,B86,C
142 141 0 3 Boulos, Mrs. Joseph (Sultana) female 0 2 2678 15.2458 C
143 142 1 3 Nysten, Miss. Anna Sofia female 22 0 0 347081 7.75 S
144 143 1 3 Hakkarainen, Mrs. Pekka Pietari (Elin Matilda Dolck) female 24 1 0 STON/O2. 3101279 15.85 S
145 144 0 3 Burke, Mr. Jeremiah male 19 0 0 365222 6.75 Q
146 145 0 2 Andrew, Mr. Edgardo Samuel male 18 0 0 231945 11.5 S
147 146 0 2 Nicholls, Mr. Joseph Charles male 19 1 1 C.A. 33112 36.75 S
148 147 1 3 Andersson, Mr. August Edvard ("Wennerstrom") male 27 0 0 350043 7.7958 S
149 148 0 3 Ford, Miss. Robina Maggie "Ruby" female 9 2 2 W./C. 6608 34.375 S
150 149 0 2 Navratil, Mr. Michel ("Louis M Hoffman") male 36.5 0 2 230080 26 F2 S
151 150 0 2 Byles, Rev. Thomas Roussel Davids male 42 0 0 244310 13 S
152 151 0 2 Bateman, Rev. Robert James male 51 0 0 S.O.P. 1166 12.525 S
153 152 1 1 Pears, Mrs. Thomas (Edith Wearne) female 22 1 0 113776 66.6 C2 S
154 153 0 3 Meo, Mr. Alfonzo male 55.5 0 0 A.5. 11206 8.05 S
155 154 0 3 van Billiard, Mr. Austin Blyler male 40.5 0 2 A/5. 851 14.5 S
156 155 0 3 Olsen, Mr. Ole Martin male 0 0 Fa 265302 7.3125 S
157 156 0 1 Williams, Mr. Charles Duane male 51 0 1 PC 17597 61.3792 C
158 157 1 3 Gilnagh, Miss. Katherine "Katie" female 16 0 0 35851 7.7333 Q
159 158 0 3 Corn, Mr. Harry male 30 0 0 SOTON/OQ 392090 8.05 S
160 159 0 3 Smiljanic, Mr. Mile male 0 0 315037 8.6625 S
161 160 0 3 Sage, Master. Thomas Henry male 8 2 CA. 2343 69.55 S
162 161 0 3 Cribb, Mr. John Hatfield male 44 0 1 371362 16.1 S
163 162 1 2 Watt, Mrs. James (Elizabeth "Bessie" Inglis Milne) female 40 0 0 C.A. 33595 15.75 S
164 163 0 3 Bengtsson, Mr. John Viktor male 26 0 0 347068 7.775 S
165 164 0 3 Calic, Mr. Jovo male 17 0 0 315093 8.6625 S
166 165 0 3 Panula, Master. Eino Viljami male 1 4 1 3101295 39.6875 S
167 166 1 3 Goldsmith, Master. Frank John William "Frankie" male 9 0 2 363291 20.525 S
168 167 1 1 Chibnall, Mrs. (Edith Martha Bowerman) female 0 1 113505 55 E33 S
169 168 0 3 Skoog, Mrs. William (Anna Bernhardina Karlsson) female 45 1 4 347088 27.9 S
170 169 0 1 Baumann, Mr. John D male 0 0 PC 17318 25.925 S
171 170 0 3 Ling, Mr. Lee male 28 0 0 1601 56.4958 S
172 171 0 1 Van der hoef, Mr. Wyckoff male 61 0 0 111240 33.5 B19 S
173 172 0 3 Rice, Master. Arthur male 4 4 1 382652 29.125 Q
174 173 1 3 Johnson, Miss. Eleanor Ileen female 1 1 1 347742 11.1333 S
175 174 0 3 Sivola, Mr. Antti Wilhelm male 21 0 0 STON/O 2. 3101280 7.925 S
176 175 0 1 Smith, Mr. James Clinch male 56 0 0 17764 30.6958 A7 C
177 176 0 3 Klasen, Mr. Klas Albin male 18 1 1 350404 7.8542 S
178 177 0 3 Lefebre, Master. Henry Forbes male 3 1 4133 25.4667 S
179 178 0 1 Isham, Miss. Ann Elizabeth female 50 0 0 PC 17595 28.7125 C49 C
180 179 0 2 Hale, Mr. Reginald male 30 0 0 250653 13 S
181 180 0 3 Leonard, Mr. Lionel male 36 0 0 LINE 0 S
182 181 0 3 Sage, Miss. Constance Gladys female 8 2 CA. 2343 69.55 S
183 182 0 2 Pernot, Mr. Rene male 0 0 SC/PARIS 2131 15.05 C
184 183 0 3 Asplund, Master. Clarence Gustaf Hugo male 9 4 2 347077 31.3875 S
185 184 1 2 Becker, Master. Richard F male 1 2 1 230136 39 F4 S
186 185 1 3 Kink-Heilmann, Miss. Luise Gretchen female 4 0 2 315153 22.025 S
187 186 0 1 Rood, Mr. Hugh Roscoe male 0 0 113767 50 A32 S
188 187 1 3 O'Brien, Mrs. Thomas (Johanna "Hannah" Godfrey) female 1 0 370365 15.5 Q
189 188 1 1 Romaine, Mr. Charles Hallace ("Mr C Rolmane") male 45 0 0 111428 26.55 S
190 189 0 3 Bourke, Mr. John male 40 1 1 364849 15.5 Q
191 190 0 3 Turcin, Mr. Stjepan male 36 0 0 349247 7.8958 S
192 191 1 2 Pinsky, Mrs. (Rosa) female 32 0 0 234604 13 S
193 192 0 2 Carbines, Mr. William male 19 0 0 28424 13 S
194 193 1 3 Andersen-Jensen, Miss. Carla Christine Nielsine female 19 1 0 350046 7.8542 S
195 194 1 2 Navratil, Master. Michel M male 3 1 1 230080 26 F2 S
196 195 1 1 Brown, Mrs. James Joseph (Margaret Tobin) female 44 0 0 PC 17610 27.7208 B4 C
197 196 1 1 Lurette, Miss. Elise female 58 0 0 PC 17569 146.5208 B80 C
198 197 0 3 Mernagh, Mr. Robert male 0 0 368703 7.75 Q
199 198 0 3 Olsen, Mr. Karl Siegwart Andreas male 42 0 1 4579 8.4042 S
200 199 1 3 Madigan, Miss. Margaret "Maggie" female 0 0 370370 7.75 Q
201 200 0 2 Yrois, Miss. Henriette ("Mrs Harbeck") female 24 0 0 248747 13 S
202 201 0 3 Vande Walle, Mr. Nestor Cyriel male 28 0 0 345770 9.5 S
203 202 0 3 Sage, Mr. Frederick male 8 2 CA. 2343 69.55 S
204 203 0 3 Johanson, Mr. Jakob Alfred male 34 0 0 3101264 6.4958 S
205 204 0 3 Youseff, Mr. Gerious male 45.5 0 0 2628 7.225 C
206 205 1 3 Cohen, Mr. Gurshon "Gus" male 18 0 0 A/5 3540 8.05 S
207 206 0 3 Strom, Miss. Telma Matilda female 2 0 1 347054 10.4625 G6 S
208 207 0 3 Backstrom, Mr. Karl Alfred male 32 1 0 3101278 15.85 S
209 208 1 3 Albimona, Mr. Nassef Cassem male 26 0 0 2699 18.7875 C
210 209 1 3 Carr, Miss. Helen "Ellen" female 16 0 0 367231 7.75 Q
211 210 1 1 Blank, Mr. Henry male 40 0 0 112277 31 A31 C
212 211 0 3 Ali, Mr. Ahmed male 24 0 0 SOTON/O.Q. 3101311 7.05 S
213 212 1 2 Cameron, Miss. Clear Annie female 35 0 0 F.C.C. 13528 21 S
214 213 0 3 Perkin, Mr. John Henry male 22 0 0 A/5 21174 7.25 S
215 214 0 2 Givard, Mr. Hans Kristensen male 30 0 0 250646 13 S
216 215 0 3 Kiernan, Mr. Philip male 1 0 367229 7.75 Q
217 216 1 1 Newell, Miss. Madeleine female 31 1 0 35273 113.275 D36 C
218 217 1 3 Honkanen, Miss. Eliina female 27 0 0 STON/O2. 3101283 7.925 S
219 218 0 2 Jacobsohn, Mr. Sidney Samuel male 42 1 0 243847 27 S
220 219 1 1 Bazzani, Miss. Albina female 32 0 0 11813 76.2917 D15 C
221 220 0 2 Harris, Mr. Walter male 30 0 0 W/C 14208 10.5 S
222 221 1 3 Sunderland, Mr. Victor Francis male 16 0 0 SOTON/OQ 392089 8.05 S
223 222 0 2 Bracken, Mr. James H male 27 0 0 220367 13 S
224 223 0 3 Green, Mr. George Henry male 51 0 0 21440 8.05 S
225 224 0 3 Nenkoff, Mr. Christo male 0 0 349234 7.8958 S
226 225 1 1 Hoyt, Mr. Frederick Maxfield male 38 1 0 19943 90 C93 S
227 226 0 3 Berglund, Mr. Karl Ivar Sven male 22 0 0 PP 4348 9.35 S
228 227 1 2 Mellors, Mr. William John male 19 0 0 SW/PP 751 10.5 S
229 228 0 3 Lovell, Mr. John Hall ("Henry") male 20.5 0 0 A/5 21173 7.25 S
230 229 0 2 Fahlstrom, Mr. Arne Jonas male 18 0 0 236171 13 S
231 230 0 3 Lefebre, Miss. Mathilde female 3 1 4133 25.4667 S
232 231 1 1 Harris, Mrs. Henry Birkhardt (Irene Wallach) female 35 1 0 36973 83.475 C83 S
233 232 0 3 Larsson, Mr. Bengt Edvin male 29 0 0 347067 7.775 S
234 233 0 2 Sjostedt, Mr. Ernst Adolf male 59 0 0 237442 13.5 S
235 234 1 3 Asplund, Miss. Lillian Gertrud female 5 4 2 347077 31.3875 S
236 235 0 2 Leyson, Mr. Robert William Norman male 24 0 0 C.A. 29566 10.5 S
237 236 0 3 Harknett, Miss. Alice Phoebe female 0 0 W./C. 6609 7.55 S
238 237 0 2 Hold, Mr. Stephen male 44 1 0 26707 26 S
239 238 1 2 Collyer, Miss. Marjorie "Lottie" female 8 0 2 C.A. 31921 26.25 S
240 239 0 2 Pengelly, Mr. Frederick William male 19 0 0 28665 10.5 S
241 240 0 2 Hunt, Mr. George Henry male 33 0 0 SCO/W 1585 12.275 S
242 241 0 3 Zabour, Miss. Thamine female 1 0 2665 14.4542 C
243 242 1 3 Murphy, Miss. Katherine "Kate" female 1 0 367230 15.5 Q
244 243 0 2 Coleridge, Mr. Reginald Charles male 29 0 0 W./C. 14263 10.5 S
245 244 0 3 Maenpaa, Mr. Matti Alexanteri male 22 0 0 STON/O 2. 3101275 7.125 S
246 245 0 3 Attalah, Mr. Sleiman male 30 0 0 2694 7.225 C
247 246 0 1 Minahan, Dr. William Edward male 44 2 0 19928 90 C78 Q
248 247 0 3 Lindahl, Miss. Agda Thorilda Viktoria female 25 0 0 347071 7.775 S
249 248 1 2 Hamalainen, Mrs. William (Anna) female 24 0 2 250649 14.5 S
250 249 1 1 Beckwith, Mr. Richard Leonard male 37 1 1 11751 52.5542 D35 S
251 250 0 2 Carter, Rev. Ernest Courtenay male 54 1 0 244252 26 S
252 251 0 3 Reed, Mr. James George male 0 0 362316 7.25 S
253 252 0 3 Strom, Mrs. Wilhelm (Elna Matilda Persson) female 29 1 1 347054 10.4625 G6 S
254 253 0 1 Stead, Mr. William Thomas male 62 0 0 113514 26.55 C87 S
255 254 0 3 Lobb, Mr. William Arthur male 30 1 0 A/5. 3336 16.1 S
256 255 0 3 Rosblom, Mrs. Viktor (Helena Wilhelmina) female 41 0 2 370129 20.2125 S
257 256 1 3 Touma, Mrs. Darwis (Hanne Youssef Razi) female 29 0 2 2650 15.2458 C
258 257 1 1 Thorne, Mrs. Gertrude Maybelle female 0 0 PC 17585 79.2 C
259 258 1 1 Cherry, Miss. Gladys female 30 0 0 110152 86.5 B77 S
260 259 1 1 Ward, Miss. Anna female 35 0 0 PC 17755 512.3292 C
261 260 1 2 Parrish, Mrs. (Lutie Davis) female 50 0 1 230433 26 S
262 261 0 3 Smith, Mr. Thomas male 0 0 384461 7.75 Q
263 262 1 3 Asplund, Master. Edvin Rojj Felix male 3 4 2 347077 31.3875 S
264 263 0 1 Taussig, Mr. Emil male 52 1 1 110413 79.65 E67 S
265 264 0 1 Harrison, Mr. William male 40 0 0 112059 0 B94 S
266 265 0 3 Henry, Miss. Delia female 0 0 382649 7.75 Q
267 266 0 2 Reeves, Mr. David male 36 0 0 C.A. 17248 10.5 S
268 267 0 3 Panula, Mr. Ernesti Arvid male 16 4 1 3101295 39.6875 S
269 268 1 3 Persson, Mr. Ernst Ulrik male 25 1 0 347083 7.775 S
270 269 1 1 Graham, Mrs. William Thompson (Edith Junkins) female 58 0 1 PC 17582 153.4625 C125 S
271 270 1 1 Bissette, Miss. Amelia female 35 0 0 PC 17760 135.6333 C99 S
272 271 0 1 Cairns, Mr. Alexander male 0 0 113798 31 S
273 272 1 3 Tornquist, Mr. William Henry male 25 0 0 LINE 0 S
274 273 1 2 Mellinger, Mrs. (Elizabeth Anne Maidment) female 41 0 1 250644 19.5 S
275 274 0 1 Natsch, Mr. Charles H male 37 0 1 PC 17596 29.7 C118 C
276 275 1 3 Healy, Miss. Hanora "Nora" female 0 0 370375 7.75 Q
277 276 1 1 Andrews, Miss. Kornelia Theodosia female 63 1 0 13502 77.9583 D7 S
278 277 0 3 Lindblom, Miss. Augusta Charlotta female 45 0 0 347073 7.75 S
279 278 0 2 Parkes, Mr. Francis "Frank" male 0 0 239853 0 S
280 279 0 3 Rice, Master. Eric male 7 4 1 382652 29.125 Q
281 280 1 3 Abbott, Mrs. Stanton (Rosa Hunt) female 35 1 1 C.A. 2673 20.25 S
282 281 0 3 Duane, Mr. Frank male 65 0 0 336439 7.75 Q
283 282 0 3 Olsson, Mr. Nils Johan Goransson male 28 0 0 347464 7.8542 S
284 283 0 3 de Pelsmaeker, Mr. Alfons male 16 0 0 345778 9.5 S
285 284 1 3 Dorking, Mr. Edward Arthur male 19 0 0 A/5. 10482 8.05 S
286 285 0 1 Smith, Mr. Richard William male 0 0 113056 26 A19 S
287 286 0 3 Stankovic, Mr. Ivan male 33 0 0 349239 8.6625 C
288 287 1 3 de Mulder, Mr. Theodore male 30 0 0 345774 9.5 S
289 288 0 3 Naidenoff, Mr. Penko male 22 0 0 349206 7.8958 S
290 289 1 2 Hosono, Mr. Masabumi male 42 0 0 237798 13 S
291 290 1 3 Connolly, Miss. Kate female 22 0 0 370373 7.75 Q
292 291 1 1 Barber, Miss. Ellen "Nellie" female 26 0 0 19877 78.85 S
293 292 1 1 Bishop, Mrs. Dickinson H (Helen Walton) female 19 1 0 11967 91.0792 B49 C
294 293 0 2 Levy, Mr. Rene Jacques male 36 0 0 SC/Paris 2163 12.875 D C
295 294 0 3 Haas, Miss. Aloisia female 24 0 0 349236 8.85 S
296 295 0 3 Mineff, Mr. Ivan male 24 0 0 349233 7.8958 S
297 296 0 1 Lewy, Mr. Ervin G male 0 0 PC 17612 27.7208 C
298 297 0 3 Hanna, Mr. Mansour male 23.5 0 0 2693 7.2292 C
299 298 0 1 Allison, Miss. Helen Loraine female 2 1 2 113781 151.55 C22 C26 S
300 299 1 1 Saalfeld, Mr. Adolphe male 0 0 19988 30.5 C106 S
301 300 1 1 Baxter, Mrs. James (Helene DeLaudeniere Chaput) female 50 0 1 PC 17558 247.5208 B58 B60 C
302 301 1 3 Kelly, Miss. Anna Katherine "Annie Kate" female 0 0 9234 7.75 Q
303 302 1 3 McCoy, Mr. Bernard male 2 0 367226 23.25 Q
304 303 0 3 Johnson, Mr. William Cahoone Jr male 19 0 0 LINE 0 S
305 304 1 2 Keane, Miss. Nora A female 0 0 226593 12.35 E101 Q
306 305 0 3 Williams, Mr. Howard Hugh "Harry" male 0 0 A/5 2466 8.05 S
307 306 1 1 Allison, Master. Hudson Trevor male 0.92 1 2 113781 151.55 C22 C26 S
308 307 1 1 Fleming, Miss. Margaret female 0 0 17421 110.8833 C
309 308 1 1 Penasco y Castellana, Mrs. Victor de Satode (Maria Josefa Perez de Soto y Vallejo) female 17 1 0 PC 17758 108.9 C65 C
310 309 0 2 Abelson, Mr. Samuel male 30 1 0 P/PP 3381 24 C
311 310 1 1 Francatelli, Miss. Laura Mabel female 30 0 0 PC 17485 56.9292 E36 C
312 311 1 1 Hays, Miss. Margaret Bechstein female 24 0 0 11767 83.1583 C54 C
313 312 1 1 Ryerson, Miss. Emily Borie female 18 2 2 PC 17608 262.375 B57 B59 B63 B66 C
314 313 0 2 Lahtinen, Mrs. William (Anna Sylfven) female 26 1 1 250651 26 S
315 314 0 3 Hendekovic, Mr. Ignjac male 28 0 0 349243 7.8958 S
316 315 0 2 Hart, Mr. Benjamin male 43 1 1 F.C.C. 13529 26.25 S
317 316 1 3 Nilsson, Miss. Helmina Josefina female 26 0 0 347470 7.8542 S
318 317 1 2 Kantor, Mrs. Sinai (Miriam Sternin) female 24 1 0 244367 26 S
319 318 0 2 Moraweck, Dr. Ernest male 54 0 0 29011 14 S
320 319 1 1 Wick, Miss. Mary Natalie female 31 0 2 36928 164.8667 C7 S
321 320 1 1 Spedden, Mrs. Frederic Oakley (Margaretta Corning Stone) female 40 1 1 16966 134.5 E34 C
322 321 0 3 Dennis, Mr. Samuel male 22 0 0 A/5 21172 7.25 S
323 322 0 3 Danoff, Mr. Yoto male 27 0 0 349219 7.8958 S
324 323 1 2 Slayter, Miss. Hilda Mary female 30 0 0 234818 12.35 Q
325 324 1 2 Caldwell, Mrs. Albert Francis (Sylvia Mae Harbaugh) female 22 1 1 248738 29 S
326 325 0 3 Sage, Mr. George John Jr male 8 2 CA. 2343 69.55 S
327 326 1 1 Young, Miss. Marie Grice female 36 0 0 PC 17760 135.6333 C32 C
328 327 0 3 Nysveen, Mr. Johan Hansen male 61 0 0 345364 6.2375 S
329 328 1 2 Ball, Mrs. (Ada E Hall) female 36 0 0 28551 13 D S
330 329 1 3 Goldsmith, Mrs. Frank John (Emily Alice Brown) female 31 1 1 363291 20.525 S
331 330 1 1 Hippach, Miss. Jean Gertrude female 16 0 1 111361 57.9792 B18 C
332 331 1 3 McCoy, Miss. Agnes female 2 0 367226 23.25 Q
333 332 0 1 Partner, Mr. Austen male 45.5 0 0 113043 28.5 C124 S
334 333 0 1 Graham, Mr. George Edward male 38 0 1 PC 17582 153.4625 C91 S
335 334 0 3 Vander Planke, Mr. Leo Edmondus male 16 2 0 345764 18 S
336 335 1 1 Frauenthal, Mrs. Henry William (Clara Heinsheimer) female 1 0 PC 17611 133.65 S
337 336 0 3 Denkoff, Mr. Mitto male 0 0 349225 7.8958 S
338 337 0 1 Pears, Mr. Thomas Clinton male 29 1 0 113776 66.6 C2 S
339 338 1 1 Burns, Miss. Elizabeth Margaret female 41 0 0 16966 134.5 E40 C
340 339 1 3 Dahl, Mr. Karl Edwart male 45 0 0 7598 8.05 S
341 340 0 1 Blackwell, Mr. Stephen Weart male 45 0 0 113784 35.5 T S
342 341 1 2 Navratil, Master. Edmond Roger male 2 1 1 230080 26 F2 S
343 342 1 1 Fortune, Miss. Alice Elizabeth female 24 3 2 19950 263 C23 C25 C27 S
344 343 0 2 Collander, Mr. Erik Gustaf male 28 0 0 248740 13 S
345 344 0 2 Sedgwick, Mr. Charles Frederick Waddington male 25 0 0 244361 13 S
346 345 0 2 Fox, Mr. Stanley Hubert male 36 0 0 229236 13 S
347 346 1 2 Brown, Miss. Amelia "Mildred" female 24 0 0 248733 13 F33 S
348 347 1 2 Smith, Miss. Marion Elsie female 40 0 0 31418 13 S
349 348 1 3 Davison, Mrs. Thomas Henry (Mary E Finck) female 1 0 386525 16.1 S
350 349 1 3 Coutts, Master. William Loch "William" male 3 1 1 C.A. 37671 15.9 S
351 350 0 3 Dimic, Mr. Jovan male 42 0 0 315088 8.6625 S
352 351 0 3 Odahl, Mr. Nils Martin male 23 0 0 7267 9.225 S
353 352 0 1 Williams-Lambert, Mr. Fletcher Fellows male 0 0 113510 35 C128 S
354 353 0 3 Elias, Mr. Tannous male 15 1 1 2695 7.2292 C
355 354 0 3 Arnold-Franchi, Mr. Josef male 25 1 0 349237 17.8 S
356 355 0 3 Yousif, Mr. Wazli male 0 0 2647 7.225 C
357 356 0 3 Vanden Steen, Mr. Leo Peter male 28 0 0 345783 9.5 S
358 357 1 1 Bowerman, Miss. Elsie Edith female 22 0 1 113505 55 E33 S
359 358 0 2 Funk, Miss. Annie Clemmer female 38 0 0 237671 13 S
360 359 1 3 McGovern, Miss. Mary female 0 0 330931 7.8792 Q
361 360 1 3 Mockler, Miss. Helen Mary "Ellie" female 0 0 330980 7.8792 Q
362 361 0 3 Skoog, Mr. Wilhelm male 40 1 4 347088 27.9 S
363 362 0 2 del Carlo, Mr. Sebastiano male 29 1 0 SC/PARIS 2167 27.7208 C
364 363 0 3 Barbara, Mrs. (Catherine David) female 45 0 1 2691 14.4542 C
365 364 0 3 Asim, Mr. Adola male 35 0 0 SOTON/O.Q. 3101310 7.05 S
366 365 0 3 O'Brien, Mr. Thomas male 1 0 370365 15.5 Q
367 366 0 3 Adahl, Mr. Mauritz Nils Martin male 30 0 0 C 7076 7.25 S
368 367 1 1 Warren, Mrs. Frank Manley (Anna Sophia Atkinson) female 60 1 0 110813 75.25 D37 C
369 368 1 3 Moussa, Mrs. (Mantoura Boulos) female 0 0 2626 7.2292 C
370 369 1 3 Jermyn, Miss. Annie female 0 0 14313 7.75 Q
371 370 1 1 Aubart, Mme. Leontine Pauline female 24 0 0 PC 17477 69.3 B35 C
372 371 1 1 Harder, Mr. George Achilles male 25 1 0 11765 55.4417 E50 C
373 372 0 3 Wiklund, Mr. Jakob Alfred male 18 1 0 3101267 6.4958 S
374 373 0 3 Beavan, Mr. William Thomas male 19 0 0 323951 8.05 S
375 374 0 1 Ringhini, Mr. Sante male 22 0 0 PC 17760 135.6333 C
376 375 0 3 Palsson, Miss. Stina Viola female 3 3 1 349909 21.075 S
377 376 1 1 Meyer, Mrs. Edgar Joseph (Leila Saks) female 1 0 PC 17604 82.1708 C
378 377 1 3 Landergren, Miss. Aurora Adelia female 22 0 0 C 7077 7.25 S
379 378 0 1 Widener, Mr. Harry Elkins male 27 0 2 113503 211.5 C82 C
380 379 0 3 Betros, Mr. Tannous male 20 0 0 2648 4.0125 C
381 380 0 3 Gustafsson, Mr. Karl Gideon male 19 0 0 347069 7.775 S
382 381 1 1 Bidois, Miss. Rosalie female 42 0 0 PC 17757 227.525 C
383 382 1 3 Nakid, Miss. Maria ("Mary") female 1 0 2 2653 15.7417 C
384 383 0 3 Tikkanen, Mr. Juho male 32 0 0 STON/O 2. 3101293 7.925 S
385 384 1 1 Holverson, Mrs. Alexander Oskar (Mary Aline Towner) female 35 1 0 113789 52 S
386 385 0 3 Plotcharsky, Mr. Vasil male 0 0 349227 7.8958 S
387 386 0 2 Davies, Mr. Charles Henry male 18 0 0 S.O.C. 14879 73.5 S
388 387 0 3 Goodwin, Master. Sidney Leonard male 1 5 2 CA 2144 46.9 S
389 388 1 2 Buss, Miss. Kate female 36 0 0 27849 13 S
390 389 0 3 Sadlier, Mr. Matthew male 0 0 367655 7.7292 Q
391 390 1 2 Lehmann, Miss. Bertha female 17 0 0 SC 1748 12 C
392 391 1 1 Carter, Mr. William Ernest male 36 1 2 113760 120 B96 B98 S
393 392 1 3 Jansson, Mr. Carl Olof male 21 0 0 350034 7.7958 S
394 393 0 3 Gustafsson, Mr. Johan Birger male 28 2 0 3101277 7.925 S
395 394 1 1 Newell, Miss. Marjorie female 23 1 0 35273 113.275 D36 C
396 395 1 3 Sandstrom, Mrs. Hjalmar (Agnes Charlotta Bengtsson) female 24 0 2 PP 9549 16.7 G6 S
397 396 0 3 Johansson, Mr. Erik male 22 0 0 350052 7.7958 S
398 397 0 3 Olsson, Miss. Elina female 31 0 0 350407 7.8542 S
399 398 0 2 McKane, Mr. Peter David male 46 0 0 28403 26 S
400 399 0 2 Pain, Dr. Alfred male 23 0 0 244278 10.5 S
401 400 1 2 Trout, Mrs. William H (Jessie L) female 28 0 0 240929 12.65 S
402 401 1 3 Niskanen, Mr. Juha male 39 0 0 STON/O 2. 3101289 7.925 S
403 402 0 3 Adams, Mr. John male 26 0 0 341826 8.05 S
404 403 0 3 Jussila, Miss. Mari Aina female 21 1 0 4137 9.825 S
405 404 0 3 Hakkarainen, Mr. Pekka Pietari male 28 1 0 STON/O2. 3101279 15.85 S
406 405 0 3 Oreskovic, Miss. Marija female 20 0 0 315096 8.6625 S
407 406 0 2 Gale, Mr. Shadrach male 34 1 0 28664 21 S
408 407 0 3 Widegren, Mr. Carl/Charles Peter male 51 0 0 347064 7.75 S
409 408 1 2 Richards, Master. William Rowe male 3 1 1 29106 18.75 S
410 409 0 3 Birkeland, Mr. Hans Martin Monsen male 21 0 0 312992 7.775 S
411 410 0 3 Lefebre, Miss. Ida female 3 1 4133 25.4667 S
412 411 0 3 Sdycoff, Mr. Todor male 0 0 349222 7.8958 S
413 412 0 3 Hart, Mr. Henry male 0 0 394140 6.8583 Q
414 413 1 1 Minahan, Miss. Daisy E female 33 1 0 19928 90 C78 Q
415 414 0 2 Cunningham, Mr. Alfred Fleming male 0 0 239853 0 S
416 415 1 3 Sundman, Mr. Johan Julian male 44 0 0 STON/O 2. 3101269 7.925 S
417 416 0 3 Meek, Mrs. Thomas (Annie Louise Rowley) female 0 0 343095 8.05 S
418 417 1 2 Drew, Mrs. James Vivian (Lulu Thorne Christian) female 34 1 1 28220 32.5 S
419 418 1 2 Silven, Miss. Lyyli Karoliina female 18 0 2 250652 13 S
420 419 0 2 Matthews, Mr. William John male 30 0 0 28228 13 S
421 420 0 3 Van Impe, Miss. Catharina female 10 0 2 345773 24.15 S
422 421 0 3 Gheorgheff, Mr. Stanio male 0 0 349254 7.8958 C
423 422 0 3 Charters, Mr. David male 21 0 0 A/5. 13032 7.7333 Q
424 423 0 3 Zimmerman, Mr. Leo male 29 0 0 315082 7.875 S
425 424 0 3 Danbom, Mrs. Ernst Gilbert (Anna Sigrid Maria Brogren) female 28 1 1 347080 14.4 S
426 425 0 3 Rosblom, Mr. Viktor Richard male 18 1 1 370129 20.2125 S
427 426 0 3 Wiseman, Mr. Phillippe male 0 0 A/4. 34244 7.25 S
428 427 1 2 Clarke, Mrs. Charles V (Ada Maria Winfield) female 28 1 0 2003 26 S
429 428 1 2 Phillips, Miss. Kate Florence ("Mrs Kate Louise Phillips Marshall") female 19 0 0 250655 26 S
430 429 0 3 Flynn, Mr. James male 0 0 364851 7.75 Q
431 430 1 3 Pickard, Mr. Berk (Berk Trembisky) male 32 0 0 SOTON/O.Q. 392078 8.05 E10 S
432 431 1 1 Bjornstrom-Steffansson, Mr. Mauritz Hakan male 28 0 0 110564 26.55 C52 S
433 432 1 3 Thorneycroft, Mrs. Percival (Florence Kate White) female 1 0 376564 16.1 S
434 433 1 2 Louch, Mrs. Charles Alexander (Alice Adelaide Slow) female 42 1 0 SC/AH 3085 26 S
435 434 0 3 Kallio, Mr. Nikolai Erland male 17 0 0 STON/O 2. 3101274 7.125 S
436 435 0 1 Silvey, Mr. William Baird male 50 1 0 13507 55.9 E44 S
437 436 1 1 Carter, Miss. Lucile Polk female 14 1 2 113760 120 B96 B98 S
438 437 0 3 Ford, Miss. Doolina Margaret "Daisy" female 21 2 2 W./C. 6608 34.375 S
439 438 1 2 Richards, Mrs. Sidney (Emily Hocking) female 24 2 3 29106 18.75 S
440 439 0 1 Fortune, Mr. Mark male 64 1 4 19950 263 C23 C25 C27 S
441 440 0 2 Kvillner, Mr. Johan Henrik Johannesson male 31 0 0 C.A. 18723 10.5 S
442 441 1 2 Hart, Mrs. Benjamin (Esther Ada Bloomfield) female 45 1 1 F.C.C. 13529 26.25 S
443 442 0 3 Hampe, Mr. Leon male 20 0 0 345769 9.5 S
444 443 0 3 Petterson, Mr. Johan Emil male 25 1 0 347076 7.775 S
445 444 1 2 Reynaldo, Ms. Encarnacion female 28 0 0 230434 13 S
446 445 1 3 Johannesen-Bratthammer, Mr. Bernt male 0 0 65306 8.1125 S
447 446 1 1 Dodge, Master. Washington male 4 0 2 33638 81.8583 A34 S
448 447 1 2 Mellinger, Miss. Madeleine Violet female 13 0 1 250644 19.5 S
449 448 1 1 Seward, Mr. Frederic Kimber male 34 0 0 113794 26.55 S
450 449 1 3 Baclini, Miss. Marie Catherine female 5 2 1 2666 19.2583 C
451 450 1 1 Peuchen, Major. Arthur Godfrey male 52 0 0 113786 30.5 C104 S
452 451 0 2 West, Mr. Edwy Arthur male 36 1 2 C.A. 34651 27.75 S
453 452 0 3 Hagland, Mr. Ingvald Olai Olsen male 1 0 65303 19.9667 S
454 453 0 1 Foreman, Mr. Benjamin Laventall male 30 0 0 113051 27.75 C111 C
455 454 1 1 Goldenberg, Mr. Samuel L male 49 1 0 17453 89.1042 C92 C
456 455 0 3 Peduzzi, Mr. Joseph male 0 0 A/5 2817 8.05 S
457 456 1 3 Jalsevac, Mr. Ivan male 29 0 0 349240 7.8958 C
458 457 0 1 Millet, Mr. Francis Davis male 65 0 0 13509 26.55 E38 S
459 458 1 1 Kenyon, Mrs. Frederick R (Marion) female 1 0 17464 51.8625 D21 S
460 459 1 2 Toomey, Miss. Ellen female 50 0 0 F.C.C. 13531 10.5 S
461 460 0 3 O'Connor, Mr. Maurice male 0 0 371060 7.75 Q
462 461 1 1 Anderson, Mr. Harry male 48 0 0 19952 26.55 E12 S
463 462 0 3 Morley, Mr. William male 34 0 0 364506 8.05 S
464 463 0 1 Gee, Mr. Arthur H male 47 0 0 111320 38.5 E63 S
465 464 0 2 Milling, Mr. Jacob Christian male 48 0 0 234360 13 S
466 465 0 3 Maisner, Mr. Simon male 0 0 A/S 2816 8.05 S
467 466 0 3 Goncalves, Mr. Manuel Estanslas male 38 0 0 SOTON/O.Q. 3101306 7.05 S
468 467 0 2 Campbell, Mr. William male 0 0 239853 0 S
469 468 0 1 Smart, Mr. John Montgomery male 56 0 0 113792 26.55 S
470 469 0 3 Scanlan, Mr. James male 0 0 36209 7.725 Q
471 470 1 3 Baclini, Miss. Helene Barbara female 0.75 2 1 2666 19.2583 C
472 471 0 3 Keefe, Mr. Arthur male 0 0 323592 7.25 S
473 472 0 3 Cacic, Mr. Luka male 38 0 0 315089 8.6625 S
474 473 1 2 West, Mrs. Edwy Arthur (Ada Mary Worth) female 33 1 2 C.A. 34651 27.75 S
475 474 1 2 Jerwan, Mrs. Amin S (Marie Marthe Thuillard) female 23 0 0 SC/AH Basle 541 13.7917 D C
476 475 0 3 Strandberg, Miss. Ida Sofia female 22 0 0 7553 9.8375 S
477 476 0 1 Clifford, Mr. George Quincy male 0 0 110465 52 A14 S
478 477 0 2 Renouf, Mr. Peter Henry male 34 1 0 31027 21 S
479 478 0 3 Braund, Mr. Lewis Richard male 29 1 0 3460 7.0458 S
480 479 0 3 Karlsson, Mr. Nils August male 22 0 0 350060 7.5208 S
481 480 1 3 Hirvonen, Miss. Hildur E female 2 0 1 3101298 12.2875 S
482 481 0 3 Goodwin, Master. Harold Victor male 9 5 2 CA 2144 46.9 S
483 482 0 2 Frost, Mr. Anthony Wood "Archie" male 0 0 239854 0 S
484 483 0 3 Rouse, Mr. Richard Henry male 50 0 0 A/5 3594 8.05 S
485 484 1 3 Turkula, Mrs. (Hedwig) female 63 0 0 4134 9.5875 S
486 485 1 1 Bishop, Mr. Dickinson H male 25 1 0 11967 91.0792 B49 C
487 486 0 3 Lefebre, Miss. Jeannie female 3 1 4133 25.4667 S
488 487 1 1 Hoyt, Mrs. Frederick Maxfield (Jane Anne Forby) female 35 1 0 19943 90 C93 S
489 488 0 1 Kent, Mr. Edward Austin male 58 0 0 11771 29.7 B37 C
490 489 0 3 Somerton, Mr. Francis William male 30 0 0 A.5. 18509 8.05 S
491 490 1 3 Coutts, Master. Eden Leslie "Neville" male 9 1 1 C.A. 37671 15.9 S
492 491 0 3 Hagland, Mr. Konrad Mathias Reiersen male 1 0 65304 19.9667 S
493 492 0 3 Windelov, Mr. Einar male 21 0 0 SOTON/OQ 3101317 7.25 S
494 493 0 1 Molson, Mr. Harry Markland male 55 0 0 113787 30.5 C30 S
495 494 0 1 Artagaveytia, Mr. Ramon male 71 0 0 PC 17609 49.5042 C
496 495 0 3 Stanley, Mr. Edward Roland male 21 0 0 A/4 45380 8.05 S
497 496 0 3 Yousseff, Mr. Gerious male 0 0 2627 14.4583 C
498 497 1 1 Eustis, Miss. Elizabeth Mussey female 54 1 0 36947 78.2667 D20 C
499 498 0 3 Shellard, Mr. Frederick William male 0 0 C.A. 6212 15.1 S
500 499 0 1 Allison, Mrs. Hudson J C (Bessie Waldo Daniels) female 25 1 2 113781 151.55 C22 C26 S
501 500 0 3 Svensson, Mr. Olof male 24 0 0 350035 7.7958 S
502 501 0 3 Calic, Mr. Petar male 17 0 0 315086 8.6625 S
503 502 0 3 Canavan, Miss. Mary female 21 0 0 364846 7.75 Q
504 503 0 3 O'Sullivan, Miss. Bridget Mary female 0 0 330909 7.6292 Q
505 504 0 3 Laitinen, Miss. Kristina Sofia female 37 0 0 4135 9.5875 S
506 505 1 1 Maioni, Miss. Roberta female 16 0 0 110152 86.5 B79 S
507 506 0 1 Penasco y Castellana, Mr. Victor de Satode male 18 1 0 PC 17758 108.9 C65 C
508 507 1 2 Quick, Mrs. Frederick Charles (Jane Richards) female 33 0 2 26360 26 S
509 508 1 1 Bradley, Mr. George ("George Arthur Brayton") male 0 0 111427 26.55 S
510 509 0 3 Olsen, Mr. Henry Margido male 28 0 0 C 4001 22.525 S
511 510 1 3 Lang, Mr. Fang male 26 0 0 1601 56.4958 S
512 511 1 3 Daly, Mr. Eugene Patrick male 29 0 0 382651 7.75 Q
513 512 0 3 Webber, Mr. James male 0 0 SOTON/OQ 3101316 8.05 S
514 513 1 1 McGough, Mr. James Robert male 36 0 0 PC 17473 26.2875 E25 S
515 514 1 1 Rothschild, Mrs. Martin (Elizabeth L. Barrett) female 54 1 0 PC 17603 59.4 C
516 515 0 3 Coleff, Mr. Satio male 24 0 0 349209 7.4958 S
517 516 0 1 Walker, Mr. William Anderson male 47 0 0 36967 34.0208 D46 S
518 517 1 2 Lemore, Mrs. (Amelia Milley) female 34 0 0 C.A. 34260 10.5 F33 S
519 518 0 3 Ryan, Mr. Patrick male 0 0 371110 24.15 Q
520 519 1 2 Angle, Mrs. William A (Florence "Mary" Agnes Hughes) female 36 1 0 226875 26 S
521 520 0 3 Pavlovic, Mr. Stefo male 32 0 0 349242 7.8958 S
522 521 1 1 Perreault, Miss. Anne female 30 0 0 12749 93.5 B73 S
523 522 0 3 Vovk, Mr. Janko male 22 0 0 349252 7.8958 S
524 523 0 3 Lahoud, Mr. Sarkis male 0 0 2624 7.225 C
525 524 1 1 Hippach, Mrs. Louis Albert (Ida Sophia Fischer) female 44 0 1 111361 57.9792 B18 C
526 525 0 3 Kassem, Mr. Fared male 0 0 2700 7.2292 C
527 526 0 3 Farrell, Mr. James male 40.5 0 0 367232 7.75 Q
528 527 1 2 Ridsdale, Miss. Lucy female 50 0 0 W./C. 14258 10.5 S
529 528 0 1 Farthing, Mr. John male 0 0 PC 17483 221.7792 C95 S
530 529 0 3 Salonen, Mr. Johan Werner male 39 0 0 3101296 7.925 S
531 530 0 2 Hocking, Mr. Richard George male 23 2 1 29104 11.5 S
532 531 1 2 Quick, Miss. Phyllis May female 2 1 1 26360 26 S
533 532 0 3 Toufik, Mr. Nakli male 0 0 2641 7.2292 C
534 533 0 3 Elias, Mr. Joseph Jr male 17 1 1 2690 7.2292 C
535 534 1 3 Peter, Mrs. Catherine (Catherine Rizk) female 0 2 2668 22.3583 C
536 535 0 3 Cacic, Miss. Marija female 30 0 0 315084 8.6625 S
537 536 1 2 Hart, Miss. Eva Miriam female 7 0 2 F.C.C. 13529 26.25 S
538 537 0 1 Butt, Major. Archibald Willingham male 45 0 0 113050 26.55 B38 S
539 538 1 1 LeRoy, Miss. Bertha female 30 0 0 PC 17761 106.425 C
540 539 0 3 Risien, Mr. Samuel Beard male 0 0 364498 14.5 S
541 540 1 1 Frolicher, Miss. Hedwig Margaritha female 22 0 2 13568 49.5 B39 C
542 541 1 1 Crosby, Miss. Harriet R female 36 0 2 WE/P 5735 71 B22 S
543 542 0 3 Andersson, Miss. Ingeborg Constanzia female 9 4 2 347082 31.275 S
544 543 0 3 Andersson, Miss. Sigrid Elisabeth female 11 4 2 347082 31.275 S
545 544 1 2 Beane, Mr. Edward male 32 1 0 2908 26 S
546 545 0 1 Douglas, Mr. Walter Donald male 50 1 0 PC 17761 106.425 C86 C
547 546 0 1 Nicholson, Mr. Arthur Ernest male 64 0 0 693 26 S
548 547 1 2 Beane, Mrs. Edward (Ethel Clarke) female 19 1 0 2908 26 S
549 548 1 2 Padro y Manent, Mr. Julian male 0 0 SC/PARIS 2146 13.8625 C
550 549 0 3 Goldsmith, Mr. Frank John male 33 1 1 363291 20.525 S
551 550 1 2 Davies, Master. John Morgan Jr male 8 1 1 C.A. 33112 36.75 S
552 551 1 1 Thayer, Mr. John Borland Jr male 17 0 2 17421 110.8833 C70 C
553 552 0 2 Sharp, Mr. Percival James R male 27 0 0 244358 26 S
554 553 0 3 O'Brien, Mr. Timothy male 0 0 330979 7.8292 Q
555 554 1 3 Leeni, Mr. Fahim ("Philip Zenni") male 22 0 0 2620 7.225 C
556 555 1 3 Ohman, Miss. Velin female 22 0 0 347085 7.775 S
557 556 0 1 Wright, Mr. George male 62 0 0 113807 26.55 S
558 557 1 1 Duff Gordon, Lady. (Lucille Christiana Sutherland) ("Mrs Morgan") female 48 1 0 11755 39.6 A16 C
559 558 0 1 Robbins, Mr. Victor male 0 0 PC 17757 227.525 C
560 559 1 1 Taussig, Mrs. Emil (Tillie Mandelbaum) female 39 1 1 110413 79.65 E67 S
561 560 1 3 de Messemaeker, Mrs. Guillaume Joseph (Emma) female 36 1 0 345572 17.4 S
562 561 0 3 Morrow, Mr. Thomas Rowan male 0 0 372622 7.75 Q
563 562 0 3 Sivic, Mr. Husein male 40 0 0 349251 7.8958 S
564 563 0 2 Norman, Mr. Robert Douglas male 28 0 0 218629 13.5 S
565 564 0 3 Simmons, Mr. John male 0 0 SOTON/OQ 392082 8.05 S
566 565 0 3 Meanwell, Miss. (Marion Ogden) female 0 0 SOTON/O.Q. 392087 8.05 S
567 566 0 3 Davies, Mr. Alfred J male 24 2 0 A/4 48871 24.15 S
568 567 0 3 Stoytcheff, Mr. Ilia male 19 0 0 349205 7.8958 S
569 568 0 3 Palsson, Mrs. Nils (Alma Cornelia Berglund) female 29 0 4 349909 21.075 S
570 569 0 3 Doharr, Mr. Tannous male 0 0 2686 7.2292 C
571 570 1 3 Jonsson, Mr. Carl male 32 0 0 350417 7.8542 S
572 571 1 2 Harris, Mr. George male 62 0 0 S.W./PP 752 10.5 S
573 572 1 1 Appleton, Mrs. Edward Dale (Charlotte Lamson) female 53 2 0 11769 51.4792 C101 S
574 573 1 1 Flynn, Mr. John Irwin ("Irving") male 36 0 0 PC 17474 26.3875 E25 S
575 574 1 3 Kelly, Miss. Mary female 0 0 14312 7.75 Q
576 575 0 3 Rush, Mr. Alfred George John male 16 0 0 A/4. 20589 8.05 S
577 576 0 3 Patchett, Mr. George male 19 0 0 358585 14.5 S
578 577 1 2 Garside, Miss. Ethel female 34 0 0 243880 13 S
579 578 1 1 Silvey, Mrs. William Baird (Alice Munger) female 39 1 0 13507 55.9 E44 S
580 579 0 3 Caram, Mrs. Joseph (Maria Elias) female 1 0 2689 14.4583 C
581 580 1 3 Jussila, Mr. Eiriik male 32 0 0 STON/O 2. 3101286 7.925 S
582 581 1 2 Christy, Miss. Julie Rachel female 25 1 1 237789 30 S
583 582 1 1 Thayer, Mrs. John Borland (Marian Longstreth Morris) female 39 1 1 17421 110.8833 C68 C
584 583 0 2 Downton, Mr. William James male 54 0 0 28403 26 S
585 584 0 1 Ross, Mr. John Hugo male 36 0 0 13049 40.125 A10 C
586 585 0 3 Paulner, Mr. Uscher male 0 0 3411 8.7125 C
587 586 1 1 Taussig, Miss. Ruth female 18 0 2 110413 79.65 E68 S
588 587 0 2 Jarvis, Mr. John Denzil male 47 0 0 237565 15 S
589 588 1 1 Frolicher-Stehli, Mr. Maxmillian male 60 1 1 13567 79.2 B41 C
590 589 0 3 Gilinski, Mr. Eliezer male 22 0 0 14973 8.05 S
591 590 0 3 Murdlin, Mr. Joseph male 0 0 A./5. 3235 8.05 S
592 591 0 3 Rintamaki, Mr. Matti male 35 0 0 STON/O 2. 3101273 7.125 S
593 592 1 1 Stephenson, Mrs. Walter Bertram (Martha Eustis) female 52 1 0 36947 78.2667 D20 C
594 593 0 3 Elsbury, Mr. William James male 47 0 0 A/5 3902 7.25 S
595 594 0 3 Bourke, Miss. Mary female 0 2 364848 7.75 Q
596 595 0 2 Chapman, Mr. John Henry male 37 1 0 SC/AH 29037 26 S
597 596 0 3 Van Impe, Mr. Jean Baptiste male 36 1 1 345773 24.15 S
598 597 1 2 Leitch, Miss. Jessie Wills female 0 0 248727 33 S
599 598 0 3 Johnson, Mr. Alfred male 49 0 0 LINE 0 S
600 599 0 3 Boulos, Mr. Hanna male 0 0 2664 7.225 C
601 600 1 1 Duff Gordon, Sir. Cosmo Edmund ("Mr Morgan") male 49 1 0 PC 17485 56.9292 A20 C
602 601 1 2 Jacobsohn, Mrs. Sidney Samuel (Amy Frances Christy) female 24 2 1 243847 27 S
603 602 0 3 Slabenoff, Mr. Petco male 0 0 349214 7.8958 S
604 603 0 1 Harrington, Mr. Charles H male 0 0 113796 42.4 S
605 604 0 3 Torber, Mr. Ernst William male 44 0 0 364511 8.05 S
606 605 1 1 Homer, Mr. Harry ("Mr E Haven") male 35 0 0 111426 26.55 C
607 606 0 3 Lindell, Mr. Edvard Bengtsson male 36 1 0 349910 15.55 S
608 607 0 3 Karaic, Mr. Milan male 30 0 0 349246 7.8958 S
609 608 1 1 Daniel, Mr. Robert Williams male 27 0 0 113804 30.5 S
610 609 1 2 Laroche, Mrs. Joseph (Juliette Marie Louise Lafargue) female 22 1 2 SC/Paris 2123 41.5792 C
611 610 1 1 Shutes, Miss. Elizabeth W female 40 0 0 PC 17582 153.4625 C125 S
612 611 0 3 Andersson, Mrs. Anders Johan (Alfrida Konstantia Brogren) female 39 1 5 347082 31.275 S
613 612 0 3 Jardin, Mr. Jose Neto male 0 0 SOTON/O.Q. 3101305 7.05 S
614 613 1 3 Murphy, Miss. Margaret Jane female 1 0 367230 15.5 Q
615 614 0 3 Horgan, Mr. John male 0 0 370377 7.75 Q
616 615 0 3 Brocklebank, Mr. William Alfred male 35 0 0 364512 8.05 S
617 616 1 2 Herman, Miss. Alice female 24 1 2 220845 65 S
618 617 0 3 Danbom, Mr. Ernst Gilbert male 34 1 1 347080 14.4 S
619 618 0 3 Lobb, Mrs. William Arthur (Cordelia K Stanlick) female 26 1 0 A/5. 3336 16.1 S
620 619 1 2 Becker, Miss. Marion Louise female 4 2 1 230136 39 F4 S
621 620 0 2 Gavey, Mr. Lawrence male 26 0 0 31028 10.5 S
622 621 0 3 Yasbeck, Mr. Antoni male 27 1 0 2659 14.4542 C
623 622 1 1 Kimball, Mr. Edwin Nelson Jr male 42 1 0 11753 52.5542 D19 S
624 623 1 3 Nakid, Mr. Sahid male 20 1 1 2653 15.7417 C
625 624 0 3 Hansen, Mr. Henry Damsgaard male 21 0 0 350029 7.8542 S
626 625 0 3 Bowen, Mr. David John "Dai" male 21 0 0 54636 16.1 S
627 626 0 1 Sutton, Mr. Frederick male 61 0 0 36963 32.3208 D50 S
628 627 0 2 Kirkland, Rev. Charles Leonard male 57 0 0 219533 12.35 Q
629 628 1 1 Longley, Miss. Gretchen Fiske female 21 0 0 13502 77.9583 D9 S
630 629 0 3 Bostandyeff, Mr. Guentcho male 26 0 0 349224 7.8958 S
631 630 0 3 O'Connell, Mr. Patrick D male 0 0 334912 7.7333 Q
632 631 1 1 Barkworth, Mr. Algernon Henry Wilson male 80 0 0 27042 30 A23 S
633 632 0 3 Lundahl, Mr. Johan Svensson male 51 0 0 347743 7.0542 S
634 633 1 1 Stahelin-Maeglin, Dr. Max male 32 0 0 13214 30.5 B50 C
635 634 0 1 Parr, Mr. William Henry Marsh male 0 0 112052 0 S
636 635 0 3 Skoog, Miss. Mabel female 9 3 2 347088 27.9 S
637 636 1 2 Davis, Miss. Mary female 28 0 0 237668 13 S
638 637 0 3 Leinonen, Mr. Antti Gustaf male 32 0 0 STON/O 2. 3101292 7.925 S
639 638 0 2 Collyer, Mr. Harvey male 31 1 1 C.A. 31921 26.25 S
640 639 0 3 Panula, Mrs. Juha (Maria Emilia Ojala) female 41 0 5 3101295 39.6875 S
641 640 0 3 Thorneycroft, Mr. Percival male 1 0 376564 16.1 S
642 641 0 3 Jensen, Mr. Hans Peder male 20 0 0 350050 7.8542 S
643 642 1 1 Sagesser, Mlle. Emma female 24 0 0 PC 17477 69.3 B35 C
644 643 0 3 Skoog, Miss. Margit Elizabeth female 2 3 2 347088 27.9 S
645 644 1 3 Foo, Mr. Choong male 0 0 1601 56.4958 S
646 645 1 3 Baclini, Miss. Eugenie female 0.75 2 1 2666 19.2583 C
647 646 1 1 Harper, Mr. Henry Sleeper male 48 1 0 PC 17572 76.7292 D33 C
648 647 0 3 Cor, Mr. Liudevit male 19 0 0 349231 7.8958 S
649 648 1 1 Simonius-Blumer, Col. Oberst Alfons male 56 0 0 13213 35.5 A26 C
650 649 0 3 Willey, Mr. Edward male 0 0 S.O./P.P. 751 7.55 S
651 650 1 3 Stanley, Miss. Amy Zillah Elsie female 23 0 0 CA. 2314 7.55 S
652 651 0 3 Mitkoff, Mr. Mito male 0 0 349221 7.8958 S
653 652 1 2 Doling, Miss. Elsie female 18 0 1 231919 23 S
654 653 0 3 Kalvik, Mr. Johannes Halvorsen male 21 0 0 8475 8.4333 S
655 654 1 3 O'Leary, Miss. Hanora "Norah" female 0 0 330919 7.8292 Q
656 655 0 3 Hegarty, Miss. Hanora "Nora" female 18 0 0 365226 6.75 Q
657 656 0 2 Hickman, Mr. Leonard Mark male 24 2 0 S.O.C. 14879 73.5 S
658 657 0 3 Radeff, Mr. Alexander male 0 0 349223 7.8958 S
659 658 0 3 Bourke, Mrs. John (Catherine) female 32 1 1 364849 15.5 Q
660 659 0 2 Eitemiller, Mr. George Floyd male 23 0 0 29751 13 S
661 660 0 1 Newell, Mr. Arthur Webster male 58 0 2 35273 113.275 D48 C
662 661 1 1 Frauenthal, Dr. Henry William male 50 2 0 PC 17611 133.65 S
663 662 0 3 Badt, Mr. Mohamed male 40 0 0 2623 7.225 C
664 663 0 1 Colley, Mr. Edward Pomeroy male 47 0 0 5727 25.5875 E58 S
665 664 0 3 Coleff, Mr. Peju male 36 0 0 349210 7.4958 S
666 665 1 3 Lindqvist, Mr. Eino William male 20 1 0 STON/O 2. 3101285 7.925 S
667 666 0 2 Hickman, Mr. Lewis male 32 2 0 S.O.C. 14879 73.5 S
668 667 0 2 Butler, Mr. Reginald Fenton male 25 0 0 234686 13 S
669 668 0 3 Rommetvedt, Mr. Knud Paust male 0 0 312993 7.775 S
670 669 0 3 Cook, Mr. Jacob male 43 0 0 A/5 3536 8.05 S
671 670 1 1 Taylor, Mrs. Elmer Zebley (Juliet Cummins Wright) female 1 0 19996 52 C126 S
672 671 1 2 Brown, Mrs. Thomas William Solomon (Elizabeth Catherine Ford) female 40 1 1 29750 39 S
673 672 0 1 Davidson, Mr. Thornton male 31 1 0 F.C. 12750 52 B71 S
674 673 0 2 Mitchell, Mr. Henry Michael male 70 0 0 C.A. 24580 10.5 S
675 674 1 2 Wilhelms, Mr. Charles male 31 0 0 244270 13 S
676 675 0 2 Watson, Mr. Ennis Hastings male 0 0 239856 0 S
677 676 0 3 Edvardsson, Mr. Gustaf Hjalmar male 18 0 0 349912 7.775 S
678 677 0 3 Sawyer, Mr. Frederick Charles male 24.5 0 0 342826 8.05 S
679 678 1 3 Turja, Miss. Anna Sofia female 18 0 0 4138 9.8417 S
680 679 0 3 Goodwin, Mrs. Frederick (Augusta Tyler) female 43 1 6 CA 2144 46.9 S
681 680 1 1 Cardeza, Mr. Thomas Drake Martinez male 36 0 1 PC 17755 512.3292 B51 B53 B55 C
682 681 0 3 Peters, Miss. Katie female 0 0 330935 8.1375 Q
683 682 1 1 Hassab, Mr. Hammad male 27 0 0 PC 17572 76.7292 D49 C
684 683 0 3 Olsvigen, Mr. Thor Anderson male 20 0 0 6563 9.225 S
685 684 0 3 Goodwin, Mr. Charles Edward male 14 5 2 CA 2144 46.9 S
686 685 0 2 Brown, Mr. Thomas William Solomon male 60 1 1 29750 39 S
687 686 0 2 Laroche, Mr. Joseph Philippe Lemercier male 25 1 2 SC/Paris 2123 41.5792 C
688 687 0 3 Panula, Mr. Jaako Arnold male 14 4 1 3101295 39.6875 S
689 688 0 3 Dakic, Mr. Branko male 19 0 0 349228 10.1708 S
690 689 0 3 Fischer, Mr. Eberhard Thelander male 18 0 0 350036 7.7958 S
691 690 1 1 Madill, Miss. Georgette Alexandra female 15 0 1 24160 211.3375 B5 S
692 691 1 1 Dick, Mr. Albert Adrian male 31 1 0 17474 57 B20 S
693 692 1 3 Karun, Miss. Manca female 4 0 1 349256 13.4167 C
694 693 1 3 Lam, Mr. Ali male 0 0 1601 56.4958 S
695 694 0 3 Saad, Mr. Khalil male 25 0 0 2672 7.225 C
696 695 0 1 Weir, Col. John male 60 0 0 113800 26.55 S
697 696 0 2 Chapman, Mr. Charles Henry male 52 0 0 248731 13.5 S
698 697 0 3 Kelly, Mr. James male 44 0 0 363592 8.05 S
699 698 1 3 Mullens, Miss. Katherine "Katie" female 0 0 35852 7.7333 Q
700 699 0 1 Thayer, Mr. John Borland male 49 1 1 17421 110.8833 C68 C
701 700 0 3 Humblen, Mr. Adolf Mathias Nicolai Olsen male 42 0 0 348121 7.65 F G63 S
702 701 1 1 Astor, Mrs. John Jacob (Madeleine Talmadge Force) female 18 1 0 PC 17757 227.525 C62 C64 C
703 702 1 1 Silverthorne, Mr. Spencer Victor male 35 0 0 PC 17475 26.2875 E24 S
704 703 0 3 Barbara, Miss. Saiide female 18 0 1 2691 14.4542 C
705 704 0 3 Gallagher, Mr. Martin male 25 0 0 36864 7.7417 Q
706 705 0 3 Hansen, Mr. Henrik Juul male 26 1 0 350025 7.8542 S
707 706 0 2 Morley, Mr. Henry Samuel ("Mr Henry Marshall") male 39 0 0 250655 26 S
708 707 1 2 Kelly, Mrs. Florence "Fannie" female 45 0 0 223596 13.5 S
709 708 1 1 Calderhead, Mr. Edward Pennington male 42 0 0 PC 17476 26.2875 E24 S
710 709 1 1 Cleaver, Miss. Alice female 22 0 0 113781 151.55 S
711 710 1 3 Moubarek, Master. Halim Gonios ("William George") male 1 1 2661 15.2458 C
712 711 1 1 Mayne, Mlle. Berthe Antonine ("Mrs de Villiers") female 24 0 0 PC 17482 49.5042 C90 C
713 712 0 1 Klaber, Mr. Herman male 0 0 113028 26.55 C124 S
714 713 1 1 Taylor, Mr. Elmer Zebley male 48 1 0 19996 52 C126 S
715 714 0 3 Larsson, Mr. August Viktor male 29 0 0 7545 9.4833 S
716 715 0 2 Greenberg, Mr. Samuel male 52 0 0 250647 13 S
717 716 0 3 Soholt, Mr. Peter Andreas Lauritz Andersen male 19 0 0 348124 7.65 F G73 S
718 717 1 1 Endres, Miss. Caroline Louise female 38 0 0 PC 17757 227.525 C45 C
719 718 1 2 Troutt, Miss. Edwina Celia "Winnie" female 27 0 0 34218 10.5 E101 S
720 719 0 3 McEvoy, Mr. Michael male 0 0 36568 15.5 Q
721 720 0 3 Johnson, Mr. Malkolm Joackim male 33 0 0 347062 7.775 S
722 721 1 2 Harper, Miss. Annie Jessie "Nina" female 6 0 1 248727 33 S
723 722 0 3 Jensen, Mr. Svend Lauritz male 17 1 0 350048 7.0542 S
724 723 0 2 Gillespie, Mr. William Henry male 34 0 0 12233 13 S
725 724 0 2 Hodges, Mr. Henry Price male 50 0 0 250643 13 S
726 725 1 1 Chambers, Mr. Norman Campbell male 27 1 0 113806 53.1 E8 S
727 726 0 3 Oreskovic, Mr. Luka male 20 0 0 315094 8.6625 S
728 727 1 2 Renouf, Mrs. Peter Henry (Lillian Jefferys) female 30 3 0 31027 21 S
729 728 1 3 Mannion, Miss. Margareth female 0 0 36866 7.7375 Q
730 729 0 2 Bryhl, Mr. Kurt Arnold Gottfrid male 25 1 0 236853 26 S
731 730 0 3 Ilmakangas, Miss. Pieta Sofia female 25 1 0 STON/O2. 3101271 7.925 S
732 731 1 1 Allen, Miss. Elisabeth Walton female 29 0 0 24160 211.3375 B5 S
733 732 0 3 Hassan, Mr. Houssein G N male 11 0 0 2699 18.7875 C
734 733 0 2 Knight, Mr. Robert J male 0 0 239855 0 S
735 734 0 2 Berriman, Mr. William John male 23 0 0 28425 13 S
736 735 0 2 Troupiansky, Mr. Moses Aaron male 23 0 0 233639 13 S
737 736 0 3 Williams, Mr. Leslie male 28.5 0 0 54636 16.1 S
738 737 0 3 Ford, Mrs. Edward (Margaret Ann Watson) female 48 1 3 W./C. 6608 34.375 S
739 738 1 1 Lesurer, Mr. Gustave J male 35 0 0 PC 17755 512.3292 B101 C
740 739 0 3 Ivanoff, Mr. Kanio male 0 0 349201 7.8958 S
741 740 0 3 Nankoff, Mr. Minko male 0 0 349218 7.8958 S
742 741 1 1 Hawksford, Mr. Walter James male 0 0 16988 30 D45 S
743 742 0 1 Cavendish, Mr. Tyrell William male 36 1 0 19877 78.85 C46 S
744 743 1 1 Ryerson, Miss. Susan Parker "Suzette" female 21 2 2 PC 17608 262.375 B57 B59 B63 B66 C
745 744 0 3 McNamee, Mr. Neal male 24 1 0 376566 16.1 S
746 745 1 3 Stranden, Mr. Juho male 31 0 0 STON/O 2. 3101288 7.925 S
747 746 0 1 Crosby, Capt. Edward Gifford male 70 1 1 WE/P 5735 71 B22 S
748 747 0 3 Abbott, Mr. Rossmore Edward male 16 1 1 C.A. 2673 20.25 S
749 748 1 2 Sinkkonen, Miss. Anna female 30 0 0 250648 13 S
750 749 0 1 Marvin, Mr. Daniel Warner male 19 1 0 113773 53.1 D30 S
751 750 0 3 Connaghton, Mr. Michael male 31 0 0 335097 7.75 Q
752 751 1 2 Wells, Miss. Joan female 4 1 1 29103 23 S
753 752 1 3 Moor, Master. Meier male 6 0 1 392096 12.475 E121 S
754 753 0 3 Vande Velde, Mr. Johannes Joseph male 33 0 0 345780 9.5 S
755 754 0 3 Jonkoff, Mr. Lalio male 23 0 0 349204 7.8958 S
756 755 1 2 Herman, Mrs. Samuel (Jane Laver) female 48 1 2 220845 65 S
757 756 1 2 Hamalainen, Master. Viljo male 0.67 1 1 250649 14.5 S
758 757 0 3 Carlsson, Mr. August Sigfrid male 28 0 0 350042 7.7958 S
759 758 0 2 Bailey, Mr. Percy Andrew male 18 0 0 29108 11.5 S
760 759 0 3 Theobald, Mr. Thomas Leonard male 34 0 0 363294 8.05 S
761 760 1 1 Rothes, the Countess. of (Lucy Noel Martha Dyer-Edwards) female 33 0 0 110152 86.5 B77 S
762 761 0 3 Garfirth, Mr. John male 0 0 358585 14.5 S
763 762 0 3 Nirva, Mr. Iisakki Antino Aijo male 41 0 0 SOTON/O2 3101272 7.125 S
764 763 1 3 Barah, Mr. Hanna Assi male 20 0 0 2663 7.2292 C
765 764 1 1 Carter, Mrs. William Ernest (Lucile Polk) female 36 1 2 113760 120 B96 B98 S
766 765 0 3 Eklund, Mr. Hans Linus male 16 0 0 347074 7.775 S
767 766 1 1 Hogeboom, Mrs. John C (Anna Andrews) female 51 1 0 13502 77.9583 D11 S
768 767 0 1 Brewe, Dr. Arthur Jackson male 0 0 112379 39.6 C
769 768 0 3 Mangan, Miss. Mary female 30.5 0 0 364850 7.75 Q
770 769 0 3 Moran, Mr. Daniel J male 1 0 371110 24.15 Q
771 770 0 3 Gronnestad, Mr. Daniel Danielsen male 32 0 0 8471 8.3625 S
772 771 0 3 Lievens, Mr. Rene Aime male 24 0 0 345781 9.5 S
773 772 0 3 Jensen, Mr. Niels Peder male 48 0 0 350047 7.8542 S
774 773 0 2 Mack, Mrs. (Mary) female 57 0 0 S.O./P.P. 3 10.5 E77 S
775 774 0 3 Elias, Mr. Dibo male 0 0 2674 7.225 C
776 775 1 2 Hocking, Mrs. Elizabeth (Eliza Needs) female 54 1 3 29105 23 S
777 776 0 3 Myhrman, Mr. Pehr Fabian Oliver Malkolm male 18 0 0 347078 7.75 S
778 777 0 3 Tobin, Mr. Roger male 0 0 383121 7.75 F38 Q
779 778 1 3 Emanuel, Miss. Virginia Ethel female 5 0 0 364516 12.475 S
780 779 0 3 Kilgannon, Mr. Thomas J male 0 0 36865 7.7375 Q
781 780 1 1 Robert, Mrs. Edward Scott (Elisabeth Walton McMillan) female 43 0 1 24160 211.3375 B3 S
782 781 1 3 Ayoub, Miss. Banoura female 13 0 0 2687 7.2292 C
783 782 1 1 Dick, Mrs. Albert Adrian (Vera Gillespie) female 17 1 0 17474 57 B20 S
784 783 0 1 Long, Mr. Milton Clyde male 29 0 0 113501 30 D6 S
785 784 0 3 Johnston, Mr. Andrew G male 1 2 W./C. 6607 23.45 S
786 785 0 3 Ali, Mr. William male 25 0 0 SOTON/O.Q. 3101312 7.05 S
787 786 0 3 Harmer, Mr. Abraham (David Lishin) male 25 0 0 374887 7.25 S
788 787 1 3 Sjoblom, Miss. Anna Sofia female 18 0 0 3101265 7.4958 S
789 788 0 3 Rice, Master. George Hugh male 8 4 1 382652 29.125 Q
790 789 1 3 Dean, Master. Bertram Vere male 1 1 2 C.A. 2315 20.575 S
791 790 0 1 Guggenheim, Mr. Benjamin male 46 0 0 PC 17593 79.2 B82 B84 C
792 791 0 3 Keane, Mr. Andrew "Andy" male 0 0 12460 7.75 Q
793 792 0 2 Gaskell, Mr. Alfred male 16 0 0 239865 26 S
794 793 0 3 Sage, Miss. Stella Anna female 8 2 CA. 2343 69.55 S
795 794 0 1 Hoyt, Mr. William Fisher male 0 0 PC 17600 30.6958 C
796 795 0 3 Dantcheff, Mr. Ristiu male 25 0 0 349203 7.8958 S
797 796 0 2 Otter, Mr. Richard male 39 0 0 28213 13 S
798 797 1 1 Leader, Dr. Alice (Farnham) female 49 0 0 17465 25.9292 D17 S
799 798 1 3 Osman, Mrs. Mara female 31 0 0 349244 8.6833 S
800 799 0 3 Ibrahim Shawah, Mr. Yousseff male 30 0 0 2685 7.2292 C
801 800 0 3 Van Impe, Mrs. Jean Baptiste (Rosalie Paula Govaert) female 30 1 1 345773 24.15 S
802 801 0 2 Ponesell, Mr. Martin male 34 0 0 250647 13 S
803 802 1 2 Collyer, Mrs. Harvey (Charlotte Annie Tate) female 31 1 1 C.A. 31921 26.25 S
804 803 1 1 Carter, Master. William Thornton II male 11 1 2 113760 120 B96 B98 S
805 804 1 3 Thomas, Master. Assad Alexander male 0.42 0 1 2625 8.5167 C
806 805 1 3 Hedman, Mr. Oskar Arvid male 27 0 0 347089 6.975 S
807 806 0 3 Johansson, Mr. Karl Johan male 31 0 0 347063 7.775 S
808 807 0 1 Andrews, Mr. Thomas Jr male 39 0 0 112050 0 A36 S
809 808 0 3 Pettersson, Miss. Ellen Natalia female 18 0 0 347087 7.775 S
810 809 0 2 Meyer, Mr. August male 39 0 0 248723 13 S
811 810 1 1 Chambers, Mrs. Norman Campbell (Bertha Griggs) female 33 1 0 113806 53.1 E8 S
812 811 0 3 Alexander, Mr. William male 26 0 0 3474 7.8875 S
813 812 0 3 Lester, Mr. James male 39 0 0 A/4 48871 24.15 S
814 813 0 2 Slemen, Mr. Richard James male 35 0 0 28206 10.5 S
815 814 0 3 Andersson, Miss. Ebba Iris Alfrida female 6 4 2 347082 31.275 S
816 815 0 3 Tomlin, Mr. Ernest Portage male 30.5 0 0 364499 8.05 S
817 816 0 1 Fry, Mr. Richard male 0 0 112058 0 B102 S
818 817 0 3 Heininen, Miss. Wendla Maria female 23 0 0 STON/O2. 3101290 7.925 S
819 818 0 2 Mallet, Mr. Albert male 31 1 1 S.C./PARIS 2079 37.0042 C
820 819 0 3 Holm, Mr. John Fredrik Alexander male 43 0 0 C 7075 6.45 S
821 820 0 3 Skoog, Master. Karl Thorsten male 10 3 2 347088 27.9 S
822 821 1 1 Hays, Mrs. Charles Melville (Clara Jennings Gregg) female 52 1 1 12749 93.5 B69 S
823 822 1 3 Lulic, Mr. Nikola male 27 0 0 315098 8.6625 S
824 823 0 1 Reuchlin, Jonkheer. John George male 38 0 0 19972 0 S
825 824 1 3 Moor, Mrs. (Beila) female 27 0 1 392096 12.475 E121 S
826 825 0 3 Panula, Master. Urho Abraham male 2 4 1 3101295 39.6875 S
827 826 0 3 Flynn, Mr. John male 0 0 368323 6.95 Q
828 827 0 3 Lam, Mr. Len male 0 0 1601 56.4958 S
829 828 1 2 Mallet, Master. Andre male 1 0 2 S.C./PARIS 2079 37.0042 C
830 829 1 3 McCormack, Mr. Thomas Joseph male 0 0 367228 7.75 Q
831 830 1 1 Stone, Mrs. George Nelson (Martha Evelyn) female 62 0 0 113572 80 B28
832 831 1 3 Yasbeck, Mrs. Antoni (Selini Alexander) female 15 1 0 2659 14.4542 C
833 832 1 2 Richards, Master. George Sibley male 0.83 1 1 29106 18.75 S
834 833 0 3 Saad, Mr. Amin male 0 0 2671 7.2292 C
835 834 0 3 Augustsson, Mr. Albert male 23 0 0 347468 7.8542 S
836 835 0 3 Allum, Mr. Owen George male 18 0 0 2223 8.3 S
837 836 1 1 Compton, Miss. Sara Rebecca female 39 1 1 PC 17756 83.1583 E49 C
838 837 0 3 Pasic, Mr. Jakob male 21 0 0 315097 8.6625 S
839 838 0 3 Sirota, Mr. Maurice male 0 0 392092 8.05 S
840 839 1 3 Chip, Mr. Chang male 32 0 0 1601 56.4958 S
841 840 1 1 Marechal, Mr. Pierre male 0 0 11774 29.7 C47 C
842 841 0 3 Alhomaki, Mr. Ilmari Rudolf male 20 0 0 SOTON/O2 3101287 7.925 S
843 842 0 2 Mudd, Mr. Thomas Charles male 16 0 0 S.O./P.P. 3 10.5 S
844 843 1 1 Serepeca, Miss. Augusta female 30 0 0 113798 31 C
845 844 0 3 Lemberopolous, Mr. Peter L male 34.5 0 0 2683 6.4375 C
846 845 0 3 Culumovic, Mr. Jeso male 17 0 0 315090 8.6625 S
847 846 0 3 Abbing, Mr. Anthony male 42 0 0 C.A. 5547 7.55 S
848 847 0 3 Sage, Mr. Douglas Bullen male 8 2 CA. 2343 69.55 S
849 848 0 3 Markoff, Mr. Marin male 35 0 0 349213 7.8958 C
850 849 0 2 Harper, Rev. John male 28 0 1 248727 33 S
851 850 1 1 Goldenberg, Mrs. Samuel L (Edwiga Grabowska) female 1 0 17453 89.1042 C92 C
852 851 0 3 Andersson, Master. Sigvard Harald Elias male 4 4 2 347082 31.275 S
853 852 0 3 Svensson, Mr. Johan male 74 0 0 347060 7.775 S
854 853 0 3 Boulos, Miss. Nourelain female 9 1 1 2678 15.2458 C
855 854 1 1 Lines, Miss. Mary Conover female 16 0 1 PC 17592 39.4 D28 S
856 855 0 2 Carter, Mrs. Ernest Courtenay (Lilian Hughes) female 44 1 0 244252 26 S
857 856 1 3 Aks, Mrs. Sam (Leah Rosen) female 18 0 1 392091 9.35 S
858 857 1 1 Wick, Mrs. George Dennick (Mary Hitchcock) female 45 1 1 36928 164.8667 S
859 858 1 1 Daly, Mr. Peter Denis male 51 0 0 113055 26.55 E17 S
860 859 1 3 Baclini, Mrs. Solomon (Latifa Qurban) female 24 0 3 2666 19.2583 C
861 860 0 3 Razi, Mr. Raihed male 0 0 2629 7.2292 C
862 861 0 3 Hansen, Mr. Claus Peter male 41 2 0 350026 14.1083 S
863 862 0 2 Giles, Mr. Frederick Edward male 21 1 0 28134 11.5 S
864 863 1 1 Swift, Mrs. Frederick Joel (Margaret Welles Barron) female 48 0 0 17466 25.9292 D17 S
865 864 0 3 Sage, Miss. Dorothy Edith "Dolly" female 8 2 CA. 2343 69.55 S
866 865 0 2 Gill, Mr. John William male 24 0 0 233866 13 S
867 866 1 2 Bystrom, Mrs. (Karolina) female 42 0 0 236852 13 S
868 867 1 2 Duran y More, Miss. Asuncion female 27 1 0 SC/PARIS 2149 13.8583 C
869 868 0 1 Roebling, Mr. Washington Augustus II male 31 0 0 PC 17590 50.4958 A24 S
870 869 0 3 van Melkebeke, Mr. Philemon male 0 0 345777 9.5 S
871 870 1 3 Johnson, Master. Harold Theodor male 4 1 1 347742 11.1333 S
872 871 0 3 Balkic, Mr. Cerin male 26 0 0 349248 7.8958 S
873 872 1 1 Beckwith, Mrs. Richard Leonard (Sallie Monypeny) female 47 1 1 11751 52.5542 D35 S
874 873 0 1 Carlsson, Mr. Frans Olof male 33 0 0 695 5 B51 B53 B55 S
875 874 0 3 Vander Cruyssen, Mr. Victor male 47 0 0 345765 9 S
876 875 1 2 Abelson, Mrs. Samuel (Hannah Wizosky) female 28 1 0 P/PP 3381 24 C
877 876 1 3 Najib, Miss. Adele Kiamie "Jane" female 15 0 0 2667 7.225 C
878 877 0 3 Gustafsson, Mr. Alfred Ossian male 20 0 0 7534 9.8458 S
879 878 0 3 Petroff, Mr. Nedelio male 19 0 0 349212 7.8958 S
880 879 0 3 Laleff, Mr. Kristo male 0 0 349217 7.8958 S
881 880 1 1 Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) female 56 0 1 11767 83.1583 C50 C
882 881 1 2 Shelley, Mrs. William (Imanita Parrish Hall) female 25 0 1 230433 26 S
883 882 0 3 Markun, Mr. Johann male 33 0 0 349257 7.8958 S
884 883 0 3 Dahlberg, Miss. Gerda Ulrika female 22 0 0 7552 10.5167 S
885 884 0 2 Banfield, Mr. Frederick James male 28 0 0 C.A./SOTON 34068 10.5 S
886 885 0 3 Sutehall, Mr. Henry Jr male 25 0 0 SOTON/OQ 392076 7.05 S
887 886 0 3 Rice, Mrs. William (Margaret Norton) female 39 0 5 382652 29.125 Q
888 887 0 2 Montvila, Rev. Juozas male 27 0 0 211536 13 S
889 888 1 1 Graham, Miss. Margaret Edith female 19 0 0 112053 30 B42 S
890 889 0 3 Johnston, Miss. Catherine Helen "Carrie" female 1 2 W./C. 6607 23.45 S
891 890 1 1 Behr, Mr. Karl Howell male 26 0 0 111369 30 C148 C
892 891 0 3 Dooley, Mr. Patrick male 32 0 0 370376 7.75 Q


@@ -0,0 +1,270 @@
This is the 100th Etext file presented by Project Gutenberg, and
is presented in cooperation with World Library, Inc., from their
Library of the Future and Shakespeare CDROMS. Project Gutenberg
often releases Etexts that are NOT placed in the Public Domain!!
Shakespeare
*This Etext has certain copyright implications you should read!*
<<THIS ELECTRONIC VERSION OF THE COMPLETE WORKS OF WILLIAM
SHAKESPEARE IS COPYRIGHT 1990-1993 BY WORLD LIBRARY, INC., AND IS
PROVIDED BY PROJECT GUTENBERG ETEXT OF ILLINOIS BENEDICTINE COLLEGE
WITH PERMISSION. ELECTRONIC AND MACHINE READABLE COPIES MAY BE
DISTRIBUTED SO LONG AS SUCH COPIES (1) ARE FOR YOUR OR OTHERS
PERSONAL USE ONLY, AND (2) ARE NOT DISTRIBUTED OR USED
COMMERCIALLY. PROHIBITED COMMERCIAL DISTRIBUTION INCLUDES BY ANY
SERVICE THAT CHARGES FOR DOWNLOAD TIME OR FOR MEMBERSHIP.>>
*Project Gutenberg is proud to cooperate with The World Library*
in the presentation of The Complete Works of William Shakespeare
for your reading for education and entertainment. HOWEVER, THIS
IS NEITHER SHAREWARE NOR PUBLIC DOMAIN. . .AND UNDER THE LIBRARY
OF THE FUTURE CONDITIONS OF THIS PRESENTATION. . .NO CHARGES MAY
BE MADE FOR *ANY* ACCESS TO THIS MATERIAL. YOU ARE ENCOURAGED!!
TO GIVE IT AWAY TO ANYONE YOU LIKE, BUT NO CHARGES ARE ALLOWED!!
**Welcome To The World of Free Plain Vanilla Electronic Texts**
**Etexts Readable By Both Humans and By Computers, Since 1971**
*These Etexts Prepared By Hundreds of Volunteers and Donations*
Information on contacting Project Gutenberg to get Etexts, and
further information is included below. We need your donations.
The Complete Works of William Shakespeare
January, 1994 [Etext #100]
The Library of the Future Complete Works of William Shakespeare
Library of the Future is a TradeMark (TM) of World Library Inc.
******This file should be named shaks12.txt or shaks12.zip*****
Corrected EDITIONS of our etexts get a new NUMBER, shaks13.txt
VERSIONS based on separate sources get new LETTER, shaks10a.txt
If you would like further information about World Library, Inc.
Please call them at 1-800-443-0238 or email julianc@netcom.com
Please give them our thanks for their Shakespeare cooperation!
The official release date of all Project Gutenberg Etexts is at
Midnight, Central Time, of the last day of the stated month. A
preliminary version may often be posted for suggestion, comment
and editing by those who wish to do so. To be sure you have an
up to date first edition [xxxxx10x.xxx] please check file sizes
in the first week of the next month. Since our ftp program has
a bug in it that scrambles the date [tried to fix and failed] a
look at the file size will have to do, but we will try to see a
new copy has at least one byte more or less.
Information about Project Gutenberg (one page)
We produce about two million dollars for each hour we work. The
fifty hours is one conservative estimate for how long it we take
to get any etext selected, entered, proofread, edited, copyright
searched and analyzed, the copyright letters written, etc. This
projected audience is one hundred million readers. If our value
per text is nominally estimated at one dollar, then we produce 2
million dollars per hour this year we, will have to do four text
files per month: thus upping our productivity from one million.
The Goal of Project Gutenberg is to Give Away One Trillion Etext
Files by the December 31, 2001. [10,000 x 100,000,000=Trillion]
This is ten thousand titles each to one hundred million readers,
which is 10% of the expected number of computer users by the end
of the year 2001.
We need your donations more than ever!
All donations should be made to "Project Gutenberg/IBC", and are
tax deductible to the extent allowable by law ("IBC" is Illinois
Benedictine College). (Subscriptions to our paper newsletter go
to IBC, too)
For these and other matters, please mail to:
Project Gutenberg
P. O. Box 2782
Champaign, IL 61825
When all other email fails try our Michael S. Hart, Executive Director:
hart@vmd.cso.uiuc.edu (internet) hart@uiucvmd (bitnet)
We would prefer to send you this information by email
(Internet, Bitnet, Compuserve, ATTMAIL or MCImail).
******
If you have an FTP program (or emulator), please
FTP directly to the Project Gutenberg archives:
[Mac users, do NOT point and click. . .type]
ftp mrcnext.cso.uiuc.edu
login: anonymous
password: your@login
cd etext/etext91
or cd etext92
or cd etext93 [for new books] [now also in cd etext/etext93]
or cd etext/articles [get suggest gut for more information]
dir [to see files]
get or mget [to get files. . .set bin for zip files]
GET 0INDEX.GUT
for a list of books
and
GET NEW GUT for general information
and
MGET GUT* for newsletters.
**Information prepared by the Project Gutenberg legal advisor**
***** SMALL PRINT! for COMPLETE SHAKESPEARE *****
THIS ELECTRONIC VERSION OF THE COMPLETE WORKS OF WILLIAM
SHAKESPEARE IS COPYRIGHT 1990-1993 BY WORLD LIBRARY, INC.,
AND IS PROVIDED BY PROJECT GUTENBERG ETEXT OF
ILLINOIS BENEDICTINE COLLEGE WITH PERMISSION.
Since unlike many other Project Gutenberg-tm etexts, this etext
is copyright protected, and since the materials and methods you
use will effect the Project's reputation, your right to copy and
distribute it is limited by the copyright and other laws, and by
the conditions of this "Small Print!" statement.
1. LICENSE
A) YOU MAY (AND ARE ENCOURAGED) TO DISTRIBUTE ELECTRONIC AND
MACHINE READABLE COPIES OF THIS ETEXT, SO LONG AS SUCH COPIES
(1) ARE FOR YOUR OR OTHERS PERSONAL USE ONLY, AND (2) ARE NOT
DISTRIBUTED OR USED COMMERCIALLY. PROHIBITED COMMERCIAL
DISTRIBUTION INCLUDES BY ANY SERVICE THAT CHARGES FOR DOWNLOAD
TIME OR FOR MEMBERSHIP.
B) This license is subject to the conditions that you honor
the refund and replacement provisions of this "small print!"
statement; and that you distribute exact copies of this etext,
including this Small Print statement. Such copies can be
compressed or any proprietary form (including any form resulting
from word processing or hypertext software), so long as
*EITHER*:
(1) The etext, when displayed, is clearly readable, and does
*not* contain characters other than those intended by the
author of the work, although tilde (~), asterisk (*) and
underline (_) characters may be used to convey punctuation
intended by the author, and additional characters may be used
to indicate hypertext links; OR
(2) The etext is readily convertible by the reader at no
expense into plain ASCII, EBCDIC or equivalent form by the
program that displays the etext (as is the case, for instance,
with most word processors); OR
(3) You provide or agree to provide on request at no
additional cost, fee or expense, a copy of the etext in plain
ASCII.
2. LIMITED WARRANTY; DISCLAIMER OF DAMAGES
This etext may contain a "Defect" in the form of incomplete,
inaccurate or corrupt data, transcription errors, a copyright or
other infringement, a defective or damaged disk, computer virus,
or codes that damage or cannot be read by your equipment. But
for the "Right of Replacement or Refund" described below, the
Project (and any other party you may receive this etext from as
a PROJECT GUTENBERG-tm etext) disclaims all liability to you for
damages, costs and expenses, including legal fees, and YOU HAVE
NO REMEDIES FOR NEGLIGENCE OR UNDER STRICT LIABILITY, OR FOR
BREACH OF WARRANTY OR CONTRACT, INCLUDING BUT NOT LIMITED TO
INDIRECT, CONSEQUENTIAL, PUNITIVE OR INCIDENTAL DAMAGES, EVEN IF
YOU GIVE NOTICE OF THE POSSIBILITY OF SUCH DAMAGES.
If you discover a Defect in this etext within 90 days of receiv-
ing it, you can receive a refund of the money (if any) you paid
for it by sending an explanatory note within that time to the
person you received it from. If you received it on a physical
medium, you must return it with your note, and such person may
choose to alternatively give you a replacement copy. If you
received it electronically, such person may choose to
alternatively give you a second opportunity to receive it
electronically.
THIS ETEXT IS OTHERWISE PROVIDED TO YOU "AS-IS". NO OTHER
WARRANTIES OF ANY KIND, EXPRESS OR IMPLIED, ARE MADE TO YOU AS
TO THE ETEXT OR ANY MEDIUM IT MAY BE ON, INCLUDING BUT NOT
LIMITED TO WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A
PARTICULAR PURPOSE. Some states do not allow disclaimers of
implied warranties or the exclusion or limitation of consequen-
tial damages, so the above disclaimers and exclusions may not
apply to you, and you may have other legal rights.
3. INDEMNITY: You will indemnify and hold the Project, its
directors, officers, members and agents harmless from all lia-
bility, cost and expense, including legal fees, that arise
directly or indirectly from any of the following that you do or
cause: [A] distribution of this etext, [B] alteration,
modification, or addition to the etext, or [C] any Defect.
4. WHAT IF YOU *WANT* TO SEND MONEY EVEN IF YOU DON'T HAVE TO?
Project Gutenberg is dedicated to increasing the number of
public domain and licensed works that can be freely distributed
in machine readable form. The Project gratefully accepts
contributions in money, time, scanning machines, OCR software,
public domain etexts, royalty free copyright licenses, and
whatever else you can think of. Money should be paid to "Pro-
ject Gutenberg Association / Illinois Benedictine College".
WRITE TO US! We can be reached at:
Internet: hart@vmd.cso.uiuc.edu
Bitnet: hart@uiucvmd
CompuServe: >internet:hart@.vmd.cso.uiuc.edu
Attmail: internet!vmd.cso.uiuc.edu!Hart
Mail: Prof. Michael Hart
P.O. Box 2782
Champaign, IL 61825
This "Small Print!" by Charles B. Kramer, Attorney
Internet (72600.2026@compuserve.com); TEL: (212-254-5093)
**** SMALL PRINT! FOR __ COMPLETE SHAKESPEARE ****
["Small Print" V.12.08.93]
<<THIS ELECTRONIC VERSION OF THE COMPLETE WORKS OF WILLIAM
SHAKESPEARE IS COPYRIGHT 1990-1993 BY WORLD LIBRARY, INC., AND IS
PROVIDED BY PROJECT GUTENBERG ETEXT OF ILLINOIS BENEDICTINE COLLEGE
WITH PERMISSION. ELECTRONIC AND MACHINE READABLE COPIES MAY BE
DISTRIBUTED SO LONG AS SUCH COPIES (1) ARE FOR YOUR OR OTHERS
PERSONAL USE ONLY, AND (2) ARE NOT DISTRIBUTED OR USED
COMMERCIALLY. PROHIBITED COMMERCIAL DISTRIBUTION INCLUDES BY ANY
SERVICE THAT CHARGES FOR DOWNLOAD TIME OR FOR MEMBERSHIP.>>
1609
THE SONNETS
by William Shakespeare
THE END
<<THIS ELECTRONIC VERSION OF THE COMPLETE WORKS OF WILLIAM
SHAKESPEARE IS COPYRIGHT 1990-1993 BY WORLD LIBRARY, INC., AND IS
PROVIDED BY PROJECT GUTENBERG ETEXT OF ILLINOIS BENEDICTINE COLLEGE
WITH PERMISSION. ELECTRONIC AND MACHINE READABLE COPIES MAY BE
DISTRIBUTED SO LONG AS SUCH COPIES (1) ARE FOR YOUR OR OTHERS
PERSONAL USE ONLY, AND (2) ARE NOT DISTRIBUTED OR USED
COMMERCIALLY. PROHIBITED COMMERCIAL DISTRIBUTION INCLUDES BY ANY
SERVICE THAT CHARGES FOR DOWNLOAD TIME OR FOR MEMBERSHIP.>>
End of this Etext of The Complete Works of William Shakespeare

View File

@@ -0,0 +1,507 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved. \n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/azure-arcadia/spark_job_on_synapse_spark_pool.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Using Synapse Spark Pool as a Compute Target from Azure Machine Learning Remote Run\n",
"1. To use Synapse Spark Pool as a compute target from Experiment Run, [ScriptRunConfig](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.script_run_config.scriptrunconfig?view=azure-ml-py) is used, the same as other Experiment Runs. This notebook demonstrates how to leverage ScriptRunConfig to submit an experiment run to an attached Synapse Spark cluster.\n",
"2. To use Synapse Spark Pool as a compute target from [Azure Machine Learning Pipeline](https://aka.ms/pl-concept), a [SynapseSparkStep](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-steps/azureml.pipeline.steps.synapse_spark_step.synapsesparkstep?view=azure-ml-py) is used. This notebook demonstrates how to leverage SynapseSparkStep in Azure Machine Learning Pipeline.\n",
"\n",
"## Before you begin:\n",
"1. **Create an Azure Synapse workspace**, check [this] (https://docs.microsoft.com/en-us/azure/synapse-analytics/quickstart-create-workspace) for more information.\n",
"2. **Create Spark Pool in Synapse workspace**: check [this] (https://docs.microsoft.com/en-us/azure/synapse-analytics/quickstart-create-apache-spark-pool-portal) for more information."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Azure Machine Learning and Pipeline SDK-specific imports"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import azureml.core\n",
"from azureml.core import Workspace, Experiment\n",
"from azureml.core import LinkedService, SynapseWorkspaceLinkedServiceConfiguration\n",
"from azureml.core.compute import ComputeTarget, AmlCompute, SynapseCompute\n",
"from azureml.exceptions import ComputeTargetException\n",
"from azureml.data import HDFSOutputDatasetConfig\n",
"from azureml.core.datastore import Datastore\n",
"from azureml.core.runconfig import RunConfiguration\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"from azureml.pipeline.core import Pipeline\n",
"from azureml.pipeline.steps import PythonScriptStep, SynapseSparkStep\n",
"\n",
"# Check core SDK version number\n",
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Link Synapse workspace to AML \n",
"You have to be an \"Owner\" of Synapse workspace resource to perform linking. You can check your role in the Azure resource management portal, if you don't have an \"Owner\" role, you can contact an \"Owner\" to link the workspaces for you."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n",
"# Replace with your resource info before running.\n",
"\n",
"synapse_subscription_id=os.getenv(\"SYNAPSE_SUBSCRIPTION_ID\", \"<my-synapse-subscription-id>\")\n",
"synapse_resource_group=os.getenv(\"SYNAPSE_RESOURCE_GROUP\", \"<my-synapse-resource-group>\")\n",
"synapse_workspace_name=os.getenv(\"SYNAPSE_WORKSPACE_NAME\", \"<my-synapse-workspace-name>\")\n",
"synapse_linked_service_name=os.getenv(\"SYNAPSE_LINKED_SERVICE_NAME\", \"<my-synapse-linked-service-name>\")\n",
"\n",
"synapse_link_config = SynapseWorkspaceLinkedServiceConfiguration(\n",
" subscription_id=synapse_subscription_id,\n",
" resource_group=synapse_resource_group,\n",
" name=synapse_workspace_name\n",
")\n",
"\n",
"linked_service = LinkedService.register(\n",
" workspace=ws,\n",
" name=synapse_linked_service_name,\n",
" linked_service_config=synapse_link_config)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Linked service property\n",
"\n",
"A MSI (system_assigned_identity_principal_id) will be generated for each linked service, for example:\n",
"\n",
"name=synapselink,</p>\n",
"type=Synapse, </p>\n",
"linked_service_resource_id=/subscriptions/4faaaf21-663f-4391-96fd-47197c630979/resourceGroups/static_resources_synapse_test/providers/Microsoft.Synapse/workspaces/synapsetest2, </p>\n",
"system_assigned_identity_principal_id=eb355d52-3806-4c5a-aec9-91447e8cfc2e </p>\n",
"\n",
"#### Make sure you grant \"Synapse Apache Spark Administrator\" role of the synapse workspace to the generated workspace linking MSI in Synapse studio portal before you submit job."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"linked_service"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"LinkedService.list(ws)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Attach Synapse spark pool as AML compute target"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"synapse_spark_pool_name=os.getenv(\"SYNAPSE_SPARK_POOL_NAME\", \"<my-synapse-spark-pool-name>\")\n",
"synapse_compute_name=os.getenv(\"SYNAPSE_COMPUTE_NAME\", \"<my-synapse-compute-name>\")\n",
"\n",
"attach_config = SynapseCompute.attach_configuration(\n",
" linked_service,\n",
" type=\"SynapseSpark\",\n",
" pool_name=synapse_spark_pool_name)\n",
"\n",
"synapse_compute=ComputeTarget.attach(\n",
" workspace=ws,\n",
" name=synapse_compute_name,\n",
" attach_configuration=attach_config)\n",
"\n",
"synapse_compute.wait_for_completion()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Start an experiment run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prepare data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Use the default blob storage\n",
"def_blob_store = Datastore(ws, \"workspaceblobstore\")\n",
"print('Datastore {} will be used'.format(def_blob_store.name))\n",
"\n",
"# We are uploading a sample file in the local directory to be used as a datasource\n",
"file_name = \"Titanic.csv\"\n",
"def_blob_store.upload_files(files=[\"./{}\".format(file_name)], overwrite=False)\n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Tabular dataset as input"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Dataset\n",
"titanic_tabular_dataset = Dataset.Tabular.from_delimited_files(path=[(def_blob_store, file_name)])\n",
"input1 = titanic_tabular_dataset.as_named_input(\"tabular_input\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## File dataset as input"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Dataset\n",
"titanic_file_dataset = Dataset.File.from_files(path=[(def_blob_store, file_name)])\n",
"input2 = titanic_file_dataset.as_named_input(\"file_input\").as_hdfs()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Output config: the output will be registered as a File dataset\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.data import HDFSOutputDatasetConfig\n",
"output = HDFSOutputDatasetConfig(destination=(def_blob_store,\"test\")).register_on_complete(name=\"registered_dataset\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Dataprep script"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"os.makedirs(\"code\", exist_ok=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile code/dataprep.py\n",
"import os\n",
"import sys\n",
"import azureml.core\n",
"from pyspark.sql import SparkSession\n",
"from azureml.core import Run, Dataset\n",
"\n",
"print(azureml.core.VERSION)\n",
"print(os.environ)\n",
"\n",
"import argparse\n",
"parser = argparse.ArgumentParser()\n",
"parser.add_argument(\"--tabular_input\")\n",
"parser.add_argument(\"--file_input\")\n",
"parser.add_argument(\"--output_dir\")\n",
"args = parser.parse_args()\n",
"\n",
"# use dataset sdk to read tabular dataset\n",
"run_context = Run.get_context()\n",
"dataset = Dataset.get_by_id(run_context.experiment.workspace,id=args.tabular_input)\n",
"sdf = dataset.to_spark_dataframe()\n",
"sdf.show()\n",
"\n",
"# use hdfs path to read file dataset\n",
"spark= SparkSession.builder.getOrCreate()\n",
"sdf = spark.read.option(\"header\", \"true\").csv(args.file_input)\n",
"sdf.show()\n",
"\n",
"sdf.coalesce(1).write\\\n",
".option(\"header\", \"true\")\\\n",
".mode(\"append\")\\\n",
".csv(args.output_dir)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set up Conda dependency for the following Script Run"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.environment import CondaDependencies\n",
"conda_dep = CondaDependencies()\n",
"conda_dep.add_pip_package(\"azureml-core==1.20.0\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## How to leverage ScriptRunConfig to submit an experiment run to an attached Synapse Spark cluster"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import RunConfiguration\n",
"from azureml.core import ScriptRunConfig \n",
"from azureml.core import Experiment\n",
"\n",
"run_config = RunConfiguration(framework=\"pyspark\")\n",
"run_config.target = synapse_compute_name\n",
"\n",
"run_config.spark.configuration[\"spark.driver.memory\"] = \"1g\" \n",
"run_config.spark.configuration[\"spark.driver.cores\"] = 2 \n",
"run_config.spark.configuration[\"spark.executor.memory\"] = \"1g\" \n",
"run_config.spark.configuration[\"spark.executor.cores\"] = 1 \n",
"run_config.spark.configuration[\"spark.executor.instances\"] = 1 \n",
"\n",
"run_config.environment.python.conda_dependencies = conda_dep\n",
"\n",
"script_run_config = ScriptRunConfig(source_directory = './code',\n",
" script= 'dataprep.py',\n",
" arguments = [\"--tabular_input\", input1, \n",
" \"--file_input\", input2,\n",
" \"--output_dir\", output],\n",
" run_config = run_config) "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Experiment \n",
"exp = Experiment(workspace=ws, name=\"synapse-spark\") \n",
"run = exp.submit(config=script_run_config) \n",
"run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## How to leverage SynapseSparkStep in an AML pipeline to orchestrate data prep step on Synapse Spark and training step on AzureML compute."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Choose a name for your CPU cluster\n",
"cpu_cluster_name = \"cpucluster\"\n",
"\n",
"# Verify that cluster does not exist already\n",
"try:\n",
" cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n",
" print('Found existing cluster, use it.')\n",
"except ComputeTargetException:\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',\n",
" max_nodes=1)\n",
" cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)\n",
"\n",
"cpu_cluster.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile code/train.py\n",
"import glob\n",
"import os\n",
"import sys\n",
"from os import listdir\n",
"from os.path import isfile, join\n",
"\n",
"mypath = os.environ[\"step2_input\"]\n",
"files = [f for f in listdir(mypath) if isfile(join(mypath, f))]\n",
"for file in files:\n",
" with open(join(mypath,file)) as f:\n",
" print(f.read())"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"titanic_tabular_dataset = Dataset.Tabular.from_delimited_files(path=[(def_blob_store, file_name)])\n",
"titanic_file_dataset = Dataset.File.from_files(path=[(def_blob_store, file_name)])\n",
"\n",
"step1_input1 = titanic_tabular_dataset.as_named_input(\"tabular_input\")\n",
"step1_input2 = titanic_file_dataset.as_named_input(\"file_input\").as_hdfs()\n",
"step1_output = HDFSOutputDatasetConfig(destination=(def_blob_store,\"test\")).register_on_complete(name=\"registered_dataset\")\n",
"\n",
"step2_input = step1_output.as_input(\"step2_input\").as_download()\n",
"\n",
"\n",
"from azureml.core.environment import Environment\n",
"env = Environment(name=\"myenv\")\n",
"env.python.conda_dependencies.add_pip_package(\"azureml-core==1.20.0\")\n",
"\n",
"step_1 = SynapseSparkStep(name = 'synapse-spark',\n",
" file = 'dataprep.py',\n",
" source_directory=\"./code\", \n",
" inputs=[step1_input1, step1_input2],\n",
" outputs=[step1_output],\n",
" arguments = [\"--tabular_input\", step1_input1, \n",
" \"--file_input\", step1_input2,\n",
" \"--output_dir\", step1_output],\n",
" compute_target = synapse_compute_name,\n",
" driver_memory = \"7g\",\n",
" driver_cores = 4,\n",
" executor_memory = \"7g\",\n",
" executor_cores = 2,\n",
" num_executors = 1,\n",
" environment = env)\n",
"\n",
"step_2 = PythonScriptStep(script_name=\"train.py\",\n",
" arguments=[step2_input],\n",
" inputs=[step2_input],\n",
" compute_target=cpu_cluster_name,\n",
" source_directory=\"./code\",\n",
" allow_reuse=False)\n",
"\n",
"pipeline = Pipeline(workspace=ws, steps=[step_1, step_2])\n",
"pipeline_run = pipeline.submit('synapse-pipeline', regenerate_outputs=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"authors": [
{
"name": "yunzhan"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
},
"nteract": {
"version": "0.28.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
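The linking and attach cells in the notebook above read their settings from environment variables and fall back to `<my-...>` placeholders. A minimal sketch, assuming a hypothetical helper named `require_setting` (it is not part of the Azure ML SDK), that fails early when a placeholder was never replaced:

```python
import os


def require_setting(env_var, placeholder):
    """Return the value of env_var, or raise if only the placeholder is left.

    Hypothetical helper for illustration; it is not part of azureml-core.
    """
    value = os.getenv(env_var, placeholder)
    # The notebook's placeholders look like "<my-synapse-workspace-name>".
    if value.startswith("<") and value.endswith(">"):
        raise ValueError(
            "Set the {} environment variable or replace {}".format(env_var, placeholder)
        )
    return value
```

It could be used in place of the plain `os.getenv` calls, for example `require_setting("SYNAPSE_WORKSPACE_NAME", "<my-synapse-workspace-name>")`, so that a forgotten setting surfaces before `LinkedService.register` is called.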

View File

@@ -0,0 +1,327 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/azure-arcadia/spark_session_on_synapse_spark_pool.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Interactive Spark Session on Synapse Spark Pool"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Install package"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install -U \"azureml-synapse\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For JupyterLab, please additionally run:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!jupyter lab build --minimize=False"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## PLEASE restart kernel and then refresh web page before starting spark session."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 0. How to leverage Spark Magic for interactive Spark experience"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {
"iopub.execute_input": "2020-06-05T03:22:14.965395Z",
"iopub.status.busy": "2020-06-05T03:22:14.965395Z",
"iopub.status.idle": "2020-06-05T03:22:14.970398Z",
"shell.execute_reply": "2020-06-05T03:22:14.969397Z",
"shell.execute_reply.started": "2020-06-05T03:22:14.965395Z"
}
},
"outputs": [],
"source": [
"# show help\n",
"%synapse ?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Start Synapse Session"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"synapse_compute_name=os.getenv(\"SYNAPSE_COMPUTE_NAME\", \"<my-synapse-compute-name>\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# use Synapse compute linked to the Compute Instance's workspace with an aml envrionment.\n",
"# conda dependencies specified in the environment will be installed before the spark session started.\n",
"\n",
"%synapse start -c $synapse_compute_name -e AzureML-Minimal"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# use Synapse compute from anther workspace via its config file\n",
"\n",
"# %synapse start -c <compute-name> -f config.json"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# use Synapse compute from anther workspace via subscription_id, resource_group and workspace_name\n",
"\n",
"# %synapse start -c <compute-name> -s <subscription-id> -r <resource group> -w <workspace-name>"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# start a spark session with an AML environment, \n",
"# %synapse start -c <compute-name> -s <subscription-id> -r <resource group> -w <workspace-name> -e AzureML-Minimal"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Data prepration\n",
"\n",
"Three types of datastore are supported in synapse spark, and you have two ways to load the data.\n",
"\n",
"\n",
"| Datastore Type | Data Acess |\n",
"|--------------------|-------------------------------|\n",
"| Blob | Credential |\n",
"| Adlsgen1 | Credential & Credential-less |\n",
"| Adlsgen2 | Credential & Credential-less |"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Example 1: Data loading by HDFS path"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Read data from Blob**\n",
"\n",
"```python\n",
"# setup access key or sas token\n",
"\n",
"sc._jsc.hadoopConfiguration().set(\"fs.azure.account.key.<storage account name>.blob.core.windows.net\", \"<acess key>\")\n",
"sc._jsc.hadoopConfiguration().set(\"fs.azure.sas.<container name>.<storage account name>.blob.core.windows.net\", \"sas token\")\n",
"\n",
"df = spark.read.parquet(\"wasbs://<container name>@<storage account name>.blob.core.windows.net/<path>\")\n",
"```\n",
"\n",
"**Read data from Adlsgen1**\n",
"\n",
"```python\n",
"# setup service pricinpal which has access of the data\n",
"# If no data Credential is setup, the user identity will be used to do access control\n",
"\n",
"sc._jsc.hadoopConfiguration().set(\"fs.adl.account.<storage account name>.oauth2.access.token.provider.type\",\"ClientCredential\")\n",
"sc._jsc.hadoopConfiguration().set(\"fs.adl.account.<storage account name>.oauth2.client.id\", \"<client id>\")\n",
"sc._jsc.hadoopConfiguration().set(\"fs.adl.account.<storage account name>.oauth2.credential\", \"<client secret>\")\n",
"sc._jsc.hadoopConfiguration().set(\"fs.adl.account.<storage account name>.oauth2.refresh.url\", \"https://login.microsoftonline.com/<tenant id>/oauth2/token\")\n",
"\n",
"df = spark.read.csv(\"adl://<storage account name>.azuredatalakestore.net/<path>\")\n",
"```\n",
"\n",
"**Read data from Adlsgen2**\n",
"\n",
"```python\n",
"# setup service pricinpal which has access of the data\n",
"# If no data Credential is setup, the user identity will be used to do access control\n",
"\n",
"sc._jsc.hadoopConfiguration().set(\"fs.azure.account.auth.type.<storage account name>.dfs.core.windows.net\",\"OAuth\")\n",
"sc._jsc.hadoopConfiguration().set(\"fs.azure.account.oauth.provider.type.<storage account name>.dfs.core.windows.net\", \"org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider\")\n",
"sc._jsc.hadoopConfiguration().set(\"fs.azure.account.oauth2.client.id.<storage account name>.dfs.core.windows.net\", \"<client id>\")\n",
"sc._jsc.hadoopConfiguration().set(\"fs.azure.account.oauth2.client.secret.<storage account name>.dfs.core.windows.net\", \"<client secret>\")\n",
"sc._jsc.hadoopConfiguration().set(\"fs.azure.account.oauth2.client.endpoint.<storage account name>.dfs.core.windows.net\", \"https://login.microsoftonline.com/<tenant id>/oauth2/token\")\n",
"\n",
"df = spark.read.csv(\"abfss://<container name>@<storage account>.dfs.core.windows.net/<path>\")\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {
"iopub.execute_input": "2020-06-04T08:11:18.812276Z",
"iopub.status.busy": "2020-06-04T08:11:18.812276Z",
"iopub.status.idle": "2020-06-04T08:11:23.854526Z",
"shell.execute_reply": "2020-06-04T08:11:23.853525Z",
"shell.execute_reply.started": "2020-06-04T08:11:18.812276Z"
}
},
"outputs": [],
"source": [
"%%synapse\n",
"\n",
"from pyspark.sql.functions import col, desc\n",
"\n",
"df = spark.read.option(\"header\", \"true\").csv(\"wasbs://demo@dprepdata.blob.core.windows.net/Titanic.csv\")\n",
"df.filter(col('Survived') == 1).groupBy('Age').count().orderBy(desc('count')).show(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Example 2: Data loading by AML Dataset\n",
"\n",
"You can create tabular data by following the [guidance](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-register-datasets) and use to_spark_dataframe() to load the data.\n",
"\n",
"```text\n",
"%%synapse\n",
"\n",
"import azureml.core\n",
"print(azureml.core.VERSION)\n",
"\n",
"from azureml.core import Workspace, Dataset\n",
"ws = Workspace.get(name='<workspace name>', subscription_id='<subscription id>', resource_group='<resource group>')\n",
"ds = Dataset.get_by_name(ws, \"<tabular dataset name>\")\n",
"df = ds.to_spark_dataframe()\n",
"\n",
"# You can do more data transformation on spark dataframe\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Session Metadata\n",
"After session started, you can check the session's metadata, find the links to Synapse portal."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%synapse meta"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Stop Session\n",
"When current session reach the status timeout, dead or any failure, you must explicitly stop it before start new one. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%synapse stop"
]
}
],
"metadata": {
"authors": [
{
"name": "yunzhan"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
},
"nteract": {
"version": "0.28.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
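The "Read data from Adlsgen2" snippet in the notebook above sets five Hadoop properties that all share the same account suffix. A minimal sketch that builds those properties as a dictionary first, so they can be inspected before being applied; the helper names `adls_gen2_oauth_conf` and `apply_hadoop_conf` are hypothetical, and the key names are copied from the notebook snippet:

```python
def adls_gen2_oauth_conf(account, client_id, client_secret, tenant_id):
    """Build the Hadoop settings from the ADLS Gen2 example as a dict.

    Hypothetical helper for illustration; the key names are copied from the
    notebook's "Read data from Adlsgen2" snippet.
    """
    suffix = "{}.dfs.core.windows.net".format(account)
    return {
        "fs.azure.account.auth.type." + suffix: "OAuth",
        "fs.azure.account.oauth.provider.type." + suffix:
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id." + suffix: client_id,
        "fs.azure.account.oauth2.client.secret." + suffix: client_secret,
        "fs.azure.account.oauth2.client.endpoint." + suffix:
            "https://login.microsoftonline.com/{}/oauth2/token".format(tenant_id),
    }


def apply_hadoop_conf(hadoop_conf, settings):
    """Call hadoop_conf.set(key, value) for every entry in settings."""
    for key, value in settings.items():
        hadoop_conf.set(key, value)
```

Inside a `%%synapse` cell, `apply_hadoop_conf(sc._jsc.hadoopConfiguration(), adls_gen2_oauth_conf(...))` would stand in for the five separate `set` calls; the account, client, and tenant values remain yours to supply.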

View File

@@ -0,0 +1,18 @@
from pyspark.sql import SparkSession
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("--input", default="")
parser.add_argument("--output", default="")
args, unparsed = parser.parse_known_args()
spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext
arr = sc._gateway.new_array(sc._jvm.java.lang.String, 2)
arr[0] = args.input
arr[1] = args.output
obj = sc._jvm.WordCount
obj.main(arr)
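
The wrapper script above uses `parse_known_args`, which returns the recognized arguments together with a list of leftovers instead of exiting on unknown flags. A small sketch of that behavior; the `--extra` flag and the file names are made up for illustration:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--input", default="")
parser.add_argument("--output", default="")

# "--extra" is not declared above, so it lands in the leftover list
# rather than causing an error (parse_args would exit here).
args, unparsed = parser.parse_known_args(
    ["--input", "in.txt", "--output", "out", "--extra", "1"]
)
print(args.input, args.output, unparsed)
```

This tolerance may matter when a launcher appends its own arguments to the command line, although the script itself does not say why it was chosen.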

View File

@@ -0,0 +1,6 @@
name: multi-model-register-and-deploy
dependencies:
- pip:
- azureml-sdk
- numpy
- scikit-learn

View File

@@ -0,0 +1,6 @@
name: model-register-and-deploy
dependencies:
- pip:
- azureml-sdk
- numpy
- scikit-learn

View File

@@ -157,7 +157,9 @@
"metadata": {},
"source": [
"## Provision the AKS Cluster\n",
"If you already have an AKS cluster attached to this workspace, skip the step below and provide the name of the cluster."
"If you already have an AKS cluster attached to this workspace, skip the step below and provide the name of the cluster.\n",
"\n",
"> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist."
]
},
{

View File

@@ -0,0 +1,4 @@
name: deploy-aks-with-controlled-rollout
dependencies:
- pip:
- azureml-sdk

View File

@@ -267,7 +267,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create AKS compute if you haven't done so."
"### Create AKS compute if you haven't done so.\n",
"\n",
"> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist."
]
},
{

View File

@@ -0,0 +1,4 @@
name: enable-app-insights-in-production-service
dependencies:
- pip:
- azureml-sdk

View File

@@ -94,6 +94,17 @@ def main():
os.makedirs(output_dir, exist_ok=True)
kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}
# Use Azure Open Datasets for MNIST dataset
datasets.MNIST.resources = [
("https://azureopendatastorage.azurefd.net/mnist/train-images-idx3-ubyte.gz",
"f68b3c2dcbeaaa9fbdd348bbdeb94873"),
("https://azureopendatastorage.azurefd.net/mnist/train-labels-idx1-ubyte.gz",
"d53e105ee54ea40749a09fcbcd1e9432"),
("https://azureopendatastorage.azurefd.net/mnist/t10k-images-idx3-ubyte.gz",
"9fb629c4189551a2d022fa330f9573f3"),
("https://azureopendatastorage.azurefd.net/mnist/t10k-labels-idx1-ubyte.gz",
"ec29112dd5afa0611ce80d1b7f02629c")
]
train_loader = torch.utils.data.DataLoader(
datasets.MNIST('data', train=True, download=True,
transform=transforms.Compose([transforms.ToTensor(),

View File

@@ -0,0 +1,8 @@
name: onnx-convert-aml-deploy-tinyyolo
dependencies:
- pip:
- azureml-sdk
- numpy
- git+https://github.com/apple/coremltools@v2.1
- onnx<1.7.0
- onnxmltools

View File

@@ -0,0 +1,9 @@
name: onnx-inference-facial-expression-recognition-deploy
dependencies:
- pip:
- azureml-sdk
- azureml-widgets
- matplotlib
- numpy
- onnx<1.7.0
- opencv-python-headless

View File

@@ -0,0 +1,9 @@
name: onnx-inference-mnist-deploy
dependencies:
- pip:
- azureml-sdk
- azureml-widgets
- matplotlib
- numpy
- onnx<1.7.0
- opencv-python-headless

View File

@@ -0,0 +1,4 @@
name: onnx-model-register-and-deploy
dependencies:
- pip:
- azureml-sdk

View File

@@ -0,0 +1,4 @@
name: onnx-modelzoo-aml-deploy-resnet50
dependencies:
- pip:
- azureml-sdk

View File

@@ -1,7 +1,5 @@
name: day1-part4-data
name: onnx-train-pytorch-aml-deploy-mnist
dependencies:
- pip:
- azureml-sdk
- azureml-widgets
- pytorch
- torchvision

Some files were not shown because too many files have changed in this diff