| id | title | challengeType | forumTopicId | dashedName |
|---|---|---|---|---|
| 5e46f7f8ac417301a38fb92a | Medical Data Visualizer | 10 | 462368 | medical-data-visualizer |
--description--
You will be working on this project with our Gitpod starter code.
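We are still developing the interactive instructional part of the Python curriculum. For now, you can learn what this project requires from these videos on the freeCodeCamp.org YouTube channel:
- Python for Everybody Video Course (14 hours)
- How to Analyze Data with Python Pandas (10 hours)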
--instructions--
In this project, you will visualize and make calculations from medical examination data using matplotlib, seaborn, and pandas. The dataset values were collected during medical examinations.
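Data description
The rows in the dataset represent patients and the columns represent information such as body measurements, results from various blood tests, and lifestyle choices. You will use the dataset to explore the relationship between cardiac disease, body measurements, blood markers, and lifestyle choices.
File name: medical_examination.csv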
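| Feature | Variable Type | Variable | Value Type |
|---|---|---|---|
| Age | Objective Feature | age | int (days) |
| Height | Objective Feature | height | int (cm) |
| Weight | Objective Feature | weight | float (kg) |
| Gender | Objective Feature | gender | categorical code |
| Systolic blood pressure | Examination Feature | ap_hi | int |
| Diastolic blood pressure | Examination Feature | ap_lo | int |
| Cholesterol | Examination Feature | cholesterol | 1: normal, 2: above normal, 3: well above normal |
| Glucose | Examination Feature | gluc | 1: normal, 2: above normal, 3: well above normal |
| Smoking | Subjective Feature | smoke | binary |
| Alcohol intake | Subjective Feature | alco | binary |
| Physical activity | Subjective Feature | active | binary |
| Presence or absence of cardiovascular disease | Target Variable | cardio | binary |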
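To get a feel for the columns described above, the following minimal sketch loads a few rows with pandas and inspects their types. The sample values are invented for illustration and the rows are read from an in-memory string rather than from medical_examination.csv; only the column names come from the table. The conversion of `age` from days to years uses 365.25 days per year, which is an assumption of this sketch and not part of the project.

```python
import io

import pandas as pd

# Hypothetical sample rows: the values are invented for illustration;
# only the column names are taken from the variable table.
SAMPLE_CSV = """age,height,weight,gender,ap_hi,ap_lo,cholesterol,gluc,smoke,alco,active,cardio
14610,172,68.5,1,120,80,1,1,0,0,1,0
18263,158,82.0,2,140,90,3,1,0,0,1,1
21915,165,70.0,1,130,85,2,2,1,0,0,1
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# age is stored in days; a value in years is easier to read.
# Dividing by 365.25 is an assumption made for this sketch.
age_years = (df["age"] / 365.25).round(1)

print(df.dtypes)           # height parses as an integer, weight as a float
print(age_years.tolist())
```

With the real file, the same inspection would start from `pd.read_csv("medical_examination.csv")`.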
Tasks
Create a chart similar to examples/Figure_1.png, where we show the counts of good and bad outcomes for the cholesterol, gluc, alco, active, and smoke variables for patients with cardio=1 and cardio=0 in different panels.
Use the data to complete the following tasks in medical_data_visualizer.py:
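- Add an `overweight` column to the data. To determine whether a person is overweight, first calculate their BMI by dividing their weight in kilograms by the square of their height in meters. If that value is > 25, the person is overweight. Use the value `0` for NOT overweight and the value `1` for overweight.
- Normalize the data by making `0` always good and `1` always bad. If the value of `cholesterol` or `gluc` is `1`, make the value `0`. If the value is more than `1`, make the value `1`.
- Convert the data into long format and create a chart that shows the value counts of the categorical features using `seaborn`'s `catplot()`. The dataset should be split by `cardio` so there is one chart for each `cardio` value. The chart should look like `examples/Figure_1.png`.
- Clean the data. Filter out the following patient segments that represent incorrect data:
  - diastolic pressure is higher than systolic (keep the correct data with `(df['ap_lo'] <= df['ap_hi'])`)
  - height is less than the 2.5th percentile (keep the correct data with `(df['height'] >= df['height'].quantile(0.025))`)
  - height is more than the 97.5th percentile
  - weight is less than the 2.5th percentile
  - weight is more than the 97.5th percentile
- Create a correlation matrix using the dataset. Plot the correlation matrix using `seaborn`'s `heatmap()`. Mask the upper triangle. The chart should look like `examples/Figure_2.png`.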
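The column-preparation and cleaning tasks above can be sketched on a small toy DataFrame. This is an illustration under stated assumptions, not the reference solution: the rows are invented, and the upper-percentile filters are written by analogy with the two expressions quoted in the task list, since the list does not spell them out.

```python
import pandas as pd

# Toy data with invented values; only the column names come from the dataset.
df = pd.DataFrame({
    "height": [150, 160, 170, 180, 190, 175],
    "weight": [50.0, 80.0, 60.0, 100.0, 70.0, 76.0],
    "ap_hi": [120, 130, 110, 140, 80, 125],
    "ap_lo": [80, 85, 70, 90, 120, 82],
    "cholesterol": [1, 2, 3, 1, 1, 2],
    "gluc": [1, 1, 2, 3, 1, 1],
})

# BMI = weight (kg) / height (m) squared; height is stored in cm.
bmi = df["weight"] / (df["height"] / 100) ** 2
df["overweight"] = (bmi > 25).astype(int)

# Normalize: 1 (normal) becomes 0, anything above 1 becomes 1.
for col in ["cholesterol", "gluc"]:
    df[col] = (df[col] > 1).astype(int)

# Keep only rows that pass every plausibility check. The two upper-bound
# conditions are assumed by analogy with the lower-bound expression.
keep = (
    (df["ap_lo"] <= df["ap_hi"])
    & (df["height"] >= df["height"].quantile(0.025))
    & (df["height"] <= df["height"].quantile(0.975))
    & (df["weight"] >= df["weight"].quantile(0.025))
    & (df["weight"] <= df["weight"].quantile(0.975))
)
df_clean = df[keep]

print(df[["overweight", "cholesterol", "gluc"]])
print(df_clean.index.tolist())
```

On a table this small the percentile filters always drop the extreme rows; on the full dataset they trim only the tails.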
Any time a variable is set to None, make sure to set it to the correct code.
Unit tests are written for you under test_module.py.
Instructions
By each number in the medical_data_visualizer.py file, add the code from the associated instruction number below.
1. Import the data from `medical_examination.csv` and assign it to the `df` variable.
2. Create the `overweight` column in the `df` variable.
3. Normalize data by making `0` always good and `1` always bad. If the value of `cholesterol` or `gluc` is `1`, set the value to `0`. If the value is more than `1`, set the value to `1`.
4. Draw the Categorical Plot in the `draw_cat_plot` function.
5. Create a DataFrame for the cat plot using `pd.melt` with values from `cholesterol`, `gluc`, `smoke`, `alco`, `active`, and `overweight` in the `df_cat` variable.
6. Group and reformat the data in `df_cat` to split it by `cardio`. Show the counts of each feature. You will have to rename one of the columns for the `catplot` to work correctly.
7. Convert the data into `long` format and create a chart that shows the value counts of the categorical features using the following method provided by the seaborn library import: `sns.catplot()`.
8. Get the figure for the output and store it in the `fig` variable.
9. Do not modify the next two lines.
10. Draw the Heat Map in the `draw_heat_map` function.
11. Clean the data in the `df_heat` variable by filtering out the following patient segments that represent incorrect data:
    - height is less than the 2.5th percentile (keep the correct data with `(df['height'] >= df['height'].quantile(0.025))`)
    - height is more than the 97.5th percentile
    - weight is less than the 2.5th percentile
    - weight is more than the 97.5th percentile
12. Calculate the correlation matrix and store it in the `corr` variable.
13. Generate a mask for the upper triangle and store it in the `mask` variable.
14. Set up the `matplotlib` figure.
15. Plot the correlation matrix using the method provided by the `seaborn` library import: `sns.heatmap()`.
16. Do not modify the next two lines.
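The reshaping and masking steps in the list above can be sketched with pandas and NumPy alone. The toy rows below are invented, only two of the six categorical features are used, and the count column name `total` is an assumption of this sketch; check test_module.py for the label the tests actually expect. The plotting calls are shown as comments so that the block runs without a display.

```python
import numpy as np
import pandas as pd

# Toy data with invented values; the real frame has more columns and rows.
df = pd.DataFrame({
    "cardio":      [0, 0, 1, 1, 1],
    "cholesterol": [0, 1, 1, 1, 0],
    "smoke":       [0, 0, 1, 0, 1],
})

# Wide -> long: one row per (patient, feature) pair, keeping cardio as the id.
df_cat = pd.melt(df, id_vars=["cardio"], value_vars=["cholesterol", "smoke"])

# Count every (cardio, variable, value) combination that occurs.
# Naming the count column "total" is an assumption of this sketch.
df_counts = (
    df_cat.groupby(["cardio", "variable", "value"])
    .size()
    .reset_index(name="total")
)
# A frame shaped like this could then be plotted with, for example:
# sns.catplot(data=df_counts, x="variable", y="total",
#             hue="value", col="cardio", kind="bar")

# Correlation matrix and a boolean mask that is True on and above the diagonal.
corr = df.corr()
mask = np.triu(np.ones(corr.shape, dtype=bool))
# The masked cells would be hidden by: sns.heatmap(corr, mask=mask, annot=True)

print(df_counts)
print(mask)
```

`np.triu` keeps the diagonal in the mask, so the self-correlations are hidden along with the upper triangle; passing `k=1` would leave the diagonal visible instead.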
Development
Write your code in medical_data_visualizer.py. For development, you can use main.py to test your code.
Testing
The unit tests for this project are in test_module.py. We imported the tests from test_module.py to main.py for your convenience.
Submitting
Copy your project's URL and submit it to freeCodeCamp.
--hints--
It should pass all Python tests.
--solutions--
# Python challenges don't need solutions,
# because they would need to be tested against a full working project.
# Please check our contributing guidelines to learn more.