mirror of https://github.com/freeCodeCamp/freeCodeCamp.git synced 2026-04-12 10:00:39 -04:00

Files

freeCodeCamp's Camper Bot cc87f4455d chore(i18n,learn): processed translations (#54077 )

Co-authored-by: Naomi Carrigan <nhcarrigan@gmail.com>

2024-03-25 16:31:40 +00:00

6.7 KiB

Raw Blame History

id, title, challengeType, forumTopicId, dashedName

id	title	challengeType	forumTopicId	dashedName
5e46f7f8ac417301a38fb92a	医疗数据可视化工具	10	462368	medical-data-visualizer

--description--

You will be working on this project with our Gitpod starter code.

我们仍在开发 Python 课程的交互式教学部分。目前，你可以在 YouTube 上通过 freeCodeCamp.org 上传的一些视频学习这个项目相关的知识。

每个人视频课程的 Python (14小时)
如何使用 Python Pandas 分析数据（10 小时）

--instructions--

In this project, you will visualize and make calculations from medical examination data using matplotlib, seaborn, and pandas. 数据集的数值是从体检中收集的。

数据说明

数据集中的行代表患者，列代表身体测量、各种血液检查的结果和生活方式等信息。您将使用该数据集来探索心脏病、身体测量数据、血液标志物和对生活方式的选择之间的关系。

文件名：medical_examination.csv

项目	变量类型	变量名	变量值类型
年龄	客观特征	`age`	int (days)
身高	客观特征	`height`	int (cm)
体重	客观特征	`weight`	float (kg)
性别	客观特征	`gender`	分类编码
收缩压	检测特征	`ap_hi`	int
舒张压	检测特征	`ap_lo`	int
胆固醇	检测特征	`cholesterol`	1：正常，2：高于正常，3：远远高于正常值
血糖值	检测特征	`gluc`	1：正常，2：高于正常，3：远远高于正常值
吸烟问题	主观特征	`smoke`	binary
饮酒量	主观特征	`alco`	binary
体育活动	主观特征	`active`	binary
是否有心血管疾病	目标变量	`cardio`	binary

任务

Create a chart similar to examples/Figure_1.png, where we show the counts of good and bad outcomes for the cholesterol, gluc, alco, active, and smoke variables for patients with cardio=1 and cardio=0 in different panels.

在 medical_data_visualizer.py 中使用数据完成以下任务：

给数据添加一列 overweight。要确定一个人是否超重，首先通过将他们的体重（公斤）除以他们的身高（米）的平方来计算他们的 BMI。如果该值是 > 25，则此人超重。 Use the value 0 for NOT overweight and the value 1 for overweight.
Normalize the data by making 0 always good and 1 always bad. If the value of cholesterol or gluc is 1, make the value 0. If the value is more than 1, make the value 1.
Convert the data into long format and create a chart that shows the value counts of the categorical features using seaborn's catplot(). The dataset should be split by Cardio so there is one chart for each cardio value. 该图表应该看起来像 examples/Figure_1.png。
清理数据。过滤掉以下代表不正确数据的患者段：
- 舒张压高于收缩压（使用 (df['ap_lo'] <= df['ap_hi']) 保留正确的数据）
- 高度小于第 2.5 个百分位数（使用 (df['height'] >= df['height'].quantile(0.025)) 保留正确的数据）
- 身高超过第 97.5 个百分位
- 体重小于第 2.5 个百分位
- 体重超过第 97.5 个百分位
使用数据集创建相关矩阵。 Plot the correlation matrix using seaborn's heatmap(). 遮罩上三角。该图表应类似于 examples/Figure_2.png。

每当变量设置为 None 时，请确保将其设置为正确的代码。

Unit tests are written for you under test_module.py.

Instructions

By each number in the medical_data_visualizer.py file, add the code from the associated instruction number below.

Import the data from medical_examination.csv and assign it to the df variable
Create the overweight column in the df variable
Normalize data by making 0 always good and 1 always bad. If the value of cholesterol or gluc is 1, set the value to 0. If the value is more than 1, set the value to 1.
Draw the Categorical Plot in the draw_cat_plot function
Create a DataFrame for the cat plot using pd.melt with values from cholesterol, gluc, smoke, alco, active, and overweight in the df_cat variable.
Group and reformat the data in df_cat to split it by cardio. Show the counts of each feature. You will have to rename one of the columns for the catplot to work correctly.
Convert the data into long format and create a chart that shows the value counts of the categorical features using the following method provided by the seaborn library import : sns.catplot()
Get the figure for the output and store it in the fig variable
Do not modify the next two lines
Draw the Heat Map in the draw_heat_map function
Clean the data in the df_heat variable by filtering out the following patient segments that represent incorrect data:
- height is less than the 2.5th percentile (Keep the correct data with (df['height'] >= df['height'].quantile(0.025)))
- height is more than the 97.5th percentile
- weight is less than the 2.5th percentile
- weight is more than the 97.5th percentile
Calculate the correlation matrix and store it in the corr variable
Generate a mask for the upper triangle and store it in the mask variable
Set up the matplotlib figure
Plot the correlation matrix using the method provided by the seaborn library import: sns.heatmap()
Do not modify the next two lines

开发

Write your code in medical_data_visualizer.py. For development, you can use main.py to test your code.

测试

The unit tests for this project are in test_module.py. 为了你的方便，我们将测试从 test_module.py 导入到 main.py。

提交

复制项目的 URL 并将其提交给 freeCodeCamp。

--hints--

它应该通过所有的 Python 测试。

--solutions--

  # Python challenges don't need solutions,
  # because they would need to be tested against a full working project.
  # Please check our contributing guidelines to learn more.

6.7 KiB Raw Blame History Unescape Escape