Tasks
Download the following dataset:
mpg-new.xlsx
This dataset is a subset of the fuel economy data provided by the EPA, accessible through fueleconomy.gov. It comprises 38 popular car models spanning from 1999 to 2008. Each entry includes detailed information about specific car models, including manufacturer, displacement, city MPG, highway MPG, and more.
Use Excel to compute and analyze descriptive statistics for the variables.
Answer the following questions.
1. Data Understanding:
Open the mpg dataset in Excel and answer the following questions:
- What does each observation represent?
- How many variables are there?
- Which data attributes are categorical, and which are numeric?
2. Data Preprocessing:
- Check for duplicate and missing data. Are there any duplicate rows? Are there any missing values?
- Propose solutions for handling missing data.
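
The assignment itself is to be done in Excel; as an optional cross-check, the duplicate and missing-value checks above can be sketched in Python. This is a minimal sketch under stated assumptions: the rows below are hypothetical stand-ins for the spreadsheet (not values from mpg-new.xlsx), `None` is assumed to mark an empty cell, and the helper names `find_duplicates` and `count_missing` are made up for illustration.

```python
# Hypothetical rows standing in for the spreadsheet; None marks an empty cell.
rows = [
    {"manufacturer": "audi", "model": "a4", "year": 1999, "cty": 18, "hwy": 29},
    {"manufacturer": "audi", "model": "a4", "year": 1999, "cty": 18, "hwy": 29},
    {"manufacturer": "ford", "model": "mustang", "year": 2008, "cty": None, "hwy": 24},
]

def find_duplicates(rows):
    """Return the indices of rows that exactly repeat an earlier row."""
    seen = set()
    dupes = []
    for i, row in enumerate(rows):
        key = tuple(sorted(row.items()))  # order-independent fingerprint of the row
        if key in seen:
            dupes.append(i)
        else:
            seen.add(key)
    return dupes

def count_missing(rows):
    """Count empty (None) cells per column; columns with none missing are omitted."""
    counts = {}
    for row in rows:
        for col, value in row.items():
            if value is None:
                counts[col] = counts.get(col, 0) + 1
    return counts

print(find_duplicates(rows))  # indices of repeated rows
print(count_missing(rows))    # missing cells per column
```

Whether a repeated row is an error or a legitimate second entry depends on the data, so a flagged index is a prompt to inspect the row, not a reason to delete it automatically.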
3. Data Enrichment:
- Create a new variable called "mpg" that represents the average of city ("cty") and highway ("hwy") miles per gallon.
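
As a sanity check on the new column, the averaging step can be sketched in Python. The function name `add_mpg` and the sample values are hypothetical; only the column names "cty", "hwy", and "mpg" come from the task.

```python
def add_mpg(row):
    """Return a copy of the row with mpg = (cty + hwy) / 2."""
    enriched = dict(row)
    enriched["mpg"] = (row["cty"] + row["hwy"]) / 2
    return enriched

car = {"cty": 18, "hwy": 29}  # hypothetical values, not taken from the dataset
print(add_mpg(car)["mpg"])    # 23.5
```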
4. Understanding numerical variables:
- Calculate and present descriptive statistics (mean, median, range, variance, and standard deviation) for the numeric variable "mpg".
- Compare and explain the mean and standard deviation.
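
The statistics requested above can be cross-checked outside Excel with Python's standard `statistics` module. This is a sketch, not the required method: the function name `describe` and the sample values are hypothetical, and it assumes the sample (n - 1) versions of variance and standard deviation, which is what Excel's VAR.S and STDEV.S compute. If the population versions are wanted, `statistics.pvariance` and `statistics.pstdev` would be used instead.

```python
import statistics

def describe(values):
    """Mean, median, range, sample variance and sample standard deviation."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # sample variance, divides by n - 1
        "std_dev": statistics.stdev(values),      # square root of the sample variance
    }

mpg = [23.5, 25.0, 24.0, 21.0, 17.5]  # hypothetical mpg values
for name, value in describe(mpg).items():
    print(f"{name}: {value:.3f}")
```

The standard deviation is in the same units as the mean (miles per gallon), which is why the two are compared directly, while the variance is in squared units.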
5. Understanding categorical variables:
- What are the unique values of drive train type ("drv")? What is the mode of the "drv" variable?
- Create bar plots to illustrate the distribution of "drv".
- Compare the distribution of "drv" in 1999 and 2008. Summarize the difference.
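
The counts behind the bar plots, and the mode, can be sketched with `collections.Counter`. The records below are hypothetical (year, drv) pairs, and the drive train codes in them are placeholder labels, not a statement of what the dataset contains; the helper names are also made up for illustration.

```python
from collections import Counter

# Hypothetical (year, drv) pairs; the drv codes are placeholder labels.
records = [
    (1999, "f"), (1999, "4"), (1999, "f"),
    (2008, "4"), (2008, "4"), (2008, "r"),
]

def mode(values):
    """Most frequent value; ties resolve to the value seen first."""
    return Counter(values).most_common(1)[0][0]

def drv_counts_by_year(records):
    """Map each year to a Counter of drive train types."""
    counts = {}
    for year, drv in records:
        counts.setdefault(year, Counter())[drv] += 1
    return counts

print(mode(drv for _, drv in records))
for year, counter in sorted(drv_counts_by_year(records).items()):
    print(year, dict(counter))
```

Because the two years may contain different numbers of cars, comparing proportions within each year is usually fairer than comparing raw counts.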
6. Box Plots for numeric variables:
- Use a box plot to show the summary distribution of the numeric variable "mpg".
- Report the key statistics (Q1, median, Q3, max, min) displayed in the box plot.
- What is the "mpg" range of the middle 50% of cars in the dataset?
- Box plots by year: Use a box plot to show the distribution of the "mpg" variable in 1999 and 2008. Summarize the difference.
- Box plots by class: Use a box plot to show the distribution of the "mpg" variable in different classes. Summarize the difference.
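
The five numbers a box plot displays, and the range of the middle 50% (Q1 to Q3, whose width is the interquartile range), can be sketched as follows. The function name `five_number_summary` and the sample values are hypothetical. The sketch assumes the "inclusive" quartile convention of `statistics.quantiles`; Excel's box plot may use a different quartile convention, so quartiles read off the chart can differ slightly from these.

```python
import statistics

def five_number_summary(values):
    """Min, Q1, median, Q3, max, plus the interquartile range (Q3 - Q1)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "iqr": q3 - q1,  # width of the middle 50% of the data
    }

mpg = [23.5, 25.0, 24.0, 21.0, 17.5, 19.0, 30.5]  # hypothetical mpg values
summary = five_number_summary(mpg)
print(summary)
print(f"middle 50% lies between {summary['q1']} and {summary['q3']}")
```

To compare years or classes, the same function can be applied to each group's values separately.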
7. Histogram for numeric variables:
- Use a histogram to show the detailed distribution of the numeric variable "mpg".
- Explore different bin widths and discuss what a proper bin width is.
- With a bin width of 4, how many models fall into the common range
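
To see how the bin width changes the picture, the counting behind a histogram can be sketched in Python. The function name `bin_counts` and the sample values are hypothetical. The sketch assumes bins of the form [lower, lower + width), with the first bin aligned to a multiple of the width; Excel may place values that sit exactly on a bin edge differently, so counts for edge values can differ.

```python
from collections import Counter

def bin_counts(values, width, start=None):
    """Count values per bin [lower, lower + width), keyed by each bin's lower edge."""
    if start is None:
        start = min(values) // width * width  # align the first bin to a multiple of width
    counts = Counter()
    for v in values:
        lower = start + ((v - start) // width) * width
        counts[lower] += 1
    return dict(sorted(counts.items()))

mpg = [12.5, 14, 16, 17.5, 19.9, 20, 23.5]  # hypothetical mpg values
for width in (2, 4, 8):
    print(width, bin_counts(mpg, width))

counts = bin_counts(mpg, 4)
fullest = max(counts, key=counts.get)  # lower edge of the most populated bin
print(f"most populated bin starts at {fullest} with {counts[fullest]} values")
```

Very narrow bins tend to show noise and very wide bins hide the shape, which is the trade-off the bin width discussion is about.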