Hi! Can you create functions to visualize splines and smoothing on the female RAVDESS dataset? Consider the data for your gender in the RAVDESS dataset.
Explore the data and represent it as functions, choosing a suitable basis. Try to find the ‘right smoothing’ to fit your data on facial videos from the RAVDESS dataset. I’ll send you the two lectures you can check and follow to create the functions on the female dataset. PLEASE COMMENT ALL THE STEPS. Thank you very much!
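A minimal sketch of both steps, assuming the facial videos have already been reduced to one numeric signal per video (for example, the vertical position of one facial landmark across frames). The RAVDESS files are not included here, so the code simulates a hypothetical trajectory `y` over `frame`; replace both with your own columns. It uses only base R (`splines::bs()` and `stats::smooth.spline()`), which may differ from the tools in the lectures: first a cubic B-spline basis of increasing size fit by least squares and scored by generalized cross-validation (GCV), then a penalized smoothing spline whose smoothing parameter is chosen by GCV.

```r
# Splines and smoothing on one facial signal (sketch; data are SIMULATED, not RAVDESS)
library(splines)  # bs(): B-spline basis matrices (ships with base R)

set.seed(42)
# Hypothetical signal: one landmark coordinate tracked over 120 video frames
frame <- 1:120
y <- sin(frame / 15) + 0.5 * cos(frame / 6) + rnorm(length(frame), sd = 0.2)

# Step 1: represent the data with K cubic B-spline basis functions.
# For a least-squares fit with p coefficients, GCV = n * RSS / (n - p)^2.
gcv_bspline <- function(K) {
  fit <- lm(y ~ bs(frame, df = K, degree = 3))  # K basis columns plus an intercept
  n <- length(y)
  p <- length(coef(fit))                        # number of fitted coefficients
  n * sum(residuals(fit)^2) / (n - p)^2
}
K_grid <- 4:30
gcv_values <- sapply(K_grid, gcv_bspline)
K_best <- K_grid[which.min(gcv_values)]         # basis size with the lowest GCV score
fit_bs <- lm(y ~ bs(frame, df = K_best, degree = 3))

# Step 2: penalized smoothing spline; cv = FALSE selects the smoothing parameter by GCV
fit_ss <- smooth.spline(frame, y, cv = FALSE)

# Step 3: plot the GCV curve, then the raw data with both fitted curves
par(mfrow = c(1, 2))
plot(K_grid, gcv_values, type = "b", xlab = "Number of B-spline basis functions (K)",
     ylab = "GCV score", main = "Choosing the basis size")
abline(v = K_best, lty = 2)
plot(frame, y, pch = 16, col = "grey60", xlab = "Frame", ylab = "Landmark position",
     main = "Raw data and fitted curves")
lines(frame, fitted(fit_bs), col = "blue", lwd = 2)
lines(predict(fit_ss, frame), col = "red", lwd = 2)
legend("topright", legend = c(paste("B-spline, K =", K_best), "Smoothing spline (GCV)"),
       col = c("blue", "red"), lwd = 2, bty = "n")
par(mfrow = c(1, 1))
```

Too small a K (or too large a smoothing parameter) flattens the expression dynamics; too large a K chases frame-to-frame noise. GCV is one common way to pick the ‘right smoothing’ between those extremes, and `fit_ss$df` reports the effective degrees of freedom it settled on.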
Category: R
-
Visualizing Splines and Smoothing on the Female RAVDESS Dataset
Title: “Exploring Facial Expressions through Splines and Smoothing on the Female RAVDESS Dataset”
In this assignment, we will be exploring the facial
-
Title: Exploring Associations and Correlations in a Dataset
Choose one dataset, then select two qualitative variables and two quantitative variables. Explain why you selected these variables.
Analysis: For your qualitative variables, create a contingency table and calculate the association between them. For your quantitative variables, calculate the correlation between them, and include a scatter plot to represent this relationship visually.
Interpretation: Explain your findings. What does the association or correlation say about the relationship between your variables? Is the relationship strong, weak, positive, negative, or nonexistent?
Reflection: Reflect on the importance of understanding associations and correlations in data analysis and how they can guide further data investigation.
Submission Format: Your submission should not exceed 500-600 words. Submit your assignment in APA format as a Word document or a PDF file. Include both your written analysis and any visualizations or tables that support your findings. If you use any software for your calculations (like R, Python, Excel, or RapidMiner), please include your code or formulas as well.
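A minimal sketch of the Analysis step, assuming R's built-in `mtcars` data as a stand-in for whichever dataset you choose: `am` (transmission) and `vs` (engine shape) play the two qualitative variables, and `wt` (weight) and `mpg` the two quantitative ones. Association is measured here with a chi-squared test of independence and Cramér's V, which is one common choice among several.

```r
# Association and correlation sketch using the built-in mtcars data as a stand-in
data(mtcars)

# Qualitative variables: turn the 0/1 codes into labeled factors
transmission <- factor(mtcars$am, levels = c(0, 1), labels = c("automatic", "manual"))
engine <- factor(mtcars$vs, levels = c(0, 1), labels = c("V-shaped", "straight"))

# Contingency table with row and column totals
tab <- table(transmission, engine)
print(addmargins(tab))

# Chi-squared test of independence; correct = FALSE gives the plain Pearson statistic
chi <- chisq.test(tab, correct = FALSE)
print(chi)

# Cramer's V = sqrt(chi^2 / (n * (min(rows, cols) - 1))), from 0 (no association) to 1
n <- sum(tab)
cramers_v <- sqrt(unname(chi$statistic) / (n * (min(dim(tab)) - 1)))
cat("Cramer's V:", round(cramers_v, 3), "\n")

# Quantitative variables: Pearson correlation with a test and a 95% CI
ct <- cor.test(mtcars$wt, mtcars$mpg, method = "pearson")
print(ct)

# Scatter plot with a least-squares line to show direction and strength
plot(mtcars$wt, mtcars$mpg, pch = 16, xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = sprintf("mpg vs weight (r = %.2f)", ct$estimate))
abline(lm(mpg ~ wt, data = mtcars), col = "red", lwd = 2)
```

The sign of `r` gives the direction and its magnitude the strength of the linear relationship; Cramér's V has no sign, so the direction of a qualitative association has to be read off the contingency table itself.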
-
Title: Comparison of Multiple Groups via ANOVA: A Study on Nutrition, Age, and Alcohol Consumption
Comparison of Multiple Groups via ANOVA
1) Download the Nutrition study data and read it into RStudio. We will work with the entire data set for this assignment. Use the ifelse() function to create 2 new categorical variables, defined as follows:
Age_Cat = 1 if Age <= 19
          2 if 20 <= Age <= 29
          3 if 30 <= Age <= 39
          4 if 40 <= Age <= 49
          5 if 50 <= Age <= 59
          6 if 60 <= Age <= 69
          7 if Age >= 70
and Alcohol_Cat = 0 if Alcohol = 0
1 if 0=10
If you have trouble using the ifelse() function in R, you could create these new categorical variables in Excel and then read them into R with the dataset. It works either way.
Report the counts for each value of these 2 new categorical variables.
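A minimal sketch of step 1 for `Age_Cat`, assuming a data frame named `nutrition` with a whole-number `Age` column; the ages below are simulated and hypothetical, so swap in the real Nutrition study file. `Alcohol_Cat` follows the same nested `ifelse()` pattern with its own cut points.

```r
# Sketch: build Age_Cat with nested ifelse() and report its counts.
# HYPOTHETICAL data frame with simulated ages; replace with the Nutrition study data.
set.seed(1)
nutrition <- data.frame(Age = sample(18:85, 200, replace = TRUE))

# An element falls through to the next test only when all earlier tests were FALSE,
# so "Age <= 29" here already implies Age >= 20 (assumes whole-number ages)
nutrition$Age_Cat <- ifelse(nutrition$Age <= 19, 1,
                     ifelse(nutrition$Age <= 29, 2,
                     ifelse(nutrition$Age <= 39, 3,
                     ifelse(nutrition$Age <= 49, 4,
                     ifelse(nutrition$Age <= 59, 5,
                     ifelse(nutrition$Age <= 69, 6, 7))))))

# Equivalent alternative with cut(): right = TRUE makes each interval (a, b]
age_cat_cut <- cut(nutrition$Age, breaks = c(-Inf, 19, 29, 39, 49, 59, 69, Inf),
                   labels = 1:7, right = TRUE)

# Report the counts for each level
print(table(nutrition$Age_Cat))
```

`cut()` is less error-prone than deeply nested `ifelse()` calls once there are more than two or three categories, and it returns a factor, which is what `aov()` expects for a grouping variable.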
2) Using the variable Quetelet as the dependent response variable (Y), specify the null and alternative hypotheses and conduct a one-way ANOVA F-test to check for mean differences across the levels of the Age_Cat variable, and a separate ANOVA for the Alcohol_Cat variable. Interpret the two hypothesis tests. What do you conclude? If you have a statistically significant result at the alpha = 0.05 level, you must follow up the significant ANOVA with a post hoc analysis. At this point, use 95% confidence intervals for each group to determine whether there are group mean differences and where they occur. Discuss your findings.
3) Now, using the Calories variable as the dependent response variable (Y), conduct similar ANOVA hypothesis tests and obtain confidence intervals for each group to determine whether there are group mean differences relative to Age_Cat and Alcohol_Cat. You will need to clearly set up the null and alternative hypotheses, conduct the test with appropriate statistics, and interpret the individual group confidence intervals.
4) For the FAT, FIBER, and CHOLESTEROL variables, use a 95% confidence interval approach to compare groups, on average, for Age_Cat and Alcohol_Cat. Interpret the confidence intervals. Use whatever outside information you can obtain to help interpret the results.
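A minimal sketch of steps 2 to 4, assuming a data frame `nutrition` with a numeric response and a factor grouping column. The Quetelet values and the four age groups below are simulated and hypothetical; with the real data, replace `Quetelet` with `Calories`, `Fat`, and so on, and `Age_Cat` with `Alcohol_Cat` as each step requires.

```r
# One-way ANOVA sketch with per-group 95% confidence intervals.
# HYPOTHETICAL simulated data; use the real Quetelet and Age_Cat columns instead.
set.seed(2)
nutrition <- data.frame(
  Age_Cat  = factor(rep(1:4, times = c(30, 40, 35, 25))),
  Quetelet = c(rnorm(30, 24, 3), rnorm(40, 25, 3), rnorm(35, 26, 3), rnorm(25, 27, 3))
)

# H0: all group means are equal; H1: at least one group mean differs
fit <- aov(Quetelet ~ Age_Cat, data = nutrition)
print(summary(fit))  # F statistic and p-value
# A formal post hoc alternative to comparing intervals: TukeyHSD(fit)

# Per-group 95% CI: mean +/- t(0.975, n - 1) * sd / sqrt(n)
group_ci <- function(y) {
  n <- length(y)
  m <- mean(y)
  half <- qt(0.975, df = n - 1) * sd(y) / sqrt(n)
  c(n = n, mean = m, lower = m - half, upper = m + half)
}
ci_table <- t(sapply(split(nutrition$Quetelet, nutrition$Age_Cat), group_ci))
print(round(ci_table, 2))
# Non-overlapping intervals point to where the group means differ; overlapping
# intervals do not by themselves rule out a difference.
```

Each interval uses only its own group's standard deviation, which matches the "individual group confidence intervals" the assignment asks for; the ANOVA itself pools the variance across all groups.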
5) With the results from this additional analysis, how has the story description from Modeling Assignment #3 changed? You are welcome to bring in information from your prior knowledge and experience to embellish this story. Is the analysis sufficient so far for your story, or is something missing? What should be done next? Write up your synthesis of what this data set seems to be saying (up to this point) and where we should go from here.
-
Title: Predicting Used Car Prices with Neural Networks: A Study on the Impact of Hidden Layers and Nodes
Car Sales. Consider the data on used cars (mlba::ToyotaCorolla) with 1436 records and details on 38 variables, including Price, Age, KM, HP, and other specifications. The goal is to predict the price of a used Toyota Corolla based on its specifications.
Use predictors Age_08_04, KM, Fuel_Type, HP, Automatic, Doors, Quarterly_Tax, Mfr_Guarantee, Guarantee_Period, Airco, Automatic_airco, CD_Player, Powered_Windows, Sport_Model, and Tow_Bar.
To ensure everyone gets the same results, use the following code to convert categorical predictors to dummies, create training and holdout data sets, and normalize both sets. Note that the holdout set is normalized using the ranges computed from the training set.
# load packages: dplyr for %>% and mutate(), caret for createDataPartition() and preProcess()
library(dplyr)
library(caret)

# load the data and preprocess: add 0/1 dummies for the CNG and Diesel fuel types
toyota.df <- mlba::ToyotaCorolla %>%
  mutate(
    Fuel_Type_CNG = ifelse(Fuel_Type == "CNG", 1, 0),
    Fuel_Type_Diesel = ifelse(Fuel_Type == "Diesel", 1, 0)
  )

# partition: 60% training, 40% holdout
set.seed(1)
idx <- createDataPartition(toyota.df$Price, p = 0.6, list = FALSE)
train.df <- toyota.df[idx, ]
holdout.df <- toyota.df[-idx, ]

# Normalize the dataset. Use the training set to determine the normalization (min/max).
normalizer <- preProcess(train.df, method = "range")
train.norm.df <- predict(normalizer, train.df)
holdout.norm.df <- predict(normalizer, holdout.df)

Fit a neural network model to the data. Use a single hidden layer with two nodes. Record the RMS error for the training data and the holdout data. Repeat the process, changing the number of hidden layers and nodes to a single layer with 5 nodes, and then two layers with 5 nodes in each layer. What happens to the RMS error for the training data as the number of layers and nodes increases? What happens to the RMS error for the holdout data? Comment on the appropriate number of layers and nodes for this application.
-
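A minimal sketch of the RMS-error comparison, with two assumptions stated up front. First, it uses simulated stand-in data rather than `mlba::ToyotaCorolla`, and a simple random 60/40 split instead of `caret::createDataPartition()`. Second, it uses the `nnet` package (a recommended package included with most R installations), whose `nnet()` supports only one hidden layer; the two-layer configuration would need another package, such as `neuralnet`, where `hidden = c(5, 5)` requests two hidden layers of 5 nodes. The `rmse()` helper is the part that carries over directly: compute it on training and holdout predictions for each configuration.

```r
# Training vs holdout RMS error as the single hidden layer grows (nnet sketch).
# SIMULATED stand-in data: predictors scaled to [0, 1] like the normalized Toyota sets.
library(nnet)

set.seed(1)
n <- 600
sim <- data.frame(Age = runif(n), KM = runif(n), HP = runif(n))
sim$Price <- 0.8 - 0.5 * sim$Age - 0.2 * sim$KM + 0.1 * sim$HP^2 + rnorm(n, sd = 0.05)

# Simple random 60/40 split
idx <- sample(n, size = 0.6 * n)
train <- sim[idx, ]
holdout <- sim[-idx, ]

# Root-mean-square error between observed and predicted values
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

results <- data.frame(nodes = c(2, 5), train_rmse = NA, holdout_rmse = NA)
for (i in seq_len(nrow(results))) {
  fit <- nnet(Price ~ Age + KM + HP, data = train, size = results$nodes[i],
              linout = TRUE,   # linear output unit, needed for regression
              maxit = 500, trace = FALSE)
  results$train_rmse[i] <- rmse(train$Price, predict(fit, train))
  results$holdout_rmse[i] <- rmse(holdout$Price, predict(fit, holdout))
}
print(results)
```

Training RMSE usually falls as nodes are added because the model gains flexibility; the holdout RMSE is the one to watch, since a rise there while training RMSE keeps falling signals overfitting.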
Exploring Relationships in the Nutrition Study Data
1) Download the Nutrition study data and read it into RStudio. We will work with the entire data set for this assignment. Use the ifelse() function to create 2 new categorical variables, defined as:
Alcohol_Use: 1 (yes) if Alcohol > 0
             0 (no) if Alcohol = 0
Age_retired: 1 if Age >= 65
             0 if Age < 65
If you have trouble using the ifelse() function in R, you could create these new categorical variables in Excel and then read them into R with the dataset. It works either way. Report the counts for each value of these 2 new categorical variables.
2) For this problem, we are going to see whether smoking (SMOKE) is related to body mass (QUETELET). Here, Quetelet is the continuous dependent response variable (Y) and Smoke (X) is the categorical explanatory variable. Please complete the following:
a) Obtain descriptive statistics on Y for each group. In a table, report each group's sample size, mean, standard deviation, and variance.
b) Clearly state the null and alternative hypotheses in words and symbols.
c) Use R to obtain the test statistic and p-value for the classic pooled-variance two-sample t-test. Report the test statistic and p-value, and then state the decision to be made.
d) Report the formula for the test statistic in part c) and verify the computer's computations using the descriptive statistics from part a).
e) Calculate and report confidence intervals for both groups. Discuss the interpretation of the result based on the confidence intervals. Is it consistent with the hypothesis test result? If they differ, which should you believe?
3) Moving into a more data-analytic framework, the next question is whether any 2-group categorical variables exhibit differences relative to the Quetelet variable. Framed as an assignment: using the variable Quetelet as the dependent response variable (Y), conduct hypothesis tests and obtain confidence intervals (for each group) to determine whether there are group mean differences relative to the categorical variables Gender (male vs. female), Age_retired, and Alcohol_Use. You will need to clearly set up the null and alternative hypotheses, conduct the test with appropriate statistics, and interpret the individual group confidence intervals.
Please use tables to summarize your findings. What decisions do you make from these results? How would you summarize the "story" that emerges from these analyses on the body mass (Quetelet) variable?
4) Using the CHOLESTEROL variable as the dependent response variable (Y), conduct hypothesis tests and obtain confidence intervals (for each group) to determine whether there are group mean differences relative to Gender (male vs. female), Smoke, Age_retired, and Alcohol_Use. You will need to clearly set up the null and alternative hypotheses, conduct the test with appropriate statistics, and interpret the individual group confidence intervals. How would you summarize the "story" that emerges from these analyses on the CHOLESTEROL variable?
5) Typically, in an open-ended data-analytic project, the analyst would check whether any of the potential response variables are related to the explanatory categorical variables of interest. To limit the amount of analytical work, for the FAT, FIBER, and ALCOHOL variables, use a 95% confidence interval approach to compare groups, on average, for Gender (male vs. female), Smoke, Age_retired, and Alcohol_Use. Do NOT conduct or report on formal hypothesis tests! How would you summarize the "story" that emerges from these analyses?
6) Given what you've found so far comparing groups, what surprises you? What turned up that you did not expect, if anything? What would explain these results? What do you think should be the next steps in any analysis of this Nutrition data? Your write-up should address each task.
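A minimal sketch covering tasks 1 to 5, assuming a data frame `nutrition` with columns named `Gender`, `Smoke`, `Age`, `Alcohol`, `Quetelet`, `Cholesterol`, `Fat`, and `Fiber`. Those names and every value below are hypothetical (simulated), so map them to the real data set's columns. The hand computation in the middle checks `t.test(..., var.equal = TRUE)` against the pooled-variance formula asked for in task 2d.

```r
# Two-group comparisons sketch for the Nutrition study tasks.
# HYPOTHETICAL simulated data with ASSUMED column names; replace with the real data.
set.seed(3)
n <- 200
nutrition <- data.frame(
  Gender      = sample(c("Female", "Male"), n, replace = TRUE),
  Smoke       = sample(c("No", "Yes"), n, replace = TRUE, prob = c(0.7, 0.3)),
  Age         = sample(19:83, n, replace = TRUE),
  Alcohol     = pmax(0, rnorm(n, 2, 4)),
  Quetelet    = rnorm(n, 26, 4),
  Cholesterol = rnorm(n, 240, 60),
  Fat         = rgamma(n, shape = 8, rate = 0.1),
  Fiber       = rgamma(n, shape = 6, rate = 0.5)
)

# Task 1: binary indicators with ifelse(), then their counts
nutrition$Alcohol_Use <- ifelse(nutrition$Alcohol > 0, 1, 0)
nutrition$Age_retired <- ifelse(nutrition$Age >= 65, 1, 0)
print(table(nutrition$Alcohol_Use))
print(table(nutrition$Age_retired))

# Per-group n, mean, sd, variance and 95% CI: mean +/- t(0.975, n - 1) * sd / sqrt(n).
# Written out by hand so a constant group (e.g. Alcohol when Alcohol_Use = 0) gives a
# zero-width interval instead of an error from t.test().
group_summary <- function(y, g) {
  t(sapply(split(y, g), function(v) {
    k <- length(v); m <- mean(v); half <- qt(0.975, k - 1) * sd(v) / sqrt(k)
    c(n = k, mean = m, sd = sd(v), var = var(v), lower = m - half, upper = m + half)
  }))
}

# Task 2: Quetelet by Smoke with the classic pooled-variance t-test (H0: mu_No = mu_Yes)
desc <- group_summary(nutrition$Quetelet, nutrition$Smoke)
print(round(desc, 3))
tt <- t.test(Quetelet ~ Smoke, data = nutrition, var.equal = TRUE)
print(tt)

# Task 2d, by hand:  s_p^2 = ((n1 - 1) s1^2 + (n2 - 1) s2^2) / (n1 + n2 - 2)
#                    t = (xbar1 - xbar2) / sqrt(s_p^2 * (1/n1 + 1/n2)),  df = n1 + n2 - 2
n1 <- desc["No", "n"]; n2 <- desc["Yes", "n"]
sp2 <- ((n1 - 1) * desc["No", "var"] + (n2 - 1) * desc["Yes", "var"]) / (n1 + n2 - 2)
t_manual <- (desc["No", "mean"] - desc["Yes", "mean"]) / sqrt(sp2 * (1 / n1 + 1 / n2))
p_manual <- 2 * pt(-abs(t_manual), df = n1 + n2 - 2)
cat("t by hand:", t_manual, "  p by hand:", p_manual, "\n")

# Tasks 3-5: group CIs for every response/grouping pair. For tasks 3 and 4, pair these
# with t.test(..., var.equal = TRUE) as above; task 5 is intervals only.
groupings <- c("Gender", "Smoke", "Age_retired", "Alcohol_Use")
rows <- list()
for (resp in c("Quetelet", "Cholesterol", "Fat", "Fiber", "Alcohol")) {
  for (grp in groupings) {
    s <- group_summary(nutrition[[resp]], nutrition[[grp]])
    rows[[length(rows) + 1]] <- data.frame(response = resp, grouping = grp,
                                           level = rownames(s),
                                           s[, c("n", "mean", "lower", "upper")],
                                           row.names = NULL)
  }
}
ci_table <- do.call(rbind, rows)
print(ci_table, digits = 3)
```

The hand-computed `t` and `p` should agree with `t.test()` to rounding error, which is the verification task 2d asks for; `ci_table` can be pasted directly into the summary tables requested in tasks 3 to 5.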