Category: Data science

  • “Using Data Visualizations to Tell the Story of Home Prices and Dropout Rates in a Big City”

    INTRODUCTION
    As a business analyst, you must be able to evaluate data and share it effectively with other stakeholders. For this task, you will use the given scenario to support communication with stakeholders through data visualizations. These data visualizations will help you tell the story of the data you are representing. You will construct those visualizations using a data set and specific storytelling techniques.
    SCENARIO
    You are a real estate and property management business analyst in a big city. Your company’s leadership team supports building reasonably priced housing in areas that have well-performing schools with low school dropout rates (home price and dropout rate are the key factors). You wish to commission a consulting team of business and data analysts to build a full dashboard that will address the best locations for new construction projects within the five boroughs of the city based on the company’s mission and various performance metrics.
    You decide to build a basic prototype model of the dashboard using the home price and dropout rate data for one borough. This prototype model will help you explain the project to your company’s leadership before you meet with the consultants who will build the full dashboard with many other predictive factors. You also want to be able to clearly report a data story about the project to the leadership team using your prototype.
    To ensure you understand the data that will be included in the dashboard model, you will first need to connect your Excel data with Tableau and look at it visually.
    Use the provided information to complete this task.
    Provided model data for the borough: 
    •  average three-bedroom home prices by year 
    •  average dropout % by year
    A.  Describe the two main factors—home price and dropout rate—in the scenario.
    1.  Explain, using data from the “D466 Task 1 Data Set,” the current state of home prices.
    a.  Use Tableau to create an appropriate data visualization that shows the current state of home prices.
    b.  Identify one design technique you used in the visualization from part A1a, and explain how it addresses a common accessibility issue.
    c.  Identify one design technique different from the one identified in part A1b, and explain how it contributes to effective representation of the information in the data set.
    2.  Explain, using data from the “D466 Task 1 Data Set,” how a change in school dropout rates would affect home prices.
    a.  Use Tableau to create an appropriate data visualization that shows how a change in school dropout rates would affect home prices.
    b.  Identify one design technique you used in the visualization from part A2a, and explain how it addresses a common accessibility issue.
    c.  Identify one design technique different from the one used in part A2b, and explain how it contributes to effective representation of the information in the data set.
    B.  Acknowledge sources, using in-text citations and references, for content that is quoted, paraphrased, or summarized.
    C.  Demonstrate professional communication in the content and presentation of your submission.
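    Although the task calls for Tableau, the same chart logic can be sketched in code. Below is a minimal matplotlib illustration with made-up year and price values (the real numbers would come from the "D466 Task 1 Data Set"); the colorblind-safe palette and direct line labeling are examples of the accessibility and design techniques parts A1b and A1c ask about.

    ```python
    # Hypothetical illustration: the assignment calls for Tableau, but the same
    # chart logic is shown here in matplotlib. Years and prices are made up,
    # not the actual "D466 Task 1 Data Set".
    import matplotlib
    matplotlib.use("Agg")  # render off-screen, no display needed
    import matplotlib.pyplot as plt

    years = [2018, 2019, 2020, 2021, 2022]
    avg_price = [410_000, 425_000, 440_000, 470_000, 495_000]

    fig, ax = plt.subplots()
    # Colorblind-safe blue plus markers: the line stays distinguishable
    # without relying on hue alone (a common accessibility technique).
    ax.plot(years, avg_price, color="#0072B2", marker="o")
    ax.set_xlabel("Year")
    ax.set_ylabel("Average 3-bedroom home price ($)")
    ax.set_title("Average home price by year (illustrative data)")
    # Direct labeling at the line's end avoids a separate color-coded legend.
    ax.annotate("Home price", xy=(years[-1], avg_price[-1]),
                xytext=(5, 0), textcoords="offset points")
    ```

    In Tableau the equivalent choices are a line chart with Year on Columns and average price on Rows, a colorblind-safe palette, and direct mark labels instead of a legend.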

  • “Integrating Process Mining, GENAI Agents, and Sequence-Aware Recommendation for Enhanced Customer Journey Analytics”

    Abstract:
    In the rapidly evolving digital landscape, understanding customer journeys is paramount for businesses seeking to improve customer satisfaction and retention. This paper presents a novel approach to customer journey analytics through the application of process mining techniques, enhanced by GENAI agents and sequence-aware recommendation. By integrating process mining with customer journey mapping, we can unveil intricate patterns and behaviors that traditional methods may overlook. GENAI agents further enrich this analysis by leveraging advanced generative AI to provide deeper insights and predictive capabilities. Additionally, incorporating sequence-aware recommendation allows our model to consider the chronological order of customer interactions, capturing temporal dynamics and user behavior patterns to make more accurate and contextually relevant suggestions.
    This study demonstrates the effectiveness of this combined model in capturing and analyzing real-time customer interactions across various touchpoints. We apply our approach to a case study within the retail industry, showcasing how businesses can leverage these insights to optimize their customer engagement strategies, identify bottlenecks, and enhance overall customer experience. The findings underscore the potential of GENAI agents, sequence-aware recommendation, and process mining as powerful tools in the realm of customer journey analytics, providing a data-driven foundation for strategic decision-making.
    https://www.researchgate.net/publication/316521600_A_Process_Mining_Based_Model_for_Customer_Journey_Mapping
    https://dl.acm.org/doi/abs/10.1145/3297280.3297288
    https://journals.sagepub.com/doi/abs/10.1016/j.intmar.2018.02.001
    https://scholar.google.com/scholar?q=research+papers+on+customer+journey+analytics&hl=en&as_sdt=0&as_vis=1&oi=scholart
    https://www.adobe.com/content/dam/www/us/en/analytics/pdf/AA-Customer-Journey-Analytics.pdf
    https://pure.rug.nl/ws/files/81733365/Understanding_Customer_Experience_Throughout_the_Customer_Journey.pdf

  • “The Double-Edged Sword of Facial Recognition: Exploring the Societal Impacts of Deploying this Controversial Technology in Public Spaces”

    This is a research paper on data science. The question at hand is: What are the societal impacts of deploying facial recognition technology in public spaces?

  • “Exploring Data Management Techniques with SAS and R”

    SAS ASSIGNMENT: 
    1. Using SAS: Date Fields Creation
    Create fields to identify the last file entry, first gift, and last gift.
    Calculate the median months between the first and last gift.
    2. Using SAS: Histogram Analysis
    Generate histograms for the ENTRY_DATE field.
    Provide an analysis explaining any unusual patterns observed in the histograms.
    3. Using SAS: Additional Date Fields
    Determine the number of individuals added to the file in 1998.
    Identify the year with the lowest average LAST_GIFT_AMT.
    Filter the data to find records where CLUSTER_CODE equals 9 and LAST_GIFT_DATE_YEAR equals 1997.
    4. Using R: Date Fields Replication
    Write R code to replicate the creation of the three date fields mentioned in task 1.
    Produce output that shows the correct minimum, median, mean, and maximum values for these fields.
    5. Using R: Gift Analysis
    Develop R code to create the field LAST_GIFT_DATE_YEAR.
    Calculate and display the mean LAST_GIFT_AMT by LAST_GIFT_DATE_YEAR.
    6. Using SAS: Handling Missing Values for DONOR_AGE
    Perform a data integrity check on DONOR_AGE.
    Discuss whether to remove records with missing values and provide a rationale.
    7. Using SAS: Missing Values Imputation for DONOR_AGE
    Observe the current values and apply various imputation methods: mean, hot-deck, stochastic regression, and predictive mean matching (PMM).
    Provide a detailed analysis and recommendation based on the imputation results.
    8. Using SAS: Handling Missing Values for Categorical Variables
    Conduct a data integrity check on WEALTH_RATING.
    Apply mode and PMM imputation methods.
    Analyze the outcomes and make recommendations.
    9. Using SAS: Analysis of Extreme Distributions
    Assess whether the distribution of LIFETIME_GIFT_AMOUNT is extreme.
    Identify and apply the appropriate transformation to address skewness.
    Use the Box-Cox transformation method to identify the optimal lambda and report the new skewness value.
    10. Using SAS and R: Advanced Statistical Techniques
    Calculate the median origination date in years and format as MM/DD/YYYY.
    Implement kNN imputation in R and evaluate if the results alter any recommendations.
    This assignment emphasizes the practical application of SAS and R programming in data management, specifically in handling dates, analyzing distributions, and imputing missing data.
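    The date-field and gift-analysis logic in tasks 1 and 5 can be sketched outside SAS and R as well. The following Python sketch uses a made-up three-donor file (not the assignment's real data) to show the computations: first and last gift dates, median months between them, and mean last-gift amount by year.

    ```python
    from datetime import date
    from statistics import mean, median
    from collections import defaultdict

    # Made-up donor file: {donor_id: [(gift_date, gift_amount), ...]}.
    donors = {
        "D001": [(date(1995, 3, 1), 25.0), (date(1998, 7, 15), 40.0)],
        "D002": [(date(1996, 5, 20), 10.0), (date(1997, 11, 2), 15.0),
                 (date(1999, 1, 9), 30.0)],
        "D003": [(date(1998, 2, 28), 50.0)],
    }

    def months_between(d1, d2):
        # Counts month boundaries crossed, like SAS INTCK('month', d1, d2).
        return (d2.year - d1.year) * 12 + (d2.month - d1.month)

    # Task 1: first gift, last gift, and median months between them.
    spans = []
    for gifts in donors.values():
        first_gift = min(d for d, _ in gifts)
        last_gift = max(d for d, _ in gifts)
        spans.append(months_between(first_gift, last_gift))
    print("Median months between first and last gift:", median(spans))

    # Task 5: mean LAST_GIFT_AMT by LAST_GIFT_DATE_YEAR.
    by_year = defaultdict(list)
    for gifts in donors.values():
        last_date, last_amt = max(gifts)   # tuples sort by date first
        by_year[last_date.year].append(last_amt)
    for year in sorted(by_year):
        print(year, round(mean(by_year[year]), 2))
    ```

    In SAS the same month count comes from INTCK('month', FIRST_GIFT_DATE, LAST_GIFT_DATE); in R, the grouping step maps to aggregate() or dplyr's group_by()/summarise().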

  • Cover Letter for Data Analytics Program Application Dear Admissions Committee, I am writing to express my strong interest in the Data Analytics program at Ohio State University. As a highly motivated and dedicated individual, I am eager to further my education and pursue a

    The cover letter should be no more than 1 page long (11pt font or higher)
    The cover letter should answer the following three (3) questions:
    What are your long term goals, and how will the data analytics program help you achieve them? (300 words maximum)
    Which of the specializations most interest you and why? You can write about more than one. Note: Computational Analytics requires a 3.2 cumulative OSU GPA and Business Analytics requires a 3.0 cumulative OSU GPA. (300 words maximum)
    Have there been any external factors that have negatively impacted your academic performance at Ohio State? If so, please explain. (200 word maximum)
    This is what I wrote; I used ChatGPT to help draft it. Please revise it so that it reflects my own writing and does not read as AI-generated.
    The PDF below is my draft.

  • Journal Entry: Steps for Conducting a Data Quality Assessment Determine if the data set reveals a problem: The first step in conducting a data quality assessment is to determine if the data set presents any potential problems or challenges. This can be determined

    Organizations are dynamic and constantly evolving, and as they change, so do their data needs. A merger between two companies, for example, requires a data quality assessment to ensure that the data sets are compatible and can be merged. A preliminary data quality assessment enables the data analyst to brainstorm before starting the full assessment, and writing out the planned steps in a journal can save time and effort. Refer to the steps you learned in the Data Analytics LifeCycle to guide you, and to the DAMA DMBOK (Chapter 12) for industry-specific terms and definitions. Remember to track cited sources.
    Prompt
    Use the guidelines below to identify what you will need to look for.
    Note: A data set is not included in this assignment. You are only describing how to complete a data quality assessment.
    Specifically, you must address the following rubric criteria:
    Determine if the data set reveals a problem. How will you know if the data set represents an organizational challenge?
    Determine if the data set is usable. How will you know if the data set is suitable for an assessment?
    Assess the data set for consistency and completeness. What will you do to verify that you have all the accurate data needed to complete a data quality assessment?
    Identify data that you will keep and data you will discard. How will you know if you need all of the data or only some of it?
    Describe any obstacles that could interfere with providing an accurate data quality assessment. What will you look for to safeguard sensitive data?
    What to Submit
    Your submission should be 250 to 500 words in length.
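    As a concrete illustration of the consistency and completeness criteria above, here is a minimal Python sketch that scans an in-memory record set for missing values, duplicate keys, and inconsistent types. The field names and records are invented for the example; the assignment itself includes no data set.

    ```python
    # Toy record set with the kinds of defects a quality assessment looks for
    # (all names and values are invented for illustration).
    records = [
        {"id": 1, "name": "Acme",  "revenue": 1200.0},
        {"id": 2, "name": "Beta",  "revenue": None},     # incomplete
        {"id": 2, "name": "Beta",  "revenue": None},     # duplicate key
        {"id": 3, "name": "Gamma", "revenue": "n/a"},    # inconsistent type
    ]

    def assess(rows):
        """Tally basic completeness and consistency problems."""
        report = {"rows": len(rows), "missing": 0,
                  "duplicate_ids": 0, "bad_types": 0}
        seen = set()
        for row in rows:
            if row["id"] in seen:
                report["duplicate_ids"] += 1
            seen.add(row["id"])
            if row["revenue"] is None:
                report["missing"] += 1
            elif not isinstance(row["revenue"], (int, float)):
                report["bad_types"] += 1
        return report

    print(assess(records))
    ```

    A real assessment would extend these tallies per field and compare them against agreed quality thresholds before deciding what to keep or discard.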

  • “Comparing Energy and Cost Savings: Incandescent vs. CFL and LED Lights”

    Students should compare the energy and cost savings from switching their incandescent lights to CFL or LED lights.
    Choose either one.
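    The comparison reduces to simple arithmetic: annual energy use is watts × hours ÷ 1000, and annual cost is energy × price. The wattages, usage hours, and electricity price in the sketch below are assumed illustrative values, not assignment data.

    ```python
    # Back-of-envelope comparison; all constants are assumed illustrative values.
    HOURS_PER_YEAR = 3 * 365      # bulb on 3 hours per day
    PRICE_PER_KWH = 0.15          # assumed electricity price, $ per kWh

    bulbs = {"incandescent": 60, "CFL": 14, "LED": 9}  # typical watts, ~800 lumens

    def annual_kwh(watts):
        return watts / 1000 * HOURS_PER_YEAR

    def annual_cost(watts):
        return annual_kwh(watts) * PRICE_PER_KWH

    baseline = annual_cost(bulbs["incandescent"])
    for name, watts in bulbs.items():
        saving = baseline - annual_cost(watts)
        print(f"{name}: ${annual_cost(watts):.2f}/yr, "
              f"saves ${saving:.2f}/yr vs incandescent")
    ```

    Swapping in local electricity rates and actual bulb wattages turns the same formula into the student's own comparison.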

  • “The Science of Climate Change: Separating Fact from Fiction”

    For this final assignment, you can choose any science topic for which you have an interest. The topic can be based on a subject matter that you studied in another class, past or present, or on your personal or professional interest. Regardless of the topic you choose, your goal in this paper is to explain the science to a general audience. This means that you must explain all jargon. Your grade will be based on how well you can explain the science.
    In general, it is always thought-provoking to choose a topic that concerns, or should concern, the public interest. As you know, there are many times when an important science topic is misunderstood by the general public, a government, or a commercial enterprise, so that important decisions are made based on misinformation or disinformation.
    Research:
    You must use at least five qualified sources, but you may use as many sources as you need.

  • “Exploring the Relationship between Variables: ANOVA and Cross Tab Analysis in JAMOVI”

    I will upload the Excel sheet with the data once you confirm that you know how to use JAMOVI.
    I need an ANOVA analysis and a cross tab analysis for my paper, as well as a findings section with hypothesis testing and other findings.
    In the findings section, report what you found, including basic univariate statistics and the results of hypothesis testing. Describe and interpret what the key numbers mean, thoroughly enough that readers can understand the text without looking at the tables. Report the hypothesis tests and then any other findings. In the conclusions and discussion section, elaborate on the findings and discuss the implications of the study, starting by summarizing the main findings.
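    JAMOVI produces these analyses through its GUI, but the statistics behind them are straightforward. The sketch below computes a one-way ANOVA F statistic and a cross tab by hand in Python, on made-up groups that stand in for whatever the Excel sheet contains.

    ```python
    from statistics import mean
    from collections import Counter

    # Illustrative groups -- stand-ins for the real Excel data.
    groups = {
        "A": [5.1, 4.9, 6.0, 5.5],
        "B": [6.8, 7.1, 6.5, 7.0],
        "C": [5.9, 6.2, 6.0, 5.8],
    }

    # One-way ANOVA: the F statistic JAMOVI reports is the ratio of
    # between-group to within-group mean squares.
    all_vals = [v for g in groups.values() for v in g]
    grand = mean(all_vals)
    k, n = len(groups), len(all_vals)

    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    print(f"F({k - 1}, {n - k}) = {f_stat:.2f}")

    # Cross tab: joint counts of two categorical fields.
    pairs = [("male", "yes"), ("male", "no"), ("female", "yes"), ("female", "yes")]
    table = Counter(pairs)
    print(table)
    ```

    Interpreting the output for a findings section means stating the group means, the F statistic with its degrees of freedom, and whether the associated p-value leads you to reject the null hypothesis of equal group means.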

  • Title: “Addressing Ethical Concerns Arising from Flawed Algorithms”

    Ethical issues due to faulty algorithms. Please help me rewrite some parts.
    Comments:
    The document is very badly written. It has no sections. It fails to define what a faulty algorithm is. Everything in this world is faulty. Cars can have faults and so can any human-made machines. So, all algorithms are faulty by definition. What types of faults can there be? There is no structure in the writing. It just rambles on.