  • Team-B Final Project: Applying Concepts of Hadoop Ecosystem and Data Preparation in GCP “Meeting Notes and Task Board for Address Project Presentation”

    Team – B: [Every document and screenshot should be named as ‘Team-B’]
    Overview:
    This week you will work with your group on your final project.
    Objectives
    Apply concepts learned about Hadoop Ecosystem
    Apply concepts learned in data preparation to preprocess data
    Construct an external table using basic SQL commands in BigQuery
    Develop queries in BigQuery.
    Construct a well-defined schema using basic HQL commands in Hive
    Develop queries in Hive
    Develop queries in Spark
    Instructions
    Each group will research their assigned use case. They will select a static dataset and streaming data source from the approved list provided or locate another and obtain the instructors’ approval.
    Each group will create an executive summary. This summary should be between 400 and 550 words, not including the title page, references, or other supporting documents. It should read like a summary of your presentation: introduce the use case, step through the data lifecycle, identify the tools and applications used during the relevant phases of the data lifecycle, and conclude with the next steps for the data science or analyst teams. Format the executive summary in Times New Roman, 12-point, with one-inch margins.
    Each group will create a document with screenshots that covers the project and storage they created for their use case in GCP, setting up their Hadoop ecosystem, performing data processing with their static and streaming data, and performing queries in BigQuery, Hive, and Spark to ensure the quality of their data for the data science or analyst teams. At each step, the team will take screenshots of their work and present them in a Word document with brief explanations of the screenshots. The description should include the application used, the task performed, and why it was performed. Do not include how-to instructions.
    Each group will create a presentation that tells a story using the data lifecycle as a guide, and they will present their work during the designated time. You may be creative with the PowerPoint presentation, but it must remain a professional business presentation. Each member of the group should speak. After the presentation, the group will take questions from the audience. The presentation should be 10 to 15 minutes in length.
    Meeting_Notes_Template:
    I have provided ‘Meeting_Notes_Template.docx’; please fill in the provided template.
    Approved Data Sources:
    I have provided ‘Approved Data Sources.pdf’; please select two datasets from the approved data sources listed for any use case in the PDF.
    PPT and Word:
    Topic: Use cases from the discussion post
    Data: Use approved data sources (two or more)
    Executive Summary (25%): This paper should be between 400 and 550 words, not including the title page, code, and references.
    Screenshots (25%): These screenshots should show how you applied what you learned. Create a new project in GCP for this use case.
    Presentation (50%): The group will present
    Grading: This project is worth 20% of your final course grade. The Executive Summary will comprise 25% of the project grade, the screenshots 25%, and the presentation 50%.
    Document Type: Word and PPT
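
The grading weights above can be worked through with a short calculation. This is only an illustrative sketch: the weights come from the brief, but the function names and the example scores are hypothetical, and it assumes each component is scored from 0 to 100.

```python
# Component weights within the project grade, as stated in the brief.
COMPONENT_WEIGHTS = {
    "executive_summary": 0.25,
    "screenshots": 0.25,
    "presentation": 0.50,
}

# The project's share of the final course grade, as stated in the brief.
PROJECT_SHARE_OF_COURSE = 0.20


def project_grade(scores):
    """Return the weighted project grade (0-100) from per-component scores (0-100)."""
    return sum(COMPONENT_WEIGHTS[name] * scores[name] for name in COMPONENT_WEIGHTS)


def course_points(scores):
    """Return the points the project contributes to the final course grade."""
    return PROJECT_SHARE_OF_COURSE * project_grade(scores)


# Hypothetical scores, for illustration only.
example_scores = {"executive_summary": 80, "screenshots": 90, "presentation": 100}
print("Project grade:", project_grade(example_scores))
print("Course points:", round(course_points(example_scores), 2))
```

Because the presentation carries half of the project grade, it has twice the effect on the result of either written deliverable.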
    Executive Summary Requirements:
    400 to 550 words, not counting the title page, references, or supporting documents.
    Title page: Organization Name, Logo, Use case, group number, and group members
    Introduction: Introduce the use case and its purpose (Example: Data Engineering Request)
    Body: Step through the data lifecycle with your use case and the tasks you did
    Conclusion: Summarize and discuss the next steps for the data science and analyst teams
    Double-spaced Word Document
    References
    Application Screenshots Requirements:
    GCP project & storage
    Hadoop
    OpenRefine
    BigQuery
    Hive
    Spark
    Include an explanation (3-10 sentences) with the screenshots stating the application used and the task performed.
    Supporting Documents:
    Reference page
    Meeting notes or Task board
    Data Sheet – List of data sources and any additional information, such as the website address
    Other documents
    Word Document
    Meeting Notes Template:
    Date:
    Start and End time:
    Attendees:
    Note-taker:
    Notes:
    Decisions:
    Action Items:
    Task board:
    Create a Task board using MS Teams – Planner, Excel, or Word
    Task board Columns
    To-Do
    In Progress
    Review
    Done
    Task Info: Description, Owner, Due Date
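
The task board described above can also be pictured as a small data structure. The following is a minimal sketch, not a required deliverable: the column names and task fields come from the brief, while the class name, helper functions, task descriptions, owners, and dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

# Column names as listed in the brief, in workflow order.
COLUMNS = ("To-Do", "In Progress", "Review", "Done")


@dataclass
class Task:
    description: str
    owner: str
    due_date: date
    column: str = "To-Do"

    def advance(self):
        """Move the task to the next column; a task in "Done" stays there."""
        index = COLUMNS.index(self.column)
        if index < len(COLUMNS) - 1:
            self.column = COLUMNS[index + 1]


def board_view(tasks):
    """Group task descriptions by column, keeping empty columns visible."""
    view = {column: [] for column in COLUMNS}
    for task in tasks:
        view[task.column].append(task.description)
    return view


# Hypothetical tasks and owners, for illustration only.
tasks = [
    Task("Draft executive summary", "Member A", date(2024, 1, 15)),
    Task("Capture BigQuery screenshots", "Member B", date(2024, 1, 17)),
]
tasks[0].advance()  # "Draft executive summary" moves to "In Progress"
for column, descriptions in board_view(tasks).items():
    print(column, descriptions)
```

The same four columns and three task fields map directly onto a Planner board, an Excel sheet, or a Word table.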
    Presentation Requirements:
    Business casual
    TELL A STORY
    Every group member must present
    10-15 minutes to present
    2 minutes for questions
    10-20 PowerPoint Slides
    Title page: Organization Name, Logo, Use case, group number, and group members
    Outline or Agenda
    Every step of the data lifecycle – No definitions
    Hive and Spark SQL comparison chart
    A few of your screenshots (No more than 5)
    Cite the source on the slide if not your own words
    Word document regarding rubrics instructions.