Data analysis

From Wikiversity
Data Analysis, Training of Models and Visualization

Data analysis is the process of inspecting and summarizing data in order to extract useful information, draw inferences, and reach conclusions. It is usually carried out with statistical or numerical software and draws on a range of techniques, including statistical methods.

Note that "data analysis" takes different forms, and may go by different names, in different fields.

Process

[Figure: Data visualization process]

Activities

The first activities relate to the diagram above and to the embedding of data analysis in decision-making processes.

  • (Decision Making) Explain why data analysis is relevant for evidence-based decision making.
  • (Use Case) Look at the diagram above and at your field of expertise. Populate the individual steps with a workflow based on raw data that you have access to.
  • Spatial Decision Support Systems (SDSS) bridge the domains of data analysis and decision support and deal with spatial data. Consider the domain of transportation. Create a workflow for the analysis of spatial data about transported goods and the vehicles that carry them (trucks, ships, trains, planes, ...), and identify sustainable ways of transporting goods and services to a customer.
    • Address the data analysis needed to make use of the capacities of trucks and trains, so that driving without cargo or with minimal use of capacity is reduced. Identify indicators in the data analysis that address specific Sustainable Development Goals (SDGs). Identify which SDGs are addressed and how the definitions of the SDGs determine the methodologies used for the data analysis. Describe the data analysis foundations that are required to measure the impact of interventions for a sustainable system of transport and delivery.
  • (Machine Learning) Data can be processed with machine learning. Compare the methodologies of classical statistical analysis and machine learning as ways of performing data analysis. What are the similarities, differences, benefits, and drawbacks of the two approaches?
  • (Digital Learning Environment) Consider digital learning environments and the diagram above.
    • Consider a specific learning environment in your domain. How would a teacher select learning tasks tailored to the student so that each exercise is challenging enough but not too complex? What indicators (required information) does the teacher need in order to match the exercise, or the support the teacher provides, to the specific requirements and constraints of the student/learner?
    • Now transfer this to data analysis (in this case, learner analytics). Identify data that can be collected in a digital learning environment and that could be used to support the teacher in providing tailored teaching and learning material to the student.
    • Choose, from your current knowledge of data analysis, an appropriate methodology to analyze the collected data. Start with very basic measures such as
      • means,
      • standard deviation,
      • worst case, best case,...
      • ...
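
As a starting point for the basic measures listed above, the following minimal sketch computes the mean, standard deviation, worst case, and best case for one learner. The quiz scores and the function name `summarize` are hypothetical and chosen only for illustration; a real learning environment would supply its own data.

```python
from statistics import mean, stdev


def summarize(scores):
    """Return basic descriptive statistics for a list of numeric scores."""
    return {
        "mean": mean(scores),
        "stdev": stdev(scores),  # sample standard deviation (divides by n - 1)
        "worst": min(scores),    # worst case
        "best": max(scores),     # best case
    }


# Hypothetical quiz scores (in percent) of a single learner
quiz_scores = [55, 70, 65, 80, 90]
summary = summarize(quiz_scores)
print(summary)
```

A teacher could compare such summaries across exercises, for example treating a low mean combined with a high standard deviation as a hint that the difficulty of the tasks fluctuates too much for this learner. That interpretation is itself an assumption to be checked against the actual learning context.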

Wiki2Reveal Slides

The following Wiki2Reveal presentations can be used by lecturers as Open Educational Resources to support their course work in addition to standard statistical and numerical approaches to process and analyze data.

Chapter 1 - Introduction

Learning Task

  • (1) Identify an application scenario to which you want to apply your data analysis. Write a short summary of your project (e.g. a Bachelor's, Master's, or PhD thesis).
  • (2) Describe the experimental design in which the data will be collected.
  • (3) Provide one scenario
    • (3.1) in which you have a fixed time for data collection and the data analysis starts after data collection is finished, and
    • (3.2) in which you receive a constant input stream of data that has to be processed continuously, with an appropriate methodology for dynamic reporting and dynamic data analysis in a real-time setting.
In a Bachelor's, Master's, or PhD thesis you will mainly encounter scenario (3.1). In that case it is simply an exercise to extrapolate from (3.1) to a scenario (3.2) that handles a constant input stream of data for a dynamic analysis.
  • (4) (Swarm Intelligence) Compare swarm behavior with the data analysis workflow in the diagram mentioned above. In a swarm, data does not arrive in a central container and is not analyzed centrally. Individuals in a swarm perceive different information/data, and the swarm responds to the perceived data as a group. Identify analogies and differences in data analysis on a qualitative level.
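
The difference between scenarios (3.1) and (3.2) can be sketched with a small example. In (3.1) all data is available before the analysis starts, so statistics can be computed in one pass over the complete data set. In (3.2) each value updates the statistics as it arrives. The sketch below uses Welford's online algorithm for the streaming case; this is one possible choice and not a method prescribed by this learning resource, and the class name `RunningStats` and the sensor readings are hypothetical.

```python
from statistics import mean, variance


class RunningStats:
    """Welford's online algorithm for the mean and the sample variance."""

    def __init__(self):
        self.n = 0        # number of values seen so far
        self.mean = 0.0   # running mean
        self._m2 = 0.0    # running sum of squared deviations from the mean

    def update(self, x):
        """Incorporate one new value without storing earlier values."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance (n - 1 in the denominator); 0.0 for fewer than 2 values."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


# Hypothetical sensor readings
readings = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Scenario (3.1): analysis starts after data collection is complete
batch_mean = mean(readings)
batch_var = variance(readings)

# Scenario (3.2): values arrive one at a time and the report is always current
stream = RunningStats()
for x in readings:
    stream.update(x)
    print(f"n={stream.n} mean={stream.mean:.3f} variance={stream.variance:.3f}")
```

After the last value both approaches agree up to floating-point rounding. The streaming version needs only constant memory, which is what makes continuous reporting feasible.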

Chapter 2 - Data Clean Up - Processing of Raw Data

This section addresses the preprocessing of data.

  • Moving Average - an example of how preprocessing can be used to clean up data and remove noise.
  • Data Compression - (Wiki2Reveal slides) - represent the same information with less data.
  • Missing Data - incomplete data sets or records, and how to impute missing values in a way that does not affect the mean and standard deviation of the data set.
  • Normalization
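
Two of the preprocessing steps listed above can be sketched in a few lines. The example assumes a simple (unweighted) moving average and z-score normalization; other variants exist (weighted or exponential moving averages, min-max scaling), and the list above does not say which one is meant. The function names and the price series are hypothetical.

```python
from statistics import mean, pstdev


def moving_average(values, window):
    """Simple moving average; returns len(values) - window + 1 smoothed points."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [mean(values[i:i + window]) for i in range(len(values) - window + 1)]


def z_scores(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot normalize a constant series")
    return [(v - mu) / sigma for v in values]


# Hypothetical daily closing prices of a share
prices = [10, 12, 11, 15, 14, 13]

smoothed = moving_average(prices, window=3)
normalized = z_scores(prices)

print(smoothed)
print(normalized)
```

A larger window smooths more strongly but reacts more slowly to real changes in the series, so the window size is a trade-off rather than a fixed rule.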

Learning Task

  • Look at data from a stock exchange for a specific share. Explain the benefit of preprocessing the data with a Moving Average. What is noise in the data, and how does a Moving Average help to reduce it?
  • Missing data can be a challenge for the researcher, because incomplete data sets might not be usable in the data analysis. Explain the circumstances in which imputation of missing data could help to incorporate more data into the analysis, and describe the requirements and constraints for data imputation.
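
One constraint on imputation can be checked directly. The sketch below uses mean imputation, a common baseline chosen here only for illustration. The mean of the completed data equals the mean of the observed values, but the sample standard deviation becomes smaller, because the imputed values add no spread. The data and the function name `impute_mean` are hypothetical.

```python
from statistics import mean, stdev


def impute_mean(values):
    """Replace None entries with the mean of the observed (non-None) entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("at least one observed value is required")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical measurements with two missing records
raw = [4.0, None, 6.0, 8.0, None, 10.0]

observed = [v for v in raw if v is not None]
completed = impute_mean(raw)

print("mean  observed:", mean(observed), " completed:", mean(completed))
print("stdev observed:", stdev(observed), " completed:", stdev(completed))
```

Whether this reduction in spread is acceptable depends on the analysis that follows, which is part of the requirements and constraints asked about in the task above.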

Chapter 6 - Pattern Recognition

Learning Modules

See also

Wikipedia