Little Data, Big Data
The ability to collect, manipulate, analyse and visualise data is an important literacy for the digital age, and has a strong emphasis in most modern curricula. This session will take you through a classroom-tested practical exercise that will help you see how data analysis might be used in your lessons. We may also look at some really big datasets to see how they can be analysed for greater meaning. Bring a computer or Chromebook and prepare to get dirty with data.
- The Australian Curriculum mentions the use of data in several places
- The New Zealand Standards mention data collection too
- The USA Common Core State Standards also reference data collection and management
- The Ontario Curriculum has a Data and Probability strand that targets this understanding
In other words, being able to confidently collect, manage, analyse and work with data is a vitally important skill, no matter where you live in the world!
Activity 1, Part 1 - Collecting and organising data
- Get into a group of no more than 6 people (4 to 6 is ideal)
- Nominate ONE person to be the group leader. Just one!
- The group leader needs to make a copy of this Google Sheet (this link will prompt you to make a copy)
- Group leader then shares their copy of the Sheet with the rest of their group.
- Find the shared Sheet in your Google Drive and open it. You're now all working as a team on the same Sheet.
- Open your bag of coloured matchsticks and roughly divide them between everyone in the group.
- Count your own little pile of matches and enter the data into the shared Sheet. Don't forget to enter your names in Column B.
- Be careful not to change a cell that shouldn't be changed. (If you try to, you'll get a warning about it)
- Once the raw data is entered, you'll need to use Spreadsheet formulas to calculate the missing numbers. The formulas you'll need to use are SUM, AVERAGE, MAX, MIN. Remember that all formulas start with an = sign.
- Fill the formulas across and down so that every required cell has relevant data in it. (Rows 11, 12, 13, 14, and Column I)
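If you'd like to see what those four formulas actually compute, here is a minimal Python sketch of the same idea. The names, colours and counts are made up for illustration, and the layout (one row per person, one column per colour) is an assumption rather than a copy of the workshop Sheet.

```python
# Hypothetical matchstick counts: one row per person, one column per colour.
colours = ["Red", "Blue", "Green", "Yellow"]
counts = {
    "Alex": [12, 9, 14, 10],
    "Sam": [8, 11, 13, 9],
    "Jo": [10, 10, 7, 15],
}

# Row totals: the equivalent of =SUM(...) across one person's row.
row_totals = {name: sum(row) for name, row in counts.items()}

# Column summaries: the equivalent of =SUM, =AVERAGE, =MAX and =MIN
# filled down each colour column.
column_summary = {}
for i, colour in enumerate(colours):
    column = [row[i] for row in counts.values()]
    column_summary[colour] = {
        "sum": sum(column),
        "average": sum(column) / len(column),
        "max": max(column),
        "min": min(column),
    }

for name, total in row_totals.items():
    print(name, total)
for colour, stats in column_summary.items():
    print(colour, stats)
```

Filling a formula across and down in the Sheet does the same job as the two loops here: one calculation repeated for every row and every column.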
Activity 1, Part 2 - Visualising and interpreting data
- Once you've filled in the whole sheet as a team, you need to make your own copy of the data. Use the File > Make a Copy command.
- Now you're on your own, kid! This new Sheet belongs to you.
- Make a stacked column chart (graph) to show the total number of matchsticks counted by each person.
- Tidy up your chart with appropriate labels, etc.
- Check the colours in the chart. Do they match the correct colour names? How can you fix that?
- Once you're happy with the final chart, move it to its own sheet.
- Publish this chart to the web as an interactive chart. Make sure you just publish the chart, not the whole sheet!
- Copy the published chart URL. You'll need it in a moment
- Fill in this form - bit.ly/littledata1 (There's a link in the Sheet too!)
- You're done!
Here's an example of what your finished chart might look like. By the way, this is live data, fed from an actual Google Sheet and embedded here in this webpage! If the numbers in the sheet change, the chart will update here.
Activity 2 - Analysing Big Data
There is plenty of data on the web. Many governments and other organisations publish "open data" that anyone is welcome to download, look at or play with. Take a look through some of these open data repositories and see what interesting data you can find. You can also try Googling for information and including the term [dataset] in the search, or adding [filetype:csv] or [filetype:xlsx] to restrict the results to commonly importable tabular data formats.
Open Data Sources
Once you find some data that sounds interesting, download a copy and open it in Google Sheets. The best formats are Excel (.xlsx) or Comma Separated Values (.csv). Both of these formats will open nicely in Google Sheets if you import and convert them in Drive.
- What is this data telling you?
- What meaning can be extracted from the data?
- How might you need to manipulate the data to get meaning from it?
- Can the data be graphed or visualised to reveal the information more clearly?
For example, this data on Household Appliances led to this CSV file about Televisions which was converted to this Google Sheet. It has nearly 3500 rows in it. There must be something interesting in all that data!
So now what? Here are some suggestions on how to start investigating this (or any!) dataset.
Freeze the top row. This makes it easier to scroll through the data, as the headings for each column remain visible. To freeze the top row, choose View > Freeze > 1 Row.
Clean up the data. Are there columns you don't need? Maybe they are empty, or just not relevant to your question. Are there columns whose meaning you're not sure of? (There is a data dictionary with this particular dataset, so you can look up anything you're unsure about.) Clean the data up by deleting any parts of the dataset that are empty or irrelevant to what you need.
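The same clean-up can be done in code, which helps when a dataset is too big to tidy by hand. This is only a sketch: the sample rows and the column names (`Brand`, `Model`, `Screen_Size_cm`, `Notes`, `Country`) are hypothetical and will not match the real Televisions file.

```python
import csv
import io

# Hypothetical sample standing in for a downloaded CSV file.
raw = """Brand,Model,Screen_Size_cm,Notes,Country
Acme,A100,80,,Australia
Acme,A200,106,,New Zealand
Bolt,B55,139,,Australia
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# A column counts as "empty" if every row has a blank value in it.
empty_columns = [
    name for name in rows[0]
    if all(row[name].strip() == "" for row in rows)
]

# Columns we have decided are not relevant (an illustrative choice).
unwanted = {"Model"}

# Rebuild each row without the empty and unwanted columns.
cleaned = [
    {k: v for k, v in row.items() if k not in empty_columns and k not in unwanted}
    for row in rows
]

print(empty_columns)
print(cleaned[0])
```

To try this on a real download, you could replace `io.StringIO(raw)` with an opened file, for example `open("your_file.csv", newline="")`.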
Apply filters. Select the entire first row by clicking on the "1" at the far left of the row. Then choose Data > Filter. Filter the data by different criteria and see what you get.
Sort the data. Use the dropdown boxes in the first row to sort the data below. Sort forwards (A-Z) or backwards (Z-A).
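Filtering and sorting are simple ideas underneath: keep only the rows that match a rule, then put them in order by one column. Here is a minimal Python sketch of both, using made-up rows and hypothetical column names.

```python
# Hypothetical rows; the column names are assumptions for illustration.
tvs = [
    {"Brand": "Acme", "Country": "Australia", "Screen_Size_cm": 80},
    {"Brand": "Bolt", "Country": "New Zealand", "Screen_Size_cm": 139},
    {"Brand": "Acme", "Country": "Australia", "Screen_Size_cm": 106},
    {"Brand": "Cirrus", "Country": "Australia", "Screen_Size_cm": 60},
]

# Filter: keep only the rows that match a criterion.
australian = [tv for tv in tvs if tv["Country"] == "Australia"]

# Sort forwards (smallest first) and backwards (largest first) by one column.
smallest_first = sorted(australian, key=lambda tv: tv["Screen_Size_cm"])
largest_first = sorted(australian, key=lambda tv: tv["Screen_Size_cm"], reverse=True)

for tv in largest_first:
    print(tv["Brand"], tv["Screen_Size_cm"])
```

Note that neither step changes the original list, much as a filter in Sheets only hides rows rather than deleting them.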
Look for interesting patterns in the data. Are there trends you can spot? Are there correlations between different aspects of the data? How could you find them?
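One way to check for a correlation between two numeric columns is the Pearson correlation coefficient, a number between -1 and 1 where values near 1 or -1 suggest a strong linear relationship and values near 0 suggest little or none. Google Sheets offers this through its CORREL function. The sketch below computes it by hand in Python; the two columns (screen size and power use) and their values are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two columns vary together around their means.
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each column varies on its own.
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariation / (spread_x * spread_y)


# Hypothetical columns: screen size in cm and power use in watts.
screen_size_cm = [60, 80, 106, 139, 165]
power_watts = [40, 55, 90, 130, 170]

r = pearson(screen_size_cm, power_watts)
print(round(r, 2))
```

A strong correlation only shows that two columns move together; it does not say that one causes the other.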
Try graphing the data. Sometimes Big Data is overwhelming. Try selecting one or more columns that look interesting and graphing them by clicking Insert > Chart. You never know what you'll discover!
Find meaning. As you browse the data think about the sorts of questions that might be answerable because of it.
- Which television has the best energy rating?
- What screen size is the most popular?
- What country sells the most TVs?
- Are certain screen sizes more popular in some countries than others?
- Is there a relationship between screen technology (e.g. LCD vs Plasma) and screen size?
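Questions like these usually come down to counting and grouping. Here is a rough Python sketch of how a few of them might be approached. The rows and column names (`Country`, `Screen_Size_cm`, `Screen_Tech`) are hypothetical, and counting rows tells you how many entries are listed, which may or may not be the same thing as how many TVs were sold.

```python
from collections import Counter, defaultdict

# Hypothetical rows; the real dataset's columns will differ.
tvs = [
    {"Country": "Australia", "Screen_Size_cm": 80, "Screen_Tech": "LCD"},
    {"Country": "Australia", "Screen_Size_cm": 106, "Screen_Tech": "LCD"},
    {"Country": "New Zealand", "Screen_Size_cm": 106, "Screen_Tech": "Plasma"},
    {"Country": "Australia", "Screen_Size_cm": 127, "Screen_Tech": "Plasma"},
    {"Country": "New Zealand", "Screen_Size_cm": 106, "Screen_Tech": "LCD"},
]

# Which screen size appears most often?
size_counts = Counter(tv["Screen_Size_cm"] for tv in tvs)
most_common_size, how_many = size_counts.most_common(1)[0]

# How many entries does each country have?
country_counts = Counter(tv["Country"] for tv in tvs)

# Average screen size for each screen technology.
sizes_by_tech = defaultdict(list)
for tv in tvs:
    sizes_by_tech[tv["Screen_Tech"]].append(tv["Screen_Size_cm"])
average_size_by_tech = {
    tech: sum(sizes) / len(sizes) for tech, sizes in sizes_by_tech.items()
}

print(most_common_size, how_many)
print(dict(country_counts))
print(average_size_by_tech)
```

In Google Sheets, a pivot table does the same kind of counting and grouping without any code.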