Assignment #1: Million Song Dataset

  • Worth: 10%
  • DUE: September 25th; submitted on Edmodo by midnight.

The streaming music service rdio has just hired you as a junior market analyst (based, of course, on your awesome Python skills). Your first task is to analyze some trends in a set of pop music data.

In this assignment, you will use a real dataset of pop songs, drawn from The Million Song Dataset. To keep file sizes for the assignment reasonable, you’re only required to use a small subset of the whole million+ songs, but if you want, you can run your code on the full million songs (it will take longer, use more disk space and require you to do a bit more work).

The small subset of songs has been randomly sampled from the full database, so please don’t shoot me if your favourite song/artist isn’t in the small dataset; most of mine aren’t either.

You will need the data file songdata.csv. Download it to your computer and save it in the same directory where you will save your assignment. I’ve transformed the raw dataset into a nice, easy-to-use “comma separated values” (CSV) file. We’ll learn more about those later in the class; in fact, by the end of the course, you’ll know enough to work with the raw million song data yourself! For now, you can simply use the function I provide for loading the dataset.

Big Data

If you’re really keen to try running your code on all one million songs, you’ll need bigsongdata.csv (warning: it’s big!). To be totally clear, you do not have to use this file; you’ll get full marks for running your program on the smaller file. This is just an option for those who are interested.
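If the full file turns out to be too big to hold comfortably in memory as one big list, one option is to read it a row at a time and keep only the "best so far" value. The sketch below does that for the fastest-year question using Python's standard csv module. It assumes the CSV has a header row with columns named year and tempo; those column names are a guess, so check the real header first. The function name fastest_year_streaming is hypothetical and not part of the template.

```python
import csv


def fastest_year_streaming(filename):
    """Return the year of the highest-tempo song, reading the CSV row by row.

    Assumes (hypothetically) header columns named 'year' and 'tempo'.
    Only the best row seen so far is kept in memory, never the whole file.
    """
    best_tempo = None
    best_year = None
    with open(filename, newline='') as f:
        for row in csv.DictReader(f):  # DictReader yields one row at a time
            tempo = float(row['tempo'])  # CSV values arrive as strings
            if best_tempo is None or tempo > best_tempo:
                best_tempo = tempo
                best_year = int(row['year'])
    return best_year
```

The trade-off: this version can't answer a second question without re-reading the file, which is part of the "bit more work" that the big file requires.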

To make life easier for the first assignment, you don’t have to start from scratch. I’ve already started a file for you to use as a template. This is also somewhat realistic for a research programmer; you don’t often start completely from scratch... usually, you’re trying to modify someone else’s code that you downloaded or inherited.

Download the template file to get you started.

The steps you need to complete are laid out below, in detail (and in the suggested order).

Finish the “fastest year” function

Your first task is to find the year in which the fastest (highest tempo) song in the dataset occurs. Your predecessor started a function but never completed it (that’s why they got fired). Complete the fastest_year() function. It should return a single integer: the year of the fastest song.

Now test your function. To do that you’ll need to write some code at the very bottom of your .py file to do two things:

  • Load the dataset using the provided function and store it in a variable
  • Run your fastest year function on the dataset
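Here is one possible shape for fastest_year() plus the test code at the bottom of the file. It's a sketch under an assumption: that the dataset is a list of songs where each song is a dict with numeric 'year' and 'tempo' keys. The provided loader may use a different structure or different key names, so adapt accordingly. The loader name load_songs in the comment is hypothetical; use whatever the template actually calls it. The tiny song list is made up purely so the sketch runs on its own.

```python
def fastest_year(songs):
    """Return the year (an int) of the song with the highest tempo.

    Assumes each song is a dict with numeric 'year' and 'tempo' keys.
    """
    fastest_tempo = None
    year_of_fastest = None
    for song in songs:
        # Remember this song if it's the first one, or faster than any so far
        if fastest_tempo is None or song['tempo'] > fastest_tempo:
            fastest_tempo = song['tempo']
            year_of_fastest = song['year']
    return int(year_of_fastest)


# --- Test code at the very bottom of the .py file ---
# In your real file, load the data with the provided function, e.g.
#     songs = load_songs('songdata.csv')   # hypothetical name
# This made-up sample just lets the sketch run by itself.
songs = [
    {'title': 'Song A', 'year': 1987, 'tempo': 120.5},
    {'title': 'Song B', 'year': 2003, 'tempo': 174.0},
    {'title': 'Song C', 'year': 1999, 'tempo': 98.2},
]
print(fastest_year(songs))  # prints 2003 for this made-up sample
```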

Write a “hottest artist” function

Your boss wants to know who the hottest artist in the database is, so rdio can pursue an exclusive distribution contract with them. Fill in the function hottest_artist(). It should return the name of the artist with the hottest song in the dataset.

Who is the hottest artist?


The way this function works is very similar to a function you already have working!
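As the hint says, the pattern is the same "keep the best one so far" loop, just comparing a different field and returning a different one. This sketch assumes each song is a dict with 'artist' and 'hotness' keys (hypothetical names; the real columns may differ). It also skips any hotness value stored as NaN, in case some are missing: every comparison with NaN is False, so a NaN that happened to come first would otherwise never get replaced.

```python
import math


def hottest_artist(songs):
    """Return the artist of the song with the highest hotness value.

    Assumes each song is a dict with a string 'artist' and a float 'hotness'.
    Songs whose hotness is NaN (missing) are ignored.
    """
    hottest_song = None
    for song in songs:
        if math.isnan(song['hotness']):
            continue  # skip missing values
        if hottest_song is None or song['hotness'] > hottest_song['hotness']:
            hottest_song = song
    return hottest_song['artist']


sample = [  # made-up data, for illustration only
    {'artist': 'Artist X', 'hotness': 0.41},
    {'artist': 'Artist Y', 'hotness': 0.87},
    {'artist': 'Artist Z', 'hotness': 0.65},
]
print(hottest_artist(sample))  # prints Artist Y
```

Once the loop makes sense to you, Python's built-in max() with a key argument can express the same search more compactly.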

Write a “loudest song” function

Music production has undergone a tragic transformation over the past few decades. In an effort to sound “better”, artists have been releasing recordings that are more and more compressed (roughly meaning that the difference between the “quiet parts” and the “loud parts” is very small). This is done because compressed sound is perceived as louder and, on first listen, many people think it sounds “better” than a version with full dynamic range.


This isn’t bad in itself. When Phil Spector invented the Wall of Sound, he made tracks that sounded great on AM radio. But now everyone compresses everything, and the result is that it all sounds kinda homogeneous and lacking in subtlety. This is popularly referred to as The Loudness War and, if you’re into music, it is well worth reading up on.

Back to the assignment at hand: Your boss wants to know which song is the loudest in the database. The loudness values are all negative because they are reported in decibels relative to a maximum possible loudness of 0 dB.

Fill in the loudest_song() function to find and return the name of the loudest song in the dataset.
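Because the loudness values are negative decibels, the loudest song is the one with the largest value, i.e. the one closest to 0 dB: -3.1 dB is louder than -18.3 dB, and -3.1 > -18.3. So there's no need for abs() or a minimum search. The sketch assumes each song is a dict with 'title' and 'loudness' keys (hypothetical names), and the sample values are made up.

```python
def loudest_song(songs):
    """Return the title of the song with the highest (least negative) loudness.

    Assumes each song is a dict with a 'title' and a float 'loudness' in dB.
    """
    loudest = None
    for song in songs:
        # Larger (closer to 0 dB) means louder
        if loudest is None or song['loudness'] > loudest['loudness']:
            loudest = song
    return loudest['title']


sample = [  # made-up data, for illustration only
    {'title': 'Quiet Ballad', 'loudness': -18.3},
    {'title': 'Mastered Hot', 'loudness': -3.1},
    {'title': 'Middle Road', 'loudness': -9.7},
]
print(loudest_song(sample))  # prints Mastered Hot
```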

What is the loudest song?

BONUS: Empirically validate The Loudness War


This is for bonus marks. You do not have to do this question! It’s much harder than the other parts of this assignment. If you’re keen to try, go for it... but make sure you get the rest of the assignment done first.

I asserted above that pop songs are getting “louder” over time. As a good researcher, you know better than to trust someone’s opinion, so now’s the time to let the data speak for itself. Fill in the function avg_loudness_by_year() which will, for each year in the database, compute the average loudness of all songs from that year.

Rather than returning a value, this function can just print the results to the screen.
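One way to compute a per-year average is to keep two dictionaries keyed by year, a running total and a count, and divide at the end. The sketch below splits the work into a hypothetical helper, loudness_averages(), that returns the averages (handy for testing) and an avg_loudness_by_year() that just prints them. It assumes each song is a dict with integer 'year' and float 'loudness' keys. The full Million Song Dataset uses a year of 0 for songs whose release year is unknown; assuming songdata.csv keeps that convention, the sketch skips those songs rather than reporting a bogus "year 0" average.

```python
def loudness_averages(songs):
    """Return a dict mapping each year to the mean loudness of that year's songs.

    Hypothetical helper. Assumes 'year' (int) and 'loudness' (float) keys,
    and that year 0 means "unknown", so those songs are skipped.
    """
    totals = {}  # year -> sum of loudness values
    counts = {}  # year -> number of songs
    for song in songs:
        year = song['year']
        if year == 0:
            continue  # assumed marker for an unknown release year
        totals[year] = totals.get(year, 0.0) + song['loudness']
        counts[year] = counts.get(year, 0) + 1
    return {year: totals[year] / counts[year] for year in totals}


def avg_loudness_by_year(songs):
    """Print the average loudness for each year, oldest year first."""
    averages = loudness_averages(songs)
    for year in sorted(averages):
        print(year, round(averages[year], 2))


sample = [  # made-up data, for illustration only
    {'year': 1975, 'loudness': -14.0},
    {'year': 1975, 'loudness': -12.0},
    {'year': 2008, 'loudness': -5.0},
    {'year': 0, 'loudness': -30.0},  # unknown year: skipped
]
avg_loudness_by_year(sample)
```

If the printed averages drift toward 0 dB as the years increase, that's evidence for the Loudness War; years with only a handful of songs will be noisy, so don't over-read them.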

Is music really getting louder? Find out for yourself.

Have a peek at the data-loading function

This time, the function that loads the data is a freebie. It contains some stuff we haven’t discussed in class yet, but it’s pretty easy to figure out what’s going on if you look at it. So look at it. Get used to looking at code that isn’t yours, and may use unfamiliar ideas/idioms/patterns, and trying to figure out what it does. This isn’t always easy (sometimes it’s very hard), but you’ll spend a lot of time doing it (whether you want to or not!).
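To give you a head start on reading it, here is one common way a CSV loader can be written with Python's standard csv module. This is a hypothetical example, not the provided function, which may work quite differently: load_songs_sketch reads the header row for column names and returns a list with one dict per song.

```python
import csv


def load_songs_sketch(filename):
    """Read a CSV file with a header row into a list of dicts, one per song.

    Hypothetical example, NOT the provided loader. Keys come from the
    file's own header row; every value is left as a string.
    """
    songs = []
    with open(filename, newline='') as f:
        reader = csv.DictReader(f)  # uses the first row as the dict keys
        for row in reader:
            songs.append(row)
    return songs
```

Note that csv hands back every value as a string, so numeric columns need converting, e.g. float(row['tempo']), before you compare them. Comparing strings like '99.5' > '120.0' gives True, which is not what you want.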

What to submit on Edmodo

  • Your version of the template file
    • Make sure your NAME and STUDENT NUMBER appear in a comment at the top of the program.
    • List anyone you worked with in the comments, too.
  • A text file describing your findings (loudest song, hottest artist and year with the fastest tempo).

Some hints

  • Work on one function at a time.

  • Get each function working perfectly before you go on to the next one.

  • Test each function as you write it.
    • This is a really nice thing about Python: you can call your function right from the interpreter prompt and see what result gets returned. Does it look reasonable? Good! Check it!
  • If you need help, ask! Try Edmodo. Drop by my office. Ask the TA during lab. We’re here to help.

  • Remember that it is not only ok but actively encouraged to work together.
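The "call it from the interpreter prompt and check it" habit can also be saved and re-run automatically with Python's standard doctest module: paste a prompt-style example into the function's docstring, and doctest.testmod() checks that the output still matches. The function count_songs_in_year below is a hypothetical example chosen only to show the mechanics, and it assumes songs are dicts with a 'year' key.

```python
def count_songs_in_year(songs, year):
    """Return how many songs in `songs` were released in `year`.

    Hypothetical helper, for illustrating doctest only.

    >>> count_songs_in_year([{'year': 1999}, {'year': 2005}, {'year': 1999}], 1999)
    2
    """
    count = 0
    for song in songs:
        if song['year'] == year:
            count += 1
    return count


if __name__ == '__main__':
    import doctest
    doctest.testmod()  # prints nothing if every docstring example passes
```

Pick examples small enough that you can work out the right answer by hand; that's what makes the check meaningful.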