Assignment #4: Here you go again (on your own)

  • Worth: 20%
  • DUE: November 28th; submitted on Edmodo by midnight.
_images/sort.jpeg

Part I: Get yourself sorted

Pick a sorting algorithm that we haven’t discussed in class. It can be iterative or recursive. Efficient, or perversely awful. Any sorting algorithm at all will do, except for:

  • Selection Sort
  • Insertion Sort
  • Bubble Sort
  • Bogosort
  • Quick Sort
  • Merge Sort

Once you’ve picked a sorting algorithm, you have to do three things:

  • Implement it in Python
  • Test your implementation to make sure it works
  • Write a brief description of how good you think the sorting algorithm is (roughly how many loop iterations does it need for a list with n elements?)
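For the testing step, one option is to compare your function against Python’s built-in sorted() on a pile of random lists. What follows is only a sketch under some assumptions: the names check_sort, placeholder_sort and broken_sort are made up for illustration, and it assumes your function returns the sorted list (if yours sorts in place and returns nothing, adjust the comparison).

```python
import random


def check_sort(sort_fn, trials=200, max_len=20):
    """Compare sort_fn against Python's built-in sorted() on many lists.

    Returns the first input list that sort_fn gets wrong, or None if
    every list comes back correctly sorted.
    """
    rng = random.Random(2120)  # fixed seed so a failure is reproducible
    # Edge cases worth checking explicitly: empty, one element,
    # all duplicates, already sorted, reverse sorted.
    cases = [[], [7], [3, 3, 3], [1, 2, 3, 4], [4, 3, 2, 1]]
    for _ in range(trials):
        length = rng.randint(0, max_len)
        cases.append([rng.randint(-50, 50) for _ in range(length)])

    for original in cases:
        # Pass a copy in case sort_fn modifies its argument.
        result = sort_fn(list(original))
        if result != sorted(original):
            return original
    return None


def placeholder_sort(items):
    """Stand-in only: swap this for your own implementation."""
    return sorted(items)


def broken_sort(items):
    """Deliberately wrong 'sort', to show the checker catching a bug."""
    return list(items)


print(check_sort(placeholder_sort))  # None: every list came out sorted
print(check_sort(broken_sort))       # [4, 3, 2, 1]: first list it got wrong
```

Getting back a concrete failing list is handy: you can paste it straight into your write-up, or step through your code with it to find the bug.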

This is a big step from previous assignments, where you’ve had to do implementations based on my descriptions. Now you have to implement sorting based on someone else’s description, which is probably more terse. If you want to use programming in your own work, this is a critical skill. You need to be able to read a paper (or textbook) and, with some time and effort, turn an algorithm described in that document into a Python program that you can use.
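To make that translation step concrete, here is a sketch using Selection Sort, which is off limits for this assignment and therefore safe to illustrate with. The pseudocode in the docstring is a typical textbook phrasing, not taken from any particular source, and the function name selection_sort_counted and its comparison counter are my own additions for illustration.

```python
def selection_sort_counted(items):
    """Selection sort that also counts inner-loop comparisons.

    Pseudocode being translated:
        for i from 0 to n-2:
            smallest = i
            for j from i+1 to n-1:
                if A[j] < A[smallest]: smallest = j
            swap A[i] and A[smallest]
    """
    data = list(items)  # work on a copy so the caller's list is untouched
    comparisons = 0
    n = len(data)
    for i in range(n - 1):
        smallest = i
        for j in range(i + 1, n):
            comparisons += 1
            if data[j] < data[smallest]:
                smallest = j
        data[i], data[smallest] = data[smallest], data[i]
    return data, comparisons


# Compare the measured count with n(n-1)/2 for a few reversed lists.
for n in (5, 10, 20):
    _, count = selection_sort_counted(range(n, 0, -1))
    print(n, count, n * (n - 1) // 2)
```

The two numbers on each printed line match. The inner loop runs n-1 times on the first pass, then n-2, and so on down to 1, whatever order the input is in. Adding a counter like this to your own algorithm is one way to check the argument you make in your write-up.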

Suggested places to look for sorting algorithms:

Note

I’m well aware of the fact that it’s completely trivial to ‘cheat’ on this portion of the assignment and just cut and paste Python code from the internet, instead of reading a pseudocode description of the algorithm and turning it into Python yourself. If you do this, believe me, it’ll be obvious. Maybe not to you, but it will be to the TAs. More importantly, it will be obvious to this kitten. And every time someone hands in cut-and-paste sorting code from the internet, this kitten gets sadder. If you can live with that, so be it. You monster.

_images/sad.jpeg

If you’re stuck or overwhelmed by all the choices of sorting algorithm, come and talk to Mark, Lareina or Bryan. We’ll help you pick one to implement.

_images/bigdata.jpeg

Part II: Data Analytics

You have enough skill as programmers now to be let loose upon the world. Instead of me dictating an assignment and dataset to you, this time I want you to find your own dataset. It could be a stock dataset from a website, or it could be data from your own research (assuming you get permission from whoever actually owns the data – you have my word that the TAs and I will keep any results you present completely confidential). The important thing is that the data interests you, personally, in some way.

I’d prefer that you find a dataset that really interests you, but if you’re stuck for ideas, here are some places to start:

Here’s what you’re going to do:

  • Pick a data set.
  • Figure out how to load it into Python.
  • Do two (different!) visualizations of the data, and interpret them.
  • Use two different machine learning approaches to analyze and interpret your data.

I can’t tell you what visualizations to do, because this choice totally depends on your dataset and what you want to do with it. Part of the assignment is to demonstrate some understanding of the tools you’ve learned by using them appropriately. I suggest starting with the links here and finding a plot that looks like what you want. Grab that code and modify it to suit your data. If you’re really stuck (and not sure what/how to plot) come and talk to Mark, Lareina or Bryan. We’re happy to make suggestions.
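As a starting point only, here is a sketch of two different kinds of plot: a histogram of one variable and a scatter plot of two. It assumes matplotlib is the plotting library you are using. The data is fake (made-up heights and weights generated on the spot), and the function names make_fake_data, plot_histogram and plot_scatter are hypothetical.

```python
import random

import matplotlib
matplotlib.use("Agg")  # draw without a display; drop this line to use plt.show()
import matplotlib.pyplot as plt


def make_fake_data(n=100, seed=0):
    """Made-up (height, weight) values standing in for a real dataset."""
    rng = random.Random(seed)
    heights = [rng.gauss(170, 10) for _ in range(n)]
    weights = [0.9 * h - 85 + rng.gauss(0, 8) for h in heights]
    return heights, weights


def plot_histogram(values, label):
    """One variable at a time: what does its distribution look like?"""
    fig, ax = plt.subplots()
    ax.hist(values, bins=15)
    ax.set_xlabel(label)
    ax.set_ylabel("count")
    return fig


def plot_scatter(xs, ys, xlabel, ylabel):
    """Two variables together: is there a relationship between them?"""
    fig, ax = plt.subplots()
    ax.scatter(xs, ys)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    return fig


heights, weights = make_fake_data()
hist_fig = plot_histogram(heights, "height (cm)")
scatter_fig = plot_scatter(heights, weights, "height (cm)", "weight (kg)")
```

Each function returns the figure, so you can call something like hist_fig.savefig("histogram.png") to keep a copy for your write-up. Whether a histogram or scatter plot is the right choice depends entirely on your data.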

Note

Do your own work. There is an effectively infinite amount of data out there. The probability of you picking exactly the same datasets, visualizations and algorithms as someone in last year’s class is just about 0%. We have those assignments, and we’ll notice if you do. Giving you the freedom to apply your new skills in ways that are meaningful to you is worth the small extra effort of making sure everyone is honest. Besides, you don’t want to cause more heartache to that poor kitten, do you?

Once you’ve got your visualizations, write up a few sentences analyzing them. Do they make any sense? Is there a story in your data? What is it?

Likewise with the machine learning component. I can’t tell you what you should do, because I don’t know what data you’ve chosen to work with. A shot in the dark might be to try one supervised approach (where you try to classify your data) and one unsupervised approach (where you try to cluster your data). For example, suppose I were working with the Vampire data from assignment 3. For supervised learning, I might try to train a Support Vector Machine (SVM) to classify individuals as Vampire/Not-Vampire. For unsupervised learning, I might just run the whole data set through a k-means clustering algorithm to see if there are any obvious clusters in the data.
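Here is roughly what that supervised/unsupervised pairing could look like in code. This is a sketch that assumes you are using scikit-learn; the data is synthetic (generated by make_blobs, not the Vampire data), and the parameter choices are arbitrary defaults rather than recommendations.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data: 200 two-dimensional points in 2 labelled groups.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=1.5, random_state=0)

# Supervised: train an SVM on part of the data, then see how well it
# classifies points it has never seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
classifier = SVC(kernel="linear")
classifier.fit(X_train, y_train)
accuracy = classifier.score(X_test, y_test)
print("SVM accuracy on held-out data:", accuracy)

# Unsupervised: ignore the labels and ask k-means to find 2 clusters.
clusterer = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = clusterer.fit_predict(X)
print("points per cluster:", [int((cluster_ids == k).sum()) for k in range(2)])
```

Note that the SVM is scored on data held back from training; accuracy measured on the training data itself tells you much less. The k-means cluster numbers are arbitrary, so cluster 0 need not correspond to label 0.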

Once you’ve done the machine learning, write a few more sentences describing your results. If you did k-means clustering on the Vampire data and got two very clean clusters... what would that tell you? Try to draw inferences from the results you got, rather than just restating the result itself.

Your final Python program should have the following functions:
  • load_data(...) – loads your data set
  • viz1(data,...) – first visualization of your data
  • viz2(data,...) – second visualization of your data
  • learn1(data,...) – first machine learning function
  • learn2(data,...) – second machine learning function

The ... in the function calls is there to make it clear that you can add as many (or as few) parameters to these functions as you need.
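One possible skeleton for that file is sketched below. Everything beyond the five function names is an assumption made for illustration: the CSV format, the choice to pass load_data an open file rather than a filename, the column parameter, and the tiny inline sample data.

```python
import csv
import io


def load_data(source):
    """Read CSV rows from an open file-like object into a list of dicts."""
    return list(csv.DictReader(source))


def viz1(data, column):
    """First visualization (for example, a histogram of one column)."""
    raise NotImplementedError("replace with your plotting code")


def viz2(data, column_x, column_y):
    """Second, different visualization (for example, a scatter plot)."""
    raise NotImplementedError("replace with your plotting code")


def learn1(data):
    """First machine learning approach (for example, a classifier)."""
    raise NotImplementedError("replace with your learning code")


def learn2(data):
    """Second machine learning approach (for example, clustering)."""
    raise NotImplementedError("replace with your learning code")


# Tiny made-up CSV held in memory so the sketch runs without a file.
sample = io.StringIO("id,value\na,1\nb,2\n")
data = load_data(sample)
print(len(data), data[0]["id"], data[0]["value"])  # 2 a 1
```

With a real file you would call something like load_data(open("mydata.csv", newline="")). Note that the csv module hands every value back as a string, so numeric columns need converting before you plot or learn from them.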

What to submit on Edmodo

  • An implementation of a sorting function in a Python file mysort.py.

  • A text file which:
    • describes the algorithm you implemented and where you found it (doesn’t have to be formal, a link to a website is fine).
    • shows you testing your sorting algorithm on some lists of integers and describes how good (efficient) you think this sorting algorithm is (and why). Don’t just say ‘it goes through the main loop n times’; explain to me why you think that happens. You don’t need a mathematical proof, just an argument in English, like we did in class.
  • A Python file containing functions to load your data, run 2 visualizations and 2 machine learning algorithms on it.

  • A text file describing the data set you chose, your motivation for choosing it, the results of your visualization and machine learning and your interpretation of the results.