Binary Search, Selection Sort, and Merge Sort

Since I spend most of my day building websites, I don't often get the chance to test my algorithm skills and figure out the most efficient sorting or searching methods. I usually reach for the generic one in whatever library I'm using, but as always, CS50 is challenging me to take a deeper look at assumptions I have, so here goes!

I got to learn all about the different sorting methods (bubble, insertion, selection, merge, etc.) and also what the heck Big O notation is (hint: it's a computer science way of describing the run-time of your algorithm).

They gave us a program, find.c, which basically asked for input from the user to create a "haystack", the haystack gets sorted so that you can do an efficient search, and then you search for a number, i.e. "needle", and print whether the number was found or not. So it was our assignment to implement the sort and search functions.

Selection Sort

Like I said, in order to do an efficient search, it helps to know that the list is already sorted for you. So the first thing I did was attempt to get a working version with a selection sort algorithm:

/**
 * Sorts array of n values.
 */
void sort(int values[], int n)
{
    // selection sorting algorithm
    for (int i = 0; i < n - 1; i++)
    {
        // Set minimum
        int min = i;
        
        for (int j = i; j < n; j++) 
        {
            // Find index of minimum value
            if (values[min] > values[j])
            {
                min = j;
            }
        }
        
        // Check if min changed
        if (min != i)
        {
            // swap i with lowest value
            int temp = values[min];
            values[min] = values[i];
            values[i] = temp;
        }
    }
}

Basically you go through the list one element at a time. You start with the first item, set it as your minimum, and then go through the entire list, see if anything is lower, and if you find one that is, swap their places.

The problem with this method is you iterate over the array one time for every element, no matter what, so you're stuck with O(n^2) as your best and worst-case scenario.

Merge Sort

I went ahead and did the hacker version of the pset which called for me to figure out how to implement merge sort, so here goes:

/**
 * Sorts array of n values.
 */
void sort(int values[], int n)
{
    // If 1 or less elements stop
    if (n < 2)
    {
        return;
    }
    
    // Set middle value, and left and right arrays
    int middle = n / 2;
    int left[middle];
    int right[n - middle];
        
    // Split values into two arrays
    for (int i = 0; i < n; i++)
    {
        // If less than middle, put in left, otherwise right
        if (i < middle)
        {
            left[i] = values[i];
        }
        else
        {
            right[i - middle] = values[i];
        }
    }
        
    // Sort left and right arrays
    sort(left, middle);
    sort(right, n - middle);
    
    // Merge sorted arrays
    merge(values, left, right, middle, n);
}

Since this is a recursive strategy, we start the function with a base case: if you have reached a single element, return that element. Otherwise divide your current list in half. I initialize two temporary arrays here to store the values in. Then you call the same function again to sort the left half and the right half, and finally merge those two halves.

Which means I needed to implement another function that would merge the halves back together:

/**
 * Merges two arrays back into parent array
 */
void merge(int values[], int left[], int right[], int middle, int total)
{
    int left_size = middle, right_size = total - middle;
    int i = 0, j = 0;
    
    // Loop through values array
    for (int k = 0; k < total; k++)
    {
        // If elements left on left and either no elements on right 
        // or left is larger than right, then use next left value
        if (i < left_size && (j >= right_size || left[i] <= right[j]))
        {
            values[k] = left[i];
            i++;
        }
        // else use next right value
        else
        {
            values[k] = right[j];
            j++;
        }
    }
}

We know how many elements are in the original array, so we set up a loop to iterate over all items in the full array. We have i and j initialized to be our placeholders for each of the two smaller, sorted arrays so we can keep track of our current position.

We then go look at the current position of the left and right half, see which value is smaller, store that value in the original array, and move our placeholder for that array forward. Eventually we have all of the elements back in the original array, all sorted.

Thanks to recursion, this works it's way all the way back to the original array we started with, and when it's all done, we have a sorted haystack! And while merge sort requires more resources for the extra arrays, it performs the sort much quicker, O(n log n). Since it's logarithmic, we can increase the size of the array without adding much time to the sort.

Now to search!

Binary Search

Now that our array is sorted, we can make some assumptions and do a binary search.

/**
 * Returns true if value is in array of n values, else false.
 */
bool search(int value, int values[], int n)
{
    // Ensure value is positive
    if (value < 0)
    {
        return false;
    }
    
    // Set initial min and max values
    int min = 0, max = n - 1;
    
    // While there are still elements in the list
    while (n > 0)
    {
        // Set middle to half of difference between min and max
        int middle = (max - min) / 2 + min;
        
        // If value is middle value, return true
        if (values[middle] == value)
        {
            return true;
        }
        // If value is less than middle, set max to one less than middle
        else if (values[middle] > value)
        {
            max = middle - 1;
        }
        
        // If value is greater than middle, set min to one greater than middle
        else if (values[middle] < value)
        {
            min = middle + 1;
        }
        
        // Set new number of elements to those between min and max
        n = max - min + 1;
    }
    
    // If number is not found, return false
    return false;
}

We set up a while loop to make sure we still have items to search through. We then look at the middle element. If it's what we are looking for, great! If not, we then look either to the left or right based on whether the number we are looking for is greater or less than the middle value. We throw out any elements that we know aren't our number and continue on.

This is another O(log n) algorithm, so with merge sort and binary search used together we can find our needle in a haystack quite quickly!

This was definitely a good exercise to think deeper about how computers deal with data, and will definitely make me think twice about how I deal with large data sets in the future!

If you want to see the entire project and all the files involved, you can check it out on Github