Performance Analysis of Sorting Algorithms with Large Datasets

Abstract

This lab report analyzes the performance of the Quick Sort, Bubble Sort and Bucket Sort algorithms to determine which of these is the most efficient algorithm when it comes to sorting large data sets. The method used to conduct this experiment is feeding each algorithm large data sets of arrays/lists that contain 1,000 integers and keeping record of how long it takes for each algorithm to sort the arrays. According to all the data gathered at the end of the procedure, it was determined that among these three algorithms, the Quick Sort algorithm is the most efficient at sorting large amounts of data.

Introduction

An algorithm is a finite set of instructions that accomplish a particular ask. All algorithms must have an input, provide an output, be definite, finite, and effective. (Horowitz et al., 2008, p. 1). Sorting is the process of placing elements from a collection in some kind of order (Miller & Ranum. 2006, p. 207). A sorting algorithm is a set of definite instructions that takes in a collection of data and sorts the data in a particular order, either ascending or descending order and outputs the sorted collection of data. Sorting algorithms are important to the computer science field as they make it easier for data engineers to manage and understand their data. In general, there are lots of different algorithms available, but for this experiment we will only be focusing on analyzing the performance of three: Quick Sort, Bubble Sort and Bucket Sort. To see which of these algorithms performs the best we must look at two important factors: time complexity and space complexity. The space complexity of an algorithm is the amount of memory it needs to run to completion. The time complexity of an algorithm is the amount of computer time it needs to run to completion (Horowitz et al., 2008, p. 15).

Bubble Sort

Runs at O(n2) time complexity and has O(1) space complexity. (Miller & Ranum. 2006, p. 210).

Bubble Sort compares adjacent elements and exchanges the ones that are out of order. It makes multiple passes through the collection of data and each time it passes it places the next largest value in its proper place. (Miller & Ranum. 2006, p. 207).

This the Bubble Sort in Python:

def bubbleSort (alist):
	for passnum in range ( len (alist) – 1, 0, -1):
		for i in range (passnum):
			 if alist[i] > alist[i+1]:
				temp = alist[i]
				alist[i] = alist[i+1]
				alist[i+1] = temp

(Miller & Ranum. 2006, p. 209)

Bucket Sort

Runs at O(n+k) time complexity and has O(n) space complexity. (Aadhith et al., 2024, p. 1035)

Bucket Sort distributes the element inputs into a set of n buckets, where each bucket corresponds to a specific range of values. (Aadhith et al., 2024, p. 1034)

This is the Bucket Sort Algorithm in pseudo code:

sort (A)
	create n buckets B
	for i = 0 to n-1 do:
		k = hash (A[i])
		add A[i] to the kth bucket B[k]
		extract (B, A)
end 

extract (B, A)
	idx = 0
	for i = 0 to n-1 do:
		insertion sort (B)
		for m = 1 to size(B[i]) do:
			A[idx++] = mth element of B[i]
end

(Heineman et al., 2008, p. 348)

Quick Sort

Runs at O(nlog(n)) time complexity and has O(log(n)) space complexity. (Miller & Ranum. 2006, p. 226).

Quick sort uses divide and conquer. It selects a value, which is called pivot value, its role is to assist with splitting the collection. The actual position where the pivot belongs in the final sorted element collection is used to divide the collection for subsequent calls to the Quick Sort (Miller & Ranum. 2006, p. 222).

This is the Quick Sort in Python:

def quicksort (alist):
	quickSortHelper (alist, 0, len(alist) -1)

def quickSortHelper (alist, first, last):
	if first < last:
		splitpoint = partition (alist, first, last)
		quickSortHelper (alist, first, splitpoint – 1)
		quickSortHelper (alist, splitpoint + 1 last)

def partition (alist, first, last)
	pivotvalue = alist[first] 
	leftmark = first + 1
	rightmark = last
	done = False
	
	while not done:
		while leftmark <= rightmark and alist[leftmark] <= pivotvalue:
			leftmark = leftmark + 1 
		while alist[rightmark] >= pivotvalue and rightmark >= leftmark:
			rightmark = rightmark – 1 
		if rightmark < rightmark:
			done = True
		else:
			temp = alist[leftmark]
			alist[leftmark] = alist[rightmark]
			alist[rightmark] = temp

	temp = alist[first]
	alist[first] = alist[rightmark]
	alist[rightmark] = temp
return rightmark 

(Miller & Ranum. 2006, p. 225).

Based on the time and space complexity of each algorithm, Quick Sort will be the most efficient at sorting large data sets between the three.

Materials and Methods

Materials: 

  • Computer with the following algorithms programmed:
    • Quick Sort
    • Bubble Sort
    • Bucket Sort
  • 10 sets of arrays/lists with 1,000 integers each
  • Time module in the same file of each algorithm

Method:

For this procedure we will be repeating the following steps for each algorithm:

  1. Take time with time module and save as start_time
  2. Execute the algorithm with a data set
  3. Take time with time module and save as end_time
  4. Subtract the start_time from end_time (this is the total execution time)
  5. Record execution time in a data table
  6. Repeat for each data set (10 data sets total)

Bubble Sort

DATA SETDATA SIZE (INT)EXECUTION TIME (MS)MEMORY USAGE (KB)
110001,922.2650.328
210001,867.0940.328
310001,908.670.328
410001,880.330.328
510001,891.5910.328
610001,944.7470.328
710001,937.20.328
810001,903.1050.328
910001,928.070.328
1010001,905.9520.328

Average Execution Time: 1908.902 MS

Average Memory Usage: 0.328 KB

Bucket Sort

DATA SETDATA SIZE (INT)EXECUTION TIME (MS)MEMORY USAGE (KB)
1100011.53298.88
2100010.88895.359
310009.3495.291
410009.23795.617
510008.87694.68
610008.99594.898
710008.91395.211
810008.70594.773
910009.02595.367
1010008.81194.711

Average Execution Time: 9.432 MS

Average Memory Usage: 95.478 KB

Quick Sort

DATA SETDATA SIZE (INT)EXECUTION TIME (MS)MEMORY USAGE (KB)
1100017.84840.719
2100015.38552.573
3100014.37144.344
4100014.09642.656
5100016.08947.297
6100014.23348.547
7100014.31342.969
8100014.0845.312
9100014.9146.883
10100014.7952.766

Average Execution Time: 15.011 MS

Average Memory Usage: 46.406 KB

Discussion

When analyzing the results of the experiment, Quick Sort is the most efficient algorithm at sorting large sets of data. Bubble Sort’s average execution time was the slowest among the three, but its average memory usage was the lowest which is great. On the other hand, when looking at Bucket Sorts results, it had the fastest average execution time but the greatest average memory usage. When finally looking at Quick Sorts results, it didn’t have the fastest average execution time or the lowest average memory usage. Its average time was faster than that of Bubble Sort and its average memory usage was lower than of Bucket Sort, making it more efficient in speed and in memory.

Conclusion

In this lab report we were able to analyze the performance of three sorting algorithms, Bubble Sort, Bucket Sort and Quick Sort. The results of the experiment showed that although Quick Sort wasn’t the fastest or wasn’t the best at space saving, it was the most efficient when dealing with large data sets.

References

Horowitz, E., Sahni, S., & Rajasekaran, S. (2008). Fundamentals of Computer Algorithms (2nd ed.). Galgotia Publications.

Miller & Ranum. (2006). Problem Solving with Algorithms and Data Structures using

Python (2nd ed.). Franklin, Beedle & Associates Incorporated.

Aadhith & Rajinikanth (2024). Enhancing Bucket Sort Efficiency: A Combinatorial

Approach to Algorithm Optimization. International Journal for Research in Applied Science & Engineering Technology.

Heineman, G., Pollice, G., & Selkow, S. (2008) Algorithms in a Nutshell. O’Rielly Media Inc.

Hugging Face Co. Sorting Algorithm Performance Metrics. https://huggingface.co/datasets/ismielabir/Sorting-Algorithms-Performance-Metrics/viewer/default