Guide to Arrays in Python

An array is a structured way to store multiple items (like numbers, characters, or even other arrays) in a specific order, and you can quickly access, modify, or remove any item if you know its position (index).

In this guide, we'll give you a comprehensive overview of the array data structure. First, we'll look at what arrays are and what their main characteristics are. We'll then move on to Python, exploring how arrays are implemented, manipulated, and applied in real-world scenarios.

Understanding the Array Data Structure

Arrays are among the oldest and most fundamental data structures used in computer science and programming. Their simplicity, combined with their efficiency in certain operations, makes them a staple topic for anyone delving into the realm of data management and manipulation.

An array is a collection of items, typically of the same type, stored in contiguous memory locations.

This contiguous storage allows arrays to provide constant-time access to any element, given its index. Each item in an array is called an element, and the position of an element in the array is defined by its index, which usually starts from zero.

For instance, consider an array of integers: [10, 20, 30, 40, 50]. Here, the element 10 has an index of 0, the element 20 has an index of 1, and so on.

There are multiple advantages of using arrays to store our data. For example, due to their memory layout, arrays allow for O(1) (constant) time complexity when accessing an element by its index. This is particularly beneficial when we need random access to elements. Additionally, arrays are stored in contiguous memory locations, which can lead to better cache locality and overall performance improvements in certain operations. Another notable advantage is predictability: because a traditional array has a fixed size once declared, its memory footprint is known up front, which makes memory easier to manage.
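
To make the constant-time claim more concrete, here is a minimal sketch of the address arithmetic behind an index lookup. It uses Python's array module purely for illustration; the helper element_address() is a hypothetical name introduced for this example, not part of any library.

```python
from array import array

# A small array of C ints, matching the example values used above.
numbers = array('i', [10, 20, 30, 40, 50])

# buffer_info() returns (memory address of the first element, number of elements).
base_address, length = numbers.buffer_info()


def element_address(index):
    """Address of the element at `index`: base + index * size of one element."""
    return base_address + index * numbers.itemsize


# A lookup costs one multiplication and one addition, whatever the array length.
offset_of_second = element_address(1) - base_address
print(offset_of_second == numbers.itemsize)  # True
print(numbers[1])                            # 20
```

Because every element has the same size and the elements sit next to each other, the position of any element can be computed directly instead of being searched for.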

Note: Arrays are especially useful in scenarios where the size of the collection is known in advance and remains constant, or where random access is more frequent than insertions and deletions.

On the other hand, arrays come with their own set of limitations. One of the primary limitations of traditional arrays is their fixed size. Once an array is created, its size cannot be changed. This can lead to wasted memory (if the array is too large) or to costly resizing, which means allocating a new array and copying every element over (if the array is too small). Besides that, inserting or deleting an element in the middle of an array requires shifting all the elements that follow it, leading to O(n) time complexity for these operations.
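
The cost of inserting into the middle of an array can be sketched as follows. This is an illustrative model only: a Python list stands in for a fixed-capacity block of memory, and insert_with_shift() is a hypothetical helper written for this example.

```python
def insert_with_shift(buffer, count, index, value):
    """Insert `value` at `index` in a fixed-capacity buffer holding `count` items.

    Returns the new item count. Raises if the buffer is already full.
    """
    if count == len(buffer):
        raise OverflowError("buffer is full; a fixed-size array cannot grow")
    if not 0 <= index <= count:
        raise IndexError("index out of range")
    # Shift every element from `index` onward one slot to the right,
    # starting from the end so nothing is overwritten.
    for i in range(count, index, -1):
        buffer[i] = buffer[i - 1]
    buffer[index] = value
    return count + 1


# Capacity of 6 with 5 slots in use; None marks the unused slot.
slots = [10, 20, 30, 40, 50, None]
used = insert_with_shift(slots, 5, 1, 15)
print(slots)  # [10, 15, 20, 30, 40, 50]
print(used)   # 6
```

Inserting at index 1 moved four elements to make room. In the worst case (inserting at index 0), every element has to move, which is where the O(n) cost comes from.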

To sum this all up, let's illustrate the main characteristics of arrays using a song playlist as an analogy. An array is a data structure that:

Is Indexed: Just like each song on your playlist has a number (1, 2, 3, ...), each element in an array has an index. But, in most programming languages, the index starts at 0. So, the first item is at index 0, the second at index 1, and so on.
Has Fixed Size: When you create a playlist for, say, 10 songs, you can't add an 11th song without removing one first. Similarly, arrays have a fixed size. Once you create an array of a certain size, you can't add more items than its capacity.
Is Homogeneous: All songs in your playlist are music tracks. Similarly, all elements in an array are of the same type. If you have an array of integers, you can't suddenly store a text string in it.
Has Direct Access: If you want to listen to the 7th song in your playlist, you can jump directly to it. Similarly, with arrays, you can instantly access any element if you know its index.
Uses Contiguous Memory: This is a bit more technical. When an array is created in a computer's memory, it occupies a contiguous block of memory. Think of it like a row of adjacent lockers in school. Each locker is next to the other, with no gaps in between.

Python and Arrays

Python, known for its flexibility and ease of use, offers multiple ways to work with arrays. While Python does not have a native array data structure like some other languages, it provides powerful alternatives that can function similarly and even offer extended capabilities.

At first glance, Python's list might seem synonymous with an array, but there are subtle differences and nuances to consider:

List	Array (array module)

A built-in Python data structure	Not a built-in type - it comes from the `array` module in the standard library
Dynamic size	Also resizable (append(), extend()); NumPy arrays, by contrast, have a fixed size once created
Can hold items of different data types	Holds items of the same type
Provides a range of built-in methods for manipulation	Needs the array module to be imported first
O(1) time complexity for access operations	O(1) time complexity for access operations
Consumes more memory	More memory efficient

Looking at this table, the natural question is "When should I use which?". If you need a general-purpose collection that can grow or shrink dynamically and hold mixed data types, Python's list is the way to go. However, for scenarios requiring a more memory-efficient collection with elements of the same type, you might consider using Python's array module or external libraries like NumPy.

The array Module in Python

When most developers think of arrays in Python, they often default to thinking about lists. However, Python offers a more specialized array structure through the array module in its standard library. This module provides space-efficient storage of basic C-style data types in Python.

While Python lists are incredibly versatile and can store any type of object, they can sometimes be overkill, especially when you only need to store a collection of basic data types, like integers or floats. The array module provides a way to create arrays that are more memory efficient than lists for specific data types.
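
A rough way to see the difference in memory use is to compare the two with sys.getsizeof(). The sketch below is only an estimate: exact numbers depend on the platform and Python version, and the variable names are chosen for illustration.

```python
import sys
from array import array

values = list(range(1000, 2000))
as_array = array('i', values)

# A list stores pointers to separate int objects, so count both the
# pointer table and the int objects themselves (a rough estimate).
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# An array stores the raw C values inline in a single buffer.
array_bytes = sys.getsizeof(as_array)

print(f"list:  ~{list_bytes} bytes")
print(f"array: ~{array_bytes} bytes")
```

On a typical 64-bit CPython build, the array should come out several times smaller, because each element takes only a few bytes instead of a pointer plus a full Python object.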

Creating an Array

To use the array module, you first need to import it:

from array import array

Once imported, you can create an array using the array() constructor:

arr = array('i', [1, 2, 3, 4, 5])
print(arr)

Here, the 'i' argument indicates that the array will store signed integers. There are several other type codes available, such as 'f' for floats and 'd' for doubles.
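
As a small sketch of how type codes affect storage, the example below creates one array per type code mentioned above and inspects its typecode and itemsize attributes. The variable names are illustrative, and the exact sizes are platform-dependent, so no specific byte counts are assumed here.

```python
from array import array

ints = array('i', [1, 2, 3])       # signed C int
floats = array('f', [1.5, 2.5])    # single-precision C float
doubles = array('d', [1.5, 2.5])   # double-precision C double

for example in (ints, floats, doubles):
    # typecode is the code passed to the constructor;
    # itemsize is the number of bytes used per element.
    print(example.typecode, example.itemsize)
```

Note that 'f' stores single-precision values, so numbers that are not exactly representable (0.1, for instance) are stored with less precision than they would be under 'd'.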

Accessing and Modifying Elements

You can access and modify elements in an array just like you would with a list:

print(arr[2])

And now, let's modify the element by changing its value to 6:

arr[2] = 6
print(arr)

Array Methods

The array module provides several methods to manipulate arrays:

append(): Adds an element to the end of the array:
arr.append(7)
print(arr)

extend(): Appends the elements of an iterable to the end:
arr.extend([8, 9])
print(arr)

pop(): Removes and returns the element at the given position:
arr.pop(2)
print(arr)

remove(): Removes the first occurrence of the specified value:
arr.remove(2)
print(arr)

reverse(): Reverses the order of the array:
arr.reverse()
print(arr)

Note: There are more methods than we listed here. Refer to the official Python documentation to see a list of all available methods in the array module.
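
As a brief illustration of a few of those other methods, here is a sketch using insert(), index(), count(), and tolist(). The scores array is a made-up example.

```python
from array import array

scores = array('i', [10, 20, 30, 20])

scores.insert(1, 15)     # insert 15 before index 1
print(scores)            # array('i', [10, 15, 20, 30, 20])
print(scores.index(20))  # 2 (position of the first 20)
print(scores.count(20))  # 2 (number of occurrences of 20)
print(scores.tolist())   # [10, 15, 20, 30, 20]
```

The tolist() method is handy when you need to pass the data to code that expects a regular Python list.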

While the array module offers a more memory-efficient way to store basic data types, it's essential to remember its limitations. Unlike lists, arrays are homogeneous. This means all elements in the array must be of the same type. Also, you can only store basic C-style data types in arrays. If you need to store custom objects or other Python types, you'll need to use a list or another data structure.
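
The homogeneity constraint is easy to observe in practice. In this sketch, an integer array rejects a string and a float with a TypeError; the exact error message varies between Python versions, so it is printed rather than assumed.

```python
from array import array

int_array = array('i', [1, 2, 3])

rejected = []
for bad_value in ("four", 4.5):
    try:
        int_array.append(bad_value)
    except TypeError as error:
        # The array refuses anything that is not an integer.
        rejected.append(bad_value)
        print(f"Rejected {bad_value!r}: {error}")

print(int_array)  # array('i', [1, 2, 3])
```

The array is left unchanged after the failed appends, so a rejected value never ends up partially stored.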

NumPy Arrays

NumPy, short for Numerical Python, is a foundational package for numerical computations in Python. One of its primary features is its powerful N-dimensional array object, which offers fast operations on arrays, including mathematical, logical, shape manipulation, and more.

NumPy arrays are more versatile than Python's built-in array module and are a staple in data science and machine learning projects.

Why Use NumPy Arrays?

The first thing that comes to mind is performance. NumPy arrays are implemented in C and allow for efficient memory storage and faster operations due to optimized algorithms and the benefits of contiguous memory storage.

While Python's built-in arrays are one-dimensional, NumPy arrays can be multi-dimensional, making them ideal for representing matrices or tensors.

Finally, NumPy provides a vast array of functions to operate on these arrays, from basic arithmetic to advanced mathematical operations, reshaping, splitting, and more.
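
As a small taste of these operations, the sketch below applies arithmetic to a whole array at once, without writing a loop. The prices array and the variable names are made up for illustration.

```python
import numpy as np

prices = np.array([10.0, 20.0, 30.0, 40.0])

doubled = prices * 2       # element-wise multiplication
discounted = prices - 5    # the scalar is applied to every element
total = prices.sum()       # 100.0
average = prices.mean()    # 25.0

# A boolean mask selects only the elements that satisfy the condition.
above_average = prices[prices > average]

print(doubled.tolist())        # [20.0, 40.0, 60.0, 80.0]
print(discounted.tolist())     # [5.0, 15.0, 25.0, 35.0]
print(above_average.tolist())  # [30.0, 40.0]
```

Operations like these run in compiled code over the whole buffer, which is a large part of why NumPy tends to be much faster than an equivalent Python loop.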

Note: When you know the size of the data in advance, pre-allocating memory for arrays (especially in NumPy) can lead to performance improvements.
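
One common way to pre-allocate is np.zeros(), which creates an array of a given size filled with zeros that can then be filled in place. The following is a minimal sketch; the size and the values are arbitrary.

```python
import numpy as np

n = 5

# Allocate all n slots up front instead of growing the array step by step.
squares = np.zeros(n, dtype=np.int64)

for i in range(n):
    squares[i] = i * i  # fill the pre-allocated slots in place

print(squares.tolist())  # [0, 1, 4, 9, 16]
```

The alternative of repeatedly appending to a NumPy array creates a brand-new array on every step, so the cost grows quickly with the number of elements.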

Creating a NumPy Array

To use NumPy, you first need to install it (pip install numpy) and then import it:

import numpy as np

Once imported, you can create a NumPy array using the array() function:

arr = np.array([1, 2, 3, 4, 5])
print(arr)

You can also create multi-dimensional arrays:

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(matrix)

This will give us:

[[1 2 3]
 [4 5 6]
 [7 8 9]]

Beyond these basic approaches, NumPy provides other convenient ways to create arrays. One of them is the arange() function, which creates an array of regularly incrementing values:

arr = np.arange(10)
print(arr)

Another is the linspace() function, which creates an array with a specified number of elements, spaced equally between the specified start and end values:

even_space = np.linspace(0, 1, 5)
print(even_space)
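
To show how the two functions differ, here is a short sketch that passes an explicit start, stop, and step to arange() and compares it with linspace(). The variable names are illustrative.

```python
import numpy as np

# arange(start, stop, step): the stop value is excluded.
evens = np.arange(2, 10, 2)
print(evens.tolist())     # [2, 4, 6, 8]

# linspace(start, stop, num): the stop value is included by default.
quarters = np.linspace(0, 1, 5)
print(quarters.tolist())  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

As a rule of thumb, arange() is convenient when you know the step size, while linspace() is convenient when you know how many points you need.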

Accessing and Modifying Elements

Accessing and modifying elements in a NumPy array is intuitive:

print(arr[2])
arr[2] = 6
print(arr)

The same works for multi-dimensional arrays, using one index per dimension:

print(matrix[1, 2])
matrix[1, 2] = 10
print(matrix)

This changes the value of the element in the second row (index 1) and the third column (index 2) to 10:

[[ 1  2  3]
 [ 4  5 10]
 [ 7  8  9]]
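
Indexing also extends to whole rows, columns, and sub-blocks through slicing. The sketch below uses a fresh 3x3 array named grid (an illustrative name) so that it can be run on its own.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

first_row = grid[0]         # the whole first row
second_column = grid[:, 1]  # every row, column index 1
top_right = grid[0:2, 1:3]  # rows 0-1, columns 1-2

print(first_row.tolist())      # [1, 2, 3]
print(second_column.tolist())  # [2, 5, 8]
print(top_right.tolist())      # [[2, 3], [5, 6]]
```

Keep in mind that NumPy slices are views on the original data rather than copies, so assigning to a slice modifies the original array as well.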

Changing the Shape of an Array

NumPy offers many functions and methods to manipulate and operate on arrays. For example, you can use the reshape() method to change the shape of an array. Say we have a simple array:

import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
print("Original Array:")
print(arr)

And we want to reshape it to a 3x4 matrix. All you need to do is use the reshape() method with desired dimensions passed as arguments:

reshaped_arr = arr.reshape(3, 4)
print("Reshaped Array (3x4):")
print(reshaped_arr)

This will result in:

Reshaped Array (3x4):
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
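
Two details of reshape() are worth a quick sketch: passing -1 lets NumPy infer one dimension from the others, and asking for a shape that does not match the number of elements raises a ValueError. The variable names below are illustrative.

```python
import numpy as np

values = np.arange(1, 13)  # the 12 values 1..12

# -1 tells NumPy to work out this dimension: 12 elements / 2 rows = 6 columns.
two_rows = values.reshape(2, -1)
print(two_rows.shape)  # (2, 6)

# 12 elements cannot fill a 5x5 matrix, so this raises ValueError.
try:
    values.reshape(5, 5)
    reshape_failed = False
except ValueError as error:
    reshape_failed = True
    print(f"Cannot reshape: {error}")
```

The total number of elements must always stay the same; reshape() only changes how they are arranged.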

Matrix Multiplication

The numpy.dot() function is used for matrix multiplication. It returns the dot product of two arrays. For one-dimensional arrays, it is the inner product of the arrays. For 2-dimensional arrays, it is equivalent to matrix multiplication, and for N-D, it is a sum product over the last axis of the first array and the second-to-last axis of the second array.

Let's see how it works. First, let's compute the dot product of two 1-D arrays (the inner product of the vectors):

import numpy as np

vec1 = np.array([1, 2, 3])
vec2 = np.array([4, 5, 6])
dot_product_1d = np.dot(vec1, vec2)
print("Dot product of two 1-D arrays:")
print(dot_product_1d)

This will result in:

Dot product of two 1-D arrays:
32

32 is, in fact, the inner product of the two arrays - (1*4 + 2*5 + 3*6). Next, we can perform matrix multiplication of two 2-D arrays:

mat1 = np.array([[1, 2], [3, 4]])
mat2 = np.array([[2, 0], [1, 3]])
matrix_product = np.dot(mat1, mat2)
print("Matrix multiplication of two 2-D arrays:")
print(matrix_product)

Which will give us:

Matrix multiplication of two 2-D arrays:
[[ 4  6]
 [10 12]]
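
For 2-D arrays, the same product can also be written with the @ operator, and it is worth contrasting it with the * operator, which multiplies element by element. The sketch below reuses the same matrix values under the illustrative names a and b.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[2, 0], [1, 3]])

# @ performs matrix multiplication, matching np.dot() for 2-D arrays.
matrix_product = a @ b
print(matrix_product.tolist())  # [[4, 6], [10, 12]]

# * multiplies matching positions instead: 1*2, 2*0, 3*1, 4*3.
elementwise = a * b
print(elementwise.tolist())     # [[2, 0], [3, 12]]
```

Mixing up the two operators is a common source of bugs, since both accept same-shaped arrays and return a result of the same shape.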

NumPy arrays are a significant step up from Python's built-in lists and the array module, especially for scientific and mathematical computations. Their efficiency, combined with the rich functionality provided by the NumPy library, makes them an indispensable tool for anyone looking to do numerical operations in Python.