Find Similarities: Intersection of Arrays in Python

Can you imagine being able to see the shared DNA between a car and a bicycle through their characteristics? 🧬 Let’s decipher together how to do it with Python and sets.

🔮 Problem Statement

We have a neural network trained for object recognition. Each time this network identifies an object, it returns an array of integers. Each integer represents the ID of a specific feature detected in the object. For example, [1, 5, 122] could mean “has tires (1)”, “has doors (5)”, and “is red (122)”.

The goal is to determine the similarities between two different objects analyzed by the neural network. To do this, we need a function that, given two arrays of features (one for each object), returns a new array containing only the features shared between both.

Parameters:

article1: int[n] - Array of integers representing the features of object 1.
article2: int[m] - Array of integers representing the features of object 2.

Return Value:

int[o]: Array of integers containing the features in common between article1 and article2. This array should be sorted.

Examples:

>>> sorted(get_in_common([2, 5, 9], [2, 7, 1])) == [2]
True
>>> sorted(get_in_common([2, 3, 4, 5], [5, 9, 2])) == [2, 5]
True
>>> sorted(get_in_common([1, 2, 3], [5, 6, 9])) == []
True

🧩 Step-by-Step Solution

To solve this problem, we will take advantage of the benefits of sets in Python, especially their efficiency in performing intersection operations.

Function Definition: We begin by defining the get_in_common function, which will take two arrays, article1 and article2, as input.
```
def get_in_common(article1, article2):
```
Conversion to Sets: The first crucial step is to convert both arrays into sets. This conversion removes duplicates within each list and, more importantly, allows us to use the set intersection operator (&), which is highly efficient for finding common elements.
```
article1 = set(article1)
article2 = set(article2)
```
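To make the effect of this conversion concrete, here is a small sketch with made-up feature IDs. The name `car_features` and its values are purely illustrative and do not come from the problem statement.

```python
# Hypothetical feature IDs; the repeated 5 stands for a feature reported twice.
car_features = [1, 5, 122, 5]

unique_features = set(car_features)

# Duplicates collapse into a single entry.
print(len(car_features))     # 4
print(len(unique_features))  # 3

# Membership tests on a set use hashing rather than a linear scan.
print(122 in unique_features)  # True
print(7 in unique_features)    # False
```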
Calculate the Intersection: We use the & operator to obtain the intersection of the two sets. The result is a new set containing only the elements that are present in both original sets.
```
return sorted(article1 & article2)
```
Finally, because sets are unordered and the result must be sorted, we pass the resulting set to sorted(), which returns a new list in ascending order and so meets the specified return format.
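As a quick illustration of the intersection step on its own, the sketch below uses two hypothetical feature sets (the IDs and their meanings in the comments are invented for the example). It also shows the equivalent intersection() method, which accepts any iterable.

```python
car = {1, 5, 122}        # hypothetical: tires, doors, red
bicycle = {1, 122, 300}  # hypothetical: tires, red, pedals

# The & operator requires both operands to be sets.
shared = car & bicycle
print(shared == {1, 122})  # True

# The method form accepts any iterable, so the argument may stay a list.
shared_from_list = car.intersection([1, 122, 300])
print(shared_from_list == shared)  # True

# Sets are unordered, so sort when a predictable order is needed.
print(sorted(shared))  # [1, 122]
```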

Complete Solution:

def get_in_common(article1, article2):
	"level: easy; points: 3"
	article1 = set(article1)
	article2 = set(article2)
	return sorted(article1 & article2)

🧠 Key Concepts

The key to the efficiency of this solution lies in the use of sets. Unlike lists, sets in Python are hash-based, so a membership test takes O(1) time on average instead of O(n). The set intersection operation (&) has an average time complexity of O(min(len(set1), len(set2))), because it iterates over the smaller set and looks up each element in the larger one. Building the two sets costs O(n + m), so the whole approach runs in O(n + m) time on average, plus O(k log k) if the k common features are then sorted. That is much faster than iterating over one list and checking for membership in the other, which has a worst-case complexity of O(n*m). Converting to sets also removes duplicates, which prevents the same feature identifier from appearing multiple times in the final result.
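To see the two approaches side by side, here is a rough sketch. The function names `get_in_common_naive` and `get_in_common_sets` are made up for this comparison and are not part of the original solution.

```python
def get_in_common_naive(article1, article2):
    """Baseline for comparison: list membership makes this O(n*m) in the worst case."""
    common = []
    for feature in article1:
        # `in` on a list scans it element by element.
        if feature in article2 and feature not in common:
            common.append(feature)
    return sorted(common)


def get_in_common_sets(article1, article2):
    """Set-based version: O(n + m) on average to build the sets and intersect them."""
    return sorted(set(article1) & set(article2))


first = [2, 3, 4, 5, 5]
second = [5, 9, 2]
print(get_in_common_naive(first, second))  # [2, 5]
print(get_in_common_sets(first, second))   # [2, 5]
```

Both versions return the same result; the difference only becomes noticeable as the arrays grow.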

💫 Final Thoughts

Although the presented solution is concise and efficient, there are some potential improvements. For example, we could add input validations to ensure that article1 and article2 are actually lists of integers. Also, if the order of the common features were relevant, we could modify the function to maintain that order (perhaps using a dictionary to track the original position of each feature). Did you know that the set implementation in CPython uses hash tables, which explains its excellent performance in search and membership operations? 🤯
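One possible way to combine both ideas is sketched below. The function name `get_in_common_ordered` is hypothetical, and the choice to follow the order of first appearance in article1 is an assumption, since the problem statement does not define which order would matter.

```python
def get_in_common_ordered(article1, article2):
    """Return shared features in the order they first appear in article1.

    Hypothetical variant, not part of the original solution.
    """
    for name, article in (("article1", article1), ("article2", article2)):
        # bool is a subclass of int, so it is excluded explicitly.
        if not all(isinstance(f, int) and not isinstance(f, bool) for f in article):
            raise TypeError(f"{name} must contain only integers")

    lookup = set(article2)
    # dict keys keep insertion order (Python 3.7+), so this also removes duplicates.
    return list(dict.fromkeys(f for f in article1 if f in lookup))


print(get_in_common_ordered([122, 5, 1, 5], [1, 5, 9, 122]))  # [122, 5, 1]
```

Here a dictionary is used only for its ordered, unique keys; the membership check still relies on a set, so the average cost stays at O(n + m).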

I hope this analysis has been useful and interesting to you. If you want to continue exploring the fascinating world of data structures and algorithms in Python, feel free to subscribe and explore more articles! Happy coding! ✨