Implementing deep_diff: A Line-by-Line Python Tutorial for Dictionary Comparison



Mastering Dictionary Differences: A Practical Guide to deep_diff

Comparing complex data structures programmatically is a useful skill for any developer. The deep_diff function identifies the differences between two dictionaries, including nested ones. This practical lesson walks through a Python implementation of deep_diff, explaining each line of code and demonstrating its execution.

The deep_diff Function: Source Code

Here is the Python function we will be dissecting:

def deep_diff(dict1, dict2):
    diff = {}
    for key in dict1.keys() | dict2.keys():
        if key in dict1 and key not in dict2:
            diff[key] = {'type': 'removed', 'value': dict1[key]}
        elif key in dict2 and key not in dict1:
            diff[key] = {'type': 'added', 'value': dict2[key]}
        elif dict1[key] != dict2[key]:
            if isinstance(dict1[key], dict) and isinstance(dict2[key], dict):
                nested_diff = deep_diff(dict1[key], dict2[key])
                if nested_diff: diff[key] = nested_diff
            else:
                diff[key] = {'type': 'modified', 'old': dict1[key], 'new': dict2[key]}
    return diff

Line-by-Line Code Breakdown

Let’s break down this function step by step to understand its logic:

def deep_diff(dict1, dict2):

This line defines our function, deep_diff, which accepts two arguments: dict1 and dict2. These are the two dictionaries we intend to compare.

diff = {}

We initialize an empty dictionary called diff. This dictionary will store all the identified differences between dict1 and dict2. Each key in diff will correspond to a key from the input dictionaries that has changed or is unique.

for key in dict1.keys() | dict2.keys():

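This is a crucial line. dict1.keys() and dict2.keys() return set-like views, so the | operator (set union) produces a set of every key present in either dictionary. Iterating over that set lets us detect additions, removals, and modifications in one pass. Because sets are unordered, the order in which keys are visited, and therefore the order of entries in diff, is not guaranteed.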
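As a small illustration of the key union described above (the sample dictionaries are made up for this sketch), the snippet below shows that the result is a plain set. Wrapping it in sorted() is one way to get a deterministic order, assuming the keys can be compared with each other.

```python
dict1 = {'name': 'Alice', 'age': 30}
dict2 = {'name': 'Bob', 'occupation': 'Engineer'}

# Key views are set-like, so | returns a plain set containing every key once.
all_keys = dict1.keys() | dict2.keys()

print(type(all_keys).__name__)  # set
print(sorted(all_keys))         # ['age', 'name', 'occupation']
```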

if key in dict1 and key not in dict2:

This condition checks whether a key exists in dict1 but not in dict2. If so, the key (and its value) is missing from dict2. Treating dict1 as the old version and dict2 as the new one, the key has been removed. The diff dictionary records this as a 'removed' change and stores the value from dict1.

            diff[key] = {'type': 'removed', 'value': dict1[key]}

elif key in dict2 and key not in dict1:

Conversely, this condition checks if a key exists in dict2 but is not present in dict1. If true, it means the key (and its value) has been added to dict2. The diff dictionary records this as an 'added' type change, storing the value from dict2.

            diff[key] = {'type': 'added', 'value': dict2[key]}
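The 'removed' and 'added' branches can also be read as set differences on the key views. The sketch below is only an alternative way to look at the same checks, not how the tutorial's function is written, and the sample data is invented for illustration.

```python
dict1 = {'name': 'Alice', 'city': 'New York'}
dict2 = {'name': 'Bob', 'occupation': 'Engineer'}

removed_keys = dict1.keys() - dict2.keys()  # keys only in dict1
added_keys = dict2.keys() - dict1.keys()    # keys only in dict2
common_keys = dict1.keys() & dict2.keys()   # keys in both, candidates for 'modified'

print(removed_keys)  # {'city'}
print(added_keys)    # {'occupation'}
print(common_keys)   # {'name'}
```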

elif dict1[key] != dict2[key]:

If the key exists in both dictionaries (neither added nor removed), this condition checks if their corresponding values are not equal. This signifies a potential modification. Python’s != operator performs a value comparison.
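Because the check relies on !=, it inherits Python's equality rules. The short sketch below lists a few cases worth knowing about. The labels are just for this example.

```python
# Each entry is True when != would make deep_diff report a change.
checks = {
    'int vs equal float': 1 != 1.0,          # equal values, different types
    'bool vs int': True != 1,                # bool is a subclass of int
    'list order': [1, 2] != [2, 1],          # lists compare element by element, in order
    'dict key order': {'a': 1, 'b': 2} != {'b': 2, 'a': 1},  # dicts ignore key order
    'nan vs nan': float('nan') != float('nan'),  # NaN never equals NaN
}

for label, differs in checks.items():
    print(f"{label}: {differs}")
# int vs equal float: False
# bool vs int: False
# list order: True
# dict key order: False
# nan vs nan: True
```

In practice this means a value that changes type but stays equal (1 becoming 1.0) will not appear in the diff, while two separately created NaN values will.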

if isinstance(dict1[key], dict) and isinstance(dict2[key], dict):

Inside the ‘modified’ branch, this nested condition is vital for deep comparison. It checks if both values for the current key are themselves dictionaries. If they are, it means we need to recursively compare these nested dictionaries.

                nested_diff = deep_diff(dict1[key], dict2[key])

Here, the function calls itself (recursion) with the nested dictionaries. This allows the comparison to delve into arbitrary levels of depth.

                if nested_diff: diff[key] = nested_diff

If the recursive call to deep_diff returns a non-empty nested_diff (meaning there were differences within the nested dictionaries), we add this nested_diff to our main diff dictionary under the current key.

else:
    diff[key] = {'type': 'modified', 'old': dict1[key], 'new': dict2[key]}

If the values for the current key are different but are not both dictionaries (e.g., they are integers, strings, lists, or a mix), then it’s a simple modification. The diff dictionary records this as a 'modified' type, storing both the 'old' value from dict1 and the 'new' value from dict2.

return diff

Finally, after iterating through all unique keys and identifying all differences, the function returns the accumulated diff dictionary.
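To see the branches working together, here is a short sketch that exercises three cases: identical nested dictionaries, a change deep inside a nested dictionary, and a dictionary replaced by a non-dictionary. The function is repeated so the snippet runs on its own, and the inputs are made up for illustration.

```python
def deep_diff(dict1, dict2):
    # Same logic as the tutorial's function, repeated so this snippet is self-contained.
    diff = {}
    for key in dict1.keys() | dict2.keys():
        if key in dict1 and key not in dict2:
            diff[key] = {'type': 'removed', 'value': dict1[key]}
        elif key in dict2 and key not in dict1:
            diff[key] = {'type': 'added', 'value': dict2[key]}
        elif dict1[key] != dict2[key]:
            if isinstance(dict1[key], dict) and isinstance(dict2[key], dict):
                nested_diff = deep_diff(dict1[key], dict2[key])
                if nested_diff:
                    diff[key] = nested_diff
            else:
                diff[key] = {'type': 'modified', 'old': dict1[key], 'new': dict2[key]}
    return diff

# Identical inputs: nothing is recorded.
same = deep_diff({'a': {'b': 1}}, {'a': {'b': 1}})
print(same)  # {}

# Only the changed leaf appears; the unchanged sibling 'b' is omitted.
nested = deep_diff({'a': {'b': 1, 'c': 2}}, {'a': {'b': 1, 'c': 3}})
print(nested)  # {'a': {'c': {'type': 'modified', 'old': 2, 'new': 3}}}

# A dict replaced by a string is not recursed into; it is one 'modified' entry.
mixed = deep_diff({'a': {'b': 1}}, {'a': 'text'})
print(mixed)  # {'a': {'type': 'modified', 'old': {'b': 1}, 'new': 'text'}}
```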

Execution Environment and Example Usage

This function runs on any Python 3 interpreter. No external libraries are required, as it relies solely on built-in types and operations. Python 3 is needed because dict.keys() returns a set-like view there; in Python 2 it returns a list, which does not support the | operator.

Example 1: Basic Differences

dict_a = {'name': 'Alice', 'age': 30, 'city': 'New York'}
dict_b = {'name': 'Bob', 'age': 30, 'occupation': 'Engineer'}

result = deep_diff(dict_a, dict_b)
print(result)
# Expected output (key order may vary between runs, because the keys are iterated as a set):
# {'name': {'type': 'modified', 'old': 'Alice', 'new': 'Bob'},
#  'city': {'type': 'removed', 'value': 'New York'},
#  'occupation': {'type': 'added', 'value': 'Engineer'}}

Example 2: Nested Differences

dict_c = {
    'user': {'id': 1, 'info': {'email': 'a@example.com', 'status': 'active'}},
    'settings': {'theme': 'dark'}
}
dict_d = {
    'user': {'id': 1, 'info': {'email': 'b@example.com', 'status': 'inactive'}},
    'settings': {'theme': 'light', 'notifications': True}
}

result_nested = deep_diff(dict_c, dict_d)
print(result_nested)
# Expected output (key order may vary between runs):
# {'user': {'info': {'email': {'type': 'modified', 'old': 'a@example.com', 'new': 'b@example.com'},
#                     'status': {'type': 'modified', 'old': 'active', 'new': 'inactive'}}},
#  'settings': {'theme': {'type': 'modified', 'old': 'dark', 'new': 'light'},
#               'notifications': {'type': 'added', 'value': True}}}

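A nested result like the one above can be awkward to read at a glance. One possible way to consume it is to flatten it into (path, record) pairs. The helper name iter_changes is hypothetical and not part of the tutorial's function. The sketch relies on the observation that, in a deep_diff result, a change record has a string under 'type', while a nested diff only ever maps keys to dictionaries. It also assumes string keys so that the path can be joined with dots.

```python
LEAF_TYPES = ('added', 'removed', 'modified')

def iter_changes(diff, path=()):
    """Yield (path, record) for every change record in a deep_diff result."""
    for key, value in diff.items():
        if value.get('type') in LEAF_TYPES:
            # A string 'type' marks a change record.
            yield path + (key,), value
        else:
            # Otherwise this is a nested diff; descend into it.
            yield from iter_changes(value, path + (key,))

# The diff from Example 2, written out as a literal so this snippet runs alone.
diff = {
    'user': {'info': {
        'email': {'type': 'modified', 'old': 'a@example.com', 'new': 'b@example.com'},
        'status': {'type': 'modified', 'old': 'active', 'new': 'inactive'},
    }},
    'settings': {
        'theme': {'type': 'modified', 'old': 'dark', 'new': 'light'},
        'notifications': {'type': 'added', 'value': True},
    },
}

for path, record in iter_changes(diff):
    print('.'.join(path), record['type'])
# user.info.email modified
# user.info.status modified
# settings.theme modified
# settings.notifications added
```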
💡 Developer Tip: This deep_diff function works well for dictionaries containing primitive types and other dictionaries. However, it treats a list as a single opaque value: two lists are compared as a whole with !=, and any difference is reported as one 'modified' entry. To compare lists element by element, or to handle custom objects, you would need to extend the isinstance checks and comparison logic with specific handling for those types. For production use with diverse data, consider a robust library like DeepDiff, which handles many edge cases automatically.
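As a rough sketch of the kind of extension the tip describes, the variant below compares lists position by position by treating each index as a key. The names deep_diff_with_lists and diff_values are hypothetical, and positional comparison is only one possible design: inserting an element at the front of a list would report every later position as modified.

```python
def diff_values(old, new):
    """Return a diff for two values present under the same key, or None if equal."""
    if old == new:
        return None
    if isinstance(old, dict) and isinstance(new, dict):
        return deep_diff_with_lists(old, new)
    if isinstance(old, list) and isinstance(new, list):
        # Treat list positions as keys: index i in old is compared with index i in new.
        return deep_diff_with_lists(dict(enumerate(old)), dict(enumerate(new)))
    return {'type': 'modified', 'old': old, 'new': new}


def deep_diff_with_lists(dict1, dict2):
    diff = {}
    for key in dict1.keys() | dict2.keys():
        if key not in dict2:
            diff[key] = {'type': 'removed', 'value': dict1[key]}
        elif key not in dict1:
            diff[key] = {'type': 'added', 'value': dict2[key]}
        else:
            change = diff_values(dict1[key], dict2[key])
            if change:
                diff[key] = change
    return diff


old = {'tags': ['a', 'b', 'c'], 'n': 1}
new = {'tags': ['a', 'x'], 'n': 1}
result = deep_diff_with_lists(old, new)

print(result['tags'][1])  # {'type': 'modified', 'old': 'b', 'new': 'x'}
print(result['tags'][2])  # {'type': 'removed', 'value': 'c'}
```

List differences show up under integer keys (the list indices), so a consumer of the diff needs to be prepared for non-string keys.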

Conclusion

The deep_diff function is a powerful tool for precisely identifying changes between two dictionaries, including those with complex nested structures. By understanding its recursive logic and how it categorizes differences, you can effectively track and manage data changes in your Python applications.
