After learning about lists and tuples, let us study another container data type, called a set. The word "set" should not feel strange, because this concept also appears in math textbooks. If we treat a collection of things within a certain scope, which are definite and distinguishable, as a whole, then that whole is a set, and each thing in the set is called an element of the set.
A set usually has these properties:
- Unordered: elements do not have positions.
- Unique: duplicate elements are not allowed.
- Definite membership: for any given element, it either belongs to the set or it does not.
Sets in Python are essentially the same as sets in mathematics, and two of the properties above deserve emphasis: sets are unordered and their elements are unique. Unordered means that, unlike the elements of a list, the elements of a set have no positions, so sets do not support indexing. Unique means that a set never contains duplicate elements, which is another way sets differ from lists; adding an element that is already present simply leaves the set unchanged. Sets do support the membership operations in and not in, so we can determine whether an element belongs to a set, which is the definite-membership property mentioned above. Membership tests on sets are generally much faster than membership tests on lists because of how sets are stored internally. We will not go into the details here; for now, just remember this conclusion.
Note: Under the hood, a set uses hash-based storage. Readers who are unfamiliar with hash storage can first read the explanation of hash tables on the Hello Algo website.
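The properties described above can be observed directly. The following is a minimal sketch; the variable names (numbers, size_after_duplicate, indexing_supported) are chosen only for illustration.

```python
# Sketch: uniqueness, the lack of indexing, and membership tests on a set.
numbers = {3, 1, 2}

# Adding an element that is already present leaves the set unchanged.
numbers.add(2)
size_after_duplicate = len(numbers)

# Sets have no positions, so indexing raises TypeError.
try:
    numbers[0]
    indexing_supported = True
except TypeError:
    indexing_supported = False

print(size_after_duplicate)  # 3
print(indexing_supported)    # False
print(2 in numbers)          # True
print(5 not in numbers)      # True
```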
In Python, you can create a set with the literal syntax {}, but there must be at least one element inside the braces. That is because {} without any elements is not an empty set but an empty dictionary; we will introduce dictionaries in the next lesson. You can also use Python's built-in set function to create a set. More precisely, set is not an ordinary function but a constructor for creating set objects, a point we will return to when discussing object-oriented programming. We can use the set function to create an empty set, and we can also use it to turn other iterables, such as strings and lists, into sets. For example, set('hello') gives a set containing 4 characters, because the repeated character l appears only once in the set. Besides these two ways, you can also use comprehension syntax to create a set, just as we previously used comprehension syntax to create lists.
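The empty-literal pitfall mentioned above is easy to check. This is a small sketch with illustrative variable names (empty_dict, empty_set, letters).

```python
# Sketch: {} is an empty dictionary; set() creates an empty set.
empty_dict = {}
empty_set = set()

print(type(empty_dict).__name__)  # dict
print(type(empty_set).__name__)   # set

# The repeated 'l' in 'hello' is stored only once.
letters = set('hello')
print(len(letters))  # 4
```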
set1 = {1, 2, 3, 3, 3, 2}
print(set1)
set2 = {'banana', 'pitaya', 'apple', 'apple', 'banana', 'grape'}
print(set2)
set3 = set('hello')
print(set3)
set4 = set([1, 2, 2, 3, 3, 3, 2, 1])
print(set4)
set5 = {num for num in range(1, 20) if num % 3 == 0 or num % 7 == 0}
print(set5)

Note that the elements of a set must be hashable. A hashable type is a data type whose values can produce a hash code. Immutable types are usually hashable, such as integers (int), floating-point numbers (float), Boolean values (bool), strings (str), and tuples (tuple), provided that every element of the tuple is itself hashable. Mutable built-in types are not hashable, because a value that can change cannot provide a stable hash code, so they cannot be put into a set. For example, we cannot use a list as an element of a set. In the same way, because a set is itself mutable, a set cannot be an element of another set. We can create nested lists, where the elements of a list are also lists, but we cannot create nested sets. Keep this in mind when using sets.
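The hashability rule can be tried out with a small experiment. In the sketch below, can_be_element is a hypothetical helper written for this illustration; it is not part of Python.

```python
# Sketch: which values can be stored in a set?
def can_be_element(value):
    """Return True if value can be placed in a set (hypothetical helper)."""
    try:
        {value}  # building a one-element set hashes the value
        return True
    except TypeError:
        return False

print(can_be_element('text'))       # True: str is hashable
print(can_be_element((1, 2)))       # True: tuple of hashable elements
print(can_be_element([1, 2]))       # False: list is mutable
print(can_be_element({1, 2}))       # False: set is mutable
print(can_be_element((1, [2, 3])))  # False: tuple containing a list
```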
Tip: If you do not understand the concepts of hash code or hash storage mentioned above, you can set them aside for now, because they do not affect your continued learning and use of Python. That said, if you are a computer-science student, hash storage is something you are expected to know, so it is worth catching up on soon.
We can use the len function to get how many elements are in a set, but we cannot use indexing to go through the elements in a set, because set elements do not have a fixed order. Of course, if we want to go through the elements of a set, we can still use a for-in loop, as shown below.
set1 = {'Python', 'C++', 'Java', 'Kotlin', 'Swift'}
for elem in set1:
    print(elem)

Tip: Run the code above and look at the order in which the words are printed to get a feel for the unordered nature of sets.
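When a predictable order is needed, one common approach is to pass the set to the built-in sorted function, which returns a new sorted list and leaves the set itself unchanged. A minimal sketch, with illustrative variable names:

```python
# Sketch: iterating over a set in a predictable (sorted) order.
languages = {'Python', 'C++', 'Java', 'Kotlin', 'Swift'}

ordered = sorted(languages)  # a new list; the set is not modified
for name in ordered:
    print(name)

print(ordered)  # ['C++', 'Java', 'Kotlin', 'Python', 'Swift']
```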
Python provides very rich operations for the set type, mainly including membership operations, intersection, union, difference, comparison operations such as equality, subset, and superset, and so on.
We can use the membership operations in and not in to check whether an element is in a set, as shown below.
set1 = {11, 12, 13, 14, 15}
print(10 in set1) # False
print(15 in set1) # True
set2 = {'Python', 'Java', 'C++', 'Swift'}
print('Ruby' in set2) # False
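print('Java' in set2) # True

Like sets in mathematics, Python sets support intersection, union, difference, and symmetric difference, each available both as an operator and as a method.

set1 = {1, 2, 3, 4, 5, 6, 7}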
set2 = {2, 4, 6, 8, 10}
print(set1 & set2) # intersection
print(set1.intersection(set2))
print(set1 | set2) # union
print(set1.union(set2))
print(set1 - set2) # difference
print(set1.difference(set2))
print(set1 ^ set2) # symmetric difference
print(set1.symmetric_difference(set2))

From the code above, we can see that the & operator and the intersection method produce the same result for two sets; the operator form is more direct and the code is shorter. One difference worth knowing is that the methods accept any iterable as an argument, while the operators require both operands to be set or frozenset objects. Binary set operators can also be combined with assignment to form compound assignment operators. For example, set1 |= set2 gives set1 the same contents as set1 = set1 | set2, except that it updates set1 in place instead of creating a new set; the method with the same effect as |= is update. Likewise, set1 &= set2 corresponds to set1 = set1 & set2 and to the intersection_update method, and set1 -= set2 corresponds to the difference_update method, as shown below.
set1 = {1, 3, 5, 7}
set2 = {2, 4, 6}
set1 |= set2
print(set1) # {1, 2, 3, 4, 5, 6, 7}
set3 = {3, 6, 9}
set1 &= set3
# set1.intersection_update(set3)
print(set1) # {3, 6}
set2 -= set1
# set2.difference_update(set1)
print(set2) # {2, 4}

Two sets can be compared for equality with == and !=. If the two sets contain exactly the same elements, the result of == is True; otherwise it is False. If every element of set A is also an element of set B, then A is called a subset of B and, conversely, B is called a superset of A. If A is a subset of B and A is not equal to B, then A is a proper subset of B. Python provides operators for testing subset and superset relationships between sets: the familiar <, <=, >, and >=. We can also use the set methods issubset and issuperset to test these relationships.
set1 = {1, 3, 5}
set2 = {1, 2, 3, 4, 5}
set3 = {5, 4, 3, 2, 1}
print(set1 < set2) # True
print(set1 <= set2) # True
print(set2 < set3) # False
print(set2 <= set3) # True
print(set2 > set1) # True
print(set2 == set3) # True
print(set1.issubset(set2))
print(set2.issuperset(set1))

Note: In the code above, set1 < set2 checks whether set1 is a proper subset of set2, set1 <= set2 checks whether set1 is a subset of set2, and set2 > set1 checks whether set2 is a proper superset of set1. We can also use set1.issubset(set2) to check whether set1 is a subset of set2, and set2.issuperset(set1) to check whether set2 is a superset of set1.
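Because these operators test subset relationships rather than size, two sets can be unrelated: neither is a subset of the other, and they are not equal. The sketch below illustrates this with two small example sets, a and b, chosen only for demonstration.

```python
# Sketch: subset comparison is not a total ordering.
a = {1, 2}
b = {2, 3}

print(a < b)   # False: a is not a proper subset of b
print(a > b)   # False: a is not a proper superset of b
print(a == b)  # False: the sets are not equal either

# A set is a subset of itself, but never a proper subset of itself.
print(a <= a)  # True
print(a < a)   # False
```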
As we said just now, sets in Python are mutable types, so we can add elements to a set or delete elements from a set through set methods.
set1 = {1, 10, 100}
set1.add(1000)
set1.add(10000)
print(set1)
set1.discard(10)
if 100 in set1:
    set1.remove(100)
print(set1)
set1.clear()
print(set1) # set()

Note: The remove method raises KeyError when the element does not exist, so in the code above we first use a membership operation to check whether the element is in the set; the discard method does nothing in that case. The set type also has a method named pop, which removes an arbitrary element from the set and returns it, and raises KeyError if the set is empty. The remove and discard methods only delete the element and do not return it.
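The differences between discard, remove, and pop can be seen side by side. This is a minimal sketch; the set scores and the flag names are made up for illustration.

```python
# Sketch: discard vs. remove vs. pop.
scores = {10, 20, 30}

# discard silently ignores a missing element.
scores.discard(99)

# remove raises KeyError for a missing element.
try:
    scores.remove(99)
    remove_failed = False
except KeyError:
    remove_failed = True

# pop removes and returns an arbitrary element, so we do not rely
# on which element comes back, only on the set shrinking by one.
taken = scores.pop()

print(remove_failed)           # True
print(taken in {10, 20, 30})   # True
print(len(scores))             # 2

# pop on an empty set also raises KeyError.
try:
    set().pop()
    pop_on_empty_failed = False
except KeyError:
    pop_on_empty_failed = True
print(pop_on_empty_failed)     # True
```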
The set type also has a method named isdisjoint, which can check whether two sets have common elements. If they do not have common elements, this method returns True; otherwise, this method returns False, as shown below.
set1 = {'Java', 'Python', 'C++', 'Kotlin'}
set2 = {'Kotlin', 'Swift', 'Java', 'Dart'}
set3 = {'HTML', 'CSS', 'JavaScript'}
print(set1.isdisjoint(set2)) # False
print(set1.isdisjoint(set3)) # True

Python also has an immutable set type called frozenset. The difference between set and frozenset is like the difference between list and tuple. Because frozenset is an immutable type, it can calculate a hash code, so it can be used as an element in a set. Apart from not being able to add or delete elements, frozenset behaves the same as set. The code below shows how to use frozenset.
fset1 = frozenset({1, 3, 5, 7})
fset2 = frozenset(range(1, 6))
print(fset1)
print(fset2)
print(fset1 & fset2)
print(fset1 | fset2)
print(fset1 - fset2)
print(fset1 < fset2)

In Python, the set type is an unordered container that does not allow duplicate elements. Because sets use hash-based storage internally, the elements of a set must be hashable. The biggest difference between a set and a list is that the elements of a set have no order, so we cannot access them through indexing. In return, sets support binary operations such as intersection, union, and difference, and relational operators can be used to check whether two sets are in a subset or superset relationship.
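As a closing illustration of the hashability requirement, the sketch below stores frozenset objects inside an ordinary set, something that is not possible with plain sets. The variable names are illustrative only.

```python
# Sketch: frozenset objects are hashable, so they can be set elements.
pairs = {frozenset({1, 2}), frozenset({2, 1}), frozenset({3, 4})}

# frozenset({1, 2}) and frozenset({2, 1}) are equal, so only one is kept.
print(len(pairs))                   # 2
print(frozenset({1, 2}) in pairs)   # True

# A frozenset has no add method, so it cannot be modified.
try:
    frozenset({1}).add(2)
    can_modify = True
except AttributeError:
    can_modify = False
print(can_modify)                   # False
```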
