Pattern matching. Now in Python

Hey!



Pattern matching has finally arrived in Python 3.10, the "anniversary" tenth minor release of Python 3. The concept itself can hardly be called new: it has already been implemented in many languages, both in the new generation (Rust, Golang) and in those that are already over 0x18 years old (Java).





Pattern matching was announced by Guido van Rossum, the author of the Python programming language and its former "Benevolent Dictator For Life".



My name is Denis Kaishev, and I'm a code reviewer for the Middle Python developer course. In this post I want to explain why Python got pattern matching and how to work with it.



Syntactically, pattern matching is essentially the same as in a number of other languages:



match_expr:
    | star_named_expression ',' star_named_expressions?
    | named_expression
match_stmt: "match" match_expr ':' NEWLINE INDENT case_block+ DEDENT
case_block: "case" patterns [guard] ':' block
guard: 'if' named_expression
patterns: value_pattern ',' [values_pattern] | pattern
pattern: walrus_pattern | or_pattern
walrus_pattern: NAME ':=' or_pattern
or_pattern: '|'.closed_pattern+
closed_pattern:
    | capture_pattern
    | literal_pattern
    | constant_pattern
    | group_pattern
    | sequence_pattern
    | mapping_pattern
    | class_pattern
capture_pattern: NAME !('.' | '(' | '=')
literal_pattern:
    | signed_number !('+' | '-')
    | signed_number '+' NUMBER
    | signed_number '-' NUMBER
    | strings
    | 'None'
    | 'True'
    | 'False'
constant_pattern: attr !('.' | '(' | '=')
group_pattern: '(' patterns ')'
sequence_pattern: '[' [values_pattern] ']' | '(' ')'
mapping_pattern: '{' items_pattern? '}'
class_pattern:
    | name_or_attr '(' ')'
    | name_or_attr '(' ','.pattern+ ','? ')'
    | name_or_attr '(' ','.keyword_pattern+ ','? ')'
    | name_or_attr '(' ','.pattern+ ',' ','.keyword_pattern+ ','? ')'
signed_number: NUMBER | '-' NUMBER
attr: name_or_attr '.' NAME
name_or_attr: attr | NAME
values_pattern: ','.value_pattern+ ','?
items_pattern: ','.key_value_pattern+ ','?
keyword_pattern: NAME '=' or_pattern
value_pattern: '*' capture_pattern | pattern
key_value_pattern:
    | (literal_pattern | constant_pattern) ':' or_pattern
    | '**' capture_pattern

      
      





It may seem complicated and confusing, but in reality it all boils down to something like this:



match some_expression:
    case pattern_1:
        ...
    case pattern_2:
        ...

      
      





It looks much clearer and more pleasing to the eye.



The patterns themselves fall into several groups:



  • Literal Patterns;
  • Capture Patterns;
  • Wildcard Pattern;
  • Constant Value Patterns;
  • Sequence Patterns;
  • Mapping Patterns;
  • Class Patterns.


Let's take a brief look at each of them.



Literal Patterns



The literal pattern, as the name suggests, matches literal values: strings, numbers, booleans, and None.



Under the hood the check works like subject == 'string', that is, the __eq__ method is used. The singletons None, True and False are the exception: they are compared by identity (is).



number = 42

match number:
    case 42:
        print('answer')
    case 43:
        print('not answer')

      
      





Capture Patterns



A capture pattern binds the matched value to the name given in the pattern. The name is an ordinary variable in the current scope, so you can use it inside the case block (and after it).



greeting = 'Denis'

match greeting:
    case "":
        print('Hello my friend')
    case name:
        print(f'Hello {name}')







Wildcard pattern



If none of the other cases match, you can use _: it acts as a default branch and matches anything. Unlike a capture pattern, _ does not bind the value to any name.






number = 7

match number:
    case 42:
        print("It's forty-two")
    case _:
        print("I don't know what it is")





Constant Value Patterns



To match against constants, you need to use dotted names, for example enum members or class attributes; otherwise a bare name is treated as a capture pattern, as this example shows:



OK = 200
CONFLICT = 409

response = {'status': 409, 'msg': 'database error'}
match response['status'], response['msg']:
    # Pitfall: OK is a bare name, so it is a capture pattern, not a comparison with 200
    case OK, ok_msg:
        print('handler 200')
    case CONFLICT, err_msg:
        print('handler 409')
    case _:
        print('idk this status')





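The result is not what you might expect: the first case matches and prints handler 200, because OK simply captures 409 (and, as a side effect, the name OK is rebound to 409).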



Sequence Patterns



It matches lists, tuples, and any other instances of collections.abc.Sequence, except str, bytes, and bytearray.



answer = [42]
match answer:
    case []:
        print('I did not find an answer')
    case [x]:
        print(f'The answer is {x}')
    case [x, *_]:
        print('I found more than one answer')





You no longer need to call len() every time to check the number of items: the sequence pattern checks the length itself, via __len__.



Mapping Patterns



This group is a bit like the previous one, only here we match dictionaries or, to be precise, instances of collections.abc.Mapping. Sequence and mapping patterns combine quite well with each other.



args = (1, 2)
kwargs = {'kwarg': 'kwarg', 'one_more_kwarg': 'one_more_kwarg'}

def match_something(*args, **kwargs):
    match (args, kwargs):
        case (arg1, arg2), {'kwarg': kwarg}:
            print('i find positional args and one keyword args')
        case (arg1, arg2), {'kwarg': kwarg, 'one_more_kwarg': one_more_kwarg}:
            print('i find a few keyword args')
        case _:
            print('i cannot match anything')

match_something(*args, **kwargs)
      
      





All would be fine, but there is a catch. A mapping pattern only checks that the listed keys are present; extra keys are ignored, so the length of the dictionary does not matter. That is why the first case already matches, and i find positional args and one keyword args is printed.



Class patterns



For user-defined data types, the pattern syntax looks like creating an object.



Here is how it looks with a data class:



from dataclasses import dataclass

@dataclass
class Coordinate:
    x: int
    y: int
    z: int

coordinate = Coordinate(1, 2, 3)
match coordinate:
    case Coordinate(0, 0, 0):
        print('Zero point')
    case _:
        print('Another point')

      
      





You can also add an if clause, the so-called guard. If the condition is false, matching continues with the next case. Note that the pattern is matched first, and only then is the condition checked:



match coordinate:
    case Coordinate(x, y, z) if z == 0:
        print('Point in the plane XY')





If you write an ordinary class yourself, positional sub-patterns require a __match_args__ attribute: a tuple with the names of the attributes that correspond to the positional arguments (for namedtuple and data classes, __match_args__ is generated automatically).



class Coordinate:
    # Must be a tuple: it maps positional sub-patterns to attribute names
    __match_args__ = ('x', 'y', 'z')

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

coordinate = Coordinate(1, 2, 3)
match coordinate:
    case Coordinate(0, 0, 0):
        print('Zero point')
    case Coordinate(x, y, z) if z == 0:
        print('Point in the plane XY')
    case _:
        print('Another point')

      
      





Without it, a class pattern with positional sub-patterns raises a TypeError: Coordinate() accepts 0 positional sub-patterns (3 given).



What is the bottom line?



In fact, it looks like yet another piece of syntactic sugar, alongside the recently added walrus operator. In its current form the implementation compiles match statements into bytecode equivalent to if/else constructs, so the effect is the same.





Armin Ronacher, the creator of the Flask web framework for Python, very succinctly described the current state of Pattern matching.



It's hard to argue with: the code does become somewhat cleaner than a tower of if/else branches a third of a screen tall. But you can't call it a wow feature either. It's good that it has been introduced: it will be convenient in some places, but not everywhere. Either way, the main thing with this novelty is not to overdo it, and not to rush to update every project to 3.10 and rewrite everything, because:

Now is better than never. Although never is often better than right now.


Will you use it? If so, where?


