Continuing to internationalize address search with Sphinx or Manticore: now Metaphone

This is a continuation of the publication “Internationalization of City Address Search. Implementing the Russian-language Soundex in Sphinx Search”, in which I discussed how to add support for the phonetic Soundex algorithms in Sphinx Search for text written in Cyrillic. Soundex support is already available for Latin text. The same is true of Metaphone: it works for the Latin alphabet but not for Cyrillic. We will try to correct this annoying fact with the help of transliteration, regular expressions, and a bit of hand-finishing.

In this part we look at how to implement the original Metaphone, Russian Metaphone (in the sense that no transliteration is needed), and Caverphone. Double Metaphone, however, we will not be able to implement.

The implementation is suitable for both Sphinx Search and Manticore Search platforms.

At the end, we will see how Metaphone handles the “rakomakophone”.

Docker image

I have prepared the Docker image tkachenkoivan/searchfonetic so that you can try the result hands-on. All indexes from this publication and from the previous one are included in the image. Note, however, that the index names used in the previous publication do not match the names stored in the image. Why? Because good ideas come late.

As before, the descriptions of the algorithms are taken from the publication “phonetic algorithms”. I will try to repeat as little of that text as possible.

Original Metaphone

The implementation is elementary. First, create regular expressions for transliteration:

	regexp_filter = (А|а) => a
	regexp_filter = (Б|б) => b
	regexp_filter = (В|в) => v

Then enable metaphone:

morphology = metaphone
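
The transliteration step can be tried outside the search engine. Below is a minimal Python sketch, assuming each rule maps one Cyrillic letter (in both cases) to a Latin one, as the regexp_filter lines suggest. The name transliterate and the three-rule table are hypothetical and only illustrate the idea; a real configuration needs a rule for every letter.

```python
import re

# Hypothetical three-rule subset mirroring the regexp_filter lines;
# a complete table would cover the whole Cyrillic alphabet.
TRANSLIT_RULES = [
    (r"(А|а)", "a"),
    (r"(Б|б)", "b"),
    (r"(В|в)", "v"),
]


def transliterate(text):
    """Apply each rule in order, like a chain of regexp_filter directives."""
    for pattern, replacement in TRANSLIT_RULES:
        text = re.sub(pattern, replacement, text)
    return text


if __name__ == "__main__":
    print(transliterate("Баба"))
```

Characters without a rule pass through unchanged, which is why an incomplete table leaves Cyrillic letters in the token and the Latin-only metaphone step cannot encode them.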

mysql> select * from metaphone where match('');
| id   | aoguid                               | shortname | offname                   |
| 1130 | e21aec85-0f63-4367-b9bb-1943b2b5a8fb |         |               |

To inspect how the query is processed, use call keywords:

mysql> call keywords (' ', 'metaphone');
| qpos | tokenized     | normalized |
| 1    | morisa toreza | MRSTRS     |
| 1    | morisa        | MRS        |
| 2    | toreza        | TRS        |

Caverphone

mysql> call keywords (' ', 'caverphone');
| qpos | tokenized | normalized |
| 1    | mrsa trza | mrsa trza  |
| 1    | mrsa      | mrsa       |
| 2    | trza      | trza       |

mysql> select * from caverphone where match('');
Empty set (0.00 sec)

Double Metaphone

| qpos | tokenized    | normalized   |
| 1    |        |        |
| 2    |         |         |

mysql> select * from caverphone where match ('');
| id   | aoguid                               | shortname | offname          |
|    5 | 01339f2b-6907-4cb8-919b-b71dbed23f06 |         |          |
|  387 | 4b919f60-7f5d-4b9e-99af-a7a02d344767 |         |            |

mysql> call keywords ('', 'metaphone');
| qpos | tokenized   | normalized |
| 1    | rakomakofon | RKMKFN     |

Now compare with “rock the microphone”:

mysql> call keywords ('rock the microphone', 'metaphone');
| qpos | tokenized           | normalized |
| 1    | rock the microphone | RK0MKRFN   |
| 1    | rock                | RK         |
| 2    | the                 | 0          |
| 3    | microphone          | MKRFN      |


mysql> call keywords ('rock microphone', 'metaphone');
| qpos | tokenized       | normalized |
| 1    | rock microphone | RKMKRFN    |
| 1    | rock            | RK         |
| 2    | microphone      | MKRFN      |
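
To see why the two strings come out so close, here is a heavily reduced Metaphone-style encoder in Python. It is only a sketch: the names simple_metaphone and encode_phrase and the handful of rules are my own assumptions for illustration. This is not the code Sphinx or Manticore runs, and real Metaphone has many more rules.

```python
VOWELS = set("aeiou")


def simple_metaphone(word):
    """Reduced Metaphone-like encoding: a few digraphs, vowels kept only at the start."""
    w = word.lower()
    out = []
    i = 0
    while i < len(w):
        ch = w[i]
        nxt = w[i + 1] if i + 1 < len(w) else ""
        if i > 0 and ch == w[i - 1] and ch != "c":
            i += 1                      # skip doubled letters
        elif ch in VOWELS:
            if i == 0:
                out.append(ch.upper())  # a vowel survives only in first position
            i += 1
        elif ch + nxt == "th":
            out.append("0")             # Metaphone writes "th" as the digit 0
            i += 2
        elif ch + nxt == "ph":
            out.append("F")
            i += 2
        elif ch + nxt == "ck":
            out.append("K")
            i += 2
        elif ch == "c":
            out.append("S" if nxt in ("e", "i", "y") else "K")
            i += 1
        elif ch == "z":
            out.append("S")
            i += 1
        else:
            out.append(ch.upper())      # all other consonants pass through
            i += 1
    return "".join(out)


def encode_phrase(phrase):
    """Encode each word separately and glue the codes together."""
    return "".join(simple_metaphone(word) for word in phrase.split())


if __name__ == "__main__":
    for text in ("rakomakofon", "rock microphone", "rock the microphone"):
        print(text, "->", encode_phrase(text))
```

On this reduced rule set, rakomakofon encodes to RKMKFN and rock microphone to RKMKRFN, the same codes as in the call keywords output, so the two differ by a single R.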


The hope placed in qsuggest did not come true: it will not give hints. Why? Notice that the output of call keywords has two columns, tokenized and normalized

