Interview: Babel Street CIO Gil Irizarry on Chinese Name Matching

Name screening is critical to compliance processes like onboarding and sanctions screening, but it is also challenging due to the many ways a name can be written. For non-Latin scripts and ideographs like Chinese, the complexity increases even further. 

The Babel Street-powered name matching system used in Siron®One enhances accuracy in identifying names, but traditional fuzzy name matching techniques often struggle with Chinese scripts and ideographs, leading to unreliable or slow results.

In this interview, Babel Street CIO Gil Irizarry breaks down what makes name matching for Chinese scripts and ideographs so complex. He explains the limitations of standard fuzzy name matching and how Babel Street developed a unique hybrid approach, the pairwise method, to return accurate results in under a second.

Below is a written summary of the interview:

Understanding Chinese Scripts and Their Complexity  

Question 1: Give us an overview of the complexity of Chinese scripts and ideographs, how it is shared by the languages of other countries, and how culture, history, and politics have influenced these languages.

Gil’s answer (summary):

“The Chinese writing system is over 4,000 years old and has evolved from pictographs into stylized ideographs. It has influenced languages across Asia, with China using Hanzi, Japan adopting Kanji, Korea using Hanja, and traditional Vietnamese also incorporating Chinese characters. The complexity of these characters varies, with some having up to 172 strokes. Additionally, Chinese is a tonal language, meaning pronunciation varies based on inflection, which poses challenges when transliterating into Latin scripts. Recognizing whether characters are Chinese, Japanese, or Korean is crucial for accurate representation.”
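As an editorial illustration of the last point, the sketch below guesses which language a name is written in by looking at Unicode character names. It is a rough heuristic, not Babel Street's method: the function name `guess_script` and its return labels are invented for this example, and a name written only in Han characters stays ambiguous because the same characters serve as Hanzi, Kanji, and Hanja.

```python
import unicodedata


def guess_script(text: str) -> str:
    """Rough heuristic: classify a name by the scripts its characters belong to."""
    seen = set()
    for ch in text:
        # unicodedata.name returns "" here for characters without a name.
        name = unicodedata.name(ch, "")
        if name.startswith(("HIRAGANA", "KATAKANA")):
            seen.add("kana")
        elif name.startswith("HANGUL"):
            seen.add("hangul")
        elif name.startswith("CJK UNIFIED IDEOGRAPH"):
            seen.add("han")

    # Kana only occurs in Japanese and Hangul only in Korean, so either one
    # settles the question even when Han characters are also present.
    if "kana" in seen:
        return "Japanese"
    if "hangul" in seen:
        return "Korean"
    if "han" in seen:
        return "Han only (ambiguous)"
    return "unknown"


for sample in ["北京", "スミス", "김민준", "Beijing"]:
    print(sample, "->", guess_script(sample))
```

A production system would need more context than this, since many Japanese and some Korean names are written entirely in Han characters.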

Challenges of Transliteration Between Han Script and Latin Script 

Question 2: What are the challenges of transliterating to and from languages that use the Han script and the challenges when translating to and from Latin? 

Gil’s answer (summary):

“Transliteration differs from translation—it focuses on representing pronunciation rather than meaning. For example, the Chinese capital “北京” is transliterated as “Beijing” rather than translated as “Northern Capital.” Romanization systems like Pinyin incorporate tonal marks to preserve pronunciation, whereas Japanese and Korean transliterations do not require tones. The challenge is ensuring accurate representation while maintaining linguistic integrity.”
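To illustrate how tone marks affect matching, here is a minimal sketch that removes them from a Pinyin string so that "Běijīng" and "Beijing" compare as equal. It uses only Python's standard `unicodedata` module; the helper name `strip_tone_marks` is an assumption made for this example and does not describe how any particular product normalizes names.

```python
import unicodedata


def strip_tone_marks(pinyin: str) -> str:
    """Remove combining diacritics (including Pinyin tone marks) from a string."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", pinyin)
    # Keep only the characters that are not combining marks.
    base_only = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Note: this also removes the diaeresis, so "ü" becomes "u".
    return unicodedata.normalize("NFC", base_only)


print(strip_tone_marks("Běijīng"))
```

Dropping the marks makes comparison easier but discards the tonal information, which is one reason transliterated Chinese names collide more often than the original characters do.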

Fuzzy Name Matching and Its Role in Non-Latin Scripts 

Question 3: What is fuzzy name matching, how does it work, and how effective is it for non-Latin scripts? 

Gil’s answer (summary):

“Fuzzy name matching helps recognize names despite variations in spelling, transcription, or transliteration. For example, “Sophia” can be spelled as S-O-P-H-I-A or S-O-F-I-A, both being valid. This is especially critical for non-Latin scripts, where multiple transliteration standards exist. For instance, the Chinese surname “Wu” might appear as W-U or W-O-O, and the Arabic name “Muhammad” can be Mohammed, Muhammed, or Mohammad. Fuzzy name matching considers such variations to improve accuracy in identity verification.”
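To make the idea concrete, here is a minimal sketch of one classic fuzzy technique, Levenshtein edit distance, applied to the name pairs mentioned above. It is an illustration only and not the algorithm Babel Street uses; the function names `levenshtein` and `similarity` are chosen for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]; 1.0 means identical."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


for left, right in [("Sophia", "Sofia"), ("Muhammad", "Mohammed"), ("Wu", "Woo")]:
    print(left, right, round(similarity(left, right), 2))
```

Under this measure "Muhammad" and "Mohammed" score 0.75, while "Wu" and "Woo" score only about 0.33 even though they are the same surname. That gap hints at why plain edit distance falls short for short transliterated names.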

Pairwise Matching and Its Application to Chinese Names  

Question 4: What is pairwise matching, and how does it solve the problem of transliterating Chinese names? 

Gil’s answer (summary):

“Pairwise name matching is a specialized application of fuzzy name matching, comparing two specific names for similarity. It acts as a sanity check on existing systems, ensuring that name variations are correctly identified. For instance, “Yang” might be transliterated as Y-A-N-G or Y-O-N-G, and pairwise matching helps determine if these refer to the same person. This process improves the reliability of Chinese name transliteration in regulatory contexts.”
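A pairwise comparison can be sketched as a function that takes exactly two names and returns a single score. The version below is a simplified assumption for illustration, not the Babel Street implementation: it splits each name into tokens, tries every token ordering (Chinese names may be written surname-first or surname-last), and averages the per-token similarity from Python's standard `difflib.SequenceMatcher`. The names `token_similarity` and `pairwise_score` are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import permutations


def token_similarity(a: str, b: str) -> float:
    """Similarity of two name tokens in [0, 1], ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pairwise_score(name_a: str, name_b: str) -> float:
    """Compare two full names token by token, ignoring token order."""
    tokens_a, tokens_b = name_a.split(), name_b.split()
    if not tokens_a or not tokens_b:
        return 0.0
    # Make tokens_a the shorter list so every one of its tokens gets a partner.
    if len(tokens_a) > len(tokens_b):
        tokens_a, tokens_b = tokens_b, tokens_a

    best = 0.0
    for ordering in permutations(tokens_b, len(tokens_a)):
        total = sum(token_similarity(x, y) for x, y in zip(tokens_a, ordering))
        # Dividing by the longer length penalizes unmatched extra tokens.
        best = max(best, total / len(tokens_b))
    return best


print(pairwise_score("Yang Wei", "Wei Yang"))
print(pairwise_score("Yang Wei", "Yong Wei"))
```

Trying all orderings is only practical because personal names have few tokens. Whether a score for "Yang" versus "Yong" should count as a match is a threshold decision that depends on the screening policy.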

The Two-Pass Hybrid Method for Name Screening

Question 5: How does the two-pass hybrid method work, how much does it reduce false positives and negatives, and how much longer does it take to screen names with the two-pass method?

Gil’s answer (summary):

“The Babel Street-powered name matching system used in Siron®One employs a two-pass approach to balance precision and recall. The first pass generates broad candidate matches using hashing functions, while the second pass applies AI and machine learning algorithms to refine accuracy. This method ensures both speed and precision, delivering results in milliseconds. The system is optimized to process large volumes of names efficiently, reducing false positives and negatives.”
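The two-pass structure described above can be sketched in a few lines. This is a toy model under stated assumptions, not the production system: the first pass uses a made-up consonant-skeleton key (`blocking_key`) as its hash, and the second pass swaps in a plain string-similarity score from `difflib` where the real system applies AI and machine learning models. All function names and the 0.7 threshold are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher

VOWELS = set("aeiou")


def blocking_key(name: str) -> str:
    """First pass hash: first letter plus later consonants, repeats collapsed."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""
    key = [letters[0]]
    for ch in letters[1:]:
        if ch in VOWELS or ch == key[-1]:
            continue
        key.append(ch)
    return "".join(key)


def build_index(watchlist):
    """Group watchlist names by blocking key so lookups avoid a full scan."""
    index = defaultdict(list)
    for name in watchlist:
        index[blocking_key(name)].append(name)
    return index


def screen(query, index, threshold=0.7):
    """Second pass: score only the candidates that share the query's key."""
    candidates = index.get(blocking_key(query), [])
    scored = [
        (name, SequenceMatcher(None, query.lower(), name.lower()).ratio())
        for name in candidates
    ]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


index = build_index(["Muhammad", "Mohammed", "Mahmoud", "Yang", "Wei"])
print(screen("Mohammad", index))
```

Here "Mahmoud" shares the first-pass key with "Mohammad" but is dropped by the second pass, which shows the division of labor: the cheap first pass favors recall, and the more expensive second pass restores precision on a much smaller candidate set.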

Deep Learning Neural Networks for Japanese Name Matching  

Question 6: Tell us a little bit about the deep learning neural network that you had to train for name matching with Japanese names. 

Gil’s answer (summary):

“Traditional Hidden Markov Models (HMMs) predict character sequences based on probability, but deep learning improves accuracy by considering entire word sequences. Babel Street developed a deep learning neural model specifically for matching Katakana to Latin script, enhancing accuracy beyond simple character-by-character transliteration. This approach significantly improves name matching precision in Japanese-language applications.”

Future Enhancements for Regulatory Name Screening 

Question 7: What are some of the developments/milestones coming up with Match and how will those improve regulatory name screening? 

Gil’s answer (summary):

“Babel Street is continuously improving its name-matching system by focusing on: 

  • Speed Optimization: Leveraging GPU acceleration for near-instant results. 
  • Expanding Language Support: Currently covering 25+ languages, with more to be added. 
  • Dynamic Name Frequency Models: Adapting to naming trends influenced by migration and globalization. 
  • Customizable Models: Allowing users to fine-tune models based on specific regional data. 

These improvements ensure faster, more accurate regulatory name screening, meeting the demands of high-volume environments such as financial crime compliance.”

To watch the full interview, click here.

Read our previous blog article: The Challenges of Name Screening for Chinese and Non-Latin Scripts (and How to Solve Them).