Saturday, November 4, 2023

AI: Detecting phone numbers and emails in messages

In e-commerce applications built on a marketplace model, a significant challenge is vendors and customers exchanging phone numbers and email addresses through the site's messaging feature. This lets them bypass the platform, and the site loses its commission revenue. Users can be inventive in evading standard detection based on regular expressions, for example by spelling out digits ("fivethreetwo" instead of "532"). To build a comprehensive list of such techniques, you could prompt ChatGPT with: 'I have a webpage with a messaging feature. I want to prevent the inclusion of phone numbers and emails in messages. What are some ways users might try to circumvent my safeguards?'
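To make the regular-expression point concrete, here is a minimal Python sketch (not the author's PHP script) of the kind of rule-based filter users try to evade, extended to turn spelled-out digits back into numbers before matching. The English digit-word table, the `MIN_DIGITS = 7` threshold, the "at"/"dot" obfuscation patterns, and the names `to_digits` and `find_contacts` are all assumptions for illustration; Turkish messages would need their own word list.

```python
import re

# Assumed English digit words; Turkish messages would need their own table.
DIGIT_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}

# A whole word made only of digit words, e.g. "five" or "fivethreetwo".
# The word boundaries keep "one" inside "phone" or "someone" untouched.
DIGIT_RUN = re.compile(r"\b(?:" + "|".join(DIGIT_WORDS) + r")+\b", re.IGNORECASE)
DIGIT_WORD = re.compile("|".join(DIGIT_WORDS), re.IGNORECASE)

# Digits possibly separated by spaces, dots, dashes or parentheses.
PHONE_CANDIDATE = re.compile(r"\+?\d[\d\s().-]*\d")
MIN_DIGITS = 7  # assumed threshold; tune it for your market

# "@" or an obfuscated " at " / "(at)", then a domain with "." or " dot ".
AT = r"(?:\s*(?:@|\(at\)|\[at\])\s*|\s+at\s+)"
DOT = r"(?:\.|\s*(?:\(dot\)|\[dot\])\s*|\s+dot\s+)"
EMAIL = re.compile(rf"[\w.+-]+{AT}[\w-]+(?:{DOT}[a-z]{{2,}})+", re.IGNORECASE)


def to_digits(text: str) -> str:
    """Replace runs of spelled-out digits: "fivethreetwo" -> "532"."""
    return DIGIT_RUN.sub(
        lambda run: DIGIT_WORD.sub(lambda w: DIGIT_WORDS[w.group(0).lower()], run.group(0)),
        text,
    )


def find_contacts(message: str) -> dict:
    """Return phone-number and email candidates found by plain regexes."""
    text = to_digits(message)
    phones = [
        m.group(0)
        for m in PHONE_CANDIDATE.finditer(text)
        if sum(ch.isdigit() for ch in m.group(0)) >= MIN_DIGITS
    ]
    emails = EMAIL.findall(text)
    return {"phones": phones, "emails": emails}


if __name__ == "__main__":
    print(find_contacts("call me: fivethreetwo 222 33 44"))
    print(find_contacts("write to jane at example dot com"))
```

Even with this normalization, the filter misses tricks such as Turkish number words ("beş üç iki"), numbers split across several messages, or an email without a visible domain, and a low digit threshold will flag long prices. That gap is what makes the problem a candidate for an AI model.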

This problem is a good fit for AI. I experimented with OpenAI's text-davinci-003 model but had little success: it failed most of my test cases. Then I tried the gpt-4 model, which did much better. I wrote the PHP script below (with help from ChatGPT using GPT-4) to detect phone numbers and emails in Turkish text messages.

Because AI output is non-deterministic, verifying that an AI script works means running it several times on the same input and checking that it returns the expected output every single time. For example, the message 'şunu dene jane at example 532 222 33' ("try this jane at example 532 222 33") makes the model fail about 50% of the time because it counts 9 digits when there are only 8. It also detects an email on some runs but not on others, so the prompt still needs more engineering work.
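The repeated-run check is easy to automate. Below is a hypothetical Python harness (not the author's PHP script) that calls a detector several times on one message and tallies the distinct answers, and also counts digits in code, the step the model got wrong for the test message. `make_flaky_detector` is a made-up stand-in for a real model call so the sketch runs offline, and the `(phone_found, email_found)` answer shape is likewise an assumption.

```python
import itertools
from collections import Counter
from typing import Callable, Hashable, Sequence


def count_digits(text: str) -> int:
    """Count characters for which str.isdigit() is true."""
    return sum(ch.isdigit() for ch in text)


def repeat_check(detect: Callable[[str], Hashable], message: str, runs: int = 5) -> Counter:
    """Run a possibly non-deterministic detector several times on the same
    message and tally each distinct answer it returns."""
    return Counter(detect(message) for _ in range(runs))


def is_stable(tally: Counter, expected: Hashable) -> bool:
    """True only if every run produced the expected answer."""
    return set(tally) == {expected}


def make_flaky_detector(answers: Sequence[Hashable]) -> Callable[[str], Hashable]:
    """Hypothetical stand-in for a model call: ignores the message and
    returns the given answers in turn, forever."""
    cycle = itertools.cycle(answers)
    return lambda message: next(cycle)


if __name__ == "__main__":
    message = "şunu dene jane at example 532 222 33"
    print(count_digits(message))  # 8
    # Placeholder answers alternating between runs, mimicking a 50% failure.
    detector = make_flaky_detector([(True, False), (False, True)])
    tally = repeat_check(detector, message, runs=4)
    print(tally)
    # (True, False) is only a placeholder for whatever your rules expect.
    print(is_stable(tally, (True, False)))  # False
```

In a real test, `detect` would wrap the call to the model. Counting digits in code and applying any digit threshold there, rather than asking the model to count, is one possible way around the miscounting, though its effect on overall accuracy is untested here.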