Extract Arabic characters using regex

YasserKhalil
PlatinumLounger
Posts: 4913
Joined: 31 Aug 2016, 09:02

Post by YasserKhalil »

Hello everyone,
I have a UDF that extracts the Arabic letters from a string, but I need to do this with regex.
Here's my attempt:

Code:

Function ExtractArabic(ByVal inputString As String) As String
    Dim regex As Object
    Set regex = CreateObject("VBScript.RegExp")
    regex.Pattern = "[^ \u0600-\u06FF]+"
    If regex.Test(inputString) Then
        ExtractArabic = Trim(regex.Replace(inputString, ""))
    Else
        ExtractArabic = ""
    End If
End Function
The UDF only partially works: some English characters still appear in the result. How can I get only the Arabic characters plus the spaces?

Re: Extract Arabic characters using regex

Post by YasserKhalil »

I managed to solve it:

Code:

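Function RemoveNonArabic(ByVal txt As String) As String
    Dim regex As Object, s As String
    Set regex = CreateObject("VBScript.RegExp")
    With regex
        ' Replace every match; without this only the first non-Arabic run is removed
        .Global = True
        ' Has no effect on this pattern, since it contains no ^ or $ anchors
        .MultiLine = True
        ' Match anything that is not an ASCII digit, whitespace, or in the Arabic block U+0600-U+06FF
        .Pattern = "[^0-9\s\u0600-\u06FF]+"
    End With
    s = regex.Replace(txt, "")
    ' Worksheet TRIM strips leading/trailing spaces and collapses repeated inner spaces
    RemoveNonArabic = Application.WorksheetFunction.Trim(s)
End Function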

But I welcome any other ideas.
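One idea, offered as a sketch and not a definitive answer: the first attempt most likely left English characters behind because `Global` defaults to False, so `Replace` removed only the first non-Arabic run. The variant below keeps `Global = True`, drops `MultiLine` (the pattern has no anchors), and makes keeping ASCII digits optional. The names `KeepArabic`, `keepDigits` and `DemoKeepArabic` are hypothetical, not from the posts above. The code assumes it runs inside Excel, because it relies on `Application.WorksheetFunction.Trim`. The sample text is built with `ChrW` because the VBA editor often cannot display Arabic literals.

```vba
Option Explicit

' Hypothetical helper: keeps characters in the Arabic block (U+0600-U+06FF)
' and whitespace; ASCII digits are kept only when keepDigits is True.
Function KeepArabic(ByVal txt As String, _
                    Optional ByVal keepDigits As Boolean = False) As String
    Dim regex As Object
    Set regex = CreateObject("VBScript.RegExp")
    With regex
        .Global = True   ' replace every match, not just the first one
        If keepDigits Then
            .Pattern = "[^0-9\s\u0600-\u06FF]+"
        Else
            .Pattern = "[^\s\u0600-\u06FF]+"
        End If
    End With
    ' Worksheet TRIM also collapses repeated inner spaces to a single space
    KeepArabic = Application.WorksheetFunction.Trim(regex.Replace(txt, ""))
End Function

Sub DemoKeepArabic()
    Dim sample As String, alefBeh As String, teh As String
    alefBeh = ChrW(&H627) & ChrW(&H628)   ' Arabic letters alef + beh
    teh = ChrW(&H62A)                     ' Arabic letter teh
    sample = "abc " & alefBeh & " xyz " & teh & "12"

    ' Assertions are used instead of Debug.Print because the Immediate
    ' window may not render Arabic characters.
    Debug.Assert KeepArabic(sample) = alefBeh & " " & teh
    Debug.Assert KeepArabic(sample, True) = alefBeh & " " & teh & "12"
End Sub
```

Run `DemoKeepArabic` from the VBA editor; if neither assertion breaks into the debugger, both variants behave as described. Note that U+0600-U+06FF also covers Arabic punctuation and Arabic-Indic digits, so those are kept too.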