How to remove Arabic hashtags?

I read tweets from Twitter using Twitter4j, and I am trying to filter the hashtags out of the tweet text. After converting a tweet to a string, I have this line: "892698363371638784: RT @hikids_ksa: اللعبة خطيرة مرا ويبي لها مرا ويبي لها مرا ويبي لها مرا ويبي لها مو تفكير و مهارة👌🏻💡 متوفرة في # متجر_هاي_كيدز_الالكتروني .. "

I want to remove متجر_هاي_كيدز_الالكتروني, as it has a hashtag sign after it, using Java.

The problem is that my code didn't work on this input: "@kaskasomar هيدا بلا مخ متل متل غيرو بيخون الشعب اللبناني وبيتهمو بالارهاب بس لانرأيو بييات

The سخيف part was not removed for some reason. This is my method:

static String removeHashtags(String in)
{
    in = in.replaceAll("#[A-Za-z]+","");//remove English hashtags
    in = in.replaceAll("[أ-ي]#+","");//remove Arabic hashtags that have # before it
    return in = in.replaceAll("#[أ-ي]+","");//remove Arabic hashtags that have # after it
}

      



2 answers


If you are just trying to remove all hashtags in any language, you can write

in = in.replaceAll("#\\p{IsAlphabetic}+", "");

      

If you specifically want to remove Arabic hashtags, you can write



in = in.replaceAll("#\\p{IsArabic}+", "");

      

so you don't have to worry about creating separate left-to-right and right-to-left regexes. This also improves the readability of your code.
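As a rough, self-contained sketch of how this could be wrapped up (the class and method names are made up for illustration and are not part of Twitter4j): note that neither \p{IsAlphabetic} nor \p{IsArabic} matches an underscore, so a tag like the one in the question would only be removed up to the first _. The second pattern below is one possible way around that, assuming you want letters, combining marks, digits, and underscores to all count as part of a tag.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Twitter4j.
final class HashtagStripper {
    // '#' followed by one or more Arabic-script characters, as in this answer.
    private static final Pattern ARABIC_TAG = Pattern.compile("#\\p{IsArabic}+");

    // Assumption: a tag may contain letters, marks, digits, and underscores, in any script.
    private static final Pattern ANY_TAG = Pattern.compile("#[\\p{L}\\p{M}\\p{N}_]+");

    static String removeArabicHashtags(String in) {
        return ARABIC_TAG.matcher(in).replaceAll("");
    }

    static String removeAnyHashtags(String in) {
        return ANY_TAG.matcher(in).replaceAll("");
    }

    public static void main(String[] args) {
        String tweet = "متوفرة في #متجر_هاي_كيدز_الالكتروني ..";
        // Stops at the first '_', so the rest of the tag is left behind.
        System.out.println(removeArabicHashtags(tweet));
        // Removes the whole tag, underscores included.
        System.out.println(removeAnyHashtags(tweet));
    }
}
```

Whitespace around a removed tag is left untouched, so you may want to collapse double spaces afterwards.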



The problem is that in the second line, the + applies to the hashtag sign and not to the Arabic characters. Fixed version:



in = in.replaceAll("[أ-ي]+#","");
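To see the difference, here is a small sketch (the class and method names are hypothetical, and the Arabic range is written with \uXXXX escapes only to avoid right-to-left rendering problems in the source). With the + after the #, the pattern matches exactly one Arabic letter followed by one or more # characters. With the + after the character class, it matches the whole run of letters in front of the #.

```java
// Hypothetical demo class; the names are illustrative only.
final class QuantifierDemo {
    // Same range as [أ-ي]: U+0623 (أ) through U+064A (ي).
    static final String ARABIC = "[\\u0623-\\u064A]";

    // Original version: '+' binds to '#', so only ONE letter before the '#' is matched.
    static String plusOnHash(String in) {
        return in.replaceAll(ARABIC + "#+", "");
    }

    // Fixed version: '+' binds to the character class, so the whole word is matched.
    static String plusOnLetters(String in) {
        return in.replaceAll(ARABIC + "+#", "");
    }

    public static void main(String[] args) {
        // The letters س خ ي ف followed by '#', in logical (in-memory) order.
        String sample = "\u0633\u062E\u064A\u0641#";
        // Only the last letter and the '#' are removed; three letters remain.
        System.out.println(plusOnHash(sample).length());
        // The whole word and the '#' are removed; nothing remains.
        System.out.println(plusOnLetters(sample).length());
    }
}
```

Both patterns only apply when the # follows the letters in the string's logical order. A tag stored as # followed by letters is handled by the third line of the original method instead.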

      
