使用Java實現(xiàn)查找并移除字符串中的Emoji
I. Background
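- Emoji are special characters in the Unicode character set (UTF-8 is just one encoding of it). Most basic emoji are assigned to two ranges in Plane 1 (the Supplementary Multilingual Plane) of the Unicode code table, U+1F300–U+1F6FF and U+1F900–U+1FAFF. Because these code points lie above U+FFFF, each one occupies 2 chars (a UTF-16 surrogate pair) in a Java String.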
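- Skin tone modifiers: most people-related emoji are yellow by default, so five new code points were later introduced as modifiers: U+1F3FB, U+1F3FC, U+1F3FD, U+1F3FE and U+1F3FF. Appending a skin tone modifier to an existing emoji produces a new variant: U+1F44B (👋) + U+1F3FD = 👋🏽
- Variation selectors and combining marks: an ordinary character followed by one or more variation selectors or combining marks forms an emoji:
  U+25C0 + U+FE0F = ◀️
  U+27A1 + U+FE0F = ➡️
  1 + U+FE0F + U+20E3 = 1️⃣
- National flags: each flag is made of 2 regional indicator symbols from the range U+1F1E6–U+1F1FF, so a flag is equivalent to 2 ordinary emoji characters from that range: U+1F1E8 + U+1F1F3 = 🇨🇳
- Zero-width joiner (ZWJ): several basic emoji joined by the zero-width joiner (U+200D) form a single complex emoji. Two, three or four emoji can be chained this way: emoji + U+200D + emoji, emoji + U+200D + emoji + U+200D + emoji, and so on.
- Tag sequences: a base emoji followed by several tag characters (U+E0020–U+E007E) and terminated by CANCEL TAG (U+E007F) forms a complex emoji: U+1F3F4 (🏴) + U+E0067 + U+E0062 + U+E0065 + U+E006E + U+E0067 + U+E007F = the flag of England (the tag characters spell "gbeng")
- Special symbols: these consist of a single char, and some of them are treated as emoji in certain environments.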
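The structures listed above can be checked directly in Java. The sketch below (the class EmojiAnatomy and its methods are hypothetical, written only for this illustration) builds one sample of each kind from its code points with the String(int[] codePoints, int offset, int count) constructor and prints how many UTF-16 chars and code points it contains. The ZWJ sample, man + woman + girl, is an assumed example of a family emoji.

```java
import java.util.StringJoiner;

// Hypothetical helper class for illustration; not part of the article's EmojiUtil
class EmojiAnatomy {

    // Builds a String from code points, like the commented-out examples in EmojiUtil
    static String of(int... codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    // Lists the code points of s as "U+XXXX", separated by spaces
    static String dump(String s) {
        StringJoiner joiner = new StringJoiner(" ");
        s.codePoints().forEach(cp -> joiner.add(String.format("U+%04X", cp)));
        return joiner.toString();
    }

    public static void main(String[] args) {
        String[][] samples = {
                {"skin tone", of(0x1F44B, 0x1F3FD)},
                {"keycap", of(0x31, 0xFE0F, 0x20E3)},
                {"flag", of(0x1F1E8, 0x1F1F3)},
                {"ZWJ", of(0x1F468, 0x200D, 0x1F469, 0x200D, 0x1F467)},
                {"tag", of(0x1F3F4, 0xE0067, 0xE0062, 0xE0065, 0xE006E, 0xE0067, 0xE007F)},
        };
        for (String[] sample : samples) {
            String s = sample[1];
            // length() counts UTF-16 chars; codePointCount counts Unicode code points
            System.out.printf("%-9s chars=%d codePoints=%d  %s%n",
                    sample[0], s.length(), s.codePointCount(0, s.length()), dump(s));
        }
    }
}
```

Run as is, it reports 4 chars / 2 code points for the skin tone and flag samples, 3 / 3 for the keycap, 8 / 5 for the ZWJ sample and 14 / 7 for the tag sequence, which is why code that works on chars has to step over supplementary-plane emoji 2 chars at a time.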
Note that Unicode only defines the mapping from code points to emoji; it does not define what the emoji look like, so each emoji font is free to design its own glyphs.
II. Solution
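Apart from the single-character special symbols, every emoji is at least 2 chars long, so the code first looks at the second char to decide whether the current position starts an emoji. Character.UnicodeBlock.of classifies that second char: a low surrogate, or a char from a variation-selector or combining-mark block, marks the start of an emoji. Character.getType is used to classify single chars. The second-char check is a heuristic: it also matches non-emoji text such as supplementary-plane CJK ideographs (CJK Extension B) or a letter followed by a combining accent.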
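A minimal sketch of the second-char test, assuming the same six blocks as isEmojiDecorateBlock in the complete code; SecondCharCheck and startsEmoji are hypothetical names. The last two samples show inputs that pass the test without being emoji.

```java
// Hypothetical class for illustration; the block list copies isEmojiDecorateBlock from the article
class SecondCharCheck {

    static boolean isDecorateBlock(Character.UnicodeBlock block) {
        return Character.UnicodeBlock.VARIATION_SELECTORS.equals(block)
                || Character.UnicodeBlock.VARIATION_SELECTORS_SUPPLEMENT.equals(block)
                || Character.UnicodeBlock.COMBINING_HALF_MARKS.equals(block)
                || Character.UnicodeBlock.COMBINING_MARKS_FOR_SYMBOLS.equals(block)
                || Character.UnicodeBlock.COMBINING_DIACRITICAL_MARKS.equals(block)
                || Character.UnicodeBlock.COMBINING_DIACRITICAL_MARKS_SUPPLEMENT.equals(block);
    }

    // True if chars i and i + 1 would be treated as the start of an emoji
    static boolean startsEmoji(CharSequence s, int i) {
        if (i + 1 >= s.length()) {
            return false;
        }
        Character.UnicodeBlock block = Character.UnicodeBlock.of(s.charAt(i + 1));
        return Character.UnicodeBlock.LOW_SURROGATES.equals(block) || isDecorateBlock(block);
    }

    public static void main(String[] args) {
        System.out.println(startsEmoji("\uD83D\uDE00", 0)); // U+1F600 as a surrogate pair: true
        System.out.println(startsEmoji("\u25C0\uFE0F", 0)); // U+25C0 + VS16: true
        System.out.println(startsEmoji("ab", 0));           // plain letters: false
        System.out.println(startsEmoji("e\u0301", 0));      // e + combining acute accent: true (not an emoji)
        System.out.println(startsEmoji(new String(Character.toChars(0x20000)), 0)); // CJK Ext. B ideograph: true (not an emoji)
    }
}
```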
通過第二個字符類型判斷當前2個字符為 Emoji 后: 1)判斷是否有后續(xù)修飾 2)判斷處理國旗類型;判斷處理膚色修飾;判斷處理 Emoji 序列標簽;判斷處理零寬度連接符;判斷處理連續(xù)變體、組合標識;按照普通 Emoji 處理;
處理單字符的特殊符號,這一類型內(nèi)有的屬于 Emoji,有的不是,目前全部簡單的按照普通 Emoji 處理;
III. Complete Code
package com.zpf.tool;

import java.util.List;

public class EmojiUtil {

    // Regional indicator symbols U+1F1E6..U+1F1FF; two of them form a national flag
    public static boolean isEmojiNationalFlag(int codePoint) {
        return codePoint >= 127462 && codePoint <= 127487;
    }

    // Skin tone modifiers U+1F3FB..U+1F3FF
    // String str = new String(new int[]{0x1F44B, 0x1F3FD}, 0, 2);
    public static boolean isEmojiSkinColor(int codePoint) {
        return codePoint >= 127995 && codePoint <= 127999;
    }

    // CANCEL TAG U+E007F, which terminates a tag sequence
    // String str = new String(new int[]{0x1F3F4, 0xE0067, 0xE0062, 0xE0065, 0xE006E, 0xE0067, 0xE007F}, 0, 7);
    public static boolean isEmojiTagEnd(int codePoint) {
        return codePoint == 917631;
    }

    // Tag characters U+E0020..U+E007E
    public static boolean isEmojiTagSpec(int codePoint) {
        return codePoint >= 917536 && codePoint <= 917630;
    }

    // Blocks whose chars modify the preceding character (variation selectors, combining marks)
    public static boolean isEmojiDecorateBlock(Character.UnicodeBlock block) {
        if (block == null) {
            return false;
        }
        return block.equals(Character.UnicodeBlock.VARIATION_SELECTORS)
                || block.equals(Character.UnicodeBlock.VARIATION_SELECTORS_SUPPLEMENT)
                || block.equals(Character.UnicodeBlock.COMBINING_HALF_MARKS)
                || block.equals(Character.UnicodeBlock.COMBINING_MARKS_FOR_SYMBOLS)
                || block.equals(Character.UnicodeBlock.COMBINING_DIACRITICAL_MARKS)
                || block.equals(Character.UnicodeBlock.COMBINING_DIACRITICAL_MARKS_SUPPLEMENT);
    }

    // Splits data into emoji (added to emojiList) and the remaining text (written to removeResult).
    // Either output may be null, but not both.
    public static void pickAllEmoji(CharSequence data, StringBuilder removeResult, List<String> emojiList) {
        if (removeResult == null && emojiList == null) {
            return;
        }
        if (removeResult != null) {
            removeResult.delete(0, removeResult.length());
        }
        if (emojiList != null) {
            emojiList.clear();
        }
        if (data == null || data.length() == 0) {
            return;
        }
        StringBuilder emojiBuilder = new StringBuilder();
        int i = 0;
        int j;
        Character.UnicodeBlock block;
        while (i < data.length()) {
            if (i + 1 < data.length()) {
                // A low surrogate or a modifier in the second position: chars i and i + 1 start an emoji
                block = Character.UnicodeBlock.of(data.charAt(i + 1));
                if (isEmojiDecorateBlock(block) || Character.UnicodeBlock.LOW_SURROGATES.equals(block)) {
                    if (i + 2 >= data.length()) { // the emoji is the last 2 chars of the input
                        emojiBuilder.append(data, i, i + 2);
                        break;
                    }
                    j = handleNationalFlag(data, i, emojiBuilder, emojiList);
                    if (i != j) {
                        i = j;
                        continue;
                    }
                    j = handleHumanSkin(data, i, emojiBuilder, emojiList);
                    if (i != j) {
                        i = j;
                        continue;
                    }
                    j = handleTagSequence(data, i, emojiBuilder, emojiList);
                    if (i != j) {
                        i = j;
                        continue;
                    }
                    // Ordinary emoji, possibly followed by modifiers or a ZWJ
                    emojiBuilder.append(data, i, i + 2);
                    i = handleNextChar(data, i + 2, emojiBuilder, emojiList);
                    continue;
                }
            }
            recordEmoji(emojiBuilder, emojiList); // flush an emoji left open by a trailing ZWJ
            int type = Character.getType(data.charAt(i));
            if (type == (int) Character.OTHER_SYMBOL) { // every single-char special symbol is treated as an emoji
                if (emojiList != null) {
                    emojiList.add(String.valueOf(data.charAt(i)));
                }
            } else if (removeResult != null) {
                removeResult.append(data.charAt(i));
            }
            i++;
        }
        recordEmoji(emojiBuilder, emojiList);
    }

    // Consumes variation selectors / combining marks after an emoji. A following ZWJ keeps the
    // emoji open so the next emoji is appended to it; otherwise the emoji is recorded here,
    // so two adjacent emoji always end up as two separate list entries.
    private static int handleNextChar(CharSequence data, int i, StringBuilder emojiBuilder, List<String> emojiList) {
        int j = i;
        while (j < data.length() && isEmojiDecorateBlock(Character.UnicodeBlock.of(data.charAt(j)))) {
            emojiBuilder.append(data.charAt(j));
            j++;
        }
        if (j < data.length() && data.charAt(j) == '\u200D') { // zero-width joiner
            emojiBuilder.append(data.charAt(j));
            return j + 1;
        }
        recordEmoji(emojiBuilder, emojiList);
        return j;
    }

    // National flags: two regional indicators (4 chars) form one flag
    private static int handleNationalFlag(CharSequence data, int i, StringBuilder emojiBuilder, List<String> emojiList) {
        if (!isEmojiNationalFlag(Character.codePointAt(data, i))) {
            return i;
        }
        recordEmoji(emojiBuilder, emojiList); // commit anything still pending
        if (i + 3 < data.length() && isEmojiNationalFlag(Character.codePointAt(data, i + 2))) {
            emojiBuilder.append(data, i, i + 4);
            recordEmoji(emojiBuilder, emojiList);
            return i + 4; // skip exactly the 4 chars of the flag
        }
        emojiBuilder.append(data, i, i + 2); // lone regional indicator
        recordEmoji(emojiBuilder, emojiList);
        return i + 2;
    }

    // Skin tone: base emoji + modifier, possibly continued by a ZWJ
    private static int handleHumanSkin(CharSequence data, int i, StringBuilder emojiBuilder, List<String> emojiList) {
        if (i + 3 >= data.length()) {
            return i;
        }
        if (!isEmojiSkinColor(Character.codePointAt(data, i + 2))) {
            return i;
        }
        emojiBuilder.append(data, i, i + 4);
        return handleNextChar(data, i + 4, emojiBuilder, emojiList);
    }

    // Tag sequence: base emoji + tag characters + CANCEL TAG
    private static int handleTagSequence(CharSequence data, int i, StringBuilder emojiBuilder, List<String> emojiList) {
        if (i + 3 >= data.length()) {
            return i;
        }
        int codePoint = Character.codePointAt(data, i + 2);
        if (isEmojiTagSpec(codePoint)) {
            emojiBuilder.append(data, i, i + 4); // base emoji + first tag
            i = i + 4;
            while (i < data.length()) {
                codePoint = Character.codePointAt(data, i);
                if (isEmojiTagSpec(codePoint)) {
                    emojiBuilder.append(data, i, i + 2);
                    i = i + 2;
                } else if (isEmojiTagEnd(codePoint)) {
                    emojiBuilder.append(data, i, i + 2);
                    recordEmoji(emojiBuilder, emojiList);
                    i = i + 2;
                    break;
                } else {
                    break; // malformed: no CANCEL TAG
                }
            }
            // discards an unterminated sequence (no-op after a successful record)
            emojiBuilder.delete(0, emojiBuilder.length());
        } else if (isEmojiTagEnd(codePoint)) { // base emoji followed directly by CANCEL TAG
            emojiBuilder.append(data, i, i + 4);
            recordEmoji(emojiBuilder, emojiList);
            i = i + 4;
        }
        return i;
    }

    // Moves the pending emoji (if any) into emojiList and clears the builder
    private static void recordEmoji(StringBuilder builder, List<String> emojiList) {
        if (builder != null && builder.length() > 0) {
            if (emojiList != null) {
                emojiList.add(builder.toString());
            }
            builder.delete(0, builder.length());
        }
    }
}
以上就是使用Java實現(xiàn)查找并移除字符串中的Emoji的詳細內(nèi)容,更多關(guān)于Java查找并移除字符串中Emoji的資料請關(guān)注腳本之家其它相關(guān)文章!
相關(guān)文章
SpringBoot實現(xiàn)異步事件驅(qū)動的方法
本文主要介紹了SpringBoot實現(xiàn)異步事件驅(qū)動的方法,文中通過示例代碼介紹的非常詳細,對大家的學(xué)習(xí)或者工作具有一定的參考學(xué)習(xí)價值,需要的朋友們下面隨著小編來一起學(xué)習(xí)學(xué)習(xí)吧2021-06-06mybatis中Oracle參數(shù)為NULL錯誤問題及解決
這篇文章主要介紹了mybatis中Oracle參數(shù)為NULL錯誤問題及解決,具有很好的參考價值,希望對大家有所幫助。如有錯誤或未考慮完全的地方,望不吝賜教2022-12-12Shiro實現(xiàn)session限制登錄數(shù)量踢人下線功能
這篇文章主要介紹了Shiro實現(xiàn)session限制登錄數(shù)量踢人下線,本文記錄的是shiro采用session作為登錄方案時,對用戶進行限制數(shù)量登錄,以及剔除下線,需要的朋友可以參考下2023-11-11詳解Spring Cloud Consul 實現(xiàn)服務(wù)注冊和發(fā)現(xiàn)
這篇文章主要介紹了Spring Cloud Consul 實現(xiàn)服務(wù)注冊和發(fā)現(xiàn),小編覺得挺不錯的,現(xiàn)在分享給大家,也給大家做個參考。一起跟隨小編過來看看吧2018-03-03Android?Studio中創(chuàng)建java工程的完整步驟
Android?Studio創(chuàng)建java工程是非常麻煩的,因為Android?Studio沒有提供直接創(chuàng)建java工程的方法,下面這篇文章主要給大家介紹了關(guān)于Android?Studio中創(chuàng)建java工程的完整步驟,需要的朋友可以參考下2024-01-01