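Problem description:
A colleague recently hit the error below while using the iconv-go library at work to convert text between character encodings. They asked me about it, so I took the chance to study the library myself.
The error message was: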
invalid or incomplete multibyte or wide character
The Go conversion library in use was:
github.com/djimenez/iconv-go
The function called was:
body, err = iconv.ConvertString(body, "GBK", "utf-8")
Approach to a solution:
Open github.com/djimenez/iconv-go and read through the source.
First, iconv.ConvertString is implemented in iconv.go:
func ConvertString(input string, fromEncoding string, toEncoding string) (output string, err error) {
	// create a temporary converter
	converter, err := NewConverter(fromEncoding, toEncoding)

	if err == nil {
		// convert the string
		output, err = converter.ConvertString(input)

		// close the converter
		converter.Close()
	}

	return
}
From the code above, we can see that it calls
NewConverter(fromEncoding, toEncoding)
to create a new Converter struct, and then calls this method on that struct:
output, err = converter.ConvertString(input)
Tracing further, the Converter struct and its constructor are found in converter.go:
type Converter struct {
	context C.iconv_t
	open    bool
}

// Initialize a new Converter. If fromEncoding or toEncoding are not supported by
// iconv then an EINVAL error will be returned. An ENOMEM error maybe returned if
// there is not enough memory to initialize an iconv descriptor
func NewConverter(fromEncoding string, toEncoding string) (converter *Converter, err error) {
	converter = new(Converter)

	// convert to C strings
	toEncodingC := C.CString(toEncoding)
	fromEncodingC := C.CString(fromEncoding)

	// open an iconv descriptor
	converter.context, err = C.iconv_open(toEncodingC, fromEncodingC)

	// free the C Strings
	C.free(unsafe.Pointer(toEncodingC))
	C.free(unsafe.Pointer(fromEncodingC))

	// check err
	if err == nil {
		// no error, mark the context as open
		converter.open = true
	}

	return
}
As you can see, the conversion is ultimately done by the C iconv library, which the package calls through cgo:
converter.context, err = C.iconv_open(toEncodingC, fromEncodingC)
The C library documentation (man iconv_open) says the following in its DESCRIPTION section:
The empty encoding name "" is equivalent to "char": it denotes the locale dependent character encoding.
When the string "//TRANSLIT" is appended to tocode, transliteration is activated. This means that when a character cannot be represented in the target character set, it can be approximated through one or several similarly looking characters.
When the string "//IGNORE" is appended to tocode, characters that cannot be represented in the target character set will be silently discarded.
The resulting conversion descriptor can be used with iconv any number of times. It remains valid until deallocated using iconv_close.
A conversion descriptor contains a conversion state. After creation using iconv_open, the state is in the initial state. Using iconv modifies the descriptor's conversion state. (This implies that a conversion descriptor can not be used in multiple threads simultaneously.) To bring the state back to the initial state, use iconv with NULL as inbuf argument.
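The key sentence is this one:
When the string "//IGNORE" is appended to tocode, characters that cannot be represented in the target character set will be silently discarded.
In other words, appending "//IGNORE" to tocode makes iconv skip any character it cannot represent in the target encoding instead of failing, which is exactly what I wanted. Keep in mind that the skipped characters are dropped from the output without any warning.
Recall the chain of calls: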
ConvertString(input string, fromEncoding string, toEncoding string)
NewConverter(fromEncoding string, toEncoding string) (converter *Converter, err error)
C.iconv_open(toEncodingC, fromEncodingC)
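Since toEncoding is handed to iconv_open unchanged, we only need to append //IGNORE to it for the option to reach the C library.
So the code becomes: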
body, err = iconv.ConvertString(body, "GBK", "utf-8//IGNORE")
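If the target encoding name comes from configuration rather than a literal, the suffix can be added in one place. This is a small sketch under that assumption: withIgnore is a hypothetical helper, not something iconv-go provides, and it only guards against appending //IGNORE twice. It makes no attempt to handle other suffixes such as //TRANSLIT.

```go
package main

import (
	"fmt"
	"strings"
)

// withIgnore returns toEncoding with "//IGNORE" appended, unless the name
// already ends with that suffix. Hypothetical helper for illustration only.
func withIgnore(toEncoding string) string {
	// Compare in upper case so an existing "//ignore" is not duplicated.
	if strings.HasSuffix(strings.ToUpper(toEncoding), "//IGNORE") {
		return toEncoding
	}
	return toEncoding + "//IGNORE"
}

func main() {
	fmt.Println(withIgnore("utf-8"))         // utf-8//IGNORE
	fmt.Println(withIgnore("utf-8//IGNORE")) // utf-8//IGNORE
}
```

The result would then be passed as the third argument, for example iconv.ConvertString(body, "GBK", withIgnore("utf-8")).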
After this change, testing showed that err was no longer returned. Problem solved.
To restate the solution:
body, err = iconv.ConvertString(body, "GBK", "utf-8//IGNORE")