日常开发中总会遇到一些下载文件,然后通过程序处理文件内容的功能,因此也会遇到,读取后的内容不能达到我们的要求,比如从csv文件中读取到的中文是乱码,处理方式如下
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41
| function getFileContent($file) {
$content = ''; $text = file_get_contents($file);
define('UTF32_BIG_ENDIAN_BOM', chr(0x00) . chr(0x00) . chr(0xFE) . chr(0xFF)); define('UTF32_LITTLE_ENDIAN_BOM', chr(0xFF) . chr(0xFE) . chr(0x00) . chr(0x00)); define('UTF16_BIG_ENDIAN_BOM', chr(0xFE) . chr(0xFF)); define('UTF16_LITTLE_ENDIAN_BOM', chr(0xFF) . chr(0xFE)); define('UTF8_BOM', chr(0xEF) . chr(0xBB) . chr(0xBF)); $first2 = substr($text, 0, 2); $first3 = substr($text, 0, 3); $first4 = substr($text, 0, 3); $encodType = ""; if (UTF8_BOM == $first3) { $encodType = 'UTF-8 BOM'; } else if (UTF32_BIG_ENDIAN_BOM == $first4) { $encodType = 'UTF-32BE'; } else if (UTF32_LITTLE_ENDIAN_BOM == $first4) { $encodType = 'UTF-32LE'; } else if (UTF16_BIG_ENDIAN_BOM == $first2) { $encodType = 'UTF-16BE'; } else if (UTF16_LITTLE_ENDIAN_BOM == $first2) { $encodType = 'UTF-16LE'; }
if ('' == $encodType) { $content = iconv("GBK", "UTF-8", $text); } else if ('UTF-8 BOM' == $encodType) { $content = $text; } else { $content = iconv($encodType, "UTF-8", $text); }
return $content; }
|
输入文件的路径,获取文件正确的中文内容,此函数适合于csv文件,其他文件雷同。
1 2
| $file = '/dd/ddd/ddd/ddd.csv'; getFileContent($file);
|