[ CnUnix ] in KIDS
By: belami (- 커피 -)
Date: Fri, Nov 10, 1995, 21:49:30 KST
Title: Broken Hangul mail (recovery program)

Below I am attaching a program that restores Hangul text whose MSB has been stripped off.
The post you put up was recovered almost entirely, apart from a few control codes.

/*-
 * usage: cureksc < infile > outfile
 * cureksc tries to recover an MSB stripped-off KSC5601 Hangul text file.
 * cureksc uses a simple heuristic and an automaton to guess the original characters.
 * cureksc only works with the KSC5601 Korean code, and STRONGLY assumes
 * the original text doesn't contain any Chinese or special characters.
 * $Log: cureksc.c,v $
 * Revision 1.1 1991/12/31 18:02:06 rhee
 * Initial revision
 *
 */
#include <stdio.h>
#include <assert.h>

int main(void)
{
    int c1;     /* previous char, MSB forced on (int, so EOF is distinct) */
    int c2;     /* current char, MSB forced on */
    int status;

    status = 0; /* status == 0: english */
    c1 = 0;     /* previous-char */
    while ((c2 = getchar()) != EOF) {
        c2 &= 0x007f;
        c2 |= 0x0080;
        switch (status) {
        case 0: /* english */
            if (c2 >= 0x00b0 && c2 <= 0x00c8) { /* possibly a hangul1 */
                status = 1;
            } else {
                status = 0;
                putchar(c2 & 0x007f);
            }
            break;
        case 1: /* possibly a hangul1 */
            if (c2 >= 0x00a1 && c2 <= 0x00fe) { /* possibly a hangul2 */
                status = 0; /* maybe hangul, flush */
                putchar(c1);
                putchar(c2);
            } else if (c2 >= 0x00b0 && c2 <= 0x00c8) {
                /* never reached: this range lies inside the hangul2 test above */
                status = 1;
                putchar(c1 & 0x007f);
            } else {
                status = 0;
                putchar(c1 & 0x007f);
                putchar(c2 & 0x007f);
            }
            break;
        default: /* error */
            assert(0);
        }
        c1 = c2;
    }
    if (status == 1) /* input ended on a held-back hangul1 candidate */
        putchar(c1 & 0x007f);
    return 0;
}
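To see what the two-state guess in cureksc does to concrete bytes, here is a minimal sketch that applies the same ranges to an in-memory buffer instead of stdin. The names cure_buffer and cure_selfcheck are hypothetical and not part of cureksc.c. The sample bytes assume that "한글" is encoded in KSC5601 as C7 D1 B1 DB, so the MSB-stripped form is the ASCII text "GQ1[".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of cureksc.c: runs the same two-state
 * guess over a buffer. out must have room for n bytes; returns the
 * number of bytes written. */
size_t cure_buffer(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t i, k = 0;
    int pending = 0;        /* 1 while a possible lead byte is held back */
    unsigned char lead = 0; /* the held-back byte, MSB forced on */

    for (i = 0; i < n; i++) {
        unsigned char c = (unsigned char)((in[i] & 0x7f) | 0x80);
        if (!pending) {
            if (c >= 0xb0 && c <= 0xc8) {   /* possibly a hangul1 */
                lead = c;
                pending = 1;
            } else {
                out[k++] = c & 0x7f;        /* plain ASCII */
            }
        } else {
            if (c >= 0xa1 && c <= 0xfe) {   /* possibly a hangul2 */
                out[k++] = lead;
                out[k++] = c;
            } else {                        /* guess failed, emit both as ASCII */
                out[k++] = lead & 0x7f;
                out[k++] = c & 0x7f;
            }
            pending = 0;
        }
    }
    if (pending)                            /* input ended on a held-back byte */
        out[k++] = lead & 0x7f;
    return k;
}

/* Hypothetical self-check showing one recovery and one false positive. */
void cure_selfcheck(void)
{
    const unsigned char stripped[] = { 0x47, 0x51, 0x31, 0x5b }; /* "GQ1[" */
    const unsigned char hangul[]   = { 0xc7, 0xd1, 0xb1, 0xdb };
    const unsigned char hi[]       = { 'H', 'i' };
    const unsigned char wrong[]    = { 0xc8, 0xe9 };
    unsigned char out[8];

    assert(cure_buffer(stripped, sizeof stripped, out) == 4);
    assert(memcmp(out, hangul, 4) == 0);

    /* 'H' (0x48) falls in the lead-byte range, so plain "Hi" is also
     * turned into a two-byte code. */
    assert(cure_buffer(hi, sizeof hi, out) == 2);
    assert(memcmp(out, wrong, 2) == 0);
}
```

The second check shows why the header comment warns so strongly about the input: any ASCII character from '0' to 'H' followed by a printable character is taken for a Hangul pair, so English words, digits, and uppercase letters in the stripped text can be damaged by the recovery.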