The small C and C++ Obfuscator
Beak is a small tool that helps your source code lose weight: by default it generates obfuscated C and C++ source code. It was originally developed to reduce the binary size of software running on embedded systems. All user-defined symbols are replaced by very short tokens, so the final compiled binary can be smaller than one built from the original source, for example where symbol names are stored in the binary's symbol table.
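How Beak chooses its short tokens is not described here. As a minimal sketch, assuming names are simply handed out in order starting from the shortest legal C identifiers, the hypothetical function `short_name` below maps a symbol's index to a name. The alphabets `FIRST` and `REST` are also assumptions; `_` is left out to avoid reserved identifiers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical alphabets. '_' is left out on purpose: C reserves several
   names that start with '_', and C++ reserves any name containing "__". */
static const char FIRST[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"; /* 52 */
static const char REST[]  = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"; /* 62 */

/* Writes the n-th short identifier (0-based) into buf and returns its length.
   All 52 one-character names come first, then the 52*62 two-character
   names, and so on. Assumes n is small enough that `block` cannot overflow. */
size_t short_name(unsigned long n, char *buf, size_t cap)
{
    size_t len = 1;
    unsigned long block = 52;          /* number of names of length len */

    while (n >= block) {               /* skip past all shorter names */
        n -= block;
        block *= 62;
        len++;
    }
    assert(cap > len);                 /* room for the name and '\0' */

    /* Fill from the right: the trailing len-1 characters are base-62 digits. */
    for (size_t i = len - 1; i > 0; i--) {
        buf[i] = REST[n % 62];
        n /= 62;
    }
    buf[0] = FIRST[n];                 /* what is left is below 52 */
    buf[len] = '\0';
    return len;
}
```

With 52 one-character names and 3,224 two-character names, a project with up to 3,276 renamed symbols needs at most two characters per name. Some of these short names, such as `do` and `if`, are C keywords, so a real tool has to skip any candidate that is a reserved token.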
Features
----------------------
a. Parses all macro, global variable, constant, structure and member, class, and function definitions, and replaces their names with very short tokens.
b. The generated source code keeps every line at its original position, so it can be compared with the original source file line by line.
c. All comments are kept in their original positions to preserve readability.
d. High performance: large projects containing hundreds of source files are parsed quickly.
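Features a to c can be illustrated together with a simplified scanner. The sketch below is an assumption about how such renaming could work, not Beak's actual implementation: `rename_identifiers` and `struct rename` are hypothetical names, and the rename table is supplied by the caller instead of being built by a parser. Comments, string and character literals, numbers, and newlines are copied unchanged, so every token stays on its original line. Raw strings, preprocessor directives, and other corner cases are not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rename table entry: original identifier -> short token. */
struct rename { const char *from; const char *to; };

/* Linear search of the table for the identifier id[0..len). */
static const char *lookup(const struct rename *map, size_t n,
                          const char *id, size_t len)
{
    for (size_t i = 0; i < n; i++)
        if (strlen(map[i].from) == len && strncmp(map[i].from, id, len) == 0)
            return map[i].to;
    return NULL;
}

/* Appends len bytes to out, keeping one byte free for the final '\0'. */
static void emit(char *out, size_t cap, size_t *pos, const char *s, size_t len)
{
    assert(*pos + len < cap);
    memcpy(out + *pos, s, len);
    *pos += len;
}

void rename_identifiers(const char *src, const struct rename *map, size_t n,
                        char *out, size_t cap)
{
    size_t pos = 0;
    const char *p = src;

    while (*p) {
        const char *start = p;
        if (p[0] == '/' && p[1] == '/') {            /* line comment: keep */
            while (*p && *p != '\n') p++;
        } else if (p[0] == '/' && p[1] == '*') {     /* block comment: keep */
            p += 2;
            while (*p && !(p[0] == '*' && p[1] == '/')) p++;
            if (*p) p += 2;
        } else if (*p == '"' || *p == '\'') {        /* string/char literal: keep */
            char quote = *p++;
            while (*p && *p != quote) {
                if (*p == '\\' && p[1]) p++;         /* skip escaped character */
                p++;
            }
            if (*p) p++;
        } else if (isdigit((unsigned char)*p)) {     /* number such as 0x1F: keep */
            while (isalnum((unsigned char)*p) || *p == '_' || *p == '.') p++;
        } else if (isalpha((unsigned char)*p) || *p == '_') {
            while (isalnum((unsigned char)*p) || *p == '_') p++;
            const char *to = lookup(map, n, start, (size_t)(p - start));
            if (to) {                                /* rename this identifier */
                emit(out, cap, &pos, to, strlen(to));
                continue;
            }
        } else {
            p++;                                     /* any other char, incl. '\n' */
        }
        emit(out, cap, &pos, start, (size_t)(p - start));
    }
    out[pos] = '\0';
}
```

For example, with the table `{"counter", "a"}, {"increment", "b"}`, the input line `int counter; /* counter */` becomes `int a; /* counter */`: the identifier is renamed while the comment is copied as-is.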
Results
----------------------
a. Smaller binaries.
b. Hidden API calls, implementation details, and techniques.
c. Unreadable source code, which is of little use to anyone who obtains it through a leak (a purpose similar to encryption).
d. More difficult reverse engineering (decompilation).
Supported Languages
----------------------
Beak was designed as a C and C++ obfuscator, and it also supports other programming languages that are suitable for embedded systems, including:
Ada, Lisp, Lua, Forth, Tcl, Basic, Erlang, Ruby, Rust, Python, JavaScript, Java, C#.
Beak can also handle the following common programming languages. For these, it can obfuscate source files, but in most situations it does not reduce binary size:
Asm, Asp, AWK, CMake, COBOL, Cuda, D, DosBatch, Eiffel, Fortran, F#, Go, Html, Matlab, Objective-C, OCaml, PHP, Pascal, Perl, Perl6, PostScript, Prolog, R, Rexx, Rst, SQL, Scheme, Shell, Slang, SystemVerilog, TypeScript, VHDL, Vera, Verilog.
Inside the C and C++ Obfuscator
----------------------------------
At first glance the obfuscator looks very simple, but its internals are complex. While scanning source files, Beak must decide for each parsed token whether it is a reserved token that has to be kept unchanged. It does so by querying a built-in token database and leaves every token found there untouched. The database was built by hand, which was very time-consuming work, as described below.
Content of the built-in token database:
The built-in token database of Beak includes several groups of tokens collected from different sources:
1: C and C++ keywords defined by the language standards, e.g. auto, break, case, const, default, do, while, else, ...
2: C and C++ predefined and commonly used preprocessor symbols, such as __cplusplus, DEBUG, NDEBUG, RELEASE, __MACH__, __x86_64, __PIC__, __SSE2__, __weak__attribute__, __GNUC__, __VERSION__, __INT_MAX__, _M_IX86, __INT16_MAX__, __APPLE__, __clang__, __OBJC__, __FILE__, __LINE__, __FUNCTION__, __DATE__, __TIME__, ...
3: Names (functions, macros, types, and variables) from standard and common libraries, including stdlib, glib, libcxx, openmp, libunwind, STL, Boost, etc.:
INT_MAX, INT_MIN, int32_t, stderr, atan, ceilf, atol, feof, fflush, fgetc, gets, ftell, memset, rand, qsort, strtok, vsprintf, tmpnam, wcscpy, time, tm, ...
4: Compiler-specific tokens.
5: System tokens used by the operating system.
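The "is this token reserved?" check can be sketched as a binary search over a sorted table. This assumes the database is a sorted array of strings, which the text does not specify; `RESERVED`, `is_reserved`, and `check_reserved_sorted` are hypothetical names, and the table holds only a few sample tokens from groups 1 to 3 above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical excerpt of a token database: a few entries from groups 1-3.
   bsearch() needs the table sorted in strcmp() order, so in ASCII uppercase
   names sort before "__" names, which sort before lowercase names. */
static const char *const RESERVED[] = {
    "INT_MAX", "NDEBUG", "__FILE__", "__LINE__", "__cplusplus",
    "auto", "break", "case", "const", "default", "do", "else",
    "int32_t", "memset", "qsort", "stderr", "tm", "while",
};
#define RESERVED_COUNT (sizeof RESERVED / sizeof RESERVED[0])

/* bsearch() passes the key first and the array element second. */
static int cmp_token(const void *key, const void *elem)
{
    return strcmp((const char *)key, *(const char *const *)elem);
}

/* Returns true if token must be left unchanged by the obfuscator. */
bool is_reserved(const char *token)
{
    return bsearch(token, RESERVED, RESERVED_COUNT, sizeof RESERVED[0],
                   cmp_token) != NULL;
}

/* Debug aid: aborts if an entry was added out of order. */
void check_reserved_sorted(void)
{
    for (size_t i = 1; i < RESERVED_COUNT; i++)
        assert(strcmp(RESERVED[i - 1], RESERVED[i]) < 0);
}
```

A renamer would call `is_reserved` on every identifier before renaming it, and also on each generated short name, so that a candidate such as `do` or `tm` is never emitted. Binary search keeps each lookup at O(log n) string comparisons without extra memory; a hash table would be a reasonable alternative for a database with many thousands of entries.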