Since just-in-time (JIT) has considerable overhead to detect hot spots and compile them at runtime, using sophisticated optimization techniques for embedded devices means that any resulting performance improvements will be limited. In this paper, we introduce a novel static Dalvik bytecode optimization framework, as a complementary compilation of the Dalvik virtual machine, to improve the performance of Android applications. Our system generates optimized Dalvik bytecodes by using Low Level Virtual Machine (LLVM). A major obstacle in using LLVM for optimizing Dalvik bytecodes is determining how to handle the high-level language features of the Dalvik bytecode in LLVM IR and how to optimize LLVM IR conforming to the language information of the Dalvik bytecode. To this end, we annotate the high-level language features of Dalvik bytecode to LLVM IR and successfully optimize Dalvik bytecodes through instruction selection processes. Our experimental results show that our system with JIT improves the performance of Android applications by up to 6.08 times, and surpasses JIT by up to 4.34 times.
Mobile devices have limited processing power, memory, and battery life; thus, optimizing mobile applications for better performance is critical for their successful deployment
. However, since optimization requires high-level technical expertise, it is a daunting task for an application developer. Therefore, automatic code optimization techniques are widely used to achieve high performance in applications for mobile devices.
An Android application is written in Java language for developer productivity and code mobility. On desktop and server environments, a Java program is compiled to Java bytecodes, which is an intermediate representation (IR) for Java Virtual Machine (JVM); JVM runs the Java bytecodes. A Java bytecode is based on a stack-based instruction set (hence the term “stack-based Java bytecode”) and has object-oriented features in Java language. On the other hand, Android applications written in Java language are run on Dalvik Virtual Machine (DVM)
, which is an optimized virtual machine for the Android platform. However, since the performance of stack-based Java bytecode on resource-limited mobile devices is poor due to slow interpretation, Android researchers have created a new bytecode set for DVM — Dalvik bytecode — to improve the performance of Android applications. In contrast to the Java bytecode, the Dalvik bytecode is based on a register-based instruction set; therefore, it can reduce the code size and running time
In terms of automatic code optimization, JVM researchers have devoted most of their efforts in developing just-in-time (JIT) compilers rather than in developing static Java compilers
. However, when applications run, considerable runtime overhead, a byproduct of JIT compilation, is incurred in the detection of hot spots and their subsequent transformation into machine codes
. Moreover, since JIT compilation has additional processing and memory space overhead, DVM JIT
adopts only simple optimization techniques that have low overhead. Optimization techniques implemented in Dalvik JIT can be classified into the following three classes: local optimization, global optimization, and simple peephole optimization. The local optimization techniques in Dalvik JIT are constant propagation, register allocation, redundant load/store elimination, redundant null-check elimination, and heuristic scheduling such as load hoisting and store sinking. The global optimization techniques include redundant branch elimination and induction variable optimization.
Peephole optimization performs power-reduction in multiplication and division operations. Due to the high runtime overhead, more sophisticated optimization techniques such as aggressive loop optimization and peephole optimization are not implemented in DVM JIT. In addition, the scope of
is limited only to string, math, and atomic operations.
In Android 4.4, ART runtime
was introduced, which is a new Android runtime to boost the performance of Android devices based on ahead-of-time compilation. ART runtime precompiles Dalvik bytecode to binary during installation of an application, and the resulting binary code is then executed directly at runtime of an application. However, this scheme requires considerable install time of an application and more space and memory footprint than existing Dalvik bytecode.
Previous studies to improve the performance of Android applications can be largely classified into two areas — those that adopt complementary compilations of DVM
and those that adopt modifications of DVM
. However, previous studies have yet to adequately apply their optimization techniques to new versions of DVM because their techniques adopt the unique compilation on a particular version of DVM and a special hardware support. With regard to the first of these issues, since the Android platform is under active development, it requires a lot of effort to maintain the various modifications of DVM when a new version becomes available. Moreover, because Android applications are composed of Dalvik bytecodes, previous transformation techniques for Java codes are not helpful in optimizing Android applications
. With regard to the second of these issues, development of a new optimization system
requires considerable efforts on the part of developers to develop and verify any such new systems; thus, it is costly in terms of time and effort. From this perspective, using a mature compiler infrastructure, such as Low Level Virtual Machine (LLVM)
, has clear advantages in terms of development efforts and maturity. While previous studies
also used mature compiler infrastructures to transform Java programs to machine codes, they are not considered for execution on mobile devices due to their high resource requirements.
In this paper, we introduce a novel static Dalvik bytecode optimization system to complement the DVM. We elaborate on optimizations to transform the Dalvik bytecodes into optimized Dalvik bytecodes. Since the final output of our system is in that of a Dalvik bytecode format, we can improve the performance of Android applications without any runtime overhead, while maintaining code mobility. We make the following contributions in this paper:
The remainder of this paper is organized as follows. In Section II, we describe the design and implementation of our static Dalvik bytecode optimization framework. Experimental results are shown in Section III. In Section IV, related work is described. Finally, in Section V, we conclude the paper.
■ We find that there are further opportunities to optimize Dalvik bytecodes in Android applications.
■ We design and implement a static Dalvik bytecode optimization framework by exploiting optimization passes in LLVM. We discuss the challenges involved in handling high-level language features in LLVM and determine how to optimize the Dalvik bytecode.
■ We evaluate our Dalvik bytecode optimization framework by using several benchmarks and show its effectiveness.
II. Static Dalvik Bytecode Optimization
In this section, we discuss how our framework optimizes the Dalvik bytecode while preserving the high-level language features of the bytecode itself. Our system at first transforms Dalvik bytecode to LLVM IR and statically optimizes the transformed LLVM IR. It then finally generates the optimized Dalvik bytecode from the optimized LLVM IR. For an explanation, we use the sample code in
and show the optimized code in
- 1. Motivating Example
illustrates the motivating example of our research. The Java code shown in
is transformed into the Java bytecode shown in
and again into the Dalvik bytecode shown in
. Since the summation statement,
, shown in dark gray is invariant regardless of the loop iterations, it can be removed from the loop represented by the dotted box to avoid unnecessary computation. This optimization technique is called loop-invariant code motion — a widely used compiler optimization technique. Lines 4 to 7 in
and Line 9 in
are Java bytecode and Dalvik bytecode corresponding to Line 6 in
, respectively. As shown in
, although the Java code in
is compiled by the Java compiler and transformed by the Dx tool
, we find that the loop-invariant statement,
add-int/ 2addr v3
, in the dark gray box remains inside the loop; consequently, this statement becomes a redundant operation, directly decreasing program performance.
To measure how redundant statements decrease the performance, we run the motivating example in
on an Android device. When we manually optimized the code of
by removing the loop-invariant code, its performance improvement is 21% in only-DVM interpretation and 90% in JIT-enabled DVM interpretation (even though the resulting sample code is relatively simple). This demonstrates that performance can significantly be improved by performing static aggressive optimization before JIT compilation. Since static optimization can perform aggressive optimization techniques without any runtime overhead, it is beneficial in particular to Android mobile devices with limited processing power, memory, and battery.
Motivating example: (a) Java code, (b) Java bytecode, and (c) Dalvik bytecode.
- 2. Comparison between Dalvik Bytecode and LLVM IR
Our Dalvik bytecode optimization system exploits LLVM for extensive static optimization of the Dalvik bytecode as a complementary compilation for DVM. LLVM
is a mature compiler infrastructure that uses a language-independent low-level instruction set, called LLVM IR. LLVM front-end generates the LLVM IR code from application code. After the LLVM optimizer optimizes the LLVM IR code in a language-independent way using aggressive optimization passes, the LLVM back-end optimizes the LLVM IR code in a target architecture–dependent way; finally, the machine codes of the target architecture are generated. Therefore, resolving the impedance mismatches between the high-level Dalvik bytecode and the low-level LLVM IR code is challenging when attempting to create a static optimization system for Dalvik bytecode using LLVM.
To understand the characteristics of the Dalvik bytecode and LLVM IR, we compare the Dalvik bytecode, LLVM IR, and the machine code in
. An Android application written in Java language is compiled to .
file (Java bytecode) by the Java compiler. The Dx tool transforms the .
file to a Dalvik executable file (.
) that is composed of Dalvik bytecodes. During transformation, the Dx tool performs simple optimizations including register allocation, dead code elimination, constant propagation, and constant folding
. In contrast to the LLVM IR, the Dalvik bytecode uses an infinite number of virtual registers. A virtual register’s type is determined by the mnemonic code of the Dalvik bytecode that uses it. Moreover, the Dalvik bytecode has high-level language features that are derived from the Java program to describe class information and class inheritance. However, similar to the IR of most compilers, LLVM IR does not have such high-level language features. LLVM includes only four derived types — pointer, array, structure, and function
. High-level data types are represented as a combination of these four types. These four derived types are used in complicated language-independent analyses and optimizations. The register of LLVM IRs is already set according to an instruction and is in static single assignment (SSA) form to facilitate analyses and optimization passes.
Comparison between Dalvik bytecode, LLVM IR, and machine code.
| ||Dalvik bytecode ||LLVM IR ||Machine code |
|Class information ||Yes ||No ||No |
|SSA form ||Yes (conditional) ||Yes ||No |
|Register type ||Dynamic ||Static ||Static |
- 3. Translation to LLVM IR
When we regenerate the Dalvik bytecode from LLVM IR code, we should preserve the high-level language features of the original Dalvik bytecode. The beginning portion of Dalvik bytecode is composed of several constant pool sections, including string ids, type ids, method ids, class definitions, and word data. These sections represent strings; types of variables and methods; methods of classes; class information; and data in classes. To handle the impedance mismatch between Dalvik bytecode and LLVM IR code described in Section II-2, we annotate these high-level language features of the Dalvik bytecode into LLVM IR code using metadata, and create the intrinsic functions for the Dalvik instructions, which do not have any direct correspondence in LLVM IR code
. In the LLVM, intrinsic functions are used to add new fundamental types and new instructions to extend LLVM IR
. In our system, since an LLVM transformation pass does not change the metadata, an intrinsic function maintains the metadata for high-level language features, such as class information and inheritance, during optimization and code regeneration.
Our system first parses the Dalvik bytecode and then generates metadata from the constant pool sections. The metadata of a method is described as a sequence of the method name, access flag, return type, parameter count, and parameter type. If a Dalvik bytecode instruction has a corresponding LLVM IR instruction, then our system generates the LLVM IR instruction as shown in
for the Dalvik bytecode instruction shown in
. Otherwise, it generates an intrinsic function
— the name of which is the same as the Dalvik bytecode instruction. For example, since “aput” instruction at Line 10 in
, which stores a register value to a given array element with given array index, has no corresponding LLVM IR instruction, an intrinsic function “void @llvm.Dalvik.aput” underlined at Line 15 in
is generated. In LLVM optimization passes, an intrinsic function is treated as an unanalyzable function. In this way, 43 Dalvik bytecode instructions among 256 bytecode instructions are translated to intrinsic functions: for example, “aput, aget, sput, sget,” and so on.
LLVM IR code of motivating example.
shows the metadata translated from the Dalvik bytecode shown in
through the front-end compiler of our system. Each metadata in a dotted box includes variables and methods in a class as follows:
For example, in “metadata !14” at Line 27 of
, the first argument, “loopinvariant,” describes the method name. The second argument is an access flag of the “loopinvariant” method, where “i32 9” represents public static method whose type is determined in the DVM library. The third argument is the return type of the method, where “i32 0” indicates the first argument (!1) of “llvm.dvk.types” at Line 2, and the metadata “I” at Line 14 (!1) refers to an integer type. The fourth argument is the number of parameters, and the fifth and sixth arguments are the presented parameter types. Since the fourth, fifth, and sixth arguments are “i32 2,” “i32 0,” and “i32 0,” respectively, the number of parameters is two, and the type of the two parameters is integer, as described above.
■ Types of string, variables, and methods in the Dalvik bytecode (from “metadata !0” at Line 13 to “metadata !11” at Line 24).
■ Class information (“metadata !12” at Line 25).
■ Methods information (from “metadata !13” at Line 26 and “metadata !14” at Line 27).
Metadata in LLVM IR code.
- 4. Optimization and Generation of Dalvik Bytecode
An LLVM static optimizer
optimizes the LLVM IR code generated by our front-end compiler, in both a target-dependent way and a target-independent way.
illustrates how our static Dalvik bytecode optimization improves the code quality of Android applications, where the gray shaded boxes indicate our optimization components. The “Dalvik bytecode instruction selection” phase optimizes LLVM IR code using Dalvik bytecode descriptions to select instructions and operands target-dependently
. Dalvik bytecode descriptions consist of language characteristics for Dalvik bytecodes, including instruction and register information. In the “Dalvik bytecode instruction selection” phase, LLVM IR code is translated into an initial directed acyclic graph (DAG), in which the nodes specify the operations and operands of each instruction. Using an initial DAG helps LLVM to optimize the Dalvik bytecodes target-independently on a very low level. The “Optimize DAG” phase simplifies the initial DAG by eliminating redundancies exposed by the previous step, before and after the “Legalize DAG” phase. The “Legalize DAG” phase transforms the DAG to eliminate the types and operations that are not supported by the Dalvik VM. The “Instruction selection” phase for the DAG linearizes DAG into Dalvik bytecode instructions by using pattern matching. Then, the Dalvik bytecode is optimized using SSA-based optimization, including loop-invariant code motion, dead code elimination, common subexpression elimination, and peephole optimization. In the “Register allocation” stage, the number of virtual registers in SSA form is decreased to 15, which is the number of general-purpose registers in our target architecture, ARMv7. Subsequently, Dalvik bytecodes are optimized by final Dalvik bytecode-dependent optimization including peephole optimization. Finally, the optimized Dalvik bytecodes are printed out using the LLVM MC infrastructure.
Static Dalvik bytecode optimization.
shows the optimized LLVM IR code generated by our static optimization. In
, the redundant computation at Line 4, “%1 = add i32 %arg0, %arg1,” shown in the dark gray box was removed from the loop in the dotted box. Finally, the back-end of our system generates the optimized Dalvik bytecode, where the summation instruction is removed from the loop (as shown in
Optimized LLVM IR code of motivating example.
Optimized Dalvik bytecode of motivating example.
As mentioned above, our approach finally generates optimized Dalvik bytecode before it is executed in DVM. After starting the execution of the Dalvik bytecode, a typical DVM JIT compiler handles the reference types and garbage collection. Because of our aggressive optimization, our approach can give positive effects on dealing with these operations (reference types and garbage collection) during executing the Dalvik bytecode.
To evaluate our static Dalvik bytecode optimization system, we first ran four micro-benchmarks and two real benchmarks, which were downloaded from the Google application market (Benchmark Pi
, EmbeddedCaffeineMark 3.0
). We compared the scores of the benchmarks in four configurations — baseline, JIT, baseline with static optimization, and JIT with static optimization. In the baseline configuration, DVM performs only interpretation without either JIT or our static optimization. We use the baseline configuration as the baseline for measuring normalized performance improvement of other schemes. In the JIT configuration, DVM performs both the interpretation and JIT compilation. In the configurations with static optimization, the Dalvik bytecodes are statically optimized before running on DVM. In addition, we compare the performance of our static optimization scheme with other studies
. All experiments are performed on a reference phone, Galaxy Nexus
, running Android 4.0.1, with 1.2 GHz TI OMAP 4660 ARM Cortex-A9 dual core and 1 GB memory. In addition, to evaluate our system on another Android device, we performed EmbeddedCaffeineMark 3.0 on Galaxy S4, running Android 4.4.2, with 2.2 GHz Qualcomm Snapdragon 800 Krait 400 quad core and 2GB memory.
- 1. Performance of Static Dalvik Bytecode Optimization
We developed four micro benchmarks — loop-invariant code motion (
), loop splitting, loop unswitching, and partial redundancy elimination. Each micro benchmark is a program that can be optimized by the above-mentioned optimization techniques.
shows that our static optimization scheme significantly improves performance, where the performance measure is execution speed for each micro benchmark. As we expected, the best performance is achieved by using both the JIT compilation and static optimization. The average performance improvement in the micro benchmarks is 64%.
Normalized performance improvement of micro benchmarks.
For real benchmarks, we evaluated EmbeddedCaffeinemark 3.0
and Benchmark Pi
. These are popularly used benchmarks for testing the performance of DVM JIT
. First, Benchmark Pi calculates the ratio of the circumference of a circle to its diameter.
shows the normalized performance improvement of Benchmark Pi under the four configurations, where the performance measure is execution speed. The baseline with static optimization outperforms the baseline by 1.81 times and the JIT with static optimization outperforms the baseline by 2.13 times. The experimental results clearly show that our system significantly improves the performance of Android applications by statically optimizing the Dalvik bytecodes.
Normalized performance improvement of Benchmark Pi.
Second, EmbeddedCaffeineMark 3.0 is used for our evaluation. It is a subset of the complete CaffeineMark suite, including Sieve, Loop, Logic, Method, Float, and String
. First, Sieve is the classic Sieve of Eratosthenes (for prime numbers). Loop performs sorting and sequence generation for measuring the compiler optimization of loops. Logic measures the execution speed of decision-making instructions in a virtual machine. Method performs recursive function calls on a virtual machine. Float simulates a 3D rotation of objects around a point. Finally, String performs string concatenation, which is the operation of joining character strings end-to-end. The score is the number of executed instructions per second.
shows the scores of EmbeddedCaffeineMark 3.0 under the four configurations in a logarithmic scale. As expected, the benchmark scores are the best when we use both the JIT compilation and static optimization; normalized performance acceleration relative to the baseline is increased by up to 6.08 times. Also, interpreting with static optimization achieves performance improvement by up to 5.22 times on average. Therefore, when an application is statically optimized, the JIT-compiled performance is 4.34 times faster on average than that of no static optimization. Especially, in the case of the Logic benchmark in
, we can observe that our static optimization system, simultaneously with DVM JIT, improved the performance by about 10 times because of their efforts for more advanced optimization.
EmbeddedCaffeineMark 3.0 scores on Galaxy Nexus.
shows the average execution time of each benchmark, while each benchmark was run 10 times. EmbeddedCaffeineMark 3.0 cannot be measured individually, so the whole execution time of the benchmark is presented. In our experiment, each test is run for approximately the same length of time.
Average execution time of each benchmark.
| ||Baseline (s) ||JIT (s) ||Baseline with static optimization (s) ||JIT with static optimization (s) |
|Loop invariant ||7.65 ||5.24 ||6.28 ||4.02 |
|Loop splitting ||10.98 ||8.54 ||8.27 ||6.53 |
|Loop unswitching ||9.87 ||7.45 ||8.04 ||5.06 |
|Partial redundancy elimination ||6.16 ||6.01 ||6.07 ||6.01 |
|Benchmark Pi ||0.66 ||0.41 ||0.34 ||0.29 |
|EmbeddedCaffeineMark 3.0 ||23.80 |
To ensure that our experimental results are impartial to diverse hardware features, performed, another experiment with EmbeddedCaffineMark 3.0 on Galaxy S4, and showed the results in
, as expected, similar results to those of
were achieved, and we can see that the benchmark scores are also the best when we use both the JIT compilation and static optimization; normalized performance acceleration relative to the baseline is increased by up to 3.17 times on average. Moreover, interpreting with static optimization achieves performance improvement by up to 2.87 times on average. Therefore, when an application is statically optimized, the JIT-compiled performance is 2.94 times faster on average than that of no static optimization. This experimental result clearly shows that our system can improve the performance of Android applications regardless of the type of hardware.
EmbeddedCaffeineMark 3.0 scores on Galaxy S4.
- 2. Comparison with Other Optimization Schemes
We compared the performance of our static Dalvik bytecode optimization scheme with the Java bytecode level optimizer
and another complementary compilation scheme for Android applications
is a Java optimization framework developed at McGill University. It can be used to optimize and analyze class files at a Java bytecode level . For fair comparison, we statically optimized Java bytecodes of benchmarks, such as Benchmark Pi and EmbeddedCaffeineMark 3.0, by using the latest version of Soot with highest optimization options, enabling intra-procedural and whole program optimization. The optimized Java bytecodes are subsequently transformed to Dalvik bytecodes and run on the same machine as that described in Section III-1. We measured the score for the benchmarks and used DVM interpretation as the baseline, measuring the normalized performance improvement.
shows that normalized performance of Benchmark Pi for Soot is improved, where we used execution speed as a performance measure. We observe that the JIT-compiled performance is 44% faster than the baseline performance.
also shows that normalized performance of EmbeddedCaffeineMark 3.0 for Soot is improved and we can observe that the JIT-compiled performance is 48% faster than the baseline performance on average. As shown in
, Soot does not outperform DVM dramatically, because it performs only simple optimizations; that is, copy propagation, constant propagation, constant folding, dead assignment elimination, unconditional branch folding, unused local elimination, load store optimization, and peephole optimization.
Normalized performance improvement of Benchmark Pi for Soot.
Normalized performance improvement of EmbeddedCaffeine Mark 3.0 for Soot.
is a well-known complementary compilation technique for DVM. Unfortunately, we could not directly compare our results with those of Icing, because the Icing project is not accessible to the general public. Instead, we indirectly measured the performance of our optimization scheme against that of Icing by referring to the experimental results in
. We observe that our optimization scheme outperforms Icing for the same benchmark applications; whereas, Icing outperforms Dalvik JIT by 2.83 times, our system outperforms Dalvik JIT by 4.34 times.
IV. Related Work
Improving the performance of Android applications has received a lot of interest. Here, we discuss some studies that are most related to our work.
- 1. Complementary Compilation with DVM
converts hot methods in the Dalvik bytecode to C codes using the GCC compiler. The translated native codes are executed by DVM through calling the Java Native Interface. Lim and others
generate a hybrid DEX file that includes selective ahead-of-time compilation information based on the profiling information of hot methods, and then compile and execute the hybrid DEX file. BLOAT
eliminates common access expressions by performing partial redundancy elimination based on type-based alias analysis. It employs optimization between java class files at a bytecode-to-bytecode level. However, since BLOAT mainly focuses on exploiting intra-procedural analysis and optimization, the performance gain may be limited. As shown in Section III, the compilation with our static optimization and DVM JIT surpassed that of previous studies.
- 2. Modification in DVM
, the authors propose a lightweight JIT compiler for the ARM architecture, which is among the most popular on the Android platform. They use a simplified register allocation process for JIT based on the fact that in most Java methods eleven physical registers in the ARM core are sufficient to fulfill the virtual registers in DVM. Absar and Shekhar
improve the array bounds check optimization algorithm in DVM JIT so that it can handle indices such as affine functions of iterators, loop invariants, and literals. Modifying DVM causes maintenance problems since DVM is under active development. AccelDroid
is the HW/SW co-designed system for accelerating Android applications. To boost the power and performance efficiency of Android applications, a Dalvik bytecode–aware processor is implemented by co-designing hardware and software; this is so that Dalvik bytecodes can be executed directly using dynamic binary translation on the special processor. ART runtime
performs ahead-of-time compilation from Dalvik bytecode to binary during an application installation, and then the generated machine code is executed when the application is run. However, since ART runtime is still an ongoing project, the speed-up of performance is not dramatic — under 1.8 times on EmbeddedCaffeinemark 3.0
. In conclusion, our static optimization can achieve higher performance of applications with less engineering overhead than those of DVM modification techniques.
In this paper, we proposed a static Dalvik bytecode optimization system for the performance improvement of Android applications. We found that Android applications, which are compiled by the Java compiler and transformed by the Dx tool, still offer considerable opportunities for further optimization with static compilation. We resolve the performance shortage by adopting static optimization with DVM JIT as a complementary compilation technique of DVM. We exploited a mature compiler infrastructure, LLVM, to enhance the code quality for Android applications. Moreover, we proposed techniques to close the gap between the high-level Dalvik bytecode and the low-level LLVM IR code, and to optimize the LLVM IR code conforming to language information of Dalvik bytecode. Our experimental results show that the static Dalvik optimization system with DVM JIT surpasses interpretation in DVM by 6.08 times and DVM JIT by 4.34 times.
Our optimization framework can also be applied for Android application executing in ART runtime. As mentioned in Section IV, ART runtime compiles Dalvik bytecode to native code using an on-device code translation tool in advance, and then directly executes the precompiled native code to boost the execution performance at runtime. However, this scheme inevitably requires considerable installation time and more memory space than DVM. We expect that the installation time and the amount of memory usage can be reduced by compiling the optimized Dalvik bytecode, which is generated by our framework, at code translation process in ART runtime. In addition, we believe that our aggressive optimization scheme can also give positive effects on code optimization in ART runtime, which remains as a future work.
This work was supported by Institute for Information & Communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (10041244, SmartTV 2.0 Software Platform).
Jeehong Kim received his BS and MS degrees in electronic engineering from the Department of Electronic Engineering, Kwangwoon University, Seoul, Rep. of Korea, in 2008 and 2010, respectively, and his PhD degree in computer engineering from the Department of Mobile Systems Engineering, Sungkyunkwan University, Suwon, Rep. of Korea, in 2015. Since 2015, he has been a senior software engineer at Samsung Electronics, Suwon, Rep. of Korea. His research interests include embedded systems and system securities.
Inhyeok Kim received his BS and MS degrees in computer engineering from the Department of Electrical and Computer Engineering, Sungkyunkwan University, Suwon, Rep. of Korea, in 2006 and 2010, respectively, and since 2010, he has been a PhD student with the Department of Electrical and Computer Engineering, Sungkyunkwan University. His research interests include system software and UI/UX platforms.
Changwoo Min received his BS and MS degrees in computer science from the Department of Computer Science, Soongsil University, Seoul, Rep. of Korea, in 1996 and 1998, respectively, and his PhD degree in computer engineering from the Department of Mobile Systems Engineering, Sungkyunkwan University, Suwon, Rep. of Korea, in 2014. From 1998 to 2005, he was a research engineer with the Ubiquitous Computing Laboratory of IBM, Seoul, Rep. of Korea. From 2005 to 2014, he was a principal software engineer at Samsung Electronics, Suwon, Rep. of Korea. Since 2014, he has been working as a postdoctoral researcher at Georgia Institute of Technology, Atlanta, USA. His research interests include embedded systems, storage systems, and operating systems.
Hyung Kook Jun received his BS degree in computer engineering from the Department of Computer Science and Engineering, Sungkyunkwan University, Suwon, Rep. of Korea, in 1999 and his MS degree in computer engineering from the Department of Electrical and Computer Engineering, Sungkyunkwan University, in 2001. Since 2001, he has been a senior researcher with the Cyber-Physical Systems Research Team, ETRI. His research interests include CPS, embedded systems, communication middleware, and multimedia systems.
Soo Hyung Lee received his BS and MS degrees in electronic engineering from the Department of Electronic Engineering, Hanyang University, Seoul, Rep. of Korea, in 1991 and 1993, respectively, and his PhD degree in computer engineering from the Department of Computer Engineering, Chungnam National University, Daejeon, Rep. of Korea, in 2012. In August 1993, he joined the Network Design Laboratory of DACOM corporation, Seoul, Rep. of Korea. Since October 2000, he has been a principal member of the engineering staff of the Cyber-Physical Systems Research Team, ETRI. His research interests include IT converging systems, CPS, Smart Factory, and network security.
Won-Tae Kim received his BS, MS, and PhD degrees from the Department of Electronic Engineering, Hanyang University, Seoul, Rep. of Korea in 1994, 1996, and 2000, respectively. He established a venture company named Rostic Technologies, Inc. in 2001 and worked as CTO from 2001 to 2005. He joined ETRI in 2005 and has led in Cyber-Physical Systems (CPS) research team, SW Contents Technology Research Lab, ETRI, since 2010. Now, he is a professor at Korea University of Technology and Education, Cheonan, Rep. of Korea. His research interests include CPS, IoT networking, and Neuromorphic computing.
Corresponding Author firstname.lastname@example.org
Young Ik Eom received his BS, MS, and PhD degrees in computer science from the Department of Computer Science, Seoul National University, Rep. of Korea, in 1983, 1985, and 1991, respectively. He was a visiting scholar with the Department of Information and Computer Science, University of California, Irvine, USA, from September 2000 to August 2001. Since 1993, he has been a professor at Sungkyunkwan University, Suwon, Rep. of Korea. His research interests include system software, operating systems, virtualization, cloud systems, and system securities.
“AndroScope: An Insightful Performance Analyzer for All Software Layers of the Android-Based Systems,”
DOI : 10.4218/etrij.13.0112.0203
“Dalvik Virtual Machine Internals,”
the Google I/O Developer Conf.
San Francisco, CA, USA
“Virtual Machine Showdown: Stack versus Registers,”
ACM Trans. Archit. Code Optimization
“Swift: A Register-Based JIT Compiler for Embedded JVMs,”
Proc. ACM SIGPLAN/SIGOPS Conf. Virtual Execution Environment
Mar. 3–4, 2012
Compiler Optimizations: Can You Count on Compilers to Optimize Your Java Code, Java World
Java Bytecode: Understanding Bytecode Makes You a Better Programmer, IBM Developer Works
“sEc: A Portable Interpreter Optimizing Technique for Embedded Java Virtual Machine,”
Java Virtual Mach. Res. Technol. Symp.
San Francisco, CA, USA
Aug. 1–2, 2002
“A JIT Compiler for Android’s Dalvik VM,”
the Google I/O Developer Conf.
San Francisco, CA, USA
“The ART Runtime,”
the Google I/O Developer Conf.
San Francisco, CA, USA
“A Method-Based Ahead-of-Time Compiler for Android Applications,”
Proc. Int. Conf. Compiler, Archit. Synthesis Embedded Syst.
Oct. 9–14, 2011
“A Selective Ahead-of-Time Compiler on Android Device,”
Int. Conf. Inf. Sci. Appl.
Suwon, Rep. of Korea
May 23–25, 2012
“Eliminating Partially-Redundant Array-Bounds Check in the Android Dalvik JIT Compiler,”
Proc. Int. Conf. Principles Practice Programming Java
Kongens Lyngby, Denmark
Aug. 24–26, 2011
“Acceldroid: Co-designed Acceleration of Android Bytecode,”
IEEE/ACM Int. Symp. Code Generation Optimization
Feb. 23–27, 2013
GCJ - The GNU Compiler for the Java Programming Language
DragonEgg - Using LLVM as a GCC Backend
“Partial Redundancy Elimination for Access Path Expressions,”
Software: Practice and Experience
DOI : 10.1002/spe.371
“Soot: A Java Bytecode Optimization Framework,”
Conf. Center Adv. Studies Collaborative Res.
“VMKit: A Substrate for Managed Runtime Environments,”
Proc. ACM SIGPLAN/SIGOPS Conf. Virtual Execution Environment
Pittsburgh, PA, USA
Mar. 17–19, 2010
“LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation,”
Int. Symp. Code Generation Optimization
Palo Alto, CA, USA
Mar. 20–24, 2004
The LLVM Compiler Infrastructure
GCC- The GNU Compiler Collection
Google Android Dx Tool
Security Engineering Research Group
“Analysis of Dalvik Virtual Machine and Class Path Library,” Tech. Rep.
Institute of Management Sciences
LLVM Language Reference Manual
Extensible Metadata in LLVM IR
“PTX Back-End: GPU Programming with LLVM,”
the LLVM Developer’s Meeting
San Jose, CA, USA
Nov. 8, 2011
Extending LLVM: Adding Instructions, Intrinsics, Types, etc
LLVM’s Analysis and Transform Passes
The LLVM Target-Independent Code Generator
“Tutorial: Building a Backend in 24 Hours,”
the LLVM Developer’s Meeting
Cupertino, CA, USA
Benchmark Pi – The Android Benchmarking Tool
The Embedded CaffeineMark
Pendragon Software Corporation