Advanced
A Process Algebra-Based Detection Model for Multithreaded Programs in Communication System
A Process Algebra-Based Detection Model for Multithreaded Programs in Communication System
KSII Transactions on Internet and Information Systems (TIIS). 2014. Mar, 8(3): 965-983
Copyright © 2014, Korean Society For Internet Information
  • Received : October 31, 2013
  • Accepted : March 05, 2013
  • Published : March 28, 2014
Download
PDF
e-PUB
PubReader
PPT
Export by style
Share
Article
Author
Metrics
Cited by
TagCloud
About the Authors
Tao Wang
The Key Laboratory for Computer Virtual Technology and System Integration, Hebei Province Qinhuangdao, 066004, China
Limin Shen
The Key Laboratory for Computer Virtual Technology and System Integration, Hebei Province Qinhuangdao, 066004, China
Chuan Ma
The Key Laboratory for Computer Virtual Technology and System Integration, Hebei Province Qinhuangdao, 066004, China

Abstract
Concurrent behaviors of multithreaded programs cannot be described effectively by automata-based models. Thus, concurrent program intrusion attempts cannot be detected. To address this problem, we proposed the process algebra-based detection model for multithreaded programs (PADMP). We generate process expressions by static binary code analysis. We then add concurrency operators to process expressions and propose a model construction algorithm based on process algebra. We also present a definition of process equivalence and behavior detection rules. Experiments demonstrate that the proposed method can accurately detect errors in multithreaded programs and has linear space–time complexity. The proposed method provides effective support for concurrent behavior modeling and detection.
Keywords
1. Introduction
M ultithreading is an important mechanism for supporting program structuring and parallel computation. With the multithread technique, the application prospects of concurrent systems are becoming increasingly extensive. However, concurrent systems have specific characteristics, such as programming complexity, randomness of running results, and reproducibility. Therefore, security for concurrent systems is a concern, and constructing a model to describe and detect concurrent behaviors is an effective solution.
- 1.1 Related Work
Constructing a valid and precise program model is a challenging task. Because the original development of a model that takes advantage of the system call sequence for normal program behavior was originally presented by Forrest et al. [1] , many scholars have researched software behavior using the system call. These studies are based on three basic techniques for model construction: system call short sequences [2 - 4] , automata [5 - 8] , and the Virtual Path [9] . Of these techniques, modeling based on system call short sequences is efficient and can be implemented easily. However, this method is imprecise, and these intrusion detection models are much more prone to false positives. Compared with short sequences, branch and loop structures of programs can be expressed. Modeling based on automata improves the precision of behavior modeling and reduces the false positive rate. Unfortunately, these models still have some limitations. For example, impossible paths, prohibitively high space–time complexity, and they are unsuitability for analyzing concurrent behaviors.
For concurrent behaviors, previous research has focused on two basic techniques: data race errors and timing analysis. Savage et al. discussed the potential data race problems based on the Lockset detection method when multiple threads access the same shared variables without locking [10] . Schonberg et al. analyzed data race problems that visited order uncertain, based on happen-before method [11] . Wang et al. presented a multilockset algorithm that considered the relation of happen-before and detected race condition at runtime [12] . Kong Deguang et al. presented a timing analysis method for multithreaded programs based on a hidden Markov model [13] . However, designing such models is complex because it is difficult to abstract a concurrent environment. Moreover, the space–time complexity is prohibitively high and unsuitable for practical use.
Modern software systems are prevalently concurrent; thus, they are difficult to get right. Unusual or unexpected behaviors in concurrent programs are difficult to discover using traditional detection techniques. Z. Rakamaric described a scalable, automatic, and precise approach to static unit checking of concurrent programs implemented in a tool called STORM [14] . To eliminate concurrency errors for a class of multithreaded programs, Berger et al. presented Grace, a software-only runtime system [15] . Tallent et al. described how to measure and attribute arbitrary performance metrics for high-level multithreaded programming models [16] . In addition, a technique to measure and analyze lock contention has been implemented [17] . To increase the reliability of multithreaded programs, a cooperative software–hardware mechanism to increase the performance of multithreaded applications was proposed, which was the first generalized mechanism to identify the most critical bottlenecks that cause thread waiting on multithreaded applications and accelerate those bottlenecks [18] .
- 1.2 Main Contributions
We construct a process algebra-based detection model for multithreaded programs in a communication system. The basic idea is as follows: A system call is mapped to an action by static binary code analysis; a control flow graph (CFG) of the program is mapped to a process; process expressions are generated according to the process algebra algorithms; concurrency operators are added into process expressions; model construction algorithm and behavior detection rules are defined; and the process algebra-based detection model for multithreaded programs (PADMP) is used to detect concurrent behaviors. Our primary contributions can be summarized as follows:
1) The PADMP model enables efficient multithreaded program modeling. The PADMP model represents a substantial improvement in statically constructed multithreaded program models because it can describe concurrent behaviors of a multithreaded program effectively.
2) This method is also suitable for sequential behavior modeling and detection. To the best of our knowledge, we are the first to apply process algebra to behavior modeling, which is a profitable attempt in the field of multithreaded program behavior detection.
3) According to the properties of process algebra, some definitions and laws are given to provide a theoretical framework for concurrent behaviors. By reducing and merging process expressions, the PADMP model produces a smaller state space, moreover, it is complete.
The main advantage of the proposed PADMP model is that it can accurately detect errors in multithreaded programs, such as data race, deadlock, and abnormal time sequence errors. All test programs show an order of magnitude improvement in space–time complexity.
- 1.3 Organization of the Paper
The remainder of this paper is organized as follows. In Section 2, we introduce process algebra. In Section 3, the PADMP model construction algorithm is discussed. Section 4 presents the behavior detection rules. An experimental evaluation is discussed in Section 5, and we conclude the paper in Section 6.
2. Process Algebra
Process algebra is a mathematical tool used for depicting concurrent systems [19 - 20] , and is used for researching concurrent, distributed, interactive systems [21] . At present, the Asynchronous Sequential Processes (ASP) [22 - 23] and Ambient Calculus [24 - 25] have more functions to describe the behavior of asynchronous concurrent system in theory research. This paper introduces process algebra for multithreaded programs modeling by extending its algorithms and describing the interaction of behaviors based on system calls. We extract a common subset of process algebra. Let Act be a finite set of given actions (A). The syntax specifications are defined as follows:
PPT Slide
Lager Image
Their corresponding meanings are as follows:
1)
PPT Slide
Lager Image
stands for process down time, no action is performed. √ stands for process terminated successfully.
2) a.P stands for executable action a , then transformed into process P , a Act ⋃ { τ } τ stands for unobservable action. Actions in this paper are the same as actions in CCS [26] , divided into action ( a ) and co-action
PPT Slide
Lager Image
3) P / L stands for action ( a ) in P appearing in L will be hided and be replaced by unobservable action τ at runtime.
4) P 1 + P 2 stands for the choice of P 1 or P 2 , according to the process subordinated by the following actions.
5) P 1 || A P 2 means that if action ( a ) in P 1 and co-action
PPT Slide
Lager Image
in P 2 are subordinated to set A , then P 1 and P 2 execute synchronously, while any other actions are executed asynchronously.
Definition 1 Guarded Expression. The process expression begins with the prefix action. e.g., P = a.Q, P = a.b.R .
Definition 2 Successful Termination Guarded Expression. The process expression that begins with the prefix action and ends with a successful termination process. e.g., P = a.b. √.
Definition 3 Recursive Guarded Expression. The process expression that begins with the prefix action and ends with itself. e.g., P = a.P.
Definition 4 Behavior Trace. Suppose the process P can be defined as a finite state transition of the form:
PPT Slide
Lager Image
< a 1 , a 2 ,⋯ an > is the behavior trace of process P . The set of all possible behavior traces is denoted by traces ( P ).
3. Model Construction Infrastructure
We have developed the PADMP model. The development procedure can be divided into four steps. First, static binary code analysis for multithreaded programs was employed to generate CFGs for each function. Second, the process expressions were generated from the CFGs. Third, the process expressions were rewritten by adding concurrent operations. Finally, the process algebra-based PADMP model was constructed.
- 3.1 From Binary Code to CFGs
We use static binary code analysis to generate a CFG because it does not require human interaction, determination of representative data sets, or access to a program’s source code. However, it should be noted that it is unsuitable for interpreted-language analysis. The techniques to generate a CFG from binary code are very mature [6 - 7] . We use the executable editing library method to generate CFGs [6] . If the transformation of a flow chart does not contain any function call, it is regarded as an empty operation ε . We eliminate all edges ε using a previously reported reduction algorithm [27] . A corresponding example is given in Section 5.2.
We replace library functions with system calls by comparing the library function and system call tables. The call instruction in assembly code calls library functions rather than system calls; thus, we must replace it with the corresponding system call. For example, sleep was replaced by nanosleep and printf was replaced by write. We then capture and analyze arguments. This is required to represent the behavior of a system call accurately. For example, we cannot know if semop execution is a P or V operation.
- 3.2 From CFGs to Process Expressions
We denote the CFGs as G = { V,E }, which were generated according to the process discussed in Section 3.1. Here, V denotes vertices, and E denotes the directed edges that were marked with system calls. For example, Fig. 1 shows one possible CFG generated from some binary code. We use the adjacency list to store it, shown in Fig. 2 .
PPT Slide
Lager Image
A CFG generated from some binary code
PPT Slide
Lager Image
The adjacency list of CFG in Fig. 1
The algorithm that generates the process expressions from the CFGs is as follows.
- (1) Find out the loop entry
According to sequential composition,
PPT Slide
Lager Image
then expand P 0 = a.b . P 2 and cut P 1 . But if P 1 in a loop,
PPT Slide
Lager Image
then expand and obtain
PPT Slide
Lager Image
The equations do not be cut, thus they should not be expanded. In a loop, there is a node called the loop entry with the property that no other node in the loop has a predecessor outside the loop .That is, every path from the entry of the entire flow graph to any node in the loop goes through the loop entry. Therefore, we should find out the loop entry and don’t expanded it.
By comparing the vertex and adjvex as shown in Fig. 2 , if the number of vertexes is greater than or equal to adjvex, then the adjvex is the loop entry. The adjacency list of Fig. 2 has three loops and has two loop entries
PPT Slide
Lager Image
- (2) Depth-first search to generate the process expressions
Selecting the loop entries and CFG entry (node 1, as shown in Fig. 1 ) as the root separately, we adopt depth-first search algorithm. We make the parent-child nodes sequential composition and make brother nodes alternative composition. To ensure that the loop entries do not be expanded, when we search a node that belongs to the loop entry, we backtrack to its parent node.
Fig. 1 has three nodes (1,2,6) as the root, so we get three process expressions:
P1=a.P2
P2=b.c.P2+d.(e.f.P2+g.P6+j.0)
P6=h.P6+i.0
- 3.3 Adding Concurrency Operators to Process Expressions
We obtain process expressions of functions according to the processes discussed in Sections 3.1 and 3.2. However, the expressions are not marked as an action ( a ) or a coaction
PPT Slide
Lager Image
which are used for concurrency. Thus, we must rewrite the process expressions.
We abstract mutual exclusion operations for a critical area as lock ( l ) and unlock ( u ). Concurrency operators are added to process expressions in the following situations:
(1) Situation 1. Multiple processes or threads: If the input edges are multiple processes or threads operation led by fork, vfork and clone, we should analyze jump sentences and change alternative composition of jump sentences to parallel composition(‘+’→’||A’), such as JLE and JNE.
(2) Situation 2. Mutual exclusion: If the input edges are mutual exclusion operation led by lock and unlock, we should add concurrency operators of exclusion operation to process expressions. Some coactions
PPT Slide
Lager Image
do not appear in expressions, such as critical sections and signal lamps. Thus, we create a process for each semaphore and make a parallel composition with the corresponding concurrent process.
If the initial value of a binary semaphore is 1, its behaviors are described as
PPT Slide
Lager Image
Similarly, if the initial value of the binary semaphore is 0, its behaviors are described as
PPT Slide
Lager Image
as is shown in Fig. 3 (a). Therefore, the initial value of a signal lamp is n (n>0) and can be described as Ln = L 1 || A L 1 || A ⋯|| A L 1 , as is shown in Fig. 3 (b).
PPT Slide
Lager Image
State transition diagram for mutual exclusion
(3) Situation 3. Condition variable: If the input edges are condition variable operation led by lock, wait, signal and unlock, we should add concurrency operators of condition variable to process expressions. In concurrent programming, we construct a synchronization construct using a condition variable, which allows threads to have both mutual exclusion and the ability to wait (block) for a certain condition to become true, as shown in Fig. 4 .
PPT Slide
Lager Image
State transition diagram for a condition variable
Its behaviors are described as follows.
PPT Slide
Lager Image
(4) Situation 4. Read/Write lock: If the input edges are read/write lock operation led by rlock, wlock and unlock, we should add concurrency operators of read/write lock to process expressions. A read/write lock allows concurrent read access to an object; however, it requires exclusive access for write operations, as shown in Fig. 5 .
PPT Slide
Lager Image
State transition diagram for read/write lock
Its behaviors are described as follows.
PPT Slide
Lager Image
Prl || A Prl denotes a read-lock. In this pattern, multiple readers can read the data in parallel.
PPT Slide
Lager Image
denotes a write-lock. When a writer is writing the data, readers will be blocked until the writer has finished writing.
- 3.4 Constructing the PADMP Model
Here, we present Definition 6 as an equivalence basis for the process expressions. Based on Definition 6, the laws of alternative composition and parallel composition are given. Then, we construct the PADMP model.
Definition 5 Action. A system call is an action.
In this paper, we map system calls to actions as the smallest unit to describe process behavior.
Definition 6 Process equivalence. If there are two different processes P Q ( traces ( P ) = traces ( Q ) ), i.e., traces ( P ) ⊆ traces ( Q ) and traces ( Q ) ⊆ traces ( P ), then P is equivalent to Q .
Our model is used to detect behaviors; thus, the process equivalence is based on the behavior trace. If two processes have the same behavior trace, they are considered equivalent. This also meets the requirements for behavior detection. However, this differs from equivalence based on mutual simulation of CCS [26] ; therefore, the left distributive law of alternative composition states that ( a.P + a.Q = a .( P + Q )). It also differs from equivalence based on refusal sets of CSP [28] .
PPT Slide
Lager Image
in CSP. As long as the stop sign
PPT Slide
Lager Image
appears in the alternative composition, the process is considered to result in downtime and is therefore unsafe.
Here, we present some laws for alternative compositions and parallel compositions based on Definition 6.
Law 1 P || A Q = Q || A P .
Law 2 ( P || A Q )|| A R = P || A ( Q || A R ).
Law 3 ( P + Q )|| A R = R || A ( P + Q )= P || A R + Q || A R .
Law 4
PPT Slide
Lager Image
is a zero element and √ is an identity; i.e.,
PPT Slide
Lager Image
Law 5 If
PPT Slide
Lager Image
Law 6 If
PPT Slide
Lager Image
According to Laws 5 and 6, we know the actions in A cannot execute independently. Thus, they must execute synchronously with corresponding coactions.
Law 7 If
PPT Slide
Lager Image
Law 7 indicates that if processes a.P and a.Q compete for
PPT Slide
Lager Image
they must be concurrent with
PPT Slide
Lager Image
Law 8 If a A and c A , then a.P || A c.Q = a .( P || A c.Q )
Law 9 If a,b A then a.P || A b.Q = a .( P || A b.Q )+ b .( a.P || A Q ).
Laws 8 and 9 indicate that actions without in the set A execute asynchronously.
Law 10 If P = a.Q + a.R , then P = a.Q + a.R = a .( Q + R ).
Law 10 transforms a nondeterministic alternative process into a deterministic process. Thus, we obtain Law 11.
Law 11 P = P + P .
Based on the above definitions and laws, we can construct actions, operators, and processes.
1) Action. We use a triplet to describe an action. action ::={ syscall, acttype, paramlist }. syscall denotes system calls, acttype denotes action type, including actions and coactions, and paramlist denotes an argument list for syscall .
2) Operators and states. The basic expressions, a,P , a,P 1 + b,P 2 , and a,P 1 || A b,P 2 , are made up of sequential, alternative, and parallel composition operators. We use a structure with at most two outputs to describe the states composed of different operators. The initial values of the structure are out 1 = NULL and out 2 = NULL ; i.e., state ::={ opetype , action , out 1, out 2}. opetype denotes operator type, and out 1 and out 2 are links pointing to the next state. opetype = 0 indicates that, for a successful termination state (√), action = √, and out 1 and out 2 are not used, as shown in Fig. 6 (a). opetype = 1 indicates that sequential composition operator (.) uses only out 1, which describes the state shown in Fig. 6 (b). opetype = 2 indicates alternative composition operators, as is shown in Fig. 6 (c).
PPT Slide
Lager Image
Graphs of stuct state when opetype has different values
3) Process. A process is composed of actions and operators. We use two tuples to describe a process. i.e., process :: = { statelist,outlist }. statelist denotes the set of states. If the state in statelist is greater than one, then suggest processes in the concurrent state. outlist is a series of link lists pointing to the states.
4) Construction algorithm
According to the symmetric law, associative law, and distributive law of choice operators, any process that does not contain concurrency operators can be constructed by action a , successful termination √, a.b , and a + b . In this paper, we refer to a , √, a.b , and a + b as meta process expressions.
Here, we describe the parallel composition operators (||A). Two concurrent processes P 1 || A P 2 ,
PPT Slide
Lager Image
are illustrated in Fig. 7 .
PPT Slide
Lager Image
Concurrent processes
According to the laws, we know the execution process of P 1 || A P 2 . Initially, P 1 || A P 2 is in two states (11, 21) simultaneously; then, execute
PPT Slide
Lager Image
simultaneously to obtain states (12, 22) simultaneously. Next, to obtain states (13, 22), c must wait for
PPT Slide
Lager Image
to appear to execute simultaneously. When
PPT Slide
Lager Image
appears, P 1 || A P 2 can obtain states (14, 23) simultaneously. Therefore, we describe parallel composition as a combination of multiple process states.
We can describe any process expression using meta process expressions a , √, a.b , and a + b . The PADMP model construction algorithm is constructed by meta process expressions.
PADMP Algorithm:
  • 1: Input: PEs /* process expressions list*/
  • 2: Output: PADMP
  • 3: Procedure:
  • 4: PE *pe; /*process expression */
  • 5: process p1, p2; /*process*/
  • 6: state *s; /*state*/
  • 7: while(i
  • 8: pe=PEs[i++]; /*get a process expression*/
  • 9: for(; *pe; pe++){/* construct model for process expression */
  • 10: switch(*pe){
  • 11: default: /* construct action*/
  • 12: s = state(1,*pe, NULL, NULL);
  • 13: push(process(s, outlist(s->out));
  • 14: break;
  • 15: case ’.’:/*process sequential operator (.)*/
  • 16: p2 = pop();p1 = pop();
  • 17: patch(p1.out, p2.start);/*make the output link of p1 point to p2*/
  • 18: push(process(p1.start, p2.out));/*push new process into stack*/
  • 19: break;
  • 20: case ’+’: /* process choice operator (+)*/
  • 21: p2 = pop();p1 = pop();
  • 22: s = state(2,NULL, p1.start, p2.start);
  • 23: push(process(s, append(p1.out, p2.out))); /*use append to connect two pointers and return the result*/
  • 24: break;
  • 25: case ’||A’: /* process parallel operator (||A)*/
  • 26: p2 = pop();p1 = pop();
  • 27: push(process(join(p1.statelist,p2.statelist),append(p1.out, p2.out))); /* use join to merge satatelist*/
  • 28: break;
  • 29: case ’√’: /*process successful termination (√)*/
  • 30: s = state(0,√, NULL, NULL);
  • 31: push(process(s, NULL);
  • 32: break;}}
  • 33: }
The action procedure is shown in Fig. 8 (a), the sequential composition operator (.) procedure is shown in Fig. 8 (b), and the alternative composition operator (+) procedure is shown in Fig. 8 (c). According to ( τ.P + τQ = P + Q ), Fig. 8 (c) is equivalent to Fig. 8 (d). Therefore, the alternative composition operator (+) procedure is correct. The parallel composition operator (|| A ) procedure is shown in Fig. 7 .
PPT Slide
Lager Image
Results of meta process expression compilation
4. Behaviors Detection
According to the operational semantics of process algebra (the above definitions and laws), we can obtain a process migration rule and a behavior decision rule, and obtain a behavior detection algorithm based on the PADMP model.
Process Migration Rule:
(1)
PPT Slide
Lager Image
According to (1) and Law 8, we know that, if a A and c A , then
PPT Slide
Lager Image
According to Law 9, we know that, if a,b A , then
PPT Slide
Lager Image
(2) According to Law 10, we obtain
PPT Slide
Lager Image
(3)
PPT Slide
Lager Image
Let s 0 denote the first action of behavior trace s ; s ' denotes the other actions; i.e., s = < s 0 < s ' >>. P / s denotes a process, which is behavior after all trace s actions have been executed. Therefore, P / s = ( P / s 0 )/ < s ' >.
Behavior Decision Rule:
Suppose model P was obtained by static analysis, and model R was obtained by detection at runtime. If and only if traces ( R )⊆ traces ( P ), then R is a normal behavior.
Behavior Detection Algorithm:
1) Monitoring the system calls of a program in real time forms a system call queue.
2) If the queue is null, then return TRUE to indicate that behaviors are normal. Otherwise, remove a system call from the head of the queue and place it into the PADMP model to determine a match. If there is a match, loop (2); else proceed to (3).
3) Determine if actions in the current state set of the PADMP model belong to synchronous set (A). If they do not belong to A, return FALSE and issue an alert. If they belong to A, perform a breadth-first search of the current state set and execute a synchronous action. If the execution is successful, update the current state set and proceed to (4). Otherwise, return FALSE and issue an alert.
4) Match actions in the update state set. If the match is successful, proceed to (2); otherwise proceed to (3).
5. Simulation Results and Analysis
Here, we report experimental results for the PADMP model and behavior detection rules.
- 5.1 Behavior Detection
Fig. 9 is sample C code for a multithreaded program. According to the explanation presented in Section 3.1, we obtain the corresponding function CFG based on the Linux IA32 operating system, shown in Fig. 10 . We only map a base block that contains a RET instruction into the successful termination process √. Next, we rewrite the assembly code.
PPT Slide
Lager Image
C code for a multithreaded program
PPT Slide
Lager Image
Corresponding function CFG
We analyze the assembly code, obtain the corresponding system call, and extract the arguments. For simplicity, we rename the system calls that are in boldface in Fig. 10 . For example, pthread_mutex_lock is renamed futex and pthread_mutex_unlock is renamed lock(m) or unlock(m) where m denotes mutual exclusion access to the critical area address. In addition, clone , which is called by pthread_create , is renamed create . Next, according to the algorithm presented in Section 3.2, we can obtain the process expression of the functions presented in Fig. 10 .
  • Pmain=init(m).P1
  • P1=create(1).Pthread1||Acreate(2).Pthread2
  • Pthread1=lock(m).sleep(5).unlock(m).exit(1).join(1).√
  • Pthread2=sleep(1).lock(m).write.unlock(m).exit(2).join(2).√
According to the method presented in Section 3.3, process P 1 of a main function can be rewritten to add concurrency operators to process expressions for mutual exclusion.
PPT Slide
Lager Image
According to Law 7 presented in Section 3.4, the process expression Pmain satisfies the sequential relationship, as shown in Fig. 11 . Detection processing is obtained by the behavior detection algorithm. Suppose that we capture the following call sequence at runtime:
PPT Slide
Lager Image
Sequence diagram for C code
(init(m), create(1), create(2), lock(m), sleep(1), sleep(5), unlock(m), exit(1), lock(m), write, unlock(m), exit(2), join(1), join(2))
According to Laws 5, 8, and 9, and the process migration rule, the final sequence can migrate successfully to the termination process. This indicates that it is a normal sequence. Suppose we capture the following call sequence:
(init(m), create(1), create(2), lock(m), sleep(1), sleep(5), lock(m), write, unlock(m), exit(2), unlock(m), exit(1), join(1), join(2))
If substituting the second lock(m) for the process expression Pmain causes downtime
PPT Slide
Lager Image
then the sequence is an abnormal sequence and an alert will be issued. We can test and verify that the sequences acting against program timing cause downtime.
- 5.2 Deadlock Analysis
Fig. 12 illustrates a deadlock error caused by cyclic lock acquisition. This example spawns two threads, each of which attempt to acquire two locks A and B. However, the threads attempt to obtain the locks in different orders: thread 1 acquires lock A then lock B, while thread 2 acquires lock B then lock A. By following the behavior detection steps presented in Section 5.1, the following process expressions can be obtained.
  • P=Pthread1||APthread2||AL
  • Pthread1=lock(a).systemcall1.lock(b).systemcall2.unlock(b).unlock(a).Pthread1
  • Pthread2=lock(b).systemcall3.lock(a).systemcall4.unlock(a).unlock(b).Pthread2
PPT Slide
Lager Image
Deadlock example
We can also obtain process expressions for the mutual exclusion lock and synchronous action set.
PPT Slide
Lager Image
According to Laws 5 and 6, the following calculations can be performed.
PPT Slide
Lager Image
Thus, it can be seen that program deadlock errors can be detected by the proposed PADMP model.
- 5.3 Efficiency Analysis
Here, we evaluate a parallel composition that contains two processes.
  • P=a.b.√||Ac.d.√
  • =a.(b.√||Ac.d.√)+c.(a.b.√||Ad.√)
  • =a.b.(√||Ac.d.√)+a.c.(b.√||Ad.√)+c.a.(b.√||Ad.√)+c.d.(a.b.√||A√)
  • =a.b.c.d.√+a.c.b.(√||Ad.√)+a.c.d.(b.√||A√)+c.a.b.(√||Ad.√)+c.a.d.(b.√||A√)+c.d.a.b.√
  • =a.b.c.d.√+a.c.b.d.√+a.c.d.b.√+c.a.b.d.√+c.a.d.b.√+c.d.a.b.√
From the above formula, if a parallel composition contains two processes (n=2), then six behavior traces are obtained. With increasing n, the number of behavior traces increases by order of magnitude. Adopting a training method will result in a state space explosion, which leads to low space–time efficiency. However, in this paper, space consumption is the cost necessary to maintain n state list; the scale is equal to the number of actions, i.e., linear growth with n. The PADMP model uses a state set to express concurrent and nondeterministic options without a backtrack algorithm. If the length of a process expression is m and an n status list must be maintained, then, total time cost is O (mn). Thus, running time is linear; increasing m will not increase the operational cost significantly.
Experimental environment: Linux rhel-server-5.4, Intel dual-core 1.73 GHz CPU, 2 GB RAM. Three test programs are listed in Table 1 .
Test Programs
PPT Slide
Lager Image
Test Programs
We construct PADMP models for the test programs and record space–time consumption. By modifying the source code, we simulate data race, deadlock, and synchronous abnormality scenarios that were caused by design errors or intrusion. We use the PADMP model to detect the scenarios and record the space–time consumption. All abnormalities are detected. The time consumption for modeling and detection is shown in Fig. 13 , and the space consumption is shown in Fig. 14 .
PPT Slide
Lager Image
Time Consumption
PPT Slide
Lager Image
Space Consumption
- 5.4 Experimental Design
These test programs and our runtime monitor run on Linux OS (rhel-server-5.4) on Intel 1.73Ghz cpu with 2GB of RAM. Behavior detection experiment program is made up of static analysis unit, modeling unit and detection unit. Static analysis unit is a tool program running in user-state, it reads the executable file and generates the corresponding process expressions files *.pe. According to laws presented in Section 3.4, modeling unit converts files *.pe to files *padmp using meta process expressions a , √, a.b , a + b . File *.padmp stores chain table structures of the corresponding process expressions. Detection unit uses hook technology to intercept information, which is the entry address of system calls service function in the sys_call_table with real-time software running, monitors and obtains system call sequences. By putting system call sequences into the file *.padmp, detection unit detects and reports the result, as is shown in Fig. 15 .
PPT Slide
Lager Image
Architecture of the model
We construct the models for some applications, such as apache, wu-ftpd, mysql and sqlite. Fig. 16 and Fig. 17 show the result of space and time overhead respectively.
PPT Slide
Lager Image
Result of space overhead
PPT Slide
Lager Image
Result of time overhead
- 5.4.1 Effectiveness of the PADMP Model
As can be seen from experimental results, the time and space overheads of PADMP model are acceptable. During modeling the overheads are high, but the PADMP model can perform on real applications well during detection.
- 5.4.2 Precision of the PADMP Model
Precision is behavior detection capability of model. We choose FTP server wu-ftpd-2.6.0, which is widely used on the Linux platform as the test objects. Wu-ftpd-2.6.0 exists multiple vulnerabilities, we select some representative vulnerabilities, and use existing programs to attack, detection results are shown in Table 2 .
Typical vulnerabilities of wu-ftpd-2.6.0 and test results
PPT Slide
Lager Image
Typical vulnerabilities of wu-ftpd-2.6.0 and test results
The experimental results show that the prototype system can detect attacks based on stack overflow, heap overflow and the format string overflow. The attack doesn't change the call sequence for CVE - 2004-2004 vulnerabilities, it is similar to the mimicry attack, not being detected. We ’ test all types of attacks, but the results in Table 2 . partly prove the ability of intrusion detection system.
6. Conclusions
In this paper, the problem of intrusion attempts on multithreaded programs is investigated. Based on process algebra, the PADMP model is presented to solve the concurrent behavior description and detection problem. Experimental results show that this method can accurately detect errors in multithreaded programs, such as data race, deadlock, and abnormal time sequence errors. Moreover, all test programs show an order of magnitude improvement in space–time complexity. However, the PADMP model doesn't consider the parameters of the system calls, so it ’ effectively detect attacks based on data flow. We will solve this problem next step.
BIO
Tao Wang is currently having her Ph.D study in Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei, China. She received her M.S. degree in the School of Information Science and Engineering, Yanshan University, Qinhuangdao, China in 2009. She is now working at Hebei Normal University of Science & Technology as a lecturer. Her current research interests are in the areas of intrusion detection and collaboration computing.
Limin Shen is currently a professor in the School of Information Science and Engineering, Yanshan University. He received his M.S. degree in computer applicationform, Hefei University of Technology, China, in 1987. He received his Ph.D degree in electronic circuit and system, Yanshan University, China, in 2005. He worked in Department of Computer Science, Illinois Institute of Technology, USA from 2005 to 2007 as a visiting scholar. His main research interests are focusing on flexible software technology and information security, which has been funded partially by the National Natural Science Foundation of China and Chinese Government.
Chuan Ma is currently having his Ph.D study in Information Science and Engineering, Yanshan University. He received his B.S. and M.S. degrees in the School of Information Science and Engineering, Yanshan University, Qinhuangdao, China in 2003 and 2009, respectively. He is an engineer with the School of Information Science and Engineering Yanshan University. His current research interests include information security and software formal methods.
References
Forrest S. , Hofmeyr S.A. , Somayaji A. , Longstaff T.A. 1996 “A sense of self for UNIX processes” IEEE Press in Proc.of the IEEE Symp. on Security and Privacy. Oakland May 6-8 Article (CrossRef Link) 120 - 128
Hofmeyr S.A. , Forrest S. , Somayaji A. 1998 “Intrusion detection using sequences of system calls” Journal of Computer Security Article (CrossRef Link) 6 (3) 151 - 180
Helman P. , Bhangoo J. 1997 “A statistically based system for prioritizing information exploration under uncertainty” IEEE Trans.on Systems,Man and Cybernetics, Part A:Systems and Humans Article (CrossRef Link) 27 (4) 449 - 466    DOI : 10.1109/3468.594912
Lee Wenke , Stolfo Salvatore J. 1998 “Data mining approaches for intrusion detection” in Proc. of the 7th USENIX Security Symp. San Antonio January Article (CrossRef Link) 26 - 29
Wagner D. , Dean D. 2001 “Intrusion detection via static analysis” IEEE Press in Proc. of the IEEE Symp.on Security and Privacy. Oakland May 14-16 Article (CrossRef Link) 156 - 168
Giffin J. , Jha S. , Miller B. 2004 “Efficient context- sensitive intrusion detection” in Proc. of the 11th Network and Distributed System Security Symp. San Diego Article (CrossRef Link)
Gopalakrishna R. , Spafford E.H. , Vitek J. 2005 “Efficient intrusion detection using automaton Inlining” IEEE Press in Proc. of the IEEE Symp.on Security and Privacy Oakland, CA May 8-11 Article (CrossRef Link) 18 - 31
Jianming FU , Fen TAO , Dan WANG 2011 “Software behavior model based on system objects” Journal of Software Article (CrossRef Link) 22 (11) 2716 - 2728    DOI : 10.3724/SP.J.1001.2011.03923
Feng H.H. , Giffin J.T. , Huang Y. , Jha S. 2004 “Formalizing sensitivity in static analysis for intrusion detection” IEEE Press in Proc. of the IEEE Symp.on Security and Privacy Oakland, CA May 9-12 Article (CrossRef Link) 194 - 208
Savage S. , Burrows M. , Nelson G. , sobalvarro P. 1997 “Eraser: A dynamic data race detector for multi-threaded programs” ACM Trans. on Computer Systems Article (CrossRef Link) 15 (4) 394 - 411    DOI : 10.1145/265924.265927
Schonberg D. 1989 “On-the-Fly detection of access anomalies” ACM Press in Proc. of the ACM SIGPLAN Conf. on Programming Language Design and Implementation (PLDI) July vol. 24, no. 7, Article (CrossRef Link) 285 - 297
Wang L.Q. , Stoller S.D. 2006 “Runtime analysis of atomicity for multi-threaded programs” IEEE Trans. on Software Engineering Article (CrossRef Link) 32 (2) 93 - 110
Deguang K. , XB Tan , HS Xi 2010 “Hidden Markov Model for Multi-Thread Programs Time Sequence Analysis” Journal of Software Article (CrossRef Link) 21 (3) 461 - 472
Rakamaric Z. 2010 “STORM: static unit checking of concurrent programs” In Proc. of the 32nd ACM/IEEE International Conference on Software Engineerin Cape Town, South Africa May 2-8 vol. 2, Article (CrossRef Link) 519 - 520
Berger E. D. , Yang Ting , Liu Tongping , Novark Gene 2009 “Grace: safe multithreaded programming for C/C++” in Proc. of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications October vol. 44, no. 10, Article (CrossRef Link) 81 - 96
Tallent N. R. , Mellor-Crummey J. M. 2009 “Effective Performance Measurement and Analysis of Multithreaded Applications” in Proc. of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming April vol. 44, no. 4, Article (CrossRef Link) 229 - 240
Tallent N. R. , Mellor-Crummey J. M. , Porterfield A. 2010 “Analyzing Lock Contention in Multithreaded Applications” in Proc. of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming May vol. 45, no. 5, Article (CrossRef Link) 269 - 280
Joao J. A. , Suleman M. A. , Mutlu O. , Patt Y. N. 2012 “Bottleneck Identification and Scheduling in Multithreaded Applications” in Proc. of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems March vol. 40, no. 1, Article (CrossRef Link) 223 - 234
Morris J.H. 1968 “Lambda-calculus Models of Programming Languages” MIT, Cambridge, MAC USA Article (CrossRef Link)
Bekic H. 1971 “Towards a mathematical theory of processes” IBM Laboratory Vienna Technical Report TR Article (CrossRef Link)
Milne G.J. , Milner R. 1979 “Concurrent processes and their syntax” Journal of the ACM Article (CrossRef Link) 26 (2) 302 - 321    DOI : 10.1145/322123.322134
Caromel D. , Henrio L.A. 2005 “Theory of Distributed Objects” Springer-Verlag Berlin Article (CrossRef Link)
Caromel D. , Henrio L. , Serpette B.P. 2009 “Asynchronous sequential processes” Information and Computation Article (CrossRef Link) 207 (4) 459 - 495    DOI : 10.1016/j.ic.2008.12.004
Cardelli L. , Gordon A.D 2000 “Mobile Ambients” Theoretical Computer Science Article (CrossRef Link) 240 (1) 177 - 213    DOI : 10.1016/S0304-3975(99)00231-5
Cardelli L. , Ghelli G. , Gordon A.D. 2002 “Types for the ambient calculus” Types for the Ambient Calculus Article (CrossRef Link) 177 (2) 160 - 194
Milner R. 1980 “A calculus of communicating systems” Lecture Notes in Computer Science Springer-Verlag New York, Inc. Secaucus, NJ, USA Article (CrossRef Link)
Hopcroft J. 1971 “An nlogn algorithm for minimizing states in a finite automaton” Theory of Machines and Computations Academic Press New York Article (CrossRef Link)
Hoare C. 1978 “Communicating sequential processes” Communications of the ACM Article (CrossRef Link) 21 (8) 666 - 677    DOI : 10.1145/359576.359585