A reliable analysis of consumer preference from a large amount of purchase data acquired in real time and an accurate customer characterization technique are essential for successful direct marketing campaigns. In this study, an optimal segmentation of post office customers in Korea is performed using a subspace projection–based clustering method to generate an accurate customer characterization from a high-dimensional census dataset. Moreover, a traditional temporal mining method is extended to an algorithm using the MapReduce framework for a consumer preference analysis. The experimental results show that it is possible to use parallel mining through a MapReduce-based algorithm and that the execution time of the algorithm is faster than that of a traditional method.
Direct marketing is the process of identifying potential customers of products and promoting the products accordingly [1]. Marketing campaigns seek to elicit actions, such as an order, a visit to a store, or a request for further information, from selected groups of consumers in response to communication from the marketer. Retailer ads, catalog distribution, postcards, flyers, promotional letters, and real samples or publications are all included in advertising techniques. Direct marketing is a new service providing the target delivery area and address information requested by a business looking to advertise using the postal delivery address customer relationship management (CRM) and data mining techniques. The most important factor in the success of direct marketing is securing customer lists [2]. Therefore, efficient data analysis techniques are needed in direct marketing for the selection of an appropriate customer list that includes temporal and spatial information. According to research by the US Postal Service (USPS), 80% of US residents receive direct mail, which they spend twenty-five minutes per day reading [3]. The USPS has been studying a way to simultaneously increase the response rate and decrease their expenses with direct mail services [2].An accurate customer segmentation and reliable analysis regarding the bulk data on customer preference accumulated in real time are required for an effective direct marketing campaign. From the perspective of customer characterization, a detailed cluster analysis with various customer characteristics is required [4]. In addition, a market basket analysis [4]–[6] with a large amount of real-time product purchasing information and a temporal pattern analysis [7]–[13] for detecting preference changes over time are also important from the perspective of customer preference. A direct marketing campaign can be improved when customer groups based on similar lifestyles are formed and characterized. Thus, customer segmentation is performed using demographical, geographical, and economical information. Clusters detected using a clustering method are used to generate a profile based on the type of lifestyle definition. Business enterprises can identify customer groups using such profiling information and target their marketing services based on the different customer lifestyles available. However, census data (such as demographical, geographical, and economic information) used for detailed customer segmentation are high-dimensional data with around 100 different attributes. Therefore, traditional clustering methods, such as k-means and self-organizing map (SOM), are inefficient because they consider complete sets of attributes to detect clusters. Traditional methods such as k-means and SOM are based on the Euclidean distance measure and focus only on the global property by considering all dimensions in a dataset. As the number of dimensions in a dataset increases, distance measures become increasingly meaningless. Thus, a clustering algorithm using a subspace projection method is necessary for high-dimensional data. Subspace clustering methods seek to find clusters existing in subspaces or projections of a given high-dimensional data space, where a subspace is defined using a subset of attributes of a full space [14]. The mining of association rules [7] or sequential patterns [8] is mainly applied to analyze customer purchase patterns. However, for rating the customer preference of product changes over time, temporal characteristics are to be considered when a market basket analysis is applied. Data mining techniques using temporal characteristics are divided into periodic (or cyclic) pattern [9]–[11] and temporal relation [12] analyses. A periodic analysis for time-interval data discovers the cyclic pattern for repeated events during a time granularity. A temporal relation analysis uses temporal operators to show events based on a temporal expression. Previous works on temporal pattern mining for discovering useful patterns from temporal data have been conducted, but they do not consider large-scale temporal data. Among the above methods, a periodic pattern analysis is appropriate for a customer’s product preference rating. Moreover, with the fast growth of online shopping, a large amount of historical purchase data is accumulated in real time. Thus, a distributed processing system to handle bulk data is needed since there is a limitation in executing a market basket analysis algorithm on a single machine. Recently, Google has used a distributed file system to manage bulk online data efficiently and provides a MapReduce framework [15]–[17] to support an efficient distributed programming environment. A MapReduce program is composed of a map procedure that conducts filtering and sorting, along with a reduce procedure that conducts a summary operation. MapReduce can conduct such operations simultaneously with multiple machines and leads to faster operations for tasks with high cost. An algorithm using the MapReduce framework is proposed herein to conduct a market basket analysis using temporal characteristics. Temporal mining using the MapRedue framework is a challenging topic because the problem of mining temporal patterns from large-scale time interval data is very complicated.There are two main purposes in this paper. First, optimal cluster groups for the customers of Korea Post, the Korean postal service, are formed with the necessary subspace from a high-dimensional census dataset, and accurate customer characterization information of each group is given to the postal operators and business marketers through profiling. Second, temporal pattern and purchasing pattern mining techniques are suggested using the MapReduce framework to extract the product preference information quickly and accurately. Figure 1 illustrates the data mining framework for a direct marketing service, and the methodology proposed in this study for providing marketing information is as follows:
▪ Data integration and preprocessing — the census information, postal address, and historical purchase information are combined with the spatial information. These integrated datasets are preprocessed at the block level (MicroArea) for privacy protection.
▪ Subspace cluster analysis — groups having a similar lifestyle are formed from spatial and census information, and then, based on the group, profiling is performed for characterization.
▪ TACRs on MapReduce — profiled clusters are used as a class label for prediction, and purchased products are transformed into a transaction database to discover valid TACRs in the given time domain. In this step, the exploration of associative classification rules (ACRs) and temporal patterns uses the MapReduce framework to enable distributed programming.
Cluster analysis and Temporal Associative Classification Rules (TACR) mining framework for direct marketing service.
II. Customer Characterization Based on Subspace Projection Method
- 1. Data Integration and Description
Census data and postal addresses are integrated for the characterization of customer grouping through a geo-lifestyle cluster analysis and profiling. Census data are provided in the form of an open application programming interface with geographic information system (GIS) information by the National Statistics Korea, and postal address information is given by Korea Post. Census information provides demographic, geographical, and economic information, along with their subconcept information, hierarchically. Moreover, the average income information of the target customers is given from credit card companies and is added to the census information. All the attribute values from the census information are continuous statistical data. The attributes of the census information for data integration is shown in Table 1. Census information is provided at the MicroArea1) level and contains about 60 households (for privacy); in addition, maps are given in the form of a polygon — that is, a spatial object of the MicroArea. A postal address in text form is transformed into a spatial form by setting the standard GIS coordinates. A spatial join, one of the GIS operations, is used to integrate the census information from different units (points and polygons). By joining MicroArea (expressed as area) and addresses (defined as point objects), a spatial join operator calculates their inclusion relation. Through this operation, customers’ marketing information corresponding to their addresses can be extracted by block unit while protecting personal information. The process of data integration is illustrated in Fig. 2.
Example of data integration using a spatial join operation (polygons present a MicroArea, and points are postal addresses).
- 2. Projected Clustering (PROCLUS) Method
PROCLUS [18] is a k-medoid-type method that first generates k-cluster centers for a high-dimensional dataset using a dataset sample. It then refines the subspace clusters iteratively. In each iteration, for each of the current k-medoids, PROCLUS considers the local neighborhood of the medoid in the full dataset and discovers a subspace for the cluster by minimizing the standard deviation of the distances of the points in the neighborhood to the medoid on each dimension. Once all of the subspaces for the medoids are determined, each point in the dataset is assigned to the closest medoid according to the corresponding subspace. Clusters and outliers are discovered. In the next iteration, new medoids replace existing ones if doing so improves the clustering quality [19]. The PROCLUS algorithm is shown in Fig. 3. PROCLUS, a subspace projection method, uses the OpenSubspace of the Java WEKA [20] data mining tool. It supports up-to-date performance measures to facilitate study on subspace clustering. For an unsupervised method such as a cluster analysis, it is difficult to determine the appropriate parameter settings without prior knowledge on the data. For instance, there is a difficulty in finding the number of clusters for users in the case of the k-means algorithm. The PROCLUS algorithm supports parameter bracketing for the most-suitable parameter setting process. However, the best range must be determined through more repeated iterations because the algorithm using a range to set parameters, as opposed to using a fixed value. The PROCLUS algorithm has two parameters, k and l, which indicate the number of clusters and number of dimensions, respectively. A total of 50 individual results needed to be analyzed to find (k, l); that is, the best parameter pair, if the user has set k = {1 to 5} and l = {11 to 20}.
In this section, the methodology using the MapReduce framework is proposed for a market basket analysis using the TACRs algorithm. The TACRs algorithm was applied for intrusion detection in our previous work [21]. Therefore, the concept of TACRs is briefly recalled first, and a description of an extended algorithm using a distributed programming method based on MapReduce will follow.
- 1. Calendar-Based Temporal Patterns
A calendar schema (CS) is a relational schema that is based on the calendar concept hierarchy [9], [21]. A CS is defined to comprise a set of calendar-based time granularities and corresponding possible domain values. It is of the following form:where Gi is a time granularity in a calendar concept, such as a year, a month, or a day, for 1 ≤ i ≤ n. Each Di, a positive integer, is a domain value of Gi. When the CS is (Gn, Gn−1, ... , G1), 1 ≤ i ≤ n, time granularity Gi is included uniquely into Gi+1. For example, the schema of (month, day) is valid because “day” is included in a specific “month.” In the case of (year, month, week), “week” does not belong to a specific “month.” CS = (month:{1–12}, day:{1–30} {1, 20}) is valid for expressing the twentieth day of January, but {2, 31} is not a valid set. A calendar pattern (CP) is an instance of a given schema CS = (Gn:Dn, ... , G1:D1) and can be expressed as CP = {dn, ... , d1}. Here, each di is a domain value of Di or a symbol “*.” If a particular di is a symbol, then it implies that all values of domain Di can be included; hence, this symbol can be interpreted as denoting “every.” In addition, “*” means the time periodicity for the current domain. For example, when a CS is given as (month:{1–3}, day:{1–7}), calendar pattern {*, 3} expresses the time intervals {1, 3}, {2, 3}, and {3, 3} — that is, the Wednesdays of the first three months of the year. Depending on the number of “*”s included in a calendar pattern, the expression is classified. A calendar pattern containing i “*”s is called an i-star pattern (CPi), and other patterns including no “*”s are called a basic time interval (CP0) [9].
- 2. TACR Mining Using MapReduce Framework
Customer purchase patterns for each cluster can be generated using associative classification rule mining [22]–[24]. Let D denote a transaction database, I = {i1, i2, ... , in} an itemset, and C = {c1, c2, ... , cm} a cluster (or class) label. ACRs are expressed in the form X→ci, while the antecedent of an ACR is an itemset, and a rule itself is called a ruleitem.The support and confidence of ACRs, denoted by σ and δ, respectively, are defined as follows:ACRs are a ruleitem set satisfying the minimum support and minimum confidence. Temporal mining is performed by adding a CP. If a CP for a CS is given, then the transaction of a timestamp is expressed as D(CP). Grammatically, the ACRs are expressed in the form . For instance, in the case of CS = (year:{1999–2001}, month:{1–12}, day:{1–30}), rule <(A˄B)→cluster1>, {*, 2, 3}> shows a calendar and cyclic expression. This rule is valid in a set of basic time intervals, such as {1999, 2, 3}, {2000, 2, 3}, and {2001, 2, 3}, and means that the rule <(A˄B)→cluster1> is established. In addition to the support and confidence of the ACRs, temporal ACRs use a new threshold, f, called the frequency. Threshold f (%) is defined as the minimum frequency. TACRs satisfy the support and confidence with respect to frequency f in transaction DB if holds during no less than 100 × f (%) basic time intervals covered by CP [9], [21]. Therefore, the mining of TACRs results in the discovery of a set of rules that satisfy both the minimum support and minimum confidence from the transaction database. Such mining can discover changing rules as time goes on by considering the frequency f.Market basket data are huge amounts of data automatically obtained in a given time interval. Thus, the MapReduce structure should be included in the temporal concept for generating efficient associative classification rules. MapReduce’s basic data structure is composed of pairs of , where either element in can be defined as an integer, real number, string, byte string, or arbitrary complex data structures [15]–[16]. For example, the key and value elements become the URL and HTML contents, respectively, when a group of web pages is needed to be expressed by such a pair. Before the input data of MapReduce are created, they need to be segregated at the block level for a MapReduce operation. These are distributed into the data blocks in a specified size. After that, the mapper and reducer tasks start to execute the map function and reduce function independently. The mapper reads the given data and generates the pairs of as the basic data structure for the applied transaction. The reducer generates the output in pairs of from all the values having the same intermediate key.During this process, the transaction is performed for the group of intermediate values. The mapper task and reducer task process the data in parallel. The support of frequent ruleitems in each basic time interval for forming all temporal ACRs is calculated first. Therefore, the suggested algorithm is processed by the following three steps [21]:
First, all largeruleitemsare generated regarding the given CS.
Next, real rules for building the classification model are generated from them based on the confidence level.
Finally, the generated rules in the basic time units are renewed when considering the containment of the CP.
It will not be useful to use conventional algorithms, since the parcel acceptance data come at a very large scale. Thus, the a priori map and reduce functions for the distribution programming of [25] are applied for the candidate generation and support counting. The ruleitems discovered from the basic time interval are also used in CP renewal steps to find the time cycle. The single-pass counting algorithm proposed in [17], along with the MapReduce framework, is extended to generate frequent a priori–based itemsets. In our proposed algorithms, each map function computes the count value of each candidate from its own partition, and each candidate and its count value are then presented as the output. The processing result of the map function, candidates, and their count values are collected and summed for the reduce function. The communication cost is decreased through the count distribution process for the map and reduce functions. The algorithms are performed as follows:
▪ The transaction database is divided based on the basic time intervals and database partitioning.
▪ The algorithm inFig. 4discovers theruleitemscontaining one item through a single transaction scan.
▪ First, the is output regarding eachruleitemcontained in the transaction based on the map function.
▪ The reduce function collects the counts of eachruleitemand returns the result of ,R1.
Figure 5 is an algorithm for ruleitems containing more than two items. It operates as follows:
First it readsR1to generate candidateC2in the DistributedCache of Haddop[26]. It is possible to generate a candidate by calling the candidateGen function.
Next, the map function scans transactiontiwith the subset function to identify the candidates in allti.
The reduce function collects all count values of the candidates and generates <ruleitem2, count> as large 2-ruleitems. The exploration process from large 3-ruleitems to k-ruleitems is identical to that of large 2-ruleitems.Figure 6 shows an updating algorithm for i-star patterns containing more than one “*” for each rule in the CP0. This updating algorithm operates as follows:
First, each ACR is divided by each CP0.
The map function generates mappers containing one “*” and outputs by each map function.
Reducers are generated at the reduce function step for each ACR, and count values of 1-star patterns belonging to the key, ACRs, are collected.
Finally, the pair of is generated satisfying the minimum frequency.
The renewal process from the 2-star pattern to i-star pattern is the same as the 1-star pattern update process.Examples for the ACRs discovery process and 1-star pattern update in the basic time intervals are shown in Figs. 7 and 8. After finding 1-ruleitem by the algorithm described in Fig. 4, Mapper reads R1 and generates candidate 2-ruleitems. In the case where the minimum support is two, the Reducer sums up the counts and outputs frequent 2-ruleitems into R2, as shown in Fig. 7. At the CP updating phase, Mapper composes 1-star patterns including just one “*” and outputs ACRs relevant to each 1-star pattern.
Reducer sums up the count of 1-star patterns satisfying each of the ACRs and outputs an 1)> that meets the minimum frequency. The same procedure is applied to i-star patterns update.
IV. Experimental Results and Discussion
- 1. Datasets and Preprocessing
The census information for customer characterization is mainly for a metropolitan area reflecting a high population density and consumer sentiment. The results of a spatial informatization performance show that the dataset to be used includes 5,269,364 mail addresses and 15,987 features from the census information. The PROCLUS clustering method forms similar groups with the subspace of all dimensions for the class of prior knowledge included in the training datasets. Next, PROCLUS classifies the objects of test datasets to identify the group to which they belong. Therefore, the class information (prior knowledge) is needed. In this experiment, four types of customer groups segmented by credit rating are given, and the prior class labels are defined with this information. The prior class labels are considered as the original clusters, and the classes grouped by the PROCLUS method become discovered clusters, which are compared and analyzed. The data for a market basket analysis is derived from the e-Commerce purchase history of Korea Post. The data attributes include the purchase date, customer address, product code, and product name. The dataset uses a total of 18,680,419 cases from July 2008 to December 2009. In particular, the week is added to the time-stamped time granularity, and the purchase time is sectioned into morning, afternoon, evening, or night. The preprocessed data for temporal ACR mining are shown in Table 2.
- 2. PROCLUS Cluster Analysis for Profiling Customer Characterization
The preprocessed census data containing 99 dimensions and 15,987 MicroArea block objects are clustered in this subsection. The PROCLUS clustering method supports parameter bracketing for the most-appropriate-parameter setting process. The parameter settings of the PROCLUS algorithm for census data are given in Table 3. Because the census data have 99 attributes, the range of the average dimensions is set from 1 to 99, and the number of clusters is set from 3, which is lower than the original number of clusters (4 to 10). Each time the experiment is repeated, two parameter values are incremented by 1.
As PROCLUS clusters similar objects in the training datasets and classifies which group out of the defined clusters the test data objects belong to, two methods of clustering and classification are included. Therefore, an evaluation measure such as the sum of the squared error (from a traditional clustering method) is inappropriate. The PROCLUS algorithm of OpenSubspace considers each cluster label after clustering as the prediction target class of the classification problem. The current study uses evaluation measures such as F1-value, accuracy, entropy, and coverage to evaluate the clustering algorithm. Formal definitions of these measures are given below [14].andTable 4 shows the results of the PROCLUS algorithm with five clusters and 47 dimensions when applying the parameter bracketing in Table 3. The census dataset was applied with 47 dimensions depending on the parameter set with the range shown in Table 3, and the optimal number of clusters determined was five. Based on the results in Table 4, PROCLUS was used along with the clustering results shown in Table 5.
Confusion matrix of original vs. discovered groups.
Original groups
Discovered groups (cluster: 5, dimension: 47)
C1
C2
C3
C4
C5
Cluster 1 (4,572/4,578)
0
3,810
254
508
0
Cluster 2 (3,591/3,609)
57
228
114
2,691
501
Cluster 3 (3,802/4,133)
79
159
2,448
159
957
Cluster 4 (3,426/3,455)
3,426
0
0
0
0
For cluster 1, cluster 2, cluster 3, and cluster 4 (the original groups of Table 5), specific clusters detected from the PROCLUS analysis results are profiled into the customer characterization. Different subgroups can be formed from the original groups using the confusion matrix in Table 5. For example, cluster 2 comprises five clusters, C1 through C5. However, C1, C2, and C3 are removed. Since the number of members included in the cluster is small, these are considered to be minor groups. Therefore, one C2 cluster is created for the cluster 1 group, and two subgroups, C4 and C5, are created for the cluster 2 group. The groups selected as the cluster in Table 5 are displayed in bold font.
- 3. TACRs Mining Performance Evaluation
In this subsection, we evaluate the performance of the proposed algorithms, including their scalability with respect to the support and frequency thresholds. While the minimum support is a parameter affecting the generation of ruleitems, the minimum frequency is a parameter affecting the generation of i-star CPs. We carried out experiments on our market basket data from Korea Post. For the performance evaluation of parallel mining, five machines equipped with a 2.67 GHz Intel(R) Core(TM) CPU and 8 GB of RAM were used in the experiments. Java ver. 1.6 and Hadoop 0.20 were used to develop the algorithm. The transaction DB used in the experiments is composed of 17 distinct items, and the average length of the transactions is 12. The execution times between the algorithm proposed in this paper and the traditional classification based on temporal class-association rules (CTAR) [21] algorithm are compared when executed on one machine (see Fig. 9). The experimental results shows that the CTAR algorithm is faster in both the large ruleitems discovery phase and the i-star pattern updating phase in terms of the default time, as shown in Figs. 9(a) and 9(b). The use of a Hadoop distributed file system results in an additional cost when communicating between each mapper and reducer. However, when multiple machines are used, the proposed MapReduce-based methods show a superior performance since the algorithm with MapReduce shows an increase in the execution speed depending on the number of employed machines. The proposed algorithms show better results even when only two machines are used, as illustrated in Figs. 9(c) and 9(d). Figure 9(c) shows a runtime comparison in terms of the machine quantity when the support is set to 1%, and Fig. 9(d) shows a runtime comparison of the machine quantity change when the frequency is set to 80%.
Runtime of MapReduce-based TACRs vs. CTAR: (a) scalability with support, (b) scalability with frequency, (c) scalability with no. of machines: support set to 1.0%, and (d) scalability with no. of machines: frequency set to 80%.
The accuracy of the proposed classification model in predicting the customer characterization group was evaluated based on the precision, recall, and F1-value of (3) through (5). Thresholds such as the support, confidence, and frequency involved in the generation of classification rules were experimented on in the same way as our previous works [21]; thus, the detailed process of the test was omitted, and we used the optimal parameter setting values from the test results. All evaluation measures were obtained using the methodology of a stratified 10-fold cross validation for all six classes. Table 6 shows the performance evaluation results of the TACRs model for the different classes. In addition, the error rate measures of the classification use the mean absolute error (MAE) and root-mean-square error (RMSE). All classes (six clusters) were discovered by the PROCLUS clustering algorithm.To evaluate the performance with respect to the number of instances and classes, a confusion matrix is used. Table 7 records the accuracy of the classifiers used in a confusion matrix. According to the results shown in Tables 6 and 7, our TACRs method performs very well. Temporal ACR mining discovers the frequent ruleitems for customer-purchased items and represents the purchasing cycle that occurs repeatedly under the given time schema. The class included in ruleitems is the profiled cluster information found through a geo-lifestyle cluster analysis. An example of the data mining execution result for direct marketing is as follows.
Detailed accuracy results based on class. Optimal parameters of the algorithm were set as follows: support (0.8%), confidence (68%), and frequency (62%).
For example, when {clothing, flower}→cluster 2 <5,*,5>, <12,4,*> is applied for direct marketing, it is possible to conduct a targeted marketing campaign with the rules found from TACRs mining, such as is shown in Fig. 10.
The most important factor for a successful direct marketing is to obtain a list of customers. In this paper, a data mining method including temporal and spatial information is proposed for an appropriate selection of a customer list. Initially, Korea Post customers were segmented into groups according to style. The census data including demographic, economic, and geographical information are analyzed for customer grouping at the MicroArea block level with a consideration of personal information. The PROCLUS algorithm provided by OpenSubspace was used because the census information is a high-dimensional dataset. The cluster analysis showed that registered customers of Korea Post are segmented into six groups of different characteristics. A temporal associative classification method applied with the MapReduce framework was developed as a new data mining method for direct marketing. A distributed processing method using multiple machines, rather than a single machine, was developed to find product purchase cycles and purchase patterns from a large amount of customer purchase history data. The distributed processing method is compared with a traditional method for various parameter changes of the TACRs algorithm, and it was proven that the TACRs algorithm handles the market basket data analysis faster. As the contribution of the data mining techniques proposed in this paper, the GIS-based profiled customer characterization information and parallel mining method enabling a faster analysis of huge amounts of product purchase information are made available to Korea post for the first time. Future research will use the suggested data mining techniques on real big data related to marketing collected over years, analyze the results, and apply MapReduce-based TACRs to various fields.
This work was supported by the Postal Technology R&D program of MSIP (10039146, Development of Implementation Technology for SMART Post).
MicroArea: statistics-collecting unit, a spatial concept while protecting resident’s personal information and 1/24 of the small town such as eup, myeon, and dong (local district level) in size.
hg_lee@etri.re.krHeon Gyu Lee received his BS degree in computer science from Kyonggi University, Suwon, Rep. of Korea, in 2002 and his MS and PhD degrees in computer science from Chungbuk National University, Cheongju, Rep. of Korea, in 2005 and 2009, respectively. Since 2009, he has been working for the Electronics and Telecommunications Research Institute, Daejeon, Rep. of Korea, where he is now a senior member of the researching staff. His main research interests are data mining, databases, bioinformatics, pattern recognition, and knowledge-based information retrieval.
ychoi@etri.re.krYong Hoon Choi received his BS degree in mechanical engineering from the University of Ulsan, Rep. of Korea, in 1997 and his MS and PhD degrees in industrial and manufacturing systems engineering from Iowa State University, Ames, USA, in 1999 and 2003, respectively. Since 2004, he has been working for the Electronics and Telecommunications Research Institute, Daejeon, Rep. of Korea, where he is now a director of the Logistics Process Research Team. His main research interests are signal processing; process design and monitoring; SCM; and logistics automation.
hoonjung@etri.re.krHoon Jung received his BS degree in industrial engineering from Kyunghee University, Seoul, Rep. of Korea, in 1989 and his MS and PhD degrees in industrial engineering from Iowa State University, USA and the University of Missouri, MO, USA, in 1997 and 2001, respectively. Since 2002, he has been working for the Electronics and Telecommunications Research Institute, Daejeon, Rep. of Korea, where he is now a senior researcher. His main research interests are supply chain management, logistics information systems, and logistics strategy.
Corresponding Authoryhshin@ynu.ac.krYong Ho Shin received his BS degree in industrial engineering from Seoul National University, Rep. of Korea, in 1993 and his MS and PhD degrees in industrial and system engineering from KAIST, Daejeon, Rep. of Korea, in 1995 and 2003, respectively. From 2007 to 2009, he worked for the Electronics and Telecommunications Research Institute, Daejeon, Rep. of Korea. Since 2009, he has been with the School of Business, Yeungnam University, Kyeongsan, Rep. of Korea, where he is now an associate professor. His main research interests are operations management, logistics management, data mining, and knowledge management systems.
Oh H.-J.
,
Lee C.
,
Lee C.-H.
2012
“Analysis of the Empirical Effects of Contextual Matching Advertising for Online News,”
ETRI J.
34
(2)
292 -
295
DOI : 10.4218/etrij.12.0211.0171
Kim J.S.
“CTAR: Classification Based on Temporal Class-Association Rules for Intrusion Detection,”
Int. Workshop WISA
Jeju, Rep. of Korea
Aug. 25–27, 2003
84 -
96
Li W.
,
Han J.
,
Pei J.
2001
“CMAR: Accurate and Efficient Classification Based on Multiple Class-Association Rules,”
IEEE Int. Conf. Data Mining
San Jose, CA, USA
369 -
376
@article{ HJTODO_2015_v37n2_317}
,title={Subspace Projection-Based Clustering and Temporal ACRs Mining on MapReduce for Direct Marketing Service}
,volume={2}
, url={http://dx.doi.org/10.4218/etrij.15.2314.0068}, DOI={10.4218/etrij.15.2314.0068}
, number= {2}
, journal={ETRI Journal}
, publisher={Electronics and Telecommunications Research Institute}
, author={Lee, Heon Gyu
and
Choi, Yong Hoon
and
Jung, Hoon
and
Shin, Yong Ho}
, year={2015}
, month={Apr}
TY - JOUR
T2 - ETRI Journal
AU - Lee, Heon Gyu
AU - Choi, Yong Hoon
AU - Jung, Hoon
AU - Shin, Yong Ho
SN - 1225-6463
TI - Subspace Projection-Based Clustering and Temporal ACRs Mining on MapReduce for Direct Marketing Service
VL - 37
PB - Electronics and Telecommunications Research Institute
DO - 10.4218/etrij.15.2314.0068
PY - 2015
UR - http://dx.doi.org/10.4218/etrij.15.2314.0068
ER -
Lee, H. G.
,
Choi, Y. H.
,
Jung, H.
,
&
Shin, Y. H.
( 2015).
Subspace Projection-Based Clustering and Temporal ACRs Mining on MapReduce for Direct Marketing Service.
ETRI Journal,
37
(2)
Electronics and Telecommunications Research Institute.
doi:10.4218/etrij.15.2314.0068
Lee, HG
,
Choi, YH
,
Jung, H
,
&
Shin, YH
2015,
Subspace Projection-Based Clustering and Temporal ACRs Mining on MapReduce for Direct Marketing Service,
ETRI Journal,
vol. 2,
no. 2,
Retrieved from http://dx.doi.org/10.4218/etrij.15.2314.0068
[1]
HG Lee
,
YH Choi
,
H Jung
,
and
YH Shin
,
“Subspace Projection-Based Clustering and Temporal ACRs Mining on MapReduce for Direct Marketing Service”,
ETRI Journal,
vol. 2,
no. 2,
Apr
2015.
Lee, Heon Gyu
Choi, Yong Hoon
Jung, Hoon
et al.
“Subspace Projection-Based Clustering and Temporal ACRs Mining on MapReduce for Direct Marketing Service”
ETRI Journal,
2.
2
2015:
Lee, HG
,
Choi, YH
,
Jung, H
,
Shin, YH
Subspace Projection-Based Clustering and Temporal ACRs Mining on MapReduce for Direct Marketing Service.
ETRI Journal
[Internet].
2015.
Apr ;
2
(2)
Available from http://dx.doi.org/10.4218/etrij.15.2314.0068
Lee, Heon Gyu
,
Choi, Yong Hoon
,
Jung, Hoon
,
and
Shin, Yong Ho
,
“Subspace Projection-Based Clustering and Temporal ACRs Mining on MapReduce for Direct Marketing Service.”
ETRI Journal
2
no.2
()
Apr,
2015):
http://dx.doi.org/10.4218/etrij.15.2314.0068