CHAPTER ONE
1.0 INTRODUCTION
Association rule mining (ARM) is used to identify associations among a large set
of data items. Because of the large quantity of data stored in databases, many
industries are becoming interested in mining association rules from their
databases. For example, the detection of interesting association relationships
in large quantities of business transaction data can assist in catalog design,
cross-marketing, and various business decision-making processes. A typical
example of association rule mining is market basket analysis. This method
examines customer buying patterns by identifying associations among the various
items that customers place in their shopping baskets. The identification of such
associations can help retailers to expand their marketing strategies by gaining
insight into which items are frequently purchased together. This work offers
researchers a broad area in which to develop better data mining algorithms. This
paper presents a survey of the existing data mining algorithms for market basket
analysis.
This review paper is organized as follows: Section I contains a brief
introduction to ARM; Section II describes market basket analysis, which is an
application of ARM; Section III presents the literature survey, in which
various data mining algorithms are discussed; Section IV discusses the Apriori
algorithm; Section V describes the problems and directions of data mining
algorithms; and Section VI summarizes the paper, including the conclusion and
future scope.
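The market basket idea introduced above can be illustrated with a minimal Python sketch. It assumes the two measures usually used in association rule mining, support and confidence, which this section has not yet defined; the basket contents and the function names are hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical toy data: each basket is the set of items bought in one transaction.
BASKETS: List[FrozenSet[str]] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"milk", "eggs"}),
]


def support(itemset: Iterable[str], baskets: List[FrozenSet[str]]) -> float:
    """Fraction of baskets that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for basket in baskets if items <= basket) / len(baskets)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               baskets: List[FrozenSet[str]]) -> float:
    """Share of baskets containing the antecedent that also contain the consequent."""
    a = frozenset(antecedent)
    base = support(a, baskets)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    return support(a | frozenset(consequent), baskets) / base


if __name__ == "__main__":
    print(support({"bread", "milk"}, BASKETS))       # share of baskets with both items
    print(confidence({"bread"}, {"milk"}, BASKETS))  # how often bread buyers also buy milk
```

A rule such as "bread implies milk" is normally reported only when both values exceed thresholds chosen by the analyst.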
1.1 BACKGROUND OF STUDY
Data mining is described as the extraction of hidden, useful
information from large databases. It encompasses a wide range of applied
mathematics and computational techniques such as link analysis, clustering,
classification, summarization, regression analysis and so on. Data mining tools
predict future trends and behaviors, allowing businesses to make
knowledge-driven decisions. The automated, prospective analyses offered by data
mining move beyond the analysis of past events. Data mining tools provide
answers to business questions that were previously too time-consuming to
resolve. They search databases for hidden patterns, finding useful information
that specialists may miss.
Data mining techniques can be implemented rapidly on existing
software and hardware platforms to enhance the value of existing information
resources, and can be integrated with new products and systems as they are
brought into use. When implemented on high-performance client/server or
multiprocessing computers, data mining tools can analyze huge databases to
provide answers to questions such as “Which goods do consumers tend to buy most,
and which goods go along with them?”
Coenen (2010), in his publication “Data Mining: Past, Present
and Future”, notes that the history of data mining can be dated as far back
as the late 1980s, when the term began to be used, at least within the research
community, and he differentiates it from SQL.
Broadly, data mining can be defined as a set of mechanisms
and techniques, realized in software, to extract hidden information from data.
The word “hidden” in this definition is important. By the early 1990s
data mining was commonly recognized as a sub-process within a larger process
called Knowledge Discovery in Databases (KDD). The most commonly used
definition of KDD is that of Fayyad et al.: “the nontrivial process of
identifying valid, novel, potentially useful and ultimately understandable
patterns in data” (Fayyad et al. 1996).
As such, data mining should be viewed as the sub-process,
within the overall KDD process, concerned with the discovery of hidden
information. Other sub-processes that form part of the KDD process are data
preparation (warehousing, data cleaning, pre-processing, and so on) and the
analysis/visualisation of results. For many practical purposes KDD and data
mining are seen as synonymous, but technically one is a sub-process of the
other. Data mining techniques were originally directed at
tabular data and, given the processing power available at the time,
computational efficiency was of significant concern. As the amount of processing
power generally available increased, processing became less of a concern and
was replaced with a desire for accuracy and a desire to mine ever larger data
collections. Today, in the context of tabular data, we have a well-established
range of data mining techniques available.
It is well within the capabilities of many commercial enterprises and
researchers to mine tabular data, using software such as Weka, on standard desktop
machines. However, the amount of electronic data collected by all kinds of
institutions and commercial enterprises, year on year, continues to grow and
thus there is still a need for effective mechanisms to mine ever larger data
sets. The popularity of data mining increased significantly in the 1990s,
notably with the establishment of a number of dedicated conferences: the ACM
SIGKDD (Special Interest Group on Knowledge Discovery and Data Mining) annual
conference in 1995, the European PKDD (Principles and Practice of Knowledge
Discovery in Databases) conference, and the Pacific-Asia PAKDD (Pacific-Asia
Conference on Knowledge Discovery and Data Mining) conference. This increase in
popularity can be attributed to advances in technology: the computer processing
power and data storage capabilities available meant that processing large
volumes of data on desktop machines was a realistic possibility. It became
commonplace for commercial enterprises to maintain data in computer-readable
form. In most cases this was primarily to support commercial activities; the
idea that this data could be mined often came second. The 1990s also saw the
introduction of customer loyalty cards that allowed enterprises to record
customer purchases; the resulting data could then be mined to identify customer
purchasing patterns. Data mining is the process of searching large volumes of
data for patterns using methods such as classification, association rule mining
and clustering. It is closely related to fields such as machine learning and
pattern recognition. Data mining techniques are the result of a long process of
research and product development.
I am in my final year. I was considered bright, and my family
had high hopes for me, but I had one fault: I hated compiler construction. I
have struggled with calculations all my life, though I have been lucky and did
well all the same. However, I still had to write my final exam. I gathered the
compiler construction past questions for each year, compared them, and sorted
them. Guess what I discovered! Over 35% of the questions were repetitions. I
had hit the jackpot. I carefully and thoroughly checked through the answers,
and from then on I revised only the repeated questions. Well, I have a good
grade to show for the data mining I performed.
There is a huge amount of data available in the information
industry. This data is of no use until it is converted into useful information.
It is necessary to analyze this huge amount of data and extract useful
information from it. Extraction of information is not the only process we need
to perform; the work also involves other processes such as data pre-processing
(data cleaning, data integration, data transformation), data mining, pattern
evaluation and data presentation. Once all these processes are complete, we are
in a position to use this information in many applications such as fraud
detection, market analysis, production control, science exploration, etc.
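The pre-processing steps named above (cleaning, integration, transformation) can be sketched in Python as follows. This is only an illustration under stated assumptions: the input is assumed to be comma-separated item strings from two hypothetical tills, and the function names are invented for this example rather than taken from any tool used in the study.

```python
from typing import Iterable, List, Set


def clean(record: str) -> List[str]:
    """Data cleaning: split a raw record, trim and lowercase items, drop blanks."""
    return [item.strip().lower() for item in record.split(",") if item.strip()]


def integrate(*sources: Iterable[str]) -> List[str]:
    """Data integration: merge raw records from several sources into one list."""
    merged: List[str] = []
    for source in sources:
        merged.extend(source)
    return merged


def transform(records: Iterable[str]) -> List[Set[str]]:
    """Data transformation: turn each record into a set of items, skipping empty ones."""
    baskets: List[Set[str]] = []
    for record in records:
        items = set(clean(record))
        if items:
            baskets.append(items)
    return baskets


# Hypothetical raw exports from two tills, with stray spaces, blanks and mixed case.
till_a = ["Bread, Milk ", "milk,,EGGS"]
till_b = ["  ", "butter, bread"]
baskets = transform(integrate(till_a, till_b))

if __name__ == "__main__":
    print(baskets)
```

The resulting list of item sets is the form of input that the mining and pattern evaluation steps would then work on.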
1.2 STATEMENT OF PROBLEM
Through in-depth research and observation carried out at
supermarkets, we have discovered that retailers want to know which products
are purchased together, whether as pairs or as larger groups of items. This
knowledge can help their decision making with respect to the placement of
products and the timing and extent of promotions, and it gives them a better
understanding of customer purchasing habits by grouping customers with their
transactions.
This project is aimed at designing and implementing a
well-structured market basket analysis software tool to solve the problem
stated above and at comparing its results with those of an existing software
package called WEKA.
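The question posed in this section, namely which products are bought together, can be illustrated by simply counting how often each pair of items appears in the same transaction. The sketch below is a brute-force count in Python, not the algorithm used by WEKA or by the tool developed in this project; the sales data, the threshold and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]


def pair_counts(baskets: Iterable[Iterable[str]]) -> Counter:
    """Count, for every pair of items, the number of baskets containing both."""
    counts: Counter = Counter()
    for basket in baskets:
        # Sorting gives each pair one canonical order, e.g. ("bread", "milk").
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


def frequent_pairs(baskets: Iterable[Iterable[str]], min_count: int) -> Dict[Pair, int]:
    """Keep only the pairs that occur together in at least `min_count` baskets."""
    return {pair: n for pair, n in pair_counts(baskets).items() if n >= min_count}


# Hypothetical transactions for illustration only.
SALES: List[List[str]] = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
]

if __name__ == "__main__":
    for pair, n in sorted(frequent_pairs(SALES, min_count=2).items()):
        print(pair, n)
```

Counting every pair becomes expensive as the number of distinct products grows, which is one reason dedicated algorithms prune candidate itemsets instead of enumerating them all.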
1.3 AIM AND OBJECTIVES OF THE STUDY
The aim of the study is to maximize profit for
retailers by providing better services to consumers.
The objectives of this study are:
A: Cross-Market Analysis - data mining identifies associations and
correlations between product sales.
B: Identifying Customer Requirements - helps in identifying the
best products for different customers. It uses prediction to find the factors
that may attract new customers.
C: Customer Profiling - helps to determine what kind of people
buy what kind of products.
1.4 SIGNIFICANCE OF THE STUDY
The essence of a market basket analysis system is to
deliver the right goods in the right quantity, at the right time and place,
to the right customer. The benefits derived from a
market basket analysis system are as follows:
A: Increased visibility of customers' buying behavior
B: Reduced order processing costs
C: Increased sales and market share
D: Quicker execution of pricing and promotion strategies for specific target
market segments
1.5 SCOPE OF STUDY
This research project
is done to give knowledge about the operation of the customer order and
monitoring system in Top Hills shopping mall. It focuses on the following areas:
A: Effective processing of data.
B: Fast movement of products (delivery).
C: Effective flow of information.
D: Security of documents.
During the course of
the research work, I encountered some constraints, which restricted the study
to only the above-mentioned areas. Some of the limitations include:
A: Inadequacy of funds to finance the project as a result of economic
instability.
B: Time constraints, because this research work was being done together
with other academic work.
C: Inability to get the necessary information from the parties concerned, and
poor information facilities.
It is necessary to
define some of the terms associated with the customer order and monitoring
system. They include:
A: SYSTEM: A system is a group of things or parts working together in a
particular relation.
B: ORDER: This is the terminology reserved for a request to supply the goods
asked for.
C: CUSTOMER: A customer is a person or organization who buys goods and
services from a shop, business, etc.
D: BREWERY: A place where beer is manufactured.
E: INVOICE: A list of goods sold, with the price charged.
F: ON-LINE PROCESSING: This is the transfer of information over an on-line
(cabled) connection.
G: COMPUTERISATION: This is the process of converting a manually based system
to a computer-based system.