Report submitted to Harrisburg University of Science and Technology
in partial fulfillment of the requirements
for the course of
GRAD 695: Research Methodology and Writing
Dr. Ozlem Cosgun
Anomaly detection using Data Mining
In many ways, evolving information technology has produced an enormous number of databases and massive volumes of data. Analyzing these data sets improves how data is collected and allows organizations to use the resulting information to make better decisions.
Data mining is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both. It allows users to analyze data along many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases. Data mining, closely associated with knowledge discovery in databases (KDD), helps analysts assess trends and relationships in data so that they can make better business decisions, for example to increase revenue, project trends, target digital advertising, and reduce costs.
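As a small illustration of finding correlations among fields, the sketch below computes the Pearson correlation for every pair of numeric columns and reports the strongly correlated pairs. The column names, the data, and the 0.8 cut-off are hypothetical and chosen only for illustration; real data mining tools scan far larger tables and look for many other kinds of patterns.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length numeric columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlated_pairs(table: dict[str, list[float]], threshold: float = 0.8):
    """Return (field_a, field_b, r) for every pair of fields with |r| >= threshold."""
    names = list(table)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(table[a], table[b])
            if abs(r) >= threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs


# Hypothetical table: each key is a field (column), each list holds its values.
sales = {
    "ad_spend": [1, 2, 3, 4, 5],
    "revenue": [10, 20, 31, 39, 50],
    "returns": [5, 3, 4, 5, 3],
}
print(correlated_pairs(sales))  # [('ad_spend', 'revenue', 0.999)]
```

Only the ad_spend/revenue pair passes the cut-off; the "returns" field is weakly and negatively correlated with both other fields (|r| of roughly 0.3).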
Anomaly detection refers to the problem of finding patterns in data that do not conform to expected behavior. In different application domains, these nonconforming patterns are referred to as anomalies, outliers, discordant observations, deviations, aberrations, surprises, peculiarities, or contaminants. Of these, anomalies and outliers are the two terms used most widely in the context of anomaly detection. Anomaly detection is used extensively in applications such as fraud detection for credit cards, insurance, and health care; intrusion detection for cyber-security; fault detection in safety-critical systems; and military surveillance of enemy activity.
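A minimal sketch of flagging data that does not conform to expected behavior, under the simplifying assumption that "expected" means "close to the mean" of a single numeric field. It uses the classic z-score rule (distance from the mean measured in standard deviations, with a conventional cut-off of 3); the transaction amounts are made up.

```python
from statistics import mean, pstdev


def zscore_outliers(values: list[float], threshold: float = 3.0) -> list[int]:
    """Indices of values more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical: nothing deviates
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical card transaction amounts; the last one is unusually large.
amounts = [42, 38, 45, 40, 39, 41, 44, 37, 43, 40, 980]
print(zscore_outliers(amounts))  # [10]
```

Because a single extreme value also inflates the standard deviation, this rule can miss anomalies in small samples; variants based on the median are often preferred for that reason.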
Keywords: Data Mining, Anomalies, Fraud, Intrusion Detection, Surveillance
Assumptions
Global competitive advantage increasingly depends on technology, experience, and knowledge. However, a shortage of techniques for detecting data fraud and data anomalies can result in huge losses to organizations. Detection techniques therefore rely on certain assumptions about the data in order to detect different kinds of anomalies.
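To illustrate what such an assumption looks like in practice: one assumption often made, for example by nearest-neighbor based techniques, is that normal instances lie in dense neighborhoods while anomalies lie far from their closest neighbors. The sketch below scores each point by the distance to its k-th nearest neighbor under that assumption; the points and k = 2 are hypothetical.

```python
from math import dist


def knn_scores(points: list[tuple[float, float]], k: int = 2) -> list[float]:
    """Anomaly score = distance from each point to its k-th nearest other point."""
    scores = []
    for i, p in enumerate(points):
        distances = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(distances[k - 1])
    return scores


# Hypothetical 2-D records: four form a dense group, one lies far away.
records = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8)]
scores = knn_scores(records, k=2)
print(scores.index(max(scores)))  # 4
```

If the assumption does not hold for a data set, for instance because anomalies form their own tight cluster of at least k points, this score will rate them as normal.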
Historical data provides a great deal of information about how methods and solutions have helped organizations detect fraud. However, as stated above, this remains an ongoing area of study that needs further investigation. As organizations adopt rapidly evolving technologies, they face new problems that differ from historical occurrences and that will require new techniques to overcome.
An enormous amount of research is still needed on new technologies, which change every day, and little guidance currently exists on future trends. A main challenge in anomaly detection is to find a robust detection and classification method that keeps false positives to a minimum while maintaining high accuracy.
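To make "few false positives and high accuracy" concrete, the sketch below computes accuracy, precision, recall, and false positive rate from true labels and a detector's output (1 = anomaly, 0 = normal). The labels are hypothetical; the example shows why accuracy alone can be misleading when anomalies are rare.

```python
def detection_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Confusion-matrix metrics for a detector; label 1 = anomaly, 0 = normal."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # anomalies caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # anomalies missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normals passed
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


# Hypothetical: 100 transactions, of which the first 2 are fraudulent.
truth = [1, 1] + [0] * 98
flag_nothing = [0] * 100                # never raises an alarm
flag_some = [1, 1, 1, 1, 1] + [0] * 95  # catches both frauds plus 3 false alarms

print(detection_metrics(truth, flag_nothing))
print(detection_metrics(truth, flag_some))
```

The detector that never raises an alarm reaches 98% accuracy yet catches no fraud; the second detector has slightly lower accuracy (97%) but full recall, at the cost of three false positives.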
Parameter tuning is a major problem, especially when anomalies appear only at test time, which is often the case in real, practical problems.
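When no labeled anomalies exist before deployment, one common heuristic (an assumption here, not a method proposed in this report) is to set the alarm threshold so that only a chosen fraction of the training scores exceed it. The sketch shows that the outcome then depends directly on that guessed fraction; the scores, the function name, and the alert rates are all hypothetical.

```python
def threshold_for_alert_rate(train_scores: list[float], alert_rate: float) -> float:
    """Threshold such that roughly `alert_rate` of the training scores lie above it."""
    ranked = sorted(train_scores)
    # Number of training points allowed above the threshold (at least 0, at most n - 1).
    allowed = min(max(round(alert_rate * len(ranked)), 0), len(ranked) - 1)
    return ranked[len(ranked) - 1 - allowed]


# Hypothetical anomaly scores; the training data is assumed to be anomaly-free.
train = list(range(100))             # scores 0, 1, ..., 99
test = [10, 50, 95, 97, 99.5, 120]   # scores seen later, at test time

for rate in (0.05, 0.01):
    t = threshold_for_alert_rate(train, rate)
    alerts = [s for s in test if s > t]
    print(rate, t, alerts)
```

With a 5% alert rate the threshold is 94 and four test scores are flagged; with 1% it is 98 and only two are flagged. Without labeled anomalies there is no direct way to tell which setting is correct.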
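Many anomaly detection techniques struggle to cope with dynamically changing environments, where normal behavior keeps evolving and a model of past behavior may no longer be representative.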
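The difficulty with changing environments can be seen in a small sketch that contrasts a static baseline, fitted once on the first few observations, with a sliding-window baseline that keeps updating. Both use a z-score rule with threshold 3; the window size, the threshold, and the data stream (a level shift from about 10 to about 20, then a spike to 35) are hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


def static_zscore_alerts(stream, window=5, threshold=3.0):
    """Fit mean/std once on the first `window` points and never update them."""
    mu, sigma = mean(stream[:window]), pstdev(stream[:window])
    return [t for t, x in enumerate(stream[window:], start=window)
            if abs(x - mu) / sigma > threshold]


def rolling_zscore_alerts(stream, window=5, threshold=3.0):
    """Compare each point with the mean/std of the `window` points just before it."""
    recent = deque(maxlen=window)  # oldest value drops out automatically
    alerts = []
    for t, x in enumerate(stream):
        if len(recent) == window:
            mu, sigma = mean(recent), pstdev(recent)
            if sigma > 0 and abs(x - mu) / sigma > threshold:
                alerts.append(t)
        recent.append(x)  # note: anomalous values also enter the window
    return alerts


# Hypothetical stream: normal level ~10, shifts to ~20 at t=8, spike of 35 at t=16.
stream = [10, 11, 10, 11, 10, 11, 10, 11,
          20, 21, 20, 21, 20, 21, 20, 21,
          35, 21, 20]

print(static_zscore_alerts(stream))   # every index from 8 to 18
print(rolling_zscore_alerts(stream))  # [8, 16]
```

The static detector raises an alarm for every point after the level shift, while the sliding window flags the shift once and then the later spike. A real system would also need to decide whether anomalous values should be allowed into the window at all, since here they temporarily inflate the standard deviation.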
The results of this study should identify newly adopted data mining techniques that can assist organizations, and should give a clear understanding of how to reduce fraud and anomalies and how to implement new technologies cost-effectively in any business. The study should also indicate the advantages, disadvantages, and factors in play, together with future trends and how these may change to address future issues, and it should stress the importance of ongoing assessment when planning ahead.
Regarding limitations, the study would ideally be based on a case study, further research from the internet, and possibly a survey, which I believe is critical and will show the impact and the requirements.
In addition, research is often not made freely available: courses carry costs, and entrepreneurs sell methods, technology, and information that cannot be obtained without paying for them. For this reason, I think it is important, and indeed critical, to complete a case study in order to better understand the future technologies and techniques that can be created, adapted, and explored within this field.
A case study should be carried out on the different and new data mining techniques used to identify the various anomalies, frauds, and intrusions that organizations face. Research should also be conducted on existing case studies covering topics that this research does not address.