Wednesday, December 23, 2009

FRAUD DETECTION IN CREDIT CARD TRANSACTION USING ENHANCED DC-1 DATA MINING ALGORITHM AND ITS PREVENTION

GOVERNMENT COLLEGE OF TECHNOLOGY





ABSTRACT


Fraud has plagued telecommunication industries, financial institutions and other organizations for a long time. The type of fraud addressed in this paper is credit card transaction fraud, which costs businesses millions of dollars per year. As a result, fraud detection has become an important and urgent task for these businesses. A number of methods are currently used to detect fraud, ranging from statistical approaches (e.g. data mining) to hardware approaches (e.g. firewalls, smart cards).

Data mining is currently a popular way to combat fraud because of its effectiveness. Data mining is “a well-defined procedure that takes data as input and produces output in the forms of models or patterns.” In other words, the task of data mining is to analyze a massive amount of data and to extract usable information that can be interpreted for future use.


OUR IMPLEMENTATION


In this paper we enhance the Detector Constructor technique known as DC-1 for detecting credit card fraud. We also discuss some data mining techniques for fraud detection, outline the steps for online credit card fraud detection, and propose a prevention technique.













TABLE OF CONTENTS


  1. INTRODUCTION

  2. TYPES OF CREDIT CARD FRAUDS

    2.1 INDUSTRY FRAUDS

      2.1.1 STOLEN CARDS

      2.1.2 APPLICATION FRAUD

      2.1.3 CARDHOLDER-NOT-PRESENT FRAUD

      2.1.4 COUNTERFEIT CARDS

    2.2 ONLINE FRAUD

      2.2.1 ORGANIZED FRAUD

      2.2.2 OPPORTUNISTIC FRAUD

      2.2.3 CARDHOLDER FRAUD

  3. FRAUD DETECTION USING DATA MINING TECHNIQUES

  4. OUR IMPLEMENTATION-ENHANCED DC-1 ALGORITHM

    4.1 DC-1 FRAMEWORK

    4.2 EXPLANATION

    4.3 THE ENHANCED DC-1 ALGORITHM

      4.3.1 K-MEANS PROCEDURE

      4.3.2 THE FINAL ENHANCED ALGORITHM

    4.4 ADVANTAGES OF ENHANCED DC-1 ALGORITHM

  5. OUR SUGGESTIONS FOR CONTROLLING ONLINE FRAUD

  6. OUR SUGGESTION FOR FRAUD PREVENTION

  7. CONCLUSION


1. INTRODUCTION


The Concise Oxford Dictionary defines fraud as ‘criminal deception; the use of false representations to gain an unjust advantage’. Fraud is as old as humanity itself and can take an unlimited variety of forms. We begin by distinguishing between fraud prevention and fraud detection. Fraud prevention describes measures to stop fraud from occurring in the first place. In contrast, fraud detection involves identifying fraud as quickly as possible once it has been perpetrated. Fraud detection comes into play once fraud prevention has failed. Fraud detection is a continuously evolving discipline: whenever it becomes known that one detection method is in place, criminals adapt their strategies and try others.

In this paper we detect credit card fraud using data mining techniques. Data mining is the process of automatically extracting predictive information from large databases. It predicts future trends and finds behavior that experts may miss because it lies beyond their expectations. Data mining is part of a larger process called knowledge discovery; specifically, it is the step in which advanced statistical analysis and modeling techniques are applied to the data to find useful patterns and relationships. This paper presents an overview of traditional predictive modeling techniques for fraud detection and enhances the DC-1 data mining algorithm for the same purpose.


2. TYPES OF CREDIT CARD FRAUDS

2.1 INDUSTRY FRAUDS


Credit card fraud may be perpetrated in various ways, including simple theft, application fraud and counterfeit cards. In all of these, the fraudster uses a physical card, but physical possession is not essential in order to perpetrate credit card fraud: one of the major areas is ‘cardholder-not-present’ fraud, where only the card details are given (for example, over the phone).




2.1.1 STOLEN CARD

Use of a stolen card is perhaps the most straightforward type of credit card fraud.

In this case, the fraudster typically spends as much as possible in as short a time as possible, before the theft is detected and the card is stopped; detecting the theft early can therefore prevent large losses.


2.1.2 APPLICATION FRAUD

Application fraud arises when individuals obtain new credit cards from issuing companies using false personal information. Traditional credit scorecards are used to detect customers who are likely to default, and the reasons for default may include fraud. Such scorecards are based on the details given on the application forms, and perhaps also on other details, such as bureau information. Statistical models that monitor behaviour over time can be used to detect cards that have been obtained through a fraudulent application (e.g. a first-time cardholder who rapidly makes many purchases should arouse suspicion). With application fraud, however, urgency is not so important to the fraudster, and fraud might not be suspected until statements are sent out or repayment dates begin to pass.


2.1.3 CARDHOLDER-NOT-PRESENT FRAUD


Cardholder-not-present fraud occurs when the transaction is made remotely, so that only the card’s details are needed, and a manual signature and card imprint are not required at the time of purchase. Such transactions include telephone sales and online transactions, and this type of fraud accounts for a high proportion of losses. To undertake such fraud it is necessary to obtain the details of the card without the cardholder’s knowledge. This is done in various ways, including ‘skimming’, where employees illegally copy the magnetic stripe on a credit card by swiping it through a small handheld card reader, ‘shoulder surfers’ who enter card details into a mobile phone while standing behind a purchaser in a queue, and people posing as credit card company employees taking details of credit card transactions from companies over the phone.


2.1.4 COUNTERFEIT CARDS

Counterfeit cards, currently the largest source of credit card fraud, can also be created using card details obtained over the phone. Transactions made by fraudsters using counterfeit cards, or making cardholder-not-present purchases, can be detected through methods that look for changes in transaction patterns, as well as by checking for particular patterns that are known to be indicative of counterfeiting.


2.2 ONLINE FRAUD


Online credit card fraud against merchants can be broken out into three major categories:

  • Organized Fraud

  • Opportunistic Fraud

  • Cardholder Fraud

2.2.1 ORGANIZED FRAUD


Organized fraud is a form of organized crime. The criminals use identity theft or other means to apply for valid credit cards under someone else's name. Once the cards are issued, they set up a drop location (usually a vacant house or apartment) to which they have goods delivered, and they spend the cards up to their limit. When the bill comes 30-45 days later, there is nobody there to pay it, and the criminals move on to another credit card. A minor variation on this theme is the hacker/cracker who uses software to generate seemingly valid credit card numbers. Both types of criminals normally look for items that can easily be converted into cash. These are probably the hardest criminals to catch, because they know all the ins and outs of the system and constantly alter their techniques as soon as an anti-fraud measure begins to show any level of success.



2.2.2 OPPORTUNISTIC FRAUD


Opportunistic fraud is, quite simply, fraud that is committed because the opportunity happens to present itself. Perhaps a waiter who is a little short on cash copies down a customer's credit card details and then goes online and buys his wife a nice birthday present. There are countless variations on this, but essentially the person committing the fraud does not normally do this for a living: they are amateurs who happened to take advantage of an opportunity.


2.2.3 CARDHOLDER FRAUD


Cardholder fraud occurs when the legitimate cardholder is the person committing fraud. Sometimes they claim they never received the merchandise; sometimes they claim they never ordered it. Whatever the excuse, the cardholder knows how card-not-present transactions are treated by the credit card companies and aims to take advantage of the system. Even if the merchant calls the customer and confirms that they placed the order, when the bill comes they can claim they have never heard of the company, and the credit card company will stick the merchant with the liability. A minor variation on this type of fraud is a spouse or child who uses the card, after which the charges are denied. Usually the actual cardholder is completely unaware of the unauthorized use, but the result is still the same for the merchant.


3. FRAUD DETECTION USING DATA MINING TECHNIQUES

Data mining techniques go well beyond the limitations of simple exception reporting by identifying suspicious cases based on patterns in the data that are suggestive of fraud. Patterns in data that can be indicative of fraud have one or more of the following characteristics:

  • Unusual data values which deviate from the norm in some way

  • Unusual relationships among data values and/or records

  • Changes in the behavior of those involved in the transactions.


Characteristic            Data Mining Techniques

Unusual data              Outlier Analysis; Frequency of occurrence; Cluster Analysis; Algorithms

Unusual relationships     Outlier Analysis; Frequency of occurrence; Cluster Analysis; Link Analysis

Changes in behavior       Outlier Analysis; Frequency of occurrence


These are some of the data mining techniques for detecting fraudulent transactions that have the characteristics above.
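As a concrete instance of outlier analysis, the sketch below flags transaction amounts that lie unusually far from an account's mean, measured in standard deviations (a simple z-score rule). It is illustrative only: the amounts are hypothetical and the threshold is an assumption that would need tuning. Note that one very large amount also inflates the standard deviation; with ten values the largest possible z-score (using the population standard deviation) is 3, which is why the example uses a threshold of 2.5.

```python
from statistics import mean, pstdev

def zscore_outliers(amounts, threshold=3.0):
    """Return the indices of values lying more than `threshold`
    standard deviations away from the mean (simple outlier analysis)."""
    mu = mean(amounts)
    sigma = pstdev(amounts)  # population standard deviation
    if sigma == 0:
        return []  # every value is identical, so nothing deviates from the norm
    return [i for i, x in enumerate(amounts) if abs(x - mu) / sigma > threshold]

# Hypothetical purchase amounts for one card; the last one is unusual.
history = [42.0, 38.5, 51.0, 45.0, 40.0, 47.5, 39.0, 44.0, 43.0, 950.0]
print(zscore_outliers(history, threshold=2.5))  # [9]
```

In a real deployment the mean and standard deviation would be computed from the account's past legitimate transactions, and each new transaction scored against them.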



4. OUR IMPLEMENTATION-ENHANCED DC-1 ALGORITHM

4.1 DC-1 FRAMEWORK


Our approach to building a fraud detection system is to classify individual transactions as fraudulent or legitimate. In sum, the problem comprises three questions, each corresponding to a component of the framework. The questions are:

1. Which transactions are important? Which features, or combinations of features, are useful for distinguishing legitimate behavior from fraudulent behavior?

2. How should profiles be created? Given an important feature, how should we characterize (profile) the behavior of a credit card holder with respect to that feature, in order to notice important changes?

3. When should alarms be issued? Given the results of profiling behavior based on multiple criteria, how should they be combined to determine effectively when fraud has occurred?

[Figure: FIRST DETECTOR CONSTRUCTOR FRAMEWORK, with components Transactions, Rules, Monitor Templates and Profiling Monitors]

4.2 EXPLANATION


The Detector Constructor framework (DC-1) starts by analyzing the available transaction records, including fraudulent transactions.

(1) CLASSIFICATION RULE LEARNING


First, based on an account's history, its transactions are analyzed and labeled as fraudulent or legitimate (non-fraudulent). A local set of rules that separates the two is then searched for within that account. For example, for one specific account, the following classification rule might be devised:

IF (number of transactions ≥ restricted number of transactions) AND (amount of transactions > threshold value) THEN fraudulent transaction
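The example rule can be expressed as a simple predicate over one account's activity. This is a minimal sketch, assuming the rule is evaluated over one day of activity and that "amount of transactions" means the total amount spent that day; `AccountDay`, `rule_flags_fraud` and the two limits are hypothetical names and values, not taken from this paper.

```python
from dataclasses import dataclass

@dataclass
class AccountDay:
    """One day of activity on a single account (hypothetical record layout)."""
    num_transactions: int
    total_amount: float

# Hypothetical per-account limits; the paper does not give concrete values.
RESTRICTED_NUM_TRANSACTIONS = 5
THRESHOLD_AMOUNT = 1000.0

def rule_flags_fraud(day: AccountDay,
                     max_txns: int = RESTRICTED_NUM_TRANSACTIONS,
                     amount_limit: float = THRESHOLD_AMOUNT) -> bool:
    """(No. of transactions >= restricted no.) AND (amount > threshold) => fraud."""
    return day.num_transactions >= max_txns and day.total_amount > amount_limit

print(rule_flags_fraud(AccountDay(num_transactions=6, total_amount=1500.0)))  # True
print(rule_flags_fraud(AccountDay(num_transactions=6, total_amount=400.0)))   # False
```

Because both limits are parameters, the same predicate can be reused with account-specific values, which is the sense in which such rules are local to one account.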

However, because these rules are specific to a single account, a set of general rules that can serve as fraud indicators across accounts is required. To generate rules that apply to as many accounts as possible, a rule selection algorithm is used, controlled by two parameters, Trules and Taccts. Trules is a threshold on the number of selected rules required to cover each account, and Taccts is the number of accounts in which a rule must have been found to be selected at all. The list of rules generated from each account is reviewed, and rules that occur in at least Taccts accounts are added to the selection until each account is covered by Trules rules.

At each step, the rule that appears most frequently across the entire set of accounts is chosen first.


(2) CONSTRUCTION OF PROFILING MONITORS


After rules are selected, a set of monitors is built. The purpose of profiling monitors is to measure how sensitive each account is to the general rules. The construction of profiling monitors consists of two stages, a profiling stage and a usage stage. In the profiling stage, a general rule is applied to a portion of an account’s legitimate usage to characterize the account’s normal activity. In other words, the legitimate activity of an account is summarized into profiling monitors through the use of templates, and the statistics describing the account’s normal activity are saved for that account. Later, in the usage stage, the monitor is applied to a whole segment of the account's activity (e.g. one account-month). The resulting statistics can be used to judge whether the account's usage in that month is abnormal.

During this process, the profiling monitors are built by the monitor constructor, which uses a set of templates. The templates are instantiated with the conditions of the rules, and each rule-template combination finally yields a profiling monitor. Templates can use various statistical expressions; two examples are the threshold monitor and the standard deviation monitor. A threshold monitor makes a binary categorization according to whether the user’s behavior on a given day exceeds a threshold established during profiling. A standard deviation monitor instead produces a graded output according to how far the user’s behavior in the monitored period deviates from the normal level for that rule established during profiling.
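The two templates can be sketched as follows under stated assumptions: the profiled quantity is a per-day count of rule-satisfying transactions, the threshold monitor's threshold is the highest value seen during profiling, and the standard deviation monitor reports how many whole standard deviations a day lies above the profiled mean, capped at 5. All function names and these choices are hypothetical; the framework as described here does not fix them.

```python
from statistics import mean, pstdev

def profile(legit_daily_values):
    """Profiling stage: summarize one account's legitimate daily activity
    for a single rule (e.g. number of rule-satisfying transactions per day)."""
    return {
        "threshold": max(legit_daily_values),  # assumption: highest normal day
        "mean": mean(legit_daily_values),
        "std": pstdev(legit_daily_values),
    }

def threshold_monitor(prof, day_value):
    """Binary output: 1 if the day's value exceeds the profiled threshold."""
    return 1 if day_value > prof["threshold"] else 0

def std_dev_monitor(prof, day_value, cap=5):
    """Graded output: whole standard deviations above the profiled mean,
    clipped to the range 0..cap."""
    if prof["std"] == 0:
        return 0 if day_value <= prof["mean"] else cap
    z = (day_value - prof["mean"]) / prof["std"]
    return max(0, min(cap, int(z)))

def monitor_period(monitor, prof, period_values):
    """Usage stage: apply a monitor to every day of a period (e.g. a month)
    and report its strongest output for that period."""
    return max(monitor(prof, v) for v in period_values)

prof = profile([1, 2, 2, 3, 2])   # hypothetical legitimate history
month = [2, 1, 3, 6, 2]           # hypothetical usage period
print(monitor_period(threshold_monitor, prof, month))  # 1
print(monitor_period(std_dev_monitor, prof, month))    # 5
```

Each monitor reduces a month of activity to a single number, so the outputs of all monitors for one account-month can be collected into the result vector used in the next step.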


(3) COMBINATION OF EVIDENCE FROM THE MONITORS


To improve the confidence of the detection, the outputs of the monitors are combined using evidence obtained by applying the monitors to sample data. For example, the generated monitors are applied to a sample account for one month, and their outputs (whether or not each one detected fraudulent activity) are expressed as a result vector for that account-month. This vector is paired with the evidence of whether that account-month truly contained fraud. The evidence combiner then learns a weight for each monitor output, together with a threshold on the weighted sum, so that more confidence is placed on monitors with larger weights and false alarms are reduced. However, some of the resulting monitors may be redundant or ineffective.

To reduce the number of monitors, DC-1 uses a sequential forward selection process. Finally, the fraud detector is formed from the selected monitors combined with the learned evidence weights.
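The evidence-combination step and the sequential forward selection can be sketched together. In this sketch, which is an assumption rather than the paper's exact procedure, the weights and the threshold are learned with the perceptron rule (a linear threshold unit), and forward selection greedily adds the monitor that most improves training accuracy until no addition helps. A real system would score candidates on held-out account-months rather than on the training data.

```python
def predict(w, t, x):
    """Linear threshold unit: fraud (1) if the weighted sum exceeds t."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > t else 0

def train_ltu(vectors, labels, epochs=50, lr=0.1):
    """Learn one weight per monitor and a threshold t with the perceptron rule."""
    w = [0.0] * len(vectors[0])
    t = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            err = y - predict(w, t, x)        # +1, 0 or -1
            if err:
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                t -= lr * err                 # moving the threshold acts as a bias
    return w, t

def accuracy(w, t, vectors, labels):
    return sum(predict(w, t, x) == y for x, y in zip(vectors, labels)) / len(labels)

def forward_select(vectors, labels):
    """Sequential forward selection: repeatedly add the monitor that most
    improves accuracy; stop when no remaining monitor helps."""
    selected, best = [], 0.0
    remaining = list(range(len(vectors[0])))
    while remaining:
        scores = []
        for j in remaining:
            cols = selected + [j]
            sub = [[x[c] for c in cols] for x in vectors]
            w, t = train_ltu(sub, labels)
            scores.append((accuracy(w, t, sub, labels), -j))  # ties: lower index wins
        score, neg_j = max(scores)
        if score <= best:
            break
        selected.append(-neg_j)
        remaining.remove(-neg_j)
        best = score
    return selected, best

# Hypothetical result vectors: one row per account-month, one column per monitor.
vectors = [[1, 0, 1], [0, 1, 0], [1, 1, 1], [0, 0, 0]]
labels = [1, 0, 1, 0]   # 1 = that account-month really contained fraud
print(forward_select(vectors, labels))  # ([0], 1.0)
```

In the example, monitor 2 duplicates monitor 0 and monitor 1 is noise, so forward selection keeps only monitor 0: this is the kind of redundancy the selection step is meant to remove.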


4.3 THE ENHANCED DC-1 ALGORITHM

We have enhanced the DC-1 algorithm by first clustering the data sets using the K-means algorithm and then applying the DC-1 technique. The rules Ra generated in the DC-1 algorithm are also clustered using the K-means algorithm.

Clustering is a popular approach to implementing the partitioning operation. Clustering methods partition a set of objects into clusters such that objects in the same cluster are more similar to each other than objects in different clusters according to some defined criteria. The k-means algorithm is well known for its efficiency in clustering large data sets.


4.3.1 K-means procedure

Given a set of n numeric objects $X = \{X_1, X_2, \ldots, X_n\}$ and an integer $k \le n$, the k-means algorithm searches for a partition of X into k clusters that minimizes the within-group sum of squared errors. This process is often formulated as the following mathematical programming problem P:

$$
\text{Minimize } P(W, Q) = \sum_{l=1}^{k} \sum_{i=1}^{n} w_{i,l}\, d(X_i, Q_l)
$$

subject to

$$
\sum_{l=1}^{k} w_{i,l} = 1, \quad 1 \le i \le n,
$$

$$
w_{i,l} \in \{0, 1\}, \quad 1 \le i \le n, \; 1 \le l \le k,
$$

where $W = [w_{i,l}]$ is an $n \times k$ partition matrix, $Q = \{Q_1, Q_2, \ldots, Q_k\}$ is a set of objects in the same object domain, and $d(\cdot, \cdot)$ is the squared Euclidean distance between two objects.
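Problem P is normally solved approximately by alternating two steps: with the centres Q fixed, choose W by assigning each object to its nearest centre; with W fixed, move each centre Q_l to the mean of the objects assigned to it. The sketch below is this standard Lloyd-style iteration with random initial centres, in plain Python; the feature values are hypothetical. Neither step ever increases P(W, Q), so the loop settles down, though possibly at a local optimum.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance d(a, b)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(X, k, max_iter=100, seed=0):
    """Lloyd-style k-means. Returns (labels, Q, cost), where labels[i] is the
    cluster of X[i] (the nonzero column of row i of W) and cost is P(W, Q)."""
    rng = random.Random(seed)
    Q = [list(p) for p in rng.sample(X, k)]   # initial centres drawn from X
    labels = None
    for _ in range(max_iter):
        # Fix Q, choose W: each object goes to its nearest centre.
        new_labels = [min(range(k), key=lambda c: sq_dist(x, Q[c])) for x in X]
        if new_labels == labels:
            break                             # W unchanged: converged
        labels = new_labels
        # Fix W, update Q: each centre becomes the mean of its members.
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:                       # keep the old centre if a cluster empties
                Q[c] = [sum(col) / len(members) for col in zip(*members)]
    cost = sum(sq_dist(x, Q[lab]) for x, lab in zip(X, labels))
    return labels, Q, cost

# Hypothetical 2-D account features, e.g. (average amount, transactions per day), scaled.
X = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
labels, centres, cost = kmeans(X, k=2)
print(labels, round(cost, 4))
```

Running the procedure several times with different seeds and keeping the lowest-cost result is a common way to reduce the risk of a poor local optimum.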

4.3.2 THE FINAL ENHANCED ALGORITHM


Given:

Accts: set of all accounts obtained after clustering using K-means (i.e., the set Q)

Rules: set of all fraud rules generated from Accts

Trules: (parameter) number of selected rules required to cover each account

Taccts: (parameter) number of accounts in which a rule must have been found

Output:

S: set of selected rules

/* Initialization */
S = {}
for (a ∈ Accts) do Cover[a] = 0
for (r ∈ Rules) do
    Occur[r] = 0          /* number of accounts in which r occurs */
    AcctsGen[r] = {}      /* set of accounts generating r */
end for
/* Set up Occur and AcctsGen */
for (a ∈ Accts) do
    Ra = set of rules generated from a
    for (r ∈ Ra) do
        Occur[r] := Occur[r] + 1
        add a to AcctsGen[r]
    end for
end for
Call the K-means procedure to cluster the rules Ra   /* rules are clustered here using K-means */
/* Cover Accts with Rules */
for (a ∈ Accts) do
    Ra = list of rules generated from a
    sort Ra by Occur in descending order
    while (Cover[a] < Trules and Ra is not empty) do
        r := highest-occurrence rule from Ra
        remove r from Ra
        if (r ∉ S and Occur[r] ≥ Taccts) then
            add r to S
            for (a2 ∈ AcctsGen[r]) do
                Cover[a2] = Cover[a2] + 1
            end for
        end if
    end while
end for
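The covering steps of the rule selection algorithm translate directly into Python. This sketch assumes the accounts have already been clustered and that each account's rules are given as hashable labels; it leaves out the step that clusters the rules with K-means, because the paper does not say how rules are encoded numerically. The inner loop also stops when an account runs out of candidate rules, so it always terminates. `select_rules` and the rule names in the example are hypothetical.

```python
from collections import defaultdict

def select_rules(rules_by_account, t_rules, t_accts):
    """Greedy rule selection: pick frequently occurring rules until every
    account is covered by at least t_rules selected rules (where possible)."""
    # Set up Occur and AcctsGen.
    occur = defaultdict(int)       # rule -> number of accounts generating it
    accts_gen = defaultdict(set)   # rule -> set of accounts generating it
    for acct, rules in rules_by_account.items():
        for r in set(rules):
            occur[r] += 1
            accts_gen[r].add(acct)

    cover = {acct: 0 for acct in rules_by_account}
    selected = []                  # S, in the order rules were chosen
    for acct, rules in rules_by_account.items():
        # Highest occurrence first; ties broken by name for repeatability.
        candidates = sorted(set(rules), key=lambda r: (-occur[r], r))
        while cover[acct] < t_rules and candidates:
            r = candidates.pop(0)
            if r not in selected and occur[r] >= t_accts:
                selected.append(r)
                for a2 in accts_gen[r]:
                    cover[a2] += 1
    return selected

rules_by_account = {  # hypothetical rules learned per account
    "acct1": ["night_high_amount", "many_txns"],
    "acct2": ["night_high_amount", "foreign_merchant"],
    "acct3": ["many_txns", "night_high_amount"],
    "acct4": ["rare_rule"],
}
print(select_rules(rules_by_account, t_rules=1, t_accts=2))  # ['night_high_amount']
print(select_rules(rules_by_account, t_rules=2, t_accts=2))  # ['night_high_amount', 'many_txns']
```

Raising Taccts keeps only rules shared by many accounts (more general, fewer rules), while raising Trules forces more rules into S so that each account is watched by several indicators.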


4.4 ADVANTAGES OF ENHANCED DC-1 ALGORITHM

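    • It is efficient in processing large data sets.

    • Like k-means itself, it often terminates at a local optimum rather than the global one, so it converges quickly, although the clustering is not guaranteed to be the best possible.

    • It works only on numeric values, so transaction attributes must be encoded numerically before clustering.

    • The clusters have convex shapes.


As a result, it is easier to find a set of rules that covers the accounts and distinguishes legitimate from fraudulent transactions.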




5. OUR SUGGESTIONS FOR CONTROLLING ONLINE FRAUD


1. Do Mod10 algorithm testing. Mod10 (also known as the Luhn algorithm) is a checksum that tells you whether the card number being presented could be a valid card number. It does not mean that the number was ever issued or that it belongs to an active account, but it does tell you whether the digits the customer typed in could be in the range of valid credit card numbers issued by the major credit card companies. This should be the first test applied to any credit card number you process: if the number fails Mod10, it will fail every other attempt to authenticate it and process a charge against the card.

2. Obtain an authorization and an AVS check on every transaction. When a merchant processes a credit card transaction, they normally must receive an authorization for the amount of the order. This usually confirms that the card number is valid and that the cardholder has available credit for the amount being requested. The credit card companies also provide AVS (Address Verification Service), which you can use to further verify the validity of the card. AVS compares the billing address provided by the customer (typically the street number and ZIP code) with the address held on file at the issuing bank. While there are numerous reasons why a card may fail AVS (a recent change of address, AVS computers being down, etc.), an AVS failure should be a red flag that needs further investigation.

3. Be extremely wary of orders where the shipping and billing addresses are not the same. If you are in a business that sells items traditionally given as gifts (flowers, for example), this may be difficult, but if most of your customers bill to the same address they ship to, be cautious of orders that are being shipped to a different address.

4. All newly issued credit and debit cards carry a 3-digit non-embossed card verification number (CVV2) on the back of the card. This number is not included in the data contained on the magnetic stripe of the card and is not printed on credit card statements or anywhere else, so asking the customer to provide it helps confirm that they have the physical card in hand.

5. Pay extra attention to orders for amounts greater than the norm or that consist mostly of one type of item. Criminals trying to commit fraud will often place large orders for specific items that they know they can resell easily. For instance, if you sell DVDs and you receive an order for 25 copies of the same title, you should investigate further. Customers who place multiple small orders should draw your attention as well: some criminals know that cautious merchants scrutinize large transactions, so they simply place many smaller orders rather than one large one.

6. Be suspicious of orders placed for rush or expedited delivery. Since criminals are not paying the shipping fees, they normally do not care about the extra cost, and they want the order shipped as quickly as possible: the longer the order sits before shipping, the greater the chance that the fraud will be uncovered.

7. Any order consisting mostly or entirely of high-ticket items should receive extra scrutiny. High-ticket items usually have a high resale value, so they tend to be on the shopping list of many criminals.

8. Be alert to orders that originate from email addresses at free email providers such as yahoo.com and hotmail.com. Many sites simply will not accept orders from email addresses at free email providers.

9. Keep an eye out for orders from multiple accounts or credit card numbers being shipped to the same delivery address. This may indicate a drop box or drop location to which criminals are having orders delivered.

10. Orders being shipped to an international address should earn a closer inspection. Pay particular attention if the card or the shipping address is in an area prone to credit card fraud.

11. Watch for multiple orders placed over a short period of time. Many criminals will attempt to run up a card before the owner finds out or, in the case of a stolen identity, before the first bill arrives.

12. Pick up the phone. If you have any suspicions about an order, call the contact phone number given by the customer and try to confirm the details of the order. If you still do not feel comfortable, call the issuing bank and ask it to confirm the account details.
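Several of these checks are mechanical enough to automate. The sketch below implements the Mod10 (Luhn) check from suggestion 1 and applies it first, then raises flags for suggestions 3, 5, 6, 9 and 11. The `Order` record, the 3x "typical amount" multiplier, the one-hour window and the three-order limit are all hypothetical choices; the flags are meant to route an order to manual review (suggestion 12), not to reject it outright.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

def luhn_valid(number: str) -> bool:
    """Mod10 (Luhn) check: True if the digits could form a valid card number."""
    digits = number.replace(" ", "").replace("-", "")
    if not (digits.isascii() and digits.isdigit()):
        return False
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:          # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

@dataclass
class Order:                    # hypothetical order record
    card_number: str
    billing_address: str
    shipping_address: str
    amount: float
    placed_at: datetime
    expedited: bool = False

def red_flags(order, history, typical_amount, window=timedelta(hours=1), max_recent=3):
    """Return reasons why `order` deserves manual review; an empty list means
    none of these heuristics fired. `history` holds earlier orders."""
    if not luhn_valid(order.card_number):
        return ["fails Mod10"]                                # suggestion 1: test first
    flags = []
    if order.shipping_address != order.billing_address:
        flags.append("shipping differs from billing")         # suggestion 3
    if order.amount > 3 * typical_amount:                     # suggestion 5; 3x is hypothetical
        flags.append("amount well above norm")
    if order.expedited:
        flags.append("expedited delivery")                    # suggestion 6
    recent = [o for o in history
              if o.card_number == order.card_number
              and timedelta(0) <= order.placed_at - o.placed_at <= window]
    if len(recent) >= max_recent:
        flags.append("many orders in a short period")         # suggestion 11
    other_cards = {o.card_number for o in history
                   if o.shipping_address == order.shipping_address}
    other_cards.discard(order.card_number)
    if other_cards:
        flags.append("several cards ship to this address")    # suggestion 9
    return flags

t0 = datetime(2009, 12, 23, 10, 0)
history = [Order("4111 1111 1111 1111", "1 Main St", "9 Drop Rd", 50.0,
                 t0 + timedelta(minutes=m)) for m in (0, 10, 20)]
history.append(Order("79927398713", "2 Oak Ave", "9 Drop Rd", 40.0, t0))
new = Order("4111 1111 1111 1111", "1 Main St", "9 Drop Rd", 500.0,
            t0 + timedelta(minutes=30), expedited=True)
print(red_flags(new, history, typical_amount=100.0))
```

Ordering the checks this way mirrors suggestion 1: a number that fails Mod10 is reported immediately, since none of the other checks can make it valid.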


6. OUR SUGGESTION FOR FRAUD PREVENTION


Once again, we stress that fraud prevention describes measures to stop fraud from occurring in the first place. This can be achieved with biometric techniques such as:

  • Finger prints

  • Iris recognition

  • Facial recognition


Of these, we suggest that iris recognition can be used efficiently in credit card fraud prevention, because the iris code (a binary code) can easily be stored on the credit card and checked by credit card sensing machines fitted with a camera.

When a person stands before the credit card sensing machine, his or her iris is captured by the camera and converted into a binary code, which should match the original code stored on the card. If it does not match, the transaction is refused, so the fraud is stopped before it can occur.
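In practice, two captures of the same iris never yield bit-for-bit identical codes, so iris systems compare codes by the fraction of bits that differ (a fractional Hamming distance) and accept a match below a cut-off. The sketch below assumes 2048-bit (256-byte) codes and a cut-off of 0.32; both values are assumptions, and real matchers also mask out eyelid and reflection bits and try several rotations of the code, which this sketch omits.

```python
def fractional_hamming(code_a: bytes, code_b: bytes) -> float:
    """Fraction of bits that differ between two equal-length iris codes."""
    if len(code_a) != len(code_b):
        raise ValueError("iris codes must have the same length")
    differing = sum(bin(x ^ y).count("1") for x, y in zip(code_a, code_b))
    return differing / (8 * len(code_a))

def iris_matches(stored: bytes, captured: bytes, threshold: float = 0.32) -> bool:
    """Accept if the codes are close enough; 0.32 is an assumed cut-off."""
    return fractional_hamming(stored, captured) <= threshold

stored = bytes(256)                           # hypothetical 2048-bit code stored on the card
same_eye = bytes([0xFF] * 25) + bytes(231)    # 200 of 2048 bits differ (capture noise)
other_eye = bytes([0x0F] * 256)               # half of the bits differ
print(iris_matches(stored, same_eye), iris_matches(stored, other_eye))  # True False
```

Codes from unrelated eyes tend to differ in about half of their bits, which is why a cut-off well below 0.5 separates the cardholder from an impostor.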


7. CONCLUSION

Fraud is a deliberate deception to obtain assets or resources. In the digital world, where speed and anonymity reign, this deception is costly and pervasive. Several criteria can be used when fraud is to be detected. Our algorithm and our suggested steps for online fraud detection satisfy the following criteria:

  • For cost-management, fraud-screening mechanisms should be internal to the processing system, not outsourced to a third-party.

  • Fraud programs must be independently accessible and adaptable to the changing needs of the merchant.

  • For effective fraud screening, the process should be multi-tiered such that there are multiple levels of approval required prior to dispatch for final authorization.

  • Real-time transaction reports should be easily and independently accessible to the merchant.

  • Systems that refer to databases of past and present fraudulent cards and customer information provide tremendous value to merchants and should be part of the program.

The responsibility and risk of fraud are multi-faceted. Merchants, financial institutions and e-payment processors must be vigilant about fraud prevention and must work in tandem to ensure the continued success and growth of this new commerce frontier.




