Apr 08

Matching Credit Card Numbers

Overview

Using regular expressions, you can easily match a credit card number (CCN).  You may wish to validate legitimate CCNs, block financial information in emails, or audit security by finding financial information in documents.  There is no perfect algorithm or regex for detecting potential CCNs, and there will always be some false positives.  Although a regular expression can match the format of a CCN, it cannot tell whether the digits themselves are correct.  If you require a more robust solution, you will also need to implement the Luhn algorithm.  Moving forward, I’ll focus on detecting CCNs in documents or emails.
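
The Luhn check mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not a production validator; the function name `luhn_valid` is a hypothetical choice, and the sketch assumes any spaces or dashes should simply be ignored.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    # Keep only the digits so "4111 1111 1111 1111" and "4111-1111-..." both work.
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False

    total = 0
    # Walk from the rightmost digit; double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0
```

A regex match followed by this checksum rejects most random digit strings that merely look like a card number, since a single mistyped digit changes the checksum.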

Credit Card Info

The first 6 digits of a CCN are known as the Issuer Identification Number (IIN), which identifies the card issuer; the remaining digits can vary in length and are up to the issuer. Using the IIN and the pre-defined CCN length, we can identify blocks of numbers that belong to each issuer.  Credit card numbers are typically grouped with spaces or dashes to make them more readable, and we will need to keep this in mind when matching CCNs.  I have listed US issuers, their IINs, and the CCN lengths below. For a full list including international issuers, see http://en.wikipedia.org/wiki/Bank_card_number.

ISSUER           | IIN STARTING PATTERN             | LENGTH
American Express | 34, 37                           | 15
Diners Club      | 300-305, 309, 36, 38-39          | 15
Discover         | 6011, 622126-622925, 644-649, 65 | 16
JCB              | 3528-3589                        | 16
MasterCard       | 50-55                            | 16
Visa             | 4                                | 16 (13 on old cards)

The Basics: Validating Credit Cards

If we want to validate a credit card, we first want to remove any spaces or dashes from the input.  Once the input is clean, we can use a typical regular expression to match potentially valid CCNs. The regexes below were originally taken from http://www.regular-expressions.info/creditcard.html, but I’ve updated the out-of-date expressions in accordance with the latest IIN changes, as per the Wikipedia article listed above:
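
As a rough illustration of the clean-then-match step (separate from the per-issuer expressions listed next), here is a minimal Python sketch. It covers only Visa and American Express, and the two patterns are assumptions derived from the IIN table above (Visa: starts with 4, 13 or 16 digits; American Express: starts with 34 or 37, 15 digits), not necessarily the exact expressions used in this post. The helper names `clean` and `identify` are hypothetical.

```python
import re

# Assumed patterns based on the IIN table; anchored with ^ and $ because
# the input is expected to be a single, already-isolated card number.
VISA = re.compile(r"^4[0-9]{12}(?:[0-9]{3})?$")
AMEX = re.compile(r"^3[47][0-9]{13}$")


def clean(raw: str) -> str:
    """Remove the spaces and dashes people use to group card digits."""
    return re.sub(r"[ -]", "", raw)


def identify(raw: str):
    """Return the issuer name for a cleaned number, or None if no pattern fits."""
    number = clean(raw)
    if VISA.match(number):
        return "Visa"
    if AMEX.match(number):
        return "American Express"
    return None
```

For example, `identify("4111 1111 1111 1111")` reports Visa, while a short string such as `"1234 5678"` matches neither pattern.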

American Express

 

Diners Club

 

Discover

 

JCB

 

MasterCard

 

Visa

 

However, if you are unable to strip the spaces and dashes out prior to validating the CCN, you’ll quickly find many shortcomings.  The above regexes do not account for the spaces or dashes printed on the front of the card, and they only detect a CCN when it’s the only thing on a line.  Obviously, this will not give us the results we want when finding CCNs in a document or email.  Instead, we will want to use \b to match on a word boundary in place of the caret (^) and the dollar sign ($).  In addition, we want to add exceptions before and after the CCN we’re checking, which helps reduce false positives by eliminating items such as hyperlinks, order numbers, etc.  We can now use the regex below to wrap each CC issuer’s rules that we want to detect:

 

  • Starting Position is a word boundary
  • Previous Character is not:
    • period(.)
    • left angle bracket(<)
    • right angle bracket(>)
    • dash(-)
    • plus(+)
    • forward slash(/) – Ignore false positives like http://www.domain.com/################/
    • open parenthesis(() – Ignore false positives like (################)
    • equals sign(=) – Ignore false positives like http://www.domain.com?si=################
    • pound, colon, space(#: ) – Ignore false positives like Order#: ################
    • dash, space(- )
    • ID, colon, space(ID: )
  • CCN – Represents the regex used to match each issuer’s credit card number
  • Next character is not:
    • forward slash(/) – Ignore false positives like http://www.domain.com/################/
  • Ending position is a word boundary
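
The wrapper rules listed above can be sketched in Python as follows. This is an approximation under stated assumptions, not the exact wrapper from this post: Python's `re` module only supports fixed-width lookbehind, so each multi-character exception gets its own lookbehind, and the Visa body (16 digits with an optional space or dash between groups) is an assumed pattern for illustration. The names `PREFIX`, `SUFFIX`, `VISA_BODY`, and `find_visa` are hypothetical.

```python
import re

# Start at a word boundary, then reject the listed preceding characters.
PREFIX = (
    r"\b"
    r"(?<![.<>\-+/(=])"  # single characters that must not precede the CCN
    r"(?<!#: )"          # e.g. "Order#: ################"
    r"(?<!- )"           # dash followed by a space
    r"(?<!ID: )"         # e.g. "ID: ################"
)
# The next character must not be a forward slash; end at a word boundary.
SUFFIX = r"(?!/)\b"

# Assumed Visa body: 4 groups of 4 digits, optionally split by space or dash.
VISA_BODY = r"4[0-9]{3}[ -]?[0-9]{4}[ -]?[0-9]{4}[ -]?[0-9]{4}"

VISA_IN_TEXT = re.compile(PREFIX + VISA_BODY + SUFFIX)


def find_visa(text: str) -> list:
    """Return every substring of `text` that looks like a Visa CCN."""
    return [m.group(0) for m in VISA_IN_TEXT.finditer(text)]
```

With this sketch, a sentence such as "My card is 4111 1111 1111 1111 thanks" yields one match, while the same digits inside a URL path, after "Order#: ", or inside parentheses yield none.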

Matching Credit Cards in a Document or Email

By slightly rewriting the CC regexes and combining them with the wrapper above, we can easily detect CCNs in a document or email. However, the one-liner for Discover CCNs is quite long, and some systems limit the length of regexes. Because of this, I’ve provided the one-liner plus shorter versions split up by IIN.
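
One plausible contributor to the length of a Discover expression is the 622126-622925 IIN range from the table, since a regex has to spell out a numeric range digit by digit. The Python sketch below shows one way to express that range and checks it by brute force; the pattern and the names `DISCOVER_622_IIN` and `is_discover_622_iin` are assumptions for illustration, not the expressions used in this post.

```python
import re

# 622 followed by 126-925, broken into sub-ranges a regex can express:
#   126-129, 130-199, 200-899, 900-919, 920-925
DISCOVER_622_IIN = re.compile(
    r"^622(?:12[6-9]|1[3-9][0-9]|[2-8][0-9]{2}|9[01][0-9]|92[0-5])$"
)


def is_discover_622_iin(prefix: str) -> bool:
    """Return True if the 6-digit prefix falls in 622126-622925."""
    return bool(DISCOVER_622_IIN.match(prefix))


# Brute-force check: collect every 622xxx prefix the pattern accepts.
accepted = [n for n in range(622000, 623000) if is_discover_622_iin(str(n))]
```

Comparing `accepted` with `range(622126, 622926)` is a quick way to confirm that a hand-written range pattern has no gaps or overlaps.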

American Express

 

Diners Club

 

Discover (One-liner)

 

Discover (6011, 644-649, 65 IINs)

 

Discover (622 IIN No Delimiter)

 

Discover (622 IIN Space Delimiter)

Note: Office 365 has a 128-character limit for regexes.  I have modified the default wrapper by removing a few items to keep this one within 128 characters.

 

Discover (622 IIN Dash Delimiter)

Note: Office 365 has a 128-character limit for regexes.  I have modified the default wrapper by removing a few items to keep this one within 128 characters.

 

JCB

 

MasterCard

 

Visa

If we test the above regex in an online tester, we can verify it’s working as expected. As you can see, our regex captures our test Visa CCNs while skipping many would-be false positives:

[Image: Visa Regex Test]

Real-World Example: Blocking Emails with Credit Card Numbers in Office 365

Now that we’ve established our regexes, let’s apply them to a real-world example: blocking inbound/outbound email in Office 365. Please note that Office 365 transport rules only allow 128 characters in a regex, so we’ll need to use multiple regexes to match Discover.  Also note that two of the Discover regexes above use a modified wrapper to stay within the 128-character limit.
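
Before pasting expressions into a rule, it can help to check each one against the 128-character limit. The Python sketch below does only that; the helper name `patterns_over_limit` is hypothetical, and the two entries in `candidates` are placeholder patterns for illustration, not the expressions from this post.

```python
OFFICE_365_REGEX_LIMIT = 128  # limit stated in the paragraph above


def patterns_over_limit(patterns: dict, limit: int = OFFICE_365_REGEX_LIMIT) -> dict:
    """Return {name: length} for every pattern longer than `limit`."""
    return {
        name: len(pattern)
        for name, pattern in patterns.items()
        if len(pattern) > limit
    }


# Placeholder patterns only: one short, one deliberately too long.
candidates = {
    "short": r"\b4[0-9]{15}\b",
    "long": r"\b(?:" + "|".join(["6[0-9]{15}"] * 20) + r")\b",
}

too_long = patterns_over_limit(candidates)
```

Any name that shows up in `too_long` would need to be split into several shorter expressions, as was done for Discover above.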

  1. Log in to the Office 365 Admin Portal (https://portal.microsoftonline.com)
  2. Click Admin then click Exchange
  3. Under the Exchange Admin Center, click Mail Flow
  4. Click the Rules tab
  5. Click the + to create a new rule
    1. Name: Block Emails with Credit Card Numbers
    2. Apply this rule if: The subject or body matches:
      1. Paste each CCN regex
    3. Do the following:  Reject the message with the explanation
      1. Rejection Reason: Your message was blocked due to the detection of a Credit Card Number
  6. Click Save
    1. Note: Mail flow rules normally take 30-35 minutes to replicate in Office 365