* RFC2822 email address parsing and extraction, some header verification. *
* Copyright (c) 2008 Boxbe, Inc., and Les Hazlewood. See license, below.
*
* @author Les Hazlewood, Casey Connor
*/
package com.boxbe.pub.email;
/*
* Original code Copyright 2008 Les Hazlewood
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import java.util.regex.Pattern;
import java.util.regex.Matcher;
import java.util.ArrayList;
import javax.mail.internet.InternetAddress;
import java.io.UnsupportedEncodingException;
/**
* EmailAddress.java
*
* A utility class to parse, clean up, and extract email addresses from messages * per RFC2822 syntax. Designed to integrate with Javamail (this class will require that you * have a javamail mail.jar in your classpath), but you could easily change * the existing methods around to not use Javamail at all. For example, if you're changing * the code, see the difference between getInternetAddress and getDomain: the latter doesn't * depend on any javamail code. This is all a by-product of what this class was written for, * so feel free to modify it to suit your needs. *
* For real-world addresses, this class is roughly 3-4 times slower than parsing with
* InternetAddress, but it can handle a whole lot more. Because of sensible design
* tradeoffs made in javamail, if InternetAddress has trouble parsing, it might throw an
* exception, but often it will silently leave the entire original string in the result
* of ia.getAddress(). This class can be trusted to return only validated results.
*
* This class has been tested on a few thousand real-world addresses, and is live in * production environments, but you may want to do some of your own testing to ensure * that it works for you. In other words, it's not beta, but it's not guaranteed yet. *
* Comments/Questions/Corrections welcome: java <at> caseyconnor.org *
* Started with code by Les Hazlewood: * leshazlewood.com. *
* Modified/added: removed some functions, added support for CFWS token, * corrected FWSP token, added some boolean flags, added getInternetAddress and * extractHeaderAddresses and other methods, some optimization. *
* Where Mr. Hazlewood's version was more for ensuring certain forms that were passed in during
* registrations, etc, this handles more types of verifying as well as a few forms of extracting
* the data in predictable, cleaned-up chunks.
*
* Note: CFWS means the "comment folded whitespace" token from 2822, in other words, * whitespace and comment text that is enclosed in ()'s. *
* Limitations: doesn't support nested CFWS (comments within (other) comments), doesn't * support mailbox groups except when flat-extracting addresses from headers or when doing * verification, doesn't support * any of the obs-* tokens. Also: the getInternetAddress and * extractHeaderAddresses methods return InternetAddress objects; if the personal name has * any quotes or \'s in it at all, the InternetAddress object will always * escape the name entirely and put it in quotes, so * multiple-token personal names with those characters somewhere in them will always be munged * into one big escaped string. This is not really a big deal at all, but I mention it anyway. * (And you could get around it by a simple modification to those methods to not use * InternetAddress objects.) See the docs of those methods for more info. *
* Note: This does not do any header-length-checking. There are no such limitations on the * email address grammar in 2822, though email headers in general do have length restrictions. * So if the return path * is 40000 unfolded characters long, but otherwise valid under 2822, this class will pass it. *
* Examples of passing (2822-valid) addresses, believe it or not: *
* bob @example.com
*
"bob" @ example.com
*
bob (comment) (other comment) @example.com (personal name)
*
"<bob \" (here) " < (hi there) "bob(the man)smith" (hi) @ (there) example.com (hello) > (again)
*
* (none of which are permitted by javamail, incidentally) *
* By using getInternetAddress(), you can retrieve an InternetAddress object that, when * toString()'ed, would reveal that the parser had converted the above into: *
* <bob@example.com>
*
<bob@example.com>
*
"personal name" <bob@example.com>
*
"<bob \" (here)" <"bob(the man)smith"@example.com>
*
(respectively) *
If parsing headers, however, you'll probably be calling extractHeaderAddresses(). *
* A future improvement may be to use this class to extract info from corrupted * addresses, but for now, it does not permit them. *
* Some of the configuration booleans allow a bit of tweaking
* already. The source code can be compiled with these booleans in various
* states. They are configured to what is probably the most commonly-useful state.
*
* @author Les Hazlewood, Casey Connor
* @version 1.11
*/
public class EmailAddress
{
/**
* This constant changes the behavior of the domain parsing. If true, the parser will
* allow 2822 domains, which include single-level domains (e.g. bob@localhost) as well
* as domain literals, e.g.:
*
someone@[192.168.1.100] or
*
john.doe@[23:33:A2:22:16:1F] or
*
me@[my computer]
The RFC says these are valid email addresses, but most people don't like * allowing them. * If you don't want to allow them, and only want to allow valid domain names * (RFC 1035, x.y.z.com, etc), * and specifically only those with at least two levels ("example.com"), then * change this constant to false. * *
* Its default (compiled) value is false, thus it is not RFC 2822 compliant,
* but you should set it depending on what you need for your application.
*/
public static final boolean ALLOW_DOMAIN_LITERALS = false;
/**
* This constant states that quoted identifiers
* (using quotes and angle brackets around the raw address) are allowed, e.g.:
*
"John Smith" <john.smith@somewhere.com> * *
The RFC says this is a valid mailbox. If you don't want to * allow this, because for example, you only want users to enter in * a raw address (john.smith@somewhere.com - no quotes or angle * brackets), then change this constant to false. * *
* Its default (compiled) value is true to remain RFC 2822 compliant, but
* you should set it depending on what you need for your application.
*/
public static final boolean ALLOW_QUOTED_IDENTIFIERS = true;
/**
* This constant allows "." to appear in atext.
*
* The addresses: *
Kayaks.org <kayaks@kayaks.org> *
Bob K. Smith<bobksmith@bob.net> *
* ...are not valid. They should be: *
"Kayaks.org" <kayaks@kayaks.org> *
"Bob K. Smith" <bobksmith@bob.net> *
* If this boolean is set to false, the parser will act per 2822 and will require
* the quotes; if set to true, it will allow the use of "." without quotes.
* Default (compiled) setting is false.
*/
public static final boolean ALLOW_DOT_IN_ATEXT = false;
/**
* This controls the behavior of getInternetAddress and extractHeaderAddresses. If true,
* it allows the not-totally-kosher-but-happens-in-the-real-world practice of:
*
* <bob@example.com> (Bob Smith) *
* In this case, "Bob Smith" is not technically the personal name, just a
* comment. If this is set to true, the methods will convert this into:
* Bob Smith <bob@example.com>
*
* This also happens somewhat more often and appropriately with *
* mailer-daemon@blah.com (Mail Delivery System) *
* If a personal name appears to the left and CFWS appears to the right of an address, * the methods will favor the personal name to the left. If the methods need to use the * CFWS following the address, they will take the first comment token they find. *
e.g.: *
"bob smith" <bob@example.com> (Bobby)
*
will yield personal name "bob smith"
*
<bob@example.com> (Bobby)
*
will yield personal name "Bobby"
*
bob@example.com (Bobby)
*
will yield personal name "Bobby"
*
bob@example.com (Bob) (Smith)
*
will yield personal name "Bob"
*
* Default (compiled) setting is true.
*/
public static final boolean EXTRACT_CFWS_PERSONAL_NAMES = true;
/**
* This constant allows "[" or "]" to appear in atext. Not very
* useful, maybe, but there it is.
*
* The address: *
[Kayaks] <kayaks@kayaks.org> * ...is not valid. It should be: *
"[Kayaks]" <kayaks@kayaks.org> *
* If this boolean is set to false, the parser will act per 2822 and will require * the quotes; if set to true, it will allow them to be missing. *
* One real-world example seen: *
* Bob Smith [mailto:bsmith@gmail.com]=20 *
* Use at your own risk. There may be some issue with enabling this feature in conjunction
* with ALLOW_DOMAIN_LITERALS, but I haven't looked into that. If ALLOW_DOMAIN_LITERALS
* is false, I think this should be pretty safe. Whether or not it's useful, that's up
* to you. Default (compiled) setting of false.
*/
public static final boolean ALLOW_SQUARE_BRACKETS_IN_ATEXT = false;
/**
* This constant allows ")" or "(" to appear in quoted versions of
* the localpart (they are never allowed in unquoted versions)
*
* The default (2822) behavior is to allow this, i.e. boolean true. *
* You can disallow it, but better to leave it true. I left this hanging around (from an
* earlier incarnation of the code) as a random option you can switch off. No, it's not
* necessarily useful. Long story.
*
* If false, it will prevent such addresses from being valid, even though they are: * "bob(hi)smith"@test.com *
* Default (compiled) setting of true.
*/
public static final boolean ALLOW_PARENS_IN_LOCALPART = true;
/**
* Checks to see if the specified string is a valid
* email address according to the RFC 2822 specification, which is remarkably
* squirrely. See doc for this class: 2822 not fully implemented, but probably close
* enough for almost any needs.
*
* If being used on a 2822 header, this method applies to the Sender and Resent-Sender
* headers only, although you can also use it on the Return-Path if you know it to be
* non-empty (see doc for isValidReturnPath()!). Folded header lines should work OK, but
* I haven't tested that.
*
* @param email the email address string to test for validity (null and "" OK,
* will return false for those)
* @return true if the given email text is valid according to RFC 2822, false otherwise.
*/
public static boolean isValidMailbox(String email)
{
return (email != null) && MAILBOX_PATTERN.matcher(email).matches();
}
/**
* Tells us if the email represents a valid return path header string.
*
* NOTE: legit forms like <(comment here)> will return true. *
* You can check isValidReturnPath(), and
* if it is true, and if getInternetAddress() returns null, you know you have a DSN,
* whether it be an empty return path or one with only CFWS inside the brackets (which is
* legit, as demonstrated above). Note that
* you can also simply call getReturnPathAddress() to have that operation done for you.
*
* Note that <""> is not a valid return-path.
*/
public static boolean isValidReturnPath(String email)
{
return (email != null) && RETURN_PATH_PATTERN.matcher(email).matches();
}
/**
* WARNING: You may want to use getReturnPathAddress() instead if you're
* looking for a clean version of the return path without CFWS, etc. See that
* documentation first!
*
* Pull whatever's inside the angle brackets out, without alteration or cleaning. * This is more secure than a simple substring() since paths like: *
<(my > path) > *
...are legal return-paths and may throw a simpler parser off. However * this method will return all CFWS (comments, whitespace) that may be between * the brackets as well. So the example above will return: *
(my > path)_
(where the _ is the trailing space from the original
* string)
*/
public static String getReturnPathBracketContents(String email)
{
if (email == null) return(null);
Matcher m = RETURN_PATH_PATTERN.matcher(email);
if (m.matches())
return(m.group(1));
else return(null);
}
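/* Illustrative sketch, not part of the original class. The Javadoc above says a plain
substring() can be thrown off by return paths such as <(my > path) >. The standalone class
below contrasts a naive indexOf-based cut with a comment-aware match. ReturnPathSketch,
EMPTY_PATH, naive and commentAware are hypothetical names, and the pattern is a deliberately
simplified assumption (flat comments only, no address inside the brackets); it is not the
RETURN_PATH_PATTERN used by this class.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ReturnPathSketch {
    // Assumption: the brackets hold only whitespace and flat "( ... )" comments.
    private static final Pattern EMPTY_PATH =
        Pattern.compile("\\s*<((?:\\s|\\([^()]*\\))*)>\\s*");

    // Naive approach: cut between the first '<' and the first '>'.
    static String naive(String s) {
        return s.substring(s.indexOf('<') + 1, s.indexOf('>'));
    }

    // Comment-aware approach: a '>' inside a comment does not end the path.
    static String commentAware(String s) {
        Matcher m = EMPTY_PATH.matcher(s);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String path = "<(my > path) >";
        System.out.println("[" + naive(path) + "]");        // prints "[(my ]"
        System.out.println("[" + commentAware(path) + "]"); // prints "[(my > path) ]"
    }
}
```

The naive cut stops at the '>' inside the comment, while the comment-aware match returns
the full bracket contents, trailing space included.
*/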
/**
* Pull out the cleaned-up return path address. May return an empty string.
* Will require two parsings due to an inefficiency.
*
* @return null if there are any syntax issues or other weirdness, otherwise
* the valid, trimmed return path email address without CFWS, surrounding angle brackets,
* with quotes stripped where possible, etc. (may return an empty string).
*/
public static String getReturnPathAddress(String email)
{
if (email == null) return(null);
// inefficient, but there is no parallel grammar tree to extract the return path
// accurately:
if (isValidReturnPath(email))
{
InternetAddress ia = getInternetAddress(email);
if (ia == null) return("");
else return(ia.getAddress());
}
else return(null);
}
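/* Illustrative sketch, not part of the original class. isValidAddressList(), documented
further down, validates a comma-separated list without one giant list regex: it applies the
single-address pattern with lookingAt(), requires a comma after each match, and moves the
matcher region past it. The standalone class below shows that loop in isolation.
ListScanSketch, ITEM and isValidList are hypothetical names, and ITEM is a toy stand-in
for ADDRESS_PATTERN (bare addr-spec with optional surrounding whitespace), so this is only
a sketch of the scanning technique, not of 2822 validation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ListScanSketch {
    // Assumption: one list item is a very simple addr-spec plus surrounding whitespace.
    private static final Pattern ITEM =
        Pattern.compile("\\s*[A-Za-z0-9._-]+@[A-Za-z0-9.-]+\\s*");

    static boolean isValidList(String text) {
        if (text == null) return false;
        Matcher m = ITEM.matcher(text);
        int max = text.length();
        // lookingAt() anchors at the start of the current region but not at its end.
        while (m.lookingAt()) {
            if (m.end() == max) return true;                // the last item consumed the rest
            if (text.charAt(m.end()) != ',') return false;  // items must be comma-separated
            m.region(m.end() + 1, max);                     // continue after the comma
        }
        return false; // no item at the region start (empty input, trailing comma, junk)
    }

    public static void main(String[] args) {
        System.out.println(isValidList("a@b.com, c@d.org")); // prints "true"
        System.out.println(isValidList("a@b.com; c@d.org")); // prints "false"
    }
}
```

Because the item pattern greedily consumes trailing whitespace, the character right after a
match is either the end of input, a comma, or something that makes the list invalid.
*/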
/**
* Tells us if a header line is valid, i.e. checks for a 2822 mailbox-list (which
* could only have one address in it, or might have more.) Applicable to From or
* Resent-From headers only.
*
* This method seems quick enough so far, but I'm not totally * convinced it couldn't be slow given a complicated near-miss string. You may just * want to call extractHeaderAddresses() instead, unless you must confirm that the * format is perfect. I think that in 99.9999% of real-world cases this method will * work fine. *
* @see #isValidAddressList(String)
*/
public static boolean isValidMailboxList(String header_txt)
{
// null-safe, like isValidMailbox()
return (header_txt != null) && MAILBOX_LIST_PATTERN.matcher(header_txt).matches();
}
/**
* Tells us if a header line is valid, i.e. a 2822 address-list (which
* could only have one address in it, or might have more.) Applicable to To, Cc, Bcc,
* Reply-To, Resent-To, Resent-Cc, and Resent-Bcc headers only.
*
* This method seems quick enough so far, but I'm not totally * convinced it couldn't be slow given a complicated near-miss string. You may just * want to call extractHeaderAddresses() instead, unless you must confirm that the * format is perfect. I think that in 99.9999% of real-world cases this method will * work fine and quickly enough. Let me know what your testing reveals. *
* @see #isValidMailboxList(String)
*/
public static boolean isValidAddressList(String header_txt)
{
// creating the actual ADDRESS_LIST_PATTERN string proved too large for java, but
// fortunately we can use this alternative FSM to check. Since the address pattern
// is greedy, it will match all CFWS up to the comma which we can then require easily.
if (header_txt == null) return(false); // null-safe, like isValidMailbox()
boolean valid = false;
Matcher m = ADDRESS_PATTERN.matcher(header_txt);
int max = header_txt.length();
while (m.lookingAt())
{
if (m.end() == max)
{
valid = true;
break;
}
else
{
valid = false;
if (header_txt.charAt(m.end()) == ',')
{
m.region(m.end() + 1, max);
continue;
}
else break;
}
}
return(valid);
// return(ADDRESS_LIST_PATTERN.matcher(header_txt).matches());
}
/**
* Given a 2822-valid single address string, give us an InternetAddress object holding
* that address, otherwise returns null. The email address that comes back from the
* resulting InternetAddress object's getAddress() call will have comments and unnecessary
* quotation marks or whitespace removed.
*
* If your String is an email header, you should probably use * extractHeaderAddresses instead, since most headers can have multiple addresses in them. * (see that method for more info.) This method will indeed fail if you use it on a header * line with more than one address. *
* Exception: You CAN and should use this for the Sender header, and probably you want * to use it for the X-Original-To as well. *
* Another exception: You can use this for the Return-Path, but if you want to know that * a Return-Path is valid and you want to extract * it, you will have to call both this method and isValidReturnPath; this operation can * be done for you by simply calling getReturnPathAddress() instead of this method. In * terms of this method's application to the return-path, note that * the common valid Return-Path value <> will return null. So will the illegitimate * "" or legitimate * empty-string, but other illegitimate Return-Paths like *
"hi" <bob@smith.com> *
will return an address, so the moral is that * you may want to check isValidReturnPath() first, if you care. This method is useful if * you trust the return path and want to extract a clean address from it without CFWS * (getReturnPathBracketContents() will return any CFWS), * or if you want to determine if a validated return path actually contains an address in * it and isn't just empty or full of CFWS. Except for empty return paths (those lacking an * address) the Return-Path specification is a subset * of valid 2822 addresses, so this method will work on all non-empty return-paths, * failing only on the empty ones. *
* In general for this method, note: although this method does not use InternetAddress to
* parse/extract the
* information, it does ensure that InternetAddress can use the results (i.e. that
* there are no encoding issues), but note that an InternetAddress object can hold
* (and use) values for the address which it could not have parsed itself.
* Thus, it's possible that for InternetAddress addr, which came as the result of
* this method, the following may throw an exception or may silently fail:
* InternetAddress[] addrs2 = InternetAddress.parse(addr.toString());
*
* Again, all other uses of that addr object should work OK. It is recommended that if * you are using this class that you never create an InternetAddress object using * InternetAddress's own constructors or parsing methods; rather, retrieve them through * this class. Perhaps the addr.clone() would work OK, though. *
* The personal name will include any and all phrase token(s) to the left of the address, * if they exist, and the string will be trim()'ed, but note that InternetAddress, when * generating the getPersonal() result or the toString() result, if * it encounters any quotes or backslashes in the personal name String, will put the entire * thing in a big quoted-escaped chunk. *
* This will do some smart unescaping to prevent that from happening unnecessarily;
* specifically, if there are unnecessary quotes around a personal name, it will remove
* them. E.g.
*
* "Bob" <bob@hi.com>
*
becomes:
*
Bob <bob@hi.com>
*
* (apologies to bob@hi.com for everything I've done to him)
*/
public static InternetAddress getInternetAddress(String email)
{
if (email == null) return(null);
Matcher m = MAILBOX_PATTERN.matcher(email);
if (m.matches())
return(pullFromGroups(m));
else return(null);
}
/**
* See getInternetAddress; does the same thing but returns the constituent parts
* of the address in a three-element array (or null if the address is invalid).
*
* This may be useful because, even with a cleaned-up address extracted by this class,
* splitting it into these parts is not trivial.
*
* To actually use these values in an email, you should construct an InternetAddress * object (or * equivalent) which can handle the various quoting, adding of the angle brackets * around the address, etc., necessary for presenting the whole address. *
* To construct the email address, you can safely use:
*
result[1] + "@" + result[2]
*
* @return a three-element array containing the personal name String, local part String,
* and the domain part String of the address, in that order, without the @; will return
* null if the address is invalid; if it is valid this will not
* return null but the personal name (at index 0) may be null
*/
public static String[] getAddressParts(String email)
{
if (email == null) return(null);
Matcher m = MAILBOX_PATTERN.matcher(email);
if (m.matches())
return(getMatcherParts(m));
else return(null);
}
/**
* See getInternetAddress; does the same thing but returns the personal name that would
* have been returned from getInternetAddress() in String
* form.
*/
public static String getPersonalName(String email)
{
if (email == null) return(null);
Matcher m = MAILBOX_PATTERN.matcher(email);
if (m.matches())
return(getMatcherParts(m)[0]);
else return(null);
}
/**
* See getInternetAddress; does the same thing but returns the local part that would
* have been returned from getInternetAddress() in String
* form (essentially, the part to the left of the @). This may be useful because
* a simple search/split on a "@" is not a safe way to do this, given
* escaped quoted strings, etc.
*/
public static String getLocalPart(String email)
{
if (email == null) return(null);
Matcher m = MAILBOX_PATTERN.matcher(email);
if (m.matches())
return(getMatcherParts(m)[1]);
else return(null);
}
/**
* See getInternetAddress; does the same thing but returns the domain part in string
* form (essentially, the part to the right of the @). This may be useful because
* a simple search/split on a "@" is not a safe way to do this, given
* escaped quoted strings, etc.
*/
public static String getDomain(String email)
{
if (email == null) return(null);
Matcher m = MAILBOX_PATTERN.matcher(email);
if (m.matches())
return(getMatcherParts(m)[2]);
else return(null);
}
/**
* Given a header, like the From:, extract valid 2822 addresses from it
* and place them in an array.
Returns an empty array if none found, will not return * null. The addresses that come back from the * resulting InternetAddress objects' getAddress calls will have comments and unnecessary * quotation marks or whitespace removed. If a bad address is encountered, parsing stops, * and the good * addresses found up until then (if any) are returned. This is kind of strict * and could be improved, but that's the way it is for now. If you need to know * if the header is totally valid (not just up to a certain address) then you can use * isValidMailboxList() or isValidAddressList() or isValidMailbox(), depending on * the header: *
* This method can handle group addresses, but it does not preserve the group name or
* the structure of any groups; rather it flattens them all into the same array.
* You can call this method on the From or any other header that uses the mailbox-list form
* (which doesn't use groups), or you can call it on the To, Cc, Bcc, or Reply-To or any
* other header which uses the address-list format which might have groups in there.
* This method doesn't enforce any group structure syntax either. If you care to test
* for 2822 validity of a list of addresses (including group format), use the appropriate
* method. This will dependably extract addresses from a valid list. If the list is
* invalid, it may extract them anyway, or it may fail somewhere along the line.
*
* You should not use this method on the Return-Path header; instead use * getInternetAddress() or getReturnPathAddress() (see that doc for info about * Return-Path). However, you could use this on the Sender header if you didn't care * to check it for validity, since single mailboxes are valid subsets of valid * mailbox-lists and address-lists. *
* @param header_txt is text from whatever header. I don't * think the String needs to be unfolded, but i haven't tested that. *
* see getInternetAddress() for more info: this extracts the same way *
* @return zero-length array if errors or none found, otherwise an array of length > 0
* with the addresses as InternetAddresses with the personal name and emails set correctly
* (i.e. doesn't rely on InternetAddress parsing for extraction, but does require that
* the address be usable by InternetAddress, although re-parsing with InternetAddress may
* cause exceptions, see getInternetAddress()); will not return null.
*/
public static InternetAddress[] extractHeaderAddresses(String header_txt)
{
// you may go insane from this code
if (header_txt == null || header_txt.equals("")) return(new InternetAddress[0]);
// optimize: separate method or boolean to indicate if group should be worried about at all
Matcher m = MAILBOX_PATTERN.matcher(header_txt);
Matcher gp = GROUP_PREFIX_PATTERN.matcher(header_txt);
ArrayList
* You could roll your own method that does what you care about.
*
* This should work on the matcher for MAILBOX_LIST_PATTERN or MAILBOX_PATTERN, but
* only those. With some tweaking it could easily be adapted to some others.
*
* May return null on encoding errors.
*
* Also cleans up the address: tries to strip bounding quotes off of the local
* part without damaging its parsability (by this class); if it can, do that; all other
* cases, don't.
*
* e.g. "bob"@example.com becomes bob@example.com
*/
private static InternetAddress pullFromGroups(Matcher m)
{
InternetAddress current_ia = null;
String[] parts = getMatcherParts(m);
if (parts[1] == null || parts[2] == null) return(null);
// if for some reason you want to require that the result be re-parsable by
// InternetAddress, you
// could uncomment the appropriate stuff below, but note that not all the utility
// functions use pullFromGroups; some call getMatcherParts directly.
try
{
//current_ia = new InternetAddress(parts[0] + " <" + parts[1] + "@" +
// parts[2]+ ">", true);
// so it parses it OK, but since javamail doesn't extract too well
// we make sure that the constituent parts
// are correct
current_ia = new InternetAddress();
current_ia.setPersonal(parts[0]);
current_ia.setAddress(parts[1] + "@" + parts[2]);
}
//catch (AddressException ae)
// {
//System.out.println("ex: " + ae);
// current_ia = null;
// }
catch (UnsupportedEncodingException uee)
{
current_ia = null;
}
return(current_ia);
}
/**
* See pullFromGroups
*
* @return will not return null
*/
private static String[] getMatcherParts(Matcher m)
{
String current_localpart = null;
String current_domainpart = null;
String local_part_da = null;
String local_part_qs = null;
String domain_part_da = null;
String domain_part_dl = null;
String personal_string = null;
// see the group-ID lists in the grammar comments
if (ALLOW_QUOTED_IDENTIFIERS)
{
if (ALLOW_DOMAIN_LITERALS)
{
// yes quoted identifiers, yes domain literals
if (m.group(1) != null)
{
// name-addr form
local_part_da = m.group(5);
if (local_part_da == null) local_part_qs = m.group(6);
domain_part_da = m.group(7);
if (domain_part_da == null) domain_part_dl = m.group(8);
current_localpart =
(local_part_da == null ? local_part_qs : local_part_da);
current_domainpart =
(domain_part_da == null ? domain_part_dl : domain_part_da);
personal_string = m.group(2);
if (personal_string == null && EXTRACT_CFWS_PERSONAL_NAMES)
{
personal_string = m.group(9);
personal_string = removeAnyBounding('(', ')',
getFirstComment(personal_string));
}
}
else if (m.group(10) != null)
{
// addr-spec form
local_part_da = m.group(12);
if (local_part_da == null) local_part_qs = m.group(13);
domain_part_da = m.group(14);
if (domain_part_da == null) domain_part_dl = m.group(15);
current_localpart =
(local_part_da == null ? local_part_qs : local_part_da);
current_domainpart =
(domain_part_da == null ? domain_part_dl : domain_part_da);
if (EXTRACT_CFWS_PERSONAL_NAMES)
{
personal_string = m.group(16);
personal_string = removeAnyBounding('(', ')',
getFirstComment(personal_string));
}
}
}
else
{
// yes quoted identifiers, no domain literals
if (m.group(1) != null)
{
// name-addr form
local_part_da = m.group(5);
if (local_part_da == null) local_part_qs = m.group(6);
current_localpart =
(local_part_da == null ? local_part_qs : local_part_da);
current_domainpart = m.group(7);
personal_string = m.group(2);
if (personal_string == null && EXTRACT_CFWS_PERSONAL_NAMES)
{
personal_string = m.group(8);
personal_string = removeAnyBounding('(', ')',
getFirstComment(personal_string));
}
}
else if (m.group(9) != null)
{
// addr-spec form
local_part_da = m.group(11);
if (local_part_da == null) local_part_qs = m.group(12);
current_localpart =
(local_part_da == null ? local_part_qs : local_part_da);
current_domainpart = m.group(13);
if (EXTRACT_CFWS_PERSONAL_NAMES)
{
personal_string = m.group(14);
personal_string = removeAnyBounding('(', ')',
getFirstComment(personal_string));
}
}
}
}
else
{
// no quoted identifiers, yes|no domain literals
local_part_da = m.group(3);
if (local_part_da == null) local_part_qs = m.group(4);
domain_part_da = m.group(5);
if (domain_part_da == null && ALLOW_DOMAIN_LITERALS)
domain_part_dl = m.group(6);
current_localpart = (local_part_da == null ? local_part_qs : local_part_da);
current_domainpart = (domain_part_da == null ? domain_part_dl : domain_part_da);
if (EXTRACT_CFWS_PERSONAL_NAMES)
{
personal_string = m.group((ALLOW_DOMAIN_LITERALS ? 1 : 0) + 6);
personal_string = removeAnyBounding('(', ')',
getFirstComment(personal_string));
}
}
if (current_localpart != null) current_localpart = current_localpart.trim();
if (current_domainpart != null) current_domainpart = current_domainpart.trim();
if (personal_string != null)
{
// trim even though cleanupPersonalString() also trims, because the latter may return
// the same thing back without trimming
personal_string = personal_string.trim();
personal_string = cleanupPersonalString(personal_string);
}
// remove any unnecessary bounding quotes from the localpart:
String test_addr = removeAnyBounding('"', '"', current_localpart) +
"@" + current_domainpart;
if (ADDR_SPEC_PATTERN.matcher(test_addr).matches()) current_localpart =
removeAnyBounding('"', '"', current_localpart);
return(new String[] { personal_string, current_localpart, current_domainpart });
}
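/* Illustrative sketch, not part of the original class. The last step of getMatcherParts()
strips the bounding quotes from the local part only when the unquoted address still matches
ADDR_SPEC_PATTERN. The standalone class below shows that "strip, re-check, else keep" idea.
LocalPartQuoteSketch, ATEXT, SIMPLE_ADDR_SPEC, stripBoundingQuotes and cleanLocalPart are
hypothetical names, and SIMPLE_ADDR_SPEC is a simplified assumption (dot-atom local part,
dotted domain, no CFWS), not the real pattern.

```java
import java.util.regex.Pattern;

final class LocalPartQuoteSketch {
    // Assumption: one or more atext characters form an atom.
    private static final String ATEXT = "[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+";
    private static final Pattern SIMPLE_ADDR_SPEC = Pattern.compile(
        ATEXT + "(?:\\." + ATEXT + ")*@[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)+");

    static String stripBoundingQuotes(String s) {
        if (s == null || s.length() < 2) return s;
        return (s.startsWith("\"") && s.endsWith("\""))
            ? s.substring(1, s.length() - 1) : s;
    }

    // Drop the quotes only if the address is still valid without them.
    static String cleanLocalPart(String localPart, String domain) {
        String candidate = stripBoundingQuotes(localPart);
        boolean stillValid = SIMPLE_ADDR_SPEC.matcher(candidate + "@" + domain).matches();
        return stillValid ? candidate : localPart;
    }

    public static void main(String[] args) {
        System.out.println(cleanLocalPart("\"bob\"", "example.com"));       // prints: bob
        System.out.println(cleanLocalPart("\"bob smith\"", "example.com")); // prints: "bob smith"
    }
}
```

A local part such as "bob(the man)smith" keeps its quotes, because without them the
parentheses would no longer be valid in the unquoted form.
*/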
/**
* Given a string, extract the first matched comment token as defined in 2822, trimmed;
* return null on all errors or non-findings
*
* This is probably not super-useful. Included just in case.
*
* Note for future improvement: if COMMENT_PATTERN could handle nested
* comments, then this should be able to as well, but if this method were to be used to
* find the CFWS personal name (see boolean option) then such a nested comment would
* probably not be the one you were looking for?
*/
public static String getFirstComment(String text)
{
if (text == null) return(null); // important
Matcher m = COMMENT_PATTERN.matcher(text);
if (! m.find()) return(null);
return(m.group().trim()); // trim important
}
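/* Illustrative sketch, not part of the original class. When EXTRACT_CFWS_PERSONAL_NAMES is
true, the personal name can come from the first comment that follows the address, with its
parentheses removed. The standalone class below shows that extraction on its own.
FirstCommentSketch, COMMENT, firstComment and personalNameFromTrailingCfws are hypothetical
names, and COMMENT is a simplified assumption (flat comments with backslash escapes, no
folding whitespace rules), not the COMMENT_PATTERN used by this class.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FirstCommentSketch {
    // Assumption: "(" then non-paren characters or backslash-escaped characters, then ")".
    private static final Pattern COMMENT =
        Pattern.compile("\\((?:[^()\\\\]|\\\\.)*\\)");

    // Returns the first comment token, parentheses included, or null if there is none.
    static String firstComment(String text) {
        if (text == null) return null;
        Matcher m = COMMENT.matcher(text);
        return m.find() ? m.group().trim() : null;
    }

    // Strips the bounding parentheses from the first comment, if one exists.
    static String personalNameFromTrailingCfws(String cfws) {
        String c = firstComment(cfws);
        if (c == null || c.length() < 2) return c;
        return c.substring(1, c.length() - 1);
    }

    public static void main(String[] args) {
        System.out.println(personalNameFromTrailingCfws(" (Bob) (Smith)")); // prints: Bob
    }
}
```

Only the first comment is used, which matches the documented behavior that
bob@example.com (Bob) (Smith) yields the personal name "Bob".
*/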
/**
* Given a string, if the string is a quoted string (without CFWS
* around it, although it will be trimmed) then remove the bounding
* quotations and then unescape it. Useful when passing
* simple named address personal names into InternetAddress since InternetAddress always
* quotes the entire phrase token into one mass; in this simple (and common) case, we
* can strip off the quotes and de-escape, and passing to javamail will result in a cleaner
* quote-free result (if there are no embedded escaped characters) or the proper
* one-level-quoting
* result (if there are embedded escaped characters). If the string is anything else,
* this just returns it unadulterated.
*/
private static String cleanupPersonalString(String text)
{
if (text == null) return(null);
text = text.trim();
Matcher m = QUOTED_STRING_WO_CFWS_PATTERN.matcher(text);
if (! m.matches()) return(text);
text = removeAnyBounding('"', '"', m.group());
text = ESCAPED_BSLASH_PATTERN.matcher(text).replaceAll("\\\\");
text = ESCAPED_QUOTE_PATTERN.matcher(text).replaceAll("\"");
return(text.trim());
}
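/* Illustrative sketch, not part of the original class. cleanupPersonalString() unquotes a
personal name only when the whole string is a single quoted string, then unescapes
backslashes and quotes. The standalone class below mirrors those steps. PersonalNameSketch,
QUOTED, ESC_BSLASH, ESC_QUOTE and cleanup are hypothetical names, and the three patterns are
assumed, simplified definitions; the real QUOTED_STRING_WO_CFWS_PATTERN,
ESCAPED_BSLASH_PATTERN and ESCAPED_QUOTE_PATTERN are not shown in this part of the file.

```java
import java.util.regex.Pattern;

final class PersonalNameSketch {
    // Assumption: a quoted string is quotes around non-quote chars or backslash escapes.
    private static final Pattern QUOTED = Pattern.compile("\"(?:[^\"\\\\]|\\\\.)*\"");
    private static final Pattern ESC_BSLASH = Pattern.compile("\\\\\\\\"); // two backslashes
    private static final Pattern ESC_QUOTE = Pattern.compile("\\\\\"");    // backslash + quote

    static String cleanup(String text) {
        if (text == null) return null;
        text = text.trim();
        if (!QUOTED.matcher(text).matches()) return text; // not one quoted string: leave as is
        text = text.substring(1, text.length() - 1);      // drop the bounding quotes
        text = ESC_BSLASH.matcher(text).replaceAll("\\\\"); // escaped backslash -> backslash
        text = ESC_QUOTE.matcher(text).replaceAll("\"");    // escaped quote -> quote
        return text.trim();
    }

    public static void main(String[] args) {
        System.out.println(cleanup("\"Bob\""));                          // prints: Bob
        System.out.println(cleanup("\"Bob \\\"the man\\\" Smith\""));    // prints: Bob "the man" Smith
    }
}
```

A phrase with several tokens, such as "a" b, is not a single quoted string and is returned
untouched, so InternetAddress still sees the original quoting.
*/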
/**
* If the string starts and ends with s and e, remove them, otherwise return
* the string as it was passed in.
*/
private static String removeAnyBounding(char s, char e, String str)
{
if (str == null || str.length() < 2) return(str);
if (str.startsWith(String.valueOf(s)) && str.endsWith(String.valueOf(e)))
return(str.substring(1, str.length() - 1));
else return(str);
}
/* The current regex string for mailbox token, just for fun:
(((?:(?:(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!-\'\*-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!-\'\*-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?[a-zA-Z0-9\!\#-\'\*\+\-\/\=\?\^-\`\{-\~\.]+(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!-\'\*-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!-\'\*-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?)|(?:(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!-\'\*-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!-\'\*-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?\"(?:(?:(?:[ \t]*\r\n)?[ \t]+)?(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!\#-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F])))*(?:(?:[ \t]*\r\n)?[ \t]+)?\"(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!-\'\*-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!-\'\*-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?))(?:(?:(?:[ \t]*\r\n)?[ \t]+)(?:(?:(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!-\'\*-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ 
\t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!-\'\*-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?[a-zA-Z0-9\!\#-\'\*\+\-\/\=\?\^-\`\{-\~\.]+(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!-\'\*-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!-\'\*-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?)|(?:(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!-\'\*-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!-\'\*-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?\"(?:(?:(?:[ \t]*\r\n)?[ \t]+)?(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!\#-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F])))*(?:(?:[ \t]*\r\n)?[ \t]+)?\"(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!-\'\*-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!-\'\*-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?)))*)??((?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!-\'\*-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!-\'\*-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?<((?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ 
\t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!-\'\*-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!-\'\*-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?([a-zA-Z0-9\!\#-\'\*\+\-\/\=\?\^-\`\{-\~\.]+(?:\.[a-zA-Z0-9\!\#-\'\*\+\-\/\=\?\^-\`\{-\~\.]+)*)(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!-\'\*-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!-\'\*-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?|(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!-\'\*-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!-\'\*-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?(\"(?:(?:(?:[ \t]*\r\n)?[ \t]+)?(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!\#-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F])))*(?:(?:[ \t]*\r\n)?[ \t]+)?\")(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!-\'\*-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!-\'\*-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?)@(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!-\'\*-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ 
\t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!-\'\*-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?([a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*\.[a-zA-Z]{2,6})(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!-\'\*-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!-\'\*-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?>((?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!-\'\*-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!-\'\*-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?))|(((?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!-\'\*-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!-\'\*-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?([a-zA-Z0-9\!\#-\'\*\+\-\/\=\?\^-\`\{-\~\.]+(?:\.[a-zA-Z0-9\!\#-\'\*\+\-\/\=\?\^-\`\{-\~\.]+)*)(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!-\'\*-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!-\'\*-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[\t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?|(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ 
\t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!-\'\*-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!-\'\*-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?(\"(?:(?:(?:[ \t]*\r\n)?[ \t]+)?(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!\#-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F])))*(?:(?:[ \t]*\r\n)?[ \t]+)?\")(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!-\'\*-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!-\'\*-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?)@(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!-\'\*-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!-\'\*-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?([a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*\.[a-zA-Z]{2,6})((?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!-\'\*-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F\!-\'\*-\[\]-\~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?)
*/
}