tcl.lang
Class Regex

java.lang.Object
  extended by tcl.lang.Regex

public class Regex
extends Object

The Regex class can be used to match a TCL-style regular expression against a string and optionally replace the matched parts with new strings. It serves as a gasket between TCL regular expression style and java.util.regex.* style.

Known problems (FIXME):
- Most important: TCL always attempts to match the longest string, starting from the outermost levels and working to the inner levels of parens. With alternation (|), TCL chooses the longest match of all the branches. Java, on the other hand, evaluates the RE from left to right and returns the first successful match, even if it's not the longest. This class follows the Java rules, because there doesn't appear to be a way to influence Matcher's behavior to choose the longest. Probably the real solution is to write a custom regex engine that performs according to TCL rules.
- BREs are not supported (embedded option 'b' causes PatternSyntaxException).
- EREs are not supported, unless they are 'ARE'-compatible and are not explicitly requested with the 'e' embedded option ('e' causes PatternSyntaxException).
- getInfo(), used by 'regexp -about', doesn't provide flag information. However, the test cases in reg.test that use 'regexp -about', and the behavior during 'regexp -about' compile errors, have been adjusted.
- Some syntax errors that would occur in C TCL don't occur here, because Java is more forgiving of bad RE syntax.
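The alternation difference described above can be seen with java.util.regex alone. The following is a minimal sketch, not code from this class; the class name AlternationOrderDemo and the helper firstMatch are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationOrderDemo {
    /** Returns the first match java.util.regex finds, or null if there is none. */
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Java takes the first branch that succeeds, so it stops at "a"
        // even though the second branch would match the longer "ab".
        System.out.println(firstMatch("a|ab", "ab")); // prints "a"

        // Reordering the branches by hand yields the longer match.
        System.out.println(firstMatch("ab|a", "ab")); // prints "ab"
    }
}
```

Under TCL's longest-match rule, the pattern a|ab applied to "ab" matches "ab", so a script that relies on this behavior may see a shorter match when run through this class.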

Version:
1.0, 2009/08/05; 1.1, 2010/05/29
Author:
Radoslaw Szulgo (radoslaw@szulgo.pl), Dan Bodoh (dan.bodoh@gmail.com)
See Also:
Matcher, Pattern

Field Summary
static int TCL_REG_ADVANCED
          AREs (also EREs) flag
static int TCL_REG_ADVF
          advanced features in ARE flag
static int TCL_REG_BASIC
          BRE flag (convenience)
static int TCL_REG_CANMATCH
          report details on partial/limited matches flag
static int TCL_REG_EXPANDED
          Expanded - comments and whitespace flag
static int TCL_REG_EXTENDED
          ERE flag
static int TCL_REG_NEWLINE
          Newlines are line terminators flag
static int TCL_REG_NLANCH
          ^ matches after \n, $ matches before \n flag
static int TCL_REG_NLSTOP
          \n doesn't match . or [^ ] flag
static int TCL_REG_NOCASE
          ignore case flag
static int TCL_REG_NOSUB
          don't care about subexpressions flag
static int TCL_REG_QUOTE
          regex is a literal flag
 
Constructor Summary
Regex(String regexp, String string, int offset)
          Stores params in object and compiles given regexp.
Regex(String regexp, String string, int offset, int flags)
          Stores params in object and compiles given regexp.
Regex(String regexp, String string, int offset, int flags, String xflags)
          Stores params in object and compiles given regexp.
 
Method Summary
protected  Pattern compile(String tclRegex)
          Rewrites a TCL regex into a Java regex and compiles it to a Java Pattern.
 int end()
           
 int end(int group)
           
 int getCount()
           
 TclObject getInfo(Interp interp)
          Returns a list containing information about the regular expression.
protected  int getJavaFlags()
          Convert this.flags to Java's Pattern flags
 int getOffset()
           
static String getPatternSyntaxMessage(PatternSyntaxException ex)
          Returns a regexp pattern syntax error message in the format expected by Tcl, primarily to make TCL tests pass.
 String group()
           
 String group(int group)
          Returns the input subsequence captured by the given group during the previous match operation.
 int groupCount()
           
 boolean match()
          Attempts to match the input string against the regular expression.
protected  int parseFlagString(String optionString, boolean isEmbed)
          Parses an embedded options string or an xflags string and modifies this.flags.
protected static String parseSubSpec(String subSpec)
          Parses the replacement string (subSpec param), which is in Tcl's form.
 String replace(String tclSubSpec, boolean all)
          Replaces the subsequence(s) of the input sequence that match the pattern with the given TCL-style replacement string.
 int start()
           
 int start(int group)
           
protected  void testForUnsupportedFlags()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

TCL_REG_BASIC

public static final int TCL_REG_BASIC
BRE flag (convenience)

See Also:
Constant Field Values

TCL_REG_EXTENDED

public static final int TCL_REG_EXTENDED
ERE flag

See Also:
Constant Field Values

TCL_REG_ADVF

public static final int TCL_REG_ADVF
advanced features in ARE flag

See Also:
Constant Field Values

TCL_REG_ADVANCED

public static final int TCL_REG_ADVANCED
AREs (also EREs) flag

See Also:
Constant Field Values

TCL_REG_QUOTE

public static final int TCL_REG_QUOTE
regex is a literal flag

See Also:
Constant Field Values

TCL_REG_NOCASE

public static final int TCL_REG_NOCASE
ignore case flag

See Also:
Constant Field Values

TCL_REG_NOSUB

public static final int TCL_REG_NOSUB
don't care about subexpressions flag

See Also:
Constant Field Values

TCL_REG_EXPANDED

public static final int TCL_REG_EXPANDED
Expanded - comments and whitespace flag

See Also:
Constant Field Values

TCL_REG_NLSTOP

public static final int TCL_REG_NLSTOP
\n doesn't match . or [^ ] flag

See Also:
Constant Field Values

TCL_REG_NLANCH

public static final int TCL_REG_NLANCH
^ matches after \n, $ matches before \n flag

See Also:
Constant Field Values

TCL_REG_NEWLINE

public static final int TCL_REG_NEWLINE
Newlines are line terminators flag

See Also:
Constant Field Values

TCL_REG_CANMATCH

public static final int TCL_REG_CANMATCH
report details on partial/limited matches flag

See Also:
Constant Field Values
Constructor Detail

Regex

public Regex(String regexp,
             String string,
             int offset,
             int flags)
      throws PatternSyntaxException
Stores params in object and compiles given regexp. The additional param 'flags' is a bitwise-or of Regex.TCL_REG_* flags. Note that 'flags = flags | TCL_REG_ADVANCED' is applied internally prior to any processing of embedded options.

Parameters:
regexp - TCL-style regular expression
string - input string
offset - offset of the input string where matching starts
flags - Regex.TCL_REG_* flags of pattern object that compiles regexp
Throws:
PatternSyntaxException - when there is an error during regexp compilation

Regex

public Regex(String regexp,
             String string,
             int offset)
      throws PatternSyntaxException
Stores params in object and compiles given regexp. Flags are set to TCL_REG_ADVANCED prior to any processing of embedded options.

Parameters:
regexp - TCL-style regular expression
string - input string
offset - offset of the input string where matching starts
Throws:
PatternSyntaxException - when there is an error during regexp compilation

Regex

public Regex(String regexp,
             String string,
             int offset,
             int flags,
             String xflags)
      throws PatternSyntaxException
Stores params in object and compiles given regexp. This constructor is used to support testregexp, which has direct access to flags.

Parameters:
regexp - TCL-style regular expression
string - input string
offset - offset of the input string where matching starts
flags - Regex.TCL_REG_* flags of pattern object that compiles regexp
xflags - Flag string from reg.test (for testregexp)
Throws:
PatternSyntaxException - when there is an error during regexp compilation
Method Detail

match

public boolean match()
Attempts to match the input string against the regular expression. On the first call, it attempts matching starting at the offset specified in the constructor. On subsequent calls, it attempts matching where the previous match succeeded.

Returns:
true if a match is made, false otherwise
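The calling pattern described above (first attempt at an offset, later attempts continuing from the previous match) resembles how java.util.regex.Matcher is commonly driven. The sketch below uses only Matcher and assumes nothing about this class's internals; SuccessiveMatchDemo and matchesFrom are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SuccessiveMatchDemo {
    /** Collects every match found at or after the given offset. */
    static List<String> matchesFrom(String regex, String input, int offset) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        // The first attempt starts at the offset (find(int) resets the matcher).
        boolean ok = m.find(offset);
        while (ok) {
            found.add(m.group());
            // Later attempts continue after the end of the previous match.
            ok = m.find();
        }
        return found;
    }

    public static void main(String[] args) {
        // Offset 2 skips the "1" at index 1.
        System.out.println(matchesFrom("[0-9]+", "a1 b22 c333", 2)); // prints [22, 333]
    }
}
```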

replace

public String replace(String tclSubSpec,
                      boolean all)
Replaces the subsequence(s) of the input sequence that match the pattern with the given TCL-style replacement string.

Parameters:
tclSubSpec - TCL-regsub-style replacement string
all - If true, all matches are replaced. If false, only the first match is replaced
Returns:
The string constructed by replacing the matching subsequence(s) by the replacement string, substituting captured subsequences as needed

getInfo

public TclObject getInfo(Interp interp)
                  throws TclException
Returns a list containing information about the regular expression. The first element of the list is a subexpression count. The second element should be a list of property names that describe various attributes of the regular expression. Currently, that property name list is empty. Primarily intended for debugging purposes.

Parameters:
interp - current Jacl interpreter object
Returns:
A list containing information about the regular expression.
Throws:
TclException

parseSubSpec

protected static String parseSubSpec(String subSpec)
Parses the replacement string (subSpec param), which is in Tcl's form. This method converts Tcl's '&' and '\N', where 'N' is a digit 0-9, to Java's group reference characters. It also quotes any characters that have special meaning to Java's regular expression APIs.

In Java's form, the replacement string may contain references to subsequences captured during the previous match: each occurrence of $g will be replaced by the result of evaluating group(g). The first number after the $ is always treated as part of the group reference. Subsequent numbers are incorporated into g if they would form a legal group reference. Only the numerals '0' through '9' are considered as potential components of the group reference. If the second group matched the string "foo", for example, then passing the replacement string "$2bar" would cause "foobar" to be appended to the string buffer. A dollar sign ($) may be included as a literal in the replacement string by preceding it with a backslash (\$).

This method has no side effects.

Parameters:
subSpec - The replacement string
Returns:
The replacement string in Java's form
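To make the '&' and '\N' conversion concrete, here is a minimal sketch of one way such a translation could be written. It is not the implementation of parseSubSpec; the class SubSpecSketch and the method toJavaReplacement are hypothetical, and the handling of '\&', '\\' and stray backslashes follows the usual regsub conventions as an assumption.

```java
class SubSpecSketch {
    /** Translates a Tcl regsub-style replacement into java.util.regex form (illustrative only). */
    static String toJavaReplacement(String subSpec) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < subSpec.length(); i++) {
            char c = subSpec.charAt(i);
            if (c == '\\' && i + 1 < subSpec.length()) {
                char next = subSpec.charAt(i + 1);
                if (next >= '0' && next <= '9') {
                    out.append('$').append(next);   // \N becomes $N
                    i++;
                } else if (next == '&') {
                    out.append('&');                // \& is a literal ampersand
                    i++;
                } else if (next == '\\') {
                    out.append("\\\\");             // \\ is a literal backslash
                    i++;
                } else {
                    out.append("\\\\");             // stray backslash stays literal
                }
            } else if (c == '\\') {
                out.append("\\\\");                 // trailing backslash stays literal
            } else if (c == '&') {
                out.append("$0");                   // & is the whole match
            } else if (c == '$') {
                out.append("\\$");                  // quote Java's group marker
            } else if (c >= '0' && c <= '9') {
                // Escape literal digits so Java cannot fold them into a
                // preceding group reference ($1 followed by 2 is not $12).
                out.append('\\').append(c);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String java = toJavaReplacement("[\\1]");
        System.out.println(java);                                   // prints [$1]
        System.out.println("hello world".replaceAll("(o)", java));  // prints hell[o] w[o]rld
    }
}
```

Escaping every literal digit is a deliberately blunt choice: it keeps the sketch short while avoiding the ambiguity in Java's multi-digit group references described above.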

groupCount

public int groupCount()
Returns:
the number of capturing groups in the last successful match()
See Also:
Matcher.groupCount()

start

public int start()
Returns:
the index of the first character matched
See Also:
Matcher.start()

start

public int start(int group)
Parameters:
group - The index of a capturing group in this matcher's pattern
Returns:
the start index of the subsequence captured by the given group during the previous match operation.
See Also:
Matcher.start(int)

end

public int end()
Returns:
the index of the last character matched, plus one
See Also:
Matcher.end()

end

public int end(int group)
Parameters:
group - The index of a capturing group in this matcher's pattern
Returns:
The index of the last character captured by the group, plus one, or -1 if the match was successful but the group itself did not match anything
See Also:
Matcher.end(int)

group

public String group()
Returns:
The (possibly empty) subsequence matched by the previous match, in string form.

group

public String group(int group)
Returns the input subsequence captured by the given group during the previous match operation.

Parameters:
group - The index of a capturing group in this matcher's pattern
Returns:
The (possibly empty) subsequence captured by the group during the previous match, or null if the group failed to match part of the input
See Also:
Matcher.group(int)
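The start, end and group accessors above point to their java.util.regex.Matcher counterparts. The sketch below shows what Matcher itself reports for a group that did not take part in a successful match; it assumes, but does not verify, that this class passes those values through unchanged. UnmatchedGroupDemo and matchOnce are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnmatchedGroupDemo {
    /** Runs a single find() and returns the matcher, or throws if nothing matched. */
    static Matcher matchOnce(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            throw new IllegalStateException("no match");
        }
        return m;
    }

    public static void main(String[] args) {
        // Only the second branch matches, so group 1 takes no part in the match.
        Matcher m = matchOnce("(a)|(b)", "b");
        System.out.println(m.groupCount()); // prints 2
        System.out.println(m.group(1));     // prints null
        System.out.println(m.start(1));     // prints -1
        System.out.println(m.end(1));       // prints -1
        System.out.println(m.group(2));     // prints b
        System.out.println(m.end(2));       // prints 1
    }
}
```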

getCount

public int getCount()
Returns:
the count of correctly matched subsequences of the input string

getOffset

public int getOffset()
Returns:
the offset of the input string

getPatternSyntaxMessage

public static String getPatternSyntaxMessage(PatternSyntaxException ex)
Returns a regexp pattern syntax error message in the format expected by Tcl, primarily to make TCL tests pass.

Parameters:
ex - A PatternSyntaxException thrown from the Regex constructor
Returns:
an error message string that looks like TCL's C implementation

getJavaFlags

protected int getJavaFlags()
Convert this.flags to Java's Pattern flags

Returns:
flags for Pattern.compile() from this object's flags member
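As a rough illustration of what such a conversion might involve, the sketch below maps a few flag bits onto Pattern flags. Everything in it is an assumption: the bit values are invented for the example (the real ones are under Constant Field Values), the class FlagMappingSketch is hypothetical, and the actual mapping used by getJavaFlags() is not documented here.

```java
import java.util.regex.Pattern;

class FlagMappingSketch {
    // Hypothetical bit values, chosen for this sketch only.
    static final int NOCASE = 1;
    static final int EXPANDED = 2;
    static final int NLSTOP = 4;
    static final int NLANCH = 8;

    /** Converts the sketch's flag bits into java.util.regex.Pattern flags. */
    static int toJavaFlags(int tclFlags) {
        int javaFlags = 0;
        if ((tclFlags & NOCASE) != 0) {
            javaFlags |= Pattern.CASE_INSENSITIVE;
        }
        if ((tclFlags & EXPANDED) != 0) {
            javaFlags |= Pattern.COMMENTS;   // whitespace and comments in the pattern
        }
        if ((tclFlags & NLANCH) != 0) {
            javaFlags |= Pattern.MULTILINE;  // ^ and $ match at line boundaries
        }
        if ((tclFlags & NLSTOP) == 0) {
            // In TCL "." matches \n unless NLSTOP is set; Java needs DOTALL for that.
            javaFlags |= Pattern.DOTALL;
        }
        return javaFlags;
    }

    public static void main(String[] args) {
        Pattern dotMatchesNewline = Pattern.compile("a.b", toJavaFlags(0));
        Pattern dotStopsAtNewline = Pattern.compile("a.b", toJavaFlags(NLSTOP));
        System.out.println(dotMatchesNewline.matcher("a\nb").matches()); // prints true
        System.out.println(dotStopsAtNewline.matcher("a\nb").matches()); // prints false
    }
}
```

Note that Pattern flags alone cannot express the "[^ ]" half of TCL_REG_NLSTOP; that part would need the pattern text itself to be rewritten.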

parseFlagString

protected int parseFlagString(String optionString,
                              boolean isEmbed)
                       throws PatternSyntaxException
Parses an embedded options string or an xflags string and modifies this.flags.

Parameters:
optionString - contains embedded options or xflags
isEmbed - if true, string will be ignored if it does not start with (?, and anything after ')' will be ignored.
Returns:
index+1 of ')' or last character of string
Throws:
PatternSyntaxException - if flag string contains unknown or unsupported flags

testForUnsupportedFlags

protected void testForUnsupportedFlags()
                                throws PatternSyntaxException
Throws:
PatternSyntaxException - if a flag is unsupported

compile

protected Pattern compile(String tclRegex)
                   throws PatternSyntaxException
Rewrites a TCL regex into a Java regex and compiles it to a Java Pattern. There are some known problems; see the class description at the top of this page.

Parameters:
tclRegex - The TCL regular expression to be compiled
Returns:
a Pattern object containing the compiled regex
Throws:
PatternSyntaxException - if the expression's syntax is invalid or unsupported
See Also:
Regex


Copyright © 2015. All rights reserved.