What is Fast Check?
Regular expression searches provide flexibility and power but can be fairly costly to evaluate. To maximize performance, GINA performs a rudimentary parse of the regular expression pattern to find runs of characters that will always appear in any text the pattern matches, which I will refer to as "digests". If one or more digests of 8 or more characters are found, GINA picks the longest of them and performs a standard "string contains" comparison against each line of text using the digest before testing the regular expression. Only if the digest comparison succeeds is the regular expression actually tested.
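The two-stage test described above can be sketched in a few lines of Python. This is only an illustration of the idea, not GINA's code: the function name `fast_check_match`, the sample lines, and the digest value are all assumptions made for the example.

```python
import re

def fast_check_match(line, digest, pattern):
    """Return a match object, or None, testing the cheap digest first.

    `digest` is a literal substring assumed to appear in every line the
    regex can match; `pattern` is a compiled regular expression.
    """
    # Stage 1: a plain substring test rejects most lines cheaply.
    if digest is not None and digest not in line:
        return None
    # Stage 2: only lines that contain the digest pay for the regex test.
    return pattern.search(line)

# Python spells named groups (?P<s>...) where the post's regex uses (?<s>...).
pattern = re.compile(r"(Reht|Gore) tells you '(?P<s>.+) \d\d\d and the like'")
digest = " and the"  # hypothetical digest for this pattern

lines = [
    "Gore tells you 'apples 123 and the like'",  # digest hit, regex match
    "You have entered the Plane of Knowledge.",  # rejected by the digest
    "Reht says, 'salt and the pepper'",          # digest hit, regex miss
]
results = [fast_check_match(line, digest, pattern) for line in lines]
```

The third line shows why the regex is still needed: the digest only narrows the pool of candidates, it does not decide the match.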
The difference between a simple string-contains test and a regular expression test is significant -- usually 1 to 3 orders of magnitude or more depending on how complex the regular expression is. On my computer, the string-contains test takes 1-2 microseconds, while regular expression tests take 30-1000 microseconds depending on complexity (and I don't use a lot of super complex triggers). Given that the majority of lines running through the parser are not going to match a trigger, it is much faster to check thousands of lines at 1 microsecond rather than 100 microseconds, and then spend the 100 microseconds on the much smaller pool of candidates that passed the string-contains check.
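To put rough numbers on that, here is a back-of-the-envelope calculation in Python. The 1 and 100 microsecond costs come from the paragraph above; the 10,000 lines and 100 surviving candidates are assumed figures chosen only for illustration, and `total_cost_us` is a made-up helper.

```python
def total_cost_us(n_lines, n_candidates, contains_us, regex_us, fast_check=True):
    """Estimated microseconds to test one trigger against n_lines lines."""
    if not fast_check:
        # Every line pays the full regex cost.
        return n_lines * regex_us
    # Every line pays the substring cost; only candidates pay the regex cost.
    return n_lines * contains_us + n_candidates * regex_us

# Assumed workload: 10,000 lines, of which 100 contain the digest.
without_fc = total_cost_us(10_000, 100, 1, 100, fast_check=False)
with_fc = total_cost_us(10_000, 100, 1, 100)

print(without_fc)  # 1000000 microseconds (1 second)
print(with_fc)     # 20000 microseconds (0.02 seconds)
```

Under these assumed figures the fast check is about 50 times cheaper. The gain shrinks as more lines contain the digest.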
To get a feel for the performance difference with fast check enabled and disabled, use the performance tab in GINA, paying particular attention to the yellow "Unmatched Average microseconds" figure. (Note that the first time a trigger is evaluated after loading GINA or making changes to the trigger, GINA compiles data for that trigger, so you should hit the "Reset Statistics" button first so that the initial compile time for the trigger does not skew the results.)
Determining the Digest
As I noted in the first paragraph, the algorithm GINA uses to derive a digest is very rudimentary. First, it ignores anything in parentheses and brackets, replacing each such group with an empty set of parentheses. Then it replaces any backslash and the character following it with "+". Next, it does a regex search on what is left to find consecutive runs of word characters (\w), commas, and colons; these runs are the potential digests. Finally, it takes the longest of these digests having 8 or more characters and picks the first 8 characters of that digest to be the final digest used when matching the trigger. If there are no digests of 8 or more characters, GINA will not attempt the Fast Check for this trigger.
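The steps above can be sketched in Python as follows. This is a reconstruction from the description, not GINA's actual code, and `derive_digest` and `MIN_DIGEST_LEN` are made-up names. One assumption goes beyond the description: spaces are allowed inside a run, because the example digests in this post contain spaces. Nested parentheses are not handled.

```python
import re

MIN_DIGEST_LEN = 8  # threshold stated in the post

def derive_digest(regex_text):
    """Return an 8-character digest for regex_text, or None if none qualifies."""
    # Step 1: replace each (...) or [...] group with an empty "()".
    stripped = re.sub(r"\([^()]*\)|\[[^\]]*\]", "()", regex_text)
    # Step 2: replace each backslash plus the following character with "+".
    stripped = re.sub(r"\\.", "+", stripped)
    # Step 3: collect runs of word characters, commas, and colons.
    # Assumption: spaces are also part of a run.
    runs = re.findall(r"[\w,: ]+", stripped)
    # Step 4: keep runs of 8 or more characters and take the longest.
    candidates = [run for run in runs if len(run) >= MIN_DIGEST_LEN]
    if not candidates:
        return None  # no Fast Check for this trigger
    longest = max(candidates, key=len)
    # The final digest is the first 8 characters of the longest run.
    return longest[:MIN_DIGEST_LEN]

# The regular expression form of the example trigger used in this post.
digest = derive_digest(r"(Reht|Gore) tells you '(?<s>.+) \d\d\d and the like'")
print(repr(digest))  # ' and the'
```

A pattern with no run of at least 8 characters, such as `You win`, returns `None`, which corresponds to GINA skipping the Fast Check for that trigger.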
Let's use an example:
Search Text:
(Reht|Gore) tells you '{S} \d\d\d and the like'
After conversion to a regular expression:
(Reht|Gore) tells you '(?<s>.+) \d\d\d and the like'
After replacing anything in parentheses or brackets with empty parentheses:
() tells you '() \d\d\d and the like'
After replacing each backslash and the character following it with "+":
() tells you '() +++ and the like'
Potential digests:
" tells you " and " and the like"
Tweaking the Fast Check Digest
Of course, because of the very basic algorithm being used to derive a digest, there will be times when an inefficient digest may be selected. For example, given the trigger text:
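Legabozo glares awkwardly at {S}
Because anything in parentheses is ignored when the digest is derived, part of the text can be wrapped in parentheses to keep it out of the digest:
(Legabozo glares )awkwardly at {S}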