New Beta Feature: Fast Check

Post by **Gimagukk** » Mon Aug 17, 2015 10:37 pm

There is a new checkbox in the trigger editor called "Use Fast Check", which enhances performance for regular expression triggers as detailed below. Technically this is not a new feature, as GINA has always done this behind the scenes, but this new version offers visibility into the string GINA is picking for fast checks and allows you to disable the feature for a specific trigger. Hovering over the checkbox will show you what text GINA has determined should be used for the trigger for each character. If you disable the Fast Check box, or if GINA cannot derive a digest (blanks show in the tooltip for characters), GINA will not attempt to do a Fast Check for the trigger, and will always perform the more costly regular expression check. You should only disable Fast Check if GINA's derived digest test fails but you know the regular expression test should be succeeding.

What is Fast Check?
Regular expression searches provide flexibility and power but can be fairly costly to evaluate. To maximize performance, GINA performs a rudimentary parse on the regular expression pattern to determine if there are runs of characters in the pattern that will always be found in successfully searched text, which I will refer to as "digests". If one or more digests of 8 or more characters are found, GINA picks the longest of them and performs a standard "string contains" comparison against a line of text using the digest instead of testing the regular expression right away. Only if the digest comparison succeeds is the regular expression actually tested.
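To make the two-stage idea concrete, here is a minimal sketch in Python. It is not GINA's code: the `make_trigger` helper is hypothetical, the comparison is assumed to be case-sensitive, and the named group is written as `(?P<s>...)` because that is Python's spelling of `(?<s>...)`.

```python
import re

def make_trigger(pattern, digest=None):
    """Return a matcher that tries a cheap substring test before the regex."""
    compiled = re.compile(pattern)

    def matches(line):
        # Fast Check: if the digest is missing from the line, the regex
        # cannot match, so the expensive evaluation is skipped.
        if digest is not None and digest not in line:
            return None
        # Only lines containing the digest reach the regular expression.
        return compiled.search(line)

    return matches

# Hypothetical trigger using the digest " and the".
check = make_trigger(
    r"(Reht|Gore) tells you, '(?P<s>.+) \d\d\d and the like'",
    digest=" and the",
)
```

Calling `check(line)` returns a match object for a matching line and `None` for everything else. Passing `digest=None` mimics a trigger with Fast Check disabled, where every line goes straight to the regular expression.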

The difference between a simple string-contains test and a regular expression test is significant -- usually 1 to 3 orders of magnitude or more depending on how complex the regular expression is. On my computer, the string-contains test takes 1-2 microseconds, while regular expression tests take 30-1000 microseconds depending on complexity (and I don't use a lot of super complex triggers). Given that the majority of lines running through the parser are not going to match a trigger, it is much faster to check thousands of lines at 1 microsecond rather than 100 microseconds, and then spend the 100 microseconds on the much smaller pool of candidates that passed the string-contains check.
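A back-of-the-envelope calculation shows the scale of the saving. The 1 and 100 microsecond costs come from the figures above; the line count and the number of candidates are made-up numbers for illustration, and `estimated_cost_us` is a hypothetical helper.

```python
def estimated_cost_us(lines, candidates, contains_us=1, regex_us=100):
    """Estimate microseconds spent testing `lines` log lines against one trigger.

    `candidates` is how many of those lines contain the digest and therefore
    still need the regular expression test.
    """
    without_fast_check = lines * regex_us
    # Every line pays for the substring test; only candidates pay for the regex.
    with_fast_check = lines * contains_us + candidates * regex_us
    return without_fast_check, with_fast_check

# Illustrative numbers: 10,000 lines, 100 of which contain the digest.
slow, fast = estimated_cost_us(lines=10_000, candidates=100)
print(slow, fast)
```

With these assumed numbers the trigger costs 1,000,000 microseconds without the fast check and 20,000 with it, a 50x reduction. The saving shrinks as more lines contain the digest.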

To get a feel for the performance difference with Fast Check enabled and disabled, use the performance tab in GINA, paying particular attention to the yellow "Unmatched Average microseconds". (Note that the first time a trigger is evaluated after loading GINA or making changes to the trigger, GINA compiles data for that trigger, so you should hit the "Reset Statistics" button so that the initial compile time for the trigger does not skew the results.)

Determining the Digest
As I noted above, the algorithm GINA uses to derive a digest is very rudimentary. First, it ignores anything in parentheses and brackets, replacing each with an empty set of parentheses. Then it replaces each backslash and the character following it with "+". Next, it does a regex search on what is left to find consecutive runs of word characters (\w), spaces, commas, and colons, which form the list of potential digests. Finally, it takes the longest of these digests having 8 or more characters and uses the first 8 characters of that digest as the final digest when matching the trigger. If no digest is 8 or more characters long, GINA will not attempt a Fast Check for this trigger.

Let's use an example:
Search Text:

Code:

(Reht|Gore) tells you, '{S} \d\d\d and the like'

First, GINA expands the regex shortcut {S} to produce a valid regular expression:

Code:

(Reht|Gore) tells you, '(?<s>.+) \d\d\d and the like'

Next, GINA replaces parenthetical data with an empty set:

Code:

() tells you, '() \d\d\d and the like'

Next, backslashed characters are replaced:

Code:

() tells you, '() +++ and the like'

Next, GINA looks for possible digests, and finds:

Code:

" tells you, " and " and the like"

Both digests are over 8 characters, so GINA picks the longest one (" and the like") and then picks the first 8 characters (" and the") as the final digest.
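For anyone who wants to experiment with patterns outside of GINA, the steps above can be approximated in a few lines of Python. This is a sketch based only on the description in this post, not GINA's actual code. The `derive_digest` name is made up, only the {S} shortcut is expanded, nested or escaped parentheses are not handled, and spaces are assumed to count as digest characters because the example digests contain them.

```python
import re

DIGEST_LENGTH = 8

def derive_digest(trigger_text):
    """Approximate the digest derivation; returns None if no run qualifies."""
    # Expand the {S} shortcut (the only shortcut shown in this post).
    text = trigger_text.replace("{S}", "(?<s>.+)")
    # Replace parenthesized and bracketed spans with "()". Nested groups and
    # escaped parentheses are not handled; the post does not describe them.
    text = re.sub(r"\([^()]*\)|\[[^\]]*\]", "()", text)
    # Replace each backslash plus the character after it with "+".
    text = re.sub(r"\\.", "+", text)
    # Candidate digests: runs of word characters, commas, colons, and
    # (assumed from the examples) spaces.
    candidates = re.findall(r"[\w ,:]+", text)
    eligible = [c for c in candidates if len(c) >= DIGEST_LENGTH]
    if not eligible:
        return None
    # max() keeps the first of equally long runs; GINA's tie-break is unknown.
    return max(eligible, key=len)[:DIGEST_LENGTH]

print(repr(derive_digest(r"(Reht|Gore) tells you, '{S} \d\d\d and the like'")))
```

For the example trigger this prints ' and the', the same digest as the walkthrough. A pattern with no qualifying run of 8 or more characters returns `None`, which corresponds to GINA skipping the Fast Check.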

Tweaking the Fast Check Digest
Of course, because of the very basic algorithm being used to derive a digest, there will be times when an inefficient digest may be selected. For example, given the trigger text:

Code:

Legabozo glares awkwardly at {S}

GINA is going to decide to use "Legabozo" (the first 8 characters of the longest run) as the digest. If you are on a raid fighting Legabozo, the majority of the lines passing through the parser are going to pass the Fast Check ("Reht hits Legabozo for 1 points of damage.", "You swing at Legabozo but miss!", etc.) and then test (and fail) against the more costly regular expression. A much better digest would have been "awkwardl", since very few lines will have that text. To force GINA to use this as the digest, we just need to put parentheses around the literal text preceding it, since GINA ignores parenthetical data when determining the digest:

Code:

(Legabozo glares )awkwardly at {S}
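A quick way to see why the second digest is better is to count how many lines would get past the substring test with each one. The sketch below is only an illustration: the first two sample lines are the ones quoted above, the other two are invented, and `regex_candidates` is a hypothetical helper.

```python
sample_lines = [
    "Reht hits Legabozo for 1 points of damage.",
    "You swing at Legabozo but miss!",
    "Legabozo glares awkwardly at Reht",   # invented line that should match
    "Gore tells you, 'nice pull'",         # invented unrelated line
]

def regex_candidates(digest, lines):
    """Return the lines that pass the substring test and still need the regex."""
    return [line for line in lines if digest in line]

for digest in ("Legabozo", "awkwardl"):
    # Fewer candidates means fewer costly regular expression evaluations.
    print(repr(digest), len(regex_candidates(digest, sample_lines)))
```

With these four lines, three pass the check for "Legabozo" but only one passes for "awkwardl", so the regular expression runs a third as often while the trigger still fires on the same line.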

Post by **Deloehne** » Fri Apr 01, 2016 1:40 pm

It appears to me that Fast Check is only needed if the string for comparison contains a regex expression. Is this correct?
