v/programming: I suck at regex... Can someone please explain why only example C is working here?

1 11 Jul 2019 09:01 by u/argosciv

Goal: Detect letters in a char array regardless of upper/lower case, and only perform operations on a char if it is not a letter.

Problem: Uppercase letters are not caught, except in Example C.

Example A:

foreach ( char x in chararray )
{
    string str = x.ToString();
    Regex rgx = new Regex( @"^(?-i)[a-zA-Z]$", RegexOptions.IgnoreCase );
    if ( !rgx.IsMatch( str ) )
    {
        //operations here
    }
}

Example B:

foreach ( char x in chararray )
{
    string str = x.ToString();
    Regex rgx = new Regex( @"^[a-zA-Z]$", RegexOptions.IgnoreCase );
    if ( !rgx.IsMatch( str.ToLower() ) )
    {
        //operations here
    }
}

Example C:

foreach ( char x in chararray )
{
    string str = x.ToString().ToLower();
    Regex rgx = new Regex( @"^[a-zA-Z]$" );
    if ( !rgx.IsMatch( str ) )
    {
        //operations here
    }
}

17 comments

0 u/neogag 11 Jul 2019 09:04

I don't have time to look into this now, but know that there is a /i regex flag that means case-insensitive, which might be what you want.

0 u/argosciv [OP] 11 Jul 2019 09:08

Tried (?-i) in example A. Is /i proper syntax?

0 u/lemon11 11 Jul 2019 09:28

I don't know C#, but if its engine is PCRE, the syntax for flags inside the pattern should be like (?i)pattern or (?i:pattern).
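For what it's worth, .NET's regex engine is not PCRE, but it does accept these inline forms: `(?i)` turns case-insensitivity on from that point, `(?i:...)` scopes it to one group, and `(?-i)` turns it off. That last one is what Example A uses, so there it cancels `RegexOptions.IgnoreCase` rather than enabling anything. A minimal sketch to check this, using an arbitrary test character:

```csharp
using System;
using System.Text.RegularExpressions;

// Inline option forms accepted by .NET's Regex engine, tested against 'Q'.
bool inlineOn = Regex.IsMatch("Q", "^(?i)[a-z]$");   // (?i): case-insensitive for the rest of the pattern
bool scoped = Regex.IsMatch("Q", "^(?i:[a-z])$");    // (?i:...): case-insensitive only inside the group
bool noFlag = Regex.IsMatch("Q", "^[a-z]$");         // no flag: the lowercase-only class rejects 'Q'
bool turnedOff = Regex.IsMatch("Q", "^(?-i)[a-z]$", RegexOptions.IgnoreCase); // (?-i) cancels IgnoreCase

Console.WriteLine($"{inlineOn} {scoped} {noFlag} {turnedOff}"); // True True False False
```

Example A still matches uppercase only because its class spells out `A-Z` explicitly.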

0 u/argosciv [OP] 11 Jul 2019 09:36

I'd already tried a few (?i)/(?-i) variations, but not the second form from your comment. Just tried that in a few variations too; no dice.

0 u/folgeyharry 11 Jul 2019 09:14

put a * after [a-zA-Z], otherwise you're only going to match 1-character strings
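A minimal sketch, with arbitrary sample strings, of how a quantifier changes a whole-string check. Note that `*` also accepts the empty string, so `+` is the usual choice when at least one letter is required; for single-character inputs neither is needed.

```csharp
using System;
using System.Text.RegularExpressions;

// Whole-string letter checks with a quantifier.
var star = new Regex("^[a-zA-Z]*$"); // zero or more letters: also accepts ""
var plus = new Regex("^[a-zA-Z]+$"); // one or more letters: rejects ""

Console.WriteLine(star.IsMatch("abc")); // True
Console.WriteLine(star.IsMatch(""));    // True
Console.WriteLine(plus.IsMatch(""));    // False
Console.WriteLine(plus.IsMatch("ab1")); // False
```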

0 u/argosciv [OP] 11 Jul 2019 09:15

They are indeed only 1-character strings. The original string was converted to a char array, which is then iterated through.

0 u/folgeyharry 11 Jul 2019 09:43

I tried all 3 using Mono on Linux and they all worked fine. Also, why use regex at all? You can just compare x to 'a', etc. I included a 4th example in the pastebin showing the complete program.

https://pastebin.com/k1HQR5hu
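The pastebin contents aren't reproduced here. As an independent minimal sketch (the sample input is made up), this runs the three patterns from the question side by side and collects the non-letter characters each one would pass to the operations:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Patterns and preprocessing copied from Examples A, B and C in the question.
// Each helper returns true when the character counts as a letter.
var rgxA = new Regex(@"^(?-i)[a-zA-Z]$", RegexOptions.IgnoreCase);
var rgxB = new Regex(@"^[a-zA-Z]$", RegexOptions.IgnoreCase);
var rgxC = new Regex(@"^[a-zA-Z]$");

bool IsLetterA(char x) => rgxA.IsMatch(x.ToString());
bool IsLetterB(char x) => rgxB.IsMatch(x.ToString().ToLower());
bool IsLetterC(char x) => rgxC.IsMatch(x.ToString().ToLower());

// The characters each version would run "operations" on, i.e. the non-letters.
string NonLetters(string input, Func<char, bool> isLetter) =>
    new string(input.Where(c => !isLetter(c)).ToArray());

const string sample = "Hello, World 42!"; // hypothetical input

Console.WriteLine($"A: [{NonLetters(sample, IsLetterA)}]");
Console.WriteLine($"B: [{NonLetters(sample, IsLetterB)}]");
Console.WriteLine($"C: [{NonLetters(sample, IsLetterC)}]");
```

All three lines should print `[,  42!]`, consistent with folgeyharry's result: none of the patterns skips uppercase letters.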

0 u/argosciv [OP] 11 Jul 2019 09:48

Interesting approach there in your 4th example, hadn't considered that.

I'll see if I get acceptable results using your code in a minute. Thanks for taking the time.

0 u/argosciv [OP] 11 Jul 2019 10:04

Okay, what in the actual fuck is this black magick...

It's true, all of the examples in your code work.

0 u/argosciv [OP] 11 Jul 2019 10:06

Oh god fucking dammit, I just figured out where I was actually going wrong and it has nothing at all to do with the regex. Fuck my fucking life.

0 u/psimonster 11 Jul 2019 13:34

Instead of running a match inside a loop, create a new string by regex-replacing letters with nothing, then run the operation on each character in the new string.
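A minimal sketch of that suggestion, assuming (as in the question's code) that the operations apply to the non-letter characters; the sample input is made up:

```csharp
using System;
using System.Text.RegularExpressions;

// Strip every ASCII letter in one pass, then loop over what is left.
string input = "Hello, World 42!"; // hypothetical user input
string nonLetters = Regex.Replace(input, "[a-zA-Z]", "");

foreach (char c in nonLetters)
{
    // operations here: only non-letters reach this point
    Console.WriteLine($"'{c}'");
}
```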

0 u/HoneyTrap1488 12 Jul 2019 01:18

Using regex for a simple pattern you know at compile time is a lousy way to do it. Checking whether a char is an ASCII letter can be done in 3 instructions in C (https://godbolt.org/z/HMvLLC); using regex would take hundreds.
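The linked Compiler Explorer snippet isn't reproduced here. One well-known branch-light way to write the check, shown in C# to match the question, folds case by setting bit 0x20 and then does a single unsigned range comparison. This is a common ASCII trick, not necessarily what the link contains:

```csharp
using System;

// ASCII letter test without regex or string allocation.
// 'A'..'Z' (0x41..0x5A) differ from 'a'..'z' (0x61..0x7A) only in bit 0x20,
// so OR-ing in 0x20 maps uppercase onto lowercase. Subtracting 'a' and comparing
// as unsigned folds both bounds checks into one, because values below 'a'
// wrap around to very large unsigned numbers.
static bool IsAsciiLetter(char c) => (uint)((c | 0x20) - 'a') <= 'z' - 'a';

foreach (char c in "aZ@[`{5")
    Console.WriteLine($"{c}: {IsAsciiLetter(c)}");
```

The characters `@ [ ` { ` sit right next to the letter ranges, so they are the useful edge cases to check.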

0 u/ELS_BrigadeWarning 28 Aug 2019 14:23

and even then there's the convenience function isalpha() in C, and its rough equivalent Char.IsLetter() in C#.

0 u/berne 09 Aug 2019 18:52

So? What were you actually trying to do, and how did you solve it? :-)

0 u/argosciv [OP] 09 Aug 2019 23:11

Was trying to split a user-inputted string into its characters and run ops on each character (if said character is a letter).

Turns out the regex was perfectly fine; I just had a logic error further down that at first looked like a regex problem. Once I fixed that error, the regex worked as expected.

The error was that I wasn't converting uppercase letters to lowercase before checking them against a dictionary that only contains lowercase letters. That made me think the regex was skipping uppercase letters, but it wasn't; I just wasn't using them correctly. It's also why Example C 'worked': it lowercases the character at that particular point (albeit somewhere I didn't want to do so).
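A minimal sketch of that fix, using a made-up dictionary since the thread never shows the real one: keep the letter test as it was and lowercase only at the point of the lookup.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the OP's dictionary: lowercase keys only.
// (The real dictionary's contents and value type are not shown in the thread.)
var table = new Dictionary<char, int>();
for (char ch = 'a'; ch <= 'z'; ch++)
    table[ch] = ch - 'a' + 1; // 'a' -> 1 ... 'z' -> 26

// Lowercase at the point of lookup, so 'C' and 'c' hit the same key.
int? Lookup(char c) =>
    table.TryGetValue(char.ToLowerInvariant(c), out int value) ? value : (int?)null;

Console.WriteLine(Lookup('c'));         // 3
Console.WriteLine(Lookup('C'));         // 3; without ToLowerInvariant this would be a miss
Console.WriteLine(Lookup('!') is null); // True
```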

0 u/berne 10 Aug 2019 10:21

Okay. I was just curious, because it looked so overly elaborate just for detecting letters. I figured you had a reason beyond the purely utilitarian to use regexes, e.g. that the input char array was beyond your control, or "performance considerations".

There is an IsLetter(...) method in the Char class that can detect whether a character is a Unicode 'letter', but that includes non-English letters. The simplest (and fastest) way to detect whether a char is an English letter would be simply "if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) ...". This avoids all the overhead of string conversions and regexes, and will also use the processor's cache and speculative execution efficiently, if that is ever a consideration.
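To make the difference concrete, a minimal sketch comparing that range check with `char.IsLetter` (the sample characters are arbitrary):

```csharp
using System;

// Range check from the comment above: ASCII (English) letters only.
static bool IsEnglishLetter(char c) =>
    (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');

// Sample: 'a', 'Z', 'é' (U+00E9), 'Ω' (U+03A9), '1', '_'.
// char.IsLetter follows Unicode categories, so accented and Greek letters count too.
foreach (char c in "aZ\u00E9\u03A91_")
    Console.WriteLine($"{c}: range={IsEnglishLetter(c)}, IsLetter={char.IsLetter(c)}");
```

Which one is "right" depends on whether non-English letters should be treated as letters for the operations.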

Regexes can be very efficient, but mostly for complex syntaxes or longer strings; rolling your own parser in such cases is likely a waste of time.

Good luck!

0 u/ELS_BrigadeWarning 28 Aug 2019 14:20

Don't use Regex for that shit, use Char.IsLetter()