I suck at regex... Can someone please explain why only example C is working here?

1    11 Jul 2019 09:01 by u/argosciv

Goal: Catch only letters found in a char array, regardless of upper/lower case; only perform operations if the char is not a letter.

Problem: Does not catch uppercase except in example C.

Example A:

foreach ( char x in chararray )
{
    string str = x.ToString();
    Regex rgx = new Regex( @"^(?-i)[a-zA-Z]$", RegexOptions.IgnoreCase );
    if ( !rgx.IsMatch( str ) )
    {
        //operations here
    }
}

Example B:

foreach ( char x in chararray )
{
    string str = x.ToString();
    Regex rgx = new Regex( @"^[a-zA-Z]$", RegexOptions.IgnoreCase );
    if ( !rgx.IsMatch( str.ToLower() ) )
    {
        //operations here
    }
}

Example C:

foreach ( char x in chararray )
{
    string str = x.ToString().ToLower();
    Regex rgx = new Regex( @"^[a-zA-Z]$" );
    if ( !rgx.IsMatch( str ) )
    {
        //operations here
    }
}

17 comments

0

I don't have time to look into this now, but know that there is a /i regex flag that means case-insensitive, which might be what you want.

0

Tried (?-i) in example A. Is /i proper syntax?

0

I don't know C#, but if its engine is PCRE, the syntax for flags inside the pattern should be like (?i)pattern or (?i:pattern).
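For reference, .NET ships its own regex engine rather than PCRE, but it accepts the same inline-option syntax, and C# has no /pattern/i delimiters at all: the flag goes either in the pattern or in RegexOptions. Below is a minimal sketch of the variants mentioned so far; the class name InlineFlagDemo and the helper Matches are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InlineFlagDemo
{
    // Hypothetical helper: true if the input matches the pattern under the given options.
    public static bool Matches(string pattern, string input, RegexOptions options = RegexOptions.None)
    {
        return new Regex(pattern, options).IsMatch(input);
    }

    public static void Main()
    {
        // Option passed to the constructor: [a-z] also matches "Q".
        Console.WriteLine(Matches(@"^[a-z]$", "Q", RegexOptions.IgnoreCase));          // True

        // Inline (?i) switches case-insensitivity on from that point onward.
        Console.WriteLine(Matches(@"^(?i)[a-z]$", "Q"));                               // True

        // Scoped form: only the group's contents are case-insensitive.
        Console.WriteLine(Matches(@"^(?i:[a-z])$", "Q"));                              // True

        // (?-i) switches it OFF, cancelling RegexOptions.IgnoreCase.
        Console.WriteLine(Matches(@"^(?-i)[a-z]$", "Q", RegexOptions.IgnoreCase));     // False

        // Example A's pattern still matches, because the class lists both cases itself.
        Console.WriteLine(Matches(@"^(?-i)[a-zA-Z]$", "Q", RegexOptions.IgnoreCase));  // True
    }
}
```

So (?-i) is valid syntax, but it does the opposite of what Example A seems to want; it only goes unnoticed there because [a-zA-Z] already covers uppercase.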

0

I've tried a few (?i)/(?-i) variations, except the second one from your comment. Just tried that in a few variations, no dice.

0

Put a * after [a-zA-Z]; otherwise you're only going to match 1-character strings.

0

They are indeed only 1 character strings. Original string was converted to a CharArray, to then be iterated through.

0

I tried all 3 using mono on linux and they all worked fine. Also, why use regex at all? You can just compare x to 'a', etc. I included a 4th example in the pastebin showing the complete program.

https://pastebin.com/k1HQR5hu

0

Interesting approach there in your 4th example, hadn't considered that.

I'll see if I get acceptable results using your code in a minute. Thanks for taking the time.

0

Okay, what in the actual fuck is this black magick...

It's true, all of the examples in your code work.

0

Oh god fucking damnit, I just figured out where I was actually going wrong and it has nothing at all to do with the regex. Fuck my fucking life.

0

Instead of running a match inside a loop, create a new string by regex-replacing the letters with nothing, then run the operation on each character in the new string.
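A rough sketch of that suggestion, assuming the operations are meant for the non-letter characters as in the original examples. The class name StripLettersDemo, the helper RemoveLetters, and the sample input are invented here; Regex.Replace is the standard .NET call.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StripLettersDemo
{
    // Removes every ASCII letter, leaving only the characters to operate on.
    public static string RemoveLetters(string input)
    {
        return Regex.Replace(input, "[a-zA-Z]", "");
    }

    public static void Main()
    {
        // Sample input is an assumption; the original post does not show one.
        string remaining = RemoveLetters("Hi, 42!");

        foreach (char x in remaining)
        {
            // operations here (placeholder: print the code point)
            Console.WriteLine((int)x);
        }
    }
}
```

This runs the regex once per string instead of once per character, at the cost of losing each character's original position in the input.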

0

Using regex for a simple pattern you know at compile-time is a lousy way to do it. Checking if a char is an ASCII letter can be done in 3 instructions in C: https://godbolt.org/z/HMvLLC . Using regex would be hundreds.

0

And even that has a convenience function: isalpha() in C, and its rough equivalent Char.IsLetter() in C#.

0

So? What were you actually trying to do, and how did you solve it? :-)

0

Was trying to split a user-inputted string into its characters and run ops on each character (if said character is a letter).

Turns out the regex was perfectly fine, I just had a logical error further down which at first appeared like a regex problem. Fixing the logical error showed that the regex was working perfectly fine.

The error being that I wasn't converting uppercase letters to lowercase letters when checking them against a dictionary which only has lowercase letters in it. This made me think that the regex was skipping over uppercase letters, but it wasn't, I just wasn't using them correctly (hence also why Example C 'worked' due to lowercasing the character at that particular point -- albeit that I didn't want to do so there).
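For anyone hitting the same thing, here is a minimal sketch of that kind of fix. The dictionary contents, the class name LookupDemo, and the helper TryLookup are stand-ins, since the real code isn't shown; the point is only that the key gets lowercased at lookup time rather than earlier.

```csharp
using System;
using System.Collections.Generic;

public static class LookupDemo
{
    // Hypothetical stand-in for a dictionary that only has lowercase keys.
    private static readonly Dictionary<char, int> Table = new Dictionary<char, int>
    {
        { 'a', 1 }, { 'b', 2 }, { 'c', 3 }
    };

    // Normalizes the key to lowercase before the lookup, so 'B' finds 'b'.
    public static bool TryLookup(char c, out int value)
    {
        return Table.TryGetValue(char.ToLowerInvariant(c), out value);
    }

    public static void Main()
    {
        foreach (char c in "aB?")
        {
            if (TryLookup(c, out int value))
                Console.WriteLine($"{c} -> {value}");
            else
                Console.WriteLine($"{c} not in dictionary");
        }
    }
}
```

The character itself stays untouched, so its original case is still available for whatever happens after the lookup.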

0

Okay. I was just curious, because it looked so overly elaborate just for detecting letters. I figured you had a reason beyond the utilitarian for using regexes, e.g. that the input char array was beyond your control, or "performance considerations".

There is an IsLetter(...) method in the Char class that can detect if a character is a Unicode 'letter', but that includes non-English letters. The simplest (and fastest) way to detect if a char is an English letter would be simply "if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) ...". This will not have any of the overhead that string conversions and regexes have, and will also use the processor's cache and speculative execution efficiently, if that is ever a consideration.
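To make the difference concrete, here is a small sketch comparing the two checks. The class name LetterCheckDemo and the helper IsEnglishLetter are illustrative names; char.IsLetter is the real framework method.

```csharp
using System;

public static class LetterCheckDemo
{
    // Plain range comparison: true only for a-z and A-Z.
    public static bool IsEnglishLetter(char c)
    {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    public static void Main()
    {
        // '\u00E9' is e with an acute accent: a Unicode letter, but not an English one.
        char[] samples = { 'a', 'Z', '\u00E9', '5' };

        foreach (char c in samples)
        {
            Console.WriteLine($"{c}: IsLetter={char.IsLetter(c)}, IsEnglishLetter={IsEnglishLetter(c)}");
        }
    }
}
```

Which one is right depends on whether accented or non-Latin letters should count as letters for the operation in question.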

Using regexes can be very efficient but mostly for complex syntaxes or longer 'strings'. Rolling your 'own' parser in such a case is likely a waste of time.

Good luck!

0

Don't use Regex for that shit, use Char.IsLetter()