You're addressing a very, very real pain-point. I am a -rabid- fan of using regexps, and have sometimes written -massive- REs. They end up with duplication, and their size alone makes them daunting to read and understand a few months after they're written.
Notwithstanding, they're still (to me) far, far better than the equivalent loop-nests. But what would be great is a "little language" where you could write your regexp in stages, e.g.
slc = [-@:%._\\+~#=]
tlc = [()]
pc = [()@:%_\\+.~#?&/=]
"http(s)?://(www.)?" slc{1,256} "." tlc{1,6} bow pc*
where I'm clearly wildly abusing the notation and syntax, and what I've written isn't from any known RE dialect. But what I mean is,
(a) named sub-expressions
(b) ability to combine named subexpressions in follow-on subexps and the final expression
(c) a bit of an expression-language for combining the sub-expressions.
Maybe for that last line, instead
http(s)?://(www.)?{slc}{1,256}.{tlc}{1,6}{bow}{pc}*
where now '{' and '}' become special chars and enclose subexpression names. But in the first version, whitespace was not part of the regexp, so regexps that contain whitespace probably need to be quoted, so that it can be distinguished from whitespace-for-formatting-and-readability.
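For what it's worth, the {name} idea can be roughly approximated today with a few lines of Python on top of the standard `re` module. This is only a sketch under my own assumptions: the `expand` helper, the fragment names (`label`, `tld`, `host`, `path`) and the character classes are all made up for illustration and are not the ones from the notation above. It relies on the fact that a quantifier like {1,63} starts with a digit, so it can't be confused with a name.

```python
import re

# Hypothetical syntax: {name} refers to a named fragment.
# Quantifiers such as {1,63} start with a digit, so they are left untouched.
NAME_REF = re.compile(r"\{([A-Za-z_]\w*)\}")

def expand(pattern, defs, _active=()):
    """Replace each {name} in pattern with its definition from defs.

    Definitions may refer to other definitions. Each expansion is wrapped
    in a non-capturing group so a following quantifier applies to the
    whole fragment.
    """
    def substitute(match):
        name = match.group(1)
        if name in _active:
            raise ValueError(f"circular reference to {name!r}")
        if name not in defs:
            raise KeyError(name)
        return "(?:" + expand(defs[name], defs, _active + (name,)) + ")"
    return NAME_REF.sub(substitute, pattern)

# Illustrative fragments only; deliberately simplistic, not a real URL grammar.
DEFS = {
    "label": r"[a-z0-9-]",
    "tld": r"[a-z]",
    "host": r"{label}{1,63}(?:\.{label}{1,63})*",
    "path": r"[\w./~%+#?&=-]",
}

URL = re.compile(expand(r"https?://(?:www\.)?{host}\.{tld}{2,6}(?:/{path}*)?", DEFS))
```

Then `URL.fullmatch("https://www.example.com/a/b?x=1")` gives a match and `URL.fullmatch("ftp://example.com")` gives `None`. The obvious gaps: an escaped brace like `\{name\}` is not handled, and nothing here addresses the whitespace question (Python's `re.VERBOSE` flag is the closest existing thing to whitespace-for-readability).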
OK, I'm just babbling here, but really what I want to say is, this is a real problem that hinders usability for regexps and I'm excited to see that you're working on addressing it.