Asked by PeterBaileyUk
at 2024-07-12 22:45:48
Point:500 Replies:15 POST_ID:828969 USER_ID:11868
Topic: Regular Expressions
How can I instruct a regular expression to ignore the first word and then start its search for a pattern?
Expert: Terry Woods replied at 2024-07-23 19:02:34
Are you doing a replace with the replacement string being something like "$1"? We may need to change that as part of the solution. In fact, if you weren't doing a replace with "$1" as the replacement string, then try doing that as it may solve the problem :-)
Author: PeterBaileyUk replied at 2024-07-23 01:42:21
I have a pattern which evolved around one manufacturer and model.
Ignoring the very first word is fundamental, because the vehicle itself may be named with digits, like the Peugeot 207, while the description also carries the BHP. The BHP can be two or three digits long (e.g. 90 or 109), and the word BHP may appear as a separate word or attached to the number, as in 110bhp.
I wanted to try to create a less complex pattern that was not vehicle specific.
StrPattern = "^\s*(\S+(?=\s).*?)(?:([^\d.])\d+$|\(?\d* ?bhp?\d*\)?|(\d+)|\[\d{2,3}\]|\d{3})"
I have discovered a boundary: it must find values between 39 and 302 inclusive, but not pick up, for example, 3020 or 390.
Skipping the first word is paramount. The pattern must also find the values if they are at the end of the string.
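The requirements above (skip the first word, accept only 39 to 302, reject 3020 and 390, allow a match at the end of the string) can be sketched in one pattern. This is only an illustration written in Python rather than VBA; the names BHP_RE and find_bhp are made up for the example. It avoids lookbehind, which VBScript.RegExp does not offer, and relies on lookahead and a lazy quantifier, which I am assuming behave the same way in that engine (not tested in Access).

```python
import re

# ^\s*\S+(?=\s)   the whole first word, which must be followed by whitespace
# .*?[^\d.]       skip ahead lazily; the character just before the number
#                 must not be a digit or a dot (so "1.60" or "3020" cannot
#                 supply a partial number)
# (39|[4-9]\d|[12]\d\d|30[0-2])   integers from 39 to 302 inclusive
# (?!\d|\.\d)     the number must not continue with a digit or a decimal part
BHP_RE = re.compile(
    r'^\s*\S+(?=\s).*?[^\d.](39|[4-9]\d|[12]\d\d|30[0-2])(?!\d|\.\d)'
)

def find_bhp(description):
    """Return the first in-range number after the first word, or None."""
    m = BHP_RE.search(description)
    return int(m.group(1)) if m else None

print(find_bhp("207 HATCHBACK XR 1.6 VTi [120] 5dr Auto"))  # 120
print(find_bhp("207 HATCHBACK XR 1.6 110bhp"))              # 110
print(find_bhp("207 HATCHBACK 3020"))                       # None
```

The range is spelled out digit by digit because a regular expression cannot compare numbers; if the limits change, the alternation has to be rewritten by hand.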
Expert: Terry Woods replied at 2024-07-22 18:47:14
Peter, are you able to please post the code you've currently got? Any existing capturing groups (and their replacement) may need to be adjusted once we provide a change to your existing pattern.
"I have noticed that the first word is never preceded by a space so is that how some of these work?"
If the pattern you're talking about, with regard to that comment, is the one starting with ^\s* then the \s* part of the pattern can match zero occurrences of a whitespace character. To match one or more occurrences, rather than zero or more, you would use \s+ instead of \s*
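A small check of the zero-or-more versus one-or-more difference, sketched in Python purely for illustration (the sample strings are made up; the quantifiers themselves are standard and should behave the same in VBScript.RegExp):

```python
import re

no_leading_space = "207 HATCHBACK"
leading_space = "  207 HATCHBACK"

# \s* accepts zero whitespace characters, so a first word at column 0 matches.
star = re.match(r'^\s*\w+', no_leading_space)
print(star.group(0))                      # 207

# \s+ demands at least one whitespace character, so the same string fails...
plus_fail = re.match(r'^\s+\w+', no_leading_space)
print(plus_fail)                          # None

# ...and only a string that really starts with whitespace matches.
plus_ok = re.match(r'^\s+\w+', leading_space)
print(repr(plus_ok.group(0)))             # '  207'
```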
The pattern ^\s*\w+\s*(\d+)
is matching the first word, but capturing the second one (well, at least it is if the second word is only made up of numbers, as it's \d+ rather than \w+). As I mentioned above, we need to know what's happening with the capturing groups to determine what happens from there on. I suspect this pattern isn't one that will work for you, as it only matches when the second "word" is made up of numbers.
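One thing worth checking with ^\s*\w+\s*(\d+): because \w+ is allowed to backtrack and \s* may match nothing, the pattern can still succeed inside a first word that ends in a digit, which would explain the first word being highlighted on the posted test data. The sketch below uses Python only to show the behaviour; the second and third sample strings are invented for the comparison.

```python
import re

pattern = re.compile(r'^\s*\w+\s*(\d+)')

# Second word is not numeric: \w+ gives back the "7" and (\d+) captures it,
# so the overall match is just the first word.
m1 = pattern.search("207 HATCHBACK XR 1.6 VTi [120] 5dr Auto")
print(m1.group(0), m1.group(1))   # 207 7

# Second word is numeric: the capture is the second word, as intended.
m2 = pattern.search("207 120 5dr Auto")
print(m2.group(0), m2.group(1))   # 207 120 120

# First word has no digits and second word is not numeric: no match at all.
m3 = pattern.search("PEUGEOT HATCHBACK 120")
print(m3)                         # None
```

Requiring whitespace between the two parts (\s+ instead of \s*) stops the capture from landing inside the first word.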
Author: PeterBaileyUk replied at 2024-07-21 23:42:10
I am working in VBA in Access, using the VBScript.RegExp object.
Assisted Solution
Expert: kaufmed replied at 2024-07-14 12:01:55
125 points EXCELLENT
What programming language or text editor are you using to execute this? If we don't know what engine you use, then we could be offering bad advice. For example, what duncanb7 suggested is a PERL-based expression. It would not work (the way you think it would) in a .NET regex.
Author: PeterBaileyUk replied at 2024-07-14 00:23:48
I have noticed that the first word is never preceded by a space so is that how some of these work?
Author: PeterBaileyUk replied at 2024-07-14 00:18:44
I put a simple pattern in with respect to ID: 40192739:
~ s/^\S+\s*//\d+ (I've added the \d+, just a simple pattern)
Test data: 207 HATCHBACK XR 1.6 VTi [120] 5dr Auto. I had hoped the 120 would be highlighted after ignoring the first word, the 207.
I put a simple pattern in with respect to ID: 40192765: ^\s*\w+\s*(\d+)
Test data: 207 HATCHBACK XR 1.6 VTi [120] 5dr Auto. That highlights the first word; it should have ignored that and found the 120.
I wanted to ignore the first word as the manufacturers have named their models with integers, e.g. 207.
Would it be simpler to reverse the search and start from the end?
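On the idea of searching from the end: one way to approximate it, without needing a regex engine that scans backwards, is to collect every number after the first word and walk the list in reverse. This is a rough Python sketch, not VBA; last_bhp_candidate is a made-up name and the 39 to 302 limits are taken from the boundary mentioned elsewhere in this thread.

```python
import re

def last_bhp_candidate(description):
    """Return the last whole number in 39..302 found after the first word."""
    parts = description.split(None, 1)      # [first word, rest of the text]
    if len(parts) < 2:
        return None                         # nothing after the first word
    # Decimals such as 1.6 are collected as one token so they can be skipped.
    numbers = re.findall(r'\d+(?:\.\d+)?', parts[1])
    for token in reversed(numbers):
        if '.' not in token and 39 <= int(token) <= 302:
            return int(token)
    return None

print(last_bhp_candidate("207 HATCHBACK XR 1.6 VTi [120] 5dr Auto"))  # 120
```

Whether the last candidate or the first one is the right choice depends on the data; with trailing items such as 5dr the range check is what keeps the 5 from being picked.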
Author: PeterBaileyUk replied at 2024-07-13 23:28:50
OK, I will take a look. I've been involved in a project recently involving 750,000 word groups and have asked a lot of the experts. I am widening the data sets for testing and am now trying to create shorter patterns that cover most of what I need, as it's not possible to find a pattern that does all that I need.
It's become evident that the first word has more weight in the description.
Assisted Solution
Expert: Dan Craciun replied at 2024-07-13 00:49:34
125 points EXCELLENT
Depending on your Regexp engine, this usually works:
.*? will match the first word (sequence of letters, numbers or _)
or
\s*\w+\s* will match some (optional) whitespace characters, 1 or more word characters, then some more optional whitespace characters.
HTH,
Dan
Assisted Solution
Expert: duncanb7 replied at 2024-07-12 23:44:43
125 points EXCELLENT
You could try this: remove the first word of each line first, and then search for the pattern.
~ s/^\S+\s*//
s/.../.../   # Substitute command.
^            # (Zero-width) Begin of line.
\S+          # Non-space characters.
\s*          # Blank-space characters.
//           # Substitute with nothing, so remove them.
I hope I have understood your question completely. If not, please point it out.
Duncan
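The two-step idea (strip the first word, then search what is left) can be sketched as follows. Python is used here only for illustration; strip_first_word is a hypothetical helper name, and the bracketed pattern in the second search is an assumption based on the [120] in the test data posted in this thread.

```python
import re

def strip_first_word(text):
    """Remove the leading run of non-space characters and the spaces after it."""
    return re.sub(r'^\S+\s*', '', text, count=1)

rest = strip_first_word("207 HATCHBACK XR 1.6 VTi [120] 5dr Auto")
print(rest)                       # HATCHBACK XR 1.6 VTi [120] 5dr Auto

# A bare \d+ on the remainder stops at the first digit it meets (the 1 of 1.6).
first_digits = re.search(r'\d+', rest).group(0)
print(first_digits)               # 1

# A more specific second pattern is needed to reach the bracketed figure.
bracketed = re.search(r'\[(\d{2,3})\]', rest).group(1)
print(bracketed)                  # 120
```

The second search therefore matters as much as the stripping step: removing the first word only gets the model number out of the way.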