29 June 2008

Notepad++: A guide to using regular expressions and extended search mode

The information in this post details how to clean up DMDX .zil files, allowing for easy importing into Excel. However, the explanations following each Find/Replace term will benefit anyone looking to understand how to use Notepad++ extended search mode and regular expressions.

If you are specifically looking for multiline regular expressions, look at this post.

You may already know that I am a big fan of Notepad++. Apparently, a lot of other people are interested in Notepad++ too. My introductory post on Notepad++ is the most popular post on my speechblog. I have a feeling that that is about to change.

Since the release of version 4.9, the Notepad++ Find and Replace commands have been updated. There is now a new Extended search mode that allows you to search for tabs (\t), newlines (\r\n), and a character by its value (\o, \x, \b, \d, \t, \n, \r and \\). Unfortunately, the Notepad++ documentation is lacking in its description of these new capabilities. I found Anjesh Tuladhar's excellent slides on regular expressions in Notepad++ useful. After six hours of trial and error, I managed to bend Notepad++ to my will. And so I decided to post what I think is the most detailed step-by-step guide to Search and Replace in Notepad++, and certainly the most detailed guide to cleaning up DMDX .zil output files on the internet.

What's so good about Extended search mode?

One of the major disadvantages of using regular expressions in Notepad++ was that it did not handle the newline character well—especially in Replace. Now, we can use Extended search mode to make up for this shortcoming. Together, Extended and Regular Expression search modes give you the power to search, replace and reorder your text in ways that were not previously possible in Notepad++.

Search modes in the Find/Replace interface

In the Find (Ctrl+F) and Replace (Ctrl+H) dialogs, the three available search modes are specified in the bottom right corner. To use a search mode, click on the radio button before clicking the Find Next or Replace buttons.

Cleaning up a DMDX .zil file

DMDX allows you to run experiments where the user responds by using the mouse or some other input device. Depending on the number of choices/responses (and of course the kind of task), DMDX will output a .zil file containing the results (instead of the traditional .azk file). This is specified in the header along with the various response options available to the participant. For some reason, DMDX outputs the reaction time twice—and on separate lines—in .zil files. Here's a guide for cleaning up these messy .zil files with Notepad++. Explanations of the Notepad++ search terms are provided in bullet points at the end of each step.

Step 1: Back up your original result file (e.g. yourexperiment.zil) and create a copy of that file (yourexperiment_copy.zil) that we will edit and clean up.

Step 2: Open yourexperiment_copy.zil in Notepad++ (version 4.9 or later).



Step 3: Remove all error messages. All lines containing DMDX error messages begin with an exclamation mark. Let's get rid of them.

Bring up the Replace dialog box (Ctrl+H) and select the Regular Expression search mode.

Find what: [!].*

Replace with: (leave this blank)

Press Replace All. All the error messages are gone.


  • [!] finds the exclamation character.

  • .* selects the rest of the line.
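
For anyone who wants to try the same idea outside Notepad++, here is a minimal Python sketch. The sample text is invented (it is not real DMDX output), and I have assumed `[^\r\n]*` as a stand-in for `.*`, because Python's `.` would also swallow the `\r` of a Windows line ending.

```python
import re

# Invented sample with Windows line endings; not real DMDX output.
sample = "Item 1\r\n! some DMDX error message\r\n+1 523.41\r\n"

# [!] matches the exclamation mark; [^\r\n]* matches the rest of that line
# (used here in place of .* so the trailing \r is left alone).
cleaned = re.sub(r"[!][^\r\n]*", "", sample)

print(repr(cleaned))
```

The line break that followed the message stays behind as an empty line, which is what Step 4 deals with.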

Step 4: Get rid of all these blank lines.

Switch to Extended search mode in the Replace dialog.

Find what: \r\n\r\n

Replace with: (leave this blank)

Press Replace All. All the blank lines are gone.



  • \r\n is the newline sequence in Windows (a carriage return, \r, followed by a line feed, \n).

  • \r\n\r\n finds two consecutive newlines (what you get from pressing Enter twice).
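
A rough Python equivalent, as a sketch only: Extended mode treats \r\n\r\n as four literal characters, so plain string replacement is the closest match. The sample text is made up for illustration.

```python
# Invented sample: a blank line sits between two lines of text.
sample = "first line\r\n\r\nsecond line\r\n"

# Replace the four-character sequence CR LF CR LF with nothing,
# just as the Replace dialog does with an empty "Replace with" field.
joined = sample.replace("\r\n\r\n", "")

print(repr(joined))
```

Because both line breaks are removed, the text on either side of the blank line ends up on the same line. In this workflow the Items are split back onto their own lines in the steps that follow.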


Step 5: Put each Item (DMDXspeak for trial) on a new line.

Switch to Regular Expression search mode.

Find what: (\+.*)(Item)

Replace with: \1\r\n\2

Press Replace All. "Item"s have been placed on new lines.



  • \+ finds the + character.

  • .* selects the text after the + up until the word "Item".

  • Item finds the string "Item".

  • () allow us to access whatever is inside the parentheses. The first set of parentheses may be accessed with \1 and the second set with \2.

  • \1\r\n\2 will take + and whatever text comes after it, will then add a new line, and place the string "Item" on the new line.
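
Here is how the same search and replace might look in Python. This is a sketch under assumptions: both sample strings are invented and real .zil content will differ.

```python
import re

# Invented one-line samples; real .zil content will differ.
sample = "+1 523.41 Item 2"
sample_two_items = "+1 523.41 Item 2 +2 611.02 Item 3"

pattern = r"(\+.*)(Item)"   # group 1: + and what follows, group 2: "Item"
replacement = r"\1\r\n\2"   # group 1, a Windows line break, then group 2

result = re.sub(pattern, replacement, sample)

# subn also reports how many replacements were made.
result_two, count = re.subn(pattern, replacement, sample_two_items)

print(repr(result))
print(repr(result_two), count)
```

Note that .* is greedy: when a line holds more than one "Item", the match runs from the first + to the last "Item", so only that last one is moved in a single pass.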

So far so good. Our aim now is to delete duplicate or redundant information (reaction time data).


Step 6: Remove all newline characters using Extended search mode, replacing them with a unique string of text that we will use as a signpost for redundant data later in RegEx. Choose a string of text that does not appear in your .zil file—I have chosen mork.

Switch to Extended search mode in the Replace dialog.

Find what: \r\n

Replace with: mork

Press Replace All. All the newline characters are gone. Your entire DMDX .zil file is now one very long line of (in my case word-wrapped) text.



Step 7: We're nearly there. Using our mork signpost keyword, let's separate the different RT values.

Stay in Extended search mode.

Find what: ,

Replace with: ,mork

Press Replace All. Now, mork appears after every comma.


Step 8: Let's put the remaining Items on new lines.

Switch to and stay in Regular Expression search mode for the remaining steps.

Find what: mork(Item)

Replace with: \r\n\1

Press Replace All. All "Item"s should now be on new lines.
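
Steps 6 to 8 amount to a "signpost" (sentinel) technique, and it can be sketched in Python as follows. The sample text and the variable names are my own invention for illustration; they are not taken from a real .zil file.

```python
import re

SENTINEL = "mork"  # any string that never occurs in the data

# Invented sample; real .zil content will differ.
sample = "Item 1,\r\n 523.41\r\n 523.41,\r\nItem 2,\r\n 611.02\r\n 611.02,\r\n"

assert SENTINEL not in sample  # the signpost must be unique

# Step 6: join everything into one long line.
step6 = sample.replace("\r\n", SENTINEL)

# Step 7: put the signpost after every comma.
step7 = step6.replace(",", "," + SENTINEL)

# Step 8: a signpost directly before "Item" becomes a line break again.
step8 = re.sub(SENTINEL + r"(Item)", r"\r\n\1", step7)

print(repr(step8))
```

The point of the detour is that the line breaks are turned into ordinary text, so a single-line regular expression can later reason about what used to be several lines.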



Step 9: Let's get rid of those duplicate RTs.

Find what: mork ([^A-Za-z]*)mork [^A-Za-z]*\,mork

Replace with: \1,

Press Replace All. Duplicate reaction times are gone. It's starting to look like a result file :)



  • A-Z finds all letters of the alphabet in upper case.

  • a-z finds all lower case letters.

  • A-Za-z will find all alphabetic characters.

  • [^...] is the inverse. So, if we put these three together: [^A-Za-z] finds any character except an alphabetic character.

  • Notice that only one of the [^A-Za-z] is in parentheses (). This is recalled by \1 in the Replace with field. The characters outside of the parentheses are discarded.
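
To see the capture group at work, here is a small Python sketch. The fragment below is invented to have the shape that the earlier steps produce; it is not real data.

```python
import re

# Invented fragment; the same RT appears twice, each after a signpost.
sample = "Item 1,mork 523.41mork 523.41,morkmork"

# The first [^A-Za-z]* is captured (group 1, the RT we keep).
# The second one is not captured, so the duplicate is dropped.
pattern = r"mork ([^A-Za-z]*)mork [^A-Za-z]*\,mork"

result = re.sub(pattern, r"\1,", sample)

print(result)
```

[^A-Za-z]* stops at the m of the next mork because m is a letter, which is why the signpost needs to be made of letters for this particular pattern to work.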

Step 10: Let's get rid of all those morks.

Find what: mork

Replace with: (leave blank)

Press Replace All. The morks are gone.



Step 11: Separate each participant's data from the next.

Find what: (\**\*)

Replace with: \r\n\r\n\1\r\n\r\n

Press Replace All. The final product is a beautiful, comma-delimited .zil result file that is ready to be imported into Excel for further analysis.
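
As a sketch of what this last search term does: \**\* matches any unbroken run of asterisks, and the replacement puts blank lines around it. The sample below is invented, and I am assuming that a row of asterisks sits between one participant's data and the next.

```python
import re

# Invented sample: a row of asterisks between two participants' data.
sample = "Item 9,702.10,**********Item 1,523.41,"

# (\**\*) = zero or more asterisks followed by one asterisk,
# i.e. a run of one or more, captured as group 1.
result = re.sub(r"(\**\*)", r"\r\n\r\n\1\r\n\r\n", sample)

print(repr(result))
```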



Notepad++, is there anything it can't do?


Please post your questions in the comments below, rather than emailing me. This way, others can refer to my answers here, saving me many hours of responding to similar emails over and over.

Update 20/2/2009: Having trouble understanding regexp? I have created a new Guide for regular expressions. Check it out.

471 comments:

«Oldest   ‹Older   401 – 471 of 471
Manolis said...

hello

What if I have multiple commands
...
System.QQQTable[1].Priority=3
other configuration
System.QQQTable[2].Priority=5
other configuration
...
and I want to search exactly all the below Priority entries of the specific System.QQQTable

System.QQQTable[*].Priority=*

because Priority exists elsewhere and System.QQQTable has other entries as well?

thanks
Manolis

Mark Antoniou said...

Hi Manoli,

The answer will depend on what the other occurrences of Priority and System that you want to exclude look like. It also depends on whether the occurrences that you are searching for will always have a single character enclosed within the brackets and following the equals sign. Assuming those things to be true, then you would use

Search for (regexp mode): System\.QQQTable\[.\]\.Priority=.

If your search term needs to accommodate additional constraints/flexibility, then you'll need to specify further details.
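
The same pattern can be tried out in Python. A minimal sketch: the two Priority lines come from the question, while the Name and Other.Priority lines are invented to show what gets excluded.

```python
import re

config = """System.QQQTable[1].Priority=3
System.QQQTable[1].Name=alpha
Other.Priority=9
System.QQQTable[2].Priority=5
"""

# \. and \[ \] are escaped so they match literally;
# the unescaped . matches exactly one character.
pattern = re.compile(r"System\.QQQTable\[.\]\.Priority=.")

matches = pattern.findall(config)
print(matches)
```

If the index or the value can be longer than one character, swapping the single . for something like \d+ is one possible adjustment.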

Superhans said...

I have a very long HTML Code with the following structure (I made spaces in order to be able to post the html code):

< td class="tar" >
< div class="bubble in" >
Some Text, I want to keep! And maybe even an image.< br / >
< span class="time" >< div style="text-align:right" >17:14< /span >< /div >
< /div >
< /td >
< /tr >

< tr >
< td class="tal" >
< div class="bubble out" >
Some Text, I want to keep! And maybe even an image.< br / >
< span class="time" >< div style="text-align:right" >17:15< /span >< /div >
< /div >
< /td >
< /tr >

This is the structure of a chat with the two participants "bubble in" and "bubble out".
As you can see every "block" (Chat-Message) has a timecode like "17:14". I want to extend this timecode with a space character and an image but only for the text-block coming from "bubble in".

I am trying to accomplish this with the find & replace feature in Notepad++.

Here is what I came up with:

**Find what:**

< div class="bubble in" >[^"]*< span class="time" >< div style="text-align:right" >([0-9]*[0-9]*):([0-9]*[0-9]*)< /span >< /div >

**Replace with:**

< div class="bubble in" >\1< span class="time" >< div style="text-align:right">\2\3\4\5\6&#160< img src= "test.png" width="16" height="10" alt="0"/< /span >< /div >

The searching works but replacing \1 doesn't work somehow. Can you help me out?
I am new to RegEx and figured this out by searching the internet and just trying.
I think I am pretty close but only the wildcard [^"] for the random text is not yet correct.

Mark Antoniou said...

Try enclosing the random wildcard within parentheses ( )

Superhans said...

Thanks, I didn't know that \1 referred to the brackets:

this is how it worked:

Find What:

(< div class="bubble in" >.*?< span class="time" >< div style="text-align:right" >[0-9]{1,2}:[0-9]{1,2})(< /span >< /div >)

Replace with:

\1 < img src= "test.png" width="16" height="10" alt="0" /> \2

DESI said...

Hello,
I have a lot of insert statements, and all of them have a ROWSTAMP column and its value. Plus I have some other unique column values to be replaced with the next number from a sequence.

How do I write regular expression starting with some particular string and ending on one particular string?

Mark Antoniou said...

DESI, you need to give me more detail. Can you paste a few lines of the text and then show me what you want the output to be (before and after).

DESI said...

I have exported insert statement from SQLDeveloper. Those statements have one column called ROWSTAMP. I can easily remove that columns using find and replace. But its values I can't remove easily. So I was looking for search pattern where if I can get regular expression for Starts with and end with then I can replace contents coming in between start with and end with.

e.g
values (1,'START',1,1,null,'START 1','START 1',1,'APA_MOCREV',599,'EN',0,'869313');

I want to replace 599,'EN',0,'869313' with XYZ.nextval,'EN',0)

Mark Antoniou said...

In order to give you a robust solution that will work for your larger textfile, I need more lines of text. Based on what you have provided, this will work:

Search for (regexp mode): (.*MOCREV',).*(;)
Replace with: \1XYZ.nextval,'EN',0)\2
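
Here is the same replacement as a Python sketch, run against the single line quoted in the question. With only one sample line to go on, treat it as an illustration rather than a tested solution for a whole export.

```python
import re

line = ("values (1,'START',1,1,null,'START 1','START 1',1,"
        "'APA_MOCREV',599,'EN',0,'869313');")

# Group 1 keeps everything up to and including MOCREV',
# group 2 keeps the final semicolon; what lies between is replaced.
result = re.sub(r"(.*MOCREV',).*(;)", r"\1XYZ.nextval,'EN',0)\2", line)

print(result)
```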

Unknown said...
This comment has been removed by the author.
Unknown said...

I have deleted my last comment because it was poorly expressed. Here's my question in a clearer form:

I am trying to replace all the different "Last revised: [timestamp]" lines (one on each page of my site) with a call to a PHP include file.

As I understand it, because the PHP call has carriage returns, I will have to do that step separately via Extended Search. With this bit I have no problem.

But how do I RegEx search *just* for the line containing the string "Last revised: [timestamp]" (in which [timestamp] is a variable) in order to set this up?

Mark Antoniou said...

I'm not really sure what you're trying to do. Rather than describing it, could you paste 3-5 lines of your original text (before you have done any regexp or extended search changes) and then show me what you want it to look like at the end.

Unknown said...

Mark,

Since I had in two different formats, I actually wrote two search strings:

"<b>Last revised:</b< [FMSTW]+[a-z]+[a-z]*[a-z]*[a-z]*[a-z]+[d]+[a]+[y]+[,] +[0-9]*[0-9] +[A-Z]+[a-z]+[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z] +[2]+[0]+[01]+[0-9] +[a]+[t] +[0-9]*[0-9]:+[0-9]+[0-9]:+[0-9]+[0-9] +[ap]+[m]"

and

"<b>Last revised: </b> [0-9]*[0-9]:+[0-9]+[0-9] +[AP]+[M] +[FMSTW]+[a-z]+[a-z]*[a-z]*[a-z]*[a-z]+[d]+[a]+[y]+[,] +[A-Z]+[a-z]+[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z] +[0-9]*[0-9], +[2]+[0]+[0-9]+[0-9]"

I'm sure there are more efficient and elegant ways to do this, but it worked.

I then replaced both expressions with the text string "Timestamp1" throughout the site.

From there, replacing "Timestamp1" with my PHP call was simple in Extended Search mode.

Still some editing to do on individual pages that can't be avoided, but the Gordian knot is unraveled, thanks in part to some ideas I got from your page.

Thank you.

Unknown said...

Correction:

"Since I had <timestamp> in two different formats...."

Mark Antoniou said...

I can't help you unless you show me the original text. Not your search expressions.

3-5 lines of the original, unaltered text. And a sample of what you want it to look like when you're done.

Right now, I have no idea what the text looks like and therefore I can't help you.

Unknown said...

I am no longer asking for help. The problem has already been solved.

I'm just thanking you for some ideas I took from your post in solving it. :-)

Binh Nguyen said...

Thanks, nice post

Keeneye99 said...

Thank you! This has just saved me hours of menial labour!

eapo said...

To replace ### mg -> (### mg)

Find what: (\d+) mg
Replace with: \(\1 mg\)
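
In Python the same idea looks like this. A small sketch with an invented sentence; note that in Python's replacement string the parentheses are written without backslashes.

```python
import re

text = "Take 500 mg daily, then 250 mg at night."

# (\d+) captures the number; the replacement wraps it in literal parentheses.
result = re.sub(r"(\d+) mg", r"(\1 mg)", text)

print(result)
```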

SARA said...

i have a html file which have two paragraph tags

i want to replace the second paragraph tag with some css class name,

pls help me to find the solution

Mark Antoniou said...

Can you paste some of the text surrounding the two instances of the paragraph text that you want to replace?

SARA said...

thank you for your reply

but i found a solution, if any query i ask you, and thank you once again :-)

Malte said...

Hi, I did not succeed in finding brackets '['
using regular expressions. I thought I need to escape the bracket with a backslash like this \[ but it tells me that the syntax is wrong.
Then I tried it with (\[), which did not help either; [\[] does not help ...
I don't have any ideas left, do you?

Mark Antoniou said...

Malte, if you are using Regular Expression search mode, the way to search for [ is to precede it with a \

So, if I have this text:
Hello, how [ you today?

Search for (regexp): \[

Replace with: are

will give you this:
Hello, how are you today?
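
The same escape works in other regular expression engines too. As a quick Python sketch of the example above:

```python
import re

text = "Hello, how [ you today?"

# \[ matches a literal opening square bracket.
result = re.sub(r"\[", "are", text)

print(result)
```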



Unknown said...

Well i m new to Regular Expression..I hope you can help me. I have a text file which contains 10 digits mobile nos along with other garbage...These mobile nos are separated by A Tab Space..I wanna extract all these 10 digits mobile nos. Please help me..

Thanks

Mark Antoniou said...

Akash, you can simply copy the Tab character and paste that into your search term. For example,

Search for (regexp mode): (.*) Note that there is a Tab before the parentheses
Replace with: \1
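
If the goal is simply to pull the numbers out, a script is another option. The following Python sketch is an assumption about what the file looks like (tab-separated text mixed with other digits); the sample string is invented.

```python
import re

# Invented sample: tab-separated text with numbers of various lengths.
text = "abc\t9876543210\txyz 12345\t9123456780\t12345678901"

# Exactly ten digits, with no digit immediately before or after,
# so longer runs such as 12345678901 are skipped.
numbers = re.findall(r"(?<!\d)\d{10}(?!\d)", text)

print(numbers)
```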

egotrench said...

I know this post is old, but I ended up here looking for a specific item. I couldn't find it, but I figured out how to remove everything inbetween a /* and */, even if it spanned multiple lines:

([/*][ -Z\r\n]{0,9999}[*/])
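
For comparison, here is how removing /* ... */ comments is commonly written in Python. This is a sketch of a different pattern, not the one posted above: in that pattern [/*] is a character class that matches a single / or *, whereas here the two-character delimiters are spelled out. The sample text is invented.

```python
import re

source = "a = 1; /* first\ncomment */ b = 2; /* second */ c = 3;"

# /\* and \*/ are the literal delimiters; .*? is non-greedy so each
# comment ends at its own */; DOTALL lets . cross line breaks.
stripped = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)

print(repr(stripped))
```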

Unknown said...

 Thanks for your amazing post. Keep going on

Anonymous said...

Mark, is it possible to group liked "things" together? I want to group all the liked positions [NFL Football] together.

I have this:

41 ABDUL-MALIK, Sultan LB 6-3 220 9/26/77 Fr. -- Arcadia (Arcadia)
19 ABRAMS, Adam PK 5-9 170 3/28/76 So*. 1V San Diego (Bishop's)
89 ALLRED, John TE 6-5 250 9/9/74 Sr.* 3V Del Mar (Torrey Pines)
40 AUBREY, Bob LB 6-3 215 9/29/75 So*. 1V Glendale (St. Francis)
25 BASTIANELLI, Mike S 6-1 185 5/18/76 So. 1V Danville (De La Salle)
37 BELL, David PK 6-1 200 2/10/78 Fr. -- Anaheim (Western)
79 BOELTER, Grant OT 6-6 310 12/2/75 Jr. JC Seguin, TX (Judson/San Francisco CC)
74 BOWEN, Ken OT 6-8 320 9/21/76 So.* 1V Orlando, FL (Dr. Phillips)

I would like to be able to group all the DT, OT, WR, LB etc. together. All the LB's are together, then all the TE's are together, then all the DT's are together, etc.

Is this possible?

Mark Antoniou said...

Craig68, it is possible, but regular expressions are not the best tool for sorting large data sets like yours. My advice would be to import the data into Excel (perhaps using the SPACE character to delimit columns) and then sort by the column containing DT, OT, WR etc.
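
A short script is another way to do the grouping. The Python sketch below uses four of the roster lines from the question and assumes that the position is the short upper-case code sitting just before the height (e.g. 6-3); that assumption may not hold for every line in a full roster.

```python
import re

roster = [
    "41 ABDUL-MALIK, Sultan LB 6-3 220 9/26/77 Fr. -- Arcadia (Arcadia)",
    "19 ABRAMS, Adam PK 5-9 170 3/28/76 So*. 1V San Diego (Bishop's)",
    "89 ALLRED, John TE 6-5 250 9/9/74 Sr.* 3V Del Mar (Torrey Pines)",
    "40 AUBREY, Bob LB 6-3 215 9/29/75 So*. 1V Glendale (St. Francis)",
]

# Assumed layout: one to three capital letters, then a height such as 6-3.
POSITION = re.compile(r"\s([A-Z]{1,3})\s+\d-\d{1,2}\s")

def position(line):
    """Return the position code, or an empty string if none is found."""
    match = POSITION.search(line)
    return match.group(1) if match else ""

# sorted() is stable, so players keep their order within each position.
grouped = sorted(roster, key=position)

for line in grouped:
    print(line)
```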

Michel Merlin said...

In the initial article (29 June 2008) "Step 4: Get rid of all these blank lines" you replace (in Extended search mode) "\r\n\r\n" with nothing; this will actually replace each group of (1 text line, 1 blank line, 1 text line) with just ONE (concatenated) text line.
To do what Step 4 claims, you actually need to replace "\r\n\r\n" with "\r\n".
Versailles, Thu 14 Jan 2016 11:22:30 +0100

Mark Antoniou said...

That would only be an issue if there were any occurrences of 1 text line, 1 blank line, 1 text line. Also, when replacing \r\n\r\n with \r\n, it would be necessary to hit Replace All repeatedly until all instances of \r\n\r\n are gone. Another illustration of Notepad++'s limitations when it comes to handling multiline regular expressions.
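
The "repeat until gone" behaviour is easy to see in a small Python sketch, along with a quantifier that collapses any run of line breaks in one pass. The sample text is invented.

```python
import re

# Invented sample: three blank lines after "a", one after "b".
sample = "a\r\n\r\n\r\n\r\nb\r\n\r\nc\r\n"

# A single pass of the literal replacement still leaves a blank line
# after "a", because matches do not overlap.
one_pass = sample.replace("\r\n\r\n", "\r\n")

# (?:\r\n){2,} matches two or more line breaks in a row,
# so one pass is enough however long the run is.
collapsed = re.sub(r"(?:\r\n){2,}", "\r\n", sample)

print(repr(one_pass))
print(repr(collapsed))
```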

Unknown said...

How to find these chars in notepad++
ç
ü

Mark Antoniou said...

Praveen, there are lots of ways to do this. Here are 3 that immediately come to mind:

1. Type the characters in (either using an international keyboard or by specifying the ASCII code).
2. Copy the character and paste it into the Find dialog box.
3. Click and drag in the text file to highlight the character and then open the Find dialog box. The character will appear in the Find input field.

Layarion said...

Ok question/situation.

I need to Find something in an html file, the number 6.

i need notepad to match only the following:

any line that has "aPossessConditions" and the number 6
any line that has "id" and the number 6

i figured out how to search for them one at a time, like so
("aPossessConditions".*[6])

which results in false positives like this:
Line 351: 214=51,214=Over:06001

but also gives me what i'm looking for like this:
Line 832: 11=6,11=7

so i wanna know 2 things.
1) how do i weed out results like 06007?
2) how do i search for both of these things, that are on separate lines, at the same time?
any line that has "aPossessConditions" and the number 6
any line that has "id" and the number 6

Layarion said...

also, i should add that i'm less interested in replacing them, and more interested in finding them. replacing would be a bonus

Layarion said...

ok update: so far i got this:
(aPossessConditions.*6|"id".*6)

which seems to solve #2, but issue #1 is still a mystery to me

Layarion said...

update: ok this is what i'm using so far, it covers all of my needs except issue #1

"id".*6|aPossessConditions.*6|aEffects.*6|vCondIDs.*6|aThresholds.*6|vUsConditions.*6|vThemConditions.*6|vPairConditions.*6|vUsFailConditions.*6|vThemFailConditions.*6|vPairFailConditions.*6|vUsPreConditions.*6|vThemPreConditions.*6

i just punch that into the regular expressions. while it still gives me alot of false positives, it's far better than what it was before.

still, if you can help me with issue 1 that would make life bliss.

to give some other important details:

i only need to search for things like "6" and "6x1x1" and "6.90x1x1" "6.90"

i don't want things like "60" "06" or "60x1" "60.90" "60.90x1" in my search results

Mark Antoniou said...

Wow, lots of questions, Layarion. Sounds like you have figured some of it out.

Here is something that should help with issue 1. I should point out that I am using Emacs, not Notepad++, and would advise you to do the same because I don't think Notepad++ is powerful enough to do what you want, especially if you want to search across multiple lines.

Searching for aPossessConditions.* 6\. will return lines with 6 followed by a decimal point, such as 6.1, 6.2, 6.3 etc.

Searching for aPossessConditions.* 6x will return lines where 6 is multiplied by something else. If you use a unicode multiplication sign, i.e., × rather than an x, then simply replace that character in the search term

Searching for aPossessConditions.* 6[A-Za-z] will return lines with 6 followed by an alphabetic character

Note that in each of the above I have included a space before the 6 to omit values such as 06, 006, etc.

Put it all together to get something like this (Emacs syntax, where \| separates the alternatives; Notepad++ uses a plain |)
Search for: aPossessConditions.* 6\.\|aPossessConditions.* 6x\|aPossessConditions.* 6[A-Za-z]

It is hard for me to know if this will work for all instances in your text given the small peek that I'm working from, but this should be enough to give you an idea.

And to answer your question about searching vs replacing, if you can find it, you can replace it. Finding it is the hard part.

Layarion said...

hey mark i ended up going with this:

aResponses.*\D6\D

which, i should be able to replace the 6 with something like 6.1x1 (that's not multiplication, it's just an ID system for a game) and still get exact results.

the \D in front will keep things like 06 from showing up, and the \D after the number will keep things like 60 from showing up but will still allow 6x1x1 to show up.
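
The \D trick can be checked quickly in Python. In this sketch the test lines are invented, and the True/False values record what the pattern is expected to do with each one.

```python
import re

# \D before and after the 6 means "not a digit" on both sides.
pattern = re.compile(r"aResponses.*\D6\D")

# Invented lines mapped to whether they should match.
expected = {
    "aResponses 11=6,11=7": True,
    "aResponses 2=6x1x1,": True,
    "aResponses 2=6.90,": True,
    "aResponses 214=06001,": False,
    "aResponses 3=60x1,": False,
}

results = {line: bool(pattern.search(line)) for line in expected}

for line, matched in results.items():
    print(matched, line)
```

One thing to keep in mind: \D has to match an actual character, so a 6 that is the very last character on a line would not be found by this pattern.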

this guide makes the most sense out of the others out there.

Mark Antoniou said...

Excellent. Glad you worked it out.

Dev said...

Hi,

I have couple of words in the following format:
[HOME_OWNER], [HOUSE_RENT] etc etc.

I want to convert it to the following:
[home_owner], [house_rent] etc etc.

Basically, convert everything inside [] into lower case.

Best Regards

Mark Antoniou said...

Hi Dev,

I am assuming that you're using Notepad++. You have a couple of options:

Option 1 would be to install the Notepad++ plug-in called TextFX. Select all the text and then click on the TextFX menu | TextFX Characters | lowercase. That will make everything selected lowercase in one click.

Option 2 is more precise and uses a regular expression to modify only the text enclosed within square brackets.

Search for (regex): (\[.*\])
Replace with: \L$1

That should do it :)
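
Python's re module has no \L, but a replacement function does the same job. In this sketch I have assumed a negated character class, [^\]]*, instead of .* so that each bracket pair is matched separately and the text between pairs is left alone; the trailing words in the sample are invented.

```python
import re

text = "[HOME_OWNER], [HOUSE_RENT] and Other TEXT"

# \[[^\]]*\] matches one [ ... ] pair at a time;
# the lambda returns the matched text in lower case.
result = re.sub(r"\[[^\]]*\]", lambda m: m.group(0).lower(), text)

print(result)
```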

Michel Merlin said...

"Mark Antoniou" Fri 15 Jan 2016 03:15 GMT,
1. You wrote: « That would only be an issue if there were any occurrences of 1 text line, 1 blank line, 1 text line »
Such occurrences are just the "blank lines" targeted in Step 4 « Get rid of all these blank lines »

2. « Also, when replacing \r\n\r\n with \r\n, it would be necessary to hit Replace All repeatedly until all instances of \r\n\r\n are gone »
Of course, and this is also the case in §1; but it goes without saying that in such case you 1st do a Replace "\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n" with "\r\n" and or similar ones, which in total is way shorter than to "hit Replace All repeatedly until all instances of \r\n\r\n are gone"

3. You didn't address my very question: Step 4 is still uncorrected, in a state where it does NOT do what it claims. A 433-post discussion is of little use IMO as long as those posts are NOT really read and eventually applied when necessary.
Versailles, Tue 06 Dec 2016 09:02:20 +0100

Michel Merlin said...

"Mark Antoniou 06 Dec 2016 05:48 GMT "Option 1... Select all the text and then click TextFX menu | TextFX Characters | lowercase"
You can also select the text in Notepad++, right-click and directly choose "lowercase".

Now of course Option 2 (Regex | Replace All) remains more efficient if the occurrences are numerous.
Versailles, Tue 06 Dec 2016 15:13:20 +0100

ChuckB said...

Mark, just a quick note of appreciation for sharing your knowledge and helping those less adept with this software. :)

Unknown said...

Hi Mark
I dont want to REMOVE text, but I want to append to specific lines.
For example
If my text contains
a few lines like this
and then I want to append after this line

I can search for the line 'and then I want to append after this line'. . . but what do I type into the 'Replace With' box? (Lets say I want to add the text 'to extend the text')

So the final text will say
If my text contains
a few lines like this
and then I want to append after this line
to extend the text

Unknown said...

What a great post Mark! And you are so helpful, answering hundreds of new questions, OMG, you must have 155hr days...
Also interesting, your move to Emacs.

@Anonymous said... "Nice article!" July 31, 2008 at 1:44 AM (this anonymous, yes):

I built the "Text Master" for the very reason, yes:
No tool I found could handle what I needed to handle for productivity reasons: it wastes too much time to have to figure out IF AT ALL a certain tool can do ANY text modification one may need, and IF so, HOW...

So, around the time Mark published this great post, I ventured to build the ultimate text management tool: ANY text. Code. Whatever! In as many files as needed. And fully automatic. - I am that type of guy trying the unthinkable, yes -

So: When you think about ANY text editing needs, you realize there are
TWO REALMS of text edit needs:
I. Language
II. Content

and there are FOUR MODALITIES of text edit needs, no more, no less:
1. Fixed Text for Fixed Text
2. Varying Text for Fixed Text
3. Fixed Text for Varying Text
4. Varying Text for Varying Text

and there are THREE VARIETIES of text edit needs (that have nothing to do with I. Language):
a) REPLACE (fixed or varying!) text with other (fixed or varying!) text
b) MOVE (fixed or varying!) text to different place of (fixed or varying!) text
c) COPY (fixed or varying!) text to different place of (fixed or varying!) text

1a. and 2a. any (quality) text editor can do, Notepad++ of course as well.
3a+b+c. and 4a+b+c. however no text editor I know of can do, for understandable reasons, lol.

Why I ventured into building the Text Master was: We often need to edit VARYING TEXT for VARYING TEXT, the pinnacle of text modifications...!
While in our mind we will be thinking "that goes there, that goes there,...etc", we can't even imagine to come up with a SYNTAX for our vague thoughts to do it automatically, hundreds or thousands of times, right?

Maybe what helped me to finally figure it out was my studies of Methodology at Uni, which was about coming up with suitable syntax, for anything, yes.

So, my Text Master can do ALL FOUR MODALITIES and each in ALL THREE VARIETIES. And fully automatic. In an unlimited number of files.

If "Anonymous from July 31, 2008 at 1:44 AM", or you Mark, or anyone else here needs some text modification that you feel none of your editors can do, and you want to save time having to do it manually... you can send me an email and I will see if I can do it nonetheless, lol. ;-)

At least I haven't come across a case yet that I couldn't handle with my Text Master. I found a bug though, but working around the bug is easier than going back to coding...

Anyway, hopefully my above structuring of "text edit needs" helps everyone: It's so much easier to solve a problem if we first systematize the case, right?
There's more to this of course, but the above should do.

Unknown said...

Funnily blogger says "Unknown said..." although I commented with my full gmail profile, even photo, lol. Seems blogger prefers "Unknown"?

Unknown said...

Hi, Mark. Can you help me with my question?
I have thousands of database lines with similar code:

<span id="docs-internal-guid-cebe059c-4c4e-3c45-ec54-72d23903401f"><span>Производитель:</span></span></p>\r\n </td>\r\n <td style=" padding: 3px;">

I need to completely remove this text: id="docs-internal-guid-cebe059c-4c4e-3c45-ec54-72d23903401f"

The initial text of this line is the same for all: (id="docs-internal-guid)
But after this text, the code in all the lines is different. That is, changing the ID.

To remove unnecessary text in Notepad ++ I use this mask: (id="docs-.*)(")
But the program selects text to the last (") in this line which is located here: style=" padding: 3px;">

How can I limit the range to the first closing tag (")? For just this ID: (id="docs-internal-guid-cebe059c-4c4e-3c45-ec54-72d23903401f")

Unknown said...

Maybe not the easiest solution to my problem, but I used this mask to search for the text I needed: (id="docs-internal-guid-([a-z,0-9]+)-([a-z,0-9]+)-([a-z,0-9]+)-([a-z,0-9]+)-([a-z,0-9]+)")

Mark Antoniou said...

Hi Sergii, I am glad that you found a solution. There are more elegant solutions, for sure, but it really depends on the text that you need to work on re: how many constraints need to be built in to the search expression. This is how I would do it based on the text samples you have provided:

Search for: (.*)id=.*(><span.*)
Replace with: \1\2
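
On the question of stopping at the first closing quote: a negated character class is one common way to do it. The Python sketch below uses a shortened version of the line from the question and shows the greedy match next to the limited one.

```python
import re

line = ('<span id="docs-internal-guid-cebe059c-4c4e-3c45-ec54-72d23903401f">'
        '<span>Производитель:</span></span> <td style=" padding: 3px;">')

# Greedy .* runs on to the LAST quote on the line.
greedy = re.search(r'id="docs-.*"', line).group(0)

# [^"]* cannot cross a quote, so the match ends at the FIRST closing quote.
cleaned = re.sub(r' id="docs-internal-guid-[^"]*"', "", line)

print(greedy)
print(cleaned)
```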

Unknown said...

Thanks, Mark.

praosv said...

...
I want to edit the attributes of 3rd table cell with find and replace
regular expression.So, I want firstly regex pattern opening tag only and
replace function.Secondly, for the entire string with open and close tags
and replace function. primarily for Textpad otherwise Notepad++.
I find very less regular expressions for tables. where can I find.
I am not a programmer nor have knowledge of languages.

Mark Antoniou said...

praosv, can you paste a small snippet of text, and then show me what you want it to look like (like a before and after kind of thing)?

praosv said...

Find and replace a table's nth cell with attributes by regular expression
<tr align="" valign="top">
<td align="" class="xxx">some text1</td>
<td align="" class="xxx">some text2</td>
*<td align="" class="xxx"><span style="">some text3 </span></td>
<td align="" class="xxx">some text4</td>
<td align="" class="xxx">some text5</td>
</tr>
I want regex pattern for the 3rd table cell, and replace it to
<td align="" class="zzz"><span style="">some text3 </span></td>
or totally
<td align="" class="zzz"><span style="yyy">some other text </span></td>
thank u
is it correct way of posting html tags.

Mark Antoniou said...

Ok. I am on a Mac, using Atom, but the logic is pretty easy. You need to be able to include a newline character in the search term (in Atom, that is \n). Notepad++ isn't the best for this. TextPad looks like it also uses \n. Here you go:

Search for: ({tr align.*\n{td align.*\n{td align.*\n{td align="" class=").*("}{span style=")(".*)
Replace with: $1zzz$2yyy$3

(I replaced all instances of < and > with { and } because Blogger was driving me insane. I'd love to know what you did to get your comment above to show the html tags normally.)

This will turn this:

{tr align="" valign="top"}
{td align="" class="xxx"}some text1{/td}
{td align="" class="xxx"}some text2{/td}
{td align="" class="xxx"}{span style=""}some text3 {/span}{/td}
{td align="" class="xxx"}some text4{/td}
{td align="" class="xxx"}some text5{/td}

Into this:

{tr align="" valign="top"}
{td align="" class="xxx"}some text1{/td}
{td align="" class="xxx"}some text2{/td}
{td align="" class="zzz"}{span style="yyy"}some text3 {/span}{/td}
{td align="" class="xxx"}some text4{/td}
{td align="" class="xxx"}some text5{/td}

I've never used TextPad, but from what I see in the documentation, this should work, but you will need to change the Replace term to this:

Search for: ({tr align.*\n{td align.*\n{td align.*\n{td align="" class=").*("}{span style=")(".*)
Replace with: \1zzz\2yyy\3

Let me know if it works.

praosv said...

thank you but it is neither working in notepad++ nor in textpad

Mark Antoniou said...

It does.

I just borrowed a Windows computer, installed TextPad and tried it. It works perfectly without needing any changes. TextPad requires you to place the cursor at the beginning of the text file (i.e. it searches forward). Make sure you haven't placed your cursor after the text pattern you are searching for. Also, make sure that Regular Expression is checked in the search box (I had Match case and Regular expression both checked).

Search for: ({tr align.*\n{td align.*\n{td align.*\n{td align="" class=").*("}{span style=")(".*)

Obviously, when you do the search you need to replace { with < and } with > in your text editor search box. Blogger won't let me post the html tag. (Again, I'd love to know how you got it to do so! Pretty please, with a cherry on top.)

I also discovered that both of these will work
Replace with: $1zzz$2yyy$3
or
Replace with: \1zzz\2yyy\3
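
For anyone who wants to see the multiline logic run, here is the same search and replace as a Python sketch, written with real angle brackets. The table rows are the ones from the question.

```python
import re

html = "\n".join([
    '<tr align="" valign="top">',
    '<td align="" class="xxx">some text1</td>',
    '<td align="" class="xxx">some text2</td>',
    '<td align="" class="xxx"><span style="">some text3 </span></td>',
    '<td align="" class="xxx">some text4</td>',
    '<td align="" class="xxx">some text5</td>',
    '</tr>',
])

# Group 1 spans the row tag and the first two cells, which pins the
# match to the third cell; groups 2 and 3 keep the rest of that cell.
pattern = (r'(<tr align.*\n<td align.*\n<td align.*\n<td align="" class=")'
           r'.*("><span style=")(".*)')

result = re.sub(pattern, r"\1zzz\2yyy\3", html)

print(result)
```

The first two cells are part of the match, but they are written back unchanged by \1, so only the third cell differs in the output.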

praosv said...

thank u Mark.
It worked on Textpad.
after replacing the curly brackets with parenthesis.
but the find is highlighting all the three cells including
table row. but it should highlight only third cell. so, it
may need some more refining.

Mark Antoniou said...

No, that’s the whole point of building in multiline constraints into the search term.

The replace term puts back the other lines where they should go. That’s what $1, $2, $3 does. Try it. You’ll see.

praosv said...

Ok. Then what will be the regular expressions for other table cells.
I mean 1st, 2nd , 4th and 5th etc.,

Mark Antoniou said...

I don’t understand. Do you want to edit all of the cells?

praosv said...

yes Mr.Mark,
separate regular expression for each cell.
thank you

webbyone2345 said...

Hi

I am looking for how to search the below.

I know first letter and not the next 8. Then I know the rest.

for example

z********l&Tooldes.doo

So I want to find for example (in a large document

z565kJd%9l&Tooldes.doo
zkW£98&9Ql&Tooldes.doo
z76jkED!"l&Tooldes.doo

Many thanks

Col
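
A pattern of the form "known letter, eight unknown characters, known remainder" can be written with a counted wildcard, .{8}. The Python sketch below uses the three example strings from the question plus one invented string that should not match.

```python
import re

text = ('x z565kJd%9l&Tooldes.doo y\n'
        'zkW£98&9Ql&Tooldes.doo\n'
        'z76jkED!"l&Tooldes.doo\n'
        'zshortl&Tooldes.doo')

# z, then exactly eight characters of any kind, then the known tail.
# The dot in ".doo" is escaped so it matches a literal full stop.
found = re.findall(r"z.{8}l&Tooldes\.doo", text)

print(found)
```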

Shawn said...

Hi, I know this is a pretty old post. But I just want to mention I recently created a find-and-replace tool (and I intend to make it to do more), which can edit text and tables inside and outside of the application.

I'm not sure whether you are still in need of this kind of tool. If you are, perhaps you have some needs that can't be satisfied by existing tools. It would be great if you could share them with me and I am more than happy to incorporate them into my application.
You can check it out at texcel.app.
