29 June 2008

Notepad++: A guide to using regular expressions and extended search mode

The information in this post details how to clean up DMDX .zil files, allowing for easy importing into Excel. However, the explanations following each Find/Replace term will benefit anyone looking to understand how to use Notepad++ extended search mode and regular expressions.

If you are specifically looking for multiline regular expressions, look at this post.

You may already know that I am a big fan of Notepad++. Apparently, a lot of other people are interested in Notepad++ too. My introductory post on Notepad++ is the most popular post on my speechblog. I have a feeling that that is about to change.

Since the release of version 4.9, the Notepad++ Find and Replace commands have been updated. There is now a new Extended search mode that allows you to search for tabs (\t), newlines (\r\n), and a character by its value (\o, \x, \b, \d, \t, \n, \r and \\). Unfortunately, the Notepad++ documentation is lacking in its description of these new capabilities. I found Anjesh Tuladhar's excellent slides on regular expressions in Notepad++ useful. After six hours of trial and error, I managed to bend Notepad++ to my will. And so I decided to post what I think is the most detailed step-by-step guide to Search and Replace in Notepad++, and certainly the most detailed guide to cleaning up DMDX .zil output files on the internet.

What's so good about Extended search mode?

One of the major disadvantages of using regular expressions in Notepad++ was that it did not handle the newline character well—especially in Replace. Now, we can use Extended search mode to make up for this shortcoming. Together, Extended and Regular Expression search modes give you the power to search, replace and reorder your text in ways that were not previously possible in Notepad++.

Search modes in the Find/Replace interface

In the Find (Ctrl+F) and Replace (Ctrl+H) dialogs, the three available search modes are specified in the bottom right corner. To use a search mode, click on the radio button before clicking the Find Next or Replace buttons.

Cleaning up a DMDX .zil file

DMDX allows you to run experiments where the user responds by using the mouse or some other input device. Depending on the number of choices/responses (and of course the kind of task), DMDX will output a .zil file containing the results (instead of the traditional .azk file). This is specified in the header along with the various response options available to the participant. For some reason, DMDX outputs the reaction time twice—and on separate lines—in .zil files. Here's a guide for cleaning up these messy .zil files with Notepad++. Explanations of the Notepad++ search terms are provided in bullet points at the end of each step.

Step 1: Back up your original result file (e.g. yourexperiment.zil) and create a copy of that file (yourexperiment_copy.zil) that we will edit and clean up.

Step 2: Open yourexperiment_copy.zil in Notepad++ (version 4.9 or later).



Step 3: Remove all error messages. All lines containing DMDX error messages begin with an exclamation mark. Let's get rid of them.

Bring up the Replace dialog box (Ctrl+H) and select the Regular Expression search mode.

Find what: [!].*

Replace with: (leave this blank)

Press Replace All. All the error messages are gone.


  • [!] finds the exclamation character.

  • .* selects the rest of the line.
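For readers who want to try the idea outside Notepad++, here is a minimal Python sketch of the same removal. The sample text is invented for illustration, and Python's re module is a different regex engine from the one in Notepad++. In particular, Python's "." would also swallow the "\r" of a Windows line ending, so the sketch uses [^\r\n]* in place of .* to stop at the end of the line.

```python
import re

# Invented sample: two data lines and one error line that starts with "!".
sample = "Item 1, 512.30\r\n! some error message\r\nItem 2, 498.10\r\n"

# [!] matches a literal exclamation mark.
# [^\r\n]* stands in for .* and matches the rest of that line only.
cleaned = re.sub(r"[!][^\r\n]*", "", sample)

# The error text is gone, but its line ending remains as a blank line.
print(repr(cleaned))  # 'Item 1, 512.30\r\n\r\nItem 2, 498.10\r\n'
```

Note that the line break of the removed message stays behind as an empty line, which is what the next step deals with.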

Step 4: Get rid of all these blank lines.

Switch to Extended search mode in the Replace dialog.

Find what: \r\n\r\n

Replace with: (leave this blank)

Press Replace All. All the blank lines are gone.



  • \r\n is a newline character (in Windows).

  • \r\n\r\n finds two newline characters (what you get from pressing Enter twice).
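Extended mode treats \r\n\r\n as a literal character sequence rather than a pattern, so the nearest equivalent in Python is a plain string replace. The sketch below uses an invented sample in which every line is followed by a blank line. It shows that both line breaks of each pair are removed, so the text on either side ends up on the same line.

```python
# Invented sample: each line of text is followed by an empty line.
sample = "line one\r\n\r\nline two\r\n\r\nline three\r\n"

# Literal replacement, no regular expression involved.
joined = sample.replace("\r\n\r\n", "")

print(repr(joined))  # 'line oneline twoline three\r\n'
```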


Step 5: Put each Item (DMDXspeak for trial) on a new line.

Switch to Regular Expression search mode.

Find what: (\+.*)(Item)

Replace with: \1\r\n\2

Press Replace All. "Item"s have been placed on new lines.



  • \+ finds the + character.

  • .* selects the text after the + up until the word "Item".

  • Item finds the string "Item".

  • () allow us to access whatever is inside the parentheses. The first set of parentheses may be accessed with \1 and the second set with \2.

  • \1\r\n\2 will take + and whatever text comes after it, will then add a new line, and place the string "Item" on the new line.
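The same search and replace terms can be tried in Python as a rough check. The sample line is invented. One thing the sketch makes visible is that .* is greedy: it runs to the last "Item" on the line, so a single pass only splits off the final "Item" of each line.

```python
import re

# Invented sample: a "+" followed by text that contains "Item" twice.
sample = "+ abc Item 1 Item 2"

# Group 1 is the "+" plus everything up to the last "Item"; group 2 is "Item".
result = re.sub(r"(\+.*)(Item)", r"\1\r\n\2", sample)

print(repr(result))  # '+ abc Item 1 \r\nItem 2'
```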

So far so good. Our aim now is to delete duplicate or redundant information (reaction time data).


Step 6: Remove all newline characters using Extended search mode, replacing them with a unique string of text that we will use as a signpost for redundant data later in RegEx. Choose a string of text that does not appear in your .zil file; I have chosen mork.

Switch to Extended search mode in the Replace dialog.

Find what: \r\n

Replace with: mork

Press Replace All. All the newline characters are gone. Your entire DMDX .zil file is now one very long line of (in my case word-wrapped) text.



Step 7: We're nearly there. Using our mork signpost keyword, let's separate the different RT values.

Stay in Extended search mode.

Find what: ,

Replace with: ,mork

Press Replace All. Now, mork appears after every comma.


Step 8: Let's put the remaining Items on new lines.

Switch to and stay in Regular Expression search mode for the remaining steps.

Find what: mork(Item)

Replace with: \r\n\1

Press Replace All. All "Item"s should now be on new lines.



Step 9: Let's get rid of those duplicate RTs.

Find what: mork ([^A-Za-z]*)mork [^A-Za-z]*\,mork

Replace with: \1,

Press Replace All. Duplicate reaction times are gone. It's starting to look like a result file :)



  • A-Z finds all letters of the alphabet in upper case.

  • a-z finds all lower case letters.

  • A-Za-z will find all alphabetic characters.

  • [^...] is the inverse. So, if we put these three together: [^A-Za-z] finds any character except an alphabetic character.

  • Notice that only one of the [^A-Za-z] is in parentheses (). This is recalled by \1 in the Replace with field. The characters outside of the parentheses are discarded.
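Here is a small Python sketch of the duplicate removal. The sample string is hypothetical: it only imitates the shape the text should have after steps 6 to 8 (a value, the mork signpost, the same value again, a comma and another mork), and real .zil content may look different.

```python
import re

# Hypothetical line after steps 6-8: the reaction time appears twice.
sample = "Item 12,mork 1234.56mork 1234.56,mork"

# Only the first run of non-letters is captured; the second run is discarded.
pattern = r"mork ([^A-Za-z]*)mork [^A-Za-z]*\,mork"
result = re.sub(pattern, r"\1,", sample)

print(result)  # Item 12,1234.56,
```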

Step 10: Let's get rid of all those morks.

Find what: mork

Replace with: (leave blank)

Press Replace All. The morks are gone.



Step 11: Separate each participant's data from the next.

Find what: (\**\*)

Replace with: \r\n\r\n\1\r\n\r\n

Press Replace All. The final product is a beautiful, comma-delimited .zil result file that is ready to be imported into Excel for further analysis.
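The search term in step 11, (\**\*), matches a run of one or more asterisks. The Python sketch below assumes that participants are separated by such a run. That is an assumption made for illustration, and the sample text is invented.

```python
import re

# Invented sample: two blocks of data separated by a run of asterisks.
sample = "data for A***data for B"

# \** is zero or more asterisks, \* is one more, so the whole run is captured.
result = re.sub(r"(\**\*)", r"\r\n\r\n\1\r\n\r\n", sample)

print(repr(result))  # 'data for A\r\n\r\n***\r\n\r\ndata for B'
```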



Notepad++, is there anything it can't do?


Please post your questions in the comments below, rather than emailing me. This way, others can refer to my answers here, saving me many hours of responding to similar emails over and over.

Update 20/2/2009: Having trouble understanding regexp? I have created a new Guide for regular expressions. Check it out.

471 comments:

Mscarfix said...

Hi Mark,
I did paste my code in but after blog "publish" they vanished. I'll try it again. I "Find in Files", .*. In the "Find result" pane 262 hits. What is an easy way I can compile those hits into one file?

Thanks in advance for your help.
Mscarfix

Mscarfix said...

Mark: The greater and less than sign vanished again. I have spelled it out below:

Less than sign caution greater than sign .* Less than sign forward slash caution greater than sign.

How do I get the greater or less than signs to show up in your blog?

Thanks ever so much!
Mscarfix

Mark Antoniou said...

Blogger allows users to insert HTML tags into comments so that text may be formatted, or links inserted. What is happening is that Blogger does not recognize the caution tag and returns an error. You can get around this by inserting spaces like so:
< caution > .* < /caution >

And it's always a good idea to hit Preview before hitting Publish.

So, now that we've got that sorted out, let me see a few lines of your code and I'll try to give you a solution.

As for taking snippets of text from multiple files and inserting them into new files (or a single file), you are going beyond a simple regular expression and into the world of scripting. If you do not have a particular language in mind, you could automate some of the repetitive steps by using something like Auto Hotkey (or something similar), but you are not going to do it with Notepad++.

Mscarfix said...

Here are a few lines of the XML code. Search for "caution". I'd like to find all occurrences of caution.
----
Cxxxx3xxxxxx-xxx-xxx.1Axxx30003P 3xxxx1Sxxxx Vxxx3xxxPxxxx.1These procedures require the use of Hazardous Material xxxThe fxxxxxxx.Lxxxxxx.
-----

Mark Antoniou said...

There are no occurrences of caution in this text.

Mscarfix said...

< spare > < nomen >Cxxxx< /nomen > < identno > < mfc > 3xxx < /mfc > < pnr>xxx-xxx-xxx < /pnr > < /identno > < qty >.1 < /qty > < /spare > < spare > < nomen > Axxx< /nomen> < identno > < mfc>30003 < /mfc > < pnr >P 3xxxx < /pnr> < /identno > < qty >1 < /qty > < /spare > < spare > < nomen >Sxxxx Vxxx < /nomen > < identno > < mfc > xxx < /mfc > < pnr > Pxxxx < /pnr > < /identno >< qty> .1< /qty> < /sparesli > < /spares > < safety > < safecond > < caution > < para > These procedures require the use of Hazardous Material xxx < / para > < / caution > < / safecond > < / safety > < / prelreqs > < step1 id="s1" > < note > < para >The fxxxxxxx. < / para > < / note > < para > Lxxxxxx. < / para > < / step1 >

Mscarfix said...

Thank you for your responses Mark. I just now got them. Have you had any experience with "TextPad"? Will TextPad get the job done or do I still have to resort to 1) a scripting language; 2)Auto Hotkey

Feel free to delete my postings that blogger didn't recognize.

Thanks again for your responses. I sure appreciate them. (Now I don't feel so alone in solving this.) :-)

Mark Antoniou said...

Ok, so if you want to keep whatever occurs within the caution tags, this is a fairly straightforward regular expression.

Search for (regexp mode): .*< caution >(.*)< / caution >.*
Replace with: \1

This will give you this

< para > These procedures require the use of Hazardous Material xxx < / para >

Mscarfix said...

Hi Mark:
You suggested: Search for (regexp mode): .*< caution >(.*)< / caution >.*
Replace with: \1

When I Find and Replace in RegEdit mode, what it does on my computer is deletes the caution. Not sure what I'm doing wrong. I can get NotePad++ to find the desired caution. So, I'm thinking the next step is to find more about the "Auto Hot Key" that you suggested before. Can I program the key to "Paste Special/Copy Binary Content" to another file? Since I have not written a script, I don't mind doing a little manual processing on my part, but there's got to be a better way than finding and then manually copying. Can I "Mark" or "Bookmark" somehow and then copy the marked cautions?

Thanks again in advance for any knowledge. You are terrific!

-Mscarfix

Mark Antoniou said...

If you want to keep the caution tags, change the search term to this
.*(< caution >.*< / caution >).*

As for copy+pasting the left over text into another file or program, that goes beyond regular expressions in any text editor. I'd recommend running the regular expression first, then using Auto Hotkey on Windows or Automator on OSX to automate the repetitive steps for you.

Good luck with it.

Mscarfix said...

Is there a way to salvage the Cautions and blow everything else away? If not, I have a tiny little bit of Perl experience, so perhaps I can figure out a Perl script to capture everything in a Caution statement and combine all the Cautions into file.

Thank you very much for sharing your time and valuable knowledge. :-)

Best to you!
-Mscarfix

Mark Antoniou said...

Search for (regexp mode): .*(< caution >.*< / caution >).*
Replace with: \1
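A minimal Python sketch of this search and replace, using a shortened version of the line posted above. The tags keep the extra spaces that were added so that Blogger would display them. Everything outside the caution element is matched by the two outer .* and dropped.

```python
import re

# Shortened from the sample above, with the spaced-out tags kept as posted.
line = (
    "< safety > < caution > < para > These procedures require the use of "
    "Hazardous Material xxx < / para > < / caution > < / safety >"
)

# Group 1 is the caution element including its tags.
kept = re.sub(r".*(< caution >.*< / caution >).*", r"\1", line)

print(kept)
```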

Mscarfix said...

Thank you so much for your patience with me. IT FINALLY WORKED! You are AWESOME! I'm forever indebted to you.

Thank you so, so, so much! *Big Smiles*!
-Mscarfix

crmanikandan said...

How to change // word1 word2 word3
to /* word1 word2 word3 */

i had used
Find : [////](.+)
Replace : //*\1*//

but it match single / too present in other lines :(

Mark Antoniou said...

Search for (regular expression mode): //(.*)
Replace with: /*\1 */
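The same terms can be checked in Python. The second sample line is invented to show that a single / elsewhere in the text is left alone, because the search term requires two slashes in a row.

```python
import re

# First line is the comment to convert; second line contains a lone "/".
sample = "// word1 word2 word3\nx = a / b"

# Group 1 is everything after the two slashes on that line.
result = re.sub(r"//(.*)", r"/*\1 */", sample)

print(result)
```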

sahabe said...

I asked this question on a different forum, but got no answers. Hope to get answer from you guys.

I have hundreds of HTML documents named as 001.html 002.html 003.html and so on.

All of these HTML files contain multiple constant "F0" string. Also, these HTML files contain a single variable string {PABC}, where "ABC" refers to the title number. For example:

001.html has F0 string(s) and a string {P001}.

002.html has F0 string(s) and a string {P002}.

003.html has F0 string(s) and a string {P003}.

and so on. How do I replace all "F0" occurrences within these HTML files with their corresponding "{PABC}" strings? Let's say, I have 001.html that has multiple F0's and a "{P001}" in it. I want to replace all "F0" occurrences with a "{P001}". Same thing for 002.html, replacing F0's with a "{P002}". How can I perform a batch operation for all HTML files?

Is there any good Text editor that can do, if so, how?

Many Thanks!

Mark Antoniou said...

Anything can be done, but I need to see a few lines of text. How complicated the answer will be will depend on the recurring patterns in the text.

Could you paste 4-5 lines, and then show me what you want them to look like in the end.

ibedir said...

Here is the source text of 010.html:
==========
(sc_F1)ﮌﮎ()
(sc_F1)ﰸﰹ()

.sc_F0 {P010}

(class=sc_F0 style=''>ﭑﭒﭓ()
==========

The question then becomes, how do I replace "sc_F0 style" with ".sc_F0 {P010}" and so on for the rest of HTML files?

sahabe said...

I want above 010.html to look like this:

==

(sc_F1)ﮌﮎ()
(sc_F1)ﰸﰹ()

.sc_F0 {P010}

(class=.sc_F0 {P010}=''>ﭑﭒﭓ()
===

Similarly, say for 045.html, the source is:

(sc_F1)ﮌ()
(sc_F1)ﰸ()

.sc_F0 {P045}

(class=sc_F0 style='')ﭑ()

and I want it to take form:

(sc_F1)ﮌ()
(sc_F1)ﰸ()

.sc_F0 {P045}

(class=.sc_F0 {P045}='')ﭑ()

==============

Same thing let's say for 601.html, the source is:

(sc_F1)ﮎ()
(sc_F1)ﰳ()

.sc_F0 {601}

(class=sc_F0 style='')ﭘ()

Need it to look like this:

(sc_F1)ﮎ()
(sc_F1)ﰳ()

.sc_F0 {601}

(class=.sc_F0 {601}='')ﭘ()

sahabe said...

Mark, are you on it?

Mark Antoniou said...

I'm not 100% clear on what the problem is. From what I understand, a simple Find + Replace operation should do the trick for you. For instance, for 001.html, just replace F0 with {P001} and then click Replace All. Would that do the trick for the 001.html file?

sahabe said...

How do you do this for hundreds or thousands of HTML files? Yes, it is possible to do it one file at a time. However, this is what I want to avoid due to the time it will take and possible mistakes.

Let me rephrase my task:

Code of 001.html is:

P001
F0 abc
F0 def

should take form:

P001
P001 abc
P001 def

==================
Code of 002.html is:

P002
F0 mmm
F0 nnn
F0 kkk

should take form:

P002
P002 mmm
P002 nnn
P002 kkk
=========================
...

Code of 885.html is:

P885
F0 mmm
F0 nnn
F0 kkk

should take form:

P885
P885 mmm
P885 nnn
P885 kkk
=========================

Hope it is clear now.

Thank you and waiting for your answer.

Mark Antoniou said...

I just wanted to be clear. So, this is not really a regular expression problem then, but more of a batch processing problem. In effect, you want to run a few thousand Find + Replace operations, one for each file.

Ok, so there are ways to automate such repetitive tasks, but a) I have never done anything on such a vast scale, and b) there is no text editor that would be able to handle such an operation.

The steps that you want to perform are:
Search for: F0
Replace with: XXX -where XXX represents the text/number that you want to insert
Then hit Replace All

Here are your options:
Option 1. Use some sort of keystroke automation software (such as AutoHotKey) to do the work for you one file at a time. This will work, but it will take time.
Option 2. You could create some sort of parser script (e.g., in Perl) that will run through each .html file and replace F0 with the desired value for that file.
Option 3. You could outsource the job to someone to do it for you manually.

The challenge for Options 1 and 2 is automating the selection of the XXX, because you don't want to have to type it in for each file. There are a number of ways to do this: you could search within the file, you could pull it out of the filename, you could pull it out of an external list, or if the numbers are sequential without any gaps you could use a counter. I really cannot say any more than that.

Good luck, and I hope that this has at least given you some idea about how to proceed.
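As one possible shape for Option 2, here is a minimal sketch in Python rather than Perl. It assumes the replacement label can be taken from the filename (001.html becomes P001) and that the files are UTF-8 text. The function name replace_f0_in_folder is hypothetical, and the demonstration runs in a temporary folder so no real files are touched.

```python
import tempfile
from pathlib import Path


def replace_f0_in_folder(folder: Path) -> int:
    """Replace every "F0" in each .html file with "P" + the file's number.

    Hypothetical helper: assumes the label comes from the filename
    (001.html -> P001) and that the files are UTF-8 text.
    """
    changed = 0
    for path in sorted(folder.glob("*.html")):
        label = "P" + path.stem
        original = path.read_text(encoding="utf-8")
        updated = original.replace("F0", label)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed += 1
    return changed


# Demonstration in a throwaway folder with two small invented files.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "001.html").write_text("P001\nF0 abc\nF0 def\n", encoding="utf-8")
    (folder / "002.html").write_text("P002\nF0 mmm\n", encoding="utf-8")
    files_changed = replace_f0_in_folder(folder)
    demo_result = (folder / "001.html").read_text(encoding="utf-8")

print(files_changed)
print(demo_result)
```

The sketch overwrites files in place, so it would be safer to run something like this on copies of the originals.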

sahabe said...

Let me share the solution to my problem, which I was finally able to handle with a simple batch operation.
===
@echo off & setLocal EnableDelayedExpansion
for /f "tokens=1* delims=[]" %%a in ('find /v /n "" ^<001.html') do (set str=%%b
if not "%%b"=="" set str=!str:F0=P001 !
echo.!str!>> 001.txt
)
exit /b
====

Then, I generated a list on Excel for all HTML files, that look like this:

@echo off & setLocal EnableDelayedExpansion
for /f "tokens=1* delims=[]" %%a in ('find /v /n "" ^<001.html') do (set str=%%b
if not "%%b"=="" set str=!str:F0=P001 !
echo.!str!>> 001.txt
)
for /f "tokens=1* delims=[]" %%a in ('find /v /n "" ^<002.html') do (set str=%%b
if not "%%b"=="" set str=!str:F0=P002 !
echo.!str!>> 002.txt
)
for /f "tokens=1* delims=[]" %%a in ('find /v /n "" ^<003.html') do (set str=%%b
if not "%%b"=="" set str=!str:F0=P003 !
echo.!str!>> 003.txt
)
....

exit /b

===

That was it.

Thank you for your time and valuable advice.

Mark Antoniou said...

Glad you managed to find a solution.

sahabe said...

Mark, now I have right question for you, that I can't figure out.

I need a line in each of these HTML files to be placed on top of that file. How can I do it with regular expressions?

Such a search: kkdslkds* finds all occurring strings, that's fine, but then how do I place these strings on top of these files? Many thanks.

Mark Antoniou said...

This requires a few steps.

First, you would replace all newline characters with a unique string that does not occur in the text, let's say NEWLINE, which would put all of the text on one line.

Then, you would use a regular expression to find the text that you are interested in, like this
Search for: (.*)(textgoeshere)(.*)
and in the replace term, you would move the text of interest to the front of the line like this
Replace with: \2NEWLINE\1\3

Finally, replace your unique text with a NEWLINE character to restore the original structure of the file.
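The three steps can be sketched in Python as follows. The sample text is only illustrative, and 65:99430 stands in for textgoeshere. Following the recipe exactly as written leaves an empty line at the spot where the moved text used to be, which the output below shows.

```python
import re

# Illustrative four-line sample; "65:99430" is the text to move to the top.
text = "lksdlsj\r\nlksjds\r\n65:99430\r\nlksdmls"

# Step 1: put everything on one line by swapping newlines for a placeholder.
one_line = text.replace("\r\n", "NEWLINE")

# Step 2: move the text of interest (group 2) to the front of the line.
moved = re.sub(r"(.*)(65:99430)(.*)", r"\2NEWLINE\1\3", one_line)

# Step 3: turn the placeholder back into real newlines.
restored = moved.replace("NEWLINE", "\r\n")

print(repr(restored))  # '65:99430\r\nlksdlsj\r\nlksjds\r\n\r\nlksdmls'
```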

sahabe said...

Yes, but then how do I automate it for thousands of files?
How do I generate a unique string for thousands of files? Maybe following will give a better idea.

001.html source:
lksdlsj
lksjds
65:99430
lksdmls

002.html source:
kjsndsk
lksflsmf
fskjfnskn
65:09i
lksdmls
dfd

002.html source:
eoijrfje
reorijfe
knknknkn
sderokfeokfr
65:3498u
lksdmlsre
dfderfe

....

885.html
hhskjd
65:o34u
lsfkfdk

Now, I want them to look like this [shifting 65:* to the top]:

001.html source:
65:99430
lksdlsj
lksjds
lksdmls

002.html source:
65:09i
kjsndsk
lksflsmf
fskjfnskn
lksdmls
dfd

002.html source:
65:3498u
eoijrfje
reorijfe
knknknkn
sderokfeokfr
lksdmlsre
dfderfe

....

885.html
65:o34u
hhskjd
lsfkfdk

==========================

I am really grateful for your help.

Mark Antoniou said...

My answer to the automation/batch problem is the same as 5 posts ago, although it seems that your batch solution was more elegant than what I had suggested anyway. So, you could do something similar.

As for moving the text to the top of each file, that is what the Replace term will do:
\2NEWLINE\1\3

The \2 refers to the second set of parentheses that contains the text that you want to move. By placing it at the beginning of the Replace term, it will be at the beginning of the text. Then in the next step when you reinstate the newline characters, it will be on line 1 in your file.

Unknown said...

Hey, thanks a lot for your post (even though it's now 4 years old!). I knew there was a way to use wildcard characters in Notepad++ but couldn't figure it out until I found this. Thanks for making it easy!

Jeffrey van Prehn said...

Is it possible to use the {n} syntax in notepad++ to specify the exact number of times for the preceding item to match? If so, how? I tried finding [0-9]{3} in '123' and I expect it to match, but it doesn't.

Mark Antoniou said...

Jeffrey, no there isn't, not in Notepad++. I think it's time for you to migrate to a more serious text editor.
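For comparison, regex engines that support counted repetition do accept the {n} syntax. Python's re module is one of them, so the pattern from the question can be tested there. This is only a sketch of the syntax and says nothing about Notepad++ itself.

```python
import re

# {3} requires exactly three repetitions of the preceding item, here [0-9].
match = re.search(r"[0-9]{3}", "123")
print(match.group(0))  # 123

# Two digits are not enough, so there is no match.
too_short = re.search(r"[0-9]{3}", "12")
print(too_short)  # None
```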

Anonymous said...

Hi Mark! Congratulation for your blog. One question. I must replace many many WORDS (100+) into many lines.
This is the lines (example):

1. la luna va forte come il vento
2. <>, basta non vedo
3. gamma
4. ...
5. ...

These are the words to replace with £ (example)

luna
<>
basta
come
gamma

Final result should be:

1. la £ va forte £ il vento
2. £, £ non vedo
3. £
4. ...
5. ...

How do I do?

Mark Antoniou said...

Do the lines ever repeat or does each line occur only once? You've shown lines 1-3. What do lines 1-10 look like?

Anonymous said...
This comment has been removed by the author.
Anonymous said...

the lines can also be repeated, but I'm interested in replacing just the words in the lines. I do not want to replace the lines, but different words in the lines.

Look here:

http://img836.imageshack.us/img836/1633/notepadreplacemanywords.png

Final Result:

http://img98.imageshack.us/img98/6792/resultse.jpg

Mark Antoniou said...

Haha, the arrows made me laugh. I understand what you want to do, but I am looking for the recurring patterns in the text, and whether your search words occur in other places where you don't want them to be replaced. Onto the solution:

Search for (regular expression mode): luna|<>|basta|come|gamma
Replace with: £
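With 100 or more words, typing the alternation by hand gets tedious. A minimal Python sketch can build the same search term from a word list. It uses re.escape as a precaution in case a word contains characters that are special in regular expressions. The lines and words are the ones from the example above.

```python
import re

words = ["luna", "<>", "basta", "come", "gamma"]
lines = ["la luna va forte come il vento", "<>, basta non vedo", "gamma"]

# Join the words with | to form the same alternation as the search term.
pattern = "|".join(re.escape(word) for word in words)

result = [re.sub(pattern, "£", line) for line in lines]

for line in result:
    print(line)
```

As with the Notepad++ version, this replaces the words wherever they occur, including inside longer words.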

Anonymous said...

Thank you..it works!

ok..I want to ask you please about the plugin for N++ Multiline find and Replace
It works by replacing lines in sequence, one below the other:

http://img407.imageshack.us/img407/5337/multilinefindandreplace.png

but if you insert another word in the middle of words that I'm searching, the plugin don't replace because It sees the lines as a block.
Is there a way to eliminate many lines (all those that I write in the search field)?

http://img842.imageshack.us/img842/5337/multilinefindandreplace.png

Mark Antoniou said...

I am not familiar with that plug in. I haven't used Notepad++ for multi-line regular expressions in a long time.

From your screenshot, it appears that your search term does not match what is in the text. That's why it won't work. It's like searching for "boy boy girl" on a single line, but the text is "boy girl boy" and then asking why doesn't it work. It doesn't work because it doesn't match what is in the text.

Anonymous said...

But if i want eliminate "boy girl boy" ? Is there a solution?
Thanks

Mark Antoniou said...

You have to change your search term so that it matches what is in the text. This isn't a regular expression issue. It's a Find issue. You are searching for text that is not there. To correct this:

In the boy vs girl example, you would search for "boy girl boy".

In your pic, you would move to the bottom line.

Anonymous said...

ok..
Can i use boolean operators AND, OR, NOT with Notepad or plugin for N++ to eliminate lines?
Like here:

http://textmechanic.com/Remove-Lines-of-Text-Containing.html

What is the most powerful Editor that allows you to do everything?

Thanks

Mark Antoniou said...

Not exactly.

AND is implied. Say if you want boy AND girl, and they're not next to each other, and boy occurs first, you could search for boy.*girl

OR is what | is for, i.e., boy|girl

NOT is probably more difficult. You could use negating, but it becomes very complicated very quickly. Depending on the structure of your text, you could just find the word that you would put in the NOT condition and just not delete it by recalling it in the replace term.
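The three cases can be sketched in Python on a few invented lines. AND and OR use the search terms given above. For NOT, the sketch does not attempt negation inside the regular expression. It swaps in a simpler technique and negates the result of an ordinary search in code.

```python
import re

# Invented sample lines.
lines = ["boy meets girl", "girl only", "boy only", "neither"]

# AND (boy occurs first, then girl somewhere later on the same line).
both = [line for line in lines if re.search(r"boy.*girl", line)]

# OR (either word anywhere on the line).
either = [line for line in lines if re.search(r"boy|girl", line)]

# NOT (negate the search result in code instead of in the pattern).
not_boy = [line for line in lines if not re.search(r"boy", line)]

print(both)     # ['boy meets girl']
print(either)   # ['boy meets girl', 'girl only', 'boy only']
print(not_boy)  # ['girl only', 'neither']
```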

In my opinion, the most powerful text editor is Emacs. Some say Vi, but it has a steep learning curve. And if you could use Vi, you wouldn't be reading this blog.

Good luck

Unknown said...

This was very useful, thanks Mark. :)

Anonymous said...

Hi Mark.
I have this html text:

http://img403.imageshack.us/img403/6326/spoiler1.png

I want transform with Regex or Macros into this:

http://img36.imageshack.us/img36/548/spoiler2.png

Is possible solve this problems with N++ or Emacs or Excel etc?
Spoiler is sequential. Every 2 spoilers, the number increases by one becoming spoiler2, spoiler3, spoiler4 ...
At the same time, as you see, I want to insert below

style = "display: none">

one line of some words. But in every line that has

style = "display: none">

the contents of the rows below is different, not equal.
The content of the first insert (www.google.com) is different from the contents of the second insertion.(www.yahoo.it) The content of the third insertion is different from the first and second.(www.coca-cola..)
Can i create macros to solve this problem(s)?

Mark Antoniou said...

Incrementing the numbers for each spoiler pair is possible but messy.
Step 1: You would remove the line breaks from your text file, and replace them with some unique string, let's say NEWLINEHERE.
Step 2: Then create line breaks for each second occurrence of spoiler.
Step 3: Then you could insert line numbers using Notepad++'s column editor.
Step 4: Then use regexp to move that number after each occurrence of spoiler on that line.
Step 5: Then put all the line breaks back.
I've posted similar solutions before in these comments (see my comment from July 19, 2011).

Inserting the URLs is even more difficult, given that they are different for each occurrence. If you have them typed down already, you could take the text from Step 3 above, in which each spoiler pair is still on one line. Copy and paste it into Excel in column A. Then paste your URLs into column B, placing NEWLINEHERE wherever there will be a new line in the text from column B (at the beginning, and wherever else needed). Then select columns A and B and paste them back into Notepad++, and follow on from Step 4 above.

As with anything, it can be done, but the question is is it worth the time? This is really going beyond a simple regular expression now.

Anonymous said...

The process is tricky..but the time is essential because i have to modify html code and i must work many substitutions of code that follow patterns.
I will try to follow your instructions.
Thanks

Anonymous said...

mmm..Mark..I try copy html code in excel but it recognizes the language and it returns a code other than textual code. I also tried to use cell format but it is useless. I noticed that every 13 lines you can enter the http link, but if I copy the text code in excel, it converts it to me.
http://img696.imageshack.us/img696/2039/excelproblem.png
Any idea to solve the problem?

Mark Antoniou said...

That is a bit annoying. You could use a regular expression to insert a single apostrophe at the beginning of each line before pasting into Excel. So each line will look like this:

'your text goes here

That should prevent Excel from changing it.

Anonymous said...

I solved the Excel problem like this:
(look shot video)

http://nathan3000.altervista.org/Test_Vari/solve_import_code_in_excel.htm

But why Excel have importing problems?

Mark Antoniou said...

I'm not sure. I couldn't replicate your problem. For me, there was no problem pasting it the first time straight into Excel without an apostrophe.

Glad you worked it out.

Afsana.BD said...
This comment has been removed by the author.
Mark Antoniou said...

Afsana.BD,
If the spaces you refer to are space characters (i.e., what you get when you press the Space bar), then you could

Search for (normal search mode): <-- press Space bar twice
Replace with: nothing, leave blank

If the spaces are blank lines (i.e., what you get when you press the Enter key), then you could

Search for (extended search mode): \r\n\r\n
Replace with: nothing, leave blank

Afsana.BD said...

Your suggestion was really helpful! Thanks a LOT :)!

Anonymous said...

Hi Mark,
How Can I copy highlighted lines like this?

http://img96.imageshack.us/img96/4524/highlightlines.png

The lines are not bookmarked. Which regex?

Mark Antoniou said...

I don't understand what you mean by bookmarked. Also, it is not clear to me what you are using to select the lines of interest. What is common about them? If you just want to delete the Anos, Dubstep and 90's Dance lines, then do 3 regexp searches for

.*Anos.*
.*Dubstep.*
.*90's Dance.*

and replace with nothing.

Anonymous said...

http://img515.imageshack.us/img515/6994/highlightedlines.png

I want copy only highlighted lines..is possible in Notepad++?

Mark Antoniou said...

This may seem obvious, but if the lines are highlighted, you can copy them by pressing Control+C. There is no regexp term that takes highlighting into account.

To use a regexp to find the beginning of the word "mark", you would search for \r\n\r\n in extended search mode and take it from there.

Anonymous said...

sorry for the misunderstanding, I do not know how to explain.. the lines are not selected, but only colored

http://img39.imageshack.us/img39/5752/selectedvscolored.png

Anonymous said...

Copy to clipboard style token lines in N++?

In N++ you can select some text and apply style token. When you are using compare plugin it generates difference between left text and right text applying different style token. Can i copy this differences? The difference are highlighted by style token

http://img96.imageshack.us/img96/4524/highlightlines.png

Mark Antoniou said...

So, essentially you want to compare two documents. The video you posted seems like a lot of work in order to do this in Notepad++.

Why not just use the Compare function in Microsoft Word?

It will do what you want and only takes one click.

Anonymous said...

Word is not very practical .. I solved with EditPad Pro but probably has a few bugs in the comparison of files that is inaccurate, but it allows to show the differences on different windows (very useful). Compare plugin for Notepad + + needs to be improved by its developer.
But is possible copy style token lines in Notepad++ without compare the text?
Thanks anyway

Anonymous said...

Hi Mark!
How can i copy the lines, or words observed the numbers of rows?

1. one
2. two
3. three
4. four

I want copy lines 2-4. not lines 1-3. I want maintaining the structure of lines:

1.
2. two
3.
4. four

http://img688.imageshack.us/img688/6109/linenumbers.png

I'm using Excel or Notepad++

Mark Antoniou said...

Notepad++ isn't much help for problems like this. There is no recurring text, so you need to use the linebreaks. I would use Emacs for this.

Search for: .*
\(.*\)
.*
\(.*\)

Replace with:
\1

\2

Note: insert newlines into your search and replace terms in Emacs by pressing Ctrl-Q Ctrl-J.
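For readers without Emacs, the same multi-line idea can be sketched in Python, where \n in the pattern plays the role of the Ctrl-Q Ctrl-J newline. The sketch assumes Unix-style line endings and uses the four words from the question without their line numbers.

```python
import re

# The four lines from the question, without the line numbers.
text = "one\ntwo\nthree\nfour"

# Lines 1 and 3 are matched but not captured; lines 2 and 4 are groups 1 and 2.
result = re.sub(r".*\n(.*)\n.*\n(.*)", r"\n\1\n\n\2", text)

print(repr(result))  # '\ntwo\n\nfour'
```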

Anonymous said...

I'm trying to use Emacs, but I find cryptic as a program .. I do not know where to start .. But it is a program that works for commands? I tried to enter so, but does not give me anything ..

http://img864.imageshack.us/img864/4569/emacs.png

Forgive my ignorance, but I can not use emacs .. you give me a hand?

Mark Antoniou said...

Don't be overwhelmed by the interface. If you use it a few times, you get used to it very quickly. Just as in Notepad++, you need to select the "Replace" option from the menu. In Emacs, click on Edit | Replace | Replace Regexp...

Then it will say "Query replace regexp:" at the bottom, and you just start typing this

.*Ctrl-QCtrl-J
\(.*\)Ctrl-QCtrl-J
.*Ctrl-QCtrl-J
\(.*\)Ctrl-QCtrl-J

then press enter, and it will ask you what you would like to replace this with, so you type:

Ctrl-QCtrl-J
\1Ctrl-QCtrl-J
Ctrl-QCtrl-J
\2

Then press enter, and Emacs will highlight the text that matches your search term. Press Y to replace one at a time, or press ! (Shift-1) to replace all.

If you do it right, this will be left over:


two

four
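
For readers who would rather script this than learn Emacs, here is a minimal Python sketch of the same idea: blank out every odd-numbered line and keep every even-numbered one in place. The sample list and the variable names are made up for illustration.

```python
# Sample input: four lines, as in the question above.
lines = ["one", "two", "three", "four"]

# Keep even-numbered lines (2, 4, ...) and blank out the odd ones,
# so that the text that remains stays on its original row.
kept = [line if number % 2 == 0 else ""
        for number, line in enumerate(lines, start=1)]

print("\n".join(kept))
```

Because the blanked lines are kept as empty strings, the row positions survive a paste into Excel.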

Anonymous said...

mmm.. i have some problem..look here:
http://nathan3000.altervista.org/Test_Vari/emacs_replace_problem.htm

Mark Antoniou said...

You do not type Ctrl-Q and Ctrl-J.

Ctrl-Q means hold down the control key and then press Q

Ctrl-J means hold down the control key and press J

When you press Ctrl-Q Ctrl-J, it will insert a newline character (like pressing the Enter key).

Anonymous said...

it works..thank you. Nice program Emacs..

the moderator said...

I'm trying to manipulate some xml data...
I'm hoping you can help my mind straighten out.

Lets say I have this:
value
value
value
value

And I need to do a find in files > replace in files to achieve this:


value
value


value
value


---
The reason is, I have these old xml files that I need to be able to be read by a new parser and the paths to the tags don't match but the tags do.

Please help! I'm going nuts and they shut off the A/C here at work hours ago...

Mark Antoniou said...

Hi mod,
no problem, this is actually fairly simple if you use a powerful text editor like Emacs.

Search for: \(.*\)
\(.*\)
\(.*\)
\(.*\)

Replace with:


\1
\2


\3
\4

And that'll do it. Note that in order to create a NEWLINE character in your search and replace expressions in Emacs, you need to press Ctrl-Q then Ctrl-J. I've just explained this to someone else; see my comment above.

If you insist on using Notepad++, things become a bit more complicated because you will need to replace the NEWLINE characters with something else (a unique string such as REPLACEMELATER), then move your text around, and reinstate the NEWLINE characters. There are lots of comments above explaining how to do this, but trust me it would be a lot easier to use the one line solution that I've suggested for you.

Unknown said...

lets say we have something like this:




...
.
we have just one line without \r \n.

we need to extract just URLs from the href only.
URL1
URL2
URL3
...
URLn

Find:(.*)(href=")(" target=.*)
Replace with: \1\r\n\3\r\n

This extracts only URLn and puts the rest above it, because Notepad++ seems to work from the bottom up. The regex works fine, but I have to click Replace All again and again.

the Syntax (href=")(" target=.*) should be reused in the \1 term. the ((href=")(" target=.*))+ is not working, even {}+ ...

Thanks in advance.
Jones

Unknown said...

The HTML was stripped out. Here is the example again.

a blablabla href="URL1" target="URL1" blabla blakjkjj /a br/ a blablabla href="URL2" target="URL2" blabla blakjkjj /a br/ a blablabla href="URL3" target="URL3" blabla blakjkjj /a br/ ... a blablabla href="URLn" target="URLn" blabla blakjkjj /a br/

Mark Antoniou said...

Why not replace href with \r\n in Extended search mode, then use a regular expression to clean up the lines that remain?

Step 1
Search for (extended search mode): href
Replace with: \r\n

Step 2
Search for (regexp mode): .*"(.*)".*
Replace with: \1
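
As a rough cross-check outside Notepad++, the two steps can be imitated in Python. This is only a sketch: the sample line is shortened from the example above, and the variable names are my own.

```python
import re

# Shortened single-line sample, adapted from the example above.
line = ('a blablabla href="URL1" target="URL1" blabla /a br/ '
        'a blablabla href="URL2" target="URL2" blabla /a br/')

# Step 1 (Extended mode equivalent): start a new line at every href.
pieces = line.replace("href", "\r\n").split("\r\n")

# Step 2 (regexp equivalent): keep one quoted value per piece.
# The leading .* is greedy, so the LAST quoted value on each piece is
# captured (the target, which equals the href in this sample).
urls = []
for piece in pieces:
    match = re.fullmatch(r'.*"(.*)".*', piece)
    if match:  # the first piece has no quotes and is skipped
        urls.append(match.group(1))

# A one-pass alternative that reads the href value directly.
direct = re.findall(r'href="([^"]*)"', line)

print(urls)
print(direct)
```

If the href and target values ever differ, the one-pass variant is the safer of the two, since it never looks at the target attribute.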

afr said...

Is there any way to replace a portion of text with the same portion twice ?

An example: would be replace all color codes on a text like:
"background-color: #660000;"
"background-color: #770000;"
"background-color: #880000;"
"background-color: #990000;"

replaced by background-color: #THE COLOR CODE; color: #THE COLOR CODE;

How this can be done ???

Mark Antoniou said...

Absolutely. In your search term, you can store a portion of text in a bank by enclosing it in parentheses like so:

Search for (regular expression mode): .*(color: )(#.*)"

In your example, color: goes into bank 1, and the color code goes into bank 2. Then, it is just a matter of inserting bank 2 twice into your replacement term like this:

Replace with: \2 \1\2

which will give you this:

#660000; color: #660000;
#770000; color: #770000;
#880000; color: #880000;
#990000; color: #990000;
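
The same "bank" idea can be tried out in Python, where banks are called groups. This is a sketch only. The second pattern is my own variation, assuming the goal is to keep the background-color declaration as well.

```python
import re

lines = [
    '"background-color: #660000;"',
    '"background-color: #770000;"',
]

# Same search and replace terms as above: bank 2 is inserted twice.
pattern = re.compile(r'.*(color: )(#.*)"')
result = [pattern.sub(r'\2 \1\2', s) for s in lines]

# Variation (an assumption about the desired output): also store the
# text before "color: " so that background-color is kept.
keep_prefix = [re.sub(r'"(.*)(color: )(#.*)"', r'\1\2\3 \2\3', s)
               for s in lines]

print(result)
print(keep_prefix)
```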

Anonymous said...

Hi mark, Is there a solution for replacing many words with other many words?
I can't use the | symbol. Look here please

http://img41.imageshack.us/img41/6760/replacemanywordswithoth.png

Mark Antoniou said...

Unfortunately, you cannot use the | symbol in this way. It functions as an OR statement, and you can't ask a text editor to replace your text with this OR that. It is after all just a dumb text editor that can't make decisions.

The problem you are facing is complicated because the text that you want to insert on each line is unique. So, you somehow need to get the text into the file and from there you can rearrange it.

So, step 1 is to get your replacement text typed out somehow, in order, with line breaks. Without that you're pretty screwed.

Step 2 is to get it into your text file. You could use Notepad++'s Column Editor mode or even use Excel (column A contains original text, column B contains replacement text or vice versa) and then export to txt format.

Step 3 is to use a regexp to replace your text.

Unfortunately, none of this is fast or easy.
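
If stepping outside the text editor is an option, a short script can do many-to-many replacement in one pass. The sketch below uses Python. The word pairs are hypothetical, since the real ones are only visible in the screenshot.

```python
import re

# Hypothetical word pairs: original word -> replacement word.
replacements = {"cat": "dog", "red": "blue", "redwood": "oak"}

# Build one alternation (word1|word2|...). Longer words go first so that
# "redwood" is tried before its prefix "red".
words = sorted(replacements, key=len, reverse=True)
pattern = re.compile("|".join(re.escape(word) for word in words))

text = "a red cat under a redwood"

# The function looks up the replacement for whichever word matched.
result = pattern.sub(lambda match: replacements[match.group(0)], text)

print(result)
```

Here the | symbol does work as an OR, because the script, not the search term, decides what each match is replaced with.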

Anonymous said...

Finally i solve this problem..I use this method:
http://nathan3000.altervista.org/Test_Vari/replace_with_sed.htm

Mark Antoniou said...

That is a very complex workflow that you've developed. Impressive. Glad that you managed to get it done!

Anonymous said...

Thanks Mark..but i have another question 4 you..sorry.
I have posted my question here but nobody answered me yet.
http://stackoverflow.com/questions/11405550/insert-bookmark-to-next-line-in-notepad

Is there a solution?

Mark Antoniou said...

Just use Autohotkey. Something like this will do it.

Send {F2}
Sleep 100
Send {down}
Sleep 100
Send ^{F2}
Sleep 100

Anonymous said...

thanks..i solve with this script:
WinWait, *new 2 - Notepad++,
IfWinNotActive, *new 2 - Notepad++, , WinActivate, *new 2 - Notepad++,
Sleep, 100
Send, {F2}{DOWN}{CTRLDOWN}{F2}{CTRLUP}

Anonymous said...

Hi mark. Is possible apply same macro, same replacement on multiple files with Notepad++?

Unknown said...

@Nathan J.

use the option :
"Replace All in All opened Documents"

(ctrl+H) and you will see it.

Anonymous said...

yes..to replace ok, but if i want apply a MACRO on All opened document in N++?

Ryan Perry said...

Hi Mark. I hope you can help me.

I'm trying to replace columns of text.. without interfering with the rest.

I have a CSV file, which raw looks like this:

LC*,00:24:E8:C3:16:E5,Migration Scenario,,SAS JMP x86 9.0.3 en-US,Microsoft Access 2010 1.0 en-US,Microsoft OFFLINE Dynamics CRM 4.0 for MS Office Outlook 4.0 en-US
LC*,5C:26:0A:19:2C:BC,Migration Scenario,,Airtel Airtel x64 21.00 en-US,Microsoft Access 2010 1.0 en-US,Microsoft OFFLINE Dynamics CRM 4.0 for MS Office Outlook 4.0 en-US,SAP Gui x64 7.20.9 en-US
LC*,D4:BE:D9:18:71:73,Migration Scenario,,Airtel Airtel x64 21.00 en-US,Microsoft Access 2010 1.0 en-US,Microsoft OFFLINE Dynamics CRM 4.0 for MS Office Outlook 4.0 en-US,SAP Gui x64 7.20.9 en-US
LC*,00:26:B9:F6:C7:1F,Migration Scenario,,Airtel Airtel x64 21.00 en-US,Apple iTunes x64 10.6.3.25,IBM Lotus Notes 6.5.5 en-US,Microsoft Access 2010 1.0 en-US,SAP Gui x64 7.20.9 en-US
LC*,5C:26:0A:63:32:50,Migration Scenario,,Airtel Airtel x64 21.00 en-US,Apple iTunes x64 10.6.3.25,IBM Lotus Notes 6.5.5 en-US,Microsoft Access 2010 1.0 en-US,SAP Gui x64 7.20.9 en-US
LC*,00:24:E8:E3:96:AE,Migration Scenario,,IBM Lotus Notes 6.5.5 en-US,Microsoft Access 2010 1.0 en-US,SAP Gui x64 7.20.9 en-US
LC*,5C:26:0A:4C:08:F5,Migration Scenario,,IBM Lotus Notes 6.5.5 en-US,Microsoft Access 2010 1.0 en-US,SAP Gui x64 7.20.9 en-US
LC*,5C:26:0A:46:6B:95,Migration Scenario,,Microsoft Access 2010 1.0 en-US,SAP Gui x64 7.20.9 en-US

What I want to do is replace all commas up to and including the double comma with semi-colons. Everything after the double-comma (or double semi-colon after replace) should be separated by commas still.

I'm trying to use the column select and then find and replace... but it seems to only want to replace all the commas, not just the ones in the selected columns.

Is there an easy way to do this?


Thanks,

Ryan

Mark Antoniou said...

Hi Ryan,

This is really easy to do with a regular expression, because there are recurring patterns in your text (e.g., LC* is always at the beginning, Migration Scenario is always in the same place, and so on). There are multiple solutions to your problem. I have chosen one that has some restrictions. If the format of your text were to change, we could come up with something more abstract/flexible.

Search for (regular expression mode): (LC\*),(.*),(Migration .*),,(.*)

Replace with: \1;\2;\3;;\4
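
For anyone who wants to test the pattern before running it on a real file, here is a small Python sketch that uses the semicolons asked for in the question. The second approach, which splits the row at the double comma, is my own alternative and also copes with rows that start with something other than LC*.

```python
import re

row = ("LC*,00:24:E8:E3:96:AE,Migration Scenario,,"
       "IBM Lotus Notes 6.5.5 en-US,Microsoft Access 2010 1.0 en-US")

# Same search term as above, with semicolons in the replacement.
pattern = r"(LC\*),(.*),(Migration .*),,(.*)"
converted = re.sub(pattern, r"\1;\2;\3;;\4", row)

# Alternative: split once at the double comma and only touch the head.
head, separator, tail = row.partition(",,")
converted_by_split = head.replace(",", ";") + ";;" + tail

print(converted)
```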

Mark Antoniou said...

Nathan, perhaps you could record a macro and assign it to a key combination. Then program the steps of running the macro for each file using Autohotkey. Then close the file using Autohotkey, which will move you to the next one.

Anonymous said...

Hmm, I tried it now, but when I use Ctrl+Tab or a personal hotkey combination, N++ cycles through only some of the open documents and not ALL of them.
Is there a correct way to jump to the next document in N++?

Mark Antoniou said...

That's why I would suggest closing the current document (Ctrl-W) instead.

Anonymous said...

I do not understand why Should I close the current document. I have to apply a series of macros in all documents open in notepad. But when i recorded the key combination ctrl + tab with Autohotkey the macros are applied only to certain documents and not all. This is because as you move from one document to another Notepad doesn't follow a linear path 1-2-3-4-5, but follows a discontinuous path as 1-3-4-6-2. Do you understand?

Mark Antoniou said...

Yes, I understand.

You should perform your macro (and do whatever else you want) to the current txt file. Then you should close it (save changes). Then, another txt file will become the current file. Repeat steps until there are no more files. This way, you don't have to worry about the linear vs discontinuous path problem.

Ryan Perry said...

Mark

Many thanks for your quick reply; I made one tweak to the expression (as it could be LC* or PC* at the start), but otherwise, this works perfectly. You've just saved me at least half an hour a day for the next 4 months! :-)

All the best

Ryan

Unknown said...

Hi,

I am trying to replace this

dump(a.xxxx) --> a.xxxx

in notepad ++. I tried like this in find

([dump(])(.*)([)]) and replace with \2

but what it all does is removing the character 'd' in dump. any suggestions

Mark Antoniou said...

A couple of hints:
1. Whatever text you want to discard does not need to be enclosed in parentheses.
2. You can make things a little more abstract by not using the word dump, but instead using the parentheses as sign posts of what you will keep vs what you will discard.

Search for (regular expression mode): .*\((.*)\)
Replace with: \1
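
To see why the original attempt removed only the letter d, it helps to run both patterns side by side. The following Python sketch does that. The variable names are mine.

```python
import re

text = "dump(a.xxxx)"

# The original attempt: [dump(] is a character class, so it matches only
# ONE character out of d, u, m, p or (, not the literal text "dump(".
attempt = re.sub(r"([dump(])(.*)([)])", r"\2", text)

# The suggested pattern: escape the literal parentheses with a backslash
# and capture only what sits between them.
fixed = re.sub(r".*\((.*)\)", r"\1", text)

print(attempt)
print(fixed)
```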

Unknown said...

Just wanted to say that your original post and all your subsequent comments have really been helpful. To the point that I've been able to apply it to my situation and work it out for myself - no mean feat.

Mark Antoniou said...

That's great, Simon. Even though the problems that people face seem very different, the underlying idea for each solution, more often than not, has some common elements. And you're right, the comments below the post have become quite a resource!

w_conti said...

Greetings Mark and pardon the intrusion

Is it possible to replace a string [---] with a progressive number?

Thank you.

Mark Antoniou said...

It is, although it is a bit messy. Refer to previous comments that mention the Column Editor, such as my comment from July 19, 2011, which go through the process in detail.

The basic idea is that you add numbers to the beginning of each line and then rearrange the text so that the numbers replace the text you want to discard, such as ---

Note that you cannot simply replace --- with an incrementing number using find + replace.

w_conti said...

To be a bit more specific:

I need to insert progressively numbered IDs into a text file. There are already Placeholders in the file, so that I would tend to work with search and replace.

Mark Antoniou said...

Show me 5-6 lines of text, and also show me what you would like it to look like at the end.

w_conti said...

---
"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed
---
Do eiusmod tempor incididunt ut labore et dolore magna aliqua.
---
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris
---
Nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor
---
In reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla
---
Pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa




1
"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed
2
Do eiusmod tempor incididunt ut labore et dolore magna aliqua.
3
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris
4
Nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor
5
In reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla
6
Pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa

Mark Antoniou said...

If the numbers (or IDs) are located somewhere else in the text file, that's not much good to you, as you need them here. This is how you would insert the numbers using Notepad++'s column editor.

Step 1: Put the occurrences of --- and the line of text below on the same line.

Search for (extended search mode): ---\r\n
Replace with: ---

which will give you this

---"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed
---Do eiusmod tempor incididunt ut labore et dolore magna aliqua.
---Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris
---Nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor
---In reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla
---Pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa

Step 2: Using the column editor mode, add numbers to the beginning of each line.

1---"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed
2---Do eiusmod tempor incididunt ut labore et dolore magna aliqua.
3---Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris
4---Nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor
5---In reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla
6---Pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa

Step 3: Put the line breaks back in.

Search for (extended search mode): ---
Replace with: \r\n

1
"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed
2
Do eiusmod tempor incididunt ut labore et dolore magna aliqua.
3
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris
4
Nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor
5
In reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla
6
Pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa

There you go.

w_conti said...

Thanks Mark, almost there.
The column editor puts a number wherever it finds a carriage return, and not only at the beginning of the lines starting with ---. So, with a full paragraph it works as expected, but if the paragraph contains CR LF, it puts a number there too.
[Nub here, please bear with me]

Mark Antoniou said...

Ok, so your text file is a bit more complicated than your example above. That will, of course, require a more complicated solution. Let's say that your text looks like this:

---
"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed
some other text that I don't want to have a number at the front
---
Do eiusmod tempor incididunt ut labore et dolore magna aliqua.
---
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris
more text that shouldn't have a number at the front
---
Nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor

Step 1: Let's get rid of those carriage returns
Search for (extended search mode): \r\n
Replace with: NEWLINEHERE

When you do that, everything will be on one really long line like this
---NEWLINEHERE"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sedNEWLINEHEREsome other text that I don't want to have a number at the frontNEWLINEHERE---NEWLINEHEREDo eiusmod tempor incididunt ut labore et dolore magna aliqua. NEWLINEHERE---NEWLINEHEREUt enim ad minim veniam, quis nostrud exercitation ullamco laborisNEWLINEHEREmore text that shouldn't have a number at the frontNEWLINEHERE---NEWLINEHERENisi ut aliquip ex ea commodo consequat. Duis aute irure dolor

Step 2: Put only the --- at the beginning of each line
Search for (extended search mode): ---
Replace with: \r\n

NEWLINEHERE"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sedNEWLINEHEREsome other text that I don't want to have a number at the frontNEWLINEHERE
NEWLINEHEREDo eiusmod tempor incididunt ut labore et dolore magna aliqua. NEWLINEHERE
NEWLINEHEREUt enim ad minim veniam, quis nostrud exercitation ullamco laborisNEWLINEHEREmore text that shouldn't have a number at the frontNEWLINEHERE
NEWLINEHERENisi ut aliquip ex ea commodo consequat. Duis aute irure dolor

Step 3: Using the column editor mode, add numbers to the beginning of each line.

1NEWLINEHERE"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sedNEWLINEHEREsome other text that I don't want to have a number at the frontNEWLINEHERE
2NEWLINEHEREDo eiusmod tempor incididunt ut labore et dolore magna aliqua. NEWLINEHERE
3NEWLINEHEREUt enim ad minim veniam, quis nostrud exercitation ullamco laborisNEWLINEHEREmore text that shouldn't have a number at the frontNEWLINEHERE
4NEWLINEHERENisi ut aliquip ex ea commodo consequat. Duis aute irure dolor

Step 4: Put the line breaks back in.

Search for (extended search mode): NEWLINEHERE
Replace with: \r\n

1
"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed
some other text that I don't want to have a number at the front

2
Do eiusmod tempor incididunt ut labore et dolore magna aliqua.

3
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris
more text that shouldn't have a number at the front

4
Nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor

*Optional step 5: If the double carriage return is annoying, you can get rid of it
Search for (extended search mode): \r\n\r\n
Replace with: \r\n

1
"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed
some other text that I don't want to have a number at the front
2
Do eiusmod tempor incididunt ut labore et dolore magna aliqua.
3
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris
more text that shouldn't have a number at the front
4
Nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor
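
Since find + replace cannot count, a script is the shortest route if one is available. The Python sketch below replaces every line that consists only of --- with the next number and leaves all other lines alone. It assumes the text uses plain \n line endings (which is what Python gives you when a file is read in text mode), and the sample text is shortened.

```python
import itertools
import re

# Shortened sample: the first paragraph spans two lines.
text = ("---\n"
        "Lorem ipsum dolor sit amet\n"
        "some other text that should not get a number\n"
        "---\n"
        "Do eiusmod tempor incididunt\n")

counter = itertools.count(1)  # yields 1, 2, 3, ...

# ^---$ with MULTILINE matches only lines that are exactly "---".
numbered = re.sub(r"^---$", lambda match: str(next(counter)),
                  text, flags=re.MULTILINE)

print(numbered)
```

Lines inside a paragraph never match the pattern, so they do not pick up a number.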

w_conti said...

MARK, thank you very much ツ

P.K. SENS said...

How to replace all new line in whole book by regular expression.

P.K. SENS said...

How to select "absdgfadgrWYCXB1542FC00" matter by regular expression.

Mark Antoniou said...

Hi P.K. Sens,

To replace all newlines

Search for (extended search mode): \r\n
Replace with: whatever you like

And to replace absdgfadgrWYCXB1542FC00, it really depends on what is around it. So, I can't give you an answer unless you show me a few lines.

Adam said...

I browsed through the comments and I was unable to find an answer to what I am looking to do so maybe you can be some help to me. I would like to change the IP address on each line listed in a Web Server log. So the original text would be:
EX:
192.168.5.140 - - rest of line
187.158.45.843 - - rest of line
241.248.142.2 - - rest of line

To something like:
192.168.5.555 - - rest of line
187.158.45.555 - - rest of line
241.248.142.555 - - rest of line

So the last set of digits are changed to a default. Any help would be great

Mark Antoniou said...

Search for (regular expression mode): (.*\.).*( - - .*)
Replace with: \1555\2
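
Here is the same replacement as a Python sketch, for testing on a few lines first. One difference is worth noting: Python would not read \1555 as bank 1 followed by 555, so the replacement below uses the explicit form \g<1>.

```python
import re

log_lines = [
    "192.168.5.140 - - rest of line",
    "241.248.142.2 - - rest of line",
]

# (.*\.) keeps everything up to the last dot before " - - ";
# the digits after that dot are discarded and replaced by 555.
pattern = re.compile(r"(.*\.).*( - - .*)")
masked = [pattern.sub(r"\g<1>555\2", line) for line in log_lines]

print(masked)
```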

Adam said...

Thanks for the quick response. It worked great.

Ryan Perry said...

Hi Mark

The regular expression you provided me works great for me, but a colleague who is in Europe is having problems, as they use a semi-colon for separating lists rather than a comma like we do in the UK.

Therefore, I'm trying to tweak (without much success) the expression so that it takes input like the following:
LC*;00:24:E8:E2:CF:9A;Migration Scenario;;Apple iTunes x64 10.6.3.25;Microsoft Access 2010 1.0 en-US;Ordbogen Ordbogsprogrammet 2.0.45561 da-DK;SAP Gui x64 7.20.9 en-US
LC*;D4:BE:D9:41:69:A8;Migration Scenario;;Apple iTunes x64 10.6.3.25
LC*;D4:BE:D9:41:C7:47;Migration Scenario;;Microsoft Access 2010 1.0 en-US;SAP Gui x64 7.20.9 en-US
PC*;00:1E:4F:45:32:19;Migration Scenario;;IBM Lotus Notes 6.5.5 en-US;Microsoft Access 2010 1.0 en-US;SAP Gui x64 7.20.9 en-US

(Note that there may be none, some or many entries after the double semi-colon)

And output this:
LC*;00:24:E8:E2:CF:9A;Migration Scenario;;Apple iTunes x64 10.6.3.25,Microsoft Access 2010 1.0 en-US,Ordbogen Ordbogsprogrammet 2.0.45561 da-DK,SAP Gui x64 7.20.9 en-US
LC*;D4:BE:D9:41:69:A8;Migration Scenario;;Apple iTunes x64 10.6.3.25
LC*;D4:BE:D9:41:C7:47;Migration Scenario;;Microsoft Access 2010 1.0 en-US,SAP Gui x64 7.20.9 en-US
PC*;00:1E:4F:45:32:19;Migration Scenario;;IBM Lotus Notes 6.5.5 en-US,Microsoft Access 2010 1.0 en-US,SAP Gui x64 7.20.9 en-US

Essentially, replacing any semi-colons after the double with a comma, while leaving everything before the double as it is. I think the tricky part is that the number of entries in the list after the double semi-colon can vary so much, and I'm having trouble getting the repetition working correctly.

Do you have any suggestions for this? The file that we are looking to run this on can be up to 80 lines long, so it's a big task to do manually :-)


Thanks in advance!

Ryan

Mark Antoniou said...

Ryan, this is getting a bit more complicated, so I have broken it down into three steps. You could do it quicker if you didn't use Notepad++.

Step 1: Use the double semi-colon to separate the semi-colons that you want to keep from those that you want to replace.

Search for (extended search mode): ;;
Replace with: \r\nNEWLINE

LC*;00:24:E8:E2:CF:9A;Migration Scenario
NEWLINEApple iTunes x64 10.6.3.25;Microsoft Access 2010 1.0 en-US;Ordbogen Ordbogsprogrammet 2.0.45561 da-DK;SAP Gui x64 7.20.9 en-US
LC*;D4:BE:D9:41:69:A8;Migration Scenario
NEWLINEApple iTunes x64 10.6.3.25
LC*;D4:BE:D9:41:C7:47;Migration Scenario
NEWLINEMicrosoft Access 2010 1.0 en-US;SAP Gui x64 7.20.9 en-US
PC*;00:1E:4F:45:32:19;Migration Scenario
NEWLINEIBM Lotus Notes 6.5.5 en-US;Microsoft Access 2010 1.0 en-US;SAP Gui x64 7.20.9 en-US

Step 2: Replace all of the semi-colons on the NEWLINE lines only

Search for (regexp mode): (NEWLINE.*);(.*)
Replace with: \1,\2

Note that you may need to press 'Replace All' several times as it cycles through the different lengths.

LC*;00:24:E8:E2:CF:9A;Migration Scenario
NEWLINEApple iTunes x64 10.6.3.25,Microsoft Access 2010 1.0 en-US,Ordbogen Ordbogsprogrammet 2.0.45561 da-DK,SAP Gui x64 7.20.9 en-US
LC*;D4:BE:D9:41:69:A8;Migration Scenario
NEWLINEApple iTunes x64 10.6.3.25
LC*;D4:BE:D9:41:C7:47;Migration Scenario
NEWLINEMicrosoft Access 2010 1.0 en-US,SAP Gui x64 7.20.9 en-US
PC*;00:1E:4F:45:32:19;Migration Scenario
NEWLINEIBM Lotus Notes 6.5.5 en-US,Microsoft Access 2010 1.0 en-US,SAP Gui x64 7.20.9 en-US

Step 3: Put it all back together

Search for (extended search mode): \r\nNEWLINE
Replace with: ;;

LC*;00:24:E8:E2:CF:9A;Migration Scenario;;Apple iTunes x64 10.6.3.25,Microsoft Access 2010 1.0 en-US,Ordbogen Ordbogsprogrammet 2.0.45561 da-DK,SAP Gui x64 7.20.9 en-US
LC*;D4:BE:D9:41:69:A8;Migration Scenario;;Apple iTunes x64 10.6.3.25
LC*;D4:BE:D9:41:C7:47;Migration Scenario;;Microsoft Access 2010 1.0 en-US,SAP Gui x64 7.20.9 en-US
PC*;00:1E:4F:45:32:19;Migration Scenario;;IBM Lotus Notes 6.5.5 en-US,Microsoft Access 2010 1.0 en-US,SAP Gui x64 7.20.9 en-US

This is not the most elegant solution, but without having seen your full text file, it is the best I can do and with the most restrictions built in so that it does not mess up your text. If you have a line with two occurrences of ;; then this will break the expression.
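
Outside Notepad++, the three steps collapse into a split at the first double semi-colon. The Python sketch below shows the idea. The function name fix_row is made up for illustration.

```python
def fix_row(row):
    """Turn the semi-colons after the first ';;' into commas."""
    # partition() splits at the FIRST ';;'; rows without one come back
    # unchanged because the separator and tail are then empty strings.
    head, separator, tail = row.partition(";;")
    return head + separator + tail.replace(";", ",")


rows = [
    "LC*;D4:BE:D9:41:69:A8;Migration Scenario;;Apple iTunes x64 10.6.3.25",
    "LC*;D4:BE:D9:41:C7:47;Migration Scenario;;"
    "Microsoft Access 2010 1.0 en-US;SAP Gui x64 7.20.9 en-US",
]

for row in rows:
    print(fix_row(row))
```

Any number of entries after the double semi-colon is handled in one pass, so there is no need to repeat the replacement.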

Ryan Perry said...

Hi Mark

That is perfect - even following the three steps takes significantly less time than trying to wade through 80 lines manually to do the same.

Thanks for your help once again!


Ryan

Unknown said...

Thank you! I'd been banging my head trying to figure out how the replace functionality worked with RegEx. You just saved me a ton of time.

Ryans_r said...

Hi All,

Mark, I need some help. I'd really appreciate if you could help me.

My original text is:

07|TR40032750~NAPMAD~CO||Calle Yecora 4|28022 Madrid, Spain|Madrid|||ESP|


I got the following applying:

07|TR40032750~NAPMAD~CO||Calle Yecora 4|28022 Madrid, Spain|Madrid|ESP

Find what: (^07\|)(.*)([|]$)
Replace with: \1\2

But the result should be like:

07|TR40032750~NAPMAD~CO||Calle Yecora 4|28022 Madrid, Spain|Madrid|||ESP


Thanks in advance,

Ryan

Mark Antoniou said...

I have assumed that you have more than one line to fix. I have also assumed that the vertical bar that you want to replace is always the final character on that line. Finally, I have assumed that there are other lines that do not have vertical bars to delete. So, let's pretend that this is what you've got, except that the cities and countries will be different.

07|TR40032750~NAPMAD~CO||Calle Yecora 4|28022 Madrid, Spain|Madrid|||ESP|
07|TR40032750~NAPMAD~CO||Calle Yecora 4|28022 Madrid, Spain|Madrid|||ESP this line should not be changed
07|TR40032750~NAPMAD~CO||Calle Yecora 4|28022 Madrid, Spain|Madrid|||ESP|
07|TR40032750~NAPMAD~CO||Calle Yecora 4|28022 Madrid, Spain|Madrid|||ESP|

So, what you want to do is remove the vertical bar character | that occurs before the line break. This is a little bit complicated in Notepad++ because we can't search for line breaks and regular expressions at the same time. Here is what to do.

Step 1: Put something unique at the end of each line between the vertical bar you want to delete and the line breaks
Find what (extended search mode): |\r\n
Replace with: |NEWLINE\r\n

07|TR40032750~NAPMAD~CO||Calle Yecora 4|28022 Madrid, Spain|Madrid|||ESP|NEWLINE
07|TR40032750~NAPMAD~CO||Calle Yecora 4|28022 Madrid, Spain|Madrid|||ESP this line should not be changed
07|TR40032750~NAPMAD~CO||Calle Yecora 4|28022 Madrid, Spain|Madrid|||ESP|NEWLINE
07|TR40032750~NAPMAD~CO||Calle Yecora 4|28022 Madrid, Spain|Madrid|||ESP|NEWLINE

Step 2: Get rid of only the correct vertical bars |
Find what (normal mode): |NEWLINE
Replace with: nothing, leave blank

07|TR40032750~NAPMAD~CO||Calle Yecora 4|28022 Madrid, Spain|Madrid|||ESP
07|TR40032750~NAPMAD~CO||Calle Yecora 4|28022 Madrid, Spain|Madrid|||ESP this line should not be changed
07|TR40032750~NAPMAD~CO||Calle Yecora 4|28022 Madrid, Spain|Madrid|||ESP
07|TR40032750~NAPMAD~CO||Calle Yecora 4|28022 Madrid, Spain|Madrid|||ESP
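
In a regex engine that can anchor to the end of a line, the two steps reduce to one pattern. Here is a Python sketch. The sample lines are shortened stand-ins for the real data.

```python
import re

# Shortened stand-ins for the real lines.
text = ("07|A||B|||ESP|\n"
        "07|A||B|||ESP this line should not be changed\n"
        "07|A||B|||ESP|\n")

# \| is a literal vertical bar; $ with MULTILINE anchors it to the end
# of each line, so bars in the middle of a line are left alone.
cleaned = re.sub(r"\|$", "", text, flags=re.MULTILINE)

print(cleaned)
```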

Unknown said...

Hi Mark and thank you for this post.
I read it but still no solution for the following ...

Given the text below

0064,20120101,01,35634,0
0064,20120101,02,39152,0
0064,20120101,03,43445,4
0064,20120101,04,47972,43
0064,20120101,05,47742,65
0064,20120101,06,40196,023232
0064,20120101,07,39241,0
0064,20120101,08,46363,1
0064,20120101,09,40425,0
0064,20120101,10,47491,0


how can I remove anything after the last comma (and the comma as well)?


thank you in advance

Mark Antoniou said...

This is actually a lot easier than it looks. If you had wanted to get rid of the second last comma, it would have been a different story.

Search for (regexp mode): (.*),.*
Replace with: \1
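
The reason this works is that .* is greedy: it grabs as much as it can, so the comma in the pattern ends up being the last comma on the line. A quick Python sketch to confirm it, with a non-regex equivalent for comparison:

```python
import re

line = "0064,20120101,06,40196,023232"

# Greedy (.*) runs up to the LAST comma; the comma and the rest are dropped.
trimmed = re.sub(r"(.*),.*", r"\1", line)

# Equivalent without a regular expression: split at the last comma.
trimmed_by_split = line.rpartition(",")[0]

print(trimmed)
```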

'Αντε γεια!

Unknown said...

Thanks a lot! (Ευχαριστώ ;-) )

Arvind said...

Thanks a lot!! Worked like a charm!! :D :D

Arvind said...

Thanks a ton!! Worked like a charm!!

Unknown said...

can you search regular word and replace with regex? it didn't work for me.

Unknown said...

Great blog Mark.

Here is my string I want to find/replace.

.ACTIVEID. 1238923948, 324908234

I want to find and replace the 1238923948 ID and leave the second ID as it is most current. Do you know of an easy way to achieve this with Notepad++

Mark Antoniou said...

@Damodar, you sure can. There are lots of examples in this very post.

@Michael, I have assumed that all lines begin with .ACTIVEID. and that you want to keep that, that you want to delete the first ID, and that the first ID is always followed by a comma. This is how to do it.

Search for (regexp): (.*ACTIVEID. ).*, (.*)
Replace with: \1\2
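
A quick way to check the expression on one line is the Python sketch below. The only change is that the dot after ACTIVEID is escaped as \. so that it matches a literal period; that is a small tightening of my own and not required for this example.

```python
import re

line = ".ACTIVEID. 1238923948, 324908234"

# Group 1 keeps the label, the first ID and its comma are dropped,
# and group 2 keeps the second (current) ID.
updated = re.sub(r"(.*ACTIVEID\. ).*, (.*)", r"\1\2", line)

print(updated)
```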

Unknown said...

Very nice blog:)

I would like to know how to extract names from a html using notepad++?

Santosh Kumar said...


hi Mark,

I want to do the below search and replace. Is this possible in N++.
If my file is as below, I want to find and replace the section (AAA, XXX, CCC) XXX may be a dynamic content which cannot be used to search. When I use reg exp (AAA).*(CCC), it is selecting entire file. Any solution for this ? Please guide me.

AAA
EEE
CCC

AAA
DDD
CCC

AAA
XXX
CCC

regards
Santosh

Santosh Kumar said...

hi Mark, In the above example, I have partial info reg XXX which is static.

(AAA).*(INFO).*(CCC)

Even this is selecting entire file.

Mark Antoniou said...

Yes, you can do it, but you would make life A LOT easier for yourself if you used a more powerful text editor than Notepad++ (I like Aquamacs if I'm using a Mac, or Emacs on Windows/Linux). Let's say that you have the following text, and you want to replace all of the 3-line instances that contain "info".

AAA
ZZinfoZZ
CCC

AAA
EEE
CCC

AAA
DDD
CCC

AAA
XXinfoXX
CCC

AAA
FFF
CCC

AAA
GGG
CCC

AAA
YYinfoYY
CCC

Search for (Regexp):
.*
.*info.*
.*

Note that Aquamacs/Emacs allows you to insert newline characters into your search term by pressing Ctrl-Q Ctrl-J. So, the above search term will select the lines above and below the line containing "info" (wherever it occurs in the line)

Replace with: nothing, leave blank

The result is this:

AAA
EEE
CCC

AAA
DDD
CCC

AAA
FFF
CCC

AAA
GGG
CCC
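
The multiline search also translates to Python, where \n can be written directly in the pattern. This is a sketch under two assumptions of mine: the text uses \n line endings, and the blank line after each removed block should go as well (that is what the trailing \n* is for).

```python
import re

text = ("AAA\nZZinfoZZ\nCCC\n\n"
        "AAA\nEEE\nCCC\n\n"
        "AAA\nXXinfoXX\nCCC\n")

# Three consecutive lines whose middle line contains "info", plus any
# newlines that follow the block. "." never matches a newline here,
# so each .* stays on its own line.
pattern = re.compile(r"^.*\n.*info.*\n.*\n*", flags=re.MULTILINE)
remaining = pattern.sub("", text)

print(remaining)
```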

Sergio said...

I have a text file full of "70,00" ... "180,00" ... "95,00" ... and I want to replace all of them with 70.00 ... 180.00 ... 95.00 and so on without commas. May you help me ?

Mark Antoniou said...

Unless I'm missing something, a simple Find + Replace will do the trick. Just search for commas and replace with periods.

Search for: ,
Replace with: .

Then Replace All

Sergio said...

Hi Mark ! Thanks by the help article !! I have a text with "70,00" .. "500,00" ... "60,00" ... and so on. I did the search (\".*)(\"), ok ! Now I want to replace by the same string without quotes 70,00 .. 500,00... 60,00 ... Could you help me ???

Sergio said...

Thanks Mark by you reply !!! I really appreciate. Sorry by the duplicate post.

Sergio said...

I made the question ...
Hi Mark ! Thanks by the help article !! I have a text with "70,00" .. "500,00" ... "60,00" ... and so on. I did the search (\".*)(\"), ok ! Now I want to replace by the same string without quotes 70.00 .. 500.00... 60.00 ... Could you help me ???
And you reply ... Unless if I'm missing something, a simple Find + Replace will do the trick. Just search for commas and replace with periods. >>> The problem is ... there is a lot of commas in the text, so I want to replace the search text by the same expression with a "." instead of ",". Ok ?

sawfoot said...

This is a nice guide, but wouldn't it be easier to just use the dmdxparse utility?

Anonymous said...

I have a trivia questions and answers file with over 206,000 lines in which I'm going to proof read for grammar, spelling and casing and manually correct any errors, in addition to fixing any errors in the question/answer format - answers are simply separated from a question by an asterisk. The document will contain information on pretty much everything and from my experience spell checkers and grammar checkers just won't be up to the job. The most repetitive task I'm doing at the moment is correcting casing, using find and replace in notepad++, so when I find a word (or words) I follow the same process;

1. CTRL C to copy selection
2. CTRL H to bring up the find and replace dialog box
3. CTRL V to paste the selection
4. Tab to Replace With field
5. CTRL V to paste the selection again
6. Manually correct the case
6a. Alt W to tick the 'Match whole word only' if the selection could affect other words.
7. Alt A to replace all

I thought about recording a macro, but steps 6 and 6a will be different for every word.

I've added the TextFX plugin as that has more change case features.

Is there a way of simplifying this process in notepad++?

Mark Antoniou said...

I'm having a bit of a tough time understanding exactly what you're doing without an example. It sounds like you have a lot of key presses which you could automate using something like AutoHotKey.

Steps 1-3 could be reduced to a single step: Select the text and press Ctrl-H and the word will automatically appear in the Find field.

Regexp could be used to help find the letter that needs to have its case changed, but it would depend on what comes before and after the word in question. I'd need to see about 2-3 lines of text to be able to tell you if there was a pattern in the text that we could use.

Unknown said...

Awesome, I didn't know you could use \1, \2, .., \n as group selectors for search and replace.

tarotbyparis said...

Is it possible to do a search for the nth incident of a term's appearing in the file? Example: the string "" appears 1000 times. I need to locate the 650th incidence of this string.

tarotbyparis said...

I had an xml tag in my prior example...and it didn't show up (of course) So let's just say the string that should be between those quotes is "user"

Mark Antoniou said...

No, you can't do that with regexp. In theory, you could create a search term that ignores the first 649 instances and acts on the 650th, but most text editors don't give you that many buffers.

Having said that, it is possible that there may be some other unique pattern of text that makes the 650th occurrence stand out from the others. That is what I would recommend trying to find.

If not, then you will need to program a loop with a counter in some programming language. But now we are leaving the realm of regexp.

tarotbyparis said...

((.*?)User){650} highlighted all the text up to and including the 650th incidence of "User" so at least I can FIND the string. (I had to change search mode to "regular expression" and add a check to ". matches newline", of course.)

Mark Antoniou said...

Hmm, you're close. How about searching for the 649th string in the first buffer, and then finding the 650th?

I'm thinking something like this (I'm on an iPad right now so can't test it in Notepad++):
((.*?)User){649}.*(User)

You could insert more parentheses if you wanted to keep the text before the 650th occurrence.

tarotbyparis said...

((.*?)User){649}.*(User) ended up highlighting all the records in the file rather than the 650th.

Mark Antoniou said...

Could you paste the lines containing occurrences 649, 650 and 651 of User? And let me know how many lines are between them.

tarotbyparis said...

I can't post the actual data. Give me a day here, and I'll come back with a good test file for us to play with. I think this should be do-able. The file is an XML file, and the lines between the incidences of "User" vary. That's the reason for the need for an "nth occurrence" procedure. I appreciate your interest and this blog has given me some terrific tools, so thanks for that, too, Mark.

Mark Antoniou said...

Ah, I was close with the first try, but it's just too hard if you don't have the text editor in front of you. Here you go:

Search for (regexp mode): ((.*?)user){649}.*?user

Also, if you want to keep (rather than discard) the first chunk of text up to and including the 649th occurrence of user, then place it in parentheses like this and recall the buffer in your replace term (i.e., \1):

(((.*?)user){649}.*?)user

Now, I'm not sure what you want to do with the 650th occurrence of user, but let's say that you wanted to change it to monkey, then you would make your replace term this:

\1monkey

Glad we sorted that one out.
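
For anyone who would rather script this, here is a rough Python sketch of the same search term with the count turned into a parameter. The function name replace_nth and the sample text are hypothetical; re.DOTALL stands in for the ". matches newline" checkbox.

```python
import re

def replace_nth(text, word, n, replacement):
    """Replace only the n-th occurrence of `word` in `text`."""
    w = re.escape(word)
    # Same shape as (((.*?)user){649}.*?)user, with n - 1 in place of 649.
    pattern = r"(((.*?)%s){%d}.*?)%s" % (w, n - 1, w)
    # Group 1 holds everything before the n-th occurrence, so keep it and
    # append the replacement; count=1 stops after that single match.
    return re.sub(pattern, lambda m: m.group(1) + replacement,
                  text, count=1, flags=re.DOTALL)

sample = "user a\nuser b\nuser c\nuser d"
print(replace_nth(sample, "user", 3, "monkey"))
```

The lazy .*? is what keeps the match from running past the occurrence you are after, which is the same reason the greedy .* version highlighted the whole file.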

tarotbyparis said...

Yes, Mark, those work. Often when importing XML files, an import-fail-report will specify which record did not make it (the 650th, for example) but the XML data isn't always numbered so this will give me a chance to open the file in Notepad++ and jump right to the failed record. BTW, I couldn't post test XML text for us because of eBlogger restrictions ("Your HTML cannot be accepted") so thanks for soldiering on without it.

Eric said...

Mark,

I have used [\t ]+$ to find one or more whitespaces at the end of line with the regular expression. For example, it will find:
1] 2013PG,Active,,"TM# 160(05)00-002-0 CRLF
and
2] Active,"181-09-0117 Site #1 CRLF
This was from a previously saved .CSV file and the encoding was set to UTF-8 (I don’t remember setting this, so it must have done this automatically when I opened the file).

The problem is that I copied and pasted data from another source into a new blank NPP document, and tried the above search of [\t ]+$ using regular expression and it cannot find any whitespace at the end of lines. I changed the view to Show All Symbols and some lines clearly end in whitespace. I saved the file to .CSV thinking that may be the cause. It did not seem to help. The original encoding was ANSI when I pasted the data, so I changed it to UTF-8 like my other file. This time it showed the whitespaces as “xA0” and when I used [\t ]+$ it found double occurrences of xA0 at the EOL, but not single or triple or quad, etc. occurrences.

AK1500FF
AK500C
AK500F
AK500FF
AK600C
AK600F

Any thoughts? Thanks.

Mark Antoniou said...

You could make life a lot easier for yourself by using a text editor that handles whitespace better than Notepad++. I'd recommend this one http://ftp.gnu.org/gnu/emacs/windows/

If you decide to go down this route, the regexp search term for whitespace is this: \s-

It will find whatever whitespace, with whatever encoding.
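
As background, xA0 is the code of the non-breaking space (U+00A0), and the class [\t ] only lists the tab and the ordinary space, so it cannot match that character. A minimal sketch in Python (my choice of language; the part numbers come from the comment above, the trailing characters are invented for the demonstration) of adding it to the class:

```python
import re

# Invented sample: one, two and zero non-breaking spaces, plus an ordinary
# space-and-tab ending, to compare the two character classes.
sample = "AK500C\xa0\nAK500F\xa0\xa0\nAK500FF \t\nAK600C\n"

# Original class: tab and ordinary space only.
plain = re.sub(r"[\t ]+$", "", sample, flags=re.MULTILINE)

# Extended class: tab, ordinary space and non-breaking space.
with_nbsp = re.sub(r"[\t \xa0]+$", "", sample, flags=re.MULTILINE)

print(repr(plain))
print(repr(with_nbsp))
```

In Notepad++ the analogous search term would presumably be [\t \xA0]+$, but treat that as an assumption to test on a copy of the file first.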

Unknown said...

Thanks very much, just what I needed. I had a list of database tables that I wanted to add a prefix to:

replace:
(.*)
with:
RENAME '\1' TO 'legacy_\1';

worked exactly as expected!

DESI said...

Hello,
I am using Notepad++. I have an XML file and I want to replace all the lines containing say Some Value with

Tried using Reg Expression such as find all the lines beginning with and ending with but some how not able to reach the solution.

Any advise will be highly appreciated.

-Santosh

DESI said...

Hello Mark,
I am using XML file and I am trying to replace lines starting with Some Value by


Any idea?

-Santosh

DESI said...

Hello Mark,
I am using XML file and I am trying to replace lines starting with <ABC>Some Value</ABC> by
<ABC/>

Any idea?

-Santosh

Mark Antoniou said...

Santosh, I don't understand what you want to do. Please give an example of what the line looks like, and what you want it to look like at the end.

philippeko said...

Also see the official Wiki documentation of Notepad++ @https://sourceforge.net/apps/mediawiki/notepad-plus/index.php?title=Searching_And_Replacing
with all options of the extended Search explained

Helpful post as a quick reference

BloggerMaxz said...

Hi, I am trying to use Notepad++ to change the people's names to a common name after Expression: xxxxxx

2013-10-25 12:10:33,isCompatible: Expression:koh ai ty
2013-10-25 12:10:33,isCompatible: Expression:simon smith
2013-10-25 12:10:33,isCompatible: Expression: tao jin yin
2013-10-25 12:10:33,isCompatible: Expression:jackie wu
2013-10-25 12:10:33,isCompatible: Expression:wang jie


How do I do that?

Mark Antoniou said...

That's a pretty straightforward example, Maxz. You want to keep everything up to and including Expression: and replace everything after it. So, we will use Expression: as a sign post like this:

Search for (regexp): (.*Expression:).*
Replace with: \1common name

which will give you this:

2013-10-25 12:10:33,isCompatible: Expression:common name
2013-10-25 12:10:33,isCompatible: Expression:common name
2013-10-25 12:10:33,isCompatible: Expression:common name
2013-10-25 12:10:33,isCompatible: Expression:common name
2013-10-25 12:10:33,isCompatible: Expression:common name
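
The same signpost idea can be tried outside Notepad++. This is a small Python sketch (Python is my assumption here, the log lines are from the comment above): group 1 keeps everything up to and including Expression: and the rest of the line is discarded.

```python
import re

log = (
    "2013-10-25 12:10:33,isCompatible: Expression:koh ai ty\n"
    "2013-10-25 12:10:33,isCompatible: Expression: tao jin yin\n"
    "2013-10-25 12:10:33,isCompatible: Expression:wang jie\n"
)

# (.*Expression:) is the part to keep; the trailing .* is the old name.
# By default "." stops at the line break, so each line is handled on its own.
result = re.sub(r"(.*Expression:).*", r"\1common name", log)
print(result)
```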

BloggerMaxz said...

tks Mark, i got it working with your example

appreciate it

rockmead76 said...

This is a great article, but I'm struggling for a solution to the following

I have a text file with usernames dotted in the file in various formats etc

Joe Bloggs
Joe_Bloggs
jbloggs
bloggsj

Is there a regular expression that could cope with this ?

Mark Antoniou said...

What do you want to do? Replace all the different Joe Bloggs with X, and replace all the different Jane Doe with Y?

The text around the names is important, so you need to include some, not just the names on their own (or is the text file just a list of names?).

rockmead76 said...

Sorry I should have been clearer, I wish to replace

Joe Bloggs
Joe_Bloggs
jbloggs
bloggsj

with user1

and jane doe etc with user2

Unfortunately these names could be anywhere in this log file and have differing strings before and after any one of the entries. There is no pattern to where the strings could be in the file.

Is this at all possible.

Mark Antoniou said...

There's not a lot to work with. I have a fairly ugly solution that requires specifying all of the variations of Joe Bloggs.

Search for (regexp mode): Joe Bloggs|Joe_Bloggs|jbloggs|bloggsj

Replace with: user1

Then hit Replace All
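
If there are many users, repeating that search by hand gets tedious. A rough Python sketch of the same alternation, driven by a lookup table, is below. The table name aliases, the function name anonymise and the Jane Doe spellings are all hypothetical; only the Joe Bloggs variants come from the comment above.

```python
import re

# Each replacement label maps to the spellings that should be replaced by it.
aliases = {
    "user1": ["Joe Bloggs", "Joe_Bloggs", "jbloggs", "bloggsj"],
    "user2": ["Jane Doe", "Jane_Doe", "jdoe", "doej"],  # invented variants
}

def anonymise(text):
    for label, names in aliases.items():
        # Builds Joe Bloggs|Joe_Bloggs|jbloggs|bloggsj, escaping each name
        # so that any special characters are treated literally.
        pattern = "|".join(re.escape(name) for name in names)
        text = re.sub(pattern, label, text)
    return text

print(anonymise("login by jbloggs; mail from Joe_Bloggs to jdoe"))
```

The match is case-sensitive, as it is in the search term above, so every capitalisation that occurs in the log has to be listed.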

Unknown said...

do you know how i can replace regular text "\r\n" with regular expression "\n"?

Note I am trying to format JSON text copied from console. so it includes "\r\n" instead of actual line breaks.

Thanks.

Mark Antoniou said...

You could just do it the quick and dirty way.

1. Find (Normal search mode): \r\n
Replace with: NEWLINE

2. Find (regexp mode): NEWLINE
Replace with: \n
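
In a script the placeholder step is not needed, because the literal four characters (backslash, r, backslash, n) can be written directly. A minimal Python sketch, with a made-up JSON fragment as input:

```python
# Made-up console output: the line breaks arrived as the literal characters
# backslash-r-backslash-n rather than as real line breaks.
raw = '{"msg": "line one\\r\\nline two\\r\\nline three"}'

# "\\r\\n" is the four literal characters; "\n" is a real line break.
fixed = raw.replace("\\r\\n", "\n")
print(fixed)
```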

Harsh said...

Hi,

I am struggling with one output i want to achieve.

Is it possible to select and remove those lines from a text file which contain a repeating keyword, say "xyz"?

for eg,

my name is xyz harry.
your name is abc charlie.
my hobbie is xyz football.
your hobbie is abc carrom.

(Now i want to remove the lines which contains "xyz")

to end in -

your name is abc charlie.
your hobbie is abc carrom.

Anonymous said...

Mark, I'm a beta tester for Franchise Hockey Manager from OOTP. The schedule I'm using is fouled up. I need the 2 teams switched but yet keep the dates intact. How would I approach this?
Here is an example:
20;1;1995;Toronto Maple Leafs;Los Angeles Kings
20;1;1995;Buffalo Sabres;New York Rangers
20;1;1995;St. Louis Blues;San Jose Sharks

Mark Antoniou said...

Harsh, this is pretty straightforward. Essentially, you look for occurrences of xyz and replace those lines with nothing. You also want to remove the blank lines that will be left behind. In Notepad++, you need to do this using two search expressions. There are numerous ways to do this. One way would be:

Search for (regexp mode): .*xyz.*
Replace with: nothing, leave blank

which will give you this

your name is abc charlie.

your hobbie is abc carrom.

And then, you can clean up the empty lines like this:
Search for (extended search mode): \r\n\r\n
Replace with: \r\n

which will give you this:
your name is abc charlie.
your hobbie is abc carrom.
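
Here is a small Python sketch of the two steps (Python is my choice for illustration, and the sample uses \n line endings rather than \r\n). It also shows a one-pass variant, which is my own addition rather than part of the answer above: it swallows the line break together with the line. The difference shows up when the very first line is deleted, because the two-step version then leaves an empty first line behind.

```python
import re

text = (
    "my name is xyz harry.\n"
    "your name is abc charlie.\n"
    "my hobbie is xyz football.\n"
    "your hobbie is abc carrom.\n"
)

# Two steps: blank the matching lines, then squeeze doubled line breaks.
two_step = re.sub(r".*xyz.*", "", text).replace("\n\n", "\n")

# One pass (variant): also consume the line break that ends the line.
one_pass = re.sub(r"^.*xyz.*\n?", "", text, flags=re.MULTILINE)

print(repr(two_step))
print(repr(one_pass))
```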

Mark Antoniou said...

Craig68, your problem seems to be a bit more complicated. The way that it will be handled will likely depend on the regexp engine of the text editor in question. Right now, I'm working on a Mac and am using Aquamacs, so I can't verify with 100% certainty that this will work in Notepad++ on Windows, but I'm 90% confident it should.

So, we start with this:

20;1;1995;Toronto Maple Leafs;Los Angeles Kings
20;1;1995;Buffalo Sabres;New York Rangers
20;1;1995;St. Louis Blues;San Jose Sharks

Search for (regexp mode): (.*);(.*);(.*)
Replace with: \1;\3;\2

which will give you this

20;1;1995;Los Angeles Kings;Toronto Maple Leafs
20;1;1995;New York Rangers;Buffalo Sabres
20;1;1995;San Jose Sharks;St. Louis Blues
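
To see why the date survives, note that each .* is greedy: the first group grabs as much as it can while still leaving two semicolons for the rest of the term, so it ends up holding the whole date. A quick Python sketch of the same search and replace (Python is my assumption; the schedule lines are the ones above):

```python
import re

schedule = (
    "20;1;1995;Toronto Maple Leafs;Los Angeles Kings\n"
    "20;1;1995;Buffalo Sabres;New York Rangers\n"
    "20;1;1995;St. Louis Blues;San Jose Sharks"
)

# Group 1 = date (20;1;1995), group 2 = first team, group 3 = second team.
swapped = re.sub(r"(.*);(.*);(.*)", r"\1;\3;\2", schedule)
print(swapped)
```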

Anonymous said...

Mark, I want you to bump that 90% to 100%. It worked. I cannot thank you enough! I tested it first in a copy schedule and nearly fell off my chair. We have 2 weeks to turn around our beta tests, and this saved my bacon! Thank you very much!

BTW, I was using Notepad++ v4.9.2 in case anyone else is reading this.

Mark Antoniou said...

Great. Glad it worked out.

Anonymous said...

Mark, I've got another schedule messed up. *arghhh* When I try to replace the numerals, the schedule numerals get deleted as well.

This:

12;10;2012 ; Norfolk 4 Worcester 2 5,031
12;10;2012 ; Lake Erie 2 Oklahoma City 1 12,011
12;10;2012 ; Connecticut 4 Bridgeport 6 ,14
12;10;2012 ; Rochester 6 Syracuse 5 SO ,59
12;10;2012 ; Providence 1 Manchester 3 10,665

I need this:

12;10;2012;Norfolk;Worcester
12;10;2012;Lake Erie;Oklahoma City
12;10;2012;Connecticut;Bridgeport
12;10;2012;Rochester;Syracuse
12;10;2012;Providence;Manchester


All I need is the date format and the teams. All the other numerals and extra spaces are not needed. OOTP has a way they want their csv files (day;month;year;home team; away team)

Thanks again!

Anonymous said...

I should clarify a sentence: "The numerals on the end get replaced but so do the date numerals."

Sorry if that confused you or anyone.

Mark Antoniou said...

Craig68, this was a hard one. The difficulty lies in identifying the repeating patterns in the text. Here you go:

Search for (regexp mode): (.*) ; (.*) [0-9] (.*) [0-9] .*
Replace with: \1;\2;\3
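
A short Python sketch of that search term follows (Python is my choice for illustration). One limitation worth knowing: [0-9] matches exactly one digit, so a line with a double-digit score is left untouched. The line with a score of 10 and the [0-9]+ variant are my own hypothetical additions to show this, not part of the schedule above.

```python
import re

pattern = r"(.*) ; (.*) [0-9] (.*) [0-9] .*"

rows = [
    "12;10;2012 ; Norfolk 4 Worcester 2 5,031",
    "12;10;2012 ; Rochester 6 Syracuse 5 SO ,59",
]
# Group 1 = date, group 2 = first team, group 3 = second team.
cleaned = [re.sub(pattern, r"\1;\2;\3", row) for row in rows]
print(cleaned)

# Hypothetical line with a two-digit score: the single-digit term does not
# match it, so the line comes back unchanged.
high_score = "12;10;2012 ; Norfolk 10 Worcester 2 5,031"
unchanged = re.sub(pattern, r"\1;\2;\3", high_score)

# Variant with [0-9]+ (one or more digits) in place of [0-9].
variant = r"(.*) ; (.*) [0-9]+ (.*) [0-9]+ .*"
fixed = re.sub(variant, r"\1;\2;\3", high_score)
print(unchanged, fixed, sep="\n")
```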

Anonymous said...

Thanks again, Mark. It worked for me!

Anonymous said...

You're gonna' ban me from here, lol.

Mark, how would I switch just the date string w/o tampering with the teams?

The day and month need reversed but the year is fine right where it's at.
This;
11;01;2012;Peoria Rivermen;Lake Erie Monsters
11;01;2012;Abbotsford Heat;Toronto Marlies
11;02;2012;San Antonio Rampage;Grand Rapids Griffins

To this:
01;11;2012;Peoria Rivermen;Lake Erie Monsters
01;11;2012;Abbotsford Heat;Toronto Marlies
02;11;2012;San Antonio Rampage;Grand Rapids Griffins

evdp said...

Hello! I have a problem with my data. I managed to create a das file, and now many of my data have no numbers but ***** instead. What is wrong?

Mark Antoniou said...

evdp, I'm not sure what you're having a problem with.

Mark Antoniou said...

Craig68,

Search for (regexp mode): (..;)(..;)(.*)
Replace with: \2\1\3
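
In case it helps to see it run, here is a Python sketch of that swap (Python is my assumption). Each ..; is exactly two characters followed by a semicolon, so it relies on the day and month always having two digits. The second pattern is a hypothetical variant of mine for dates with single digits; it is not part of the answer above.

```python
import re

line = "11;01;2012;Peoria Rivermen;Lake Erie Monsters"

# Group 1 = "11;", group 2 = "01;", group 3 = the rest of the line.
swapped = re.sub(r"(..;)(..;)(.*)", r"\2\1\3", line)
print(swapped)

# Hypothetical variant: one or more digits per field, anchored to line start.
short_date = "9;10;2009;Hamilton Bulldogs;Rockford IceHogs"
swapped_short = re.sub(r"^([0-9]+);([0-9]+);", r"\2;\1;", short_date,
                       flags=re.MULTILINE)
print(swapped_short)
```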

Anonymous said...

Thank you, again, Mark.

I decided to teach myself this stuff so I don't have to keep asking for help. :)

Anonymous said...

Mark, I tried playing around with different settings to rid the (;) at the end of the string without disturbing the other (;) semi-colons, but I ended up adding ;;; at the end.

This:
9;10;2009;Hamilton Bulldogs;Rockford IceHogs;

To this:
9;10;2009;Hamilton Bulldogs;Rockford IceHogs

This was the advice I got from Stack Overflow:
To teach you some regex...

First you can match/find digits with \d
Secondly, you can "anchor" the match, the $ means "the end of the string"
Finally, you want to specify 1 or more digits, so you add the + quantifier to the \d token I mentioned earlier to create \d+
If the numbers are not ALWAYS on the end, make it optional with * ('0 or more') \d*

Full regex: \d+$ or \d*$

regex '\d+$' or '[0-9]+$' (some regex engines like one and not the other) will match digits at the end of a line.

\d+$
\1;\2;\3

Was this wrong?

Mark Antoniou said...

If all you're trying to do is get rid of the semi colon from the end of the line, you don't need to worry about any of that. Just keep everything and remove the final semi colon.

Search for (regexp mode): (.*);
Replace with: \1
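
One caveat, sketched in Python below (my choice of language): (.*); removes the last semicolon on the line, wherever it is. On a line that has already lost its trailing semicolon, the last one is the separator between the teams. Anchoring the semicolon to the end of the line with $ avoids that; the anchored term is my suggested variant, not part of the answer above.

```python
import re

with_semi = "9;10;2009;Hamilton Bulldogs;Rockford IceHogs;"
already_clean = "9;10;2009;Hamilton Bulldogs;Rockford IceHogs"

# Greedy version: strips whichever semicolon comes last on the line.
a = re.sub(r"(.*);", r"\1", with_semi)
b = re.sub(r"(.*);", r"\1", already_clean)  # eats the team separator

# Anchored version: only touches a semicolon at the very end of a line.
c = re.sub(r";$", "", with_semi, flags=re.MULTILINE)
d = re.sub(r";$", "", already_clean, flags=re.MULTILINE)

print(a, b, c, d, sep="\n")
```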

Anonymous said...

Yup, it worked but the semi-colon between the teams disappeared too.

Anonymous said...

Wait! Scratch that. Some of the semis were already gone because i was doing it by hand. So when I used your codes, it did work. Sorry about that.

Anonymous said...

Mark, I have tried every combination to switch just the 1st and 3rd date (year & day; leaving the middle number alone) but what I came up with either added numbers or took away numbers. On one, the month and day switched but added digits to the year. Frustrating.

2011;9;3;Belfast Giants;Sheffield Steelers
2011;9;3;Nottingham Panthers;Cardiff Devils
2011;9;4;Cardiff Devils;Nottingham Panthers
2011;9;7;Nottingham Panthers;Coventry Blaze

Mark Antoniou said...

The thing is, you need to look at the text for patterns. Your text is actually formatted in the same way from one line to the next, and conveniently uses four semicolons to demarcate the different parts of each line = year;month;day;team 1;team 2. The regular expressions that I typically provide as solutions are the simplest and most advanced, but for the purposes of education, we could break down each line into five text bins that are separated by four semicolons, which would look like this

Search for (regexp mode): .*;.*;.*;.*;.*

Now, you want to keep all of the information. Your goal is only to swap the year with day. Regular expressions allow us to call back text from the search term by placing that text in banks using parentheses, like this

Search for (regexp mode): (.*);(.*);(.*);(.*);(.*)

This allows us to recall the text in the five banks in the replace term using backslash followed by the number of the bank. Following on from the search term above, year is in bank \1 and day is in bank \3, so the replace term would be

Replace with: \3;\2;\1;\4;\5

As you can see, I have moved bank \3, which contains the day, to the front of the line, then comes bank \2, i.e., the month, followed by bank \1, which is the year, and then come the team names.

If you are using Notepad++, make sure that . matches newline is unchecked.
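
The five banks can be tried out in a few lines of Python (my choice of language for illustration). Python's . does not match a line break unless asked to, which corresponds to leaving ". matches newline" unchecked.

```python
import re

schedule = (
    "2011;9;3;Belfast Giants;Sheffield Steelers\n"
    "2011;9;4;Cardiff Devils;Nottingham Panthers"
)

# Banks: \1 = year, \2 = month, \3 = day, \4 = team 1, \5 = team 2.
search = r"(.*);(.*);(.*);(.*);(.*)"

# Swap year and day; month and teams stay where they are.
result = re.sub(search, r"\3;\2;\1;\4;\5", schedule)
print(result)
```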

Anonymous said...

Mark, I recently bought Edit Pad Pro 7 so I can really get into this. Is this a good choice? Or Notepad++ is where I should start learning from?

Mark Antoniou said...

I don't think it really matters. If you've already dropped the cash on a commercial text editor, use that. A lot of people use Notepad++ because it's free and looks pretty. It actually handles regular expressions poorly in my opinion, so the user has to figure out various workarounds to make up for the limitations of the program. Most people who are very advanced regular expression users are full-time coders and programmers, so it is likely that they are using Linux, and probably use text editors like Vi or Emacs. This blog obviously isn't directed at them.

My personal favourite is Aquamacs on OSX, but I really don't use regexp so much these days, other than when replying here, of course.

Anonymous said...

Mark, in Notepad++, is it possible to work globally. Meaning: I have several tabs open with almost duplicate string patterns (seasons)?

Theoretically, could I make changes that affect ALL open tabs? So, if I use Regex to rid semi-colons, or whatever, that all open tabs will benefit?

Mark Antoniou said...

Yes you can. And I'm pretty sure you can do that in Edit Pad Pro, too. Most powerful text editors have this feature. There's normally an option "Find in all files" or "Find in all documents" etc. And then hit "Replace in all open documents" or something similar, and you're good to go.

Anonymous said...

Whoa, you're right. I did see this feature but didn't want to touch it until I knew what it was.

See? That's why you're the Jedi Master! :)

I will try this and report back. I'm sure you're thrilled to hear that. lol.

Anonymous said...

Mark, I think I know the answer to this but I need your opinion: if I have multiple tabs open, and they all need the same csv extension, is there a way to mass save all those files into the csv extension?

Mark Antoniou said...

Although this might seem off topic from regexp, one of the potential solutions lies in regexp.

Of course, it can be done. There is a Save All file command. It still requires individual extensions for each file to be specified, but it will save you quite a few mouse clicks.

If you don't want to do that, you could simply save the files as they are (as .txt or whatever) and then use a file renamer, such as Bulk Rename Utility, to change all of the .txt extensions to .csv, and you could do that using regexp.
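
As a rough idea of what such a regexp rename does, here is a Python sketch that works on file names only and touches nothing on disk. The function name to_csv_name and the file names are hypothetical.

```python
import re

def to_csv_name(filename):
    # \. is a literal dot and $ anchors the match to the end of the name,
    # so only a final .txt extension is replaced.
    return re.sub(r"\.txt$", ".csv", filename)

names = ["season_1995.txt", "season_1996.txt", "notes.txt.bak"]
renamed = [to_csv_name(name) for name in names]
print(renamed)
```

To rename real files, each old and new name could be passed to something like Python's os.rename, preferably on a copy of the folder first.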

Anonymous said...

Mark, I Dl'ed Bulk Rename Utility and it works like a charm. Thanks for the HU! I find it easier this way since it allows me to do entire folders.

Unknown said...

Hi Mark, you are wonderfully helpful... Can you help? I have the following lines:
LU1_312_1237 + LU2_312_1237 = 1;
LU1_312_1238 + LU2_312_1238 = 1;
LU1_312_1239 + LU2_312_1239 = 1;
LU1_312_1240 + LU2_312_1240 = 1;
LU1_312_1241 + LU2_312_1241 = 1;
LU1_312_1242 + LU2_312_1242 = 1;
LU1_312_1243 + LU2_312_1243 = 1;
= 1;
= 1;
= 1;
= 1;
= 1;
= 1;
= 1;
= 1;
= 1;
= 1;
= 1;
= 1;

I would like to delete all the lines that contain the " = 1;". This means:
LU1_312_1237 + LU2_312_1237 = 1;
LU1_312_1238 + LU2_312_1238 = 1;
LU1_312_1239 + LU2_312_1239 = 1;
LU1_312_1240 + LU2_312_1240 = 1;
LU1_312_1241 + LU2_312_1241 = 1;
LU1_312_1242 + LU2_312_1242 = 1;
LU1_312_1243 + LU2_312_1243 = 1;

Mark Antoniou said...

Thanks for the kind words, Fernando. Your problem looks complicated, but the solution is simple. As I'm sure you're aware, the problem with replacing all instances of = 1; is that it also occurs on the first 7 lines. So, you need to restrict the search to the lines containing only = 1;

The way I've chosen to do this is to use the newline character that precedes the = sign for lines 8-19. For lines 1-7, the = sign is preceded by a space, so this differentiates them into 2 groups.

Search for (extended search mode): \r\n= 1;
Replace with : nothing, leave blank

Then hit Replace All.
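
The same trick in a short Python sketch (Python is my assumption; the sample is a shortened version of the lines above with \r\n line endings). Because the search text starts with the line break, it can only match where = 1; sits at the very start of a line.

```python
text = (
    "LU1_312_1237 + LU2_312_1237 = 1;\r\n"
    "LU1_312_1238 + LU2_312_1238 = 1;\r\n"
    "= 1;\r\n"
    "= 1;\r\n"
)

# Remove each "= 1;" that directly follows a line break, together with
# that line break; the "= 1;" preceded by a space is not affected.
cleaned = text.replace("\r\n= 1;", "")
print(repr(cleaned))
```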
