
What's Rattling Around in my sed

Volume Number: 21 (2005)
Issue Number: 12
Column Tag: Programming

Mac In The Shell


by Edward Marczak

Automating edits.

"OS X is Unix" - a statement that has been under a bit of debate since the OS shipped. It does look like a duck and does smell like a duck, but it doesn't really quack like a duck. While many, many projects will compile and run unmodified under OS X, there are enough that do not, thanks to the acrobatics that OS X engages in underneath it all. That's why we all have to delight in any project that comes pre-installed for us - no compile necessary, we know it'll be on every OS X box we touch, and it runs identically to the Solaris/IRIX/Linux/HP versions. There are two indispensable utilities that fall into this category: sed and awk. These will consume us for the next few columns. With sed ("the stream editor") being the simpler of the two, that's where we'll start.

Cuckoo for Cocoa Puffs!

OK - it's not that kind of cereal! sed is a "serial editor," as in it will repeatedly make changes to an input stream in series. Not all that long ago, I was amongst a group of like-minded Mac techs when one person asked how he could take a single file and alter it many times over for specific destinations. I immediately offered sed as a solution. "What?" "sed," I repeated. Blank stares. sed and awk are too useful not to use!

sed takes input, via a file name or standard input, processes it, and pumps the results out via standard output. However, instead of making edits interactively, like you might with vi or a word processor, sed allows you to script your edits, almost as if you're creating a macro. This script can then be applied to many (or any) files. Just like your editor has commands that will insert a line, delete a line and so on, so does sed. Even more impressive is sed's ability to transform a file or stream through substitution and regular expressions. And there's the rub: we can't talk about sed without talking about regular expressions (RE).

Lucky Charms

Regular expressions are fundamental to the Unix landscape. Understanding them makes your computing life better and easier on many levels. Even many GUI editors have support for RE built in; notable examples are BBEdit and Dreamweaver. There are patterns that you can describe with RE that just can't be described cleanly any other way. A full discourse on regular expressions is its own article or even book. Fortunately, there's plenty of material out there. The February 2005 issue of MacTech ran "Matchmaking With Regular Expressions" by Paul Ammann, so if you've been a subscriber, or have that issue, you should start there. O'Reilly has an entire book on RE alone called "Mastering Regular Expressions." If you've already mastered RE, please skip ahead to the next section. Otherwise, I'm ready to present the world's shortest intro and tutorial on regular expressions.

Regular expressions are one of those things that can be a little daunting at first, because the syntax is a little out of the ordinary. However, once you dive in, you'll start seeing problems everywhere that they fit perfectly. I will admit that I'm going to gloss over some of the finer details regarding RE.

A regular expression is just that: an expression. It isn't meant to be taken literally, character for character, because most patterns involve metacharacters. We all intuitively understand the shell's wildcard metacharacter, "*". If we were to type "ls -l *" into our shell, we'd expect a listing of all files, as we're not literally looking for a file named "*". Similarly, sed is heavily dependent on metacharacters in RE. Also, as an expression, it is something that the interpreter evaluates, much like a mathematical expression. Since understanding how the interpreter evaluates each metacharacter is fundamental to understanding the concepts covered in this piece, that's where we'll start.

Non-metacharacters are matched verbatim. If you ask sed (or grep) to match "Cat", it will match a capital "C" immediately followed by a lower case "a", immediately followed by a "t". No other permutation will match. Each character is its own RE that matches that single character. To match any single character, use the "." (dot) meta. So, a regexp of "ca." would match "cat" or "cab", but not "rat". To match a wider swath of "any" characters, you'll need the "*" (asterisk). The asterisk behaves a little differently than you may think: it modifies the previous character, and matches zero or more occurrences of it. Again: it modifies the previous character. If you are looking for "car", a RE of "c*r" will not do what you expect. Yes, it will match "car", but it will also match any word with an "r" in it. Huh? "c*r" says, "match zero or more of the letter c, and then an r." That reinforces the second point: zero occurrences still count as a match. To find words that begin with "c" and end with "r", you can use "c.*r", which will match "cr" (if that were a word), "car" and "cur", but also "choir", "czar", "carpetmonger", "calcaneoplantar"...and much more, as we'll see. This says, "match the letter 'c', then zero or more of any character, and then an 'r'".

Well, if you actually ran the above on a dictionary file, you'd notice that you'd get a lot more than expected. How does "c.*r" match "buckaroo"? Because nothing ties the pattern to the start or end of the word: "buckaroo" contains a "c" followed later by an "r". We need to learn two more metacharacters: beginning-of-line "^" (circumflex) and end-of-line "$" (dollar sign). So, to truly get words that begin with "c" and end with "r", you'd use a RE of "^c.*r$". That says, "at the beginning of the line, look for a 'c', then match zero or more of any character, and finally match an 'r' at the end of the line."
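To see the difference between these three patterns for yourself, here is a small sketch. The word list is made up for illustration (a stand-in for a real dictionary file), and it borrows sed's -n option and p command, which together print only the lines that match; we haven't covered those yet, so treat them as a preview.

```shell
# A made-up word list, standing in for a dictionary file
words='car
cur
choir
buckaroo
rat'

# "c*r": zero or more c's, then an r - matches any word containing an r
star=$(printf '%s\n' "$words" | sed -n '/c*r/p')

# "c.*r": a c, then anything, then an r, anywhere on the line
unanchored=$(printf '%s\n' "$words" | sed -n '/c.*r/p')

# "^c.*r$": the line must start with c and end with r
anchored=$(printf '%s\n' "$words" | sed -n '/^c.*r$/p')

printf 'c*r:\n%s\n\nc.*r:\n%s\n\n^c.*r$:\n%s\n' "$star" "$unanchored" "$anchored"
```

The first pattern lets "rat" through, the second still catches "buckaroo", and only the anchored version narrows the list to car, cur and choir.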

Cheerios

Let's jump right in with an example, shall we? Probably the sed command (or, mnemonic) used most often is substitute - "s":

sed -e 's/Mike/Michael/' letter.txt

This tells sed to run through the file letter.txt and replace 'Mike' with 'Michael'. Note the use of the forward slash as a delimiter. However, if you read the preceding section, you may have come to expect that it's never quite that simple. The substitute command only makes the substitution for the first occurrence on each line, unless you add the global flag, 'g'. So the command should be:

sed -e 's/Mike/Michael/g' letter.txt
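Here is a quick sketch of the difference the g flag makes, fed a single made-up line through a pipe instead of letter.txt:

```shell
line='Mike called. Mike left a message.'

# Without g: only the first "Mike" on the line changes
first_only=$(printf '%s\n' "$line" | sed -e 's/Mike/Michael/')

# With g: every "Mike" on the line changes
every=$(printf '%s\n' "$line" | sed -e 's/Mike/Michael/g')

printf '%s\n%s\n' "$first_only" "$every"
```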

That'll get all of them. You can even make multiple changes in a single pass:

sed -e 's/Dot/Dorothy/g' -e 's/Chris/Christopher/g' personnel.txt
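As a sketch (the personnel entry below is hypothetical), multiple -e options can also be written as one script argument with the commands separated by semicolons; both forms apply the edits to each line in the order given:

```shell
# A hypothetical line from personnel.txt
entry='Dot and Chris report to Dot.'

# Two -e options: both edits are applied to each line, in order
two_e=$(printf '%s\n' "$entry" | sed -e 's/Dot/Dorothy/g' -e 's/Chris/Christopher/g')

# The same two edits in a single script argument, separated by a semicolon
one_arg=$(printf '%s\n' "$entry" | sed 's/Dot/Dorothy/g;s/Chris/Christopher/g')

printf '%s\n%s\n' "$two_e" "$one_arg"
```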

This gives us an opportunity to talk about how sed applies edits, which is critical to understanding what's going on.

sed works with a line that it brings into its pattern space. It then applies all edits that you've asked for to this line, moves the updated line to stdout (if appropriate), and then brings the next line into the pattern space. To really understand this, a demonstration is in order. The following text is in a file called 'short_story.txt':

Bill and Michael went to the store.  Bill needed to buy
some butter, eggs and flour.  He and Michael were in a hurry
to bake a cake for their parent's Anniversary.  Once they got
home, Bill and Michael realized that they forgot cake icing.

Of course, Michael, being the older brother, feels he should precede Bill in the story. Well, you have sed and that's an easy task, right? Wherever you see 'Michael', change it to 'Bill', and wherever you see 'Bill', change it to 'Michael':

$ sed -e 's/Bill/Michael/g' -e 's/Michael/Bill/g' short_story.txt 

But when you run the command, you get this output:

Bill and Bill went to the store.  Bill needed to buy
some butter, eggs and flour.  He and Bill were in a hurry
to bake a cake for their parent's Anniversary.  Once they got
home, Bill and Bill realized that they forgot cake icing.

What happened? Let's trace:

    1. sed brought the first line into pattern space.

    2. sed found the pattern 'Bill' and substituted 'Michael', making the line:

    "Michael and Michael went to the store. Michael needed to buy"

    3. sed then applied the second edit we asked for, making 'Michael' into 'Bill', resulting in the output we just saw.
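The trace above can be reproduced on the first line of the story. The sketch also shows one common workaround, which may or may not be the approach a later column takes: park one name under a temporary placeholder before doing the other substitution. The placeholder XXTMPXX is an arbitrary, hypothetical choice and assumes that string never appears in the text.

```shell
# First line of short_story.txt, as traced above
line='Bill and Michael went to the store.  Bill needed to buy'

# The original two edits: the second one undoes the first
broken=$(printf '%s\n' "$line" | sed -e 's/Bill/Michael/g' -e 's/Michael/Bill/g')

# Park "Bill" under a placeholder (assumed never to occur in the text),
# turn "Michael" into "Bill", then turn the placeholder into "Michael"
swapped=$(printf '%s\n' "$line" | sed -e 's/Bill/XXTMPXX/g' \
                                      -e 's/Michael/Bill/g' \
                                      -e 's/XXTMPXX/Michael/g')

printf '%s\n%s\n' "$broken" "$swapped"
```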

Be aware of that interaction. Later on (read: next month's column), I'll show how to deal with it. Simple substitution like the above is great for static files, like a checklist:

Hello -=firstname=-!  Welcome to Acme and Associates.  There are some things you'll need to know to get started with our network.  Please keep a record of these items:
Name: -=username=-
Phone: -=phonenum=-
IP address: -=ipaddr=-
Thanks!

You could write a sed script that alters this file appropriately before mailing it out. Of course, substitution gets much more interesting when combined with regular expressions. But first, we need to look at some of the other sed commands.
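As a minimal sketch of that idea, assuming an abbreviated copy of the template saved as welcome.txt and hypothetical values for one new hire, one substitution per placeholder does the job. The shell expands the variables inside the double-quoted sed commands before sed ever sees them.

```shell
# Abbreviated copy of the welcome-letter template
cat > welcome.txt <<'EOF'
Hello -=firstname=-!  Welcome to Acme and Associates.
Name: -=username=-
Phone: -=phonenum=-
IP address: -=ipaddr=-
Thanks!
EOF

# Hypothetical values for one new hire
firstname='Dorothy'
username='dgale'
phonenum='555-0142'
ipaddr='192.168.1.42'

# One -e per placeholder; the shell fills in the variables first
letter=$(sed -e "s/-=firstname=-/$firstname/g" \
             -e "s/-=username=-/$username/g" \
             -e "s/-=phonenum=-/$phonenum/g" \
             -e "s/-=ipaddr=-/$ipaddr/g" welcome.txt)

printf '%s\n' "$letter"
```

Values containing a slash, an ampersand or a backslash would need escaping first, since those characters mean something to the s command.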

Rice Krispies

As mentioned, sed is an editor. What good would an editor be if it couldn't add and delete lines? The simpler of the two is delete, "d". Delete erases the entire pattern space and moves straight on to the next line; since there's nothing left to work on, any remaining edits are skipped for the deleted line.

sed -e '1d' short_story.txt

This prints our short story minus the first line, which gets deleted. Of course, delete is even more powerful when combined with regular expressions. How about one that gets rid of the comments (lines beginning with "#") in a bash script:

sed -e '/^#/d' bashscript.sh
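One caveat worth knowing: "^#" also matches the "#!" line at the top of most scripts, and it misses comments that are indented. The sketch below, using a hypothetical demo.sh, compares the command above with a variation (not from the original column) that leaves "#!" lines alone and allows leading whitespace before the "#". It relies on the "!" modifier, which applies the commands in braces only to lines that do not match the address.

```shell
# Hypothetical script to clean up
cat > demo.sh <<'EOF'
#!/bin/bash
# say hello
    # indented comment
echo "hi"
EOF

# The command above: drops every line starting with '#', shebang included
plain=$(sed -e '/^#/d' demo.sh)

# Variation: skip lines starting with "#!", and on all other lines
# delete any line whose first non-blank character is '#'
careful=$(sed -e '/^#!/!{/^[[:space:]]*#/d;}' demo.sh)

printf '%s\n---\n%s\n' "$plain" "$careful"
```

Neither version touches a comment that trails real code on the same line; that needs a substitution rather than a delete.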

To create more complex sed interactions, we can put all of our commands in a file to make a sed script. By placing your edits into a sed script, you gain the advantage of making your routine reusable, and being able to build it up slowly - these scripts can get complex very quickly, and it's not often that you get it 100% right on the first shot. If you were to save the following into a file named "editscript.sed":

1d
s/Bill/William/g
s/cake/pie/g

...you'd be able to run it against our short story, using the "-f" flag, and see each edit:

$ sed -f editscript.sed short_story.txt 
some butter, eggs and flour.  He and Michael were in a hurry
to bake a pie for their parent's Anniversary.  Once they got
home, William and Michael realized that they forgot pie icing.
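To try this yourself, the sketch below recreates both files. It assumes short_story.txt uses the same four-line layout the output above implies, and it checks that sed -f reproduces exactly those three lines.

```shell
# short_story.txt, broken into lines as the output above implies
cat > short_story.txt <<'EOF'
Bill and Michael went to the store.  Bill needed to buy
some butter, eggs and flour.  He and Michael were in a hurry
to bake a cake for their parent's Anniversary.  Once they got
home, Bill and Michael realized that they forgot cake icing.
EOF

# The three-line sed script from the article
cat > editscript.sed <<'EOF'
1d
s/Bill/William/g
s/cake/pie/g
EOF

output=$(sed -f editscript.sed short_story.txt)
printf '%s\n' "$output"
```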

First line deleted, "Bill" becomes "William" and "cake" becomes "pie". Using a sed script also brings some other advantages in terms of opening up other commands for our use. First and foremost, as opposed to delete, we can insert. Insert is one of the odd sed commands in that it expects its input to be broken up over more than one line. If we wanted to insert a title at the top of our story, we could use this:

1i\
The Wedding Cake

With insert, you must follow the command with a backslash and put the text to be inserted on the next line.
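A sketch of the insert in action, assuming the four-line short_story.txt layout and a hypothetical script name, title.sed. A quoted here-document ('EOF') keeps the shell from touching the backslash, so the line break after "1i\" survives intact.

```shell
cat > short_story.txt <<'EOF'
Bill and Michael went to the store.  Bill needed to buy
some butter, eggs and flour.  He and Michael were in a hurry
to bake a cake for their parent's Anniversary.  Once they got
home, Bill and Michael realized that they forgot cake icing.
EOF

# Insert command: backslash after "1i", text on the following line
cat > title.sed <<'EOF'
1i\
The Wedding Cake
EOF

titled=$(sed -f title.sed short_story.txt)
printf '%s\n' "$titled"
```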

Also, within a sed script, we can use functions. Functions in sed? Well, they're not like traditional functions (strictly speaking, this is a group of commands enclosed in braces), but we can opt for a series of edits to take place on a subset of the document we're editing:

/^    Bill/ {
        s/Bill/William/
        s/    / /
}

(Note here that the second edit is "s slash space space space space slash tab slash": the gap in the replacement is a literal tab character. Thank you for your patience.)

This tells sed, "only on lines that start with four spaces immediately followed by 'Bill' will you make the following edits: change Bill to William and then change four spaces to a tab character." Think of all the conditional ways you can program edits with this feature! (OK, did I get too excited over that?)
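Since a tab is invisible in print, this sketch builds one with printf and lets an unquoted here-document drop it into the script. The input lines and the file name indent.sed are hypothetical; only the first line starts with four spaces followed by "Bill", so only it is edited.

```shell
# Build a literal tab character
tab=$(printf '\t')

# The brace group from above; unquoted EOF so the shell substitutes $tab
cat > indent.sed <<EOF
/^    Bill/ {
        s/Bill/William/
        s/    /$tab/
}
EOF

# Hypothetical input: one indented line, one unindented line
result=$(printf '    Bill went to the store.\nBill stayed home.\n' | sed -f indent.sed)
printf '%s\n' "$result"
```

The second line passes through untouched: it contains "Bill", but not at the start of the line after four spaces, so neither edit inside the braces runs.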

Again, the power of regular expressions increases sed's value exponentially. This is especially effective when dealing with any kind of mark-up. And to make this even more powerful, we need to formally introduce addressing. Most of our substitutions so far have not had an address, which makes sed apply them to every line. Another possibility is to supply one address, as the 1d and /^#/d examples quietly did. An address can come in the form of a line number or a pattern. If we only wanted to make substitutions on lines that end with a period, we could use something like this:

sed -e '/\.$/s/\./!/g' short_story.txt

(The address is the /\.$/ in front of the s command.)

This tells sed, "only on lines that end with a period ("\." - we have to escape the period, otherwise it'll match any single character), substitute an exclamation point for every period on the line." OK, it's a completely contrived example, but it should convey the power this brings. I used a not-so-contrived example the day I finished this article. I was working on a mailing for a client (not a spammer) that needed to pull its customers' e-mail addresses from a Crystal Reports CSV file. CR writes crummy CSV, enclosing fields in quotes only if they have an embedded comma. While that's valid, the software this client was using for mailing didn't like that format too much. sed to the rescue! Here's what I did:

$ sed '/^\"/s/,/ /1' clients.csv > clients2.csv

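This told sed, "only on lines that begin with a quote, substitute a space for the first comma that you find." (The backslash before the quote isn't strictly necessary inside single quotes, and the trailing 1, meaning "first occurrence," is sed's default anyway.) This output was then redirected to another file, which we had the mailing software (happily) read in.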
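Here is a sketch of the same fix on two made-up records. Like the original command, it assumes that when a line begins with a quote, the first comma on that line is the one embedded in the quoted field.

```shell
# Hypothetical export: a field is quoted only when it holds a comma
cat > clients.csv <<'EOF'
"Smith, Jane",jane@example.com
Acme Corp,info@example.com
EOF

# Lines starting with a quote: replace the first comma (the one inside
# the quoted name) with a space. Other lines pass through untouched.
sed '/^"/s/,/ /1' clients.csv > clients2.csv
cat clients2.csv
```

The comma is replaced by a space, so "Smith, Jane" becomes "Smith  Jane" with two spaces; a quoted field holding more than one comma would keep all but the first.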

Don't feel confuSED

Just the sed presented in this month's column will bring you incredible power on the command line, whether you're editing files in batch or under the control of a shell script. Believe it or not, there's more! If this was your first exposure to sed, practice, practice, practice! Get a test file (or two or three) and see what sed does when you issue various commands. Next month, I'll cover a little more on the quirky interactions, other commands and more advanced usage.

While I normally say, 'see you next month,' and I mean 'in print,' I really do hope to see everyone next month in person: at MacWorld! I'll be there all week, and presenting the "From the Chime to the Desktop" session on Wednesday in the IT track conference. Please say hello if you'll be in San Francisco! Of course, I'll still see everyone in print. Until then, let sed rip through some files for you!


Ed Marczak owns and operates Radiotope, a technology consulting company. Ed will be speaking and hanging out at MacWorld SF 2006 all week - hope to see you there! He also deposits tech thoughts on-line at http://www.radiotope.com

 

All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved.