...making Linux just a little more fun!
By Ben Okopnik
When Woomert Foonly answered the door in response to an insistent knocking, he found himself confronted by two refrigerator-sized (and shaped) men in dark coats who wore scowling expressions. He noted that they were both reaching into their coats, and his years of training in the martial arts and razor-sharp attention to detail resulted in an instant reaction.
- "Hello - you're obviously with the government, and you're here to help me, even if I didn't call you. May I see those IDs?... Ah. That agency. Do come in, gentlemen. Feel free to remove your professional scowls along with your coats, you won't need them. Pardon me while I call your superiors just to make sure everything is all right; I need to be sure of your credentials. Please have a seat."
Some moments later, he put down the phone.
- "Very well; everything seems right. How may I help you, or, more to the point, help your associates who have a programming problem? I realize that security is very tight these days, and your organization prefers face-to-face meetings in a secure environment, so I'm mystified as to your purpose here; I don't normally judge people by appearances, but you're clearly not programmers."
The men glanced at each other, got up without a word, and began a minute security survey of Woomert's living room - and Woomert himself - using a variety of expensive-looking tools. When they finished a few minutes later, they once again looked at each other, and nodded in unison. Then, each of them reached into the depths of their coat and extracted a rumpled-looking programmer apiece, both of whom they carefully placed in front of Woomert. The look-and-nod ritual was repeated, after which they each retired to the opposite corners of the room to lurk like very large shadows.
Woomert blinked.
- "Well. The requirements of security shall be served... no matter what it takes. Have a seat, gentlemen; I'll brew some tea."
A few minutes later, after introductions and hot tea - the names of the human cargo turned out to be Ceedee Tilde and Artie Effem - they got down to business. Artie turned out to be the spokesman for the pair.
- "Mr. Foonly, our big challenge these days is image processing. As you can imagine, we get a lot of surveillance data... well, it comes to us in a standardized format that contains quite a lot of information besides the image: the IP of the originating site, a comment field, position information, etc. The problem is, both of us are very familiar with text processing under Perl but have no idea how to approach extracting a set of non-text records - or, for that matter, how to avoid reading in a 200MB image file when all we need is the header info... I'll admit, we're rather stuck. Our resident C++ guru keeps trying to convince us that this can only be done in his language of choice - it wouldn't take him more than a week, or so he says, but we've heard that story before." After an enthusiastic nod of agreement from Ceedee he went on. "Anyway, we thought we'd consult you - there just has to be something you can do!"
Woomert nodded.
- "There is. One thing, though: since we're not dealing with actual classified data or anything that requires a clearance - I assume you've brought me a carefully-vetted specification sheet, yes? - I want my assistant, Frink Ooblick, to be in on the discussion. This is, in fact, similar to the kind of work he's been trying to do lately, so he should find it useful as well."
Frink was brought in and debugged by the pair Woomert had dubbed Strong and Silent, although "perl -d" [1] was nowhere in evidence. After introductions all around, he settled into his favorite easy chair from which he could see Woomert's screen.
- "All right, let's look at the spec sheet. Hmmm... the header is 1024 bytes; four bytes worth of raw IP address, a forty-byte comment field, latitude and longitude of top left and bottom right, each of the four measurements preceded by a length-definition character... well, that'll be enough for a start; you should be able to extrapolate from the solution for the above."
"What do you think, Frink? Any ideas on how to approach this one?"
Frink was already sitting back in his chair, eyes narrowed in thought.
- "Yes, actually - at least the first part. Since they're reading a binary file, ``read'' seems like the right answer. As for the second... well, ``substr'', maybe..."
- "Close, but not quite. ``read'' is correct: we want to get a fixed-length chunk of the file. However, ``substr'' isn't particularly great for non-text strings - and hopeless when we don't know what the field length is ahead of time, as is the case with the four lat/long measurements. However, we do have a much bigger gun... whoa, boys, calm down!" he added as Strong and Silent stepped out of their corners, "it's just a figure of speech!"
"Anyway," he continued, with a twinkle in his eye that hinted at the "slip" being not-so-accidental, "we have a better tool we can use for the job, one that's got plenty of pep and some to spare: ``unpack''. Here, try this:
    # Code fragment only; actually processing the retrieved data is left as an
    # exercise, etc. :)
    ...
    $A="file.img";open A or die "$A: $!";read A,$b,1024;@c=unpack "C4A40(A/A)4", $b
    ...

The moment of silence stretched until Ceedee cleared his throat.
- "Ah... Mr. Foonly... what the heck is that? I can understand the ``open'' function, even though it looks sort of odd... ``read'' looks reasonable too... but what's that ``unpack'' syntax? It looks as weird as snake suspenders."
Woomert glanced around. Artie was nodding in agreement, and even Frink looked slightly bewildered. He smiled and took another sip of tea.
- "Nothing to worry about, gentlemen; it's just an ``unpack'' template, a pattern which tells it how to handle the data. Here, I'll walk through it for you. First, though, let's expand this one-liner into something a bit more readable, maybe add a few comments:
    $A = "file.img";                  # Set $A equal to the file name
    open A or die "$A: $!";           # Open using the "new" syntax
    read A, $b, 1024;                 # Read 1kB from 'A' into $b
    @c = unpack "C4A40(A/A)4", $b;    # Unpack $b into @c per template

The new syntax of ``open'' (starting with Perl 5.6.0) allows us to "combine" the filehandle name and the filename, as I did in the first two lines; the name of the variable (without the '$' sigil) is used as the filehandle. If you take a look at ``perldoc -f pack'', it contains a longish list of template specifications, pretty much anything you might want for conversions; you can convert various types of data, move forward, back up, and in general dance a merry jig. The above was rather simple, really:

    C4        An unsigned "char" value repeated 4 times
    A40       An ASCII string 40 characters long
    (A/A)4    ASCII string preceded by a "length" argument which is
              itself a single ASCII character, repeated 4 times

The resulting output was assigned to @c, which now contains something like this:

    $c[0]    The first octet of the IP quad
    $c[1]    The second octet of the IP quad
    $c[2]    The third octet of the IP quad
    $c[3]    The fourth octet of the IP quad
    $c[4]    The comment field
    $c[5]    The latitude of the upper left corner
    $c[6]    The longitude of the upper left corner
    $c[7]    The latitude of the lower right corner
    $c[8]    The longitude of the lower right corner

Obviously, you can extend this process to your entire data layout. What do you think, gentlemen - does this fit your requirements?"
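For readers who would like to try Woomert's template without a surveillance archive of their own, here is a minimal sketch. It first writes a stand-in file and then reads back only the 1024-byte header. The file name ``sample.img'', the IP address, the comment and the coordinates are all invented for illustration; only the layout follows the spec sheet described in the story. The sketch uses the three-argument ``open'' with a lexical filehandle instead of the one-liner's one-argument form, since that form looks up a package variable and will not see one declared with ``my''. The grouping parentheses in the template should need Perl 5.8 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read only the first 1024 bytes of a file and unpack them with
# the template from the story. Returns a nine-element list.
sub read_header {
    my ($path) = @_;
    open my $in, '<:raw', $path or die "$path: $!";
    my $buf;
    my $got = read $in, $buf, 1024;
    close $in;
    die "$path: short header\n" unless defined($got) && $got == 1024;
    return unpack "C4 A40 (A/A)4", $buf;
}

# --- Build a hypothetical test file (all values are made up) ---
my $file   = "sample.img";
my @coords = ("39.2833", "-76.6167", "39.2000", "-76.5000");

# 4 raw IP bytes, then a 40-byte space-padded comment...
my $header = pack "C4 A40", 192, 168, 0, 1, "Test image";
# ...then each measurement preceded by a one-character length digit
# (this only works while every field is shorter than 10 characters).
$header .= length($_) . $_ for @coords;
# Pad the header out to the full 1024 bytes.
$header .= "\0" x (1024 - length $header);

open my $out, '>:raw', $file or die "$file: $!";
print {$out} $header, "\xFF" x 4096;    # stand-in for the image data
close $out or die "$file: $!";

# --- Read it back; the "image" itself is never loaded ---
my @c = read_header($file);
printf "IP:           %s\n", join ".", @c[0 .. 3];
printf "Comment:      %s\n", $c[4];
printf "Top left:     %s, %s\n", @c[5, 6];
printf "Bottom right: %s, %s\n", @c[7, 8];

unlink $file;
```

Because ``read'' is asked for exactly 1024 bytes, the size of the image that follows the header makes no difference to the memory used. The ``A40'' field comes back with its trailing padding spaces stripped, and each ``A/A'' pair yields only the measurement itself, not its length character.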
After the now-enthusiastic Artie and Ceedee had been bundled off by their hulking keepers and the place was once again as roomy as it had been before their arrival, Woomert opened a bottle of Hennessy's "Paradise" cognac and brought out a pair of small but fragrant cigars which proved to be top-grade Cohibas.
- "Well, Frink - that's another case solved; something that never fails to make me feel cheery and upbeat. As for you - hit those books, young man! - at least when we get done with this little treat. ``perldoc perlopentut'' will make a good introduction to the various ways to open a file, duplicate a filehandle, etc.; ``perldoc -f pack'' and ``perldoc -f unpack'' will explain those functions in detail. When you think you've got it, find a documented binary file format and write a parser that will pull out the data for examination. By this time tomorrow, you should be quite expert in the use of these tools..."
Ben is a Contributing Editor for Linux Gazette and a member of The Answer Gang.
Ben was born in Moscow, Russia in 1962. He became interested in electricity at age six--promptly demonstrating it by sticking a fork into a socket and starting a fire--and has been falling down technological mineshafts ever since. He has been working with computers since the Elder Days, when they had to be built by soldering parts onto printed circuit boards and programs had to fit into 4k of memory. He would gladly pay good money to any psychologist who can cure him of the resulting nightmares.
Ben's subsequent experiences include creating software in nearly a dozen languages, network and database maintenance during the approach of a hurricane, and writing articles for publications ranging from sailing magazines to technological journals. Having recently completed a seven-year Atlantic/Caribbean cruise under sail, he is currently docked in Baltimore, MD, where he works as a technical instructor for Sun Microsystems.
Ben has been working with Linux since 1997, and credits it with his complete loss of interest in waging nuclear warfare on parts of the Pacific Northwest.