Planet SpiderLogic

July 23, 2008

Maneesh Chaturvedi

Kickstarting your day and honing your development skills

As per various studies conducted, it has been shown that developers tend to take at least an hour or more to get into the zone, where they are most productive. There are various ways which have been suggested like writing unit tests at the start if the day,etc. to get into the zone. Being a pragmatic programmer, i have found something which has helped me kick-start my day and get into the zone in the minimum time possible. The following recipe has done wonders for me.
At the start of the day, i take up a puzzle and spend 5-10 minutes in analyzing and solving it. Be aware the intent is not to come to a solution, but just to get the gray cells kicking. After this i normally try some programming problem, albeit a simple one using a new language which i am trying to acquire. I spend another 20-30 minutes in solving the problem. The problem could be as simple as trying to find a repeated integer in a list of otherwise unique sequential numbers. Getting these two activities at the start of the day has two-fold advantages. One it focuses the brain and secondly it helps me acquire a new skill. I have even tried using this after prolonged meetings just to get back into the zone. However a word of caution, the intent is not to get to perfection, but to time-box the effort. You can get back to the problems again the next day if you are stuck on something.

by maneesh chaturvedi (noreply@blogger.com) at July 23, 2008 05:00 AM

July 22, 2008

Grant Rettke

Larceny Scheme

Larceny is a simple and efficient implementation of the Scheme programming language. Created originally as a test vehicle for research on garbage collection and compiler optimizations, Larceny has grown into a major multiplatform system, and is currently the only implementation that supports all four de facto standards for Scheme: IEEE/ANSI, R5RS, ERR5RS, and the R6RS.

by Grant at July 22, 2008 12:25 PM

July 21, 2008

Dan Tanner

what, me worry about good stack traces?

Those of us fortunate enough to get to build an application inside sharepoint are often confronted with this error message:


Pretty useful, huh?

What you quickly learn is that the real error is usually found in the shared log, which is usually more descriptive. But every now and then you get a real treat like this as your only source of information:

05/12/2008 14:29:32.84 w3wp.exe (0x03A4) 0x0588 Windows SharePoint Services General 8dzz High Exception Type: System.NullReferenceException Exception Message: Object reference not set to an instance of an object.


Those of us familiar with java development are used to seeing worthless NullPointerExceptions like this, except that further down the stack trace is the clue to what's missing. In this case however, you don't get that clue.

Good thing it's so easy to write functional automated tests in sharepoint :)

My wise colleague Jon Jones pointed me to a roundup of great logging and debugging tips by Andrew Connel here.


On a positive note, my talented colleague Brian Kapellusch wrote a nice wrapper around Spring.NET and NHibernate, which our project is using as the container for our business logic and data access. Craptacular? Yes. Needed? Yes. At least until MS finally gets around to producing a production quality MVC framework, which is probably at least a year off. Without it, I'd have a lot tougher time isolating and testing things like this.

by Dan (noreply@blogger.com) at July 21, 2008 12:01 AM

July 20, 2008

Dan Tanner

Netflix and the case of the missing download selection


So I got a Roku box for Netflix a few weeks back. I was excited, because I feel it's a great next step in the evolution of media deliver. We took a step back for a while there - long ago we had to actually get off the couch to change the channel. Then we got remotes. Thirty years later people were running up to their office computer to change the stream. Definitely not good - especially if we're gonna keep up our American Image.

The good news is that the Roku box and the Watch It Now service works great for the movies they offer. The bad news is that I've watched all 5 of the decent movies they offer. For real. 95% of the movies they offer for their Watch It Now service are either:
a. really old
b. really bad
c. all of the above

Think I'm exaggerating? As an example, the #4 most popular title last week in the Watch It Now collection is "Blushing Bloopers". You can't even *find* that title in IMDB. Other Netflix customers rated it 2.1. Netflix's estimated rating for me was 1.9. Yes, I have exquisite taste; only movies like Slap Shot get a 5 star rating from me. Space Jam was number 10 on the most popular list last week. I did see Dexter in there, which is a must-see IMO for anyone that hasn't seen it.

After my own reaction, I figured a google search would uncover the reason for the skimpy selections, or at least show a thousand other similar opinions, but so far there aren't many people complaining about it at all. For something that's definitely the future, it seems like Netflix is being too tentative in their content offerings. Great - they've solved the delivery problem for another huge potential chunk of the consumer population. But if they're going to succeed, they've gotta offer content worth delivering. I'm sure it's not completely their fault (another case of license extortion?), but they need to solve that problem before they'll get beyond a beta stage of this solution.

So in all, I like my Roku box, but I hope that soon enough I'll be able to waste time watching good TV instead of blogging about bad TV. I'm a busy guy, and I have things to do.

by Dan (noreply@blogger.com) at July 20, 2008 11:57 PM

Grant Rettke

Setting the memory limit in DrScheme

When you use DrScheme, you should be sure to set the memory limit by going to the menu-item:

Scheme->Limit Memory

Doing so allows DrScheme to “play nice” with the operating system when you write some code that eats up all of the free memory. Rather than taking the whole operating system down; DrScheme dies gracefully.

by Grant at July 20, 2008 06:21 PM

Logging support in PLT

Language level logging support has been added to to PLT Scheme.

Why not in a library?

so that the run-time system can report information through logging.

by Grant at July 20, 2008 01:32 AM

July 18, 2008

Geoff Lane

Member of the orignal 416 - Pradipta’s Rolodex

The 416 is an elite group of software developers (alright, some of them weren’t software developers, and some of them were fresh out of school) brought together by Pradipita’s mistaken use of CC. Who would have known what would come of such an innocent message: I have a couple of Ruby on Rails position, wanted to [...]

by Geoff Lane at July 18, 2008 05:54 PM

Grant Rettke

July 14, 2008

Grant Rettke

Maglev Ruby

Ruby is often compared to Smalltalk; and I’m sure a bunch of folks have always wondered when someone would implement Ruby either on top of Smalltalk (or even in a similar manner to Smalltalk, aka Rubinius).

Avi Bryant wondered as such, and seems to have gotten a job out of it in producing Maglev Ruby (it is a video).

Maglev seems to be the combination of a Ruby VM implemented along with a distributed, concurrent object system to support the needs of Ruby on Rails.

I heard Avi speak at OSCON 06, and he seems to be a nice fellow; I’ll be interested to see how this pans out.

Strangely, I haven’t heard Maglev mentioned by anyone I know, perhaps Ruby VMs aren’t interesting.

by Grant at July 14, 2008 01:09 AM

July 13, 2008

Grant Rettke

Computing History at Bell Labs

In 1997, on his retirement from Bell Labs, Doug McIlroy gave a fascinating talk about the “History of Computing at Bell Labs.”

Here is one man’s transcription of that talk.

I only skimmed over it (at the moment I’m not interested in what was); but nonetheless it does look like a great talk with a lot of fun and interesting bits in there.

by Grant at July 13, 2008 11:13 PM

match.ss Examples

Ben posted some examples of how to use the PLT match.ss library here.

by Grant at July 13, 2008 11:01 PM

Mosh Scheme

Mosh Scheme is

A Fast R6RS Scheme interpreter.

I saw Mosh referenced on the SRFI-98 email discussion archive.

There already a wiki written using Mosh.

Addendum 7/18/8:

Mosh has been updated to version 0.0.5.

by Grant at July 13, 2008 03:06 AM

SRFI 98: An interface to access environment variables

This announces the availability for discussion of

Scheme Request for Implementation 98

“An interface to access environment variables.”

by Taro Minowa(Higepon).

Its draft and an archive of the ongoing discussion is available at

http://srfi.schemers.org/srfi-98/

(via Usenet)

by Grant at July 13, 2008 02:43 AM

OLPC News Forum

The OLPC News Forum is the place to go for XO discussion.

by Grant at July 13, 2008 12:07 AM

SD Cards on the XO

Here is the page on the OLPC website that documents secure digital cards on the XO.

The XO supports SDHC (high capacity). I use a 16GB AData card on my XO without problem.

Here is a little reminder: when you insert the SD card into the XO, do so with the screen, and the shiny part of the card, both facing you.

BTW: I use a “Sandisk MicroMate for SDHC” USB adapter to work with the card under Windows XP, it cost 8USD from an Amazon affiliate.

by Grant at July 13, 2008 12:02 AM

July 12, 2008

Grant Rettke

The OLPC Wiki

The OLPC Wiki is the place to go to get information about XO.

by Grant at July 12, 2008 11:57 PM

July 11, 2008

Chris Peterson

Toga! Toga! Toga!

So one of my favorite Bars in Milwaukee and one of my favorite Bands in Milwaukee are getting together to throw a toga party Saturday. Papa's Social Club and Single Barrel are getting together along with yours truly working the door and making a guest appearance to play cow bell to bring you this party all for a small $5.00 fee at the door. If you come in a toga you will also receive drink

by Chris Peterson (noreply@blogger.com) at July 11, 2008 05:12 PM

July 10, 2008

Grant Rettke

erlang-scheme interop

Eric Sessoms announced his Erlang-Scheme interoperability library recently on the PLT discussion list.

What it is: Basically, it’s a port of Distel from emacs lisp over to
scheme. It talks to erlang using its own protocol and impersonates an
erlang node on the network. It aims to provide an abstraction such
that erlang processes look like scheme threads, and vice-versa.
Communication from scheme to erlang is done with (a wrapper around)
thread-send. Messages from erlang to scheme get routed to thread
mailboxes so that they can be picked up with thread-receive.

by Grant at July 10, 2008 06:29 PM

July 03, 2008

Grant Rettke

TeachScheme, ReachJava

A silent revolution has changed the way computer science is understood and taught. The modern curriculum no longer focuses on the constructs of a language and the state changes in the machine. Instead, programming is taught as a problem-solving process that starts from a thorough understanding of classes of data and objects. The TeachScheme! Project has been at the vanguard of this revolution; the new series is its natural extension to cover a seamless transition to object-oriented design using Java.

Teaching is hard, and getting people to change how they teach is even harder. TeachScheme! (read Teach Scheme, NOT!)

wants to turn Computing and Programming into an indispensable part of the liberal arts curriculum.

There are truly world class folks involved in this effort whom I trust. Though I do not yet understand the depth of this project, I find it fascinating, and inspiring.

by Grant at July 03, 2008 08:43 PM

Common Lisp HyperSpec

The Common Lisp HyperSpec is a hypertext version of the ANSI Common Lisp standard comprising approximately 15MB of data in 2300 files which contain approximately 105,000 hyperlinks.

(via Wikipedia)

by Grant at July 03, 2008 08:34 PM

The 90 Minute Scheme to C compiler

90 minute video presentation from Marc Feeley, along with accompanying PowerPoint slides and source code, for a Scheme to C compiler. Good discussion of continuations and closures, as well as some dipping into the area of compiler construction.

I didn’t work through this but it looks like it might be a fun project to undertake (I’ll add it to the list).

(via LtU)

by Grant at July 03, 2008 08:31 PM

June 28, 2008

Grant Rettke

Advice on writing teachpacks

Here is some advice on writing teachpacks for PLT’s DrScheme.

About teachpacks:

Teaching languages are small subsets of a full programming language. While such restrictions simplify error diagnosis and the construction of tools, they also make it impossible (or at least difficult) to write some interesting programs. To circumvent this restriction, it is possible to import teachpacks into programs written in a teaching language.

In principle, a teachpack is just a library written in the full language, not the teaching subset. Like any other library, it may export values, functions, etc. In contrast to an ordinary library, however, a teachpack must enforce the contracts of the “lowest” teaching language into which it is imported and signal errors in a way with which students are familiar at that level.

by Grant at June 28, 2008 10:32 PM

XO Critical Configuration 1

Until a recent trip, I hadn’t used the XO very hard, or configured it at all. Before heading out, I read Bill’s article and found some real gems that, along with my own preferences, make using the XO a much more pleasurable experience. They follow:

by Grant at June 28, 2008 10:11 PM

An image viewer for the XO

Feh is the only image viewer that is both easy to install and use on the XO.

I never thought I would use the XO as an image viewer until I found how convenient it was to pass around to folks in lieu of a much larger laptop.

by Grant at June 28, 2008 09:59 PM

Conference Paper Word Templates

Back when my co-worker and I were preparing some white-papers (the research was the hard part), we decided to present them in “conference paper layout” initially.

SIGPLAN provides some such templates for MS Word here.

Alternately, here are some local copies:

sigplanconf.dot
sigplanconf-varsize.dot

by Grant at June 28, 2008 09:43 PM

OLPC XO OS build 703 changes

OLPC XO OS build 703 has at least two significant changes:

The first is that it automatically suspend when closed, with this caveat:

the system can’t suspend when the USB bus is in use by an external device (unless it’s a USB mass storage device and has been fully allowed to write any cached info and quiesce itself).

This might not seem like a big deal, but folks have been wanting it for a long time.

The second is that activities no longer come pre-installed in the OS image.

by Grant at June 28, 2008 02:42 PM

S24 FORTHdrive

This article led me to IntellaSys, which offers this tiny little 24-core CPU that runs Forth code!

by Grant at June 28, 2008 02:26 PM

Typed Scheme

Typed Scheme is a typed dialect of PLT Scheme. It integrates with modules written in other PLT dialects, and provides a type system designed to support common Scheme idioms.

Typed Scheme is a pretty neat language because it can can both use and be used by (untyped) Scheme code in PLT Scheme.

by Grant at June 28, 2008 01:22 PM

The XO was made for its creators

After using the OLPC XO heavily for the past three weeks for web browsing, pdf reading, and educational game playing (by my 5 year old nephew), I can’t help me get the feeling that the XO was made for its creators, and not for children.

Now don’t get me wrong, I love the thing; but how do you explain to a 5 year old (albeit a very smart one) that he can’t start more than 2 or 3 programs at once because the machine will run out of memory and the CPU will get bogged down?! (An aside, how you explain it is by doing just that, excluding the part about memory and cpu).

The vision of a computer where everything is written in Python and everything is modifiable and maintainable by the user is a fun idea, but only for hackers, aka, the creators. Kids could care less. What they want are programs they can use that work well. Are they getting that right now? Well, they are getting something that works “well enough”, but it seems like the XO creators are painting themselves into a corner here in terms of performance; since the hardware will never get upgraded, the only place they’ve got left to speed things up is in the code, and that seemingly has not been a priority thus far.

by Grant at June 28, 2008 01:13 PM

colorForth

colorForth is a redesign of [Forth] for the 21st century. It also draws upon a 20-year evolution of minimal instruction-set microprocessors. Now implemented on modern PCs, it runs stand-alone without an operating system. Applications are recompiled from source with a simple optimizing compiler.

It is the child of Chuck Moore, the creator of Forth.

by Grant at June 28, 2008 12:44 PM

An interview with Charles H. Moore

This interview with Charles H. Moore is a great read.

I think it behooves new programmers to sample all the languages available. Forth is the only one that’s fun. The satisfaction of finding a neat representation cannot be equaled in Fortran, C or even Lisp.

Well clearly he hasn’t tried out Scheme :)

(via Dave)

by Grant at June 28, 2008 12:33 PM

June 26, 2008

Grant Rettke

Why Computer Science Doesn’t Matter

Why Computer Science Doesn’t Matter is an essay about the lack of computer science in the educational curriculum today, and what can be done about it. They’ve come up with an interesting, and successful, approach.

[I want] to place computing where it belongs: in the hearts and minds of every single student.

Here here!

by Grant at June 26, 2008 11:25 AM

June 25, 2008

Grant Rettke

DEFUN08: ACM SIGPLAN 2008 Developer Tracks on Functional Programming

DEFUN 2008 invites functional programmers who know how to solve
problems with functional progamming to give talks and lead tutorials
at the The ICFP Developer Tracks.

We want to know about your favorite programming techniques, powerful
libraries, and engineering approaches you’ve used that the world
should know about and apply to other projects. We want to know how to
be productive using functional programming, write better code, and
avoid common pitfalls.

ICFP08 looks like a lot of fun!

(via PLT)

by Grant at June 25, 2008 02:02 AM

June 24, 2008

Grant Rettke

Soft real-time Scheme implementations

Check out this list of soft real-time Scheme implementations!

by Grant at June 24, 2008 11:03 PM

ypsilon

ypsilon is the implementation of Scheme Programming Language, which conforms to the latest standard R6RS. It achieves a remarkably short GC pause time and the best performance in parallel execution as it implements “mostly concurrent garbage collection”, which is optimized for the multi-core CPU system.

If you are wondering “Why yet another Scheme implementation” you can find the answer here. To sum it up: they require real-time processing speed and can not use Boehm Garbage Collector because they run on arcade consoles or pinball machines, so, they had to start from scratch.

by Grant at June 24, 2008 11:02 PM

PLT Scheme 4.0 is out

Bill Clementson wrote a great post about it here.

by Grant at June 24, 2008 04:44 PM

June 22, 2008

Maneesh Chaturvedi

Regular Expressions Part 2

This entry continues the discussion regarding regular expressions from the previous post(Regular Expression part 1). We continue with a discussion of the rich meta-characters available. In this post i will take up some of the other meta-characters and discussion their usage. The pattern followed is the same as the previous post with some exercises thrown in for good measure.
  • Optional Items -- Suppose we wanted to search for the word June, which could be represented as either June or Jun. The ? meta-character means optional. So the search could be accomplished using June?. This can be interpreted as match J, then u then n followed by e if its there. So the ? is placed after the character that is allowed to appear at that point in the regular expression, but the existence isn't required to consider it a successful match. Expanding on the same lets say we have to match June 5th, which can occur as Jun 5, June 5th, or fifth. To match the following we can use (June|Jun).(5|5th|fifth). The first expression can be simplified as (June?) and the second one as fifth|5(th)?. Hence the complete expression can be re-written as June?.(fifth|5(th)?). One point here is that although there are different alternatives, choosing the right one needs some thought and introspection.
  • Repetition -- The + and * meta-characters are used for checking for repetition. + implies one or more of the preceding character and * implies any number including none of the item. Thus + fails if there are none of the characters but succeeds for one or many, * on the other hand always succeeds. As an example, to match spaces, one can use . ? which would match a single optional space, . + would match any number of spaces with at least one space and . * would match any number of spaces, if present. Lets take another example, where we have to search for the html tag <hr size="10">. To build the regular expression, we need to have, then HR followed by one or more spaces(allowed as per the semantics of html), followed by zero or more spaces, followed by = followed by zero or more spaces the number 10 , again zero or more spaces followed by >. Based on our discussion so far, this should be simple, Now lets try and generalise this to any size, not just 10, so the regular expression would look like For case insensitive search using egrep, we can use the -i option.
Exercise -- Write a regular expression for the same examplem as above, where the size attribute is optional e.i a simple
is a legal match.

  • Back References -- Many tools provide the ability for parentheses to remember text matched by the sub-expression they invoke. So as an example, ([a-z])([0-9])\1\2, \1 refers to the text matched by [a-z] and \2 refers text matched by [0-9]. We will see examples of using back-referencing in following posts.
  • Escapting Meta-characters -- In order to escape meta-characters, we can use \ to escape the meta-character. For example if we tried to match att.com, it could end up matching something like watt company. So we need to escape the meta-character . , this can be done using att\.com. This escapes the meta-character and treats the meta-character . like a normal character
This covers some of the most important meta-characters. In future posts i would expand on these fundamentals and show how applying regular expressions makes life simpler.

by maneesh chaturvedi (noreply@blogger.com) at June 22, 2008 10:23 AM

June 21, 2008

Sudarshan Bhalerao

ASP.NET MVC

 

ASP.NET MVC is new and clean way of writing web applications. Few important points about ASP.NET MVC:

  1. It enables clean separation of concerns, testability, and TDD by default.
  2. All core contracts within the MVC framework are interface based and easily mockable.
  3. It is highly extensible and pluggable.  Everything in the MVC framework is designed so that it can be easily replaced/customized. For example: you can optionally plug-in your own view engine, routing policy, parameter serialization, etc.
  4. It also supports using existing dependency injection and IOC container models (Windsor, Spring.Net, NHibernate, etc).
  5. It includes a very powerful URL mapping component that enables you to build applications with clean URLs.  URLs do not need to have extensions within them, and are designed to easily support SEO and REST-friendly naming patterns. For example: http://localhost/Products/Edit/5.
  6. he MVC framework supports using the existing ASP.NET .ASPX, .ASCX, and .Master markup files as "view templates".
  7. The ASP.NET MVC framework fully supports existing ASP.NET features like forms/windows authentication, URL authorization, membership/roles, output and data caching, session/profile state management, health monitoring, configuration system, the provider architecture, etc.

You will come to know some more details while going through presentation.

image

Here is presentation done by me on ASP.NET MVC.

Here is one sample application which gives you an idea about basics of ASP.NET MVC like controllers, views, models and URL routing. This will also gives an idea about testing abilities in ASP.NET MVC.

Here are some valuable resources on ASP.NET MVC. You can find them here also: http://polymorphicpodcast.com/shows/mvcresources/.

jQuery

How have I missed jQuery all this time? I am really loving the support jQuery can provide for all kinds of things like AJAX and controls, but my favorite feature is the support for CSS3-like behavior that is available today in a cross-broswer fashion.

Here is a script I used recently to "stripe" a table (provide alternating row colors).

$(document).ready(function() {
    $('table.striped tr:odd').css('background', '#eee');
});

Community Events

By now surely you have had at least some exposure to the new ASP.NET MVC framework. As with any new technology often times trying to find the quality resources to rely on may be a trying task.

 

Get the Latest

To help make your development experience more comortable the following is a list of 47 of the most interesting and helpful ASP.NET MVC resources available to-date. Now while we can all agree that this post becomes a legacy artifact as soon as I press the "Publish" button, make sure to frequently visit the links below for the latest ASP.NET MVC goodness:

Introductions & Architectural Overview

Let's start with a good foundation, shall we? The following links are a good read for the uninitiated as well as those who have had the bits since before preview 2.

Routing

One of the first mindset shifts you have to make is to think up-front about the URLs on your site. The Routes Table is a new concept introduced along with MVC. These links will help you get started:

Building the UI

Once you've got the basics down you are going to want to know how to execute some of the common tasks that you are used to doing under any development platform. These links will get you up-and-running.

Integration with the Familiar

Learn to use MVC with AJAX, the Membership framework as well as older versions of IIS.

Implementing Security

Forms authentication is alive and well with MVC, but you need to interface with the system in a different way:

  • Securing Controller Actions: While you will still use forms authentication, MVC is all about defining resources via the URL so folder-based lock-downs don't make as much sense. The best way to add a secure layer to your application is to secure controller actions.
  • Using XML: Check this out! Azam Sharp shows you how to secure controller actions with an XML file.

REST

As they say, REST has been around for a long time, but there are those of us who are just being introduced:

Validation

Without the page/control paradigm of WebForms some have wondered if MVC developers are going to be left to reimplement validation JavaScript all over again. Lay your worries to rest!

Testing

One of the main reasons of using MVC is to have the ability to fully test your applications.

Other Interesting Links

There is so much great content out there that all if it doesn't always fit into a tidy little category heading. Here are a few more gems I've found that are worth reading.

Books

Blog posts and podcasts are great, but to really sink into the technology check out these books:

Stay tune for some more interesting articles on ASP.NET MVC.

Enjoy Programming!!

June 21, 2008 05:36 PM

June 20, 2008

Maneesh Chaturvedi

Regular Expressions Part 1

I have been a great fan of using regular expressions for solving day to day tasks. I thought it was time to post a series for discussing the use and syntax of regular expressions. In this post i will talk about some simple meta-characters and their uses with tools like egrep/grep. In addition i will try and set some small exercises to practice the understanding of the readers.

A regular expression is composed of a set of meta-characters like *, ^ etc and a set of literal characters like A-Z, a-z, 0-9 etc.

Some Simple Meta-characters
Let us start with a some simple meta-characters and how we can apply these to various scenario's. The
  • ^ - The Caret/Anchor is one of the simplest meta-characters and represents the start of the line which is being checked. For example ^cat would match all lines which start with the literal cat like catalog
  • $ - The dollar meta-character represents the end of the line which is being checked. So cat$ would match all lines which end with the literal characters cat, for example scat.
These 2 meta-characters are unique in the sense that they match a position rather than the actual literal characters themselves.

Exercise -- What would ^cat$, ^$ and ^ match ?


  • Character Class -- The meta-character [...] called the character class lets you list the characters you want to allow at that point of match. For example e matches just e , a matches just a, however [ea] would match both e or a. Thus the implied meaning is that of OR. Suppose you wanted to check whether you have spelt grey or gray. The following regular expression would match the occurance of both gr[ea]y. This matches g follows by r followed by either e or a followed by y. The - operator within a character class matches a range, so [0-9A-Z] matches occurances of numbers and capital letters. For example ] would help you while searching headers in html documents. The - is special inside a character class, otherwise it is used as a normal character and not as a range.
  • Negated Character classes -- A ^ inside a character class negates the matches. For example [^a-z] would match any character which is not a through z. Remember a negated character class means match a character thats not listed and not don't match whats listed
Exercise -- Figure out all words in a dictionary which contain a q not followed by a u
  • Matching Any character with dot -- this is simpler to explain with an example. Suppose you want to find a date 26/02/1977, which can also be represented as 26-02-1977 or even 26.02.1977. Not the . character can be used to match any character. So seemingly, 26.02.1977 should return us the date however it be represented, maybe with a - or a /. However, the above would also match a random number like 12264023197734. This is because the dot character matches any character. The correct way to do so would be to use something like 26[-./]02[-./]1977 (Note, the - is this case would not be a meta-character, since it immediately follows the [, if the same was written as [.-/], then it would become the range meta-character, also the . is not treated as a meta-character since it is inside the character class)
  • Matching any one of several sub-expressions -- The | (OR) operator lets you combine expressions into a single expression, that matches any of the individual ones. For example, based on the previous examples, we can rewrite gr[ea]y as gr(e|a)y. (Note-- we cannot use something like gr[e|a]y as inside the character class, | is treated as a normal character). We need the parenthesis because gre|ay would mean gre or ay.
Exercises 1) How does gr[ea]y differ from gr(e|a)y ?
2) What is the difference between ^:List|Of|Books: from ^(List|Of|Books): ?
Note -- Tools like egrep provide switches for case-insensitive searches. For egrep -i performs a case insensitive search.
Example -- egrep -i '^(Subject|Predicate):' myfile.txt
This serves as a good groundwork for someone wanting to start using regular expressions. I will be following up the post with detailed topics. Till then, cheers and keep the faith.

by maneesh chaturvedi (noreply@blogger.com) at June 20, 2008 05:48 AM

June 19, 2008

Geoff Lane

Capistrano and Ferret DRB

This is a bit of a followup to my previous post on Capistrano with Git and Passenger. I decided to use Ferret via the acts_as_ferret (AAF) plugin. Ferret is a full-text search inspired by Apache’s Lucene but written in Ruby. Basically Ferret and Lucene keep a full-text index outside of the database that allows it to [...]

by Geoff Lane at June 19, 2008 04:27 PM

Chris Peterson

Midwest Flooding

As many of you know the Midwest has been hit hard by rain and floods the last couple of weeks. In fact an entire lake was drained when the lake bed gave way. Many people have been displaced and have lost homes. some pics and video of the disaster are here. Please support the red cross and other charities to help the victims.

by Chris Peterson (noreply@blogger.com) at June 19, 2008 02:17 PM

Grant Rettke

Maneesh Chaturvedi

The Art of analysis and problem solving

I have been wondering about the art of analysis and problem solving and decided to do a writeup using an example. A lot of developers tend to jump to writing code and never seem to bother about thinking out their solutions. For them if it works its fine. Moving away from this is a pre-requisite for anyone who wants to hone his development skills. In order to convey my point, i take up a seemingly simple example which displays how to analyze a solution before implementing it.

Problem Statement

Write a function which removes a set of characters from a given String. The signature of the function should be as follows

public String removeCharsFromString(String source, String remove)


Problem Analysis
The problem involves the following 2 steps
1) For each character in source, determine whether it needs to be deleted
2) Delete the character

Lets start with the actual deletion process first. You have to remove an element from an array. An array is a contiguous block of memory, so you cant remove an element from the middle as can be done with a linked list. So you have to rearrange the elements to maintain it as a contiguous block. For example if you need to remove c from an array containing abcd , you have to rearrange the elements. So you can either shift ab up or shift d down. In addition you need to decrease the size of the string by one.

Now how would the proposed algorithm work if you had to delete all elements from the source string. If the String is n characters long, this would mean shifting the last element by n-1, the second last by n-2 and so on, giving the worst time ass O(n^2).

How can we avoid this. What if you allocated a temporary string buffer and built your modified string there instead of in place? Then you could simply copy the characters you need to keep into the temporary string, skipping the characters you want to delete. When you finish building the modified string, you can copy it from the temporary buffer back into str. This way, you move each character at most twice, giving O(n) deletion. However, you’ve incurred the memory overhead of a temporary buffer the same size as the original string, and the time overhead of copying the modified string back over the original string. Is there any way you could avoid these penalties while retaining your O(n) algorithm?

To implement the O(n) algorithm just described, you need to track a source position for the read location in the original string and a destination position for the write position in the temporary buffer. These positions both start at zero. The source position is incremented every time you read, and the destination position is incremented every time you write. In other words, when you copy a character you increment both positions, but when you delete a character you increment only the source position. This means the source position will always be the same as or ahead of the destination position. Once you read a character from the original string (that is, the source position has advanced past it), you no longer need that character - in fact, you’re just going to copy the modified string over it. Because the destination position in the original string is always a character you don’t need anymore, you can write directly into the original string, eliminating the temporary buffer entirely. This is still an O(n) algorithm, but without the memory and time overhead of the earlier version.

Now to the actual part of deciding whether a character needs to be deleted. The easiest way is to compare all characters in the remove string with all the characters in the source string. If the size of the source string is n and the size of the remove string is m, this entails a running time of O(nm). How can we reduce this to a running time of O(m).

If we construct an array or a hashtable which has a constant time lookup, we can reduce the running time to O(m).
Lets construct an array containing all Booleans which is indexed by all the possible values of the characters in the source string. This enables you to determine whether a character is in remove by checking a single element.

Now the function will have 3 parts
  • Set all elements in the lookup array to false
  • Iterate through each element in remove, setting the corresponding value in the lookup array to true
  • Iterate through str with a source and destination index, copying each character only if its corresponding value in the lookup array is false
Now if we combine all the operations, the running time would be O(n+m) which is still linear.

Now lets take a look at the code


public String removeCharOccurances(String source,String remove){
char[] sourceArr = source.toCharArray();
char[] removeArr = remove.toCharArray();
boolean[] flags = new boolean[128]; // assumes ASCII!
int len = sourceArr.length;
int len1=removeArr.length;
int src, dst;

// Set flags for characters to be removed
for( src = 0; src len1; ++src ){
flags[removeArr[src]] = true;
}
src = 0;
dst = 0;

// Now loop through all the characters,
// copying only if they aren't flagged
while( src len ){
if( !flags[ (int)sourceArr[src] ] ){
sourceArr[dst++] = sourceArr[src];
}
++src;
}

return new String( sourceArr, 0, dst );
}

by maneesh chaturvedi (noreply@blogger.com) at June 19, 2008 02:31 AM