Monday, November 3, 2008

Fast Trim Function for Javascript

Introduction

This is a follow-up to Steven Levithan's old post about string trimming.

After my first proposal, I decided to give it another try.

I wrote a new trim, based on that first attempt. It seems to be much faster in every browser, across different string lengths and amounts of whitespace.


First of all...

I want to make clear that this version doesn't trim the exact same characters that other versions do.

In my first version, I simply tried to improve Steven's version without giving it much thought.

This time I asked myself: are all those characters really relevant?

After thinking about it for a while, I decided that whatever /\s/ matches should be trimmed.

Why? Because most major JS libraries use the regex version. If their users are content, then those are the characters that need trimming. Also, if my function actually did better than the others, it could go directly into jQuery's core (there's a ticket for that).
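For reference, a regex-based trim of the kind these libraries use fits on one line. The sketch below is modeled on jQuery's trim of that era, but `regexTrim` is a name made up for this example, so check the library source for the exact code.

```javascript
// Hypothetical name: a typical regex-based trim, for comparison.
// Removes every leading and trailing character matched by \s.
function regexTrim( str ){
  return ( str || "" ).replace( /^\s+|\s+$/g, "" );
}

console.log( JSON.stringify( regexTrim( "  \t foo bar \n " ) ) ); // "foo bar"
```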

So I created a test page to see which charCodes /\s/ matches. The results weren't the same in every browser: IE and Safari 3 yielded far fewer charCodes. I decided those were the ones I wanted to trim.
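A test page like that can be approximated with the loop below. It is a sketch, not the original page: `whitespaceCharCodes` is a made-up name, and it simply tries every charCode from 0 to 0xFFFF against /\s/. Current engines follow the spec's definition of \s much more closely than IE or Safari 3 did, so expect a longer list than those browsers gave; the entries below 33, though, are just 9 to 13 and 32.

```javascript
// Hypothetical helper: collect every charCode in 0x0000-0xFFFF
// that the current engine's /\s/ matches.
function whitespaceCharCodes(){
  var codes = [];
  for ( var code = 0; code <= 0xFFFF; code++ ) {
    if ( /\s/.test( String.fromCharCode( code ) ) ) {
      codes.push( code );
    }
  }
  return codes;
}

console.log( whitespaceCharCodes().join( ", " ) );
```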


My Trim function

The basic change from my previous version is that, instead of looking each character up in the whole map/object, it just checks whether the charCode is lower than 33.

This takes much less code and is quite a bit faster. I humbly named it "myBestTrim"; here it is:

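// Licensed under BSD
function myBestTrim( str ){
  var start = -1,
    end = str.length;
  // Step back from the end while the char is whitespace/control (charCode < 33).
  // charCodeAt() returns NaN out of range, and NaN < 33 is false, so the loop stops.
  while( str.charCodeAt(--end) < 33 );
  // Step forward from the start the same way.
  while( str.charCodeAt(++start) < 33 );
  // slice() returns "" when start >= end + 1 (e.g. an all-whitespace string).
  return str.slice( start, end + 1 );
}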
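A few calls show how it behaves at the edges. The function is repeated so the snippet runs on its own; note that the charCode < 33 rule also strips control characters such as U+0000 and U+0001, while keeping U+00A0 (no-break space), a point Steven raises in the comments.

```javascript
// Copy of myBestTrim, so this snippet runs on its own.
function myBestTrim( str ){
  var start = -1,
    end = str.length;
  while( str.charCodeAt(--end) < 33 );
  while( str.charCodeAt(++start) < 33 );
  return str.slice( start, end + 1 );
}

// Ordinary leading/trailing whitespace is removed.
console.log( JSON.stringify( myBestTrim( " \t\n foo bar \r\n" ) ) ); // "foo bar"
// Empty and all-whitespace strings come back empty.
console.log( JSON.stringify( myBestTrim( "" ) ) );    // ""
console.log( JSON.stringify( myBestTrim( "   " ) ) ); // ""
// Any charCode below 33 counts, so control chars are stripped too...
console.log( JSON.stringify( myBestTrim( "\u0001foo\u0000" ) ) ); // "foo"
// ...while a no-break space (charCode 160) is kept.
console.log( myBestTrim( "\u00a0foo" ) === "\u00a0foo" ); // true
```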

The Benchmark

If you want to try the benchmark, it's here. The number you should care about (in my opinion) is the minimum; that's probably the run that happened under the lowest CPU load.

That test uses a string of 10K characters. That sounds like a lot, but this blog's homepage has 55K of HTML. As opposed to a regex-based trim, this one should scale pretty well, because it doesn't need to scan the whole string: from each end, it stops at the first non-whitespace character.
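The scaling argument can be checked by counting work instead of timing it. The sketch below (`countedTrim` is a hypothetical, instrumented copy of myBestTrim) counts charCodeAt() calls: with 3 spaces on each side, a 36-character string and a 10,006-character string need the same 8 calls, 4 per loop (3 spaces plus the first non-space character).

```javascript
// Hypothetical instrumented copy of myBestTrim: counts charCodeAt() calls.
function countedTrim( str ){
  var calls = 0;
  function code( i ){ calls++; return str.charCodeAt( i ); }
  var start = -1,
    end = str.length;
  while( code(--end) < 33 );
  while( code(++start) < 33 );
  return { result: str.slice( start, end + 1 ), calls: calls };
}

var pad = "   "; // 3 spaces on each side
// new Array( n + 1 ).join( "x" ) builds a string of n "x" characters.
var small = countedTrim( pad + new Array( 31 ).join( "x" ) + pad );
var large = countedTrim( pad + new Array( 10001 ).join( "x" ) + pad );
console.log( small.calls, large.calls ); // 8 8
```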

That's actually why I removed the regex-based approaches from this test: they'd take too long.

If you have Firebug, I'd advise you to turn it off before running this test.

I made a similar test with small strings. It uses a string of 30 characters with 3 whitespace characters on each side. In this case, the difference was smaller but still noticeable.
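A bare-bones version of this kind of benchmark, keeping the minimum as suggested above, might look like the sketch below. It is not the actual test page: `minTime` and `regexTrim` are hypothetical names, the loop counts are arbitrary, and the 3-space padding on the 10K string is an assumption.

```javascript
// Hypothetical helper: times `loops` calls of fn( arg ), repeats that
// `samples` times, and returns the fastest sample in milliseconds.
function minTime( fn, arg, samples, loops ){
  var best = Infinity;
  for ( var s = 0; s < samples; s++ ) {
    var t0 = Date.now();
    for ( var i = 0; i < loops; i++ ) fn( arg );
    var elapsed = Date.now() - t0;
    if ( elapsed < best ) best = elapsed;
  }
  return best;
}

// The two functions being compared (copied so the snippet is standalone).
function myBestTrim( str ){
  var start = -1,
    end = str.length;
  while( str.charCodeAt(--end) < 33 );
  while( str.charCodeAt(++start) < 33 );
  return str.slice( start, end + 1 );
}
function regexTrim( str ){
  return ( str || "" ).replace( /^\s+|\s+$/g, "" );
}

// Assumed inputs: 10,000 chars total (9,994 "x" plus 3 spaces per side),
// and 30 chars total (24 "x" plus 3 spaces per side).
var big = "   " + new Array( 9995 ).join( "x" ) + "   ";
var small = "   " + new Array( 25 ).join( "x" ) + "   ";

[ myBestTrim, regexTrim ].forEach(function( fn ){
  console.log( fn.name,
    "big:", minTime( fn, big, 5, 1000 ), "ms,",
    "small:", minTime( fn, small, 5, 100000 ), "ms" );
});
```

Modern JIT compilers can optimize loops like these in surprising ways, so treat the numbers as rough and compare them only within one browser.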

The only case where another function was faster was IE with a small string WITH whitespace. There, jQuery's trim (the typical regex-based trim) was slightly faster, and mine came in 2nd.

Most situations don't require much trimming, and the critical ones involve very large strings, so I think this function handles both well.


Conclusion

If all goes well, I'll put this function into jQuery's core. First, though, I'd like to get some results from other users to verify that my trim really is fast and effective.

Here's the ticket requesting a faster trim for jQuery: #2279.

11/6/08
Removed a dispensable check, thanks Andrea Giammarchi.

17 comments:

明月星光 said...

Good job. :)

Christophe Beyls said...

Great code! Have you seen the performance of your function in Google Chrome? It's incredible.

Ariel Flesler said...

@明月星光
Thanks!

@Christophe
Thanks, I haven't seen that, but I'm very glad to know. Thanks for telling me.

Nosredna said...

Really cool, but the idea that different browsers get different charcodes trimmed makes me nervous.

Cloudream said...

In Chrome:
http://labs.cloudream.name/jquery/faster-trim-1.png

http://labs.cloudream.name/jquery/faster-trim-2.png

Ariel Flesler said...

Yay! thanks Cloudream.
It's awesome to see those numbers. It's odd how the Roman numerals went to the right... but meh, never mind.

Digital Spaghetti said...

I can't benchmark it, but it sure works fast on the G1 :)

Steven Levithan said...

Nice code, Ariel. But IMHO, you should not call it "trim." Maybe "clean," or something like that, but people have a reasonable and well-established expectation about what a function called trim should do. This code does something else entirely, leaving some whitespace behind, and stripping a bunch of control characters that are not whitespace at all.

I don't disagree with the idea of ignoring some esoteric whitespace characters (here's the full list based on ES3 and Unicode 5.1: http://stevenlevithan.com/regex/xregexp/#extended ), but I think it's important to add at least U+00A0 (no-break space) to your list. That's the character you get when using the &nbsp; entity in HTML, and it's not uncommon for people to unintentionally copy and paste no-break spaces into other places.
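Adding U+00A0 amounts to one extra comparison per character. A minimal sketch of that change follows; `trimWithNbsp` is a hypothetical name, not a version published in the post.

```javascript
// Hypothetical variant of myBestTrim that also strips U+00A0
// (no-break space, charCode 160).
function trimWithNbsp( str ){
  var start = -1,
    end = str.length,
    code;
  // Keep stepping while the char is below 33 or is a no-break space.
  // Out-of-range indexes give NaN, which fails both tests and ends the loop.
  while( ( code = str.charCodeAt(--end) ) < 33 || code === 160 );
  while( ( code = str.charCodeAt(++start) ) < 33 || code === 160 );
  return str.slice( start, end + 1 );
}

console.log( JSON.stringify( trimWithNbsp( "\u00a0 foo\u00a0\u00a0" ) ) ); // "foo"
```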

Ariel Flesler said...

Yeah, I was thinking about that one (charCode 160). I might add it in.

My main goal was to improve the performance and scalability of jQuery's trim.

jQuery's trim uses the \s approach, so in IE/Safari it already leaves out all those chars (including &nbsp;).

I'll see how much overhead including this char adds.

Note that jQuery's function is called trim, and it works just like this one (cross-browser).

Anonymous said...

Pre-incrementing is faster than post. You can speed up the code slightly like this:

function trim(str){
var start = 0, end = str.length;
while (str.charCodeAt(--end) < 33);
while (++start < end && str.charCodeAt(start) < 33);
return str.slice(start, end + 1);
}

Anonymous said...

Correction:
var start = -1, end = str.length;
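With the correction applied, the commenter's version reads as below (`trimBounded` is just a label for it). Starting at -1 matters because `++start` runs before any character is checked: with `start = 0`, index 0 is never examined, so "a  b " would come back as "b". The `++start < end` bound also stops the leading loop at once for an all-whitespace string, where `end` has already reached -1.

```javascript
// The commenter's version with the correction (start = -1) applied.
function trimBounded( str ){
  var start = -1, end = str.length;
  // Trailing pass, same as myBestTrim.
  while ( str.charCodeAt(--end) < 33 );
  // Leading pass never goes past `end`, the last non-whitespace index.
  while ( ++start < end && str.charCodeAt(start) < 33 );
  return str.slice( start, end + 1 );
}

console.log( JSON.stringify( trimBounded( "a  b " ) ) ); // "a  b"
console.log( JSON.stringify( trimBounded( "   " ) ) );   // ""
```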

Ariel Flesler said...

Yes, thanks. I'll try these asap.

Anonymous said...

Wouldn't this fail for a string that is all spaces?
e.g. "   "

Ariel Flesler said...

Nope. Did you try it?
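It doesn't fail: charCodeAt() returns NaN for an index outside the string, and NaN < 33 is false. For "   ", the first loop leaves end at -1 and the second leaves start at 3, so the function returns "   ".slice(3, 0), which is the empty string. The snippet below checks this, repeating myBestTrim so it runs on its own.

```javascript
function myBestTrim( str ){
  var start = -1,
    end = str.length;
  while( str.charCodeAt(--end) < 33 );
  while( str.charCodeAt(++start) < 33 );
  return str.slice( start, end + 1 );
}

// charCodeAt() past either end of the string returns NaN,
// and NaN < 33 is false, so both loops stop there.
console.log( "   ".charCodeAt( -1 ) );     // NaN
console.log( "   ".slice( 3, 0 ) === "" ); // true
console.log( myBestTrim( "   " ) === "" ); // true
```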

Marc said...

Thank you, your work helped me out of a performance crisis managing 80,000 DB entries :)

For anyone who has trouble understanding it, here's a little quick'n'dirty help with ltrim and rtrim:

function ltrim( str ){
  var start = -1;
  // Step forward past leading chars with charCode < 33.
  while( str.charCodeAt(++start) < 33 );
  return str.slice( start );
}

function rtrim( str ){
  var end = str.length;
  // Step back past trailing chars with charCode < 33.
  while( str.charCodeAt(--end) < 33 );
  // Slice from 0: a negative first argument (like -1) would count from the end.
  return str.slice( 0, end + 1 );
}

Yesudeep said...

Here's my implementation based on a lookup table:

String.whiteSpace = [];
String.whiteSpace[0x0009] = true;
String.whiteSpace[0x000a] = true;
String.whiteSpace[0x000b] = true;
String.whiteSpace[0x000c] = true;
String.whiteSpace[0x000d] = true;
String.whiteSpace[0x0020] = true;
String.whiteSpace[0x0085] = true;
String.whiteSpace[0x00a0] = true;
String.whiteSpace[0x1680] = true;
String.whiteSpace[0x180e] = true;
String.whiteSpace[0x2000] = true;
String.whiteSpace[0x2001] = true;
String.whiteSpace[0x2002] = true;
String.whiteSpace[0x2003] = true;
String.whiteSpace[0x2004] = true;
String.whiteSpace[0x2005] = true;
String.whiteSpace[0x2006] = true;
String.whiteSpace[0x2007] = true;
String.whiteSpace[0x2008] = true;
String.whiteSpace[0x2009] = true;
String.whiteSpace[0x200a] = true;
String.whiteSpace[0x200b] = true;
String.whiteSpace[0x2028] = true;
String.whiteSpace[0x2029] = true;
String.whiteSpace[0x202f] = true;
String.whiteSpace[0x205f] = true;
String.whiteSpace[0x3000] = true;


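function trim17(str){
  var len = str.length, i = 0;
  if (len){
    var whiteSpace = String.whiteSpace;
    // Step back over trailing whitespace; a lookup miss (including NaN
    // once we run past the start) ends the loop.
    while (whiteSpace[str.charCodeAt(--len)]);
    // ++len turns the last non-whitespace index into an exclusive end;
    // it is 0 only when the whole string was whitespace.
    if (++len){
      // Step forward over leading whitespace.
      while (whiteSpace[str.charCodeAt(i)]){ ++i; }
    }
    str = str.substring(i, len);
  }
  return str;
}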
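The same lookup-table idea can be written more compactly by building the table from a list of code points. In the sketch below the list is copied from the comment above; `buildWhiteSpaceTable`, `WS` and `tableTrim` are hypothetical names, and the two scanning loops follow myBestTrim rather than trim17.

```javascript
// Hypothetical helper: turn a list of code points into a lookup table.
function buildWhiteSpaceTable( codes ){
  var table = [];
  for ( var i = 0; i < codes.length; i++ ) table[ codes[i] ] = true;
  return table;
}

// Same 27 code points as the String.whiteSpace table above.
var WS = buildWhiteSpaceTable([
  0x0009, 0x000a, 0x000b, 0x000c, 0x000d, 0x0020, 0x0085, 0x00a0,
  0x1680, 0x180e, 0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005,
  0x2006, 0x2007, 0x2008, 0x2009, 0x200a, 0x200b, 0x2028, 0x2029,
  0x202f, 0x205f, 0x3000
]);

function tableTrim( str ){
  var start = -1, end = str.length;
  // WS[NaN] is undefined, so each loop stops when it runs off the string.
  while ( WS[ str.charCodeAt(--end) ] );
  while ( WS[ str.charCodeAt(++start) ] );
  return str.slice( start, end + 1 );
}

console.log( JSON.stringify( tableTrim( "\u3000\u00a0 foo \u2028" ) ) ); // "foo"
```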

Ariel Flesler said...

Do you have any comparison numbers?