Ariel Flesler: String Tokenizer for Javascript

Saturday, March 29, 2008

String Tokenizer for Javascript

Introduction

This small class parses a string and generates different kinds of tokens. It is simple and straightforward, and it can serve as a base for other string-parsing scripts such as templating engines, custom language interpreters, and more.

jQuery plugin vs standalone

When the script runs, it generates the class. If jQuery is detected, the class is saved at $.tokenizer.
Otherwise, the class is saved at (window.)Tokenizer.
Note that this script doesn't need jQuery at all; the option exists only as a convenience for jQuery developers.
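
The detection described above can be sketched roughly as follows. This is only an illustration of the pattern, not the plugin's source: the expose helper, the placeholder Tokenizer function, and the two fake root objects are all made up for the example.

```javascript
// Hypothetical helper: attach a class to jQuery when present,
// otherwise attach it to the global object itself.
function expose(root, Tokenizer) {
  if (root.jQuery) {
    root.jQuery.tokenizer = Tokenizer; // reachable as $.tokenizer
  } else {
    root.Tokenizer = Tokenizer;        // reachable as window.Tokenizer
  }
  return Tokenizer;
}

function Tokenizer() {} // placeholder class, stands in for the real one

// Two fake "window" objects, one with jQuery loaded and one without.
var withJQuery = { jQuery: {} };
var withoutJQuery = {};

expose(withJQuery, Tokenizer);
expose(withoutJQuery, Tokenizer);
```

In a browser, root would be window, so the class ends up at either $.tokenizer or window.Tokenizer.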

How to use

The constructor takes two arguments; the second one is optional.

tokenizers
This is a collection of strings/regexes that match the tokens.
The regexes don't need to include capturing groups. They can, but the whole match is still considered the token.
If you use a regex, it's important that you DON'T make it global.
You can pass an array of tokenizers, or just one.
build
This is a parsing function. It gets called for each token found, and also for the strings between tokens. It should return the parsed token. This doesn't need to be a string; the returned token can be an array, an object, etc.
If no function is given, the tokens are the matched strings.
The function receives 3 arguments:
1. The string token that was matched.
2. Whether it is a matched token or the string between 2 tokens (true means a real token, false means a plain string).
3. The tokenizer that matched this string, or the one that skipped over this slice in the case of plain strings.
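
The post doesn't say why global regexes must be avoided. One plausible reason (an assumption, not something confirmed by the plugin) is that a regex with the g flag keeps state in lastIndex between calls, so reusing it on a new string can miss matches. A small demonstration with plain JavaScript regexes:

```javascript
var sticky = /\d+/g; // global flag: the regex remembers lastIndex between calls
var plain = /\d+/;   // non-global: every call starts searching at index 0

var first = sticky.test('abc 123');  // matches, lastIndex moves past the match
var second = sticky.test('abc 123'); // resumes from lastIndex, finds nothing

var third = plain.test('abc 123');   // matches
var fourth = plain.test('abc 123');  // matches again, no state is kept
```

Here first is true and second is false even though the input is identical, while third and fourth are both true.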

As mentioned, build isn't only called for each token found, but also for the strings between tokens. Use the second argument to tell which one it is. After you create the tokenizer, call its .parse() method with the string, and it will return the array of tokens. You may prefer to do the actual work inside build and simply ignore the returned array.
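
To make the contract above concrete, here is a minimal sketch of a class that follows the same rules. MiniTokenizer is a hypothetical stand-in written for this explanation, not the plugin's actual code, and details such as passing null as the tokenizer for trailing text are assumptions.

```javascript
// MiniTokenizer: a hypothetical stand-in, not the plugin's actual source.
function MiniTokenizer(tokenizers, build) {
  // Accept a single tokenizer or an array of them.
  this.tokenizers = Array.isArray(tokenizers) ? tokenizers : [tokenizers];
  // Without a build function, tokens are the matched strings themselves.
  this.build = build || function (src) { return src; };
}

// Find the earliest match of any tokenizer in `text`, or null if none match.
MiniTokenizer.prototype.next = function (text) {
  var best = null;
  this.tokenizers.forEach(function (t) {
    var index, match;
    if (typeof t === 'string') {
      index = text.indexOf(t);
      match = t;
    } else {
      var m = t.exec(text);     // assumes a non-global regex
      index = m ? m.index : -1;
      match = m ? m[0] : '';    // the whole match is the token
    }
    // Ignore empty matches so parse() always makes progress.
    if (index !== -1 && match.length > 0 && (best === null || index < best.index)) {
      best = { index: index, match: match, tokenizer: t };
    }
  });
  return best;
};

MiniTokenizer.prototype.parse = function (text) {
  var tokens = [], found;
  while ((found = this.next(text))) {
    if (found.index > 0) {
      // Plain string between tokens: second argument is false.
      tokens.push(this.build(text.slice(0, found.index), false, found.tokenizer));
    }
    // Real token: second argument is true.
    tokens.push(this.build(found.match, true, found.tokenizer));
    text = text.slice(found.index + found.match.length);
  }
  if (text.length > 0) {
    // Trailing text has no following token, so null is passed (an assumption).
    tokens.push(this.build(text, false, null));
  }
  return tokens;
};

// Label each piece so real tokens (T) and plain strings (S) are easy to tell apart.
var tagged = new MiniTokenizer([/<%\w+%>/, ','], function (src, real) {
  return (real ? 'T:' : 'S:') + src;
});
var result = tagged.parse('a,<%b%>c');
// result is ['S:a', 'T:,', 'T:<%b%>', 'S:c']

// With no build function, the pieces come back as plain strings.
var defaults = new MiniTokenizer(',').parse('x,y');
// defaults is ['x', ',', 'y']
```

The sketch re-scans the remaining text on every step, which is simple rather than fast; the real plugin may well work differently.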

Examples

Templating

var values = { name:'Joe', age:32, surname:'Smith' };
var tokenizer = new Tokenizer([
    /<%(\w+)%>/, /\$(\w+)/ // match <%name%> and $name placeholders
  ], function( src, real, re ){
    // real tokens are swapped for their value, plain strings pass through
    return real ? src.replace(re, function( all, name ){
        return values[name];
    }) : src;
  }
);
var tpl = '<%name%> $surname is $age years old.';
var tokens = tokenizer.parse(tpl);
document.body.innerHTML = tokens.join('');

CSV parser

var rows = [ ], row = rows[0] = [ ];
var csv = new Tokenizer( [',',';'],
  function( text, isSeparator ){
     if( isSeparator ){
         if( text === ';' ){ // new row
             row = [ ];
             rows.push(row);
         }
     }else{
         row.push(text); // plain string between separators is a cell
     }
  }
);
// the returned array is ignored, the work happens inside build
csv.parse('Joe,Smith,32;Jane,Doe,26;Mike,Bowel,54');

Downloads

Tokenizer 1.0.1 Zip (all files and docs)
Tokenizer 1.0.1 Source (to learn or test)
Tokenizer 1.0.1 Minified (recommended)

7 comments:

Richard D. Worth said...: Ariel. This is great! It's going on my short list. Thanks for sharing.; March 31, 2008 at 9:56 AM
Unknown said...: do you mean CSV instead of CVS?; April 17, 2008 at 7:15 AM
Ariel Flesler said...: Heh..
What was I thinking ? Thanks for catching that up, I made a mistake once, and then repeated over and over, will fix it now.

Thanks again.; April 17, 2008 at 10:24 AM
Miguel Ruiz Velasco S said...: Hello,
The firebug complains on onEnd not defined, and reading the code
return new Tokenizer( tokenizers, onEnd, onFound );
in the above code, onEnd and onFound are not defined, changing that to doBuild, makes it work
Thanks; July 18, 2008 at 2:01 AM
Ariel Flesler said...: Right thanks for spotting. That remained from a change in the last release.
Only happens when called without 'new'.
I just fixed on the trunk, will be in for a next release.
Thanks again; July 18, 2008 at 10:36 AM
Anonymous said...: very interesting, though curiously the first example doesn't work for me, returning :

Joe Smith is 32undefined years old.

But, removing 'years old.' from tpl, or adding a <% %> at the end as follows:

var values = { firstname:'Joe', age:'32', surname:'Smith', fin:'' };
...
var tpl = 'guy <%firstname%> <%surname%> is <%age%> years old.<%fin%>';

makes it work properly

any hint ?; November 18, 2008 at 9:56 AM
Ariel Flesler said...: Ok, fixed the demo. Thanks for noticing.; November 19, 2008 at 7:34 PM
