Javascript Compression, Madness

Source     ::::    http://codeflow.org/entries/2010/aug/25/javascript-compression-madness/

Javascript Compression, Madness

maalmood lamaane A ka hor waxaan qoray post oo ku saabsan sida loo sameeyo istuuray n-jirka ee javascript la shiraac. Laakiin hadii aan doonayo inaan u gudbiyo js1k, waxaa loo baahan yahay in ay ku habboon 1024 bytes. post soo socda waa saabsan kombaresarada javascript gaar ah ku qoran tukan jiray. Waxaan bilaabi la 3.1kb ka dibna u hesho la minification in 1.4kb oo aan la isku geynta riixo u gaar ah si ay u 0.9kb.

 

Haddii aad tahay qof hore u yaqaanaan Yui-kombaresarada iyo tweaking waxoogaa ka dib, waxaa laga yaabaa inaad rabto in aad si toos ah u gudub Tallaabada 3 halkaas oo aan sharax sida aan dhisay javascript code is-damlaynayso ah.

 

Tallaabo 1 – kombaresarada Yui

Tallaabada 2 – Compact more qaar ka mid ah

Tallaabada 3 – Waxaan mas’uul kama aad saaxiga ahay

Man uted

ls -l gravity.js

-rw-r–r– 1 pyalot pyalot 3253 2010-08-25 13:48 gravity.js-rw-r – r– pyalot 1 pyalot 3253 2010-08-25 13:48 gravity.js

Sidaas version uncompressed ee xeerka Waxaan doonayaa in ay soo gudbiyaan waxa uu leeyahay size a of 3253 bytes. Ax ah, meel fog ka badan 1024 bytes.

 

Download

Waxaad kala soo bixi kartaa isha for kombaresarada waalan haddii aad rabto in aad toy naftaada la

 

Tallaabo 1 – kombaresarada Yui

Tani waa kombaresarada a quruxsan nifty javascript, waxaa javascript parses oo sameeya oo dhan cayn kasta oo optimizations. Carar ku cajiintaan my code this:

 

bash

yui-compressor gravity.js > script.js

ls -l script.js

-rw-r–r– 1 pyalot pyalot 1800 2010-08-25 15:59 script.js

Ahaa, in ay fiican tahay, hoos in 1800 bytes Waxaan ahay. Sidaas eegno wax soo saarka iyo arko haddii ay jiraan boos uu horumar.

javascript

function Vector(a,b){this.x=a;this.y=b;this.sub=function(c)

{return new Vector(this.x-c.x,this.y-c.y)};this.isub=functi

on(c){this.x-=c.x;this.y-=c.y};this.iadd=function(c){this.x

+=c.x;this.y+=c.y};this.size=function(){return Math.sqrt(th

is.x*this.x+this.y*this.y)};this.idiv=function(c){this.x/=c

;this.y/=c};this.zero=function(){this.x=this.y=0};this.vali

date=function(){if(isNaN(this.x+this.y)){this.x=this.y=0}}}

function Particle(a){this.acceleration=new Vector(0,0);this

.velocity=new Vector(Math.random()-0.5,Math.random()-0.5);t

his.position=new Vector(Math.random()*a.width,Math.random()

*a.height);this.step=function(b){this.acceleration.validate

();this.velocity.iadd(this.acceleration);speed=this.velocit

y.size();if(speed>4){this.velocity.idiv(speed/4);speed=4}th

is.position.iadd(this.velocity);this.acceleration.zero();if

(this.position.x<0){this.position.x=0;this.velocity.x/=-2}e

lse{if(this.position.x>a.width){this.position.x=a.width;thi

s.velocity.x/=-2}}if(this.position.y<0){this.position.y=0;t

his.velocity.y/=-2}else{if(this.position.y>a.height){this.p

osition.y=a.height;this.velocity.y/=-2}}color=Math.round(sp

eed*128);b.fillStyle=”rgba(“+(color-50)+”,”+(300-color)+”,0

,1)”;b.beginPath();b.arc(this.position.x,this.position.y,sp

eed+2,0,Math.PI*2,false);b.fill()}}gc=”globalCompositionOpe

ration”;new (function System(){var a=document.getElementByI

d(“c”);a.width=a.height=300;var c=a.getContext(“2d”);var d=

[];for(var b=0;b<30;b++){d.push(new Particle(a))}setInterva

l(function(){c[gc]=”source-over”;c.fillStyle=”rgba(0,0,0,0.

2)”;c.fillRect(0,0,a.width,a.height);c[gc]=”lighter”;for(va

r l=0,g=30;l<g;l++){var f=d[l];for(var h=l+1;h<30;h++){var

e=d[h];var k=f.position.sub(e.position);k.idiv(Math.pow(Mat

h.max(10,k.size()),3)/5);e.acceleration.iadd(k);f.accelerat

ion.isub(k)}f.step(c)}},40)});

 

Waa shaqo si fiican, laakiin waxaa jira waxyaabo laba in uu noqon karaa more isafgaradka, sida magacyada fasalka dulinka / Qurub iyo farabadan of magacyada habka.

 

Tallaabada 2 – Compact more qaar ka mid ah

Waxaa jira a qaabka caddaalad ah oo fudud, waxaan dili karaa la hadal joogto ah. Dabcan ma ahan ammaan si ay sidaa sameeyaan guud ahaan, laakiin code gaaban oo qoraal ah in la Tifaftirayaasha, waa ok. Sidaas daraaddeed waxaan u qoray qoraal gaaban in ka badan qaar ka mid ah nadiifinta:

python

#!/usr/bin/env python

 

import re, sys

 

method_re = re.compile(r’\.([a-zA-Z]+)’)

objects_re = re.compile(‘new ([a-zA-Z]+)’)

blacklist = [

‘length’, ‘random’, ‘getElementById’,

‘getContext’, ‘beginPath’, ‘arc’, ‘fill’, ‘push’,

‘globalCompositeOperation’, ‘fillStyle’, ‘pos’,

‘pow’, ‘sqrt’, ‘width’, ‘height’, ‘PI’, ‘fillRect’,

‘log’, ‘join’, ’round’, ‘min’, ‘max’,

]

 

if __name__ == ‘__main__’:

source = sys.stdin.read()

 

# replace method names with short symbols

method_names = ‘abcdefghijklmnopqrstuvwxyz’

methods = set()

for match in method_re.finditer(source):

method = match.group(1)

if method not in blacklist and len(method) > 1:

methods.add(method)

 

i = 0

for method in methods:

source = source.replace(method, method_names[i])

i += 1

 

# replace obvious objectnames with symbols

objects = set()

for match in objects_re.finditer(source):

name = match.group(1)

objects.add(name)

 

object_names = ‘abcdefghijklmnopqrstuvwxyz’.upper()

 

i = 0

for name in objects:

source = source.replace(name, object_names[i])

i += 1

 

print source

 

Maxay waxan sameeyaa waa ka fiiriyaana magacyada hab cad iyo haddii ay tahay in aan liiska madoow habka, beddelayo code hal qof ah. Markaas waxa ay u muuqataa magacyada wax cad, oo iyaga badalay leh code hal qof ah. Haddii aad ku qor code si cadaalad ah oo nadiif ah, noocaas ah bedelay macquul ah ammaan, iyo sidoo kale, waayo, demo 1024 byte ah si kasta.

 

Sidaas aan aragno waxa aan ka helay:

 

bash

yui-compressor gravity.js | ./compact > script.js

ls -l script.js

-rw-r–r– 1 pyalot pyalot 1462 2010-08-25 16:05 script.js

Yeah, that’s not much better, but I got to 1462 bytes, shaved off 338 bytes. Better then nothing, but still too large.

 

Aynu eegno at saarka mar kale iyo arko haddii aan seegay wax:

javascript

function A(a,b){this.x=a;this.y=b;this.c=function(c){return

new A(this.x-c.x,this.y-c.y)};this.b=function(c){this.x-=c

.x;this.y-=c.y};this.j=function(c){this.x+=c.x;this.y+=c.y}

;this.k=function(){return Math.sqrt(this.x*this.x+this.y*th

is.y)};this.e=function(c){this.x/=c;this.y/=c};this.f=funct

ion(){this.x=this.y=0};this.i=function(){if(isNaN(this.x+th

is.y)){this.x=this.y=0}}}function B(a){this.a=new A(0,0);th

is.h=new A(Math.random()-0.5,Math.random()-0.5);this.g=new

A(Math.random()*a.width,Math.random()*a.height);this.d=func

tion(b){this.a.i();this.h.j(this.a);speed=this.h.k();if(spe

ed>4){this.h.e(speed/4);speed=4}this.g.j(this.h);this.a.f()

;if(this.g.x<0){this.g.x=0;this.h.x/=-2}else{if(this.g.x>a.

width){this.g.x=a.width;this.h.x/=-2}}if(this.g.y<0){this.g

.y=0;this.h.y/=-2}else{if(this.g.y>a.height){this.g.y=a.hei

ght;this.h.y/=-2}}color=Math.round(speed*128);b.fillStyle=”

rgba(“+(color-50)+”,”+(300-color)+”,0,1)”;b.beginPath();b.a

rc(this.g.x,this.g.y,speed+2,0,Math.PI*2,false);b.fill()}}g

c=”globalComgOperation”;new (function System(){var a=docume

nt.getElementById(“c”);a.width=a.height=300;var c=a.getCont

ext(“2d”);var d=[];for(var b=0;b<30;b++){d.push(new B(a))}s

etInterval(function(){c[gc]=”source-over”;c.fillStyle=”rgba

(0,0,0,0.2)”;c.fillRect(0,0,a.width,a.height);c[gc]=”lighte

r”;for(var l=0,g=30;l<g;l++){var f=d[l];for(var h=l+1;h<30;

h++){var e=d[h];var k=f.g.c(e.g);k.e(Math.pow(Math.max(10,k

.k()),3)/5);e.a.j(k);f.a.b(k)}f.d(c)}},40)});

 

Nope, ma u eg maqan badan. Sidaas waxa hadda?

 

Tallaabada 3 – Waxaan mas’uul kama aad saaxiga ahay

Hordhaca: wax kasta oo hadda soo socota waa si buuxda aan waxtar lahayn loo isticmaalo guud marka laga reebo 1k tartamada demo iyo wixii la mid ah.

 

code ma weli ku jira isku xigxigta, in lagu celiyo si caddaalad badanaa, sida “this.” ama “shaqo” iwm, anigu ma minify karaan kuwa la komberesarrada caadiga ah sababta oo ah waxay tahay keywords javascript. Laakiin haddii aan lahaa qaar ka mid ah hab si ay u Khushuucsan …

 

Waxa aan u baahanahay waa isku geynta riixo dhabta ah. Dhibaatadu waxay la kuwa ay tahay in ay sida caadiga ah la dhiso, miisaskii oo calaamad oo ay isticmaalaan codes multi-byte si ay u index. Waxaan aadan haysan meel noocaas ah wax. Laakiin waxaan u baahanahay miiska calaamad iyo indices u galay ee sort qaar ka mid ah.

 

characters Free?

Aan aragno haddii ay jiraan wax ASCII characters isticmaalin, waayo xeerka.

python

def find_free(data):

exclude = set(‘\n\\\r\t\x0b\x0c\'”‘)

candidates = set(string.printable) – exclude

return candidates – set(data)

 

if __name__ == ‘__main__’:

data = sys.stdin.read().strip()

data = data.replace(‘”‘, “‘”).replace(‘\n’, ‘\\n’)

keys = find_free(data)

print len(keys), keys

I’m excluding line feed (\n), carriage return (\r), vertical tab (x0b) double quotation mark (“) and form feed new page (x0c) because those don’t work in javascript strings without escaping. I’m excluding tab (\t) because I want to use that later. So indeed, there are 32 characters unused!

 

!#%$&769:?@DGFHKJLQUTWVYXZ_^`z|~

Pattern matching

If I could replace common patterns in the code with these characters, I could save space. I need to find common patterns, so let’s enumerate all patterns the code has.

 

python

def window_iter(data, length):

for i in range(len(data) – length):

yield data[i:i+length]

This is a sliding window iterator, if you where to call with this code:

 

python

for window in window_iter(data, 10): print window

You would get something like this:

 

&h.xLelse{

h.xLelse{i

.xLelse{if

xLelse{if(

Lelse{if(9

else{if(9x

lse{if(9x>

se{if(9x>:

e{if(9x>:$

 

ku habboon Qaabka

Haddii aan laga yaabaa inuu bedelo qaabka caadiga ah ee code la characters, kuwaas oo, waan badbaadin karin boos. Waxaan u baahan tahay si aad u ogaato qaabka caadiga ah, si aynu koobi dhan qaababka code leeyahay.

python

def find_best(data):

symbols = {}

for size in range(2, 100):

hit = False

for window in window_iter(data, size):

count = data.count(window)

if count > 1:

hit = True

symbols[window] =  count * size

if not hit:

break

 

return sorted(symbols, key=symbols.__getitem__)[-1]

Actual compression

So I replace all these best occurrences successively until no more keys are left:

 

python

if __name__ == ‘__main__’:

data = sys.stdin.read().strip()

data = data.replace(‘”‘, “‘”).replace(‘\n’, ‘\\n’)

keys = find_free(data)

 

symbols = []

while keys:

sequence = find_best(data)

key = keys.pop()

data = data.replace(sequence, key)

symbols.insert(0, key+sequence)

Putting it back together

But now I have a symbol table with key/sequence relationships and a chunk of unreadable text. I need a javascript algorithm that can decompress it.

 

javascript

symbols = “the symbol table”.split(“\t”);

data = “the remaining data”;

do{

previous = data.length;

for(i=0; i<symbols.length; i++){

symbol = symbols[i];

key = symbol[0];

sequence = symbol.substr(1,99);

data = data.replace(key, sequence);

}

}

while(previous != data.length)

eval(data);

This algorithm expands each symbol in the data string with the sequence from the table for that key, until data does not expand anymore.

 

So I need to wrap my symbol table and data into that template, and the template needs to be pretty short too:

 

python

def wrap(symbols, data):

template = ”’\

s=”%(symbols)s”.split(“%(symsep)s”);d=”%(data)s”;\

do{p=d.length;for(i=0;i<s.length;i++){q=s[i];\

d=d.replace(q[0],q.substr(1,99))}}while(p!=d.length)\

eval(d)”’

return template % {

‘symsep’    : ‘\t’,

‘data’      : data,

‘symbols’   : ‘\t’.join(symbols),

}

 

Sidaas daraaddeed, haddaan tirisaan dhacdooyinka of hannaanka ee isha iyo xisaabiyo dhacdo * Len (nidaam), waan ogahay taas oo ka mid ah waa sida ugu fiican. Tani waa waxa shaqo find_best ku sameeya.

 

 

riixo dhabta ah

Sidaas daraaddeed waxaan u bedelo, kuwaas oo dhan dhacdooyinka fiican israac ilaa furayaasha ka badan hadhay,

python

def find_best(data):

symbols = {}

for size in range(2, 100):

hit = False

for window in window_iter(data, size):

count = data.count(window)

if count > 1:

hit = True

symbols[window] =  count * size

if not hit:

break

 

return sorted(symbols, key=symbols.__getitem__)[-1]

 

waxaa wada Habeynta dib

Laakiin hadda waxaan u leeyihiin miiska calaamad la xiriir / muhiimka ah xigxiga iyo waslad ka mid ah qoraalka aqrin. Waxaan u baahanahay geynta javascript in decompress karo.

python

if __name__ == ‘__main__’:

data = sys.stdin.read().strip()

data = data.replace(‘”‘, “‘”).replace(‘\n’, ‘\\n’)

keys = find_free(data)

 

symbols = []

while keys:

sequence = find_best(data)

key = keys.pop()

data = data.replace(sequence, key)

symbols.insert(0, key+sequence)

 

geynta Tani ballaarinaya calaamad kasta oo string xogta la isku xigxiga miiska ee muhiimka ah in ka, ilaa xogta aanu mar dambe ballaariyo.

javascript

symbols = “the symbol table”.split(“\t”);

data = “the remaining data”;

do{

previous = data.length;

for(i=0; i<symbols.length; i++){

symbol = symbols[i];

key = symbol[0];

sequence = symbol.substr(1,99);

data = data.replace(key, sequence);

}

}

while(previous != data.length)

eval(data);

Sidaas daraaddeed waxaan u baahan tahay inaad duub miiskayga calaamad iyo xogta galay template in, iyo template u baahan yahay in uu noqdo quruxsan gaaban aad:

python

def wrap(symbols, data):

template = ”’\

s=”%(symbols)s”.split(“%(symsep)s”);d=”%(data)s”;\

do{p=d.length;for(i=0;i<s.length;i++){q=s[i];\

d=d.replace(q[0],q.substr(1,99))}}while(p!=d.length)\

eval(d)”’

return template % {

‘symsep’    : ‘\t’,

‘data’      : data,

‘symbols’   : ‘\t’.join(symbols),

}

Ugu dambayntiina xeerka Tifaftirayaasha waa inay noqdaan kuwo wax soo saarka si ay stdout:

 

python

print wrap(symbols, data)

This can’t possibly work or can it?

There’s about 120 bytes additional code for the decompressor and the symbol table format has two more bytes for each of the 32 symbols. So doing some math, 1024-120-64 = 872 bytes. I need a compression before serialization to 57% of its original size, and just with 32 symbols. Did it work?

 

bash

yui-compressor gravity.js | ./compact | ./jszip > script.js

-rw-r–r– 1 pyalot pyalot 1012 2010-08-25 17:25 script.js

Yes! It does work, I’m down to 1012 bytes! If you care to check it out you can see that it’s also valid javascript after decompression.

 

Sidee code hadda fiiri? Cajiib ah.

javascript

s=”~=Yx&y   |a. zif(    `Kc$J   ^=HA(   _/=-2}} Zc[gc]=’

X)} Yc. V&h.    Wcolor  T$return    U$J=!y=0}   Q;for(6 L0

, J!x K=#(    Hnew    FMath.  G); D/=-2}else{if(? @speed  ?

!g.    :.fillStyle=’rgba(  9a.width    6var    7a.height

&;! $){ %Math.random()  #function   !this.”.split(” “);d=”#

A(a,b$J=a&y=b&cKcTHA(J-Yx,!y-YyX&b`-~-=Yy}&j`+~+=Yy}&kKTFs

qrt(J*J+!y*!yX&e`/=c&y/=c}&fKU&iK$zisNaN(J+!y)U}}# B(a$!a^L

0)&h^%-0.5,%-0.5)&g^%*9,%*7)&dKb$!|i()Vj(!aG@=!h.k(Gz@>4$!h

.e(@/4G@=4}?j(!h)&|f(Gz?x<0$?x=0VxDx>9$?x=9Vx_z?y<0$?y=0VyD

y>7$?y=7Vy_W=Fround(@*128Gb:’+(W-50)+’,’+(300-W)+’,L1)’;b.b

eginPath(Gb.arc(?x,?y,@+2,LFPI*2,falseGb.fill(X}gc=’globalC

omgOperation’;H(# System($6a=document.getElementById(‘c’G9=

7=300;6c=|getContext(‘2d’G6d=[]Qb=0;b<30;b++$d.push(HB(a)Xs

etInterval(#($Zsource-over’;c:LLL0.2)’;YfillRect(LL9,7GZlig

hter’Ql=Lg=30;l<g;l++$6f=d[l]Qh=l+1;h<30;h++$6e=d[h];6k=f.g

.c(e.gGk.e(Fpow(Fmax(1Lk.k()),3)/5Ge.|j(kGf.|b(kXf.d(cX},40

XG”;do{p=d.length;for(i=0;i<s.length;i++){q=s[i];d=d.replac

e(q[0],q.substr(1,99))}}while(p!=d.length)eval(d)

 

Sidaas aan soo gudbinta js1k iska yahay, oo aan isagoo rajo weyn ka Xusuuso sharaf nidaamka riixo ugu baahsan 🙂