
2010-09-17 21:42:19

GNU grep speed comparison: fixed strings

sfgrep was designed specifically for searching log files for fixed strings. After a bug fix, some tests with gigabytes of data had to be run. GNU (e)grep was invoked with the files passed directly as arguments. The same task was done with multiple invocations of sfgrep, fed by an additional cat process.

The log files were about 540 MB in size when I ran the comparison:

root@log:~ > time egrep -h -v \
  'disconn|connect|localhost|timeout|2010-09-17T1[3456]' \
  /var/log/cluster/mail*/*/postfix/smtpd |wc -l
217649
 
real    109m38.380s
user    3m53.587s
sys     105m9.866s

Oops! I use sfgrep most of the time, and at this point I wondered whether GNU grep would ever finish its task. Now let's give the little intruder its chance:

root@log:~ > time cat \
  /var/log/cluster/mail*/*/postfix/smtpd \
  |sfgrep -v disconn |sfgrep -v connect \
  |sfgrep -v localhost |sfgrep -v timeout \
  |sfgrep -v 2010-09-17T13 |sfgrep -v 2010-09-17T14 \
  |sfgrep -v 2010-09-17T15 |sfgrep -v 2010-09-17T16 \
  |wc -l
217649
 
real    0m13.734s
user    0m4.272s
sys     0m2.460s

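sfgrep uses open()/read() and has no search algorithm like Boyer-Moore (BM) or Boyer-Moore-Horspool (BMH), yet in this test it was approx. 480 times faster than GNU grep in wall-clock time (109m38s versus under 14s). Funny thing. :-)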
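For anyone who wants to check the factor, here is a small sketch of the arithmetic. It only uses the two "real" values from the runs above; the variable names are made up for this example.

```shell
# Wall-clock ("real") times copied from the two runs above, in seconds.
egrep_real=$(awk 'BEGIN { print 109 * 60 + 38.380 }')
sfgrep_real=13.734

# Ratio of the two wall-clock times, rounded to a whole number.
speedup=$(awk -v a="$egrep_real" -v b="$sfgrep_real" 'BEGIN { printf "%.0f", a / b }')
echo "speedup: ${speedup}x"
```

This compares elapsed time only. The user and sys columns tell their own story: almost all of the egrep run was spent in sys.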

Don't trust them if they tell you "why GNU grep is fast". Always trust your stopwatch, and never just believe that your code is fast.
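If you want to repeat this kind of check without sfgrep, the following is a minimal sketch under some assumptions: it uses GNU grep's fixed-string mode (-F) as a stand-in for sfgrep, and the log lines are synthetic, not the real mail logs. The function names and the sample file are hypothetical. It first makes sure that both variants give the same count, because a timing comparison is worthless if the results differ.

```shell
# Build a small synthetic log (made-up content for illustration only).
sample=$(mktemp)
awk 'BEGIN {
  for (i = 1; i <= 2000; i++) {
    print "2010-09-17T12:00:00 mail postfix/smtpd[" i "]: connect from unknown"
    print "2010-09-17T13:00:00 mail postfix/smtpd[" i "]: NOQUEUE: reject"
    print "2010-09-17T12:30:00 mail postfix/smtpd[" i "]: NOQUEUE: reject"
    print "2010-09-17T12:45:00 mail postfix/smtpd[" i "]: timeout after DATA"
  }
}' > "$sample"

# Variant 1: one extended regular expression, as in the egrep run.
regex_count() {
  grep -E -v 'disconn|connect|localhost|timeout|2010-09-17T1[3456]' "$1" | wc -l
}

# Variant 2: a chain of fixed-string filters, mirroring the sfgrep pipeline.
fixed_count() {
  cat "$1" \
    | grep -F -v disconn | grep -F -v connect \
    | grep -F -v localhost | grep -F -v timeout \
    | grep -F -v 2010-09-17T13 | grep -F -v 2010-09-17T14 \
    | grep -F -v 2010-09-17T15 | grep -F -v 2010-09-17T16 \
    | wc -l
}

count_regex=$(regex_count "$sample")
count_fixed=$(fixed_count "$sample")
echo "regex: $count_regex  fixed: $count_fixed"

rm -f "$sample"
```

Once the counts agree, put `time` in front of each call (for example `time regex_count /path/to/log` in bash) and let the stopwatch decide. The numbers will depend on the grep version, the locale and the data, so don't expect them to match mine.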


Posted by Frank W. Bergmann | Permanent link | File under: spitzensoftware, logging, shell