Commit 6b0e720d authored by Tim Peters's avatar Tim Peters

A large number of changes so that cache tracing isn't

100% broken -- I can get a trace file now, and stats.py
doesn't blow up anymore.  Don't yet know whether it's
reporting sensible things, but it's not obviously insane.
simul.py doesn't know about the current cache implementation
either yet.
parent fdb78a64
...@@ -13,29 +13,32 @@ cache; the ZEO client cache avoids roundtrips to the ZEO server. ...@@ -13,29 +13,32 @@ cache; the ZEO client cache avoids roundtrips to the ZEO server.
Enabling Cache Tracing Enabling Cache Tracing
---------------------- ----------------------
To enable cache tracing, set the environment variable ZEO_CACHE_TRACE To enable cache tracing, you must use a persistent cache (specify a ``client``
to the name of a file to which the ZEO client process can write. ZEO name), and set the environment variable ZEO_CACHE_TRACE. The path to the
will append a hyphen and the storage name to the filename, to trace file is derived from the path to the persistent cache file by appending
distinguish different storages. If the file doesn't exist, the ZEO ".trace". If the file doesn't exist, ZEO will try to create it. If the file
will try to create it. If there are problems with the file, a log does exist, it's opened for appending (previous trace information is not
message is written to the standard Zope log file. To start or stop overwritten). If there are problems with the file, a warning message is
tracing, the ZEO client process (typically a Zope application server) logged. To start or stop tracing, the ZEO client process (typically a Zope
must be restarted. application server) must be restarted.
The trace file can grow pretty quickly; on a moderately loaded server, The trace file can grow pretty quickly; on a moderately loaded server, we
we observed it growing by 5 MB per hour. The file consists of binary observed it growing by 5 MB per hour. The file consists of binary records,
records, each 24 bytes long; a detailed description of the record each 26 bytes long if 8-byte oids are in use; a detailed description of the
lay-out is given in stats.py. No sensitive data is logged. record lay-out is given in stats.py. No sensitive data is logged: data
record sizes and binary object and transaction ids are logged, but no
information about object types or names, user names, version names,
transaction comments, access paths, or machine information such as machine
name or IP address.
Analyzing a Cache Trace Analyzing a Cache Trace
----------------------- -----------------------
The stats.py command-line tool is the first-line tool to analyze a The stats.py command-line tool is the first-line tool to analyze a cache
cache trace. Its default output consists of two parts: a one-line trace. Its default output consists of two parts: a one-line summary of
summary of essential statistics for each segment of 15 minutes, essential statistics for each segment of 15 minutes, interspersed with lines
interspersed with lines indicating client restarts and "cache flip indicating client restarts, followed by a more detailed summary of overall
events" (more about those later), followed by a more detailed summary statistics.
of overall statistics.
The most important statistic is probably the "hit rate", a percentage The most important statistic is probably the "hit rate", a percentage
indicating how many requests to load an object could be satisfied from indicating how many requests to load an object could be satisfied from
...@@ -43,9 +46,9 @@ the cache. Hit rates around 70% are good. 90% is probably close to ...@@ -43,9 +46,9 @@ the cache. Hit rates around 70% are good. 90% is probably close to
the theoretical maximum. If you see a hit rate under 60% you can the theoretical maximum. If you see a hit rate under 60% you can
probably improve the cache performance (and hence your Zope probably improve the cache performance (and hence your Zope
application server's performance) by increasing the ZEO cache size. application server's performance) by increasing the ZEO cache size.
This is normally configured using the cache_size keyword argument to This is normally configured using cache_size key in the ``zeoclient``
the ClientStorage() constructor in your custom_zodb.py file. The section of your configuration file. The default cache size is 20 MB, which
default cache size is 20 MB. is very small.
The stats.py tool shows its command line syntax when invoked without The stats.py tool shows its command line syntax when invoked without
arguments. The tracefile argument can be a gzipped file if it has a arguments. The tracefile argument can be a gzipped file if it has a
...@@ -95,32 +98,3 @@ the cache only helps if an object is loaded more than once. ...@@ -95,32 +98,3 @@ the cache only helps if an object is loaded more than once.
The simul.py tool also supports simulating different cache The simul.py tool also supports simulating different cache
strategies. Since none of these are implemented, these are not strategies. Since none of these are implemented, these are not
further documented here. further documented here.
Cache Flips
-----------
The cache uses two files, which are managed as follows:
- Data are written to file 0 until file 0 exceeds limit/2 in size.
- Data are written to file 1 until file 1 exceeds limit/2 in size.
- File 0 is truncated to size 0 (or deleted and recreated).
- Data are written to file 0 until file 0 exceeds limit/2 in size.
- File 1 is truncated to size 0 (or deleted and recreated).
- Data are written to file 1 until file 1 exceeds limit/2 in size.
and so on.
A switch from file 0 to file 1 is called a "cache flip". At all cache
flips except the first, half of the cache contents is wiped out. This
affects cache performance. How badly this impact is can be seen from
the per-15-minutes summaries printed by stats.py. The -i option lets
you choose a smaller summary interval which shows the impact more
acutely.
The simul.py tool shows the number of cache flips in the FLIPS column.
If you see more than one flip per hour the cache may be too small.
...@@ -76,15 +76,10 @@ class ClientCache(object): ...@@ -76,15 +76,10 @@ class ClientCache(object):
# default of 20MB. The default here is misleading, though, since # default of 20MB. The default here is misleading, though, since
# ClientStorage is the only user of ClientCache, and it always passes an # ClientStorage is the only user of ClientCache, and it always passes an
# explicit size of its own choosing. # explicit size of its own choosing.
def __init__(self, path=None, size=200*1024**2, trace=False): def __init__(self, path=None, size=200*1024**2):
self.path = path self.path = path
self.size = size self.size = size
if trace and path:
self._setup_trace()
else:
self._trace = self._notrace
# The cache stores objects in a dict mapping (oid, tid) pairs # The cache stores objects in a dict mapping (oid, tid) pairs
# to Object() records (see below). The tid is the transaction # to Object() records (see below). The tid is the transaction
# id that wrote the object. An object record includes data, # id that wrote the object. An object record includes data,
...@@ -111,6 +106,8 @@ class ClientCache(object): ...@@ -111,6 +106,8 @@ class ClientCache(object):
# and retrieving objects to/from the cache file. # and retrieving objects to/from the cache file.
self.fc = FileCache(size, self.path, self) self.fc = FileCache(size, self.path, self)
self._setup_trace(self.path)
def open(self): def open(self):
self.fc.scan(self.install) self.fc.scan(self.install)
...@@ -140,6 +137,10 @@ class ClientCache(object): ...@@ -140,6 +137,10 @@ class ClientCache(object):
def close(self): def close(self):
self.fc.close() self.fc.close()
if self._tracefile:
sync(self._tracefile)
self._tracefile.close()
self._tracefile = None
## ##
# Set the last transaction seen by the cache. # Set the last transaction seen by the cache.
...@@ -430,19 +431,29 @@ class ClientCache(object): ...@@ -430,19 +431,29 @@ class ClientCache(object):
L = self.noncurrent[oid] L = self.noncurrent[oid]
L.remove((o.start_tid, o.end_tid)) L.remove((o.start_tid, o.end_tid))
def _setup_trace(self): # If `path` isn't None (== we're using a persistent cache file), and
tfn = self.path + ".trace" # envar ZEO_CACHE_TRACE is set to a non-empty value, try to open
self.tracefile = None # path+'.trace' as a trace file, and store the file object in
# self._tracefile. If not, or we can't write to the trace file, disable
# tracing by setting self._trace to a dummy function, and set
# self._tracefile to None.
def _setup_trace(self, path):
self._tracefile = None
if path and os.environ.get("ZEO_CACHE_TRACE"):
tfn = path + ".trace"
try: try:
self.tracefile = open(tfn, "ab") self._tracefile = open(tfn, "ab")
self._trace(0x00) self._trace(0x00)
except IOError, msg: except IOError, msg:
self.tracefile = None self._tracefile = None
logger.warning("Could not write to trace file %s: %s", logger.warning("cannot write tracefile %r (%s)", tfn, msg)
tfn, msg) else:
logger.info("opened tracefile %r", tfn)
def _notrace(self, *arg, **kwargs): if self._tracefile is None:
def notrace(*args, **kws):
pass pass
self._trace = notrace
def _trace(self, def _trace(self,
code, oid="", version="", tid="", end_tid=z64, dlen=0, code, oid="", version="", tid="", end_tid=z64, dlen=0,
...@@ -462,7 +473,7 @@ class ClientCache(object): ...@@ -462,7 +473,7 @@ class ClientCache(object):
if end_tid is None: if end_tid is None:
end_tid = z64 end_tid = z64
try: try:
self.tracefile.write( self._tracefile.write(
struct_pack(">iiH8s8s", struct_pack(">iiH8s8s",
time_time(), time_time(),
encoded, encoded,
......
...@@ -26,8 +26,8 @@ Usage: stats.py [-h] [-i interval] [-q] [-s] [-S] [-v] [-X] tracefile ...@@ -26,8 +26,8 @@ Usage: stats.py [-h] [-i interval] [-q] [-s] [-S] [-v] [-X] tracefile
"""File format: """File format:
Each record is 24 bytes, with the following layout. Numbers are Each record is 18 bytes, plus a variable number of bytes to store an oid,
big-endian integers. with the following layout. Numbers are big-endian integers.
Offset Size Contents Offset Size Contents
...@@ -46,6 +46,9 @@ Mask bits Contents ...@@ -46,6 +46,9 @@ Mask bits Contents
0x7e 6 function and outcome code 0x7e 6 function and outcome code
0x01 1 current cache file (0 or 1) 0x01 1 current cache file (0 or 1)
The "current cache file" bit is no longer used; it refers to a 2-file
cache scheme used before ZODB 3.3.
The function and outcome codes are documented in detail at the end of The function and outcome codes are documented in detail at the end of
this file in the 'explain' dictionary. Note that the keys there (and this file in the 'explain' dictionary. Note that the keys there (and
also the arguments to _trace() in ClientStorage.py) are 'code & 0x7e', also the arguments to _trace() in ClientStorage.py) are 'code & 0x7e',
...@@ -59,18 +62,18 @@ import struct ...@@ -59,18 +62,18 @@ import struct
from types import StringType from types import StringType
def usage(msg): def usage(msg):
print >>sys.stderr, msg print >> sys.stderr, msg
print >>sys.stderr, __doc__ print >> sys.stderr, __doc__
def main(): def main():
# Parse options # Parse options
verbose = 0 verbose = False
quiet = 0 quiet = False
dostats = 1 dostats = True
print_size_histogram = 0 print_size_histogram = False
print_histogram = 0 print_histogram = False
interval = 900 # Every 15 minutes interval = 15*60 # Every 15 minutes
heuristic = 0 heuristic = False
try: try:
opts, args = getopt.getopt(sys.argv[1:], "hi:qsSvX") opts, args = getopt.getopt(sys.argv[1:], "hi:qsSvX")
except getopt.error, msg: except getopt.error, msg:
...@@ -78,24 +81,27 @@ def main(): ...@@ -78,24 +81,27 @@ def main():
return 2 return 2
for o, a in opts: for o, a in opts:
if o == '-h': if o == '-h':
print_histogram = 1 print_histogram = True
if o == "-i": elif o == "-i":
interval = int(60 * float(a)) interval = int(60 * float(a))
if interval <= 0: if interval <= 0:
interval = 60 interval = 60
elif interval > 3600: elif interval > 3600:
interval = 3600 interval = 3600
if o == "-q": elif o == "-q":
quiet = 1 quiet = True
verbose = 0 verbose = False
if o == "-s": elif o == "-s":
print_size_histogram = 1 print_size_histogram = True
if o == "-S": elif o == "-S":
dostats = 0 dostats = False
if o == "-v": elif o == "-v":
verbose = 1 verbose = True
if o == '-X': elif o == '-X':
heuristic = 1 heuristic = True
else:
assert False, (o, opt)
if len(args) != 1: if len(args) != 1:
usage("exactly one file argument required") usage("exactly one file argument required")
return 2 return 2
...@@ -107,12 +113,12 @@ def main(): ...@@ -107,12 +113,12 @@ def main():
try: try:
import gzip import gzip
except ImportError: except ImportError:
print >>sys.stderr, "can't read gzipped files (no module gzip)" print >> sys.stderr, "can't read gzipped files (no module gzip)"
return 1 return 1
try: try:
f = gzip.open(filename, "rb") f = gzip.open(filename, "rb")
except IOError, msg: except IOError, msg:
print >>sys.stderr, "can't open %s: %s" % (filename, msg) print >> sys.stderr, "can't open %s: %s" % (filename, msg)
return 1 return 1
elif filename == '-': elif filename == '-':
# Read from stdin # Read from stdin
...@@ -122,7 +128,7 @@ def main(): ...@@ -122,7 +128,7 @@ def main():
try: try:
f = open(filename, "rb") f = open(filename, "rb")
except IOError, msg: except IOError, msg:
print >>sys.stderr, "can't open %s: %s" % (filename, msg) print >> sys.stderr, "can't open %s: %s" % (filename, msg)
return 1 return 1
# Read file, gathering statistics, and printing each record if verbose # Read file, gathering statistics, and printing each record if verbose
...@@ -203,9 +209,8 @@ def main(): ...@@ -203,9 +209,8 @@ def main():
bysizew[dlen] = d = bysizew.get(dlen) or {} bysizew[dlen] = d = bysizew.get(dlen) or {}
d[oid] = d.get(oid, 0) + 1 d[oid] = d.get(oid, 0) + 1
if verbose: if verbose:
print "%s %d %02x %s %016x %016x %1s %s" % ( print "%s %02x %s %016x %016x %c %s" % (
time.ctime(ts)[4:-5], time.ctime(ts)[4:-5],
current,
code, code,
oid_repr(oid), oid_repr(oid),
U64(start_tid), U64(start_tid),
...@@ -234,7 +239,7 @@ def main(): ...@@ -234,7 +239,7 @@ def main():
# Error if nothing was read # Error if nothing was read
if not records: if not records:
print >>sys.stderr, "No records processed" print >> sys.stderr, "No records processed"
return 1 return 1
# Print statistics # Print statistics
...@@ -261,7 +266,7 @@ def main(): ...@@ -261,7 +266,7 @@ def main():
code, code,
explain.get(code) or "*** unknown code ***") explain.get(code) or "*** unknown code ***")
# Print histogram # Print histogram.
if print_histogram: if print_histogram:
print print
print "Histogram of object load frequency" print "Histogram of object load frequency"
...@@ -281,7 +286,7 @@ def main(): ...@@ -281,7 +286,7 @@ def main():
print fmt % (binsize, addcommas(count), print fmt % (binsize, addcommas(count),
obj_percent, load_percent, cum) obj_percent, load_percent, cum)
# Print size histogram # Print size histogram.
if print_size_histogram: if print_size_histogram:
print print
print "Histograms of object sizes" print "Histograms of object sizes"
...@@ -325,7 +330,7 @@ def dumpbyinterval(byinterval, h0, he): ...@@ -325,7 +330,7 @@ def dumpbyinterval(byinterval, h0, he):
def hitrate(bycode): def hitrate(bycode):
loads = 0 loads = 0
hits = 0 hits = 0
for code in bycode.keys(): for code in bycode:
if code & 0x70 == 0x20: if code & 0x70 == 0x20:
n = bycode[code] n = bycode[code]
loads += n loads += n
...@@ -345,8 +350,7 @@ def histogram(d): ...@@ -345,8 +350,7 @@ def histogram(d):
return L return L
def U64(s): def U64(s):
h, v = struct.unpack(">II", s) return struct.unpack(">Q", s)[0]
return (long(h) << 32) + v
def oid_repr(oid): def oid_repr(oid):
if isinstance(oid, StringType) and len(oid) == 8: if isinstance(oid, StringType) and len(oid) == 8:
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment