Merge rev 37375 from 3.4 branch.

A large number of changes so that cache tracing isn't 100% broken -- I can get a trace file now, and stats.py doesn't blow up anymore. Don't yet know whether it's reporting sensible things, but it's not obviously insane. simul.py doesn't know about the current cache implementation either yet.

Merge rev 37375 from 3.4 branch.
A large number of changes so that cache tracing isn't 100% broken -- I can get a trace file now, and stats.py doesn't blow up anymore. Don't yet know whether it's reporting sensible things, but it's not obviously insane. simul.py doesn't know about the current cache implementation either yet.
8ab7eb8e · Tim Peters · 6140445b · 8ab7eb8e · 8ab7eb8e · 8ab7eb8e
Commit 8ab7eb8e authored 19 years ago by Tim Peters
Hide whitespace changes
Inline Side-by-side

Showing with 95 additions and 106 deletions

doc/ZEO/trace.txt doc/ZEO/trace.txt +26 -52

src/ZEO/cache.py src/ZEO/cache.py +30 -19

src/ZEO/stats.py src/ZEO/stats.py +39 -35

No files found.
--- a/doc/ZEO/trace.txt
+++ b/doc/ZEO/trace.txt
@@ -13,29 +13,32 @@ cache; the ZEO client cache avoids roundtrips to the ZEO server.
 Enabling Cache Tracing
 ----------------------

-To enable cache tracing, set the environment variable ZEO_CACHE_TRACE
-to the name of a file to which the ZEO client process can write.  ZEO
-will append a hyphen and the storage name to the filename, to
-distinguish different storages.  If the file doesn't exist, the ZEO
-will try to create it.  If there are problems with the file, a log
-message is written to the standard Zope log file.  To start or stop
-tracing, the ZEO client process (typically a Zope application server)
-must be restarted.
-
-The trace file can grow pretty quickly; on a moderately loaded server,
-we observed it growing by 5 MB per hour.  The file consists of binary
-records, each 24 bytes long; a detailed description of the record
-lay-out is given in stats.py.  No sensitive data is logged.
+To enable cache tracing, you must use a persistent cache (specify a ``client``
+name), and set the environment variable ZEO_CACHE_TRACE.  The path to the
+trace file is derived from the path to the persistent cache file by appending
+".trace".  If the file doesn't exist, ZEO will try to create it.  If the file
+does exist, it's opened for appending (previous trace information is not
+overwritten).  If there are problems with the file, a warning message is
+logged.  To start or stop tracing, the ZEO client process (typically a Zope
+application server) must be restarted.
+
+The trace file can grow pretty quickly; on a moderately loaded server, we
+observed it growing by 5 MB per hour.  The file consists of binary records,
+each 26 bytes long if 8-byte oids are in use; a detailed description of the
+record lay-out is given in stats.py.  No sensitive data is logged:  data
+record sizes and binary object and transaction ids are logged, but no
+information about object types or names, user names, version names,
+transaction comments, access paths, or machine information such as machine
+name or IP address.

 Analyzing a Cache Trace
 -----------------------

-The stats.py command-line tool is the first-line tool to analyze a
-cache trace.  Its default output consists of two parts: a one-line
-summary of essential statistics for each segment of 15 minutes,
-interspersed with lines indicating client restarts and "cache flip
-events" (more about those later), followed by a more detailed summary
-of overall statistics.
+The stats.py command-line tool is the first-line tool to analyze a cache
+trace.  Its default output consists of two parts:  a one-line summary of
+essential statistics for each segment of 15 minutes, interspersed with lines
+indicating client restarts, followed by a more detailed summary of overall
+statistics.

 The most important statistic is probably the "hit rate", a percentage
 indicating how many requests to load an object could be satisfied from
@@ -43,9 +46,9 @@ the cache.  Hit rates around 70% are good.  90% is probably close to
 the theoretical maximum.  If you see a hit rate under 60% you can
 probably improve the cache performance (and hence your Zope
 application server's performance) by increasing the ZEO cache size.
-This is normally configured using the cache_size keyword argument to
-the ClientStorage() constructor in your custom_zodb.py file.  The
-default cache size is 20 MB.
+This is normally configured using cache_size key in the ``zeoclient``
+section of your configuration file.  The default cache size is 20 MB, which
+is very small.

 The stats.py tool shows its command line syntax when invoked without
 arguments.  The tracefile argument can be a gzipped file if it has a
@@ -85,7 +88,7 @@ size::
    ZEOCacheSimulation, cache size 300,000,000 bytes
      START TIME  DURATION    LOADS     HITS INVALS WRITES  FLIPS HITRATE
    Sep  4 11:59     38:01    59833    40921    258     20      0  68.4%
-    $ 
+    $

 This shows that for this particular trace file, the maximum attainable
 hit rate is 68.4%.  This is probably caused by the fact that nearly a
@@ -95,32 +98,3 @@ the cache only helps if an object is loaded more than once.
 The simul.py tool also supports simulating different cache
 strategies.  Since none of these are implemented, these are not
 further documented here.
-
-Cache Flips
-----------
-
-The cache uses two files, which are managed as follows:
-
-  - Data are written to file 0 until file 0 exceeds limit/2 in size.
-
-  - Data are written to file 1 until file 1 exceeds limit/2 in size.
-
-  - File 0 is truncated to size 0 (or deleted and recreated).
-
-  - Data are written to file 0 until file 0 exceeds limit/2 in size.
-
-  - File 1 is truncated to size 0 (or deleted and recreated).
-
-  - Data are written to file 1 until file 1 exceeds limit/2 in size.
-
-and so on.
-
-A switch from file 0 to file 1 is called a "cache flip".  At all cache
-flips except the first, half of the cache contents is wiped out.  This
-affects cache performance.  How badly this impact is can be seen from
-the per-15-minutes summaries printed by stats.py.  The -i option lets
-you choose a smaller summary interval which shows the impact more
-acutely.
-
-The simul.py tool shows the number of cache flips in the FLIPS column.
-If you see more than one flip per hour the cache may be too small.
--- a/src/ZEO/cache.py
+++ b/src/ZEO/cache.py
@@ -76,15 +76,10 @@ class ClientCache(object):
    # default of 20MB.  The default here is misleading, though, since
    # ClientStorage is the only user of ClientCache, and it always passes an
    # explicit size of its own choosing.
-    def __init__(self, path=None, size=200*1024**2, trace=False):
+    def __init__(self, path=None, size=200*1024**2):
        self.path = path
        self.size = size

-        if trace and path:
-            self._setup_trace()
-        else:
-            self._trace = self._notrace
-
        # The cache stores objects in a dict mapping (oid, tid) pairs
        # to Object() records (see below).  The tid is the transaction
        # id that wrote the object.  An object record includes data,
@@ -111,6 +106,8 @@ class ClientCache(object):
        # and retrieving objects to/from the cache file.
        self.fc = FileCache(size, self.path, self)

+        self._setup_trace(self.path)
+
    def open(self):
        self.fc.scan(self.install)

@@ -140,6 +137,10 @@ class ClientCache(object):

    def close(self):
        self.fc.close()
+        if self._tracefile:
+            sync(self._tracefile)
+            self._tracefile.close()
+            self._tracefile = None

    ##
    # Set the last transaction seen by the cache.
@@ -430,19 +431,29 @@ class ClientCache(object):
            L = self.noncurrent[oid]
            L.remove((o.start_tid, o.end_tid))

-    def _setup_trace(self):
-        tfn = self.path + ".trace"
-        self.tracefile = None
-        try:
-            self.tracefile = open(tfn, "ab")
-            self._trace(0x00)
-        except IOError, msg:
-            self.tracefile = None
-            logger.warning("Could not write to trace file %s: %s",
-                           tfn, msg)
+    # If `path` isn't None (== we're using a persistent cache file), and
+    # envar ZEO_CACHE_TRACE is set to a non-empty value, try to open
+    # path+'.trace' as a trace file, and store the file object in
+    # self._tracefile.  If not, or we can't write to the trace file, disable
+    # tracing by setting self._trace to a dummy function, and set
+    # self._tracefile to None.
+    def _setup_trace(self, path):
+        self._tracefile = None
+        if path and os.environ.get("ZEO_CACHE_TRACE"):
+            tfn = path + ".trace"
+            try:
+                self._tracefile = open(tfn, "ab")
+                self._trace(0x00)
+            except IOError, msg:
+                self._tracefile = None
+                logger.warning("cannot write tracefile %r (%s)", tfn, msg)
+            else:
+                logger.info("opened tracefile %r", tfn)

-    def _notrace(self, *arg, **kwargs):
-        pass
+        if self._tracefile is None:
+            def notrace(*args, **kws):
+                pass
+            self._trace = notrace

    def _trace(self,
               code, oid="", version="", tid="", end_tid=z64, dlen=0,
@@ -462,7 +473,7 @@ class ClientCache(object):
        if end_tid is None:
            end_tid = z64
        try:
-            self.tracefile.write(
+            self._tracefile.write(
                struct_pack(">iiH8s8s",
                            time_time(),
                            encoded,

--- a/src/ZEO/stats.py
+++ b/src/ZEO/stats.py
@@ -26,8 +26,8 @@ Usage: stats.py [-h] [-i interval] [-q] [-s] [-S] [-v] [-X] tracefile

 """File format:

-Each record is 24 bytes, with the following layout.  Numbers are
-big-endian integers.
+Each record is 18 bytes, plus a variable number of bytes to store an oid,
+with the following layout.  Numbers are big-endian integers.

 Offset  Size  Contents

@@ -46,6 +46,9 @@ Mask    bits  Contents
 0x7e    6     function and outcome code
 0x01    1     current cache file (0 or 1)

+The "current cache file" bit is no longer used; it refers to a 2-file
+cache scheme used before ZODB 3.3.
+
 The function and outcome codes are documented in detail at the end of
 this file in the 'explain' dictionary.  Note that the keys there (and
 also the arguments to _trace() in ClientStorage.py) are 'code & 0x7e',
@@ -59,18 +62,18 @@ import struct
 from types import StringType

 def usage(msg):
-    print >>sys.stderr, msg
-    print >>sys.stderr, __doc__
+    print >> sys.stderr, msg
+    print >> sys.stderr, __doc__

 def main():
    # Parse options
-    verbose = 0
-    quiet = 0
-    dostats = 1
-    print_size_histogram = 0
-    print_histogram = 0
-    interval = 900 # Every 15 minutes
-    heuristic = 0
+    verbose = False
+    quiet = False
+    dostats = True
+    print_size_histogram = False
+    print_histogram = False
+    interval = 15*60 # Every 15 minutes
+    heuristic = False
    try:
        opts, args = getopt.getopt(sys.argv[1:], "hi:qsSvX")
    except getopt.error, msg:
@@ -78,24 +81,27 @@ def main():
        return 2
    for o, a in opts:
        if o == '-h':
-            print_histogram = 1
-        if o == "-i":
+            print_histogram = True
+        elif o == "-i":
            interval = int(60 * float(a))
            if interval <= 0:
                interval = 60
            elif interval > 3600:
                interval = 3600
-        if o == "-q":
-            quiet = 1
-            verbose = 0
-        if o == "-s":
-            print_size_histogram = 1
-        if o == "-S":
-            dostats = 0
-        if o == "-v":
-            verbose = 1
-        if o == '-X':
-            heuristic = 1
+        elif o == "-q":
+            quiet = True
+            verbose = False
+        elif o == "-s":
+            print_size_histogram = True
+        elif o == "-S":
+            dostats = False
+        elif o == "-v":
+            verbose = True
+        elif o == '-X':
+            heuristic = True
+        else:
+            assert False, (o, opt)
+
    if len(args) != 1:
        usage("exactly one file argument required")
        return 2
@@ -107,12 +113,12 @@ def main():
        try:
            import gzip
        except ImportError:
-            print >>sys.stderr,  "can't read gzipped files (no module gzip)"
+            print >> sys.stderr, "can't read gzipped files (no module gzip)"
            return 1
        try:
            f = gzip.open(filename, "rb")
        except IOError, msg:
-            print >>sys.stderr,  "can't open %s: %s" % (filename, msg)
+            print >> sys.stderr, "can't open %s: %s" % (filename, msg)
            return 1
    elif filename == '-':
        # Read from stdin
@@ -122,7 +128,7 @@ def main():
        try:
            f = open(filename, "rb")
        except IOError, msg:
-            print >>sys.stderr,  "can't open %s: %s" % (filename, msg)
+            print >> sys.stderr, "can't open %s: %s" % (filename, msg)
            return 1

    # Read file, gathering statistics, and printing each record if verbose
@@ -203,9 +209,8 @@ def main():
                    bysizew[dlen] = d = bysizew.get(dlen) or {}
                    d[oid] = d.get(oid, 0) + 1
            if verbose:
-                print "%s %d %02x %s %016x %016x %1s %s" % (
+                print "%s %02x %s %016x %016x %c %s" % (
                    time.ctime(ts)[4:-5],
-                    current,
                    code,
                    oid_repr(oid),
                    U64(start_tid),
@@ -234,7 +239,7 @@ def main():

    # Error if nothing was read
    if not records:
-        print >>sys.stderr, "No records processed"
+        print >> sys.stderr, "No records processed"
        return 1

    # Print statistics
@@ -261,7 +266,7 @@ def main():
                code,
                explain.get(code) or "*** unknown code ***")

-    # Print histogram
+    # Print histogram.
    if print_histogram:
        print
        print "Histogram of object load frequency"
@@ -281,7 +286,7 @@ def main():
            print fmt % (binsize, addcommas(count),
                         obj_percent, load_percent, cum)

-    # Print size histogram
+    # Print size histogram.
    if print_size_histogram:
        print
        print "Histograms of object sizes"
@@ -325,7 +330,7 @@ def dumpbyinterval(byinterval, h0, he):
 def hitrate(bycode):
    loads = 0
    hits = 0
-    for code in bycode.keys():
+    for code in bycode:
        if code & 0x70 == 0x20:
            n = bycode[code]
            loads += n
@@ -345,8 +350,7 @@ def histogram(d):
    return L

 def U64(s):
-    h, v = struct.unpack(">II", s)
-    return (long(h) << 32) + v
+    return struct.unpack(">Q", s)[0]

 def oid_repr(oid):
    if isinstance(oid, StringType) and len(oid) == 8: