- 19 Sep, 2018 7 commits
-
-
Kirill Smelkov authored
For now all decoding all those pickles is tested to give the same object, as in e.g. "(I1\nI2\ntp0\n.", // MARK + TUPLE + INT and "I1\nI2\n\x86."), // TUPLE2 + INT But having a way to specify several pickles to a test case will become even more handy when later adding precise tests for Encoder - there we will need to assert that at such and such protocol encoding gives such and such pickles. And there are 5 protocol versions to test...
-
Kirill Smelkov authored
Previously there were two separate tables - for decode and encode tests. The table for encode tests was very small. TestDecode, which was operating on data from decode table, was also performing checks that decode(encode(object)) is idempotent - the work which is already too by TestEncode. However the coverage of input objects from decode table was not a strict superset of encode table objects. For the reasons above let's stop this divergence. Let's have a common table that define tests where for every test case there can be: - an "in" object, - a pickle, and - an "out" object where 1. pickle must decode to "out" object, and 2. encoding "in" object must give some pickle that decodes to "out" object. NOTE: In the usual case "in" object == "out" object and they can only differ if "in" object contains a Go struct. This will allow us to cover all existing decode and encode tests logic. However the coverage of each logic is now higher - for example Encoder tests are now run on every object from main table, not only for 3 cases like it was before.
-
Kirill Smelkov authored
Similarly to TestDecode. No integration with main tests table yet.
-
Kirill Smelkov authored
We are going to integrate encoding tests with the main tests data table in several steps. Only code movement here, no other change.
-
Kirill Smelkov authored
We are going to modify the main tests table and for every case (e.g. int(1) add pickles with various encoding that all represent the same object when decoded. The main TestDecode driver will iterate over all those input data and hand it over for particular testing to testDecode worker. Use t.Run for spawning each case - it is less typing (we do not need to manually print test.name on errors, etc), and it allows to select which subtests to run via e.g. `go test -run Decode/int`
-
Kirill Smelkov authored
Having that long data makes the table clumsy and harder to understand. By moving such data definition out of the table, we make it a bit easier to understand. In the case of "long line", the pickle input and the line itself were almost duplicating each other, so instead of having two long lines explicitly pasted, let's have the test input be defined as + {"too long line", "V" + longLine + "\n.", longLine}, In the case of Graphite messages, one long Graphite object was also duplicated in encode_test.go . Just a cleanup, no semantic change.
-
Kirill Smelkov authored
It will be handy to use testing.T.Run in the upcoming patches that add more decode/encode tests. However since testing.T.Run was was added in Go1.7 the code won't work with Go1.6 . Since Go1.6 was released in the beginning of 2016 - more than 2.5 years ago, and upstream Go policy is to support only current stable (currently Go1.11) and two previous releases (currently Go1.10 and Go1.9), as of today Go1.6 is both not supported and outdated. So let's drop explicit support for it in the name of progress. By above logic we could also drop support for 1.7 and 1.8, but since there is no pressing particular need to do so I'm not dropping them for now.
-
- 18 Sep, 2018 6 commits
-
-
Kirill Smelkov authored
Currently we often use constructs like _, err := e.w.Write([]byte{opNone}) return err however with added Encoder.emit helper it can become only return e.emit(opNone) which is more clearly readable. We can do similarly for formatted output: - _, err := fmt.Fprintf(e.w, "%c%dL\n", opLong, b) - return err + return e.emitf("%c%dL\n", opLong, b) Due to much boilerplate in one place (encodeInt) the return of fmt.Fprintf was not even checked. We change the code there with - _, err = e.w.Write([]byte{opInt}) - if err != nil { - return err - } - fmt.Fprintf(e.w, "%d\n", i) + return e.emitf("%c%d\n", opInt, i) which is now hopefully more visible for what is going on and is easier to catch potential mistake by eyes. A corresponding test for "int64 encoded as string" will be added later.
-
Kirill Smelkov authored
Protocol 3 adds opcodes to represent Python bytes. For Protocol 4 we were missing definitions of several of the opcodes, such as BINUNICODE8, EMPTY_SET etc. Add definitions for them all.
-
Kirill Smelkov authored
We already keep "Protocol 2" and "Protocol 4" opcodes separately, and for a good reason - it must be clear for a person just by looking at opcodes definition from which protocol version an opcode comes. However at present Protocol 0 and Protocol 1 opcodes come all intermixed with each other. In the upcoming patches we'll be teaching Encoder to take desired pickle protocol version into account (similarly to how pickle.dumps accepts desired protocol version in Python), and then the encoder will have to use different opcodes for different versions: for example if user asks to produce "protocol 0" pickle stream it will be invalid to use "protocol 1" opcodes - e.g. BININT, EMPTY_TUPLE, etc. So I went through https://github.com/python/cpython/blob/master/Lib/pickletools.py for each opcode searching its protocol version and separated the opcodes with proto=1 from protocol 0 opcodes.
-
Kirill Smelkov authored
It was probably a thinko in d00e99e7 (Add more opcodes so we can read Graphite's Carbon stream) which changed opcodes from string ("X") to character ('X') constants and marked the first opcode with byte type, probably with the idea that following opcodes will have the same type. Unfortunately it is not so as the following program demonstrates: package main const ( opAAA byte = 'a' opBBB = 'b' ) func main() { op := opBBB if true { op = opAAA // <-- line 11 } println(op) } --> ./bc.go:11:6: cannot use opAAA (type byte) as type rune in assignment Similarly if we try comparing opcodes, it also results in compile error: func main() { op := opBBB if op == opAAA { // <-- line 10 panic(0) } println(op) } --> ./bc.go:10:8: invalid operation: op == opAAA (mismatched types rune and byte) Since in the following patches it will be handy for encoder to e.g. set default opcode first, and then change it under some conditions to another opcode, exactly the same compile error(s) will pop up. So let's fix all opcode constants to have their type as byte to avoid such unexpected errors.
-
Kirill Smelkov authored
For example d.push(math.Float64frombits(u)) looks more clear compared to d.stack = append(d.stack, math.Float64frombits(u)) and we already use push in many other places.
-
Kirill Smelkov authored
v is concrete type here (float64) and Go automatically converts it to interface{} on call to Decoder.push().
-
- 17 Sep, 2018 1 commit
-
-
Kirill Smelkov authored
While doing a5094338 (encoder: More unexpected EOF handling) I was a bit disappointed that https://golang.org/cl/37052 was not going to be accepted and somehow added hack to tests that was doing strconv.ErrSyntax -> io.ErrUnexpectedEOF error conversion if the test name contained "unicode". Today I was refactoring tests and noticed that it is better we teach loadUnicode to return io.ErrUnexpectedEOF in the first place, and the next easy way for this (after fixing strconv.UnquoteChar itself) is to append many "0" to string and see if strconv.UnquoteChar still returns ErrSyntax. So do it in our custom unquoteChar wrapper.
-
- 11 Sep, 2018 1 commit
-
-
Kirill Smelkov authored
Instead of copy-pasting full test driver for each value, make it to be only one test with many values in table each tested by common driver. TestZeroLengthData was checking output.BitLen() == 0 which is another way to test for equality with big.Int(0) according to https://golang.org/pkg/math/big/#Int.BitLen So join it too to the rest of decodeLong tests. While at decodeLong topic, rename BenchmarkSpeed -> BenchmarkDecodeLong as this benchmark checks speed of decodeLong only, nothing else.
-
- 10 Sep, 2018 1 commit
-
-
Kirill Smelkov authored
Add a test that excersizes whether invalid protocol version is caught and reported. Without 5c420f85 the test fails like this: --- FAIL: TestDecodeError (0.00s) ogorek_test.go:312: "\x80\xffI1\n.": no decode error ; got 1, <nil>
-
- 05 Sep, 2018 1 commit
-
-
Kirill Smelkov authored
Due to := in v, err := d.r.ReadByte() newly declared err was shadowing outside err and thus even though the code intent was to treat all protocols != 2 as ErrInvalidPickleVersion, after leaving the switch err was always =nil and no error was reported. Fix it by explicitly declaring v and not using := not to shadow err. We also change the condition for supported protocol version to be in [0, 4] because: - we already have messages in our testsuite with PROTO opcode and version 1, and if we don't adjust the version condition the tests will fail. - the highest protocol version we support opcodes for is 4. - even though the documentation for PROTO opcode says version must be >= 2, in practice CPython decodes pickles with PROTO and lower version just ok: In [18]: pickle.loads("\x80\x02I5\n.") Out[18]: 5 In [19]: pickle.loads("\x80\x01I5\n.") Out[19]: 5 In [20]: pickle.loads("\x80\x00I5\n.") Out[20]: 5 so we don't complain about those lower versions too.
-
- 29 Aug, 2018 1 commit
-
-
Kirill Smelkov authored
-
- 17 Jul, 2018 2 commits
-
-
Kirill Smelkov authored
Continuing 84598fe1 (Add support for persistent references) theme let's develop support for persistent references more: the usual situation (at least in ZODB context, which is actually the original reason persistent references exist) is that decode converts a reference of (type, oid) tuple form into corresponding "ghost" object - an instance of type, but not yet loaded with database object's data. This way resulted object tree is created right away with types application expects (and ZODB/py further cares to automatically load object data when that ghost objects are accessed). To mimic this on Go side, with current state, after decoding, one would need to make a full reflect pass on the resulted decoded objects, find Refs, and convert them to instances of other types, also caring to patch pointers in other objects consistently that were pointing to such Refs. In other words it is some work and it coincides almost 100% with what Decoder already does by itself. Thus, not to duplicate that work and conducting parallel with Python world, let's add ability to configure Decoder and Encoder so that persistent references could be handled with user-specified application logic right in the process, and e.g. Decoder would return resulted object with references converted to corresponding Go types. To do so we introduce DecoderConfig and EncoderConfig - configurations to tune decode/encode process. To maintain backward compatibility NewDecoder and NewEncoder signatures are not adjusted and instead NewDecoderWithConfig and NewEncoderWithConfig are introduces. We should be sure we won't need e.g. NewDecoderWithConfigAndXXX in the future, since from now on we could be adding fields to configuration structs without breaking backward compatibility. For decoding there is DecoderConfig.PersistentLoad which mimics Unpickler.persistent_load in Python: https://docs.python.org/3/library/pickle.html#pickle.Unpickler.persistent_load For encoding there is EncoderConfig.PersistentRef which mimics Pickler.persistent_id in Python: https://docs.python.org/3/library/pickle.html#pickle.Pickler.persistent_id ( for Persistent{Load,Ref}, following suggestion from @kisielk, the choice was made to explicitly pass in/out Ref, instead of raw interface{} pid, because that makes the API cleaner. ) Then both Decoder and Encoder are adjusted correspondingly with tests added. About tests: I was contemplating to patch-in support for decoder and encoder configurations, and handling errors, into our main test TestDecode, but for now decided not to go that way and to put the test as separate TestPersistentRefs. By the way, with having configurations in place, we could start to add other things there - e.g. the protocol version to use for Encoder. Currently we have several places where we either don't use opcodes from higher protocol versions (fearing to break compatibility), or instead always use higher-version opcodes without checking we should be able to do so by allowed protocol version.
-
Kirill Smelkov authored
"1.10" is put into quotes because otherwise Travis parses that as 1.1 - see e.g. https://github.com/travis-ci/travis-ci/issues/9247.
-
- 01 Jul, 2018 1 commit
-
-
Kirill Smelkov authored
* Add docstrings to types exposed in public API I was refreshing my knowled of ogórek and added them along the way. * Add support for persistent references Python has the mechanism for objects in one pickle to reference objects in another pickle. This mechanism is primarily used in ZODB object database for one object to "persistently" reference another object(*). I needed a way to load and understand objects from a ZODB database in Go, and hence comes this patch for ogórek that teaches it to understand such persistent references. Like it is already with Call and Class we add a dedicated Ref type to represent persistent references in pickle stream. (*) in ZODB when one object is changed, there is no need to resave unchanged object that is referencing it because its reference stays valid and pointing to changed object: the "persistent ID" in ZODB is only object ID and which revision of the referenced object it points to is implicitly follows by current transaction number view of the database.
-
- 25 Apr, 2017 1 commit
-
-
Sean DuBois authored
-
- 05 Apr, 2017 6 commits
-
-
Kamil Kisiel authored
-
Kamil Kisiel authored
decoder: Small speedups
-
Kirill Smelkov authored
readLine is used in many text decoders, so it pays off if we care not to do allocation for result on every readLine call: name old time/op new time/op delta Speed-4 378ns ± 0% 379ns ± 1% ~ (p=0.397 n=5+5) Decode-4 70.1µs ± 0% 67.3µs ± 0% -3.97% (p=0.008 n=5+5) Encode-4 16.6µs ± 0% 16.6µs ± 0% ~ (p=0.548 n=5+5) name old alloc/op new alloc/op delta Speed-4 280B ± 0% 280B ± 0% ~ (all equal) Decode-4 32.3kB ± 0% 31.0kB ± 0% -3.87% (p=0.008 n=5+5) Encode-4 6.54kB ± 0% 6.54kB ± 0% ~ (all equal) name old allocs/op new allocs/op delta Speed-4 8.00 ± 0% 8.00 ± 0% ~ (all equal) Decode-4 428 ± 0% 326 ± 0% -23.83% (p=0.008 n=5+5) Encode-4 297 ± 0% 297 ± 0% ~ (all equal)
-
Kirill Smelkov authored
This reduces allocations a bit: name old time/op new time/op delta Speed-4 383ns ± 4% 378ns ± 0% ~ (p=0.302 n=5+5) Decode-4 72.1µs ± 1% 70.1µs ± 0% -2.76% (p=0.008 n=5+5) Encode-4 16.5µs ± 0% 16.6µs ± 0% ~ (p=0.095 n=5+5) name old alloc/op new alloc/op delta Speed-4 280B ± 0% 280B ± 0% ~ (all equal) Decode-4 35.7kB ± 0% 32.3kB ± 0% -9.72% (p=0.008 n=5+5) Encode-4 6.54kB ± 0% 6.54kB ± 0% ~ (all equal) name old allocs/op new allocs/op delta Speed-4 8.00 ± 0% 8.00 ± 0% ~ (all equal) Decode-4 429 ± 0% 428 ± 0% -0.23% (p=0.008 n=5+5) Encode-4 297 ± 0% 297 ± 0% ~ (all equal)
-
Kirill Smelkov authored
Most of the functions in ogorek.go already use Decoder by pointer, but NewDecoder() and Decoder.Decode() exceptionally used it by value. This was probably an oversight because when Decoder struct is used by value it is copied on every Decode call and also it is not possible to change and retain state in between calls as all changed happen to copy and then forgotten. This also leads to many unnnecessary allocations, i.e. memory allocated for stack one Decode call is not reused on next Decode call etc. So use Decoder by pointer only. This leads to the following speedup: name old time/op new time/op delta Speed-4 377ns ± 0% 383ns ± 4% ~ (p=0.214 n=5+5) Decode-4 80.4µs ± 3% 72.1µs ± 1% -10.38% (p=0.008 n=5+5) Encode-4 16.5µs ± 1% 16.5µs ± 0% ~ (p=0.690 n=5+5) name old alloc/op new alloc/op delta Speed-4 280B ± 0% 280B ± 0% ~ (all equal) Decode-4 44.0kB ± 0% 35.7kB ± 0% -18.77% (p=0.008 n=5+5) Encode-4 6.54kB ± 0% 6.54kB ± 0% ~ (all equal) name old allocs/op new allocs/op delta Speed-4 8.00 ± 0% 8.00 ± 0% ~ (all equal) Decode-4 502 ± 0% 429 ± 0% -14.54% (p=0.008 n=5+5) Encode-4 297 ± 0% 297 ± 0% ~ (all equal)
-
Kirill Smelkov authored
- use existing tests vector as input data - for Decode we prepare one large pickle stream from testv and decode all pickle from there - for Encode we prepare one lirge slice with all objects from testv and encode it On an i7-6600U currently it looks like: BenchmarkSpeed-4 5000000 387 ns/op 280 B/op 8 allocs/op BenchmarkDecode-4 20000 81892 ns/op 43989 B/op 502 allocs/op BenchmarkEncode-4 100000 16711 ns/op 6536 B/op 297 allocs/op
-
- 09 Mar, 2017 2 commits
-
-
Kamil Kisiel authored
.travis.yml += go1.8
-
Kirill Smelkov authored
Which is currently latest stable go release. Maybe we should also drop go1.6 .
-
- 08 Mar, 2017 7 commits
-
-
Kamil Kisiel authored
Make decode(encode(v)) to be identity + tuple
-
Kirill Smelkov authored
It is true that tuples and lists are very similar, and when e.g. a python function accepts tuple as input, and then uses only indexes access, it can be substituted with list. However some functions do `isinstance(..., tuple)` etc, and e.g. for a go program which wants to produce pickle for such python programs there is currently no way to generate pickle with tuple opcodes. So to solve this teach ogórek about tuples: - introduce simple Tuple type which has []interface{} as underlying - when decoding tuple opcodes produce this Tuple instead of []interface{} used previously. - when encoding encode Tuple with tuple opcodes. Otherwise Tuple can be seen and behaves like a regular go []interface{} slice. In fact the only difference is that runtime type of Tuple is different from runtime type of []interface{} but otherwise both values are the same and can be casted to each other if/when needed freely. NOTE opReduce decoder is adjusted to always require tuple, not list, because with e.g. cPickle: "c__main__\nf\n]R." -> TypeError('argument list must be a tuple', ...) "c__main__\nf\n)R." -> ok ( the first one uses empty list - "]", and the second one empty tuple - ")" ) So it is ok and compatible to always require args to be tuple for reduce.
-
Kirill Smelkov authored
That is so that decode·encode becomes identity. For the other way encode(decode(pickle)) == pickle it is generally not possible to do, because pickle machine is almost general machine, and it is possible to e.g. have some prefix in program which is NOP or result of which is no longer used, use different opcodes to build list or tuple etc. Adjustments to encoder are: - teach it to encode big.Int back to opLong (not some struct) - teach it to encode Call back to opReduce (not some struct) Tests are adjusted to verify that `decode(encode(v)) == v` holds for all inputs in test vector.
-
Kirill Smelkov authored
Else it prints just "no EOF" and it is not clear for which input the problem is there.
-
Kirill Smelkov authored
that is with %#v instead of %q. Because e.g. %q for nil []byte will be printing still as "", not nil etc. When some decode test fails here we want to get full information, not some pretty one.
-
Kirill Smelkov authored
1. the number has to be terminated with 'L' - check it (previously last byte was simply discarded) 2. big.Int.SetString() ok code has to be checked too to catch invalid input. big.Int.SetString() usage in test adjusted to also not skip errors silently.
-
Kirill Smelkov authored
* decoder: Don't forget to check memo for key not there on read access If we do not do reading from memo will return memo's zero value (= nil), which is a) not correct - many memo keys could be read this way, and b) nil there on stack breaks invariants of stack containing only good values. Furthermore, it can even lead to crashes on e.g. calling reflect.TypeOf(stack.top()) in the next patch. Anyway getting something from memo must be checked for whether it there or not for correctness. Noticed while working on fix for #30. * decoder: Fix "panic: runtime error: hash of unhashable type ..." Go specification requires that only comparable types could be used as map keys: https://golang.org/ref/spec#Map_types For map[interface{}]... this is checked at runtime, and if concrete value used as a key is not comparable it results in runtime panic, e.g.: panic: runtime error: hash of unhashable type ogórek.Call goroutine 1 [running]: github.com/kisielk/og-rek.(*Decoder).loadDict(0xc420084360, 0x64, 0x0) /tmp/go-fuzz-build561441550/gopath/src/github.com/kisielk/og-rek/ogorek.go:655 +0x18c github.com/kisielk/og-rek.Decoder.Decode(0xc42001c3c0, 0x5a9300, 0x0, 0x0, 0xc4200164b0, 0x0, 0x0, 0x0, 0x0, 0x0, ...) /tmp/go-fuzz-build561441550/gopath/src/github.com/kisielk/og-rek/ogorek.go:187 +0x172b github.com/kisielk/og-rek.Fuzz(0x7ff901592000, 0xa, 0x200000, 0x3) /tmp/go-fuzz-build561441550/gopath/src/github.com/kisielk/og-rek/fuzz.go:12 +0x108 go-fuzz-dep.Main(0x50d830) /tmp/go-fuzz-build561441550/goroot/src/go-fuzz-dep/main.go:49 +0xd9 main.main() /tmp/go-fuzz-build561441550/gopath/src/github.com/kisielk/og-rek/go.fuzz.main/main.go:10 +0x2d exit status 2 so when decoding dict and friends - all places where maps are populated - we have to check whether an object on stack we are going to use as key is suitable. Issue #30 contains comparison of 2 ways to do such check - catch runtime panic in exception style manner or use reflect.TypeOf(v).Comparable(). Since reflect-way turns out to be faster https://github.com/kisielk/og-rek/issues/30#issuecomment-283609067 and likely will become more faster in the future: https://github.com/golang/go/issues/19361 it was decided to go the reflect way (which is also a canonical way in go land). So audit all places where map items are set and add appropriate checks before them. I've verified that if we remove any of the added checks, via so far found crash vectors, at least one crash case will reappear in tests. This means that all added checks are actually test covered. Updates: #30 * decoder: Don't forget to check for odd #elements in loadDict & friends e.g. for Dict opcode the expected stack state is MARK key1 obj1 key2 obj2 ... keyN objN DICT so if in between MARK and DICT there is odd number of elements this is an error in input data. We also used to crash on such cases, e.g.: "(\x88d" panic: runtime error: index out of range goroutine 1 [running]: github.com/kisielk/og-rek.(*Decoder).loadDict(0xc420082990, 0xc42000e464, 0x0) /tmp/go-fuzz-build403415384/gopath/src/github.com/kisielk/og-rek/ogorek.go:652 +0x21d github.com/kisielk/og-rek.Decoder.Decode(0xc42001c7e0, 0x5a9320, 0x0, 0x0, 0xc4200167b0, 0x0, 0x0, 0x0, 0x0, 0x0, ...) /tmp/go-fuzz-build403415384/gopath/src/github.com/kisielk/og-rek/ogorek.go:188 +0x172d github.com/kisielk/og-rek.Fuzz(0x7f6d1f310000, 0x3, 0x200000, 0x3) /tmp/go-fuzz-build403415384/gopath/src/github.com/kisielk/og-rek/fuzz.go:12 +0x108 go-fuzz-dep.Main(0x50d798) /tmp/go-fuzz-build403415384/goroot/src/go-fuzz-dep/main.go:49 +0xd9 main.main() /tmp/go-fuzz-build403415384/gopath/src/github.com/kisielk/og-rek/go.fuzz.main/main.go:10 +0x2d I've audited whole decoder and regarding odd #(elements) there are only 2 places to care: loadDict and loadSetItems and crashers for all of them were already found by fuzz testing. Fixes #30 (for all known cases so far).
-
- 01 Mar, 2017 3 commits
-
-
Kirill Smelkov authored
This adds more corpus to fuzz as run by me and which found a bit more crashers now collected in #30.
-
Kirill Smelkov authored
This partly reverts 572d0a6a and just changes %t -> %T so that printing does not enter infinite recursion and just print what we need: type of an object (%t is for true/false). Along the way I've added "pickle:" prefix to errors. In the future maybe it would make sense to create dedicated DecodeError which would contain position, current opcode, and error message itself. This way error reporting would be more structured and useful for users.
-
Kamil Kisiel authored
-