ZBigArray: Fix API inconsistency with ndarray
bigarray
tests
test_basic.py +14 -0
__init__.py +3 -0
lib
tests
test_utils.py +56 -0
utils.py +60 -0
+ 14
- 0
+ 3
- 0
+ 56
- 0
+ 60
- 0
This fixes issue #9 (closed).
Test runs are pending (I just created and validated a new test suite).
/cc @kirr
mentioned in issue #9 (closed)
@levin.zimmermann, thanks for following up with the patch.
I have several suggestions about it:
even though we fix behaviour of BigArray/ZBigArray, there is no test that covers adjusted behaviour. In other words if, for example, we change _init0 code back to use provided shape directly, nothing in the tests would indicate reintroduced breakage. There is a golden and universal rule: whenever we fix something we add a test that covers the fix. Actually in such situations it is better to start preparing the fix by first working on corresponding test, that when run, would fail and indicate discovered problem. Then we know that if we fix the problem the test should indicate that by stopping to fail, and we will also be sure that if, in the future, something - by accident or due to another reason - would break fixed behaviour, the test, that we add with the fix, would catch it and report about detected problem. In other words in case of regression it won't go unnoticed. So please add corresponding test that exercises e.g. using list for shape as you originally reported in #9 (closed).
Over the years I settled on the following structure for code: lay functionality from top at the beginning to down at the end. I mean it helps to understand the code if the implementation is structured as follows: first there comes an overview of the module, then comes definition of data structures, then comes definition of top-level exported functionality with functions, that the top-level functionality uses, coming after implementation of top-level bits. This way if a person starts to look at a module the first time, or after 6 months break, it comes very handy: it is clearly documented what the module provides, then comes the implementation of what should be provided with going step-by-step down towards the end of the file. And as the end result due to this discipline, I hope, that it becomes easier to understand the implementation. Original wendelin.core 1 code from 2014 might be not following this fully, but recent parts in wcfs/ , and other new modules added during wendelin.core 2 work, should.
In your patch _inttuple
is added at the top of bigarray.py module coming before BigArray. Locally and short-term - while now we are working on this particular problem _inttuple functionality is indeed in the focus. But in the longer-term, or for a user of bigarray module, the focus should be on BigArray class with _inttuple being only a little corner detail. I would thus suggest to move _inttuple to the end of bigarray.py or, if you prefer, to somewhere in the lib/
.
I would suggest to amend commit message and to put more explanation and context into it. For example I would suggest to be more specific in the subject saying that we fix handling of shape
instead of just fix inconsistency
without mentioning which inconsistency, and I would later explain the details of the inconsistency in the commit message itself. The rationale for it is below:
Thanks to git, or any other version control system, what we have for software is not only its latest snapshot, but a history how this software was changed, fixed and evolved. And with some discipline that history can be valuable for learning and navigating the software on its own: when someone studies the implementation, or re-learns about why and how things were done after not looking at particular code for say 1 year:
During short-term there is no problem if the commit messages says just fixes #9
, because I already read that #9 (closed) recently and fully remember the context. But long-term, if later someone studies the code, runs git gui blame
for bigarray.py, sees the line with that _inttuple call and checks whats up with it, that person will only see what it says: "fixes inconsistency" and would need to go through reading #9 (closed) to get which one and how. I agree that #9 (closed) is not big and going through it is not a big deal, but still it would be better if the commit message would directly explain the problem that it fixes and how it fixes it. The explanation usually can be more short compared to whole exchange in corresponding issue - coming only with relevant summary. In other words the entry, that we put into git log, should be considered as coming almost self-sufficient: it itself should describe everything related to the change. It, of course, can refer to original issues and other conversations for better context, but still it comes more clear and transparent for everyone if that final log entry has everything necessary. It also starts to make sense if you think of it as the final clean copy with everything else, that was used before it, as being preliminary drafts. People usually underestimate importance of this, but the more you start to rely on software history for learning and transparency, the more it becomes useful and more important. I would suggest to start practicing it from the beginning when it is easy to learn new things.
On py2 there are two types to represent integers: int
and long
. And as currently implemented raise_if_not_integer(2L)
raises TypeError. I would say we need to fix this because np.ndarray((2L,))
is handled ok. Since the first implementation of _inttuple
is buggy, I would suggest to also add a test for _inttuple
itself too.
Does those suggestions makes sense to you?
Please let me know what you think.
Kirill
Thank you @kirr for the comprehensive feedback and review! Everything you wrote makes sense for me and its reasons are understandable. I'll try to improve (1) (tests for bug fixes and new features; maybe tests before solution) and (3) (precise comprehensive commit messages which explain what is actually solved).
(2) is a new idea for me, but after reading it twice I understand the module structure concept of starting with the "what" and ending with the "how". I added the helper function before the class because it is same with _flatbytev
which is defined before ArrayRef
.
Regarding 4: Thanks for pointing this out, I didn't knew about the existence of long
.
I'll apply your suggestions, rebase the MR and finally ping you again.
Would lib/utils.py
be okay for you for inttuple
(perhaps then it shouldn't indicate being private)?
@levin.zimmermann, thanks for feedback and you are welcome. Regarding (2) I agree that _flatbytev
is defined before ArrayRef
, which you used as an example and that goes against my advice. I agree this is inconsistent and it is one of the examples of what I meant with saying that "wendelin.core 1 code is not following this fully". The metaphor of starting from "what" to -> "how" is a good one and being relevant here catching the gist - thanks for referencing it.
lib/utils.py
seems ok, please feel free to move that function to there and make it "public".
Kirill
mentioned in commit levin.zimmermann/wendelin.core@97b12dd6
added 1 commit
Hello @kirr,
thanks for feedback regarding the inttuple
placement!
I added tests, moved the utility function into a new module and rebased everything into one commit (with a - hopefully - clear commit message). Tests are passing now on Nexedis test runner.
Do you think this is okay to be merged?
If I'm not mistaken, the wendelin.core
repository avoids merge request commits but uses a fast-forwarding strategy, right? So would something like this be the proper way of fast-forward merging? Would this also set the status of this MR in gitlab to merged
?
git clone https://lab.nexedi.com/nexedi/wendelin.core
cd wendelin.core
git fetch origin merge-requests/14/head:mr-origin-14 && git checkout mr-origin-14
git checkout origin/master
git rebase origin/master
git merge --ff-only mr-origin-14
git push origin master
Best, Levin
Ps. Thanks for explaining the _flatbytev
case, I can understand that this is one of the old leftovers before given approach was followed :)
Hello @levin.zimmermann, thanks for the update.
The change looks better now. I suggest we first work on fixing remaining smaller issues before merging it:
XXX
is usually used to mark that something is not good and needs to be fixed. However in BigArray constructor you use it just to put a notice about why shape = inttuple(shape)
is done. If you want to put such a note usally it can be done with NOTE
mark. That's the culture I was taught myself when learning.
Actually regarding that particular case, I believe, there is no need to explain in the code so much about why we do shape = inttuple(shape)
. It is a common practice to convert input arguments to expected types like this. I would thus suggest to consider incorporating the following interdiff:
--- a/bigarray/__init__.py
+++ b/bigarray/__init__.py
@@ -81,6 +81,7 @@ def __init__(self, shape, dtype_, bigfileh, order='C'):
# __init__ part without fileh
def _init0(self, shape, dtype_, order):
+ shape = inttuple(shape) # mimic numpy
_dtype = dtype(dtype_)
if _dtype.hasobject:
logging.warn("You tried to use dtype containing object (%r) with out-of-core array ..." % _dtype)
@@ -94,10 +95,6 @@ def _init0(self, shape, dtype_, order):
raise TypeError("dtypes with object are not supported", _dtype)
self._dtype = _dtype
- # XXX: Reassigning cast object to local "shape" variable,
- # because local "shape" is used few lines later here
- # and we need to be sure its already tuple[int, ...].
- shape = inttuple(shape)
self._shape = shape
self._order = order
# TODO +offset ?
the tests for shape-handing functionality are added to test_arrayzodb.py and are done at ZBigArray level. However ZBigArray is just a layer over BigArray which adds persistency, and the most logic of being array-like is implemented by BigArray itself. This way test_arrayzodb.py exercies only persistency-related issues, and the tests for being array like are located in bigarray/tests/test_basic.py
and operate in terms of just BigArray without any ZODB context. I would thus suggest to move shape-handling test there and to redo it to test BigArray directly. It is logical and consistent to do because we fix BigArray - not ZBigArray.
The copyright years, that you put into newly added utils.py
and test_utils.py
are 2014-2018
and 2015
. What is the reason for that? Normally the copyright years mark the time during which the functionality in corresponding file has been worked on. And the primary author of those files is not me, isn't it? Would you please correct?
Regarding how things are merged: wendelin.core is following "Apply Patch" approach for merge-requests with just one patch. I initially implemented that functionality here: gitlab-ce@827d3914 and we used it for some time for all projects in Nexedi. However over time and after maintainers of lab.nexedi.com have been changed, to my knowledge this patch was dropped even though people were asking to retain this functionality because it proved to be useful. This way and given there is now no "Apply Patch" button in gitlab web ui, to apply a patch, I usually use command line, cherry-pick a given patch, amend it to have /reviewed-by
and /reviewed-on
notices, and push it to master. But this way corresponding merge request is not automatically marked as merged. Personally I'm ok to just close such merge requests with saying "merged as XXX" in the end.
Kirill
Ps. Thanks for explaining the _flatbytev case, I can understand that this is one of the old leftovers before given approach was followed :)
You are welcome, but just for the reference: it was not a "one-go" switch. The approach was incrementally being realized / starting to follow over time with gaining more understanding and experience.
mentioned in commit levin.zimmermann/wendelin.core@8f46044d
added 1 commit
mentioned in commit levin.zimmermann/wendelin.core@264eb9f3
added 1 commit
Hi @kirr,
thank you for your review!
XXX
is usually used to mark that something is not good and needs to be fixed.
Thanks for pointing this out, I misunderstood it as 'caution, take care'.
Actually regarding that particular case, I believe, there is no need to explain in the code so much
Rereading it, I agree. I added it after my initial mistake of immediately assigning it to self._shape
. But it should indeed be clear as it is.
the tests for shape-handing functionality are added to test_arrayzodb.py and are done at ZBigArray level. However ZBigArray is just a layer over BigArray which adds persistency, and the most logic of being array-like is implemented by BigArray itself. This way test_arrayzodb.py exercies only persistency-related issues, and the tests for being array like are located in
bigarray/tests/test_basic.py
and operate in terms of just BigArray without any ZODB context. I would thus suggest to move shape-handling test there and to redo it to test BigArray directly.
Ahh yes I see, this makes sense. I moved the test.
The copyright years, that you put into newly added
utils.py
andtest_utils.py
are2014-2018
and2015
.
Sorry for this, I just missed it. I recognized there is the policy of adding license note at the header of a source file, but I didn't checked the copyright details. I'll take care of them from now.
Regarding how things are merged: wendelin.core is following "Apply Patch" approach for merge-requests with just one patch. I initially implemented that functionality here: gitlab-ce@827d3914 and we used it for some time for all projects in Nexedi. However over time and after maintainers of lab.nexedi.com have been changed, to my knowledge this patch was dropped even though people were asking to retain this functionality because it proved to be useful. This way and given there is now no "Apply Patch" button in gitlab web ui, to apply a patch, I usually use command line, cherry-pick a given patch, amend it to have
/reviewed-by
and/reviewed-on
notices, and push it to master. But this way corresponding merge request is not automatically marked as merged. Personally I'm ok to just close such merge requests with saying "merged as XXX" in the end.
Thank you for the context and history of Nexedi gitlab. Okay, so just pushing it to master like any other commit with additional review-by
notes in the commit message. I'm fine with manually closing the MR too, I just wanted to ensure I wouldn't miss any hidden magic how to mark it as merged while omitting merge requests :)
Anyway, I applied your suggestions and test runs are successful after changes. If there aren't any more objections left, I would push the commit.
Best, Levin
Ps. BTW when moving tests from ZBigArray
to BigArray
I realized the dtype_
init argument of BigArray (perhaps because of the direct import of dtype
from numpy
). Would you say its worth it considering a refactor from dtype_
to dtype
(as it is in ZBigArray
and ndarray
) or would you say it doesn't matter, because 99% of the time people are using ZBigArray
+ 99% of the time people *args
instead of **kwargs
?
mentioned in commit adffe247
Good day, @levin.zimmermann. Thanks for the update and corrections.
I've applied your patch to master and pushed it to nexedi/wendelin.core as adffe247. I've made the following amendment to the patch:
diff --git a/bigarray/tests/test_basic.py b/bigarray/tests/test_basic.py
index b20b1055..1a290ff0 100644
--- a/bigarray/tests/test_basic.py
+++ b/bigarray/tests/test_basic.py
@@ -112,6 +112,7 @@ def test_bigarray_noobject(testbig):
with raises(TypeError):
BigArray((1,), dtype_, Zh)
+
# Ensure BigArray mimics the behavior of ndarray
# when initializing its shape property.
# (array.shape should always return a tuple of ints,
@@ -127,6 +128,7 @@ def assert_shape_becomes(shape_input, shape_property):
assert_shape_becomes(42, (42,))
assert_shape_becomes((4, 4), (4, 4))
+
# basic ndarray-compatibility attributes of BigArray
def test_bigarray_basic(testbig):
Zh = testbig.fopen()
diff --git a/lib/utils.py b/lib/utils.py
index dcab8070..45e4bd41 100644
--- a/lib/utils.py
+++ b/lib/utils.py
@@ -40,7 +40,7 @@ def inttuple(int_or_int_sequence):
# https://github.com/python/cpython/blob/9c4ae037b9c3/Objects/abstract.c#L1706-L1713
def raise_if_not_integer(object_):
- # XXX: Use 'numbers.Integral' instead of 'int' to support
+ # NOTE: Use 'numbers.Integral' instead of 'int' to support
# python2 long type.
if not isinstance(object_, numbers.Integral):
raise TypeError(
to change XXX
to NOTE
in inttuple implementation, and to follow style with
vertical spacing in test_basic.py where nearby functions and classes are
delimited by 2 vertical blank lines in between them. I've changed XXX to NOTE
with the same rationale as we discussed regarding shape = inttuple(shape)
call: there is nothing to fix here and the remark is just a notice.
I hope the edits are ok, but please correct me if maybe I misunderstood or missed something.
XXX
is usually used to mark that something is not good and needs to be fixed.
Thanks for pointing this out, I misunderstood it as 'caution, take care'.
You are welcome. I agree it might be not self-describing compared to e.g.
FIXME
, TODO
or NOTE
. It is like this probably for historical reason.
Actually regarding that particular case, I believe, there is no need to explain in the code so much
Rereading it, I agree. I added it after my initial mistake of immediately assigning it to
self._shape
. But it should indeed be clear as it is.
Yes, this is very similar to what we discussed about inttuple placement: while we are working on something, the details of that something are in focus and perceived as important. And a mistake, if taken, seems to be important do document thoroughly. But this perception is what it gets on a smaller scale. On a larger-scale and long-term what matters is cleanness, consistency and transparency of the code. Comments help here, but it also happens that lots of comments can do the reverse of the intent: if many comments explain or document what should be already obvious from the code, or from existing culture, they only start to add noise and make other - more import comments - harder to find. In other words it is always about finding the balance. Yes, good balance is hard to find, and also it is subjective, but still it is good to keep this in mind.
Kirill
P.S. And of course the inconsistency with numpy API, that you found in BigArray
constructor regarding dtype
, is better to fix. Normally I would say unless
there is no strong reason to diverge from numpy API and semantics, we should
follow it literally. Yes, it was my mistake to originally name the argument as
dtype_
to avoid clash with imported dtype
, but to the users of this module
what is important is external API, not how internally-used things are named. So
this was a mistake of putting internal organization ("how") as a higher
priority compared to provided service ("what"). Today this looks obviously
wrong to me, but it takes time and practice to realize driving principles and
to start following them.
Hello Kirill,
thanks for pushing the patch to master and applying its final polishing! All good here for me, I understand the rationale.
In other words it is always about finding the balance. Yes, good balance is hard to find, and also it is subjective, but still it is good to keep this in mind.
Yes keeping in mind to check if a proposed solution is acceptable only from a local or also from a global larger scale point of view is a good hint. And it's indeed more difficult because it's not so much about 'hard rules', but about experience and intuition. I keep practicing.
And of course the inconsistency with numpy API, that you found in BigArray constructor regarding
dtype
, is better to fix. Normally I would say unless there is no strong reason to diverge from numpy API and semantics, we should follow it literally. Yes, it was my mistake to originally name the argument asdtype_
to avoid clash with importeddtype
, but to the users of this module what is important is external API, not how internally-used things are named. So this was a mistake of putting internal organization ("how") as a higher priority compared to provided service ("what"). Today this looks obviously wrong to me, but it takes time and practice to realize driving principles and to start following them.
I see, then I'll keep it on my TODOs somewhere :)
Best, Levin
closed
Levin, you are welcome and thanks for feedback. Good to know my adjustments are ok with you. Yes, keeping practicing is required - good luck with it. And with keeping TODOs it is also possible to keep it in a shared place one way or another - e.g. in bug module in Nexedi ERP5 or in an issue.
Kirill
mentioned in merge request erp5!953 (closed)
mentioned in merge request kirr/xlte!4 (merged)
Files with large changes are collapsed by default.
Files with large changes are collapsed by default.
Files with large changes are collapsed by default.
Files with large changes are collapsed by default.