Commit ff79457d authored by Jérome Perrin

XXX why not just use unix date command ?

parent 9be7226f
These are chronologically ordered notes taken while working on that prototype to support dates (absolute and relative) in zodbtools' `<tidrange>`
# git's approxidate test "as is"
both python and go libraries do not support inputs like `1.month.ago`, so we preprocess to remove the `.`. (python's dateparser supports some formats though)
@@ -54,9 +57,80 @@ go run prototype/proto.go
no more errors reported.
# but is all this really a good idea ?
As we have seen, the go and python implementations have small differences, but that's OK.
One problem is that the libraries (especially python's dateparser) do not report errors on invalid input, but almost always parse something. dateparser has a `STRICT_PARSING` option, but it still allows invalid inputs (and causes different behavior on some relative dates): basically, whenever the input contains numbers, the parser returns a date.
For example, with these inputs:
```
### some invalid formats that "look OK"
# wrong format on timezone (should be 2009-06-01T10:00:00+09:00)
2009-06-01T01:00:00Z XXX 2009-06-01T10:00:00:+09:00
# day is 34
ERROR XXX 2009-06-34T22:00:00Z
# single-digit hour, minute and second
ERROR XXX 2009-06-01T1:2:3
# month uses a capital letter O instead of the digit 0
ERROR XXX 2009-O6-01T22:00:00Z
```
Neither parser raises an error; each just returns an unexpected date:
```
(env) $ python prototype/proto.py
ERROR timespec: 2009-06-01T10:00:00:+09:00 expected time: 2009-06-01T01:00:00Z parsed time: 2009-06-07T04:00:00Z
ERROR timespec: 2009-06-34T22:00:00Z expected time: ERROR parsed time: 2009-06-07T04:00:00Z
ERROR timespec: 2009-06-01T1:2:3 expected time: ERROR parsed time: 2009-05-31T23:02:03Z
ERROR timespec: 2009-O6-01T22:00:00Z expected time: ERROR parsed time: 2009-06-01T22:00:00Z
(env) $ go run ./prototype/proto.go
ERROR timespec: 2009-06-01T10:00:00:+09:00 expected time: 2009-06-01T01:00:00Z parsed time: 2009-08-30T08:00:00Z
ERROR timespec: 2009-06-34T22:00:00Z expected time: ERROR parsed time: 2009-08-30T20:00:00Z
ERROR timespec: 2009-06-01T1:2:3 expected time: ERROR parsed time: 2009-06-01T01:02:03Z
ERROR timespec: 2009-O6-01T22:00:00Z expected time: ERROR parsed time: 2009-08-30T20:00:00Z
```
With valid inputs, the current implementations de facto support so many formats that, to users who don't read the manual, the tool seems to support "any format and it will magically work". But sometimes it will not do what they think, which causes bad surprises.
The original need was to support using dates in tidrange, without having to "calculate" the tid from hash.
Relative dates are a cool feature, but I feel the most common use case would be absolute dates.
If we supported only one simple format (ISO8601), we would be able to detect and refuse invalid inputs.
That said, this is a tool for power users, so maybe that's OK as is. The same arguments apply to any powerful but dangerous tool, such as using a chainsaw to cut wood faster or a car to move faster, so maybe we should not worry here.
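To make the single-format idea concrete, here is a minimal sketch of what a strict parser could look like. It is Python 3 only, `parse_strict_iso8601` is a hypothetical name (nothing like it exists in zodbtools), and it assumes we would require an explicit timezone (`Z` or `±HH:MM`):

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed strict shape: YYYY-MM-DDTHH:MM:SS[.ffffff] followed by Z or +HH:MM / -HH:MM.
_ISO8601 = re.compile(
    r"([0-9]{4})-([0-9]{2})-([0-9]{2})"
    r"T([0-9]{2}):([0-9]{2}):([0-9]{2})(?:\.([0-9]{1,6}))?"
    r"(Z|[+-][0-9]{2}:[0-9]{2})"
)


def parse_strict_iso8601(timespec):
    """Return an aware UTC datetime, or raise ValueError on anything else."""
    m = _ISO8601.fullmatch(timespec)
    if m is None:
        raise ValueError("not a strict ISO8601 timestamp: %r" % (timespec,))
    year, month, day, hour, minute, second = (int(g) for g in m.groups()[:6])
    # pad the fractional part to microseconds ("52" -> 520000)
    micro = int((m.group(7) or "").ljust(6, "0"))
    tz = m.group(8)
    if tz == "Z":
        offset = timedelta(0)
    else:
        sign = 1 if tz[0] == "+" else -1
        offset = sign * timedelta(hours=int(tz[1:3]), minutes=int(tz[4:6]))
    # datetime() and timezone() raise ValueError for out-of-range fields
    # (for example day 34, or an offset of 24 hours or more).
    local = datetime(year, month, day, hour, minute, second, micro,
                     tzinfo=timezone(offset))
    return local.astimezone(timezone.utc)
```

With such a parser, the four "looks OK" inputs listed earlier all raise `ValueError` instead of silently returning some date.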
While thinking about this, I checked the unix `date` command and realized it also supports "clever" time parsing, for example:
```bash
date -d yesterday --iso-8601=seconds --utc
```
Running `prototype/proto_unix_date.py` checks how well it supports the reference dates:
```
(env) $ python prototype/proto_unix_date.py
ERROR timespec: 1985-04-12T23:20:50.52Z expected time: 1985-04-12T23:20:50.520000Z parsed time: 1985-04-12T23:20:50Z
ERROR timespec: 6am yesterday expected time: 2009-08-29T04:00:00Z parsed time: 2009-08-29T06:00:00Z
ERROR timespec: 6pm yesterday expected time: 2009-08-29T16:00:00Z parsed time: 2009-08-29T18:00:00Z
ERROR timespec: 3:00 expected time: 2009-08-30T01:00:00Z parsed time: 2009-08-30T03:00:00Z
ERROR timespec: 15:00 expected time: 2009-08-30T13:00:00Z parsed time: 2009-08-30T15:00:00Z
ERROR timespec: noon today expected time: 2009-08-30T10:00:00Z parsed time: date: invalid date 'noon today'
ERROR timespec: noon yesterday expected time: 2009-08-29T10:00:00Z parsed time: date: invalid date 'noon yesterday'
ERROR timespec: 6AM, June 7, 2009 expected time: 2009-06-07T04:00:00Z parsed time: date: invalid date '6AM, June 7, 2009'
```
Another completely different approach could be:
* `<tidrange>` supports only hex tids
* new subcommands `tid-from-iso8601` and `iso8601-from-tid` are added (the latter is not directly needed here, but could be convenient as well)
Reports on relative dates could then be obtained by combining simple commands, for example:
```bash
zodb analyze Data.fs $(zodb tid-from-iso8601 $(date -d "3 weeks ago" --iso-8601=seconds))..$(zodb tid-from-iso8601 $(date -d "yesterday" --iso-8601=seconds))
```
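As a rough idea of what the core of a `tid-from-iso8601` subcommand could do, here is a sketch of the date to tid conversion. `tid_from_datetime` is a hypothetical helper, not existing zodbtools code. It assumes the usual ZODB timestamp layout (high 32 bits: minutes counted with 12 months of 31 days since 1900, low 32 bits: seconds within the minute scaled to 2^32/60) and uses integer arithmetic, so the lowest bits could differ from ZODB's own floating-point `TimeStamp` in some cases. It does agree with the reference rows of `tidrange-formats.txt`.

```python
from datetime import datetime, timezone


def tid_from_datetime(dt):
    """Return the tid (16 hex digits) for an aware datetime. Sketch only."""
    if dt.tzinfo is None:
        raise ValueError("naive datetime: an explicit timezone is required")
    dt = dt.astimezone(timezone.utc)
    # high 4 bytes: minutes since 1900, every month counted as 31 days
    minutes = (
        (((dt.year - 1900) * 12 + dt.month - 1) * 31 + dt.day - 1) * 24 + dt.hour
    ) * 60 + dt.minute
    # low 4 bytes: position inside the minute, scaled so that 60s == 2**32
    micro = dt.second * 1000000 + dt.microsecond
    fraction = micro * 2 ** 32 // 60000000
    return "%08x%08x" % (minutes, fraction)


print(tid_from_datetime(datetime(2018, 1, 1, 10, 30, tzinfo=timezone.utc)))
```

For `2018-01-01T10:30:00Z` this prints `03c4857600000000`, the tid listed for that date in the test data.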
# TODO
-* [ ] spec with "local timezone" properly.
+* [X] spec with "local timezone" properly.
-* [ ] make sure go's proto handling of timezone is correct (it's copy pasted from examples)
+* [X] make sure go's proto handling of timezone is correct (it's copy pasted from examples)
\ No newline at end of file
# check our format with unix date command
from __future__ import print_function
import argparse
import subprocess

parser = argparse.ArgumentParser(description="Prototype script for tidrange parsing")
parser.add_argument(
    "--formats",
    default="./zodbtools/test/testdata/tidrange-formats.txt",
    help="Path of tidrange-formats.txt",
)
if 0:
    # faketime's $FAKETIME and $FAKETIME_FMT does not seem to support timezones,
    # so this is not configurable as command line args.
    parser.add_argument(
        "--timezone",
        default="Europe/Paris",
        help="Timezone aka `America/Los_Angeles` formatted time-zone",
    )
    parser.add_argument(
        "--relative-time-base",
        default="2009-08-30T19:20:00Z",
        help="Consider relative time as relative to this RFC3339 formatted time",
    )
args = parser.parse_args()
args.timezone = "Europe/Paris"
args.relative_time_base = "2009-08-30T21:20:00Z"

with open(args.formats, "r") as f:
    for line in f:
        line = line.strip()
        # skip comments or empty lines
        if not line or line.startswith("#"):
            continue
        # each line is "<reference time> <tid> <timespec>"; the tid is not used here
        reference, _, timespec = line.split(" ", 2)
        preprocessed_timespec = timespec
        if "ago" in timespec:
            preprocessed_timespec = timespec.replace(".", " ").replace("_", " ")
        try:
            parsed = subprocess.check_output(
                ("date", "-d", preprocessed_timespec, "--iso-8601=seconds", "--utc"),
                stderr=subprocess.STDOUT,
                env={
                    "TZ": args.timezone,
                    "FAKETIME": args.relative_time_base,
                    "FAKETIME_FMT": "%Y-%m-%dT%H:%M:%S%z",
                    # https://github.com/wolfcw/libfaketime
                    "LD_PRELOAD": "/srv/slapgrid/slappart8/srv/runner/project/zodbtools/faketime/libfaketime/src/libfaketime.so.1",
                }).strip().decode('utf-8')
            # Normalize the output, as date command outputs timezone as +00:00 but our reference times use Z
            assert parsed.endswith('+00:00')
            parsed = parsed[:-len('+00:00')] + 'Z'
        except subprocess.CalledProcessError as e:
            # decode as well, so that the error message is text like the reference
            parsed = e.output.strip().decode('utf-8')
        if parsed != reference:
            print(
                "ERROR timespec:",
                timespec,
                # "( preprocessed as", preprocessed_timespec, ")",
                "expected time:",
                reference,
                "parsed time:",
                parsed
            )
\ No newline at end of file
@@ -13,18 +13,19 @@
2018-01-01T10:30:00Z 03c4857600000000 2018-01-01T10:30:00Z
1985-04-12T23:20:50.520000Z 02b914f8d78d4fdf 1985-04-12T23:20:50.52Z
1996-12-20T00:39:57Z 03189927f3333333 1996-12-19T16:39:57-08:00
2018-01-01T05:30:00Z 03c4844a00000000 2018-01-01T10:30:00+05:00
# RFC822
1976-08-26T14:29:00Z 02728aa500000000 26 Aug 76 14:29 GMT
1976-08-26T12:29:00Z 02728a2d00000000 26 Aug 76 14:29 +02:00
-# RFC850 -> not supported
-# (note that I'm not 100% sure of the expected result here)
-#2006-01-02T22:04:05Z 036277cc15555555 Monday, 02-Jan-06 15:04:05 MST
-# RFC1123 -> not supported
-# (note that I'm not 100% sure of the expected result here)
-#2006-01-02T22:04:05Z 036277cc15555555 Mon, 02 Jan 2006 15:04:05 MST
+# RFC850 -> not supported (by go implementation)
+2006-01-02T22:04:05Z 036277cc15555555 Monday, 02-Jan-06 15:04:05 MST
+# RFC1123 -> not supported (by go implementation)
+2006-01-02T22:04:05Z 036277cc15555555 Mon, 02 Jan 2006 15:04:05 MST
+2006-01-02T22:04:05Z 036277cc15555555 Mon, 02 Jan 2006 23:04:05 GMT+1
# explicit UTC timezone
2018-01-01T10:30:00Z 03c4857600000000 2018-01-01 10:30:00 UTC
@@ -89,3 +90,13 @@
### works with python implementation, but not supported:
#2018-01-01T09:30:00Z 03c4853a00000000 le 1er janvier 2018 à 10h30
#2018-01-01T23:00:00Z 03c4886400000000 2018年1月2日
### some invalid formats that "look OK"
# wrong format on timezone (should be 2009-06-01T10:00:00+09:00)
2009-06-01T01:00:00Z XXX 2009-06-01T10:00:00:+09:00
# day is 34
ERROR XXX 2009-06-34T22:00:00Z
# single-digit hour, minute and second
ERROR XXX 2009-06-01T1:2:3
# month uses a capital letter O instead of the digit 0
ERROR XXX 2009-O6-01T22:00:00Z