
Thread: Time Series review results




2007-08-17 12:19:17

  I am pleased to announce that the Time Series library, submitted by Eric
Niebler and developed by him along with Daniel Egloff, Matthias Troyer,
David Abrahams and Daniel Wallin using funding provided by Zurcher
Kantonalbank, has been accepted into Boost. As seen in the review, there are
some important issues that need to be addressed before the library is ready
for Boost distribution; however, I am confident that this is a very good
library. Although the donation of time and funding for Time Series by
Zurcher Kantonalbank is laudable and something we would like to encourage
more companies to do, the fact that it was donated is not important to the
review process.
 
  Something that is important is the well-established reputation of the
lead developer. Eric has a number of very high quality libraries already in
Boost and has shown his willingness to support them thoroughly and improve
them continually. He has been very responsive to user feedback and has an
obvious desire to have only high quality work committed to Boost with his
name on it. For this reason, I am giving him a little more leeway than might
be given to a developer without his reputation to address the issues in this
review and enter the revised version into Boost without a fast-tracked
second review.
 
  Many thanks to the participants in the review: Tom Brinkman, Hugo Duncan,
Andrey Tcherepanov, Matthias Schabel, Steven Watanabe, Stjepan Rajko, Matias
Capeletto, Lewis Hyatt, Paul Bristow, Tobias Schwinger, Matthias Troyer,
Phil Endecott, Zach Laine, Dave Abrahams, Michael Marcin, and Michael
Fawcett for their time and attention. The discussions held in this review
provided a strong foundation for improving the library, and the variety of
domains of expertise represented by the participants shows how broad the
desire for a library of this sort is.
 
  Most of the points raised in the review have been directly addressed by
Eric and discussed until a solution that is acceptable to all involved
parties was reached. Eric has acknowledged many as now either fixed in his
version or on his To-Do list. However, for the sake of easy reference I will
list the major points and my current understanding of what to do about them
below. The order in which the points are presented should not be taken as a
rating of importance: it is just the order I made notes about them as I
reviewed the discussion.
 
1)   Fix the floating-point offset issues - The answer to this may be a
change in how they are implemented or a complete scrapping of the
floating-point offsets as an indexing type. In either case, it would be a
good idea to provide a view facade for time series that applies a constant
multiplicative factor (or possibly a more general function) to an offset.
This may even be a sufficient answer to all the floating-point issues, but
it is not yet clear to me (or to anyone else that I noticed in the review)
what the best answer is. I expect there will be continued discussion on the
developer list to clarify this issue.
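
As a rough illustration of the view idea in point 1, the sketch below keeps
exact integer offsets as the index and applies a constant factor only when a
position is reported. It is a minimal standalone sketch and does not use the
Time Series interfaces; the names IntegerSeries, ScaledOffsetView, position
and value_at are hypothetical.

```cpp
#include <cassert>
#include <map>

// Hypothetical sparse series keyed by exact integer offsets.
using IntegerSeries = std::map<long, double>;

// Read-only view that reports each integer offset multiplied by a constant
// factor. Lookups still use the integer offset, so no floating-point
// equality comparison is ever needed to find a sample.
class ScaledOffsetView {
public:
    ScaledOffsetView(const IntegerSeries& series, double factor)
        : series_(&series), factor_(factor) {
        // A zero factor would collapse every offset onto one position.
        assert(factor != 0.0);
    }

    // Scaled position of an integer offset, for example a time in seconds.
    double position(long offset) const {
        return factor_ * static_cast<double>(offset);
    }

    // Value stored at an integer offset, or `fallback` if nothing is stored.
    double value_at(long offset, double fallback = 0.0) const {
        auto it = series_->find(offset);
        return it == series_->end() ? fallback : it->second;
    }

private:
    const IntegerSeries* series_;  // the view does not own the series
    double factor_;
};
```

With a factor of 0.5, offset 3 is reported at position 1.5 while the sample
itself is still found through the integer key 3.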

2)   Polish and include the rolling window average example sent to the
list - This is a very good example of a very potent design idiom for this
library, and one that would help a lot of people better understand the
potential of the library. The use of the circular buffer to collect multiple
points is necessary for many filters. It is such a good case that you should
consider not only including it in the library, but also making a tutorial
out of it that shows how new algorithms can be added to the library.
Ideally, many algorithms that are specific to domains that the authors are
not specialists in will be donated by users who develop and polish them for
inclusion. A very clear tutorial will make this more likely.
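
The example posted to the list is not reproduced here. As a minimal
standalone sketch of the same idiom, assuming plain std::vector data rather
than the library's series types, a rolling mean over a circular buffer could
look like this (RollingMean and rolling_mean are hypothetical names):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fixed-capacity ring buffer that keeps the most recent `window` samples
// and their running sum, so each update costs O(1).
class RollingMean {
public:
    explicit RollingMean(std::size_t window)
        : buffer_(window, 0.0), next_(0), count_(0), sum_(0.0) {
        assert(window > 0);
    }

    // Feed one sample and return the mean of the samples currently held.
    double push(double sample) {
        if (count_ == buffer_.size()) {
            sum_ -= buffer_[next_];  // buffer is full: evict the oldest sample
        } else {
            ++count_;                // still filling the first window
        }
        buffer_[next_] = sample;
        sum_ += sample;
        next_ = (next_ + 1) % buffer_.size();
        return sum_ / static_cast<double>(count_);
    }

private:
    std::vector<double> buffer_;
    std::size_t next_;   // slot that the next sample will overwrite
    std::size_t count_;  // number of valid samples in the buffer
    double sum_;
};

// Apply the filter to a whole series in a single pass.
inline std::vector<double> rolling_mean(const std::vector<double>& in,
                                        std::size_t window) {
    RollingMean filter(window);
    std::vector<double> out;
    out.reserve(in.size());
    for (double x : in) out.push_back(filter.push(x));
    return out;
}
```

Because push consumes one sample at a time, the same filter also works on a
single pass data stream. The running sum can accumulate rounding error on
long inputs, so a polished version may want to recompute it periodically.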

3)   If the rolling average is not developed as a tutorial example, some
other filter should be - There is no way the library can include all the
filters in use by programmers in all the various use domains, so the
expectation is that writing your own filters will be common. The
documentation needs to support this very well.

4)   The documentation needs improvement in a number of ways -
Reorganization to build from foundational ideas and uses to the more complex
issues is important, as is the addition of overview, tutorial and rationale
sections, as well as a substantial increase in the number of examples. The
learning curve is currently too steep and that seems to be discouraging
users who might find the library a very good fit. An important step along
the way is to add good examples and good pictures in many places in the
documentation. This general note is reinforced in some of the other
comments.

5)   Small toy examples for each of the algorithms would make the
documentation of them far clearer.

6)   Since you maintain both libraries, it would be useful to add to the
documentation for both the Time Series and the Accumulators libraries some
notes on how a user should select between them for specific jobs. This came
up in discussions before the review, so it is likely to be a problem for
more future users.

7)   Well-chosen pictures can help the readers of this documentation
immensely - Especially in your attempts to clarify the documentation of the
discretization, the offset and sample values, good pictures will be
invaluable.

8)   Expand the focus of the documentation so it doesn't appear to a casual
reader that this is purely for econometric/financial time series - The
library has potential uses in a wide variety of fields, so it is important
that potential future users can tell that from even a light skimming of the
documentation.

9)   The prenamed discretizations seem to cause more problems than they
solve - Initial testing of the boost::units library to provide
discretization units looks hopeful. If this stands up under testing, not
only remove the current prenamed discretizations, but also document using
boost::units. A good example could be enough for this.

10) It is only possible to assign across series with the same discretization
type - This should be made more explicit in the documentation. Also, since a
good example of the trouble that would be caused by trying to assign series
with different discretization types showed up during the review, it might be
added to the rationale section so users can see why the choice is a good
one.

11) The reasoning behind having unit series for types that have no obvious
unit member should also appear in the rationale - In general, there were
many good explanations for design choices that should appear in the
rationale section. Many of these reasons are compelling but not obvious to a
developer who has not tried writing a time series library. Thus they are
ideal rationale entries.

12) Currently non-entries in a dense time series are recorded as zeros. This
does not seem to be the best idea in all cases. Specifically, in some time
series, the difference between a value of zero and an unknown value is very
important. Investigate the best choice for a non-entry. Zero gives simple
arithmetic for multiplication (though not addition) that gives the unknown
value placeholder when multiplied with anything. Some non-zero value would
probably have to have special arithmetic rules. A universal answer may not
be feasible. If not, consider making the unknown value a parameter available
for the user to set.
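
To make the trade-off in point 12 concrete, here is a minimal standalone
sketch of a dense series whose placeholder is set by the user. DenseSeries
and is_missing are hypothetical names, and using NaN as the alternative
placeholder is only one possible choice, not a decision from the review.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical dense series whose "no entry" marker is chosen by the user.
struct DenseSeries {
    std::vector<double> values;
    double missing;  // placeholder stored where no sample was recorded

    DenseSeries(std::size_t n, double missing_value)
        : values(n, missing_value), missing(missing_value) {}

    // True when slot i still holds the placeholder.
    bool is_missing(std::size_t i) const {
        assert(i < values.size());
        // NaN never compares equal to itself, so test for it explicitly.
        return std::isnan(missing) ? std::isnan(values[i])
                                   : values[i] == missing;
    }
};
```

With a placeholder of 0.0, a recorded sample of zero cannot be told apart
from a non-entry, and adding to a non-entry silently yields an ordinary
value. With a quiet NaN placeholder, a recorded zero stays distinguishable
and both addition and multiplication propagate the unknown, at the cost of
the usual NaN comparison rules.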

13) Consider a name change from delta_series to dirac_delta_series and a
link in the documentation to what a Dirac delta function is.

14) Consider renaming the inverse_series to reciprocal_series or some other
more illustrative name.

15) Consider reorganizing to expose the storage containers to outside
developers - They are likely to be useful for a number of applications aside
from time series and this would prevent others from needing to reinvent the
wheel as often.

16) Since the usage of time series is spread across a variety of different
problem domains, there will be terms that aren't familiar to many of the
readers of the documentation. Consider including links to definitions and
descriptions for many of the terms used. Paul's example of users who don't
know what amortized constant complexity implies is unfortunately not extreme
for what you should expect.

17) ordered_inserter is not a great name for the operation it performs -
Determine a better name and change to that. Also consider moving to named
parameters for the start, stop and value arguments to the ordered_inserter.
Even after all this, the ordered_inserter is a lower-level interface than
most users want in most circumstances. Add another layer for doing appends
on top of this interface, and include a push_back method where sensible for
more STL-like syntax.

18) There is a confusion between the template parameter discretization and
the constructor parameter discretization. Consider changing the name for the
template parameter to something like Discretization_type.

19) Include in the documentation a clear description of what it means for
two discretizations to be the same, and point out that the runtime check for
this is an assertion, not an exception. Explain why in the rationale
section, if nowhere else.

20) A number of reviewers found the concepts of discretization, offset and
run to be hard to grasp from the documentation. Work to improve this.

21) The comparison operators for the time_series_facade currently are tied
to the time_series_base class. This could be generalized to remove that
dependency. If this is done, the documentation should no longer say that
time_series_base is the base for all time_series classes, but instead that
it provides a convenient base for implementing time_series classes. If this
isn't done, the documentation should clearly state the need to inherit from
time_series_base.

22) A large number of specific documentation edits were provided in posts by
Steven, Paul and Zach.

23) Fix the documentation to clearly show what kind of iterator the
series_stl_iterator is.

24) This is outside the scope of this review for a complete answer, but
there are questions about the behavior of non-MSVC compilers that define the
_MSC_VER macro to masquerade as MS compilers. Is there a way to know which
(if any) share the behaviors of the actual MS compilers they pose as, and
what is the best way to test for them in code? While the pursuit of this
answer is not part of this library, using it is, so if a good answer is
found it should become part of this library.
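
One common approach to the detection half of point 24, sketched here under
the assumption that the masquerading compilers also define a macro of their
own, is to test those macros before _MSC_VER. The EXAMPLE_ macros and the
two functions are hypothetical names for this illustration. Boost.Config
documents BOOST_MSVC for the same purpose, which is probably the better
choice inside a Boost library.

```cpp
#include <cassert>

// Classify the compiler. Some compilers define _MSC_VER for compatibility
// with MSVC headers, so their own identifying macros are tested first.
#if defined(__clang__)
#  define EXAMPLE_COMPILER_NAME "clang"
#elif defined(__INTEL_COMPILER)
#  define EXAMPLE_COMPILER_NAME "intel"
#elif defined(_MSC_VER)
#  define EXAMPLE_COMPILER_NAME "msvc"
#  define EXAMPLE_GENUINE_MSVC 1
#elif defined(__GNUC__)
#  define EXAMPLE_COMPILER_NAME "gcc"
#else
#  define EXAMPLE_COMPILER_NAME "unknown"
#endif

#ifndef EXAMPLE_GENUINE_MSVC
#  define EXAMPLE_GENUINE_MSVC 0
#endif

// Name of the compiler as classified above.
inline const char* compiler_name() { return EXAMPLE_COMPILER_NAME; }

// True only when _MSC_VER is defined and no other known compiler claimed it.
inline bool is_genuine_msvc() { return EXAMPLE_GENUINE_MSVC != 0; }
```

This only identifies the compiler. Whether a masquerading compiler shares a
particular MSVC behavior still has to be checked case by case.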

25) Provide versions of the fine_grain and coarse_grain functions that allow
for the user to easily provide a sampling function - This is likely to be a
very common use case for this library, so it should be well supported. An
example in the documentation of how to do this is a good idea.
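
The following is a minimal standalone sketch of what a coarse-graining
function with a user-supplied sampling function might look like. It works on
a plain std::vector and is not the library's coarse_grain; the names
coarse_grain_with, first_sample and block_mean, and the pointer-pair sampler
signature, are assumptions made for the illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: collapse each block of `factor` consecutive samples
// into one value chosen by `sampler`. The sampler receives pointers to the
// first and one-past-last sample of the block. A trailing partial block is
// passed to the sampler as well.
template <typename Sampler>
std::vector<double> coarse_grain_with(const std::vector<double>& fine,
                                      std::size_t factor, Sampler sampler) {
    assert(factor > 0);
    std::vector<double> coarse;
    coarse.reserve((fine.size() + factor - 1) / factor);
    for (std::size_t begin = 0; begin < fine.size(); begin += factor) {
        const std::size_t end = std::min(begin + factor, fine.size());
        coarse.push_back(sampler(fine.data() + begin, fine.data() + end));
    }
    return coarse;
}

// Example sampler: keep the first sample of each block.
inline double first_sample(const double* first, const double*) {
    return *first;
}

// Example sampler: average the block.
inline double block_mean(const double* first, const double* last) {
    return std::accumulate(first, last, 0.0) /
           static_cast<double>(last - first);
}
```

Passing the sampler as a template parameter lets users supply a function, a
function object or a lambda without any change to the algorithm itself.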

26) If the review of floating-point offsets concludes that they should
remain in the library, the dense series reference page should explicitly
point out that floating-point offsets are not usable with dense series.

27) What is the appropriate handling of zero-width points? They are a part
of the library in the delta_series, so having them is unavoidable. They
should be on firm conceptual ground if possible.

28) Make the existence and use of the set_at function more obvious in the
documentation - Reviewers didn't notice it and missed the functionality, so
others will, as well.

29) Consider whether it is valuable to provide versions of the transform
algorithm that take more than two series as arguments. I can't think of a
good application offhand, but there may be some. However, the same reviewer
who requested the feature said earlier in the review that he almost never
works on more than one series at a time.

30) Provide an example of using a single pass data stream with a circular
buffer based algorithm.

31) Consider a better name for the scaled_storage_tag.

32) Consider whether users need access to the metafunctions used to
determine return types. If so, consider how they could be exposed in a
user-friendly way.
 
  Once again, my thanks to the reviewers and the developers for the work,
time and attention put into this library and congratulations to Eric for the
accepted work.
 
   John Phillips
   Review Manager
 
 


-- Those only are happy, who have their minds fixed on some object other
than their own happiness; on the happiness of others, on the improvement of
mankind, even on some art or pursuit, followed not as a means, but as itself
an ideal end. Aiming thus at something else, they find happiness by the way.

John Stuart Mill



_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost-announce
