Clarify the labels format for the entry (vs describe it in headers) #35

tahini · 2020-09-28T19:01:42Z

Should the labels be returned as formatted strings? Or wouldn't it make more sense to return them as raw data and let the client format them, giving it a hint through the dataType and unit (as for the axis data)?

For large numbers, like a sum of times, raw data would make more sense, because it can be used for sorting while still being displayed in a human-friendly manner using the hints. If we decide to go this way, the EntryHeader object will have to be updated to support dataType and unit.
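A minimal sketch of the dataType/unit idea, assuming a hypothetical `EntryHeader` shape and `DataType` values that the thread does not define. It illustrates the point above: the client sorts on raw numbers and formats only for display. Sorting the formatted strings instead would place "2.000 s" before "42 ns".

```typescript
// Hypothetical header with formatting hints; the field names follow the
// thread (name, tooltip, dataType, unit) but the dataType values are assumed.
type DataType = "NUMBER" | "DURATION" | "STRING";

interface EntryHeader {
  name: string;
  tooltip?: string;
  dataType: DataType;
  unit?: string;
}

// Turn a raw value into display text using only the header's hints.
function formatCell(header: EntryHeader, raw: number | string): string {
  if (typeof raw === "string" || header.dataType === "STRING") {
    return String(raw);
  }
  if (header.dataType === "DURATION") {
    // Assumes durations are sent in nanoseconds; picks the largest unit that fits.
    const scales: Array<[string, number]> = [["s", 1e9], ["ms", 1e6], ["µs", 1e3]];
    for (const [suffix, factor] of scales) {
      if (Math.abs(raw) >= factor) {
        return `${(raw / factor).toFixed(3)} ${suffix}`;
      }
    }
    return `${raw} ns`;
  }
  // Plain number: append the unit when the header provides one.
  return header.unit ? `${raw} ${header.unit}` : String(raw);
}

// Sort on the raw values, then format only for display.
const totalTime: EntryHeader = { name: "Total time", dataType: "DURATION", unit: "ns" };
const rawValues = [1_500_000, 42, 2_000_000_000];
const displayed = [...rawValues].sort((a, b) => a - b).map((v) => formatCell(totalTime, v));
```

Here `displayed` is in the correct magnitude order because the comparison ran on numbers, not on the formatted text.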

What do you think?

bhufmann · 2020-10-01T19:42:49Z

@tahini thanks for opening this tracker. It's an important topic for the TSP specification.

The advantage of sending formatted strings is that the domain-specific logic and responsibility stay in the trace server, and the client can display the data as is. The definitions of the virtual table and of the tree model used for XY charts, time graphs and data trees currently ask for formatted strings. The disadvantage is that for client-side actions that require raw values, for example sorting, timestamp synchronization and others, the raw data needs to be retrieved somehow. New APIs are needed that will allow the client to retrieve the raw data. This could be done with a query parameter in the APIs to request the data as raw values (e.g. in fetchTree()), or by requesting tooltips with detailed information through additional APIs. For the events table, there is a proposal (see PR #36) to provide tooltip information (properties) including the formatted timestamp as well as the raw timestamp value.

As an alternative, there is your suggestion to provide a data type hint plus a unit for the relevant data. The server can then send raw values along with a data type and unit, and the client applies the formatting. The advantage is that the client can change the formatting, and can sort and filter on the client side without extra queries to the server (to the extent that client-side actions make sense). The disadvantage is that it's not trivial to define a "good" set of data types that covers most use cases and has "good" default formatters in the client. Also, each client has to implement its own formatting. Maybe the server could provide the format string and the client would just apply it, but looking at e.g. DecimalFormat or DateFormat, that seems quite challenging to handle in the client.

I tried to implement data type support for the trees with columns (e.g. data tree, time graph tree, XY tree) and ran into challenges when implementing formatters that would fit various use cases. For example, in a table with multiple columns of decimal numbers, where each column has a different value range, what should the default decimal format in the client be? I also had a case where a data type was needed per table cell and not just per column.
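To make the decimal-format question concrete, here is one possible per-column heuristic, purely illustrative and not something the thread agreed on: pick a single number of fraction digits per column so that the column's smallest non-zero magnitude still shows three significant digits. `fractionDigitsFor` and `columnFormatter` are hypothetical helpers; only `Intl.NumberFormat` is a standard API.

```typescript
// Hypothetical heuristic: pick one number of fraction digits per column so
// that the smallest non-zero magnitude still shows `significant` digits.
function fractionDigitsFor(values: number[], significant = 3): number {
  const magnitudes = values.filter((v) => v !== 0).map((v) => Math.abs(v));
  if (magnitudes.length === 0) return 0;
  const smallest = Math.min(...magnitudes);
  // Exponent of the leading digit: 1 for [10, 100), -3 for [0.001, 0.01), ...
  const leading = Math.floor(Math.log10(smallest));
  // Stay within the 0..20 fraction-digit range accepted by all engines.
  return Math.min(20, Math.max(0, significant - 1 - leading));
}

// Build one formatter per column from that column's values.
function columnFormatter(values: number[]): Intl.NumberFormat {
  const digits = fractionDigitsFor(values);
  return new Intl.NumberFormat("en-US", {
    minimumFractionDigits: digits,
    maximumFractionDigits: digits,
  });
}

const colA = [1234.5, 20.4]; // leading digit at 10^1, so 1 fraction digit
const colB = [0.00123, 0.5]; // leading digit at 10^-3, so 5 fraction digits
const fmtA = columnFormatter(colA);
const fmtB = columnFormatter(colB);
const shownA = colA.map((v) => fmtA.format(v));
const shownB = colB.map((v) => fmtB.format(v));
```

The second column shows the difficulty described above: to keep 0.00123 readable, 0.5 is rendered as "0.50000", and a column that mixes such ranges may really need a format per cell.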

I think it's important that we have this discussion and decide which route to take. The first alternative initially seems the better solution to get started and get something implemented quickly, but I wonder whether alternative 2 is better in the long run.

Any thoughts?

MatthewKhouzam · 2020-10-01T20:07:11Z

Hi, as the instigator of PR #36, I would like to share some thoughts...

Let's look at a timestamp example, as it seems to cover all the cases.

Examples:

* `Timestamp 150000000000` (raw)
* `Timestamp 2018/05/02 14:23:33.0002231234` (formatted)
* `Timestamp <a href="timestamp://150000000000">2018/05/02 14:23:33.0002231234</a>` (both)

What do you think of option 3?
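A sketch of producing option 3 ("both") from a raw value, assuming the raw value is a count of nanoseconds since the Unix epoch and the formatted text is UTC. `formatNs` and `toLink` are hypothetical helpers. `BigInt` is used so that no nanosecond digits are lost.

```typescript
// Format a non-negative nanosecond timestamp (since the Unix epoch, UTC) as
// "YYYY/MM/DD HH:MM:SS.nnnnnnnnn".
function formatNs(ns: bigint): string {
  const ms = ns / 1000000n;            // whole milliseconds, for Date
  const subSecond = ns % 1000000000n;  // nanoseconds within the second
  const d = new Date(Number(ms));
  const pad = (n: number) => String(n).padStart(2, "0");
  const date = `${d.getUTCFullYear()}/${pad(d.getUTCMonth() + 1)}/${pad(d.getUTCDate())}`;
  const time = `${pad(d.getUTCHours())}:${pad(d.getUTCMinutes())}:${pad(d.getUTCSeconds())}`;
  return `${date} ${time}.${subSecond.toString().padStart(9, "0")}`;
}

// Option 3: the raw value travels in the link target, the formatted value is the text.
function toLink(ns: bigint): string {
  return `<a href="timestamp://${ns}">${formatNs(ns)}</a>`;
}

const example = toLink(1525271013000223123n);
```

Because the sketch formats the raw value itself, `toLink(150000000000n)` produces the text "1970/01/01 00:02:30.000000000": 150000000000 ns is 150 s after the epoch.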

MatthewKhouzam · 2020-10-05T16:06:57Z

Some clarifications: for the hyperlink, we can look into HATEOAS.

Also, for the link, let's look at the possibilities:

* `<a href="timestamp://150000000000">2018/05/02 14:23:33.0002231234</a>`
* `<a href="tsp://?timestamp=150000000000">2018/05/02 14:23:33.0002231234</a>`
* `<a href="tsp://127.0.0.1:8080/?timestamp=150000000000">2018/05/02 14:23:33.0002231234</a>`
* `<a href="tsp://127.0.0.1:8080/path-to-endpoint/?timestamp=150000000000">2018/05/02 14:23:33.0002231234</a>`

Which level of detail is interesting? Also, how can we make it clear that the data is meant for other views to update, like an output element in Eclipse's UI?
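Whichever link form is chosen, a client could recover the raw timestamp with the standard WHATWG `URL` parser, which accepts non-special schemes. The `timestamp:` and `tsp:` schemes are proposals from this thread, not registered schemes, and `timestampFromHref` is a hypothetical helper.

```typescript
// Extract the raw nanosecond timestamp from any of the candidate link forms.
function timestampFromHref(href: string): bigint | undefined {
  let url: URL;
  try {
    url = new URL(href);
  } catch {
    // Malformed links (for example a port followed by a stray colon,
    // as in "127.0.0.1:8080:/") make the URL constructor throw.
    return undefined;
  }
  // "timestamp://<ns>" carries the value as the host; the "tsp:" forms
  // carry it in the "timestamp" query parameter.
  const raw = url.protocol === "timestamp:" ? url.hostname : url.searchParams.get("timestamp");
  return raw && /^\d+$/.test(raw) ? BigInt(raw) : undefined;
}

const fromQuery = timestampFromHref("tsp://127.0.0.1:8080/path-to-endpoint/?timestamp=150000000000");
```

The parser rejects a stray colon after the port, which is one practical argument for fixing the exact link syntax in the specification.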

tahini · 2020-10-05T19:27:54Z

@MatthewKhouzam for the hyperlinks, those would be generated by the server, right?

You both bring up very valid points about the data, which raises even more questions, like who is expected to do the sorting. Lea implemented column sorting in the filter-table-tree (left part of XY charts and time graphs), and the sort was done client-side, with whatever value was received. That is where this issue originated. But if we say all sorting is meant to be done server-side, then we can very well send only formatted strings.

But does it make sense to always require going to the server (with all the overhead it implies) just to sort what is typically a couple hundred entries?

But we sometimes have more than a couple hundred entries with virtual tables, in which case it would make sense to sort server-side.

I think the idea of being able to request either raw or formatted data via a query parameter is interesting, as it would provide good formatted defaults for the dumbest consuming UI, while still giving room for creative ways to consume the data in the client.

So we could update the protocol so that headers are not simple strings but objects with a name, tooltip, dataType and unit (same as the Axis descriptor), and specify that the data comes formatted by default according to its data type.

Then, as a second step, we could decide how best to allow retrieving raw data, either through additional endpoints as Matthew suggests or through an additional parameter in the query.

bhufmann · 2020-10-15T18:02:22Z

> @MatthewKhouzam for the hyperlinks, those would be generated by the server, right?
>
> You both bring up very valid points about the data, which raises even more questions, like who is expected to do the sorting. Lea implemented column sorting in the filter-table-tree (left part of XY charts and time graphs), and the sort was done client-side, with whatever value was received. That is where this issue originated. But if we say all sorting is meant to be done server-side, then we can very well send only formatted strings.
>
> But does it make sense to always require going to the server (with all the overhead it implies) just to sort what is typically a couple hundred entries?
>
> But we sometimes have more than a couple hundred entries with virtual tables, in which case it would make sense to sort server-side.

For client-side sorting, the client must have enough information to sort without querying the server. First, the client must have all the entries to sort. Second, the client must know how to sort: sorting strings is different from sorting numbers, and sometimes a value is a string but needs to be sorted differently. Consider sorting a list of "CPU0, CPU1 ... CPU16": users would expect it to be sorted by CPU number, not as plain strings. The next question is: how does the client know whether it can sort on the client side or has to query the server? That would be part of some kind of descriptor (e.g. a column descriptor).
Filtering has similar challenges.
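A sketch of how a column descriptor could tell the client whether and how to sort locally. The `sortKind` and `clientSortable` fields are hypothetical, not part of the TSP. The natural ordering uses the standard `Intl.Collator` with `numeric: true`, which orders "CPU2" before "CPU10".

```typescript
// Hypothetical column descriptor carrying sorting hints.
interface ColumnDescriptor {
  name: string;
  // "natural" compares embedded numbers numerically ("CPU2" < "CPU10").
  sortKind: "string" | "natural" | "number";
  // When false, the client should ask the server to sort (e.g. virtual tables).
  clientSortable: boolean;
}

const naturalCollator = new Intl.Collator(undefined, { numeric: true });

// Pick a comparator based on the descriptor's hint.
function compareFor(column: ColumnDescriptor): (a: string, b: string) => number {
  switch (column.sortKind) {
    case "number":
      return (a, b) => Number(a) - Number(b);
    case "natural":
      return naturalCollator.compare; // bound getter, safe to pass around
    default:
      return (a, b) => (a < b ? -1 : a > b ? 1 : 0); // plain code-unit order
  }
}

// Returns undefined when the caller must query the server instead.
function sortLocally(column: ColumnDescriptor, values: string[]): string[] | undefined {
  if (!column.clientSortable) return undefined;
  return [...values].sort(compareFor(column));
}

const cpus = ["CPU10", "CPU2", "CPU1", "CPU16", "CPU0"];
const byCpu = sortLocally({ name: "CPU", sortKind: "natural", clientSortable: true }, cpus);
const byString = sortLocally({ name: "CPU", sortKind: "string", clientSortable: true }, cpus);
```

With a plain string comparison, "CPU10" and "CPU16" land before "CPU2"; the natural comparator gives the order users expect. A similar descriptor field could tell the client whether filtering can be done locally.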

> I think the idea of being able to request either raw or formatted data via a query parameter is interesting, as it would provide good formatted defaults for the dumbest consuming UI, while still giving room for creative ways to consume the data in the client.

I agree: working with only formatted strings or only raw data is probably not sufficient, and we probably need both. For timestamps, we definitely need both.

> So we could update the protocol so that headers are not simple strings but objects with a name, tooltip, dataType and unit (same as the Axis descriptor), and specify that the data comes formatted by default according to its data type.

So, when sending formatted data, I guess the dataType would be string. For raw data, the type would depend on the data.
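One way to express this is a discriminated union on `dataType`, sketched below with assumed type names. Sending timestamps as decimal strings is also an assumption, made so that nanosecond values survive JSON without losing digits.

```typescript
// Hypothetical cell payloads: formatted data is always a string, raw data
// carries a typed value.
type Cell =
  | { dataType: "STRING"; value: string }
  | { dataType: "NUMBER"; value: number; unit?: string }
  | { dataType: "TIMESTAMP"; value: string }; // raw nanoseconds as a decimal string

// True when the server already did the formatting.
function isFormatted(cell: Cell): boolean {
  return cell.dataType === "STRING";
}

// Display text; only raw cells need client-side work.
function display(cell: Cell): string {
  switch (cell.dataType) {
    case "STRING":
      return cell.value;
    case "NUMBER":
      return cell.unit ? `${cell.value} ${cell.unit}` : String(cell.value);
    case "TIMESTAMP":
      // Placeholder: a real client would apply a timestamp formatter here.
      return `${BigInt(cell.value)} ns`;
  }
}

const cells: Cell[] = JSON.parse(
  '[{"dataType":"STRING","value":"CPU0"},{"dataType":"NUMBER","value":42,"unit":"ms"}]'
);
```

The compiler checks that `display` handles every `dataType`, so adding a new raw type later forces each client-side formatter to be updated.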

> Then, as a second step, we could decide how best to allow retrieving raw data, either through additional endpoints as Matthew suggests or through an additional parameter in the query.
Using the same endpoint would have the advantage that views could display the raw data instead, or the whole data structure could be processed further in the client (e.g. a Python client). Additional endpoints provide a more targeted request, i.e. "give me the raw values of this entry".

bhufmann · 2020-10-15T18:15:05Z

> Some clarifications: for the hyperlink, we can look into HATEOAS.
>
> Also, for the link, let's look at the possibilities:
>
> * `<a href="timestamp://150000000000">2018/05/02 14:23:33.0002231234</a>`
> * `<a href="tsp://?timestamp=150000000000">2018/05/02 14:23:33.0002231234</a>`
> * `<a href="tsp://127.0.0.1:8080/?timestamp=150000000000">2018/05/02 14:23:33.0002231234</a>`
> * `<a href="tsp://127.0.0.1:8080/path-to-endpoint/?timestamp=150000000000">2018/05/02 14:23:33.0002231234</a>`
>
> Which level of detail is interesting? Also, how can we make it clear that the data is meant for other views to update, like an output element in Eclipse's UI?

HATEOAS is intended for the server to advertise other endpoints or actions that the client can use for further features. This might be interesting for some other features in our application.

Using a hyperlink to transport timestamp information (raw and formatted values) is meant for passing data from the server to the client and within the client, not for providing an endpoint that the client would use to query the server for more information. If we went that way, it would have to be specified in the TSP so that clients and servers know about it and implement it accordingly.

MatthewKhouzam · 2020-10-16T17:22:51Z

Follow-up: the core issue:

Passing data from the trace server to the trace client

Problem:
* Views cannot synchronize with each other or with the trace server.
* They need a unique key to do so.
* How do we convey data from the server to the client? Do we provide it “raw” or “formatted”?
* What are the implications? Do we pass formatters? Do we get raw values? See GH issue 35 for TSP.
* How do we enable the critical path? (A good use case to test on.)
* Any view can react to any key of any view.
* How can we get the “keys”? By defining “aspects” of a trace that are unique.
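A minimal client-side sketch of "any view can react to any key of any view": a hypothetical `SignalBus` keyed by aspect name. This is not the theia-trace-extension implementation. The key name "timestamp" and the source tags are assumptions.

```typescript
// Hypothetical in-client signal bus: views publish values under a shared key
// (an "aspect" such as "timestamp") and any other view can react to them.
type Listener<T> = (value: T, source: string) => void;

class SignalBus {
  private listeners = new Map<string, Set<Listener<unknown>>>();

  // Register interest in a key; returns a function that unsubscribes.
  subscribe<T>(key: string, listener: Listener<T>): () => void {
    const set = this.listeners.get(key) ?? new Set<Listener<unknown>>();
    this.listeners.set(key, set);
    set.add(listener as Listener<unknown>);
    return () => {
      set.delete(listener as Listener<unknown>);
    };
  }

  // Notify every listener of this key, passing along which view published.
  publish<T>(key: string, value: T, source: string): void {
    const set = this.listeners.get(key);
    if (!set) return;
    for (const listener of set) {
      listener(value, source);
    }
  }
}

// Usage: the time graph follows the raw timestamp selected in the events table.
const bus = new SignalBus();
const received: string[] = [];
const unsubscribe = bus.subscribe<bigint>("timestamp", (ns, source) => {
  received.push(`time-graph <- ${ns} from ${source}`);
});
bus.publish("timestamp", 150000000000n, "events-table");
unsubscribe();
bus.publish("timestamp", 1n, "events-table"); // no longer delivered
```

The key only works across views if every view publishes the same raw value for the same aspect, which is why formatted strings alone are not enough for synchronization.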

Ways forward:

1. Raw data sent as well as formatted
   * Heavy on networks
   * Heavy on client-side memory
   * Trivial in client compute
2. Client formats data
   * Efficient in terms of network
   * Needs metadata for inter-view interactions
   * Potentially inconsistent across implementations
   * May fragment the client ecosystem
   * In-memory sort ()
3. Server formats data
   * Virtual sort ()
     * a. Client has raw, queries for formatted <- not inconsistent with Update the tree entrypoint for TimeGraph, XY and Events table #2
     * b. Client has formatted, queries for raw
   * Less client-side compute
   * Less separation of concerns
   * One more network hop to get data

Internal nanosecond storage https://www.npmjs.com/package/timestamp-nano
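The linked timestamp-nano package is one option for this; its API is not shown here. The sketch below instead uses the built-in `BigInt` to show why nanosecond storage matters, which is also the concern behind "TSP: send nanoseconds". Sending the value as a string is an assumption for illustration.

```typescript
// A current-era nanosecond timestamp is larger than Number.MAX_SAFE_INTEGER
// (2^53 - 1), so a JavaScript number cannot hold it exactly.
const wire = "1525271013000223123"; // ns since the epoch, received as a string (assumption)

const asNumber = Number(wire);
const asBigInt = BigInt(wire);

// JSON.parse on a bare numeric literal has the same problem as Number().
const parsed: number = JSON.parse(`{"t": ${wire}}`).t;

const lostPrecision = BigInt(asNumber) !== asBigInt; // low digits were rounded away
const exactRoundTrip = asBigInt.toString() === wire; // BigInt keeps every digit

// Durations between events stay exact with BigInt arithmetic.
const later = BigInt("1525271013000223456");
const deltaNs = later - asBigInt;
```

At this magnitude adjacent doubles are 256 ns apart, so any view that keyed on a `number` could fail to match the exact event another view selected.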

Work for #2:
* Send metadata
* Have the client format
* Update the events table
* Update Statistics
* TSP: send nanoseconds

Work for #3:
* Support lookup endpoints
* Look up from there

If we send only “raw” to the client:
* Client-side formatting: a warning.
* JS != Python != Java

So, #2 makes more sense architecturally: it is more work today but less tomorrow. Let's do #2!

tahini · 2021-01-08T22:53:30Z

I'll start working on this next week

bhufmann · 2021-01-18T15:09:21Z

> I'll start working on this next week

@tahini, thanks for letting us know. If you'd like to discuss it, please let me know. I looked at it before, and recently I was scratching my head over how we can do similar things in the events table. In the end, we'd like to have a way to transport data between the server and the client, and then be able to correlate the data in the client across different views (data providers).

Fixes eclipse-cdt-cloud#35

With this, it is expected that column and entry descriptions may contain an optional typing option, to explain what the data type is and, if available, what unit to use when formatting it. This allows server implementations to send raw data to the client, which then has the option to format the data, but also the possibility to do other actions, like sorting, or to use the data for richer data interactions, as raw data may have the same meaning in different outputs, unlike pre-formatted data, which may not allow such interactions.

Signed-off-by: Geneviève Bastien

tahini · 2021-01-25T14:43:10Z

@bhufmann I just did a PR for column data types (including the event table). I could try to make a quick prototype of this new API to see if it solves whatever you were scratching your head about. What was the exact issue? Is it about the timestamp? Or something else?

tahini mentioned this issue Sep 30, 2020

Align client classes with the TSP specification eclipse-cdt-cloud/tsp-typescript-client#4

Merged

tahini mentioned this issue Jan 8, 2021

Sync event timestamp with timegraph eclipse-cdt-cloud/theia-trace-extension#207

Merged

tahini self-assigned this Jan 8, 2021

tahini mentioned this issue Jan 15, 2021

Support segment statistics in Theia eclipse-cdt-cloud/theia-trace-extension#17

Closed


tahini linked a pull request Jan 25, 2021 that will close this issue

Add data type hints to column and entry descriptions #50

Open
