Clarify the labels format for the entry (vs describe it in headers) #35

Open
tahini opened this issue Sep 28, 2020 · 10 comments · May be fixed by #50

@tahini
Contributor

tahini commented Sep 28, 2020

Should the labels be returned as a formatted string? Or wouldn't it make more sense to return them as raw data and let the client format them, guided by a dataType and unit hint (like for the axis data)?

For large numbers, like a sum of times, this would make more sense: the raw data can be used for sorting, and we can still display it in a human-friendly manner using the hints. If we decide to go this way, the EntryHeader object will have to be updated to support dataType and units.
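
To illustrate the sorting concern, here is a minimal Python sketch. The `EntryHeader` fields, the `"DURATION"` data type value, and the `format_label` helper are assumptions made for this example and are not taken from the TSP specification. It shows that raw values sort correctly and can still be formatted from the hints, while the same labels sorted as text come out in the wrong order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EntryHeader:
    """Hypothetical header shape: a name plus optional formatting hints."""
    name: str
    data_type: Optional[str] = None  # e.g. "DURATION" (assumed value set)
    unit: Optional[str] = None       # e.g. "ns"


def format_label(value: int, header: EntryHeader) -> str:
    """Format a raw value for display using the header hints."""
    if header.data_type == "DURATION" and header.unit == "ns":
        # Pick the largest unit that keeps the number at or above 1.
        for suffix, factor in (("s", 1_000_000_000), ("ms", 1_000_000), ("us", 1_000)):
            if value >= factor:
                return f"{value / factor:.3f} {suffix}"
        return f"{value} ns"
    return str(value)


header = EntryHeader(name="Total time", data_type="DURATION", unit="ns")
raw_values = [950_000, 12_000_000_000, 3_500_000]

# Sorting on raw values gives the expected numeric order.
sorted_raw = sorted(raw_values)
labels = [format_label(v, header) for v in sorted_raw]

# Sorting the formatted labels as text does not.
sorted_as_text = sorted(labels)
```

Here `labels` is ordered from the shortest to the longest duration, whereas `sorted_as_text` puts `12.000 s` first because it compares character by character.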

What do you think?

@bhufmann
Collaborator

bhufmann commented Oct 1, 2020

@tahini thanks for opening this tracker. It's an important topic for the TSP specification.

The advantage of sending formatted strings is that the domain-specific logic and responsibility stay in the trace server, and the client can then display the data as is. The definition of the virtual table and the tree model used for xy, time-graphs and data-trees currently asks for formatted strings. The disadvantage is that client-side actions that require raw values (for example sorting, timestamp synchronization and others) need to retrieve the raw data somehow. New APIs are needed that allow the client to retrieve the raw data. This could be done by having a query parameter in the APIs to request the data as raw values (e.g. in fetchTree()), or by requesting tooltips with detailed information in additional APIs. For the events table, there is a proposal (see PR #36) to provide tooltip information (properties) including the formatted timestamp as well as the raw timestamp value.

As an alternative, there is your suggestion to provide a data type hint plus a unit for the relevant data. With this, the server can send raw values along with a data type and unit, and the client can then apply formatting. The advantage is that the client can change the formatting and can apply sorting and filtering on the client side without extra queries to the server (to the extent that client-side actions make sense). The disadvantage is that it's not trivial to define a "good" set of data types that can serve most use cases and that have "good" default formatters in the client. Also, each client has to implement its own formatting. Maybe the server could provide the format string and the client would just apply it. But looking at e.g. the DecimalFormat or DateFormat, that seems quite challenging to handle in the client.

I tried to implement data type support for the trees with columns (e.g. data tree, time graph tree, xy tree) and had some challenges implementing formatters that would fit various use cases. For example, in a table with multiple columns of decimal numbers, where each column has a different value range, what should be the default decimal format in the client? Also, I had a case where a data type was needed per table cell and not just per column.

I think it's important that we have this discussion and decide which route to take. The first alternative initially seems a better solution to get started and get something implemented quickly, but I wonder if alternative 2 is better in the long run.

Any thoughts?

@MatthewKhouzam

MatthewKhouzam commented Oct 1, 2020

Hi, as the instigator of PR36, I would like to give some thoughts...

Let's look at a timestamp example, as this seems to cover all the cases.

examples:

  • Timestamp 150000000000 (raw)
  • Timestamp 2018/05/02 14:23:33.0002231234 (formatted)
  • Timestamp <a href="timestamp://150000000000">2018/05/02 14:23:33.0002231234</a> (both)

What do you think of option 3?
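
As a rough illustration of how a client could consume option 3, here is a Python sketch that pulls both the raw value and the formatted text out of such a cell. The `timestamp://` scheme comes from the example above; the `TimestampLinkParser` class and the `parse_timestamp_cell` helper are hypothetical names for this sketch and not part of any TSP client.

```python
from html.parser import HTMLParser
from typing import Optional, Tuple
from urllib.parse import urlsplit


class TimestampLinkParser(HTMLParser):
    """Collects the raw value and display text of a timestamp:// link."""

    def __init__(self) -> None:
        super().__init__()
        self.raw: Optional[int] = None
        self.formatted = ""
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        parts = urlsplit(href)
        # In "timestamp://150000000000" the raw value sits in the netloc part.
        if parts.scheme == "timestamp" and parts.netloc.isdigit():
            self.raw = int(parts.netloc)
            self._in_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False

    def handle_data(self, data):
        # Only text inside a recognized timestamp link is kept.
        if self._in_link:
            self.formatted += data


def parse_timestamp_cell(cell: str) -> Tuple[Optional[int], str]:
    """Return (raw value, formatted text); raw is None if no link is found."""
    parser = TimestampLinkParser()
    parser.feed(cell)
    parser.close()
    return parser.raw, parser.formatted


cell = '<a href="timestamp://150000000000">2018/05/02 14:23:33.0002231234</a>'
raw, formatted = parse_timestamp_cell(cell)
```

With this shape the client can sort or synchronize on `raw` and display `formatted` unchanged.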

@MatthewKhouzam

MatthewKhouzam commented Oct 5, 2020

Some clarifications for the hyperlink: we can look into HATEOAS.

Also, for the link, let's see possibilities:

  • <a href="timestamp://150000000000">2018/05/02 14:23:33.0002231234</a>
  • <a href="tsp://?timestamp=150000000000">2018/05/02 14:23:33.0002231234</a>
  • <a href="tsp://127.0.0.1:8080:/?timestamp=150000000000">2018/05/02 14:23:33.0002231234</a>
  • <a href="tsp://127.0.0.1:8080:/path-to-endpoint/?timestamp=150000000000">2018/05/02 14:23:33.0002231234</a>

Which level of detail is interesting? Also, how can we make it clear that the data is for other views to update, like an outputelement in eclipse's UI?

@tahini
Contributor Author

tahini commented Oct 5, 2020

@MatthewKhouzam for the hyperlinks, those would be generated by the server, right?

You both bring up very valid points about the data, which raise even more questions, like who is expected to do the sorting. Lea implemented column sorting in the filter-table-tree (left part of XY charts and timegraphs), and the sort was done client-side, with whatever value was received, which is where this issue originated. But if we say all sorting is meant to be done server-side, then we can very well send only formatted strings.

But does it make sense to always require going server side (with all the overhead it implies), just to sort what is typically a couple hundred entries?

But we sometimes have more than a couple hundred entries with virtual tables, in which case it would make sense to sort server side.

I think the idea of having the possibility, via query parameter, to request either raw or formatted data is interesting, as it would provide a good formatted default for the dumbest consuming UI, while still giving room for creative ways for the client to consume the data.

So we could update the protocol so headers are not simple strings, but objects with a name, tooltip, dataType and unit (same as the Axis descriptor), and say that the data comes formatted by default, as per its data type.

Then as a second step, we could decide on how best to give the possibility to retrieve raw data, either as additional endpoints as Matthew suggests or as additional parameter in the query.
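
A small Python sketch of the two steps described above, under stated assumptions: the header field names follow the proposal (name, dataType, unit), but the `raw_values` query parameter, the `"DURATION"` type value, and the `build_tree_response` function are hypothetical and are not defined by the TSP.

```python
from typing import Any, Dict, List

# Hypothetical column headers: objects instead of plain strings.
HEADERS: List[Dict[str, str]] = [
    {"name": "Thread", "dataType": "STRING"},
    {"name": "Total time", "dataType": "DURATION", "unit": "ns"},
]

# Raw rows as the server would hold them internally.
ROWS: List[List[Any]] = [["worker-1", 3_500_000], ["worker-2", 950_000]]


def format_cell(value: Any, header: Dict[str, str]) -> str:
    """Server-side default formatting driven by the header's data type."""
    if header.get("dataType") == "DURATION" and header.get("unit") == "ns":
        return f"{value / 1_000_000:.3f} ms"
    return str(value)


def build_tree_response(query_parameters: Dict[str, Any]) -> Dict[str, Any]:
    """Return formatted labels by default, raw values when asked for them."""
    want_raw = bool(query_parameters.get("raw_values", False))  # assumed name
    entries = []
    for row in ROWS:
        if want_raw:
            labels = list(row)
        else:
            labels = [format_cell(v, h) for v, h in zip(row, HEADERS)]
        entries.append({"labels": labels})
    return {"headers": HEADERS, "entries": entries}


formatted_response = build_tree_response({})
raw_response = build_tree_response({"raw_values": True})
```

A simple UI can render `formatted_response` directly, while a client that wants to sort or correlate values can request the raw variant from the same endpoint.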

@bhufmann
Collaborator

bhufmann commented Oct 15, 2020

> @MatthewKhouzam for the hyperlinks, those would be generated by the server, right?
>
> You both bring up very valid points about the data, which raise even more questions, like who is expected to do the sorting. Lea implemented column sorting in the filter-table-tree (left part of XY charts and timegraphs), and the sort was done client-side, with whatever value was received, which is where this issue originated. But if we say all sorting is meant to be done server-side, then we can very well send only formatted strings.
>
> But does it make sense to always require going server side (with all the overhead it implies), just to sort what is typically a couple hundred entries?
>
> But we sometimes have more than a couple hundred entries with virtual tables, in which case it would make sense to sort server side.

For sorting client side, the client has to have sufficient information to do it without querying the server. First of all, the client needs to have all the entries to sort. Secondly, the client needs to know how to sort. What I mean is that sorting string values is different from sorting numbers. Sometimes the value to sort is a string but needs to be sorted differently. Consider sorting a list with "CPU0, CPU1 ... CPU16": it would be expected to sort on the CPU number, not on the string. The next question is, how does the client know when it can sort client side and when it has to query the server? This would be part of some kind of descriptor (e.g. a column descriptor).
With filtering we have similar challenges.
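
The CPU example can be sketched in Python as follows. This is one possible client-side approach (a "natural" sort key that compares digit runs as numbers), shown as an assumption rather than as what any TSP client actually does; `natural_key` is a hypothetical helper name.

```python
import re
from typing import List, Tuple


def natural_key(label: str) -> List[Tuple[int, int, str]]:
    """Split a label into text and number chunks so numbers compare numerically."""
    key = []
    # With a capturing group, re.split puts the digit runs at odd indices.
    for index, part in enumerate(re.split(r"([0-9]+)", label)):
        if index % 2 == 1:
            key.append((1, int(part), ""))
        elif part:
            key.append((0, 0, part))
    return key


labels = ["CPU10", "CPU2", "CPU16", "CPU0", "CPU1"]

plain_order = sorted(labels)                     # string comparison
natural_order = sorted(labels, key=natural_key)  # number-aware comparison
```

Plain string sorting places `CPU10` and `CPU16` before `CPU2`, while the number-aware key yields the order a user would expect. It does not answer the second question raised above, namely how the client learns which sort applies to a given column.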

> I think the idea of having the possibility, via query parameter, to request either raw or formatted data is interesting, as it would provide a good formatted default for the dumbest consuming UI, while still giving room for creative ways for the client to consume the data.

I agree. I think working with just formatted strings or just raw data is not sufficient. We probably need both. For timestamps, we need both for sure.

> So we could update the protocol so headers are not simple strings, but objects with a name, tooltip, dataType and unit (same as the Axis descriptor), and say that the data comes formatted by default, as per its data type.

So, when sending formatted data, I guess the dataType would be string. For raw data, then there are different types depending on the data.

> Then as a second step, we could decide on how best to give the possibility to retrieve raw data, either as additional endpoints as Matthew suggests or as additional parameter in the query.

Using the same endpoint would have the advantage that views can display the raw data instead, or the whole data structure could be processed further in the client (e.g. a Python client). Additional endpoints provide a more directed request, i.e. give the raw values of this entry.

@bhufmann
Collaborator

> Some clarifications for the hyperlink: we can look into HATEOAS.
>
> Also, for the link, let's see possibilities:
>
> * `<a href="timestamp://150000000000">2018/05/02 14:23:33.0002231234</a>`
> * `<a href="tsp://?timestamp=150000000000">2018/05/02 14:23:33.0002231234</a>`
> * `<a href="tsp://127.0.0.1:8080:/?timestamp=150000000000">2018/05/02 14:23:33.0002231234</a>`
> * `<a href="tsp://127.0.0.1:8080:/path-to-endpoint/?timestamp=150000000000">2018/05/02 14:23:33.0002231234</a>`
>
> Which level of detail is interesting? Also, how can we make it clear that the data is for other views to update, like an outputelement in eclipse's UI?

HATEOAS is intended for the server advertising other endpoints or actions to the client that the client can use for further features. This might be interesting for some other features in our application.

Using a hyperlink to transport timestamp information (raw and formatted value) is meant for passing data from the server to the client and within the client, not for providing an endpoint that the client will use to query the server for more information. If we go that way, this would have to be specified in the TSP so that clients and servers know about it and implement it accordingly.

@MatthewKhouzam

Follow up: the core issue:

Passing data from the trace server to the trace client

Problem:
Views cannot synchronize with each other or trace server
They need a unique key to do so
How do we convey data from the server to the client. Do we provide it in “raw” or “formatted”?
What implications do we have? Do we pass formatters? Do we get raw values? See GH issue 35 for TSP
How to enable Critical path (Good use case to test on)
Any view can react on any key of any view.
How can we get the “keys”? Defining “aspects” of a trace that are unique.

Ways forward:

  1. Raw data sent as well as formatted
     • Heavy on networks
     • Heavy on client-side memory
     • Trivial in client compute
  2. Client formats data
     • Efficient in terms of network
     • Need metadata for inter-view interactions
     • Potentially inconsistent across implementations
     • May fragment client ecosystem
     • In memory sort ()
  3. Server formats data

Internal nanosecond storage https://www.npmjs.com/package/timestamp-nano

Work for #2

  • Send metadata
  • Have client format
  • Update events table
  • Update Statistics
  • TSP: Send nanoseconds

Work for #3

  • Support lookup endpoints
  • Lookup from there

If we send only “raw” to the client:

Client-side formatting: a warning.
JS != Python != java

So, #2 makes more sense architecturally but is more work today, less tomorrow, let's do #2!
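
To make the "JS != Python != java" warning concrete, here is a short Python sketch. It relies on one known fact, that a JavaScript number and a Python `float` are both IEEE 754 doubles; the sample timestamp and the `format_nanos` helper are assumptions for this example. It shows a nanosecond timestamp losing precision when it passes through a double, and a formatter that stays in integer arithmetic.

```python
from datetime import datetime, timezone

NANOS_PER_SECOND = 1_000_000_000


def format_nanos(timestamp_ns: int) -> str:
    """Format nanoseconds since the Unix epoch (UTC) without using floats."""
    seconds, nanos = divmod(timestamp_ns, NANOS_PER_SECOND)
    whole = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return whole.strftime("%Y/%m/%d %H:%M:%S") + f".{nanos:09d}"


raw = 1_525_271_013_000_223_123  # sample value chosen for this sketch

# A double has 53 bits of mantissa, so this round trip changes the value.
through_double = int(float(raw))

formatted = format_nanos(raw)
```

Python integers are arbitrary precision, so `format_nanos` keeps all nine fractional digits. A client whose default number type is a double needs a dedicated representation for the same value, which is presumably why the nanosecond storage package is mentioned above.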

@tahini
Contributor Author

tahini commented Jan 8, 2021

I'll start working on this next week

@bhufmann
Collaborator

> I'll start working on this next week

@tahini, thanks for letting us know. If you'd like to have some discussions, please let me know. I looked at it before, and recently I was scratching my head to see how we can do similar things in the events table. In the end we'd like to have a way to transport data between server and client, and then be able to correlate the data in the client between different views (data providers).

tahini added a commit to tahini/trace-server-protocol that referenced this issue Jan 25, 2021
Fixes eclipse-cdt-cloud#35

With this, it is expected that column and entry descriptions may contain
an optional typing option, to explain what the data type is and, if
available, what unit to use when formatting it.

This allows the server implementations to send raw data to the client,
which would then have the option to format the data, but also the
possibility to do some other actions, like sorting, or to use the data
for richer data interactions, as raw data may have the same meaning in
different outputs, unlike pre-formatted data which may not allow such
interaction.

Signed-off-by: Geneviève Bastien <[email protected]>
@tahini tahini linked a pull request Jan 25, 2021 that will close this issue
@tahini
Contributor Author

tahini commented Jan 25, 2021

@bhufmann I just did a PR for column data types (including the event table). I could try to make a quick prototype of this new API to see if it solves whatever you were scratching your head about. What was the exact issue? Is it about the timestamp? Or something else?
