Add H264RtpDepacketizer #1082

Merged

Conversation

Sean-Der
Contributor

Inverse of H264RtpPacketizer. Takes incoming H264 packets and emits H264 NALUs.
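As an illustration of what "inverse of the packetizer" involves, here is a minimal sketch of reassembling one NAL unit from FU-A fragments as laid out in RFC 6184. It is not the code from this PR: `reassembleFuA` and `Bytes` are hypothetical names, and the sketch assumes the RTP header has already been stripped from each payload.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Hypothetical helper: reassembles one NAL unit from FU-A payloads.
// Each payload is: FU indicator (1 byte), FU header (1 byte), fragment data.
// Returns an empty vector if the sequence is malformed.
inline Bytes reassembleFuA(const std::vector<Bytes> &fragments) {
    Bytes nalu;
    for (std::size_t i = 0; i < fragments.size(); ++i) {
        const Bytes &f = fragments[i];
        if (f.size() < 2 || (f[0] & 0x1F) != 28) // 28 = FU-A
            return {};
        const bool start = (f[1] & 0x80) != 0; // S bit
        const bool end = (f[1] & 0x40) != 0;   // E bit
        // Only the first fragment may carry S, only the last may carry E.
        if (start != (i == 0) || end != (i + 1 == fragments.size()))
            return {};
        if (start) {
            // Rebuild the original NAL header: F and NRI come from the
            // FU indicator, the unit type comes from the FU header.
            nalu.push_back(static_cast<uint8_t>((f[0] & 0xE0) | (f[1] & 0x1F)));
        }
        nalu.insert(nalu.end(), f.begin() + 2, f.end());
    }
    return nalu;
}
```

The sketch ignores sequence numbers and packet loss, which a real depacketizer has to handle before it can trust the fragment order.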

@Sean-Der
Contributor Author

@paullouisageneau can/should I add more fields to Message? It would be nice to know Duration + Discontinuity

Owner

@paullouisageneau paullouisageneau left a comment

@paullouisageneau can/should I add more fields to Message? It would be nice to know Duration + Discontinuity

Yes, feel free to add them, but not directly in Message as it adds overhead to every packet in every transport. A shared_ptr<FrameInfo> would make sense, like for the reliability information. You should also add the frame timestamp there.
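A rough sketch of that suggestion, to make the overhead argument concrete. Everything here is hypothetical: `FrameInfo`'s fields, the simplified `Message` stand-in, and `makeFrame` are assumptions for illustration, not the library's definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical metadata, attached only to messages that carry a frame.
struct FrameInfo {
    uint32_t timestamp = 0;     // RTP timestamp of the frame
    uint32_t duration = 0;      // assumed field: duration in timestamp units
    bool discontinuity = false; // assumed field: set when a gap precedes the frame
};

// Simplified stand-in for the library's message type.
struct Message {
    std::vector<std::byte> data;
    // Null for ordinary packets, so the per-message cost is a single pointer.
    std::shared_ptr<FrameInfo> frameInfo;
};

// Hypothetical helper: builds a frame message with metadata attached.
inline std::shared_ptr<Message> makeFrame(std::vector<std::byte> payload, uint32_t timestamp) {
    auto msg = std::make_shared<Message>();
    msg->data = std::move(payload);
    msg->frameInfo = std::make_shared<FrameInfo>();
    msg->frameInfo->timestamp = timestamp;
    return msg;
}
```

Messages on other transports leave `frameInfo` null, which is the point of keeping the fields out of `Message` itself.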

auto first = this->rtp_buffer.begin();
auto last = this->rtp_buffer.begin() + (packets_in_timestamp - 1);

messages = buildFrame(first, last);
Owner

I think there are issues with the handling of messages. For instance:

  • If there is a single timestamp in rtp_buffer (for instance a single frame), no new frame is depacketized (because of the break just above), in that case it looks like input messages in messages won't be cleared and will leak to the next element in the media processing chain.
  • If there are two frames depacketized in a single call, messages will be replaced for each frame, so frames will be dropped and only the last one will be passed to the chain.

Owner

Maybe I'm missing parts of the logic but if the principle is to flush the current frame when the next timestamp is seen, couldn't such a simple approach do the job for H264RtpDepacketizer::incoming?

message_vector result;
for (auto message : messages) {
    [...] // check message type and size
    auto p = reinterpret_cast<const RtpHeader *>(message->data());
    if (!rtp_buffer.empty() && current_timestamp != p->timestamp()) {
        result.push_back(buildFrame(rtp_buffer.begin(), rtp_buffer.end()));
        rtp_buffer.clear();
    }
    current_timestamp = p->timestamp();
    rtp_buffer.push_back(std::move(message));
}
messages.swap(result);

current_timestamp could be a class member (or read from a packet in rtp_buffer before the loop).
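For readers who want to try the idea in isolation, here is a self-contained sketch of the flush-on-timestamp-change loop. It uses simplified stand-in types rather than the library's: `Packet`, `Frame`, and `TimestampFlusher` are hypothetical, and a "frame" is just the concatenated payloads. The timestamp is read from the buffered packets instead of a separate member, which is one of the two options mentioned above.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Simplified stand-in for an RTP packet: only the fields the sketch needs.
struct Packet {
    uint32_t timestamp;
    std::vector<uint8_t> payload;
};

using Frame = std::vector<uint8_t>;

class TimestampFlusher {
public:
    // Consumes `packets` and returns every frame completed by this call.
    // The result is empty when no timestamp change was seen, so input
    // packets are never passed on to the next handler.
    std::vector<Frame> incoming(std::vector<Packet> packets) {
        std::vector<Frame> result;
        for (auto &pkt : packets) {
            if (!buffer_.empty() && buffer_.front().timestamp != pkt.timestamp) {
                result.push_back(buildFrame()); // append, never overwrite
                buffer_.clear();
            }
            buffer_.push_back(std::move(pkt));
        }
        return result;
    }

private:
    // Concatenates the buffered payloads; a real depacketizer would parse them.
    Frame buildFrame() const {
        Frame frame;
        for (const auto &pkt : buffer_)
            frame.insert(frame.end(), pkt.payload.begin(), pkt.payload.end());
        return frame;
    }

    std::vector<Packet> buffer_;
};
```

Feeding two packets with one timestamp yields no output, and a later call that crosses two timestamp boundaries yields two frames, which covers both cases raised in the review.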

Contributor Author

@Sean-Der Sean-Der Feb 16, 2024

Both should be handled now!

If incoming RTP packets aren't enough to build a frame

messages.clear() is called so messages aren't leaked

Many frames in a single call

I merge the lists now. If an incoming RTP packet results in multiple frames being available, it works!

Owner

It looks correct now, even if I'm still a bit puzzled by the convoluted approach.

@paullouisageneau
Owner

Thanks for adding the depacketizer, this is great!

For visibility, this PR partially implements #676.

@walletiger

How do I use this depacketizer from the C API?

@Sean-Der Sean-Der force-pushed the h264-rtp-depacketizer branch 4 times, most recently from ac302f1 to b42ff73 Compare February 16, 2024 17:51
@Sean-Der
Contributor Author

@paullouisageneau Can I get another review please? Sorry for the delay, I will be on top of this now :)

Comment on lines 46 to 48
auto firstByte = std::to_integer<uint8_t>(pkt->at(headerSize));
auto secondByte = std::to_integer<uint8_t>(pkt->at(headerSize + 1));
auto naluType = firstByte & naluTypeBitmask;
Owner

Is there a reason for redefining the parsing logic and constants rather than relying on helpers structs in nalunit.hpp?

Contributor Author

@paullouisageneau Wasn't aware of it! I just looked a bit and I don't believe they are applicable.

nalunit.hpp seems to be just concerned with detecting/splitting NAL units and not the actual understanding of them?

I am all for expanding nalunit.hpp to include this logic too, if you want that in this commit.

Owner

Good point. No need to expand the logic, but maybe you could only use the header struct from nalunit.hpp to read the fields here?
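To show what reading the fields through a header struct could look like, here is a small sketch based on the bit layouts in RFC 6184. `NalHeaderView` and `FuHeaderView` are hypothetical names made up for this example; they are not the structs defined in nalunit.hpp, whose members may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical view over the one-byte H.264 NAL unit header:
// | F (1 bit) | NRI (2 bits) | Type (5 bits) |
struct NalHeaderView {
    uint8_t byte;
    constexpr bool forbiddenBit() const { return (byte >> 7) != 0; }
    constexpr uint8_t nri() const { return (byte >> 5) & 0x03; }
    constexpr uint8_t unitType() const { return byte & 0x1F; }
};

// Hypothetical view over the FU header that follows the FU indicator:
// | S (1 bit) | E (1 bit) | R (1 bit) | Type (5 bits) |
struct FuHeaderView {
    uint8_t byte;
    constexpr bool isStart() const { return (byte & 0x80) != 0; }
    constexpr bool isEnd() const { return (byte & 0x40) != 0; }
    constexpr uint8_t unitType() const { return byte & 0x1F; }
};

// Packet types a depacketizer has to special-case.
constexpr uint8_t kNalTypeStapA = 24; // single-time aggregation packet
constexpr uint8_t kNalTypeFuA = 28;   // fragmentation unit
```

Keeping the masks and shifts inside one struct means the packetizer and depacketizer cannot drift apart on the bit layout, which is the duplication concern raised here.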

@Sean-Der
Contributor Author

@paullouisageneau Can I get another review please?

I also fixed the 'Outdated' comments. I can't respond to them inline on GitHub though :/

@Sean-Der Sean-Der force-pushed the h264-rtp-depacketizer branch from b42ff73 to 838e21f Compare February 18, 2024 02:58
@Sean-Der
Contributor Author

After this commit lands I am going to add FrameInfo and an Opus depacketizer! After that I can go back and add WHEP support to OBS.

Thanks for merging+reviewing so much @paullouisageneau

Owner

@paullouisageneau paullouisageneau left a comment

@Sean-Der It looks good, thank you for your work! Would you mind replacing the firstByte and secondByte manipulation with casts to NalUnitHeader and NalUnitFragmentHeader so it is not implemented twice?

@Sean-Der Sean-Der force-pushed the h264-rtp-depacketizer branch from 838e21f to ee1b355 Compare February 20, 2024 02:53
@Sean-Der
Contributor Author

@paullouisageneau Done! Can I get another review?

I added one small method to NalUnitHeader and was able to drop the firstByte/secondByte.

With some more refactoring/exposing things we could drop even more.

Inverse of H264RtpPacketizer. Takes incoming H264 packets and emits H264
NALUs.

Co-authored-by: Paul-Louis Ageneau <[email protected]>
@Sean-Der Sean-Der force-pushed the h264-rtp-depacketizer branch from ee1b355 to 70a1fc3 Compare February 21, 2024 14:29
@Sean-Der
Contributor Author

@paullouisageneau Ok, I think I got it this time :) Mind taking a look? If this is good, I am going to start on Opus + FrameInfo.

Owner

@paullouisageneau paullouisageneau left a comment

It looks good, thank you! If you have the opportunity to add Opus and FrameInfo metadata, it would be my pleasure to review it.

@paullouisageneau paullouisageneau merged commit b7f1f03 into paullouisageneau:master Feb 22, 2024
12 checks passed
@Sean-Der Sean-Der deleted the h264-rtp-depacketizer branch February 22, 2024 12:44
edmonds added a commit to edmonds/libdatachannel that referenced this pull request Mar 18, 2024
This commit adds an H265 depacketizer which takes incoming H265 RTP packets and emits H265 access units. It is closely based on the `H264RtpDepacketizer` added by @Sean-Der in paullouisageneau#1082.

I originally started with a version of this commit that was closer to the `H264RtpDepacketizer` and which emitted individual H265 NALUs in `H265RtpDepacketizer::buildFrames()`. This resulted in calling my `Track::onFrame()` callback for each NALU, which did not work well with the decoder that I'm using which wants to see the VPS/SPS/PPS NALUs as a unit before initializing the decoder (https://intel.github.io/libvpl/v2.10/API_ref/VPL_func_vid_decode.html#mfxvideodecode-decodeheader).

So for the `H265RtpDepacketizer` I've tried to make it emit access units rather than NALUs. An "access unit" is (RFC 7798):

> A set of NAL units that are associated with each other according to a specified classification rule, that are consecutive in decoding order, *and that contain exactly one coded picture.*

"Exactly one coded picture" seems to correspond with what a caller might expect an "onFrame" callback to do. Maybe the `H264RtpDepacketizer` should be revised to similarly emit H264 access units rather than NALUs, too. At least, I could not find a way to receive individual NALUs from the depacketizer and run the VPL decoder without needing to do my own buffering/copying of the NALUs.

With this commit I can now do the following:

* Generate encoded bitstream output from the Intel VPL encoder.
* Pass the output of the encoder one frame at a time to libdatachannel's `Track::send()` on a track with an `H265RtpPacketizer` media handler.
* Transport the video track over a WebRTC connection to a libdatachannel peer.
* Depacketize it with the `H265RtpDepacketizer` media handler in this commit.
* Pass the depacketized output via my `Track::onFrame()` callback to the Intel VPL decoder in "complete frame" mode (https://intel.github.io/libvpl/v2.10/API_ref/VPL_enums.html#_CPPv428MFX_BITSTREAM_COMPLETE_FRAME). Each "onFrame" callback corresponds to a single call to the decoder API to decode a frame.
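The difference between per-NALU and per-access-unit output can be sketched as a grouping step. This is not the code from the referenced commit: `TimedNalu` and `toAccessUnits` are hypothetical, and the sketch assumes that NAL units sharing an RTP timestamp belong to one access unit and that the decoder wants Annex B start codes between them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Hypothetical depacketizer output: one NAL unit plus its RTP timestamp.
struct TimedNalu {
    uint32_t timestamp;
    Bytes data;
};

// Groups consecutive NAL units that share a timestamp into one buffer per
// access unit, prefixing each NAL unit with a 4-byte Annex B start code.
inline std::vector<Bytes> toAccessUnits(const std::vector<TimedNalu> &nalus) {
    static const uint8_t startCode[] = {0x00, 0x00, 0x00, 0x01};
    std::vector<Bytes> units;
    for (std::size_t i = 0; i < nalus.size(); ++i) {
        if (i == 0 || nalus[i].timestamp != nalus[i - 1].timestamp)
            units.emplace_back(); // timestamp changed: start a new access unit
        Bytes &au = units.back();
        au.insert(au.end(), startCode, startCode + 4);
        au.insert(au.end(), nalus[i].data.begin(), nalus[i].data.end());
    }
    return units;
}
```

With this grouping, parameter sets and the slices of the same picture arrive in one buffer, so a single callback can map to a single decoder call.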
@edmonds edmonds mentioned this pull request Mar 18, 2024
edmonds added a commit to edmonds/libdatachannel that referenced this pull request Mar 19, 2024
edmonds added a commit to edmonds/libdatachannel that referenced this pull request Mar 26, 2024
edmonds added a commit to edmonds/libdatachannel that referenced this pull request Apr 13, 2024
edmonds added a commit to edmonds/libdatachannel that referenced this pull request Apr 17, 2024
edmonds added a commit to edmonds/libdatachannel that referenced this pull request Oct 28, 2024