Getting Started

🚧

Preparation
Tutorial
- Hello World!
- Official Solution

Preparation

Build / Installation

This plugin requires native libraries (e.g. libmediapipe_c.so, mediapipe_c.dll, mediapipe_android.aar) to work, but they are not included in the repository.
If you haven't built them yet, go to https://github.com/homuler/MediaPipeUnityPlugin/wiki/Installation-Guide first.

Test

Before using the plugin in your project, it's strongly recommended that you check if it works in this project.

First, open Assets/MediaPipeUnity/Samples/Scenes/Start Scene.unity.

(Screenshot: test-start-scene)

And play the scene.
If you've built the plugin successfully, the Face Detection sample will start after a while.

(Screenshot: test-start-scene-2)

Import into your project

Once you've built the plugin, you can import it into your project. Choose your favorite method from the following options.

Build and import a unity package

1. Open this project
2. Click Tools > Export Unitypackage
   - MediaPipeUnity.[version].unitypackage file will be created at the project root.
3. Open your project
4. Import the built package

Build and install a local tarball file

1. Install the npm command

2. Build a tarball file

   cd Packages/com.github.homuler.mediapipe
   npm pack
   # com.github.homuler.mediapipe-[version].tgz will be created

   mv com.github.homuler.mediapipe-[version].tgz your/favorite/path

3. Install the package from the tarball file

Install from a submodule

⚠️ Development with Git submodules tends to be a bit more complicated.

1. Add a submodule

   mkdir Submodules
   cd Submodules
   git submodule add https://github.com/homuler/MediaPipeUnityPlugin

2. Build the plugin

   cd MediaPipeUnityPlugin
   python build.py build ...

3. Install the package from Submodules/MediaPipeUnityPlugin/Packages/com.github.homuler.mediapipe

Tutorial

⚠️ If you are not familiar with MediaPipe, you may want to read the Framework Concepts article first.

Hello World!

Let's write our first program!

🔔 The following code is based on mediapipe/examples/desktop/hello_world/hello_world.cc.

Send input

To run the Calculators provided by MediaPipe, we usually need to initialize a CalculatorGraph, so let's do that first!

🔔 Each CalculatorGraph has its own config (CalculatorGraphConfig).

var configText = @"
input_stream: ""in""
output_stream: ""out""
node {
  calculator: ""PassThroughCalculator""
  input_stream: ""in""
  output_stream: ""out1""
}
node {
  calculator: ""PassThroughCalculator""
  input_stream: ""out1""
  output_stream: ""out""
}
";

var graph = new CalculatorGraph(configText);
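To see how the stream names wire the nodes together, here is a minimal sketch that generates the same kind of config text for a chain of PassThroughCalculator nodes. `PassThroughConfig.Build` is a hypothetical helper written for this tutorial, not part of the plugin; it only builds a string and does not touch MediaPipe.

```csharp
using System;
using System.Text;

public static class PassThroughConfig
{
  // Hypothetical helper (not part of the plugin): builds a config text in which
  // `nodeCount` PassThroughCalculator nodes are chained from "in" to "out".
  public static string Build(int nodeCount)
  {
    if (nodeCount < 1)
    {
      throw new ArgumentOutOfRangeException(nameof(nodeCount));
    }

    var sb = new StringBuilder();
    sb.Append("input_stream: \"in\"\n");
    sb.Append("output_stream: \"out\"\n");

    var input = "in";
    for (var i = 1; i <= nodeCount; i++)
    {
      // The last node writes to the graph's output stream "out"; the others
      // write to intermediate streams (out1, out2, ...) read by the next node.
      var output = i == nodeCount ? "out" : "out" + i;
      sb.Append("node {\n");
      sb.Append("  calculator: \"PassThroughCalculator\"\n");
      sb.Append("  input_stream: \"" + input + "\"\n");
      sb.Append("  output_stream: \"" + output + "\"\n");
      sb.Append("}\n");
      input = output;
    }
    return sb.ToString();
  }
}
```

With `nodeCount` set to 2, the generated text should describe the same two-node graph as the config above, so it could presumably be passed to the CalculatorGraph constructor in the same way.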

To run a CalculatorGraph, call the StartRun method.

graph.StartRun().AssertOk();

Note that the StartRun method returns a Status object, which represents the result.
Status#AssertOk throws an exception if and only if the result is not OK.

After starting, of course we want to give inputs to the CalculatorGraph, right?

Let's say we want to give a sequence of 10 strings ("Hello World!") as input.

for (var i = 0; i < 10; i++)
{
  // Send input to running graph
}

In MediaPipe, input is passed through a class called Packet.

var input = new StringPacket("Hello World!");

To pass an input Packet to the CalculatorGraph, we can use CalculatorGraph#AddPacketToInputStream.
Note that this CalculatorGraph has only one input stream, named "in".

🔔 This depends on the CalculatorGraphConfig. A CalculatorGraph can have multiple input streams.

for (var i = 0; i < 10; i++)
{
  var input = new StringPacket("Hello World!");
  graph.AddPacketToInputStream("in", input).AssertOk();
}

CalculatorGraph#AddPacketToInputStream also returns a Status object, so let's call AssertOk here as well.

After everything is done, we should

- close input streams
- dispose of the CalculatorGraph

so let's do that.
Again, note that each method returns a Status object.

graph.CloseInputStream("in").AssertOk();
graph.WaitUntilDone().AssertOk();
graph.Dispose();

For now, let's just run the code we've written so far.

Save the following code as HelloWorld.cs, attach it to an empty GameObject and play the scene.

using UnityEngine;

namespace Mediapipe.Unity.Tutorial
{
  public class HelloWorld : MonoBehaviour
  {
    private void Start()
    {
      var configText = @"
input_stream: ""in""
output_stream: ""out""
node {
  calculator: ""PassThroughCalculator""
  input_stream: ""in""
  output_stream: ""out1""
}
node {
  calculator: ""PassThroughCalculator""
  input_stream: ""out1""
  output_stream: ""out""
}
";
      var graph = new CalculatorGraph(configText);
      graph.StartRun().AssertOk();

      for (var i = 0; i < 10; i++)
      {
        var input = new StringPacket("Hello World!");
        graph.AddPacketToInputStream("in", input).AssertOk();
      }

      graph.CloseInputStream("in").AssertOk();
      graph.WaitUntilDone().AssertOk();
      graph.Dispose();

      Debug.Log("Done");
    }
  }
}

(Screenshot: hello-world-timestamp-error)

Oops, I see an error.

MediaPipeException: INVALID_ARGUMENT: Graph has errors: 
; In stream "in", timestamp not specified or set to illegal value: Timestamp::Unset()
  at Mediapipe.Status.AssertOk () [0x00014] in /home/homuler/Development/unity/MediaPipeUnityPlugin/Packages/com.github.homuler.mediapipe/Runtime/Scripts/Framework/Port/Status.cs:50 
  at Mediapipe.Unity.Tutorial.HelloWorld.Start () [0x00025] in /home/homuler/Development/unity/MediaPipeUnityPlugin/Assets/MediaPipeUnity/Tutorial/Hello World/HelloWorld.cs:35

Each input packet should have a timestamp, but it does not appear to be set.
Let's fix the code that initializes a Packet as follows.

// var input = new StringPacket("Hello World!");
var input = new StringPacket("Hello World!", new Timestamp(i));

(Screenshot: hello-world-no-output)

This time it seems to work.
But wait, we are not receiving the CalculatorGraph output!

Get output

To get output, we need to do more work before running the CalculatorGraph.
Note that this CalculatorGraph has only one output stream, named "out".

🔔 This depends on the CalculatorGraphConfig. A CalculatorGraph can have multiple output streams.

var graph = new CalculatorGraph(configText);

// Initialize an `OutputStreamPoller`.
// NOTE: The type parameter is `string` since the output type is `string`.
var poller = graph.AddOutputStreamPoller<string>("out").Value();

graph.StartRun().AssertOk();

CalculatorGraph#AddOutputStreamPoller<T> returns a StatusOr<T> object.
StatusOr<T> is similar to Status, but it can contain a value if the Status is OK.

🔔 In production, you should check if it's OK before calling StatusOr<T>#Value.
var statusOrPoller = graph.AddOutputStreamPoller<string>("out");
if (statusOrPoller.Ok())
{
  var poller = statusOrPoller.Value();
}  

Then, we can get the output using OutputStreamPoller<string>#Next.
Like inputs, outputs must be received through packets.

graph.CloseInputStream("in").AssertOk();

// Initialize an empty packet
var output = new StringPacket();

while (poller.Next(output))
{
  Debug.Log(output.Get());
}

graph.WaitUntilDone().AssertOk();

Now, our code would look like this.

using UnityEngine;

namespace Mediapipe.Unity.Tutorial
{
  public class HelloWorld : MonoBehaviour
  {
    private void Start()
    {
      var configText = @"
input_stream: ""in""
output_stream: ""out""
node {
  calculator: ""PassThroughCalculator""
  input_stream: ""in""
  output_stream: ""out1""
}
node {
  calculator: ""PassThroughCalculator""
  input_stream: ""out1""
  output_stream: ""out""
}
";
      var graph = new CalculatorGraph(configText);
      var poller = graph.AddOutputStreamPoller<string>("out").Value();
      graph.StartRun().AssertOk();

      for (var i = 0; i < 10; i++)
      {
        var input = new StringPacket("Hello World!", new Timestamp(i));
        graph.AddPacketToInputStream("in", input).AssertOk();
      }

      graph.CloseInputStream("in").AssertOk();

      var output = new StringPacket();
      while (poller.Next(output))
      {
        Debug.Log(output.Get());
      }

      graph.WaitUntilDone().AssertOk();
      graph.Dispose();

      Debug.Log("Done");
    }
  }
}

(Screenshot: hello-world-output)

More Tips

Validate the config format

What happens if the config format is invalid?

var graph = new CalculatorGraph("invalid format");

(Screenshot: hello-world-invalid-config)

Hmm, the constructor fails, which is probably the correct behavior.
Let's check Editor.log.

[libprotobuf ERROR external/com_google_protobuf/src/google/protobuf/text_format.cc:335] Error parsing text-format mediapipe.CalculatorGraphConfig: 1:9: Message type "mediapipe.CalculatorGraphConfig" has no field named "invalid".
MediaPipeException: Failed to parse config text. See error logs for more details
  at Mediapipe.CalculatorGraphConfigExtension.ParseFromTextFormat (Google.Protobuf.MessageParser`1[T] _, System.String configText) [0x0001e] in /home/homuler/Development/unity/MediaPipeUnityPlugin/Packages/com.github.homuler.mediapipe/Runtime/Scripts/Framework/CalculatorGraphConfigExtension.cs:21 
  at Mediapipe.CalculatorGraph..ctor (System.String textFormatConfig) [0x00000] in /home/homuler/Development/unity/MediaPipeUnityPlugin/Packages/com.github.homuler.mediapipe/Runtime/Scripts/Framework/CalculatorGraph.cs:33 
  at Mediapipe.Unity.Tutorial.HelloWorld.Start () [0x00000] in /home/homuler/Development/unity/MediaPipeUnityPlugin/Assets/MediaPipeUnity/Tutorial/Hello World/HelloWorld.cs:31

Not too bad, but it's inconvenient to check Editor.log every time.
Let's fix it so that the logs are visible in the Console Window.

Protobuf.SetLogHandler(Protobuf.DefaultLogHandler);
var graph = new CalculatorGraph("invalid format");

(Screenshot: hello-world-protobuf-logger)

Great!
But this code has a subtle yet serious bug that can cause a SIGSEGV.
To avoid it, don't forget to restore the default LogHandler when the application exits.

void OnApplicationQuit()
{
  Protobuf.ResetLogHandler();
}

Official Solution

In this section, let's try running the Face Mesh Solution.

Setup WebCamTexture

First, let's display the Web Camera image on the screen.

using System.Collections;
using UnityEngine;
using UnityEngine.UI;

namespace Mediapipe.Unity.Tutorial
{
  public class FaceMesh : MonoBehaviour
  {
    [SerializeField] private TextAsset _configAsset;
    [SerializeField] private RawImage _screen;
    [SerializeField] private int _width;
    [SerializeField] private int _height;
    [SerializeField] private int _fps;

    private WebCamTexture _webCamTexture;

    private IEnumerator Start()
    {
      if (WebCamTexture.devices.Length == 0)
      {
        throw new System.Exception("Web Camera devices are not found");
      }
      var webCamDevice = WebCamTexture.devices[0];
      _webCamTexture = new WebCamTexture(webCamDevice.name, _width, _height, _fps);
      _webCamTexture.Play();

      yield return new WaitUntil(() => _webCamTexture.width > 16);

      _screen.rectTransform.sizeDelta = new Vector2(_width, _height);
      _screen.texture = _webCamTexture;

      while (true)
      {
        yield return new WaitForEndOfFrame();
      }
    }

    private void OnDestroy()
    {
      if (_webCamTexture != null)
      {
        _webCamTexture.Stop();
      }
    }
  }
}

If everything is fine, your screen will look like this.

(Screenshot: web-cam-setup)

Send ImageFrame

Now let's try face_mesh_desktop_live.pbtxt, the official Face Mesh sample!

⚠️ To run the graph, you must build native libraries with GPU disabled.

First, initialize a CalculatorGraph as in the Hello World example.

var graph = new CalculatorGraph(_configAsset.text);
graph.StartRun().AssertOk();

In MediaPipe, image data on the CPU is stored in a class called ImageFrame.
Let's initialize an ImageFrame instance from the WebCamTexture image.

💡 On the other hand, image data on the GPU is stored in a class called GpuBuffer.

We can initialize an ImageFrame instance using NativeArray<byte>.
Here, although it is not the best approach in terms of performance, we will copy the WebCamTexture data to a Texture2D to obtain a NativeArray<byte>.

Texture2D inputTexture = new Texture2D(_width, _height, TextureFormat.RGBA32, false);
Color32[] pixelData = new Color32[_width * _height];

while (true)
{
  inputTexture.SetPixels32(_webCamTexture.GetPixels32(pixelData));

  yield return new WaitForEndOfFrame();
}

Now we can initialize an ImageFrame instance using inputTexture.

⚠️ In theory, you can build ImageFrame instances using various formats, but not all Calculators necessarily support all formats. As for official solutions, they often work only with RGBA32 format.

var imageFrame = new ImageFrame(ImageFormat.Types.Format.Srgba, _width, _height, _width * 4, inputTexture.GetRawTextureData<byte>());

The 4th argument, widthStep, may require some explanation.
It's the byte offset between a pixel value and the same pixel and channel in the next row.
In most cases, this is equal to the product of the width and the number of channels.
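As a rough illustration of the widthStep arithmetic, the sketch below computes the row stride and the byte offset of a pixel for a tightly packed image (no padding at the end of each row). `ImageLayout` and its methods are hypothetical names used only for this example; they are not part of the plugin.

```csharp
using System;

public static class ImageLayout
{
  // Bytes per row for a tightly packed image (assumes no row padding).
  public static int WidthStep(int width, int channels, int bytesPerChannel = 1)
  {
    return checked(width * channels * bytesPerChannel);
  }

  // Byte offset of channel `c` of the pixel at column `x`, row `y`.
  // Moving one row down always adds exactly `widthStep` bytes.
  public static int ByteOffset(int x, int y, int c, int widthStep, int channels, int bytesPerChannel = 1)
  {
    return checked(y * widthStep + (x * channels + c) * bytesPerChannel);
  }
}
```

For a 640x480 RGBA32 image (4 channels, 1 byte each), `ImageLayout.WidthStep(640, 4)` is 2560, which matches the `_width * 4` argument used above.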

As usual, initialize a Packet and send it to the CalculatorGraph.
Note that the input stream name is "input_video" and the input type is ImageFrame this time.

graph.AddPacketToInputStream("input_video", new ImageFramePacket(imageFrame)).AssertOk();

We should stop the CalculatorGraph on the OnDestroy event.
With a little refactoring, the code now looks like this.

using System.Collections;
using UnityEngine;
using UnityEngine.UI;

namespace Mediapipe.Unity.Tutorial
{
  public class FaceMesh : MonoBehaviour
  {
    [SerializeField] private TextAsset _configAsset;
    [SerializeField] private RawImage _screen;
    [SerializeField] private int _width;
    [SerializeField] private int _height;
    [SerializeField] private int _fps;

    private CalculatorGraph _graph;

    private WebCamTexture _webCamTexture;
    private Texture2D _inputTexture;
    private Color32[] _pixelData;

    private IEnumerator Start()
    {
      if (WebCamTexture.devices.Length == 0)
      {
        throw new System.Exception("Web Camera devices are not found");
      }
      var webCamDevice = WebCamTexture.devices[0];
      _webCamTexture = new WebCamTexture(webCamDevice.name, _width, _height, _fps);
      _webCamTexture.Play();

      yield return new WaitUntil(() => _webCamTexture.width > 16);

      _screen.rectTransform.sizeDelta = new Vector2(_width, _height);
      _screen.texture = _webCamTexture;

      _inputTexture = new Texture2D(_width, _height, TextureFormat.RGBA32, false);
      _pixelData = new Color32[_width * _height];

      _graph = new CalculatorGraph(_configAsset.text);
      _graph.StartRun().AssertOk();

      while (true)
      {
        _inputTexture.SetPixels32(_webCamTexture.GetPixels32(_pixelData));
        var imageFrame = new ImageFrame(ImageFormat.Types.Format.Srgba, _width, _height, _width * 4, _inputTexture.GetRawTextureData<byte>());
        _graph.AddPacketToInputStream("input_video", new ImageFramePacket(imageFrame)).AssertOk();

        yield return new WaitForEndOfFrame();
      }
    }

    private void OnDestroy()
    {
      if (_webCamTexture != null)
      {
        _webCamTexture.Stop();
      }

      if (_graph != null)
      {
        try
        {
          _graph.CloseInputStream("input_video").AssertOk();
          _graph.WaitUntilDone().AssertOk();
        }
        finally
        {
          _graph.Dispose();
        }
      }
    }
  }
}

Let's play the scene!

(Screenshot: face-mesh-load-fail)

Well, it's not so easy, is it?

MediaPipeException: INVALID_ARGUMENT: Graph has errors: 
Calculator::Open() for node "facelandmarkfrontcpu__facelandmarkcpu__facelandmarksmodelloader__LocalFileContentsCalculator" failed: ; Can't find file: mediapipe/modules/face_landmark/face_landmark_with_attention.tflite

It looks like LocalFileContentsCalculator failed to load face_landmark_with_attention.tflite.
In the next section, we will resolve this error.

⚠️ If you get error messages like the following, go to [...].

F20220418 11:58:05.626176 230087 calculator_graph.cc:126] Non-OK-status: Initialize(config) status: NOT_FOUND: ValidatedGraphConfig Initialization failed.
No registered object with name: FaceLandmarkFrontCpu; Unable to find Calculator "FaceLandmarkFrontCpu"
No registered object with name: FaceRendererCpu; Unable to find Calculator "FaceRendererCpu"

Load model files

To load model files in Unity, we need to resolve their paths because the paths are hardcoded.
Not only that, we also need to save the files to specific paths because some calculators are written to read dependent resources from the file system.

💡 The path to save is not fixed since we can translate each model path into an arbitrary path.

But don't worry. In most cases, all you need to do is initialize a ResourceManager class and call the PrepareAssetAsync method in advance.

💡 The PrepareAssetAsync method saves the specified file under Application.persistentDataPath.

For testing purposes, the LocalResourceManager class is sufficient.

var resourceManager = new LocalResourceManager();
yield return resourceManager.PrepareAssetAsync("dependent_asset_name");

In development / production, you can choose either StreamingAssetsResourceManager or AssetBundleResourceManager.
For example, StreamingAssetsResourceManager will load model files from Application.streamingAssetsPath.

// NOTE: Dependent assets must be placed under `Assets/StreamingAssets`.
var resourceManager = new StreamingAssetsResourceManager();
yield return resourceManager.PrepareAssetAsync("dependent_asset_name");

⚠️ The ResourceManager class can be initialized only once. In other words, you cannot use both StreamingAssetsResourceManager and AssetBundleResourceManager in one application.

Now, let's get back to the code.

After trial and error, we find that we need to prepare two files: face_detection_short_range.tflite and face_landmark_with_attention.tflite.
Unity does not support the .tflite extension, so this plugin adopts the .bytes extension instead.
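The renaming rule can be sketched as a small helper that turns a model path reported in an error message into the .bytes asset name used in this tutorial. `ModelAssetName.FromModelPath` is a hypothetical helper, not a plugin API, and it assumes the asset name is simply the file name with its extension replaced.

```csharp
using System.IO;

public static class ModelAssetName
{
  // Hypothetical helper (not part of the plugin): drops the directory part of
  // the model path and replaces the extension with ".bytes".
  public static string FromModelPath(string modelPath)
  {
    return Path.ChangeExtension(Path.GetFileName(modelPath), ".bytes");
  }
}
```

For example, the path from the earlier error message, mediapipe/modules/face_landmark/face_landmark_with_attention.tflite, maps to face_landmark_with_attention.bytes.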

Now the entire code will look like this.

using System.Collections;
using UnityEngine;
using UnityEngine.UI;

namespace Mediapipe.Unity.Tutorial
{
  public class FaceMesh : MonoBehaviour
  {
    [SerializeField] private TextAsset _configAsset;
    [SerializeField] private RawImage _screen;
    [SerializeField] private int _width;
    [SerializeField] private int _height;
    [SerializeField] private int _fps;

    private CalculatorGraph _graph;
    private ResourceManager _resourceManager;

    private WebCamTexture _webCamTexture;
    private Texture2D _inputTexture;
    private Color32[] _pixelData;

    private IEnumerator Start()
    {
      if (WebCamTexture.devices.Length == 0)
      {
        throw new System.Exception("Web Camera devices are not found");
      }
      var webCamDevice = WebCamTexture.devices[0];
      _webCamTexture = new WebCamTexture(webCamDevice.name, _width, _height, _fps);
      _webCamTexture.Play();

      yield return new WaitUntil(() => _webCamTexture.width > 16);

      _screen.rectTransform.sizeDelta = new Vector2(_width, _height);
      _screen.texture = _webCamTexture;

      _inputTexture = new Texture2D(_width, _height, TextureFormat.RGBA32, false);
      _pixelData = new Color32[_width * _height];

      _resourceManager = new LocalResourceManager();
      yield return _resourceManager.PrepareAssetAsync("face_detection_short_range.bytes");
      yield return _resourceManager.PrepareAssetAsync("face_landmark_with_attention.bytes");

      _graph = new CalculatorGraph(_configAsset.text);
      _graph.StartRun().AssertOk();

      while (true)
      {
        _inputTexture.SetPixels32(_webCamTexture.GetPixels32(_pixelData));
        var imageFrame = new ImageFrame(ImageFormat.Types.Format.Srgba, _width, _height, _width * 4, _inputTexture.GetRawTextureData<byte>());
        _graph.AddPacketToInputStream("input_video", new ImageFramePacket(imageFrame)).AssertOk();

        yield return new WaitForEndOfFrame();
      }
    }

    private void OnDestroy()
    {
      if (_webCamTexture != null)
      {
        _webCamTexture.Stop();
      }

      if (_graph != null)
      {
        try
        {
          _graph.CloseInputStream("input_video").AssertOk();
          _graph.WaitUntilDone().AssertOk();
        }
        finally
        {
          _graph.Dispose();
        }
      }
    }
  }
}

What will be the result this time...?

(Screenshot: face-mesh-resource-loaded)

Oops, once again I forgot to set the timestamp.
But what value should I set for the timestamp this time?

Set the correct timestamp

In the Hello World example, the loop variable i was used as the Timestamp value.
In practice, however, MediaPipe assumes that the value of a Timestamp is in microseconds (cf. mediapipe/framework/timestamp.h).

🔔 Some calculators care about the absolute value of the Timestamp, so they may behave unexpectedly if the value is not in microseconds.

Let's initialize a Timestamp with the elapsed time since the start, in microseconds.

using Stopwatch = System.Diagnostics.Stopwatch;

var stopwatch = new Stopwatch();
stopwatch.Start();

var currentTimestamp = stopwatch.ElapsedTicks / (System.TimeSpan.TicksPerMillisecond / 1000);
var timestamp = new Timestamp(currentTimestamp);
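The expression above divides by TimeSpan.TicksPerMillisecond / 1000, which is 10, so it yields microseconds only when one Stopwatch tick lasts 100 ns. Stopwatch.ElapsedTicks counts in units of Stopwatch.Frequency (ticks per second), which .NET documents as platform-dependent. If you don't want to rely on that assumption, a conversion based on the frequency could look like the sketch below; `TimestampUtil` is a hypothetical helper name, not part of the plugin.

```csharp
using System.Diagnostics;

public static class TimestampUtil
{
  // Converts raw Stopwatch ticks to whole microseconds (rounded down),
  // given the tick frequency in ticks per second.
  public static long TicksToMicroseconds(long stopwatchTicks, long frequency)
  {
    // Convert whole seconds and the remainder separately so that the
    // multiplication by 1,000,000 does not overflow for long runs.
    var seconds = stopwatchTicks / frequency;
    var remainder = stopwatchTicks % frequency;
    return seconds * 1000000L + remainder * 1000000L / frequency;
  }

  // Microseconds elapsed on the given Stopwatch.
  public static long ElapsedMicroseconds(Stopwatch stopwatch)
  {
    return TicksToMicroseconds(stopwatch.ElapsedTicks, Stopwatch.Frequency);
  }
}
```

The value returned by `TimestampUtil.ElapsedMicroseconds(stopwatch)` could then be passed to the Timestamp constructor in place of `currentTimestamp`.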

And the entire code:

using System.Collections;
using UnityEngine;
using UnityEngine.UI;

using Stopwatch = System.Diagnostics.Stopwatch;

namespace Mediapipe.Unity.Tutorial
{
  public class FaceMesh : MonoBehaviour
  {
    [SerializeField] private TextAsset _configAsset;
    [SerializeField] private RawImage _screen;
    [SerializeField] private int _width;
    [SerializeField] private int _height;
    [SerializeField] private int _fps;

    private CalculatorGraph _graph;
    private ResourceManager _resourceManager;

    private WebCamTexture _webCamTexture;
    private Texture2D _inputTexture;
    private Color32[] _pixelData;

    private IEnumerator Start()
    {
      if (WebCamTexture.devices.Length == 0)
      {
        throw new System.Exception("Web Camera devices are not found");
      }
      var webCamDevice = WebCamTexture.devices[0];
      _webCamTexture = new WebCamTexture(webCamDevice.name, _width, _height, _fps);
      _webCamTexture.Play();

      yield return new WaitUntil(() => _webCamTexture.width > 16);

      _screen.rectTransform.sizeDelta = new Vector2(_width, _height);
      _screen.texture = _webCamTexture;

      _inputTexture = new Texture2D(_width, _height, TextureFormat.RGBA32, false);
      _pixelData = new Color32[_width * _height];

      _resourceManager = new LocalResourceManager();
      yield return _resourceManager.PrepareAssetAsync("face_detection_short_range.bytes");
      yield return _resourceManager.PrepareAssetAsync("face_landmark_with_attention.bytes");

      var stopwatch = new Stopwatch();

      _graph = new CalculatorGraph(_configAsset.text);
      _graph.StartRun().AssertOk();
      stopwatch.Start();

      while (true)
      {
        _inputTexture.SetPixels32(_webCamTexture.GetPixels32(_pixelData));
        var imageFrame = new ImageFrame(ImageFormat.Types.Format.Srgba, _width, _height, _width * 4, _inputTexture.GetRawTextureData<byte>());
        var currentTimestamp = stopwatch.ElapsedTicks / (System.TimeSpan.TicksPerMillisecond / 1000);
        _graph.AddPacketToInputStream("input_video", new ImageFramePacket(imageFrame, new Timestamp(currentTimestamp))).AssertOk();

        yield return new WaitForEndOfFrame();
      }
    }

    private void OnDestroy()
    {
      if (_webCamTexture != null)
      {
        _webCamTexture.Stop();
      }

      if (_graph != null)
      {
        try
        {
          _graph.CloseInputStream("input_video").AssertOk();
          _graph.WaitUntilDone().AssertOk();
        }
        finally
        {
          _graph.Dispose();
        }
      }
    }
  }
}

(Screenshot: face-mesh-timestamp)

Now, it seems to be working.
But of course, we want to receive output next.

Get ImageFrame

In the Hello World example, we initialized OutputStreamPoller using CalculatorGraph#AddOutputStreamPoller.
This time, to handle output more easily, let's use the OutputStream API provided by the plugin instead!

var graph = new CalculatorGraph(_configAsset.text);
var outputVideoStream = new OutputStream<ImageFramePacket, ImageFrame>(graph, "output_video");

This may sound a bit tedious, but both the Packet type and the value type of the output must be specified.
And before running the CalculatorGraph, call StartPolling.

// NOTE: StartPolling returns Status
outputVideoStream.StartPolling().AssertOk();
graph.StartRun().AssertOk();

To get the next output, call TryGetNext.
It returns true if the next output is retrieved successfully.

if (outputVideoStream.TryGetNext(out var outputVideo))
{
  // ...
}

This time, let's display the output image directly on the screen.
We can read the pixel data using ImageFrame#TryReadPixelData.

// NOTE: TryReadPixelData is implemented in `Mediapipe.Unity.ImageFrameExtension`.
// using Mediapipe.Unity;

var outputTexture = new Texture2D(_width, _height, TextureFormat.RGBA32, false);
var outputPixelData = new Color32[_width * _height];
_screen.texture = outputTexture;

if (outputVideoStream.TryGetNext(out var outputVideo))
{
  if (outputVideo.TryReadPixelData(outputPixelData))
  {
    outputTexture.SetPixels32(outputPixelData);
    outputTexture.Apply();
  }
}

Now our code should look something like this.

using System.Collections;
using UnityEngine;
using UnityEngine.UI;

using Stopwatch = System.Diagnostics.Stopwatch;

namespace Mediapipe.Unity.Tutorial
{
  public class FaceMesh : MonoBehaviour
  {
    [SerializeField] private TextAsset _configAsset;
    [SerializeField] private RawImage _screen;
    [SerializeField] private int _width;
    [SerializeField] private int _height;
    [SerializeField] private int _fps;

    private CalculatorGraph _graph;
    private ResourceManager _resourceManager;

    private WebCamTexture _webCamTexture;
    private Texture2D _inputTexture;
    private Color32[] _inputPixelData;
    private Texture2D _outputTexture;
    private Color32[] _outputPixelData;

    private IEnumerator Start()
    {
      if (WebCamTexture.devices.Length == 0)
      {
        throw new System.Exception("Web Camera devices are not found");
      }
      var webCamDevice = WebCamTexture.devices[0];
      _webCamTexture = new WebCamTexture(webCamDevice.name, _width, _height, _fps);
      _webCamTexture.Play();

      yield return new WaitUntil(() => _webCamTexture.width > 16);

      _screen.rectTransform.sizeDelta = new Vector2(_width, _height);

      _inputTexture = new Texture2D(_width, _height, TextureFormat.RGBA32, false);
      _inputPixelData = new Color32[_width * _height];
      _outputTexture = new Texture2D(_width, _height, TextureFormat.RGBA32, false);
      _outputPixelData = new Color32[_width * _height];

      _screen.texture = _outputTexture;

      _resourceManager = new LocalResourceManager();
      yield return _resourceManager.PrepareAssetAsync("face_detection_short_range.bytes");
      yield return _resourceManager.PrepareAssetAsync("face_landmark_with_attention.bytes");

      var stopwatch = new Stopwatch();

      _graph = new CalculatorGraph(_configAsset.text);
      var outputVideoStream = new OutputStream<ImageFramePacket, ImageFrame>(_graph, "output_video");
      outputVideoStream.StartPolling().AssertOk();
      _graph.StartRun().AssertOk();
      stopwatch.Start();

      while (true)
      {
        _inputTexture.SetPixels32(_webCamTexture.GetPixels32(_inputPixelData));
        var imageFrame = new ImageFrame(ImageFormat.Types.Format.Srgba, _width, _height, _width * 4, _inputTexture.GetRawTextureData<byte>());
        var currentTimestamp = stopwatch.ElapsedTicks / (System.TimeSpan.TicksPerMillisecond / 1000);
        _graph.AddPacketToInputStream("input_video", new ImageFramePacket(imageFrame, new Timestamp(currentTimestamp))).AssertOk();

        yield return new WaitForEndOfFrame();

        if (outputVideoStream.TryGetNext(out var outputVideo))
        {
          if (outputVideo.TryReadPixelData(_outputPixelData))
          {
            _outputTexture.SetPixels32(_outputPixelData);
            _outputTexture.Apply();
          }
        }
      }
    }

    private void OnDestroy()
    {
      if (_webCamTexture != null)
      {
        _webCamTexture.Stop();
      }

      if (_graph != null)
      {
        try
        {
          _graph.CloseInputStream("input_video").AssertOk();
          _graph.WaitUntilDone().AssertOk();
        }
        finally
        {
          _graph.Dispose();
        }
      }
    }
  }
}

Let's try running!

(Screenshot: face-mesh-upside-down)

Hmm, it seems to be working, but the top and bottom appear to be reversed.

Coordinate System

In Unity, the pixel data is stored from bottom-left to top-right, whereas MediaPipe assumes the pixel data is stored from top-left to bottom-right.
Therefore, if you send the pixel data to MediaPipe as is, MediaPipe will receive an upside-down image.

🔔 ImageFrame#TryReadPixelData automatically reads pixels upside down, so the output image is received correctly.

You can flip the input image vertically by yourself, but here we will use ImageTransformationCalculator.

node: {
  calculator: "ImageTransformationCalculator"
  input_stream: "IMAGE:throttled_input_video"
  output_stream: "IMAGE:transformed_input_video"
  node_options: {
    [type.googleapis.com/mediapipe.ImageTransformationCalculatorOptions] {
      flip_vertically: true
    }
  }
}

Don't forget to replace throttled_input_video with transformed_input_video.

 # Subgraph that detects faces and corresponding landmarks.
 node {
   calculator: "FaceLandmarkFrontCpu"
-  input_stream: "IMAGE:throttled_input_video"
+  input_stream: "IMAGE:transformed_input_video"
   input_side_packet: "NUM_FACES:num_faces"
   input_side_packet: "WITH_ATTENTION:with_attention"
   output_stream: "LANDMARKS:multi_face_landmarks"

 # Subgraph that renders face-landmark annotation onto the input image.
 node {
   calculator: "FaceRendererCpu"
-  input_stream: "IMAGE:throttled_input_video"
+  input_stream: "IMAGE:transformed_input_video"
   input_stream: "LANDMARKS:multi_face_landmarks"
   input_stream: "NORM_RECTS:face_rects_from_landmarks"
   input_stream: "DETECTIONS:face_detections"
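
If you would rather flip the input image yourself instead of using ImageTransformationCalculator, the idea is to reverse the order of the pixel rows before sending them. The sketch below does this for any row-major pixel array; `PixelBuffer.FlipVertically` is a hypothetical helper, not part of the plugin, and it allocates a new array on every call, so it is not tuned for performance.

```csharp
using System;

public static class PixelBuffer
{
  // Returns a copy of `pixels` (row-major, `width` pixels per row)
  // in which the order of the rows is reversed.
  public static T[] FlipVertically<T>(T[] pixels, int width, int height)
  {
    if (pixels == null)
    {
      throw new ArgumentNullException(nameof(pixels));
    }
    if (width < 0 || height < 0 || pixels.Length != width * height)
    {
      throw new ArgumentException("pixels.Length must be equal to width * height");
    }

    var flipped = new T[pixels.Length];
    for (var y = 0; y < height; y++)
    {
      // Row y of the source becomes row (height - 1 - y) of the result.
      Array.Copy(pixels, y * width, flipped, (height - 1 - y) * width, width);
    }
    return flipped;
  }
}
```

Because the method is generic, it should also work with the Color32[] obtained from GetPixels32, applied before the data is copied into the input texture.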
