Voice-Controlled City Switching with SK, ChatGLM3B, Whisper, and Avalonia

Published: 2023-12-04 19:42:24 | Author: tokengo

First, create a project from the Avalonia MVVM template and name it GisApp.

Once the project is created, add the following NuGet dependencies:

<PackageReference Include="Mapsui.Avalonia" Version="4.1.1" />
<PackageReference Include="Microsoft.Extensions.DependencyInjection" Version="8.0.0" />
<PackageReference Include="Microsoft.Extensions.Http" Version="8.0.0" />
<PackageReference Include="Microsoft.SemanticKernel" Version="1.0.0-beta8" />
<PackageReference Include="NAudio" Version="2.2.1" />
<PackageReference Include="Whisper.net" Version="1.5.0" />
<PackageReference Include="Whisper.net.Runtime" Version="1.5.0" />
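  • Mapsui.Avalonia is a GIS map component for Avalonia
  • Microsoft.Extensions.DependencyInjection is used to build the DI container
  • Microsoft.Extensions.Http is used to register an HttpClient factory
  • Microsoft.SemanticKernel is Semantic Kernel (SK), used to build the AI plugin
  • NAudio is a toolkit used here to record audio
  • Whisper.net is a .NET wrapper for Whisper, OpenAI's open-source speech recognition model
  • Whisper.net.Runtime is the runtime package that Whisper.net depends on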

Modify App.cs

Open App.cs (App.axaml.cs in the default Avalonia template) and replace its contents with the following code:

public partial class App : Application
{
    public override void Initialize()
    {
        AvaloniaXamlLoader.Load(this);
    }

    public override void OnFrameworkInitializationCompleted()
    {
        if (ApplicationLifetime is IClassicDesktopStyleApplicationLifetime desktop)
        {
            var services = new ServiceCollection();
            services.AddSingleton<MainWindow>(sp => new MainWindow(sp.GetRequiredService<IKernel>(), sp.GetRequiredService<WhisperProcessor>())
            {
                DataContext = new MainWindowViewModel(),
            });
            services.AddHttpClient();

            var openAIHttpClientHandler = new OpenAIHttpClientHandler();
            var httpClient = new HttpClient(openAIHttpClientHandler);
            services.AddTransient<IKernel>((serviceProvider) =>
            {
                return new KernelBuilder()
                    .WithOpenAIChatCompletionService("gpt-3.5-turbo-16k", "fastgpt-zE0ub2ZxvPMwtd6XYgDX8jyn5ubiC",
                        httpClient: httpClient)
                    .Build();
            });

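            // The factory overload takes the service provider as its argument
            services.AddSingleton<WhisperProcessor>(_ =>
            {
                // Whisper model file to load (the ggml base model)
                var modelFileName = "ggml-base.bin";

                return WhisperFactory.FromPath(modelFileName).CreateBuilder()
                    .WithLanguage("auto") // "auto" lets Whisper detect the spoken language
                    .Build();
            });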

            var serviceProvider = services.BuildServiceProvider();

            desktop.MainWindow = serviceProvider.GetRequiredService<MainWindow>();
        }

        base.OnFrameworkInitializationCompleted();
    }
}

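OpenAIHttpClientHandler.cs: this file redirects SK's requests to a different endpoint. By default, SK only targets the official OpenAI address and offers no way to change it, so the handler rewrites the request URI instead.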

public class OpenAIHttpClientHandler : HttpClientHandler
{
    protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        if (request.RequestUri.LocalPath == "/v1/chat/completions")
        {
            // Placeholder host: replace it with the address of your own ChatGLM3B service
            var uriBuilder = new UriBuilder("http://your-chatglm3b-host/api/v1/chat/completions");
            request.RequestUri = uriBuilder.Uri;
        }
        
        return base.SendAsync(request, cancellationToken);
    }
}

Modify ViewModels/MainWindowViewModel.cs

public class MainWindowViewModel : ViewModelBase
{
    private string subtitle = string.Empty;
    
    public string Subtitle
    {
        get => subtitle;
        set => this.RaiseAndSetIfChanged(ref subtitle, value);
    }

    private Bitmap butBackground;
    
    public Bitmap ButBackground
    {
        get => butBackground;
        set => this.RaiseAndSetIfChanged(ref butBackground, value);
    }
}
  • ButBackground holds the microphone icon; it lives on the view model so that the icon can be switched
  • Subtitle displays the recognized text

Add the SK plugin

Create the file /plugins/MapPlugin/AcquireLatitudeLongitude/config.json. It holds the plugin's configuration:

{
  "schema": 1,
  "type": "completion",
  "description": "Get coordinates",
  "completion": {
    "max_tokens": 1000,
    "temperature": 0.3,
    "top_p": 0.0,
    "presence_penalty": 0.0,
    "frequency_penalty": 0.0
  },
  "input": {
    "parameters": [
      {
        "name": "input",
        "description": "Get coordinates",
        "defaultValue": ""
      }
    ]
  }
}

Create the file /plugins/MapPlugin/AcquireLatitudeLongitude/skprompt.txt. This is the plugin's prompt; it extracts the city the user mentioned and returns that city's latitude and longitude:

Return the latitude and longitude of {{$input}} in the following format. Do not add any other text; reply with only this format:
{
    "latitude":"",
    "longitude":""
}
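
The prompt asks for the coordinates as JSON string values, and a model may still wrap the JSON in extra words, so it helps to parse the reply tolerantly. The following is a minimal sketch of that idea, not code from the original project: ParseCoordinates and ReadNumber are hypothetical helper names, and the sample reply uses illustrative values.

```csharp
using System;
using System.Globalization;
using System.Text.Json;

// Hypothetical helper: extracts the first JSON object from a model reply and
// reads latitude/longitude whether they arrive as JSON strings or JSON numbers.
static (double Latitude, double Longitude) ParseCoordinates(string reply)
{
    int start = reply.IndexOf('{');
    int end = reply.LastIndexOf('}');
    if (start < 0 || end <= start)
        throw new FormatException("No JSON object found in the model reply.");

    using var doc = JsonDocument.Parse(reply.Substring(start, end - start + 1));
    return (ReadNumber(doc.RootElement, "latitude"), ReadNumber(doc.RootElement, "longitude"));
}

// Hypothetical helper: accepts both "22.6" (string) and 22.6 (number).
static double ReadNumber(JsonElement obj, string name)
{
    JsonElement value = obj.GetProperty(name);
    return value.ValueKind == JsonValueKind.Number
        ? value.GetDouble()
        : double.Parse(value.GetString()!, CultureInfo.InvariantCulture);
}

// Sample reply (illustrative values) with extra text around the JSON.
var (lat, lon) = ParseCoordinates("Sure: {\"latitude\":\"22.61667\",\"longitude\":\"114.06667\"}");
Console.WriteLine($"{lat}, {lon}");
```

Reading the fields by hand avoids a deserialization failure when the values come back quoted, at the cost of a little more code than a typed model.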

Modify Views/MainWindow.axaml, and add the image assets (voice.png and open-voice.png, which the code-behind loads) to the Assets folder:

<Window xmlns="https://github.com/avaloniaui"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        xmlns:vm="using:GisApp.ViewModels"
        xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
        xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
        mc:Ignorable="d" d:DesignWidth="800" d:DesignHeight="450"
        x:Class="GisApp.Views.MainWindow"
        x:DataType="vm:MainWindowViewModel"
        Icon="/Assets/avalonia-logo.ico"
        Width="800"
        Height="800"
        Title="GisApp">
    <Design.DataContext>
        <vm:MainWindowViewModel />
    </Design.DataContext>
    <Grid>
        <Grid Name="MapStackPanel">
        </Grid>
        <StackPanel HorizontalAlignment="Right" VerticalAlignment="Bottom" Background="Transparent" Margin="25">
            <TextBlock Foreground="Black" Text="{Binding Subtitle}" Width="80" TextWrapping="WrapWithOverflow" Padding="8">
            </TextBlock>
            <Button Width="60" Click="Button_OnClick" Background="Transparent" VerticalAlignment="Center" HorizontalAlignment="Center">
                <Image Name="ButBackground" Source="{Binding ButBackground}" Height="40" Width="40"></Image>
            </Button>
        </StackPanel>
    </Grid>
</Window>

Modify Views/MainWindow.axaml.cs

public partial class MainWindow : Window
{
    private bool openVoice = false;

    private WaveInEvent waveIn;

    private readonly IKernel _kernel;
    private readonly WhisperProcessor _processor;

    private readonly Channel<string> _channel = Channel.CreateUnbounded<string>();
    private MapControl mapControl;

    public MainWindow(IKernel kernel, WhisperProcessor processor)
    {
        _kernel = kernel;
        _processor = processor;
        InitializeComponent();
        mapControl = new MapControl();
        // Center on Shenzhen by default
        mapControl.Map = new Map()
        {
            CRS = "EPSG:3857",
            Home = n =>
            {
                var shenzhen = new MPoint(114.06667, 22.61667); // (longitude, latitude)
                var sphericalMercatorCoordinate = SphericalMercator
                    .FromLonLat(shenzhen.X, shenzhen.Y).ToMPoint();
                n.ZoomToLevel(15);
                n.CenterOnAndZoomTo(sphericalMercatorCoordinate, n.Resolutions[15]);
            }
        };
        mapControl.Map?.Layers.Add(Mapsui.Tiling.OpenStreetMap.CreateTileLayer());
        MapStackPanel.Children.Add(mapControl);

        DataContextChanged += (sender, args) =>
        {
            using var voice = AssetLoader.Open(new Uri("avares://GisApp/Assets/voice.png"));

            ViewModel.ButBackground = new Avalonia.Media.Imaging.Bitmap(voice);
        };

        // Start the background loop that consumes recognized text from the channel
        Task.Run(ReadMessage);
    }

    private MainWindowViewModel ViewModel => (MainWindowViewModel)DataContext;

    private void Button_OnClick(object? sender, RoutedEventArgs e)
    {
        if (openVoice)
        {
            using var voice = AssetLoader.Open(new Uri("avares://GisApp/Assets/voice.png"));

            ViewModel.ButBackground = new Avalonia.Media.Imaging.Bitmap(voice);


            waveIn.StopRecording();
        }
        else
        {
            using var voice = AssetLoader.Open(new Uri("avares://GisApp/Assets/open-voice.png"));

            ViewModel.ButBackground = new Avalonia.Media.Imaging.Bitmap(voice);

            // Create the microphone capture device
            waveIn = new WaveInEvent();
            waveIn.DeviceNumber = 0; // Select the microphone; 0 is usually the default device

            WaveFileWriter writer = new WaveFileWriter("recorded.wav", waveIn.WaveFormat);

            // Handle incoming audio data
            waveIn.DataAvailable += (sender, a) =>
            {
                Console.WriteLine($"Received audio data: {a.BytesRecorded} bytes");
                writer.Write(a.Buffer, 0, a.BytesRecorded);
                if (writer.Position > waveIn.WaveFormat.AverageBytesPerSecond * 30)
                {
                    waveIn.StopRecording();
                }
            };

            // Handle the end of recording
            waveIn.RecordingStopped += async (sender, e) =>
            {
                writer?.Dispose();
                writer = null;

                waveIn.Dispose();


                await using var fileStream = File.OpenRead("recorded.wav");
                using var wavStream = new MemoryStream();

                await using var reader = new WaveFileReader(fileStream);
                var resampler = new WdlResamplingSampleProvider(reader.ToSampleProvider(), 16000);
                WaveFileWriter.WriteWavFileToStream(wavStream, resampler.ToWaveProvider16());

                wavStream.Seek(0, SeekOrigin.Begin);

                await Dispatcher.UIThread.InvokeAsync(() => { ViewModel.Subtitle = string.Empty; });

                string text = string.Empty;
                await foreach (var result in _processor.ProcessAsync(wavStream))
                {
                    // Accumulate the segments, then show the full transcript so far
                    text += result.Text;
                    var transcript = text;
                    await Dispatcher.UIThread.InvokeAsync(() => { ViewModel.Subtitle = transcript; });
                }

                _channel.Writer.TryWrite(text);
            };

            Console.WriteLine("Recording started...");
            waveIn.StartRecording();
        }

        openVoice = !openVoice;
    }


    private async Task ReadMessage()
    {
        try
        {
            var pluginsDirectory = Path.Combine(Directory.GetCurrentDirectory(), "plugins");

            var chatPlugin = _kernel
                .ImportSemanticFunctionsFromDirectory(pluginsDirectory, "MapPlugin");

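            // Keep waiting for data on the channel
            while (await _channel.Reader.WaitToReadAsync())
            {
                // Read everything that is currently available
                while (_channel.Reader.TryRead(out var message))
                {
                    // Run the AcquireLatitudeLongitude function to resolve the spoken place to its latitude and longitude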
                    var value = await _kernel.RunAsync(new ContextVariables
                    {
                        ["input"] = message
                    }, chatPlugin["AcquireLatitudeLongitude"]);

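                    // Deserialize the reply into the model. The prompt returns the values as JSON strings,
                    // so allow numbers to be read from strings (JsonNumberHandling is in System.Text.Json.Serialization)
                    var acquireLatitudeLongitude =
                        JsonSerializer.Deserialize<AcquireLatitudeLongitude>(value.ToString(),
                            new JsonSerializerOptions { NumberHandling = JsonNumberHandling.AllowReadingFromString });

                    // Convert the requested location to the map's projection
                    var target = new MPoint(acquireLatitudeLongitude.longitude, acquireLatitudeLongitude.latitude);
                    var sphericalMercatorCoordinate = SphericalMercator
                        .FromLonLat(target.X, target.Y).ToMPoint();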

                    // Zoom to level 15 by default
                    mapControl.Map.Navigator.ZoomToLevel(15);
                    mapControl.Map.Navigator.CenterOnAndZoomTo(sphericalMercatorCoordinate, mapControl.Map.Navigator.Resolutions[15]);
                    
                }
            }
        }
        catch (Exception e)
        {
            Console.WriteLine(e);
        }
    }

    public class AcquireLatitudeLongitude
    {
        public double latitude { get; set; }
        public double longitude { get; set; }
    }
}
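
The map uses the EPSG:3857 CRS, which is why the code passes longitude and latitude through SphericalMercator.FromLonLat before navigating. For readers unfamiliar with that projection, here is a small sketch of the standard forward formula. LonLatToWebMercator is a hypothetical stand-in written for illustration; it is not Mapsui's implementation.

```csharp
using System;

// Standard spherical (Web) Mercator forward projection, EPSG:3857.
// Hypothetical illustration only; the app itself relies on Mapsui's SphericalMercator.
static (double X, double Y) LonLatToWebMercator(double lon, double lat)
{
    const double EarthRadius = 6378137.0; // WGS 84 semi-major axis, in meters
    double x = EarthRadius * lon * Math.PI / 180.0;
    double y = EarthRadius * Math.Log(Math.Tan(Math.PI / 4.0 + lat * Math.PI / 360.0));
    return (x, y);
}

// Shenzhen, using the same coordinates as the default map position.
var (mx, my) = LonLatToWebMercator(114.06667, 22.61667);
Console.WriteLine($"x = {mx:F0} m, y = {my:F0} m");
```

Note the argument order: longitude comes first because it maps to the X axis. Swapping the two values is a common source of maps that open in the wrong place.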

Walkthrough:

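  1. The user clicks the record button, which raises the Button_OnClick event. The handler opens the microphone and starts recording. When recording stops, the RecordingStopped handler takes the resulting wav file, resamples it to 16 kHz, passes it to Whisper for recognition, and writes the recognized text to _channel.
  2. ReadMessage listens on _channel the whole time. When data is written, it reads the data and uses SK to run the AcquireLatitudeLongitude function:
 var value = await _kernel.RunAsync(new ContextVariables
                    {
                        ["input"] = message
                    }, chatPlugin["AcquireLatitudeLongitude"]);
  3. It then parses value to get the latitude and longitude of the city the user named.
  4. Finally, it uses mapControl.Map.Navigator to move the map to those coordinates.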
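
The hand-off between the recording handler and ReadMessage is a plain producer/consumer pattern over System.Threading.Channels. The sketch below isolates that pattern under simplified assumptions: the city names are illustrative, the handled list stands in for the SK call and map update, and the writer is completed explicitly so the consumer can finish (the app never completes its channel).

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

var channel = Channel.CreateUnbounded<string>();
var handled = new List<string>();

// Consumer: the same nested loop as ReadMessage. It ends once the writer is completed.
async Task ConsumeAsync()
{
    while (await channel.Reader.WaitToReadAsync())
    {
        while (channel.Reader.TryRead(out var message))
        {
            handled.Add(message); // the real app would call SK and move the map here
        }
    }
}

Task consumer = ConsumeAsync();

// Producer: stands in for the RecordingStopped handler writing recognized text.
channel.Writer.TryWrite("Shenzhen");
channel.Writer.TryWrite("Beijing");
channel.Writer.Complete(); // signal that no more messages will arrive

await consumer;
Console.WriteLine(string.Join(", ", handled));
```

An unbounded channel keeps the UI event handler from ever blocking on a slow model call; the trade-off is that requests queue up without limit if the user speaks faster than the model answers.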

That completes the full flow. Real-world applications will of course be more complex than this.

Assets


Technical discussion group: 737776595