Voice Communication Part 3: Converting WAV File Formats with C++

#audio#c++

240612 15:26

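[Voice Communication] series

Hello, this is Shin Dongmin! I had originally planned to look at how to play a WAV file through the speakers with WASAPI, but there is something to cover first: audio format conversion. If the audio format of the WAV file does not match a format the speaker supports, one of the two has to be converted to match the other. Because a speaker supports only a limited set of audio formats, converting the WAV side is the more general approach. In this post we will write a program that converts the audio format of a WAV file and saves the result.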

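Program architecture

Before writing the program, let's think through the flow.

1. Read the WAV file: First, load the WAV file whose format will be changed. This step includes error detection, such as checking that the header is valid and that the file length is consistent. If there are no errors, load the audio format from the fmt chunk, and load the PCM data into an array as well.
2. Convert the audio to the desired format: Given a target sample rate and bit depth, convert the loaded PCM data. This post does not cover changing the number of channels.
3. Save the WAV file again: Write the converted audio back out as a WAV file.

Steps 1 and 3 need only basic programming knowledge. But how should we approach step 2?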

About resampling

Changing the sample rate of an audio signal is called resampling. Resampling divides into upsampling and downsampling, which are also referred to as interpolation and decimation, respectively.

Resampling is in fact a complicated process that takes considerable know-how to do well. In this post, though, we will resample the audio signal using only simple interpolation.
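To get a feel for what changing the sample rate does to the amount of data, here is a minimal sketch that estimates how many samples one channel holds after resampling. ResampledLength is a hypothetical helper, not part of this post's program, and the interpolation functions later in the post may produce a count that differs by a sample or so at the end of the signal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of the post's code): estimates how many
// samples a channel holds after resampling from inputRate to outputRate.
// The duration stays the same, so the count scales by outputRate / inputRate.
std::size_t ResampledLength(std::size_t inputSamples, int inputRate, int outputRate) {
    assert(inputRate > 0 && outputRate > 0);
    const std::uint64_t in = static_cast<std::uint64_t>(inputRate);
    const std::uint64_t out = static_cast<std::uint64_t>(outputRate);
    // 64-bit integer arithmetic avoids floating-point rounding;
    // the result is rounded up so that no part of the signal is cut off.
    const std::uint64_t scaled = static_cast<std::uint64_t>(inputSamples) * out;
    return static_cast<std::size_t>((scaled + in - 1) / in);
}
```

For example, one second of 44,100 Hz audio (44,100 samples) becomes 48,000 samples at 48,000 Hz, which is upsampling, while one second at 48,000 Hz shrinks to 16,000 samples at 16,000 Hz, which is downsampling.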

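About interpolation

Interpolation estimates values between known data points. There are many interpolation methods, but this post introduces only two.

Suppose eight points are laid out along one dimension.

Now suppose we need the value at a position that falls between two of those points (the question mark). What would be a good estimate?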

Nearest-Neighbor Interpolation

The first method is the simplest. Nearest-neighbor interpolation estimates the value at a position by taking the value of the closest known point.

It is the easiest to implement, but the quality is not good.

As a formula: \( y[n] = x\left[\mathrm{round}\left(\frac{R_{input}}{R_{output}} n\right)\right] \), where \( R_{input} \) is the input sample rate and \( R_{output} \) is the output sample rate.

The C++ implementation is below. It takes the original vector, the original sample rate, and the target sample rate, and returns the resampled vector.

std::vector<int32_t> NearestNeighbor(const std::vector<int32_t>& input, int inputRate, int outputRate) {
    if (inputRate == outputRate) {
        return input;
    }

    std::vector<int32_t> output;
    // Distance, measured in input samples, between two consecutive output samples.
    double ratio = static_cast<double>(inputRate) / outputRate;
    double nextSampleIndex = 0.0;

    while (nextSampleIndex < input.size()) {
        // std::round needs <cmath>.
        size_t index = static_cast<size_t>(std::round(nextSampleIndex));

        // Rounding up near the end can point one past the last sample, so clamp.
        if (index >= input.size()) {
            index = input.size() - 1;
        }

        output.push_back(input[index]);
        nextSampleIndex += ratio;
    }

    return output;
}
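As a variation on the function above, here is a minimal sketch that evaluates the formula directly from the integer output index n instead of repeatedly adding the ratio, so no floating-point error builds up over long files. NearestNeighborByIndex is a hypothetical name, and rounding halves up with integer arithmetic is my own choice rather than something the formula prescribes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Variant sketch (hypothetical name): evaluates y[n] = x[round(R_in / R_out * n)]
// from the integer output index n, so no floating-point error accumulates.
std::vector<int32_t> NearestNeighborByIndex(const std::vector<int32_t>& input, int inputRate, int outputRate) {
    assert(inputRate > 0 && outputRate > 0);
    if (input.empty() || inputRate == outputRate) {
        return input;
    }

    const uint64_t in = static_cast<uint64_t>(inputRate);
    const uint64_t out = static_cast<uint64_t>(outputRate);
    // Same duration at the new rate, rounded up.
    const uint64_t outputCount = (input.size() * out + in - 1) / in;

    std::vector<int32_t> output;
    output.reserve(static_cast<std::size_t>(outputCount));
    for (uint64_t n = 0; n < outputCount; ++n) {
        // Integer form of round(n * in / out), rounding halves up.
        uint64_t index = (n * in + out / 2) / out;
        if (index >= input.size()) {
            index = input.size() - 1;  // clamp at the last sample
        }
        output.push_back(input[static_cast<std::size_t>(index)]);
    }
    return output;
}
```

Doubling the rate of {0, 10, 20, 30} this way gives {0, 10, 10, 20, 20, 30, 30, 30}: every new sample is a copy of an existing one, which is why the result sounds rough.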

Linear Interpolation

Next, linear interpolation estimates the value at a position by drawing a straight line through the two nearest known points and taking the value of that line at the position.

It is simple to implement, and the quality is reasonably good.

Given two neighboring samples of the original digital signal, \((k, p_1)\) and \((k+1, p_2)\), suppose we want the value at \(k+i\) \((0 \leq i < 1,\ k \in \{0\} \cup \mathbb{N})\). The line through the two samples is: \( y = (p_2 - p_1)(x - k) + p_1 \)

Substituting \(x = k + i\) gives the estimated point: \( (k + i,\ (1 - i)\,p_1 + i\,p_2) \)

Expressed in terms of the sample rates: \( y[n] = (1 - i) \cdot x[\lfloor \frac{R_{input}}{R_{output}}n \rfloor] + i \cdot x[\lfloor \frac{R_{input}}{R_{output}}n \rfloor + 1] \), where \( i = \frac{R_{input}}{R_{output}}n - \lfloor \frac{R_{input}}{R_{output}}n \rfloor \)
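To make the formula concrete, here is a small worked example. The numbers are made up for illustration: a 44,100 Hz input resampled to 48,000 Hz, with \(x[0] = 100\) and \(x[1] = 200\), evaluated at output index \(n = 1\).

```latex
\begin{aligned}
\frac{R_{input}}{R_{output}} n &= \frac{44100}{48000} \cdot 1 = 0.91875 \\
\lfloor 0.91875 \rfloor &= 0, \qquad i = 0.91875 - 0 = 0.91875 \\
y[1] &= (1 - 0.91875) \cdot x[0] + 0.91875 \cdot x[1] \\
     &= 0.08125 \cdot 100 + 0.91875 \cdot 200 = 8.125 + 183.75 = 191.875
\end{aligned}
```

The output position sits much closer to \(x[1]\) than to \(x[0]\), so the result lands near 200. Stored as an integer sample, it rounds to 192.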

The C++ implementation is below. As before, it takes the original vector, the original sample rate, and the target sample rate, and returns the resampled vector.

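std::vector<int32_t> Linear(const std::vector<int32_t>& input, int inputRate, int outputRate) {
    // The empty check also keeps input.size() - 1 below from wrapping around.
    if (inputRate == outputRate || input.empty()) {
        return input;
    }

    std::vector<int32_t> output;
    double ratio = static_cast<double>(inputRate) / outputRate;
    double nextSampleIndex = 0.0;

    // Stop before the last sample so that input[index + 1] always exists.
    while (nextSampleIndex < input.size() - 1) {
        size_t index = static_cast<size_t>(nextSampleIndex);
        double fraction = nextSampleIndex - index;

        double interpolatedSample = (1 - fraction) * input[index] + fraction * input[index + 1];
        // std::lround (from <cmath>) rounds negative values to nearest as well;
        // adding 0.5 and truncating would round them toward zero.
        output.push_back(static_cast<int32_t>(std::lround(interpolatedSample)));

        nextSampleIndex += ratio;
    }

    return output;
}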

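About requantization

Suppose we convert 8-bit PCM audio to 16-bit. If a sample in the 8-bit audio has the value 127, the maximum of an 8-bit two's-complement integer, should it stay 127 after conversion to 16-bit, or should it become 32,767, the maximum of a 16-bit two's-complement integer?

The answer is 32,767. Requantization does not simply add bits; it has to keep each value at the same proportion of full scale. (Note that WAV files actually store 8-bit PCM as unsigned values from 0 to 255 centered on 128. The signed 8-bit value here only illustrates the scaling.)

For a sample value \(x\) converted from \(n\) bits to \(m\) bits, the formula is: \( f(x) = \frac{x}{2^{n - 1} - 1} \times ( 2^{m - 1} - 1 ) \)

The code below implements this. Note that its n and m parameters are sample sizes in bytes, so the code multiplies them by 8 to get bit counts.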

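int32_t Requantization(int value, size_t n, size_t m) {
    // n and m are sample sizes in bytes (1 to 4).
    int64_t nMax = (1LL << (n * 8 - 1)) - 1;
    int64_t mMax = (1LL << (m * 8 - 1)) - 1;
    int64_t mMin = -mMax - 1;

    // Position of the sample relative to positive full scale.
    double scale = static_cast<double>(value) / nMax;

    int64_t result = static_cast<int64_t>(scale * mMax);

    // The most negative input (for example -128) scales slightly past -mMax,
    // so clamp the result to the valid range.
    if (result < mMin) {
        result = mMin;
    }
    if (result > mMax) {
        result = mMax;
    }

    return static_cast<int32_t>(result);
}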
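For comparison, another common way to requantize is to scale by a power of two instead of by the ratio of the maximum values. The sketch below is that alternative, not the method used in this post: RequantizeByShift is a hypothetical name, and unlike the function above it takes bit depths rather than byte counts.

```cpp
#include <cassert>
#include <cstdint>

// Alternative sketch (hypothetical name): requantize by a power-of-two factor.
// nBits and mBits are bit depths (8, 16, 24 or 32), unlike the byte counts above.
int32_t RequantizeByShift(int32_t value, int nBits, int mBits) {
    assert(nBits >= 8 && nBits <= 32 && mBits >= 8 && mBits <= 32);
    if (mBits >= nBits) {
        // Widening: multiply by 2^(mBits - nBits), computed in 64 bits.
        int64_t widened = static_cast<int64_t>(value) * (int64_t{1} << (mBits - nBits));
        return static_cast<int32_t>(widened);
    }
    // Narrowing: divide by 2^(nBits - mBits); integer division truncates toward zero.
    return static_cast<int32_t>(value / (int64_t{1} << (nBits - mBits)));
}
```

The trade-off: with this approach 127 becomes 32,512 rather than 32,767, so positive full scale is never quite reached, but -128 maps exactly to -32,768 without clamping and widening followed by narrowing returns the original value.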

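Other utilities we need

When PCM data is first read, it is just a raw byte array. It has to be turned into a vector of numbers with some processing. The function below reads length bytes from a char array and assembles them into a 32-bit integer. In two's complement the most significant bit is the sign bit, so that has to be handled as well.

This code was written on the assumption that the system is little-endian.

PCM comes in 8-bit, 16-bit, 24-bit, 32-bit, and other depths, but to keep the logic simple everything is handled internally as a single 32-bit integer type.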

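int32_t CharToInt(char* array, size_t length) {
    // Assemble in an unsigned type so that shifting into the top byte is well defined.
    uint32_t result = 0;

    // Little-endian: the first byte is the least significant.
    for (size_t i = 0; i < length; ++i) {
        result |= static_cast<uint32_t>(static_cast<unsigned char>(array[i])) << (i * 8);
    }

    // If the sign bit of the top byte is set, fill the remaining upper bytes with 1s (sign extension).
    if (array[length - 1] & 0x80) {
        for (size_t i = length; i < 4; ++i) {
            result |= 0xFFu << (i * 8);
        }
    }

    return static_cast<int32_t>(result);
}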
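One case CharToInt does not cover: it treats every sample as signed, but WAV files store 8-bit PCM as unsigned values (0 to 255, with silence at 128). Here is a minimal sketch of a decoder that handles that case. DecodeSample is a hypothetical name and it is not wired into the Wave class in this post.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: decodes one little-endian PCM sample of `length` bytes.
// WAV stores 8-bit samples as unsigned (0..255, silence at 128) and wider ones as signed.
int32_t DecodeSample(const unsigned char* bytes, std::size_t length) {
    assert(bytes != nullptr && length >= 1 && length <= 4);
    if (length == 1) {
        // Shift the unsigned range 0..255 down to the signed range -128..127.
        return static_cast<int32_t>(bytes[0]) - 128;
    }
    uint32_t raw = 0;
    for (std::size_t i = 0; i < length; ++i) {
        raw |= static_cast<uint32_t>(bytes[i]) << (i * 8);
    }
    if (length == 4) {
        return static_cast<int32_t>(raw);
    }
    // Sign-extend: if the top bit of the sample is set, subtract 2^(8 * length).
    int64_t value = raw;
    if (raw & (uint32_t{1} << (length * 8 - 1))) {
        value -= int64_t{1} << (length * 8);
    }
    return static_cast<int32_t>(value);
}
```

With this, the 8-bit byte 0x80 decodes to 0 (silence) and 0xFF decodes to 127. The writing side would need the matching step of adding 128 back before storing an 8-bit sample.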

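Conversely, when writing a WAV file, the array of numbers has to be turned back into a byte array.

void IntToChar(int value, char* array, size_t length) {
    // Emit the lowest `length` bytes of value, least significant byte first (little-endian).
    for (size_t i = 0; i < length; ++i) {
        array[i] = static_cast<char>((value >> (i * 8)) & 0xFF);
    }
}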
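A quick way to convince yourself that the two conversions are inverses is a round-trip check. The sketch below uses hypothetical stand-ins (EncodeLittleEndian, DecodeLittleEndian, RoundTrips) that mirror IntToChar and CharToInt with fixed-width unsigned arithmetic, so it compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for IntToChar: writes the lowest `length` bytes, least significant first.
void EncodeLittleEndian(int32_t value, unsigned char* out, std::size_t length) {
    assert(length >= 1 && length <= 4);
    uint32_t bits = static_cast<uint32_t>(value);  // two's-complement bit pattern
    for (std::size_t i = 0; i < length; ++i) {
        out[i] = static_cast<unsigned char>((bits >> (i * 8)) & 0xFFu);
    }
}

// Hypothetical stand-in for CharToInt: reads `length` bytes and sign-extends to 32 bits.
int32_t DecodeLittleEndian(const unsigned char* in, std::size_t length) {
    assert(length >= 1 && length <= 4);
    uint32_t bits = 0;
    for (std::size_t i = 0; i < length; ++i) {
        bits |= static_cast<uint32_t>(in[i]) << (i * 8);
    }
    if (in[length - 1] & 0x80u) {
        for (std::size_t i = length; i < 4; ++i) {
            bits |= 0xFFu << (i * 8);  // sign extension
        }
    }
    return static_cast<int32_t>(bits);
}

// Returns true when a value survives the encode/decode round trip at the given width.
bool RoundTrips(int32_t value, std::size_t length) {
    unsigned char buffer[4] = {0, 0, 0, 0};
    EncodeLittleEndian(value, buffer, length);
    return DecodeLittleEndian(buffer, length) == value;
}
```

Values inside the range of the chosen width, such as -8,388,608 at 3 bytes, come back unchanged. A value outside it, such as 40,000 at 2 bytes, does not, which is why requantization has to run before the samples are written.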

Let's write the Wave class!

Everything is ready. Now let's write a C++ class that brings together reading, processing, and writing a WAV file.

We start with the header file. Note that it continues on from the WAV struct code used in the previous post.

// Wave.h
#pragma once

#include <mmdeviceapi.h>

#include <cstdint>
#include <cstring>
#include <exception>
#include <string>
#include <vector>

/* WAVE Structures */

class Wave {
private:
    WAVEFORMATEX wfx;
    std::vector<std::vector<int32_t>> pcm;

    std::vector<int32_t> NearestNeighbor(const std::vector<int32_t>& input, int inputRate, int outputRate);
    std::vector<int32_t> Linear(const std::vector<int32_t>& input, int inputRate, int outputRate);
    int32_t CharToInt(char* array, size_t length);
    void IntToChar(int value, char* array, size_t length);
    int32_t Requantization(int value, size_t n, size_t m);

public:
    Wave(const char* path);
    void Convert(int sampleRate, int numChannels, int bitDepth);
    void Save(const char* filename);
};

class WaveException : public std::exception {
private:
    std::string message;

public:
    WaveException(const std::string& msg);
    const char* what() const noexcept override;
};

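The class has two members, the wfx struct and the pcm vector.

wfx: The struct that holds the audio format of the WAV file the class has read.
pcm: A two-dimensional vector that holds the PCM data after the WAV file is read. Its elements are 32-bit signed integers. PCM comes in 8-bit, 16-bit, 24-bit, 32-bit, and other depths, and handling each bit depth separately would get complicated, so everything is processed as 32-bit integers. Later, when the WAV file is written back out, each 32-bit integer is converted into a byte array whose width matches the bit depth of the audio format.

Next come the constructor and the methods.

Wave constructor: Takes the path of the WAV file as a string when the object is created.
Convert: Converts the class's pcm vector to the given audio format.
Save: Writes the class's pcm vector to a file with the given name in the current directory.

Finally, we create an exception class derived from std::exception for error handling.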

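Checklist for reading the file

Now let's look at the implementation. First, the Wave constructor has to read the WAV file, and then check that what it read is a well-formed WAV file. The previous post explained the structure of a WAV file, so putting together an error checklist should not be hard.

1. Is the file at least 44 bytes?: A WAV file must have at least a 44-byte header (RIFF chunk 12 + fmt chunk 24 + data chunk header 8 = 44). If the file is shorter than 44 bytes, we know its header is malformed.
2. Does the file start with a well-formed RIFF chunk?: The first 4 bytes must be "RIFF". The next 4 bytes, read as a little-endian unsigned integer, must equal the total file size minus 8. The 4 bytes after that must contain the string "WAVE".
3. Are the fmt and data chunks present?: With the RIFF chunk verified, we check that the other required chunks exist.
4. Is it PCM data?: Not every WAV file stores Linear PCM audio. This is checked by confirming that wFormatTag in the fmt chunk is 1.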
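As a rough illustration of checks 1 and 2, here is a minimal sketch that works on a file already loaded into memory. CheckRiffHeader and ReadU32LE are hypothetical helper names, not part of the Wave class, and returning a message string instead of throwing is just a choice made to keep the sketch self-contained.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: reads a little-endian 32-bit value without relying on host byte order.
uint32_t ReadU32LE(const unsigned char* p) {
    return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
           (static_cast<uint32_t>(p[2]) << 16) | (static_cast<uint32_t>(p[3]) << 24);
}

// Hypothetical helper covering checks 1 and 2. Returns nullptr when they pass,
// otherwise a short description of the first failed check.
const char* CheckRiffHeader(const std::vector<unsigned char>& file) {
    if (file.size() < 44) {
        return "file shorter than 44 bytes";
    }
    if (std::memcmp(file.data(), "RIFF", 4) != 0) {
        return "missing RIFF tag";
    }
    if (ReadU32LE(file.data() + 4) != file.size() - 8) {
        return "RIFF size does not match file size";
    }
    if (std::memcmp(file.data() + 8, "WAVE", 4) != 0) {
        return "missing WAVE tag";
    }
    return nullptr;
}
```

Checks 3 and 4 need a walk over the chunks that follow the first 12 bytes, so they are left out of this sketch.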

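// Wave.cpp
#include "Wave.h"

#include <cmath>
#include <fstream>
#include <iostream>


Wave::Wave(const char* path) {
    std::ifstream waveFile(path, std::ifstream::binary);
    if (!waveFile) {
        throw WaveException("Cannot Open File");
    }
    waveFile.seekg(0, waveFile.end);
    int length = (int)waveFile.tellg();
    waveFile.seekg(0, waveFile.beg);

    if (length < 44) {
        throw WaveException("Invalid Length");
    }

    // std::vector frees the memory even when an exception is thrown below.
    std::vector<char> raw(length);
    waveFile.read(raw.data(), length);

    int cursor = 12;
    uint32_t riffSize = 0;
    uint32_t pcmSize = 0;
    std::vector<char> buffer;
    bool bFMT = false;
    bool bDATA = false;

    // RIFF Chunk Check
    memcpy(&riffSize, raw.data() + 4, 4);
    if (std::memcmp(raw.data(), "RIFF", 4) != 0) {
        throw WaveException("NOT RIFF");
    }
    if (riffSize != static_cast<uint32_t>(length - 8)) {
        throw WaveException("Invalid Length");
    }
    if (std::memcmp(raw.data() + 8, "WAVE", 4) != 0) {
        throw WaveException("Not WaveFile");
    }

    // Reading Other Chunk
    while (cursor < length - 8) {
        char chunkID[4];
        memcpy(chunkID, raw.data() + cursor, 4);
        uint32_t chunkSize;
        memcpy(&chunkSize, raw.data() + cursor + 4, 4);

        // A chunk that claims to extend past the end of the file is corrupt.
        if (chunkSize > static_cast<uint32_t>(length - cursor - 8)) {
            throw WaveException("Invalid Chunk Size");
        }

        if (std::memcmp(chunkID, "fmt ", 4) == 0) {
            if (chunkSize < sizeof(PCMWAVEFORMAT)) {
                throw WaveException("Invalid fmt Chunk");
            }
            memcpy(&wfx, raw.data() + cursor + 8, sizeof(PCMWAVEFORMAT));
            wfx.cbSize = 0;  // not covered by the copy above
            bFMT = true;
        } else if (std::memcmp(chunkID, "data", 4) == 0) {
            pcmSize = chunkSize;
            buffer.assign(raw.data() + cursor + 8, raw.data() + cursor + 8 + chunkSize);
            bDATA = true;
        }
        // Chunks are padded to an even number of bytes.
        cursor += chunkSize + (chunkSize % 2) + 8;
    }

    if (!bFMT || !bDATA) {
        throw WaveException("Insufficient Chunk");
    }
    if (wfx.wFormatTag != WAVE_FORMAT_PCM) {
        throw WaveException("Not PCM");
    }

    // Guard against a zero step or a zero modulus in the loop below.
    const uint32_t bytesPerSample = wfx.wBitsPerSample / 8;
    if (bytesPerSample == 0 || bytesPerSample > 4 || wfx.nChannels == 0) {
        throw WaveException("Unsupported Format");
    }

    // Preparing PCM data
    for (int i = 0; i < wfx.nChannels; ++i) {
        pcm.push_back(std::vector<int32_t>());
    }

    // Use whole frames only, so that every channel ends up with the same length.
    const uint32_t usable = pcmSize - pcmSize % (bytesPerSample * wfx.nChannels);

    // Samples are interleaved (L, R, L, R, ...), so sample k belongs to channel k % nChannels.
    for (uint32_t i = 0; i < usable; i += bytesPerSample) {
        int32_t temp = CharToInt(buffer.data() + i, bytesPerSample);
        pcm[(i / bytesPerSample) % wfx.nChannels].push_back(temp);
    }
}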

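The file is opened in binary mode with std::ifstream and processed to fill wfx and the pcm vector. With multiple channels, PCM data arrives with the channels interleaved. That would complicate the logic later, so the pcm vector is arranged so that each row holds the samples of a single channel.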
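The rearrangement described above can be sketched on its own, separate from the byte decoding. Deinterleave is a hypothetical helper name, and it assumes the samples have already been converted to 32-bit integers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: splits interleaved samples (L, R, L, R, ...) into one row per channel.
std::vector<std::vector<int32_t>> Deinterleave(const std::vector<int32_t>& interleaved, std::size_t numChannels) {
    assert(numChannels > 0);
    std::vector<std::vector<int32_t>> channels(numChannels);
    // Only whole frames are used, so every row ends up with the same length.
    const std::size_t frames = interleaved.size() / numChannels;
    for (std::size_t frame = 0; frame < frames; ++frame) {
        for (std::size_t ch = 0; ch < numChannels; ++ch) {
            channels[ch].push_back(interleaved[frame * numChannels + ch]);
        }
    }
    return channels;
}
```

For a stereo stream {1, -1, 2, -2, 3, -3}, row 0 becomes {1, 2, 3} (left) and row 1 becomes {-1, -2, -3} (right), so each row can be resampled independently.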

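Next is the implementation of the Convert method. It simply applies interpolation channel by channel, then updates the fields of the wfx struct to the new audio format.

The method throws an exception if the current channel count and the requested channel count differ.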

void Wave::Convert(int sampleRate, int numChannels, int bitDepth) {
    if (wfx.nChannels != numChannels) {
        throw WaveException("Channel Exception");
    }
    // Added guard: the helpers assume a positive rate and a depth of 8, 16, 24 or 32 bits.
    if (sampleRate <= 0 || bitDepth < 8 || bitDepth > 32 || bitDepth % 8 != 0) {
        throw WaveException("Format Exception");
    }

    // Resample each channel independently.
    std::vector<std::vector<int32_t>> result;
    for (int i = 0; i < wfx.nChannels; ++i) {
        // result.push_back(NearestNeighbor(pcm[i], wfx.nSamplesPerSec, sampleRate));
        result.push_back(Linear(pcm[i], wfx.nSamplesPerSec, sampleRate));
    }

    // Rescale every sample from the old bit depth to the new one.
    for (auto& channel : result) {
        for (auto& sample : channel) {
            sample = Requantization(sample, wfx.wBitsPerSample / 8, bitDepth / 8);
        }
    }

    wfx.wFormatTag = WAVE_FORMAT_PCM;
    wfx.nChannels = numChannels;
    wfx.nSamplesPerSec = sampleRate;
    wfx.nAvgBytesPerSec = sampleRate * numChannels * bitDepth / 8;
    wfx.nBlockAlign = numChannels * bitDepth / 8;
    wfx.wBitsPerSample = bitDepth;
    wfx.cbSize = 0;

    pcm = std::move(result);
}
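The two derived fields that Convert recomputes, nBlockAlign and nAvgBytesPerSec, follow directly from the other three values. Here is a minimal sketch of that arithmetic. DerivedFormat and DeriveFormat are hypothetical names used so the example builds without the Windows headers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the derived WAVEFORMATEX fields.
struct DerivedFormat {
    uint16_t blockAlign;      // bytes per frame (one sample for every channel)
    uint32_t avgBytesPerSec;  // bytes per second of audio
};

DerivedFormat DeriveFormat(int sampleRate, int numChannels, int bitDepth) {
    assert(sampleRate > 0 && numChannels > 0 && bitDepth > 0 && bitDepth % 8 == 0);
    DerivedFormat f;
    f.blockAlign = static_cast<uint16_t>(numChannels * bitDepth / 8);
    f.avgBytesPerSec = static_cast<uint32_t>(sampleRate) * f.blockAlign;
    return f;
}
```

For 48,000 Hz, 2 channels, and 16 bits this gives a block alignment of 4 bytes and 192,000 bytes per second.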

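Last is the Save method. It closely resembles the WAV-saving code from the previous post.

The two-dimensional pcm vector is serialized back into one dimension, and then the IntToChar method converts each 32-bit integer into a byte array holding an n-bit integer.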

void Wave::Save(const char* filename) {
    // Interleave the channels again: one sample from each channel per frame.
    std::vector<int32_t> serialized;
    if (!pcm.empty()) {
        for (size_t i = 0; i < pcm[0].size(); ++i) {
            for (int j = 0; j < wfx.nChannels; ++j) {
                serialized.push_back(pcm[j][i]);
            }
        }
    }

    int byteSize = wfx.wBitsPerSample / 8;
    // std::vector releases the buffer automatically, so no delete[] is needed.
    std::vector<char> result(serialized.size() * byteSize);

    for (size_t i = 0; i < serialized.size(); ++i) {
        IntToChar(serialized[i], result.data() + (i * byteSize), byteSize);
    }

    WAVE_HEADER header(
        wfx.nSamplesPerSec,
        wfx.nChannels,
        wfx.wBitsPerSample,
        result.size()
    );
    std::ofstream wavFile(filename, std::ios::binary);
    if (!wavFile) {
        throw WaveException("Cannot Open Output File");
    }
    wavFile.write(reinterpret_cast<char*>(&header), sizeof(WAVE_HEADER));
    wavFile.write(result.data(), result.size());
    wavFile.close();
}
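The serialization step at the top of Save is the reverse of what the constructor does to the PCM data. Taken in isolation it might look like the sketch below. Interleave is a hypothetical helper name, and trimming to the shortest channel is a defensive choice of mine, not something Save itself does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: merges per-channel rows back into interleaved order (L, R, L, R, ...).
std::vector<int32_t> Interleave(const std::vector<std::vector<int32_t>>& channels) {
    std::vector<int32_t> interleaved;
    if (channels.empty()) {
        return interleaved;
    }
    // Use the shortest channel so that every frame is complete.
    std::size_t frames = channels[0].size();
    for (const auto& channel : channels) {
        frames = std::min(frames, channel.size());
    }
    interleaved.reserve(frames * channels.size());
    for (std::size_t frame = 0; frame < frames; ++frame) {
        for (std::size_t ch = 0; ch < channels.size(); ++ch) {
            interleaved.push_back(channels[ch][frame]);
        }
    }
    return interleaved;
}
```

Rows {1, 2, 3} and {-1, -2, -3} come back out as {1, -1, 2, -2, 3, -3}, which is the order the data chunk expects.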

The exception class is implemented in a straightforward way as well.

WaveException::WaveException(const std::string& msg) : message(msg) {}
const char* WaveException::what() const noexcept {
    return message.c_str();
}
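As an aside, the standard library already offers an exception type that stores a message. The sketch below shows that alternative under a different, hypothetical name (WaveRuntimeError) so it does not clash with the WaveException class used in this post. CatchMessage exists only to demonstrate the behavior.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Alternative sketch (hypothetical name): std::runtime_error already stores
// the message and implements what(), so no member or override is needed.
class WaveRuntimeError : public std::runtime_error {
public:
    explicit WaveRuntimeError(const std::string& msg) : std::runtime_error(msg) {}
};

// Demonstration helper: throws the exception and returns the message that was caught.
std::string CatchMessage(const std::string& msg) {
    try {
        throw WaveRuntimeError(msg);
    } catch (const std::exception& e) {
        return e.what();
    }
}
```

Because std::runtime_error derives from std::exception, a handler for const std::exception& still catches it.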

Using the class is simple. Create an instance, call the Convert method, then call the Save method. In my case I wrote the following code.

// main.cpp
#include "Wave.h"

#include <iostream>
#include <string>


int main() {
    std::string input;
    int sampleRate, numChannels, bitDepth;
    std::string output;

    try {
        std::cout << "Input Filename: ";
        std::cin >> input;
        Wave wave(input.c_str());

        std::cout << "SampleRate: ";
        std::cin >> sampleRate;
        std::cout << "Channels: ";
        std::cin >> numChannels;
        std::cout << "BitDepth: ";
        std::cin >> bitDepth;
        wave.Convert(sampleRate, numChannels, bitDepth);

        std::cout << "Output Filename: ";
        std::cin >> output;
        wave.Save(output.c_str());
    } catch (const WaveException& e) {
        std::cerr << "Wave Exception: " << e.what() << std::endl;
    } catch (const std::exception& e) {
        std::cerr << "Standard Exception: " << e.what() << std::endl;
    }

    std::cout << "Done!" << std::endl;
    std::cin.get();

    return 0;
}

Now build and run the code, enter suitable values, and you will get the converted WAV file!

// Wave.h (complete header, including the WAVE structures from the previous post)
#pragma once

#include <mmdeviceapi.h>

#include <cstdint>
#include <cstring>
#include <exception>
#include <string>
#include <vector>

#pragma pack(push, 1)
typedef struct {
    char chunkID[4];
    uint32_t fileSize;
    char fileType[4];
} RIFF;

typedef struct {
    char chunkID[4];
    uint32_t chunkSize;
    uint16_t wFormatTag;
    uint16_t nChannels;
    uint32_t nSamplePerSec;
    uint32_t nAvgBytesPerSec;
    uint16_t nBlockAlign;
    uint16_t wBitsPerSample;
} FMT;

typedef struct {
    char chunkID[4];
    uint32_t chunkSize;
} DATA;

typedef struct WAVE_HEADER {
    RIFF riff;
    FMT fmt;
    DATA data;
    WAVE_HEADER(int sampleRate, int numChannels, int bitDepth, size_t pcmSize) {
        // RIFF Chunk
        memcpy(riff.chunkID, "RIFF", 4);
        riff.fileSize = pcmSize + sizeof(RIFF) + sizeof(FMT) + sizeof(DATA) - 8;
        memcpy(riff.fileType, "WAVE", 4);

        // fmt Chunk
        memcpy(fmt.chunkID, "fmt ", 4);
        fmt.chunkSize = 16;
        fmt.wFormatTag = 1;
        fmt.nChannels = numChannels;
        fmt.nSamplePerSec = sampleRate;
        fmt.nAvgBytesPerSec = sampleRate * numChannels * bitDepth / 8;
        fmt.nBlockAlign = numChannels * bitDepth / 8;
        fmt.wBitsPerSample = bitDepth;

        // data Chunk
        memcpy(data.chunkID, "data", 4);
        data.chunkSize = pcmSize;
    }
} WAVE_HEADER;
#pragma pack(pop)

class Wave {
private:
    WAVEFORMATEX wfx;
    std::vector<std::vector<int32_t>> pcm;

    std::vector<int32_t> NearestNeighbor(const std::vector<int32_t>& input, int inputRate, int outputRate);
    std::vector<int32_t> Linear(const std::vector<int32_t>& input, int inputRate, int outputRate);
    int32_t CharToInt(char* array, size_t length);
    void IntToChar(int value, char* array, size_t length);
    int32_t Requantization(int value, size_t n, size_t m);

public:
    Wave(const char* path);
    void Convert(int sampleRate, int numChannels, int bitDepth);
    void Save(const char* filename);
};

class WaveException : public std::exception {
private:
    std::string message;

public:
    WaveException(const std::string& msg);
    const char* what() const noexcept override;
};

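Wrapping up

We wrote an audio format conversion program using very simple methods. The code as written still leaves a great deal to improve: a filter to suppress aliasing, changing the channel count, more advanced interpolation and decimation, SIMD for faster processing, and so on. Even so, you can see that this much code already produces a fairly convincing format conversion.

In the next chapter we will use the Wave class written here together with WASAPI to play a WAV file through the speakers.

Thank you for reading this long post. See you in the next one!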