Data Transformation with the C++ STL Algorithm transform

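1. Overview of the transform algorithm

In the C++ Standard Template Library (STL), transform is a very practical algorithm defined in the <algorithm> header. Its main job is to convert each element of a range according to a given rule and store the results in another range. The output range may also be the input range itself. This kind of conversion is useful in many real programming scenarios, such as data preprocessing and data format conversion.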

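transform has two basic overloads. C++17 also adds versions that take an execution policy as their first argument (see section 6.2).

The first form: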

template<class InputIt, class OutputIt, class UnaryOperation>
OutputIt transform(InputIt first1, InputIt last1, OutputIt d_first,
                   UnaryOperation unary_op);

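This form applies the unary operation unary_op to each element in [first1, last1) and stores the results, in order, in the output range starting at d_first. It returns an iterator to the position just past the last element written.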

The second form:

template<class InputIt1, class InputIt2, class OutputIt, class BinaryOperation>
OutputIt transform(InputIt1 first1, InputIt1 last1, InputIt2 first2,
                   OutputIt d_first, BinaryOperation binary_op);

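This form applies the binary operation binary_op to corresponding elements of the two ranges [first1, last1) and [first2, first2 + (last1 - first1)). It stores the results, in order, in the output range starting at d_first. Only the first range determines how many elements are processed, so the second range must contain at least last1 - first1 elements. The return value is again an iterator to the position just past the last element written.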

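2. Applying transform with a unary operation

2.1 Converting simple data types

Suppose a std::vector<int> holds some integers, and we want to convert each integer to its square. We can do this by combining transform with a unary function object, a function pointer, or a lambda expression.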

#include <iostream>
#include <vector>
#include <algorithm>

int square(int num) {
    return num * num;
}

int main() {
    std::vector<int> numbers = {1, 2, 3, 4, 5};
    std::vector<int> squared_numbers(numbers.size());

    std::transform(numbers.begin(), numbers.end(), squared_numbers.begin(), square);

    for (int num : squared_numbers) {
        std::cout << num << " ";
    }
    std::cout << std::endl;

    return 0;
}

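In the code above, we define a square function that takes an integer and returns its square. We then use std::transform to square each element of the numbers vector. The results go into squared_numbers, which was created with the same size as numbers so that it can hold every result. Finally, we iterate over squared_numbers to print the results.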

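2.2 Converting string case

transform can also convert the case of a string. For example, suppose we want to convert every character in a string to upper case. The C++ <cctype> header provides the toupper and tolower functions, which convert the case of a single character.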

#include <iostream>
#include <string>
#include <algorithm>
#include <cctype>

int main() {
    std::string str = "hello world";
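    // std::string has no constructor taking only a length; pass a fill character too
    std::string upper_str(str.size(), '\0');

    std::transform(str.begin(), str.end(), upper_str.begin(), [](unsigned char c) {
        return static_cast<char>(std::toupper(c));  // toupper returns int
    });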

    std::cout << upper_str << std::endl;

    return 0;
}

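In this code, std::transform and a lambda expression convert each character of str to upper case with std::toupper. The results are stored in upper_str, which must already have the same length as str. The lambda's parameter c is declared as unsigned char because std::toupper takes an int whose value must be representable as unsigned char or equal to EOF. On platforms where char is signed, passing a negative char value directly is undefined behavior.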

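3. Applying transform with a binary operation

3.1 Adding vector elements

Suppose we have two vectors and want to add the elements at corresponding positions to produce a new vector. The binary form of transform makes this easy.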

#include <iostream>
#include <vector>
#include <algorithm>

int main() {
    std::vector<int> vec1 = {1, 2, 3};
    std::vector<int> vec2 = {4, 5, 6};
    std::vector<int> result(vec1.size());

    std::transform(vec1.begin(), vec1.end(), vec2.begin(), result.begin(),
                   [](int a, int b) { return a + b; });

    for (int num : result) {
        std::cout << num << " ";
    }
    std::cout << std::endl;

    return 0;
}

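In the code above, the binary form of std::transform adds the elements of vec1 and vec2 at corresponding positions and stores the sums in the result vector. The lambda expression [](int a, int b) { return a + b; } defines the binary operation, which adds two integers. vec2 must have at least as many elements as vec1, because only vec1's range bounds the loop.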

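3.2 Concatenating strings

We can also use the binary form of transform to concatenate strings. Suppose we have two vectors of strings and want to concatenate the strings at corresponding positions.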

#include <iostream>
#include <vector>
#include <string>
#include <algorithm>

int main() {
    std::vector<std::string> str_vec1 = {"Hello", "World"};
    std::vector<std::string> str_vec2 = {" ", "!"};
    std::vector<std::string> result_vec(str_vec1.size());

    std::transform(str_vec1.begin(), str_vec1.end(), str_vec2.begin(), result_vec.begin(),
                   [](const std::string& a, const std::string& b) { return a + b; });

    for (const std::string& str : result_vec) {
        std::cout << str << std::endl;
    }

    return 0;
}

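In this code, the binary form of std::transform concatenates the strings at corresponding positions of str_vec1 and str_vec2 with the + operator and stores the results in result_vec. A lambda expression defines the concatenation operation.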

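4. Transforming user-defined types

4.1 Transforming struct types

Suppose we have a user-defined struct Point that represents a point in the 2D plane, with x and y coordinates. We want to multiply both coordinates of every Point in a collection by a scaling factor.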

#include <iostream>
#include <vector>
#include <algorithm>

struct Point {
    int x;
    int y;
};

Point scale_point(const Point& p, double factor) {
    return {static_cast<int>(p.x * factor), static_cast<int>(p.y * factor)};
}

int main() {
    std::vector<Point> points = {{1, 2}, {3, 4}, {5, 6}};
    std::vector<Point> scaled_points(points.size());

    double scale_factor = 2.0;
    std::transform(points.begin(), points.end(), scaled_points.begin(),
                   [scale_factor](const Point& p) {
                       return scale_point(p, scale_factor);
                   });

    for (const Point& p : scaled_points) {
        std::cout << "(" << p.x << ", " << p.y << ") ";
    }
    std::cout << std::endl;

    return 0;
}

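In the code above, we define the Point struct and a scale_point function that scales a Point by a given factor. We then use std::transform with a lambda expression that captures scale_factor. The lambda scales each Point in the points vector, and the results are stored in scaled_points. Because scale_point converts the scaled coordinates back to int with static_cast, any fractional part is truncated toward zero.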

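4.2 Transforming class types

Next, consider a more complex class, Rectangle, which represents a rectangle by its top-left and bottom-right Points plus a color. We want to translate (shift) the position of each rectangle in a collection.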

#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>
#include <string>

struct Point {
    int x;
    int y;
};

class Rectangle {
public:
    Point top_left;
    Point bottom_right;
    std::string color;

    Rectangle(const Point& tl, const Point& br, const std::string& col)
        : top_left(tl), bottom_right(br), color(col) {}

    // Returns a new rectangle shifted by (dx, dy); the original is unchanged
    Rectangle translate(int dx, int dy) const {
        Point new_top_left = {top_left.x + dx, top_left.y + dy};
        Point new_bottom_right = {bottom_right.x + dx, bottom_right.y + dy};
        return Rectangle(new_top_left, new_bottom_right, color);
    }
};

int main() {
    std::vector<Rectangle> rectangles = {
        Rectangle({1, 1}, {3, 3}, "red"),
        Rectangle({4, 4}, {6, 6}, "blue")
    };
    // Rectangle has no default constructor, so translated_rectangles(rectangles.size())
    // would not compile; reserve capacity and append through std::back_inserter instead
    std::vector<Rectangle> translated_rectangles;
    translated_rectangles.reserve(rectangles.size());

    int dx = 2;
    int dy = 2;
    std::transform(rectangles.begin(), rectangles.end(), std::back_inserter(translated_rectangles),
                   [dx, dy](const Rectangle& rect) {
                       return rect.translate(dx, dy);
                   });

    for (const Rectangle& rect : translated_rectangles) {
        std::cout << "Top Left: (" << rect.top_left.x << ", " << rect.top_left.y << ") ";
        std::cout << "Bottom Right: (" << rect.bottom_right.x << ", " << rect.bottom_right.y << ") ";
        std::cout << "Color: " << rect.color << std::endl;
    }

    return 0;
}

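In this code, the Rectangle class provides a translate operation. std::transform is used with a lambda expression that captures dx and dy. The lambda translates each Rectangle in the rectangles vector, and the results are stored in translated_rectangles.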

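5. transform and iterators

5.1 Using different kinds of input iterators

transform only requires input iterators for its source ranges. It therefore works with many kinds of iterators, including the random-access iterators of std::vector and the bidirectional iterators of std::list. For example, with std::list: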

#include <iostream>
#include <list>
#include <algorithm>
#include <iterator>  // std::back_inserter
#include <cctype>

int main() {
    std::list<char> char_list = {'h', 'e', 'l', 'l', 'o'};
    std::list<char> upper_char_list;

    std::transform(char_list.begin(), char_list.end(), std::back_inserter(upper_char_list),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });

    for (char c : upper_char_list) {
        std::cout << c;
    }
    std::cout << std::endl;

    return 0;
}

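In the code above, we store characters in a std::list and use std::transform to convert them to upper case. Because upper_char_list starts out empty, we use std::back_inserter to create an output iterator. This iterator calls push_back on upper_char_list for every converted character.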

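5.2 Choosing an output iterator

Besides insert iterators such as std::back_inserter, transform can use other output iterators. For example, a plain pointer into a character array works as an output iterator: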

#include <iostream>
#include <algorithm>
#include <cctype>

int main() {
    char str[] = "world";
    char upper_str[6];

    std::transform(str, str + 5, upper_str, [](unsigned char c) { return std::toupper(c); });
    upper_str[5] = '\0';

    std::cout << upper_str << std::endl;

    return 0;
}

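In this code, the converted characters are written directly into the upper_str array. transform only writes the 5 characters in [str, str + 5), so we must add the string terminator '\0' at the end of the array manually before printing it.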

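6. Performance and optimization

6.1 Avoiding unnecessary copies

When using transform on large user-defined types, watch out for unnecessary copies. In the earlier Rectangle example, translate returns a new Rectangle by value. Returning by value is usually cheaper than it looks. Since C++17, return Rectangle(...); is guaranteed copy elision, and std::transform then moves the returned temporary into the output range. The copy that actually remains is the copy of the color string made inside translate. If moving the object itself is still expensive, one option is to return std::unique_ptr<Rectangle> so that only a pointer changes hands. Relying on move semantics is another option.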

#include <iostream>
#include <vector>
#include <algorithm>
#include <memory>
#include <string>

struct Point {
    int x;
    int y;
};

class Rectangle {
public:
    Point top_left;
    Point bottom_right;
    std::string color;

    Rectangle(const Point& tl, const Point& br, const std::string& col)
        : top_left(tl), bottom_right(br), color(col) {}

    std::unique_ptr<Rectangle> translate(int dx, int dy) const {
        Point new_top_left = {top_left.x + dx, top_left.y + dy};
        Point new_bottom_right = {bottom_right.x + dx, bottom_right.y + dy};
        return std::make_unique<Rectangle>(new_top_left, new_bottom_right, color);
    }
};

int main() {
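    // std::unique_ptr is move-only, so it cannot be copied out of an initializer list;
    // build the vector with push_back instead. Braced lists such as {1, 1} cannot be
    // forwarded through std::make_unique, so the Point type is spelled out.
    std::vector<std::unique_ptr<Rectangle>> rectangles;
    rectangles.push_back(std::make_unique<Rectangle>(Point{1, 1}, Point{3, 3}, "red"));
    rectangles.push_back(std::make_unique<Rectangle>(Point{4, 4}, Point{6, 6}, "blue"));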
    std::vector<std::unique_ptr<Rectangle>> translated_rectangles(rectangles.size());

    int dx = 2;
    int dy = 2;
    std::transform(rectangles.begin(), rectangles.end(), translated_rectangles.begin(),
                   [dx, dy](const std::unique_ptr<Rectangle>& rect) {
                       return rect->translate(dx, dy);
                   });

    for (const auto& rect : translated_rectangles) {
        std::cout << "Top Left: (" << rect->top_left.x << ", " << rect->top_left.y << ") ";
        std::cout << "Bottom Right: (" << rect->bottom_right.x << ", " << rect->bottom_right.y << ") ";
        std::cout << "Color: " << rect->color << std::endl;
    }

    return 0;
}

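In this version, translate returns a std::unique_ptr<Rectangle>, so std::transform only moves pointers into translated_rectangles instead of whole Rectangle objects. The trade-off is one heap allocation per element plus an extra indirection on every access. Whether this is actually faster than returning by value depends on the type and should be measured.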

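6.2 Parallelizing transform

Since C++17, the standard algorithms, including transform, have overloads that take an execution policy from the <execution> header as their first argument. For example, the std::execution::par policy can be used: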

#include <iostream>
#include <vector>
#include <algorithm>
#include <execution>

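// long long avoids overflow: 1000000 * 1000000 does not fit in a 32-bit int
long long square(long long num) {
    return num * num;
}

int main() {
    std::vector<int> numbers(1000000);
    for (std::size_t i = 0; i < numbers.size(); ++i) {
        numbers[i] = static_cast<int>(i + 1);
    }
    std::vector<long long> squared_numbers(numbers.size());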

    std::transform(std::execution::par, numbers.begin(), numbers.end(), squared_numbers.begin(), square);

    std::cout << "First squared number: " << squared_numbers[0] << std::endl;

    return 0;
}

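With the std::execution::par policy, the implementation is permitted to run the transform in parallel across multiple threads and use multiple cores. Whether it actually does so depends on the standard library. For example, GCC's libstdc++ typically relies on Intel TBB for parallel execution, which usually has to be installed and linked (-ltbb). For large inputs with non-trivial per-element work, parallel execution can noticeably improve performance. For a cheap operation such as squaring, the loop is often limited by memory bandwidth, so measure before relying on a speedup. Because elements may be processed concurrently, the operation must also be free of data races.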

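7. Common errors and pitfalls

7.1 Output range too small

When using transform, always make sure the output range is large enough to hold every converted element. transform writes through d_first without any bounds checking. For example: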

#include <iostream>
#include <vector>
#include <algorithm>

int square(int num) {
    return num * num;
}

int main() {
    std::vector<int> numbers = {1, 2, 3, 4, 5};
    std::vector<int> squared_numbers(3); // Output range too small

    std::transform(numbers.begin(), numbers.end(), squared_numbers.begin(), square);

    // Undefined behavior: squared_numbers has room for only 3 of the 5 results
    for (int num : squared_numbers) {
        std::cout << num << " ";
    }
    std::cout << std::endl;

    return 0;
}

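In the code above, squared_numbers has size 3, which is smaller than the size of numbers (5). As a result, transform writes past the end of squared_numbers. This is undefined behavior: the program may crash, corrupt unrelated memory, or appear to work.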

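7.2 Iterator invalidation

Some container operations, such as inserting or erasing elements, can invalidate existing iterators. Passing an invalidated iterator to transform is undefined behavior, which may crash the program or silently produce wrong results. For example: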

#include <iostream>
#include <vector>
#include <algorithm>

int main() {
    std::vector<int> numbers = {1, 2, 3, 4, 5};
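    auto it = numbers.begin() + 3; // it refers to the element 4

    numbers.erase(numbers.begin() + 2); // it is invalidated: it lies after the erased position

    std::transform(it, numbers.end(), numbers.begin(), [](int num) { return num * 2; });
    // Using the invalidated iterator it here is undefined behavior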

    for (int num : numbers) {
        std::cout << num << " ";
    }
    std::cout << std::endl;

    return 0;
}

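In this code, the erase call invalidates it, so using it afterwards in transform is undefined behavior. For std::vector, erase invalidates every iterator and reference at or after the erased position, including end(). To avoid this problem, re-obtain valid iterators after any operation that may invalidate them, for example by using the iterator that erase returns.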

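Understanding the many use cases of transform, how it works with iterators, its performance considerations and its common pitfalls lets us use it flexibly and efficiently for data conversion in C++. This improves both the quality and the performance of our programs. Whether we are processing simple built-in types or complex user-defined types, transform is a powerful and convenient data-processing tool.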