Meshes: Sort / Reorder arrays referencing shared entries of another array for cache efficiency

Given an array of vertices: {v1, v2, v3, v4, v5, ..., vN}

And an array of K polygons indexing it in blocks like this for a 4-sided polygon: {v7, v2, v51, v16}

Please note that two or more polygons can share the same vertex. In fact, most vertices will be shared by 4-6 polygons (valence 4 for quad meshes, valence 6 for triangle meshes).

... how can we efficiently sort/reorder the vertex data, for example to reduce cache misses when reading the vertices of a given polygon? I'm more interested in an algorithm that finishes in a reasonable amount of time than in one that gives the absolute best results. Even some crude heuristic is better than a completely arbitrary order here.

The idea is to turn something like {v1052, v507213, v63252, v3} into something more like {v70, v71, v72, v73}. This ideal is probably often unattainable without some outliers, given how heavily vertices are shared between polygons.

Full-fledged solutions are certainly welcome, but I'm more interested in the name given to the family of algorithms associated with this kind of reorganization of user data at runtime for cache efficiency. I assume such algorithms must exist, especially in the realm of cache-efficient VBO mesh representations, but I couldn't come up with suitable search terms. Would this still be called "sorting"?

I can see two categories of approaches to this problem: one that deals with mesh neighborhood traversal, which would be very specific to the mesh representation, and one that simply works on the general access pattern of one array indexing or pointing into entries of another. The latter strikes me as the more generalized solution, even if it may turn out less effective.
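To make the second category concrete, here is a minimal sketch of the kind of operation I have in mind (the function and type names are just illustrative): permute the referenced array into first-touch order and rewrite the indices to match.

#include <vector>
#include <cstddef>

// Sketch: given 'indices' referencing 'records', permute 'records' into the
// order they are first touched and rewrite 'indices' to match.
template <typename Record>
void reorder_by_first_touch(std::vector<Record>& records,
                            std::vector<int>& indices)
{
    const int unseen = -1;
    std::vector<int> remap(records.size(), unseen); // old index -> new index
    std::vector<Record> reordered;
    reordered.reserve(records.size());

    for (std::size_t i = 0; i < indices.size(); ++i)
    {
        const int old_index = indices[i];
        if (remap[old_index] == unseen)
        {
            remap[old_index] = static_cast<int>(reordered.size());
            reordered.push_back(records[old_index]);
        }
        indices[i] = remap[old_index];
    }
    // Keep any records that were never referenced so nothing is lost.
    for (std::size_t i = 0; i < records.size(); ++i)
        if (remap[i] == unseen)
            reordered.push_back(records[i]);
    records.swap(reordered);
}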

Update:

As suggested, I tried simply coding up some of the proposed heuristics instead of just talking about them algorithmically. This gave me a rough baseline to start from and helped me understand the problem better, though unfortunately with poor results. I tagged this question C and C++ because I was more interested in algorithmic research suggestions, and possibly snippets in those languages, than in providing my own implementation, but here is mine in C++ just for comparison:

#define _SECURE_SCL 0
#include <iostream>
#include <vector>
#include <cmath>
#include <cassert>
#include <algorithm>
#include <deque>
#include <cstdlib> // malloc/free, abs

using namespace std;

static const float pi_f = 3.14159265358979323846f;
enum {poly_size = 3};

class Alloc
{
public:
    Alloc(): head(0)
    {
    }

    ~Alloc()
    {
        purge();
    }

    void purge()
    {
        while (head)
        {
            Pool* next = head->next;
            free(head);
            head = next;
        }
    }

    void* allocate(int size)
    {
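        // Round the request up to a 16-byte multiple so successive
        // allocations within a pool stay 16-byte aligned.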
        size = (size + 15) & (~0x0f);
        if (head && (head->used + size) < head->capacity)
        {
            void* mem = head->mem + head->used;
            head->used += size;
            return mem;
        }
        int new_pool_size = 4096;
        if (size > new_pool_size)
            new_pool_size = size;
        Pool* new_pool = static_cast<Pool*>(malloc(sizeof(Pool) + new_pool_size));
        new_pool->used = size;
        new_pool->capacity = new_pool_size;
        new_pool->next = head;
        head = new_pool;
        return new_pool->mem;
    }

private:
    struct Pool
    {
        Pool* next;
        int used;
        int capacity;
        char mem[1];
    };
    Pool* head;
};

struct Vertex
{
    Vertex(): x(0), y(0), z(0) {}
    float x, y, z;
};

struct VertexPolys
{
    VertexPolys(): num_polys(0), polys(0) {}
    int num_polys;
    int* polys;
};

struct Poly
{
    Poly()
    {
        vertices[0] = -1;
        vertices[1] = -1;
        vertices[2] = -1;
    }
    int vertices[poly_size];
};

struct IndexSet
{
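    // Collects indices in first-insertion order, skipping duplicates;
    // 'used' makes each insert O(1).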
    explicit IndexSet(int n): num(0), data(n), used(n, false) {}

    void insert(int index)
    {
        if (!used[index])
        {
            data[num++] = index;
            used[index] = true;
        }
    }

    int num;
    vector<int> data;
    vector<char> used;
};

struct Mesh
{
    void reorder_vertices(const vector<int>& new_order)
    {
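        // new_order maps new index -> old index; build the inverse mapping
        // (old -> new) to rewrite the polygon indices, then gather the
        // vertices into the new order.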
        assert(new_order.size() == vertices.size());
        vector<int> to_new_order(new_order.size());
        for (size_t i=0; i < new_order.size(); ++i)
            to_new_order[new_order[i]] = i;

        for (size_t i=0; i < polys.size(); ++i)
        {
            Poly& poly = polys[i];
            for (int j=0; j < poly_size; ++j)
                poly.vertices[j] = to_new_order[poly.vertices[j]];
        }
        vector<Vertex> reordered_vertices(vertices.size());
        for (size_t i=0; i < new_order.size(); ++i)
            reordered_vertices[i] = vertices[new_order[i]];
        vertices.swap(reordered_vertices);
    }

    vector<Vertex> vertices;
    vector<Poly> polys;
    vector<VertexPolys> vertex_polys;
};

static void create_sphere(Mesh* mesh, float radius, int rings, int sectors)
{
    const float ring_step = 1.0f / (float)(rings-1);
    const float side_step = 1.0f / (float)(sectors-1);
    const int total_verts = rings * sectors;

    // Create sphere vertices.
    vector<int> indices;
    indices.reserve(total_verts);
    for (int r=0; r < rings; ++r) 
    {
        for (int s=0; s < sectors; ++s) 
        {           
            indices.push_back(mesh->vertices.size());
            const float x = cos(2*pi_f * s * side_step) * sin(pi_f * r * ring_step);
            const float y = sin(-pi_f/2 + pi_f * r * ring_step);
            const float z = sin(2*pi_f * s * side_step) * sin(pi_f * r * ring_step);
            Vertex new_vertex;
            new_vertex.x = x * radius;
            new_vertex.y = y * radius;
            new_vertex.z = z * radius;
            mesh->vertices.push_back(new_vertex);
        }
    }

    // Create sphere triangles.
    for (int r=0; r < rings-1; ++r) 
    {
        for (int s=0; s < sectors-1; ++s) 
        {
            int npv[4] = 
            {
                r * sectors + s,
                r * sectors + (s+1),
                (r+1) * sectors + (s+1),
                (r+1) * sectors + s
            };
            for (int j = 0; j < 4; ++j)
                npv[j] = indices[npv[j] % total_verts];

            Poly new_poly1;
            new_poly1.vertices[0] = npv[0];
            new_poly1.vertices[1] = npv[1];
            new_poly1.vertices[2] = npv[2];
            mesh->polys.push_back(new_poly1);

            Poly new_poly2;
            new_poly2.vertices[0] = npv[2];
            new_poly2.vertices[1] = npv[3];
            new_poly2.vertices[2] = npv[0];
            mesh->polys.push_back(new_poly2);
        }
    }
}

static Mesh create_mesh(Alloc& alloc)
{
    Mesh mesh;
    create_sphere(&mesh, 10.0f, 100, 100);

    // Shuffle the vertex order to make it all random (this tends
    // to reflect a real-world model better which can get very arbitrary
    // in terms of its vertex/polygon creation order).
    vector<int> new_vertex_order(mesh.vertices.size());
    for (size_t j=0; j < mesh.vertices.size(); ++j)
        new_vertex_order[j] = static_cast<int>(j);
    random_shuffle(new_vertex_order.begin(), new_vertex_order.end());
    mesh.reorder_vertices(new_vertex_order);

    // Count the number of polygons connected to each vertex.
    mesh.vertex_polys.resize(mesh.vertices.size());
    for (size_t i=0; i < mesh.polys.size(); ++i)
    {
        const Poly& poly = mesh.polys[i];
        for (int j=0; j < poly_size; ++j)
        {
            const int vertex_index = poly.vertices[j];
            ++mesh.vertex_polys[vertex_index].num_polys;
        }
    }

    // Allocate space for the vertex polygons.
    for (size_t i=0; i < mesh.vertex_polys.size(); ++i)
    {
        VertexPolys& vp = mesh.vertex_polys[i];
        vp.polys = static_cast<int*>(alloc.allocate(vp.num_polys * sizeof(int)));
        vp.num_polys = 0;
    }

    // Form the polygons connected to each vertex.
    for (size_t i=0; i < mesh.polys.size(); ++i)
    {
        const Poly& poly = mesh.polys[i];
        for (int j=0; j < poly_size; ++j)
        {
            const int vertex_index = poly.vertices[j];
            VertexPolys& vp = mesh.vertex_polys[vertex_index];
            vp.polys[vp.num_polys++] = i;
        }
    }
    return mesh;
}

static void output_stats(const Mesh& mesh, const char* type)
{
    // Measure the proximity of each pair of vertices in each polygon.
    int num_optimal = 0;
    float prox_sum = 0.0f;
    for (size_t i=0; i < mesh.polys.size(); ++i)
    {
        const Poly& poly = mesh.polys[i];
        const int prox1 = abs(poly.vertices[1] - poly.vertices[0]);
        const int prox2 = abs(poly.vertices[2] - poly.vertices[1]);
        const int prox3 = abs(poly.vertices[0] - poly.vertices[2]);
        const float poly_prox = (prox1 + prox2 + prox3) / 3.0f;
        prox_sum += poly_prox;
        if (prox1 <= 6 && prox2 <= 6 && prox3 <= 6)
            ++num_optimal;
    }
    cout << "-------------------------------------------------" << endl;
    cout << type << endl;
    cout << "-------------------------------------------------" << endl;
    cout << "-- # Vertices: " << mesh.vertices.size() << endl;
    cout << "-- # Polygons: " << mesh.polys.size() << endl;
    cout << "-- # Optimal Polygons: " << num_optimal << endl;
    cout << "-- Average vertex proximity: " << (prox_sum / mesh.polys.size()) << endl;
}

static void basic_optimization(Mesh& mesh)
{
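    // Linear first-touch pass: record vertices in the order the polygon
    // list first references them.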
    IndexSet index_set(static_cast<int>(mesh.vertices.size()));
    for (size_t i=0; i < mesh.polys.size(); ++i)
    {
        const Poly& poly = mesh.polys[i];
        for (int j=0; j < poly_size; ++j)
        {
            const int vertex_index = poly.vertices[j];
            index_set.insert(vertex_index);
        }
    }
    mesh.reorder_vertices(index_set.data);
}

static void breadth_first_optimization(Mesh& mesh)
{
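    // Flood the polygon adjacency breadth-first from polygon 0, recording
    // vertices in the order they are first encountered.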
    IndexSet index_set(static_cast<int>(mesh.vertices.size()));
    assert(!mesh.polys.empty());

    deque<int> to_process;
    vector<char> processed(mesh.polys.size(), false);
    to_process.push_back(0);
    processed[0] = true;

    while (!to_process.empty())
    {
        const int poly_index = to_process.front();
        to_process.pop_front();

        const Poly& poly = mesh.polys[poly_index];
        for (int i=0; i < poly_size; ++i)
        {
            const int vertex_index = poly.vertices[i];
            index_set.insert(vertex_index);
            const VertexPolys& vp = mesh.vertex_polys[vertex_index];
            for (int j=0; j < vp.num_polys; ++j)
            {
                if (!processed[vp.polys[j]])
                {
                    to_process.push_back(vp.polys[j]);
                    processed[vp.polys[j]] = true;
                }
            }
        }
    }
    mesh.reorder_vertices(index_set.data);
}

int main()
{
    // Linear Heuristic
    {
        Alloc alloc;
        Mesh mesh = create_mesh(alloc);
        basic_optimization(mesh);
        output_stats(mesh, "Linear Heuristic");
    }

    // Breadth-First Heuristic
    {
        Alloc alloc;
        Mesh mesh = create_mesh(alloc);
        breadth_first_optimization(mesh);
        output_stats(mesh, "Breadth-First Graph Heuristic");
    }
}


Results:

-------------------------------------------------
Linear Heuristic
-------------------------------------------------
-- # Vertices: 10000
-- # Polygons: 19602
-- # Optimal Polygons: 198
-- Average vertex proximity: 67.0118
-------------------------------------------------
Breadth-First Graph Heuristic
-------------------------------------------------
-- # Vertices: 10000
-- # Polygons: 19602
-- # Optimal Polygons: 13
-- Average vertex proximity: 88.9976


The nature of the problem seems to boil down to mapping this highly-connected two-dimensional topology into a one-dimensional memory space with good locality. I think that's why these basic algorithms don't do very well: the connectivity and seamlessness of the mesh makes it hard to map efficiently with a heuristic that simply floods across the entire mesh.

It's similar to texture atlas mapping, where we try to take a 3D mesh, "unfold" it, and map it into flat 2D space. There we often need to introduce seams into the mesh so that we can break it into 2D islands of connectivity. Here we are similarly going from the 2D topological space of the mesh down to 1D memory space, breaking it into islands with good locality.

So I believe the optimal algorithm would split the mesh into segments this way and apply these kinds of orderings per segment. That way we get good 1D memory locality within each segment of the mesh, with the pathological cases becoming outliers at the boundaries of the segments, where vertices are shared with neighboring segments.

A breadth-first search variant could do this crudely if it only traversed up to a breadth of K from a given seed polygon, P, where K is some small number, before choosing a completely new P. That would carve out small connected 2D islands in the mesh that can each be mapped to 1D memory space with good locality within the island (and poor locality only at the island boundaries), as in the sketch below.
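A rough sketch of that idea (untested; the K parameter and function name are just illustrative, and it builds on the Mesh/IndexSet types from the listing above, plus #include <utility> for pair):

// Hypothetical sketch: breadth-first ordering limited to a breadth of K
// around each seed polygon. When a seed's island is exhausted, the next
// unprocessed polygon becomes a new seed.
static void segmented_bfs_optimization(Mesh& mesh, int k)
{
    IndexSet index_set(static_cast<int>(mesh.vertices.size()));
    vector<char> processed(mesh.polys.size(), false);
    deque<pair<int, int> > to_process; // (polygon index, depth from seed)

    for (size_t seed = 0; seed < mesh.polys.size(); ++seed)
    {
        if (processed[seed])
            continue;
        to_process.push_back(make_pair(static_cast<int>(seed), 0));
        processed[seed] = true;

        while (!to_process.empty())
        {
            const int poly_index = to_process.front().first;
            const int depth = to_process.front().second;
            to_process.pop_front();

            const Poly& poly = mesh.polys[poly_index];
            for (int i = 0; i < poly_size; ++i)
            {
                const int vertex_index = poly.vertices[i];
                index_set.insert(vertex_index);
                if (depth >= k)
                    continue; // island boundary: stop expanding here
                const VertexPolys& vp = mesh.vertex_polys[vertex_index];
                for (int j = 0; j < vp.num_polys; ++j)
                {
                    if (!processed[vp.polys[j]])
                    {
                        to_process.push_back(make_pair(vp.polys[j], depth + 1));
                        processed[vp.polys[j]] = true;
                    }
                }
            }
        }
    }
    mesh.reorder_vertices(index_set.data);
}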

Anyway, I was mainly looking for the formal algorithmic names for this kind of thing. I'm still not sure whether it would be called "sorting"; perhaps I was just searching for the wrong terms.





1 answer


I feel a little silly answering and accepting my own question, and I appreciate all the comments provided!

But I found a class of algorithms and papers addressing this very issue, starting with Tom Forsyth's Linear-Speed Vertex Cache Optimisation as a frequently-cited heuristic. Curiously, they often tackle the problem by actually simulating the hardware cache. Here is one such example: http://www-etud.iro.umontreal.ca/~blancher/projects/vertex_caching/vertex_caching_eb.pdf

The problem appears to be NP-complete, but there seem to be a number of practical heuristics.
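For the curious, the cache-mimicking idea boils down to something like this minimal sketch (my own illustration, not code from the paper): walk the triangle index stream through a simulated FIFO vertex cache and count misses. Heuristics in this family then reorder triangles to minimize that miss count, often reported as ACMR (average cache miss ratio, i.e. misses per triangle).

#include <vector>
#include <deque>
#include <algorithm>

// Simulates a small FIFO post-transform vertex cache over a flattened
// triangle index stream and returns misses per triangle. 'cache_size'
// models the hardware cache (e.g. 16-32 entries on older GPUs).
static float simulated_acmr(const std::vector<int>& indices, size_t cache_size)
{
    std::deque<int> cache;
    size_t misses = 0;
    for (size_t i = 0; i < indices.size(); ++i)
    {
        const int v = indices[i];
        if (std::find(cache.begin(), cache.end(), v) == cache.end())
        {
            ++misses;           // not in cache: fetch it...
            cache.push_back(v); // ...and evict the oldest entry if full
            if (cache.size() > cache_size)
                cache.pop_front();
        }
    }
    const size_t num_tris = indices.size() / 3;
    return num_tris ? static_cast<float>(misses) / num_tris : 0.0f;
}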



Techniques like this seem to have the potential to dramatically improve rendering speed, and they should be applicable to general mesh traversal algorithms (with or without a GPU involved, inside or outside a rendering context).

I ended up working around the problem rather than solving it, by segmenting meshes into smaller connected regions (improving locality of reference through small regions of densely-packed poly/vert/edge data, rather than improving the indexing order within one large region). But I would like to try these techniques in the near future, since I had to jump through a lot of hoops to make traversal across all these small segmented regions seamless for the user, and it would be much simpler to have one big mesh with a more cache-friendly memory layout when accessing the vertices of a polygon, or the edges of a vertex, and so on.

Since there were no other answers yet, I thought I'd offer my findings for the curious.



