25 commits
374db7b
Add Louvain clustering algorithm
Becheler Jan 15, 2026
6ef3267
adding louvain tests to jamfile
Becheler Feb 4, 2026
76efd88
add some comments
Becheler Feb 4, 2026
de9b6a8
Delete scratch/benchmark/run_benchmark.sh
Becheler Feb 4, 2026
385e8c8
Delete scratch/benchmark/bgl_louvain.cpp
Becheler Feb 4, 2026
78d9225
PR review: fixed copyright, local optimization visibility, assertions…
Becheler Feb 9, 2026
422d376
fix: URGB made generic
Becheler Feb 9, 2026
6b278b8
adding LouvainQualityFunctionConcept
Becheler Feb 10, 2026
28721e1
incremental versus non-incremental concepts
Becheler Feb 16, 2026
c5c9ac4
fix wrong namespace
Becheler Feb 16, 2026
24002db
fix unused variables in concepts
Becheler Feb 16, 2026
a02bc0f
incremental and non incremental metrics can lead to different optimiz…
Becheler Feb 16, 2026
0034d3f
Trigger CI
Becheler Feb 16, 2026
e8760cf
incremental and non incremental metrics can lead to different optimiz…
Becheler Feb 16, 2026
7180cd6
fix: no hierarchy_t, free unfold function
Becheler Feb 17, 2026
fb61051
docs
Becheler Feb 17, 2026
af23880
index-based interals and contguous outputs labels
Becheler Feb 23, 2026
967f47b
index-based interals and contguous outputs labels
Becheler Feb 23, 2026
bdedea7
fix dosctrsing
Becheler Feb 23, 2026
4b8a584
specializing std::hash forbidden here
Becheler Feb 23, 2026
f04a9cc
quality functions passed as objects with named methods
Becheler Feb 24, 2026
39c4863
default value for policy
Becheler Feb 24, 2026
a4e2ada
typo
Becheler Feb 24, 2026
ca528f5
updated documentation
Becheler Feb 26, 2026
85f0174
fix: assertion on edge added
Becheler Mar 6, 2026
256 changes: 256 additions & 0 deletions doc/louvain_clustering.html
@@ -0,0 +1,256 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<!--
Copyright (c) 2026 Arnaud Becheler

Distributed under the Boost Software License, Version 1.0.
(See accompanying file LICENSE_1_0.txt or copy at
http://www.boost.org/LICENSE_1_0.txt)
-->
<HTML>
<Head>
<Title>Boost Graph Library: Louvain Clustering</Title>
</Head>
<BODY BGCOLOR="#ffffff" LINK="#0000ee" TEXT="#000000" VLINK="#551a8b"
ALINK="#ff0000">
<IMG SRC="../../../boost.png"
ALT="C++ Boost" width="277" height="86">

<BR Clear>

<H1><A NAME="sec:louvain-clustering"></A>
<TT>louvain_clustering</TT>
</H1>

<PRE>
template &lt;typename QualityFunction = newman_and_girvan,
          typename Graph, typename ComponentMap,
          typename WeightMap, typename URBG&gt;
typename property_traits&lt;WeightMap&gt;::value_type
louvain_clustering(const Graph&amp; g,
                   ComponentMap components,
                   const WeightMap&amp; w,
                   URBG&amp;&amp; gen,
                   QualityFunction f = QualityFunction{},
                   typename property_traits&lt;WeightMap&gt;::value_type min_improvement_inner = 0,
                   typename property_traits&lt;WeightMap&gt;::value_type min_improvement_outer = 0);
</PRE>

<P>
This algorithm implements the Louvain method for community detection
[<a href="#references">1</a>]. It finds a partition of the vertices into communities
that approximately maximizes a quality function (by default,
<a href="louvain_quality_functions.html#newman_and_girvan">Newman&ndash;Girvan
modularity</a>).

<P>The algorithm alternates two phases:
<OL>
<LI><B>Local optimization.</B> Each vertex is moved to the neighboring
community that yields the largest improvement in the quality function.
Vertices are visited in random order and passes are repeated until a
full pass over all vertices improves the quality by less than
<TT>min_improvement_inner</TT>.

<LI><B>Aggregation.</B> The graph is contracted by collapsing each
community into a single super-vertex. Edge weights between
super-vertices are the sums of the original inter-community edge
weights and self-loops carry the total intra-community weight.
</OL>

<P> These two phases are applied repeatedly on the coarsened graph,
discovering communities of communities, until
the quality improvement between successive levels falls below
<TT>min_improvement_outer</TT>, or the graph can no longer be
coarsened.

<P> Once every level has converged, the algorithm unfolds the hierarchy:
starting from the coarsest aggregated graph, it maps each super-vertex
back through the successive levels to the original vertices and writes
the resulting community label of each vertex into <TT>components</TT>.

<P> The speed of the local optimization phase depends on the quality
function's interface. A quality function that only models
<a href="louvain_quality_functions.html#base_concept">
<TT>GraphPartitionQualityFunctionConcept</TT></a> requires a full
O(V+E) recomputation of the quality for every candidate vertex move.
A quality function that also models
<a href="louvain_quality_functions.html#incremental_concept">
<TT>GraphPartitionQualityFunctionIncrementalConcept</TT></a>
evaluates each candidate move in O(1) using incremental
bookkeeping, making the total cost per vertex O(degree).
The algorithm detects which interface is available at
compile time and selects the appropriate code path automatically.

<H3>Where Defined</H3>

<P>
<a href="../../../boost/graph/louvain_clustering.hpp"><TT>boost/graph/louvain_clustering.hpp</TT></a>

<H3>Parameters</H3>

IN: <tt>const Graph&amp; g</tt>
Member

The algorithm has the additional requirement that vertices are copyable, hashable etc., as they're internally stored in unordered_sets.

Contributor Author

You're right. I have been changing how vertices are handled because the previous approach was not friendly to some types of graphs. The interface now takes a VertexIndexMap, but I still have to commit those changes, sorry 😓
I will update the documentation accordingly once I have merged the new changes.

<blockquote>
An undirected graph. Must model
<a href="VertexListGraph.html">Vertex List Graph</a> and
<a href="IncidenceGraph.html">Incidence Graph</a>.
The graph is not modified by the algorithm.
Passing a directed graph produces a compile-time error.
</blockquote>

OUT: <tt>ComponentMap components</tt>
<blockquote>
Records the community each vertex belongs to. After the call,
<tt>get(components, v)</tt> returns a contiguous integer label
in the range [0,&nbsp;<i>k</i>) where <i>k</i> is the number of
communities found. Two vertices with the same label are in the
same community. This matches the convention used by
<a href="connected_components.html"><tt>connected_components</tt></a>.<br>
Must model
<a href="../../property_map/doc/ReadWritePropertyMap.html">Read/Write
Property Map</a> with the graph's vertex descriptor as key type
and an integer type (e.g.&nbsp;<tt>std::size_t</tt>) as value type.
</blockquote>

IN: <tt>const WeightMap&amp; w</tt>
<blockquote>
Edge weights. Must model
<a href="../../property_map/doc/ReadablePropertyMap.html">Readable
Property Map</a> with the graph's edge descriptor as key type.
Weights must be non-negative.
</blockquote>

IN: <tt>URBG&amp;&amp; gen</tt>
Member

Would it make sense to provide a default arg for this?

Contributor Author

I am not sure what it would be. It would also differ from what I've seen in the random.hpp utilities:
https://github.com/boostorg/graph/blob/3131c24630e42c79b43c1f32558041c219ab84b8/include/boost/graph/random.hpp

<blockquote>
A random number generator used to shuffle the vertex processing
order at each pass. Any type meeting the C++
<i>UniformRandomBitGenerator</i> requirements works
(e.g.&nbsp;<tt>std::mt19937</tt>).
</blockquote>

IN: <tt>QualityFunction f</tt>
<blockquote>
An instance of the quality function to use for evaluating and
incrementally updating partition quality.<br>
<b>Default:</b> <tt>QualityFunction{}</tt>
</blockquote>

IN: <tt>weight_type min_improvement_inner</tt>
Member

Is Louvain guaranteed to converge (i.e. to eventually stop)? If this is not clear, does it make sense to provide additional hard limits on the number of inner/outer iterations?

Contributor Author

If I understand correctly, it's guaranteed to terminate in theory:

  • the quality (modularity) is monotonically non-decreasing throughout the algorithm
  • because the algorithm is supposed to accept only node moves and community merges that strictly improve (or, in some variations, do not decrease) the quality of the partition
  • modularity is bounded (<= 1)
  • the number of partitions is finite

That being said, there is the case of large graphs and the problem of floating-point precision.

  • For very large graphs, maybe it would make sense to have some kind of async task: "please dear louvain, aggregate this for some time and when I'm done waiting give me the last aggregated graph you had". But that sounds like a very different interface?
  • The current API is still more flexible than igraph and genlouvain, which do not offer parametrization of the stopping condition (igraph has 0 for inner and outer thresholds and genlouvain has 10^-6, fixed)

Am I making sense?
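
To make the hard-limit question concrete, here is a minimal sketch of a threshold loop with an iteration cap. It is not code from this PR: `optimize_until_converged`, `loop_result`, `run_pass` and `max_passes` are hypothetical names, and stopping when the gain is `<= min_improvement` is an assumption about the comparison.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical result type: final quality and number of passes executed.
struct loop_result
{
    double quality;
    std::size_t passes;
};

// Hypothetical driver (not the PR's implementation). `run_pass` performs one
// optimization pass and returns the quality reached after it. The loop stops
// when the gain of a pass is <= min_improvement, or after max_passes passes,
// whichever comes first.
inline loop_result optimize_until_converged(
    const std::function<double()>& run_pass,
    double initial_quality,
    double min_improvement,
    std::size_t max_passes)
{
    double q_last = initial_quality;
    std::size_t passes = 0;
    while (passes < max_passes)
    {
        const double q_now = run_pass();
        ++passes;
        const bool converged = (q_now - q_last) <= min_improvement;
        q_last = q_now;
        if (converged)
            break;
    }
    return {q_last, passes};
}
```

With the cap in place, the loop ends after at most `max_passes` passes even if rounding noise keeps producing tiny positive gains.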

Collaborator

Incremental output

There is a general interface for keeping only the last value, but it's not a pattern I've actually ever seen in practice.

Instead of returning a single value at the end (either by return value or in/out parameter), emit values by output iterator as they are calculated. Now, normally values emitted by output iterator are all kept by pushing back on a vector for example. But with a "clobbering" output iterator that always writes to the same location, the user can choose to just keep the last value. Neat, huh?

However, that kind of updating output value is only necessary if we do actually allow the user to specify an early termination condition. (Which, by the way, is orthogonal to asynchrony. At least for now, until we start supporting C++20 and start returning futures from algorithms. If the user wants to run the algorithm in the background, they use std::async.) Usually that kind of user-specified condition should come in the form of a callback predicate, so the user can either use number of iterations or time elapsed and it is opaque to the algorithm. The algorithm would provide a default predicate that does whatever is the most reasonable thing to do, which might be applying no early termination condition at all.
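
A rough sketch of the "clobbering" output iterator idea, for illustration only. `clobbering_output_iterator` and `emit_levels` are hypothetical names, and the quality values are made-up placeholders, not results from this PR.

```cpp
#include <cstddef>
#include <iterator>

// Hypothetical helper: an output iterator that always writes to the same
// location, so only the most recently emitted value survives.
template <typename T>
class clobbering_output_iterator
{
public:
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = void;

    explicit clobbering_output_iterator(T& target) : target_(&target) {}

    // Assigning through the iterator overwrites the previous value.
    clobbering_output_iterator& operator=(const T& value)
    {
        *target_ = value;
        return *this;
    }

    // Dereference and increment are no-ops, as in std::back_insert_iterator.
    clobbering_output_iterator& operator*() { return *this; }
    clobbering_output_iterator& operator++() { return *this; }
    clobbering_output_iterator operator++(int) { return *this; }

private:
    T* target_;
};

// Hypothetical stand-in for an algorithm that emits one value per level.
// The numbers are placeholders chosen for the example.
template <typename OutputIt>
OutputIt emit_levels(OutputIt out)
{
    const double qualities[] = {0.21, 0.34, 0.36};
    for (double q : qualities)
        *out++ = q;
    return out;
}
```

Calling `emit_levels(clobbering_output_iterator<double>(last))` leaves only the final emitted value in `last`, while passing `std::back_inserter(some_vector)` to the same function keeps every value.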

Floating-point

This is a huge topic, but in short we don't want to let floating-point accuracy cause our algorithm to run indefinitely. The basics are a) using a robust summation algorithm such as Kahan's, which is in Boost.Accumulators, and ... I can't remember what else. :) Looking through the calculations to find subtractions that could be between two extremely close values. I might be able to provide more ideas later.
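
For reference, compensated (Kahan) summation can be sketched in a few lines of standard C++. This is a standalone illustration, not the Boost.Accumulators interface and not code from this PR; `kahan_sum` is a hypothetical name. It assumes the compiler is not allowed to reassociate floating-point operations (for example, no `-ffast-math`).

```cpp
#include <vector>

// Compensated (Kahan) summation: a running correction term recovers the
// low-order bits that are lost when a small value is added to a large sum.
inline double kahan_sum(const std::vector<double>& values)
{
    double sum = 0.0;
    double compensation = 0.0; // rounding error accumulated so far
    for (double x : values)
    {
        const double y = x - compensation; // apply the pending correction
        const double t = sum + y;          // low-order bits of y may be lost
        compensation = (t - sum) - y;      // what was actually lost, negated
        sum = t;
    }
    return sum;
}
```

Summing 1.0 followed by a million values of 1e-16 shows the difference: a plain left-to-right loop stays at exactly 1.0, whereas the compensated sum ends up close to 1.0000000001.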

Contributor Author

Thank you for your ideas! 😄
That sounds like material for a next iteration on the algorithm, right? Right now the termination criterion is fairly simple and avoids the issue of floating-point precision (Qnow - Qlast < threshold), so we should be OK.
The only thing it affects is that, on the same graph, brute-force quality computation and incremental computation lead to somewhat different values:
(image not shown)

It's still not clear how this difference in partition quality computation impacts the underlying partition optimization path, but I doubt the impact is large (although I should check this, which would require a metric for comparing partitions, such as Normalized Mutual Information or Variation of Information).
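
As a point of comparison for the brute-force path, here is a minimal sketch of a full-recomputation Newman-Girvan modularity over an edge list. It uses the standard definition Q = sum over communities c of [in_c / (2m) - (tot_c / (2m))^2] and is not the PR's implementation; `weighted_edge` and `modularity` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical edge record for an undirected weighted graph.
struct weighted_edge
{
    std::size_t u, v;
    double w;
};

// Brute-force modularity: one full scan over all edges.
// label[i] is the community of vertex i, in [0, num_communities).
inline double modularity(const std::vector<weighted_edge>& edges,
                         const std::vector<std::size_t>& label,
                         std::size_t num_communities)
{
    std::vector<double> internal(num_communities, 0.0); // 2 * intra-community weight
    std::vector<double> total(num_communities, 0.0);    // sum of weighted degrees
    double two_m = 0.0;                                 // 2 * total edge weight
    for (const auto& e : edges)
    {
        two_m += 2.0 * e.w;
        total[label[e.u]] += e.w;
        total[label[e.v]] += e.w;
        if (label[e.u] == label[e.v])
            internal[label[e.u]] += 2.0 * e.w;
    }
    if (two_m == 0.0)
        return 0.0; // no edges: define modularity as zero
    double q = 0.0;
    for (std::size_t c = 0; c < num_communities; ++c)
    {
        const double share = total[c] / two_m;
        q += internal[c] / two_m - share * share;
    }
    return q;
}
```

For two unit-weight triangles joined by a bridge of weight 0.1, with each triangle as its own community, this evaluates to 6/6.1 - 0.5, roughly 0.4836. An incremental implementation accumulates the same sums in a different order, which is one plausible source of small floating-point differences.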

<blockquote>
The inner loop (local optimization) stops when a full pass over
all vertices improves quality by less than this value.<br>
<b>Default:</b> <tt>0</tt>
</blockquote>

IN: <tt>weight_type min_improvement_outer</tt>
<blockquote>
The outer loop (aggregation) stops when quality improves by less
than this value between successive levels.<br>
<b>Default:</b> <tt>0</tt>
</blockquote>

<H3>Template Parameters</H3>

<tt>QualityFunction</tt>
<blockquote>
The partition quality metric to maximize. Must model
<a href="louvain_quality_functions.html#base_concept">
<tt>GraphPartitionQualityFunctionConcept</tt></a>. If it also models
<a href="louvain_quality_functions.html#incremental_concept">
<tt>GraphPartitionQualityFunctionIncrementalConcept</tt></a>, the
faster incremental code path is selected automatically.<br>
<b>Default:</b>
<tt><a href="louvain_quality_functions.html#newman_and_girvan">newman_and_girvan</a></tt>
</blockquote>

<H3>Return Value</H3>
<P>The quality (e.g.&nbsp;modularity) of the best partition found.
For Newman&ndash;Girvan modularity this is a value in
[&minus;0.5,&nbsp;1).

<H3>Complexity</H3>
<P>With the incremental quality function (the default), each local
optimization pass costs O(E) since every vertex is visited once and
each visit scans its neighbors. With a non-incremental quality function,
each candidate move requires a full O(V+E) traversal, making each pass
O(E&nbsp;&middot;&nbsp;(V+E)). The number of passes per level and the
number of aggregation levels are both small in practice, so the
incremental path typically runs in O(E&nbsp;log&nbsp;V) overall on
sparse graphs.

<H3>Preconditions</H3>
<UL>
<LI>The graph must be undirected (enforced at compile time).
<LI>Edge weights must be non-negative.
<LI>The graph must have a <TT>vertex_index</TT> property mapping
vertices to contiguous integers in
[0,&nbsp;<TT>num_vertices(g)</TT>).
</UL>

<H3>Example</H3>
<PRE>
#include &lt;boost/graph/adjacency_list.hpp&gt;
#include &lt;boost/graph/louvain_clustering.hpp&gt;
#include &lt;boost/range/iterator_range.hpp&gt;
#include &lt;cstddef&gt;
#include &lt;iostream&gt;
#include &lt;random&gt;
#include &lt;vector&gt;

int main()
{
    using Graph = boost::adjacency_list&lt;
        boost::vecS, boost::vecS, boost::undirectedS,
        boost::no_property,
        boost::property&lt;boost::edge_weight_t, double&gt;&gt;;

    // Two triangles connected by a weak bridge
    Graph g(6);
    boost::add_edge(0, 1, 1.0, g);
    boost::add_edge(1, 2, 1.0, g);
    boost::add_edge(0, 2, 1.0, g);
    boost::add_edge(3, 4, 1.0, g);
    boost::add_edge(4, 5, 1.0, g);
    boost::add_edge(3, 5, 1.0, g);
    boost::add_edge(2, 3, 0.1, g);

    // One community label per vertex, addressed through the vertex index
    std::vector&lt;std::size_t&gt; communities(boost::num_vertices(g));
    auto cmap = boost::make_iterator_property_map(
        communities.begin(), boost::get(boost::vertex_index, g));

    // Fixed seed so the vertex visiting order is reproducible
    std::mt19937 rng(42);
    double Q = boost::louvain_clustering(
        g, cmap, boost::get(boost::edge_weight, g), rng);

    std::cout &lt;&lt; "Modularity: " &lt;&lt; Q &lt;&lt; "\n";
    for (auto v : boost::make_iterator_range(boost::vertices(g)))
        std::cout &lt;&lt; " vertex " &lt;&lt; v
                  &lt;&lt; " -&gt; community " &lt;&lt; boost::get(cmap, v) &lt;&lt; "\n";
}
</PRE>

<H3>See Also</H3>
<P>
<a href="louvain_quality_functions.html">Louvain Quality Function Concepts</a>,
<a href="bc_clustering.html"><TT>betweenness_centrality_clustering</TT></a>

<H3>References</H3>
<a name="references"></a>
<P>[1] V.&nbsp;D.&nbsp;Blondel, J.&#8209;L.&nbsp;Guillaume,
R.&nbsp;Lambiotte, and E.&nbsp;Lefebvre,
&ldquo;Fast unfolding of communities in large networks,&rdquo;
<i>Journal of Statistical Mechanics: Theory and Experiment</i>,
vol.&nbsp;2008, no.&nbsp;10, P10008, 2008.
<a href="https://doi.org/10.1088/1742-5468/2008/10/P10008">doi:10.1088/1742-5468/2008/10/P10008</a>

<P>[2] V.&nbsp;A.&nbsp;Traag, L.&nbsp;Waltman, and
N.&nbsp;J.&nbsp;van&nbsp;Eck,
&ldquo;From Louvain to Leiden: guaranteeing well-connected communities,&rdquo;
<i>Scientific Reports</i>, vol.&nbsp;9, 5233, 2019.
<a href="https://doi.org/10.1038/s41598-019-41695-z">doi:10.1038/s41598-019-41695-z</a>

<BR>
<HR>
<TABLE>
<TR valign=top>
<TD nowrap>Copyright &copy; 2026</TD><TD>
Arnaud Becheler
</TD></TR></TABLE>

</BODY>
</HTML>