[PW_SID:1060320] [v4] Bluetooth: Increase LE connection timeout for industrial sensors #3339
BluezTestBot wants to merge 2 commits into workflow
This patch adds workflow files for ci:

[sync.yml]
 - The workflow file for scheduled work
 - Sync the repo with upstream repo and rebase the workflow branch
 - Review the patches in the patchwork and creates the PR if needed

[ci.yml]
 - The workflow file for CI tasks
 - Run CI tests when PR is created

Signed-off-by: Tedd Ho-Jeong An <tedd.an@intel.com>
In an industrial IoT context at Volvo Group, we use TE Connectivity
BLE pressure sensors. These sensors exhibit high latency during
the initial LE connection handshake in noisy RF environments. The
connection systematically fails on Ubuntu Core 22 (BlueZ) because the
connection attempt is aborted too early.
In the v2 thread, it was suggested that userspace dictates the
connection timeout via setsockopt(SO_SNDTIMEO) (defaulting to 40s),
and that userspace, not the kernel, was therefore the one cutting
the connection at 2 seconds.
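For reference, the userspace mechanism mentioned in the v2 thread can be sketched as follows. This is a minimal illustration, not code from BlueZ or Bleak: the helper names `timeval`, `set_send_timeout` and `get_send_timeout` are hypothetical, and the `"ll"` layout assumes a Linux `struct timeval` made of two native longs.

```python
import socket
import struct


def timeval(seconds):
    """Pack a duration as a C struct timeval (assumed: two native longs)."""
    total_us = round(seconds * 1_000_000)
    whole, micros = divmod(total_us, 1_000_000)
    return struct.pack("ll", whole, micros)


def set_send_timeout(sock, seconds):
    """Set SO_SNDTIMEO on an already created socket."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDTIMEO, timeval(seconds))


def get_send_timeout(sock):
    """Read SO_SNDTIMEO back as a float number of seconds."""
    raw = sock.getsockopt(
        socket.SOL_SOCKET, socket.SO_SNDTIMEO, struct.calcsize("ll")
    )
    sec, usec = struct.unpack("ll", raw)
    return sec + usec / 1_000_000
```

The helpers take any socket object, so on a system where Python was built with Bluetooth support the same call could be applied to an L2CAP socket before connecting, for example `set_send_timeout(sock, 40.0)` for the 40s value quoted above.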
To verify this, an empirical test was conducted using the following
Python/Bleak script to force the application timeout to 45.0 seconds:
import asyncio
from bleak import BleakClient, BleakScanner
import time

ADDRESS = "E8:C0:B1:D4:A3:3C"

async def test_connection():
    device = await BleakScanner.find_device_by_address(ADDRESS, timeout=15.0)
    start_time = time.time()
    try:
        # Forcing 45s timeout in userspace
        async with BleakClient(device, timeout=45.0) as client:
            print(f"Connected in {time.time() - start_time:.2f}s")
    except Exception as e:
        print(f"Failed after {time.time() - start_time:.2f}s: {e}")

asyncio.run(test_connection())
1. Result on UNMODIFIED Kernel: The userspace script waited for the
full 45 seconds before raising a TimeoutError. If the kernel had
kept the radio connection attempt alive for those 45 seconds, the
connection would have succeeded around the 12.5-second mark (as
shown by the patched kernel test below). That it did not shows that
the underlying HCI connection attempt was aborted early by the
kernel. Userspace was not notified of this abort and kept waiting
until its own timeout expired.
2. Result on MODIFIED Kernel (with this patch): Using the exact same
userspace script (45.0s timeout), the connection was successfully
established at the 12.51-second mark.
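The reasoning behind the two results can be illustrated with a toy asyncio model. This is only a sketch under stated assumptions: `simulated_connect` and its parameters are hypothetical, all times are scaled down by a factor of 100, and the kernel-side abort is modelled as silent, as described above. It does not exercise BlueZ, Bleak or any radio.

```python
import asyncio


async def simulated_connect(handshake_s, kernel_timeout_s, user_timeout_s):
    """Toy model: return what userspace observes, 'connected' or 'timeout'."""
    connected = asyncio.Event()

    async def link_layer():
        # Assumption: the link only comes up if the handshake finishes
        # before the kernel-side timeout; an abort is never reported upward.
        if handshake_s <= kernel_timeout_s:
            await asyncio.sleep(handshake_s)
            connected.set()

    task = asyncio.create_task(link_layer())
    try:
        # Userspace only waits; it cannot extend the kernel-side attempt.
        await asyncio.wait_for(connected.wait(), timeout=user_timeout_s)
        return "connected"
    except asyncio.TimeoutError:
        return "timeout"
    finally:
        task.cancel()


if __name__ == "__main__":
    # 12.5s handshake and 45s userspace timeout, scaled down by 100.
    for kernel_timeout in (0.02, 0.20):  # stands in for 2s and 20s
        outcome = asyncio.run(simulated_connect(0.125, kernel_timeout, 0.45))
        print(f"kernel timeout {kernel_timeout * 100:.0f}s -> {outcome}")
```

In this model a longer userspace timeout changes nothing in the first case: the wait simply runs to its own limit, which matches the 45-second TimeoutError seen on the unmodified kernel.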
Conclusion:
This proves that the underlying HCI LE Connection creation is bound by
a strict 2-second timeout derived from `conn_timeout` in `hci_conn.c`,
and that userspace socket options do not override this hardcoded HCI
abort in our stack. The sensor physically takes 12.5 seconds to
handshake, making the 2-second kernel limit a hard blocker.
This patch increases the hardcoded LE connection timeout to 20 seconds
to provide a comfortable margin for handshake retries.
Note: If the upstream preference is to not hardcode 20 seconds globally,
I would be happy to submit a v5 that exposes this as a configurable
module parameter (e.g., `le_conn_timeout`).
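If such a parameter were added, userspace could check its value through sysfs, where readable module parameters appear under /sys/module/<module>/parameters/. The sketch below is hypothetical throughout: `le_conn_timeout` does not exist today, `read_module_param` is an illustrative helper, and it assumes the parameter would live in the `bluetooth` module and be exported with read permission.

```python
from pathlib import Path


def read_module_param(module, name, sysfs_root="/sys/module"):
    """Return a module parameter's value as text, or None if unavailable."""
    path = Path(sysfs_root) / module / "parameters" / name
    try:
        return path.read_text().strip()
    except (FileNotFoundError, NotADirectoryError, PermissionError):
        # Parameter absent (e.g. unpatched kernel) or not readable.
        return None


if __name__ == "__main__":
    # Hypothetical parameter name taken from the note above.
    value = read_module_param("bluetooth", "le_conn_timeout")
    print("le_conn_timeout:", value if value is not None else "not available")
```

Returning None rather than raising lets a deployment script fall back gracefully on kernels that do not carry the parameter.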
CI checks:
CheckPatch
GitLint
SubjectPrefix
BuildKernel
CheckAllWarning
CheckSparse
BuildKernel32
TestRunnerSetup
TestRunner_l2cap-tester
TestRunner_iso-tester
TestRunner_bnep-tester
TestRunner_mgmt-tester
TestRunner_rfcomm-tester
TestRunner_sco-tester
TestRunner_ioctl-tester
TestRunner_mesh-tester
TestRunner_smp-tester
TestRunner_userchan-tester
IncrementalBuild
ab1b299 to 0bcc21a
net/bluetooth/hci_conn.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)