Closed
17 commits
332b9f9
Revert "NVIDIA: VR: SAUCE: perf vendor events arm64: Add Tegra410 Oly…
nvmochs Apr 17, 2026
2dcaf0b
Revert "NVIDIA: VR: SAUCE: perf: add NVIDIA Tegra410 C2C PMU"
nvmochs Apr 17, 2026
54c6101
Revert "NVIDIA: VR: SAUCE: perf: add NVIDIA Tegra410 CPU Memory Laten…
nvmochs Apr 17, 2026
d007bc5
Revert "NVIDIA: VR: SAUCE: perf/arm_cspmu: nvidia: Add Tegra410 PCIE-…
nvmochs Apr 17, 2026
07a7992
Revert "NVIDIA: VR: SAUCE: perf/arm_cspmu: nvidia: Add Tegra410 PCIE …
nvmochs Apr 17, 2026
08c6144
Revert "NVIDIA: VR: SAUCE: perf/arm_cspmu: Add arm_cspmu_acpi_dev_get"
nvmochs Apr 17, 2026
02a9cdc
Revert "NVIDIA: VR: SAUCE: perf/arm_cspmu: nvidia: Add Tegra410 UCF PMU"
nvmochs Apr 17, 2026
5178a81
Revert "NVIDIA: VR: SAUCE: perf/arm_cspmu: nvidia: Rename doc to Tegr…
nvmochs Apr 17, 2026
4729da4
perf/arm_cspmu: nvidia: Rename doc to Tegra241
bwicaksononv Mar 24, 2026
2eadbf6
perf/arm_cspmu: nvidia: Add Tegra410 UCF PMU
bwicaksononv Mar 24, 2026
1f7f669
perf/arm_cspmu: Add arm_cspmu_acpi_dev_get
bwicaksononv Mar 24, 2026
f0aab13
perf/arm_cspmu: nvidia: Add Tegra410 PCIE PMU
bwicaksononv Mar 24, 2026
1ba0cbb
perf/arm_cspmu: nvidia: Add Tegra410 PCIE-TGT PMU
bwicaksononv Mar 24, 2026
ae82af0
perf: add NVIDIA Tegra410 CPU Memory Latency PMU
bwicaksononv Mar 24, 2026
526e580
perf: add NVIDIA Tegra410 C2C PMU
bwicaksononv Mar 24, 2026
5e154c9
perf vendor events arm64: Add Tegra410 Olympus PMU events
bwicaksononv Feb 12, 2026
cd5faf6
NVIDIA: VR: SAUCE: perf/arm_pmu: Skip PMCCNTR_EL0 on NVIDIA Olympus
bwicaksononv May 4, 2026
30 changes: 16 additions & 14 deletions Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
@@ -114,8 +114,9 @@ Example usage:
PCIE PMU
--------

-This PMU monitors all read/write traffic from the root port(s) or a particular
-BDF in a PCIE root complex (RC) to local or remote memory. There is one PMU per
+This PMU is located in the SOC fabric connecting the PCIE root complex (RC) and
+the memory subsystem. It monitors all read/write traffic from the root port(s)
+or a particular BDF in a PCIE RC to local or remote memory. There is one PMU per
PCIE RC in the SoC. Each RC can have up to 16 lanes that can be bifurcated into
up to 8 root ports. The traffic from each root port can be filtered using RP or
BDF filter. For example, specifying "src_rp_mask=0xFF" means the PMU counter will
@@ -132,7 +133,7 @@ latency:
* rd_bytes: count the number of bytes transferred by rd_req.
* wr_bytes: count the number of bytes transferred by wr_req.
* rd_cum_outs: count outstanding rd_req each cycle.
-* cycles: counts the PCIE cycles.
+* cycles: count the clock cycles of SOC fabric connected to the PCIE interface.

The average bandwidth is calculated as::

@@ -280,15 +281,16 @@ Example output::
PCIE-TGT PMU
------------

-The PCIE-TGT PMU monitors traffic targeting PCIE BAR and CXL HDM ranges.
-There is one PCIE-TGT PMU per PCIE root complex (RC) in the SoC. Each RC in
-Tegra410 SoC can have up to 16 lanes that can be bifurcated into up to 8 root
-ports (RP). The PMU provides RP filter to count PCIE BAR traffic to each RP and
-address filter to count access to PCIE BAR or CXL HDM ranges. The details
-of the filters are described in the following sections.
+This PMU is located in the SOC fabric connecting the PCIE root complex (RC) and
+the memory subsystem. It monitors traffic targeting PCIE BAR and CXL HDM ranges.
+There is one PCIE-TGT PMU per PCIE RC in the SoC. Each RC in Tegra410 SoC can
+have up to 16 lanes that can be bifurcated into up to 8 root ports (RP). The PMU
+provides RP filter to count PCIE BAR traffic to each RP and address filter to
+count access to PCIE BAR or CXL HDM ranges. The details of the filters are
+described in the following sections.

-Mapping the RC# to lspci segment number is similar to the PCIE PMU.
-Please see :ref:`NVIDIA_T410_PCIE_PMU_RC_Mapping_Section` for more info.
+Mapping the RC# to lspci segment number is similar to the PCIE PMU. Please see
+:ref:`NVIDIA_T410_PCIE_PMU_RC_Mapping_Section` for more info.

The events and configuration options of this PMU device are available in sysfs,
see /sys/bus/event_source/devices/nvidia_pcie_tgt_pmu_<socket-id>_rc_<pcie-rc-id>.
@@ -299,7 +301,7 @@ The events in this PMU can be used to measure bandwidth and utilization:
* wr_req: count the number of write requests to PCIE.
* rd_bytes: count the number of bytes transferred by rd_req.
* wr_bytes: count the number of bytes transferred by wr_req.
-* cycles: counts the PCIE cycles.
+* cycles: count the clock cycles of SOC fabric connected to the PCIE interface.

The average bandwidth is calculated as::

@@ -350,8 +352,8 @@ Example usage:
CPU Memory (CMEM) Latency PMU
-----------------------------

-This PMU monitors latency events of memory read requests to local
-CPU DRAM:
+This PMU monitors latency events of memory read requests from the edge of the
+Unified Coherence Fabric (UCF) to local CPU DRAM:

* RD_REQ counters: count read requests (32B per request).
* RD_CUM_OUTS counters: accumulated outstanding request counter, which track
2 changes: 1 addition & 1 deletion drivers/perf/Kconfig
@@ -313,7 +313,7 @@ config MARVELL_PEM_PMU

config NVIDIA_TEGRA410_CMEM_LATENCY_PMU
tristate "NVIDIA Tegra410 CPU Memory Latency PMU"
-depends on ARM64
+depends on ARM64 && ACPI
help
Enable perf support for CPU memory latency counters monitoring on
NVIDIA Tegra410 SoC.
11 changes: 3 additions & 8 deletions drivers/perf/arm_cspmu/arm_cspmu.c
@@ -1135,23 +1135,18 @@ static int arm_cspmu_acpi_get_cpus(struct arm_cspmu *cspmu)

struct acpi_device *arm_cspmu_acpi_dev_get(const struct arm_cspmu *cspmu)
{
-char hid[16];
-char uid[16];
-struct acpi_device *adev;
+char hid[16] = {};
+char uid[16] = {};
const struct acpi_apmt_node *apmt_node;

apmt_node = arm_cspmu_apmt_node(cspmu->dev);
if (!apmt_node || apmt_node->type != ACPI_APMT_NODE_TYPE_ACPI)
return NULL;

-memset(hid, 0, sizeof(hid));
-memset(uid, 0, sizeof(uid));
-
memcpy(hid, &apmt_node->inst_primary, sizeof(apmt_node->inst_primary));
snprintf(uid, sizeof(uid), "%u", apmt_node->inst_secondary);

-adev = acpi_dev_get_first_match_dev(hid, uid, -1);
-return adev;
+return acpi_dev_get_first_match_dev(hid, uid, -1);
}
EXPORT_SYMBOL_GPL(arm_cspmu_acpi_dev_get);
#else
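Note on the arm_cspmu_acpi_dev_get() hunk: the hid/uid handling can be modelled outside the kernel. The sketch below is a userspace approximation under stated assumptions, not the driver code. struct apmt_node_model, struct acpi_ids and build_acpi_ids() are hypothetical names introduced for illustration, and the portable `= {0}` initializer stands in for the `= {}` form used in the patch (a GNU extension before C23). It illustrates why the buffers have to start zeroed: memcpy() copies the 8 raw bytes of inst_primary and no terminator, so the trailing NUL must already be in the buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define ID_BUF_LEN 16

/* Hypothetical stand-in for the two APMT node fields the driver reads. */
struct apmt_node_model {
	uint64_t inst_primary;   /* HID packed as 8 raw ASCII bytes */
	uint32_t inst_secondary; /* UID as an integer */
};

/* Hypothetical container so both strings can be returned by value. */
struct acpi_ids {
	char hid[ID_BUF_LEN];
	char uid[ID_BUF_LEN];
};

/* The raw HID must leave room for at least one trailing NUL byte. */
static_assert(sizeof(uint64_t) < ID_BUF_LEN, "HID buffer too small");

static struct acpi_ids build_acpi_ids(const struct apmt_node_model *node)
{
	/* Zero-fill both buffers up front, as the initializers in the patch do. */
	struct acpi_ids ids = {0};

	/* Copies exactly 8 bytes; the NUL comes from the zero-fill above. */
	memcpy(ids.hid, &node->inst_primary, sizeof(node->inst_primary));

	/* snprintf() always terminates within the given size. */
	snprintf(ids.uid, sizeof(ids.uid), "%u",
		 (unsigned int)node->inst_secondary);

	return ids;
}
```

With a node whose inst_primary holds the eight bytes of a made-up HID such as "ABCD0001" and whose inst_secondary is 3, build_acpi_ids() yields hid "ABCD0001" and uid "3". Dropping the zero-fill would leave hid[8] indeterminate, which is the case both the old memset() calls and the new initializers guard against.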
2 changes: 1 addition & 1 deletion drivers/perf/arm_cspmu/arm_cspmu.h
@@ -256,7 +256,7 @@ int arm_cspmu_impl_register(const struct arm_cspmu_impl_match *impl_match);
/* Unregister vendor backend. */
void arm_cspmu_impl_unregister(const struct arm_cspmu_impl_match *impl_match);

-#if defined(CONFIG_ACPI)
+#if defined(CONFIG_ACPI) && defined(CONFIG_ARM64)
/**
* Get ACPI device associated with the PMU.
* The caller is responsible for calling acpi_dev_put() on the returned device.
34 changes: 12 additions & 22 deletions drivers/perf/arm_cspmu/nvidia_cspmu.c
@@ -175,7 +175,6 @@ static struct attribute *ucf_pmu_event_attrs[] = {
ARM_CSPMU_EVENT_ATTR(slc_access_dataless, 0x183),
ARM_CSPMU_EVENT_ATTR(slc_access_atomic, 0x184),

-ARM_CSPMU_EVENT_ATTR(slc_access, 0xF2),
ARM_CSPMU_EVENT_ATTR(slc_access_rd, 0x111),
ARM_CSPMU_EVENT_ATTR(slc_access_wr, 0x112),
ARM_CSPMU_EVENT_ATTR(slc_bytes_rd, 0x113),
@@ -191,7 +190,7 @@ static struct attribute *ucf_pmu_event_attrs[] = {
ARM_CSPMU_EVENT_ATTR(ext_snp_evict, 0x182),

ARM_CSPMU_EVENT_ATTR(cycles, ARM_CSPMU_EVT_CYCLES_DEFAULT),
-NULL,
+NULL
};

static struct attribute *pcie_v2_pmu_event_attrs[] = {
@@ -201,7 +200,7 @@ static struct attribute *pcie_v2_pmu_event_attrs[] = {
ARM_CSPMU_EVENT_ATTR(wr_req, 0x3),
ARM_CSPMU_EVENT_ATTR(rd_cum_outs, 0x4),
ARM_CSPMU_EVENT_ATTR(cycles, ARM_CSPMU_EVT_CYCLES_DEFAULT),
-NULL,
+NULL
};

static struct attribute *pcie_tgt_pmu_event_attrs[] = {
@@ -210,7 +209,7 @@ static struct attribute *pcie_tgt_pmu_event_attrs[] = {
ARM_CSPMU_EVENT_ATTR(rd_req, 0x2),
ARM_CSPMU_EVENT_ATTR(wr_req, 0x3),
ARM_CSPMU_EVENT_ATTR(cycles, NV_PCIE_TGT_EV_TYPE_CC),
-NULL,
+NULL
};

static struct attribute *generic_pmu_event_attrs[] = {
@@ -250,7 +249,7 @@ static struct attribute *ucf_pmu_format_attrs[] = {
ARM_CSPMU_FORMAT_ATTR(dst_loc_gmem, "config1:9"),
ARM_CSPMU_FORMAT_ATTR(dst_loc_other, "config1:10"),
ARM_CSPMU_FORMAT_ATTR(dst_rem, "config1:11"),
-NULL,
+NULL
};

static struct attribute *pcie_v2_pmu_format_attrs[] = {
@@ -263,7 +262,7 @@ static struct attribute *pcie_v2_pmu_format_attrs[] = {
ARM_CSPMU_FORMAT_ATTR(dst_loc_pcie_p2p, "config2:2"),
ARM_CSPMU_FORMAT_ATTR(dst_loc_pcie_cxl, "config2:3"),
ARM_CSPMU_FORMAT_ATTR(dst_rem, "config2:4"),
-NULL,
+NULL
};

static struct attribute *pcie_tgt_pmu_format_attrs[] = {
@@ -272,7 +271,7 @@ static struct attribute *pcie_tgt_pmu_format_attrs[] = {
ARM_CSPMU_FORMAT_ATTR(dst_addr_en, "config:11"),
ARM_CSPMU_FORMAT_ATTR(dst_addr_base, "config1:0-63"),
ARM_CSPMU_FORMAT_ATTR(dst_addr_mask, "config2:0-63"),
-NULL,
+NULL
};

static struct attribute *generic_pmu_format_attrs[] = {
@@ -306,7 +305,7 @@ nv_cspmu_get_name(const struct arm_cspmu *cspmu)
return ctx->name;
}

-#if defined(CONFIG_ACPI)
+#if defined(CONFIG_ACPI) && defined(CONFIG_ARM64)
static int nv_cspmu_get_inst_id(const struct arm_cspmu *cspmu, u32 *id)
{
struct fwnode_handle *fwnode;
@@ -468,7 +467,7 @@ static int pcie_v2_pmu_validate_event(struct arm_cspmu *cspmu,

int idx;
u32 new_filter, new_rp, new_bdf, new_lead_filter, new_lead_bdf;
-struct perf_event *leader, *new_leader;
+struct perf_event *new_leader;

if (cspmu->impl.ops.is_cycle_counter_event(new_ev))
return 0;
@@ -500,7 +499,7 @@ static int pcie_v2_pmu_validate_event(struct arm_cspmu *cspmu,
cspmu->cycle_counter_logical_idx);

if (idx != cspmu->cycle_counter_logical_idx) {
-leader = cspmu->hw_events.events[idx]->group_leader;
+struct perf_event *leader = cspmu->hw_events.events[idx]->group_leader;

const u32 lead_filter = pcie_v2_pmu_event_filter(leader);
const u32 lead_bdf = pcie_v2_pmu_bdf_val_en(lead_filter);
@@ -525,7 +524,7 @@ struct pcie_tgt_data {
void __iomem *addr_filter_reg;
};

-#if defined(CONFIG_ACPI)
+#if defined(CONFIG_ACPI) && defined(CONFIG_ARM64)
static int pcie_tgt_init_data(struct arm_cspmu *cspmu)
{
int ret;
@@ -760,8 +759,7 @@ static void pcie_tgt_pmu_reset_ev_filter(struct arm_cspmu *cspmu,
return;
}

-pcie_tgt_pmu_config_addr_filter(
-cspmu, false, base, mask, idx);
+pcie_tgt_pmu_config_addr_filter(cspmu, false, base, mask, idx);
}

static u32 pcie_tgt_pmu_event_type(const struct perf_event *event)
@@ -779,7 +777,7 @@ static bool pcie_tgt_pmu_is_cycle_counter_event(const struct perf_event *event)
enum nv_cspmu_name_fmt {
NAME_FMT_GENERIC,
NAME_FMT_SOCKET,
-NAME_FMT_SOCKET_INST
+NAME_FMT_SOCKET_INST,
};

struct nv_cspmu_match {
@@ -895,8 +893,6 @@ static const struct nv_cspmu_match nv_cspmu_match[] = {
.filter2_mask = 0x0,
.filter2_default_val = 0x0,
.get_filter = ucf_pmu_event_filter,
-.get_filter2 = NULL,
-.init_data = NULL
},
},
{
@@ -913,7 +909,6 @@ static const struct nv_cspmu_match nv_cspmu_match[] = {
.filter2_default_val = NV_PCIE_V2_FILTER2_DEFAULT,
.get_filter = pcie_v2_pmu_event_filter,
.get_filter2 = nv_cspmu_event_filter2,
-.init_data = NULL
},
.ops = {
.validate_event = pcie_v2_pmu_validate_event,
@@ -932,8 +927,6 @@ static const struct nv_cspmu_match nv_cspmu_match[] = {
.filter_default_val = 0x0,
.filter2_mask = NV_PCIE_TGT_FILTER2_MASK,
.filter2_default_val = NV_PCIE_TGT_FILTER2_DEFAULT,
-.get_filter = NULL,
-.get_filter2 = NULL,
.init_data = pcie_tgt_init_data
},
.ops = {
@@ -995,9 +988,6 @@ static char *nv_cspmu_format_name(const struct arm_cspmu *cspmu,
name = devm_kasprintf(dev, GFP_KERNEL, match->name_pattern,
atomic_fetch_inc(&pmu_generic_idx));
break;
-default:
-name = NULL;
-break;
}

return name;
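Note on the nv_cspmu_match[] hunks in this file: they drop explicit `= NULL` members from designated initializers. In C, members omitted from an initializer list are zero-initialized, so for pointer members the two spellings give the same object. The sketch below is a minimal standalone illustration of that language rule only. struct pmu_template_model, demo_filter(), with_explicit_nulls, with_omitted_members and check_templates_match() are hypothetical names, not the driver's types.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int (*filter_fn)(unsigned int config);

/* Hypothetical, trimmed-down analogue of a per-PMU template. */
struct pmu_template_model {
	unsigned int filter_mask;
	filter_fn get_filter;
	filter_fn get_filter2;
	int (*init_data)(void);
};

static unsigned int demo_filter(unsigned int config)
{
	return config & 0xffu;
}

/* Old spelling: unused callbacks are written out as NULL. */
static const struct pmu_template_model with_explicit_nulls = {
	.filter_mask = 0xff,
	.get_filter = demo_filter,
	.get_filter2 = NULL,
	.init_data = NULL,
};

/* New spelling: unused callbacks are omitted and default to NULL. */
static const struct pmu_template_model with_omitted_members = {
	.filter_mask = 0xff,
	.get_filter = demo_filter,
};

/* Aborts if the two spellings ever disagree on any member. */
static void check_templates_match(void)
{
	assert(with_explicit_nulls.filter_mask == with_omitted_members.filter_mask);
	assert(with_explicit_nulls.get_filter == with_omitted_members.get_filter);
	assert(with_omitted_members.get_filter2 == NULL);
	assert(with_omitted_members.init_data == NULL);
}
```

Calling check_templates_match() returns without tripping an assertion, which is why this part of the cleanup should not change behavior.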
7 changes: 6 additions & 1 deletion drivers/perf/arm_pmu.c
@@ -931,8 +931,13 @@ int armpmu_register(struct arm_pmu *pmu)
/*
* By this stage we know our supported CPUs on either DT/ACPI platforms,
* detect the SMT implementation.
+* On SMT CPUs, the PMCCNTR_EL0 increments from the processor clock rather
+* than the PE clock (ARM DDI0487 L.b D13.1.3) which means it'll continue
+* counting on a WFI PE if one of its SMT sibling is not idle on a
+* multi-threaded implementation. So don't use it on SMT cores.
*/
-pmu->has_smt = topology_core_has_smt(cpumask_first(&pmu->supported_cpus));
+pmu->avoid_pmccntr |=
+topology_core_has_smt(cpumask_first(&pmu->supported_cpus));

if (!pmu->set_event_filter)
pmu->pmu.capabilities |= PERF_PMU_CAP_NO_EXCLUDE;
51 changes: 44 additions & 7 deletions drivers/perf/arm_pmuv3.c
@@ -8,6 +8,7 @@
* This code is based heavily on the ARMv7 perf event code.
*/

+#include <asm/cputype.h>
#include <asm/irq_regs.h>
#include <asm/perf_event.h>
#include <asm/virt.h>
@@ -1002,13 +1003,7 @@ static bool armv8pmu_can_use_pmccntr(struct pmu_hw_events *cpuc,
if (has_branch_stack(event))
return false;

-/*
-* The PMCCNTR_EL0 increments from the processor clock rather than
-* the PE clock (ARM DDI0487 L.b D13.1.3) which means it'll continue
-* counting on a WFI PE if one of its SMT sibling is not idle on a
-* multi-threaded implementation. So don't use it on SMT cores.
-*/
-if (cpu_pmu->has_smt)
+if (cpu_pmu->avoid_pmccntr)
return false;

return true;
@@ -1299,6 +1294,41 @@ static int armv8_vulcan_map_event(struct perf_event *event)
&armv8_vulcan_perf_cache_map);
}

+#ifdef CONFIG_ARM64
+/*
+* List of CPUs that should avoid using PMCCNTR_EL0.
+*/
+static struct midr_range armv8pmu_avoid_pmccntr_cpus[] = {
+/*
+* The PMCCNTR_EL0 in Olympus CPU may still increment while in WFI/WFE state.
+* This is an implementation specific behavior and not an erratum.
+*
+* From ARM DDI0487 D14.4:
+* It is IMPLEMENTATION SPECIFIC whether CPU_CYCLES and PMCCNTR count
+* when the PE is in WFI or WFE state, even if the clocks are not stopped.
+*
+* From ARM DDI0487 D24.5.2:
+* All counters are subject to any changes in clock frequency, including
+* clock stopping caused by the WFI and WFE instructions.
+* This means that it is CONSTRAINED UNPREDICTABLE whether or not
+* PMCCNTR_EL0 continues to increment when clocks are stopped by WFI and
+* WFE instructions.
+*/
+MIDR_ALL_VERSIONS(MIDR_NVIDIA_OLYMPUS),
+{}
+};
+
+static bool armv8pmu_is_in_avoid_pmccntr_cpus(void)
+{
+return is_midr_in_range_list(armv8pmu_avoid_pmccntr_cpus);
+}
+#else
+static bool armv8pmu_is_in_avoid_pmccntr_cpus(void)
+{
+return false;
+}
+#endif
+
struct armv8pmu_probe_info {
struct arm_pmu *pmu;
bool present;
@@ -1348,6 +1378,13 @@ static void __armv8pmu_probe_pmu(void *info)
else
cpu_pmu->reg_pmmir = 0;

+/*
+* On some CPUs, PMCCNTR_EL0 does not match the behavior of CPU_CYCLES
+* programmable counter, so avoid routing cycles through PMCCNTR_EL0 to
+* prevent inconsistency in the results.
+*/
+cpu_pmu->avoid_pmccntr |= armv8pmu_is_in_avoid_pmccntr_cpus();
+
brbe_probe(cpu_pmu);
}
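Note on the avoid_pmccntr change across arm_pmu.c and arm_pmuv3.c: the single has_smt flag becomes a flag that two independent conditions can set, a per-CPU avoid list checked at probe time and SMT detection at registration time. The sketch below is a simplified userspace model of that accumulation, assuming made-up CPU identifiers rather than real MIDR values. struct pmu_model, cpu_in_avoid_list(), probe_and_register() and can_use_dedicated_cycle_counter() are hypothetical names, and the real armv8pmu_can_use_pmccntr() performs further checks that are left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU identifiers for the model; these are NOT real MIDR values. */
#define MODEL_CPU_QUIRKY 0x1001u
#define MODEL_CPU_PLAIN  0x2002u

/* Stand-in for the MIDR range list of CPUs that should avoid PMCCNTR_EL0. */
static const uint32_t avoid_pmccntr_cpus[] = { MODEL_CPU_QUIRKY };

static bool cpu_in_avoid_list(uint32_t cpu_id)
{
	size_t n = sizeof(avoid_pmccntr_cpus) / sizeof(avoid_pmccntr_cpus[0]);

	for (size_t i = 0; i < n; i++) {
		if (avoid_pmccntr_cpus[i] == cpu_id)
			return true;
	}
	return false;
}

struct pmu_model {
	bool avoid_pmccntr;
};

/*
 * Models the two `|=` sites in the patch. Either condition can set the
 * flag and neither can clear it, so the result does not depend on which
 * site runs first.
 */
static void probe_and_register(struct pmu_model *pmu, uint32_t cpu_id,
			       bool core_has_smt)
{
	assert(pmu != NULL);
	pmu->avoid_pmccntr |= cpu_in_avoid_list(cpu_id); /* probe-time check */
	pmu->avoid_pmccntr |= core_has_smt;              /* registration-time check */
}

/* Simplified: only the event type and the avoid flag are considered. */
static bool can_use_dedicated_cycle_counter(const struct pmu_model *pmu,
					    bool is_cycles_event)
{
	if (!is_cycles_event)
		return false;
	return !pmu->avoid_pmccntr;
}
```

In this model a plain non-SMT CPU may still use the dedicated cycle counter, while a listed CPU or any SMT core falls back to a programmable counter for cycles events. Using `|=` at both sites, rather than a plain assignment, is what keeps the second check from overwriting the result of the first.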
