diff --git a/Documentation/admin-guide/hw-vuln/index.rst b/Documentation/admin-guide/hw-vuln/index.rst
index 2adec1e6520a68e4d6dd9c52bd86f48158e30957..812dc0697890795b26bce824d9bdc831e380c39b 100644
--- a/Documentation/admin-guide/hw-vuln/index.rst
+++ b/Documentation/admin-guide/hw-vuln/index.rst
@@ -16,3 +16,4 @@ are configurable at compile, boot or run time.
    multihit.rst
    special-register-buffer-data-sampling.rst
    processor_mmio_stale_data.rst
+   rsb
diff --git a/Documentation/admin-guide/hw-vuln/rsb.rst b/Documentation/admin-guide/hw-vuln/rsb.rst
new file mode 100644
index 0000000000000000000000000000000000000000..21dbf9cf25f8bd9d16d4f1c6157c9212520d78fe
--- /dev/null
+++ b/Documentation/admin-guide/hw-vuln/rsb.rst
@@ -0,0 +1,268 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======================
+RSB-related mitigations
+=======================
+
+.. warning::
+   Please keep this document up-to-date, otherwise you will be
+   volunteered to update it and convert it to a very long comment in
+   bugs.c!
+
+Since 2018 there have been many Spectre CVEs related to the Return Stack
+Buffer (RSB) (sometimes referred to as the Return Address Stack (RAS) or
+Return Address Predictor (RAP) on AMD).
+
+Information about these CVEs and how to mitigate them is scattered
+amongst a myriad of microarchitecture-specific documents.
+
+This document attempts to consolidate all the relevant information in
+one place and clarify the reasoning behind the current RSB-related
+mitigations. It's meant to be as concise as possible, focused only on
+the current kernel mitigations: what are the RSB-related attack vectors
+and how are they currently being mitigated?
+
+It's *not* meant to describe how the RSB mechanism operates or how the
+exploits work. More details about those can be found in the references
+below.
+
+Rather, this is basically a glorified comment, but too long to actually
+be one. So when the next CVE comes along, a kernel developer can
+quickly refer to this as a refresher to see what we're actually doing
+and why.
+
+At a high level, there are two classes of RSB attacks: RSB poisoning
+(Intel and AMD) and RSB underflow (Intel only). They must each be
+considered individually for each attack vector (and microarchitecture
+where applicable).
+
+----
+
+RSB poisoning (Intel and AMD)
+=============================
+
+SpectreRSB
+~~~~~~~~~~
+
+RSB poisoning is a technique used by SpectreRSB [#spectre-rsb]_ where
+an attacker poisons an RSB entry to cause a victim's return instruction
+to speculate to an attacker-controlled address. This can happen when
+there are unbalanced CALLs/RETs after a context switch or VMEXIT.
+
+* All attack vectors can potentially be mitigated by flushing out any
+  poisoned RSB entries using an RSB filling sequence
+  [#intel-rsb-filling]_ [#amd-rsb-filling]_ when transitioning between
+  untrusted and trusted domains. But this has a performance impact and
+  should be avoided whenever possible.
+
+  .. DANGER::
+     **FIXME**: Currently we're flushing 32 entries. However, some CPU
+     models have more than 32 entries. The loop count needs to be
+     increased for those. More detailed information is needed about RSB
+     sizes.
+
+* On context switch, the user->user mitigation requires ensuring the
+  RSB gets filled or cleared whenever IBPB gets written [#cond-ibpb]_
+  during a context switch:
+
+  * AMD:
+      On Zen 4+, IBPB (or SBPB [#amd-sbpb]_ if used) clears the RSB.
+      This is indicated by IBPB_RET in CPUID [#amd-ibpb-rsb]_.
+
+      On Zen < 4, the RSB filling sequence [#amd-rsb-filling]_ must
+      always be done in addition to IBPB [#amd-ibpb-no-rsb]_. This is
+      indicated by X86_BUG_IBPB_NO_RET.
+
+  * Intel:
+      IBPB always clears the RSB:
+
+      "Software that executed before the IBPB command cannot control
+      the predicted targets of indirect branches executed after the
+      command on the same logical processor. The term indirect branch
+      in this context includes near return instructions, so these
+      predicted targets may come from the RSB." [#intel-ibpb-rsb]_
+
+* On context switch, user->kernel attacks are prevented by SMEP. User
+  space can only insert user space addresses into the RSB. Even
+  non-canonical addresses can't be inserted due to the page gap at the
+  end of the user canonical address space reserved by TASK_SIZE_MAX.
+  A SMEP #PF at instruction fetch prevents the kernel from speculatively
+  executing user space.
+
+  * AMD:
+      "Finally, branches that are predicted as 'ret' instructions get
+      their predicted targets from the Return Address Predictor (RAP).
+      AMD recommends software use a RAP stuffing sequence (mitigation
+      V2-3 in [2]) and/or Supervisor Mode Execution Protection (SMEP)
+      to ensure that the addresses in the RAP are safe for
+      speculation. Collectively, we refer to these mitigations as "RAP
+      Protection"." [#amd-smep-rsb]_
+
+  * Intel:
+      "On processors with enhanced IBRS, an RSB overwrite sequence may
+      not suffice to prevent the predicted target of a near return
+      from using an RSB entry created in a less privileged predictor
+      mode. Software can prevent this by enabling SMEP (for
+      transitions from user mode to supervisor mode) and by having
+      IA32_SPEC_CTRL.IBRS set during VM exits." [#intel-smep-rsb]_
+
+* On VMEXIT, guest->host attacks are mitigated by eIBRS (and PBRSB
+  mitigation if needed):
+
+  * AMD:
+      "When Automatic IBRS is enabled, the internal return address
+      stack used for return address predictions is cleared on VMEXIT."
+      [#amd-eibrs-vmexit]_
+
+  * Intel:
+      "On processors with enhanced IBRS, an RSB overwrite sequence may
+      not suffice to prevent the predicted target of a near return
+      from using an RSB entry created in a less privileged predictor
+      mode. Software can prevent this by enabling SMEP (for
+      transitions from user mode to supervisor mode) and by having
+      IA32_SPEC_CTRL.IBRS set during VM exits. Processors with
+      enhanced IBRS still support the usage model where IBRS is set
+      only in the OS/VMM for OSes that enable SMEP. To do this, such
+      processors will ensure that guest behavior cannot control the
+      RSB after a VM exit once IBRS is set, even if IBRS was not set
+      at the time of the VM exit." [#intel-eibrs-vmexit]_
+
+      Note that some Intel CPUs are susceptible to Post-barrier Return
+      Stack Buffer Predictions (PBRSB) [#intel-pbrsb]_, where the last
+      CALL from the guest can be used to predict the first unbalanced RET.
+      In this case the PBRSB mitigation is needed in addition to eIBRS.
+
+AMD RETBleed / SRSO / Branch Type Confusion
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+On AMD, poisoned RSB entries can also be created by the AMD RETBleed
+variant [#retbleed-paper]_ [#amd-btc]_ or by Speculative Return Stack
+Overflow [#amd-srso]_ (Inception [#inception-paper]_). The kernel
+protects itself by replacing every RET in the kernel with a branch to a
+single safe RET.
+
+----
+
+RSB underflow (Intel only)
+==========================
+
+RSB Alternate (RSBA) ("Intel Retbleed")
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Some Intel Skylake-generation CPUs are susceptible to the Intel variant
+of RETBleed [#retbleed-paper]_ (Return Stack Buffer Underflow
+[#intel-rsbu]_). If a RET is executed when the RSB is empty due
+to mismatched CALLs/RETs or returning from a deep call stack, the branch
+predictor can fall back to using the Branch Target Buffer (BTB). If a
+user forces a BTB collision then the RET can speculatively branch to a
+user-controlled address.
+
+* Note that RSB filling doesn't fully mitigate this issue. If there
+  are enough unbalanced RETs, the RSB may still underflow and fall back
+  to using a poisoned BTB entry.
+
+* On context switch, user->user underflow attacks are mitigated by the
+  conditional IBPB [#cond-ibpb]_ on context switch which effectively
+  clears the BTB:
+
+  * "The indirect branch predictor barrier (IBPB) is an indirect branch
+    control mechanism that establishes a barrier, preventing software
+    that executed before the barrier from controlling the predicted
+    targets of indirect branches executed after the barrier on the same
+    logical processor." [#intel-ibpb-btb]_
+
+* On context switch and VMEXIT, user->kernel and guest->host RSB
+  underflows are mitigated by IBRS or eIBRS:
+
+  * "Enabling IBRS (including enhanced IBRS) will mitigate the "RSBU"
+    attack demonstrated by the researchers. As previously documented,
+    Intel recommends the use of enhanced IBRS, where supported. This
+    includes any processor that enumerates RRSBA but not RRSBA_DIS_S."
+    [#intel-rsbu]_
+
+  However, note that eIBRS and IBRS do not mitigate intra-mode attacks.
+  Like RRSBA below, this is mitigated by clearing the BHB on kernel
+  entry.
+
+  As an alternative to classic IBRS, call depth tracking (combined with
+  retpolines) can be used to track kernel returns and fill the RSB when
+  it gets close to being empty.
+
+Restricted RSB Alternate (RRSBA)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Some newer Intel CPUs have Restricted RSB Alternate (RRSBA) behavior,
+which, similar to RSBA described above, also falls back to using the BTB
+on RSB underflow. The only difference is that the predicted targets are
+restricted to the current domain when eIBRS is enabled:
+
+* "Restricted RSB Alternate (RRSBA) behavior allows alternate branch
+  predictors to be used by near RET instructions when the RSB is
+  empty. When eIBRS is enabled, the predicted targets of these
+  alternate predictors are restricted to those belonging to the
+  indirect branch predictor entries of the current prediction domain."
+  [#intel-eibrs-rrsba]_
+
+When a CPU with RRSBA is vulnerable to Branch History Injection
+[#bhi-paper]_ [#intel-bhi]_, an RSB underflow could be used for an
+intra-mode BTI attack. This is mitigated by clearing the BHB on
+kernel entry.
+
+However, if the kernel uses retpolines instead of eIBRS, it needs to
+disable RRSBA:
+
+* "Where software is using retpoline as a mitigation for BHI or
+  intra-mode BTI, and the processor both enumerates RRSBA and
+  enumerates RRSBA_DIS controls, it should disable this behavior."
+  [#intel-retpoline-rrsba]_
+
+----
+
+References
+==========
+
+.. [#spectre-rsb] `Spectre Returns! Speculation Attacks using the Return Stack Buffer `_
+
+.. [#intel-rsb-filling] "Empty RSB Mitigation on Skylake-generation" in `Retpoline: A Branch Target Injection Mitigation `_
+
+.. [#amd-rsb-filling] "Mitigation V2-3" in `Software Techniques for Managing Speculation `_
+
+.. [#cond-ibpb] Whether IBPB is written depends on whether the prev and/or next task is protected from Spectre attacks. It typically requires opting in per task or system-wide. For more details, see the documentation for the ``spectre_v2_user`` cmdline option in Documentation/admin-guide/kernel-parameters.txt.
+
+.. [#amd-sbpb] IBPB without flushing of branch type predictions. Only exists for AMD.
+
+.. [#amd-ibpb-rsb] "Function 8000_0008h -- Processor Capacity Parameters and Extended Feature Identification" in `AMD64 Architecture Programmer's Manual Volume 3: General-Purpose and System Instructions `_. SBPB behaves the same way according to `this email `_.
+
+.. [#amd-ibpb-no-rsb] `Spectre Attacks: Exploiting Speculative Execution `_
+
+.. [#intel-ibpb-rsb] "Introduction" in `Post-barrier Return Stack Buffer Predictions / CVE-2022-26373 / INTEL-SA-00706 `_
+
+.. [#amd-smep-rsb] "Existing Mitigations" in `Technical Guidance for Mitigating Branch Type Confusion `_
+
+.. [#intel-smep-rsb] "Enhanced IBRS" in `Indirect Branch Restricted Speculation `_
+
+.. [#amd-eibrs-vmexit] "Extended Feature Enable Register (EFER)" in `AMD64 Architecture Programmer's Manual Volume 2: System Programming `_
+
+.. [#intel-eibrs-vmexit] "Enhanced IBRS" in `Indirect Branch Restricted Speculation `_
+
+.. [#intel-pbrsb] `Post-barrier Return Stack Buffer Predictions / CVE-2022-26373 / INTEL-SA-00706 `_
+
+.. [#retbleed-paper] `RETBleed: Arbitrary Speculative Code Execution with Return Instruction `_
+
+.. [#amd-btc] `Technical Guidance for Mitigating Branch Type Confusion `_
+
+.. [#amd-srso] `Technical Update Regarding Speculative Return Stack Overflow `_
+
+.. [#inception-paper] `Inception: Exposing New Attack Surfaces with Training in Transient Execution `_
+
+.. [#intel-rsbu] `Return Stack Buffer Underflow / Return Stack Buffer Underflow / CVE-2022-29901, CVE-2022-28693 / INTEL-SA-00702 `_
+
+.. [#intel-ibpb-btb] `Indirect Branch Predictor Barrier `_
+
+.. [#intel-eibrs-rrsba] "Guidance for RSBU" in `Return Stack Buffer Underflow / Return Stack Buffer Underflow / CVE-2022-29901, CVE-2022-28693 / INTEL-SA-00702 `_
+
+.. [#bhi-paper] `Branch History Injection: On the Effectiveness of Hardware Mitigations Against Cross-Privilege Spectre-v2 Attacks `_
+
+.. [#intel-bhi] `Branch History Injection and Intra-mode Branch Target Injection / CVE-2022-0001, CVE-2022-0002 / INTEL-SA-00598 `_
+
+.. [#intel-retpoline-rrsba] "Retpoline" in `Branch History Injection and Intra-mode Branch Target Injection / CVE-2022-0001, CVE-2022-0002 / INTEL-SA-00598 `_
diff --git a/arch/Kconfig b/arch/Kconfig
index 80682c6f6d12a3b39007dff0d8d26e9750cad0a1..7adde6e58dce60c72d8a7d7d9feeba68728c9c8f 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -973,6 +973,9 @@ config ARCH_HAS_NONLEAF_PMD_YOUNG
 	  address translations. Page table walkers that clear the accessed bit
 	  may use this capability to reduce their search space.

+config HAVE_STATIC_CALL
+	bool
+
 source "kernel/gcov/Kconfig"

 source "scripts/gcc-plugins/Kconfig"
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 6f013e418834987c5b40218ce974e3d344357eb0..f5f973077c28c431902399c6cbe20deffd9de70f 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -529,6 +529,7 @@ static void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw)
 		}
 	}
 	cpuhw->bhrb_stack.nr = u_index;
+	cpuhw->bhrb_stack.hw_idx = -1ULL;
 	return;
 }

@@ -2135,7 +2136,7 @@ static void record_and_restart(struct perf_event *event, unsigned long val,

 	if (event->attr.sample_type & PERF_SAMPLE_WEIGHT &&
 	    ppmu->get_mem_weight)
-		ppmu->get_mem_weight(&data.weight);
+		ppmu->get_mem_weight(&data.weight.full);

 	if (perf_event_overflow(event, &data, regs))
 		power_pmu_stop(event, 0);
diff --git a/arch/x86/events/Kconfig b/arch/x86/events/Kconfig
index 4a809c6cbd2f5d8d1b19b187b2a8a0fbfd95092d..03b8674b192c27cf8f8eeb288c77bda7dadade5b 100644
--- a/arch/x86/events/Kconfig
+++ b/arch/x86/events/Kconfig
@@ -34,4 +34,14 @@ config PERF_EVENTS_AMD_POWER
 	  (CPUID Fn8000_0007_EDX[12]) interface to calculate the
 	  average power consumption on Family 15h processors.

+config PERF_EVENTS_AMD_UNCORE
+	tristate "AMD Uncore performance events"
+	depends on PERF_EVENTS && CPU_SUP_AMD
+	default y
+	help
+	  Include support for AMD uncore performance events for use with
+	  e.g., perf stat -e amd_l3/.../,amd_df/.../.
+
+	  To compile this driver as a module, choose M here: the
+	  module will be called 'amd-uncore'.
 endmenu
diff --git a/arch/x86/events/amd/Makefile b/arch/x86/events/amd/Makefile
index fe8795a67385a5deda3a5ee145b2778607fe1046..cf323ffab5cdb98eef89c7044b59a892be9895f4 100644
--- a/arch/x86/events/amd/Makefile
+++ b/arch/x86/events/amd/Makefile
@@ -1,8 +1,9 @@
 # SPDX-License-Identifier: GPL-2.0
-obj-$(CONFIG_CPU_SUP_AMD) += core.o uncore.o
+obj-$(CONFIG_CPU_SUP_AMD) += core.o brs.o
 obj-$(CONFIG_PERF_EVENTS_AMD_POWER) += power.o
 obj-$(CONFIG_X86_LOCAL_APIC) += ibs.o
+obj-$(CONFIG_PERF_EVENTS_AMD_UNCORE) += amd-uncore.o
+amd-uncore-objs := uncore.o
 ifdef CONFIG_AMD_IOMMU
 obj-$(CONFIG_CPU_SUP_AMD) += iommu.o
 endif
-
diff --git a/arch/x86/events/amd/brs.c b/arch/x86/events/amd/brs.c
new file mode 100644
index 0000000000000000000000000000000000000000..3c13c484c637be8d0f209663c49b4649835be98a
--- /dev/null
+++ b/arch/x86/events/amd/brs.c
@@ -0,0 +1,317 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Implement support for AMD Fam19h Branch Sampling feature
+ * Based on specifications published in AMD PPR Fam19 Model 01
+ *
+ * Copyright 2021 Google LLC
+ * Contributed by Stephane Eranian
+ */
+#include
+#include
+#include
+
+#include "../perf_event.h"
+
+#define BRS_POISON 0xFFFFFFFFFFFFFFFEULL /* mark limit of valid entries */
+
+/* Debug Extension Configuration register layout */
+union amd_debug_extn_cfg {
+	__u64 val;
+	struct {
+		__u64	rsvd0:2,  /* reserved */
+			brsmen:1, /* branch sample enable */
+			rsvd4_3:2,/* reserved - must be 0x3 */
+			vb:1,     /* valid branches recorded */
+			rsvd2:10, /* reserved */
+			msroff:4, /* index of next entry to write */
+			rsvd3:4,  /* reserved */
+			pmc:3,    /* #PMC holding the sampling event */
+			rsvd4:37; /* reserved */
+	};
+};
+
+static inline unsigned int brs_from(int idx)
+{
+	return MSR_AMD_SAMP_BR_FROM + 2 * idx;
+}
+
+static inline unsigned int brs_to(int idx)
+{
+	return MSR_AMD_SAMP_BR_FROM + 2 * idx + 1;
+}
+
+static inline void set_debug_extn_cfg(u64 val)
+{
+	/* bits[4:3] must always be set to 11b */
+	wrmsrl(MSR_AMD_DBG_EXTN_CFG, val | 3ULL << 3);
+}
+
+static inline u64 get_debug_extn_cfg(void)
+{
+	u64 val;
+
+	rdmsrl(MSR_AMD_DBG_EXTN_CFG, val);
+	return val;
+}
+
+static bool __init amd_brs_detect(void)
+{
+	if (!boot_cpu_has(X86_FEATURE_BRS))
+		return false;
+
+	switch (boot_cpu_data.x86) {
+	case 0x19: /* AMD Fam19h (Zen3) */
+		x86_pmu.lbr_nr = 16;
+
+		/* No hardware filtering supported */
+		x86_pmu.lbr_sel_map = NULL;
+		x86_pmu.lbr_sel_mask = 0;
+		break;
+	default:
+		return false;
+	}
+
+	return true;
+}
+
+/*
+ * Current BRS implementation does not support branch type or privilege level
+ * filtering. Therefore, this function simply enforces these limitations. No need for
+ * a br_sel_map. Software filtering is not supported because it would not correlate well
+ * with a sampling period.
+ */
+int amd_brs_setup_filter(struct perf_event *event)
+{
+	u64 type = event->attr.branch_sample_type;
+
+	/* No BRS support */
+	if (!x86_pmu.lbr_nr)
+		return -EOPNOTSUPP;
+
+	/* Can only capture all branches, i.e., no filtering */
+	if ((type & ~PERF_SAMPLE_BRANCH_PLM_ALL) != PERF_SAMPLE_BRANCH_ANY)
+		return -EINVAL;
+
+	/* can only capture at all priv levels due to the way BRS works */
+	if ((type & PERF_SAMPLE_BRANCH_PLM_ALL) != PERF_SAMPLE_BRANCH_PLM_ALL)
+		return -EINVAL;
+
+	return 0;
+}
+
+/* tos = top of stack, i.e., last valid entry written */
+static inline int amd_brs_get_tos(union amd_debug_extn_cfg *cfg)
+{
+	/*
+	 * msroff: index of next entry to write so top-of-stack is one off
+	 * if BRS is full then msroff is set back to 0.
+	 */
+	return (cfg->msroff ? cfg->msroff : x86_pmu.lbr_nr) - 1;
+}
+
+/*
+ * make sure we have a sane BRS offset to begin with
+ * especially with kexec
+ */
+void amd_brs_reset(void)
+{
+	/*
+	 * Reset config
+	 */
+	set_debug_extn_cfg(0);
+
+	/*
+	 * Mark first entry as poisoned
+	 */
+	wrmsrl(brs_to(0), BRS_POISON);
+}
+
+int __init amd_brs_init(void)
+{
+	if (!amd_brs_detect())
+		return -EOPNOTSUPP;
+
+	pr_cont("%d-deep BRS, ", x86_pmu.lbr_nr);
+
+	return 0;
+}
+
+void amd_brs_enable(void)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	union amd_debug_extn_cfg cfg;
+
+	/* Activate only on first user */
+	if (++cpuc->brs_active > 1)
+		return;
+
+	cfg.val = 0; /* reset all fields */
+	cfg.brsmen = 1; /* enable branch sampling */
+
+	/* Set enable bit */
+	set_debug_extn_cfg(cfg.val);
+}
+
+void amd_brs_enable_all(void)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	if (cpuc->lbr_users)
+		amd_brs_enable();
+}
+
+void amd_brs_disable(void)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	union amd_debug_extn_cfg cfg;
+
+	/* Check if active (could be disabled via x86_pmu_disable_all()) */
+	if (!cpuc->brs_active)
+		return;
+
+	/* Only disable for last user */
+	if (--cpuc->brs_active)
+		return;
+
+	/*
+	 * Clear the brsmen bit but preserve the others as they contain
+	 * useful state such as vb and msroff
+	 */
+	cfg.val = get_debug_extn_cfg();
+
+	/*
+	 * When coming in on interrupt and BRS is full, then hw will have
+	 * already stopped BRS, no need to issue wrmsr again
+	 */
+	if (cfg.brsmen) {
+		cfg.brsmen = 0;
+		set_debug_extn_cfg(cfg.val);
+	}
+}
+
+void amd_brs_disable_all(void)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	if (cpuc->lbr_users)
+		amd_brs_disable();
+}
+
+/*
+ * Caller must ensure amd_brs_inuse() is true before calling
+ * return:
+ */
+void amd_brs_drain(void)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	struct perf_event *event = cpuc->events[0];
+	struct perf_branch_entry *br = cpuc->lbr_entries;
+	union amd_debug_extn_cfg cfg;
+	u32 i, nr = 0, num, tos, start;
+	u32 shift = 64 - boot_cpu_data.x86_virt_bits;
+
+	/*
+	 * BRS event forced on PMC0,
+	 * so check if there is an event.
+	 * It is possible to have lbr_users > 0 but the event
+	 * not yet scheduled due to long latency PMU irq
+	 */
+	if (!event)
+		goto empty;
+
+	cfg.val = get_debug_extn_cfg();
+
+	/* Sanity check [0-x86_pmu.lbr_nr] */
+	if (WARN_ON_ONCE(cfg.msroff >= x86_pmu.lbr_nr))
+		goto empty;
+
+	/* No valid branch */
+	if (cfg.vb == 0)
+		goto empty;
+
+	/*
+	 * msr.off points to next entry to be written
+	 * tos = most recent entry index = msr.off - 1
+	 * BRS register buffer saturates, so we know we have
+	 * start < tos and that we have to read from start to tos
+	 */
+	start = 0;
+	tos = amd_brs_get_tos(&cfg);
+
+	num = tos - start + 1;
+
+	/*
+	 * BRS is only one pass (saturation) from MSROFF to depth-1
+	 * MSROFF wraps to zero when buffer is full
+	 */
+	for (i = 0; i < num; i++) {
+		u32 brs_idx = tos - i;
+		u64 from, to;
+
+		rdmsrl(brs_to(brs_idx), to);
+
+		/* Entry does not belong to us (as marked by kernel) */
+		if (to == BRS_POISON)
+			break;
+
+		rdmsrl(brs_from(brs_idx), from);
+
+		/*
+		 * Sign-extend SAMP_BR_TO to 64 bits, bits 61-63 are reserved.
+		 * Necessary to generate proper virtual addresses suitable for
+		 * symbolization
+		 */
+		to = (u64)(((s64)to << shift) >> shift);
+
+		perf_clear_branch_entry_bitfields(br+nr);
+
+		br[nr].from = from;
+		br[nr].to = to;
+
+		nr++;
+	}
+empty:
+	/* Record number of sampled branches */
+	cpuc->lbr_stack.nr = nr;
+}
+
+/*
+ * Poison most recent entry to prevent reuse by next task.
+ * Required because BRS entries are not tagged by PID.
+ */
+static void amd_brs_poison_buffer(void)
+{
+	union amd_debug_extn_cfg cfg;
+	unsigned int idx;
+
+	/* Get current state */
+	cfg.val = get_debug_extn_cfg();
+
+	/* idx is most recently written entry */
+	idx = amd_brs_get_tos(&cfg);
+
+	/* Poison target of entry */
+	wrmsrl(brs_to(idx), BRS_POISON);
+}
+
+/*
+ * On context switch in, we need to make sure no samples from previous user
+ * are left in the BRS.
+ *
+ * On ctxswin, sched_in = true, called after the PMU has started
+ * On ctxswout, sched_in = false, called before the PMU is stopped
+ */
+void amd_pmu_brs_sched_task(struct perf_event_context *ctx, bool sched_in)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+
+	/* no active users */
+	if (!cpuc->lbr_users)
+		return;
+
+	/*
+	 * On context switch in, we need to ensure we do not use entries
+	 * from previous BRS user on that CPU, so we poison the buffer as
+	 * a faster way compared to resetting all entries.
+	 */
+	if (sched_in)
+		amd_brs_poison_buffer();
+}
diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index f6d8808b7b3c285b57bd841b73a9f510eccaea8a..438ab9243409e24aef3a8e199eff068445133659 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -7,6 +7,7 @@
 #include
 #include
 #include
+#include
 #include

 #include "../perf_event.h"
@@ -18,6 +19,9 @@ static unsigned long perf_nmi_window;
 #define AMD_MERGE_EVENT ((0xFULL << 32) | 0xFFULL)
 #define AMD_MERGE_EVENT_ENABLE (AMD_MERGE_EVENT | ARCH_PERFMON_EVENTSEL_ENABLE)

+/* PMC Enable and Overflow bits for PerfCntrGlobal* registers */
+static u64 amd_pmu_global_cntr_mask __read_mostly;
+
 static __initconst const u64 amd_hw_cache_event_ids
 				[PERF_COUNT_HW_CACHE_MAX]
 				[PERF_COUNT_HW_CACHE_OP_MAX]
@@ -245,7 +249,7 @@ static const u64 amd_perfmon_event_map[PERF_COUNT_HW_MAX] =
 /*
  * AMD Performance Monitor Family 17h and later:
  */
-static const u64 amd_f17h_perfmon_event_map[PERF_COUNT_HW_MAX] =
+static const u64 amd_zen1_perfmon_event_map[PERF_COUNT_HW_MAX] =
 {
 	[PERF_COUNT_HW_CPU_CYCLES] = 0x0076,
 	[PERF_COUNT_HW_INSTRUCTIONS] = 0x00c0,
@@ -257,10 +261,39 @@ static const u64 amd_f17h_perfmon_event_map[PERF_COUNT_HW_MAX] =
 	[PERF_COUNT_HW_STALLED_CYCLES_BACKEND] = 0x0187,
 };

+static const u64 amd_zen2_perfmon_event_map[PERF_COUNT_HW_MAX] =
+{
+	[PERF_COUNT_HW_CPU_CYCLES] = 0x0076,
+	[PERF_COUNT_HW_INSTRUCTIONS] = 0x00c0,
+	[PERF_COUNT_HW_CACHE_REFERENCES] = 0xff60,
+	[PERF_COUNT_HW_CACHE_MISSES] = 0x0964,
+	[PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = 0x00c2,
+	[PERF_COUNT_HW_BRANCH_MISSES] = 0x00c3,
+	[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = 0x00a9,
+};
+
+static const u64 amd_zen4_perfmon_event_map[PERF_COUNT_HW_MAX] =
+{
+	[PERF_COUNT_HW_CPU_CYCLES] = 0x0076,
+	[PERF_COUNT_HW_INSTRUCTIONS] = 0x00c0,
+	[PERF_COUNT_HW_CACHE_REFERENCES] = 0xff60,
+	[PERF_COUNT_HW_CACHE_MISSES] = 0x0964,
+	[PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = 0x00c2,
+	[PERF_COUNT_HW_BRANCH_MISSES] = 0x00c3,
+	[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = 0x00a9,
+	[PERF_COUNT_HW_REF_CPU_CYCLES] = 0x100000120,
+};
+
 static u64 amd_pmu_event_map(int hw_event)
 {
-	if (boot_cpu_data.x86 >= 0x17)
-		return amd_f17h_perfmon_event_map[hw_event];
+	if (cpu_feature_enabled(X86_FEATURE_ZEN4) || boot_cpu_data.x86 >= 0x1a)
+		return amd_zen4_perfmon_event_map[hw_event];
+
+	if (cpu_feature_enabled(X86_FEATURE_ZEN2) || boot_cpu_data.x86 >= 0x19)
+		return amd_zen2_perfmon_event_map[hw_event];
+
+	if (cpu_feature_enabled(X86_FEATURE_ZEN1))
+		return amd_zen1_perfmon_event_map[hw_event];

 	return amd_perfmon_event_map[hw_event];
 }
@@ -325,8 +358,16 @@ static inline bool amd_is_pair_event_code(struct hw_perf_event *hwc)
 	}
 }

+#define AMD_FAM19H_BRS_EVENT 0xc4 /* RETIRED_TAKEN_BRANCH_INSTRUCTIONS */
+static inline int amd_is_brs_event(struct perf_event *e)
+{
+	return (e->hw.config & AMD64_RAW_EVENT_MASK) == AMD_FAM19H_BRS_EVENT;
+}
+
 static int amd_core_hw_config(struct perf_event *event)
 {
+	int ret = 0;
+
 	if (event->attr.exclude_host && event->attr.exclude_guest)
 		/*
 		 * When HO == GO == 1 the hardware treats that as GO == HO == 0
@@ -343,7 +384,66 @@ static int amd_core_hw_config(struct perf_event *event)
 	if ((x86_pmu.flags & PMU_FL_PAIR) && amd_is_pair_event_code(&event->hw))
 		event->hw.flags |= PERF_X86_EVENT_PAIR;

-	return 0;
+	/*
+	 * if branch stack is requested
+	 */
+	if (has_branch_stack(event)) {
+		/*
+		 * Due to interrupt holding, BRS is not recommended in
+		 * counting mode.
+		 */
+		if (!is_sampling_event(event))
+			return -EINVAL;
+
+		/*
+		 * Due to the way BRS operates by holding the interrupt until
+		 * lbr_nr entries have been captured, it does not make sense
+		 * to allow sampling on BRS with an event that does not match
+		 * what BRS is capturing, i.e., retired taken branches.
+		 * Otherwise the correlation with the event's period is even
+		 * more loose:
+		 *
+		 * With retired taken branch:
+		 *   Effective P = P + 16 + X
+		 * With any other event:
+		 *   Effective P = P + Y + X
+		 *
+		 * Where X is the number of taken branches due to interrupt
+		 * skid. Skid is large.
+		 *
+		 * Where Y is the number of occurrences of the event while
+		 * BRS is capturing the lbr_nr entries.
+		 *
+		 * By using retired taken branches, we limit the impact on the
+		 * Y variable. We know it cannot be more than the depth of
+		 * BRS.
+		 */
+		if (!amd_is_brs_event(event))
+			return -EINVAL;
+
+		/*
+		 * BRS implementation does not work with frequency mode
+		 * reprogramming of the period.
+		 */
+		if (event->attr.freq)
+			return -EINVAL;
+		/*
+		 * The kernel subtracts BRS depth from period, so it must
+		 * be big enough.
+		 */
+		if (event->attr.sample_period <= x86_pmu.lbr_nr)
+			return -EINVAL;
+
+		/*
+		 * Check if we can allow PERF_SAMPLE_BRANCH_STACK
+		 */
+		ret = amd_brs_setup_filter(event);
+
+		/* only set in case of success */
+		if (!ret)
+			event->hw.flags |= PERF_X86_EVENT_AMD_BRS;
+	}
+	return ret;
 }

 static inline int amd_is_nb_event(struct hw_perf_event *hwc)
@@ -366,7 +466,7 @@ static int amd_pmu_hw_config(struct perf_event *event)
 	if (event->attr.precise_ip && get_ibs_caps())
 		return -ENOENT;

-	if (has_branch_stack(event))
+	if (has_branch_stack(event) && !x86_pmu.lbr_nr)
 		return -EOPNOTSUPP;

 	ret = x86_pmu_hw_config(event);
@@ -510,6 +610,22 @@ static struct amd_nb *amd_alloc_nb(int cpu)
 	return nb;
 }

+static void amd_pmu_cpu_reset(int cpu)
+{
+	if (x86_pmu.version < 2)
+		return;
+
+	/* Clear enable bits i.e. PerfCntrGlobalCtl.PerfCntrEn */
+	wrmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_CTL, 0);
+
+	/*
+	 * Clear freeze and overflow bits i.e. PerfCntrGlobalStatus.LbrFreeze
+	 * and PerfCntrGlobalStatus.PerfCntrOvfl
+	 */
+	wrmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR,
+	       GLOBAL_STATUS_LBRS_FROZEN | amd_pmu_global_cntr_mask);
+}
+
 static int amd_pmu_cpu_prepare(int cpu)
 {
 	struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu);
@@ -534,6 +650,7 @@ static void amd_pmu_cpu_starting(int cpu)
 	int i, nb_id;

 	cpuc->perf_ctr_virt_mask = AMD64_EVENTSEL_HOSTONLY;
+	amd_pmu_cpu_reset(cpu);

 	if (!x86_pmu.amd_nb_constraints)
 		return;
@@ -555,6 +672,8 @@ static void amd_pmu_cpu_starting(int cpu)

 	cpuc->amd_nb->nb_id = nb_id;
 	cpuc->amd_nb->refcnt++;
+
+	amd_brs_reset();
 }

 static void amd_pmu_cpu_dead(int cpu)
@@ -576,6 +695,48 @@ static void amd_pmu_cpu_dead(int cpu)
 	}
 }

+static inline void amd_pmu_set_global_ctl(u64 ctl)
+{
+	wrmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_CTL, ctl);
+}
+
+static inline u64 amd_pmu_get_global_status(void)
+{
+	u64 status;
+
+	/* PerfCntrGlobalStatus is read-only */
+	rdmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS, status);
+
+	return status & amd_pmu_global_cntr_mask;
+}
+
+static inline void amd_pmu_ack_global_status(u64 status)
+{
+	/*
+	 * PerfCntrGlobalStatus is read-only but an overflow acknowledgment
+	 * mechanism exists; writing 1 to a bit in PerfCntrGlobalStatusClr
+	 * clears the same bit in PerfCntrGlobalStatus
+	 */
+
+	/* Only allow modifications to PerfCntrGlobalStatus.PerfCntrOvfl */
+	status &= amd_pmu_global_cntr_mask;
+	wrmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR, status);
+}
+
+static bool amd_pmu_test_overflow_topbit(int idx)
+{
+	u64 counter;
+
+	rdmsrl(x86_pmu_event_addr(idx), counter);
+
+	return !(counter & BIT_ULL(x86_pmu.cntval_bits - 1));
+}
+
+static bool amd_pmu_test_overflow_status(int idx)
+{
+	return amd_pmu_get_global_status() & BIT_ULL(idx);
+}
+
 /*
  * When a PMC counter overflows, an NMI is used to process the event and
  * reset the counter. NMI latency can result in the counter being updated
@@ -588,7 +749,6 @@ static void amd_pmu_cpu_dead(int cpu)
 static void amd_pmu_wait_on_overflow(int idx)
 {
 	unsigned int i;
-	u64 counter;

 	/*
 	 * Wait for the counter to be reset if it has overflowed. This loop
@@ -596,22 +756,24 @@ static void amd_pmu_wait_on_overflow(int idx)
 	 * forever...
 	 */
 	for (i = 0; i < OVERFLOW_WAIT_COUNT; i++) {
-		rdmsrl(x86_pmu_event_addr(idx), counter);
-		if (counter & (1ULL << (x86_pmu.cntval_bits - 1)))
-			break;
+		if (x86_pmu.version >= 2) {
+			if (!amd_pmu_test_overflow_status(idx))
+				break;
+		} else {
+			if (!amd_pmu_test_overflow_topbit(idx))
+				break;
+		}

 		/* Might be in IRQ context, so can't sleep */
 		udelay(1);
 	}
 }

-static void amd_pmu_disable_all(void)
+static void amd_pmu_check_overflow(void)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 	int idx;

-	x86_pmu_disable_all();
-
 	/*
 	 * This shouldn't be called from NMI context, but add a safeguard here
 	 * to return, since if we're in NMI context we can't wait for an NMI
@@ -634,6 +796,50 @@ static void amd_pmu_disable_all(void)
 	}
 }

+static void amd_pmu_enable_event(struct perf_event *event)
+{
+	x86_pmu_enable_event(event);
+}
+
+static void amd_pmu_enable_all(int added)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	struct hw_perf_event *hwc;
+	int idx;
+
+	amd_brs_enable_all();
+
+	for (idx = 0; idx < x86_pmu.num_counters; idx++) {
+		hwc = &cpuc->events[idx]->hw;
+
+		/* only activate events which are marked as active */
+		if (!test_bit(idx, cpuc->active_mask))
+			continue;
+
+		amd_pmu_enable_event(cpuc->events[idx]);
+	}
+}
+
+static void amd_pmu_v2_enable_event(struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+
+	/*
+	 * Testing cpu_hw_events.enabled should be skipped in this case unlike
+	 * in x86_pmu_enable_event().
+	 *
+	 * Since cpu_hw_events.enabled is set only after returning from
+	 * x86_pmu_start(), the PMCs must be programmed and kept ready.
+ * Counting starts only after x86_pmu_enable_all() is called. + */ + __x86_pmu_enable_event(hwc, ARCH_PERFMON_EVENTSEL_ENABLE); +} + +static void amd_pmu_v2_enable_all(int added) +{ + amd_pmu_set_global_ctl(amd_pmu_global_cntr_mask); +} + static void amd_pmu_disable_event(struct perf_event *event) { x86_pmu_disable_event(event); @@ -651,6 +857,32 @@ static void amd_pmu_disable_event(struct perf_event *event) amd_pmu_wait_on_overflow(event->hw.idx); } +static void amd_pmu_disable_all(void) +{ + amd_brs_disable_all(); + x86_pmu_disable_all(); + amd_pmu_check_overflow(); +} + +static void amd_pmu_v2_disable_all(void) +{ + /* Disable all PMCs */ + amd_pmu_set_global_ctl(0); + amd_pmu_check_overflow(); +} + +static void amd_pmu_add_event(struct perf_event *event) +{ + if (needs_branch_stack(event)) + amd_pmu_brs_add(event); +} + +static void amd_pmu_del_event(struct perf_event *event) +{ + if (needs_branch_stack(event)) + amd_pmu_brs_del(event); +} + /* * Because of NMI latency, if multiple PMC counters are active or other sources * of NMIs are received, the perf NMI handler can handle one or more overflowed @@ -669,36 +901,128 @@ static void amd_pmu_disable_event(struct perf_event *event) * handled a counter. When an un-handled NMI is received, it will be claimed * only if arriving within that window. */ +static inline int amd_pmu_adjust_nmi_window(int handled) +{ + /* + * If a counter was handled, record a timestamp such that un-handled + * NMIs will be claimed if arriving within that window. 
+ */ + if (handled) { + this_cpu_write(perf_nmi_tstamp, jiffies + perf_nmi_window); + + return handled; + } + + if (time_after(jiffies, this_cpu_read(perf_nmi_tstamp))) + return NMI_DONE; + + return NMI_HANDLED; +} + static int amd_pmu_handle_irq(struct pt_regs *regs) { struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); - int active, handled; + int handled; + int pmu_enabled; /* - * Obtain the active count before calling x86_pmu_handle_irq() since - * it is possible that x86_pmu_handle_irq() may make a counter - * inactive (through x86_pmu_stop). + * Save the PMU state. + * It needs to be restored when leaving the handler. */ - active = __bitmap_weight(cpuc->active_mask, X86_PMC_IDX_MAX); + pmu_enabled = cpuc->enabled; + cpuc->enabled = 0; + + /* stop everything (includes BRS) */ + amd_pmu_disable_all(); + + /* Drain BRS if in use (could be inactive) */ + if (cpuc->lbr_users) + amd_brs_drain(); /* Process any counter overflows */ handled = x86_pmu_handle_irq(regs); + cpuc->enabled = pmu_enabled; + if (pmu_enabled) + amd_pmu_enable_all(0); + + return amd_pmu_adjust_nmi_window(handled); +} + +static int amd_pmu_v2_handle_irq(struct pt_regs *regs) +{ + struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); + struct perf_sample_data data; + struct hw_perf_event *hwc; + struct perf_event *event; + int handled = 0, idx; + u64 status, mask; + bool pmu_enabled; + /* - * If a counter was handled, record a timestamp such that un-handled - * NMIs will be claimed if arriving within that window.
+ * Save the PMU state as it needs to be restored when leaving the + * handler */ - if (handled) { - this_cpu_write(perf_nmi_tstamp, - jiffies + perf_nmi_window); + pmu_enabled = cpuc->enabled; + cpuc->enabled = 0; - return handled; + /* Stop counting */ + amd_pmu_v2_disable_all(); + + status = amd_pmu_get_global_status(); + + /* Check if any overflows are pending */ + if (!status) + goto done; + + for (idx = 0; idx < x86_pmu.num_counters; idx++) { + if (!test_bit(idx, cpuc->active_mask)) + continue; + + event = cpuc->events[idx]; + hwc = &event->hw; + x86_perf_event_update(event); + mask = BIT_ULL(idx); + + if (!(status & mask)) + continue; + + /* Event overflow */ + handled++; + status &= ~mask; + perf_sample_data_init(&data, 0, hwc->last_period); + + if (!x86_perf_event_set_period(event)) + continue; + + if (perf_event_overflow(event, &data, regs)) + x86_pmu_stop(event, 0); } - if (time_after(jiffies, this_cpu_read(perf_nmi_tstamp))) - return NMI_DONE; + /* + * It should never be the case that some overflows are not handled as + * the corresponding PMCs are expected to be inactive according to the + * active_mask + */ + WARN_ON(status > 0); - return NMI_HANDLED; + /* Clear overflow bits */ + amd_pmu_ack_global_status(~status); + + /* + * Unmasking the LVTPC is not required as the Mask (M) bit of the LVT + * PMI entry is not set by the local APIC when a PMC overflow occurs + */ + inc_irq_stat(apic_perf_irqs); + +done: + cpuc->enabled = pmu_enabled; + + /* Resume counting only if PMU is active */ + if (pmu_enabled) + amd_pmu_v2_enable_all(0); + + return amd_pmu_adjust_nmi_window(handled); } static struct event_constraint * @@ -906,6 +1230,51 @@ static void amd_put_event_constraints_f17h(struct cpu_hw_events *cpuc, --cpuc->n_pair; } +/* + * Because of the way BRS operates with an inactive and active phases, and + * the link to one counter, it is not possible to have two events using BRS + * scheduled at the same time. 
There would be an issue with enforcing the + * period of each one and given that the BRS saturates, it would not be possible + * to guarantee correlated content for all events. Therefore, in situations + * where multiple events want to use BRS, the kernel enforces mutual exclusion. + * Exclusion is enforced by choosing only one counter for events using BRS. + * The event scheduling logic will then automatically multiplex the + * events and ensure that at most one event is actively using BRS. + * + * The BRS counter could be any counter, but there is no constraint on Fam19h, + * therefore all counters are equal and thus we pick the first one: PMC0 + */ +static struct event_constraint amd_fam19h_brs_cntr0_constraint = + EVENT_CONSTRAINT(0, 0x1, AMD64_RAW_EVENT_MASK); + +static struct event_constraint amd_fam19h_brs_pair_cntr0_constraint = + __EVENT_CONSTRAINT(0, 0x1, AMD64_RAW_EVENT_MASK, 1, 0, PERF_X86_EVENT_PAIR); + +static struct event_constraint * +amd_get_event_constraints_f19h(struct cpu_hw_events *cpuc, int idx, + struct perf_event *event) +{ + struct hw_perf_event *hwc = &event->hw; + bool has_brs = has_amd_brs(hwc); + + /* + * In case BRS is used with an event requiring a counter pair, + * the kernel allows it but only on counters 0 & 1 to enforce + * multiplexing, which protects BRS when there are multiple + * BRS users + */ + if (amd_is_pair_event_code(hwc)) { + return has_brs ?
&amd_fam19h_brs_pair_cntr0_constraint + : &pair_constraint; + } + + if (has_brs) + return &amd_fam19h_brs_cntr0_constraint; + + return &unconstrained; +} + + static ssize_t amd_event_sysfs_show(char *page, u64 config) { u64 event = (config & ARCH_PERFMON_EVENTSEL_EVENT) | @@ -914,12 +1283,19 @@ static ssize_t amd_event_sysfs_show(char *page, u64 config) return x86_event_sysfs_show(page, config, event); } +static void amd_pmu_sched_task(struct perf_event_context *ctx, + bool sched_in) +{ + if (sched_in && x86_pmu.lbr_nr) + amd_pmu_brs_sched_task(ctx, sched_in); +} + static __initconst const struct x86_pmu amd_pmu = { .name = "AMD", .handle_irq = amd_pmu_handle_irq, .disable_all = amd_pmu_disable_all, - .enable_all = x86_pmu_enable_all, - .enable = x86_pmu_enable_event, + .enable_all = amd_pmu_enable_all, + .enable = amd_pmu_enable_event, .disable = amd_pmu_disable_event, .hw_config = amd_pmu_hw_config, .schedule_events = x86_schedule_events, @@ -929,6 +1305,8 @@ static __initconst const struct x86_pmu amd_pmu = { .event_map = amd_pmu_event_map, .max_events = ARRAY_SIZE(amd_perfmon_event_map), .num_counters = AMD64_NUM_COUNTERS, + .add = amd_pmu_add_event, + .del = amd_pmu_del_event, .cntval_bits = 48, .cntval_mask = (1ULL << 48) - 1, .apic = 1, @@ -947,8 +1325,40 @@ static __initconst const struct x86_pmu amd_pmu = { .amd_nb_constraints = 1, }; +static ssize_t branches_show(struct device *cdev, + struct device_attribute *attr, + char *buf) +{ + return snprintf(buf, PAGE_SIZE, "%d\n", x86_pmu.lbr_nr); +} + +static DEVICE_ATTR_RO(branches); + +static struct attribute *amd_pmu_brs_attrs[] = { + &dev_attr_branches.attr, + NULL, +}; + +static umode_t +amd_brs_is_visible(struct kobject *kobj, struct attribute *attr, int i) +{ + return x86_pmu.lbr_nr ? 
attr->mode : 0; +} + +static struct attribute_group group_caps_amd_brs = { + .name = "caps", + .attrs = amd_pmu_brs_attrs, + .is_visible = amd_brs_is_visible, +}; + +static const struct attribute_group *amd_attr_update[] = { + &group_caps_amd_brs, + NULL, +}; + static int __init amd_core_pmu_init(void) { + union cpuid_0x80000022_ebx ebx; u64 even_ctr_mask = 0ULL; int i; @@ -966,6 +1376,26 @@ static int __init amd_core_pmu_init(void) x86_pmu.eventsel = MSR_F15H_PERF_CTL; x86_pmu.perfctr = MSR_F15H_PERF_CTR; x86_pmu.num_counters = AMD64_NUM_COUNTERS_CORE; + + /* Check for Performance Monitoring v2 support */ + if (boot_cpu_has(X86_FEATURE_PERFMON_V2)) { + ebx.full = cpuid_ebx(EXT_PERFMON_DEBUG_FEATURES); + + /* Update PMU version for later usage */ + x86_pmu.version = 2; + + /* Find the number of available Core PMCs */ + x86_pmu.num_counters = ebx.split.num_core_pmc; + + amd_pmu_global_cntr_mask = (1ULL << x86_pmu.num_counters) - 1; + + /* Update PMC handling functions */ + x86_pmu.enable_all = amd_pmu_v2_enable_all; + x86_pmu.disable_all = amd_pmu_v2_disable_all; + x86_pmu.enable = amd_pmu_v2_enable_event; + x86_pmu.handle_irq = amd_pmu_v2_handle_irq; + } + /* * AMD Core perfctr has separate MSRs for the NB events, see * the amd/uncore.c driver. @@ -998,6 +1428,19 @@ static int __init amd_core_pmu_init(void) x86_pmu.flags |= PMU_FL_PAIR; } + /* + * BRS requires special event constraints and flushing on ctxsw. 
+ */ + if (boot_cpu_data.x86 >= 0x19 && !amd_brs_init()) { + x86_pmu.get_event_constraints = amd_get_event_constraints_f19h; + x86_pmu.sched_task = amd_pmu_sched_task; + /* + * put_event_constraints callback same as Fam17h, set above + */ + } + + x86_pmu.attr_update = amd_attr_update; + pr_cont("core perfctr, "); return 0; } @@ -1032,6 +1475,24 @@ __init int amd_pmu_init(void) return 0; } +static inline void amd_pmu_reload_virt(void) +{ + if (x86_pmu.version >= 2) { + /* + * Clear global enable bits, reprogram the PERF_CTL + * registers with updated perf_ctr_virt_mask and then + * set global enable bits once again + */ + amd_pmu_v2_disable_all(); + amd_pmu_enable_all(0); + amd_pmu_v2_enable_all(0); + return; + } + + amd_pmu_disable_all(); + amd_pmu_enable_all(0); +} + void amd_pmu_enable_virt(void) { struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); @@ -1039,8 +1500,7 @@ void amd_pmu_enable_virt(void) cpuc->perf_ctr_virt_mask = 0; /* Reload all events */ - amd_pmu_disable_all(); - x86_pmu_enable_all(0); + amd_pmu_reload_virt(); } EXPORT_SYMBOL_GPL(amd_pmu_enable_virt); @@ -1057,7 +1517,6 @@ void amd_pmu_disable_virt(void) cpuc->perf_ctr_virt_mask = AMD64_EVENTSEL_HOSTONLY; /* Reload all events */ - amd_pmu_disable_all(); - x86_pmu_enable_all(0); + amd_pmu_reload_virt(); } EXPORT_SYMBOL_GPL(amd_pmu_disable_virt); diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c index 2e930d8c04d955fea000435b608c9a611ab66a90..c251bc44c088d8382b9bfac688e1901f78642bf1 100644 --- a/arch/x86/events/amd/ibs.c +++ b/arch/x86/events/amd/ibs.c @@ -26,6 +26,7 @@ static u32 ibs_caps; #include #include +#include #define IBS_FETCH_CONFIG_MASK (IBS_FETCH_RAND_EN | IBS_FETCH_MAX_CNT) #define IBS_OP_CONFIG_MASK IBS_OP_MAX_CNT @@ -93,22 +94,9 @@ struct perf_ibs { unsigned int fetch_ignore_if_zero_rip : 1; struct cpu_perf_ibs __percpu *pcpu; - struct attribute **format_attrs; - struct attribute_group format_group; - const struct attribute_group *attr_groups[2]; - u64 
(*get_count)(u64 config); }; -struct perf_ibs_data { - u32 size; - union { - u32 data[0]; /* data buffer starts here */ - u32 caps; - }; - u64 regs[MSR_AMD64_IBS_REG_COUNT_MAX]; -}; - static int perf_event_set_period(struct hw_perf_event *hwc, u64 min, u64 max, u64 *hw_period) { @@ -339,11 +327,14 @@ static int perf_ibs_set_period(struct perf_ibs *perf_ibs, static u64 get_ibs_fetch_count(u64 config) { - return (config & IBS_FETCH_CNT) >> 12; + union ibs_fetch_ctl fetch_ctl = (union ibs_fetch_ctl)config; + + return fetch_ctl.fetch_cnt << 4; } static u64 get_ibs_op_count(u64 config) { + union ibs_op_ctl op_ctl = (union ibs_op_ctl)config; u64 count = 0; /* @@ -351,10 +342,13 @@ static u64 get_ibs_op_count(u64 config) * and the lower 7 bits of CurCnt are randomized. * Otherwise CurCnt has the full 27-bit current counter value. */ - if (config & IBS_OP_VAL) - count = (config & IBS_OP_MAX_CNT) << 4; - else if (ibs_caps & IBS_CAPS_RDWROPCNT) - count = (config & IBS_OP_CUR_CNT) >> 32; + if (op_ctl.op_val) { + count = op_ctl.opmaxcnt << 4; + if (ibs_caps & IBS_CAPS_OPCNTEXT) + count += op_ctl.opmaxcnt_ext << 20; + } else if (ibs_caps & IBS_CAPS_RDWROPCNT) { + count = op_ctl.opcurcnt; + } return count; } @@ -415,7 +409,7 @@ static void perf_ibs_start(struct perf_event *event, int flags) struct hw_perf_event *hwc = &event->hw; struct perf_ibs *perf_ibs = container_of(event->pmu, struct perf_ibs, pmu); struct cpu_perf_ibs *pcpu = this_cpu_ptr(perf_ibs->pcpu); - u64 period; + u64 period, config = 0; if (WARN_ON_ONCE(!(hwc->state & PERF_HES_STOPPED))) return; @@ -424,13 +418,19 @@ static void perf_ibs_start(struct perf_event *event, int flags) hwc->state = 0; perf_ibs_set_period(perf_ibs, hwc, &period); + if (perf_ibs == &perf_ibs_op && (ibs_caps & IBS_CAPS_OPCNTEXT)) { + config |= period & IBS_OP_MAX_CNT_EXT_MASK; + period &= ~IBS_OP_MAX_CNT_EXT_MASK; + } + config |= period >> 4; + /* * Set STARTED before enabling the hardware, such that a subsequent NMI * must observe it. 
*/ set_bit(IBS_STARTED, pcpu->state); clear_bit(IBS_STOPPING, pcpu->state); - perf_ibs_enable_event(perf_ibs, hwc, period >> 4); + perf_ibs_enable_event(perf_ibs, hwc, config); perf_event_update_userpage(event); } @@ -524,16 +524,118 @@ static void perf_ibs_del(struct perf_event *event, int flags) static void perf_ibs_read(struct perf_event *event) { } +/* + * We need to initialize with empty group if all attributes in the + * group are dynamic. + */ +static struct attribute *attrs_empty[] = { + NULL, +}; + +static struct attribute_group empty_format_group = { + .name = "format", + .attrs = attrs_empty, +}; + +static struct attribute_group empty_caps_group = { + .name = "caps", + .attrs = attrs_empty, +}; + +static const struct attribute_group *empty_attr_groups[] = { + &empty_format_group, + &empty_caps_group, + NULL, +}; + PMU_FORMAT_ATTR(rand_en, "config:57"); PMU_FORMAT_ATTR(cnt_ctl, "config:19"); +PMU_EVENT_ATTR_STRING(l3missonly, fetch_l3missonly, "config:59"); +PMU_EVENT_ATTR_STRING(l3missonly, op_l3missonly, "config:16"); +PMU_EVENT_ATTR_STRING(zen4_ibs_extensions, zen4_ibs_extensions, "1"); + +static umode_t +zen4_ibs_extensions_is_visible(struct kobject *kobj, struct attribute *attr, int i) +{ + return ibs_caps & IBS_CAPS_ZEN4 ? 
attr->mode : 0; +} -static struct attribute *ibs_fetch_format_attrs[] = { +static struct attribute *rand_en_attrs[] = { &format_attr_rand_en.attr, NULL, }; -static struct attribute *ibs_op_format_attrs[] = { - NULL, /* &format_attr_cnt_ctl.attr if IBS_CAPS_OPCNT */ +static struct attribute *fetch_l3missonly_attrs[] = { + &fetch_l3missonly.attr.attr, + NULL, +}; + +static struct attribute *zen4_ibs_extensions_attrs[] = { + &zen4_ibs_extensions.attr.attr, + NULL, +}; + +static struct attribute_group group_rand_en = { + .name = "format", + .attrs = rand_en_attrs, +}; + +static struct attribute_group group_fetch_l3missonly = { + .name = "format", + .attrs = fetch_l3missonly_attrs, + .is_visible = zen4_ibs_extensions_is_visible, +}; + +static struct attribute_group group_zen4_ibs_extensions = { + .name = "caps", + .attrs = zen4_ibs_extensions_attrs, + .is_visible = zen4_ibs_extensions_is_visible, +}; + +static const struct attribute_group *fetch_attr_groups[] = { + &group_rand_en, + &empty_caps_group, + NULL, +}; + +static const struct attribute_group *fetch_attr_update[] = { + &group_fetch_l3missonly, + &group_zen4_ibs_extensions, + NULL, +}; + +static umode_t +cnt_ctl_is_visible(struct kobject *kobj, struct attribute *attr, int i) +{ + return ibs_caps & IBS_CAPS_OPCNT ? 
attr->mode : 0; +} + +static struct attribute *cnt_ctl_attrs[] = { + &format_attr_cnt_ctl.attr, + NULL, +}; + +static struct attribute *op_l3missonly_attrs[] = { + &op_l3missonly.attr.attr, + NULL, +}; + +static struct attribute_group group_cnt_ctl = { + .name = "format", + .attrs = cnt_ctl_attrs, + .is_visible = cnt_ctl_is_visible, +}; + +static struct attribute_group group_op_l3missonly = { + .name = "format", + .attrs = op_l3missonly_attrs, + .is_visible = zen4_ibs_extensions_is_visible, +}; + +static const struct attribute_group *op_attr_update[] = { + &group_cnt_ctl, + &group_op_l3missonly, + &group_zen4_ibs_extensions, NULL, }; @@ -557,7 +659,6 @@ static struct perf_ibs perf_ibs_fetch = { .max_period = IBS_FETCH_MAX_CNT << 4, .offset_mask = { MSR_AMD64_IBSFETCH_REG_MASK }, .offset_max = MSR_AMD64_IBSFETCH_REG_COUNT, - .format_attrs = ibs_fetch_format_attrs, .get_count = get_ibs_fetch_count, }; @@ -583,7 +684,6 @@ static struct perf_ibs perf_ibs_op = { .max_period = IBS_OP_MAX_CNT << 4, .offset_mask = { MSR_AMD64_IBSOP_REG_MASK }, .offset_max = MSR_AMD64_IBSOP_REG_COUNT, - .format_attrs = ibs_op_format_attrs, .get_count = get_ibs_op_count, }; @@ -599,7 +699,7 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs) struct perf_ibs_data ibs_data; int offset, size, check_rip, offset_max, throttle = 0; unsigned int msr; - u64 *buf, *config, period; + u64 *buf, *config, period, new_config = 0; if (!test_bit(IBS_STARTED, pcpu->state)) { fail: @@ -706,13 +806,17 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs) if (throttle) { perf_ibs_stop(event, 0); } else { - period >>= 4; - - if ((ibs_caps & IBS_CAPS_RDWROPCNT) && - (*config & IBS_OP_CNT_CTL)) - period |= *config & IBS_OP_CUR_CNT_RAND; + if (perf_ibs == &perf_ibs_op) { + if (ibs_caps & IBS_CAPS_OPCNTEXT) { + new_config = period & IBS_OP_MAX_CNT_EXT_MASK; + period &= ~IBS_OP_MAX_CNT_EXT_MASK; + } + if ((ibs_caps & IBS_CAPS_RDWROPCNT) && (*config & 
IBS_OP_CNT_CTL)) + new_config |= *config & IBS_OP_CUR_CNT_RAND; + } + new_config |= period >> 4; - perf_ibs_enable_event(perf_ibs, hwc, period); + perf_ibs_enable_event(perf_ibs, hwc, new_config); } perf_event_update_userpage(event); @@ -749,17 +853,6 @@ static __init int perf_ibs_pmu_init(struct perf_ibs *perf_ibs, char *name) perf_ibs->pcpu = pcpu; - /* register attributes */ - if (perf_ibs->format_attrs[0]) { - memset(&perf_ibs->format_group, 0, sizeof(perf_ibs->format_group)); - perf_ibs->format_group.name = "format"; - perf_ibs->format_group.attrs = perf_ibs->format_attrs; - - memset(&perf_ibs->attr_groups, 0, sizeof(perf_ibs->attr_groups)); - perf_ibs->attr_groups[0] = &perf_ibs->format_group; - perf_ibs->pmu.attr_groups = perf_ibs->attr_groups; - } - ret = perf_pmu_register(&perf_ibs->pmu, name, -1); if (ret) { perf_ibs->pcpu = NULL; @@ -769,10 +862,8 @@ static __init int perf_ibs_pmu_init(struct perf_ibs *perf_ibs, char *name) return ret; } -static __init void perf_event_ibs_init(void) +static __init int perf_ibs_fetch_init(void) { - struct attribute **attr = ibs_op_format_attrs; - /* * Some chips fail to reset the fetch count when it is written; instead * they need a 0-1 transition of IbsFetchEn. 
@@ -783,21 +874,72 @@ static __init void perf_event_ibs_init(void) if (boot_cpu_data.x86 == 0x19 && boot_cpu_data.x86_model < 0x10) perf_ibs_fetch.fetch_ignore_if_zero_rip = 1; - perf_ibs_pmu_init(&perf_ibs_fetch, "ibs_fetch"); + if (ibs_caps & IBS_CAPS_ZEN4) + perf_ibs_fetch.config_mask |= IBS_FETCH_L3MISSONLY; - if (ibs_caps & IBS_CAPS_OPCNT) { + perf_ibs_fetch.pmu.attr_groups = fetch_attr_groups; + perf_ibs_fetch.pmu.attr_update = fetch_attr_update; + + return perf_ibs_pmu_init(&perf_ibs_fetch, "ibs_fetch"); +} + +static __init int perf_ibs_op_init(void) +{ + if (ibs_caps & IBS_CAPS_OPCNT) perf_ibs_op.config_mask |= IBS_OP_CNT_CTL; - *attr++ = &format_attr_cnt_ctl.attr; + + if (ibs_caps & IBS_CAPS_OPCNTEXT) { + perf_ibs_op.max_period |= IBS_OP_MAX_CNT_EXT_MASK; + perf_ibs_op.config_mask |= IBS_OP_MAX_CNT_EXT_MASK; + perf_ibs_op.cnt_mask |= IBS_OP_MAX_CNT_EXT_MASK; } - perf_ibs_pmu_init(&perf_ibs_op, "ibs_op"); - register_nmi_handler(NMI_LOCAL, perf_ibs_nmi_handler, 0, "perf_ibs"); + if (ibs_caps & IBS_CAPS_ZEN4) + perf_ibs_op.config_mask |= IBS_OP_L3MISSONLY; + + perf_ibs_op.pmu.attr_groups = empty_attr_groups; + perf_ibs_op.pmu.attr_update = op_attr_update; + + return perf_ibs_pmu_init(&perf_ibs_op, "ibs_op"); +} + +static __init int perf_event_ibs_init(void) +{ + int ret; + + ret = perf_ibs_fetch_init(); + if (ret) + return ret; + + ret = perf_ibs_op_init(); + if (ret) + goto err_op; + + ret = register_nmi_handler(NMI_LOCAL, perf_ibs_nmi_handler, 0, "perf_ibs"); + if (ret) + goto err_nmi; + pr_info("perf: AMD IBS detected (0x%08x)\n", ibs_caps); + return 0; + +err_nmi: + perf_pmu_unregister(&perf_ibs_op.pmu); + free_percpu(perf_ibs_op.pcpu); + perf_ibs_op.pcpu = NULL; +err_op: + perf_pmu_unregister(&perf_ibs_fetch.pmu); + free_percpu(perf_ibs_fetch.pcpu); + perf_ibs_fetch.pcpu = NULL; + + return ret; } #else /* defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_AMD) */ -static __init void perf_event_ibs_init(void) { } +static __init int 
perf_event_ibs_init(void) +{ + return 0; +} #endif @@ -1067,9 +1209,7 @@ static __init int amd_ibs_init(void) x86_pmu_amd_ibs_starting_cpu, x86_pmu_amd_ibs_dying_cpu); - perf_event_ibs_init(); - - return 0; + return perf_event_ibs_init(); } /* Since we need the pci subsystem to init ibs we can't do this earlier: */ diff --git a/arch/x86/events/amd/uncore.c b/arch/x86/events/amd/uncore.c index af14e78a1697a4ae9080e99396bb272797eeb79b..2d7840ea0743f2afc47c10b2afa66c39b6fb1068 100644 --- a/arch/x86/events/amd/uncore.c +++ b/arch/x86/events/amd/uncore.c @@ -21,7 +21,6 @@ #define NUM_COUNTERS_NB 4 #define NUM_COUNTERS_L2 4 #define NUM_COUNTERS_L3 6 -#define MAX_COUNTERS 6 #define RDPMC_BASE_NB 6 #define RDPMC_BASE_LLC 10 @@ -31,6 +30,7 @@ #undef pr_fmt #define pr_fmt(fmt) "amd_uncore: " fmt +static int pmu_version; static int num_counters_llc; static int num_counters_nb; static bool l3_mask; @@ -46,7 +46,7 @@ struct amd_uncore { u32 msr_base; cpumask_t *active_mask; struct pmu *pmu; - struct perf_event *events[MAX_COUNTERS]; + struct perf_event **events; struct hlist_node node; }; @@ -158,6 +158,16 @@ static int amd_uncore_add(struct perf_event *event, int flags) hwc->event_base_rdpmc = uncore->rdpmc_base + hwc->idx; hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED; + /* + * The first four DF counters are accessible via RDPMC index 6 to 9 + * followed by the L3 counters from index 10 to 15. For processors + * with more than four DF counters, the DF RDPMC assignments become + * discontiguous as the additional counters are accessible starting + * from index 16. 
+ */ + if (is_nb_event(event) && hwc->idx >= NUM_COUNTERS_NB) + hwc->event_base_rdpmc += NUM_COUNTERS_L3; + if (flags & PERF_EF_START) amd_uncore_start(event, PERF_EF_RELOAD); @@ -238,6 +248,9 @@ static int amd_uncore_event_init(struct perf_event *event) boot_cpu_data.x86_model == 0x10) event_mask = HYGON_F18H_M6H_RAW_EVENT_MASK_NB; } + if (pmu_version >= 2 && is_nb_event(event)) + event_mask = AMD64_PERFMON_V2_RAW_EVENT_MASK_NB; + /* * NB and Last level cache counters (MSRs) are shared across all cores * that share the same NB / Last level cache. On family 16h and below, @@ -280,6 +293,19 @@ hygon_f18h_m6h_uncore_is_visible(struct kobject *kobj, struct attribute *attr, i attr->mode : 0; } +static umode_t +amd_f17h_uncore_is_visible(struct kobject *kobj, struct attribute *attr, int i) +{ + return boot_cpu_data.x86 >= 0x17 && boot_cpu_data.x86 < 0x19 ? + attr->mode : 0; +} + +static umode_t +amd_f19h_uncore_is_visible(struct kobject *kobj, struct attribute *attr, int i) +{ + return boot_cpu_data.x86 >= 0x19 ? 
attr->mode : 0; +} + static ssize_t amd_uncore_attr_show_cpumask(struct device *dev, struct device_attribute *attr, char *buf) @@ -320,11 +346,13 @@ static struct device_attribute format_attr_##_var = \ DEFINE_UNCORE_FORMAT_ATTR(event12, event, "config:0-7,32-35"); DEFINE_UNCORE_FORMAT_ATTR(event14, event, "config:0-7,32-35,59-60"); /* F17h+ DF */ +DEFINE_UNCORE_FORMAT_ATTR(event14v2, event, "config:0-7,32-37"); /* PerfMonV2 DF */ DEFINE_UNCORE_FORMAT_ATTR(event14f18h, event, "config:0-7,32-35,61-62"); /* F18h DF */ DEFINE_UNCORE_FORMAT_ATTR(event8, event, "config:0-7"); /* F17h+ L3 */ -DEFINE_UNCORE_FORMAT_ATTR(umask, umask, "config:8-15"); DEFINE_UNCORE_FORMAT_ATTR(umask10f18h, umask, "config:8-17"); /* F18h M4h DF */ DEFINE_UNCORE_FORMAT_ATTR(umask12f18h, umask, "config:8-19"); /* F18h M6h DF */ +DEFINE_UNCORE_FORMAT_ATTR(umask8, umask, "config:8-15"); +DEFINE_UNCORE_FORMAT_ATTR(umask12, umask, "config:8-15,24-27"); /* PerfMonV2 DF */ DEFINE_UNCORE_FORMAT_ATTR(coreid, coreid, "config:42-44"); /* F19h L3 */ DEFINE_UNCORE_FORMAT_ATTR(slicemask, slicemask, "config:48-51"); /* F17h L3 */ DEFINE_UNCORE_FORMAT_ATTR(threadmask8, threadmask, "config:56-63"); /* F17h L3 */ @@ -335,20 +363,33 @@ DEFINE_UNCORE_FORMAT_ATTR(sliceid, sliceid, "config:48-50"); /* F19h L3 */ DEFINE_UNCORE_FORMAT_ATTR(slicemask4, slicemask, "config:28-31"); /* F18h L3 */ DEFINE_UNCORE_FORMAT_ATTR(threadmask32, threadmask, "config:32-63"); /* F18h L3 */ +/* Common DF and NB attributes */ static struct attribute *amd_uncore_df_format_attr[] = { - &format_attr_event12.attr, /* event14 if F17h+ */ - &format_attr_umask.attr, + &format_attr_event12.attr, /* event */ + &format_attr_umask8.attr, /* umask */ NULL, }; +/* Common L2 and L3 attributes */ static struct attribute *amd_uncore_l3_format_attr[] = { - &format_attr_event12.attr, /* event8 if F17h+ */ - &format_attr_umask.attr, - NULL, /* slicemask if F17h, coreid if F19h */ - NULL, /* threadmask8 if F17h, enallslices if F19h */ - NULL, /* 
enallcores if F19h */ - NULL, /* sliceid if F19h */ - NULL, /* threadmask2 if F19h */ + &format_attr_event12.attr, /* event */ + &format_attr_umask8.attr, /* umask */ + NULL, /* threadmask */ + NULL, +}; + +/* F17h unique L3 attributes */ +static struct attribute *amd_f17h_uncore_l3_format_attr[] = { + &format_attr_slicemask.attr, /* slicemask */ + NULL, +}; + +/* F19h unique L3 attributes */ +static struct attribute *amd_f19h_uncore_l3_format_attr[] = { + &format_attr_coreid.attr, /* coreid */ + &format_attr_enallslices.attr, /* enallslices */ + &format_attr_enallcores.attr, /* enallcores */ + &format_attr_sliceid.attr, /* sliceid */ NULL, }; @@ -374,6 +415,18 @@ static struct attribute_group hygon_f18h_m6h_uncore_l3_format_group = { .is_visible = hygon_f18h_m6h_uncore_is_visible, }; +static struct attribute_group amd_f17h_uncore_l3_format_group = { + .name = "format", + .attrs = amd_f17h_uncore_l3_format_attr, + .is_visible = amd_f17h_uncore_is_visible, +}; + +static struct attribute_group amd_f19h_uncore_l3_format_group = { + .name = "format", + .attrs = amd_f19h_uncore_l3_format_attr, + .is_visible = amd_f19h_uncore_is_visible, +}; + static const struct attribute_group *amd_uncore_df_attr_groups[] = { &amd_uncore_attr_group, &amd_uncore_df_format_group, @@ -391,6 +444,12 @@ static const struct attribute_group *hygon_uncore_l3_attr_update[] = { NULL, }; +static const struct attribute_group *amd_uncore_l3_attr_update[] = { + &amd_f17h_uncore_l3_format_group, + &amd_f19h_uncore_l3_format_group, + NULL, +}; + static struct pmu amd_nb_pmu = { .task_ctx_nr = perf_invalid_context, .attr_groups = amd_uncore_df_attr_groups, @@ -402,11 +461,13 @@ static struct pmu amd_nb_pmu = { .stop = amd_uncore_stop, .read = amd_uncore_read, .capabilities = PERF_PMU_CAP_NO_EXCLUDE | PERF_PMU_CAP_NO_INTERRUPT, + .module = THIS_MODULE, }; static struct pmu amd_llc_pmu = { .task_ctx_nr = perf_invalid_context, .attr_groups = amd_uncore_l3_attr_groups, + .attr_update = 
amd_uncore_l3_attr_update, .name = "amd_l2", .event_init = amd_uncore_event_init, .add = amd_uncore_add, @@ -415,6 +476,7 @@ static struct pmu amd_llc_pmu = { .stop = amd_uncore_stop, .read = amd_uncore_read, .capabilities = PERF_PMU_CAP_NO_EXCLUDE | PERF_PMU_CAP_NO_INTERRUPT, + .module = THIS_MODULE, }; static struct amd_uncore *amd_uncore_alloc(unsigned int cpu) @@ -423,11 +485,19 @@ static struct amd_uncore *amd_uncore_alloc(unsigned int cpu) cpu_to_node(cpu)); } +static inline struct perf_event ** +amd_uncore_events_alloc(unsigned int num, unsigned int cpu) +{ + return kzalloc_node(sizeof(struct perf_event *) * num, GFP_KERNEL, + cpu_to_node(cpu)); +} + static int amd_uncore_cpu_up_prepare(unsigned int cpu) { - struct amd_uncore *uncore_nb = NULL, *uncore_llc; + struct amd_uncore *uncore_nb = NULL, *uncore_llc = NULL; if (amd_uncore_nb) { + *per_cpu_ptr(amd_uncore_nb, cpu) = NULL; uncore_nb = amd_uncore_alloc(cpu); if (!uncore_nb) goto fail; @@ -437,11 +507,15 @@ static int amd_uncore_cpu_up_prepare(unsigned int cpu) uncore_nb->msr_base = MSR_F15H_NB_PERF_CTL; uncore_nb->active_mask = &amd_nb_active_mask; uncore_nb->pmu = &amd_nb_pmu; + uncore_nb->events = amd_uncore_events_alloc(num_counters_nb, cpu); + if (!uncore_nb->events) + goto fail; uncore_nb->id = -1; *per_cpu_ptr(amd_uncore_nb, cpu) = uncore_nb; } if (amd_uncore_llc) { + *per_cpu_ptr(amd_uncore_llc, cpu) = NULL; uncore_llc = amd_uncore_alloc(cpu); if (!uncore_llc) goto fail; @@ -451,6 +525,9 @@ static int amd_uncore_cpu_up_prepare(unsigned int cpu) uncore_llc->msr_base = MSR_F16H_L2I_PERF_CTL; uncore_llc->active_mask = &amd_llc_active_mask; uncore_llc->pmu = &amd_llc_pmu; + uncore_llc->events = amd_uncore_events_alloc(num_counters_llc, cpu); + if (!uncore_llc->events) + goto fail; uncore_llc->id = -1; *per_cpu_ptr(amd_uncore_llc, cpu) = uncore_llc; } @@ -458,9 +535,16 @@ static int amd_uncore_cpu_up_prepare(unsigned int cpu) return 0; fail: - if (amd_uncore_nb) - *per_cpu_ptr(amd_uncore_nb, cpu) = 
NULL; - kfree(uncore_nb); + if (uncore_nb) { + kfree(uncore_nb->events); + kfree(uncore_nb); + } + + if (uncore_llc) { + kfree(uncore_llc->events); + kfree(uncore_llc); + } + return -ENOMEM; } @@ -593,8 +677,11 @@ static void uncore_dead(unsigned int cpu, struct amd_uncore * __percpu *uncores) if (cpu == uncore->cpu) cpumask_clear_cpu(cpu, uncore->active_mask); - if (!--uncore->refcnt) + if (!--uncore->refcnt) { + kfree(uncore->events); kfree(uncore); + } + *per_cpu_ptr(uncores, cpu) = NULL; } @@ -613,6 +700,7 @@ static int __init amd_uncore_init(void) { struct attribute **df_attr = amd_uncore_df_format_attr; struct attribute **l3_attr = amd_uncore_l3_format_attr; + union cpuid_0x80000022_ebx ebx; int ret = -ENODEV; if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD && @@ -622,6 +710,9 @@ static int __init amd_uncore_init(void) if (!boot_cpu_has(X86_FEATURE_TOPOEXT)) return -ENODEV; + if (boot_cpu_has(X86_FEATURE_PERFMON_V2)) + pmu_version = 2; + num_counters_nb = NUM_COUNTERS_NB; num_counters_llc = NUM_COUNTERS_L2; if (boot_cpu_data.x86 >= 0x17) { @@ -638,9 +729,13 @@ static int __init amd_uncore_init(void) } if (boot_cpu_has(X86_FEATURE_PERFCTR_NB)) { - if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD && - boot_cpu_data.x86 >= 0x17) { - *df_attr = &format_attr_event14.attr; + if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) { + if (pmu_version >= 2) { + *df_attr++ = &format_attr_event14v2.attr; + *df_attr++ = &format_attr_umask12.attr; + } else if (boot_cpu_data.x86 >= 0x17) { + *df_attr = &format_attr_event14.attr; + } } else if (boot_cpu_data.x86_vendor == X86_VENDOR_HYGON && boot_cpu_data.x86 == 0x18) { *df_attr++ = &format_attr_event14f18h.attr; @@ -663,31 +758,32 @@ static int __init amd_uncore_init(void) if (ret) goto fail_nb; - pr_info("%s NB counters detected\n", - boot_cpu_data.x86_vendor == X86_VENDOR_HYGON ?
- "HYGON" : "AMD"); + if (pmu_version >= 2) { + ebx.full = cpuid_ebx(EXT_PERFMON_DEBUG_FEATURES); + num_counters_nb = ebx.split.num_df_pmc; + } + + pr_info("%d %s %s counters detected\n", num_counters_nb, + boot_cpu_data.x86_vendor == X86_VENDOR_HYGON ? "HYGON" : "", + amd_nb_pmu.name); + ret = 0; } if (boot_cpu_has(X86_FEATURE_PERFCTR_LLC)) { if (boot_cpu_data.x86 >= 0x19) { *l3_attr++ = &format_attr_event8.attr; - *l3_attr++ = &format_attr_umask.attr; - *l3_attr++ = &format_attr_coreid.attr; - *l3_attr++ = &format_attr_enallslices.attr; - *l3_attr++ = &format_attr_enallcores.attr; - *l3_attr++ = &format_attr_sliceid.attr; + *l3_attr++ = &format_attr_umask8.attr; *l3_attr++ = &format_attr_threadmask2.attr; } else if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD && boot_cpu_data.x86 >= 0x17) { *l3_attr++ = &format_attr_event8.attr; - *l3_attr++ = &format_attr_umask.attr; - *l3_attr++ = &format_attr_slicemask.attr; + *l3_attr++ = &format_attr_umask8.attr; *l3_attr++ = &format_attr_threadmask8.attr; } else if (boot_cpu_data.x86_vendor == X86_VENDOR_HYGON && boot_cpu_data.x86 == 0x18) { *l3_attr++ = &format_attr_event8.attr; - *l3_attr++ = &format_attr_umask.attr; + *l3_attr++ = &format_attr_umask8.attr; if (boot_cpu_data.x86_model >= 0x6 && boot_cpu_data.x86_model <= 0xf) { *l3_attr++ = &format_attr_threadmask32.attr; amd_llc_pmu.attr_update = hygon_uncore_l3_attr_update; @@ -706,9 +802,9 @@ static int __init amd_uncore_init(void) if (ret) goto fail_llc; - pr_info("%s LLC counters detected\n", - boot_cpu_data.x86_vendor == X86_VENDOR_HYGON ? - "HYGON" : "AMD"); + pr_info("%d %s %s counters detected\n", num_counters_llc, + boot_cpu_data.x86_vendor == X86_VENDOR_HYGON ? 
"HYGON" : "", + amd_llc_pmu.name); ret = 0; } @@ -746,4 +842,28 @@ static int __init amd_uncore_init(void) return ret; } -device_initcall(amd_uncore_init); + +static void __exit amd_uncore_exit(void) +{ + cpuhp_remove_state(CPUHP_AP_PERF_X86_AMD_UNCORE_ONLINE); + cpuhp_remove_state(CPUHP_AP_PERF_X86_AMD_UNCORE_STARTING); + cpuhp_remove_state(CPUHP_PERF_X86_AMD_UNCORE_PREP); + + if (boot_cpu_has(X86_FEATURE_PERFCTR_LLC)) { + perf_pmu_unregister(&amd_llc_pmu); + free_percpu(amd_uncore_llc); + amd_uncore_llc = NULL; + } + + if (boot_cpu_has(X86_FEATURE_PERFCTR_NB)) { + perf_pmu_unregister(&amd_nb_pmu); + free_percpu(amd_uncore_nb); + amd_uncore_nb = NULL; + } +} + +module_init(amd_uncore_init); +module_exit(amd_uncore_exit); + +MODULE_DESCRIPTION("AMD Uncore Driver"); +MODULE_LICENSE("GPL v2"); diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index 6f32db1a251d17c6c5ed79524d8f1d145d2107db..0388280b0004318f31835155b7fec208a42ad9cd 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -223,6 +223,8 @@ static bool check_hw_exists(void) if (ret) goto msr_fail; for (i = 0; i < x86_pmu.num_counters_fixed; i++) { + if (fixed_counter_disabled(i)) + continue; if (val & (0x03 << i*4)) { bios_fail = 1; val_fail = val; @@ -1259,6 +1261,10 @@ static void x86_pmu_enable(struct pmu *pmu) if (hwc->state & PERF_HES_ARCH) continue; + /* + * if cpuc->enabled = 0, then no wrmsr as + * per x86_pmu_enable_event() + */ x86_pmu_start(event, PERF_EF_RELOAD); } cpuc->n_added = 0; @@ -1501,6 +1507,8 @@ void perf_event_print_debug(void) cpu, idx, prev_left); } for (idx = 0; idx < x86_pmu.num_counters_fixed; idx++) { + if (fixed_counter_disabled(idx)) + continue; rdmsrl(MSR_ARCH_PERFMON_FIXED_CTR0 + idx, pmc_count); pr_info("CPU#%d: fixed-PMC%d count: %016llx\n", @@ -1624,11 +1632,15 @@ int x86_pmu_handle_irq(struct pt_regs *regs) * event overflow */ handled++; - perf_sample_data_init(&data, 0, event->hw.last_period); if (!x86_perf_event_set_period(event)) continue; + 
perf_sample_data_init(&data, 0, event->hw.last_period); + + if (has_branch_stack(event)) + data.br_stack = &cpuc->lbr_stack; + if (perf_event_overflow(event, &data, regs)) x86_pmu_stop(event, 0); } @@ -1878,6 +1890,20 @@ ssize_t x86_event_sysfs_show(char *page, u64 config, u64 event) static struct attribute_group x86_pmu_attr_group; static struct attribute_group x86_pmu_caps_group; +void x86_pmu_show_pmu_cap(int num_counters, int num_counters_fixed, + u64 intel_ctrl) +{ + pr_info("... version: %d\n", x86_pmu.version); + pr_info("... bit width: %d\n", x86_pmu.cntval_bits); + pr_info("... generic registers: %d\n", num_counters); + pr_info("... value mask: %016Lx\n", x86_pmu.cntval_mask); + pr_info("... max period: %016Lx\n", x86_pmu.max_period); + pr_info("... fixed-purpose events: %lu\n", + hweight64((((1ULL << num_counters_fixed) - 1) + << INTEL_PMC_IDX_FIXED) & intel_ctrl)); + pr_info("... event mask: %016Lx\n", intel_ctrl); +} + static int __init init_hw_perf_events(void) { struct x86_pmu_quirk *quirk; @@ -1934,13 +1960,8 @@ static int __init init_hw_perf_events(void) pmu.attr_update = x86_pmu.attr_update; - pr_info("... version: %d\n", x86_pmu.version); - pr_info("... bit width: %d\n", x86_pmu.cntval_bits); - pr_info("... generic registers: %d\n", x86_pmu.num_counters); - pr_info("... value mask: %016Lx\n", x86_pmu.cntval_mask); - pr_info("... max period: %016Lx\n", x86_pmu.max_period); - pr_info("... fixed-purpose events: %d\n", x86_pmu.num_counters_fixed); - pr_info("... event mask: %016Lx\n", x86_pmu.intel_ctrl); + x86_pmu_show_pmu_cap(x86_pmu.num_counters, x86_pmu.num_counters_fixed, + x86_pmu.intel_ctrl); /* * Install callbacks. 
Core will call them for each online @@ -2219,7 +2240,7 @@ static int x86_pmu_event_init(struct perf_event *event) if (READ_ONCE(x86_pmu.attr_rdpmc) && !(event->hw.flags & PERF_X86_EVENT_LARGE_PEBS)) - event->hw.flags |= PERF_X86_EVENT_RDPMC_ALLOWED; + event->hw.flags |= PERF_EVENT_FLAG_USER_READ_CNT; return err; } @@ -2231,7 +2252,7 @@ static void refresh_pce(void *ignored) static void x86_pmu_event_mapped(struct perf_event *event, struct mm_struct *mm) { - if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED)) + if (!(event->hw.flags & PERF_EVENT_FLAG_USER_READ_CNT)) return; /* @@ -2253,7 +2274,7 @@ static void x86_pmu_event_mapped(struct perf_event *event, struct mm_struct *mm) static void x86_pmu_event_unmapped(struct perf_event *event, struct mm_struct *mm) { - if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED)) + if (!(event->hw.flags & PERF_EVENT_FLAG_USER_READ_CNT)) return; if (atomic_dec_and_test(&mm->context.perf_rdpmc_allowed)) @@ -2264,7 +2285,7 @@ static int x86_pmu_event_idx(struct perf_event *event) { struct hw_perf_event *hwc = &event->hw; - if (!(hwc->flags & PERF_X86_EVENT_RDPMC_ALLOWED)) + if (!(hwc->flags & PERF_EVENT_FLAG_USER_READ_CNT)) return 0; if (is_metric_idx(hwc->idx)) @@ -2427,7 +2448,7 @@ void arch_perf_update_userpage(struct perf_event *event, userpg->cap_user_time = 0; userpg->cap_user_time_zero = 0; userpg->cap_user_rdpmc = - !!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED); + !!(event->hw.flags & PERF_EVENT_FLAG_USER_READ_CNT); userpg->pmc_width = x86_pmu.cntval_bits; if (!using_native_sched_clock() || !sched_clock_stable()) diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c index bf891caafa3c068bcf67130c1151ddce47ccdb54..7a0cb583ddbeac11db4438fa3960fd75155912f9 100644 --- a/arch/x86/events/intel/core.c +++ b/arch/x86/events/intel/core.c @@ -2554,8 +2554,11 @@ static void intel_pmu_reset(void) wrmsrl_safe(x86_pmu_config_addr(idx), 0ull); wrmsrl_safe(x86_pmu_event_addr(idx), 0ull); } - for (idx = 0; idx < 
x86_pmu.num_counters_fixed; idx++) + for (idx = 0; idx < x86_pmu.num_counters_fixed; idx++) { + if (fixed_counter_disabled(idx)) + continue; wrmsrl_safe(MSR_ARCH_PERFMON_FIXED_CTR0 + idx, 0ull); + } if (ds) ds->bts_index = ds->bts_buffer_base; @@ -4277,6 +4280,11 @@ static __initconst const struct x86_pmu core_pmu = { .cpu_dead = intel_pmu_cpu_dead, .check_period = intel_pmu_check_period, + + .lbr_reset = intel_pmu_lbr_reset_64, + .lbr_read = intel_pmu_lbr_read_64, + .lbr_save = intel_pmu_lbr_save, + .lbr_restore = intel_pmu_lbr_restore, }; static __initconst const struct x86_pmu intel_pmu = { @@ -4321,6 +4329,11 @@ static __initconst const struct x86_pmu intel_pmu = { .check_period = intel_pmu_check_period, .aux_output_match = intel_pmu_aux_output_match, + + .lbr_reset = intel_pmu_lbr_reset_64, + .lbr_read = intel_pmu_lbr_read_64, + .lbr_save = intel_pmu_lbr_save, + .lbr_restore = intel_pmu_lbr_restore, }; static __init void intel_clovertown_quirk(void) @@ -4897,7 +4910,7 @@ __init int intel_pmu_init(void) union cpuid10_eax eax; union cpuid10_ebx ebx; struct event_constraint *c; - unsigned int unused; + unsigned int fixed_mask; struct extra_reg *er; bool pmem = false; int version, i; @@ -4919,7 +4932,7 @@ __init int intel_pmu_init(void) * Check whether the Architectural PerfMon supports * Branch Misses Retired hw_event or not. 
*/ - cpuid(10, &eax.full, &ebx.full, &unused, &edx.full); + cpuid(10, &eax.full, &ebx.full, &fixed_mask, &edx.full); if (eax.split.mask_length < ARCH_PERFMON_EVENTS_COUNT) return -ENODEV; @@ -4943,12 +4956,15 @@ __init int intel_pmu_init(void) * Quirk: v2 perfmon does not report fixed-purpose events, so * assume at least 3 events, when not running in a hypervisor: */ - if (version > 1) { + if (version > 1 && version < 5) { int assume = 3 * !boot_cpu_has(X86_FEATURE_HYPERVISOR); x86_pmu.num_counters_fixed = max((int)edx.split.num_counters_fixed, assume); - } + + fixed_mask = (1L << x86_pmu.num_counters_fixed) - 1; + } else if (version >= 5) + x86_pmu.num_counters_fixed = fls(fixed_mask); if (version >= 4) x86_pmu.counter_freezing = !disable_counter_freezing; @@ -4960,6 +4976,14 @@ __init int intel_pmu_init(void) x86_pmu.intel_cap.capabilities = capabilities; } + if (x86_pmu.intel_cap.lbr_format == LBR_FORMAT_32) { + x86_pmu.lbr_reset = intel_pmu_lbr_reset_32; + x86_pmu.lbr_read = intel_pmu_lbr_read_32; + } + + if (boot_cpu_has(X86_FEATURE_ARCH_LBR)) + intel_pmu_arch_lbr_init(); + intel_ds_init(); x86_add_quirk(intel_arch_events_quirk); /* Install first, so it runs last */ @@ -5486,8 +5510,7 @@ __init int intel_pmu_init(void) x86_pmu.num_counters_fixed = INTEL_PMC_MAX_FIXED; } - x86_pmu.intel_ctrl |= - ((1LL << x86_pmu.num_counters_fixed)-1) << INTEL_PMC_IDX_FIXED; + x86_pmu.intel_ctrl |= (u64)fixed_mask << INTEL_PMC_IDX_FIXED; if (x86_pmu.event_constraints) { /* @@ -5500,13 +5523,22 @@ __init int intel_pmu_init(void) * events to the generic counters. */ if (c->idxmsk64 & INTEL_PMC_MSK_TOPDOWN) { + /* + * Disable topdown slots and metrics events, + * if slots event is not in CPUID. 
*/ + if (!(INTEL_PMC_MSK_FIXED_SLOTS & x86_pmu.intel_ctrl)) + c->idxmsk64 = 0; c->weight = hweight64(c->idxmsk64); continue; } - if (c->cmask == FIXED_EVENT_FLAGS - && c->idxmsk64 != INTEL_PMC_MSK_FIXED_REF_CYCLES) { - c->idxmsk64 |= (1ULL << x86_pmu.num_counters) - 1; + if (c->cmask == FIXED_EVENT_FLAGS) { + /* Disable fixed counters that are not in CPUID */ + c->idxmsk64 &= x86_pmu.intel_ctrl; + + if (c->idxmsk64 != INTEL_PMC_MSK_FIXED_REF_CYCLES) + c->idxmsk64 |= (1ULL << x86_pmu.num_counters) - 1; } c->idxmsk64 &= ~(~0ULL << (INTEL_PMC_IDX_FIXED + x86_pmu.num_counters_fixed)); diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c index 6da90c6f33903a92a216a4a32a830ed6d1dd71b0..0fc8ce368e8811c88da123af3d022658cbf06b8a 100644 --- a/arch/x86/events/intel/ds.c +++ b/arch/x86/events/intel/ds.c @@ -956,14 +956,16 @@ static void adaptive_pebs_record_size_update(void) if (pebs_data_cfg & PEBS_DATACFG_XMMS) sz += sizeof(struct pebs_xmm); if (pebs_data_cfg & PEBS_DATACFG_LBRS) - sz += x86_pmu.lbr_nr * sizeof(struct pebs_lbr_entry); + sz += x86_pmu.lbr_nr * sizeof(struct lbr_entry); cpuc->pebs_record_size = sz; } #define PERF_PEBS_MEMINFO_TYPE (PERF_SAMPLE_ADDR | PERF_SAMPLE_DATA_SRC | \ - PERF_SAMPLE_PHYS_ADDR | PERF_SAMPLE_WEIGHT | \ - PERF_SAMPLE_TRANSACTION) + PERF_SAMPLE_PHYS_ADDR | \ + PERF_SAMPLE_WEIGHT_TYPE | \ + PERF_SAMPLE_TRANSACTION | \ + PERF_SAMPLE_DATA_PAGE_SIZE) static u64 pebs_update_adaptive_cfg(struct perf_event *event) { @@ -988,7 +990,7 @@ static u64 pebs_update_adaptive_cfg(struct perf_event *event) gprs = (sample_type & PERF_SAMPLE_REGS_INTR) && (attr->sample_regs_intr & PEBS_GP_REGS); - tsx_weight = (sample_type & PERF_SAMPLE_WEIGHT) && + tsx_weight = (sample_type & PERF_SAMPLE_WEIGHT_TYPE) && ((attr->config & INTEL_ARCH_EVENT_MASK) == x86_pmu.rtm_abort_event); @@ -1339,6 +1341,10 @@ static u64 get_data_src(struct perf_event *event, u64 aux) return val; } +#define PERF_SAMPLE_ADDR_TYPE (PERF_SAMPLE_ADDR | \ +
PERF_SAMPLE_PHYS_ADDR | \ + PERF_SAMPLE_DATA_PAGE_SIZE) + static void setup_pebs_fixed_sample_data(struct perf_event *event, struct pt_regs *iregs, void *__pebs, struct perf_sample_data *data, @@ -1366,8 +1372,8 @@ static void setup_pebs_fixed_sample_data(struct perf_event *event, /* * Use latency for weight (only avail with PEBS-LL) */ - if (fll && (sample_type & PERF_SAMPLE_WEIGHT)) - data->weight = pebs->lat; + if (fll && (sample_type & PERF_SAMPLE_WEIGHT_TYPE)) + data->weight.full = pebs->lat; /* * data.data_src encodes the data source @@ -1453,14 +1459,14 @@ static void setup_pebs_fixed_sample_data(struct perf_event *event, } - if ((sample_type & (PERF_SAMPLE_ADDR | PERF_SAMPLE_PHYS_ADDR)) && + if ((sample_type & PERF_SAMPLE_ADDR_TYPE) && x86_pmu.intel_cap.pebs_format >= 1) data->addr = pebs->dla; if (x86_pmu.intel_cap.pebs_format >= 2) { /* Only set the TSX weight when no memory weight. */ - if ((sample_type & PERF_SAMPLE_WEIGHT) && !fll) - data->weight = intel_get_tsx_weight(pebs->tsx_tuning); + if ((sample_type & PERF_SAMPLE_WEIGHT_TYPE) && !fll) + data->weight.full = intel_get_tsx_weight(pebs->tsx_tuning); if (sample_type & PERF_SAMPLE_TRANSACTION) data->txn = intel_get_tsx_transaction(pebs->tsx_tuning, @@ -1574,14 +1580,14 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event, } if (format_size & PEBS_DATACFG_MEMINFO) { - if (sample_type & PERF_SAMPLE_WEIGHT) - data->weight = meminfo->latency ?: + if (sample_type & PERF_SAMPLE_WEIGHT_TYPE) + data->weight.full = meminfo->latency ?: intel_get_tsx_weight(meminfo->tsx_tuning); if (sample_type & PERF_SAMPLE_DATA_SRC) data->data_src.val = get_data_src(event, meminfo->aux); - if (sample_type & (PERF_SAMPLE_ADDR | PERF_SAMPLE_PHYS_ADDR)) + if (sample_type & PERF_SAMPLE_ADDR_TYPE) data->addr = meminfo->address; if (sample_type & PERF_SAMPLE_TRANSACTION) @@ -1597,10 +1603,10 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event, } if (format_size & PEBS_DATACFG_LBRS) { - struct 
pebs_lbr *lbr = next_record; + struct lbr_entry *lbr = next_record; int num_lbr = ((format_size >> PEBS_DATACFG_LBR_SHIFT) & 0xff) + 1; - next_record = next_record + num_lbr*sizeof(struct pebs_lbr_entry); + next_record = next_record + num_lbr * sizeof(struct lbr_entry); if (has_branch_stack(event)) { intel_pmu_store_pebs_lbrs(lbr); diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c index 73dbd30f82d33ffb2cf398f1852c8dca930f421f..aaf3852d26da588aa0df927e8997e13433d733a5 100644 --- a/arch/x86/events/intel/lbr.c +++ b/arch/x86/events/intel/lbr.c @@ -8,17 +8,6 @@ #include "../perf_event.h" -enum { - LBR_FORMAT_32 = 0x00, - LBR_FORMAT_LIP = 0x01, - LBR_FORMAT_EIP = 0x02, - LBR_FORMAT_EIP_FLAGS = 0x03, - LBR_FORMAT_EIP_FLAGS2 = 0x04, - LBR_FORMAT_INFO = 0x05, - LBR_FORMAT_TIME = 0x06, - LBR_FORMAT_MAX_KNOWN = LBR_FORMAT_TIME, -}; - static const enum { LBR_EIP_FLAGS = 1, LBR_TSX = 2, @@ -143,8 +132,54 @@ enum { X86_BR_IRQ |\ X86_BR_INT) +/* + * Intel LBR_CTL bits + * + * Hardware branch filter for Arch LBR + */ +#define ARCH_LBR_KERNEL_BIT 1 /* capture at ring0 */ +#define ARCH_LBR_USER_BIT 2 /* capture at ring > 0 */ +#define ARCH_LBR_CALL_STACK_BIT 3 /* enable call stack */ +#define ARCH_LBR_JCC_BIT 16 /* capture conditional branches */ +#define ARCH_LBR_REL_JMP_BIT 17 /* capture relative jumps */ +#define ARCH_LBR_IND_JMP_BIT 18 /* capture indirect jumps */ +#define ARCH_LBR_REL_CALL_BIT 19 /* capture relative calls */ +#define ARCH_LBR_IND_CALL_BIT 20 /* capture indirect calls */ +#define ARCH_LBR_RETURN_BIT 21 /* capture near returns */ +#define ARCH_LBR_OTHER_BRANCH_BIT 22 /* capture other branches */ + +#define ARCH_LBR_KERNEL (1ULL << ARCH_LBR_KERNEL_BIT) +#define ARCH_LBR_USER (1ULL << ARCH_LBR_USER_BIT) +#define ARCH_LBR_CALL_STACK (1ULL << ARCH_LBR_CALL_STACK_BIT) +#define ARCH_LBR_JCC (1ULL << ARCH_LBR_JCC_BIT) +#define ARCH_LBR_REL_JMP (1ULL << ARCH_LBR_REL_JMP_BIT) +#define ARCH_LBR_IND_JMP (1ULL << ARCH_LBR_IND_JMP_BIT) +#define 
ARCH_LBR_REL_CALL (1ULL << ARCH_LBR_REL_CALL_BIT) +#define ARCH_LBR_IND_CALL (1ULL << ARCH_LBR_IND_CALL_BIT) +#define ARCH_LBR_RETURN (1ULL << ARCH_LBR_RETURN_BIT) +#define ARCH_LBR_OTHER_BRANCH (1ULL << ARCH_LBR_OTHER_BRANCH_BIT) + +#define ARCH_LBR_ANY \ + (ARCH_LBR_JCC |\ + ARCH_LBR_REL_JMP |\ + ARCH_LBR_IND_JMP |\ + ARCH_LBR_REL_CALL |\ + ARCH_LBR_IND_CALL |\ + ARCH_LBR_RETURN |\ + ARCH_LBR_OTHER_BRANCH) + +#define ARCH_LBR_CTL_MASK 0x7f000e + static void intel_pmu_lbr_filter(struct cpu_hw_events *cpuc); +static __always_inline bool is_lbr_call_stack_bit_set(u64 config) +{ + if (static_cpu_has(X86_FEATURE_ARCH_LBR)) + return !!(config & ARCH_LBR_CALL_STACK); + + return !!(config & LBR_CALL_STACK); +} + /* * We only support LBR implementations that have FREEZE_LBRS_ON_PMI * otherwise it becomes near impossible to get a reliable stack. @@ -168,33 +203,46 @@ static void __intel_pmu_lbr_enable(bool pmi) */ if (cpuc->lbr_sel) lbr_select = cpuc->lbr_sel->config & x86_pmu.lbr_sel_mask; - if (!pmi && cpuc->lbr_sel) + if (!static_cpu_has(X86_FEATURE_ARCH_LBR) && !pmi && cpuc->lbr_sel) wrmsrl(MSR_LBR_SELECT, lbr_select); rdmsrl(MSR_IA32_DEBUGCTLMSR, debugctl); orig_debugctl = debugctl; - debugctl |= DEBUGCTLMSR_LBR; + + if (!static_cpu_has(X86_FEATURE_ARCH_LBR)) + debugctl |= DEBUGCTLMSR_LBR; /* * LBR callstack does not work well with FREEZE_LBRS_ON_PMI. * If FREEZE_LBRS_ON_PMI is set, PMI near call/return instructions * may cause superfluous increase/decrease of LBR_TOS. 
*/ - if (!(lbr_select & LBR_CALL_STACK)) + if (is_lbr_call_stack_bit_set(lbr_select)) + debugctl &= ~DEBUGCTLMSR_FREEZE_LBRS_ON_PMI; + else debugctl |= DEBUGCTLMSR_FREEZE_LBRS_ON_PMI; + if (orig_debugctl != debugctl) wrmsrl(MSR_IA32_DEBUGCTLMSR, debugctl); + + if (static_cpu_has(X86_FEATURE_ARCH_LBR)) + wrmsrl(MSR_ARCH_LBR_CTL, lbr_select | ARCH_LBR_CTL_LBREN); } static void __intel_pmu_lbr_disable(void) { u64 debugctl; + if (static_cpu_has(X86_FEATURE_ARCH_LBR)) { + wrmsrl(MSR_ARCH_LBR_CTL, 0); + return; + } + rdmsrl(MSR_IA32_DEBUGCTLMSR, debugctl); debugctl &= ~(DEBUGCTLMSR_LBR | DEBUGCTLMSR_FREEZE_LBRS_ON_PMI); wrmsrl(MSR_IA32_DEBUGCTLMSR, debugctl); } -static void intel_pmu_lbr_reset_32(void) +void intel_pmu_lbr_reset_32(void) { int i; @@ -202,7 +250,7 @@ static void intel_pmu_lbr_reset_32(void) wrmsrl(x86_pmu.lbr_from + i, 0); } -static void intel_pmu_lbr_reset_64(void) +void intel_pmu_lbr_reset_64(void) { int i; @@ -210,10 +258,16 @@ static void intel_pmu_lbr_reset_64(void) wrmsrl(x86_pmu.lbr_from + i, 0); wrmsrl(x86_pmu.lbr_to + i, 0); if (x86_pmu.intel_cap.lbr_format == LBR_FORMAT_INFO) - wrmsrl(MSR_LBR_INFO_0 + i, 0); + wrmsrl(x86_pmu.lbr_info + i, 0); } } +static void intel_pmu_arch_lbr_reset(void) +{ + /* Write to ARCH_LBR_DEPTH MSR, all LBR entries are reset to 0 */ + wrmsrl(MSR_ARCH_LBR_DEPTH, x86_pmu.lbr_nr); +} + void intel_pmu_lbr_reset(void) { struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); @@ -221,10 +275,7 @@ void intel_pmu_lbr_reset(void) if (!x86_pmu.lbr_nr) return; - if (x86_pmu.intel_cap.lbr_format == LBR_FORMAT_32) - intel_pmu_lbr_reset_32(); - else - intel_pmu_lbr_reset_64(); + x86_pmu.lbr_reset(); cpuc->last_task_ctx = NULL; cpuc->last_log_id = 0; @@ -308,69 +359,97 @@ static u64 lbr_from_signext_quirk_rd(u64 val) return val; } -static inline void wrlbr_from(unsigned int idx, u64 val) +static __always_inline void wrlbr_from(unsigned int idx, u64 val) { val = lbr_from_signext_quirk_wr(val); wrmsrl(x86_pmu.lbr_from + idx, val); } 
-static inline void wrlbr_to(unsigned int idx, u64 val) +static __always_inline void wrlbr_to(unsigned int idx, u64 val) { wrmsrl(x86_pmu.lbr_to + idx, val); } -static inline u64 rdlbr_from(unsigned int idx) +static __always_inline void wrlbr_info(unsigned int idx, u64 val) +{ + wrmsrl(x86_pmu.lbr_info + idx, val); +} + +static __always_inline u64 rdlbr_from(unsigned int idx, struct lbr_entry *lbr) { u64 val; + if (lbr) + return lbr->from; + rdmsrl(x86_pmu.lbr_from + idx, val); return lbr_from_signext_quirk_rd(val); } -static inline u64 rdlbr_to(unsigned int idx) +static __always_inline u64 rdlbr_to(unsigned int idx, struct lbr_entry *lbr) { u64 val; + if (lbr) + return lbr->to; + rdmsrl(x86_pmu.lbr_to + idx, val); return val; } -static void __intel_pmu_lbr_restore(struct x86_perf_task_context *task_ctx) +static __always_inline u64 rdlbr_info(unsigned int idx, struct lbr_entry *lbr) +{ + u64 val; + + if (lbr) + return lbr->info; + + rdmsrl(x86_pmu.lbr_info + idx, val); + + return val; +} + +static inline void +wrlbr_all(struct lbr_entry *lbr, unsigned int idx, bool need_info) +{ + wrlbr_from(idx, lbr->from); + wrlbr_to(idx, lbr->to); + if (need_info) + wrlbr_info(idx, lbr->info); +} + +static inline bool +rdlbr_all(struct lbr_entry *lbr, unsigned int idx, bool need_info) { + u64 from = rdlbr_from(idx, NULL); + + /* Don't read invalid entry */ + if (!from) + return false; + + lbr->from = from; + lbr->to = rdlbr_to(idx, NULL); + if (need_info) + lbr->info = rdlbr_info(idx, NULL); + + return true; +} + +void intel_pmu_lbr_restore(void *ctx) +{ + bool need_info = x86_pmu.intel_cap.lbr_format == LBR_FORMAT_INFO; struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); + struct x86_perf_task_context *task_ctx = ctx; int i; unsigned lbr_idx, mask; - u64 tos; - - if (task_ctx->lbr_callstack_users == 0 || - task_ctx->lbr_stack_state == LBR_NONE) { - intel_pmu_lbr_reset(); - return; - } - - tos = task_ctx->tos; - /* - * Does not restore the LBR registers, if - * - No one 
else touched them, and - * - Did not enter C6 - */ - if ((task_ctx == cpuc->last_task_ctx) && - (task_ctx->log_id == cpuc->last_log_id) && - rdlbr_from(tos)) { - task_ctx->lbr_stack_state = LBR_NONE; - return; - } + u64 tos = task_ctx->tos; mask = x86_pmu.lbr_nr - 1; for (i = 0; i < task_ctx->valid_lbrs; i++) { lbr_idx = (tos - i) & mask; - wrlbr_from(lbr_idx, task_ctx->lbr_from[i]); - wrlbr_to (lbr_idx, task_ctx->lbr_to[i]); - - if (x86_pmu.intel_cap.lbr_format == LBR_FORMAT_INFO) - wrmsrl(MSR_LBR_INFO_0 + lbr_idx, task_ctx->lbr_info[i]); + wrlbr_all(&task_ctx->lbr[i], lbr_idx, need_info); } for (; i < x86_pmu.lbr_nr; i++) { @@ -378,55 +457,150 @@ static void __intel_pmu_lbr_restore(struct x86_perf_task_context *task_ctx) wrlbr_from(lbr_idx, 0); wrlbr_to(lbr_idx, 0); if (x86_pmu.intel_cap.lbr_format == LBR_FORMAT_INFO) - wrmsrl(MSR_LBR_INFO_0 + lbr_idx, 0); + wrlbr_info(lbr_idx, 0); } wrmsrl(x86_pmu.lbr_tos, tos); - task_ctx->lbr_stack_state = LBR_NONE; if (cpuc->lbr_select) wrmsrl(MSR_LBR_SELECT, task_ctx->lbr_sel); } -static void __intel_pmu_lbr_save(struct x86_perf_task_context *task_ctx) +static void intel_pmu_arch_lbr_restore(void *ctx) { - struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); - unsigned lbr_idx, mask; - u64 tos, from; + struct x86_perf_task_context_arch_lbr *task_ctx = ctx; + struct lbr_entry *entries = task_ctx->entries; int i; - if (task_ctx->lbr_callstack_users == 0) { - task_ctx->lbr_stack_state = LBR_NONE; + /* Fast reset the LBRs before restore if the call stack is not full. 
*/ + if (!entries[x86_pmu.lbr_nr - 1].from) + intel_pmu_arch_lbr_reset(); + + for (i = 0; i < x86_pmu.lbr_nr; i++) { + if (!entries[i].from) + break; + wrlbr_all(&entries[i], i, true); + } +} + +static __always_inline bool lbr_is_reset_in_cstate(void *ctx) +{ + if (static_cpu_has(X86_FEATURE_ARCH_LBR)) + return x86_pmu.lbr_deep_c_reset && !rdlbr_from(0, NULL); + + return !rdlbr_from(((struct x86_perf_task_context *)ctx)->tos, NULL); +} + +static void __intel_pmu_lbr_restore(void *ctx) +{ + struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); + + if (task_context_opt(ctx)->lbr_callstack_users == 0 || + task_context_opt(ctx)->lbr_stack_state == LBR_NONE) { + intel_pmu_lbr_reset(); + return; + } + + /* + * Does not restore the LBR registers, if + * - No one else touched them, and + * - Was not cleared in Cstate + */ + if ((ctx == cpuc->last_task_ctx) && + (task_context_opt(ctx)->log_id == cpuc->last_log_id) && + !lbr_is_reset_in_cstate(ctx)) { + task_context_opt(ctx)->lbr_stack_state = LBR_NONE; return; } + x86_pmu.lbr_restore(ctx); + + task_context_opt(ctx)->lbr_stack_state = LBR_NONE; +} + +void intel_pmu_lbr_save(void *ctx) +{ + bool need_info = x86_pmu.intel_cap.lbr_format == LBR_FORMAT_INFO; + struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); + struct x86_perf_task_context *task_ctx = ctx; + unsigned lbr_idx, mask; + u64 tos; + int i; + mask = x86_pmu.lbr_nr - 1; tos = intel_pmu_lbr_tos(); for (i = 0; i < x86_pmu.lbr_nr; i++) { lbr_idx = (tos - i) & mask; - from = rdlbr_from(lbr_idx); - if (!from) + if (!rdlbr_all(&task_ctx->lbr[i], lbr_idx, need_info)) break; - task_ctx->lbr_from[i] = from; - task_ctx->lbr_to[i] = rdlbr_to(lbr_idx); - if (x86_pmu.intel_cap.lbr_format == LBR_FORMAT_INFO) - rdmsrl(MSR_LBR_INFO_0 + lbr_idx, task_ctx->lbr_info[i]); } task_ctx->valid_lbrs = i; task_ctx->tos = tos; - task_ctx->lbr_stack_state = LBR_VALID; - - cpuc->last_task_ctx = task_ctx; - cpuc->last_log_id = ++task_ctx->log_id; if (cpuc->lbr_select) - 
rdmsrl(MSR_LBR_SELECT, task_ctx->lbr_sel); + rdmsrl(MSR_LBR_SELECT, task_ctx->lbr_sel); +} + +static void intel_pmu_arch_lbr_save(void *ctx) +{ + struct x86_perf_task_context_arch_lbr *task_ctx = ctx; + struct lbr_entry *entries = task_ctx->entries; + int i; + + for (i = 0; i < x86_pmu.lbr_nr; i++) { + if (!rdlbr_all(&entries[i], i, true)) + break; + } + + /* LBR call stack is not full. Reset is required in restore. */ + if (i < x86_pmu.lbr_nr) + entries[x86_pmu.lbr_nr - 1].from = 0; +} + +static void __intel_pmu_lbr_save(void *ctx) +{ + struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); + + if (task_context_opt(ctx)->lbr_callstack_users == 0) { + task_context_opt(ctx)->lbr_stack_state = LBR_NONE; + return; + } + + x86_pmu.lbr_save(ctx); + + task_context_opt(ctx)->lbr_stack_state = LBR_VALID; + + cpuc->last_task_ctx = ctx; + cpuc->last_log_id = ++task_context_opt(ctx)->log_id; +} + +void intel_pmu_lbr_swap_task_ctx(struct perf_event_context *prev, + struct perf_event_context *next) +{ + void *prev_ctx_data, *next_ctx_data; + + swap(prev->task_ctx_data, next->task_ctx_data); + + /* + * Architecture specific synchronization makes sense in + * case both prev->task_ctx_data and next->task_ctx_data + * pointers are allocated. 
+ */ + + prev_ctx_data = next->task_ctx_data; + next_ctx_data = prev->task_ctx_data; + + if (!prev_ctx_data || !next_ctx_data) + return; + + swap(task_context_opt(prev_ctx_data)->lbr_callstack_users, + task_context_opt(next_ctx_data)->lbr_callstack_users); } void intel_pmu_lbr_sched_task(struct perf_event_context *ctx, bool sched_in) { struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); - struct x86_perf_task_context *task_ctx; + void *task_ctx; if (!cpuc->lbr_users) return; @@ -463,7 +637,6 @@ static inline bool branch_user_callstack(unsigned br_sel) void intel_pmu_lbr_add(struct perf_event *event) { struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); - struct x86_perf_task_context *task_ctx; if (!x86_pmu.lbr_nr) return; @@ -473,10 +646,8 @@ void intel_pmu_lbr_add(struct perf_event *event) cpuc->br_sel = event->hw.branch_reg.reg; - if (branch_user_callstack(cpuc->br_sel) && event->ctx->task_ctx_data) { - task_ctx = event->ctx->task_ctx_data; - task_ctx->lbr_callstack_users++; - } + if (branch_user_callstack(cpuc->br_sel) && event->ctx->task_ctx_data) + task_context_opt(event->ctx->task_ctx_data)->lbr_callstack_users++; /* * Request pmu::sched_task() callback, which will fire inside the @@ -507,16 +678,13 @@ void intel_pmu_lbr_add(struct perf_event *event) void intel_pmu_lbr_del(struct perf_event *event) { struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); - struct x86_perf_task_context *task_ctx; if (!x86_pmu.lbr_nr) return; if (branch_user_callstack(cpuc->br_sel) && - event->ctx->task_ctx_data) { - task_ctx = event->ctx->task_ctx_data; - task_ctx->lbr_callstack_users--; - } + event->ctx->task_ctx_data) + task_context_opt(event->ctx->task_ctx_data)->lbr_callstack_users--; if (event->hw.flags & PERF_X86_EVENT_LBR_SELECT) cpuc->lbr_select = 0; @@ -553,9 +721,10 @@ void intel_pmu_lbr_disable_all(void) __intel_pmu_lbr_disable(); } -static void intel_pmu_lbr_read_32(struct cpu_hw_events *cpuc) +void intel_pmu_lbr_read_32(struct cpu_hw_events 
*cpuc) { unsigned long mask = x86_pmu.lbr_nr - 1; + struct perf_branch_entry *br = cpuc->lbr_entries; u64 tos = intel_pmu_lbr_tos(); int i; @@ -571,17 +740,14 @@ static void intel_pmu_lbr_read_32(struct cpu_hw_events *cpuc) rdmsrl(x86_pmu.lbr_from + lbr_idx, msr_lastbranch.lbr); - cpuc->lbr_entries[i].from = msr_lastbranch.from; - cpuc->lbr_entries[i].to = msr_lastbranch.to; - cpuc->lbr_entries[i].mispred = 0; - cpuc->lbr_entries[i].predicted = 0; - cpuc->lbr_entries[i].in_tx = 0; - cpuc->lbr_entries[i].abort = 0; - cpuc->lbr_entries[i].cycles = 0; - cpuc->lbr_entries[i].type = 0; - cpuc->lbr_entries[i].reserved = 0; + perf_clear_branch_entry_bitfields(br); + + br->from = msr_lastbranch.from; + br->to = msr_lastbranch.to; + br++; } cpuc->lbr_stack.nr = i; + cpuc->lbr_stack.hw_idx = tos; } /* @@ -589,11 +755,12 @@ static void intel_pmu_lbr_read_32(struct cpu_hw_events *cpuc) * is the same as the linear address, allowing us to merge the LIP and EIP * LBR formats. */ -static void intel_pmu_lbr_read_64(struct cpu_hw_events *cpuc) +void intel_pmu_lbr_read_64(struct cpu_hw_events *cpuc) { bool need_info = false, call_stack = false; unsigned long mask = x86_pmu.lbr_nr - 1; int lbr_format = x86_pmu.intel_cap.lbr_format; + struct perf_branch_entry *br = cpuc->lbr_entries; u64 tos = intel_pmu_lbr_tos(); int i; int out = 0; @@ -612,8 +779,8 @@ static void intel_pmu_lbr_read_64(struct cpu_hw_events *cpuc) u16 cycles = 0; int lbr_flags = lbr_desc[lbr_format]; - from = rdlbr_from(lbr_idx); - to = rdlbr_to(lbr_idx); + from = rdlbr_from(lbr_idx, NULL); + to = rdlbr_to(lbr_idx, NULL); /* * Read LBR call stack entries @@ -625,7 +792,7 @@ static void intel_pmu_lbr_read_64(struct cpu_hw_events *cpuc) if (lbr_format == LBR_FORMAT_INFO && need_info) { u64 info; - rdmsrl(MSR_LBR_INFO_0 + lbr_idx, info); + info = rdlbr_info(lbr_idx, NULL); mis = !!(info & LBR_INFO_MISPRED); pred = !mis; in_tx = !!(info & LBR_INFO_IN_TX); @@ -665,18 +832,93 @@ static void intel_pmu_lbr_read_64(struct 
cpu_hw_events *cpuc) if (abort && x86_pmu.lbr_double_abort && out > 0) out--; - cpuc->lbr_entries[out].from = from; - cpuc->lbr_entries[out].to = to; - cpuc->lbr_entries[out].mispred = mis; - cpuc->lbr_entries[out].predicted = pred; - cpuc->lbr_entries[out].in_tx = in_tx; - cpuc->lbr_entries[out].abort = abort; - cpuc->lbr_entries[out].cycles = cycles; - cpuc->lbr_entries[out].type = 0; - cpuc->lbr_entries[out].reserved = 0; + perf_clear_branch_entry_bitfields(br+out); + br[out].from = from; + br[out].to = to; + br[out].mispred = mis; + br[out].predicted = pred; + br[out].in_tx = in_tx; + br[out].abort = abort; + br[out].cycles = cycles; out++; } cpuc->lbr_stack.nr = out; + cpuc->lbr_stack.hw_idx = tos; +} + +static __always_inline int get_lbr_br_type(u64 info) +{ + if (!static_cpu_has(X86_FEATURE_ARCH_LBR) || !x86_pmu.lbr_br_type) + return 0; + + return (info & LBR_INFO_BR_TYPE) >> LBR_INFO_BR_TYPE_OFFSET; +} + +static __always_inline bool get_lbr_mispred(u64 info) +{ + if (static_cpu_has(X86_FEATURE_ARCH_LBR) && !x86_pmu.lbr_mispred) + return 0; + + return !!(info & LBR_INFO_MISPRED); +} + +static __always_inline bool get_lbr_predicted(u64 info) +{ + if (static_cpu_has(X86_FEATURE_ARCH_LBR) && !x86_pmu.lbr_mispred) + return 0; + + return !(info & LBR_INFO_MISPRED); +} + +static __always_inline u16 get_lbr_cycles(u64 info) +{ + if (static_cpu_has(X86_FEATURE_ARCH_LBR) && + !(x86_pmu.lbr_timed_lbr && info & LBR_INFO_CYC_CNT_VALID)) + return 0; + + return info & LBR_INFO_CYCLES; +} + +static void intel_pmu_store_lbr(struct cpu_hw_events *cpuc, + struct lbr_entry *entries) +{ + struct perf_branch_entry *e; + struct lbr_entry *lbr; + u64 from, to, info; + int i; + + for (i = 0; i < x86_pmu.lbr_nr; i++) { + lbr = entries ? &entries[i] : NULL; + e = &cpuc->lbr_entries[i]; + + from = rdlbr_from(i, lbr); + /* + * Read LBR entries until invalid entry (0s) is detected.
+ */ + if (!from) + break; + + to = rdlbr_to(i, lbr); + info = rdlbr_info(i, lbr); + + perf_clear_branch_entry_bitfields(e); + + e->from = from; + e->to = to; + e->mispred = get_lbr_mispred(info); + e->predicted = get_lbr_predicted(info); + e->in_tx = !!(info & LBR_INFO_IN_TX); + e->abort = !!(info & LBR_INFO_ABORT); + e->cycles = get_lbr_cycles(info); + e->type = get_lbr_br_type(info); + } + + cpuc->lbr_stack.nr = i; +} + +static void intel_pmu_arch_lbr_read(struct cpu_hw_events *cpuc) +{ + intel_pmu_store_lbr(cpuc, NULL); } void intel_pmu_lbr_read(void) @@ -693,10 +935,7 @@ void intel_pmu_lbr_read(void) cpuc->lbr_users == cpuc->lbr_pebs_users) return; - if (x86_pmu.intel_cap.lbr_format == LBR_FORMAT_32) - intel_pmu_lbr_read_32(cpuc); - else - intel_pmu_lbr_read_64(cpuc); + x86_pmu.lbr_read(cpuc); intel_pmu_lbr_filter(cpuc); } @@ -796,6 +1035,11 @@ static int intel_pmu_setup_hw_lbr_filter(struct perf_event *event) reg = &event->hw.branch_reg; reg->idx = EXTRA_REG_LBR; + if (static_cpu_has(X86_FEATURE_ARCH_LBR)) { + reg->config = mask; + return 0; + } + /* * The first 9 bits (LBR_SEL_MASK) in LBR_SELECT operate * in suppress mode. So LBR_SELECT should be set to @@ -1052,6 +1296,27 @@ common_branch_type(int type) return PERF_BR_UNKNOWN; } +enum { + ARCH_LBR_BR_TYPE_JCC = 0, + ARCH_LBR_BR_TYPE_NEAR_IND_JMP = 1, + ARCH_LBR_BR_TYPE_NEAR_REL_JMP = 2, + ARCH_LBR_BR_TYPE_NEAR_IND_CALL = 3, + ARCH_LBR_BR_TYPE_NEAR_REL_CALL = 4, + ARCH_LBR_BR_TYPE_NEAR_RET = 5, + ARCH_LBR_BR_TYPE_KNOWN_MAX = ARCH_LBR_BR_TYPE_NEAR_RET, + + ARCH_LBR_BR_TYPE_MAP_MAX = 16, +}; + +static const int arch_lbr_br_type_map[ARCH_LBR_BR_TYPE_MAP_MAX] = { + [ARCH_LBR_BR_TYPE_JCC] = X86_BR_JCC, + [ARCH_LBR_BR_TYPE_NEAR_IND_JMP] = X86_BR_IND_JMP, + [ARCH_LBR_BR_TYPE_NEAR_REL_JMP] = X86_BR_JMP, + [ARCH_LBR_BR_TYPE_NEAR_IND_CALL] = X86_BR_IND_CALL, + [ARCH_LBR_BR_TYPE_NEAR_REL_CALL] = X86_BR_CALL, + [ARCH_LBR_BR_TYPE_NEAR_RET] = X86_BR_RET, +}; + /* * implement actual branch filter based on user demand. 
* Hardware may not exactly satisfy that request, thus @@ -1064,7 +1329,7 @@ intel_pmu_lbr_filter(struct cpu_hw_events *cpuc) { u64 from, to; int br_sel = cpuc->br_sel; - int i, j, type; + int i, j, type, to_plm; bool compress = false; /* if sampling all branches, then nothing to filter */ @@ -1076,8 +1341,19 @@ intel_pmu_lbr_filter(struct cpu_hw_events *cpuc) from = cpuc->lbr_entries[i].from; to = cpuc->lbr_entries[i].to; + type = cpuc->lbr_entries[i].type; - type = branch_type(from, to, cpuc->lbr_entries[i].abort); + /* + * Parse the branch type recorded in LBR_x_INFO MSR. + * Doesn't support OTHER_BRANCH decoding for now. + * OTHER_BRANCH branch type still rely on software decoding. + */ + if (static_cpu_has(X86_FEATURE_ARCH_LBR) && + type <= ARCH_LBR_BR_TYPE_KNOWN_MAX) { + to_plm = kernel_ip(to) ? X86_BR_KERNEL : X86_BR_USER; + type = arch_lbr_br_type_map[type] | to_plm; + } else + type = branch_type(from, to, cpuc->lbr_entries[i].abort); if (type != X86_BR_NONE && (br_sel & X86_BR_ANYTX)) { if (cpuc->lbr_entries[i].in_tx) type |= X86_BR_IN_TX; @@ -1112,25 +1388,18 @@ intel_pmu_lbr_filter(struct cpu_hw_events *cpuc) } } -void intel_pmu_store_pebs_lbrs(struct pebs_lbr *lbr) +void intel_pmu_store_pebs_lbrs(struct lbr_entry *lbr) { struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); - int i; - cpuc->lbr_stack.nr = x86_pmu.lbr_nr; - for (i = 0; i < x86_pmu.lbr_nr; i++) { - u64 info = lbr->lbr[i].info; - struct perf_branch_entry *e = &cpuc->lbr_entries[i]; + /* Cannot get TOS for large PEBS and Arch LBR */ + if (static_cpu_has(X86_FEATURE_ARCH_LBR) || + (cpuc->n_pebs == cpuc->n_large_pebs)) + cpuc->lbr_stack.hw_idx = -1ULL; + else + cpuc->lbr_stack.hw_idx = intel_pmu_lbr_tos(); - e->from = lbr->lbr[i].from; - e->to = lbr->lbr[i].to; - e->mispred = !!(info & LBR_INFO_MISPRED); - e->predicted = !(info & LBR_INFO_MISPRED); - e->in_tx = !!(info & LBR_INFO_IN_TX); - e->abort = !!(info & LBR_INFO_ABORT); - e->cycles = info & LBR_INFO_CYCLES; - e->reserved = 0; - } + 
intel_pmu_store_lbr(cpuc, lbr); intel_pmu_lbr_filter(cpuc); } @@ -1187,6 +1456,26 @@ static const int hsw_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX_SHIFT] = { [PERF_SAMPLE_BRANCH_CALL_SHIFT] = LBR_REL_CALL, }; +static int arch_lbr_ctl_map[PERF_SAMPLE_BRANCH_MAX_SHIFT] = { + [PERF_SAMPLE_BRANCH_ANY_SHIFT] = ARCH_LBR_ANY, + [PERF_SAMPLE_BRANCH_USER_SHIFT] = ARCH_LBR_USER, + [PERF_SAMPLE_BRANCH_KERNEL_SHIFT] = ARCH_LBR_KERNEL, + [PERF_SAMPLE_BRANCH_HV_SHIFT] = LBR_IGN, + [PERF_SAMPLE_BRANCH_ANY_RETURN_SHIFT] = ARCH_LBR_RETURN | + ARCH_LBR_OTHER_BRANCH, + [PERF_SAMPLE_BRANCH_ANY_CALL_SHIFT] = ARCH_LBR_REL_CALL | + ARCH_LBR_IND_CALL | + ARCH_LBR_OTHER_BRANCH, + [PERF_SAMPLE_BRANCH_IND_CALL_SHIFT] = ARCH_LBR_IND_CALL, + [PERF_SAMPLE_BRANCH_COND_SHIFT] = ARCH_LBR_JCC, + [PERF_SAMPLE_BRANCH_CALL_STACK_SHIFT] = ARCH_LBR_REL_CALL | + ARCH_LBR_IND_CALL | + ARCH_LBR_RETURN | + ARCH_LBR_CALL_STACK, + [PERF_SAMPLE_BRANCH_IND_JUMP_SHIFT] = ARCH_LBR_IND_JMP, + [PERF_SAMPLE_BRANCH_CALL_SHIFT] = ARCH_LBR_REL_CALL, +}; + /* core */ void __init intel_pmu_lbr_init_core(void) { @@ -1240,9 +1529,17 @@ void __init intel_pmu_lbr_init_snb(void) */ } +static inline struct kmem_cache * +create_lbr_kmem_cache(size_t size, size_t align) +{ + return kmem_cache_create("x86_lbr", size, align, 0, NULL); +} + /* haswell */ void intel_pmu_lbr_init_hsw(void) { + size_t size = sizeof(struct x86_perf_task_context); + x86_pmu.lbr_nr = 16; x86_pmu.lbr_tos = MSR_LBR_TOS; x86_pmu.lbr_from = MSR_LBR_NHM_FROM; @@ -1251,6 +1548,8 @@ void intel_pmu_lbr_init_hsw(void) x86_pmu.lbr_sel_mask = LBR_SEL_MASK; x86_pmu.lbr_sel_map = hsw_lbr_sel_map; + x86_get_pmu()->task_ctx_cache = create_lbr_kmem_cache(size, 0); + if (lbr_from_signext_quirk_needed()) static_branch_enable(&lbr_from_quirk_key); } @@ -1258,14 +1557,19 @@ void intel_pmu_lbr_init_hsw(void) /* skylake */ __init void intel_pmu_lbr_init_skl(void) { + size_t size = sizeof(struct x86_perf_task_context); + x86_pmu.lbr_nr = 32; x86_pmu.lbr_tos = MSR_LBR_TOS; 
x86_pmu.lbr_from = MSR_LBR_NHM_FROM; x86_pmu.lbr_to = MSR_LBR_NHM_TO; + x86_pmu.lbr_info = MSR_LBR_INFO_0; x86_pmu.lbr_sel_mask = LBR_SEL_MASK; x86_pmu.lbr_sel_map = hsw_lbr_sel_map; + x86_get_pmu()->task_ctx_cache = create_lbr_kmem_cache(size, 0); + /* * SW branch filter usage: * - support syscall, sysret capture. @@ -1333,6 +1637,84 @@ void intel_pmu_lbr_init_knl(void) x86_pmu.intel_cap.lbr_format = LBR_FORMAT_EIP_FLAGS; } +void __init intel_pmu_arch_lbr_init(void) +{ + union cpuid28_eax eax; + union cpuid28_ebx ebx; + union cpuid28_ecx ecx; + unsigned int unused_edx; + size_t size; + u64 lbr_nr; + + /* Arch LBR Capabilities */ + cpuid(28, &eax.full, &ebx.full, &ecx.full, &unused_edx); + + lbr_nr = fls(eax.split.lbr_depth_mask) * 8; + if (!lbr_nr) + goto clear_arch_lbr; + + /* Apply the max depth of Arch LBR */ + if (wrmsrl_safe(MSR_ARCH_LBR_DEPTH, lbr_nr)) + goto clear_arch_lbr; + + x86_pmu.lbr_depth_mask = eax.split.lbr_depth_mask; + x86_pmu.lbr_deep_c_reset = eax.split.lbr_deep_c_reset; + x86_pmu.lbr_lip = eax.split.lbr_lip; + x86_pmu.lbr_cpl = ebx.split.lbr_cpl; + x86_pmu.lbr_filter = ebx.split.lbr_filter; + x86_pmu.lbr_call_stack = ebx.split.lbr_call_stack; + x86_pmu.lbr_mispred = ecx.split.lbr_mispred; + x86_pmu.lbr_timed_lbr = ecx.split.lbr_timed_lbr; + x86_pmu.lbr_br_type = ecx.split.lbr_br_type; + x86_pmu.lbr_nr = lbr_nr; + + size = sizeof(struct x86_perf_task_context_arch_lbr) + + lbr_nr * sizeof(struct lbr_entry); + x86_get_pmu()->task_ctx_size = size; + x86_get_pmu()->task_ctx_cache = create_lbr_kmem_cache(size, 0); + + x86_pmu.lbr_from = MSR_ARCH_LBR_FROM_0; + x86_pmu.lbr_to = MSR_ARCH_LBR_TO_0; + x86_pmu.lbr_info = MSR_ARCH_LBR_INFO_0; + + /* LBR callstack requires both CPL and Branch Filtering support */ + if (!x86_pmu.lbr_cpl || + !x86_pmu.lbr_filter || + !x86_pmu.lbr_call_stack) + arch_lbr_ctl_map[PERF_SAMPLE_BRANCH_CALL_STACK_SHIFT] = LBR_NOT_SUPP; + + if (!x86_pmu.lbr_cpl) { + arch_lbr_ctl_map[PERF_SAMPLE_BRANCH_USER_SHIFT] = LBR_NOT_SUPP; + 
arch_lbr_ctl_map[PERF_SAMPLE_BRANCH_KERNEL_SHIFT] = LBR_NOT_SUPP; + } else if (!x86_pmu.lbr_filter) { + arch_lbr_ctl_map[PERF_SAMPLE_BRANCH_ANY_SHIFT] = LBR_NOT_SUPP; + arch_lbr_ctl_map[PERF_SAMPLE_BRANCH_ANY_RETURN_SHIFT] = LBR_NOT_SUPP; + arch_lbr_ctl_map[PERF_SAMPLE_BRANCH_ANY_CALL_SHIFT] = LBR_NOT_SUPP; + arch_lbr_ctl_map[PERF_SAMPLE_BRANCH_IND_CALL_SHIFT] = LBR_NOT_SUPP; + arch_lbr_ctl_map[PERF_SAMPLE_BRANCH_COND_SHIFT] = LBR_NOT_SUPP; + arch_lbr_ctl_map[PERF_SAMPLE_BRANCH_IND_JUMP_SHIFT] = LBR_NOT_SUPP; + arch_lbr_ctl_map[PERF_SAMPLE_BRANCH_CALL_SHIFT] = LBR_NOT_SUPP; + } + + x86_pmu.lbr_ctl_mask = ARCH_LBR_CTL_MASK; + x86_pmu.lbr_ctl_map = arch_lbr_ctl_map; + + if (!x86_pmu.lbr_cpl && !x86_pmu.lbr_filter) + x86_pmu.lbr_ctl_map = NULL; + + x86_pmu.lbr_reset = intel_pmu_arch_lbr_reset; + x86_pmu.lbr_read = intel_pmu_arch_lbr_read; + x86_pmu.lbr_save = intel_pmu_arch_lbr_save; + x86_pmu.lbr_restore = intel_pmu_arch_lbr_restore; + + pr_cont("Architectural LBR, "); + + return; + +clear_arch_lbr: + clear_cpu_cap(&boot_cpu_data, X86_FEATURE_ARCH_LBR); +} + /** * x86_perf_get_lbr - get the LBR records information * @@ -1347,7 +1729,7 @@ int x86_perf_get_lbr(struct x86_pmu_lbr *lbr) lbr->nr = x86_pmu.lbr_nr; lbr->from = x86_pmu.lbr_from; lbr->to = x86_pmu.lbr_to; - lbr->info = (lbr_fmt == LBR_FORMAT_INFO) ? MSR_LBR_INFO_0 : 0; + lbr->info = (lbr_fmt == LBR_FORMAT_INFO) ? 
x86_pmu.lbr_info : 0; return 0; } diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h index 39db37ee8ffa7d79b4c767a9ed66c4b1fe9d260c..ba5d948e8979e6f7d1163d4c3d0679961c5dba9f 100644 --- a/arch/x86/events/perf_event.h +++ b/arch/x86/events/perf_event.h @@ -65,21 +65,23 @@ static inline bool constraint_match(struct event_constraint *c, u64 ecode) /* * struct hw_perf_event.flags flags */ -#define PERF_X86_EVENT_PEBS_LDLAT 0x0001 /* ld+ldlat data address sampling */ -#define PERF_X86_EVENT_PEBS_ST 0x0002 /* st data address sampling */ -#define PERF_X86_EVENT_PEBS_ST_HSW 0x0004 /* haswell style datala, store */ -#define PERF_X86_EVENT_PEBS_LD_HSW 0x0008 /* haswell style datala, load */ -#define PERF_X86_EVENT_PEBS_NA_HSW 0x0010 /* haswell style datala, unknown */ -#define PERF_X86_EVENT_EXCL 0x0020 /* HT exclusivity on counter */ -#define PERF_X86_EVENT_DYNAMIC 0x0040 /* dynamic alloc'd constraint */ -#define PERF_X86_EVENT_RDPMC_ALLOWED 0x0080 /* grant rdpmc permission */ -#define PERF_X86_EVENT_EXCL_ACCT 0x0100 /* accounted EXCL event */ -#define PERF_X86_EVENT_AUTO_RELOAD 0x0200 /* use PEBS auto-reload */ -#define PERF_X86_EVENT_LARGE_PEBS 0x0400 /* use large PEBS */ -#define PERF_X86_EVENT_PEBS_VIA_PT 0x0800 /* use PT buffer for PEBS */ -#define PERF_X86_EVENT_PAIR 0x1000 /* Large Increment per Cycle */ -#define PERF_X86_EVENT_LBR_SELECT 0x2000 /* Save/Restore MSR_LBR_SELECT */ -#define PERF_X86_EVENT_TOPDOWN 0x4000 /* Count Topdown slots/metrics events */ +#define PERF_X86_EVENT_PEBS_LDLAT 0x00001 /* ld+ldlat data address sampling */ +#define PERF_X86_EVENT_PEBS_ST 0x00002 /* st data address sampling */ +#define PERF_X86_EVENT_PEBS_ST_HSW 0x00004 /* haswell style datala, store */ +#define PERF_X86_EVENT_PEBS_LD_HSW 0x00008 /* haswell style datala, load */ +#define PERF_X86_EVENT_PEBS_NA_HSW 0x00010 /* haswell style datala, unknown */ +#define PERF_X86_EVENT_EXCL 0x00020 /* HT exclusivity on counter */ +#define PERF_X86_EVENT_DYNAMIC 0x00040 /* 
dynamic alloc'd constraint */ + +#define PERF_X86_EVENT_EXCL_ACCT 0x00100 /* accounted EXCL event */ +#define PERF_X86_EVENT_AUTO_RELOAD 0x00200 /* use PEBS auto-reload */ +#define PERF_X86_EVENT_LARGE_PEBS 0x00400 /* use large PEBS */ +#define PERF_X86_EVENT_PEBS_VIA_PT 0x00800 /* use PT buffer for PEBS */ +#define PERF_X86_EVENT_PAIR 0x01000 /* Large Increment per Cycle */ +#define PERF_X86_EVENT_LBR_SELECT 0x02000 /* Save/Restore MSR_LBR_SELECT */ +#define PERF_X86_EVENT_TOPDOWN 0x04000 /* Count Topdown slots/metrics events */ +#define PERF_X86_EVENT_PEBS_STLAT 0x08000 /* st+stlat data address sampling */ +#define PERF_X86_EVENT_AMD_BRS 0x10000 /* AMD Branch Sampling */ static inline bool is_topdown_count(struct perf_event *event) { @@ -132,7 +134,7 @@ struct amd_nb { PERF_SAMPLE_DATA_SRC | PERF_SAMPLE_IDENTIFIER | \ PERF_SAMPLE_TRANSACTION | PERF_SAMPLE_PHYS_ADDR | \ PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_USER | \ - PERF_SAMPLE_PERIOD) + PERF_SAMPLE_PERIOD | PERF_SAMPLE_CODE_PAGE_SIZE) #define PEBS_GP_REGS \ ((1ULL << PERF_REG_X86_AX) | \ @@ -204,6 +206,17 @@ struct intel_excl_cntrs { struct x86_perf_task_context; #define MAX_LBR_ENTRIES 32 +enum { + LBR_FORMAT_32 = 0x00, + LBR_FORMAT_LIP = 0x01, + LBR_FORMAT_EIP = 0x02, + LBR_FORMAT_EIP_FLAGS = 0x03, + LBR_FORMAT_EIP_FLAGS2 = 0x04, + LBR_FORMAT_INFO = 0x05, + LBR_FORMAT_TIME = 0x06, + LBR_FORMAT_MAX_KNOWN = LBR_FORMAT_TIME, +}; + enum { X86_PERF_KFREE_SHARED = 0, X86_PERF_KFREE_EXCL = 1, @@ -261,9 +274,12 @@ struct cpu_hw_events { int lbr_pebs_users; struct perf_branch_stack lbr_stack; struct perf_branch_entry lbr_entries[MAX_LBR_ENTRIES]; - struct er_account *lbr_sel; + union { + struct er_account *lbr_sel; + struct er_account *lbr_ctl; + }; u64 br_sel; - struct x86_perf_task_context *last_task_ctx; + void *last_task_ctx; int last_log_id; int lbr_select; @@ -306,6 +322,8 @@ struct cpu_hw_events { * AMD specific bits */ struct amd_nb *amd_nb; + int brs_active; /* BRS is enabled */ + /* Inverted mask of bits 
to clear in the perf_ctr ctrl registers */ u64 perf_ctr_virt_mask; int n_pair; /* Large increment events */ @@ -721,12 +739,36 @@ struct x86_pmu { * Intel LBR */ unsigned int lbr_tos, lbr_from, lbr_to, - lbr_nr; /* LBR base regs and size */ - u64 lbr_sel_mask; /* LBR_SELECT valid bits */ - const int *lbr_sel_map; /* lbr_select mappings */ + lbr_info, lbr_nr; /* LBR base regs and size */ + union { + u64 lbr_sel_mask; /* LBR_SELECT valid bits */ + u64 lbr_ctl_mask; /* LBR_CTL valid bits */ + }; + union { + const int *lbr_sel_map; /* lbr_select mappings */ + int *lbr_ctl_map; /* LBR_CTL mappings */ + }; bool lbr_double_abort; /* duplicated lbr aborts */ bool lbr_pt_coexist; /* (LBR|BTS) may coexist with PT */ + /* + * Intel Architectural LBR CPUID Enumeration + */ + unsigned int lbr_depth_mask:8; + unsigned int lbr_deep_c_reset:1; + unsigned int lbr_lip:1; + unsigned int lbr_cpl:1; + unsigned int lbr_filter:1; + unsigned int lbr_call_stack:1; + unsigned int lbr_mispred:1; + unsigned int lbr_timed_lbr:1; + unsigned int lbr_br_type:1; + + void (*lbr_reset)(void); + void (*lbr_read)(struct cpu_hw_events *cpuc); + void (*lbr_save)(void *ctx); + void (*lbr_restore)(void *ctx); + /* * Intel PT/LBR/BTS are exclusive */ @@ -763,16 +805,23 @@ struct x86_pmu { int (*aux_output_match) (struct perf_event *event); }; +struct x86_perf_task_context_opt { + int lbr_callstack_users; + int lbr_stack_state; + int log_id; +}; + struct x86_perf_task_context { - u64 lbr_from[MAX_LBR_ENTRIES]; - u64 lbr_to[MAX_LBR_ENTRIES]; - u64 lbr_info[MAX_LBR_ENTRIES]; u64 lbr_sel; int tos; int valid_lbrs; - int lbr_callstack_users; - int lbr_stack_state; - int log_id; + struct x86_perf_task_context_opt opt; + struct lbr_entry lbr[MAX_LBR_ENTRIES]; +}; + +struct x86_perf_task_context_arch_lbr { + struct x86_perf_task_context_opt opt; + struct lbr_entry entries[]; }; #define x86_add_quirk(func_) \ @@ -823,6 +872,14 @@ static struct perf_pmu_events_ht_attr event_attr_##v = { \ struct pmu 
*x86_get_pmu(void); extern struct x86_pmu x86_pmu __read_mostly; +static __always_inline struct x86_perf_task_context_opt *task_context_opt(void *ctx) +{ + if (static_cpu_has(X86_FEATURE_ARCH_LBR)) + return &((struct x86_perf_task_context_arch_lbr *)ctx)->opt; + + return &((struct x86_perf_task_context *)ctx)->opt; +} + static inline bool x86_pmu_has_lbr_callstack(void) { return x86_pmu.lbr_sel_map && @@ -889,6 +946,11 @@ int x86_pmu_hw_config(struct perf_event *event); void x86_pmu_disable_all(void); +static inline bool has_amd_brs(struct hw_perf_event *hwc) +{ + return hwc->flags & PERF_X86_EVENT_AMD_BRS; +} + static inline bool is_counter_pair(struct hw_perf_event *hwc) { return hwc->flags & PERF_X86_EVENT_PAIR; @@ -935,6 +997,9 @@ void x86_pmu_enable_event(struct perf_event *event); int x86_pmu_handle_irq(struct pt_regs *regs); +void x86_pmu_show_pmu_cap(int num_counters, int num_counters_fixed, + u64 intel_ctrl); + extern struct event_constraint emptyconstraint; extern struct event_constraint unconstrained; @@ -976,9 +1041,58 @@ ssize_t events_sysfs_show(struct device *dev, struct device_attribute *attr, ssize_t events_ht_sysfs_show(struct device *dev, struct device_attribute *attr, char *page); +static inline bool fixed_counter_disabled(int i) +{ + return !(x86_pmu.intel_ctrl >> (i + INTEL_PMC_IDX_FIXED)); +} + #ifdef CONFIG_CPU_SUP_AMD int amd_pmu_init(void); +int amd_brs_init(void); +void amd_brs_disable(void); +void amd_brs_enable(void); +void amd_brs_enable_all(void); +void amd_brs_disable_all(void); +void amd_brs_drain(void); +void amd_brs_disable_all(void); +int amd_brs_setup_filter(struct perf_event *event); +void amd_brs_reset(void); + +static inline void amd_pmu_brs_add(struct perf_event *event) +{ + struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); + + perf_sched_cb_inc(event->ctx->pmu); + cpuc->lbr_users++; + /* + * No need to reset BRS because it is reset + * on brs_enable() and it is saturating + */ +} + +static inline void 
amd_pmu_brs_del(struct perf_event *event) +{ + struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); + + cpuc->lbr_users--; + WARN_ON_ONCE(cpuc->lbr_users < 0); + + perf_sched_cb_dec(event->ctx->pmu); +} + +void amd_pmu_brs_sched_task(struct perf_event_context *ctx, bool sched_in); + +/* + * check if BRS is activated on the CPU + * active defined as it has non-zero users and DBG_EXT_CFG.BRSEN=1 + */ +static inline bool amd_brs_active(void) +{ + struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); + + return cpuc->brs_active; +} #else /* CONFIG_CPU_SUP_AMD */ @@ -987,6 +1101,23 @@ static inline int amd_pmu_init(void) return 0; } +static inline int amd_brs_init(void) +{ + return -EOPNOTSUPP; +} + +static inline void amd_brs_drain(void) +{ +} + +static inline void amd_brs_enable_all(void) +{ +} + +static inline void amd_brs_disable_all(void) +{ +} + #endif /* CONFIG_CPU_SUP_AMD */ static inline int is_pebs_pt(struct perf_event *event) @@ -1089,16 +1220,23 @@ void intel_pmu_pebs_sched_task(struct perf_event_context *ctx, bool sched_in); void intel_pmu_auto_reload_read(struct perf_event *event); -void intel_pmu_store_pebs_lbrs(struct pebs_lbr *lbr); +void intel_pmu_store_pebs_lbrs(struct lbr_entry *lbr); void intel_ds_init(void); +void intel_pmu_lbr_swap_task_ctx(struct perf_event_context *prev, + struct perf_event_context *next); + void intel_pmu_lbr_sched_task(struct perf_event_context *ctx, bool sched_in); u64 lbr_from_signext_quirk_wr(u64 val); void intel_pmu_lbr_reset(void); +void intel_pmu_lbr_reset_32(void); + +void intel_pmu_lbr_reset_64(void); + void intel_pmu_lbr_add(struct perf_event *event); void intel_pmu_lbr_del(struct perf_event *event); @@ -1109,6 +1247,14 @@ void intel_pmu_lbr_disable_all(void); void intel_pmu_lbr_read(void); +void intel_pmu_lbr_read_32(struct cpu_hw_events *cpuc); + +void intel_pmu_lbr_read_64(struct cpu_hw_events *cpuc); + +void intel_pmu_lbr_save(void *ctx); + +void intel_pmu_lbr_restore(void *ctx); + void 
intel_pmu_lbr_init_core(void); void intel_pmu_lbr_init_nhm(void); @@ -1125,6 +1271,8 @@ void intel_pmu_lbr_init_skl(void); void intel_pmu_lbr_init_knl(void); +void intel_pmu_arch_lbr_init(void); + void intel_pmu_pebs_data_source_nhm(void); void intel_pmu_pebs_data_source_skl(bool pmem); diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h index 13adca37c99a36a3a8282e22283b130b4a5dd0ad..f4a52222cb22230f836b2d1c0823334918d286de 100644 --- a/arch/x86/include/asm/alternative.h +++ b/arch/x86/include/asm/alternative.h @@ -9,6 +9,10 @@ #include #include +#define ALT_FLAGS_SHIFT 16 +#define ALT_FLAG_NOT BIT(0) +#define ALT_NOT(feature) ((ALT_FLAG_NOT << ALT_FLAGS_SHIFT) | (feature)) + /* * Alternative inline assembly for SMP. * diff --git a/arch/x86/include/asm/amd-ibs.h b/arch/x86/include/asm/amd-ibs.h new file mode 100644 index 0000000000000000000000000000000000000000..46e1df45efc0462774de91891209c50c2acef0b5 --- /dev/null +++ b/arch/x86/include/asm/amd-ibs.h @@ -0,0 +1,132 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * From PPR Vol 1 for AMD Family 19h Model 01h B1 + * 55898 Rev 0.35 - Feb 5, 2021 + */ + +#include + +/* + * IBS Hardware MSRs + */ + +/* MSR 0xc0011030: IBS Fetch Control */ +union ibs_fetch_ctl { + __u64 val; + struct { + __u64 fetch_maxcnt:16,/* 0-15: instruction fetch max. 
count */ + fetch_cnt:16, /* 16-31: instruction fetch count */ + fetch_lat:16, /* 32-47: instruction fetch latency */ + fetch_en:1, /* 48: instruction fetch enable */ + fetch_val:1, /* 49: instruction fetch valid */ + fetch_comp:1, /* 50: instruction fetch complete */ + ic_miss:1, /* 51: i-cache miss */ + phy_addr_valid:1,/* 52: physical address valid */ + l1tlb_pgsz:2, /* 53-54: i-cache L1TLB page size + * (needs IbsPhyAddrValid) */ + l1tlb_miss:1, /* 55: i-cache fetch missed in L1TLB */ + l2tlb_miss:1, /* 56: i-cache fetch missed in L2TLB */ + rand_en:1, /* 57: random tagging enable */ + fetch_l2_miss:1,/* 58: L2 miss for sampled fetch + * (needs IbsFetchComp) */ + reserved:5; /* 59-63: reserved */ + }; +}; + +/* MSR 0xc0011033: IBS Execution Control */ +union ibs_op_ctl { + __u64 val; + struct { + __u64 opmaxcnt:16, /* 0-15: periodic op max. count */ + reserved0:1, /* 16: reserved */ + op_en:1, /* 17: op sampling enable */ + op_val:1, /* 18: op sample valid */ + cnt_ctl:1, /* 19: periodic op counter control */ + opmaxcnt_ext:7, /* 20-26: upper 7 bits of periodic op maximum count */ + reserved1:5, /* 27-31: reserved */ + opcurcnt:27, /* 32-58: periodic op counter current count */ + reserved2:5; /* 59-63: reserved */ + }; +}; + +/* MSR 0xc0011035: IBS Op Data 1 */ +union ibs_op_data { + __u64 val; + struct { + __u64 comp_to_ret_ctr:16, /* 0-15: op completion to retire count */ + tag_to_ret_ctr:16, /* 16-31: op tag to retire count */ + reserved1:2, /* 32-33: reserved */ + op_return:1, /* 34: return op */ + op_brn_taken:1, /* 35: taken branch op */ + op_brn_misp:1, /* 36: mispredicted branch op */ + op_brn_ret:1, /* 37: branch op retired */ + op_rip_invalid:1, /* 38: RIP is invalid */ + op_brn_fuse:1, /* 39: fused branch op */ + op_microcode:1, /* 40: microcode op */ + reserved2:23; /* 41-63: reserved */ + }; +}; + +/* MSR 0xc0011036: IBS Op Data 2 */ +union ibs_op_data2 { + __u64 val; + struct { + __u64 data_src:3, /* 0-2: data source */ + reserved0:1, /* 3:
reserved */ + rmt_node:1, /* 4: destination node */ + cache_hit_st:1, /* 5: cache hit state */ + reserved1:58; /* 6-63: reserved */ + }; +}; + +/* MSR 0xc0011037: IBS Op Data 3 */ +union ibs_op_data3 { + __u64 val; + struct { + __u64 ld_op:1, /* 0: load op */ + st_op:1, /* 1: store op */ + dc_l1tlb_miss:1, /* 2: data cache L1TLB miss */ + dc_l2tlb_miss:1, /* 3: data cache L2TLB miss */ + dc_l1tlb_hit_2m:1, /* 4: data cache L1TLB hit in 2M page */ + dc_l1tlb_hit_1g:1, /* 5: data cache L1TLB hit in 1G page */ + dc_l2tlb_hit_2m:1, /* 6: data cache L2TLB hit in 2M page */ + dc_miss:1, /* 7: data cache miss */ + dc_mis_acc:1, /* 8: misaligned access */ + reserved:4, /* 9-12: reserved */ + dc_wc_mem_acc:1, /* 13: write combining memory access */ + dc_uc_mem_acc:1, /* 14: uncacheable memory access */ + dc_locked_op:1, /* 15: locked operation */ + dc_miss_no_mab_alloc:1, /* 16: DC miss with no MAB allocated */ + dc_lin_addr_valid:1, /* 17: data cache linear address valid */ + dc_phy_addr_valid:1, /* 18: data cache physical address valid */ + dc_l2_tlb_hit_1g:1, /* 19: data cache L2TLB hit in 1G page */ + l2_miss:1, /* 20: L2 cache miss */ + sw_pf:1, /* 21: software prefetch */ + op_mem_width:4, /* 22-25: load/store size in bytes */ + op_dc_miss_open_mem_reqs:6, /* 26-31: outstanding mem reqs on DC fill */ + dc_miss_lat:16, /* 32-47: data cache miss latency */ + tlb_refill_lat:16; /* 48-63: L1 TLB refill latency */ + }; +}; + +/* MSR 0xc001103c: IBS Fetch Control Extended */ +union ic_ibs_extd_ctl { + __u64 val; + struct { + __u64 itlb_refill_lat:16, /* 0-15: ITLB Refill latency for sampled fetch */ + reserved:48; /* 16-63: reserved */ + }; +}; + +/* + * IBS driver related + */ + +struct perf_ibs_data { + u32 size; + union { + u32 data[0]; /* data buffer starts here */ + u32 caps; + }; + u64 regs[MSR_AMD64_IBS_REG_COUNT_MAX]; +}; diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h index
4819d5e5a3353d7db2fa314f9665a6cd07db741a..7f828fe497978e3132a2c6d87786d02029719fd2 100644 --- a/arch/x86/include/asm/barrier.h +++ b/arch/x86/include/asm/barrier.h @@ -84,22 +84,4 @@ do { \ #include -/* - * Make previous memory operations globally visible before - * a WRMSR. - * - * MFENCE makes writes visible, but only affects load/store - * instructions. WRMSR is unfortunately not a load/store - * instruction and is unaffected by MFENCE. The LFENCE ensures - * that the WRMSR is not reordered. - * - * Most WRMSRs are full serializing instructions themselves and - * do not require this barrier. This is only required for the - * IA32_TSC_DEADLINE and X2APIC MSRs. - */ -static inline void weak_wrmsr_fence(void) -{ - asm volatile("mfence; lfence" : : : "memory"); -} - #endif /* _ASM_X86_BARRIER_H */ diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h index adc6cc86b06201761a2a19e2fcdb306d631241be..258996697e58f10b648d99123892c4eed4bb4ef4 100644 --- a/arch/x86/include/asm/cpu.h +++ b/arch/x86/include/asm/cpu.h @@ -40,4 +40,22 @@ int mwait_usable(const struct cpuinfo_x86 *); unsigned int x86_family(unsigned int sig); unsigned int x86_model(unsigned int sig); unsigned int x86_stepping(unsigned int sig); +struct ucode_cpu_info; + +int intel_cpu_collect_info(struct ucode_cpu_info *uci); + +static inline bool intel_cpu_signatures_match(unsigned int s1, unsigned int p1, + unsigned int s2, unsigned int p2) +{ + if (s1 != s2) + return false; + + /* Processor flags are either both 0 ... */ + if (!p1 && !p2) + return true; + + /* ... or they intersect. 
*/ + return p1 & p2; +} + #endif /* _ASM_X86_CPU_H */ diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h index 619c1f80a2abe018fafcff27470631496240c782..00d329a990e56d8735216f482aa82225a79dc637 100644 --- a/arch/x86/include/asm/cpufeature.h +++ b/arch/x86/include/asm/cpufeature.h @@ -30,6 +30,7 @@ enum cpuid_leafs CPUID_7_ECX, CPUID_8000_0007_EBX, CPUID_7_EDX, + CPUID_8000_001F_EAX, }; #ifdef CONFIG_X86_FEATURE_NAMES @@ -88,8 +89,9 @@ extern const char * const x86_bug_flags[NBUGINTS*32]; CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 16, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 17, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 18, feature_bit) || \ + CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 19, feature_bit) || \ REQUIRED_MASK_CHECK || \ - BUILD_BUG_ON_ZERO(NCAPINTS != 19)) + BUILD_BUG_ON_ZERO(NCAPINTS != 20)) #define DISABLED_MASK_BIT_SET(feature_bit) \ ( CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 0, feature_bit) || \ @@ -111,8 +113,9 @@ extern const char * const x86_bug_flags[NBUGINTS*32]; CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 16, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 17, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 18, feature_bit) || \ + CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 19, feature_bit) || \ DISABLED_MASK_CHECK || \ - BUILD_BUG_ON_ZERO(NCAPINTS != 19)) + BUILD_BUG_ON_ZERO(NCAPINTS != 20)) #define cpu_has(c, bit) \ (__builtin_constant_p(bit) && REQUIRED_MASK_BIT_SET(bit) ? 
1 : \ diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index 34ff851bd2fa8dc9a9a9594b8bbb254cb2c29836..9e87d60f1bdd1ff2ffee5a281917f75b2de47cca 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -13,7 +13,7 @@ /* * Defines x86 CPU feature bits */ -#define NCAPINTS 19 /* N 32-bit words worth of info */ +#define NCAPINTS 20 /* N 32-bit words worth of info */ #define NBUGINTS 2 /* N 32-bit bug flags */ /* @@ -94,7 +94,7 @@ #define X86_FEATURE_SYSCALL32 ( 3*32+14) /* "" syscall in IA32 userspace */ #define X86_FEATURE_SYSENTER32 ( 3*32+15) /* "" sysenter in IA32 userspace */ #define X86_FEATURE_REP_GOOD ( 3*32+16) /* REP microcode works well */ -#define X86_FEATURE_SME_COHERENT ( 3*32+17) /* "" AMD hardware-enforced cache coherency */ +/* FREE! ( 3*32+17) */ #define X86_FEATURE_LFENCE_RDTSC ( 3*32+18) /* "" LFENCE synchronizes RDTSC */ #define X86_FEATURE_ACC_POWER ( 3*32+19) /* AMD Accumulated Power Mechanism */ #define X86_FEATURE_NOPL ( 3*32+20) /* The NOPL (0F 1F) instructions */ @@ -220,7 +220,7 @@ #define X86_FEATURE_INVPCID_SINGLE ( 7*32+ 7) /* Effectively INVPCID && CR4.PCIDE=1 */ #define X86_FEATURE_HW_PSTATE ( 7*32+ 8) /* AMD HW-PState */ #define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */ -#define X86_FEATURE_SME ( 7*32+10) /* AMD Secure Memory Encryption */ +/* FREE! 
( 7*32+10) */ #define X86_FEATURE_PTI ( 7*32+11) /* Kernel Page Table Isolation enabled */ #define X86_FEATURE_KERNEL_IBRS ( 7*32+12) /* "" Set/clear IBRS on kernel entry/exit */ #define X86_FEATURE_RSB_VMEXIT ( 7*32+13) /* "" Fill RSB on VM-Exit */ @@ -230,7 +230,7 @@ #define X86_FEATURE_SSBD ( 7*32+17) /* Speculative Store Bypass Disable */ #define X86_FEATURE_MBA ( 7*32+18) /* Memory Bandwidth Allocation */ #define X86_FEATURE_RSB_CTXSW ( 7*32+19) /* "" Fill RSB on context switches */ -#define X86_FEATURE_SEV ( 7*32+20) /* AMD Secure Encrypted Virtualization */ +#define X86_FEATURE_PERFMON_V2 ( 7*32+20) /* AMD Performance Monitoring Version 2 */ #define X86_FEATURE_USE_IBPB ( 7*32+21) /* "" Indirect Branch Prediction Barrier enabled */ #define X86_FEATURE_USE_IBRS_FW ( 7*32+22) /* "" Use IBRS during runtime firmware calls */ #define X86_FEATURE_SPEC_STORE_BYPASS_DISABLE ( 7*32+23) /* "" Disable Speculative Store Bypass. */ @@ -311,6 +311,7 @@ #define X86_FEATURE_RETPOLINE_LFENCE (11*32+13) /* "" Use LFENCE for Spectre variant 2 */ #define X86_FEATURE_RSB_VMEXIT_LITE (11*32+17) /* "" Fill RSB on VM exit when EIBRS is enabled */ #define X86_FEATURE_MSR_TSX_CTRL (11*32+18) /* "" MSR IA32_TSX_CTRL (Intel) implemented */ +#define X86_FEATURE_APIC_MSRS_FENCE (11*32+27) /* "" IA32_TSC_DEADLINE and X2APIC MSRs need fencing */ #define X86_FEATURE_ZEN2 (11*32+28) /* "" CPU based on Zen2 microarchitecture */ #define X86_FEATURE_ZEN3 (11*32+29) /* "" CPU based on Zen3 microarchitecture */ #define X86_FEATURE_ZEN4 (11*32+30) /* "" CPU based on Zen4 microarchitecture */ @@ -332,6 +333,7 @@ #define X86_FEATURE_VIRT_SSBD (13*32+25) /* Virtualized Speculative Store Bypass Disable */ #define X86_FEATURE_AMD_SSB_NO (13*32+26) /* "" Speculative Store Bypass is fixed in hardware. 
*/
 #define X86_FEATURE_BTC_NO (13*32+29) /* "" Not vulnerable to Branch Type Confusion */
+#define X86_FEATURE_BRS (13*32+31) /* Branch Sampling available */
 /* Thermal and Power Management Leaf, CPUID level 0x00000006 (EAX), word 14 */
 #define X86_FEATURE_DTHERM (14*32+ 0) /* Digital Thermal Sensor */
@@ -395,12 +397,20 @@
 #define X86_FEATURE_MD_CLEAR (18*32+10) /* VERW clears CPU buffers */
 #define X86_FEATURE_TSX_FORCE_ABORT (18*32+13) /* "" TSX_FORCE_ABORT */
 #define X86_FEATURE_PCONFIG (18*32+18) /* Intel PCONFIG */
+#define X86_FEATURE_ARCH_LBR (18*32+19) /* Intel ARCH LBR */
 #define X86_FEATURE_SPEC_CTRL (18*32+26) /* "" Speculation Control (IBRS + IBPB) */
 #define X86_FEATURE_INTEL_STIBP (18*32+27) /* "" Single Thread Indirect Branch Predictors */
 #define X86_FEATURE_FLUSH_L1D (18*32+28) /* Flush L1D cache */
 #define X86_FEATURE_ARCH_CAPABILITIES (18*32+29) /* IA32_ARCH_CAPABILITIES MSR (Intel) */
 #define X86_FEATURE_SPEC_CTRL_SSBD (18*32+31) /* "" Speculative Store Bypass Disable */
+/* AMD-defined memory encryption features, CPUID level 0x8000001f (EAX), word 19 */
+#define X86_FEATURE_SME (19*32+ 0) /* AMD Secure Memory Encryption */
+#define X86_FEATURE_SEV (19*32+ 1) /* AMD Secure Encrypted Virtualization */
+#define X86_FEATURE_VM_PAGE_FLUSH (19*32+ 2) /* "" VM Page Flush MSR is supported */
+#define X86_FEATURE_SEV_ES (19*32+ 3) /* AMD Secure Encrypted Virtualization - Encrypted State */
+#define X86_FEATURE_SME_COHERENT (19*32+10) /* "" AMD hardware-enforced cache coherency */
+
 /*
  * BUG word(s)
  */
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index 210bc1350e87a2b065c2736f8d5ed2cddb467864..251a84d87ec2cd308d0a51756ef70efbfe28479f 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -90,6 +90,7 @@
 #define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP)
 #define DISABLED_MASK17 0
 #define DISABLED_MASK18 0
-#define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 19)
+#define DISABLED_MASK19 0
+#define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 20)
 #endif /* _ASM_X86_DISABLED_FEATURES_H */
diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 77cf6c11f66bd86341ef58f5503d68b24e3b6833..757d66bb848dab1933825cb924e33df866e42874 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -374,6 +374,30 @@ struct x86_emulate_ctxt {
 #define X86EMUL_CPUID_VENDOR_GenuineIntel_ecx 0x6c65746e
 #define X86EMUL_CPUID_VENDOR_GenuineIntel_edx 0x49656e69
+static inline bool is_guest_vendor_intel(u32 ebx, u32 ecx, u32 edx)
+{
+ return ebx == X86EMUL_CPUID_VENDOR_GenuineIntel_ebx &&
+ ecx == X86EMUL_CPUID_VENDOR_GenuineIntel_ecx &&
+ edx == X86EMUL_CPUID_VENDOR_GenuineIntel_edx;
+}
+
+static inline bool is_guest_vendor_amd(u32 ebx, u32 ecx, u32 edx)
+{
+ return (ebx == X86EMUL_CPUID_VENDOR_AuthenticAMD_ebx &&
+ ecx == X86EMUL_CPUID_VENDOR_AuthenticAMD_ecx &&
+ edx == X86EMUL_CPUID_VENDOR_AuthenticAMD_edx) ||
+ (ebx == X86EMUL_CPUID_VENDOR_AMDisbetterI_ebx &&
+ ecx == X86EMUL_CPUID_VENDOR_AMDisbetterI_ecx &&
+ edx == X86EMUL_CPUID_VENDOR_AMDisbetterI_edx);
+}
+
+static inline bool is_guest_vendor_hygon(u32 ebx, u32 ecx, u32 edx)
+{
+ return ebx == X86EMUL_CPUID_VENDOR_HygonGenuine_ebx &&
+ ecx == X86EMUL_CPUID_VENDOR_HygonGenuine_ecx &&
+ edx == X86EMUL_CPUID_VENDOR_HygonGenuine_edx;
+}
+
 enum x86_intercept_stage {
  X86_ICTP_NONE = 0, /* Allow zero-init to not match anything */
  X86_ICPT_PRE_EXCEPT,
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e3c13b8749198812edcce0ea74d3bfd73e07a085..a324193effda5fb211f50886ff3d5ebbaa80472a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -648,6 +648,7 @@ struct kvm_vcpu_arch {
  int cpuid_nent;
  struct kvm_cpuid_entry2 cpuid_entries[KVM_MAX_CPUID_ENTRIES];
+ bool is_amd_compatible;
  int maxphyaddr;
@@ -1101,7 +1102,6 @@ struct kvm_x86_ops {
  int (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
  int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
  int (*set_identity_map_addr)(struct kvm *kvm, u64 ident_addr);
- int (*get_tdp_level)(struct kvm_vcpu *vcpu);
  u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
  int (*get_lpage_level)(void);
  bool (*rdtscp_supported)(void);
@@ -1456,9 +1456,6 @@ void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva);
 void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid);
 void kvm_mmu_new_cr3(struct kvm_vcpu *vcpu, gpa_t new_cr3, bool skip_tlb_flush);
-void kvm_enable_tdp(void);
-void kvm_disable_tdp(void);
-
 static inline gpa_t translate_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, u32 access,
  struct x86_exception *exception)
 {
@@ -1471,6 +1468,8 @@ static inline struct kvm_mmu_page *page_header(hpa_t shadow_page)
  return (struct kvm_mmu_page *)page_private(page);
 }
+void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
+ int tdp_max_root_level, int tdp_huge_page_level);
 static inline u16 kvm_read_ldt(void)
 {
diff --git a/arch/x86/include/asm/microcode_amd.h b/arch/x86/include/asm/microcode_amd.h
index c6d83cee427d84e3df3afba32c8dd31a20a2a6cc..df42512087e2cfbeca66b7dd274ff2820b2cd648 100644
--- a/arch/x86/include/asm/microcode_amd.h
+++ b/arch/x86/include/asm/microcode_amd.h
@@ -44,13 +44,11 @@ struct microcode_amd {
 #define PATCH_MAX_SIZE (3 * PAGE_SIZE)
 #ifdef CONFIG_MICROCODE_AMD
-extern void __init load_ucode_amd_bsp(unsigned int family);
-extern void load_ucode_amd_ap(unsigned int family);
+extern void load_ucode_amd_early(unsigned int cpuid_1_eax);
 extern int __init save_microcode_in_initrd_amd(unsigned int family);
 void reload_ucode_amd(unsigned int cpu);
 #else
-static inline void __init load_ucode_amd_bsp(unsigned int family) {}
-static inline void load_ucode_amd_ap(unsigned int family) {}
+static inline void load_ucode_amd_early(unsigned int cpuid_1_eax) {}
 static inline int __init
 save_microcode_in_initrd_amd(unsigned int family) { return -EINVAL; }
 static inline void reload_ucode_amd(unsigned int cpu) {}
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index f779f80bb7c5738b9dca4841412198258be351d6..220448e729b5545fcc27af62d6b6408d1e9fde43 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -192,7 +192,23 @@
 #define LBR_INFO_MISPRED BIT_ULL(63)
 #define LBR_INFO_IN_TX BIT_ULL(62)
 #define LBR_INFO_ABORT BIT_ULL(61)
+#define LBR_INFO_CYC_CNT_VALID BIT_ULL(60)
 #define LBR_INFO_CYCLES 0xffff
+#define LBR_INFO_BR_TYPE_OFFSET 56
+#define LBR_INFO_BR_TYPE (0xfull << LBR_INFO_BR_TYPE_OFFSET)
+
+#define MSR_ARCH_LBR_CTL 0x000014ce
+#define ARCH_LBR_CTL_LBREN BIT(0)
+#define ARCH_LBR_CTL_CPL_OFFSET 1
+#define ARCH_LBR_CTL_CPL (0x3ull << ARCH_LBR_CTL_CPL_OFFSET)
+#define ARCH_LBR_CTL_STACK_OFFSET 3
+#define ARCH_LBR_CTL_STACK (0x1ull << ARCH_LBR_CTL_STACK_OFFSET)
+#define ARCH_LBR_CTL_FILTER_OFFSET 16
+#define ARCH_LBR_CTL_FILTER (0x7full << ARCH_LBR_CTL_FILTER_OFFSET)
+#define MSR_ARCH_LBR_DEPTH 0x000014cf
+#define MSR_ARCH_LBR_FROM_0 0x00001500
+#define MSR_ARCH_LBR_TO_0 0x00001600
+#define MSR_ARCH_LBR_INFO_0 0x00001200
 #define MSR_IA32_PEBS_ENABLE 0x000003f1
 #define MSR_PEBS_DATA_CFG 0x000003f2
@@ -497,6 +513,11 @@
 #define MSR_ZEN2_SPECTRAL_CHICKEN 0xc00110e3
 #define MSR_ZEN2_SPECTRAL_CHICKEN_BIT BIT_ULL(1)
+/* AMD Performance Counter Global Status and Control MSRs */
+#define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS 0xc0000300
+#define MSR_AMD64_PERF_CNTR_GLOBAL_CTL 0xc0000301
+#define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR 0xc0000302
+
 /* Fam 17h MSRs */
 #define MSR_F17H_IRPERF 0xc00000e9
@@ -661,6 +682,10 @@
 #define MSR_IA32_PERF_CTL 0x00000199
 #define INTEL_PERF_CTL_MASK 0xffff
+/* AMD Branch Sampling configuration */
+#define MSR_AMD_DBG_EXTN_CFG 0xc000010f
+#define MSR_AMD_SAMP_BR_FROM 0xc0010300
+
 #define MSR_IA32_MPERF 0x000000e7
 #define MSR_IA32_APERF 0x000000e8
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 9c0200ede2427afb40f2f1904ac00d1c08f380bc..9ed1e1ae8bcb2a083594957617d6af06742e359f 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -98,6 +98,7 @@
 #define AMD64_RAW_EVENT_MASK_NB \
  (AMD64_EVENTSEL_EVENT | \
  ARCH_PERFMON_EVENTSEL_UMASK)
+
 #define HYGON_F18H_M4H_EVENTSEL_UMASK_NB 0x0003FF00ULL
 #define HYGON_F18H_M6H_EVENTSEL_UMASK_NB 0x000FFF00ULL
@@ -111,6 +112,18 @@
  (HYGON_F18H_EVENTSEL_EVENT | \
  HYGON_F18H_M6H_EVENTSEL_UMASK_NB)
+#define AMD64_PERFMON_V2_EVENTSEL_EVENT_NB \
+ (AMD64_EVENTSEL_EVENT | \
+ GENMASK_ULL(37, 36))
+
+#define AMD64_PERFMON_V2_EVENTSEL_UMASK_NB \
+ (ARCH_PERFMON_EVENTSEL_UMASK | \
+ GENMASK_ULL(27, 24))
+
+#define AMD64_PERFMON_V2_RAW_EVENT_MASK_NB \
+ (AMD64_PERFMON_V2_EVENTSEL_EVENT_NB | \
+ AMD64_PERFMON_V2_EVENTSEL_UMASK_NB)
+
 #define AMD64_NUM_COUNTERS 4
 #define AMD64_NUM_COUNTERS_CORE 6
 #define AMD64_NUM_COUNTERS_NB 4
@@ -166,6 +179,61 @@ union cpuid10_edx {
  unsigned int full;
 };
+/*
+ * Intel Architectural LBR CPUID detection/enumeration details:
+ */
+union cpuid28_eax {
+ struct {
+ /* Supported LBR depth values */
+ unsigned int lbr_depth_mask:8;
+ unsigned int reserved:22;
+ /* Deep C-state Reset */
+ unsigned int lbr_deep_c_reset:1;
+ /* IP values contain LIP */
+ unsigned int lbr_lip:1;
+ } split;
+ unsigned int full;
+};
+
+union cpuid28_ebx {
+ struct {
+ /* CPL Filtering Supported */
+ unsigned int lbr_cpl:1;
+ /* Branch Filtering Supported */
+ unsigned int lbr_filter:1;
+ /* Call-stack Mode Supported */
+ unsigned int lbr_call_stack:1;
+ } split;
+ unsigned int full;
+};
+
+union cpuid28_ecx {
+ struct {
+ /* Mispredict Bit Supported */
+ unsigned int lbr_mispred:1;
+ /* Timed LBRs Supported */
+ unsigned int lbr_timed_lbr:1;
+ /* Branch Type Field Supported */
+ unsigned int lbr_br_type:1;
+ } split;
+ unsigned int full;
+};
+
+/*
+ * AMD "Extended Performance Monitoring and Debug" CPUID
+ * detection/enumeration details:
+ */
+union cpuid_0x80000022_ebx {
+ struct {
+ /* Number of Core Performance Counters */
+ unsigned int num_core_pmc:4;
+ unsigned int reserved:6;
+ /* Number of Data Fabric Counters */
+ unsigned int num_df_pmc:6;
+ } split;
+ unsigned int full;
+};
+
 struct x86_pmu_capability {
  int version;
  int num_counters_gp;
@@ -337,13 +405,14 @@ struct pebs_xmm {
  u64 xmm[16*2]; /* two entries for each register */
 };
-struct pebs_lbr_entry {
+struct lbr_entry {
  u64 from, to, info;
 };
-struct pebs_lbr {
- struct pebs_lbr_entry lbr[0]; /* Variable length */
-};
+/*
+ * AMD Extended Performance Monitoring and Debug cpuid feature detection
+ */
+#define EXT_PERFMON_DEBUG_FEATURES 0x80000022
 /*
  * IBS cpuid feature detection
@@ -366,6 +435,7 @@ struct pebs_lbr {
 #define IBS_CAPS_OPBRNFUSE (1U<<8)
 #define IBS_CAPS_FETCHCTLEXTD (1U<<9)
 #define IBS_CAPS_OPDATA4 (1U<<10)
+#define IBS_CAPS_ZEN4 (1U<<11)
 #define IBS_CAPS_DEFAULT (IBS_CAPS_AVAIL \
  | IBS_CAPS_FETCHSAM \
@@ -379,6 +449,7 @@ struct pebs_lbr {
 #define IBSCTL_LVT_OFFSET_MASK 0x0F
 /* IBS fetch bits/masks */
+#define IBS_FETCH_L3MISSONLY (1ULL<<59)
 #define IBS_FETCH_RAND_EN (1ULL<<57)
 #define IBS_FETCH_VAL (1ULL<<49)
 #define IBS_FETCH_ENABLE (1ULL<<48)
@@ -395,8 +466,10 @@ struct pebs_lbr {
 #define IBS_OP_CNT_CTL (1ULL<<19)
 #define IBS_OP_VAL (1ULL<<18)
 #define IBS_OP_ENABLE (1ULL<<17)
+#define IBS_OP_L3MISSONLY (1ULL<<16)
 #define IBS_OP_MAX_CNT 0x0000FFFFULL
 #define IBS_OP_MAX_CNT_EXT 0x007FFFFFULL /* not a register bit mask */
+#define IBS_OP_MAX_CNT_EXT_MASK (0x7FULL<<20) /* separate upper 7 bits */
 #define IBS_RIP_INVALID (1ULL<<38)
 #ifdef CONFIG_X86_LOCAL_APIC
@@ -455,17 +528,10 @@ struct x86_pmu_lbr {
  unsigned int info;
 };
-extern struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr);
 extern void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap);
 extern void perf_check_microcode(void);
 extern int x86_perf_rdpmc_index(struct perf_event *event);
 #else
-static inline struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr)
-{
- *nr = 0;
- return NULL;
-}
-
 static inline void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap)
 {
  memset(cap, 0, sizeof(*cap));
@@ -477,15 +543,26 @@ static inline void perf_check_microcode(void) { }
 #if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_INTEL)
 extern int x86_perf_get_lbr(struct x86_pmu_lbr *lbr);
+extern struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr);
 #else
 static inline int x86_perf_get_lbr(struct x86_pmu_lbr *lbr)
 {
  return -1;
 }
+static inline struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr)
+{
+ *nr = 0;
+ return NULL;
+}
 #endif
 #ifdef CONFIG_CPU_SUP_INTEL
 extern void intel_pt_handle_vmx(int on);
+#else
+static inline void intel_pt_handle_vmx(int on)
+{
+
+}
 #endif
 #if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_AMD)
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index ac48297094ea35bcf03408d6b3885465d489f927..258e54676f1ac597e9274f6490baf8342963518c 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -987,4 +987,22 @@ enum taa_mitigations {
  TAA_MITIGATION_TSX_DISABLED,
 };
+/*
+ * Make previous memory operations globally visible before
+ * a WRMSR.
+ *
+ * MFENCE makes writes visible, but only affects load/store
+ * instructions. WRMSR is unfortunately not a load/store
+ * instruction and is unaffected by MFENCE. The LFENCE ensures
+ * that the WRMSR is not reordered.
+ *
+ * Most WRMSRs are full serializing instructions themselves and
+ * do not require this barrier. This is only required for the
+ * IA32_TSC_DEADLINE and X2APIC MSRs.
+ */
+static inline void weak_wrmsr_fence(void)
+{
+ alternative("mfence; lfence", "", ALT_NOT(X86_FEATURE_APIC_MSRS_FENCE));
+}
+
 #endif /* _ASM_X86_PROCESSOR_H */
diff --git a/arch/x86/include/asm/required-features.h b/arch/x86/include/asm/required-features.h
index 6847d85400a8b738ca2adf00ea95fdf8f349fafc..fa5700097f6474dcecef8da81b625d30b648ee60 100644
--- a/arch/x86/include/asm/required-features.h
+++ b/arch/x86/include/asm/required-features.h
@@ -101,6 +101,7 @@
 #define REQUIRED_MASK16 0
 #define REQUIRED_MASK17 0
 #define REQUIRED_MASK18 0
-#define REQUIRED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 19)
+#define REQUIRED_MASK19 0
+#define REQUIRED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 20)
 #endif /* _ASM_X86_REQUIRED_FEATURES_H */
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 6f66d841262d9c9713918d1014effff035f2bbeb..69e6ea20679c8731421cd924316c43213fea5ba0 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -172,7 +172,7 @@ struct tlb_state {
  /* Last user mm for optimizing IBPB */
  union {
  struct mm_struct *last_user_mm;
- unsigned long last_user_mm_ibpb;
+ unsigned long last_user_mm_spec;
  };
  u16 loaded_mm_asid;
diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
index d1afddf9a140be0e5c478860972bbef3a8cac93c..cdeba6e3b2a72476cb14472327079824dbabf04a 100644
--- a/arch/x86/kernel/amd_nb.c
+++ b/arch/x86/kernel/amd_nb.c
@@ -26,7 +26,10 @@
 #define PCI_DEVICE_ID_AMD_17H_M60H_DF_F4 0x144c
 #define PCI_DEVICE_ID_AMD_17H_M70H_DF_F4 0x1444
 #define PCI_DEVICE_ID_AMD_19H_DF_F4 0x1654
+#define PCI_DEVICE_ID_AMD_1AH_M00H_ROOT 0x153a
+#define PCI_DEVICE_ID_AMD_1AH_M20H_ROOT 0x1507
 #define PCI_DEVICE_ID_AMD_19H_M10H_DF_F4 0x14b1
+#define PCI_DEVICE_ID_AMD_1AH_M00H_DF_F4 0x12c4
 #define PCI_DEVICE_ID_HYGON_18H_M05H_ROOT 0x14a0
 #define PCI_DEVICE_ID_HYGON_18H_M10H_ROOT 0x14c0
@@ -48,6 +51,8 @@ static const struct pci_device_id amd_root_ids[] = {
  { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_17H_M30H_ROOT) },
  { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_17H_M60H_ROOT) },
  { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M10H_ROOT) },
+ { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_1AH_M00H_ROOT) },
+ { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_1AH_M20H_ROOT) },
  {}
 };
@@ -70,6 +75,8 @@ const struct pci_device_id amd_nb_misc_ids[] = {
  { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_17H_M70H_DF_F3) },
  { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_DF_F3) },
  { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M10H_DF_F3) },
+ { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_1AH_M00H_DF_F3) },
+ { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_1AH_M20H_DF_F3) },
  {}
 };
 EXPORT_SYMBOL_GPL(amd_nb_misc_ids);
@@ -88,6 +95,7 @@ static const struct pci_device_id amd_nb_link_ids[] = {
  { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_DF_F4) },
  { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M10H_DF_F4) },
  { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_CNB17H_F4) },
+ { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_1AH_M00H_DF_F4) },
  {}
 };
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 84ca77a2f943143312ac788f59fda558c17cec19..b953bfc739b31a4f03c5f3c5a353b9a8e384294e 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -34,62 +34,6 @@
  */
 static u32 nodes_per_socket = 1;
-/*
- * AMD errata checking
- *
- * Errata are defined as arrays of ints using the AMD_LEGACY_ERRATUM() or
- * AMD_OSVW_ERRATUM() macros. The latter is intended for newer errata that
- * have an OSVW id assigned, which it takes as first argument. Both take a
- * variable number of family-specific model-stepping ranges created by
- * AMD_MODEL_RANGE().
- *
- * Example:
- *
- * const int amd_erratum_319[] =
- * AMD_LEGACY_ERRATUM(AMD_MODEL_RANGE(0x10, 0x2, 0x1, 0x4, 0x2),
- * AMD_MODEL_RANGE(0x10, 0x8, 0x0, 0x8, 0x0),
- * AMD_MODEL_RANGE(0x10, 0x9, 0x0, 0x9, 0x0));
- */
-
-#define AMD_LEGACY_ERRATUM(...) { -1, __VA_ARGS__, 0 }
-#define AMD_OSVW_ERRATUM(osvw_id, ...) { osvw_id, __VA_ARGS__, 0 }
-#define AMD_MODEL_RANGE(f, m_start, s_start, m_end, s_end) \
- ((f << 24) | (m_start << 16) | (s_start << 12) | (m_end << 4) | (s_end))
-#define AMD_MODEL_RANGE_FAMILY(range) (((range) >> 24) & 0xff)
-#define AMD_MODEL_RANGE_START(range) (((range) >> 12) & 0xfff)
-#define AMD_MODEL_RANGE_END(range) ((range) & 0xfff)
-
-static bool cpu_has_amd_erratum(struct cpuinfo_x86 *cpu, const int *erratum)
-{
- int osvw_id = *erratum++;
- u32 range;
- u32 ms;
-
- if (osvw_id >= 0 && osvw_id < 65536 &&
- cpu_has(cpu, X86_FEATURE_OSVW)) {
- u64 osvw_len;
-
- rdmsrl(MSR_AMD64_OSVW_ID_LENGTH, osvw_len);
- if (osvw_id < osvw_len) {
- u64 osvw_bits;
-
- rdmsrl(MSR_AMD64_OSVW_STATUS + (osvw_id >> 6),
- osvw_bits);
- return osvw_bits & (1ULL << (osvw_id & 0x3f));
- }
- }
-
- /* OSVW unavailable or ID unknown, match family-model-stepping range */
- ms = (cpu->x86_model << 4) | cpu->x86_stepping;
- while ((range = *erratum++))
- if ((cpu->x86 == AMD_MODEL_RANGE_FAMILY(range)) &&
- (ms >= AMD_MODEL_RANGE_START(range)) &&
- (ms <= AMD_MODEL_RANGE_END(range)))
- return true;
-
- return false;
-}
-
 static inline int rdmsrl_amd_safe(unsigned msr, unsigned long long *p)
 {
  u32 gprs[8] = { 0 };
@@ -1246,6 +1190,9 @@ static void init_amd(struct cpuinfo_x86 *c)
  msr_set_bit(MSR_K7_HWCR, MSR_K7_HWCR_IRPERF_EN_BIT);
  check_null_seg_clears_base(c);
+
+ /* AMD CPUs don't need fencing after x2APIC/TSC_DEADLINE MSR writes. */
+ clear_cpu_cap(c, X86_FEATURE_APIC_MSRS_FENCE);
 }
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 71be1bf011be766c4e975e74f98f3e9feb98e6a9..38efbd12204184b48bc912062b7ec133e8ea2434 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1222,51 +1222,54 @@ static void __init spec_ctrl_disable_kernel_rrsba(void)
  }
 }
-static void __init spectre_v2_determine_rsb_fill_type_at_vmexit(enum spectre_v2_mitigation mode)
+static void __init spectre_v2_select_rsb_mitigation(enum spectre_v2_mitigation mode)
 {
  /*
- * Similar to context switches, there are two types of RSB attacks
- * after VM exit:
+ * WARNING! There are many subtleties to consider when changing *any*
+ * code related to RSB-related mitigations. Before doing so, carefully
+ * read the following document, and update if necessary:
  *
- * 1) RSB underflow
+ * Documentation/admin-guide/hw-vuln/rsb.rst
  *
- * 2) Poisoned RSB entry
+ * In an overly simplified nutshell:
  *
- * When retpoline is enabled, both are mitigated by filling/clearing
- * the RSB.
+ * - User->user RSB attacks are conditionally mitigated during
+ * context switches by cond_mitigation -> write_ibpb().
  *
- * When IBRS is enabled, while #1 would be mitigated by the IBRS branch
- * prediction isolation protections, RSB still needs to be cleared
- * because of #2. Note that SMEP provides no protection here, unlike
- * user-space-poisoned RSB entries.
+ * - User->kernel and guest->host attacks are mitigated by eIBRS or
+ * RSB filling.
  *
- * eIBRS should protect against RSB poisoning, but if the EIBRS_PBRSB
- * bug is present then a LITE version of RSB protection is required,
- * just a single call needs to retire before a RET is executed.
+ * Though, depending on config, note that other alternative
+ * mitigations may end up getting used instead, e.g., IBPB on
+ * entry/vmexit, call depth tracking, or return thunks.
  */
+
  switch (mode) {
  case SPECTRE_V2_NONE:
- return;
+ break;
- case SPECTRE_V2_EIBRS_LFENCE:
  case SPECTRE_V2_EIBRS:
+ case SPECTRE_V2_EIBRS_LFENCE:
+ case SPECTRE_V2_EIBRS_RETPOLINE:
  if (boot_cpu_has_bug(X86_BUG_EIBRS_PBRSB)) {
- setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT_LITE);
  pr_info("Spectre v2 / PBRSB-eIBRS: Retire a single CALL on VMEXIT\n");
+ setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT_LITE);
  }
- return;
+ break;
- case SPECTRE_V2_EIBRS_RETPOLINE:
  case SPECTRE_V2_RETPOLINE:
  case SPECTRE_V2_LFENCE:
  case SPECTRE_V2_IBRS:
+ pr_info("Spectre v2 / SpectreRSB: Filling RSB on context switch and VMEXIT\n");
+ setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
  setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT);
- pr_info("Spectre v2 / SpectreRSB : Filling RSB on VMEXIT\n");
- return;
- }
+ break;
- pr_warn_once("Unknown Spectre v2 mode, disabling RSB mitigation at VM exit");
- dump_stack();
+ default:
+ pr_warn_once("Unknown Spectre v2 mode, disabling RSB mitigation\n");
+ dump_stack();
+ break;
+ }
 }
 static void __init spectre_v2_select_mitigation(void)
@@ -1377,48 +1380,7 @@ static void __init spectre_v2_select_mitigation(void)
  spectre_v2_enabled = mode;
  pr_info("%s\n", spectre_v2_strings[mode]);
- /*
- * If Spectre v2 protection has been enabled, fill the RSB during a
- * context switch. In general there are two types of RSB attacks
- * across context switches, for which the CALLs/RETs may be unbalanced.
- *
- * 1) RSB underflow
- *
- * Some Intel parts have "bottomless RSB". When the RSB is empty,
- * speculated return targets may come from the branch predictor,
- * which could have a user-poisoned BTB or BHB entry.
- *
- * AMD has it even worse: *all* returns are speculated from the BTB,
- * regardless of the state of the RSB.
- *
- * When IBRS or eIBRS is enabled, the "user -> kernel" attack
- * scenario is mitigated by the IBRS branch prediction isolation
- * properties, so the RSB buffer filling wouldn't be necessary to
- * protect against this type of attack.
- *
- * The "user -> user" attack scenario is mitigated by RSB filling.
- *
- * 2) Poisoned RSB entry
- *
- * If the 'next' in-kernel return stack is shorter than 'prev',
- * 'next' could be tricked into speculating with a user-poisoned RSB
- * entry.
- *
- * The "user -> kernel" attack scenario is mitigated by SMEP and
- * eIBRS.
- *
- * The "user -> user" scenario, also known as SpectreBHB, requires
- * RSB clearing.
- *
- * So to mitigate all cases, unconditionally fill RSB on context
- * switches.
- *
- * FIXME: Is this pointless for retbleed-affected AMD?
- */
- setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
- pr_info("Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch\n");
-
- spectre_v2_determine_rsb_fill_type_at_vmexit(mode);
+ spectre_v2_select_rsb_mitigation(mode);
  /*
  * Retpoline protects the kernel, but doesn't protect firmware. IBRS
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index e667cc3ffeec092765e2345e9444c9f2afd941ef..236de8d5155384092622a40e7d2c083dd93fea4e 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -959,6 +959,9 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
  if (c->extended_cpuid_level >= 0x8000000a)
  c->x86_capability[CPUID_8000_000A_EDX] = cpuid_edx(0x8000000a);
+ if (c->extended_cpuid_level >= 0x8000001f)
+ c->x86_capability[CPUID_8000_001F_EAX] = cpuid_eax(0x8000001f);
+
  init_scattered_cpuid_features(c);
  init_speculation_control(c);
@@ -1615,6 +1618,13 @@ static void identify_cpu(struct cpuinfo_x86 *c)
  c->apicid = apic->phys_pkg_id(c->initial_apicid, 0);
 #endif
+
+ /*
+ * Set default APIC and TSC_DEADLINE MSR fencing flag. AMD and
+ * Hygon will clear it in ->c_init() below.
+ */
+ set_cpu_cap(c, X86_FEATURE_APIC_MSRS_FENCE);
+
  /*
  * Vendor-specific initialization. In this section we
  * canonicalize the feature flags, meaning if there are
diff --git a/arch/x86/kernel/cpu/hygon.c b/arch/x86/kernel/cpu/hygon.c
index 3da9c951ac31badfe0375091e8cc3193e890e218..94ad45d491add9c4b5fcc19495d32351349df5b6 100644
--- a/arch/x86/kernel/cpu/hygon.c
+++ b/arch/x86/kernel/cpu/hygon.c
@@ -367,6 +367,9 @@ static void init_hygon(struct cpuinfo_x86 *c)
  set_cpu_bug(c, X86_BUG_SYSRET_SS_ATTRS);
  check_null_seg_clears_base(c);
+
+ /* Hygon CPUs don't need fencing after x2APIC/TSC_DEADLINE MSR writes. */
+ clear_cpu_cap(c, X86_FEATURE_APIC_MSRS_FENCE);
 }
 static void cpu_detect_tlb_hygon(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 0f6f230323b97ff5d22f50e58c966f9be0d36786..4b5a819bb6aa68f5673d82cdb67a0bad3bb309c0 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -188,6 +188,38 @@ static bool bad_spectre_microcode(struct cpuinfo_x86 *c)
  return false;
 }
+int intel_cpu_collect_info(struct ucode_cpu_info *uci)
+{
+ unsigned int val[2];
+ unsigned int family, model;
+ struct cpu_signature csig = { 0 };
+ unsigned int eax, ebx, ecx, edx;
+
+ memset(uci, 0, sizeof(*uci));
+
+ eax = 0x00000001;
+ ecx = 0;
+ native_cpuid(&eax, &ebx, &ecx, &edx);
+ csig.sig = eax;
+
+ family = x86_family(eax);
+ model = x86_model(eax);
+
+ if (model >= 5 || family > 6) {
+ /* get processor flags from MSR 0x17 */
+ native_rdmsr(MSR_IA32_PLATFORM_ID, val[0], val[1]);
+ csig.pf = 1 << ((val[1] >> 18) & 7);
+ }
+
+ csig.rev = intel_get_microcode_revision();
+
+ uci->cpu_sig = csig;
+ uci->valid = 1;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(intel_cpu_collect_info);
+
 static void early_init_intel(struct cpuinfo_x86 *c)
 {
  u64 misc_enable;
diff --git a/arch/x86/kernel/cpu/microcode/amd.c b/arch/x86/kernel/cpu/microcode/amd.c
index a738f294ab7c65e497a9bd1c7ff703cb0706cf71..ff11a3777bae03e59ec81e4561ea4e0fe9f2c42f 100644
--- a/arch/x86/kernel/cpu/microcode/amd.c
+++ b/arch/x86/kernel/cpu/microcode/amd.c
@@ -56,9 +56,6 @@ struct cont_desc {
 static u32 ucode_new_rev;
-/* One blob per node. */
-static u8 amd_ucode_patch[MAX_NUMNODES][PATCH_MAX_SIZE];
-
 /*
  * Microcode patch container file is prepended to the initrd in cpio
  * format. See Documentation/x86/microcode.rst
@@ -417,21 +414,17 @@ static int __apply_microcode_amd(struct microcode_amd *mc)
  *
  * Returns true if container found (sets @desc), false otherwise.
  */
-static bool
-apply_microcode_early_amd(u32 cpuid_1_eax, void *ucode, size_t size, bool save_patch)
+static bool early_apply_microcode(u32 cpuid_1_eax, void *ucode, size_t size)
 {
  struct cont_desc desc = { 0 };
- u8 (*patch)[PATCH_MAX_SIZE];
  struct microcode_amd *mc;
  u32 rev, dummy, *new_rev;
  bool ret = false;
 #ifdef CONFIG_X86_32
  new_rev = (u32 *)__pa_nodebug(&ucode_new_rev);
- patch = (u8 (*)[PATCH_MAX_SIZE])__pa_nodebug(&amd_ucode_patch);
 #else
  new_rev = &ucode_new_rev;
- patch = &amd_ucode_patch[0];
 #endif
  desc.cpuid_1_eax = cpuid_1_eax;
@@ -455,9 +448,6 @@ apply_microcode_early_amd(u32 cpuid_1_eax, void *ucode, size_t size, bool save_p
  if (!__apply_microcode_amd(mc)) {
  *new_rev = mc->hdr.patch_id;
  ret = true;
-
- if (save_patch)
- memcpy(patch, mc, min_t(u32, desc.psize, PATCH_MAX_SIZE));
  }
  return ret;
@@ -481,7 +471,7 @@ static bool get_builtin_microcode(struct cpio_data *cp, unsigned int family)
 #endif
 }
-static void __load_ucode_amd(unsigned int cpuid_1_eax, struct cpio_data *ret)
+static void find_blobs_in_containers(unsigned int cpuid_1_eax, struct cpio_data *ret)
 {
  struct ucode_cpu_info *uci;
  struct cpio_data cp;
@@ -514,50 +504,20 @@ static void __load_ucode_amd(unsigned int cpuid_1_eax, struct cpio_data *ret)
  *ret = cp;
 }
-void __init load_ucode_amd_bsp(unsigned int cpuid_1_eax)
+static void apply_ucode_from_containers(unsigned int cpuid_1_eax)
 {
  struct cpio_data cp = { };
- __load_ucode_amd(cpuid_1_eax, &cp);
+ find_blobs_in_containers(cpuid_1_eax, &cp);
  if (!(cp.data && cp.size))
  return;
- apply_microcode_early_amd(cpuid_1_eax, cp.data, cp.size, true);
+ early_apply_microcode(cpuid_1_eax, cp.data, cp.size);
 }
-void load_ucode_amd_ap(unsigned int cpuid_1_eax)
+void load_ucode_amd_early(unsigned int cpuid_1_eax)
 {
- struct microcode_amd *mc;
- struct cpio_data cp;
- u32 *new_rev, rev, dummy;
-
- if (IS_ENABLED(CONFIG_X86_32)) {
- mc = (struct microcode_amd *)__pa_nodebug(amd_ucode_patch);
- new_rev = (u32 *)__pa_nodebug(&ucode_new_rev);
- } else {
- mc = (struct microcode_amd *)amd_ucode_patch;
- new_rev = &ucode_new_rev;
- }
-
- native_rdmsr(MSR_AMD64_PATCH_LEVEL, rev, dummy);
-
- /*
- * Check whether a new patch has been saved already. Also, allow application of
- * the same revision in order to pick up SMT-thread-specific configuration even
- * if the sibling SMT thread already has an up-to-date revision.
- */
- if (*new_rev && rev <= mc->hdr.patch_id) {
- if (!__apply_microcode_amd(mc)) {
- *new_rev = mc->hdr.patch_id;
- return;
- }
- }
-
- __load_ucode_amd(cpuid_1_eax, &cp);
- if (!(cp.data && cp.size))
- return;
-
- apply_microcode_early_amd(cpuid_1_eax, cp.data, cp.size, false);
+ return apply_ucode_from_containers(cpuid_1_eax);
 }
 static enum ucode_state load_microcode_amd(u8 family, const u8 *data, size_t size);
@@ -591,22 +551,6 @@ int __init save_microcode_in_initrd_amd(unsigned int cpuid_1_eax)
  return 0;
 }
-void reload_ucode_amd(unsigned int cpu)
-{
- u32 rev, dummy;
- struct microcode_amd *mc;
-
- mc = (struct microcode_amd *)amd_ucode_patch[cpu_to_node(cpu)];
-
- rdmsr(MSR_AMD64_PATCH_LEVEL, rev, dummy);
-
- if (rev < mc->hdr.patch_id) {
- if (!__apply_microcode_amd(mc)) {
- ucode_new_rev = mc->hdr.patch_id;
- pr_info("reload patch_level=0x%08x\n", ucode_new_rev);
- }
- }
-}
 static u16 __find_equiv_id(unsigned int cpu)
 {
  struct ucode_cpu_info *uci = ucode_cpu_info + cpu;
@@ -671,6 +615,28 @@ static struct ucode_patch *find_patch(unsigned int cpu)
  return cache_find_patch(equiv_id);
 }
+void reload_ucode_amd(unsigned int cpu)
+{
+ u32 rev, dummy __always_unused;
+ struct microcode_amd *mc;
+ struct ucode_patch *p;
+
+ p = find_patch(cpu);
+ if (!p)
+ return;
+
+ mc = p->data;
+
+ rdmsr(MSR_AMD64_PATCH_LEVEL, rev, dummy);
+
+ if (rev < mc->hdr.patch_id) {
+ if (!__apply_microcode_amd(mc)) {
+ ucode_new_rev = mc->hdr.patch_id;
+ pr_info("reload patch_level=0x%08x\n", ucode_new_rev);
+ }
+ }
+}
+
 static int collect_cpu_info_amd(int cpu, struct cpu_signature *csig)
 {
  struct cpuinfo_x86 *c = &cpu_data(cpu);
@@ -700,7 +666,7 @@ static enum ucode_state apply_microcode_amd(int cpu)
  struct ucode_cpu_info *uci;
  struct ucode_patch *p;
  enum ucode_state ret;
- u32 rev, dummy;
+ u32 rev, dummy __always_unused;
  BUG_ON(raw_smp_processor_id() != cpu);
@@ -828,6 +794,7 @@ static int verify_and_add_patch(u8 family, u8 *fw, unsigned int leftover,
  return 0;
 }
+/* Scan the blob in @data and add microcode patches to the cache. */
 static enum ucode_state
 __load_microcode_amd(u8 family, const u8 *data, size_t size)
 {
@@ -890,9 +857,6 @@ static enum ucode_state load_microcode_amd(u8 family, const u8 *data, size_t siz
  continue;
  ret = UCODE_NEW;
-
- memset(&amd_ucode_patch[nid], 0, PATCH_MAX_SIZE);
- memcpy(&amd_ucode_patch[nid], p->data, min_t(u32, p->size, PATCH_MAX_SIZE));
  }
  return ret;
diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index 4eced47ed17ceb446b8083382b95362c63f9aca8..1b2fd31d0aa742a410bccceefbb8af3715928346 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -205,7 +205,7 @@ void __init load_ucode_bsp(void)
  if (intel)
  load_ucode_intel_bsp();
  else
- load_ucode_amd_bsp(cpuid_1_eax);
+ load_ucode_amd_early(cpuid_1_eax);
 }
 static bool check_loader_disabled_ap(void)
@@ -233,10 +233,10 @@ void load_ucode_ap(void)
  break;
  case X86_VENDOR_AMD:
  if (x86_family(cpuid_1_eax) >= 0x10)
- load_ucode_amd_ap(cpuid_1_eax);
+ load_ucode_amd_early(cpuid_1_eax);
  break;
  case X86_VENDOR_HYGON:
- load_ucode_amd_ap(cpuid_1_eax);
+ load_ucode_amd_early(cpuid_1_eax);
  break;
  default:
  break;
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 896f456b3f5c8288c4b23ef2c30d0b68f6123203..de7068461e38f887264d6870a435dcb75f414b27 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -45,20 +45,6 @@ static struct microcode_intel *intel_ucode_patch;
 /* last level cache size per core */
 static int llc_size_per_core;
-static inline bool cpu_signatures_match(unsigned int s1, unsigned int p1,
- unsigned int s2, unsigned int p2)
-{
- if (s1 != s2)
- return false;
-
- /* Processor flags are either both 0 ... */
- if (!p1 && !p2)
- return true;
-
- /* ... or they intersect. */
- return p1 & p2;
-}
-
 /*
  * Returns 1 if update has been found, 0 otherwise.
  */
@@ -69,7 +55,7 @@ static int find_matching_signature(void *mc, unsigned int csig, int cpf)
  struct extended_signature *ext_sig;
  int i;
- if (cpu_signatures_match(csig, cpf, mc_hdr->sig, mc_hdr->pf))
+ if (intel_cpu_signatures_match(csig, cpf, mc_hdr->sig, mc_hdr->pf))
  return 1;
  /* Look for ext. headers: */
@@ -80,7 +66,7 @@ static int find_matching_signature(void *mc, unsigned int csig, int cpf)
  ext_sig = (void *)ext_hdr + EXT_HEADER_SIZE;
  for (i = 0; i < ext_hdr->count; i++) {
- if (cpu_signatures_match(csig, cpf, ext_sig->sig, ext_sig->pf))
+ if (intel_cpu_signatures_match(csig, cpf, ext_sig->sig, ext_sig->pf))
  return 1;
  ext_sig++;
  }
@@ -342,37 +328,6 @@ scan_microcode(void *data, size_t size, struct ucode_cpu_info *uci, bool save)
  return patch;
 }
-static int collect_cpu_info_early(struct ucode_cpu_info *uci)
-{
- unsigned int val[2];
- unsigned int family, model;
- struct cpu_signature csig = { 0 };
- unsigned int eax, ebx, ecx, edx;
-
- memset(uci, 0, sizeof(*uci));
-
- eax = 0x00000001;
- ecx = 0;
- native_cpuid(&eax, &ebx, &ecx, &edx);
- csig.sig = eax;
-
- family = x86_family(eax);
- model = x86_model(eax);
-
- if ((model >= 5) || (family > 6)) {
- /* get processor flags from MSR 0x17 */
- native_rdmsr(MSR_IA32_PLATFORM_ID, val[0], val[1]);
- csig.pf = 1 << ((val[1] >> 18) & 7);
- }
-
- csig.rev = intel_get_microcode_revision();
-
- uci->cpu_sig = csig;
- uci->valid = 1;
-
- return 0;
-}
-
 static void show_saved_mc(void)
 {
 #ifdef DEBUG
@@ -386,7 +341,7 @@ static void show_saved_mc(void)
  return;
  }
- collect_cpu_info_early(&uci);
+ intel_cpu_collect_info(&uci);
  sig = uci.cpu_sig.sig;
  pf = uci.cpu_sig.pf;
@@ -495,7 +450,7 @@ void show_ucode_info_early(void)
  struct ucode_cpu_info uci;
  if (delay_ucode_info) {
- collect_cpu_info_early(&uci);
+ intel_cpu_collect_info(&uci);
  print_ucode_info(&uci, current_mc_date);
  delay_ucode_info = 0;
  }
@@ -597,7 +552,7 @@ int __init save_microcode_in_initrd_intel(void)
  if (!(cp.data && cp.size))
  return 0;
- collect_cpu_info_early(&uci);
+ intel_cpu_collect_info(&uci);
  scan_microcode(cp.data, cp.size, &uci, true);
@@ -630,7 +585,7 @@ static struct microcode_intel *__load_ucode_intel(struct ucode_cpu_info *uci)
  if (!(cp.data && cp.size))
  return NULL;
- collect_cpu_info_early(uci);
+ intel_cpu_collect_info(uci);
  return scan_microcode(cp.data, cp.size, uci, false);
 }
@@ -699,7 +654,7 @@ void reload_ucode_intel(void)
  struct microcode_intel *p;
  struct ucode_cpu_info uci;
- collect_cpu_info_early(&uci);
+ intel_cpu_collect_info(&uci);
  p = find_patch(&uci);
  if (!p)
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index a03e309a0ac5f401efc1c291691d26435c720cd8..a5872551f3f6a1934cb068d6a0643826f88a8ccb 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -40,9 +40,7 @@ static const struct cpuid_bit cpuid_bits[] = {
  { X86_FEATURE_CPB, CPUID_EDX, 9, 0x80000007, 0 },
  { X86_FEATURE_PROC_FEEDBACK, CPUID_EDX, 11, 0x80000007, 0 },
  { X86_FEATURE_MBA, CPUID_EBX, 6, 0x80000008, 0 },
- { X86_FEATURE_SME, CPUID_EAX, 0, 0x8000001f, 0 },
- { X86_FEATURE_SEV, CPUID_EAX, 1, 0x8000001f, 0 },
- { X86_FEATURE_SME_COHERENT, CPUID_EAX, 10, 0x8000001f, 0 },
+ { X86_FEATURE_PERFMON_V2, CPUID_EAX, 0, 0x80000022, 0 },
  { 0, 0, 0, 0, 0 }
 };
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 31ecf7a76d5a40474e2bc833f9834797e10f295f..c855aeb9ad7e9704545e7fbbccfef22484d13a25 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -8,12 +8,12 @@ kvm-y += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o \
  $(KVM)/eventfd.o $(KVM)/irqchip.o $(KVM)/vfio.o
 kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o
-kvm-y += x86.o mmu.o emulate.o i8259.o irq.o lapic.o \
+kvm-y += x86.o emulate.o i8259.o irq.o lapic.o \
  i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mtrr.o \
- hyperv.o page_track.o debugfs.o
+ hyperv.o debugfs.o mmu/mmu.o mmu/page_track.o
 kvm-intel-y += vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o vmx/evmcs.o vmx/nested.o
-kvm-amd-y += svm.o pmu_amd.o
+kvm-amd-y += svm/svm.o svm/pmu.o
 obj-$(CONFIG_KVM) += kvm.o
 obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index db383866746613647a8e1ad2e6e92d1681235818..9c28cc22dae0ef9f04498f28a5477d2d11dda4a2 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -144,6 +144,7 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu)
  }
  }
+ vcpu->arch.is_amd_compatible = guest_cpuid_is_amd_or_hygon(vcpu);
  /* Update physical-address width */
  vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
  kvm_mmu_reset_context(vcpu);
@@ -1024,7 +1025,7 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
  * requested. AMD CPUID semantics returns all zeroes for any
  * undefined leaf, whether or not the leaf is in range.
  */
- if (!entry && check_limit && !guest_cpuid_is_amd(vcpu) &&
+ if (!entry && check_limit && !guest_cpuid_is_amd_or_hygon(vcpu) &&
  !cpuid_function_in_range(vcpu, function)) {
  max = kvm_find_cpuid_entry(vcpu, 0, 0);
  if (max) {
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index 7dec43b2c4205680925bca103541287e78dfb450..d732dfa8dea045185deb789563afd98f2c6c89e1 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -113,12 +113,24 @@ static __always_inline void guest_cpuid_clear(struct kvm_vcpu *vcpu, unsigned x8
  *reg &= ~bit(x86_feature);
 }
-static inline bool guest_cpuid_is_amd(struct kvm_vcpu *vcpu)
+static inline bool guest_cpuid_is_amd_or_hygon(struct kvm_vcpu *vcpu)
 {
  struct kvm_cpuid_entry2 *best;
  best = kvm_find_cpuid_entry(vcpu, 0, 0);
- return best && best->ebx == X86EMUL_CPUID_VENDOR_AuthenticAMD_ebx;
+ return best &&
+ (is_guest_vendor_amd(best->ebx, best->ecx, best->edx) ||
+ is_guest_vendor_hygon(best->ebx, best->ecx, best->edx));
+}
+
+static inline bool guest_cpuid_is_amd_compatible(struct kvm_vcpu *vcpu)
+{
+ return vcpu->arch.is_amd_compatible;
+}
+
+static inline bool guest_cpuid_is_intel_compatible(struct kvm_vcpu *vcpu)
+{
+ return !guest_cpuid_is_amd_compatible(vcpu);
 }
 static inline int guest_cpuid_family(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 1a9fa29038526e221d48ed05b89710595be8d6c7..f116f56ab83b1dd0bcb6f2b84eb89106c4d691e7 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2756,9 +2756,7 @@
static bool vendor_intel(struct x86_emulate_ctxt *ctxt) eax = ecx = 0; ctxt->ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx, false); - return ebx == X86EMUL_CPUID_VENDOR_GenuineIntel_ebx - && ecx == X86EMUL_CPUID_VENDOR_GenuineIntel_ecx - && edx == X86EMUL_CPUID_VENDOR_GenuineIntel_edx; + return is_guest_vendor_intel(ebx, ecx, edx); } static bool em_syscall_is_enabled(struct x86_emulate_ctxt *ctxt) @@ -2777,34 +2775,16 @@ static bool em_syscall_is_enabled(struct x86_emulate_ctxt *ctxt) ecx = 0x00000000; ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx, false); /* - * Intel ("GenuineIntel") - * remark: Intel CPUs only support "syscall" in 64bit - * longmode. Also an 64bit guest with a - * 32bit compat-app running will #UD !! While this - * behaviour can be fixed (by emulating) into AMD - * response - CPUs of AMD can't behave like Intel. + * remark: Intel CPUs only support "syscall" in 64bit longmode. Also a + * 64bit guest with a 32bit compat-app running will #UD !! While this + * behaviour can be fixed (by emulating) into AMD response - CPUs of + * AMD can't behave like Intel. 
*/ - if (ebx == X86EMUL_CPUID_VENDOR_GenuineIntel_ebx && - ecx == X86EMUL_CPUID_VENDOR_GenuineIntel_ecx && - edx == X86EMUL_CPUID_VENDOR_GenuineIntel_edx) + if (is_guest_vendor_intel(ebx, ecx, edx)) return false; - /* AMD ("AuthenticAMD") */ - if (ebx == X86EMUL_CPUID_VENDOR_AuthenticAMD_ebx && - ecx == X86EMUL_CPUID_VENDOR_AuthenticAMD_ecx && - edx == X86EMUL_CPUID_VENDOR_AuthenticAMD_edx) - return true; - - /* AMD ("AMDisbetter!") */ - if (ebx == X86EMUL_CPUID_VENDOR_AMDisbetterI_ebx && - ecx == X86EMUL_CPUID_VENDOR_AMDisbetterI_ecx && - edx == X86EMUL_CPUID_VENDOR_AMDisbetterI_edx) - return true; - - /* Hygon ("HygonGenuine") */ - if (ebx == X86EMUL_CPUID_VENDOR_HygonGenuine_ebx && - ecx == X86EMUL_CPUID_VENDOR_HygonGenuine_ecx && - edx == X86EMUL_CPUID_VENDOR_HygonGenuine_edx) + if (is_guest_vendor_amd(ebx, ecx, edx) || + is_guest_vendor_hygon(ebx, ecx, edx)) return true; /* diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 23480d8e4ef17fa80353ba2e275e2eaf6cf2a4ae..6691729ef081803c2971afd53d67fb266bdd9153 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -2244,13 +2244,18 @@ int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type) { u32 reg = kvm_lapic_get_reg(apic, lvt_type); int vector, mode, trig_mode; + int r; if (kvm_apic_hw_enabled(apic) && !(reg & APIC_LVT_MASKED)) { vector = reg & APIC_VECTOR_MASK; mode = reg & APIC_MODE_MASK; trig_mode = reg & APIC_LVT_LEVEL_TRIGGER; - return __apic_accept_irq(apic, mode, vector, 1, trig_mode, - NULL); + + r = __apic_accept_irq(apic, mode, vector, 1, trig_mode, NULL); + if (r && lvt_type == APIC_LVTPC && + guest_cpuid_is_intel_compatible(apic->vcpu)) + kvm_lapic_set_reg(apic, APIC_LVTPC, reg | APIC_LVT_MASKED); + return r; } return 0; } diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu/mmu.c similarity index 99% rename from arch/x86/kvm/mmu.c rename to arch/x86/kvm/mmu/mmu.c index 83de01b0b534c938dad99274c9b65c32f643c122..15d41308590ab3f0f50e7b3aa85552b69fdeb1db 100644 --- 
a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -86,6 +86,10 @@ __MODULE_PARM_TYPE(nx_huge_pages_recovery_ratio, "uint"); */ bool tdp_enabled = false; +static int max_huge_page_level __read_mostly; +static int tdp_root_level __read_mostly; +static int max_tdp_level __read_mostly; + enum { AUDIT_PRE_PAGE_FAULT, AUDIT_POST_PAGE_FAULT, @@ -4644,7 +4648,8 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu, cpuid_maxphyaddr(vcpu), context->root_level, context->nx, guest_cpuid_has(vcpu, X86_FEATURE_GBPAGES), - is_pse(vcpu), guest_cpuid_is_amd(vcpu)); + is_pse(vcpu), + guest_cpuid_is_amd_compatible(vcpu)); } static void @@ -5043,13 +5048,26 @@ static union kvm_mmu_role kvm_calc_mmu_role_common(struct kvm_vcpu *vcpu, return role; } +static inline int kvm_mmu_get_tdp_level(struct kvm_vcpu *vcpu) +{ + /* tdp_root_level is architecture forced level, use it if nonzero */ + if (tdp_root_level) + return tdp_root_level; + + /* Use 5-level TDP if and only if it's useful/necessary. */ + if (max_tdp_level == 5 && cpuid_maxphyaddr(vcpu) <= 48) + return 4; + + return max_tdp_level; +} + static union kvm_mmu_role kvm_calc_tdp_mmu_root_page_role(struct kvm_vcpu *vcpu, bool base_only) { union kvm_mmu_role role = kvm_calc_mmu_role_common(vcpu, base_only); role.base.ad_disabled = (shadow_accessed_mask == 0); - role.base.level = kvm_x86_ops->get_tdp_level(vcpu); + role.base.level = kvm_mmu_get_tdp_level(vcpu); role.base.direct = true; role.base.gpte_is_8_bytes = true; @@ -5070,7 +5088,7 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu) context->page_fault = tdp_page_fault; context->sync_page = nonpaging_sync_page; context->invlpg = nonpaging_invlpg; - context->shadow_root_level = kvm_x86_ops->get_tdp_level(vcpu); + context->shadow_root_level = kvm_mmu_get_tdp_level(vcpu); context->direct_map = true; context->set_cr3 = kvm_x86_ops->set_tdp_cr3; context->get_cr3 = get_cr3; @@ -5694,18 +5712,28 @@ void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid) 
} EXPORT_SYMBOL_GPL(kvm_mmu_invpcid_gva); -void kvm_enable_tdp(void) +void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level, + int tdp_max_root_level, int tdp_huge_page_level) { - tdp_enabled = true; -} -EXPORT_SYMBOL_GPL(kvm_enable_tdp); + tdp_enabled = enable_tdp; + tdp_root_level = tdp_forced_root_level; + max_tdp_level = tdp_max_root_level; -void kvm_disable_tdp(void) -{ - tdp_enabled = false; + /* + * max_huge_page_level reflects KVM's MMU capabilities irrespective + * of kernel support, e.g. KVM may be capable of using 1GB pages when + * the kernel is not. But, KVM never creates a page size greater than + * what is used by the kernel for any given HVA, i.e. the kernel's + * capabilities are ultimately consulted by kvm_mmu_hugepage_adjust(). + */ + if (tdp_enabled) + max_huge_page_level = tdp_huge_page_level; + else if (boot_cpu_has(X86_FEATURE_GBPAGES)) + max_huge_page_level = PG_LEVEL_1G; + else + max_huge_page_level = PG_LEVEL_2M; } -EXPORT_SYMBOL_GPL(kvm_disable_tdp); - +EXPORT_SYMBOL_GPL(kvm_configure_mmu); /* The return value indicates if tlb flush on all vcpus is needed. */ typedef bool (*slot_level_handler) (struct kvm *kvm, struct kvm_rmap_head *rmap_head); @@ -5802,7 +5830,7 @@ static int alloc_mmu_pages(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu) * SVM's 32-bit NPT support, TDP paging doesn't use PAE paging and can * skip allocating the PDP table. 
*/ - if (tdp_enabled && kvm_x86_ops->get_tdp_level(vcpu) > PT32E_ROOT_LEVEL) + if (tdp_enabled && kvm_mmu_get_tdp_level(vcpu) > PT32E_ROOT_LEVEL) return 0; page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_DMA32); diff --git a/arch/x86/kvm/page_track.c b/arch/x86/kvm/mmu/page_track.c similarity index 100% rename from arch/x86/kvm/page_track.c rename to arch/x86/kvm/mmu/page_track.c diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h similarity index 100% rename from arch/x86/kvm/paging_tmpl.h rename to arch/x86/kvm/mmu/paging_tmpl.h diff --git a/arch/x86/kvm/pmu_amd.c b/arch/x86/kvm/svm/pmu.c similarity index 100% rename from arch/x86/kvm/pmu_amd.c rename to arch/x86/kvm/svm/pmu.c diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm/svm.c similarity index 99% rename from arch/x86/kvm/svm.c rename to arch/x86/kvm/svm/svm.c index 4e2502855b26863642fd70bd871fcb25751b0dd8..179a2301c4cebce8de1d54745467060e5544fa14 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -296,13 +296,6 @@ static const struct svm_direct_access_msrs { { .index = MSR_INVALID, .always = false }, }; -/* enable NPT for AMD64 and X86 with PAE */ -#if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE) -static bool npt_enabled = true; -#else -static bool npt_enabled; -#endif - /* * These 2 parameters are used to config the controls for Pause-Loop Exiting: * pause_filter_count: On processors that support Pause filtering(indicated @@ -351,9 +344,12 @@ module_param(pause_filter_count_shrink, ushort, 0444); static unsigned short pause_filter_count_max = KVM_SVM_DEFAULT_PLE_WINDOW_MAX; module_param(pause_filter_count_max, ushort, 0444); -/* allow nested paging (virtualized MMU) for all guests */ -static int npt = true; -module_param(npt, int, S_IRUGO); +/* + * Use nested page tables by default. Note, NPT may get forced off by + * svm_hardware_setup() if it's unsupported by hardware or the host kernel. 
+ */ +bool npt_enabled = true; +module_param_named(npt, npt_enabled, bool, 0444); /* allow nested virtualization in KVM/SVM */ static int nested = true; @@ -739,7 +735,7 @@ static inline void invlpga(unsigned long addr, u32 asid) asm volatile (__ex("invlpga %1, %0") : : "c"(asid), "a"(addr)); } -static int get_npt_level(struct kvm_vcpu *vcpu) +static int get_max_npt_level(void) { #ifdef CONFIG_X86_64 return pgtable_l5_enabled() ? PT64_ROOT_5LEVEL : PT64_ROOT_4LEVEL; @@ -1420,19 +1416,21 @@ static __init int svm_hardware_setup(void) goto err; } - if (!boot_cpu_has(X86_FEATURE_NPT)) + /* + * KVM's MMU doesn't support using 2-level paging for itself, and thus + * NPT isn't supported if the host is using 2-level paging since host + * CR4 is unchanged on VMRUN. + */ + if (!IS_ENABLED(CONFIG_X86_64) && !IS_ENABLED(CONFIG_X86_PAE)) npt_enabled = false; - if (npt_enabled && !npt) { - printk(KERN_INFO "kvm: Nested Paging disabled\n"); + if (!boot_cpu_has(X86_FEATURE_NPT)) npt_enabled = false; - } - if (npt_enabled) { - printk(KERN_INFO "kvm: Nested Paging enabled\n"); - kvm_enable_tdp(); - } else - kvm_disable_tdp(); + /* Force VM NPT level equal to the host's max NPT level */ + kvm_configure_mmu(npt_enabled, get_max_npt_level(), + get_max_npt_level(), PG_LEVEL_1G); + pr_info("kvm: Nested Paging %sabled\n", npt_enabled ? 
"en" : "dis"); if (nrips) { if (!boot_cpu_has(X86_FEATURE_NRIPS)) @@ -3073,7 +3071,7 @@ static void nested_svm_init_mmu_context(struct kvm_vcpu *vcpu) vcpu->arch.mmu->get_cr3 = nested_svm_get_tdp_cr3; vcpu->arch.mmu->get_pdptr = nested_svm_get_tdp_pdptr; vcpu->arch.mmu->inject_page_fault = nested_svm_inject_npf_exit; - vcpu->arch.mmu->shadow_root_level = get_npt_level(vcpu); + vcpu->arch.mmu->shadow_root_level = get_max_npt_level(); reset_shadow_zero_bits_mask(vcpu, vcpu->arch.mmu); vcpu->arch.walk_mmu = &vcpu->arch.nested_mmu; } @@ -7374,7 +7372,6 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = { .set_tss_addr = svm_set_tss_addr, .set_identity_map_addr = svm_set_identity_map_addr, - .get_tdp_level = get_npt_level, .get_mt_mask = svm_get_mt_mask, .get_exit_info = svm_get_exit_info, diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index ef3a4cf65cfeceeea767dbe05e4ef24bea192a04..189f6598ea631b03a59ebdf1cead481c5049c6bb 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -3049,12 +3049,9 @@ void vmx_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) vmx->emulation_required = emulation_required(vcpu); } -static int get_ept_level(struct kvm_vcpu *vcpu) +static int vmx_get_max_tdp_level(void) { - /* Nested EPT currently only supports 4-level walks. */ - if (is_guest_mode(vcpu) && nested_cpu_has_ept(get_vmcs12(vcpu))) - return 4; - if (cpu_has_vmx_ept_5levels() && (cpuid_maxphyaddr(vcpu) > 48)) + if (cpu_has_vmx_ept_5levels()) return 5; return 4; } @@ -3063,7 +3060,7 @@ u64 construct_eptp(struct kvm_vcpu *vcpu, unsigned long root_hpa) { u64 eptp = VMX_EPTP_MT_WB; - eptp |= (get_ept_level(vcpu) == 5) ? VMX_EPTP_PWL_5 : VMX_EPTP_PWL_4; + eptp |= (vmx_get_max_tdp_level() == 5) ? 
VMX_EPTP_PWL_5 : VMX_EPTP_PWL_4; if (enable_ept_ad_bits && (!is_guest_mode(vcpu) || nested_ept_ad_enabled(vcpu))) @@ -5418,7 +5415,6 @@ static void vmx_enable_tdp(void) VMX_EPT_RWX_MASK, 0ull); ept_set_mmio_spte_mask(); - kvm_enable_tdp(); } /* @@ -7708,7 +7704,7 @@ static __init int hardware_setup(void) { unsigned long host_bndcfgs; struct desc_ptr dt; - int r, i; + int r, i, ept_lpage_level; rdmsrl_safe(MSR_EFER, &host_efer); @@ -7800,8 +7796,16 @@ static __init int hardware_setup(void) if (enable_ept) vmx_enable_tdp(); + if (!enable_ept) + ept_lpage_level = 0; + else if (cpu_has_vmx_ept_1g_page()) + ept_lpage_level = PG_LEVEL_1G; + else if (cpu_has_vmx_ept_2m_page()) + ept_lpage_level = PG_LEVEL_2M; else - kvm_disable_tdp(); + ept_lpage_level = PG_LEVEL_4K; + kvm_configure_mmu(enable_ept, 0, vmx_get_max_tdp_level(), + ept_lpage_level); /* * Only enable PML when hardware supports PML feature, and both EPT @@ -7965,7 +7969,6 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = { .set_tss_addr = vmx_set_tss_addr, .set_identity_map_addr = vmx_set_identity_map_addr, - .get_tdp_level = get_ept_level, .get_mt_mask = vmx_get_mt_mask, .get_exit_info = vmx_get_exit_info, diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index a2b1d8081121fbe3233a331ac1d99c8a9638683e..a99588f68c99347c5962a7a34b17bb6e1f459c1f 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2535,7 +2535,7 @@ static void kvmclock_sync_fn(struct work_struct *work) static bool can_set_mci_status(struct kvm_vcpu *vcpu) { /* McStatusWrEn enabled? 
*/ - if (guest_cpuid_is_amd(vcpu)) + if (guest_cpuid_is_amd_compatible(vcpu)) return !!(vcpu->arch.msr_hwcr & BIT_ULL(18)); return false; diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 851359b7edc571ef392c2ddacc9a628fd852f0a6..47a35e335da2a41123e4d33999c6823ee5eb115f 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -33,10 +33,14 @@ */ /* - * Use bit 0 to mangle the TIF_SPEC_IB state into the mm pointer which is - * stored in cpu_tlb_state.last_user_mm_ibpb. + * Bits to mangle the TIF_SPEC_IB state into the mm pointer which is + * stored in cpu_tlb_state.last_user_mm_spec. */ #define LAST_USER_MM_IBPB 0x1UL +#define LAST_USER_MM_SPEC_MASK (LAST_USER_MM_IBPB) + +/* Bits to set when tlbstate and flush is (re)initialized */ +#define LAST_USER_MM_INIT LAST_USER_MM_IBPB /* * We get here when we do something requiring a TLB invalidation @@ -189,20 +193,29 @@ static void sync_current_stack_to_mm(struct mm_struct *mm) } } -static inline unsigned long mm_mangle_tif_spec_ib(struct task_struct *next) +static unsigned long mm_mangle_tif_spec_bits(struct task_struct *next) { unsigned long next_tif = task_thread_info(next)->flags; - unsigned long ibpb = (next_tif >> TIF_SPEC_IB) & LAST_USER_MM_IBPB; + unsigned long spec_bits = (next_tif >> TIF_SPEC_IB) & LAST_USER_MM_SPEC_MASK; - return (unsigned long)next->mm | ibpb; + return (unsigned long)next->mm | spec_bits; } -static void cond_ibpb(struct task_struct *next) +static void cond_mitigation(struct task_struct *next) { + unsigned long prev_mm, next_mm; + if (!next || !next->mm) return; + next_mm = mm_mangle_tif_spec_bits(next); + prev_mm = this_cpu_read(cpu_tlbstate.last_user_mm_spec); + /* + * Avoid user->user BTB/RSB poisoning by flushing them when switching + * between processes. This stops one process from doing Spectre-v2 + * attacks on another. + * * Both, the conditional and the always IBPB mode use the mm * pointer to avoid the IBPB when switching between tasks of the * same process. 
Using the mm pointer instead of mm->context.ctx_id @@ -212,8 +225,6 @@ static void cond_ibpb(struct task_struct *next) * exposed data is not really interesting. */ if (static_branch_likely(&switch_mm_cond_ibpb)) { - unsigned long prev_mm, next_mm; - /* * This is a bit more complex than the always mode because * it has to handle two cases: @@ -243,20 +254,14 @@ static void cond_ibpb(struct task_struct *next) * Optimize this with reasonably small overhead for the * above cases. Mangle the TIF_SPEC_IB bit into the mm * pointer of the incoming task which is stored in - * cpu_tlbstate.last_user_mm_ibpb for comparison. - */ - next_mm = mm_mangle_tif_spec_ib(next); - prev_mm = this_cpu_read(cpu_tlbstate.last_user_mm_ibpb); - - /* + * cpu_tlbstate.last_user_mm_spec for comparison. + * * Issue IBPB only if the mm's are different and one or * both have the IBPB bit set. */ if (next_mm != prev_mm && (next_mm | prev_mm) & LAST_USER_MM_IBPB) indirect_branch_prediction_barrier(); - - this_cpu_write(cpu_tlbstate.last_user_mm_ibpb, next_mm); } if (static_branch_unlikely(&switch_mm_always_ibpb)) { @@ -265,11 +270,12 @@ static void cond_ibpb(struct task_struct *next) * different context than the user space task which ran * last on this CPU. */ - if (this_cpu_read(cpu_tlbstate.last_user_mm) != next->mm) { + if ((prev_mm & ~LAST_USER_MM_SPEC_MASK) != + (unsigned long)next->mm) indirect_branch_prediction_barrier(); - this_cpu_write(cpu_tlbstate.last_user_mm, next->mm); - } } + + this_cpu_write(cpu_tlbstate.last_user_mm_spec, next_mm); } void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next, @@ -377,11 +383,10 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next, need_flush = true; } else { /* - * Avoid user/user BTB poisoning by flushing the branch - * predictor when switching between processes. This stops - * one process from doing Spectre-v2 attacks on another. + * Apply process to process speculation vulnerability + * mitigations if applicable. 
*/ - cond_ibpb(tsk); + cond_mitigation(tsk); if (IS_ENABLED(CONFIG_VMAP_STACK)) { /* @@ -507,7 +512,7 @@ void initialize_tlbstate_and_flush(void) write_cr3(build_cr3(mm->pgd, 0)); /* Reinitialize tlbstate. */ - this_cpu_write(cpu_tlbstate.last_user_mm_ibpb, LAST_USER_MM_IBPB); + this_cpu_write(cpu_tlbstate.last_user_mm_spec, LAST_USER_MM_INIT); this_cpu_write(cpu_tlbstate.loaded_mm_asid, 0); this_cpu_write(cpu_tlbstate.next_asid, 1); this_cpu_write(cpu_tlbstate.ctxs[0].ctx_id, mm->context.ctx_id); @@ -735,7 +740,7 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(struct flush_tlb_info, flush_tlb_info); static DEFINE_PER_CPU(unsigned int, flush_tlb_info_idx); #endif -static inline struct flush_tlb_info *get_flush_tlb_info(struct mm_struct *mm, +static struct flush_tlb_info *get_flush_tlb_info(struct mm_struct *mm, unsigned long start, unsigned long end, unsigned int stride_shift, bool freed_tables, u64 new_tlb_gen) @@ -761,7 +766,7 @@ static inline struct flush_tlb_info *get_flush_tlb_info(struct mm_struct *mm, return info; } -static inline void put_flush_tlb_info(void) +static void put_flush_tlb_info(void) { #ifdef CONFIG_DEBUG_VM /* Complete reentrency prevention checks */ diff --git a/drivers/crypto/ccp/Makefile b/drivers/crypto/ccp/Makefile index 6b86f1e6d634a9d46940998e4a5c7cea45506485..db362fe472ea3ab137da1b1b31e7505e9037534d 100644 --- a/drivers/crypto/ccp/Makefile +++ b/drivers/crypto/ccp/Makefile @@ -8,7 +8,9 @@ ccp-$(CONFIG_CRYPTO_DEV_SP_CCP) += ccp-dev.o \ ccp-dmaengine.o ccp-$(CONFIG_CRYPTO_DEV_CCP_DEBUGFS) += ccp-debugfs.o ccp-$(CONFIG_PCI) += sp-pci.o -ccp-$(CONFIG_CRYPTO_DEV_SP_PSP) += psp-dev.o +ccp-$(CONFIG_CRYPTO_DEV_SP_PSP) += psp-dev.o \ + sev-dev.o \ + tee-dev.o obj-$(CONFIG_CRYPTO_DEV_CCP_CRYPTO) += ccp-crypto.o ccp-crypto-objs := ccp-crypto-main.o \ diff --git a/drivers/crypto/ccp/ccp-dev-v5.c b/drivers/crypto/ccp/ccp-dev-v5.c index 57eb53b8ac21730abe9622d4913af07e53f5d80a..5cd883cfec9fa3c1c200c940478a6d52b07b4594 100644 --- 
a/drivers/crypto/ccp/ccp-dev-v5.c +++ b/drivers/crypto/ccp/ccp-dev-v5.c @@ -789,6 +789,18 @@ static int ccp5_init(struct ccp_device *ccp) /* Find available queues */ qmr = ioread32(ccp->io_regs + Q_MASK_REG); + /* + * Check for a access to the registers. If this read returns + * 0xffffffff, it's likely that the system is running a broken + * BIOS which disallows access to the device. Stop here and fail + * the initialization (but not the load, as the PSP could get + * properly initialized). + */ + if (qmr == 0xffffffff) { + dev_notice(dev, "ccp: unable to access the device: you might be running a broken BIOS.\n"); + return 1; + } + for (i = 0; (i < MAX_HW_QUEUES) && (ccp->cmd_q_count < ccp->max_q_count); i++) { if (!(qmr & (1 << i))) continue; diff --git a/drivers/crypto/ccp/psp-dev.c b/drivers/crypto/ccp/psp-dev.c index 5acf6ae5af667b3f2fca67014b82887989512449..e95e7aa5dbf11041e240d2f282e3de22f1536c3f 100644 --- a/drivers/crypto/ccp/psp-dev.c +++ b/drivers/crypto/ccp/psp-dev.c @@ -2,59 +2,20 @@ /* * AMD Platform Security Processor (PSP) interface * - * Copyright (C) 2016,2018 Advanced Micro Devices, Inc. + * Copyright (C) 2016,2019 Advanced Micro Devices, Inc. 
* * Author: Brijesh Singh */ -#include #include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include +#include #include "sp-dev.h" #include "psp-dev.h" +#include "sev-dev.h" +#include "tee-dev.h" -#define DEVICE_NAME "sev" -#define SEV_FW_FILE "amd/sev.fw" -#define SEV_FW_NAME_SIZE 64 - -static DEFINE_MUTEX(sev_cmd_mutex); -static struct sev_misc_dev *misc_dev; -static struct psp_device *psp_master; - -static int psp_cmd_timeout = 100; -module_param(psp_cmd_timeout, int, 0644); -MODULE_PARM_DESC(psp_cmd_timeout, " default timeout value, in seconds, for PSP commands"); - -static int psp_probe_timeout = 5; -module_param(psp_probe_timeout, int, 0644); -MODULE_PARM_DESC(psp_probe_timeout, " default timeout value, in seconds, during PSP device probe"); - -MODULE_FIRMWARE("amd/amd_sev_fam17h_model0xh.sbin"); /* 1st gen EPYC */ -MODULE_FIRMWARE("amd/amd_sev_fam17h_model3xh.sbin"); /* 2nd gen EPYC */ -MODULE_FIRMWARE("amd/amd_sev_fam19h_model0xh.sbin"); /* 3rd gen EPYC */ - -static bool psp_dead; -static int psp_timeout; - -static inline bool sev_version_greater_or_equal(u8 maj, u8 min) -{ - if (psp_master->api_major > maj) - return true; - if (psp_master->api_major == maj && psp_master->api_minor >= min) - return true; - return false; -} +struct psp_device *psp_master; static struct psp_device *psp_alloc_struct(struct sp_device *sp) { @@ -77,866 +38,95 @@ static irqreturn_t psp_irq_handler(int irq, void *data) { struct psp_device *psp = data; unsigned int status; - int reg; /* Read the interrupt status: */ status = ioread32(psp->io_regs + psp->vdata->intsts_reg); - /* Check if it is command completion: */ - if (!(status & PSP_CMD_COMPLETE)) - goto done; + /* invoke subdevice interrupt handlers */ + if (status) { + if (psp->sev_irq_handler) + psp->sev_irq_handler(irq, psp->sev_irq_data, status); - /* Check if it is SEV command completion: */ - reg = ioread32(psp->io_regs + psp->vdata->cmdresp_reg); - if (reg & 
PSP_CMDRESP_RESP) { - psp->sev_int_rcvd = 1; - wake_up(&psp->sev_int_queue); + if (psp->tee_irq_handler) + psp->tee_irq_handler(irq, psp->tee_irq_data, status); } -done: /* Clear the interrupt status by writing the same value we read. */ iowrite32(status, psp->io_regs + psp->vdata->intsts_reg); return IRQ_HANDLED; } -static int sev_wait_cmd_ioc(struct psp_device *psp, - unsigned int *reg, unsigned int timeout) +static unsigned int psp_get_capability(struct psp_device *psp) { - int ret; - - ret = wait_event_timeout(psp->sev_int_queue, - psp->sev_int_rcvd, timeout * HZ); - if (!ret) - return -ETIMEDOUT; - - *reg = ioread32(psp->io_regs + psp->vdata->cmdresp_reg); - - return 0; -} - -static int sev_cmd_buffer_len(int cmd) -{ - switch (cmd) { - case SEV_CMD_INIT: return sizeof(struct sev_data_init); - case SEV_CMD_PLATFORM_STATUS: return sizeof(struct sev_user_data_status); - case SEV_CMD_PEK_CSR: return sizeof(struct sev_data_pek_csr); - case SEV_CMD_PEK_CERT_IMPORT: return sizeof(struct sev_data_pek_cert_import); - case SEV_CMD_PDH_CERT_EXPORT: return sizeof(struct sev_data_pdh_cert_export); - case SEV_CMD_LAUNCH_START: return sizeof(struct sev_data_launch_start); - case SEV_CMD_LAUNCH_UPDATE_DATA: return sizeof(struct sev_data_launch_update_data); - case SEV_CMD_LAUNCH_UPDATE_VMSA: return sizeof(struct sev_data_launch_update_vmsa); - case SEV_CMD_LAUNCH_FINISH: return sizeof(struct sev_data_launch_finish); - case SEV_CMD_LAUNCH_MEASURE: return sizeof(struct sev_data_launch_measure); - case SEV_CMD_ACTIVATE: return sizeof(struct sev_data_activate); - case SEV_CMD_DEACTIVATE: return sizeof(struct sev_data_deactivate); - case SEV_CMD_DECOMMISSION: return sizeof(struct sev_data_decommission); - case SEV_CMD_GUEST_STATUS: return sizeof(struct sev_data_guest_status); - case SEV_CMD_DBG_DECRYPT: return sizeof(struct sev_data_dbg); - case SEV_CMD_DBG_ENCRYPT: return sizeof(struct sev_data_dbg); - case SEV_CMD_SEND_START: return sizeof(struct sev_data_send_start); - case 
SEV_CMD_SEND_UPDATE_DATA: return sizeof(struct sev_data_send_update_data); - case SEV_CMD_SEND_UPDATE_VMSA: return sizeof(struct sev_data_send_update_vmsa); - case SEV_CMD_SEND_FINISH: return sizeof(struct sev_data_send_finish); - case SEV_CMD_RECEIVE_START: return sizeof(struct sev_data_receive_start); - case SEV_CMD_RECEIVE_FINISH: return sizeof(struct sev_data_receive_finish); - case SEV_CMD_RECEIVE_UPDATE_DATA: return sizeof(struct sev_data_receive_update_data); - case SEV_CMD_RECEIVE_UPDATE_VMSA: return sizeof(struct sev_data_receive_update_vmsa); - case SEV_CMD_LAUNCH_UPDATE_SECRET: return sizeof(struct sev_data_launch_secret); - case SEV_CMD_DOWNLOAD_FIRMWARE: return sizeof(struct sev_data_download_firmware); - case SEV_CMD_GET_ID: return sizeof(struct sev_data_get_id); - default: return 0; - } - - return 0; -} - -static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret) -{ - struct psp_device *psp = psp_master; - unsigned int phys_lsb, phys_msb; - unsigned int reg, ret = 0; - - if (!psp) - return -ENODEV; - - if (psp_dead) - return -EBUSY; - - /* Get the physical address of the command buffer */ - phys_lsb = data ? lower_32_bits(__psp_pa(data)) : 0; - phys_msb = data ? 
upper_32_bits(__psp_pa(data)) : 0; - - dev_dbg(psp->dev, "sev command id %#x buffer 0x%08x%08x timeout %us\n", - cmd, phys_msb, phys_lsb, psp_timeout); - - print_hex_dump_debug("(in): ", DUMP_PREFIX_OFFSET, 16, 2, data, - sev_cmd_buffer_len(cmd), false); - - iowrite32(phys_lsb, psp->io_regs + psp->vdata->cmdbuff_addr_lo_reg); - iowrite32(phys_msb, psp->io_regs + psp->vdata->cmdbuff_addr_hi_reg); - - psp->sev_int_rcvd = 0; - - reg = cmd; - reg <<= PSP_CMDRESP_CMD_SHIFT; - reg |= PSP_CMDRESP_IOC; - iowrite32(reg, psp->io_regs + psp->vdata->cmdresp_reg); - - /* wait for command completion */ - ret = sev_wait_cmd_ioc(psp, &reg, psp_timeout); - if (ret) { - if (psp_ret) - *psp_ret = 0; - - dev_err(psp->dev, "sev command %#x timed out, disabling PSP \n", cmd); - psp_dead = true; - - return ret; - } - - psp_timeout = psp_cmd_timeout; - - if (psp_ret) - *psp_ret = reg & PSP_CMDRESP_ERR_MASK; - - if (reg & PSP_CMDRESP_ERR_MASK) { - dev_dbg(psp->dev, "sev command %#x failed (%#010x)\n", - cmd, reg & PSP_CMDRESP_ERR_MASK); - ret = -EIO; - } - - print_hex_dump_debug("(out): ", DUMP_PREFIX_OFFSET, 16, 2, data, - sev_cmd_buffer_len(cmd), false); - - return ret; -} - -static int sev_do_cmd(int cmd, void *data, int *psp_ret) -{ - int rc; - - mutex_lock(&sev_cmd_mutex); - rc = __sev_do_cmd_locked(cmd, data, psp_ret); - mutex_unlock(&sev_cmd_mutex); - - return rc; -} - -static int __sev_platform_init_locked(int *error) -{ - struct psp_device *psp = psp_master; - int rc = 0; - - if (!psp) - return -ENODEV; - - if (psp->sev_state == SEV_STATE_INIT) - return 0; - - rc = __sev_do_cmd_locked(SEV_CMD_INIT, &psp->init_cmd_buf, error); - if (rc) - return rc; - - psp->sev_state = SEV_STATE_INIT; - dev_dbg(psp->dev, "SEV firmware initialized\n"); - - return rc; -} - -int sev_platform_init(int *error) -{ - int rc; - - mutex_lock(&sev_cmd_mutex); - rc = __sev_platform_init_locked(error); - mutex_unlock(&sev_cmd_mutex); - - return rc; -} -EXPORT_SYMBOL_GPL(sev_platform_init); - -static int 
__sev_platform_shutdown_locked(int *error) -{ - int ret; - - ret = __sev_do_cmd_locked(SEV_CMD_SHUTDOWN, NULL, error); - if (ret) - return ret; - - psp_master->sev_state = SEV_STATE_UNINIT; - dev_dbg(psp_master->dev, "SEV firmware shutdown\n"); - - return ret; -} - -static int sev_platform_shutdown(int *error) -{ - int rc; - - mutex_lock(&sev_cmd_mutex); - rc = __sev_platform_shutdown_locked(NULL); - mutex_unlock(&sev_cmd_mutex); - - return rc; -} - -static int sev_get_platform_state(int *state, int *error) -{ - int rc; - - rc = __sev_do_cmd_locked(SEV_CMD_PLATFORM_STATUS, - &psp_master->status_cmd_buf, error); - if (rc) - return rc; - - *state = psp_master->status_cmd_buf.state; - return rc; -} - -static int sev_ioctl_do_reset(struct sev_issue_cmd *argp) -{ - int state, rc; + unsigned int val = ioread32(psp->io_regs + psp->vdata->feature_reg); /* - * The SEV spec requires that FACTORY_RESET must be issued in - * UNINIT state. Before we go further lets check if any guest is - * active. - * - * If FW is in WORKING state then deny the request otherwise issue - * SHUTDOWN command do INIT -> UNINIT before issuing the FACTORY_RESET. 
- * - */ - rc = sev_get_platform_state(&state, &argp->error); - if (rc) - return rc; - - if (state == SEV_STATE_WORKING) - return -EBUSY; - - if (state == SEV_STATE_INIT) { - rc = __sev_platform_shutdown_locked(&argp->error); - if (rc) - return rc; - } - - return __sev_do_cmd_locked(SEV_CMD_FACTORY_RESET, NULL, &argp->error); -} - -static int sev_ioctl_do_platform_status(struct sev_issue_cmd *argp) -{ - struct sev_user_data_status *data = &psp_master->status_cmd_buf; - int ret; - - ret = __sev_do_cmd_locked(SEV_CMD_PLATFORM_STATUS, data, &argp->error); - if (ret) - return ret; - - if (copy_to_user((void __user *)argp->data, data, sizeof(*data))) - ret = -EFAULT; - - return ret; -} - -static int sev_ioctl_do_pek_pdh_gen(int cmd, struct sev_issue_cmd *argp) -{ - int rc; - - if (psp_master->sev_state == SEV_STATE_UNINIT) { - rc = __sev_platform_init_locked(&argp->error); - if (rc) - return rc; - } - - return __sev_do_cmd_locked(cmd, NULL, &argp->error); -} - -static int sev_ioctl_do_pek_csr(struct sev_issue_cmd *argp) -{ - struct sev_user_data_pek_csr input; - struct sev_data_pek_csr *data; - void *blob = NULL; - int ret; - - if (copy_from_user(&input, (void __user *)argp->data, sizeof(input))) - return -EFAULT; - - data = kzalloc(sizeof(*data), GFP_KERNEL); - if (!data) - return -ENOMEM; - - /* userspace wants to query CSR length */ - if (!input.address || !input.length) - goto cmd; - - /* allocate a physically contiguous buffer to store the CSR blob */ - if (!access_ok(input.address, input.length) || - input.length > SEV_FW_BLOB_MAX_SIZE) { - ret = -EFAULT; - goto e_free; - } - - blob = kmalloc(input.length, GFP_KERNEL); - if (!blob) { - ret = -ENOMEM; - goto e_free; - } - - data->address = __psp_pa(blob); - data->len = input.length; - -cmd: - if (psp_master->sev_state == SEV_STATE_UNINIT) { - ret = __sev_platform_init_locked(&argp->error); - if (ret) - goto e_free_blob; - } - - ret = __sev_do_cmd_locked(SEV_CMD_PEK_CSR, data, &argp->error); - - /* If we query the 
CSR length, FW responded with expected data. */ - input.length = data->len; - - if (copy_to_user((void __user *)argp->data, &input, sizeof(input))) { - ret = -EFAULT; - goto e_free_blob; - } - - if (blob) { - if (copy_to_user((void __user *)input.address, blob, input.length)) - ret = -EFAULT; - } - -e_free_blob: - kfree(blob); -e_free: - kfree(data); - return ret; -} - -void *psp_copy_user_blob(u64 __user uaddr, u32 len) -{ - if (!uaddr || !len) - return ERR_PTR(-EINVAL); - - /* verify that blob length does not exceed our limit */ - if (len > SEV_FW_BLOB_MAX_SIZE) - return ERR_PTR(-EINVAL); - - return memdup_user((void __user *)(uintptr_t)uaddr, len); -} -EXPORT_SYMBOL_GPL(psp_copy_user_blob); - -static int sev_get_api_version(void) -{ - struct sev_user_data_status *status; - int error = 0, ret; - - status = &psp_master->status_cmd_buf; - ret = sev_platform_status(status, &error); - if (ret) { - dev_err(psp_master->dev, - "SEV: failed to get status. Error: %#x\n", error); - return 1; - } - - psp_master->api_major = status->api_major; - psp_master->api_minor = status->api_minor; - psp_master->build = status->build; - psp_master->sev_state = status->state; - - return 0; -} - -static int sev_get_firmware(struct device *dev, - const struct firmware **firmware) -{ - char fw_name_specific[SEV_FW_NAME_SIZE]; - char fw_name_subset[SEV_FW_NAME_SIZE]; - - snprintf(fw_name_specific, sizeof(fw_name_specific), - "amd/amd_sev_fam%.2xh_model%.2xh.sbin", - boot_cpu_data.x86, boot_cpu_data.x86_model); - - snprintf(fw_name_subset, sizeof(fw_name_subset), - "amd/amd_sev_fam%.2xh_model%.1xxh.sbin", - boot_cpu_data.x86, (boot_cpu_data.x86_model & 0xf0) >> 4); - - /* Check for SEV FW for a particular model. - * Ex. amd_sev_fam17h_model00h.sbin for Family 17h Model 00h - * - * or - * - * Check for SEV FW common to a subset of models. - * Ex. 
amd_sev_fam17h_model0xh.sbin for - * Family 17h Model 00h -- Family 17h Model 0Fh - * - * or - * - * Fall-back to using generic name: sev.fw + * Check for access to the registers. If this read returns + * 0xffffffff, it's likely that the system is running a broken + * BIOS which disallows access to the device. Stop here and + * fail the PSP initialization (but not the load, as the CCP + * could get properly initialized). */ - if ((firmware_request_nowarn(firmware, fw_name_specific, dev) >= 0) || - (firmware_request_nowarn(firmware, fw_name_subset, dev) >= 0) || - (firmware_request_nowarn(firmware, SEV_FW_FILE, dev) >= 0)) + if (val == 0xffffffff) { + dev_notice(psp->dev, "psp: unable to access the device: you might be running a broken BIOS.\n"); return 0; - - return -ENOENT; -} - -/* Don't fail if SEV FW couldn't be updated. Continue with existing SEV FW */ -static int sev_update_firmware(struct device *dev) -{ - struct sev_data_download_firmware *data; - const struct firmware *firmware; - int ret, error, order; - struct page *p; - u64 data_size; - - if (sev_get_firmware(dev, &firmware) == -ENOENT) { - dev_dbg(dev, "No SEV firmware file present\n"); - return -1; - } - - /* - * SEV FW expects the physical address given to it to be 32 - * byte aligned. Memory allocated has structure placed at the - * beginning followed by the firmware being passed to the SEV - * FW. Allocate enough memory for data structure + alignment - * padding + SEV FW. - */ - data_size = ALIGN(sizeof(struct sev_data_download_firmware), 32); - - order = get_order(firmware->size + data_size); - p = alloc_pages(GFP_KERNEL, order); - if (!p) { - ret = -1; - goto fw_err; } - /* - * Copy firmware data to a kernel allocated contiguous - * memory region.
- */ - data = page_address(p); - memcpy(page_address(p) + data_size, firmware->data, firmware->size); - - data->address = __psp_pa(page_address(p) + data_size); - data->len = firmware->size; - - ret = sev_do_cmd(SEV_CMD_DOWNLOAD_FIRMWARE, data, &error); - if (ret) - dev_dbg(dev, "Failed to update SEV firmware: %#x\n", error); - else - dev_info(dev, "SEV firmware update successful\n"); - - __free_pages(p, order); - -fw_err: - release_firmware(firmware); - - return ret; -} - -static int sev_ioctl_do_pek_import(struct sev_issue_cmd *argp) -{ - struct sev_user_data_pek_cert_import input; - struct sev_data_pek_cert_import *data; - void *pek_blob, *oca_blob; - int ret; - - if (copy_from_user(&input, (void __user *)argp->data, sizeof(input))) - return -EFAULT; - - data = kzalloc(sizeof(*data), GFP_KERNEL); - if (!data) - return -ENOMEM; - - /* copy PEK certificate blobs from userspace */ - pek_blob = psp_copy_user_blob(input.pek_cert_address, input.pek_cert_len); - if (IS_ERR(pek_blob)) { - ret = PTR_ERR(pek_blob); - goto e_free; - } - - data->pek_cert_address = __psp_pa(pek_blob); - data->pek_cert_len = input.pek_cert_len; - - /* copy PEK certificate blobs from userspace */ - oca_blob = psp_copy_user_blob(input.oca_cert_address, input.oca_cert_len); - if (IS_ERR(oca_blob)) { - ret = PTR_ERR(oca_blob); - goto e_free_pek; - } - - data->oca_cert_address = __psp_pa(oca_blob); - data->oca_cert_len = input.oca_cert_len; - - /* If platform is not in INIT state then transition it to INIT */ - if (psp_master->sev_state != SEV_STATE_INIT) { - ret = __sev_platform_init_locked(&argp->error); - if (ret) - goto e_free_oca; - } - - ret = __sev_do_cmd_locked(SEV_CMD_PEK_CERT_IMPORT, data, &argp->error); - -e_free_oca: - kfree(oca_blob); -e_free_pek: - kfree(pek_blob); -e_free: - kfree(data); - return ret; -} - -static int sev_ioctl_do_get_id2(struct sev_issue_cmd *argp) -{ - struct sev_user_data_get_id2 input; - struct sev_data_get_id *data; - void *id_blob = NULL; - int ret; - - /* SEV 
GET_ID is available from SEV API v0.16 and up */ - if (!sev_version_greater_or_equal(0, 16)) - return -ENOTSUPP; - - if (copy_from_user(&input, (void __user *)argp->data, sizeof(input))) - return -EFAULT; - - /* Check if we have write access to the userspace buffer */ - if (input.address && - input.length && - !access_ok(input.address, input.length)) - return -EFAULT; - - data = kzalloc(sizeof(*data), GFP_KERNEL); - if (!data) - return -ENOMEM; - - if (input.address && input.length) { - id_blob = kmalloc(input.length, GFP_KERNEL); - if (!id_blob) { - kfree(data); - return -ENOMEM; - } - - data->address = __psp_pa(id_blob); - data->len = input.length; - } - - ret = __sev_do_cmd_locked(SEV_CMD_GET_ID, data, &argp->error); - - /* - * Firmware will return the length of the ID value (either the minimum - * required length or the actual length written), return it to the user. - */ - input.length = data->len; - - if (copy_to_user((void __user *)argp->data, &input, sizeof(input))) { - ret = -EFAULT; - goto e_free; - } - - if (id_blob) { - if (copy_to_user((void __user *)input.address, - id_blob, data->len)) { - ret = -EFAULT; - goto e_free; - } - } - -e_free: - kfree(id_blob); - kfree(data); - - return ret; -} - -static int sev_ioctl_do_get_id(struct sev_issue_cmd *argp) -{ - struct sev_data_get_id *data; - u64 data_size, user_size; - void *id_blob, *mem; - int ret; - - /* SEV GET_ID available from SEV API v0.16 and up */ - if (!sev_version_greater_or_equal(0, 16)) - return -ENOTSUPP; - - /* SEV FW expects the buffer it fills with the ID to be - * 8-byte aligned. Memory allocated should be enough to - * hold data structure + alignment padding + memory - * where SEV FW writes the ID. 
- */ - data_size = ALIGN(sizeof(struct sev_data_get_id), 8); - user_size = sizeof(struct sev_user_data_get_id); - - mem = kzalloc(data_size + user_size, GFP_KERNEL); - if (!mem) - return -ENOMEM; - - data = mem; - id_blob = mem + data_size; - - data->address = __psp_pa(id_blob); - data->len = user_size; - - ret = __sev_do_cmd_locked(SEV_CMD_GET_ID, data, &argp->error); - if (!ret) { - if (copy_to_user((void __user *)argp->data, id_blob, data->len)) - ret = -EFAULT; - } - - kfree(mem); - - return ret; + return val; } -static int sev_ioctl_do_pdh_export(struct sev_issue_cmd *argp) +static int psp_check_sev_support(struct psp_device *psp, + unsigned int capability) { - struct sev_user_data_pdh_cert_export input; - void *pdh_blob = NULL, *cert_blob = NULL; - struct sev_data_pdh_cert_export *data; - int ret; - - if (copy_from_user(&input, (void __user *)argp->data, sizeof(input))) - return -EFAULT; - - data = kzalloc(sizeof(*data), GFP_KERNEL); - if (!data) - return -ENOMEM; - - /* Userspace wants to query the certificate length. */ - if (!input.pdh_cert_address || - !input.pdh_cert_len || - !input.cert_chain_address) - goto cmd; - - /* Allocate a physically contiguous buffer to store the PDH blob. */ - if ((input.pdh_cert_len > SEV_FW_BLOB_MAX_SIZE) || - !access_ok(input.pdh_cert_address, input.pdh_cert_len)) { - ret = -EFAULT; - goto e_free; - } - - /* Allocate a physically contiguous buffer to store the cert chain blob. 
*/ - if ((input.cert_chain_len > SEV_FW_BLOB_MAX_SIZE) || - !access_ok(input.cert_chain_address, input.cert_chain_len)) { - ret = -EFAULT; - goto e_free; - } - - pdh_blob = kmalloc(input.pdh_cert_len, GFP_KERNEL); - if (!pdh_blob) { - ret = -ENOMEM; - goto e_free; - } - - data->pdh_cert_address = __psp_pa(pdh_blob); - data->pdh_cert_len = input.pdh_cert_len; - - cert_blob = kmalloc(input.cert_chain_len, GFP_KERNEL); - if (!cert_blob) { - ret = -ENOMEM; - goto e_free_pdh; - } - - data->cert_chain_address = __psp_pa(cert_blob); - data->cert_chain_len = input.cert_chain_len; - -cmd: - /* If platform is not in INIT state then transition it to INIT. */ - if (psp_master->sev_state != SEV_STATE_INIT) { - ret = __sev_platform_init_locked(&argp->error); - if (ret) - goto e_free_cert; - } - - ret = __sev_do_cmd_locked(SEV_CMD_PDH_CERT_EXPORT, data, &argp->error); - - /* If we query the length, FW responded with expected data. */ - input.cert_chain_len = data->cert_chain_len; - input.pdh_cert_len = data->pdh_cert_len; - - if (copy_to_user((void __user *)argp->data, &input, sizeof(input))) { - ret = -EFAULT; - goto e_free_cert; - } - - if (pdh_blob) { - if (copy_to_user((void __user *)input.pdh_cert_address, - pdh_blob, input.pdh_cert_len)) { - ret = -EFAULT; - goto e_free_cert; - } - } - - if (cert_blob) { - if (copy_to_user((void __user *)input.cert_chain_address, - cert_blob, input.cert_chain_len)) - ret = -EFAULT; + /* Check if device supports SEV feature */ + if (!(capability & 1)) { + dev_dbg(psp->dev, "psp does not support SEV\n"); + return -ENODEV; } -e_free_cert: - kfree(cert_blob); -e_free_pdh: - kfree(pdh_blob); -e_free: - kfree(data); - return ret; + return 0; } -static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg) +static int psp_check_tee_support(struct psp_device *psp, + unsigned int capability) { - void __user *argp = (void __user *)arg; - struct sev_issue_cmd input; - int ret = -EFAULT; - - if (!psp_master) + /* Check if device 
supports TEE feature */ + if (!(capability & 2)) { + dev_dbg(psp->dev, "psp does not support TEE\n"); return -ENODEV; - - if (ioctl != SEV_ISSUE_CMD) - return -EINVAL; - - if (copy_from_user(&input, argp, sizeof(struct sev_issue_cmd))) - return -EFAULT; - - if (input.cmd > SEV_MAX) - return -EINVAL; - - mutex_lock(&sev_cmd_mutex); - - switch (input.cmd) { - - case SEV_FACTORY_RESET: - ret = sev_ioctl_do_reset(&input); - break; - case SEV_PLATFORM_STATUS: - ret = sev_ioctl_do_platform_status(&input); - break; - case SEV_PEK_GEN: - ret = sev_ioctl_do_pek_pdh_gen(SEV_CMD_PEK_GEN, &input); - break; - case SEV_PDH_GEN: - ret = sev_ioctl_do_pek_pdh_gen(SEV_CMD_PDH_GEN, &input); - break; - case SEV_PEK_CSR: - ret = sev_ioctl_do_pek_csr(&input); - break; - case SEV_PEK_CERT_IMPORT: - ret = sev_ioctl_do_pek_import(&input); - break; - case SEV_PDH_CERT_EXPORT: - ret = sev_ioctl_do_pdh_export(&input); - break; - case SEV_GET_ID: - pr_warn_once("SEV_GET_ID command is deprecated, use SEV_GET_ID2\n"); - ret = sev_ioctl_do_get_id(&input); - break; - case SEV_GET_ID2: - ret = sev_ioctl_do_get_id2(&input); - break; - default: - ret = -EINVAL; - goto out; } - if (copy_to_user(argp, &input, sizeof(struct sev_issue_cmd))) - ret = -EFAULT; -out: - mutex_unlock(&sev_cmd_mutex); - - return ret; -} - -static const struct file_operations sev_fops = { - .owner = THIS_MODULE, - .unlocked_ioctl = sev_ioctl, -}; - -int sev_platform_status(struct sev_user_data_status *data, int *error) -{ - return sev_do_cmd(SEV_CMD_PLATFORM_STATUS, data, error); -} -EXPORT_SYMBOL_GPL(sev_platform_status); - -int sev_guest_deactivate(struct sev_data_deactivate *data, int *error) -{ - return sev_do_cmd(SEV_CMD_DEACTIVATE, data, error); -} -EXPORT_SYMBOL_GPL(sev_guest_deactivate); - -int sev_guest_activate(struct sev_data_activate *data, int *error) -{ - return sev_do_cmd(SEV_CMD_ACTIVATE, data, error); -} -EXPORT_SYMBOL_GPL(sev_guest_activate); - -int sev_guest_decommission(struct sev_data_decommission *data, 
int *error) -{ - return sev_do_cmd(SEV_CMD_DECOMMISSION, data, error); + return 0; } -EXPORT_SYMBOL_GPL(sev_guest_decommission); -int sev_guest_df_flush(int *error) +static int psp_check_support(struct psp_device *psp, + unsigned int capability) { - return sev_do_cmd(SEV_CMD_DF_FLUSH, NULL, error); -} -EXPORT_SYMBOL_GPL(sev_guest_df_flush); + int sev_support = psp_check_sev_support(psp, capability); + int tee_support = psp_check_tee_support(psp, capability); -static void sev_exit(struct kref *ref) -{ - struct sev_misc_dev *misc_dev = container_of(ref, struct sev_misc_dev, refcount); + /* Return error if device neither supports SEV nor TEE */ + if (sev_support && tee_support) + return -ENODEV; - misc_deregister(&misc_dev->misc); + return 0; } -static int sev_misc_init(struct psp_device *psp) +static int psp_init(struct psp_device *psp, unsigned int capability) { - struct device *dev = psp->dev; int ret; - /* - * SEV feature support can be detected on multiple devices but the SEV - * FW commands must be issued on the master. During probe, we do not - * know the master hence we create /dev/sev on the first device probe. - * sev_do_cmd() finds the right master device to which to issue the - * command to the firmware. 
- */ - if (!misc_dev) { - struct miscdevice *misc; - - misc_dev = devm_kzalloc(dev, sizeof(*misc_dev), GFP_KERNEL); - if (!misc_dev) - return -ENOMEM; - - misc = &misc_dev->misc; - misc->minor = MISC_DYNAMIC_MINOR; - misc->name = DEVICE_NAME; - misc->fops = &sev_fops; - - ret = misc_register(misc); + if (!psp_check_sev_support(psp, capability)) { + ret = sev_dev_init(psp); if (ret) return ret; - - kref_init(&misc_dev->refcount); - } else { - kref_get(&misc_dev->refcount); } - init_waitqueue_head(&psp->sev_int_queue); - psp->sev_misc = misc_dev; - dev_dbg(dev, "registered SEV device\n"); - - return 0; -} - -static int psp_check_sev_support(struct psp_device *psp) -{ - /* Check if device supports SEV feature */ - if (!(ioread32(psp->io_regs + psp->vdata->feature_reg) & 1)) { - dev_dbg(psp->dev, "psp does not support SEV\n"); - return -ENODEV; + if (!psp_check_tee_support(psp, capability)) { + ret = tee_dev_init(psp); + if (ret) + return ret; } return 0; @@ -946,6 +136,7 @@ int psp_dev_init(struct sp_device *sp) { struct device *dev = sp->dev; struct psp_device *psp; + unsigned int capability; int ret; ret = -ENOMEM; @@ -964,7 +155,11 @@ int psp_dev_init(struct sp_device *sp) psp->io_regs = sp->io_map; - ret = psp_check_sev_support(psp); + capability = psp_get_capability(psp); + if (!capability) + goto e_disable; + + ret = psp_check_support(psp, capability); if (ret) goto e_disable; @@ -979,7 +174,7 @@ int psp_dev_init(struct sp_device *sp) goto e_err; } - ret = sev_misc_init(psp); + ret = psp_init(psp, capability); if (ret) goto e_irq; @@ -1015,71 +210,52 @@ void psp_dev_destroy(struct sp_device *sp) if (!psp) return; - if (psp->sev_misc) - kref_put(&misc_dev->refcount, sev_exit); + sev_dev_destroy(psp); + + tee_dev_destroy(psp); sp_free_psp_irq(sp, psp); } -int sev_issue_cmd_external_user(struct file *filep, unsigned int cmd, - void *data, int *error) +void psp_set_sev_irq_handler(struct psp_device *psp, psp_irq_handler_t handler, + void *data) { - if (!filep || 
filep->f_op != &sev_fops) - return -EBADF; - - return sev_do_cmd(cmd, data, error); + psp->sev_irq_data = data; + psp->sev_irq_handler = handler; } -EXPORT_SYMBOL_GPL(sev_issue_cmd_external_user); -void psp_pci_init(void) +void psp_clear_sev_irq_handler(struct psp_device *psp) { - struct sp_device *sp; - int error, rc; - - sp = sp_get_psp_master_device(); - if (!sp) - return; - - psp_master = sp->psp_data; + psp_set_sev_irq_handler(psp, NULL, NULL); +} - psp_timeout = psp_probe_timeout; +void psp_set_tee_irq_handler(struct psp_device *psp, psp_irq_handler_t handler, + void *data) +{ + psp->tee_irq_data = data; + psp->tee_irq_handler = handler; +} - if (sev_get_api_version()) - goto err; +void psp_clear_tee_irq_handler(struct psp_device *psp) +{ + psp_set_tee_irq_handler(psp, NULL, NULL); +} - /* - * If platform is not in UNINIT state then firmware upgrade and/or - * platform INIT command will fail. These command require UNINIT state. - * - * In a normal boot we should never run into case where the firmware - * is not in UNINIT state on boot. But in case of kexec boot, a reboot - * may not go through a typical shutdown sequence and may leave the - * firmware in INIT or WORKING state. - */ +struct psp_device *psp_get_master_device(void) +{ + struct sp_device *sp = sp_get_psp_master_device(); - if (psp_master->sev_state != SEV_STATE_UNINIT) { - sev_platform_shutdown(NULL); - psp_master->sev_state = SEV_STATE_UNINIT; - } + return sp ? 
sp->psp_data : NULL; +} - if (sev_version_greater_or_equal(0, 15) && - sev_update_firmware(psp_master->dev) == 0) - sev_get_api_version(); +void psp_pci_init(void) +{ + psp_master = psp_get_master_device(); - /* Initialize the platform */ - rc = sev_platform_init(&error); - if (rc) { - dev_err(sp->dev, "SEV: failed to INIT error %#x\n", error); + if (!psp_master) return; - } - - dev_info(sp->dev, "SEV API:%d.%d build:%d\n", psp_master->api_major, - psp_master->api_minor, psp_master->build); - - return; -err: - psp_master = NULL; + sev_pci_init(); } void psp_pci_exit(void) @@ -1087,5 +263,5 @@ void psp_pci_exit(void) if (!psp_master) return; - sev_platform_shutdown(NULL); + sev_pci_exit(); } diff --git a/drivers/crypto/ccp/psp-dev.h b/drivers/crypto/ccp/psp-dev.h index 82a084f0299011cd9a59722b362d225db490c077..ef38e4135d810ff44f4006af8858be16c7aa8e47 100644 --- a/drivers/crypto/ccp/psp-dev.h +++ b/drivers/crypto/ccp/psp-dev.h @@ -2,7 +2,7 @@ /* * AMD Platform Security Processor (PSP) interface driver * - * Copyright (C) 2017-2018 Advanced Micro Devices, Inc. + * Copyright (C) 2017-2019 Advanced Micro Devices, Inc. 
* * Author: Brijesh Singh */ @@ -11,34 +11,20 @@ #define __PSP_DEV_H__ #include -#include -#include #include -#include -#include -#include -#include +#include #include -#include -#include -#include -#include #include "sp-dev.h" -#define PSP_CMD_COMPLETE BIT(1) - -#define PSP_CMDRESP_CMD_SHIFT 16 -#define PSP_CMDRESP_IOC BIT(0) #define PSP_CMDRESP_RESP BIT(31) #define PSP_CMDRESP_ERR_MASK 0xffff #define MAX_PSP_NAME_LEN 16 -struct sev_misc_dev { - struct kref refcount; - struct miscdevice misc; -}; +extern struct psp_device *psp_master; + +typedef void (*psp_irq_handler_t)(int, void *, unsigned int); struct psp_device { struct list_head entry; @@ -51,16 +37,24 @@ struct psp_device { void __iomem *io_regs; - int sev_state; - unsigned int sev_int_rcvd; - wait_queue_head_t sev_int_queue; - struct sev_misc_dev *sev_misc; - struct sev_user_data_status status_cmd_buf; - struct sev_data_init init_cmd_buf; + psp_irq_handler_t sev_irq_handler; + void *sev_irq_data; + + psp_irq_handler_t tee_irq_handler; + void *tee_irq_data; - u8 api_major; - u8 api_minor; - u8 build; + void *sev_data; + void *tee_data; }; +void psp_set_sev_irq_handler(struct psp_device *psp, psp_irq_handler_t handler, + void *data); +void psp_clear_sev_irq_handler(struct psp_device *psp); + +void psp_set_tee_irq_handler(struct psp_device *psp, psp_irq_handler_t handler, + void *data); +void psp_clear_tee_irq_handler(struct psp_device *psp); + +struct psp_device *psp_get_master_device(void); + #endif /* __PSP_DEV_H */ diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c new file mode 100644 index 0000000000000000000000000000000000000000..b00dcc2be897d0d09a94bdae9861db649b6f7863 --- /dev/null +++ b/drivers/crypto/ccp/sev-dev.c @@ -0,0 +1,1072 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * AMD Secure Encrypted Virtualization (SEV) interface + * + * Copyright (C) 2016,2019 Advanced Micro Devices, Inc. 
+ * + * Author: Brijesh Singh + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "psp-dev.h" +#include "sev-dev.h" + +#define DEVICE_NAME "sev" +#define SEV_FW_FILE "amd/sev.fw" +#define SEV_FW_NAME_SIZE 64 + +static DEFINE_MUTEX(sev_cmd_mutex); +static struct sev_misc_dev *misc_dev; + +static int psp_cmd_timeout = 100; +module_param(psp_cmd_timeout, int, 0644); +MODULE_PARM_DESC(psp_cmd_timeout, " default timeout value, in seconds, for PSP commands"); + +static int psp_probe_timeout = 5; +module_param(psp_probe_timeout, int, 0644); +MODULE_PARM_DESC(psp_probe_timeout, " default timeout value, in seconds, during PSP device probe"); + +MODULE_FIRMWARE("amd/amd_sev_fam17h_model0xh.sbin"); /* 1st gen EPYC */ +MODULE_FIRMWARE("amd/amd_sev_fam17h_model3xh.sbin"); /* 2nd gen EPYC */ +MODULE_FIRMWARE("amd/amd_sev_fam19h_model0xh.sbin"); /* 3rd gen EPYC */ + +static bool psp_dead; +static int psp_timeout; + +static inline bool sev_version_greater_or_equal(u8 maj, u8 min) +{ + struct sev_device *sev = psp_master->sev_data; + + if (sev->api_major > maj) + return true; + + if (sev->api_major == maj && sev->api_minor >= min) + return true; + + return false; +} + +static void sev_irq_handler(int irq, void *data, unsigned int status) +{ + struct sev_device *sev = data; + int reg; + + /* Check if it is command completion: */ + if (!(status & SEV_CMD_COMPLETE)) + return; + + /* Check if it is SEV command completion: */ + reg = ioread32(sev->io_regs + sev->vdata->cmdresp_reg); + if (reg & PSP_CMDRESP_RESP) { + sev->int_rcvd = 1; + wake_up(&sev->int_queue); + } +} + +static int sev_wait_cmd_ioc(struct sev_device *sev, + unsigned int *reg, unsigned int timeout) +{ + int ret; + + ret = wait_event_timeout(sev->int_queue, + sev->int_rcvd, timeout * HZ); + if (!ret) + return -ETIMEDOUT; + + *reg = ioread32(sev->io_regs + sev->vdata->cmdresp_reg); + + return 0; +} + +static int 
sev_cmd_buffer_len(int cmd) +{ + switch (cmd) { + case SEV_CMD_INIT: return sizeof(struct sev_data_init); + case SEV_CMD_PLATFORM_STATUS: return sizeof(struct sev_user_data_status); + case SEV_CMD_PEK_CSR: return sizeof(struct sev_data_pek_csr); + case SEV_CMD_PEK_CERT_IMPORT: return sizeof(struct sev_data_pek_cert_import); + case SEV_CMD_PDH_CERT_EXPORT: return sizeof(struct sev_data_pdh_cert_export); + case SEV_CMD_LAUNCH_START: return sizeof(struct sev_data_launch_start); + case SEV_CMD_LAUNCH_UPDATE_DATA: return sizeof(struct sev_data_launch_update_data); + case SEV_CMD_LAUNCH_UPDATE_VMSA: return sizeof(struct sev_data_launch_update_vmsa); + case SEV_CMD_LAUNCH_FINISH: return sizeof(struct sev_data_launch_finish); + case SEV_CMD_LAUNCH_MEASURE: return sizeof(struct sev_data_launch_measure); + case SEV_CMD_ACTIVATE: return sizeof(struct sev_data_activate); + case SEV_CMD_DEACTIVATE: return sizeof(struct sev_data_deactivate); + case SEV_CMD_DECOMMISSION: return sizeof(struct sev_data_decommission); + case SEV_CMD_GUEST_STATUS: return sizeof(struct sev_data_guest_status); + case SEV_CMD_DBG_DECRYPT: return sizeof(struct sev_data_dbg); + case SEV_CMD_DBG_ENCRYPT: return sizeof(struct sev_data_dbg); + case SEV_CMD_SEND_START: return sizeof(struct sev_data_send_start); + case SEV_CMD_SEND_UPDATE_DATA: return sizeof(struct sev_data_send_update_data); + case SEV_CMD_SEND_UPDATE_VMSA: return sizeof(struct sev_data_send_update_vmsa); + case SEV_CMD_SEND_FINISH: return sizeof(struct sev_data_send_finish); + case SEV_CMD_RECEIVE_START: return sizeof(struct sev_data_receive_start); + case SEV_CMD_RECEIVE_FINISH: return sizeof(struct sev_data_receive_finish); + case SEV_CMD_RECEIVE_UPDATE_DATA: return sizeof(struct sev_data_receive_update_data); + case SEV_CMD_RECEIVE_UPDATE_VMSA: return sizeof(struct sev_data_receive_update_vmsa); + case SEV_CMD_LAUNCH_UPDATE_SECRET: return sizeof(struct sev_data_launch_secret); + case SEV_CMD_DOWNLOAD_FIRMWARE: return sizeof(struct 
sev_data_download_firmware); + case SEV_CMD_GET_ID: return sizeof(struct sev_data_get_id); + default: return 0; + } + + return 0; +} + +static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret) +{ + struct psp_device *psp = psp_master; + struct sev_device *sev; + unsigned int phys_lsb, phys_msb; + unsigned int reg, ret = 0; + + if (!psp || !psp->sev_data) + return -ENODEV; + + if (psp_dead) + return -EBUSY; + + sev = psp->sev_data; + + /* Get the physical address of the command buffer */ + phys_lsb = data ? lower_32_bits(__psp_pa(data)) : 0; + phys_msb = data ? upper_32_bits(__psp_pa(data)) : 0; + + dev_dbg(sev->dev, "sev command id %#x buffer 0x%08x%08x timeout %us\n", + cmd, phys_msb, phys_lsb, psp_timeout); + + print_hex_dump_debug("(in): ", DUMP_PREFIX_OFFSET, 16, 2, data, + sev_cmd_buffer_len(cmd), false); + + iowrite32(phys_lsb, sev->io_regs + sev->vdata->cmdbuff_addr_lo_reg); + iowrite32(phys_msb, sev->io_regs + sev->vdata->cmdbuff_addr_hi_reg); + + sev->int_rcvd = 0; + + reg = cmd; + reg <<= SEV_CMDRESP_CMD_SHIFT; + reg |= SEV_CMDRESP_IOC; + iowrite32(reg, sev->io_regs + sev->vdata->cmdresp_reg); + + /* wait for command completion */ + ret = sev_wait_cmd_ioc(sev, &reg, psp_timeout); + if (ret) { + if (psp_ret) + *psp_ret = 0; + + dev_err(sev->dev, "sev command %#x timed out, disabling PSP\n", cmd); + psp_dead = true; + + return ret; + } + + psp_timeout = psp_cmd_timeout; + + if (psp_ret) + *psp_ret = reg & PSP_CMDRESP_ERR_MASK; + + if (reg & PSP_CMDRESP_ERR_MASK) { + dev_dbg(sev->dev, "sev command %#x failed (%#010x)\n", + cmd, reg & PSP_CMDRESP_ERR_MASK); + ret = -EIO; + } + + print_hex_dump_debug("(out): ", DUMP_PREFIX_OFFSET, 16, 2, data, + sev_cmd_buffer_len(cmd), false); + + return ret; +} + +static int sev_do_cmd(int cmd, void *data, int *psp_ret) +{ + int rc; + + mutex_lock(&sev_cmd_mutex); + rc = __sev_do_cmd_locked(cmd, data, psp_ret); + mutex_unlock(&sev_cmd_mutex); + + return rc; +} + +static int __sev_platform_init_locked(int *error) +{ +
struct psp_device *psp = psp_master; + struct sev_device *sev; + int rc = 0; + + if (!psp || !psp->sev_data) + return -ENODEV; + + sev = psp->sev_data; + + if (sev->state == SEV_STATE_INIT) + return 0; + + rc = __sev_do_cmd_locked(SEV_CMD_INIT, &sev->init_cmd_buf, error); + if (rc) + return rc; + + sev->state = SEV_STATE_INIT; + dev_dbg(sev->dev, "SEV firmware initialized\n"); + + return rc; +} + +int sev_platform_init(int *error) +{ + int rc; + + mutex_lock(&sev_cmd_mutex); + rc = __sev_platform_init_locked(error); + mutex_unlock(&sev_cmd_mutex); + + return rc; +} +EXPORT_SYMBOL_GPL(sev_platform_init); + +static int __sev_platform_shutdown_locked(int *error) +{ + struct sev_device *sev = psp_master->sev_data; + int ret; + + ret = __sev_do_cmd_locked(SEV_CMD_SHUTDOWN, NULL, error); + if (ret) + return ret; + + sev->state = SEV_STATE_UNINIT; + dev_dbg(sev->dev, "SEV firmware shutdown\n"); + + return ret; +} + +static int sev_platform_shutdown(int *error) +{ + int rc; + + mutex_lock(&sev_cmd_mutex); + rc = __sev_platform_shutdown_locked(error); + mutex_unlock(&sev_cmd_mutex); + + return rc; +} + +static int sev_get_platform_state(int *state, int *error) +{ + struct sev_device *sev = psp_master->sev_data; + int rc; + + rc = __sev_do_cmd_locked(SEV_CMD_PLATFORM_STATUS, + &sev->status_cmd_buf, error); + if (rc) + return rc; + + *state = sev->status_cmd_buf.state; + return rc; +} + +static int sev_ioctl_do_reset(struct sev_issue_cmd *argp) +{ + int state, rc; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + /* + * The SEV spec requires that FACTORY_RESET must be issued in + * UNINIT state. Before we go further, let's check if any guest is + * active. + * + * If FW is in WORKING state then deny the request; otherwise issue + * SHUTDOWN command to move INIT -> UNINIT before issuing the FACTORY_RESET.
+ * + */ + rc = sev_get_platform_state(&state, &argp->error); + if (rc) + return rc; + + if (state == SEV_STATE_WORKING) + return -EBUSY; + + if (state == SEV_STATE_INIT) { + rc = __sev_platform_shutdown_locked(&argp->error); + if (rc) + return rc; + } + + return __sev_do_cmd_locked(SEV_CMD_FACTORY_RESET, NULL, &argp->error); +} + +static int sev_ioctl_do_platform_status(struct sev_issue_cmd *argp) +{ + struct sev_device *sev = psp_master->sev_data; + struct sev_user_data_status *data = &sev->status_cmd_buf; + int ret; + + ret = __sev_do_cmd_locked(SEV_CMD_PLATFORM_STATUS, data, &argp->error); + if (ret) + return ret; + + if (copy_to_user((void __user *)argp->data, data, sizeof(*data))) + ret = -EFAULT; + + return ret; +} + +static int sev_ioctl_do_pek_pdh_gen(int cmd, struct sev_issue_cmd *argp) +{ + struct sev_device *sev = psp_master->sev_data; + int rc; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + if (sev->state == SEV_STATE_UNINIT) { + rc = __sev_platform_init_locked(&argp->error); + if (rc) + return rc; + } + + return __sev_do_cmd_locked(cmd, NULL, &argp->error); +} + +static int sev_ioctl_do_pek_csr(struct sev_issue_cmd *argp) +{ + struct sev_device *sev = psp_master->sev_data; + struct sev_user_data_pek_csr input; + struct sev_data_pek_csr *data; + void *blob = NULL; + int ret; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + if (copy_from_user(&input, (void __user *)argp->data, sizeof(input))) + return -EFAULT; + + data = kzalloc(sizeof(*data), GFP_KERNEL); + if (!data) + return -ENOMEM; + + /* userspace wants to query CSR length */ + if (!input.address || !input.length) + goto cmd; + + /* allocate a physically contiguous buffer to store the CSR blob */ + if (!access_ok(input.address, input.length) || + input.length > SEV_FW_BLOB_MAX_SIZE) { + ret = -EFAULT; + goto e_free; + } + + blob = kmalloc(input.length, GFP_KERNEL); + if (!blob) { + ret = -ENOMEM; + goto e_free; + } + + data->address = __psp_pa(blob); + data->len = input.length; + +cmd: 
+ if (sev->state == SEV_STATE_UNINIT) { + ret = __sev_platform_init_locked(&argp->error); + if (ret) + goto e_free_blob; + } + + ret = __sev_do_cmd_locked(SEV_CMD_PEK_CSR, data, &argp->error); + + /* If we query the CSR length, FW responded with expected data. */ + input.length = data->len; + + if (copy_to_user((void __user *)argp->data, &input, sizeof(input))) { + ret = -EFAULT; + goto e_free_blob; + } + + if (blob) { + if (copy_to_user((void __user *)input.address, blob, input.length)) + ret = -EFAULT; + } + +e_free_blob: + kfree(blob); +e_free: + kfree(data); + return ret; +} + +void *psp_copy_user_blob(u64 __user uaddr, u32 len) +{ + if (!uaddr || !len) + return ERR_PTR(-EINVAL); + + /* verify that blob length does not exceed our limit */ + if (len > SEV_FW_BLOB_MAX_SIZE) + return ERR_PTR(-EINVAL); + + return memdup_user((void __user *)(uintptr_t)uaddr, len); +} +EXPORT_SYMBOL_GPL(psp_copy_user_blob); + +static int sev_get_api_version(void) +{ + struct sev_device *sev = psp_master->sev_data; + struct sev_user_data_status *status; + int error = 0, ret; + + status = &sev->status_cmd_buf; + ret = sev_platform_status(status, &error); + if (ret) { + dev_err(sev->dev, + "SEV: failed to get status. Error: %#x\n", error); + return 1; + } + + sev->api_major = status->api_major; + sev->api_minor = status->api_minor; + sev->build = status->build; + sev->state = status->state; + + return 0; +} + +static int sev_get_firmware(struct device *dev, + const struct firmware **firmware) +{ + char fw_name_specific[SEV_FW_NAME_SIZE]; + char fw_name_subset[SEV_FW_NAME_SIZE]; + + snprintf(fw_name_specific, sizeof(fw_name_specific), + "amd/amd_sev_fam%.2xh_model%.2xh.sbin", + boot_cpu_data.x86, boot_cpu_data.x86_model); + + snprintf(fw_name_subset, sizeof(fw_name_subset), + "amd/amd_sev_fam%.2xh_model%.1xxh.sbin", + boot_cpu_data.x86, (boot_cpu_data.x86_model & 0xf0) >> 4); + + /* Check for SEV FW for a particular model. + * Ex. 
amd_sev_fam17h_model00h.sbin for Family 17h Model 00h + * + * or + * + * Check for SEV FW common to a subset of models. + * Ex. amd_sev_fam17h_model0xh.sbin for + * Family 17h Model 00h -- Family 17h Model 0Fh + * + * or + * + * Fall-back to using generic name: sev.fw + */ + if ((firmware_request_nowarn(firmware, fw_name_specific, dev) >= 0) || + (firmware_request_nowarn(firmware, fw_name_subset, dev) >= 0) || + (firmware_request_nowarn(firmware, SEV_FW_FILE, dev) >= 0)) + return 0; + + return -ENOENT; +} + +/* Don't fail if SEV FW couldn't be updated. Continue with existing SEV FW */ +static int sev_update_firmware(struct device *dev) +{ + struct sev_data_download_firmware *data; + const struct firmware *firmware; + int ret, error, order; + struct page *p; + u64 data_size; + + if (sev_get_firmware(dev, &firmware) == -ENOENT) { + dev_dbg(dev, "No SEV firmware file present\n"); + return -1; + } + + /* + * SEV FW expects the physical address given to it to be 32 + * byte aligned. Memory allocated has structure placed at the + * beginning followed by the firmware being passed to the SEV + * FW. Allocate enough memory for data structure + alignment + * padding + SEV FW. + */ + data_size = ALIGN(sizeof(struct sev_data_download_firmware), 32); + + order = get_order(firmware->size + data_size); + p = alloc_pages(GFP_KERNEL, order); + if (!p) { + ret = -1; + goto fw_err; + } + + /* + * Copy firmware data to a kernel allocated contiguous + * memory region. 
+ */ + data = page_address(p); + memcpy(page_address(p) + data_size, firmware->data, firmware->size); + + data->address = __psp_pa(page_address(p) + data_size); + data->len = firmware->size; + + ret = sev_do_cmd(SEV_CMD_DOWNLOAD_FIRMWARE, data, &error); + if (ret) + dev_dbg(dev, "Failed to update SEV firmware: %#x\n", error); + else + dev_info(dev, "SEV firmware update successful\n"); + + __free_pages(p, order); + +fw_err: + release_firmware(firmware); + + return ret; +} + +static int sev_ioctl_do_pek_import(struct sev_issue_cmd *argp) +{ + struct sev_device *sev = psp_master->sev_data; + struct sev_user_data_pek_cert_import input; + struct sev_data_pek_cert_import *data; + void *pek_blob, *oca_blob; + int ret; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + if (copy_from_user(&input, (void __user *)argp->data, sizeof(input))) + return -EFAULT; + + data = kzalloc(sizeof(*data), GFP_KERNEL); + if (!data) + return -ENOMEM; + + /* copy PEK certificate blobs from userspace */ + pek_blob = psp_copy_user_blob(input.pek_cert_address, input.pek_cert_len); + if (IS_ERR(pek_blob)) { + ret = PTR_ERR(pek_blob); + goto e_free; + } + + data->pek_cert_address = __psp_pa(pek_blob); + data->pek_cert_len = input.pek_cert_len; + + /* copy OCA certificate blobs from userspace */ + oca_blob = psp_copy_user_blob(input.oca_cert_address, input.oca_cert_len); + if (IS_ERR(oca_blob)) { + ret = PTR_ERR(oca_blob); + goto e_free_pek; + } + + data->oca_cert_address = __psp_pa(oca_blob); + data->oca_cert_len = input.oca_cert_len; + + /* If platform is not in INIT state then transition it to INIT */ + if (sev->state != SEV_STATE_INIT) { + ret = __sev_platform_init_locked(&argp->error); + if (ret) + goto e_free_oca; + } + + ret = __sev_do_cmd_locked(SEV_CMD_PEK_CERT_IMPORT, data, &argp->error); + +e_free_oca: + kfree(oca_blob); +e_free_pek: + kfree(pek_blob); +e_free: + kfree(data); + return ret; +} + +static int sev_ioctl_do_get_id2(struct sev_issue_cmd *argp) +{ + struct
sev_user_data_get_id2 input; + struct sev_data_get_id *data; + void *id_blob = NULL; + int ret; + + /* SEV GET_ID is available from SEV API v0.16 and up */ + if (!sev_version_greater_or_equal(0, 16)) + return -ENOTSUPP; + + if (copy_from_user(&input, (void __user *)argp->data, sizeof(input))) + return -EFAULT; + + /* Check if we have write access to the userspace buffer */ + if (input.address && + input.length && + !access_ok(input.address, input.length)) + return -EFAULT; + + data = kzalloc(sizeof(*data), GFP_KERNEL); + if (!data) + return -ENOMEM; + + if (input.address && input.length) { + id_blob = kmalloc(input.length, GFP_KERNEL); + if (!id_blob) { + kfree(data); + return -ENOMEM; + } + + data->address = __psp_pa(id_blob); + data->len = input.length; + } + + ret = __sev_do_cmd_locked(SEV_CMD_GET_ID, data, &argp->error); + + /* + * Firmware will return the length of the ID value (either the minimum + * required length or the actual length written), return it to the user. + */ + input.length = data->len; + + if (copy_to_user((void __user *)argp->data, &input, sizeof(input))) { + ret = -EFAULT; + goto e_free; + } + + if (id_blob) { + if (copy_to_user((void __user *)input.address, + id_blob, data->len)) { + ret = -EFAULT; + goto e_free; + } + } + +e_free: + kfree(id_blob); + kfree(data); + + return ret; +} + +static int sev_ioctl_do_get_id(struct sev_issue_cmd *argp) +{ + struct sev_data_get_id *data; + u64 data_size, user_size; + void *id_blob, *mem; + int ret; + + /* SEV GET_ID available from SEV API v0.16 and up */ + if (!sev_version_greater_or_equal(0, 16)) + return -ENOTSUPP; + + /* SEV FW expects the buffer it fills with the ID to be + * 8-byte aligned. Memory allocated should be enough to + * hold data structure + alignment padding + memory + * where SEV FW writes the ID. 
+ */ + data_size = ALIGN(sizeof(struct sev_data_get_id), 8); + user_size = sizeof(struct sev_user_data_get_id); + + mem = kzalloc(data_size + user_size, GFP_KERNEL); + if (!mem) + return -ENOMEM; + + data = mem; + id_blob = mem + data_size; + + data->address = __psp_pa(id_blob); + data->len = user_size; + + ret = __sev_do_cmd_locked(SEV_CMD_GET_ID, data, &argp->error); + if (!ret) { + if (copy_to_user((void __user *)argp->data, id_blob, data->len)) + ret = -EFAULT; + } + + kfree(mem); + + return ret; +} + +static int sev_ioctl_do_pdh_export(struct sev_issue_cmd *argp) +{ + struct sev_device *sev = psp_master->sev_data; + struct sev_user_data_pdh_cert_export input; + void *pdh_blob = NULL, *cert_blob = NULL; + struct sev_data_pdh_cert_export *data; + int ret; + + /* If platform is not in INIT state then transition it to INIT. */ + if (sev->state != SEV_STATE_INIT) { + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + ret = __sev_platform_init_locked(&argp->error); + if (ret) + return ret; + } + + if (copy_from_user(&input, (void __user *)argp->data, sizeof(input))) + return -EFAULT; + + data = kzalloc(sizeof(*data), GFP_KERNEL); + if (!data) + return -ENOMEM; + + /* Userspace wants to query the certificate length. */ + if (!input.pdh_cert_address || + !input.pdh_cert_len || + !input.cert_chain_address) + goto cmd; + + /* Allocate a physically contiguous buffer to store the PDH blob. */ + if ((input.pdh_cert_len > SEV_FW_BLOB_MAX_SIZE) || + !access_ok(input.pdh_cert_address, input.pdh_cert_len)) { + ret = -EFAULT; + goto e_free; + } + + /* Allocate a physically contiguous buffer to store the cert chain blob. 
*/ + if ((input.cert_chain_len > SEV_FW_BLOB_MAX_SIZE) || + !access_ok(input.cert_chain_address, input.cert_chain_len)) { + ret = -EFAULT; + goto e_free; + } + + pdh_blob = kmalloc(input.pdh_cert_len, GFP_KERNEL); + if (!pdh_blob) { + ret = -ENOMEM; + goto e_free; + } + + data->pdh_cert_address = __psp_pa(pdh_blob); + data->pdh_cert_len = input.pdh_cert_len; + + cert_blob = kmalloc(input.cert_chain_len, GFP_KERNEL); + if (!cert_blob) { + ret = -ENOMEM; + goto e_free_pdh; + } + + data->cert_chain_address = __psp_pa(cert_blob); + data->cert_chain_len = input.cert_chain_len; + +cmd: + ret = __sev_do_cmd_locked(SEV_CMD_PDH_CERT_EXPORT, data, &argp->error); + + /* If we query the length, FW responded with expected data. */ + input.cert_chain_len = data->cert_chain_len; + input.pdh_cert_len = data->pdh_cert_len; + + if (copy_to_user((void __user *)argp->data, &input, sizeof(input))) { + ret = -EFAULT; + goto e_free_cert; + } + + if (pdh_blob) { + if (copy_to_user((void __user *)input.pdh_cert_address, + pdh_blob, input.pdh_cert_len)) { + ret = -EFAULT; + goto e_free_cert; + } + } + + if (cert_blob) { + if (copy_to_user((void __user *)input.cert_chain_address, + cert_blob, input.cert_chain_len)) + ret = -EFAULT; + } + +e_free_cert: + kfree(cert_blob); +e_free_pdh: + kfree(pdh_blob); +e_free: + kfree(data); + return ret; +} + +static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg) +{ + void __user *argp = (void __user *)arg; + struct sev_issue_cmd input; + int ret = -EFAULT; + + if (!psp_master || !psp_master->sev_data) + return -ENODEV; + + if (ioctl != SEV_ISSUE_CMD) + return -EINVAL; + + if (copy_from_user(&input, argp, sizeof(struct sev_issue_cmd))) + return -EFAULT; + + if (input.cmd > SEV_MAX) + return -EINVAL; + + mutex_lock(&sev_cmd_mutex); + + switch (input.cmd) { + + case SEV_FACTORY_RESET: + ret = sev_ioctl_do_reset(&input); + break; + case SEV_PLATFORM_STATUS: + ret = sev_ioctl_do_platform_status(&input); + break; + case SEV_PEK_GEN: + 
ret = sev_ioctl_do_pek_pdh_gen(SEV_CMD_PEK_GEN, &input); + break; + case SEV_PDH_GEN: + ret = sev_ioctl_do_pek_pdh_gen(SEV_CMD_PDH_GEN, &input); + break; + case SEV_PEK_CSR: + ret = sev_ioctl_do_pek_csr(&input); + break; + case SEV_PEK_CERT_IMPORT: + ret = sev_ioctl_do_pek_import(&input); + break; + case SEV_PDH_CERT_EXPORT: + ret = sev_ioctl_do_pdh_export(&input); + break; + case SEV_GET_ID: + pr_warn_once("SEV_GET_ID command is deprecated, use SEV_GET_ID2\n"); + ret = sev_ioctl_do_get_id(&input); + break; + case SEV_GET_ID2: + ret = sev_ioctl_do_get_id2(&input); + break; + default: + ret = -EINVAL; + goto out; + } + + if (copy_to_user(argp, &input, sizeof(struct sev_issue_cmd))) + ret = -EFAULT; +out: + mutex_unlock(&sev_cmd_mutex); + + return ret; +} + +static const struct file_operations sev_fops = { + .owner = THIS_MODULE, + .unlocked_ioctl = sev_ioctl, +}; + +int sev_platform_status(struct sev_user_data_status *data, int *error) +{ + return sev_do_cmd(SEV_CMD_PLATFORM_STATUS, data, error); +} +EXPORT_SYMBOL_GPL(sev_platform_status); + +int sev_guest_deactivate(struct sev_data_deactivate *data, int *error) +{ + return sev_do_cmd(SEV_CMD_DEACTIVATE, data, error); +} +EXPORT_SYMBOL_GPL(sev_guest_deactivate); + +int sev_guest_activate(struct sev_data_activate *data, int *error) +{ + return sev_do_cmd(SEV_CMD_ACTIVATE, data, error); +} +EXPORT_SYMBOL_GPL(sev_guest_activate); + +int sev_guest_decommission(struct sev_data_decommission *data, int *error) +{ + return sev_do_cmd(SEV_CMD_DECOMMISSION, data, error); +} +EXPORT_SYMBOL_GPL(sev_guest_decommission); + +int sev_guest_df_flush(int *error) +{ + return sev_do_cmd(SEV_CMD_DF_FLUSH, NULL, error); +} +EXPORT_SYMBOL_GPL(sev_guest_df_flush); + +static void sev_exit(struct kref *ref) +{ + struct sev_misc_dev *misc_dev = container_of(ref, struct sev_misc_dev, refcount); + + misc_deregister(&misc_dev->misc); +} + +static int sev_misc_init(struct sev_device *sev) +{ + struct device *dev = sev->dev; + int ret; + + /* + * 
SEV feature support can be detected on multiple devices but the SEV + * FW commands must be issued on the master. During probe, we do not + * know the master hence we create /dev/sev on the first device probe. + * sev_do_cmd() finds the right master device to which to issue the + * command to the firmware. + */ + if (!misc_dev) { + struct miscdevice *misc; + + misc_dev = devm_kzalloc(dev, sizeof(*misc_dev), GFP_KERNEL); + if (!misc_dev) + return -ENOMEM; + + misc = &misc_dev->misc; + misc->minor = MISC_DYNAMIC_MINOR; + misc->name = DEVICE_NAME; + misc->fops = &sev_fops; + + ret = misc_register(misc); + if (ret) + return ret; + + kref_init(&misc_dev->refcount); + } else { + kref_get(&misc_dev->refcount); + } + + init_waitqueue_head(&sev->int_queue); + sev->misc = misc_dev; + dev_dbg(dev, "registered SEV device\n"); + + return 0; +} + +int sev_dev_init(struct psp_device *psp) +{ + struct device *dev = psp->dev; + struct sev_device *sev; + int ret = -ENOMEM; + + sev = devm_kzalloc(dev, sizeof(*sev), GFP_KERNEL); + if (!sev) + goto e_err; + + psp->sev_data = sev; + + sev->dev = dev; + sev->psp = psp; + + sev->io_regs = psp->io_regs; + + sev->vdata = (struct sev_vdata *)psp->vdata->sev; + if (!sev->vdata) { + ret = -ENODEV; + dev_err(dev, "sev: missing driver data\n"); + goto e_err; + } + + psp_set_sev_irq_handler(psp, sev_irq_handler, sev); + + ret = sev_misc_init(sev); + if (ret) + goto e_irq; + + dev_notice(dev, "sev enabled\n"); + + return 0; + +e_irq: + psp_clear_sev_irq_handler(psp); +e_err: + psp->sev_data = NULL; + + dev_notice(dev, "sev initialization failed\n"); + + return ret; +} + +void sev_dev_destroy(struct psp_device *psp) +{ + struct sev_device *sev = psp->sev_data; + + if (!sev) + return; + + if (sev->misc) + kref_put(&misc_dev->refcount, sev_exit); + + psp_clear_sev_irq_handler(psp); +} + +int sev_issue_cmd_external_user(struct file *filep, unsigned int cmd, + void *data, int *error) +{ + if (!filep || filep->f_op != &sev_fops) + return -EBADF; + + 
return sev_do_cmd(cmd, data, error); +} +EXPORT_SYMBOL_GPL(sev_issue_cmd_external_user); + +void sev_pci_init(void) +{ + struct sev_device *sev = psp_master->sev_data; + int error, rc; + + if (!sev) + return; + + psp_timeout = psp_probe_timeout; + + if (sev_get_api_version()) + goto err; + + /* + * If platform is not in UNINIT state then firmware upgrade and/or + * platform INIT command will fail. These commands require UNINIT state. + * + * In a normal boot we should never run into a case where the firmware + * is not in UNINIT state on boot. But in case of kexec boot, a reboot + * may not go through a typical shutdown sequence and may leave the + * firmware in INIT or WORKING state. + */ + + if (sev->state != SEV_STATE_UNINIT) { + sev_platform_shutdown(NULL); + sev->state = SEV_STATE_UNINIT; + } + + if (sev_version_greater_or_equal(0, 15) && + sev_update_firmware(sev->dev) == 0) + sev_get_api_version(); + + /* Initialize the platform */ + rc = sev_platform_init(&error); + if (rc && (error == SEV_RET_SECURE_DATA_INVALID)) { + /* + * INIT command returned an integrity check failure + * status code, meaning that firmware load and + * validation of SEV related persistent data has + * failed and persistent state has been erased. + * Retrying INIT command here should succeed.
+ */ + dev_dbg(sev->dev, "SEV: retrying INIT command"); + rc = sev_platform_init(&error); + } + + if (rc) { + dev_err(sev->dev, "SEV: failed to INIT error %#x\n", error); + return; + } + + dev_info(sev->dev, "SEV API:%d.%d build:%d\n", sev->api_major, + sev->api_minor, sev->build); + + return; + +err: + psp_master->sev_data = NULL; +} + +void sev_pci_exit(void) +{ + if (!psp_master->sev_data) + return; + + sev_platform_shutdown(NULL); +} diff --git a/drivers/crypto/ccp/sev-dev.h b/drivers/crypto/ccp/sev-dev.h new file mode 100644 index 0000000000000000000000000000000000000000..dd5c4fe82914c7341673c95b2b2800c7bf71c008 --- /dev/null +++ b/drivers/crypto/ccp/sev-dev.h @@ -0,0 +1,63 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * AMD Platform Security Processor (PSP) interface driver + * + * Copyright (C) 2017-2019 Advanced Micro Devices, Inc. + * + * Author: Brijesh Singh + */ + +#ifndef __SEV_DEV_H__ +#define __SEV_DEV_H__ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define SEV_CMD_COMPLETE BIT(1) +#define SEV_CMDRESP_CMD_SHIFT 16 +#define SEV_CMDRESP_IOC BIT(0) + +struct sev_misc_dev { + struct kref refcount; + struct miscdevice misc; +}; + +struct sev_device { + struct device *dev; + struct psp_device *psp; + + void __iomem *io_regs; + + struct sev_vdata *vdata; + + int state; + unsigned int int_rcvd; + wait_queue_head_t int_queue; + struct sev_misc_dev *misc; + struct sev_user_data_status status_cmd_buf; + struct sev_data_init init_cmd_buf; + + u8 api_major; + u8 api_minor; + u8 build; +}; + +int sev_dev_init(struct psp_device *psp); +void sev_dev_destroy(struct psp_device *psp); + +void sev_pci_init(void); +void sev_pci_exit(void); + +#endif /* __SEV_DEV_H */ diff --git a/drivers/crypto/ccp/sp-dev.h b/drivers/crypto/ccp/sp-dev.h index 53c12562d31e9bef3a0700de9c44035a104b4ddf..e5f4fa58df2c73b0b241f7cef99e31f48449dd82 100644 --- a/drivers/crypto/ccp/sp-dev.h 
+++ b/drivers/crypto/ccp/sp-dev.h @@ -2,7 +2,7 @@ /* * AMD Secure Processor driver * - * Copyright (C) 2017-2018 Advanced Micro Devices, Inc. + * Copyright (C) 2017-2019 Advanced Micro Devices, Inc. * * Author: Tom Lendacky * Author: Gary R Hook @@ -39,10 +39,33 @@ struct ccp_vdata { const unsigned int rsamax; }; -struct psp_vdata { +struct sev_vdata { + const unsigned int cmdresp_reg; + const unsigned int cmdbuff_addr_lo_reg; + const unsigned int cmdbuff_addr_hi_reg; +}; + +struct tee_vdata { const unsigned int cmdresp_reg; const unsigned int cmdbuff_addr_lo_reg; const unsigned int cmdbuff_addr_hi_reg; + const unsigned int ring_wptr_reg; + const unsigned int ring_rptr_reg; +}; + +struct platform_access_vdata { + const unsigned int cmdresp_reg; + const unsigned int cmdbuff_addr_lo_reg; + const unsigned int cmdbuff_addr_hi_reg; + const unsigned int doorbell_button_reg; + const unsigned int doorbell_cmd_reg; + +}; + +struct psp_vdata { + const struct sev_vdata *sev; + const struct tee_vdata *tee; + const struct platform_access_vdata *platform_access; const unsigned int feature_reg; const unsigned int inten_reg; const unsigned int intsts_reg; diff --git a/drivers/crypto/ccp/sp-pci.c b/drivers/crypto/ccp/sp-pci.c index 39a5f10e61a2e6ab3c20ab037363de18ba2c198d..6e70af6d91ed13c1e2bfa3b339480f92f3f8c026 100644 --- a/drivers/crypto/ccp/sp-pci.c +++ b/drivers/crypto/ccp/sp-pci.c @@ -2,7 +2,7 @@ /* * AMD Secure Processor device driver * - * Copyright (C) 2013,2018 Advanced Micro Devices, Inc. + * Copyright (C) 2013,2019 Advanced Micro Devices, Inc. 
* * Author: Tom Lendacky * Author: Gary R Hook @@ -264,23 +264,85 @@ static int sp_pci_resume(struct pci_dev *pdev) #endif #ifdef CONFIG_CRYPTO_DEV_SP_PSP -static const struct psp_vdata pspv1 = { +static const struct sev_vdata sevv1 = { .cmdresp_reg = 0x10580, .cmdbuff_addr_lo_reg = 0x105e0, .cmdbuff_addr_hi_reg = 0x105e4, +}; + +static const struct sev_vdata sevv2 = { + .cmdresp_reg = 0x10980, + .cmdbuff_addr_lo_reg = 0x109e0, + .cmdbuff_addr_hi_reg = 0x109e4, +}; + +static const struct tee_vdata teev1 = { + .cmdresp_reg = 0x10544, + .cmdbuff_addr_lo_reg = 0x10548, + .cmdbuff_addr_hi_reg = 0x1054c, + .ring_wptr_reg = 0x10550, + .ring_rptr_reg = 0x10554, + +}; + +static const struct tee_vdata teev2 = { + .cmdresp_reg = 0x10944, /* C2PMSG_17 */ + .cmdbuff_addr_lo_reg = 0x10948, /* C2PMSG_18 */ + .cmdbuff_addr_hi_reg = 0x1094c, /* C2PMSG_19 */ + .ring_wptr_reg = 0x10950, /* C2PMSG_20 */ + .ring_rptr_reg = 0x10954, /* C2PMSG_21 */ +}; + +static const struct platform_access_vdata pa_v2 = { + .doorbell_button_reg = 0x10a24, /* C2PMSG_73 */ + .doorbell_cmd_reg = 0x10a40, /* C2PMSG_80 */ +}; + +static const struct psp_vdata pspv1 = { + .sev = &sevv1, .feature_reg = 0x105fc, .inten_reg = 0x10610, .intsts_reg = 0x10614, }; static const struct psp_vdata pspv2 = { - .cmdresp_reg = 0x10980, - .cmdbuff_addr_lo_reg = 0x109e0, - .cmdbuff_addr_hi_reg = 0x109e4, + .sev = &sevv2, + .feature_reg = 0x109fc, + .inten_reg = 0x10690, + .intsts_reg = 0x10694, +}; + +static const struct psp_vdata pspv3 = { + .tee = &teev1, + .feature_reg = 0x109fc, + .inten_reg = 0x10690, + .intsts_reg = 0x10694, +}; + +static const struct psp_vdata pspv4 = { + .sev = &sevv2, + .tee = &teev1, .feature_reg = 0x109fc, .inten_reg = 0x10690, .intsts_reg = 0x10694, }; + +static const struct psp_vdata pspv5 = { + .tee = &teev2, + .platform_access = &pa_v2, + .feature_reg = 0x109fc, /* C2PMSG_63 */ + .inten_reg = 0x10510, /* P2CMSG_INTEN */ + .intsts_reg = 0x10514, /* P2CMSG_INTSTS */ +}; + +static const struct 
psp_vdata pspv6 = { + .sev = &sevv2, + .tee = &teev2, + .feature_reg = 0x109fc, /* C2PMSG_63 */ + .inten_reg = 0x10510, /* P2CMSG_INTEN */ + .intsts_reg = 0x10514, /* P2CMSG_INTSTS */ +}; + #endif static const struct sp_dev_vdata dev_vdata[] = { @@ -316,8 +378,35 @@ static const struct sp_dev_vdata dev_vdata[] = { }, { /* 4 */ .bar = 2, +#ifdef CONFIG_CRYPTO_DEV_SP_CCP + .ccp_vdata = &ccpv5a, +#endif #ifdef CONFIG_CRYPTO_DEV_SP_PSP - .psp_vdata = &pspv2, + .psp_vdata = &pspv3, +#endif + }, + { /* 5 */ + .bar = 2, +#ifdef CONFIG_CRYPTO_DEV_SP_PSP + .psp_vdata = &pspv4, +#endif + }, + { /* 6 */ + .bar = 2, +#ifdef CONFIG_CRYPTO_DEV_SP_PSP + .psp_vdata = &pspv3, +#endif + }, + { /* 7 */ + .bar = 2, +#ifdef CONFIG_CRYPTO_DEV_SP_PSP + .psp_vdata = &pspv5, +#endif + }, + { /* 8 */ + .bar = 2, +#ifdef CONFIG_CRYPTO_DEV_SP_PSP + .psp_vdata = &pspv6, #endif }, }; @@ -326,7 +415,11 @@ static const struct pci_device_id sp_pci_table[] = { { PCI_VDEVICE(AMD, 0x1456), (kernel_ulong_t)&dev_vdata[1] }, { PCI_VDEVICE(AMD, 0x1468), (kernel_ulong_t)&dev_vdata[2] }, { PCI_VDEVICE(AMD, 0x1486), (kernel_ulong_t)&dev_vdata[3] }, - { PCI_VDEVICE(AMD, 0x14CA), (kernel_ulong_t)&dev_vdata[4] }, + { PCI_VDEVICE(AMD, 0x15DF), (kernel_ulong_t)&dev_vdata[4] }, + { PCI_VDEVICE(AMD, 0x14CA), (kernel_ulong_t)&dev_vdata[5] }, + { PCI_VDEVICE(AMD, 0x15C7), (kernel_ulong_t)&dev_vdata[6] }, + { PCI_VDEVICE(AMD, 0x17E0), (kernel_ulong_t)&dev_vdata[7] }, + { PCI_VDEVICE(AMD, 0x156E), (kernel_ulong_t)&dev_vdata[8] }, /* Last entry must be zero */ { 0, } }; diff --git a/drivers/crypto/ccp/tee-dev.c b/drivers/crypto/ccp/tee-dev.c new file mode 100644 index 0000000000000000000000000000000000000000..758473a3e1be75f43cd20c473c140887b2e400f4 --- /dev/null +++ b/drivers/crypto/ccp/tee-dev.c @@ -0,0 +1,385 @@ +// SPDX-License-Identifier: MIT +/* + * AMD Trusted Execution Environment (TEE) interface + * + * Author: Rijo Thomas + * Author: Devaraj Rangasamy + * + * Copyright 2019 Advanced Micro Devices, Inc. 
+ */ + +#include +#include +#include +#include +#include +#include +#include + +#include "psp-dev.h" +#include "tee-dev.h" + +static bool psp_dead; + +static int tee_alloc_ring(struct psp_tee_device *tee, int ring_size) +{ + struct ring_buf_manager *rb_mgr = &tee->rb_mgr; + void *start_addr; + + if (!ring_size) + return -EINVAL; + + /* We need actual physical address instead of DMA address, since + * Trusted OS running on AMD Secure Processor will map this region + */ + start_addr = (void *)__get_free_pages(GFP_KERNEL, get_order(ring_size)); + if (!start_addr) + return -ENOMEM; + + memset(start_addr, 0x0, ring_size); + rb_mgr->ring_start = start_addr; + rb_mgr->ring_size = ring_size; + rb_mgr->ring_pa = __psp_pa(start_addr); + mutex_init(&rb_mgr->mutex); + + return 0; +} + +static void tee_free_ring(struct psp_tee_device *tee) +{ + struct ring_buf_manager *rb_mgr = &tee->rb_mgr; + + if (!rb_mgr->ring_start) + return; + + free_pages((unsigned long)rb_mgr->ring_start, + get_order(rb_mgr->ring_size)); + + rb_mgr->ring_start = NULL; + rb_mgr->ring_size = 0; + rb_mgr->ring_pa = 0; + mutex_destroy(&rb_mgr->mutex); +} + +static int tee_wait_cmd_poll(struct psp_tee_device *tee, unsigned int timeout, + unsigned int *reg) +{ + /* ~10ms sleep per loop => nloop = timeout * 100 */ + int nloop = timeout * 100; + + while (--nloop) { + *reg = ioread32(tee->io_regs + tee->vdata->cmdresp_reg); + if (*reg & PSP_CMDRESP_RESP) + return 0; + + usleep_range(10000, 10100); + } + + dev_err(tee->dev, "tee: command timed out, disabling PSP\n"); + psp_dead = true; + + return -ETIMEDOUT; +} + +static +struct tee_init_ring_cmd *tee_alloc_cmd_buffer(struct psp_tee_device *tee) +{ + struct tee_init_ring_cmd *cmd; + + cmd = kzalloc(sizeof(*cmd), GFP_KERNEL); + if (!cmd) + return NULL; + + cmd->hi_addr = upper_32_bits(tee->rb_mgr.ring_pa); + cmd->low_addr = lower_32_bits(tee->rb_mgr.ring_pa); + cmd->size = tee->rb_mgr.ring_size; + + dev_dbg(tee->dev, "tee: ring address: high = 0x%x low = 0x%x size 
= %u\n", + cmd->hi_addr, cmd->low_addr, cmd->size); + + return cmd; +} + +static inline void tee_free_cmd_buffer(struct tee_init_ring_cmd *cmd) +{ + kfree(cmd); +} + +static int tee_init_ring(struct psp_tee_device *tee) +{ + int ring_size = MAX_RING_BUFFER_ENTRIES * sizeof(struct tee_ring_cmd); + struct tee_init_ring_cmd *cmd; + phys_addr_t cmd_buffer; + unsigned int reg; + int ret; + + BUILD_BUG_ON(sizeof(struct tee_ring_cmd) != 1024); + + ret = tee_alloc_ring(tee, ring_size); + if (ret) { + dev_err(tee->dev, "tee: ring allocation failed %d\n", ret); + return ret; + } + + tee->rb_mgr.wptr = 0; + + cmd = tee_alloc_cmd_buffer(tee); + if (!cmd) { + tee_free_ring(tee); + return -ENOMEM; + } + + cmd_buffer = __psp_pa((void *)cmd); + + /* Send command buffer details to Trusted OS by writing to + * CPU-PSP message registers + */ + + iowrite32(lower_32_bits(cmd_buffer), + tee->io_regs + tee->vdata->cmdbuff_addr_lo_reg); + iowrite32(upper_32_bits(cmd_buffer), + tee->io_regs + tee->vdata->cmdbuff_addr_hi_reg); + iowrite32(TEE_RING_INIT_CMD, + tee->io_regs + tee->vdata->cmdresp_reg); + + ret = tee_wait_cmd_poll(tee, TEE_DEFAULT_TIMEOUT, &reg); + if (ret) { + dev_err(tee->dev, "tee: ring init command timed out\n"); + tee_free_ring(tee); + goto free_buf; + } + + if (reg & PSP_CMDRESP_ERR_MASK) { + dev_err(tee->dev, "tee: ring init command failed (%#010x)\n", + reg & PSP_CMDRESP_ERR_MASK); + tee_free_ring(tee); + ret = -EIO; + } + +free_buf: + tee_free_cmd_buffer(cmd); + + return ret; +} + +static void tee_destroy_ring(struct psp_tee_device *tee) +{ + unsigned int reg; + int ret; + + if (!tee->rb_mgr.ring_start) + return; + + if (psp_dead) + goto free_ring; + + iowrite32(TEE_RING_DESTROY_CMD, + tee->io_regs + tee->vdata->cmdresp_reg); + + ret = tee_wait_cmd_poll(tee, TEE_DEFAULT_TIMEOUT, &reg); + if (ret) { + dev_err(tee->dev, "tee: ring destroy command timed out\n"); + } else if (reg & PSP_CMDRESP_ERR_MASK) { + dev_err(tee->dev, "tee: ring destroy command failed (%#010x)\n", + reg &
PSP_CMDRESP_ERR_MASK); + } + +free_ring: + tee_free_ring(tee); +} + +int tee_dev_init(struct psp_device *psp) +{ + struct device *dev = psp->dev; + struct psp_tee_device *tee; + int ret; + + ret = -ENOMEM; + tee = devm_kzalloc(dev, sizeof(*tee), GFP_KERNEL); + if (!tee) + goto e_err; + + psp->tee_data = tee; + + tee->dev = dev; + tee->psp = psp; + + tee->io_regs = psp->io_regs; + + tee->vdata = (struct tee_vdata *)psp->vdata->tee; + if (!tee->vdata) { + ret = -ENODEV; + dev_err(dev, "tee: missing driver data\n"); + goto e_err; + } + + ret = tee_init_ring(tee); + if (ret) { + dev_err(dev, "tee: failed to init ring buffer\n"); + goto e_err; + } + + dev_notice(dev, "tee enabled\n"); + + return 0; + +e_err: + psp->tee_data = NULL; + + dev_notice(dev, "tee initialization failed\n"); + + return ret; +} + +void tee_dev_destroy(struct psp_device *psp) +{ + struct psp_tee_device *tee = psp->tee_data; + + if (!tee) + return; + + tee_destroy_ring(tee); +} + +static int tee_submit_cmd(struct psp_tee_device *tee, enum tee_cmd_id cmd_id, + void *buf, size_t len, struct tee_ring_cmd **resp) +{ + struct tee_ring_cmd *cmd; + int nloop = 1000, ret = 0; + u32 rptr; + + *resp = NULL; + + mutex_lock(&tee->rb_mgr.mutex); + + /* Loop until empty entry found in ring buffer */ + do { + /* Get pointer to ring buffer command entry */ + cmd = (struct tee_ring_cmd *) + (tee->rb_mgr.ring_start + tee->rb_mgr.wptr); + + rptr = ioread32(tee->io_regs + tee->vdata->ring_rptr_reg); + + /* Check if ring buffer is full or command entry is waiting + * for response from TEE + */ + if (!(tee->rb_mgr.wptr + sizeof(struct tee_ring_cmd) == rptr || + cmd->flag == CMD_WAITING_FOR_RESPONSE)) + break; + + dev_dbg(tee->dev, "tee: ring buffer full. 
rptr = %u wptr = %u\n", + rptr, tee->rb_mgr.wptr); + + /* Wait if ring buffer is full or TEE is processing data */ + mutex_unlock(&tee->rb_mgr.mutex); + schedule_timeout_interruptible(msecs_to_jiffies(10)); + mutex_lock(&tee->rb_mgr.mutex); + + } while (--nloop); + + if (!nloop && + (tee->rb_mgr.wptr + sizeof(struct tee_ring_cmd) == rptr || + cmd->flag == CMD_WAITING_FOR_RESPONSE)) { + dev_err(tee->dev, "tee: ring buffer full. rptr = %u wptr = %u response flag %u\n", + rptr, tee->rb_mgr.wptr, cmd->flag); + ret = -EBUSY; + goto unlock; + } + + /* Do not submit command if PSP got disabled while processing any + * command in another thread + */ + if (psp_dead) { + ret = -EBUSY; + goto unlock; + } + + /* Write command data into ring buffer */ + cmd->cmd_id = cmd_id; + cmd->cmd_state = TEE_CMD_STATE_INIT; + memset(&cmd->buf[0], 0, sizeof(cmd->buf)); + memcpy(&cmd->buf[0], buf, len); + + /* Indicate driver is waiting for response */ + cmd->flag = CMD_WAITING_FOR_RESPONSE; + + /* Update local copy of write pointer */ + tee->rb_mgr.wptr += sizeof(struct tee_ring_cmd); + if (tee->rb_mgr.wptr >= tee->rb_mgr.ring_size) + tee->rb_mgr.wptr = 0; + + /* Trigger interrupt to Trusted OS */ + iowrite32(tee->rb_mgr.wptr, tee->io_regs + tee->vdata->ring_wptr_reg); + + /* The response is provided by Trusted OS in same + * location as submitted data entry within ring buffer. 
+ */ + *resp = cmd; + +unlock: + mutex_unlock(&tee->rb_mgr.mutex); + + return ret; +} + +static int tee_wait_cmd_completion(struct psp_tee_device *tee, + struct tee_ring_cmd *resp, + unsigned int timeout) +{ + /* ~5ms sleep per loop => nloop = timeout * 200 */ + int nloop = timeout * 200; + + while (--nloop) { + if (resp->cmd_state == TEE_CMD_STATE_COMPLETED) + return 0; + + usleep_range(5000, 5100); + } + + dev_err(tee->dev, "tee: command 0x%x timed out, disabling PSP\n", + resp->cmd_id); + + psp_dead = true; + + return -ETIMEDOUT; +} + +int psp_tee_process_cmd(enum tee_cmd_id cmd_id, void *buf, size_t len, + u32 *status) +{ + struct psp_device *psp = psp_get_master_device(); + struct psp_tee_device *tee; + struct tee_ring_cmd *resp; + int ret; + + if (!buf || !status || !len || len > sizeof(resp->buf)) + return -EINVAL; + + *status = 0; + + if (!psp || !psp->tee_data) + return -ENODEV; + + if (psp_dead) + return -EBUSY; + + tee = psp->tee_data; + + ret = tee_submit_cmd(tee, cmd_id, buf, len, &resp); + if (ret) + return ret; + + ret = tee_wait_cmd_completion(tee, resp, TEE_DEFAULT_TIMEOUT); + if (ret) { + resp->flag = CMD_RESPONSE_TIMEDOUT; + return ret; + } + + memcpy(buf, &resp->buf[0], len); + *status = resp->status; + + resp->flag = CMD_RESPONSE_COPIED; + + return 0; +} +EXPORT_SYMBOL(psp_tee_process_cmd); diff --git a/drivers/crypto/ccp/tee-dev.h b/drivers/crypto/ccp/tee-dev.h new file mode 100644 index 0000000000000000000000000000000000000000..49d26158b71e31635557d971ad19a148b9859043 --- /dev/null +++ b/drivers/crypto/ccp/tee-dev.h @@ -0,0 +1,126 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright (C) 2019,2021 Advanced Micro Devices, Inc. 
+ * + * Author: Rijo Thomas + * Author: Devaraj Rangasamy + * + */ + +/* This file describes the TEE communication interface between host and AMD + * Secure Processor + */ + +#ifndef __TEE_DEV_H__ +#define __TEE_DEV_H__ + +#include +#include + +#define TEE_DEFAULT_TIMEOUT 10 +#define MAX_BUFFER_SIZE 988 + +/** + * enum tee_ring_cmd_id - TEE interface commands for ring buffer configuration + * @TEE_RING_INIT_CMD: Initialize ring buffer + * @TEE_RING_DESTROY_CMD: Destroy ring buffer + * @TEE_RING_MAX_CMD: Maximum command id + */ +enum tee_ring_cmd_id { + TEE_RING_INIT_CMD = 0x00010000, + TEE_RING_DESTROY_CMD = 0x00020000, + TEE_RING_MAX_CMD = 0x000F0000, +}; + +/** + * struct tee_init_ring_cmd - Command to init TEE ring buffer + * @low_addr: bits [31:0] of the physical address of ring buffer + * @hi_addr: bits [63:32] of the physical address of ring buffer + * @size: size of ring buffer in bytes + */ +struct tee_init_ring_cmd { + u32 low_addr; + u32 hi_addr; + u32 size; +}; + +#define MAX_RING_BUFFER_ENTRIES 32 + +/** + * struct ring_buf_manager - Helper structure to manage ring buffer. 
+ * @ring_start: starting address of ring buffer + * @ring_size: size of ring buffer in bytes + * @ring_pa: physical address of ring buffer + * @wptr: index to the last written entry in ring buffer + */ +struct ring_buf_manager { + struct mutex mutex; /* synchronizes access to ring buffer */ + void *ring_start; + u32 ring_size; + phys_addr_t ring_pa; + u32 wptr; +}; + +struct psp_tee_device { + struct device *dev; + struct psp_device *psp; + void __iomem *io_regs; + struct tee_vdata *vdata; + struct ring_buf_manager rb_mgr; +}; + +/** + * enum tee_cmd_state - TEE command states for the ring buffer interface + * @TEE_CMD_STATE_INIT: initial state of command when sent from host + * @TEE_CMD_STATE_PROCESS: command being processed by TEE environment + * @TEE_CMD_STATE_COMPLETED: command processing completed + */ +enum tee_cmd_state { + TEE_CMD_STATE_INIT, + TEE_CMD_STATE_PROCESS, + TEE_CMD_STATE_COMPLETED, +}; + +/** + * enum cmd_resp_state - TEE command's response status maintained by driver + * @CMD_RESPONSE_INVALID: initial state when no command is written to ring + * @CMD_WAITING_FOR_RESPONSE: driver waiting for response from TEE + * @CMD_RESPONSE_TIMEDOUT: failed to get response from TEE + * @CMD_RESPONSE_COPIED: driver has copied response from TEE + */ +enum cmd_resp_state { + CMD_RESPONSE_INVALID, + CMD_WAITING_FOR_RESPONSE, + CMD_RESPONSE_TIMEDOUT, + CMD_RESPONSE_COPIED, +}; + +/** + * struct tee_ring_cmd - Structure of the command buffer in TEE ring + * @cmd_id: refers to &enum tee_cmd_id. 
Command id for the ring buffer + * interface + * @cmd_state: refers to &enum tee_cmd_state + * @status: status of TEE command execution + * @res0: reserved region + * @pdata: private data (currently unused) + * @res1: reserved region + * @buf: TEE command specific buffer + * @flag: refers to &enum cmd_resp_state + */ +struct tee_ring_cmd { + u32 cmd_id; + u32 cmd_state; + u32 status; + u32 res0[1]; + u64 pdata; + u32 res1[2]; + u8 buf[MAX_BUFFER_SIZE]; + u32 flag; + + /* Total size: 1024 bytes */ +} __packed; + +int tee_dev_init(struct psp_device *psp); +void tee_dev_destroy(struct psp_device *psp); + +#endif /* __TEE_DEV_H__ */ diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c index 5659ab477eba52f53382c7ea988fab1a0275722a..cac8c8f74a3f8e99d71fc0e8fd954c1610bc07a4 100644 --- a/drivers/edac/amd64_edac.c +++ b/drivers/edac/amd64_edac.c @@ -16,11 +16,9 @@ module_param(ecc_enable_override, int, 0644); static struct msr __percpu *msrs; -static struct amd64_family_type *fam_type; - -static inline u32 get_umc_reg(u32 reg) +static inline u32 get_umc_reg(struct amd64_pvt *pvt, u32 reg) { - if (!fam_type->flags.zn_regs_v2) + if (!pvt->flags.zn_regs_v2) return reg; switch (reg) { @@ -196,21 +194,6 @@ static inline int amd64_read_dct_pci_cfg(struct amd64_pvt *pvt, u8 dct, * other archs, we might not have access to the caches directly. */ -static inline void __f17h_set_scrubval(struct amd64_pvt *pvt, u32 scrubval) -{ - /* - * Fam17h supports scrub values between 0x5 and 0x14. Also, the values - * are shifted down by 0x5, so scrubval 0x5 is written to the register - * as 0x0, scrubval 0x6 as 0x1, etc. - */ - if (scrubval >= 0x5 && scrubval <= 0x14) { - scrubval -= 0x5; - pci_write_bits32(pvt->F6, F17H_SCR_LIMIT_ADDR, scrubval, 0xF); - pci_write_bits32(pvt->F6, F17H_SCR_BASE_ADDR, 1, 0x1); - } else { - pci_write_bits32(pvt->F6, F17H_SCR_BASE_ADDR, 0, 0x1); - } -} /* * Scan the scrub rate mapping table for a close or matching bandwidth value to * issue. 
If requested is too big, then use last maximum value found. @@ -243,9 +226,7 @@ static int __set_scrub_rate(struct amd64_pvt *pvt, u32 new_bw, u32 min_rate) scrubval = scrubrates[i].scrubval; - if (pvt->umc) { - __f17h_set_scrubval(pvt, scrubval); - } else if (pvt->fam == 0x15 && pvt->model == 0x60) { + if (pvt->fam == 0x15 && pvt->model == 0x60) { f15h_select_dct(pvt, 0); pci_write_bits32(pvt->F2, F15H_M60H_SCRCTRL, scrubval, 0x001F); f15h_select_dct(pvt, 1); @@ -285,16 +266,7 @@ static int get_scrub_rate(struct mem_ctl_info *mci) int i, retval = -EINVAL; u32 scrubval = 0; - if (pvt->umc) { - amd64_read_pci_cfg(pvt->F6, F17H_SCR_BASE_ADDR, &scrubval); - if (scrubval & BIT(0)) { - amd64_read_pci_cfg(pvt->F6, F17H_SCR_LIMIT_ADDR, &scrubval); - scrubval &= 0xF; - scrubval += 0x5; - } else { - scrubval = 0; - } - } else if (pvt->fam == 0x15) { + if (pvt->fam == 0x15) { /* Erratum #505 */ if (pvt->model < 0x10) f15h_select_dct(pvt, 0); @@ -475,7 +447,7 @@ static void get_cs_base_and_mask(struct amd64_pvt *pvt, int csrow, u8 dct, for (i = 0; i < pvt->csels[dct].m_cnt; i++) #define for_each_umc(i) \ - for (i = 0; i < fam_type->max_mcs; i++) + for (i = 0; i < pvt->max_mcs; i++) /* * @input_addr is an InputAddr associated with the node given by mci. Return the @@ -775,7 +747,65 @@ static unsigned long determine_edac_cap(struct amd64_pvt *pvt) return edac_cap; } -static void debug_display_dimm_sizes(struct amd64_pvt *, u8); +/* + * debug routine to display the memory sizes of all logical DIMMs and its + * CSROWs + */ +static void dct_debug_display_dimm_sizes(struct amd64_pvt *pvt, u8 ctrl) +{ + u32 *dcsb = ctrl ? pvt->csels[1].csbases : pvt->csels[0].csbases; + u32 dbam = ctrl ? pvt->dbam1 : pvt->dbam0; + int dimm, size0, size1; + + if (pvt->fam == 0xf) { + /* K8 families < revF not supported yet */ + if (pvt->ext_model < K8_REV_F) + return; + + WARN_ON(ctrl != 0); + } + + if (pvt->fam == 0x10) { + dbam = (ctrl && !dct_ganging_enabled(pvt)) ? 
pvt->dbam1 + : pvt->dbam0; + dcsb = (ctrl && !dct_ganging_enabled(pvt)) ? + pvt->csels[1].csbases : + pvt->csels[0].csbases; + } else if (ctrl) { + dbam = pvt->dbam0; + dcsb = pvt->csels[1].csbases; + } + edac_dbg(1, "F2x%d80 (DRAM Bank Address Mapping): 0x%08x\n", + ctrl, dbam); + + edac_printk(KERN_DEBUG, EDAC_MC, "DCT%d chip selects:\n", ctrl); + + /* Dump memory sizes for DIMM and its CSROWs */ + for (dimm = 0; dimm < 4; dimm++) { + size0 = 0; + if (dcsb[dimm * 2] & DCSB_CS_ENABLE) + /* + * For F15m60h, we need multiplier for LRDIMM cs_size + * calculation. We pass dimm value to the dbam_to_cs + * mapper so we can find the multiplier from the + * corresponding DCSM. + */ + size0 = pvt->ops->dbam_to_cs(pvt, ctrl, + DBAM_DIMM(dimm, dbam), + dimm); + + size1 = 0; + if (dcsb[dimm * 2 + 1] & DCSB_CS_ENABLE) + size1 = pvt->ops->dbam_to_cs(pvt, ctrl, + DBAM_DIMM(dimm, dbam), + dimm); + + amd64_info(EDAC_MC ": %d: %5dMB %d: %5dMB\n", + dimm * 2, size0, + dimm * 2 + 1, size1); + } +} + static void debug_dump_dramcfg_low(struct amd64_pvt *pvt, u32 dclr, int chan) { @@ -818,7 +848,7 @@ static void debug_dump_dramcfg_low(struct amd64_pvt *pvt, u32 dclr, int chan) #define CS_EVEN (CS_EVEN_PRIMARY | CS_EVEN_SECONDARY) #define CS_ODD (CS_ODD_PRIMARY | CS_ODD_SECONDARY) -static int f17_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt) +static int umc_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt) { u8 base, count = 0; int cs_mode = 0; @@ -850,7 +880,85 @@ static int f17_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt) return cs_mode; } -static void debug_display_dimm_sizes_df(struct amd64_pvt *pvt, u8 ctrl) +static int umc_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc, + unsigned int cs_mode, int csrow_nr) +{ + u32 addr_mask_orig, addr_mask_deinterleaved; + u32 msb, weight, num_zero_bits; + int cs_mask_nr = csrow_nr; + int dimm, size = 0; + + /* No Chip Selects are enabled. 
*/ + if (!cs_mode) + return size; + + /* Requested size of an even CS but none are enabled. */ + if (!(cs_mode & CS_EVEN) && !(csrow_nr & 1)) + return size; + + /* Requested size of an odd CS but none are enabled. */ + if (!(cs_mode & CS_ODD) && (csrow_nr & 1)) + return size; + + /* + * Family 17h introduced systems with one mask per DIMM, + * and two Chip Selects per DIMM. + * + * CS0 and CS1 -> MASK0 / DIMM0 + * CS2 and CS3 -> MASK1 / DIMM1 + * + * Family 19h Model 10h introduced systems with one mask per Chip Select, + * and two Chip Selects per DIMM. + * + * CS0 -> MASK0 -> DIMM0 + * CS1 -> MASK1 -> DIMM0 + * CS2 -> MASK2 -> DIMM1 + * CS3 -> MASK3 -> DIMM1 + * + * Keep the mask number equal to the Chip Select number for newer systems, + * and shift the mask number for older systems. + */ + dimm = csrow_nr >> 1; + + if (!pvt->flags.zn_regs_v2) + cs_mask_nr >>= 1; + + /* Asymmetric dual-rank DIMM support. */ + if ((csrow_nr & 1) && (cs_mode & CS_ODD_SECONDARY)) + addr_mask_orig = pvt->csels[umc].csmasks_sec[cs_mask_nr]; + else + addr_mask_orig = pvt->csels[umc].csmasks[cs_mask_nr]; + + /* + * The number of zero bits in the mask is equal to the number of bits + * in a full mask minus the number of bits in the current mask. + * + * The MSB is the number of bits in the full mask because BIT[0] is + * always 0. + * + * In the special 3 Rank interleaving case, a single bit is flipped + * without swapping with the most significant bit. This can be handled + * by keeping the MSB where it is and ignoring the single zero bit. + */ + msb = fls(addr_mask_orig) - 1; + weight = hweight_long(addr_mask_orig); + num_zero_bits = msb - weight - !!(cs_mode & CS_3R_INTERLEAVE); + + /* Take the number of zero bits off from the top of the mask. 
*/ + addr_mask_deinterleaved = GENMASK_ULL(msb - num_zero_bits, 1); + + edac_dbg(1, "CS%d DIMM%d AddrMasks:\n", csrow_nr, dimm); + edac_dbg(1, " Original AddrMask: 0x%x\n", addr_mask_orig); + edac_dbg(1, " Deinterleaved AddrMask: 0x%x\n", addr_mask_deinterleaved); + + /* Register [31:1] = Address [39:9]. Size is in kBs here. */ + size = (addr_mask_deinterleaved >> 2) + 1; + + /* Return size in MBs. */ + return size >> 10; +} + +static void umc_debug_display_dimm_sizes(struct amd64_pvt *pvt, u8 ctrl) { int dimm, size0, size1, cs0, cs1, cs_mode; @@ -860,10 +968,10 @@ static void debug_display_dimm_sizes_df(struct amd64_pvt *pvt, u8 ctrl) cs0 = dimm * 2; cs1 = dimm * 2 + 1; - cs_mode = f17_get_cs_mode(dimm, ctrl, pvt); + cs_mode = umc_get_cs_mode(dimm, ctrl, pvt); - size0 = pvt->ops->dbam_to_cs(pvt, ctrl, cs_mode, cs0); - size1 = pvt->ops->dbam_to_cs(pvt, ctrl, cs_mode, cs1); + size0 = umc_addr_mask_to_cs_size(pvt, ctrl, cs_mode, cs0); + size1 = umc_addr_mask_to_cs_size(pvt, ctrl, cs_mode, cs1); amd64_info(EDAC_MC ": %d: %5dMB %d: %5dMB\n", cs0, size0, @@ -924,17 +1032,14 @@ static void __dump_misc_regs_df(struct amd64_pvt *pvt) if (umc->dram_type == MEM_LRDDR4 || umc->dram_type == MEM_LRDDR5) { amd_smn_read(pvt->mc_node_id, - umc_base + get_umc_reg(UMCCH_ADDR_CFG), + umc_base + get_umc_reg(pvt, UMCCH_ADDR_CFG), &tmp); edac_dbg(1, "UMC%d LRDIMM %dx rank multiply\n", i, 1 << ((tmp >> 4) & 0x3)); } - debug_display_dimm_sizes_df(pvt, i); + umc_debug_display_dimm_sizes(pvt, i); } - - edac_dbg(1, "F0x104 (DRAM Hole Address): 0x%08x, base: 0x%08x\n", - pvt->dhar, dhar_base(pvt)); } /* Display and decode various NB registers for debug purposes. */ @@ -958,17 +1063,19 @@ static void __dump_misc_regs(struct amd64_pvt *pvt) (pvt->fam == 0xf) ? 
k8_dhar_offset(pvt) : f10_dhar_offset(pvt)); - debug_display_dimm_sizes(pvt, 0); + dct_debug_display_dimm_sizes(pvt, 0); /* everything below this point is Fam10h and above */ if (pvt->fam == 0xf) return; - debug_display_dimm_sizes(pvt, 1); + dct_debug_display_dimm_sizes(pvt, 1); /* Only if NOT ganged does dclr1 have valid info */ if (!dct_ganging_enabled(pvt)) debug_dump_dramcfg_low(pvt, pvt->dclr1, 1); + + edac_dbg(1, " DramHoleValid: %s\n", dhar_valid(pvt) ? "yes" : "no"); } /* Display and decode various NB registers for debug purposes. */ @@ -979,8 +1086,6 @@ static void dump_misc_regs(struct amd64_pvt *pvt) else __dump_misc_regs(pvt); - edac_dbg(1, " DramHoleValid: %s\n", dhar_valid(pvt) ? "yes" : "no"); - amd64_info("using x%u syndromes.\n", pvt->ecc_sym_sz); } @@ -1000,7 +1105,7 @@ static void prep_chip_selects(struct amd64_pvt *pvt) for_each_umc(umc) { pvt->csels[umc].b_cnt = 4; - pvt->csels[umc].m_cnt = fam_type->flags.zn_regs_v2 ? 4 : 2; + pvt->csels[umc].m_cnt = pvt->flags.zn_regs_v2 ? 4 : 2; } } else { @@ -1049,7 +1154,7 @@ static void read_umc_base_mask(struct amd64_pvt *pvt) } umc_mask_reg = umc_base + UMCCH_ADDR_MASK; - umc_mask_reg_sec = umc_base + get_umc_reg(UMCCH_ADDR_MASK_SEC); + umc_mask_reg_sec = umc_base + get_umc_reg(pvt, UMCCH_ADDR_MASK_SEC); for_each_chip_select_mask(cs, umc, pvt) { mask = &pvt->csels[umc].csmasks[cs]; @@ -1137,7 +1242,7 @@ static void determine_memory_type_df(struct amd64_pvt *pvt) * Check if the system supports the "DDR Type" field in UMC Config * and has DDR5 DIMMs in use. */ - if ((fam_type->flags.zn_regs_v2 || + if ((pvt->flags.zn_regs_v2 || hygon_f18h_m4h() || hygon_f18h_m10h()) && ((umc->umc_cfg & GENMASK(2, 0)) == 0x1)) { @@ -1222,24 +1327,6 @@ static void determine_memory_type(struct amd64_pvt *pvt) pvt->dram_type = (pvt->dclr0 & BIT(16)) ? MEM_DDR3 : MEM_RDDR3; } -/* Get the number of DCT channels the memory controller is using. 
*/ -static int k8_early_channel_count(struct amd64_pvt *pvt) -{ - int flag; - - if (pvt->ext_model >= K8_REV_F) - /* RevF (NPT) and later */ - flag = pvt->dclr0 & WIDTH_128; - else - /* RevE and earlier */ - flag = pvt->dclr0 & REVE_WIDTH_128; - - /* not used */ - pvt->dclr1 = 0; - - return (flag) ? 2 : 1; -} - /* On F10h and later ErrAddr is MC4_ADDR[47:1] */ static u64 get_error_address(struct amd64_pvt *pvt, struct mce *m) { @@ -1491,69 +1578,6 @@ static int k8_dbam_to_chip_select(struct amd64_pvt *pvt, u8 dct, } } -/* - * Get the number of DCT channels in use. - * - * Return: - * number of Memory Channels in operation - * Pass back: - * contents of the DCL0_LOW register - */ -static int f1x_early_channel_count(struct amd64_pvt *pvt) -{ - int i, j, channels = 0; - - /* On F10h, if we are in 128 bit mode, then we are using 2 channels */ - if (pvt->fam == 0x10 && (pvt->dclr0 & WIDTH_128)) - return 2; - - /* - * Need to check if in unganged mode: In such, there are 2 channels, - * but they are not in 128 bit mode and thus the above 'dclr0' status - * bit will be OFF. - * - * Need to check DCT0[0] and DCT1[0] to see if only one of them has - * their CSEnable bit on. If so, then SINGLE DIMM case. - */ - edac_dbg(0, "Data width is not 128 bits - need more decoding\n"); - - /* - * Check DRAM Bank Address Mapping values for each DIMM to see if there - * is more than just one DIMM present in unganged mode. Need to check - * both controllers since DIMMs can be placed in either one. - */ - for (i = 0; i < 2; i++) { - u32 dbam = (i ? 
pvt->dbam1 : pvt->dbam0); - - for (j = 0; j < 4; j++) { - if (DBAM_DIMM(j, dbam) > 0) { - channels++; - break; - } - } - } - - if (channels > 2) - channels = 2; - - amd64_info("MCT channel count: %d\n", channels); - - return channels; -} - -static int f17_early_channel_count(struct amd64_pvt *pvt) -{ - int i, channels = 0; - - /* SDP Control bit 31 (SdpInit) is clear for unused UMC channels */ - for_each_umc(i) - channels += !!(pvt->umc[i].sdp_ctrl & UMC_SDP_INIT); - - amd64_info("MCT channel count: %d\n", channels); - - return channels; -} - static int ddr3_cs_size(unsigned i, bool dct_width) { unsigned shift = 0; @@ -1681,84 +1705,6 @@ static int f16_dbam_to_chip_select(struct amd64_pvt *pvt, u8 dct, return ddr3_cs_size(cs_mode, false); } -static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc, - unsigned int cs_mode, int csrow_nr) -{ - u32 addr_mask_orig, addr_mask_deinterleaved; - u32 msb, weight, num_zero_bits; - int cs_mask_nr = csrow_nr; - int dimm, size = 0; - - /* No Chip Selects are enabled. */ - if (!cs_mode) - return size; - - /* Requested size of an even CS but none are enabled. */ - if (!(cs_mode & CS_EVEN) && !(csrow_nr & 1)) - return size; - - /* Requested size of an odd CS but none are enabled. */ - if (!(cs_mode & CS_ODD) && (csrow_nr & 1)) - return size; - - /* - * Family 17h introduced systems with one mask per DIMM, - * and two Chip Selects per DIMM. - * - * CS0 and CS1 -> MASK0 / DIMM0 - * CS2 and CS3 -> MASK1 / DIMM1 - * - * Family 19h Model 10h introduced systems with one mask per Chip Select, - * and two Chip Selects per DIMM. - * - * CS0 -> MASK0 -> DIMM0 - * CS1 -> MASK1 -> DIMM0 - * CS2 -> MASK2 -> DIMM1 - * CS3 -> MASK3 -> DIMM1 - * - * Keep the mask number equal to the Chip Select number for newer systems, - * and shift the mask number for older systems. - */ - dimm = csrow_nr >> 1; - - if (!fam_type->flags.zn_regs_v2) - cs_mask_nr >>= 1; - - /* Asymmetric dual-rank DIMM support. 
*/ - if ((csrow_nr & 1) && (cs_mode & CS_ODD_SECONDARY)) - addr_mask_orig = pvt->csels[umc].csmasks_sec[cs_mask_nr]; - else - addr_mask_orig = pvt->csels[umc].csmasks[cs_mask_nr]; - - /* - * The number of zero bits in the mask is equal to the number of bits - * in a full mask minus the number of bits in the current mask. - * - * The MSB is the number of bits in the full mask because BIT[0] is - * always 0. - * - * In the special 3 Rank interleaving case, a single bit is flipped - * without swapping with the most significant bit. This can be handled - * by keeping the MSB where it is and ignoring the single zero bit. - */ - msb = fls(addr_mask_orig) - 1; - weight = hweight_long(addr_mask_orig); - num_zero_bits = msb - weight - !!(cs_mode & CS_3R_INTERLEAVE); - - /* Take the number of zero bits off from the top of the mask. */ - addr_mask_deinterleaved = GENMASK_ULL(msb - num_zero_bits, 1); - - edac_dbg(1, "CS%d DIMM%d AddrMasks:\n", csrow_nr, dimm); - edac_dbg(1, " Original AddrMask: 0x%x\n", addr_mask_orig); - edac_dbg(1, " Deinterleaved AddrMask: 0x%x\n", addr_mask_deinterleaved); - - /* Register [31:1] = Address [39:9]. Size is in kBs here. */ - size = (addr_mask_deinterleaved >> 2) + 1; - - /* Return size in MBs. */ - return size >> 10; -} - static void read_dram_ctl_register(struct amd64_pvt *pvt) { @@ -2281,237 +2227,6 @@ static void f1x_map_sysaddr_to_csrow(struct mem_ctl_info *mci, u64 sys_addr, err->channel = get_channel_from_ecc_syndrome(mci, err->syndrome); } -/* - * debug routine to display the memory sizes of all logical DIMMs and its - * CSROWs - */ -static void debug_display_dimm_sizes(struct amd64_pvt *pvt, u8 ctrl) -{ - int dimm, size0, size1; - u32 *dcsb = ctrl ? pvt->csels[1].csbases : pvt->csels[0].csbases; - u32 dbam = ctrl ? 
pvt->dbam1 : pvt->dbam0; - - if (pvt->fam == 0xf) { - /* K8 families < revF not supported yet */ - if (pvt->ext_model < K8_REV_F) - return; - else - WARN_ON(ctrl != 0); - } - - if (pvt->fam == 0x10) { - dbam = (ctrl && !dct_ganging_enabled(pvt)) ? pvt->dbam1 - : pvt->dbam0; - dcsb = (ctrl && !dct_ganging_enabled(pvt)) ? - pvt->csels[1].csbases : - pvt->csels[0].csbases; - } else if (ctrl) { - dbam = pvt->dbam0; - dcsb = pvt->csels[1].csbases; - } - edac_dbg(1, "F2x%d80 (DRAM Bank Address Mapping): 0x%08x\n", - ctrl, dbam); - - edac_printk(KERN_DEBUG, EDAC_MC, "DCT%d chip selects:\n", ctrl); - - /* Dump memory sizes for DIMM and its CSROWs */ - for (dimm = 0; dimm < 4; dimm++) { - - size0 = 0; - if (dcsb[dimm*2] & DCSB_CS_ENABLE) - /* - * For F15m60h, we need multiplier for LRDIMM cs_size - * calculation. We pass dimm value to the dbam_to_cs - * mapper so we can find the multiplier from the - * corresponding DCSM. - */ - size0 = pvt->ops->dbam_to_cs(pvt, ctrl, - DBAM_DIMM(dimm, dbam), - dimm); - - size1 = 0; - if (dcsb[dimm*2 + 1] & DCSB_CS_ENABLE) - size1 = pvt->ops->dbam_to_cs(pvt, ctrl, - DBAM_DIMM(dimm, dbam), - dimm); - - amd64_info(EDAC_MC ": %d: %5dMB %d: %5dMB\n", - dimm * 2, size0, - dimm * 2 + 1, size1); - } -} - -static struct amd64_family_type family_types[] = { - [K8_CPUS] = { - .ctl_name = "K8", - .f1_id = PCI_DEVICE_ID_AMD_K8_NB_ADDRMAP, - .f2_id = PCI_DEVICE_ID_AMD_K8_NB_MEMCTL, - .max_mcs = 2, - .ops = { - .early_channel_count = k8_early_channel_count, - .map_sysaddr_to_csrow = k8_map_sysaddr_to_csrow, - .dbam_to_cs = k8_dbam_to_chip_select, - } - }, - [F10_CPUS] = { - .ctl_name = "F10h", - .f1_id = PCI_DEVICE_ID_AMD_10H_NB_MAP, - .f2_id = PCI_DEVICE_ID_AMD_10H_NB_DRAM, - .max_mcs = 2, - .ops = { - .early_channel_count = f1x_early_channel_count, - .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow, - .dbam_to_cs = f10_dbam_to_chip_select, - } - }, - [F15_CPUS] = { - .ctl_name = "F15h", - .f1_id = PCI_DEVICE_ID_AMD_15H_NB_F1, - .f2_id = 
PCI_DEVICE_ID_AMD_15H_NB_F2, - .max_mcs = 2, - .ops = { - .early_channel_count = f1x_early_channel_count, - .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow, - .dbam_to_cs = f15_dbam_to_chip_select, - } - }, - [F15_M30H_CPUS] = { - .ctl_name = "F15h_M30h", - .f1_id = PCI_DEVICE_ID_AMD_15H_M30H_NB_F1, - .f2_id = PCI_DEVICE_ID_AMD_15H_M30H_NB_F2, - .max_mcs = 2, - .ops = { - .early_channel_count = f1x_early_channel_count, - .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow, - .dbam_to_cs = f16_dbam_to_chip_select, - } - }, - [F15_M60H_CPUS] = { - .ctl_name = "F15h_M60h", - .f1_id = PCI_DEVICE_ID_AMD_15H_M60H_NB_F1, - .f2_id = PCI_DEVICE_ID_AMD_15H_M60H_NB_F2, - .max_mcs = 2, - .ops = { - .early_channel_count = f1x_early_channel_count, - .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow, - .dbam_to_cs = f15_m60h_dbam_to_chip_select, - } - }, - [F16_CPUS] = { - .ctl_name = "F16h", - .f1_id = PCI_DEVICE_ID_AMD_16H_NB_F1, - .f2_id = PCI_DEVICE_ID_AMD_16H_NB_F2, - .max_mcs = 2, - .ops = { - .early_channel_count = f1x_early_channel_count, - .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow, - .dbam_to_cs = f16_dbam_to_chip_select, - } - }, - [F16_M30H_CPUS] = { - .ctl_name = "F16h_M30h", - .f1_id = PCI_DEVICE_ID_AMD_16H_M30H_NB_F1, - .f2_id = PCI_DEVICE_ID_AMD_16H_M30H_NB_F2, - .max_mcs = 2, - .ops = { - .early_channel_count = f1x_early_channel_count, - .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow, - .dbam_to_cs = f16_dbam_to_chip_select, - } - }, - [F17_CPUS] = { - .ctl_name = "F17h", - .f0_id = PCI_DEVICE_ID_AMD_17H_DF_F0, - .f6_id = PCI_DEVICE_ID_AMD_17H_DF_F6, - .max_mcs = 2, - .ops = { - .early_channel_count = f17_early_channel_count, - .dbam_to_cs = f17_addr_mask_to_cs_size, - } - }, - [F17_M10H_CPUS] = { - .ctl_name = "F17h_M10h", - .f0_id = PCI_DEVICE_ID_AMD_17H_M10H_DF_F0, - .f6_id = PCI_DEVICE_ID_AMD_17H_M10H_DF_F6, - .max_mcs = 2, - .ops = { - .early_channel_count = f17_early_channel_count, - .dbam_to_cs = f17_addr_mask_to_cs_size, - } - }, - [F17_M30H_CPUS] = 
{ - .ctl_name = "F17h_M30h", - .f0_id = PCI_DEVICE_ID_AMD_17H_M30H_DF_F0, - .f6_id = PCI_DEVICE_ID_AMD_17H_M30H_DF_F6, - .max_mcs = 8, - .ops = { - .early_channel_count = f17_early_channel_count, - .dbam_to_cs = f17_addr_mask_to_cs_size, - } - }, - [F17_M60H_CPUS] = { - .ctl_name = "F17h_M60h", - .f0_id = PCI_DEVICE_ID_AMD_17H_M60H_DF_F0, - .f6_id = PCI_DEVICE_ID_AMD_17H_M60H_DF_F6, - .max_mcs = 2, - .ops = { - .early_channel_count = f17_early_channel_count, - .dbam_to_cs = f17_addr_mask_to_cs_size, - } - }, - [F17_M70H_CPUS] = { - .ctl_name = "F17h_M70h", - .f0_id = PCI_DEVICE_ID_AMD_17H_M70H_DF_F0, - .f6_id = PCI_DEVICE_ID_AMD_17H_M70H_DF_F6, - .max_mcs = 2, - .ops = { - .early_channel_count = f17_early_channel_count, - .dbam_to_cs = f17_addr_mask_to_cs_size, - } - }, - [F18_M06H_CPUS] = { - .ctl_name = "F18h_M06h", - .f0_id = PCI_DEVICE_ID_HYGON_18H_M06H_DF_F0, - .f6_id = PCI_DEVICE_ID_HYGON_18H_M06H_DF_F6, - .max_mcs = 2, - .ops = { - .early_channel_count = f17_early_channel_count, - .dbam_to_cs = f17_addr_mask_to_cs_size, - } - }, - [F18_M10H_CPUS] = { - .ctl_name = "F18h_M10h", - .f0_id = PCI_DEVICE_ID_HYGON_18H_M10H_DF_F0, - .f6_id = PCI_DEVICE_ID_HYGON_18H_M10H_DF_F6, - .max_mcs = 2, - .ops = { - .early_channel_count = f17_early_channel_count, - .dbam_to_cs = f17_addr_mask_to_cs_size, - } - }, - [F19_CPUS] = { - .ctl_name = "F19h", - .f0_id = PCI_DEVICE_ID_AMD_19H_DF_F0, - .f6_id = PCI_DEVICE_ID_AMD_19H_DF_F6, - .max_mcs = 8, - .ops = { - .early_channel_count = f17_early_channel_count, - .dbam_to_cs = f17_addr_mask_to_cs_size, - } - }, - [F19_M10H_CPUS] = { - .ctl_name = "F19h_M10h", - .f0_id = PCI_DEVICE_ID_AMD_19H_M10H_DF_F0, - .f6_id = PCI_DEVICE_ID_AMD_19H_M10H_DF_F6, - .max_mcs = 12, - .flags.zn_regs_v2 = 1, - .ops = { - .early_channel_count = f17_early_channel_count, - .dbam_to_cs = f17_addr_mask_to_cs_size, - } - }, -}; - /* * These are tables of eigenvectors (one per line) which can be used for the * construction of the syndrome tables. 
The modified syndrome search algorithm @@ -2834,41 +2549,17 @@ static void decode_umc_error(int node_id, struct mce *m) /* * Use pvt->F3 which contains the F3 CPU PCI device to get the related * F1 (AddrMap) and F2 (Dct) devices. Return negative value on error. - * Reserve F0 and F6 on systems with a UMC. */ static int reserve_mc_sibling_devs(struct amd64_pvt *pvt, u16 pci_id1, u16 pci_id2) { - if (pvt->umc) { - pvt->F0 = pci_get_related_function(pvt->F3->vendor, pci_id1, pvt->F3); - if (!pvt->F0) { - amd64_err("F0 not found, device 0x%x (broken BIOS?)\n", pci_id1); - return -ENODEV; - } - - pvt->F6 = pci_get_related_function(pvt->F3->vendor, pci_id2, pvt->F3); - if (!pvt->F6) { - pci_dev_put(pvt->F0); - pvt->F0 = NULL; - - amd64_err("F6 not found: device 0x%x (broken BIOS?)\n", pci_id2); - return -ENODEV; - } - - if (!pci_ctl_dev) - pci_ctl_dev = &pvt->F0->dev; - - edac_dbg(1, "F0: %s\n", pci_name(pvt->F0)); - edac_dbg(1, "F3: %s\n", pci_name(pvt->F3)); - edac_dbg(1, "F6: %s\n", pci_name(pvt->F6)); - + if (pvt->umc) return 0; - } /* Reserve the ADDRESS MAP Device */ pvt->F1 = pci_get_related_function(pvt->F3->vendor, pci_id1, pvt->F3); if (!pvt->F1) { - amd64_err("F1 not found: device 0x%x (broken BIOS?)\n", pci_id1); + edac_dbg(1, "F1 not found: device 0x%x\n", pci_id1); return -ENODEV; } @@ -2878,7 +2569,7 @@ reserve_mc_sibling_devs(struct amd64_pvt *pvt, u16 pci_id1, u16 pci_id2) pci_dev_put(pvt->F1); pvt->F1 = NULL; - amd64_err("F2 not found: device 0x%x (broken BIOS?)\n", pci_id2); + edac_dbg(1, "F2 not found: device 0x%x\n", pci_id2); return -ENODEV; } @@ -2895,8 +2586,7 @@ reserve_mc_sibling_devs(struct amd64_pvt *pvt, u16 pci_id1, u16 pci_id2) static void free_mc_sibling_devs(struct amd64_pvt *pvt) { if (pvt->umc) { - pci_dev_put(pvt->F0); - pci_dev_put(pvt->F6); + return; } else { pci_dev_put(pvt->F1); pci_dev_put(pvt->F2); @@ -2974,7 +2664,7 @@ static void __read_mc_regs_df(struct amd64_pvt *pvt) umc = &pvt->umc[i]; - amd_smn_read(nid, umc_base + 
get_umc_reg(UMCCH_DIMM_CFG), &umc->dimm_cfg); + amd_smn_read(nid, umc_base + get_umc_reg(pvt, UMCCH_DIMM_CFG), &umc->dimm_cfg); amd_smn_read(nid, umc_base + UMCCH_UMC_CFG, &umc->umc_cfg); amd_smn_read(nid, umc_base + UMCCH_SDP_CTRL, &umc->sdp_ctrl); amd_smn_read(nid, umc_base + UMCCH_ECC_CTRL, &umc->ecc_ctrl); @@ -3009,7 +2699,6 @@ static void read_mc_regs(struct amd64_pvt *pvt) if (pvt->umc) { __read_mc_regs_df(pvt); - amd64_read_pci_cfg(pvt->F0, DF_DHAR, &pvt->dhar); goto skip; } @@ -3099,24 +2788,36 @@ static void read_mc_regs(struct amd64_pvt *pvt) * encompasses * */ -static u32 get_csrow_nr_pages(struct amd64_pvt *pvt, u8 dct, int csrow_nr_orig) +static u32 dct_get_csrow_nr_pages(struct amd64_pvt *pvt, u8 dct, int csrow_nr) { u32 dbam = dct ? pvt->dbam1 : pvt->dbam0; - int csrow_nr = csrow_nr_orig; u32 cs_mode, nr_pages; - if (!pvt->umc) { - csrow_nr >>= 1; - cs_mode = DBAM_DIMM(csrow_nr, dbam); - } else { - cs_mode = f17_get_cs_mode(csrow_nr >> 1, dct, pvt); - } + csrow_nr >>= 1; + cs_mode = DBAM_DIMM(csrow_nr, dbam); nr_pages = pvt->ops->dbam_to_cs(pvt, dct, cs_mode, csrow_nr); nr_pages <<= 20 - PAGE_SHIFT; edac_dbg(0, "csrow: %d, channel: %d, DBAM idx: %d\n", - csrow_nr_orig, dct, cs_mode); + csrow_nr, dct, cs_mode); + edac_dbg(0, "nr_pages/channel: %u\n", nr_pages); + + return nr_pages; +} + +static u32 umc_get_csrow_nr_pages(struct amd64_pvt *pvt, u8 dct, int csrow_nr_orig) +{ + int csrow_nr = csrow_nr_orig; + u32 cs_mode, nr_pages; + + cs_mode = umc_get_cs_mode(csrow_nr >> 1, dct, pvt); + + nr_pages = umc_addr_mask_to_cs_size(pvt, dct, cs_mode, csrow_nr); + nr_pages <<= 20 - PAGE_SHIFT; + + edac_dbg(0, "csrow: %d, channel: %d, cs_mode %d\n", + csrow_nr_orig, dct, cs_mode); edac_dbg(0, "nr_pages/channel: %u\n", nr_pages); return nr_pages; @@ -3155,7 +2856,7 @@ static int init_csrows_df(struct mem_ctl_info *mci) edac_dbg(1, "MC node: %d, csrow: %d\n", pvt->mc_node_id, cs); - dimm->nr_pages = get_csrow_nr_pages(pvt, umc, cs); + dimm->nr_pages = 
umc_get_csrow_nr_pages(pvt, umc, cs); dimm->mtype = pvt->umc[umc].dram_type; dimm->edac_mode = edac_mode; dimm->dtype = dev_type; @@ -3211,13 +2912,13 @@ static int init_csrows(struct mem_ctl_info *mci) pvt->mc_node_id, i); if (row_dct0) { - nr_pages = get_csrow_nr_pages(pvt, 0, i); + nr_pages = dct_get_csrow_nr_pages(pvt, 0, i); csrow->channels[0]->dimm->nr_pages = nr_pages; } /* K8 has only one DCT */ if (pvt->fam != 0xf && row_dct1) { - int row_dct1_pages = get_csrow_nr_pages(pvt, 1, i); + int row_dct1_pages = dct_get_csrow_nr_pages(pvt, 1, i); csrow->channels[1]->dimm->nr_pages = row_dct1_pages; nr_pages += row_dct1_pages; @@ -3232,7 +2933,7 @@ static int init_csrows(struct mem_ctl_info *mci) : EDAC_SECDED; } - for (j = 0; j < pvt->channel_count; j++) { + for (j = 0; j < pvt->max_mcs; j++) { dimm = csrow->channels[j]->dimm; dimm->mtype = pvt->dram_type; dimm->edac_mode = edac_mode; @@ -3449,8 +3150,7 @@ static bool ecc_enabled(struct amd64_pvt *pvt) MSR_IA32_MCG_CTL, nid); } - amd64_info("Node %d: DRAM ECC %s.\n", - nid, (ecc_en ? "enabled" : "disabled")); + edac_dbg(3, "Node %d: DRAM ECC %s.\n", nid, (ecc_en ? "enabled" : "disabled")); if (!ecc_en || !nb_mce_en) return false; @@ -3508,153 +3208,194 @@ static void setup_mci_misc_attrs(struct mem_ctl_info *mci) mci->edac_cap = determine_edac_cap(pvt); mci->mod_name = EDAC_MOD_STR; - mci->ctl_name = fam_type->ctl_name; + mci->ctl_name = pvt->ctl_name; mci->dev_name = pci_name(pvt->F3); mci->ctl_page_to_phys = NULL; + if (pvt->fam >= 0x17) + return; + /* memory scrubber interface */ mci->set_sdram_scrub_rate = set_scrub_rate; mci->get_sdram_scrub_rate = get_scrub_rate; } -/* - * returns a pointer to the family descriptor on success, NULL otherwise. - */ -static struct amd64_family_type *per_family_init(struct amd64_pvt *pvt) +static struct low_ops umc_ops = { +}; + +/* Use Family 16h versions for defaults and adjust as needed below. 
*/ +static struct low_ops dct_ops = { + .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow, + .dbam_to_cs = f16_dbam_to_chip_select, +}; + +static int per_family_init(struct amd64_pvt *pvt) { pvt->ext_model = boot_cpu_data.x86_model >> 4; pvt->stepping = boot_cpu_data.x86_stepping; pvt->model = boot_cpu_data.x86_model; pvt->fam = boot_cpu_data.x86; + pvt->max_mcs = 2; + + /* + * Decide on which ops group to use here and do any family/model + * overrides below. + */ + if (pvt->fam >= 0x17) + pvt->ops = &umc_ops; + else + pvt->ops = &dct_ops; switch (pvt->fam) { case 0xf: - fam_type = &family_types[K8_CPUS]; - pvt->ops = &family_types[K8_CPUS].ops; + pvt->ctl_name = (pvt->ext_model >= K8_REV_F) ? + "K8 revF or later" : "K8 revE or earlier"; + pvt->f1_id = PCI_DEVICE_ID_AMD_K8_NB_ADDRMAP; + pvt->f2_id = PCI_DEVICE_ID_AMD_K8_NB_MEMCTL; + pvt->ops->map_sysaddr_to_csrow = k8_map_sysaddr_to_csrow; + pvt->ops->dbam_to_cs = k8_dbam_to_chip_select; break; case 0x10: - fam_type = &family_types[F10_CPUS]; - pvt->ops = &family_types[F10_CPUS].ops; + pvt->ctl_name = "F10h"; + pvt->f1_id = PCI_DEVICE_ID_AMD_10H_NB_MAP; + pvt->f2_id = PCI_DEVICE_ID_AMD_10H_NB_DRAM; + pvt->ops->dbam_to_cs = f10_dbam_to_chip_select; break; case 0x15: - if (pvt->model == 0x30) { - fam_type = &family_types[F15_M30H_CPUS]; - pvt->ops = &family_types[F15_M30H_CPUS].ops; + switch (pvt->model) { + case 0x30: + pvt->ctl_name = "F15h_M30h"; + pvt->f1_id = PCI_DEVICE_ID_AMD_15H_M30H_NB_F1; + pvt->f2_id = PCI_DEVICE_ID_AMD_15H_M30H_NB_F2; + break; + case 0x60: + pvt->ctl_name = "F15h_M60h"; + pvt->f1_id = PCI_DEVICE_ID_AMD_15H_M60H_NB_F1; + pvt->f2_id = PCI_DEVICE_ID_AMD_15H_M60H_NB_F2; + pvt->ops->dbam_to_cs = f15_m60h_dbam_to_chip_select; break; - } else if (pvt->model == 0x60) { - fam_type = &family_types[F15_M60H_CPUS]; - pvt->ops = &family_types[F15_M60H_CPUS].ops; + case 0x13: + /* Richland is only client */ + return -ENODEV; + default: + pvt->ctl_name = "F15h"; + pvt->f1_id = PCI_DEVICE_ID_AMD_15H_NB_F1; 
+ pvt->f2_id = PCI_DEVICE_ID_AMD_15H_NB_F2; + pvt->ops->dbam_to_cs = f15_dbam_to_chip_select; break; } - - fam_type = &family_types[F15_CPUS]; - pvt->ops = &family_types[F15_CPUS].ops; break; case 0x16: - if (pvt->model == 0x30) { - fam_type = &family_types[F16_M30H_CPUS]; - pvt->ops = &family_types[F16_M30H_CPUS].ops; + switch (pvt->model) { + case 0x30: + pvt->ctl_name = "F16h_M30h"; + pvt->f1_id = PCI_DEVICE_ID_AMD_16H_M30H_NB_F1; + pvt->f2_id = PCI_DEVICE_ID_AMD_16H_M30H_NB_F2; + break; + default: + pvt->ctl_name = "F16h"; + pvt->f1_id = PCI_DEVICE_ID_AMD_16H_NB_F1; + pvt->f2_id = PCI_DEVICE_ID_AMD_16H_NB_F2; break; } - fam_type = &family_types[F16_CPUS]; - pvt->ops = &family_types[F16_CPUS].ops; break; case 0x17: - if (pvt->model >= 0x10 && pvt->model <= 0x2f) { - fam_type = &family_types[F17_M10H_CPUS]; - pvt->ops = &family_types[F17_M10H_CPUS].ops; + switch (pvt->model) { + case 0x10 ... 0x2f: + pvt->ctl_name = "F17h_M10h"; + break; + case 0x30 ... 0x3f: + pvt->ctl_name = "F17h_M30h"; + pvt->max_mcs = 8; break; - } else if (pvt->model >= 0x30 && pvt->model <= 0x3f) { - fam_type = &family_types[F17_M30H_CPUS]; - pvt->ops = &family_types[F17_M30H_CPUS].ops; + case 0x60 ... 0x6f: + pvt->ctl_name = "F17h_M60h"; break; - } else if (pvt->model >= 0x60 && pvt->model <= 0x6f) { - fam_type = &family_types[F17_M60H_CPUS]; - pvt->ops = &family_types[F17_M60H_CPUS].ops; + case 0x70 ... 
0x7f: + pvt->ctl_name = "F17h_M70h"; break; - } else if (pvt->model >= 0x70 && pvt->model <= 0x7f) { - fam_type = &family_types[F17_M70H_CPUS]; - pvt->ops = &family_types[F17_M70H_CPUS].ops; + default: + pvt->ctl_name = "F17h"; break; } - fam_type = &family_types[F17_CPUS]; - pvt->ops = &family_types[F17_CPUS].ops; break; case 0x18: - if (pvt->model == 0x4) { - fam_type = &family_types[F17_M30H_CPUS]; - pvt->ops = &family_types[F17_M30H_CPUS].ops; - family_types[F17_M30H_CPUS].max_mcs = 3; - family_types[F17_M30H_CPUS].ctl_name = "F18h_M04h"; - break; - } else if (pvt->model == 0x5) { - fam_type = &family_types[F17_M30H_CPUS]; - pvt->ops = &family_types[F17_M30H_CPUS].ops; - family_types[F17_M30H_CPUS].max_mcs = 1; - family_types[F17_M30H_CPUS].ctl_name = "F18h_M05h"; + switch (pvt->model) { + case 0x04: + pvt->ctl_name = "F18h_M04h"; + pvt->max_mcs = 3; + break; + case 0x05: + pvt->ctl_name = "F18h_M05h"; + pvt->max_mcs = 1; + break; + case 0x06: + pvt->ctl_name = "F18h_M06h"; + pvt->max_mcs = 2; + break; + case 0x07: + pvt->ctl_name = "F18h_M07h"; + break; + case 0x08: + pvt->ctl_name = "F18h_M08h"; + break; + case 0x10: + pvt->ctl_name = "F18h_M10h"; + pvt->max_mcs = 2; + break; + } + break; + + case 0x19: + switch (pvt->model) { + case 0x00 ... 0x0f: + pvt->ctl_name = "F19h"; + pvt->max_mcs = 8; break; - } else if (pvt->model == 0x6) { - fam_type = &family_types[F18_M06H_CPUS]; - pvt->ops = &family_types[F18_M06H_CPUS].ops; + case 0x10 ... 0x1f: + pvt->ctl_name = "F19h_M10h"; + pvt->max_mcs = 12; + pvt->flags.zn_regs_v2 = 1; break; - } else if (pvt->model == 0x7) { - fam_type = &family_types[F18_M06H_CPUS]; - pvt->ops = &family_types[F18_M06H_CPUS].ops; - family_types[F18_M06H_CPUS].ctl_name = "F18h_M07h"; + case 0x20 ... 0x2f: + pvt->ctl_name = "F19h_M20h"; break; - } else if (pvt->model == 0x8) { - fam_type = &family_types[F18_M06H_CPUS]; - pvt->ops = &family_types[F18_M06H_CPUS].ops; - family_types[F18_M06H_CPUS].ctl_name = "F18h_M08h"; + case 0x50 ... 
0x5f: + pvt->ctl_name = "F19h_M50h"; break; - } else if (pvt->model == 0x10) { - fam_type = &family_types[F18_M10H_CPUS]; - pvt->ops = &family_types[F18_M10H_CPUS].ops; - family_types[F18_M10H_CPUS].ctl_name = "F18h_M10h"; + case 0xa0 ... 0xaf: + pvt->ctl_name = "F19h_MA0h"; + pvt->max_mcs = 12; + pvt->flags.zn_regs_v2 = 1; break; } - fam_type = &family_types[F17_CPUS]; - pvt->ops = &family_types[F17_CPUS].ops; - family_types[F17_CPUS].ctl_name = "F18h"; break; - case 0x19: - if (pvt->model >= 0x10 && pvt->model <= 0x1f) { - fam_type = &family_types[F19_M10H_CPUS]; - pvt->ops = &family_types[F19_M10H_CPUS].ops; - break; - } else if (pvt->model >= 0x20 && pvt->model <= 0x2f) { - fam_type = &family_types[F17_M70H_CPUS]; - pvt->ops = &family_types[F17_M70H_CPUS].ops; - fam_type->ctl_name = "F19h_M20h"; + case 0x1A: + switch (pvt->model) { + case 0x00 ... 0x1f: + pvt->ctl_name = "F1Ah"; + pvt->max_mcs = 12; + pvt->flags.zn_regs_v2 = 1; break; - } else if (pvt->model >= 0xa0 && pvt->model <= 0xaf) { - fam_type = &family_types[F19_M10H_CPUS]; - pvt->ops = &family_types[F19_M10H_CPUS].ops; - fam_type->ctl_name = "F19h_MA0h"; + case 0x40 ... 0x4f: + pvt->ctl_name = "F1Ah_M40h"; + pvt->flags.zn_regs_v2 = 1; break; } - fam_type = &family_types[F19_CPUS]; - pvt->ops = &family_types[F19_CPUS].ops; - family_types[F19_CPUS].ctl_name = "F19h"; break; default: amd64_err("Unsupported family!\n"); - return NULL; + return -ENODEV; } - amd64_info("%s %sdetected (node %d).\n", fam_type->ctl_name, - (pvt->fam == 0xf ? - (pvt->ext_model >= K8_REV_F ? 
"revF or later " - : "revE or earlier ") - : ""), pvt->mc_node_id); - return fam_type; + return 0; } static const struct attribute_group *amd64_edac_attr_groups[] = { @@ -3673,15 +3414,12 @@ static int hw_info_get(struct amd64_pvt *pvt) int ret = -EINVAL; if (pvt->fam >= 0x17) { - pvt->umc = kcalloc(fam_type->max_mcs, sizeof(struct amd64_umc), GFP_KERNEL); + pvt->umc = kcalloc(pvt->max_mcs, sizeof(struct amd64_umc), GFP_KERNEL); if (!pvt->umc) return -ENOMEM; - - pci_id1 = fam_type->f0_id; - pci_id2 = fam_type->f6_id; } else { - pci_id1 = fam_type->f1_id; - pci_id2 = fam_type->f2_id; + pci_id1 = pvt->f1_id; + pci_id2 = pvt->f2_id; } ret = reserve_mc_sibling_devs(pvt, pci_id1, pci_id2); @@ -3695,7 +3433,7 @@ static int hw_info_get(struct amd64_pvt *pvt) static void hw_info_put(struct amd64_pvt *pvt) { - if (pvt->F0 || pvt->F1) + if (pvt->F1) free_mc_sibling_devs(pvt); kfree(pvt->umc); @@ -3705,29 +3443,13 @@ static int init_one_instance(struct amd64_pvt *pvt) { struct mem_ctl_info *mci = NULL; struct edac_mc_layer layers[2]; - int ret = -EINVAL; - - /* - * We need to determine how many memory channels there are. Then use - * that information for calculating the size of the dynamic instance - * tables in the 'mci' structure. - */ - pvt->channel_count = pvt->ops->early_channel_count(pvt); - if (pvt->channel_count < 0) - return ret; + int ret = -ENOMEM; - ret = -ENOMEM; layers[0].type = EDAC_MC_LAYER_CHIP_SELECT; layers[0].size = pvt->csels[0].b_cnt; layers[0].is_virt_csrow = true; layers[1].type = EDAC_MC_LAYER_CHANNEL; - - /* - * Always allocate two channels since we can have setups with DIMMs on - * only one channel. Also, this simplifies handling later for the price - * of a couple of KBs tops. 
- */ - layers[1].size = fam_type->max_mcs; + layers[1].size = pvt->max_mcs; layers[1].is_virt_csrow = false; mci = edac_mc_alloc(pvt->mc_node_id, ARRAY_SIZE(layers), layers, 0); @@ -3757,7 +3479,7 @@ static bool instance_has_memory(struct amd64_pvt *pvt) bool cs_enabled = false; int cs = 0, dct = 0; - for (dct = 0; dct < fam_type->max_mcs; dct++) { + for (dct = 0; dct < pvt->max_mcs; dct++) { for_each_chip_select(cs, dct, pvt) cs_enabled |= csrow_enabled(cs, dct, pvt); } @@ -3786,8 +3508,8 @@ static int probe_one_instance(unsigned int nid) pvt->mc_node_id = nid; pvt->F3 = F3; - fam_type = per_family_init(pvt); - if (!fam_type) + ret = per_family_init(pvt); + if (ret < 0) goto err_enable; ret = hw_info_get(pvt); @@ -3826,6 +3548,8 @@ static int probe_one_instance(unsigned int nid) goto err_enable; } + amd64_info("%s detected (node %d).\n", pvt->ctl_name, pvt->mc_node_id); + dump_misc_regs(pvt); return ret; @@ -3885,13 +3609,14 @@ static void setup_pci_device(void) } static const struct x86_cpu_id amd64_cpuids[] = { - { X86_VENDOR_AMD, 0xF, X86_MODEL_ANY, X86_FEATURE_ANY, 0 }, - { X86_VENDOR_AMD, 0x10, X86_MODEL_ANY, X86_FEATURE_ANY, 0 }, - { X86_VENDOR_AMD, 0x15, X86_MODEL_ANY, X86_FEATURE_ANY, 0 }, - { X86_VENDOR_AMD, 0x16, X86_MODEL_ANY, X86_FEATURE_ANY, 0 }, - { X86_VENDOR_AMD, 0x17, X86_MODEL_ANY, X86_FEATURE_ANY, 0 }, - { X86_VENDOR_HYGON, 0x18, X86_MODEL_ANY, X86_FEATURE_ANY, 0 }, - { X86_VENDOR_AMD, 0x19, X86_MODEL_ANY, X86_FEATURE_ANY, 0 }, + X86_MATCH_VENDOR_FAM(AMD, 0x0F, NULL), + X86_MATCH_VENDOR_FAM(AMD, 0x10, NULL), + X86_MATCH_VENDOR_FAM(AMD, 0x15, NULL), + X86_MATCH_VENDOR_FAM(AMD, 0x16, NULL), + X86_MATCH_VENDOR_FAM(AMD, 0x17, NULL), + X86_MATCH_VENDOR_FAM(HYGON, 0x18, NULL), + X86_MATCH_VENDOR_FAM(AMD, 0x19, NULL), + X86_MATCH_VENDOR_FAM(AMD, 0x1A, NULL), { } }; MODULE_DEVICE_TABLE(x86cpu, amd64_cpuids); @@ -3949,12 +3674,12 @@ static int __init amd64_edac_init(void) if (report_gart_errors) amd_report_gart_errors(true); - if (boot_cpu_data.x86 >= 
0x17) + if (boot_cpu_data.x86 >= 0x17) { amd_register_ecc_decoder(decode_umc_error); - else + } else { amd_register_ecc_decoder(decode_bus_error); - - setup_pci_device(); + setup_pci_device(); + } #ifdef CONFIG_X86_32 amd64_err("%s on 32-bit is unsupported. USE AT YOUR OWN RISK!\n", EDAC_MOD_STR); diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h index ac2d939fb984080a5c80bd1b495dc858770e7a16..2bb81794072769d7f722abadf9de2f463b9384bc 100644 --- a/drivers/edac/amd64_edac.h +++ b/drivers/edac/amd64_edac.h @@ -114,25 +114,6 @@ #define PCI_DEVICE_ID_AMD_16H_NB_F2 0x1532 #define PCI_DEVICE_ID_AMD_16H_M30H_NB_F1 0x1581 #define PCI_DEVICE_ID_AMD_16H_M30H_NB_F2 0x1582 -#define PCI_DEVICE_ID_AMD_17H_DF_F0 0x1460 -#define PCI_DEVICE_ID_AMD_17H_DF_F6 0x1466 -#define PCI_DEVICE_ID_AMD_17H_M10H_DF_F0 0x15e8 -#define PCI_DEVICE_ID_AMD_17H_M10H_DF_F6 0x15ee -#define PCI_DEVICE_ID_AMD_17H_M30H_DF_F0 0x1490 -#define PCI_DEVICE_ID_AMD_17H_M30H_DF_F6 0x1496 -#define PCI_DEVICE_ID_AMD_17H_M60H_DF_F0 0x1448 -#define PCI_DEVICE_ID_AMD_17H_M60H_DF_F6 0x144e -#define PCI_DEVICE_ID_AMD_17H_M70H_DF_F0 0x1440 -#define PCI_DEVICE_ID_AMD_17H_M70H_DF_F6 0x1446 -#define PCI_DEVICE_ID_AMD_19H_DF_F0 0x1650 -#define PCI_DEVICE_ID_AMD_19H_DF_F6 0x1656 -#define PCI_DEVICE_ID_AMD_19H_M10H_DF_F0 0x14ad -#define PCI_DEVICE_ID_AMD_19H_M10H_DF_F6 0x14b3 - -#define PCI_DEVICE_ID_HYGON_18H_M06H_DF_F0 0x14b0 -#define PCI_DEVICE_ID_HYGON_18H_M06H_DF_F6 0x14b6 -#define PCI_DEVICE_ID_HYGON_18H_M10H_DF_F0 0x14d0 -#define PCI_DEVICE_ID_HYGON_18H_M10H_DF_F6 0x14d6 /* * Function 1 - Address Map @@ -218,8 +199,6 @@ #define DCT_SEL_HI 0x114 #define F15H_M60H_SCRCTRL 0x1C8 -#define F17H_SCR_BASE_ADDR 0x48 -#define F17H_SCR_LIMIT_ADDR 0x4C /* * Function 3 - Misc Control @@ -294,26 +273,6 @@ #define UMC_SDP_INIT BIT(31) -enum amd_families { - K8_CPUS = 0, - F10_CPUS, - F15_CPUS, - F15_M30H_CPUS, - F15_M60H_CPUS, - F16_CPUS, - F16_M30H_CPUS, - F17_CPUS, - F17_M10H_CPUS, - F17_M30H_CPUS, - F17_M60H_CPUS, 
- F17_M70H_CPUS, - F18_M06H_CPUS, - F18_M10H_CPUS, - F19_CPUS, - F19_M10H_CPUS, - NUM_FAMILIES, -}; - /* Error injection control structure */ struct error_injection { u32 section; @@ -356,11 +315,21 @@ struct amd64_umc { enum mem_type dram_type; }; +struct amd64_family_flags { + /* + * Indicates that the system supports the new register offsets, etc. + * first introduced with Family 19h Model 10h. + */ + __u64 zn_regs_v2 : 1, + + __reserved : 63; +}; + struct amd64_pvt { struct low_ops *ops; /* pci_device handles which we utilize */ - struct pci_dev *F0, *F1, *F2, *F3, *F6; + struct pci_dev *F1, *F2, *F3; u16 mc_node_id; /* MC index of this MC node */ u8 fam; /* CPU family */ @@ -368,7 +337,6 @@ struct amd64_pvt { u8 stepping; /* ... stepping */ int ext_model; /* extended model value of this node */ - int channel_count; /* Raw registers */ u32 dclr0; /* DRAM Configuration Low DCT0 reg */ @@ -398,6 +366,12 @@ struct amd64_pvt { /* x4, x8, or x16 syndromes in use */ u8 ecc_sym_sz; + const char *ctl_name; + u16 f1_id, f2_id; + /* Maximum number of memory controllers per die/node. */ + u8 max_mcs; + + struct amd64_family_flags flags; /* place to store error injection parameters prior to issue */ struct error_injection injection; @@ -496,30 +470,10 @@ extern const struct attribute_group amd64_edac_inj_group; * functions and per device encoding/decoding logic. */ struct low_ops { - int (*early_channel_count) (struct amd64_pvt *pvt); - void (*map_sysaddr_to_csrow) (struct mem_ctl_info *mci, u64 sys_addr, - struct err_info *); - int (*dbam_to_cs) (struct amd64_pvt *pvt, u8 dct, - unsigned cs_mode, int cs_mask_nr); -}; - -struct amd64_family_flags { - /* - * Indicates that the system supports the new register offsets, etc. - * first introduced with Family 19h Model 10h. - */ - __u64 zn_regs_v2 : 1, - - __reserved : 63; -}; - -struct amd64_family_type { - const char *ctl_name; - u16 f0_id, f1_id, f2_id, f6_id; - /* Maximum number of memory controllers per die/node. 
*/ - u8 max_mcs; - struct amd64_family_flags flags; - struct low_ops ops; + void (*map_sysaddr_to_csrow)(struct mem_ctl_info *mci, u64 sys_addr, + struct err_info *err); + int (*dbam_to_cs)(struct amd64_pvt *pvt, u8 dct, + unsigned int cs_mode, int cs_mask_nr); }; int __amd64_read_pci_cfg_dword(struct pci_dev *pdev, int offset, diff --git a/drivers/edac/i10nm_base.c b/drivers/edac/i10nm_base.c index 6ab09bf26babe799e8a0bad0ccaa427558eb0036..96a790d04531ea92bc825b0aceffdbee279883a5 100644 --- a/drivers/edac/i10nm_base.c +++ b/drivers/edac/i10nm_base.c @@ -255,10 +255,10 @@ static struct res_config i10nm_cfg1 = { }; static const struct x86_cpu_id i10nm_cpuids[] = { - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_TREMONT_D, 0, (kernel_ulong_t)&i10nm_cfg0 }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ICELAKE_X, 0, (kernel_ulong_t)&i10nm_cfg0 }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ICELAKE_D, 0, (kernel_ulong_t)&i10nm_cfg1 }, - { } + X86_MATCH_INTEL_FAM6_MODEL(ATOM_TREMONT_D, &i10nm_cfg0), + X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_X, &i10nm_cfg0), + X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_D, &i10nm_cfg1), + {} }; MODULE_DEVICE_TABLE(x86cpu, i10nm_cpuids); diff --git a/drivers/edac/pnd2_edac.c b/drivers/edac/pnd2_edac.c index ef6f882a870b6481ff79fb059d55be4b8003df28..c6194b97b6671620153b4517876bc00f536c3c13 100644 --- a/drivers/edac/pnd2_edac.c +++ b/drivers/edac/pnd2_edac.c @@ -1538,8 +1538,8 @@ static struct dunit_ops dnv_ops = { }; static const struct x86_cpu_id pnd2_cpuids[] = { - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_GOLDMONT, 0, (kernel_ulong_t)&apl_ops }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_GOLDMONT_D, 0, (kernel_ulong_t)&dnv_ops }, + X86_MATCH_INTEL_FAM6_MODEL(ATOM_GOLDMONT, &apl_ops), + X86_MATCH_INTEL_FAM6_MODEL(ATOM_GOLDMONT_D, &dnv_ops), { } }; MODULE_DEVICE_TABLE(x86cpu, pnd2_cpuids); diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c index 58a772ef437beeb1cf849cedcf8dd7aded974a07..ecbc7b9645f38fad4582298d93d5adef66a097fe 100644 --- a/drivers/edac/sb_edac.c +++ 
b/drivers/edac/sb_edac.c @@ -3423,13 +3423,13 @@ static int sbridge_register_mci(struct sbridge_dev *sbridge_dev, enum type type) } static const struct x86_cpu_id sbridge_cpuids[] = { - INTEL_CPU_FAM6(SANDYBRIDGE_X, pci_dev_descr_sbridge_table), - INTEL_CPU_FAM6(IVYBRIDGE_X, pci_dev_descr_ibridge_table), - INTEL_CPU_FAM6(HASWELL_X, pci_dev_descr_haswell_table), - INTEL_CPU_FAM6(BROADWELL_X, pci_dev_descr_broadwell_table), - INTEL_CPU_FAM6(BROADWELL_D, pci_dev_descr_broadwell_table), - INTEL_CPU_FAM6(XEON_PHI_KNL, pci_dev_descr_knl_table), - INTEL_CPU_FAM6(XEON_PHI_KNM, pci_dev_descr_knl_table), + X86_MATCH_INTEL_FAM6_MODEL(SANDYBRIDGE_X, &pci_dev_descr_sbridge_table), + X86_MATCH_INTEL_FAM6_MODEL(IVYBRIDGE_X, &pci_dev_descr_ibridge_table), + X86_MATCH_INTEL_FAM6_MODEL(HASWELL_X, &pci_dev_descr_haswell_table), + X86_MATCH_INTEL_FAM6_MODEL(BROADWELL_X, &pci_dev_descr_broadwell_table), + X86_MATCH_INTEL_FAM6_MODEL(BROADWELL_D, &pci_dev_descr_broadwell_table), + X86_MATCH_INTEL_FAM6_MODEL(XEON_PHI_KNL, &pci_dev_descr_knl_table), + X86_MATCH_INTEL_FAM6_MODEL(XEON_PHI_KNM, &pci_dev_descr_knl_table), { } }; MODULE_DEVICE_TABLE(x86cpu, sbridge_cpuids); diff --git a/drivers/edac/skx_base.c b/drivers/edac/skx_base.c index 9130da0f7421a83eb67af974402a2d82e33640de..1c9df95ac96e3c730e00c193c65249ab44527f99 100644 --- a/drivers/edac/skx_base.c +++ b/drivers/edac/skx_base.c @@ -164,7 +164,7 @@ static struct res_config skx_cfg = { }; static const struct x86_cpu_id skx_cpuids[] = { - { X86_VENDOR_INTEL, 6, INTEL_FAM6_SKYLAKE_X, 0, (kernel_ulong_t)&skx_cfg}, + X86_MATCH_INTEL_FAM6_MODEL(SKYLAKE_X, &skx_cfg), { } }; MODULE_DEVICE_TABLE(x86cpu, skx_cpuids); diff --git a/drivers/hwmon/k10temp.c b/drivers/hwmon/k10temp.c index e08bb92d3451b2557d8a7e57f1f98f3192724c3b..47595a5beca17830cccc5ade2d56b72722f54d4c 100644 --- a/drivers/hwmon/k10temp.c +++ b/drivers/hwmon/k10temp.c @@ -66,7 +66,7 @@ static DEFINE_MUTEX(nb_smu_ind_mutex); #define F15H_M60H_REPORTED_TEMP_CTRL_OFFSET 0xd8200ca4 
-/* Common for Zen CPU families (Family 17h and 18h and 19h) */ +/* Common for Zen CPU families (Family 17h and 18h and 19h and 1Ah) */ #define ZEN_REPORTED_TEMP_CTRL_BASE 0x00059800 #define ZEN_CCD_TEMP(offset, x) (ZEN_REPORTED_TEMP_CTRL_BASE + \ @@ -512,6 +512,10 @@ static int k10temp_probe(struct pci_dev *pdev, const struct pci_device_id *id) break; } + } else if (boot_cpu_data.x86 == 0x1a) { + data->temp_adjust_mask = ZEN_CUR_TEMP_RANGE_SEL_MASK; + data->read_tempreg = read_tempreg_nb_zen; + data->is_zen = true; } else { data->read_htcreg = read_htcreg_pci; data->read_tempreg = read_tempreg_pci; @@ -552,6 +556,8 @@ static const struct pci_device_id k10temp_id_table[] = { { PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_17H_M70H_DF_F3) }, { PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_19H_DF_F3) }, { PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_19H_M10H_DF_F3) }, + { PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_1AH_M00H_DF_F3) }, + { PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_1AH_M20H_DF_F3) }, { PCI_VDEVICE(HYGON, PCI_DEVICE_ID_AMD_17H_DF_F3) }, { PCI_VDEVICE(HYGON, PCI_DEVICE_ID_AMD_17H_M30H_DF_F3) }, { PCI_VDEVICE(HYGON, PCI_DEVICE_ID_HYGON_18H_M05H_DF_F3) }, diff --git a/drivers/i2c/busses/Kconfig b/drivers/i2c/busses/Kconfig index 99f368d354f561bfe1fe97d84c2e51e7d555b088..6a2250078189f72f7937f124d057e211641c7df7 100644 --- a/drivers/i2c/busses/Kconfig +++ b/drivers/i2c/busses/Kconfig @@ -523,6 +523,7 @@ config I2C_DAVINCI config I2C_DESIGNWARE_CORE tristate + select REGMAP config I2C_DESIGNWARE_PLATFORM tristate "Synopsys DesignWare Platform" diff --git a/drivers/i2c/busses/i2c-designware-common.c b/drivers/i2c/busses/i2c-designware-common.c index 8e275c8c63e642720a34b3878b2f834dc2e4b993..eaa0fadc9f1111942ef45a554e1b9eb79233a880 100644 --- a/drivers/i2c/busses/i2c-designware-common.c +++ b/drivers/i2c/busses/i2c-designware-common.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include "i2c-designware-core.h" @@ -53,66 +54,122 @@ static char *abort_sources[] = { "incorrect 
slave-transmitter mode configuration", }; -u32 dw_readl(struct dw_i2c_dev *dev, int offset) +static int dw_reg_read(void *context, unsigned int reg, unsigned int *val) { - u32 value; + struct dw_i2c_dev *dev = context; - if (dev->flags & ACCESS_16BIT) - value = readw(dev->base + offset) | - (readw(dev->base + offset + 2) << 16); - else - value = readl(dev->base + offset); + *val = readl(dev->base + reg); - if (dev->flags & ACCESS_SWAP) - return swab32(value); - else - return value; + return 0; } -void dw_writel(struct dw_i2c_dev *dev, u32 b, int offset) +static int dw_reg_write(void *context, unsigned int reg, unsigned int val) { - if (dev->flags & ACCESS_SWAP) - b = swab32(b); - - if (dev->flags & ACCESS_16BIT) { - writew((u16)b, dev->base + offset); - writew((u16)(b >> 16), dev->base + offset + 2); - } else { - writel(b, dev->base + offset); - } + struct dw_i2c_dev *dev = context; + + writel(val, dev->base + reg); + + return 0; +} + +static int dw_reg_read_swab(void *context, unsigned int reg, unsigned int *val) +{ + struct dw_i2c_dev *dev = context; + + *val = swab32(readl(dev->base + reg)); + + return 0; +} + +static int dw_reg_write_swab(void *context, unsigned int reg, unsigned int val) +{ + struct dw_i2c_dev *dev = context; + + writel(swab32(val), dev->base + reg); + + return 0; +} + +static int dw_reg_read_word(void *context, unsigned int reg, unsigned int *val) +{ + struct dw_i2c_dev *dev = context; + + *val = readw(dev->base + reg) | + (readw(dev->base + reg + 2) << 16); + + return 0; +} + +static int dw_reg_write_word(void *context, unsigned int reg, unsigned int val) +{ + struct dw_i2c_dev *dev = context; + + writew(val, dev->base + reg); + writew(val >> 16, dev->base + reg + 2); + + return 0; } /** - * i2c_dw_set_reg_access() - Set register access flags + * i2c_dw_init_regmap() - Initialize registers map * @dev: device private data * - * Autodetects needed register access mode and sets access flags accordingly. 
- * This must be called before doing any other register access. + * Autodetects needed register access mode and creates the regmap with + * corresponding read/write callbacks. This must be called before doing any + * other register access. */ -int i2c_dw_set_reg_access(struct dw_i2c_dev *dev) +int i2c_dw_init_regmap(struct dw_i2c_dev *dev) { + struct regmap_config map_cfg = { + .reg_bits = 32, + .val_bits = 32, + .reg_stride = 4, + .disable_locking = true, + .reg_read = dw_reg_read, + .reg_write = dw_reg_write, + .max_register = DW_IC_COMP_TYPE, + }; u32 reg; int ret; + /* + * Skip detecting the registers map configuration if the regmap has + * already been provided by higher-level code. + */ + if (dev->map) + return 0; + ret = i2c_dw_acquire_lock(dev); if (ret) return ret; - reg = dw_readl(dev, DW_IC_COMP_TYPE); + reg = readl(dev->base + DW_IC_COMP_TYPE); i2c_dw_release_lock(dev); if (reg == swab32(DW_IC_COMP_TYPE_VALUE)) { - /* Configure register endianess access */ - dev->flags |= ACCESS_SWAP; + map_cfg.reg_read = dw_reg_read_swab; + map_cfg.reg_write = dw_reg_write_swab; } else if (reg == (DW_IC_COMP_TYPE_VALUE & 0x0000ffff)) { - /* Configure register access mode 16bit */ - dev->flags |= ACCESS_16BIT; + map_cfg.reg_read = dw_reg_read_word; + map_cfg.reg_write = dw_reg_write_word; } else if (reg != DW_IC_COMP_TYPE_VALUE) { dev_err(dev->dev, "Unknown Synopsys component type: 0x%08x\n", reg); return -ENODEV; } + /* + * Note we'll check the return value of the regmap IO accessors only + * at the probe stage. The rest of the code won't do this because + * basically we have MMIO-based regmap so none of the read/write methods + * can fail.
+ */ + dev->map = devm_regmap_init(dev->dev, NULL, dev, &map_cfg); + if (IS_ERR(dev->map)) { + dev_err(dev->dev, "Failed to init the registers map\n"); + return PTR_ERR(dev->map); + } + return 0; } @@ -181,11 +238,17 @@ int i2c_dw_set_sda_hold(struct dw_i2c_dev *dev) return ret; /* Configure SDA Hold Time if required */ - reg = dw_readl(dev, DW_IC_COMP_VERSION); + ret = regmap_read(dev->map, DW_IC_COMP_VERSION, ®); + if (ret) + goto err_release_lock; + if (reg >= DW_IC_SDA_HOLD_MIN_VERS) { if (!dev->sda_hold_time) { /* Keep previous hold time setting if no one set it */ - dev->sda_hold_time = dw_readl(dev, DW_IC_SDA_HOLD); + ret = regmap_read(dev->map, DW_IC_SDA_HOLD, + &dev->sda_hold_time); + if (ret) + goto err_release_lock; } /* @@ -209,22 +272,25 @@ int i2c_dw_set_sda_hold(struct dw_i2c_dev *dev) dev->sda_hold_time = 0; } +err_release_lock: i2c_dw_release_lock(dev); - return 0; + return ret; } void __i2c_dw_disable(struct dw_i2c_dev *dev) { int timeout = 100; + u32 status; /* * Workaround: When a slave goes offline and the master tries to send * it data, the bus gets stuck. Issuing abort seems to work. */ - if (dw_readl(dev, DW_IC_STATUS) & DW_IC_STATUS_MASTER_ACTIVITY) { + regmap_read(dev->map, DW_IC_STATUS, &status); + if (status & DW_IC_STATUS_MASTER_ACTIVITY) { /* Issue abort. */ - dw_writel(dev, DW_IC_ENABLE_ENABLE | DW_IC_ENABLE_ABORT, DW_IC_ENABLE); + regmap_write(dev->map, DW_IC_ENABLE, DW_IC_ENABLE_ENABLE | DW_IC_ENABLE_ABORT); usleep_range(50000, 100000); } @@ -234,7 +300,8 @@ void __i2c_dw_disable(struct dw_i2c_dev *dev) * The enable status register may be unimplemented, but * in that case this test reads zero and exits the loop. 
*/ - if ((dw_readl(dev, DW_IC_ENABLE_STATUS) & 1) == 0) + regmap_read(dev->map, DW_IC_ENABLE_STATUS, &status); + if ((status & 1) == 0) return; /* @@ -310,22 +377,23 @@ void i2c_dw_release_lock(struct dw_i2c_dev *dev) */ int i2c_dw_wait_bus_not_busy(struct dw_i2c_dev *dev) { - int timeout = TIMEOUT; + u32 status; + int ret; - while (dw_readl(dev, DW_IC_STATUS) & DW_IC_STATUS_ACTIVITY) { - if (timeout <= 0) { - dev_warn(dev->dev, "timeout waiting for bus ready\n"); - i2c_recover_bus(&dev->adapter); + ret = regmap_read_poll_timeout(dev->map, DW_IC_STATUS, status, + !(status & DW_IC_STATUS_ACTIVITY), + 1100, 20000); + if (ret) { + dev_warn(dev->dev, "timeout waiting for bus ready\n"); - if (dw_readl(dev, DW_IC_STATUS) & DW_IC_STATUS_ACTIVITY) - return -ETIMEDOUT; - return 0; - } - timeout--; - usleep_range(1000, 1100); + i2c_recover_bus(&dev->adapter); + + regmap_read(dev->map, DW_IC_STATUS, &status); + if (!(status & DW_IC_STATUS_ACTIVITY)) + ret = 0; } - return 0; + return ret; } int i2c_dw_handle_tx_abort(struct dw_i2c_dev *dev) @@ -351,6 +419,34 @@ int i2c_dw_handle_tx_abort(struct dw_i2c_dev *dev) return -EIO; } +int i2c_dw_set_fifo_size(struct dw_i2c_dev *dev) +{ + u32 param, tx_fifo_depth, rx_fifo_depth; + int ret; + + /* + * Try to detect the FIFO depth if not set by interface driver, + * the depth could be from 2 to 256 from HW spec. 
+ */ + ret = regmap_read(dev->map, DW_IC_COMP_PARAM_1, ¶m); + if (ret) + return ret; + + tx_fifo_depth = ((param >> 16) & 0xff) + 1; + rx_fifo_depth = ((param >> 8) & 0xff) + 1; + if (!dev->tx_fifo_depth) { + dev->tx_fifo_depth = tx_fifo_depth; + dev->rx_fifo_depth = rx_fifo_depth; + } else if (tx_fifo_depth >= 2) { + dev->tx_fifo_depth = min_t(u32, dev->tx_fifo_depth, + tx_fifo_depth); + dev->rx_fifo_depth = min_t(u32, dev->rx_fifo_depth, + rx_fifo_depth); + } + + return 0; +} + u32 i2c_dw_func(struct i2c_adapter *adap) { struct dw_i2c_dev *dev = i2c_get_adapdata(adap); @@ -360,24 +456,20 @@ u32 i2c_dw_func(struct i2c_adapter *adap) void i2c_dw_disable(struct dw_i2c_dev *dev) { + u32 dummy; + /* Disable controller */ __i2c_dw_disable(dev); /* Disable all interupts */ - dw_writel(dev, 0, DW_IC_INTR_MASK); - dw_readl(dev, DW_IC_CLR_INTR); + regmap_write(dev->map, DW_IC_INTR_MASK, 0); + regmap_read(dev->map, DW_IC_CLR_INTR, &dummy); } void i2c_dw_disable_int(struct dw_i2c_dev *dev) { - dw_writel(dev, 0, DW_IC_INTR_MASK); -} - -u32 i2c_dw_read_comp_param(struct dw_i2c_dev *dev) -{ - return dw_readl(dev, DW_IC_COMP_PARAM_1); + regmap_write(dev->map, DW_IC_INTR_MASK, 0); } -EXPORT_SYMBOL_GPL(i2c_dw_read_comp_param); MODULE_DESCRIPTION("Synopsys DesignWare I2C bus adapter core"); MODULE_LICENSE("GPL"); diff --git a/drivers/i2c/busses/i2c-designware-core.h b/drivers/i2c/busses/i2c-designware-core.h index 8a9e492716a4f7916cbd2b65b708bbf5650496c0..bf48ff6fa51669ee85f0799b66b268af8dbc09a6 100644 --- a/drivers/i2c/busses/i2c-designware-core.h +++ b/drivers/i2c/busses/i2c-designware-core.h @@ -10,6 +10,7 @@ */ #include +#include #define DW_IC_DEFAULT_FUNCTIONALITY (I2C_FUNC_I2C | \ I2C_FUNC_SMBUS_BYTE | \ @@ -30,6 +31,7 @@ #define DW_IC_CON_STOP_DET_IFADDRESSED 0x80 #define DW_IC_CON_TX_EMPTY_CTRL 0x100 #define DW_IC_CON_RX_FIFO_FULL_HLD_CTRL 0x200 +#define DW_IC_CON_BUS_CLEAR_CTRL BIT(11) /* * Registers offset @@ -123,8 +125,6 @@ #define STATUS_WRITE_IN_PROGRESS 0x1 #define 
STATUS_READ_IN_PROGRESS 0x2 -#define TIMEOUT 20 /* ms */ - /* * operation modes */ @@ -177,7 +177,9 @@ /** * struct dw_i2c_dev - private i2c-designware data * @dev: driver model device node + * @map: IO registers map * @base: IO registers pointer + * @ext: Extended IO registers pointer * @cmd_complete: tx completion indicator * @clk: input reference clock * @pclk: clock required to access the registers @@ -227,6 +229,7 @@ */ struct dw_i2c_dev { struct device *dev; + struct regmap *map; void __iomem *base; void __iomem *ext; struct completion cmd_complete; @@ -279,18 +282,14 @@ struct dw_i2c_dev { bool suspended; }; -#define ACCESS_SWAP 0x00000001 -#define ACCESS_16BIT 0x00000002 -#define ACCESS_INTR_MASK 0x00000004 -#define ACCESS_NO_IRQ_SUSPEND 0x00000008 +#define ACCESS_INTR_MASK 0x00000001 +#define ACCESS_NO_IRQ_SUSPEND 0x00000002 #define MODEL_CHERRYTRAIL 0x00000100 #define MODEL_MSCC_OCELOT 0x00000200 #define MODEL_MASK 0x00000f00 -u32 dw_readl(struct dw_i2c_dev *dev, int offset); -void dw_writel(struct dw_i2c_dev *dev, u32 b, int offset); -int i2c_dw_set_reg_access(struct dw_i2c_dev *dev); +int i2c_dw_init_regmap(struct dw_i2c_dev *dev); u32 i2c_dw_scl_hcnt(u32 ic_clk, u32 tSYMBOL, u32 tf, int cond, int offset); u32 i2c_dw_scl_lcnt(u32 ic_clk, u32 tLOW, u32 tf, int offset); int i2c_dw_set_sda_hold(struct dw_i2c_dev *dev); @@ -300,23 +299,23 @@ int i2c_dw_acquire_lock(struct dw_i2c_dev *dev); void i2c_dw_release_lock(struct dw_i2c_dev *dev); int i2c_dw_wait_bus_not_busy(struct dw_i2c_dev *dev); int i2c_dw_handle_tx_abort(struct dw_i2c_dev *dev); +int i2c_dw_set_fifo_size(struct dw_i2c_dev *dev); u32 i2c_dw_func(struct i2c_adapter *adap); void i2c_dw_disable(struct dw_i2c_dev *dev); void i2c_dw_disable_int(struct dw_i2c_dev *dev); static inline void __i2c_dw_enable(struct dw_i2c_dev *dev) { - dw_writel(dev, 1, DW_IC_ENABLE); + regmap_write(dev->map, DW_IC_ENABLE, 1); } static inline void __i2c_dw_disable_nowait(struct dw_i2c_dev *dev) { - dw_writel(dev, 0, 
DW_IC_ENABLE); + regmap_write(dev->map, DW_IC_ENABLE, 0); } void __i2c_dw_disable(struct dw_i2c_dev *dev); -extern u32 i2c_dw_read_comp_param(struct dw_i2c_dev *dev); extern int i2c_dw_probe(struct dw_i2c_dev *dev); #if IS_ENABLED(CONFIG_I2C_DESIGNWARE_SLAVE) extern int i2c_dw_probe_slave(struct dw_i2c_dev *dev); diff --git a/drivers/i2c/busses/i2c-designware-master.c b/drivers/i2c/busses/i2c-designware-master.c index 05bf8c9b332c1ec9df076ebc913d827ca3ef0250..1335ee5e313742064c7210e08491f7b7947eb11d 100644 --- a/drivers/i2c/busses/i2c-designware-master.c +++ b/drivers/i2c/busses/i2c-designware-master.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include "i2c-designware-core.h" @@ -25,11 +26,11 @@ static void i2c_dw_configure_fifo_master(struct dw_i2c_dev *dev) { /* Configure Tx/Rx FIFO threshold levels */ - dw_writel(dev, dev->tx_fifo_depth / 2, DW_IC_TX_TL); - dw_writel(dev, 0, DW_IC_RX_TL); + regmap_write(dev->map, DW_IC_TX_TL, dev->tx_fifo_depth / 2); + regmap_write(dev->map, DW_IC_RX_TL, 0); /* Configure the I2C master */ - dw_writel(dev, dev->master_cfg, DW_IC_CON); + regmap_write(dev->map, DW_IC_CON, dev->master_cfg); } static int i2c_dw_set_timings_master(struct dw_i2c_dev *dev) @@ -44,8 +45,11 @@ static int i2c_dw_set_timings_master(struct dw_i2c_dev *dev) ret = i2c_dw_acquire_lock(dev); if (ret) return ret; - comp_param1 = dw_readl(dev, DW_IC_COMP_PARAM_1); + + ret = regmap_read(dev->map, DW_IC_COMP_PARAM_1, &comp_param1); i2c_dw_release_lock(dev); + if (ret) + return ret; /* Set standard and fast speed dividers for high/low periods */ sda_falling_time = t->sda_fall_ns ?: 300; /* ns */ @@ -162,22 +166,22 @@ static int i2c_dw_init_master(struct dw_i2c_dev *dev) __i2c_dw_disable(dev); /* Write standard speed timing parameters */ - dw_writel(dev, dev->ss_hcnt, DW_IC_SS_SCL_HCNT); - dw_writel(dev, dev->ss_lcnt, DW_IC_SS_SCL_LCNT); + regmap_write(dev->map, DW_IC_SS_SCL_HCNT, dev->ss_hcnt); + regmap_write(dev->map, DW_IC_SS_SCL_LCNT, 
dev->ss_lcnt); /* Write fast mode/fast mode plus timing parameters */ - dw_writel(dev, dev->fs_hcnt, DW_IC_FS_SCL_HCNT); - dw_writel(dev, dev->fs_lcnt, DW_IC_FS_SCL_LCNT); + regmap_write(dev->map, DW_IC_FS_SCL_HCNT, dev->fs_hcnt); + regmap_write(dev->map, DW_IC_FS_SCL_LCNT, dev->fs_lcnt); /* Write high speed timing parameters if supported */ if (dev->hs_hcnt && dev->hs_lcnt) { - dw_writel(dev, dev->hs_hcnt, DW_IC_HS_SCL_HCNT); - dw_writel(dev, dev->hs_lcnt, DW_IC_HS_SCL_LCNT); + regmap_write(dev->map, DW_IC_HS_SCL_HCNT, dev->hs_hcnt); + regmap_write(dev->map, DW_IC_HS_SCL_LCNT, dev->hs_lcnt); } /* Write SDA hold time if supported */ if (dev->sda_hold_time) - dw_writel(dev, dev->sda_hold_time, DW_IC_SDA_HOLD); + regmap_write(dev->map, DW_IC_SDA_HOLD, dev->sda_hold_time); i2c_dw_configure_fifo_master(dev); i2c_dw_release_lock(dev); @@ -188,15 +192,15 @@ static int i2c_dw_init_master(struct dw_i2c_dev *dev) static void i2c_dw_xfer_init(struct dw_i2c_dev *dev) { struct i2c_msg *msgs = dev->msgs; - u32 ic_con, ic_tar = 0; + u32 ic_con = 0, ic_tar = 0; + u32 dummy; /* Disable the adapter */ __i2c_dw_disable(dev); /* If the slave address is ten bit address, enable 10BITADDR */ - ic_con = dw_readl(dev, DW_IC_CON); if (msgs[dev->msg_write_idx].flags & I2C_M_TEN) { - ic_con |= DW_IC_CON_10BITADDR_MASTER; + ic_con = DW_IC_CON_10BITADDR_MASTER; /* * If I2C_DYNAMIC_TAR_UPDATE is set, the 10-bit addressing * mode has to be enabled via bit 12 of IC_TAR register. @@ -204,17 +208,17 @@ static void i2c_dw_xfer_init(struct dw_i2c_dev *dev) * detected from registers. */ ic_tar = DW_IC_TAR_10BITADDR_MASTER; - } else { - ic_con &= ~DW_IC_CON_10BITADDR_MASTER; } - dw_writel(dev, ic_con, DW_IC_CON); + regmap_update_bits(dev->map, DW_IC_CON, DW_IC_CON_10BITADDR_MASTER, + ic_con); /* * Set the slave (target) address and enable 10-bit addressing mode * if applicable. 
*/ - dw_writel(dev, msgs[dev->msg_write_idx].addr | ic_tar, DW_IC_TAR); + regmap_write(dev->map, DW_IC_TAR, + msgs[dev->msg_write_idx].addr | ic_tar); /* Enforce disabled interrupts (due to HW issues) */ i2c_dw_disable_int(dev); @@ -223,11 +227,11 @@ static void i2c_dw_xfer_init(struct dw_i2c_dev *dev) __i2c_dw_enable(dev); /* Dummy read to avoid the register getting stuck on Bay Trail */ - dw_readl(dev, DW_IC_ENABLE_STATUS); + regmap_read(dev->map, DW_IC_ENABLE_STATUS, &dummy); /* Clear and enable interrupts */ - dw_readl(dev, DW_IC_CLR_INTR); - dw_writel(dev, DW_IC_INTR_MASTER_MASK, DW_IC_INTR_MASK); + regmap_read(dev->map, DW_IC_CLR_INTR, &dummy); + regmap_write(dev->map, DW_IC_INTR_MASK, DW_IC_INTR_MASTER_MASK); } /* @@ -246,6 +250,7 @@ i2c_dw_xfer_msg(struct dw_i2c_dev *dev) u32 buf_len = dev->tx_buf_len; u8 *buf = dev->tx_buf; bool need_restart = false; + unsigned int flr; intr_mask = DW_IC_INTR_MASTER_MASK; @@ -278,8 +283,11 @@ i2c_dw_xfer_msg(struct dw_i2c_dev *dev) need_restart = true; } - tx_limit = dev->tx_fifo_depth - dw_readl(dev, DW_IC_TXFLR); - rx_limit = dev->rx_fifo_depth - dw_readl(dev, DW_IC_RXFLR); + regmap_read(dev->map, DW_IC_TXFLR, &flr); + tx_limit = dev->tx_fifo_depth - flr; + + regmap_read(dev->map, DW_IC_RXFLR, &flr); + rx_limit = dev->rx_fifo_depth - flr; while (buf_len > 0 && tx_limit > 0 && rx_limit > 0) { u32 cmd = 0; @@ -312,11 +320,14 @@ i2c_dw_xfer_msg(struct dw_i2c_dev *dev) if (dev->rx_outstanding >= dev->rx_fifo_depth) break; - dw_writel(dev, cmd | 0x100, DW_IC_DATA_CMD); + regmap_write(dev->map, DW_IC_DATA_CMD, + cmd | 0x100); rx_limit--; dev->rx_outstanding++; - } else - dw_writel(dev, cmd | *buf++, DW_IC_DATA_CMD); + } else { + regmap_write(dev->map, DW_IC_DATA_CMD, + cmd | *buf++); + } tx_limit--; buf_len--; } @@ -352,7 +363,7 @@ i2c_dw_xfer_msg(struct dw_i2c_dev *dev) if (dev->msg_err) intr_mask = 0; - dw_writel(dev, intr_mask, DW_IC_INTR_MASK); + regmap_write(dev->map, DW_IC_INTR_MASK, intr_mask); } static u8 @@ -375,9 
+386,9 @@ i2c_dw_recv_len(struct dw_i2c_dev *dev, u8 len) * Received buffer length, re-enable TX_EMPTY interrupt * to resume the SMBUS transaction. */ - intr_mask = dw_readl(dev, DW_IC_INTR_MASK); + regmap_read(dev->map, DW_IC_INTR_MASK, &intr_mask); intr_mask |= DW_IC_INTR_TX_EMPTY; - dw_writel(dev, intr_mask, DW_IC_INTR_MASK); + regmap_write(dev->map, DW_IC_INTR_MASK, intr_mask); return len; } @@ -386,10 +397,10 @@ static void i2c_dw_read(struct dw_i2c_dev *dev) { struct i2c_msg *msgs = dev->msgs; - int rx_valid; + unsigned int rx_valid; for (; dev->msg_read_idx < dev->msgs_num; dev->msg_read_idx++) { - u32 len; + u32 len, tmp; u8 *buf; if (!(msgs[dev->msg_read_idx].flags & I2C_M_RD)) @@ -403,18 +414,18 @@ i2c_dw_read(struct dw_i2c_dev *dev) buf = dev->rx_buf; } - rx_valid = dw_readl(dev, DW_IC_RXFLR); + regmap_read(dev->map, DW_IC_RXFLR, &rx_valid); for (; len > 0 && rx_valid > 0; len--, rx_valid--) { u32 flags = msgs[dev->msg_read_idx].flags; - *buf = dw_readl(dev, DW_IC_DATA_CMD); + regmap_read(dev->map, DW_IC_DATA_CMD, &tmp); /* Ensure length byte is a valid value */ if (flags & I2C_M_RECV_LEN && - *buf <= I2C_SMBUS_BLOCK_MAX && *buf > 0) { - len = i2c_dw_recv_len(dev, *buf); + tmp <= I2C_SMBUS_BLOCK_MAX && tmp > 0) { + len = i2c_dw_recv_len(dev, tmp); } - buf++; + *buf++ = tmp; dev->rx_outstanding--; } @@ -532,7 +543,7 @@ static const struct i2c_adapter_quirks i2c_dw_quirks = { static u32 i2c_dw_read_clear_intrbits(struct dw_i2c_dev *dev) { - u32 stat; + u32 stat, dummy; /* * The IC_INTR_STAT register just indicates "enabled" interrupts. @@ -540,47 +551,47 @@ static u32 i2c_dw_read_clear_intrbits(struct dw_i2c_dev *dev) * in the IC_RAW_INTR_STAT register. * * That is, - * stat = dw_readl(IC_INTR_STAT); + * stat = readl(IC_INTR_STAT); * equals to, - * stat = dw_readl(IC_RAW_INTR_STAT) & dw_readl(IC_INTR_MASK); + * stat = readl(IC_RAW_INTR_STAT) & readl(IC_INTR_MASK); * * The raw version might be useful for debugging purposes. 
*/ - stat = dw_readl(dev, DW_IC_INTR_STAT); + regmap_read(dev->map, DW_IC_INTR_STAT, &stat); /* * Do not use the IC_CLR_INTR register to clear interrupts, or * you'll miss some interrupts, triggered during the period from - * dw_readl(IC_INTR_STAT) to dw_readl(IC_CLR_INTR). + * readl(IC_INTR_STAT) to readl(IC_CLR_INTR). * * Instead, use the separately-prepared IC_CLR_* registers. */ if (stat & DW_IC_INTR_RX_UNDER) - dw_readl(dev, DW_IC_CLR_RX_UNDER); + regmap_read(dev->map, DW_IC_CLR_RX_UNDER, &dummy); if (stat & DW_IC_INTR_RX_OVER) - dw_readl(dev, DW_IC_CLR_RX_OVER); + regmap_read(dev->map, DW_IC_CLR_RX_OVER, &dummy); if (stat & DW_IC_INTR_TX_OVER) - dw_readl(dev, DW_IC_CLR_TX_OVER); + regmap_read(dev->map, DW_IC_CLR_TX_OVER, &dummy); if (stat & DW_IC_INTR_RD_REQ) - dw_readl(dev, DW_IC_CLR_RD_REQ); + regmap_read(dev->map, DW_IC_CLR_RD_REQ, &dummy); if (stat & DW_IC_INTR_TX_ABRT) { /* * The IC_TX_ABRT_SOURCE register is cleared whenever * the IC_CLR_TX_ABRT is read. Preserve it beforehand. 
*/ - dev->abort_source = dw_readl(dev, DW_IC_TX_ABRT_SOURCE); - dw_readl(dev, DW_IC_CLR_TX_ABRT); + regmap_read(dev->map, DW_IC_TX_ABRT_SOURCE, &dev->abort_source); + regmap_read(dev->map, DW_IC_CLR_TX_ABRT, &dummy); } if (stat & DW_IC_INTR_RX_DONE) - dw_readl(dev, DW_IC_CLR_RX_DONE); + regmap_read(dev->map, DW_IC_CLR_RX_DONE, &dummy); if (stat & DW_IC_INTR_ACTIVITY) - dw_readl(dev, DW_IC_CLR_ACTIVITY); + regmap_read(dev->map, DW_IC_CLR_ACTIVITY, &dummy); if (stat & DW_IC_INTR_STOP_DET) - dw_readl(dev, DW_IC_CLR_STOP_DET); + regmap_read(dev->map, DW_IC_CLR_STOP_DET, &dummy); if (stat & DW_IC_INTR_START_DET) - dw_readl(dev, DW_IC_CLR_START_DET); + regmap_read(dev->map, DW_IC_CLR_START_DET, &dummy); if (stat & DW_IC_INTR_GEN_CALL) - dw_readl(dev, DW_IC_CLR_GEN_CALL); + regmap_read(dev->map, DW_IC_CLR_GEN_CALL, &dummy); return stat; } @@ -602,7 +613,7 @@ static int i2c_dw_irq_handler_master(struct dw_i2c_dev *dev) * Anytime TX_ABRT is set, the contents of the tx/rx * buffers are flushed. Make sure to skip them. 
*/ - dw_writel(dev, 0, DW_IC_INTR_MASK); + regmap_write(dev->map, DW_IC_INTR_MASK, 0); goto tx_aborted; } @@ -623,9 +634,9 @@ static int i2c_dw_irq_handler_master(struct dw_i2c_dev *dev) complete(&dev->cmd_complete); else if (unlikely(dev->flags & ACCESS_INTR_MASK)) { /* Workaround to trigger pending interrupt */ - stat = dw_readl(dev, DW_IC_INTR_MASK); + regmap_read(dev->map, DW_IC_INTR_MASK, &stat); i2c_dw_disable_int(dev); - dw_writel(dev, stat, DW_IC_INTR_MASK); + regmap_write(dev->map, DW_IC_INTR_MASK, stat); } return 0; @@ -636,8 +647,8 @@ static irqreturn_t i2c_dw_isr(int this_irq, void *dev_id) struct dw_i2c_dev *dev = dev_id; u32 stat, enabled; - enabled = dw_readl(dev, DW_IC_ENABLE); - stat = dw_readl(dev, DW_IC_RAW_INTR_STAT); + regmap_read(dev->map, DW_IC_ENABLE, &enabled); + regmap_read(dev->map, DW_IC_RAW_INTR_STAT, &stat); dev_dbg(dev->dev, "enabled=%#x stat=%#x\n", enabled, stat); if (!enabled || !(stat & ~DW_IC_INTR_ACTIVITY)) return IRQ_NONE; @@ -697,6 +708,7 @@ int i2c_dw_probe(struct dw_i2c_dev *dev) { struct i2c_adapter *adap = &dev->adapter; unsigned long irq_flags; + unsigned int ic_con; int ret; init_completion(&dev->cmd_complete); @@ -705,7 +717,7 @@ int i2c_dw_probe(struct dw_i2c_dev *dev) dev->disable = i2c_dw_disable; dev->disable_int = i2c_dw_disable_int; - ret = i2c_dw_set_reg_access(dev); + ret = i2c_dw_init_regmap(dev); if (ret) return ret; @@ -713,6 +725,29 @@ int i2c_dw_probe(struct dw_i2c_dev *dev) if (ret) return ret; + ret = i2c_dw_set_fifo_size(dev); + if (ret) + return ret; + + /* Lock the bus for accessing DW_IC_CON */ + ret = i2c_dw_acquire_lock(dev); + if (ret) + return ret; + + /* + * On AMD platforms BIOS advertises the bus clear feature + * and enables the SCL/SDA stuck low. SMU FW does the + * bus recovery process. Driver should not ignore this BIOS + * advertisement of bus clear feature. 
+ */ + ret = regmap_read(dev->map, DW_IC_CON, &ic_con); + i2c_dw_release_lock(dev); + if (ret) + return ret; + + if (ic_con & DW_IC_CON_BUS_CLEAR_CTRL) + dev->master_cfg |= DW_IC_CON_BUS_CLEAR_CTRL; + ret = dev->init(dev); if (ret) return ret; diff --git a/drivers/i2c/busses/i2c-designware-platdrv.c b/drivers/i2c/busses/i2c-designware-platdrv.c index 871c714093a4784d9322b469dd48740a77db5383..7d2359db2d2ab8417b710f2fc3391063a76c1cbd 100644 --- a/drivers/i2c/busses/i2c-designware-platdrv.c +++ b/drivers/i2c/busses/i2c-designware-platdrv.c @@ -220,28 +220,6 @@ static void i2c_dw_configure_slave(struct dw_i2c_dev *dev) dev->mode = DW_IC_SLAVE; } -static void dw_i2c_set_fifo_size(struct dw_i2c_dev *dev) -{ - u32 param, tx_fifo_depth, rx_fifo_depth; - - /* - * Try to detect the FIFO depth if not set by interface driver, - * the depth could be from 2 to 256 from HW spec. - */ - param = i2c_dw_read_comp_param(dev); - tx_fifo_depth = ((param >> 16) & 0xff) + 1; - rx_fifo_depth = ((param >> 8) & 0xff) + 1; - if (!dev->tx_fifo_depth) { - dev->tx_fifo_depth = tx_fifo_depth; - dev->rx_fifo_depth = rx_fifo_depth; - } else if (tx_fifo_depth >= 2) { - dev->tx_fifo_depth = min_t(u32, dev->tx_fifo_depth, - tx_fifo_depth); - dev->rx_fifo_depth = min_t(u32, dev->rx_fifo_depth, - rx_fifo_depth); - } -} - static void dw_i2c_plat_pm_cleanup(struct dw_i2c_dev *dev) { pm_runtime_disable(dev->dev); @@ -372,8 +350,6 @@ static int dw_i2c_plat_probe(struct platform_device *pdev) div_u64(clk_khz * t->sda_hold_ns + 500000, 1000000); } - dw_i2c_set_fifo_size(dev); - adap = &dev->adapter; adap->owner = THIS_MODULE; adap->class = I2C_CLASS_DEPRECATED; diff --git a/drivers/i2c/busses/i2c-designware-slave.c b/drivers/i2c/busses/i2c-designware-slave.c index f5f001738df5e2b1b8c6e9db4973a9ce8f065140..aab5b3473b07adebc5385087939a45688da12194 100644 --- a/drivers/i2c/busses/i2c-designware-slave.c +++ b/drivers/i2c/busses/i2c-designware-slave.c @@ -14,18 +14,19 @@ #include #include #include +#include 
#include "i2c-designware-core.h" static void i2c_dw_configure_fifo_slave(struct dw_i2c_dev *dev) { /* Configure Tx/Rx FIFO threshold levels. */ - dw_writel(dev, 0, DW_IC_TX_TL); - dw_writel(dev, 0, DW_IC_RX_TL); + regmap_write(dev->map, DW_IC_TX_TL, 0); + regmap_write(dev->map, DW_IC_RX_TL, 0); /* Configure the I2C slave. */ - dw_writel(dev, dev->slave_cfg, DW_IC_CON); - dw_writel(dev, DW_IC_INTR_SLAVE_MASK, DW_IC_INTR_MASK); + regmap_write(dev->map, DW_IC_CON, dev->slave_cfg); + regmap_write(dev->map, DW_IC_INTR_MASK, DW_IC_INTR_SLAVE_MASK); } /** @@ -49,7 +50,7 @@ static int i2c_dw_init_slave(struct dw_i2c_dev *dev) /* Write SDA hold time if supported */ if (dev->sda_hold_time) - dw_writel(dev, dev->sda_hold_time, DW_IC_SDA_HOLD); + regmap_write(dev->map, DW_IC_SDA_HOLD, dev->sda_hold_time); i2c_dw_configure_fifo_slave(dev); i2c_dw_release_lock(dev); @@ -72,7 +73,7 @@ static int i2c_dw_reg_slave(struct i2c_client *slave) * the address to which the DW_apb_i2c responds. */ __i2c_dw_disable_nowait(dev); - dw_writel(dev, slave->addr, DW_IC_SAR); + regmap_write(dev->map, DW_IC_SAR, slave->addr); dev->slave = slave; __i2c_dw_enable(dev); @@ -103,7 +104,7 @@ static int i2c_dw_unreg_slave(struct i2c_client *slave) static u32 i2c_dw_read_clear_intrbits_slave(struct dw_i2c_dev *dev) { - u32 stat; + u32 stat, dummy; /* * The IC_INTR_STAT register just indicates "enabled" interrupts. @@ -111,39 +112,39 @@ static u32 i2c_dw_read_clear_intrbits_slave(struct dw_i2c_dev *dev) * in the IC_RAW_INTR_STAT register. * * That is, - * stat = dw_readl(IC_INTR_STAT); + * stat = readl(IC_INTR_STAT); * equals to, - * stat = dw_readl(IC_RAW_INTR_STAT) & dw_readl(IC_INTR_MASK); + * stat = readl(IC_RAW_INTR_STAT) & readl(IC_INTR_MASK); * * The raw version might be useful for debugging purposes. 
*/ - stat = dw_readl(dev, DW_IC_INTR_STAT); + regmap_read(dev->map, DW_IC_INTR_STAT, &stat); /* * Do not use the IC_CLR_INTR register to clear interrupts, or * you'll miss some interrupts, triggered during the period from - * dw_readl(IC_INTR_STAT) to dw_readl(IC_CLR_INTR). + * readl(IC_INTR_STAT) to readl(IC_CLR_INTR). * * Instead, use the separately-prepared IC_CLR_* registers. */ if (stat & DW_IC_INTR_TX_ABRT) - dw_readl(dev, DW_IC_CLR_TX_ABRT); + regmap_read(dev->map, DW_IC_CLR_TX_ABRT, &dummy); if (stat & DW_IC_INTR_RX_UNDER) - dw_readl(dev, DW_IC_CLR_RX_UNDER); + regmap_read(dev->map, DW_IC_CLR_RX_UNDER, &dummy); if (stat & DW_IC_INTR_RX_OVER) - dw_readl(dev, DW_IC_CLR_RX_OVER); + regmap_read(dev->map, DW_IC_CLR_RX_OVER, &dummy); if (stat & DW_IC_INTR_TX_OVER) - dw_readl(dev, DW_IC_CLR_TX_OVER); + regmap_read(dev->map, DW_IC_CLR_TX_OVER, &dummy); if (stat & DW_IC_INTR_RX_DONE) - dw_readl(dev, DW_IC_CLR_RX_DONE); + regmap_read(dev->map, DW_IC_CLR_RX_DONE, &dummy); if (stat & DW_IC_INTR_ACTIVITY) - dw_readl(dev, DW_IC_CLR_ACTIVITY); + regmap_read(dev->map, DW_IC_CLR_ACTIVITY, &dummy); if (stat & DW_IC_INTR_STOP_DET) - dw_readl(dev, DW_IC_CLR_STOP_DET); + regmap_read(dev->map, DW_IC_CLR_STOP_DET, &dummy); if (stat & DW_IC_INTR_START_DET) - dw_readl(dev, DW_IC_CLR_START_DET); + regmap_read(dev->map, DW_IC_CLR_START_DET, &dummy); if (stat & DW_IC_INTR_GEN_CALL) - dw_readl(dev, DW_IC_CLR_GEN_CALL); + regmap_read(dev->map, DW_IC_CLR_GEN_CALL, &dummy); return stat; } @@ -155,14 +156,14 @@ static u32 i2c_dw_read_clear_intrbits_slave(struct dw_i2c_dev *dev) static int i2c_dw_irq_handler_slave(struct dw_i2c_dev *dev) { - u32 raw_stat, stat, enabled; - u8 val, slave_activity; + u32 raw_stat, stat, enabled, tmp; + u8 val = 0, slave_activity; - stat = dw_readl(dev, DW_IC_INTR_STAT); - enabled = dw_readl(dev, DW_IC_ENABLE); - raw_stat = dw_readl(dev, DW_IC_RAW_INTR_STAT); - slave_activity = ((dw_readl(dev, DW_IC_STATUS) & - DW_IC_STATUS_SLAVE_ACTIVITY) >> 6); + 
regmap_read(dev->map, DW_IC_INTR_STAT, &stat); + regmap_read(dev->map, DW_IC_ENABLE, &enabled); + regmap_read(dev->map, DW_IC_RAW_INTR_STAT, &raw_stat); + regmap_read(dev->map, DW_IC_STATUS, &tmp); + slave_activity = ((tmp & DW_IC_STATUS_SLAVE_ACTIVITY) >> 6); if (!enabled || !(raw_stat & ~DW_IC_INTR_ACTIVITY) || !dev->slave) return 0; @@ -177,7 +178,8 @@ static int i2c_dw_irq_handler_slave(struct dw_i2c_dev *dev) if (stat & DW_IC_INTR_RD_REQ) { if (slave_activity) { if (stat & DW_IC_INTR_RX_FULL) { - val = dw_readl(dev, DW_IC_DATA_CMD); + regmap_read(dev->map, DW_IC_DATA_CMD, &tmp); + val = tmp; if (!i2c_slave_event(dev->slave, I2C_SLAVE_WRITE_RECEIVED, @@ -185,24 +187,24 @@ static int i2c_dw_irq_handler_slave(struct dw_i2c_dev *dev) dev_vdbg(dev->dev, "Byte %X acked!", val); } - dw_readl(dev, DW_IC_CLR_RD_REQ); + regmap_read(dev->map, DW_IC_CLR_RD_REQ, &tmp); stat = i2c_dw_read_clear_intrbits_slave(dev); } else { - dw_readl(dev, DW_IC_CLR_RD_REQ); - dw_readl(dev, DW_IC_CLR_RX_UNDER); + regmap_read(dev->map, DW_IC_CLR_RD_REQ, &tmp); + regmap_read(dev->map, DW_IC_CLR_RX_UNDER, &tmp); stat = i2c_dw_read_clear_intrbits_slave(dev); } if (!i2c_slave_event(dev->slave, I2C_SLAVE_READ_REQUESTED, &val)) - dw_writel(dev, val, DW_IC_DATA_CMD); + regmap_write(dev->map, DW_IC_DATA_CMD, val); } } if (stat & DW_IC_INTR_RX_DONE) { if (!i2c_slave_event(dev->slave, I2C_SLAVE_READ_PROCESSED, &val)) - dw_readl(dev, DW_IC_CLR_RX_DONE); + regmap_read(dev->map, DW_IC_CLR_RX_DONE, &tmp); i2c_slave_event(dev->slave, I2C_SLAVE_STOP, &val); stat = i2c_dw_read_clear_intrbits_slave(dev); @@ -210,7 +212,8 @@ static int i2c_dw_irq_handler_slave(struct dw_i2c_dev *dev) } if (stat & DW_IC_INTR_RX_FULL) { - val = dw_readl(dev, DW_IC_DATA_CMD); + regmap_read(dev->map, DW_IC_DATA_CMD, &tmp); + val = tmp; if (!i2c_slave_event(dev->slave, I2C_SLAVE_WRITE_RECEIVED, &val)) dev_vdbg(dev->dev, "Byte %X acked!", val); @@ -252,7 +255,7 @@ int i2c_dw_probe_slave(struct dw_i2c_dev *dev) dev->disable = 
i2c_dw_disable; dev->disable_int = i2c_dw_disable_int; - ret = i2c_dw_set_reg_access(dev); + ret = i2c_dw_init_regmap(dev); if (ret) return ret; @@ -260,6 +263,10 @@ int i2c_dw_probe_slave(struct dw_i2c_dev *dev) if (ret) return ret; + ret = i2c_dw_set_fifo_size(dev); + if (ret) + return ret; + ret = dev->init(dev); if (ret) return ret; diff --git a/drivers/tee/Kconfig b/drivers/tee/Kconfig index 676ffcb649857aedd872c43bdb12d72aa0bc23df..19c796692aa542bf8f656dfe21c562e2ca6cacfa 100644 --- a/drivers/tee/Kconfig +++ b/drivers/tee/Kconfig @@ -14,7 +14,7 @@ if TEE menu "TEE drivers" source "drivers/tee/optee/Kconfig" - +source "drivers/tee/amdtee/Kconfig" endmenu endif diff --git a/drivers/tee/Makefile b/drivers/tee/Makefile index 21f51fd88b0746ddf8f52b558d19940559ec00c4..68da044afbfaeab80fa5187997b5f4f628b7146a 100644 --- a/drivers/tee/Makefile +++ b/drivers/tee/Makefile @@ -4,3 +4,4 @@ tee-objs += tee_core.o tee-objs += tee_shm.o tee-objs += tee_shm_pool.o obj-$(CONFIG_OPTEE) += optee/ +obj-$(CONFIG_AMDTEE) += amdtee/ diff --git a/drivers/tee/amdtee/Kconfig b/drivers/tee/amdtee/Kconfig new file mode 100644 index 0000000000000000000000000000000000000000..191f9715fa9afcb3ee98fb2ee509e64a9a448b94 --- /dev/null +++ b/drivers/tee/amdtee/Kconfig @@ -0,0 +1,8 @@ +# SPDX-License-Identifier: MIT +# AMD-TEE Trusted Execution Environment Configuration +config AMDTEE + tristate "AMD-TEE" + default m + depends on CRYPTO_DEV_SP_PSP && CRYPTO_DEV_CCP_DD + help + This implements AMD's Trusted Execution Environment (TEE) driver. 
diff --git a/drivers/tee/amdtee/Makefile b/drivers/tee/amdtee/Makefile new file mode 100644 index 0000000000000000000000000000000000000000..ff14852661170fc6cb07812903b8ffa9202d78d7 --- /dev/null +++ b/drivers/tee/amdtee/Makefile @@ -0,0 +1,5 @@ +# SPDX-License-Identifier: MIT +obj-$(CONFIG_AMDTEE) += amdtee.o +amdtee-objs += core.o +amdtee-objs += call.o +amdtee-objs += shm_pool.o diff --git a/drivers/tee/amdtee/amdtee_if.h b/drivers/tee/amdtee/amdtee_if.h new file mode 100644 index 0000000000000000000000000000000000000000..e2014e21530acb86d50eb5995e2124f52acb5e7c --- /dev/null +++ b/drivers/tee/amdtee/amdtee_if.h @@ -0,0 +1,185 @@ +/* SPDX-License-Identifier: MIT */ + +/* + * Copyright 2019 Advanced Micro Devices, Inc. + */ + +/* + * This file has definitions related to Host and AMD-TEE Trusted OS interface. + * These definitions must match the definitions on the TEE side. + */ + +#ifndef AMDTEE_IF_H +#define AMDTEE_IF_H + +#include + +/***************************************************************************** + ** TEE Param + ******************************************************************************/ +#define TEE_MAX_PARAMS 4 + +/** + * struct memref - memory reference structure + * @buf_id: buffer ID of the buffer mapped by TEE_CMD_ID_MAP_SHARED_MEM + * @offset: offset in bytes from beginning of the buffer + * @size: data size in bytes + */ +struct memref { + u32 buf_id; + u32 offset; + u32 size; +}; + +struct value { + u32 a; + u32 b; +}; + +/* + * Parameters passed to open_session or invoke_command + */ +union tee_op_param { + struct memref mref; + struct value val; +}; + +struct tee_operation { + u32 param_types; + union tee_op_param params[TEE_MAX_PARAMS]; +}; + +/* Must be same as in GP TEE specification */ +#define TEE_OP_PARAM_TYPE_NONE 0 +#define TEE_OP_PARAM_TYPE_VALUE_INPUT 1 +#define TEE_OP_PARAM_TYPE_VALUE_OUTPUT 2 +#define TEE_OP_PARAM_TYPE_VALUE_INOUT 3 +#define TEE_OP_PARAM_TYPE_INVALID 4 +#define TEE_OP_PARAM_TYPE_MEMREF_INPUT 5 +#define 
TEE_OP_PARAM_TYPE_MEMREF_OUTPUT 6 +#define TEE_OP_PARAM_TYPE_MEMREF_INOUT 7 + +#define TEE_PARAM_TYPE_GET(t, i) (((t) >> ((i) * 4)) & 0xF) +#define TEE_PARAM_TYPES(t0, t1, t2, t3) \ + ((t0) | ((t1) << 4) | ((t2) << 8) | ((t3) << 12)) + +/***************************************************************************** + ** TEE Commands + *****************************************************************************/ + +/* + * The shared memory between rich world and secure world may be physically + * non-contiguous. Below structures are meant to describe a shared memory region + * via scatter/gather (sg) list + */ + +/** + * struct tee_sg_desc - sg descriptor for a physically contiguous buffer + * @low_addr: [in] bits[31:0] of buffer's physical address. Must be 4KB aligned + * @hi_addr: [in] bits[63:32] of the buffer's physical address + * @size: [in] size in bytes (must be multiple of 4KB) + */ +struct tee_sg_desc { + u32 low_addr; + u32 hi_addr; + u32 size; +}; + +/** + * struct tee_sg_list - structure describing a scatter/gather list + * @count: [in] number of sg descriptors + * @size: [in] total size of all buffers in the list. 
Must be multiple of 4KB + * @buf: [in] list of sg buffer descriptors + */ +#define TEE_MAX_SG_DESC 64 +struct tee_sg_list { + u32 count; + u32 size; + struct tee_sg_desc buf[TEE_MAX_SG_DESC]; +}; + +/** + * struct tee_cmd_map_shared_mem - command to map shared memory + * @buf_id: [out] return buffer ID value + * @sg_list: [in] list describing memory to be mapped + */ +struct tee_cmd_map_shared_mem { + u32 buf_id; + struct tee_sg_list sg_list; +}; + +/** + * struct tee_cmd_unmap_shared_mem - command to unmap shared memory + * @buf_id: [in] buffer ID of memory to be unmapped + */ +struct tee_cmd_unmap_shared_mem { + u32 buf_id; +}; + +/** + * struct tee_cmd_load_ta - load Trusted Application (TA) binary into TEE + * @low_addr: [in] bits [31:0] of the physical address of the TA binary + * @hi_addr: [in] bits [63:32] of the physical address of the TA binary + * @size: [in] size of TA binary in bytes + * @ta_handle: [out] return handle of the loaded TA + * @return_origin: [out] origin of return code after TEE processing + */ +struct tee_cmd_load_ta { + u32 low_addr; + u32 hi_addr; + u32 size; + u32 ta_handle; + u32 return_origin; +}; + +/** + * struct tee_cmd_unload_ta - command to unload TA binary from TEE environment + * @ta_handle: [in] handle of the loaded TA to be unloaded + */ +struct tee_cmd_unload_ta { + u32 ta_handle; +}; + +/** + * struct tee_cmd_open_session - command to call TA_OpenSessionEntryPoint in TA + * @ta_handle: [in] handle of the loaded TA + * @session_info: [out] pointer to TA allocated session data + * @op: [in/out] operation parameters + * @return_origin: [out] origin of return code after TEE processing + */ +struct tee_cmd_open_session { + u32 ta_handle; + u32 session_info; + struct tee_operation op; + u32 return_origin; +}; + +/** + * struct tee_cmd_close_session - command to call TA_CloseSessionEntryPoint() + * in TA + * @ta_handle: [in] handle of the loaded TA + * @session_info: [in] pointer to TA allocated session data + */ +struct 
tee_cmd_close_session { + u32 ta_handle; + u32 session_info; +}; + +/** + * struct tee_cmd_invoke_cmd - command to call TA_InvokeCommandEntryPoint() in + * TA + * @ta_handle: [in] handle of the loaded TA + * @cmd_id: [in] TA command ID + * @session_info: [in] pointer to TA allocated session data + * @op: [in/out] operation parameters + * @return_origin: [out] origin of return code after TEE processing + */ +struct tee_cmd_invoke_cmd { + u32 ta_handle; + u32 cmd_id; + u32 session_info; + struct tee_operation op; + u32 return_origin; +}; + +#endif /*AMDTEE_IF_H*/ diff --git a/drivers/tee/amdtee/amdtee_private.h b/drivers/tee/amdtee/amdtee_private.h new file mode 100644 index 0000000000000000000000000000000000000000..6d0f7062bb870749e3a5117cf6dd98fe9d944a4b --- /dev/null +++ b/drivers/tee/amdtee/amdtee_private.h @@ -0,0 +1,172 @@ +/* SPDX-License-Identifier: MIT */ + +/* + * Copyright 2019 Advanced Micro Devices, Inc. + */ + +#ifndef AMDTEE_PRIVATE_H +#define AMDTEE_PRIVATE_H + +#include +#include +#include +#include +#include +#include "amdtee_if.h" + +#define DRIVER_NAME "amdtee" +#define DRIVER_AUTHOR "AMD-TEE Linux driver team" + +/* Some GlobalPlatform error codes used in this driver */ +#define TEEC_SUCCESS 0x00000000 +#define TEEC_ERROR_GENERIC 0xFFFF0000 +#define TEEC_ERROR_BAD_PARAMETERS 0xFFFF0006 +#define TEEC_ERROR_OUT_OF_MEMORY 0xFFFF000C +#define TEEC_ERROR_COMMUNICATION 0xFFFF000E + +#define TEEC_ORIGIN_COMMS 0x00000002 + +/* Maximum number of sessions which can be opened with a Trusted Application */ +#define TEE_NUM_SESSIONS 32 + +#define TA_LOAD_PATH "/amdtee" +#define TA_PATH_MAX 60 + +/** + * struct amdtee - main service struct + * @teedev: client device + * @pool: shared memory pool + */ +struct amdtee { + struct tee_device *teedev; + struct tee_shm_pool *pool; +}; + +/** + * struct amdtee_session - Trusted Application (TA) session related information. 
+ * @ta_handle: handle to Trusted Application (TA) loaded in TEE environment + * @refcount: counter to keep track of sessions opened for the TA instance + * @session_info: an array pointing to TA allocated session data. + * @sess_mask: session usage bit-mask. If a particular bit is set, then the + * corresponding @session_info entry is in use or valid. + * + * Session structure is updated on open_session and this information is used for + * subsequent operations with the Trusted Application. + */ +struct amdtee_session { + struct list_head list_node; + u32 ta_handle; + struct kref refcount; + u32 session_info[TEE_NUM_SESSIONS]; + DECLARE_BITMAP(sess_mask, TEE_NUM_SESSIONS); + spinlock_t lock; /* synchronizes access to @sess_mask */ +}; + +/** + * struct amdtee_context_data - AMD-TEE driver context data + * @sess_list: Keeps track of sessions opened in current TEE context + * @shm_list: Keeps track of buffers allocated and mapped in current TEE + * context + */ +struct amdtee_context_data { + struct list_head sess_list; + struct list_head shm_list; + struct mutex shm_mutex; /* synchronizes access to @shm_list */ +}; + +struct amdtee_driver_data { + struct amdtee *amdtee; +}; + +struct shmem_desc { + void *kaddr; + u64 size; +}; + +/** + * struct amdtee_shm_data - Shared memory data + * @kaddr: Kernel virtual address of shared memory + * @buf_id: Buffer id of memory mapped by TEE_CMD_ID_MAP_SHARED_MEM + */ +struct amdtee_shm_data { + struct list_head shm_node; + void *kaddr; + u32 buf_id; +}; + +/** + * struct amdtee_ta_data - Keeps track of all TAs loaded in AMD Secure + * Processor + * @ta_handle: Handle to TA loaded in TEE + * @refcount: Reference count for the loaded TA + */ +struct amdtee_ta_data { + struct list_head list_node; + u32 ta_handle; + u32 refcount; +}; + +#define LOWER_TWO_BYTE_MASK 0x0000FFFF + +/** + * set_session_id() - Sets the session identifier. 
+ * @ta_handle: [in] handle of the loaded Trusted Application (TA) + * @session_index: [in] Session index. Range: 0 to (TEE_NUM_SESSIONS - 1). + * @session: [out] Pointer to session id + * + * Lower two bytes of the session identifier represents the TA handle and the + * upper two bytes is session index. + */ +static inline void set_session_id(u32 ta_handle, u32 session_index, + u32 *session) +{ + *session = (session_index << 16) | (LOWER_TWO_BYTE_MASK & ta_handle); +} + +static inline u32 get_ta_handle(u32 session) +{ + return session & LOWER_TWO_BYTE_MASK; +} + +static inline u32 get_session_index(u32 session) +{ + return (session >> 16) & LOWER_TWO_BYTE_MASK; +} + +int amdtee_open_session(struct tee_context *ctx, + struct tee_ioctl_open_session_arg *arg, + struct tee_param *param); + +int amdtee_close_session(struct tee_context *ctx, u32 session); + +int amdtee_invoke_func(struct tee_context *ctx, + struct tee_ioctl_invoke_arg *arg, + struct tee_param *param); + +int amdtee_cancel_req(struct tee_context *ctx, u32 cancel_id, u32 session); + +int amdtee_map_shmem(struct tee_shm *shm); + +void amdtee_unmap_shmem(struct tee_shm *shm); + +int handle_load_ta(void *data, u32 size, + struct tee_ioctl_open_session_arg *arg); + +int handle_unload_ta(u32 ta_handle); + +int handle_open_session(struct tee_ioctl_open_session_arg *arg, u32 *info, + struct tee_param *p); + +int handle_close_session(u32 ta_handle, u32 info); + +int handle_map_shmem(u32 count, struct shmem_desc *start, u32 *buf_id); + +void handle_unmap_shmem(u32 buf_id); + +int handle_invoke_cmd(struct tee_ioctl_invoke_arg *arg, u32 sinfo, + struct tee_param *p); + +struct tee_shm_pool *amdtee_config_shm(void); + +u32 get_buffer_id(struct tee_shm *shm); +#endif /*AMDTEE_PRIVATE_H*/ diff --git a/drivers/tee/amdtee/call.c b/drivers/tee/amdtee/call.c new file mode 100644 index 0000000000000000000000000000000000000000..63d428423e904b9ce035e5e3371cf39ae771d864 --- /dev/null +++ b/drivers/tee/amdtee/call.c @@ -0,0 
+1,451 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright 2019 Advanced Micro Devices, Inc. + */ + +#include +#include +#include +#include +#include +#include +#include "amdtee_if.h" +#include "amdtee_private.h" + +static int tee_params_to_amd_params(struct tee_param *tee, u32 count, + struct tee_operation *amd) +{ + int i, ret = 0; + u32 type; + + if (!count) + return 0; + + if (!tee || !amd || count > TEE_MAX_PARAMS) + return -EINVAL; + + amd->param_types = 0; + for (i = 0; i < count; i++) { + /* AMD TEE does not support meta parameter */ + if (tee[i].attr > TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INOUT) + return -EINVAL; + + amd->param_types |= ((tee[i].attr & 0xF) << i * 4); + } + + for (i = 0; i < count; i++) { + type = TEE_PARAM_TYPE_GET(amd->param_types, i); + pr_debug("%s: type[%d] = 0x%x\n", __func__, i, type); + + if (type == TEE_OP_PARAM_TYPE_INVALID) + return -EINVAL; + + if (type == TEE_OP_PARAM_TYPE_NONE) + continue; + + /* It is assumed that all values are within 2^32-1 */ + if (type > TEE_OP_PARAM_TYPE_VALUE_INOUT) { + u32 buf_id = get_buffer_id(tee[i].u.memref.shm); + + amd->params[i].mref.buf_id = buf_id; + amd->params[i].mref.offset = tee[i].u.memref.shm_offs; + amd->params[i].mref.size = tee[i].u.memref.size; + pr_debug("%s: bufid[%d] = 0x%x, offset[%d] = 0x%x, size[%d] = 0x%x\n", + __func__, + i, amd->params[i].mref.buf_id, + i, amd->params[i].mref.offset, + i, amd->params[i].mref.size); + } else { + if (tee[i].u.value.c) + pr_warn("%s: Discarding value c", __func__); + + amd->params[i].val.a = tee[i].u.value.a; + amd->params[i].val.b = tee[i].u.value.b; + pr_debug("%s: a[%d] = 0x%x, b[%d] = 0x%x\n", __func__, + i, amd->params[i].val.a, + i, amd->params[i].val.b); + } + } + return ret; +} + +static int amd_params_to_tee_params(struct tee_param *tee, u32 count, + struct tee_operation *amd) +{ + int i, ret = 0; + u32 type; + + if (!count) + return 0; + + if (!tee || !amd || count > TEE_MAX_PARAMS) + return -EINVAL; + + /* Assumes amd->param_types is 
valid */ + for (i = 0; i < count; i++) { + type = TEE_PARAM_TYPE_GET(amd->param_types, i); + pr_debug("%s: type[%d] = 0x%x\n", __func__, i, type); + + if (type == TEE_OP_PARAM_TYPE_INVALID || + type > TEE_OP_PARAM_TYPE_MEMREF_INOUT) + return -EINVAL; + + if (type == TEE_OP_PARAM_TYPE_NONE || + type == TEE_OP_PARAM_TYPE_VALUE_INPUT || + type == TEE_OP_PARAM_TYPE_MEMREF_INPUT) + continue; + + /* + * It is assumed that buf_id remains unchanged for + * both open_session and invoke_cmd call + */ + if (type > TEE_OP_PARAM_TYPE_MEMREF_INPUT) { + tee[i].u.memref.shm_offs = amd->params[i].mref.offset; + tee[i].u.memref.size = amd->params[i].mref.size; + pr_debug("%s: bufid[%d] = 0x%x, offset[%d] = 0x%x, size[%d] = 0x%x\n", + __func__, + i, amd->params[i].mref.buf_id, + i, amd->params[i].mref.offset, + i, amd->params[i].mref.size); + } else { + /* field 'c' not supported by AMD TEE */ + tee[i].u.value.a = amd->params[i].val.a; + tee[i].u.value.b = amd->params[i].val.b; + tee[i].u.value.c = 0; + pr_debug("%s: a[%d] = 0x%x, b[%d] = 0x%x\n", + __func__, + i, amd->params[i].val.a, + i, amd->params[i].val.b); + } + } + return ret; +} + +static DEFINE_MUTEX(ta_refcount_mutex); +static struct list_head ta_list = LIST_HEAD_INIT(ta_list); + +static u32 get_ta_refcount(u32 ta_handle) +{ + struct amdtee_ta_data *ta_data; + u32 count = 0; + + /* Caller must hold a mutex */ + list_for_each_entry(ta_data, &ta_list, list_node) + if (ta_data->ta_handle == ta_handle) + return ++ta_data->refcount; + + ta_data = kzalloc(sizeof(*ta_data), GFP_KERNEL); + if (ta_data) { + ta_data->ta_handle = ta_handle; + ta_data->refcount = 1; + count = ta_data->refcount; + list_add(&ta_data->list_node, &ta_list); + } + + return count; +} + +static u32 put_ta_refcount(u32 ta_handle) +{ + struct amdtee_ta_data *ta_data; + u32 count = 0; + + /* Caller must hold a mutex */ + list_for_each_entry(ta_data, &ta_list, list_node) + if (ta_data->ta_handle == ta_handle) { + count = --ta_data->refcount; + if (count == 0) { 
+ list_del(&ta_data->list_node); + kfree(ta_data); + break; + } + } + + return count; +} + +int handle_unload_ta(u32 ta_handle) +{ + struct tee_cmd_unload_ta cmd = {0}; + u32 status, count; + int ret; + + if (!ta_handle) + return -EINVAL; + + mutex_lock(&ta_refcount_mutex); + + count = put_ta_refcount(ta_handle); + + if (count) { + pr_debug("unload ta: not unloading %u count %u\n", + ta_handle, count); + ret = -EBUSY; + goto unlock; + } + + cmd.ta_handle = ta_handle; + + ret = psp_tee_process_cmd(TEE_CMD_ID_UNLOAD_TA, (void *)&cmd, + sizeof(cmd), &status); + if (!ret && status != 0) { + pr_err("unload ta: status = 0x%x\n", status); + ret = -EBUSY; + } else { + pr_debug("unloaded ta handle %u\n", ta_handle); + } + +unlock: + mutex_unlock(&ta_refcount_mutex); + return ret; +} + +int handle_close_session(u32 ta_handle, u32 info) +{ + struct tee_cmd_close_session cmd = {0}; + u32 status; + int ret; + + if (ta_handle == 0) + return -EINVAL; + + cmd.ta_handle = ta_handle; + cmd.session_info = info; + + ret = psp_tee_process_cmd(TEE_CMD_ID_CLOSE_SESSION, (void *)&cmd, + sizeof(cmd), &status); + if (!ret && status != 0) { + pr_err("close session: status = 0x%x\n", status); + ret = -EBUSY; + } + + return ret; +} + +void handle_unmap_shmem(u32 buf_id) +{ + struct tee_cmd_unmap_shared_mem cmd = {0}; + u32 status; + int ret; + + cmd.buf_id = buf_id; + + ret = psp_tee_process_cmd(TEE_CMD_ID_UNMAP_SHARED_MEM, (void *)&cmd, + sizeof(cmd), &status); + if (!ret) + pr_debug("unmap shared memory: buf_id %u status = 0x%x\n", + buf_id, status); +} + +int handle_invoke_cmd(struct tee_ioctl_invoke_arg *arg, u32 sinfo, + struct tee_param *p) +{ + struct tee_cmd_invoke_cmd cmd = {0}; + int ret; + + if (!arg || (!p && arg->num_params)) + return -EINVAL; + + arg->ret_origin = TEEC_ORIGIN_COMMS; + + if (arg->session == 0) { + arg->ret = TEEC_ERROR_BAD_PARAMETERS; + return -EINVAL; + } + + ret = tee_params_to_amd_params(p, arg->num_params, &cmd.op); + if (ret) { + pr_err("invalid Params. 
Abort invoke command\n"); + arg->ret = TEEC_ERROR_BAD_PARAMETERS; + return ret; + } + + cmd.ta_handle = get_ta_handle(arg->session); + cmd.cmd_id = arg->func; + cmd.session_info = sinfo; + + ret = psp_tee_process_cmd(TEE_CMD_ID_INVOKE_CMD, (void *)&cmd, + sizeof(cmd), &arg->ret); + if (ret) { + arg->ret = TEEC_ERROR_COMMUNICATION; + } else { + ret = amd_params_to_tee_params(p, arg->num_params, &cmd.op); + if (unlikely(ret)) { + pr_err("invoke command: failed to copy output\n"); + arg->ret = TEEC_ERROR_GENERIC; + return ret; + } + arg->ret_origin = cmd.return_origin; + pr_debug("invoke command: RO = 0x%x ret = 0x%x\n", + arg->ret_origin, arg->ret); + } + + return ret; +} + +int handle_map_shmem(u32 count, struct shmem_desc *start, u32 *buf_id) +{ + struct tee_cmd_map_shared_mem *cmd; + phys_addr_t paddr; + int ret, i; + u32 status; + + if (!count || !start || !buf_id) + return -EINVAL; + + cmd = kzalloc(sizeof(*cmd), GFP_KERNEL); + if (!cmd) + return -ENOMEM; + + /* Size must be page aligned */ + for (i = 0; i < count ; i++) { + if (!start[i].kaddr || (start[i].size & (PAGE_SIZE - 1))) { + ret = -EINVAL; + goto free_cmd; + } + + if ((u64)start[i].kaddr & (PAGE_SIZE - 1)) { + pr_err("map shared memory: page unaligned. 
addr 0x%llx", + (u64)start[i].kaddr); + ret = -EINVAL; + goto free_cmd; + } + } + + cmd->sg_list.count = count; + + /* Create buffer list */ + for (i = 0; i < count ; i++) { + paddr = __psp_pa(start[i].kaddr); + cmd->sg_list.buf[i].hi_addr = upper_32_bits(paddr); + cmd->sg_list.buf[i].low_addr = lower_32_bits(paddr); + cmd->sg_list.buf[i].size = start[i].size; + cmd->sg_list.size += cmd->sg_list.buf[i].size; + + pr_debug("buf[%d]:hi addr = 0x%x\n", i, + cmd->sg_list.buf[i].hi_addr); + pr_debug("buf[%d]:low addr = 0x%x\n", i, + cmd->sg_list.buf[i].low_addr); + pr_debug("buf[%d]:size = 0x%x\n", i, cmd->sg_list.buf[i].size); + pr_debug("list size = 0x%x\n", cmd->sg_list.size); + } + + *buf_id = 0; + + ret = psp_tee_process_cmd(TEE_CMD_ID_MAP_SHARED_MEM, (void *)cmd, + sizeof(*cmd), &status); + if (!ret && !status) { + *buf_id = cmd->buf_id; + pr_debug("mapped buffer ID = 0x%x\n", *buf_id); + } else { + pr_err("map shared memory: status = 0x%x\n", status); + ret = -ENOMEM; + } + +free_cmd: + kfree(cmd); + + return ret; +} + +int handle_open_session(struct tee_ioctl_open_session_arg *arg, u32 *info, + struct tee_param *p) +{ + struct tee_cmd_open_session cmd = {0}; + int ret; + + if (!arg || !info || (!p && arg->num_params)) + return -EINVAL; + + arg->ret_origin = TEEC_ORIGIN_COMMS; + + if (arg->session == 0) { + arg->ret = TEEC_ERROR_GENERIC; + return -EINVAL; + } + + ret = tee_params_to_amd_params(p, arg->num_params, &cmd.op); + if (ret) { + pr_err("invalid Params. 
Abort open session\n"); + arg->ret = TEEC_ERROR_BAD_PARAMETERS; + return ret; + } + + cmd.ta_handle = get_ta_handle(arg->session); + *info = 0; + + ret = psp_tee_process_cmd(TEE_CMD_ID_OPEN_SESSION, (void *)&cmd, + sizeof(cmd), &arg->ret); + if (ret) { + arg->ret = TEEC_ERROR_COMMUNICATION; + } else { + ret = amd_params_to_tee_params(p, arg->num_params, &cmd.op); + if (unlikely(ret)) { + pr_err("open session: failed to copy output\n"); + arg->ret = TEEC_ERROR_GENERIC; + return ret; + } + arg->ret_origin = cmd.return_origin; + *info = cmd.session_info; + pr_debug("open session: session info = 0x%x\n", *info); + } + + pr_debug("open session: ret = 0x%x RO = 0x%x\n", arg->ret, + arg->ret_origin); + + return ret; +} + +int handle_load_ta(void *data, u32 size, struct tee_ioctl_open_session_arg *arg) +{ + struct tee_cmd_unload_ta unload_cmd = {}; + struct tee_cmd_load_ta load_cmd = {}; + phys_addr_t blob; + int ret; + + if (size == 0 || !data || !arg) + return -EINVAL; + + blob = __psp_pa(data); + if (blob & (PAGE_SIZE - 1)) { + pr_err("load TA: page unaligned. 
blob 0x%llx", blob); + return -EINVAL; + } + + load_cmd.hi_addr = upper_32_bits(blob); + load_cmd.low_addr = lower_32_bits(blob); + load_cmd.size = size; + + mutex_lock(&ta_refcount_mutex); + + ret = psp_tee_process_cmd(TEE_CMD_ID_LOAD_TA, (void *)&load_cmd, + sizeof(load_cmd), &arg->ret); + if (ret) { + arg->ret_origin = TEEC_ORIGIN_COMMS; + arg->ret = TEEC_ERROR_COMMUNICATION; + } else { + arg->ret_origin = load_cmd.return_origin; + + if (arg->ret == TEEC_SUCCESS) { + ret = get_ta_refcount(load_cmd.ta_handle); + if (!ret) { + arg->ret_origin = TEEC_ORIGIN_COMMS; + arg->ret = TEEC_ERROR_OUT_OF_MEMORY; + + /* Unload the TA on error */ + unload_cmd.ta_handle = load_cmd.ta_handle; + psp_tee_process_cmd(TEE_CMD_ID_UNLOAD_TA, + (void *)&unload_cmd, + sizeof(unload_cmd), &ret); + } else { + set_session_id(load_cmd.ta_handle, 0, &arg->session); + } + } + } + mutex_unlock(&ta_refcount_mutex); + + pr_debug("load TA: TA handle = 0x%x, RO = 0x%x, ret = 0x%x\n", + load_cmd.ta_handle, arg->ret_origin, arg->ret); + + return 0; +} diff --git a/drivers/tee/amdtee/core.c b/drivers/tee/amdtee/core.c new file mode 100644 index 0000000000000000000000000000000000000000..c969eb13885e138768a52d8c951a5198ae437821 --- /dev/null +++ b/drivers/tee/amdtee/core.c @@ -0,0 +1,528 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright 2019 Advanced Micro Devices, Inc. 
+ */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "amdtee_private.h" +#include "../tee_private.h" + +static struct amdtee_driver_data *drv_data; +static DEFINE_MUTEX(session_list_mutex); + +static void amdtee_get_version(struct tee_device *teedev, + struct tee_ioctl_version_data *vers) +{ + struct tee_ioctl_version_data v = { + .impl_id = TEE_IMPL_ID_AMDTEE, + .impl_caps = 0, + .gen_caps = TEE_GEN_CAP_GP, + }; + *vers = v; +} + +static int amdtee_open(struct tee_context *ctx) +{ + struct amdtee_context_data *ctxdata; + + ctxdata = kzalloc(sizeof(*ctxdata), GFP_KERNEL); + if (!ctxdata) + return -ENOMEM; + + INIT_LIST_HEAD(&ctxdata->sess_list); + INIT_LIST_HEAD(&ctxdata->shm_list); + mutex_init(&ctxdata->shm_mutex); + + ctx->data = ctxdata; + return 0; +} + +static void release_session(struct amdtee_session *sess) +{ + int i; + + /* Close any open session */ + for (i = 0; i < TEE_NUM_SESSIONS; ++i) { + /* Check if session entry 'i' is valid */ + if (!test_bit(i, sess->sess_mask)) + continue; + + handle_close_session(sess->ta_handle, sess->session_info[i]); + handle_unload_ta(sess->ta_handle); + } + + kfree(sess); +} + +static void amdtee_release(struct tee_context *ctx) +{ + struct amdtee_context_data *ctxdata = ctx->data; + + if (!ctxdata) + return; + + while (true) { + struct amdtee_session *sess; + + sess = list_first_entry_or_null(&ctxdata->sess_list, + struct amdtee_session, + list_node); + + if (!sess) + break; + + list_del(&sess->list_node); + release_session(sess); + } + mutex_destroy(&ctxdata->shm_mutex); + kfree(ctxdata); + + ctx->data = NULL; +} + +/** + * alloc_session() - Allocate a session structure + * @ctxdata: TEE Context data structure + * @session: Session ID for which 'struct amdtee_session' structure is to be + * allocated. + * + * Scans the TEE context's session list to check if TA is already loaded in to + * TEE. If yes, returns the 'session' structure for that TA. 
Else allocates, + * initializes a new 'session' structure and adds it to context's session list. + * + * The caller must hold a mutex. + * + * Returns: + * 'struct amdtee_session *' on success and NULL on failure. + */ +static struct amdtee_session *alloc_session(struct amdtee_context_data *ctxdata, + u32 session) +{ + struct amdtee_session *sess; + u32 ta_handle = get_ta_handle(session); + + /* Scan session list to check if TA is already loaded in to TEE */ + list_for_each_entry(sess, &ctxdata->sess_list, list_node) + if (sess->ta_handle == ta_handle) { + kref_get(&sess->refcount); + return sess; + } + + /* Allocate a new session and add to list */ + sess = kzalloc(sizeof(*sess), GFP_KERNEL); + if (sess) { + sess->ta_handle = ta_handle; + kref_init(&sess->refcount); + spin_lock_init(&sess->lock); + list_add(&sess->list_node, &ctxdata->sess_list); + } + + return sess; +} + +/* Requires mutex to be held */ +static struct amdtee_session *find_session(struct amdtee_context_data *ctxdata, + u32 session) +{ + u32 ta_handle = get_ta_handle(session); + u32 index = get_session_index(session); + struct amdtee_session *sess; + + if (index >= TEE_NUM_SESSIONS) + return NULL; + + list_for_each_entry(sess, &ctxdata->sess_list, list_node) + if (ta_handle == sess->ta_handle && + test_bit(index, sess->sess_mask)) + return sess; + + return NULL; +} + +u32 get_buffer_id(struct tee_shm *shm) +{ + struct amdtee_context_data *ctxdata = shm->ctx->data; + struct amdtee_shm_data *shmdata; + u32 buf_id = 0; + + mutex_lock(&ctxdata->shm_mutex); + list_for_each_entry(shmdata, &ctxdata->shm_list, shm_node) + if (shmdata->kaddr == shm->kaddr) { + buf_id = shmdata->buf_id; + break; + } + mutex_unlock(&ctxdata->shm_mutex); + + return buf_id; +} + +static DEFINE_MUTEX(drv_mutex); +static int copy_ta_binary(struct tee_context *ctx, void *ptr, void **ta, + size_t *ta_size) +{ + const struct firmware *fw; + char fw_name[TA_PATH_MAX]; + struct { + u32 lo; + u16 mid; + u16 hi_ver; + u8 seq_n[8]; + } 
*uuid = ptr; + int n, rc = 0; + + n = snprintf(fw_name, TA_PATH_MAX, + "%s/%08x-%04x-%04x-%02x%02x%02x%02x%02x%02x%02x%02x.bin", + TA_LOAD_PATH, uuid->lo, uuid->mid, uuid->hi_ver, + uuid->seq_n[0], uuid->seq_n[1], + uuid->seq_n[2], uuid->seq_n[3], + uuid->seq_n[4], uuid->seq_n[5], + uuid->seq_n[6], uuid->seq_n[7]); + if (n < 0 || n >= TA_PATH_MAX) { + pr_err("failed to get firmware name\n"); + return -EINVAL; + } + + mutex_lock(&drv_mutex); + n = request_firmware(&fw, fw_name, &ctx->teedev->dev); + if (n) { + pr_err("failed to load firmware %s\n", fw_name); + rc = -ENOMEM; + goto unlock; + } + + *ta_size = roundup(fw->size, PAGE_SIZE); + *ta = (void *)__get_free_pages(GFP_KERNEL, get_order(*ta_size)); + if (!*ta) { + pr_err("%s: get_free_pages failed\n", __func__); + rc = -ENOMEM; + goto rel_fw; + } + + memcpy(*ta, fw->data, fw->size); +rel_fw: + release_firmware(fw); +unlock: + mutex_unlock(&drv_mutex); + return rc; +} + +/* mutex must be held by caller */ +static void destroy_session(struct kref *ref) +{ + struct amdtee_session *sess = container_of(ref, struct amdtee_session, + refcount); + + list_del(&sess->list_node); + mutex_unlock(&session_list_mutex); + kfree(sess); +} + +int amdtee_open_session(struct tee_context *ctx, + struct tee_ioctl_open_session_arg *arg, + struct tee_param *param) +{ + struct amdtee_context_data *ctxdata = ctx->data; + struct amdtee_session *sess = NULL; + u32 session_info, ta_handle; + size_t ta_size; + int rc, i; + void *ta; + + if (arg->clnt_login != TEE_IOCTL_LOGIN_PUBLIC) { + pr_err("unsupported client login method\n"); + return -EINVAL; + } + + rc = copy_ta_binary(ctx, &arg->uuid[0], &ta, &ta_size); + if (rc) { + pr_err("failed to copy TA binary\n"); + return rc; + } + + /* Load the TA binary into TEE environment */ + handle_load_ta(ta, ta_size, arg); + if (arg->ret != TEEC_SUCCESS) + goto out; + + ta_handle = get_ta_handle(arg->session); + + mutex_lock(&session_list_mutex); + sess = alloc_session(ctxdata, arg->session); + 
mutex_unlock(&session_list_mutex); + + if (!sess) { + handle_unload_ta(ta_handle); + rc = -ENOMEM; + goto out; + } + + /* Open session with loaded TA */ + handle_open_session(arg, &session_info, param); + if (arg->ret != TEEC_SUCCESS) { + pr_err("open_session failed %d\n", arg->ret); + handle_unload_ta(ta_handle); + kref_put_mutex(&sess->refcount, destroy_session, + &session_list_mutex); + goto out; + } + + /* Find an empty session index for the given TA */ + spin_lock(&sess->lock); + i = find_first_zero_bit(sess->sess_mask, TEE_NUM_SESSIONS); + if (i < TEE_NUM_SESSIONS) { + sess->session_info[i] = session_info; + set_session_id(ta_handle, i, &arg->session); + set_bit(i, sess->sess_mask); + } + spin_unlock(&sess->lock); + + if (i >= TEE_NUM_SESSIONS) { + pr_err("reached maximum session count %d\n", TEE_NUM_SESSIONS); + handle_close_session(ta_handle, session_info); + handle_unload_ta(ta_handle); + kref_put_mutex(&sess->refcount, destroy_session, + &session_list_mutex); + rc = -ENOMEM; + goto out; + } + +out: + free_pages((u64)ta, get_order(ta_size)); + return rc; +} + +int amdtee_close_session(struct tee_context *ctx, u32 session) +{ + struct amdtee_context_data *ctxdata = ctx->data; + u32 i, ta_handle, session_info; + struct amdtee_session *sess; + + pr_debug("%s: sid = 0x%x\n", __func__, session); + + /* + * Check that the session is valid and clear the session + * usage bit + */ + mutex_lock(&session_list_mutex); + sess = find_session(ctxdata, session); + if (sess) { + ta_handle = get_ta_handle(session); + i = get_session_index(session); + session_info = sess->session_info[i]; + spin_lock(&sess->lock); + clear_bit(i, sess->sess_mask); + spin_unlock(&sess->lock); + } + mutex_unlock(&session_list_mutex); + + if (!sess) + return -EINVAL; + + /* Close the session */ + handle_close_session(ta_handle, session_info); + handle_unload_ta(ta_handle); + + kref_put_mutex(&sess->refcount, destroy_session, &session_list_mutex); + + return 0; +} + +int amdtee_map_shmem(struct 
tee_shm *shm) +{ + struct amdtee_context_data *ctxdata; + struct amdtee_shm_data *shmnode; + struct shmem_desc shmem; + int rc, count; + u32 buf_id; + + if (!shm) + return -EINVAL; + + shmnode = kmalloc(sizeof(*shmnode), GFP_KERNEL); + if (!shmnode) + return -ENOMEM; + + count = 1; + shmem.kaddr = shm->kaddr; + shmem.size = shm->size; + + /* + * Send a MAP command to TEE and get the corresponding + * buffer Id + */ + rc = handle_map_shmem(count, &shmem, &buf_id); + if (rc) { + pr_err("map_shmem failed: ret = %d\n", rc); + kfree(shmnode); + return rc; + } + + shmnode->kaddr = shm->kaddr; + shmnode->buf_id = buf_id; + ctxdata = shm->ctx->data; + mutex_lock(&ctxdata->shm_mutex); + list_add(&shmnode->shm_node, &ctxdata->shm_list); + mutex_unlock(&ctxdata->shm_mutex); + + pr_debug("buf_id :[%x] kaddr[%p]\n", shmnode->buf_id, shmnode->kaddr); + + return 0; +} + +void amdtee_unmap_shmem(struct tee_shm *shm) +{ + struct amdtee_context_data *ctxdata; + struct amdtee_shm_data *shmnode; + u32 buf_id; + + if (!shm) + return; + + buf_id = get_buffer_id(shm); + /* Unmap the shared memory from TEE */ + handle_unmap_shmem(buf_id); + + ctxdata = shm->ctx->data; + mutex_lock(&ctxdata->shm_mutex); + list_for_each_entry(shmnode, &ctxdata->shm_list, shm_node) + if (buf_id == shmnode->buf_id) { + list_del(&shmnode->shm_node); + kfree(shmnode); + break; + } + mutex_unlock(&ctxdata->shm_mutex); +} + +int amdtee_invoke_func(struct tee_context *ctx, + struct tee_ioctl_invoke_arg *arg, + struct tee_param *param) +{ + struct amdtee_context_data *ctxdata = ctx->data; + struct amdtee_session *sess; + u32 i, session_info; + + /* Check that the session is valid */ + mutex_lock(&session_list_mutex); + sess = find_session(ctxdata, arg->session); + if (sess) { + i = get_session_index(arg->session); + session_info = sess->session_info[i]; + } + mutex_unlock(&session_list_mutex); + + if (!sess) + return -EINVAL; + + handle_invoke_cmd(arg, session_info, param); + + return 0; +} + +int 
amdtee_cancel_req(struct tee_context *ctx, u32 cancel_id, u32 session) +{ + return -EINVAL; +} + +static const struct tee_driver_ops amdtee_ops = { + .get_version = amdtee_get_version, + .open = amdtee_open, + .release = amdtee_release, + .open_session = amdtee_open_session, + .close_session = amdtee_close_session, + .invoke_func = amdtee_invoke_func, + .cancel_req = amdtee_cancel_req, +}; + +static const struct tee_desc amdtee_desc = { + .name = DRIVER_NAME "-clnt", + .ops = &amdtee_ops, + .owner = THIS_MODULE, +}; + +static int __init amdtee_driver_init(void) +{ + struct tee_device *teedev; + struct tee_shm_pool *pool; + struct amdtee *amdtee; + int rc; + + drv_data = kzalloc(sizeof(*drv_data), GFP_KERNEL); + if (!drv_data) + return -ENOMEM; + + amdtee = kzalloc(sizeof(*amdtee), GFP_KERNEL); + if (!amdtee) { + rc = -ENOMEM; + goto err_kfree_drv_data; + } + + pool = amdtee_config_shm(); + if (IS_ERR(pool)) { + pr_err("shared pool configuration error\n"); + rc = PTR_ERR(pool); + goto err_kfree_amdtee; + } + + teedev = tee_device_alloc(&amdtee_desc, NULL, pool, amdtee); + if (IS_ERR(teedev)) { + rc = PTR_ERR(teedev); + goto err; + } + amdtee->teedev = teedev; + + rc = tee_device_register(amdtee->teedev); + if (rc) + goto err; + + amdtee->pool = pool; + + drv_data->amdtee = amdtee; + + pr_info("amd-tee driver initialization successful\n"); + return 0; + +err: + tee_device_unregister(amdtee->teedev); + if (pool) + tee_shm_pool_free(pool); + +err_kfree_amdtee: + kfree(amdtee); + +err_kfree_drv_data: + kfree(drv_data); + drv_data = NULL; + + pr_err("amd-tee driver initialization failed\n"); + return rc; +} +module_init(amdtee_driver_init); + +static void __exit amdtee_driver_exit(void) +{ + struct amdtee *amdtee; + + if (!drv_data || !drv_data->amdtee) + return; + + amdtee = drv_data->amdtee; + + tee_device_unregister(amdtee->teedev); + tee_shm_pool_free(amdtee->pool); +} +module_exit(amdtee_driver_exit); + +MODULE_AUTHOR(DRIVER_AUTHOR); +MODULE_DESCRIPTION("AMD-TEE 
driver"); +MODULE_VERSION("1.0"); +MODULE_LICENSE("Dual MIT/GPL"); diff --git a/drivers/tee/amdtee/shm_pool.c b/drivers/tee/amdtee/shm_pool.c new file mode 100644 index 0000000000000000000000000000000000000000..065854e2db18644acfb2fd90894c5b977a88be05 --- /dev/null +++ b/drivers/tee/amdtee/shm_pool.c @@ -0,0 +1,93 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright 2019 Advanced Micro Devices, Inc. + */ + +#include +#include +#include +#include "amdtee_private.h" + +static int pool_op_alloc(struct tee_shm_pool_mgr *poolm, struct tee_shm *shm, + size_t size) +{ + unsigned int order = get_order(size); + unsigned long va; + int rc; + + va = __get_free_pages(GFP_KERNEL | __GFP_ZERO, order); + if (!va) + return -ENOMEM; + + shm->kaddr = (void *)va; + shm->paddr = __psp_pa((void *)va); + shm->size = PAGE_SIZE << order; + + /* Map the allocated memory in to TEE */ + rc = amdtee_map_shmem(shm); + if (rc) { + free_pages(va, order); + shm->kaddr = NULL; + return rc; + } + + return 0; +} + +static void pool_op_free(struct tee_shm_pool_mgr *poolm, struct tee_shm *shm) +{ + /* Unmap the shared memory from TEE */ + amdtee_unmap_shmem(shm); + free_pages((unsigned long)shm->kaddr, get_order(shm->size)); + shm->kaddr = NULL; +} + +static void pool_op_destroy_poolmgr(struct tee_shm_pool_mgr *poolm) +{ + kfree(poolm); +} + +static const struct tee_shm_pool_mgr_ops pool_ops = { + .alloc = pool_op_alloc, + .free = pool_op_free, + .destroy_poolmgr = pool_op_destroy_poolmgr, +}; + +static struct tee_shm_pool_mgr *pool_mem_mgr_alloc(void) +{ + struct tee_shm_pool_mgr *mgr = kzalloc(sizeof(*mgr), GFP_KERNEL); + + if (!mgr) + return ERR_PTR(-ENOMEM); + + mgr->ops = &pool_ops; + + return mgr; +} + +struct tee_shm_pool *amdtee_config_shm(void) +{ + struct tee_shm_pool_mgr *priv_mgr; + struct tee_shm_pool_mgr *dmabuf_mgr; + void *rc; + + rc = pool_mem_mgr_alloc(); + if (IS_ERR(rc)) + return rc; + priv_mgr = rc; + + rc = pool_mem_mgr_alloc(); + if (IS_ERR(rc)) { + 
tee_shm_pool_mgr_destroy(priv_mgr); + return rc; + } + dmabuf_mgr = rc; + + rc = tee_shm_pool_alloc(priv_mgr, dmabuf_mgr); + if (IS_ERR(rc)) { + tee_shm_pool_mgr_destroy(priv_mgr); + tee_shm_pool_mgr_destroy(dmabuf_mgr); + } + + return rc; +} diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h index 2a87de3c0b3e716c5c0795472a41d3ac207acb1a..e0c6235ef0be2fc4615bbd5d54c1764ec3da6f79 100644 --- a/include/linux/pci_ids.h +++ b/include/linux/pci_ids.h @@ -556,6 +556,8 @@ #define PCI_DEVICE_ID_AMD_17H_M70H_DF_F3 0x1443 #define PCI_DEVICE_ID_AMD_19H_DF_F3 0x1653 #define PCI_DEVICE_ID_AMD_19H_M10H_DF_F3 0x14b0 +#define PCI_DEVICE_ID_AMD_1AH_M00H_DF_F3 0x12c3 +#define PCI_DEVICE_ID_AMD_1AH_M20H_DF_F3 0x16fb #define PCI_DEVICE_ID_AMD_CNB17H_F3 0x1703 #define PCI_DEVICE_ID_AMD_LANCE 0x2000 #define PCI_DEVICE_ID_AMD_LANCE_HOME 0x2001 diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 07fc4b3e3ce49e3bcb5bc898b20489d4972edac0..7f6f174f481d06e56b792c8e087a51b6c26be371 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -92,14 +92,26 @@ struct perf_raw_record { /* * branch stack layout: * nr: number of taken branches stored in entries[] + * hw_idx: The low level index of raw branch records + * for the most recent branch. + * -1ULL means invalid/unknown. * * Note that nr can vary from sample to sample * branches (to, from) are stored from most recent * to least recent, i.e., entries[0] contains the most * recent branch. + * The entries[] is an abstraction of raw branch records, + * which may not be stored in age order in HW, e.g. Intel LBR. + * The hw_idx is to expose the low level index of raw + * branch record for the most recent branch aka entries[0]. + * The hw_idx index is between -1 (unknown) and max depth, + * which can be retrieved in /sys/devices/cpu/caps/branches. + * For the architectures whose raw branch records are + * already stored in age order, the hw_idx should be 0. 
*/ struct perf_branch_stack { __u64 nr; + __u64 hw_idx; struct perf_branch_entry entries[0]; }; @@ -115,6 +127,15 @@ struct hw_perf_event_extra { int idx; /* index in shared_regs->regs[] */ }; +/** + * hw_perf_event::flag values + * + * PERF_EVENT_FLAG_ARCH bits are reserved for architecture-specific + * usage. + */ +#define PERF_EVENT_FLAG_ARCH 0x0000ffff +#define PERF_EVENT_FLAG_USER_READ_CNT 0x80000000 + /** * struct hw_perf_event - performance event hardware details: */ @@ -257,6 +278,8 @@ struct perf_event; #define PERF_PMU_CAP_NO_EXCLUDE 0x80 #define PERF_PMU_CAP_AUX_OUTPUT 0x100 +struct perf_output_handle; + /** * struct pmu - generic performance monitoring unit */ @@ -418,6 +441,11 @@ struct pmu { */ size_t task_ctx_size; + /* + * Kmem cache of PMU specific data + */ + struct kmem_cache *task_ctx_cache; + /* * Set up pmu-private data structures for an AUX area @@ -431,6 +459,19 @@ struct pmu { */ void (*free_aux) (void *aux); /* optional */ + /* + * Take a snapshot of the AUX buffer without touching the event + * state, so that preempting ->start()/->stop() callbacks does + * not interfere with their logic. Called in PMI context. + * + * Returns the size of AUX data copied to the output handle. + * + * Optional. 
+ */ + long (*snapshot_aux) (struct perf_event *event, + struct perf_output_handle *handle, + unsigned long size); + /* * Validate address range filters: make sure the HW supports the * requested configuration and number of filters; return 0 if the @@ -956,7 +997,7 @@ struct perf_sample_data { struct perf_raw_record *raw; struct perf_branch_stack *br_stack; u64 period; - u64 weight; + union perf_sample_weight weight; u64 txn; union perf_mem_data_src data_src; @@ -978,6 +1019,7 @@ struct perf_sample_data { u32 reserved; } cpu_entry; struct perf_callchain_entry *callchain; + u64 aux_size; /* * regs_user may point to task_pt_regs or to regs_user_copy, depending @@ -990,6 +1032,9 @@ struct perf_sample_data { u64 stack_user_size; u64 phys_addr; + u64 cgroup; + u64 data_page_size; + u64 code_page_size; } ____cacheline_aligned; /* default value for data source */ @@ -1007,11 +1052,27 @@ static inline void perf_sample_data_init(struct perf_sample_data *data, data->raw = NULL; data->br_stack = NULL; data->period = period; - data->weight = 0; + data->weight.full = 0; data->data_src.val = PERF_MEM_NA; data->txn = 0; } +/* + * Clear all bitfields in the perf_branch_entry. + * The to and from fields are not cleared because they are + * systematically modified by caller. 
+ */ +static inline void perf_clear_branch_entry_bitfields(struct perf_branch_entry *br) +{ + br->mispred = 0; + br->predicted = 0; + br->in_tx = 0; + br->abort = 0; + br->cycles = 0; + br->type = 0; + br->reserved = 0; +} + extern void perf_output_sample(struct perf_output_handle *handle, struct perf_event_header *header, struct perf_sample_data *data, @@ -1358,6 +1419,9 @@ extern unsigned int perf_output_copy(struct perf_output_handle *handle, const void *buf, unsigned int len); extern unsigned int perf_output_skip(struct perf_output_handle *handle, unsigned int len); +extern long perf_output_copy_aux(struct perf_output_handle *aux_handle, + struct perf_output_handle *handle, + unsigned long from, unsigned long to); extern int perf_swevent_get_recursion_context(void); extern void perf_swevent_put_recursion_context(int rctx); extern u64 perf_swevent_set_period(struct perf_event *event); diff --git a/include/linux/psp-tee.h b/include/linux/psp-tee.h new file mode 100644 index 0000000000000000000000000000000000000000..63bb2212fce011f64ddc89dbae3e74c0ac086566 --- /dev/null +++ b/include/linux/psp-tee.h @@ -0,0 +1,73 @@ +/* SPDX-License-Identifier: MIT */ +/* + * AMD Trusted Execution Environment (TEE) interface + * + * Author: Rijo Thomas + * + * Copyright 2019 Advanced Micro Devices, Inc. + * + */ + +#ifndef __PSP_TEE_H_ +#define __PSP_TEE_H_ + +#include +#include + +/* This file defines the Trusted Execution Environment (TEE) interface commands + * and the API exported by AMD Secure Processor driver to communicate with + * AMD-TEE Trusted OS. 
+ */ + +/** + * enum tee_cmd_id - TEE Interface Command IDs + * @TEE_CMD_ID_LOAD_TA: Load Trusted Application (TA) binary into + * TEE environment + * @TEE_CMD_ID_UNLOAD_TA: Unload TA binary from TEE environment + * @TEE_CMD_ID_OPEN_SESSION: Open session with loaded TA + * @TEE_CMD_ID_CLOSE_SESSION: Close session with loaded TA + * @TEE_CMD_ID_INVOKE_CMD: Invoke a command with loaded TA + * @TEE_CMD_ID_MAP_SHARED_MEM: Map shared memory + * @TEE_CMD_ID_UNMAP_SHARED_MEM: Unmap shared memory + */ +enum tee_cmd_id { + TEE_CMD_ID_LOAD_TA = 1, + TEE_CMD_ID_UNLOAD_TA, + TEE_CMD_ID_OPEN_SESSION, + TEE_CMD_ID_CLOSE_SESSION, + TEE_CMD_ID_INVOKE_CMD, + TEE_CMD_ID_MAP_SHARED_MEM, + TEE_CMD_ID_UNMAP_SHARED_MEM, +}; + +#ifdef CONFIG_CRYPTO_DEV_SP_PSP +/** + * psp_tee_process_cmd() - Process command in Trusted Execution Environment + * @cmd_id: TEE command ID (&enum tee_cmd_id) + * @buf: Command buffer for TEE processing. On success, is updated + * with the response + * @len: Length of command buffer in bytes + * @status: On success, holds the TEE command execution status + * + * This function submits a command to the Trusted OS for processing in the + * TEE environment and waits for a response or until the command times out. 
+ * + * Returns: + * 0 if TEE successfully processed the command + * -%ENODEV if PSP device not available + * -%EINVAL if invalid input + * -%ETIMEDOUT if TEE command timed out + * -%EBUSY if PSP device is not responsive + */ +int psp_tee_process_cmd(enum tee_cmd_id cmd_id, void *buf, size_t len, + u32 *status); + +#else /* !CONFIG_CRYPTO_DEV_SP_PSP */ + +static inline int psp_tee_process_cmd(enum tee_cmd_id cmd_id, void *buf, + size_t len, u32 *status) +{ + return -ENODEV; +} +#endif /* CONFIG_CRYPTO_DEV_SP_PSP */ +#endif /* __PSP_TEE_H_ */ diff --git a/include/linux/static_call.h b/include/linux/static_call.h new file mode 100644 index 0000000000000000000000000000000000000000..d8892dff2e91f80c4aa83933742f6ea6b32c45cf --- /dev/null +++ b/include/linux/static_call.h @@ -0,0 +1,156 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_STATIC_CALL_H +#define _LINUX_STATIC_CALL_H + +/* + * Static call support + * + * Static calls use code patching to hard-code function pointers into direct + * branch instructions. They give the flexibility of function pointers, but + * with improved performance. This is especially important for cases where + * retpolines would otherwise be used, as retpolines can significantly impact + * performance. 
+ * + * + * API overview: + * + * DECLARE_STATIC_CALL(name, func); + * DEFINE_STATIC_CALL(name, func); + * static_call(name)(args...); + * static_call_update(name, func); + * + * Usage example: + * + * # Start with the following functions (with identical prototypes): + * int func_a(int arg1, int arg2); + * int func_b(int arg1, int arg2); + * + * # Define a 'my_name' reference, associated with func_a() by default + * DEFINE_STATIC_CALL(my_name, func_a); + * + * # Call func_a() + * static_call(my_name)(arg1, arg2); + * + * # Update 'my_name' to point to func_b() + * static_call_update(my_name, &func_b); + * + * # Call func_b() + * static_call(my_name)(arg1, arg2); + * + * + * Implementation details: + * + * This requires some arch-specific code (CONFIG_HAVE_STATIC_CALL). + * Otherwise basic indirect calls are used (with function pointers). + * + * Each static_call() site calls into a trampoline associated with the name. + * The trampoline has a direct branch to the default function. Updates to a + * name will modify the trampoline's branch destination. + * + * If the arch has CONFIG_HAVE_STATIC_CALL_INLINE, then the call sites + * themselves will be patched at runtime to call the functions directly, + * rather than calling through the trampoline. This requires objtool or a + * compiler plugin to detect all the static_call() sites and annotate them + * in the .static_call_sites section. + */ + +#include +#include +#include + +#ifdef CONFIG_HAVE_STATIC_CALL +#include + +/* + * Either @site or @tramp can be NULL. + */ +extern void arch_static_call_transform(void *site, void *tramp, void *func); + +#define STATIC_CALL_TRAMP_ADDR(name) &STATIC_CALL_TRAMP(name) + +/* + * __ADDRESSABLE() is used to ensure the key symbol doesn't get stripped from + * the symbol table so that objtool can reference it when it generates the + * .static_call_sites section. 
+ */ +#define __static_call(name) \ +({ \ + __ADDRESSABLE(STATIC_CALL_KEY(name)); \ + &STATIC_CALL_TRAMP(name); \ +}) + +#else +#define STATIC_CALL_TRAMP_ADDR(name) NULL +#endif + + +#define DECLARE_STATIC_CALL(name, func) \ + extern struct static_call_key STATIC_CALL_KEY(name); \ + extern typeof(func) STATIC_CALL_TRAMP(name); + +#define static_call_update(name, func) \ +({ \ + BUILD_BUG_ON(!__same_type(*(func), STATIC_CALL_TRAMP(name))); \ + __static_call_update(&STATIC_CALL_KEY(name), \ + STATIC_CALL_TRAMP_ADDR(name), func); \ +}) + +#if defined(CONFIG_HAVE_STATIC_CALL) + +struct static_call_key { + void *func; +}; + +#define DEFINE_STATIC_CALL(name, _func) \ + DECLARE_STATIC_CALL(name, _func); \ + struct static_call_key STATIC_CALL_KEY(name) = { \ + .func = _func, \ + }; \ + ARCH_DEFINE_STATIC_CALL_TRAMP(name, _func) + +#define static_call(name) __static_call(name) + +static inline +void __static_call_update(struct static_call_key *key, void *tramp, void *func) +{ + cpus_read_lock(); + WRITE_ONCE(key->func, func); + arch_static_call_transform(NULL, tramp, func); + cpus_read_unlock(); +} + +#define EXPORT_STATIC_CALL(name) \ + EXPORT_SYMBOL(STATIC_CALL_KEY(name)); \ + EXPORT_SYMBOL(STATIC_CALL_TRAMP(name)) + +#define EXPORT_STATIC_CALL_GPL(name) \ + EXPORT_SYMBOL_GPL(STATIC_CALL_KEY(name)); \ + EXPORT_SYMBOL_GPL(STATIC_CALL_TRAMP(name)) + +#else /* Generic implementation */ + +struct static_call_key { + void *func; +}; + +#define DEFINE_STATIC_CALL(name, _func) \ + DECLARE_STATIC_CALL(name, _func); \ + struct static_call_key STATIC_CALL_KEY(name) = { \ + .func = _func, \ + } + +#define static_call(name) \ + ((typeof(STATIC_CALL_TRAMP(name))*)(STATIC_CALL_KEY(name).func)) + +static inline +void __static_call_update(struct static_call_key *key, void *tramp, void *func) +{ + WRITE_ONCE(key->func, func); +} + +#define EXPORT_STATIC_CALL(name) EXPORT_SYMBOL(STATIC_CALL_KEY(name)) +#define EXPORT_STATIC_CALL_GPL(name) EXPORT_SYMBOL_GPL(STATIC_CALL_KEY(name)) + +#endif 
/* CONFIG_HAVE_STATIC_CALL */ + +#endif /* _LINUX_STATIC_CALL_H */ diff --git a/include/linux/static_call_types.h b/include/linux/static_call_types.h new file mode 100644 index 0000000000000000000000000000000000000000..5ed249dc47d304001c8ec705175bdd723a02002b --- /dev/null +++ b/include/linux/static_call_types.h @@ -0,0 +1,15 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _STATIC_CALL_TYPES_H +#define _STATIC_CALL_TYPES_H + +#include + +#define STATIC_CALL_KEY_PREFIX __SCK__ +#define STATIC_CALL_KEY(name) __PASTE(STATIC_CALL_KEY_PREFIX, name) + +#define STATIC_CALL_TRAMP_PREFIX __SCT__ +#define STATIC_CALL_TRAMP_PREFIX_STR __stringify(STATIC_CALL_TRAMP_PREFIX) +#define STATIC_CALL_TRAMP(name) __PASTE(STATIC_CALL_TRAMP_PREFIX, name) +#define STATIC_CALL_TRAMP_STR(name) __stringify(STATIC_CALL_TRAMP(name)) + +#endif /* _STATIC_CALL_TYPES_H */ diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h index ceccd980ffcfe44c5204920f4ad73216bd20ae3c..81c8bfa18e90240405b860806e6f30c3fd3e3827 100644 --- a/include/uapi/linux/perf_event.h +++ b/include/uapi/linux/perf_event.h @@ -141,12 +141,18 @@ enum perf_event_sample_format { PERF_SAMPLE_TRANSACTION = 1U << 17, PERF_SAMPLE_REGS_INTR = 1U << 18, PERF_SAMPLE_PHYS_ADDR = 1U << 19, + PERF_SAMPLE_AUX = 1U << 20, + PERF_SAMPLE_CGROUP = 1U << 21, + PERF_SAMPLE_DATA_PAGE_SIZE = 1U << 22, + PERF_SAMPLE_CODE_PAGE_SIZE = 1U << 23, + PERF_SAMPLE_WEIGHT_STRUCT = 1U << 24, - PERF_SAMPLE_MAX = 1U << 20, /* non-ABI */ + PERF_SAMPLE_MAX = 1U << 25, /* non-ABI */ __PERF_SAMPLE_CALLCHAIN_EARLY = 1ULL << 63, /* non-ABI; internal use */ }; +#define PERF_SAMPLE_WEIGHT_TYPE (PERF_SAMPLE_WEIGHT | PERF_SAMPLE_WEIGHT_STRUCT) /* * values to program into branch_sample_type when PERF_SAMPLE_BRANCH is set * @@ -180,6 +186,8 @@ enum perf_branch_sample_type_shift { PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT = 16, /* save branch type */ + PERF_SAMPLE_BRANCH_HW_INDEX_SHIFT = 17, /* save low level index of raw branch records */ + 
PERF_SAMPLE_BRANCH_MAX_SHIFT /* non-ABI */ }; @@ -207,6 +215,8 @@ enum perf_branch_sample_type { PERF_SAMPLE_BRANCH_TYPE_SAVE = 1U << PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT, + PERF_SAMPLE_BRANCH_HW_INDEX = 1U << PERF_SAMPLE_BRANCH_HW_INDEX_SHIFT, + PERF_SAMPLE_BRANCH_MAX = 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT, }; @@ -300,6 +310,7 @@ enum perf_event_read_format { /* add: sample_stack_user */ #define PERF_ATTR_SIZE_VER4 104 /* add: sample_regs_intr */ #define PERF_ATTR_SIZE_VER5 112 /* add: aux_watermark */ +#define PERF_ATTR_SIZE_VER6 120 /* add: aux_sample_size */ /* * Hardware event_id to monitor via a performance monitoring event: @@ -375,7 +386,8 @@ struct perf_event_attr { ksymbol : 1, /* include ksymbol events */ bpf_event : 1, /* include bpf events */ aux_output : 1, /* generate AUX records instead of events */ - __reserved_1 : 32; + cgroup : 1, /* include cgroup events */ + __reserved_1 : 31; union { __u32 wakeup_events; /* wakeup every n events */ @@ -424,7 +436,9 @@ struct perf_event_attr { */ __u32 aux_watermark; __u16 sample_max_stack; - __u16 __reserved_2; /* align to __u64 */ + __u16 __reserved_2; + __u32 aux_sample_size; + __u32 __reserved_3; }; /* @@ -849,7 +863,9 @@ enum perf_event_type { * char data[size];}&& PERF_SAMPLE_RAW * * { u64 nr; - * { u64 from, to, flags } lbr[nr];} && PERF_SAMPLE_BRANCH_STACK + * { u64 hw_idx; } && PERF_SAMPLE_BRANCH_HW_INDEX + * { u64 from, to, flags } lbr[nr]; + * } && PERF_SAMPLE_BRANCH_STACK * * { u64 abi; # enum perf_sample_regs_abi * u64 regs[weight(mask)]; } && PERF_SAMPLE_REGS_USER @@ -858,12 +874,33 @@ enum perf_event_type { * char data[size]; * u64 dyn_size; } && PERF_SAMPLE_STACK_USER * - * { u64 weight; } && PERF_SAMPLE_WEIGHT + * { union perf_sample_weight + * { + * u64 full; && PERF_SAMPLE_WEIGHT + * #if defined(__LITTLE_ENDIAN_BITFIELD) + * struct { + * u32 var1_dw; + * u16 var2_w; + * u16 var3_w; + * } && PERF_SAMPLE_WEIGHT_STRUCT + * #elif defined(__BIG_ENDIAN_BITFIELD) + * struct { + * u16 var3_w; + * u16 
var2_w; + * u32 var1_dw; + * } && PERF_SAMPLE_WEIGHT_STRUCT + * #endif + * } + * } * { u64 data_src; } && PERF_SAMPLE_DATA_SRC * { u64 transaction; } && PERF_SAMPLE_TRANSACTION * { u64 abi; # enum perf_sample_regs_abi * u64 regs[weight(mask)]; } && PERF_SAMPLE_REGS_INTR * { u64 phys_addr;} && PERF_SAMPLE_PHYS_ADDR + * { u64 size; + * char data[size]; } && PERF_SAMPLE_AUX + * { u64 data_page_size;} && PERF_SAMPLE_DATA_PAGE_SIZE + * { u64 code_page_size;} && PERF_SAMPLE_CODE_PAGE_SIZE * }; */ PERF_RECORD_SAMPLE = 9, @@ -1000,6 +1037,16 @@ enum perf_event_type { */ PERF_RECORD_BPF_EVENT = 18, + /* + * struct { + * struct perf_event_header header; + * u64 id; + * char path[]; + * struct sample_id sample_id; + * }; + */ + PERF_RECORD_CGROUP = 19, + PERF_RECORD_MAX, /* non-ABI */ }; @@ -1179,4 +1226,23 @@ struct perf_branch_entry { reserved:40; }; +union perf_sample_weight { + __u64 full; +#if defined(__LITTLE_ENDIAN_BITFIELD) + struct { + __u32 var1_dw; + __u16 var2_w; + __u16 var3_w; + }; +#elif defined(__BIG_ENDIAN_BITFIELD) + struct { + __u16 var3_w; + __u16 var2_w; + __u32 var1_dw; + }; +#else +#error "Unknown endianness" +#endif +}; + #endif /* _UAPI_LINUX_PERF_EVENT_H */ diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h index 592a0c1b77c93eef274bf86bf465649970a3cde6..0549a5c622bf33a9c3c96999d1f2f0a4f0747e87 100644 --- a/include/uapi/linux/psp-sev.h +++ b/include/uapi/linux/psp-sev.h @@ -58,6 +58,9 @@ typedef enum { SEV_RET_HWSEV_RET_PLATFORM, SEV_RET_HWSEV_RET_UNSAFE, SEV_RET_UNSUPPORTED, + SEV_RET_INVALID_PARAM, + SEV_RET_RESOURCE_LIMIT, + SEV_RET_SECURE_DATA_INVALID, SEV_RET_MAX, } sev_ret_code; diff --git a/include/uapi/linux/tee.h b/include/uapi/linux/tee.h index 4b9eb064d7e709298a011dbd54dae7940a42c010..6596f3a09e543b3ba9b748308a67ba8b124f408c 100644 --- a/include/uapi/linux/tee.h +++ b/include/uapi/linux/tee.h @@ -56,6 +56,7 @@ * TEE Implementation ID */ #define TEE_IMPL_ID_OPTEE 1 +#define TEE_IMPL_ID_AMDTEE 2 /* * OP-TEE specific 
capabilities diff --git a/init/Kconfig b/init/Kconfig index 3aa73f27028f80568f2c235d84d5524973cb1c6b..57f06ffcf26ec84792f5f31aa3dda4d3b977a8b8 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1120,7 +1120,8 @@ config CGROUP_PERF help This option extends the perf per-cpu mode to restrict monitoring to threads which belong to the cgroup specified and run on the - designated cpu. + designated cpu. It can also be used to record the cgroup ID in + samples so that performance events can be monitored across cgroups. Say N if unsure. diff --git a/kernel/events/core.c b/kernel/events/core.c index 2043b0f729a2ee34c4b764803a888047eaac5d46..a3270243c917b9291ac1247b298bbc502fd806df 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -49,6 +49,7 @@ #include #include #include +#include #include "internal.h" @@ -392,6 +393,7 @@ static atomic_t nr_freq_events __read_mostly; static atomic_t nr_switch_events __read_mostly; static atomic_t nr_ksymbol_events __read_mostly; static atomic_t nr_bpf_events __read_mostly; +static atomic_t nr_cgroup_events __read_mostly; static LIST_HEAD(pmus); static DEFINE_MUTEX(pmus_lock); @@ -1181,12 +1183,28 @@ static void get_ctx(struct perf_event_context *ctx) refcount_inc(&ctx->refcount); } +static void *alloc_task_ctx_data(struct pmu *pmu) +{ + if (pmu->task_ctx_cache) + return kmem_cache_zalloc(pmu->task_ctx_cache, GFP_KERNEL); + + return kzalloc(pmu->task_ctx_size, GFP_KERNEL); +} + +static void free_task_ctx_data(struct pmu *pmu, void *task_ctx_data) +{ + if (pmu->task_ctx_cache && task_ctx_data) + kmem_cache_free(pmu->task_ctx_cache, task_ctx_data); + else + kfree(task_ctx_data); +} + static void free_ctx(struct rcu_head *head) { struct perf_event_context *ctx; ctx = container_of(head, struct perf_event_context, rcu_head); - kfree(ctx->task_ctx_data); + free_task_ctx_data(ctx->pmu, ctx->task_ctx_data); kfree(ctx); } @@ -1744,8 +1762,8 @@ static void __perf_event_header_size(struct perf_event *event, u64 sample_type) if (sample_type &
PERF_SAMPLE_PERIOD) size += sizeof(data->period); - if (sample_type & PERF_SAMPLE_WEIGHT) - size += sizeof(data->weight); + if (sample_type & PERF_SAMPLE_WEIGHT_TYPE) + size += sizeof(data->weight.full); if (sample_type & PERF_SAMPLE_READ) size += event->read_size; @@ -1759,6 +1777,15 @@ static void __perf_event_header_size(struct perf_event *event, u64 sample_type) if (sample_type & PERF_SAMPLE_PHYS_ADDR) size += sizeof(data->phys_addr); + if (sample_type & PERF_SAMPLE_CGROUP) + size += sizeof(data->cgroup); + + if (sample_type & PERF_SAMPLE_DATA_PAGE_SIZE) + size += sizeof(data->data_page_size); + + if (sample_type & PERF_SAMPLE_CODE_PAGE_SIZE) + size += sizeof(data->code_page_size); + event->header_size = size; } @@ -1947,6 +1974,11 @@ static void perf_put_aux_event(struct perf_event *event) } } +static bool perf_need_aux_event(struct perf_event *event) +{ + return !!event->attr.aux_output || !!event->attr.aux_sample_size; +} + static int perf_get_aux_event(struct perf_event *event, struct perf_event *group_leader) { @@ -1959,7 +1991,17 @@ static int perf_get_aux_event(struct perf_event *event, if (!group_leader) return 0; - if (!perf_aux_output_match(event, group_leader)) + /* + * aux_output and aux_sample_size are mutually exclusive. 
+ */ + if (event->attr.aux_output && event->attr.aux_sample_size) + return 0; + + if (event->attr.aux_output && + !perf_aux_output_match(event, group_leader)) + return 0; + + if (event->attr.aux_sample_size && !group_leader->pmu->snapshot_aux) return 0; if (!atomic_long_inc_not_zero(&group_leader->refcount)) @@ -4290,7 +4332,7 @@ find_get_context(struct pmu *pmu, struct task_struct *task, goto errout; if (event->attach_state & PERF_ATTACH_TASK_DATA) { - task_ctx_data = kzalloc(pmu->task_ctx_size, GFP_KERNEL); + task_ctx_data = alloc_task_ctx_data(pmu); if (!task_ctx_data) { err = -ENOMEM; goto errout; @@ -4348,11 +4390,11 @@ find_get_context(struct pmu *pmu, struct task_struct *task, } } - kfree(task_ctx_data); + free_task_ctx_data(pmu, task_ctx_data); return ctx; errout: - kfree(task_ctx_data); + free_task_ctx_data(pmu, task_ctx_data); return ERR_PTR(err); } @@ -4453,6 +4495,8 @@ static void unaccount_event(struct perf_event *event) atomic_dec(&nr_comm_events); if (event->attr.namespaces) atomic_dec(&nr_namespaces_events); + if (event->attr.cgroup) + atomic_dec(&nr_cgroup_events); if (event->attr.task) atomic_dec(&nr_task_events); if (event->attr.freq) @@ -6234,6 +6278,122 @@ perf_output_sample_ustack(struct perf_output_handle *handle, u64 dump_size, } } +static unsigned long perf_prepare_sample_aux(struct perf_event *event, + struct perf_sample_data *data, + size_t size) +{ + struct perf_event *sampler = event->aux_event; + struct ring_buffer *rb; + + data->aux_size = 0; + + if (!sampler) + goto out; + + if (WARN_ON_ONCE(READ_ONCE(sampler->state) != PERF_EVENT_STATE_ACTIVE)) + goto out; + + if (WARN_ON_ONCE(READ_ONCE(sampler->oncpu) != smp_processor_id())) + goto out; + + rb = ring_buffer_get(sampler->parent ? sampler->parent : sampler); + if (!rb) + goto out; + + /* + * If this is an NMI hit inside sampling code, don't take + * the sample. See also perf_aux_sample_output(). 
+ */ + if (READ_ONCE(rb->aux_in_sampling)) { + data->aux_size = 0; + } else { + size = min_t(size_t, size, perf_aux_size(rb)); + data->aux_size = ALIGN(size, sizeof(u64)); + } + ring_buffer_put(rb); + +out: + return data->aux_size; +} + +long perf_pmu_snapshot_aux(struct ring_buffer *rb, + struct perf_event *event, + struct perf_output_handle *handle, + unsigned long size) +{ + unsigned long flags; + long ret; + + /* + * Normal ->start()/->stop() callbacks run in IRQ mode in scheduler + * paths. If we start calling them in NMI context, they may race with + * the IRQ ones, that is, for example, re-starting an event that's just + * been stopped, which is why we're using a separate callback that + * doesn't change the event state. + * + * IRQs need to be disabled to prevent IPIs from racing with us. + */ + local_irq_save(flags); + /* + * Guard against NMI hits inside the critical section; + * see also perf_prepare_sample_aux(). + */ + WRITE_ONCE(rb->aux_in_sampling, 1); + barrier(); + + ret = event->pmu->snapshot_aux(event, handle, size); + + barrier(); + WRITE_ONCE(rb->aux_in_sampling, 0); + local_irq_restore(flags); + + return ret; +} + +static void perf_aux_sample_output(struct perf_event *event, + struct perf_output_handle *handle, + struct perf_sample_data *data) +{ + struct perf_event *sampler = event->aux_event; + unsigned long pad; + struct ring_buffer *rb; + long size; + + if (WARN_ON_ONCE(!sampler || !data->aux_size)) + return; + + rb = ring_buffer_get(sampler->parent ? sampler->parent : sampler); + if (!rb) + return; + + size = perf_pmu_snapshot_aux(rb, sampler, handle, data->aux_size); + + /* + * An error here means that perf_output_copy() failed (returned a + * non-zero surplus that it didn't copy), which in its current + * enlightened implementation is not possible. If that changes, we'd + * like to know. 
+ */ + if (WARN_ON_ONCE(size < 0)) + goto out_put; + + /* + * The pad comes from ALIGN()ing data->aux_size up to u64 in + * perf_prepare_sample_aux(), so should not be more than that. + */ + pad = data->aux_size - size; + if (WARN_ON_ONCE(pad >= sizeof(u64))) + pad = 8; + + if (pad) { + u64 zero = 0; + perf_output_copy(handle, &zero, pad); + } + +out_put: + ring_buffer_put(rb); +} + static void __perf_event_header__init_id(struct perf_event_header *header, struct perf_sample_data *data, struct perf_event *event) @@ -6403,6 +6563,11 @@ static void perf_output_read(struct perf_output_handle *handle, perf_output_read_one(handle, event, enabled, running); } +static inline bool perf_sample_save_hw_index(struct perf_event *event) +{ + return event->attr.branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX; +} + void perf_output_sample(struct perf_output_handle *handle, struct perf_event_header *header, struct perf_sample_data *data, @@ -6491,6 +6656,8 @@ void perf_output_sample(struct perf_output_handle *handle, * sizeof(struct perf_branch_entry); perf_output_put(handle, data->br_stack->nr); + if (perf_sample_save_hw_index(event)) + perf_output_put(handle, data->br_stack->hw_idx); perf_output_copy(handle, data->br_stack->entries, size); } else { /* @@ -6524,8 +6691,8 @@ void perf_output_sample(struct perf_output_handle *handle, data->regs_user.regs); } - if (sample_type & PERF_SAMPLE_WEIGHT) - perf_output_put(handle, data->weight); + if (sample_type & PERF_SAMPLE_WEIGHT_TYPE) + perf_output_put(handle, data->weight.full); if (sample_type & PERF_SAMPLE_DATA_SRC) perf_output_put(handle, data->data_src.val); @@ -6553,6 +6720,22 @@ void perf_output_sample(struct perf_output_handle *handle, if (sample_type & PERF_SAMPLE_PHYS_ADDR) perf_output_put(handle, data->phys_addr); + if (sample_type & PERF_SAMPLE_CGROUP) + perf_output_put(handle, data->cgroup); + + if (sample_type & PERF_SAMPLE_DATA_PAGE_SIZE) + perf_output_put(handle, data->data_page_size); + + if (sample_type & 
PERF_SAMPLE_CODE_PAGE_SIZE) + perf_output_put(handle, data->code_page_size); + + if (sample_type & PERF_SAMPLE_AUX) { + perf_output_put(handle, data->aux_size); + + if (data->aux_size) + perf_aux_sample_output(event, handle, data); + } + if (!event->attr.watermark) { int wakeup_events = event->attr.wakeup_events; @@ -6603,6 +6786,94 @@ static u64 perf_virt_to_phys(u64 virt) return phys_addr; } +#ifdef CONFIG_MMU + +/* + * Return the MMU page size of a given virtual address + */ +static u64 __perf_get_page_size(struct mm_struct *mm, unsigned long addr) +{ + pgd_t *pgd; + p4d_t *p4d; + pud_t *pud; + pmd_t *pmd; + pte_t *pte; + + pgd = pgd_offset(mm, addr); + if (pgd_none(*pgd)) + return 0; + + p4d = p4d_offset(pgd, addr); + if (!p4d_present(*p4d)) + return 0; + + if (p4d_leaf(*p4d)) + return 1ULL << P4D_SHIFT; + + pud = pud_offset(p4d, addr); + if (!pud_present(*pud)) + return 0; + + if (pud_leaf(*pud)) + return 1ULL << PUD_SHIFT; + + pmd = pmd_offset(pud, addr); + if (!pmd_present(*pmd)) + return 0; + + if (pmd_leaf(*pmd)) + return 1ULL << PMD_SHIFT; + + pte = pte_offset_map(pmd, addr); + if (!pte_present(*pte)) { + pte_unmap(pte); + return 0; + } + + pte_unmap(pte); + return PAGE_SIZE; +} + +#else + +static u64 __perf_get_page_size(struct mm_struct *mm, unsigned long addr) +{ + return 0; +} + +#endif + +static u64 perf_get_page_size(unsigned long addr) +{ + struct mm_struct *mm; + unsigned long flags; + u64 size; + + if (!addr) + return 0; + + /* + * Software page-table walkers must disable IRQs, + * which prevents any tear down of the page tables. + */ + local_irq_save(flags); + + mm = current->mm; + if (!mm) { + /* + * For kernel threads and the like, use init_mm so that + * we can find kernel memory. 
*/ + mm = &init_mm; + } + + size = __perf_get_page_size(mm, addr); + + local_irq_restore(flags); + + return size; +} + static struct perf_callchain_entry __empty_callchain = { .nr = 0, }; struct perf_callchain_entry * @@ -6638,7 +6909,7 @@ void perf_prepare_sample(struct perf_event_header *header, __perf_event_header__init_id(header, data, event); - if (sample_type & PERF_SAMPLE_IP) + if (sample_type & (PERF_SAMPLE_IP | PERF_SAMPLE_CODE_PAGE_SIZE)) data->ip = perf_instruction_pointer(regs); if (sample_type & PERF_SAMPLE_CALLCHAIN) { @@ -6680,6 +6951,9 @@ void perf_prepare_sample(struct perf_event_header *header, if (sample_type & PERF_SAMPLE_BRANCH_STACK) { int size = sizeof(u64); /* nr */ if (data->br_stack) { + if (perf_sample_save_hw_index(event)) + size += sizeof(u64); + size += data->br_stack->nr * sizeof(struct perf_branch_entry); } @@ -6744,6 +7018,56 @@ void perf_prepare_sample(struct perf_event_header *header, if (sample_type & PERF_SAMPLE_PHYS_ADDR) data->phys_addr = perf_virt_to_phys(data->addr); + +#ifdef CONFIG_CGROUP_PERF + if (sample_type & PERF_SAMPLE_CGROUP) { + struct cgroup *cgrp; + + /* protected by RCU */ + cgrp = task_css_check(current, perf_event_cgrp_id, 1)->cgroup; + data->cgroup = cgroup_id(cgrp); + } +#endif + + /* + * PERF_SAMPLE_DATA_PAGE_SIZE requires PERF_SAMPLE_ADDR. If the user doesn't + * request PERF_SAMPLE_ADDR, the kernel implicitly retrieves data->addr, + * but the value is not dumped to userspace. + */ + if (sample_type & PERF_SAMPLE_DATA_PAGE_SIZE) + data->data_page_size = perf_get_page_size(data->addr); + + if (sample_type & PERF_SAMPLE_CODE_PAGE_SIZE) + data->code_page_size = perf_get_page_size(data->ip); + + if (sample_type & PERF_SAMPLE_AUX) { + u64 size; + + header->size += sizeof(u64); /* size */ + + /* + * Given the 16bit nature of header::size, an AUX sample can + * easily overflow it, what with all the preceding sample bits.
+ * Make sure this doesn't happen by using up to U16_MAX bytes + * per sample in total (rounded down to 8 byte boundary). + */ + size = min_t(size_t, U16_MAX - header->size, + event->attr.aux_sample_size); + size = rounddown(size, 8); + size = perf_prepare_sample_aux(event, data, size); + + WARN_ON_ONCE(size + header->size > U16_MAX); + header->size += size; + } + /* + * If you're adding more sample types here, you likely need to do + * something about the overflowing header::size, like repurpose the + * lowest 3 bits of size, which should be always zero at the moment. + * This raises a more important question, do we really need 512k sized + * samples and why, so good argumentation is in order for whatever you + * do here next. + */ + WARN_ON_ONCE(header->size & 7); } static __always_inline int @@ -7395,6 +7719,105 @@ void perf_event_namespaces(struct task_struct *task) NULL); } +/* + * cgroup tracking + */ +#ifdef CONFIG_CGROUP_PERF + +struct perf_cgroup_event { + char *path; + int path_size; + struct { + struct perf_event_header header; + u64 id; + char path[]; + } event_id; +}; + +static int perf_event_cgroup_match(struct perf_event *event) +{ + return event->attr.cgroup; +} + +static void perf_event_cgroup_output(struct perf_event *event, void *data) +{ + struct perf_cgroup_event *cgroup_event = data; + struct perf_output_handle handle; + struct perf_sample_data sample; + u16 header_size = cgroup_event->event_id.header.size; + int ret; + + if (!perf_event_cgroup_match(event)) + return; + + perf_event_header__init_id(&cgroup_event->event_id.header, + &sample, event); + ret = perf_output_begin(&handle, event, + cgroup_event->event_id.header.size); + if (ret) + goto out; + + perf_output_put(&handle, cgroup_event->event_id); + __output_copy(&handle, cgroup_event->path, cgroup_event->path_size); + + perf_event__output_id_sample(event, &handle, &sample); + + perf_output_end(&handle); +out: + cgroup_event->event_id.header.size = header_size; +} + +static void 
perf_event_cgroup(struct cgroup *cgrp) +{ + struct perf_cgroup_event cgroup_event; + char path_enomem[16] = "//enomem"; + char *pathname; + size_t size; + + if (!atomic_read(&nr_cgroup_events)) + return; + + cgroup_event = (struct perf_cgroup_event){ + .event_id = { + .header = { + .type = PERF_RECORD_CGROUP, + .misc = 0, + .size = sizeof(cgroup_event.event_id), + }, + .id = cgroup_id(cgrp), + }, + }; + + pathname = kmalloc(PATH_MAX, GFP_KERNEL); + if (pathname == NULL) { + cgroup_event.path = path_enomem; + } else { + /* just to be sure to have enough space for alignment */ + cgroup_path(cgrp, pathname, PATH_MAX - sizeof(u64)); + cgroup_event.path = pathname; + } + + /* + * Since our buffer works in 8 byte units we need to align our string + * size to a multiple of 8. However, we must guarantee the tail end is + * zero'd out to avoid leaking random bits to userspace. + */ + size = strlen(cgroup_event.path) + 1; + while (!IS_ALIGNED(size, sizeof(u64))) + cgroup_event.path[size++] = '\0'; + + cgroup_event.event_id.header.size += size; + cgroup_event.path_size = size; + + perf_iterate_sb(perf_event_cgroup_output, + &cgroup_event, + NULL); + + kfree(pathname); +} + +#endif + /* * mmap tracking */ @@ -10411,6 +10834,8 @@ static void account_event(struct perf_event *event) atomic_inc(&nr_comm_events); if (event->attr.namespaces) atomic_inc(&nr_namespaces_events); + if (event->attr.cgroup) + atomic_inc(&nr_cgroup_events); if (event->attr.task) atomic_inc(&nr_task_events); if (event->attr.freq) @@ -10710,7 +11135,7 @@ static int perf_copy_attr(struct perf_event_attr __user *uattr, attr->size = size; - if (attr->__reserved_1 || attr->__reserved_2) + if (attr->__reserved_1 || attr->__reserved_2 || attr->__reserved_3) return -EINVAL; if (attr->sample_type & ~(PERF_SAMPLE_MAX-1)) @@ -10779,6 +11204,15 @@ static int perf_copy_attr(struct perf_event_attr __user *uattr, if (attr->sample_type & PERF_SAMPLE_REGS_INTR) ret = perf_reg_validate(attr->sample_regs_intr); + +#ifndef 
CONFIG_CGROUP_PERF + if (attr->sample_type & PERF_SAMPLE_CGROUP) + return -EINVAL; +#endif + if ((attr->sample_type & PERF_SAMPLE_WEIGHT) && + (attr->sample_type & PERF_SAMPLE_WEIGHT_STRUCT)) + return -EINVAL; + out: return ret; @@ -11281,7 +11715,7 @@ SYSCALL_DEFINE5(perf_event_open, } } - if (event->attr.aux_output && !perf_get_aux_event(event, group_leader)) { + if (perf_need_aux_event(event) && !perf_get_aux_event(event, group_leader)) { err = -EINVAL; goto err_locked; } @@ -11897,8 +12331,7 @@ inherit_event(struct perf_event *parent_event, !child_ctx->task_ctx_data) { struct pmu *pmu = child_event->pmu; - child_ctx->task_ctx_data = kzalloc(pmu->task_ctx_size, - GFP_KERNEL); + child_ctx->task_ctx_data = alloc_task_ctx_data(pmu); if (!child_ctx->task_ctx_data) { free_event(child_event); return ERR_PTR(-ENOMEM); @@ -12412,6 +12845,12 @@ static void perf_cgroup_css_free(struct cgroup_subsys_state *css) kfree(jc); } +static int perf_cgroup_css_online(struct cgroup_subsys_state *css) +{ + perf_event_cgroup(css->cgroup); + return 0; +} + static int __perf_cgroup_move(void *info) { struct task_struct *task = info; @@ -12433,6 +12872,7 @@ static void perf_cgroup_attach(struct cgroup_taskset *tset) struct cgroup_subsys perf_event_cgrp_subsys = { .css_alloc = perf_cgroup_css_alloc, .css_free = perf_cgroup_css_free, + .css_online = perf_cgroup_css_online, .attach = perf_cgroup_attach, /* * Implicitly enable on dfl hierarchy so that perf events can diff --git a/kernel/events/internal.h b/kernel/events/internal.h index 6e87b358e082628e007edcd1499a0f75b03c2f94..00a0d3f08de3f2f396e0156df16cc01a4af195bc 100644 --- a/kernel/events/internal.h +++ b/kernel/events/internal.h @@ -50,6 +50,7 @@ struct ring_buffer { unsigned long aux_mmap_locked; void (*free_aux)(void *); refcount_t aux_refcount; + int aux_in_sampling; void **aux_pages; void *aux_priv; diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c index 
ffb59a4ef4ff3d5e4774296c802967224af40789..07ab081952f33ce8c0aec74a6a6b11568a7050de 100644 --- a/kernel/events/ring_buffer.c +++ b/kernel/events/ring_buffer.c @@ -562,6 +562,42 @@ void *perf_get_aux(struct perf_output_handle *handle) } EXPORT_SYMBOL_GPL(perf_get_aux); +/* + * Copy out AUX data from an AUX handle. + */ +long perf_output_copy_aux(struct perf_output_handle *aux_handle, + struct perf_output_handle *handle, + unsigned long from, unsigned long to) +{ + unsigned long tocopy, remainder, len = 0; + struct ring_buffer *rb = aux_handle->rb; + void *addr; + + from &= (rb->aux_nr_pages << PAGE_SHIFT) - 1; + to &= (rb->aux_nr_pages << PAGE_SHIFT) - 1; + + do { + tocopy = PAGE_SIZE - offset_in_page(from); + if (to > from) + tocopy = min(tocopy, to - from); + if (!tocopy) + break; + + addr = rb->aux_pages[from >> PAGE_SHIFT]; + addr += offset_in_page(from); + + remainder = perf_output_copy(handle, addr, tocopy); + if (remainder) + return -EFAULT; + + len += tocopy; + from += tocopy; + from &= (rb->aux_nr_pages << PAGE_SHIFT) - 1; + } while (to != from); + + return len; +} + #define PERF_AUX_GFP (GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN | __GFP_NORETRY) static struct page *rb_alloc_aux_page(int node, int order) diff --git a/tools/arch/x86/include/asm/amd-ibs.h b/tools/arch/x86/include/asm/amd-ibs.h new file mode 100644 index 0000000000000000000000000000000000000000..21e01cf6162e923e63c2810a093d92453783332e --- /dev/null +++ b/tools/arch/x86/include/asm/amd-ibs.h @@ -0,0 +1,136 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * From PPR Vol 1 for AMD Family 19h Model 01h B1 + * 55898 Rev 0.35 - Feb 5, 2021 + */ + +#include "msr-index.h" + +/* + * IBS Hardware MSRs + */ + +/* MSR 0xc0011030: IBS Fetch Control */ +union ibs_fetch_ctl { + __u64 val; + struct { + __u64 fetch_maxcnt:16,/* 0-15: instruction fetch max. 
count */ + fetch_cnt:16, /* 16-31: instruction fetch count */ + fetch_lat:16, /* 32-47: instruction fetch latency */ + fetch_en:1, /* 48: instruction fetch enable */ + fetch_val:1, /* 49: instruction fetch valid */ + fetch_comp:1, /* 50: instruction fetch complete */ + ic_miss:1, /* 51: i-cache miss */ + phy_addr_valid:1,/* 52: physical address valid */ + l1tlb_pgsz:2, /* 53-54: i-cache L1TLB page size + * (needs IbsPhyAddrValid) */ + l1tlb_miss:1, /* 55: i-cache fetch missed in L1TLB */ + l2tlb_miss:1, /* 56: i-cache fetch missed in L2TLB */ + rand_en:1, /* 57: random tagging enable */ + fetch_l2_miss:1,/* 58: L2 miss for sampled fetch + * (needs IbsFetchComp) */ + l3_miss_only:1, /* 59: Collect L3 miss samples only */ + fetch_oc_miss:1,/* 60: Op cache miss for the sampled fetch */ + fetch_l3_miss:1,/* 61: L3 cache miss for the sampled fetch */ + reserved:2; /* 62-63: reserved */ + }; +}; + +/* MSR 0xc0011033: IBS Execution Control */ +union ibs_op_ctl { + __u64 val; + struct { + __u64 opmaxcnt:16, /* 0-15: periodic op max. 
count */ + l3_miss_only:1, /* 16: Collect L3 miss samples only */ + op_en:1, /* 17: op sampling enable */ + op_val:1, /* 18: op sample valid */ + cnt_ctl:1, /* 19: periodic op counter control */ + opmaxcnt_ext:7, /* 20-26: upper 7 bits of periodic op maximum count */ + reserved0:5, /* 27-31: reserved */ + opcurcnt:27, /* 32-58: periodic op counter current count */ + reserved1:5; /* 59-63: reserved */ + }; +}; + +/* MSR 0xc0011035: IBS Op Data */ +union ibs_op_data { + __u64 val; + struct { + __u64 comp_to_ret_ctr:16, /* 0-15: op completion to retire count */ + tag_to_ret_ctr:16, /* 16-31: op tag to retire count */ + reserved1:2, /* 32-33: reserved */ + op_return:1, /* 34: return op */ + op_brn_taken:1, /* 35: taken branch op */ + op_brn_misp:1, /* 36: mispredicted branch op */ + op_brn_ret:1, /* 37: branch op retired */ + op_rip_invalid:1, /* 38: RIP is invalid */ + op_brn_fuse:1, /* 39: fused branch op */ + op_microcode:1, /* 40: microcode op */ + reserved2:23; /* 41-63: reserved */ + }; +}; + +/* MSR 0xc0011036: IBS Op Data 2 */ +union ibs_op_data2 { + __u64 val; + struct { + __u64 data_src_lo:3, /* 0-2: data source low */ + reserved0:1, /* 3: reserved */ + rmt_node:1, /* 4: destination node */ + cache_hit_st:1, /* 5: cache hit state */ + data_src_hi:2, /* 6-7: data source high */ + reserved1:56; /* 8-63: reserved */ + }; +}; + +/* MSR 0xc0011037: IBS Op Data 3 */ +union ibs_op_data3 { + __u64 val; + struct { + __u64 ld_op:1, /* 0: load op */ + st_op:1, /* 1: store op */ + dc_l1tlb_miss:1, /* 2: data cache L1TLB miss */ + dc_l2tlb_miss:1, /* 3: data cache L2TLB miss */ + dc_l1tlb_hit_2m:1, /* 4: data cache L1TLB hit in 2M page */ + dc_l1tlb_hit_1g:1, /* 5: data cache L1TLB hit in 1G page */ + dc_l2tlb_hit_2m:1, /* 6: data cache L2TLB hit in 2M page */ + dc_miss:1, /* 7: data cache miss */ + dc_mis_acc:1, /* 8: misaligned access */ + reserved:4, /* 9-12: reserved */ + dc_wc_mem_acc:1, /* 13: write combining memory access */ + dc_uc_mem_acc:1, /* 14:
uncacheable memory access */ + dc_locked_op:1, /* 15: locked operation */ + dc_miss_no_mab_alloc:1, /* 16: DC miss with no MAB allocated */ + dc_lin_addr_valid:1, /* 17: data cache linear address valid */ + dc_phy_addr_valid:1, /* 18: data cache physical address valid */ + dc_l2_tlb_hit_1g:1, /* 19: data cache L2 hit in 1GB page */ + l2_miss:1, /* 20: L2 cache miss */ + sw_pf:1, /* 21: software prefetch */ + op_mem_width:4, /* 22-25: load/store size in bytes */ + op_dc_miss_open_mem_reqs:6, /* 26-31: outstanding mem reqs on DC fill */ + dc_miss_lat:16, /* 32-47: data cache miss latency */ + tlb_refill_lat:16; /* 48-63: L1 TLB refill latency */ + }; +}; + +/* MSR 0xc001103c: IBS Fetch Control Extended */ +union ic_ibs_extd_ctl { + __u64 val; + struct { + __u64 itlb_refill_lat:16, /* 0-15: ITLB Refill latency for sampled fetch */ + reserved:48; /* 16-63: reserved */ + }; +}; + +/* + * IBS driver related + */ + +struct perf_ibs_data { + u32 size; + union { + u32 data[0]; /* data buffer starts here */ + u32 caps; + }; + u64 regs[MSR_AMD64_IBS_REG_COUNT_MAX]; +}; diff --git a/tools/arch/x86/include/asm/disabled-features.h b/tools/arch/x86/include/asm/disabled-features.h index 8e1d0bb463611026f80589660c7ea16c850fecf1..f6d8002d11e0e976348c7dc67a64969668d1b2f2 100644 --- a/tools/arch/x86/include/asm/disabled-features.h +++ b/tools/arch/x86/include/asm/disabled-features.h @@ -84,6 +84,7 @@ #define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP) #define DISABLED_MASK17 0 #define DISABLED_MASK18 0 -#define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 19) +#define DISABLED_MASK19 0 +#define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 20) #endif /* _ASM_X86_DISABLED_FEATURES_H */ diff --git a/tools/arch/x86/include/asm/msr-index.h b/tools/arch/x86/include/asm/msr-index.h new file mode 100644 index 0000000000000000000000000000000000000000..20ce682a2540f389dc62f29dbf07c0b3f32620ea --- /dev/null +++ b/tools/arch/x86/include/asm/msr-index.h @@ -0,0 
+1,857 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_MSR_INDEX_H +#define _ASM_X86_MSR_INDEX_H + +#include + +/* + * CPU model specific register (MSR) numbers. + * + * Do not add new entries to this file unless the definitions are shared + * between multiple compilation units. + */ + +/* x86-64 specific MSRs */ +#define MSR_EFER 0xc0000080 /* extended feature register */ +#define MSR_STAR 0xc0000081 /* legacy mode SYSCALL target */ +#define MSR_LSTAR 0xc0000082 /* long mode SYSCALL target */ +#define MSR_CSTAR 0xc0000083 /* compat mode SYSCALL target */ +#define MSR_SYSCALL_MASK 0xc0000084 /* EFLAGS mask for syscall */ +#define MSR_FS_BASE 0xc0000100 /* 64bit FS base */ +#define MSR_GS_BASE 0xc0000101 /* 64bit GS base */ +#define MSR_KERNEL_GS_BASE 0xc0000102 /* SwapGS GS shadow */ +#define MSR_TSC_AUX 0xc0000103 /* Auxiliary TSC */ + +/* EFER bits: */ +#define _EFER_SCE 0 /* SYSCALL/SYSRET */ +#define _EFER_LME 8 /* Long mode enable */ +#define _EFER_LMA 10 /* Long mode active (read-only) */ +#define _EFER_NX 11 /* No execute enable */ +#define _EFER_SVME 12 /* Enable virtualization */ +#define _EFER_LMSLE 13 /* Long Mode Segment Limit Enable */ +#define _EFER_FFXSR 14 /* Enable Fast FXSAVE/FXRSTOR */ + +#define EFER_SCE (1<<_EFER_SCE) +#define EFER_LME (1<<_EFER_LME) +#define EFER_LMA (1<<_EFER_LMA) +#define EFER_NX (1<<_EFER_NX) +#define EFER_SVME (1<<_EFER_SVME) +#define EFER_LMSLE (1<<_EFER_LMSLE) +#define EFER_FFXSR (1<<_EFER_FFXSR) + +/* Intel MSRs. 
Some also available on other CPUs */ + +#define MSR_IA32_SPEC_CTRL 0x00000048 /* Speculation Control */ +#define SPEC_CTRL_IBRS BIT(0) /* Indirect Branch Restricted Speculation */ +#define SPEC_CTRL_STIBP_SHIFT 1 /* Single Thread Indirect Branch Predictor (STIBP) bit */ +#define SPEC_CTRL_STIBP BIT(SPEC_CTRL_STIBP_SHIFT) /* STIBP mask */ +#define SPEC_CTRL_SSBD_SHIFT 2 /* Speculative Store Bypass Disable bit */ +#define SPEC_CTRL_SSBD BIT(SPEC_CTRL_SSBD_SHIFT) /* Speculative Store Bypass Disable */ + +#define MSR_IA32_PRED_CMD 0x00000049 /* Prediction Command */ +#define PRED_CMD_IBPB BIT(0) /* Indirect Branch Prediction Barrier */ + +#define MSR_PPIN_CTL 0x0000004e +#define MSR_PPIN 0x0000004f + +#define MSR_IA32_PERFCTR0 0x000000c1 +#define MSR_IA32_PERFCTR1 0x000000c2 +#define MSR_FSB_FREQ 0x000000cd +#define MSR_PLATFORM_INFO 0x000000ce +#define MSR_PLATFORM_INFO_CPUID_FAULT_BIT 31 +#define MSR_PLATFORM_INFO_CPUID_FAULT BIT_ULL(MSR_PLATFORM_INFO_CPUID_FAULT_BIT) + +#define MSR_IA32_UMWAIT_CONTROL 0xe1 +#define MSR_IA32_UMWAIT_CONTROL_C02_DISABLE BIT(0) +#define MSR_IA32_UMWAIT_CONTROL_RESERVED BIT(1) +/* + * The time field is bit[31:2], but representing a 32bit value with + * bit[1:0] zero. + */ +#define MSR_IA32_UMWAIT_CONTROL_TIME_MASK (~0x03U) + +#define MSR_PKG_CST_CONFIG_CONTROL 0x000000e2 +#define NHM_C3_AUTO_DEMOTE (1UL << 25) +#define NHM_C1_AUTO_DEMOTE (1UL << 26) +#define ATM_LNC_C6_AUTO_DEMOTE (1UL << 25) +#define SNB_C3_AUTO_UNDEMOTE (1UL << 27) +#define SNB_C1_AUTO_UNDEMOTE (1UL << 28) + +#define MSR_MTRRcap 0x000000fe + +#define MSR_IA32_ARCH_CAPABILITIES 0x0000010a +#define ARCH_CAP_RDCL_NO BIT(0) /* Not susceptible to Meltdown */ +#define ARCH_CAP_IBRS_ALL BIT(1) /* Enhanced IBRS support */ +#define ARCH_CAP_SKIP_VMENTRY_L1DFLUSH BIT(3) /* Skip L1D flush on vmentry */ +#define ARCH_CAP_SSB_NO BIT(4) /* + * Not susceptible to Speculative Store Bypass + * attack, so no Speculative Store Bypass + * control required. 
+ */ +#define ARCH_CAP_MDS_NO BIT(5) /* + * Not susceptible to + * Microarchitectural Data + * Sampling (MDS) vulnerabilities. + */ + +#define MSR_IA32_FLUSH_CMD 0x0000010b +#define L1D_FLUSH BIT(0) /* + * Writeback and invalidate the + * L1 data cache. + */ + +#define MSR_IA32_BBL_CR_CTL 0x00000119 +#define MSR_IA32_BBL_CR_CTL3 0x0000011e + +#define MSR_IA32_SYSENTER_CS 0x00000174 +#define MSR_IA32_SYSENTER_ESP 0x00000175 +#define MSR_IA32_SYSENTER_EIP 0x00000176 + +#define MSR_IA32_MCG_CAP 0x00000179 +#define MSR_IA32_MCG_STATUS 0x0000017a +#define MSR_IA32_MCG_CTL 0x0000017b +#define MSR_IA32_MCG_EXT_CTL 0x000004d0 + +#define MSR_OFFCORE_RSP_0 0x000001a6 +#define MSR_OFFCORE_RSP_1 0x000001a7 +#define MSR_TURBO_RATIO_LIMIT 0x000001ad +#define MSR_TURBO_RATIO_LIMIT1 0x000001ae +#define MSR_TURBO_RATIO_LIMIT2 0x000001af + +#define MSR_LBR_SELECT 0x000001c8 +#define MSR_LBR_TOS 0x000001c9 +#define MSR_LBR_NHM_FROM 0x00000680 +#define MSR_LBR_NHM_TO 0x000006c0 +#define MSR_LBR_CORE_FROM 0x00000040 +#define MSR_LBR_CORE_TO 0x00000060 + +#define MSR_LBR_INFO_0 0x00000dc0 /* ... 
0xddf for _31 */ +#define LBR_INFO_MISPRED BIT_ULL(63) +#define LBR_INFO_IN_TX BIT_ULL(62) +#define LBR_INFO_ABORT BIT_ULL(61) +#define LBR_INFO_CYCLES 0xffff + +#define MSR_IA32_PEBS_ENABLE 0x000003f1 +#define MSR_PEBS_DATA_CFG 0x000003f2 +#define MSR_IA32_DS_AREA 0x00000600 +#define MSR_IA32_PERF_CAPABILITIES 0x00000345 +#define MSR_PEBS_LD_LAT_THRESHOLD 0x000003f6 + +#define MSR_IA32_RTIT_CTL 0x00000570 +#define RTIT_CTL_TRACEEN BIT(0) +#define RTIT_CTL_CYCLEACC BIT(1) +#define RTIT_CTL_OS BIT(2) +#define RTIT_CTL_USR BIT(3) +#define RTIT_CTL_PWR_EVT_EN BIT(4) +#define RTIT_CTL_FUP_ON_PTW BIT(5) +#define RTIT_CTL_FABRIC_EN BIT(6) +#define RTIT_CTL_CR3EN BIT(7) +#define RTIT_CTL_TOPA BIT(8) +#define RTIT_CTL_MTC_EN BIT(9) +#define RTIT_CTL_TSC_EN BIT(10) +#define RTIT_CTL_DISRETC BIT(11) +#define RTIT_CTL_PTW_EN BIT(12) +#define RTIT_CTL_BRANCH_EN BIT(13) +#define RTIT_CTL_MTC_RANGE_OFFSET 14 +#define RTIT_CTL_MTC_RANGE (0x0full << RTIT_CTL_MTC_RANGE_OFFSET) +#define RTIT_CTL_CYC_THRESH_OFFSET 19 +#define RTIT_CTL_CYC_THRESH (0x0full << RTIT_CTL_CYC_THRESH_OFFSET) +#define RTIT_CTL_PSB_FREQ_OFFSET 24 +#define RTIT_CTL_PSB_FREQ (0x0full << RTIT_CTL_PSB_FREQ_OFFSET) +#define RTIT_CTL_ADDR0_OFFSET 32 +#define RTIT_CTL_ADDR0 (0x0full << RTIT_CTL_ADDR0_OFFSET) +#define RTIT_CTL_ADDR1_OFFSET 36 +#define RTIT_CTL_ADDR1 (0x0full << RTIT_CTL_ADDR1_OFFSET) +#define RTIT_CTL_ADDR2_OFFSET 40 +#define RTIT_CTL_ADDR2 (0x0full << RTIT_CTL_ADDR2_OFFSET) +#define RTIT_CTL_ADDR3_OFFSET 44 +#define RTIT_CTL_ADDR3 (0x0full << RTIT_CTL_ADDR3_OFFSET) +#define MSR_IA32_RTIT_STATUS 0x00000571 +#define RTIT_STATUS_FILTEREN BIT(0) +#define RTIT_STATUS_CONTEXTEN BIT(1) +#define RTIT_STATUS_TRIGGEREN BIT(2) +#define RTIT_STATUS_BUFFOVF BIT(3) +#define RTIT_STATUS_ERROR BIT(4) +#define RTIT_STATUS_STOPPED BIT(5) +#define RTIT_STATUS_BYTECNT_OFFSET 32 +#define RTIT_STATUS_BYTECNT (0x1ffffull << RTIT_STATUS_BYTECNT_OFFSET) +#define MSR_IA32_RTIT_ADDR0_A 0x00000580 +#define 
MSR_IA32_RTIT_ADDR0_B 0x00000581 +#define MSR_IA32_RTIT_ADDR1_A 0x00000582 +#define MSR_IA32_RTIT_ADDR1_B 0x00000583 +#define MSR_IA32_RTIT_ADDR2_A 0x00000584 +#define MSR_IA32_RTIT_ADDR2_B 0x00000585 +#define MSR_IA32_RTIT_ADDR3_A 0x00000586 +#define MSR_IA32_RTIT_ADDR3_B 0x00000587 +#define MSR_IA32_RTIT_CR3_MATCH 0x00000572 +#define MSR_IA32_RTIT_OUTPUT_BASE 0x00000560 +#define MSR_IA32_RTIT_OUTPUT_MASK 0x00000561 + +#define MSR_MTRRfix64K_00000 0x00000250 +#define MSR_MTRRfix16K_80000 0x00000258 +#define MSR_MTRRfix16K_A0000 0x00000259 +#define MSR_MTRRfix4K_C0000 0x00000268 +#define MSR_MTRRfix4K_C8000 0x00000269 +#define MSR_MTRRfix4K_D0000 0x0000026a +#define MSR_MTRRfix4K_D8000 0x0000026b +#define MSR_MTRRfix4K_E0000 0x0000026c +#define MSR_MTRRfix4K_E8000 0x0000026d +#define MSR_MTRRfix4K_F0000 0x0000026e +#define MSR_MTRRfix4K_F8000 0x0000026f +#define MSR_MTRRdefType 0x000002ff + +#define MSR_IA32_CR_PAT 0x00000277 + +#define MSR_IA32_DEBUGCTLMSR 0x000001d9 +#define MSR_IA32_LASTBRANCHFROMIP 0x000001db +#define MSR_IA32_LASTBRANCHTOIP 0x000001dc +#define MSR_IA32_LASTINTFROMIP 0x000001dd +#define MSR_IA32_LASTINTTOIP 0x000001de + +/* DEBUGCTLMSR bits (others vary by model): */ +#define DEBUGCTLMSR_LBR (1UL << 0) /* last branch recording */ +#define DEBUGCTLMSR_BTF_SHIFT 1 +#define DEBUGCTLMSR_BTF (1UL << 1) /* single-step on branches */ +#define DEBUGCTLMSR_TR (1UL << 6) +#define DEBUGCTLMSR_BTS (1UL << 7) +#define DEBUGCTLMSR_BTINT (1UL << 8) +#define DEBUGCTLMSR_BTS_OFF_OS (1UL << 9) +#define DEBUGCTLMSR_BTS_OFF_USR (1UL << 10) +#define DEBUGCTLMSR_FREEZE_LBRS_ON_PMI (1UL << 11) +#define DEBUGCTLMSR_FREEZE_PERFMON_ON_PMI (1UL << 12) +#define DEBUGCTLMSR_FREEZE_IN_SMM_BIT 14 +#define DEBUGCTLMSR_FREEZE_IN_SMM (1UL << DEBUGCTLMSR_FREEZE_IN_SMM_BIT) + +#define MSR_PEBS_FRONTEND 0x000003f7 + +#define MSR_IA32_POWER_CTL 0x000001fc + +#define MSR_IA32_MC0_CTL 0x00000400 +#define MSR_IA32_MC0_STATUS 0x00000401 +#define MSR_IA32_MC0_ADDR 0x00000402 +#define 
MSR_IA32_MC0_MISC 0x00000403 + +/* C-state Residency Counters */ +#define MSR_PKG_C3_RESIDENCY 0x000003f8 +#define MSR_PKG_C6_RESIDENCY 0x000003f9 +#define MSR_ATOM_PKG_C6_RESIDENCY 0x000003fa +#define MSR_PKG_C7_RESIDENCY 0x000003fa +#define MSR_CORE_C3_RESIDENCY 0x000003fc +#define MSR_CORE_C6_RESIDENCY 0x000003fd +#define MSR_CORE_C7_RESIDENCY 0x000003fe +#define MSR_KNL_CORE_C6_RESIDENCY 0x000003ff +#define MSR_PKG_C2_RESIDENCY 0x0000060d +#define MSR_PKG_C8_RESIDENCY 0x00000630 +#define MSR_PKG_C9_RESIDENCY 0x00000631 +#define MSR_PKG_C10_RESIDENCY 0x00000632 + +/* Interrupt Response Limit */ +#define MSR_PKGC3_IRTL 0x0000060a +#define MSR_PKGC6_IRTL 0x0000060b +#define MSR_PKGC7_IRTL 0x0000060c +#define MSR_PKGC8_IRTL 0x00000633 +#define MSR_PKGC9_IRTL 0x00000634 +#define MSR_PKGC10_IRTL 0x00000635 + +/* Run Time Average Power Limiting (RAPL) Interface */ + +#define MSR_RAPL_POWER_UNIT 0x00000606 + +#define MSR_PKG_POWER_LIMIT 0x00000610 +#define MSR_PKG_ENERGY_STATUS 0x00000611 +#define MSR_PKG_PERF_STATUS 0x00000613 +#define MSR_PKG_POWER_INFO 0x00000614 + +#define MSR_DRAM_POWER_LIMIT 0x00000618 +#define MSR_DRAM_ENERGY_STATUS 0x00000619 +#define MSR_DRAM_PERF_STATUS 0x0000061b +#define MSR_DRAM_POWER_INFO 0x0000061c + +#define MSR_PP0_POWER_LIMIT 0x00000638 +#define MSR_PP0_ENERGY_STATUS 0x00000639 +#define MSR_PP0_POLICY 0x0000063a +#define MSR_PP0_PERF_STATUS 0x0000063b + +#define MSR_PP1_POWER_LIMIT 0x00000640 +#define MSR_PP1_ENERGY_STATUS 0x00000641 +#define MSR_PP1_POLICY 0x00000642 + +/* Config TDP MSRs */ +#define MSR_CONFIG_TDP_NOMINAL 0x00000648 +#define MSR_CONFIG_TDP_LEVEL_1 0x00000649 +#define MSR_CONFIG_TDP_LEVEL_2 0x0000064A +#define MSR_CONFIG_TDP_CONTROL 0x0000064B +#define MSR_TURBO_ACTIVATION_RATIO 0x0000064C + +#define MSR_PLATFORM_ENERGY_STATUS 0x0000064D + +#define MSR_PKG_WEIGHTED_CORE_C0_RES 0x00000658 +#define MSR_PKG_ANY_CORE_C0_RES 0x00000659 +#define MSR_PKG_ANY_GFXE_C0_RES 0x0000065A +#define MSR_PKG_BOTH_CORE_GFXE_C0_RES 
0x0000065B + +#define MSR_CORE_C1_RES 0x00000660 +#define MSR_MODULE_C6_RES_MS 0x00000664 + +#define MSR_CC6_DEMOTION_POLICY_CONFIG 0x00000668 +#define MSR_MC6_DEMOTION_POLICY_CONFIG 0x00000669 + +#define MSR_ATOM_CORE_RATIOS 0x0000066a +#define MSR_ATOM_CORE_VIDS 0x0000066b +#define MSR_ATOM_CORE_TURBO_RATIOS 0x0000066c +#define MSR_ATOM_CORE_TURBO_VIDS 0x0000066d + + +#define MSR_CORE_PERF_LIMIT_REASONS 0x00000690 +#define MSR_GFX_PERF_LIMIT_REASONS 0x000006B0 +#define MSR_RING_PERF_LIMIT_REASONS 0x000006B1 + +/* Hardware P state interface */ +#define MSR_PPERF 0x0000064e +#define MSR_PERF_LIMIT_REASONS 0x0000064f +#define MSR_PM_ENABLE 0x00000770 +#define MSR_HWP_CAPABILITIES 0x00000771 +#define MSR_HWP_REQUEST_PKG 0x00000772 +#define MSR_HWP_INTERRUPT 0x00000773 +#define MSR_HWP_REQUEST 0x00000774 +#define MSR_HWP_STATUS 0x00000777 + +/* CPUID.6.EAX */ +#define HWP_BASE_BIT (1<<7) +#define HWP_NOTIFICATIONS_BIT (1<<8) +#define HWP_ACTIVITY_WINDOW_BIT (1<<9) +#define HWP_ENERGY_PERF_PREFERENCE_BIT (1<<10) +#define HWP_PACKAGE_LEVEL_REQUEST_BIT (1<<11) + +/* IA32_HWP_CAPABILITIES */ +#define HWP_HIGHEST_PERF(x) (((x) >> 0) & 0xff) +#define HWP_GUARANTEED_PERF(x) (((x) >> 8) & 0xff) +#define HWP_MOSTEFFICIENT_PERF(x) (((x) >> 16) & 0xff) +#define HWP_LOWEST_PERF(x) (((x) >> 24) & 0xff) + +/* IA32_HWP_REQUEST */ +#define HWP_MIN_PERF(x) (x & 0xff) +#define HWP_MAX_PERF(x) ((x & 0xff) << 8) +#define HWP_DESIRED_PERF(x) ((x & 0xff) << 16) +#define HWP_ENERGY_PERF_PREFERENCE(x) (((unsigned long long) x & 0xff) << 24) +#define HWP_EPP_PERFORMANCE 0x00 +#define HWP_EPP_BALANCE_PERFORMANCE 0x80 +#define HWP_EPP_BALANCE_POWERSAVE 0xC0 +#define HWP_EPP_POWERSAVE 0xFF +#define HWP_ACTIVITY_WINDOW(x) ((unsigned long long)(x & 0xff3) << 32) +#define HWP_PACKAGE_CONTROL(x) ((unsigned long long)(x & 0x1) << 42) + +/* IA32_HWP_STATUS */ +#define HWP_GUARANTEED_CHANGE(x) (x & 0x1) +#define HWP_EXCURSION_TO_MINIMUM(x) (x & 0x4) + +/* IA32_HWP_INTERRUPT */ +#define 
HWP_CHANGE_TO_GUARANTEED_INT(x) (x & 0x1) +#define HWP_EXCURSION_TO_MINIMUM_INT(x) (x & 0x2) + +#define MSR_AMD64_MC0_MASK 0xc0010044 + +#define MSR_IA32_MCx_CTL(x) (MSR_IA32_MC0_CTL + 4*(x)) +#define MSR_IA32_MCx_STATUS(x) (MSR_IA32_MC0_STATUS + 4*(x)) +#define MSR_IA32_MCx_ADDR(x) (MSR_IA32_MC0_ADDR + 4*(x)) +#define MSR_IA32_MCx_MISC(x) (MSR_IA32_MC0_MISC + 4*(x)) + +#define MSR_AMD64_MCx_MASK(x) (MSR_AMD64_MC0_MASK + (x)) + +/* These are consecutive and not in the normal 4er MCE bank block */ +#define MSR_IA32_MC0_CTL2 0x00000280 +#define MSR_IA32_MCx_CTL2(x) (MSR_IA32_MC0_CTL2 + (x)) + +#define MSR_P6_PERFCTR0 0x000000c1 +#define MSR_P6_PERFCTR1 0x000000c2 +#define MSR_P6_EVNTSEL0 0x00000186 +#define MSR_P6_EVNTSEL1 0x00000187 + +#define MSR_KNC_PERFCTR0 0x00000020 +#define MSR_KNC_PERFCTR1 0x00000021 +#define MSR_KNC_EVNTSEL0 0x00000028 +#define MSR_KNC_EVNTSEL1 0x00000029 + +/* Alternative perfctr range with full access. */ +#define MSR_IA32_PMC0 0x000004c1 + +/* Auto-reload via MSR instead of DS area */ +#define MSR_RELOAD_PMC0 0x000014c1 +#define MSR_RELOAD_FIXED_CTR0 0x00001309 + +/* + * AMD64 MSRs. Not complete. See the architecture manual for a more + * complete list. 
+ */ +#define MSR_AMD64_PATCH_LEVEL 0x0000008b +#define MSR_AMD64_TSC_RATIO 0xc0000104 +#define MSR_AMD64_NB_CFG 0xc001001f +#define MSR_AMD64_CPUID_FN_1 0xc0011004 +#define MSR_AMD64_PATCH_LOADER 0xc0010020 +#define MSR_AMD_PERF_CTL 0xc0010062 +#define MSR_AMD_PERF_STATUS 0xc0010063 +#define MSR_AMD_PSTATE_DEF_BASE 0xc0010064 +#define MSR_AMD64_OSVW_ID_LENGTH 0xc0010140 +#define MSR_AMD64_OSVW_STATUS 0xc0010141 +#define MSR_AMD64_LS_CFG 0xc0011020 +#define MSR_AMD64_DC_CFG 0xc0011022 +#define MSR_AMD64_BU_CFG2 0xc001102a +#define MSR_AMD64_IBSFETCHCTL 0xc0011030 +#define MSR_AMD64_IBSFETCHLINAD 0xc0011031 +#define MSR_AMD64_IBSFETCHPHYSAD 0xc0011032 +#define MSR_AMD64_IBSFETCH_REG_COUNT 3 +#define MSR_AMD64_IBSFETCH_REG_MASK ((1UL< #endif diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h index db439100de3aa1feeb3db99b139b35ab18ea602d..0eeab4d13bad65010524c845754f299b79604502 100644 --- a/tools/include/uapi/linux/perf_event.h +++ b/tools/include/uapi/linux/perf_event.h @@ -37,6 +37,21 @@ enum perf_type_id { PERF_TYPE_MAX, /* non-ABI */ }; +/* + * attr.config layout for type PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE + * PERF_TYPE_HARDWARE: 0xEEEEEEEE000000AA + * AA: hardware event ID + * EEEEEEEE: PMU type ID + * PERF_TYPE_HW_CACHE: 0xEEEEEEEE00DDCCBB + * BB: hardware cache ID + * CC: hardware cache op ID + * DD: hardware cache op result ID + * EEEEEEEE: PMU type ID + * If the PMU type ID is 0, the PERF_TYPE_RAW will be applied. 
+ */ +#define PERF_PMU_TYPE_SHIFT 32 +#define PERF_HW_EVENT_MASK 0xffffffff + /* * Generalized performance event event_id types, used by the * attr.event_id parameter of the sys_perf_event_open() @@ -142,12 +157,17 @@ enum perf_event_sample_format { PERF_SAMPLE_REGS_INTR = 1U << 18, PERF_SAMPLE_PHYS_ADDR = 1U << 19, PERF_SAMPLE_AUX = 1U << 20, + PERF_SAMPLE_CGROUP = 1U << 21, + PERF_SAMPLE_DATA_PAGE_SIZE = 1U << 22, + PERF_SAMPLE_CODE_PAGE_SIZE = 1U << 23, + PERF_SAMPLE_WEIGHT_STRUCT = 1U << 24, - PERF_SAMPLE_MAX = 1U << 21, /* non-ABI */ + PERF_SAMPLE_MAX = 1U << 25, /* non-ABI */ __PERF_SAMPLE_CALLCHAIN_EARLY = 1ULL << 63, /* non-ABI; internal use */ }; +#define PERF_SAMPLE_WEIGHT_TYPE (PERF_SAMPLE_WEIGHT | PERF_SAMPLE_WEIGHT_STRUCT) /* * values to program into branch_sample_type when PERF_SAMPLE_BRANCH is set * @@ -181,6 +201,8 @@ enum perf_branch_sample_type_shift { PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT = 16, /* save branch type */ + PERF_SAMPLE_BRANCH_HW_INDEX_SHIFT = 17, /* save low level index of raw branch records */ + PERF_SAMPLE_BRANCH_MAX_SHIFT /* non-ABI */ }; @@ -208,6 +230,8 @@ enum perf_branch_sample_type { PERF_SAMPLE_BRANCH_TYPE_SAVE = 1U << PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT, + PERF_SAMPLE_BRANCH_HW_INDEX = 1U << PERF_SAMPLE_BRANCH_HW_INDEX_SHIFT, + PERF_SAMPLE_BRANCH_MAX = 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT, }; @@ -377,7 +401,8 @@ struct perf_event_attr { ksymbol : 1, /* include ksymbol events */ bpf_event : 1, /* include bpf events */ aux_output : 1, /* generate AUX records instead of events */ - __reserved_1 : 32; + cgroup : 1, /* include cgroup events */ + __reserved_1 : 31; union { __u32 wakeup_events; /* wakeup every n events */ @@ -853,7 +878,9 @@ enum perf_event_type { * char data[size];}&& PERF_SAMPLE_RAW * * { u64 nr; - * { u64 from, to, flags } lbr[nr];} && PERF_SAMPLE_BRANCH_STACK + * { u64 hw_idx; } && PERF_SAMPLE_BRANCH_HW_INDEX + * { u64 from, to, flags } lbr[nr]; + * } && PERF_SAMPLE_BRANCH_STACK * * { u64 abi; # enum 
perf_sample_regs_abi * u64 regs[weight(mask)]; } && PERF_SAMPLE_REGS_USER @@ -862,7 +889,24 @@ enum perf_event_type { * char data[size]; * u64 dyn_size; } && PERF_SAMPLE_STACK_USER * - * { u64 weight; } && PERF_SAMPLE_WEIGHT + * { union perf_sample_weight + * { + * u64 full; && PERF_SAMPLE_WEIGHT + * #if defined(__LITTLE_ENDIAN_BITFIELD) + * struct { + * u32 var1_dw; + * u16 var2_w; + * u16 var3_w; + * } && PERF_SAMPLE_WEIGHT_STRUCT + * #elif defined(__BIG_ENDIAN_BITFIELD) + * struct { + * u16 var3_w; + * u16 var2_w; + * u32 var1_dw; + * } && PERF_SAMPLE_WEIGHT_STRUCT + * #endif + * } + * } * { u64 data_src; } && PERF_SAMPLE_DATA_SRC * { u64 transaction; } && PERF_SAMPLE_TRANSACTION * { u64 abi; # enum perf_sample_regs_abi @@ -870,6 +914,8 @@ enum perf_event_type { * { u64 phys_addr;} && PERF_SAMPLE_PHYS_ADDR * { u64 size; * char data[size]; } && PERF_SAMPLE_AUX + * { u64 data_page_size;} && PERF_SAMPLE_DATA_PAGE_SIZE + * { u64 code_page_size;} && PERF_SAMPLE_CODE_PAGE_SIZE * }; */ PERF_RECORD_SAMPLE = 9, @@ -1006,6 +1052,16 @@ enum perf_event_type { */ PERF_RECORD_BPF_EVENT = 18, + /* + * struct { + * struct perf_event_header header; + * u64 id; + * char path[]; + * struct sample_id sample_id; + * }; + */ + PERF_RECORD_CGROUP = 19, + PERF_RECORD_MAX, /* non-ABI */ }; @@ -1064,14 +1120,16 @@ union perf_mem_data_src { mem_lvl_num:4, /* memory hierarchy level number */ mem_remote:1, /* remote */ mem_snoopx:2, /* snoop mode, ext */ - mem_rsvd:24; + mem_blk:3, /* access blocked */ + mem_rsvd:21; }; }; #elif defined(__BIG_ENDIAN_BITFIELD) union perf_mem_data_src { __u64 val; struct { - __u64 mem_rsvd:24, + __u64 mem_rsvd:21, + mem_blk:3, /* access blocked */ mem_snoopx:2, /* snoop mode, ext */ mem_remote:1, /* remote */ mem_lvl_num:4, /* memory hierarchy level number */ @@ -1154,6 +1212,12 @@ union perf_mem_data_src { #define PERF_MEM_TLB_OS 0x40 /* OS fault handler */ #define PERF_MEM_TLB_SHIFT 26 +/* Access blocked */ +#define PERF_MEM_BLK_NA 0x01 /* not available */ 
+#define PERF_MEM_BLK_DATA 0x02 /* data could not be forwarded */ +#define PERF_MEM_BLK_ADDR 0x04 /* address conflict */ +#define PERF_MEM_BLK_SHIFT 40 + #define PERF_MEM_S(a, s) \ (((__u64)PERF_MEM_##a##_##s) << PERF_MEM_##a##_SHIFT) @@ -1185,4 +1249,23 @@ struct perf_branch_entry { reserved:40; }; +union perf_sample_weight { + __u64 full; +#if defined(__LITTLE_ENDIAN_BITFIELD) + struct { + __u32 var1_dw; + __u16 var2_w; + __u16 var3_w; + }; +#elif defined(__BIG_ENDIAN_BITFIELD) + struct { + __u16 var3_w; + __u16 var2_w; + __u32 var1_dw; + }; +#else +#error "Unknown endianness" +#endif +}; + #endif /* _UAPI_LINUX_PERF_EVENT_H */ diff --git a/tools/perf/Documentation/perf-annotate.txt b/tools/perf/Documentation/perf-annotate.txt index e8c972f89357d2dc47bbe57000e503f46d964f37..1b5042f134a8679bf0d6f0e707a839cd76bfce38 100644 --- a/tools/perf/Documentation/perf-annotate.txt +++ b/tools/perf/Documentation/perf-annotate.txt @@ -112,6 +112,12 @@ OPTIONS --objdump=:: Path to objdump binary. +--prefix=PREFIX:: +--prefix-strip=N:: + Remove the first N entries from source file path names in executables + and add PREFIX. This allows displaying source code compiled on systems + with a different file system layout. + --skip-missing:: Skip symbols that cannot be annotated. diff --git a/tools/perf/Documentation/perf-bench.txt b/tools/perf/Documentation/perf-bench.txt index 0921a3c673815c6a031278be3107f80011eb5c46..bad16512c48d7b38846fff58a1749c9ec54a9f08 100644 --- a/tools/perf/Documentation/perf-bench.txt +++ b/tools/perf/Documentation/perf-bench.txt @@ -61,6 +61,9 @@ SUBSYSTEM 'epoll':: Eventpoll (epoll) stressing benchmarks. +'internals':: + Benchmark internal perf functionality. + 'all':: All benchmark subsystems. @@ -214,6 +217,11 @@ Suite for evaluating concurrent epoll_wait calls. *ctl*:: Suite for evaluating multiple epoll_ctl calls. +SUITES FOR 'internals' +~~~~~~~~~~~~~~~~~~~~~~ +*synthesize*:: +Suite for evaluating perf's event synthesis performance.
+ SEE ALSO -------- linkperf:perf[1] diff --git a/tools/perf/Documentation/perf-data.txt b/tools/perf/Documentation/perf-data.txt index c87180764829c069c27c7c4cfaec4f43248ac29d..417bf17e265c2030d45021ae966fe2a8d0663f65 100644 --- a/tools/perf/Documentation/perf-data.txt +++ b/tools/perf/Documentation/perf-data.txt @@ -17,7 +17,7 @@ Data file related processing. COMMANDS -------- convert:: - Converts perf data file into another format (only CTF [1] format is support by now). + Converts a perf data file into another format. It's possible to set data-convert debug variable to get debug messages from conversion, like: perf --debug data-convert data convert ... @@ -27,6 +27,12 @@ OPTIONS for 'convert' --to-ctf:: Triggers the CTF conversion, specify the path of CTF data directory. +--to-json:: + Triggers JSON conversion. Specify the JSON filename to output. + +--tod:: + Convert time to wall clock time. + -i:: Specify input perf data file path. diff --git a/tools/perf/Documentation/perf-inject.txt b/tools/perf/Documentation/perf-inject.txt index a64d6588470e605845e1eb4589e8e9a7a8a6f049..c34eb5b8c762b677cc1386e71a6179d4c6c0ab33 100644 --- a/tools/perf/Documentation/perf-inject.txt +++ b/tools/perf/Documentation/perf-inject.txt @@ -24,8 +24,12 @@ information could make use of this facility. OPTIONS ------- -b:: ---build-ids=:: +--build-ids:: Inject build-ids into the output stream + +--buildid-all:: + Inject build-ids of all DSOs into the output stream + -v:: --verbose:: Be more verbose. @@ -64,6 +68,16 @@ include::itrace.txt[] --force:: Don't complain, do it. +--vm-time-correlation[=OPTIONS]:: + Some architectures may capture AUX area data which contains timestamps + affected by virtualization. This option will update those timestamps + in place, to correlate with host timestamps. The in-place update means + that an output file is not specified, and instead the input file is + modified.
The options are architecture-specific, except that they may + start with "dry-run" which will cause the file to be processed but + without updating it. Currently this option is supported only by + Intel PT; refer to linkperf:perf-intel-pt[1] + SEE ALSO -------- linkperf:perf-record[1], linkperf:perf-report[1], linkperf:perf-archive[1] diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt index 414a9f8b658df62061059e7c1e272e11c29d8564..5a666750bd4b7b6083a630e3242f0bc78a953e78 100644 --- a/tools/perf/Documentation/perf-record.txt +++ b/tools/perf/Documentation/perf-record.txt @@ -287,6 +287,12 @@ OPTIONS --phys-data:: Record the sample physical addresses. +--data-page-size:: + Record the page size of the sampled data address. + +--code-page-size:: + Record the page size of the sampled code address (ip). + -T:: --timestamp:: Record the sample timestamps. Use it with 'perf report -D' to see the @@ -389,7 +395,10 @@ displayed with the weight and local_weight sort keys. This currently works for abort events and some memory events in precise mode on modern Intel CPUs. --namespaces:: -Record events of type PERF_RECORD_NAMESPACES. +Record events of type PERF_RECORD_NAMESPACES. This enables the 'cgroup_id' sort key. + +--all-cgroups:: +Record events of type PERF_RECORD_CGROUP. This enables the 'cgroup' sort key. --transaction:: Record transaction flags for transaction related events. diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt index 7315f155803f753ea9603ed001f1daf36a09a64e..f627a962b80eebd31fc1b6bf08470c03b32f947d 100644 --- a/tools/perf/Documentation/perf-report.txt +++ b/tools/perf/Documentation/perf-report.txt @@ -367,6 +367,12 @@ OPTIONS --objdump=:: Path to objdump binary. +--prefix=PREFIX:: +--prefix-strip=N:: + Remove the first N entries from source file path names in executables + and add PREFIX. This allows displaying source code compiled on systems + with a different file system layout.
+ --group:: Show event group information together. It forces group output also if there are no groups defined in data file. diff --git a/tools/perf/Documentation/perf-top.txt b/tools/perf/Documentation/perf-top.txt index 5596129a71cf5d5136e2fba62114ebe9af411ea7..324b6b53c86b65d325dd6726bb35d82b3a03f823 100644 --- a/tools/perf/Documentation/perf-top.txt +++ b/tools/perf/Documentation/perf-top.txt @@ -158,6 +158,12 @@ Default is to monitor all CPUS. -M:: --disassembler-style=:: Set disassembler style for objdump. +--prefix=PREFIX:: +--prefix-strip=N:: + Remove the first N entries from source file path names in executables + and add PREFIX. This allows displaying source code compiled on systems + with a different file system layout. + --source:: Interleave source code with assembly code. Enabled by default, disable with --no-source. diff --git a/tools/perf/Documentation/perf.data-file-format.txt b/tools/perf/Documentation/perf.data-file-format.txt index b0152e1095c58af089b8cd41bbc9264bbcdd8aa3..99866b5ea02844b3045abacad2ac5d6fe8a53893 100644 --- a/tools/perf/Documentation/perf.data-file-format.txt +++ b/tools/perf/Documentation/perf.data-file-format.txt @@ -373,6 +373,70 @@ struct { Indicates that trace contains records of PERF_RECORD_COMPRESSED type that have perf_events records in compressed form. + HEADER_CPU_PMU_CAPS = 28, + + A list of CPU PMU capabilities. The format of data is as below. + +struct { + u32 nr_cpu_pmu_caps; + { + char name[]; + char value[]; + } [nr_cpu_pmu_caps] +}; + + +Example: + cpu pmu capabilities: branches=32, max_precise=3, pmu_name=icelake + + HEADER_CLOCK_DATA = 29, + + Contains clock id and its reference time together with wall clock + time taken at the 'same time'; both values are in nanoseconds. + The format of data is as below. + +struct { + u32 version; /* version = 1 */ + u32 clockid; + u64 wall_clock_ns; + u64 clockid_time_ns; +}; + + HEADER_HYBRID_TOPOLOGY = 30, + +Indicates the hybrid CPUs. The format of data is as below.
+ +struct { + u32 nr; + struct { + char pmu_name[]; + char cpus[]; + } [nr]; /* Variable length records */ +}; + +Example: + hybrid cpu system: + cpu_core cpu list : 0-15 + cpu_atom cpu list : 16-23 + + HEADER_PMU_CAPS = 31, + + List of pmu capabilities (except cpu pmu which is already + covered by HEADER_CPU_PMU_CAPS). Note that hybrid cpu pmu + capabilities are also stored here. + +struct { + u32 nr_pmu; + struct { + u32 nr_caps; + { + char name[]; + char value[]; + } [nr_caps]; + char pmu_name[]; + } [nr_pmu]; +}; + other bits are reserved and should ignored for now HEADER_FEAT_BITS = 256, diff --git a/tools/perf/Documentation/perf.txt b/tools/perf/Documentation/perf.txt index 401f0ed67439855b8ede71ff439c149e6f6eebbc..3f37ded13f8ce404d28b3f6261c62de144e62cd0 100644 --- a/tools/perf/Documentation/perf.txt +++ b/tools/perf/Documentation/perf.txt @@ -24,6 +24,8 @@ OPTIONS data-convert - data convert command debug messages stderr - write debug output (option -v) to stderr in browser mode + perf-event-open - Print perf_event_open() arguments and + return value --buildid-dir:: Setup buildid cache directory. 
It has higher priority than diff --git a/tools/perf/arch/arm/tests/regs_load.S b/tools/perf/arch/arm/tests/regs_load.S index 6e2495cc4517191656784a9bf61a43653901e7cc..4284307d78226f2f76fa6673c6d22339ccca5bc0 100644 --- a/tools/perf/arch/arm/tests/regs_load.S +++ b/tools/perf/arch/arm/tests/regs_load.S @@ -37,7 +37,7 @@ .text .type perf_regs_load,%function -ENTRY(perf_regs_load) +SYM_FUNC_START(perf_regs_load) str r0, [r0, #R0] str r1, [r0, #R1] str r2, [r0, #R2] @@ -56,4 +56,4 @@ ENTRY(perf_regs_load) str lr, [r0, #PC] // store pc as lr in order to skip the call // to this function mov pc, lr -ENDPROC(perf_regs_load) +SYM_FUNC_END(perf_regs_load) diff --git a/tools/perf/arch/arm/util/Build b/tools/perf/arch/arm/util/Build index 296f0eac5e18ca5f26105c0b55de9e0bdde49fa2..37fc63708966d49cc12885649f0511c6a1aeabaf 100644 --- a/tools/perf/arch/arm/util/Build +++ b/tools/perf/arch/arm/util/Build @@ -1,3 +1,5 @@ +perf-y += perf_regs.o + perf-$(CONFIG_DWARF) += dwarf-regs.o perf-$(CONFIG_LOCAL_LIBUNWIND) += unwind-libunwind.o diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c index b4fe7e41f84700898a395ae11c40474180529801..8cefba31143a0db88194583e03004be6c55c9467 100644 --- a/tools/perf/arch/arm/util/cs-etm.c +++ b/tools/perf/arch/arm/util/cs-etm.c @@ -23,6 +23,7 @@ #include "../../util/event.h" #include "../../util/evlist.h" #include "../../util/evsel.h" +#include "../../util/perf_api_probe.h" #include "../../util/evsel_config.h" #include "../../util/pmu.h" #include "../../util/cs-etm.h" @@ -226,7 +227,7 @@ static int cs_etm_set_sink_attr(struct perf_pmu *pmu, if (term->type != PERF_EVSEL__CONFIG_TERM_DRV_CFG) continue; - sink = term->val.drv_cfg; + sink = term->val.str; snprintf(path, PATH_MAX, "sinks/%s", sink); ret = perf_pmu__scan_file(pmu, path, "%x", &hash); @@ -401,7 +402,7 @@ static int cs_etm_recording_options(struct auxtrace_record *itr, * when a context switch happened. 
*/ if (!perf_cpu_map__empty(cpus)) { - perf_evsel__set_sample_bit(cs_etm_evsel, CPU); + evsel__set_sample_bit(cs_etm_evsel, CPU); err = cs_etm_set_option(itr, cs_etm_evsel, ETM_OPT_CTXTID | ETM_OPT_TS); @@ -425,7 +426,7 @@ static int cs_etm_recording_options(struct auxtrace_record *itr, /* In per-cpu case, always need the time of mmap events etc */ if (!perf_cpu_map__empty(cpus)) - perf_evsel__set_sample_bit(tracking_evsel, TIME); + evsel__set_sample_bit(tracking_evsel, TIME); } out: diff --git a/tools/perf/arch/arm/util/perf_regs.c b/tools/perf/arch/arm/util/perf_regs.c new file mode 100644 index 0000000000000000000000000000000000000000..2864e2e3776d5105d39f83f4de94d306df4a07ac --- /dev/null +++ b/tools/perf/arch/arm/util/perf_regs.c @@ -0,0 +1,6 @@ +// SPDX-License-Identifier: GPL-2.0 +#include "../../util/perf_regs.h" + +const struct sample_reg sample_reg_masks[] = { + SMPL_REG_END +}; diff --git a/tools/perf/arch/arm64/tests/regs_load.S b/tools/perf/arch/arm64/tests/regs_load.S index 07042511dca925fc3cbf3ce9594476bce0d19b47..d49de40b6818021d1ba8a75522c4cbf2901fff76 100644 --- a/tools/perf/arch/arm64/tests/regs_load.S +++ b/tools/perf/arch/arm64/tests/regs_load.S @@ -7,7 +7,7 @@ #define LDR_REG(r) ldr x##r, [x0, 8 * r] #define SP (8 * 31) #define PC (8 * 32) -ENTRY(perf_regs_load) +SYM_FUNC_START(perf_regs_load) STR_REG(0) STR_REG(1) STR_REG(2) @@ -44,4 +44,4 @@ ENTRY(perf_regs_load) str x30, [x0, #PC] LDR_REG(1) ret -ENDPROC(perf_regs_load) +SYM_FUNC_END(perf_regs_load) diff --git a/tools/perf/arch/arm64/util/Build b/tools/perf/arch/arm64/util/Build index 4ed1ebdc5dc085dbfda9219fb7798718cedccf1f..264ac3c74408eab0be0ad8d6b495e97dbf4d31d2 100644 --- a/tools/perf/arch/arm64/util/Build +++ b/tools/perf/arch/arm64/util/Build @@ -1,4 +1,5 @@ perf-y += header.o +perf-y += perf_regs.o perf-y += sym-handling.o perf-$(CONFIG_DWARF) += dwarf-regs.o perf-$(CONFIG_LOCAL_LIBUNWIND) += unwind-libunwind.o diff --git a/tools/perf/arch/arm64/util/arm-spe.c 
b/tools/perf/arch/arm64/util/arm-spe.c index 8d6821d9c3f6cea8b26c8f5019d5d4f3793ef9b9..e3593063b3d17879cc514f06a8911e5943eb22ad 100644 --- a/tools/perf/arch/arm64/util/arm-spe.c +++ b/tools/perf/arch/arm64/util/arm-spe.c @@ -11,17 +11,17 @@ #include #include -#include "../../util/cpumap.h" -#include "../../util/event.h" -#include "../../util/evsel.h" -#include "../../util/evlist.h" -#include "../../util/session.h" +#include "../../../util/cpumap.h" +#include "../../../util/event.h" +#include "../../../util/evsel.h" +#include "../../../util/evlist.h" +#include "../../../util/session.h" #include // page_size -#include "../../util/pmu.h" -#include "../../util/debug.h" -#include "../../util/auxtrace.h" -#include "../../util/record.h" -#include "../../util/arm-spe.h" +#include "../../../util/pmu.h" +#include "../../../util/debug.h" +#include "../../../util/auxtrace.h" +#include "../../../util/record.h" +#include "../../../util/arm-spe.h" #define KiB(x) ((x) * 1024) #define MiB(x) ((x) * 1024 * 1024) @@ -120,9 +120,9 @@ static int arm_spe_recording_options(struct auxtrace_record *itr, */ perf_evlist__to_front(evlist, arm_spe_evsel); - perf_evsel__set_sample_bit(arm_spe_evsel, CPU); - perf_evsel__set_sample_bit(arm_spe_evsel, TIME); - perf_evsel__set_sample_bit(arm_spe_evsel, TID); + evsel__set_sample_bit(arm_spe_evsel, CPU); + evsel__set_sample_bit(arm_spe_evsel, TIME); + evsel__set_sample_bit(arm_spe_evsel, TID); /* Add dummy event to keep tracking */ err = parse_events(evlist, "dummy:u", NULL); @@ -134,9 +134,9 @@ static int arm_spe_recording_options(struct auxtrace_record *itr, tracking_evsel->core.attr.freq = 0; tracking_evsel->core.attr.sample_period = 1; - perf_evsel__set_sample_bit(tracking_evsel, TIME); - perf_evsel__set_sample_bit(tracking_evsel, CPU); - perf_evsel__reset_sample_bit(tracking_evsel, BRANCH_STACK); + evsel__set_sample_bit(tracking_evsel, TIME); + evsel__set_sample_bit(tracking_evsel, CPU); + evsel__reset_sample_bit(tracking_evsel, BRANCH_STACK); 
return 0; } diff --git a/tools/perf/arch/arm64/util/perf_regs.c b/tools/perf/arch/arm64/util/perf_regs.c new file mode 100644 index 0000000000000000000000000000000000000000..2833e101a7c6407263130e9948a06a2caa32bc4b --- /dev/null +++ b/tools/perf/arch/arm64/util/perf_regs.c @@ -0,0 +1,6 @@ +// SPDX-License-Identifier: GPL-2.0 +#include "../../../util/perf_regs.h" + +const struct sample_reg sample_reg_masks[] = { + SMPL_REG_END +}; diff --git a/tools/perf/arch/csky/util/Build b/tools/perf/arch/csky/util/Build index 1160bb2332bad22be319c9802feff2cdd4111c8c..7d3050134ae0fd4b1bacabb41a95fa51903beff3 100644 --- a/tools/perf/arch/csky/util/Build +++ b/tools/perf/arch/csky/util/Build @@ -1,2 +1,4 @@ +perf-y += perf_regs.o + perf-$(CONFIG_DWARF) += dwarf-regs.o perf-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o diff --git a/tools/perf/arch/csky/util/perf_regs.c b/tools/perf/arch/csky/util/perf_regs.c new file mode 100644 index 0000000000000000000000000000000000000000..2864e2e3776d5105d39f83f4de94d306df4a07ac --- /dev/null +++ b/tools/perf/arch/csky/util/perf_regs.c @@ -0,0 +1,6 @@ +// SPDX-License-Identifier: GPL-2.0 +#include "../../util/perf_regs.h" + +const struct sample_reg sample_reg_masks[] = { + SMPL_REG_END +}; diff --git a/tools/perf/arch/powerpc/util/perf_regs.c b/tools/perf/arch/powerpc/util/perf_regs.c index e9c436eeffc9d8f390eb564619283e49897c0adf..0a5242900248504530b760a60886d68cb1208df9 100644 --- a/tools/perf/arch/powerpc/util/perf_regs.c +++ b/tools/perf/arch/powerpc/util/perf_regs.c @@ -4,8 +4,8 @@ #include #include -#include "../../util/perf_regs.h" -#include "../../util/debug.h" +#include "../../../util/perf_regs.h" +#include "../../../util/debug.h" #include diff --git a/tools/perf/arch/riscv/util/Build b/tools/perf/arch/riscv/util/Build index 1160bb2332bad22be319c9802feff2cdd4111c8c..7d3050134ae0fd4b1bacabb41a95fa51903beff3 100644 --- a/tools/perf/arch/riscv/util/Build +++ b/tools/perf/arch/riscv/util/Build @@ -1,2 +1,4 @@ +perf-y += perf_regs.o + 
perf-$(CONFIG_DWARF) += dwarf-regs.o perf-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o diff --git a/tools/perf/arch/riscv/util/perf_regs.c b/tools/perf/arch/riscv/util/perf_regs.c new file mode 100644 index 0000000000000000000000000000000000000000..2864e2e3776d5105d39f83f4de94d306df4a07ac --- /dev/null +++ b/tools/perf/arch/riscv/util/perf_regs.c @@ -0,0 +1,6 @@ +// SPDX-License-Identifier: GPL-2.0 +#include "../../util/perf_regs.h" + +const struct sample_reg sample_reg_masks[] = { + SMPL_REG_END +}; diff --git a/tools/perf/arch/s390/util/Build b/tools/perf/arch/s390/util/Build index 22797f043b844378d7889d6243a38b079e7336d1..3d9d0f4f72ca1f50f70bfb0e57e82a1d62bbebdc 100644 --- a/tools/perf/arch/s390/util/Build +++ b/tools/perf/arch/s390/util/Build @@ -1,5 +1,6 @@ perf-y += header.o perf-y += kvm-stat.o +perf-y += perf_regs.o perf-$(CONFIG_DWARF) += dwarf-regs.o perf-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o diff --git a/tools/perf/arch/s390/util/perf_regs.c b/tools/perf/arch/s390/util/perf_regs.c new file mode 100644 index 0000000000000000000000000000000000000000..2864e2e3776d5105d39f83f4de94d306df4a07ac --- /dev/null +++ b/tools/perf/arch/s390/util/perf_regs.c @@ -0,0 +1,6 @@ +// SPDX-License-Identifier: GPL-2.0 +#include "../../util/perf_regs.h" + +const struct sample_reg sample_reg_masks[] = { + SMPL_REG_END +}; diff --git a/tools/perf/arch/x86/tests/perf-time-to-tsc.c b/tools/perf/arch/x86/tests/perf-time-to-tsc.c index fa947952c16a16957704b8b4e9c3d03ce60e6f06..909ead08a6f6e1397d786d698845cb76f1174db0 100644 --- a/tools/perf/arch/x86/tests/perf-time-to-tsc.c +++ b/tools/perf/arch/x86/tests/perf-time-to-tsc.c @@ -9,6 +9,7 @@ #include #include #include +#include #include "debug.h" #include "parse-events.h" @@ -117,10 +118,10 @@ int test__perf_time_to_tsc(struct test *test __maybe_unused, int subtest __maybe for (i = 0; i < evlist->core.nr_mmaps; i++) { md = &evlist->mmap[i]; - if (perf_mmap__read_init(md) < 0) + if (perf_mmap__read_init(&md->core) < 0) 
continue; - while ((event = perf_mmap__read_event(md)) != NULL) { + while ((event = perf_mmap__read_event(&md->core)) != NULL) { struct perf_sample sample; if (event->header.type != PERF_RECORD_COMM || @@ -139,9 +140,9 @@ int test__perf_time_to_tsc(struct test *test __maybe_unused, int subtest __maybe comm2_time = sample.time; } next_event: - perf_mmap__consume(md); + perf_mmap__consume(&md->core); } - perf_mmap__read_done(md); + perf_mmap__read_done(&md->core); } if (!comm1_time || !comm2_time) diff --git a/tools/perf/arch/x86/tests/regs_load.S b/tools/perf/arch/x86/tests/regs_load.S index bbe5a0d16e51083d5432c30f6a45dc8fcec7211f..80f14f52e3f6072fcc1ca179e544a2b945fac5e5 100644 --- a/tools/perf/arch/x86/tests/regs_load.S +++ b/tools/perf/arch/x86/tests/regs_load.S @@ -28,7 +28,7 @@ .text #ifdef HAVE_ARCH_X86_64_SUPPORT -ENTRY(perf_regs_load) +SYM_FUNC_START(perf_regs_load) movq %rax, AX(%rdi) movq %rbx, BX(%rdi) movq %rcx, CX(%rdi) @@ -60,9 +60,9 @@ ENTRY(perf_regs_load) movq %r14, R14(%rdi) movq %r15, R15(%rdi) ret -ENDPROC(perf_regs_load) +SYM_FUNC_END(perf_regs_load) #else -ENTRY(perf_regs_load) +SYM_FUNC_START(perf_regs_load) push %edi movl 8(%esp), %edi movl %eax, AX(%edi) @@ -88,7 +88,7 @@ ENTRY(perf_regs_load) movl $0, FS(%edi) movl $0, GS(%edi) ret -ENDPROC(perf_regs_load) +SYM_FUNC_END(perf_regs_load) #endif /* diff --git a/tools/perf/arch/x86/util/Build b/tools/perf/arch/x86/util/Build index 47f9c56e744f8c3751e6cf22ec9f738182ea8370..1504b356f827b87947e85189c3718159b8e69da2 100644 --- a/tools/perf/arch/x86/util/Build +++ b/tools/perf/arch/x86/util/Build @@ -6,6 +6,7 @@ perf-y += perf_regs.o perf-y += group.o perf-y += machine.o perf-y += event.o +perf-y += evsel.o perf-$(CONFIG_DWARF) += dwarf-regs.o perf-$(CONFIG_BPF_PROLOGUE) += dwarf-regs.o diff --git a/tools/perf/arch/x86/util/auxtrace.c b/tools/perf/arch/x86/util/auxtrace.c index 96f4a2c118937171f830cf775a875dca5feff5a2..330d03216b0e66c653ea9ee0ba5e5167ecb6824e 100644 --- 
a/tools/perf/arch/x86/util/auxtrace.c +++ b/tools/perf/arch/x86/util/auxtrace.c @@ -7,13 +7,13 @@ #include #include -#include "../../util/header.h" -#include "../../util/debug.h" -#include "../../util/pmu.h" -#include "../../util/auxtrace.h" -#include "../../util/intel-pt.h" -#include "../../util/intel-bts.h" -#include "../../util/evlist.h" +#include "../../../util/header.h" +#include "../../../util/debug.h" +#include "../../../util/pmu.h" +#include "../../../util/auxtrace.h" +#include "../../../util/intel-pt.h" +#include "../../../util/intel-bts.h" +#include "../../../util/evlist.h" static struct auxtrace_record *auxtrace_record__init_intel(struct evlist *evlist, diff --git a/tools/perf/arch/x86/util/event.c b/tools/perf/arch/x86/util/event.c index d357c625c09ffb0e9ae6ea8a90f05a0425b5d4ee..91d938e88b42a8015368ba6e1ce45cd97147fccd 100644 --- a/tools/perf/arch/x86/util/event.c +++ b/tools/perf/arch/x86/util/event.c @@ -3,12 +3,12 @@ #include #include -#include "../../util/event.h" -#include "../../util/synthetic-events.h" -#include "../../util/machine.h" -#include "../../util/tool.h" -#include "../../util/map.h" -#include "../../util/debug.h" +#include "../../../util/event.h" +#include "../../../util/synthetic-events.h" +#include "../../../util/machine.h" +#include "../../../util/tool.h" +#include "../../../util/map.h" +#include "../../../util/debug.h" #if defined(__x86_64__) diff --git a/tools/perf/arch/x86/util/evsel.c b/tools/perf/arch/x86/util/evsel.c new file mode 100644 index 0000000000000000000000000000000000000000..6222bbb26c652ef41c1f441cd0f073779c071d6d --- /dev/null +++ b/tools/perf/arch/x86/util/evsel.c @@ -0,0 +1,83 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include "util/evsel.h" +#include "util/env.h" +#include "util/pmu.h" +#include "linux/string.h" +#include "util/debug.h" + +#define IBS_FETCH_L3MISSONLY (1ULL << 59) +#define IBS_OP_L3MISSONLY (1ULL << 16) + +void arch_evsel__set_sample_weight(struct evsel *evsel) +{ + 
evsel__set_sample_bit(evsel, WEIGHT_STRUCT); +} + +void arch_evsel__fixup_new_cycles(struct perf_event_attr *attr) +{ + struct perf_env env = { .total_mem = 0, } ; + + if (!perf_env__cpuid(&env)) + return; + + /* + * On AMD, precise cycles event sampling internally uses IBS pmu. + * But IBS does not have filtering capabilities and perf by default + * sets exclude_guest = 1. This makes IBS pmu event init fail and + * thus perf ends up doing non-precise sampling. Avoid it by clearing + * exclude_guest. + */ + if (env.cpuid && strstarts(env.cpuid, "AuthenticAMD")) + attr->exclude_guest = 0; + + free(env.cpuid); +} +static void ibs_l3miss_warn(void) +{ + pr_warning( +"WARNING: Hw internally resets sampling period when L3 Miss Filtering is enabled\n" +"and tagged operation does not cause L3 Miss. This causes sampling period skew.\n"); +} + +void arch__post_evsel_config(struct evsel *evsel, struct perf_event_attr *attr) +{ + struct perf_pmu *evsel_pmu, *ibs_fetch_pmu, *ibs_op_pmu; + static int warned_once; + /* 0: Uninitialized, 1: Yes, -1: No */ + static int is_amd; + + if (warned_once || is_amd == -1) + return; + + if (!is_amd) { + struct perf_env *env = perf_evsel__env(evsel); + + if (!perf_env__cpuid(env) || !env->cpuid || + !strstarts(env->cpuid, "AuthenticAMD")) { + is_amd = -1; + return; + } + is_amd = 1; + } + + evsel_pmu = evsel__find_pmu(evsel); + if (!evsel_pmu) + return; + + ibs_fetch_pmu = perf_pmu__find("ibs_fetch"); + ibs_op_pmu = perf_pmu__find("ibs_op"); + + if (ibs_fetch_pmu && ibs_fetch_pmu->type == evsel_pmu->type) { + if (attr->config & IBS_FETCH_L3MISSONLY) { + ibs_l3miss_warn(); + warned_once = 1; + } + } else if (ibs_op_pmu && ibs_op_pmu->type == evsel_pmu->type) { + if (attr->config & IBS_OP_L3MISSONLY) { + ibs_l3miss_warn(); + warned_once = 1; + } + } +} diff --git a/tools/perf/arch/x86/util/header.c b/tools/perf/arch/x86/util/header.c index aa6deb463bf3c7407fd47cde1edcd933c68c42ff..578c8c568ffd6f1210a421052fd2ad5ba5242879 100644 --- 
a/tools/perf/arch/x86/util/header.c +++ b/tools/perf/arch/x86/util/header.c @@ -7,8 +7,8 @@ #include #include -#include "../../util/debug.h" -#include "../../util/header.h" +#include "../../../util/debug.h" +#include "../../../util/header.h" static inline void cpuid(unsigned int op, unsigned int *a, unsigned int *b, unsigned int *c, diff --git a/tools/perf/arch/x86/util/intel-bts.c b/tools/perf/arch/x86/util/intel-bts.c index 7d22dbdf0f1dc45139b728ca378a5b0fe49d9950..138d0058a6ab33adf420e80b210fa0416f50ac8a 100644 --- a/tools/perf/arch/x86/util/intel-bts.c +++ b/tools/perf/arch/x86/util/intel-bts.c @@ -11,18 +11,18 @@ #include #include -#include "../../util/cpumap.h" -#include "../../util/event.h" -#include "../../util/evsel.h" -#include "../../util/evlist.h" -#include "../../util/mmap.h" -#include "../../util/session.h" -#include "../../util/pmu.h" -#include "../../util/debug.h" -#include "../../util/record.h" -#include "../../util/tsc.h" -#include "../../util/auxtrace.h" -#include "../../util/intel-bts.h" +#include "../../../util/cpumap.h" +#include "../../../util/event.h" +#include "../../../util/evsel.h" +#include "../../../util/evlist.h" +#include "../../../util/mmap.h" +#include "../../../util/session.h" +#include "../../../util/pmu.h" +#include "../../../util/debug.h" +#include "../../../util/record.h" +#include "../../../util/tsc.h" +#include "../../../util/auxtrace.h" +#include "../../../util/intel-bts.h" #include // page_size #define KiB(x) ((x) * 1024) @@ -219,7 +219,7 @@ static int intel_bts_recording_options(struct auxtrace_record *itr, * AUX event. 
*/ if (!perf_cpu_map__empty(cpus)) - perf_evsel__set_sample_bit(intel_bts_evsel, CPU); + evsel__set_sample_bit(intel_bts_evsel, CPU); } /* Add dummy event to keep tracking */ diff --git a/tools/perf/arch/x86/util/intel-pt.c b/tools/perf/arch/x86/util/intel-pt.c index 761140ff14f20606f5dfc8b01e118791c8ecb0a8..4b4dffd2b4c40156f9066081696e52739e4f2040 100644 --- a/tools/perf/arch/x86/util/intel-pt.c +++ b/tools/perf/arch/x86/util/intel-pt.c @@ -13,22 +13,23 @@ #include #include -#include "../../util/session.h" -#include "../../util/event.h" -#include "../../util/evlist.h" -#include "../../util/evsel.h" -#include "../../util/cpumap.h" -#include "../../util/mmap.h" +#include "../../../util/session.h" +#include "../../../util/event.h" +#include "../../../util/evlist.h" +#include "../../../util/evsel.h" +#include "../../../util/cpumap.h" +#include "../../../util/mmap.h" #include -#include "../../util/parse-events.h" -#include "../../util/pmu.h" -#include "../../util/debug.h" -#include "../../util/auxtrace.h" -#include "../../util/record.h" -#include "../../util/target.h" -#include "../../util/tsc.h" +#include "../../../util/parse-events.h" +#include "../../../util/pmu.h" +#include "../../../util/debug.h" +#include "../../../util/auxtrace.h" +#include "../../../util/perf_api_probe.h" +#include "../../../util/record.h" +#include "../../../util/target.h" +#include "../../../util/tsc.h" #include // page_size -#include "../../util/intel-pt.h" +#include "../../../util/intel-pt.h" #define KiB(x) ((x) * 1024) #define MiB(x) ((x) * 1024 * 1024) @@ -419,8 +420,8 @@ static int intel_pt_track_switches(struct evlist *evlist) evsel = evlist__last(evlist); - perf_evsel__set_sample_bit(evsel, CPU); - perf_evsel__set_sample_bit(evsel, TIME); + evsel__set_sample_bit(evsel, CPU); + evsel__set_sample_bit(evsel, TIME); evsel->core.system_wide = true; evsel->no_aux_samples = true; @@ -728,10 +729,10 @@ static int intel_pt_recording_options(struct auxtrace_record *itr, 
switch_evsel->no_aux_samples = true; switch_evsel->immediate = true; - perf_evsel__set_sample_bit(switch_evsel, TID); - perf_evsel__set_sample_bit(switch_evsel, TIME); - perf_evsel__set_sample_bit(switch_evsel, CPU); - perf_evsel__reset_sample_bit(switch_evsel, BRANCH_STACK); + evsel__set_sample_bit(switch_evsel, TID); + evsel__set_sample_bit(switch_evsel, TIME); + evsel__set_sample_bit(switch_evsel, CPU); + evsel__reset_sample_bit(switch_evsel, BRANCH_STACK); opts->record_switch_events = false; ptr->have_sched_switch = 3; @@ -765,7 +766,7 @@ static int intel_pt_recording_options(struct auxtrace_record *itr, * AUX event. */ if (!perf_cpu_map__empty(cpus)) - perf_evsel__set_sample_bit(intel_pt_evsel, CPU); + evsel__set_sample_bit(intel_pt_evsel, CPU); } /* Add dummy event to keep tracking */ @@ -789,11 +790,11 @@ static int intel_pt_recording_options(struct auxtrace_record *itr, /* In per-cpu case, always need the time of mmap events etc */ if (!perf_cpu_map__empty(cpus)) { - perf_evsel__set_sample_bit(tracking_evsel, TIME); + evsel__set_sample_bit(tracking_evsel, TIME); /* And the CPU for switch events */ - perf_evsel__set_sample_bit(tracking_evsel, CPU); + evsel__set_sample_bit(tracking_evsel, CPU); } - perf_evsel__reset_sample_bit(tracking_evsel, BRANCH_STACK); + evsel__reset_sample_bit(tracking_evsel, BRANCH_STACK); } /* diff --git a/tools/perf/arch/x86/util/machine.c b/tools/perf/arch/x86/util/machine.c index e17e080e76f49b1ed1bcf3415f473e2d1b21ebb1..31679c35d493e87d10af534fa1c5fc9732d8191d 100644 --- a/tools/perf/arch/x86/util/machine.c +++ b/tools/perf/arch/x86/util/machine.c @@ -5,9 +5,9 @@ #include #include // page_size -#include "../../util/machine.h" -#include "../../util/map.h" -#include "../../util/symbol.h" +#include "../../../util/machine.h" +#include "../../../util/map.h" +#include "../../../util/symbol.h" #include #include diff --git a/tools/perf/arch/x86/util/perf_regs.c b/tools/perf/arch/x86/util/perf_regs.c index 
c218b83e063b52510697a41c047b328d1e17f36f..fca81b39b09f4f65a2f4d64d05264197e8893388 100644 --- a/tools/perf/arch/x86/util/perf_regs.c +++ b/tools/perf/arch/x86/util/perf_regs.c @@ -5,10 +5,10 @@ #include #include -#include "../../perf-sys.h" -#include "../../util/perf_regs.h" -#include "../../util/debug.h" -#include "../../util/event.h" +#include "../../../perf-sys.h" +#include "../../../util/perf_regs.h" +#include "../../../util/debug.h" +#include "../../../util/event.h" const struct sample_reg sample_reg_masks[] = { SMPL_REG(AX, PERF_REG_X86_AX), diff --git a/tools/perf/arch/x86/util/pmu.c b/tools/perf/arch/x86/util/pmu.c index e33ef5bc31c57f08aa40ac682b4ca8931aba7751..d48d608517fd273212b0c8c0da310aa8075085a4 100644 --- a/tools/perf/arch/x86/util/pmu.c +++ b/tools/perf/arch/x86/util/pmu.c @@ -4,9 +4,9 @@ #include #include -#include "../../util/intel-pt.h" -#include "../../util/intel-bts.h" -#include "../../util/pmu.h" +#include "../../../util/intel-pt.h" +#include "../../../util/intel-bts.h" +#include "../../../util/pmu.h" struct perf_event_attr *perf_pmu__get_default_config(struct perf_pmu *pmu __maybe_unused) { diff --git a/tools/perf/bench/Build b/tools/perf/bench/Build index e4e321b6f8835e88a6d3fe28e18459aeed9ccb9a..075132693375c4c9a15aa97a67b67eb34f2db6ab 100644 --- a/tools/perf/bench/Build +++ b/tools/perf/bench/Build @@ -6,9 +6,10 @@ perf-y += futex-wake.o perf-y += futex-wake-parallel.o perf-y += futex-requeue.o perf-y += futex-lock-pi.o - perf-y += epoll-wait.o perf-y += epoll-ctl.o +perf-y += synthesize.o +perf-y += inject-buildid.o perf-$(CONFIG_X86_64) += mem-memcpy-x86-64-lib.o perf-$(CONFIG_X86_64) += mem-memcpy-x86-64-asm.o diff --git a/tools/perf/bench/bench.h b/tools/perf/bench/bench.h index d9329ae84e178993042534d5e89d219dffa3fe34..02b93a7a6f4a35184a45348303a8349f3a9e620e 100644 --- a/tools/perf/bench/bench.h +++ b/tools/perf/bench/bench.h @@ -29,9 +29,10 @@ int bench_futex_wake_parallel(int argc, const char **argv); int bench_futex_requeue(int 
argc, const char **argv); /* pi futexes */ int bench_futex_lock_pi(int argc, const char **argv); - int bench_epoll_wait(int argc, const char **argv); int bench_epoll_ctl(int argc, const char **argv); +int bench_synthesize(int argc, const char **argv); +int bench_inject_build_id(int argc, const char **argv); #define BENCH_FORMAT_DEFAULT_STR "default" #define BENCH_FORMAT_DEFAULT 0 diff --git a/tools/perf/bench/inject-buildid.c b/tools/perf/bench/inject-buildid.c new file mode 100644 index 0000000000000000000000000000000000000000..0fccf2a9e95b29b96e2e7bab664bdcbbcdb217dd --- /dev/null +++ b/tools/perf/bench/inject-buildid.c @@ -0,0 +1,460 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "bench.h" +#include "util/data.h" +#include "util/stat.h" +#include "util/debug.h" +#include "util/event.h" +#include "util/symbol.h" +#include "util/session.h" +#include "util/build-id.h" +#include "util/synthetic-events.h" + +#define MMAP_DEV_MAJOR 8 +#define DSO_MMAP_RATIO 4 + +static unsigned int iterations = 100; +static unsigned int nr_mmaps = 100; +static unsigned int nr_samples = 100; /* samples per mmap */ + +static u64 bench_sample_type; +static u16 bench_id_hdr_size; + +struct bench_data { + int pid; + int input_pipe[2]; + int output_pipe[2]; + pthread_t th; +}; + +struct bench_dso { + struct list_head list; + char *name; + int ino; +}; + +static int nr_dsos; +static struct bench_dso *dsos; + +extern int cmd_inject(int argc, const char *argv[]); + +static const struct option options[] = { + OPT_UINTEGER('i', "iterations", &iterations, + "Number of iterations used to compute average (default: 100)"), + OPT_UINTEGER('m', "nr-mmaps", &nr_mmaps, + "Number of mmap events for each iteration (default: 100)"), + OPT_UINTEGER('n', "nr-samples", &nr_samples, + "Number of sample events per mmap event (default: 100)"), + OPT_INCR('v', 
"verbose", &verbose, + "be more verbose (show iteration count, DSO name, etc)"), + OPT_END() +}; + +static const char *const bench_usage[] = { + "perf bench internals inject-build-id ", + NULL +}; + +/* + * Helper for collect_dso that adds the given file as a dso to dso_list + * if it contains a build-id. Stops after collecting 4 times more than + * we need (for MMAP2 events). + */ +static int add_dso(const char *fpath, const struct stat *sb __maybe_unused, + int typeflag, struct FTW *ftwbuf __maybe_unused) +{ + struct bench_dso *dso = &dsos[nr_dsos]; + unsigned char build_id[BUILD_ID_SIZE]; + + if (typeflag == FTW_D || typeflag == FTW_SL) + return 0; + + if (filename__read_build_id(fpath, build_id, BUILD_ID_SIZE) < 0) + return 0; + + dso->name = realpath(fpath, NULL); + if (dso->name == NULL) + return -1; + + dso->ino = nr_dsos++; + pr_debug2(" Adding DSO: %s\n", fpath); + + /* stop if we collected enough DSOs */ + if ((unsigned int)nr_dsos == DSO_MMAP_RATIO * nr_mmaps) + return 1; + + return 0; +} + +static void collect_dso(void) +{ + dsos = calloc(nr_mmaps * DSO_MMAP_RATIO, sizeof(*dsos)); + if (dsos == NULL) { + printf(" Memory allocation failed\n"); + exit(1); + } + + if (nftw("/usr/lib/", add_dso, 10, FTW_PHYS) < 0) + return; + + pr_debug(" Collected %d DSOs\n", nr_dsos); +} + +static void release_dso(void) +{ + int i; + + for (i = 0; i < nr_dsos; i++) { + struct bench_dso *dso = &dsos[i]; + + free(dso->name); + } + free(dsos); +} + +/* Fake address used by mmap and sample events */ +static u64 dso_map_addr(struct bench_dso *dso) +{ + return 0x400000ULL + dso->ino * 8192ULL; +} + +static u32 synthesize_attr(struct bench_data *data) +{ + union perf_event event; + + memset(&event, 0, sizeof(event.attr) + sizeof(u64)); + + event.header.type = PERF_RECORD_HEADER_ATTR; + event.header.size = sizeof(event.attr) + sizeof(u64); + + event.attr.attr.type = PERF_TYPE_SOFTWARE; + event.attr.attr.config = PERF_COUNT_SW_TASK_CLOCK; + event.attr.attr.exclude_kernel = 1; + 
event.attr.attr.sample_id_all = 1; + event.attr.attr.sample_type = bench_sample_type; + + return writen(data->input_pipe[1], &event, event.header.size); +} + +static u32 synthesize_fork(struct bench_data *data) +{ + union perf_event event; + + memset(&event, 0, sizeof(event.fork) + bench_id_hdr_size); + + event.header.type = PERF_RECORD_FORK; + event.header.misc = PERF_RECORD_MISC_FORK_EXEC; + event.header.size = sizeof(event.fork) + bench_id_hdr_size; + + event.fork.ppid = 1; + event.fork.ptid = 1; + event.fork.pid = data->pid; + event.fork.tid = data->pid; + + return writen(data->input_pipe[1], &event, event.header.size); +} + +static u32 synthesize_mmap(struct bench_data *data, struct bench_dso *dso, + u64 timestamp) +{ + union perf_event event; + size_t len = offsetof(struct perf_record_mmap2, filename); + u64 *id_hdr_ptr = (void *)&event; + int ts_idx; + + len += roundup(strlen(dso->name) + 1, 8) + bench_id_hdr_size; + + memset(&event, 0, min(len, sizeof(event.mmap2))); + + event.header.type = PERF_RECORD_MMAP2; + event.header.misc = PERF_RECORD_MISC_USER; + event.header.size = len; + + event.mmap2.pid = data->pid; + event.mmap2.tid = data->pid; + event.mmap2.maj = MMAP_DEV_MAJOR; + event.mmap2.ino = dso->ino; + + strcpy(event.mmap2.filename, dso->name); + + event.mmap2.start = dso_map_addr(dso); + event.mmap2.len = 4096; + event.mmap2.prot = PROT_EXEC; + + if (len > sizeof(event.mmap2)) { + /* write mmap2 event first */ + writen(data->input_pipe[1], &event, len - bench_id_hdr_size); + /* zero-fill sample id header */ + memset(id_hdr_ptr, 0, bench_id_hdr_size); + /* put timestamp in the right position */ + ts_idx = (bench_id_hdr_size / sizeof(u64)) - 2; + id_hdr_ptr[ts_idx] = timestamp; + writen(data->input_pipe[1], id_hdr_ptr, bench_id_hdr_size); + } else { + ts_idx = (len / sizeof(u64)) - 2; + id_hdr_ptr[ts_idx] = timestamp; + writen(data->input_pipe[1], &event, len); + } + return len; +} + +static u32 synthesize_sample(struct bench_data *data, struct 
bench_dso *dso, + u64 timestamp) +{ + union perf_event event; + struct perf_sample sample = { + .tid = data->pid, + .pid = data->pid, + .ip = dso_map_addr(dso), + .time = timestamp, + }; + + event.header.type = PERF_RECORD_SAMPLE; + event.header.misc = PERF_RECORD_MISC_USER; + event.header.size = perf_event__sample_event_size(&sample, bench_sample_type, 0); + + perf_event__synthesize_sample(&event, bench_sample_type, 0, &sample); + + return writen(data->input_pipe[1], &event, event.header.size); +} + +static u32 synthesize_flush(struct bench_data *data) +{ + struct perf_event_header header = { + .size = sizeof(header), + .type = PERF_RECORD_FINISHED_ROUND, + }; + + return writen(data->input_pipe[1], &header, header.size); +} + +static void *data_reader(void *arg) +{ + struct bench_data *data = arg; + char buf[8192]; + int flag; + int n; + + flag = fcntl(data->output_pipe[0], F_GETFL); + fcntl(data->output_pipe[0], F_SETFL, flag | O_NONBLOCK); + + /* read out data from child */ + while (true) { + n = read(data->output_pipe[0], buf, sizeof(buf)); + if (n > 0) + continue; + if (n == 0) + break; + + if (errno != EINTR && errno != EAGAIN) + break; + + usleep(100); + } + + close(data->output_pipe[0]); + return NULL; +} + +static int setup_injection(struct bench_data *data) +{ + int ready_pipe[2]; + int dev_null_fd; + char buf; + + if (pipe(ready_pipe) < 0) + return -1; + + if (pipe(data->input_pipe) < 0) + return -1; + + if (pipe(data->output_pipe) < 0) + return -1; + + data->pid = fork(); + if (data->pid < 0) + return -1; + + if (data->pid == 0) { + const char **inject_argv; + + close(data->input_pipe[1]); + close(data->output_pipe[0]); + close(ready_pipe[0]); + + dup2(data->input_pipe[0], STDIN_FILENO); + close(data->input_pipe[0]); + dup2(data->output_pipe[1], STDOUT_FILENO); + close(data->output_pipe[1]); + + dev_null_fd = open("/dev/null", O_WRONLY); + if (dev_null_fd < 0) + exit(1); + + dup2(dev_null_fd, STDERR_FILENO); + + inject_argv = calloc(3, 
sizeof(*inject_argv)); + if (inject_argv == NULL) + exit(1); + + inject_argv[0] = strdup("inject"); + inject_argv[1] = strdup("-b"); + + /* signal that we're ready to go */ + close(ready_pipe[1]); + + cmd_inject(2, inject_argv); + + exit(0); + } + + pthread_create(&data->th, NULL, data_reader, data); + + close(ready_pipe[1]); + close(data->input_pipe[0]); + close(data->output_pipe[1]); + + /* wait for child ready */ + if (read(ready_pipe[0], &buf, 1) < 0) + return -1; + close(ready_pipe[0]); + + return 0; +} + +static int inject_build_id(struct bench_data *data, u64 *max_rss) +{ + int status; + unsigned int i, k; + struct rusage rusage; + u64 len = 0; + + /* this makes the child run */ + if (perf_header__write_pipe(data->input_pipe[1]) < 0) + return -1; + + len += synthesize_attr(data); + len += synthesize_fork(data); + + for (i = 0; i < nr_mmaps; i++) { + int idx = rand() % (nr_dsos - 1); + struct bench_dso *dso = &dsos[idx]; + u64 timestamp = rand() % 1000000; + + pr_debug2(" [%d] injecting: %s\n", i+1, dso->name); + len += synthesize_mmap(data, dso, timestamp); + + for (k = 0; k < nr_samples; k++) + len += synthesize_sample(data, dso, timestamp + k * 1000); + + if ((i + 1) % 10 == 0) + len += synthesize_flush(data); + } + + /* this makes the child finish */ + close(data->input_pipe[1]); + + wait4(data->pid, &status, 0, &rusage); + *max_rss = rusage.ru_maxrss; + + pr_debug(" Child %d exited with %d\n", data->pid, status); + + return 0; +} + +static int do_inject_loop(struct bench_data *data) +{ + unsigned int i; + struct stats time_stats, mem_stats; + double time_average, time_stddev; + double mem_average, mem_stddev; + + srand(time(NULL)); + init_stats(&time_stats); + init_stats(&mem_stats); + symbol__init(NULL); + + bench_sample_type = PERF_SAMPLE_IDENTIFIER | PERF_SAMPLE_IP; + bench_sample_type |= PERF_SAMPLE_TID | PERF_SAMPLE_TIME; + bench_id_hdr_size = 32; + + collect_dso(); + if (nr_dsos == 0) { + printf(" Cannot collect DSOs for injection\n"); + 
return -1; + } + + for (i = 0; i < iterations; i++) { + struct timeval start, end, diff; + u64 runtime_us, max_rss; + + pr_debug(" Iteration #%d\n", i+1); + + if (setup_injection(data) < 0) { + printf(" Build-id injection setup failed\n"); + break; + } + + gettimeofday(&start, NULL); + if (inject_build_id(data, &max_rss) < 0) { + printf(" Build-id injection failed\n"); + break; + } + + gettimeofday(&end, NULL); + timersub(&end, &start, &diff); + runtime_us = diff.tv_sec * USEC_PER_SEC + diff.tv_usec; + update_stats(&time_stats, runtime_us); + update_stats(&mem_stats, max_rss); + + pthread_join(data->th, NULL); + } + + time_average = avg_stats(&time_stats) / USEC_PER_MSEC; + time_stddev = stddev_stats(&time_stats) / USEC_PER_MSEC; + printf(" Average build-id injection took: %.3f msec (+- %.3f msec)\n", + time_average, time_stddev); + + /* each iteration, it processes MMAP2 + BUILD_ID + nr_samples * SAMPLE */ + time_average = avg_stats(&time_stats) / (nr_mmaps * (nr_samples + 2)); + time_stddev = stddev_stats(&time_stats) / (nr_mmaps * (nr_samples + 2)); + printf(" Average time per event: %.3f usec (+- %.3f usec)\n", + time_average, time_stddev); + + mem_average = avg_stats(&mem_stats); + mem_stddev = stddev_stats(&mem_stats); + printf(" Average memory usage: %.0f KB (+- %.0f KB)\n", + mem_average, mem_stddev); + + release_dso(); + return 0; +} + +int bench_inject_build_id(int argc, const char **argv) +{ + struct bench_data data; + + argc = parse_options(argc, argv, options, bench_usage, 0); + if (argc) { + usage_with_options(bench_usage, options); + exit(EXIT_FAILURE); + } + + return do_inject_loop(&data); +} + diff --git a/tools/perf/bench/synthesize.c b/tools/perf/bench/synthesize.c new file mode 100644 index 0000000000000000000000000000000000000000..05f7c923c745b4e8d2acca73c4c50e0cc7be8ae3 --- /dev/null +++ b/tools/perf/bench/synthesize.c @@ -0,0 +1,262 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Benchmark synthesis of perf events such as at the start of a 
'perf + * record'. Synthesis is done on the current process and the 'dummy' event + * handlers are invoked that support dump_trace but otherwise do nothing. + * + * Copyright 2019 Google LLC. + */ +#include +#include "bench.h" +#include "../util/debug.h" +#include "../util/session.h" +#include "../util/stat.h" +#include "../util/synthetic-events.h" +#include "../util/target.h" +#include "../util/thread_map.h" +#include "../util/tool.h" +#include "../util/util.h" +#include +#include +#include +#include + +static unsigned int min_threads = 1; +static unsigned int max_threads = UINT_MAX; +static unsigned int single_iterations = 10000; +static unsigned int multi_iterations = 10; +static bool run_st; +static bool run_mt; + +static const struct option options[] = { + OPT_BOOLEAN('s', "st", &run_st, "Run single threaded benchmark"), + OPT_BOOLEAN('t', "mt", &run_mt, "Run multi-threaded benchmark"), + OPT_UINTEGER('m', "min-threads", &min_threads, + "Minimum number of threads in multithreaded bench"), + OPT_UINTEGER('M', "max-threads", &max_threads, + "Maximum number of threads in multithreaded bench"), + OPT_UINTEGER('i', "single-iterations", &single_iterations, + "Number of iterations used to compute single-threaded average"), + OPT_UINTEGER('I', "multi-iterations", &multi_iterations, + "Number of iterations used to compute multi-threaded average"), + OPT_END() +}; + +static const char *const bench_usage[] = { + "perf bench internals synthesize ", + NULL +}; + +static atomic_t event_count; + +static int process_synthesized_event(struct perf_tool *tool __maybe_unused, + union perf_event *event __maybe_unused, + struct perf_sample *sample __maybe_unused, + struct machine *machine __maybe_unused) +{ + atomic_inc(&event_count); + return 0; +} + +static int do_run_single_threaded(struct perf_session *session, + struct perf_thread_map *threads, + struct target *target, bool data_mmap) +{ + const unsigned int nr_threads_synthesize = 1; + struct timeval start, end, diff; + u64 
runtime_us; + unsigned int i; + double time_average, time_stddev, event_average, event_stddev; + int err; + struct stats time_stats, event_stats; + + init_stats(&time_stats); + init_stats(&event_stats); + + for (i = 0; i < single_iterations; i++) { + atomic_set(&event_count, 0); + gettimeofday(&start, NULL); + err = __machine__synthesize_threads(&session->machines.host, + NULL, + target, threads, + process_synthesized_event, + data_mmap, + nr_threads_synthesize); + if (err) + return err; + + gettimeofday(&end, NULL); + timersub(&end, &start, &diff); + runtime_us = diff.tv_sec * USEC_PER_SEC + diff.tv_usec; + update_stats(&time_stats, runtime_us); + update_stats(&event_stats, atomic_read(&event_count)); + } + + time_average = avg_stats(&time_stats); + time_stddev = stddev_stats(&time_stats); + printf(" Average %ssynthesis took: %.3f usec (+- %.3f usec)\n", + data_mmap ? "data " : "", time_average, time_stddev); + + event_average = avg_stats(&event_stats); + event_stddev = stddev_stats(&event_stats); + printf(" Average num. 
events: %.3f (+- %.3f)\n", + event_average, event_stddev); + + printf(" Average time per event %.3f usec\n", + time_average / event_average); + return 0; +} + +static int run_single_threaded(void) +{ + struct perf_session *session; + struct target target = { + .pid = "self", + }; + struct perf_thread_map *threads; + int err; + + perf_set_singlethreaded(); + session = perf_session__new(NULL, NULL); + if (IS_ERR(session)) { + pr_err("Session creation failed.\n"); + return PTR_ERR(session); + } + threads = thread_map__new_by_pid(getpid()); + if (!threads) { + pr_err("Thread map creation failed.\n"); + err = -ENOMEM; + goto err_out; + } + + puts( +"Computing performance of single threaded perf event synthesis by\n" +"synthesizing events on the perf process itself:"); + + err = do_run_single_threaded(session, threads, &target, false); + if (err) + goto err_out; + + err = do_run_single_threaded(session, threads, &target, true); + +err_out: + if (threads) + perf_thread_map__put(threads); + + perf_session__delete(session); + return err; +} + +static int do_run_multi_threaded(struct target *target, + unsigned int nr_threads_synthesize) +{ + struct timeval start, end, diff; + u64 runtime_us; + unsigned int i; + double time_average, time_stddev, event_average, event_stddev; + int err; + struct stats time_stats, event_stats; + struct perf_session *session; + + init_stats(&time_stats); + init_stats(&event_stats); + for (i = 0; i < multi_iterations; i++) { + session = perf_session__new(NULL, NULL); + if (IS_ERR(session)) + return PTR_ERR(session); + + atomic_set(&event_count, 0); + gettimeofday(&start, NULL); + err = __machine__synthesize_threads(&session->machines.host, + NULL, + target, NULL, + process_synthesized_event, + false, + nr_threads_synthesize); + if (err) { + perf_session__delete(session); + return err; + } + + gettimeofday(&end, NULL); + timersub(&end, &start, &diff); + runtime_us = diff.tv_sec * USEC_PER_SEC + diff.tv_usec; + update_stats(&time_stats, runtime_us); 
+ update_stats(&event_stats, atomic_read(&event_count)); + perf_session__delete(session); + } + + time_average = avg_stats(&time_stats); + time_stddev = stddev_stats(&time_stats); + printf(" Average synthesis took: %.3f usec (+- %.3f usec)\n", + time_average, time_stddev); + + event_average = avg_stats(&event_stats); + event_stddev = stddev_stats(&event_stats); + printf(" Average num. events: %.3f (+- %.3f)\n", + event_average, event_stddev); + + printf(" Average time per event %.3f usec\n", + time_average / event_average); + return 0; +} + +static int run_multi_threaded(void) +{ + struct target target = { + .cpu_list = "0" + }; + unsigned int nr_threads_synthesize; + int err; + + if (max_threads == UINT_MAX) + max_threads = sysconf(_SC_NPROCESSORS_ONLN); + + puts( +"Computing performance of multi threaded perf event synthesis by\n" +"synthesizing events on CPU 0:"); + + for (nr_threads_synthesize = min_threads; + nr_threads_synthesize <= max_threads; + nr_threads_synthesize++) { + if (nr_threads_synthesize == 1) + perf_set_singlethreaded(); + else + perf_set_multithreaded(); + + printf(" Number of synthesis threads: %u\n", + nr_threads_synthesize); + + err = do_run_multi_threaded(&target, nr_threads_synthesize); + if (err) + return err; + } + perf_set_singlethreaded(); + return 0; +} + +int bench_synthesize(int argc, const char **argv) +{ + int err = 0; + + argc = parse_options(argc, argv, options, bench_usage, 0); + if (argc) { + usage_with_options(bench_usage, options); + exit(EXIT_FAILURE); + } + + /* + * If neither single threaded nor multi-threaded is specified, default + * to running just single threaded.
+ */ + if (!run_st && !run_mt) + run_st = true; + + if (run_st) + err = run_single_threaded(); + + if (!err && run_mt) + err = run_multi_threaded(); + + return err; +} diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c index 8db8fc9bddef3b9cd820f0daf233675c4b065b38..e26125c7c1068d3da1f6dcc18caf7e0e2ad03b30 100644 --- a/tools/perf/builtin-annotate.c +++ b/tools/perf/builtin-annotate.c @@ -433,7 +433,7 @@ static int __cmd_annotate(struct perf_annotate *ann) total_nr_samples += nr_samples; hists__collapse_resort(hists, NULL); /* Don't sort callchain */ - perf_evsel__reset_sample_bit(pos, CALLCHAIN); + evsel__reset_sample_bit(pos, CALLCHAIN); perf_evsel__output_resort(pos, NULL); if (symbol_conf.event_group && @@ -535,6 +535,10 @@ int cmd_annotate(int argc, const char **argv) "Display raw encoding of assembly instructions (default)"), OPT_STRING('M', "disassembler-style", &annotate.opts.disassembler_style, "disassembler style", "Specify disassembler style (e.g. -M intel for intel syntax)"), + OPT_STRING(0, "prefix", &annotate.opts.prefix, "prefix", + "Add prefix to source file path names in programs (with --prefix-strip)"), + OPT_STRING(0, "prefix-strip", &annotate.opts.prefix_strip, "N", + "Strip first N entries of source file path name in programs (with --prefix)"), OPT_STRING(0, "objdump", &annotate.opts.objdump_path, "path", "objdump binary to use for disassembly and annotations"), OPT_BOOLEAN(0, "group", &symbol_conf.event_group, @@ -574,6 +578,9 @@ int cmd_annotate(int argc, const char **argv) annotate.sym_hist_filter = argv[0]; } + if (annotate_check_args(&annotate.opts) < 0) + return -EINVAL; + if (symbol_conf.show_nr_samples && annotate.use_gtk) { pr_err("--show-nr-samples is not available in --gtk mode at this time\n"); return ret; @@ -584,7 +591,7 @@ int cmd_annotate(int argc, const char **argv) data.path = input_name; - annotate.session = perf_session__new(&data, false, &annotate.tool); + annotate.session = perf_session__new(&data, 
&annotate.tool); if (IS_ERR(annotate.session)) return PTR_ERR(annotate.session); diff --git a/tools/perf/builtin-bench.c b/tools/perf/builtin-bench.c index c06fe21c86134423fc1da3dd6bd4091a0f9cd2e5..e78e8196ccac4d18660fe8c4a608a515a886c73d 100644 --- a/tools/perf/builtin-bench.c +++ b/tools/perf/builtin-bench.c @@ -76,6 +76,12 @@ static struct bench epoll_benchmarks[] = { }; #endif // HAVE_EVENTFD +static struct bench internals_benchmarks[] = { + { "synthesize", "Benchmark perf event synthesis", bench_synthesize }, + { "inject-build-id", "Benchmark build-id injection", bench_inject_build_id }, + { NULL, NULL, NULL } +}; + struct collection { const char *name; const char *summary; @@ -92,6 +98,7 @@ static struct collection collections[] = { #ifdef HAVE_EVENTFD {"epoll", "Epoll stressing benchmarks", epoll_benchmarks }, #endif + { "internals", "Perf-internals benchmarks", internals_benchmarks }, { "all", "All benchmarks", NULL }, { NULL, NULL, NULL } }; diff --git a/tools/perf/builtin-buildid-cache.c b/tools/perf/builtin-buildid-cache.c index 39efa51d7fb3d0884b97c1a0f59b969332c69cd7..18821556b6031efc830a04506cd5d44190e38216 100644 --- a/tools/perf/builtin-buildid-cache.c +++ b/tools/perf/builtin-buildid-cache.c @@ -422,7 +422,7 @@ int cmd_buildid_cache(int argc, const char **argv) data.path = missing_filename; data.force = force; - session = perf_session__new(&data, false, NULL); + session = perf_session__new(&data, NULL); if (IS_ERR(session)) return PTR_ERR(session); } diff --git a/tools/perf/builtin-buildid-list.c b/tools/perf/builtin-buildid-list.c index e3ef75583514b61ffe571f4258332a5ca937e7d8..27b0ab5d03ef4739fde6469ac94bd5fd3e062890 100644 --- a/tools/perf/builtin-buildid-list.c +++ b/tools/perf/builtin-buildid-list.c @@ -65,7 +65,7 @@ static int perf_session__list_build_ids(bool force, bool with_hits) if (filename__fprintf_build_id(input_name, stdout) > 0) goto out; - session = perf_session__new(&data, false, &build_id__mark_dso_hit_ops); + session = 
perf_session__new(&data, &build_id__mark_dso_hit_ops); if (IS_ERR(session)) return PTR_ERR(session); diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c index 29d460c30176709f708dd5a3a6be60e39690e589..cd25ba49195c1a927b82ab87a70a1294a3b47b6c 100644 --- a/tools/perf/builtin-c2c.c +++ b/tools/perf/builtin-c2c.c @@ -2789,7 +2789,7 @@ static int perf_c2c__report(int argc, const char **argv) goto out; } - session = perf_session__new(&data, 0, &c2c.tool); + session = perf_session__new(&data, &c2c.tool); if (IS_ERR(session)) { err = PTR_ERR(session); pr_debug("Error creating perf session\n"); diff --git a/tools/perf/builtin-data.c b/tools/perf/builtin-data.c index ca2fb44874e4b1503140af6edc8a744a142eed39..15ca23675ef028104ff031f6e3f79cb2c47dea51 100644 --- a/tools/perf/builtin-data.c +++ b/tools/perf/builtin-data.c @@ -7,7 +7,6 @@ #include "debug.h" #include #include "data-convert.h" -#include "data-convert-bt.h" typedef int (*data_cmd_fn_t)(int argc, const char **argv); @@ -55,7 +54,8 @@ static const char * const data_convert_usage[] = { static int cmd_data_convert(int argc, const char **argv) { - const char *to_ctf = NULL; + const char *to_json = NULL; + const char *to_ctf = NULL; struct perf_data_convert_opts opts = { .force = false, .all = false, @@ -63,19 +63,16 @@ static int cmd_data_convert(int argc, const char **argv) const struct option options[] = { OPT_INCR('v', "verbose", &verbose, "be more verbose"), OPT_STRING('i', "input", &input_name, "file", "input file name"), + OPT_STRING(0, "to-json", &to_json, NULL, "Convert to JSON format"), #ifdef HAVE_LIBBABELTRACE_SUPPORT OPT_STRING(0, "to-ctf", &to_ctf, NULL, "Convert to CTF format"), + OPT_BOOLEAN(0, "tod", &opts.tod, "Convert time to wall clock time"), #endif OPT_BOOLEAN('f', "force", &opts.force, "don't complain, do it"), OPT_BOOLEAN(0, "all", &opts.all, "Convert all events"), OPT_END() }; -#ifndef HAVE_LIBBABELTRACE_SUPPORT - pr_err("No conversion support compiled in. 
perf should be compiled with environment variables LIBBABELTRACE=1 and LIBBABELTRACE_DIR=/path/to/libbabeltrace/\n"); - return -1; -#endif - argc = parse_options(argc, argv, options, data_convert_usage, 0); if (argc) { @@ -83,11 +80,25 @@ static int cmd_data_convert(int argc, const char **argv) return -1; } + if (to_json && to_ctf) { + pr_err("You cannot specify both --to-ctf and --to-json.\n"); + return -1; + } + if (!to_json && !to_ctf) { + pr_err("You must specify one of --to-ctf or --to-json.\n"); + return -1; + } + + if (to_json) + return bt_convert__perf2json(input_name, to_json, &opts); + if (to_ctf) { #ifdef HAVE_LIBBABELTRACE_SUPPORT return bt_convert__perf2ctf(input_name, to_ctf, &opts); #else - pr_err("The libbabeltrace support is not compiled in.\n"); + pr_err("The libbabeltrace support is not compiled in. perf should be " + "compiled with environment variables LIBBABELTRACE=1 and " + "LIBBABELTRACE_DIR=/path/to/libbabeltrace/\n"); return -1; #endif } diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c index 265682296836d390e6e5969ef7cc2f8e28253cc3..f679bf7c04b403bf6808c364690d0de6685bcad9 100644 --- a/tools/perf/builtin-diff.c +++ b/tools/perf/builtin-diff.c @@ -448,6 +448,7 @@ static struct perf_diff pdiff = { .fork = perf_event__process_fork, .lost = perf_event__process_lost, .namespaces = perf_event__process_namespaces, + .cgroup = perf_event__process_cgroup, .ordered_events = true, .ordering_requires_timestamps = true, }, @@ -1052,7 +1053,7 @@ static void data_process(void) data__fprintf(); /* Don't sort callchain for perf diff */ - perf_evsel__reset_sample_bit(evsel_base, CALLCHAIN); + evsel__reset_sample_bit(evsel_base, CALLCHAIN); hists__process(hists_base); } @@ -1153,7 +1154,7 @@ static int check_file_brstack(void) int i; data__for_each_file(i, d) { - d->session = perf_session__new(&d->data, false, &pdiff.tool); + d->session = perf_session__new(&d->data, &pdiff.tool); if (IS_ERR(d->session)) { pr_err("Failed to open %s\n", 
d->data.path); return PTR_ERR(d->session); @@ -1185,7 +1186,7 @@ static int __cmd_diff(void) ret = -EINVAL; data__for_each_file(i, d) { - d->session = perf_session__new(&d->data, false, &pdiff.tool); + d->session = perf_session__new(&d->data, &pdiff.tool); if (IS_ERR(d->session)) { ret = PTR_ERR(d->session); pr_err("Failed to open %s\n", d->data.path); diff --git a/tools/perf/builtin-evlist.c b/tools/perf/builtin-evlist.c index 440501994931dcb50beb3991c9065a6616311a98..d1d7a8370c095540fe93d8f5b13be72c2d11a041 100644 --- a/tools/perf/builtin-evlist.c +++ b/tools/perf/builtin-evlist.c @@ -17,6 +17,14 @@ #include "util/data.h" #include "util/debug.h" #include +#include "util/tool.h" + +static int process_header_feature(struct perf_session *session __maybe_unused, + union perf_event *event __maybe_unused) +{ + session_done = 1; + return 0; +} static int __cmd_evlist(const char *file_name, struct perf_attr_details *details) { @@ -27,12 +35,20 @@ static int __cmd_evlist(const char *file_name, struct perf_attr_details *details .mode = PERF_DATA_MODE_READ, .force = details->force, }; + struct perf_tool tool = { + /* only needed for pipe mode */ + .attr = perf_event__process_attr, + .feature = process_header_feature, + }; bool has_tracepoint = false; - session = perf_session__new(&data, 0, NULL); + session = perf_session__new(&data, &tool); if (IS_ERR(session)) return PTR_ERR(session); + if (data.is_pipe) + perf_session__process_events(session); + evlist__for_each_entry(session->evlist, pos) { perf_evsel__fprintf(pos, details, stdout); diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c index 6b6bb86d62d3598d3193dd1a6ae59f8e8dae3e15..a45fa511d4e85995841220c07cddde02cee1c972 100644 --- a/tools/perf/builtin-inject.c +++ b/tools/perf/builtin-inject.c @@ -10,6 +10,7 @@ #include "util/color.h" #include "util/dso.h" +#include "util/vdso.h" #include "util/evlist.h" #include "util/evsel.h" #include "util/map.h" @@ -23,11 +24,16 @@ #include "util/symbol.h" 
#include "util/synthetic-events.h" #include "util/thread.h" -#include +#include "util/namespaces.h" + +#include +#include #include +#include /* To get things like MAP_HUGETLB even on older libc headers */ #include +#include #include #include @@ -35,16 +41,21 @@ struct perf_inject { struct perf_tool tool; struct perf_session *session; bool build_ids; + bool build_id_all; bool sched_stat; bool have_auxtrace; bool strip; bool jit_mode; + bool in_place_update; + bool in_place_update_dry_run; + bool is_pipe; const char *input_name; struct perf_data output; u64 bytes_written; u64 aux_id; struct list_head samples; struct itrace_synth_opts itrace_synth_opts; + struct perf_file_section secs[HEADER_FEAT_BITS]; }; struct event_entry { @@ -53,6 +64,9 @@ struct event_entry { union perf_event event[0]; }; +static int dso__inject_build_id(struct dso *dso, struct perf_tool *tool, + struct machine *machine, u8 cpumode, u32 flags); + static int output_bytes(struct perf_inject *inject, void *buf, size_t sz) { ssize_t size; @@ -108,7 +122,7 @@ static int perf_event__repipe_attr(struct perf_tool *tool, if (ret) return ret; - if (!inject->output.is_pipe) + if (!inject->is_pipe) return 0; return perf_event__repipe_synth(tool, event); @@ -274,6 +288,68 @@ static int perf_event__jit_repipe_mmap(struct perf_tool *tool, } #endif +static struct dso *findnew_dso(int pid, int tid, const char *filename, + struct dso_id *id, struct machine *machine) +{ + struct thread *thread; + struct nsinfo *nsi = NULL; + struct nsinfo *nnsi; + struct dso *dso; + bool vdso; + + thread = machine__findnew_thread(machine, pid, tid); + if (thread == NULL) { + pr_err("cannot find or create a task %d/%d.\n", tid, pid); + return NULL; + } + + vdso = is_vdso_map(filename); + nsi = nsinfo__get(thread->nsinfo); + + if (vdso) { + /* The vdso maps are always on the host and not the + * container. Ensure that we don't use setns to look + * them up. 
+ */ + nnsi = nsinfo__copy(nsi); + if (nnsi) { + nsinfo__put(nsi); + nnsi->need_setns = false; + nsi = nnsi; + } + dso = machine__findnew_vdso(machine, thread); + } else { + dso = machine__findnew_dso_id(machine, filename, id); + } + + if (dso) + dso->nsinfo = nsi; + else + nsinfo__put(nsi); + + thread__put(thread); + return dso; +} + +static int perf_event__repipe_buildid_mmap(struct perf_tool *tool, + union perf_event *event, + struct perf_sample *sample, + struct machine *machine) +{ + struct dso *dso; + + dso = findnew_dso(event->mmap.pid, event->mmap.tid, + event->mmap.filename, NULL, machine); + + if (dso && !dso->hit) { + dso->hit = 1; + dso__inject_build_id(dso, tool, machine, sample->cpumode, 0); + dso__put(dso); + } + + return perf_event__repipe(tool, event, sample, machine); +} + static int perf_event__repipe_mmap2(struct perf_tool *tool, union perf_event *event, struct perf_sample *sample, @@ -312,6 +388,34 @@ static int perf_event__jit_repipe_mmap2(struct perf_tool *tool, } #endif +static int perf_event__repipe_buildid_mmap2(struct perf_tool *tool, + union perf_event *event, + struct perf_sample *sample, + struct machine *machine) +{ + struct dso_id dso_id = { + .maj = event->mmap2.maj, + .min = event->mmap2.min, + .ino = event->mmap2.ino, + .ino_generation = event->mmap2.ino_generation, + }; + struct dso *dso; + + dso = findnew_dso(event->mmap2.pid, event->mmap2.tid, + event->mmap2.filename, &dso_id, machine); + + if (dso && !dso->hit) { + dso->hit = 1; + dso__inject_build_id(dso, tool, machine, sample->cpumode, + event->mmap2.flags); + dso__put(dso); + } + + perf_event__repipe(tool, event, sample, machine); + + return 0; +} + static int perf_event__repipe_fork(struct perf_tool *tool, union perf_event *event, struct perf_sample *sample, @@ -387,34 +491,38 @@ static int perf_event__repipe_id_index(struct perf_session *session, static int dso__read_build_id(struct dso *dso) { + struct nscookie nsc; + if (dso->has_build_id) return 0; + 
nsinfo__mountns_enter(dso->nsinfo, &nsc); if (filename__read_build_id(dso->long_name, dso->build_id, sizeof(dso->build_id)) > 0) { dso->has_build_id = true; - return 0; } + nsinfo__mountns_exit(&nsc); - return -1; + return dso->has_build_id ? 0 : -1; } static int dso__inject_build_id(struct dso *dso, struct perf_tool *tool, - struct machine *machine) + struct machine *machine, u8 cpumode, u32 flags) { - u16 misc = PERF_RECORD_MISC_USER; int err; + if (is_anon_memory(dso->long_name) || flags & MAP_HUGETLB) + return 0; + if (is_no_dso_memory(dso->long_name)) + return 0; + if (dso__read_build_id(dso) < 0) { pr_debug("no build_id found for %s\n", dso->long_name); return -1; } - if (dso->kernel) - misc = PERF_RECORD_MISC_KERNEL; - - err = perf_event__synthesize_build_id(tool, dso, misc, perf_event__repipe, - machine); + err = perf_event__synthesize_build_id(tool, dso, cpumode, + perf_event__repipe, machine); if (err) { pr_err("Can't synthesize build_id event for %s\n", dso->long_name); return -1; @@ -423,11 +531,10 @@ static int dso__inject_build_id(struct dso *dso, struct perf_tool *tool, return 0; } -static int perf_event__inject_buildid(struct perf_tool *tool, - union perf_event *event, - struct perf_sample *sample, - struct evsel *evsel __maybe_unused, - struct machine *machine) +int perf_event__inject_buildid(struct perf_tool *tool, union perf_event *event, + struct perf_sample *sample, + struct evsel *evsel __maybe_unused, + struct machine *machine) { struct addr_location al; struct thread *thread; @@ -442,19 +549,8 @@ static int perf_event__inject_buildid(struct perf_tool *tool, if (thread__find_map(thread, sample->cpumode, sample->ip, &al)) { if (!al.map->dso->hit) { al.map->dso->hit = 1; - if (map__load(al.map) >= 0) { - dso__inject_build_id(al.map->dso, tool, machine); - /* - * If this fails, too bad, let the other side - * account this as unresolved. 
- */ - } else { -#ifdef HAVE_LIBELF_SUPPORT - pr_warning("no symbols found in %s, maybe " - "install a debug package?\n", - al.map->dso->long_name); -#endif - } + dso__inject_build_id(al.map->dso, tool, machine, + sample->cpumode, al.map->flags); } } @@ -630,18 +726,166 @@ static void strip_fini(struct perf_inject *inject) } } +static int parse_vm_time_correlation(const struct option *opt, const char *str, int unset) +{ + struct perf_inject *inject = opt->value; + const char *args; + char *dry_run; + + if (unset) + return 0; + + inject->itrace_synth_opts.set = true; + inject->itrace_synth_opts.vm_time_correlation = true; + inject->in_place_update = true; + + if (!str) + return 0; + + dry_run = skip_spaces(str); + if (!strncmp(dry_run, "dry-run", strlen("dry-run"))) { + inject->itrace_synth_opts.vm_tm_corr_dry_run = true; + inject->in_place_update_dry_run = true; + args = dry_run + strlen("dry-run"); + } else { + args = str; + } + + inject->itrace_synth_opts.vm_tm_corr_args = strdup(args); + + return inject->itrace_synth_opts.vm_tm_corr_args ? 
0 : -ENOMEM; +} + +static int save_section_info_cb(struct perf_file_section *section, + struct perf_header *ph __maybe_unused, + int feat, int fd __maybe_unused, void *data) +{ + struct perf_inject *inject = data; + + inject->secs[feat] = *section; + return 0; +} + +static int save_section_info(struct perf_inject *inject) +{ + struct perf_header *header = &inject->session->header; + int fd = perf_data__fd(inject->session->data); + + return perf_header__process_sections(header, fd, inject, save_section_info_cb); +} + +static bool keep_feat(int feat) +{ + switch (feat) { + /* Keep original information that describes the machine or software */ + case HEADER_TRACING_DATA: + case HEADER_HOSTNAME: + case HEADER_OSRELEASE: + case HEADER_VERSION: + case HEADER_ARCH: + case HEADER_NRCPUS: + case HEADER_CPUDESC: + case HEADER_CPUID: + case HEADER_TOTAL_MEM: + case HEADER_CPU_TOPOLOGY: + case HEADER_NUMA_TOPOLOGY: + case HEADER_PMU_MAPPINGS: + case HEADER_CACHE: + case HEADER_MEM_TOPOLOGY: + case HEADER_CLOCKID: + case HEADER_BPF_PROG_INFO: + case HEADER_BPF_BTF: + case HEADER_CPU_PMU_CAPS: + case HEADER_CLOCK_DATA: + case HEADER_HYBRID_TOPOLOGY: + case HEADER_PMU_CAPS: + return true; + /* Information that can be updated */ + case HEADER_BUILD_ID: + case HEADER_CMDLINE: + case HEADER_EVENT_DESC: + case HEADER_BRANCH_STACK: + case HEADER_GROUP_DESC: + case HEADER_AUXTRACE: + case HEADER_STAT: + case HEADER_SAMPLE_TIME: + case HEADER_DIR_FORMAT: + case HEADER_COMPRESSED: + default: + return false; + }; +} + +static int read_file(int fd, u64 offs, void *buf, size_t sz) +{ + ssize_t ret = preadn(fd, buf, sz, offs); + + if (ret < 0) + return -errno; + if ((size_t)ret != sz) + return -EINVAL; + return 0; +} + +static int feat_copy(struct perf_inject *inject, int feat, struct feat_writer *fw) +{ + int fd = perf_data__fd(inject->session->data); + u64 offs = inject->secs[feat].offset; + size_t sz = inject->secs[feat].size; + void *buf = malloc(sz); + int ret; + + if (!buf) + return 
-ENOMEM; + + ret = read_file(fd, offs, buf, sz); + if (ret) + goto out_free; + + ret = fw->write(fw, buf, sz); +out_free: + free(buf); + return ret; +} + +struct inject_fc { + struct feat_copier fc; + struct perf_inject *inject; +}; + +static int feat_copy_cb(struct feat_copier *fc, int feat, struct feat_writer *fw) +{ + struct inject_fc *inj_fc = container_of(fc, struct inject_fc, fc); + struct perf_inject *inject = inj_fc->inject; + int ret; + + if (!inject->secs[feat].offset || + !keep_feat(feat)) + return 0; + + ret = feat_copy(inject, feat, fw); + if (ret < 0) + return ret; + + return 1; /* Feature section copied */ +} + +static int output_fd(struct perf_inject *inject) +{ + return inject->in_place_update ? -1 : perf_data__fd(&inject->output); +} + static int __cmd_inject(struct perf_inject *inject) { int ret = -EINVAL; struct perf_session *session = inject->session; - struct perf_data *data_out = &inject->output; - int fd = perf_data__fd(data_out); + int fd = output_fd(inject); u64 output_data_offset; signal(SIGINT, sig_handler); if (inject->build_ids || inject->sched_stat || - inject->itrace_synth_opts.set) { + inject->itrace_synth_opts.set || inject->build_id_all) { inject->tool.mmap = perf_event__repipe_mmap; inject->tool.mmap2 = perf_event__repipe_mmap2; inject->tool.fork = perf_event__repipe_fork; @@ -650,7 +894,10 @@ static int __cmd_inject(struct perf_inject *inject) output_data_offset = session->header.data_offset; - if (inject->build_ids) { + if (inject->build_id_all) { + inject->tool.mmap = perf_event__repipe_buildid_mmap; + inject->tool.mmap2 = perf_event__repipe_buildid_mmap2; + } else if (inject->build_ids) { inject->tool.sample = perf_event__inject_buildid; } else if (inject->sched_stat) { struct evsel *evsel; @@ -668,6 +915,15 @@ static int __cmd_inject(struct perf_inject *inject) else if (!strncmp(name, "sched:sched_stat_", 17)) evsel->handler = perf_inject__sched_stat; } + } else if (inject->itrace_synth_opts.vm_time_correlation) { + 
session->itrace_synth_opts = &inject->itrace_synth_opts; + memset(&inject->tool, 0, sizeof(inject->tool)); + inject->tool.id_index = perf_event__process_id_index; + inject->tool.auxtrace_info = perf_event__process_auxtrace_info; + inject->tool.auxtrace = perf_event__process_auxtrace; + inject->tool.auxtrace_error = perf_event__process_auxtrace_error; + inject->tool.ordered_events = true; + inject->tool.ordering_requires_timestamps = true; } else if (inject->itrace_synth_opts.set) { session->itrace_synth_opts = &inject->itrace_synth_opts; inject->itrace_synth_opts.inject = true; @@ -690,14 +946,19 @@ static int __cmd_inject(struct perf_inject *inject) if (!inject->itrace_synth_opts.set) auxtrace_index__free(&session->auxtrace_index); - if (!data_out->is_pipe) + if (!inject->is_pipe && !inject->in_place_update) lseek(fd, output_data_offset, SEEK_SET); ret = perf_session__process_events(session); if (ret) return ret; - if (!data_out->is_pipe) { + if (!inject->is_pipe && !inject->in_place_update) { + struct inject_fc inj_fc = { + .fc.copy = feat_copy_cb, + .inject = inject, + }; + if (inject->build_ids) perf_header__set_feat(&session->header, HEADER_BUILD_ID); @@ -734,7 +995,7 @@ static int __cmd_inject(struct perf_inject *inject) } session->header.data_offset = output_data_offset; session->header.data_size = inject->bytes_written; - perf_session__write_header(session, session->evlist, fd, true); + perf_session__inject_header(session, session->evlist, fd, &inj_fc.fc); } return ret; @@ -774,16 +1035,21 @@ int cmd_inject(int argc, const char **argv) .output = { .path = "-", .mode = PERF_DATA_MODE_WRITE, + .use_stdio = true, }, }; struct perf_data data = { .mode = PERF_DATA_MODE_READ, + .use_stdio = true, }; int ret; + bool repipe = true; struct option options[] = { OPT_BOOLEAN('b', "build-ids", &inject.build_ids, "Inject build-ids into the output stream"), + OPT_BOOLEAN(0, "buildid-all", &inject.build_id_all, + "Inject build-ids of all DSOs into the output stream"), 
OPT_STRING('i', "input", &inject.input_name, "file", "input file name"), OPT_STRING('o', "output", &inject.output.path, "file", @@ -805,6 +1071,9 @@ int cmd_inject(int argc, const char **argv) itrace_parse_synth_opts), OPT_BOOLEAN(0, "strip", &inject.strip, "strip non-synthesized events (use with --itrace)"), + OPT_CALLBACK_OPTARG(0, "vm-time-correlation", &inject, NULL, "opts", + "correlate time between VM guests and the host", + parse_vm_time_correlation), OPT_END() }; const char * const inject_usage[] = { @@ -827,15 +1096,42 @@ int cmd_inject(int argc, const char **argv) return -1; } - if (perf_data__open(&inject.output)) { + if (inject.in_place_update) { + if (!strcmp(inject.input_name, "-")) { + pr_err("Input file name required for in-place updating\n"); + return -1; + } + if (strcmp(inject.output.path, "-")) { + pr_err("Output file name must not be specified for in-place updating\n"); + return -1; + } + if (!data.force && !inject.in_place_update_dry_run) { + pr_err("The input file would be updated in place, " + "the --force option is required.\n"); + return -1; + } + if (!inject.in_place_update_dry_run) + data.in_place_update = true; + } else if (perf_data__open(&inject.output)) { perror("failed to create output file"); return -1; } - inject.tool.ordered_events = inject.sched_stat; - data.path = inject.input_name; - inject.session = perf_session__new(&data, inject.output.is_pipe, &inject.tool); + if (!strcmp(inject.input_name, "-") || inject.output.is_pipe) { + inject.is_pipe = true; + /* + * Do not repipe header when input is a regular file + * since either it can rewrite the header at the end + * or write a new pipe header. 
+ */ + if (strcmp(inject.input_name, "-")) + repipe = false; + } + + inject.session = __perf_session__new(&data, repipe, + output_fd(&inject), + &inject.tool); if (IS_ERR(inject.session)) { ret = PTR_ERR(inject.session); goto out_close_output; @@ -844,7 +1140,27 @@ int cmd_inject(int argc, const char **argv) if (zstd_init(&(inject.session->zstd_data), 0) < 0) pr_warning("Decompression initialization failed.\n"); - if (inject.build_ids) { + /* Save original section info before feature bits change */ + ret = save_section_info(&inject); + if (ret) + goto out_delete; + + if (!data.is_pipe && inject.output.is_pipe) { + ret = perf_header__write_pipe(perf_data__fd(&inject.output)); + if (ret < 0) { + pr_err("Couldn't write a new pipe header.\n"); + goto out_delete; + } + + ret = perf_event__synthesize_for_pipe(&inject.tool, + inject.session, + &inject.output, + perf_event__repipe); + if (ret < 0) + goto out_delete; + } + + if (inject.build_ids && !inject.build_id_all) { /* * to make sure the mmap records are ordered correctly * and so that the correct especially due to jitted code @@ -854,6 +1170,11 @@ int cmd_inject(int argc, const char **argv) inject.tool.ordered_events = true; inject.tool.ordering_requires_timestamps = true; } + + if (inject.sched_stat) { + inject.tool.ordered_events = true; + } + #ifdef HAVE_JITDUMP if (inject.jit_mode) { inject.tool.mmap2 = perf_event__jit_repipe_mmap2; @@ -876,6 +1197,7 @@ int cmd_inject(int argc, const char **argv) out_delete: zstd_fini(&(inject.session->zstd_data)); perf_session__delete(inject.session); + free(inject.itrace_synth_opts.vm_tm_corr_args); out_close_output: perf_data__close(&inject.output); return ret; diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c index 9661671cc26ec29029b895aa3a4c630a31e65cd0..afa36cbd34e351bb9b4c69c83db6e598dd2b31ed 100644 --- a/tools/perf/builtin-kmem.c +++ b/tools/perf/builtin-kmem.c @@ -1957,7 +1957,7 @@ int cmd_kmem(int argc, const char **argv) data.path = input_name; - 
kmem_session = session = perf_session__new(&data, false, &perf_kmem); + kmem_session = session = perf_session__new(&data, &perf_kmem); if (IS_ERR(session)) return PTR_ERR(session); diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c index 58a9e09894913741c47e129dcbb32fdd67042873..1b49ae6001d64ebf956dfceac05df62a9d26fed6 100644 --- a/tools/perf/builtin-kvm.c +++ b/tools/perf/builtin-kvm.c @@ -46,6 +46,7 @@ #include #include #include +#include static const char *get_filename_for_perf_kvm(void) { @@ -759,14 +760,14 @@ static s64 perf_kvm__mmap_read_idx(struct perf_kvm_stat *kvm, int idx, *mmap_time = ULLONG_MAX; md = &evlist->mmap[idx]; - err = perf_mmap__read_init(md); + err = perf_mmap__read_init(&md->core); if (err < 0) return (err == -EAGAIN) ? 0 : -1; - while ((event = perf_mmap__read_event(md)) != NULL) { + while ((event = perf_mmap__read_event(&md->core)) != NULL) { err = perf_evlist__parse_sample_timestamp(evlist, event, ×tamp); if (err) { - perf_mmap__consume(md); + perf_mmap__consume(&md->core); pr_err("Failed to parse sample\n"); return -1; } @@ -776,7 +777,7 @@ static s64 perf_kvm__mmap_read_idx(struct perf_kvm_stat *kvm, int idx, * FIXME: Here we can't consume the event, as perf_session__queue_event will * point to it, and it'll get possibly overwritten by the kernel. 
*/ - perf_mmap__consume(md); + perf_mmap__consume(&md->core); if (err) { pr_err("Failed to enqueue sample: %d\n", err); @@ -793,7 +794,7 @@ static s64 perf_kvm__mmap_read_idx(struct perf_kvm_stat *kvm, int idx, break; } - perf_mmap__read_done(md); + perf_mmap__read_done(&md->core); return n; } @@ -1032,16 +1033,16 @@ static int kvm_live_open_events(struct perf_kvm_stat *kvm) struct perf_event_attr *attr = &pos->core.attr; /* make sure these *are* set */ - perf_evsel__set_sample_bit(pos, TID); - perf_evsel__set_sample_bit(pos, TIME); - perf_evsel__set_sample_bit(pos, CPU); - perf_evsel__set_sample_bit(pos, RAW); + evsel__set_sample_bit(pos, TID); + evsel__set_sample_bit(pos, TIME); + evsel__set_sample_bit(pos, CPU); + evsel__set_sample_bit(pos, RAW); /* make sure these are *not*; want as small a sample as possible */ - perf_evsel__reset_sample_bit(pos, PERIOD); - perf_evsel__reset_sample_bit(pos, IP); - perf_evsel__reset_sample_bit(pos, CALLCHAIN); - perf_evsel__reset_sample_bit(pos, ADDR); - perf_evsel__reset_sample_bit(pos, READ); + evsel__reset_sample_bit(pos, PERIOD); + evsel__reset_sample_bit(pos, IP); + evsel__reset_sample_bit(pos, CALLCHAIN); + evsel__reset_sample_bit(pos, ADDR); + evsel__reset_sample_bit(pos, READ); attr->mmap = 0; attr->comm = 0; attr->task = 0; @@ -1093,7 +1094,7 @@ static int read_events(struct perf_kvm_stat *kvm) }; kvm->tool = eops; - kvm->session = perf_session__new(&file, false, &kvm->tool); + kvm->session = perf_session__new(&file, &kvm->tool); if (IS_ERR(kvm->session)) { pr_err("Initializing perf session failed\n"); return PTR_ERR(kvm->session); @@ -1319,7 +1320,7 @@ static struct evlist *kvm_live_event_list(void) *name = '\0'; name++; - if (perf_evlist__add_newtp(evlist, sys, name, NULL)) { + if (evlist__add_newtp(evlist, sys, name, NULL)) { pr_err("Failed to add %s tracepoint to the list\n", *events_tp); free(tp); goto out; @@ -1448,7 +1449,7 @@ static int kvm_events_live(struct perf_kvm_stat *kvm, /* * perf session */ - 
kvm->session = perf_session__new(&data, false, &kvm->tool); + kvm->session = perf_session__new(&data, &kvm->tool); if (IS_ERR(kvm->session)) { err = PTR_ERR(kvm->session); goto out; diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c index d06665077dfe0289f1fb6e28a7e69cc6f37b6fe7..0b29babb38bc4b03565f571918ee5e011f2ebcc6 100644 --- a/tools/perf/builtin-lock.c +++ b/tools/perf/builtin-lock.c @@ -872,7 +872,7 @@ static int __cmd_report(bool display_info) .force = force, }; - session = perf_session__new(&data, false, &eops); + session = perf_session__new(&data, &eops); if (IS_ERR(session)) { pr_err("Initializing perf session failed\n"); return PTR_ERR(session); diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c index a13f5817d6fca4abac7cdfa80f5b90b1a8c3a097..a0d93d7e0f63a1fb639bf0ac441bfa2bed1bb6c3 100644 --- a/tools/perf/builtin-mem.c +++ b/tools/perf/builtin-mem.c @@ -247,8 +247,7 @@ static int report_raw_events(struct perf_mem *mem) .force = mem->force, }; int ret; - struct perf_session *session = perf_session__new(&data, false, - &mem->tool); + struct perf_session *session = perf_session__new(&data, &mem->tool); if (IS_ERR(session)) return PTR_ERR(session); diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c index ceae6168d329ce8048651e932619e2c57e39b2e5..2af9b868581789b312bd0803eab4e98bb7a271ba 100644 --- a/tools/perf/builtin-record.c +++ b/tools/perf/builtin-record.c @@ -34,6 +34,7 @@ #include "util/tsc.h" #include "util/parse-branch-options.h" #include "util/parse-regs-options.h" +#include "util/perf_api_probe.h" #include "util/llvm-utils.h" #include "util/bpf-loader.h" #include "util/trigger.h" @@ -43,6 +44,9 @@ #include "util/time-utils.h" #include "util/units.h" #include "util/bpf-event.h" +#include "util/pmu-hybrid.h" +#include "util/evlist-hybrid.h" +#include "util/clockid.h" #include "asm/bug.h" #include "perf.h" @@ -59,6 +63,7 @@ #include #include #include +#include struct switch_output { bool enabled; @@ 
-80,6 +85,7 @@ struct record { struct auxtrace_record *itr; struct evlist *evlist; struct perf_session *session; + struct evlist *sb_evlist; int realtime_prio; bool no_buildid; bool no_buildid_set; @@ -197,7 +203,7 @@ static int record__aio_complete(struct mmap *md, struct aiocb *cblock) * every aio write request started in record__aio_push() so * decrement it because the request is now complete. */ - perf_mmap__put(md); + perf_mmap__put(&md->core); rc = 1; } else { /* @@ -276,7 +282,7 @@ static int record__aio_pushfn(struct mmap *map, void *to, void *buf, size_t size if (record__comp_enabled(aio->rec)) { size = zstd_compress(aio->rec->session, aio->data + aio->size, - perf_mmap__mmap_len(map) - aio->size, + mmap__mmap_len(map) - aio->size, buf, size); } else { memcpy(aio->data + aio->size, buf, size); @@ -293,7 +299,7 @@ static int record__aio_pushfn(struct mmap *map, void *to, void *buf, size_t size * after started aio request completion or at record__aio_push() * if the request failed to start. */ - perf_mmap__get(map); + perf_mmap__get(&map->core); } aio->size += size; @@ -332,7 +338,7 @@ static int record__aio_push(struct record *rec, struct mmap *map, off_t *off) * map->refcount is decremented in record__aio_complete() after * aio write operation finishes successfully. */ - perf_mmap__put(map); + perf_mmap__put(&map->core); } return ret; @@ -488,7 +494,7 @@ static int record__pushfn(struct mmap *map, void *to, void *bf, size_t size) struct record *rec = to; if (record__comp_enabled(rec)) { - size = zstd_compress(rec->session, map->data, perf_mmap__mmap_len(map), bf, size); + size = zstd_compress(rec->session, map->data, mmap__mmap_len(map), bf, size); bf = map->data; } @@ -755,19 +761,28 @@ static int record__open(struct record *rec) int rc = 0; /* - * For initial_delay we need to add a dummy event so that we can track - * PERF_RECORD_MMAP while we wait for the initial delay to enable the - * real events, the ones asked by the user. 
+ * For initial_delay or system wide, we need to add a dummy event so + * that we can track PERF_RECORD_MMAP to cover the delay of waiting or + * event synthesis. */ - if (opts->initial_delay) { - if (perf_evlist__add_dummy(evlist)) + if (opts->initial_delay || target__has_cpu(&opts->target)) { + if (evlist__add_dummy(evlist)) return -ENOMEM; + /* Disable tracking of mmaps on lead event. */ pos = evlist__first(evlist); pos->tracking = 0; + /* Set up dummy event. */ pos = evlist__last(evlist); pos->tracking = 1; - pos->core.attr.enable_on_exec = 1; + /* + * Enable the dummy event when the process is forked for + * initial_delay, immediately for system wide. + */ + if (opts->initial_delay) + pos->core.attr.enable_on_exec = 1; + else + pos->immediate = 1; } perf_evlist__config(evlist, opts, &callchain_param); @@ -783,7 +798,7 @@ static int record__open(struct record *rec) if ((errno == EINVAL || errno == EBADF) && pos->leader != pos && pos->weak_group) { - pos = perf_evlist__reset_weak_group(evlist, pos); + pos = perf_evlist__reset_weak_group(evlist, pos, true); goto try_again; } rc = -errno; @@ -1061,6 +1076,9 @@ static void record__init_features(struct record *rec) if (!(rec->opts.use_clockid && rec->opts.clockid_res_ns)) perf_header__clear_feat(&session->header, HEADER_CLOCKID); + if (!rec->opts.use_clockid) + perf_header__clear_feat(&session->header, HEADER_CLOCK_DATA); + perf_header__clear_feat(&session->header, HEADER_DIR_FORMAT); if (!record__comp_enabled(rec)) perf_header__clear_feat(&session->header, HEADER_COMPRESSED); @@ -1231,48 +1249,18 @@ static int record__synthesize(struct record *rec, bool tail) struct perf_data *data = &rec->data; struct record_opts *opts = &rec->opts; struct perf_tool *tool = &rec->tool; - int fd = perf_data__fd(data); int err = 0; if (rec->opts.tail_synthesize != tail) return 0; if (data->is_pipe) { - /* - * We need to synthesize events first, because some - * features works on top of them (on report side). 
- */ - err = perf_event__synthesize_attrs(tool, rec->evlist, - process_synthesized_event); - if (err < 0) { - pr_err("Couldn't synthesize attrs.\n"); - goto out; - } - - err = perf_event__synthesize_features(tool, session, rec->evlist, + err = perf_event__synthesize_for_pipe(tool, session, data, process_synthesized_event); - if (err < 0) { - pr_err("Couldn't synthesize features.\n"); - return err; - } + if (err < 0) + goto out; - if (have_tracepoints(&rec->evlist->core.entries)) { - /* - * FIXME err <= 0 here actually means that - * there were no tracepoints so its not really - * an error, just that we don't need to - * synthesize anything. We really have to - * return this more properly and also - * propagate errors that now are calling die() - */ - err = perf_event__synthesize_tracing_data(tool, fd, rec->evlist, - process_synthesized_event); - if (err <= 0) { - pr_err("Couldn't record tracing data.\n"); - goto out; - } - rec->bytes_written += err; - } + rec->bytes_written += err; } err = perf_event__synth_time_conv(record__pick_pc(rec), tool, @@ -1349,6 +1337,40 @@ static int record__synthesize(struct record *rec, bool tail) return err; } +static int record__init_clock(struct record *rec) +{ + struct perf_session *session = rec->session; + struct timespec ref_clockid; + struct timeval ref_tod; + u64 ref; + + if (!rec->opts.use_clockid) + return 0; + + session->header.env.clock.clockid = rec->opts.clockid; + + if (gettimeofday(&ref_tod, NULL) != 0) { + pr_err("gettimeofday failed, cannot set reference time.\n"); + return -1; + } + + if (clock_gettime(rec->opts.clockid, &ref_clockid)) { + pr_err("clock_gettime failed, cannot set reference time.\n"); + return -1; + } + + ref = (u64) ref_tod.tv_sec * NSEC_PER_SEC + + (u64) ref_tod.tv_usec * NSEC_PER_USEC; + + session->header.env.clock.tod_ns = ref; + + ref = (u64) ref_clockid.tv_sec * NSEC_PER_SEC + + (u64) ref_clockid.tv_nsec; + + session->header.env.clock.clockid_ns = ref; + return 0; +} + static int 
__cmd_record(struct record *rec, int argc, const char **argv) { int err; @@ -1360,7 +1382,6 @@ static int __cmd_record(struct record *rec, int argc, const char **argv) struct perf_data *data = &rec->data; struct perf_session *session; bool disabled = false, draining = false; - struct evlist *sb_evlist = NULL; int fd; float ratio = 0; @@ -1373,6 +1394,15 @@ static int __cmd_record(struct record *rec, int argc, const char **argv) if (rec->opts.record_namespaces) tool->namespace_events = true; + if (rec->opts.record_cgroup) { +#ifdef HAVE_FILE_HANDLE + tool->cgroup_events = true; +#else + pr_err("cgroup tracking is not supported\n"); + return -1; +#endif + } + if (rec->opts.auxtrace_snapshot_mode || rec->switch_output.enabled) { signal(SIGUSR2, snapshot_sig_handler); if (rec->opts.auxtrace_snapshot_mode) @@ -1383,7 +1413,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv) signal(SIGUSR2, SIG_IGN); } - session = perf_session__new(data, false, tool); + session = perf_session__new(data, tool); if (IS_ERR(session)) { pr_err("Perf session creation failed.\n"); return PTR_ERR(session); @@ -1400,6 +1430,9 @@ static int __cmd_record(struct record *rec, int argc, const char **argv) session->header.env.comp_type = PERF_COMP_ZSTD; session->header.env.comp_level = rec->opts.comp_level; + if (record__init_clock(rec)) + return -1; + record__init_features(rec); if (rec->opts.use_clockid && rec->opts.clockid_res_ns) @@ -1463,18 +1496,29 @@ static int __cmd_record(struct record *rec, int argc, const char **argv) goto out_child; } + err = -1; if (!rec->no_buildid && !perf_header__has_feat(&session->header, HEADER_BUILD_ID)) { pr_err("Couldn't generate buildids. 
" "Use --no-buildid to profile anyway.\n"); - err = -1; goto out_child; } - if (!opts->no_bpf_event) - bpf_event__add_sb_event(&sb_evlist, &session->header.env); + if (!opts->no_bpf_event) { + rec->sb_evlist = evlist__new(); + + if (rec->sb_evlist == NULL) { + pr_err("Couldn't create side band evlist.\n."); + goto out_child; + } - if (perf_evlist__start_sb_thread(sb_evlist, &rec->opts.target)) { + if (evlist__add_bpf_sb_event(rec->sb_evlist, &session->header.env)) { + pr_err("Couldn't ask for PERF_RECORD_BPF_EVENT side band events.\n."); + goto out_child; + } + } + + if (perf_evlist__start_sb_thread(rec->sb_evlist, &rec->opts.target)) { pr_debug("Couldn't start the BPF side band thread:\nBPF programs starting from now on won't be annotatable\n"); opts->no_bpf_event = true; } @@ -1748,7 +1792,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv) perf_session__delete(session); if (!opts->no_bpf_event) - perf_evlist__stop_sb_thread(sb_evlist); + perf_evlist__stop_sb_thread(rec->sb_evlist); return status; } @@ -1840,103 +1884,6 @@ static int perf_record_config(const char *var, const char *value, void *cb) return 0; } -struct clockid_map { - const char *name; - int clockid; -}; - -#define CLOCKID_MAP(n, c) \ - { .name = n, .clockid = (c), } - -#define CLOCKID_END { .name = NULL, } - - -/* - * Add the missing ones, we need to build on many distros... 
- */ -#ifndef CLOCK_MONOTONIC_RAW -#define CLOCK_MONOTONIC_RAW 4 -#endif -#ifndef CLOCK_BOOTTIME -#define CLOCK_BOOTTIME 7 -#endif -#ifndef CLOCK_TAI -#define CLOCK_TAI 11 -#endif - -static const struct clockid_map clockids[] = { - /* available for all events, NMI safe */ - CLOCKID_MAP("monotonic", CLOCK_MONOTONIC), - CLOCKID_MAP("monotonic_raw", CLOCK_MONOTONIC_RAW), - - /* available for some events */ - CLOCKID_MAP("realtime", CLOCK_REALTIME), - CLOCKID_MAP("boottime", CLOCK_BOOTTIME), - CLOCKID_MAP("tai", CLOCK_TAI), - - /* available for the lazy */ - CLOCKID_MAP("mono", CLOCK_MONOTONIC), - CLOCKID_MAP("raw", CLOCK_MONOTONIC_RAW), - CLOCKID_MAP("real", CLOCK_REALTIME), - CLOCKID_MAP("boot", CLOCK_BOOTTIME), - - CLOCKID_END, -}; - -static int get_clockid_res(clockid_t clk_id, u64 *res_ns) -{ - struct timespec res; - - *res_ns = 0; - if (!clock_getres(clk_id, &res)) - *res_ns = res.tv_nsec + res.tv_sec * NSEC_PER_SEC; - else - pr_warning("WARNING: Failed to determine specified clock resolution.\n"); - - return 0; -} - -static int parse_clockid(const struct option *opt, const char *str, int unset) -{ - struct record_opts *opts = (struct record_opts *)opt->value; - const struct clockid_map *cm; - const char *ostr = str; - - if (unset) { - opts->use_clockid = 0; - return 0; - } - - /* no arg passed */ - if (!str) - return 0; - - /* no setting it twice */ - if (opts->use_clockid) - return -1; - - opts->use_clockid = true; - - /* if its a number, we're done */ - if (sscanf(str, "%d", &opts->clockid) == 1) - return get_clockid_res(opts->clockid, &opts->clockid_res_ns); - - /* allow a "CLOCK_" prefix to the name */ - if (!strncasecmp(str, "CLOCK_", 6)) - str += 6; - - for (cm = clockids; cm->name; cm++) { - if (!strcasecmp(str, cm->name)) { - opts->clockid = cm->clockid; - return get_clockid_res(opts->clockid, - &opts->clockid_res_ns); - } - } - - opts->use_clockid = false; - ui__warning("unknown clockid %s, check man page\n", ostr); - return -1; -} static int 
record__parse_affinity(const struct option *opt, const char *str, int unset) { @@ -2182,6 +2129,10 @@ static struct option __record_options[] = { OPT_BOOLEAN('d', "data", &record.opts.sample_address, "Record the sample addresses"), OPT_BOOLEAN(0, "phys-data", &record.opts.sample_phys_addr, "Record the sample physical addresses"), + OPT_BOOLEAN(0, "data-page-size", &record.opts.sample_data_page_size, + "Record the sampled data address data page size"), + OPT_BOOLEAN(0, "code-page-size", &record.opts.sample_code_page_size, + "Record the sampled code address (ip) page size"), OPT_BOOLEAN(0, "sample-cpu", &record.opts.sample_cpu, "Record the sample cpu"), OPT_BOOLEAN_SET('T', "timestamp", &record.opts.sample_time, &record.opts.sample_time_set, @@ -2236,6 +2187,8 @@ static struct option __record_options[] = { "per thread proc mmap processing timeout in ms"), OPT_BOOLEAN(0, "namespaces", &record.opts.record_namespaces, "Record namespaces events"), + OPT_BOOLEAN(0, "all-cgroups", &record.opts.record_cgroup, + "Record cgroup events"), OPT_BOOLEAN(0, "switch-events", &record.opts.record_switch_events, "Record context switch events"), OPT_BOOLEAN_FLAG(0, "all-kernel", &record.opts.all_kernel, @@ -2429,10 +2382,19 @@ int cmd_record(int argc, const char **argv) if (record.opts.overwrite) record.opts.tail_synthesize = true; - if (rec->evlist->core.nr_entries == 0 && - __perf_evlist__add_default(rec->evlist, !record.opts.no_samples) < 0) { - pr_err("Not enough memory for event selector list\n"); - goto out; + if (rec->evlist->core.nr_entries == 0) { + if (perf_pmu__has_hybrid()) { + err = evlist__add_default_hybrid(rec->evlist, + !record.opts.no_samples); + } else { + err = __evlist__add_default(rec->evlist, + !record.opts.no_samples); + } + + if (err < 0) { + pr_err("Not enough memory for event selector list\n"); + goto out; + } } if (rec->opts.target.tid && !rec->opts.no_inherit_set) diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index 
dc228bdf2bbc207fab082cf5325c892e8f244999..137a2476c6824e5d0894e3b866d1eb25d982fb9b 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -740,7 +740,7 @@ static size_t maps__fprintf_task(struct maps *maps, int indent, FILE *fp) map->prot & PROT_EXEC ? 'x' : '-', map->flags & MAP_SHARED ? 's' : 'p', map->pgoff, - map->ino, map->dso->name); + map->dso->id.ino, map->dso->name); } return printed; @@ -1050,6 +1050,7 @@ int cmd_report(int argc, const char **argv) .mmap2 = perf_event__process_mmap2, .comm = perf_event__process_comm, .namespaces = perf_event__process_namespaces, + .cgroup = perf_event__process_cgroup, .exit = perf_event__process_exit, .fork = perf_event__process_fork, .lost = perf_event__process_lost, @@ -1070,6 +1071,8 @@ int cmd_report(int argc, const char **argv) .socket_filter = -1, .annotation_opts = annotation__default_options, }; + char *sort_order_help = sort_help("sort by key(s):"); + char *field_order_help = sort_help("output field(s): overhead period sample "); const struct option options[] = { OPT_STRING('i', "input", &input_name, "file", "input file name"), @@ -1104,9 +1107,9 @@ int cmd_report(int argc, const char **argv) OPT_BOOLEAN(0, "header-only", &report.header_only, "Show only data header."), OPT_STRING('s', "sort", &sort_order, "key[,key2...]", - sort_help("sort by key(s):")), + sort_order_help), OPT_STRING('F', "fields", &field_order, "key[,keys...]", - sort_help("output field(s): overhead period sample ")), + field_order_help), OPT_BOOLEAN(0, "show-cpu-utilization", &symbol_conf.show_cpu_utilization, "Show sample percentage for different cpu modes"), OPT_BOOLEAN_FLAG(0, "showcpuutilization", &symbol_conf.show_cpu_utilization, @@ -1163,6 +1166,10 @@ int cmd_report(int argc, const char **argv) "Display raw encoding of assembly instructions (default)"), OPT_STRING('M', "disassembler-style", &report.annotation_opts.disassembler_style, "disassembler style", "Specify disassembler style (e.g. 
-M intel for intel syntax)"), + OPT_STRING(0, "prefix", &report.annotation_opts.prefix, "prefix", + "Add prefix to source file path names in programs (with --prefix-strip)"), + OPT_STRING(0, "prefix-strip", &report.annotation_opts.prefix_strip, "N", + "Strip first N entries of source file path name in programs (with --prefix)"), OPT_BOOLEAN(0, "show-total-period", &symbol_conf.show_total_period, "Show a column with the sum of periods"), OPT_BOOLEAN_SET(0, "group", &symbol_conf.event_group, &report.group_set, @@ -1222,11 +1229,11 @@ int cmd_report(int argc, const char **argv) char sort_tmp[128]; if (ret < 0) - return ret; + goto exit; ret = perf_config(report__config, &report); if (ret) - return ret; + goto exit; argc = parse_options(argc, argv, options, report_usage, 0); if (argc) { @@ -1240,6 +1247,11 @@ int cmd_report(int argc, const char **argv) report.symbol_filter_str = argv[0]; } + if (annotate_check_args(&report.annotation_opts) < 0) { + ret = -EINVAL; + goto exit; + } + if (report.mmaps_mode) report.tasks_mode = true; @@ -1249,12 +1261,14 @@ int cmd_report(int argc, const char **argv) if (symbol_conf.vmlinux_name && access(symbol_conf.vmlinux_name, R_OK)) { pr_err("Invalid file: %s\n", symbol_conf.vmlinux_name); - return -EINVAL; + ret = -EINVAL; + goto exit; } if (symbol_conf.kallsyms_name && access(symbol_conf.kallsyms_name, R_OK)) { pr_err("Invalid file: %s\n", symbol_conf.kallsyms_name); - return -EINVAL; + ret = -EINVAL; + goto exit; } if (report.inverted_callchain) @@ -1277,13 +1291,15 @@ int cmd_report(int argc, const char **argv) data.force = symbol_conf.force; repeat: - session = perf_session__new(&data, false, &report.tool); - if (IS_ERR(session)) - return PTR_ERR(session); + session = perf_session__new(&data, &report.tool); + if (IS_ERR(session)) { + ret = PTR_ERR(session); + goto exit; + } ret = evswitch__init(&report.evswitch, session->evlist, stderr); if (ret) - return ret; + goto exit; if (zstd_init(&(session->zstd_data), 0) < 0) 
pr_warning("Decompression initialization failed. Reported data may be incomplete.\n"); @@ -1490,5 +1506,8 @@ int cmd_report(int argc, const char **argv) } zstd_fini(&(session->zstd_data)); perf_session__delete(session); +exit: + free(sort_order_help); + free(field_order_help); return ret; } diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c index 5cacc4f84c8d9b6d76ca4203533187a739238811..817a3efd22b4dea410db741e1da830739c376c0e 100644 --- a/tools/perf/builtin-sched.c +++ b/tools/perf/builtin-sched.c @@ -1798,7 +1798,7 @@ static int perf_sched__read_events(struct perf_sched *sched) }; int rc = -1; - session = perf_session__new(&data, false, &sched->tool); + session = perf_session__new(&data, &sched->tool); if (IS_ERR(session)) { pr_debug("Error creating perf session"); return PTR_ERR(session); @@ -2990,7 +2990,7 @@ static int perf_sched__timehist(struct perf_sched *sched) symbol_conf.use_callchain = sched->show_callchain; - session = perf_session__new(&data, false, &sched->tool); + session = perf_session__new(&data, &sched->tool); if (IS_ERR(session)) return PTR_ERR(session); diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c index bbf1f2d3387e398218c1b3d9e7a1cb6767b7e1e3..e67a534836a76e3efb6ad857a82bac156f13ed90 100644 --- a/tools/perf/builtin-script.c +++ b/tools/perf/builtin-script.c @@ -735,6 +735,7 @@ static int perf_sample__fprintf_brstack(struct perf_sample *sample, struct perf_event_attr *attr, FILE *fp) { struct branch_stack *br = sample->branch_stack; + struct branch_entry *entries = perf_sample__branch_entries(sample); struct addr_location alf, alt; u64 i, from, to; int printed = 0; @@ -743,8 +744,8 @@ static int perf_sample__fprintf_brstack(struct perf_sample *sample, return 0; for (i = 0; i < br->nr; i++) { - from = br->entries[i].from; - to = br->entries[i].to; + from = entries[i].from; + to = entries[i].to; if (PRINT_FIELD(DSO)) { memset(&alf, 0, sizeof(alf)); @@ -768,10 +769,10 @@ static int 
perf_sample__fprintf_brstack(struct perf_sample *sample, } printed += fprintf(fp, "/%c/%c/%c/%d ", - mispred_str( br->entries + i), - br->entries[i].flags.in_tx? 'X' : '-', - br->entries[i].flags.abort? 'A' : '-', - br->entries[i].flags.cycles); + mispred_str(entries + i), + entries[i].flags.in_tx ? 'X' : '-', + entries[i].flags.abort ? 'A' : '-', + entries[i].flags.cycles); } return printed; @@ -782,6 +783,7 @@ static int perf_sample__fprintf_brstacksym(struct perf_sample *sample, struct perf_event_attr *attr, FILE *fp) { struct branch_stack *br = sample->branch_stack; + struct branch_entry *entries = perf_sample__branch_entries(sample); struct addr_location alf, alt; u64 i, from, to; int printed = 0; @@ -793,8 +795,8 @@ static int perf_sample__fprintf_brstacksym(struct perf_sample *sample, memset(&alf, 0, sizeof(alf)); memset(&alt, 0, sizeof(alt)); - from = br->entries[i].from; - to = br->entries[i].to; + from = entries[i].from; + to = entries[i].to; thread__find_symbol_fb(thread, sample->cpumode, from, &alf); thread__find_symbol_fb(thread, sample->cpumode, to, &alt); @@ -813,10 +815,10 @@ static int perf_sample__fprintf_brstacksym(struct perf_sample *sample, printed += fprintf(fp, ")"); } printed += fprintf(fp, "/%c/%c/%c/%d ", - mispred_str( br->entries + i), - br->entries[i].flags.in_tx? 'X' : '-', - br->entries[i].flags.abort? 'A' : '-', - br->entries[i].flags.cycles); + mispred_str(entries + i), + entries[i].flags.in_tx ? 'X' : '-', + entries[i].flags.abort ? 
'A' : '-', + entries[i].flags.cycles); } return printed; @@ -827,6 +829,7 @@ static int perf_sample__fprintf_brstackoff(struct perf_sample *sample, struct perf_event_attr *attr, FILE *fp) { struct branch_stack *br = sample->branch_stack; + struct branch_entry *entries = perf_sample__branch_entries(sample); struct addr_location alf, alt; u64 i, from, to; int printed = 0; @@ -838,8 +841,8 @@ static int perf_sample__fprintf_brstackoff(struct perf_sample *sample, memset(&alf, 0, sizeof(alf)); memset(&alt, 0, sizeof(alt)); - from = br->entries[i].from; - to = br->entries[i].to; + from = entries[i].from; + to = entries[i].to; if (thread__find_map_fb(thread, sample->cpumode, from, &alf) && !alf.map->dso->adjust_symbols) @@ -862,10 +865,10 @@ static int perf_sample__fprintf_brstackoff(struct perf_sample *sample, printed += fprintf(fp, ")"); } printed += fprintf(fp, "/%c/%c/%c/%d ", - mispred_str(br->entries + i), - br->entries[i].flags.in_tx ? 'X' : '-', - br->entries[i].flags.abort ? 'A' : '-', - br->entries[i].flags.cycles); + mispred_str(entries + i), + entries[i].flags.in_tx ? 'X' : '-', + entries[i].flags.abort ? 'A' : '-', + entries[i].flags.cycles); } return printed; @@ -1011,6 +1014,7 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample, struct machine *machine, FILE *fp) { struct branch_stack *br = sample->branch_stack; + struct branch_entry *entries = perf_sample__branch_entries(sample); u64 start, end; int i, insn, len, nr, ilen, printed = 0; struct perf_insn x; @@ -1031,31 +1035,31 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample, printed += fprintf(fp, "%c", '\n'); /* Handle first from jump, of which we don't know the entry. 
*/ - len = grab_bb(buffer, br->entries[nr-1].from, - br->entries[nr-1].from, + len = grab_bb(buffer, entries[nr-1].from, + entries[nr-1].from, machine, thread, &x.is64bit, &x.cpumode, false); if (len > 0) { - printed += ip__fprintf_sym(br->entries[nr - 1].from, thread, + printed += ip__fprintf_sym(entries[nr - 1].from, thread, x.cpumode, x.cpu, &lastsym, attr, fp); - printed += ip__fprintf_jump(br->entries[nr - 1].from, &br->entries[nr - 1], + printed += ip__fprintf_jump(entries[nr - 1].from, &entries[nr - 1], &x, buffer, len, 0, fp, &total_cycles); if (PRINT_FIELD(SRCCODE)) - printed += print_srccode(thread, x.cpumode, br->entries[nr - 1].from); + printed += print_srccode(thread, x.cpumode, entries[nr - 1].from); } /* Print all blocks */ for (i = nr - 2; i >= 0; i--) { - if (br->entries[i].from || br->entries[i].to) + if (entries[i].from || entries[i].to) pr_debug("%d: %" PRIx64 "-%" PRIx64 "\n", i, - br->entries[i].from, - br->entries[i].to); - start = br->entries[i + 1].to; - end = br->entries[i].from; + entries[i].from, + entries[i].to); + start = entries[i + 1].to; + end = entries[i].from; len = grab_bb(buffer, start, end, machine, thread, &x.is64bit, &x.cpumode, false); /* Patch up missing kernel transfers due to ring filters */ if (len == -ENXIO && i > 0) { - end = br->entries[--i].from; + end = entries[--i].from; pr_debug("\tpatching up to %" PRIx64 "-%" PRIx64 "\n", start, end); len = grab_bb(buffer, start, end, machine, thread, &x.is64bit, &x.cpumode, false); } @@ -1068,7 +1072,7 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample, printed += ip__fprintf_sym(ip, thread, x.cpumode, x.cpu, &lastsym, attr, fp); if (ip == end) { - printed += ip__fprintf_jump(ip, &br->entries[i], &x, buffer + off, len - off, ++insn, fp, + printed += ip__fprintf_jump(ip, &entries[i], &x, buffer + off, len - off, ++insn, fp, &total_cycles); if (PRINT_FIELD(SRCCODE)) printed += print_srccode(thread, x.cpumode, ip); @@ -1092,9 +1096,9 @@ static int 
perf_sample__fprintf_brstackinsn(struct perf_sample *sample, * Hit the branch? In this case we are already done, and the target * has not been executed yet. */ - if (br->entries[0].from == sample->ip) + if (entries[0].from == sample->ip) goto out; - if (br->entries[0].flags.abort) + if (entries[0].flags.abort) goto out; /* @@ -1105,7 +1109,7 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample, * between final branch and sample. When this happens just * continue walking after the last TO until we hit a branch. */ - start = br->entries[0].to; + start = entries[0].to; end = sample->ip; if (end < start) { /* Missing jump. Scan 128 bytes for the next branch */ @@ -3096,7 +3100,7 @@ int find_scripts(char **scripts_array, char **scripts_path_array, int num, char *temp; int i = 0; - session = perf_session__new(&data, false, NULL); + session = perf_session__new(&data, NULL); if (IS_ERR(session)) return PTR_ERR(session); @@ -3762,7 +3766,7 @@ int cmd_script(int argc, const char **argv) use_browser = 0; } - session = perf_session__new(&data, false, &script.tool); + session = perf_session__new(&data, &script.tool); if (IS_ERR(session)) return PTR_ERR(session); diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index ea183922c4ef1d31bf8dfb4072d84fd667b441eb..748be8cf5a7383d7f9fe858dfefae29f0324e3cc 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -65,6 +65,7 @@ #include "util/target.h" #include "util/time-utils.h" #include "util/top.h" +#include "util/affinity.h" #include "asm/bug.h" #include @@ -420,6 +421,62 @@ static bool is_target_alive(struct target *_target, return false; } +enum counter_recovery { + COUNTER_SKIP, + COUNTER_RETRY, + COUNTER_FATAL, +}; + +static enum counter_recovery stat_handle_error(struct evsel *counter) +{ + char msg[BUFSIZ]; + /* + * PPC returns ENXIO for HW counters until 2.6.37 + * (behavior changed with commit b0a873e). 
+ */ + if (errno == EINVAL || errno == ENOSYS || + errno == ENOENT || errno == EOPNOTSUPP || + errno == ENXIO) { + if (verbose > 0) + ui__warning("%s event is not supported by the kernel.\n", + perf_evsel__name(counter)); + counter->supported = false; + /* + * errored is a sticky flag that means one of the counter's + * cpu event had a problem and needs to be reexamined. + */ + counter->errored = true; + + if ((counter->leader != counter) || + !(counter->leader->core.nr_members > 1)) + return COUNTER_SKIP; + } else if (perf_evsel__fallback(counter, errno, msg, sizeof(msg))) { + if (verbose > 0) + ui__warning("%s\n", msg); + return COUNTER_RETRY; + } else if (target__has_per_thread(&target) && + evsel_list->core.threads && + evsel_list->core.threads->err_thread != -1) { + /* + * For global --per-thread case, skip current + * error thread. + */ + if (!thread_map__remove(evsel_list->core.threads, + evsel_list->core.threads->err_thread)) { + evsel_list->core.threads->err_thread = -1; + return COUNTER_RETRY; + } + } + + perf_evsel__open_strerror(counter, &target, + errno, msg, sizeof(msg)); + ui__error("%s\n", msg); + + if (child_pid != -1) + kill(child_pid, SIGTERM); + return COUNTER_FATAL; +} + static int __run_perf_stat(int argc, const char **argv, int run_idx) { int interval = stat_config.interval; @@ -433,6 +490,9 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx) int status = 0; const bool forks = (argc > 0); bool is_pipe = STAT_RECORD ? 
perf_stat.data.is_pipe : false; + struct affinity affinity; + int i, cpu; + bool second_pass = false; if (interval) { ts.tv_sec = interval / USEC_PER_MSEC; @@ -457,61 +517,104 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx) if (group) perf_evlist__set_leader(evsel_list); - evlist__for_each_entry(evsel_list, counter) { + if (affinity__setup(&affinity) < 0) + return -1; + + evlist__for_each_cpu (evsel_list, i, cpu) { + affinity__set(&affinity, cpu); + + evlist__for_each_entry(evsel_list, counter) { + if (evsel__cpu_iter_skip(counter, cpu)) + continue; + if (counter->reset_group || counter->errored) + continue; try_again: - if (create_perf_stat_counter(counter, &stat_config, &target) < 0) { - - /* Weak group failed. Reset the group. */ - if ((errno == EINVAL || errno == EBADF) && - counter->leader != counter && - counter->weak_group) { - counter = perf_evlist__reset_weak_group(evsel_list, counter); - goto try_again; - } + if (create_perf_stat_counter(counter, &stat_config, &target, + counter->cpu_iter - 1) < 0) { - /* - * PPC returns ENXIO for HW counters until 2.6.37 - * (behavior changed with commit b0a873e). - */ - if (errno == EINVAL || errno == ENOSYS || - errno == ENOENT || errno == EOPNOTSUPP || - errno == ENXIO) { - if (verbose > 0) - ui__warning("%s event is not supported by the kernel.\n", - perf_evsel__name(counter)); - counter->supported = false; - - if ((counter->leader != counter) || - !(counter->leader->core.nr_members > 1)) - continue; - } else if (perf_evsel__fallback(counter, errno, msg, sizeof(msg))) { - if (verbose > 0) - ui__warning("%s\n", msg); - goto try_again; - } else if (target__has_per_thread(&target) && - evsel_list->core.threads && - evsel_list->core.threads->err_thread != -1) { /* - * For global --per-thread case, skip current - * error thread. + * Weak group failed. We cannot just undo this here + * because earlier CPUs might be in group mode, and the kernel + * doesn't support mixing group and non group reads. 
Defer + * it to later. + * Don't close here because we're in the wrong affinity. */ - if (!thread_map__remove(evsel_list->core.threads, - evsel_list->core.threads->err_thread)) { - evsel_list->core.threads->err_thread = -1; + if ((errno == EINVAL || errno == EBADF) && + counter->leader != counter && + counter->weak_group) { + perf_evlist__reset_weak_group(evsel_list, counter, false); + assert(counter->reset_group); + second_pass = true; + continue; + } + + switch (stat_handle_error(counter)) { + case COUNTER_FATAL: + return -1; + case COUNTER_RETRY: goto try_again; + case COUNTER_SKIP: + continue; + default: + break; } + } + counter->supported = true; + } + } - perf_evsel__open_strerror(counter, &target, - errno, msg, sizeof(msg)); - ui__error("%s\n", msg); + if (second_pass) { + /* + * Now redo all the weak group after closing them, + * and also close errored counters. + */ - if (child_pid != -1) - kill(child_pid, SIGTERM); + evlist__for_each_cpu(evsel_list, i, cpu) { + affinity__set(&affinity, cpu); + /* First close errored or weak retry */ + evlist__for_each_entry(evsel_list, counter) { + if (!counter->reset_group && !counter->errored) + continue; + if (evsel__cpu_iter_skip_no_inc(counter, cpu)) + continue; + perf_evsel__close_cpu(&counter->core, counter->cpu_iter); + } + /* Now reopen weak */ + evlist__for_each_entry(evsel_list, counter) { + if (!counter->reset_group && !counter->errored) + continue; + if (evsel__cpu_iter_skip(counter, cpu)) + continue; + if (!counter->reset_group) + continue; +try_again_reset: + pr_debug2("reopening weak %s\n", perf_evsel__name(counter)); + if (create_perf_stat_counter(counter, &stat_config, &target, + counter->cpu_iter - 1) < 0) { + + switch (stat_handle_error(counter)) { + case COUNTER_FATAL: + return -1; + case COUNTER_RETRY: + goto try_again_reset; + case COUNTER_SKIP: + continue; + default: + break; + } + } + counter->supported = true; + } + } + } + affinity__cleanup(&affinity); - return -1; + 
evlist__for_each_entry(evsel_list, counter) { + if (!counter->supported) { + perf_evsel__free_fd(&counter->core); + continue; } - counter->supported = true; l = strlen(counter->unit); if (l > stat_config.unit_width) @@ -1362,19 +1465,17 @@ static int add_default_attributes(void) if (target__has_cpu(&target)) default_attrs0[0].config = PERF_COUNT_SW_CPU_CLOCK; - if (perf_evlist__add_default_attrs(evsel_list, default_attrs0) < 0) + if (evlist__add_default_attrs(evsel_list, default_attrs0) < 0) return -1; if (pmu_have_event("cpu", "stalled-cycles-frontend")) { - if (perf_evlist__add_default_attrs(evsel_list, - frontend_attrs) < 0) + if (evlist__add_default_attrs(evsel_list, frontend_attrs) < 0) return -1; } if (pmu_have_event("cpu", "stalled-cycles-backend")) { - if (perf_evlist__add_default_attrs(evsel_list, - backend_attrs) < 0) + if (evlist__add_default_attrs(evsel_list, backend_attrs) < 0) return -1; } - if (perf_evlist__add_default_attrs(evsel_list, default_attrs1) < 0) + if (evlist__add_default_attrs(evsel_list, default_attrs1) < 0) return -1; } @@ -1384,21 +1485,21 @@ static int add_default_attributes(void) return 0; /* Append detailed run extra attributes: */ - if (perf_evlist__add_default_attrs(evsel_list, detailed_attrs) < 0) + if (evlist__add_default_attrs(evsel_list, detailed_attrs) < 0) return -1; if (detailed_run < 2) return 0; /* Append very detailed run extra attributes: */ - if (perf_evlist__add_default_attrs(evsel_list, very_detailed_attrs) < 0) + if (evlist__add_default_attrs(evsel_list, very_detailed_attrs) < 0) return -1; if (detailed_run < 3) return 0; /* Append very, very detailed run extra attributes: */ - return perf_evlist__add_default_attrs(evsel_list, very_very_detailed_attrs); + return evlist__add_default_attrs(evsel_list, very_very_detailed_attrs); } static const char * const stat_record_usage[] = { @@ -1436,7 +1537,7 @@ static int __cmd_record(int argc, const char **argv) return -1; } - session = perf_session__new(data, false, NULL); + 
session = perf_session__new(data, NULL); if (IS_ERR(session)) { pr_err("Perf session creation failed\n"); return PTR_ERR(session); @@ -1635,7 +1736,7 @@ static int __cmd_report(int argc, const char **argv) perf_stat.data.path = input_name; perf_stat.data.mode = PERF_DATA_MODE_READ; - session = perf_session__new(&perf_stat.data, false, &perf_stat.tool); + session = perf_session__new(&perf_stat.data, &perf_stat.tool); if (IS_ERR(session)) return PTR_ERR(session); diff --git a/tools/perf/builtin-timechart.c b/tools/perf/builtin-timechart.c index 9e84fae9b096c9863585a9a35c175eea03946705..eda2892aa261c508be68f0549e531b3336cb0bfc 100644 --- a/tools/perf/builtin-timechart.c +++ b/tools/perf/builtin-timechart.c @@ -1598,8 +1598,7 @@ static int __cmd_timechart(struct timechart *tchart, const char *output_name) .force = tchart->force, }; - struct perf_session *session = perf_session__new(&data, false, - &tchart->tool); + struct perf_session *session = perf_session__new(&data, &tchart->tool); int ret = -EINVAL; if (IS_ERR(session)) diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c index a30d62186f5e9e0711ab5acd8fe0c0b7053f8cb0..3892e56e72483d81f6786c9b04434caf9fef8185 100644 --- a/tools/perf/builtin-top.c +++ b/tools/perf/builtin-top.c @@ -82,6 +82,7 @@ #include #include +#include static volatile int done; static volatile int resize; @@ -871,10 +872,10 @@ static void perf_top__mmap_read_idx(struct perf_top *top, int idx) union perf_event *event; md = opts->overwrite ? 
&evlist->overwrite_mmap[idx] : &evlist->mmap[idx]; - if (perf_mmap__read_init(md) < 0) + if (perf_mmap__read_init(&md->core) < 0) return; - while ((event = perf_mmap__read_event(md)) != NULL) { + while ((event = perf_mmap__read_event(&md->core)) != NULL) { int ret; ret = perf_evlist__parse_sample_timestamp(evlist, event, &last_timestamp); @@ -885,7 +886,7 @@ static void perf_top__mmap_read_idx(struct perf_top *top, int idx) if (ret) break; - perf_mmap__consume(md); + perf_mmap__consume(&md->core); if (top->qe.rotate) { pthread_mutex_lock(&top->qe.mutex); @@ -895,7 +896,7 @@ static void perf_top__mmap_read_idx(struct perf_top *top, int idx) } } - perf_mmap__read_done(md); + perf_mmap__read_done(&md->core); } static void perf_top__mmap_read(struct perf_top *top) @@ -1512,6 +1513,10 @@ int cmd_top(int argc, const char **argv) "objdump binary to use for disassembly and annotations"), OPT_STRING('M', "disassembler-style", &top.annotation_opts.disassembler_style, "disassembler style", "Specify disassembler style (e.g. 
-M intel for intel syntax)"), + OPT_STRING(0, "prefix", &top.annotation_opts.prefix, "prefix", + "Add prefix to source file path names in programs (with --prefix-strip)"), + OPT_STRING(0, "prefix-strip", &top.annotation_opts.prefix_strip, "N", + "Strip first N entries of source file path name in programs (with --prefix)"), OPT_STRING('u', "uid", &target->uid_str, "user", "user to profile"), OPT_CALLBACK(0, "percent-limit", &top, "percent", "Don't show entries under that percent", parse_percent_limit), @@ -1542,7 +1547,6 @@ int cmd_top(int argc, const char **argv) OPTS_EVSWITCH(&top.evswitch), OPT_END() }; - struct evlist *sb_evlist = NULL; const char * const top_usage[] = { "perf top []", NULL @@ -1567,8 +1571,11 @@ int cmd_top(int argc, const char **argv) if (argc) usage_with_options(top_usage, options); + if (annotate_check_args(&top.annotation_opts) < 0) + goto out_delete_evlist; + if (!top.evlist->core.nr_entries && - perf_evlist__add_default(top.evlist) < 0) { + evlist__add_default(top.evlist) < 0) { pr_err("Not enough memory for event selector list\n"); goto out_delete_evlist; } @@ -1676,16 +1683,27 @@ int cmd_top(int argc, const char **argv) signal(SIGWINCH, winch_sig); } - top.session = perf_session__new(NULL, false, NULL); + top.session = perf_session__new(NULL, NULL); if (IS_ERR(top.session)) { status = PTR_ERR(top.session); goto out_delete_evlist; } - if (!top.record_opts.no_bpf_event) - bpf_event__add_sb_event(&sb_evlist, &perf_env); + if (!top.record_opts.no_bpf_event) { + top.sb_evlist = evlist__new(); + + if (top.sb_evlist == NULL) { + pr_err("Couldn't create side band evlist.\n"); + goto out_delete_evlist; + } + + if (evlist__add_bpf_sb_event(top.sb_evlist, &perf_env)) { + pr_err("Couldn't ask for PERF_RECORD_BPF_EVENT side band events.\n"); + goto out_delete_evlist; + } + } - if (perf_evlist__start_sb_thread(sb_evlist, target)) { + if (perf_evlist__start_sb_thread(top.sb_evlist, target)) { pr_debug("Couldn't start the BPF side band thread:\nBPF
programs starting from now on won't be annotatable\n"); opts->no_bpf_event = true; } @@ -1693,7 +1711,7 @@ int cmd_top(int argc, const char **argv) status = __cmd_top(&top); if (!opts->no_bpf_event) - perf_evlist__stop_sb_thread(sb_evlist); + perf_evlist__stop_sb_thread(top.sb_evlist); out_delete_evlist: evlist__delete(top.evlist); diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c index 6052eb057821dbcc4ac1a3b525caaa7c386c734f..06edb52090d4663d4235ce1429545689c838bd48 100644 --- a/tools/perf/builtin-trace.c +++ b/tools/perf/builtin-trace.c @@ -77,6 +77,7 @@ #include #include +#include #ifndef O_CLOEXEC # define O_CLOEXEC 02000000 @@ -393,11 +394,11 @@ static int perf_evsel__init_raw_syscall_tp(struct evsel *evsel, void *handler) static struct evsel *perf_evsel__raw_syscall_newtp(const char *direction, void *handler) { - struct evsel *evsel = perf_evsel__newtp("raw_syscalls", direction); + struct evsel *evsel = evsel__newtp("raw_syscalls", direction); /* older kernel (e.g., RHEL6) use syscalls:{enter,exit} */ if (IS_ERR(evsel)) - evsel = perf_evsel__newtp("syscalls", direction); + evsel = evsel__newtp("syscalls", direction); if (IS_ERR(evsel)) return NULL; @@ -2705,7 +2706,7 @@ static bool evlist__add_vfs_getname(struct evlist *evlist) return found; } -static struct evsel *perf_evsel__new_pgfault(u64 config) +static struct evsel *evsel__new_pgfault(u64 config) { struct evsel *evsel; struct perf_event_attr attr = { @@ -3332,7 +3333,7 @@ static int trace__run(struct trace *trace, int argc, const char **argv) } if ((trace->trace_pgfaults & TRACE_PFMAJ)) { - pgfault_maj = perf_evsel__new_pgfault(PERF_COUNT_SW_PAGE_FAULTS_MAJ); + pgfault_maj = evsel__new_pgfault(PERF_COUNT_SW_PAGE_FAULTS_MAJ); if (pgfault_maj == NULL) goto out_error_mem; perf_evsel__config_callchain(pgfault_maj, &trace->opts, &callchain_param); @@ -3340,7 +3341,7 @@ static int trace__run(struct trace *trace, int argc, const char **argv) } if ((trace->trace_pgfaults & TRACE_PFMIN)) { - 
pgfault_min = perf_evsel__new_pgfault(PERF_COUNT_SW_PAGE_FAULTS_MIN); + pgfault_min = evsel__new_pgfault(PERF_COUNT_SW_PAGE_FAULTS_MIN); if (pgfault_min == NULL) goto out_error_mem; perf_evsel__config_callchain(pgfault_min, &trace->opts, &callchain_param); @@ -3348,8 +3349,7 @@ static int trace__run(struct trace *trace, int argc, const char **argv) } if (trace->sched && - perf_evlist__add_newtp(evlist, "sched", "sched_stat_runtime", - trace__sched_stat_runtime)) + evlist__add_newtp(evlist, "sched", "sched_stat_runtime", trace__sched_stat_runtime)) goto out_error_sched_stat_runtime; /* @@ -3499,17 +3499,17 @@ static int trace__run(struct trace *trace, int argc, const char **argv) struct mmap *md; md = &evlist->mmap[i]; - if (perf_mmap__read_init(md) < 0) + if (perf_mmap__read_init(&md->core) < 0) continue; - while ((event = perf_mmap__read_event(md)) != NULL) { + while ((event = perf_mmap__read_event(&md->core)) != NULL) { ++trace->nr_events; err = trace__deliver_event(trace, event); if (err) goto out_disable; - perf_mmap__consume(md); + perf_mmap__consume(&md->core); if (interrupted) goto out_disable; @@ -3519,7 +3519,7 @@ static int trace__run(struct trace *trace, int argc, const char **argv) draining = true; } } - perf_mmap__read_done(md); + perf_mmap__read_done(&md->core); } if (trace->nr_events == before) { @@ -3636,7 +3636,7 @@ static int trace__replay(struct trace *trace) /* add tid to output */ trace->multiple_threads = true; - session = perf_session__new(&data, false, &trace->tool); + session = perf_session__new(&data, &trace->tool); if (IS_ERR(session)) return PTR_ERR(session); diff --git a/tools/perf/check-headers.sh b/tools/perf/check-headers.sh index 499235a411628db34a6b24854c409c86ab355a34..9cea76bf0efafa4e51575df7e0273d67b5a37327 100755 --- a/tools/perf/check-headers.sh +++ b/tools/perf/check-headers.sh @@ -29,6 +29,7 @@ arch/x86/include/asm/required-features.h arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/inat_types.h 
arch/x86/include/asm/emulate_prefix.h +arch/x86/include/asm/msr-index.h arch/x86/include/uapi/asm/prctl.h arch/x86/lib/x86-opcode-map.txt arch/x86/tools/gen-insn-attr-x86.awk @@ -108,8 +109,9 @@ for i in $FILES; do done # diff with extra ignore lines -check arch/x86/lib/memcpy_64.S '-I "^EXPORT_SYMBOL" -I "^#include "' -check arch/x86/lib/memset_64.S '-I "^EXPORT_SYMBOL" -I "^#include "' +check arch/x86/lib/memcpy_64.S '-I "^EXPORT_SYMBOL" -I "^#include " -I"^SYM_FUNC_START\(_LOCAL\)*(memcpy_\(erms\|orig\))"' +check arch/x86/lib/memset_64.S '-I "^EXPORT_SYMBOL" -I "^#include " -I"^SYM_FUNC_START\(_LOCAL\)*(memset_\(erms\|orig\))"' +check arch/x86/include/asm/amd-ibs.h '-I "^#include [<\"]\(asm/\)*msr-index.h"' check include/uapi/asm-generic/mman.h '-I "^#include <\(uapi/\)*asm-generic/mman-common\(-tools\)*.h>"' check include/uapi/linux/mman.h '-I "^#include <\(uapi/\)*asm/mman.h>"' check include/linux/ctype.h '-I "isdigit("' diff --git a/tools/perf/lib/Build b/tools/perf/lib/Build index c31f1c111f8f18bb52c76be2ab836931fa28573d..2ef9a4ec6d998971b777c0481984fe2d337bfe52 100644 --- a/tools/perf/lib/Build +++ b/tools/perf/lib/Build @@ -3,6 +3,7 @@ libperf-y += cpumap.o libperf-y += threadmap.o libperf-y += evsel.o libperf-y += evlist.o +libperf-y += mmap.o libperf-y += zalloc.o libperf-y += xyarray.o libperf-y += lib.o diff --git a/tools/perf/lib/Makefile b/tools/perf/lib/Makefile index 85ccb8c439a475d8ce9e762f40ec8c56def48880..0889c9c3ec19d51e44599acde624d955905d1a79 100644 --- a/tools/perf/lib/Makefile +++ b/tools/perf/lib/Makefile @@ -172,8 +172,9 @@ install_headers: $(call do_install,include/perf/cpumap.h,$(prefix)/include/perf,644); \ $(call do_install,include/perf/threadmap.h,$(prefix)/include/perf,644); \ $(call do_install,include/perf/evlist.h,$(prefix)/include/perf,644); \ - $(call do_install,include/perf/evsel.h,$(prefix)/include/perf,644); - $(call do_install,include/perf/event.h,$(prefix)/include/perf,644); + $(call 
do_install,include/perf/evsel.h,$(prefix)/include/perf,644); \ + $(call do_install,include/perf/event.h,$(prefix)/include/perf,644); \ + $(call do_install,include/perf/mmap.h,$(prefix)/include/perf,644); install_pkgconfig: $(LIBPERF_PC) $(call QUIET_INSTALL, $(LIBPERF_PC)) \ diff --git a/tools/perf/lib/cpumap.c b/tools/perf/lib/cpumap.c index 2ca1fafa620dfca8ee992b87eb35fe5c36220d0f..5e1ae7a23591a9409bb7c5c4800f35fe2e6e3c98 100644 --- a/tools/perf/lib/cpumap.c +++ b/tools/perf/lib/cpumap.c @@ -272,3 +272,60 @@ int perf_cpu_map__max(struct perf_cpu_map *map) return max; } + +/* + * Merge two cpumaps + * + * orig either gets freed and replaced with a new map, or reused + * with no reference count change (similar to "realloc"). + * other has its reference count increased only if it is returned as-is (orig == NULL). + */ + +struct perf_cpu_map *perf_cpu_map__merge(struct perf_cpu_map *orig, + struct perf_cpu_map *other) +{ + int *tmp_cpus; + int tmp_len; + int i, j, k; + struct perf_cpu_map *merged; + + if (!orig && !other) + return NULL; + if (!orig) { + perf_cpu_map__get(other); + return other; + } + if (!other) + return orig; + if (orig->nr == other->nr && + !memcmp(orig->map, other->map, orig->nr * sizeof(int))) + return orig; + + tmp_len = orig->nr + other->nr; + tmp_cpus = malloc(tmp_len * sizeof(int)); + if (!tmp_cpus) + return NULL; + + /* Standard merge algorithm from Wikipedia */ + i = j = k = 0; + while (i < orig->nr && j < other->nr) { + if (orig->map[i] <= other->map[j]) { + if (orig->map[i] == other->map[j]) + j++; + tmp_cpus[k++] = orig->map[i++]; + } else + tmp_cpus[k++] = other->map[j++]; + } + + while (i < orig->nr) + tmp_cpus[k++] = orig->map[i++]; + + while (j < other->nr) + tmp_cpus[k++] = other->map[j++]; + assert(k <= tmp_len); + + merged = cpu_map__trim_new(k, tmp_cpus); + free(tmp_cpus); + perf_cpu_map__put(orig); + return merged; +} diff --git a/tools/perf/lib/evlist.c b/tools/perf/lib/evlist.c index 
d1496fee810ccf8b8dd2683ede21d5b960b1771a..b7e8b3b15f25120ae2b6b71e23d61547acd3ed44
100644 --- a/tools/perf/lib/evlist.c +++ b/tools/perf/lib/evlist.c @@ -46,6 +46,7 @@ static void __perf_evlist__propagate_maps(struct perf_evlist *evlist, perf_thread_map__put(evsel->threads); evsel->threads = perf_thread_map__get(evlist->threads); + evlist->all_cpus = perf_cpu_map__merge(evlist->all_cpus, evsel->cpus); } static void perf_evlist__propagate_maps(struct perf_evlist *evlist) diff --git a/tools/perf/lib/evsel.c b/tools/perf/lib/evsel.c index a8cb582e2721dc15258b5c518abee682618ffb6b..ea775dacbd2d23d73b1e77b77934e14e7212704a 100644 --- a/tools/perf/lib/evsel.c +++ b/tools/perf/lib/evsel.c @@ -114,15 +114,23 @@ int perf_evsel__open(struct perf_evsel *evsel, struct perf_cpu_map *cpus, return err; } +static void perf_evsel__close_fd_cpu(struct perf_evsel *evsel, int cpu) +{ + int thread; + + for (thread = 0; thread < xyarray__max_y(evsel->fd); ++thread) { + if (FD(evsel, cpu, thread) >= 0) + close(FD(evsel, cpu, thread)); + FD(evsel, cpu, thread) = -1; + } +} + void perf_evsel__close_fd(struct perf_evsel *evsel) { - int cpu, thread; + int cpu; for (cpu = 0; cpu < xyarray__max_x(evsel->fd); cpu++) - for (thread = 0; thread < xyarray__max_y(evsel->fd); ++thread) { - close(FD(evsel, cpu, thread)); - FD(evsel, cpu, thread) = -1; - } + perf_evsel__close_fd_cpu(evsel, cpu); } void perf_evsel__free_fd(struct perf_evsel *evsel) @@ -140,6 +148,14 @@ void perf_evsel__close(struct perf_evsel *evsel) perf_evsel__free_fd(evsel); } +void perf_evsel__close_cpu(struct perf_evsel *evsel, int cpu) +{ + if (evsel->fd == NULL) + return; + + perf_evsel__close_fd_cpu(evsel, cpu); +} + int perf_evsel__read_size(struct perf_evsel *evsel) { u64 read_format = evsel->attr.read_format; diff --git a/tools/perf/lib/include/internal/evlist.h b/tools/perf/lib/include/internal/evlist.h index 9f440ab12b76311144278d921ec0002c7320d040..096bf0b05e040c3c3d1227f9d2661c181e472e32 100644 --- a/tools/perf/lib/include/internal/evlist.h +++ b/tools/perf/lib/include/internal/evlist.h @@ -17,6 +17,7 @@ 
struct perf_evlist { int nr_entries; bool has_user_cpus; struct perf_cpu_map *cpus; + struct perf_cpu_map *all_cpus; struct perf_thread_map *threads; int nr_mmaps; size_t mmap_len; diff --git a/tools/perf/lib/include/internal/lib.h b/tools/perf/lib/include/internal/lib.h index 5175d491b2d4962bb030fc5afdd0477286f5768e..85471a4b900f73aa5095d9941d67025d53880612 100644 --- a/tools/perf/lib/include/internal/lib.h +++ b/tools/perf/lib/include/internal/lib.h @@ -9,4 +9,6 @@ extern unsigned int page_size; ssize_t readn(int fd, void *buf, size_t n); ssize_t writen(int fd, const void *buf, size_t n); +ssize_t preadn(int fd, void *buf, size_t n, off_t offs); + #endif /* __LIBPERF_INTERNAL_CPUMAP_H */ diff --git a/tools/perf/lib/include/internal/mmap.h b/tools/perf/lib/include/internal/mmap.h index ba1e519c15b912235e7d5fe08e991a4b7464f1f5..ee536c4441bb9675355f2a2134109fdfdc07f8f0 100644 --- a/tools/perf/lib/include/internal/mmap.h +++ b/tools/perf/lib/include/internal/mmap.h @@ -10,23 +10,45 @@ /* perf sample has 16 bits size limit */ #define PERF_SAMPLE_MAX_SIZE (1 << 16) +struct perf_mmap; + +typedef void (*libperf_unmap_cb_t)(struct perf_mmap *map); + /** * struct perf_mmap - perf's ring buffer mmap details * * @refcnt - e.g. 
code using PERF_EVENT_IOC_SET_OUTPUT to share this */ struct perf_mmap { - void *base; - int mask; - int fd; - int cpu; - refcount_t refcnt; - u64 prev; - u64 start; - u64 end; - bool overwrite; - u64 flush; - char event_copy[PERF_SAMPLE_MAX_SIZE] __aligned(8); + void *base; + int mask; + int fd; + int cpu; + refcount_t refcnt; + u64 prev; + u64 start; + u64 end; + bool overwrite; + u64 flush; + libperf_unmap_cb_t unmap_cb; + char event_copy[PERF_SAMPLE_MAX_SIZE] __aligned(8); +}; + +struct perf_mmap_param { + int prot; + int mask; }; +size_t perf_mmap__mmap_len(struct perf_mmap *map); + +void perf_mmap__init(struct perf_mmap *map, bool overwrite, + libperf_unmap_cb_t unmap_cb); +int perf_mmap__mmap(struct perf_mmap *map, struct perf_mmap_param *mp, + int fd, int cpu); +void perf_mmap__munmap(struct perf_mmap *map); +void perf_mmap__get(struct perf_mmap *map); +void perf_mmap__put(struct perf_mmap *map); + +u64 perf_mmap__read_head(struct perf_mmap *map); + #endif /* __LIBPERF_INTERNAL_MMAP_H */ diff --git a/tools/perf/lib/include/perf/core.h b/tools/perf/lib/include/perf/core.h index cfd70e720c1ce0fbb4c3d2b707ee6bd8359e5421..2a80e4b6f819500730b82034df576f87368ed885 100644 --- a/tools/perf/lib/include/perf/core.h +++ b/tools/perf/lib/include/perf/core.h @@ -12,6 +12,8 @@ enum libperf_print_level { LIBPERF_WARN, LIBPERF_INFO, LIBPERF_DEBUG, + LIBPERF_DEBUG2, + LIBPERF_DEBUG3, }; typedef int (*libperf_print_fn_t)(enum libperf_print_level level, diff --git a/tools/perf/lib/include/perf/cpumap.h b/tools/perf/lib/include/perf/cpumap.h index ac9aa497f84ab3151799e1bc4689b5e0b0fd1dc4..6a17ad730cbc112b7e3e5938a4cc47b688d09440 100644 --- a/tools/perf/lib/include/perf/cpumap.h +++ b/tools/perf/lib/include/perf/cpumap.h @@ -12,6 +12,8 @@ LIBPERF_API struct perf_cpu_map *perf_cpu_map__dummy_new(void); LIBPERF_API struct perf_cpu_map *perf_cpu_map__new(const char *cpu_list); LIBPERF_API struct perf_cpu_map *perf_cpu_map__read(FILE *file); LIBPERF_API struct perf_cpu_map 
*perf_cpu_map__get(struct perf_cpu_map *map); +LIBPERF_API struct perf_cpu_map *perf_cpu_map__merge(struct perf_cpu_map *orig, + struct perf_cpu_map *other); LIBPERF_API void perf_cpu_map__put(struct perf_cpu_map *map); LIBPERF_API int perf_cpu_map__cpu(const struct perf_cpu_map *cpus, int idx); LIBPERF_API int perf_cpu_map__nr(const struct perf_cpu_map *cpus); diff --git a/tools/perf/lib/include/perf/event.h b/tools/perf/lib/include/perf/event.h index 18106899cb4eb19f28d1567858ee5229e146ab0b..69b44d2cc0f5001cb3098870555608c17c9f72b2 100644 --- a/tools/perf/lib/include/perf/event.h +++ b/tools/perf/lib/include/perf/event.h @@ -105,6 +105,12 @@ struct perf_record_bpf_event { __u8 tag[BPF_TAG_SIZE]; // prog tag }; +struct perf_record_cgroup { + struct perf_event_header header; + __u64 id; + char path[PATH_MAX]; +}; + struct perf_record_sample { struct perf_event_header header; __u64 array[]; @@ -352,6 +358,7 @@ union perf_event { struct perf_record_mmap2 mmap2; struct perf_record_comm comm; struct perf_record_namespaces namespaces; + struct perf_record_cgroup cgroup; struct perf_record_fork fork; struct perf_record_lost lost; struct perf_record_lost_samples lost_samples; diff --git a/tools/perf/lib/include/perf/evsel.h b/tools/perf/lib/include/perf/evsel.h index 4388667f265ce10f3791182a250aff57da8758e6..ed10a914cd3f40ea8d34da20fa7b8a7dd38fca7c 100644 --- a/tools/perf/lib/include/perf/evsel.h +++ b/tools/perf/lib/include/perf/evsel.h @@ -28,6 +28,7 @@ LIBPERF_API void perf_evsel__delete(struct perf_evsel *evsel); LIBPERF_API int perf_evsel__open(struct perf_evsel *evsel, struct perf_cpu_map *cpus, struct perf_thread_map *threads); LIBPERF_API void perf_evsel__close(struct perf_evsel *evsel); +LIBPERF_API void perf_evsel__close_cpu(struct perf_evsel *evsel, int cpu); LIBPERF_API int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread, struct perf_counts_values *count); LIBPERF_API int perf_evsel__enable(struct perf_evsel *evsel); diff --git 
a/tools/perf/lib/include/perf/mmap.h b/tools/perf/lib/include/perf/mmap.h new file mode 100644 index 0000000000000000000000000000000000000000..9508ad90d8b9e19b079dc7fb96d73b85318cd097 --- /dev/null +++ b/tools/perf/lib/include/perf/mmap.h @@ -0,0 +1,15 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __LIBPERF_MMAP_H +#define __LIBPERF_MMAP_H + +#include + +struct perf_mmap; +union perf_event; + +LIBPERF_API void perf_mmap__consume(struct perf_mmap *map); +LIBPERF_API int perf_mmap__read_init(struct perf_mmap *map); +LIBPERF_API void perf_mmap__read_done(struct perf_mmap *map); +LIBPERF_API union perf_event *perf_mmap__read_event(struct perf_mmap *map); + +#endif /* __LIBPERF_MMAP_H */ diff --git a/tools/perf/lib/internal.h b/tools/perf/lib/internal.h index dc92f241732e617c88c8d10ff24f7daeadab6f08..37db745e1502453817d4ee5ccb2d3c5a7555fa60 100644 --- a/tools/perf/lib/internal.h +++ b/tools/perf/lib/internal.h @@ -14,5 +14,7 @@ do { \ #define pr_warning(fmt, ...) __pr(LIBPERF_WARN, fmt, ##__VA_ARGS__) #define pr_info(fmt, ...) __pr(LIBPERF_INFO, fmt, ##__VA_ARGS__) #define pr_debug(fmt, ...) __pr(LIBPERF_DEBUG, fmt, ##__VA_ARGS__) +#define pr_debug2(fmt, ...) __pr(LIBPERF_DEBUG2, fmt, ##__VA_ARGS__) +#define pr_debug3(fmt, ...) __pr(LIBPERF_DEBUG3, fmt, ##__VA_ARGS__) #endif /* __LIBPERF_INTERNAL_H */ diff --git a/tools/perf/lib/lib.c b/tools/perf/lib/lib.c index 18658931fc7145f9788ca27279a381591c68e12e..696fb0ea67c6e31e35b3bd9ab8055f121051f707 100644 --- a/tools/perf/lib/lib.c +++ b/tools/perf/lib/lib.c @@ -38,6 +38,26 @@ ssize_t readn(int fd, void *buf, size_t n) return ion(true, fd, buf, n); } +ssize_t preadn(int fd, void *buf, size_t n, off_t offs) +{ + size_t left = n; + + while (left) { + ssize_t ret = pread(fd, buf, left, offs); + + if (ret < 0 && errno == EINTR) + continue; + if (ret <= 0) + return ret; + + left -= ret; + buf += ret; + offs += ret; + } + + return n; +} + /* * Write exactly 'n' bytes or return an error. 
*/ diff --git a/tools/perf/lib/libperf.map b/tools/perf/lib/libperf.map index ab8dbde1136cc2436e8fd437e565ab371998f4fc..8bb0d73e0c6cefa76e1b8d0b2f3e745fc3749095 100644 --- a/tools/perf/lib/libperf.map +++ b/tools/perf/lib/libperf.map @@ -40,6 +40,10 @@ LIBPERF_0.0.1 { perf_evlist__next; perf_evlist__set_maps; perf_evlist__poll; + perf_mmap__consume; + perf_mmap__read_init; + perf_mmap__read_done; + perf_mmap__read_event; local: *; }; diff --git a/tools/perf/lib/mmap.c b/tools/perf/lib/mmap.c new file mode 100644 index 0000000000000000000000000000000000000000..0752c193b0fba5890a789bef9940ad0d035b85f7 --- /dev/null +++ b/tools/perf/lib/mmap.c @@ -0,0 +1,273 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "internal.h" + +void perf_mmap__init(struct perf_mmap *map, bool overwrite, + libperf_unmap_cb_t unmap_cb) +{ + map->fd = -1; + map->overwrite = overwrite; + map->unmap_cb = unmap_cb; + refcount_set(&map->refcnt, 0); +} + +size_t perf_mmap__mmap_len(struct perf_mmap *map) +{ + return map->mask + 1 + page_size; +} + +int perf_mmap__mmap(struct perf_mmap *map, struct perf_mmap_param *mp, + int fd, int cpu) +{ + map->prev = 0; + map->mask = mp->mask; + map->base = mmap(NULL, perf_mmap__mmap_len(map), mp->prot, + MAP_SHARED, fd, 0); + if (map->base == MAP_FAILED) { + map->base = NULL; + return -1; + } + + map->fd = fd; + map->cpu = cpu; + return 0; +} + +void perf_mmap__munmap(struct perf_mmap *map) +{ + if (map && map->base != NULL) { + munmap(map->base, perf_mmap__mmap_len(map)); + map->base = NULL; + map->fd = -1; + refcount_set(&map->refcnt, 0); + } + if (map && map->unmap_cb) + map->unmap_cb(map); +} + +void perf_mmap__get(struct perf_mmap *map) +{ + refcount_inc(&map->refcnt); +} + +void perf_mmap__put(struct perf_mmap *map) +{ + BUG_ON(map->base && refcount_read(&map->refcnt) == 0); + + if (refcount_dec_and_test(&map->refcnt)) + 
perf_mmap__munmap(map); +} + +static inline void perf_mmap__write_tail(struct perf_mmap *md, u64 tail) +{ + ring_buffer_write_tail(md->base, tail); +} + +u64 perf_mmap__read_head(struct perf_mmap *map) +{ + return ring_buffer_read_head(map->base); +} + +static bool perf_mmap__empty(struct perf_mmap *map) +{ + struct perf_event_mmap_page *pc = map->base; + + return perf_mmap__read_head(map) == map->prev && !pc->aux_size; +} + +void perf_mmap__consume(struct perf_mmap *map) +{ + if (!map->overwrite) { + u64 old = map->prev; + + perf_mmap__write_tail(map, old); + } + + if (refcount_read(&map->refcnt) == 1 && perf_mmap__empty(map)) + perf_mmap__put(map); +} + +static int overwrite_rb_find_range(void *buf, int mask, u64 *start, u64 *end) +{ + struct perf_event_header *pheader; + u64 evt_head = *start; + int size = mask + 1; + + pr_debug2("%s: buf=%p, start=%"PRIx64"\n", __func__, buf, *start); + pheader = (struct perf_event_header *)(buf + (*start & mask)); + while (true) { + if (evt_head - *start >= (unsigned int)size) { + pr_debug("Finished reading overwrite ring buffer: rewind\n"); + if (evt_head - *start > (unsigned int)size) + evt_head -= pheader->size; + *end = evt_head; + return 0; + } + + pheader = (struct perf_event_header *)(buf + (evt_head & mask)); + + if (pheader->size == 0) { + pr_debug("Finished reading overwrite ring buffer: get start\n"); + *end = evt_head; + return 0; + } + + evt_head += pheader->size; + pr_debug3("move evt_head: %"PRIx64"\n", evt_head); + } + WARN_ONCE(1, "Shouldn't get here\n"); + return -1; +} + +/* + * Report the start and end of the available data in ringbuffer + */ +static int __perf_mmap__read_init(struct perf_mmap *md) +{ + u64 head = perf_mmap__read_head(md); + u64 old = md->prev; + unsigned char *data = md->base + page_size; + unsigned long size; + + md->start = md->overwrite ? head : old; + md->end = md->overwrite ? 
old : head; + + if ((md->end - md->start) < md->flush) + return -EAGAIN; + + size = md->end - md->start; + if (size > (unsigned long)(md->mask) + 1) { + if (!md->overwrite) { + WARN_ONCE(1, "failed to keep up with mmap data. (warn only once)\n"); + + md->prev = head; + perf_mmap__consume(md); + return -EAGAIN; + } + + /* + * Backward ring buffer is full. We still have a chance to read + * most of the data from it. + */ + if (overwrite_rb_find_range(data, md->mask, &md->start, &md->end)) + return -EINVAL; + } + + return 0; +} + +int perf_mmap__read_init(struct perf_mmap *map) +{ + /* + * Check if event was unmapped due to a POLLHUP/POLLERR. + */ + if (!refcount_read(&map->refcnt)) + return -ENOENT; + + return __perf_mmap__read_init(map); +} + +/* + * Mandatory for overwrite mode + * The direction of overwrite mode is backward. + * The last perf_mmap__read() will set tail to map->prev. + * Need to correct map->prev to head, which is the end of the next read. + */ +void perf_mmap__read_done(struct perf_mmap *map) +{ + /* + * Check if event was unmapped due to a POLLHUP/POLLERR. + */ + if (!refcount_read(&map->refcnt)) + return; + + map->prev = perf_mmap__read_head(map); +} + +/* When check_messup is true, 'end' must point to a good entry */ +static union perf_event *perf_mmap__read(struct perf_mmap *map, + u64 *startp, u64 end) +{ + unsigned char *data = map->base + page_size; + union perf_event *event = NULL; + int diff = end - *startp; + + if (diff >= (int)sizeof(event->header)) { + size_t size; + + event = (union perf_event *)&data[*startp & map->mask]; + size = event->header.size; + + if (size < sizeof(event->header) || diff < (int)size) + return NULL; + + /* + * Event straddles the mmap boundary -- header should always + * be inside due to u64 alignment of output.
+ */ + if ((*startp & map->mask) + size != ((*startp + size) & map->mask)) { + unsigned int offset = *startp; + unsigned int len = min(sizeof(*event), size), cpy; + void *dst = map->event_copy; + + do { + cpy = min(map->mask + 1 - (offset & map->mask), len); + memcpy(dst, &data[offset & map->mask], cpy); + offset += cpy; + dst += cpy; + len -= cpy; + } while (len); + + event = (union perf_event *)map->event_copy; + } + + *startp += size; + } + + return event; +} + +/* + * Read event from ring buffer one by one. + * Return one event for each call. + * + * Usage: + * perf_mmap__read_init() + * while(event = perf_mmap__read_event()) { + * //process the event + * perf_mmap__consume() + * } + * perf_mmap__read_done() + */ +union perf_event *perf_mmap__read_event(struct perf_mmap *map) +{ + union perf_event *event; + + /* + * Check if event was unmapped due to a POLLHUP/POLLERR. + */ + if (!refcount_read(&map->refcnt)) + return NULL; + + /* non-overwrite mode doesn't pause the ring buffer */ + if (!map->overwrite) + map->end = perf_mmap__read_head(map); + + event = perf_mmap__read(map, &map->start, map->end); + + if (!map->overwrite) + map->prev = map->start; + + return event; +} diff --git a/tools/perf/tests/attr/system-wide-dummy b/tools/perf/tests/attr/system-wide-dummy new file mode 100644 index 0000000000000000000000000000000000000000..eba723cc0d380ecc2661a98986cf3c3bb6f8deb6 --- /dev/null +++ b/tools/perf/tests/attr/system-wide-dummy @@ -0,0 +1,50 @@ +# Event added by system-wide or CPU perf-record to handle the race of +# processes starting while /proc is processed. +[event] +fd=1 +group_fd=-1 +cpu=* +pid=-1 +flags=8 +type=1 +size=120 +config=9 +sample_period=4000 +sample_type=455 +read_format=4 +# Event will be enabled right away.
+disabled=0 +inherit=1 +pinned=0 +exclusive=0 +exclude_user=0 +exclude_kernel=0 +exclude_hv=0 +exclude_idle=0 +mmap=1 +comm=1 +freq=1 +inherit_stat=0 +enable_on_exec=0 +task=1 +watermark=0 +precise_ip=0 +mmap_data=0 +sample_id_all=1 +exclude_host=0 +exclude_guest=0 +exclude_callchain_kernel=0 +exclude_callchain_user=0 +mmap2=1 +comm_exec=1 +context_switch=0 +write_backward=0 +namespaces=0 +use_clockid=0 +wakeup_events=0 +bp_type=0 +config1=0 +config2=0 +branch_sample_type=0 +sample_regs_user=0 +sample_stack_user=0 diff --git a/tools/perf/tests/attr/test-record-C0 b/tools/perf/tests/attr/test-record-C0 index 93818054ae2086e4818f14231ef9c731d27cb71d..317730b906dd3cf4c18804de30dcd06bb0d51eca 100644 --- a/tools/perf/tests/attr/test-record-C0 +++ b/tools/perf/tests/attr/test-record-C0 @@ -9,6 +9,14 @@ cpu=0 # no enable on exec for CPU attached enable_on_exec=0 -# PERF_SAMPLE_IP | PERF_SAMPLE_TID PERF_SAMPLE_TIME | # PERF_SAMPLE_PERIOD +# PERF_SAMPLE_IP | PERF_SAMPLE_TID | PERF_SAMPLE_TIME | +# PERF_SAMPLE_ID | PERF_SAMPLE_PERIOD # + PERF_SAMPLE_CPU added by -C 0 -sample_type=391 +sample_type=455 + +# Dummy event handles mmaps, comm and task. 
+mmap=0
+comm=0
+task=0
+
+[event:system-wide-dummy]
diff --git a/tools/perf/tests/backward-ring-buffer.c b/tools/perf/tests/backward-ring-buffer.c
index 5128f727c0ef1d188ddbfa96b8d131343bcdefcb..15cea518f5add2df0c108f8e3081dc103c8ac7e6 100644
--- a/tools/perf/tests/backward-ring-buffer.c
+++ b/tools/perf/tests/backward-ring-buffer.c
@@ -13,6 +13,7 @@
 #include "util/mmap.h"
 #include
 #include
+#include
 #define NR_ITERS 111
@@ -37,8 +38,8 @@ static int count_samples(struct evlist *evlist, int *sample_count,
 struct mmap *map = &evlist->overwrite_mmap[i];
 union perf_event *event;
- perf_mmap__read_init(map);
- while ((event = perf_mmap__read_event(map)) != NULL) {
+ perf_mmap__read_init(&map->core);
+ while ((event = perf_mmap__read_event(&map->core)) != NULL) {
 const u32 type = event->header.type;
 switch (type) {
@@ -53,7 +54,7 @@ static int count_samples(struct evlist *evlist, int *sample_count,
 return TEST_FAIL;
 }
 }
- perf_mmap__read_done(map);
+ perf_mmap__read_done(&map->core);
 }
 return TEST_OK;
 }
diff --git a/tools/perf/tests/bpf.c b/tools/perf/tests/bpf.c
index 8669bb85e7c7fc9f04eb611b4f0295565a019b37..a439868738bd2920b38b44c5f7795b5c043e75b2 100644
--- a/tools/perf/tests/bpf.c
+++ b/tools/perf/tests/bpf.c
@@ -16,6 +16,7 @@
 #include
 #include
 #include
+#include
 #include "tests.h"
 #include "llvm.h"
 #include "debug.h"
@@ -185,16 +186,16 @@ static int do_test(struct bpf_object *obj, int (*func)(void),
 struct mmap *md;
 md = &evlist->mmap[i];
- if (perf_mmap__read_init(md) < 0)
+ if (perf_mmap__read_init(&md->core) < 0)
 continue;
- while ((event = perf_mmap__read_event(md)) != NULL) {
+ while ((event = perf_mmap__read_event(&md->core)) != NULL) {
 const u32 type = event->header.type;
 if (type == PERF_RECORD_SAMPLE)
 count ++;
 }
- perf_mmap__read_done(md);
+ perf_mmap__read_done(&md->core);
 }
 if (count != expect) {
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index 55774baffc2aecc2fb2f1eb5d61636c79aaa52e8..d111d2ed1c10ceba03a593c2ae71f45cdf4fbed6 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -259,6 +259,11 @@ static struct test generic_tests[] = {
 .desc = "Print cpu map",
 .func = test__cpu_map_print,
 },
+ {
+ .desc = "Merge cpu map",
+ .func = test__cpu_map_merge,
+ },
+
 {
 .desc = "Probe SDT events",
 .func = test__sdt_event,
diff --git a/tools/perf/tests/code-reading.c b/tools/perf/tests/code-reading.c
index f5764a3890b96913fd097772fbc393dc6ee620bb..1f017e1b2a55ed73ad667bfcf08742e549cf7452 100644
--- a/tools/perf/tests/code-reading.c
+++ b/tools/perf/tests/code-reading.c
@@ -10,6 +10,7 @@
 #include
 #include
 #include
+#include
 #include "debug.h"
 #include "dso.h"
@@ -425,16 +426,16 @@ static int process_events(struct machine *machine, struct evlist *evlist,
 for (i = 0; i < evlist->core.nr_mmaps; i++) {
 md = &evlist->mmap[i];
- if (perf_mmap__read_init(md) < 0)
+ if (perf_mmap__read_init(&md->core) < 0)
 continue;
- while ((event = perf_mmap__read_event(md)) != NULL) {
+ while ((event = perf_mmap__read_event(&md->core)) != NULL) {
 ret = process_event(machine, evlist, event, state);
- perf_mmap__consume(md);
+ perf_mmap__consume(&md->core);
 if (ret < 0)
 return ret;
 }
- perf_mmap__read_done(md);
+ perf_mmap__read_done(&md->core);
 }
 return 0;
 }
diff --git a/tools/perf/tests/cpumap.c b/tools/perf/tests/cpumap.c
index 8a0d236202b05ba179199b2709080b949c5d52d3..4ac56741ac5fe6375fb4e0a1171b7bbd5a7f7c42 100644
--- a/tools/perf/tests/cpumap.c
+++ b/tools/perf/tests/cpumap.c
@@ -120,3 +120,19 @@ int test__cpu_map_print(struct test *test __maybe_unused, int subtest __maybe_un
 TEST_ASSERT_VAL("failed to convert map", cpu_map_print("1-10,12-20,22-30,32-40"));
 return 0;
 }
+
+int test__cpu_map_merge(struct test *test __maybe_unused, int subtest __maybe_unused)
+{
+ struct perf_cpu_map *a = perf_cpu_map__new("4,2,1");
+ struct perf_cpu_map *b = perf_cpu_map__new("4,5,7");
+ struct perf_cpu_map *c = perf_cpu_map__merge(a, b);
+ char buf[100];
+
+ TEST_ASSERT_VAL("failed to merge map: bad nr", c->nr == 5);
+ cpu_map__snprint(c, buf, sizeof(buf));
+ TEST_ASSERT_VAL("failed to merge map: bad result", !strcmp(buf, "1-2,4-5,7"));
+ perf_cpu_map__put(a);
+ perf_cpu_map__put(b);
+ perf_cpu_map__put(c);
+ return 0;
+}
diff --git a/tools/perf/tests/event-times.c b/tools/perf/tests/event-times.c
index 1ee8704e22849726dd30a0aa74642515fd07f059..1e8a9f5c356dd623226c5fb7dee5e4f30b002b6f 100644
--- a/tools/perf/tests/event-times.c
+++ b/tools/perf/tests/event-times.c
@@ -125,7 +125,7 @@ static int attach__cpu_disabled(struct evlist *evlist)
 evsel->core.attr.disabled = 1;
- err = perf_evsel__open_per_cpu(evsel, cpus);
+ err = perf_evsel__open_per_cpu(evsel, cpus, -1);
 if (err) {
 if (err == -EACCES)
 return TEST_SKIP;
@@ -152,7 +152,7 @@ static int attach__cpu_enabled(struct evlist *evlist)
 return -1;
 }
- err = perf_evsel__open_per_cpu(evsel, cpus);
+ err = perf_evsel__open_per_cpu(evsel, cpus, -1);
 if (err == -EACCES)
 return TEST_SKIP;
diff --git a/tools/perf/tests/evsel-tp-sched.c b/tools/perf/tests/evsel-tp-sched.c
index 261e6eaaee99d6394a8a2a3ed6d079f755af29ca..3805e5a74e17f762eda39f55925d8b03e699ff75 100644
--- a/tools/perf/tests/evsel-tp-sched.c
+++ b/tools/perf/tests/evsel-tp-sched.c
@@ -35,11 +35,11 @@ static int perf_evsel__test_field(struct evsel *evsel, const char *name,
 int test__perf_evsel__tp_sched_test(struct test *test __maybe_unused, int subtest __maybe_unused)
 {
- struct evsel *evsel = perf_evsel__newtp("sched", "sched_switch");
+ struct evsel *evsel = evsel__newtp("sched", "sched_switch");
 int ret = 0;
 if (IS_ERR(evsel)) {
- pr_debug("perf_evsel__newtp failed with %ld\n", PTR_ERR(evsel));
+ pr_debug("evsel__newtp failed with %ld\n", PTR_ERR(evsel));
 return -1;
 }
@@ -66,10 +66,10 @@ int test__perf_evsel__tp_sched_test(struct test *test __maybe_unused, int subtes
 evsel__delete(evsel);
- evsel = perf_evsel__newtp("sched", "sched_wakeup");
+ evsel = evsel__newtp("sched", "sched_wakeup");
 if (IS_ERR(evsel)) {
- pr_debug("perf_evsel__newtp failed with %ld\n", PTR_ERR(evsel));
+ pr_debug("evsel__newtp failed with %ld\n", PTR_ERR(evsel));
 return -1;
 }
diff --git a/tools/perf/tests/hists_cumulate.c b/tools/perf/tests/hists_cumulate.c
index 6367c8f6ca22f80cd340b0668c2a5d1fe476cbd0..7a542f1c1c78477c1715cb6b91010298d3248ab5 100644
--- a/tools/perf/tests/hists_cumulate.c
+++ b/tools/perf/tests/hists_cumulate.c
@@ -280,7 +280,7 @@ static int test1(struct evsel *evsel, struct machine *machine)
 symbol_conf.use_callchain = false;
 symbol_conf.cumulate_callchain = false;
- perf_evsel__reset_sample_bit(evsel, CALLCHAIN);
+ evsel__reset_sample_bit(evsel, CALLCHAIN);
 setup_sorting(NULL);
 callchain_register_param(&callchain_param);
@@ -427,7 +427,7 @@ static int test2(struct evsel *evsel, struct machine *machine)
 symbol_conf.use_callchain = true;
 symbol_conf.cumulate_callchain = false;
- perf_evsel__set_sample_bit(evsel, CALLCHAIN);
+ evsel__set_sample_bit(evsel, CALLCHAIN);
 setup_sorting(NULL);
 callchain_register_param(&callchain_param);
@@ -485,7 +485,7 @@ static int test3(struct evsel *evsel, struct machine *machine)
 symbol_conf.use_callchain = false;
 symbol_conf.cumulate_callchain = true;
- perf_evsel__reset_sample_bit(evsel, CALLCHAIN);
+ evsel__reset_sample_bit(evsel, CALLCHAIN);
 setup_sorting(NULL);
 callchain_register_param(&callchain_param);
@@ -669,7 +669,7 @@ static int test4(struct evsel *evsel, struct machine *machine)
 symbol_conf.use_callchain = true;
 symbol_conf.cumulate_callchain = true;
- perf_evsel__set_sample_bit(evsel, CALLCHAIN);
+ evsel__set_sample_bit(evsel, CALLCHAIN);
 setup_sorting(NULL);
diff --git a/tools/perf/tests/keep-tracking.c b/tools/perf/tests/keep-tracking.c
index 92c7d591bcacca3d717db7b48b0a8cfbde75c0c1..50a0c9fcde7da3c79e3a80706b5e08fa451f8b08 100644
--- a/tools/perf/tests/keep-tracking.c
+++ b/tools/perf/tests/keep-tracking.c
@@ -5,6 +5,7 @@
 #include
 #include
 #include
+#include
 #include "debug.h"
 #include "parse-events.h"
@@ -38,17 +39,17 @@ static int find_comm(struct evlist *evlist, const char *comm)
 found = 0;
 for (i = 0; i < evlist->core.nr_mmaps; i++) {
 md = &evlist->mmap[i];
- if (perf_mmap__read_init(md) < 0)
+ if (perf_mmap__read_init(&md->core) < 0)
 continue;
- while ((event = perf_mmap__read_event(md)) != NULL) {
+ while ((event = perf_mmap__read_event(&md->core)) != NULL) {
 if (event->header.type == PERF_RECORD_COMM &&
 (pid_t)event->comm.pid == getpid() &&
 (pid_t)event->comm.tid == getpid() &&
 strcmp(event->comm.comm, comm) == 0)
 found += 1;
- perf_mmap__consume(md);
+ perf_mmap__consume(&md->core);
 }
- perf_mmap__read_done(md);
+ perf_mmap__read_done(&md->core);
 }
 return found;
 }
diff --git a/tools/perf/tests/mmap-basic.c b/tools/perf/tests/mmap-basic.c
index 3a22dce991ba94c2ecc7499a9f9de9d05570ac6b..3fcd6e56ecb3eba82cdcb576ead2ad4f3d6850a0 100644
--- a/tools/perf/tests/mmap-basic.c
+++ b/tools/perf/tests/mmap-basic.c
@@ -16,6 +16,7 @@
 #include
 #include
 #include
+#include
 /*
 * This test will generate random numbers of calls to some getpid syscalls,
@@ -78,14 +79,14 @@ int test__basic_mmap(struct test *test __maybe_unused, int subtest __maybe_unuse
 char name[64];
 snprintf(name, sizeof(name), "sys_enter_%s", syscall_names[i]);
- evsels[i] = perf_evsel__newtp("syscalls", name);
+ evsels[i] = evsel__newtp("syscalls", name);
 if (IS_ERR(evsels[i])) {
- pr_debug("perf_evsel__new(%s)\n", name);
+ pr_debug("evsel__new(%s)\n", name);
 goto out_delete_evlist;
 }
 evsels[i]->core.attr.wakeup_events = 1;
- perf_evsel__set_sample_id(evsels[i], false);
+ evsel__set_sample_id(evsels[i], false);
 evlist__add(evlist, evsels[i]);
@@ -113,10 +114,10 @@ int test__basic_mmap(struct test *test __maybe_unused, int subtest __maybe_unuse
 }
 md = &evlist->mmap[0];
- if (perf_mmap__read_init(md) < 0)
+ if (perf_mmap__read_init(&md->core) < 0)
 goto out_init;
- while ((event = perf_mmap__read_event(md)) != NULL) {
+ while ((event = perf_mmap__read_event(&md->core)) != NULL) {
 struct perf_sample sample;
 if (event->header.type != PERF_RECORD_SAMPLE) {
@@ -139,9 +140,9 @@ int test__basic_mmap(struct test *test __maybe_unused, int subtest __maybe_unuse
 goto out_delete_evlist;
 }
 nr_events[evsel->idx]++;
- perf_mmap__consume(md);
+ perf_mmap__consume(&md->core);
 }
- perf_mmap__read_done(md);
+ perf_mmap__read_done(&md->core);
 out_init:
 err = 0;
diff --git a/tools/perf/tests/openat-syscall-all-cpus.c b/tools/perf/tests/openat-syscall-all-cpus.c
index 93c176523e385d8a97a837ab8bc76fcbda5479e1..e6013b99d2c7d42f12056a016adfe115b6a8200b 100644
--- a/tools/perf/tests/openat-syscall-all-cpus.c
+++ b/tools/perf/tests/openat-syscall-all-cpus.c
@@ -44,7 +44,7 @@ int test__openat_syscall_event_on_all_cpus(struct test *test __maybe_unused, int
 CPU_ZERO(&cpu_set);
- evsel = perf_evsel__newtp("syscalls", "sys_enter_openat");
+ evsel = evsel__newtp("syscalls", "sys_enter_openat");
 if (IS_ERR(evsel)) {
 tracing_path__strerror_open_tp(errno, errbuf, sizeof(errbuf), "syscalls", "sys_enter_openat");
 pr_debug("%s\n", errbuf);
diff --git a/tools/perf/tests/openat-syscall-tp-fields.c b/tools/perf/tests/openat-syscall-tp-fields.c
index 2b5c468130537417dceaae8d703d358194897182..d6b3ddf0c9ffe58ba0ebc5fd50dfffa61b7eb5f7 100644
--- a/tools/perf/tests/openat-syscall-tp-fields.c
+++ b/tools/perf/tests/openat-syscall-tp-fields.c
@@ -13,6 +13,7 @@
 #include "debug.h"
 #include "util/mmap.h"
 #include
+#include
 #ifndef O_DIRECTORY
 #define O_DIRECTORY 00200000
@@ -45,9 +46,9 @@ int test__syscall_openat_tp_fields(struct test *test __maybe_unused, int subtest
 goto out;
 }
- evsel = perf_evsel__newtp("syscalls", "sys_enter_openat");
+ evsel = evsel__newtp("syscalls", "sys_enter_openat");
 if (IS_ERR(evsel)) {
- pr_debug("%s: perf_evsel__newtp\n", __func__);
+ pr_debug("%s: evsel__newtp\n", __func__);
 goto out_delete_evlist;
 }
@@ -92,10 +93,10 @@ int test__syscall_openat_tp_fields(struct test *test __maybe_unused, int subtest
 struct mmap *md;
 md = &evlist->mmap[i];
- if (perf_mmap__read_init(md) < 0)
+ if (perf_mmap__read_init(&md->core) < 0)
 continue;
- while ((event = perf_mmap__read_event(md)) != NULL) {
+ while ((event = perf_mmap__read_event(&md->core)) != NULL) {
 const u32 type = event->header.type;
 int tp_flags;
 struct perf_sample sample;
@@ -103,7 +104,7 @@ int test__syscall_openat_tp_fields(struct test *test __maybe_unused, int subtest
 ++nr_events;
 if (type != PERF_RECORD_SAMPLE) {
- perf_mmap__consume(md);
+ perf_mmap__consume(&md->core);
 continue;
 }
@@ -123,7 +124,7 @@ int test__syscall_openat_tp_fields(struct test *test __maybe_unused, int subtest
 goto out_ok;
 }
- perf_mmap__read_done(md);
+ perf_mmap__read_done(&md->core);
 }
 if (nr_events == before)
diff --git a/tools/perf/tests/openat-syscall.c b/tools/perf/tests/openat-syscall.c
index 5ebffae186051f8213a408890d9787ebfb9c39a0..7e9e57a1275b8a782cc5e6d62f2110073428c786 100644
--- a/tools/perf/tests/openat-syscall.c
+++ b/tools/perf/tests/openat-syscall.c
@@ -27,7 +27,7 @@ int test__openat_syscall_event(struct test *test __maybe_unused, int subtest __m
 return -1;
 }
- evsel = perf_evsel__newtp("syscalls", "sys_enter_openat");
+ evsel = evsel__newtp("syscalls", "sys_enter_openat");
 if (IS_ERR(evsel)) {
 tracing_path__strerror_open_tp(errno, errbuf, sizeof(errbuf), "syscalls", "sys_enter_openat");
 pr_debug("%s\n", errbuf);
diff --git a/tools/perf/tests/perf-record.c b/tools/perf/tests/perf-record.c
index 437426be29e99db7f9d33cc2bce3651f9df73ef8..83adfd846ccda5c1b83ca6563fda4c4bb65506d1 100644
--- a/tools/perf/tests/perf-record.c
+++ b/tools/perf/tests/perf-record.c
@@ -6,6 +6,7 @@
 #include
 #include
+#include
 #include "evlist.h"
 #include "evsel.h"
 #include "debug.h"
@@ -105,9 +106,9 @@ int test__PERF_RECORD(struct test *test __maybe_unused, int subtest __maybe_unus
 * Config the evsels, setting attr->comm on the first one, etc.
 */
 evsel = evlist__first(evlist);
- perf_evsel__set_sample_bit(evsel, CPU);
- perf_evsel__set_sample_bit(evsel, TID);
- perf_evsel__set_sample_bit(evsel, TIME);
+ evsel__set_sample_bit(evsel, CPU);
+ evsel__set_sample_bit(evsel, TID);
+ evsel__set_sample_bit(evsel, TIME);
 perf_evlist__config(evlist, &opts, NULL);
 err = sched__get_first_possible_cpu(evlist->workload.pid, &cpu_mask);
@@ -170,10 +171,10 @@ int test__PERF_RECORD(struct test *test __maybe_unused, int subtest __maybe_unus
 struct mmap *md;
 md = &evlist->mmap[i];
- if (perf_mmap__read_init(md) < 0)
+ if (perf_mmap__read_init(&md->core) < 0)
 continue;
- while ((event = perf_mmap__read_event(md)) != NULL) {
+ while ((event = perf_mmap__read_event(&md->core)) != NULL) {
 const u32 type = event->header.type;
 const char *name = perf_event__name(type);
@@ -276,9 +277,9 @@ int test__PERF_RECORD(struct test *test __maybe_unused, int subtest __maybe_unus
 ++errs;
 }
- perf_mmap__consume(md);
+ perf_mmap__consume(&md->core);
 }
- perf_mmap__read_done(md);
+ perf_mmap__read_done(&md->core);
 }
 /*
diff --git a/tools/perf/tests/sample-parsing.c b/tools/perf/tests/sample-parsing.c
index 2a01dd1d725c4f1f143124ed1e34f82dae016a1d..31cca805dafffcbba60569c0f40a36a4f811941d 100644
--- a/tools/perf/tests/sample-parsing.c
+++ b/tools/perf/tests/sample-parsing.c
@@ -99,6 +99,7 @@ static bool samples_same(const struct perf_sample *s1,
 if (type & PERF_SAMPLE_BRANCH_STACK) {
 COMP(branch_stack->nr);
+ COMP(branch_stack->hw_idx);
 for (i = 0; i < s1->branch_stack->nr; i++)
 MCOMP(branch_stack->entries[i]);
 }
@@ -150,6 +151,9 @@ static bool samples_same(const struct perf_sample *s1,
 if (type & PERF_SAMPLE_PHYS_ADDR)
 COMP(phys_addr);
+ if (type & PERF_SAMPLE_CGROUP)
+ COMP(cgroup);
+
 if (type & PERF_SAMPLE_AUX) {
 COMP(aux_sample.size);
 if (memcmp(s1->aux_sample.data, s2->aux_sample.data,
@@ -186,7 +190,7 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format)
 u64 data[64];
 } branch_stack = {
 /* 1 branch_entry */
- .data = {1, 211, 212, 213},
+ .data = {1, -1ULL, 211, 212, 213},
 };
 u64 regs[64];
 const u32 raw_data[] = {0x12345678, 0x0a0b0c0d, 0x11020304, 0x05060708, 0 };
@@ -208,6 +212,7 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format)
 .transaction = 112,
 .raw_data = (void *)raw_data,
 .callchain = &callchain.callchain,
+ .no_hw_idx = false,
 .branch_stack = &branch_stack.branch_stack,
 .user_regs = {
 .abi = PERF_SAMPLE_REGS_ABI_64,
@@ -228,6 +233,7 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format)
 .regs = regs,
 },
 .phys_addr = 113,
+ .cgroup = 114,
 .aux_sample = {
 .size = sizeof(aux_data),
 .data = (void *)aux_data,
@@ -244,6 +250,9 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format)
 if (sample_type & PERF_SAMPLE_REGS_INTR)
 evsel.core.attr.sample_regs_intr = sample_regs;
+ if (sample_type & PERF_SAMPLE_BRANCH_STACK)
+ evsel.core.attr.branch_sample_type |= PERF_SAMPLE_BRANCH_HW_INDEX;
+
 for (i = 0; i < sizeof(regs); i++)
 *(i + (u8 *)regs) = i & 0xfe;
@@ -331,7 +340,7 @@ int test__sample_parsing(struct test *test __maybe_unused, int subtest __maybe_u
 * were added. Please actually update the test rather than just change
 * the condition below.
 */
- if (PERF_SAMPLE_MAX > PERF_SAMPLE_AUX << 1) {
+ if (PERF_SAMPLE_MAX > PERF_SAMPLE_CGROUP << 1) {
 pr_debug("sample format has changed, some new PERF_SAMPLE_ bit was introduced - test needs updating\n");
 return -1;
 }
diff --git a/tools/perf/tests/sw-clock.c b/tools/perf/tests/sw-clock.c
index 84519df87f309ecf466acdf54be13b04dac9bf03..4b9b731977c8c261c6aa396a16a9bab4024ed3af 100644
--- a/tools/perf/tests/sw-clock.c
+++ b/tools/perf/tests/sw-clock.c
@@ -15,6 +15,7 @@
 #include "util/mmap.h"
 #include "util/thread_map.h"
 #include
+#include
 #define NR_LOOPS 10000000
@@ -55,7 +56,7 @@ static int __test__sw_clock_freq(enum perf_sw_ids clock_id)
 evsel = evsel__new(&attr);
 if (evsel == NULL) {
- pr_debug("perf_evsel__new\n");
+ pr_debug("evsel__new\n");
 goto out_delete_evlist;
 }
 evlist__add(evlist, evsel);
@@ -99,10 +100,10 @@ static int __test__sw_clock_freq(enum perf_sw_ids clock_id)
 evlist__disable(evlist);
 md = &evlist->mmap[0];
- if (perf_mmap__read_init(md) < 0)
+ if (perf_mmap__read_init(&md->core) < 0)
 goto out_init;
- while ((event = perf_mmap__read_event(md)) != NULL) {
+ while ((event = perf_mmap__read_event(&md->core)) != NULL) {
 struct perf_sample sample;
 if (event->header.type != PERF_RECORD_SAMPLE)
@@ -117,9 +118,9 @@ static int __test__sw_clock_freq(enum perf_sw_ids clock_id)
 total_periods += sample.period;
 nr_samples++;
 next_event:
- perf_mmap__consume(md);
+ perf_mmap__consume(&md->core);
 }
- perf_mmap__read_done(md);
+ perf_mmap__read_done(&md->core);
 out_init:
 if ((u64) nr_samples == total_periods) {
diff --git a/tools/perf/tests/switch-tracking.c b/tools/perf/tests/switch-tracking.c
index ffa592e0020eeda9963a27ab04f0c9c68e4144fe..b08c8a0898d33d58dfc5baff83c2b28d8d4b364c 100644
--- a/tools/perf/tests/switch-tracking.c
+++ b/tools/perf/tests/switch-tracking.c
@@ -8,6 +8,7 @@
 #include
 #include
 #include
+#include
 #include "debug.h"
 #include "parse-events.h"
@@ -269,17 +270,17 @@ static int process_events(struct evlist *evlist,
 for (i = 0; i < evlist->core.nr_mmaps; i++) {
 md = &evlist->mmap[i];
- if (perf_mmap__read_init(md) < 0)
+ if (perf_mmap__read_init(&md->core) < 0)
 continue;
- while ((event = perf_mmap__read_event(md)) != NULL) {
+ while ((event = perf_mmap__read_event(&md->core)) != NULL) {
 cnt += 1;
 ret = add_event(evlist, &events, event);
- perf_mmap__consume(md);
+ perf_mmap__consume(&md->core);
 if (ret < 0)
 goto out_free_nodes;
 }
- perf_mmap__read_done(md);
+ perf_mmap__read_done(&md->core);
 }
 events_array = calloc(cnt, sizeof(struct event_node));
@@ -393,8 +394,8 @@ int test__switch_tracking(struct test *test __maybe_unused, int subtest __maybe_
 switch_evsel = evlist__last(evlist);
- perf_evsel__set_sample_bit(switch_evsel, CPU);
- perf_evsel__set_sample_bit(switch_evsel, TIME);
+ evsel__set_sample_bit(switch_evsel, CPU);
+ evsel__set_sample_bit(switch_evsel, TIME);
 switch_evsel->core.system_wide = true;
 switch_evsel->no_aux_samples = true;
@@ -411,8 +412,8 @@ int test__switch_tracking(struct test *test __maybe_unused, int subtest __maybe_
 goto out_err;
 }
- perf_evsel__set_sample_bit(cycles_evsel, CPU);
- perf_evsel__set_sample_bit(cycles_evsel, TIME);
+ evsel__set_sample_bit(cycles_evsel, CPU);
+ evsel__set_sample_bit(cycles_evsel, TIME);
 /* Fourth event */
 err = parse_events(evlist, "dummy:u", NULL);
@@ -428,7 +429,7 @@ int test__switch_tracking(struct test *test __maybe_unused, int subtest __maybe_
 tracking_evsel->core.attr.freq = 0;
 tracking_evsel->core.attr.sample_period = 1;
- perf_evsel__set_sample_bit(tracking_evsel, TIME);
+ evsel__set_sample_bit(tracking_evsel, TIME);
 /* Config events */
 perf_evlist__config(evlist, &opts, NULL);
diff --git a/tools/perf/tests/task-exit.c b/tools/perf/tests/task-exit.c
index d85c9f608564cf2d6fe0be75020fb63cf282977f..adaff904433199b046dca9d6d451fabe48efe201 100644
--- a/tools/perf/tests/task-exit.c
+++ b/tools/perf/tests/task-exit.c
@@ -12,6 +12,7 @@
 #include
 #include
 #include
+#include
 static int exited;
 static int nr_exit;
@@ -119,16 +120,16 @@ int test__task_exit(struct test *test __maybe_unused, int subtest __maybe_unused
 retry:
 md = &evlist->mmap[0];
- if (perf_mmap__read_init(md) < 0)
+ if (perf_mmap__read_init(&md->core) < 0)
 goto out_init;
- while ((event = perf_mmap__read_event(md)) != NULL) {
+ while ((event = perf_mmap__read_event(&md->core)) != NULL) {
 if (event->header.type == PERF_RECORD_EXIT)
 nr_exit++;
- perf_mmap__consume(md);
+ perf_mmap__consume(&md->core);
 }
- perf_mmap__read_done(md);
+ perf_mmap__read_done(&md->core);
 out_init:
 if (!exited || !nr_exit) {
diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
index 72912eb473cb14c3f563bf7750bac4085daa6725..27e3c65c6955b9e78ef735b1429b9ad33ee3bc3d 100644
--- a/tools/perf/tests/tests.h
+++ b/tools/perf/tests/tests.h
@@ -98,6 +98,7 @@ int test__event_update(struct test *test, int subtest);
 int test__event_times(struct test *test, int subtest);
 int test__backward_ring_buffer(struct test *test, int subtest);
 int test__cpu_map_print(struct test *test, int subtest);
+int test__cpu_map_merge(struct test *test, int subtest);
 int test__sdt_event(struct test *test, int subtest);
 int test__is_printable_array(struct test *test, int subtest);
 int test__bitmap_print(struct test *test, int subtest);
diff --git a/tools/perf/tests/topology.c b/tools/perf/tests/topology.c
index f4a2c0df09549e4cccf1f478ac8bd1fccc38043c..c8af1310bd7c7b7073bfdae9d3c80aaf2399eaa8 100644
--- a/tools/perf/tests/topology.c
+++ b/tools/perf/tests/topology.c
@@ -37,7 +37,7 @@ static int session_write_header(char *path)
 .mode = PERF_DATA_MODE_WRITE,
 };
- session = perf_session__new(&data, false, NULL);
+ session = perf_session__new(&data, NULL);
 TEST_ASSERT_VAL("can't get session", !IS_ERR(session));
 session->evlist = perf_evlist__new_default();
@@ -67,7 +67,7 @@ static int check_cpu_topology(char *path, struct perf_cpu_map *map)
 };
 int i;
- session = perf_session__new(&data, false, NULL);
+ session = perf_session__new(&data, NULL);
 TEST_ASSERT_VAL("can't get session", !IS_ERR(session));
 /* On platforms with large numbers of CPUs process_cpu_topology()
diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 013ec02ad872a2e23b8f732213b45a35d1e2e6de..df4219f7615a527716c1a70196349d24e62a2634 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -9,6 +9,8 @@ perf-y += db-export.o
 perf-y += env.o
 perf-y += event.o
 perf-y += evlist.o
+perf-y += evlist-hybrid.o
+perf-y += sideband_evlist.o
 perf-y += evsel.o
 perf-y += evsel_fprintf.o
 perf-y += perf_event_attr_fprintf.o
@@ -48,12 +50,14 @@ perf-y += header.o
 perf-y += callchain.o
 perf-y += values.o
 perf-y += debug.o
+perf-y += fncache.o
 perf-y += machine.o
 perf-y += map.o
 perf-y += pstack.o
 perf-y += session.o
 perf-y += sample-raw.o
 perf-y += s390-sample-raw.o
+perf-y += amd-sample-raw.o
 perf-$(CONFIG_TRACE) += syscalltbl.o
 perf-y += ordered-events.o
 perf-y += namespaces.o
@@ -66,6 +70,7 @@ perf-y += parse-events-bison.o
 perf-y += pmu.o
 perf-y += pmu-flex.o
 perf-y += pmu-bison.o
+perf-y += pmu-hybrid.o
 perf-y += trace-event-read.o
 perf-y += trace-event-info.o
 perf-y += trace-event-scripting.o
@@ -75,6 +80,7 @@ perf-y += sort.o
 perf-y += hist.o
 perf-y += util.o
 perf-y += cpumap.o
+perf-y += affinity.o
 perf-y += cputopo.o
 perf-y += cgroup.o
 perf-y += target.o
@@ -85,6 +91,7 @@ perf-y += counts.o
 perf-y += stat.o
 perf-y += stat-shadow.o
 perf-y += stat-display.o
+perf-y += perf_api_probe.o
 perf-y += record.o
 perf-y += srcline.o
 perf-y += srccode.o
@@ -122,6 +129,7 @@ perf-y += time-utils.o
 perf-y += expr-bison.o
 perf-y += branch.o
 perf-y += mem2node.o
+perf-y += clockid.o
 perf-$(CONFIG_LIBBPF) += bpf-loader.o
 perf-$(CONFIG_LIBBPF) += bpf_map.o
@@ -149,6 +157,7 @@ perf-$(CONFIG_LIBUNWIND_X86) += libunwind/x86_32.o
 perf-$(CONFIG_LIBUNWIND_AARCH64) += libunwind/arm64.o
 perf-$(CONFIG_LIBBABELTRACE) += data-convert-bt.o
+perf-y += data-convert-json.o
 perf-y += scripting-engines/
diff --git a/tools/perf/util/affinity.c b/tools/perf/util/affinity.c
new file mode 100644
index 0000000000000000000000000000000000000000..a5e31f82682804c3f6c837bc82e4197a6e649e4e
--- /dev/null
+++ b/tools/perf/util/affinity.c
@@ -0,0 +1,73 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Manage affinity to optimize IPIs inside the kernel perf API. */
+#define _GNU_SOURCE 1
+#include
+#include
+#include
+#include
+#include "perf.h"
+#include "cpumap.h"
+#include "affinity.h"
+
+static int get_cpu_set_size(void)
+{
+ int sz = cpu__max_cpu() + 8 - 1;
+ /*
+ * sched_getaffinity doesn't like masks smaller than the kernel.
+ * Hopefully that's big enough.
+ */
+ if (sz < 4096)
+ sz = 4096;
+ return sz / 8;
+}
+
+int affinity__setup(struct affinity *a)
+{
+ int cpu_set_size = get_cpu_set_size();
+
+ a->orig_cpus = bitmap_alloc(cpu_set_size * 8);
+ if (!a->orig_cpus)
+ return -1;
+ sched_getaffinity(0, cpu_set_size, (cpu_set_t *)a->orig_cpus);
+ a->sched_cpus = bitmap_alloc(cpu_set_size * 8);
+ if (!a->sched_cpus) {
+ zfree(&a->orig_cpus);
+ return -1;
+ }
+ bitmap_zero((unsigned long *)a->sched_cpus, cpu_set_size);
+ a->changed = false;
+ return 0;
+}
+
+/*
+ * perf_event_open does an IPI internally to the target CPU.
+ * It is more efficient to change perf's affinity to the target
+ * CPU and then set up all events on that CPU, so we amortize
+ * CPU communication.
+ */
+void affinity__set(struct affinity *a, int cpu)
+{
+ int cpu_set_size = get_cpu_set_size();
+
+ if (cpu == -1)
+ return;
+ a->changed = true;
+ set_bit(cpu, a->sched_cpus);
+ /*
+ * We ignore errors because affinity is just an optimization.
+ * This could happen for example with isolated CPUs or cpusets.
+ * In this case the IPIs inside the kernel's perf API still work.
+ */
+ sched_setaffinity(0, cpu_set_size, (cpu_set_t *)a->sched_cpus);
+ clear_bit(cpu, a->sched_cpus);
+}
+
+void affinity__cleanup(struct affinity *a)
+{
+ int cpu_set_size = get_cpu_set_size();
+
+ if (a->changed)
+ sched_setaffinity(0, cpu_set_size, (cpu_set_t *)a->orig_cpus);
+ zfree(&a->sched_cpus);
+ zfree(&a->orig_cpus);
+}
diff --git a/tools/perf/util/affinity.h b/tools/perf/util/affinity.h
new file mode 100644
index 0000000000000000000000000000000000000000..0ad6a18ef20c6cfce47411a6cab976cba40c4963
--- /dev/null
+++ b/tools/perf/util/affinity.h
@@ -0,0 +1,17 @@
+// SPDX-License-Identifier: GPL-2.0
+#ifndef PERF_AFFINITY_H
+#define PERF_AFFINITY_H 1
+
+#include
+
+struct affinity {
+ unsigned long *orig_cpus;
+ unsigned long *sched_cpus;
+ bool changed;
+};
+
+void affinity__cleanup(struct affinity *a);
+void affinity__set(struct affinity *a, int cpu);
+int affinity__setup(struct affinity *a);
+
+#endif // PERF_AFFINITY_H
diff --git a/tools/perf/util/amd-sample-raw.c b/tools/perf/util/amd-sample-raw.c
new file mode 100644
index 0000000000000000000000000000000000000000..38b4be42d0a98837e1903c795ce2e2b040c8da4e
--- /dev/null
+++ b/tools/perf/util/amd-sample-raw.c
@@ -0,0 +1,341 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * AMD specific. Provide textual annotation for IBS raw sample data.
+ */
+
+#include
+#include
+#include
+#include
+
+#include
+#include "../../arch/x86/include/asm/amd-ibs.h"
+
+#include "debug.h"
+#include "session.h"
+#include "evlist.h"
+#include "sample-raw.h"
+#include "pmu-events/pmu-events.h"
+
+static u32 cpu_family, cpu_model, ibs_fetch_type, ibs_op_type;
+static bool zen4_ibs_extensions;
+
+static void pr_ibs_fetch_ctl(union ibs_fetch_ctl reg)
+{
+ const char * const ic_miss_strs[] = {
+ " IcMiss 0",
+ " IcMiss 1",
+ };
+ const char * const l1tlb_pgsz_strs[] = {
+ " L1TlbPgSz 4KB",
+ " L1TlbPgSz 2MB",
+ " L1TlbPgSz 1GB",
+ " L1TlbPgSz RESERVED"
+ };
+ const char * const l1tlb_pgsz_strs_erratum1347[] = {
+ " L1TlbPgSz 4KB",
+ " L1TlbPgSz 16KB",
+ " L1TlbPgSz 2MB",
+ " L1TlbPgSz 1GB"
+ };
+ const char *ic_miss_str = NULL;
+ const char *l1tlb_pgsz_str = NULL;
+ char l3_miss_str[sizeof(" L3MissOnly _ FetchOcMiss _ FetchL3Miss _")] = "";
+
+ if (cpu_family == 0x19 && cpu_model < 0x10) {
+ /*
+ * Erratum #1238 workaround is to ignore MSRC001_1030[IbsIcMiss]
+ * Erratum #1347 workaround is to use table provided in erratum
+ */
+ if (reg.phy_addr_valid)
+ l1tlb_pgsz_str = l1tlb_pgsz_strs_erratum1347[reg.l1tlb_pgsz];
+ } else {
+ if (reg.phy_addr_valid)
+ l1tlb_pgsz_str = l1tlb_pgsz_strs[reg.l1tlb_pgsz];
+ ic_miss_str = ic_miss_strs[reg.ic_miss];
+ }
+
+ if (zen4_ibs_extensions) {
+ snprintf(l3_miss_str, sizeof(l3_miss_str),
+ " L3MissOnly %d FetchOcMiss %d FetchL3Miss %d",
+ reg.l3_miss_only, reg.fetch_oc_miss, reg.fetch_l3_miss);
+ }
+
+ printf("ibs_fetch_ctl:\t%016llx MaxCnt %7d Cnt %7d Lat %5d En %d Val %d Comp %d%s "
+ "PhyAddrValid %d%s L1TlbMiss %d L2TlbMiss %d RandEn %d%s%s\n",
+ reg.val, reg.fetch_maxcnt << 4, reg.fetch_cnt << 4, reg.fetch_lat,
+ reg.fetch_en, reg.fetch_val, reg.fetch_comp, ic_miss_str ? : "",
+ reg.phy_addr_valid, l1tlb_pgsz_str ? : "", reg.l1tlb_miss, reg.l2tlb_miss,
+ reg.rand_en, reg.fetch_comp ? (reg.fetch_l2_miss ? " L2Miss 1" : " L2Miss 0") : "",
+ l3_miss_str);
+}
+
+static void pr_ic_ibs_extd_ctl(union ic_ibs_extd_ctl reg)
+{
+ printf("ic_ibs_ext_ctl:\t%016llx IbsItlbRefillLat %3d\n", reg.val, reg.itlb_refill_lat);
+}
+
+static void pr_ibs_op_ctl(union ibs_op_ctl reg)
+{
+ char l3_miss_only[sizeof(" L3MissOnly _")] = "";
+
+ if (zen4_ibs_extensions)
+ snprintf(l3_miss_only, sizeof(l3_miss_only), " L3MissOnly %d", reg.l3_miss_only);
+
+ printf("ibs_op_ctl:\t%016llx MaxCnt %9d%s En %d Val %d CntCtl %d=%s CurCnt %9d\n",
+ reg.val, ((reg.opmaxcnt_ext << 16) | reg.opmaxcnt) << 4, l3_miss_only,
+ reg.op_en, reg.op_val, reg.cnt_ctl,
+ reg.cnt_ctl ? "uOps" : "cycles", reg.opcurcnt);
+}
+
+static void pr_ibs_op_data(union ibs_op_data reg)
+{
+ printf("ibs_op_data:\t%016llx CompToRetCtr %5d TagToRetCtr %5d%s%s%s BrnRet %d "
+ " RipInvalid %d BrnFuse %d Microcode %d\n",
+ reg.val, reg.comp_to_ret_ctr, reg.tag_to_ret_ctr,
+ reg.op_brn_ret ? (reg.op_return ? " OpReturn 1" : " OpReturn 0") : "",
+ reg.op_brn_ret ? (reg.op_brn_taken ? " OpBrnTaken 1" : " OpBrnTaken 0") : "",
+ reg.op_brn_ret ? (reg.op_brn_misp ? " OpBrnMisp 1" : " OpBrnMisp 0") : "",
+ reg.op_brn_ret, reg.op_rip_invalid, reg.op_brn_fuse, reg.op_microcode);
+}
+
+static void pr_ibs_op_data2_extended(union ibs_op_data2 reg)
+{
+ static const char * const data_src_str[] = {
+ "",
+ " DataSrc 1=Local L3 or other L1/L2 in CCX",
+ " DataSrc 2=A peer cache in a near CCX",
+ " DataSrc 3=Data returned from DRAM",
+ " DataSrc 4=(reserved)",
+ " DataSrc 5=A peer cache in a far CCX",
+ " DataSrc 6=DRAM address map with \"long latency\" bit set",
+ " DataSrc 7=Data returned from MMIO/Config/PCI/APIC",
+ " DataSrc 8=Extension Memory (S-Link, GenZ, etc)",
+ " DataSrc 9=(reserved)",
+ " DataSrc 10=(reserved)",
+ " DataSrc 11=(reserved)",
+ " DataSrc 12=Peer Agent Memory",
+ /* 13 to 31 are reserved. Avoid printing them. */
+ };
+ int data_src = (reg.data_src_hi << 3) | reg.data_src_lo;
+
+ printf("ibs_op_data2:\t%016llx %sRmtNode %d%s\n", reg.val,
+ (data_src == 1 || data_src == 2 || data_src == 5) ?
+ (reg.cache_hit_st ? "CacheHitSt 1=O-State " : "CacheHitSt 0=M-state ") : "",
+ reg.rmt_node,
+ data_src < (int)ARRAY_SIZE(data_src_str) ? data_src_str[data_src] : "");
+}
+
+static void pr_ibs_op_data2_default(union ibs_op_data2 reg)
+{
+ static const char * const data_src_str[] = {
+ "",
+ " DataSrc 1=(reserved)",
+ " DataSrc 2=Local node cache",
+ " DataSrc 3=DRAM",
+ " DataSrc 4=Remote node cache",
+ " DataSrc 5=(reserved)",
+ " DataSrc 6=(reserved)",
+ " DataSrc 7=Other"
+ };
+
+ printf("ibs_op_data2:\t%016llx %sRmtNode %d%s\n", reg.val,
+ reg.data_src_lo == 2 ? (reg.cache_hit_st ? "CacheHitSt 1=O-State "
+ : "CacheHitSt 0=M-state ") : "",
+ reg.rmt_node, data_src_str[reg.data_src_lo]);
+}
+
+static void pr_ibs_op_data2(union ibs_op_data2 reg)
+{
+ if (zen4_ibs_extensions)
+ return pr_ibs_op_data2_extended(reg);
+ pr_ibs_op_data2_default(reg);
+}
+
+static void pr_ibs_op_data3(union ibs_op_data3 reg)
+{
+ char l2_miss_str[sizeof(" L2Miss _")] = "";
+ char op_mem_width_str[sizeof(" OpMemWidth _____ bytes")] = "";
+ char op_dc_miss_open_mem_reqs_str[sizeof(" OpDcMissOpenMemReqs __")] = "";
+
+ /*
+ * Erratum #1293
+ * Ignore L2Miss and OpDcMissOpenMemReqs (and opdata2) if DcMissNoMabAlloc or SwPf set
+ */
+ if (!(cpu_family == 0x19 && cpu_model < 0x10 && (reg.dc_miss_no_mab_alloc || reg.sw_pf))) {
+ snprintf(l2_miss_str, sizeof(l2_miss_str), " L2Miss %d", reg.l2_miss);
+ snprintf(op_dc_miss_open_mem_reqs_str, sizeof(op_dc_miss_open_mem_reqs_str),
+ " OpDcMissOpenMemReqs %2d", reg.op_dc_miss_open_mem_reqs);
+ }
+
+ if (reg.op_mem_width)
+ snprintf(op_mem_width_str, sizeof(op_mem_width_str),
+ " OpMemWidth %2d bytes", 1 << (reg.op_mem_width - 1));
+
+ printf("ibs_op_data3:\t%016llx LdOp %d StOp %d DcL1TlbMiss %d DcL2TlbMiss %d "
+ "DcL1TlbHit2M %d DcL1TlbHit1G %d DcL2TlbHit2M %d DcMiss %d DcMisAcc %d "
+ "DcWcMemAcc %d DcUcMemAcc %d DcLockedOp %d DcMissNoMabAlloc %d DcLinAddrValid %d "
+ "DcPhyAddrValid %d DcL2TlbHit1G %d%s SwPf %d%s%s DcMissLat %5d TlbRefillLat %5d\n",
+ reg.val, reg.ld_op, reg.st_op, reg.dc_l1tlb_miss, reg.dc_l2tlb_miss,
+ reg.dc_l1tlb_hit_2m, reg.dc_l1tlb_hit_1g, reg.dc_l2tlb_hit_2m, reg.dc_miss,
+ reg.dc_mis_acc, reg.dc_wc_mem_acc, reg.dc_uc_mem_acc, reg.dc_locked_op,
+ reg.dc_miss_no_mab_alloc, reg.dc_lin_addr_valid, reg.dc_phy_addr_valid,
+ reg.dc_l2_tlb_hit_1g, l2_miss_str, reg.sw_pf, op_mem_width_str,
+ op_dc_miss_open_mem_reqs_str, reg.dc_miss_lat, reg.tlb_refill_lat);
+}
+
+/*
+ * IBS Op/Execution MSRs always saved, in order, are:
+ * IBS_OP_CTL, IBS_OP_RIP, IBS_OP_DATA, IBS_OP_DATA2,
+ * IBS_OP_DATA3, IBS_DC_LINADDR, IBS_DC_PHYSADDR, BP_IBSTGT_RIP
+ */
+static void amd_dump_ibs_op(struct perf_sample *sample)
+{
+ struct perf_ibs_data *data = sample->raw_data;
+ union ibs_op_ctl *op_ctl = (union ibs_op_ctl *)data->data;
+ __u64 *rip = (__u64 *)op_ctl + 1;
+ union ibs_op_data *op_data = (union ibs_op_data *)(rip + 1);
+ union ibs_op_data3 *op_data3 = (union ibs_op_data3 *)(rip + 3);
+
+ pr_ibs_op_ctl(*op_ctl);
+ if (!op_data->op_rip_invalid)
+ printf("IbsOpRip:\t%016llx\n", *rip);
+ pr_ibs_op_data(*op_data);
+ /*
+ * Erratum #1293: ignore op_data2 if DcMissNoMabAlloc or SwPf are set
+ */
+ if (!(cpu_family == 0x19 && cpu_model < 0x10 &&
+ (op_data3->dc_miss_no_mab_alloc || op_data3->sw_pf)))
+ pr_ibs_op_data2(*(union ibs_op_data2 *)(rip + 2));
+ pr_ibs_op_data3(*op_data3);
+ if (op_data3->dc_lin_addr_valid)
+ printf("IbsDCLinAd:\t%016llx\n", *(rip + 4));
+ if (op_data3->dc_phy_addr_valid)
+ printf("IbsDCPhysAd:\t%016llx\n", *(rip + 5));
+ if (op_data->op_brn_ret && *(rip + 6))
+ printf("IbsBrTarget:\t%016llx\n", *(rip + 6));
+}
+
+/*
+ * IBS Fetch MSRs always saved, in order, are:
+ * IBS_FETCH_CTL, IBS_FETCH_LINADDR, IBS_FETCH_PHYSADDR, IC_IBS_EXTD_CTL
+ */
+static void amd_dump_ibs_fetch(struct perf_sample *sample)
+{
+ struct perf_ibs_data *data = sample->raw_data;
+ union ibs_fetch_ctl *fetch_ctl = (union ibs_fetch_ctl *)data->data;
+ __u64 *addr = (__u64 *)fetch_ctl + 1;
+ union ic_ibs_extd_ctl *extd_ctl = (union ic_ibs_extd_ctl *)addr + 2;
+
+ pr_ibs_fetch_ctl(*fetch_ctl);
+ printf("IbsFetchLinAd:\t%016llx\n", *addr++);
+ if (fetch_ctl->phy_addr_valid)
+ printf("IbsFetchPhysAd:\t%016llx\n", *addr);
+ pr_ic_ibs_extd_ctl(*extd_ctl);
+}
+
+/*
+ * Test for enable and valid bits in captured control MSRs.
+ */
+static bool is_valid_ibs_fetch_sample(struct perf_sample *sample)
+{
+ struct perf_ibs_data *data = sample->raw_data;
+ union ibs_fetch_ctl *fetch_ctl = (union ibs_fetch_ctl *)data->data;
+
+ if (fetch_ctl->fetch_en && fetch_ctl->fetch_val)
+ return true;
+
+ return false;
+}
+
+static bool is_valid_ibs_op_sample(struct perf_sample *sample)
+{
+ struct perf_ibs_data *data = sample->raw_data;
+ union ibs_op_ctl *op_ctl = (union ibs_op_ctl *)data->data;
+
+ if (op_ctl->op_en && op_ctl->op_val)
+ return true;
+
+ return false;
+}
+
+/* AMD vendor specific raw sample function. Check for PERF_RECORD_SAMPLE events
+ * and if the event was triggered by IBS, display its raw data with decoded text.
+ * The function is only invoked when the dump flag -D is set.
+ */ +void evlist__amd_sample_raw(struct evlist *evlist, union perf_event *event, + struct perf_sample *sample) +{ + struct evsel *evsel; + + if (event->header.type != PERF_RECORD_SAMPLE || !sample->raw_size) + return; + + evsel = perf_evlist__event2evsel(evlist, event); + if (!evsel) + return; + + if (evsel->core.attr.type == ibs_fetch_type) { + if (!is_valid_ibs_fetch_sample(sample)) { + pr_debug("Invalid raw IBS Fetch MSR data encountered\n"); + return; + } + amd_dump_ibs_fetch(sample); + } else if (evsel->core.attr.type == ibs_op_type) { + if (!is_valid_ibs_op_sample(sample)) { + pr_debug("Invalid raw IBS Op MSR data encountered\n"); + return; + } + amd_dump_ibs_op(sample); + } +} + +static void parse_cpuid(struct perf_env *env) +{ + const char *cpuid; + int ret; + + cpuid = perf_env__cpuid(env); + /* + * cpuid = "AuthenticAMD,family,model,stepping" + */ + ret = sscanf(cpuid, "%*[^,],%u,%u", &cpu_family, &cpu_model); + if (ret != 2) + pr_debug("problem parsing cpuid\n"); +} + +/* + * Find and assign the type number used for ibs_op or ibs_fetch samples. + * Device names can be large - we are only interested in the first 9 characters, + * to match "ibs_fetch". 
+ */ +bool evlist__has_amd_ibs(struct evlist *evlist) +{ + struct perf_env *env = evlist->env; + int ret, nr_pmu_mappings = perf_env__nr_pmu_mappings(env); + const char *pmu_mapping = perf_env__pmu_mappings(env); + char name[sizeof("ibs_fetch")]; + u32 type; + + while (nr_pmu_mappings--) { + ret = sscanf(pmu_mapping, "%u:%9s", &type, name); + if (ret == 2) { + if (strstarts(name, "ibs_op")) + ibs_op_type = type; + else if (strstarts(name, "ibs_fetch")) + ibs_fetch_type = type; + } + pmu_mapping += strlen(pmu_mapping) + 1 /* '\0' */; + } + + if (perf_env__find_pmu_cap(env, "ibs_op", "zen4_ibs_extensions")) + zen4_ibs_extensions = 1; + + if (ibs_fetch_type || ibs_op_type) { + if (!cpu_family) + parse_cpuid(env); + return true; + } + + return false; +} diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c index 0344cb86fa9554efbb9361380dad7ccc1a0913b6..ac7a5de062ccd14603e6a65634daa39d4242bf84 100644 --- a/tools/perf/util/annotate.c +++ b/tools/perf/util/annotate.c @@ -1486,44 +1486,26 @@ annotation_line__print(struct annotation_line *al, struct symbol *sym, u64 start * means that it's not a disassembly line so should be treated differently. * The ops.raw part will be parsed further according to type of the instruction. */ -static int symbol__parse_objdump_line(struct symbol *sym, FILE *file, +static int symbol__parse_objdump_line(struct symbol *sym, struct annotate_args *args, - int *line_nr) + char *parsed_line, int *line_nr) { struct map *map = args->ms.map; struct annotation *notes = symbol__annotation(sym); struct disasm_line *dl; - char *line = NULL, *parsed_line, *tmp, *tmp2; - size_t line_len; + char *tmp; s64 line_ip, offset = -1; regmatch_t match[2]; - if (getline(&line, &line_len, file) < 0) - return -1; - - if (!line) - return -1; - - line_ip = -1; - parsed_line = strim(line); - /* /filename:linenr ? Save line number and ignore. 
*/ if (regexec(&file_lineno, parsed_line, 2, match, 0) == 0) { *line_nr = atoi(parsed_line + match[1].rm_so); return 0; } - tmp = skip_spaces(parsed_line); - if (*tmp) { - /* - * Parse hexa addresses followed by ':' - */ - line_ip = strtoull(tmp, &tmp2, 16); - if (*tmp2 != ':' || tmp == tmp2 || tmp2[1] == '\0') - line_ip = -1; - } - - if (line_ip != -1) { + /* Process hex address followed by ':'. */ + line_ip = strtoull(parsed_line, &tmp, 16); + if (parsed_line != tmp && tmp[0] == ':' && tmp[1] != '\0') { u64 start = map__rip_2objdump(map, sym->start), end = map__rip_2objdump(map, sym->end); @@ -1531,7 +1513,7 @@ static int symbol__parse_objdump_line(struct symbol *sym, FILE *file, if ((u64)line_ip < start || (u64)line_ip >= end) offset = -1; else - parsed_line = tmp2 + 1; + parsed_line = tmp + 1; } args->offset = offset; @@ -1540,7 +1522,6 @@ static int symbol__parse_objdump_line(struct symbol *sym, FILE *file, args->ms.sym = sym; dl = disasm_line__new(args); - free(line); (*line_nr)++; if (dl == NULL) @@ -1858,6 +1839,67 @@ static int symbol__disassemble_bpf(struct symbol *sym __maybe_unused, } #endif // defined(HAVE_LIBBFD_SUPPORT) && defined(HAVE_LIBBPF_SUPPORT) +/* + * Possibly create a new version of line with tabs expanded. Returns the + * existing or new line, storage is updated if a new line is allocated. If + * allocation fails then NULL is returned. + */ +static char *expand_tabs(char *line, char **storage, size_t *storage_len) +{ + size_t i, src, dst, len, new_storage_len, num_tabs; + char *new_line; + size_t line_len = strlen(line); + + for (num_tabs = 0, i = 0; i < line_len; i++) + if (line[i] == '\t') + num_tabs++; + + if (num_tabs == 0) + return line; + + /* + * Space for the line and '\0', less the leading and trailing + * spaces. Each tab may introduce 7 additional spaces. 
+ */ +	new_storage_len = line_len + 1 + (num_tabs * 7); + +	new_line = malloc(new_storage_len); +	if (new_line == NULL) { +		pr_err("Failure allocating memory for tab expansion\n"); +		return NULL; +	} + +	/* +	 * Copy regions starting at src and expand tabs. If there are two +	 * adjacent tabs then 'src == i', the memcpy is of size 0 and the spaces +	 * are inserted. +	 */ +	for (i = 0, src = 0, dst = 0; i < line_len && num_tabs; i++) { +		if (line[i] == '\t') { +			len = i - src; +			memcpy(&new_line[dst], &line[src], len); +			dst += len; +			new_line[dst++] = ' '; +			while (dst % 8 != 0) +				new_line[dst++] = ' '; +			src = i + 1; +			num_tabs--; +		} +	} + +	/* Expand the last region; the terminating '\0' is written separately. */ +	len = line_len - src; +	memcpy(&new_line[dst], &line[src], len); +	dst += len; +	new_line[dst] = '\0'; + +	free(*storage); +	*storage = new_line; +	*storage_len = new_storage_len; +	return new_line; + +} + static int symbol__disassemble(struct symbol *sym, struct annotate_args *args) { 	struct annotation_options *opts = args->options; @@ -1873,6 +1915,8 @@ static int symbol__disassemble(struct symbol *sym, struct annotate_args *args) 	int lineno = 0; 	int nline; 	pid_t pid; +	char *line; +	size_t line_len; 	int err = dso__disassemble_filename(dso, symfs_filename, 					 sizeof(symfs_filename)); 	if (err) @@ -1911,14 +1955,20 @@ static int symbol__disassemble(struct symbol *sym, struct annotate_args *args) 	err = asprintf(&command, 		 "%s %s%s --start-address=0x%016" PRIx64 		 " --stop-address=0x%016" PRIx64 -		 " -l -d %s %s -C \"$1\" 2>/dev/null|grep -v \"$1:\"|expand", +		 " -l -d %s %s %s %c%s%c %s%s -C \"$1\"", 		 opts->objdump_path ?: "objdump", 		 opts->disassembler_style ? "-M " : "", 		 opts->disassembler_style ?: "", 		 map__rip_2objdump(map, sym->start), 		 map__rip_2objdump(map, sym->end), -		 opts->show_asm_raw ? "" : "--no-show-raw", -		 opts->annotate_src ? "-S" : ""); +		 opts->show_asm_raw ? "" : "--no-show-raw-insn", +		 opts->annotate_src ? "-S" : "", +		 opts->prefix ? "--prefix " : "", +		 opts->prefix ?
'"' : ' ', + opts->prefix ?: "", + opts->prefix ? '"' : ' ', + opts->prefix_strip ? "--prefix-strip=" : "", + opts->prefix_strip ?: ""); if (err < 0) { pr_err("Failure allocating memory for the command to run\n"); @@ -1961,18 +2011,40 @@ static int symbol__disassemble(struct symbol *sym, struct annotate_args *args) goto out_free_command; } + /* Storage for getline. */ + line = NULL; + line_len = 0; + nline = 0; while (!feof(file)) { + const char *match; + char *expanded_line; + + if (getline(&line, &line_len, file) < 0 || !line) + break; + + /* Skip lines containing "filename:" */ + match = strstr(line, symfs_filename); + if (match && match[strlen(symfs_filename)] == ':') + continue; + + expanded_line = strim(line); + expanded_line = expand_tabs(expanded_line, &line, &line_len); + if (!expanded_line) + break; + /* * The source code line number (lineno) needs to be kept in * across calls to symbol__parse_objdump_line(), so that it * can associate it with the instructions till the next one. * See disasm_line__new() and struct disasm_line::line_nr. 
*/ - if (symbol__parse_objdump_line(sym, file, args, &lineno) < 0) + if (symbol__parse_objdump_line(sym, args, expanded_line, + &lineno) < 0) break; nline++; } + free(line); if (nline == 0) pr_err("No output from %s\n", command); @@ -3134,3 +3206,12 @@ int annotate_parse_percent_type(const struct option *opt, const char *_str, free(str1); return err; } + +int annotate_check_args(struct annotation_options *args) +{ + if (args->prefix_strip && !args->prefix) { + pr_err("--prefix-strip requires --prefix\n"); + return -1; + } + return 0; +} diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h index d76fd0e81f4658ed2b1f3e23874d031b094bdaf6..47f3d2fd20ced6016480463ab7d05f72b4c1fa56 100644 --- a/tools/perf/util/annotate.h +++ b/tools/perf/util/annotate.h @@ -93,6 +93,8 @@ struct annotation_options { int context; const char *objdump_path; const char *disassembler_style; + const char *prefix; + const char *prefix_strip; unsigned int percent_type; }; @@ -419,4 +421,7 @@ void annotation_config__init(void); int annotate_parse_percent_type(const struct option *opt, const char *_str, int unset); + +int annotate_check_args(struct annotation_options *args); + #endif /* __PERF_ANNOTATE_H */ diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c index 23a8c152e5686f38f40df21f63e7911f10e7f8e2..185136bae6f69b7097192da95522b2f1f8cf9600 100644 --- a/tools/perf/util/auxtrace.c +++ b/tools/perf/util/auxtrace.c @@ -33,6 +33,7 @@ #include "evsel.h" #include "evsel_config.h" #include "symbol.h" +#include "util/perf_api_probe.h" #include "util/synthetic-events.h" #include "thread_map.h" #include "asm/bug.h" @@ -59,25 +60,6 @@ #include "symbol/kallsyms.h" #include -static struct perf_pmu *perf_evsel__find_pmu(struct evsel *evsel) -{ - struct perf_pmu *pmu = NULL; - - while ((pmu = perf_pmu__scan(pmu)) != NULL) { - if (pmu->type == evsel->core.attr.type) - break; - } - - return pmu; -} - -static bool perf_evsel__is_aux_event(struct evsel *evsel) -{ - struct 
perf_pmu *pmu = perf_evsel__find_pmu(evsel); - - return pmu && pmu->auxtrace; -} - /* * Make a group from 'leader' to 'last', requiring that the events were not * already grouped to a different leader. @@ -720,10 +702,10 @@ static int auxtrace_validate_aux_sample_size(struct evlist *evlist, pr_err("Cannot add AUX area sampling because group leader is not an AUX area event\n"); return -EINVAL; } - perf_evsel__set_sample_bit(evsel, AUX); + evsel__set_sample_bit(evsel, AUX); opts->auxtrace_sample_mode = true; } else { - perf_evsel__reset_sample_bit(evsel, AUX); + evsel__reset_sample_bit(evsel, AUX); } } @@ -2397,7 +2379,7 @@ static int parse_addr_filter(struct evsel *evsel, const char *filter, static int perf_evsel__nr_addr_filter(struct evsel *evsel) { - struct perf_pmu *pmu = perf_evsel__find_pmu(evsel); + struct perf_pmu *pmu = evsel__find_pmu(evsel); int nr_addr_filters = 0; if (!pmu) diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h index 39ef333ddb6c01e5b2e7566f95909f2d09b36bb9..15d24882ce400a6e020e5636d337691536aa2cc9 100644 --- a/tools/perf/util/auxtrace.h +++ b/tools/perf/util/auxtrace.h @@ -82,6 +82,9 @@ enum itrace_period_type { * @llc: whether to synthesize last level cache events * @tlb: whether to synthesize TLB events * @remote_access: whether to synthesize remote access events + * @vm_time_correlation: perform VM Time Correlation + * @vm_tm_corr_dry_run: VM Time Correlation dry-run + * @vm_tm_corr_args: VM Time Correlation implementation-specific arguments * @callchain_sz: maximum callchain size * @last_branch_sz: branch context size * @period: 'instructions' events period @@ -113,6 +116,9 @@ struct itrace_synth_opts { bool llc; bool tlb; bool remote_access; + bool vm_time_correlation; + bool vm_tm_corr_dry_run; + char *vm_tm_corr_args; unsigned int callchain_sz; unsigned int last_branch_sz; unsigned long long period; diff --git a/tools/perf/util/bpf-event.c b/tools/perf/util/bpf-event.c index 
782c0c8a9a83662b8f5cf10aa94c26da246b680a..96e0a31a38f6e46e2e023fda45d6773f9e477b71 100644 --- a/tools/perf/util/bpf-event.c +++ b/tools/perf/util/bpf-event.c @@ -422,8 +422,7 @@ static int bpf_event__sb_cb(union perf_event *event, void *data) return 0; } -int bpf_event__add_sb_event(struct evlist **evlist, - struct perf_env *env) +int evlist__add_bpf_sb_event(struct evlist *evlist, struct perf_env *env) { struct perf_event_attr attr = { .type = PERF_TYPE_SOFTWARE, diff --git a/tools/perf/util/bpf-event.h b/tools/perf/util/bpf-event.h index 81fdc88e6c1a879062e8b36b4af013522bce7688..68f315c3df5bed03c85b7d4e758fe0fd66d7e3bc 100644 --- a/tools/perf/util/bpf-event.h +++ b/tools/perf/util/bpf-event.h @@ -33,8 +33,7 @@ struct btf_node { #ifdef HAVE_LIBBPF_SUPPORT int machine__process_bpf(struct machine *machine, union perf_event *event, struct perf_sample *sample); -int bpf_event__add_sb_event(struct evlist **evlist, - struct perf_env *env); +int evlist__add_bpf_sb_event(struct evlist *evlist, struct perf_env *env); void bpf_event__print_bpf_prog_info(struct bpf_prog_info *info, struct perf_env *env, FILE *fp); @@ -46,8 +45,8 @@ static inline int machine__process_bpf(struct machine *machine __maybe_unused, return 0; } -static inline int bpf_event__add_sb_event(struct evlist **evlist __maybe_unused, - struct perf_env *env __maybe_unused) +static inline int evlist__add_bpf_sb_event(struct evlist *evlist __maybe_unused, + struct perf_env *env __maybe_unused) { return 0; } diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c index 10c187b8b8ead6301175edd860881e6f44f52eb9..fe6767ecfbbaeb3d9c10fde0f8866033f2c1ae4f 100644 --- a/tools/perf/util/bpf-loader.c +++ b/tools/perf/util/bpf-loader.c @@ -328,12 +328,6 @@ config_bpf_program(struct bpf_program *prog) probe_conf.no_inlines = false; probe_conf.force_add = false; - config_str = bpf_program__title(prog, false); - if (IS_ERR(config_str)) { - pr_debug("bpf: unable to get title for program\n"); - return 
PTR_ERR(config_str); - } - priv = calloc(sizeof(*priv), 1); if (!priv) { pr_debug("bpf: failed to alloc priv\n"); @@ -341,6 +335,7 @@ config_bpf_program(struct bpf_program *prog) } pev = &priv->pev; + config_str = bpf_program__section_name(prog); pr_debug("bpf: config program '%s'\n", config_str); err = parse_prog_config(config_str, &main_str, &is_tp, pev); if (err) @@ -454,10 +449,7 @@ preproc_gen_prologue(struct bpf_program *prog, int n, if (err) { const char *title; - title = bpf_program__title(prog, false); - if (!title) - title = "[unknown]"; - + title = bpf_program__section_name(prog); pr_debug("Failed to generate prologue for program %s\n", title); return err; diff --git a/tools/perf/util/branch.h b/tools/perf/util/branch.h index 88e00d268f6f2795a0f0b311a8d0d2de1fd59c56..154a05cd03af5c66023f6cfb78e90ccdd226c85b 100644 --- a/tools/perf/util/branch.h +++ b/tools/perf/util/branch.h @@ -12,6 +12,7 @@ #include #include #include +#include "event.h" struct branch_flags { u64 mispred:1; @@ -39,9 +40,30 @@ struct branch_entry { struct branch_stack { u64 nr; + u64 hw_idx; struct branch_entry entries[0]; }; +/* + * The hw_idx is only available when PERF_SAMPLE_BRANCH_HW_INDEX is applied. + * Otherwise, the output format of a sample with branch stack is + * struct branch_stack { + * u64 nr; + * struct branch_entry entries[0]; + * } + * Check whether the hw_idx is available, + * and return the corresponding pointer of entries[0]. 
+ */ +static inline struct branch_entry *perf_sample__branch_entries(struct perf_sample *sample) +{ + u64 *entry = (u64 *)sample->branch_stack; + + entry++; + if (sample->no_hw_idx) + return (struct branch_entry *)entry; + return (struct branch_entry *)(++entry); +} + struct branch_type_stat { bool branch_to; u64 counts[PERF_BR_MAX]; diff --git a/tools/perf/util/build-id.h b/tools/perf/util/build-id.h index aad419bb165cd9a3b38db2de85f4934d63f238d1..949f7e54c9cb0045dacb4c07cc24692c6596243c 100644 --- a/tools/perf/util/build-id.h +++ b/tools/perf/util/build-id.h @@ -29,6 +29,10 @@ int build_id__mark_dso_hit(struct perf_tool *tool, union perf_event *event, int dsos__hit_all(struct perf_session *session); +int perf_event__inject_buildid(struct perf_tool *tool, union perf_event *event, + struct perf_sample *sample, struct evsel *evsel, + struct machine *machine); + bool perf_session__read_build_ids(struct perf_session *session, bool with_hits); int perf_session__write_buildid_table(struct perf_session *session, struct feat_fd *fd); diff --git a/tools/perf/util/clockid.c b/tools/perf/util/clockid.c new file mode 100644 index 0000000000000000000000000000000000000000..74365a5d99c1f17af2beba84eeaf4fe7181f7eb7 --- /dev/null +++ b/tools/perf/util/clockid.c @@ -0,0 +1,119 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include +#include +#include +#include +#include +#include "debug.h" +#include "clockid.h" +#include "record.h" + +struct clockid_map { + const char *name; + int clockid; +}; + +#define CLOCKID_MAP(n, c) \ + { .name = n, .clockid = (c), } + +#define CLOCKID_END { .name = NULL, } + + +/* + * Add the missing ones, we need to build on many distros... 
+ */ +#ifndef CLOCK_MONOTONIC_RAW +#define CLOCK_MONOTONIC_RAW 4 +#endif +#ifndef CLOCK_BOOTTIME +#define CLOCK_BOOTTIME 7 +#endif +#ifndef CLOCK_TAI +#define CLOCK_TAI 11 +#endif + +static const struct clockid_map clockids[] = { + /* available for all events, NMI safe */ + CLOCKID_MAP("monotonic", CLOCK_MONOTONIC), + CLOCKID_MAP("monotonic_raw", CLOCK_MONOTONIC_RAW), + + /* available for some events */ + CLOCKID_MAP("realtime", CLOCK_REALTIME), + CLOCKID_MAP("boottime", CLOCK_BOOTTIME), + CLOCKID_MAP("tai", CLOCK_TAI), + + /* available for the lazy */ + CLOCKID_MAP("mono", CLOCK_MONOTONIC), + CLOCKID_MAP("raw", CLOCK_MONOTONIC_RAW), + CLOCKID_MAP("real", CLOCK_REALTIME), + CLOCKID_MAP("boot", CLOCK_BOOTTIME), + + CLOCKID_END, +}; + +static int get_clockid_res(clockid_t clk_id, u64 *res_ns) +{ + struct timespec res; + + *res_ns = 0; + if (!clock_getres(clk_id, &res)) + *res_ns = res.tv_nsec + res.tv_sec * NSEC_PER_SEC; + else + pr_warning("WARNING: Failed to determine specified clock resolution.\n"); + + return 0; +} + +int parse_clockid(const struct option *opt, const char *str, int unset) +{ + struct record_opts *opts = (struct record_opts *)opt->value; + const struct clockid_map *cm; + const char *ostr = str; + + if (unset) { + opts->use_clockid = 0; + return 0; + } + + /* no arg passed */ + if (!str) + return 0; + + /* no setting it twice */ + if (opts->use_clockid) + return -1; + + opts->use_clockid = true; + + /* if its a number, we're done */ + if (sscanf(str, "%d", &opts->clockid) == 1) + return get_clockid_res(opts->clockid, &opts->clockid_res_ns); + + /* allow a "CLOCK_" prefix to the name */ + if (!strncasecmp(str, "CLOCK_", 6)) + str += 6; + + for (cm = clockids; cm->name; cm++) { + if (!strcasecmp(str, cm->name)) { + opts->clockid = cm->clockid; + return get_clockid_res(opts->clockid, + &opts->clockid_res_ns); + } + } + + opts->use_clockid = false; + ui__warning("unknown clockid %s, check man page\n", ostr); + return -1; +} + +const char 
*clockid_name(clockid_t clk_id) +{ + const struct clockid_map *cm; + + for (cm = clockids; cm->name; cm++) { + if (cm->clockid == clk_id) + return cm->name; + } + return "(not found)"; +} diff --git a/tools/perf/util/clockid.h b/tools/perf/util/clockid.h new file mode 100644 index 0000000000000000000000000000000000000000..9b49b4711c76837a08a13d6379b49c117493d459 --- /dev/null +++ b/tools/perf/util/clockid.h @@ -0,0 +1,11 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef __PERF_CLOCKID_H +#define __PERF_CLOCKID_H + +struct option; +int parse_clockid(const struct option *opt, const char *str, int unset); + +const char *clockid_name(clockid_t clk_id); + +#endif diff --git a/tools/perf/util/cpumap.h b/tools/perf/util/cpumap.h index 2553bef1279dcba598e202aad852f151ac885774..dbc1d7e949edd757835f533381cbe9b8c6c3cd80 100644 --- a/tools/perf/util/cpumap.h +++ b/tools/perf/util/cpumap.h @@ -60,4 +60,5 @@ int cpu_map__build_map(struct perf_cpu_map *cpus, struct perf_cpu_map **res, int cpu_map__cpu(struct perf_cpu_map *cpus, int idx); bool cpu_map__has(struct perf_cpu_map *cpus, int cpu); + #endif /* __PERF_CPUMAP_H */ diff --git a/tools/perf/util/cputopo.c b/tools/perf/util/cputopo.c index 1b52402a892307d9bb32188580f4e84faf938fb2..ec77e2a7b3ca1875ef610d0f6ce3be17fc75a266 100644 --- a/tools/perf/util/cputopo.c +++ b/tools/perf/util/cputopo.c @@ -12,6 +12,7 @@ #include "cpumap.h" #include "debug.h" #include "env.h" +#include "pmu-hybrid.h" #define CORE_SIB_FMT \ "%s/devices/system/cpu/cpu%d/topology/core_siblings_list" @@ -351,3 +352,82 @@ void numa_topology__delete(struct numa_topology *tp) free(tp); } + +static int load_hybrid_node(struct hybrid_topology_node *node, + struct perf_pmu *pmu) +{ + const char *sysfs; + char path[PATH_MAX]; + char *buf = NULL, *p; + FILE *fp; + size_t len = 0; + + node->pmu_name = strdup(pmu->name); + if (!node->pmu_name) + return -1; + + sysfs = sysfs__mountpoint(); + if (!sysfs) + goto err; + + snprintf(path, PATH_MAX, CPUS_TEMPLATE_CPU, 
sysfs, pmu->name); + fp = fopen(path, "r"); + if (!fp) + goto err; + + if (getline(&buf, &len, fp) <= 0) { + fclose(fp); + goto err; + } + + p = strchr(buf, '\n'); + if (p) + *p = '\0'; + + fclose(fp); + node->cpus = buf; + return 0; + +err: + zfree(&node->pmu_name); + free(buf); + return -1; +} + +struct hybrid_topology *hybrid_topology__new(void) +{ + struct perf_pmu *pmu; + struct hybrid_topology *tp = NULL; + u32 nr, i = 0; + + nr = perf_pmu__hybrid_pmu_num(); + if (nr == 0) + return NULL; + + tp = zalloc(sizeof(*tp) + sizeof(tp->nodes[0]) * nr); + if (!tp) + return NULL; + + tp->nr = nr; + perf_pmu__for_each_hybrid_pmu(pmu) { + if (load_hybrid_node(&tp->nodes[i], pmu)) { + hybrid_topology__delete(tp); + return NULL; + } + i++; + } + + return tp; +} + +void hybrid_topology__delete(struct hybrid_topology *tp) +{ + u32 i; + + for (i = 0; i < tp->nr; i++) { + zfree(&tp->nodes[i].pmu_name); + zfree(&tp->nodes[i].cpus); + } + + free(tp); +} diff --git a/tools/perf/util/cputopo.h b/tools/perf/util/cputopo.h index 7bf6b811f715ea329759486168822cc00c391787..528b747152f328fbda8519007df84a2d740027d8 100644 --- a/tools/perf/util/cputopo.h +++ b/tools/perf/util/cputopo.h @@ -25,10 +25,23 @@ struct numa_topology { struct numa_topology_node nodes[0]; }; +struct hybrid_topology_node { + char *pmu_name; + char *cpus; +}; + +struct hybrid_topology { + u32 nr; + struct hybrid_topology_node nodes[]; +}; + struct cpu_topology *cpu_topology__new(void); void cpu_topology__delete(struct cpu_topology *tp); struct numa_topology *numa_topology__new(void); void numa_topology__delete(struct numa_topology *tp); +struct hybrid_topology *hybrid_topology__new(void); +void hybrid_topology__delete(struct hybrid_topology *tp); + #endif /* __PERF_CPUTOPO_H */ diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index f5a9cb4088080205f00f00da6de468760fe52a78..f9cc15f93c4a74b7c467f19127e12ba2205b5261 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -1192,6 +1192,7 
@@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq, union perf_event *event = tidq->event_buf; struct dummy_branch_stack { u64 nr; + u64 hw_idx; struct branch_entry entries; } dummy_bs; u64 ip; @@ -1222,6 +1223,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq, if (etm->synth_opts.last_branch) { dummy_bs = (struct dummy_branch_stack){ .nr = 1, + .hw_idx = -1ULL, .entries = { .from = sample.ip, .to = sample.addr, diff --git a/tools/perf/util/data-convert-bt.c b/tools/perf/util/data-convert-bt.c index dbc772bfb04ecbd6183ef4ccb1519c311ac900bf..c764c6da33a2aaa2ffa52387fac80f5d3eeb6834 100644 --- a/tools/perf/util/data-convert-bt.c +++ b/tools/perf/util/data-convert-bt.c @@ -21,7 +21,7 @@ #include #include #include "asm/bug.h" -#include "data-convert-bt.h" +#include "data-convert.h" #include "session.h" #include "debug.h" #include "tool.h" @@ -31,6 +31,9 @@ #include "config.h" #include #include +#include +#include "util.h" +#include "clockid.h" #define pr_N(n, fmt, ...) \ eprintf(n, debug_data_convert, fmt, ##__VA_ARGS__) @@ -1381,11 +1384,26 @@ do { \ return 0; } -static int ctf_writer__setup_clock(struct ctf_writer *cw) +static int ctf_writer__setup_clock(struct ctf_writer *cw, + struct perf_session *session, + bool tod) { struct bt_ctf_clock *clock = cw->clock; + const char *desc = "perf clock"; + int64_t offset = 0; - bt_ctf_clock_set_description(clock, "perf clock"); + if (tod) { + struct perf_env *env = &session->header.env; + + if (!env->clock.enabled) { + pr_err("Can't provide --tod time, missing clock data. 
" + "Please record with -k/--clockid option.\n"); + return -1; + } + + desc = clockid_name(env->clock.clockid); + offset = env->clock.tod_ns - env->clock.clockid_ns; + } #define SET(__n, __v) \ do { \ @@ -1394,8 +1412,8 @@ do { \ } while (0) SET(frequency, 1000000000); - SET(offset_s, 0); - SET(offset, 0); + SET(offset, offset); + SET(description, desc); SET(precision, 10); SET(is_absolute, 0); @@ -1481,7 +1499,8 @@ static void ctf_writer__cleanup(struct ctf_writer *cw) memset(cw, 0, sizeof(*cw)); } -static int ctf_writer__init(struct ctf_writer *cw, const char *path) +static int ctf_writer__init(struct ctf_writer *cw, const char *path, + struct perf_session *session, bool tod) { struct bt_ctf_writer *writer; struct bt_ctf_stream_class *stream_class; @@ -1505,7 +1524,7 @@ static int ctf_writer__init(struct ctf_writer *cw, const char *path) cw->clock = clock; - if (ctf_writer__setup_clock(cw)) { + if (ctf_writer__setup_clock(cw, session, tod)) { pr("Failed to setup CTF clock.\n"); goto err_cleanup; } @@ -1613,17 +1632,15 @@ int bt_convert__perf2ctf(const char *input, const char *path, if (err) return err; - /* CTF writer */ - if (ctf_writer__init(cw, path)) - return -1; - err = -1; /* perf.data session */ - session = perf_session__new(&data, 0, &c.tool); - if (IS_ERR(session)) { - err = PTR_ERR(session); - goto free_writer; - } + session = perf_session__new(&data, &c.tool); + if (IS_ERR(session)) + return PTR_ERR(session); + + /* CTF writer */ + if (ctf_writer__init(cw, path, session, opts->tod)) + goto free_session; if (c.queue_size) { ordered_events__set_alloc_size(&session->ordered_events, @@ -1632,17 +1649,17 @@ int bt_convert__perf2ctf(const char *input, const char *path, /* CTF writer env/clock setup */ if (ctf_writer__setup_env(cw, session)) - goto free_session; + goto free_writer; /* CTF events setup */ if (setup_events(cw, session)) - goto free_session; + goto free_writer; if (opts->all && setup_non_sample_events(cw, session)) - goto free_session; + goto 
free_writer; if (setup_streams(cw, session)) - goto free_session; + goto free_writer; err = perf_session__process_events(session); if (!err) @@ -1670,10 +1687,10 @@ int bt_convert__perf2ctf(const char *input, const char *path, return err; -free_session: - perf_session__delete(session); free_writer: ctf_writer__cleanup(cw); +free_session: + perf_session__delete(session); pr_err("Error during conversion setup.\n"); return err; } diff --git a/tools/perf/util/data-convert-bt.h b/tools/perf/util/data-convert-bt.h deleted file mode 100644 index 821674d63c4ed8f208efc1f859a34b374275065a..0000000000000000000000000000000000000000 --- a/tools/perf/util/data-convert-bt.h +++ /dev/null @@ -1,11 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -#ifndef __DATA_CONVERT_BT_H -#define __DATA_CONVERT_BT_H -#include "data-convert.h" -#ifdef HAVE_LIBBABELTRACE_SUPPORT - -int bt_convert__perf2ctf(const char *input_name, const char *to_ctf, - struct perf_data_convert_opts *opts); - -#endif /* HAVE_LIBBABELTRACE_SUPPORT */ -#endif /* __DATA_CONVERT_BT_H */ diff --git a/tools/perf/util/data-convert-json.c b/tools/perf/util/data-convert-json.c new file mode 100644 index 0000000000000000000000000000000000000000..f1ab6edba446bb6159971abefc8f78379b309e10 --- /dev/null +++ b/tools/perf/util/data-convert-json.c @@ -0,0 +1,384 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * JSON export. + * + * Copyright (C) 2021, CodeWeavers Inc. 
+ */ + +#include "data-convert.h" + +#include +#include +#include +#include + +#include "linux/compiler.h" +#include "linux/err.h" +#include "util/auxtrace.h" +#include "util/debug.h" +#include "util/dso.h" +#include "util/event.h" +#include "util/evsel.h" +#include "util/evlist.h" +#include "util/header.h" +#include "util/map.h" +#include "util/session.h" +#include "util/symbol.h" +#include "util/thread.h" +#include "util/tool.h" + +struct convert_json { + struct perf_tool tool; + FILE *out; + bool first; + u64 events_count; +}; + +// Outputs a JSON-encoded string surrounded by quotes with characters escaped. +static void output_json_string(FILE *out, const char *s) +{ + fputc('"', out); + while (*s) { + switch (*s) { + + // required escapes with special forms as per RFC 8259 + case '"': fputs("\\\"", out); break; + case '\\': fputs("\\\\", out); break; + case '\b': fputs("\\b", out); break; + case '\f': fputs("\\f", out); break; + case '\n': fputs("\\n", out); break; + case '\r': fputs("\\r", out); break; + case '\t': fputs("\\t", out); break; + + default: + // all other control characters must be escaped by hex code + if (*s <= 0x1f) + fprintf(out, "\\u%04x", *s); + else + fputc(*s, out); + break; + } + + ++s; + } + fputc('"', out); +} + +// Outputs an optional comma, newline and indentation to delimit a new value +// from the previous one in a JSON object or array. +static void output_json_delimiters(FILE *out, bool comma, int depth) +{ + int i; + + if (comma) + fputc(',', out); + fputc('\n', out); + for (i = 0; i < depth; ++i) + fputc('\t', out); +} + +// Outputs a printf format string (with delimiter) as a JSON value. +__printf(4, 5) +static void output_json_format(FILE *out, bool comma, int depth, const char *format, ...) +{ + va_list args; + + output_json_delimiters(out, comma, depth); + va_start(args, format); + vfprintf(out, format, args); + va_end(args); +} + +// Outputs a JSON key-value pair where the value is a string. 
+static void output_json_key_string(FILE *out, bool comma, int depth, + const char *key, const char *value) +{ + output_json_delimiters(out, comma, depth); + output_json_string(out, key); + fputs(": ", out); + output_json_string(out, value); +} + +// Outputs a JSON key-value pair where the value is a printf format string. +__printf(5, 6) +static void output_json_key_format(FILE *out, bool comma, int depth, + const char *key, const char *format, ...) +{ + va_list args; + + output_json_delimiters(out, comma, depth); + output_json_string(out, key); + fputs(": ", out); + va_start(args, format); + vfprintf(out, format, args); + va_end(args); +} + +static void output_sample_callchain_entry(struct perf_tool *tool, + u64 ip, struct addr_location *al) +{ + struct convert_json *c = container_of(tool, struct convert_json, tool); + FILE *out = c->out; + + output_json_format(out, false, 4, "{"); + output_json_key_format(out, false, 5, "ip", "\"0x%" PRIx64 "\"", ip); + + if (al && al->sym && al->sym->namelen) { + fputc(',', out); + output_json_key_string(out, false, 5, "symbol", al->sym->name); + + if (al->map && al->map->dso) { + const char *dso = al->map->dso->short_name; + + if (dso && strlen(dso) > 0) { + fputc(',', out); + output_json_key_string(out, false, 5, "dso", dso); + } + } + } + + output_json_format(out, false, 4, "}"); +} + +static int process_sample_event(struct perf_tool *tool, + union perf_event *event __maybe_unused, + struct perf_sample *sample, + struct evsel *evsel __maybe_unused, + struct machine *machine) +{ + struct convert_json *c = container_of(tool, struct convert_json, tool); + FILE *out = c->out; + struct addr_location al, tal; + u8 cpumode = PERF_RECORD_MISC_USER; + + if (machine__resolve(machine, &al, sample) < 0) { + pr_err("Sample resolution failed!\n"); + return -1; + } + + ++c->events_count; + + if (c->first) + c->first = false; + else + fputc(',', out); + output_json_format(out, false, 2, "{"); + + output_json_key_format(out, false, 3, 
"timestamp", "%" PRIi64, sample->time); + output_json_key_format(out, true, 3, "pid", "%i", al.thread->pid_); + output_json_key_format(out, true, 3, "tid", "%i", al.thread->tid); + + if (al.thread->cpu >= 0) + output_json_key_format(out, true, 3, "cpu", "%i", al.thread->cpu); + + output_json_key_string(out, true, 3, "comm", thread__comm_str(al.thread)); + + output_json_key_format(out, true, 3, "callchain", "["); + if (sample->callchain) { + unsigned int i; + bool ok; + bool first_callchain = true; + + for (i = 0; i < sample->callchain->nr; ++i) { + u64 ip = sample->callchain->ips[i]; + + if (ip >= PERF_CONTEXT_MAX) { + switch (ip) { + case PERF_CONTEXT_HV: + cpumode = PERF_RECORD_MISC_HYPERVISOR; + break; + case PERF_CONTEXT_KERNEL: + cpumode = PERF_RECORD_MISC_KERNEL; + break; + case PERF_CONTEXT_USER: + cpumode = PERF_RECORD_MISC_USER; + break; + default: + pr_debug("invalid callchain context: %" + PRId64 "\n", (s64) ip); + break; + } + continue; + } + + if (first_callchain) + first_callchain = false; + else + fputc(',', out); + + ok = thread__find_symbol(al.thread, cpumode, ip, &tal); + output_sample_callchain_entry(tool, ip, ok ? 
&tal : NULL); + } + } else { + output_sample_callchain_entry(tool, sample->ip, &al); + } + output_json_format(out, false, 3, "]"); + + output_json_format(out, false, 2, "}"); + return 0; +} + +static void output_headers(struct perf_session *session, struct convert_json *c) +{ + struct stat st; + struct perf_header *header = &session->header; + int ret; + int fd = perf_data__fd(session->data); + int i; + FILE *out = c->out; + + output_json_key_format(out, false, 2, "header-version", "%u", header->version); + + ret = fstat(fd, &st); + if (ret >= 0) { + time_t stctime = st.st_mtime; + char buf[256]; + + strftime(buf, sizeof(buf), "%FT%TZ", gmtime(&stctime)); + output_json_key_string(out, true, 2, "captured-on", buf); + } else { + pr_debug("Failed to get mtime of source file, not writing captured-on\n"); + } + + output_json_key_format(out, true, 2, "data-offset", "%" PRIu64, header->data_offset); + output_json_key_format(out, true, 2, "data-size", "%" PRIu64, header->data_size); + output_json_key_format(out, true, 2, "feat-offset", "%" PRIu64, header->feat_offset); + + output_json_key_string(out, true, 2, "hostname", header->env.hostname); + output_json_key_string(out, true, 2, "os-release", header->env.os_release); + output_json_key_string(out, true, 2, "arch", header->env.arch); + + output_json_key_string(out, true, 2, "cpu-desc", header->env.cpu_desc); + output_json_key_string(out, true, 2, "cpuid", header->env.cpuid); + output_json_key_format(out, true, 2, "nrcpus-online", "%u", header->env.nr_cpus_online); + output_json_key_format(out, true, 2, "nrcpus-avail", "%u", header->env.nr_cpus_avail); + + if (header->env.clock.enabled) { + output_json_key_format(out, true, 2, "clockid", + "%u", header->env.clock.clockid); + output_json_key_format(out, true, 2, "clock-time", + "%" PRIu64, header->env.clock.clockid_ns); + output_json_key_format(out, true, 2, "real-time", + "%" PRIu64, header->env.clock.tod_ns); + } + + output_json_key_string(out, true, 2, "perf-version", 
header->env.version); + + output_json_key_format(out, true, 2, "cmdline", "["); + for (i = 0; i < header->env.nr_cmdline; i++) { + output_json_delimiters(out, i != 0, 3); + output_json_string(c->out, header->env.cmdline_argv[i]); + } + output_json_format(out, false, 2, "]"); +} + +int bt_convert__perf2json(const char *input_name, const char *output_name, + struct perf_data_convert_opts *opts __maybe_unused) +{ + struct perf_session *session; + int fd; + int ret = -1; + + struct convert_json c = { + .tool = { + .sample = process_sample_event, + .mmap = perf_event__process_mmap, + .mmap2 = perf_event__process_mmap2, + .comm = perf_event__process_comm, + .namespaces = perf_event__process_namespaces, + .cgroup = perf_event__process_cgroup, + .exit = perf_event__process_exit, + .fork = perf_event__process_fork, + .lost = perf_event__process_lost, + .tracing_data = perf_event__process_tracing_data, + .build_id = perf_event__process_build_id, + .id_index = perf_event__process_id_index, + .auxtrace_info = perf_event__process_auxtrace_info, + .auxtrace = perf_event__process_auxtrace, + .event_update = perf_event__process_event_update, + .ordered_events = true, + .ordering_requires_timestamps = true, + }, + .first = true, + .events_count = 0, + }; + + struct perf_data data = { + .mode = PERF_DATA_MODE_READ, + .path = input_name, + .force = opts->force, + }; + + if (opts->all) { + pr_err("--all is currently unsupported for JSON output.\n"); + goto err; + } + if (opts->tod) { + pr_err("--tod is currently unsupported for JSON output.\n"); + goto err; + } + + fd = open(output_name, O_CREAT | O_WRONLY | (opts->force ? O_TRUNC : O_EXCL), 0666); + if (fd == -1) { + if (errno == EEXIST) + pr_err("Output file exists. 
Use --force to overwrite it.\n"); + else + pr_err("Error opening output file!\n"); + goto err; + } + + c.out = fdopen(fd, "w"); + if (!c.out) { + fprintf(stderr, "Error opening output file!\n"); + close(fd); + goto err; + } + + session = perf_session__new(&data, &c.tool); + if (IS_ERR(session)) { + fprintf(stderr, "Error creating perf session!\n"); + goto err_fclose; + } + + if (symbol__init(&session->header.env) < 0) { + fprintf(stderr, "Symbol init error!\n"); + goto err_session_delete; + } + + // The opening brace is printed manually because it isn't delimited from a + // previous value (i.e. we don't want a leading newline) + fputc('{', c.out); + + // Version number for future-proofing. Most additions should be able to be + // done in a backwards-compatible way so this should only need to be bumped + // if some major breaking change must be made. + output_json_format(c.out, false, 1, "\"linux-perf-json-version\": 1"); + + // Output headers + output_json_format(c.out, true, 1, "\"headers\": {"); + output_headers(session, &c); + output_json_format(c.out, false, 1, "}"); + + // Output samples + output_json_format(c.out, true, 1, "\"samples\": ["); + perf_session__process_events(session); + output_json_format(c.out, false, 1, "]"); + output_json_format(c.out, false, 0, "}"); + fputc('\n', c.out); + + fprintf(stderr, + "[ perf data convert: Converted '%s' into JSON data '%s' ]\n", + data.path, output_name); + + fprintf(stderr, + "[ perf data convert: Converted and wrote %.3f MB (%" PRIu64 " samples) ]\n", + (ftell(c.out)) / 1024.0 / 1024.0, c.events_count); + + ret = 0; +err_session_delete: + perf_session__delete(session); +err_fclose: + fclose(c.out); +err: + return ret; +} diff --git a/tools/perf/util/data-convert.h b/tools/perf/util/data-convert.h index af90b6076c0616b31351a2d803839c16871ad678..1b4c5f598415d34a01e31150dde72a377f6b1c19 100644 --- a/tools/perf/util/data-convert.h +++ b/tools/perf/util/data-convert.h @@ -2,9 +2,20 @@ #ifndef __DATA_CONVERT_H #define 
__DATA_CONVERT_H +#include + struct perf_data_convert_opts { bool force; bool all; + bool tod; }; +#ifdef HAVE_LIBBABELTRACE_SUPPORT +int bt_convert__perf2ctf(const char *input_name, const char *to_ctf, + struct perf_data_convert_opts *opts); +#endif /* HAVE_LIBBABELTRACE_SUPPORT */ + +int bt_convert__perf2json(const char *input_name, const char *output_name, + struct perf_data_convert_opts *opts); + #endif /* __DATA_CONVERT_H */ diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c index 4da900bdb2f18123e90048a69cf327145233fd81..4f37475ad515ac3e71a38d079658a10a65011875 100644 --- a/tools/perf/util/data.c +++ b/tools/perf/util/data.c @@ -169,8 +169,21 @@ static bool check_pipe(struct perf_data *data) is_pipe = true; } - if (is_pipe) - data->file.fd = fd; + if (is_pipe) { + if (data->use_stdio) { + const char *mode; + + mode = perf_data__is_read(data) ? "r" : "w"; + data->file.fptr = fdopen(fd, mode); + + if (data->file.fptr == NULL) { + data->file.fd = fd; + data->use_stdio = false; + } + } else { + data->file.fd = fd; + } + } return data->is_pipe = is_pipe; } @@ -221,11 +234,12 @@ static bool is_dir(struct perf_data *data) static int open_file_read(struct perf_data *data) { + int flags = data->in_place_update ? 
O_RDWR : O_RDONLY; struct stat st; int fd; char sbuf[STRERR_BUFSIZE]; - fd = open(data->file.path, O_RDONLY); + fd = open(data->file.path, flags); if (fd < 0) { int err = errno; @@ -329,6 +343,9 @@ int perf_data__open(struct perf_data *data) if (check_pipe(data)) return 0; + /* currently it allows stdio for pipe only */ + data->use_stdio = false; + if (!data->path) data->path = "perf.data"; @@ -348,7 +365,21 @@ void perf_data__close(struct perf_data *data) perf_data__close_dir(data); zfree(&data->file.path); - close(data->file.fd); + + if (data->use_stdio) + fclose(data->file.fptr); + else + close(data->file.fd); +} + +ssize_t perf_data__read(struct perf_data *data, void *buf, size_t size) +{ + if (data->use_stdio) { + if (fread(buf, size, 1, data->file.fptr) == 1) + return size; + return feof(data->file.fptr) ? 0 : -1; + } + return readn(data->file.fd, buf, size); } ssize_t perf_data_file__write(struct perf_data_file *file, @@ -360,6 +391,11 @@ ssize_t perf_data_file__write(struct perf_data_file *file, ssize_t perf_data__write(struct perf_data *data, void *buf, size_t size) { + if (data->use_stdio) { + if (fwrite(buf, size, 1, data->file.fptr) == 1) + return size; + return -1; + } return perf_data_file__write(&data->file, buf, size); } diff --git a/tools/perf/util/data.h b/tools/perf/util/data.h index 252d9907124964f00f3cc1b99bcd111994e3ad34..10fbadebae81688d7c5d54e886ac6bb1767e99a5 100644 --- a/tools/perf/util/data.h +++ b/tools/perf/util/data.h @@ -2,6 +2,7 @@ #ifndef __PERF_DATA_H #define __PERF_DATA_H +#include #include #include @@ -12,7 +13,10 @@ enum perf_data_mode { struct perf_data_file { char *path; - int fd; + union { + int fd; + FILE *fptr; + }; unsigned long size; }; @@ -22,6 +26,8 @@ struct perf_data { bool is_pipe; bool is_dir; bool force; + bool in_place_update; + bool use_stdio; enum perf_data_mode mode; struct { @@ -53,11 +59,15 @@ static inline bool perf_data__is_dir(struct perf_data *data) static inline int perf_data__fd(struct perf_data *data) 
{ + if (data->use_stdio) + return fileno(data->file.fptr); + return data->file.fd; } int perf_data__open(struct perf_data *data); void perf_data__close(struct perf_data *data); +ssize_t perf_data__read(struct perf_data *data, void *buf, size_t size); ssize_t perf_data__write(struct perf_data *data, void *buf, size_t size); ssize_t perf_data_file__write(struct perf_data_file *file, diff --git a/tools/perf/util/debug.c b/tools/perf/util/debug.c index 682146d0437934314fe68655ab10f6fbbd94fb5f..1ab270c2254c2d247b852337ed370a16592b3b30 100644 --- a/tools/perf/util/debug.c +++ b/tools/perf/util/debug.c @@ -24,6 +24,7 @@ #include int verbose; +int debug_peo_args; bool dump_trace = false, quiet = false; int debug_ordered_events; static int redirect_to_stderr; @@ -180,6 +181,7 @@ static struct debug_variable { { .name = "ordered-events", .ptr = &debug_ordered_events}, { .name = "stderr", .ptr = &redirect_to_stderr}, { .name = "data-convert", .ptr = &debug_data_convert }, + { .name = "perf-event-open", .ptr = &debug_peo_args }, { .name = NULL, } }; diff --git a/tools/perf/util/debug.h b/tools/perf/util/debug.h index d25ae1c4cee9ae82738ada6a8e929d8fdcffcbd1..f1734abd98dd05bd790745fb57e5e3835047e241 100644 --- a/tools/perf/util/debug.h +++ b/tools/perf/util/debug.h @@ -8,6 +8,7 @@ #include extern int verbose; +extern int debug_peo_args; extern bool quiet, dump_trace; extern int debug_ordered_events; extern int debug_data_convert; @@ -30,6 +31,14 @@ extern int debug_data_convert; #define pr_debug3(fmt, ...) pr_debugN(3, pr_fmt(fmt), ##__VA_ARGS__) #define pr_debug4(fmt, ...) pr_debugN(4, pr_fmt(fmt), ##__VA_ARGS__) +/* Special macro to print perf_event_open arguments/return value. */ +#define pr_debug2_peo(fmt, ...) do { \ + if (debug_peo_args) \ + pr_debugN(0, pr_fmt(fmt), ##__VA_ARGS__); \ + else \ + pr_debugN(2, pr_fmt(fmt), ##__VA_ARGS__); \ +} while (0) + #define pr_time_N(n, var, t, fmt, ...) 
\ eprintf_time(n, var, t, fmt, ##__VA_ARGS__) diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c index 7f07a5dc555f8b23dc4ddf49b66bdafbe0495af2..d12921a903f99a29316354d6d77d372f9681b5fe 100644 --- a/tools/perf/util/dso.c +++ b/tools/perf/util/dso.c @@ -1114,7 +1114,7 @@ struct dso *machine__findnew_kernel(struct machine *machine, const char *name, return dso; } -void dso__set_long_name(struct dso *dso, const char *name, bool name_allocated) +static void dso__set_long_name_id(struct dso *dso, const char *name, struct dso_id *id, bool name_allocated) { struct rb_root *root = dso->root; @@ -1127,8 +1127,8 @@ void dso__set_long_name(struct dso *dso, const char *name, bool name_allocated) if (root) { rb_erase(&dso->rb_node, root); /* - * __dsos__findnew_link_by_longname() isn't guaranteed to add it - * back, so a clean removal is required here. + * __dsos__findnew_link_by_longname_id() isn't guaranteed to + * add it back, so a clean removal is required here. */ RB_CLEAR_NODE(&dso->rb_node); dso->root = NULL; @@ -1139,7 +1139,12 @@ void dso__set_long_name(struct dso *dso, const char *name, bool name_allocated) dso->long_name_allocated = name_allocated; if (root) - __dsos__findnew_link_by_longname(root, dso, NULL); + __dsos__findnew_link_by_longname_id(root, dso, NULL, id); +} + +void dso__set_long_name(struct dso *dso, const char *name, bool name_allocated) +{ + dso__set_long_name_id(dso, name, NULL, name_allocated); } void dso__set_short_name(struct dso *dso, const char *name, bool name_allocated) @@ -1180,13 +1185,15 @@ void dso__set_sorted_by_name(struct dso *dso) dso->sorted_by_name = true; } -struct dso *dso__new(const char *name) +struct dso *dso__new_id(const char *name, struct dso_id *id) { struct dso *dso = calloc(1, sizeof(*dso) + strlen(name) + 1); if (dso != NULL) { strcpy(dso->name, name); - dso__set_long_name(dso, dso->name, false); + if (id) + dso->id = *id; + dso__set_long_name_id(dso, dso->name, id, false); dso__set_short_name(dso, dso->name, 
false); dso->symbols = dso->symbol_names = RB_ROOT_CACHED; dso->data.cache = RB_ROOT; @@ -1217,6 +1224,11 @@ struct dso *dso__new(const char *name) return dso; } +struct dso *dso__new(const char *name) +{ + return dso__new_id(name, NULL); +} + void dso__delete(struct dso *dso) { if (!RB_EMPTY_NODE(&dso->rb_node)) diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h index 69bb77d191644dea563463bc15b081e5564afa71..90843e2213a1c13341587435339449c7d29be71c 100644 --- a/tools/perf/util/dso.h +++ b/tools/perf/util/dso.h @@ -123,6 +123,16 @@ enum dso_load_errno { #define DSO__DATA_CACHE_SIZE 4096 #define DSO__DATA_CACHE_MASK ~(DSO__DATA_CACHE_SIZE - 1) +/* + * Data about backing storage DSO, comes from PERF_RECORD_MMAP2 meta events + */ +struct dso_id { + u32 maj; + u32 min; + u64 ino; + u64 ino_generation; +}; + struct dso_cache { struct rb_node rb_node; u64 offset; @@ -197,6 +207,7 @@ struct dso { u64 db_id; }; struct nsinfo *nsinfo; + struct dso_id id; refcount_t refcnt; char name[0]; }; @@ -215,9 +226,11 @@ static inline void dso__set_loaded(struct dso *dso) dso->loaded = true; } +struct dso *dso__new_id(const char *name, struct dso_id *id); struct dso *dso__new(const char *name); void dso__delete(struct dso *dso); +int dso__cmp_id(struct dso *a, struct dso *b); void dso__set_short_name(struct dso *dso, const char *name, bool name_allocated); void dso__set_long_name(struct dso *dso, const char *name, bool name_allocated); diff --git a/tools/perf/util/dsos.c b/tools/perf/util/dsos.c index 3ea80d203587a281d6d7626d1555790c08530099..f62e9100091b446d46b37bd4640aed1f1f0d189d 100644 --- a/tools/perf/util/dsos.c +++ b/tools/perf/util/dsos.c @@ -9,6 +9,40 @@ #include #include // filename__read_build_id +static int __dso_id__cmp(struct dso_id *a, struct dso_id *b) +{ + if (a->maj > b->maj) return -1; + if (a->maj < b->maj) return 1; + + if (a->min > b->min) return -1; + if (a->min < b->min) return 1; + + if (a->ino > b->ino) return -1; + if (a->ino < b->ino) return 1; + + 
if (a->ino_generation > b->ino_generation) return -1; + if (a->ino_generation < b->ino_generation) return 1; + + return 0; +} + +static int dso_id__cmp(struct dso_id *a, struct dso_id *b) +{ + /* + * The second is always dso->id, so zeroes if not set, assume passing + * NULL for a means a zeroed id + */ + if (a == NULL) + return 0; + + return __dso_id__cmp(a, b); +} + +int dso__cmp_id(struct dso *a, struct dso *b) +{ + return __dso_id__cmp(&a->id, &b->id); +} + bool __dsos__read_build_ids(struct list_head *head, bool with_hits) { bool have_build_id = false; @@ -34,12 +68,30 @@ bool __dsos__read_build_ids(struct list_head *head, bool with_hits) return have_build_id; } +static int __dso__cmp_long_name(const char *long_name, struct dso_id *id, struct dso *b) +{ + int rc = strcmp(long_name, b->long_name); + return rc ?: dso_id__cmp(id, &b->id); +} + +static int __dso__cmp_short_name(const char *short_name, struct dso_id *id, struct dso *b) +{ + int rc = strcmp(short_name, b->short_name); + return rc ?: dso_id__cmp(id, &b->id); +} + +static int dso__cmp_short_name(struct dso *a, struct dso *b) +{ + return __dso__cmp_short_name(a->short_name, &a->id, b); +} + /* * Find a matching entry and/or link current entry to RB tree. * Either one of the dso or name parameter must be non-NULL or the * function will not work. 
*/ -struct dso *__dsos__findnew_link_by_longname(struct rb_root *root, struct dso *dso, const char *name) +struct dso *__dsos__findnew_link_by_longname_id(struct rb_root *root, struct dso *dso, + const char *name, struct dso_id *id) { struct rb_node **p = &root->rb_node; struct rb_node *parent = NULL; @@ -51,7 +103,7 @@ struct dso *__dsos__findnew_link_by_longname(struct rb_root *root, struct dso *d */ while (*p) { struct dso *this = rb_entry(*p, struct dso, rb_node); - int rc = strcmp(name, this->long_name); + int rc = __dso__cmp_long_name(name, id, this); parent = *p; if (rc == 0) { @@ -67,7 +119,7 @@ struct dso *__dsos__findnew_link_by_longname(struct rb_root *root, struct dso *d * In this case, the short name should be different. * Comparing the short names to differentiate the DSOs. */ - rc = strcmp(dso->short_name, this->short_name); + rc = dso__cmp_short_name(dso, this); if (rc == 0) { pr_err("Duplicated dso name: %s\n", name); return NULL; @@ -90,7 +142,7 @@ struct dso *__dsos__findnew_link_by_longname(struct rb_root *root, struct dso *d void __dsos__add(struct dsos *dsos, struct dso *dso) { list_add_tail(&dso->node, &dsos->head); - __dsos__findnew_link_by_longname(&dsos->root, dso, NULL); + __dsos__findnew_link_by_longname_id(&dsos->root, dso, NULL, &dso->id); /* * It is now in the linked list, grab a reference, then garbage collect * this when needing memory, by looking at LRU dso instances in the @@ -121,17 +173,27 @@ void dsos__add(struct dsos *dsos, struct dso *dso) up_write(&dsos->lock); } -struct dso *__dsos__find(struct dsos *dsos, const char *name, bool cmp_short) +static struct dso *__dsos__findnew_by_longname_id(struct rb_root *root, const char *name, struct dso_id *id) +{ + return __dsos__findnew_link_by_longname_id(root, NULL, name, id); +} + +static struct dso *__dsos__find_id(struct dsos *dsos, const char *name, struct dso_id *id, bool cmp_short) { struct dso *pos; if (cmp_short) { list_for_each_entry(pos, &dsos->head, node) - if 
(strcmp(pos->short_name, name) == 0) + if (__dso__cmp_short_name(name, id, pos) == 0) return pos; return NULL; } - return __dsos__findnew_by_longname(&dsos->root, name); + return __dsos__findnew_by_longname_id(&dsos->root, name, id); +} + +struct dso *__dsos__find(struct dsos *dsos, const char *name, bool cmp_short) +{ + return __dsos__find_id(dsos, name, NULL, cmp_short); } struct dso *dsos__find(struct dsos *dsos, const char *name, bool cmp_short) @@ -175,9 +237,9 @@ static void dso__set_basename(struct dso *dso) dso__set_short_name(dso, base, true); } -struct dso *__dsos__addnew(struct dsos *dsos, const char *name) +static struct dso *__dsos__addnew_id(struct dsos *dsos, const char *name, struct dso_id *id) { - struct dso *dso = dso__new(name); + struct dso *dso = dso__new_id(name, id); if (dso != NULL) { __dsos__add(dsos, dso); @@ -188,18 +250,22 @@ struct dso *__dsos__addnew(struct dsos *dsos, const char *name) return dso; } -struct dso *__dsos__findnew(struct dsos *dsos, const char *name) +struct dso *__dsos__addnew(struct dsos *dsos, const char *name) { - struct dso *dso = __dsos__find(dsos, name, false); + return __dsos__addnew_id(dsos, name, NULL); +} - return dso ? dso : __dsos__addnew(dsos, name); +static struct dso *__dsos__findnew_id(struct dsos *dsos, const char *name, struct dso_id *id) +{ + struct dso *dso = __dsos__find_id(dsos, name, id, false); + return dso ? 
dso : __dsos__addnew_id(dsos, name, id); } -struct dso *dsos__findnew(struct dsos *dsos, const char *name) +struct dso *dsos__findnew_id(struct dsos *dsos, const char *name, struct dso_id *id) { struct dso *dso; down_write(&dsos->lock); - dso = dso__get(__dsos__findnew(dsos, name)); + dso = dso__get(__dsos__findnew_id(dsos, name, id)); up_write(&dsos->lock); return dso; } diff --git a/tools/perf/util/dsos.h b/tools/perf/util/dsos.h index 32f1fbee0feb22841088f6986fd5c7302f17c784..a9b552af1d34688f75dc190d0c00550d2130cc5f 100644 --- a/tools/perf/util/dsos.h +++ b/tools/perf/util/dsos.h @@ -9,6 +9,7 @@ #include "rwsem.h" struct dso; +struct dso_id; /* * DSOs are put into both a list for fast iteration and rbtree for fast @@ -25,15 +26,11 @@ void dsos__add(struct dsos *dsos, struct dso *dso); struct dso *__dsos__addnew(struct dsos *dsos, const char *name); struct dso *__dsos__find(struct dsos *dsos, const char *name, bool cmp_short); struct dso *dsos__find(struct dsos *dsos, const char *name, bool cmp_short); -struct dso *__dsos__findnew(struct dsos *dsos, const char *name); -struct dso *dsos__findnew(struct dsos *dsos, const char *name); -struct dso *__dsos__findnew_link_by_longname(struct rb_root *root, struct dso *dso, const char *name); - -static inline struct dso *__dsos__findnew_by_longname(struct rb_root *root, const char *name) -{ - return __dsos__findnew_link_by_longname(root, NULL, name); -} +struct dso *dsos__findnew_id(struct dsos *dsos, const char *name, struct dso_id *id); + +struct dso *__dsos__findnew_link_by_longname_id(struct rb_root *root, struct dso *dso, + const char *name, struct dso_id *id); bool __dsos__read_build_ids(struct list_head *head, bool with_hits); diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c index ef64e197bc8df475d9c8ecc0e438084bd121d701..9a2e9c8a1ea09037c4d88a2e6a482d267b749103 100644 --- a/tools/perf/util/env.c +++ b/tools/perf/util/env.c @@ -2,6 +2,7 @@ #include "cpumap.h" #include "debug.h" #include "env.h" +#include 
"util/header.h" #include #include #include "bpf-event.h" @@ -10,6 +11,7 @@ #include #include #include +#include "strbuf.h" struct perf_env perf_env; @@ -167,7 +169,7 @@ static void perf_env__purge_bpf(struct perf_env *env) void perf_env__exit(struct perf_env *env) { - int i; + int i, j; perf_env__purge_bpf(env); zfree(&env->hostname); @@ -183,6 +185,9 @@ void perf_env__exit(struct perf_env *env) zfree(&env->sibling_threads); zfree(&env->pmu_mappings); zfree(&env->cpu); + for (i = 0; i < env->nr_cpu_pmu_caps; i++) + zfree(&env->cpu_pmu_caps[i]); + zfree(&env->numa_map); for (i = 0; i < env->nr_numa_nodes; i++) perf_cpu_map__put(env->numa_nodes[i].map); @@ -195,6 +200,20 @@ void perf_env__exit(struct perf_env *env) for (i = 0; i < env->nr_memory_nodes; i++) zfree(&env->memory_nodes[i].set); zfree(&env->memory_nodes); + + for (i = 0; i < env->nr_hybrid_nodes; i++) { + zfree(&env->hybrid_nodes[i].pmu_name); + zfree(&env->hybrid_nodes[i].cpus); + } + zfree(&env->hybrid_nodes); + + for (i = 0; i < env->nr_pmus_with_caps; i++) { + for (j = 0; j < env->pmu_caps[i].nr_caps; j++) + zfree(&env->pmu_caps[i].caps[j]); + zfree(&env->pmu_caps[i].caps); + zfree(&env->pmu_caps[i].pmu_name); + } + zfree(&env->pmu_caps); } void perf_env__init(struct perf_env *env) @@ -260,6 +279,60 @@ int perf_env__read_cpu_topology_map(struct perf_env *env) return 0; } +int perf_env__read_pmu_mappings(struct perf_env *env) +{ + struct perf_pmu *pmu = NULL; + u32 pmu_num = 0; + struct strbuf sb; + + while ((pmu = perf_pmu__scan(pmu))) { + if (!pmu->name) + continue; + pmu_num++; + } + if (!pmu_num) { + pr_debug("pmu mappings not available\n"); + return -ENOENT; + } + env->nr_pmu_mappings = pmu_num; + + if (strbuf_init(&sb, 128 * pmu_num) < 0) + return -ENOMEM; + + while ((pmu = perf_pmu__scan(pmu))) { + if (!pmu->name) + continue; + if (strbuf_addf(&sb, "%u:%s", pmu->type, pmu->name) < 0) + goto error; + /* include a NULL character at the end */ + if (strbuf_add(&sb, "", 1) < 0) + goto error; + } + + 
env->pmu_mappings = strbuf_detach(&sb, NULL); + + return 0; + +error: + strbuf_release(&sb); + return -1; +} + +int perf_env__read_cpuid(struct perf_env *env) +{ + char cpuid[128]; + int err = get_cpuid(cpuid, sizeof(cpuid)); + + if (err) + return err; + + free(env->cpuid); + env->cpuid = strdup(cpuid); + if (env->cpuid == NULL) + return ENOMEM; + return 0; +} + static int perf_env__read_arch(struct perf_env *env) { struct utsname uts; @@ -342,3 +415,128 @@ const char *perf_env__arch(struct perf_env *env) return normalize_arch(arch_name); } + +const char *perf_env__cpuid(struct perf_env *env) +{ + int status; + + if (!env || !env->cpuid) { /* Assume local operation */ + status = perf_env__read_cpuid(env); + if (status) + return NULL; + } + + return env->cpuid; +} + +int perf_env__nr_pmu_mappings(struct perf_env *env) +{ + int status; + + if (!env || !env->nr_pmu_mappings) { /* Assume local operation */ + status = perf_env__read_pmu_mappings(env); + if (status) + return 0; + } + + return env->nr_pmu_mappings; +} + +const char *perf_env__pmu_mappings(struct perf_env *env) +{ + int status; + + if (!env || !env->pmu_mappings) { /* Assume local operation */ + status = perf_env__read_pmu_mappings(env); + if (status) + return NULL; + } + + return env->pmu_mappings; +} + +int perf_env__numa_node(struct perf_env *env, int cpu) +{ + if (!env->nr_numa_map) { + struct numa_node *nn; + int i, nr = 0; + + for (i = 0; i < env->nr_numa_nodes; i++) { + nn = &env->numa_nodes[i]; + nr = max(nr, perf_cpu_map__max(nn->map)); + } + + nr++; + + /* + * We initialize the numa_map array to prepare + * it for missing cpus, which return node -1 + */ + env->numa_map = malloc(nr * sizeof(int)); + if (!env->numa_map) + return -1; + + for (i = 0; i < nr; i++) + env->numa_map[i] = -1; + + env->nr_numa_map = nr; + + for (i = 0; i < env->nr_numa_nodes; i++) { + int tmp, j; + + nn = &env->numa_nodes[i]; + perf_cpu_map__for_each_cpu(j, tmp, nn->map) + env->numa_map[j] = i; + } + } + + return cpu >= 0 
&& cpu < env->nr_numa_map ? env->numa_map[cpu] : -1; +} + +char *perf_env__find_pmu_cap(struct perf_env *env, const char *pmu_name, + const char *cap) +{ + char *cap_eq; + int cap_size; + char **ptr; + int i, j; + + if (!pmu_name || !cap) + return NULL; + + cap_size = strlen(cap); + cap_eq = zalloc(cap_size + 2); + if (!cap_eq) + return NULL; + + memcpy(cap_eq, cap, cap_size); + cap_eq[cap_size] = '='; + + if (!strcmp(pmu_name, "cpu")) { + for (i = 0; i < env->nr_cpu_pmu_caps; i++) { + if (!strncmp(env->cpu_pmu_caps[i], cap_eq, cap_size + 1)) { + free(cap_eq); + return &env->cpu_pmu_caps[i][cap_size + 1]; + } + } + goto out; + } + + for (i = 0; i < env->nr_pmus_with_caps; i++) { + if (strcmp(env->pmu_caps[i].pmu_name, pmu_name)) + continue; + + ptr = env->pmu_caps[i].caps; + + for (j = 0; j < env->pmu_caps[i].nr_caps; j++) { + if (!strncmp(ptr[j], cap_eq, cap_size + 1)) { + free(cap_eq); + return &ptr[j][cap_size + 1]; + } + } + } + +out: + free(cap_eq); + return NULL; +} diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h index 37028215d4a535e9eb5333dc74671d93d35266e7..3bc7ee843df4d1f8266f92d10a742ff211b45c61 100644 --- a/tools/perf/util/env.h +++ b/tools/perf/util/env.h @@ -37,6 +37,18 @@ struct memory_node { unsigned long *set; }; +struct hybrid_node { + char *pmu_name; + char *cpus; +}; + +struct pmu_caps { + int nr_caps; + unsigned int max_branches; + char **caps; + char *pmu_name; +}; + struct perf_env { char *hostname; char *os_release; @@ -48,6 +60,7 @@ struct perf_env { char *cpuid; unsigned long long total_mem; unsigned int msr_pmu_type; + unsigned int max_branches; int nr_cmdline; int nr_sibling_cores; @@ -57,12 +70,16 @@ struct perf_env { int nr_memory_nodes; int nr_pmu_mappings; int nr_groups; + int nr_cpu_pmu_caps; + int nr_hybrid_nodes; + int nr_pmus_with_caps; char *cmdline; const char **cmdline_argv; char *sibling_cores; char *sibling_dies; char *sibling_threads; char *pmu_mappings; + char **cpu_pmu_caps; struct cpu_topology_map *cpu; struct 
cpu_cache_level *caches; int caches_cnt; @@ -76,6 +93,8 @@ struct perf_env { unsigned long long memory_bsize; u64 clockid_res_ns; + struct hybrid_node *hybrid_nodes; + struct pmu_caps *pmu_caps; /* * bpf_info_lock protects bpf rbtrees. This is needed because the * trees are accessed by different threads in perf-top @@ -87,6 +106,22 @@ struct perf_env { struct rb_root btfs; u32 btfs_cnt; } bpf_progs; + + /* For fast cpu to numa node lookup via perf_env__numa_node */ + int *numa_map; + int nr_numa_map; + + /* For real clock time reference. */ + struct { + u64 tod_ns; + u64 clockid_ns; + int clockid; + /* + * enabled is valid for report mode, and is true if above + * values are set, it's set in process_clock_data + */ + bool enabled; + } clock; }; enum perf_compress_type { @@ -104,11 +139,17 @@ void perf_env__exit(struct perf_env *env); int perf_env__set_cmdline(struct perf_env *env, int argc, const char *argv[]); +int perf_env__read_cpuid(struct perf_env *env); +int perf_env__read_pmu_mappings(struct perf_env *env); +int perf_env__nr_pmu_mappings(struct perf_env *env); +const char *perf_env__pmu_mappings(struct perf_env *env); + int perf_env__read_cpu_topology_map(struct perf_env *env); void cpu_cache_level__free(struct cpu_cache_level *cache); const char *perf_env__arch(struct perf_env *env); +const char *perf_env__cpuid(struct perf_env *env); const char *perf_env__raw_arch(struct perf_env *env); int perf_env__nr_cpus_avail(struct perf_env *env); @@ -119,4 +160,8 @@ struct bpf_prog_info_node *perf_env__find_bpf_prog_info(struct perf_env *env, __u32 prog_id); bool perf_env__insert_btf(struct perf_env *env, struct btf_node *btf_node); struct btf_node *perf_env__find_btf(struct perf_env *env, __u32 btf_id); + +int perf_env__numa_node(struct perf_env *env, int cpu); +char *perf_env__find_pmu_cap(struct perf_env *env, const char *pmu_name, + const char *cap); #endif /* __PERF_ENV_H */ diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c index 
bfaa9afdb8b4cb9d81a37c5c366e34f3d6857509..49b809419a121d55fe0ea7fe6e190c0713a16276 100644 --- a/tools/perf/util/event.c +++ b/tools/perf/util/event.c @@ -54,6 +54,7 @@ static const char *perf_event__names[] = { [PERF_RECORD_NAMESPACES] = "NAMESPACES", [PERF_RECORD_KSYMBOL] = "KSYMBOL", [PERF_RECORD_BPF_EVENT] = "BPF_EVENT", + [PERF_RECORD_CGROUP] = "CGROUP", [PERF_RECORD_HEADER_ATTR] = "ATTR", [PERF_RECORD_HEADER_EVENT_TYPE] = "EVENT_TYPE", [PERF_RECORD_HEADER_TRACING_DATA] = "TRACING_DATA", @@ -180,6 +181,12 @@ size_t perf_event__fprintf_namespaces(union perf_event *event, FILE *fp) return ret; } +size_t perf_event__fprintf_cgroup(union perf_event *event, FILE *fp) +{ + return fprintf(fp, " cgroup: %" PRI_lu64 " %s\n", + event->cgroup.id, event->cgroup.path); +} + int perf_event__process_comm(struct perf_tool *tool __maybe_unused, union perf_event *event, struct perf_sample *sample, @@ -196,6 +203,14 @@ int perf_event__process_namespaces(struct perf_tool *tool __maybe_unused, return machine__process_namespaces_event(machine, event, sample); } +int perf_event__process_cgroup(struct perf_tool *tool __maybe_unused, + union perf_event *event, + struct perf_sample *sample, + struct machine *machine) +{ + return machine__process_cgroup_event(machine, event, sample); +} + int perf_event__process_lost(struct perf_tool *tool __maybe_unused, union perf_event *event, struct perf_sample *sample, @@ -417,6 +432,9 @@ size_t perf_event__fprintf(union perf_event *event, FILE *fp) case PERF_RECORD_NAMESPACES: ret += perf_event__fprintf_namespaces(event, fp); break; + case PERF_RECORD_CGROUP: + ret += perf_event__fprintf_cgroup(event, fp); + break; case PERF_RECORD_MMAP2: ret += perf_event__fprintf_mmap2(event, fp); break; diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h index 85223159737c1be1554dd6ca8c8d95495d304722..c2fd88a2c09ee19d4d0e9dacec900ab9ea7b0c20 100644 --- a/tools/perf/util/event.h +++ b/tools/perf/util/event.h @@ -135,10 +135,14 @@ struct perf_sample { 
u32 raw_size; u64 data_src; u64 phys_addr; + u64 data_page_size; + u64 code_page_size; + u64 cgroup; u32 flags; u16 insn_len; u8 cpumode; u16 misc; + bool no_hw_idx; /* No hw_idx collected in branch_stack */ char insn[MAX_INSN]; void *raw_data; struct ip_callchain *callchain; @@ -321,6 +325,10 @@ int perf_event__process_namespaces(struct perf_tool *tool, union perf_event *event, struct perf_sample *sample, struct machine *machine); +int perf_event__process_cgroup(struct perf_tool *tool, + union perf_event *event, + struct perf_sample *sample, + struct machine *machine); int perf_event__process_mmap(struct perf_tool *tool, union perf_event *event, struct perf_sample *sample, @@ -376,6 +384,7 @@ size_t perf_event__fprintf_switch(union perf_event *event, FILE *fp); size_t perf_event__fprintf_thread_map(union perf_event *event, FILE *fp); size_t perf_event__fprintf_cpu_map(union perf_event *event, FILE *fp); size_t perf_event__fprintf_namespaces(union perf_event *event, FILE *fp); +size_t perf_event__fprintf_cgroup(union perf_event *event, FILE *fp); size_t perf_event__fprintf_ksymbol(union perf_event *event, FILE *fp); size_t perf_event__fprintf_bpf(union perf_event *event, FILE *fp); size_t perf_event__fprintf(union perf_event *event, FILE *fp); diff --git a/tools/perf/util/evlist-hybrid.c b/tools/perf/util/evlist-hybrid.c new file mode 100644 index 0000000000000000000000000000000000000000..e11998526f2e7508a3d5219b19fdf0e5b7eaa004 --- /dev/null +++ b/tools/perf/util/evlist-hybrid.c @@ -0,0 +1,41 @@ +// SPDX-License-Identifier: GPL-2.0-only +#include +#include +#include "cpumap.h" +#include "evlist.h" +#include "evsel.h" +#include "../perf.h" +#include "util/pmu-hybrid.h" +#include "util/evlist-hybrid.h" +#include +#include +#include +#include +#include +#include +#include + +int evlist__add_default_hybrid(struct evlist *evlist, bool precise) +{ + struct evsel *evsel; + struct perf_pmu *pmu; + __u64 config; + struct perf_cpu_map *cpus; + + 
perf_pmu__for_each_hybrid_pmu(pmu) { + config = PERF_COUNT_HW_CPU_CYCLES | + ((__u64)pmu->type << PERF_PMU_TYPE_SHIFT); + evsel = evsel__new_cycles(precise, PERF_TYPE_HARDWARE, + config); + if (!evsel) + return -ENOMEM; + + cpus = perf_cpu_map__get(pmu->cpus); + evsel->core.cpus = cpus; + evsel->core.own_cpus = perf_cpu_map__get(cpus); + evsel->pmu_name = strdup(pmu->name); + evlist__add(evlist, evsel); + } + + return 0; +} diff --git a/tools/perf/util/evlist-hybrid.h b/tools/perf/util/evlist-hybrid.h new file mode 100644 index 0000000000000000000000000000000000000000..e25861649d8f84cb779a8ab9ef08f7a27f68a304 --- /dev/null +++ b/tools/perf/util/evlist-hybrid.h @@ -0,0 +1,12 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __PERF_EVLIST_HYBRID_H +#define __PERF_EVLIST_HYBRID_H + +#include +#include +#include "evlist.h" +#include + +int evlist__add_default_hybrid(struct evlist *evlist, bool precise); + +#endif /* __PERF_EVLIST_HYBRID_H */ diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c index 505b890ac85cc463999e165ae58d0e92d3e11346..84c252d28eb63bf02cb5df3b10db4d201500d2f4 100644 --- a/tools/perf/util/evlist.c +++ b/tools/perf/util/evlist.c @@ -21,6 +21,9 @@ #include "../perf.h" #include "asm/bug.h" #include "bpf-event.h" +#include "util/string2.h" +#include "util/perf_api_probe.h" +#include "util/evlist-hybrid.h" #include #include #include @@ -42,6 +45,7 @@ #include #include #include +#include #include @@ -76,7 +80,7 @@ struct evlist *perf_evlist__new_default(void) { struct evlist *evlist = evlist__new(); - if (evlist && perf_evlist__add_default(evlist)) { + if (evlist && evlist__add_default(evlist)) { evlist__delete(evlist); evlist = NULL; } @@ -88,7 +92,7 @@ struct evlist *perf_evlist__new_dummy(void) { struct evlist *evlist = evlist__new(); - if (evlist && perf_evlist__add_dummy(evlist)) { + if (evlist && evlist__add_dummy(evlist)) { evlist__delete(evlist); evlist = NULL; } @@ -208,10 +212,12 @@ void perf_evlist__set_leader(struct evlist 
*evlist) } } -int __perf_evlist__add_default(struct evlist *evlist, bool precise) +int __evlist__add_default(struct evlist *evlist, bool precise) { - struct evsel *evsel = perf_evsel__new_cycles(precise); + struct evsel *evsel; + evsel = evsel__new_cycles(precise, PERF_TYPE_HARDWARE, + PERF_COUNT_HW_CPU_CYCLES); if (evsel == NULL) return -ENOMEM; @@ -219,14 +225,14 @@ int __perf_evlist__add_default(struct evlist *evlist, bool precise) return 0; } -int perf_evlist__add_dummy(struct evlist *evlist) +int evlist__add_dummy(struct evlist *evlist) { struct perf_event_attr attr = { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_DUMMY, .size = sizeof(attr), /* to capture ABI version */ }; - struct evsel *evsel = perf_evsel__new_idx(&attr, evlist->core.nr_entries); + struct evsel *evsel = evsel__new_idx(&attr, evlist->core.nr_entries); if (evsel == NULL) return -ENOMEM; @@ -235,15 +241,14 @@ int perf_evlist__add_dummy(struct evlist *evlist) return 0; } -static int evlist__add_attrs(struct evlist *evlist, - struct perf_event_attr *attrs, size_t nr_attrs) +static int evlist__add_attrs(struct evlist *evlist, struct perf_event_attr *attrs, size_t nr_attrs) { struct evsel *evsel, *n; LIST_HEAD(head); size_t i; for (i = 0; i < nr_attrs; i++) { - evsel = perf_evsel__new_idx(attrs + i, evlist->core.nr_entries + i); + evsel = evsel__new_idx(attrs + i, evlist->core.nr_entries + i); if (evsel == NULL) goto out_delete_partial_list; list_add_tail(&evsel->core.node, &head); @@ -259,8 +264,7 @@ static int evlist__add_attrs(struct evlist *evlist, return -1; } -int __perf_evlist__add_default_attrs(struct evlist *evlist, - struct perf_event_attr *attrs, size_t nr_attrs) +int __evlist__add_default_attrs(struct evlist *evlist, struct perf_event_attr *attrs, size_t nr_attrs) { size_t i; @@ -299,10 +303,9 @@ perf_evlist__find_tracepoint_by_name(struct evlist *evlist, return NULL; } -int perf_evlist__add_newtp(struct evlist *evlist, - const char *sys, const char *name, void *handler) +int 
evlist__add_newtp(struct evlist *evlist, const char *sys, const char *name, void *handler) { - struct evsel *evsel = perf_evsel__newtp(sys, name); + struct evsel *evsel = evsel__newtp(sys, name); if (IS_ERR(evsel)) return -1; @@ -321,6 +324,38 @@ static int perf_evlist__nr_threads(struct evlist *evlist, return perf_thread_map__nr(evlist->core.threads); } +void evlist__cpu_iter_start(struct evlist *evlist) +{ + struct evsel *pos; + + /* + * Reset the per evsel cpu_iter. This is needed because + * each evsel's cpumap may have a different index space, + * and some operations need the index to modify + * the FD xyarray (e.g. open, close) + */ + evlist__for_each_entry(evlist, pos) + pos->cpu_iter = 0; +} + +bool evsel__cpu_iter_skip_no_inc(struct evsel *ev, int cpu) +{ + if (ev->cpu_iter >= ev->core.cpus->nr) + return true; + if (cpu >= 0 && ev->core.cpus->map[ev->cpu_iter] != cpu) + return true; + return false; +} + +bool evsel__cpu_iter_skip(struct evsel *ev, int cpu) +{ + if (!evsel__cpu_iter_skip_no_inc(ev, cpu)) { + ev->cpu_iter++; + return false; + } + return true; +} + void evlist__disable(struct evlist *evlist) { struct evsel *pos; @@ -409,7 +444,7 @@ static void perf_evlist__munmap_filtered(struct fdarray *fda, int fd, struct mmap *map = fda->priv[fd].ptr; if (map) - perf_mmap__put(map); + perf_mmap__put(&map->core); } int evlist__filter_pollfd(struct evlist *evlist, short revents_and_mask) @@ -577,11 +612,11 @@ static void evlist__munmap_nofree(struct evlist *evlist) if (evlist->mmap) for (i = 0; i < evlist->core.nr_mmaps; i++) - perf_mmap__munmap(&evlist->mmap[i]); + perf_mmap__munmap(&evlist->mmap[i].core); if (evlist->overwrite_mmap) for (i = 0; i < evlist->core.nr_mmaps; i++) - perf_mmap__munmap(&evlist->overwrite_mmap[i]); + perf_mmap__munmap(&evlist->overwrite_mmap[i].core); } void evlist__munmap(struct evlist *evlist) @@ -591,6 +626,13 @@ void evlist__munmap(struct evlist *evlist) zfree(&evlist->overwrite_mmap); } +static void perf_mmap__unmap_cb(struct 
perf_mmap *map) +{ + struct mmap *m = container_of(map, struct mmap, core); + + mmap__munmap(m); +} + static struct mmap *evlist__alloc_mmap(struct evlist *evlist, bool overwrite) { @@ -605,8 +647,6 @@ static struct mmap *evlist__alloc_mmap(struct evlist *evlist, return NULL; for (i = 0; i < evlist->core.nr_mmaps; i++) { - map[i].core.fd = -1; - map[i].core.overwrite = overwrite; /* * When the perf_mmap() call is made we grab one refcount, plus * one extra to let perf_mmap__consume() get the last @@ -616,8 +656,9 @@ static struct mmap *evlist__alloc_mmap(struct evlist *evlist, * Each PERF_EVENT_IOC_SET_OUTPUT points to this mmap and * thus does perf_mmap__get() on it. */ - refcount_set(&map[i].core.refcnt, 0); + perf_mmap__init(&map[i].core, overwrite, perf_mmap__unmap_cb); } + return map; } @@ -644,7 +685,7 @@ static int evlist__mmap_per_evsel(struct evlist *evlist, int idx, int fd; int cpu; - mp->prot = PROT_READ | PROT_WRITE; + mp->core.prot = PROT_READ | PROT_WRITE; if (evsel->core.attr.write_backward) { output = _output_overwrite; maps = evlist->overwrite_mmap; @@ -657,7 +698,7 @@ static int evlist__mmap_per_evsel(struct evlist *evlist, int idx, if (evlist->bkw_mmap_state == BKW_MMAP_NOTREADY) perf_evlist__toggle_bkw_mmap(evlist, BKW_MMAP_RUNNING); } - mp->prot &= ~PROT_WRITE; + mp->core.prot &= ~PROT_WRITE; } if (evsel->core.system_wide && thread) @@ -672,13 +713,13 @@ static int evlist__mmap_per_evsel(struct evlist *evlist, int idx, if (*output == -1) { *output = fd; - if (perf_mmap__mmap(&maps[idx], mp, *output, evlist_cpu) < 0) + if (mmap__mmap(&maps[idx], mp, *output, evlist_cpu) < 0) return -1; } else { if (ioctl(fd, PERF_EVENT_IOC_SET_OUTPUT, *output) != 0) return -1; - perf_mmap__get(&maps[idx]); + perf_mmap__get(&maps[idx].core); } revent = perf_evlist__should_poll(evlist, evsel) ? 
POLLIN : 0; @@ -692,7 +733,7 @@ static int evlist__mmap_per_evsel(struct evlist *evlist, int idx, */ if (!evsel->core.system_wide && perf_evlist__add_pollfd(&evlist->core, fd, &maps[idx], revent) < 0) { - perf_mmap__put(&maps[idx]); + perf_mmap__put(&maps[idx].core); return -1; } @@ -898,8 +939,12 @@ int evlist__mmap_ex(struct evlist *evlist, unsigned int pages, * Its value is decided by evsel's write_backward. * So &mp should not be passed through const pointer. */ - struct mmap_params mp = { .nr_cblocks = nr_cblocks, .affinity = affinity, .flush = flush, - .comp_level = comp_level }; + struct mmap_params mp = { + .nr_cblocks = nr_cblocks, + .affinity = affinity, + .flush = flush, + .comp_level = comp_level + }; if (!evlist->mmap) evlist->mmap = evlist__alloc_mmap(evlist, false); @@ -911,7 +956,7 @@ int evlist__mmap_ex(struct evlist *evlist, unsigned int pages, evlist->core.mmap_len = evlist__mmap_size(pages); pr_debug("mmap size %zuB\n", evlist->core.mmap_len); - mp.mask = evlist->core.mmap_len - page_size - 1; + mp.core.mask = evlist->core.mmap_len - page_size - 1; auxtrace_mmap_params__init(&mp.auxtrace_mp, evlist->core.mmap_len, auxtrace_pages, auxtrace_overwrite); @@ -993,7 +1038,7 @@ void __perf_evlist__set_sample_bit(struct evlist *evlist, struct evsel *evsel; evlist__for_each_entry(evlist, evsel) - __perf_evsel__set_sample_bit(evsel, bit); + __evsel__set_sample_bit(evsel, bit); } void __perf_evlist__reset_sample_bit(struct evlist *evlist, @@ -1002,7 +1047,7 @@ void __perf_evlist__reset_sample_bit(struct evlist *evlist, struct evsel *evsel; evlist__for_each_entry(evlist, evsel) - __perf_evsel__reset_sample_bit(evsel, bit); + __evsel__reset_sample_bit(evsel, bit); } int perf_evlist__apply_filters(struct evlist *evlist, struct evsel **err_evsel) @@ -1033,6 +1078,9 @@ int perf_evlist__set_tp_filter(struct evlist *evlist, const char *filter) struct evsel *evsel; int err = 0; + if (filter == NULL) + return -1; + evlist__for_each_entry(evlist, evsel) { if 
(evsel->core.attr.type != PERF_TYPE_TRACEPOINT) continue; @@ -1045,16 +1093,35 @@ int perf_evlist__set_tp_filter(struct evlist *evlist, const char *filter) return err; } -int perf_evlist__set_tp_filter_pids(struct evlist *evlist, size_t npids, pid_t *pids) +int perf_evlist__append_tp_filter(struct evlist *evlist, const char *filter) +{ + struct evsel *evsel; + int err = 0; + + if (filter == NULL) + return -1; + + evlist__for_each_entry(evlist, evsel) { + if (evsel->core.attr.type != PERF_TYPE_TRACEPOINT) + continue; + + err = perf_evsel__append_tp_filter(evsel, filter); + if (err) + break; + } + + return err; +} + +char *asprintf__tp_filter_pids(size_t npids, pid_t *pids) { char *filter; - int ret = -1; size_t i; for (i = 0; i < npids; ++i) { if (i == 0) { if (asprintf(&filter, "common_pid != %d", pids[i]) < 0) - return -1; + return NULL; } else { char *tmp; @@ -1066,8 +1133,17 @@ int perf_evlist__set_tp_filter_pids(struct evlist *evlist, size_t npids, pid_t * } } - ret = perf_evlist__set_tp_filter(evlist, filter); + return filter; out_free: + free(filter); + return NULL; +} + +int perf_evlist__set_tp_filter_pids(struct evlist *evlist, size_t npids, pid_t *pids) +{ + char *filter = asprintf__tp_filter_pids(npids, pids); + int ret = perf_evlist__set_tp_filter(evlist, filter); + free(filter); return ret; } @@ -1646,7 +1722,8 @@ void perf_evlist__force_leader(struct evlist *evlist) } struct evsel *perf_evlist__reset_weak_group(struct evlist *evsel_list, - struct evsel *evsel) + struct evsel *evsel, + bool close) { struct evsel *c2, *leader; bool is_open = true; @@ -1663,141 +1740,16 @@ struct evsel *perf_evlist__reset_weak_group(struct evlist *evsel_list, if (c2 == evsel) is_open = false; if (c2->leader == leader) { - if (is_open) + if (is_open && close) perf_evsel__close(&c2->core); c2->leader = c2; c2->core.nr_members = 0; + /* + * Set this for all former members of the group + * to indicate they get reopened. 
+ */ + c2->reset_group = true; } } return leader; } - -int perf_evlist__add_sb_event(struct evlist **evlist, - struct perf_event_attr *attr, - perf_evsel__sb_cb_t cb, - void *data) -{ - struct evsel *evsel; - bool new_evlist = (*evlist) == NULL; - - if (*evlist == NULL) - *evlist = evlist__new(); - if (*evlist == NULL) - return -1; - - if (!attr->sample_id_all) { - pr_warning("enabling sample_id_all for all side band events\n"); - attr->sample_id_all = 1; - } - - evsel = perf_evsel__new_idx(attr, (*evlist)->core.nr_entries); - if (!evsel) - goto out_err; - - evsel->side_band.cb = cb; - evsel->side_band.data = data; - evlist__add(*evlist, evsel); - return 0; - -out_err: - if (new_evlist) { - evlist__delete(*evlist); - *evlist = NULL; - } - return -1; -} - -static void *perf_evlist__poll_thread(void *arg) -{ - struct evlist *evlist = arg; - bool draining = false; - int i, done = 0; - /* - * In order to read symbols from other namespaces perf to needs to call - * setns(2). This isn't permitted if the struct_fs has multiple users. - * unshare(2) the fs so that we may continue to setns into namespaces - * that we're observing when, for instance, reading the build-ids at - * the end of a 'perf record' session. 
- */ - unshare(CLONE_FS); - - while (!done) { - bool got_data = false; - - if (evlist->thread.done) - draining = true; - - if (!draining) - evlist__poll(evlist, 1000); - - for (i = 0; i < evlist->core.nr_mmaps; i++) { - struct mmap *map = &evlist->mmap[i]; - union perf_event *event; - - if (perf_mmap__read_init(map)) - continue; - while ((event = perf_mmap__read_event(map)) != NULL) { - struct evsel *evsel = perf_evlist__event2evsel(evlist, event); - - if (evsel && evsel->side_band.cb) - evsel->side_band.cb(event, evsel->side_band.data); - else - pr_warning("cannot locate proper evsel for the side band event\n"); - - perf_mmap__consume(map); - got_data = true; - } - perf_mmap__read_done(map); - } - - if (draining && !got_data) - break; - } - return NULL; -} - -int perf_evlist__start_sb_thread(struct evlist *evlist, - struct target *target) -{ - struct evsel *counter; - - if (!evlist) - return 0; - - if (perf_evlist__create_maps(evlist, target)) - goto out_delete_evlist; - - evlist__for_each_entry(evlist, counter) { - if (evsel__open(counter, evlist->core.cpus, - evlist->core.threads) < 0) - goto out_delete_evlist; - } - - if (evlist__mmap(evlist, UINT_MAX)) - goto out_delete_evlist; - - evlist__for_each_entry(evlist, counter) { - if (evsel__enable(counter)) - goto out_delete_evlist; - } - - evlist->thread.done = 0; - if (pthread_create(&evlist->thread.th, NULL, perf_evlist__poll_thread, evlist)) - goto out_delete_evlist; - - return 0; - -out_delete_evlist: - evlist__delete(evlist); - evlist = NULL; - return -1; -} - -void perf_evlist__stop_sb_thread(struct evlist *evlist) -{ - if (!evlist) - return; - evlist->thread.done = 1; - pthread_join(evlist->thread.th, NULL); - evlist__delete(evlist); -} diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h index d89e72bd1c814dcb74ab35c9dbbd8ab8060a15e3..29c550216a626cda791bbd8775280822e31114c5 100644 --- a/tools/perf/util/evlist.h +++ b/tools/perf/util/evlist.h @@ -92,31 +92,31 @@ void evlist__delete(struct 
evlist *evlist); void evlist__add(struct evlist *evlist, struct evsel *entry); void evlist__remove(struct evlist *evlist, struct evsel *evsel); -int __perf_evlist__add_default(struct evlist *evlist, bool precise); +int __evlist__add_default(struct evlist *evlist, bool precise); -static inline int perf_evlist__add_default(struct evlist *evlist) +static inline int evlist__add_default(struct evlist *evlist) { - return __perf_evlist__add_default(evlist, true); + return __evlist__add_default(evlist, true); } -int __perf_evlist__add_default_attrs(struct evlist *evlist, +int __evlist__add_default_attrs(struct evlist *evlist, struct perf_event_attr *attrs, size_t nr_attrs); -#define perf_evlist__add_default_attrs(evlist, array) \ - __perf_evlist__add_default_attrs(evlist, array, ARRAY_SIZE(array)) +#define evlist__add_default_attrs(evlist, array) \ + __evlist__add_default_attrs(evlist, array, ARRAY_SIZE(array)) -int perf_evlist__add_dummy(struct evlist *evlist); +int evlist__add_dummy(struct evlist *evlist); -int perf_evlist__add_sb_event(struct evlist **evlist, +int perf_evlist__add_sb_event(struct evlist *evlist, struct perf_event_attr *attr, perf_evsel__sb_cb_t cb, void *data); +void evlist__set_cb(struct evlist *evlist, perf_evsel__sb_cb_t cb, void *data); int perf_evlist__start_sb_thread(struct evlist *evlist, struct target *target); void perf_evlist__stop_sb_thread(struct evlist *evlist); -int perf_evlist__add_newtp(struct evlist *evlist, - const char *sys, const char *name, void *handler); +int evlist__add_newtp(struct evlist *evlist, const char *sys, const char *name, void *handler); void __perf_evlist__set_sample_bit(struct evlist *evlist, enum perf_event_sample_format bit); @@ -133,6 +133,8 @@ int perf_evlist__set_tp_filter(struct evlist *evlist, const char *filter); int perf_evlist__set_tp_filter_pid(struct evlist *evlist, pid_t pid); int perf_evlist__set_tp_filter_pids(struct evlist *evlist, size_t npids, pid_t *pids); +int perf_evlist__append_tp_filter(struct 
evlist *evlist, const char *filter); + struct evsel * perf_evlist__find_tracepoint_by_id(struct evlist *evlist, int id); @@ -161,10 +163,6 @@ void evlist__close(struct evlist *evlist); struct callchain_param; void perf_evlist__set_id_pos(struct evlist *evlist); -bool perf_can_sample_identifier(void); -bool perf_can_record_switch_events(void); -bool perf_can_record_cpu_wide(void); -bool perf_can_aux_sample(void); void perf_evlist__config(struct evlist *evlist, struct record_opts *opts, struct callchain_param *callchain); int record_opts__config(struct record_opts *opts); @@ -322,9 +320,17 @@ void perf_evlist__to_front(struct evlist *evlist, #define evlist__for_each_entry_safe(evlist, tmp, evsel) \ __evlist__for_each_entry_safe(&(evlist)->core.entries, tmp, evsel) +#define evlist__for_each_cpu(evlist, index, cpu) \ + evlist__cpu_iter_start(evlist); \ + perf_cpu_map__for_each_cpu (cpu, index, (evlist)->core.all_cpus) + void perf_evlist__set_tracking_event(struct evlist *evlist, struct evsel *tracking_evsel); +void evlist__cpu_iter_start(struct evlist *evlist); +bool evsel__cpu_iter_skip(struct evsel *ev, int cpu); +bool evsel__cpu_iter_skip_no_inc(struct evsel *ev, int cpu); + struct evsel * perf_evlist__find_evsel_by_str(struct evlist *evlist, const char *str); @@ -336,5 +342,6 @@ bool perf_evlist__exclude_kernel(struct evlist *evlist); void perf_evlist__force_leader(struct evlist *evlist); struct evsel *perf_evlist__reset_weak_group(struct evlist *evlist, - struct evsel *evsel); + struct evsel *evsel, + bool close); #endif /* __PERF_EVLIST_H */ diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c index 39912d9b51b3eb368d948489a32857eae70112bc..147855005d3179f2c1a8111f3a6f111e1cef3306 100644 --- a/tools/perf/util/evsel.c +++ b/tools/perf/util/evsel.c @@ -184,7 +184,7 @@ void perf_evsel__calc_id_pos(struct evsel *evsel) evsel->is_pos = __perf_evsel__calc_is_pos(evsel->core.attr.sample_type); } -void __perf_evsel__set_sample_bit(struct evsel *evsel, +void 
__evsel__set_sample_bit(struct evsel *evsel, enum perf_event_sample_format bit) { if (!(evsel->core.attr.sample_type & bit)) { @@ -194,7 +194,7 @@ void __perf_evsel__set_sample_bit(struct evsel *evsel, } } -void __perf_evsel__reset_sample_bit(struct evsel *evsel, +void __evsel__reset_sample_bit(struct evsel *evsel, enum perf_event_sample_format bit) { if (evsel->core.attr.sample_type & bit) { @@ -204,14 +204,14 @@ void __perf_evsel__reset_sample_bit(struct evsel *evsel, } } -void perf_evsel__set_sample_id(struct evsel *evsel, +void evsel__set_sample_id(struct evsel *evsel, bool can_sample_identifier) { if (can_sample_identifier) { - perf_evsel__reset_sample_bit(evsel, ID); - perf_evsel__set_sample_bit(evsel, IDENTIFIER); + evsel__reset_sample_bit(evsel, ID); + evsel__set_sample_bit(evsel, IDENTIFIER); } else { - perf_evsel__set_sample_bit(evsel, ID); + evsel__set_sample_bit(evsel, ID); } evsel->core.attr.read_format |= PERF_FORMAT_ID; } @@ -259,7 +259,7 @@ void evsel__init(struct evsel *evsel, evsel->pmu_name = NULL; } -struct evsel *perf_evsel__new_idx(struct perf_event_attr *attr, int idx) +struct evsel *evsel__new_idx(struct perf_event_attr *attr, int idx) { struct evsel *evsel = zalloc(perf_evsel__object.size); @@ -292,29 +292,27 @@ static bool perf_event_can_profile_kernel(void) return perf_event_paranoid_check(1); } -struct evsel *perf_evsel__new_cycles(bool precise) +struct evsel *evsel__new_cycles(bool precise __maybe_unused, __u32 type, __u64 config) { struct perf_event_attr attr = { - .type = PERF_TYPE_HARDWARE, - .config = PERF_COUNT_HW_CPU_CYCLES, + .type = type, + .config = config, .exclude_kernel = !perf_event_can_profile_kernel(), }; struct evsel *evsel; event_attr_init(&attr); - if (!precise) - goto new_event; - /* * Now let the usual logic to set up the perf_event_attr defaults * to kick in when we return and before perf_evsel__open() is called. 
*/ -new_event: evsel = evsel__new(&attr); if (evsel == NULL) goto out; + arch_evsel__fixup_new_cycles(&evsel->core.attr); + evsel->precise_max = true; /* use asprintf() because free(evsel) assumes name is allocated */ @@ -334,7 +332,7 @@ struct evsel *perf_evsel__new_cycles(bool precise) /* * Returns pointer with encoded error via interface. */ -struct evsel *perf_evsel__newtp_idx(const char *sys, const char *name, int idx) +struct evsel *evsel__newtp_idx(const char *sys, const char *name, int idx) { struct evsel *evsel = zalloc(perf_evsel__object.size); int err = -ENOMEM; @@ -693,7 +691,7 @@ static void __perf_evsel__config_callchain(struct evsel *evsel, bool function = perf_evsel__is_function_event(evsel); struct perf_event_attr *attr = &evsel->core.attr; - perf_evsel__set_sample_bit(evsel, CALLCHAIN); + evsel__set_sample_bit(evsel, CALLCHAIN); attr->sample_max_stack = param->max_stack; @@ -708,11 +706,12 @@ static void __perf_evsel__config_callchain(struct evsel *evsel, "to get user callchain information. " "Falling back to framepointers.\n"); } else { - perf_evsel__set_sample_bit(evsel, BRANCH_STACK); + evsel__set_sample_bit(evsel, BRANCH_STACK); attr->branch_sample_type = PERF_SAMPLE_BRANCH_USER | PERF_SAMPLE_BRANCH_CALL_STACK | PERF_SAMPLE_BRANCH_NO_CYCLES | - PERF_SAMPLE_BRANCH_NO_FLAGS; + PERF_SAMPLE_BRANCH_NO_FLAGS | + PERF_SAMPLE_BRANCH_HW_INDEX; } } else pr_warning("Cannot use LBR callstack with branch stack. 
" @@ -721,8 +720,8 @@ static void __perf_evsel__config_callchain(struct evsel *evsel, if (param->record_mode == CALLCHAIN_DWARF) { if (!function) { - perf_evsel__set_sample_bit(evsel, REGS_USER); - perf_evsel__set_sample_bit(evsel, STACK_USER); + evsel__set_sample_bit(evsel, REGS_USER); + evsel__set_sample_bit(evsel, STACK_USER); if (opts->sample_user_regs && DWARF_MINIMAL_REGS != PERF_REGS_MASK) { attr->sample_regs_user |= DWARF_MINIMAL_REGS; pr_warning("WARNING: The use of --call-graph=dwarf may require all the user registers, " @@ -759,15 +758,16 @@ perf_evsel__reset_callgraph(struct evsel *evsel, { struct perf_event_attr *attr = &evsel->core.attr; - perf_evsel__reset_sample_bit(evsel, CALLCHAIN); + evsel__reset_sample_bit(evsel, CALLCHAIN); if (param->record_mode == CALLCHAIN_LBR) { - perf_evsel__reset_sample_bit(evsel, BRANCH_STACK); + evsel__reset_sample_bit(evsel, BRANCH_STACK); attr->branch_sample_type &= ~(PERF_SAMPLE_BRANCH_USER | - PERF_SAMPLE_BRANCH_CALL_STACK); + PERF_SAMPLE_BRANCH_CALL_STACK | + PERF_SAMPLE_BRANCH_HW_INDEX); } if (param->record_mode == CALLCHAIN_DWARF) { - perf_evsel__reset_sample_bit(evsel, REGS_USER); - perf_evsel__reset_sample_bit(evsel, STACK_USER); + evsel__reset_sample_bit(evsel, REGS_USER); + evsel__reset_sample_bit(evsel, STACK_USER); } } @@ -791,32 +791,32 @@ static void apply_config_terms(struct evsel *evsel, if (!(term->weak && opts->user_interval != ULLONG_MAX)) { attr->sample_period = term->val.period; attr->freq = 0; - perf_evsel__reset_sample_bit(evsel, PERIOD); + evsel__reset_sample_bit(evsel, PERIOD); } break; case PERF_EVSEL__CONFIG_TERM_FREQ: if (!(term->weak && opts->user_freq != UINT_MAX)) { attr->sample_freq = term->val.freq; attr->freq = 1; - perf_evsel__set_sample_bit(evsel, PERIOD); + evsel__set_sample_bit(evsel, PERIOD); } break; case PERF_EVSEL__CONFIG_TERM_TIME: if (term->val.time) - perf_evsel__set_sample_bit(evsel, TIME); + evsel__set_sample_bit(evsel, TIME); else - perf_evsel__reset_sample_bit(evsel, 
TIME); + evsel__reset_sample_bit(evsel, TIME); break; case PERF_EVSEL__CONFIG_TERM_CALLGRAPH: - callgraph_buf = term->val.callgraph; + callgraph_buf = term->val.str; break; case PERF_EVSEL__CONFIG_TERM_BRANCH: - if (term->val.branch && strcmp(term->val.branch, "no")) { - perf_evsel__set_sample_bit(evsel, BRANCH_STACK); - parse_branch_str(term->val.branch, + if (term->val.str && strcmp(term->val.str, "no")) { + evsel__set_sample_bit(evsel, BRANCH_STACK); + parse_branch_str(term->val.str, &attr->branch_sample_type); } else - perf_evsel__reset_sample_bit(evsel, BRANCH_STACK); + evsel__reset_sample_bit(evsel, BRANCH_STACK); break; case PERF_EVSEL__CONFIG_TERM_STACK_USER: dump_size = term->val.stack_user; @@ -849,6 +849,8 @@ static void apply_config_terms(struct evsel *evsel, case PERF_EVSEL__CONFIG_TERM_AUX_SAMPLE_SIZE: /* Already applied by auxtrace */ break; + case PERF_EVSEL__CONFIG_TERM_CFG_CHG: + break; default: break; } @@ -893,8 +895,8 @@ static void apply_config_terms(struct evsel *evsel, /* set perf-event callgraph */ if (param.enabled) { if (sample_address) { - perf_evsel__set_sample_bit(evsel, ADDR); - perf_evsel__set_sample_bit(evsel, DATA_SRC); + evsel__set_sample_bit(evsel, ADDR); + evsel__set_sample_bit(evsel, DATA_SRC); evsel->core.attr.mmap_data = track; } perf_evsel__config_callchain(evsel, opts, &param); @@ -921,6 +923,20 @@ struct perf_evsel_config_term *__perf_evsel__get_config_term(struct evsel *evsel return found_term; } +void __weak arch_evsel__set_sample_weight(struct evsel *evsel) +{ + evsel__set_sample_bit(evsel, WEIGHT); +} + +void __weak arch_evsel__fixup_new_cycles(struct perf_event_attr *attr __maybe_unused) +{ +} + +void __weak arch__post_evsel_config(struct evsel *evsel __maybe_unused, + struct perf_event_attr *attr __maybe_unused) +{ +} + /* * The enable_on_exec/disabled value strategy: * @@ -961,17 +977,17 @@ void perf_evsel__config(struct evsel *evsel, struct record_opts *opts, attr->inherit = !opts->no_inherit; attr->write_backward =
opts->overwrite ? 1 : 0; - perf_evsel__set_sample_bit(evsel, IP); - perf_evsel__set_sample_bit(evsel, TID); + evsel__set_sample_bit(evsel, IP); + evsel__set_sample_bit(evsel, TID); if (evsel->sample_read) { - perf_evsel__set_sample_bit(evsel, READ); + evsel__set_sample_bit(evsel, READ); /* * We need ID even in case of single event, because * PERF_SAMPLE_READ process ID specific data. */ - perf_evsel__set_sample_id(evsel, false); + evsel__set_sample_id(evsel, false); /* * Apply group format only if we belong to group @@ -990,7 +1006,7 @@ void perf_evsel__config(struct evsel *evsel, struct record_opts *opts, if (!attr->sample_period || (opts->user_freq != UINT_MAX || opts->user_interval != ULLONG_MAX)) { if (opts->freq) { - perf_evsel__set_sample_bit(evsel, PERIOD); + evsel__set_sample_bit(evsel, PERIOD); attr->freq = 1; attr->sample_freq = opts->freq; } else { @@ -1029,7 +1045,7 @@ void perf_evsel__config(struct evsel *evsel, struct record_opts *opts, } if (opts->sample_address) { - perf_evsel__set_sample_bit(evsel, ADDR); + evsel__set_sample_bit(evsel, ADDR); attr->mmap_data = track; } @@ -1046,16 +1062,16 @@ void perf_evsel__config(struct evsel *evsel, struct record_opts *opts, if (opts->sample_intr_regs && !evsel->no_aux_samples) { attr->sample_regs_intr = opts->sample_intr_regs; - perf_evsel__set_sample_bit(evsel, REGS_INTR); + evsel__set_sample_bit(evsel, REGS_INTR); } if (opts->sample_user_regs && !evsel->no_aux_samples) { attr->sample_regs_user |= opts->sample_user_regs; - perf_evsel__set_sample_bit(evsel, REGS_USER); + evsel__set_sample_bit(evsel, REGS_USER); } if (target__has_cpu(&opts->target) || opts->sample_cpu) - perf_evsel__set_sample_bit(evsel, CPU); + evsel__set_sample_bit(evsel, CPU); /* * When the user explicitly disabled time don't force it here. 
@@ -1064,31 +1080,31 @@ void perf_evsel__config(struct evsel *evsel, struct record_opts *opts, (!perf_missing_features.sample_id_all && (!opts->no_inherit || target__has_cpu(&opts->target) || per_cpu || opts->sample_time_set))) - perf_evsel__set_sample_bit(evsel, TIME); + evsel__set_sample_bit(evsel, TIME); if (opts->raw_samples && !evsel->no_aux_samples) { - perf_evsel__set_sample_bit(evsel, TIME); - perf_evsel__set_sample_bit(evsel, RAW); - perf_evsel__set_sample_bit(evsel, CPU); + evsel__set_sample_bit(evsel, TIME); + evsel__set_sample_bit(evsel, RAW); + evsel__set_sample_bit(evsel, CPU); } if (opts->sample_address) - perf_evsel__set_sample_bit(evsel, DATA_SRC); + evsel__set_sample_bit(evsel, DATA_SRC); if (opts->sample_phys_addr) - perf_evsel__set_sample_bit(evsel, PHYS_ADDR); + evsel__set_sample_bit(evsel, PHYS_ADDR); if (opts->no_buffering) { attr->watermark = 0; attr->wakeup_events = 1; } if (opts->branch_stack && !evsel->no_aux_samples) { - perf_evsel__set_sample_bit(evsel, BRANCH_STACK); + evsel__set_sample_bit(evsel, BRANCH_STACK); attr->branch_sample_type = opts->branch_stack; } if (opts->sample_weight) - perf_evsel__set_sample_bit(evsel, WEIGHT); + arch_evsel__set_sample_weight(evsel); attr->task = track; attr->mmap = track; @@ -1100,11 +1116,22 @@ void perf_evsel__config(struct evsel *evsel, struct record_opts *opts, if (opts->record_namespaces) attr->namespaces = track; + if (opts->record_cgroup) { + attr->cgroup = track && !perf_missing_features.cgroup; + evsel__set_sample_bit(evsel, CGROUP); + } + + if (opts->sample_data_page_size) + evsel__set_sample_bit(evsel, DATA_PAGE_SIZE); + + if (opts->sample_code_page_size) + evsel__set_sample_bit(evsel, CODE_PAGE_SIZE); + if (opts->record_switch_events) attr->context_switch = track; if (opts->sample_transaction) - perf_evsel__set_sample_bit(evsel, TRANSACTION); + evsel__set_sample_bit(evsel, TRANSACTION); if (opts->running_time) { evsel->core.attr.read_format |= @@ -1167,9 +1194,9 @@ void 
perf_evsel__config(struct evsel *evsel, struct record_opts *opts, /* The --period option takes the precedence. */ if (opts->period_set) { if (opts->period) - perf_evsel__set_sample_bit(evsel, PERIOD); + evsel__set_sample_bit(evsel, PERIOD); else - perf_evsel__reset_sample_bit(evsel, PERIOD); + evsel__reset_sample_bit(evsel, PERIOD); } /* @@ -1178,7 +1205,9 @@ void perf_evsel__config(struct evsel *evsel, struct record_opts *opts, * if BRANCH_STACK bit is set. */ if (opts->initial_delay && is_dummy_event(evsel)) - perf_evsel__reset_sample_bit(evsel, BRANCH_STACK); + evsel__reset_sample_bit(evsel, BRANCH_STACK); + + arch__post_evsel_config(evsel, attr); } int perf_evsel__set_filter(struct evsel *evsel, const char *filter) @@ -1543,7 +1572,7 @@ static int __open_attr__fprintf(FILE *fp, const char *name, const char *val, static void display_attr(struct perf_event_attr *attr) { - if (verbose >= 2) { + if (verbose >= 2 || debug_peo_args) { fprintf(stderr, "%.60s\n", graph_dotted_line); fprintf(stderr, "perf_event_attr:\n"); perf_event_attr__fprintf(stderr, attr, __open_attr__fprintf, NULL); @@ -1559,7 +1588,7 @@ static int perf_event_open(struct evsel *evsel, int fd; while (1) { - pr_debug2("sys_perf_event_open: pid %d cpu %d group_fd %d flags %#lx", + pr_debug2_peo("sys_perf_event_open: pid %d cpu %d group_fd %d flags %#lx", pid, cpu, group_fd, flags); fd = sys_perf_event_open(&evsel->core.attr, pid, cpu, group_fd, flags); @@ -1579,17 +1608,18 @@ static int perf_event_open(struct evsel *evsel, break; } - pr_debug2("\nsys_perf_event_open failed, error %d\n", -ENOTSUP); + pr_debug2_peo("\nsys_perf_event_open failed, error %d\n", -ENOTSUP); evsel->core.attr.precise_ip--; - pr_debug2("decreasing precise_ip by one (%d)\n", evsel->core.attr.precise_ip); + pr_debug2_peo("decreasing precise_ip by one (%d)\n", evsel->core.attr.precise_ip); display_attr(&evsel->core.attr); } return fd; } -int evsel__open(struct evsel *evsel, struct perf_cpu_map *cpus, - struct perf_thread_map 
*threads) +static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus, + struct perf_thread_map *threads, + int start_cpu, int end_cpu) { int cpu, thread, nthreads; unsigned long flags = PERF_FLAG_FD_CLOEXEC; @@ -1639,6 +1669,10 @@ int evsel__open(struct evsel *evsel, struct perf_cpu_map *cpus, } fallback_missing_features: + if (perf_missing_features.weight_struct) { + evsel__set_sample_bit(evsel, WEIGHT); + evsel__reset_sample_bit(evsel, WEIGHT_STRUCT); + } if (perf_missing_features.clockid_wrong) evsel->core.attr.clockid = CLOCK_MONOTONIC; /* should always work */ if (perf_missing_features.clockid) { @@ -1660,13 +1694,15 @@ int evsel__open(struct evsel *evsel, struct perf_cpu_map *cpus, evsel->core.attr.ksymbol = 0; if (perf_missing_features.bpf) evsel->core.attr.bpf_event = 0; + if (perf_missing_features.branch_hw_idx) + evsel->core.attr.branch_sample_type &= ~PERF_SAMPLE_BRANCH_HW_INDEX; retry_sample_id: if (perf_missing_features.sample_id_all) evsel->core.attr.sample_id_all = 0; display_attr(&evsel->core.attr); - for (cpu = 0; cpu < cpus->nr; cpu++) { + for (cpu = start_cpu; cpu < end_cpu; cpu++) { for (thread = 0; thread < nthreads; thread++) { int fd, group_fd; @@ -1700,12 +1736,12 @@ int evsel__open(struct evsel *evsel, struct perf_cpu_map *cpus, continue; } - pr_debug2("\nsys_perf_event_open failed, error %d\n", + pr_debug2_peo("\nsys_perf_event_open failed, error %d\n", err); goto try_fallback; } - pr_debug2(" = %d\n", fd); + pr_debug2_peo(" = %d\n", fd); if (evsel->bpf_fd >= 0) { int evt_fd = fd; @@ -1771,60 +1807,84 @@ int evsel__open(struct evsel *evsel, struct perf_cpu_map *cpus, * Must probe features in the order they were added to the * perf_event_attr interface. 
*/ - if (!perf_missing_features.aux_output && evsel->core.attr.aux_output) { + if (!perf_missing_features.weight_struct && + (evsel->core.attr.sample_type & PERF_SAMPLE_WEIGHT_STRUCT)) { + perf_missing_features.weight_struct = true; + pr_debug2("switching off weight struct support\n"); + goto fallback_missing_features; + } else if (!perf_missing_features.code_page_size && + (evsel->core.attr.sample_type & PERF_SAMPLE_CODE_PAGE_SIZE)) { + perf_missing_features.code_page_size = true; + pr_debug2_peo("Kernel has no PERF_SAMPLE_CODE_PAGE_SIZE support, bailing out\n"); + goto out_close; + } else if (!perf_missing_features.data_page_size && + (evsel->core.attr.sample_type & PERF_SAMPLE_DATA_PAGE_SIZE)) { + perf_missing_features.data_page_size = true; + pr_debug2_peo("Kernel has no PERF_SAMPLE_DATA_PAGE_SIZE support, bailing out\n"); + goto out_close; + } else if (!perf_missing_features.cgroup && evsel->core.attr.cgroup) { + perf_missing_features.cgroup = true; + pr_debug2_peo("Kernel has no cgroup sampling support, bailing out\n"); + goto out_close; + } else if (!perf_missing_features.branch_hw_idx && + (evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX)) { + perf_missing_features.branch_hw_idx = true; + pr_debug2("switching off branch HW index support\n"); + goto fallback_missing_features; + } else if (!perf_missing_features.aux_output && evsel->core.attr.aux_output) { perf_missing_features.aux_output = true; - pr_debug2("Kernel has no attr.aux_output support, bailing out\n"); + pr_debug2_peo("Kernel has no attr.aux_output support, bailing out\n"); goto out_close; } else if (!perf_missing_features.bpf && evsel->core.attr.bpf_event) { perf_missing_features.bpf = true; - pr_debug2("switching off bpf_event\n"); + pr_debug2_peo("switching off bpf_event\n"); goto fallback_missing_features; } else if (!perf_missing_features.ksymbol && evsel->core.attr.ksymbol) { perf_missing_features.ksymbol = true; - pr_debug2("switching off ksymbol\n"); + 
pr_debug2_peo("switching off ksymbol\n"); goto fallback_missing_features; } else if (!perf_missing_features.write_backward && evsel->core.attr.write_backward) { perf_missing_features.write_backward = true; - pr_debug2("switching off write_backward\n"); + pr_debug2_peo("switching off write_backward\n"); goto out_close; } else if (!perf_missing_features.clockid_wrong && evsel->core.attr.use_clockid) { perf_missing_features.clockid_wrong = true; - pr_debug2("switching off clockid\n"); + pr_debug2_peo("switching off clockid\n"); goto fallback_missing_features; } else if (!perf_missing_features.clockid && evsel->core.attr.use_clockid) { perf_missing_features.clockid = true; - pr_debug2("switching off use_clockid\n"); + pr_debug2_peo("switching off use_clockid\n"); goto fallback_missing_features; } else if (!perf_missing_features.cloexec && (flags & PERF_FLAG_FD_CLOEXEC)) { perf_missing_features.cloexec = true; - pr_debug2("switching off cloexec flag\n"); + pr_debug2_peo("switching off cloexec flag\n"); goto fallback_missing_features; } else if (!perf_missing_features.mmap2 && evsel->core.attr.mmap2) { perf_missing_features.mmap2 = true; - pr_debug2("switching off mmap2\n"); + pr_debug2_peo("switching off mmap2\n"); goto fallback_missing_features; } else if (!perf_missing_features.exclude_guest && (evsel->core.attr.exclude_guest || evsel->core.attr.exclude_host)) { perf_missing_features.exclude_guest = true; - pr_debug2("switching off exclude_guest, exclude_host\n"); + pr_debug2_peo("switching off exclude_guest, exclude_host\n"); goto fallback_missing_features; } else if (!perf_missing_features.sample_id_all) { perf_missing_features.sample_id_all = true; - pr_debug2("switching off sample_id_all\n"); + pr_debug2_peo("switching off sample_id_all\n"); goto retry_sample_id; } else if (!perf_missing_features.lbr_flags && (evsel->core.attr.branch_sample_type & (PERF_SAMPLE_BRANCH_NO_CYCLES | PERF_SAMPLE_BRANCH_NO_FLAGS))) { perf_missing_features.lbr_flags = true; - 
pr_debug2("switching off branch sample type no (cycles/flags)\n"); + pr_debug2_peo("switching off branch sample type no (cycles/flags)\n"); goto fallback_missing_features; } else if (!perf_missing_features.group_read && evsel->core.attr.inherit && (evsel->core.attr.read_format & PERF_FORMAT_GROUP) && perf_evsel__is_group_leader(evsel)) { perf_missing_features.group_read = true; - pr_debug2("switching off group read\n"); + pr_debug2_peo("switching off group read\n"); goto fallback_missing_features; } out_close: @@ -1833,7 +1893,8 @@ int evsel__open(struct evsel *evsel, struct perf_cpu_map *cpus, do { while (--thread >= 0) { - close(FD(evsel, cpu, thread)); + if (FD(evsel, cpu, thread) >= 0) + close(FD(evsel, cpu, thread)); FD(evsel, cpu, thread) = -1; } thread = nthreads; @@ -1841,6 +1902,12 @@ int evsel__open(struct evsel *evsel, struct perf_cpu_map *cpus, return err; } +int evsel__open(struct evsel *evsel, struct perf_cpu_map *cpus, + struct perf_thread_map *threads) +{ + return evsel__open_cpu(evsel, cpus, threads, 0, cpus ? cpus->nr : 1); +} + void evsel__close(struct evsel *evsel) { perf_evsel__close(&evsel->core); @@ -1848,9 +1915,14 @@ void evsel__close(struct evsel *evsel) } int perf_evsel__open_per_cpu(struct evsel *evsel, - struct perf_cpu_map *cpus) + struct perf_cpu_map *cpus, + int cpu) { - return evsel__open(evsel, cpus, NULL); + if (cpu == -1) + return evsel__open_cpu(evsel, cpus, NULL, 0, + cpus ? 
cpus->nr : 1); + + return evsel__open_cpu(evsel, cpus, NULL, cpu, cpu + 1); } int perf_evsel__open_per_thread(struct evsel *evsel, @@ -2142,7 +2214,12 @@ int perf_evsel__parse_sample(struct evsel *evsel, union perf_event *event, if (data->branch_stack->nr > max_branch_nr) return -EFAULT; + sz = data->branch_stack->nr * sizeof(struct branch_entry); + if (perf_evsel__has_branch_hw_idx(evsel)) + sz += sizeof(u64); + else + data->no_hw_idx = true; OVERFLOW_CHECK(array, sz, max_size); array = (void *)array + sz; } @@ -2184,9 +2261,15 @@ int perf_evsel__parse_sample(struct evsel *evsel, union perf_event *event, } } - if (type & PERF_SAMPLE_WEIGHT) { + if (type & PERF_SAMPLE_WEIGHT_TYPE) { + union perf_sample_weight weight; + OVERFLOW_CHECK_u64(array); - data->weight = *array; + weight.full = *array; + if (type & PERF_SAMPLE_WEIGHT) + data->weight = weight.full; + else + data->weight = weight.var1_dw; array++; } @@ -2225,6 +2308,24 @@ int perf_evsel__parse_sample(struct evsel *evsel, union perf_event *event, array++; } + data->cgroup = 0; + if (type & PERF_SAMPLE_CGROUP) { + data->cgroup = *array; + array++; + } + + data->data_page_size = 0; + if (type & PERF_SAMPLE_DATA_PAGE_SIZE) { + data->data_page_size = *array; + array++; + } + + data->code_page_size = 0; + if (type & PERF_SAMPLE_CODE_PAGE_SIZE) { + data->code_page_size = *array; + array++; + } + if (type & PERF_SAMPLE_AUX) { OVERFLOW_CHECK_u64(array); sz = *array++; @@ -2525,6 +2626,10 @@ int perf_evsel__open_strerror(struct evsel *evsel, struct target *target, "We found oprofile daemon running, please stop it and try again."); break; case EINVAL: + if (evsel->core.attr.sample_type & PERF_SAMPLE_CODE_PAGE_SIZE && perf_missing_features.code_page_size) + return scnprintf(msg, size, "Asking for the code page size isn't supported by this kernel."); + if (evsel->core.attr.sample_type & PERF_SAMPLE_DATA_PAGE_SIZE && perf_missing_features.data_page_size) + return scnprintf(msg, size, "Asking for the data page size isn't 
supported by this kernel."); if (evsel->core.attr.write_backward && perf_missing_features.write_backward) return scnprintf(msg, size, "Reading from overwrite event is not supported by this kernel."); if (perf_missing_features.clockid) @@ -2547,7 +2652,7 @@ int perf_evsel__open_strerror(struct evsel *evsel, struct target *target, struct perf_env *perf_evsel__env(struct evsel *evsel) { - if (evsel && evsel->evlist) + if (evsel && evsel->evlist && evsel->evlist->env) return evsel->evlist->env; return &perf_env; } diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h index ddc5ee6f6592bed23532aff20f75c7a76db53361..a7e7a25af2be49a4bb88c4f74c159a5ab53ef5b0 100644 --- a/tools/perf/util/evsel.h +++ b/tools/perf/util/evsel.h @@ -94,12 +94,23 @@ struct evsel { struct evsel *metric_leader; bool collect_stat; bool weak_group; + bool reset_group; + bool errored; bool percore; + int cpu_iter; const char *pmu_name; struct { perf_evsel__sb_cb_t *cb; void *data; } side_band; + /* + * For reporting purposes, an evsel sample can have a callchain + * synthesized from AUX area data. Keep track of synthesized sample + * types here. Note, the recorded sample_type cannot be changed because + * it is needed to continue to parse events. + * See also evsel__has_callchain(). 
+ */ + __u64 synth_sample_type; }; struct perf_missing_features { @@ -115,6 +126,11 @@ struct perf_missing_features { bool ksymbol; bool bpf; bool aux_output; + bool branch_hw_idx; + bool cgroup; + bool data_page_size; + bool code_page_size; + bool weight_struct; }; extern struct perf_missing_features perf_missing_features; @@ -144,24 +160,27 @@ int perf_evsel__object_config(size_t object_size, int (*init)(struct evsel *evsel), void (*fini)(struct evsel *evsel)); -struct evsel *perf_evsel__new_idx(struct perf_event_attr *attr, int idx); +struct perf_pmu *evsel__find_pmu(struct evsel *evsel); +bool perf_evsel__is_aux_event(struct evsel *evsel); + +struct evsel *evsel__new_idx(struct perf_event_attr *attr, int idx); static inline struct evsel *evsel__new(struct perf_event_attr *attr) { - return perf_evsel__new_idx(attr, 0); + return evsel__new_idx(attr, 0); } -struct evsel *perf_evsel__newtp_idx(const char *sys, const char *name, int idx); +struct evsel *evsel__newtp_idx(const char *sys, const char *name, int idx); /* * Returns pointer with encoded error via interface. 
*/ -static inline struct evsel *perf_evsel__newtp(const char *sys, const char *name) +static inline struct evsel *evsel__newtp(const char *sys, const char *name) { - return perf_evsel__newtp_idx(sys, name, 0); + return evsel__newtp_idx(sys, name, 0); } -struct evsel *perf_evsel__new_cycles(bool precise); +struct evsel *evsel__new_cycles(bool precise, __u32 type, __u64 config); struct tep_event *event_format__new(const char *sys, const char *name); @@ -200,29 +219,31 @@ const char *perf_evsel__name(struct evsel *evsel); const char *perf_evsel__group_name(struct evsel *evsel); int perf_evsel__group_desc(struct evsel *evsel, char *buf, size_t size); -void __perf_evsel__set_sample_bit(struct evsel *evsel, - enum perf_event_sample_format bit); -void __perf_evsel__reset_sample_bit(struct evsel *evsel, - enum perf_event_sample_format bit); +void __evsel__set_sample_bit(struct evsel *evsel, enum perf_event_sample_format bit); +void __evsel__reset_sample_bit(struct evsel *evsel, enum perf_event_sample_format bit); -#define perf_evsel__set_sample_bit(evsel, bit) \ - __perf_evsel__set_sample_bit(evsel, PERF_SAMPLE_##bit) +#define evsel__set_sample_bit(evsel, bit) \ + __evsel__set_sample_bit(evsel, PERF_SAMPLE_##bit) -#define perf_evsel__reset_sample_bit(evsel, bit) \ - __perf_evsel__reset_sample_bit(evsel, PERF_SAMPLE_##bit) +#define evsel__reset_sample_bit(evsel, bit) \ + __evsel__reset_sample_bit(evsel, PERF_SAMPLE_##bit) -void perf_evsel__set_sample_id(struct evsel *evsel, - bool use_sample_identifier); +void evsel__set_sample_id(struct evsel *evsel, bool use_sample_identifier); int perf_evsel__set_filter(struct evsel *evsel, const char *filter); int perf_evsel__append_tp_filter(struct evsel *evsel, const char *filter); int perf_evsel__append_addr_filter(struct evsel *evsel, const char *filter); +void arch_evsel__set_sample_weight(struct evsel *evsel); +void arch_evsel__fixup_new_cycles(struct perf_event_attr *attr); +void arch__post_evsel_config(struct evsel *evsel, 
struct perf_event_attr *attr); + int evsel__enable(struct evsel *evsel); int evsel__disable(struct evsel *evsel); int perf_evsel__open_per_cpu(struct evsel *evsel, - struct perf_cpu_map *cpus); + struct perf_cpu_map *cpus, + int cpu); int perf_evsel__open_per_thread(struct evsel *evsel, struct perf_thread_map *threads); int evsel__open(struct evsel *evsel, struct perf_cpu_map *cpus, @@ -382,9 +403,29 @@ static inline bool perf_evsel__has_branch_callstack(const struct evsel *evsel) return evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_CALL_STACK; } +static inline bool perf_evsel__has_branch_hw_idx(const struct evsel *evsel) +{ + return evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX; +} + static inline bool evsel__has_callchain(const struct evsel *evsel) { - return (evsel->core.attr.sample_type & PERF_SAMPLE_CALLCHAIN) != 0; + /* + * For reporting purposes, an evsel sample can have a recorded callchain + * or a callchain synthesized from AUX area data. + */ + return evsel->core.attr.sample_type & PERF_SAMPLE_CALLCHAIN || + evsel->synth_sample_type & PERF_SAMPLE_CALLCHAIN; +} + +static inline bool evsel__has_br_stack(const struct evsel *evsel) +{ + /* + * For reporting purposes, an evsel sample can have a recorded branch + * stack or a branch stack synthesized from AUX area data. 
+ */ + return evsel->core.attr.sample_type & PERF_SAMPLE_BRANCH_STACK || + evsel->synth_sample_type & PERF_SAMPLE_BRANCH_STACK; } struct perf_env *perf_evsel__env(struct evsel *evsel); diff --git a/tools/perf/util/evsel_config.h b/tools/perf/util/evsel_config.h index 6e654ede8fbe26053d135cf9f0b67b8afbe00622..b4a65201e4f7bb285097832780aac934c0f82b58 100644 --- a/tools/perf/util/evsel_config.h +++ b/tools/perf/util/evsel_config.h @@ -26,6 +26,7 @@ enum evsel_term_type { PERF_EVSEL__CONFIG_TERM_PERCORE, PERF_EVSEL__CONFIG_TERM_AUX_OUTPUT, PERF_EVSEL__CONFIG_TERM_AUX_SAMPLE_SIZE, + PERF_EVSEL__CONFIG_TERM_CFG_CHG, }; struct perf_evsel_config_term { @@ -35,17 +36,16 @@ struct perf_evsel_config_term { u64 period; u64 freq; bool time; - char *callgraph; - char *drv_cfg; u64 stack_user; int max_stack; bool inherit; bool overwrite; - char *branch; unsigned long max_events; bool percore; bool aux_output; u32 aux_sample_size; + u64 cfg_chg; + char *str; } val; bool weak; }; diff --git a/tools/perf/util/fncache.c b/tools/perf/util/fncache.c new file mode 100644 index 0000000000000000000000000000000000000000..6225cbc523101313c3faa7f645d2d670abbad597 --- /dev/null +++ b/tools/perf/util/fncache.c @@ -0,0 +1,63 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Manage a cache of file names' existence */ +#include +#include +#include +#include +#include "fncache.h" + +struct fncache { + struct hlist_node nd; + bool res; + char name[]; +}; + +#define FNHSIZE 61 + +static struct hlist_head fncache_hash[FNHSIZE]; + +unsigned shash(const unsigned char *s) +{ + unsigned h = 0; + while (*s) + h = 65599 * h + *s++; + return h ^ (h >> 16); +} + +static bool lookup_fncache(const char *name, bool *res) +{ + int h = shash((const unsigned char *)name) % FNHSIZE; + struct fncache *n; + + hlist_for_each_entry(n, &fncache_hash[h], nd) { + if (!strcmp(n->name, name)) { + *res = n->res; + return true; + } + } + return false; +} + +static void update_fncache(const char *name, bool res) +{ + struct 
fncache *n = malloc(sizeof(struct fncache) + strlen(name) + 1); + int h = shash((const unsigned char *)name) % FNHSIZE; + + if (!n) + return; + strcpy(n->name, name); + n->res = res; + hlist_add_head(&n->nd, &fncache_hash[h]); +} + +/* No LRU, only use when bounded in some other way. */ +bool file_available(const char *name) +{ + bool res; + + if (lookup_fncache(name, &res)) + return res; + res = access(name, R_OK) == 0; + update_fncache(name, res); + return res; +} diff --git a/tools/perf/util/fncache.h b/tools/perf/util/fncache.h new file mode 100644 index 0000000000000000000000000000000000000000..fe020beaefb1d7c7644f84dc63188db66918f18e --- /dev/null +++ b/tools/perf/util/fncache.h @@ -0,0 +1,7 @@ +#ifndef _FCACHE_H +#define _FCACHE_H 1 + +unsigned shash(const unsigned char *s); +bool file_available(const char *name); + +#endif diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c index d3412f2c0d18eee2b9daed474cf77b28452abde0..3b04ddc2f1244fade39be852cf52520b6be8ddcc 100644 --- a/tools/perf/util/header.c +++ b/tools/perf/util/header.c @@ -46,6 +46,8 @@ #include "util/util.h" // perf_exe() #include "cputopo.h" #include "bpf-event.h" +#include "clockid.h" +#include "pmu-hybrid.h" #include #include @@ -896,6 +898,74 @@ static int write_clockid(struct feat_fd *ff, sizeof(ff->ph->env.clockid_res_ns)); } +static int write_clock_data(struct feat_fd *ff, + struct evlist *evlist __maybe_unused) +{ + u64 *data64; + u32 data32; + int ret; + + /* version */ + data32 = 1; + + ret = do_write(ff, &data32, sizeof(data32)); + if (ret < 0) + return ret; + + /* clockid */ + data32 = ff->ph->env.clock.clockid; + + ret = do_write(ff, &data32, sizeof(data32)); + if (ret < 0) + return ret; + + /* TOD ref time */ + data64 = &ff->ph->env.clock.tod_ns; + + ret = do_write(ff, data64, sizeof(*data64)); + if (ret < 0) + return ret; + + /* clockid ref time */ + data64 = &ff->ph->env.clock.clockid_ns; + + return do_write(ff, data64, sizeof(*data64)); +} + +static int 
write_hybrid_topology(struct feat_fd *ff, + struct evlist *evlist __maybe_unused) +{ + struct hybrid_topology *tp; + int ret; + u32 i; + + tp = hybrid_topology__new(); + if (!tp) + return -ENOENT; + + ret = do_write(ff, &tp->nr, sizeof(u32)); + if (ret < 0) + goto err; + + for (i = 0; i < tp->nr; i++) { + struct hybrid_topology_node *n = &tp->nodes[i]; + + ret = do_write_string(ff, n->pmu_name); + if (ret < 0) + goto err; + + ret = do_write_string(ff, n->cpus); + if (ret < 0) + goto err; + } + + ret = 0; + +err: + hybrid_topology__delete(tp); + return ret; +} + static int write_dir_format(struct feat_fd *ff, struct evlist *evlist __maybe_unused) { @@ -1395,6 +1465,96 @@ static int write_compressed(struct feat_fd *ff __maybe_unused, return do_write(ff, &(ff->ph->env.comp_mmap_len), sizeof(ff->ph->env.comp_mmap_len)); } +static int __write_pmu_caps(struct feat_fd *ff, struct perf_pmu *pmu, + bool write_pmu) +{ + struct perf_pmu_caps *caps = NULL; + int ret; + + ret = do_write(ff, &pmu->nr_caps, sizeof(pmu->nr_caps)); + if (ret < 0) + return ret; + + list_for_each_entry(caps, &pmu->caps, list) { + ret = do_write_string(ff, caps->name); + if (ret < 0) + return ret; + + ret = do_write_string(ff, caps->value); + if (ret < 0) + return ret; + } + + if (write_pmu) { + ret = do_write_string(ff, pmu->name); + if (ret < 0) + return ret; + } + + return ret; +} + +static int write_cpu_pmu_caps(struct feat_fd *ff, + struct evlist *evlist __maybe_unused) +{ + struct perf_pmu *cpu_pmu = perf_pmu__find("cpu"); + int ret; + + if (!cpu_pmu) + return -ENOENT; + + ret = perf_pmu__caps_parse(cpu_pmu); + if (ret < 0) + return ret; + + return __write_pmu_caps(ff, cpu_pmu, false); +} + +static int write_pmu_caps(struct feat_fd *ff, + struct evlist *evlist __maybe_unused) +{ + struct perf_pmu *pmu = NULL; + int nr_pmu = 0; + int ret; + + while ((pmu = perf_pmu__scan(pmu))) { + if (!pmu->name || !strcmp(pmu->name, "cpu") || + perf_pmu__caps_parse(pmu) <= 0) + continue; + nr_pmu++; + } + + ret 
= do_write(ff, &nr_pmu, sizeof(nr_pmu)); + if (ret < 0) + return ret; + + if (!nr_pmu) + return 0; + + /* + * Write hybrid pmu caps first to maintain compatibility with + * older perf tool. + */ + pmu = NULL; + perf_pmu__for_each_hybrid_pmu(pmu) { + ret = __write_pmu_caps(ff, pmu, true); + if (ret < 0) + return ret; + } + + pmu = NULL; + while ((pmu = perf_pmu__scan(pmu))) { + if (!pmu->name || !strcmp(pmu->name, "cpu") || + !pmu->nr_caps || perf_pmu__is_hybrid(pmu->name)) + continue; + + ret = __write_pmu_caps(ff, pmu, true); + if (ret < 0) + return ret; + } + return 0; +} + static void print_hostname(struct feat_fd *ff, FILE *fp) { fprintf(fp, "# hostname : %s\n", ff->ph->env.hostname); @@ -1518,6 +1678,61 @@ static void print_clockid(struct feat_fd *ff, FILE *fp) ff->ph->env.clockid_res_ns * 1000); } +static void print_clock_data(struct feat_fd *ff, FILE *fp) +{ + struct timespec clockid_ns; + char tstr[64], date[64]; + struct timeval tod_ns; + clockid_t clockid; + struct tm ltime; + u64 ref; + + if (!ff->ph->env.clock.enabled) { + fprintf(fp, "# reference time disabled\n"); + return; + } + + /* Compute TOD time. */ + ref = ff->ph->env.clock.tod_ns; + tod_ns.tv_sec = ref / NSEC_PER_SEC; + ref -= tod_ns.tv_sec * NSEC_PER_SEC; + tod_ns.tv_usec = ref / NSEC_PER_USEC; + + /* Compute clockid time. 
*/ + ref = ff->ph->env.clock.clockid_ns; + clockid_ns.tv_sec = ref / NSEC_PER_SEC; + ref -= clockid_ns.tv_sec * NSEC_PER_SEC; + clockid_ns.tv_nsec = ref; + + clockid = ff->ph->env.clock.clockid; + + if (localtime_r(&tod_ns.tv_sec, &ltime) == NULL) + snprintf(tstr, sizeof(tstr), ""); + else { + strftime(date, sizeof(date), "%F %T", &ltime); + scnprintf(tstr, sizeof(tstr), "%s.%06d", + date, (int) tod_ns.tv_usec); + } + + fprintf(fp, "# clockid: %s (%u)\n", clockid_name(clockid), clockid); + fprintf(fp, "# reference time: %s = %ld.%06d (TOD) = %ld.%09ld (%s)\n", + tstr, tod_ns.tv_sec, (int) tod_ns.tv_usec, + clockid_ns.tv_sec, clockid_ns.tv_nsec, + clockid_name(clockid)); +} + +static void print_hybrid_topology(struct feat_fd *ff, FILE *fp) +{ + int i; + struct hybrid_node *n; + + fprintf(fp, "# hybrid cpu system:\n"); + for (i = 0; i < ff->ph->env.nr_hybrid_nodes; i++) { + n = &ff->ph->env.hybrid_nodes[i]; + fprintf(fp, "# %s cpu list : %s\n", n->pmu_name, n->cpus); + } +} + static void print_dir_format(struct feat_fd *ff, FILE *fp) { struct perf_session *session; @@ -1772,6 +1987,42 @@ static void print_compressed(struct feat_fd *ff, FILE *fp) ff->ph->env.comp_level, ff->ph->env.comp_ratio); } +static void __print_pmu_caps(FILE *fp, int nr_caps, char **caps, char *pmu_name) +{ + const char *delimiter = ""; + int i; + + if (!nr_caps) { + fprintf(fp, "# %s pmu capabilities: not available\n", pmu_name); + return; + } + + fprintf(fp, "# %s pmu capabilities: ", pmu_name); + for (i = 0; i < nr_caps; i++) { + fprintf(fp, "%s%s", delimiter, caps[i]); + delimiter = ", "; + } + + fprintf(fp, "\n"); +} + +static void print_cpu_pmu_caps(struct feat_fd *ff, FILE *fp) +{ + __print_pmu_caps(fp, ff->ph->env.nr_cpu_pmu_caps, + ff->ph->env.cpu_pmu_caps, (char *)"cpu"); +} + +static void print_pmu_caps(struct feat_fd *ff, FILE *fp) +{ + struct pmu_caps *pmu_caps; + + for (int i = 0; i < ff->ph->env.nr_pmus_with_caps; i++) { + pmu_caps = &ff->ph->env.pmu_caps[i]; + __print_pmu_caps(fp,
pmu_caps->nr_caps, pmu_caps->caps, + pmu_caps->pmu_name); + } +} + static void print_pmu_mappings(struct feat_fd *ff, FILE *fp) { const char *delimiter = "# pmu mappings: "; @@ -2651,6 +2902,80 @@ static int process_clockid(struct feat_fd *ff, return 0; } +static int process_clock_data(struct feat_fd *ff, + void *_data __maybe_unused) +{ + u32 data32; + u64 data64; + + /* version */ + if (do_read_u32(ff, &data32)) + return -1; + + if (data32 != 1) + return -1; + + /* clockid */ + if (do_read_u32(ff, &data32)) + return -1; + + ff->ph->env.clock.clockid = data32; + + /* TOD ref time */ + if (do_read_u64(ff, &data64)) + return -1; + + ff->ph->env.clock.tod_ns = data64; + + /* clockid ref time */ + if (do_read_u64(ff, &data64)) + return -1; + + ff->ph->env.clock.clockid_ns = data64; + ff->ph->env.clock.enabled = true; + return 0; +} + +static int process_hybrid_topology(struct feat_fd *ff, + void *data __maybe_unused) +{ + struct hybrid_node *nodes, *n; + u32 nr, i; + + /* nr nodes */ + if (do_read_u32(ff, &nr)) + return -1; + + nodes = zalloc(sizeof(*nodes) * nr); + if (!nodes) + return -ENOMEM; + + for (i = 0; i < nr; i++) { + n = &nodes[i]; + + n->pmu_name = do_read_string(ff); + if (!n->pmu_name) + goto error; + + n->cpus = do_read_string(ff); + if (!n->cpus) + goto error; + } + + ff->ph->env.nr_hybrid_nodes = nr; + ff->ph->env.hybrid_nodes = nodes; + return 0; + +error: + for (i = 0; i < nr; i++) { + free(nodes[i].pmu_name); + free(nodes[i].cpus); + } + + free(nodes); + return -1; +} + static int process_dir_format(struct feat_fd *ff, void *_data __maybe_unused) { @@ -2809,6 +3134,126 @@ static int process_compressed(struct feat_fd *ff, return 0; } +static int __process_pmu_caps(struct feat_fd *ff, int *nr_caps, + char ***caps, unsigned int *max_branches) +{ + char *name, *value, *ptr; + u32 nr_pmu_caps, i; + + *nr_caps = 0; + *caps = NULL; + + if (do_read_u32(ff, &nr_pmu_caps)) + return -1; + + if (!nr_pmu_caps) + return 0; + + *caps = zalloc(sizeof(char *) * 
nr_pmu_caps); + if (!*caps) + return -1; + + for (i = 0; i < nr_pmu_caps; i++) { + name = do_read_string(ff); + if (!name) + goto error; + + value = do_read_string(ff); + if (!value) + goto free_name; + + if (asprintf(&ptr, "%s=%s", name, value) < 0) + goto free_value; + + (*caps)[i] = ptr; + + if (!strcmp(name, "branches")) + *max_branches = atoi(value); + + free(value); + free(name); + } + *nr_caps = nr_pmu_caps; + return 0; + +free_value: + free(value); +free_name: + free(name); +error: + for (; i > 0; i--) + free((*caps)[i - 1]); + free(*caps); + *caps = NULL; + *nr_caps = 0; + return -1; +} + +static int process_cpu_pmu_caps(struct feat_fd *ff, + void *data __maybe_unused) +{ + int ret = __process_pmu_caps(ff, &ff->ph->env.nr_cpu_pmu_caps, + &ff->ph->env.cpu_pmu_caps, + &ff->ph->env.max_branches); + + if (!ret && !ff->ph->env.cpu_pmu_caps) + pr_debug("cpu pmu capabilities not available\n"); + return ret; +} + +static int process_pmu_caps(struct feat_fd *ff, void *data __maybe_unused) +{ + struct pmu_caps *pmu_caps; + u32 nr_pmu, i; + int ret; + int j; + + if (do_read_u32(ff, &nr_pmu)) + return -1; + + if (!nr_pmu) { + pr_debug("pmu capabilities not available\n"); + return 0; + } + + pmu_caps = zalloc(sizeof(*pmu_caps) * nr_pmu); + if (!pmu_caps) + return -ENOMEM; + + for (i = 0; i < nr_pmu; i++) { + ret = __process_pmu_caps(ff, &pmu_caps[i].nr_caps, + &pmu_caps[i].caps, + &pmu_caps[i].max_branches); + if (ret) + goto err; + + pmu_caps[i].pmu_name = do_read_string(ff); + if (!pmu_caps[i].pmu_name) { + ret = -1; + goto err; + } + if (!pmu_caps[i].nr_caps) { + pr_debug("%s pmu capabilities not available\n", + pmu_caps[i].pmu_name); + } + } + + ff->ph->env.nr_pmus_with_caps = nr_pmu; + ff->ph->env.pmu_caps = pmu_caps; + return 0; + +err: + for (i = 0; i < nr_pmu; i++) { + for (j = 0; j < pmu_caps[i].nr_caps; j++) + free(pmu_caps[i].caps[j]); + free(pmu_caps[i].caps); + free(pmu_caps[i].pmu_name); + } + + free(pmu_caps); + return ret; +} + #define FEAT_OPR(n, func, 
__full_only) \ [HEADER_##n] = { \ .name = __stringify(n), \ @@ -2866,6 +3311,10 @@ const struct perf_header_feature_ops feat_ops[HEADER_LAST_FEATURE] = { FEAT_OPR(BPF_PROG_INFO, bpf_prog_info, false), FEAT_OPR(BPF_BTF, bpf_btf, false), FEAT_OPR(COMPRESSED, compressed, false), + FEAT_OPR(CPU_PMU_CAPS, cpu_pmu_caps, false), + FEAT_OPR(CLOCK_DATA, clock_data, false), + FEAT_OPN(HYBRID_TOPOLOGY, hybrid_topology, true), + FEAT_OPR(PMU_CAPS, pmu_caps, false), }; struct header_print_data { @@ -2946,9 +3395,22 @@ int perf_header__fprintf_info(struct perf_session *session, FILE *fp, bool full) return 0; } +struct header_fw { + struct feat_writer fw; + struct feat_fd *ff; +}; + +static int feat_writer_cb(struct feat_writer *fw, void *buf, size_t sz) +{ + struct header_fw *h = container_of(fw, struct header_fw, fw); + + return do_write(h->ff, buf, sz); +} + static int do_write_feat(struct feat_fd *ff, int type, struct perf_file_section **p, - struct evlist *evlist) + struct evlist *evlist, + struct feat_copier *fc) { int err; int ret = 0; @@ -2962,7 +3424,23 @@ static int do_write_feat(struct feat_fd *ff, int type, (*p)->offset = lseek(ff->fd, 0, SEEK_CUR); - err = feat_ops[type].write(ff, evlist); + /* + * Hook to let perf inject copy features sections from the input + * file. 
+ */ + if (fc && fc->copy) { + struct header_fw h = { + .fw.write = feat_writer_cb, + .ff = ff, + }; + + /* ->copy() returns 0 if the feature was not copied */ + err = fc->copy(fc, type, &h.fw); + } else { + err = 0; + } + if (!err) + err = feat_ops[type].write(ff, evlist); if (err < 0) { pr_debug("failed to write feature %s\n", feat_ops[type].name); @@ -2978,7 +3456,8 @@ static int do_write_feat(struct feat_fd *ff, int type, } static int perf_header__adds_write(struct perf_header *header, - struct evlist *evlist, int fd) + struct evlist *evlist, int fd, + struct feat_copier *fc) { int nr_sections; struct feat_fd ff; @@ -3007,7 +3486,7 @@ static int perf_header__adds_write(struct perf_header *header, lseek(fd, sec_start + sec_size, SEEK_SET); for_each_set_bit(feat, header->adds_features, HEADER_FEAT_BITS) { - if (do_write_feat(&ff, feat, &p, evlist)) + if (do_write_feat(&ff, feat, &p, evlist, fc)) perf_header__clear_feat(header, feat); } @@ -3045,9 +3524,10 @@ int perf_header__write_pipe(int fd) return 0; } -int perf_session__write_header(struct perf_session *session, - struct evlist *evlist, - int fd, bool at_exit) +static int perf_session__do_write_header(struct perf_session *session, + struct evlist *evlist, + int fd, bool at_exit, + struct feat_copier *fc) { struct perf_file_header f_header; struct perf_file_attr f_attr; @@ -3091,7 +3571,7 @@ int perf_session__write_header(struct perf_session *session, header->feat_offset = header->data_offset + header->data_size; if (at_exit) { - err = perf_header__adds_write(header, evlist, fd); + err = perf_header__adds_write(header, evlist, fd, fc); if (err < 0) return err; } @@ -3124,6 +3604,21 @@ int perf_session__write_header(struct perf_session *session, return 0; } +int perf_session__write_header(struct perf_session *session, + struct evlist *evlist, + int fd, bool at_exit) +{ + return perf_session__do_write_header(session, evlist, fd, at_exit, NULL); +} + +int perf_session__inject_header(struct perf_session *session, 
+ struct evlist *evlist, + int fd, + struct feat_copier *fc) +{ + return perf_session__do_write_header(session, evlist, fd, true, fc); +} + static int perf_header__getbuffer64(struct perf_header *header, int fd, void *buf, size_t size) { @@ -3393,16 +3888,17 @@ static int perf_file_section__process(struct perf_file_section *section, } static int perf_file_header__read_pipe(struct perf_pipe_file_header *header, - struct perf_header *ph, int fd, - bool repipe) + struct perf_header *ph, + struct perf_data* data, + bool repipe, int repipe_fd) { struct feat_fd ff = { - .fd = STDOUT_FILENO, + .fd = repipe_fd, .ph = ph, }; ssize_t ret; - ret = readn(fd, header, sizeof(*header)); + ret = perf_data__read(data, header, sizeof(*header)); if (ret <= 0) return -1; @@ -3420,19 +3916,18 @@ static int perf_file_header__read_pipe(struct perf_pipe_file_header *header, return 0; } -static int perf_header__read_pipe(struct perf_session *session) +static int perf_header__read_pipe(struct perf_session *session, int repipe_fd) { struct perf_header *header = &session->header; struct perf_pipe_file_header f_header; - if (perf_file_header__read_pipe(&f_header, header, - perf_data__fd(session->data), - session->repipe) < 0) { + if (perf_file_header__read_pipe(&f_header, header, session->data, + session->repipe, repipe_fd) < 0) { pr_debug("incompatible file format\n"); return -EINVAL; } - return 0; + return f_header.size == sizeof(f_header) ? 
0 : -1; } static int read_attr(int fd, struct perf_header *ph, @@ -3527,14 +4022,14 @@ static int perf_evlist__prepare_tracepoint_events(struct evlist *evlist, return 0; } -int perf_session__read_header(struct perf_session *session) +int perf_session__read_header(struct perf_session *session, int repipe_fd) { struct perf_data *data = session->data; struct perf_header *header = &session->header; struct perf_file_header f_header; struct perf_file_attr f_attr; u64 f_id; - int nr_attrs, nr_ids, i, j; + int nr_attrs, nr_ids, i, j, err; int fd = perf_data__fd(data); session->evlist = evlist__new(); @@ -3543,12 +4038,25 @@ int perf_session__read_header(struct perf_session *session) session->evlist->env = &header->env; session->machines.host.env = &header->env; - if (perf_data__is_pipe(data)) - return perf_header__read_pipe(session); + + /* + * We can read 'pipe' data event from regular file, + * check for the pipe header regardless of source. + */ + err = perf_header__read_pipe(session, repipe_fd); + if (!err || perf_data__is_pipe(data)) { + data->is_pipe = true; + return err; + } if (perf_file_header__read(&f_header, header, fd) < 0) return -EINVAL; + if (header->needs_swap && data->in_place_update) { + pr_err("In-place update not supported when byte-swapping is required\n"); + return -EINVAL; + } + /* * Sanity check that perf.data was written cleanly; data size is * initialized to 0 and updated only if the on_exit function is run. 
diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h index ca53a929e9fdd5f2e1bb69e388974707e8324ef7..5db0f8fc074c3b18cb86143c32526c53e4fb41ad 100644 --- a/tools/perf/util/header.h +++ b/tools/perf/util/header.h @@ -43,6 +43,10 @@ enum { HEADER_BPF_PROG_INFO, HEADER_BPF_BTF, HEADER_COMPRESSED, + HEADER_CPU_PMU_CAPS, + HEADER_CLOCK_DATA, + HEADER_HYBRID_TOPOLOGY, + HEADER_PMU_CAPS, HEADER_LAST_FEATURE, HEADER_FEAT_BITS = 256, }; @@ -115,12 +119,27 @@ struct perf_session; struct perf_tool; union perf_event; -int perf_session__read_header(struct perf_session *session); +int perf_session__read_header(struct perf_session *session, int repipe_fd); int perf_session__write_header(struct perf_session *session, struct evlist *evlist, int fd, bool at_exit); int perf_header__write_pipe(int fd); +/* feat_writer writes a feature section to output */ +struct feat_writer { + int (*write)(struct feat_writer *fw, void *buf, size_t sz); +}; + +/* feat_copier copies a feature section using feat_writer to output */ +struct feat_copier { + int (*copy)(struct feat_copier *fc, int feat, struct feat_writer *fw); +}; + +int perf_session__inject_header(struct perf_session *session, + struct evlist *evlist, + int fd, + struct feat_copier *fc); + void perf_header__set_feat(struct perf_header *header, int feat); void perf_header__clear_feat(struct perf_header *header, int feat); bool perf_header__has_feat(const struct perf_header *header, int feat); diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c index 7b6eaf5e0bda51d512c3ab216b0ef224bd45b26d..151b9e43c88f910de09c0df51180712edd44a7c1 100644 --- a/tools/perf/util/hist.c +++ b/tools/perf/util/hist.c @@ -2572,9 +2572,10 @@ void hist__account_cycles(struct branch_stack *bs, struct addr_location *al, struct perf_sample *sample, bool nonany_branch_mode) { struct branch_info *bi; + struct branch_entry *entries = perf_sample__branch_entries(sample); /* If we have branch cycles always annotate them. 
*/ - if (bs && bs->nr && bs->entries[0].flags.cycles) { + if (bs && bs->nr && entries[0].flags.cycles) { int i; bi = sample__resolve_bstack(sample, al); diff --git a/tools/perf/util/include/linux/linkage.h b/tools/perf/util/include/linux/linkage.h index f01d48a8d707987de71244cec8df09353c592129..b8a5159361b4123651509d9510e851a51f6c1a42 100644 --- a/tools/perf/util/include/linux/linkage.h +++ b/tools/perf/util/include/linux/linkage.h @@ -5,10 +5,93 @@ /* linkage.h ... for including arch/x86/lib/memcpy_64.S */ -#define ENTRY(name) \ - .globl name; \ +/* Some toolchains use other characters (e.g. '`') to mark new line in macro */ +#ifndef ASM_NL +#define ASM_NL ; +#endif + +#ifndef __ALIGN +#define __ALIGN .align 4,0x90 +#define __ALIGN_STR ".align 4,0x90" +#endif + +/* SYM_T_FUNC -- type used by assembler to mark functions */ +#ifndef SYM_T_FUNC +#define SYM_T_FUNC STT_FUNC +#endif + +/* SYM_A_* -- align the symbol? */ +#define SYM_A_ALIGN ALIGN + +/* SYM_L_* -- linkage of symbols */ +#define SYM_L_GLOBAL(name) .globl name +#define SYM_L_LOCAL(name) /* nothing */ + +#define ALIGN __ALIGN + +/* === generic annotations === */ + +/* SYM_ENTRY -- use only if you have to for non-paired symbols */ +#ifndef SYM_ENTRY +#define SYM_ENTRY(name, linkage, align...) \ + linkage(name) ASM_NL \ + align ASM_NL \ name: +#endif + +/* SYM_START -- use only if you have to */ +#ifndef SYM_START +#define SYM_START(name, linkage, align...) 
\ + SYM_ENTRY(name, linkage, align) +#endif + +/* SYM_END -- use only if you have to */ +#ifndef SYM_END +#define SYM_END(name, sym_type) \ + .type name sym_type ASM_NL \ + .size name, .-name +#endif + +/* + * SYM_FUNC_START_ALIAS -- use where there are two global names for one + * function + */ +#ifndef SYM_FUNC_START_ALIAS +#define SYM_FUNC_START_ALIAS(name) \ + SYM_START(name, SYM_L_GLOBAL, SYM_A_ALIGN) +#endif + +/* SYM_FUNC_START -- use for global functions */ +#ifndef SYM_FUNC_START +/* + * The same as SYM_FUNC_START_ALIAS, but we will need to distinguish these two + * later. + */ +#define SYM_FUNC_START(name) \ + SYM_START(name, SYM_L_GLOBAL, SYM_A_ALIGN) +#endif + +/* SYM_FUNC_START_LOCAL -- use for local functions */ +#ifndef SYM_FUNC_START_LOCAL +/* the same as SYM_FUNC_START_LOCAL_ALIAS, see comment near SYM_FUNC_START */ +#define SYM_FUNC_START_LOCAL(name) \ + SYM_START(name, SYM_L_LOCAL, SYM_A_ALIGN) +#endif + +/* SYM_FUNC_END_ALIAS -- the end of LOCAL_ALIASed or ALIASed function */ +#ifndef SYM_FUNC_END_ALIAS +#define SYM_FUNC_END_ALIAS(name) \ + SYM_END(name, SYM_T_FUNC) +#endif -#define ENDPROC(name) +/* + * SYM_FUNC_END -- the end of SYM_FUNC_START_LOCAL, SYM_FUNC_START, + * SYM_FUNC_START_WEAK, ... 
+ */ +#ifndef SYM_FUNC_END +/* the same as SYM_FUNC_END_ALIAS, see comment near SYM_FUNC_START */ +#define SYM_FUNC_END(name) \ + SYM_END(name, SYM_T_FUNC) +#endif #endif /* PERF_LINUX_LINKAGE_H_ */ diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c index b40832419a279b6d5df1f4cbfc6926a4822804b0..5e72294bc030715900aba87dd6a649358cf7bd01 100644 --- a/tools/perf/util/intel-pt.c +++ b/tools/perf/util/intel-pt.c @@ -33,6 +33,7 @@ #include "tsc.h" #include "intel-pt.h" #include "config.h" +#include "util/perf_api_probe.h" #include "util/synthetic-events.h" #include "time-utils.h" @@ -1278,6 +1279,7 @@ static int intel_pt_synth_branch_sample(struct intel_pt_queue *ptq) struct perf_sample sample = { .ip = 0, }; struct dummy_branch_stack { u64 nr; + u64 hw_idx; struct branch_entry entries; } dummy_bs; @@ -1299,6 +1301,7 @@ static int intel_pt_synth_branch_sample(struct intel_pt_queue *ptq) if (pt->synth_opts.last_branch && sort__mode == SORT_MODE__BRANCH) { dummy_bs = (struct dummy_branch_stack){ .nr = 1, + .hw_idx = -1ULL, .entries = { .from = sample.ip, .to = sample.addr, @@ -1802,13 +1805,29 @@ static int intel_pt_synth_pebs_sample(struct intel_pt_queue *ptq) if (sample_type & PERF_SAMPLE_ADDR && items->has_mem_access_address) sample.addr = items->mem_access_address; - if (sample_type & PERF_SAMPLE_WEIGHT) { + if (sample_type & PERF_SAMPLE_WEIGHT_TYPE) { /* * Refer kernel's setup_pebs_adaptive_sample_data() and * intel_hsw_weight(). */ - if (items->has_mem_access_latency) - sample.weight = items->mem_access_latency; + if (items->has_mem_access_latency) { + u64 weight = items->mem_access_latency >> 32; + + /* + * Starts from SPR, the mem access latency field + * contains both cache latency [47:32] and instruction + * latency [15:0]. The cache latency is the same as the + * mem access latency on previous platforms. + * + * In practice, no memory access could last than 4G + * cycles. 
Use latency >> 32 to distinguish the + * different format of the mem access latency field. + */ + if (weight > 0) + sample.weight = weight & 0xffff; + else + sample.weight = items->mem_access_latency; + } if (!sample.weight && items->has_tsx_aux_info) { /* Cycles last block */ sample.weight = (u32)items->tsx_aux_info; diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c index 8c3addc2e9e1e23c85aa92879cd0d60fc9aa5916..0b51c453bfda860511e99ef53e97d024adc9eaf0 100644 --- a/tools/perf/util/machine.c +++ b/tools/perf/util/machine.c @@ -650,6 +650,16 @@ int machine__process_namespaces_event(struct machine *machine __maybe_unused, return err; } +int machine__process_cgroup_event(struct machine *machine __maybe_unused, + union perf_event *event, + struct perf_sample *sample __maybe_unused) +{ + if (dump_trace) + perf_event__fprintf_cgroup(event, stdout); + + return 0; +} + int machine__process_lost_event(struct machine *machine __maybe_unused, union perf_event *event, struct perf_sample *sample __maybe_unused) { @@ -1651,6 +1661,12 @@ int machine__process_mmap2_event(struct machine *machine, { struct thread *thread; struct map *map; + struct dso_id dso_id = { + .maj = event->mmap2.maj, + .min = event->mmap2.min, + .ino = event->mmap2.ino, + .ino_generation = event->mmap2.ino_generation, + }; int ret = 0; if (dump_trace) @@ -1671,10 +1687,7 @@ int machine__process_mmap2_event(struct machine *machine, map = map__new(machine, event->mmap2.start, event->mmap2.len, event->mmap2.pgoff, - event->mmap2.maj, - event->mmap2.min, event->mmap2.ino, - event->mmap2.ino_generation, - event->mmap2.prot, + &dso_id, event->mmap2.prot, event->mmap2.flags, event->mmap2.filename, thread); @@ -1727,9 +1740,7 @@ int machine__process_mmap_event(struct machine *machine, union perf_event *event map = map__new(machine, event->mmap.start, event->mmap.len, event->mmap.pgoff, - 0, 0, 0, 0, prot, 0, - event->mmap.filename, - thread); + NULL, prot, 0, event->mmap.filename, thread); if (map 
== NULL) goto out_problem_map; @@ -1885,6 +1896,8 @@ int machine__process_event(struct machine *machine, union perf_event *event, ret = machine__process_mmap_event(machine, event, sample); break; case PERF_RECORD_NAMESPACES: ret = machine__process_namespaces_event(machine, event, sample); break; + case PERF_RECORD_CGROUP: + ret = machine__process_cgroup_event(machine, event, sample); break; case PERF_RECORD_MMAP2: ret = machine__process_mmap2_event(machine, event, sample); break; case PERF_RECORD_FORK: @@ -2082,15 +2095,16 @@ struct branch_info *sample__resolve_bstack(struct perf_sample *sample, { unsigned int i; const struct branch_stack *bs = sample->branch_stack; + struct branch_entry *entries = perf_sample__branch_entries(sample); struct branch_info *bi = calloc(bs->nr, sizeof(struct branch_info)); if (!bi) return NULL; for (i = 0; i < bs->nr; i++) { - ip__resolve_ams(al->thread, &bi[i].to, bs->entries[i].to); - ip__resolve_ams(al->thread, &bi[i].from, bs->entries[i].from); - bi[i].flags = bs->entries[i].flags; + ip__resolve_ams(al->thread, &bi[i].to, entries[i].to); + ip__resolve_ams(al->thread, &bi[i].from, entries[i].from); + bi[i].flags = entries[i].flags; } return bi; } @@ -2186,6 +2200,7 @@ static int resolve_lbr_callchain_sample(struct thread *thread, /* LBR only affects the user callchain */ if (i != chain_nr) { struct branch_stack *lbr_stack = sample->branch_stack; + struct branch_entry *entries = perf_sample__branch_entries(sample); int lbr_nr = lbr_stack->nr, j, k; bool branch; struct branch_flags *flags; @@ -2211,31 +2226,29 @@ static int resolve_lbr_callchain_sample(struct thread *thread, ip = chain->ips[j]; else if (j > i + 1) { k = j - i - 2; - ip = lbr_stack->entries[k].from; + ip = entries[k].from; branch = true; - flags = &lbr_stack->entries[k].flags; + flags = &entries[k].flags; } else { - ip = lbr_stack->entries[0].to; + ip = entries[0].to; branch = true; - flags = &lbr_stack->entries[0].flags; - branch_from = - lbr_stack->entries[0].from; + 
flags = &entries[0].flags; + branch_from = entries[0].from; } } else { if (j < lbr_nr) { k = lbr_nr - j - 1; - ip = lbr_stack->entries[k].from; + ip = entries[k].from; branch = true; - flags = &lbr_stack->entries[k].flags; + flags = &entries[k].flags; } else if (j > lbr_nr) ip = chain->ips[i + 1 - (j - lbr_nr)]; else { - ip = lbr_stack->entries[0].to; + ip = entries[0].to; branch = true; - flags = &lbr_stack->entries[0].flags; - branch_from = - lbr_stack->entries[0].from; + flags = &entries[0].flags; + branch_from = entries[0].from; } } @@ -2282,6 +2295,7 @@ static int thread__resolve_callchain_sample(struct thread *thread, int max_stack) { struct branch_stack *branch = sample->branch_stack; + struct branch_entry *entries = perf_sample__branch_entries(sample); struct ip_callchain *chain = sample->callchain; int chain_nr = 0; u8 cpumode = PERF_RECORD_MISC_USER; @@ -2329,7 +2343,7 @@ static int thread__resolve_callchain_sample(struct thread *thread, for (i = 0; i < nr; i++) { if (callchain_param.order == ORDER_CALLEE) { - be[i] = branch->entries[i]; + be[i] = entries[i]; if (chain == NULL) continue; @@ -2348,7 +2362,7 @@ static int thread__resolve_callchain_sample(struct thread *thread, be[i].from >= chain->ips[first_call] - 8) first_call++; } else - be[i] = branch->entries[branch->nr - i - 1]; + be[i] = entries[branch->nr - i - 1]; } memset(iter, 0, sizeof(struct iterations) * nr); @@ -2701,9 +2715,14 @@ u8 machine__addr_cpumode(struct machine *machine, u8 cpumode, u64 addr) return addr_cpumode; } +struct dso *machine__findnew_dso_id(struct machine *machine, const char *filename, struct dso_id *id) +{ + return dsos__findnew_id(&machine->dsos, filename, id); +} + struct dso *machine__findnew_dso(struct machine *machine, const char *filename) { - return dsos__findnew(&machine->dsos, filename); + return machine__findnew_dso_id(machine, filename, NULL); } char *machine__resolve_kernel_addr(void *vmachine, unsigned long long *addrp, char **modp) diff --git 
a/tools/perf/util/machine.h b/tools/perf/util/machine.h index 18e13c0ccd6afd9c82c5e024993b2d43494ebc7c..fbb13c07f07e9255a5ac8a64b99a117dd3d08c55 100644 --- a/tools/perf/util/machine.h +++ b/tools/perf/util/machine.h @@ -11,6 +11,7 @@ struct addr_location; struct branch_stack; struct dso; +struct dso_id; struct evsel; struct perf_sample; struct symbol; @@ -127,6 +128,9 @@ int machine__process_switch_event(struct machine *machine, int machine__process_namespaces_event(struct machine *machine, union perf_event *event, struct perf_sample *sample); +int machine__process_cgroup_event(struct machine *machine, + union perf_event *event, + struct perf_sample *sample); int machine__process_mmap_event(struct machine *machine, union perf_event *event, struct perf_sample *sample); int machine__process_mmap2_event(struct machine *machine, union perf_event *event, @@ -202,6 +206,7 @@ int machine__nr_cpus_avail(struct machine *machine); struct thread *__machine__findnew_thread(struct machine *machine, pid_t pid, pid_t tid); struct thread *machine__findnew_thread(struct machine *machine, pid_t pid, pid_t tid); +struct dso *machine__findnew_dso_id(struct machine *machine, const char *filename, struct dso_id *id); struct dso *machine__findnew_dso(struct machine *machine, const char *filename); size_t machine__fprintf(struct machine *machine, FILE *fp); diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c index 571e99c908a0e165d72d8c70706872e8dc53c5d1..2028922982ba9da7af9b2222d622c4af28ae9a84 100644 --- a/tools/perf/util/map.c +++ b/tools/perf/util/map.c @@ -28,21 +28,6 @@ static void __maps__insert(struct maps *maps, struct map *map); static void __maps__insert_name(struct maps *maps, struct map *map); -static inline int is_anon_memory(const char *filename, u32 flags) -{ - return flags & MAP_HUGETLB || - !strcmp(filename, "//anon") || - !strncmp(filename, "/dev/zero", sizeof("/dev/zero") - 1) || - !strncmp(filename, "/anon_hugepage", sizeof("/anon_hugepage") - 1); -} - -static 
inline int is_no_dso_memory(const char *filename) -{ - return !strncmp(filename, "[stack", 6) || - !strncmp(filename, "/SYSV",5) || - !strcmp(filename, "[heap]"); -} - static inline int is_android_lib(const char *filename) { return !strncmp(filename, "/data/app-lib", 13) || @@ -145,8 +130,8 @@ void map__init(struct map *map, u64 start, u64 end, u64 pgoff, struct dso *dso) } struct map *map__new(struct machine *machine, u64 start, u64 len, - u64 pgoff, u32 d_maj, u32 d_min, u64 ino, - u64 ino_gen, u32 prot, u32 flags, char *filename, + u64 pgoff, struct dso_id *id, + u32 prot, u32 flags, char *filename, struct thread *thread) { struct map *map = malloc(sizeof(*map)); @@ -159,14 +144,9 @@ struct map *map__new(struct machine *machine, u64 start, u64 len, int anon, no_dso, vdso, android; android = is_android_lib(filename); - anon = is_anon_memory(filename, flags); + anon = is_anon_memory(filename) || flags & MAP_HUGETLB; vdso = is_vdso_map(filename); no_dso = is_no_dso_memory(filename); - - map->maj = d_maj; - map->min = d_min; - map->ino = ino; - map->ino_generation = ino_gen; map->prot = prot; map->flags = flags; nsi = nsinfo__get(thread->nsinfo); @@ -196,7 +176,7 @@ struct map *map__new(struct machine *machine, u64 start, u64 len, pgoff = 0; dso = machine__findnew_vdso(machine, thread); } else - dso = machine__findnew_dso(machine, filename); + dso = machine__findnew_dso_id(machine, filename, id); if (dso == NULL) goto out_delete; diff --git a/tools/perf/util/map.h b/tools/perf/util/map.h index c3614195ddc7ee29e3834af1280658dde4fe7d0f..0be5e970b9a8e51dfbb8d9027f040ccf86c992c7 100644 --- a/tools/perf/util/map.h +++ b/tools/perf/util/map.h @@ -32,9 +32,6 @@ struct map { u32 flags; u64 pgoff; u64 reloc; - u32 maj, min; /* only valid for MMAP2 record */ - u64 ino; /* only valid for MMAP2 record */ - u64 ino_generation;/* only valid for MMAP2 record */ /* ip -> dso rip */ u64 (*map_ip)(struct map *, u64); @@ -110,9 +107,11 @@ struct thread; void map__init(struct map *map, 
u64 start, u64 end, u64 pgoff, struct dso *dso); + +struct dso_id; + struct map *map__new(struct machine *machine, u64 start, u64 len, - u64 pgoff, u32 d_maj, u32 d_min, u64 ino, - u64 ino_gen, u32 prot, u32 flags, + u64 pgoff, struct dso_id *id, u32 prot, u32 flags, char *filename, struct thread *thread); struct map *map__new2(u64 start, struct dso *dso); void map__delete(struct map *map); @@ -176,4 +175,18 @@ static inline bool is_entry_trampoline(const char *name) return !strcmp(name, ENTRY_TRAMPOLINE_NAME); } + +static inline int is_anon_memory(const char *filename) +{ + return !strcmp(filename, "//anon") || + !strncmp(filename, "/dev/zero", sizeof("/dev/zero") - 1) || + !strncmp(filename, "/anon_hugepage", sizeof("/anon_hugepage") - 1); +} + +static inline int is_no_dso_memory(const char *filename) +{ + return !strncmp(filename, "[stack", 6) || + !strncmp(filename, "/SYSV", 5) || + !strcmp(filename, "[heap]"); +} #endif /* __PERF_MAP_H */ diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c index a35dc57d59950a64b3e0a53cf99bca969c16d374..2a8bf0ab861c3a9e740094ed92d4f880e2f3a79e 100644 --- a/tools/perf/util/mmap.c +++ b/tools/perf/util/mmap.c @@ -13,6 +13,7 @@ #include #include #include // sysconf() +#include #ifdef HAVE_LIBNUMA_SUPPORT #include #endif @@ -23,116 +24,9 @@ #include "../perf.h" #include /* page_size */ -size_t perf_mmap__mmap_len(struct mmap *map) +size_t mmap__mmap_len(struct mmap *map) { - return map->core.mask + 1 + page_size; -} - -/* When check_messup is true, 'end' must points to a good entry */ -static union perf_event *perf_mmap__read(struct mmap *map, - u64 *startp, u64 end) -{ - unsigned char *data = map->core.base + page_size; - union perf_event *event = NULL; - int diff = end - *startp; - - if (diff >= (int)sizeof(event->header)) { - size_t size; - - event = (union perf_event *)&data[*startp & map->core.mask]; - size = event->header.size; - - if (size < sizeof(event->header) || diff < (int)size) - return NULL; - - /* - * Event 
straddles the mmap boundary -- header should always - * be inside due to u64 alignment of output. - */ - if ((*startp & map->core.mask) + size != ((*startp + size) & map->core.mask)) { - unsigned int offset = *startp; - unsigned int len = min(sizeof(*event), size), cpy; - void *dst = map->core.event_copy; - - do { - cpy = min(map->core.mask + 1 - (offset & map->core.mask), len); - memcpy(dst, &data[offset & map->core.mask], cpy); - offset += cpy; - dst += cpy; - len -= cpy; - } while (len); - - event = (union perf_event *)map->core.event_copy; - } - - *startp += size; - } - - return event; -} - -/* - * Read event from ring buffer one by one. - * Return one event for each call. - * - * Usage: - * perf_mmap__read_init() - * while(event = perf_mmap__read_event()) { - * //process the event - * perf_mmap__consume() - * } - * perf_mmap__read_done() - */ -union perf_event *perf_mmap__read_event(struct mmap *map) -{ - union perf_event *event; - - /* - * Check if event was unmapped due to a POLLHUP/POLLERR. 
- */ - if (!refcount_read(&map->core.refcnt)) - return NULL; - - /* non-overwirte doesn't pause the ringbuffer */ - if (!map->core.overwrite) - map->core.end = perf_mmap__read_head(map); - - event = perf_mmap__read(map, &map->core.start, map->core.end); - - if (!map->core.overwrite) - map->core.prev = map->core.start; - - return event; -} - -static bool perf_mmap__empty(struct mmap *map) -{ - return perf_mmap__read_head(map) == map->core.prev && !map->auxtrace_mmap.base; -} - -void perf_mmap__get(struct mmap *map) -{ - refcount_inc(&map->core.refcnt); -} - -void perf_mmap__put(struct mmap *map) -{ - BUG_ON(map->core.base && refcount_read(&map->core.refcnt) == 0); - - if (refcount_dec_and_test(&map->core.refcnt)) - perf_mmap__munmap(map); -} - -void perf_mmap__consume(struct mmap *map) -{ - if (!map->core.overwrite) { - u64 old = map->core.prev; - - perf_mmap__write_tail(map, old); - } - - if (refcount_read(&map->core.refcnt) == 1 && perf_mmap__empty(map)) - perf_mmap__put(map); + return perf_mmap__mmap_len(&map->core); } int __weak auxtrace_mmap__mmap(struct auxtrace_mmap *mm __maybe_unused, @@ -170,7 +64,7 @@ static int perf_mmap__aio_enabled(struct mmap *map) #ifdef HAVE_LIBNUMA_SUPPORT static int perf_mmap__aio_alloc(struct mmap *map, int idx) { - map->aio.data[idx] = mmap(NULL, perf_mmap__mmap_len(map), PROT_READ|PROT_WRITE, + map->aio.data[idx] = mmap(NULL, mmap__mmap_len(map), PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 0, 0); if (map->aio.data[idx] == MAP_FAILED) { map->aio.data[idx] = NULL; @@ -183,7 +77,7 @@ static int perf_mmap__aio_alloc(struct mmap *map, int idx) static void perf_mmap__aio_free(struct mmap *map, int idx) { if (map->aio.data[idx]) { - munmap(map->aio.data[idx], perf_mmap__mmap_len(map)); + munmap(map->aio.data[idx], mmap__mmap_len(map)); map->aio.data[idx] = NULL; } } @@ -196,7 +90,7 @@ static int perf_mmap__aio_bind(struct mmap *map, int idx, int cpu, int affinity) if (affinity != PERF_AFFINITY_SYS && cpu__max_node() > 1) { data = 
map->aio.data[idx]; - mmap_len = perf_mmap__mmap_len(map); + mmap_len = mmap__mmap_len(map); node_mask = 1UL << cpu__get_node(cpu); if (mbind(data, mmap_len, MPOL_BIND, &node_mask, 1, 0)) { pr_err("Failed to bind [%p-%p] AIO buffer to node %d: error %m\n", @@ -210,7 +104,7 @@ static int perf_mmap__aio_bind(struct mmap *map, int idx, int cpu, int affinity) #else /* !HAVE_LIBNUMA_SUPPORT */ static int perf_mmap__aio_alloc(struct mmap *map, int idx) { - map->aio.data[idx] = malloc(perf_mmap__mmap_len(map)); + map->aio.data[idx] = malloc(mmap__mmap_len(map)); if (map->aio.data[idx] == NULL) return -1; @@ -311,19 +205,13 @@ static void perf_mmap__aio_munmap(struct mmap *map __maybe_unused) } #endif -void perf_mmap__munmap(struct mmap *map) +void mmap__munmap(struct mmap *map) { perf_mmap__aio_munmap(map); if (map->data != NULL) { - munmap(map->data, perf_mmap__mmap_len(map)); + munmap(map->data, mmap__mmap_len(map)); map->data = NULL; } - if (map->core.base != NULL) { - munmap(map->core.base, perf_mmap__mmap_len(map)); - map->core.base = NULL; - map->core.fd = -1; - refcount_set(&map->core.refcnt, 0); - } auxtrace_mmap__munmap(&map->auxtrace_mmap); } @@ -353,7 +241,7 @@ static void perf_mmap__setup_affinity_mask(struct mmap *map, struct mmap_params CPU_SET(map->core.cpu, &map->affinity_mask); } -int perf_mmap__mmap(struct mmap *map, struct mmap_params *mp, int fd, int cpu) +int mmap__mmap(struct mmap *map, struct mmap_params *mp, int fd, int cpu) { /* * The last one will be done at perf_mmap__consume(), so that we @@ -369,18 +257,12 @@ int perf_mmap__mmap(struct mmap *map, struct mmap_params *mp, int fd, int cpu) * perf_evlist__filter_pollfd(). 
*/ refcount_set(&map->core.refcnt, 2); - map->core.prev = 0; - map->core.mask = mp->mask; - map->core.base = mmap(NULL, perf_mmap__mmap_len(map), mp->prot, - MAP_SHARED, fd, 0); - if (map->core.base == MAP_FAILED) { + + if (perf_mmap__mmap(&map->core, &mp->core, fd, cpu)) { pr_debug2("failed to mmap perf event ring buffer, error %d\n", errno); - map->core.base = NULL; return -1; } - map->core.fd = fd; - map->core.cpu = cpu; perf_mmap__setup_affinity_mask(map, mp); @@ -389,7 +271,7 @@ int perf_mmap__mmap(struct mmap *map, struct mmap_params *mp, int fd, int cpu) map->comp_level = mp->comp_level; if (map->comp_level && !perf_mmap__aio_enabled(map)) { - map->data = mmap(NULL, perf_mmap__mmap_len(map), PROT_READ|PROT_WRITE, + map->data = mmap(NULL, mmap__mmap_len(map), PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 0, 0); if (map->data == MAP_FAILED) { pr_debug2("failed to mmap data buffer, error %d\n", @@ -406,96 +288,16 @@ int perf_mmap__mmap(struct mmap *map, struct mmap_params *mp, int fd, int cpu) return perf_mmap__aio_mmap(map, mp); } -static int overwrite_rb_find_range(void *buf, int mask, u64 *start, u64 *end) -{ - struct perf_event_header *pheader; - u64 evt_head = *start; - int size = mask + 1; - - pr_debug2("%s: buf=%p, start=%"PRIx64"\n", __func__, buf, *start); - pheader = (struct perf_event_header *)(buf + (*start & mask)); - while (true) { - if (evt_head - *start >= (unsigned int)size) { - pr_debug("Finished reading overwrite ring buffer: rewind\n"); - if (evt_head - *start > (unsigned int)size) - evt_head -= pheader->size; - *end = evt_head; - return 0; - } - - pheader = (struct perf_event_header *)(buf + (evt_head & mask)); - - if (pheader->size == 0) { - pr_debug("Finished reading overwrite ring buffer: get start\n"); - *end = evt_head; - return 0; - } - - evt_head += pheader->size; - pr_debug3("move evt_head: %"PRIx64"\n", evt_head); - } - WARN_ONCE(1, "Shouldn't get here\n"); - return -1; -} - -/* - * Report the start and end of the available data 
in ringbuffer - */ -static int __perf_mmap__read_init(struct mmap *md) -{ - u64 head = perf_mmap__read_head(md); - u64 old = md->core.prev; - unsigned char *data = md->core.base + page_size; - unsigned long size; - - md->core.start = md->core.overwrite ? head : old; - md->core.end = md->core.overwrite ? old : head; - - if ((md->core.end - md->core.start) < md->core.flush) - return -EAGAIN; - - size = md->core.end - md->core.start; - if (size > (unsigned long)(md->core.mask) + 1) { - if (!md->core.overwrite) { - WARN_ONCE(1, "failed to keep up with mmap data. (warn only once)\n"); - - md->core.prev = head; - perf_mmap__consume(md); - return -EAGAIN; - } - - /* - * Backward ring buffer is full. We still have a chance to read - * most of data from it. - */ - if (overwrite_rb_find_range(data, md->core.mask, &md->core.start, &md->core.end)) - return -EINVAL; - } - - return 0; -} - -int perf_mmap__read_init(struct mmap *map) -{ - /* - * Check if event was unmapped due to a POLLHUP/POLLERR. - */ - if (!refcount_read(&map->core.refcnt)) - return -ENOENT; - - return __perf_mmap__read_init(map); -} - int perf_mmap__push(struct mmap *md, void *to, int push(struct mmap *map, void *to, void *buf, size_t size)) { - u64 head = perf_mmap__read_head(md); + u64 head = perf_mmap__read_head(&md->core); unsigned char *data = md->core.base + page_size; unsigned long size; void *buf; int rc = 0; - rc = perf_mmap__read_init(md); + rc = perf_mmap__read_init(&md->core); if (rc < 0) return (rc == -EAGAIN) ? 1 : -1; @@ -522,24 +324,7 @@ int perf_mmap__push(struct mmap *md, void *to, } md->core.prev = head; - perf_mmap__consume(md); + perf_mmap__consume(&md->core); out: return rc; } - -/* - * Mandatory for overwrite mode - * The direction of overwrite mode is backward. - * The last perf_mmap__read() will set tail to map->core.prev. - * Need to correct the map->core.prev to head which is the end of next read. 
- */ -void perf_mmap__read_done(struct mmap *map) -{ - /* - * Check if event was unmapped due to a POLLHUP/POLLERR. - */ - if (!refcount_read(&map->core.refcnt)) - return; - - map->core.prev = perf_mmap__read_head(map); -} diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h index e567c1c875bdb8f65d1d82b8b3a8a47fff538191..bee4e83f7109d33ebc495658f5534bc74a75a215 100644 --- a/tools/perf/util/mmap.h +++ b/tools/perf/util/mmap.h @@ -37,37 +37,19 @@ struct mmap { }; struct mmap_params { - int prot, mask, nr_cblocks, affinity, flush, comp_level; + struct perf_mmap_param core; + int nr_cblocks, affinity, flush, comp_level; struct auxtrace_mmap_params auxtrace_mp; }; -int perf_mmap__mmap(struct mmap *map, struct mmap_params *mp, int fd, int cpu); -void perf_mmap__munmap(struct mmap *map); - -void perf_mmap__get(struct mmap *map); -void perf_mmap__put(struct mmap *map); - -void perf_mmap__consume(struct mmap *map); - -static inline u64 perf_mmap__read_head(struct mmap *mm) -{ - return ring_buffer_read_head(mm->core.base); -} - -static inline void perf_mmap__write_tail(struct mmap *md, u64 tail) -{ - ring_buffer_write_tail(md->core.base, tail); -} +int mmap__mmap(struct mmap *map, struct mmap_params *mp, int fd, int cpu); +void mmap__munmap(struct mmap *map); union perf_event *perf_mmap__read_forward(struct mmap *map); -union perf_event *perf_mmap__read_event(struct mmap *map); - int perf_mmap__push(struct mmap *md, void *to, int push(struct mmap *map, void *to, void *buf, size_t size)); -size_t perf_mmap__mmap_len(struct mmap *map); +size_t mmap__mmap_len(struct mmap *map); -int perf_mmap__read_init(struct mmap *md); -void perf_mmap__read_done(struct mmap *map); #endif /*__PERF_MMAP_H */ diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c index 229ed0713280bbe921e16e3dbb1979189cd060b0..27fd5501e053c8f7dc53c5c3b73c99c0140d7176 100644 --- a/tools/perf/util/parse-events.c +++ b/tools/perf/util/parse-events.c @@ -342,8 +342,11 @@ 
__add_event(struct list_head *list, int *idx, cpu_list ? perf_cpu_map__new(cpu_list) : NULL; event_attr_init(attr); + if (pmu && attr->type == PERF_TYPE_RAW) + perf_pmu__warn_invalid_config(pmu, attr->config, name); - evsel = perf_evsel__new_idx(attr, *idx); + + evsel = evsel__new_idx(attr, *idx); if (!evsel) return NULL; @@ -526,9 +529,8 @@ static int add_tracepoint(struct list_head *list, int *idx, struct parse_events_error *err, struct list_head *head_config) { - struct evsel *evsel; + struct evsel *evsel = evsel__newtp_idx(sys_name, evt_name, (*idx)++); - evsel = perf_evsel__newtp_idx(sys_name, evt_name, (*idx)++); if (IS_ERR(evsel)) { tracepoint_error(err, PTR_ERR(evsel), sys_name, evt_name); return PTR_ERR(evsel); @@ -1201,8 +1203,7 @@ static int config_attr(struct perf_event_attr *attr, static int get_config_terms(struct list_head *head_config, struct list_head *head_terms __maybe_unused) { -#define ADD_CONFIG_TERM(__type, __name, __val) \ -do { \ +#define ADD_CONFIG_TERM(__type) \ struct perf_evsel_config_term *__t; \ \ __t = zalloc(sizeof(*__t)); \ @@ -1211,9 +1212,19 @@ do { \ \ INIT_LIST_HEAD(&__t->list); \ __t->type = PERF_EVSEL__CONFIG_TERM_ ## __type; \ - __t->val.__name = __val; \ __t->weak = term->weak; \ - list_add_tail(&__t->list, head_terms); \ + list_add_tail(&__t->list, head_terms) + +#define ADD_CONFIG_TERM_VAL(__type, __name, __val) \ +do { \ + ADD_CONFIG_TERM(__type); \ + __t->val.__name = __val; \ +} while (0) + +#define ADD_CONFIG_TERM_STR(__type, __val) \ +do { \ + ADD_CONFIG_TERM(__type); \ + __t->val.str = __val; \ } while (0) struct parse_events_term *term; @@ -1221,59 +1232,101 @@ do { \ list_for_each_entry(term, head_config, list) { switch (term->type_term) { case PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD: - ADD_CONFIG_TERM(PERIOD, period, term->val.num); + ADD_CONFIG_TERM_VAL(PERIOD, period, term->val.num); break; case PARSE_EVENTS__TERM_TYPE_SAMPLE_FREQ: - ADD_CONFIG_TERM(FREQ, freq, term->val.num); + ADD_CONFIG_TERM_VAL(FREQ, freq, 
term->val.num); break; case PARSE_EVENTS__TERM_TYPE_TIME: - ADD_CONFIG_TERM(TIME, time, term->val.num); + ADD_CONFIG_TERM_VAL(TIME, time, term->val.num); break; case PARSE_EVENTS__TERM_TYPE_CALLGRAPH: - ADD_CONFIG_TERM(CALLGRAPH, callgraph, term->val.str); + ADD_CONFIG_TERM_STR(CALLGRAPH, term->val.str); break; case PARSE_EVENTS__TERM_TYPE_BRANCH_SAMPLE_TYPE: - ADD_CONFIG_TERM(BRANCH, branch, term->val.str); + ADD_CONFIG_TERM_STR(BRANCH, term->val.str); break; case PARSE_EVENTS__TERM_TYPE_STACKSIZE: - ADD_CONFIG_TERM(STACK_USER, stack_user, term->val.num); + ADD_CONFIG_TERM_VAL(STACK_USER, stack_user, + term->val.num); break; case PARSE_EVENTS__TERM_TYPE_INHERIT: - ADD_CONFIG_TERM(INHERIT, inherit, term->val.num ? 1 : 0); + ADD_CONFIG_TERM_VAL(INHERIT, inherit, + term->val.num ? 1 : 0); break; case PARSE_EVENTS__TERM_TYPE_NOINHERIT: - ADD_CONFIG_TERM(INHERIT, inherit, term->val.num ? 0 : 1); + ADD_CONFIG_TERM_VAL(INHERIT, inherit, + term->val.num ? 0 : 1); break; case PARSE_EVENTS__TERM_TYPE_MAX_STACK: - ADD_CONFIG_TERM(MAX_STACK, max_stack, term->val.num); + ADD_CONFIG_TERM_VAL(MAX_STACK, max_stack, + term->val.num); break; case PARSE_EVENTS__TERM_TYPE_MAX_EVENTS: - ADD_CONFIG_TERM(MAX_EVENTS, max_events, term->val.num); + ADD_CONFIG_TERM_VAL(MAX_EVENTS, max_events, + term->val.num); break; case PARSE_EVENTS__TERM_TYPE_OVERWRITE: - ADD_CONFIG_TERM(OVERWRITE, overwrite, term->val.num ? 1 : 0); + ADD_CONFIG_TERM_VAL(OVERWRITE, overwrite, + term->val.num ? 1 : 0); break; case PARSE_EVENTS__TERM_TYPE_NOOVERWRITE: - ADD_CONFIG_TERM(OVERWRITE, overwrite, term->val.num ? 0 : 1); + ADD_CONFIG_TERM_VAL(OVERWRITE, overwrite, + term->val.num ? 0 : 1); break; case PARSE_EVENTS__TERM_TYPE_DRV_CFG: - ADD_CONFIG_TERM(DRV_CFG, drv_cfg, term->val.str); + ADD_CONFIG_TERM_STR(DRV_CFG, term->val.str); break; case PARSE_EVENTS__TERM_TYPE_PERCORE: - ADD_CONFIG_TERM(PERCORE, percore, - term->val.num ? true : false); + ADD_CONFIG_TERM_VAL(PERCORE, percore, + term->val.num ? 
true : false); break; case PARSE_EVENTS__TERM_TYPE_AUX_OUTPUT: - ADD_CONFIG_TERM(AUX_OUTPUT, aux_output, term->val.num ? 1 : 0); + ADD_CONFIG_TERM_VAL(AUX_OUTPUT, aux_output, + term->val.num ? 1 : 0); break; case PARSE_EVENTS__TERM_TYPE_AUX_SAMPLE_SIZE: - ADD_CONFIG_TERM(AUX_SAMPLE_SIZE, aux_sample_size, term->val.num); + ADD_CONFIG_TERM_VAL(AUX_SAMPLE_SIZE, aux_sample_size, + term->val.num); break; default: break; } } -#undef ADD_EVSEL_CONFIG + return 0; +} + +/* + * Add PERF_EVSEL__CONFIG_TERM_CFG_CHG where cfg_chg will have a bit set for + * each bit of attr->config that the user has changed. + */ +static int get_config_chgs(struct perf_pmu *pmu, struct list_head *head_config, + struct list_head *head_terms) +{ + struct parse_events_term *term; + u64 bits = 0; + int type; + + list_for_each_entry(term, head_config, list) { + switch (term->type_term) { + case PARSE_EVENTS__TERM_TYPE_USER: + type = perf_pmu__format_type(&pmu->format, term->config); + if (type != PERF_PMU_FORMAT_VALUE_CONFIG) + continue; + bits |= perf_pmu__format_bits(&pmu->format, term->config); + break; + case PARSE_EVENTS__TERM_TYPE_CONFIG: + bits = ~(u64)0; + break; + default: + break; + } + } + + if (bits) + ADD_CONFIG_TERM_VAL(CFG_CHG, cfg_chg, bits); + +#undef ADD_CONFIG_TERM return 0; } @@ -1402,6 +1455,13 @@ int parse_events_add_pmu(struct parse_events_state *parse_state, if (get_config_terms(head_config, &config_terms)) return -ENOMEM; + /* + * When using default config, record which bits of attr->config were + * changed by the user. 
+ */ + if (pmu->default_config && get_config_chgs(pmu, head_config, &config_terms)) + return -ENOMEM; + if (perf_pmu__config(pmu, &attr, head_config, parse_state->error)) { struct perf_evsel_config_term *pos, *tmp; diff --git a/tools/perf/util/parse-regs-options.c b/tools/perf/util/parse-regs-options.c index 869ef7e22bd91ece4629db60531d0b9b7d343a49..a4a100425b3a29bcb11765e833d9fb9e2d66da38 100644 --- a/tools/perf/util/parse-regs-options.c +++ b/tools/perf/util/parse-regs-options.c @@ -13,7 +13,7 @@ static int __parse_regs(const struct option *opt, const char *str, int unset, bool intr) { uint64_t *mode = (uint64_t *)opt->value; - const struct sample_reg *r; + const struct sample_reg *r = NULL; char *s, *os = NULL, *p; int ret = -1; uint64_t mask; @@ -46,19 +46,23 @@ __parse_regs(const struct option *opt, const char *str, int unset, bool intr) if (!strcmp(s, "?")) { fprintf(stderr, "available registers: "); +#ifdef HAVE_PERF_REGS_SUPPORT for (r = sample_reg_masks; r->name; r++) { if (r->mask & mask) fprintf(stderr, "%s ", r->name); } +#endif fputc('\n', stderr); /* just printing available regs */ goto error; } +#ifdef HAVE_PERF_REGS_SUPPORT for (r = sample_reg_masks; r->name; r++) { if ((r->mask & mask) && !strcasecmp(s, r->name)) break; } - if (!r->name) { +#endif + if (!r || !r->name) { ui__warning("Unknown register \"%s\", check man page or run \"perf record %s?\"\n", s, intr ? 
"-I" : "--user-regs="); goto error; diff --git a/tools/perf/util/perf_api_probe.c b/tools/perf/util/perf_api_probe.c new file mode 100644 index 0000000000000000000000000000000000000000..1337965673d7069d5bc50cc3389d51eaa2e37aaa --- /dev/null +++ b/tools/perf/util/perf_api_probe.c @@ -0,0 +1,164 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#include "perf-sys.h" +#include "util/cloexec.h" +#include "util/evlist.h" +#include "util/evsel.h" +#include "util/parse-events.h" +#include "util/perf_api_probe.h" +#include +#include + +typedef void (*setup_probe_fn_t)(struct evsel *evsel); + +static int perf_do_probe_api(setup_probe_fn_t fn, int cpu, const char *str) +{ + struct evlist *evlist; + struct evsel *evsel; + unsigned long flags = perf_event_open_cloexec_flag(); + int err = -EAGAIN, fd; + static pid_t pid = -1; + + evlist = evlist__new(); + if (!evlist) + return -ENOMEM; + + if (parse_events(evlist, str, NULL)) + goto out_delete; + + evsel = evlist__first(evlist); + + while (1) { + fd = sys_perf_event_open(&evsel->core.attr, pid, cpu, -1, flags); + if (fd < 0) { + if (pid == -1 && errno == EACCES) { + pid = 0; + continue; + } + goto out_delete; + } + break; + } + close(fd); + + fn(evsel); + + fd = sys_perf_event_open(&evsel->core.attr, pid, cpu, -1, flags); + if (fd < 0) { + if (errno == EINVAL) + err = -EINVAL; + goto out_delete; + } + close(fd); + err = 0; + +out_delete: + evlist__delete(evlist); + return err; +} + +static bool perf_probe_api(setup_probe_fn_t fn) +{ + const char *try[] = {"cycles:u", "instructions:u", "cpu-clock:u", NULL}; + struct perf_cpu_map *cpus; + int cpu, ret, i = 0; + + cpus = perf_cpu_map__new(NULL); + if (!cpus) + return false; + cpu = cpus->map[0]; + perf_cpu_map__put(cpus); + + do { + ret = perf_do_probe_api(fn, cpu, try[i++]); + if (!ret) + return true; + } while (ret == -EAGAIN && try[i]); + + return false; +} + +static void perf_probe_sample_identifier(struct evsel *evsel) +{ + evsel->core.attr.sample_type |= PERF_SAMPLE_IDENTIFIER; 
+} + +static void perf_probe_comm_exec(struct evsel *evsel) +{ + evsel->core.attr.comm_exec = 1; +} + +static void perf_probe_context_switch(struct evsel *evsel) +{ + evsel->core.attr.context_switch = 1; +} + +bool perf_can_sample_identifier(void) +{ + return perf_probe_api(perf_probe_sample_identifier); +} + +bool perf_can_comm_exec(void) +{ + return perf_probe_api(perf_probe_comm_exec); +} + +bool perf_can_record_switch_events(void) +{ + return perf_probe_api(perf_probe_context_switch); +} + +bool perf_can_record_cpu_wide(void) +{ + struct perf_event_attr attr = { + .type = PERF_TYPE_SOFTWARE, + .config = PERF_COUNT_SW_CPU_CLOCK, + .exclude_kernel = 1, + }; + struct perf_cpu_map *cpus; + int cpu, fd; + + cpus = perf_cpu_map__new(NULL); + if (!cpus) + return false; + cpu = cpus->map[0]; + perf_cpu_map__put(cpus); + + fd = sys_perf_event_open(&attr, -1, cpu, -1, 0); + if (fd < 0) + return false; + close(fd); + + return true; +} + +/* + * Architectures are expected to know if AUX area sampling is supported by the + * hardware. Here we check for kernel support. + */ +bool perf_can_aux_sample(void) +{ + struct perf_event_attr attr = { + .size = sizeof(struct perf_event_attr), + .exclude_kernel = 1, + /* + * Non-zero value causes the kernel to calculate the effective + * attribute size up to that byte. + */ + .aux_sample_size = 1, + }; + int fd; + + fd = sys_perf_event_open(&attr, -1, 0, -1, 0); + /* + * If the kernel attribute is big enough to contain aux_sample_size + * then we assume that it is supported. We are relying on the kernel to + * validate the attribute size before anything else that could be wrong. 
+ */ + if (fd < 0 && errno == E2BIG) + return false; + if (fd >= 0) + close(fd); + + return true; +} diff --git a/tools/perf/util/perf_api_probe.h b/tools/perf/util/perf_api_probe.h new file mode 100644 index 0000000000000000000000000000000000000000..706c3c6426e2fb68043884eeabd690669651fdb6 --- /dev/null +++ b/tools/perf/util/perf_api_probe.h @@ -0,0 +1,14 @@ + +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __PERF_API_PROBE_H +#define __PERF_API_PROBE_H + +#include + +bool perf_can_aux_sample(void); +bool perf_can_comm_exec(void); +bool perf_can_record_cpu_wide(void); +bool perf_can_record_switch_events(void); +bool perf_can_sample_identifier(void); + +#endif // __PERF_API_PROBE_H diff --git a/tools/perf/util/perf_event_attr_fprintf.c b/tools/perf/util/perf_event_attr_fprintf.c index 651203126c71ec49e2caae4725230b3873616f5a..533377743dbcc7089661eaceb2a8d9649e2e165f 100644 --- a/tools/perf/util/perf_event_attr_fprintf.c +++ b/tools/perf/util/perf_event_attr_fprintf.c @@ -35,6 +35,8 @@ static void __p_sample_type(char *buf, size_t size, u64 value) bit_name(BRANCH_STACK), bit_name(REGS_USER), bit_name(STACK_USER), bit_name(IDENTIFIER), bit_name(REGS_INTR), bit_name(DATA_SRC), bit_name(WEIGHT), bit_name(PHYS_ADDR), bit_name(AUX), + bit_name(CGROUP), bit_name(DATA_PAGE_SIZE), bit_name(CODE_PAGE_SIZE), + bit_name(WEIGHT_STRUCT), { .name = NULL, } }; #undef bit_name @@ -50,6 +52,7 @@ static void __p_branch_sample_type(char *buf, size_t size, u64 value) bit_name(ABORT_TX), bit_name(IN_TX), bit_name(NO_TX), bit_name(COND), bit_name(CALL_STACK), bit_name(IND_JUMP), bit_name(CALL), bit_name(NO_FLAGS), bit_name(NO_CYCLES), + bit_name(HW_INDEX), { .name = NULL, } }; #undef bit_name @@ -131,6 +134,7 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr, PRINT_ATTRf(ksymbol, p_unsigned); PRINT_ATTRf(bpf_event, p_unsigned); PRINT_ATTRf(aux_output, p_unsigned); + PRINT_ATTRf(cgroup, p_unsigned); PRINT_ATTRn("{ wakeup_events, wakeup_watermark }", wakeup_events, 
p_unsigned); PRINT_ATTRf(bp_type, p_unsigned); diff --git a/tools/perf/util/perf_regs.c b/tools/perf/util/perf_regs.c index 2774cec1f15fa0561f8e68ae01cb03c3757a8c14..5ee47ae1509c67fcf015ae5e52637d27272af423 100644 --- a/tools/perf/util/perf_regs.c +++ b/tools/perf/util/perf_regs.c @@ -3,10 +3,6 @@ #include "perf_regs.h" #include "event.h" -const struct sample_reg __weak sample_reg_masks[] = { - SMPL_REG_END -}; - int __weak arch_sdt_arg_parse_op(char *old_op __maybe_unused, char **new_op __maybe_unused) { diff --git a/tools/perf/util/perf_regs.h b/tools/perf/util/perf_regs.h index ec7640cc4c91124d00c27e5ed2dae78739530fd0..a4549912618443cbc3e4f79d94a3861187f25b7a 100644 --- a/tools/perf/util/perf_regs.h +++ b/tools/perf/util/perf_regs.h @@ -15,8 +15,6 @@ struct sample_reg { #define SMPL_REG2(n, b) { .name = #n, .mask = 3ULL << (b) } #define SMPL_REG_END { .name = NULL } -extern const struct sample_reg sample_reg_masks[]; - enum { SDT_ARG_VALID = 0, SDT_ARG_SKIP, @@ -27,6 +25,8 @@ uint64_t arch__intr_reg_mask(void); uint64_t arch__user_reg_mask(void); #ifdef HAVE_PERF_REGS_SUPPORT +extern const struct sample_reg sample_reg_masks[]; + #include #define DWARF_MINIMAL_REGS ((1ULL << PERF_REG_IP) | (1ULL << PERF_REG_SP)) diff --git a/tools/perf/util/pmu-hybrid.c b/tools/perf/util/pmu-hybrid.c new file mode 100644 index 0000000000000000000000000000000000000000..f51ccaac60ee484676a21721bd8f9eb218de2fb8 --- /dev/null +++ b/tools/perf/util/pmu-hybrid.c @@ -0,0 +1,89 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "fncache.h" +#include "pmu-hybrid.h" + +LIST_HEAD(perf_pmu__hybrid_pmus); + +bool perf_pmu__hybrid_mounted(const char *name) +{ + char path[PATH_MAX]; + const char *sysfs; + FILE *file; + int n, cpu; + + if (strncmp(name, "cpu_", 4)) + return false; + + sysfs = sysfs__mountpoint(); + if (!sysfs) + return false; + + 
snprintf(path, PATH_MAX, CPUS_TEMPLATE_CPU, sysfs, name); + if (!file_available(path)) + return false; + + file = fopen(path, "r"); + if (!file) + return false; + + n = fscanf(file, "%u", &cpu); + fclose(file); + if (n <= 0) + return false; + + return true; +} + +struct perf_pmu *perf_pmu__find_hybrid_pmu(const char *name) +{ + struct perf_pmu *pmu; + + if (!name) + return NULL; + + perf_pmu__for_each_hybrid_pmu(pmu) { + if (!strcmp(name, pmu->name)) + return pmu; + } + + return NULL; +} + +bool perf_pmu__is_hybrid(const char *name) +{ + return perf_pmu__find_hybrid_pmu(name) != NULL; +} + +char *perf_pmu__hybrid_type_to_pmu(const char *type) +{ + char *pmu_name = NULL; + + if (asprintf(&pmu_name, "cpu_%s", type) < 0) + return NULL; + + if (perf_pmu__is_hybrid(pmu_name)) + return pmu_name; + + /* + * pmu may be not scanned, check the sysfs. + */ + if (perf_pmu__hybrid_mounted(pmu_name)) + return pmu_name; + + free(pmu_name); + return NULL; +} diff --git a/tools/perf/util/pmu-hybrid.h b/tools/perf/util/pmu-hybrid.h new file mode 100644 index 0000000000000000000000000000000000000000..2b186c26a43eaef4f55f6b2339d5a2d8ac40cb42 --- /dev/null +++ b/tools/perf/util/pmu-hybrid.h @@ -0,0 +1,33 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __PMU_HYBRID_H +#define __PMU_HYBRID_H + +#include +#include +#include +#include +#include "pmu.h" + +extern struct list_head perf_pmu__hybrid_pmus; + +#define perf_pmu__for_each_hybrid_pmu(pmu) \ + list_for_each_entry(pmu, &perf_pmu__hybrid_pmus, hybrid_list) + +bool perf_pmu__hybrid_mounted(const char *name); + +struct perf_pmu *perf_pmu__find_hybrid_pmu(const char *name); +bool perf_pmu__is_hybrid(const char *name); +char *perf_pmu__hybrid_type_to_pmu(const char *type); + +static inline int perf_pmu__hybrid_pmu_num(void) +{ + struct perf_pmu *pmu; + int num = 0; + + perf_pmu__for_each_hybrid_pmu(pmu) + num++; + + return num; +} + +#endif /* __PMU_HYBRID_H */ diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c index 
99227038063b2674e06581b7e2561f8496342e33..708cbd322d613a32f1ecfdbb5a84723241958ce4 100644 --- a/tools/perf/util/pmu.c +++ b/tools/perf/util/pmu.c @@ -18,12 +18,15 @@ #include #include #include "debug.h" +#include "evsel.h" #include "pmu.h" #include "parse-events.h" #include "header.h" #include "pmu-events/pmu-events.h" #include "string2.h" #include "strbuf.h" +#include "fncache.h" +#include "pmu-hybrid.h" struct perf_pmu_format { char *name; @@ -36,6 +39,7 @@ int perf_pmu_parse(struct list_head *list, char *name); extern FILE *perf_pmu_in; static LIST_HEAD(pmus); +static bool hybrid_scanned; /* * Parse & process all the sysfs attributes located under @@ -82,7 +86,6 @@ int perf_pmu__format_parse(char *dir, struct list_head *head) */ static int pmu_format(const char *name, struct list_head *format) { - struct stat st; char path[PATH_MAX]; const char *sysfs = sysfs__mountpoint(); @@ -92,8 +95,8 @@ static int pmu_format(const char *name, struct list_head *format) snprintf(path, PATH_MAX, "%s" EVENT_SOURCE_DEVICE_PATH "%s/format", sysfs, name); - if (stat(path, &st) < 0) - return 0; /* no error if format does not exist */ + if (!file_available(path)) + return 0; if (perf_pmu__format_parse(path, format)) return -1; @@ -470,7 +473,6 @@ static int pmu_aliases_parse(char *dir, struct list_head *head) */ static int pmu_aliases(const char *name, struct list_head *head) { - struct stat st; char path[PATH_MAX]; const char *sysfs = sysfs__mountpoint(); @@ -480,8 +482,8 @@ static int pmu_aliases(const char *name, struct list_head *head) snprintf(path, PATH_MAX, "%s/bus/event_source/devices/%s/events", sysfs, name); - if (stat(path, &st) < 0) - return 0; /* no error if 'events' does not exist */ + if (!file_available(path)) + return 0; if (pmu_aliases_parse(path, head)) return -1; @@ -520,7 +522,6 @@ static int pmu_alias_terms(struct perf_pmu_alias *alias, */ static int pmu_type(const char *name, __u32 *type) { - struct stat st; char path[PATH_MAX]; FILE *file; int ret = 0; @@ 
-532,7 +533,7 @@ static int pmu_type(const char *name, __u32 *type) snprintf(path, PATH_MAX, "%s" EVENT_SOURCE_DEVICE_PATH "%s/type", sysfs, name); - if (stat(path, &st) < 0) + if (access(path, R_OK) < 0) return -1; file = fopen(path, "r"); @@ -592,8 +593,8 @@ static struct perf_cpu_map *__pmu_cpumask(const char *path) * Uncore PMUs have a "cpumask" file under sysfs. CPU PMUs (e.g. on arm/arm64) * may have a "cpus" file. */ +#define SYS_TEMPLATE_ID "./bus/event_source/devices/%s/identifier" #define CPUS_TEMPLATE_UNCORE "%s/bus/event_source/devices/%s/cpumask" -#define CPUS_TEMPLATE_CPU "%s/bus/event_source/devices/%s/cpus" static struct perf_cpu_map *pmu_cpumask(const char *name) { @@ -623,14 +624,29 @@ static struct perf_cpu_map *pmu_cpumask(const char *name) static bool pmu_is_uncore(const char *name) { char path[PATH_MAX]; - struct perf_cpu_map *cpus; - const char *sysfs = sysfs__mountpoint(); + const char *sysfs; + if (perf_pmu__hybrid_mounted(name)) + return false; + + sysfs = sysfs__mountpoint(); snprintf(path, PATH_MAX, CPUS_TEMPLATE_UNCORE, sysfs, name); - cpus = __pmu_cpumask(path); - perf_cpu_map__put(cpus); + return file_available(path); +} + +static char *pmu_id(const char *name) +{ + char path[PATH_MAX], *str; + size_t len; - return !!cpus; + snprintf(path, PATH_MAX, SYS_TEMPLATE_ID, name); + + if (sysfs__read_str(path, &str, &len) < 0) + return NULL; + + str[len - 1] = 0; /* remove line feed */ + + return str; } /* @@ -640,7 +656,6 @@ static bool pmu_is_uncore(const char *name) */ static int is_arm_pmu_core(const char *name) { - struct stat st; char path[PATH_MAX]; const char *sysfs = sysfs__mountpoint(); @@ -650,10 +665,7 @@ static int is_arm_pmu_core(const char *name) /* Look for cpu sysfs (specific to arm) */ scnprintf(path, PATH_MAX, "%s/bus/event_source/devices/%s/cpus", sysfs, name); - if (stat(path, &st) == 0) - return 1; - - return 0; + return file_available(path); } static char *perf_pmu__getcpuid(struct perf_pmu *pmu) @@ -842,15 +854,22 @@ 
static struct perf_pmu *pmu_lookup(const char *name) pmu->name = strdup(name); pmu->type = type; pmu->is_uncore = pmu_is_uncore(name); + if (pmu->is_uncore) + pmu->id = pmu_id(name); + pmu->is_hybrid = perf_pmu__hybrid_mounted(name); pmu->max_precise = pmu_max_precise(name); pmu_add_cpu_aliases(&aliases, pmu); INIT_LIST_HEAD(&pmu->format); INIT_LIST_HEAD(&pmu->aliases); + INIT_LIST_HEAD(&pmu->caps); list_splice(&format, &pmu->format); list_splice(&aliases, &pmu->aliases); list_add_tail(&pmu->list, &pmus); + if (pmu->is_hybrid) + list_add_tail(&pmu->hybrid_list, &perf_pmu__hybrid_pmus); + pmu->default_config = perf_pmu__get_default_config(pmu); return pmu; @@ -882,6 +901,25 @@ struct perf_pmu *perf_pmu__scan(struct perf_pmu *pmu) return NULL; } +struct perf_pmu *evsel__find_pmu(struct evsel *evsel) +{ + struct perf_pmu *pmu = NULL; + + while ((pmu = perf_pmu__scan(pmu)) != NULL) { + if (pmu->type == evsel->core.attr.type) + break; + } + + return pmu; +} + +bool perf_evsel__is_aux_event(struct evsel *evsel) +{ + struct perf_pmu *pmu = evsel__find_pmu(evsel); + + return pmu && pmu->auxtrace; +} + struct perf_pmu *perf_pmu__find(const char *name) { struct perf_pmu *pmu; @@ -925,6 +963,16 @@ __u64 perf_pmu__format_bits(struct list_head *formats, const char *name) return bits; } +int perf_pmu__format_type(struct list_head *formats, const char *name) +{ + struct perf_pmu_format *format = pmu_find_format(formats, name); + + if (!format) + return -1; + + return format->value; +} + /* * Sets value based on the format definition (format parameter) * and unformated value (value parameter). 
@@ -1536,7 +1584,6 @@ bool pmu_have_event(const char *pname, const char *name) static FILE *perf_pmu__open_file(struct perf_pmu *pmu, const char *name) { - struct stat st; char path[PATH_MAX]; const char *sysfs; @@ -1546,10 +1593,8 @@ static FILE *perf_pmu__open_file(struct perf_pmu *pmu, const char *name) snprintf(path, PATH_MAX, "%s" EVENT_SOURCE_DEVICE_PATH "%s/%s", sysfs, pmu->name, name); - - if (stat(path, &st) < 0) + if (!file_available(path)) return NULL; - return fopen(path, "r"); } @@ -1569,3 +1614,134 @@ int perf_pmu__scan_file(struct perf_pmu *pmu, const char *name, const char *fmt, va_end(args); return ret; } + +static int perf_pmu__new_caps(struct list_head *list, char *name, char *value) +{ + struct perf_pmu_caps *caps = zalloc(sizeof(*caps)); + + if (!caps) + return -ENOMEM; + + caps->name = strdup(name); + if (!caps->name) + goto free_caps; + caps->value = strndup(value, strlen(value) - 1); + if (!caps->value) + goto free_name; + list_add_tail(&caps->list, list); + return 0; + +free_name: + zfree(&caps->name); +free_caps: + free(caps); + + return -ENOMEM; +} + +/* + * Reading/parsing the given pmu capabilities, which should be located at: + * /sys/bus/event_source/devices//caps as sysfs group attributes.
+ * Return the number of capabilities + */ +int perf_pmu__caps_parse(struct perf_pmu *pmu) +{ + struct stat st; + char caps_path[PATH_MAX]; + const char *sysfs = sysfs__mountpoint(); + DIR *caps_dir; + struct dirent *evt_ent; + + if (pmu->caps_initialized) + return pmu->nr_caps; + + pmu->nr_caps = 0; + + if (!sysfs) + return -1; + + snprintf(caps_path, PATH_MAX, + "%s" EVENT_SOURCE_DEVICE_PATH "%s/caps", sysfs, pmu->name); + + if (stat(caps_path, &st) < 0) { + pmu->caps_initialized = true; + return 0; /* no error if caps does not exist */ + } + + caps_dir = opendir(caps_path); + if (!caps_dir) + return -EINVAL; + + while ((evt_ent = readdir(caps_dir)) != NULL) { + char path[PATH_MAX + NAME_MAX + 1]; + char *name = evt_ent->d_name; + char value[128]; + FILE *file; + + if (!strcmp(name, ".") || !strcmp(name, "..")) + continue; + + snprintf(path, sizeof(path), "%s/%s", caps_path, name); + + file = fopen(path, "r"); + if (!file) + continue; + + if (!fgets(value, sizeof(value), file) || + (perf_pmu__new_caps(&pmu->caps, name, value) < 0)) { + fclose(file); + continue; + } + + pmu->nr_caps++; + fclose(file); + } + + closedir(caps_dir); + + pmu->caps_initialized = true; + return pmu->nr_caps; +} + +void perf_pmu__warn_invalid_config(struct perf_pmu *pmu, __u64 config, + char *name) +{ + struct perf_pmu_format *format; + __u64 masks = 0, bits; + char buf[100]; + unsigned int i; + + list_for_each_entry(format, &pmu->format, list) { + if (format->value != PERF_PMU_FORMAT_VALUE_CONFIG) + continue; + + for_each_set_bit(i, format->bits, PERF_PMU_FORMAT_BITS) + masks |= 1ULL << i; + } + + /* + * Kernel doesn't export any valid format bits. 
+ */ + if (masks == 0) + return; + + bits = config & ~masks; + if (bits == 0) + return; + + bitmap_scnprintf((unsigned long *)&bits, sizeof(bits) * 8, buf, sizeof(buf)); + + pr_warning("WARNING: event '%s' not valid (bits %s of config " + "'%llx' not supported by kernel)!\n", + name ?: "N/A", buf, config); +} + +bool perf_pmu__has_hybrid(void) +{ + if (!hybrid_scanned) { + hybrid_scanned = true; + perf_pmu__scan(NULL); + } + + return !list_empty(&perf_pmu__hybrid_pmus); +} diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h index ed9154b182bfc5c46c51f920f07e19ee2edf76b5..e0cf540acac2a43d29e76adc4a2d4492e60d6088 100644 --- a/tools/perf/util/pmu.h +++ b/tools/perf/util/pmu.h @@ -5,6 +5,7 @@ #include #include #include +#include #include #include "parse-events.h" @@ -18,21 +19,34 @@ enum { #define PERF_PMU_FORMAT_BITS 64 #define EVENT_SOURCE_DEVICE_PATH "/bus/event_source/devices/" +#define CPUS_TEMPLATE_CPU "%s/bus/event_source/devices/%s/cpus" struct perf_event_attr; +struct perf_pmu_caps { + char *name; + char *value; + struct list_head list; +}; + struct perf_pmu { char *name; + char *id; __u32 type; bool selectable; bool is_uncore; + bool is_hybrid; bool auxtrace; int max_precise; struct perf_event_attr *default_config; struct perf_cpu_map *cpus; struct list_head format; /* HEAD struct perf_pmu_format -> list */ struct list_head aliases; /* HEAD struct perf_pmu_alias -> list */ + bool caps_initialized; + u32 nr_caps; + struct list_head caps; /* HEAD struct perf_pmu_caps -> list */ struct list_head list; /* ELEM */ + struct list_head hybrid_list; }; struct perf_pmu_info { @@ -71,6 +85,7 @@ int perf_pmu__config_terms(struct list_head *formats, struct list_head *head_terms, bool zero, struct parse_events_error *error); __u64 perf_pmu__format_bits(struct list_head *formats, const char *name); +int perf_pmu__format_type(struct list_head *formats, const char *name); int perf_pmu__check_alias(struct perf_pmu *pmu, struct list_head *head_terms, struct perf_pmu_info 
*info); struct list_head *perf_pmu__alias(struct perf_pmu *pmu, @@ -100,4 +115,11 @@ struct pmu_events_map *perf_pmu__find_map(struct perf_pmu *pmu); int perf_pmu__convert_scale(const char *scale, char **end, double *sval); +int perf_pmu__caps_parse(struct perf_pmu *pmu); + +void perf_pmu__warn_invalid_config(struct perf_pmu *pmu, __u64 config, + char *name); + +bool perf_pmu__has_hybrid(void); + #endif /* __PMU_H */ diff --git a/tools/perf/util/python-ext-sources b/tools/perf/util/python-ext-sources index 9af183860fbd08767969490133f444cec9be08cf..e7279ea6043aed2390065cb3ea03200400e3ac7f 100644 --- a/tools/perf/util/python-ext-sources +++ b/tools/perf/util/python-ext-sources @@ -33,3 +33,4 @@ util/trace-event.c util/string.c util/symbol_fprintf.c util/units.c +util/affinity.c diff --git a/tools/perf/util/python.c b/tools/perf/util/python.c index 02460362256d17f5833320cbb6db5ebe0768907e..83212c65848bb244ad9128494b7833acaf0d3c22 100644 --- a/tools/perf/util/python.c +++ b/tools/perf/util/python.c @@ -6,6 +6,7 @@ #include #include #include +#include #include "evlist.h" #include "callchain.h" #include "evsel.h" @@ -64,6 +65,7 @@ struct perf_env perf_env; * implementing 'verbose' and 'eprintf'. */ int verbose; +int debug_peo_args; int eprintf(int level, int var, const char *fmt, ...); @@ -1022,10 +1024,10 @@ static PyObject *pyrf_evlist__read_on_cpu(struct pyrf_evlist *pevlist, if (!md) return NULL; - if (perf_mmap__read_init(md) < 0) + if (perf_mmap__read_init(&md->core) < 0) goto end; - event = perf_mmap__read_event(md); + event = perf_mmap__read_event(&md->core); if (event != NULL) { PyObject *pyevent = pyrf_event__new(event); struct pyrf_event *pevent = (struct pyrf_event *)pyevent; @@ -1045,7 +1047,7 @@ static PyObject *pyrf_evlist__read_on_cpu(struct pyrf_evlist *pevlist, err = perf_evsel__parse_sample(evsel, event, &pevent->sample); /* Consume the even only after we parsed it out. 
*/ - perf_mmap__consume(md); + perf_mmap__consume(&md->core); if (err) return PyErr_Format(PyExc_OSError, diff --git a/tools/perf/util/record.c b/tools/perf/util/record.c index 7def661685032811c11e059bc080deda6f0f161a..24e3c867d3a735672eb9883c188fa9d8a24c7908 100644 --- a/tools/perf/util/record.c +++ b/tools/perf/util/record.c @@ -10,163 +10,10 @@ #include #include #include "cloexec.h" +#include "util/perf_api_probe.h" #include "record.h" #include "../perf-sys.h" -typedef void (*setup_probe_fn_t)(struct evsel *evsel); - -static int perf_do_probe_api(setup_probe_fn_t fn, int cpu, const char *str) -{ - struct evlist *evlist; - struct evsel *evsel; - unsigned long flags = perf_event_open_cloexec_flag(); - int err = -EAGAIN, fd; - static pid_t pid = -1; - - evlist = evlist__new(); - if (!evlist) - return -ENOMEM; - - if (parse_events(evlist, str, NULL)) - goto out_delete; - - evsel = evlist__first(evlist); - - while (1) { - fd = sys_perf_event_open(&evsel->core.attr, pid, cpu, -1, flags); - if (fd < 0) { - if (pid == -1 && errno == EACCES) { - pid = 0; - continue; - } - goto out_delete; - } - break; - } - close(fd); - - fn(evsel); - - fd = sys_perf_event_open(&evsel->core.attr, pid, cpu, -1, flags); - if (fd < 0) { - if (errno == EINVAL) - err = -EINVAL; - goto out_delete; - } - close(fd); - err = 0; - -out_delete: - evlist__delete(evlist); - return err; -} - -static bool perf_probe_api(setup_probe_fn_t fn) -{ - const char *try[] = {"cycles:u", "instructions:u", "cpu-clock:u", NULL}; - struct perf_cpu_map *cpus; - int cpu, ret, i = 0; - - cpus = perf_cpu_map__new(NULL); - if (!cpus) - return false; - cpu = cpus->map[0]; - perf_cpu_map__put(cpus); - - do { - ret = perf_do_probe_api(fn, cpu, try[i++]); - if (!ret) - return true; - } while (ret == -EAGAIN && try[i]); - - return false; -} - -static void perf_probe_sample_identifier(struct evsel *evsel) -{ - evsel->core.attr.sample_type |= PERF_SAMPLE_IDENTIFIER; -} - -static void perf_probe_comm_exec(struct evsel *evsel) 
-{
-	evsel->core.attr.comm_exec = 1;
-}
-
-static void perf_probe_context_switch(struct evsel *evsel)
-{
-	evsel->core.attr.context_switch = 1;
-}
-
-bool perf_can_sample_identifier(void)
-{
-	return perf_probe_api(perf_probe_sample_identifier);
-}
-
-static bool perf_can_comm_exec(void)
-{
-	return perf_probe_api(perf_probe_comm_exec);
-}
-
-bool perf_can_record_switch_events(void)
-{
-	return perf_probe_api(perf_probe_context_switch);
-}
-
-bool perf_can_record_cpu_wide(void)
-{
-	struct perf_event_attr attr = {
-		.type = PERF_TYPE_SOFTWARE,
-		.config = PERF_COUNT_SW_CPU_CLOCK,
-		.exclude_kernel = 1,
-	};
-	struct perf_cpu_map *cpus;
-	int cpu, fd;
-
-	cpus = perf_cpu_map__new(NULL);
-	if (!cpus)
-		return false;
-	cpu = cpus->map[0];
-	perf_cpu_map__put(cpus);
-
-	fd = sys_perf_event_open(&attr, -1, cpu, -1, 0);
-	if (fd < 0)
-		return false;
-	close(fd);
-
-	return true;
-}
-
-/*
- * Architectures are expected to know if AUX area sampling is supported by the
- * hardware. Here we check for kernel support.
- */
-bool perf_can_aux_sample(void)
-{
-	struct perf_event_attr attr = {
-		.size = sizeof(struct perf_event_attr),
-		.exclude_kernel = 1,
-		/*
-		 * Non-zero value causes the kernel to calculate the effective
-		 * attribute size up to that byte.
-		 */
-		.aux_sample_size = 1,
-	};
-	int fd;
-
-	fd = sys_perf_event_open(&attr, -1, 0, -1, 0);
-	/*
-	 * If the kernel attribute is big enough to contain aux_sample_size
-	 * then we assume that it is supported. We are relying on the kernel to
-	 * validate the attribute size before anything else that could be wrong.
-	 */
-	if (fd < 0 && errno == E2BIG)
-		return false;
-	if (fd >= 0)
-		close(fd);
-
-	return true;
-}
-
 void perf_evlist__config(struct evlist *evlist, struct record_opts *opts,
 			 struct callchain_param *callchain)
 {
@@ -215,7 +62,7 @@ void perf_evlist__config(struct evlist *evlist, struct record_opts *opts,
 
 	if (sample_id) {
 		evlist__for_each_entry(evlist, evsel)
-			perf_evsel__set_sample_id(evsel, use_sample_identifier);
+			evsel__set_sample_id(evsel, use_sample_identifier);
 	}
 
 	perf_evlist__set_id_pos(evlist);
diff --git a/tools/perf/util/record.h b/tools/perf/util/record.h
index 941f9ae49e18e5771d4eee7a0010fa8ce6271ae8..e6a552a7882ed5003c35dd0bba969349c71dff22 100644
--- a/tools/perf/util/record.h
+++ b/tools/perf/util/record.h
@@ -22,6 +22,8 @@ struct record_opts {
 	bool raw_samples;
 	bool sample_address;
 	bool sample_phys_addr;
+	bool sample_data_page_size;
+	bool sample_code_page_size;
 	bool sample_weight;
 	bool sample_time;
 	bool sample_time_set;
@@ -34,6 +36,7 @@ struct record_opts {
 	bool auxtrace_snapshot_on_exit;
 	bool auxtrace_sample_mode;
 	bool record_namespaces;
+	bool record_cgroup;
 	bool record_switch_events;
 	bool all_kernel;
 	bool all_user;
diff --git a/tools/perf/util/s390-sample-raw.c b/tools/perf/util/s390-sample-raw.c
index 05b43ab4eeefc3101466166a87e4cacbfbc6547a..8626ae40d05809ce300f3d504e70059a006b4d39 100644
--- a/tools/perf/util/s390-sample-raw.c
+++ b/tools/perf/util/s390-sample-raw.c
@@ -197,8 +197,7 @@ static void s390_cpumcfdg_dump(struct perf_sample *sample)
  * its raw data.
  * The function is only invoked when the dump flag -D is set.
  */
-void perf_evlist__s390_sample_raw(struct evlist *evlist, union perf_event *event,
-				  struct perf_sample *sample)
+void evlist__s390_sample_raw(struct evlist *evlist, union perf_event *event, struct perf_sample *sample)
 {
 	struct evsel *ev_bc000;
 
diff --git a/tools/perf/util/sample-raw.c b/tools/perf/util/sample-raw.c
index e84bbe0e441a06380f621ac004c8b92dba6961da..f3f6bd9d290eb448f255a7bf04fac31ec360692b 100644
--- a/tools/perf/util/sample-raw.c
+++ b/tools/perf/util/sample-raw.c
@@ -1,18 +1,26 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 
 #include
+#include
 #include "evlist.h"
 #include "env.h"
+#include "header.h"
 #include "sample-raw.h"
 
 /*
  * Check platform the perf data file was created on and perform platform
  * specific interpretation.
  */
-void perf_evlist__init_trace_event_sample_raw(struct evlist *evlist)
+void evlist__init_trace_event_sample_raw(struct evlist *evlist)
 {
 	const char *arch_pf = perf_env__arch(evlist->env);
+	const char *cpuid = perf_env__cpuid(evlist->env);
 
 	if (arch_pf && !strcmp("s390", arch_pf))
-		evlist->trace_event_sample_raw = perf_evlist__s390_sample_raw;
+		evlist->trace_event_sample_raw = evlist__s390_sample_raw;
+	else if (arch_pf && !strcmp("x86", arch_pf) &&
+		 cpuid && strstarts(cpuid, "AuthenticAMD") &&
+		 evlist__has_amd_ibs(evlist)) {
+		evlist->trace_event_sample_raw = evlist__amd_sample_raw;
+	}
 }
diff --git a/tools/perf/util/sample-raw.h b/tools/perf/util/sample-raw.h
index afe1491a117e7b59f0f04d67ced7225f1c4cc41e..ea01c581150306067305a8de6c290e22f321bcd8 100644
--- a/tools/perf/util/sample-raw.h
+++ b/tools/perf/util/sample-raw.h
@@ -6,9 +6,10 @@ struct evlist;
 union perf_event;
 struct perf_sample;
 
-void perf_evlist__s390_sample_raw(struct evlist *evlist,
-				  union perf_event *event,
-				  struct perf_sample *sample);
-
-void perf_evlist__init_trace_event_sample_raw(struct evlist *evlist);
+void evlist__s390_sample_raw(struct evlist *evlist, union perf_event *event,
+			     struct perf_sample *sample);
+bool evlist__has_amd_ibs(struct evlist *evlist);
+void evlist__amd_sample_raw(struct evlist *evlist, union perf_event *event,
+			    struct perf_sample *sample);
+void evlist__init_trace_event_sample_raw(struct evlist *evlist);
 #endif /* __PERF_EVLIST_H */
diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
index 3b02c3f1b289527b2db3463fc77acb1c1d02b943..2bdd10c4c2460d7729c69c02db741f546b012675 100644
--- a/tools/perf/util/scripting-engines/trace-event-python.c
+++ b/tools/perf/util/scripting-engines/trace-event-python.c
@@ -464,6 +464,7 @@ static PyObject *python_process_brstack(struct perf_sample *sample,
 					struct thread *thread)
 {
 	struct branch_stack *br = sample->branch_stack;
+	struct branch_entry *entries = perf_sample__branch_entries(sample);
 	PyObject *pylist;
 	u64 i;
 
@@ -484,28 +485,28 @@ static PyObject *python_process_brstack(struct perf_sample *sample,
 			Py_FatalError("couldn't create Python dictionary");
 
 		pydict_set_item_string_decref(pyelem, "from",
-		    PyLong_FromUnsignedLongLong(br->entries[i].from));
+		    PyLong_FromUnsignedLongLong(entries[i].from));
 		pydict_set_item_string_decref(pyelem, "to",
-		    PyLong_FromUnsignedLongLong(br->entries[i].to));
+		    PyLong_FromUnsignedLongLong(entries[i].to));
 		pydict_set_item_string_decref(pyelem, "mispred",
-		    PyBool_FromLong(br->entries[i].flags.mispred));
+		    PyBool_FromLong(entries[i].flags.mispred));
 		pydict_set_item_string_decref(pyelem, "predicted",
-		    PyBool_FromLong(br->entries[i].flags.predicted));
+		    PyBool_FromLong(entries[i].flags.predicted));
 		pydict_set_item_string_decref(pyelem, "in_tx",
-		    PyBool_FromLong(br->entries[i].flags.in_tx));
+		    PyBool_FromLong(entries[i].flags.in_tx));
 		pydict_set_item_string_decref(pyelem, "abort",
-		    PyBool_FromLong(br->entries[i].flags.abort));
+		    PyBool_FromLong(entries[i].flags.abort));
 		pydict_set_item_string_decref(pyelem, "cycles",
-		    PyLong_FromUnsignedLongLong(br->entries[i].flags.cycles));
+		    PyLong_FromUnsignedLongLong(entries[i].flags.cycles));
 
 		thread__find_map_fb(thread, sample->cpumode,
-				    br->entries[i].from, &al);
+				    entries[i].from, &al);
 		dsoname = get_dsoname(al.map);
 		pydict_set_item_string_decref(pyelem, "from_dsoname",
 					      _PyUnicode_FromString(dsoname));
 
 		thread__find_map_fb(thread, sample->cpumode,
-				    br->entries[i].to, &al);
+				    entries[i].to, &al);
 		dsoname = get_dsoname(al.map);
 		pydict_set_item_string_decref(pyelem, "to_dsoname",
 					      _PyUnicode_FromString(dsoname));
@@ -561,6 +562,7 @@ static PyObject *python_process_brstacksym(struct perf_sample *sample,
 					   struct thread *thread)
 {
 	struct branch_stack *br = sample->branch_stack;
+	struct branch_entry *entries = perf_sample__branch_entries(sample);
 	PyObject *pylist;
 	u64 i;
 	char bf[512];
@@ -581,22 +583,22 @@ static PyObject *python_process_brstacksym(struct perf_sample *sample,
 			Py_FatalError("couldn't create Python dictionary");
 
 		thread__find_symbol_fb(thread, sample->cpumode,
-				       br->entries[i].from, &al);
+				       entries[i].from, &al);
 		get_symoff(al.sym, &al, true, bf, sizeof(bf));
 		pydict_set_item_string_decref(pyelem, "from",
 					      _PyUnicode_FromString(bf));
 
 		thread__find_symbol_fb(thread, sample->cpumode,
-				       br->entries[i].to, &al);
+				       entries[i].to, &al);
 		get_symoff(al.sym, &al, true, bf, sizeof(bf));
 		pydict_set_item_string_decref(pyelem, "to",
 					      _PyUnicode_FromString(bf));
 
-		get_br_mspred(&br->entries[i].flags, bf, sizeof(bf));
+		get_br_mspred(&entries[i].flags, bf, sizeof(bf));
 		pydict_set_item_string_decref(pyelem, "pred",
 					      _PyUnicode_FromString(bf));
 
-		if (br->entries[i].flags.in_tx) {
+		if (entries[i].flags.in_tx) {
 			pydict_set_item_string_decref(pyelem, "in_tx",
 					      _PyUnicode_FromString("X"));
 		} else {
@@ -604,7 +606,7 @@ static PyObject *python_process_brstacksym(struct perf_sample *sample,
 					      _PyUnicode_FromString("-"));
 		}
 
-		if (br->entries[i].flags.abort) {
+		if (entries[i].flags.abort) {
 			pydict_set_item_string_decref(pyelem, "abort",
 					      _PyUnicode_FromString("A"));
 		} else {
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 5de955d60a04452179102be968606866cbc8a0d5..dbf1d7611bf833636b3210e33be407b21defc663 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -101,11 +101,11 @@ static int perf_session__deliver_event(struct perf_session *session,
 				       struct perf_tool *tool,
 				       u64 file_offset);
 
-static int perf_session__open(struct perf_session *session)
+static int perf_session__open(struct perf_session *session, int repipe_fd)
 {
 	struct perf_data *data = session->data;
 
-	if (perf_session__read_header(session) < 0) {
+	if (perf_session__read_header(session, repipe_fd) < 0) {
 		pr_err("incompatible file format (rerun with -v to learn more)\n");
 		return -1;
 	}
@@ -184,8 +184,9 @@ static int ordered_events__deliver_event(struct ordered_events *oe,
 					   session->tool, event->file_offset);
 }
 
-struct perf_session *perf_session__new(struct perf_data *data,
-				       bool repipe, struct perf_tool *tool)
+struct perf_session *__perf_session__new(struct perf_data *data,
+					 bool repipe, int repipe_fd,
+					 struct perf_tool *tool)
 {
 	int ret = -ENOMEM;
 	struct perf_session *session = zalloc(sizeof(*session));
@@ -209,7 +210,7 @@ struct perf_session *perf_session__new(struct perf_data *data,
 	session->data = data;
 
 	if (perf_data__is_read(data)) {
-		ret = perf_session__open(session);
+		ret = perf_session__open(session, repipe_fd);
 		if (ret < 0)
 			goto out_delete;
 
@@ -222,7 +223,7 @@ struct perf_session *perf_session__new(struct perf_data *data,
 			perf_session__set_comm_exec(session);
 		}
 
-		perf_evlist__init_trace_event_sample_raw(session->evlist);
+		evlist__init_trace_event_sample_raw(session->evlist);
 
 		/* Open the directory data. */
 		if (data->is_dir) {
@@ -467,6 +468,8 @@ void perf_tool__fill_defaults(struct perf_tool *tool)
 		tool->comm = process_event_stub;
 	if (tool->namespaces == NULL)
 		tool->namespaces = process_event_stub;
+	if (tool->cgroup == NULL)
+		tool->cgroup = process_event_stub;
 	if (tool->fork == NULL)
 		tool->fork = process_event_stub;
 	if (tool->exit == NULL)
@@ -1004,6 +1007,7 @@ static void callchain__lbr_callstack_printf(struct perf_sample *sample)
 {
 	struct ip_callchain *callchain = sample->callchain;
 	struct branch_stack *lbr_stack = sample->branch_stack;
+	struct branch_entry *entries = perf_sample__branch_entries(sample);
 	u64 kernel_callchain_nr = callchain->nr;
 	unsigned int i;
 
@@ -1040,10 +1044,10 @@ static void callchain__lbr_callstack_printf(struct perf_sample *sample)
 			       i, callchain->ips[i]);
 
 		printf("..... %2d: %016" PRIx64 "\n",
-		       (int)(kernel_callchain_nr), lbr_stack->entries[0].to);
+		       (int)(kernel_callchain_nr), entries[0].to);
 		for (i = 0; i < lbr_stack->nr; i++)
 			printf("..... %2d: %016" PRIx64 "\n",
-			       (int)(i + kernel_callchain_nr + 1), lbr_stack->entries[i].from);
+			       (int)(i + kernel_callchain_nr + 1), entries[i].from);
 	}
 }
 
@@ -1065,6 +1069,7 @@ static void callchain__printf(struct evsel *evsel,
 
 static void branch_stack__printf(struct perf_sample *sample, bool callstack)
 {
+	struct branch_entry *entries = perf_sample__branch_entries(sample);
 	uint64_t i;
 
 	printf("%s: nr:%" PRIu64 "\n",
@@ -1072,7 +1077,7 @@ static void branch_stack__printf(struct perf_sample *sample, bool callstack)
 		sample->branch_stack->nr);
 
 	for (i = 0; i < sample->branch_stack->nr; i++) {
-		struct branch_entry *e = &sample->branch_stack->entries[i];
+		struct branch_entry *e = &entries[i];
 
 		if (!callstack) {
 			printf("..... %2"PRIu64": %016" PRIx64 " -> %016" PRIx64 " %hu cycles %s%s%s%s %x\n",
@@ -1236,7 +1241,7 @@ static void dump_sample(struct evsel *evsel, union perf_event *event,
 	if (evsel__has_callchain(evsel))
 		callchain__printf(evsel, sample);
 
-	if (sample_type & PERF_SAMPLE_BRANCH_STACK)
+	if (evsel__has_br_stack(evsel))
 		branch_stack__printf(sample, perf_evsel__has_branch_callstack(evsel));
 
 	if (sample_type & PERF_SAMPLE_REGS_USER)
@@ -1248,7 +1253,7 @@ static void dump_sample(struct evsel *evsel, union perf_event *event,
 	if (sample_type & PERF_SAMPLE_STACK_USER)
 		stack_user__printf(&sample->user_stack);
 
-	if (sample_type & PERF_SAMPLE_WEIGHT)
+	if (sample_type & PERF_SAMPLE_WEIGHT_TYPE)
 		printf("... weight: %" PRIu64 "\n", sample->weight);
 
 	if (sample_type & PERF_SAMPLE_DATA_SRC)
@@ -1431,6 +1436,8 @@ static int machines__deliver_event(struct machines *machines,
 		return tool->comm(tool, event, sample, machine);
 	case PERF_RECORD_NAMESPACES:
 		return tool->namespaces(tool, event, sample, machine);
+	case PERF_RECORD_CGROUP:
+		return tool->cgroup(tool, event, sample, machine);
 	case PERF_RECORD_FORK:
 		return tool->fork(tool, event, sample, machine);
 	case PERF_RECORD_EXIT:
@@ -1856,7 +1863,6 @@ static int __perf_session__process_pipe_events(struct perf_session *session)
 {
 	struct ordered_events *oe = &session->ordered_events;
 	struct perf_tool *tool = session->tool;
-	int fd = perf_data__fd(session->data);
 	union perf_event *event;
 	uint32_t size, cur_size = 0;
 	void *buf = NULL;
@@ -1876,7 +1882,8 @@ static int __perf_session__process_pipe_events(struct perf_session *session)
 	ordered_events__set_copy_on_queue(oe, true);
 more:
 	event = buf;
-	err = readn(fd, event, sizeof(struct perf_event_header));
+	err = perf_data__read(session->data, event,
+			      sizeof(struct perf_event_header));
 	if (err <= 0) {
 		if (err == 0)
 			goto done;
@@ -1908,7 +1915,8 @@ static int __perf_session__process_pipe_events(struct perf_session *session)
 	p += sizeof(struct perf_event_header);
 
 	if (size - sizeof(struct perf_event_header)) {
-		err = readn(fd, p, size - sizeof(struct perf_event_header));
+		err = perf_data__read(session->data, p,
+				      size - sizeof(struct perf_event_header));
 		if (err <= 0) {
 			if (err == 0) {
 				pr_err("unexpected end of event stream\n");
@@ -2064,6 +2072,7 @@ struct reader {
 	u64 data_size;
 	u64 data_offset;
 	reader_cb_t process;
+	bool in_place_update;
 };
 
 static int
@@ -2097,7 +2106,9 @@ reader__process_events(struct reader *rd, struct perf_session *session,
 	mmap_prot = PROT_READ;
 	mmap_flags = MAP_SHARED;
 
-	if (session->header.needs_swap) {
+	if (rd->in_place_update) {
+		mmap_prot |= PROT_WRITE;
+	} else if (session->header.needs_swap) {
 		mmap_prot |= PROT_WRITE;
 		mmap_flags = MAP_PRIVATE;
 	}
@@ -2183,6 +2194,7 @@ static int __perf_session__process_events(struct perf_session *session)
 		.data_size = session->header.data_size,
 		.data_offset = session->header.data_offset,
 		.process = process_simple,
+		.in_place_update = session->data->in_place_update,
 	};
 	struct ordered_events *oe = &session->ordered_events;
 	struct perf_tool *tool = session->tool;
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index b4c9428c18f0728445e462526d52c6bb380cdeac..f68c33562e73ee8ee586a437db57f0fa0b245fa9 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -54,8 +54,16 @@ struct decomp {
 
 struct perf_tool;
 
-struct perf_session *perf_session__new(struct perf_data *data,
-				       bool repipe, struct perf_tool *tool);
+struct perf_session *__perf_session__new(struct perf_data *data,
+					 bool repipe, int repipe_fd,
+					 struct perf_tool *tool);
+
+static inline struct perf_session *perf_session__new(struct perf_data *data,
+						     struct perf_tool *tool)
+{
+	return __perf_session__new(data, false, -1, tool);
+}
+
 void perf_session__delete(struct perf_session *session);
 
 void perf_event_header__bswap(struct perf_event_header *hdr);
diff --git a/tools/perf/util/sideband_evlist.c b/tools/perf/util/sideband_evlist.c
new file mode 100644
index 0000000000000000000000000000000000000000..4d01b4f80b3c731aaf13f02da50c65d33388cd76
--- /dev/null
+++ b/tools/perf/util/sideband_evlist.c
@@ -0,0 +1,148 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include "util/debug.h"
+#include "util/evlist.h"
+#include "util/evsel.h"
+#include "util/mmap.h"
+#include "util/perf_api_probe.h"
+#include
+#include
+#include
+#include
+#include
+#include
+
+int perf_evlist__add_sb_event(struct evlist *evlist, struct perf_event_attr *attr,
+			      perf_evsel__sb_cb_t cb, void *data)
+{
+	struct evsel *evsel;
+
+	if (!attr->sample_id_all) {
+		pr_warning("enabling sample_id_all for all side band events\n");
+		attr->sample_id_all = 1;
+	}
+
+	evsel = evsel__new_idx(attr, evlist->core.nr_entries);
+	if (!evsel)
+		return -1;
+
+	evsel->side_band.cb = cb;
+	evsel->side_band.data = data;
+	evlist__add(evlist, evsel);
+	return 0;
+}
+
+static void *perf_evlist__poll_thread(void *arg)
+{
+	struct evlist *evlist = arg;
+	bool draining = false;
+	int i, done = 0;
+	/*
+	 * In order to read symbols from other namespaces perf to needs to call
+	 * setns(2). This isn't permitted if the struct_fs has multiple users.
+	 * unshare(2) the fs so that we may continue to setns into namespaces
+	 * that we're observing when, for instance, reading the build-ids at
+	 * the end of a 'perf record' session.
+	 */
+	unshare(CLONE_FS);
+
+	while (!done) {
+		bool got_data = false;
+
+		if (evlist->thread.done)
+			draining = true;
+
+		if (!draining)
+			evlist__poll(evlist, 1000);
+
+		for (i = 0; i < evlist->core.nr_mmaps; i++) {
+			struct mmap *map = &evlist->mmap[i];
+			union perf_event *event;
+
+			if (perf_mmap__read_init(&map->core))
+				continue;
+			while ((event = perf_mmap__read_event(&map->core)) != NULL) {
+				struct evsel *evsel = perf_evlist__event2evsel(evlist, event);
+
+				if (evsel && evsel->side_band.cb)
+					evsel->side_band.cb(event, evsel->side_band.data);
+				else
+					pr_warning("cannot locate proper evsel for the side band event\n");
+
+				perf_mmap__consume(&map->core);
+				got_data = true;
+			}
+			perf_mmap__read_done(&map->core);
+		}
+
+		if (draining && !got_data)
+			break;
+	}
+	return NULL;
+}
+
+void evlist__set_cb(struct evlist *evlist, perf_evsel__sb_cb_t cb, void *data)
+{
+	struct evsel *evsel;
+
+	evlist__for_each_entry(evlist, evsel) {
+		evsel->core.attr.sample_id_all = 1;
+		evsel->core.attr.watermark = 1;
+		evsel->core.attr.wakeup_watermark = 1;
+		evsel->side_band.cb = cb;
+		evsel->side_band.data = data;
+	}
+}
+
+int perf_evlist__start_sb_thread(struct evlist *evlist, struct target *target)
+{
+	struct evsel *counter;
+
+	if (!evlist)
+		return 0;
+
+	if (perf_evlist__create_maps(evlist, target))
+		goto out_delete_evlist;
+
+	if (evlist->core.nr_entries > 1) {
+		bool can_sample_identifier = perf_can_sample_identifier();
+
+		evlist__for_each_entry(evlist, counter)
+			evsel__set_sample_id(counter, can_sample_identifier);
+
+		perf_evlist__set_id_pos(evlist);
+	}
+
+	evlist__for_each_entry(evlist, counter) {
+		if (evsel__open(counter, evlist->core.cpus, evlist->core.threads) < 0)
+			goto out_delete_evlist;
+	}
+
+	if (evlist__mmap(evlist, UINT_MAX))
+		goto out_delete_evlist;
+
+	evlist__for_each_entry(evlist, counter) {
+		if (evsel__enable(counter))
+			goto out_delete_evlist;
+	}
+
+	evlist->thread.done = 0;
+	if (pthread_create(&evlist->thread.th, NULL, perf_evlist__poll_thread, evlist))
+		goto out_delete_evlist;
+
+	return 0;
+
+out_delete_evlist:
+	evlist__delete(evlist);
+	evlist = NULL;
+	return -1;
+}
+
+void perf_evlist__stop_sb_thread(struct evlist *evlist)
+{
+	if (!evlist)
+		return;
+	evlist->thread.done = 1;
+	pthread_join(evlist->thread.th, NULL);
+	evlist__delete(evlist);
+}
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 4027906fd3e38eb356ebf335cfa6e48b3fe93a29..90b0da006e7a13a18a5a9ae92fbc5ed716f96d6d 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -1200,6 +1200,7 @@ sort__dcacheline_cmp(struct hist_entry *left, struct hist_entry *right)
 {
 	u64 l, r;
 	struct map *l_map, *r_map;
+	int rc;
 
 	if (!left->mem_info) return -1;
 	if (!right->mem_info) return 1;
@@ -1218,18 +1219,9 @@ sort__dcacheline_cmp(struct hist_entry *left, struct hist_entry *right)
 	if (!l_map) return -1;
 	if (!r_map) return 1;
 
-	if (l_map->maj > r_map->maj) return -1;
-	if (l_map->maj < r_map->maj) return 1;
-
-	if (l_map->min > r_map->min) return -1;
-	if (l_map->min < r_map->min) return 1;
-
-	if (l_map->ino > r_map->ino) return -1;
-	if (l_map->ino < r_map->ino) return 1;
-
-	if (l_map->ino_generation > r_map->ino_generation) return -1;
-	if (l_map->ino_generation < r_map->ino_generation) return 1;
-
+	rc = dso__cmp_id(l_map->dso, r_map->dso);
+	if (rc)
+		return rc;
 	/*
 	 * Addresses with no major/minor numbers are assumed to be
 	 * anonymous in userspace. Sort those on pid then address.
@@ -1240,8 +1232,8 @@ sort__dcacheline_cmp(struct hist_entry *left, struct hist_entry *right)
 
 	if ((left->cpumode != PERF_RECORD_MISC_KERNEL) &&
 	    (!(l_map->flags & MAP_SHARED)) &&
-	    !l_map->maj && !l_map->min && !l_map->ino &&
-	    !l_map->ino_generation) {
+	    !l_map->dso->id.maj && !l_map->dso->id.min &&
+	    !l_map->dso->id.ino && !l_map->dso->id.ino_generation) {
 		/* userspace anonymous */
 
 		if (left->thread->pid_ > right->thread->pid_) return -1;
@@ -1277,8 +1269,8 @@ static int hist_entry__dcacheline_snprintf(struct hist_entry *he, char *bf,
 
 		if ((he->cpumode != PERF_RECORD_MISC_KERNEL) &&
 		    map && !(map->prot & PROT_EXEC) &&
 		    (map->flags & MAP_SHARED) &&
-		    (map->maj || map->min || map->ino ||
-		     map->ino_generation))
+		    (map->dso->id.maj || map->dso->id.min ||
+		     map->dso->id.ino || map->dso->id.ino_generation))
 			level = 's';
 		else if (!map)
 			level = 'X';
@@ -3142,7 +3134,7 @@ static void add_hpp_sort_string(struct strbuf *sb, struct hpp_dimension *s, int
 		add_key(sb, s[i].name, llen);
 }
 
-const char *sort_help(const char *prefix)
+char *sort_help(const char *prefix)
 {
 	struct strbuf sb;
 	char *s;
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index 7b93f34ac1f4c34328f2dc1378ccc2663ef4fe43..c047b4591b7055de3deb0e5525c9ef3ab5550240 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -285,7 +285,7 @@ void reset_output_field(void);
 void sort__setup_elide(FILE *fp);
 void perf_hpp__set_elide(int idx, bool elide);
 
-const char *sort_help(const char *prefix);
+char *sort_help(const char *prefix);
 
 int report_parse_ignore_callees_opt(const struct option *opt, const char *arg, int unset);
diff --git a/tools/perf/util/srccode.c b/tools/perf/util/srccode.c
index d84ed8b6caaa21bd5c32b937b099cb53e3007cec..c29edaaca8633e5539511abb457e5592ee7fbbf1 100644
--- a/tools/perf/util/srccode.c
+++ b/tools/perf/util/srccode.c
@@ -16,6 +16,7 @@
 #include "srccode.h"
 #include "debug.h"
 #include // page_size
+#include "fncache.h"
 
 #define MAXSRCCACHE (32*1024*1024)
 #define MAXSRCFILES 64
@@ -36,14 +37,6 @@ static LIST_HEAD(srcfile_list);
 static long map_total_sz;
 static int num_srcfiles;
 
-static unsigned shash(unsigned char *s)
-{
-	unsigned h = 0;
-	while (*s)
-		h = 65599 * h + *s++;
-	return h ^ (h >> 16);
-}
-
 static int countlines(char *map, int maplen)
 {
 	int numl;
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 5156aa971fbb2360b912e0e1e03fde563fa881ba..93c95b4b13ddcc6d57ce999e5088158335c962aa 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -465,7 +465,8 @@ size_t perf_event__fprintf_stat_config(union perf_event *event, FILE *fp)
 
 int create_perf_stat_counter(struct evsel *evsel,
 			     struct perf_stat_config *config,
-			     struct target *target)
+			     struct target *target,
+			     int cpu)
 {
 	struct perf_event_attr *attr = &evsel->core.attr;
 	struct evsel *leader = evsel->leader;
@@ -509,7 +510,7 @@ int create_perf_stat_counter(struct evsel *evsel,
 	}
 
 	if (target__has_cpu(target) && !target__has_per_thread(target))
-		return perf_evsel__open_per_cpu(evsel, evsel__cpus(evsel));
+		return perf_evsel__open_per_cpu(evsel, evsel__cpus(evsel), cpu);
 
 	return perf_evsel__open_per_thread(evsel, evsel->core.threads);
 }
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index edbeb2f63e8dfb781ba79bbb2a3e298db17d54f7..0773e4b7ec44b50d347f1ae8e72876f6389573d7 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -211,7 +211,8 @@ size_t perf_event__fprintf_stat_config(union perf_event *event, FILE *fp);
 
 int create_perf_stat_counter(struct evsel *evsel,
 			     struct perf_stat_config *config,
-			     struct target *target);
+			     struct target *target,
+			     int cpu);
 
 void perf_evlist__print_counters(struct evlist *evlist,
 				 struct perf_stat_config *config,
diff --git a/tools/perf/util/string2.h b/tools/perf/util/string2.h
index 708805f5573e3c4be48d4d17b70c2d9c999d919e..73df616ced4302cdd279d5d96b84aa90646b5d02 100644
--- a/tools/perf/util/string2.h
+++ b/tools/perf/util/string2.h
@@ -4,6 +4,7 @@
 #include
 #include
+#include // pid_t
 #include
 #include
@@ -32,6 +33,8 @@ static inline char *asprintf_expr_not_in_ints(const char *var, size_t nints, int
 	return asprintf_expr_inout_ints(var, false, nints, ints);
 }
 
+char *asprintf__tp_filter_pids(size_t npids, pid_t *pids);
+
 char *strpbrk_esc(char *str, const char *stopset);
 char *strdup_esc(const char *str);
 
diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index e357f2b268de8ce82978d9a6a4e3655dacd741de..96e2adca27191bd3b2de8c77a2d9e4820b0bd852 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -1,5 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0-only
 
+#include "util/cgroup.h"
+#include "util/data.h"
 #include "util/debug.h"
 #include "util/dso.h"
 #include "util/event.h"
@@ -1183,7 +1185,8 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
 
 	if (type & PERF_SAMPLE_BRANCH_STACK) {
 		sz = sample->branch_stack->nr * sizeof(struct branch_entry);
-		sz += sizeof(u64);
+		/* nr, hw_idx */
+		sz += 2 * sizeof(u64);
 		result += sz;
 	}
 
@@ -1206,7 +1209,7 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
 		}
 	}
 
-	if (type & PERF_SAMPLE_WEIGHT)
+	if (type & PERF_SAMPLE_WEIGHT_TYPE)
 		result += sizeof(u64);
 
 	if (type & PERF_SAMPLE_DATA_SRC)
@@ -1228,6 +1231,15 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
 	if (type & PERF_SAMPLE_PHYS_ADDR)
 		result += sizeof(u64);
 
+	if (type & PERF_SAMPLE_CGROUP)
+		result += sizeof(u64);
+
+	if (type & PERF_SAMPLE_DATA_PAGE_SIZE)
+		result += sizeof(u64);
+
+	if (type & PERF_SAMPLE_CODE_PAGE_SIZE)
+		result += sizeof(u64);
+
 	if (type & PERF_SAMPLE_AUX) {
 		result += sizeof(u64);
 		result += sample->aux_sample.size;
@@ -1344,7 +1356,8 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo
 
 	if (type & PERF_SAMPLE_BRANCH_STACK) {
 		sz = sample->branch_stack->nr * sizeof(struct branch_entry);
-		sz += sizeof(u64);
+		/* nr, hw_idx */
+		sz += 2 * sizeof(u64);
 		memcpy(array, sample->branch_stack, sz);
 		array = (void *)array + sz;
 	}
@@ -1370,8 +1383,10 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo
 		}
 	}
 
-	if (type & PERF_SAMPLE_WEIGHT) {
+	if (type & PERF_SAMPLE_WEIGHT_TYPE) {
 		*array = sample->weight;
+		if (type & PERF_SAMPLE_WEIGHT_STRUCT)
+			*array &= 0xffffffff;
 		array++;
 	}
 
@@ -1401,6 +1416,21 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo
 		array++;
 	}
 
+	if (type & PERF_SAMPLE_CGROUP) {
+		*array = sample->cgroup;
+		array++;
+	}
+
+	if (type & PERF_SAMPLE_DATA_PAGE_SIZE) {
+		*array = sample->data_page_size;
+		array++;
+	}
+
+	if (type & PERF_SAMPLE_CODE_PAGE_SIZE) {
+		*array = sample->code_page_size;
+		array++;
+	}
+
 	if (type & PERF_SAMPLE_AUX) {
 		sz = sample->aux_sample.size;
 		*array++ = sz;
@@ -1894,3 +1924,53 @@ int perf_event__synthesize_features(struct perf_tool *tool, struct perf_session
 	free(ff.buf);
 	return ret;
 }
+
+int perf_event__synthesize_for_pipe(struct perf_tool *tool,
+				    struct perf_session *session,
+				    struct perf_data *data,
+				    perf_event__handler_t process)
+{
+	int err;
+	int ret = 0;
+	struct evlist *evlist = session->evlist;
+
+	/*
+	 * We need to synthesize events first, because some
+	 * features works on top of them (on report side).
+	 */
+	err = perf_event__synthesize_attrs(tool, evlist, process);
+	if (err < 0) {
+		pr_err("Couldn't synthesize attrs.\n");
+		return err;
+	}
+	ret += err;
+
+	err = perf_event__synthesize_features(tool, session, evlist, process);
+	if (err < 0) {
+		pr_err("Couldn't synthesize features.\n");
+		return err;
+	}
+	ret += err;
+
+	if (have_tracepoints(&evlist->core.entries)) {
+		int fd = perf_data__fd(data);
+
+		/*
+		 * FIXME err <= 0 here actually means that
+		 * there were no tracepoints so its not really
+		 * an error, just that we don't need to
+		 * synthesize anything. We really have to
+		 * return this more properly and also
+		 * propagate errors that now are calling die()
+		 */
+		err = perf_event__synthesize_tracing_data(tool, fd, evlist,
+							  process);
+		if (err <= 0) {
+			pr_err("Couldn't record tracing data.\n");
+			return err;
+		}
+		ret += err;
+	}
+
+	return ret;
+}
diff --git a/tools/perf/util/synthetic-events.h b/tools/perf/util/synthetic-events.h
index baead0cdc381033b2568b2f5098605c51bf2b417..96e11c7eed9d67c2ff98e5963d66e0d307ce6413 100644
--- a/tools/perf/util/synthetic-events.h
+++ b/tools/perf/util/synthetic-events.h
@@ -14,6 +14,7 @@ struct evsel;
 struct machine;
 struct perf_counts_values;
 struct perf_cpu_map;
+struct perf_data;
 struct perf_event_attr;
 struct perf_event_mmap_page;
 struct perf_sample;
@@ -100,4 +101,9 @@ static inline int perf_event__synthesize_bpf_events(struct perf_session *session
 }
 #endif // HAVE_LIBBPF_SUPPORT
 
+int perf_event__synthesize_for_pipe(struct perf_tool *tool,
+				    struct perf_session *session,
+				    struct perf_data *data,
+				    perf_event__handler_t process);
+
 #endif // __PERF_SYNTHETIC_EVENTS_H
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index 2abbf668b8dec1a2293cce9e7b6600a06823dd37..472ef5eb406861770e8490c5c93eb4962996d823 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -46,6 +46,7 @@ struct perf_tool {
 			mmap2,
 			comm,
 			namespaces,
+			cgroup,
 			fork,
 			exit,
 			lost,
diff --git a/tools/perf/util/top.h b/tools/perf/util/top.h
index f117d4f4821e0a26d441aec796c0ae6dcefcade4..7bea36a61645d198de37743af9072c01511a742b 100644
--- a/tools/perf/util/top.h
+++ b/tools/perf/util/top.h
@@ -18,7 +18,7 @@ struct perf_session;
 
 struct perf_top {
 	struct perf_tool tool;
-	struct evlist *evlist;
+	struct evlist *evlist, *sb_evlist;
 	struct record_opts record_opts;
 	struct annotation_options annotation_opts;
 	struct evswitch evswitch;
diff --git a/tools/power/cpupower/utils/helpers/amd.c b/tools/power/cpupower/utils/helpers/amd.c
index 7c4f83a8c97371a5e073d969238ad491ea09fc62..6c693a026b831526e31c1f0d4cbc9ff64da46c8d 100644
--- a/tools/power/cpupower/utils/helpers/amd.c
+++ b/tools/power/cpupower/utils/helpers/amd.c
@@ -13,7 +13,8 @@
 #define MSR_AMD_PSTATE 0xc0010064
 #define MSR_AMD_PSTATE_LIMIT 0xc0010061
 
-union msr_pstate {
+union core_pstate {
+	/* pre fam 17h: */
 	struct {
 		unsigned fid:6;
 		unsigned did:3;
@@ -26,7 +27,8 @@ union msr_pstate {
 		unsigned idddiv:2;
 		unsigned res3:21;
 		unsigned en:1;
-	} bits;
+	} pstate;
+	/* since fam 17h: */
 	struct {
 		unsigned fid:8;
 		unsigned did:6;
@@ -35,36 +37,56 @@ union msr_pstate {
 		unsigned idddiv:2;
 		unsigned res1:31;
 		unsigned en:1;
-	} fam17h_bits;
+	} pstatedef;
+	/* since fam 1Ah: */
+	struct {
+		unsigned fid:12;
+		unsigned res1:2;
+		unsigned vid:8;
+		unsigned iddval:8;
+		unsigned idddiv:2;
+		unsigned res2:31;
+		unsigned en:1;
+	} pstatedef2;
 	unsigned long long val;
 };
 
-static int get_did(int family, union msr_pstate pstate)
+static int get_did(int family, union core_pstate pstate)
 {
 	int t;
 
-	if (family == 0x12)
+	/* Fam 1Ah onward do not use did */
+	if (cpupower_cpu_info.family >= 0x1A)
+		return 0;
+
+	if (cpupower_cpu_info.caps & CPUPOWER_CAP_AMD_PSTATEDEF)
+		t = pstate.pstatedef.did;
+	else if (family == 0x12)
 		t = pstate.val & 0xf;
-	else if (family == 0x17 || family == 0x18)
-		t = pstate.fam17h_bits.did;
 	else
-		t = pstate.bits.did;
+		t = pstate.pstate.did;
 
 	return t;
 }
 
-static int get_cof(int family, union msr_pstate pstate)
+static int get_cof(int family, union core_pstate pstate)
 {
 	int t;
-	int fid, did, cof;
+	int fid, did, cof = 0;
 
 	did = get_did(family, pstate);
-	if (family == 0x17 || family == 0x18) {
-		fid = pstate.fam17h_bits.fid;
-		cof = 200 * fid / did;
+	if (cpupower_cpu_info.caps & CPUPOWER_CAP_AMD_PSTATEDEF) {
+		if (cpupower_cpu_info.family >= 0x1A) {
+			fid = pstate.pstatedef2.fid;
+			if (fid > 0x0f)
+				cof = (fid * 5);
+		} else {
+			fid = pstate.pstatedef.fid;
+			cof = 200 * fid / did;
+		}
 	} else {
 		t = 0x10;
-		fid = pstate.bits.fid;
+		fid = pstate.pstate.fid;
 		if (family == 0x11)
 			t = 0x8;
 		cof = (100 * (fid + t)) >> did;
@@ -89,14 +111,13 @@ int decode_pstates(unsigned int cpu, unsigned int cpu_family,
 		   int boost_states, unsigned long *pstates, int *no)
 {
 	int i, psmax, pscur;
-	union msr_pstate pstate;
+	union core_pstate pstate;
 	unsigned long long val;
 
-	/* Only read out frequencies from HW when CPU might be boostable
-	   to keep the code as short and clean as possible.
-	   Otherwise frequencies are exported via ACPI tables.
-	*/
-	if (cpu_family < 0x10 || cpu_family == 0x14)
+	/* Only read out frequencies from HW if HW Pstate is supported,
+	 * otherwise frequencies are exported via ACPI tables.
+	 */
+	if (!(cpupower_cpu_info.caps & CPUPOWER_CAP_AMD_HW_PSTATE))
 		return -1;
 
 	if (read_msr(cpu, MSR_AMD_PSTATE_LIMIT, &val))
@@ -119,9 +140,9 @@ int decode_pstates(unsigned int cpu, unsigned int cpu_family,
 		}
 		if (read_msr(cpu, MSR_AMD_PSTATE + i, &pstate.val))
 			return -1;
-		if ((cpu_family == 0x17) && (!pstate.fam17h_bits.en))
+		if ((cpu_family == 0x17) && (!pstate.pstatedef.en))
 			continue;
-		else if (!pstate.bits.en)
+		else if (!pstate.pstate.en)
 			continue;
 
 		pstates[i] = get_cof(cpu_family, pstate);
diff --git a/tools/power/cpupower/utils/helpers/cpuid.c b/tools/power/cpupower/utils/helpers/cpuid.c
index 5cc39d4e23edb29871d72aac9f2361a90f4de0d2..929f1dcb169561cd5a6f68d357f164046cb9222a 100644
--- a/tools/power/cpupower/utils/helpers/cpuid.c
+++ b/tools/power/cpupower/utils/helpers/cpuid.c
@@ -128,9 +128,19 @@ int get_cpu_info(struct cpupower_cpu_info *cpu_info)
 	/* AMD or Hygon Boost state enable/disable register */
 	if (cpu_info->vendor == X86_VENDOR_AMD ||
 	    cpu_info->vendor == X86_VENDOR_HYGON) {
-		if (ext_cpuid_level >= 0x80000007 &&
-		    (cpuid_edx(0x80000007) & (1 << 9)))
-			cpu_info->caps |= CPUPOWER_CAP_AMD_CBP;
+		if (ext_cpuid_level >= 0x80000007) {
+			if (cpuid_edx(0x80000007) & (1 << 9))
+				cpu_info->caps |= CPUPOWER_CAP_AMD_CPB;
+
+			if ((cpuid_edx(0x80000007) & (1 << 7)) &&
+			    cpu_info->family != 0x14) {
+				/* HW pstate was not implemented in family 0x14 */
+				cpu_info->caps |= CPUPOWER_CAP_AMD_HW_PSTATE;
+
+				if (cpu_info->family >= 0x17)
+					cpu_info->caps |= CPUPOWER_CAP_AMD_PSTATEDEF;
+			}
+		}
 	}
 
 	if (cpu_info->vendor == X86_VENDOR_INTEL) {
diff --git a/tools/power/cpupower/utils/helpers/helpers.h b/tools/power/cpupower/utils/helpers/helpers.h
index 357b19bb136eb5343a91f17fbb15021c2841f7b3..7d2b510c0d7e366e30670a532303bf09d2f1857d 100644
--- a/tools/power/cpupower/utils/helpers/helpers.h
+++ b/tools/power/cpupower/utils/helpers/helpers.h
@@ -64,11 +64,13 @@ enum cpupower_cpu_vendor {X86_VENDOR_UNKNOWN = 0, X86_VENDOR_INTEL,
 
 #define CPUPOWER_CAP_INV_TSC 0x00000001
 #define CPUPOWER_CAP_APERF 0x00000002
-#define CPUPOWER_CAP_AMD_CBP 0x00000004
+#define CPUPOWER_CAP_AMD_CPB 0x00000004
 #define CPUPOWER_CAP_PERF_BIAS 0x00000008
 #define CPUPOWER_CAP_HAS_TURBO_RATIO 0x00000010
 #define CPUPOWER_CAP_IS_SNB 0x00000020
 #define CPUPOWER_CAP_INTEL_IDA 0x00000040
+#define CPUPOWER_CAP_AMD_HW_PSTATE 0x00000100
+#define CPUPOWER_CAP_AMD_PSTATEDEF 0x00000200
 
 #define CPUPOWER_AMD_CPBDIS 0x02000000
 
diff --git a/tools/power/cpupower/utils/helpers/misc.c b/tools/power/cpupower/utils/helpers/misc.c
index f406adc40bad5189529817a4fbb8b8981a5ef4c8..3113e551bd8acd83c5edf922a635663551a57f04 100644
--- a/tools/power/cpupower/utils/helpers/misc.c
+++ b/tools/power/cpupower/utils/helpers/misc.c
@@ -18,7 +18,7 @@ int cpufreq_has_boost_support(unsigned int cpu, int *support, int *active,
 	if (ret)
 		return ret;
 
-	if (cpupower_cpu_info.caps & CPUPOWER_CAP_AMD_CBP) {
+	if (cpupower_cpu_info.caps & CPUPOWER_CAP_AMD_CPB) {
 		*support = 1;
 
 		/* AMD Family 0x17 does not utilize PCI D18F4 like prior