Ticket #2200 (new defect)

Opened 2 years ago

Last modified 1 year ago

S2875 k8temp reports different values from BIOS and w83627hf

Reported by: tuskentower@gmail.com
Assigned to: somebody
Priority: minor
Milestone:
Component: hardware
Version:
Keywords: S2875 k8temp
Cc:

Description

I just rebuilt my kernel to try something out, and I enabled the k8temp kernel module. When I ran sensors I saw two new values corresponding to the two Opterons on the motherboard. These values differ by ~10 degrees C from those reported by the w83627hf and the BIOS (I have confirmed that the latter two match each other).

I am using the sensors.conf that Tyan hosts somewhere on their website. I changed the config file slightly to get the correct temp values and fan speeds. I am attaching this file and lspci output.

Motherboard: Tyan S2875
Distro: Debian Etch testing with a custom Linux 2.6.21-rc2 kernel
lm-sensors: 2.10.1-3

I just started looking at this 20 minutes ago and I saw another ticket, #2152, that looks similar except that it applies to a different board. If the problem is the same, we can close this one.

Attachments

sensors (1.3 kB) - added by ticket on 04/04/07 23:08:46.
lm-sensors output
lspci (3.0 kB) - added by ticket on 04/04/07 23:09:58.
lspci output
lspci-hexdump-opteron-host-bridge (348 bytes) - added by ticket on 04/04/07 23:10:29.
hex dump the AMD hostbridge
cpuinfo (1.3 kB) - added by ticket on 04/04/07 23:11:40.
/proc/cpuinfo
k8temp_diodeoffset.patch (0.8 kB) - added by ticket on 04/16/07 21:03:03.
patch updated for style
s2875-changed-sensors.conf (2.1 kB) - added by ticket on 04/16/07 21:04:04.
config file updated for +7 degrees C offset

Change History

04/04/07 23:08:46 changed by ticket

  • attachment sensors added.

lm-sensors output

04/04/07 23:09:58 changed by ticket

  • attachment lspci added.

lspci output

04/04/07 23:10:29 changed by ticket

  • attachment lspci-hexdump-opteron-host-bridge added.

hex dump the AMD hostbridge

04/04/07 23:11:40 changed by ticket

  • attachment cpuinfo added.

/proc/cpuinfo

(in reply to: ↑ description ) 04/04/07 23:15:32 changed by ticket

Adding sensors output inline:

k8temp-pci-00c3
Adapter: PCI adapter
temp1: +37°C

k8temp-pci-00cb
Adapter: PCI adapter
temp1: +31°C

w83627hf-isa-0290
Adapter: ISA adapter
...
CPU1 Temp: +48.0°C (high = +80°C, hyst = +75°C) sensor = diode
CPU2 Temp: +43.0°C (high = +80°C, hyst = +75°C) sensor = diode
...

CPU1 runs hotter than CPU2 for no apparent reason (aside from me screwing up while mounting the heatsink). The values reported by the w83627hf chip match the values that I have seen in the BIOS.

04/05/07 16:54:24 changed by ticket

I read the k8temp kernel doc, which pointed me to AMD's datasheet. From there I read section 4.6.23 "Thermtrip Status Register", which contains the following excerpt:

Diode Offset (DiodeOffset[5:0]): Bits 13–8. Thermal diode offset is used to correct the measurement made by an external temperature sensor. This diode offset supports temperature sensors using two sourcing currents only. Other sourcing current implementations are not compatible with the diode offset and are not supported by AMD. The allowable offset range is provided in the appropriate processor functional data sheet, and the maximum offset can vary for different processors. A correction to the offset may be needed for some temperature sensors. Contact the temperature sensor vendor to determine whether an offset correction is needed.

To me that paragraph means that I should read the datasheet for my processors, 2 GHz 246 HE (stepping CG, I believe). That last line confuses me, though. I'm guessing that my confusion comes from lumping together the diode and the sensor that reads it. Does that paragraph mean that the sensor reading the diode has its own inaccuracy, on top of the inaccuracy of the diode itself (which I need to look up in the processor datasheet)?

04/05/07 18:31:26 changed by ticket

The part is an OSK246CMP5AU, with a thermal resistance (case to ambient) of 0.50 °C/W (pulled from 30417.pdf, May 2006). I'm not sure what those numbers mean, but they seem important. The Tcase max values are given, but that is really the case temperature and has nothing to do with the diode. Any idea where else I should look?


I read the driver code and more of the tech spec. The driver is not pulling the diode offset information. Does anyone know if pulling the diode offset will help generate better values?

It also looks like the temp measurement might be different with the revision G processors, but the spec sheet didn't make sense to me in the first pass.

04/13/07 05:54:08 changed by ticket

This is the patch that I created to utilize the temperature sensor diode offset in the k8temp.

diff -uprN kernel.orig/drivers/hwmon/k8temp.c kernel/drivers/hwmon/k8temp.c
--- kernel.orig/drivers/hwmon/k8temp.c  2007-04-12 23:15:02.000000000 -0400
+++ kernel/drivers/hwmon/k8temp.c       2007-04-12 23:13:53.000000000 -0400
@@ -33,6 +33,7 @@
 #include <linux/mutex.h>
 
 #define TEMP_FROM_REG(val)     (((((val) >> 16) & 0xff) - 49) * 1000)
+#define OFFSET_FROM_REG(val)   ((val >> 8) & 0x3f)?(11 - ((val >> 8) & 0x3f)):0
 #define REG_TEMP       0xe4
 #define SEL_PLACE      0x40
 #define SEL_CORE       0x04
@@ -117,7 +118,8 @@ static ssize_t show_temp(struct device *
        struct k8temp_data *data = k8temp_update_device(dev);
 
        return sprintf(buf, "%d\n",
-                      TEMP_FROM_REG(data->temp[core][place]));
+                       ((int) TEMP_FROM_REG(data->temp[core][place])) +
+                       (((int) OFFSET_FROM_REG(data->temp[core][place]))*1000));
 }
 
 /* core, place */

After I patched the k8temp driver, the temperatures reported by k8temp are now 6 degrees C off from those reported by the BIOS. The upshot is that the difference is now uniform instead of seemingly random. Now I have to figure out how to use the config file (aka RTFM).

04/16/07 21:03:03 changed by ticket

  • attachment k8temp_diodeoffset.patch added.

patch updated for style

04/16/07 21:04:04 changed by ticket

  • attachment s2875-changed-sensors.conf added.

config file updated for +7 degrees C offset

04/16/07 21:08:23 changed by ticket

After updating the k8temp module to use the Diode Offset, I found that the reported temps were off by 6 to 7 degrees C. I updated my sensors.conf file to compensate for this discrepancy, and now the numbers look about the same:

k8temp-pci-00c3
Adapter: PCI adapter
CPU1 Temp: +47°C

k8temp-pci-00cb
Adapter: PCI adapter
CPU2 Temp: +43°C

w83627hf-isa-0290
Adapter: ISA adapter
+1.8V: +1.84 V (min = +1.71 V, max = +1.89 V)
CPU VRM: +1.31 V (min = +1.23 V, max = +1.36 V)
+3.3V: +3.25 V (min = +3.14 V, max = +3.47 V)
in3: +2.98 V (min = +2.74 V, max = +3.89 V)
5VSB: +5.26 V (min = +4.50 V, max = +5.50 V)
-12V: -12.41 V (min = -12.48 V, max = -12.27 V) ALARM
HT Volt: +1.23 V (min = +1.26 V, max = +1.14 V) ALARM
in7: +3.20 V (min = +3.87 V, max = +3.71 V) ALARM
VBat: +3.14 V (min = +2.40 V, max = +3.60 V)
CPU2 Fan: 168750 RPM (min = 0 RPM, div = 8)
CPU1 Fan: 1917 RPM (min = 0 RPM, div = 8)
SYS Fan: 0 RPM (min = 2657 RPM, div = 2) ALARM
Mobo Temp: +63°C (high = +99°C, hyst = +44°C) sensor = thermistor
CPU1 Temp: +46.5°C (high = +80°C, hyst = +75°C) sensor = diode
CPU2 Temp: +42.5°C (high = +80°C, hyst = +75°C) sensor = diode
vid: +1.300 V (VRM Version 2.4)
alarms:
beep_enable:

Sound alarm enabled

(follow-up: ↓ 7 ) 04/19/07 14:05:06 changed by khali

  • cc deleted.
  • keywords changed from S2875 k8temp w83627hf to S2875 k8temp.
  • reporter changed from ticket to tuskentower@gmail.com.

Please attach the output of:

lspci -xxx -s 00:18.3
lspci -xxx -s 00:19.3

but as root this time, so that we see all the registers.

(in reply to: ↑ 6 ) 04/19/07 16:43:50 changed by ticket

sudo lspci -xxx -s 00:18.3
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00: 22 10 03 11 00 00 00 00 00 00 00 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
40: ff 3b 00 00 40 00 c0 00 00 00 00 00 00 00 00 00
50: e0 c3 8e bc 00 00 00 00 00 00 00 00 80 af 31 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 11 01 02 51 11 80 00 50 00 38 00 08 1b 22 00 00
80: 00 00 07 23 13 21 13 00 00 00 00 00 00 00 00 00
90: 05 00 00 00 70 00 00 00 00 c0 bd 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 3d 00 00 80 fb 80 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 07 07 e2 04 10 27 00 20 00 25 00 00
e0: 00 00 00 00 20 07 59 00 1b 01 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

sudo  lspci -xxx -s 00:19.3
00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00: 22 10 03 11 00 00 00 00 00 00 00 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
40: ff 3b 00 00 40 00 40 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 c7 ff be
60: 77 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 11 01 02 51 11 80 00 50 00 38 00 08 1b 22 00 00
80: 00 00 07 23 13 21 13 00 00 00 00 00 00 00 00 00
90: 05 00 00 00 70 00 00 00 00 c0 bd 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 1f 00 00 40 a2 b0 b9 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 07 07 e2 04 10 27 00 20 00 25 00 00
e0: 00 00 00 00 20 04 53 00 1b 01 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

06/24/07 21:58:29 changed by ticket

I have an AMD 64 X2 3600+ and I've had a similar experience. I also looked at the AMD spec and found the interesting values about the diode offset.

In my case the diode offset is always 21C. I added code to the driver to dump out the raw values of the word in question.

One set of values (4 numbers: 2 cores x 2 sensors) was:

3ba03a 30e07a 31e03e 2c207e

As a sanity check, these have the correct bits turned on to select core 0/1 and sensor 0/1. I wrote a small C program to format the values as per the data sheet:

#include <stdio.h>
#include <stdlib.h>

int main (int argc, char * argv[])
{
    int i;
    long val;
    long diode_offset;
    long curr_temp;
    long tj_offset;

    float f_diode_offset;
    float f_curr_temp;
    float f_t_control;

    for (i=1; i<argc; ++i)
    {
        /* Each argument is one raw register value in hex */
        val = strtol(argv[i], NULL, 16);
        printf("Arg[%d] = %lX\n", i, val);

        /* Extract the bit fields: bits 13:8, 23:14 and 28:24 */
        diode_offset = 0x3F & (val >> 8);
        curr_temp = 0x3FF & (val >> 14);
        tj_offset = 0x1F & (val >> 24);

        printf("raw:diode_offset = %ld\n", diode_offset);
        printf("raw:curr_temp = %ld\n", curr_temp);
        printf("raw:tj_offest = %ld\n", tj_offset);

        /* Scale the raw fields to degrees C */
        f_diode_offset = (float)diode_offset - 11;
        f_curr_temp = (float)curr_temp/4 - 49;
        f_t_control = f_curr_temp - (float)tj_offset*2 - 49;

        printf("diode_offset = %f\n", f_diode_offset);
        printf("curr_temp = %f\n", f_curr_temp);
        printf("f_t_control = %f\n", f_t_control);

        printf("Guess = %f\n", f_diode_offset + f_curr_temp);

        printf("===============\n\n\n");
    }

    return 0;
}

06/24/07 22:12:16 changed by ticket

This is graeme, continuing the above example. I ran a CPU load via taskset to drive one core hot, and took more readings. Formatting these via the above utility reveals:

graeme@mediabox:~/src$ ./testamd 40A03A 37A07A 38603E 32E07E
Arg[1] = 40A03A
raw:diode_offset = 32
raw:curr_temp = 258
raw:tj_offest = 0
diode_offset = 21.000000
curr_temp = 15.500000
f_t_control = -33.500000
Guess = 36.500000
===============

Arg[2] = 37A07A
raw:diode_offset = 32
raw:curr_temp = 222
raw:tj_offest = 0
diode_offset = 21.000000
curr_temp = 6.500000
f_t_control = -42.500000
Guess = 27.500000
===============

Arg[3] = 38603E
raw:diode_offset = 32
raw:curr_temp = 225
raw:tj_offest = 0
diode_offset = 21.000000
curr_temp = 7.250000
f_t_control = -41.750000
Guess = 28.250000
===============

Arg[4] = 32E07E
raw:diode_offset = 32
raw:curr_temp = 203
raw:tj_offest = 0
diode_offset = 21.000000
curr_temp = 1.750000
f_t_control = -47.250000
Guess = 22.750000
===============

The output from sensors is:

root@mediabox:~# sensors
k8temp-pci-00c3
Adapter: PCI adapter
temp1: +11°C
temp2: +2°C
temp3: +1°C
temp4: +1073738°C

it8716-isa-0290
Adapter: ISA adapter
VCore: +1.41 V (min = +0.00 V, max = +5.55 V)
+3.3V: +3.28 V (min = +3.13 V, max = +3.47 V)
+5V: +5.05 V (min = +4.74 V, max = +5.25 V)
+12V: +12.06 V (min = +0.00 V, max = +17.18 V)
5VSB: +5.16 V (min = +4.75 V, max = +5.25 V)
VBat: +2.91 V
CPU Fan: 1642 RPM (min = 399 RPM)
Case Fan1: 0 RPM (min = 1997 RPM) ALARM
Case Fan2: 0 RPM (min = 0 RPM)
CPU Temp: +21°C (low = +10°C, high = +60°C) sensor = diode
M/B Temp: +32°C (low = +10°C, high = +50°C) sensor = thermistor
vid: +0.000 V

root@mediabox:~# cat /etc/sensors.conf

...elided ...

# ##########################################################################
#### Here begins the real configuration file

# These values were found, here:
#
# http://www.abclinuxu.cz/forum/show/145943
#
# These were a match for: M2NPV-VM and it8716-isa-0290
# This I think is therefore a ASUS m2npv-vm motherboard with lm-sensors
#
# I can't read the text, but the 'before case' is a very good match for
# what I got:
#
# Before:
#
#k8temp-pci-00c3
#Adapter: PCI adapter
#Core0 Temp:
#             +11°C
#Core0 Temp:
#             +1°C
#Core1 Temp:
#             +2°C
#Core1 Temp:
#             -4°C
#
#it8716-isa-0290
#Adapter: ISA adapter
#VCore: +1.04 V (min = +0.00 V, max = +4.08 V)
#VDDR: +3.14 V (min = +0.00 V, max = +4.08 V)
#+3.3V: +0.00 V (min = +0.00 V, max = +4.08 V) ALARM
#+5V: +4.78 V (min = +0.00 V, max = +6.85 V)
#+12V: +11.46 V (min = +0.00 V, max = +16.32 V)
#-12V: -16.97 V (min = -16.97 V, max = +4.01 V) ALARM
#-5V: -8.78 V (min = -8.78 V, max = +4.05 V) ALARM
#5VSB: +4.73 V (min = +0.00 V, max = +6.85 V)
#VBat: +2.91 V
#fan1: 3096 RPM (min = 0 RPM)
#fan2: 0 RPM (min = 0 RPM)
#fan3: 0 RPM (min = 0 RPM)
#temp1: +22°C (low = -1°C, high = +127°C) sensor = diode
#temp2: +32°C (low = -1°C, high = +127°C) sensor = thermistor
#temp3: +25°C (low = -1°C, high = +127°C) sensor = thermistor
#vid: +0.000 V
#
#
# Note the K8temp section is not affected by this.
#
#

chip "it8716-*"

    ignore in2
    ignore in5
    ignore in6

    label in0 "VCore"
    label in1 "+3.3V"
    label in3 "+5V"     # VCC
    label in4 "+12V"
    label in7 "5VSB"    # VCCH
    label in8 "VBat"

    compute in0 @*1.36 , @/1.36
    compute in1 @*1.047 , @/1.047
    compute in3 @*1.773 , @/1.773
    compute in4 @*4.21 , @/4.21
    compute in7 @*1.833 , @/1.833

    set in1_min 3.3 * 0.95
    set in1_max 3.3 * 1.05
    set in3_min 5 * 0.95
    set in3_max 5 * 1.05
    set in6_max -5 * 0.95
    set in6_min -5 * 1.05
    set in7_min 5 * 0.95
    set in7_max 5 * 1.05

    label temp1 "CPU Temp"
    label temp2 "M/B Temp"
    ignore temp3

    set temp1_over 60
    set temp1_low 10
    set temp2_over 50
    set temp2_low 10

    label fan1 "CPU Fan"
    label fan2 "Case Fan1"
    label fan3 "Case Fan2"
    # ignore fan3

    set fan1_min 400
    set fan2_min 2000

#
# Reading http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/32559.pdf
#
# My guess is 000H = -49.00C
#             001H = -48.75C
#             002H = -48.50C
#
# So I think we want to do:
#   <raw value>/4 - 49
# However this is already done in the driver (I read the source) so
# it should not be needed. Anyhow the values appear to be rubbish
#

#chip "k8temp-*"

    label temp1 "Core0 Temp"
    label temp2 "Core0 Temp"
    label temp3 "Core1 Temp"
    label temp4 "Core1 Temp"

#compute temp1 @+21 , @-21
#compute temp2 @+21 , @-21
#compute temp3 @+21 , @-21
#compute temp4 @+21 , @-21

FYI, the BIOS reports values like CPU = 32C and MB = 34C (not taken at this exact moment, of course, and the CPU temperature varies very quickly).