Kernel source recovery - problems
Preface
While recovering the kernel sources for WEGA, my Z8000 UNIX operating system, I
ran into several problems. The main goal was (and still is) to create C sources
that compile to the same objects as those in the kernel libraries LIB1
(sys) and LIB2 (dev). WEGA itself is a ZEUS whose kernel was modified by EAW to match
the special P8000 hardware, so most parts of LIB2 are homebrew EAW code. The parts
modified or developed by EAW are not a problem, since I have their original sources.
All other files, for which I did not have the sources, are original ZEUS kernel objects. As a
starting point I used the original SYSIII sources and compared the resulting ASM
code with the ASM code that the Z8000 disassembler produced from the kernel object files.
I then adjusted the SYSIII sources until they compiled to the same objects.
Some parts of the original kernel were written directly in ASM, so they were easy to
recreate from the disassembled objects.
Now I'm at a point where I have some source files that do not compile to the same
objects, and I wonder whether someone is willing to help me get this done by reading
this text and trying to answer the questions I've summarized here.
problem description
I have some functions where the ASM code looks as follows:
0530 3582 0004 584 ldl rr2,rr8(#4)
0534 9424 ldl rr4,rr2
0536 0704 7f00 585 and r4,#32512
04d2 5d04 8000* 586 ldl _u+78,rr4
04d6 004e*
This means that a long value located at offset 4 from the address in rr8 gets loaded into rr2, then
copied into rr4 and then ANDed with 7F00FFFF (r4 is the upper 2 bytes of rr4, so only that word is ANDed with 0x7F00 = 32512). After the operation is done,
the result gets stored at the address of the external reference _u plus 78 bytes.
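To make the register-pair reasoning concrete, here is a minimal sketch in modern C that models rr4 as two 16-bit halves and applies the AND only to the high word, as the listing does. The struct and function names are hypothetical and exist only for this illustration; they are not part of the WEGA sources.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a Z8000 register pair such as rr4:
 * r4 holds the high word, r5 the low word. */
struct regpair {
    uint16_t hi;   /* r4 */
    uint16_t lo;   /* r5 */
};

static struct regpair split(uint32_t v)
{
    struct regpair p;
    p.hi = (uint16_t)(v >> 16);
    p.lo = (uint16_t)(v & 0xFFFFu);
    return p;
}

static uint32_t join(struct regpair p)
{
    return ((uint32_t)p.hi << 16) | p.lo;
}

/* Mirrors "ldl rr4,rr2" followed by "and r4,#32512". */
static uint32_t mask_like_listing(uint32_t value)
{
    struct regpair rr4 = split(value);   /* copy into rr4 */
    rr4.hi &= 0x7F00u;                   /* 32512 decimal == 0x7F00 */
    return join(rr4);
}

/* The word-sized AND gives the same result as a 32-bit AND with 0x7F00FFFF. */
static void check_mask_model(void)
{
    assert(mask_like_listing(0xFFFFFFFFu) == 0x7F00FFFFu);
    assert(mask_like_listing(0x8A123456u) == (0x8A123456u & 0x7F00FFFFu));
}
```

Calling check_mask_model() passes silently, which only confirms the arithmetic; it says nothing about which registers a particular compiler picks.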
The C code I derived from this information is:
u.u_dirp.l = (caddr_t)(((long)uap->linkname) & 0x7F00FFFF);
But this compiles to the following ASM code:
0530 3582 0004 584 ldl rr2,rr8(#4)
0536 0702 7f00 and r2,#32512
04d2 5d02 8000* 585 ldl _u+78,rr2
04d6 004e*
This may be functionally the same, but the generated ASM code differs from the original.
things I've read out of my C-code
The ANDing is done to strip out invalid segments. On a Z8001, only segments 0 - 127 are valid. In an address, the segments are encoded like this: "SS -- AA AA"
- the upper byte (SS) encodes the segment
- the byte below it (--) is ignored
- the lower two bytes (AA AA) are the address within the given segment
So this logic makes sure that only 0 - 7F ends up in the segment byte of the memory address, as everything else would raise an invalid segment TRAP of the CPU.
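The "SS -- AA AA" layout and the effect of the mask can be sketched in a few lines of C. This is only an illustration under the layout described above; the macro and function names are made up for this example and do not come from param.h or user.h.

```c
#include <assert.h>
#include <stdint.h>

#define SEG_ADDR_MASK 0x7F00FFFFUL   /* the mask used in link() */

/* Segment number: the low 7 bits of the top byte (SS). */
static unsigned seg_number(uint32_t addr)
{
    return (unsigned)((addr >> 24) & 0x7Fu);
}

/* Offset within the segment: the low two bytes (AA AA). */
static unsigned seg_offset(uint32_t addr)
{
    return (unsigned)(addr & 0xFFFFu);
}

/* Clears the top bit of SS and the ignored byte, keeps segment and offset. */
static uint32_t seg_clean(uint32_t addr)
{
    return (uint32_t)(addr & SEG_ADDR_MASK);
}

static void check_seg_layout(void)
{
    uint32_t raw = 0x8A123456u;            /* SS=8A, ignored=12, AAAA=3456 */
    assert(seg_clean(raw) == 0x0A003456u);
    assert(seg_number(seg_clean(raw)) == 0x0Au);
    assert(seg_offset(seg_clean(raw)) == 0x3456u);
}
```

With this layout the cleaned value can never carry a segment number above 0x7F, which matches the 0 - 127 range given above.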
things I've already tried
- Because the value from rr8(#4) is copied from rr2 into rr4, it occurred to me
that the content of rr8(#4), in this case uap->linkname, might be reused later. That is,
the C optimizer copies the register before modifying it so that the original content
can be used later without accessing rr8(#4) again, which is a bit more expensive.
But I can't see any later reference to uap->linkname in the ASM or C code.
- When changing
u.u_dirp.l = (caddr_t)(((long)uap->linkname) & 0x7F00FFFF);
into
u.u_dirp.l = *(caddr_t)(((long)uap->linkname) & 0x7F00FFFF);
so that a pointer gets loaded into _u+78, the line "ldl rr4,@rr2" gets added and the
further processing (and+ldl) then happens with rr4, but this loads the address into
_u+78, which is not what the original code does.
- u.u_dirp.l = (long)((saddr_t *)uap->linkname)->l & 0x7f00ffffL;
- u.u_dirp.l = ((long)uap->linkname&0x7F00FFFFL);
- u.u_dirp.l = (long)((int)uap->linkname&0x7F00);
- ipc.ip_addr.l = (caddr_t)((uap->addr.left & 0x7F00)<<16|uap->addr.right)
ld r2,rr10(#4)
and r2,#32512
sla r2,#16
ldl rr4,rr10
inc r5,#6
or r2,@rr4
sub r4,r4
ld r5,r2
ldl _ipc+4,rr4
(this was tried with another file/function but with the same problem)
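One detail in the last listing: the AND and the "sla r2,#16" both operate on the 16-bit register r2, which suggests the compiler evaluated (uap->addr.left & 0x7F00)<<16 in 16-bit arithmetic, so the segment bits are shifted out before the OR. The sketch below models that reading in modern C with fixed-width types standing in for the 16-bit int; the function names are hypothetical, and whether a widened variant would reproduce the original objects is untested.

```c
#include <assert.h>
#include <stdint.h>

/* Models the listing: the shift result is kept in a 16-bit register. */
static uint32_t compose_in_16_bits(uint16_t left, uint16_t right)
{
    uint32_t wide = (uint32_t)(left & 0x7F00u) << 16;
    uint16_t r2 = (uint16_t)wide;   /* sla r2,#16: only 16 bits survive, so 0 */
    r2 |= right;                    /* or r2,@rr4 */
    return (uint32_t)r2;            /* sub r4,r4 ; ld r5,r2 (zero-extend) */
}

/* Same expression, but widened to 32 bits before the shift. */
static uint32_t compose_widened(uint16_t left, uint16_t right)
{
    return ((uint32_t)(left & 0x7F00u) << 16) | right;
}

static void check_compose(void)
{
    /* In the 16-bit model the segment part is lost entirely. */
    assert(compose_in_16_bits(0x8A12u, 0x3456u) == 0x00003456u);
    /* Widening first keeps the masked segment in the high word. */
    assert(compose_widened(0x8A12u, 0x3456u) == 0x0A003456u);
}
```

In the kernel's own C dialect the widened form would presumably need a (long) cast on the masked value before the shift, but that is an assumption about the compiler, not something the listing confirms.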
my questions
- What should the C code look like to produce this ASM code?
further information
Below is a copy of my C implementation of the link() syscall that the example was taken from.
The complete sys2.c can be accessed here: sys2.c
The user.h header which explains the u(ser) structure is stored here: user.h
The param.h header which explains the types saddr_t, caddr_t and so on is stored here: param.h
/*
 * link system call
 */
link()
{
	register struct inode *ip, *xp;
	register struct a {
		char	*target;
		char	*linkname;
	} *uap;

	uap = (struct a *)u.u_ap;
	ip = namei(uchar, 0);
	if (ip == NULL)
		return;
	if ((ip->i_mode&IFMT)==IFDIR && !suser())
		goto out;
	prele(ip);
	u.u_dirp.l = (caddr_t)(((long)uap->linkname) & 0x7F00FFFF); /* FIXME: this is not 100% compatible */
	xp = namei(uchar, 1);
	if (xp != NULL) {
		u.u_error = EEXIST;
		iput(xp);
		goto out;
	}
	if (u.u_error)
		goto out;
	if (u.u_pdir->i_dev != ip->i_dev) {
		iput(u.u_pdir);
		u.u_error = EXDEV;
		goto out;
	}
	wdir(ip);
	if (u.u_error==0) {
		ip->i_nlink++;
		ip->i_flag |= ICHG;
	}
out:
	iput(ip);
}