Thread: Re: MI SONIC Ethernet driver for mac68k
Re: MI SONIC Ethernet driver for mac68k
2007-06-05 09:15:43
hauke Espresso.Rhein-Neckar.DE wrote:
> >> Do you have a performance comparison for the old vs. the MI one?
> >
> >Unfortunately, MI one is slower (currently).
>
> Can you time the transfers from the other (I assume, non-mac68k) machine
> for comparison?

The other side is NetBSD/i386 (Athlon64), connected via re(4)
and a gigabit switch.

I've rerun similar tests with more recent (today's) sources that
include my esp(4) fix. The MI driver now gets slightly better results
than before, but it's still slower than the old MD one on TX:
---
with old MD driver:
on mac68k side:
---
:
root file system type: ffs
Enter pathname of shell or RETURN for /bin/sh:
We recommend creating a non-root account and using su(1) for root access.
No entry for terminal type "dumb";
using dumb terminal settings.
# mount -a -t nonfs
# ifconfig sn0 192.168.20.35
# dmesg|grep sn0
sn0 at obio0: integrated Ethernet adapter
sn0: Ethernet address 08:00:07:9f:07:c6
# ./ttcp -rs
ttcp-r: buflen=8192, nbuf=2048, align=16384/0, port=5001 tcp
ttcp-r: socket
ttcp-r: accept from 192.168.20.1
ttcp-r: 16777216 bytes in 19.33 real seconds = 847.75 KB/sec +++
ttcp-r: 2049 I/O calls, msec/call = 9.66, calls/sec = 106.02
ttcp-r: 0.0user 19.2sys 0:19real 99% 0i+0d 0maxrss 0+2pf 0+0csw
# ./ttcp -ts 192.168.20.1
ttcp-t: buflen=8192, nbuf=2048, align=16384/0, port=5001 tcp -> 192.168.20.1
ttcp-t: socket
ttcp-t: connect
ttcp-t: 16777216 bytes in 15.93 real seconds = 1028.54 KB/sec +++
ttcp-t: 2048 I/O calls, msec/call = 7.96, calls/sec = 128.57
ttcp-t: 0.1user 15.4sys 0:15real 97% 0i+0d 0maxrss 0+4098pf 0+0csw
#
---
on i386 side:
---
% dmesg|grep cpu0
cpu0 at mainbus0 apid 0: (boot processor)
cpu0: AMD Athlon 64 or Sempron (686-class), 2210.86 MHz, id 0x40ff2
:
cpu0: "AMD Athlon(tm) 64 Processor 3500+"
:
% uname -mrs
NetBSD 4.99.20 i386
% ttcp -ts 192.168.20.35
ttcp-t: buflen=8192, nbuf=2048, align=16384/0, port=5001 tcp -> 192.168.20.35
ttcp-t: socket
ttcp-t: connect
ttcp-t: 16777216 bytes in 19.36 real seconds = 846.22 KB/sec +++
ttcp-t: 2048 I/O calls, msec/call = 9.68, calls/sec = 105.78
ttcp-t: -1.9user 0.0sys 0:19real 0% 0i+0d 0maxrss 0+4098pf 0+0csw
% ttcp -rs
ttcp-r: buflen=8192, nbuf=2048, align=16384/0, port=5001 tcp
ttcp-r: socket
ttcp-r: accept from 192.168.20.35
ttcp-r: 16777216 bytes in 15.97 real seconds = 1025.70 KB/sec +++
ttcp-r: 11586 I/O calls, msec/call = 1.41, calls/sec = 725.33
ttcp-r: 0.0user 0.0sys 0:15real 0% 0i+0d 0maxrss 0+2pf 0+0csw
%
---
with MI driver:
on mac68k side:
---
:
using dumb terminal settings.
# mount -a -t nonfs
# ifconfig sn0 192.168.20.35
# dmesg|grep sn0
sn0 at obio0: integrated SONIC Ethernet adapter
sn0: Ethernet address 08:00:07:9f:07:c6
# ./ttcp -rs
ttcp-r: buflen=8192, nbuf=2048, align=16384/0, port=5001 tcp
ttcp-r: socket
ttcp-r: accept from 192.168.20.1
ttcp-r: 16777216 bytes in 19.14 real seconds = 855.99 KB/sec +++
ttcp-r: 2049 I/O calls, msec/call = 9.57, calls/sec = 107.05
ttcp-r: 0.0user 19.0sys 0:19real 99% 0i+0d 0maxrss 0+2pf 0+0csw
# ./ttcp -ts 192.168.20.1
ttcp-t: buflen=8192, nbuf=2048, align=16384/0, port=5001 tcp -> 192.168.20.1
ttcp-t: socket
ttcp-t: connect
ttcp-t: 16777216 bytes in 20.61 real seconds = 794.98 KB/sec +++
ttcp-t: 2048 I/O calls, msec/call = 10.30, calls/sec = 99.37
ttcp-t: 0.1user 20.4sys 0:20real 99% 0i+0d 0maxrss 0+4098pf 0+0csw
#
---
on i386 side:
---
% ttcp -ts 192.168.20.35
ttcp-t: buflen=8192, nbuf=2048, align=16384/0, port=5001 tcp -> 192.168.20.35
ttcp-t: socket
ttcp-t: connect
ttcp-t: 16777216 bytes in 19.18 real seconds = 854.25 KB/sec +++
ttcp-t: 2048 I/O calls, msec/call = 9.59, calls/sec = 106.78
ttcp-t: -1.9user 0.0sys 0:19real 0% 0i+0d 0maxrss 0+4098pf 0+0csw
% ttcp -rs
ttcp-r: buflen=8192, nbuf=2048, align=16384/0, port=5001 tcp
ttcp-r: socket
ttcp-r: accept from 192.168.20.35
ttcp-r: 16777216 bytes in 20.65 real seconds = 793.25 KB/sec +++
ttcp-r: 12273 I/O calls, msec/call = 1.72, calls/sec = 594.21
ttcp-r: 0.0user 0.0sys 0:20real 0% 0i+0d 0maxrss 0+2pf 0+0csw
%
---
Summary:

        TX on sn0    RX on sn0
MD:     1026KB/s     846KB/s
MI:      793KB/s     854KB/s

- RX looks mostly the same.
  Maybe I forgot to update <m68k/types.h>, so the MI dp83932.c might
  do extra copies due to the lack of __NO_STRICT_ALIGNMENT, and
  the bottleneck is in some upper layer?

- TX is still slower with the MI driver.
  Maybe the MI dp83932.c sets up too many DMA descriptors
  to send fragmented mbufs directly, and the cache flush ops
  on those descriptors are more expensive than copying the mbufs
  to an uncached contiguous buffer?
  (If so, adding BUS_DMA_COHERENT support may improve performance.)
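To put the summary table in relative terms, here is a minimal C sketch that computes how far the MI figures are from the MD ones. The throughput constants are copied from the table above; the function and variable names are made up for illustration.

```c
#include <stdio.h>

/* Throughput figures (KB/s) copied from the summary table above. */
const double md_tx = 1026.0, md_rx = 846.0;
const double mi_tx = 793.0,  mi_rx = 854.0;

/* Relative change of `mi` with respect to `md`, in percent. */
double rel_change_pct(double md, double mi)
{
    return (mi - md) / md * 100.0;
}

/* Print the MI-vs-MD deltas (illustrative helper, not part of any driver). */
void print_summary(void)
{
    printf("TX on sn0: %+.1f%%\n", rel_change_pct(md_tx, mi_tx));
    printf("RX on sn0: %+.1f%%\n", rel_change_pct(md_rx, mi_rx));
}
```

Calling print_summary() from a main() shows TX on the MI driver a bit over 20% below MD, while RX differs by about 1%, which matches the "RX mostly the same, TX slower" reading.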
> Since, as I understand, the MD driver does buffer-to-memory
> transfers by cpu, it may well lock out timer interrupts and lose clock
> ticks, possibly skewing your timing results.

Actually, I see that the esp(4) driver on mac68k has such a problem
(softclock seems to be blocked too much according to vmstat -i),
but the MD mac68k/dev/if_sn.c doesn't call splhigh() at all,
so I don't think it causes tick loss.
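One rough way to check for lost ticks is to compare the elapsed time each side reports for the same transfer, since the i386 clock is independent of the mac68k one. The sketch below does that with the "real seconds" values from the ttcp logs above; the struct and function names are hypothetical, and this is only a sanity check, not a proof.

```c
#include <stddef.h>
#include <stdio.h>

/* One transfer, timed independently on both hosts (seconds, from the logs). */
struct run {
    const char *name;
    double mac68k_s;   /* "real seconds" reported on the mac68k side */
    double i386_s;     /* "real seconds" reported on the i386 side   */
};

const struct run runs[] = {
    { "MD, RX on sn0", 19.33, 19.36 },
    { "MD, TX on sn0", 15.93, 15.97 },
    { "MI, RX on sn0", 19.14, 19.18 },
    { "MI, TX on sn0", 20.61, 20.65 },
};
const size_t nruns = sizeof(runs) / sizeof(runs[0]);

/* Disagreement between the two clocks, as a percentage of the i386 time. */
double skew_pct(const struct run *r)
{
    return (r->mac68k_s - r->i386_s) / r->i386_s * 100.0;
}

/* Largest absolute disagreement over all runs. */
double max_abs_skew_pct(const struct run *r, size_t n)
{
    double max = 0.0;
    for (size_t i = 0; i < n; i++) {
        double s = skew_pct(&r[i]);
        if (s < 0.0)
            s = -s;
        if (s > max)
            max = s;
    }
    return max;
}

/* Print one line per run (illustrative helper). */
void print_skew(void)
{
    for (size_t i = 0; i < nruns; i++)
        printf("%s: %+.2f%%\n", runs[i].name, skew_pct(&runs[i]));
}
```

The two sides agree to within a few hundredths of a second on every run, so any tick loss on the mac68k side would have to be well under 1% to go unnoticed here.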
---
Izumi Tsutsui
Re: MI SONIC Ethernet driver for mac68k
2007-06-06 11:54:57
I wrote:
> Summary:
>         TX on sn0    RX on sn0
> MD:     1026KB/s     846KB/s
> MI:      793KB/s     854KB/s

More results:

MI driver with BUS_DMA_COHERENT support:
        TX on sn0    RX on sn0
         842KB/s     888KB/s

MI driver with BUS_DMA_COHERENT support and a 16-byte TX DMA threshold:
        TX on sn0    RX on sn0
         903KB/s     886KB/s
---
Izumi Tsutsui
Index: arch/m68k/include/bus_dma.h
===================================================================
RCS file: /cvsroot/src/sys/arch/m68k/include/bus_dma.h,v
retrieving revision 1.8
diff -u -r1.8 bus_dma.h
--- arch/m68k/include/bus_dma.h 4 Mar 2007 06:00:04 -0000 1.8
+++ arch/m68k/include/bus_dma.h 6 Jun 2007 16:48:19 -0000
@@ -119,6 +119,7 @@
struct m68k_bus_dma_segment {
bus_addr_t ds_addr; /* DMA address */
bus_size_t ds_len; /* length of transfer */
+ u_int _ds_flags; /* MD flags */
};
typedef struct m68k_bus_dma_segment bus_dma_segment_t;
@@ -215,7 +216,7 @@
int _dm_segcnt; /* number of segs this map can map */
bus_size_t _dm_maxmaxsegsz; /* fixed largest possible segment */
bus_size_t _dm_boundary; /* don't cross this */
- int _dm_flags; /* misc. flags */
+ u_int _dm_flags; /* misc. flags */
/* Machine dependant fields: */
bus_size_t dm_xfer_len; /* length of successful transfer */
Index: arch/m68k/include/pmap_motorola.h
===================================================================
RCS file: /cvsroot/src/sys/arch/m68k/include/pmap_motorola.h,v
retrieving revision 1.13
diff -u -r1.13 pmap_motorola.h
--- arch/m68k/include/pmap_motorola.h 12 May 2007 17:43:53 -0000 1.13
+++ arch/m68k/include/pmap_motorola.h 6 Jun 2007 16:48:19 -0000
@@ -202,10 +202,8 @@
#define PMAP_PREFER(foff, vap, sz, td) pmap_prefer((foff), (vap))
#endif
-#ifdef mvme68k
void _pmap_set_page_cacheable(struct pmap *, vaddr_t);
void _pmap_set_page_cacheinhibit(struct pmap *, vaddr_t);
int _pmap_page_is_cacheable(struct pmap *, vaddr_t);
-#endif
#endif /* !_M68K_PMAP_MOTOROLA_H_ */
Index: arch/m68k/m68k/bus_dma.c
===================================================================
RCS file: /cvsroot/src/sys/arch/m68k/m68k/bus_dma.c,v
retrieving revision 1.23
diff -u -r1.23 bus_dma.c
--- arch/m68k/m68k/bus_dma.c 2 Jun 2007 11:13:45 -0000 1.23
+++ arch/m68k/m68k/bus_dma.c 6 Jun 2007 16:48:19 -0000
@@ -141,23 +141,30 @@
bus_size_t sgsize;
bus_addr_t curaddr, lastaddr, baddr, bmask;
vaddr_t vaddr = (vaddr_t)buf;
- int seg;
+ int seg, cacheable, coherent;
+ pmap_t pmap;
bool rv;
+ coherent = BUS_DMA_COHERENT;
lastaddr = *lastaddrp;
bmask = ~(map->_dm_boundary - 1);
+ if (!VMSPACE_IS_KERNEL_P(vm))
+ pmap = vm_map_pmap(&vm->vm_map);
+ else
+ pmap = pmap_kernel();
for (seg = *segp; buflen > 0 ; ) {
/*
* Get the physical address for this segment.
*/
- if (!VMSPACE_IS_KERNEL_P(vm))
- rv = pmap_extract(vm_map_pmap(&vm->vm_map),
- vaddr, &curaddr);
- else
- rv = pmap_extract(pmap_kernel(), vaddr, &curaddr);
+ rv = pmap_extract(pmap, vaddr, &curaddr);
KASSERT(rv);
+ cacheable = _pmap_page_is_cacheable(pmap, vaddr);
+
+ if (cacheable)
+ coherent = 0;
+
/*
* Compute the segment size, and adjust counts.
*/
@@ -181,6 +188,8 @@
if (first) {
map->dm_segs[seg].ds_addr = curaddr;
map->dm_segs[seg].ds_len = sgsize;
+ map->dm_segs[seg]._ds_flags =
+ cacheable ? 0 : BUS_DMA_COHERENT;
first = 0;
} else {
if (curaddr == lastaddr &&
@@ -195,6 +204,8 @@
break;
map->dm_segs[seg].ds_addr = curaddr;
map->dm_segs[seg].ds_len = sgsize;
+ map->dm_segs[seg]._ds_flags =
+ cacheable ? 0 : BUS_DMA_COHERENT;
}
}
@@ -205,6 +216,9 @@
*segp = seg;
*lastaddrp = lastaddr;
+ map->_dm_flags &= ~BUS_DMA_COHERENT;
+ /* BUS_DMA_COHERENT is set only if all segments are uncached */
+ map->_dm_flags |= coherent;
/*
* Did we fit?
@@ -408,6 +422,7 @@
map->dm_maxsegsz = map->_dm_maxmaxsegsz;
map->dm_mapsize = 0;
map->dm_nsegs = 0;
+ map->_dm_flags &= ~BUS_DMA_COHERENT;
}
/*
@@ -426,6 +441,7 @@
#if defined(M68040) || defined(M68060)
bus_addr_t p, e, ps, pe;
bus_size_t seglen;
+ bus_dma_segment_t *seg;
int i;
#endif
@@ -438,6 +454,10 @@
#endif
#if defined(M68040) || defined(M68060)
+ /* If the whole DMA map is uncached, do nothing. */
+ if ((map->_dm_flags & BUS_DMA_COHERENT) != 0)
+ return;
+
/* Short-circuit for unsupported `ops' */
if ((ops & (BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE)) == 0)
return;
@@ -446,9 +466,10 @@
* flush/purge the cache.
*/
for (i = 0; i < map->dm_nsegs && len != 0; i++) {
- if (map->dm_segs[i].ds_len <= offset) {
+ seg = &map->dm_segs[i];
+ if (seg->ds_len <= offset) {
/* Segment irrelevant - before requested offset */
- offset -= map->dm_segs[i].ds_len;
+ offset -= seg->ds_len;
continue;
}
@@ -457,11 +478,15 @@
* each segment until we have exhausted the
* length.
*/
- seglen = map->dm_segs[i].ds_len - offset;
+ seglen = seg->ds_len - offset;
if (seglen > len)
seglen = len;
- ps = map->dm_segs[i].ds_addr + offset;
+ /* Ignore cache-inhibited segments */
+ if ((seg->_ds_flags & BUS_DMA_COHERENT) != 0)
+ continue;
+
+ ps = seg->ds_addr + offset;
pe = ps + seglen;
if (ops & BUS_DMASYNC_PREWRITE) {
@@ -655,10 +680,20 @@
pmap_enter(pmap_kernel(), va, addr,
VM_PROT_READ | VM_PROT_WRITE,
VM_PROT_READ | VM_PROT_WRITE | PMAP_WIRED);
+
+ /* Cache-inhibit the page if necessary */
+ if ((flags & BUS_DMA_COHERENT) != 0)
+ _pmap_set_page_cacheinhibit(pmap_kernel(), va);
+
+ segs[curseg]._ds_flags &= ~BUS_DMA_COHERENT;
+ segs[curseg]._ds_flags |= (flags & BUS_DMA_COHERENT);
}
}
pmap_update(pmap_kernel());
+ if ((flags & BUS_DMA_COHERENT) != 0)
+ TBIAS();
+
return 0;
}
@@ -669,6 +704,8 @@
void
_bus_dmamem_unmap(bus_dma_tag_t t, void *kva, size_t size)
{
+ vaddr_t va;
+ size_t s;
#ifdef DIAGNOSTIC
if ((u_long)kva & PGOFSET)
@@ -677,6 +714,15 @@
size = round_page(size);
+ /*
+ * Re-enable cacheing on the range
+ * XXXSCW: There should be some way to indicate that the pages
+ * were mapped DMA_MAP_COHERENT in the first place...
+ */
+ for (s = 0, va = (vaddr_t)kva; s < size;
+ s += PAGE_SIZE, va += PAGE_SIZE)
+ _pmap_set_page_cacheable(pmap_kernel(), va);
+
pmap_remove(pmap_kernel(), (vaddr_t)kva, (vaddr_t)kva + size);
pmap_update(pmap_kernel());
uvm_km_free(kernel_map, (vaddr_t)kva, size, UVM_KMF_VAONLY);
@@ -707,6 +753,10 @@
continue;
}
+ /*
+ * XXXSCW: What about BUS_DMA_COHERENT ??
+ */
+
return m68k_btop((char *)segs[i].ds_addr + off);
}
Index: arch/m68k/m68k/pmap_motorola.c
===================================================================
RCS file: /cvsroot/src/sys/arch/m68k/m68k/pmap_motorola.c,v
retrieving revision 1.30
diff -u -r1.30 pmap_motorola.c
--- arch/m68k/m68k/pmap_motorola.c 18 May 2007 01:46:40 -0000 1.30
+++ arch/m68k/m68k/pmap_motorola.c 6 Jun 2007 16:48:20 -0000
@@ -2848,8 +2848,6 @@
(void)cachectl1(0x80000004, va, len, p);
}
-#ifdef mvme68k
-
void
_pmap_set_page_cacheable(pmap_t pmap, vaddr_t va)
{
@@ -2905,8 +2903,6 @@
return (pmap_pte_ci(pmap_pte(pmap, va)) == 0) ? 1 : 0;
}
-#endif /* mvme68k */
-
#ifdef DEBUG
/*
* pmap_pvdump:
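
The cache-sync part of the bus_dma.c change above can be modelled in userland to see what it saves. The sketch below is only a simplified model, not kernel code: struct dma_seg, struct dma_map, DMA_COHERENT, map_load() and segs_needing_flush() are hypothetical stand-ins for the m68k bus_dma structures, BUS_DMA_COHERENT, and the load and sync routines. It records a per-segment "uncached" flag, marks the whole map coherent only when every segment is uncached, and counts how many segments a sync over a given range would still have to flush.

```c
#include <stddef.h>

#define DMA_COHERENT 0x1u   /* stand-in flag; the value is arbitrary here */

struct dma_seg {
    unsigned long addr;     /* DMA address */
    unsigned long len;      /* length of segment */
    unsigned int  flags;    /* DMA_COHERENT if the page is cache-inhibited */
};

struct dma_map {
    struct dma_seg *segs;
    int             nsegs;
    unsigned int    flags;  /* DMA_COHERENT only if ALL segments are uncached */
};

/* Load step: derive the map-wide flag from the per-segment flags. */
void map_load(struct dma_map *m, struct dma_seg *segs, int nsegs)
{
    unsigned int coherent = DMA_COHERENT;

    for (int i = 0; i < nsegs; i++)
        if ((segs[i].flags & DMA_COHERENT) == 0)
            coherent = 0;   /* one cacheable segment spoils the map */

    m->segs = segs;
    m->nsegs = nsegs;
    m->flags = (m->flags & ~DMA_COHERENT) | coherent;
}

/* Sync step: count segments in [offset, offset + len) that need a flush. */
int segs_needing_flush(const struct dma_map *m,
                       unsigned long offset, unsigned long len)
{
    int n = 0;

    if ((m->flags & DMA_COHERENT) != 0)
        return 0;           /* whole map uncached: nothing to do */

    for (int i = 0; i < m->nsegs && len != 0; i++) {
        const struct dma_seg *seg = &m->segs[i];
        unsigned long seglen;

        if (seg->len <= offset) {   /* segment lies before the range */
            offset -= seg->len;
            continue;
        }
        seglen = seg->len - offset;
        if (seglen > len)
            seglen = len;
        if ((seg->flags & DMA_COHERENT) == 0)
            n++;                    /* cacheable: would be flushed/purged */
        offset = 0;
        len -= seglen;
    }
    return n;
}
```

In this sketch, offset and len are advanced for skipped (uncached) segments as well. Whether the posted patch does the same depends on the code after the `continue`, which the diff context does not show.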