About

GridFTP.Bio-Mirror.net provides mirroring of public biology data sets. The Bio-mirror.net project has been working since 1999 to provide up-to-date public biology data to the world.

"GridFTP" means a new, faster data transfer method is in place, now in test mode (Aug 2010). This software, from the Globus project, has recently been improved to offer UDP-based file transport (UDT), with long-distance speed improvements of 3x to 10x over the usual TCP-based file transport.
-- Don Gilbert, August 2010, gilbertd At bio-mirror.net

GridFTP Documents and Installation help

READ_ME: http://www.globus.org/toolkit/docs/5.0/5.0.2/data/gridftp/

Early adopters will need to read the docs from Globus. We will later provide a how-to document suited to Bio-mirror.net.
As of this writing you will need to fetch and compile the GridFTP software with UDT support. Globus version 5.0.2 or later is required.

GridFTP + UDT will compile and run on Linux with these steps:

    export GLOBUS_LOCATION=/usr/local/globus5
    ./configure --prefix=$GLOBUS_LOCATION
    make gridftp udt install > log.mks1 2>&1 &
    cd $GLOBUS_LOCATION
    cp -p bin/XXXpthr/shared/globus-url-copy bin/

The last step, copying XXXpthr/shared/globus-url-copy into bin/, is documented but not obvious at first. UDT will only work with the threaded build. Note that make runs in the background here, so wait for it to finish (check log.mks1) before the cd and cp steps.

GridFTP + UDT will also compile and run on Mac OS X and Solaris 10 (my preference) if you modify the Globus build process for UDT to enable its compile options for UNIX (Solaris 10) or Mac OS X. I also needed to modify udt4/src/channel.cpp for Solaris 10.

GridFTP runs in anonymous FTP mode at bio-mirror.net, which also requires a few server source changes for better anon-ftp. Globus-url-copy needs a patch to preserve file timestamps, which it should do, especially with the new -sync option.
The file gt5.0.2_patches.txt contains my patches, which can be applied to gt5.0.2-all-source-installer/
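
The copy step in the build instructions can be wrapped in a small check so that a missing threaded build is reported instead of silently skipped. This is only a sketch: the function name install_threaded_guc is hypothetical, and the "*pthr" glob is an assumption standing in for the "XXXpthr" directory named in the build steps.

```shell
#!/bin/sh
# Sketch: find the threaded globus-url-copy under a Globus install prefix
# and copy it into bin/. The "*pthr" glob is an assumption; the page only
# names the directory as "XXXpthr".
install_threaded_guc() {
    prefix=$1
    for candidate in "$prefix"/bin/*pthr/shared/globus-url-copy; do
        # If the glob matched nothing, $candidate is the literal pattern
        # and the -f test below fails, so we fall through to the error.
        if [ -f "$candidate" ]; then
            cp -p "$candidate" "$prefix/bin/" || return 1
            echo "installed $candidate"
            return 0
        fi
    done
    echo "no threaded globus-url-copy found under $prefix/bin" >&2
    return 1
}

# Usage, with GLOBUS_LOCATION set as in the build steps:
#   install_threaded_guc "$GLOBUS_LOCATION"
```

A non-zero return here most likely means the build has not finished or was not a threaded build.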
Trial runs

A small repository for testing is available at port 2899 of this server. Please use this for trial runs; you need not register.

List server

    globus-url-copy -list \
      ftp://gridftp.bio-mirror.net:2899/biomirror/

Copy tiny data set

  using TCP:
    time globus-url-copy -sync -cd \
      ftp://gridftp.bio-mirror.net:2899/biomirror/rebase/ \
      rebase/
  using UDT: add -udt

Copy larger data set

A useful 3 GB data set of NCBI Blast NR protein data.

  using TCP:
    time globus-url-copy -sync -cd \
      'ftp://gridftp.bio-mirror.net:2899/biomirror/blast/nr.*.tar.gz' \
      blast/
  using UDT: add -udt

Standard FTP comparison

This same repository is available to standard FTP for comparison, as ftp://gridftp.bio-mirror.net/biomirror/
This will be the same data as at ftp://bio-mirror.net/biomirror/ (by end of August 2010). Please use the hostname gridftp.bio-mirror.net rather than the IP address, as we plan to change the address.
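
To compare TCP and UDT on the same data set without retyping the command, the two variants can be generated from one helper. This is a sketch, not part of the Globus tools: trial_cmd is a hypothetical name, it only prints the command line (a dry run), and it uses only the flags quoted on this page (-sync, -cd, -udt).

```shell
#!/bin/sh
# Sketch: print the globus-url-copy command line for a TCP or UDT trial run.
# trial_cmd MODE SOURCE_URL DEST_DIR   (hypothetical helper, dry run only)
trial_cmd() {
    mode=$1; src=$2; dest=$3
    case $mode in
        tcp) extra="" ;;
        udt) extra=" -udt" ;;   # UDT is selected by adding -udt
        *)   echo "mode must be tcp or udt" >&2; return 1 ;;
    esac
    # The URL is single-quoted so wildcards such as nr.*.tar.gz reach
    # globus-url-copy unexpanded by the shell.
    echo "globus-url-copy -sync -cd$extra '$src' $dest"
}

base=ftp://gridftp.bio-mirror.net:2899/biomirror
trial_cmd tcp "$base/rebase/" rebase/
trial_cmd udt "$base/rebase/" rebase/
```

To run a transfer rather than print it, evaluate the printed line, for example with eval "$(trial_cmd udt "$base/rebase/" rebase/)", prefixed by time as in the examples on this page.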
Register for usage

The full bio-mirror data repository is accessible on the standard GridFTP port 2811, after you register your computer IP address and contact info.

    globus-url-copy -list \
      ftp://gridftp.bio-mirror.net:2811/biomirror/

We ask you to register your computer IP address for full access to GridFTP.Bio-Mirror.net because this is still in a trial stage, and we need to be able to assess problems and contact you about any that arise. Early tests match other reports: the server CPU and memory load is higher than with regular FTP, but not greatly so. GridFTP/UDT appears to consume less CPU than rsync, as well as being more useful.
Test Cases

GridFTP TCP and UDT transfer times for 113 GB from gridftp.bio-mirror.net/biomirror/blast/ (Indiana USA)
         Ping   Time (min)    TCP/  Distance
Site     RTT    TCP     UDT   UDT   Km        Network Route
--------------------------------------------------------------------------------------
NCSA     10     139     138   1     200       Indiana - U of Illinois - NCSA
                14      14                    Megabytes/sec
Purdue   17     125     125   1     500       In. - Chicago - Purdue, Indiana
                15      15                    MB/s
ORNL     25     361     120   3     1200      In. - Chi. - Nashv., Tennessee - ORNL
                5.3     16                    MB/s
TACC     37     616     120   5     2000      In. - Chi. - Houston, Texas - TACC
                3.1     16                    MB/s
SDSC     65     750     475   1.6   3300      In. - Chi. - LA, California - SDSC
                2.5     4.0                   MB/s
CSTNET   274    3722*   304   12    12000     In. - Internet2 - Korea - Beijing, China
                0.5     6.3                   MB/s; * est. from partial TCP result
--------------------------------------------------------------------------------------
Each site shows transfer times (minutes) for TCP and UDT with the TCP/UDT ratio,
and on the line below the corresponding speeds in Megabytes/second.
NCSA, Purdue, ORNL, TACC, SDSC are Teragrid.org sites in USA.
Land/sea line distance is given in Km.
RTT is network distance as average round trip ping time in ms.
TCP and UDT transfers were run simultaneously from each site.
TCP buffer setting is -tcp-bs 500000
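
The speed and ratio columns follow from the transfer times by simple arithmetic, which can be rechecked with a short sketch like the one below. It assumes 1 GB = 1000 MB, which this page does not state, so results may differ from the table in the last digit. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: recompute the table's speed and ratio columns from the times.
# Assumes 1 GB = 1000 MB; the page does not say which convention it used.

# mb_per_sec GIGABYTES MINUTES -> average speed in MB/s, one decimal
mb_per_sec() {
    awk -v gb="$1" -v min="$2" 'BEGIN { printf "%.1f\n", gb * 1000 / (min * 60) }'
}

# speedup TCP_MINUTES UDT_MINUTES -> TCP/UDT time ratio, one decimal
speedup() {
    awk -v tcp="$1" -v udt="$2" 'BEGIN { printf "%.1f\n", tcp / udt }'
}

# ORNL row: 113 GB in 361 min over TCP and 120 min over UDT
mb_per_sec 113 361
mb_per_sec 113 120
speedup 361 120
```

The same two functions apply to any row of the table by substituting that row's TCP and UDT times.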

Resource use by client globus-url-copy was higher for UDT.
  UDT: 1.0% - 3.0% CPU; 40 MB Memory
  TCP: 0.1% - 0.6% CPU;  6 MB Memory

Script for Testing

Report on GridFTP qualities

UDT as an Alternative Transport Protocol for GridFTP
John Bresnahan, Michael Link, Rajkumar Kettimuthu, and Ian Foster
Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439

"We compare the performance of Iperf, scp, bbcp, GridFTP over TCP (both single and multiple streams), GridFTP over UDT, and raw UDT on four different networks: a wide-area network between Argonne National Laboratory (ANL) and the University of Auckland, New Zealand (NZ), with a round-trip time of 204 ms; a wide-area network between ANL and Los Angeles US (ISI), with a round-trip time of 60 ms; a wide-area network between the Ohio State University US (BMI) and JA site in Japan, which is a part of the Japan Gigabit Network II project, with a round-trip time of 193 ms; and a wide-area network between the JA site and Oak Ridge National Laboratory, Tennessee US (ORNL), with a round-trip time of 194 ms. To the best of our knowledge, all the pairs of the sites used in the experiments have 1 Gbit/s (maximum possible bandwidth)."

Table 1: Throughput (in Mbit/s) achieved when transferring 1 GB of data over two wide-area networks, using various mechanisms.

Mechanism     ANL/NZ  ANL/ISI  BMI/JA  JA/ORNL
----------------------------------------------
scp                2        9       3        3
bbcp              --       35       5      112
Iperf             19       74      59      110
GridFTP.TCP       16       59      73      113
GridFTP.UDT      187      418     220      380
UDT              174      398     211      374
----------------------------------------------
# for data on disk, 1 transport stream (see paper)

"In these experiments, 1 GB of data was transferred between the end points. Table 1 shows the throughput achieved in megabit per second. We noted that the performance of GridFTP over TCP is comparable to the performance of iperf and is significantly better than scp and bbcp. GridFTP over UDT outperforms the best possible throughput obtained with TCP by a factor of 4 on two testbeds (ANL-NZ and ANL-ISI). GridFTP over UDT outperforms GridFTP over TCP (single stream) by a factor of 3 on the other two testbeds (BMI-Japan and Japan-ORNL)."
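
As a cross-check on the quoted factors, the single-stream GridFTP.UDT to GridFTP.TCP ratios can be computed directly from the Table 1 rows. This is a sketch with the Mbit/s values copied from the table; the function name table1_ratios and the three-column input layout are my own.

```shell
#!/bin/sh
# Sketch: single-stream GridFTP UDT/TCP throughput ratios from Table 1.
# Input columns: network, GridFTP.TCP Mbit/s, GridFTP.UDT Mbit/s.
table1_ratios() {
    awk '{ printf "%s %.1f\n", $1, $3 / $2 }' <<'EOF'
ANL/NZ 16 187
ANL/ISI 59 418
BMI/JA 73 220
JA/ORNL 113 380
EOF
}

table1_ratios
```

The BMI/JA and JA/ORNL ratios come out near the quoted factor of 3. The ANL/NZ and ANL/ISI ratios are larger than the quoted factor of 4, presumably because that figure compares against the best TCP result rather than the single-stream row shown in this table.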