dvd::rip Cluster node CD

This is version 0.1 alpha of a bootable CD containing a completely self-contained mini Linux system that can be used together with dvd::rip, a Perl/Gtk+ front end to transcode, the Linux Video Stream Processing tool.

The idea for this came from the dvd::rip mailing list. Both transcode and dvd::rip can work in cluster mode, meaning that several computers (nodes) work together to shorten the time it takes to convert from one format to another. Please read the dvd::rip web pages for more information. It is excellent.

Using this CD you can turn any system into a cluster node without installing anything. Let's put those Windows systems to work :)

News

05-03-02
A new version of the cluster node CD :). This time the ramdisk contains everything and the CD is not mounted. This means you only need one CD to boot multiple systems: just boot a system, remove the CD and put it in the next system.
The other changes are:
  • Only one kernel, which includes the PCMCIA code
  • Everything not needed has been removed from the kernel. Even the IDE drivers are gone. This should solve most, if not all, of the hardware problems.
  • The latest transcode (v0.6.0pre4-20020301) is included.
  • The ssh host key is no longer generated each time the CD is started. I changed this after Joern Reder told me it wasn't needed (thanks, Joern).
  • The image is no longer compressed. There was too little difference between the compressed and the uncompressed image, because the ramdisk and the kernel inside the ISO image are already compressed.
Not resolved yet :(
  • Compatibility between OpenSSH and SSH. OpenSSH uses a different key algorithm and I still need to experiment with this.
  • Booting old hardware
I was asked why I included the transcode binaries on the CD instead of NFS-mounting them from the server, where transcode is also available. The answer is quite simple: running everything from the CD ensures compatibility with the libraries (it's called running in a sandbox).
20-02-02
Thomas just released a new version of the transcode software. I've made a new image and posted it to the website.

According to the logs several people have already downloaded the image and many more people have already looked at the page. Good :).

Some remarks based on mails I received
1 NFS
Please try to use the same user ID for the nodes. The dvdrip user on the cluster CD has UID 100. If this is not appropriate, it can be changed as follows: after starting up the CD, edit /etc/passwd and change the first 100 on the dvdrip line into the UID you want, then go to /initrd and run "chown dvdrip dvdrip dvdrip/.ssh2". This sets the correct ownership on the home directory and the .ssh2 directory.
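The UID change can also be scripted instead of done by hand in an editor. This is a minimal sketch that assumes the usual name:password:UID:GID:... layout of /etc/passwd. set_dvdrip_uid is a hypothetical helper, not a command on the CD. It touches only the UID field of the dvdrip line, and it writes through a temporary file because older sed versions have no in-place option.

```shell
#!/bin/sh
# Sketch: set the UID of the dvdrip account in a passwd-format file.
# set_dvdrip_uid is a hypothetical helper name, not part of the CD.
set_dvdrip_uid() {
    passwd_file=$1
    new_uid=$2
    # Refuse anything that is not a plain number.
    case "$new_uid" in
        ''|*[!0-9]*) echo "usage: set_dvdrip_uid FILE UID" >&2; return 1 ;;
    esac
    tmp_file="$passwd_file.new"
    # Replace field 3 (the UID) of the line starting with "dvdrip:";
    # all other lines, and the GID field, are left unchanged.
    sed "s/^\(dvdrip:[^:]*:\)[0-9][0-9]*:/\1$new_uid:/" "$passwd_file" > "$tmp_file" || return 1
    # Copy back with cat so the original file keeps its permissions.
    cat "$tmp_file" > "$passwd_file"
    rm -f "$tmp_file"
}
```

For example, set_dvdrip_uid /etc/passwd 500, followed by the chown command above run in /initrd.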

2 Hardware.
Although two of the kernels are generic 386 kernels, it is very doubtful the image will start on older hardware, because of several settings in the kernel. Even if the kernel starts, I doubt it will do you much good as a node.
Network cards: I only have two types of network card here (a 3Com CardBus card and Realtek 8139-based cards), so I cannot check every card there is. If you run into trouble you cannot solve, mail me as much detail as possible, preferably with the exact make of the NIC. I will then make an image with that card compiled into the kernel.
The current kernels access the CD-ROM as an ide-scsi device. This is done so that the device name is always the same. If I accessed the CD-ROM as an ATAPI device, I would have to write a script that probes which ATAPI device it is.
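To find the exact make of the NIC for such a mail, you can pull the card's entry out of /proc/pci. This sketch assumes the 2.4-kernel /proc/pci layout, in which each device has a line starting with its class name (for example "Ethernet controller:"). find_nics is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print the Ethernet controller entries from a 2.4-style /proc/pci.
# find_nics is a hypothetical helper; the file argument defaults to /proc/pci.
find_nics() {
    pci_file=${1:-/proc/pci}
    # Keep only the class lines for network cards and strip the indentation.
    grep -i 'Ethernet controller' "$pci_file" | sed 's/^[[:space:]]*//'
}
```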

3 ssh
I have had two reports that sshd did not work: ssh just gave a closed connection error. If you can ping your node, proving that the network card and network layer are working, please kill the sshd2 on the cluster node and start it as "sshd2 -v". This keeps the sshd daemon in the foreground and makes it print debugging information, which should help you solve the problem. If you cannot sort it out that way, mail the EXACT errors to me and I will see if I can reproduce them here.

Using the Cluster node CD

Getting the image
The image can be downloaded here (use Shift + left mouse button to download). It is an ISO image. After downloading, the image needs to be burned to a CD. I use cdrecord on my Linux system like this:
cdrecord -v speed=2 dev=0,1,0 cluster-node.iso
Yes, I have a very old burner :). Currently you need one CD-ROM per cluster node. This will change.

Starting up the image
Make sure the BIOS in your system is set to boot from CD-ROM first. Otherwise it will just boot the OS that is on the hard disk.
There are currently two kernels on the CD. The first one, DVD-Ripper, is an Athlon-optimized kernel. The second one is a generic 386 kernel. I have yet to test whether using an optimized kernel makes a difference.
After the kernel has booted, you will be prompted to "Press enter to activate this console". Please do so.

Assumptions
Before going into the configuration, here are the assumptions I work with throughout this document:
  • the cluster controller has IP address 192.168.1.1 and is called master
  • node 1 has IP address 192.168.1.100 and the master knows it as node1
  • and so on, for as many nodes as you have
  • the hosts file of the master (or your own private DNS server) knows the names and IP addresses of the nodes
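You can generate the master's hosts-file entries for the last assumption instead of typing them. This minimal sketch adds one assumption not stated above: that "and so on" means node2 gets 192.168.1.101, node3 gets 192.168.1.102, and so forth. make_hosts is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print hosts-file lines for the master and COUNT nodes.
# ASSUMPTION: node N gets 192.168.1.(99+N), i.e. node2 = .101, node3 = .102.
# make_hosts is a hypothetical helper name.
make_hosts() {
    count=$1
    case "$count" in
        ''|*[!0-9]*) return 1 ;;
    esac
    # 99 + 155 = 254, the last usable address in 192.168.1.0/24.
    [ "$count" -le 155 ] || return 1
    echo "192.168.1.1 master"
    i=1
    while [ "$i" -le "$count" ]; do
        echo "192.168.1.$((99 + i)) node$i"
        i=$((i + 1))
    done
}
```

On the master, something like make_hosts 3 >> /etc/hosts would add the master and three nodes. Check the file for existing entries first.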
Configuring the image
There are some steps that must be taken before the node can be put to use:
  1. Configuring your network card
    Most current network card drivers are on the CD in the directory /lib/modules. If you know which card you have, you can just do a modprobe <carddriver>.
    If you do not know what kind of card is in the system, check the file /proc/pci. If you have a PCI-based Ethernet card, you will find information there.
    If you still cannot find the Ethernet card info, boot the system into the OS it normally runs and check it there.

  2. Assigning an IP address to your card
    Once the driver is loaded, you must assign a unique IP address to the card. Most home networks use the 192.168.1.xxx network.
    ifconfig eth0 192.168.1.100
    This assigns 192.168.1.100 to the network card. Setting a route is not necessary.

  3. Setting up the hosts file
    The node must at least know the name of the master. So use vi /etc/hosts and add the IP address and the name of the master to it like this:
    192.168.1.1 master
    The vi on the CD cannot do :x, so you have to do :w<cr> followed by :q<cr>.

  4. Setting up access to the node
    The node controller (master) needs to be able to run commands on the node without having to enter a password. The easiest way is to use a public key that does not have a passphrase. This is insecure!!! Luckily, with ssh version 3.1.0 there is a way around this: you can have several identity keys. I created a second key without a passphrase and added it to my identification file on the master, so the file now looks like this:
    IdKey id_dsa_1024_a
    IdKey id_dsa_1024_b
    The second key (id_dsa_1024_b) is the one without a passphrase. By using this key only for access to the nodes, you keep the security hazard confined to the nodes.

    There is a user called dvdrip on the CD-ROM. It has an empty password and a writable home directory (see the Techno talk section). Copy the public key from the master to the .ssh2 directory (/initrd/dvdrip/.ssh2) of the user dvdrip:
    master$ scp id_dsa_1024_b.pub dvdrip@node1:/initrd/dvdrip/.ssh2
    dvdrip@node1's password:
    id_dsa_1024_b.pub | 733B | 0.7 kB/s | TOC: 00:00:01 | 100%
    When asked for a password, just press <cr>. If this does not work, then something is wrong in the network or sshd is not running. Check all the previous steps to see if there are any errors.
    After this you must create an authorization file in the .ssh2 directory of the user dvdrip on node1:
    master$ ssh dvdrip@node1
    dvdrip's password:
    Authentication successful.


    BusyBox v0.60.2 (2002.02.17-10:34+0000) Built-in shell (msh)
    Enter 'help' for a list of built-in commands.

    $ cd .ssh2
    $ echo key id_dsa_1024_b.pub >authorization
    $ exit
    Connection to node1 closed.
    master$
    Again, when prompted for the password, just press <cr>. As you can see, I used ssh to log on and create the file. Now you should be able to log on without a password. Let's start up transcode :)
    master$ ssh dvdrip@node1 transcode
    Authentication successful.
    transcode v0.6.0pre3-20020214 (C) 2001-2002 Thomas Östreich
    'transcode -h' shows a list of available command line options.
    master$
    You are now ready to start cluster ripping!
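Steps 1 to 4 have to be repeated on every node, so a small script can help. This sketch is hypothetical: none of these function names exist on the CD. It assumes the example values used above: interface eth0, the master at 192.168.1.1, and the key id_dsa_1024_b.pub already copied into /initrd/dvdrip/.ssh2. It is written for a POSIX shell and has not been checked against the BusyBox applets on the CD.

```shell
#!/bin/sh
# Hypothetical node-side helpers for configuration steps 1 to 4.
# Defaults follow the example values used in this document.

# Steps 1 and 2: load the NIC driver and assign the node's address,
# e.g. configure_nic 8139too 192.168.1.100 (interface eth0 is assumed).
configure_nic() {
    modprobe "$1" && ifconfig eth0 "$2"
}

# Step 3: append the master entry to the hosts file unless a line for
# "master" is already there (this sidesteps the vi :x limitation).
add_master_host() {
    hosts_file=${1:-/etc/hosts}
    master_ip=${2:-192.168.1.1}
    if ! grep -Eq '^[^#]*[[:space:]]master([[:space:]]|$)' "$hosts_file" 2>/dev/null; then
        echo "$master_ip master" >> "$hosts_file"
    fi
}

# Step 4: write the authorization file naming the passphrase-less key,
# the same content as the echo command in the transcript above.
write_authorization() {
    ssh2_dir=${1:-/initrd/dvdrip/.ssh2}
    pub_key=${2:-id_dsa_1024_b.pub}
    # The public key has to be copied over with scp first.
    [ -f "$ssh2_dir/$pub_key" ] || return 1
    echo "key $pub_key" > "$ssh2_dir/authorization"
}
```

On node1 this could be used as configure_nic 8139too 192.168.1.100, then add_master_host, and, after the scp from the master, write_authorization.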

Techno talk

Build-up of the disks
The CD is basically built up in two parts. During startup a 25M ramdisk is created and filled with a 25M ext2 filesystem that is almost completely empty. This filesystem contains the files needed to start Linux. After Linux has started, the CD-ROM is mounted and a pivot_root command makes the CD-ROM the root filesystem.
After this, the ramdisk is available as /initrd. This is very important, because several links in the CD-ROM root filesystem point to directories on the ramdisk.
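The switch from the ramdisk to the CD-ROM can be illustrated with a linuxrc-style script. This is a sketch, not the actual startup script on the CD. The devfs device name /dev/cdroms/cdrom0 and the mount point /mnt are assumptions, and the final exec chroot follows the pattern in the kernel's initrd documentation. Setting DRY_RUN=1 only prints the commands, which makes the sequence easy to inspect.

```shell
#!/bin/sh
# Sketch of an initrd linuxrc that moves the root from the ramdisk to the CD.
# ASSUMPTIONS: devfs name /dev/cdroms/cdrom0 and mount point /mnt are
# illustrative; DRY_RUN=1 prints the commands instead of running them.
CDROM_DEV=${CDROM_DEV:-/dev/cdroms/cdrom0}
NEW_ROOT=${NEW_ROOT:-/mnt}

# Print the command in dry-run mode, otherwise execute it in this shell.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

switch_to_cd() {
    run mount -t iso9660 -o ro "$CDROM_DEV" "$NEW_ROOT" || return 1
    run cd "$NEW_ROOT" || return 1
    # The ramdisk (old root) is moved onto the CD's empty /initrd directory.
    run pivot_root . initrd || return 1
    if [ -n "$DRY_RUN" ]; then
        echo "exec chroot . /sbin/init"
    else
        # Replace this script with the CD's init, using the CD's console.
        exec chroot . /sbin/init <dev/console >dev/console 2>&1
    fi
}

# Only switch when run as linuxrc, so the functions can be sourced safely.
if [ "${0##*/}" = "linuxrc" ]; then
    switch_to_cd
fi
```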

Here is a directory tree structure of the ramdisk filesystem:
.
|-- bin
|-- dev
|-- dvdrip
|-- etc
|   `-- init.d
|-- initrd
|-- lib
|-- lost+found
|-- mnt
|-- opt
|   `-- ssh
|       `-- etc
|           |-- hostkeys
|           `-- knownhosts
|-- proc
|-- root
|-- sbin
|-- tmp
`-- usr
    |-- bin
    `-- sbin
As you can see, the home directories of root and dvdrip, as well as /tmp and the /opt/ssh/etc directory (where the host key is stored), are on the ramdisk. This way we can write to them without any problem.
There are really only two programs on the ramdisk: BusyBox and TinyLogin. TinyLogin is not used yet, but it might be in the future.

The directory /etc contains the following files:
.
|-- group
|-- hosts
|-- init.d
|   `-- rcS
|-- passwd
`-- passwd-
These are the files most likely to change while the node is running.

The CD-ROM is built up as follows:
.
|-- bin
|-- dev
|-- etc
|-- initrd
|-- isolinux
|-- lib
|   `-- modules
|-- mnt
|   `-- ramdisk
|-- opt
|   |-- ssh
|   |   |-- bin
|   |   `-- sbin
|   `-- transcode
|       |-- bin
|       `-- lib
|           `-- transcode
|-- sbin
`-- usr
    |-- bin
    `-- sbin
As you can see, transcode is in a separate directory, /opt/transcode. This is on purpose: every time a new release comes out, I can easily recompile it on my system and copy just that tree over. All the programs in /opt/transcode/bin are symlinked into /usr/bin, which makes them accessible through ssh.
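The symlinking step can be done with a short loop. This sketch assumes the layout shown above. link_programs is a hypothetical helper. It creates absolute links, so run it where /opt/transcode/bin has its final path, for example on the running system or inside a chroot of the CD tree.

```shell
#!/bin/sh
# Sketch: symlink every program in a source bin directory into a target bin
# directory. link_programs is a hypothetical helper name; the defaults mirror
# the layout described above (/opt/transcode/bin -> /usr/bin).
link_programs() {
    src_dir=${1:-/opt/transcode/bin}
    dest_dir=${2:-/usr/bin}
    for prog in "$src_dir"/*; do
        # Skip the unexpanded glob when src_dir is empty.
        [ -e "$prog" ] || continue
        # -f replaces an older link left over from a previous release.
        ln -sf "$prog" "$dest_dir/$(basename "$prog")"
    done
}
```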

The kernel I used is 2.4.17 with devfs compiled in. It made life a lot easier.
Just about all the information needed to do this can be found on the net or in the documentation that comes with the Linux kernel sources.

Links

BusyBox and TinyLogin
Syslinux and companions
dvd::rip
Linux Video Stream Processing tool
The kernel page
The ssh website
fping: it was not in my distribution.
VMware: this is not meant as an ad, but making the CD would not have been as simple without it :)

Bugs

Probably many. Mail me about them.
