Caterpillar Crawling: November 2012

Friday, November 23, 2012

Run VM instance on Eucalyptus, Xen

(on the front-end machine)

1. Check the image

 $ euca-describe-images

Note down the image ID: emi-xxxxxxx

2. Create a key pair

$ euca-add-keypair user01-keypair > user01-keypair.private  
$ chmod 0600 user01-keypair.private  
$ euca-describe-keypairs

 KEYPAIR     user01-keypair     03:d8:30:69:36:41:62:f8:6b:60:8d:17:19:c1:29:16:5b:6e:79:a9

3. Authorize security groups

Allow SSH (TCP port 22) from any source address:

 euca-authorize -P tcp -p 22 -s 0.0.0.0/0 default

4. Launch an instance

euca-run-instances emi-48AE3FD9 -k user01-keypair

If the instance launches, output like the following is shown:

 RESERVATION     r-B5A143D7     292622667431     default  
 INSTANCE     i-28C442AB     emi-48AE3FD9     0.0.0.0     0.0.0.0     pending     user01-keypair     0          m1.small     2012-11-19T15:28:09.818Z     cluster01     eki-855C3923     eri-05BE397E          monitoring-disabled     0.0.0.0     0.0.0.0               instance-store

5. Get instance status & log into the instance

euca-describe-instances

ssh -i user01-keypair.private root@<instance_ip>
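
Step 5 can be partly scripted. The sketch below assumes the INSTANCE line keeps the column layout shown in step 4 (instance ID in column 2, public IP in column 4, state in column 6). `instance_info` is a hypothetical helper name, not part of euca2ools, and it only parses text read from stdin.

```shell
#!/bin/sh
# Hypothetical helper: read euca-describe-instances output on stdin and
# print "<instance-id> <state> <public-ip>" for every INSTANCE line.
# Assumes the column order shown in step 4 above.
instance_info() {
    awk '$1 == "INSTANCE" { print $2, $6, $4 }'
}

# Example using the first fields of the INSTANCE line from step 4:
printf 'INSTANCE\ti-28C442AB\temi-48AE3FD9\t0.0.0.0\t0.0.0.0\tpending\tuser01-keypair\n' | instance_info
```

Against a live cloud this would be used as `euca-describe-instances | instance_info`. Once the state is running and the IP is no longer 0.0.0.0, the ssh command above should work.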

Troubleshooting:

1) error: RunInstancesType: Not enough resources (0 in cluster01 < 1): vm instances.

Run the following on the front-end machine, the NC, and the CC:

 euca-describe-availability-zones verbose  
 ip addr show  
 route -n

Run the following on the cluster machine:

 euca-describe-services -E

2) Errors in nc.log (still searching for a solution):

 [Mon Nov 19 15:36:52 2012][002893][EUCAINFO ] [i-CA574225] tuning root file system on disk 0 partition 1  
 [Mon Nov 19 15:36:52 2012][002893][EUCAERROR ] {2798909184} error: bad return code from cmd '//usr/lib/eucalyptus/euca_rootwrap /sbin/tune2fs /dev/mapper/euca-W9JINGX8VP5TYHDLPJPZM-i-CA574225-emi-48AE3$  
 [Mon Nov 19 15:36:52 2012][002893][EUCADEBUG ] /sbin/tune2fs: Bad magic number in super-block while trying to open /dev/mapper/euca-W9JINGX8VP5TYHDLPJPZM-i-CA574225-emi-48AE3FD9-6d4be1e1  
 Couldn't find valid filesystem superblock.  
 tune2fs 1.42 (29-Nov-2011)  
 [Mon Nov 19 15:36:52 2012][002893][EUCAINFO ] {2798909184} error: cannot tune file system on '/dev/mapper/euca-W9JINGX8VP5TYHDLPJPZM-i-CA574225-emi-48AE3FD9-6d4be1e1'  
 [Mon Nov 19 15:36:52 2012][002893][EUCAERROR ] [i-CA574225] error: failed to tune root file system: blobstore.c:3196 file access only supported for uncloned blockblobs  
 [Mon Nov 19 15:36:52 2012][002893][EUCAERROR ] [i-CA574225] error: failed to create artifact emi-48AE3FD9-6d4be1e1 (error=1, may retry) on try 1  
 [Mon Nov 19 15:36:52 2012][002893][EUCADEBUG ] {2798909184} detaching from loop device /dev/loop5  
 [Mon Nov 19 15:36:52 2012][002893][EUCADEBUG ] {2798909184} detaching from loop device /dev/loop4  
 [Mon Nov 19 15:36:52 2012][002893][EUCADEBUG ] [i-CA574225] error: failed to implement artifact 019|emi-48AE3FD9-6d4be1e1 on try 1  
 [Mon Nov 19 15:36:52 2012][002893][EUCAERROR ] [i-CA574225] error: failed to provision dependency emi-48AE3FD9-6d4be1e1 for artifact dsk-48AE3FD9-09c9bcae (error=1) on try 1  
 [Mon Nov 19 15:36:52 2012][002893][EUCADEBUG ] [i-CA574225] error: failed to implement artifact 024|dsk-48AE3FD9-09c9bcae on try 1  
 [Mon Nov 19 15:36:52 2012][002893][EUCAERROR ] [i-CA574225] error: failed to provision dependency dsk-48AE3FD9-09c9bcae for artifact i-CA574225 (error=1) on try 1  
 [Mon Nov 19 15:36:52 2012][002893][EUCADEBUG ] [i-CA574225] error: failed to implement artifact 013|i-CA574225 on try 1  
 [Mon Nov 19 15:36:52 2012][002893][EUCAERROR ] [i-CA574225] error: failed to implement backing for instance  
 [Mon Nov 19 15:36:53 2012][002893][EUCAERROR ] [i-CA574225] error: failed to prepare images for instance (error=1)  
 [Mon Nov 19 15:36:53 2012][002893][EUCADEBUG ] [i-CA574225] state change for instance: Staging -> Shutoff (Pending)  
 [Mon Nov 19 15:36:53 2012][002893][EUCAINFO ] [i-CA574225] cleaning up state for instance
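
To cut a log like this down to the lines that matter, the error entries for one instance can be filtered out. A minimal sketch, assuming the line format shown above; `nc_errors` is a hypothetical helper name, and it reads the log text from stdin.

```shell
#!/bin/sh
# Hypothetical helper: keep only EUCAERROR lines that mention the
# instance id given as $1. Log text is read from stdin.
nc_errors() {
    grep -F 'EUCAERROR' | grep -F -- "$1"
}

# Example with two lines taken from the log above:
printf '%s\n' \
  '[Mon Nov 19 15:36:52 2012][002893][EUCAINFO ] [i-CA574225] tuning root file system on disk 0 partition 1' \
  '[Mon Nov 19 15:36:52 2012][002893][EUCAERROR ] [i-CA574225] error: failed to implement backing for instance' \
  | nc_errors i-CA574225
```

On the node controller this would be fed from the log file, for example `nc_errors i-CA574225 < /var/log/eucalyptus/nc.log` (the log path is an assumption; adjust it to the local install).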

3) If nc.log shows no output, or cc.log contains the error:
ERROR: DescribeResource() could not be invoked (check NC host, port, and credentials)

- Check whether the NC service is running; if not, run:

 sudo service eucalyptus-nc start

- Check whether xend is running:

 sudo xm list

If not, run:

 sudo service xend start

- Check which hypervisor libvirt connects to. Here it reports QEMU instead of Xen:

$ virsh version
Compiled against library: libvir 0.9.8
Using library: libvir 0.9.8
Using API: QEMU 0.9.8
Running hypervisor: QEMU 1.0.0

Solution: make xen:/// the default libvirt connection URI, then reboot:

$ sudo -s

# echo "export VIRSH_DEFAULT_CONNECT_URI=xen:///" >> /etc/profile.d/libvirtd.sh

# chmod +x /etc/profile.d/libvirtd.sh

# reboot

Then modify /etc/xen/xend-config.sxp so that it contains:

(xend-http-server yes)
(xend-unix-server yes)

(xend-unix-path /var/lib/xend/xend-socket)

$ sudo service xend restart
$ sudo service eucalyptus-nc restart
$ virsh version


2012-11-22 14:08:10.678+0000: 3051: info : libvirt version: 0.9.8
2012-11-22 14:08:10.678+0000: 3051: warning : xenHypervisorMakeCapabilities:2751 : Failed to get host power management capabilities
Compiled against library: libvir 0.9.8
Using library: libvir 0.9.8
Using API: Xen 0.9.8
Running hypervisor: Xen 4.1.0
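
The before/after check can be automated by looking at the "Running hypervisor" line. A small sketch; `libvirt_uses_xen` is a hypothetical helper name, and it reads `virsh version` output from stdin.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if `virsh version` output (on stdin)
# reports Xen as the running hypervisor.
libvirt_uses_xen() {
    grep -q '^Running hypervisor: Xen'
}

# Example with the two outputs shown above:
printf 'Using API: QEMU 0.9.8\nRunning hypervisor: QEMU 1.0.0\n' | libvirt_uses_xen || echo "libvirt is still on QEMU"
printf 'Using API: Xen 0.9.8\nRunning hypervisor: Xen 4.1.0\n' | libvirt_uses_xen && echo "libvirt is on Xen"
```

On the node controller: `virsh version | libvirt_uses_xen || echo "fix VIRSH_DEFAULT_CONNECT_URI"`.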

5) Instance is running but its IP is 0.0.0.0

Check the instance's console using:

 sudo xm console <instance-id>

Look for the network configuration and IP address.

6) RunInstancesType: Failed to lookup kernel image information unknown because of: Attempt to resolve a kerneId for BootableSet:machine=arn:aws:euca:eucalyptus:388002304024:image/emi-3F8B38A3/:ramdisk=false:kernel=false:isLinux=true during request RunInstancesType:2cc2fab3-6e36-45f9-986a-33dcb278399e:return=true:epoch=null:status=null

The kernel and ramdisk are not registered. Find them among the downloaded image files for the corresponding hypervisor, then bundle, upload, and register each of them (three commands apiece). Run the instance again.
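
The three commands per file could look like the sketch below, which only prints the commands instead of running them. As far as I know, `--kernel true` and `--ramdisk true` are how euca-bundle-image marks a bundle as a kernel or ramdisk; check this against the installed euca2ools version. The file names, bucket names, and the helper `print_register_cmds` are placeholders, and the manifest path assumes euca-bundle-image writes to /tmp, as it does for rootfs.img in the "Create Ubuntu Xen image" post.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print (not run) the bundle/upload/register
# commands for a kernel or ramdisk file.
# $1 = file, $2 = "kernel" or "ramdisk", $3 = bucket name (all placeholders)
print_register_cmds() {
    name=$(basename "$1")
    echo "euca-bundle-image -i $1 --$2 true"
    echo "euca-upload-bundle -b $3 -m /tmp/$name.manifest.xml"
    echo "euca-register $3/$name.manifest.xml"
}

print_register_cmds vmlinuz-xen kernel ubuntu-kernel
print_register_cmds initrd-xen ramdisk ubuntu-ramdisk
```

euca-register should print an eki- ID for the kernel and an eri- ID for the ramdisk. The root image then probably needs to be bundled again with those IDs (euca-bundle-image takes `--kernel eki-...` and `--ramdisk eri-...`) before it is uploaded and registered.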

Monday, November 19, 2012

Local JARs for Maven Dependency

1. Download the jar into /mavenlocal

2. Run

mvn install:install-file -DgroupId=classifier4j -DartifactId=Classifier4J -Dversion=0.6 -Dpackaging=jar -DcreateChecksum=true -Dfile=/home/xzhao/mavenlocal/Classifier4J-0.6.jar

3. Add the dependency to pom.xml

<dependency>

<groupId>classifier4j</groupId>

<artifactId>Classifier4J</artifactId>

<version>0.6</version>

</dependency>

The following method did not work well:

1. Download jar file into /lib

2. Use Maven to install to project repo

 mvn install:install-file -DlocalRepositoryPath=repo -DcreateChecksum=true -Dpackaging=jar -Dfile=[your-jar] -DgroupId=[...] -DartifactId=[...] -Dversion=[...]

The Maven repository is created in /lib/repo.

3. Add repository in pom.xml

 <repository>  
   <id>repo</id>  
   <url>file://${project.basedir}/repo</url>  
 </repository>

4. Add dependencies

sources:

http://stackoverflow.com/questions/364114/can-i-add-jars-to-maven-2-build-classpath-without-installing-them
http://blog.dub.podval.org/2010/01/maven-in-project-repository.html

Friday, November 16, 2012

Create Ubuntu Xen image for Eucalyptus

(perform steps 1-6 on the node controller; step 7 on the client machine)

1. Create a folder and download the ISO file.

mkdir ubuntu-xen-manual
cd ubuntu-xen-manual
wget ***.iso

2. Create a 4GB virtual disk

sudo dd if=/dev/zero of=ubuntu-12.04D.img bs=1M count=4096

3. Create a Xen configuration file (xen.cfg) with the following:

 name = "ubuntubox"  
 #make sure kernel is in right place  
 kernel = "/usr/lib/xen-default/boot/hvmloader"  
 memory = 1024  
 builder = "hvm"  
 #make sure device_model is in right place  
 device_model = "/usr/lib/xen-default/bin/qemu-dm"  
 boot = "d"  
 disk = ['file:~/ubuntu-xen-manual/***.iso,hdc:cdrom,r',  
 'file:~/ubuntu-xen-manual/ubuntu-12.04D.img,hda,w']  
 vif = ['']  
 #dhcp="on"  
 vnc = 1  
 vncdisplay = 7  
 pae = 1

4. Start domU

sudo xm create xen.cfg

5. Connect with a VNC viewer (if desktop version)

sudo apt-get install xvnc4viewer

xvncviewer localhost:7

6. Find out the starting block and the block size of the root file system.

sudo parted ubuntu-12.04D.img

(parted) U
Unit? [compact]? b

(parted) p

You'll see an "unrecognized disk label" message because it is a new drive.

(parted) mklabel msdos

(parted) print free

 Number  Start   End          Size         Type  File system  Flags
        16384B  4294967295B  4294950912B        Free Space

(parted) q

sudo dd if=ubuntu-12.04D.img of=rootfs.img bs=1 skip=16384 count=4294950912

(*the extraction takes quite a while with bs=1; for testing, use a smaller image size such as 512 MB*)

The root image is created as rootfs.img.
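
The skip and count values in the dd command come straight from the parted output: skip is the start byte, and count is end - start + 1. Both happen to be multiples of 16384 here, so the same region can be copied with a larger block size, which should be much faster than bs=1. A sketch of the arithmetic (the variable names are illustrative, and the dd command is only printed, not run):

```shell
#!/bin/sh
# Values from the "print free" output above, in bytes.
start=16384
end=4294967295

# Size of the region, counting both the first and the last byte.
count=$((end - start + 1))
echo "bytes: $count"

# Use the start offset as the block size when it divides the size evenly.
bs=$start
if [ $((count % bs)) -eq 0 ]; then
    echo "dd if=ubuntu-12.04D.img of=rootfs.img bs=$bs skip=1 count=$((count / bs))"
fi
```

With the numbers above this gives 4294950912 bytes, or 262143 blocks of 16384 bytes after skipping the first block.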

7. Bundle, upload and register root image to Eucalyptus

Copy rootfs.img to the client machine. Open a terminal on the client machine:

sudo scp server01@10.1.62.172:~/ubuntu-xen-manual/rootfs.img ~/

cd ~/.euca

source eucarc

euca-bundle-image -i ~/rootfs.img

(run this from ~/.euca; otherwise an "EC2_CERT not found" error occurs)

 Checking image  
 Compressing image  
 Encrypting image  
 Splitting image...  
 Part: rootfs.img.part.00  
 Generating manifest /tmp/rootfs.img.manifest.xml

euca-upload-bundle -b ubuntu -m /tmp/rootfs.img.manifest.xml

 Checking bucket: ubuntu  
 Creating bucket: ubuntu  
 Uploading manifest file  
 Uploading part: rootfs.img.part.00  
 Uploaded image as ubuntu/rootfs.img.manifest.xml

euca-register ubuntu/rootfs.img.manifest.xml

 IMAGE     emi-48AE3FD9

euca-describe-images

 IMAGE     emi-48AE3FD9     ubuntu/rootfs.img.manifest.xml     292622667431     available     public          i386     machine     eki-855C3923     eri-05BE397E          instance-store

Wednesday, November 7, 2012

Configure SIREn, Solr on Jetty

(Default settings: SIREn with Lucene 3.5, Solr 3.6 with default Jetty)

1. Create a lib folder under SOLR_HOME/example/solr and add the following jars from the SIREn target directories:

siren-core-0.2.3-SNAPSHOT.jar
siren-qparser-0.2.3-SNAPSHOT.jar
siren-solr-0.2.3-SNAPSHOT.jar

2. Modify solrconfig.xml and add the following:

 <!-- Example of Registration of the siren query parser. -->  
  <queryParser name="siren" class="org.sindice.siren.solr.SirenQParserPlugin"/>  
  <requestHandler name="siren" class="solr.StandardRequestHandler">  
   <!-- default values for query parameters -->  
    <lst name="defaults">  
     <str name="defType">siren</str>  
     <str name="echoParams">explicit</str>  
                 <!-- Disable field query in keyword parser -->  
     <str name="disableField">true</str>  
     <str name="qf">  
      ntriple^1.0 url^1.2  
     </str>  
     <str name="nqf">  
      ntriple^1.0  
     </str>  
     <!-- the NTriple query multi-field operator:
       - disjunction: the query should match in at least one of the fields
       - scattered: each NTriple pattern should match in at least one of the fields
     -->
     <str name="nqfo">scattered</str>  
     <str name="tqf">  
      tabular^1.0  
     </str>  
     <!-- the Tabular query multi-field operator:
       - disjunction: the query should match in at least one of the fields
       - scattered: each tabular pattern should match in at least one of the fields
     -->
     <str name="tqfo">scattered</str>  
     <str name="fl">  
      id  
     </str>  
    </lst>  
  </requestHandler>

3. Modify schema.xml: add the following, renaming the existing url and id fields if the file already defines them.

 <!-- The ID (URL) of the document   
         Use the 'string' field type (no tokenisation)  
      -->  
        <field name="id" type="string" indexed="true" stored="true" required="false"/>  
       <!-- The URL of the document   
         Use the 'text' field type in order to be tokenised  
      -->  
        <field name="url" type="uri" indexed="true" stored="true" required="true"/>  
 <!-- n-triple indexing scheme -->  
        <field name="ntriple" type="ntriple" indexed="true" stored="true" multiValued="false"/>  
     <!-- tabular indexing scheme -->  
     <field name="tabular" type="tabular" indexed="true" stored="false" multiValued="false"/>

 <!-- A uri field that uses WhitespaceTokenizer and WordDelimiterFilter to
      split URIs into multiple components. Stopwords are customized by
      external files.
      omitNorms is true since it is a short field, and norms do not
      really make sense for URIs.
      Does not use the ASCIIFoldingExpansionFilter since URIs should not
      contain accented characters.
   -->
   <fieldType name="uri" class="solr.TextField" omitNorms="true" positionIncrementGap="100">  
    <analyzer type="index">  
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>  
     <!-- Splits words into subwords based on delimiters  
        - split subwords based on case change  
        - preserveOriginal="1" in order to preserve the original word.  
        Removed split based on numerics to fix SND-355 and SND-1283   
     -->  
     <filter class="solr.WordDelimiterFilterFactory"   
         generateWordParts="1"   
         generateNumberParts="1"   
         catenateWords="0"   
         catenateNumbers="0"   
         catenateAll="0"   
         splitOnCaseChange="1"  
         splitOnNumerics="0"  
         preserveOriginal="1"/>  
     <!-- Filters out those tokens *not* having length min through max   
        inclusive. -->  
     <filter class="solr.LengthFilterFactory" min="2" max="256"/>  
     <!-- Change to lowercase text -->  
     <filter class="solr.LowerCaseFilterFactory"/>  
     <!-- Case insensitive stop word removal.  
      add enablePositionIncrements=true in both the index and query  
      analyzers to leave a 'gap' for more accurate phrase queries.  
     -->  
     <filter class="solr.StopFilterFactory"  
         ignoreCase="true"  
         words="stopwords.txt"  
         enablePositionIncrements="true"  
         />  
    </analyzer>  
    <analyzer type="query">  
     <!-- whitespace tokenizer to not tokenize URI -->  
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>  
     <!-- Filters out those tokens *not* having length min through max   
        inclusive. -->  
     <filter class="solr.LengthFilterFactory" min="2" max="256"/>  
     <filter class="solr.LowerCaseFilterFactory"/>  
     <filter class="solr.StopFilterFactory"  
         ignoreCase="true"  
         words="stopwords.txt"  
         enablePositionIncrements="true"  
         />  
     <!-- Replace Qnames by their name spaces in URIs. -->  
     <filter class="org.sindice.siren.solr.analysis.QNamesFilterFactory"   
         qnames="qnames.txt"/>  
    </analyzer>  
   </fieldType>  
 <!--  
            The SIREn field type:  
                The top-level analyzers must be defined in the top-level analyzer   
    configuration file (ntriple-analyzers.xml) and the datatype analyzers in   
    the datatype analyzer configuration file (ntriples-datatypes.xml).   
                Field norms are not useful for SIREn fields. Setting omitNorms to true reduces
                memory consumption and improves ranking.
    omitTermFreqAndPositions *must* be set to false.  
           -->  
   <fieldType name="ntriple" class="org.sindice.siren.solr.schema.SirenField"  
         omitNorms="true"   
         omitTermFreqAndPositions="false"  
         analyzerConfig="tuple-analyzers.xml"  
         datatypeConfig="tuple-datatypes.xml"/>  
   <fieldType name="tabular" class="org.sindice.siren.solr.schema.SirenField"  
         omitNorms="true"   
         omitTermFreqAndPositions="false"  
         analyzerConfig="tuple-analyzers.xml"  
         datatypeConfig="tuple-datatypes.xml"/>

 <similarity class="org.sindice.siren.similarity.SirenSimilarity"/>

4. Copy the following files from SIREN_HOME/siren-solr/example/solr/config to SOLR_HOME/example/solr/config

tuple-analyzers.xml
tuple-datatypes.xml
qnames.txt

5. Restart Solr's default Jetty by running java -jar start.jar from SOLR_HOME/example

6. Test with the sample code in SIREN_HOME/siren-solr/example/

The examples are indexed successfully, but the queries return no results.

P.S. SIREn doesn't support SPARQL.

sources:

https://github.com/rdelbru/SIREn/blob/master/siren-solr/example/INSTALL.txt

Install SIREn on Ubuntu 12.04

1. Check that the JDK and Maven are installed.

$ sudo apt-get install maven (*Maven 3 will be installed*)

2. Run Maven in the SIREn directory

$ mvn package

3. Check the jars

The jar files are located under target/ in each module folder.

P.S. If the following error occurs:

[ERROR] Failed to execute goal on project siren-core: Could not resolve dependencies for project org.sindice.siren:siren-core:jar:0.2.3-SNAPSHOT: Failure to find com.google.code.caliper:caliper:jar:1.0-SNAPSHOT in https://oss.sonatype.org/content/groups/public/ was cached in the local repository, resolution will not be reattempted until the update interval of oss-sonatype has elapsed or updates are forced -> [Help 1]

Modify pom.xml under siren-core

Change

 <dependency>  
    <groupId>com.google.code.caliper</groupId>  
    <artifactId>caliper</artifactId>  
    <version>1.0-SNAPSHOT</version>  
    <scope>test</scope>  
   </dependency>

to

 <dependency>
 <groupId>com.google.caliper</groupId>  
 <artifactId>caliper</artifactId>  
 <version>0.5-rc1</version>  
 <scope>test</scope>  
 </dependency>

Sources:

https://github.com/rdelbru/SIREn/wiki/Getting-Started