
HBase Region is Multiply Assigned to Region Servers

After running the hbase hbck command from the command line, the following errors say that an HBase region is multiply assigned to region servers, and the status is inconsistent.

Number of Tables: 1
Number of live region servers: 8
Number of dead region servers: 0

........Number of empty REGIONINFO_QUALIFIER rows in .META.: 0
14/03/18 11:56:05 DEBUG client.HConnectionManager$HConnectionImplementation: The connection to hconnection-0x344d0b06aaa0028 has been closed.
ERROR: Region .META.,,1.1028785192 is listed in META on region server hdptest2.test.local:60020 but is multiply assigned to region servers hdptest2.test.local:60020, hdptest2.test.local:60020
ERROR: Region -ROOT-,,0.70236052 is listed in META on region server hdptest3.test.local:60020 but is multiply assigned to region servers hdptest3.test.local:60020, hdptest3.test.local:60020
ERROR: Region ambarismoketest,,1395071340885.7c4d1eb0609daaa87fd3b7a2bb725b44. is listed in META on region server hdptest4.test.local:60020 but is multiply assigned to region servers hdptest4.test.local:60020, hdptest4.test.local:60020

Summary:
-ROOT- is okay.
Number of regions: 1
Deployed on: hdptest3.test.local:60020
.META. is okay.
Number of regions: 1
Deployed on: hdptest2.test.local:60020
ambarismoketest is okay.
Number of regions: 1
Deployed on: hdptest4.test.local:60020
3 inconsistencies detected.
Status: INCONSISTENT

In my case the above-mentioned errors were caused by the contents of the /etc/hosts file. To configure a Hadoop cluster properly, the hostnames of all nodes must be set correctly: running "hostname -f" should return the fully qualified domain name.

The following two lines should not be removed from the /etc/hosts file:
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6

Then add the server's fully qualified domain name in the following order (FQDN first, then the short hostname):
192.168.1.22  hdptest2.test.local  hdptest2
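
After fixing /etc/hosts on every node, a quick check is to confirm that hostname resolution returns the FQDN and then re-run hbck; it should report Status: OK once the regions are assigned consistently (the hostname below is from my cluster, yours will differ):

$ hostname -f
hdptest2.test.local
$ hbase hbck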


Oozie says jar does not exist

After submitting an Oozie shell action job, the Oozie logs say that shell-launcher.jar does not exist.

JA008: File /var/tmp/oozie/oozie-oozi450698329702305.dir/shell-launcher.jar does not exist.

On the Oozie server, shell-launcher.jar exists in a local folder:

# locate shell-launcher.jar
/usr/local/oozie-3.3.2/distro/target/oozie-3.3.2-distro/oozie-3.3.2/oozie-server/temp/oozie-root3458733450304809568234.dir/shell-launcher.jar

In order to resolve this error, just create the HDFS folders and give them the appropriate permissions:

# su -l hdfs -c "hadoop fs -mkdir /var"
# su -l hdfs -c "hadoop fs -mkdir /var/tmp"
# su -l hdfs -c "hadoop fs -mkdir /var/tmp/oozie"
# su -l hdfs -c "hadoop fs -chmod -R 755 /var"
# su -l hdfs -c "hadoop fs -chown oozie:hdfs /var/tmp/oozie"

What Apache Ambari Does

Apache Ambari is used for:

Provisioning a Hadoop Cluster

Managing a Hadoop Cluster

Monitoring a Hadoop Cluster

Integrating Hadoop with Other Applications (Using REST APIs) - see the sample REST call below
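
For example, listing the clusters managed by an Ambari server is a single REST call. This is a minimal sketch assuming the default admin/admin credentials and the default port 8080; replace <ambari-server> with your own host:

$ curl -u admin:admin http://<ambari-server>:8080/api/v1/clusters

The response is a JSON document describing the registered clusters.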

Chrome Custom URL Redirector Extension

Working as an IT professional, you sometimes need to access remote systems over a VPN connection. While the VPN connection is up, DNS records and hosts entries may be overwritten, and control panel URLs become inaccessible. To overcome this you can use a custom browser extension such as Greasemonkey, Tampermonkey, etc.

But I didn't want to use these third-party extensions, so I wrote a simple Chrome extension that translates URL addresses to IP addresses.

You need two files: manifest.json and background.js.

Examples of these files are below (add URLs and IP addresses to the urlDict variable as required!):

manifest.json:
{
  "name": "Custom URL Redirector",
  "version": "0.1",
  "description": "Checks URL and redirects as required.",
  "manifest_version": 2,
  "background": {
    "scripts": ["background.js"]
  },
  "permissions":["http://*/*","https://*/*","webRequest","webRequestBlocking","tabs"]
}

background.js:
chrome.webRequest.onBeforeRequest.addListener(
  function(details) {
    // Map of hostnames to the IP addresses they should be redirected to.
    var urlDict = {
      "abc.com" : "1.1.1.1",
      "def.net" : "2.2.2.2",
      "ghi.org" : "3.3.3.3"
    };
    for (var key in urlDict) {
      if (details.url.indexOf(key) !== -1) {
        // Replace every occurrence of the hostname with its IP address.
        var re = new RegExp(key, "g");
        var redirURL = details.url.replace(re, urlDict[key]);
        return {redirectUrl: redirURL};
      }
    }
  },
  {urls: ["<all_urls>"]},
  ["blocking"]);

Save these two files in the same folder and import them from Chrome's extensions page: turn on Developer mode, click "Load unpacked extension" and select the folder containing these two files.

Now you can use the extension; all requests made to abc.com will be redirected to 1.1.1.1.

How to create a Hive external table for Nutch's HBase webpage schema?

In order to query an HBase table using Hive, an external table should be created:

CREATE EXTERNAL TABLE webpage_hive (
  key string, baseUrl string, status int, prevFetchTime bigint, fetchTime bigint,
  fetchInterval bigint, retriesSinceFetch int, reprUrl string, content string,
  contentType string, protocolStatus string, modifiedTime bigint, prevModifiedTime bigint,
  batchId string, title string, text string, parseStatus int, signature string,
  prevSignature string, score int, headers map<string,string>, inlinks map<string,string>,
  outlinks map<string,string>, metadata map<string,string>, markers map<string,string>)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,f:bas,f:st,f:pts#b,f:ts#b,f:fi#b,f:rsf,f:rpr,f:cnt,f:typ,f:prot,f:mod#b,f:pmod#b,f:bid,p:t,p:c,p:st,p:sig,p:psig,s:s,h:,il:,ol:,mtdt:,mk:")
TBLPROPERTIES ("hbase.table.name" = "webpage");

After executing this statement, the columns are created like this:

baseurl string from deserializer
batchid string from deserializer
content string from deserializer
contenttype string from deserializer
fetchinterval bigint from deserializer
fetchtime bigint from deserializer
headers map<string,string> from deserializer
inlinks map<string,string> from deserializer
key string from deserializer
markers map<string,string> from deserializer
metadata map<string,string> from deserializer
modifiedtime bigint from deserializer
outlinks map<string,string> from deserializer
parsestatus int from deserializer
prevfetchtime bigint from deserializer
prevmodifiedtime bigint from deserializer
prevsignature string from deserializer
protocolstatus string from deserializer
reprurl string from deserializer
retriessincefetch int from deserializer
score int from deserializer
signature string from deserializer
status int from deserializer
text string from deserializer
title string from deserializer

Some example queries:

The following query converts the bigint epoch to a readable date format:
select baseurl,from_unixtime(fetchtime, "[dd/MM/yyyy:HH:mm:ss Z]") AS ft from webpage_hive order by baseurl desc;

The following query explodes outlinks in a lateral view and displays them as key/value pairs:
SELECT baseurl, outl_key,outl_value FROM webpage_hive LATERAL VIEW explode(outlinks) olTable AS outl_key,outl_value;

How to disable the IPv6 stack on Ubuntu Linux

Sometimes network infrastructures are misconfigured for the IPv6 stack and you want to disable it altogether. On an Ubuntu distribution this is as easy as changing sysctl.conf and rebooting your machine. Here is how to do it.

First check whether it is enabled or not:
$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6
If the output is 0, IPv6 is enabled; if it is 1, it is disabled.

Adding the following lines to /etc/sysctl.conf is enough to disable the IPv6 stack; then restart your computer for the change to take effect:
# vi /etc/sysctl.conf
net.ipv6.conf.lo.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.all.disable_ipv6 = 1
# shutdown -r now


When your computer is up again, you can check it:
$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6
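
If you prefer not to reboot, reloading sysctl.conf should also apply the settings (already-established IPv6 connections may behave differently until a restart), and ip can confirm that no inet6 addresses remain:

# sysctl -p
# ip a | grep inet6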


Pig Latin Error 2998 when loading data from HBaseStorage

ERROR 2998: Unhandled internal error. org/apache/hadoop/hbase/filter/WritableByteArrayComparable

java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/filter/WritableByteArrayComparable

Probably your PIG_CLASSPATH does not contain the HBase jars. PIG_CLASSPATH should contain them, like this:

export PIG_CLASSPATH=/<hbase_home>/<hbase-your_version.jar>:<hbase_home>/lib/<zookeeper-your_version.jar>:$PIG_CLASSPATH
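
For example, assuming HBase is installed under /usr/local/hbase-0.94.6 with ZooKeeper 3.4.5 in its lib folder (hypothetical paths and versions, adjust them to your own layout), the export would look like:

export PIG_CLASSPATH=/usr/local/hbase-0.94.6/hbase-0.94.6.jar:/usr/local/hbase-0.94.6/lib/zookeeper-3.4.5.jar:$PIG_CLASSPATH

Restart the Pig grunt shell afterwards so the new classpath is picked up.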