To schedule a Hive job with Java and DataStax Enterprise, I used the scheduler embedded in JBoss 7 and Hive-JDBC.
The first step is to include the following libraries in the project:
- ~/dse-3.0/resources/hive/lib/hive-jdbc-0.9.0.1.jar
- ~/dse-3.0/resources/hive/lib/hive-metastore-0.9.0.1.jar
- ~/dse-3.0/resources/hive/lib/hive-service-0.9.0.1.jar
- ~/dse-3.0/resources/hive/lib/libfb303-0.7.0.jar
- ~/dse-3.0/resources/hadoop/hadoop-core-1.0.4.2.jar
- ~/dse-3.0/resources/hive/lib/hive-serde-0.9.0.1.jar
- ~/dse-3.0/resources/hive/lib/commons-logging-1.0.4.jar
- ~/dse-3.0/resources/hive/lib/hive-exec-0.9.0.1.jar
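Before deploying, it can help to confirm that the jars above actually ended up on the classpath. The following is only a minimal sketch: the class name `HiveClasspathCheck` and the helper `isOnClasspath` are hypothetical names made up for this illustration. The driver class name is the one used in the job further down.

```java
public class HiveClasspathCheck {

    /** Returns true if the named class can be loaded from the current classpath. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing jar, or a jar whose own dependencies are missing
            return false;
        }
    }

    public static void main(String[] args) {
        String driver = "org.apache.hadoop.hive.jdbc.HiveDriver";
        System.out.println(driver + " available: " + isOnClasspath(driver));
    }
}
```

If this reports `false` when run with the same classpath as the project, one of the listed libraries is probably missing.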
Then start the Hive server:
dse-3.0/bin/dse hive --service hiveserver
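To check that the Hive server is accepting connections before the job runs, a plain TCP probe is enough. This is a rough sketch, not part of the original setup: `HivePortCheck` and `isPortOpen` are hypothetical names, and the host and port (`localhost`, `10000`) are simply taken from the JDBC URL used in the job below.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class HivePortCheck {

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean isPortOpen(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host not reachable
            return false;
        }
    }

    public static void main(String[] args) {
        // Host and port assumed from the JDBC URL jdbc:hive://localhost:10000/default
        System.out.println("Hive server reachable: " + isPortOpen("localhost", 10000, 2000));
    }
}
```

A successful probe only shows that something is listening on the port; it does not prove that it is the Hive server.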
Then I followed this Hive-JDBC example to write my job:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

import javax.ejb.Schedule;
import javax.ejb.Stateless;

import org.apache.log4j.Logger;

@Stateless(name = "AutomaticSchedulerBean")
public class HiveJob {

    private static final Logger mLogger = Logger.getLogger(HiveJob.class);
    private static final String driverName = "org.apache.hadoop.hive.jdbc.HiveDriver";

    // Fires every 5 minutes; the timer is not persisted across server restarts
    @Schedule(dayOfWeek = "*", hour = "*", minute = "*/5", year = "*", persistent = false)
    public void execute() throws SQLException {
        mLogger.info("Start HiveJob");
        try {
            Class.forName(driverName);
        } catch (ClassNotFoundException e) {
            // Do not call System.exit() here: it would shut down the whole JBoss server
            throw new SQLException("Hive JDBC driver not found: " + driverName, e);
        }

        // regular hive query
        String sql = "select count(*) from universe where status = 'GO'";
        mLogger.info("Running: " + sql);

        // try-with-resources (Java 7+) closes the result set, statement and connection
        try (Connection con = DriverManager.getConnection("jdbc:hive://localhost:10000/default", "", "");
             Statement stmt = con.createStatement();
             ResultSet res = stmt.executeQuery(sql)) {
            while (res.next()) {
                mLogger.info(res.getString(1));
            }
        }
        mLogger.info("HiveJob executed!");
    }
}
This job is going to run every 5 minutes.
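The 5-minute interval comes from `minute = "*/5"` in the `@Schedule` annotation. To make that concrete, here is a small sketch of how such a minute expression expands into the minutes of the hour at which the timer fires. It is only an illustration, not the container's real parser: `MinuteExpressionSketch` and `expandMinutes` are hypothetical names, and only the forms `*`, `x/y` and a single number are handled.

```java
import java.util.ArrayList;
import java.util.List;

public class MinuteExpressionSketch {

    /** Expands "*", "x/y" (with x possibly "*") or a single number into minutes 0..59. */
    static List<Integer> expandMinutes(String expr) {
        List<Integer> minutes = new ArrayList<>();
        int start = 0;
        int step = 1;
        if (expr.contains("/")) {
            String[] parts = expr.split("/");
            // "*/5" starts at minute 0; "10/5" would start at minute 10
            start = parts[0].equals("*") ? 0 : Integer.parseInt(parts[0]);
            step = Integer.parseInt(parts[1]);
            if (step <= 0) {
                throw new IllegalArgumentException("step must be positive: " + expr);
            }
        } else if (!expr.equals("*")) {
            // A single fixed minute such as "30"
            minutes.add(Integer.parseInt(expr));
            return minutes;
        }
        for (int m = start; m < 60; m += step) {
            minutes.add(m);
        }
        return minutes;
    }

    public static void main(String[] args) {
        // prints [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55]
        System.out.println(expandMinutes("*/5"));
    }
}
```

Because `hour`, `dayOfWeek` and `year` are all `*`, these twelve minutes repeat every hour of every day.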
Enjoy :)