Introduction

This project was done as part of my Big Data Technology class.

HetioNet is a hetnet with multiple node and edge (relationship) types, which encodes biology. The hetnet was designed for Project Rephetio, which aims to systematically identify why drugs work and predict new therapies for drugs.

Purpose

This program:

  • Computes the number of genes that are BIND (CbG) to it using MapReduce method.
  • Computes the number of GENE(s) for each disease that are UPREGULATES (DuG) using MapReduce method. Outputing results with the top 5 number of GENE(s) in a descending order.
  • Computes the hash tables using mid-square method (with r = 3, 4) and Folding Method (digit-size = 2 and 3). Outputs with 10 hash tables for both methods and finds which method requires the least number of storage.