While natural language generation has improved significantly, hallucinations remain a critical issue: models produce irrelevant or factually incorrect content. This project develops an ensemble of metrics to evaluate the factual correctness of generated text. Our findings show that fine-tuning effectively mitigates hallucinations, whereas prompt engineering offers only limited gains.
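To illustrate the general idea of a metric ensemble, the sketch below combines two simple lexical-overlap signals (unigram and bigram precision against a reference) into one weighted factuality score. The function names, metrics, and weights are purely illustrative assumptions, not the project's actual metric set.

```python
def ngram_precision(generated: str, reference: str, n: int = 1) -> float:
    """Fraction of n-grams in `generated` that also occur in `reference`."""
    gen = generated.lower().split()
    ref = reference.lower().split()
    gen_ngrams = [tuple(gen[i:i + n]) for i in range(len(gen) - n + 1)]
    ref_ngrams = {tuple(ref[i:i + n]) for i in range(len(ref) - n + 1)}
    if not gen_ngrams:
        return 0.0
    return sum(g in ref_ngrams for g in gen_ngrams) / len(gen_ngrams)


def ensemble_factuality_score(generated: str, reference: str,
                              weights=(0.5, 0.5)) -> float:
    """Weighted average of unigram and bigram precision (illustrative only)."""
    scores = (ngram_precision(generated, reference, n=1),
              ngram_precision(generated, reference, n=2))
    return sum(w * s for w, s in zip(weights, scores))
```

In practice, an ensemble like this would weight stronger signals (e.g., entailment-based checks) rather than raw n-gram overlap, which is used here only to keep the sketch self-contained.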