Multiple-choice items (MCIs) are widely used in high-stakes testing and classroom assessment because they yield reliable assessment results. However, recent literature has revealed that item-writing guidelines are repeatedly violated when MCIs are created, which can threaten both reliability and validity. Another threat to validity, known as differential item functioning (DIF), arises when an item favors one group over another even though the groups have the same level of underlying ability. This empirical study compares item parameters for MCIs with negatively worded stems and for complex MCIs, two commonly used formats that violate MCI item-writing guidelines, and investigates gender-related DIF associated with these formats. The results showed that DIF detection methods flagged two complex MCIs as favoring male students, likely owing to the item format and to male students' greater tendency to take risks when answering MCIs.