Sevgi Aydin Gunbatar,*a Gizem Tezcan Sirin,b Onur Can Ilkyazb and Yusuf Mutlub
aCollege of Education, Mathematics and Science Education Dept., Van Yuzuncu Yil University, Van, Turkiye. E-mail: sevgi.aydin45@hotmail.com
bInstitute of Educational Sciences, Van Yuzuncu Yil University, Van, Turkiye
First published on 11th July 2025
In this multi-case study, the ChatGPT interaction profiles of participants with varying years of teaching experience were examined during lesson planning for the topic of acids and bases. Six participants (i.e., two senior pre-service science teachers, one induction year science teacher, two experienced science teachers, and an experienced science teacher educator) participated in the study. A lesson plan in the Content Representation (CoRe) format, a copy of dialogue with ChatGPT (i.e., conversation history with ChatGPT), a reflection paper on details of the lesson planning with ChatGPT, and online focus group interview data were collected from all participants. Both deductive and inductive analyses of the multiple data sources revealed four profiles, namely, Artificial Intelligence (AI)-reliant planners, AI-collaborative planners, AI-assisted plan refiners, and AI-independent planners. The profiles differed from each other in terms of the flow of lesson planning with AI, the interaction purposes of the participants with ChatGPT during lesson planning, and the extent to which the participants incorporated the content provided by ChatGPT into their lesson plan preparation. In light of the results, science pre-service teachers and teachers should be trained on what AI can offer them, how AI tools can be effectively utilized in science education, and the ethical considerations of AI use.
In this transition period when AI technologies are being used in education, “to unleash their [AI tools] full potential for education, it is crucial to approach these models with caution and critically evaluate their limitations and potential biases, understanding that they are tools to support teaching and learning and do not replace teachers” (Van den Berg and du Plessis, 2023, p. 1). In this study, through the use of the technological pedagogical content knowledge (TPACK) framework, which illuminates the integration of technology (here, particularly AI technology) into educational settings (Mishra and Koehler, 2008; Celik, 2023), we aimed to examine pre-service and in-service teachers' (i.e., both inexperienced and experienced) and a teacher educator's lesson planning with ChatGPT for the acids and bases topic, a fundamental topic of middle school science and high school chemistry curricula (Drechsler and Van Driel, 2008). The acids and bases topic encompasses abstract concepts, which makes learning it challenging for learners (Cetin-Dindar and Geban, 2017) and leads to alternative conceptions (Nakhleh, 1994; Sesen and Tarhan, 2011).
Previous research has provided information on how instructors and pre- and in-service teachers utilize AI in lesson planning. To the best of our knowledge, however, there is a gap in the literature regarding how participants with varying levels of teaching experience utilize the models, what their aims are in interacting with the models, and whether they engage with the models' suggestions and, if so, how they handle them. Although Clark et al. (2024) argued that prior knowledge is an important factor in AI dialogue, the questions of how it changes the planning procedure and directs the dialogue with AI remain unanswered. By focusing on these points, this study has the potential to inform teacher educators about the use of AI tools in teacher education, the points at which they should support teachers, and how science and chemistry teaching methods courses can be enriched with the use of AI.
In the related literature, there are different PCK models, such as the hexagonal model (Park, 2005) and the Refined Consensus Model (RCM) (Carlson and Daehler, 2019). The subcomponents are more or less similar across the different PCK models; for example, most models address the alternative conceptions students may have about the subject and the difficulties they encounter, as well as learning outcomes in the science curriculum and instructional and assessment strategies. In contrast to other models, Carlson and Daehler's (2019) RCM distinguishes between three separate realms of PCK: collective PCK (cPCK), personal PCK (pPCK), and enacted PCK (ePCK). While pPCK is formed by an individual teacher, cPCK is “held by multiple educators in field” (p. 82). ePCK is the knowledge employed by an individual teacher while teaching and/or planning instruction for a specific topic taught at a specific grade level. In addition to different PCK models, the PCK construct has been extended to specific constructs (e.g., PCK for Science, Technology, Engineering, and Mathematics [STEM] (Aydin-Gunbatar et al., 2020), PCK for Nature of Science [NOS] (Hanuscin et al., 2011; Faikhamta, 2013)). Additionally, PCK was enriched with technology integration: the interactions between technology, content, and pedagogy created a new derived construct, technological pedagogical content knowledge (TPACK) (Angeli and Valanides, 2009). Although Carlson and Daehler (2019) did not establish a connection with TPACK for the three realms they proposed for PCK, this categorization is also applicable to TPACK (i.e., personal TPACK, enacted TPACK).
Mishra and Koehler (2006) proposed a TPACK framework based on Shulman's (1986, 1987) idea of PCK. “At the heart of good teaching with technology are three core components: content, pedagogy, and technology, plus the relationships among and between them.” (Koehler and Mishra, 2009, p. 62). TPACK involves the knowledge of content, pedagogical knowledge, and technological tools to advance the teaching and learning process (Mishra and Koehler, 2006) (Fig. 1).
Technology knowledge (TK) is defined as an understanding of technological possibilities, an interest in emerging technologies, and the ability to use technology (Mishra and Koehler, 2006). TK is crucial for teachers to support students' understanding of the subject and material taught (Hasriadi and Nurul, 2023). Teachers with a high level of TK have the ability to integrate technological tools and equipment into their professional and daily lives. They can practically comprehend the support or obstacles technology offers to solve problems (Koehler and Mishra, 2009).
CK, PK, and TK interactions create second-level knowledge types. Technological Content Knowledge (TCK) refers to understanding the interactions between CK and TK; it includes knowledge about technologies used in the relevant domain (e.g., in chemistry, utilizing simulations of ionic compounds dissolving in water) (Mishra and Koehler, 2006). Technological Pedagogical Knowledge (TPK) refers to the use of technology for educational purposes (Mishra and Koehler, 2006; Mishra et al., 2010). Regardless of the domain (i.e., chemistry or history teaching), teachers' use of Canva and similar tools to prepare mini-quizzes is an example of TPK.
Finally, TPACK is a special combination of all core elements. TPACK refers to the understanding of the most effective pedagogical strategies that can be applied, as well as the most appropriate technological tools that can be used to teach a particular domain. It is formed by the complex interactions of all knowledge components (Mishra and Koehler, 2006). For example, a chemistry teacher who observes that high school students struggle with abstract thinking uses animation to illustrate ion and electron movement while teaching the topic of electrochemical cells. The chemistry teacher integrates animation technology into the lesson to make the reactions in the half cells, the changes in the electrodes (e.g., the cathode gaining mass), and the ion and electron movements more concrete.
Today's teachers need to master TPACK as much as possible to develop innovative, creative, and effective classroom implementation, given their students' engagement and interest in technology (Li et al., 2022). Teachers should integrate technology into teaching and learning (Lee and Zhai, 2024). As a result, TPACK has become an essential component for teacher training.
To teach a conceptual understanding of science, according to the TPCK [TPACK] framework, teachers need to focus on learners’ difficulties (e.g., visualization of particles) in learning science topics (e.g., dissolution of NaCl salt in water), determine how to help learners overcome their issues, then look for technological applications and integrate them into the lesson. Thus, TPCK is fruitful in helping learners, teachers, and teacher educators in solving instructional problems (Yerdelen-Damar et al., 2017, p. 396).
Hence, more support and emphasis are required in teacher education regarding technology infrastructure to ensure a broader application of the scope of TPACK in learning and teaching processes in the coming years (Hasriadi and Nurul, 2023).
The impact of AI on the educational process necessitates a critical reevaluation of the interaction among technology, pedagogy, and content (Ning et al., 2024). In response to the rapid adoption of AI in the field of education, Celik (2023), Feldman-Maggor et al. (2025) and Lorenz and Romeike (2023) proposed that an AI dimension should be added to the TPACK construct. According to Celik (2023), “as teachers have more knowledge to interact with AI-based tools, they will better understand the pedagogical contributions of AI” (Celik, 2023, p. 8) (Fig. 2). To conclude, the TPACK construct is evolving into the AI-TPACK construct. AI-TPACK is an AI-based framework designed to help teachers achieve their educational goals more effectively.
Fig. 2 Intelligent-TPACK framework with its components (Celik, 2023, p. 8).
AI-TK refers to teachers' awareness of AI and their competencies in communicating with AI applications and using AI applications at a basic level. AI-TCK refers to teachers' utilizing and interacting with AI to improve their content expertise. AI-TPK refers to teachers' awareness of the pedagogical elements in the content of AI applications. Ethics refers to having the necessary knowledge to understand the results of interactions with AI applications, distinguish between right and wrong, and evaluate them (Celik, 2023). “In addition to teachers’ technological and pedagogical knowledge, their ethical assessments play an important role in effective AI integration.” (Celik, 2023, p. 2). Examples of ethical issues include biases related to gender, fairness, the ambiguity surrounding the developers of the AI tool, the knowledge provided by AI tools, and inclusiveness. Moreover, hallucination is another issue that teachers must consider when using AI. Feldman-Maggor et al. (2025) revealed an example of ChatGPT 3.5's hallucination: a chemistry teacher engaged in a dialogue with the AI and asked the model to identify possible alternative conceptions students may hold in chemistry. The teacher also asked for references for those possible alternative conceptions, and the AI provided publications that did not exist. To conclude, the Institute for Ethical AI in Education (2022) stated that for the better use of AI in education, the issues listed above should be introduced to users (e.g., teachers and learners) to make them aware of both the potential enrichment and the drawbacks of the new technology. Ultimately, AI-based decisions should be critically evaluated by users in light of the points outlined above.
The findings of the studies in which the participant group consists of experienced teachers and higher education instructors emphasize that ChatGPT can be an effective tool in teachers' and instructors’ lesson planning processes. However, guidance and critical thinking are important for correct use. Van den Berg (2024) worked with lecturers and trainers from various educational levels, including higher education (both private and public), primary schools, and technical vocational education, and examined how the participants used AI tools, particularly ChatGPT, in their teaching processes. The educators facilitated their teaching by utilizing AI in tasks such as lesson planning, text translation, creating assessment questions, and promoting critical thinking. The study emphasized the potential transformation of AI in education, while also acknowledging limitations such as accuracy, bias, and reliability. The educators used AI for personalized teaching and more efficient lesson presentations; however, it was concluded that institutional guidance is necessary for effective integration. In another study by Clark et al. (2024), four experienced general chemistry instructors used ChatGPT-4 to develop their university-level lesson plans for teaching historical experiments (e.g., Thomson's cathode ray experiment). The participants utilized ChatGPT in various tasks, including creating lesson plan outlines, discussing teaching strategies, explaining calculations, tailoring explanations to student levels, and developing assessments. The dimensions of instructional objectives, background knowledge, instructional strategies, resources, and assessment were addressed in the research process. The findings showed that ChatGPT helped prepare lesson outlines, suggest resources, discuss teaching strategies, explain calculations, create explanations tailored to student levels, and design assessments.
However, it was insufficient in producing visual aids and slides. In this context, it is stated that ChatGPT can complement the teacher's role in creating tutorial content, but cannot substitute it (Clark et al., 2024).
Unlike all the summarized studies, Powell and Courchesne (2024) did not work with any participant group. They analysed the lesson plan documents prepared by ChatGPT and examined ChatGPT's efficiency in creating lesson plans. The study employed a case study design to investigate the effectiveness of generative AI in creating lesson plans aligned with the Massachusetts curriculum framework. The lesson plan was found to be aligned with the 5E instructional model and the curriculum, but it contained missing details and inaccuracies. These findings, similar to previous studies, emphasize the need for teachers to be cautious when utilizing AI outputs. It was also concluded that careful guidance is needed in teacher training to enable educators to use AI effectively. In this context, while the integration of AI in education enhances active learning strategies in science classrooms to increase student engagement and promote critical thinking and deep understanding of scientific concepts, it requires a focus on professional development training for teachers to use these technologies effectively and a human-centred approach and attention to ethical issues such as data privacy (Vorsah and Oppong, 2024).
Although there are studies in the literature on PSTs’ and experienced teachers’ lesson planning with ChatGPT, no study, to the best of our knowledge, compares PSTs, induction year, experienced in-service, and teacher educators’ lesson planning process with AI. In addition, existing studies generally collected a single type of data source (e.g., lesson plan). Given the domain- and topic-specific nature of the TPACK (Chai et al., 2013), elaborating on the lesson-planning process for a specific topic is necessary. With participants’ varying levels of teaching experience and prior knowledge, as well as multiple data sources, this study has the potential to inform the literature regarding the different profiles of AI use for lesson planning, specifically in the context of the acids and bases topic.
The main and sub-research questions guiding the study are:
What are the profiles of participants with different years of teaching experience in interacting with ChatGPT to prepare lesson plans for teaching acids and bases for middle school students?
i. How do participants utilize ChatGPT in the context of lesson planning?
ii. What are the participants' purposes in interacting with ChatGPT during lesson planning?
iii. To what extent do the participants engage with the content provided by ChatGPT during the lesson plan preparation process, and to what extent do they include it in the plan?
Participants | Gender | Degree | Professional experience | Teaching acids–bases | Perceived tech. competency (out of 10)
---|---|---|---|---|---
Zoe | Female | PST | — | — | 8
Nancy | Female | PST | — | — | 6
Emily | Female | Graduate student | One semester | — | 9
Teagan | Female | PhD candidate | Eight years | ✓ | 8
Carter | Male | PhD candidate | Six years | ✓ | 8
Asley | Female | Full professor | Twelve years | ✓ | 9

(—) indicates absence; (✓) indicates presence.
Based on the related literature (Shulman, 1987; Stojanov et al., 2024; Wijaya et al., 2024), the participants' teaching experience, familiarity with AI tools, previous experience of teaching the acid–base topic (i.e., the subject-specificity of TPACK), and how competent they consider themselves in terms of technology use are important for this study.
Regarding familiarity with AI, all participants reported having experience with ChatGPT for general and educational purposes (e.g., preparing projects or classroom activities). Based on the data collected via the participant information form, all participants had previously used ChatGPT and other Large Language Models (LLMs) (e.g., Gemini). None of the participants except Zoe had any formal training in using GenAI tools; Zoe stated that she had participated in basic AI training at the entrepreneurship centre for a short time, where she had the opportunity to examine and discuss the purposes for which AI applications are used in education, as well as examples of studies on AI. Additionally, all participants except Nancy had read research papers on AI or its use in education.
Regarding technology competency, Nancy gave herself a six out of ten, indicating that she did not consider herself technology competent. The other participants rated themselves between eight and nine out of ten, indicating that they were highly competent (Table 1).
Regarding the degrees held by participants and their teaching experience, Zoe and Nancy are senior PSTs enrolled in a four-year undergraduate program. They were in the seventh semester of the program (i.e., each semester has 14 weeks). They had taken general science content courses (e.g., general chemistry, physics), pedagogy courses (e.g., classroom management), technology courses (e.g., instructional technologies), and science-specific pedagogy courses (e.g., science curriculum, science teaching methods I & II). They did not have any teaching experience. Emily, the induction year teacher, graduated from the same four-year undergraduate science teacher education program as the PSTs and the experienced teachers, and then enrolled in a graduate program. She was taking graduate courses at the time of this study and designing her thesis proposal on GenAI use in science education. She had also started working in a state middle school three months before the study; however, she had not taught the acids and bases topic previously (Table 1). Teagan and Carter graduated from the same science teacher education program and then completed their graduate studies. They have over five years of experience teaching science at middle schools and were PhD candidates at the time of the study. Both had taught the topic many times previously (Table 1). Finally, Asley, an experienced chemistry teacher educator, graduated from secondary science teaching undergraduate and graduate programs. She had taught the topic many times before (Table 1).
To collect the necessary details about the participants, the researchers prepared a ‘Participant Information Form’. Expert opinion was sought, and revisions were made in light of the experts’ suggestions. The final form contained questions about perceived technological competence, participation in training on AI use, familiarity with AI use in education, teaching experience in years, and experience teaching the acid–base topic at the middle school level. All participants completed the form at the beginning of the research.
The CoRe (i.e., a lesson plan format) was developed by Loughran et al. (2012) as a matrix format for lesson planning (Fig. 4). The CoRe is both a research tool for accessing science teachers’ understandings of content and a way of representing this knowledge. The CoRe helps researchers codify teachers’ knowledge in the content area under investigation, thereby identifying important content features that science teachers notice and respond to. Each question in the CoRe aligns with a few key ideas identified by the teacher. Due to the matrix format, teachers planning lessons answer the eight questions for each big idea.
All participants were requested to underline the parts of the lesson plan submitted to the researchers that were taken from the AI's suggestions. Therefore, the researchers could understand to what extent, and for which components of instruction (e.g., instructional strategy or learners’ alternative conceptions), the plan was informed by ChatGPT. In other words, at this stage, we focused on the participants’ enacted TPACK, enriched by the use of ChatGPT.
Yet another data source was a copy of the dialogue with ChatGPT. All participants shared the link to the dialogue for further analysis (e.g., the prompts and the CoRe questions on which more support was needed). Additionally, after preparing the lesson plan (i.e., the CoRe), the participants were asked to write a short reflection paper on their experience preparing the CoRe on the acids and bases topic with ChatGPT support. Specifically, we provided the participants with a short description of what we expected:
Please explain in a paragraph how you benefited from ChatGPT in preparing a lesson plan on acids and bases. You can specify how the process progressed, how your lesson plan preparation process with the AI started and continued, in which aspects you received more or less support, etc.
Finally, a focus group interview was conducted to collect rich data on at which stage, to what extent, and how the participants received support from ChatGPT during the CoRe preparation process. The interview questions were prepared and shared with an expert in science education and TPACK, and a revised version was prepared in light of the feedback. The interview was conducted online after all participants had prepared the CoRe; it lasted 90 minutes and was transcribed verbatim. The questions were:
1. When preparing the plan, the CoRe, on which question did ChatGPT help you more? What is the possible reason for this?
(i) Were there any questions where you used ChatGPT's suggestion without any changes? Can you give an example?
2. Were there any steps where you did not get any help?
3. How did you incorporate the answers into your CoRe?
(i) Did you use them directly, or did you make changes to the answers? Explain.
4. While preparing the CoRe, were there any ChatGPT answers that surprised you positively or negatively? Can you give an example?
5. Was there a situation where you realized that ChatGPT hallucinated or gave a biased answer? Please explain.
6. How did you set up the prompts you sent to ChatGPT while preparing the CoRe? When the scope of the prompts changed, did the answers you received change?
7. Did using ChatGPT on acids and bases contribute to your pedagogical content knowledge? Were you able to solve the difficulties in teaching this subject with the help of ChatGPT?
In the first round, the researchers coded a randomly selected participant's lesson plan, dialogue with ChatGPT, and reflection paper on how the dialogue proceeded. Then, they came together to compare and contrast the coding. Intercoder agreement was calculated as Cohen's kappa. In the first round, we obtained a value of 0.44, which indicates moderate agreement (Landis and Koch, 1977). After discussing the inconsistencies (i.e., whether the prompts were specific to the big ideas set or to curriculum objectives, and the presence of ethical issues), the researchers coded a second lesson plan. In the second round, the agreement was calculated as 0.84, which indicates strong agreement (Landis and Koch, 1977). After addressing the minor inconsistencies, the researchers coded the remaining data. At the end of this stage, we compiled a summary table for all participants and aspects (e.g., the process, prompts, and handling the content proposed by ChatGPT) (Table 2).
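For readers unfamiliar with the agreement statistic reported above, Cohen's kappa corrects raw percent agreement between two coders for the agreement expected by chance. The sketch below uses hypothetical coding labels (not the study's actual data) to show the computation:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters coding the same items."""
    n = len(rater_a)
    # Observed agreement: proportion of items both raters coded identically.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal category frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical codes for four lesson-plan segments (illustrative only):
coder1 = ["specific", "generic", "specific", "specific"]
coder2 = ["specific", "generic", "generic", "specific"]
kappa = cohens_kappa(coder1, coder2)  # observed 0.75, expected 0.50 -> kappa 0.5
```

By the Landis and Koch (1977) benchmarks cited above, 0.41–0.60 is moderate agreement and 0.81–1.00 is almost perfect, which is how the study's 0.44 and 0.84 values are interpreted.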
Participants | The process | Interaction purposes | Dealing with ChatGPT's content offer | Ethics and other issues
---|---|---|---|---
Zoe | “First, I tried to fill in the relevant fields given in the blank lesson plan myself. Since I was still at the beginning of my profession and I had not taught the topic of acids and bases before, I thought that I could not answer the questions like a professional teacher. Then I got help from ChatGPT” (from reflection paper). Dialogue history also supports the reflection paper data. Preparing a lesson plan with ChatGPT. | Seek ideas. | All the alternative conceptions listed and the instructional strategies planned to be used were taken directly from ChatGPT's offer (from the lesson plan, CoRe). At the very beginning of the lesson, she takes the activity suggested by ChatGPT to attract attention: she pours an acid and a base into purple cabbage juice and asks the students which one is the acid and which one is the base based on the colour change (from the lesson plan); there is no way for the students to know this, but they can guess. She introduces the definitions of acid and base by providing a few examples and then asks a question taken from ChatGPT: “Why do you need to brush your teeth after eating candies?”, although the students do not know the concept of neutralization. She took those from ChatGPT; however, she was not able to critique ChatGPT's offer. Mostly unable to deal with ChatGPT's offers. | Non-existent
Asley | “First of all, I prepared a lesson plan for three different big ideas. When this work was done, I had a question mark in my mind whether I was able to fully predict the alternative conceptions of 8th graders. Since I did not teach this subject at the 8th grade level, I felt the need to ask.” (from reflection paper). ChatGPT dialogue history also confirmed that. Preparing her own lesson plan and then asking ChatGPT. | “I had a prompt about the alternative conceptions. The offers were not at the level I wanted…. But in terms of assessment and evaluation, it saved me time. I mean the concept cartoon in terms of creating a visual.” To check her knowledge about the alternative conceptions she wrote on the plan, and to save time and have professional-looking visuals (e.g., concept cartoons) (from focus group interview). | Asley: “Based on the 8th grade curriculum, can you prepare a branched tree diagnostic test including possible alternative conceptions about acids, bases and their general properties?” ChatGPT: “Sure! Here is a structured branched tree on acids and bases based on grade 8.” When she realized that the structure included some alternative conceptions not focused on in Grade 8, such as acid–base strength, Asley asked: “Are these at the 8th grade level?” ChatGPT replied: “You are right, the content may have been a bit verbose. At the 8th grade level, I should have presented a more simplified structure, in line with the knowledge level and age group of the students.” She also requested: “I want a concept cartoon on the formulas of acids and bases.” (from dialogue with ChatGPT). Able to deal with ChatGPT's offers. | Existent. “Can you show the sources from where you get those alternative conceptions?” (from the dialogue with ChatGPT)
In the second part of the analysis (i.e., inductive), based on the summary table we created (Table 3), we attempted to categorize the participants' interactions with ChatGPT during the lesson planning process. This part was inductive, and we formed the categories from the data (Patton, 2002).
Participants | Prompts | Dealing with what ChatGPT offers | Interaction purpose | Ethical issue | Profiles |
---|---|---|---|---|---|
Nancy | Specific to curricular objectives | She took all of ChatGPT's recommended activities without critically evaluating them | Planning by taking all the activities from ChatGPT | She did not address | AI-reliant planner |
Zoe | Specific to curricular objectives | She took almost all of ChatGPT's recommended activities without critically evaluating them | Planning by taking almost all of the activities from ChatGPT | She did not address | AI-reliant planner |
Emily | Specific to curricular objectives | She took all of ChatGPT's recommended activities without critically evaluating them | Planning by taking all the activities from ChatGPT | She did not address | AI-reliant planner |
Teagan | Specific to curricular objectives, specific to the instructional strategy that she planned to implement | She dealt with some of ChatGPT's offers | Preparing her lesson plan and then dialoguing with ChatGPT through the plan to address some missing points in the plan | She did not address | AI-collaborative planner
Carter | Specific to curricular objectives | He dealt with ChatGPT's offers | Preparing his own lesson plan and then asking ChatGPT to critique the plan | He did not address | AI-assisted plan refiner |
Asley | Specific to alternative conception, specific to grade level | She dealt with ChatGPT's offers | Preparing her lesson plan and then asking ChatGPT questions on two specific points | She asked for the sources where ChatGPT took the alternative conceptions | AI-independent planner |
In the first, category-defining step, we determined the endpoints of the spectrum. At the lowest level, we defined a category in which all suggestions from ChatGPT were accepted without question: the process proceeded by asking ChatGPT for a lesson plan, and there was no consideration of ethical implications. Participants in this category planned based on AI and even had the AI produce the plans, which indicates AI-reliant Planners. Nancy, Zoe, and Emily were in this category. At the other end of the spectrum is the profile of a user who asks ChatGPT questions on a few specific points (e.g., alternative conceptions common among middle school students), evaluates the content presented by AI, and even asks for the original source articles of the alternative conceptions presented to her, which indicates an AI-independent Planner. Asley was in this category. Between the two endpoints, Carter's and Teagan's prompts, their handling of AI's suggestions, and their interaction purposes differed not only from the other categories but also from each other; they were neither AI-reliant nor completely independent. Hence, Teagan's and Carter's categories were named in light of their data: Teagan was an AI-collaborative planner, whereas Carter was an AI-assisted plan refiner, as Carter did not plan with AI but instead asked AI to critique his plan. In this naming process, for consistency with the literature, we drew on the category names proposed by Guner and Er (2025) and Stojanov et al. (2024), who offered illustrative profile names (e.g., AI-Collaborative Coders, AI-Assisted Code Reviewers, AI-Independent Coders, all-around low-trusters, information seekers). We changed the ‘coder’ and ‘reviewer’ category names to ‘planner’ for our context.
Finally, to address trustworthiness, data triangulation (i.e., lesson plan, reflection paper, ChatGPT dialogue history, and focus group interview), calculation of intercoder agreement for two rounds, and member check were employed (Patton, 2002). Five available participants were informed about the data analysis results and the profile details. The authors presented the differences and similarities of the participants’ collaboration process with ChatGPT. Those five participants approved the profiles.
The necessary ethical permission was obtained from Van Yuzuncu Yil University, Social and Human Sciences Publication Ethics Committee, with the number 2025/39. All participants voluntarily participated in the study. They signed the consent form before the study started. To ensure anonymity, we utilized pseudonyms.
First, I tried to fill in the relevant fields in the blank lesson plan. Since I was still at the beginning of my profession and had not taught the topic of acids and bases before, I thought that I could not answer the questions like a professional teacher. Then, I got help from ChatGPT.
In contrast, Nancy preferred to ask ChatGPT to prepare a plan and then to review it. She wrote in her reflection paper:
I needed to have a general plan in mind. I used ChatGPT for this. First, I created a general framework by asking ChatGPT the questions, “Can you prepare a lesson plan for 8th grade science on acids and bases?” and “What activity can I do on acids and bases?”
Second, regarding interaction purposes, the AI-reliant planners sought ideas about learners' alternative conceptions, difficulties in learning the topic, instructional strategies for teaching acids and bases, and ways of assessing middle school students' understanding. The analysis of Zoe's CoRe revealed that all statements about learners' possible alternative conceptions related to the topic (i.e., the sixth question in the CoRe) were generated by ChatGPT. In the ChatGPT conversation history, Zoe asked, "I will teach the acids and bases topic to my 8th-grade students. Which alternative conceptions are observed in the acids and bases, and pH?" In the CoRe, Zoe listed all the alternative conceptions offered by ChatGPT (e.g., 'acids and bases are dangerous and should not be touched', 'students may think that all acids and bases are strong and dangerous', 'acids are always hot, bases are always cold because students may think that acids are hot because they burn, while bases feel cold').
While preparing the lesson plan, Zoe used ChatGPT to identify the alternative conceptions that students may have. Since Zoe did not verify whether the alternative conceptions presented by ChatGPT were already reported in the literature, the authors did so. The literature review showed that some of them had equivalents in the literature. For example, statements such as "acids and bases are dangerous, so they should not be touched", "acids are always harmful, bases are not harmful", and "everything that tastes sour is acid" were reported by Ross and Munby (1991). Likewise, "cleaners such as soap and detergents are only basic" (Nahadi et al., 2022) and the idea that "pH value is just a number" (Mubarokah et al., 2018) exist in the literature. However, "the colour change of the reagents is irreversible", "acids are always hot, while bases are cold", and "pH measurement can only be done with substances such as litmus or phenolphthalein" are not reported as alternative conceptions in the literature.
In the focus group, Zoe stated:
I mostly benefited from ChatGPT in terms of the alternative conceptions and typical difficulties that students might have. I needed this because I thought that I would not be able to identify them since I had no previous experience teaching the acid–base topic.
In addition to alternative conceptions, the analysis of their CoRes revealed that they took almost all of the instructional activities written in the CoRes from ChatGPT. For instance, Emily aimed to teach students to compare acids and bases, to classify substances using the concept of pH, and to recognize the shared properties of acids and bases and their uses in daily life (i.e., the three big ideas Emily set). To achieve these goals, she planned to discuss the properties of acids and bases using a two-column table, focusing on pH measurement and the daily-life uses of acids and bases, and incorporating simulations and videos. Likewise, Nancy stated in the focus group interview that she received the most help with finding activities suitable for the topic.
The part where I used the AI most was instructional strategy. For example, I noted that these pH-related activities are somewhat abstract. I asked if we could show students a simulation about it [to make the topic more concrete]. ChatGPT recommended a website called PhET Simulations… It was great to have a simulation idea.
Similarly, in the assessment part of the CoRe (i.e., the last question), Emily utilized all the assessment strategies provided by ChatGPT. She took the ideas of asking learners to prepare a poster or class presentation about acids and bases supported with visual material, to make a list of substances containing acids and bases that they use at home, and to explain the properties of these substances. Additionally, she copied ChatGPT's suggestion: "Learners can be given self-assessment questionnaires that include questions such as 'Which concepts of acids and bases did I find difficult?' and 'At which point am I still undecided?'" To conclude, the role that the AI-reliant planners assigned to AI was that of a knowledge authority.
Finally, regarding the integration of information provided by ChatGPT, the AI-reliant planners encountered issues when incorporating ChatGPT's suggestions into their plans. For instance, in her CoRe, Nancy set the third big idea as "learners will be able to classify these substances by the use of pH value to understand the differences and similarities between acids and bases, and generalize those substances by creating a pattern." To achieve this goal, Nancy stated that she would use the PhET simulation proposed by ChatGPT. However, an examination of the flow of the lesson plan shows that she planned to use this simulation without even defining pH. In other words, when integrating ChatGPT's suggestion, she lost sight of the flow of the lesson. Similarly, ChatGPT proposed the alternative conception that 'acids are always hot, and bases are cold.' Zoe took it without inquiry and wrote it into the CoRe; her limited experience and reading of the literature left her unable to question it. When we reviewed the conversation history, we noticed that she did not send any follow-up prompts questioning ChatGPT's suggestions. Additionally, the plan included no instructional activities to address the alternative conceptions identified.
In her CoRe, similar to Nancy, Emily took the idea of preparing solutions of varying concentrations and forming a pH spectrum, which is consistent with her second big idea (i.e., classifying substances using pH). However, for the first big idea (i.e., comparison of acids and bases), she took ChatGPT's suggestion: "Prepare a table with two columns (acids and bases). Ask the students to write down the properties appropriate to each group. Then discuss with the whole class, emphasizing the common characteristics." Although this idea may work, it reads more like a method for diagnosing learners' prior knowledge about the topic than a teaching strategy. Without any supportive hands-on activity or video, relying only on eliciting learners' ideas, it seems that she did not critique ChatGPT's suggestions.
Second, she sought the model's suggestions to complement her own ideas. In the focus group interview, she stated:
In the alternative conception section, I outlined some possible options based on my experience. I also found ChatGPT's suggestions useful. Likewise, in terms of typical difficulties, it offered different perspectives in addition to the situations I experienced, and I blended them into my plan.
Teagan stated that the alternative conceptions written in the plan were not all based on her own classroom observations and experiences; she received some support from ChatGPT in this process and added the alternative conceptions suggested by AI. Similar to the previous participants, Teagan did not check ChatGPT's output on students' alternative conceptions, so the authors checked all of them. For instance, "all acids are harmful" and "acids and bases are opposite substances, they have no common properties" overlap with ones previously identified in the literature (e.g., Ross and Munby, 1991; Mubarokah et al., 2018). Likewise, "thinking that all acids are strong and all bases are weak" was reported in the literature (Ross and Munby, 1991; Mubarokah et al., 2018; Nahadi et al., 2022). On the other hand, "pH of strong acids is close to 1, pH of weak acids is close to 7; pH of strong bases is close to 14, pH of weak bases is close to 7" was not found in the literature. Similarly, "thinking that acids always show acidic properties when they react with water" and "focusing only on the properties of acids and bases against water" were not found in the literature.
In this interaction, Teagan used ChatGPT as if she were exchanging information with a colleague to complete the missing points of the plan in her mind (i.e., a complementary role). In her CoRe, Teagan, who planned to use a design-based STEM activity, asked ChatGPT to write a daily-life scenario for this activity. In other words, she had a plan with a specific aim and flow; she only needed a daily-life problem to make the STEM activity more interesting and attractive for students. The scenario is copied below.
You are on the R&D team of a Baby Care Products company. Recently, the company received many complaints that the baby shampoo irritates babies’ eyes. The company officials are asking your team to design a shampoo formula that does not harm or irritate babies' eyes.
In the focus group interview, Teagan stated:
I integrated the support I received from ChatGPT into my plan. I used ChatGPT's suggestions in some question examples and scenarios that I would use in the lesson because it made my job easier… I have taught acids and bases many times in my classes. However, ChatGPT has different ideas. It puts everything packaged in front of you. There will be some things you take from these. There are some situations that you add to your repertoire.
Finally, when she received ChatGPT's suggestions, she filtered them through her pedagogical content knowledge, gained through eight years of teaching experience at middle school. In the reflection paper, Teagan wrote, "I blended my eight years of teaching experience at middle school and ChatGPT's suggestions."
She described how she dealt with ChatGPT's suggestions in the focus group interview:
I did not want ChatGPT to intervene too much. I found it [ChatGPT's suggestions] superficial. In terms of alternative conceptions, for example, I found it practical to consider its suggestions and adopted the three or four alternative conceptions it proposed. ChatGPT offered seven alternative conceptions. When I checked them, I noticed some things I observed in the classroom. Some of them were unreasonable; I did not take them. ChatGPT can assist us in presenting packaged information. In this sense, ChatGPT really saves time. Can it be adapted to the classroom? It can definitely be adapted, but of course, it should be done to the extent that the teacher knows his/her students.
Likewise, she took some of the items written in the assessment part of the plan from ChatGPT. The question taken from ChatGPT is copied below.
She took three questions offered by ChatGPT, which were a good match with the activity she planned. In the CoRe, Teagan planned to ask students to observe colour changes by dropping black cabbage juice onto different prepared solutions (e.g., vinegar). pH strips were then introduced, and students were asked to measure and note the pH.
In a chemistry laboratory, a group of students works with four different liquid samples, measuring the pH levels of different liquids. The students conduct a test using litmus paper to measure the pH of each liquid. At the end of each trial, they note the colour of the litmus paper and determine whether the liquid is acidic or basic. The students obtained the following results:
Lemon juice: Red
Soapy water: Blue
Vinegar: Red
Carbonate solution: Blue
Based on these data, which liquids are acidic and which are basic? Explain the ranges in which the pH values of these liquids can be found and in which daily-life products these liquids can be found.
I started my ChatGPT dialog with a prompt like “I prepared a 5-hour lesson plan on acids and bases for 8th grade in a middle school with 45 students. Can you provide suggestions and criticisms for the plan?” However, ChatGPT created a lesson plan directly without criticizing my own. At this point, I changed my prompt. After I restated that I had prepared a plan and wanted to move forward with it, the conversation became productive.
The process consisted of the AI reviewing the existing lesson plan and providing suggestions.
Second, as the flow revealed, the dialogue with ChatGPT aimed to have the AI critically examine the plan within a well-defined context (i.e., the number of students, the length, the grade level, and the topic). Carter gave ChatGPT a critic's role and, to some extent, a complementary role. For instance, he mentioned an interesting case in the focus group interview:
… There was a point where I said how I could have forgotten it. I do not know why I did not put it in my plan. There was a point where ChatGPT made a great point; it noted that we need to assess students' readiness levels. Immediately afterward, I asked it to explain this more. It stated that 8th-grade students generally struggle to understand abstract concepts. Therefore, it said to be careful to make a plan that includes visuals, experiment, activities, and videos.
Finally, regarding dealing with ChatGPT's suggestions, given the different nature of his interaction with ChatGPT, Carter mainly dealt with the model's critiques. For example, Carter's plan included a long section on the properties of acids and bases. The model suggested that he divide the section and include activities to ensure students' active participation and attention. When we examined Carter's CoRe, we saw that he had divided the related section into sub-sections, between which he included videos focusing on the properties of acids and bases, a game (i.e., in which properties are read aloud and students determine whether the substance is an acid or a base), and brainstorming (i.e., to attract students' attention). Likewise, ChatGPT criticized a point, which led him to delete that part from the plan. In the focus group interview, he explained:
In my plan, there was a section mentioning the taste of acids and bases. ChatGPT issued a serious warning about this. It said, “Don't use this example [the point about tasting soap]; take this example out.”
Regarding the complementary role, Carter wrote in the reflection paper that although he did not write a specific prompt about students' alternative conceptions of the topic, ChatGPT provided possible ones. In his CoRe, in addition to his own statements of possible alternative conceptions (e.g., acidic properties increase as pH increases, acids and bases always cause a colour change in litmus paper), he added ChatGPT's suggestions (e.g., bases are harmless, an acid with a pH of 5 is twice as acidic as an acid with a pH of 6). In the focus group interview, Carter stated that he received some ideas related to alternative conceptions; however, he also wrote many other possible ones based on his teaching experience. He added:
In the process, I benefited from ChatGPT because its suggestions were efficient, paying attention to the student's readiness and highlighting key aspects in a specific part of the plan.
Carter did not check whether the alternative conceptions existed in the literature. When we checked, we found that the first one (i.e., bases are harmless) was reported by Bučková and Prokša (2021). The second alternative conception, related to pH, was not found in the literature as presented here, but it has been reported that students have difficulty understanding the concept of pH and cannot fully comprehend what the mathematical value of pH means (Nakhleh, 1990; Sheppard, 2006). In our search, we discovered that Sheppard (2006) employed a similar task: "Students were shown beakers with colorless solutions marked 'pH 3', 'pH 5' and 'pH 11', and were asked to explain their sub-microscopic composition using drawings" (p. 34). Sheppard (2006) reported:
Only two students defined pH as pH = −log[H+] and only one of these was able to explain correctly the hundred-fold difference in H+ concentration between the pH three and pH five solutions, despite all students having had instruction that had emphasized the use of the equation, including several simple calculations of pH values from H+ concentrations. For most students, pH was a linear scale (p. 36).
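The hundred-fold difference Sheppard refers to follows directly from the logarithmic definition of pH; a brief worked calculation (ours, for illustration) makes the point explicit:

```latex
\mathrm{pH} = -\log_{10}[\mathrm{H^{+}}]
\;\Longrightarrow\;
[\mathrm{H^{+}}] = 10^{-\mathrm{pH}},
\qquad
\frac{[\mathrm{H^{+}}]_{\,\mathrm{pH}=3}}{[\mathrm{H^{+}}]_{\,\mathrm{pH}=5}}
= \frac{10^{-3}}{10^{-5}} = 10^{2} = 100 .
```

A learner who treats pH as a linear scale would instead expect the pH 3 solution to be less than twice as acidic as the pH 5 solution.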
Although not exactly the same, an alternative conception with a similar meaning was presented by the AI. Carter received ChatGPT's suggestions by filtering them through certain criteria (i.e., students' readiness). However, those criteria were based mainly on the pedagogical aspect rather than on PCK. The former focuses on pedagogy that is not specific to the content of acids and bases; the latter is bifocal, involving both pedagogy (i.e., teachers' knowledge about 8th-grade students' characteristics, such as difficulties with abstract thinking in general) and content (i.e., teachers' knowledge about 8th-grade students' readiness, difficulties, and alternative conceptions related to acids and bases).
Second, in terms of the purpose of the interaction with the AI, the analysis revealed two main purposes. The AI-independent planner's first interaction purpose was to check her knowledge about the alternative conceptions of middle school students, a group different from the high school students with whom she was familiar.
The second purpose was to create professional-looking visuals (i.e., concept cartoon and diagnostic branched tree with possible alternative conceptions), thus saving time. In the focus group interview, Asley said:
I am unable to create a concept cartoon by myself. I use ChatGPT Pro, especially when creating visuals like these. For example, I get help when I need to draw at a particle level or when I need to create a concept map. Drawing on a computer is very difficult and time-consuming, and the result often appears amateurish when compared to drawing by hand. In this sense, I think it will help me with measurement and evaluation.
In her dialogue with ChatGPT, first, she wrote a prompt: “Can you draw me a concept cartoon on acids, bases, and pH at an 8th-grade level, including alternative conceptions?” The concept cartoons offered by ChatGPT did not satisfy Asley, so she wrote, “I want a concept cartoon with alternative conceptions related to the formulas of acids and bases.” She integrated the concept cartoon into her plan (Fig. 5).
In the reflection paper, Asley also wrote:
For the assessment section, I also asked ChatGPT to form a diagnostic branched tree to assess the extent to which the lesson I planned addressed the learners’ alternative conceptions. However, I did not use the suggestion that ChatGPT provided because it was not visual, but rather in a statement form. So, I drew it by myself.
Asley's CoRe analysis revealed that she accepted ChatGPT's suggestions only for the assessment part. As mentioned above, she planned to use the concept cartoon that ChatGPT drew. However, she did not plan to use the cartoon as provided; she planned to revise it. In the focus group interview, Asley stated:
This was not the concept cartoon I had envisioned, but I had indicated in my plan that I would use it with some revisions. The speech bubble says, ‘I am an acid and my formula starts with H’. In the cartoon, the child also has N and H atoms on his sweatshirt. I will revise it by making it NH3. The child will ask the question, ‘Then is NH3 also an acid?’ My students will answer that child. There is an alternative conception among the students: “Acids are chemicals containing hydrogen. Bases contain OH.” … I envisioned using it by making such an arrangement.
In the other parts of the CoRe, she relied on her repertoire, which appeared to be rich. For example, for 'identifying the acidity/basicity pattern with pH value', a big idea she set, Asley planned an activity in which students measure the pH of chemicals from daily life and then create their own pH scales by colouring them with crayons and exhibiting them in the laboratory or classroom. Asley also proposed a similar activity for schools without a laboratory: an option using PhET Interactive Simulations (i.e., the pH Scale simulation, which lets students measure everyday solutions such as milk, coffee, vinegar, orange juice, vomit, blood, soap, and drain cleaner). In the plan, Asley emphasized that visuals of the ions in the acidic environment should also be shown; in this way, she planned to address the alternative conception that "there are no OH− ions in acidic solutions or no H+ ions in basic solutions" mentioned in the alternative conception dimension above.
Finally, regarding the information provided by ChatGPT, although Asley used AI in a limited way, the dialogue history shows that she always questioned the information presented to her. When she asked for a diagnostic branched tree, the following dialogue took place:
Asley: Can you prepare a diagnostic branched tree with possible alternative conceptions about acids, bases, and general properties based on the 8th-grade Turkish curriculum?
ChatGPT: [ChatGPT provided true and false statements] … [F] Neutralization always occurs at pH 7 … The pH after neutralization depends on the strength of the acid and the base [T]. Strong bases can also be harmful [T] …
Asley: Are these for 8th-grade level students?
ChatGPT: You are right; the content may have been detailed. At the 8th grade level, I should have presented a more simplified structure in accordance with the students' knowledge level and age group.
When we reviewed the related literature on the existence of the "neutralization always occurs at pH 7" alternative conception, we found that it was reported by Schmidt (1995). In the reflection paper, Asley stated that although she specifically asked for 8th-grade content, ChatGPT's suggestions included information not taught at middle school (e.g., acid strength and the neutralization of weak and strong acids and bases).
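To illustrate why "neutralization always occurs at pH 7" is an alternative conception (this worked example is ours, not part of the participants' data): when a strong acid neutralizes a weak base, the resulting salt hydrolyses to give an acidic solution, so the solution at the equivalence point lies below pH 7:

```latex
\mathrm{HCl_{(aq)} + NH_{3(aq)} \longrightarrow NH_{4}Cl_{(aq)}},
\qquad
\mathrm{NH_{4}^{+} + H_{2}O \rightleftharpoons NH_{3} + H_{3}O^{+}}
\;\Rightarrow\; \mathrm{pH} < 7 .
```

Conversely, a weak acid neutralized by a strong base yields a basic salt solution (pH > 7); only a strong acid with a strong base gives pH 7 at equivalence.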
In another dialogue, Asley asked ChatGPT to draw a concept cartoon addressing an alternative conception about acids, bases, and pH at the 8th-grade level. After ChatGPT's suggestion, Asley asked, "Which alternative conception is this about? Can you explain what the students think is wrong here?" ChatGPT replied: "This cartoon targets an alternative conception about the relationship between litmus paper and pH. The alternative conception is: the idea that bases (e.g., soap) will also turn litmus paper red." When we reviewed the literature, we could not find this conception. As can be seen, her dialogue is based on questioning what the model offers. Asley's knowledge of the learner and the curriculum enabled her to filter the suggestions presented to her.
Finally, the dialogue history showed that Asley evaluated ChatGPT's suggestions deeply: she asked the model to give citations for the possible alternative conceptions it provided. ChatGPT replied:
My information about alternative conceptions is based on common findings in general education research and science teaching. However, for direct references, I cannot say at this time that this information comes from specific academic studies or teaching guides because it is based on general pedagogy and science education literature (from dialogue history).
In the following prompt, Asley wrote:
I would like information on which alternative conceptions 8th-grade students have about the general properties of acids and bases. Provide article tags for direct quotations from current academic articles or guides related to science education.
ChatGPT stated that it scanned five websites and provided five resources related to learners’ alternative conceptions of acids and bases (e.g., Mubarokah et al., 2018) (from dialogue history).
To conclude, profound differences were observed between the cases with limited experience and those with long-term and varied experience (i.e., both teaching and researching acids and bases) in terms of the flow of AI use, the purposes of use, and the extent to which the model's suggestions were reviewed and incorporated into the lesson plan.
In our cases, the two PSTs and Emily (i.e., a teacher in her induction year) relied heavily on ChatGPT while preparing lesson plans for teaching acids and bases, directly incorporating ChatGPT's suggestions into their plans. They viewed ChatGPT as a source of information, using it to seek the model's suggestions about instructional strategies, learners' difficulties, and alternative conceptions related to acids and bases; however, they did not need assistance in setting curricular goals and objectives. Likewise, Lee and Zhai (2024) reported that PSTs showed a high dependency on ChatGPT in lesson planning and that the participating PSTs did not need support in every component of a lesson plan: in their study, the PSTs were more proficient in curriculum goals and instructional strategies than in integrating those aspects for effective planning.
The literature on teachers' AI use emphasizes that professional knowledge is necessary to prepare, with AI tools, a successful lesson plan that helps students learn a concept (Van den Berg and du Plessis, 2023; Zhai, 2023). The first aspect of this knowledge is technological knowledge (TK), which encompasses the nature of the knowledge produced by AI tools (e.g., biased knowledge) and how to effectively evaluate and integrate that knowledge into planning (Zhai, 2023). Based on the participants' self-reports, only Nancy rated herself as low as 6 out of 10 on technology competency (i.e., slightly above average), whereas Zoe (8) and Emily (9) considered themselves highly competent. Moreover, Zoe stated that she had previously participated in basic AI training. The other, experienced participants also considered themselves reasonably competent in this sense. Since we did not directly measure TK and technology proficiency, which is one of the limitations of this study, we have to rely on the participants' self-reports. Considering all the data we collected, the participants shared similar characteristics: except for Zoe, none had received training on AI tools, though all were familiar with ChatGPT. Therefore, we argue that other dimensions should explain the profile differences: the purposes of AI use, the nature of the interaction (e.g., seeing this technology as a source of information), and PCK.
The second aspect of the necessary professional knowledge is PCK and, with the integration of AI technology, TPACK (Van den Berg and du Plessis, 2023; Zhai, 2023).
If teachers simply focus on just integrating cutting-edge GenAI technologies such as ChatGPT into their pedagogical practice without careful consideration of how to best use them, it can be a waste of resources that fails to have a meaningful impact on learning outcomes (Lee and Zhai, 2024, p. 1652).
Careful consideration of the kind Lee and Zhai (2024) indicate is possible with solid PCK and TPACK. Studies in the teacher education literature have revealed that, with experience, teachers can coordinate multiple knowledge types (e.g., content knowledge [CK], pedagogical knowledge [PK], and technological knowledge [TK]) and PCK components (e.g., knowledge of the learner, curriculum, and assessment) for solid instructional planning and decision-making (Gess-Newsome, 1999; Akin and Uzuntiryaki-Kondakci, 2018). However, PSTs have limited CK, PCK, and TPACK, and need PST education programs that enrich their development (Jong et al., 2005; Akin and Uzuntiryaki-Kondakci, 2018; Tondeur et al., 2020; Ekiz-Kiran et al., 2021). Abell et al. (2009) argued that, at the beginning of their careers, PSTs are observers unaware of how the classroom works. Along the professional continuum, structured training and courses (Aydin et al., 2013), as well as experience, are valuable sources of PCK (Grossman, 1990) and TPACK (Tondeur et al., 2020) development. The experienced teachers and the teacher educator in this study had, in addition to teaching experience, experience in other areas (e.g., research on acids and bases and on alternative conceptions).
Regarding the influence of prior knowledge on AI use profiles, only a few studies have examined the interaction between prior knowledge and AI, in the fields of chemistry education (e.g., Clark et al., 2024) and learning programming with AI (e.g., Guner and Er, 2025). Clark et al. (2024) revealed that chemistry professors' prior knowledge about historical experiments (Thomson's, Millikan's, and Rutherford's) influenced how the professors engaged with AI tools. Likewise, Guner and Er (2025) reported that participants with limited prior programming knowledge showed superficial use of the tool, whereas those with strong knowledge "prefer to first try coding the task themselves, then they ask ChatGPT for revision or refinement" (p. 20). Similarly, in this study, the experienced science teachers depended on ChatGPT less than the PSTs did. The experienced teachers viewed ChatGPT as a colleague, so their process was more akin to exchanging information with a colleague or having a lesson plan critiqued. Furthermore, Teagan, Carter, and Asley played a filtering role by evaluating the model's suggestions, whereas PSTs with limited or almost no experience, such as Emily, could not play that role. This is in line with Zhai's (2023) point: "Teachers must also be able to evaluate the quality and relevance of the information provided by ChatGPT and make informed decisions about its use in the classroom" (p. 45). As stated above, we argue that years of observing their students and seeing the students' difficulties and mistakes in their exam papers may have served as this filter. The results showed that the experienced participants needed only limited assistance, especially with learners' difficulties and alternative conceptions.
Another important difference between the profiles concerned attention to ethics and related issues (e.g., bias, hallucination). Because the participants, except Asley, did not question ChatGPT's output, the authors checked the alternative conceptions presented by AI against the literature. Some of the alternative conceptions offered by the model are found in the literature, while others are not; in a few cases, we encountered alternative conceptions that were not identical to reported ones but carried a similar meaning. Moreover, the concept cartoon created by the AI (Fig. 4) had some issues regarding ammonia's formula (i.e., the formula was given as NH, but it should be NH3). Likewise, the formula of sodium hydroxide was problematic (i.e., on the blue bottle the formula was written as Naoh, but it should be NaOH). These results show that content provided by AI should be filtered and checked, an important practice that AI users should adhere to (Ivanova, 2025), because ChatGPT can generate information that lacks a scientific basis or has not been proven accurate, exhibiting AI-induced hallucinations (Tyson, 2023). Only Asley's data included an example of addressing this issue: she prompted ChatGPT for the references behind the alternative conceptions it listed. Although the other participants, except Nancy, mentioned having read papers about AI use in education, they did not address these issues in their dialogues with ChatGPT. It seems that this important dimension of TPACK-AI has not yet taken a significant place in the participants' minds; reading papers and participating in a brief training process was not sufficient. More emphasis and longer training are needed, as noted by Feldman-Maggor et al. (2025).
Yet another point is that, in contrast to the PSTs' and induction year teacher's inconsistent decisions in the plan flow, the experienced teachers' and teacher educator's plans with ChatGPT were consistent throughout the lessons. In other words, the two experienced teachers and the teacher educator could successfully synthesize their knowledge and experience with ChatGPT's recommendations in a logical sequence. On this issue, Van den Berg and du Plessis (2023) stated that one participant in their study (an experienced teacher) offered an apt analogy for using ChatGPT to plan a lesson: “a recipe which still needed a chef” (p. 6). In light of this analogy and our findings, as teachers gain experience in their careers, they become more capable of critically analysing the lesson plans presented to them. This critical view appears to rest on several variables, including the students' level, the curriculum objectives, and the timing and flow of the lesson.
Given that the experienced teachers held master's degrees, graduate courses may also have enriched their PCK and TPACK. In other words, in addition to teaching experience, those courses may have strengthened the filter through which they evaluated the model's suggestions. As Abell et al. (2009) argued, PCK is enriched over time and through various roles (i.e., observer as a PST, apprentice as a graduate student, and independent instructor after the PhD). This could likewise explain the experienced teachers’ (i.e., Teagan and Carter) and teacher educator's (i.e., Asley) profiles. Asley's graduate and PhD education, during which she took advanced teaching and assessment methods courses and conducted research, most likely informed her about many aspects of lesson planning. Additionally, considering her contributions to the planning and teaching of courses as a PhD candidate, her observations, and her own teaching experience as an instructor, it is understandable that she wanted to see the model's suggestions only on specific points (e.g., 8th-graders' possible alternative conceptions related to the topic). Similarly, Clark et al. (2024) argued that “for instructors seeking to improve their content knowledge, a conversation with ChatGPT can be useful. Conversely, instructors with extensive prior knowledge may have difficulty setting aside their expertise when communicating information to a novice” (p. 1995).
Finally, a critical approach to AI-supported systems is an important responsibility for teachers, not only in terms of their pedagogical competencies but also in terms of learning effectiveness and cognitive load. The ethics of algorithms is another issue: Mittelstadt et al. (2016) argue that technical features of systems, such as transparency and auditability, do not by themselves solve ethical problems. AI systems in education can make seemingly unbiased and objective decisions, yet these decisions can create unpredictable inequalities in students' learning processes. In this context, teachers need to be active actors who utilize the outputs of AI systems while questioning their ethical and pedagogical implications. Relatedly, the “assistance dilemma” defined by Koedinger and Aleven (2007) emphasizes the importance of finding the right balance between when and how help is offered in AI-supported instructional systems. As our findings show, the PSTs (Zoe and Nancy) and the induction year teacher (Emily) displayed AI-reliant profiles. While this dependency is understandable given their lack of teaching experience, it is our responsibility as chemistry teacher educators to consider how to reduce it among our pre-service teachers, the teachers of the future, and to take action. If this level of reliance is not reduced, there is a risk (Dodge et al., 2022) that PSTs will not develop their thinking skills sufficiently and will not use different PCK and TPACK components in a way that informs each other. Implications are suggested in the following sections.
For experienced in-service teachers, elaborating on advanced integration strategies, focusing on how AI can enrich existing plans, and collaborative lesson planning with AI to develop personal and collective PCK (pPCK and cPCK) (Carlson and Daehler, 2019; Forsler et al., 2024) would be valuable. Research on collaborative lesson planning with AI should also be conducted on topics with distinct natures. Acids and bases are among the primary topics in chemistry education, with daily-life examples, calculations, and particulate-level representations (Drechsler and Van Driel, 2008; Cetin-Dindar and Geban, 2017). More research on topics with nature of science links (e.g., the periodic table), abstract concepts (e.g., orbitals and hybridization), more calculations (e.g., physical chemistry topics), and graphics (e.g., enthalpy of reactions) should be conducted to enrich the chemistry education literature on AI use.
Finally, teacher educators and other instructors require training on utilizing AI to enhance chemistry instruction at the tertiary level (Koh, 2020). The limitations of AI tools, ethical issues, and their strengths should be introduced to teachers and instructors (van den Berg and du Plessis, 2023; Zhai, 2023). That introduction should be domain-specific (Feldman-Maggor et al., 2025), because TPACK and PCK themselves are domain-specific.
(a) Identifies the properties of acids and bases.
(b) Lists the similar properties of acids and bases.
(c) Lists the different properties of acids and bases.
S.8.5.4.3. Uses “pH” values related to the acidity and basicity of substances to reason inductively.
(a) Finds patterns related to the “pH” values of substances.
(b) Makes generalizations using the “pH” values of substances (National Ministry of Education, 2024, p. 213).
| Alternative conceptions reported related to the acids and bases topic | Study reporting the alternative conception |
| --- | --- |
| ✓ Acids and bases are dangerous, so they should not be touched | Ross and Munby (1991) |
| ✓ Acids are always harmful; bases are not harmful | |
| ✓ Everything that tastes sour is acidic | |
| ✓ All acids are harmful | |
| Cleaners such as soap and detergents are only basic | Nahadi et al. (2022) |
| ✓ pH value is just a number | Mubarokah et al. (2018) |
| ✓ Acids and bases are opposite substances; they have no common properties | |
| All acids are strong, and all bases are weak | |
| Bases are harmless | Bučková and Prokša (2021) |
| ✓ pH is a linear scale | Sheppard (2006) |
| ✓ pH value is just a number | |
| Neutralization always occurs at pH 7 | Schmidt (1995) |
This journal is © The Royal Society of Chemistry 2025